1
00:00:11,660 --> 00:00:16,400
In this lecture we are going to switch gears a bit and discuss the kind of data we'll be working with

2
00:00:16,400 --> 00:00:17,740
next.

3
00:00:17,750 --> 00:00:23,510
Previously we discussed topics related to the model itself: how a neural net gets structured and

4
00:00:23,510 --> 00:00:25,580
the different kinds of functions involved.

5
00:00:25,580 --> 00:00:26,800
That's the model.

6
00:00:26,990 --> 00:00:32,990
But just as important is the data. Where deep learning and neural networks really shine is on unstructured

7
00:00:32,990 --> 00:00:36,150
data such as images, sound, and text.

8
00:00:36,170 --> 00:00:45,960
And so for the next code example we'll be moving on to classifying images.

9
00:00:45,980 --> 00:00:48,310
Of course this presents a challenge.

10
00:00:48,380 --> 00:00:49,970
Previously you learned my rule:

11
00:00:49,970 --> 00:00:54,990
machine learning is nothing but a geometry problem, and all data is the same.

12
00:00:55,010 --> 00:01:00,500
We looked at some pretty simple examples with tabular data such as predicting whether a student would

13
00:01:00,500 --> 00:01:05,750
pass or fail an exam based on how many hours they spent studying and how many hours they spent playing

14
00:01:05,750 --> 00:01:07,310
video games.

15
00:01:07,310 --> 00:01:13,610
Another example was predicting salary based on factors such as years of experience and degree type.

16
00:01:13,610 --> 00:01:17,070
The question now is: if your input data X is an image,

17
00:01:17,240 --> 00:01:18,690
how does that fit into this picture?

18
00:01:23,830 --> 00:01:27,090
As usual we're going to start by doing the dumbest thing possible.

19
00:01:27,550 --> 00:01:33,080
But first we have to understand how images are represented in a computer in the first place.

20
00:01:33,160 --> 00:01:39,130
So if we just look at an image, let's say the famous Lena image, we can see that the image has two properties:

21
00:01:39,490 --> 00:01:40,420
height and width.

22
00:01:41,230 --> 00:01:45,610
Therefore your first thought might be: let's store this in a matrix.

23
00:01:45,610 --> 00:01:49,270
The next question is: what do we store in this matrix?

24
00:01:49,270 --> 00:01:52,680
Well, suppose I have a matrix A and I'm looking at the position

25
00:01:52,700 --> 00:01:58,600
A(i, j). This refers to the i-th row and the j-th column of the image.

26
00:01:58,600 --> 00:02:05,380
You might think: OK, so A(i, j) should store the color that appears at this exact position of the image, at

27
00:02:05,380 --> 00:02:06,090
position

28
00:02:06,110 --> 00:02:16,360
(i, j). The next problem we have to consider is: how do we store colors?

29
00:02:16,410 --> 00:02:21,900
In fact there are quite a few ways to store colors in computers, but in this course we'll only discuss

30
00:02:21,900 --> 00:02:27,060
the most common scheme, called RGB, which stands for red, green, blue.

31
00:02:27,060 --> 00:02:29,340
Now I want you to think back to kindergarten.

32
00:02:29,760 --> 00:02:31,870
Yes all the way back to kindergarten.

33
00:02:31,920 --> 00:02:36,690
I bet you didn't know that what you learned there would come in handy now. If you forgot what you learned

34
00:02:36,690 --> 00:02:37,320
in kindergarten,

35
00:02:37,320 --> 00:02:39,800
well, I'm not sure if I can help you.

36
00:02:40,020 --> 00:02:46,200
You recall that there's this idea of primary colors. Primary colors are a set of colors from which we

37
00:02:46,200 --> 00:02:52,950
can form other colors by combining them. A common set of primary colors is red, blue, and yellow, although

38
00:02:52,980 --> 00:02:56,970
red, green, and blue also make a set of primary colors.

39
00:02:56,970 --> 00:03:01,470
So what do you get if you mix red and blue? You get purple.

40
00:03:01,470 --> 00:03:05,250
What do you get if you mix blue and yellow? You get green.

41
00:03:05,250 --> 00:03:07,200
What happens if you mix red and yellow?

42
00:03:07,200 --> 00:03:08,550
You get orange.

43
00:03:08,880 --> 00:03:15,060
Now this one you might not know, but if you mix all the primary colors together you get white. In fact,

44
00:03:15,150 --> 00:03:20,900
from the primary colors it's possible to combine them in different proportions to get any color.

45
00:03:25,860 --> 00:03:28,220
So how do we represent colors?

46
00:03:28,230 --> 00:03:30,430
Well in fact it's not just a number.

47
00:03:30,510 --> 00:03:32,430
It's a set of three numbers.

48
00:03:32,610 --> 00:03:38,280
This set of three numbers tells us how much red, how much green, and how much blue.

49
00:03:38,610 --> 00:03:48,000
By mixing these colors in different proportions we can get different colors.

50
00:03:48,070 --> 00:03:50,860
So what does this tell us about how to represent images?

51
00:03:51,580 --> 00:03:57,340
Well, in addition to height and width, we need another dimension, always of size three, which represents

52
00:03:57,340 --> 00:03:59,570
the three components of color.

53
00:03:59,650 --> 00:04:03,460
So our image will actually be a three-dimensional tensor,

54
00:04:03,480 --> 00:04:04,480
A(i, j, k).

55
00:04:04,690 --> 00:04:10,420
This is the component of the image representing the i-th row, j-th column, and the k-th color.
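Here is a minimal sketch of this three-dimensional layout, assuming NumPy. The array name A, the 4 by 5 image size, and the pixel values are all made up purely for illustration.

```python
import numpy as np

# Hypothetical tiny image: 4 rows (height), 5 columns (width), 3 color channels (R, G, B)
A = np.zeros((4, 5, 3), dtype=np.uint8)

# Set the pixel at row 1, column 2 to pure red
A[1, 2] = [255, 0, 0]

i, j = 1, 2
red, green, blue = A[i, j]   # the three color components stored at position (i, j)

print(A.shape)               # (4, 5, 3)
print(A[i, j, 0])            # k = 0 selects the red component of that pixel
```

Note that the first index is the row and the second is the column, so the shape reads height, then width, then color.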

56
00:04:15,530 --> 00:04:20,990
The next important thing we have to discuss about how images are stored in computers has to do with

57
00:04:20,990 --> 00:04:23,410
quantization. Physically,

58
00:04:23,420 --> 00:04:27,530
we know that color is light, and we measure it with light intensity.

59
00:04:27,530 --> 00:04:33,850
It's a physical value in the real world and therefore it's pretty much continuous. If it's continuous,

60
00:04:33,860 --> 00:04:38,770
that means it has an infinite number of possible values. At the same time,

61
00:04:38,780 --> 00:04:44,780
we know that computers don't have infinite precision, and even if they did, we might not want to make

62
00:04:44,780 --> 00:04:49,160
use of it because that would make our image files take up lots of space.

63
00:04:49,160 --> 00:04:55,670
Well, it just so happens that people have already figured out that eight bits is good enough. Eight bits

64
00:04:55,670 --> 00:05:02,510
gives us two to the power eight possible values, which is equal to 256 possible values.

65
00:05:02,510 --> 00:05:09,920
These are encoded with the numbers zero up to 255. So in total, considering that we have the red, green,

66
00:05:09,920 --> 00:05:15,200
and blue channels, this gives us two to the power eight times two to the power eight times two to the

67
00:05:15,200 --> 00:05:15,460
power

68
00:05:15,470 --> 00:05:21,520
eight possible colors, which is equal to about 16.8 million. Now,

69
00:05:21,520 --> 00:05:27,230
using this we can actually calculate how much space it would take to store a raw image.

70
00:05:27,250 --> 00:05:30,630
Suppose we have an image of size 500 by 500.

71
00:05:30,850 --> 00:05:37,020
Then the number of bits it requires would be five hundred times five hundred times three times eight.

72
00:05:37,060 --> 00:05:39,370
That's six million bits.

73
00:05:39,370 --> 00:05:46,450
If we divide that by eight to get bytes, we get 750,000 bytes. That's equal to about

74
00:05:46,620 --> 00:05:51,430
732 kilobytes, which is almost one megabyte.

75
00:05:51,490 --> 00:05:57,550
You might realize that this is quite large for a 500 by 500 image, which is why we have image compression

76
00:05:57,550 --> 00:06:02,740
algorithms such as JPEG that help us store image files in a much more compact format.
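The arithmetic above can be checked with a few lines of plain Python. This is just a sketch; the variable names are made up, and it assumes one kilobyte means 1024 bytes, which is what gives the figure of roughly 732.

```python
# Raw (uncompressed) size of an RGB image with 8 bits per color channel
height, width, channels, bits_per_channel = 500, 500, 3, 8

total_bits = height * width * channels * bits_per_channel
total_bytes = total_bits // 8
total_kilobytes = total_bytes / 1024   # assuming 1 kilobyte = 1024 bytes

# Number of distinct colors: 256 choices for each of the three channels
num_colors = (2 ** bits_per_channel) ** channels

print(total_bits)                  # 6000000
print(total_bytes)                 # 750000
print(round(total_kilobytes, 1))   # 732.4
print(num_colors)                  # 16777216
```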

77
00:06:07,900 --> 00:06:13,090
As a side note for those of you who are web developers this is where hex colors come from.

78
00:06:13,090 --> 00:06:17,020
These also happen to use the same color scheme and quantization.

79
00:06:17,200 --> 00:06:23,950
So we have three bytes, with each byte representing the value for the red, green, and blue color channels.

80
00:06:23,950 --> 00:06:31,140
Importantly, one byte can be represented with two hex digits. Hex digits go from 0 up to 9 and then A

81
00:06:31,240 --> 00:06:32,110
up to F.

82
00:06:32,350 --> 00:06:40,190
So it's a base-16 number system. Two base-16 digits give us 256 possible values.

83
00:06:40,270 --> 00:06:48,920
So that's why we need two base-16 digits, or two hex digits, to represent one byte. Then a color, which is

84
00:06:48,920 --> 00:06:53,870
represented by three of these bytes, would require six hex digits.

85
00:06:53,870 --> 00:06:59,900
So that's why when you're coding in HTML, colors are represented by six digits, where the digits

86
00:06:59,900 --> 00:07:07,810
can be 0 up to 9 and A up to F.
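As a small illustration of the byte-to-hex mapping, here is a sketch in plain Python. The helper names rgb_to_hex and hex_to_rgb are hypothetical, written just for this example, and are not part of any library.

```python
def rgb_to_hex(r, g, b):
    """Format three 0-255 channel values as a six-digit hex color string."""
    # {:02X} writes one byte as exactly two uppercase hex digits
    return "#{:02X}{:02X}{:02X}".format(r, g, b)


def hex_to_rgb(code):
    """Parse a six-digit hex color string back into three 0-255 integers."""
    code = code.lstrip("#")
    # Each pair of hex digits is one byte: red, then green, then blue
    return tuple(int(code[k:k + 2], 16) for k in (0, 2, 4))


print(rgb_to_hex(255, 165, 0))   # #FFA500
print(hex_to_rgb("#FFA500"))     # (255, 165, 0)
```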

87
00:07:07,930 --> 00:07:12,100
Now we can simplify this a little bit if we have an image that does not have color.

88
00:07:12,100 --> 00:07:18,790
We call these grayscale images because each pixel can only be black or white or something in between.

89
00:07:18,790 --> 00:07:25,930
Typically we assign black the value of zero and white the value of 255. Any value in between is a different

90
00:07:25,930 --> 00:07:32,440
shade of gray. And because we only need one value instead of three, grayscale images are stored in two-

91
00:07:32,440 --> 00:07:35,650
dimensional arrays rather than three-dimensional arrays.

92
00:07:40,790 --> 00:07:41,960
In Matplotlib,

93
00:07:41,960 --> 00:07:47,930
if you try to use the imshow function to plot a grayscale image, it will end up assigning its own

94
00:07:47,930 --> 00:07:48,990
color scheme.

95
00:07:49,130 --> 00:07:51,790
But note that this is not the true color of the image.

96
00:07:51,800 --> 00:07:57,620
This is just a heat map where we assign one color such as red to the high values and then another color

97
00:07:57,620 --> 00:08:00,120
such as blue to the low values.

98
00:08:00,170 --> 00:08:10,760
So that's why you have to pass in cmap='gray' when you want to show a grayscale image.

99
00:08:10,780 --> 00:08:16,240
The last thing I want to mention about how images are represented in computers is that sometimes we

100
00:08:16,240 --> 00:08:21,240
don't use 8-bit unsigned integers from 0 to 255.

101
00:08:21,400 --> 00:08:25,650
If you recall, neural networks don't like values on such a large scale.

102
00:08:25,720 --> 00:08:31,060
So for neural networks it's more convenient to scale image values to be floating-point numbers between 0

103
00:08:31,060 --> 00:08:37,360
and 1. A 0 represents no intensity and a 1 represents full intensity.
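A minimal sketch of this scaling, assuming NumPy. The 2 by 2 image and the variable names are hypothetical, and dividing by 255 is one common way to do it rather than the only one.

```python
import numpy as np

# Hypothetical 2x2 grayscale image stored as 8-bit unsigned integers
img_uint8 = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Convert to floating point first, then divide, so values land between 0 and 1
img_float = img_uint8.astype(np.float32) / 255.0

print(img_float.dtype)                    # float32
print(img_float.min(), img_float.max())   # 0.0 1.0
```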

104
00:08:37,480 --> 00:08:44,080
And yes, the astute among you might notice that if we represent images as values between 0 and 1, these

105
00:08:44,080 --> 00:08:50,290
are not standardized values because they are not centered around 0. There are always exceptions in deep

106
00:08:50,290 --> 00:08:52,600
learning. For images,

107
00:08:52,600 --> 00:08:58,870
this representation is particularly convenient because it's possible to interpret these values as probabilities.

108
00:09:04,020 --> 00:09:04,790
As mentioned,

109
00:09:04,830 --> 00:09:06,600
there are always exceptions in deep learning.

110
00:09:06,600 --> 00:09:08,510
So here is another one.

111
00:09:08,550 --> 00:09:13,830
One of the most famous neural networks today is called VGG, which is used for many computer vision

112
00:09:13,830 --> 00:09:20,300
applications such as image classification, transfer learning, style transfer, and object detection.

113
00:09:20,750 --> 00:09:27,690
VGG also previously won the ImageNet contest in multiple categories, which as you recall is a competition

114
00:09:27,690 --> 00:09:34,130
held every year for research groups to present state-of-the-art image recognition models.

115
00:09:34,140 --> 00:09:40,140
The interesting thing about VGG is that the original authors did not scale the input data, but they

116
00:09:40,140 --> 00:09:43,160
did shift it by its mean across each color channel.

117
00:09:43,380 --> 00:09:49,670
Thus for VGG, images are centered around zero, but the range of values is still 256.

118
00:09:54,920 --> 00:09:55,430
So far,

119
00:09:55,430 --> 00:09:58,910
we've discussed how images are represented in a computer.

120
00:09:58,910 --> 00:10:06,100
However, this still begs the question: how is an image represented as input into a neural network? If

121
00:10:06,100 --> 00:10:12,850
you recall, our input data is usually represented by a matrix X of size N by D, where N is the number

122
00:10:12,850 --> 00:10:19,210
of samples and D is the number of features. But this doesn't make sense in our case. If we have an image

123
00:10:19,210 --> 00:10:19,800
sample,

124
00:10:19,810 --> 00:10:25,210
how can it have D features? An image itself is a three-dimensional object.

125
00:10:25,210 --> 00:10:30,640
Therefore it would seem that in order to represent a data set of images we would need to have a four-

126
00:10:30,640 --> 00:10:36,010
dimensional tensor of shape N by H by W by C, where N is the number of samples,

127
00:10:36,160 --> 00:10:41,650
H is the image height, W is the image width, and C is the number of color channels, which is always 3.

128
00:10:43,750 --> 00:10:49,030
Now this is a little bit of foreshadowing, since this is what we'll do later, but for now we are

129
00:10:49,030 --> 00:10:50,550
going to do something simpler.

130
00:10:55,700 --> 00:10:56,690
To keep things simple,

131
00:10:56,690 --> 00:11:01,670
let's say we have a black and white image of size three by three, a very tiny image.

132
00:11:01,850 --> 00:11:07,160
Well, we can just number each pixel in a zigzag fashion, going from left to right and top to bottom.

133
00:11:07,250 --> 00:11:13,140
So we have pixel one, pixel two, pixel three, all the way to pixel nine in the bottom right corner.

134
00:11:13,220 --> 00:11:19,500
Well, if we want to represent this as a vector, why not just line them up in a row and call it a vector?

135
00:11:19,790 --> 00:11:25,690
So we have a vector of size 9 containing pixel 1, pixel 2, pixel 3, all the way up to pixel 9.

136
00:11:25,940 --> 00:11:27,920
We call this process flattening.
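A minimal sketch of flattening the tiny three by three example, assuming NumPy. The pixel values 1 through 9 simply stand in for the pixel numbering and are not real intensities.

```python
import numpy as np

# Hypothetical 3x3 grayscale image whose pixels are numbered 1..9,
# left to right and top to bottom
image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# reshape(-1) reads the array row by row (row-major order is the default)
flat = image.reshape(-1)

print(flat.shape)   # (9,)
print(flat)         # [1 2 3 4 5 6 7 8 9]
```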

137
00:11:33,130 --> 00:11:33,880
In this way,

138
00:11:33,880 --> 00:11:40,790
we can still store all of our images in an N by D array. Here N is the number of samples and D is

139
00:11:40,900 --> 00:11:48,160
height times width times color. Each row of this matrix is still one sample, and each sample just happens

140
00:11:48,160 --> 00:11:50,330
to be a flattened image.

141
00:11:50,440 --> 00:11:55,400
The features of the image just happen to be the individual pixel values.
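The same idea applied to a whole data set can be sketched as follows, assuming NumPy. The sizes N, H, W, and C below are made-up numbers for illustration, and the images are just zeros.

```python
import numpy as np

# Hypothetical data set dimensions: samples, height, width, color channels
N, H, W, C = 100, 28, 28, 3
images = np.zeros((N, H, W, C), dtype=np.float32)

# Keep the sample axis and collapse height, width, and color into one feature axis
X = images.reshape(N, -1)

print(X.shape)              # (100, 2352)
print(X.shape[1] == H * W * C)
```

Each row of X is one flattened image, so X has the familiar samples-by-features layout.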

142
00:11:55,690 --> 00:12:00,580
And so here we encounter again this idea that all data is the same.

143
00:12:00,580 --> 00:12:06,940
We saw previously how we can represent all sorts of different tabular data as an N by D array, from

144
00:12:07,000 --> 00:12:10,330
breast cancer detection to Moore's Law and so on.

145
00:12:10,330 --> 00:12:16,270
Now we can represent images in this format as well, so it is a truly powerful concept.