1
00:00:00,600 --> 00:00:00,870
Hey.

2
00:00:01,080 --> 00:00:03,910
Welcome back to the course. In this chapter,

3
00:00:03,960 --> 00:00:09,990
we're going to take a look at convolutions, the operation that allows us to detect features in images.

4
00:00:10,560 --> 00:00:11,550
So let's get started.

5
00:00:12,240 --> 00:00:15,450
So what exactly is a convolution operation?

6
00:00:15,960 --> 00:00:22,440
Well, a convolution operation is a mathematical term that describes the process of combining two functions

7
00:00:22,440 --> 00:00:23,280
to produce a third.

8
00:00:23,790 --> 00:00:24,520
It's quite simple.

9
00:00:24,540 --> 00:00:27,030
It's like any other mathematical operation if you think about it.

10
00:00:28,020 --> 00:00:33,450
However, in our situation, the output we want is called the feature map, and we'll get to feature

11
00:00:33,450 --> 00:00:34,170
maps shortly.

12
00:00:34,440 --> 00:00:40,950
Just remember that. And we use a matrix called a filter or kernel, which is applied to our image.

13
00:00:41,460 --> 00:00:43,110
So the first function?

14
00:00:43,380 --> 00:00:45,870
Remember, we talked about two functions here.

15
00:00:46,230 --> 00:00:48,300
The first function is our image.

16
00:00:48,960 --> 00:00:56,520
It is then combined or multiplied, depending on how you want to describe the operation, though convolved is

17
00:00:56,520 --> 00:01:02,610
the actual correct term, with a kernel or filter, and that produces our feature map.

18
00:01:02,760 --> 00:01:06,210
So this is just mathematically what I'm describing here.

19
00:01:06,240 --> 00:01:07,560
It's quite a simple concept.

20
00:01:08,130 --> 00:01:12,180
So now let's take a look at a very basic view of the operation.

21
00:01:12,840 --> 00:01:15,840
Imagine this is our input image here.

22
00:01:16,350 --> 00:01:19,680
Imagine it's a binary image, so the ones represent white.

23
00:01:20,070 --> 00:01:21,540
The zeros represent black.

24
00:01:21,810 --> 00:01:24,150
So it's a weird, pretty weird image right now.

25
00:01:24,390 --> 00:01:25,590
Doesn't really represent anything.

26
00:01:26,400 --> 00:01:28,140
And this is our filter, or kernel.

27
00:01:28,620 --> 00:01:33,270
So hypothetically, we're just multiplying our input image against

28
00:01:33,270 --> 00:01:37,260
our filter to produce this output, or feature map.

29
00:01:37,920 --> 00:01:39,420
So how does this work?

30
00:01:40,440 --> 00:01:46,110
So just to refresh your memory, remember I said a one was white and the zero was black.

31
00:01:46,230 --> 00:01:52,950
Well, in reality, grayscale images are represented from zero to 255, with 255 being white.

32
00:01:52,990 --> 00:01:54,540
So you can see it's all white here.

33
00:01:54,960 --> 00:01:59,760
And if you look closely, you can see as the values get lower, it turns into different shades of gray.

34
00:02:00,120 --> 00:02:04,470
With these values being pure black, which is probably going to be zero, even though we can't see it

35
00:02:04,470 --> 00:02:05,970
because it's a black on black font.

36
00:02:06,480 --> 00:02:11,280
But nevertheless, that's how images are shown in this example here.

37
00:02:11,640 --> 00:02:14,850
So this is an actual image, except the values are scaled differently.

38
00:02:15,030 --> 00:02:15,330
OK.

39
00:02:16,230 --> 00:02:17,400
Hope I didn't lose you there.

40
00:02:17,630 --> 00:02:18,450
It's not...

41
00:02:18,450 --> 00:02:19,240
It's not complicated.

42
00:02:19,250 --> 00:02:20,340
It's quite simple so far.

43
00:02:20,820 --> 00:02:23,310
So let's continue with the convolution operation.

44
00:02:24,240 --> 00:02:28,740
So now let's take a look at the actual mathematics behind the convolution operation.

45
00:02:29,190 --> 00:02:30,420
Now, it's not complicated.

46
00:02:30,480 --> 00:02:31,530
It's actually quite simple.

47
00:02:31,530 --> 00:02:33,900
It's just multiplication and addition.

48
00:02:34,440 --> 00:02:38,400
And what I'll show you, I'll step through the first calculation here for you.

49
00:02:38,500 --> 00:02:43,950
The first calculation, what I mean is that I'll show you how we calculate the first top left digit

50
00:02:43,950 --> 00:02:45,660
here of the feature map.

51
00:02:46,320 --> 00:02:51,180
So as you can see, I've overlaid this blue matrix onto here.

52
00:02:51,690 --> 00:02:57,660
So remember that it was one one zero here, zero zero one, one zero one.

53
00:02:58,140 --> 00:02:59,310
Those values still remain.

54
00:02:59,310 --> 00:03:02,310
It's one one zero, zero zero one, one zero one.

55
00:03:02,790 --> 00:03:09,450
However, you can see it's now multiplied by the digits here, so you just directly map this across here

56
00:03:09,510 --> 00:03:14,580
to the top left corner of our image, and you multiply each one.

57
00:03:15,030 --> 00:03:16,650
So you can step through the calculations.

58
00:03:16,710 --> 00:03:18,060
Go row by row here.

59
00:03:18,090 --> 00:03:20,760
It doesn't matter what sequence you do the additions in.

60
00:03:21,500 --> 00:03:33,180
So one by zero plus zero by one plus one by zero here, plus one by one plus zero by zero plus zero

61
00:03:33,180 --> 00:03:39,570
by minus one, then plus zero by zero, then plus one by one.

62
00:03:40,140 --> 00:03:42,030
And then plus one by zero.

63
00:03:42,570 --> 00:03:47,550
And if you just work this out, it might seem a bit long, but it's not that long.

64
00:03:47,880 --> 00:03:55,020
It's just nine different multiplications and additions here, and it just gives us two.

65
00:03:55,170 --> 00:03:57,720
So that's how we calculate the two here.

66
00:03:58,500 --> 00:04:01,320
So you can just check it here and you can see this.

67
00:04:01,320 --> 00:04:06,390
This column corresponds to the first column here, and this one corresponds to the middle

68
00:04:06,390 --> 00:04:06,780
column.

69
00:04:06,780 --> 00:04:11,370
And this one corresponds to that column, just as a sanity check, just to make sure that we're doing this

70
00:04:11,370 --> 00:04:11,670
right.

71
00:04:12,510 --> 00:04:15,930
So how do we get the next digit?

72
00:04:16,170 --> 00:04:19,290
Well, as you may have noticed, we simply shifted...

73
00:04:19,770 --> 00:04:20,820
Let's do this again.

74
00:04:22,620 --> 00:04:25,140
...the matrix by one step to the right.

75
00:04:25,890 --> 00:04:30,420
And then we multiply it out again and we get the value one.

76
00:04:30,840 --> 00:04:33,690
So we're multiplying all these values individually and summing them.

77
00:04:34,230 --> 00:04:35,280
You get the value one.

78
00:04:35,760 --> 00:04:36,670
And then so on.

79
00:04:36,690 --> 00:04:38,280
This gives us a value of minus one.

80
00:04:38,300 --> 00:04:42,270
I just left out all the calculations there because it was getting a bit tedious to keep writing.

81
00:04:42,840 --> 00:04:48,530
And then we keep going again, again, again, again, again and again.

82
00:04:48,540 --> 00:04:51,990
And that's how we get our feature map output.

83
00:04:52,590 --> 00:04:59,730
So you can see when we multiply a five by five image matrix here by a three by three kernel, we get

84
00:04:59,730 --> 00:04:59,820
a

85
00:04:59,930 --> 00:05:02,850
three by three feature map. That's important for later

86
00:05:03,000 --> 00:05:05,400
lessons, so just keep that in mind.
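The sliding-window calculation walked through above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not code from the course: the function name `convolve2d_valid` is hypothetical, the kernel values are reconstructed from the spoken multiplication steps, and only the top-left three by three patch of the image comes from the lesson (the remaining pixels are made up).

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and, at each
    position, sum the element-wise products. The kernel is not flipped,
    which matches the operation described in the lesson."""
    n, f = len(image), len(kernel)
    out_size = n - f + 1
    feature_map = []
    for i in range(out_size):          # top row of the current window
        row = []
        for j in range(out_size):      # left column of the current window
            total = 0
            for a in range(f):
                for b in range(f):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 5x5 binary image; only the top-left 3x3 patch
# (1 1 0 / 0 0 1 / 1 0 1) is taken from the lesson.
image = [
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
]

# Kernel reconstructed from the spoken walkthrough (an assumption).
kernel = [
    [0, 1, 0],
    [1, 0, 1],
    [0, -1, 0],
]

feature_map = convolve2d_valid(image, kernel)
print(feature_map[0][0])  # top-left value of the feature map
print(len(feature_map), len(feature_map[0]))  # output dimensions
```

The top-left entry reproduces the value of two worked out by hand, and the output has three rows and three columns.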

87
00:05:06,870 --> 00:05:08,220
So what are image features?

88
00:05:08,730 --> 00:05:15,000
So as you know, well, as you may not know yet, but I'll tell you now, feature maps actually come from

89
00:05:15,000 --> 00:05:15,930
feature detectors.

90
00:05:16,560 --> 00:05:19,470
So why did we just do this convolution?

91
00:05:20,040 --> 00:05:27,000
Because convolution filters or kernels actually act as feature detectors, to detect special features

92
00:05:27,000 --> 00:05:27,690
in images.

93
00:05:28,140 --> 00:05:29,100
Take a look at this one.

94
00:05:29,910 --> 00:05:36,930
This convolution filter right here, this one one zero zero zero minus one minus one is actually an

95
00:05:36,930 --> 00:05:37,950
edge detector.

96
00:05:38,310 --> 00:05:44,820
So when you multiply this convolution, this kernel, by this image, and this image is basically an edge,

97
00:05:45,390 --> 00:05:46,500
it's a square.

98
00:05:46,530 --> 00:05:49,290
I just drew it like a rectangle just for illustrative purposes.

99
00:05:49,800 --> 00:05:55,440
But you can see this is a gray area at the top here, and this is a bright white area, and you can see when

100
00:05:55,440 --> 00:05:56,310
you multiply this.

101
00:05:56,730 --> 00:05:59,250
Our feature map actually is an edge as well.

102
00:05:59,550 --> 00:06:02,460
You can see that it translates that edge directly.

103
00:06:02,880 --> 00:06:06,390
This means that it worked as an edge detector here.
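As a rough sketch of the edge-detector idea, the snippet below applies a horizontal-edge kernel to a small image with a gray band on top and a white band below. The kernel rows of 1, 0 and -1 are an assumption (the exact values on the slide are not fully recoverable from the audio), and the image values and the helper name `apply_kernel` are made up for illustration.

```python
def apply_kernel(image, kernel):
    """Stride-1, no-padding sliding window: sum of element-wise products."""
    n, f = len(image), len(kernel)
    size = n - f + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(f) for b in range(f))
            for j in range(size)
        ]
        for i in range(size)
    ]


# Hypothetical 6x6 grayscale image: gray (50) on top, white (255) below.
image = [[50] * 6 for _ in range(3)] + [[255] * 6 for _ in range(3)]

# Assumed horizontal-edge kernel: responds to top-versus-bottom differences.
edge_kernel = [
    [1, 1, 1],
    [0, 0, 0],
    [-1, -1, -1],
]

edges = apply_kernel(image, edge_kernel)
for row in edges:
    print(row)
```

Windows that sit entirely inside the gray or the white region sum to zero, while windows that straddle the boundary give a large-magnitude response, so the feature map itself looks like an edge.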

104
00:06:06,990 --> 00:06:12,210
And these are some examples of other feature detectors that can be hardcoded here.

105
00:06:12,600 --> 00:06:20,220
However, just so you know, in CNNs our kernels aren't hardcoded. They're initialized randomly, and

106
00:06:20,850 --> 00:06:24,660
during the training process, the network learns the weights of the different filters.

107
00:06:24,660 --> 00:06:29,610
So that's the cool part, which we'll get into shortly, but that's a cool part about CNNs.

108
00:06:31,050 --> 00:06:38,010
So these are some examples of kernels that were extracted from a trained CNN.

109
00:06:38,370 --> 00:06:42,090
And you can see this one is looking for like this sort of edge pattern.

110
00:06:42,390 --> 00:06:44,280
This one's looking for like almost a stripe.

111
00:06:44,580 --> 00:06:48,480
This one is looking for a horizontal stripe. This one's looking for a checkerboard pattern.

112
00:06:48,930 --> 00:06:50,640
This one is looking for a pink blob.

113
00:06:51,060 --> 00:06:56,010
So you can see feature detectors are able to detect many different features.

114
00:06:56,340 --> 00:07:02,790
So imagine you're now using all of these different features, combinations of these, to detect what a dog

115
00:07:02,790 --> 00:07:03,720
is, or a cat.

116
00:07:04,170 --> 00:07:07,560
You see how it makes sense now? You can see how powerful CNNs are.

117
00:07:08,430 --> 00:07:15,210
So this is an example of just sliding a window across just to get the output of this feature detector

118
00:07:15,330 --> 00:07:18,870
so you can see how it looks and just does it again.

119
00:07:18,900 --> 00:07:19,980
So let's take a look at it.

120
00:07:21,510 --> 00:07:24,210
And that's the feature map it produces from the output.

121
00:07:26,020 --> 00:07:28,810
So how do we calculate the feature map size?

122
00:07:29,230 --> 00:07:34,720
Remember, I said this was a five by five matrix, which you can see here, a three by three kernel, which

123
00:07:34,720 --> 00:07:37,560
you can see here, and a three by three output.

124
00:07:37,570 --> 00:07:40,330
But how do you know this size is going to be three by three?

125
00:07:40,340 --> 00:07:45,580
But we can have images of many different sizes, as well as many different sized kernels.

126
00:07:46,090 --> 00:07:48,910
So here's a simple formula to get the output size.

127
00:07:49,330 --> 00:07:52,810
It's N, where N is the dimension of the image here.

128
00:07:53,070 --> 00:07:53,400
Now, we...

129
00:07:53,490 --> 00:07:56,380
We assume square images for CNNs.

130
00:07:56,800 --> 00:08:01,240
It makes the calculation and process much, much faster, and it doesn't affect anything.

131
00:08:01,240 --> 00:08:08,650
So even if you resize a different aspect ratio image, the network still trains and learns quite

132
00:08:08,650 --> 00:08:08,950
well.

133
00:08:09,670 --> 00:08:11,860
Even if you resize and change the aspect ratio.

134
00:08:12,190 --> 00:08:15,940
So just remember, all images in CNNs are square images.

135
00:08:17,050 --> 00:08:25,990
And F is the kernel or filter dimension, so we have a three by three. Again, it's a square.

136
00:08:26,860 --> 00:08:31,800
And plus one, so we've got five minus three plus one, which gives us three.

137
00:08:32,170 --> 00:08:36,220
And that's the dimension of our feature map.
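The output-size rule just described (N minus F plus one) can be written as a tiny helper. This is only a sketch: the function name `feature_map_size` is hypothetical, and it assumes a square image, a square kernel, a step of one pixel, and no padding, as in the lesson's example.

```python
def feature_map_size(n, f):
    """Side length of the feature map for an n x n image and an f x f kernel,
    assuming the kernel moves one pixel at a time and no padding is added."""
    if f > n:
        raise ValueError("kernel must not be larger than the image")
    return n - f + 1


# The lesson's example: five by five image, three by three kernel.
size = feature_map_size(5, 3)
print(size)
```

For the five by five image and three by three kernel this gives three, matching the three by three feature map shown earlier.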

138
00:08:37,630 --> 00:08:38,980
So that's it for this lesson.

139
00:08:39,400 --> 00:08:44,740
In the next lesson, we'll spend some time discussing what feature detectors are and how they help us

140
00:08:44,740 --> 00:08:50,020
classify images, which you might already have some suspicion about, given this lesson.

141
00:08:50,500 --> 00:08:52,540
Thank you, and I'll see you in the next section.