1
00:00:00,930 --> 00:00:01,200
Hi.

2
00:00:01,410 --> 00:00:03,210
Now let's start looking at pooling.

3
00:00:03,660 --> 00:00:08,550
We'll explore the max pooling layer and its purpose in CNNs. So let's get started.

4
00:00:09,450 --> 00:00:16,350
So firstly, pooling is simply the process where we reduce the size or dimensionality of a feature map

5
00:00:16,680 --> 00:00:18,510
as we progress through the network.

6
00:00:19,050 --> 00:00:24,810
Now you may recall me saying that reducing the size isn't always good because we tend to lose

7
00:00:24,810 --> 00:00:25,410
information.

8
00:00:25,440 --> 00:00:31,800
However, pooling with the max pool operation that we use in CNNs allows us to actually keep most of

9
00:00:31,800 --> 00:00:33,480
the information so we don't lose too much.

10
00:00:33,930 --> 00:00:39,570
And this is important because now it allows us to reduce the number of parameters in our network, which

11
00:00:39,570 --> 00:00:41,520
means that it's much faster in training.

12
00:00:41,520 --> 00:00:43,080
It's a much smaller size.

13
00:00:44,010 --> 00:00:46,320
It can run inference faster as well.

14
00:00:46,920 --> 00:00:52,380
So it's a good thing that pooling is around, and this process is also called subsampling or

15
00:00:52,380 --> 00:00:52,860
downsampling.

16
00:00:52,870 --> 00:00:57,390
So in the literature, you might come across something where it says we subsample the

17
00:00:57,390 --> 00:00:58,770
feature map or downsample.

18
00:00:59,190 --> 00:01:01,100
Basically, pooling is what they're doing.

19
00:01:01,110 --> 00:01:03,570
Pooling is the standard name that mostly everyone uses.

20
00:01:03,570 --> 00:01:05,730
But sometimes you will come across these terms.

21
00:01:06,810 --> 00:01:09,000
So here's an example of max pooling.

22
00:01:09,420 --> 00:01:12,480
Imagine this is a feature map output right here.

23
00:01:13,350 --> 00:01:16,530
We use two parameters to control the max pool operation.

24
00:01:16,920 --> 00:01:19,170
We use a kernel size and a stride.

25
00:01:19,830 --> 00:01:27,450
So a two by two kernel means that the first instance of the sliding window covers this yellow portion here.

26
00:01:27,870 --> 00:01:32,970
The second instance covers the blue portion, the third the red, and the fourth the green.

27
00:01:33,660 --> 00:01:39,210
Now you may have already figured out by looking at this, but the max pool operation simply takes the

28
00:01:39,210 --> 00:01:42,180
largest number in this block, in this two by two grid.

29
00:01:42,600 --> 00:01:44,190
So that's one hundred and twenty three.

30
00:01:44,760 --> 00:01:49,710
The other one is going to be 253, though the slide actually shows 167 here.

31
00:01:49,710 --> 00:01:52,260
So that's actually a mistake because there's no 167 here.

32
00:01:52,740 --> 00:02:00,300
So sorry about that error. In this one here, the largest number is 187, and in this one here it's 165.

33
00:02:00,900 --> 00:02:03,900
So you can see effectively that's how max pooling works.

34
00:02:04,080 --> 00:02:04,860
It's quite simple.
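
To make the sliding window concrete, here is a minimal sketch of max pooling in plain Python. The 4x4 feature map values are hypothetical, chosen only so that the four window maxima match the numbers read out from the slide (123, 253, 187, 165), and the function name `max_pool_2d` is an illustrative assumption, not a library call.

```python
def max_pool_2d(feature_map, kernel=2, stride=2):
    """Slide a kernel x kernel window over a 2D list and keep each window's maximum."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - kernel + 1, stride):
        pooled_row = []
        for c in range(0, cols - kernel + 1, stride):
            # Collect every value covered by the current window.
            window = [feature_map[r + i][c + j]
                      for i in range(kernel) for j in range(kernel)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical feature map; only the four maxima come from the lecture.
feature_map = [
    [12, 123, 90, 253],
    [45, 67, 31, 110],
    [187, 20, 165, 14],
    [76, 54, 99, 38],
]

pooled = max_pool_2d(feature_map)
print(pooled)  # [[123, 253], [187, 165]]
```

Each output value is the largest number in one non-overlapping two by two block, in the same order as the four window positions.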

35
00:02:05,940 --> 00:02:07,590
So here's a bit more on max pooling.

36
00:02:08,370 --> 00:02:14,250
Typically, we use two by two kernels with a stride of two. Those values, those parameters,

37
00:02:14,670 --> 00:02:16,390
tend to give us fairly good results.

38
00:02:16,920 --> 00:02:21,480
And it allows us to reduce the dimensionality by a factor of two in both width and height.

39
00:02:21,930 --> 00:02:22,710
Take a look at that.

40
00:02:22,800 --> 00:02:29,250
This was a four by four feature map, and by applying the max pool with a two by two kernel and a stride of two,

41
00:02:30,060 --> 00:02:32,730
we get a two by two matrix as the output.

42
00:02:33,030 --> 00:02:37,290
So it's half the size, so that means it reduced it by a scale factor of two.
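
The halving can be checked with a small calculation. This sketch assumes pooling without padding, which the lecture does not state explicitly, and the helper name `pooled_size` is illustrative.

```python
def pooled_size(in_size, kernel=2, stride=2):
    """Output length along one axis for pooling with no padding (assumed)."""
    return (in_size - kernel) // stride + 1


print(pooled_size(4))   # 2: a 4x4 feature map becomes 2x2
print(pooled_size(28))  # 14
```

With a kernel size of two and a stride of two, each spatial dimension is divided by two, so the feature map holds a quarter as many values.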

43
00:02:38,320 --> 00:02:44,340
Now, pooling also has the advantage that it makes our model more invariant to minor transformations

44
00:02:44,760 --> 00:02:46,230
and distortions in our image.

45
00:02:46,590 --> 00:02:49,830
OK, I'll show you an example of that in the next slide.

46
00:02:50,370 --> 00:02:55,770
And lastly, we can also use something called average pooling or sum pooling, which you can imagine

47
00:02:55,770 --> 00:02:56,370
from the name.

48
00:02:56,730 --> 00:03:02,310
Average pooling would just take the average of these four numbers, and sum pooling would

49
00:03:02,310 --> 00:03:04,650
take the sum of these four numbers as well.
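
As a rough sketch, the three pooling variants differ only in the function applied to each window. The generic helper `pool_2d` and the sample values below are assumptions made for illustration.

```python
def pool_2d(feature_map, reduce_fn, kernel=2, stride=2):
    """Apply reduce_fn to every kernel x kernel window of a 2D list."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - kernel + 1, stride):
        pooled_row = []
        for c in range(0, cols - kernel + 1, stride):
            window = [feature_map[r + i][c + j]
                      for i in range(kernel) for j in range(kernel)]
            pooled_row.append(reduce_fn(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


# Illustrative values, not taken from the slide.
feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

max_pooled = pool_2d(feature_map, max)
avg_pooled = pool_2d(feature_map, mean)
sum_pooled = pool_2d(feature_map, sum)

print(max_pooled)  # [[6, 8], [14, 16]]
print(avg_pooled)  # [[3.5, 5.5], [11.5, 13.5]]
print(sum_pooled)  # [[14, 22], [46, 54]]
```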

50
00:03:05,340 --> 00:03:09,930
So how does max pooling achieve translation invariance?

51
00:03:10,950 --> 00:03:15,480
Let's take a look at this slide. You can see this is a letter C in this image here.

52
00:03:15,990 --> 00:03:21,570
We can convolve it with this kernel here, with this filter, and we get the feature map output here with

53
00:03:21,570 --> 00:03:23,760
this blue portion being represented here.

54
00:03:24,240 --> 00:03:28,680
And then after applying max pool, we get this blue being the largest portion here.

55
00:03:29,250 --> 00:03:33,150
So this is just a hypothetical output, by the way, because we don't have the values in this here.

56
00:03:33,630 --> 00:03:38,400
But just imagine that this was the output of the feature map and this was the output of the max pool

57
00:03:38,490 --> 00:03:39,060
operation.

58
00:03:39,750 --> 00:03:45,210
Now what if we shift our C downward? By shifting our C downward,

59
00:03:45,600 --> 00:03:48,990
we can now see that the feature map actually has a different output.

60
00:03:49,440 --> 00:03:51,870
This pixel turned on as opposed to this one.

61
00:03:52,440 --> 00:03:58,650
So by applying the max pool, though, because it's still in this two by two grid here, the max pool

62
00:03:58,650 --> 00:03:59,640
output is still the same.

63
00:04:00,210 --> 00:04:04,900
So this is how we achieve some translation invariance using the max pool.
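
The shifted C can be imitated with a toy feature map. This is a hypothetical sketch: a single active value stands in for the pixel that "turned on", and the values and names are assumptions, not the slide's data.

```python
def max_pool_2d(feature_map, kernel=2, stride=2):
    """Keep the maximum of each kernel x kernel window of a 2D list."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r + i][c + j]
             for i in range(kernel) for j in range(kernel))
         for c in range(0, cols - kernel + 1, stride)]
        for r in range(0, rows - kernel + 1, stride)
    ]


def single_activation(row, col, size=4, value=9):
    """Build a size x size map of zeros with one active cell (toy stand-in)."""
    grid = [[0] * size for _ in range(size)]
    grid[row][col] = value
    return grid


original = single_activation(0, 0)
shifted_by_one = single_activation(1, 0)  # moved down, same 2x2 window
shifted_by_two = single_activation(2, 0)  # moved down into the next window

print(max_pool_2d(original))        # [[9, 0], [0, 0]]
print(max_pool_2d(shifted_by_one))  # [[9, 0], [0, 0]]
print(max_pool_2d(shifted_by_two))  # [[0, 0], [9, 0]]
```

The one-pixel shift leaves the pooled output unchanged because the activation stays inside the same two by two window. A larger shift crosses a window boundary and changes the output, which is why the invariance is only partial.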

64
00:04:06,450 --> 00:04:08,460
So why does max pooling work?

65
00:04:08,940 --> 00:04:14,940
Well, as we saw, the purpose of using max pooling is to reduce the feature map size by half in most

66
00:04:14,940 --> 00:04:19,920
cases, and when I say half, that's when we're using the two by two kernel and a stride of two.

67
00:04:20,580 --> 00:04:21,800
So is that OK?

68
00:04:22,140 --> 00:04:23,640
Well, I showed you before that

69
00:04:24,090 --> 00:04:28,410
it allows us to have translation invariance without losing much information, which is a good thing.

70
00:04:29,220 --> 00:04:30,480
So why is that?

71
00:04:30,510 --> 00:04:36,030
Well, that's because neighboring pixels are usually strongly correlated, in the lowest layers especially.

72
00:04:36,540 --> 00:04:42,600
That strong correlation allows us to use a max pool operation to reduce feature size without losing much

73
00:04:42,600 --> 00:04:43,140
information.

74
00:04:43,530 --> 00:04:46,680
Remember, the further apart two pixels are, the less correlated

75
00:04:46,810 --> 00:04:49,170
they are, and the closer they are, the more correlated

76
00:04:49,170 --> 00:04:54,540
they are as well, and they tend to have a lot of similar associations because that's just how images

77
00:04:54,540 --> 00:04:58,860
are: one pixel and the pixels immediately near it might be the same color.

78
00:04:59,130 --> 00:04:59,550
So that's

79
00:04:59,760 --> 00:05:04,860
a good thing, which means a max pool operation doesn't inherently lose much information when

80
00:05:04,860 --> 00:05:06,480
applying it to that image.

81
00:05:07,530 --> 00:05:08,820
So that's what this line says.

82
00:05:08,820 --> 00:05:15,930
Here, we can reduce the size of the output by subsampling, or pooling, and we don't actually lose

83
00:05:16,290 --> 00:05:17,220
much information.

84
00:05:17,880 --> 00:05:23,520
However, just remember, using a bigger stride in pooling layers will lead to some

85
00:05:23,520 --> 00:05:24,250
information loss.

86
00:05:24,270 --> 00:05:27,090
You can't do this without having a cost at the end.

87
00:05:27,690 --> 00:05:34,080
So in practice, researchers have found that a stride of two and a kernel size of two by two usually

88
00:05:34,080 --> 00:05:39,900
work quite well without losing much information, and allow the CNN to still learn a lot of information.

89
00:05:40,720 --> 00:05:48,780
Now we'll move on to probably the second to last layer in the neural network, in our CNN, which is the fully

90
00:05:48,780 --> 00:05:49,440
connected layer.

91
00:05:49,890 --> 00:05:51,660
So I'll see you in the next lesson.

92
00:05:51,870 --> 00:05:52,290
Thank you.
