﻿1
00:00:01,090 --> 00:00:07,800
‫The last foundational concept we need to understand before we start building our CNN model in the software,

2
00:00:07,840 --> 00:00:10,240
‫is that of a pooling layer.

3
00:00:11,720 --> 00:00:17,640
‫And since you know how convolutional layers work, the pooling layer is going to be easy to understand.

4
00:00:19,710 --> 00:00:27,420
‫We use a pooling layer in our network to reduce the computational load, memory usage and the number of

5
00:00:27,420 --> 00:00:29,240
‫parameters to be estimated.

6
00:00:31,420 --> 00:00:39,370
‫Just like in a convolutional layer, each neuron in a pooling layer also has a small, rectangular receptive

7
00:00:39,370 --> 00:00:39,760
‫field.

8
00:00:40,870 --> 00:00:44,710
‫We have to define the size of this rectangular receptive field,

9
00:00:45,840 --> 00:00:49,200
‫the stride, and the padding type, just like before.

10
00:00:50,610 --> 00:00:54,590
‫However, pooling neurons have no weights.

11
00:00:56,430 --> 00:01:04,800
‫All they do is aggregate the input using an aggregation function such as max or mean. In this image,

12
00:01:06,150 --> 00:01:09,090
‫the layer on top is a pooling layer.

13
00:01:10,640 --> 00:01:16,760
‫You can see that each neuron is looking at a two by two set of neurons on the lower layer.

14
00:01:18,470 --> 00:01:23,640
‫The first neuron is looking at these four cells which have a red boundary.

15
00:01:24,980 --> 00:01:29,280
‫The next is looking at these four, which have a dotted blue boundary.

16
00:01:30,740 --> 00:01:32,630
‫This means that the stride here is two.

17
00:01:34,280 --> 00:01:41,180
‫By default, the stride in a pooling layer is the same as the width of the receptive field.

18
00:01:45,190 --> 00:01:50,560
‫Now, if we use the max function, or max pooling as it is called,

19
00:01:52,140 --> 00:01:58,500
‫only the maximum input value out of the four values in this receptive field makes it to the next

20
00:01:58,500 --> 00:01:58,730
‫layer.

21
00:01:59,490 --> 00:02:01,350
‫The other three inputs are dropped.

22
00:02:02,370 --> 00:02:11,830
‫For example, if these are the four outputs of these four cells, one, five, three, two, then out of

23
00:02:11,830 --> 00:02:14,930
‫these four, five is the largest value.

24
00:02:16,140 --> 00:02:22,110
‫So this neuron in the top layer will have five as output.

25
00:02:25,460 --> 00:02:26,570
‫So it is very simple.

26
00:02:26,870 --> 00:02:29,110
‫No weights, no filters to be trained.

27
00:02:29,780 --> 00:02:36,110
‫It just finds the maximum value out of the four values that it sees and outputs that.
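
The max pooling step described above can be sketched in a few lines of plain Python. This is only an illustration, not the code used later in the course: the function name `max_pool` and its parameters are made up for this example, and it assumes a 2-D list as the feature map with no padding.

```python
def max_pool(feature_map, size=2, stride=None):
    """Max pooling over a 2-D list of numbers (hypothetical helper, no padding)."""
    if stride is None:
        stride = size  # by default the stride equals the window width
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the values inside the receptive field of this pooling neuron.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            # No weights are involved: the neuron simply outputs the largest value.
            out_row.append(max(window))
        pooled.append(out_row)
    return pooled


# The four cell outputs from the example: one, five, three, two.
cells = [[1, 5],
         [3, 2]]
result = max_pool(cells)
print(result)  # [[5]]
```

Nothing in this function is learned during training, which is why a pooling layer adds no parameters to the model.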

28
00:02:39,040 --> 00:02:41,710
‫If you look at the GIF at the bottom,

29
00:02:43,730 --> 00:02:47,750
‫this is the feature map that our max pooling layer is looking at.

30
00:02:49,130 --> 00:02:51,290
‫For the first square of four

31
00:02:52,640 --> 00:02:53,170
‫neurons,

32
00:02:53,810 --> 00:02:55,430
‫the largest value is six.

33
00:02:56,270 --> 00:03:01,660
‫So we enter six here in the first cell of the max pooling layer.

34
00:03:03,260 --> 00:03:08,030
‫Similar to max pooling, there is average pooling. In average pooling,

35
00:03:08,390 --> 00:03:10,970
‫we find the mean of the four values.

36
00:03:11,660 --> 00:03:18,770
‫So if we are doing average pooling, it will be the average of these four values, six, six, four

37
00:03:18,860 --> 00:03:19,610
‫and five.

38
00:03:20,510 --> 00:03:22,630
‫So that average is five point two five.
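
As a quick check of the arithmetic above, here is a small sketch that applies both aggregation functions to the same four values. The variable names are chosen only for this illustration.

```python
from statistics import mean

# The four values seen by the first pooling neuron in the example.
window = [6, 6, 4, 5]

max_value = max(window)    # what max pooling would output
mean_value = mean(window)  # what average pooling would output: (6 + 6 + 4 + 5) / 4

print(max_value, mean_value)  # 6 5.25
```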

39
00:03:24,040 --> 00:03:29,980
‫In the next stride, we look at the next four cells and find their max and average values.

40
00:03:30,970 --> 00:03:32,470
‫Those are stored in the next neuron.

41
00:03:34,610 --> 00:03:38,600
‫Now, also notice that since we are using a stride of 2 here,

42
00:03:40,170 --> 00:03:45,210
‫the pooling layer has half the width and half the height of the previous layer.
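
The halving can be checked with the usual output-size formula for a window that slides without padding. The sketch below assumes no padding and uses a hypothetical helper name, `pooled_size`. The 4 by 4 input is only an assumed example size.

```python
def pooled_size(n, window=2, stride=2):
    """Number of window positions along one dimension, assuming no padding."""
    return (n - window) // stride + 1


# Assumed example: a 4 by 4 feature map pooled with a 2 by 2 window and stride 2.
out_height = pooled_size(4)
out_width = pooled_size(4)
print(out_height, out_width)  # 2 2
```

With a 2 by 2 window and a stride of 2, each dimension of an even-sized input is cut in half, so the pooled feature map has one quarter as many values.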

43
00:03:47,840 --> 00:03:52,180
‫You can now imagine how this will reduce the computations and memory usage.

44
00:03:54,050 --> 00:03:55,610
‫Instead of a pooling layer,

45
00:03:56,700 --> 00:03:59,580
‫if we had the next convolutional layer straight away,

46
00:04:00,690 --> 00:04:07,210
‫that layer would have six by eight inputs: six as height and eight as width.

47
00:04:07,680 --> 00:04:08,780
‫So six times eight,

48
00:04:09,300 --> 00:04:11,700
‫forty-eight input neurons.

49
00:04:13,060 --> 00:04:18,700
‫So each neuron in the next layer that connects to all of them would have forty-eight parameters to be trained.

50
00:04:20,600 --> 00:04:23,540
‫But if we have this pooling layer on top,

51
00:04:25,650 --> 00:04:29,930
‫then each neuron gets only three times four,

52
00:04:30,150 --> 00:04:32,380
‫that is, 12 input neurons.

53
00:04:33,300 --> 00:04:36,770
‫So only 12 parameters per neuron are to be trained.

54
00:04:37,890 --> 00:04:40,590
‫So instead of 48, we get 12 parameters

55
00:04:40,890 --> 00:04:46,110
‫to be trained. So the amount of computation goes down significantly.
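
The 48 versus 12 comparison can be reproduced with a short sketch. It assumes, as the example implicitly does, that a neuron in the next layer has one weight for every input it sees and that the bias term is not counted. The helper name is hypothetical.

```python
def weights_per_neuron(height, width):
    """One weight per input value; the bias is left out (assumption for this sketch)."""
    return height * width


# Without pooling: the next layer sees the full 6 by 8 feature map.
without_pooling = weights_per_neuron(6, 8)

# With 2 by 2 pooling and stride 2: height and width are both halved to 3 by 4.
with_pooling = weights_per_neuron(6 // 2, 8 // 2)

reduction = without_pooling / with_pooling
print(without_pooling, with_pooling, reduction)  # 48 12 4.0
```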

56
00:04:48,030 --> 00:04:50,880
‫So in this example, we saw that we can do both

57
00:04:50,970 --> 00:04:58,380
‫max pooling and mean pooling, but commonly max pooling works better than the alternative options

58
00:04:58,650 --> 00:05:04,080
‫because it highlights the main features instead of averaging them out.

59
00:05:06,160 --> 00:05:11,020
‫So in our model, most often we'll be using max pooling only.

60
00:05:12,590 --> 00:05:13,100
‫That's it.

61
00:05:13,400 --> 00:05:15,260
‫This is the concept behind Max pooling.

62
00:05:16,370 --> 00:05:17,060
‫It is a trade-off:

63
00:05:17,810 --> 00:05:25,250
‫we give up some information from the previous layer to reduce the computational load on our system.

64
00:05:27,200 --> 00:05:30,580
‫I will highlight this impact on computation when we write the code.

