1
00:00:00,810 --> 00:00:01,100
Hey.

2
00:00:01,260 --> 00:00:09,060
So now let's take a look at padding, which allows us to manipulate the feature map size as we progress

3
00:00:09,060 --> 00:00:12,420
through the convolutional neural network, which you'll see shortly.

4
00:00:13,260 --> 00:00:15,120
So remember this lesson?

5
00:00:15,630 --> 00:00:22,170
Remember when we had an image of size five by five, and then we performed a convolution

6
00:00:22,170 --> 00:00:23,670
with a three by three filter.

7
00:00:24,180 --> 00:00:26,070
It gives us a three by three output.

8
00:00:26,070 --> 00:00:31,260
Remember, there was a formula we can calculate the output size by, and notice the important thing

9
00:00:31,590 --> 00:00:34,830
that the output produced is smaller than the input.

10
00:00:34,890 --> 00:00:36,300
Remember, this is five by five.

11
00:00:36,690 --> 00:00:38,100
And this is three by three.

12
00:00:38,700 --> 00:00:40,740
Is that something that we want?

13
00:00:40,740 --> 00:00:41,580
Is that desirable?

14
00:00:42,090 --> 00:00:46,770
Well, think about this: in almost all convolutional neural networks,

15
00:00:47,220 --> 00:00:56,370
there's a sequence of convolutions, like in a chain, and you can imagine this chain or sequence of

16
00:00:56,550 --> 00:00:57,960
convolution blocks.

17
00:00:58,560 --> 00:01:05,400
They will actually decrease the image size, or feature map size, as it progresses through the network.

18
00:01:05,850 --> 00:01:08,050
And that's not always desirable.

19
00:01:08,070 --> 00:01:10,410
In fact, in most cases, that's not desirable.

20
00:01:11,640 --> 00:01:14,700
These are the blocks, the convolution blocks, in this example here.

21
00:01:16,120 --> 00:01:22,090
So we can see that consecutive convolutions keep shrinking the output.
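To make the shrinking concrete, here is a minimal sketch that tracks the feature map size through a chain of convolutions with no padding and a stride of one. The function name `sizes_through_layers` and the 32 by 32 input with five layers are assumptions chosen for illustration, not values from the lecture slides.

```python
def sizes_through_layers(input_size, filter_size, num_layers):
    """Return the feature map width after each unpadded, stride-1 convolution."""
    sizes = [input_size]
    for _ in range(num_layers):
        # Each unpadded convolution removes (filter_size - 1) pixels per dimension.
        sizes.append(sizes[-1] - filter_size + 1)
    return sizes


# Hypothetical example: a 32x32 input passed through five 3x3 convolutions.
shrinking = sizes_through_layers(32, 3, 5)
print(shrinking)
```

Each three by three layer trims two pixels from the width and height, so a deep enough chain would eventually leave nothing to convolve over.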

22
00:01:22,570 --> 00:01:25,300
But is there a way we can preserve our image size?

23
00:01:25,780 --> 00:01:33,640
What if we took the image, this five by five image, but we padded it with zeros in all dimensions, so

24
00:01:33,640 --> 00:01:37,600
you can see there's a zero on top, at the right, at the bottom and on the left.

25
00:01:38,110 --> 00:01:40,390
This now gives us a seven by seven matrix.

26
00:01:40,750 --> 00:01:41,380
Interesting.

27
00:01:42,010 --> 00:01:47,530
So what if we performed the convolution on this? And you can see,

28
00:01:47,740 --> 00:01:50,950
Keep going, keep going, keep going and move along.

29
00:01:51,940 --> 00:01:55,870
We are going to produce a feature map size that is now

30
00:01:55,880 --> 00:01:57,250
five by five.

31
00:01:58,040 --> 00:01:59,860
Now what else was five by five?

32
00:02:00,820 --> 00:02:02,800
Our input image was five by five.

33
00:02:03,130 --> 00:02:03,850
You can see it here.

34
00:02:03,850 --> 00:02:05,410
That's the pink region here.

35
00:02:06,040 --> 00:02:08,320
And you can see with the formula it works out to be

36
00:02:08,590 --> 00:02:09,160
Seven.

37
00:02:09,160 --> 00:02:09,910
Minus three.

38
00:02:09,970 --> 00:02:13,840
The feature map, sorry, the filter size, plus one, which is five.
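The arithmetic above can be checked with a small sketch in plain Python. It assumes a stride of one and a square image and filter; the helper names `conv_output_size`, `zero_pad` and `convolve2d`, and the all-ones image and filter, are made up for illustration and are not from the lecture.

```python
def conv_output_size(input_size, filter_size, padding=0):
    """Output width of a stride-1 convolution: (n + 2p) - f + 1."""
    return input_size + 2 * padding - filter_size + 1


def zero_pad(image, padding):
    """Surround a 2D list with `padding` rows and columns of zeros."""
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    for row in image:
        padded.append([0] * padding + list(row) + [0] * padding)
    padded.extend([0] * width for _ in range(padding))
    return padded


def convolve2d(image, kernel):
    """Slide a square kernel over the image with stride 1 and no padding."""
    f = len(kernel)
    out_h = len(image) - f + 1
    out_w = len(image[0]) - f + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(f) for b in range(f))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 5x5 image and 3x3 filter, both filled with ones.
image = [[1] * 5 for _ in range(5)]
kernel = [[1] * 3 for _ in range(3)]

unpadded = convolve2d(image, kernel)             # 5 - 3 + 1 = 3
padded = convolve2d(zero_pad(image, 1), kernel)  # 7 - 3 + 1 = 5

print(len(unpadded), len(unpadded[0]))
print(len(padded), len(padded[0]))
```

With one ring of zeros the padded input is seven by seven, so the output comes back to the original five by five.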

39
00:02:14,410 --> 00:02:20,620
So this is a way we can use something called padding, which is simply just padding the input image

40
00:02:20,620 --> 00:02:24,010
with zeros around it to preserve the image size.

41
00:02:24,610 --> 00:02:26,290
So why do we use padding?

42
00:02:26,300 --> 00:02:31,450
I mean, as I said, it's not always desirable to have the image shrink, but why is that so?

43
00:02:31,840 --> 00:02:39,380
Well, for very deep networks and when we say very deep networks, deep networks store a lot of information

44
00:02:39,380 --> 00:02:46,480
you can imagine, like how many filters and parameters and activations it all stores.

45
00:02:46,900 --> 00:02:54,730
So if you want to keep a very deep network for a very complex dataset, you want to basically

46
00:02:54,730 --> 00:03:00,370
use padding to preserve the image size or the feature map size as it progresses through the network.

47
00:03:01,120 --> 00:03:04,960
And now there's also a second reason why we use padding.

48
00:03:05,830 --> 00:03:09,190
Remember how we passed the convolution filter across the image?

49
00:03:09,310 --> 00:03:10,640
We move it from left to right.

50
00:03:11,260 --> 00:03:15,040
Well, think about something: the pixels at the edges.

51
00:03:15,340 --> 00:03:21,370
They only get touched once, or at least fewer times than the pixels in the middle.

52
00:03:22,000 --> 00:03:27,640
That means they contribute less to the output feature maps, and that means we're throwing away

53
00:03:27,640 --> 00:03:28,720
information from them.

54
00:03:29,200 --> 00:03:30,310
And that's not desirable.

55
00:03:30,670 --> 00:03:37,720
However, with padding, because you can imagine this image being padded with zeros all around, the

56
00:03:37,720 --> 00:03:38,410
sliding window

57
00:03:38,410 --> 00:03:45,670
now touches these edge pixels a lot more often, so it helps the neural network learn more balanced

58
00:03:45,700 --> 00:03:46,930
information as well.
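As a rough check on the edge-pixel argument, this sketch counts how many sliding-window positions cover each original pixel, with and without one ring of zero padding. It assumes a stride of one and the five by five image with a three by three filter from earlier; the function name `coverage_counts` is hypothetical.

```python
def coverage_counts(size, filter_size, padding=0):
    """Count how many stride-1 window positions cover each original pixel."""
    padded_size = size + 2 * padding
    counts = [[0] * padded_size for _ in range(padded_size)]
    positions = padded_size - filter_size + 1
    for i in range(positions):
        for j in range(positions):
            # Mark every cell under the window at top-left corner (i, j).
            for a in range(filter_size):
                for b in range(filter_size):
                    counts[i + a][j + b] += 1
    # Drop the padding ring so only the original pixels remain.
    end = padded_size - padding
    return [row[padding:end] for row in counts[padding:end]]


no_padding = coverage_counts(5, 3)
with_padding = coverage_counts(5, 3, padding=1)

print("corner:", no_padding[0][0], "->", with_padding[0][0])
print("center:", no_padding[2][2], "->", with_padding[2][2])
```

Under these assumptions the corner pixel goes from one window to four, while the center pixel stays at nine, so the edges contribute more evenly once padding is added.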

59
00:03:48,010 --> 00:03:50,470
So that concludes this chapter on padding.

60
00:03:50,980 --> 00:03:54,100
Now let's move on to stride, which is a very simple concept.

61
00:03:54,340 --> 00:03:55,870
So I'll see you in the next section.

62
00:03:56,020 --> 00:03:56,440
Thank you.