1
00:00:00,060 --> 00:00:05,250
Hi, and welcome to the chapter where we finally put all the pieces together and build a CNN.

2
00:00:05,640 --> 00:00:10,830
So you have previously seen all of the building blocks that we use: remember the conv layers, the

3
00:00:10,830 --> 00:00:15,900
max pool, the ReLU, the fully connected layer, and the softmax layers.

4
00:00:16,290 --> 00:00:20,100
We're now going to stack everything together and create a simple CNN.

5
00:00:20,580 --> 00:00:21,810
And this is what it looks like.

6
00:00:22,110 --> 00:00:26,670
Now, I won't go into too much detail with this diagram because this diagram can confuse you a bit,

7
00:00:27,180 --> 00:00:29,520
but it's an actual, real CNN here.

8
00:00:30,060 --> 00:00:35,670
What it tells us here is that this is the input image, which is 28 by 28 grayscale because it only has

9
00:00:35,670 --> 00:00:39,960
a color depth of one, and you can see the tiny little filter here.

10
00:00:40,230 --> 00:00:45,330
This is our filter kernel here, and it's applied to this image.

11
00:00:45,330 --> 00:00:49,080
Here we're using 32 filters in this example.

12
00:00:49,440 --> 00:00:55,500
And so we have 32 of these feature map outputs here, and the feature map output size is 26 by 26.

13
00:00:56,130 --> 00:00:58,650
And then we have another convolutional layer here.

14
00:00:59,130 --> 00:01:03,180
So we apply the filters again to this layer and we get a 24 by 24.

15
00:01:03,180 --> 00:01:08,520
And this time we're using 64 filters to produce 64 feature maps.

16
00:01:09,030 --> 00:01:15,720
And then we apply a max pool layer here, and then we flatten that max pool output and connect it to an interim fully

17
00:01:15,720 --> 00:01:16,350
connected layer.

18
00:01:16,350 --> 00:01:23,670
So we have this 1 by 9,216 layer connected to a 1 by 128 fully connected layer.

19
00:01:24,180 --> 00:01:27,060
And then that connects to the final 10 output nodes.

20
00:01:27,450 --> 00:01:30,510
And then on the output of those nodes, we'll have the softmax layer.

21
00:01:31,170 --> 00:01:32,240
So let's take a look at this.

22
00:01:32,280 --> 00:01:34,620
Now this is a four-layer CNN.

23
00:01:35,370 --> 00:01:42,330
So a lot of times people may get confused, counting ReLU as a layer, or counting the flatten operation

24
00:01:42,330 --> 00:01:44,810
as a layer, or counting max pool as a layer.

25
00:01:44,820 --> 00:01:52,410
Technically those are layers, yes, but in our new modern terminology, those aren't really considered

26
00:01:52,410 --> 00:01:52,680
layers.

27
00:01:52,680 --> 00:01:56,190
They're considered parts of the convolutional layers.

28
00:01:56,310 --> 00:02:00,600
So the conv layer is associated with our activation function.

29
00:02:00,840 --> 00:02:01,980
So it comes as one.

30
00:02:02,340 --> 00:02:05,210
So everything in this box here is a layer.

31
00:02:05,220 --> 00:02:09,690
So we have one, two, three, four layers in the CNN.

32
00:02:10,350 --> 00:02:14,460
So it's a four-layer deep CNN that's used to classify handwritten digits.

33
00:02:15,300 --> 00:02:20,820
So here's another representation, and this is a representation I prefer because it helps you visually

34
00:02:20,820 --> 00:02:22,980
understand what's happening in the CNN.

35
00:02:23,160 --> 00:02:26,010
So you can see this is the input image right here.

36
00:02:26,190 --> 00:02:30,510
These are the filters, the three by three filters applied, and we apply 32 filters.

37
00:02:30,510 --> 00:02:35,160
So we get a 26 by 26 by 32 feature map output.

38
00:02:35,640 --> 00:02:40,020
And then the second conv layer has a three by three filter again, and we apply it here.

39
00:02:40,380 --> 00:02:42,600
And then you can see there's a two by two filter here.

40
00:02:42,870 --> 00:02:44,970
That's the max pool filter size.

41
00:02:44,970 --> 00:02:47,260
So that's how we get 12 by 12 by 64.

42
00:02:47,280 --> 00:02:48,150
It's half that.

43
00:02:48,660 --> 00:02:55,170
Remember, this feature map has a size of 24 by 24 by 64, and by applying max pooling, we get an output size of

44
00:02:55,170 --> 00:02:57,800
12 by 12 by 64.

45
00:02:58,470 --> 00:03:05,190
And then we flatten this max pool layer, and then we have this interim fully connected 128-node layer

46
00:03:05,190 --> 00:03:05,460
here.

47
00:03:05,940 --> 00:03:08,340
Then that's connected to the final 10 outputs.

48
00:03:08,880 --> 00:03:14,790
So you can see in this table here, this is the depth, the width, the height and the filter size of

49
00:03:14,790 --> 00:03:15,390
each layer.

50
00:03:15,720 --> 00:03:18,960
Just in case you want to go through it in detail and understand.

51
00:03:19,050 --> 00:03:24,240
So feel free to pause this slide and inspect each one to make sure it makes sense to you.

52
00:03:26,250 --> 00:03:31,740
So let's take a look at how we calculate the output size of the first convolutional layer.

53
00:03:32,340 --> 00:03:34,020
So that's the one in pink right here.

54
00:03:34,500 --> 00:03:39,420
So you can see we had a 28 by 28 image with a three by three filter.

55
00:03:39,990 --> 00:03:41,940
So now we can use this formula here.

56
00:03:42,330 --> 00:03:46,050
This one, which gives us the feature map output size.

57
00:03:46,650 --> 00:03:48,870
So we know we use 32 filters.

58
00:03:49,440 --> 00:03:51,030
We use a stride of one.

59
00:03:51,150 --> 00:03:52,270
Padding is zero.

60
00:03:52,290 --> 00:03:53,100
So it's not used.

61
00:03:53,520 --> 00:03:55,030
And the max pool stride is two.

62
00:03:55,050 --> 00:03:57,330
That's for later, when we consider max pooling.

63
00:03:57,840 --> 00:03:59,280
For now, we just use this.

64
00:03:59,610 --> 00:04:01,320
So N is 28.

65
00:04:01,530 --> 00:04:02,250
You can see it here.

66
00:04:03,120 --> 00:04:09,070
P, which is padding, is zero, so it's two times zero, which is zero, minus three.

67
00:04:09,170 --> 00:04:13,290
Three is the filter size, so that's how we get 26.

68
00:04:13,290 --> 00:04:18,000
In the end, it's 28 minus three, which is 25 over one.

69
00:04:18,120 --> 00:04:20,850
So 25 plus one, which is 26.
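
[Editor's note] The calculation above can be sketched in a few lines of Python. This is only an illustration of the formula (N + 2P - F) / S + 1 as read out in the lecture; the function name `output_size` and the use of floor division for sizes that do not divide evenly are assumptions, not something shown on the slide.

```python
def output_size(n, f, p=0, s=1):
    """Spatial output size of a conv or pooling layer: (n + 2p - f) / s + 1.

    n: input width/height, f: filter size, p: padding, s: stride.
    Floor division is an assumption for cases that do not divide evenly.
    """
    return (n + 2 * p - f) // s + 1


# First convolutional layer: 28x28 input, 3x3 filter, padding 0, stride 1.
conv1 = output_size(28, 3, p=0, s=1)
print(conv1)  # 26
```

The 32 filters do not enter this formula; they only set the depth of the output, giving 26 by 26 by 32.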

70
00:04:21,420 --> 00:04:29,010
So that's how we get the 26 by 26 output size here. In the olden days of constructing CNNs and

71
00:04:29,010 --> 00:04:29,760
neural networks.

72
00:04:30,090 --> 00:04:35,430
We would have to calculate these things. Nowadays the libraries do help us quite a bit, but it's important

73
00:04:35,430 --> 00:04:39,630
to understand the size of the layers as they go through.

74
00:04:40,170 --> 00:04:43,960
This is something you may need to do a lot when building your own CNN.

75
00:04:44,820 --> 00:04:48,600
So let's take a look at the second convolutional layer, conv 2.

76
00:04:49,320 --> 00:04:51,020
So again, let's just try this.

77
00:04:51,030 --> 00:04:54,750
So the feature map size now was twenty six by twenty six.

78
00:04:55,170 --> 00:05:00,480
So we use 26 here. Padding is again zero in this case, and the filter size is three.

79
00:05:00,720 --> 00:05:06,660
So you can see in the formula over here we have 26 plus 2P, which is zero, first of all, minus

80
00:05:06,660 --> 00:05:10,440
three, so that gives us 23.

81
00:05:10,440 --> 00:05:13,630
23 divided by one is 23, plus one is 24.

82
00:05:13,920 --> 00:05:17,880
So that's how we get 24 by 24 as the output size here.

83
00:05:18,180 --> 00:05:23,430
And remember, the third dimension, which is the depth of these feature maps, depends on how many filters

84
00:05:23,430 --> 00:05:26,730
are being used, which we specify when designing our CNNs.

85
00:05:27,180 --> 00:05:33,030
So we do hard-code how many filters we want to use, and then the CNN libraries take it from there and

86
00:05:33,030 --> 00:05:33,750
start training.

87
00:05:34,080 --> 00:05:36,960
And we'll go into training extensively in the next few sections.

88
00:05:37,320 --> 00:05:42,960
But for now, it's important you understand all the building blocks of a CNN, that's the forward propagation

89
00:05:42,960 --> 00:05:44,130
part of a CNN.

90
00:05:45,390 --> 00:05:47,970
So let's take a look at the output size calculation for max pool.

91
00:05:48,690 --> 00:05:51,150
This one is simple, and the formula still holds up as well.

92
00:05:51,570 --> 00:05:59,430
We have N, which is 24, plus 2P, where P is zero again, minus F, which is two, divided by two.

93
00:05:59,550 --> 00:06:09,030
So we have 24 minus two, which is 22, divided by two, which is 11, and 11 plus one is 12.

94
00:06:09,420 --> 00:06:11,250
And that's how we get to 12 by 12.

95
00:06:13,130 --> 00:06:15,290
Oops, that slide went a bit too fast.

96
00:06:15,740 --> 00:06:21,680
So now let's take a look at how we calculate the size of the flatten layer. So you can see the

97
00:06:21,680 --> 00:06:23,090
flatten layer is right here.

98
00:06:23,090 --> 00:06:24,890
It's nine thousand two hundred sixteen.

99
00:06:25,370 --> 00:06:26,450
That's actually quite simple.

100
00:06:26,810 --> 00:06:32,360
You just multiply all the dimensions of the max pool layer here, 12 by 12 by 64, which is effectively

101
00:06:32,360 --> 00:06:33,410
flattening that layer.
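
[Editor's note] As a rough check of the numbers on the slides, the sketch below traces the shape through conv 1, conv 2, max pool and flatten in plain Python. The helper `output_size` and the layer names in the list are illustrative assumptions; the filter sizes, strides and filter counts are the ones stated in the lecture.

```python
def output_size(n, f, p=0, s=1):
    """Spatial output size: (n + 2p - f) / s + 1 (floor division assumed)."""
    return (n + 2 * p - f) // s + 1


# (name, filter size, stride, number of filters or None for pooling)
layers = [
    ("conv1", 3, 1, 32),
    ("conv2", 3, 1, 64),
    ("max_pool", 2, 2, None),  # pooling keeps the depth unchanged
]

size, depth = 28, 1  # 28x28 grayscale input
shapes = [("input", size, size, depth)]
for name, f, s, filters in layers:
    size = output_size(size, f, p=0, s=s)
    if filters is not None:
        depth = filters
    shapes.append((name, size, size, depth))

# Flattening multiplies all dimensions of the last feature map together.
flat = size * size * depth

for shape in shapes:
    print(shape)
print("flatten:", flat)
```

Under these assumptions the trace ends at 12 by 12 by 64, and the flatten size comes out to 9,216.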

102
00:06:33,950 --> 00:06:35,120
So that's how we got that there.

103
00:06:35,180 --> 00:06:36,290
It's quite simple to work out.

104
00:06:37,220 --> 00:06:38,600
So the rest of our CNN.

105
00:06:38,930 --> 00:06:43,040
This is another arbitrary choice, so we hard-code 128 nodes here.

106
00:06:43,760 --> 00:06:46,040
This is something we can specify, and you can use any number.

107
00:06:46,430 --> 00:06:52,760
I tend to use powers of two for my sizes, but you can use anything you want.

108
00:06:53,390 --> 00:06:54,750
You can use 100 or 1,000.

109
00:06:54,800 --> 00:07:02,810
I've seen a lot of people use a thousand, decimal-based numbers, so it's fine. And the output of the

110
00:07:02,810 --> 00:07:04,260
classes here, the 10 outputs.

111
00:07:04,340 --> 00:07:06,920
Each of the nodes represents a class.

112
00:07:07,370 --> 00:07:11,660
So remember, in handwritten digits, we have 10 digits, zero to nine.

113
00:07:12,320 --> 00:07:19,760
So each node represents the probability of the image being that class.
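
[Editor's note] To make the idea of one probability per class concrete, here is a minimal softmax sketch in plain Python. The ten score values are made up purely for illustration and are not taken from the lecture; subtracting the maximum before exponentiating is a common numerical-stability step and is an addition here.

```python
import math


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores from the 10 output nodes (digits 0 to 9).
scores = [0.1, 2.0, -1.0, 0.5, 0.0, 3.2, -0.7, 1.1, 0.3, -2.0]
probs = softmax(scores)

# The predicted digit is the index of the largest probability.
predicted_digit = probs.index(max(probs))
print(predicted_digit)
```

With these made-up scores, the largest value sits at index 5, so that is the class the sketch would report.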

114
00:07:21,110 --> 00:07:23,750
So that's it for the building blocks of the CNN.

115
00:07:24,230 --> 00:07:30,050
I know it probably still hasn't fit together fully for you yet, but it will shortly. So

116
00:07:30,050 --> 00:07:34,040
keep watching the slides, and go back to your lecture notes if you want.

117
00:07:34,310 --> 00:07:36,680
I have provided them for you to go over.

118
00:07:36,950 --> 00:07:39,010
And don't worry, it will all make sense.

119
00:07:39,080 --> 00:07:43,370
So in the next section, we will take a look at parameter counts in CNNs.

120
00:07:43,670 --> 00:07:44,780
So stay tuned for that.

121
00:07:44,930 --> 00:07:45,380
Thank you.
