1
00:00:00,060 --> 00:00:05,610
So now let's take a look at building a model. To do this, I'll structure the code.

2
00:00:06,120 --> 00:00:07,950
I'll explain everything properly.

3
00:00:07,980 --> 00:00:09,300
so that you guys fully understand.

4
00:00:09,810 --> 00:00:16,110
But what I would like you to see is that when we're building a model, remember, with a CNN you have

5
00:00:16,110 --> 00:00:17,040
different building blocks.

6
00:00:17,040 --> 00:00:22,350
You have your conv layers, you have your ReLU, you have your max pools, you have a flatten,

7
00:00:22,350 --> 00:00:25,260
fully connected layers, and a dense output layer with as many outputs as classes.

8
00:00:25,260 --> 00:00:32,640
And all of those things are handled separately in PyTorch, which allows a lot of low-level control over

9
00:00:32,640 --> 00:00:33,180
these things.

10
00:00:33,570 --> 00:00:38,550
So I will give a warning for beginners to PyTorch.

11
00:00:38,760 --> 00:00:43,410
The code for this could seem a bit overwhelming compared to the code for Keras and TensorFlow, which

12
00:00:43,410 --> 00:00:44,580
you'll see in the next section.

13
00:00:45,180 --> 00:00:49,800
However, what I would like to say is that it's a good thing that the code is a bit complicated.

14
00:00:50,250 --> 00:00:56,490
It actually allows you to have a deeper understanding of CNNs and deep learning, and it allows you to understand

15
00:00:57,600 --> 00:01:01,950
Python a bit better and then to understand what exactly you're doing in the training process.

16
00:01:02,130 --> 00:01:03,240
So let's begin.

17
00:01:03,540 --> 00:01:06,100
So let's take a look at this.

18
00:01:06,120 --> 00:01:08,010
This is the CNN we're going to be building.

19
00:01:08,370 --> 00:01:11,100
You'll remember this from our slides.

20
00:01:11,110 --> 00:01:14,850
We have the input image, the 28 by 28 by one grayscale image.

21
00:01:15,330 --> 00:01:16,920
We use three by three filters.

22
00:01:17,400 --> 00:01:19,230
We get our feature maps here.

23
00:01:19,350 --> 00:01:20,470
We have 32 filters.

24
00:01:20,500 --> 00:01:21,780
We get 32 feature maps.

25
00:01:22,230 --> 00:01:23,940
They're now 26 by 26.

26
00:01:23,940 --> 00:01:29,550
And then we apply three by three filters again, which gives us another set of feature maps here.

27
00:01:30,030 --> 00:01:37,710
This is 24 by 24 by 64 because we use 64 convolution filters in this layer. Then we apply max

28
00:01:37,710 --> 00:01:40,530
pooling to shrink the height and width in half.

29
00:01:40,530 --> 00:01:47,610
But we still retain 64 filters or feature maps, now 12 by 12, and we flatten that to get 9,216

30
00:01:47,610 --> 00:01:48,600
nodes here.

31
00:01:49,140 --> 00:01:53,100
This is connected to 128 nodes in a fully connected layer.

32
00:01:53,280 --> 00:01:54,390
These two are fully connected.

33
00:01:54,930 --> 00:01:59,940
This is also fully connected here to the output layer, where we apply the softmax function to get

34
00:02:00,120 --> 00:02:02,370
probabilities for classes.
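
As a quick sanity check on the numbers above, here is a minimal plain-Python sketch (no PyTorch needed) that traces the shape through each stage. It assumes exactly what the diagram describes: 3 by 3 convolutions with stride 1 and no padding, each trimming one pixel from every edge, and a 2 by 2 max pool that halves height and width. The function name `trace_shapes` is just an illustrative choice.

```python
def trace_shapes():
    """Return (stage name, shape) pairs; conv/pool shapes are (channels, height, width)."""
    c, h, w = 1, 28, 28                     # 28x28x1 grayscale input
    stages = [("input", (c, h, w))]
    c, h, w = 32, h - 2, w - 2              # conv1: 32 filters, 3x3, stride 1, no padding
    stages.append(("conv1", (c, h, w)))
    c, h, w = 64, h - 2, w - 2              # conv2: 64 filters, 3x3, stride 1, no padding
    stages.append(("conv2", (c, h, w)))
    h, w = h // 2, w // 2                   # 2x2 max pool: channels kept, height/width halved
    stages.append(("pool", (c, h, w)))
    stages.append(("flatten", (c * h * w,)))
    stages.append(("fc1", (128,)))          # fully connected, 128 nodes
    stages.append(("fc2 / output", (10,)))  # one score per class, softmax applied later
    return stages


for name, shape in trace_shapes():
    print(f"{name:>12}: {shape}")
```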

35
00:02:02,850 --> 00:02:04,260
So let's take a look at this.

36
00:02:04,890 --> 00:02:06,570
So I have some notes here.

37
00:02:07,020 --> 00:02:09,950
This is how you define the parameters for Conv2d.

38
00:02:10,440 --> 00:02:12,480
Now, for the Conv2d layer we're using,

39
00:02:13,110 --> 00:02:14,800
we specify how many input channels.

40
00:02:14,820 --> 00:02:20,220
So in our case, it's one because the depth here is only one, and it would have been three with

41
00:02:20,220 --> 00:02:20,880
a color image.

42
00:02:21,540 --> 00:02:28,230
Out channels is the number of filters, which is 32. Kernel size is the filter dimensions,

43
00:02:28,560 --> 00:02:34,650
so three by three. Then stride and padding, which you are familiar with from before: the stride is

44
00:02:34,650 --> 00:02:36,810
how big a jump we take when sliding the filter

45
00:02:36,810 --> 00:02:43,740
around, and padding is basically whether we put zero padding around the image here, if we want to keep

46
00:02:43,740 --> 00:02:44,700
the dimensionality the same.

47
00:02:45,180 --> 00:02:51,180
So by default, stride is one and padding is zero, but we can set those parameters here if needed.
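
The stride and padding rules above reduce to one formula for the output size along each spatial axis: floor((size + 2*padding - kernel_size) / stride) + 1, ignoring dilation. A small sketch with a hypothetical helper name; the defaults mirror Conv2d's stride of one and padding of zero.

```python
def conv_output_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis for a convolution (or pooling) window.

    Uses floor((size + 2 * padding - kernel_size) / stride) + 1; dilation is ignored.
    """
    return (size + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(28, 3))             # 26: default stride 1, no padding
print(conv_output_size(28, 3, padding=1))  # 28: one ring of zero padding keeps the size
print(conv_output_size(28, 3, stride=2))   # 13: bigger jumps, smaller output
```

The same function covers the 2 by 2 max pool with stride 2, since pooling slides a window the same way.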

48
00:02:52,410 --> 00:02:59,910
So now let's take a look at how we use torch.nn and torch.nn.functional to create the network.

49
00:03:00,210 --> 00:03:05,280
So this is the code I said could be a bit confusing to beginners, but I'll explain everything line by

50
00:03:05,280 --> 00:03:05,520
line.

51
00:03:06,060 --> 00:03:09,580
So this is where we're importing torch.nn as nn,

52
00:03:11,910 --> 00:03:18,120
the modules or objects that we can use to create the CNN, and torch.nn.functional as F.

53
00:03:18,510 --> 00:03:23,100
By convention, it's imported as F.

54
00:03:23,530 --> 00:03:27,930
And it's a module that allows us to access some of the parts of the PyTorch library.

55
00:03:28,230 --> 00:03:33,360
So it gives us access to things like the activation functions and other convenient functions that we

56
00:03:33,360 --> 00:03:33,720
use.

57
00:03:34,050 --> 00:03:39,030
So these are the two main modules we'll be using to create our

58
00:03:39,030 --> 00:03:39,540
CNN.
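
To make the nn versus functional distinction concrete, here is a small sketch, assuming PyTorch is installed: `nn.ReLU` is a layer object you create and then call, while `F.relu` is a plain function you call directly. Both give the same result on the same tensor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 3.0])

relu_layer = nn.ReLU()      # torch.nn: create a layer object, then call it
y_module = relu_layer(x)

y_function = F.relu(x)      # torch.nn.functional: a plain function, no object needed

print(y_module, y_function)
```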

59
00:03:39,750 --> 00:03:41,910
So let's take a look at this class.

60
00:03:42,240 --> 00:03:48,370
So in PyTorch, we create the model as a class. We don't have to, but it's generally good practice,

61
00:03:48,390 --> 00:03:53,100
and this is how it tends to be done.

62
00:03:53,100 --> 00:03:56,940
So we call it class Net, and we pass nn.Module in here.

63
00:03:57,690 --> 00:04:02,160
This is basically the Module class that it inherits from.

64
00:04:02,820 --> 00:04:07,290
So super: you see this function in the first line of __init__.

65
00:04:07,620 --> 00:04:12,630
I don't know if you know what __init__ is; it's a function that runs as soon as you instantiate a class.

66
00:04:12,960 --> 00:04:18,780
So Net is a subclass of nn.Module, and it inherits all its methods.

67
00:04:19,110 --> 00:04:26,490
And super() lets us call the methods of the Module parent class, such as its __init__, so we can use them conveniently in this class,

68
00:04:26,730 --> 00:04:33,960
which is what allows us to make these types of layer declarations in __init__.

69
00:04:35,120 --> 00:04:41,120
So hopefully you're still with me. It's actually quite simple once you get some practice, and it's good

70
00:04:41,210 --> 00:04:47,000
to actually get hands-on experience practicing with these things, and then slowly you'll develop

71
00:04:47,000 --> 00:04:48,710
a deeper understanding of what's happening.

72
00:04:49,100 --> 00:04:51,910
So for now, we're on this line here.

73
00:04:52,550 --> 00:04:56,780
So we create a class that inherits from nn.Module, and we call super(Net, self).

74
00:04:56,800 --> 00:05:03,110
Net is what we call this one, and we just use that __init__ function to initialize the parent class as well when we

75
00:05:03,110 --> 00:05:03,500
run it.
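
The role of `super().__init__()` can be seen without PyTorch at all. In this plain-Python sketch, `TinyModule` is a hypothetical stand-in for `nn.Module`: its `__init__` creates bookkeeping that the subclass relies on, so skipping the `super()` call breaks the subclass. Real `nn.Module` behaves similarly: assigning a layer before the parent `__init__` has run raises an error along the lines of "cannot assign module before Module.__init__() call". In Python 3, `super().__init__()` is equivalent to the older `super(Net, self).__init__()` spelling.

```python
class TinyModule:
    """Hypothetical stand-in for nn.Module: its __init__ sets up bookkeeping."""

    def __init__(self):
        self._layers = {}  # registry that subclasses rely on

    def add_layer(self, name, layer):
        self._layers[name] = layer

    def layer_names(self):
        return list(self._layers)


class Net(TinyModule):
    def __init__(self):
        super().__init__()  # run the parent's __init__ first, which creates _layers
        self.add_layer("conv1", "3x3 conv, 32 filters")
        self.add_layer("conv2", "3x3 conv, 64 filters")


class BrokenNet(TinyModule):
    def __init__(self):
        # super().__init__() is missing, so self._layers is never created
        self.add_layer("conv1", "3x3 conv, 32 filters")


print(Net().layer_names())  # ['conv1', 'conv2']
try:
    BrokenNet()
except AttributeError as err:
    print("BrokenNet failed:", err)
```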

76
00:05:03,650 --> 00:05:05,930
So now let's move on to this line.

77
00:05:06,680 --> 00:05:08,150
So this is this line.

78
00:05:08,420 --> 00:05:12,560
These are the important lines where we actually define the layers of our CNN.

79
00:05:12,900 --> 00:05:17,540
So we call it self.conv1,

80
00:05:17,570 --> 00:05:18,920
and this is how we create the CNN layer.

81
00:05:18,920 --> 00:05:23,100
So we have nn, which we imported up above here, dot Conv2d.

82
00:05:23,660 --> 00:05:30,110
We use one as the input dimension. You can scroll up here if you want to see the parameters it takes: input

83
00:05:30,110 --> 00:05:34,040
dimension, output channels, kernel size, stride and padding.

84
00:05:36,470 --> 00:05:42,140
So we have 32 filters here, three by three, and we're using the stride and padding default values

85
00:05:42,140 --> 00:05:43,160
of one and zero.

86
00:05:43,460 --> 00:05:48,860
We could explicitly specify them here, but we'll just go with the default values in this CNN.

87
00:05:50,420 --> 00:05:52,700
And secondly, we have a second conv layer here.

88
00:05:53,510 --> 00:05:57,860
Now notice we have 32 as the input, and that's because this layer,

89
00:05:58,040 --> 00:06:01,020
the first one, outputs 32 filters.

90
00:06:01,050 --> 00:06:03,080
So now the input to this is 32 filters.

91
00:06:03,740 --> 00:06:04,730
64 filters

92
00:06:04,730 --> 00:06:09,320
is what we're going to be using in this conv layer, with three by three filters and, again, the

93
00:06:09,320 --> 00:06:15,170
default stride and padding. Next, we have the max pool layer, where we just simply describe

94
00:06:15,890 --> 00:06:20,030
the filter size, or kernel size, and the stride here.

95
00:06:20,450 --> 00:06:21,950
So that's two and two.
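
Here is what a 2 by 2 max pool with stride 2 does, sketched in plain Python on a small made-up feature map: each non-overlapping 2 by 2 window is replaced by its largest value, which is why the 24 by 24 maps become 12 by 12. This illustrates the operation only; it is not how PyTorch implements it, and `max_pool_2x2` is a hypothetical name.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list; an odd trailing row/column is dropped."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


example = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
print(max_pool_2x2(example))  # [[4, 5], [6, 7]]
```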

96
00:06:22,940 --> 00:06:27,080
And we call this self.pool, and then we create another layer here.

97
00:06:27,090 --> 00:06:33,140
We call it fc1, the first fully connected layer, where we take the output of the max pooling, which has 64 channels,

98
00:06:33,440 --> 00:06:37,610
because 64 was how many filters we used in the conv2 layer.

99
00:06:38,570 --> 00:06:44,180
And we have 12 by 12, which is a size we know, and we output this to 128 nodes.

100
00:06:44,600 --> 00:06:49,010
And then finally, in the final fully connected layer, fc2, the 128 nodes

101
00:06:49,160 --> 00:06:51,050
are connected to the 10 output nodes.
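
Since each layer is now fully specified, you can count its learnable parameters by hand. A plain-Python sketch, assuming every layer has a bias (the default for both Conv2d and Linear): a conv layer has out_channels × in_channels × k × k weights plus one bias per filter, and a fully connected layer has one weight per input/output pair plus one bias per output node. The helper names are hypothetical.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    # one kernel_size x kernel_size x in_channels filter per output channel, plus one bias each
    return out_channels * in_channels * kernel_size * kernel_size + out_channels


def linear_params(in_features, out_features):
    # one weight per (input, output) pair, plus one bias per output node
    return in_features * out_features + out_features


params = {
    "conv1": conv2d_params(1, 32, 3),
    "conv2": conv2d_params(32, 64, 3),
    "fc1": linear_params(64 * 12 * 12, 128),
    "fc2": linear_params(128, 10),
}
total = sum(params.values())

for name, count in params.items():
    print(f"{name}: {count:,}")
print(f"total: {total:,}")  # total: 1,199,882
```

Almost all of the parameters sit in fc1, because every one of the 9,216 flattened values connects to every one of the 128 nodes.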

102
00:06:51,620 --> 00:06:51,950
Now.

103
00:06:52,850 --> 00:06:54,340
What did you notice here?

104
00:06:55,040 --> 00:06:56,930
I kept saying this is the input and this is the output.

105
00:06:57,170 --> 00:06:58,580
However, they're not linked together yet.

106
00:06:58,730 --> 00:07:00,290
These are just objects we're creating here.

107
00:07:00,710 --> 00:07:01,490
This is an object.

108
00:07:01,520 --> 00:07:02,300
This is an object.

109
00:07:02,630 --> 00:07:04,700
All of these things are still just objects.

110
00:07:05,270 --> 00:07:06,860
Think of it like pieces of a pipeline.

111
00:07:07,250 --> 00:07:09,230
But we haven't linked the pipeline yet.

112
00:07:09,500 --> 00:07:12,320
That's what we do in the forward function.

113
00:07:12,830 --> 00:07:15,800
Remember, forward stands for the forward pass, or forward propagation.

114
00:07:16,370 --> 00:07:17,450
That's how we create the network.

115
00:07:17,480 --> 00:07:18,410
That's where we now link them.

116
00:07:18,740 --> 00:07:22,210
All of these objects are created in __init__, and in forward

117
00:07:22,730 --> 00:07:23,730
we link them together.

118
00:07:23,750 --> 00:07:25,880
So now we treat it like we're taking an input:

119
00:07:25,880 --> 00:07:26,900
x would be the input.

120
00:07:27,950 --> 00:07:35,330
So for the input, remember that when we're using functions or objects that are declared in the

121
00:07:35,330 --> 00:07:37,900
class, we access them through self.

122
00:07:38,750 --> 00:07:42,950
That's just standard Python syntax.

123
00:07:44,330 --> 00:07:47,960
So in this function here, this is the ReLU function.

124
00:07:48,230 --> 00:07:49,970
So we just apply it:

125
00:07:50,180 --> 00:07:52,220
we take the input x, and we apply conv1 to it.

126
00:07:53,300 --> 00:07:59,300
Basically, this is how PyTorch specifies that you're going to use the function: you do F dot

127
00:07:59,300 --> 00:07:59,750
relu.

128
00:08:00,350 --> 00:08:06,980
So you want to use the activation function, and it takes conv1 of x as its input.

129
00:08:06,980 --> 00:08:13,670
So it's self.conv1(x), with x as the input and x as the output. Now, x goes into the next line, and x is

130
00:08:13,790 --> 00:08:14,450
its output.

131
00:08:14,870 --> 00:08:16,220
You just take this input,

132
00:08:16,220 --> 00:08:22,060
and because of the sequence, this input here is now processed by this line, so it updates x.

133
00:08:23,240 --> 00:08:30,080
And likewise, we can take the output of this as the input of conv2, again specifying

134
00:08:30,090 --> 00:08:30,440
ReLU.

135
00:08:30,680 --> 00:08:32,240
And now we use the pool layer.

136
00:08:32,660 --> 00:08:37,700
So you can see we now have pool, ReLU and conv2 all in one line, quite convenient.
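
Here is the conv, ReLU, max pool chain from the forward function on its own, as a sketch assuming PyTorch is installed. The input is a random tensor standing in for one 28 by 28 grayscale image; PyTorch expects (batch, channels, height, width), so the batch size of 1 comes first.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)              # only so repeated runs use the same random numbers

conv1 = nn.Conv2d(1, 32, 3)       # in_channels=1, out_channels=32, kernel_size=3 (stride 1, padding 0)
conv2 = nn.Conv2d(32, 64, 3)      # 32 in, 64 filters
pool = nn.MaxPool2d(2, 2)         # kernel_size=2, stride=2

x = torch.randn(1, 1, 28, 28)     # random stand-in for one image: (batch, channels, height, width)
x = F.relu(conv1(x))              # conv1, then ReLU
after_conv1 = x.shape
x = pool(F.relu(conv2(x)))        # conv2, then ReLU, then max pool, all in one line
print(after_conv1, x.shape)
```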

137
00:08:38,330 --> 00:08:39,650
And again with X here.

138
00:08:41,090 --> 00:08:42,370
Next we have x dot

139
00:08:42,380 --> 00:08:49,430
view, this view function here, and we use the view function to basically reshape our tensor.

140
00:08:49,910 --> 00:08:56,300
Because remember, we have the max pool output here as 64 by 12 by 12, which multiplies out to

141
00:08:56,310 --> 00:08:56,990
9,216 values.

142
00:08:57,020 --> 00:09:01,310
We want to convert it into a flat vector of that same size, and this

143
00:09:01,490 --> 00:09:02,870
is what we specified here.

144
00:09:03,200 --> 00:09:05,270
So why do we use minus one here?

145
00:09:05,750 --> 00:09:11,510
Well, the parameter minus one is for situations where you

146
00:09:11,510 --> 00:09:16,580
don't know how many rows you want, but you're sure of the number of columns; then you can use this minus

147
00:09:16,580 --> 00:09:17,590
one to specify that.

148
00:09:17,600 --> 00:09:20,090
So it's just a way to tell the library:

149
00:09:20,090 --> 00:09:23,480
we're not sure, so you work out that dimension and give me the output. Effectively,

150
00:09:23,480 --> 00:09:24,200
that's what it is.
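
The minus one trick can be mimicked in a few lines of plain Python. The helper below is hypothetical, not part of PyTorch; it just shows the arithmetic involved: divide the total number of elements by the product of the known dimensions. The batch of 4 images is an assumed example.

```python
from math import prod


def infer_view_shape(num_elements, shape):
    """Hypothetical helper mimicking how a single -1 entry in a view shape is resolved."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be -1")
    known = prod(d for d in shape if d != -1)
    if -1 not in shape:
        if known != num_elements:
            raise ValueError(f"shape {shape} does not match {num_elements} elements")
        return tuple(shape)
    if known == 0 or num_elements % known != 0:
        raise ValueError(f"cannot reshape {num_elements} elements into {shape}")
    return tuple(num_elements // known if d == -1 else d for d in shape)


batch_size = 4                              # assumed batch of 4 images
elements = batch_size * 64 * 12 * 12        # max pool output of shape (4, 64, 12, 12)
print(infer_view_shape(elements, [-1, 64 * 12 * 12]))  # (4, 9216)
```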

151
00:09:24,770 --> 00:09:28,190
So we just apply this here to reshape the tensor.

152
00:09:28,190 --> 00:09:32,480
This is basically flattening it, just like a flatten layer.

153
00:09:34,190 --> 00:09:35,430
It flattens this

154
00:09:35,450 --> 00:09:37,460
three-dimensional tensor

155
00:09:37,550 --> 00:09:43,440
and basically passes it on to the fully connected layer, because the fully connected

156
00:09:43,440 --> 00:09:46,880
layer needs a flattened vector to operate on.
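
Assuming PyTorch is installed, this sketch flattens a random stand-in for a batch of max-pool outputs with `view(-1, 64 * 12 * 12)` as in the lesson, and compares it with `torch.flatten(x, start_dim=1)`, a swapped-in alternative that keeps the batch dimension and flattens everything after it.

```python
import torch

x = torch.randn(4, 64, 12, 12)            # random stand-in for a batch of 4 max-pool outputs
flat_view = x.view(-1, 64 * 12 * 12)      # lesson's approach: -1 lets PyTorch infer the batch size
flat_fn = torch.flatten(x, start_dim=1)   # alternative: keep dim 0 (batch), flatten the rest
print(flat_view.shape, flat_fn.shape)
```

Both produce one row of 9,216 values per image, which is the flat vector fc1 expects.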

157
00:09:47,570 --> 00:09:52,790
That's why it's good to have an understanding of what exactly happens in the CNN here, because with

158
00:09:52,790 --> 00:09:58,850
PyTorch, you actually get fairly low level with manipulating this data and these tensors.

159
00:09:59,030 --> 00:10:01,000
So you can see how the dimensions are important.

160
00:10:01,010 --> 00:10:02,090
It's good to know these things.

161
00:10:02,750 --> 00:10:09,140
So we now have the flattened vector x going as input to the fc1 layer.

162
00:10:09,860 --> 00:10:12,290
And again, we apply the ReLU activation to that.

163
00:10:12,290 --> 00:10:15,920
We'll take the output and then pass it to our last.

164
00:10:16,250 --> 00:10:23,600
fc2 layer. Now, you may have noticed one thing: the softmax activation layer isn't here in the

165
00:10:23,600 --> 00:10:23,840
network.

166
00:10:23,870 --> 00:10:24,410
That's coming.

167
00:10:24,830 --> 00:10:26,540
It is part of the network, but

168
00:10:26,660 --> 00:10:29,520
it's not part of the forward propagation part of the class

169
00:10:29,540 --> 00:10:30,050
we have here.

170
00:10:30,800 --> 00:10:33,530
This here is basically just our model as it is,

171
00:10:33,950 --> 00:10:35,280
and then we apply softmax

172
00:10:35,560 --> 00:10:39,760
in a different way, afterwards, to the outputs once they come out of the network.
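
Softmax itself is simple: exponentiate each score and divide by the sum, so the outputs are positive and add up to 1. A plain-Python sketch with made-up scores for three classes (the real network has 10). In PyTorch, `F.softmax(scores, dim=1)` does the same on a batch, and `nn.CrossEntropyLoss` expects raw scores because it applies log-softmax internally, which is one common reason to leave softmax out of `forward`.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]   # made-up raw outputs for 3 classes
probs = softmax(scores)
print([round(p, 3) for p in probs])
```

Subtracting the maximum score does not change the result, but it keeps `math.exp` from overflowing on large scores.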

173
00:10:40,450 --> 00:10:43,900
So that's what we do here in these two lines at the end.

174
00:10:44,290 --> 00:10:47,470
This class doesn't actually do anything just by defining it.

175
00:10:47,470 --> 00:10:50,500
It doesn't make anything until we actually instantiate the class.

176
00:10:51,250 --> 00:10:57,250
So we create an instance of Net here, and that's when our class object is actually created.

177
00:10:57,940 --> 00:10:59,590
And then we send it to the device.

178
00:10:59,600 --> 00:11:05,030
And if you remember correctly, the device basically tells us whether we want to run on a GPU or a CPU.

179
00:11:05,050 --> 00:11:09,550
It's something we set initially to CUDA, meaning the GPU, and here we're using CUDA as the value.

180
00:11:09,560 --> 00:11:16,450
You can see it when I hover over it. So this passes all the information in this class,

181
00:11:16,630 --> 00:11:19,660
the CNN object, to the GPU for operation.
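
Putting it all together, here is a sketch of the whole class, assuming PyTorch is installed and using the layer sizes described in this lesson. The CPU fallback in the `device` line is an assumption added so the sketch also runs on a machine without CUDA; the lesson sets the device to CUDA. `print(net)` gives the same kind of layer summary shown in the video.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()                          # set up nn.Module's internals first
        self.conv1 = nn.Conv2d(1, 32, 3)            # 1 input channel, 32 filters, 3x3
        self.conv2 = nn.Conv2d(32, 64, 3)           # 32 in, 64 filters, 3x3
        self.pool = nn.MaxPool2d(2, 2)              # 2x2 window, stride 2
        self.fc1 = nn.Linear(64 * 12 * 12, 128)     # 9,216 flattened values -> 128 nodes
        self.fc2 = nn.Linear(128, 10)               # 128 -> 10 class scores

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * 12 * 12)                # flatten; -1 lets PyTorch infer the batch size
        x = F.relu(self.fc1(x))
        return self.fc2(x)                          # raw scores; softmax is applied afterwards


# Assumption: fall back to the CPU when no CUDA GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = Net().to(device)
print(net)

dummy = torch.randn(8, 1, 28, 28, device=device)   # random stand-in for a batch of 8 images
out = net(dummy)
print(out.shape)
```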

182
00:11:20,410 --> 00:11:25,840
Next, let's take a look at the code without the distracting comments, because I did leave a lot of

183
00:11:25,840 --> 00:11:26,920
comments for you guys.

184
00:11:27,310 --> 00:11:28,660
so you understand it fully.

185
00:11:29,200 --> 00:11:30,610
You can see the code looks quite simple.

186
00:11:31,270 --> 00:11:34,690
This is all the code that creates this network, you know?

187
00:11:35,440 --> 00:11:39,070
So hopefully you realize that using PyTorch isn't that difficult.

188
00:11:39,490 --> 00:11:44,830
There are just a few things you need to know: we just create the layer objects here,

189
00:11:45,340 --> 00:11:47,410
and we link them all together in the forward function.

190
00:11:47,890 --> 00:11:50,740
We create the object here and send it to the GPU or CPU.

191
00:11:51,040 --> 00:11:51,370
OK.

192
00:11:52,210 --> 00:11:59,050
And then this outputs this here, which is basically the layers in a printed summary format

193
00:11:59,560 --> 00:12:00,430
of our network.

194
00:12:00,910 --> 00:12:03,400
We can actually print that again here by using print.

195
00:12:03,970 --> 00:12:08,710
If you ever want to access that object, you can print it like this again. It's the same

196
00:12:08,710 --> 00:12:10,090
thing as above here, I think.

197
00:12:10,510 --> 00:12:11,560
It's just another way to do it.

198
00:12:12,930 --> 00:12:19,440
So next, we'll take a look at defining the loss function and optimizer, and then we'll start training

199
00:12:19,650 --> 00:12:22,260
our CNN, so I'll see you in the next lesson.