1
00:00:11,620 --> 00:00:18,220
In this lecture, we are going to begin discussing code preparation for our CNN notebook.

2
00:00:19,600 --> 00:00:24,910
This lecture will focus specifically on how we're going to build a model, since as we move to more and

3
00:00:24,910 --> 00:00:29,850
more complicated architectures, we'll need more and more sophisticated ways of building models.

4
00:00:30,490 --> 00:00:32,830
There will be two major components of this lecture.

5
00:00:33,520 --> 00:00:39,660
First, as you know, the critical new piece of math that makes an ANN into a CNN is convolution.

6
00:00:40,150 --> 00:00:42,400
So we'll see how to use the convolution module in PyTorch.

7
00:00:43,780 --> 00:00:48,040
Second, we'll need to integrate this convolution module into the neural network.

8
00:00:48,460 --> 00:00:53,590
I'm going to show you two ways of doing this, but the highlight will be the object-oriented method

9
00:00:53,590 --> 00:00:55,110
of building PyTorch models.

10
00:00:55,960 --> 00:01:01,270
So if you don't remember your object oriented programming from your first year programming class, now

11
00:01:01,270 --> 00:01:05,890
would be the time to crack open your old textbook and reacquaint yourself with those concepts.

12
00:01:11,010 --> 00:01:17,630
So, number one, we want to look at the convolution module. The module we need is called Conv2d.

13
00:01:17,850 --> 00:01:22,740
And on this slide, what I want to do is highlight its arguments and relate them to the math that we

14
00:01:22,740 --> 00:01:23,550
already learned.

15
00:01:25,250 --> 00:01:31,250
As you know, the main parameter in convolution is the filter or the kernel, which is a four dimensional

16
00:01:31,250 --> 00:01:36,440
tensor composed of the number of input channels, the number of output channels and the kernel height

17
00:01:36,440 --> 00:01:37,040
and width.

18
00:01:37,730 --> 00:01:41,420
Recall that the number of channels is also known as the number of feature maps.

19
00:01:41,540 --> 00:01:43,610
In case you're using different terminology.

20
00:01:45,350 --> 00:01:50,240
Because we usually have the kernel height equal to the kernel width, we can just specify it as a single

21
00:01:50,240 --> 00:01:53,570
number, which we usually denote K when we talk about it in math.

22
00:01:54,770 --> 00:01:58,940
If you wanted to have your kernel height different from your kernel width, you could also specify

23
00:01:58,940 --> 00:01:59,720
it as a tuple.

24
00:02:00,880 --> 00:02:05,500
In modern deep learning, most of the time you'll see kernels of size three, although in the old

25
00:02:05,500 --> 00:02:08,860
days you'd also see kernels of size seven and kernels of size five.

26
00:02:10,660 --> 00:02:12,620
Notice how the size is typically odd.

27
00:02:13,030 --> 00:02:16,280
You'll be able to deduce why that is the case after the next lecture.

28
00:02:16,870 --> 00:02:19,210
So that takes care of the first three arguments.

29
00:02:19,690 --> 00:02:24,760
The last relevant argument for us is the stride which controls the step size when you move the kernel

30
00:02:24,760 --> 00:02:26,050
across the input image.
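
To make these arguments concrete, here is a minimal sketch of creating a Conv2d layer. The specific numbers (3 input channels, 32 output channels, a 32 by 32 input image, a batch of 8) are illustrative assumptions and do not come from the lecture.

```python
import torch
import torch.nn as nn

# Illustrative values: 3 input channels (RGB), 32 output feature maps,
# a 3x3 kernel, and a stride of 2.
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=2)

# The filter is a 4-D tensor. PyTorch stores it in the order
# (out_channels, in_channels, kernel_height, kernel_width).
print(conv.weight.shape)  # torch.Size([32, 3, 3, 3])

# A non-square kernel is specified as a tuple (height, width).
conv_rect = nn.Conv2d(3, 32, kernel_size=(3, 5))
print(conv_rect.weight.shape)  # torch.Size([32, 3, 3, 5])

# Inputs are batches of images shaped (batch, channels, height, width).
x = torch.randn(8, 3, 32, 32)
y = conv(x)
print(y.shape)
```

Printing the output shape is a quick way to see how the kernel size and stride change the image size as it passes through the layer.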

31
00:02:31,120 --> 00:02:37,660
So why do we use Conv2d and not Conv1d or Conv3d? As you know, color images are

32
00:02:37,660 --> 00:02:39,850
three-dimensional: height by width by color.

33
00:02:40,540 --> 00:02:44,940
Well, it's 2D because the color dimension of an image is not a spatial dimension.

34
00:02:46,100 --> 00:02:48,020
The signal doesn't flow in that direction.

35
00:02:48,710 --> 00:02:54,140
It's also possible to have 1D convolutions and 3D convolutions using the Conv1d and the

36
00:02:54,140 --> 00:02:55,940
Conv3d layers, respectively.

37
00:02:57,220 --> 00:03:02,320
So, for example, if you're looking at a signal that varies with time, that would be a 1D convolution.

38
00:03:02,890 --> 00:03:06,480
If you're looking at a video, for example, that would be a 3D convolution.

39
00:03:06,910 --> 00:03:11,540
That's because height and width are the first two spatial dimensions and time is a third dimension.
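
As a rough illustration of the 1D and 3D cases, the sketch below applies Conv1d to a time series and Conv3d to a video-like tensor. All sizes here are made-up assumptions chosen only to show the expected input layouts.

```python
import torch
import torch.nn as nn

# 1D: a batch of 2 single-channel signals, each 100 time steps long.
# Layout is (batch, channels, time).
signal = torch.randn(2, 1, 100)
conv1d = nn.Conv1d(in_channels=1, out_channels=4, kernel_size=3)
signal_out = conv1d(signal)
print(signal_out.shape)

# 3D: a batch of 2 RGB clips, 16 frames of 32x32 pixels each.
# Layout is (batch, channels, depth_or_time, height, width).
video = torch.randn(2, 3, 16, 32, 32)
conv3d = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3)
video_out = conv3d(video)
print(video_out.shape)
```

In both cases the channel dimension is not convolved over, just like the color dimension in the 2D case.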

40
00:03:12,160 --> 00:03:17,740
It's also possible to have objects that have three spatial dimensions: height, width, and depth. Instead

41
00:03:17,740 --> 00:03:18,270
of pixels,

42
00:03:18,280 --> 00:03:19,360
we have voxels.

43
00:03:19,840 --> 00:03:20,920
You might encounter these

44
00:03:20,920 --> 00:03:26,110
if you look at something like medical imaging data. By the way, you might be wondering where the word

45
00:03:26,110 --> 00:03:27,110
voxel comes from.

46
00:03:27,640 --> 00:03:33,730
Well, it's pretty obvious that it's analogous to the word pixel, but pixel is actually a short form of

47
00:03:33,730 --> 00:03:40,510
the phrase picture element, and thus voxel is the short form of the phrase volume element. In any case,

48
00:03:40,570 --> 00:03:46,270
now we can say that we fully understand the Conv2d layer, so we can move on to asking: how do we

49
00:03:46,270 --> 00:03:47,920
include this in a neural network?

50
00:03:53,050 --> 00:03:59,320
Let's now talk about the concept of inheritance. PyTorch models are all based on modules, and I've

51
00:03:59,320 --> 00:04:03,190
probably used that word a few times throughout the course without really explaining what they are.

52
00:04:03,880 --> 00:04:11,290
Well, Module is the base class that all other modules inherit from. Inheritance implies an "is a" relationship.

53
00:04:11,950 --> 00:04:17,590
For example, the linear module, which is the first one we saw in this course, inherits from the module

54
00:04:17,590 --> 00:04:18,190
class.

55
00:04:18,490 --> 00:04:20,150
That means it is a module.

56
00:04:20,860 --> 00:04:23,780
Similarly, the ReLU module is also a module.

57
00:04:24,520 --> 00:04:27,580
Similarly, the sequential module is also a module.

58
00:04:28,640 --> 00:04:32,720
This might strike you as surprising because these three things are kind of very different from each

59
00:04:32,720 --> 00:04:33,000
other.

60
00:04:34,130 --> 00:04:39,940
So even though Sequential looks like a wrapper around a list of modules, it is also itself a module.

61
00:04:40,550 --> 00:04:45,950
So when you combine a bunch of modules to build a model, that model is itself a module.
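
The "is a" relationship can be checked directly with isinstance. This is a small sketch; the layer sizes are arbitrary assumptions.

```python
import torch.nn as nn

linear = nn.Linear(4, 2)
relu = nn.ReLU()
model = nn.Sequential(linear, relu)

# Each of these very different objects is a Module.
print(isinstance(linear, nn.Module))  # True
print(isinstance(relu, nn.Module))    # True
print(isinstance(model, nn.Module))   # True

# Because a Sequential is itself a module, it can be nested inside
# another Sequential: a module of modules of modules.
nested = nn.Sequential(model, nn.Linear(2, 1))
print(isinstance(nested, nn.Module))  # True
```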

62
00:04:48,300 --> 00:04:53,910
This implies that when we create our own custom models, they are also going to be modules and we are

63
00:04:53,910 --> 00:04:56,410
going to inherit from the module class.

64
00:04:57,060 --> 00:05:02,550
So when I said earlier that PyTorch is all about composing models from building blocks, well, modules

65
00:05:02,550 --> 00:05:03,810
are those building blocks.

66
00:05:08,960 --> 00:05:14,120
One interesting consequence of everything being a module and models being modules is that there is this

67
00:05:14,120 --> 00:05:18,820
hierarchical structure. The first model we saw in this course was just a linear module.

68
00:05:19,550 --> 00:05:25,190
Then we saw that we could stack linear layers together in a sequential object, which is a module of

69
00:05:25,190 --> 00:05:25,900
modules.

70
00:05:26,630 --> 00:05:27,740
That's also a model.

71
00:05:29,510 --> 00:05:35,460
Later, you'll see that we can repeat this pattern so that our model is a module of modules of modules.

72
00:05:36,020 --> 00:05:42,170
So what we've learned is that a module is a model, but a module of modules can also be a model, as

73
00:05:42,170 --> 00:05:45,040
can a module of modules of modules and so on.

74
00:05:46,250 --> 00:05:51,770
By taking the perspective that everything is a module, building complex things becomes easier because

75
00:05:51,770 --> 00:05:54,530
you only have to look at your current level of abstraction.

76
00:05:54,650 --> 00:05:59,060
And no matter how high level that may be, you're still just working with modules.

77
00:06:04,260 --> 00:06:08,790
So why might we want to build a custom model instead of just doing what we've been doing so far?

78
00:06:09,420 --> 00:06:13,190
Well, sometimes we need to do an operation that is not a module.

79
00:06:13,740 --> 00:06:18,720
For example, you know that the interface between the convolution layers and the dense layers in a neural

80
00:06:18,720 --> 00:06:19,870
network is a flatten.

81
00:06:20,520 --> 00:06:26,070
So after all the convolutions and before all the dense layers, we must flatten the data so that it

82
00:06:26,070 --> 00:06:31,290
fits into those respective layers. At the boundary between these two different kinds of layers,

83
00:06:32,040 --> 00:06:37,230
the output of a convolution is an image, while a dense layer expects an input vector.

84
00:06:37,920 --> 00:06:44,460
Now, although PyTorch does have a Flatten layer, you might wonder: what if it didn't exist, or what if

85
00:06:44,460 --> 00:06:47,250
some other operation you wanted to do didn't exist?

86
00:06:47,790 --> 00:06:52,180
In that case, you could build a custom model. In the case of flatten,

87
00:06:52,410 --> 00:06:57,770
we know that we can always just use the view function to reshape a batch of images into a batch of vectors.

88
00:06:58,230 --> 00:07:04,410
In fact, not even one year ago, counting from the time this course was being made, there was no Flatten

89
00:07:04,410 --> 00:07:05,490
layer in PyTorch.

90
00:07:05,790 --> 00:07:07,560
So you had to build a custom model.

91
00:07:09,690 --> 00:07:14,900
So you're lucky in that sense. Today, a Flatten layer exists, which makes things easier for you,

92
00:07:15,060 --> 00:07:19,950
but at the same time there are still many things you might want to do where there does not exist

93
00:07:19,950 --> 00:07:21,030
a prebuilt layer for you.

94
00:07:21,870 --> 00:07:27,060
As a side note, I commonly refer to these objects as layers, even though in PyTorch they are technically

95
00:07:27,060 --> 00:07:28,050
called modules.

96
00:07:28,620 --> 00:07:33,600
Layer is a better visual term in my opinion, since a neural network is made of layers.

97
00:07:33,930 --> 00:07:36,350
It's just easier to understand and see it in your mind.

98
00:07:36,780 --> 00:07:42,090
So please don't get confused when I use the words layer and module interchangeably in the context of

99
00:07:42,090 --> 00:07:42,790
PyTorch.

100
00:07:47,980 --> 00:07:53,320
In any case, I hope it's clear why we need custom models. When we have all the layers or modules we

101
00:07:53,320 --> 00:07:58,930
need and they all occur sequentially in our model, which is not always the case, by the way, then

102
00:07:58,930 --> 00:08:02,010
we can wrap them up in a sequential, as we've been doing so far.

103
00:08:02,800 --> 00:08:08,080
But if we want to do something more complex, then we need to build a custom model that inherits from

104
00:08:08,080 --> 00:08:09,190
the module class.

105
00:08:11,160 --> 00:08:17,900
Introducing this idea early in the context of CNNs is nice because it allows you to compare both approaches.

106
00:08:18,480 --> 00:08:23,610
The easy approach is to do what we've already been doing, which is to build a sequential model that

107
00:08:23,610 --> 00:08:26,760
wraps up all the layers we want to apply in sequential order.

108
00:08:27,390 --> 00:08:33,780
The less easy approach is to build a custom model, but doing it both ways allows you to see the equivalence

109
00:08:33,780 --> 00:08:39,540
between the two and hopefully that makes building custom models less intimidating than it would be otherwise.

110
00:08:39,990 --> 00:08:45,030
In addition, it's also a nice history lesson because less than just one year ago, you wouldn't have

111
00:08:45,030 --> 00:08:45,900
had this luxury.

112
00:08:50,940 --> 00:08:56,820
OK, so let's start by looking at a simple model we've already seen. And again, just to recap, the

113
00:08:56,820 --> 00:08:59,010
simplest ANN we can make looks like this.

114
00:08:59,400 --> 00:09:03,380
We have a linear layer followed by a ReLU, followed by another linear layer.

115
00:09:03,840 --> 00:09:06,420
No final activation function is needed, as you know.

116
00:09:07,080 --> 00:09:10,020
So what does this look like if we wanted to build a custom model?

117
00:09:15,180 --> 00:09:21,180
So first, we define a class that inherits from nn.Module. Inside the constructor, we define all

118
00:09:21,180 --> 00:09:22,920
of our submodules or layers.

119
00:09:23,780 --> 00:09:25,620
These are the two linear layers and the ReLU.

120
00:09:25,620 --> 00:09:31,650
Notice that all the layers in the sequential model are still defined in the constructor of the custom

121
00:09:31,650 --> 00:09:32,040
model.

122
00:09:34,260 --> 00:09:39,420
We assign these to instance variables so that they can be accessed from other methods, specifically

123
00:09:39,420 --> 00:09:45,270
the forward function. As you can see, the forward function is where we actually apply the operations

124
00:09:45,420 --> 00:09:47,280
defined by the layers we created.

125
00:09:52,450 --> 00:09:57,580
This is just like the simple linear regression model. When we create the linear regression model,

126
00:09:57,580 --> 00:09:59,970
we say variable equals nn.Linear.

127
00:10:00,610 --> 00:10:06,400
Then when we want to use that model to compute some output, we call that variable object as if it were

128
00:10:06,400 --> 00:10:06,990
a function.

129
00:10:07,750 --> 00:10:13,270
Similarly, inside the forward function, when we want to compute the output of each of those layers,

130
00:10:13,450 --> 00:10:16,960
we simply call the module objects we created in the constructor.
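
Here is a minimal sketch of that custom model next to its Sequential equivalent. The class name ANN, the attribute names, and the layer sizes (784, 128, 10) are assumptions made for illustration; the slide's actual code is not in this transcript.

```python
import torch
import torch.nn as nn

class ANN(nn.Module):
    def __init__(self, n_inputs, n_hidden, n_outputs):
        super().__init__()
        # Submodules are created in the constructor and stored as
        # instance variables so that forward() can access them.
        self.fc1 = nn.Linear(n_inputs, n_hidden)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(n_hidden, n_outputs)

    def forward(self, x):
        # Each module object is called as if it were a function.
        x = self.fc1(x)
        x = self.relu(x)
        return self.fc2(x)  # no final activation

custom_model = ANN(784, 128, 10)

# The same architecture wrapped in a Sequential.
sequential_model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

x = torch.randn(16, 784)
print(custom_model(x).shape)
print(sequential_model(x).shape)
```

Both models accept the same input and produce outputs of the same shape, which is the equivalence the two approaches share.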

131
00:10:18,540 --> 00:10:24,540
As an exercise, I would recommend trying to implement your own version of the sequential object using

132
00:10:24,540 --> 00:10:30,390
what you just learned. This would be a great exercise to test your understanding of module inheritance

133
00:10:30,390 --> 00:10:31,970
and building custom models.

134
00:10:31,980 --> 00:10:33,390
So please give it a try.

135
00:10:38,520 --> 00:10:43,830
So at this point, I hope it's easy to see the correspondence between a sequential model and a custom

136
00:10:43,830 --> 00:10:46,020
model that inherits from the module class.

137
00:10:46,710 --> 00:10:52,080
The next thing we're going to do is apply what we learned to build a more complex model, specifically

138
00:10:52,230 --> 00:10:54,000
a convolutional neural network.

139
00:10:54,570 --> 00:10:59,670
As you can see, it's pretty much the same thing, just with more layers and different kinds of layers.

140
00:11:00,360 --> 00:11:02,230
As we discussed in the theory lectures,

141
00:11:02,250 --> 00:11:08,580
what we have is a sequence of convolution layers, followed by a sequence of dense layers. Conveniently,

142
00:11:08,580 --> 00:11:10,920
because everything in PyTorch is a module,

143
00:11:11,220 --> 00:11:16,620
we can wrap all the convolution layers in a single sequential object and all the dense layers in a second

144
00:11:16,620 --> 00:11:17,730
sequential object.

145
00:11:22,860 --> 00:11:27,900
That will make it easier to call each of the layers from inside the forward function. Instead of having

146
00:11:27,900 --> 00:11:32,970
to call each of the layers one at a time, since there might be hundreds of layers, we can just make

147
00:11:32,970 --> 00:11:38,970
a few calls to the sequential objects. In addition, note the view function in between the two Sequentials.

148
00:11:39,660 --> 00:11:42,900
This helps flatten the data from an image into a feature vector.
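
A sketch of this structure follows: two Sequential objects with a call to view in between. The lecture leaves the first linear layer's input size as a placeholder. Here it is filled in as 128 * 3 * 3, which only holds under the assumptions made for this example: 32 by 32 RGB inputs, three 3x3 convolutions with stride 2 and no padding, and 128 final feature maps. The layer widths are likewise illustrative.

```python
import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        # All convolution layers wrapped in one Sequential.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
        )
        # All dense layers wrapped in a second Sequential.
        # 128 * 3 * 3 is specific to the assumed 32x32 input.
        self.dense = nn.Sequential(
            nn.Linear(128 * 3 * 3, 512), nn.ReLU(),
            nn.Linear(512, n_classes),
        )

    def forward(self, x):
        out = self.conv(x)
        # Flatten each image (C, H, W) into a feature vector,
        # keeping the batch dimension.
        out = out.view(out.size(0), -1)
        return self.dense(out)

model = CNN(n_classes=10)
x = torch.randn(4, 3, 32, 32)
print(model(x).shape)
```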

149
00:11:48,100 --> 00:11:53,320
You may have noticed something strange about the previous code, which is that I didn't specify the input size

150
00:11:53,320 --> 00:11:58,090
for the first linear layer. To be clear, this is not correct PyTorch syntax.

151
00:11:58,330 --> 00:12:02,440
I'm using this as a placeholder until I explain these concepts in a future lecture.

152
00:12:03,070 --> 00:12:09,070
Remember that in PyTorch there is a level of redundancy when specifying the input size and output size

153
00:12:09,070 --> 00:12:15,390
of each layer. The input size of one layer must match the output size of the preceding layer.

154
00:12:15,790 --> 00:12:20,920
But in the case of convolution, how do we calculate the size of the first linear layer's input?

155
00:12:21,790 --> 00:12:27,530
The answer is we'll have to manually calculate the size of the image as it passes through each convolution.

156
00:12:28,150 --> 00:12:33,460
Then when we flatten the final image, the feature vector dimensionality will simply be the product

157
00:12:33,670 --> 00:12:35,750
of each of the dimensions of the image.

158
00:12:36,100 --> 00:12:39,460
For example, ten by ten by three would become 300.
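
The arithmetic from this example can be written out as a quick check. The variable names are hypothetical.

```python
import math

# Dimensions of the final image from the example: 10 x 10 x 3.
final_image_shape = (10, 10, 3)

# The flattened feature vector size is the product of the dimensions.
flat_size = math.prod(final_image_shape)
print(flat_size)  # 300
```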

159
00:12:44,540 --> 00:12:46,370
As a contrast with our custom model,

160
00:12:46,550 --> 00:12:52,160
I also want to show you how we would build a CNN using the newly created Flatten layer, which allows us

161
00:12:52,160 --> 00:12:53,970
to make use of the sequential module.

162
00:12:54,620 --> 00:12:57,310
It's pretty easy to see that all the same layers appear.

163
00:12:57,470 --> 00:13:02,480
And the only thing that's different is that instead of explicitly calling the view function, we can

164
00:13:02,480 --> 00:13:03,920
use a Flatten layer instead.

165
00:13:04,640 --> 00:13:09,260
There is, however, still this small issue of how we calculate the initial input size for the first

166
00:13:09,260 --> 00:13:09,980
linear layer.

167
00:13:10,550 --> 00:13:12,050
We'll discuss that in the next lecture.

168
00:13:12,080 --> 00:13:13,690
So just hold that thought for now.
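
For comparison, here is a sketch of a CNN built as a single Sequential with a Flatten layer. As before, the input size of the first linear layer (128 * 3 * 3) is not taken from the lecture. It is worked out for an assumed 32 by 32 RGB input and the illustrative layer settings shown.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
    # Flatten replaces the explicit call to view.
    nn.Flatten(),
    # 128 * 3 * 3 is specific to the assumed 32x32 input.
    nn.Linear(128 * 3 * 3, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

x = torch.randn(4, 3, 32, 32)
print(model(x).shape)
```

All the same layers appear, so the only real work that remains in either approach is finding the input size of the first linear layer.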

169
00:13:15,130 --> 00:13:20,020
So what you may have assumed earlier is that building custom models is hard, so maybe it's just better

170
00:13:20,020 --> 00:13:23,930
to use the sequential method for CNNs, but in fact, they are nearly equal.

171
00:13:24,430 --> 00:13:27,520
In fact, the hardest part is figuring out this question mark.

172
00:13:27,850 --> 00:13:30,460
What is the initial input size of the first linear layer?

173
00:13:31,360 --> 00:13:36,280
And you can see that either way, whether you build a custom model or just use sequential, you still

174
00:13:36,280 --> 00:13:37,300
have to do this work.

175
00:13:42,400 --> 00:13:47,650
I want to mention one small addition we sometimes make use of in convolutional neural networks, which

176
00:13:47,650 --> 00:13:49,390
is dropout regularisation.

177
00:13:50,200 --> 00:13:55,810
If you've studied machine learning before, you may have heard of concepts such as L2 and L1 Regularisation,

178
00:13:56,020 --> 00:13:58,410
which aren't really used that often in deep learning anymore.

179
00:13:59,800 --> 00:14:03,340
Instead, dropout regularisation is used more commonly.

180
00:14:04,030 --> 00:14:08,870
My goal isn't to get in-depth about dropout in this course because I already have a course for that.

181
00:14:09,040 --> 00:14:10,370
That describes it in detail.

182
00:14:10,930 --> 00:14:16,180
Instead, I'll give you a one-line summary, which is that by dropping random nodes in a neural network

183
00:14:16,180 --> 00:14:21,520
during training, we make the neural network less dependent on any particular input feature, which

184
00:14:21,520 --> 00:14:23,770
helps to spread out the influence of each input.

185
00:14:24,460 --> 00:14:27,970
This is similar to what L1 and L2 regularisation try to do.

186
00:14:33,080 --> 00:14:38,840
In PyTorch, using dropout is as simple as including a Dropout layer and specifying the probability

187
00:14:38,990 --> 00:14:45,440
of dropping a node during training. Usually, dropout is used between dense layers, but not with convolutions.

188
00:14:45,920 --> 00:14:50,230
Sometimes you'll see them used in RNNs as well, which you'll learn about in another section.

189
00:14:51,910 --> 00:14:57,040
One interesting discussion I've seen pop up here and there is whether or not you should use dropout

190
00:14:57,040 --> 00:14:59,830
regularisation in the convolutional layers.

191
00:15:00,830 --> 00:15:07,430
While the PyTorch API lets you do this, it might not necessarily be a good idea. Remember that convolution

192
00:15:07,430 --> 00:15:11,210
is pattern finding: it's looking for a specific stroke or edge.

193
00:15:11,840 --> 00:15:16,430
If you suddenly remove half the pixels from an image, is the stroke still visible?

194
00:15:17,180 --> 00:15:20,090
Can it still be considered the stroke that we are looking for?

195
00:15:20,780 --> 00:15:26,180
Personally, I lean more towards no, but I've seen a few people use dropout in their convolution layers.

196
00:15:28,130 --> 00:15:32,240
Even if it doesn't hurt, if it also doesn't help, then there's no point in using it.

197
00:15:32,900 --> 00:15:35,940
As always, I recommend experimentation over guessing.

198
00:15:36,230 --> 00:15:40,250
So if you're interested in trying it, I would strongly suggest doing so.

199
00:15:42,350 --> 00:15:44,900
OK, so one small note about how to use dropout.

200
00:15:45,350 --> 00:15:49,850
Now the details are outside the scope of this course, but if you're interested in learning more about

201
00:15:49,850 --> 00:15:54,300
how dropout works, please see the extra reading in the course repo.

202
00:15:55,220 --> 00:15:59,960
One thing you'll just have to accept on faith for now is that it works differently when you are training

203
00:15:59,960 --> 00:16:03,300
your neural network compared to when you are using it for predictions.

204
00:16:03,860 --> 00:16:08,810
So because of this, PyTorch has two modes for your model: train mode and eval mode.

205
00:16:09,380 --> 00:16:14,000
Basically, before you do any training, you should make sure your model is in train mode by calling

206
00:16:14,000 --> 00:16:15,050
model.train().

207
00:16:15,620 --> 00:16:22,070
Before you do predictions, you can switch to eval mode by calling model.eval(). Note that this also

208
00:16:22,070 --> 00:16:26,450
applies to other layers which behave differently during training, such as batch norm, which will be

209
00:16:26,450 --> 00:16:27,320
discussed later.
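
A small sketch of a Dropout layer and the two modes is shown below. The layer sizes and the dropout probability of 0.2 are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Dropout placed between dense layers; p is the probability of
# dropping a node during training.
model = nn.Sequential(
    nn.Linear(20, 50),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(50, 3),
)
x = torch.randn(5, 20)

# Train mode: dropout is active, so nodes are dropped at random.
model.train()
print(model.training)  # True

# Eval mode: dropout is disabled, so predictions are deterministic.
model.eval()
print(model.training)  # False
with torch.no_grad():
    first = model(x)
    second = model(x)
print(torch.equal(first, second))  # True
```

Calling model.train() or model.eval() on the outer model sets the mode for every submodule inside it.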

210
00:16:32,130 --> 00:16:37,170
As a final note for this lecture, I want to give you some foresight into why learning about how to

211
00:16:37,170 --> 00:16:39,080
build custom models is so useful.

212
00:16:39,660 --> 00:16:44,520
We went over one major reason previously, which is that you might need to build a model that contains

213
00:16:44,520 --> 00:16:47,220
operations which are not PyTorch modules.

214
00:16:47,970 --> 00:16:53,490
But so far we've only seen one basic kind of neural network in this course, one that encompasses

215
00:16:53,490 --> 00:16:54,810
linear models, ANNs,

216
00:16:54,810 --> 00:17:01,050
and CNNs. That is, these models always have one input and one output, and each layer is stacked on top

217
00:17:01,050 --> 00:17:01,800
of the last.

218
00:17:02,430 --> 00:17:05,780
Have you ever considered what if I want my model to have branches?

219
00:17:06,300 --> 00:17:08,420
This by definition is not sequential.

220
00:17:08,430 --> 00:17:10,420
So you can't use nn.Sequential.

221
00:17:11,160 --> 00:17:13,800
What if I want my model to have multiple inputs?

222
00:17:14,040 --> 00:17:15,450
Same story, can't use

223
00:17:15,460 --> 00:17:17,680
nn.Sequential because this is not sequential.

224
00:17:18,360 --> 00:17:20,730
What if I want my model to have multiple outputs?

225
00:17:21,120 --> 00:17:23,490
Same story, can't use nn.Sequential.
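
To give a feel for what such a model might look like, here is a hypothetical sketch of a custom module with two inputs and two branches that are merged before the output. The class name, the sizes, and the choice to merge by concatenation are all assumptions for illustration, not a model from this course.

```python
import torch
import torch.nn as nn

class TwoInputModel(nn.Module):
    def __init__(self):
        super().__init__()
        # One branch per input.
        self.branch_a = nn.Linear(10, 8)
        self.branch_b = nn.Linear(6, 8)
        # The head operates on the merged branches.
        self.head = nn.Linear(16, 2)

    def forward(self, a, b):
        # forward() is ordinary Python, so it can take several inputs
        # and combine them in any way, not just in sequence.
        ha = torch.relu(self.branch_a(a))
        hb = torch.relu(self.branch_b(b))
        merged = torch.cat([ha, hb], dim=1)
        return self.head(merged)

model = TwoInputModel()
a = torch.randn(4, 10)
b = torch.randn(4, 6)
out = model(a, b)
print(out.shape)
```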

226
00:17:25,180 --> 00:17:30,220
Now, you might scoff and say, I will never need these types of models. Lazy Programmer is just showing

227
00:17:30,220 --> 00:17:33,810
off that he knows all these complicated things. But in fact, you are wrong.

228
00:17:34,330 --> 00:17:37,060
We will be looking at such models later in this course.