WEBVTT

00:00.510 --> 00:02.970
Hello, and welcome to this Python tutorial.

00:02.970 --> 00:07.140
So now the next step is to make the count_neurons function

00:07.140 --> 00:08.520
which will give us what we want,

00:08.520 --> 00:10.797
that is, this number of neurons

00:10.797 --> 00:14.280
in this huge vector after the convolutions are applied.

00:14.280 --> 00:17.100
That's the only missing information we need right now,

00:17.100 --> 00:19.680
and we are gonna get it with a function.

00:19.680 --> 00:21.840
So let's make this function.

00:21.840 --> 00:26.840
We are going to call it, very simply, count_neurons.

00:27.810 --> 00:30.630
And what is this count_neurons function going

00:30.630 --> 00:32.460
to take as arguments?

00:32.460 --> 00:35.820
Well, it is going to take the object self,

00:35.820 --> 00:37.710
but then it's going to take something else,

00:37.710 --> 00:40.650
because this number of output neurons

00:40.650 --> 00:44.640
in the flattening layer actually only depends on one thing,

00:44.640 --> 00:48.270
it depends on the dimensions of the original input image,

00:48.270 --> 00:50.730
the one that goes at the very beginning

00:50.730 --> 00:52.170
of the neural network.

00:52.170 --> 00:54.420
And so the only argument we need right now,

00:54.420 --> 00:56.010
is actually these dimensions,

00:56.010 --> 00:58.350
the dimensions of the input images.

00:58.350 --> 00:59.790
Therefore, let's give a name

00:59.790 --> 01:02.220
to this argument representing the dimensions

01:02.220 --> 01:03.450
of the input image.

01:03.450 --> 01:07.170
And we are gonna call it image_dim.

01:07.170 --> 01:08.003
Alright.

01:08.003 --> 01:11.340
And I can tell you right now that the actual dimensions

01:11.340 --> 01:16.340
of the input images coming from Doom are gonna be 80 by 80.

01:16.800 --> 01:19.530
We're gonna reduce the size of the original images

01:19.530 --> 01:23.400
to 80 by 80, and that's gonna be the format

01:23.400 --> 01:26.220
of the images going into the neural network.

01:26.220 --> 01:30.060
So image_dim is actually going to be (1, 80, 80),

01:30.060 --> 01:31.620
and the one corresponds to the fact

01:31.620 --> 01:33.810
that we're working with black-and-white (grayscale) images.

01:33.810 --> 01:35.460
That is with only one channel.

01:35.460 --> 01:37.560
So image_dim is going to be equal

01:37.560 --> 01:41.280
to the tuple (1, 80, 80).

01:41.280 --> 01:43.620
Alright. So that's the only argument we need.

01:43.620 --> 01:45.840
And now let's count the neurons.

01:45.840 --> 01:47.460
So how are we going to do that?

01:47.460 --> 01:48.780
Well, first of all,

01:48.780 --> 01:51.210
we actually don't have any input image right now.

01:51.210 --> 01:54.060
We don't have any Doom image that we can import.

01:54.060 --> 01:55.500
We're gonna do that later.

01:55.500 --> 01:58.680
So the first thing we have to do is create a fake image,

01:58.680 --> 02:01.200
but one that has dimensions 80 by 80.

02:01.200 --> 02:04.320
We're gonna create that fake image with fake pixels,

02:04.320 --> 02:05.370
and that will still give us

02:05.370 --> 02:07.020
eventually the number that we want,

02:07.020 --> 02:09.960
because that number only depends on the dimensions,

02:09.960 --> 02:13.320
and not on the pixels that are inside the images.

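NOTE A minimal sketch of why random pixels are enough, assuming a hypothetical first layer nn.Conv2d(1, 32, kernel_size=5) (these sizes are illustrative, not taken from this lesson): the output shape, and therefore the neuron count, depends only on the input shape, not on the pixel values.
```python
import torch
import torch.nn as nn
# Hypothetical first layer: 1 input channel (grayscale), 32 feature maps, 5x5 kernel.
conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5)
# Two "images" with the same shape but completely different pixels.
random_image = torch.rand(1, 1, 80, 80)
black_image = torch.zeros(1, 1, 80, 80)
out_random = conv(random_image)
out_black = conv(black_image)
# Same shape either way: 80 - 5 + 1 = 76 pixels per side.
print(out_random.shape)  # torch.Size([1, 32, 76, 76])
print(out_black.shape)   # torch.Size([1, 32, 76, 76])
```
Swapping the random image for the all-black one changes the values inside the feature maps, but not how many of them there are.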
02:13.320 --> 02:15.720
So let's just create a fake image to start,

02:15.720 --> 02:18.510
and then we will compute the number of neurons that we want.

02:18.510 --> 02:21.947
So the trick to create a fake image is, well,

02:21.947 --> 02:24.870
we are gonna call it X, first of all,

02:24.870 --> 02:28.800
and then we are gonna use torch.rand

02:28.800 --> 02:31.770
because you know we're going to put some random pixels

02:31.770 --> 02:32.603
in this image.

02:32.603 --> 02:35.460
So we're using this random function from torch,

02:35.460 --> 02:37.170
which is the rand function.

02:37.170 --> 02:40.920
Then inside we're gonna input, as you can see,

02:40.920 --> 02:44.550
the dimensions of the image, that is (1, 80, 80).

02:44.550 --> 02:47.250
But since we are going to input this image

02:47.250 --> 02:49.800
into the neural network, and as you remember,

02:49.800 --> 02:53.100
the neural network can only accept batches of input states,

02:53.100 --> 02:55.320
that is here, batches of input images,

02:55.320 --> 02:57.810
we are going to add a fake batch dimension,

02:57.810 --> 03:00.630
which we can directly do in this rand function.

03:00.630 --> 03:02.670
We actually just need to start with a one,

03:02.670 --> 03:04.440
that will correspond to the batch,

03:04.440 --> 03:07.560
and then we can just put the tuple (1, 80, 80),

03:07.560 --> 03:11.100
corresponding to the dimensions of the input image.

03:11.100 --> 03:13.890
And as you understood, these dimensions are contained

03:13.890 --> 03:16.080
in this image_dim argument,

03:16.080 --> 03:19.320
which represents that tuple (1, 80, 80).

03:19.320 --> 03:23.610
So now we just need to add image_dim,

03:23.610 --> 03:26.790
but in order to pass the elements of a tuple,

03:26.790 --> 03:29.190
because you know right now image_dim is a tuple,

03:29.190 --> 03:31.980
as a list of arguments of a function,

03:31.980 --> 03:35.100
we need to add here before image_dim,

03:35.100 --> 03:37.470
that is before the tuple, a star.

03:37.470 --> 03:40.200
The star allows us to pass the elements

03:40.200 --> 03:41.850
of the image_dim tuple

03:41.850 --> 03:44.160
as a list of arguments for a function.

03:44.160 --> 03:46.890
And as you can see, that's exactly what is specified here

03:46.890 --> 03:49.920
with the star and the dimensions.

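NOTE A small sketch of the star trick, using the lesson's image_dim tuple: *image_dim unpacks (1, 80, 80) into three separate arguments, so the leading 1 passed before it becomes the batch dimension.
```python
import torch
image_dim = (1, 80, 80)  # (channels, height, width) of one grayscale image
# *image_dim unpacks the tuple, so this is the same call as torch.rand(1, 1, 80, 80).
x = torch.rand(1, *image_dim)
print(x.shape)  # torch.Size([1, 1, 80, 80]): batch, channels, height, width
# The star works with any function call, for example print:
print(*image_dim)  # 1 80 80
```
torch.rand draws each pixel uniformly from [0, 1), which is fine here because only the shape matters.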
03:49.920 --> 03:53.753
Alright. So that will create an image of fake pixels.

03:53.753 --> 03:56.760
So that will have nothing to do with the Doom images,

03:56.760 --> 03:58.560
but again, we will still be able to get

03:58.560 --> 04:00.510
the final number of neurons.

04:00.510 --> 04:03.090
And now the last thing that we need to do, remember,

04:03.090 --> 04:08.090
is to convert this input batch tensor into a torch Variable,

04:09.270 --> 04:13.050
because this is going to go into the neural network.

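NOTE A sketch of the conversion step, assuming a recent PyTorch release: torch.autograd.Variable is kept for backward compatibility, but since PyTorch 0.4 it returns an ordinary tensor, so on current versions the wrapping is optional.
```python
import torch
from torch.autograd import Variable
image_dim = (1, 80, 80)
# Older PyTorch style, as in this lesson: wrap the fake batch in a Variable.
x = Variable(torch.rand(1, *image_dim))
# On PyTorch 0.4 and later the result is a plain tensor with the same shape.
print(isinstance(x, torch.Tensor))  # True
print(x.shape)                      # torch.Size([1, 1, 80, 80])
```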
04:13.050 --> 04:16.350
Alright, so this now represents an input image

04:16.350 --> 04:17.880
of random pixels,

04:17.880 --> 04:20.280
that was just converted into a torch Variable,

04:20.280 --> 04:22.530
and that will now go into the neural network,

04:22.530 --> 04:23.820
and more specifically,

04:23.820 --> 04:26.730
the convolutional layers of the neural network.

04:26.730 --> 04:29.670
And since we only need the number of neurons

04:29.670 --> 04:31.560
after the convolutions are applied,

04:31.560 --> 04:34.200
we will just go up to convolution three,

04:34.200 --> 04:36.570
so right after the third convolutional layer,

04:36.570 --> 04:39.810
and we will not go into the two full connections here.

04:39.810 --> 04:41.220
And that's because the number of neurons

04:41.220 --> 04:45.510
that we want is between convolution three and FC1.

04:45.510 --> 04:48.300
Alright, so now that we have one input image

04:48.300 --> 04:49.860
with the right dimensions,

04:49.860 --> 04:52.860
well, it's time to propagate this image

04:52.860 --> 04:56.190
into the neural network to reach the flattening layer.

04:56.190 --> 04:59.400
Then we're gonna get the neurons in the flattening layer,

04:59.400 --> 05:01.770
and we will just get the information that we want,

05:01.770 --> 05:04.950
that is, the number of neurons in this flattening layer.

05:04.950 --> 05:06.150
So now what we have to do

05:06.150 --> 05:08.910
is exactly what we do in a forward function,

05:08.910 --> 05:12.120
we need to propagate the signals into the neural network,

05:12.120 --> 05:14.190
but only in the convolutional layers,

05:14.190 --> 05:16.290
until we reach the flattening layer.

05:16.290 --> 05:17.310
So let's do this.

05:17.310 --> 05:19.920
We're going to update X.

05:19.920 --> 05:23.640
Now X is the input image, and with the second X here,

05:23.640 --> 05:27.300
X will become, well, the first convolutional layer.

05:27.300 --> 05:30.180
And now what we have to do is a three-step process.

05:30.180 --> 05:33.810
First step, we apply the convolution to the input images.

05:33.810 --> 05:36.300
Then second step, we apply max pooling

05:36.300 --> 05:38.640
to the obtained convolved images.

05:38.640 --> 05:39.810
And then third step,

05:39.810 --> 05:44.430
we activate the neurons in these pooled, convolved images.

05:44.430 --> 05:47.640
And so X will become this first convolutional layer

05:47.640 --> 05:51.000
composed of all these pooled, convolved images.

05:51.000 --> 05:51.930
So let's do this.

05:51.930 --> 05:54.450
First step, apply the first convolution,

05:54.450 --> 05:57.000
convolution one to the input images.

05:57.000 --> 06:00.450
So what we do is take our convolution one,

06:00.450 --> 06:05.450
self.convolution1, there we go.

06:05.490 --> 06:08.310
We apply it to our input images,

06:08.310 --> 06:11.460
which so far are represented by X.

06:11.460 --> 06:12.690
So that's the first step.

06:12.690 --> 06:13.830
First step done.

06:13.830 --> 06:17.730
Now, second step, we are going to apply max pooling

06:17.730 --> 06:22.470
to our convolved images returned by self.convolution1(X).

06:22.470 --> 06:23.820
And to apply max pooling, well,

06:23.820 --> 06:26.910
we're gonna take a function from the functional module.

06:26.910 --> 06:30.030
So we take F, the shortcut, then a dot,

06:30.030 --> 06:34.770
and then we're gonna use the function max_pool2d.

06:34.770 --> 06:38.490
That's the one. We put self.convolution1(X)

06:38.490 --> 06:41.760
in the parentheses of the max_pool2d function,

06:41.760 --> 06:45.750
because we apply max pooling to the convolved images.

06:45.750 --> 06:49.980
But this max pooling function takes additional arguments

06:49.980 --> 06:53.190
the first one being the kernel size.

06:53.190 --> 06:56.880
So again, that's the size of the window sliding

06:56.880 --> 06:58.050
through your images,

06:58.050 --> 07:00.810
and that will take the maximum of the pixels in each window.

07:00.810 --> 07:02.910
So that will still detect the features,

07:02.910 --> 07:04.710
because the features are associated

07:04.710 --> 07:07.860
to high pixel values in the feature maps,

07:07.860 --> 07:09.780
as you saw in the intuition lectures.

07:09.780 --> 07:12.150
So this first argument here we need to input

07:12.150 --> 07:14.070
is this kernel size.

07:14.070 --> 07:15.630
And we're gonna take three,

07:15.630 --> 07:17.910
that's a common choice for the kernel size.

07:17.910 --> 07:21.000
And then we need to input the stride, you know,

07:21.000 --> 07:25.140
by how many pixels it's going to slide in the images,

07:25.140 --> 07:27.450
and we are gonna take a stride of two.

07:27.450 --> 07:29.790
Again, that's a common choice.

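NOTE A small worked example of max pooling with kernel size 3 and stride 2: the 5x5 input holds the values 0 to 24 (an arbitrary choice so each maximum is easy to check by hand), and each output value is the largest pixel in its 3x3 window.
```python
import torch
import torch.nn.functional as F
# A 5x5 "image" with values 0..24, shaped (batch, channels, height, width).
image = torch.arange(25.0).view(1, 1, 5, 5)
# 3x3 window moved 2 pixels at a time: windows start at rows/cols 0 and 2.
pooled = F.max_pool2d(image, kernel_size=3, stride=2)
# Output side length: (5 - 3) // 2 + 1 = 2, so the result is 2x2.
print(pooled.flatten().tolist())  # [12.0, 14.0, 22.0, 24.0]
```
With stride 2 the windows overlap by one pixel, and the output is roughly half the input size per side.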
07:29.790 --> 07:30.690
So there we go.

07:30.690 --> 07:32.580
Now the second step is done.

07:32.580 --> 07:34.410
And now let's move on to the third step,

07:34.410 --> 07:38.190
which is to activate all the neurons in this pooled

07:38.190 --> 07:41.520
and convolved images in this first convolutional layer.

07:41.520 --> 07:42.900
And to do this, again,

07:42.900 --> 07:46.170
we are going to apply a function to all this

07:46.170 --> 07:48.540
and so here I'm taking F again,

07:48.540 --> 07:50.640
because we're gonna take another function,

07:50.640 --> 07:52.380
which as you might have guessed is going

07:52.380 --> 07:54.300
to be an activation function.

07:54.300 --> 07:55.980
Which one? As usual, it's going

07:55.980 --> 07:58.860
to be a rectifier activation function,

07:58.860 --> 08:02.760
and maybe you remember that its name is ReLU.

08:02.760 --> 08:03.593
There we go.

08:03.593 --> 08:04.426
That's the one.

08:04.426 --> 08:09.290
And so we apply ReLU to our pooled, convolved images,

08:10.320 --> 08:12.510
that is, all of this.

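NOTE A one-line illustration of the rectifier: F.relu computes max(0, v) element by element, so every negative value becomes 0 and the other values pass through unchanged (the input values are arbitrary).
```python
import torch
import torch.nn.functional as F
values = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
activated = F.relu(values)  # negatives are clipped to 0
print(activated.tolist())   # [0.0, 0.0, 0.0, 1.5, 3.0]
```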
08:12.510 --> 08:14.340
Alright, and that's it.

08:14.340 --> 08:15.330
Three steps done.

08:15.330 --> 08:16.470
That was very quick.

08:16.470 --> 08:18.810
So remember, the way we have to look at this,

08:18.810 --> 08:23.160
is first, we apply the convolution to our input images,

08:23.160 --> 08:27.180
then we apply max pooling to our convolved images,

08:27.180 --> 08:28.920
obtained with the convolution,

08:28.920 --> 08:30.840
and then we activate the neurons

08:30.840 --> 08:34.140
in all this pooled convolutional layer

08:34.140 --> 08:36.333
with the rectifier activation function.

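NOTE A sketch of the three steps for the first layer, assuming a hypothetical convolution1 = nn.Conv2d(1, 32, kernel_size=5) (illustrative sizes, not stated in this lesson) and the pooling settings chosen above, kernel size 3 and stride 2. It runs the steps one at a time, then checks that the one-line composition gives the same result.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
# Hypothetical first convolutional layer (sizes are an assumption).
convolution1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5)
x = torch.rand(1, 1, 80, 80)       # fake batch of one 80x80 grayscale image
step1 = convolution1(x)            # 1. convolution: 80 - 5 + 1 = 76 per side
step2 = F.max_pool2d(step1, 3, 2)  # 2. max pooling: (76 - 3) // 2 + 1 = 37 per side
step3 = F.relu(step2)              # 3. ReLU activation: shape unchanged
# The one-liner composes the same three calls (the right side uses the original x).
x = F.relu(F.max_pool2d(convolution1(x), 3, 2))
print(x.shape)  # torch.Size([1, 32, 37, 37])
```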
08:37.200 --> 08:41.130
So perfect, we get our first convolutional layer,

08:41.130 --> 08:42.870
to which max pooling was applied,

08:42.870 --> 08:46.260
and in which the neurons are now activated.

08:46.260 --> 08:47.460
And so basically what it does

08:47.460 --> 08:50.040
is that it propagates the signals

08:50.040 --> 08:52.620
from the first convolutional layer to the next one.

08:52.620 --> 08:54.120
And speaking of the next one,

08:54.120 --> 08:56.177
that's exactly what we're gonna take care of right now.

08:56.177 --> 08:58.020
We are going to do the same thing,

08:58.020 --> 09:00.450
as we just did on the first convolutional layer,

09:00.450 --> 09:02.400
to the second convolutional layer,

09:02.400 --> 09:04.830
to, again, propagate the signals further

09:04.830 --> 09:06.150
into the neural network,

09:06.150 --> 09:09.810
by activating the neurons of the second convolutional layer.

09:09.810 --> 09:10.800
But before doing this,

09:10.800 --> 09:13.110
we need to get this convolutional layer,

09:13.110 --> 09:16.470
and so we are going to apply convolution two to X

09:16.470 --> 09:18.420
that is now the first convolutional layer,

09:18.420 --> 09:20.910
and that is how we are going

09:20.910 --> 09:23.370
to obtain the second convolutional layer,

09:23.370 --> 09:25.200
after which we will be max pooling it,

09:25.200 --> 09:27.960
and then finally activating its neurons.

09:27.960 --> 09:29.130
So let's do this.

09:29.130 --> 09:30.090
It's actually very easy,

09:30.090 --> 09:35.090
we just need to copy that line and paste it below.

09:35.340 --> 09:37.680
Now, of course, we need to replace convolution one

09:37.680 --> 09:40.470
by convolution two, and there we go.

09:40.470 --> 09:42.030
That's actually ready.

09:42.030 --> 09:43.890
See, very easy.

09:43.890 --> 09:45.270
And so now with this line,

09:45.270 --> 09:49.350
we propagate the signals from the second convolutional layer

09:49.350 --> 09:50.250
to the next one,

09:50.250 --> 09:52.680
which is going to be the third convolutional layer.

09:52.680 --> 09:54.690
And to get the third convolutional layer, well,

09:54.690 --> 09:57.210
we need to apply that again.

09:57.210 --> 10:00.150
So I'm copying this, pasting that below,

10:00.150 --> 10:03.900
and replacing convolution two by convolution three.

10:03.900 --> 10:06.330
And that's done. Isn't that practical?

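NOTE A sketch of the copy-paste pattern across all three layers, assuming hypothetical layer sizes nn.Conv2d(1, 32, 5), nn.Conv2d(32, 32, 3) and nn.Conv2d(32, 64, 2); the comments trace the side length of the feature maps after each convolution and each max pooling.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
# Hypothetical layer sizes, chosen only for illustration.
convolution1 = nn.Conv2d(1, 32, kernel_size=5)
convolution2 = nn.Conv2d(32, 32, kernel_size=3)
convolution3 = nn.Conv2d(32, 64, kernel_size=2)
x = torch.rand(1, 1, 80, 80)
x = F.relu(F.max_pool2d(convolution1(x), 3, 2))  # 80 -> conv 76 -> pool 37
x = F.relu(F.max_pool2d(convolution2(x), 3, 2))  # 37 -> conv 35 -> pool 17
x = F.relu(F.max_pool2d(convolution3(x), 3, 2))  # 17 -> conv 16 -> pool 7
print(x.shape)  # torch.Size([1, 64, 7, 7])
```
Each line only changes which convolution is applied; the in_channels of each layer must match the out_channels of the previous one.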
10:06.330 --> 10:10.440
We propagate the signals in the three convolutional layers

10:10.440 --> 10:13.143
in a flash, thanks to this awesome structure.

10:14.010 --> 10:15.330
Alright, so perfect.

10:15.330 --> 10:17.970
Now we have our signals propagated

10:17.970 --> 10:21.360
up to the third convolutional layer,

10:21.360 --> 10:23.220
and what comes right after it leads us

10:23.220 --> 10:26.130
to what we're looking for, what we're interested in,

10:26.130 --> 10:28.500
that is, the flattening layer.

10:28.500 --> 10:32.190
Alright, so now that we have our third convolutional layer,

10:32.190 --> 10:33.780
that's the last X here,

10:33.780 --> 10:36.480
it's time to get our flattening layer.

10:36.480 --> 10:38.340
And so that's exactly what we're gonna do now,

10:38.340 --> 10:41.070
we're going to flatten all the pixels

10:41.070 --> 10:43.620
of this third convolutional layer, that is,

10:43.620 --> 10:46.440
we're gonna take all the pixels of all the channels

10:46.440 --> 10:48.600
of the third convolutional layer.

10:48.600 --> 10:52.110
We're gonna put them one after the other in a huge vector,

10:52.110 --> 10:54.720
and, of course, this huge vector is gonna be nothing else

10:54.720 --> 10:56.280
than the flattening layer.

10:56.280 --> 10:58.860
And at the same time, we will use a trick

10:58.860 --> 11:02.070
to get the number of neurons in this flattening layer.

11:02.070 --> 11:03.750
That's exactly what we're looking for,

11:03.750 --> 11:05.730
that's the number of neurons we're missing.

11:05.730 --> 11:08.850
And therefore, let's directly return what we want.

11:08.850 --> 11:09.810
And in this return,

11:09.810 --> 11:12.660
we are going to flatten this third convolutional layer,

11:12.660 --> 11:15.150
and get at the same time the number of neurons

11:15.150 --> 11:16.590
in this flattening layer.

11:16.590 --> 11:17.970
So we're gonna take X,

11:17.970 --> 11:20.280
which is our third convolutional layer,

11:20.280 --> 11:21.930
we're gonna take all the channels

11:21.930 --> 11:23.850
of the third convolutional layer,

11:23.850 --> 11:25.170
and we're gonna use a function,

11:25.170 --> 11:28.980
which is the view function, to flatten all the pixels

11:28.980 --> 11:32.130
of all these channels into one huge vector.

11:32.130 --> 11:36.090
And the trick, which you can find in the PyTorch tutorials,

11:36.090 --> 11:38.520
is this: first we take the data of X,

11:38.520 --> 11:40.603
because X is a special structure,

11:40.603 --> 11:42.180
you know, it's a torch Variable,

11:42.180 --> 11:44.310
so it has a pretty complex structure,

11:44.310 --> 11:48.150
so first we need to access its underlying tensor with .data here.

11:48.150 --> 11:52.080
Then we need to reshape it.

11:52.080 --> 11:54.180
So we use this view function,

11:54.180 --> 11:56.850
and we give it the shape we're looking for

11:56.850 --> 12:01.850
with the arguments 1 and -1: one row, with -1 telling PyTorch to infer its length.

12:02.190 --> 12:04.500
You don't have to understand what's inside the structure,

12:04.500 --> 12:05.850
but you can just understand

12:05.850 --> 12:09.090
that this is how we're gonna get this number of neurons.

12:09.090 --> 12:12.180
And then to finish, we need to add .size,

12:12.180 --> 12:15.840
then parentheses, and inside we input 1.

12:15.840 --> 12:17.400
So basically what we do here,

12:17.400 --> 12:20.970
is take all the pixels of all the channels,

12:20.970 --> 12:23.970
and we put them one after the other in this huge vector

12:23.970 --> 12:27.150
which will be the input of the fully connected network.

12:27.150 --> 12:29.370
The view call does the flattening, and size(1) gives the length of that vector,

12:29.370 --> 12:32.550
and that length is exactly the number of neurons

12:32.550 --> 12:34.020
that we're looking for.

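NOTE A sketch of the flattening trick on a tensor shaped (1, 64, 7, 7), a hypothetical output of the third convolutional layer: view(1, -1) keeps one row and lets PyTorch infer its length, and size(1) reads that length off.
```python
import torch
x = torch.rand(1, 64, 7, 7)       # hypothetical output of the third convolutional layer
flat = x.data.view(1, -1)         # -1 means "infer this size": 64 * 7 * 7
number_of_neurons = flat.size(1)  # length of dimension 1 of the flattened tensor
print(flat.shape)                 # torch.Size([1, 3136])
print(number_of_neurons)          # 3136
```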
12:34.020 --> 12:36.480
Alright. So now we get what we want.

12:36.480 --> 12:40.380
And so finally, we can replace number neurons here

12:40.380 --> 12:43.380
by what is returned by this function,

12:43.380 --> 12:47.400
when it is applied to the format of the Doom images,

12:47.400 --> 12:50.160
that is one by 80 by 80.

12:50.160 --> 12:53.730
So what we have to do now is replace number neurons

12:53.730 --> 12:58.730
with the count_neurons function,

12:59.160 --> 13:02.790
which we apply to the format of the Doom images,

13:02.790 --> 13:07.790
which will be the tuple (1, 80, 80).

13:09.420 --> 13:10.500
And there we go.

13:10.500 --> 13:12.450
And, of course, we don't forget self,

13:12.450 --> 13:17.130
because count_neurons is actually a method of the CNN class.

13:17.130 --> 13:18.660
So we need to add the self,

13:18.660 --> 13:21.180
and now the warning should disappear.

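NOTE Putting the pieces together, a minimal sketch of the class, assuming hypothetical sizes for the three convolutions, a hidden layer of 40 neurons and a number_actions argument (none of these values come from this lesson). It uses torch.rand directly instead of wrapping it in Variable, which modern PyTorch allows.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
class CNN(nn.Module):
    def __init__(self, number_actions):
        super().__init__()
        # Hypothetical layer sizes, for illustration only.
        self.convolution1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5)
        self.convolution2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3)
        self.convolution3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=2)
        # The flattened size is computed instead of counted by hand.
        self.fc1 = nn.Linear(in_features=self.count_neurons((1, 80, 80)), out_features=40)
        self.fc2 = nn.Linear(in_features=40, out_features=number_actions)
    def count_neurons(self, image_dim):
        # Fake batch of one image with random pixels; only its shape matters.
        x = torch.rand(1, *image_dim)
        x = F.relu(F.max_pool2d(self.convolution1(x), 3, 2))
        x = F.relu(F.max_pool2d(self.convolution2(x), 3, 2))
        x = F.relu(F.max_pool2d(self.convolution3(x), 3, 2))
        # Flatten all channels into one vector and return its length.
        return x.data.view(1, -1).size(1)
brain = CNN(number_actions=7)  # 7 is an arbitrary, hypothetical action count
print(brain.fc1.in_features)   # 3136
```
count_neurons works inside __init__ because the three convolutions are created before fc1 needs the count.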
13:21.180 --> 13:22.500
And there we go.

13:22.500 --> 13:23.940
Now everything is good.

13:23.940 --> 13:26.790
We have the full architecture of the neural network,

13:26.790 --> 13:28.170
with nothing missing,

13:28.170 --> 13:30.240
and we have this count_neurons function,

13:30.240 --> 13:33.330
in case, you know, you want to try some other architectures,

13:33.330 --> 13:36.300
and you don't want to count this number of neurons manually.

13:36.300 --> 13:37.500
You just use this function,

13:37.500 --> 13:40.140
you apply it to the format of your images,

13:40.140 --> 13:42.090
and this will get you directly what you want,

13:42.090 --> 13:44.400
that is the number of neurons in the flattening layer,

13:44.400 --> 13:45.810
without having to compute anything by hand,

13:45.810 --> 13:48.450
and whatever the architecture is.

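NOTE If you want to double-check what count_neurons returns, the side length after a convolution or pooling window with no padding is floor((n - k) / s) + 1, for a window of size k moved s pixels at a time. A pure-Python sketch, assuming hypothetical convolution kernels of 5, 3 and 2 (stride 1), 64 channels in the last layer, and max pooling of size 3, stride 2 after each convolution:
```python
def output_size(n, kernel_size, stride=1):
    """Side length after a convolution or pooling window with no padding."""
    return (n - kernel_size) // stride + 1
def count_neurons_by_hand(side, conv_kernels, out_channels, pool_kernel=3, pool_stride=2):
    """Flattened size after each conv (stride 1) + max-pool stage, for square inputs."""
    for k in conv_kernels:
        side = output_size(side, k)                         # convolution
        side = output_size(side, pool_kernel, pool_stride)  # max pooling
    return out_channels * side * side
# Hypothetical architecture: kernels 5, 3, 2; the last layer has 64 channels.
print(count_neurons_by_hand(80, [5, 3, 2], out_channels=64))  # 3136
```
This hand count has to be redone for every architecture change, which is exactly the bookkeeping count_neurons avoids.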
13:48.450 --> 13:49.470
So that's pretty cool.

13:49.470 --> 13:53.310
And now we are done with the first big important step

13:53.310 --> 13:56.160
of this brain that we're making,

13:56.160 --> 13:59.670
and we have one last step, that is, one last function to make,

13:59.670 --> 14:02.550
which is going to be the main forward function.

14:02.550 --> 14:04.830
So we are gonna propagate the signals

14:04.830 --> 14:06.630
from the beginning of the brain, that is,

14:06.630 --> 14:09.870
from the eyes of the AI up to the output layer,

14:09.870 --> 14:12.420
that is, after the second full connection.

14:12.420 --> 14:14.250
So we'll do that in the next tutorial.

14:14.250 --> 14:16.173
And until then, enjoy AI.
