WEBVTT

00:00.450 --> 00:02.700
Speaker: Hello, and welcome to this Python tutorial.

00:02.700 --> 00:04.680
Now we're gonna make the forward function,

00:04.680 --> 00:06.690
that will forward propagate the signal,

00:06.690 --> 00:08.670
throughout the brain, from the very beginning,

00:08.670 --> 00:11.100
with the input images up to the outputs,

00:11.100 --> 00:13.920
which will contain the Q values for the actor

00:13.920 --> 00:15.060
and the V value.

00:15.060 --> 00:18.210
That is the value taken by the V function, for the critic.

00:18.210 --> 00:21.420
So, it's gonna be quite similar to what we did for Doom

00:21.420 --> 00:23.850
but this time something is gonna change.

00:23.850 --> 00:25.530
The main thing that's gonna change is that,

00:25.530 --> 00:27.480
now we have an LSTM in the brain.

00:27.480 --> 00:29.670
So, we'll have to do something more,

00:29.670 --> 00:32.820
to propagate the signal and be careful with that.

00:32.820 --> 00:34.920
And the other thing, less important,

00:34.920 --> 00:37.590
but still that changes compared to before,

00:37.590 --> 00:40.950
is that we're not gonna use a ReLU activation function,

00:40.950 --> 00:42.907
as you know, the non-linear activation function.

00:42.907 --> 00:45.180
But we're gonna use the ELU,

00:45.180 --> 00:48.930
which is kind of a more, sophisticated ReLU function.

00:48.930 --> 00:51.810
We will see that in the PyTorch documentation.

00:51.810 --> 00:52.950
So let's do this.

00:52.950 --> 00:54.240
Let's make this function.

00:54.240 --> 00:56.010
We start with the def.

00:56.010 --> 00:59.250
It's actually the last function of this (indistinct) class.

00:59.250 --> 01:02.580
So, we're gonna call it forward like last time.

01:02.580 --> 01:06.900
And this forward function is gonna take self, the object,

01:06.900 --> 01:10.140
because we're gonna use the object, and the inputs.

01:10.140 --> 01:13.560
So, it's important to understand what these inputs will be.

01:13.560 --> 01:16.200
This will not only be the input images,

01:16.200 --> 01:18.600
these inputs will also contain the hidden nodes

01:18.600 --> 01:21.090
and the cell nodes of the LSTM.

01:21.090 --> 01:23.130
So, that's why I wanted to highlight,

01:23.130 --> 01:25.170
that some things are gonna change now.

01:25.170 --> 01:28.380
Basically, we're considering, in the forward function,

01:28.380 --> 01:31.170
the hidden nodes and the cell nodes of the LSTM.

01:31.170 --> 01:35.160
And speaking of them, now, what we're gonna do is,

01:35.160 --> 01:38.850
separate these two inputs of this argument inputs,

01:38.850 --> 01:41.670
in the forward function, and how can we separate them?

01:41.670 --> 01:44.070
Well, we can create a new variable,

01:44.070 --> 01:46.200
which will be the input images.

01:46.200 --> 01:48.900
So, that's, this time, the input images.

01:48.900 --> 01:53.900
And, we separate them with the tuple HX and CX,

01:54.810 --> 01:56.700
which is the tuple of the hidden states

01:56.700 --> 01:59.340
and the cell states, of the LSTM.

01:59.340 --> 02:04.340
So, HX are the hidden states, and CX are the cell states.

02:04.530 --> 02:05.363
All right?

02:05.363 --> 02:07.590
And that will be equal to the input,

02:07.590 --> 02:09.510
that is this argument here.
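
NOTE
A minimal sketch of the unpacking step described above, in plain Python.
The function name split_inputs and the placeholder strings are hypothetical;
the only assumption about the argument is that it has the shape (images, (hx, cx)).
```python
def split_inputs(inputs):
    # Nested tuple unpacking: the first element holds the images,
    # the second element is the (hx, cx) tuple of LSTM states.
    images, (hx, cx) = inputs
    return images, hx, cx
# Placeholder strings stand in for real tensors.
images, hx, cx = split_inputs(("frames", ("hidden", "cell")))
```
Inside a forward method the same unpacking would fit on one line at the top of the function.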

02:09.510 --> 02:11.610
So now we've made this separation, and therefore,

02:11.610 --> 02:15.180
we can start to propagate the signal throughout the brain.

02:15.180 --> 02:18.870
And to do that, we are going to get, successively,

02:18.870 --> 02:21.690
the different layers, from the first one to the last one,

02:21.690 --> 02:22.830
by using our connections.

02:22.830 --> 02:26.520
That is the convolutions, the LSTM connection

02:26.520 --> 02:30.450
and the linear connection here, the full connections.

02:30.450 --> 02:33.690
So let's do this. Now, it's gonna be the same as before.

02:33.690 --> 02:37.320
We're gonna get our first layer, that we're gonna call X.

02:37.320 --> 02:38.237
And to get this first layer,

02:38.237 --> 02:41.010
we need to propagate the signal,

02:41.010 --> 02:43.080
from the inputs, to this first layer.

02:43.080 --> 02:45.960
And therefore, we need to use the first convolution.

02:45.960 --> 02:48.180
Because it is this first convolution that propagates

02:48.180 --> 02:52.102
the signal, from the input images, to the first layer.

02:52.102 --> 02:54.418
So what we're gonna do now, is copy this,

02:54.418 --> 02:57.083
because this is the first convolution.

02:57.083 --> 02:59.921
We paste that here, and we apply this first convolution

02:59.921 --> 03:04.921
to our input images, which are now the right inputs.

03:04.980 --> 03:05.813
And there we go.

03:05.813 --> 03:07.230
That propagates the signal,

03:07.230 --> 03:09.810
from the input images to the first layer.

03:09.810 --> 03:11.490
But now, remember, we have to use

03:11.490 --> 03:13.530
a non-linear activation function

03:13.530 --> 03:15.780
to break the linearity,

03:15.780 --> 03:16.980
in order to be able to learn

03:16.980 --> 03:20.310
the non-linear relationships, inside the images.

03:20.310 --> 03:23.490
And to do this, we are gonna use, as we said,

03:23.490 --> 03:25.920
the ELU activation function, that we're about to

03:25.920 --> 03:28.170
see right now, in the PyTorch doc.

03:28.170 --> 03:30.090
But before that, let's get it.

03:30.090 --> 03:31.710
So to get it, it's like ReLU,

03:31.710 --> 03:33.540
we take the functional module,

03:33.540 --> 03:37.560
which has a shortcut, F, then dot, and then ELU.

03:37.560 --> 03:40.740
And then we put all this in parenthesis

03:40.740 --> 03:45.510
because we want to, non-linearly, activate the neurons

03:45.510 --> 03:47.490
of this first layer here.

03:47.490 --> 03:49.860
That we obtained, by applying the first convolution

03:49.860 --> 03:51.150
on the inputs.
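
NOTE
A rough sketch of this first step, assuming PyTorch is installed.
The layer settings (1 input channel, 32 feature maps, 3x3 kernel, stride 2, padding 1)
and the 42x42 frame size are assumptions for illustration; the video does not state them here.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
# Assumed first convolution; in the class it would live on the object as self.conv1.
conv1 = nn.Conv2d(1, 32, 3, stride=2, padding=1)
# Hypothetical batch of one grayscale 42x42 frame.
images = torch.zeros(1, 1, 42, 42)
# Apply the convolution, then the ELU non-linearity from the functional module.
x = F.elu(conv1(images))
```
With these assumed settings, x holds 32 feature maps of size 21x21.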

03:51.150 --> 03:53.550
So now let's go to the PyTorch doc,

03:53.550 --> 03:55.320
to understand what ELU is.

03:55.320 --> 03:56.190
Here it is.

03:56.190 --> 04:00.870
So, you can access it at pytorch.org/./nn.html,

04:02.280 --> 04:05.490
and then you have to find non-linear activations.

04:05.490 --> 04:07.380
And in the non-linear activation function,

04:07.380 --> 04:10.680
you will find, well ReLU, that's the classic one we know.

04:10.680 --> 04:12.810
So it's just a max of zero and x.

04:12.810 --> 04:14.250
You have the graph in mind.

04:14.250 --> 04:17.160
Then you have ReLU6, which is this one.

04:17.160 --> 04:19.530
So, a little more sophisticated.

04:19.530 --> 04:21.540
And there we go, we have ELU.

04:21.540 --> 04:22.440
And as you can see,

04:22.440 --> 04:26.580
ELU is a ReLU, plus an additional element.

04:26.580 --> 04:29.340
So, it's like a more sophisticated ReLU.

04:29.340 --> 04:30.510
And so that's the one we use

04:30.510 --> 04:33.000
to non-linearly, activate the neurons

04:33.000 --> 04:34.140
in the different layers.

04:34.140 --> 04:36.690
And by the way, this ELU activation function,

04:36.690 --> 04:39.450
is called, the exponential linear unit.
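
NOTE
To make "a ReLU plus an additional element" concrete, here is a small scalar sketch in plain Python.
It follows the usual ELU definition, max(0, x) + min(0, alpha * (exp(x) - 1)), with alpha = 1 as the assumed default.
The function names are illustrative only; in the tutorial code the PyTorch function F.elu does this work.
```python
import math
def relu(x):
    # ReLU: the max of zero and x.
    return max(0.0, x)
def elu(x, alpha=1.0):
    # ELU: the ReLU term plus a smooth negative branch for x below zero.
    return max(0.0, x) + min(0.0, alpha * (math.exp(x) - 1.0))
```
For positive inputs both functions agree; for negative inputs ReLU returns zero while ELU decays smoothly toward -alpha.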

04:39.450 --> 04:40.290
So there we go.

04:40.290 --> 04:44.430
We apply the ELU on the first convolutional layer

04:44.430 --> 04:46.320
and now, things are gonna be easy.

04:46.320 --> 04:48.030
We're gonna proceed to the next,

04:48.030 --> 04:50.010
forward propagation of the signal,

04:50.010 --> 04:52.980
which is from, the first convolutional layer

04:52.980 --> 04:55.470
to the second convolutional layer,

04:55.470 --> 04:57.510
which we are gonna call X

04:57.510 --> 05:00.150
because, basically, we're just updating X.

05:00.150 --> 05:02.250
Now X is the first convolutional layer.

05:02.250 --> 05:03.900
And by propagating the signal,

05:03.900 --> 05:06.840
from the first convolutional layer to the next one,

05:06.840 --> 05:09.450
X will become the next convolutional layer.

05:09.450 --> 05:11.190
And so to propagate the signal,

05:11.190 --> 05:13.845
from the first convolutional layer to the second one,

05:13.845 --> 05:17.310
we can simply, copy this and paste that here

05:17.310 --> 05:20.490
and replace conv1 by conv2.

05:20.490 --> 05:22.440
And now, of course, the second convolution

05:22.440 --> 05:24.450
is not applied to the input images,

05:24.450 --> 05:28.560
but to X, that is the first convolutional layer,

05:28.560 --> 05:29.820
which is right here.

05:29.820 --> 05:30.780
All right, perfect.

05:30.780 --> 05:33.000
Now we get our second convolutional layer.

05:33.000 --> 05:35.490
And now let's propagate the signal, again,

05:35.490 --> 05:38.220
from the second convolutional layer, to the third one.

05:38.220 --> 05:40.830
And, therefore, we can directly copy this

05:40.830 --> 05:43.170
and paste that here.

05:43.170 --> 05:45.990
And replace conv2 by conv3.

05:45.990 --> 05:46.980
There we go.

05:46.980 --> 05:49.290
And last one, now to propagate the signal,

05:49.290 --> 05:52.380
from the third convolutional layer, to the fourth one.

05:52.380 --> 05:56.040
And last one, we can just copy this again, paste that here.

05:56.040 --> 05:58.800
And replace conv3 by conv4.

05:58.800 --> 06:00.210
There we go.

06:00.210 --> 06:01.830
So, let's recap.

06:01.830 --> 06:03.450
We start with our inputs.

06:03.450 --> 06:04.920
We apply the first convolution

06:04.920 --> 06:07.260
to get the first convolutional layer.

06:07.260 --> 06:08.910
Then we apply the second convolution

06:08.910 --> 06:10.560
to the first convolutional layer,

06:10.560 --> 06:12.900
to obtain the second convolutional layer.

06:12.900 --> 06:14.550
Then we apply the third convolution,

06:14.550 --> 06:16.260
to the second convolutional layer,

06:16.260 --> 06:18.540
to obtain the third convolutional layer.

06:18.540 --> 06:20.970
And finally, we apply the fourth convolution

06:20.970 --> 06:22.770
to the third convolutional layer,

06:22.770 --> 06:25.470
to obtain the fourth convolutional layer.

06:25.470 --> 06:27.600
And that's how the signal is propagated,

06:27.600 --> 06:30.090
throughout the eyes of the AI.
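
NOTE
The recap can be traced numerically with the standard convolution output-size formula.
This is only a sketch: the 42x42 input and the kernel 3, stride 2, padding 1 settings are assumptions,
chosen because they end at 3x3, which matches the 32x3x3 figure used in the flattening step.
```python
def conv_out_size(size, kernel=3, stride=2, padding=1):
    # Output height/width of a convolution with dilation 1.
    return (size + 2 * padding - kernel) // stride + 1
sizes = [42]  # assumed input height and width
for _ in range(4):  # four convolutions applied one after another
    sizes.append(conv_out_size(sizes[-1]))
# sizes is now [42, 21, 11, 6, 3]
```
Each convolution roughly halves the spatial size, so four of them reduce the assumed 42x42 frame to 3x3.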

06:30.090 --> 06:30.923
So there we go.

06:30.923 --> 06:34.560
We have, now the output signal, after the four convolutions.

06:34.560 --> 06:35.640
And now you know what to do.

06:35.640 --> 06:38.610
We need to expand this whole output signal

06:38.610 --> 06:40.890
into a one-dimensional vector.

06:40.890 --> 06:42.840
That's the flattening step.

06:42.840 --> 06:45.270
So, we're gonna update X again.

06:45.270 --> 06:49.230
X now will become this flattened one-dimensional vector.

06:49.230 --> 06:52.410
And to do this, that's the same, we need to take X,

06:52.410 --> 06:57.410
which is, so far, the fourth convolution layer, X, dot.

06:57.510 --> 07:00.000
Then we use the view function,

07:00.000 --> 07:02.520
and we first input minus one,

07:02.520 --> 07:05.490
to say that, we want a one dimensional vector.

07:05.490 --> 07:06.930
And then as a second argument,

07:06.930 --> 07:10.560
we need to input the number of elements in this vector.

07:10.560 --> 07:13.620
And that is, remember, 32x3x3,

07:13.620 --> 07:16.323
and therefore, we input here 32x3x3.

07:20.250 --> 07:21.083
There we go.

07:21.083 --> 07:22.680
Now we have our flattened vector

07:22.680 --> 07:24.660
and the flattening step is done.
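
NOTE
A short sketch of the flattening step, assuming PyTorch is installed.
The batch size of 5 is hypothetical. Strictly speaking, the -1 asks view to infer that dimension
(here the batch size), while 32 * 3 * 3 fixes the number of elements per flattened vector.
```python
import torch
# Hypothetical batch of 5 outputs from the fourth convolution: 32 maps of 3x3.
x = torch.zeros(5, 32, 3, 3)
# -1 lets PyTorch infer the first dimension; each row has 32 * 3 * 3 = 288 elements.
flat = x.view(-1, 32 * 3 * 3)
```
The result has one row of 288 values per input in the batch.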

07:24.660 --> 07:28.230
Perfect. Now let's take care of the LSTM part.

07:28.230 --> 07:29.850
So, as you understood,

07:29.850 --> 07:33.180
the LSTM takes as input, the flattened vector,

07:33.180 --> 07:37.170
this one dimensional vector of 32x3x3 elements.

07:37.170 --> 07:40.475
So, it's already ready and well prepared for the LSTM.

07:40.475 --> 07:42.360
The LSTM is now ready to take,

07:42.360 --> 07:44.520
this flattened vector as input.

07:44.520 --> 07:47.490
And, therefore, we can take our LSTM

07:47.490 --> 07:52.490
and input, as first argument, X, the already flattened vector.

07:53.130 --> 07:56.250
That is, this X right here, that we just expanded.

07:56.250 --> 08:00.870
But, also, and that's where this tuple comes into play.

08:00.870 --> 08:05.870
We need to input HX and CX, and we can use HX and CX here,

08:06.450 --> 08:08.190
because we made that separation,

08:08.190 --> 08:12.960
from the original input argument of the forward function.

08:12.960 --> 08:15.900
So LSTM X, the flattened output vector,

08:15.900 --> 08:17.550
after the four convolutions

08:17.550 --> 08:20.490
and this tuple of the hidden and the cell node.

08:20.490 --> 08:21.420
So there we go.

08:21.420 --> 08:22.800
Then we must not forget the self,

08:22.800 --> 08:26.190
because LSTM is a variable of our init function.

08:26.190 --> 08:28.290
So a variable attached to the object.

08:28.290 --> 08:30.090
So self.lstm.

08:30.090 --> 08:33.240
And this, will actually, return two outputs,

08:33.240 --> 08:34.680
a tuple of two outputs,

08:34.680 --> 08:36.570
which will be the output hidden node

08:36.570 --> 08:37.860
and the output cell node.

08:37.860 --> 08:39.240
So it's actually a tuple.

08:39.240 --> 08:43.500
And, therefore, we can update HX, the hidden node

08:43.500 --> 08:47.190
and CX, the cell node, because that's exactly

08:47.190 --> 08:48.993
the output of this LSTM here.
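
NOTE
A hedged sketch of the LSTM step, assuming PyTorch is installed.
It assumes the layer is an nn.LSTMCell, which takes (input, (hx, cx)) and returns the new (hx, cx) tuple,
and it assumes 256 hidden units; neither detail is stated in this part of the video.
```python
import torch
import torch.nn as nn
# Assumed LSTM cell: 32 * 3 * 3 inputs, 256 hidden units (the 256 is an assumption).
lstm = nn.LSTMCell(32 * 3 * 3, 256)
x = torch.zeros(1, 32 * 3 * 3)  # flattened vector for a batch of one
hx = torch.zeros(1, 256)  # initial hidden state
cx = torch.zeros(1, 256)  # initial cell state
# One step of the cell returns the updated hidden and cell states.
hx, cx = lstm(x, (hx, cx))
```
Both returned states keep the shape (batch, hidden units), so they can be fed back in at the next step.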

08:50.010 --> 08:52.200
Great. So, we are almost done.

08:52.200 --> 08:55.470
Now that we have the output of the LSTM,

08:55.470 --> 08:57.690
we need to get the useful output

08:57.690 --> 09:00.360
because, actually, only the hidden nodes are useful

09:00.360 --> 09:01.860
and, therefore, we're gonna get it,

09:01.860 --> 09:06.403
by updating X again and X will now be equal to HX.

09:07.466 --> 09:09.840
It's the first element of the output tuple

09:09.840 --> 09:12.660
of the LSTM, X equals HX.

09:12.660 --> 09:14.250
And we're almost done.

09:14.250 --> 09:17.280
Remember that we have two brains, one brain for the actor

09:17.280 --> 09:18.840
and one brain for the critic.

09:18.840 --> 09:21.750
And, therefore, we have two output signals to return,

09:21.750 --> 09:23.490
the output signal of the actor

09:23.490 --> 09:25.530
and the output signal of the critic.

09:25.530 --> 09:27.480
And, therefore, now what we're gonna do,

09:27.480 --> 09:29.550
is return these two output signals.

09:29.550 --> 09:30.810
And how can we do that?

09:30.810 --> 09:32.010
Well, that's very easy.

09:32.010 --> 09:35.460
We simply need to take our linear full connections

09:35.460 --> 09:37.980
but separately, that is a linear full connection

09:37.980 --> 09:41.160
of the critic and the linear full connection of the actor.

09:41.160 --> 09:45.120
And we apply each of these full connections to the output X.

09:45.120 --> 09:47.550
That is the useful output of the LSTM.

09:47.550 --> 09:48.600
And that will be all.

09:48.600 --> 09:50.460
That will be the output signal.

09:50.460 --> 09:51.293
So there we go.

09:51.293 --> 09:52.126
Let's do it.

09:52.126 --> 09:54.480
We first take self, our object,

09:54.480 --> 09:57.480
then we get the linear full connection of the critic,

09:57.480 --> 10:01.050
which is critic_linear.

10:01.050 --> 10:04.500
Which we apply to X, the output signal of the LSTM.

10:04.500 --> 10:07.980
And then same, we take self again, then dot.

10:07.980 --> 10:10.470
And then we take the linear full connection of the actor,

10:10.470 --> 10:15.470
which is actor_linear, which, same, we apply to X.

10:16.290 --> 10:17.133
There we go.

10:18.000 --> 10:19.710
So that's the main thing we need.

10:19.710 --> 10:21.240
But then we're also going to return

10:21.240 --> 10:25.500
the tuple of HX, the hidden nodes, and CX, the cell nodes,

10:25.500 --> 10:27.090
because we'll be using them later

10:27.090 --> 10:29.490
in the retro loop of the LSTM.
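
NOTE
Putting the steps together, the whole forward pass might look like the sketch below, assuming PyTorch is installed.
The class name ActorCriticSketch, the constructor arguments, the convolution settings,
the 42x42 input and the 256 hidden units are assumptions; the 32 * 3 * 3 flattened size,
the ELU activations and the names critic_linear and actor_linear follow the walkthrough.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
class ActorCriticSketch(nn.Module):
    def __init__(self, num_inputs, num_actions):
        super().__init__()
        # Four convolutions (assumed: 32 maps, 3x3 kernel, stride 2, padding 1).
        self.conv1 = nn.Conv2d(num_inputs, 32, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
        self.conv4 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
        # Assumed LSTM cell with 256 hidden units, followed by the two linear heads.
        self.lstm = nn.LSTMCell(32 * 3 * 3, 256)
        self.critic_linear = nn.Linear(256, 1)
        self.actor_linear = nn.Linear(256, num_actions)
    def forward(self, inputs):
        # Split the images from the tuple of hidden and cell states.
        inputs, (hx, cx) = inputs
        x = F.elu(self.conv1(inputs))
        x = F.elu(self.conv2(x))
        x = F.elu(self.conv3(x))
        x = F.elu(self.conv4(x))
        # Flattening step.
        x = x.view(-1, 32 * 3 * 3)
        # The LSTM step updates both states; only the hidden state feeds the heads.
        hx, cx = self.lstm(x, (hx, cx))
        x = hx
        return self.critic_linear(x), self.actor_linear(x), (hx, cx)
# Usage with a hypothetical single 42x42 grayscale frame and 4 possible actions.
model = ActorCriticSketch(num_inputs=1, num_actions=4)
state = (torch.zeros(1, 1, 42, 42), (torch.zeros(1, 256), torch.zeros(1, 256)))
value, action_scores, (hx, cx) = model(state)
```
The critic head returns one value per input, the actor head returns one score per action,
and the returned (hx, cx) tuple can be passed back in with the next frame.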

10:29.490 --> 10:30.480
All right, perfect.

10:30.480 --> 10:33.000
So, now we're done with the brain,

10:33.000 --> 10:34.260
or should I say the brains?

10:34.260 --> 10:36.270
Because we actually made two brains.

10:36.270 --> 10:38.220
One for the actor and one for the critic.

10:38.220 --> 10:42.120
So, congratulations for making the A3C brains.

10:42.120 --> 10:43.920
I hope that wasn't too overwhelming,

10:43.920 --> 10:46.800
to combine a CNN and LSTM.

10:46.800 --> 10:48.300
But at least, the good news is that,

10:48.300 --> 10:51.300
we're really working with the best and most powerful model.

10:51.300 --> 10:52.980
So, there we go.

10:52.980 --> 10:56.850
We are actually done with this first file, model.py.

10:56.850 --> 10:58.050
And so in the next tutorial,

10:58.050 --> 11:00.270
we'll take care of the optimizer

11:00.270 --> 11:03.300
because we're gonna make a separate optimizer.

11:03.300 --> 11:05.130
We're not going to code each line of code

11:05.130 --> 11:08.250
because a lot of it comes from the research papers.

11:08.250 --> 11:10.500
And this is actually pretty specific.

11:10.500 --> 11:13.920
And if we go into the deep details of what's going on

11:13.920 --> 11:17.430
with this optimizer, this might be a little too overwhelming

11:17.430 --> 11:18.750
for what's gonna happen next.

11:18.750 --> 11:22.260
Because we still have the train function to make,

11:22.260 --> 11:24.120
which will be a huge function,

11:24.120 --> 11:27.000
and that contains the algorithm of the A3C.

11:27.000 --> 11:29.280
So trust me, you want to keep some energy for that.

11:29.280 --> 11:32.490
And, therefore, we will not spend too much time on this.

11:32.490 --> 11:34.110
But still, I will explain the code

11:34.110 --> 11:36.330
and you will understand the whole idea

11:36.330 --> 11:38.160
behind this optimizer.

11:38.160 --> 11:41.310
So, congrats again, for making this ActorCritic class

11:41.310 --> 11:44.730
and I'll see you in the next tutorial to make the optimizer.

11:44.730 --> 11:46.203
Until then, enjoy AI.