WEBVTT

00:00.420 --> 00:02.820
Hello, and welcome to the very last step

00:02.820 --> 00:04.830
of Part 1, Building the AI.

00:04.830 --> 00:06.780
Now the only thing left for us to do

00:06.780 --> 00:09.180
is to make this big forward function

00:09.180 --> 00:12.330
that will propagate the signal from the very beginning

00:12.330 --> 00:15.570
when the brain is getting the image to the very end

00:15.570 --> 00:17.370
when the AI plays the action.

00:17.370 --> 00:19.140
So we're gonna make this whole function

00:19.140 --> 00:21.270
and that's gonna be our last step

00:21.270 --> 00:23.370
before we move on to Part 2,

00:23.370 --> 00:26.850
Training our AI with Deep Convolutional Q-Learning.

00:26.850 --> 00:27.707
So let's do this.

00:27.707 --> 00:30.960
We are gonna use the __call__ function,

00:30.960 --> 00:33.720
which is actually similar to the __init__ function.

00:33.720 --> 00:35.760
That is, it's a special method that already exists in Python,

00:35.760 --> 00:38.790
but this time we use it to call some other functions,

00:38.790 --> 00:40.470
the ones that we made before.

00:40.470 --> 00:41.550
Because you know we're gonna use

00:41.550 --> 00:43.260
the forward function from the brain

00:43.260 --> 00:45.600
and the forward function from the body.

00:45.600 --> 00:48.150
And so we're using this __call__ function now

00:48.150 --> 00:50.198
to basically call these functions.

00:50.198 --> 00:53.160
So __call__ is gonna take two arguments.

00:53.160 --> 00:56.010
The first one is self, of course, the object.

00:56.010 --> 00:59.070
And a second argument. According to you,

00:59.070 --> 01:00.330
what is it going to be?

01:00.330 --> 01:02.880
Well, we are doing the whole propagation this time.

01:02.880 --> 01:05.160
So what we want to take as input

01:05.160 --> 01:06.960
is of course the input images,

01:06.960 --> 01:09.030
because of course that's the starting point,

01:09.030 --> 01:10.830
when the AI is playing the game.

01:10.830 --> 01:13.320
It first sees the images of the game,

01:13.320 --> 01:15.420
then propagates the signals in the brain

01:15.420 --> 01:17.160
and then plays the action.

01:17.160 --> 01:20.610
Therefore, the second argument is going to be inputs.
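
NOTE
The call mechanism can be sketched in plain Python. This is a minimal sketch, not the course's code: defining __call__ makes an AI instance callable like a function, and the method chains the brain and the body stored by __init__. The two lambdas are hypothetical stand-ins for the brain and the body, used only so the sketch runs on its own.
```python
class AI:
    """Assembles a brain and a body into one callable object."""
    def __init__(self, brain, body):
        self.brain = brain  # maps input images to output signals
        self.body = body    # maps output signals to actions
    def __call__(self, inputs):
        # Python runs this method when the instance itself is called: ai(inputs)
        output = self.brain(inputs)
        return self.body(output)
# Hypothetical stand-ins: the "brain" doubles each value, the "body" picks the index of the largest one.
ai = AI(brain=lambda xs: [2 * x for x in xs], body=lambda qs: qs.index(max(qs)))
action = ai([0.1, 0.7, 0.2])  # equivalent to ai.__call__([0.1, 0.7, 0.2])
print(action)  # 1
```
Because __call__ only chains the two stored callables, the same AI object works with any brain and body that agree on the shape of the output signal.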

01:20.610 --> 01:24.437
And now we are ready to make this whole propagation.

01:24.437 --> 01:25.980
So let's do this.

01:25.980 --> 01:27.870
Okay, so what is the first step?

01:27.870 --> 01:32.190
The first step is receiving the input images from the game.

01:32.190 --> 01:35.640
And since these images are gonna enter the neural network

01:35.640 --> 01:36.480
as you can imagine,

01:36.480 --> 01:39.720
that we have to format them in a special structure

01:39.720 --> 01:42.780
and the structure is of course a torch structure.

01:42.780 --> 01:44.850
So the first thing that will happen is that

01:44.850 --> 01:48.150
we will convert these images into a NumPy array.

01:48.150 --> 01:51.072
Then we will convert the NumPy array into a torch tensor.

01:51.072 --> 01:53.670
And then finally, we will put the torch tensor

01:53.670 --> 01:56.310
inside a torch variable that will contain

01:56.310 --> 01:58.320
both the tensor and a gradient.

01:58.320 --> 02:01.530
That's for our dynamic graphs to compute very efficiently.

02:01.530 --> 02:04.590
the gradients later, during stochastic gradient descent.

02:04.590 --> 02:06.270
So that's our first step.

02:06.270 --> 02:09.649
And then once we get the right format of our images,

02:09.649 --> 02:12.360
well they will be able to enter the neural network.

02:12.360 --> 02:13.440
And then that's where we'll do

02:13.440 --> 02:16.350
the whole propagation of the signals.

02:16.350 --> 02:17.820
So let's do this first step.

02:17.820 --> 02:20.400
Converting the images into the right format.

02:20.400 --> 02:23.160
So far, our images are the inputs.

02:23.160 --> 02:24.930
So now we're going to create a new variable

02:24.930 --> 02:26.790
which I'm calling input.

02:26.790 --> 02:29.520
So that's the real input of the neural network.

02:29.520 --> 02:31.890
And this input, what is it going to be?

02:31.890 --> 02:34.500
Well, first we need to take our inputs,

02:34.500 --> 02:36.810
that is our original images.

02:36.810 --> 02:40.800
Then as we said, we want to convert these images

02:40.800 --> 02:42.330
into NumPy arrays.

02:42.330 --> 02:45.270
So to do this, we can simply take NumPy,

02:45.270 --> 02:49.500
which we imported as np, then the array function.

02:49.500 --> 02:52.860
So we put inputs inside the parentheses of the array function.

02:52.860 --> 02:53.693
There we go.

02:53.693 --> 02:56.160
Now it is converted into a NumPy array.

02:56.160 --> 02:58.410
But then since the cells of the NumPy arrays

02:58.410 --> 03:01.530
will contain the pixels, it is actually safer

03:01.530 --> 03:04.470
to specify the type float.

03:04.470 --> 03:07.290
It's better to make sure we have some floats right now.

03:07.290 --> 03:12.290
And to make sure of it, we can add dtype = np.float32 here.

03:12.720 --> 03:15.570
All right, so now we still have a NumPy array

03:15.570 --> 03:17.790
but with the type float.

03:17.790 --> 03:19.590
All right, and that's also for another reason.

03:19.590 --> 03:24.240
It's that tensors are by definition arrays of a single type.

03:24.240 --> 03:28.050
And so we choose this single type to be a float, float32.
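
NOTE
A minimal NumPy sketch of this step, assuming the frames arrive as 8-bit grayscale arrays; the 1x80x80 frame size and the pixel values are hypothetical. np.array stacks the list of frames into one array, and dtype=np.float32 gives every cell the same single type.
```python
import numpy as np
# Hypothetical batch: two 1x80x80 grayscale frames stored as 8-bit integers (0 to 255).
frames = [np.zeros((1, 80, 80), dtype=np.uint8), np.full((1, 80, 80), 255, dtype=np.uint8)]
# np.array stacks the list into one array; dtype=np.float32 casts every cell to 32-bit float.
batch = np.array(frames, dtype=np.float32)
print(batch.shape, batch.dtype)  # (2, 1, 80, 80) float32
```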

03:28.050 --> 03:30.120
All right, so now that we have our NumPy arrays,

03:30.120 --> 03:33.960
the next step is to convert that into a torch tensor.

03:33.960 --> 03:36.570
And to do this, we can use for example,

03:36.570 --> 03:41.570
the torch.from_numpy function,

03:42.570 --> 03:45.420
which will convert that into a torch tensor.

03:45.420 --> 03:46.253
There we go.

03:46.253 --> 03:49.260
And now the last step is to put these torch tensors

03:49.260 --> 03:51.480
into a torch variable containing

03:51.480 --> 03:53.370
both the tensor and the gradient.

03:53.370 --> 03:54.960
And you know how to do it.

03:54.960 --> 03:59.250
Of course, we take our Variable class,

03:59.250 --> 04:02.880
because everything we put inside this input variable

04:02.880 --> 04:05.760
is actually the argument of the Variable class.

04:05.760 --> 04:07.740
But I wanted to show that to you this way

04:07.740 --> 04:10.740
because you know we start with our input images,

04:10.740 --> 04:12.930
then we convert them into NumPy arrays,

04:12.930 --> 04:16.110
then to torch tensors, and then to a Variable.

04:16.110 --> 04:17.190
And now we're good.

04:17.190 --> 04:19.980
They are allowed to enter the neural network.

04:19.980 --> 04:22.080
That is first the eyes of the AI,

04:22.080 --> 04:24.060
and then the fully connected layers

04:24.060 --> 04:26.190
to lead to the predictions.
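
NOTE
The three conversions can be sketched as follows, assuming a single hypothetical 1x80x80 frame with random values. torch.from_numpy keeps the float32 type and shares memory with the NumPy array. Variable is the legacy wrapper named in this lecture; since PyTorch 0.4 it simply returns a tensor, because tensors now carry autograd information themselves.
```python
import numpy as np
import torch
from torch.autograd import Variable  # legacy API, kept to mirror the lecture
# Hypothetical inputs: a list holding one 1x80x80 frame with values in [0, 1).
inputs = [np.random.rand(1, 80, 80)]
array = np.array(inputs, dtype=np.float32)  # step 1: NumPy array of float32
tensor = torch.from_numpy(array)            # step 2: torch tensor sharing the array's memory
input_var = Variable(tensor)                # step 3: autograd wrapper (a plain tensor in modern PyTorch)
print(input_var.shape, input_var.dtype)     # torch.Size([1, 1, 80, 80]) torch.float32
```
The variable is named input_var here only to avoid shadowing Python's built-in input function.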

04:26.190 --> 04:28.380
So speaking of the eyes of the AI,

04:28.380 --> 04:30.480
that's exactly what we're gonna do now.

04:30.480 --> 04:34.500
We're gonna propagate these allowed images now,

04:34.500 --> 04:35.790
into the eyes of the AI,

04:35.790 --> 04:38.820
that is through the three convolutional layers.

04:38.820 --> 04:41.640
And to do this, you're gonna see how simple it is.

04:41.640 --> 04:43.680
That's because we already have our brain

04:43.680 --> 04:46.290
and our body from the __init__ function.

04:46.290 --> 04:49.080
We simply need to take our brain,

04:49.080 --> 04:54.080
so self.brain, and apply this brain to the input images.

04:55.230 --> 04:58.950
Thanks to the forward function

04:58.950 --> 05:02.190
of the brain, this will propagate the signals

05:02.190 --> 05:03.420
inside the brain.

05:03.420 --> 05:05.670
And since the forward function of the brain

05:05.670 --> 05:07.470
returns the output signals,

05:07.470 --> 05:09.780
that is the neurons of the output layer

05:09.780 --> 05:11.400
containing the Q values.

05:11.400 --> 05:14.490
Well, this self.brain(input) here

05:14.490 --> 05:16.440
will return this output signal.

05:16.440 --> 05:18.300
And therefore we're gonna put here

05:18.300 --> 05:20.310
whatever it returns into a variable,

05:20.310 --> 05:23.253
and we're gonna call it very simply output.

05:24.210 --> 05:26.880
And this output is the output signal of the brain.
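
NOTE
Calling self.brain(input) works because the brain is an nn.Module: calling a module runs its forward method (plus any registered hooks). A minimal sketch with a hypothetical TinyBrain, one convolution for the eyes and one linear layer for the Q-values; the layer sizes and the 7 actions are assumptions, not the course's exact CNN.
```python
import torch
import torch.nn as nn
class TinyBrain(nn.Module):
    """Hypothetical stand-in for the brain: images in, one Q-value per action out."""
    def __init__(self, number_actions=7):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=5)  # 80x80 becomes 76x76
        self.fc = nn.Linear(8 * 76 * 76, number_actions)
    def forward(self, x):
        x = torch.relu(self.conv(x))
        return self.fc(x.flatten(start_dim=1))  # flatten each image, then one Q-value per action
brain = TinyBrain()
images = torch.rand(1, 1, 80, 80)  # a batch of one hypothetical frame
output = brain(images)             # nn.Module.__call__ dispatches to forward(images)
print(output.shape)                # torch.Size([1, 7])
```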

05:26.880 --> 05:29.940
And now, now that we have the output signal of the brain,

05:29.940 --> 05:32.970
well, we have to propagate this output signal to the body.

05:32.970 --> 05:36.060
And to do this, we're gonna use the second forward function

05:36.060 --> 05:37.020
from the body.

05:37.020 --> 05:40.510
And to do this, we simply need to take our body

05:41.730 --> 05:44.730
and apply it, of course, to the output,

05:44.730 --> 05:47.100
because the forward function of the body

05:47.100 --> 05:50.640
takes as input the output signals of the brain.

05:50.640 --> 05:54.150
So that's exactly what the output is right now

05:54.150 --> 05:55.950
and returns the actions.

05:55.950 --> 05:58.320
And therefore, since it returns the actions,

05:58.320 --> 06:00.450
well here we're gonna add

06:00.450 --> 06:03.840
actions = self.body(output).

06:03.840 --> 06:06.150
All right, so now you can see that very simply

06:06.150 --> 06:08.850
we propagated the signals inside the brain

06:08.850 --> 06:10.650
and then from the brain to the body,

06:10.650 --> 06:13.440
first by using the forward function from the brain,

06:13.440 --> 06:16.170
which takes as input the input images,

06:16.170 --> 06:18.300
and then propagates them into the brain

06:18.300 --> 06:20.220
to return the Q values.

06:20.220 --> 06:23.160
And then we propagate this output signal into the body

06:23.160 --> 06:24.900
with the forward function of our body

06:24.900 --> 06:26.880
to get the action to play.

06:26.880 --> 06:30.000
And so now the only remaining thing that we have to do

06:30.000 --> 06:33.060
and that's the very last line of code of this Part 1,

06:33.060 --> 06:34.440
Building the AI.

06:34.440 --> 06:37.500
Well, we have to return the action to play

06:37.500 --> 06:39.510
and that is actions.

06:39.510 --> 06:42.810
However, right now the actions have the torch format,

06:42.810 --> 06:45.240
and we need to convert them back into NumPy arrays.

06:45.240 --> 06:46.073
And to do this,

06:46.073 --> 06:49.350
we're gonna take the data attribute of these actions

06:49.350 --> 06:52.860
and then call the numpy function on it.

06:52.860 --> 06:53.820
And there we go.

06:53.820 --> 06:56.730
Now we have the actions returned in the right format.
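
NOTE
Why .data is needed: .numpy() refuses a tensor that is still attached to the autograd graph. A minimal sketch with a hypothetical tensor that requires gradients; actions.data.numpy() is the spelling used in this lecture, and actions.detach().numpy() is the spelling that current PyTorch recommends in its own error message.
```python
import torch
# Hypothetical stand-in for a result computed from trainable parameters.
weights = torch.tensor([[0.5, -1.0, 2.0]], requires_grad=True)
actions = weights * 2
try:
    actions.numpy()  # raises RuntimeError: the tensor requires grad
except RuntimeError as err:
    print("numpy() refused:", err)
via_data = actions.data.numpy()        # the lecture's spelling
via_detach = actions.detach().numpy()  # the spelling recommended in current PyTorch
print(via_data.tolist())               # [[1.0, -2.0, 4.0]]
```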

06:56.730 --> 06:58.050
So congratulations.

06:58.050 --> 07:00.720
We are now done with Part 1.

07:00.720 --> 07:03.390
We built the AI in three steps.

07:03.390 --> 07:06.840
First, we made the brain, second, we made the body.

07:06.840 --> 07:09.900
And third, we assembled the brain and the body.

07:09.900 --> 07:11.910
And we propagated the whole signal,

07:11.910 --> 07:15.540
from the eyes to the moment we played the action.
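
NOTE
Putting the three steps together, the assembled AI can be sketched as below. The body of __call__ follows the calls described in this lecture; the Flatten plus Linear brain over 3 actions and the greedy argmax body are hypothetical stand-ins for the brain and body built earlier, chosen only so the sketch runs on its own.
```python
import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable  # legacy wrapper, mirrors the lecture
class AI:
    def __init__(self, brain, body):
        self.brain = brain
        self.body = body
    def __call__(self, inputs):
        # images to float32 NumPy array, then to torch tensor, then to Variable
        input_var = Variable(torch.from_numpy(np.array(inputs, dtype=np.float32)))
        output = self.brain(input_var)  # brain: images to Q-values
        actions = self.body(output)     # body: Q-values to actions
        return actions.data.numpy()     # back to NumPy for the environment
# Hypothetical stand-ins so the sketch runs on its own.
brain = nn.Sequential(nn.Flatten(), nn.Linear(1 * 80 * 80, 3))
def body(q_values):
    return q_values.argmax(dim=1, keepdim=True)  # greedy choice, one action per image
ai = AI(brain, body)
actions = ai([np.random.rand(1, 80, 80)])
print(actions.shape)  # (1, 1)
```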

07:15.540 --> 07:17.100
So that's the first step done.

07:17.100 --> 07:18.390
That was a huge step.

07:18.390 --> 07:20.790
But, as you understood, we built an AI,

07:20.790 --> 07:22.200
but it is still stupid.

07:22.200 --> 07:24.150
We need to train it to be intelligent.

07:24.150 --> 07:26.640
So we need to train it to do what we want it to do.

07:26.640 --> 07:28.800
And to do this we're gonna use the reward

07:28.800 --> 07:30.390
of the Doom environment, you know,

07:30.390 --> 07:32.190
because it's learning from the reward

07:32.190 --> 07:34.740
by being reinforced when it gets a good reward

07:34.740 --> 07:37.020
and by being punished or weakened

07:37.020 --> 07:38.670
when it's getting a bad reward.

07:38.670 --> 07:41.670
So that's where Q-learning will come into play.

07:41.670 --> 07:44.340
And so that's exactly what we'll do in Part 2,

07:44.340 --> 07:47.490
Training the AI with Deep Convolutional Q-Learning.

07:47.490 --> 07:48.750
I can't wait to start.

07:48.750 --> 07:50.343
And until then, enjoy AI.