WEBVTT

00:00.510 --> 00:03.210
Hello and welcome to this Python tutorial.

00:03.210 --> 00:05.850
All right, so now we're gonna make the forward function,

00:05.850 --> 00:09.240
which will propagate the output signals of our brain

00:09.240 --> 00:10.500
to the body of the AI

00:10.500 --> 00:12.510
so that it will play the right action

00:12.510 --> 00:13.770
to reach the vest.

00:13.770 --> 00:15.420
But there is no right action yet

00:15.420 --> 00:17.370
because there is no training yet.

00:17.370 --> 00:19.740
We have not trained the AI yet,

00:19.740 --> 00:22.350
but this is exactly what we will do in part two,

00:22.350 --> 00:24.630
Implementing Deep Convolutional Q-Learning,

00:24.630 --> 00:25.590
which, by the way,

00:25.590 --> 00:27.690
I will rename Training the AI

00:27.690 --> 00:29.940
with Deep Convolutional Q-Learning.

00:29.940 --> 00:32.700
But right now we need to forward the signal

00:32.700 --> 00:35.370
from the output layer of the brain to the body,

00:35.370 --> 00:37.140
and that's exactly what we're gonna do

00:37.140 --> 00:38.220
with this forward function,

00:38.220 --> 00:41.370
which is the last function of our body.

00:41.370 --> 00:42.840
So let's do this.

00:42.840 --> 00:46.350
We start with def forward,

00:46.350 --> 00:47.970
and according to you,

00:47.970 --> 00:50.310
what argument is it going to take?

00:50.310 --> 00:53.310
Well, it's going to take, of course, first self,

00:53.310 --> 00:55.200
and then is there another one?

00:55.200 --> 00:56.850
Well, yes, there is.

00:56.850 --> 00:58.290
And what is it going to be?

00:58.290 --> 00:59.910
Well, very naturally,

00:59.910 --> 01:03.030
we want to forward the output signal of the brain

01:03.030 --> 01:04.050
to the body

01:04.050 --> 01:05.820
and therefore the input will be

01:05.820 --> 01:07.770
the output signal of the brain.

01:07.770 --> 01:10.590
And so now we need to give a name to these output signals.

01:10.590 --> 01:13.323
And so I'm gonna add here the argument outputs.

01:14.550 --> 01:17.670
All right, so that corresponds to the output signals

01:17.670 --> 01:20.400
of the brain after the input images

01:20.400 --> 01:22.470
are propagated through the whole brain

01:22.470 --> 01:23.850
to reach the output layer,

01:23.850 --> 01:24.960
which is X here,

01:24.960 --> 01:27.330
returned by the forward function of the brain.

01:27.330 --> 01:29.100
And now this output signal of the brain

01:29.100 --> 01:30.960
will be forwarded to the body

01:30.960 --> 01:32.490
with this new forward function

01:32.490 --> 01:35.430
that we're making in the SoftmaxBody class.

01:35.430 --> 01:36.690
So let's do this.

01:36.690 --> 01:38.850
Let's add a colon here.

01:38.850 --> 01:40.380
And now, as you understood,

01:40.380 --> 01:43.950
we're gonna use a softmax method to play the action.

01:43.950 --> 01:46.410
That means that the body of our AI

01:46.410 --> 01:48.930
after receiving the output signals of the brain

01:48.930 --> 01:51.390
will play the actions with a softmax technique.

01:51.390 --> 01:54.390
So basically now what we have to do is exactly the same

01:54.390 --> 01:56.580
as what we did for the self-driving car.

01:56.580 --> 01:59.520
We're gonna get our distribution of probabilities.

01:59.520 --> 02:00.780
That's the first step.

02:00.780 --> 02:03.390
And then we're gonna sample an action

02:03.390 --> 02:05.910
according to this distribution of probabilities.

02:05.910 --> 02:07.980
So basically what we could do now is get

02:07.980 --> 02:09.630
our self-driving car file

02:09.630 --> 02:11.550
and copy-paste what we implemented

02:11.550 --> 02:14.790
for the select action function in the self-driving car.

02:14.790 --> 02:15.690
But let's do it again.

02:15.690 --> 02:16.800
It will be good practice.

02:16.800 --> 02:20.130
And actually, you can try to type it before me.

02:20.130 --> 02:23.850
Okay, so first what we're gonna do is get our probabilities.

02:23.850 --> 02:26.820
As a reminder, this is a distribution of probabilities

02:26.820 --> 02:28.170
for each of the Q-values

02:28.170 --> 02:31.800
which depend on the input image and each action.

02:31.800 --> 02:33.330
So we have one Q-value

02:33.330 --> 02:36.660
for each of the six or seven possible actions,

02:36.660 --> 02:39.870
and therefore we get a distribution of seven probabilities.

02:39.870 --> 02:40.703
I'm saying seven

02:40.703 --> 02:43.650
because I think there are seven actions instead of six,

02:43.650 --> 02:46.830
because besides moving forward, left, right, or shooting,

02:46.830 --> 02:48.270
we can also run.

02:48.270 --> 02:50.370
So that makes seven possible actions,

02:50.370 --> 02:53.520
and therefore we get a distribution of seven probabilities,

02:53.520 --> 02:57.240
one for each Q-value associated with each action.

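NOTE
A minimal sketch of the softmax step described above, written in plain Python so it runs without PyTorch (assumption: lists stand in for the tutorial's tensors, and the seven Q-values are made-up numbers, not outputs of the actual brain). Softmax exponentiates each Q-value and divides by the total, giving one probability per action, and the probabilities sum to 1.
```python
import math
def softmax(q_values):
    # Subtract the max for numerical stability; this does not change the result
    m = max(q_values)
    exps = [math.exp(q - m) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]
# Hypothetical Q-values, one per possible action (seven actions)
q_values = [1.0, 2.0, 0.5, 3.0, 0.0, 1.5, 2.5]
probs = softmax(q_values)
```
Every probability is positive, they sum to 1, and the action with the largest Q-value (index 3) gets the largest probability.
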
02:57.240 --> 02:59.040
So probs equals,

02:59.040 --> 03:00.840
and now, do you remember what we have to do?

03:00.840 --> 03:04.920
Well, basically we have to use the softmax function

03:04.920 --> 03:06.570
from the functional module.

03:06.570 --> 03:07.403
So that's very simple.

03:07.403 --> 03:11.220
We take our functional module first, then dot,

03:11.220 --> 03:13.680
and then we take our softmax function.

03:13.680 --> 03:14.550
Here it is.

03:14.550 --> 03:15.870
We press Enter,

03:15.870 --> 03:19.440
and now we input the arguments of the softmax function,

03:19.440 --> 03:22.680
which, as a reminder, are the elements for which

03:22.680 --> 03:25.500
you wanna create a distribution of probabilities.

03:25.500 --> 03:27.690
And so that's, of course, the Q-values,

03:27.690 --> 03:30.660
that is, the outputs of the neural network.

03:30.660 --> 03:32.700
Those are the outputs of the neural network

03:32.700 --> 03:35.910
for which you wanna create a distribution of probabilities.

03:35.910 --> 03:38.070
And I remind that we want to create this distribution

03:38.070 --> 03:42.030
of probabilities to be able to explore the different actions

03:42.030 --> 03:44.250
instead of directly picking the one

03:44.250 --> 03:45.990
that has the maximum Q-value.

03:45.990 --> 03:48.870
If we directly pick the one that has the maximum Q-value

03:48.870 --> 03:51.330
then we don't explore the other actions much,

03:51.330 --> 03:52.860
and we might miss something.

03:52.860 --> 03:54.990
But with the softmax method

03:54.990 --> 03:56.820
we can do some more exploration

03:56.820 --> 03:59.310
and therefore maybe find some hidden solutions

03:59.310 --> 04:01.830
in the patterns that might be much better.

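NOTE
A small plain-Python sketch of the exploration argument above (assumptions: the Q-values are made up, and `greedy_action` and `softmax_action` are hypothetical helper names, not functions from the tutorial). Always picking the maximum Q-value returns the same action every time, while sampling from the softmax distribution occasionally tries the other actions.
```python
import math
import random
def softmax(q_values):
    m = max(q_values)
    exps = [math.exp(q - m) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]
def greedy_action(q_values):
    # Always the action with the maximum Q-value: no exploration at all
    return max(range(len(q_values)), key=lambda i: q_values[i])
def softmax_action(q_values, rng):
    # Sample an action index in proportion to its softmax probability
    probs = softmax(q_values)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
rng = random.Random(0)
q_values = [1.0, 2.0, 0.5, 3.0, 0.0, 1.5, 2.5]
greedy_picks = {greedy_action(q_values) for _ in range(1000)}
softmax_picks = {softmax_action(q_values, rng) for _ in range(1000)}
```
After 1000 calls, `greedy_picks` is `{3}`, whereas `softmax_picks` contains several different actions.
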
04:01.830 --> 04:04.230
So again, I highly recommend softmax,

04:04.230 --> 04:07.860
and therefore now what we have to do is input the Q-values,

04:07.860 --> 04:09.540
that is, our outputs here,

04:09.540 --> 04:11.370
the outputs of our brain.

04:11.370 --> 04:13.950
So outputs, there we go.

04:13.950 --> 04:18.150
But then there's also this temperature parameter

04:18.150 --> 04:21.690
that we can configure to customize the exploration.

04:21.690 --> 04:24.600
Remember that the higher we set the temperature,

04:24.600 --> 04:27.840
the less exploration of the other actions we will do

04:27.840 --> 04:29.760
because the best action will be selected

04:29.760 --> 04:31.440
with a higher probability

04:31.440 --> 04:32.940
than the other actions,

04:32.940 --> 04:35.660
which will be selected with lower probabilities.

04:35.660 --> 04:37.830
So that's exactly like for the self-driving car,

04:37.830 --> 04:40.950
and therefore we have to multiply the output here

04:40.950 --> 04:45.660
by our temperature parameter, self.T.

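NOTE
A sketch of how the temperature parameter behaves when, as in this tutorial, the Q-values are multiplied by it before the softmax (plain Python; the values of T and the Q-values are illustrative assumptions). Multiplying by a larger T widens the gaps between the Q-values, so the best action's probability grows and exploration shrinks. Many texts divide by the temperature instead, which reverses the direction of the effect.
```python
import math
def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
def softmax_with_temperature(q_values, T):
    # Q-values are MULTIPLIED by T before the softmax (tutorial convention),
    # so a larger T makes the best action more likely (less exploration)
    return softmax([q * T for q in q_values])
q_values = [1.0, 2.0, 0.5, 3.0, 0.0, 1.5, 2.5]
low_T = softmax_with_temperature(q_values, 1.0)
high_T = softmax_with_temperature(q_values, 7.0)
```
With these numbers the best action (index 3) has a probability of about 0.41 at T = 1 and about 0.97 at T = 7.
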
04:45.660 --> 04:47.250
There we go.

04:47.250 --> 04:49.440
Perfect. Now we get a little warning

04:49.440 --> 04:51.600
because we haven't used probs yet,

04:51.600 --> 04:53.250
but we are about to use it now.

04:53.250 --> 04:55.530
And so that brings us to the next thing we have to do.

04:55.530 --> 04:57.870
How are we gonna use these probabilities?

04:57.870 --> 05:01.200
Well, we're going to sample the final action to play

05:01.200 --> 05:03.720
from this distribution of probabilities,

05:03.720 --> 05:06.780
and therefore, what we have to do now is use

05:06.780 --> 05:09.720
the multinomial function to sample the action

05:09.720 --> 05:12.330
according to this distribution of probabilities.

05:12.330 --> 05:15.120
So now we're ready to get our actions.

05:15.120 --> 05:17.790
So I'm creating a new variable here, actions, which will hold

05:17.790 --> 05:21.510
the actions that will be played by the body of our AI.

05:21.510 --> 05:25.950
And so now we take our distribution of probabilities, probs,

05:25.950 --> 05:27.870
to which we add dot

05:27.870 --> 05:32.220
and then the multinomial method.

05:32.220 --> 05:34.950
All right, and now we get our final actions to play.

05:34.950 --> 05:38.310
They're sampled from our probs distribution.

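NOTE
A plain-Python sketch of what sampling one action from the `probs` distribution amounts to (assumption: the inverse-CDF helper `sample_action` is hypothetical and only mimics what `multinomial` does for a single draw; it is not the PyTorch implementation). In current PyTorch releases, `Tensor.multinomial` requires a `num_samples` argument, so the call shown in this tutorial would likely be written `probs.multinomial(num_samples=1)`.
```python
import bisect
import itertools
import random
def sample_action(probs, rng):
    # Inverse-CDF sampling: build the running total of the probabilities,
    # draw u uniformly in [0, total) and return the first index whose
    # cumulative probability is greater than u
    cumulative = list(itertools.accumulate(probs))
    u = rng.random() * cumulative[-1]
    index = bisect.bisect_right(cumulative, u)
    return min(index, len(probs) - 1)  # guard against floating-point edge cases
rng = random.Random(42)
probs = [0.1, 0.2, 0.7]
counts = [0, 0, 0]
for _ in range(10000):
    counts[sample_action(probs, rng)] += 1
```
Over 10,000 draws the counts come out close to 1,000, 2,000 and 7,000, matching the probabilities.
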
05:38.310 --> 05:39.540
Okay, perfect.

05:39.540 --> 05:42.330
So now we are ready to return what we want,

05:42.330 --> 05:44.670
that is, the actions to play.

05:44.670 --> 05:47.100
And these are, of course, actions.

05:47.100 --> 05:48.810
And now the warning should disappear.

05:48.810 --> 05:50.430
We now use every variable we created.

05:50.430 --> 05:52.170
There we go. Perfect.

05:52.170 --> 05:54.030
So now the forward function is ready,

05:54.030 --> 05:57.510
and congratulations, the body is also ready.

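NOTE
A framework-free sketch of the finished body, mirroring the structure described in this tutorial: a constructor storing the temperature `T` and a `forward(outputs)` method that applies the softmax and samples one action per row. The tutorial's real class is a PyTorch `nn.Module` using `F.softmax` and `multinomial`; this stand-in (including its `seed` argument) is a hypothetical illustration. In recent PyTorch versions the two key lines would probably need explicit arguments, for example `F.softmax(outputs * self.T, dim=1)` and `probs.multinomial(num_samples=1)`, assuming `outputs` has shape (batch, number of actions).
```python
import math
import random
class SoftmaxBody:
    """Hypothetical plain-Python stand-in for the tutorial's SoftmaxBody."""
    def __init__(self, T, seed=None):
        self.T = T  # multiplies the Q-values; larger T means less exploration
        self.rng = random.Random(seed)
    def forward(self, outputs):
        # outputs: one row of Q-values per input image (the brain's output)
        actions = []
        for q_values in outputs:
            scaled = [q * self.T for q in q_values]
            m = max(scaled)
            exps = [math.exp(s - m) for s in scaled]
            total = sum(exps)
            probs = [e / total for e in exps]
            # Draw one action per row, as multinomial with one sample would
            action = self.rng.choices(range(len(probs)), weights=probs, k=1)[0]
            actions.append([action])
        return actions
body = SoftmaxBody(T=1.0, seed=0)
brain_outputs = [[1.0, 2.0, 0.5, 3.0, 0.0, 1.5, 2.5],
                 [0.0] * 6 + [1000.0]]
actions = body.forward(brain_outputs)
```
Each row of brain outputs yields one sampled action wrapped in a list, mirroring the (batch, 1) shape that `multinomial` returns with `num_samples=1`; the second row, whose last Q-value dwarfs the others, always yields action 6.
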
05:57.510 --> 06:00.420
So now we have our brain, we have our body,

06:00.420 --> 06:02.430
and therefore we're ready to assemble them

06:02.430 --> 06:04.560
to make the future AI.

06:04.560 --> 06:07.200
Our future AI will be composed of nothing else

06:07.200 --> 06:08.940
but a brain and a body,

06:08.940 --> 06:10.710
and so it will have intelligence

06:10.710 --> 06:12.720
and a body to play the actions,

06:12.720 --> 06:14.730
which will be the right actions to play

06:14.730 --> 06:16.620
thanks to its intelligence.

06:16.620 --> 06:19.740
But remember, first we have to train its intelligence,

06:19.740 --> 06:21.690
and that's what we'll do in part two,

06:21.690 --> 06:25.170
Training the AI with Deep Convolutional Q-Learning.

06:25.170 --> 06:28.470
All right, so let's make the AI in the next tutorials.

06:28.470 --> 06:31.920
It's again going to be a class of two functions, I think.

06:31.920 --> 06:34.650
And so this will require two or three tutorials.

06:34.650 --> 06:35.640
So I can't wait.

06:35.640 --> 06:36.780
This will be exciting.

06:36.780 --> 06:38.073
And until then, enjoy AI.