WEBVTT

00:00.330 --> 00:02.430
Narrator: Hello and welcome to part two,

00:02.430 --> 00:05.460
training the AI with deep convolutional Q-learning.

00:05.460 --> 00:07.950
That's right, now that we've built the AI

00:07.950 --> 00:09.990
with the architecture of the neural network,

00:09.990 --> 00:12.810
the body, the way the actions are played and everything

00:12.810 --> 00:17.220
it's time to train this AI with deep convolutional Q-learning.

00:17.220 --> 00:21.180
So from now on we will implement experience replay,

00:21.180 --> 00:23.940
working with the Q values, working with the rewards

00:23.940 --> 00:25.890
and there's even gonna be a bonus

00:25.890 --> 00:29.100
which will improve the training process a lot.

00:29.100 --> 00:31.650
And that is called eligibility trace.

00:31.650 --> 00:34.080
Eligibility trace is a powerful technique

00:34.080 --> 00:37.680
which consists of accumulating the reward

00:37.680 --> 00:40.710
over several steps and the Q values are learned

00:40.710 --> 00:42.660
on this accumulation of rewards.

00:42.660 --> 00:45.420
As opposed to before, where the Q values were learned

00:45.420 --> 00:46.830
after each transition.

00:46.830 --> 00:49.110
That is, after getting each reward.

00:49.110 --> 00:51.030
This time we will be learning the Q values

00:51.030 --> 00:54.630
after getting several rewards instead of just one reward.

00:54.630 --> 00:57.960
So instead of having one transition after the other

00:57.960 --> 01:00.630
and updating the Q value each time,

01:00.630 --> 01:04.590
well, the Q values are gonna be updated every n steps,

01:04.590 --> 01:06.900
because eligibility trace is more precisely called

01:06.900 --> 01:08.970
n-step eligibility trace.

01:08.970 --> 01:10.860
And n is the number of steps after which

01:10.860 --> 01:12.840
the Q values are gonna be updated.

01:12.840 --> 01:15.630
And in our model here we're gonna have n equals 10.

01:15.630 --> 01:18.570
So that means it will be a 10-step eligibility trace

01:18.570 --> 01:21.810
and therefore we will update and learn the Q values

01:21.810 --> 01:24.420
every 10 steps after accumulating the rewards

01:24.420 --> 01:25.710
on these 10 steps.
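
NOTE
A minimal sketch of the accumulation described here, assuming rewards are
discounted by a factor gamma and that a value estimate for the state reached
after the last step can be added on top. The names n_step_return, gamma and
bootstrap_value are illustrative and do not come from the video.
```python
from typing import Sequence
def n_step_return(rewards: Sequence[float], gamma: float = 0.99, bootstrap_value: float = 0.0) -> float:
    """Fold several consecutive rewards into one discounted learning target."""
    # Start from the value estimate of the state reached after the last step
    # (0.0 if the episode ended there), then fold the rewards in backwards.
    cumulative = bootstrap_value
    for reward in reversed(rewards):
        cumulative = reward + gamma * cumulative
    return cumulative
# Ten rewards collected over ten consecutive steps produce a single target.
ten_rewards = [1.0] * 10
target = n_step_return(ten_rewards, gamma=1.0)
print(target)  # 10.0 when no discounting is applied
```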

01:25.710 --> 01:26.820
So that's a bonus

01:26.820 --> 01:29.700
that will make our model even more powerful

01:29.700 --> 01:31.170
and you will see that in the end

01:31.170 --> 01:33.150
we will get outstanding results.

01:33.150 --> 01:35.970
I was really amazed when I saw the final results.

01:35.970 --> 01:40.620
I used to work on models that took a lot of time to execute.

01:40.620 --> 01:42.630
The AI took a lot of time to train

01:42.630 --> 01:44.490
but you will see that with this one

01:44.490 --> 01:47.970
plus the neural network that we made, that is our brain

01:47.970 --> 01:50.070
and our body here with softmax,

01:50.070 --> 01:52.890
we will get a very powerful model

01:52.890 --> 01:54.630
and therefore very powerful AI,

01:54.630 --> 01:57.300
because you will see that it will make Doom look ridiculous.

01:57.300 --> 01:59.340
You'll understand what I'm talking about.

01:59.340 --> 02:01.800
So as you can see in this part two

02:01.800 --> 02:04.410
we are starting by getting the Doom environment,

02:04.410 --> 02:06.720
and I actually prepared the lines of code for you.

02:06.720 --> 02:10.680
We are just using the image pre-processing external file

02:10.680 --> 02:12.900
from our working directory folder.

02:12.900 --> 02:16.200
So basically the order is rather to first

02:16.200 --> 02:18.390
take this line of code, gym.make,

02:18.390 --> 02:20.190
ppaquette/DoomCorridor-v0.

02:20.190 --> 02:23.220
So DoomCorridor-v0 is the name of the environment

02:23.220 --> 02:24.450
of the game we're playing.

02:24.450 --> 02:28.290
So first we import the environment with this gym.make,

02:28.290 --> 02:32.190
that's what you can find on the OpenAI gym tutorials.

02:32.190 --> 02:35.970
But then, we use this PreprocessImage class,

02:35.970 --> 02:38.850
which is a class from image_preprocessing,

02:38.850 --> 02:41.400
to pre-process the images

02:41.400 --> 02:43.470
that will come into the neural network.

02:43.470 --> 02:46.560
And we pre-process them so that they have a square format

02:46.560 --> 02:51.480
with the dimensions 80 by 80, and that, remember, is because

02:51.480 --> 02:53.010
in our neural network--

02:53.010 --> 02:56.580
Well, we set our input images to have the dimensions

02:56.580 --> 02:58.530
one by 80 by 80.

02:58.530 --> 03:00.510
Remember one is the number of channels.

03:00.510 --> 03:02.130
And so, one means that we're working with

03:02.130 --> 03:03.600
black and white images.

03:03.600 --> 03:06.870
So that's the gray scale here.

03:06.870 --> 03:11.430
And 80 by 80 means that the dimensions of our input images

03:11.430 --> 03:12.990
will be 80 by 80.

03:12.990 --> 03:14.760
And that is what we set in the neural network,

03:14.760 --> 03:17.730
but of course then we need to specify this

03:17.730 --> 03:21.180
when inputting the images, which is exactly what we do here

03:21.180 --> 03:23.610
with this PreprocessImage class.
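
NOTE
A rough sketch of what such a preprocessing step produces, a 1 x 80 x 80
grayscale image. This is not the course's PreprocessImage class: the function
name, the nearest-neighbour resize, the plain channel average and the
240 x 320 dummy frame are all assumptions made for illustration.
```python
from typing import List
Frame = List[List[List[int]]]  # height x width x RGB, values 0..255
def to_grayscale_square(frame: Frame, size: int = 80) -> List[List[List[float]]]:
    """Return a 1 x size x size grayscale image with values in [0, 1]."""
    height, width = len(frame), len(frame[0])
    plane = []
    for row in range(size):
        src_row = frame[row * height // size]  # nearest-neighbour row
        out_row = []
        for col in range(size):
            r, g, b = src_row[col * width // size]  # nearest-neighbour column
            out_row.append((r + g + b) / (3 * 255))  # plain channel average
        plane.append(out_row)
    return [plane]  # the leading axis of length 1 is the single channel
# A dummy 240 x 320 mid-gray RGB frame stands in for a real game screen.
dummy = [[[128, 128, 128] for _ in range(320)] for _ in range(240)]
image = to_grayscale_square(dummy)
print(len(image), len(image[0]), len(image[0][0]))  # 1 80 80
```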

03:23.610 --> 03:25.920
And then, after we import the environment

03:25.920 --> 03:28.560
with the right format of the input images,

03:28.560 --> 03:31.140
well, we import the whole game with the videos

03:31.140 --> 03:32.490
with this line of code.

03:32.490 --> 03:34.770
And remember, the cool thing about this is that

03:34.770 --> 03:37.860
in the end we'll see the videos of our AI playing Doom.

03:37.860 --> 03:39.780
So we will see how it will kill the monsters,

03:39.780 --> 03:42.030
try to reach the vest and everything.

03:42.030 --> 03:44.730
So that will be super exciting and remember that

03:44.730 --> 03:48.600
these videos will go into this videos folder.

03:48.600 --> 03:52.200
All right, and last line here, but I want to show it to you

03:52.200 --> 03:55.170
because that's important, that's now more related to

03:55.170 --> 03:56.880
the AI that we're building.

03:56.880 --> 04:00.510
Well, remember that our neural network

04:00.510 --> 04:02.850
takes number_actions as input.

04:02.850 --> 04:06.480
That's because we want to make an AI that we can test easily

04:06.480 --> 04:09.630
on several environments, on several Doom environments.

04:09.630 --> 04:11.340
And since the different Doom environments

04:11.340 --> 04:12.960
have different numbers of actions,

04:12.960 --> 04:16.320
well, we specified this number_actions variable

04:16.320 --> 04:19.170
as the input of the cnn, the brain.

04:19.170 --> 04:21.690
And therefore, now what we're gonna do is

04:21.690 --> 04:25.290
get this number_actions variable using

04:25.290 --> 04:27.420
the Doom environment that we just imported

04:27.420 --> 04:29.310
and store it in this variable.

04:29.310 --> 04:31.650
And later this number_actions variable

04:31.650 --> 04:34.860
that we're about to create will be the input of the brain.

04:34.860 --> 04:37.740
So let's do this, I'm introducing this,

04:37.740 --> 04:40.470
new variable, number_actions.

04:40.470 --> 04:42.690
So number_actions equals,

04:42.690 --> 04:45.720
now we're gonna take our Doom environment,

04:45.720 --> 04:48.120
that is the variable that we created.

04:48.120 --> 04:51.480
So doom_env, then we add a dot here

04:51.480 --> 04:52.680
and then, well here we go,

04:52.680 --> 04:54.960
we take the first one here, action_space.

04:54.960 --> 04:57.180
That's the set of your actions.

04:57.180 --> 05:00.840
I encourage you to have a look at the OpenAI tutorials

05:00.840 --> 05:03.360
to see how it works, to understand how

05:03.360 --> 05:05.640
the OpenAI gym environments work.

05:05.640 --> 05:08.010
But basically, this is the set of actions.

05:08.010 --> 05:10.770
And from this set of actions, we can access

05:10.770 --> 05:12.930
the number of actions in the environment.

05:12.930 --> 05:16.080
And to do this, we add a dot and n.

05:16.080 --> 05:17.790
N is the number of actions.

05:17.790 --> 05:22.230
And therefore doom_env.action_space.n

05:22.230 --> 05:23.940
will return seven.

05:23.940 --> 05:26.700
It will return seven because there are seven actions.

05:26.700 --> 05:28.740
I know that we can see six actions

05:28.740 --> 05:31.650
in the Doom environments on the OpenAI gym page,

05:31.650 --> 05:33.420
but I think we can also run.

05:33.420 --> 05:35.700
And so you know, we can move forward, move left,

05:35.700 --> 05:39.000
move right, turn left, turn right and shoot.

05:39.000 --> 05:40.170
And besides, we can run.

05:40.170 --> 05:42.270
So that makes seven actions.
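
NOTE
A small sketch of reading the number of actions from an environment. Gym's
discrete action spaces expose their size as the attribute n. The two classes
below are hypothetical stand-ins so the example runs without the Doom
environment installed, and the value 7 is taken from the narration.
```python
from dataclasses import dataclass
@dataclass
class Discrete:
    """Stand-in for a discrete action space: n actions numbered 0 .. n-1."""
    n: int
@dataclass
class FakeDoomEnv:
    """Hypothetical stand-in for the Doom environment object."""
    action_space: Discrete
# Seven actions, as stated in the narration.
doom_env = FakeDoomEnv(action_space=Discrete(7))
# The same attribute access the narration spells out.
number_actions = doom_env.action_space.n
print(number_actions)  # 7
```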

05:42.270 --> 05:45.300
All right, and that's it for getting the Doom environment.

05:45.300 --> 05:48.570
We have the Doom environment, we have the number of actions.

05:48.570 --> 05:52.830
So we have so far everything that we need for our brain.

05:52.830 --> 05:55.950
We will then just create an object, a brain object,

05:55.950 --> 05:58.470
which we'll call cnn, in lowercase letters.

05:58.470 --> 06:01.380
And since the init function takes number of actions

06:01.380 --> 06:04.860
as argument, well we will input the number of actions

06:04.860 --> 06:07.500
in the cnn object that we will create.

06:07.500 --> 06:09.720
And then of course we will create the body

06:09.720 --> 06:11.820
and eventually the AI.

06:11.820 --> 06:14.160
And that's why the next section, I'm gonna call it,

06:14.160 --> 06:16.710
building an AI, because now

06:16.710 --> 06:19.170
we can build as many AIs as we want.

06:19.170 --> 06:22.110
That's the awesome thing about object oriented programming,

06:22.110 --> 06:24.330
we can build any AI we want.

06:24.330 --> 06:26.670
And so we're gonna build our AI

06:26.670 --> 06:28.470
that has this sophisticated brain

06:28.470 --> 06:31.830
and that's exactly what we'll do in the next tutorial.

06:31.830 --> 06:33.573
Until then, enjoy AI.