WEBVTT

00:00.540 --> 00:01.470
Hadelin: Are you ready?

00:01.470 --> 00:02.460
Let's do this.

00:02.460 --> 00:03.930
Let's start by installing

00:03.930 --> 00:06.180
all the system dependencies for ViZDoom.

00:06.180 --> 00:07.830
Let's click this play button

00:07.830 --> 00:11.010
and now it will install all the dependencies,

00:11.010 --> 00:13.710
as you can see, like Pillow or SciPy.

00:13.710 --> 00:15.810
And also some other dependencies

00:15.810 --> 00:18.240
that are gonna be needed to run this successfully.

00:18.240 --> 00:20.100
But all the rest, like PyTorch,

00:20.100 --> 00:22.320
the gym modules are already installed.

00:22.320 --> 00:23.160
So that's really awesome.

00:23.160 --> 00:25.380
That's really the beauty of Google Colab.

00:25.380 --> 00:26.790
And thanks to that,

00:26.790 --> 00:29.400
none of you will have any issue,

00:29.400 --> 00:32.580
executing the code and visualizing the final result.

00:32.580 --> 00:33.750
All right, so this is gonna take

00:33.750 --> 00:35.340
actually one or two minutes,

00:35.340 --> 00:37.830
so I'm just going to fast forward here

00:37.830 --> 00:39.240
and I'll see you very soon

00:39.240 --> 00:42.094
for the rest of the execution of the code.

00:42.094 --> 00:44.490
(smacking lips) All right, we seem to be at the end

00:44.490 --> 00:47.070
of the installation of the system dependencies.

00:47.070 --> 00:49.470
As you can see, it is downloading them,

00:49.470 --> 00:50.700
installing them,

00:50.700 --> 00:53.970
while connecting them with all the right requirements.

00:53.970 --> 00:57.420
And in a matter of seconds it should be done.

00:57.420 --> 00:58.470
Right.

00:58.470 --> 00:59.850
And let's see,

00:59.850 --> 01:01.500
three, two,

01:01.500 --> 01:02.333
there we go.

01:02.333 --> 01:03.990
Successfully installed everything.

01:03.990 --> 01:05.310
Don't worry about these errors here,

01:05.310 --> 01:07.590
they won't impact the execution of the code.

01:07.590 --> 01:10.230
But everything is successfully installed.

01:10.230 --> 01:12.090
Just as we want, all right?

01:12.090 --> 01:14.410
So, now next step, very important,

01:14.410 --> 01:16.680
look at this important note.

01:16.680 --> 01:19.110
It says that after installing all dependencies

01:19.110 --> 01:22.140
basically after executing this first cell here,

01:22.140 --> 01:24.120
you have to restart your Runtime.

01:24.120 --> 01:26.610
Otherwise you will get some execution errors here.

01:26.610 --> 01:28.470
So let's do that quickly, it's very simple.

01:28.470 --> 01:30.300
You just need to click Runtime here,

01:30.300 --> 01:32.760
and then restart Runtime,

01:32.760 --> 01:33.990
and then yes, all right.

01:33.990 --> 01:35.790
This will restart your Runtime.

01:35.790 --> 01:38.910
And now you can just execute all these cells

01:38.910 --> 01:40.590
by just clicking the play button.

01:40.590 --> 01:41.700
So let's do this.

01:41.700 --> 01:44.970
Starting with this first file, image pre-processing.

01:44.970 --> 01:47.820
All right, so let's first import the libraries,

01:47.820 --> 01:50.310
then pre-process the images

01:50.310 --> 01:51.840
with the pre-process image class.

01:51.840 --> 01:53.520
And now we already moved

01:53.520 --> 01:57.000
to the experience replay implementation,

01:57.000 --> 01:58.710
meaning this one.

01:58.710 --> 01:59.670
All right.

01:59.670 --> 02:00.503
So let's do this.

02:00.503 --> 02:02.490
We first import the libraries,

02:02.490 --> 02:05.400
then we define one step of the environment,

02:05.400 --> 02:10.230
then we make the AI progress on several n-step steps

02:10.230 --> 02:11.910
with the n-step progress class

02:11.910 --> 02:14.910
exactly the same as what we have in the folder.

02:14.910 --> 02:18.090
And then we implement the experience replay

02:18.090 --> 02:20.610
by building this replay memory class.
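
NOTE
A minimal sketch of the replay memory idea mentioned above, not the notebook's actual class.
The class name, method names, and the (state, action, reward, next_state) tuple layout are assumptions made for illustration.
It stores transitions in a bounded buffer and draws uniform random mini-batches from it.
```python
import random
from collections import deque
class ReplayMemory:
    """Hypothetical stand-in for the notebook's replay memory class."""
    def __init__(self, capacity=10000):
        # A deque with maxlen drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)
    def push(self, transition):
        self.buffer.append(transition)
    def sample(self, batch_size):
        # Uniform random mini-batch, drawn without replacement.
        return random.sample(list(self.buffer), batch_size)
    def __len__(self):
        return len(self.buffer)
memory = ReplayMemory(capacity=3)
for step in range(5):
    # Assumed layout: (state, action, reward, next_state).
    memory.push((step, 0, 1.0, step + 1))
batch = memory.sample(2)
```
With a capacity of 3, only the three most recent transitions remain available for sampling.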

02:20.610 --> 02:22.170
All right, all good?

02:22.170 --> 02:24.930
And now we move on to the third file.

02:24.930 --> 02:27.990
This one, AI for Doom, ai.py.

02:27.990 --> 02:30.213
And we first import the libraries,

02:31.140 --> 02:33.480
then, all right so it takes a little time here,

02:33.480 --> 02:35.970
because we import all the torch modules,

02:35.970 --> 02:39.000
then we import the packages for OpenAI and Doom.

02:39.000 --> 02:42.120
So with the gym and ViZDoom gym and the wrappers.

02:42.120 --> 02:43.230
Okay, so all good.

02:43.230 --> 02:46.320
And then we move on to part one here,

02:46.320 --> 02:47.220
building the AI.

02:47.220 --> 02:49.260
Where we're gonna make the brain, then the body,

02:49.260 --> 02:51.510
and then assembling everything.

02:51.510 --> 02:53.910
So let's first make the brain,

02:53.910 --> 02:54.743
all right,

02:54.743 --> 02:55.950
with the CNN class.

02:55.950 --> 02:57.990
Then, let's make

02:57.990 --> 02:59.280
the body,

02:59.280 --> 03:01.350
with the Softmax body class.

03:01.350 --> 03:03.330
And then let's make the AI,

03:03.330 --> 03:04.860
with the AI class.

03:04.860 --> 03:05.693
All right?

03:05.693 --> 03:06.526
So all good.

03:06.526 --> 03:08.190
And now we already move on to part two

03:08.190 --> 03:09.630
where we're gonna train the AI

03:09.630 --> 03:12.690
with of course, Deep Convolutional Q-Learning.

03:12.690 --> 03:15.660
All right, so exactly the same as what we have here.

03:15.660 --> 03:18.330
We first get the Doom environment with our new modules,

03:18.330 --> 03:20.460
ViZDoom, don't worry about this,

03:20.460 --> 03:21.900
this is not an error.

03:21.900 --> 03:24.420
Then we are gonna build the AI by, you know,

03:24.420 --> 03:26.970
creating the different objects, the brain, CNN,

03:26.970 --> 03:28.470
the body, Softmax body,

03:28.470 --> 03:31.110
and the whole AI containing the brain and the body.

03:31.110 --> 03:32.400
All right, did I execute this?

03:32.400 --> 03:33.360
Yes.

03:33.360 --> 03:35.730
Then, we set up experience replay

03:35.730 --> 03:39.390
with n-steps and the memory as a replay memory object.

03:39.390 --> 03:43.320
And then we implement eligibility trace,

03:43.320 --> 03:45.810
all right, to improve the performance.
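
NOTE
The notebook's eligibility trace code is not shown in this video, so the following is only a sketch of a common n-step target under stated assumptions.
The function name, the gamma value, and the bootstrap_value argument (an estimate of the value of the last state reached) are hypothetical.
It accumulates the rewards of n consecutive steps backwards, discounting at each step.
```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Discounted sum of n rewards plus the discounted bootstrap value."""
    cumul = bootstrap_value
    # Walk backwards so each earlier reward sees the discounted future.
    for r in reversed(rewards):
        cumul = r + gamma * cumul
    return cumul
# Three rewards, then a bootstrap estimate of 10.0 for the state reached afterwards.
target = n_step_return([1.0, 0.0, 2.0], bootstrap_value=10.0, gamma=0.5)
```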

03:45.810 --> 03:47.340
Then we make the moving average

03:47.340 --> 03:50.160
on 100 steps with the MA class.
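
NOTE
A rough sketch of a moving average over the most recent rewards, in the spirit of the MA class mentioned above.
The class name MovingAverage and its methods are assumptions, not the notebook's code.
The window size of 3 below is only to keep the example short; the video uses 100.
```python
from collections import deque
class MovingAverage:
    """Hypothetical moving average over the last `size` rewards."""
    def __init__(self, size=100):
        # Only the most recent `size` values are kept.
        self.window = deque(maxlen=size)
    def add(self, rewards):
        # Accept either a single reward or a list of rewards.
        if isinstance(rewards, (list, tuple)):
            self.window.extend(rewards)
        else:
            self.window.append(rewards)
    def average(self):
        return sum(self.window) / len(self.window) if self.window else 0.0
ma = MovingAverage(size=3)
ma.add([-98.0, -62.0])
ma.add(10.0)
ma.add(100.0)
avg = ma.average()
```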

03:50.160 --> 03:52.620
And finally, my friends, are you ready?

03:52.620 --> 03:54.180
Well this is now time for, you know,

03:54.180 --> 03:55.230
the very exciting part

03:55.230 --> 03:58.620
where we're gonna train the AI over 20 epochs.

03:58.620 --> 03:59.610
All right so you will see

03:59.610 --> 04:01.500
that this will be already a bit long.

04:01.500 --> 04:03.090
You know, it will take like,

04:03.090 --> 04:05.880
maybe one or two hours because I increased the dimensions.

04:05.880 --> 04:08.720
Feel free to reduce dimensions back to 80 by 80

04:08.720 --> 04:09.870
if you find this too long.
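
NOTE
A quick back-of-the-envelope check of why the larger frames slow training down, comparing pixel counts per frame.
The 256 by 256 figure is the one quoted later in this video; the helper name is hypothetical, and actual training time depends on more than pixel count.
```python
def pixels(height, width):
    # Number of pixels in one frame, per channel.
    return height * width
small = pixels(80, 80)
large = pixels(256, 256)
# How many times more pixels the network has to process per frame.
ratio = large / small
```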

04:09.870 --> 04:10.740
But trust me,

04:10.740 --> 04:13.200
you will have much better videos with these dimensions.

04:13.200 --> 04:14.130
Okay?

04:14.130 --> 04:14.963
So.

04:14.963 --> 04:15.810
Are you ready?

04:15.810 --> 04:17.940
Let's do this in three,

04:17.940 --> 04:18.780
two,

04:18.780 --> 04:19.890
one,

04:19.890 --> 04:20.723
go.

04:20.723 --> 04:23.777
All right, so this will execute the code of the training

04:23.777 --> 04:26.220
and in a few seconds we should be able to see

04:26.220 --> 04:27.510
the first epoch,

04:27.510 --> 04:30.270
which will have, of course, a negative reward.

04:30.270 --> 04:32.253
But you will see that, you know, over the epochs,

04:32.253 --> 04:35.370
that reward will increase little by little,

04:35.370 --> 04:37.050
until reaching positive rewards

04:37.050 --> 04:40.320
and then until reaching hundreds of reward.

04:40.320 --> 04:41.910
Well, let's aim for that.

04:41.910 --> 04:44.190
Actually, let's hope that with 20 epochs

04:44.190 --> 04:47.880
we will have some final rewards at some hundreds,

04:47.880 --> 04:50.640
you know, like 100 or 200 or 300.

04:50.640 --> 04:52.980
Because with these rewards, I experimented with them,

04:52.980 --> 04:57.240
actually, yes, first epoch, negative rewards minus 98.

04:57.240 --> 05:00.810
So I was saying that with rewards at around 100, 200, 300,

05:00.810 --> 05:03.030
we will already get some great results.

05:03.030 --> 05:05.490
You know, we will see the AI managing to

05:05.490 --> 05:07.680
either kill some monsters or avoid them

05:07.680 --> 05:10.140
or, you know, move towards the vest.

05:10.140 --> 05:10.973
Okay?

05:10.973 --> 05:12.570
So, that's the first epoch.

05:12.570 --> 05:14.580
Epoch 1 minus 98.

05:14.580 --> 05:15.413
And then you know,

05:15.413 --> 05:16.620
we'll see epoch number two,

05:16.620 --> 05:18.840
with maybe already a better reward.

05:18.840 --> 05:20.190
But you know, at the beginning, of course,

05:20.190 --> 05:21.690
the AI is not trained.

05:21.690 --> 05:23.850
It is exploring the environment, right.

05:23.850 --> 05:26.640
Remember this trade off in reinforcement learning,

05:26.640 --> 05:28.710
exploration versus exploitation.

05:28.710 --> 05:31.410
Well, at the beginning the AI is purely exploring,

05:31.410 --> 05:32.730
and then it's gonna train

05:32.730 --> 05:34.860
and then it's gonna become smarter and smarter.

05:34.860 --> 05:37.980
And that's when it's going to reach some high rewards.
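
NOTE
One common way to balance exploration and exploitation is to sample actions from a softmax over the Q-values.
This is a sketch under assumptions: the function names are hypothetical, and dividing by a temperature is one convention among several (the notebook's Softmax body may scale differently).
A high temperature gives nearly uniform probabilities (exploring); a low one concentrates on the best action (exploiting).
```python
import math
import random
def softmax(q_values, temperature=1.0):
    scaled = [q / temperature for q in q_values]
    # Subtract the max for numerical stability before exponentiating.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
def select_action(q_values, temperature=1.0):
    probs = softmax(q_values, temperature)
    # Sample one action index according to the softmax probabilities.
    return random.choices(range(len(q_values)), weights=probs, k=1)[0]
q = [1.0, 2.0, 3.0]
exploring = softmax(q, temperature=10.0)
exploiting = softmax(q, temperature=0.1)
action = select_action(q, temperature=1.0)
```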

05:37.980 --> 05:39.510
So, it's totally fine

05:39.510 --> 05:41.100
to have negative rewards at the beginning.

05:41.100 --> 05:43.800
Maybe we'll get that over the three first epochs.

05:43.800 --> 05:46.410
But then you will see that after epoch number four,

05:46.410 --> 05:48.390
five or six, well we will start reaching

05:48.390 --> 05:50.460
maybe positive rewards,

05:50.460 --> 05:55.320
and then hopefully rewards at around 100, 200 or 300, okay?

05:55.320 --> 05:57.660
So, it's gonna take a little while as I said.

05:57.660 --> 06:00.450
So we're not gonna stay here for two hours,

06:00.450 --> 06:02.730
otherwise I will run out of things to say.

06:02.730 --> 06:04.740
So, what I'm gonna do is,

06:04.740 --> 06:09.150
I'm going to put on some fun, cool music now, and there we go.

06:09.150 --> 06:10.860
Epoch number two, minus 62.

06:10.860 --> 06:12.720
So there is already some improvement, that's good.

06:12.720 --> 06:14.460
But you will see that there will be

06:14.460 --> 06:16.110
some even better improvements,

06:16.110 --> 06:18.480
the more we progress in the epochs.

06:18.480 --> 06:19.313
Okay?

06:19.313 --> 06:20.146
So what was I saying?

06:20.146 --> 06:22.800
Yes, I'm gonna put some cool music now

06:22.800 --> 06:25.530
and play the training in accelerated mode.

06:25.530 --> 06:26.363
And of course,

06:26.363 --> 06:28.410
I'll see you at the end of the training

06:28.410 --> 06:30.450
to see the final results.

06:30.450 --> 06:31.283
All right?

06:31.283 --> 06:32.190
So there we go.

06:32.190 --> 06:33.023
three,

06:33.023 --> 06:33.856
two,

06:33.856 --> 06:34.689
one,

06:34.689 --> 06:35.567
go.

06:35.567 --> 06:38.734
(upbeat techno music)

07:00.403 --> 07:03.903
(music volume increasing)

07:13.515 --> 07:17.015
(music volume increasing)

07:19.352 --> 07:20.202
(Hadelin claps hands)
(music stops)

07:20.202 --> 07:22.680
All right, and here we are at the end of the training.

07:22.680 --> 07:23.700
Congratulations.

07:23.700 --> 07:26.700
You trained a Deep Convolutional Q-Learning model

07:26.700 --> 07:28.740
on a very challenging application,

07:28.740 --> 07:30.600
which is to play the game of "Doom".

07:30.600 --> 07:32.220
So, what to say first?

07:32.220 --> 07:33.900
Well, as we hoped for,

07:33.900 --> 07:37.710
we reached some average reward at more than 100.

07:37.710 --> 07:40.170
Then, what is important to say is, of course,

07:40.170 --> 07:43.530
that with more epochs you will get higher rewards.

07:43.530 --> 07:45.660
So if you're ready to, for example,

07:45.660 --> 07:47.580
train this model for more epochs,

07:47.580 --> 07:49.650
like 100 epochs, or even more,

07:49.650 --> 07:51.120
in order to reach rewards

07:51.120 --> 07:54.750
more around 300, 400, 500, or even 1000,

07:54.750 --> 07:56.070
well, feel free to do it.

07:56.070 --> 07:57.540
For example, you can let this run

07:57.540 --> 07:59.250
for the night while you sleep.

07:59.250 --> 08:00.750
And when you wake up in the morning,

08:00.750 --> 08:02.490
you get your better results.

08:02.490 --> 08:05.640
Note that you can also use a GPU in the Runtime, right?

08:05.640 --> 08:07.470
If you change the Runtime type,

08:07.470 --> 08:08.970
which I shouldn't do because otherwise

08:08.970 --> 08:10.590
it will restart the notebook,

08:10.590 --> 08:12.480
but, in the hardware accelerator here

08:12.480 --> 08:14.610
you can choose GPU or even TPU.

08:14.610 --> 08:16.920
But that's only if you wanna, you know,

08:16.920 --> 08:20.460
optimize the performance and do some super hard training.

08:20.460 --> 08:22.410
But here I just used the classic thing

08:22.410 --> 08:25.230
because I just wanna show you how to execute all this.

08:25.230 --> 08:26.340
And there we go.

08:26.340 --> 08:29.370
Now we're going to execute the rest of the workbook

08:29.370 --> 08:31.290
with this extra code,

08:31.290 --> 08:33.300
only specific to this Colab notebook,

08:33.300 --> 08:34.133
where, of course,

08:34.133 --> 08:36.660
we're gonna visualize the AI in action.

08:36.660 --> 08:37.493
All right.

08:37.493 --> 08:38.326
So let's do this.

08:38.326 --> 08:40.500
Let's first import the libraries, right?

08:40.500 --> 08:41.580
All good.

08:41.580 --> 08:44.220
Then, we're gonna print the input shape

08:44.220 --> 08:46.230
and the number of possible actions.

08:46.230 --> 08:47.880
All right, so here,

08:47.880 --> 08:50.790
we're gonna get indeed that we have seven possible actions,

08:50.790 --> 08:52.890
you know in the Doom Corridor environment.

08:52.890 --> 08:55.560
So these are move forward, move backward,

08:55.560 --> 08:57.240
go left, go right,

08:57.240 --> 08:58.073
shoot,

08:58.073 --> 09:00.510
and then maybe protect yourself or whatever.

09:00.510 --> 09:02.460
I don't know what the last action is,

09:02.460 --> 09:03.810
but something like that.

09:03.810 --> 09:07.620
Then these are the dimensions of the input frame.

09:07.620 --> 09:09.510
This corresponds to the height of the frame.

09:09.510 --> 09:11.790
It is 240 pixels high.

09:11.790 --> 09:14.610
This corresponds to the width of the frame, right?

09:14.610 --> 09:16.440
It is 320

09:16.440 --> 09:17.273
pixels wide.

09:17.273 --> 09:18.660
And this corresponds to, you know,

09:18.660 --> 09:21.540
the fact that we work with colored images

09:21.540 --> 09:22.890
and the three here corresponds

09:22.890 --> 09:26.010
to the three elements of RGB channels.

09:26.010 --> 09:26.843
All right.

09:26.843 --> 09:28.890
Then, let's execute the next cell.

09:28.890 --> 09:30.600
Displaying a frame of the environment

09:30.600 --> 09:32.280
just to see what it is like,

09:32.280 --> 09:34.170
and indeed remember that's useful

09:34.170 --> 09:37.320
to see the environment you're working with, right?

09:37.320 --> 09:39.420
So here we see that we are in Doom Corridor.

09:39.420 --> 09:41.400
But if you want to experiment

09:41.400 --> 09:43.800
with some more environments here,

09:43.800 --> 09:44.986
actually, you know, remember,

09:44.986 --> 09:48.090
they are also in the main page.

09:48.090 --> 09:49.050
Here, right here.

09:49.050 --> 09:51.450
Yes, you have the whole list of the environments, you know,

09:51.450 --> 09:53.490
that I recommend experimenting with.

09:53.490 --> 09:56.580
So, you know, if you want to have a look at another one,

09:56.580 --> 09:58.410
well, you know I can show you actually,

09:58.410 --> 09:59.430
let's get this one.

09:59.430 --> 10:03.000
And then I'll go back to ViZDoom Corridor.

10:03.000 --> 10:07.055
Right, so if I replace that by this one,

10:07.055 --> 10:08.460
(keyboard keys clicking) right.

10:08.460 --> 10:10.350
And I re-execute this, well,

10:10.350 --> 10:12.390
we will get three actions this time.

10:12.390 --> 10:13.920
Of course it is a more simple one.

10:13.920 --> 10:15.510
And if we execute this cell,

10:15.510 --> 10:17.700
well, we'll see this environment, right?

10:17.700 --> 10:20.100
So it's just a way to quickly see

10:20.100 --> 10:22.680
what you're working with and, and to get a preview

10:22.680 --> 10:26.277
basically of what the AI will do in the environment.

10:26.277 --> 10:27.110
(lips smacking) All right,

10:27.110 --> 10:29.490
so let's go back to ViZDoom Corridor.

10:29.490 --> 10:31.440
Let's re-execute this,

10:31.440 --> 10:34.260
you know in case we need it for the next cells.

10:34.260 --> 10:37.500
All right, perfect Doom Corridor.

10:37.500 --> 10:39.780
Now let's move on to the final cell.

10:39.780 --> 10:41.550
So this is a helper function

10:41.550 --> 10:43.830
that will be used for the visualization.

10:43.830 --> 10:45.480
So let's execute this cell.

10:45.480 --> 10:48.210
And now, let's run the AI on one episode.

10:48.210 --> 10:50.880
And here you will be able to understand the code

10:50.880 --> 10:51.870
because you know, basically,

10:51.870 --> 10:53.640
it is the process of, you know,

10:53.640 --> 10:55.710
running the AI on a full episode,

10:55.710 --> 10:58.830
where at each step, it is in a specific state,

10:58.830 --> 11:01.170
it's going to play an action within the state,

11:01.170 --> 11:02.760
then it's going to get the reward

11:02.760 --> 11:05.040
and then reach the next state, right?

11:05.040 --> 11:07.530
So this is the classic MDP process, right,

11:07.530 --> 11:09.270
Markov Decision Process.
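
NOTE
The state, action, reward, next state loop described above can be sketched on a toy environment.
Everything here is hypothetical: ToyCorridor is a made-up 1-D stand-in, not ViZDoom, and its reset and step methods and reward values are assumptions for illustration only.
```python
class ToyCorridor:
    """Hypothetical 1-D corridor: reaching position `length` ends the episode."""
    def __init__(self, length=5):
        self.length = length
        self.position = 0
    def reset(self):
        self.position = 0
        return self.position
    def step(self, action):
        # Action 1 moves forward; any other action stays in place.
        if action == 1:
            self.position += 1
        done = self.position >= self.length
        reward = 100.0 if done else -1.0
        return self.position, reward, done
def run_episode(env, policy, max_steps=100):
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # In the current state, play an action, get the reward, reach the next state.
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
total = run_episode(ToyCorridor(length=5), policy=lambda state: 1)
```
Here the policy always moves forward, so the episode ends after five steps.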

11:09.270 --> 11:10.103
And so there we go.

11:10.103 --> 11:11.730
That's done for one episode.

11:11.730 --> 11:12.660
And finally,

11:12.660 --> 11:14.580
we're gonna get the video

11:14.580 --> 11:17.280
of the game play of our AI,

11:17.280 --> 11:19.860
who was trained for 20 epochs,

11:19.860 --> 11:22.590
was able to reach more than 100 reward,

11:22.590 --> 11:24.450
which will get us some pretty good results,

11:24.450 --> 11:26.910
but maybe not reaching the vest, but it's okay,

11:26.910 --> 11:28.710
you'll experiment that by yourself.

11:28.710 --> 11:32.100
And now I would like you to click this folder button here

11:32.100 --> 11:35.550
because I will want to show you how the video is populated.

11:35.550 --> 11:38.460
So these are some folders containing, you know,

11:38.460 --> 11:40.170
some elements like the frames.

11:40.170 --> 11:42.960
You know, the frames of the results or some JSON files.

11:42.960 --> 11:45.780
But really what we'll be interested in is the final video.

11:45.780 --> 11:48.870
And to get it, we just need to click the play button here

11:48.870 --> 11:51.060
and you will see that it will be populated

11:51.060 --> 11:52.170
in the main folder here.

11:52.170 --> 11:53.003
Don't miss it.

11:53.003 --> 11:55.500
And it's okay, you know, it looks like the cell

11:55.500 --> 11:57.630
was already executed, which is the case,

11:57.630 --> 11:59.310
but you'll see that in a few seconds

11:59.310 --> 12:01.200
we will see an AVI file,

12:01.200 --> 12:02.670
which is a video file,

12:02.670 --> 12:03.930
being populated here.

12:03.930 --> 12:05.820
It will appear in like 10 seconds.

12:05.820 --> 12:08.220
I can even do a, a countdown if you want.

12:08.220 --> 12:09.390
So let's do this.

12:09.390 --> 12:10.770
And 10,

12:10.770 --> 12:11.760
nine,

12:11.760 --> 12:12.593
eight,

12:12.593 --> 12:13.740
seven,

12:13.740 --> 12:14.880
six,

12:14.880 --> 12:16.020
five,

12:16.020 --> 12:17.070
four,

12:17.070 --> 12:18.210
three,

12:18.210 --> 12:19.200
two,

12:19.200 --> 12:20.035
one,

12:20.035 --> 12:21.497
go.

12:21.497 --> 12:24.150
And now it should really appear in a few seconds.

12:24.150 --> 12:25.260
Yeah, there we go.

12:25.260 --> 12:28.470
All right, so agentgameplay.avi, that's your video.

12:28.470 --> 12:30.630
So let's download it.

12:30.630 --> 12:34.470
And it will be downloaded on your computer,

12:34.470 --> 12:36.900
which I will find right here.

12:36.900 --> 12:40.020
Right, so that's the video, agentgameplay.avi.

12:40.020 --> 12:42.540
Make sure to open it with a video player

12:42.540 --> 12:44.550
that has codecs, like VLC, right?

12:44.550 --> 12:47.070
It won't work with QuickTime Player if you're on a Mac.

12:47.070 --> 12:48.906
But it will definitely work with VLC.

12:48.906 --> 12:50.850
So let's have a look.

12:50.850 --> 12:52.290
And here is the video.

12:52.290 --> 12:54.930
All right so, let me just press pause here.

12:54.930 --> 12:56.850
All right, so here is the video, and as you can see,

12:56.850 --> 13:01.277
so this is the 256 by 256 dimensions of the frame.

13:01.277 --> 13:03.420
So as you can see, it's not very large, right?

13:03.420 --> 13:05.760
So that's why I really wanted to work

13:05.760 --> 13:08.820
with these dimensions instead of 80 by 80.

13:08.820 --> 13:10.140
So now let's have a look at the video.

13:10.140 --> 13:13.860
So this is the AI playing "Doom" on one episode.

13:13.860 --> 13:16.020
And it has to avoid the monsters, not be killed.

13:16.020 --> 13:18.240
It has to move forward to reach the vest.

13:18.240 --> 13:19.590
So let's see how it does.

13:19.590 --> 13:20.423
three,

13:20.423 --> 13:21.840
two, one,

13:21.840 --> 13:23.217
go.

13:23.217 --> 13:24.050
(mouse button clicking)

13:24.050 --> 13:26.160
All right, so first it gets shot.

13:26.160 --> 13:28.110
It moves forward, okay,

13:28.110 --> 13:30.180
and then it got killed by the monsters.

13:30.180 --> 13:32.010
But, that's still quite good you know,

13:32.010 --> 13:34.410
it understood that it had to move forward towards the vest.

13:34.410 --> 13:38.100
Because the highest reward is obtained by reaching the vest.

13:38.100 --> 13:40.860
And of course, you know, if you really wanna see the video

13:40.860 --> 13:42.300
of your AI reaching the vest,

13:42.300 --> 13:44.190
and winning at this game basically,

13:44.190 --> 13:47.130
you will have to train your AI for more epochs.

13:47.130 --> 13:49.680
And maybe even do some other kinds of improvement,

13:49.680 --> 13:52.080
like tuning the brain of your AI,

13:52.080 --> 13:53.640
or doing some parameter tuning,

13:53.640 --> 13:55.290
like tuning the learning rate.

13:55.290 --> 13:56.880
Well, you have many options.

13:56.880 --> 13:59.310
If any of you get an amazing video,

13:59.310 --> 14:00.840
or you know, the video of an AI

14:00.840 --> 14:03.900
having reached a reward of more than 1000 for example,

14:03.900 --> 14:06.600
well, feel free to share it in the Q and A.

14:06.600 --> 14:09.613
I'm sure other students will be super happy to get it.

14:09.613 --> 14:12.900
All right, so I hope you liked implementing

14:12.900 --> 14:15.090
the Deep Convolutional Q-Learning model.

14:15.090 --> 14:16.740
Now we're gonna move on to the next part,

14:16.740 --> 14:19.470
which will be about implementing the A3C model,

14:19.470 --> 14:21.900
an even better and more powerful model,

14:21.900 --> 14:25.230
which we'll implement to play the game of "Breakout".

14:25.230 --> 14:26.940
So I'll see you in the next part,

14:26.940 --> 14:28.893
and until then, enjoy AI.
