WEBVTT

00:00.300 --> 00:03.270
Hello and welcome to the third module of this course:

00:03.270 --> 00:04.290
the A3C,

00:04.290 --> 00:07.140
the Asynchronous Advantage Actor-Critic agent.

00:07.140 --> 00:09.660
And so now I can really say welcome

00:09.660 --> 00:12.150
to the state of the art in machine learning.

00:12.150 --> 00:14.700
Well, at the time I'm saying this, because maybe some

00:14.700 --> 00:16.590
of you will take the course in one or two years,

00:16.590 --> 00:19.674
but at the time I'm saying this, it's 2017.

00:19.674 --> 00:21.750
So you are about to work on one

00:21.750 --> 00:24.582
of the most powerful models in artificial intelligence.

00:24.582 --> 00:25.950
But there is more:

00:25.950 --> 00:28.770
that is not the only special thing about this module.

00:28.770 --> 00:30.090
Not only are we about to work

00:30.090 --> 00:32.430
with the most powerful AI model,

00:32.430 --> 00:35.816
but we are also going to implement the most powerful version

00:35.816 --> 00:40.260
of this algorithm, that is, the most optimized

00:40.260 --> 00:42.960
implementation of the A3C model.

00:42.960 --> 00:45.510
Because, as you can imagine, there is the heart

00:47.880 --> 00:50.910
of the A3C algorithm, but then there are a lot

00:50.910 --> 00:52.830
of tools that we can use to optimize the whole model.

00:52.830 --> 00:53.978
And so not only are you gonna have the heart

00:53.978 --> 00:56.820
of the A3C algorithm,

00:56.820 --> 00:59.610
but we are also going to implement all these tools

00:59.610 --> 01:01.530
around it to make the model super powerful.

00:59.610 --> 01:01.530
And why did I want to do that?

01:01.530 --> 01:03.090
Well, that's for two reasons.

01:03.090 --> 01:03.990
The first reason is

01:03.990 --> 01:06.240
that we are approaching the end of this course.

01:06.240 --> 01:07.920
This course is the highest-level course

01:07.920 --> 01:11.505
of the three courses: ML, DL, and AI A to Z.

01:11.505 --> 01:14.370
So now I think you're ready to take it to the next level.

01:14.370 --> 01:16.140
And the second reason is that

01:16.140 --> 01:19.170
solving Breakout is actually super challenging.

01:19.170 --> 01:20.790
Remember, in the promo video

01:20.790 --> 01:23.460
we wanted to put Breakout as the first module

01:23.460 --> 01:25.770
because we thought it would be the easiest challenge,

01:25.770 --> 01:26.610
but not at all.

01:26.610 --> 01:28.860
It was actually the most difficult challenge.

01:28.860 --> 01:32.520
An easy way of explaining this is that, well, in Doom,

01:32.520 --> 01:35.307
the monsters are big and therefore easier to detect,

01:35.307 --> 01:37.830
and therefore easier to kill or avoid.

01:37.830 --> 01:38.670
But in Breakout,

01:38.670 --> 01:42.690
we have this tiny ball that the AI has to detect as well,

01:42.690 --> 01:45.030
because the AI will still have eyes here:

01:45.030 --> 01:47.495
we're still going to do deep reinforcement learning.

01:47.495 --> 01:49.890
So it's actually super challenging,

01:49.890 --> 01:52.320
and that's why we don't really have a choice

01:52.320 --> 01:56.040
but to implement the most powerful version of the A3C.

01:56.040 --> 01:59.130
Now, why do I say this is the most powerful version?

01:59.130 --> 02:00.360
That's for a particular reason.

02:00.360 --> 02:02.910
It's not that I'm simply claiming I'm going to

02:02.910 --> 02:05.430
implement the most powerful version of the A3C.

02:05.430 --> 02:08.500
No, the reason I'm saying this is that

02:08.500 --> 02:10.650
the version we're about to implement,

02:10.650 --> 02:12.840
and this is something very special we're gonna do,

02:12.840 --> 02:15.840
is actually a version of the A3C that was implemented

02:15.840 --> 02:18.360
by somebody but corrected by

02:18.360 --> 02:20.130
one of the most influential people

02:20.130 --> 02:21.510
in machine learning today,

02:21.510 --> 02:24.780
who happens to be one of the creators of PyTorch.

02:24.780 --> 02:27.600
His name is Adam Paszke.

02:27.600 --> 02:29.010
So now, what we're gonna do:

02:29.010 --> 02:32.250
we're gonna go to the PyTorch main page on GitHub.

02:32.250 --> 02:34.560
And if you scroll down to the end,

02:34.560 --> 02:36.864
down to the bottom, you will see the team,

02:36.864 --> 02:40.770
the team of PyTorch creators and contributors.

02:40.770 --> 02:44.070
And you can see here that PyTorch is currently maintained

02:44.070 --> 02:45.660
by Adam Paszke, among others.

02:45.660 --> 02:48.840
That's the person we should really be grateful to,

02:48.840 --> 02:51.480
because there are very few versions of the A3C

02:51.480 --> 02:53.190
that work well for Breakout,

02:53.190 --> 02:54.630
and he corrected one of these implementations

02:54.630 --> 02:58.440
of the A3C to make it work perfectly well on Breakout.

02:58.440 --> 03:01.080
So Adam Paszke is not only maintaining PyTorch,

03:01.080 --> 03:03.630
but he's also one of the creators of PyTorch.

03:03.630 --> 03:04.860
And, as I said, today he's

03:04.860 --> 03:08.160
among the top 10 most influential people in machine learning.

03:08.160 --> 03:09.339
So we can feel confident

03:09.339 --> 03:11.790
that the version we are about to implement

03:11.790 --> 03:15.180
is probably the most powerful version of the A3C today.

03:15.180 --> 03:17.730
And so what is this implementation?

03:17.730 --> 03:20.550
Well, originally it comes from

03:20.550 --> 03:23.190
a developer called Ilya Kostrikov.

03:23.190 --> 03:24.023
And so as you can see,

03:24.023 --> 03:26.852
he did a PyTorch implementation of the A3C

03:26.852 --> 03:29.837
which originally didn't work well for Breakout.

03:29.837 --> 03:32.970
But then somebody made a pull request.

03:32.970 --> 03:34.830
If we go to the pull requests here,

03:34.830 --> 03:37.920
we can see among the closed ones that, there we go,

03:37.920 --> 03:41.160
we have this one, a cleaner solution to the gradient sharing problem.

03:41.160 --> 03:43.950
And guess who this pull request was made by?

03:43.950 --> 03:47.280
It was made by Adam Paszke, one of the creators of PyTorch.

03:47.280 --> 03:50.361
And that solved the problem and made the A3C

03:50.361 --> 03:52.800
work very well on Breakout

03:52.800 --> 03:54.832
without waiting for days and days.

03:54.832 --> 03:58.470
And so, if we go back to this implementation,

03:58.470 --> 04:00.529
we can see the four contributors

04:00.529 --> 04:02.768
of this most powerful implementation.

04:02.768 --> 04:04.530
And here are the contributors.

04:04.530 --> 04:06.630
So thank you very much to all of them,

04:06.630 --> 04:08.850
and we can say a huge and special thank you

04:08.850 --> 04:12.780
to Adam Paszke for fixing the gradient sharing problem.

04:12.780 --> 04:15.480
He started by making a fork, which is his own copy

04:15.480 --> 04:17.910
of the repository, and then he made a pull request

04:17.910 --> 04:20.207
to the developer to fix the problem

04:20.207 --> 04:24.360
that was in the code, which is a gradient sharing problem,

04:24.360 --> 04:26.820
and that's how he became a major contributor

04:26.820 --> 04:28.140
to this implementation,

04:28.140 --> 04:30.600
making the whole thing work perfectly well.

04:30.600 --> 04:31.710
And trust me, I did a lot

04:31.710 --> 04:34.470
of experimentation on the A3C model.

04:34.470 --> 04:36.054
I actually implemented five models.

04:36.054 --> 04:38.250
I even got desperate because it didn't work well,

04:38.250 --> 04:41.520
so I made my own Breakout in Kivy to have a bigger ball

04:41.520 --> 04:44.310
and therefore easier pre-processing of the images.

04:44.310 --> 04:47.760
Then I went back to OpenAI Gym and made my own implementation

04:47.760 --> 04:51.090
of the A3C, but that took ages to run

04:51.090 --> 04:53.430
and train, even on a pretty powerful computer.

04:53.430 --> 04:55.380
So I wanted to find a better way.

04:55.380 --> 04:58.800
And that's how I found this very powerful implementation

04:58.800 --> 05:00.780
of the A3C model, of which one

05:00.780 --> 05:04.560
of the major contributors is one of the creators of PyTorch.

05:04.560 --> 05:07.320
So what we're gonna do in this module,

05:07.320 --> 05:08.880
and I think you're ready for that,

05:08.880 --> 05:11.910
is implement this highest-level code

05:11.910 --> 05:13.678
for the implementation of the A3C.

05:13.678 --> 05:16.688
So we're basically going to re-implement all these files,

05:16.688 --> 05:18.660
and we will mostly focus

05:18.660 --> 05:21.780
on the files that are directly related to the A3C.

05:21.780 --> 05:24.660
For all the parts that are directly related to the A3C,

05:24.660 --> 05:27.030
we will implement the code line by line;

05:27.030 --> 05:29.490
for the others, I will just explain the code.

05:29.490 --> 05:31.560
So we should be able to tackle this

05:31.560 --> 05:33.840
without finding it too overwhelming.

05:33.840 --> 05:34.680
So there we go.

05:34.680 --> 05:36.180
Quite a special module.

05:36.180 --> 05:39.030
Not only are we working on a state-of-the-art AI model,

05:39.030 --> 05:41.190
but also, at the time I'm speaking,

05:41.190 --> 05:43.530
I'm highly confident we're implementing the most

05:43.530 --> 05:46.050
powerful version of the A3C.

05:46.050 --> 05:47.100
So let's do it.

05:47.100 --> 05:48.570
Let's go back to Python.

05:48.570 --> 05:50.580
And let's start all this.

05:50.580 --> 05:51.690
Before we start,

05:51.690 --> 05:54.030
we're gonna do the simplest thing we'll do

05:54.030 --> 05:56.940
in this module: setting the working directory folder.

05:56.940 --> 05:59.730
So let's go to our AI A to Z template folder,

05:59.730 --> 06:02.910
module three, Breakout, the most challenging one.

06:02.910 --> 06:03.810
And there we go.

06:03.810 --> 06:05.220
Those are all our files.

06:05.220 --> 06:09.120
So let's see which ones are directly related to the A3C,

06:09.120 --> 06:11.670
and therefore which ones we're going to implement

06:11.670 --> 06:14.820
line by line and focus our energy on.

06:14.820 --> 06:16.440
So there are actually two files.

06:16.440 --> 06:19.470
The first one is model.py, which is this one.

06:19.470 --> 06:21.810
So we will re-implement it line by line

06:21.810 --> 06:22.950
because that's the most important one.

06:22.950 --> 06:25.530
That's where we make the brains of the A3C.

06:25.530 --> 06:27.630
And the most important thing to understand here is

06:27.630 --> 06:30.600
that we will have a shared model which will have

06:30.600 --> 06:33.921
the same weight updates for the actor and the critic.

06:33.921 --> 06:36.573
So that's part of the special version of the A3C:

06:36.573 --> 06:40.080
the shared model with the shared weight updates.

06:40.080 --> 06:42.630
And then the other most important file

06:42.630 --> 06:44.970
that we will implement line by line

06:44.970 --> 06:47.770
is, of course, the train.py file.

06:47.770 --> 06:51.480
Right after we make the brains of the A3C,

06:51.480 --> 06:53.160
well, we have to train them,

06:53.160 --> 06:56.160
and we train them in this train.py file.

06:56.160 --> 06:57.810
So this is quite a long file,

06:57.810 --> 07:01.588
but this is what contains the heart of the A3C model,

07:01.588 --> 07:05.040
which will have two losses to reduce: the value loss,

07:05.040 --> 07:06.326
which is the loss related

07:06.326 --> 07:10.560
to the predictions of the critic, and the policy loss,

07:10.560 --> 07:14.820
which is the loss related to the predictions of the actor.

07:14.820 --> 07:15.750
So this is quite new,

07:15.750 --> 07:18.060
but you know, that's because in the A3C

07:18.060 --> 07:20.790
we're basically working with several agents,

07:20.790 --> 07:23.190
each one having its own copy of the environment,

07:23.190 --> 07:25.140
but we also have this fully connected layer

07:25.140 --> 07:27.840
that outputs the value of the V function,

07:27.840 --> 07:29.850
and that is basically a common view

07:29.850 --> 07:31.530
of what's happening in the game.

07:31.530 --> 07:33.878
So, this will be quite challenging.

07:33.878 --> 07:36.450
So make sure to be in good shape.

07:36.450 --> 07:38.250
And for the rest of the files,

07:38.250 --> 07:40.890
well, I will just explain them in detail,

07:40.890 --> 07:43.290
but without spending too much time on them.

07:43.290 --> 07:45.810
Believe me, you want to keep your energy for this.

07:45.810 --> 07:47.160
This will already be a lot.

07:47.160 --> 07:49.380
So these files are

07:49.380 --> 07:50.640
envs.py,

07:50.640 --> 07:52.299
which is an improvement

07:52.299 --> 07:55.500
of the Gym environment thanks to Universe.

07:55.500 --> 07:56.333
So basically,

07:56.333 --> 07:59.220
that file just improves the Gym environment with Universe,

07:59.220 --> 08:01.620
and that allows us to have optimal pre-processing

08:01.620 --> 08:02.670
of the images

08:02.670 --> 08:06.120
and also to normalize all the values of the environment,

08:06.120 --> 08:09.919
like the color intensities or the reward intensities.

08:09.919 --> 08:12.270
Well, all the values of the environment:

08:12.270 --> 08:14.490
this file normalizes all these values.

08:14.490 --> 08:17.010
And it also makes sure we have optimal pre-processing

08:17.010 --> 08:18.120
of the images.

08:18.120 --> 08:19.380
And as you can see, this is taken

08:19.380 --> 08:24.380
from this OpenAI GitHub page with the Universe starter agent.

08:24.810 --> 08:27.450
So we will not spend too much time on this.

08:27.450 --> 08:28.860
We will actually stop here.

08:28.860 --> 08:29.970
You just need to understand

08:29.970 --> 08:33.270
that we improve the Gym environment with Universe

08:33.270 --> 08:36.057
to get optimal pre-processing of the images.

08:36.057 --> 08:40.290
The rest is not that important, especially for the A3C.

08:40.290 --> 08:42.300
Then we have main.py,

08:42.300 --> 08:45.150
which is the code that will execute the whole thing.

08:45.150 --> 08:47.400
So, you know, the code that will run the whole thing:

08:47.400 --> 08:49.887
create the brain, train the brain, and output the videos.

08:49.887 --> 08:54.300
And that's because it will run all these files here.

08:54.300 --> 08:58.170
So model.py, we saw what it was. Then my_optim.py

08:58.170 --> 09:02.384
is a special optimizer that's basically the Adam optimizer

09:02.384 --> 09:06.240
but adapted to this shared model that we're implementing.

09:06.240 --> 09:08.774
So we will explain all this code in one tutorial.

09:08.774 --> 09:13.050
Then we have test.py, which is actually the last one.

09:13.050 --> 09:15.288
So test.py is basically the file that will

09:15.288 --> 09:17.460
implement a test agent,

09:17.460 --> 09:20.370
that is, an agent that will play Breakout

09:20.370 --> 09:22.500
without updating the model.

09:22.500 --> 09:24.840
So that's totally independent from the training.

09:24.840 --> 09:27.510
And we will also explain this code in detail.

09:27.510 --> 09:31.500
Besides, the good news is that you will have two versions of the code:

09:31.500 --> 09:33.660
one, which will be the code we implement

09:33.660 --> 09:36.240
in the tutorials, but without any comments,

09:36.240 --> 09:37.810
and another one, which is

09:37.810 --> 09:40.680
in the code folder, with all the code commented.

09:40.680 --> 09:44.160
So all these six files are well commented,

09:44.160 --> 09:46.830
so that if you miss something in any tutorial,

09:46.830 --> 09:48.240
well, you will be able to look

09:48.240 --> 09:51.180
at the commented code to understand what's going on.

09:51.180 --> 09:52.080
So there we go.

09:52.080 --> 09:54.480
I hope you're excited to implement this.

09:54.480 --> 09:56.966
You're really at the top of the mountain now, or just

09:56.966 --> 09:59.730
below the top, because you need to understand this first,

09:59.730 --> 10:02.535
but you're getting there, so take a good breath of oxygen.

10:02.535 --> 10:05.670
And there we go for this super exciting journey.

10:05.670 --> 10:07.653
Until then, enjoy AI.
