WEBVTT

00:00.300 --> 00:01.133
Narrator: Hello,

00:01.133 --> 00:02.430
and welcome to the first part

00:02.430 --> 00:04.260
of this AI implementation,

00:04.260 --> 00:06.060
part one, building the AI.

00:06.060 --> 00:09.060
And as you can see, I already added a dimension

00:09.060 --> 00:11.130
in the structure of this implementation,

00:11.130 --> 00:14.130
with these three sections composing part one,

00:14.130 --> 00:16.620
which clearly show how we're gonna build this AI.

00:16.620 --> 00:18.810
First, we're gonna make the brain,

00:18.810 --> 00:20.940
that is nothing else than the neural network,

00:20.940 --> 00:22.830
then we're gonna make the body,

00:22.830 --> 00:25.590
which will define how the actions are gonna be played.

00:25.590 --> 00:27.780
And then once we have the brain and the body,

00:27.780 --> 00:30.660
we will assemble them to make the AI,

00:30.660 --> 00:32.460
which will be in the last section

00:32.460 --> 00:33.810
of this first part.

00:33.810 --> 00:35.880
So now you can already have a good vision

00:35.880 --> 00:37.890
of the structure of this implementation.

00:37.890 --> 00:39.180
First, we make the brain,

00:39.180 --> 00:40.230
then we make the body,

00:40.230 --> 00:42.870
and then we assemble the two to make the AI.

00:42.870 --> 00:44.790
And in this tutorial, we're gonna start

00:44.790 --> 00:46.590
with the first section

00:46.590 --> 00:48.810
that is about making the brain.

00:48.810 --> 00:51.330
And this is going to take us four tutorials,

00:51.330 --> 00:53.310
you can imagine that making a brain

00:53.310 --> 00:54.900
is not like making a cake,

00:54.900 --> 00:57.270
so this will require more than one tutorial.

00:57.270 --> 00:58.710
And of course, as usual,

00:58.710 --> 01:02.400
we are going to represent that brain with a class,

01:02.400 --> 01:04.650
because we will need several functions.

01:04.650 --> 01:08.670
And in order to have a structure of several functions

01:08.670 --> 01:11.310
that will be organized in some kind of instructions,

01:11.310 --> 01:13.170
well, of course, we need a class.

01:13.170 --> 01:14.003
And that's perfect,

01:14.003 --> 01:16.050
because once we make that class,

01:16.050 --> 01:19.350
well we will be able to create as many brains as we want

01:19.350 --> 01:22.050
by just creating some objects of this class.

01:22.050 --> 01:24.090
So again, classes in Python,

01:24.090 --> 01:24.923
and in general,

01:24.923 --> 01:28.200
in object-oriented programming languages, are very practical

01:28.200 --> 01:31.110
because you make your model of something you wanna build

01:31.110 --> 01:34.320
and then you're able to create as many objects as you want,

01:34.320 --> 01:36.270
and they will have all the features

01:36.270 --> 01:37.860
that you define in the class.

01:37.860 --> 01:40.500
And for our brain, the features will be, of course,

01:40.500 --> 01:41.333
well first of all,

01:41.333 --> 01:43.470
the architecture of the neural network,

01:43.470 --> 01:45.360
which, I remind you, will be a CNN,

01:45.360 --> 01:47.220
and of course, the different functions,

01:47.220 --> 01:48.150
like for example,

01:48.150 --> 01:51.150
passing on the signals from the input neurons

01:51.150 --> 01:52.410
to the output neurons.

01:52.410 --> 01:53.250
So that will be, of course,

01:53.250 --> 01:55.890
the forward function that we will make.

01:55.890 --> 01:56.910
So let's do this.

01:56.910 --> 01:58.890
Let's start making the brain,

01:58.890 --> 02:00.660
this is going to be pretty exciting.

02:00.660 --> 02:02.490
It is one of my favorite parts,

02:02.490 --> 02:04.860
and therefore let's get straight into it.

02:04.860 --> 02:09.420
So we're gonna start by introducing the class, of course,

02:09.420 --> 02:10.920
and we're gonna call this class

02:10.920 --> 02:13.860
well, I hesitated to call it brain,

02:13.860 --> 02:15.270
but let's be more direct

02:15.270 --> 02:17.340
and let's call it CNN,

02:17.340 --> 02:20.520
because actually the brain is a CNN network,

02:20.520 --> 02:22.500
a convolutional neural network.

02:22.500 --> 02:24.600
So as you want, you can call it brain if you want,

02:24.600 --> 02:27.483
but at least now we know what we're building.

02:28.650 --> 02:32.400
And CNN, as for the network of the self-driving car,

02:32.400 --> 02:35.190
is going to inherit from the nn.Module.

02:35.190 --> 02:36.840
So remember, the nn.Module

02:36.840 --> 02:40.170
is what we imported here

02:40.170 --> 02:42.720
and we want to be able to use all the tools

02:42.720 --> 02:44.100
of this nn.Module

02:44.100 --> 02:47.040
and therefore, we want to use this technique

02:47.040 --> 02:48.750
in object-oriented programming,

02:48.750 --> 02:50.040
which is inheritance,

02:50.040 --> 02:52.200
and which allows us to, you know,

02:52.200 --> 02:55.320
well, use all the tools from a parent class.

02:55.320 --> 02:59.940
And this parent class is going to be nn.Module

02:59.940 --> 03:00.900
There we go.

03:00.900 --> 03:02.910
And now we can use all the tools

03:02.910 --> 03:05.520
and objects of the nn.Module.
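
NOTE
A minimal sketch of the inheritance step just described, assuming PyTorch is installed and imported as torch.nn. The class body is left empty on purpose; this only shows that CNN becomes a child of nn.Module.
```python
import torch.nn as nn
# CNN inherits from nn.Module, so it can use the parent class's tools
# (parameter tracking, .parameters(), .to(device), and so on).
class CNN(nn.Module):
    pass
# Quick check that the inheritance is in place.
print(issubclass(CNN, nn.Module))
```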

03:05.520 --> 03:06.353
All right?

03:06.353 --> 03:08.070
So now that we have our inheritance,

03:08.070 --> 03:12.120
we can go into the class to make our first function.

03:12.120 --> 03:13.470
And as you probably guess,

03:13.470 --> 03:16.350
the first function is the init function,

03:16.350 --> 03:18.690
that will define all the variables

03:18.690 --> 03:20.340
of the future brain objects,

03:20.340 --> 03:22.170
you know, the future CNN objects

03:22.170 --> 03:23.670
that will be created.

03:23.670 --> 03:24.810
All right, so let's do this,

03:24.810 --> 03:25.643
def __init__

03:29.910 --> 03:32.070
and now we need to input some variables.

03:32.070 --> 03:35.310
So first, the first variable is going to be self.

03:35.310 --> 03:37.500
That of course, refers to the object.

03:37.500 --> 03:39.840
Now, I guess you're pretty comfortable with this.

03:39.840 --> 03:42.060
Then we are gonna add another variable

03:42.060 --> 03:44.040
which will be the number of actions

03:44.040 --> 03:45.480
in the Doom environment.

03:45.480 --> 03:50.480
So we're gonna call it number_actions, number of actions.

03:50.550 --> 03:53.160
And actually this variable is not compulsory

03:53.160 --> 03:54.480
for the init function,

03:54.480 --> 03:56.820
it's just that if you want to test the AI

03:56.820 --> 03:57.920
that we're gonna build

03:58.862 --> 03:59.695
on other Doom environments,

03:59.695 --> 04:01.080
well this will be very practical,

04:01.080 --> 04:05.010
because we will import this number of actions variable

04:05.010 --> 04:08.460
from the Doom environment wrappers with ToDiscrete,

04:08.460 --> 04:10.470
and when doing that, we will, you know,

04:10.470 --> 04:14.160
input the name of the environment, Doom v0,

04:14.160 --> 04:16.110
and so if you want to, you know,

04:16.110 --> 04:17.670
experiment with this AI

04:17.670 --> 04:18.990
on other Doom environments

04:18.990 --> 04:20.820
and play on other games,

04:20.820 --> 04:22.080
well you won't have anything to do,

04:22.080 --> 04:24.210
because this number of actions

04:24.210 --> 04:26.310
will directly get the number of actions

04:26.310 --> 04:29.070
in the Doom environment you'll be playing with.
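
NOTE
A rough illustration of why passing the number of actions as an argument is practical. The two classes below are hypothetical stand-ins that only mimic a Gym-style environment with a discrete action space (an action_space object with an n attribute); they are not the real Doom environment, and get_number_actions is a name made up for this sketch.
```python
# Hypothetical stand-ins for a Gym-style environment, used only so that
# this sketch runs on its own without the Doom packages.
class FakeDiscreteSpace:
    def __init__(self, n):
        self.n = n  # number of available actions
class FakeDoomEnv:
    def __init__(self, n_actions):
        self.action_space = FakeDiscreteSpace(n_actions)
def get_number_actions(env):
    # Read the action count from the environment instead of hard-coding it.
    return env.action_space.n
# Two pretend games with different action counts (made-up values).
first_game = FakeDoomEnv(7)
other_game = FakeDoomEnv(3)
print(get_number_actions(first_game), get_number_actions(other_game))
```
Switching to another environment then changes the value without touching the network code.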

04:29.070 --> 04:29.903
Okay.

04:29.903 --> 04:32.760
So that's it for the two arguments of this init function,

04:32.760 --> 04:34.260
so now we can go inside.

04:34.260 --> 04:36.480
And now, remember what we have to do.

04:36.480 --> 04:39.930
The first thing we have to do is activate the inheritance

04:39.930 --> 04:41.610
with the super function.

04:41.610 --> 04:43.950
So that's exactly like for the self-driving car,

04:43.950 --> 04:46.140
we take the super function,

04:46.140 --> 04:49.500
then inside we start by inputting the class

04:49.500 --> 04:51.510
that will define the neural network,

04:51.510 --> 04:53.640
and that is CNN.

04:53.640 --> 04:55.650
Then we have to input self,

04:55.650 --> 04:57.180
to refer to the object.

04:57.180 --> 04:58.380
But then remember, that's not all,

04:58.380 --> 05:00.630
we need to add here a dot

05:00.630 --> 05:02.230
and then the init function

05:03.810 --> 05:05.250
with some parentheses.

05:05.250 --> 05:07.950
And by doing that, we activate the inheritance,

05:07.950 --> 05:11.490
and now we can use all the tools from the nn.Module.
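
NOTE
A minimal sketch of the init function as described so far: two arguments, then the super call. Storing number_actions on the object and the value 7 are assumptions made only for this illustration. In Python 3, super().__init__() without arguments does the same thing.
```python
import torch.nn as nn
class CNN(nn.Module):
    def __init__(self, number_actions):
        # Run nn.Module's own __init__ first, so that layers assigned to
        # self afterwards are registered properly.
        super(CNN, self).__init__()
        # Keeping the argument on the object is an assumption of this sketch.
        self.number_actions = number_actions
# Hypothetical value; in the course it comes from the Doom environment.
brain = CNN(number_actions=7)
print(isinstance(brain, nn.Module))
```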

05:11.490 --> 05:12.323
All right.

05:12.323 --> 05:13.590
So now I think it's time

05:13.590 --> 05:17.220
to build the architecture of the neural network.

05:17.220 --> 05:20.413
And so as you remember, we are gonna build a CNN,

05:20.413 --> 05:22.500
a convolutional neural network,

05:22.500 --> 05:25.620
simply because this time, the AI will have eyes,

05:25.620 --> 05:29.370
and the eyes of the AI will be the convolutional layers

05:29.370 --> 05:31.500
of this convolutional neural network.

05:31.500 --> 05:34.830
And then after the AI visualizes the images

05:34.830 --> 05:36.510
with the convolutional layers,

05:36.510 --> 05:38.310
it will pass on the signals

05:38.310 --> 05:41.160
into a classic artificial neural network,

05:41.160 --> 05:42.570
so that is the one that we had before,

05:42.570 --> 05:44.640
with fully connected layers.

05:44.640 --> 05:47.730
And that's where it will try to predict the Q values

05:47.730 --> 05:50.343
for each possible action that we can play.

05:51.240 --> 05:53.760
So you have the architecture in mind,

05:53.760 --> 05:56.220
first, we're gonna have some convolutional layers,

05:56.220 --> 05:58.290
and then some fully connected layers,

05:58.290 --> 06:01.500
and this will be the brain of our AI.

06:01.500 --> 06:04.920
So what we're gonna do to, you know,

06:04.920 --> 06:07.650
be able to take a step back from what we're making,

06:07.650 --> 06:09.900
well, let's just make this architecture

06:09.900 --> 06:11.790
with the variables we want to create.

06:11.790 --> 06:14.640
So, actually, speaking of this architecture

06:14.640 --> 06:16.680
we are gonna build a CNN

06:16.680 --> 06:18.900
with three convolutional layers,

06:18.900 --> 06:21.330
and then after that, one hidden layer.

06:21.330 --> 06:24.360
That means that we will need three convolutional connections

06:24.360 --> 06:26.160
and two full connections.

06:26.160 --> 06:27.660
And speaking of connections,

06:27.660 --> 06:29.580
that's exactly what we're about to define,

06:29.580 --> 06:32.490
that will be the variables of our CNN class,

06:32.490 --> 06:33.570
and therefore, right now I'm going

06:33.570 --> 06:35.130
to define five variables,

06:35.130 --> 06:37.440
three for the convolutional connections,

06:37.440 --> 06:39.360
and two for the full connections.

06:39.360 --> 06:40.193
So let's do this.

06:40.193 --> 06:42.750
We're gonna start with the convolution connections.

06:42.750 --> 06:46.063
So I'm gonna call them self.convolution1

06:48.605 --> 06:50.160
I'm gonna copy that

06:50.160 --> 06:54.000
and paste them below.

06:54.000 --> 06:55.110
And there we go,

06:55.110 --> 06:56.700
self.convolution2

06:56.700 --> 06:58.203
and self.convolution3.

06:59.040 --> 07:01.320
That's our convolution connections.

07:01.320 --> 07:03.630
So this first convolution1 here

07:03.630 --> 07:06.750
will apply some convolution to the input images

07:06.750 --> 07:09.300
to get a first convolutional layer.

07:09.300 --> 07:11.619
Then the second convolution2 here,

07:11.619 --> 07:15.180
will take the first convolutional layer as input,

07:15.180 --> 07:17.160
and by applying again some convolution,

07:17.160 --> 07:19.830
it will create a second convolutional layer.

07:19.830 --> 07:21.210
And in this convolutional layer,

07:21.210 --> 07:23.490
we will get some new images,

07:23.490 --> 07:26.010
each of them detecting one specific feature.

07:26.010 --> 07:27.570
So we will get these new images

07:27.570 --> 07:29.130
in a convolutional layer,

07:29.130 --> 07:32.580
then we will apply this convolution2 here

07:32.580 --> 07:35.310
to connect these new images

07:35.310 --> 07:37.230
from this first convolutional layer

07:37.230 --> 07:40.410
to some new images of a second convolutional layer.

07:40.410 --> 07:41.970
And these new images, again,

07:41.970 --> 07:43.500
will detect some features

07:43.500 --> 07:46.200
in the images of the first convolutional layer,

07:46.200 --> 07:49.200
so it's just to reinforce the feature detection.

07:49.200 --> 07:52.770
And then to the images of the second convolutional layer,

07:52.770 --> 07:55.560
we apply the third convolution here

07:55.560 --> 07:57.030
to get for each of them,

07:57.030 --> 08:00.300
some more images that detect even more features

08:00.300 --> 08:02.010
inside the input images.

08:02.010 --> 08:03.300
And so the more we do this,

08:03.300 --> 08:05.220
the more we apply some convolutions

08:05.220 --> 08:07.500
to the different layers of images,

08:07.500 --> 08:08.610
well, the more we are able

08:08.610 --> 08:10.470
to detect some features.

08:10.470 --> 08:12.840
And that's how by detecting the features,

08:12.840 --> 08:15.210
the AI will understand where the monsters are,

08:15.210 --> 08:16.890
where it has to shoot to kill them,

08:16.890 --> 08:18.450
and where it should go to.

08:18.450 --> 08:20.250
It will also detect the walls,

08:20.250 --> 08:21.420
the obstacles,

08:21.420 --> 08:23.310
well literally, where it has to go.

08:23.310 --> 08:25.350
And that is thanks to

08:25.350 --> 08:27.990
what all these convolutional layers detect

08:27.990 --> 08:29.752
in the original input images.

08:29.752 --> 08:30.750
All right.

08:30.750 --> 08:35.100
So that's for the convolution part of the CNN,

08:35.100 --> 08:37.980
but then remember, after the convolutional layers,

08:37.980 --> 08:42.120
we have to flatten all the pixels obtained

08:42.120 --> 08:43.110
by the different series

08:43.110 --> 08:44.850
of convolutions that were applied.

08:44.850 --> 08:47.460
And by flattening all the arrays of pixels,

08:47.460 --> 08:49.740
we get this huge vector

08:49.740 --> 08:51.180
that will become the input

08:51.180 --> 08:53.460
of a classic artificial neural network,

08:53.460 --> 08:56.190
and that's where we get our fully connected layers,

08:56.190 --> 08:58.680
and therefore, our full connections.
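
NOTE
A small sketch of the flattening step, assuming PyTorch. The channel counts, kernel sizes and the 32x32 grayscale input are assumptions picked for illustration, since the layers have not been defined yet at this point. Activations are left out to keep the focus on the shapes.
```python
import torch
import torch.nn as nn
# Assumed layer sizes, chosen for illustration only.
convolution1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5)
convolution2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3)
convolution3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=2)
# One fake grayscale image: (batch, channels, height, width).
images = torch.rand(1, 1, 32, 32)
# Chain the three convolutions: each one takes the previous layer as input.
x = convolution3(convolution2(convolution1(images)))
# Flatten all the feature maps into one long vector per image.
flat = x.view(x.size(0), -1)
print(x.shape, flat.shape)
```
With these assumed sizes, the 64 feature maps of 25x25 pixels become one vector of 40000 values, which is what the first full connection would receive.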

08:58.680 --> 09:00.360
And so now, what we have to do

09:00.360 --> 09:02.670
is create two new variables

09:02.670 --> 09:04.800
because we're gonna have one hidden layer

09:04.800 --> 09:07.530
in this classic artificial neural network that comes next,

09:07.530 --> 09:09.360
and therefore we need one full connection

09:09.360 --> 09:11.730
from this huge flattened vector

09:11.730 --> 09:13.530
to this one hidden layer.

09:13.530 --> 09:17.010
And a second full connection between this one hidden layer

09:17.010 --> 09:18.180
and the output layer,

09:18.180 --> 09:19.740
composed of the output neurons

09:19.740 --> 09:21.930
that are the Q values.

09:21.930 --> 09:24.240
So let's make these two full connections

09:24.240 --> 09:27.210
and then we will define all these connections.

09:27.210 --> 09:28.890
So as for the self-driving car,

09:28.890 --> 09:32.730
we're gonna call them self.fc1

09:32.730 --> 09:36.210
and then self.fc2.

09:36.210 --> 09:37.043
All right.

09:37.043 --> 09:38.460
So now we have all our variables.

09:38.460 --> 09:40.170
And so now what we have to do, is of course,

09:40.170 --> 09:44.280
define them with the classes of the nn.Module.

09:44.280 --> 09:46.740
So that means we'll basically create the architecture

09:46.740 --> 09:47.820
of the neural network,

09:47.820 --> 09:49.170
and that's what we'll do

09:49.170 --> 09:50.460
in the next tutorial.

09:50.460 --> 09:52.203
Until then, enjoy AI.