WEBVTT

00:00.450 --> 00:02.910
Hello and welcome to this tutorial.

00:02.910 --> 00:05.310
Now we are gonna make this second function

00:05.310 --> 00:06.780
to initialize the weights,

00:06.780 --> 00:10.860
and this one will be used to get optimal learning

00:10.860 --> 00:13.230
of these weights.

00:13.230 --> 00:14.640
So the second function,

00:14.640 --> 00:16.747
we're gonna call it weights_init,

00:19.980 --> 00:24.000
and it will take as argument the object M,

00:24.000 --> 00:26.160
which will represent the neural network.

00:26.160 --> 00:27.090
So that's all.

00:27.090 --> 00:28.620
And then colon,

00:28.620 --> 00:30.360
and now let's get inside the function

00:30.360 --> 00:32.550
to define what we want it to do.

00:32.550 --> 00:36.390
So basically what we want it to do is initialize the weights

00:36.390 --> 00:37.470
of the neural network

00:37.470 --> 00:40.050
in such a way that we get optimal learning.

00:40.050 --> 00:43.560
So this will not seem particularly intuitive.

00:43.560 --> 00:46.740
This is based on research papers and experiments.

00:46.740 --> 00:48.480
We are going to initialize the weights

00:48.480 --> 00:51.240
in a specific way that we haven't seen before,

00:51.240 --> 00:54.900
but believe me, that will optimize the learning process.

00:54.900 --> 00:56.610
So we will just implement it

00:56.610 --> 00:59.040
without getting into the details of why

00:59.040 --> 01:00.900
we initialize the weights this way.

01:00.900 --> 01:03.810
And so we're gonna start by using a trick

01:03.810 --> 01:05.850
which will be used later to make the distinction

01:05.850 --> 01:09.660
between the convolution and the full connection.

01:09.660 --> 01:12.840
Because remember, our AI will have eyes

01:12.840 --> 01:15.330
and therefore it will have some convolutional layers,

01:15.330 --> 01:18.210
and of course it will also have some fully connected layers,

01:18.210 --> 01:20.547
and we will have a different initialization

01:20.547 --> 01:23.790
of the weight for these two types of connections.

01:23.790 --> 01:26.220
So we're gonna use this trick to separate

01:26.220 --> 01:27.810
these two kinds of connections,

01:27.810 --> 01:29.700
and then we'll use some if conditions

01:29.700 --> 01:32.070
to get a different initialization

01:32.070 --> 01:34.140
for each of these connections.

01:34.140 --> 01:36.720
So this trick is to create a new variable

01:36.720 --> 01:39.150
that we're gonna call classname,

01:39.150 --> 01:42.780
and that will be equal to M, our object.

01:42.780 --> 01:44.700
So M represents the neural network,

01:44.700 --> 01:46.037
but it's an object.

01:46.037 --> 01:47.400
We will see that later.

01:47.400 --> 01:51.390
And we're gonna get a special attribute from this object,

01:51.390 --> 01:56.340
which will be the class name: double underscore, class,

01:56.340 --> 02:00.480
double underscore, dot, double underscore again, name,

02:00.480 --> 02:03.330
and almost there, another double underscore.

02:03.330 --> 02:07.140
So that's a pretty ugly trick to look for the type

02:07.140 --> 02:09.720
of connection of our neural network object,

02:09.720 --> 02:12.180
but that will give us exactly what we want.

02:12.180 --> 02:13.440
You're gonna see it's gonna make sense

02:13.440 --> 02:15.510
when we start the if conditions.

02:15.510 --> 02:17.760
And by the way, speaking of if conditions,

02:17.760 --> 02:19.650
we can start them right now.

02:19.650 --> 02:22.140
And so what we're gonna do now is

02:22.140 --> 02:24.480
start a first if condition,

02:24.480 --> 02:26.250
which will get us the first case,

02:26.250 --> 02:29.490
that is, if the connection is a convolution.

02:29.490 --> 02:30.720
And so to write this condition,

02:30.720 --> 02:35.520
we write if classname dot find.

02:35.520 --> 02:38.820
Here we use a method, the find method,

02:38.820 --> 02:42.273
find, and inside I input in quotes,

02:43.157 --> 02:45.270
Conv for convolution.

02:45.270 --> 02:49.380
And so if classname dot find Conv is,

02:49.380 --> 02:53.040
we're gonna do different than minus one.

02:53.040 --> 02:55.470
That is actually if we have a convolution,

02:55.470 --> 02:58.170
because minus one means not found.

02:58.170 --> 03:02.130
Well, in that case we will do a special initialization

03:02.130 --> 03:03.390
of the weights.

03:03.390 --> 03:04.980
So this condition here means

03:04.980 --> 03:07.800
if we have a convolution connection.
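
NOTE
A minimal sketch of the class-name check described above.
Conv2d and ReLU here are empty stand-in classes (an assumption for
illustration); in the tutorial, m is a real layer object.
```python
# Hypothetical stand-in classes, used only to show how the check behaves.
class Conv2d:
    pass
class ReLU:
    pass
def is_convolution(m):
    # The "trick": read the name of the object's class as a string.
    classname = m.__class__.__name__
    # str.find returns the index of the first match, or -1 when there is none.
    return classname.find('Conv') != -1
conv_found = is_convolution(Conv2d())
relu_found = is_convolution(ReLU())
print(conv_found, relu_found)
```
Note that str.find is case-sensitive, so the string has to match the
capitalization used in the class name.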

03:07.800 --> 03:08.633
So in that case,

03:08.633 --> 03:12.750
what we do is run this specific initialization

03:12.750 --> 03:14.010
of the weights we wanna do.

03:14.010 --> 03:18.030
And so that's where all the non-intuitive things will come.

03:18.030 --> 03:20.280
We will start by creating a variable

03:20.280 --> 03:24.120
that we're gonna call weight underscore shape.

03:24.120 --> 03:27.150
So weight underscore shape will be a list

03:27.150 --> 03:30.150
that will basically contain the shape of the weight

03:30.150 --> 03:31.830
in our neural network M.

03:31.830 --> 03:36.150
And so we're gonna use the list function to create a list,

03:36.150 --> 03:39.810
and inside we're going to input M, the neural network,

03:39.810 --> 03:43.560
dot weight, which will be the weight of the neural network,

03:43.560 --> 03:46.500
but in the convolution connection.

03:46.500 --> 03:48.840
And to get the shape of these weights,

03:48.840 --> 03:53.160
we use another attribute, which is dot data

03:53.160 --> 03:55.320
and then size.

03:55.320 --> 03:57.750
Size will get us the shape of these weights

03:57.750 --> 03:59.640
in the convolution connection.

03:59.640 --> 04:01.860
So now weight shape contains

04:01.860 --> 04:04.290
in a list the shape of the weights

04:04.290 --> 04:08.460
in the convolution connections of our neural network M.
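
NOTE
A small sketch of reading the weight shape, assuming PyTorch is installed.
The layer sizes (3 input channels, 32 output channels, 3x3 kernel) are
hypothetical values picked for illustration, not taken from the course.
```python
import torch.nn as nn
# Hypothetical convolution layer; the sizes are illustrative assumptions.
m = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
# size() returns the shape of the weight tensor; list() turns it into a list.
weight_shape = list(m.weight.data.size())
# For Conv2d the layout is [out_channels, in_channels, kernel_h, kernel_w].
print(weight_shape)
```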

04:08.460 --> 04:10.050
All right, so we have weight shape.

04:10.050 --> 04:12.930
Then to initialize the weights

04:12.930 --> 04:14.520
of this convolution connection,

04:14.520 --> 04:16.650
we're gonna need two values.

04:16.650 --> 04:20.400
First is the product of the first dimension

04:20.400 --> 04:22.890
by the second dimension by the third dimension.

04:22.890 --> 04:24.840
So that's what we're gonna get right now.

04:24.840 --> 04:27.660
And then we will also need to get the zeroth dimension

04:27.660 --> 04:30.510
times the second dimension times the third dimension,

04:30.510 --> 04:32.400
and then we'll use these two values

04:32.400 --> 04:35.640
in the computation of how we initialize the weights.

04:35.640 --> 04:36.990
So let's get these two.

04:36.990 --> 04:40.293
This first product, we call it fan in,

04:41.250 --> 04:44.280
and that will be equal to the product,

04:44.280 --> 04:45.900
and we're gonna use the prod function,

04:45.900 --> 04:47.820
which is a function from NumPy,

04:47.820 --> 04:50.190
which has the shortcut np.

04:50.190 --> 04:52.800
So np dot prod,

04:52.800 --> 04:55.770
and inside prod we input what we want to make

04:55.770 --> 04:57.180
the product of.

04:57.180 --> 05:00.720
And so as we said, that is the dimension one, two,

05:00.720 --> 05:02.850
and three of our weight shape.

05:02.850 --> 05:04.200
And so to get this,

05:04.200 --> 05:06.070
we can take our weight shape

05:07.170 --> 05:10.500
and get the indexes of these three dimensions.

05:10.500 --> 05:13.230
And so we said it's dimension one

05:13.230 --> 05:15.990
up to dimension three included.

05:15.990 --> 05:19.020
So up to dimension four excluded.

05:19.020 --> 05:20.580
And that's how we can get it.

05:20.580 --> 05:24.390
Four, the upper bound here, is not included.

05:24.390 --> 05:26.160
So that's exactly what we want.

05:26.160 --> 05:28.923
Then same for fan out.

05:30.840 --> 05:34.380
As we said, fan out is going to be the product

05:34.380 --> 05:37.507
of the dimension zero times the dimension two

05:37.507 --> 05:39.720
times the dimension three.

05:39.720 --> 05:43.860
And so here we can get the index from two included

05:43.860 --> 05:45.690
to four excluded.

05:45.690 --> 05:49.170
So that will get us the product of dimension two and three.

05:49.170 --> 05:52.890
And then we can multiply it by the dimension zero,

05:52.890 --> 05:57.890
which we can access with weight shape of index zero.

05:59.190 --> 06:04.190
So to sum up, this is dim one times dim two times dim three,

06:08.580 --> 06:13.580
and just below we have dim zero times dim two

06:16.560 --> 06:20.310
times dim three of our weight shape list.
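
NOTE
The two products summed up above can be sketched as follows. The shape
[32, 3, 3, 3] is a hypothetical example; np.prod and the slices follow
the description in the lecture.
```python
import numpy as np
# Hypothetical convolution weight shape: [dim0, dim1, dim2, dim3].
weight_shape = [32, 3, 3, 3]
# dim1 * dim2 * dim3: the slice 1:4 includes index 1 and excludes index 4.
fan_in = np.prod(weight_shape[1:4])
# dim0 * dim2 * dim3: product of the slice 2:4, times dim0.
fan_out = np.prod(weight_shape[2:4]) * weight_shape[0]
print(fan_in, fan_out)
```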

06:20.310 --> 06:22.560
All right, so now we're gonna use these two values

06:22.560 --> 06:25.830
fan in and fan out to proceed to the initialization,

06:25.830 --> 06:28.350
because we're going to compute a new value

06:28.350 --> 06:31.140
that we're gonna call W bound,

06:31.140 --> 06:34.020
and that will be equal to the square root,

06:34.020 --> 06:38.393
which we can get with the function np, from NumPy, dot sqrt,

06:39.480 --> 06:40.830
exactly like before.

06:40.830 --> 06:45.830
So the square root of six divided by fan in plus fan out.

06:46.770 --> 06:51.630
So fan in plus fan out.

06:51.630 --> 06:52.680
There we go.
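
NOTE
A short sketch of the bound computation just described. The helper name
bound and the input numbers are assumptions for illustration only.
```python
import numpy as np
def bound(fan_in, fan_out):
    # Square root of six divided by (fan in plus fan out).
    return np.sqrt(6. / (fan_in + fan_out))
# Hypothetical values for a small and a ten times larger layer.
w_bound_small = bound(27, 288)
w_bound_large = bound(270, 2880)
print(w_bound_small, w_bound_large)
```
The larger layer gets the smaller bound, which is the sense in which the
random weights shrink as the tensor of weights grows.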

06:52.680 --> 06:54.990
So this W bound here gets smaller

00:54.990 --> 00:58.230
as the size of the tensor of weights gets bigger.

06:58.230 --> 06:59.730
And why did we get this?

06:59.730 --> 07:03.180
It's because then what we are just about to do now

07:03.180 --> 07:05.940
is we want to generate some random weights

07:05.940 --> 07:08.850
that are inversely proportional to the size

07:08.850 --> 07:10.170
of the tensor of weights,

07:10.170 --> 07:12.660
because indeed what we're about to do now

07:12.660 --> 07:15.213
is take our neural network M,

07:16.170 --> 07:18.540
then get its weight,

07:18.540 --> 07:21.240
so by still taking the attribute weight,

07:21.240 --> 07:26.070
then access its data, that is, the tensor itself.

07:26.070 --> 07:28.860
And then from this tensor of weights,

07:28.860 --> 07:32.760
we're gonna use the uniform function to generate some random weights that are

07:32.760 --> 07:37.140
inversely proportional to the size of the tensor of weights.

07:37.140 --> 07:39.150
And so in this uniform function now,

07:39.150 --> 07:40.890
we have to input a lower bound,

07:40.890 --> 07:43.593
which will be minus W bound,

07:44.430 --> 07:48.663
and the upper bound, which will be plus W bound.
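
NOTE
A minimal sketch of the uniform draw, assuming PyTorch is installed.
The layer and the value 0.125 for w_bound are placeholders chosen for
illustration; in the lecture w_bound comes from fan in and fan out.
```python
import torch.nn as nn
# Hypothetical layer and placeholder bound (assumptions, not course values).
m = nn.Conv2d(3, 32, 3)
w_bound = 0.125
# uniform_ overwrites the tensor in place with samples between the two bounds.
m.weight.data.uniform_(-w_bound, w_bound)
lowest = m.weight.data.min().item()
highest = m.weight.data.max().item()
print(lowest, highest)
```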

07:49.740 --> 07:52.470
Okay, so that's for the weights.

07:52.470 --> 07:55.680
And now we need to initialize the bias.

07:55.680 --> 07:57.480
And good news for the bias.

07:57.480 --> 07:59.160
It's gonna be much simpler.

07:59.160 --> 08:03.030
We are going to initialize them all with zeros.

08:03.030 --> 08:07.230
So to get these biases, we take them from our model,

08:07.230 --> 08:09.900
of course, that is, our neural network.

08:09.900 --> 08:13.830
And then the attribute for the bias is bias.

08:13.830 --> 08:16.170
Then, same, we access the data,

08:16.170 --> 08:18.450
and then we're gonna use a method,

08:18.450 --> 08:21.930
which is the fill underscore method,

08:21.930 --> 08:23.580
which, as you might have guessed,

08:23.580 --> 08:27.300
is used to fill the tensor of biases with zeros.

08:27.300 --> 08:30.360
Well, with zeros, we have to specify that we want to fill it

08:30.360 --> 08:31.650
with zeros here,

08:31.650 --> 08:34.560
and so that's why I'm inputting here zero.
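
NOTE
A minimal sketch of zeroing the biases, assuming PyTorch is installed.
The Linear layer with 4 inputs and 2 outputs is a hypothetical example.
```python
import torch.nn as nn
# Hypothetical layer, used only to have a bias tensor to fill.
m = nn.Linear(4, 2)
# fill_ overwrites every element of the tensor in place with the given value.
m.bias.data.fill_(0)
bias_total = m.bias.data.abs().sum().item()
print(bias_total)
```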

08:34.560 --> 08:36.330
All right, so to summarize,

08:36.330 --> 08:39.270
we generate some random weights inversely proportional

08:39.270 --> 08:41.250
to the size of the tensor of weights

08:41.250 --> 08:43.860
and we initialize the biases with zeros.

08:43.860 --> 08:47.400
All right, so that was for the initialization

08:47.400 --> 08:49.770
of the convolution connections,

08:49.770 --> 08:53.280
and now we need to do the same for the full connection.

08:53.280 --> 08:57.150
And so we're gonna add a new condition, elif.

08:57.150 --> 08:59.340
Same we take this trick we used,

08:59.340 --> 09:02.130
first, classname, that is, this variable

09:02.130 --> 09:05.130
that contains the name of the connection type.

09:05.130 --> 09:09.333
So elif classname dot, same, we use the find method,

09:10.290 --> 09:14.550
to which we input in quotes this time a full connection,

09:14.550 --> 09:16.860
that is, a classic linear connection

09:16.860 --> 09:19.380
in a classic artificial neural network.

09:19.380 --> 09:23.190
And so the name for that is Linear.

09:23.190 --> 09:26.460
And same, we're gonna use this trick to say

09:26.460 --> 09:30.040
that we want it to be different than minus one

09:31.170 --> 09:34.920
so that this elif classname find Linear

09:34.920 --> 09:39.060
different than minus one means if the connection is linear,

09:39.060 --> 09:41.340
that is, if we have a classical connection.

09:41.340 --> 09:44.790
So in that case, how are we going to initialize the weights?

09:44.790 --> 09:46.877
Well, it's gonna be quite the same.

09:46.877 --> 09:50.850
We are going to introduce a weight shape variable

09:50.850 --> 09:54.150
which will not erase the first one because we will either be

09:54.150 --> 09:55.590
in this case or that case,

09:55.590 --> 09:57.090
so it will not be the same.

09:57.090 --> 09:59.910
So we can totally reuse that.

09:59.910 --> 10:04.910
Then same, we're gonna introduce a fan in variable,

10:05.310 --> 10:06.810
which this time will not be equal

10:06.810 --> 10:09.660
to the product of these three dimensions

10:09.660 --> 10:11.790
but actually this time it will be equal

10:11.790 --> 10:16.790
to simply the dimension one,

10:17.370 --> 10:19.890
and that's because for the full connection

10:19.890 --> 10:24.120
the weight tensor has fewer dimensions than for a convolution connection.

10:24.120 --> 10:26.820
You saw this in the intuition lectures in the ANN

10:26.820 --> 10:28.470
and the CNN section.

10:28.470 --> 10:30.690
There are fewer dimensions for a full connection

10:30.690 --> 10:32.610
than for a convolution.

10:32.610 --> 10:36.180
So basically we just take this dimension one here.

10:36.180 --> 10:39.630
Then same, we're gonna have a fan out variable

10:39.630 --> 10:43.200
which we'll then use to compute W bound.

10:43.200 --> 10:48.060
And this fan out is going to be weight shape

10:48.060 --> 10:49.860
of index zero,

10:49.860 --> 10:51.480
that is, the dimension zero.

10:51.480 --> 10:53.820
All right, then to compute W bound,

10:53.820 --> 10:55.440
it's going to be the same.

10:55.440 --> 10:58.830
It's going to be the square root of six

10:58.830 --> 11:01.770
divided by the sum of fan in and fan out.

11:01.770 --> 11:04.860
So there we go.

11:04.860 --> 11:07.140
And then the good news is that

11:07.140 --> 11:10.140
it's exactly the same as previously.

11:10.140 --> 11:12.540
We use the uniform function for the weights

11:12.540 --> 11:15.510
and the fill function for the biases

11:15.510 --> 11:19.860
to get the same kind of initialization,

11:19.860 --> 11:22.230
but this time with a different fan in and fan out,

11:22.230 --> 11:24.510
and therefore a different W bound.

11:24.510 --> 11:26.130
So that's the same principle.

11:26.130 --> 11:27.480
That's the same idea.

11:27.480 --> 11:30.030
The only thing that changes here is that we have

11:30.030 --> 11:32.010
fewer dimensions for the full connection

11:32.010 --> 11:35.460
and therefore a simpler computation of this bound

11:35.460 --> 11:38.373
of the weights here to generate these random weights.
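
NOTE
Putting the steps of this lecture together, the function might look like
the sketch below, assuming PyTorch and NumPy are installed. The small
network and the use of apply are assumptions added for illustration;
the course builds its own model in later tutorials.
```python
import numpy as np
import torch.nn as nn
def weights_init(m):
    # Name of the layer's class, used to tell the connection types apart.
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        weight_shape = list(m.weight.data.size())
        fan_in = np.prod(weight_shape[1:4])                     # dim1*dim2*dim3
        fan_out = np.prod(weight_shape[2:4]) * weight_shape[0]  # dim0*dim2*dim3
        w_bound = np.sqrt(6. / (fan_in + fan_out))
        m.weight.data.uniform_(-w_bound, w_bound)
        m.bias.data.fill_(0)
    elif classname.find('Linear') != -1:
        weight_shape = list(m.weight.data.size())
        fan_in = weight_shape[1]   # dim1
        fan_out = weight_shape[0]  # dim0
        w_bound = np.sqrt(6. / (fan_in + fan_out))
        m.weight.data.uniform_(-w_bound, w_bound)
        m.bias.data.fill_(0)
# Hypothetical two-layer network; apply() calls weights_init on each submodule.
net = nn.Sequential(nn.Conv2d(1, 32, 3), nn.Linear(32, 4))
net.apply(weights_init)
```
Layers whose class name contains neither string, such as the Sequential
container itself, are left untouched.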

11:39.210 --> 11:42.300
But the good news is that now it's ready.

11:42.300 --> 11:45.270
Not only is this weights init function ready,

11:45.270 --> 11:47.280
but now we have our two tools,

11:47.280 --> 11:50.280
and so we are ready to start building the brain.

11:50.280 --> 11:51.270
So I can't wait.

11:51.270 --> 11:53.490
This will be, of course, the most exciting part.

11:53.490 --> 11:55.500
This was just to warm up

11:55.500 --> 11:57.630
and get us ready for the big thing.

11:57.630 --> 12:00.060
So we'll take care of that in the next tutorial.

12:00.060 --> 12:01.050
Well, actually, it's gonna take

12:01.050 --> 12:02.640
several tutorials, of course.

12:02.640 --> 12:04.410
We'll start by making the eyes,

12:04.410 --> 12:05.243
and then remember,

12:05.243 --> 12:08.610
we'll add an LSTM to learn the temporal properties

12:08.610 --> 12:09.630
of the input,

12:09.630 --> 12:12.150
and then we'll take care of the actor and the critic,

12:12.150 --> 12:13.200
and that's where we'll use

12:13.200 --> 12:15.750
these two functions, normalized columns initializer

12:15.750 --> 12:17.903
and weights init. So I can't wait to do that.

12:17.903 --> 12:20.610
We are gonna make something very powerful now,

12:20.610 --> 12:22.770
so get ready for it.

12:22.770 --> 12:24.513
Until then, enjoy AI.
