WEBVTT

00:00.240 --> 00:02.760
-: Hello and welcome to this Python tutorial.

00:02.760 --> 00:04.410
All right, so we have one last function

00:04.410 --> 00:07.200
to implement in our replay memory class.

00:07.200 --> 00:08.700
That's the sample function.

00:08.700 --> 00:11.160
And, that's of course, to get some random samples

00:11.160 --> 00:12.690
from our memory and therefore,

00:12.690 --> 00:15.628
this function will return these random samples.

00:15.628 --> 00:17.850
All right, so let's implement it.

00:17.850 --> 00:19.830
We are going to call it "sample".

00:19.830 --> 00:21.390
(keys tapping)
Here we go.

00:21.390 --> 00:25.260
And, this function takes two arguments as input.

00:25.260 --> 00:27.480
The first one, as usual, self.

00:27.480 --> 00:30.180
our future object of the replay memory class.

00:30.180 --> 00:32.280
And, the second argument is...

00:32.280 --> 00:33.480
Can you try to guess?

00:33.480 --> 00:36.870
Well, we're taking some samples of a fixed size,

00:36.870 --> 00:40.140
and therefore, we need to choose a size for our samples.

00:40.140 --> 00:42.840
And more precisely, we call it a batch size.

00:42.840 --> 00:46.110
So, that's the name we're gonna give to our second argument.

00:46.110 --> 00:48.120
batch_size.

00:48.120 --> 00:51.082
And, there we go, we have our two arguments

00:51.082 --> 00:54.480
and now we can implement the sample function.

00:54.480 --> 00:56.160
So, now I just want to warn you,

00:56.160 --> 00:58.590
this is going to get a little technical,

00:58.590 --> 01:01.230
but I'll try my best to explain.

01:01.230 --> 01:05.190
So, we're gonna start by creating the samples variable.

01:05.190 --> 01:06.930
This is just the variable that will

01:06.930 --> 01:09.570
contain the samples of the memory.

01:09.570 --> 01:11.460
All right, so samples equal,

01:11.460 --> 01:14.430
so now, how are we gonna get these samples?

01:14.430 --> 01:17.100
Well, first of all we have to take our memory,

01:17.100 --> 01:20.910
because we're getting these samples from our memory.

01:20.910 --> 01:23.280
Then, we will probably need the batch size,

01:23.280 --> 01:25.470
because the samples we're gonna get

01:25.470 --> 01:27.810
contain batch size elements.

01:27.810 --> 01:30.000
So, we need memory, we need batch size,

01:30.000 --> 01:32.820
and then, we need some PyTorch, or Python tricks,

01:32.820 --> 01:35.580
to get a good format of these samples.

01:35.580 --> 01:38.715
So, what I'm gonna do, I'm gonna write the line of code,

01:38.715 --> 01:41.940
and then, I'm going to explain it element by element.

01:41.940 --> 01:46.170
So, let's do it. I'm starting by taking a zip() function.

01:46.170 --> 01:48.570
I'm going to explain very soon, what it does.

01:48.570 --> 01:50.190
And, inside this zip function,

01:50.190 --> 01:54.030
I'm going to add a star, I'm going to explain that, as well.

01:54.030 --> 01:58.200
A star and "random.sample".

01:58.200 --> 02:00.600
So, random, as you might have guessed,

02:00.600 --> 02:03.390
is the random library that we import here.

02:03.390 --> 02:05.670
So, that's the main reason why we had to

02:05.670 --> 02:07.380
import this random library.

02:07.380 --> 02:09.990
It's because we were taking some random samples,

02:09.990 --> 02:12.930
so from this random library,

02:12.930 --> 02:15.540
we're gonna use the sample function.

02:15.540 --> 02:17.940
So, this is our variables and this is a function,

02:17.940 --> 02:20.580
so I'm gonna add some parentheses, and now,

02:20.580 --> 02:22.800
as you can see, sample is a function,

02:22.800 --> 02:25.500
and we have to input some arguments.

02:25.500 --> 02:27.840
So, as you can see, the first argument is self,

02:27.840 --> 02:30.930
and actually, speaking of self, this corresponds to

02:30.930 --> 02:32.280
self.memory,

02:32.280 --> 02:34.830
the memory of our future instance,

02:34.830 --> 02:37.170
object of our replay memory class.

02:37.170 --> 02:40.975
So, I'm gonna add here "self.memory",

02:40.975 --> 02:43.290
and then the second argument is,

02:43.290 --> 02:44.490
as you might have guessed,

02:44.490 --> 02:47.070
the size of the batch we wanna take randomly

02:47.070 --> 02:49.620
from our memory, and we gave it a name.

02:49.620 --> 02:53.100
That is batch size, so the second argument

02:53.100 --> 02:55.860
is going to be batch size.

02:55.860 --> 02:58.680
All right, so the line of code is typed,

02:58.680 --> 03:01.470
and now I'm going to explain what it does.

03:01.470 --> 03:04.860
So, first of all, with this random.sample function,

03:04.860 --> 03:08.760
we're taking some random samples from the memory,

03:08.760 --> 03:12.540
that have a fixed size of batch size.

03:12.540 --> 03:14.130
So, that's understandable.
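
NOTE A minimal sketch of the random.sample step on its own, added for illustration.
The list called memory is a hypothetical stand-in for self.memory, and the
event values and batch_size of 4 are made up. random.sample draws without
replacement and raises ValueError if batch_size exceeds len(memory).
```python
import random
# Hypothetical stand-in for self.memory: ten (state, action, reward) events.
memory = [(f"s{i}", f"a{i}", float(i)) for i in range(10)]
batch_size = 4  # illustrative value, not taken from the video
random.seed(0)  # seeded only so the sketch is repeatable
batch = random.sample(memory, batch_size)
print(len(batch))  # number of events drawn, equal to batch_size
```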

03:14.130 --> 03:18.510
But, then what does this zip(*) function do?

03:18.510 --> 03:20.520
Well, there is no mystery about it,

03:20.520 --> 03:22.860
it is just like a reshape function.

03:22.860 --> 03:26.070
So for example, I am going to add a little comment here,

03:26.070 --> 03:28.440
just to explain, and then I'm going to remove it.

03:28.440 --> 03:32.520
So, let's say that, for example, we have a list

03:32.520 --> 03:34.510
of the following elements, for example,

03:34.510 --> 03:38.160
first, one, two, three,

03:38.160 --> 03:39.660
and then the second element,

03:39.660 --> 03:43.170
four, five, six.

03:43.170 --> 03:45.510
So we have a list of two tuples

03:45.510 --> 03:48.360
of three elements one, two, three, and four, five, six.

03:48.360 --> 03:52.860
Well then, if I apply the zip() function with a star on it,

03:52.860 --> 03:54.750
well, what will it become?

03:54.750 --> 03:57.130
zip(*list)

03:58.200 --> 04:01.380
That's going to be equal to a new list,

04:01.380 --> 04:03.360
but of a different shape.

04:03.360 --> 04:05.220
And, this different shape is going to be

04:05.220 --> 04:07.140
one, four,

04:07.140 --> 04:09.660
then two, five,

04:09.660 --> 04:11.493
and then three, six.

04:12.480 --> 04:13.950
All right, so that's just what it does,

04:13.950 --> 04:17.490
it just reshapes your list. Okay?
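
NOTE A small sketch of zip with a star on the example list, added for illustration.
The variable names are assumptions. In Python 3, zip returns a lazy iterator,
so list() is used here only to display the regrouped result.
```python
pairs = [(1, 2, 3), (4, 5, 6)]
# The star unpacks the list, so this call is zip((1, 2, 3), (4, 5, 6)).
regrouped = list(zip(*pairs))
print(regrouped)  # [(1, 4), (2, 5), (3, 6)]
```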

04:17.490 --> 04:18.750
So, now that you understand

04:18.750 --> 04:21.660
what this zip(*list) does,

04:21.660 --> 04:24.570
well, now let's explain why we have to do it.

04:24.570 --> 04:26.220
So, as you understood,

04:26.220 --> 04:28.860
we are gonna add the events to the memory,

04:28.860 --> 04:32.190
and the events have the form: first, the state,

04:32.190 --> 04:33.450
then the action,

04:33.450 --> 04:34.830
and then, the reward.

04:34.830 --> 04:37.290
But, for our algorithm, we don't want this format,

04:37.290 --> 04:40.470
we actually want our samples to have the following format,

04:40.470 --> 04:42.888
a format composed of three samples,

04:42.888 --> 04:45.180
one sample for the state,

04:45.180 --> 04:46.800
one sample for the actions,

04:46.800 --> 04:48.750
and one sample for the reward.

04:48.750 --> 04:51.270
So, for example, let's say that this, "One, two, three"

04:51.270 --> 04:54.720
is state one, action one, and then reward one.

04:54.720 --> 04:57.810
And then, state two, action two, and reward two.

04:57.810 --> 05:00.870
Well, what we want is, one batch for each,

05:00.870 --> 05:03.690
one batch for state one and state two,

05:03.690 --> 05:06.510
one other batch for action one and action two,

05:06.510 --> 05:10.140
and a third batch for reward one and reward two.
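
NOTE The regrouping described above can be sketched with placeholder events.
The strings and reward values are hypothetical; real events would hold
tensors. Each event is assumed to be a (state, action, reward) tuple.
```python
# Two hypothetical events, each of the form (state, action, reward).
events = [("state1", "action1", 1.0), ("state2", "action2", -1.0)]
# zip(*events) regroups by position: all states, all actions, all rewards.
states, actions, rewards = zip(*events)
print(states)   # ('state1', 'state2')
print(actions)  # ('action1', 'action2')
print(rewards)  # (1.0, -1.0)
```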

05:10.140 --> 05:12.630
That's just a format that is going to be expected next,

05:12.630 --> 05:15.180
because then we will wrap these batches

05:15.180 --> 05:17.190
into a PyTorch variable.

05:17.190 --> 05:19.740
And a PyTorch variable, remember, is a variable

05:19.740 --> 05:23.490
that contains both a tensor and a gradient,

05:23.490 --> 05:26.160
and that's in order to be able to differentiate

05:26.160 --> 05:28.080
with respect to a tensor.

05:28.080 --> 05:31.140
To be able to differentiate with respect to a tensor,

05:31.140 --> 05:33.000
we need the structure of a variable

05:33.000 --> 05:35.520
containing a tensor and a gradient.

05:35.520 --> 05:37.800
Again, that's how PyTorch works.

05:37.800 --> 05:40.590
So, to summarize, we're creating one batch

05:40.590 --> 05:43.860
for each of the states, actions and rewards.

05:43.860 --> 05:46.170
And then, we're gonna put each of these batches

05:46.170 --> 05:48.480
separately, into some PyTorch variables,

05:48.480 --> 05:50.820
each of which will get a gradient.

05:50.820 --> 05:51.900
So that eventually,

05:51.900 --> 05:54.600
we'll be able to differentiate each of them.

05:54.600 --> 05:57.480
All right, so that's the purpose of the zip() function.

05:57.480 --> 06:00.510
So, let me just remove this comment,

06:00.510 --> 06:03.180
and now, the only thing left that we have to do

06:03.180 --> 06:06.180
is to return the samples.

06:06.180 --> 06:07.500
So, as I just explained,

06:07.500 --> 06:10.110
we cannot return the samples directly,

06:10.110 --> 06:11.040
for the simple reason,

06:11.040 --> 06:15.630
that we want to put the samples into a PyTorch variable.

06:15.630 --> 06:18.450
So, to do this for each of the samples,

06:18.450 --> 06:22.470
we're gonna use the map() function, and this map() function,

06:22.470 --> 06:27.390
will do the mapping from the samples to Torch variables,

06:27.390 --> 06:30.150
that will contain a tensor and a gradient.

06:30.150 --> 06:32.250
So, as you can see, this map() function

06:32.250 --> 06:33.600
takes several arguments.

06:33.600 --> 06:36.450
The first argument is a function, and this function

06:36.450 --> 06:38.490
is going to be the function that will convert

06:38.490 --> 06:40.740
the samples into some Torch variables.

06:40.740 --> 06:43.800
And, the second argument is what we want to apply

06:43.800 --> 06:46.530
this function on to, so that will be

06:46.530 --> 06:48.840
the argument of this function.

06:48.840 --> 06:50.550
And therefore, what is it going to be?

06:50.550 --> 06:52.770
That's of course going to be the samples.

06:52.770 --> 06:55.890
So, the second argument here is going to be the samples.

06:55.890 --> 06:58.170
But then, let's define the function,

06:58.170 --> 07:01.020
which we want to apply to each of the samples.

07:01.020 --> 07:02.460
So, to define a function here,

07:02.460 --> 07:04.890
we first need to write the keyword that introduces it,

07:04.890 --> 07:07.050
which is "lambda".

07:07.050 --> 07:08.820
That's Python's keyword for an anonymous function,

07:08.820 --> 07:11.640
lambda, then "x",

07:11.640 --> 07:14.370
which is going to be the variable of this function,

07:14.370 --> 07:17.130
so, that is just a name I am giving for the variable.

07:17.130 --> 07:18.480
And then, a colon.

07:18.480 --> 07:21.120
And here, we give the expression of the function.

07:21.120 --> 07:24.333
That is, what we want this lambda function to return.
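
NOTE A sketch of map with a lambda, using a stand-in conversion. Here list(x)
is a hypothetical placeholder for the real conversion step, and the sample
values are made up. lambda is a Python keyword that creates an anonymous
function; x is the parameter name. map is lazy in Python 3.
```python
samples = [("s1", "s2"), ("a1", "a2"), (1.0, -1.0)]
# "lambda x: <expression>" returns the value of the expression for each x.
# list(x) is only a placeholder for the real conversion.
converted = map(lambda x: list(x), samples)
batches = list(converted)  # force the lazy map so the results can be inspected
print(batches)  # [['s1', 's2'], ['a1', 'a2'], [1.0, -1.0]]
```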

07:25.380 --> 07:27.000
And so, what is it going to be?

07:27.000 --> 07:29.850
Well, it's supposed to be something that will

07:29.850 --> 07:33.840
convert our samples into a Torch variable.

07:33.840 --> 07:34.800
And to do this,

07:34.800 --> 07:37.530
we already mentioned it in some previous tutorials,

07:37.530 --> 07:40.080
well, we have the variable function for that.

07:40.080 --> 07:43.200
The variable function will make that conversion

07:43.200 --> 07:44.820
from a Torch tensor,

07:44.820 --> 07:46.680
to a variable that will contain

07:46.680 --> 07:48.810
this tensor and the gradient.

07:48.810 --> 07:52.200
So, the first thing I'm gonna add here, is "Variable".

07:52.200 --> 07:56.010
Variable, inside of which, I'm going to convert "x",

07:56.010 --> 07:58.530
because "x" is going to be the samples,

07:58.530 --> 08:02.400
once lambda will be applied onto the samples.

08:02.400 --> 08:06.000
But then, that's not all, there is one last technical thing

08:06.000 --> 08:07.263
that we need to implement.

08:07.263 --> 08:09.638
That is the fact, that for each batch,

08:09.638 --> 08:11.790
which is contained in a sample.

08:11.790 --> 08:13.103
For example, the batch of the actions,

08:13.103 --> 08:16.650
A1, A2, A3, and the other actions,

08:16.650 --> 08:18.480
we have to concatenate it,

08:18.480 --> 08:21.150
with respect to the first dimension,

08:21.150 --> 08:23.100
which corresponds to the state.

08:23.100 --> 08:25.530
And, why do we have to make this concatenation?

08:25.530 --> 08:27.690
It's just for everything to be well aligned.

08:27.690 --> 08:31.080
That is, that in each row, the state, the action,

08:31.080 --> 08:35.160
and the reward correspond to the same time t,

08:35.160 --> 08:38.520
so that eventually, we get a list of batches,

08:38.520 --> 08:42.420
all well aligned, and each batch is a PyTorch variable.

08:42.420 --> 08:44.820
So, how can we make this concatenation?

08:44.820 --> 08:46.710
Well, we need to use the cat function,

08:46.710 --> 08:48.150
from the Torch library.

08:48.150 --> 08:49.387
So, we're gonna add here,

08:49.387 --> 08:54.150
"torch", to which we add ".cat" applied to "x".

08:54.150 --> 08:55.350
But then, in this cat function,

08:55.350 --> 08:57.300
we need to specify the dimension,

08:57.300 --> 09:00.840
with respect to which we want to make that concatenation.

09:00.840 --> 09:03.651
And, as I just mentioned, this is the first dimension

09:03.651 --> 09:05.850
that has index zero.

09:05.850 --> 09:08.820
And, here we go, we have our function ready.

09:08.820 --> 09:11.700
This lambda function will take the samples,

09:11.700 --> 09:14.490
concatenate them with respect to the first dimension,

09:14.490 --> 09:17.250
and then eventually, we convert these tensors

09:17.250 --> 09:18.870
into some Torch variables,

09:18.870 --> 09:21.930
that contain both a tensor and a gradient.

09:21.930 --> 09:24.060
So that later, when we apply stochastic gradient descent,

09:24.060 --> 09:27.000
we will be able to differentiate,

09:27.000 --> 09:28.590
to update the weights.
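
NOTE A hedged sketch of the whole sample function, not the exact course code.
The class body, the push helper, the capacity and the tensor shapes are
assumptions for illustration. In current PyTorch releases Variable is
deprecated and tensors track gradients themselves, so this sketch omits the
Variable wrapper and returns plain tensors.
```python
import random
import torch
class ReplayMemory:
    """Hypothetical minimal version of the class discussed in the video."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []
    def push(self, event):
        # Assumed helper: append, then drop the oldest event when over capacity.
        self.memory.append(event)
        if len(self.memory) > self.capacity:
            del self.memory[0]
    def sample(self, batch_size):
        samples = zip(*random.sample(self.memory, batch_size))
        # torch.cat joins each group along dimension 0; list() forces the lazy map.
        return list(map(lambda x: torch.cat(x, 0), samples))
memory = ReplayMemory(capacity=100)
for i in range(10):
    state = torch.rand(1, 5)           # one row of five made-up input signals
    action = torch.tensor([i % 3])     # made-up action index
    reward = torch.tensor([float(i)])  # made-up reward
    memory.push((state, action, reward))
batch_state, batch_action, batch_reward = memory.sample(4)
print(batch_state.shape, batch_action.shape, batch_reward.shape)
```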

09:28.590 --> 09:31.980
All right, so this function is ready, and then here,

09:31.980 --> 09:35.220
at the second argument of the map() function,

09:35.220 --> 09:36.893
we need to specify onto what

09:36.893 --> 09:39.480
we want to apply this lambda function.

09:39.480 --> 09:42.693
And, that is on all our samples.

09:42.693 --> 09:45.840
There we go, we will apply this lambda function

09:45.840 --> 09:48.254
on all the samples, so that eventually,

09:48.254 --> 09:50.970
we obtain a list of batches,

09:50.970 --> 09:53.820
where each batch is a PyTorch variable.

09:53.820 --> 09:56.280
All right, so that was quite technical,

09:56.280 --> 09:58.830
but now at least, everything will work well.

09:58.830 --> 10:00.510
We won't use this technique afterwards,

10:00.510 --> 10:01.830
we only use it here,

10:01.830 --> 10:04.110
so if you don't want to have a deep understanding

10:04.110 --> 10:05.790
of the technical details here,

10:05.790 --> 10:07.350
well, that's fine, you can just

10:07.350 --> 10:10.650
copy these three lines of code to sample your memory,

10:10.650 --> 10:13.500
if you want to make an artificial intelligence with PyTorch.

10:13.500 --> 10:15.990
It's as you want, but now, the good news is

10:15.990 --> 10:19.110
that we are done with this replay memory class.

10:19.110 --> 10:21.099
Experience replay is now implemented,

10:21.099 --> 10:24.210
and we can move on to the next and final class,

10:24.210 --> 10:26.940
which will be the whole deep Q-learning model.

10:26.940 --> 10:29.610
So, in this deep Q-learning model, we will have,

10:29.610 --> 10:31.140
of course, our network.

10:31.140 --> 10:33.120
We will add experience replay,

10:33.120 --> 10:36.600
and then all the rest of the deep Q-learning algorithm,

10:36.600 --> 10:39.180
so it's going to be a much bigger class.

10:39.180 --> 10:41.010
We're gonna make about 10 functions,

10:41.010 --> 10:43.890
but that's only because we are doing this step-by-step,

10:43.890 --> 10:46.100
so that you can understand better, what's going on.

10:46.100 --> 10:49.260
So, I can't wait to implement our deep Q-learning model,

10:49.260 --> 10:51.183
and until then, enjoy AI.