WEBVTT

00:00.390 --> 00:02.970
Hello and welcome to this Python tutorial.

00:02.970 --> 00:05.730
All right, so now we're gonna make the push function

00:05.730 --> 00:07.620
which will do two tasks.

00:07.620 --> 00:10.410
First, it will append a new transition

00:10.410 --> 00:12.960
or a new event in the memory.

00:12.960 --> 00:15.540
And then second, it will make sure that the memory

00:15.540 --> 00:18.150
always has 100 transitions.

00:18.150 --> 00:20.340
I'm saying 100 because we gave the example

00:20.340 --> 00:22.860
of 100 events in the previous tutorial,

00:22.860 --> 00:25.050
but in fact this will be much more than 100.

00:25.050 --> 00:28.530
This will be rather maybe 10,000 or 100,000.

00:28.530 --> 00:30.270
We'll see. But anyway,

00:30.270 --> 00:32.850
this value will be the capacity.

00:32.850 --> 00:35.670
All right, so let's make this push function.

00:35.670 --> 00:39.570
So as usual, we start with def, to define a new function

00:39.570 --> 00:41.430
and then we give a name to this function.

00:41.430 --> 00:43.740
So we call it, push.

00:43.740 --> 00:46.500
And this function will have two arguments.

00:46.500 --> 00:50.130
First, as usual, self, that refers to the object.

00:50.130 --> 00:51.180
And the next one.

00:51.180 --> 00:52.890
What do you think that will be?

00:52.890 --> 00:55.830
Well, remember, this push function will be used

00:55.830 --> 00:58.890
to append a new event in the memory.

00:58.890 --> 01:01.350
We already have the memory, so what we need now,

01:01.350 --> 01:03.173
as a variable, is an event.

01:03.173 --> 01:05.760
So that will be our argument, our input,

01:05.760 --> 01:09.930
and we will append this input in the memory

01:09.930 --> 01:12.273
which is a variable of the object.

01:13.170 --> 01:15.540
All right, so event,

01:15.540 --> 01:17.850
you can actually call it event or transition.

01:17.850 --> 01:18.810
That's the same.

01:18.810 --> 01:20.850
And you will see in the next code sections,

01:20.850 --> 01:24.180
what exactly is this event, what form it has.

01:24.180 --> 01:26.880
Actually I can tell you now, this event,

01:26.880 --> 01:29.070
this transition that we're adding to the memory

01:29.070 --> 01:31.140
is a total of four elements.

01:31.140 --> 01:34.170
The first one is the last state, that is s_t.

01:34.170 --> 01:35.880
The second one is the new state.

01:35.880 --> 01:37.440
That is s_t+1.

01:37.440 --> 01:41.160
The third one is the last action, that is a_t,

01:41.160 --> 01:42.840
the action that was played.

01:42.840 --> 01:45.390
And the fourth one is the last reward,

01:45.390 --> 01:48.180
the last reward obtained. That is r_t.

01:48.180 --> 01:51.750
So that's exactly the form that this event will have.
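
NOTE
A minimal sketch of the event described above, assuming it is stored as a plain Python tuple.
The variable names and the placeholder values are hypothetical and only illustrate the order
given in the narration: last state, new state, last action, last reward.
```python
# Hypothetical placeholder values; in practice they come from the environment.
last_state = [0.0, 0.0, 0.0]   # s_t
new_state = [0.1, 0.0, 0.0]    # s_t+1
last_action = 1                # a_t
last_reward = -0.2             # r_t
# One transition (event) groups the four elements in the narrated order.
event = (last_state, new_state, last_action, last_reward)
```
The tuple form is an assumption; the later code sections of the course define the exact form.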

01:51.750 --> 01:53.370
All right, and that's all.

01:53.370 --> 01:55.440
We just need the event because we just want to

01:55.440 --> 01:57.180
append the event to the memory,

01:57.180 --> 01:58.830
and then make sure that the memory

01:58.830 --> 02:01.260
has capacity elements.

02:01.260 --> 02:03.840
All right, so now let's go inside the function.

02:03.840 --> 02:06.120
So the first thing we'll do is append

02:06.120 --> 02:07.620
the new event to the memory.

02:07.620 --> 02:09.630
And that's very simple because we're gonna use

02:09.630 --> 02:10.830
the append function.

02:10.830 --> 02:12.360
So that will be direct.

02:12.360 --> 02:15.300
And when we use the append function, we must start

02:15.300 --> 02:18.720
with the list to which we want to append something.

02:18.720 --> 02:21.030
And this list is of course, memory.

02:21.030 --> 02:22.290
So we start with memory.

02:22.290 --> 02:25.350
And since memory is a variable of the object, we start here

02:25.350 --> 02:28.617
with, self dot memory.

02:28.617 --> 02:30.570
There we go.

02:30.570 --> 02:33.570
So self dot memory, and then we add a dot

02:33.570 --> 02:35.910
and then the append function, which is the first one.

02:35.910 --> 02:39.210
So append, and inside the append function,

02:39.210 --> 02:42.450
we input what we want to append to memory,

02:42.450 --> 02:44.760
which is of course our event.

02:44.760 --> 02:46.350
So event here.

02:46.350 --> 02:49.110
And that will append the new event,

02:49.110 --> 02:51.330
composed of the last state, new state,

02:51.330 --> 02:54.570
last action, and last reward, to the memory.

02:54.570 --> 02:56.850
All right, so that's the first thing done.

02:56.850 --> 02:58.980
And then the second thing we need to do is

02:58.980 --> 03:03.960
make sure that the memory always contains capacity elements.

03:03.960 --> 03:07.500
So let's say capacity is now 100,000.

03:07.500 --> 03:09.510
That's probably the capacity we'll choose, because

03:09.510 --> 03:12.690
then 1 million elements might make the training slow.

03:12.690 --> 03:15.120
So let's say 100,000.

03:15.120 --> 03:16.350
Now we're gonna make sure

03:16.350 --> 03:20.007
that our memory always contains 100,000 transitions,

03:20.007 --> 03:22.830
100,000 events, and never more.

03:22.830 --> 03:24.360
So of course at the beginning

03:24.360 --> 03:26.340
it will have one, then two and three.

03:26.340 --> 03:29.490
But then once it reaches 100,000 events,

03:29.490 --> 03:32.400
well it will always have 100,000 events.

03:32.400 --> 03:36.180
So to make sure of that, we simply need to make an if condition

03:36.180 --> 03:39.570
with this upper bound that we don't want to go over.

03:39.570 --> 03:42.840
So if, okay so the idea that we'll use here

03:42.840 --> 03:45.630
is that if we go over the limit,

03:45.630 --> 03:48.720
well we will delete the first transition,

03:48.720 --> 03:52.140
the first event of the memory, and therefore

03:52.140 --> 03:54.000
we're gonna take the length function

03:54.000 --> 03:56.130
to take the length of the memory,

03:56.130 --> 03:58.470
that is the number of elements in the memory.

03:58.470 --> 04:02.730
So here in the length function, we input self dot memory.

04:02.730 --> 04:04.380
That's the memory.

04:04.380 --> 04:09.120
So if the number of elements in self dot memory is larger

04:09.120 --> 04:12.150
than the capacity, well, in that case

04:12.150 --> 04:15.210
we will remove the first element to make sure

04:15.210 --> 04:18.120
that the memory always has the same number

04:18.120 --> 04:20.550
of capacity elements.

04:20.550 --> 04:21.900
And to delete the first element

04:21.900 --> 04:23.040
there is nothing more simple.

04:23.040 --> 04:24.570
We're gonna use another function

04:24.570 --> 04:27.210
which is the del Python trick.

04:27.210 --> 04:29.820
So del, and therefore

04:29.820 --> 04:31.710
we want to remove the first transition

04:31.710 --> 04:34.500
which is the oldest transition in the memory

04:34.500 --> 04:37.380
because the last transitions are the ones that we append

04:37.380 --> 04:39.540
and therefore those are the newest transitions.

04:39.540 --> 04:42.360
So the first transitions are the oldest ones.

04:42.360 --> 04:47.360
And so here we want to delete self dot memory and bracket.

04:49.200 --> 04:51.690
And we take the first element of the memory

04:51.690 --> 04:53.250
which has index zero.

04:53.250 --> 04:55.500
So self dot memory zero.

04:55.500 --> 04:58.590
Now interesting, I have a little warning which says

04:58.590 --> 05:00.990
that there is a non-defined name capacity.

05:00.990 --> 05:03.990
That's because the capacity here is not the input.

05:03.990 --> 05:07.590
That must be the capacity variable attached to the object.

05:07.590 --> 05:10.140
And therefore, here we need to add a self,

05:10.140 --> 05:11.580
self dot capacity.

05:11.580 --> 05:13.230
And now the warning is gone.

05:13.230 --> 05:15.780
So now you understand even more the use of self.

05:15.780 --> 05:18.390
That's really to refer to the object

05:18.390 --> 05:22.470
to take the capacity of the object that will be created,

05:22.470 --> 05:25.555
that is an instance of the replay memory class.
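
NOTE
A sketch of the push function as narrated above: append the event, then delete the oldest
one if the memory holds more than capacity elements. The class name ReplayMemory and the
constructor shown here are assumptions, since they belong to the previous tutorial; the
capacity of 3 is only there to keep the example short.
```python
class ReplayMemory:
    def __init__(self, capacity):
        # Assumed constructor: stores the upper bound and starts with an empty list.
        self.capacity = capacity
        self.memory = []
    def push(self, event):
        # Task 1: append the new transition at the end of the memory.
        self.memory.append(event)
        # Task 2: if the limit is exceeded, delete the oldest transition (index 0).
        if len(self.memory) > self.capacity:
            del self.memory[0]
# Usage: push 5 dummy events into a memory of capacity 3.
replay = ReplayMemory(capacity=3)
for step in range(5):
    replay.push((step, step + 1, 0, 0.0))
```
Note that del is a Python statement rather than a function, and that capacity must be read
as self.capacity inside the method, which is the fix for the undefined-name warning.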

05:25.555 --> 05:29.250
All right, so we're done with this push function.

05:29.250 --> 05:31.320
And so now we can move on to the next function

05:31.320 --> 05:33.480
which is the sample function

05:33.480 --> 05:35.190
which will take some random samples

05:35.190 --> 05:38.850
from this memory of the last capacity elements.

05:38.850 --> 05:40.470
And doing this will improve a lot

05:40.470 --> 05:42.480
the deep learning process.

05:42.480 --> 05:44.550
All right, so let's do this in the next tutorial.

05:44.550 --> 05:46.503
And until then, enjoy AI.
