WEBVTT

00:00.420 --> 00:02.880
Hello and welcome to this Python tutorial.

00:02.880 --> 00:04.620
All right, it's a very quick tutorial today

00:04.620 --> 00:06.690
to make the score function,

00:06.690 --> 00:10.080
and so basically, this function will just compute the score

00:10.080 --> 00:12.840
on the sliding window of the rewards,

00:12.840 --> 00:15.960
and so basically, we will very simply compute the mean

00:15.960 --> 00:18.180
of all the rewards in the reward window.

00:18.180 --> 00:19.650
So this will be very simple.

00:19.650 --> 00:20.880
Let's do this now.

00:20.880 --> 00:22.500
We're gonna make this new function

00:22.500 --> 00:24.480
that we're gonna call score.

00:24.480 --> 00:27.270
And the score function will just take the argument self

00:27.270 --> 00:29.850
because basically we don't need anything else.

00:29.850 --> 00:30.870
We need to take self

00:30.870 --> 00:33.810
because, of course, we will take self.rewardWindow.

00:33.810 --> 00:35.160
So just self.

00:35.160 --> 00:37.800
And then colon and there we go.

00:37.800 --> 00:39.990
It's gonna take one line of code.

00:39.990 --> 00:41.850
So we wanna compute the mean

00:41.850 --> 00:43.830
of all the rewards in the reward window.

00:43.830 --> 00:46.710
So that's basically the sum of all the rewards

00:46.710 --> 00:48.120
in this reward window

00:48.120 --> 00:50.250
that are between -1 and +1

00:50.250 --> 00:54.150
divided by the total number of elements in this window.

00:54.150 --> 00:55.170
So let's do this.

00:55.170 --> 00:57.270
We're directly going to return that.

00:57.270 --> 00:59.730
So I'm starting with return.

00:59.730 --> 01:01.800
And so we need to take the sum

01:01.800 --> 01:05.220
of all the rewards in the reward window,

01:05.220 --> 01:06.570
and to do this, we simply need

01:06.570 --> 01:08.820
to take the reward window itself

01:08.820 --> 01:11.823
and so I'm inputting here self.rewardWindow.

01:14.542 --> 01:15.840
And so very simply,

01:15.840 --> 01:20.790
this will sum all the elements inside rewardWindow.

01:20.790 --> 01:22.200
So that's pretty practical.

01:22.200 --> 01:23.910
And then to get the mean,

01:23.910 --> 01:26.730
we need to divide this sum

01:26.730 --> 01:29.490
by the number of elements in the rewardWindow,

01:29.490 --> 01:31.170
and to get the number of elements,

01:31.170 --> 01:34.200
well, we need to take the len function,

01:34.200 --> 01:37.023
and then we take our rewardWindow again.

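NOTE
A minimal sketch of the unguarded mean described so far, assuming the
reward window is a plain Python list. The name reward_window and the
sample values are hypothetical, not taken from the tutorial's file.
```python
# Hypothetical sample rewards, each between -1 and +1.
reward_window = [1.0, -1.0, 0.5, 0.5]
# Mean = sum of the rewards divided by the number of rewards.
mean_reward = sum(reward_window) / len(reward_window)
print(mean_reward)
```
With an empty list, len() returns 0 and the division raises ZeroDivisionError.
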
01:39.210 --> 01:40.043
There it is.

01:40.043 --> 01:42.810
But now we just need to be careful with something.

01:42.810 --> 01:47.190
It's that len(self.rewardWindow) is a denominator

01:47.190 --> 01:50.160
and this must absolutely not be equal to zero.

01:50.160 --> 01:52.020
No matter what, we need to avoid this

01:52.020 --> 01:54.360
and to make sure that the denominator here

01:54.360 --> 01:55.770
is not equal to zero.

01:55.770 --> 01:58.800
We're gonna add this safety trick.

01:58.800 --> 02:01.510
We're gonna add here a plus one

02:02.460 --> 02:05.670
so that len(self.rewardWindow) + 1

02:05.670 --> 02:07.350
will never be equal to zero.

02:07.350 --> 02:09.450
If the denominator here is equal to zero,

02:09.450 --> 02:11.370
this will raise a ZeroDivisionError and crash your program.

02:11.370 --> 02:13.200
So we must avoid it,

02:13.200 --> 02:15.000
and it's totally fine to add a plus one.

02:15.000 --> 02:17.850
We will still get a good measure in the score.

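NOTE
A minimal sketch of the score function as described above, assuming a
hypothetical Agent class that keeps its rewards in a list named
rewardWindow. Only sum(), len() and the plus one come from the
tutorial; the class name and the sample values are illustrative.
```python
class Agent:
    """Hypothetical holder for the sliding window of rewards."""
    def __init__(self):
        # Assumed to be a plain list of rewards between -1 and +1.
        self.rewardWindow = []
    def score(self):
        # The plus one keeps the denominator from ever being zero.
        return sum(self.rewardWindow) / (len(self.rewardWindow) + 1)
agent = Agent()
empty_score = agent.score()  # empty window: 0 / 1, no ZeroDivisionError
agent.rewardWindow = [1.0, 1.0, 1.0]
biased_score = agent.score()  # 3.0 / 4, slightly below the true mean of 1.0
```
The plus one pulls the result toward zero, most noticeably while the
window holds only a few rewards.
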
02:17.850 --> 02:18.780
All right, perfect.

02:18.780 --> 02:20.010
And so that's all.

02:20.010 --> 02:21.480
We have our score function,

02:21.480 --> 02:23.640
which will give us the mean of the rewards

02:23.640 --> 02:25.200
in the sliding window.

02:25.200 --> 02:28.050
All right, now let's move on to the next function,

02:28.050 --> 02:29.520
which is the save function

02:29.520 --> 02:31.530
that will save your model,

02:31.530 --> 02:33.480
that is, save the brain of your car,

02:33.480 --> 02:34.830
so that you'll then be able

02:34.830 --> 02:37.530
to reuse it by loading it with another function

02:37.530 --> 02:39.840
that we'll make after the save function.

02:39.840 --> 02:43.440
So it's really practical to have this

02:43.440 --> 02:45.510
save function to save your models

02:45.510 --> 02:47.130
in case you want to reuse them

02:47.130 --> 02:49.590
for any kind of purpose where they can be useful.

02:49.590 --> 02:51.540
So that's what we'll do in the next tutorial.

02:51.540 --> 02:53.373
And until then, enjoy AI.
