WEBVTT

00:00.420 --> 00:02.850
-: Hello, and welcome to the super exciting part

00:02.850 --> 00:06.450
of our AI creation, the part where we make it smart.

00:06.450 --> 00:07.890
So that's exactly what happens when

00:07.890 --> 00:08.900
training the AI.

00:08.900 --> 00:10.950
We will train its intelligence

00:10.950 --> 00:13.500
to reach the goal we're wanting to accomplish.

00:13.500 --> 00:15.660
And to do this, we're going to basically

00:15.660 --> 00:19.290
train the neural network to output the right predictions,

00:19.290 --> 00:20.850
and then everything is already ready

00:20.850 --> 00:22.620
because these output signals

00:22.620 --> 00:25.110
from the brain already have the right transmission

00:25.110 --> 00:27.750
to the body to play the final actions.

00:27.750 --> 00:30.720
So basically now what we're about to do is

00:30.720 --> 00:32.430
something we already did before,

00:32.430 --> 00:34.950
we are just going to take some random batches

00:34.950 --> 00:37.530
from the memory, get our inputs from these samples,

00:37.530 --> 00:40.860
get the outputs, get the targets, get the predictions,

00:40.860 --> 00:42.900
compute the loss error between the predictions

00:42.900 --> 00:45.900
and the targets, and then perform backward propagation with

00:45.900 --> 00:48.810
Stochastic Gradient Descent to update the weights according

00:48.810 --> 00:51.990
to how much they contributed to this loss error.

00:51.990 --> 00:53.160
So let's do all this,

00:53.160 --> 00:55.320
you're gonna see how it's going to be so easy

00:55.320 --> 00:58.320
because we already have all the tools to implement this,

00:58.320 --> 01:00.780
not only we have the PyTorch tools,

01:00.780 --> 01:03.390
like the optimizer and the loss functions,

01:03.390 --> 01:06.810
but also we have all the classes that we made before

01:06.810 --> 01:08.070
like our brain, of course

01:08.070 --> 01:10.770
which we're gonna use to get the predictions,

01:10.770 --> 01:14.820
then our Experience Replay Implementation, Eligibility

01:14.820 --> 01:18.810
Trace, and all these tools combined with the PyTorch tools

01:18.810 --> 01:21.420
will make the training super performant, and therefore,

01:21.420 --> 01:24.750
eventually we will get a super powerful AI.

01:24.750 --> 01:26.490
So let's make this training happen.

01:26.490 --> 01:28.440
Let's make our AI smart.

01:28.440 --> 01:30.902
And the first thing we're gonna do now is get

01:30.902 --> 01:33.540
the Loss Function that we'll use during the training,

01:33.540 --> 01:36.630
when computing the error, and an optimizer.

01:36.630 --> 01:39.150
That's the first thing we'll do, so let's create

01:39.150 --> 01:40.740
a variable for the Loss Function.

01:40.740 --> 01:44.850
We're gonna call it "Loss" and this will be equal to

01:44.850 --> 01:47.070
the "MSELoss"

01:47.070 --> 01:48.840
function from

01:48.840 --> 01:50.377
the "nn" module.

01:50.377 --> 01:52.830
"nn" dot "MSELoss".

01:52.830 --> 01:54.210
That's the loss function we'll use

01:54.210 --> 01:57.000
because basically our predictions are Q values,

01:57.000 --> 01:58.830
you know, we're predicting the Q values

01:58.830 --> 02:00.600
of the different actions and therefore,

02:00.600 --> 02:02.070
since these are real numbers

02:02.070 --> 02:05.697
well we're kind of doing some Neural Network for Regression

02:05.697 --> 02:08.637
and therefore the Loss Function is the Mean Squared Error.

02:08.637 --> 02:12.570
That's the Loss Function we use in general for regression.
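
NOTE
Editor's sketch, not part of the recording: a plain-Python illustration of the mean squared error described above.
The predictions and targets are made-up numbers standing in for Q-values, and the function name is hypothetical.
```python
# Hypothetical Q-value predictions and targets for three actions (made-up numbers).
predictions = [1.0, 2.0, 3.0]
targets = [1.0, 4.0, 1.0]
def mean_squared_error(predicted, expected):
    # Average of the squared differences, the quantity a mean-squared-error loss reports.
    squared = [(p - t) ** 2 for p, t in zip(predicted, expected)]
    return sum(squared) / len(squared)
loss_error = mean_squared_error(predictions, targets)
print(loss_error)
```
Larger gaps between a prediction and its target are penalized quadratically, which is why this loss suits real-valued outputs.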

02:12.570 --> 02:13.470
All right, so now

02:13.470 --> 02:16.650
that we have our Loss Function, let's get our Optimizer.

02:16.650 --> 02:19.731
So "Optimizer" here, that's the variable we create for it,

02:19.731 --> 02:22.856
"Optimizer" and we're gonna take,

02:22.856 --> 02:27.210
as usual, as for the self-driving car, the Adam Optimizer.

02:27.210 --> 02:31.050
That's a very powerful Optimizer that will work wonders

02:31.050 --> 02:32.220
for the training.

02:32.220 --> 02:33.487
So let's get this one.

02:33.487 --> 02:35.377
"optim" dot

02:35.377 --> 02:36.990
"Adam".

02:36.990 --> 02:40.050
And remember, exactly as for the self-driving car,

02:40.050 --> 02:43.170
we have to input two essential arguments.

02:43.170 --> 02:46.770
The first one is the one that will make the connection

02:46.770 --> 02:48.270
between the optimizer

02:48.270 --> 02:50.400
and the parameters of our neural network,

02:50.400 --> 02:53.160
that is the weights of the neurons of our brain.

02:53.160 --> 02:56.759
And to do this, we take our brain, which we called "cnn"

02:56.759 --> 02:59.400
that's the object we created for our brain.

02:59.400 --> 03:03.660
And so "cnn" dot, remember, "parameters".

03:03.660 --> 03:04.590
There we go.

03:04.590 --> 03:06.360
And some parentheses.

03:06.360 --> 03:09.390
So that makes the connection between the optimizer

03:09.390 --> 03:12.960
and the weights of the neurons in the brain of our AI.

03:12.960 --> 03:16.950
And then the second argument is a learning rate

03:16.950 --> 03:19.140
and that's given by "LR"

03:19.140 --> 03:22.230
And so here we have to take a small learning rate

03:22.230 --> 03:24.390
because we don't want to converge too fast

03:24.390 --> 03:26.370
and we want to have some exploration.

03:26.370 --> 03:27.780
And therefore a good learning rate that we

03:27.780 --> 03:32.303
can take here is a small one, that is 0.001, that is 0.1%.

03:33.540 --> 03:36.543
I think that's the same we used for the self-driving car.

03:37.470 --> 03:40.770
All right, so now we have a Loss Function, an Optimizer.
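
NOTE
Editor's sketch, not part of the recording: one way the loss function and optimizer described above might be set up.
The small linear layer is only a stand-in for the brain, since the course's "cnn" class is not shown in this section.
```python
import torch.nn as nn
import torch.optim as optim
# Stand-in for the brain: a tiny linear layer, NOT the course's convolutional network.
cnn = nn.Linear(4, 3)
# Mean squared error, since the predicted Q-values are real numbers.
loss = nn.MSELoss()
# Adam receives the network's parameters and a small learning rate.
optimizer = optim.Adam(cnn.parameters(), lr=0.001)
print(optimizer.defaults["lr"])
```
Passing cnn.parameters() is what connects the optimizer to the weights it will later update.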

03:40.770 --> 03:43.590
So we are almost ready to start the for loop,

03:43.590 --> 03:46.290
well, actually we will start the for loop right now

03:46.290 --> 03:47.640
but just before we do it

03:47.640 --> 03:51.450
we are gonna decide the number of epochs

03:51.450 --> 03:53.040
we will be training the AI

03:53.040 --> 03:55.380
and therefore I'm creating a new variable here,

03:55.380 --> 03:58.533
which will correspond to this number of Epochs.

03:59.460 --> 04:02.610
And let's set it equal to 100.

04:02.610 --> 04:05.010
That will be way enough to train the AI,

04:05.010 --> 04:08.610
and I even bet that the AI will manage to reach the vest

04:08.610 --> 04:11.580
way before 100, like 20 or 30.

04:11.580 --> 04:14.100
Let's see, but for now, let's take 100

04:14.100 --> 04:16.230
and if we need it, we will increase it,

04:16.230 --> 04:18.480
but I don't think that will be necessary.

04:18.480 --> 04:20.460
Okay, so now that we have our number of Epochs

04:20.460 --> 04:22.200
we can start to make the for loop.

04:22.200 --> 04:24.180
You know, this main for loop

04:24.180 --> 04:27.120
of the training when we train over the Epochs.

04:27.120 --> 04:31.650
So "for", then the iteration variable is going to be "epoch".

04:31.650 --> 04:35.130
That's what we choose, "for epoch in".

04:35.130 --> 04:39.030
Now of course, we're gonna use the Range Function to say

04:39.030 --> 04:42.780
that we want to go from the first Epoch "one"

04:42.780 --> 04:44.883
to "Number of Epochs"

04:47.137 --> 04:49.290
"plus one" because remember

04:49.290 --> 04:52.140
the upper bound of a range is not included.

04:52.140 --> 04:54.810
And therefore, if we want to go up to 100,

04:54.810 --> 04:57.690
well, we have to specify the number of epochs "plus one"

04:57.690 --> 04:59.520
to go up to 100.
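
NOTE
Editor's sketch, not part of the recording: a quick check of the range bounds described above.
The variable name nb_epochs is an assumption for the "number of epochs" variable.
```python
nb_epochs = 100  # name assumed for the "number of epochs" variable
epochs = list(range(1, nb_epochs + 1))
# The upper bound is excluded, so "+ 1" makes the last epoch equal to 100.
print(epochs[0], epochs[-1], len(epochs))
```
Without the "+ 1", the loop would stop at epoch 99.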

04:59.520 --> 05:03.750
All right, so a colon, and now let's get into the loop.

05:03.750 --> 05:06.570
All right, so the first thing we're gonna do is to

05:06.570 --> 05:08.760
do 200 runs of 10 steps.

05:08.760 --> 05:11.460
So each Epoch will be 200 runs,

05:11.460 --> 05:14.100
one after the other, of 10 steps.

05:14.100 --> 05:16.770
And to do this, we have this "Run Steps" function

05:16.770 --> 05:19.050
from our Experience Replay class, and therefore

05:19.050 --> 05:21.360
to use this function, which is actually a method

05:21.360 --> 05:24.000
because we will get it from our Memory object

05:24.000 --> 05:26.874
which is an object from the Replay Memory class.

05:26.874 --> 05:30.450
To generate these 200 runs of 10 steps,

05:30.450 --> 05:34.170
well, we have to take our Memory Object that I

05:34.170 --> 05:36.217
remind you we created right here,

05:36.217 --> 05:39.510
"memory" is an object of the Replay Memory class

05:39.510 --> 05:44.010
with n_steps, that is 10 steps, and a capacity of 10,000.

05:44.010 --> 05:48.660
We created this object, and from this object we take, well

05:48.660 --> 05:50.947
this "Run Steps" function

05:50.947 --> 05:55.947
"Run Steps" and we specify 200 successive runs of 10 steps.

05:57.090 --> 06:01.020
So that will just, at each epoch, basically do 200 runs.

06:01.020 --> 06:02.043
And now, now

06:02.043 --> 06:05.460
that we have these 200 runs happening at each epoch,

06:05.460 --> 06:09.270
well, it's time to sample some batches from these runs

06:09.270 --> 06:12.000
and to sample these batches, we have another function

06:12.000 --> 06:14.430
from our memory, which is "Sample Batch"

06:14.430 --> 06:17.580
and that will exactly generate some batches,

06:17.580 --> 06:19.500
from these 200 runs.

06:19.500 --> 06:22.800
But remember, these batches are this time batches

06:22.800 --> 06:25.980
of series of transitions that is series

06:25.980 --> 06:28.140
of 10 steps as opposed to before

06:28.140 --> 06:30.030
where the batches were just some batches

06:30.030 --> 06:32.350
of single transitions. Here, this time, they are

06:32.350 --> 06:36.120
gonna be batches of 10 steps, 10 transitions

06:36.120 --> 06:39.780
and therefore now it's time to get from our memory

06:39.780 --> 06:41.760
these random batches and to get

06:41.760 --> 06:44.970
them we use the "Sample Batch" function, to

06:44.970 --> 06:47.310
which we have to apply the batch size.

06:47.310 --> 06:50.310
And for the batch size, well, we can take 32

06:50.310 --> 06:54.180
or even 64 or even 128,

06:54.180 --> 06:55.830
remember for batch sizes

06:55.830 --> 06:58.140
it's a common practice to use 32.

06:58.140 --> 06:59.490
That's what you will see

06:59.490 --> 07:02.160
in general in the Neural Networks Architectures

07:02.160 --> 07:03.870
when doing some batch learning.

07:03.870 --> 07:05.790
But this time it's quite different,

07:05.790 --> 07:09.420
we are just sampling some batches of 10 steps.

07:09.420 --> 07:11.820
So it's better to take batches with larger sizes.

07:11.820 --> 07:14.679
So that's why we can take 64 or 128.

07:14.679 --> 07:19.679
So we're gonna take 128, and actually this is gonna be

07:19.770 --> 07:24.480
inside a for loop because we want to take several batches

07:24.480 --> 07:25.860
and we are taking them

07:25.860 --> 07:29.520
in what is returned by this "Sample Batch" function.

07:29.520 --> 07:31.830
So this for loop, "for batch in memory",

07:31.830 --> 07:34.590
"Sample Batch" 128,

07:34.590 --> 07:37.597
means that every 128 steps,

07:37.597 --> 07:42.450
well, our memory will give us a batch of size 128

07:42.450 --> 07:46.440
which will contain actually the last 128 steps

07:46.440 --> 07:48.030
that were just run.

07:48.030 --> 07:51.540
We're just getting some batches of size 128.

07:51.540 --> 07:54.390
And the learning is going to happen on these batches.
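
NOTE
Editor's sketch, not part of the recording: a hypothetical stand-in for the memory's batch sampling.
It assumes the stored series are shuffled and cut into full batches, which is one reading of the "random batches" mentioned earlier.
The course's own Replay Memory class is not shown here and may differ.
```python
import random
def sample_batch(buffer, batch_size):
    # Hypothetical stand-in for the memory's sampling method:
    # shuffle the stored series, then yield consecutive full batches.
    shuffled = list(buffer)
    random.shuffle(shuffled)
    for start in range(0, len(shuffled) - batch_size + 1, batch_size):
        yield shuffled[start:start + batch_size]
# 200 fake "series" of 10 steps each; integers stand in for real transitions.
buffer = [[run * 10 + step for step in range(10)] for run in range(200)]
batches = list(sample_batch(buffer, 128))
print(len(batches), len(batches[0]), len(batches[0][0]))
```
Each element of a batch is a whole series of 10 transitions rather than a single transition.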

07:54.390 --> 07:55.830
And besides inside these batches

07:55.830 --> 07:58.380
we will have eligibility trace running, you know

07:58.380 --> 08:00.330
to learn every 10 steps.

08:00.330 --> 08:03.930
All right, so now inside this loop, which is still happening

08:03.930 --> 08:07.950
in one epoch, but now this time we are in a specific batch.

08:07.950 --> 08:10.950
And so now the first thing we're gonna do is

08:10.950 --> 08:14.700
we're gonna get our inputs and our targets separately.

08:14.700 --> 08:16.680
And that, as I told you, it's very easy

08:16.680 --> 08:18.210
we can do it with one

08:18.210 --> 08:21.750
of the tools we implemented, which is eligibility trace.

08:21.750 --> 08:22.890
As you can see here

08:22.890 --> 08:26.970
this eligibility trace function takes a batch as input

08:26.970 --> 08:29.847
and now we have the batch and returns as output

08:29.847 --> 08:32.520
the inputs and the targets.

08:32.520 --> 08:33.353
So right now

08:33.353 --> 08:35.760
what we can simply do is create two new variables

08:35.760 --> 08:38.850
which are gonna be the inputs and the targets.

08:38.850 --> 08:42.400
And to do this, "inputs" comma "targets"

08:43.350 --> 08:46.050
equals exactly what is returned by this

08:46.050 --> 08:48.150
eligibility trace function applied

08:48.150 --> 08:49.110
to a batch.

08:49.110 --> 08:52.440
So we will apply this function to the batch of our loop.

08:52.440 --> 08:54.150
And so what we'll do is just

08:54.150 --> 08:58.500
eligibility trace applied

08:58.500 --> 09:00.810
to the batch of our for loop.

09:00.810 --> 09:01.643
All right?

09:01.643 --> 09:04.470
So that gets us the inputs and the targets.

09:04.470 --> 09:06.150
But, in PyTorch,

09:06.150 --> 09:08.220
there is always something more we have to do.

09:08.220 --> 09:10.560
And of course this is to convert the inputs

09:10.560 --> 09:11.460
of the neural network

09:11.460 --> 09:14.460
and also the targets into some torch variables.

09:14.460 --> 09:16.320
But no worries, there is nothing new.

09:16.320 --> 09:18.510
We know how to do it, we can do it this way.

09:18.510 --> 09:22.500
We take our inputs, then our targets,

09:22.500 --> 09:26.643
and, well, they will be equal to "Variable" of inputs,

09:27.870 --> 09:29.010
that's for the inputs,

09:29.010 --> 09:32.520
and "Variable" of targets,

09:32.520 --> 09:34.260
and that's for the targets.

09:34.260 --> 09:35.430
All right?

09:35.430 --> 09:38.190
So the inputs of the brain are converted

09:38.190 --> 09:42.060
into some torch variables, and the targets also

09:42.060 --> 09:44.400
are converted into some torch variables.

09:44.400 --> 09:48.780
So now the inputs can enter the neural network,

09:48.780 --> 09:50.340
and why do we need to do this?

09:50.340 --> 09:53.850
That's because the next step is to get the predictions.

09:53.850 --> 09:55.680
We have the input, we have the target,

09:55.680 --> 09:57.900
now, of course, we need our predictions

09:57.900 --> 10:00.300
because then what happens is that we will compute the loss

10:00.300 --> 10:03.000
between the predictions and the targets.

10:03.000 --> 10:06.780
So let's get these predictions to get them,

10:06.780 --> 10:08.550
well again, this is so simple now,

10:08.550 --> 10:11.727
we just need to take our brain, which is CNN,

10:11.727 --> 10:16.727
our convolutional neural network, and apply it to our inputs.

10:17.460 --> 10:18.420
There we go.

10:18.420 --> 10:20.880
The inputs go into the neural network,

10:20.880 --> 10:24.270
and the neural network will output the predictions.

10:24.270 --> 10:25.103
Perfect.

10:25.103 --> 10:27.330
So now we have the predictions, we have the targets,

10:27.330 --> 10:30.420
so we can get the loss, and that's the next step.

10:30.420 --> 10:32.160
We're gonna introduce a new variable

10:32.160 --> 10:34.950
because right now we're gonna get the loss error,

10:34.950 --> 10:37.170
which is different than the loss function

10:37.170 --> 10:40.200
because we use the loss function to get the loss error.

10:40.200 --> 10:44.460
So loss error here, and that we will get it

10:44.460 --> 10:48.850
with the loss function applied to our predictions

10:49.830 --> 10:52.290
and the targets.

10:52.290 --> 10:53.460
There we go.

10:53.460 --> 10:55.200
See how everything is smooth now?

10:55.200 --> 10:56.220
Everything is logical.

10:56.220 --> 10:59.520
We get the inputs first, the targets, then thanks to the inputs

10:59.520 --> 11:00.750
we get the predictions.

11:00.750 --> 11:02.940
And then thanks to the predictions and the targets,

11:02.940 --> 11:04.143
we get the loss error.

11:05.190 --> 11:07.440
So, very logical and smooth.

11:07.440 --> 11:09.300
And now what is the next step?

11:09.300 --> 11:10.950
Well, same logical path.

11:10.950 --> 11:12.240
Now, that we have the loss,

11:12.240 --> 11:14.490
we can back propagate this loss error

11:14.490 --> 11:16.020
back into the neural network

11:16.020 --> 11:17.310
to update the weights.

11:17.310 --> 11:19.770
And we do that with stochastic gradient descent.

11:19.770 --> 11:21.600
And to perform stochastic gradient descent,

11:21.600 --> 11:23.220
we need our optimizer

11:23.220 --> 11:26.550
but we already got it here, our Adam optimizer,

11:26.550 --> 11:29.490
but now at this point, remember what we have to do,

11:29.490 --> 11:31.440
we have to initialize it.

11:31.440 --> 11:32.460
And to initialize it,

11:32.460 --> 11:35.490
remember we take our optimizer object,

11:35.490 --> 11:37.470
and then we apply

11:37.470 --> 11:41.580
the zero grad method.

11:41.580 --> 11:42.413
There we go,

11:42.413 --> 11:43.830
we don't forget the parentheses,

11:43.830 --> 11:45.600
that initializes it.

11:45.600 --> 11:49.740
And now the next step is to back propagate the loss error back

11:49.740 --> 11:51.180
into the neural network.

11:51.180 --> 11:52.680
And to do this, well, we take

11:52.680 --> 11:53.770
our loss error

11:54.930 --> 11:56.430
and we apply on it

11:56.430 --> 11:59.070
the backward method.

11:59.070 --> 12:02.280
So that's exactly to apply backward propagation.

12:02.280 --> 12:05.370
And then finally, now that the loss error is back propagated

12:05.370 --> 12:06.750
into the neural network,

12:06.750 --> 12:08.190
well, we can update the weights

12:08.190 --> 12:10.470
with stochastic gradient descent.

12:10.470 --> 12:13.980
And to do this, remember we take our optimizer

12:13.980 --> 12:17.610
and then we apply the step method.

12:17.610 --> 12:18.510
There we go.

12:18.510 --> 12:20.340
The weights are now updated.
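
NOTE
Editor's sketch, not part of the recording: the training step described above, run once on made-up data.
The linear layer, tensor shapes and random values are assumptions for illustration only.
Recent PyTorch versions accept plain tensors, so the Variable conversion is left out here.
```python
import torch
import torch.nn as nn
import torch.optim as optim
torch.manual_seed(0)
# Stand-in brain and made-up data; shapes and values are illustrative only.
cnn = nn.Linear(4, 3)
loss = nn.MSELoss()
optimizer = optim.Adam(cnn.parameters(), lr=0.001)
inputs = torch.randn(128, 4)
targets = torch.randn(128, 3)
weights_before = cnn.weight.detach().clone()
predictions = cnn(inputs)                  # forward pass
loss_error = loss(predictions, targets)    # error between predictions and targets
optimizer.zero_grad()                      # clear gradients from any previous step
loss_error.backward()                      # backpropagate the loss error
optimizer.step()                           # update the weights
print(torch.equal(weights_before, cnn.weight.detach()))
```
Clearing the gradients before each backward pass matters because PyTorch accumulates gradients across calls by default.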

12:20.340 --> 12:22.980
As I told you, not only we already did it,

12:22.980 --> 12:26.310
but now it seems so simple and so natural.

12:26.310 --> 12:28.740
So, now we're gonna do something fun.

12:28.740 --> 12:31.980
We are going to print the average reward every epoch.

12:31.980 --> 12:34.740
So you know, we can keep track of how the AI is going,

12:34.740 --> 12:36.300
how the training is going.

12:36.300 --> 12:39.000
We want to see the average reward increasing

12:39.000 --> 12:40.800
over the steps, over the epochs.

12:40.800 --> 12:44.160
And at first, of course there is this exploration phase.

12:44.160 --> 12:47.670
So, the average reward might not increase at the beginning

12:47.670 --> 12:50.520
but then once it reaches the exploitation phase,

12:50.520 --> 12:53.610
then we'll see the average reward definitely increase.

12:53.610 --> 12:56.430
And it will increase up to a certain level

12:56.430 --> 12:59.490
which is when it reaches the vest as fast as possible.

12:59.490 --> 13:01.473
So, next on with a print,

13:02.340 --> 13:04.770
you know we are doing this in one epoch,

13:04.770 --> 13:07.260
we have to go back to the loop here, print,

13:07.260 --> 13:08.550
and then we're gonna print,

13:08.550 --> 13:13.230
well, first "Epoch", a colon, then percent S,

13:13.230 --> 13:15.180
because we're gonna convert everything

13:15.180 --> 13:16.920
into a string that's better.

13:16.920 --> 13:21.423
And then we're gonna add the average reward,

13:22.680 --> 13:25.770
and then we add percent S as well.

13:25.770 --> 13:30.770
Then we're gonna close the quote, and then we add a percent.

13:30.960 --> 13:32.763
And on the other side, you know,

13:33.987 --> 13:34.820
we input the variables

13:34.820 --> 13:36.750
that are gonna be this first percent S,

13:36.750 --> 13:38.550
that is the epoch here.

13:38.550 --> 13:40.530
And this second variable corresponding

13:40.530 --> 13:43.140
to the average reward, which we'll compute right now.

13:43.140 --> 13:45.900
So the average reward variable doesn't exist yet.

13:45.900 --> 13:48.240
We're going to create it right now.

13:48.240 --> 13:52.620
So, we are gonna use "str" of epoch,

13:52.620 --> 13:53.910
even if epoch is a number

13:53.910 --> 13:56.700
we will convert that into a string that's better.

13:56.700 --> 14:00.330
And, we are going to add "str" of

14:00.330 --> 14:02.310
that's gonna be the average reward,

14:02.310 --> 14:03.600
and so we're gonna create a variable

14:03.600 --> 14:07.470
that we're gonna call "avg_reward".

14:07.470 --> 14:10.270
And now we're gonna create this variable and compute it.

14:11.498 --> 14:12.510
Okay, so let's do this.

14:12.510 --> 14:15.041
That's the only thing we have left to do.

14:15.041 --> 14:16.320
So, epoch we already have.

14:16.320 --> 14:17.700
Now let's compute average reward

14:17.700 --> 14:20.250
and we need to compute it right here

14:20.250 --> 14:21.690
still in the epoch loop,

14:21.690 --> 14:23.340
but out of the batch loop.

14:23.340 --> 14:26.190
Because now we have our batch sampled,

14:26.190 --> 14:28.410
and we have our training happening in the batch,

14:28.410 --> 14:29.730
but now the forward propagation

14:29.730 --> 14:32.310
plus the backward propagation is done in the batch.

14:32.310 --> 14:34.560
So we are getting back into the epoch loop

14:34.560 --> 14:38.430
and we can now compute the cumulative rewards

14:38.430 --> 14:41.520
which we can do with our n_steps object

14:41.520 --> 14:44.370
because our n_steps object contains

14:44.370 --> 14:46.080
this function reward steps

14:46.080 --> 14:48.750
that allows us to get the cumulative rewards

14:48.750 --> 14:51.780
happening in the steps, you know, during the n-steps run.

14:51.780 --> 14:54.990
So we are going to use it right now to

14:54.990 --> 14:57.840
update the new rewards of the steps

14:57.840 --> 15:02.070
and then we will update the moving average object

15:02.070 --> 15:06.420
by adding the cumulative rewards to the moving average object

15:06.420 --> 15:07.920
and then recomputing the average.

15:07.920 --> 15:10.680
And that's how we're gonna get the average reward.

15:10.680 --> 15:11.610
So let's do this.

15:11.610 --> 15:15.090
The first thing we need is the rewards that are updated.

15:15.090 --> 15:19.470
So, let's call them rewards steps.

15:19.470 --> 15:20.970
And then as we said,

15:20.970 --> 15:25.970
we take our n_steps object, which was, I remind you,

15:26.550 --> 15:30.180
created here, an object of the N-Step Progress class

15:30.180 --> 15:32.280
from our experience replay file.

15:32.280 --> 15:34.110
So n_steps object,

15:34.110 --> 15:39.110
then we add rewards steps, and then some parenthesis.

15:39.900 --> 15:41.880
All right, so that will get us

15:41.880 --> 15:44.130
the new cumulative rewards of the steps.

15:44.130 --> 15:45.420
All right?

15:45.420 --> 15:49.140
But then we need to add these new cumulative rewards

15:49.140 --> 15:51.120
in our moving average object.

15:51.120 --> 15:54.150
And to do this, we have a method this time

15:54.150 --> 15:57.540
in the moving average class, which is this add method.

15:57.540 --> 15:58.373
So that's very simple.

15:58.373 --> 16:00.600
We take our moving average object

16:00.600 --> 16:03.900
which we created here with 100 steps.

16:03.900 --> 16:06.720
Then we're gonna use our add method.

16:06.720 --> 16:11.370
And then in the add method we input our rewards steps

16:11.370 --> 16:13.130
and this will add the rewards

16:13.130 --> 16:16.110
of the steps into the moving average.

16:16.110 --> 16:16.943
All right, and finally,

16:16.943 --> 16:19.350
we can compute the average reward.

16:19.350 --> 16:22.800
And that is, well, you know, that's the same variable here.

16:22.800 --> 16:27.030
So that's what is going to be equal to the average reward.

16:27.030 --> 16:28.410
And to get it

16:28.410 --> 16:31.410
we just need to use the average method this time

16:31.410 --> 16:33.840
from our moving average object.

16:33.840 --> 16:38.160
And that is, we do "ma.average".

16:38.160 --> 16:40.560
Just like that because our moving average

16:40.560 --> 16:41.820
was already updated

16:41.820 --> 16:44.580
with the new reward steps that we just added

16:44.580 --> 16:46.530
thanks to the add method.

16:46.530 --> 16:48.720
Great, so now we have our average reward.

16:48.720 --> 16:50.400
So that will populate here

16:50.400 --> 16:53.580
and this is going to be printed every epoch.
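
NOTE
Editor's sketch, not part of the recording: a hypothetical moving average with add and average methods, plus the print described above.
The class body, the window handling and the reward values are assumptions; the course's own class is not shown in this section.
```python
class MovingAverage:
    # Hypothetical sketch: keeps the most recent `size` rewards and averages them.
    def __init__(self, size):
        self.size = size
        self.rewards = []
    def add(self, rewards):
        # Accept either a single reward or a list of rewards.
        if isinstance(rewards, list):
            self.rewards += rewards
        else:
            self.rewards.append(rewards)
        # Drop the oldest entries beyond the window size.
        self.rewards = self.rewards[-self.size:]
    def average(self):
        return sum(self.rewards) / len(self.rewards)
ma = MovingAverage(100)
ma.add([1.0, 2.0, 3.0])  # made-up cumulative rewards
avg_reward = ma.average()
epoch = 1
print("Epoch: %s, Average Reward: %s" % (str(epoch), str(avg_reward)))
```
With these made-up rewards the line printed is "Epoch: 1, Average Reward: 2.0".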

16:53.580 --> 16:55.110
All right, so we're done.

16:55.110 --> 16:57.240
So, I'm so excited to see the results

16:57.240 --> 16:58.073
and actually

16:58.073 --> 16:59.430
I'm gonna have a surprise for you

16:59.430 --> 17:01.740
in the next tutorial while watching the results.

17:01.740 --> 17:03.780
So, it's gonna be pretty exciting.

17:03.780 --> 17:06.240
And so now I guess it's time to play with the AI

17:06.240 --> 17:07.740
and have fun.

17:07.740 --> 17:11.610
All right, so prepare yourself a good coffee or a good tea.

17:11.610 --> 17:13.950
Now it's time to sit comfortably in our chair

17:13.950 --> 17:17.160
and watch some very cool videos of our AI playing Doom.

17:17.160 --> 17:18.930
So let's do that in the next tutorial.

17:18.930 --> 17:20.763
And until then, enjoy AI.
