WEBVTT

00:00.450 --> 00:01.283
-: Hello and welcome back

00:01.283 --> 00:02.116
to the course on

00:02.116 --> 00:03.210
artificial intelligence.

00:03.210 --> 00:04.170
In today's section,

00:04.170 --> 00:05.880
we're tackling the topic

00:05.880 --> 00:06.990
of Deep

00:06.990 --> 00:08.250
Q-learning.

00:08.250 --> 00:09.270
So let's see how we're going

00:09.270 --> 00:10.500
to attack this.

00:10.500 --> 00:12.420
In this section, we will first cover

00:12.420 --> 00:13.467
the Deep Q-learning intuition,

00:13.467 --> 00:15.090
the learning side of things.

00:15.090 --> 00:16.530
So, we are going to separate

00:16.530 --> 00:17.790
Deep Q-learning,

00:17.790 --> 00:18.623
the intuition behind it,

00:18.623 --> 00:19.456
into two parts,

00:19.456 --> 00:20.820
the learning and the acting.

00:20.820 --> 00:21.653
And we're going to have

00:21.653 --> 00:22.650
two tutorials on that.

00:22.650 --> 00:23.760
So first we'll understand

00:23.760 --> 00:26.460
how the neural networks actually learn

00:26.460 --> 00:28.320
and how they update their weights

00:28.320 --> 00:31.530
based on what we feed into them,

00:31.530 --> 00:33.060
and how the whole concept

00:33.060 --> 00:33.893
of Q-learning works.

00:33.893 --> 00:34.890
So we're going to take

00:34.890 --> 00:36.540
the temporal difference concepts

00:36.540 --> 00:37.890
that we discussed in simple Q-learning.

00:37.890 --> 00:39.480
We're going to apply them to

00:39.480 --> 00:40.500
Deep Q-learning.

00:40.500 --> 00:41.730
And then we're going to talk

00:41.730 --> 00:43.740
about how Deep Q-learning algorithms

00:43.740 --> 00:46.350
actually decide what action to take

00:46.350 --> 00:47.790
in what state.

00:47.790 --> 00:49.620
Then we're going to talk about experience replay,

00:49.620 --> 00:51.570
a very important addition on top

00:51.570 --> 00:53.160
of Deep Q-learning,

00:53.160 --> 00:54.930
which actually enables

00:54.930 --> 00:56.190
Deep Q-learning to work properly

00:56.190 --> 00:57.210
and you'll see why it's important

00:57.210 --> 00:58.350
in that tutorial.

00:58.350 --> 00:59.280
And then we're going to talk

00:59.280 --> 01:02.490
about action selection policies.

01:02.490 --> 01:04.650
We're going to talk about how

01:04.650 --> 01:06.330
Deep Q-learning agents

01:06.330 --> 01:07.500
are able

01:07.500 --> 01:10.140
to combine exploration

01:10.140 --> 01:11.190
with exploitation.

01:11.190 --> 01:12.720
So, once they've found something,

01:12.720 --> 01:13.650
a good approach,

01:13.650 --> 01:14.640
they can use that approach,

01:14.640 --> 01:15.990
but also they need to explore

01:15.990 --> 01:16.890
so that they don't get stuck

01:16.890 --> 01:18.870
in a local maximum.

01:18.870 --> 01:19.980
And one more thing

00:19.980 --> 01:22.650
I wanted to mention about this section

01:22.650 --> 01:24.000
is that it is highly beneficial

01:24.000 --> 01:26.190
if you have a look at annex number one,

01:26.190 --> 01:27.990
'Artificial Neural Networks'.

01:27.990 --> 01:30.840
So if you go and explore all those topics,

01:30.840 --> 01:33.300
we've got some very powerful intuition tutorials

01:33.300 --> 01:34.680
prepared for you there.

01:34.680 --> 01:36.000
That is, of course,

01:36.000 --> 01:37.410
if you haven't done the Deep Learning course.

01:37.410 --> 01:38.430
If you've done the Deep Learning course,

01:38.430 --> 01:40.260
then you already know all of these things

01:40.260 --> 01:41.490
and you can proceed with the section.

01:41.490 --> 01:43.680
But if you want to get that additional knowledge

01:43.680 --> 01:46.050
about neural networks before you proceed

01:46.050 --> 01:47.640
with this part of the course,

01:47.640 --> 01:48.930
this is highly advisable because

01:48.930 --> 01:51.090
it will help you understand exactly

01:51.090 --> 01:52.740
how neural networks work

01:52.740 --> 01:54.300
and why they're so powerful,

01:54.300 --> 01:55.770
why we're leveraging them

01:55.770 --> 01:58.290
in this Deep Q-learning algorithm.

01:58.290 --> 01:59.970
And once you've refreshed your knowledge

01:59.970 --> 02:01.350
or gained that knowledge

02:01.350 --> 02:03.150
on neural networks from that annex,

02:03.150 --> 02:04.200
then come back here

02:04.200 --> 02:06.600
and we will proceed with the Deep Q-learning.

02:06.600 --> 02:08.610
If you're already comfortable with neural networks,

02:08.610 --> 02:10.170
then let's get straight into it.

02:10.170 --> 02:11.310
Let's start talking about

02:11.310 --> 02:13.320
Deep Q-learning intuition.

02:13.320 --> 02:14.520
And I look forward to seeing you

02:14.520 --> 02:15.450
in the first tutorial.

02:15.450 --> 02:17.163
Until then, enjoy AI.