WEBVTT

00:01.110 --> 00:02.460
-: Hello and welcome back to the course

00:02.460 --> 00:04.740
on Artificial Intelligence.

00:04.740 --> 00:06.570
Today we're going to discuss the plan of attack

00:06.570 --> 00:07.770
for this section.

00:07.770 --> 00:09.450
We're talking about Q-learning

00:09.450 --> 00:11.220
and we've got quite a few tutorials

00:11.220 --> 00:13.200
so I think it's a good idea for us to

00:13.200 --> 00:18.150
quickly go through them to understand what to expect

00:18.150 --> 00:20.550
in the upcoming videos.

00:20.550 --> 00:22.140
So here we go.

00:22.140 --> 00:25.260
All right, what we will learn in this section.

00:25.260 --> 00:26.790
First things first, we'll talk about

00:26.790 --> 00:29.220
what reinforcement learning actually is

00:29.220 --> 00:33.240
and what the philosophy behind reinforcement learning is

00:33.240 --> 00:36.780
and how reinforcement learning actually can be seen

00:36.780 --> 00:40.590
in real life and how it relates to things

00:40.590 --> 00:42.120
that we observe in real life,

00:42.120 --> 00:44.760
or actually things that we do ourselves.

00:44.760 --> 00:46.530
Then we'll talk about the Bellman equation.

00:46.530 --> 00:50.910
A very fundamental concept underpinning everything

00:50.910 --> 00:52.770
or a lot of things that are happening

00:52.770 --> 00:55.710
in reinforcement learning, especially in the space

00:55.710 --> 00:58.950
of Q-learning and what we're going to be discussing

00:58.950 --> 01:01.770
in this section of the course and in the following sections.

01:01.770 --> 01:03.900
Then we'll talk about the plan

01:03.900 --> 01:08.340
and the plan that an artificial intelligence

01:08.340 --> 01:11.610
comes up with in order to navigate inside an environment

01:11.610 --> 01:14.580
and we'll see how that comes together.

01:14.580 --> 01:17.700
A very quick but quite interesting tutorial.

01:17.700 --> 01:20.490
Then we'll talk about Markov decision processes,

01:20.490 --> 01:21.323
a new concept.

01:21.323 --> 01:24.030
We're going to introduce a very new concept,

01:24.030 --> 01:29.030
which will slowly even add an extra layer of sophistication

01:29.250 --> 01:31.080
to our Bellman equation,

01:31.080 --> 01:33.270
to our whole reinforcement learning,

01:33.270 --> 01:34.950
to our Q-learning concepts.

01:34.950 --> 01:36.960
And that's the way this section is structured

01:36.960 --> 01:38.420
that we introduce the Bellman equation

01:38.420 --> 01:41.550
in very simplistic form, and then slowly

01:41.550 --> 01:45.390
throughout the tutorials we add layers of sophistication

01:45.390 --> 01:48.630
to it in order to get to the final version

01:48.630 --> 01:52.650
that is our designated destination in terms of Q-learning.

01:52.650 --> 01:55.530
But we'll get there slowly in order for us to have

01:55.530 --> 01:57.840
enough time to process all the information

01:57.840 --> 02:00.480
and let it settle in. And Markov decision processes

02:00.480 --> 02:03.210
is an extra layer of sophistication

02:03.210 --> 02:04.950
on top of what we've already discussed

02:04.950 --> 02:07.473
or what we will have already discussed by then.

02:08.460 --> 02:11.190
Then we'll talk about policies versus plans.

02:11.190 --> 02:12.780
Another interesting tutorial.

02:12.780 --> 02:13.830
They're all interesting.

02:13.830 --> 02:15.930
Just another quick tutorial

02:15.930 --> 02:18.420
on how policies differ from plans

02:18.420 --> 02:19.590
and what the differences are.

02:19.590 --> 02:22.980
And these are terms that you will probably hear or read

02:22.980 --> 02:26.430
in other literature if you're going to be delving into it

02:26.430 --> 02:29.940
to get additional information on reinforcement learning.

02:29.940 --> 02:32.520
Then we'll talk about adding a living penalty

02:32.520 --> 02:37.520
to our environments, and that's kind of another way

02:38.100 --> 02:40.680
of adding complexity into the environments

02:40.680 --> 02:43.320
that our agents are going to be operating in.

02:43.320 --> 02:46.230
Then we'll talk about the intuition behind Q-learning.

02:46.230 --> 02:49.350
So up until that tutorial, we're going to be talking

02:49.350 --> 02:50.790
about values of states.

02:50.790 --> 02:53.310
And then finally we're going to switch to talking about

02:53.310 --> 02:56.100
values of actions, or Q-values.

02:56.100 --> 02:59.790
And then we're going to introduce the temporal difference.

02:59.790 --> 03:02.460
So this is the tutorial where everything that we've learned

03:02.460 --> 03:06.690
is going to come together to explain how exactly

03:06.690 --> 03:09.150
do agents or artificial,

03:09.150 --> 03:11.160
how does artificial intelligence learn?

03:11.160 --> 03:13.440
How does it update its values

03:13.440 --> 03:16.800
throughout the iterative process that it's going through?

03:16.800 --> 03:19.440
And then finally, we're going to look at

03:19.440 --> 03:21.540
a visualization of Q-learning.

03:21.540 --> 03:23.550
So we're going to take everything we learned

03:23.550 --> 03:27.060
and we're going to look at it happen in front of our eyes

03:27.060 --> 03:29.700
and watch an artificial intelligence

03:29.700 --> 03:33.960
actually perform Q-learning and do all the things

03:33.960 --> 03:35.880
that we're going to discuss on an intuitive level,

03:35.880 --> 03:37.950
it's going to actually do in practice.

03:37.950 --> 03:42.000
And that will help us even further grasp that knowledge

03:42.000 --> 03:44.520
that we're going to be covering off in this section.

03:44.520 --> 03:46.050
So hopefully you're very excited

03:46.050 --> 03:47.460
about these upcoming tutorials.

03:47.460 --> 03:50.790
I definitely am, and there are some very interesting

03:50.790 --> 03:53.100
slides coming up and more importantly

03:53.100 --> 03:56.010
the concepts themselves are very, very interesting

03:56.010 --> 03:59.730
and I'm sure you're going to enjoy them quite a lot.

03:59.730 --> 04:01.380
And I look forward to seeing you next time.

04:01.380 --> 04:03.243
Until then, enjoy AI.