WEBVTT

00:00.930 --> 00:02.010
-: Hello, and welcome back

00:02.010 --> 00:03.930
to the course on artificial intelligence.

00:03.930 --> 00:05.940
So we've talked about the Bellman Equation,

00:05.940 --> 00:08.490
and we've analyzed our little maze.

00:08.490 --> 00:11.100
Let's have a look at the plan.

00:11.100 --> 00:12.720
What is the plan?

00:12.720 --> 00:14.670
Well, here is our maze analysis,

00:14.670 --> 00:17.708
and we know that we can see, actually,

00:17.708 --> 00:19.530
the values of each state.

00:19.530 --> 00:23.250
We can see what the value of being in every single state is,

00:23.250 --> 00:25.710
and therefore, the AI can,

00:25.710 --> 00:27.810
or the agent can navigate this maze.

00:27.810 --> 00:28.800
So what is the plan?

00:28.800 --> 00:32.070
Well, the plan is simply like a treasure map

00:32.070 --> 00:33.885
for the artificial intelligence.

00:33.885 --> 00:36.660
Instead of looking at these values,

00:36.660 --> 00:37.950
let's just replace them with arrows,

00:37.950 --> 00:41.429
which indicate in which direction the agent should go

00:41.429 --> 00:43.440
because it knows those values.

00:43.440 --> 00:47.250
So, in an ideal scenario, after it's explored this environment,

00:47.250 --> 00:49.170
it knows the values of being in each state

00:49.170 --> 00:50.880
and therefore it can come up with this map.
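
NOTE A minimal sketch of how the arrows could be derived from the state values: for each square, point toward the neighboring square with the highest value, and keep every move that ties for best. The grid layout, the numbers, and the names VALUES, TERMINALS, MOVES, best_moves and build_plan are assumptions made for illustration; they are not taken from the video.
```python
# Illustrative sketch: the grid layout and numbers below are assumed, not taken from the video.
# None marks a wall; the +1.0 and -1.0 cells in the last column are the terminal squares.
VALUES = [
    [0.81, 0.90, 1.00, 1.0],
    [0.73, None, 0.90, -1.0],
    [0.66, 0.73, 0.81, 0.73],
]
TERMINALS = {(0, 3), (1, 3)}
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
def best_moves(values, row, col):
    """Return every move that leads to the highest-valued neighboring square."""
    options = {}
    for name, (dr, dc) in MOVES.items():
        r, c = row + dr, col + dc
        inside = 0 <= r < len(values) and 0 <= c < len(values[0])
        if inside and values[r][c] is not None:
            options[name] = values[r][c]
    best = max(options.values())
    return [name for name, value in options.items() if value == best]
def build_plan(values):
    """Map each square that is neither a wall nor terminal to its best moves (ties kept)."""
    plan = {}
    for row in range(len(values)):
        for col in range(len(values[0])):
            if values[row][col] is None or (row, col) in TERMINALS:
                continue
            plan[(row, col)] = best_moves(values, row, col)
    return plan
PLAN = build_plan(VALUES)
```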

00:50.880 --> 00:52.350
So let's have a look again.

00:52.350 --> 00:54.360
We know that here the value is one.

00:54.360 --> 00:55.830
So if you are here,

00:55.830 --> 00:57.030
out of the two,

00:57.030 --> 00:58.860
the better one is this one, so you go right.

00:58.860 --> 01:00.990
From here, out of the two, this one's a better one,

01:00.990 --> 01:02.760
this one's a better one, this one's a better one.

01:02.760 --> 01:04.770
Or actually, from here you have two options, right?

01:04.770 --> 01:06.930
So here's kind of like a tie.

01:06.930 --> 01:09.540
So you just pick one at random, doesn't matter which one

01:09.540 --> 01:13.050
because the value in either case is the same

01:13.050 --> 01:15.090
and moreover, even if we look through,

01:15.090 --> 01:18.690
it'll take the same number of steps to get to the end.

01:18.690 --> 01:20.460
From here, you've got three options,

01:20.460 --> 01:22.110
but this one is the better value,

01:22.110 --> 01:23.550
from here this one is a better value,

01:23.550 --> 01:25.470
from here, obviously this one is a better value,

01:25.470 --> 01:29.400
because here you just get a minus one reward right away.

01:29.400 --> 01:32.130
And from here you have three options, actually,

01:32.130 --> 01:34.080
but this one is the best one out of all of them,

01:34.080 --> 01:35.370
this value of the state.

01:35.370 --> 01:38.640
And so therefore, if we replace them with arrows,

01:38.640 --> 01:41.070
it makes sense that this is how the agent would go

01:41.070 --> 01:41.970
if it starts here,

01:41.970 --> 01:44.550
or if for some reason it ends up in this square,

01:44.550 --> 01:46.290
it knows how to get out here.

01:46.290 --> 01:48.510
It starts in this square, it knows how to get out here,

01:48.510 --> 01:49.343
and so on.

01:49.343 --> 01:51.450
So that is what a plan is.
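
NOTE A minimal sketch of how an agent might use such a plan: from any starting square it follows the arrows until it reaches the goal, picking at random wherever two arrows are equally good. The arrow map, the goal position, and the names PLAN, GOAL, MOVES and follow_plan are assumptions made for illustration; they are not taken from the video.
```python
import random
# Illustrative sketch: this arrow map and the goal square are assumed, not taken from the video.
# Each square maps to the moves that are equally good there; (2, 0) is a tie.
PLAN = {
    (0, 0): ["right"], (0, 1): ["right"], (0, 2): ["right"],
    (1, 0): ["up"], (1, 2): ["up"],
    (2, 0): ["up", "right"], (2, 1): ["right"], (2, 2): ["up"], (2, 3): ["left"],
}
GOAL = (0, 3)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
def follow_plan(start, rng=random):
    """Walk from start to GOAL by following the arrows, breaking ties at random."""
    path = [start]
    square = start
    while square != GOAL:
        move = rng.choice(PLAN[square])
        dr, dc = MOVES[move]
        square = (square[0] + dr, square[1] + dc)
        path.append(square)
    return path
```
In this assumed map, both tied moves from (2, 0) reach the goal in the same number of steps, so the random choice does not change the path length.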

01:51.450 --> 01:53.880
And don't confuse a plan with a policy,

01:53.880 --> 01:55.110
because we are gonna be talking

01:55.110 --> 01:56.460
about policies on their own.

01:56.460 --> 01:58.230
Policies are very similar to plans,

01:58.230 --> 01:59.820
but they have a little trick to them

01:59.820 --> 02:02.400
because the environment's gonna be a bit different.

02:02.400 --> 02:03.780
It's gonna be stochastic.

02:03.780 --> 02:05.850
And that's what we're going to talk about

02:05.850 --> 02:07.920
in the next tutorial.

02:07.920 --> 02:10.020
So I can't wait to see you on the next one.

02:10.020 --> 02:12.213
And until then, enjoy AI.