WEBVTT

00:00.840 --> 00:02.610
-: Hi everyone, and welcome back.

00:02.610 --> 00:05.460
In this lecture, we're gonna get our environment set up

00:05.460 --> 00:08.490
and I want to introduce this more high-level overview

00:08.490 --> 00:11.640
for those of you that wanna try and solve it on your own.

00:11.640 --> 00:12.810
And this may look familiar,

00:12.810 --> 00:14.460
in the last lecture you saw this,

00:14.460 --> 00:15.870
so if you read through it, my apologies,

00:15.870 --> 00:18.060
we're just gonna go through it really quickly.

00:18.060 --> 00:20.910
First things first, in this project,

00:20.910 --> 00:23.460
we're really aiming to keep it as simple as possible

00:23.460 --> 00:25.530
in the sense we don't need to import too many libraries.

00:25.530 --> 00:27.420
We're basically just gonna be using NumPy.

00:27.420 --> 00:30.870
For that, we just need to import NumPy as np,

00:30.870 --> 00:33.450
usually the common reference for NumPy.

00:33.450 --> 00:36.390
And we also want to set up our environment.

00:36.390 --> 00:37.650
As you'll see here,

00:37.650 --> 00:39.390
and we're just gonna go through this again

00:39.390 --> 00:40.740
really quickly as an overview,

00:40.740 --> 00:42.540
the first step for our Q-learning

00:42.540 --> 00:44.520
is we wanna define an environment

00:44.520 --> 00:46.320
that the postman has to navigate,

00:46.320 --> 00:47.790
we need that environment set up

00:47.790 --> 00:50.457
so we can actually iterate and go through it.

00:50.457 --> 00:51.690
In this lecture,

00:51.690 --> 00:53.160
the environment's going to consist

00:53.160 --> 00:55.830
of states, actions, and rewards.

00:55.830 --> 00:58.830
States and rewards are inputs for the Q-learning AI agent,

00:58.830 --> 01:02.370
while the possible actions are the AI agent's outputs.

01:02.370 --> 01:04.470
Our states, we can think about

01:04.470 --> 01:06.870
and look at this image as our representation,

01:06.870 --> 01:07.860
the states in our environment

01:07.860 --> 01:10.230
are all the possible locations within the city.

01:10.230 --> 01:12.120
We can call this "example city".

01:12.120 --> 01:14.670
Some of these locations are the city boundaries

01:14.670 --> 01:16.470
which will be our black squares,

01:16.470 --> 01:18.870
while other locations are aisles

01:18.870 --> 01:21.210
that the postman can use to travel through the city,

01:21.210 --> 01:22.920
those are gonna be the white squares.

01:22.920 --> 01:24.390
The green square indicates

01:24.390 --> 01:27.300
the item packaging and shipping area.

01:27.300 --> 01:28.740
The black and green squares

01:28.740 --> 01:31.740
are what we're gonna call terminal states.
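
NOTE
A minimal sketch of how terminal states could be encoded, not the lecture's own code.
The reward values (-100, -1, 100), the 3 by 3 toy grid, and the helper name
is_terminal_state are all assumptions made for illustration; the lecture has not
given any of them at this point.
```python
import numpy as np
# Hypothetical toy layout: -100 marks a boundary (black), -1 a travelable
# square (white), and 100 the packaging area (green).
rewards = np.array([
    [-100, 100, -100],
    [-100, -1, -1],
    [-100, -100, -100],
])
def is_terminal_state(row, column):
    # Assumption: any square whose reward is not -1 ends an episode.
    return bool(rewards[row, column] != -1)
print(is_terminal_state(0, 1))  # green square: True
print(is_terminal_state(1, 1))  # white square: False
```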

01:31.740 --> 01:34.620
So overall, our goal or our AI agent's goal,

01:34.620 --> 01:36.120
we wanna use the shortest path,

01:36.120 --> 01:37.890
we want our agent to learn the shortest path

01:37.890 --> 01:40.689
between the item packaging area, which is green,

01:40.689 --> 01:43.140
and all the other locations in the city

01:43.140 --> 01:45.363
where the postman is allowed to travel.

01:49.500 --> 01:50.490
In the above image,

01:50.490 --> 01:55.490
we have 121 possible states or locations within the city.

01:56.040 --> 01:58.608
These states are arranged in an 11 by 11 grid,

01:58.608 --> 02:00.810
each location can hence be identified

02:00.810 --> 02:02.670
by its row and column index.
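
NOTE
A small sketch of the point above: an 11 by 11 grid gives 121 states, each named
by a (row, column) pair. The variable names follow the ones used later in this
lecture; the flat state number 60 and the use of np.unravel_index are only
assumptions for illustration.
```python
import numpy as np
environment_rows = 11
environment_columns = 11
# Every (row, column) pair names one location in the grid.
states = [
    (row, column)
    for row in range(environment_rows)
    for column in range(environment_columns)
]
print(len(states))  # 121
# A flat state number can be converted back to its row and column index.
row, column = np.unravel_index(60, (environment_rows, environment_columns))
print(int(row), int(column))  # 5 5
```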

02:02.670 --> 02:04.650
So what would be our first step?

02:04.650 --> 02:05.970
And this is where I want you guys

02:05.970 --> 02:08.640
to start thinking about how you can define it.

02:08.640 --> 02:10.500
We need to define our environment.

02:10.500 --> 02:12.330
This is a good example of our image

02:12.330 --> 02:13.260
and how we're gonna approach it,

02:13.260 --> 02:14.760
so how would you model that?

02:14.760 --> 02:16.380
Remember, we're using NumPy.

02:16.380 --> 02:18.300
So we need to define these boundaries

02:18.300 --> 02:21.000
and we can define a 3D NumPy array

02:21.000 --> 02:25.410
to hold our current Q values for each state and action pair.

02:25.410 --> 02:27.224
As we see our representation,

02:27.224 --> 02:30.300
and for those of you who are not familiar with it

02:30.300 --> 02:33.240
or maybe this is new or you want to just have a refresher,

02:33.240 --> 02:35.550
the AI A-Z handbook from this course

02:35.550 --> 02:38.520
is extremely helpful, highly recommended.

02:38.520 --> 02:40.350
So what do we have to do here?

02:40.350 --> 02:41.580
We can actually,

02:41.580 --> 02:43.530
let me just expand this really quickly for us

02:43.530 --> 02:45.210
so we can view it a little easier,

02:45.210 --> 02:46.890
let me just add some code cells.

02:46.890 --> 02:49.620
We're gonna define our 3D NumPy array.

02:49.620 --> 02:51.420
How would you go about this?

02:51.420 --> 02:53.070
So we have some options,

02:53.070 --> 02:55.590
but the most straightforward and simple option,

02:55.590 --> 02:57.423
let's call it environment_rows,

02:59.940 --> 03:02.790
and let's set it to 11, since it's an 11 by 11 grid.

03:02.790 --> 03:04.940
Then we can also do an environment_columns,

03:09.030 --> 03:11.310
and we can also set this to 11.

03:11.310 --> 03:14.370
Lastly, we can set our Q values

03:14.370 --> 03:16.680
since we need to build our NumPy array

03:16.680 --> 03:19.030
from the environment rows and environment columns.

03:20.100 --> 03:25.100
And we can set this as q_values equal to np.zeros,

03:26.400 --> 03:29.114
and we need to use our environment rows,

03:29.114 --> 03:32.073
environment columns, plus one more dimension for the actions,

03:32.970 --> 03:36.000
and we have our 3D NumPy array,

03:36.000 --> 03:40.353
our 3D environment representation set with our environment.
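
NOTE
A minimal sketch of the array described above, not necessarily the exact code typed
in the video. The spoken version names only the rows and columns; the third axis
size (num_actions = 4, for example up, right, down, left) is an assumption, since
the actions are left for the next video. np.zeros takes the shape as one tuple.
```python
import numpy as np
environment_rows = 11
environment_columns = 11
# Assumption: four possible actions per state.
num_actions = 4
# One Q value per (row, column, action), all starting at zero.
q_values = np.zeros((environment_rows, environment_columns, num_actions))
print(q_values.shape)  # (11, 11, 4)
```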

03:41.220 --> 03:42.450
Awesome.

03:42.450 --> 03:44.340
Now, and we're gonna leave it off here

03:44.340 --> 03:45.990
but I want you guys to start thinking about

03:45.990 --> 03:46.823
how to solve this

03:46.823 --> 03:48.570
since you already have your environment set up.

03:48.570 --> 03:50.100
The next thing that you're going to want to do

03:50.100 --> 03:52.980
as a hint is set up your actions.

03:52.980 --> 03:55.830
Your agent needs to be able to move through the environment.

03:55.830 --> 03:57.480
So how would you represent that?

03:57.480 --> 04:01.170
How would you write that in for this problem?

04:01.170 --> 04:02.070
Let's leave it off here.

04:02.070 --> 04:05.280
In the next video, we are going to revisit those actions.

04:05.280 --> 04:07.530
Awesome, I'll see you guys in the next video.