WEBVTT

00:00.390 --> 00:03.930
Hello and welcome again to our self-driving car module.

00:03.930 --> 00:06.060
So in this tutorial, I'm going to explain

00:06.060 --> 00:08.850
the environment on which we will implement

00:08.850 --> 00:11.310
our artificial intelligence and that will contain,

00:11.310 --> 00:14.580
of course, the car that we will train to drive itself,

00:14.580 --> 00:17.970
and to avoid obstacles, and on which we will draw some roads

00:17.970 --> 00:21.180
and some blocks for our car to navigate around.

00:21.180 --> 00:25.560
So we will later build this artificial intelligence to

00:25.560 --> 00:28.170
train this car to drive on the road, you know,

00:28.170 --> 00:30.030
without crossing the limits,

00:30.030 --> 00:31.970
and avoiding some obstacles that

00:31.970 --> 00:34.140
we will put inside the road.

00:34.140 --> 00:35.910
So this is a pretty exciting challenge

00:35.910 --> 00:38.010
and actually there are two separate files

00:38.010 --> 00:39.000
as you can see.

00:39.000 --> 00:42.030
There is the AI Python file, which is our artificial

00:42.030 --> 00:45.240
intelligence that will do all the training to

00:45.240 --> 00:47.250
train the car to drive itself,

00:47.250 --> 00:49.110
and we have the map.py file

00:49.110 --> 00:52.080
that is the code that makes all this environment.

00:52.080 --> 00:53.220
So here is that code,

00:53.220 --> 00:57.060
that's actually 200 lines of code, a little more.

00:57.060 --> 01:00.480
So this code is not specifically related to AI,

01:00.480 --> 01:03.390
it is just code to make the environment, to make the map.

01:03.390 --> 01:05.130
So I'm gonna go through each

01:05.130 --> 01:07.230
of the sections one by one to explain,

01:07.230 --> 01:08.880
but we're not going to implement this code

01:08.880 --> 01:10.440
line by line from scratch

01:10.440 --> 01:13.740
because we want to focus on artificial intelligence.

01:13.740 --> 01:15.120
But let's still go through the sections

01:15.120 --> 01:17.340
one by one to understand what's happening.

01:17.340 --> 01:20.610
So first, we import the essential libraries.

01:20.610 --> 01:22.080
That's for any code.

01:22.080 --> 01:23.820
We need some libraries to

01:23.820 --> 01:26.490
perform some tasks more efficiently.

01:26.490 --> 01:28.860
Then we import all the Kivy packages.

01:28.860 --> 01:30.840
So, that's not very important

01:30.840 --> 01:33.270
because this is all specific to Kivy.

01:33.270 --> 01:35.160
We're using Kivy to make the map,

01:35.160 --> 01:37.080
and so we're importing a lot of classes

01:37.080 --> 01:39.247
and objects to be able to make this map

01:39.247 --> 01:42.180
and add some tools in the map.

01:42.180 --> 01:44.400
All right, then this line is important.

01:44.400 --> 01:48.240
This line is AI related because basically,

01:48.240 --> 01:51.330
this is where we import our brain,

01:51.330 --> 01:52.680
the brain of the car

01:52.680 --> 01:56.400
which will be an object of this DQN class.

01:56.400 --> 02:00.540
And the DQN class is our artificial intelligence itself.

02:00.540 --> 02:03.150
You will see we will implement the DQN class

02:03.150 --> 02:04.650
in the following tutorials.

02:04.650 --> 02:06.150
And as you might have guessed,

02:06.150 --> 02:09.630
DQN stands for Deep Q Networks.

02:09.630 --> 02:12.600
So we will implement a Deep Q Learning Network,

02:12.600 --> 02:14.010
and then once it's ready

02:14.010 --> 02:18.360
we will be importing it here with this line from the AI.

02:18.360 --> 02:21.960
And the AI is of course our AI Python file.

02:21.960 --> 02:24.450
All right, so can't wait to implement this.

02:24.450 --> 02:26.070
This is gonna be quite a journey,

02:26.070 --> 02:28.860
but you will see this is gonna be very exciting,

02:28.860 --> 02:30.300
because thanks to the AI

02:30.300 --> 02:33.240
the car will be able to drive itself.

02:33.240 --> 02:36.420
All right, and now before I move on to the next sections

02:36.420 --> 02:39.030
we have to explain how we will train this car.

02:39.030 --> 02:41.130
I'm not going to explain the neural network right now,

02:41.130 --> 02:43.170
but I'm going to explain the idea

02:43.170 --> 02:46.170
of how we are gonna train the car to drive itself

02:46.170 --> 02:48.330
and to avoid obstacles.

02:48.330 --> 02:49.500
So you know, in real life,

02:49.500 --> 02:52.920
if you want to train a real car to avoid some walls,

02:52.920 --> 02:55.800
or some obstacles, well, what would you do?

02:55.800 --> 02:57.900
You would definitely not take real walls

02:57.900 --> 03:01.170
or real big obstacles and smash your car into them.

03:01.170 --> 03:02.850
That would cost you a lot of money.

03:02.850 --> 03:03.960
Instead,

03:03.960 --> 03:07.740
a more intelligent idea would be to punish your car,

03:07.740 --> 03:10.380
not when it smashes a wall or an obstacle,

03:10.380 --> 03:12.780
but when it goes onto some sand.

03:12.780 --> 03:14.190
So it's like you have a field,

03:14.190 --> 03:17.280
this field has some roads on which the car has to stay,

03:17.280 --> 03:19.830
and the roads are limited by some sand.

03:19.830 --> 03:22.050
And each time the car goes onto the sand,

03:22.050 --> 03:24.090
it's like it's going onto an obstacle.

03:24.090 --> 03:26.790
Because once the car goes into some sand,

03:26.790 --> 03:28.140
it'll be slowed down

03:28.140 --> 03:30.450
and we will make sure that the car is penalized.

03:30.450 --> 03:31.830
It's punished for that,

03:31.830 --> 03:33.570
and that is one essential point

03:33.570 --> 03:35.190
of our artificial intelligence.

03:35.190 --> 03:37.743
The bad reward comes whenever the car goes

03:37.743 --> 03:40.230
onto some sand and is slowed down.

03:40.230 --> 03:42.270
All right, and therefore here,

03:42.270 --> 03:45.270
I'm introducing last x and last y,

03:45.270 --> 03:47.700
which are the coordinates of the last point

03:47.700 --> 03:50.730
in memory when we draw some sand on the map.

03:50.730 --> 03:51.563
All right,

03:51.563 --> 03:54.180
and then we get our artificial intelligence

03:54.180 --> 03:57.870
which we call brain, and that contains our neural network,

03:57.870 --> 04:00.690
and we will call it brain because this is actually the brain

04:00.690 --> 04:04.320
of the car and that contains our neural network.

04:04.320 --> 04:06.270
All right, so in this line of code,

04:06.270 --> 04:10.230
as you can see I'm creating an object of the DQN class.

04:10.230 --> 04:12.840
I will remind you what classes and objects are,

04:12.840 --> 04:15.750
but brain is the object, DQN is the class,

04:15.750 --> 04:19.620
and 5, 3 and 0.9 are the inputs of the class.

04:19.620 --> 04:21.030
So that's very simple.

04:21.030 --> 04:24.030
Five corresponds to the states that

04:24.030 --> 04:26.790
are encoded as vectors of five dimensions.

04:26.790 --> 04:27.990
We will see what they are,

04:27.990 --> 04:29.700
perfectly describing what's happening

04:29.700 --> 04:31.800
in the environment on the map.

04:31.800 --> 04:34.530
Then three is the number of actions.

04:34.530 --> 04:36.450
There will be three possible actions.

04:36.450 --> 04:39.300
Go left, go straight or go right.

04:39.300 --> 04:41.970
And 0.9 is the Gamma parameter

04:41.970 --> 04:44.220
in the Deep Q learning algorithm.
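
NOTE
Editor's sketch of the object creation described above, not the course's actual class.
The real class is implemented in later tutorials; this stand-in only stores its three inputs.
The class spelling Dqn and the parameter names input_size, nb_action and gamma are assumptions.
```python
# Hypothetical stand-in: the real class is built in the following tutorials.
class Dqn:
    def __init__(self, input_size: int, nb_action: int, gamma: float) -> None:
        self.input_size = input_size  # dimension of the state vector (5 here)
        self.nb_action = nb_action    # number of possible actions (3 here)
        self.gamma = gamma            # gamma parameter of Deep Q-learning (0.9 here)
# brain is the object, Dqn is the class, and 5, 3 and 0.9 are the inputs.
brain = Dqn(5, 3, 0.9)
print(brain.input_size, brain.nb_action, brain.gamma)
```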

04:44.220 --> 04:46.950
All right, and then we have the action to rotation.

04:46.950 --> 04:50.910
So action to rotation is a vector of three elements,

04:50.910 --> 04:53.280
0, 20 and -20.

04:53.280 --> 04:57.690
And so we have to do this because the actions are encoded

04:57.690 --> 05:00.930
by three numbers, 0, 1, and 2,

05:00.930 --> 05:03.210
and that corresponds to the indexes

05:03.210 --> 05:04.680
of this action to rotation vector.

05:04.680 --> 05:05.820
So for example,

05:05.820 --> 05:10.820
if the action that is selected at time T is 0,

05:11.250 --> 05:13.830
well 0 corresponds to the index

05:13.830 --> 05:15.840
of this action to rotation vector,

05:15.840 --> 05:19.110
and the value at index 0 is 0,

05:19.110 --> 05:21.480
and therefore we will go straight.

05:21.480 --> 05:24.720
Then if the action selected is 1,

05:24.720 --> 05:26.940
well 1 corresponds to the index

05:26.940 --> 05:29.040
of this action to rotation vector.

05:29.040 --> 05:32.790
And the value of this vector at index 1 is 20.

05:32.790 --> 05:35.850
So 20 corresponds to a rotation of 20 degrees,

05:35.850 --> 05:39.210
and that means the car will go 20 degrees to the right.

05:39.210 --> 05:42.000
And then if the action selected is 2,

05:42.000 --> 05:44.610
well 2 corresponds to the index 2

05:44.610 --> 05:46.650
of this action to rotation vector.

05:46.650 --> 05:50.310
And therefore the car will do a rotation of -20 degrees,

05:50.310 --> 05:52.500
and therefore, it'll go to the left.
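
NOTE
Editor's sketch of the action-to-rotation lookup described above.
The values 0, 20 and -20 come from the explanation; the identifier action2rotation
and the helper rotation_for are assumed names used only for illustration.
```python
# Index = action chosen by the AI (0, 1 or 2); value = rotation in degrees.
action2rotation = [0, 20, -20]
def rotation_for(action: int) -> int:
    """Return the rotation in degrees encoded by the given action index."""
    return action2rotation[action]
# Action 0 goes straight, action 1 turns 20 degrees, action 2 turns -20 degrees.
for action in range(3):
    print(action, rotation_for(action))
```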

05:52.500 --> 05:56.490
All right, then we introduce the last reward variable,

05:56.490 --> 06:00.180
because at each state we'll be getting the last reward.

06:00.180 --> 06:01.013
So remember,

06:01.013 --> 06:03.210
if the car doesn't go onto some sand

06:03.210 --> 06:05.160
then the reward will be positive.

06:05.160 --> 06:07.530
And if the car goes onto some sand,

06:07.530 --> 06:09.630
well it will get a bad reward.

06:09.630 --> 06:10.980
And at each time,

06:10.980 --> 06:13.500
this variable will contain this reward

06:13.500 --> 06:15.480
that the car gets at time T.
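
NOTE
Editor's sketch of the reward idea described above: a bad reward on sand, a positive one otherwise.
The numeric values and the names SAND_REWARD, ROAD_REWARD and compute_reward are assumptions;
the explanation only states the sign of each reward, not its size.
```python
# Assumed values: only the signs come from the explanation above.
SAND_REWARD = -1.0  # bad reward when the car is on sand
ROAD_REWARD = 0.1   # positive reward when the car stays off the sand
def compute_reward(on_sand: bool) -> float:
    """Return the reward for the current time step."""
    return SAND_REWARD if on_sand else ROAD_REWARD
last_reward = compute_reward(on_sand=False)  # car is on the road
last_reward = compute_reward(on_sand=True)   # car has just driven onto sand
```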

06:15.480 --> 06:18.120
And then we initialize the scores,

06:18.120 --> 06:20.730
which is a vector that will contain the rewards,

06:20.730 --> 06:21.660
not all of them,

06:21.660 --> 06:24.900
but the rewards over a sliding window, so that you know,

06:24.900 --> 06:26.880
we can make a curve of the mean score

06:26.880 --> 06:29.400
of the rewards with respect to time.
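
NOTE
Editor's sketch of a sliding window of rewards and its mean score.
The window size of 1000 and the helper name record_reward are assumptions,
and a deque is one possible container; the course code may use a plain list.
```python
from collections import deque
WINDOW_SIZE = 1000  # assumed window length, not stated in the explanation
scores = deque(maxlen=WINDOW_SIZE)  # the oldest rewards drop out automatically
def record_reward(reward: float) -> float:
    """Store the latest reward and return the mean over the current window."""
    scores.append(reward)
    return sum(scores) / len(scores)
```
Calling record_reward at each time step gives the points of the mean-score curve.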

06:29.400 --> 06:32.700
All right, then in this code section, we initialize the map.

06:32.700 --> 06:36.150
So we initialize, for example, the sand variable.

06:36.150 --> 06:37.200
So that's important.

06:37.200 --> 06:40.170
The sand variable is actually going to be an array

06:40.170 --> 06:43.230
in which the cells will be the pixels of the map.

06:43.230 --> 06:44.490
And in each cell,

06:44.490 --> 06:47.190
we will have a 1 if there is some sand,

06:47.190 --> 06:49.500
and a 0 if there is no sand.

06:49.500 --> 06:52.170
At the beginning, we will not be drawing anything,

06:52.170 --> 06:53.940
so there will be no sand at all.

06:53.940 --> 06:55.320
And therefore all the cells

06:55.320 --> 06:57.780
of the sand array will have a 0,

06:57.780 --> 06:59.430
so there will be 0s everywhere.

06:59.430 --> 07:01.320
And as soon as we draw some sand,

07:01.320 --> 07:04.590
well the cells on which we draw the sand, will get a 1.

07:04.590 --> 07:07.830
And we initialize the arrays with all the 0s right here.

07:07.830 --> 07:10.320
sand equals np.zeros.
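
NOTE
Editor's sketch of the sand array described above: one cell per pixel, 0 for no sand, 1 for sand.
The map size and the names map_width and map_height are hypothetical,
and the patch written below stands in for whatever the user draws.
```python
import numpy as np
map_width, map_height = 800, 600  # hypothetical map size in pixels
sand = np.zeros((map_width, map_height))  # zeros everywhere: no sand at the beginning
sand[100:110, 200:210] = 1  # drawing sand sets the covered cells to 1
print(int(sand.sum()))  # number of pixels currently covered by sand
```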

07:10.320 --> 07:13.500
Then we have this important thing, which is the goal.

07:13.500 --> 07:15.660
So the goal is a point in the map

07:15.660 --> 07:17.610
which we will train the car to reach,

07:17.610 --> 07:19.830
so it's like a destination.

07:19.830 --> 07:21.540
So what is this goal going to be?

07:21.540 --> 07:25.020
Well, this is going to be the upper left corner of the map.

07:25.020 --> 07:26.610
So we'll train the car to go

07:26.610 --> 07:28.500
to the upper left corner of the map,

07:28.500 --> 07:31.650
and then once it reaches the upper left corner of the map,

07:31.650 --> 07:32.610
then we will train it to go

07:32.610 --> 07:34.440
to the bottom right corner of the map.

07:34.440 --> 07:36.570
So we can imagine the following scenario.

07:36.570 --> 07:40.650
The upper left corner of the map is the Airport of a city,

07:40.650 --> 07:41.880
and the bottom right corner

07:41.880 --> 07:44.520
of the map is the downtown of the city.

07:44.520 --> 07:48.030
And we will train a taxi or Uber to do some round trips

07:48.030 --> 07:50.160
between the Airport and the downtown.

07:50.160 --> 07:50.993
And of course,

07:50.993 --> 07:52.890
we'll make the task difficult for this taxi

07:52.890 --> 07:55.680
by drawing some more and more difficult roads,

07:55.680 --> 07:58.620
and adding more and more obstacles on the street to see

07:58.620 --> 08:00.510
if the taxi can still manage

08:00.510 --> 08:02.970
to go from the Airport to downtown.

08:02.970 --> 08:04.170
So this is gonna be fun.

08:04.170 --> 08:05.820
And so that's why here

08:05.820 --> 08:09.030
I'm setting the coordinates of the first goal,

08:09.030 --> 08:10.260
that is the Airport,

08:10.260 --> 08:12.810
which is at the upper left of the screen.

08:12.810 --> 08:16.860
So the map will be like a square, like this.

08:16.860 --> 08:19.410
And the coordinates of the origin that is

08:19.410 --> 08:22.290
the coordinate 0, 0 is right here,

08:22.290 --> 08:25.320
and then largeur is this distance here.

08:25.320 --> 08:29.640
So the coordinates 20 and largeur - 20 will therefore

08:29.640 --> 08:30.480
be right here,

08:30.480 --> 08:32.760
the upper left corner of the map.

08:32.760 --> 08:35.790
And why did I choose 20 and not 0?

08:35.790 --> 08:36.623
Well,

08:36.623 --> 08:39.180
that's because we want to train the car not to

08:39.180 --> 08:40.560
rush into the walls, you know,

08:40.560 --> 08:42.750
we want to train it to avoid the walls as well.

08:42.750 --> 08:45.330
And therefore, the goal must not be 0 because we don't

08:45.330 --> 08:46.770
want the car to touch the wall.

08:46.770 --> 08:48.150
We want it to touch the goal,

08:48.150 --> 08:50.160
so we have to put it right here.
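
NOTE
Editor's sketch of the two goal positions, assuming the origin (0, 0) is the bottom left of the map.
The names MARGIN, map_width, map_height and corner_goals are hypothetical.
The downtown coordinates are an assumption made by symmetry with the airport.
```python
MARGIN = 20  # keeps each goal away from the walls
def corner_goals(map_width: int, map_height: int) -> tuple:
    """Return the (x, y) positions of the airport and downtown goals."""
    airport = (MARGIN, map_height - MARGIN)  # upper left corner of the map
    downtown = (map_width - MARGIN, MARGIN)  # bottom right corner (assumed)
    return airport, downtown
airport, downtown = corner_goals(800, 600)  # hypothetical map size
print(airport, downtown)
```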

08:50.160 --> 08:52.110
And then I'm just introducing the last

08:52.110 --> 08:53.340
distance variable,

08:53.340 --> 08:55.050
which just gives the current distance

08:55.050 --> 08:58.980
from the car to the goal and that I'm initializing to 0.
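
NOTE
Editor's sketch of the distance from the car to the goal.
A straight-line (Euclidean) distance is assumed here; the explanation only says "current distance".
The helper name distance_to_goal and the sample coordinates are hypothetical.
```python
import math
def distance_to_goal(car_x: float, car_y: float, goal_x: float, goal_y: float) -> float:
    """Return the straight-line distance between the car and the goal."""
    return math.hypot(car_x - goal_x, car_y - goal_y)
last_distance = 0.0  # initialized to 0, as in the explanation
last_distance = distance_to_goal(23, 584, 20, 580)  # sample positions only
print(last_distance)
```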

08:58.980 --> 09:02.370
All right, and now, time to make the car and the game.

09:02.370 --> 09:03.930
So we're gonna make two classes,

09:03.930 --> 09:06.840
one class for the car and one class for the game.

09:06.840 --> 09:08.010
And inside these classes,

09:08.010 --> 09:11.160
we will already make some connections with our AI.

09:11.160 --> 09:13.200
So we'll do that in the next tutorial.

09:13.200 --> 09:15.213
And until then, enjoy AI.