WEBVTT

00:00.090 --> 00:00.923
Instructor: Hello, my friends,

00:00.923 --> 00:03.870
and welcome to this new data science use case with ChatGPT,

00:03.870 --> 00:07.560
which this time will be on reinforcement learning.

00:07.560 --> 00:09.420
So we're gonna tell ChatGPT right now

00:09.420 --> 00:13.260
that we have programmed a virtual self-driving car

00:13.260 --> 00:14.730
without any AI yet.

00:14.730 --> 00:19.020
So we have just implemented its ability to move forward,

00:19.020 --> 00:21.300
turn left, turn right, and stop.

00:21.300 --> 00:25.050
And we're gonna say that this self-driving car is rewarded

00:25.050 --> 00:27.540
if it manages to reach a certain destination,

00:27.540 --> 00:30.420
and penalized if it goes away from this destination,

00:30.420 --> 00:33.360
or if it runs into some obstacles.

00:33.360 --> 00:35.460
Okay, and then we're gonna ask ChatGPT

00:35.460 --> 00:37.680
which AI model should be implemented

00:37.680 --> 00:40.020
for the self-driving car and how.

00:40.020 --> 00:40.853
Okay?

00:40.853 --> 00:41.686
So let's do this.

00:41.686 --> 00:43.530
I'm gonna open a new chat here.

00:43.530 --> 00:45.637
And inside we're gonna say exactly,

00:45.637 --> 00:49.110
"Hey, I have programmed

00:49.110 --> 00:53.550
a virtual self-driving car

00:53.550 --> 00:57.000
with the actions being

00:57.000 --> 00:59.820
to move forward,

00:59.820 --> 01:01.320
turn left,

01:01.320 --> 01:03.030
turn right,

01:03.030 --> 01:04.710
and stop.

01:04.710 --> 01:08.370
And then this car is rewarded

01:08.370 --> 01:13.370
once it reaches a certain destination,

01:13.890 --> 01:16.080
and penalized

01:16.080 --> 01:19.980
if it goes further away

01:19.980 --> 01:23.070
from this destination,

01:23.070 --> 01:28.070
or if it runs into some obstacles."

01:28.650 --> 01:29.760
Okay, now, let's be greedier.

01:29.760 --> 01:31.530
I just wanted to ask for some recommendations

01:31.530 --> 01:34.110
on how to build a reinforcement learning model

01:34.110 --> 01:35.490
for the self-driving car,

01:35.490 --> 01:38.700
but let's actually ask to directly build it.

01:38.700 --> 01:39.533
Why not?

01:39.533 --> 01:41.730
Let's see what ChatGPT is capable of.

01:41.730 --> 01:44.650
So I'm gonna ask, "Can you please

01:45.750 --> 01:49.230
write me a Python code

01:49.230 --> 01:53.580
using the best libraries

01:53.580 --> 01:58.580
to build a reinforcement learning model

01:59.340 --> 02:01.770
that implements

02:01.770 --> 02:06.750
the AI inside that self-driving car?"

02:08.010 --> 02:09.870
All right, so it's a bit greedy to ask this,

02:09.870 --> 02:11.160
but we never know.

02:11.160 --> 02:14.010
ChatGPT might give us exactly what we want.

02:14.010 --> 02:16.500
So let's try and if it runs into any trouble,

02:16.500 --> 02:17.340
we will help it.

02:17.340 --> 02:20.670
Okay, so let's press Enter and here we go.

02:20.670 --> 02:22.477
"Sure." Okay, that's a great start again.

02:22.477 --> 02:24.180
"I can help you get started with implementing

02:24.180 --> 02:25.230
a reinforcement learning model

02:25.230 --> 02:26.700
for your self-driving car in Python."

02:26.700 --> 02:27.533
Amazing.

02:27.533 --> 02:29.730
"Here is some code that demonstrates how you can use

02:29.730 --> 02:30.960
the OpenAI Gym library."

02:30.960 --> 02:31.793
Awesome.

02:31.793 --> 02:33.180
"To train a reinforcement learning model

02:33.180 --> 02:34.920
for your self-driving car."

02:34.920 --> 02:36.330
Okay, so that's amazing actually,

02:36.330 --> 02:40.050
I didn't expect ChatGPT to do it so easily,

02:40.050 --> 02:43.230
and what it's doing now makes complete sense.

02:43.230 --> 02:45.600
It's initializing the Q-table,

02:45.600 --> 02:47.670
the number of states, number of actions,

02:47.670 --> 02:49.920
a learning rate initialized to 0.1,

02:49.920 --> 02:51.660
a discount factor to 0.95.

02:51.660 --> 02:54.270
This is really amazing, I'm in awe.

02:54.270 --> 02:55.743
The exploration rate to 0.5,

02:56.610 --> 02:58.950
the maximum number of episodes to 1,000,

02:58.950 --> 03:01.950
maximum number of steps per episode to 100.

03:01.950 --> 03:04.770
Then it trains the Q-learning model, amazing.

03:04.770 --> 03:07.140
I'm gonna challenge it then to implement

03:07.140 --> 03:09.810
a more advanced model like the deep Q-learning model,

03:09.810 --> 03:12.720
or A3C, but let's see.

03:12.720 --> 03:16.380
It implements everything from scratch with a for loop.

03:16.380 --> 03:18.030
It starts by choosing an action,

03:18.030 --> 03:21.690
then it implements the step where the AI takes the action,

03:21.690 --> 03:25.410
then it updates the Q-table and then it sets the state

03:25.410 --> 03:26.243
to the next state.

03:26.243 --> 03:28.260
This is really the reinforcement learning process,

03:28.260 --> 03:30.510
the Q-learning process, this is amazing.

03:30.510 --> 03:33.450
Then it even implements how to test the Q-learning model,

03:33.450 --> 03:36.960
and then finally it closes the environment.
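
NOTE
Editor's sketch of the Q-learning loop described above: choose an action, take the step, update the Q-table, move to the next state.
It is not the code ChatGPT produced in the video. The grid world, the obstacle positions, the reward values and every function name are hypothetical stand-ins.
Only the learning rate (0.1), discount factor (0.95), exploration rate (0.5), 1,000 episodes and 100 steps per episode come from the narration.
```python
import random
# Hypothetical toy world (not from the video): a 4x4 grid, car state = (x, y, heading).
SIZE, GOAL, OBSTACLES = 4, (3, 3), {(1, 1), (2, 2)}
MOVES = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # heading 0..3 as (dx, dy)
ACTIONS = ["forward", "left", "right", "stop"]
def encode(x, y, h):
    """Map (x, y, heading) to a single Q-table row index."""
    return (x * SIZE + y) * 4 + h
def distance(x, y):
    """Manhattan distance to the destination."""
    return abs(GOAL[0] - x) + abs(GOAL[1] - y)
def step(x, y, h, action):
    """Apply one action; return (x, y, heading, reward, done)."""
    if action == 1:
        return x, y, (h - 1) % 4, 0.0, False  # turn left
    if action == 2:
        return x, y, (h + 1) % 4, 0.0, False  # turn right
    if action == 3:
        return x, y, h, 0.0, False            # stop
    nx, ny = x + MOVES[h][0], y + MOVES[h][1]
    if not (0 <= nx < SIZE and 0 <= ny < SIZE) or (nx, ny) in OBSTACLES:
        return x, y, h, -5.0, False   # hit a wall or obstacle: penalized, stay put
    if (nx, ny) == GOAL:
        return nx, ny, h, 10.0, True  # reached the destination: rewarded
    reward = -1.0 if distance(nx, ny) > distance(x, y) else 0.0  # penalize moving away
    return nx, ny, h, reward, False
def train(episodes=1000, max_steps=100, alpha=0.1, gamma=0.95, epsilon=0.5, seed=0):
    """Tabular Q-learning with the hyperparameters read out in the video."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(SIZE * SIZE * 4)]
    for _ in range(episodes):
        x, y, h = 0, 0, 0
        for _ in range(max_steps):
            s = encode(x, y, h)
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            x, y, h, r, done = step(x, y, h, a)
            s2 = encode(x, y, h)
            # Q-learning update: move Q(s, a) toward r + gamma * max over a' of Q(s', a').
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
    return q
q_table = train()
```
Reading the best action for a state is then a lookup: take the index of the largest value in q_table[encode(x, y, h)].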

03:36.960 --> 03:39.780
Wow, that was really amazing.

03:39.780 --> 03:41.430
Okay, I didn't expect that much.

03:41.430 --> 03:43.080
So since it did it so easily,

03:43.080 --> 03:46.470
let's actually challenge it even more, even harder.

03:46.470 --> 03:48.247
So first, let's say, of course,

03:48.247 --> 03:52.593
"Thanks a lot, that was really helpful.

03:54.690 --> 03:56.040
However,

03:56.040 --> 03:59.700
I do know already

03:59.700 --> 04:04.260
how to implement Q-learning.

04:04.260 --> 04:06.210
So could you

04:06.210 --> 04:09.030
please do the same

04:09.030 --> 04:11.280
with a more

04:11.280 --> 04:16.280
advanced reinforcement learning model like, for example,

04:18.720 --> 04:21.990
deep Q-learning."

04:21.990 --> 04:25.297
Or even better, let's be super greedy.

04:25.297 --> 04:30.297
"A state-of-the-art reinforcement learning model."

04:32.340 --> 04:33.420
Okay, ready for this?

04:33.420 --> 04:34.980
Here we go.

04:34.980 --> 04:36.420
And, "Certainly."

04:36.420 --> 04:37.800
He always says, "Certainly."

04:37.800 --> 04:38.633
That's amazing.

04:38.633 --> 04:39.870
"Here is some code that demonstrates

04:39.870 --> 04:42.180
how you can use the OpenAI baselines library

04:42.180 --> 04:44.760
to train a deep Q-learning model, DQN,

04:44.760 --> 04:45.900
for your self-driving car."

04:45.900 --> 04:47.310
And here we go again.

04:47.310 --> 04:49.140
It gives us the code.

04:49.140 --> 04:50.820
That's really amazing.

04:50.820 --> 04:55.820
All right, so yes, it now uses the wrap_deepmind function

04:57.300 --> 05:01.260
from the Baselines common Atari wrappers module,

05:01.260 --> 05:03.270
and it implements the AI again

05:03.270 --> 05:05.040
with the deep Q-learning model this time.

05:05.040 --> 05:07.117
That's really, really amazing.

05:07.117 --> 05:09.450
"This code uses the DQN algorithm

05:09.450 --> 05:12.120
to train a deep Q-learning model for your self-driving car.

05:12.120 --> 05:15.450
The model will learn to take the best action at each state

05:15.450 --> 05:17.430
in order to maximize the reward.

05:17.430 --> 05:20.460
Alternatively, if you would like to use a more

05:20.460 --> 05:21.930
advanced reinforcement learning model,

05:21.930 --> 05:24.450
you can use the OpenAI Spinning Up library,

05:24.450 --> 05:26.790
which provides implementations of state-of-the-art

05:26.790 --> 05:28.380
reinforcement learning algorithms.

05:28.380 --> 05:30.960
Here is some code that demonstrates how you can use

05:30.960 --> 05:32.250
the Spinning Up library to train

05:32.250 --> 05:34.950
a proximal policy optimization (PPO) model."

05:34.950 --> 05:36.097
Indeed, that's a great model.

05:36.097 --> 05:37.290
"For your self-driving car."

05:37.290 --> 05:39.690
And here you go, it gives you the model again

05:39.690 --> 05:40.950
with the whole code.

05:40.950 --> 05:44.430
Wow, what a revolution, this ChatGPT.

05:44.430 --> 05:47.220
Seriously, I'm really impressed.

05:47.220 --> 05:49.357
Right, and there you go, you have the code.

05:49.357 --> 05:51.210
"This code uses the PPO algorithm

05:51.210 --> 05:52.530
to train a reinforcement learning model

05:52.530 --> 05:53.940
for your self-driving car.

05:53.940 --> 05:56.370
The model will learn to take actions to maximize

05:56.370 --> 05:58.770
the expected sum of discounted rewards.

05:58.770 --> 05:59.820
I hope this helps.

05:59.820 --> 06:01.050
Let me know if you have any questions,

06:01.050 --> 06:02.820
or if you would like further assistance."
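
NOTE
Editor's sketch of the "expected sum of discounted rewards" mentioned in the quoted answer: the quantity computed for a single episode.
The function name and the example rewards are hypothetical. The default discount factor of 0.95 is borrowed from the earlier Q-learning settings and is only an assumption for PPO.
```python
def discounted_return(rewards, gamma=0.95):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    # Work backwards so each step adds its reward plus the discounted future.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
# Hypothetical episode: two neutral steps, one obstacle penalty, then the destination reward.
episode = [0.0, 0.0, -5.0, 10.0]
episode_return = discounted_return(episode)
```
A policy-gradient method such as PPO tries to make episodes with a high value of this sum more likely, averaged over many episodes.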

06:02.820 --> 06:05.370
Well, even I didn't expect so much,

06:05.370 --> 06:07.560
so I'm just impressed just like you,

06:07.560 --> 06:09.450
at the same time, doing all this.

06:09.450 --> 06:14.450
So we're gonna say, "A big thank you, that was amazing."

06:16.470 --> 06:17.820
And done.

06:17.820 --> 06:20.220
Thanks again for watching this data science use case

06:20.220 --> 06:21.150
with ChatGPT.

06:21.150 --> 06:22.830
Let's now move on to the next one.

06:22.830 --> 06:25.113
And until then, enjoy machine learning.