WEBVTT

00:02.310 --> 00:04.620
Luka: Hello, everyone, and welcome back.

00:04.620 --> 00:05.453
In this video,

00:05.453 --> 00:07.710
we are going to talk about ChatGPT

00:07.710 --> 00:11.370
and how to use it to implement the same algorithm for Doom

00:11.370 --> 00:13.350
that we are going to use in the course,

00:13.350 --> 00:16.290
convolutional deep Q-learning.

00:16.290 --> 00:17.790
So in the next couple of minutes,

00:17.790 --> 00:22.440
we are going to walk through the way that I actually prompted

00:22.440 --> 00:26.190
and asked ChatGPT to get our model.

00:26.190 --> 00:29.580
So let me help you understand how I did it

00:29.580 --> 00:33.720
and why I wrote certain prompts the way I did.

00:33.720 --> 00:38.460
So the initial prompt was pretty well-crafted from my end.

00:38.460 --> 00:39.293
I said:

00:39.293 --> 00:41.527
"You are a Senior Machine Learning Expert

00:41.527 --> 00:43.717
"with years of experience working on projects

00:43.717 --> 00:47.550
"applying machine learning to simulations and robotics."

00:47.550 --> 00:48.750
This is the first part,

00:48.750 --> 00:51.330
setting the stage, giving it a role.

00:51.330 --> 00:55.410
So basically, giving it a role steers ChatGPT's answers

00:55.410 --> 00:59.310
more toward machine learning topics.

00:59.310 --> 01:04.310
Now I want to bring into my context, Doom.

01:04.500 --> 01:06.600
And actually, how to reach that point,

01:06.600 --> 01:08.887
I said, "Okay, I hired you to help me

01:08.887 --> 01:12.457
"with implementing a Convolutional Deep Q-Learning algorithm

01:12.457 --> 01:14.407
"for the Doom environment.

01:14.407 --> 01:15.337
"The goal is to have

01:15.337 --> 01:18.457
"a fully-functional reinforcement learning algorithm

01:18.457 --> 01:22.327
"that can play Doom, a game environment.

01:22.327 --> 01:24.187
"I have implemented classes

01:24.187 --> 01:27.903
"for experience_replay and memory and image processing.

01:28.777 --> 01:30.337
"I will provide you that

01:30.337 --> 01:34.800
"so you can incorporate that into the final code."

01:34.800 --> 01:37.893
This is really important, and most people miss it.

01:38.790 --> 01:42.120
You need to have some way of telling him,

01:42.120 --> 01:43.860
or telling ChatGPT,

01:43.860 --> 01:45.720
that you already have something

01:45.720 --> 01:50.720
and you want to use that code in the final implementation.

01:50.910 --> 01:54.390
So if I didn't provide this as context,

01:54.390 --> 01:57.240
it would generate some code,

01:57.240 --> 02:01.140
some implementation of convolutional deep Q-learning

02:01.140 --> 02:04.080
that would not consider using experience_replay

02:04.080 --> 02:06.330
and our image processing function,

02:06.330 --> 02:07.323
but we want that.

02:08.250 --> 02:10.537
So, then I continued by saying:

02:10.537 --> 02:14.107
"Your goal is to implement the full solution using Python,

02:14.107 --> 02:16.717
"Torch for model,

02:16.717 --> 02:21.187
"and optimize the code to be executed in Google Colab

02:21.187 --> 02:23.167
"so that students going through the code

02:23.167 --> 02:28.167
"can execute those with not so strong computers."

02:28.260 --> 02:33.060
This is basically telling ChatGPT

02:33.060 --> 02:35.190
the format of the final solution

02:35.190 --> 02:37.200
that you'd like to have.

02:37.200 --> 02:40.050
And then, finally, "ask clarifying questions if needed"

02:40.050 --> 02:42.060
lets it gather more context.

02:42.060 --> 02:44.197
And, "Eh, if you don't know,

02:44.197 --> 02:47.580
"if you don't have enough context, please ask me."

02:47.580 --> 02:49.737
And finally, I asked, "Got it?"

02:50.640 --> 02:53.730
I wanted to confirm that it got it.

02:53.730 --> 02:58.730
This is a clever way of allowing it to reflect on this

03:00.210 --> 03:05.210
and not go straight to generating the next answers.

03:05.820 --> 03:07.230
If you don't do this,

03:07.230 --> 03:09.090
if you don't ask "Got it?" or "Clear?"

03:09.090 --> 03:12.780
or "Do you understand?" or something like that,

03:12.780 --> 03:16.020
it will completely ignore some of the parts

03:16.020 --> 03:18.240
and start generating.

03:18.240 --> 03:20.760
Since there is a limit on how many tokens

03:20.760 --> 03:22.980
you have in your questions and in the answer,

03:22.980 --> 03:26.310
you want to break the exchange up so you get a refresh

03:26.310 --> 03:28.610
on the amount of tokens that it can generate.
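
NOTE
A minimal sketch of the prompt structure described above: role, task, existing code, output format, permission to ask questions, and a closing confirmation. The function name build_initial_prompt and its parameters are hypothetical, and the wording is paraphrased from the video rather than copied from the actual prompt.
```python
def build_initial_prompt(role, task, existing_code, output_format):
    """Join the prompt parts in the order used in the video."""
    parts = [
        role,                                   # sets the stage by giving the model a role
        task,                                   # what we want implemented, and for which environment
        existing_code,                          # tells the model which code it must reuse
        output_format,                          # language, library and where the code should run
        "Ask clarifying questions if needed.",  # lets the model request missing context
        "Got it?",                              # asks for a confirmation before any code is generated
    ]
    return "\n".join(parts)
prompt = build_initial_prompt(
    role="You are a Senior Machine Learning Expert.",
    task="Help me implement a Convolutional Deep Q-Learning algorithm for the Doom environment.",
    existing_code="I have classes for experience replay, memory and image processing; I will provide them.",
    output_format="Use Python and PyTorch, and make the code run in Google Colab.",
)
print(prompt)
```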

03:29.820 --> 03:32.377
So the first answer is:

03:32.377 --> 03:33.607
"I understood the task.

03:33.607 --> 03:36.697
"I'm going to do exactly what you asked.

03:36.697 --> 03:39.600
"But before I proceed, I have a few questions."

03:39.600 --> 03:43.980
And this is where this "if you have any questions, ask me" paid off:

03:43.980 --> 03:45.750
it asked me pretty good questions.

03:45.750 --> 03:49.200
First off, do you have a custom environment,

03:49.200 --> 03:51.240
or do you use something that is already there?

03:51.240 --> 03:53.883
So OpenAI Gym, ViZDoom, and stuff like that.

03:54.750 --> 03:56.850
Then we move to more like,

03:56.850 --> 03:58.950
are there any specific performance metrics

03:58.950 --> 04:00.783
that you'd like me to follow?

04:01.800 --> 04:05.850
Are there requirements on the model itself?

04:05.850 --> 04:08.223
What versions of Python and PyTorch?

04:09.180 --> 04:10.140
And finally,

04:10.140 --> 04:12.990
are there any specific constraints on the deadline?

04:12.990 --> 04:17.160
This probably comes from business plans that it was trained on,

04:17.160 --> 04:19.803
but for it, it doesn't really matter,

04:20.970 --> 04:23.283
like, this is not relevant for it, but okay.

04:24.150 --> 04:25.177
I provided:

04:25.177 --> 04:30.177
"Yes, I will use OpenAI Gym, and here is how to load it."

04:30.480 --> 04:31.713
This is from the course.

04:33.000 --> 04:35.133
I said, okay, follow average reward,

04:36.030 --> 04:40.953
no specific requirements, be creative as long as it works,

04:42.120 --> 04:46.320
use the latest versions of the libraries,

04:46.320 --> 04:48.090
and for the deadline,

04:48.090 --> 04:49.797
I just wrote "ASAP".
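
NOTE
The average reward mentioned above can be tracked as a moving average over recent episodes. A minimal sketch using only the standard library; the class name AverageReward and the window size are assumptions for illustration, not the course code or what ChatGPT generated.
```python
from collections import deque
class AverageReward:
    """Moving average of the most recent episode rewards."""
    def __init__(self, window=100):
        self.rewards = deque(maxlen=window)  # the oldest entries drop out automatically
    def add(self, reward):
        self.rewards.append(reward)
    def value(self):
        # Return 0.0 before any reward has been recorded.
        return sum(self.rewards) / len(self.rewards) if self.rewards else 0.0
tracker = AverageReward(window=3)
for episode_reward in [1.0, 2.0, 3.0, 4.0]:
    tracker.add(episode_reward)
print(tracker.value())  # average of the last three rewards: 2.0, 3.0, 4.0
```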

04:51.150 --> 04:54.120
Additionally, here are these classes I mentioned.

04:54.120 --> 04:57.000
So I just pasted the classes that we have in the course.

04:57.000 --> 05:00.720
So experience replay, everything,

05:00.720 --> 05:04.773
and finally, replay memory, image processing,

05:05.790 --> 05:06.623
and that's it.

05:07.681 --> 05:10.230
And with this, I built the context,

05:10.230 --> 05:13.530
and then it started generating.

05:13.530 --> 05:15.753
So it generated the code.

05:16.590 --> 05:21.363
It basically started by pre-processing the image,

05:22.560 --> 05:25.110
then created the deep Q-network,

05:25.110 --> 05:27.513
which is pretty similar to what we have.

05:29.070 --> 05:31.983
Then, it initialized that,

05:32.910 --> 05:34.863
and then wrote the training code.

05:36.000 --> 05:40.350
Everything so far worked, and then the output cuts off here.

05:40.350 --> 05:43.950
If you want it to continue generating something longer,

05:43.950 --> 05:46.110
just write "continue", as I did.

05:46.110 --> 05:49.530
And it said, "Continue? Yeah, certainly."

05:49.530 --> 05:51.840
It continues generating,

05:51.840 --> 05:55.020
and bam, we have the full code.

05:55.020 --> 05:55.853
Amazing.

05:57.060 --> 06:00.750
It stated how to use it, frequency,

06:00.750 --> 06:02.793
what to change, and stuff like that.

06:03.720 --> 06:05.940
It helped me with this as well.

06:05.940 --> 06:09.630
And let me bring you to my Google Colab.

06:09.630 --> 06:13.053
So I started by pasting the code here,

06:14.040 --> 06:16.290
pasted our pre-processing function.

06:16.290 --> 06:18.720
I'll jump to changes in this pre-processing function

06:18.720 --> 06:19.553
in a second.

06:20.970 --> 06:24.843
Then, I pasted this experience replay,

06:25.740 --> 06:29.163
and some of the installation steps from my end.

06:30.150 --> 06:31.833
We'll explain this in a second.

06:32.790 --> 06:37.790
I played with its deep Q-network and training code.

06:38.430 --> 06:41.640
So there are definitely ways to improve this.

06:41.640 --> 06:45.810
Especially if you have a local environment, it will work.

06:45.810 --> 06:49.440
If you are using this in Google Colab, it might break.

06:49.440 --> 06:50.840
So let me bring you to that.

06:51.750 --> 06:54.240
So I did it, and I got an error.

06:54.240 --> 06:58.110
I said, "Hey, the initial code that you provided did not work."

06:58.110 --> 07:03.110
I just pasted the code and this error message,

07:03.810 --> 07:06.600
saying, "I can't resize an image."

07:06.600 --> 07:10.440
And if you take a look at our code from the course,

07:10.440 --> 07:13.200
it needs to resize the image with this.

07:13.200 --> 07:17.310
But newer versions of SciPy have actually removed this.

07:17.310 --> 07:18.663
So it said, "Oh, yeah,

07:19.537 --> 07:21.637
"it appears that this may have been removed.

07:21.637 --> 07:25.200
"Here is the new version with Pillow."

07:25.200 --> 07:27.360
And actually, it didn't require that.

07:27.360 --> 07:32.260
It basically rewrote the whole image pre-processing class

07:33.270 --> 07:36.723
with Pillow instead of SciPy.

07:37.620 --> 07:41.283
So I used it, and it's working actually.
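
NOTE
A minimal sketch of a Pillow-based resize step of the kind described above, replacing the removed scipy.misc.imresize. It assumes frames arrive as uint8 RGB NumPy arrays; the function name preprocess_frame and the 80x80 target size are illustrative assumptions, not the code ChatGPT produced.
```python
import numpy as np
from PIL import Image
def preprocess_frame(frame, width=80, height=80):
    """Convert an RGB uint8 frame to a grayscale float32 array scaled to [0, 1]."""
    image = Image.fromarray(frame)  # wrap the NumPy array as a Pillow image
    image = image.convert("L")      # "L" is Pillow's 8-bit grayscale mode
    image = image.resize((width, height), Image.BILINEAR)  # Pillow expects (width, height)
    return np.asarray(image, dtype=np.float32) / 255.0
frame = np.full((240, 320, 3), 255, dtype=np.uint8)  # dummy all-white frame for the demo
out = preprocess_frame(frame)
print(out.shape, out.dtype)
```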

07:42.990 --> 07:44.227
And then I said, "Okay,

07:44.227 --> 07:47.257
"I have a problem with Environment using Google Colab.

07:47.257 --> 07:50.160
"How to solve it? This is the error."

07:50.160 --> 07:52.087
And it said, "Oh, yeah, apologies.

07:52.087 --> 07:55.657
"It seems that this particular environment is not working.

07:55.657 --> 07:56.917
"Here is what you do.

07:56.917 --> 07:59.880
"You basically install and do this."

07:59.880 --> 08:02.580
I got an error, another error.

08:02.580 --> 08:04.350
And this error actually indicated

08:04.350 --> 08:09.350
that rendering is not found in this particular version.

08:11.850 --> 08:14.280
And it said, "Okay, install this version."

08:14.280 --> 08:17.040
I did. Another error.

08:17.040 --> 08:18.457
And it said, "Oh, yeah, yeah,

08:18.457 --> 08:23.457
"Google Colab does not support rendering, OpenGL libraries,"

08:24.270 --> 08:26.760
which means, for us, that we need to use,

08:26.760 --> 08:27.870
instead of Google Colab,

08:27.870 --> 08:31.350
something else for visualization purposes.

08:31.350 --> 08:34.500
And it basically gave me a step-by-step guide

08:34.500 --> 08:37.440
on what to do, how to change that,

08:37.440 --> 08:38.820
and stuff like that.

08:38.820 --> 08:42.450
And if you do this locally, you will solve the problem.

08:42.450 --> 08:45.120
But this is basically now teaching you

08:45.120 --> 08:47.200
how to use ChatGPT properly.

08:51.720 --> 08:55.980
Like, right here, you saw that I got some errors,

08:55.980 --> 08:58.830
I prompted with those errors, and so on.

08:58.830 --> 08:59.880
And now, you have,

08:59.880 --> 09:01.350
like, if you scroll back,

09:01.350 --> 09:05.220
you have fully functioning code with a deep Q-network,

09:05.220 --> 09:06.630
with a training code,

09:06.630 --> 09:09.300
and you can go and ask it to generate a test code,

09:09.300 --> 09:11.280
and it will.

09:11.280 --> 09:12.183
Amazing, right?

09:13.080 --> 09:14.130
So that's it.

09:14.130 --> 09:19.130
This is how to use ChatGPT to implement the deep Q,

09:20.190 --> 09:22.590
or convolutional deep Q-network, for Doom.

09:22.590 --> 09:24.453
Thanks. Enjoy the course!
