WEBVTT

00:02.520 --> 00:04.620
Instructor: Hello everyone and welcome back.

00:04.620 --> 00:07.920
In this video we'll talk about how to use ChatGPT

00:07.920 --> 00:11.640
to solve "Breakout,"

00:11.640 --> 00:15.540
using ChatGPT to implement A3C.

00:15.540 --> 00:19.470
So I already prompted it with a lot of crazy prompts,

00:19.470 --> 00:23.010
we got amazing results, we'll see them in a second.

00:23.010 --> 00:25.170
So this video will be basically walking view

00:25.170 --> 00:26.640
through my thought process

00:26.640 --> 00:28.740
on how I actually got the results,

00:28.740 --> 00:31.470
and how you can actually use ChatGPT

00:31.470 --> 00:34.890
to create A3C algorithm as well.

00:34.890 --> 00:39.360
Okay, so I started similarly to what we had for "DOOM,"

00:39.360 --> 00:41.340
I stated, okay, you are basically

00:41.340 --> 00:42.930
a machine learning expert

00:42.930 --> 00:46.110
with years of experience in robotics.

00:46.110 --> 00:50.490
I hire you to do A3C algorithm for "Breakout."

00:50.490 --> 00:52.110
The goal is to have

00:52.110 --> 00:56.100
a fully functional SOTA, A3C algorithm

00:56.100 --> 00:58.401
that can play the "Breakout" game.

00:58.401 --> 01:00.150
I have implemented classes,

01:00.150 --> 01:02.520
the testing code environment,

01:02.520 --> 01:05.040
and custom versions of Adam optimizer

01:05.040 --> 01:07.590
that is used for as a shared-weights optimizer

01:07.590 --> 01:09.361
for the A3C model.

01:09.361 --> 01:13.076
So this is giving it more context, what I did,

01:13.076 --> 01:18.000
I will provide these three things to you.

01:18.000 --> 01:20.160
So as I reference before generating

01:20.160 --> 01:21.900
the rest of the code,

01:21.900 --> 01:23.250
your goal is to implement

01:23.250 --> 01:25.650
the full solution using Python, Torch,

01:25.650 --> 01:27.690
similarly to what we had,

01:27.690 --> 01:31.680
and optimize the code to be executed in Google Colab,

01:31.680 --> 01:33.540
same what we had before.

01:33.540 --> 01:35.850
Additionally, comment every single line

01:35.850 --> 01:38.590
of the code so the students can understand

01:40.710 --> 01:42.990
what you did in the code.

01:42.990 --> 01:46.500
Ask clarifying questions if needed, got it.

01:46.500 --> 01:48.360
So we have some overlapping

01:48.360 --> 01:53.130
what we did for conversational Deep Q for "DOOM,"

01:53.130 --> 01:54.600
but I added a lot of context

01:54.600 --> 01:57.390
that I already have things implemented,

01:57.390 --> 02:00.210
and I'd like them to be used.

02:00.210 --> 02:03.030
So that's what I start and it's confirmed.

02:03.030 --> 02:06.840
Yes, I understand and I'll ask any questions,

02:06.840 --> 02:09.510
please provide the rest of the code.

02:09.510 --> 02:11.753
Then I stated, this is Adam optimizer,

02:11.753 --> 02:15.180
pasted the code for Adam optimizer,

02:15.180 --> 02:16.920
for the resources we have,

02:16.920 --> 02:18.870
and I gave this instruction,

02:18.870 --> 02:21.963
I said, wait for the rest of the code before implementing,

02:23.130 --> 02:26.070
I just, before I got any good results,

02:26.070 --> 02:27.210
I actually just pasted

02:27.210 --> 02:30.510
and it immediately jumped to implementing A3C,

02:30.510 --> 02:32.310
without waiting for the rest.

02:32.310 --> 02:33.750
So I added this as a kind

02:33.750 --> 02:38.750
of systemic code, systemic prompt,

02:39.060 --> 02:40.560
and it said, okay,

02:40.560 --> 02:41.820
please provide the environment

02:41.820 --> 02:42.653
and test the code.

02:42.653 --> 02:43.650
So it's actually referring

02:43.650 --> 02:46.323
to what we gave in the first prompt.

02:47.160 --> 02:51.030
I said, cool, here is the implementation

02:51.030 --> 02:52.173
of Gym environment.

02:53.760 --> 02:56.010
I pasted that main code,

02:56.010 --> 02:57.128
I added context here,

02:57.128 --> 02:59.040
if you take a look,

02:59.040 --> 03:01.383
and this is the main file running everything.

03:03.690 --> 03:05.267
And then I said, okay, cool,

03:05.267 --> 03:06.813
wait for the testing code.

03:07.830 --> 03:08.766
And it said, yeah, I,

03:08.766 --> 03:11.880
however, I still need to see the testing code

03:11.880 --> 03:13.410
before I implement A3C,

03:13.410 --> 03:15.813
I said, cool, here is a testing code.

03:16.920 --> 03:19.170
And you see this is how if you have

03:19.170 --> 03:21.480
a bigger, bigger project,

03:21.480 --> 03:23.230
you can use that

03:24.330 --> 03:27.810
to kinda sway it to use

03:27.810 --> 03:30.870
that piece of a code or as a reference,

03:30.870 --> 03:33.630
in the style so you can actually generate something

03:33.630 --> 03:34.983
in your style of code.

03:36.360 --> 03:38.370
Cool, and when I done that,

03:38.370 --> 03:41.190
it said, great, go for it.

03:41.190 --> 03:45.840
And it said basically I'm going to use this main code,

03:45.840 --> 03:49.200
and testing, and implement this.

03:49.200 --> 03:52.800
And then it said, created model.py,

03:52.800 --> 03:55.830
which actually we did in the course as well.

03:55.830 --> 03:58.620
And he said, here is the ActorCritic model,

03:58.620 --> 04:00.630
and I compare to ours.

04:00.630 --> 04:02.310
This is the newer version of python,

04:02.310 --> 04:05.640
so much of the weights initialization

04:05.640 --> 04:09.060
and stuff like that can be done in a more elegant way.

04:09.060 --> 04:13.680
So it did it and it actually uses pretty well,

04:13.680 --> 04:15.663
then generated train code,

04:16.500 --> 04:19.860
and basically ActorCritic.

04:19.860 --> 04:22.020
And if you take a look,

04:22.020 --> 04:23.039
it's a pretty simplistic,

04:23.039 --> 04:25.980
so it definitely needs to be improved.

04:25.980 --> 04:27.423
So what you can do,

04:28.500 --> 04:31.560
you can say, okay, I said continue,

04:31.560 --> 04:35.373
and it continues writing it down.

04:37.920 --> 04:41.040
It created this

04:41.040 --> 04:45.030
as a kind of a loss,

04:45.030 --> 04:46.980
but there is, yeah,

04:46.980 --> 04:47.910
basically everything

04:47.910 --> 04:51.333
that we needed, it optimized everything,

04:53.160 --> 04:56.550
and you can see that optimizer.stop at the end.

04:56.550 --> 04:58.080
So it basically used everything

04:58.080 --> 04:59.220
that we defined so far,

04:59.220 --> 05:02.490
as well as the proper loss,

05:02.490 --> 05:05.730
and yeah, optimizer, everything.

05:05.730 --> 05:07.591
So that's it.

05:07.591 --> 05:10.530
In a couple of simple steps,

05:10.530 --> 05:13.170
we managed to implement the whole code

05:13.170 --> 05:15.270
for one of the state-of-the-art models

05:15.270 --> 05:18.450
that happened to be like one of the best

05:18.450 --> 05:21.060
in 2017, 2018.

05:21.060 --> 05:24.900
To this day, it actually hold a lot

05:24.900 --> 05:27.240
of capacity to solve complex environments

05:27.240 --> 05:28.710
such as "Breakout."

05:28.710 --> 05:32.490
So you saw, now that we have ChatGPT for example,

05:32.490 --> 05:35.250
you can guide it to implement certain

05:35.250 --> 05:36.990
certain parts of the code.

05:36.990 --> 05:39.450
And if you don't know what certain parts

05:39.450 --> 05:41.430
of the code actually does,

05:41.430 --> 05:42.450
you can go and ask it,

05:42.450 --> 05:44.853
for example, cool,

05:47.310 --> 05:51.840
can you explain training part

05:51.840 --> 05:56.200
and where is the custom

05:58.320 --> 05:59.223
of Adam?

06:04.740 --> 06:08.430
So I'm prompting it to actually explain itself,

06:08.430 --> 06:10.290
so that's another way

06:10.290 --> 06:12.570
to basically prompt it out

06:12.570 --> 06:16.803
to give you more reasoning behind its generation.

06:17.730 --> 06:18.563
And you can see

06:18.563 --> 06:21.213
that it's actually going to explain step by step.

06:22.920 --> 06:25.890
So yeah, we can wait for this,

06:25.890 --> 06:27.510
you can do the same prompts

06:27.510 --> 06:30.030
and you will get really similar results to mine,

06:30.030 --> 06:31.950
but generally now what it's going to do

06:31.950 --> 06:34.050
it's going to take the train function

06:34.050 --> 06:38.010
that it defined right at the top right here,

06:38.010 --> 06:40.890
and it'll basically go through every single step

06:40.890 --> 06:43.803
and implement it.

06:44.700 --> 06:46.053
So yeah, that would be it.

06:47.580 --> 06:48.600
I definitely encourage you

06:48.600 --> 06:52.863
to try out this code in Google Colab,

06:53.730 --> 06:54.780
It can be done,

06:54.780 --> 06:58.380
It'll be a bit too tricky to make it work,

06:58.380 --> 06:59.760
because of the visualization

06:59.760 --> 07:02.430
and limitations of the Goggle Colab environment.

07:02.430 --> 07:03.840
However, you can definitely go

07:03.840 --> 07:07.083
and paste the errors here, and you can,

07:08.580 --> 07:11.310
you can go and get some pretty decent results

07:11.310 --> 07:13.890
for that to debug the process.

07:13.890 --> 07:16.860
And yeah, with that I'll let you be,

07:16.860 --> 07:20.400
and now you have the way to use ChatGPT

07:20.400 --> 07:24.960
on how to get value for A3C.

07:24.960 --> 07:27.690
And yeah, hope you enjoyed

07:27.690 --> 07:29.973
the course so far, bye.
