WEBVTT

00:04.120 --> 00:07.640
We are now at the stage where we are fully set up and can start coding.

00:07.920 --> 00:13.200
In this lecture, we will go over how to replicate what we just saw in the API Explorer using Python

00:13.200 --> 00:15.600
libraries to perform these API requests.

00:16.360 --> 00:21.720
Two final things I want to mention before we start. First, we will initially use only Python to

00:21.760 --> 00:22.760
extract the data.

00:22.800 --> 00:28.040
However, when we come to develop the whole ELT pipeline, we will need to make some changes to integrate

00:28.040 --> 00:30.120
Docker and Airflow into the code.

00:30.160 --> 00:33.000
Secondly, we will be making requests to the YouTube API.

00:33.640 --> 00:37.240
Each request has a cost which depends on the resource and method.

00:37.400 --> 00:41.120
We will only use the list method, which, at the time of this recording,

00:41.120 --> 00:46.080
costs one unit for most resources, and the daily limit is 10,000 units.

00:46.600 --> 00:49.840
And here you can see an example for a resource we have already used:

00:49.880 --> 00:53.920
videos.list incurs only one unit.

00:54.000 --> 00:55.200
This is important to know.

00:55.200 --> 01:00.770
So if you are testing and experimenting with the API, be aware that if you run too many requests,

01:00.770 --> 01:05.570
you can reach the limit and won't be able to make any more API requests that day.
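
NOTE
A minimal sketch of the quota arithmetic, assuming the figures quoted in this lecture (1 unit per list call, a 10,000-unit daily limit). Quota costs can change, so check the current quota documentation; the helper name remaining_list_calls is hypothetical.
```python
DAILY_QUOTA_UNITS = 10_000  # daily limit quoted in this lecture
LIST_COST_UNITS = 1  # cost of one list call, as quoted in this lecture
def remaining_list_calls(units_used, daily_quota=DAILY_QUOTA_UNITS, cost=LIST_COST_UNITS):
    """Return how many more list calls fit in the remaining daily quota."""
    units_left = max(daily_quota - units_used, 0)
    return units_left // cost
print(remaining_list_calls(0))      # 10000 calls on a fresh day
print(remaining_list_calls(9_950))  # 50 calls left
```
Because a list call costs a single unit, the number of units left is also the number of list calls left, which is why a loop that fires requests carelessly can use up a day's quota quickly.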

01:05.930 --> 01:11.330
What you see here is the link to the quotas and limits, and it will be in the appendix of this section.

01:11.690 --> 01:14.370
Referring back to what we discussed in the last lecture,

01:14.410 --> 01:20.370
we said that the first hurdle, which in our analogy is also the biggest one, is to

01:20.370 --> 01:21.730
get the playlist ID.

01:22.290 --> 01:26.570
So we need to build a script that gets this ID for us.

01:27.210 --> 01:31.810
If we were to translate what we discussed in the last lecture into a Python script, we would first

01:31.850 --> 01:38.570
need a popular Python module called Requests, which lets you send HTTP requests very easily.

01:39.210 --> 01:43.050
We start by installing the requests module dependency using pip.

01:43.410 --> 01:47.330
You can do this by first making sure you are in your virtual environment.

01:47.490 --> 01:53.890
Remember that to do this, we first need to run the activate command, which in PowerShell was this command.

01:54.930 --> 01:57.090
As you can see, the venv has been activated.

01:57.210 --> 02:01.690
Once we activate it, we then have to run the command pip install requests.

02:02.210 --> 02:08.530
But before we execute this command: I actually forgot to push the .gitignore file.

02:08.530 --> 02:09.810
So let me do that right now.

02:09.970 --> 02:20.850
So: git add, then git commit, and let's write a commit message for the .gitignore.

02:24.090 --> 02:24.850
And push.

02:30.770 --> 02:34.810
Okay, now the .gitignore has been successfully pushed.

02:34.810 --> 02:40.810
So we can move on to the command that we wanted to run which is pip install requests.

02:42.130 --> 02:44.330
Give it some time to install all the dependencies.

02:45.770 --> 02:51.730
Once requests has been installed, we can verify that the installation was successful by simply

02:51.730 --> 02:57.730
running pip list, and we will see that requests now shows up here.
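
NOTE
Besides pip list, you can check from Python itself whether a package is importable in the active environment. A small sketch using the standard library's importlib.util.find_spec, which returns None when a module cannot be found; the helper name is_installed is our own, not part of any library.
```python
import importlib.util
def is_installed(package):
    """Return True if `package` can be found by the current Python interpreter."""
    return importlib.util.find_spec(package) is not None
print(is_installed("requests"))  # expected True once pip install requests has run in this venv
```
Run it with the virtual environment's interpreter; if the venv is not activated, you may be checking a different Python installation.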

02:58.620 --> 03:04.820
Now let's create a Python file, which will have a .py extension; let's name it video_stats.py,

03:05.220 --> 03:09.500
as we ultimately want to get the statistics of each video on the channel.

03:10.340 --> 03:13.020
So let's do that: video_stats.py.

03:13.500 --> 03:17.540
The first step is to import the requests library.

03:20.780 --> 03:24.340
Now we need a URL which we will use to make the requests.

03:24.980 --> 03:27.660
Going back to the API Explorer:

03:28.700 --> 03:33.020
this time we need to use the channels resource, as we already saw in the last slide.

03:33.060 --> 03:43.500
To get the URL we need, we go under channels.list, expand the API panel to full screen, and this is the base URL that

03:43.500 --> 03:44.300
we will use.

03:44.620 --> 03:47.460
Obviously we have to fill in the required parameters.

03:48.100 --> 03:56.300
In our case, we need to request contentDetails and also pass the channel handle, which is MrBeast.

03:57.100 --> 04:01.180
As you can see, once we do these changes, the URL will change as well.

04:02.020 --> 04:06.460
Now, I showed you that the channel handle for the MrBeast channel is MrBeast,

04:06.460 --> 04:14.500
but you can confirm this by going to MrBeast's YouTube channel, where you will find the handle after

04:14.500 --> 04:16.300
the @ symbol here.

04:16.340 --> 04:18.940
So this is the channel handle.

04:19.380 --> 04:25.580
And as a side note, here you will also see the total number of videos, at this point in time,

04:26.100 --> 04:28.300
that MrBeast has on his channel.

04:28.420 --> 04:34.500
What we can do now is copy the URL from the API Explorer and set it as the url variable in our

04:34.500 --> 04:35.140
script.

04:35.580 --> 04:36.700
Let's do that right now.

04:38.580 --> 04:45.900
So we copy the URL from here and set it as a variable in our script.

04:48.420 --> 04:52.500
Obviously, we need to replace the API key placeholder that you see here.

04:53.140 --> 04:57.350
We can do this by defining the key as a global variable

04:59.390 --> 05:03.310
and referencing it with a Python f-string.

05:07.790 --> 05:14.510
Apart from the API key, it's also good practice to replace the channel handle with

05:14.510 --> 05:15.430
a variable as well.

05:23.830 --> 05:24.830
Same thing here:

05:27.870 --> 05:29.230
we use the curly braces.

05:30.030 --> 05:34.510
And here we have the URL which we will use with the requests library.
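
NOTE
A sketch of the URL-building step, assuming the API Explorer URL has the shape below (channels endpoint, part=contentDetails, the forHandle parameter, key). API_KEY is a placeholder, not a real key. The second variant uses urllib.parse.urlencode, which escapes values for you; it is an alternative, not what the lecture uses.
```python
from urllib.parse import urlencode
API_KEY = "YOUR_API_KEY"  # hypothetical placeholder, never commit a real key
CHANNEL_HANDLE = "MrBeast"
BASE_URL = "https://youtube.googleapis.com/youtube/v3/channels"
# f-string version, as in the lecture: the globals are dropped into the query string
url = f"{BASE_URL}?part=contentDetails&forHandle={CHANNEL_HANDLE}&key={API_KEY}"
# Alternative: let urlencode build (and escape) the query string
params = {"part": "contentDetails", "forHandle": CHANNEL_HANDLE, "key": API_KEY}
url_encoded = f"{BASE_URL}?{urlencode(params)}"
print(url == url_encoded)  # True, since none of these values need escaping
```
The f-string is the simplest option; urlencode becomes useful once a value might contain spaces or other characters that must be escaped.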

05:34.990 --> 05:40.470
Now, with the requests library, we can use the get method to get a response by writing the following:

05:41.630 --> 05:43.830
response equals requests.get(url).

05:44.350 --> 05:50.790
So we're using the library we just imported, calling its get method with the URL.

05:52.190 --> 05:54.510
Let's test this out by printing the response.

05:54.550 --> 05:57.400
And this should give us a 200 response.

05:58.600 --> 05:59.480
Let's run this.

06:02.720 --> 06:04.440
And we have a 200 response.

06:04.480 --> 06:05.160
Fantastic.

06:06.760 --> 06:09.120
You can then parse this response as JSON.

06:09.120 --> 06:10.440
So we would write the following.

06:10.960 --> 06:14.560
Let's create a data variable and set it equal to the response.

06:15.920 --> 06:18.280
and getting the JSON from that response with response.json().
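
NOTE
Roughly speaking, response.json() decodes the response body as JSON and gives back ordinary Python objects, here a dict. The stdlib sketch below imitates that with json.loads on a hypothetical, heavily trimmed response body, so it runs without network access or an API key.
```python
import json
# Hypothetical, heavily trimmed response body (real responses contain more fields)
body = '{"kind": "youtube#channelListResponse", "items": [{"id": "UC_example_channel_id"}]}'
data = json.loads(body)  # similar in spirit to data = response.json()
print(type(data).__name__)  # dict
print(data["items"][0]["id"])  # UC_example_channel_id
```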

06:19.200 --> 06:23.400
Now we can print data and showcase the output like that.

06:23.440 --> 06:26.960
Or we can also import the json module

06:30.840 --> 06:34.200
and use json.dumps.

06:36.200 --> 06:40.200
This takes the data, and we pass indent=4.

06:41.360 --> 06:47.720
For those of you who are not familiar, json.dumps is a Python function used to convert a Python

06:47.760 --> 06:50.040
object into a JSON-formatted string.

06:50.200 --> 06:54.640
We use indent=4 because four-space indentation is a common Python convention and keeps the output readable.
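
NOTE
A small illustration of json.dumps with indent=4, using a hypothetical, trimmed dict in place of the real response.
```python
import json
# Hypothetical, trimmed stand-in for the parsed response
data = {"kind": "youtube#channelListResponse", "pageInfo": {"totalResults": 1}}
pretty = json.dumps(data, indent=4)  # a str, each nesting level indented by 4 spaces
print(pretty)
```
Without indent, json.dumps puts everything on a single line; with indent=4 the same data is spread over several lines, one key per line, which makes the nesting easy to follow.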

06:54.680 --> 06:56.960
If we print this output

06:59.680 --> 07:00.760
and run the file

07:00.760 --> 07:03.920
from here, we will see the JSON output that we need.

07:03.960 --> 07:09.320
Now, we said we want the playlist ID, which, if you look at the docs, is this one here

07:09.560 --> 07:10.520
under uploads.

07:10.920 --> 07:15.120
Don't confuse it with this other ID, which is the channel ID.

07:16.000 --> 07:20.080
So now it's a matter of going down the chain of JSON to extract the playlist ID.

07:20.360 --> 07:25.760
To help you better visualize how this JSON is structured, you can install an extension in VS Code.

07:26.040 --> 07:32.760
I use JSON Crack, which you can find by searching the extensions over here and typing in JSON Crack.

07:34.880 --> 07:40.080
You will find the extension I'm referring to, so if you don't have it, install it.

07:40.080 --> 07:42.560
I already have it, so it's already installed.

07:43.080 --> 07:47.960
If, at the time you're watching this lecture, this extension is no longer available, or you don't see it

07:48.000 --> 07:48.920
in your IDE,

07:49.120 --> 07:51.520
you can try searching for another similar extension.

07:51.760 --> 07:55.650
Once you have printed the response, which is this one here,

07:56.130 --> 07:58.450
you can copy and paste it into a JSON file.

07:59.570 --> 08:00.610
Let's do that right now.

08:01.730 --> 08:05.090
Go back here and create a JSON file.

08:05.690 --> 08:10.570
Let's give it the same name as the Python script, but with a .json extension.

08:13.210 --> 08:18.730
Once you open a new JSON file, it should have this orange icon in the top right, which relates to

08:18.730 --> 08:21.250
the JSON Crack extension we just installed.

08:22.010 --> 08:27.010
Click on it, and here you can see the JSON visualized.

08:29.250 --> 08:30.290
If we zoom in a bit more,

08:30.290 --> 08:35.330
we can see that this part over here relates to this part over here.

08:35.690 --> 08:39.770
And the breakdown of the JSON structure is as you're seeing here.

08:40.770 --> 08:45.850
A cool feature is that JSON Crack also gives you the path from the root of the JSON,

08:45.850 --> 08:50.450
which you can see once you click on the uploads ID box.

08:51.490 --> 08:56.620
So this is the full path to get to the last part of the JSON structure.

08:56.620 --> 09:01.780
And the root, in our case, is the data variable we defined in our code, which holds the whole JSON

09:01.780 --> 09:02.380
response.

09:03.420 --> 09:06.820
So if we go back to the code and paste it in here,

09:06.860 --> 09:10.260
we can replace the root with the data variable.

09:10.460 --> 09:17.100
And to get the value of the uploads key, we also need to append .uploads at the end.

09:17.140 --> 09:22.820
Since we're using Python and since this is effectively a Python dictionary, you can access the items

09:22.820 --> 09:26.780
by using square brackets instead of these dots over here.

09:26.820 --> 09:30.220
So this same structure can be written as follows.

09:30.780 --> 09:36.940
These two are equivalent, but Python only understands the bracket version, so we can now remove the path

09:36.940 --> 09:38.700
provided to us by JSON Crack.
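
NOTE
To show that the dotted path and the bracket version point at the same value, here is a hypothetical helper, get_by_path, that follows a path such as items[0].contentDetails.relatedPlaylists.uploads through nested dicts and lists. It assumes the path is copied without its root prefix and only handles plain keys and numeric indexes; the sample data is made up.
```python
import re
def get_by_path(data, path):
    """Follow a dotted path like 'items[0].contentDetails' through nested dicts and lists."""
    value = data
    for segment in path.split("."):
        # Split a segment such as 'items[0]' into the key 'items' and the index part '[0]'
        match = re.fullmatch(r"(\w+)((?:\[\d+\])*)", segment)
        if match is None:
            raise ValueError(f"unsupported path segment: {segment!r}")
        key, index_part = match.groups()
        value = value[key]  # dictionary lookup, like data["items"]
        for index in re.findall(r"\[(\d+)\]", index_part):
            value = value[int(index)]  # list lookup, like [0]
    return value
# Hypothetical, trimmed response with made-up IDs
data = {"items": [{"id": "UC_channel_id", "contentDetails": {"relatedPlaylists": {"uploads": "UU_uploads_id"}}}]}
dotted = get_by_path(data, "items[0].contentDetails.relatedPlaylists.uploads")
bracketed = data["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]
print(dotted == bracketed)  # True
```
In the script itself the bracket form is all you need; the helper only makes the one-to-one mapping between each dot or [0] and a Python subscript explicit.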

09:39.420 --> 09:43.060
Let's structure this by splitting what we have here into two steps.

09:43.180 --> 09:47.820
We can get the first element of items by defining the variable channel_items.

09:53.620 --> 09:58.860
And from there we get the channel playlist ID by referencing the channel_items variable we just created.

09:58.900 --> 10:06.260
So let's create the variable channel_playlist_id; let's just create it here.

10:07.060 --> 10:07.940
And that's it.

10:10.100 --> 10:13.900
If we print channel_playlist_id, we should get what we are looking for.

10:15.700 --> 10:16.700
Let's run this code.

10:18.660 --> 10:19.980
And here we have it.

10:19.980 --> 10:22.300
This is the channel playlist ID we've been looking for.
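
NOTE
The two-step extraction described above, run against a hypothetical, trimmed channels.list response. The IDs are made up; note that the channel ID and the uploads playlist ID are different values.
```python
# Hypothetical, trimmed channels.list response; only the fields used here are shown
data = {
    "items": [
        {
            "id": "UC_example_channel_id",  # the channel ID, not what we want
            "contentDetails": {"relatedPlaylists": {"uploads": "UU_example_uploads_id"}},
        }
    ]
}
channel_items = data["items"][0]  # step 1: the first (and only) channel in the response
channel_playlist_id = channel_items["contentDetails"]["relatedPlaylists"]["uploads"]  # step 2
print(channel_playlist_id)  # UU_example_uploads_id
```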

10:22.660 --> 10:27.340
So at this stage we have the basic code to get the playlist ID, but we need to make some changes to the

10:27.340 --> 10:31.420
code to make it more modular and follow software engineering best practices.

10:31.460 --> 10:34.340
First of all, we need to enclose all this code in a function.

10:34.580 --> 10:40.180
Let's call the function get_playlist_id. In Python, to define a function, you use this syntax:

10:40.740 --> 10:46.260
def, followed by the function name we just chose.

10:52.070 --> 10:56.150
Next, we need to enclose the code inside the function in a try/except block.

10:56.630 --> 11:01.630
This is a best practice, and you do this to ensure that the function can gracefully handle potential

11:01.630 --> 11:07.070
issues and provide meaningful error messages, instead of crashing without telling you what caused the

11:07.070 --> 11:07.430
issue.

11:10.870 --> 11:13.150
Again, I have to indent the code.

11:15.190 --> 11:16.230
Here we write the

11:16.230 --> 11:22.030
except clause. Now, when working with the requests module, check the docs on errors and exceptions, which

11:22.030 --> 11:23.830
will be in the appendix of this section.

11:24.350 --> 11:26.670
And this is what you can see here.

11:27.750 --> 11:31.110
It says we should use response.raise_for_status()

11:33.550 --> 11:35.310
to capture HTTP errors,

11:35.670 --> 11:42.150
and also that all exceptions inherit from requests.exceptions.RequestException.

11:42.870 --> 11:46.230
So we will use these two pieces of information in our code.

11:47.270 --> 11:50.990
Knowing this let's incorporate the final changes to the function.

11:52.230 --> 11:53.830
Let's go back to the code.

11:55.950 --> 11:58.150
Let's start by replacing...

11:58.190 --> 12:01.990
Actually, let's remove this print, as we don't need it,

12:04.110 --> 12:09.430
and instead we can write response.raise_for_status().

12:10.950 --> 12:14.910
These other prints we can comment out for now, because we will need them later.

12:16.190 --> 12:22.230
Ultimately we want to also return the playlist ID, so let's do that.

12:22.230 --> 12:30.110
So here we write return, and we return channel_playlist_id.

12:31.190 --> 12:38.030
Finally, in the except clause, let's write what we just saw in the requests documentation.

12:39.150 --> 12:49.270
So we write except requests.exceptions.RequestException as e.

12:50.070 --> 12:53.600
And then we raise the exception here.

12:53.600 --> 12:58.520
What we are saying is that any exception relating to requests will be bound to the variable e.

12:59.040 --> 13:01.000
And then we need to raise the exception.

13:01.000 --> 13:02.680
So we write raise e.
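
NOTE
Putting the pieces together, get_playlist_id could look roughly like this sketch. It assumes the Explorer URL shape used earlier and a placeholder API key; the timeout argument is our own addition, since requests does not time out by default. The if __name__ == "__main__" block is left out so the snippet makes no network call when executed as is.
```python
import requests
API_KEY = "YOUR_API_KEY"  # hypothetical placeholder; keep the real key out of source code
CHANNEL_HANDLE = "MrBeast"
def get_playlist_id():
    """Return the uploads playlist ID for CHANNEL_HANDLE, re-raising any requests error."""
    url = f"https://youtube.googleapis.com/youtube/v3/channels?part=contentDetails&forHandle={CHANNEL_HANDLE}&key={API_KEY}"
    try:
        response = requests.get(url, timeout=10)  # timeout (seconds) is our addition
        response.raise_for_status()  # 4xx/5xx status codes become requests.exceptions.HTTPError
        data = response.json()
        channel_items = data["items"][0]
        channel_playlist_id = channel_items["contentDetails"]["relatedPlaylists"]["uploads"]
        return channel_playlist_id
    except requests.exceptions.RequestException as e:
        # HTTPError, ConnectionError, Timeout, ... all inherit from RequestException
        raise e
```
Only requests errors are caught here; if no channel matches the handle, the JSON lookups would fail with a KeyError or IndexError instead.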

13:04.600 --> 13:10.400
Now that we have modified the function, we need to ensure it still runs as expected. To execute a function

13:10.400 --> 13:11.000
in Python,

13:11.040 --> 13:17.520
a popular approach is to use the if __name__ == "__main__" block, which I will

13:17.520 --> 13:17.960
write now.

13:18.000 --> 13:25.960
So: if __name__ == "__main__":

13:30.200 --> 13:33.200
Then we call the function here.

13:33.200 --> 13:36.160
If you have coded in Python before, you will have seen this.

13:36.160 --> 13:38.320
If not, let's go over what it means.

13:38.560 --> 13:45.000
This __name__ part is a special built-in variable in Python, and its value depends on how the

13:45.000 --> 13:45.880
script is run.

13:46.040 --> 13:54.090
And this "__main__" part over here is the value of __name__ when the script is run directly and not imported as

13:54.090 --> 13:54.650
a module.

13:55.250 --> 14:02.130
So what we are saying here is: if we run the script directly, meaning we press the Run button

14:02.130 --> 14:03.170
here in VS Code,

14:03.490 --> 14:07.010
then the __name__ variable is set to "__main__".

14:07.050 --> 14:11.570
The if condition then becomes true, and the contents of this block are run.

14:11.610 --> 14:14.210
In this case, the function will be executed.

14:15.410 --> 14:20.930
If, however, this script is imported by another script, the __name__ variable will not be set to

14:20.970 --> 14:27.610
"__main__"; it will be set to the module name, which in our case is video_stats.
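
NOTE
To see both cases without making two files by hand, this stdlib sketch uses runpy.run_path, which executes a file under a chosen __name__. run_name="__main__" imitates pressing Run, while run_name="video_stats" imitates the value __name__ gets when the file is imported. The throwaway file is a stand-in, not the lecture's script.
```python
import pathlib
import runpy
import tempfile
# A tiny stand-in for video_stats.py that records which branch ran
SCRIPT = '''
if __name__ == "__main__":
    branch = "run directly"
else:
    branch = "imported as " + __name__
'''
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "video_stats.py"
    path.write_text(SCRIPT)
    as_script = runpy.run_path(str(path), run_name="__main__")  # like pressing Run
    as_module = runpy.run_path(str(path), run_name="video_stats")  # __name__ as on import
print(as_script["branch"])  # run directly
print(as_module["branch"])  # imported as video_stats
```
run_path returns the executed file's globals, which is how the sketch reads back which branch ran.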

14:27.770 --> 14:32.530
Let's do a quick example and write the following in the video_stats script.

14:32.570 --> 14:34.010
First, let me close this here.

14:35.010 --> 14:40.130
Let's add a print statement in the if block to showcase what happens when this script is run directly.

14:46.930 --> 14:52.210
Then we also need to create an else block that prints text highlighting that the script was imported

14:52.210 --> 14:53.250
from another script.

14:53.490 --> 14:56.130
In this case, the function won't be executed.

15:06.170 --> 15:10.290
So if we run it directly, the if block gets executed.

15:11.250 --> 15:12.090
Let's do that.

15:14.090 --> 15:18.810
And as you can see, the print statement of the if block got executed.

15:19.490 --> 15:26.330
If, however, we were to import this script from another script, let's call it import_video_stats,

15:26.570 --> 15:31.290
so import_video_stats.py,

15:33.930 --> 15:38.090
the else block will be executed and the function we just created won't run.

15:38.930 --> 15:45.370
So let's first import the main script and then write a simple print statement showcasing where we are

15:45.370 --> 15:46.570
running the script from.

15:54.180 --> 16:01.460
Again, we're doing this to highlight that we are running this script from the import_video_stats

16:01.460 --> 16:02.500
script.

16:03.300 --> 16:04.460
Let's run the script.

16:05.660 --> 16:12.140
And as you can see, the prints from the code executed and the function did not run.

16:12.180 --> 16:16.380
With that, we have gone over the famous if __name__ == "__main__" idiom.

16:16.420 --> 16:19.180
If you didn't get this last part of the lecture that's okay.

16:19.580 --> 16:21.460
Feel free to rewind and listen again.

16:22.060 --> 16:25.060
It definitely won't come in the way of completing this project.

16:25.380 --> 16:30.260
It is just good to know, and it helps with code modularity and even with testing.

16:31.380 --> 16:33.220
Before we move on, let's do a bit of cleanup.

16:33.260 --> 16:39.220
Let's delete the JSON file we created to showcase JSON Crack; we can just press Delete.

16:40.780 --> 16:45.060
And we can also remove the prints from the if __name__ == "__main__" block

16:47.820 --> 16:49.580
and remove the else block as well.

16:50.460 --> 16:56.900
One final thing we can remove is the import_video_stats script, as we will not need it in this course.

16:59.260 --> 17:04.780
And actually, one thing that we need to do before closing off is to confirm that the script still runs.

17:04.860 --> 17:06.140
Since the function only returns the variable,

17:06.180 --> 17:09.900
what we have here won't output the result in the terminal.

17:10.020 --> 17:12.780
So what we can do is uncomment the print.

17:13.420 --> 17:15.620
We need to correct this variable name.

17:16.900 --> 17:18.740
And now we can press run.

17:22.300 --> 17:23.340
And there we have it.

17:23.900 --> 17:26.340
The channel playlist ID is printed in the terminal.

17:26.340 --> 17:30.100
And so we can confirm that the script is still working as expected.

17:30.380 --> 17:32.100
Now let's comment out the print again.

17:33.260 --> 17:33.860
We save.

17:34.420 --> 17:35.980
Let's stop here for this lecture.

17:35.980 --> 17:38.660
In the next lecture we will push our code to GitHub.

17:38.980 --> 17:42.020
But first we will need to introduce the .env file.

17:42.340 --> 17:43.300
I will see you then.
