WEBVTT

00:04.400 --> 00:05.040
At this stage,

00:05.040 --> 00:06.560
we now have the video IDs.

00:06.840 --> 00:12.240
What we need is a function that relates these video IDs to the final seven variables of interest, which

00:12.240 --> 00:14.720
we have discussed already, and that you see here.

00:17.640 --> 00:19.120
From the Russian doll analogy,

00:19.160 --> 00:22.880
we know that we will use the videos resource to get the video data.

00:23.240 --> 00:28.920
And if we go to the API explorer, to get the seven variables of interest we'll need to use the

00:28.920 --> 00:33.600
contentDetails, statistics and snippet for the part parameter that you see here.

00:33.800 --> 00:36.520
The video IDs we found in the previous lecture

00:36.520 --> 00:39.080
we can specify here in this id field.

00:39.680 --> 00:45.360
Here I am just defining one video, but looking at the documentation, it states that we can define

00:45.360 --> 00:50.320
it as a comma separated list of video IDs with data type string.

00:50.520 --> 00:56.960
An important note on this is that for this concatenated string of video IDs, same as for the maximum

00:56.960 --> 01:02.950
results in a page, we can only have a maximum of 50 IDs in this video ID string.

01:03.910 --> 01:10.430
So knowing this, we can start by defining a helper function to split the video ID list into batches.

01:10.990 --> 01:12.870
Let's define it right now in VS Code.

01:15.550 --> 01:24.710
Let's go down here and let's call it batch list, which takes as input the video ID list and also the

01:24.750 --> 01:31.150
batch size, which we will set equal to the max results when we come to implement it inside the function.

01:32.470 --> 01:43.430
So def batch list, which takes the video ID list and also the batch size.

01:43.870 --> 01:49.510
From here we just define a for loop that will loop through the list in batches of size batch size.

01:49.950 --> 01:53.470
And each time we want to yield the batched video list.

01:54.550 --> 02:06.620
So we define a for loop for video ID in the range going from zero to the length of the whole video ID

02:06.620 --> 02:16.940
list, which we found in the previous function, using the batch size as the step.

02:18.980 --> 02:25.100
We will also yield the video ID list

02:27.740 --> 02:28.660
in batches.

02:31.180 --> 02:37.620
So starting from video ID all the way to video ID plus the batch size.

02:43.980 --> 02:46.420
Note how here we write yield and not return.

02:46.860 --> 02:52.340
If we specify return, the loop will stop in the first iteration, which is not what we want.
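
NOTE
A minimal sketch of the batching helper described above, not the lecturer's exact code.
The names batch_list, video_id_lst and batch_size are assumed spellings of the spoken names,
and the sample IDs are hypothetical.
```python
from typing import Iterator, List
def batch_list(video_id_lst: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size IDs."""
    # Step through the start indexes 0, batch_size, 2 * batch_size, ...
    for start in range(0, len(video_id_lst), batch_size):
        # yield hands back one batch and resumes here on the next iteration;
        # return would end the function after the first batch.
        yield video_id_lst[start:start + batch_size]
# Hypothetical IDs, for illustration only.
batches = list(batch_list(["a", "b", "c", "d", "e"], 2))
print(batches)
```
Because the function is a generator, the batches are produced one at a time as the calling loop asks for them.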

02:52.500 --> 02:57.980
So now that we have defined this function, let's momentarily go back to the API explorer so that I

02:57.980 --> 03:01.620
can show you the response from what we have defined already.

03:02.380 --> 03:04.700
Again, here I will use just one video ID.

03:06.860 --> 03:12.220
Let's go down, untick OAuth 2.0 and press execute.

03:12.740 --> 03:15.100
And in the response we get the variables of interest,

03:15.100 --> 03:16.860
these being the snippet.

03:17.300 --> 03:18.420
This is the snippet key.

03:19.060 --> 03:21.980
And we want the published at.

03:22.420 --> 03:30.780
And the title of the video. Then we go down to the content details, which is this key over here.

03:30.860 --> 03:32.860
From here we want the duration of the video.

03:33.060 --> 03:36.500
And if we scroll further down, we also want the statistics.

03:36.740 --> 03:40.820
And for this we know that we want the views, the likes and the comment count.

03:41.140 --> 03:46.580
Now that we know what we are after, let's copy the URL from the API explorer and paste it into our

03:46.580 --> 03:47.180
script.

03:48.060 --> 03:49.740
And we take this back to VS Code.

03:50.020 --> 03:51.500
Let's just paste it here for now.

03:51.940 --> 03:54.140
We will obviously have to make some changes to this URL.

03:54.180 --> 03:55.860
But for now we can leave it as it is.

03:56.140 --> 03:56.900
Moving on.

03:56.900 --> 04:02.290
Let's now define the main function by giving it a meaningful name like extract

04:05.170 --> 04:11.450
video data, which takes the video ID list from the previous function as argument.

04:11.450 --> 04:13.650
So we take the argument as video IDs.

04:13.810 --> 04:15.690
We can also define an empty list.

04:15.690 --> 04:20.570
Let's call it extracted data, which will store the seven variables we are after

04:20.570 --> 04:28.370
for all the videos that are in MrBeast's channel. So extracted data, and we define it as an empty list.

04:29.130 --> 04:34.050
Before we move on, let's try and give a bit more structure to this main function.

04:34.650 --> 04:36.810
Let's put the helper function inside.

04:37.970 --> 04:40.810
Let's put it below the extracted data definition.

04:42.970 --> 04:44.810
And we will also use the URL.

04:47.450 --> 04:48.290
Somewhere here.

04:49.610 --> 04:55.770
Now that we have modified the structure a bit, let's encapsulate the code in a try and except clause.

04:56.490 --> 04:59.760
So we do try, same as before.

05:00.960 --> 05:12.480
Except requests dot exceptions dot request exception as e, and re-raise the error should there be one.

05:12.960 --> 05:18.160
The next step is to loop through the whole list in batches, at each batch defining the concatenated

05:18.160 --> 05:24.120
video ID string using the Python join method, which we will then use to build the URL.

05:25.160 --> 05:35.520
So let's do that right now and define a for loop, for batch in batch list, which we have defined already.

05:36.920 --> 05:41.400
This will use the video IDs and also the max results.

05:42.640 --> 05:47.960
Now we join the video IDs into a comma separated string as per the docs.

05:50.280 --> 05:53.720
Comma separated using the join method in Python.

05:57.990 --> 05:59.670
Now at this point we need the URL.

05:59.950 --> 06:03.030
So let's define the URL here and paste it here.

06:03.590 --> 06:06.030
Obviously we also need to do some cleaning.

06:06.230 --> 06:08.950
So in this case the part stays as is.

06:09.750 --> 06:12.070
But the ID we need to change.

06:12.070 --> 06:15.430
This will be the video ID string.

06:15.790 --> 06:17.550
Here we also need to define an f-string.

06:18.750 --> 06:19.550
Put this here.

06:20.710 --> 06:24.110
And same as always we also change the API key.
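
NOTE
A hedged sketch of the URL-building step described above. The function name build_videos_url
and the placeholder key are hypothetical. The endpoint and the part values follow the
YouTube Data API v3 videos resource, and the query layout mirrors the API explorer's URL.
```python
def build_videos_url(batch, api_key):
    # Join the batch into the comma-separated id value the docs describe.
    video_ids_str = ",".join(batch)
    # The part values stay as copied; only id and key are substituted.
    return (
        "https://youtube.googleapis.com/youtube/v3/videos"
        "?part=contentDetails&part=snippet&part=statistics"
        f"&id={video_ids_str}&key={api_key}"
    )
# Hypothetical IDs and key, for illustration only.
url = build_videos_url(["id1", "id2"], "YOUR_API_KEY")
print(url)
```
Each batch holds at most 50 IDs, so the id value never exceeds the limit mentioned earlier.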

06:26.710 --> 06:29.070
Same as we have done for the previous functions.

06:29.430 --> 06:30.910
We then get the response.

06:30.950 --> 06:35.310
Check for errors and if there are no errors, get the response data in JSON format.

06:35.510 --> 06:37.950
So let's copy this from the previous functions.

06:44.670 --> 06:45.350
So this,

06:45.350 --> 06:46.430
we don't need it anymore,

06:46.990 --> 06:50.110
the batch list, since we are defining it in the main function.

06:50.590 --> 06:55.390
Now similar to what we have done in the previous function, we can also use the get method to get the

06:55.390 --> 07:03.310
values under the main items key for video ID, snippet, content details and statistics, with the empty

07:03.350 --> 07:07.390
list as a value to return if the specified key does not exist.

07:08.350 --> 07:10.630
Let's write this right now.

07:10.950 --> 07:19.630
So for item in data dot get, getting the items key.

07:20.750 --> 07:25.630
If there is no key that we are after, we define an empty list.

07:27.670 --> 07:28.910
That would be the default.

07:29.590 --> 07:41.510
And also we start defining: for video ID, we will take it from the id key. Snippet will be the snippet

07:41.550 --> 07:42.030
key.

07:45.910 --> 07:51.430
Content details will be in the content details key.

07:55.070 --> 07:56.940
And finally statistics

07:59.220 --> 08:01.300
will be in the statistics key.

08:04.620 --> 08:09.780
From here, it's only a matter of creating a dictionary that will contain all the data we are interested

08:09.780 --> 08:10.100
in.

08:10.660 --> 08:11.820
Let's write that right now.

08:12.900 --> 08:14.900
So we define the dictionary.

08:15.100 --> 08:16.380
Let's call it video data.

08:17.900 --> 08:23.700
We can define a dictionary that will take the variables from this loop.

08:24.300 --> 08:27.060
So we can define the video id.

08:34.220 --> 08:37.900
We will write the keys first: title, published at,

08:42.060 --> 08:43.100
duration,

08:46.860 --> 08:47.940
view count.

08:58.770 --> 09:00.890
And just to be sure that we are all on the same page.

09:01.050 --> 09:06.890
These seven variables in this dictionary are the final data we want for each MrBeast YouTube video.

09:06.890 --> 09:08.770
So let me complete this dictionary.

09:28.730 --> 09:33.610
So as you can see, for the statistics variables we are using the get method, as

09:33.610 --> 09:38.970
while I was testing, I found that there are some instances where not all the videos have views, likes

09:39.010 --> 09:40.210
and comments visible.

09:40.570 --> 09:44.090
So we need to take this into account in our case.

09:44.130 --> 09:47.370
And that is what we are saying for these three variables.

09:47.370 --> 09:53.250
If for a specific video either the view, like or comment counts are not available, then replace

09:53.250 --> 09:54.930
this value with None.
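
NOTE
A sketch of the per-video dictionary described above, assuming one entry from the items list.
The helper name parse_item, the output key names and the sample values are hypothetical.
The response field names (id, snippet, contentDetails, statistics and their sub-keys)
are those of the YouTube Data API v3 videos resource.
```python
def parse_item(item):
    snippet = item["snippet"]
    content_details = item["contentDetails"]
    statistics = item["statistics"]
    return {
        "video_id": item["id"],
        "title": snippet["title"],
        "publishedAt": snippet["publishedAt"],
        "duration": content_details["duration"],
        # Counts can be hidden on some videos, so fall back to None.
        "viewCount": statistics.get("viewCount", None),
        "likeCount": statistics.get("likeCount", None),
        "commentCount": statistics.get("commentCount", None),
    }
# Hypothetical item with the like count missing, for illustration only.
sample_item = {
    "id": "abc123",
    "snippet": {"title": "Sample title", "publishedAt": "2024-01-01T00:00:00Z"},
    "contentDetails": {"duration": "PT10M5S"},
    "statistics": {"viewCount": "100", "commentCount": "7"},
}
row = parse_item(sample_item)
print(row)
```
Using get for the three counts keeps one missing value from raising a KeyError for the whole video.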

09:55.210 --> 10:01.920
So finally we just append the data to the extracted data variable as we had defined at the beginning

10:01.920 --> 10:02.880
and return it.

10:03.880 --> 10:07.760
So extracted data dot append,

10:09.400 --> 10:11.720
getting the video data which we just defined here.

10:13.960 --> 10:17.160
And we simply return the final variable.

10:18.920 --> 10:24.480
One very important thing to notice, and I was about to make this mistake, is the indentation of our code.

10:25.000 --> 10:31.440
Here you see that the video data dictionary is outside our for loop. Like this,

10:31.480 --> 10:34.080
we only get data for the first 50 videos.

10:34.080 --> 10:40.400
So to get the data for all the videos, we need to indent the video data dictionary and the appending

10:40.440 --> 10:42.800
to the main variable inside the for loop.

10:43.440 --> 10:45.640
So we do that like this.

10:46.480 --> 10:48.200
Now the function is complete.
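
NOTE
A simplified sketch of the loop structure and indentation discussed above, not the full function.
The HTTP call is replaced by a hypothetical fetch_batch argument so the sketch runs offline,
and only the video ID is stored. The names are assumed spellings of the spoken ones.
```python
def batch_list(video_id_lst, batch_size):
    for start in range(0, len(video_id_lst), batch_size):
        yield video_id_lst[start:start + batch_size]
def extract_video_data(video_ids, fetch_batch, max_results=50):
    extracted_data = []
    for batch in batch_list(video_ids, max_results):
        data = fetch_batch(",".join(batch))
        for item in data.get("items", []):
            # Built and appended inside the inner loop, so every item is kept.
            video_data = {"video_id": item["id"]}
            extracted_data.append(video_data)
    return extracted_data
def fake_fetch(video_ids_str):
    # Hypothetical stand-in for the HTTP request and response.json().
    return {"items": [{"id": vid} for vid in video_ids_str.split(",")]}
rows = extract_video_data([f"v{i}" for i in range(120)], fake_fetch)
print(len(rows))
```
With 120 hypothetical IDs the loop makes three batched calls and keeps one dictionary per video.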

10:48.640 --> 10:53.840
If we run this function, once we loop through all the videos we will get a large list of dictionaries.

10:53.880 --> 10:59.440
Let's showcase this by going in the double underscore name equals double underscore main part of the

10:59.440 --> 11:01.560
script and making some changes.

11:01.800 --> 11:09.040
So let's first make a change by defining a variable for the output of the video IDs function.

11:09.040 --> 11:12.480
And from here we can call the function that we just defined.

11:14.360 --> 11:15.320
And as an input,

11:15.360 --> 11:18.040
it gets the video IDs from the previous function.

11:18.040 --> 11:25.080
For the purpose of the demonstration, I will also include a print statement to showcase the outputs of

11:25.080 --> 11:26.720
the run in the terminal.

11:26.760 --> 11:27.960
Let's press run.

11:28.920 --> 11:33.440
This might take some time since we are looping through a number of videos.

11:35.320 --> 11:36.440
And there you have it.

11:37.080 --> 11:41.280
This is the video data we will be populating in our data warehouse.

11:41.280 --> 11:43.320
Same as I have done in the previous lecture,

11:43.320 --> 11:49.240
I will remove the print, save and commit this change to GitHub and I will see you in the final part

11:49.240 --> 11:51.240
of this four-part lecture series.
