WEBVTT

00:00.260 --> 00:01.310
All right, everyone.

00:01.310 --> 00:05.660
So on the right side, I have my Google Chrome browser running.

00:05.660 --> 00:09.050
And on the left side is Visual Studio Code.

00:09.320 --> 00:15.110
So let's talk first about exactly what data we are going to be scraping.

00:15.320 --> 00:23.390
So if we go to the IMDb page, you can go into Movies, TV & Showtimes, and you can select Most

00:23.390 --> 00:24.760
Popular Movies.

00:24.770 --> 00:29.430
And if you go in here, you should get to a page much like this one.

00:29.450 --> 00:37.160
Most Popular Movies, 100 titles, and each of these items has a movie title.

00:37.190 --> 00:38.870
They have a rank.

00:38.870 --> 00:41.090
So this one is ranked number one.

00:41.330 --> 00:45.410
It has an IMDb rating of 8.4.

00:46.700 --> 00:49.250
So let's define exactly what we want to get.

00:49.280 --> 00:53.150
We want to get a title.

00:53.510 --> 00:57.170
So for example, Bohemian Rhapsody.

01:01.000 --> 01:03.340
And we want to get a rank.

01:03.340 --> 01:12.490
In this case, it's rank number one, and we want to get the IMDb rating, 8.4, and then we want to get

01:12.490 --> 01:19.480
the... well, we want to get the poster image in the end, the high-resolution image of this one.

01:19.600 --> 01:23.230
So we need to go inside of the page by clicking here.

01:24.210 --> 01:29.760
And here is a more descriptive page of the movie.

01:29.760 --> 01:33.630
I would just call that the description URL.

01:34.800 --> 01:40.620
So let's copy the URL here just for example's sake.

01:41.160 --> 01:50.070
And now if we right-click here and say open image or save image, we are going to get a lower resolution

01:50.070 --> 01:51.930
image of the poster.

01:51.930 --> 01:52.950
I don't want that.

01:52.950 --> 01:59.970
I'm kind of picky here, so I want the high-resolution image, which you can only get by clicking

01:59.970 --> 02:01.290
on the image here.

02:02.090 --> 02:07.070
Now this image is much higher resolution, which is the one we are going for.

02:07.400 --> 02:15.640
So the URL I get up here is different than when I'm on the descriptive page.

02:15.650 --> 02:18.980
I will call this the poster URL.

02:20.770 --> 02:29.410
And this is where, well, we will be using NightmareJS to download the image itself.

02:29.650 --> 02:35.350
So that's all, folks, that's all the data that we want from these different movies.
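
NOTE
The fields listed so far could be collected into one record per movie. This is only a minimal TypeScript sketch, not code from the video: the interface name, the field names, the missingFields helper, and both URLs are assumptions or placeholders.
```typescript
// Hypothetical record shape for one movie; all names here are assumptions.
interface Movie {
  title: string;          // e.g. "Bohemian Rhapsody"
  rank: number;           // position in the Most Popular Movies list
  rating: number;         // IMDb rating, e.g. 8.4
  descriptionUrl: string; // the movie's more descriptive page
  posterUrl: string;      // page that shows the high-resolution poster
}
// Values mentioned in the video; both URLs are placeholders, not real IMDb paths.
const example: Movie = {
  title: "Bohemian Rhapsody",
  rank: 1,
  rating: 8.4,
  descriptionUrl: "https://example.com/description-page",
  posterUrl: "https://example.com/poster-page",
};
const FIELDS: (keyof Movie)[] = ["title", "rank", "rating", "descriptionUrl", "posterUrl"];
// Report which fields of a partially scraped record are still undefined.
function missingFields(movie: Partial<Movie>): string[] {
  return FIELDS.filter((key) => movie[key] === undefined);
}
console.log(missingFields(example)); // prints []
```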

02:35.350 --> 02:44.590
So for each one of them, we get the rating, rank, and title, go inside the page here, click on the image or

02:44.590 --> 02:46.750
get the URL for this image.

02:46.750 --> 02:48.730
And then we go inside Nightmare.

02:49.570 --> 02:53.650
Render this page because it requires JavaScript.

02:54.880 --> 02:57.400
And we are going to get the image.

02:57.850 --> 03:04.270
Okay, guys, in the next section, we are going to be getting the titles for all of these movies.
