WEBVTT

00:00.470 --> 00:01.610
All right, everyone.

00:01.610 --> 00:08.180
Now that we have described exactly what we want to scrape, we are going to write some actual code.

00:08.180 --> 00:14.090
So let's go ahead and import request from

00:15.450 --> 00:16.980
request-promise.

00:19.130 --> 00:20.840
And we will also import

00:20.850 --> 00:21.590
Cheerio.

00:23.890 --> 00:25.180
So require

00:25.180 --> 00:25.960
Cheerio.

00:26.320 --> 00:29.830
And then let's make a function.

00:30.550 --> 00:34.360
We call it scrape titles for now.

00:35.590 --> 00:37.690
Let's call it titles,

00:39.510 --> 00:43.260
ranks, and ratings.

00:43.970 --> 00:47.820
We scrape lots of things in this one, but I'll show you how we do it.

00:47.820 --> 00:55.350
So first of all, we get the titles, right? And then I will call the function down here in the script

00:55.350 --> 00:56.130
itself.

00:56.280 --> 01:02.970
So we need to get the result using request.

01:02.970 --> 01:14.460
So we say result = await request.get, and we put in the URL for the top 100 page.

01:14.520 --> 01:16.860
The top 100 movies we have here.

01:18.720 --> 01:27.300
Put it in here, and then we have the Cheerio result, the HTML we can select in.

01:27.300 --> 01:29.490
So that's going to be

01:29.670 --> 01:32.160
cheerio.load(result).

01:32.490 --> 01:37.110
And remember also to use await; this is also an async function.

01:38.100 --> 01:45.810
Now let's go inside of our browser, inside of the Chrome browser and fiddle around a bit to find the

01:45.810 --> 01:52.020
Cheerio elements, or jQuery elements, we can select.

01:53.220 --> 02:00.750
So we go and select, for example, using the select element inside of Chrome developer tools.

02:00.750 --> 02:02.790
We can select the title here.

02:03.090 --> 02:04.350
We select that one.

02:04.350 --> 02:09.600
We can see that each of the movies is actually inside of a table.

02:09.690 --> 02:12.270
So each of them is just a table row.

02:12.720 --> 02:19.140
And inside here we have a cell which has the name titleColumn, or rather the class titleColumn.

02:19.140 --> 02:25.470
It has an a element and this one has a link to the description page.

02:25.470 --> 02:30.780
where we can see more about the movie, and also the title inside here.

02:30.930 --> 02:36.630
It also has a title attribute, but actually the title attribute contains just the director and actors.

02:36.630 --> 02:38.370
So we are not going to be using that.

02:38.370 --> 02:42.270
We just get the text representation inside of the a element.

02:43.350 --> 02:51.780
Now let's just go and right-click this element, select Copy, and choose Copy selector, and let's go inside

02:51.780 --> 02:52.750
of our console.

02:52.750 --> 02:55.660
So now,

02:57.520 --> 03:03.550
If you go and select all of this, I think that's a pretty long selector and it's also only going to

03:03.550 --> 03:13.000
select the first element, but we want to select all of them and luckily we have jQuery enabled on this

03:13.000 --> 03:13.390
page.

03:13.390 --> 03:15.610
So if we go here and say

03:17.790 --> 03:26.250
.text(), where we select all of the titleColumn cells and the a element which contains the title,

03:26.730 --> 03:31.020
we get all of the titles for the different movies.

03:33.120 --> 03:40.380
And the idea is we are going to get an array instead over here, with each of the movie elements.

03:41.160 --> 03:44.250
So let's go ahead and do that.

03:45.330 --> 03:48.310
So we have a movies array

03:48.330 --> 03:57.720
I will make here, and we put in, let's see, the shorter selector we have here, which works

03:57.720 --> 03:58.380
fine.

03:59.100 --> 04:07.320
And we say td.titleColumn a, and we go and say map instead.

04:07.320 --> 04:11.010
So we return an array.

04:11.790 --> 04:19.200
So index, element, and we use an arrow function here, and we are going to return

04:20.700 --> 04:24.160
We are going to return the title

04:27.000 --> 04:29.670
text, like so.

04:30.810 --> 04:34.350
And let's see in the end how our movies look.

04:36.720 --> 04:37.680
That's it.

04:37.710 --> 04:40.680
Now let's run the script.

04:40.800 --> 04:44.190
node index.js.

04:45.150 --> 04:46.230
Let's run it.

04:50.650 --> 04:51.160
Okay.

04:51.160 --> 04:59.080
So this is something that I stumbled over a bit, because it looks like we're actually getting an array

04:59.080 --> 05:02.140
of Cheerio objects.

05:02.230 --> 05:06.760
Cheerio objects, instead of just getting the titles that we want.

05:07.540 --> 05:16.090
And so Cheerio has a slightly strange way of doing that, which means you can chain the functions

05:16.090 --> 05:18.520
of Cheerio, or events.

05:18.790 --> 05:25.510
Basically, if you want just the titles, you just type .get() after this one and you will get the

05:25.510 --> 05:28.240
values that you really are looking for.

05:30.290 --> 05:31.940
So let's try it again.

05:36.450 --> 05:37.320
All right.

05:37.320 --> 05:45.210
So now we get all of the titles of the different movies in the order they are ranked.

05:45.510 --> 05:49.230
Okay, so in the next section, let's go and get some more data.
