WEBVTT

00:01.610 --> 00:09.710
All right, so now we have the CSS selector for the poster here.

00:09.770 --> 00:15.430
Now we want to go through each of the movies to do this automatically.

00:15.440 --> 00:24.470
So to begin with, we are going to write a function which we call scrape poster URL, and this takes

00:24.470 --> 00:28.310
in the array of movies we made in this function here.

00:30.410 --> 00:37.850
So this is going to look a little weird now, but it's actually just like in our Craigslist

00:37.850 --> 00:47.330
scraper, because we're going to loop over the movies and go through each of them in parallel, asynchronously.

00:47.420 --> 00:52.940
So what I mean by that is that we have this movies with poster URL.

00:54.800 --> 01:02.920
And we are going to use Promise dot all, and we are also going to use map.

01:02.930 --> 01:14.000
So we run movies dot map, and here we have an async movie callback, which goes through each of the movies, and this

01:14.000 --> 01:17.660
is pretty much just like in our Craigslist scraper.

01:17.660 --> 01:24.560
So the reason why we are using Promise dot all is that we're going to be making some async calls:

01:24.560 --> 01:28.220
we're using request in this dot map function.

01:28.880 --> 01:37.370
And if you don't have the await Promise dot all, it's not going to wait for all the promises

01:37.370 --> 01:41.720
to finish and this array is not going to be complete with the data.
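
NOTE
A minimal sketch of the Promise.all plus map pattern described here, not the code from the video.
fakeGet is a hypothetical stand-in for request.get so the sketch runs without network access,
and the property name descriptionUrl is an assumed spelling of the spoken "description URL".
```javascript
// Hypothetical stand-in for request.get(): resolves with fake HTML after a short delay.
const fakeGet = (url) =>
  new Promise((resolve) => setTimeout(() => resolve(`<html>${url}</html>`), 10));
// Each async callback returns a promise; Promise.all waits for every one of them.
async function addHtmlLength(movies) {
  return Promise.all(
    movies.map(async (movie) => {
      const html = await fakeGet(movie.descriptionUrl);
      return { ...movie, htmlLength: html.length };
    })
  );
}
// Without Promise.all, map with an async callback yields promises, not finished data.
const pending = [{ descriptionUrl: "/a" }].map(async (m) => m);
console.log(pending[0] instanceof Promise); // true
```
All the requests start at once, and Promise.all returns the results in the same order as the input array.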

01:42.770 --> 01:46.310
So this is just like our previous Craigslist scraper.

01:46.430 --> 01:49.220
But let me show you again what we do.

01:49.220 --> 01:56.210
So we have the HTML from request.

01:56.570 --> 02:01.790
So request dot get and we say movie dot description URL.

02:02.420 --> 02:05.750
So the description again is just this page.

02:05.750 --> 02:10.070
We go inside to here and we make the.

02:11.090 --> 02:13.400
cheerio selector here.

02:14.570 --> 02:19.160
So cheerio dot load HTML.

02:19.610 --> 02:24.290
And now we can select the poster URL.

02:25.400 --> 02:28.540
Actually, I'm going to just add it to the movie here.

02:28.550 --> 02:35.150
Movie dot poster URL, and I'm going to select the element here.

02:35.300 --> 02:44.390
So that's the selector we got previously, and we get the href attribute like so.

02:44.390 --> 02:46.700
And then I will return the movie.

02:47.120 --> 02:54.350
So then we're going to end up with the same array as we have here in this function, but it's going

02:54.350 --> 02:57.710
to be also adding the poster URL to each element.

02:58.640 --> 03:02.690
And then after that, I think I also want to return the array here.

03:04.640 --> 03:09.470
Now in this function we made over here, scraping the titles, ranks, and so on,

03:09.860 --> 03:14.630
Instead of just doing a console log, I also want to return the array here.

03:14.720 --> 03:16.430
So return movies.

03:18.160 --> 03:18.610
Yeah.

03:19.630 --> 03:25.050
And I think also, because we're making so many

03:26.110 --> 03:26.490
request dot

03:26.800 --> 03:27.610
get calls in here,

03:27.610 --> 03:30.310
I think it's a good idea to have a try

03:30.310 --> 03:34.000
catch clause here, and

03:34.820 --> 03:37.100
just put it in here.

03:39.920 --> 03:41.780
Just in case we get some errors.

03:41.780 --> 03:45.880
I don't want the whole thing to shut down

03:45.890 --> 03:49.190
if there's one URL that is odd or something.
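
NOTE
A hedged sketch of why the try/catch goes inside the map callback, not the code from the video.
flakyGet is a hypothetical stand-in for request.get that fails when the URL is missing,
and the field names title, descriptionUrl and html are assumptions for illustration.
```javascript
// Hypothetical fetcher that rejects for one "odd" entry, standing in for request.get().
const flakyGet = async (url) => {
  if (!url) throw new Error("missing URL");
  return `<a href="/poster${url}">poster</a>`;
};
async function scrapeSafely(movies) {
  return Promise.all(
    movies.map(async (movie) => {
      try {
        movie.html = await flakyGet(movie.descriptionUrl);
      } catch (err) {
        // Log and carry on, so one bad URL does not reject the whole Promise.all.
        console.error(`Skipping "${movie.title}": ${err.message}`);
      }
      return movie;
    })
  );
}
```
Because the error is caught per movie, the failing entry is still returned, just without the extra field.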

03:51.150 --> 03:53.310
So doh, doh, doh, doh.

03:53.430 --> 04:01.350
So now we are going to make another async function, which is going to be our main function.

04:02.100 --> 04:03.780
And in here,

04:05.150 --> 04:07.190
we call the previous functions

04:07.190 --> 04:09.440
we wrote, starting with this one.

04:09.440 --> 04:18.590
So this one is now returning the movies array with all the titles, ratings, and so on.

04:18.980 --> 04:29.060
And then we pass the movies into our scrape poster URL function, and this one is going to return

04:29.060 --> 04:33.470
the movies array, this time with the poster URL added to each element.

04:34.370 --> 04:40.760
And then we can try and do a console log to see how our array is looking.

04:41.000 --> 04:46.700
And now instead of calling this one scrape titles and ranks, we are going to call Main instead.

04:48.410 --> 04:49.130
Okay.

04:50.560 --> 04:52.990
Now let's try and run it.

04:55.100 --> 04:57.680
So I am getting a type error here.

04:57.680 --> 05:02.900
Movies dot map is not a function. That is odd.

05:02.930 --> 05:07.040
Well, that's because I need to also await this function here.

05:07.130 --> 05:12.080
And I also need to await this one, because they're both asynchronous functions.
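
NOTE
A small sketch of where the "movies.map is not a function" error comes from, under the
assumption that the first scraper is an async function. scrapeTitlesAndRanks here is a
hypothetical stub that returns fixed data instead of scraping anything.
```javascript
// Hypothetical stub for the first scraper; any async function behaves the same way.
async function scrapeTitlesAndRanks() {
  return [{ title: "Example", rank: 1 }];
}
async function main() {
  const notAwaited = scrapeTitlesAndRanks(); // a Promise, not an array
  console.log(typeof notAwaited.map); // "undefined", so calling .map() would throw
  const movies = await scrapeTitlesAndRanks(); // the resolved array
  console.log(Array.isArray(movies)); // true
  return movies;
}
main();
```
An async function always returns a promise, so the caller has to await it before treating the result as an array.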

05:13.210 --> 05:16.090
Let's try again, ladies and gentlemen.

05:20.980 --> 05:22.630
Moment of truth.

05:23.410 --> 05:27.550
So we get some errors here.

05:27.700 --> 05:30.730
So, like I said,

05:32.000 --> 05:35.240
there are some errors.

05:35.240 --> 05:41.490
We get some elements that are missing the URL or something.

05:41.520 --> 05:43.760
We get those up here.

05:44.420 --> 05:48.170
I can also just comment that out for now

05:48.170 --> 05:51.830
if I don't want my console log to be messy.

05:51.860 --> 05:58.300
We could also filter out any elements that don't have a title, to clean up

05:58.310 --> 06:02.420
the data we're working with, but for now I'm just going to keep it as it is.
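
NOTE
The clean-up mentioned here is not done in the video. As a hedged sketch, it could be a
single filter call. dropUntitled is a hypothetical helper and the field name title is an assumption.
```javascript
// Keeps only entries with a non-empty title; the field name "title" is an assumption.
function dropUntitled(movies) {
  return movies.filter((movie) => Boolean(movie.title));
}
console.log(dropUntitled([{ title: "A" }, { title: "" }, {}]).length); // 1
```
Running this before the poster requests would also avoid fetching pages for incomplete entries.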

06:03.020 --> 06:09.110
But as you can see, after the errors we get here, we see the actual elements we have here.

06:09.110 --> 06:13.280
We have a poster URL for each of these elements.

06:14.860 --> 06:24.520
Also, just like in our other example, I'm going to prepend this imdb.com to the URL.

06:24.520 --> 06:26.950
So it's just ready to go.
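
NOTE
A minimal sketch of turning a site-relative href into a full URL. It uses the built-in URL
constructor rather than the string prepending described in the video. BASE_URL is an
assumption (only "imdb.com" is named), and toAbsoluteUrl and the sample path are hypothetical.
```javascript
// Assumed base URL; only "imdb.com" is named, so the scheme and "www" are guesses.
const BASE_URL = "https://www.imdb.com";
// Resolves a site-relative href against the base URL.
function toAbsoluteUrl(href) {
  return new URL(href, BASE_URL).href;
}
console.log(toAbsoluteUrl("/some/poster/path")); // https://www.imdb.com/some/poster/path
```
Plain string concatenation gives the same result for paths that start with a slash.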

06:28.150 --> 06:30.340
Let's try and

06:31.210 --> 06:32.020
run it again,

06:32.050 --> 06:33.340
see how it looks.

06:40.360 --> 06:41.140
All right.

06:41.320 --> 06:48.470
Now we get these nice poster URLs here for each of the movies.

06:48.490 --> 06:52.060
So if you go to this URL here,

06:56.970 --> 06:59.250
you get to the poster itself.

07:00.570 --> 07:01.170
See.

07:02.490 --> 07:09.630
So in the next section, we are going to be looking at scraping with Nightmare, scraping the picture

07:09.630 --> 07:15.390
itself, and why we actually need to be using Nightmare in this example.

07:15.990 --> 07:16.590
Okay.

07:16.620 --> 07:18.090
See you in the next section.
