WEBVTT

00:01.760 --> 00:09.980
All right, so now we've changed our script a bit so we can scrape all of the URLs

00:09.980 --> 00:11.810
and we get the nice array back.

00:11.810 --> 00:17.060
But now we also need to actually go through each of these description pages.

00:17.060 --> 00:19.610
So let's try and do that in this lecture.

00:21.550 --> 00:23.440
So let's create a new function.

00:23.680 --> 00:28.600
We'll call it scrapeDescriptionPage.

00:29.140 --> 00:30.640
We pass in a URL

00:32.760 --> 00:33.880
and

00:36.880 --> 00:38.920
add a try/catch block.

00:39.160 --> 00:40.870
It's always nice to do.

00:42.640 --> 00:44.400
Um, let's see.

00:44.410 --> 00:48.700
So we pass in each home's description page URL.

00:50.690 --> 00:57.740
And then, in this function, we want to navigate to that description page.

00:58.190 --> 01:05.480
I don't want to launch a new browser for every home's description URL we visit.

01:05.480 --> 01:12.530
It would be a huge waste of resources to open a totally new browser window every time

01:12.530 --> 01:14.120
we have a new URL.

01:14.330 --> 01:20.780
Instead, we want to open one new page that we use only for scraping the description

01:20.780 --> 01:26.660
pages, and navigate that same page to each new URL.

01:27.260 --> 01:30.500
Let me show you exactly what I mean in code.

01:31.010 --> 01:39.410
So I'm going to declare the browser variable up here in the global scope and I'm going to move this

01:39.410 --> 01:43.370
statement into our main function instead.

01:43.370 --> 01:55.530
Remember to remove the const, so it assigns to the global variable. Then, instead of making a

01:55.530 --> 02:02.040
new page for every description page we go through (say, ten new pages), I just want

02:02.040 --> 02:05.580
to use one page for all of these URLs.

02:05.910 --> 02:09.900
So I'm also going to create a new page here.

02:09.900 --> 02:11.460
Description page.

02:12.790 --> 02:18.520
And let's say await browser.newPage().

02:19.750 --> 02:24.640
So that's going to create a new page which we can pass into our function here.

02:24.910 --> 02:30.220
So we add another parameter to the function up here, which we call page.

02:33.440 --> 02:37.430
And let's see.

02:38.000 --> 02:43.520
Remember to put an await here, and then we pass in the description page.

02:45.650 --> 02:46.370
All right.

02:46.400 --> 02:52.460
Now we just need to make the call in here to see if it's working.

02:52.460 --> 02:57.440
So we say await page.goto() and pass it the URL.

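NOTE
A minimal sketch of the function built so far, assuming the parameter names url and page and that failures are simply logged. The page argument only needs an async goto(url) method, which a real Puppeteer Page has; stubPage is a hypothetical stand-in so the example runs without a browser. The true/false return value is an illustrative addition, not part of the lecture's code.
```javascript
// Navigate an already-open, shared page to one description URL.
// `page` only needs an async goto(url) method, like a Puppeteer Page.
// The true/false return value is an illustrative addition.
async function scrapeDescriptionPage(url, page) {
  try {
    await page.goto(url); // reuse the same page instead of opening a new one
    // Scraping the description values would go here in a later step.
    return true;
  } catch (err) {
    // Log the failure so one bad URL does not stop the whole run.
    console.error(`Failed to load ${url}: ${err.message}`);
    return false;
  }
}
// Hypothetical stub standing in for a Puppeteer page, so no browser is needed.
const stubPage = {
  visited: [],
  async goto(url) {
    if (!url.startsWith("https://")) throw new Error("invalid URL");
    this.visited.push(url);
  },
};
```
Catching inside the function keeps one failing description page from rejecting the caller's whole loop.
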
02:58.220 --> 03:01.810
So that should go through the array.

03:01.820 --> 03:12.020
We get the array of homes, and we should go through each of these home URLs and open the

03:12.020 --> 03:13.370
description page.

03:13.790 --> 03:17.540
And so we need to make a for loop.

03:18.770 --> 03:27.140
I'm going to make a good old-fashioned for loop, because with an await inside it, a for loop runs in serial, unlike something

03:27.140 --> 03:30.980
like forEach, which starts every async callback without waiting for the previous one, so they effectively run in parallel.

03:31.070 --> 03:34.310
And we don't want to run this in parallel.

03:34.310 --> 03:39.380
Since everything shares one page, the navigations would interfere with each other, so it's not going to end well.

03:39.500 --> 03:46.310
You might want to mess around with it later if you want to optimize things

03:46.310 --> 03:47.450
and run it in parallel.

03:47.480 --> 03:50.030
But for now, we just keep it in serial.

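NOTE
A small, browser-free sketch of the serial versus forEach difference described above. The helper names fakeVisit, serial and withForEach are hypothetical, and a timer stands in for a page load. In the for loop, each await finishes before the next iteration starts; with forEach, the returned promises are ignored, so every async callback is started back to back and forEach returns before any of them completes.
```javascript
// Simulate a page visit with a short timer; no browser is involved.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function fakeVisit(url, log) {
  log.push(`start ${url}`);
  await sleep(10);
  log.push(`end ${url}`);
}
// Plain for loop with await: visits happen strictly one after another.
async function serial(urls) {
  const log = [];
  for (let i = 0; i < urls.length; i++) {
    await fakeVisit(urls[i], log); // waits for this visit before the next starts
  }
  return log;
}
// forEach with an async callback: forEach ignores the returned promises,
// so every visit starts immediately and none of them is awaited here.
async function withForEach(urls) {
  const log = [];
  urls.forEach(async (url) => {
    await fakeVisit(url, log);
  });
  const snapshot = [...log]; // what has happened by the time forEach returns
  await sleep(50); // only so the demo can also show the later "end" entries
  return { snapshot, log };
}
```
With ["a", "b"], serial logs start a, end a, start b, end b, while withForEach has already logged start a and start b before either visit ends. With one shared page, that overlap is exactly the situation to avoid.
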
03:51.080 --> 04:01.850
So we take the homes array, go through each of its entries, and pass in the URL,

04:04.140 --> 04:10.740
so homes[i], along with the description page, which we create up here.

04:11.760 --> 04:12.240
Yeah.

04:12.240 --> 04:19.080
So this should just open each of these description pages, one after another, in that single page.

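NOTE
A sketch of the overall flow described in this lecture: a browser variable declared at module scope and assigned in main() without const, one description page created once and reused, and a serial for loop over the home URLs. Assumptions: main receives the homes array and a hypothetical launchBrowser function; in the real script, homes comes from the earlier listing scrape and launchBrowser would wrap puppeteer.launch(), whose options the lecture does not show. makeFakeBrowser is a hypothetical stand-in so the sketch runs without Chrome.
```javascript
let browser; // declared at module ("global") scope, assigned in main()
async function scrapeDescriptionPage(url, page) {
  try {
    await page.goto(url);
  } catch (err) {
    console.error(err);
  }
}
async function main(homes, launchBrowser) {
  browser = await launchBrowser(); // no const, so this sets the global
  const descriptionPage = await browser.newPage(); // opened once, reused
  for (let i = 0; i < homes.length; i++) {
    // Serial: each navigation finishes before the next one starts.
    await scrapeDescriptionPage(homes[i], descriptionPage);
  }
  await browser.close();
}
// Hypothetical fake browser that records pages opened and URLs visited.
function makeFakeBrowser() {
  const fake = { pagesOpened: 0, visited: [], closed: false };
  fake.newPage = async () => {
    fake.pagesOpened += 1;
    return { goto: async (url) => { fake.visited.push(url); } };
  };
  fake.close = async () => { fake.closed = true; };
  return fake;
}
```
Calling main(homes, async () => makeFakeBrowser()) opens exactly one page no matter how many URLs are in homes, which is the resource saving the lecture is after.
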
04:19.980 --> 04:23.110
Let's see what it looks like.

04:23.130 --> 04:24.720
So here it is.

04:25.840 --> 04:29.320
Opening the homes array, I mean the homes page.

04:29.440 --> 04:36.490
And there we are, opening the description page, and it should move on to the next URL every once in a

04:36.490 --> 04:37.250
while.

04:37.270 --> 04:43.570
So you can see it's going to a new page and another one and so on.

04:43.570 --> 04:45.370
You get the picture, right?

04:46.060 --> 04:53.290
So, of course, the next step is to scrape some values from each of these

04:53.290 --> 04:54.190
pages.

04:54.220 --> 04:57.460
And that's the fun part.
