WEBVTT

00:00.920 --> 00:07.430
So now we've got all of the job titles and the URLs to the job descriptions into one nice little array

00:07.430 --> 00:09.320
with all the objects.

00:09.320 --> 00:12.470
But what about the job descriptions as well?

00:12.470 --> 00:18.350
The ones that you see when you actually click on the job title or the URL for the job.

00:18.380 --> 00:20.990
That's what we're going to get in this section.

00:21.230 --> 00:26.510
We're going to scrape all of the text in here, the job description, for each of the

00:26.510 --> 00:27.320
jobs.

00:28.190 --> 00:31.640
So let's go into the code and see how we approach this.

00:32.240 --> 00:37.700
So first of all, I think it's a good idea to call this function something else, because we're going

00:37.700 --> 00:44.000
to have separate functions for scraping the jobs themselves on the index pages and for scraping the job descriptions.

00:44.000 --> 00:47.030
It's nice to separate things this way.

00:47.630 --> 00:53.060
So I'm going to call this scrapeJobsFromIndexPages.

00:54.670 --> 01:00.280
And then I'm going to make a new async function called main.

01:01.700 --> 01:06.830
And let's call main down here instead of the other function.

01:07.310 --> 01:12.740
Then, inside main, we call scrapeJobsFromIndexPages

01:12.740 --> 01:16.220
and get the allJobs array back from it instead.

01:17.310 --> 01:18.510
Just like this.

01:18.540 --> 01:23.190
Now make sure to also return allJobs inside of this function.

01:25.110 --> 01:26.070
This one.
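
NOTE
A minimal sketch of the refactor described above, assuming the camelCase names scrapeJobsFromIndexPages and main. The index-page scraper is stubbed with one hard-coded, hypothetical job, because the real scraping code from the earlier section is not repeated here.
```javascript
// Stub for the index-page scraper from the previous section (hypothetical data).
async function scrapeJobsFromIndexPages() {
  const allJobs = [];
  // The real version pushes one { title, url } object per job found on each index page.
  allJobs.push({ title: "Example job", url: "jobs/example" });
  // Returning the array lets other functions reuse the scraped jobs.
  return allJobs;
}
// main() is now the single entry point that the script calls.
async function main() {
  const allJobs = await scrapeJobsFromIndexPages();
  return allJobs;
}
main().then((jobs) => console.log(jobs));
```
Keeping main() as the only top-level call makes it easy to chain further steps after the index pages have been scraped.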

01:26.070 --> 01:30.930
Next up, let's create our new function for scraping the job descriptions.

01:31.740 --> 01:34.800
So let's add it

01:35.710 --> 01:36.520
here.

01:36.520 --> 01:39.940
Let's call it scrapeJobDescriptions.

01:41.030 --> 01:43.160
Make sure it's also an async function.

01:45.160 --> 01:47.410
Here's what we're going to do inside it.

01:47.410 --> 01:53.110
It receives the allJobs array as a parameter, and then we iterate through all of the jobs

01:53.110 --> 01:54.670
and get the descriptions.

01:54.970 --> 02:00.640
We can call .map on the array, and the callback gets a job object for each of these jobs.

02:01.120 --> 02:07.900
Then, in here, we can make an Axios request and get the HTML with the job description.

02:08.200 --> 02:10.480
So we can create a variable called jobDescriptionPage,

02:11.340 --> 02:13.470
and we set it with await.

02:14.390 --> 02:20.900
Make sure the callback is marked async as well, because we're going to be using await inside this

02:20.900 --> 02:21.530
loop.
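
NOTE
Because the callback passed to .map is async, .map returns an array of Promises rather than the finished results. The sketch below uses a hypothetical fetchHtml helper in place of the Axios call and shows how Promise.all can wait for all of those Promises and collect the results in the original order.
```javascript
// Hypothetical stand-in for the Axios request; returns fake HTML for the given url.
async function fetchHtml(url) {
  return `<div>Description for ${url}</div>`;
}
async function scrapeJobDescriptions(allJobs) {
  // With an async callback, .map() returns one pending Promise per job.
  const pending = allJobs.map(async (job) => {
    const html = await fetchHtml(job.url);
    return html;
  });
  // Promise.all resolves once every Promise has resolved, keeping the original order.
  return Promise.all(pending);
}
scrapeJobDescriptions([{ url: "jobs/1" }, { url: "jobs/2" }]).then((pages) =>
  console.log(pages)
);
```
When the descriptions are only logged inside the loop, awaiting the array is not strictly needed; it matters once the results have to be used after every request has finished.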

02:22.460 --> 02:26.930
So we say await axios.get.

02:27.170 --> 02:31.430
And we need to pass the root of the URL first,

01:31.430 --> 01:36.980
which is prefix list dot vercel dot app slash.

02:36.980 --> 02:40.760
And then we just append the job URL with a plus.
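
NOTE
The exact root URL is not clear from the audio, so this sketch uses a placeholder BASE_URL. It compares plain string concatenation, as done in the video, with the URL class built into Node.js, which resolves the job path against the root and avoids a doubled slash when the job URL already starts with one.
```javascript
// Placeholder root URL; replace it with the site actually being scraped.
const BASE_URL = "https://example.vercel.app/";
// Concatenation, as in the video: correct only if jobUrl has no leading slash.
function jobUrlByConcat(jobUrl) {
  return BASE_URL + jobUrl;
}
// The WHATWG URL class (a global in Node.js) resolves jobUrl against BASE_URL.
function jobUrlByResolve(jobUrl) {
  return new URL(jobUrl, BASE_URL).href;
}
console.log(jobUrlByConcat("jobs/42"));   // https://example.vercel.app/jobs/42
console.log(jobUrlByConcat("/jobs/42"));  // https://example.vercel.app//jobs/42
console.log(jobUrlByResolve("/jobs/42")); // https://example.vercel.app/jobs/42
```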

02:41.750 --> 02:48.230
After that, we need to create the Cheerio object, the jQuery-style selector,

02:49.620 --> 02:54.750
where we say cheerio.load with the job description page.

02:54.750 --> 02:56.250
Make sure it's dot data.

02:56.250 --> 02:58.020
That's the actual HTML.

02:58.470 --> 03:03.150
Next, let's check out what this page actually looks like.

03:03.150 --> 03:09.060
Well, you can see here that there is just one div element on this page.

03:09.060 --> 03:15.450
It doesn't have any class or ID, but we can simply select the div element on this page

03:15.450 --> 03:18.330
and get the job description from it.

03:20.520 --> 03:21.480
So let's do that.

03:21.480 --> 03:23.910
We can just say const description.

03:25.570 --> 03:31.330
use our Cheerio selector to get the div element on this page and get its text.
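
NOTE
A sketch of the parsing step, assuming loadHtml behaves like cheerio.load and response is the object returned by axios.get. Both are passed in as parameters so the example runs without either package installed; fakeLoad is a hypothetical stub that imitates only the small part of Cheerio used here. The trim() call is an addition that drops surrounding whitespace.
```javascript
// Assumes loadHtml behaves like cheerio.load and response like an axios response.
function extractDescription(loadHtml, response) {
  // axios exposes the response body, here the page's HTML, as response.data.
  const $ = loadHtml(response.data);
  // The page has a single <div> without class or id, so the tag selector is enough.
  return $("div").text().trim();
}
// Minimal stub imitating only $(selector).text(); NOT a real HTML parser.
function fakeLoad(html) {
  const match = html.match(/<div>([\s\S]*?)<\/div>/);
  return (selector) => ({ text: () => (selector === "div" && match ? match[1] : "") });
}
const response = { data: "<html><body><div> Senior developer wanted </div></body></html>" };
console.log(extractDescription(fakeLoad, response)); // Senior developer wanted
```
With the real libraries, the call would look like extractDescription(cheerio.load, await axios.get(url)).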

03:31.630 --> 03:34.990
Then we can console.log the description

03:36.780 --> 03:38.940
and see what we get.

03:39.300 --> 03:42.960
Make sure to call scrapeJobDescriptions down here.

03:42.990 --> 03:48.870
Let's call it with the allJobs array returned by the function we just wrote.
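
NOTE
A sketch of how main might wire the two steps together, with both scrapers replaced by hypothetical stubs. The point is the order: main awaits the index-page results before passing them on, and a catch on the top-level call reports a failed request instead of leaving an unhandled rejection.
```javascript
// Stubs standing in for the two scrapers built in this section (hypothetical data).
async function scrapeJobsFromIndexPages() {
  return [{ title: "Example job", url: "jobs/example" }];
}
async function scrapeJobDescriptions(allJobs) {
  const descriptions = await Promise.all(
    allJobs.map(async (job) => `Description of ${job.title}`)
  );
  descriptions.forEach((description) => console.log(description));
  return descriptions;
}
// main() runs the steps in order: index pages first, then the job descriptions.
async function main() {
  const allJobs = await scrapeJobsFromIndexPages();
  return scrapeJobDescriptions(allJobs);
}
// Attach a catch so a failed request is reported instead of an unhandled rejection.
main().catch((error) => console.error(error));
```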

03:51.690 --> 03:58.200
Okay, let's run node index.js and see if we get some job descriptions in the console.

04:02.140 --> 04:03.340
And there you go.

04:03.340 --> 04:09.490
You can see that we get a lot of descriptions inside of the console, which is awesome.

04:10.700 --> 04:16.880
Now, in the next section, we're going to combine it into the data object we are passing around so

04:16.880 --> 04:23.450
that we get a single data object with the job URLs, titles, and descriptions.
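
NOTE
One possible way to do that combination, sketched here rather than taken from the next section: return a new object from the .map callback that spreads the existing job fields and adds a description property. The title and url field names are assumptions, and getDescription is a hypothetical stub for fetching the job page and reading the div's text.
```javascript
// Hypothetical stub for "fetch the job page and read the div's text".
async function getDescription(job) {
  return `Full description for ${job.title}`;
}
async function addDescriptions(allJobs) {
  return Promise.all(
    allJobs.map(async (job) => {
      const description = await getDescription(job);
      // Spread copies title and url; description is added alongside them.
      return { ...job, description };
    })
  );
}
addDescriptions([{ title: "Example job", url: "jobs/example" }]).then((jobs) =>
  console.log(jobs)
);
```
Returning a new object instead of mutating job keeps the array from the index-page step unchanged.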
