WEBVTT

00:00.350 --> 00:04.280
A brief recap: inside of the jobs with headers

00:04.280 --> 00:08.210
.map, we're returning a promise, and await Promise

00:08.210 --> 00:11.750
.all is waiting for all of these promises to resolve.

00:11.750 --> 00:17.930
And again, this returns a complete array of all the jobs with their descriptions.
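
NOTE
A minimal sketch of the pattern recapped above: map every job to a promise, then await Promise.all.
The names fetchDescription and scrapeDescriptions and the sample jobs are assumptions for illustration;
the stub replaces the real page request so the sketch runs offline.
```javascript
// Hypothetical stand-in for the real page request, so the sketch runs offline.
async function fetchDescription(url) {
  return `description for ${url}`;
}
// Maps every job to a promise, then waits for all of them to resolve.
async function scrapeDescriptions(jobsWithHeaders) {
  return await Promise.all(
    jobsWithHeaders.map(async (job) => {
      job.description = await fetchDescription(job.url);
      return job; // the resolved value becomes one entry of the final array
    })
  );
}
scrapeDescriptions([{ title: "Cook", url: "/a" }, { title: "Driver", url: "/b" }])
  .then((jobs) => console.log(jobs));
```
Promise.all keeps the input order, so the result lines up with the original job list.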

00:19.620 --> 00:22.010
We use return await.

00:22.570 --> 00:26.790
And this is going to return the job description.

00:27.100 --> 00:28.860
All the jobs with their description.

00:28.870 --> 00:33.610
Let's try to console.log all of the data and see what we're getting.

00:39.290 --> 00:42.560
And it's going to take more time.

00:43.400 --> 00:48.530
We can see that there's an undefined in this array.

00:48.530 --> 00:50.960
So I'm clearly doing something wrong here.

00:51.170 --> 00:57.770
And I think I also need to return the object every time, like so.
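
NOTE
A small sketch of why the array held undefined: an async map callback that returns nothing resolves to undefined.
The sample jobs and the describe helper are hypothetical, used only so the example runs without a network.
```javascript
const jobs = [{ title: "Cook" }, { title: "Driver" }];
// Hypothetical description lookup so the example needs no network access.
const describe = async (job) => `About the ${job.title} role`;
async function withoutReturn() {
  return Promise.all(jobs.map(async (job) => {
    job.description = await describe(job);
    // no return here: the async callback resolves to undefined
  }));
}
async function withReturn() {
  return Promise.all(jobs.map(async (job) => {
    job.description = await describe(job);
    return job; // returning the object fills the array with jobs
  }));
}
withoutReturn().then((result) => console.log(result)); // [ undefined, undefined ]
withReturn().then((result) => console.log(result));
```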

01:02.010 --> 01:05.040
And I will try and run it again.

01:07.150 --> 01:14.650
This time we can see an array full of objects containing the description and all of the other properties

01:14.650 --> 01:16.150
that we scraped before.

01:39.500 --> 01:44.810
Now let's try and get the compensation for each job also.

01:44.810 --> 01:54.170
So we go again and select the element, and we can see it's just a span element without any class

01:54.170 --> 02:00.500
or ID, but it's inside this attribute group class.

02:01.160 --> 02:06.860
And so let's try and select the element with the class attribute group instead.

02:07.130 --> 02:10.370
And see if we can get the compensation out of that.

02:12.320 --> 02:20.510
So I could, for example, select the class attribute group and then select the first child

02:20.510 --> 02:22.310
of this element.

02:22.340 --> 02:23.810
Let's try and do that.

02:24.350 --> 02:28.460
So attribute group and text.

02:28.610 --> 02:33.650
We can see there's compensation and the employment type, but we just want the compensation.

02:34.040 --> 02:36.770
So let's select the first child.

02:41.170 --> 02:45.670
And then we get the compensation: $23 per hour.

02:45.700 --> 02:47.410
That's exactly what we want.

02:48.430 --> 02:55.790
I think I want to clean this up a bit and maybe remove the "compensation" in front and just have the 23

02:55.810 --> 02:57.550
slash hour instead.

02:58.760 --> 03:00.440
Let's make a variable first.

03:00.460 --> 03:01.480
Let's try it out.

03:01.480 --> 03:06.670
Compensation, and replace "compensation"

03:07.980 --> 03:09.870
with just an empty string.

03:13.620 --> 03:16.980
And this way our data is a bit cleaner, I think.

03:16.980 --> 03:20.700
So let's do that inside of our Node.js project as well.

03:29.900 --> 03:34.580
First, let's get the raw compensation data.

03:37.390 --> 03:39.850
I think I will make this into a variable first.

03:49.640 --> 03:50.450
Paste that in.

03:53.200 --> 04:01.240
And then let's get the cleaner version of the text without the "compensation" in front, where we use

04:01.240 --> 04:08.860
replace to swap "compensation" for an empty string, like so.

04:09.790 --> 04:14.410
And then we have the data nicely formatted on job.compensation.
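
NOTE
A minimal sketch of the clean-up step described above, using String.prototype.replace.
The exact label text "compensation: " and the helper name cleanCompensation are assumptions;
adjust the label to whatever the raw text on the page actually contains.
```javascript
// Strips the assumed "compensation: " label and any surrounding whitespace.
function cleanCompensation(rawText) {
  return rawText.replace("compensation: ", "").trim();
}
const job = { title: "Cook" };
job.compensation = cleanCompensation("compensation: $23/hour");
console.log(job.compensation); // $23/hour
```
With a string pattern, replace only swaps the first match, which is enough for a single leading label.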

04:16.130 --> 04:18.110
Let's try and see how the code looks.

04:23.960 --> 04:28.940
And we can see we have the compensation here, hourly rate.

04:29.270 --> 04:34.610
Some of the jobs don't have a compensation listed.

04:34.610 --> 04:37.130
But I think it's an interesting property to have, at least.

04:37.130 --> 04:45.020
Then let's put it inside a try/catch block to catch any errors we might get.

04:47.580 --> 04:53.250
I will just output the error with console.log for now, or console.error.
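
NOTE
A hedged sketch of wrapping the compensation step in try/catch so one bad listing does not stop the run.
readRawCompensation and addCompensation are hypothetical names, and the page object is a stand-in for the parsed listing;
setting the field to null on failure is an assumption, not something shown in the video.
```javascript
// Hypothetical extractor: throws when the listing has no compensation text.
function readRawCompensation(page) {
  if (!page.compensation) throw new Error("no compensation listed");
  return page.compensation;
}
function addCompensation(job, page) {
  try {
    job.compensation = readRawCompensation(page).replace("compensation: ", "");
  } catch (error) {
    console.error(error.message); // log the problem and keep going
    job.compensation = null; // assumption: mark the field as missing
  }
  return job;
}
console.log(addCompensation({ title: "Cook" }, { compensation: "compensation: $23/hour" }));
console.log(addCompensation({ title: "Driver" }, {}));
```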

04:56.960 --> 04:58.070
And that's it.

04:59.860 --> 05:06.820
Now we have a good base for building a scraper, at least for Craigslist.

05:06.820 --> 05:12.160
And there's probably a lot of other sites you can use these techniques on.

05:12.940 --> 05:16.630
And let's try and see how many jobs we got.

05:16.660 --> 05:20.350
We should have 120, like there are on the page.

05:22.200 --> 05:24.840
So you can see 120 jobs.

05:24.840 --> 05:27.270
So that's exactly the amount we need.

05:29.090 --> 05:32.960
Now you can keep building on this project.

05:33.080 --> 05:34.910
You can make it

05:35.330 --> 05:39.400
scrape the other pages of this site as well.

05:39.410 --> 05:42.800
We are only scraping one site or one page right now.

05:42.920 --> 05:48.440
There are multiple pages, but you can build it up so it scrapes all of the pages there are of the

05:48.440 --> 05:49.160
jobs.

05:50.370 --> 05:54.870
You can also build a REST API based on this scraper.

05:54.870 --> 06:02.100
So if you go to a REST endpoint, then bam, you get all of the data like we get in the console

06:02.100 --> 06:09.150
here and you could present it inside of a client or just a basic HTML view.

06:09.180 --> 06:10.350
It's up to you.

06:11.440 --> 06:17.950
You could also just save all of the data to a file if that's what your client or you want, like

06:17.950 --> 06:20.890
a CSV file. That's up to you.
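
NOTE
One possible sketch of the CSV idea, using only Node's built-in modules.
toCsvField, toCsv, the column names and the output file name jobs.csv are all assumptions for illustration;
a dedicated CSV library would be a reasonable alternative for larger projects.
```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
// Quotes a value and doubles embedded quotes, the usual CSV escaping convention.
function toCsvField(value) {
  return `"${String(value ?? "").replace(/"/g, '""')}"`;
}
// Builds a header row from the column names, then one row per job.
function toCsv(jobs, columns) {
  const header = columns.map(toCsvField).join(",");
  const rows = jobs.map((job) => columns.map((col) => toCsvField(job[col])).join(","));
  return [header, ...rows].join("\n");
}
const sampleJobs = [{ title: 'Line "prep" cook', compensation: "$23/hour" }];
const csv = toCsv(sampleJobs, ["title", "compensation"]);
// Hypothetical output location: the system temp directory.
fs.writeFileSync(path.join(os.tmpdir(), "jobs.csv"), csv);
```
Missing properties become empty quoted fields, so jobs without a compensation still produce a valid row.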
