WEBVTT

00:00.910 --> 00:06.820
Okay, so now we have all the HTML of the Craigslist page inside of this string variable, but we also

00:06.820 --> 00:11.260
need to be able to select different elements and their values that we want.

00:11.960 --> 00:17.410
And of course, you could maybe use regular expressions, but it's a lot easier and more efficient to

00:17.410 --> 00:21.730
use something like jQuery or, in Node.js's case,

00:21.760 --> 00:22.840
Cheerio.

00:24.660 --> 00:33.210
So you can see we use the dollar sign to define the Cheerio-loaded page, which is going to enable us

00:33.210 --> 00:37.830
to select all the different elements using CSS selectors.

00:39.290 --> 00:43.820
So let's start out by selecting all the titles of the jobs.

00:43.820 --> 00:46.910
So let's open the Chrome developer tools.

00:46.910 --> 00:55.640
We use Ctrl+Shift+I to open the tools, and you can also go into the menu inside Google Chrome,

00:55.640 --> 01:00.080
choose More tools, then Developer tools, to access them.

01:01.820 --> 01:08.450
And then we can select all of the different elements with the icon up here and we can click on them

01:08.450 --> 01:16.280
and then we get into the Elements panel and see how the HTML looks for this specific element.

01:18.030 --> 01:24.790
Well, for example, right here we have the title for the Technical Autonomous Vehicle Trainer.

01:25.030 --> 01:31.690
We have a result-info class inside of this unordered list with the class rows.

01:32.110 --> 01:43.030
And so each job title, the date and so on is inside this result-row list item, which also contains the

01:43.390 --> 01:44.860
result-info class.

01:45.070 --> 01:52.270
Now inside this p with the class result-info we have all the information we want from the first page.

01:54.040 --> 02:00.040
Let's first just define a sample object with all the information that we want to scrape from

02:00.040 --> 02:03.550
this list, just so we get an overview of what we want to get.

02:04.320 --> 02:12.390
So I'm going to call it scrapeResult and give it a title for the job, for example, Technical Autonomous

02:12.390 --> 02:13.620
Vehicle Trainer.

02:13.980 --> 02:20.610
And we're also going to have a description which is a longer text when you click on the link of the

02:20.610 --> 02:27.420
job where there's a long description about the job, the requirements and who they are and so on.

02:28.190 --> 02:30.080
It's all this text down here.

02:31.190 --> 02:34.510
And then I think we also want the.

02:34.550 --> 02:35.570
Let's see.

02:37.040 --> 02:38.150
Let's see what we should get.

02:38.180 --> 02:40.430
We should get the date it was posted.

02:40.430 --> 02:42.500
That could be interesting to get as well.

02:43.730 --> 02:47.300
So I will make that as a JavaScript object.

02:47.660 --> 02:49.730
I mean a JavaScript date object.

02:55.310 --> 03:03.860
Then I think we could also get the URL for the job so you know where you can read the full description

03:03.860 --> 03:05.140
of the job and so on.

03:05.180 --> 03:07.280
See the location, etcetera.

03:10.390 --> 03:14.350
And that is what you can see when you click on the URL in here.

03:14.350 --> 03:15.970
So that's on the first page.

03:16.060 --> 03:21.190
And let's also get the neighborhood that the job is in, or the hood.

03:21.640 --> 03:29.860
And then I think it could also be interesting maybe to get the address of the job.

03:30.880 --> 03:33.940
See the address over here on the right side.

03:37.310 --> 03:38.330
Main Street.

03:40.540 --> 03:43.630
And I think we also want to get the compensation.

03:46.120 --> 03:48.700
Interesting data to get for all of the jobs.

03:48.730 --> 03:50.530
See which one pays the most.

03:51.370 --> 03:56.530
And of course, remember, this is just a sample so you can see what we're actually going to get.

03:56.860 --> 04:01.330
We're going to get this for every job on this page inside of Craigslist.
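
NOTE
A rough sketch of the sample object walked through above. The name scrapeResult and the key names are my best guess at what is typed on screen, and every value except the job title and Main Street is a placeholder, including the date and URL.
```javascript
// Sample of one scraped job; the values are placeholders for illustration only.
const scrapeResult = {
  title: "Technical Autonomous Vehicle Trainer",
  // Longer text found on the job's own page after following its link.
  description: "Long description of the job, the requirements, who they are",
  // Stored as a JavaScript Date object; this particular date is made up.
  datePosted: new Date("2018-07-13"),
  // Hypothetical URL of the job's detail page.
  url: "https://example.org/job/technical-autonomous-vehicle-trainer.html",
  hood: "(neighborhood placeholder)",
  address: "Main Street",
  compensation: "(compensation placeholder)"
};
// List the fields we intend to collect for every job on the page.
console.log(Object.keys(scrapeResult));
```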
