WEBVTT

00:00.380 --> 00:02.930
Okay, so now enough talking.

00:02.930 --> 00:07.820
It's time to actually start trying to get some data from the website.

00:07.820 --> 00:13.640
So let's try to get the job titles first.

00:14.340 --> 00:21.570
So instead of writing our CSS selectors in NodeJS and testing them there, it's

00:21.570 --> 00:27.930
a lot easier to try them in the Chrome browser first and check that you can get the data you need,

00:27.960 --> 00:33.990
because if you do it in NodeJS first, you have to start up the Chromium browser every time

00:33.990 --> 00:36.060
and then you go, okay, that didn't work.

00:36.090 --> 00:42.990
Then you have to close the browser, change some code, and start the browser up again.

00:42.990 --> 00:45.060
It all just takes a lot longer.

00:45.060 --> 00:53.220
So let me show you how I debug and find out which CSS selectors to use, directly inside the browser.

00:53.550 --> 01:00.840
So on the page here with all the listings, let's press F12 to open the Chrome developer

01:00.840 --> 01:01.620
tools.

01:01.950 --> 01:09.810
And then in here, let's click the inspect tool to select an element on the page and see exactly where in the

01:09.810 --> 01:11.560
page's HTML

01:11.580 --> 01:13.410
the element is located.

01:13.590 --> 01:20.170
So now I have the inspect tool enabled, and I can click on the title here.

01:20.170 --> 01:27.100
So if I click on that, I'm taken to exactly where in the HTML the element is.

01:27.400 --> 01:35.290
So now I can inspect the element and see how it is made up: what attributes and classes

01:35.290 --> 01:36.370
it uses.

01:38.310 --> 01:45.650
So I can see here that the text content of this anchor (a) element is the job title.

01:45.660 --> 01:52.650
And then there's an href attribute, which contains the link to the job description.

01:52.680 --> 01:59.130
Then there's a data-id attribute, probably something that Craigslist uses internally

01:59.130 --> 02:03.030
when generating the page. It's not relevant to us right now.

02:03.420 --> 02:12.990
And then there is a class attribute containing result-title and hdrlnk, which is shorthand for header

02:12.990 --> 02:13.530
link.

02:13.650 --> 02:19.380
So now we can select this element using a CSS class selector.

02:19.620 --> 02:27.960
We can use result-title here to select all the elements with this CSS class.

02:29.130 --> 02:33.450
And using this, we can then extract all the job titles from this website.

02:33.630 --> 02:41.010
So let's go ahead and copy this result-title class name with Ctrl+C, just so we

02:41.010 --> 02:42.900
have it on the clipboard.

02:44.370 --> 02:48.000
So now I'm going to write out the jQuery selector in the console.

02:48.000 --> 02:55.590
That's going to be a dollar sign, then an opening parenthesis, then a quote, and then a dot

02:55.620 --> 02:58.350
in front, followed by result-title.

02:58.710 --> 03:09.300
The dot in front is there because it is a CSS class. If it were an ID, the prefix would be a hash sign (#).

03:09.420 --> 03:12.210
If it were just an element, like an a element,

03:12.210 --> 03:19.530
it would just be the tag name, a. But because we're targeting a CSS class, and all these titles have the

03:19.530 --> 03:22.010
CSS class result-title,

03:22.020 --> 03:23.940
we put the dot in front.

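NOTE
A minimal sketch of the prefix rule just described, written as a hypothetical helper (selectorKind is not part of jQuery or any library). It only looks at the first character, so it covers simple selectors like the ones in this lesson, not compound ones such as a.result-title; the id some-id is a made-up example.
```javascript
// Hypothetical helper: classify a *simple* CSS selector by its first character.
// Compound selectors (e.g. "a.result-title") are out of scope for this sketch.
function selectorKind(selector) {
  if (selector.startsWith(".")) return "class"; // .result-title matches elements whose class list includes result-title
  if (selector.startsWith("#")) return "id"; // #some-id matches the element with id="some-id"
  return "element"; // a matches every a (anchor) element
}
console.log(selectorKind(".result-title")); // prints: class
console.log(selectorKind("#some-id")); // prints: id
console.log(selectorKind("a")); // prints: element
```
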
03:24.930 --> 03:31.500
Then we add a dot after it, because we are going to iterate through each of these elements and do something with each one.

03:31.590 --> 03:38.070
So we call .each, and the callback we pass to it takes two arguments.

03:38.100 --> 03:46.110
The first is the index, which is just the element's position in the matched set, and the second is the element

03:46.110 --> 03:47.610
itself.

03:48.610 --> 03:56.050
The element is a plain DOM element, not a jQuery object, so we wrap it in the dollar sign function to extract data from it.

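NOTE
To make the calling convention concrete, here is a hypothetical helper, eachLikeJQuery, that mimics how jQuery's .each() invokes its callback on a plain array: index first, element second, this set to the element, and returning false stops the loop early. It is a sketch of the convention, not jQuery's actual source, and the sample titles are made up. Array.prototype.forEach passes its arguments in the opposite order, which is an easy thing to mix up.
```javascript
// Hypothetical helper mimicking jQuery's .each() calling convention on a plain array.
function eachLikeJQuery(items, callback) {
  for (let index = 0; index < items.length; index++) {
    const element = items[index];
    // Call the callback with this = element and arguments (index, element);
    // returning false from the callback ends the loop early.
    if (callback.call(element, index, element) === false) break;
  }
  return items; // jQuery's .each() likewise returns the collection it was called on
}
const sampleTitles = ["Line cook", "Barista", "Dishwasher"]; // made-up sample data
// Index first, element second (same order as the jQuery .each() callback):
eachLikeJQuery(sampleTitles, (index, element) => {
  console.log(index, element);
});
// Array.prototype.forEach uses the opposite order: (element, index).
sampleTitles.forEach((element, index) => console.log(index, element));
```
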
03:56.690 --> 04:03.320
So for each of these result titles, we call console.log.

04:03.710 --> 04:09.710
Inside it, we use the dollar sign again, this time passing in the element, and call .text().

04:12.750 --> 04:19.620
Then we close all of the parentheses, and when I press Enter, I see all of the titles here.

04:23.130 --> 04:26.100
So that's all of the job titles right there for us.

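NOTE
A small sketch that collects the titles into an array instead of only logging them, which is closer to what a scraper needs. It assumes $ is a jQuery-compatible function: on the listings page that means the page exposes jQuery as $ (if it does not, the Chrome console's own $ is only a document.querySelector shortcut without .each()). The function name collectJobTitles is hypothetical, and the idea that the same function could take a cheerio-loaded document in Node.js is an assumption, not something shown in this lesson.
```javascript
// Hypothetical helper: collect the job titles using the selector found above.
// `$` is assumed to be jQuery-compatible (jQuery in the browser, or possibly cheerio in Node.js).
function collectJobTitles($) {
  const titles = [];
  $(".result-title").each((index, element) => {
    // element is a raw node, not a wrapped object, so wrap it in $() before calling .text();
    // trim() removes any surrounding whitespace from the text.
    titles.push($(element).text().trim());
  });
  return titles;
}
// Usage in the Chrome console on the listings page, assuming `$` is jQuery there:
// console.log(collectJobTitles($));
```
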
04:28.640 --> 04:35.700
So after some fiddling around, we've found out which CSS selector we can use to get this property:

04:35.720 --> 04:37.100
the job titles.

04:37.100 --> 04:43.220
Now we're going to move on to the NodeJS section, where we put this selector into NodeJS and actually use it

04:43.220 --> 04:45.860
to scrape the data.

04:45.950 --> 04:47.780
So see you in the next section.
