WEBVTT

00:00.320 --> 00:05.540
So now it's time to finally go and write our scraping code inside of Node.js,

00:05.570 --> 00:10.490
now that we have found out which CSS selector to use inside the Chrome browser.

00:11.120 --> 00:17.060
So first of all, let's make sure that you're using the URL where all the listings are.

00:17.090 --> 00:19.430
So this URL I have right here.

00:19.880 --> 00:22.910
And let's use that instead of Google.com.

00:24.750 --> 00:30.240
The first thing we need to do is to get the page HTML from Puppeteer.

00:30.240 --> 00:31.350
So how do we do that?

00:31.350 --> 00:33.570
Well, let me show you first.

00:33.570 --> 00:37.980
Let me just make this code window a little bigger.

00:38.490 --> 00:43.320
So we're going to say const html =

00:44.380 --> 00:47.410
await page.content().

00:48.210 --> 00:48.910
That's it.

00:48.930 --> 00:53.940
That's how you get the HTML back from a page using Puppeteer.
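
NOTE
A minimal sketch of the step just described, not the exact code from the video.
The helper name getPageHtml and the LISTINGS_URL placeholder are assumptions made
for illustration; only newPage, goto, content and close are Puppeteer calls.
```javascript
// Hypothetical helper: returns the HTML of `url` as a string.
// `browser` is expected to be a Puppeteer Browser (from puppeteer.launch()).
async function getPageHtml(browser, url) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    // page.content() resolves to the full HTML of the page
    return await page.content();
  } finally {
    // Close the tab even if navigation fails
    await page.close();
  }
}
// Usage (assumes `npm install puppeteer`; LISTINGS_URL is a placeholder):
//   const puppeteer = require("puppeteer");
//   const browser = await puppeteer.launch({ headless: false });
//   const html = await getPageHtml(browser, LISTINGS_URL);
//   await browser.close();
```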

00:54.270 --> 01:00.360
Now the next thing we need to do is to somehow be able to select the elements or the data that we need,

01:00.360 --> 01:03.660
just like we did inside the console in the browser.

01:05.100 --> 01:12.390
Now, remember, at the beginning of this section, we started by installing Puppeteer and a package called

01:12.390 --> 01:13.260
Cheerio.

01:13.440 --> 01:19.170
So Cheerio is going to enable us to select elements from an HTML page.

01:20.000 --> 01:23.780
So let's import Cheerio up here at the top of the file.

01:23.780 --> 01:27.590
Let's say const cheerio = require(...),

01:27.830 --> 01:28.910
with 'cheerio' as the package name.

01:30.010 --> 01:37.600
And then down here again, let's say const, and then we have the dollar sign, just like we had the

01:37.600 --> 01:40.150
dollar sign inside of the Chrome browser.

01:41.120 --> 01:48.440
And then we say cheerio.load, and we pass in the HTML from the page.

01:49.450 --> 01:56.770
And then Cheerio is going to parse and load the HTML that we have from the page, and it's going to enable

01:56.770 --> 02:00.400
us to select elements just like we did inside the Chrome browser.

02:00.730 --> 02:08.380
So now we can do the magic trick here where we just take the code that we had from the console inside

02:08.380 --> 02:14.740
the Chrome browser, which was this one here where we get all the job titles.

02:15.160 --> 02:20.200
So I'm just going to select that and copy and paste it in here.

02:21.240 --> 02:29.040
So now we have this code right here and it's going to console log all of the job titles.

02:30.600 --> 02:37.530
And notice how it is using the same dollar sign variable just like we do inside of the browser.

02:37.530 --> 02:45.360
We just created our own dollar sign variable for the jQuery selector using Cheerio inside of Node.js.
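
NOTE
A rough sketch of the Cheerio step, under stated assumptions. The function name
extractTitles and the ".result-title" selector in the usage comment are made up
for illustration; use whichever selector you tested in the Chrome console.
```javascript
// Sketch only: collects the text of every element matching `selector`.
// `load` defaults to cheerio.load (requires `npm install cheerio`) and is a
// parameter only so that a different parser can be swapped in.
function extractTitles(html, selector, load = require("cheerio").load) {
  // Build our own `$`, which works like the jQuery `$` in the browser
  const $ = load(html);
  const titles = [];
  // Visit every matched element and collect its trimmed text content
  $(selector).each((index, element) => {
    titles.push($(element).text().trim());
  });
  return titles;
}
// Usage inside the async scraping function, after page.content():
//   const html = await page.content();
//   extractTitles(html, ".result-title").forEach((title) => console.log(title));
```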

02:46.750 --> 02:48.310
So now enough talking.

02:48.310 --> 02:54.430
Let's try and run the code and see if it actually prints out all of the job titles, just like it did

02:54.430 --> 02:56.410
inside the Chrome console.

02:56.740 --> 02:58.990
So node index.js.

02:59.020 --> 03:00.760
Let's see what happens.

03:03.340 --> 03:06.130
So now it's starting up the browser here.

03:07.580 --> 03:13.910
And it looks like it loaded the page, and inside the terminal I can see all of the

03:13.910 --> 03:14.810
job titles.

03:14.810 --> 03:18.890
So I hope you got to the same point as me right now.

03:18.890 --> 03:22.970
And you're well on your way to becoming a scraping ninja.

03:23.570 --> 03:27.650
Now, of course, we need to get some more properties.

03:27.650 --> 03:32.120
We also need to get the date it was posted and the job description URL.

03:32.300 --> 03:35.540
So that's what we're going to be looking at in the next section.
