WEBVTT

00:01.200 --> 00:03.960
So what if you couldn't get your request working?

00:03.960 --> 00:10.140
Say that, looking inside of Chrome Developer Tools, you couldn't figure out how to get the login

00:10.140 --> 00:15.660
request or you couldn't figure out how to replicate it inside of Postman.

00:16.320 --> 00:17.190
Well.

00:17.990 --> 00:21.320
There is a fix for that, and that is to use Puppeteer.

00:21.350 --> 00:29.360
Now, it is going to be the last option I would use because Puppeteer launches a browser, a Chrome

00:29.360 --> 00:36.020
browser and automates it, which uses a lot more resources than requests and is a lot more prone to

00:36.020 --> 00:36.950
crashing.

00:37.130 --> 00:44.060
It is a bit more heavy on resources, so if you need to run it somewhere, it's going to take more resources.

00:44.240 --> 00:47.510
So it's going to be the last option I would use.

00:48.380 --> 00:50.540
But it is a viable option to use.

00:50.540 --> 00:54.830
And I'm going to show you how to do it if you can't make it work otherwise.

00:56.410 --> 01:00.310
So first, let's go ahead inside of Visual Studio Code.

01:00.310 --> 01:01.840
Make a new directory.

01:01.870 --> 01:03.880
Let's call it Craigslist.

01:04.910 --> 01:06.080
Puppeteer.

01:07.460 --> 01:10.550
And then initialize npm.

01:14.610 --> 01:15.690
npm init

01:15.690 --> 01:16.140
dash dash

01:16.170 --> 01:16.890
yes.

01:17.430 --> 01:22.440
Then let's go ahead and open the project inside of Visual Studio Code.

01:24.780 --> 01:27.030
Then let's go inside the terminal again.

01:28.360 --> 01:32.080
And this time, let's say yarn add puppeteer.

01:36.960 --> 01:38.670
And it might take some time for you.

01:38.670 --> 01:44.850
If it's the first time you're downloading Puppeteer, it's quite a big package because it's

01:44.850 --> 01:47.430
getting all of the Chrome browser, basically.

01:48.610 --> 01:54.250
Now, once that is done, let's go ahead and make an index.js file.

01:55.610 --> 01:57.860
And let's import puppeteer.

02:05.600 --> 02:11.150
And then let's make a main function, as we always do.

02:11.890 --> 02:14.530
And call it down here.

02:16.450 --> 02:20.830
Let's also put a try/catch block around it just for good measure.

02:28.430 --> 02:29.090
Okay.

02:29.420 --> 02:35.760
Now, the first thing we need to do is to instantiate the browser inside of Puppeteer.

02:35.780 --> 02:38.150
So we go const browser.

02:39.140 --> 02:42.830
await puppeteer dot launch.

02:43.100 --> 02:48.830
And then let's put in the headless option and set that to false so we can see the browser.

02:49.010 --> 02:54.290
If you're deploying the browser on some server somewhere, then you probably want to

02:54.290 --> 02:59.810
set this to true, or just keep the default by removing the parameter.

03:00.140 --> 03:04.550
But while we are debugging, it's really nice to be able to see the browser.

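NOTE
The launch step described above could look roughly like the sketch below.
It is not the instructor's exact file: launchBrowser and its debugging flag
are hypothetical names, and the Puppeteer module is passed in as an argument
so the snippet can be tried without a real browser. Only puppeteer.launch and
its headless option come from Puppeteer itself.
```javascript
// Hypothetical helper: show the browser window while debugging, hide it otherwise.
async function launchBrowser(puppeteer, debugging = true) {
  try {
    // headless: false opens a visible window; headless: true suits a server.
    return await puppeteer.launch({ headless: !debugging });
  } catch (error) {
    console.error("Could not launch the browser:", error);
    throw error;
  }
}
```
Assumed usage: const browser = await launchBrowser(require("puppeteer"));
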
03:05.350 --> 03:10.360
So let's go ahead and type in node index.js and see what happens.

03:12.220 --> 03:16.220
So here you can see it launched a new Chrome browser window.

03:16.240 --> 03:20.830
It says Chrome is being controlled by automated test software.

03:21.340 --> 03:25.090
So that is how far we've got.

03:25.990 --> 03:28.390
Now let's make a new page.

03:28.390 --> 03:34.510
So we say const page equals await browser dot new page.

03:36.150 --> 03:41.160
And then we simply say go to a specific URL.

03:41.550 --> 03:48.810
So await page dot go to, and then we paste in the URL of this Craigslist login page.

03:48.840 --> 03:50.340
So I'm going to log out.

03:51.750 --> 03:55.290
And then we have this URL here with the form to log in.

03:57.390 --> 04:02.160
Then let's paste in the URL here and let's see what we get now.

04:02.190 --> 04:04.710
So let's start it up again.

04:07.030 --> 04:14.440
And now you can see it opens up the browser and goes to a new tab and it starts to go to the site where

04:14.440 --> 04:15.460
we can log in.

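NOTE
A minimal sketch of the two calls just described, browser.newPage followed by
page.goto. The openPage name and the url parameter are hypothetical; pass in
whatever login URL you copied from the address bar.
```javascript
// Hypothetical helper: open a new tab and navigate it to the given URL.
async function openPage(browser, url) {
  const page = await browser.newPage();
  // goto resolves once the page has loaded, so the form should exist afterwards.
  await page.goto(url);
  return page;
}
```
Assumed usage inside main(): const page = await openPage(browser, loginUrl);
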
04:16.880 --> 04:23.120
What we need to do now is to type in our email and our password, and then we need to press the

04:23.120 --> 04:24.140
login button.

04:24.230 --> 04:26.810
So how do you do that inside of Puppeteer?

04:26.840 --> 04:29.060
Well, it is quite easy.

04:30.760 --> 04:39.430
We simply say await page, dot type, and then we need to put in the CSS selector for this element that

04:39.430 --> 04:41.080
we want to type into.

04:42.480 --> 04:45.510
So if you open up Chrome Developer Tools.

04:46.840 --> 04:49.750
And you use the select-element tool over here.

04:50.530 --> 04:56.380
Point at the form input element you want to type into.

04:56.830 --> 05:04.450
For example, for this one you can see it's an input element and it has an ID of inputEmailHandle.

05:04.720 --> 05:12.610
And you can also see, when I hover my mouse over it, that the CSS selector is right up there in the upper right,

05:12.640 --> 05:18.790
upper left corner, saying input#inputEmailHandle.

05:18.970 --> 05:22.510
So let's use that as our CSS selector.

05:24.130 --> 05:30.970
So then we say input#inputEmailHandle.

05:31.240 --> 05:39.910
Let's check... yeah, that was spelled correctly. Then put a comma, and then you type in the text

05:39.910 --> 05:41.590
that you want to type in.

05:44.940 --> 05:47.310
Now let's see how that works.

05:53.580 --> 05:54.480
So there we go.

05:54.480 --> 06:00.840
So now it went to the site and it also typed my email into the login form.

06:01.530 --> 06:06.990
You can also pass an optional third parameter, an options object with a delay.

06:07.020 --> 06:14.010
So if you want a delay between key presses of, say, 20 milliseconds or 200 milliseconds for some reason, then

06:14.010 --> 06:15.020
you can do that.

06:15.030 --> 06:17.160
But for now it works without.

06:17.160 --> 06:18.810
So let's keep it simple.

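NOTE
The typing step and its optional delay might be sketched as follows. The
fillField helper, the delayMs parameter name and the example address are
hypothetical; page.type(selector, text, options) with a delay option is the
Puppeteer call being illustrated, and the selector casing is assumed from
what is read out in the video.
```javascript
// Hypothetical helper wrapping page.type; delayMs is the pause between key presses.
async function fillField(page, selector, text, delayMs) {
  const options = delayMs === undefined ? {} : { delay: delayMs };
  await page.type(selector, text, options);
}
// Assumed usage inside main(), with a placeholder address:
// await fillField(page, "input#inputEmailHandle", "you@example.com");
// await fillField(page, "input#inputEmailHandle", "you@example.com", 200);
```
Leaving delayMs out types as fast as possible, which is what the lesson does.
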
06:20.860 --> 06:23.140
Then let's say we also need the password.

06:23.140 --> 06:28.800
So we need to find out what the CSS selector is for this password field.

06:28.810 --> 06:36.670
So again, we hover the element picker over it, and I can see that it's input#inputPassword.

06:38.940 --> 06:40.650
So we say await page

06:41.330 --> 06:43.790
dot type,

06:44.830 --> 06:46.780
input#inputPassword.

06:47.470 --> 06:50.230
And then I paste in my password.

06:51.850 --> 06:53.710
Which I can't remember in my head.

06:53.710 --> 06:54.430
So.

06:55.270 --> 06:55.840
Oops.

06:56.640 --> 06:57.810
Just like that.

07:00.320 --> 07:05.120
And yeah, so let's make another test and see if it types in the password.

07:07.270 --> 07:08.230
And it does.

07:08.230 --> 07:13.000
So now we just need to click on the login button.

07:13.270 --> 07:19.210
And again, it's very similar to when you use type; you just say click instead of type.

07:19.210 --> 07:22.090
So page, dot, click.

07:22.750 --> 07:27.490
And then put in the CSS selector of the element that you want to click.

07:28.700 --> 07:31.100
Then you go into Chrome Developer Tools.

07:31.130 --> 07:33.470
Find out the CSS selector.

07:34.350 --> 07:35.980
I can see that.

07:36.000 --> 07:37.230
Let's move this away.

07:37.230 --> 07:46.710
Here we can see this is a button with an ID of login, and the class is account form button.

07:46.830 --> 07:52.830
But let's try and just say button#login, where login is the ID.

07:54.770 --> 07:57.440
Which you can also see if you.

07:58.630 --> 07:59.980
Look in here.

08:00.010 --> 08:01.450
Here is the element.

08:01.480 --> 08:04.210
There's the ID and there's the class.

08:04.300 --> 08:09.970
Let's just use the element and ID. That should be enough to select this element.

08:10.510 --> 08:13.540
So the element here and the ID.

08:14.970 --> 08:22.440
And again, you can also set a delay here if you want to do that, but we are not going to use it because

08:22.440 --> 08:24.690
it's not necessary in this situation.

08:25.380 --> 08:31.710
For some sites, maybe you want to make your scraping look a little more human-like and you want

08:31.710 --> 08:33.480
to put in some random delays.

08:33.480 --> 08:34.770
That could happen.

08:34.860 --> 08:37.860
But for this site, it works fine without.

08:41.260 --> 08:42.370
So there we go.

08:42.370 --> 08:50.830
And now it is logged into my account and it went to my account page, where I can see my postings

08:50.830 --> 08:51.760
in here.

08:51.940 --> 08:59.140
And we also need to go to the billing page, which is our objective in this scraping

08:59.140 --> 08:59.950
project.

09:00.400 --> 09:05.500
And so then we need to log in and then go to the billing page.

09:05.500 --> 09:07.390
So let's see how we can do that.

09:08.730 --> 09:18.810
So obviously the most natural thing you might want to do is to say, await page, dot, go to, and

09:18.810 --> 09:22.200
then let's put in the URL we have up here.

09:22.930 --> 09:24.370
For the billing page.

09:24.990 --> 09:29.640
So we go ahead and do that and let's see what happens.

09:36.040 --> 09:39.850
So as you can see, nothing happens.

09:40.120 --> 09:42.280
We're not being redirected to the billing page.

09:42.280 --> 09:43.780
We're not even logged in.

09:44.080 --> 09:49.150
And if you were really fast, you could see that it typed in my login information.

09:49.180 --> 09:55.780
It clicked on the login button, but then it went straight to this billing page instead of waiting for the

09:55.780 --> 09:57.760
login to actually finish.

09:58.390 --> 09:59.830
That's what a human would do.

09:59.860 --> 10:06.400
We would wait until the login has finished and then we would click on the billing page to go to the

10:06.400 --> 10:07.450
billing page.

10:08.800 --> 10:10.570
So how do we do that?

10:10.570 --> 10:15.130
Or how do we make Puppeteer know to do that as well?

10:15.640 --> 10:21.970
The way is to say await page dot waitForNavigation.

10:23.100 --> 10:23.990
And that's it.

10:24.000 --> 10:29.100
Now, it's going to wait until this redirect has happened and the page is loaded.

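NOTE
The wait described above can be sketched as below, with one variation from
the lesson stated plainly: instead of clicking first and calling
waitForNavigation afterwards, the sketch starts both together with
Promise.all. That is the pattern Puppeteer's documentation shows for clicks
that trigger a navigation, so a very fast redirect cannot finish before the
wait begins. The loginThenOpen name and the nextUrl parameter are hypothetical.
```javascript
// Hypothetical helper: click the login button, wait for the redirect, then move on.
async function loginThenOpen(page, buttonSelector, nextUrl) {
  // Start waiting before the click resolves, so the navigation is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click(buttonSelector),
  ]);
  // Only now is it safe to open a page that requires the login.
  await page.goto(nextUrl);
}
```
Assumed usage inside main(): await loginThenOpen(page, "button#login", billingUrl);
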
10:29.940 --> 10:32.520
So let's see what happens now.

10:37.730 --> 10:38.570
And there we go.

10:38.570 --> 10:42.740
So it was quite fast, but it did exactly what we wanted it to.

10:42.770 --> 10:49.730
It went and logged in, and then it went on to the billing page once the navigation had finished.

10:50.610 --> 10:56.730
And now if we want to get this value here, then we can use Cheerio instead.

10:57.610 --> 11:02.530
So let me just stop this one here and then let's say yarn add.

11:02.530 --> 11:03.490
Cheerio.

11:05.840 --> 11:08.420
Then let's import Cheerio up here.

11:12.660 --> 11:17.580
And now we can say const content.

11:18.630 --> 11:21.870
Await page content.

11:22.680 --> 11:29.040
And then we can say const jQuery sign or dollar sign await.

11:29.160 --> 11:30.280
Cheerio.

11:30.330 --> 11:32.250
Load content.

11:34.110 --> 11:36.060
And let's find out.

11:37.560 --> 11:42.330
Then let's find out what the CSS selector is for this billing text.

11:43.120 --> 11:48.130
So if you go and click and select this element here, let's say that's what we want to scrape.

11:48.750 --> 11:51.960
We can try and say right click here.

11:51.990 --> 11:53.030
Say copy.

11:53.040 --> 11:54.390
Copy selector.

11:55.560 --> 11:57.510
Let's see what that selector looks like.

11:57.510 --> 11:58.550
It looks okay,

11:58.560 --> 12:02.460
reasonable. If I try it out in the console here

12:02.490 --> 12:03.630
with dot text,

12:04.660 --> 12:08.260
I can see that we indeed get the text that I'd like to have.

12:08.940 --> 12:14.160
So let's just copy that selector and use it inside of our Cheerio.

12:15.160 --> 12:17.440
And then we can say dot text as well.

12:17.830 --> 12:22.750
And let's just say console log so we can see if we are successful.

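NOTE
A rough sketch of handing the rendered page over to Cheerio, as described
above. The scrapeText name is hypothetical and the Cheerio module is passed
in as an argument; page.content and cheerio.load are the real calls. The
selector in the usage line is a placeholder, not the one copied in the video.
```javascript
// Hypothetical helper: read the rendered HTML from Puppeteer and query it with Cheerio.
async function scrapeText(page, cheerio, selector) {
  const content = await page.content(); // full HTML of the current page
  const $ = cheerio.load(content); // load() is synchronous, so no await is needed
  return $(selector).text().trim();
}
// Assumed usage inside main():
// console.log(await scrapeText(page, require("cheerio"), "#placeholder-selector"));
```
Awaiting cheerio.load, as in the lesson, is harmless but not required.
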
12:24.240 --> 12:27.270
Now let's try and run it again and see if it works.

12:35.320 --> 12:42.880
Now let's check our console log and we can see no paid posting accounts exist, which means that it

12:42.880 --> 12:45.640
indeed did get this information here.

12:47.830 --> 12:49.100
So that is it, people.

12:49.120 --> 12:58.030
That is how you can, as a last resort, sort of use a big hammer to put the nail in the

12:58.030 --> 12:59.660
wood, so to speak.

12:59.680 --> 13:07.870
That is the last option you can use if you can't figure out how to use Postman or request in Node.js to

13:07.870 --> 13:08.440
log in.

13:08.470 --> 13:13.720
Now, as I said before, it does use more resources than requests in NodeJS.

13:13.960 --> 13:16.030
It is a bit more prone to crashing.

13:16.030 --> 13:20.380
If you're loading up a full page, your browser can crash.

13:21.120 --> 13:23.730
But it is a perfectly viable solution

13:23.730 --> 13:27.060
if you can't figure out how to do the login request.

13:27.060 --> 13:34.050
Some sites are just really weird or you can't figure out how you can get this request working inside

13:34.050 --> 13:35.100
of Postman.

13:35.280 --> 13:41.140
And then at some point you can simply say, okay, I'm just going to do this inside of Puppeteer instead.

13:41.160 --> 13:48.960
That is a perfectly fine solution, and I hope you can use this to log in and scrape any other sites

13:48.960 --> 13:50.760
you may have in the future.

13:51.210 --> 13:52.500
So that is it, people.

13:52.500 --> 13:56.190
If you have any other suggestions or questions, let me know.
