WEBVTT

00:00.440 --> 00:07.610
So now let's move on to Visual Studio Code or whatever editor you prefer to use, and let's start making

00:07.610 --> 00:10.640
our project with Node.js and request.

00:10.820 --> 00:13.100
First, let's initialize a project.

00:13.100 --> 00:16.880
So let's go ahead and make a directory first.

00:17.090 --> 00:21.380
Let's call it Craigslist auth scraper.

00:22.150 --> 00:32.830
Let's go inside the directory and initialize npm so we can download some packages with npm init dash

00:32.830 --> 00:33.760
dash yes.

00:34.630 --> 00:38.650
Then let's open that directory inside of Visual Studio Code.

00:41.150 --> 00:45.470
And let's go and say Craigslist auth scraper.

00:47.990 --> 00:52.910
Okay, so now we have the auth scraper opened inside Visual Studio Code.

00:53.330 --> 00:57.020
Let's go ahead and make an index.js file.

00:57.930 --> 01:06.090
And then let's also open our terminal so we can install some packages that we're going to need.

01:06.970 --> 01:08.640
So we say yarn add.

01:08.860 --> 01:10.660
We're going to need request.

01:11.570 --> 01:12.560
And request-

01:12.560 --> 01:13.310
promise.

01:14.620 --> 01:15.580
That's it.

01:17.830 --> 01:21.430
So now let's import request up here.

01:21.790 --> 01:25.600
So require request-promise.

01:25.930 --> 01:31.060
All of this is basically something you already know from the previous Craigslist tutorials.

01:31.900 --> 01:38.380
And now let's say we have an async function main.

01:39.760 --> 01:42.370
And then let's run main down here.

01:44.140 --> 01:46.270
Now let's try and make our request.

01:46.270 --> 01:55.330
So const html equals await request, and then we say dot post, because we are making a POST request.

01:56.670 --> 02:03.480
And then we paste in the URL that we found in the Chrome developer tools and used inside of Postman

02:03.900 --> 02:05.550
and paste it in here.

02:08.300 --> 02:12.560
And then let's also put this inside of a try/catch block,

02:12.560 --> 02:15.140
so any errors get logged.

02:18.790 --> 02:22.770
Catch error and say console.error,

02:22.780 --> 02:23.380
error.

02:25.630 --> 02:30.250
Now we need some way to pass in this form data.

02:30.700 --> 02:38.380
If we look inside of the request promise documentation, if you scroll down a bit, look around, you

02:38.380 --> 02:46.270
can see there are some sections about how to post data, and there's also one about how to post like

02:46.480 --> 02:47.710
HTML forms too.

02:48.160 --> 02:56.020
And the way they do that is you simply set this form object with keys and values and they're going to

02:56.020 --> 03:02.170
set the Content-Type header to be this application/x-www-form-urlencoded.

03:03.280 --> 03:05.680
So that's basically what we need to do.

03:06.620 --> 03:09.110
So let's put a little comma here.

03:09.410 --> 03:11.840
Curly braces and then write form.

03:12.680 --> 03:15.230
And then another curly brace.

03:15.230 --> 03:19.010
And here we pass in the key and values of this form data.

03:20.500 --> 03:21.480
So let's go ahead.

03:21.490 --> 03:24.160
Look inside of Postman inside the Body.

03:25.250 --> 03:29.270
And then let's paste, or copy, the key here:

03:29.270 --> 03:30.920
inputEmailHandle.

03:31.660 --> 03:34.270
And paste in the value.

03:38.570 --> 03:41.000
And the same for the password:

03:41.000 --> 03:41.840
inputPassword.

03:45.220 --> 03:47.440
And let's copy the password.

03:49.750 --> 03:52.690
So now we have all of the form data.
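
NOTE
Editor's sketch, not part of the recording. A minimal illustration of the form data described above, assuming the two field names copied from Postman.
The helper name buildLoginForm and the sample credentials are hypothetical. request does its own encoding when you pass the form option.
URLSearchParams is used here only to show roughly what a form-encoded body looks like.
```javascript
// Hypothetical helper: the keys are the field names copied from Postman in this lesson.
function buildLoginForm(email, password) {
  return { inputEmailHandle: email, inputPassword: password };
}
// Illustration only: approximates the body of a form-encoded POST.
const form = buildLoginForm('user@example.com', 'secret');
const encoded = new URLSearchParams(form).toString();
console.log(encoded);
```
The object returned by buildLoginForm is what would be passed as the form option of the POST request.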

03:53.510 --> 03:56.120
Next up is the headers.

03:56.140 --> 04:02.060
Remember, we had to set a referrer header in order to make our login request successful.

04:02.510 --> 04:12.200
So go ahead and copy the value of the referrer header and then make another comma here and make a headers

04:12.230 --> 04:18.410
object and put in the referrer and the value.

04:19.700 --> 04:22.670
So now we set a referrer header.

04:23.500 --> 04:25.330
Inside of our request.

04:26.590 --> 04:30.310
So that's all we needed to make it work inside of Postman.

04:30.340 --> 04:33.400
Now let's see if it works inside of request.

04:34.870 --> 04:39.690
So to see if it works, let's try and import fs,

04:39.710 --> 04:45.340
just to save the result and check it out once it's done.

04:45.700 --> 04:52.660
So fs.writeFileSync, and then we can check if the login actually is working.

04:55.060 --> 04:58.750
So let's try and run node index.js.

05:04.360 --> 05:11.350
And now it doesn't look like it actually saved the file, which means it probably never came to this

05:11.350 --> 05:12.070
point.

05:12.250 --> 05:19.930
And it also threw out a huge response object, which means it probably threw an exception instead and

05:19.930 --> 05:21.580
did a console error.

05:22.730 --> 05:27.140
And the first thing I always try and look for is the response code.

05:27.140 --> 05:30.230
So let's look for the response code.

05:30.230 --> 05:41.240
We have a status code here of 302, which says Found, which is actually the same as we have inside of the Chrome

05:41.240 --> 05:42.560
developer tab.

05:42.920 --> 05:45.590
I mean the Chrome developer tools.

05:46.190 --> 05:47.810
Interestingly enough.

05:48.350 --> 05:56.600
It is also 302 Found there, so it's not a bad sign that we get it, though we do get a 200

05:56.600 --> 05:57.260
OK

05:57.260 --> 05:59.180
inside of Postman.

05:59.930 --> 06:09.890
But let's try and work around this somehow. By default, request is going to treat any other response status

06:09.890 --> 06:12.760
code than 2xx,

06:14.190 --> 06:17.650
2xx, as an error.

06:17.670 --> 06:24.630
So if you get something like 302, like we did just now, it's going to treat it as an error and throw

06:24.630 --> 06:25.620
an exception.

06:25.620 --> 06:31.520
So therefore, we never get to this point where we can check out the actual response HTML.

06:33.570 --> 06:38.850
So in order to disable this check for if you have a.

06:40.070 --> 06:42.380
Well, if it's going to throw an exception.

06:43.570 --> 06:50.560
We need to make another option that is called simple and set it to false.

06:52.040 --> 06:55.580
You can read up about this also in their documentation.

06:55.940 --> 06:59.570
If you try and search a little for.

07:01.530 --> 07:04.320
And let's try and search well.

07:04.620 --> 07:12.210
It also says up here, almost at the top: by default, HTTP response codes other than 2xx will cause the

07:12.210 --> 07:13.770
promise to be rejected.

07:14.190 --> 07:19.740
This can be overridden by setting options.simple to false.

07:20.350 --> 07:22.390
So that's simply what we are doing.

07:22.390 --> 07:24.220
We are setting the simple to be false.

07:24.220 --> 07:29.380
So it's going to not treat this 302 as an error and throw an exception.
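
NOTE
Editor's sketch, not part of the recording. It restates the rule read from the documentation above as a tiny function.
rejectsByDefault is a hypothetical name, not a request-promise API. It only models the behaviour described: non-2xx rejects unless simple is false.
```javascript
// Models the documented rule: with simple left at its default (true),
// any status code outside 2xx causes the promise to be rejected.
function rejectsByDefault(statusCode, simple = true) {
  const is2xx = statusCode >= 200 && statusCode < 300;
  return simple && !is2xx;
}
console.log(rejectsByDefault(302));        // true: the 302 ends up in the catch block
console.log(rejectsByDefault(302, false)); // false: simple: false lets the response through
console.log(rejectsByDefault(200));        // false: 2xx is never treated as an error
```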

07:29.740 --> 07:33.670
Now let's try and see how the response looks now.

07:36.410 --> 07:42.740
So now you can see it did go further because it did not throw an exception and it started writing the

07:42.740 --> 07:44.160
login.html.

07:45.120 --> 07:49.320
So as you can see, it is disappointingly empty.

07:49.350 --> 07:51.150
This login.html.

07:52.480 --> 08:01.960
Now, another thing that could be causing this empty login.html is that we are not following the redirects

08:02.260 --> 08:04.840
that the server is trying to make us do.

08:05.490 --> 08:12.570
By default, Postman is always going to follow redirects and, of course, the same inside of Chrome.

08:13.090 --> 08:18.280
But by default, request does not follow all redirects.

08:19.060 --> 08:22.810
Let's try and search for redirect in here.

08:24.750 --> 08:33.020
And there are no results for redirect inside of the request-promise documentation.

08:33.030 --> 08:39.900
But if we look inside just the request documentation, which is what request-promise is based on,

08:40.690 --> 08:43.720
And try to search for redirect.

08:45.150 --> 08:49.710
It says it follows redirects by default.

08:49.890 --> 08:52.440
But there is some kind of

08:52.990 --> 08:54.430
catch to this.

08:54.850 --> 09:00.370
It is only HTTP 3xx responses that it follows redirects for.

09:01.370 --> 09:03.020
So well.

09:03.020 --> 09:12.320
Even though we get a 302 response, I'm just going to try and set followAllRedirects to true anyway,

09:12.350 --> 09:13.400
see if it works.

09:13.400 --> 09:16.160
It could be that it makes a difference.
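
NOTE
Editor's sketch, not part of the recording. It gathers the options discussed so far into one object.
buildLoginOptions and the example.com Referer value are placeholders. Use the value copied from Postman.
In the request documentation, followAllRedirects covers 3xx responses to non-GET requests, which would explain why it matters for a POST login.
```javascript
// Hypothetical helper that assembles the options for the login POST.
function buildLoginOptions(email, password, referer) {
  return {
    form: { inputEmailHandle: email, inputPassword: password },
    headers: { Referer: referer }, // the HTTP header name is spelled "Referer"
    simple: false,                 // do not reject on non-2xx statuses such as 302
    followAllRedirects: true,      // also follow redirects that answer a non-GET request
  };
}
// Placeholder values for illustration only.
const options = buildLoginOptions('user@example.com', 'secret', 'https://example.com/login');
console.log(options);
```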

09:21.650 --> 09:25.790
Let's try and see again what login.html looks like.

09:27.860 --> 09:34.970
So now we actually get some kind of HTML. By setting simple to false,

09:34.970 --> 09:40.340
we moved on to writing the file, and by setting followAllRedirects to true,

09:40.370 --> 09:42.890
we also got some content written to the file.

09:44.260 --> 09:47.110
But it doesn't look like we are actually logged in yet.

09:44.260 --> 09:47.110
But it doesn't look like we are actually locked in yet.

09:47.140 --> 09:50.260
It still says Craigslist account login.

09:51.760 --> 09:59.380
And if I go down and look, I can still see it still looks like it is asking us to log in.

09:59.380 --> 10:03.160
It's not the actual account page I'm seeing here.

10:04.060 --> 10:06.280
There's also something about creating an account.

10:06.280 --> 10:08.950
This is just a login page we're seeing here.

10:10.880 --> 10:16.040
One last option that you can do is to enable the cookie jar.

10:16.310 --> 10:23.210
So cookie jar, by that I mean the storage of all your cookies.

10:24.700 --> 10:31.070
Postman is saving all of the cookies that the response wants you to save.

10:31.090 --> 10:33.040
So it's going to save those.

10:33.550 --> 10:36.770
And the same for Chrome developer tools as well.

10:36.800 --> 10:38.350
I mean Chrome browser.

10:40.360 --> 10:46.270
So let's try and set saving cookie jar or saving cookies to be true.

10:46.450 --> 10:49.930
You do that by simply saying jar: true.

10:52.070 --> 10:55.520
So let's say node index.js again.

10:57.030 --> 10:59.370
And let's look at the file.

11:01.000 --> 11:06.280
And ladies and gentlemen, it looks like we are logged in now.

11:07.270 --> 11:10.210
So here you can see home of Stefan Hulshof.

11:10.870 --> 11:12.030
That is me.

11:12.040 --> 11:13.180
Logout.

11:13.840 --> 11:17.980
Postings, drafts, searches, settings, billing and so on.

11:18.280 --> 11:24.280
So that is actually the site if I try to log in so you can see it again.

11:26.440 --> 11:33.700
So here you can see the site when you're logged in home of Stephen Hudson with the menu up here.

11:33.700 --> 11:35.560
Drafts, searches, billing.

11:35.860 --> 11:40.030
Remember, our objective is to go to this page inside of billing.

11:42.040 --> 11:44.320
So how can you do that?

11:44.320 --> 11:47.800
Because you can't just do a POST request on this site.

11:47.800 --> 11:48.490
Really?

11:49.330 --> 11:56.560
Maybe you could, but sometimes you have to log in first and then navigate around in the site.

11:59.220 --> 12:06.120
So how do we get on to this billing page once we are logged in inside of request?

12:06.300 --> 12:10.890
Well, it is fairly simple to do it once you got this done.

12:12.240 --> 12:19.380
The thing we need to do first is to make sure that we have this cookie jar enabled for all of our requests.

12:19.380 --> 12:26.370
So it's going to save all of the cookies for any other requests we make after this login.

12:26.940 --> 12:30.330
So go and remove this jar up here.

12:30.660 --> 12:32.070
The jar: true.

12:32.580 --> 12:38.190
And then up here where we imported, let's say dot defaults.

12:38.900 --> 12:42.200
And then in the options we say jar: true.

12:43.470 --> 12:49.140
So now this request object is going to save the cookie jar for any request.

12:49.140 --> 12:49.940
I mean, it,

12:49.940 --> 12:53.160
it can use the cookies for any request you make.

12:53.430 --> 13:04.470
So after we log in up here, we can do another request down here where we say const billingHtml maybe

13:04.650 --> 13:16.980
and say await request dot get this time, because we're just getting things, and paste in the URL of this

13:17.010 --> 13:19.320
billing page we have here.

13:22.250 --> 13:28.340
And then let's try and do another file write to check how our response looks.

13:29.850 --> 13:31.710
So, billing.html.

13:32.800 --> 13:33.310
billingHtml.

13:38.230 --> 13:39.400
Just like that.
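
NOTE
Editor's sketch, not part of the recording. It puts the whole flow together: log in, save login.html, fetch billing, save billing.html.
The URLs are placeholders, and using the login URL as the Referer value is an assumption. Take the real values from the network tab and Postman.
The client is passed in as a parameter. It is assumed to behave like require('request-promise').defaults({ jar: true }).
```javascript
const fs = require('fs');
// Placeholder URLs: replace with the login and billing URLs found in the network tab.
const LOGIN_URL = 'https://example.com/login';
const BILLING_URL = 'https://example.com/billing';
// client should keep a cookie jar, so the session cookie set at login is sent with the billing request.
async function main(client, credentials, writeFile = fs.writeFileSync) {
  try {
    const html = await client.post(LOGIN_URL, {
      form: { inputEmailHandle: credentials.email, inputPassword: credentials.password },
      headers: { Referer: LOGIN_URL }, // assumption: substitute the value copied from Postman
      simple: false,
      followAllRedirects: true,
    });
    writeFile('./login.html', html);
    const billingHtml = await client.get(BILLING_URL);
    writeFile('./billing.html', billingHtml);
  } catch (error) {
    console.error(error);
  }
}
// With the package installed, a real run could look like this:
// main(require('request-promise').defaults({ jar: true }), { email: '...', password: '...' });
```
Passing the client in keeps the sketch runnable without network access. A stub with post and get methods is enough to exercise it.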

13:40.210 --> 13:44.350
And now let's try and run node index.js again.

13:49.400 --> 13:53.430
So now our billing.html file got written.

13:53.450 --> 13:55.160
We are still logged in.

13:55.160 --> 13:56.150
I can see.

13:57.650 --> 13:59.690
And our billing.

13:59.690 --> 14:01.280
What does that look like?

14:01.310 --> 14:04.790
It says, how does it look?

14:05.420 --> 14:08.330
Says here paid posting accounts.

14:08.360 --> 14:10.580
No paid posting accounts exist.

14:10.610 --> 14:15.500
Which is exactly what there is inside of this billing page.

14:16.210 --> 14:18.880
So now we are successfully logged in.

14:18.880 --> 14:28.930
We can go to any page inside the site once we are logged in, and that is because the request object is

14:28.930 --> 14:32.650
saving this cookie that we get once we log in.

14:34.940 --> 14:42.590
This session cookie that I talked about earlier, which tells Craigslist that we are logged in and we

14:42.590 --> 14:43.910
are authenticated.

14:45.740 --> 14:47.030
So that is it, folks.

14:47.030 --> 14:51.830
That is how you make a login with request.

14:51.860 --> 14:59.030
How you find out where you need to log in, what data you need to pass on to log in, and how you make

14:59.030 --> 15:01.760
the request inside of NodeJS request.

15:02.390 --> 15:05.840
So the process to do all of this is to first.

15:06.610 --> 15:08.620
Look inside your network tab.

15:08.650 --> 15:10.480
Try to make a login.

15:10.780 --> 15:13.540
Find out which request

15:13.540 --> 15:15.460
here is the login itself,

15:15.730 --> 15:16.690
the one you want.

15:16.720 --> 15:26.860
Then once you find it, try to make the same request inside of Postman until you find out what headers,

15:26.860 --> 15:31.930
body, whatever is necessary to make this request work.

15:31.960 --> 15:33.430
The login work.

15:34.200 --> 15:40.000
And then once you're done with that, the next step is to make it work inside of NodeJS request.

15:40.240 --> 15:46.180
As you could see, there are also some things we needed to change even though we had the same headers

15:46.180 --> 15:54.550
and form data because request is a little different from Postman, it's a bit more raw or simplistic

15:54.580 --> 15:55.570
you could say.

15:57.560 --> 15:59.300
So that is it, folks.

15:59.420 --> 16:03.500
I hope that you can use this for a lot of things.

16:03.500 --> 16:10.820
I have used it for a lot of things and I hope you get lots of good, productive work out of it.

16:10.820 --> 16:12.740
So thank you for watching.

16:12.740 --> 16:16.970
And in the next section, I'm going to show you also how to log in.

16:16.970 --> 16:23.270
If you can't get this to work, if you can't for some reason, figure out how to make this work inside

16:23.270 --> 16:27.170
of Postman with the Chrome developer tools.

16:27.350 --> 16:33.050
Or maybe the site is a more JavaScript-heavy site and you can't use an API.

16:33.170 --> 16:39.170
The last option is to use Puppeteer, which I'm also going to show you how to use.
