WEBVTT

00:01.870 --> 00:08.890
So if you take a closer look at the login request inside of Chrome or inside of Postman, well, we

00:08.890 --> 00:11.740
can see there are lots of cookies being set.

00:11.860 --> 00:16.600
And one of these cookies is called csrf_cookie_name.

00:16.780 --> 00:23.990
And this value in here is exactly the same as the one we have in the form that we send for the login.

00:24.010 --> 00:29.740
Now, this CSRF cookie has already been set when we go to the front page of this site.

00:30.310 --> 00:38.170
So the question now is, how do we get this CSRF cookie from request and use it inside of the form? Now,

00:38.620 --> 00:45.100
because we don't have this form data, the request is going to say 403 Forbidden.

00:46.860 --> 00:48.350
Let me show you how we can do that.

00:48.350 --> 00:54.170
So first, we need to make a separate cookie jar so that we can get the cookies out from request.

00:54.530 --> 01:00.800
Because as it is now, we can't say something like request.getCookies.

01:00.830 --> 01:02.780
There's no method

01:02.780 --> 01:06.950
like that on request. We have to make a separate cookie jar.

01:06.950 --> 01:09.050
So const cookie jar.

01:11.470 --> 01:12.550
Equals.

01:12.850 --> 01:15.100
Actually, let me put it down here.

01:15.370 --> 01:17.920
request.jar().

01:18.430 --> 01:24.010
And when you call it like that, it just makes a cookie jar for you to use inside of a request.

01:24.130 --> 01:34.210
So then we have to say again, request = request.defaults, and then say jar: cookieJar.

01:37.000 --> 01:40.660
And we have to change this to be a let instead of const.

01:41.570 --> 01:46.130
And let's remove these defaults here because we set them down here instead.

01:48.500 --> 01:54.920
Okay, so now we have a separate cookie jar where we can do something like cookieJar.getCookieString,

01:54.920 --> 01:57.320
getCookies, setCookie and so on.

02:00.040 --> 02:07.090
Now let's try and see what happens after we visit the first page. So we can say

02:07.090 --> 02:09.790
cookieJar.getCookieString.

02:09.970 --> 02:16.690
And with this method we have to put in the site, so that we get all the cookies that have been set

02:16.690 --> 02:17.710
for this site.

02:18.330 --> 02:22.290
So we say getCookieString and put in the URL.

02:23.110 --> 02:27.970
And let's say console.log just to see what we have in here.

02:30.630 --> 02:37.110
Let's comment this out and let's try and run node index.js and see what happens in here.

02:39.170 --> 02:40.970
So here we have.

02:42.860 --> 02:52.940
We have csrf_cookie_name with the value here, and this is the value we need to pass inside of the form.

02:52.970 --> 02:57.350
We can see here the csrf_test_name inside of the form.

02:57.350 --> 02:59.690
We need to use that value inside there.

03:00.680 --> 03:08.180
So we need to do some string splitting here in order to get this exact cookie value or cookie key and

03:08.180 --> 03:08.870
value.

03:09.410 --> 03:16.040
So we can say const, let's take this one and put it inside the cookieString

03:16.040 --> 03:16.790
variable.

03:16.820 --> 03:19.340
Variable.

03:19.700 --> 03:21.080
I can't even say that.

03:21.800 --> 03:26.000
So then we can say const, split it.

03:27.140 --> 03:30.650
by CSRF cookie name.

03:32.030 --> 03:35.480
And we can say cookie string and split it by.

03:36.090 --> 03:37.320
csrf_cookie_name.

03:37.500 --> 03:44.190
There's really no easier way of doing this when you're just using the default request cookie jar. If

03:44.190 --> 03:48.660
you want to do more serious cookie manipulation and stuff like that,

03:48.810 --> 03:55.290
you can use an external cookie jar, but for now I'm just using the default one we have inside of Node.js

03:55.290 --> 03:56.100
request.

03:57.510 --> 04:00.180
And trust me, it's going to work fine.

04:02.100 --> 04:10.080
Now let's try and run, say run and debug just to test out and see what we have here and have a breakpoint

04:10.080 --> 04:11.070
here at the bottom.

04:12.070 --> 04:12.520
Let's see.

04:12.520 --> 04:13.990
We have cookie string.

04:14.570 --> 04:17.390
Then we have the one split by the

04:17.420 --> 04:18.170
CSRF,

04:18.200 --> 04:19.580
the CSRF

04:19.580 --> 04:20.470
cookie name.

04:20.480 --> 04:28.640
So this one we have here is the value we have right after csrf_cookie_name.

04:28.640 --> 04:35.750
You can see here, this is our cookie name for CSRF, and when we split it by that, we get the second

04:35.750 --> 04:41.330
item in this split array to be the csrf_cookie_name value.

04:42.810 --> 04:45.450
So now we need to split it again.

04:45.450 --> 04:49.240
const csrf...

04:49.570 --> 04:52.470
Let's see what the form value name was.

04:52.510 --> 04:56.130
csrf_test_name, let's call it that.

04:56.760 --> 05:00.750
And we take the split-by-cookie-name array and we say split again.

05:01.870 --> 05:04.720
Let's see what we got after this.

05:10.310 --> 05:10.970
Come on.

05:15.400 --> 05:16.890
Oh, is it making an error?

05:16.900 --> 05:18.790
Maybe I have an error here.

05:20.340 --> 05:27.210
Cookie name .split is not a function, because I have to choose the second item of the split array.

05:30.230 --> 05:31.700
So we have this one.

05:31.700 --> 05:36.320
And the second item in the array is the cookie value.

05:36.740 --> 05:44.900
Then we have this one where we split by the semicolon just to get the value only, and this is the first

05:44.900 --> 05:46.070
item in the array.

05:46.580 --> 05:48.340
So let's just go through it again.

05:48.350 --> 05:50.120
Here we have the cookie string.

05:50.600 --> 05:51.920
It looks like this.

05:51.950 --> 05:55.790
We want to get only this key and value.

05:56.120 --> 06:01.550
So we split it by this name.

06:03.470 --> 06:10.850
And then we get this here where the second item in the array is right after the splitted value that

06:10.850 --> 06:11.570
we took.

06:13.460 --> 06:14.780
So it's this one.

06:15.390 --> 06:19.340
That's the value we want to get.

06:19.350 --> 06:27.660
And now to get rid of this semicolon and space and the other values, I say split again with the semicolon

06:27.750 --> 06:33.180
and we just get the, um, we actually need to get

06:34.220 --> 06:38.240
the first item in this array.

06:39.140 --> 06:43.310
So then we can say [0] after here.

06:44.010 --> 06:49.020
And then we have the exact value we are looking for.

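NOTE
A minimal sketch of the two splits walked through above, in plain JavaScript. The cookie string and its values are made up for illustration, and extractCookieValue is a hypothetical helper name, not part of request. It assumes the split includes the "=" after the cookie name.
```javascript
// Hypothetical helper: pull one cookie's value out of a "k=v; k2=v2" string.
function extractCookieValue(cookieString, cookieName) {
  // First split: item [1] is everything after "cookieName=".
  const splitByName = cookieString.split(cookieName + '=');
  if (splitByName.length < 2) return undefined; // cookie not present
  // Second split: item [0] drops the "; other=cookies" that follow.
  return splitByName[1].split(';')[0];
}
// Made-up cookie string for illustration only.
const cookieString = 'ci_session=abc123; csrf_cookie_name=0123abcd; lang=en';
const csrfTestName = extractCookieValue(cookieString, 'csrf_cookie_name');
console.log(csrfTestName);
```
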
06:50.250 --> 06:51.540
Which is this one.

06:51.960 --> 06:53.280
Nine, eight, nine.

06:53.310 --> 06:54.890
A58.

06:55.590 --> 07:00.420
We can see that matches with our raw cookie string in here.

07:02.830 --> 07:03.670
Okay.

07:04.420 --> 07:10.990
So that's not something I usually do when I have to automate the login of a website.

07:11.650 --> 07:17.200
Only in this case, where they have something with CSRF and they pass it into the form. I

07:17.200 --> 07:23.230
think forms sometimes do that with the values, but you don't always have to do these things.

07:23.230 --> 07:29.530
Sometimes you can just get away with enabling cookies and you can just authenticate away.

07:30.110 --> 07:32.660
But this one is a little more tricky.

07:33.620 --> 07:35.600
But I think we got it now.

07:36.260 --> 07:37.400
Let's see.

07:37.820 --> 07:40.850
So now we pass in the CSRF in the form.

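NOTE
A rough sketch of what the login form data could look like once the token is added. Only csrf_test_name comes from the video; the username and password field names and all values are assumptions for illustration. URLSearchParams is used here just to show the URL-encoded form body.
```javascript
// Made-up token; in the video it comes from the csrf_cookie_name cookie.
const csrfTestName = '0123abcd';
// Hypothetical login fields; only csrf_test_name is named in the video.
const form = {
  username: 'user@example.com',
  password: 'secret',
  csrf_test_name: csrfTestName,
};
// URL-encode the fields the way an HTML form post would.
const body = new URLSearchParams(form).toString();
console.log(body);
```
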
07:41.120 --> 07:43.670
Let's try and run node index.js.

07:43.700 --> 07:49.340
Let's also do a console.log of the login result and see what we got.

07:50.550 --> 07:51.900
Login result.

07:57.980 --> 08:02.810
And here we can see it says Success, True success page dashboard.

08:02.840 --> 08:11.120
That means we are now logged in and now the server has saved in the session that this cookie is authenticated

08:11.120 --> 08:16.820
and we can now scrape the rest of the pages that are behind the authentication.

08:17.000 --> 08:26.960
So we can go into this URL, which is matching preferences, and we can just say something like const matches =

08:27.500 --> 08:31.040
await request.get, and put in the URL.

08:31.340 --> 08:38.570
Notice we don't have to put anything else in because we already have the cookie set inside

08:38.570 --> 08:39.500
of request.

08:39.770 --> 08:46.790
And now let's try and save this page to see if we get the right page from the request.

08:47.720 --> 08:48.560
Let's say.

08:49.900 --> 08:54.940
Um, let's say const fs to save the file inside of Node.js.

08:56.140 --> 08:56.540
fs,

08:56.560 --> 09:06.880
and we can say fs.writeFileSync, and we put in matches.html with the matches we get from

09:06.880 --> 09:09.820
Node.js, and see what we have in here.

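NOTE
A small sketch of saving a downloaded page with fs.writeFileSync. The HTML string is a made-up stand-in for the response body, and writing to the temp directory is an assumption of this sketch; the video uses just matches.html as the file name.
```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
// Stand-in for the HTML string returned by the request (made up here).
const matches = '<html><body><h1>Matches</h1></body></html>';
// Temp directory is used so the sketch does not clutter the project folder.
const filePath = path.join(os.tmpdir(), 'matches.html');
// Synchronous write: the file is on disk when this call returns.
fs.writeFileSync(filePath, matches);
console.log('Saved ' + filePath);
```
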
09:10.870 --> 09:12.550
node index.js.

09:17.290 --> 09:20.140
Now, let's check it out inside of Chrome.

09:20.140 --> 09:23.620
So open with Google Chrome.

09:24.920 --> 09:31.730
And indeed we can see we now have all of the internship matches inside of Google Chrome that we downloaded

09:31.730 --> 09:33.350
from NodeJS request.

09:34.530 --> 09:40.590
And then we can basically just go ahead and do any sort of regular web scraping or we can get all of

09:40.590 --> 09:42.690
the items in this website.

09:42.900 --> 09:45.790
And I think I've already shown you that inside of the course.

09:45.790 --> 09:48.090
So I'm not going to show you with this example.

09:48.090 --> 09:55.050
This was more to show you how we can log in and authenticate ourselves in a website that uses something

09:55.050 --> 10:04.560
like cookies and session authentication and a little bit of CSRF sessions or tokens.

10:05.390 --> 10:07.640
So I hope you got a lot out of this.

10:07.640 --> 10:11.720
And if you have any questions or suggestions, please let me know.

10:11.870 --> 10:14.660
And yeah, I'll see you around.
