WEBVTT

00:00.140 --> 00:07.910
Now we've got a nice-looking URL, which is actually the URL of our Facebook wall, and now

00:07.910 --> 00:12.350
we just need to make another request so we can get to our Facebook wall.

00:12.800 --> 00:14.480
So let's try and do that.

00:14.480 --> 00:15.920
So, const

00:16.990 --> 00:18.010
homePage =

00:18.640 --> 00:20.410
await request.get.

00:21.380 --> 00:29.060
Obviously request.get, because it's a GET request; we're not posting this time. And we pass in headers.location.

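NOTE
The GET request goes to the URL from the login response's location header. A Location header is allowed to be relative, so this minimal sketch resolves it against the login URL with Node's built-in URL class before requesting it. It assumes the login response is an object with a headers.location field, as in the lesson; the helper name redirectTarget is hypothetical.
```javascript
// Hypothetical helper: turn a redirect's Location header into an absolute URL.
// Assumes `response.headers.location` holds the header value, as in the lesson.
function redirectTarget(response, requestUrl) {
  const location = response.headers.location;
  if (!location) {
    throw new Error("Response has no location header; the login may have failed");
  }
  // new URL() keeps absolute URLs as-is and resolves relative ones against requestUrl.
  return new URL(location, requestUrl).href;
}
```
For example: const homePage = await request.get(redirectTarget(loginResponse, loginUrl)); where loginResponse and loginUrl are placeholder names.
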
00:30.750 --> 00:37.920
Then let's see if we can write it to a file, because this is supposed to be HTML.

00:38.070 --> 00:45.570
I mean, we're supposed to get a response body from this request, which is pure HTML that we can

00:45.570 --> 00:50.470
save, much like what we got in Postman.

00:50.490 --> 00:51.870
We have this.

00:54.350 --> 01:02.120
This large body with lots of HTML, which is basically our Facebook wall page, and we need to be getting

01:02.120 --> 01:06.260
the same thing inside of our request here.

01:08.390 --> 01:17.170
Now let's write a function to save this HTML to a file, because that's pretty handy to have.

01:17.180 --> 01:23.990
That way we can look at the file and load it into our browser to see what the response looks like.

01:24.470 --> 01:31.520
So: function writeFile, which takes in a body, and then we say fs.

01:33.260 --> 01:34.840
Let's import fs.

01:36.710 --> 01:44.930
const fs = require('fs'). fs is built into Node.js, so you don't need to install it with npm.

01:46.100 --> 01:47.720
fs.writeFile.

01:48.830 --> 01:51.500
And then we put in the path of the file.

01:51.500 --> 01:54.380
We'll just call it test.html.

01:54.590 --> 02:01.880
Then we pass in the body, and then a callback that runs when the write either fails or succeeds.

02:02.540 --> 02:07.370
If there's an error, we call console.error

02:08.550 --> 02:08.820
with the error.

02:09.990 --> 02:12.150
And if not, we'll just log:

02:13.360 --> 02:15.100
"HTML was saved".

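NOTE
Putting the steps above together, the save-to-file helper might look like this sketch. It uses Node's built-in fs module with the callback form of fs.writeFile(path, data, callback). The optional path parameter (defaulting to test.html) is an illustrative addition; the lesson hard-codes the file name.
```javascript
// fs ships with Node.js, so nothing needs to be installed with npm.
const fs = require("fs");
// Save an HTML string to disk so it can be opened in a browser.
// `path` is an illustrative extra parameter; the lesson uses a fixed file name.
function writeFile(body, path = "test.html") {
  fs.writeFile(path, body, (err) => {
    if (err) {
      console.error(err);
      return;
    }
    console.log("HTML was saved");
  });
}
```
Calling writeFile(homePage) after the GET request writes the response body to test.html in the current working directory.
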
02:17.650 --> 02:21.190
And then we can use the function over here.

02:21.190 --> 02:21.690
writeFile,

02:21.700 --> 02:23.710
passing in homePage.

02:25.880 --> 02:30.680
Now let's try and run it again and look at what result we're getting from Facebook.

02:31.960 --> 02:33.910
So HTML was saved.

02:35.940 --> 02:39.180
And we can see there's actually this HTML page.

02:39.180 --> 02:46.440
It's a little hard to read when it's only one line, but we can actually open it inside of our browser.

02:46.440 --> 02:55.080
So if I go into the folder for our project, right-click test.html, then open it in Chrome.

02:56.770 --> 03:00.640
We can see the page that we actually got from Facebook.

03:00.880 --> 03:06.940
Now, here it says in Danish, my local language, that I need to log in first.

03:07.240 --> 03:13.120
So that means something went wrong with the login when we made this request.

03:13.390 --> 03:14.920
And why did something go wrong?

03:14.920 --> 03:16.030
What went wrong?

03:16.240 --> 03:25.090
Well, when you log in to Facebook using this POST request, Facebook sets some cookies

03:25.120 --> 03:28.900
in your browser, or in this case, in Postman.

03:29.140 --> 03:33.310
And that's how Facebook knows that you logged in successfully.

03:33.490 --> 03:41.200
And since we're making this request without storing or sending any cookies, Facebook doesn't know we're

03:41.200 --> 03:41.980
logged in.

03:43.320 --> 03:46.730
So how do we use cookies with request?

03:46.740 --> 03:48.870
Well, actually, it's really simple.

03:49.020 --> 03:54.680
We just go into our default options and set jar to true.

03:54.720 --> 04:00.450
They call it jar, just like a cookie jar.

04:00.630 --> 04:09.270
So if you set jar to true, request will automatically save any cookies that it's going

04:09.270 --> 04:10.710
to get from Facebook.

04:10.710 --> 04:16.080
And the next request you make with request will send those cookies along.

04:16.080 --> 04:19.890
So then Facebook can see that you've successfully logged in.

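NOTE
To illustrate what jar: true does conceptually, here is a deliberately simplified in-memory cookie jar. It is not request's implementation (request's real jar is backed by the tough-cookie package and also tracks domains, paths, and expiry); this sketch only keeps the name=value pair from each Set-Cookie header. The class SimpleCookieJar and its methods are hypothetical.
```javascript
// Hypothetical, simplified cookie jar: stores cookies from responses
// and replays them on later requests, which is the idea behind jar: true.
class SimpleCookieJar {
  constructor() {
    this.cookies = new Map(); // cookie name -> value
  }
  // Store cookies from an array of Set-Cookie header values.
  // Only the leading name=value pair is kept; attributes are ignored.
  storeFrom(setCookieHeaders = []) {
    for (const header of setCookieHeaders) {
      const pair = header.split(";")[0];
      const eq = pair.indexOf("=");
      if (eq <= 0) continue; // skip malformed entries
      this.cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
    }
  }
  // Build the Cookie header value to send with the next request.
  cookieHeader() {
    return [...this.cookies].map(([name, value]) => `${name}=${value}`).join("; ");
  }
}
```
In use, you would call jar.storeFrom(loginResponse.headers["set-cookie"]) after the login POST and send { Cookie: jar.cookieHeader() } with the GET request; setting jar to true in request's default options does this bookkeeping for you.
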
04:20.910 --> 04:24.690
Now let's save that and try logging in again.

04:29.630 --> 04:30.650
And there we go.

04:30.650 --> 04:32.840
Now let's try and reload the page.

04:32.840 --> 04:34.160
I have it opened here.

04:34.160 --> 04:35.750
This is the local file.

04:35.960 --> 04:38.360
Reloaded it and there we go.

04:38.390 --> 04:41.780
We can see that I've been successfully logged in now.

04:42.140 --> 04:50.870
And this is just the page that we saved from Facebook, and yeah, that's it.

04:50.900 --> 04:56.930
Now we can also do some selecting; basically, we can do whatever we want.

04:56.960 --> 05:00.610
You can scrape this file as much as you like.

05:00.620 --> 05:05.210
You can select these elements with Cheerio.

05:06.770 --> 05:15.560
Notice that all of these elements seem to have a role="article" attribute, so you can select all of

05:15.560 --> 05:19.760
the elements with role="article" using Cheerio.

05:21.760 --> 05:31.030
And you can even take just this HTML file, read it using fs, and then load it into Cheerio,

05:31.030 --> 05:35.290
and then you can select elements and extract all of the data you want.

05:36.070 --> 05:43.120
So that's how you can scrape Facebook using just request and Cheerio.
