WEBVTT

00:01.010 --> 00:05.930
So in the last section, we made sure not to fire all of our requests all at once.

00:05.930 --> 00:11.600
And now in this section, we're going to be avoiding getting banned even further by putting in a sleep

00:11.600 --> 00:13.820
call for each of our requests.

00:13.850 --> 00:22.100
Now, even with these sequential requests, it can still be quite fast compared to when

00:22.100 --> 00:26.300
a normal user is just clicking around on the Craigslist website.

00:26.300 --> 00:30.110
You see, they wouldn't really be clicking around on jobs this fast.

00:30.110 --> 00:36.320
That's not really human-like behavior, and we want to emulate someone just clicking around

00:36.320 --> 00:42.890
on the website, just slowly clicking from page to page, to avoid getting banned from websites and

00:42.890 --> 00:44.960
overloading the web servers.

00:45.320 --> 00:46.640
So how can we do that?

00:46.640 --> 00:48.290
Well, it's quite easy.

00:48.290 --> 00:54.680
We just need to create this sleep function, which we can call anywhere in our functions when

00:54.680 --> 00:59.120
we want to wait a little bit before we pull the next request.

00:59.120 --> 01:02.030
So our sleep function looks something like this.

01:02.060 --> 01:06.770
We're going to pass in the milliseconds that we want to wait to this function.

01:06.770 --> 01:09.470
And it returns a promise.

01:10.190 --> 01:13.370
And we have this resolve parameter here.

01:13.370 --> 01:16.730
And the callback in this promise also returns...

01:16.730 --> 01:19.190
Let's make sure we have the new keyword here.

01:19.190 --> 01:23.120
...also returns a setTimeout call.

01:23.750 --> 01:26.390
And we call this resolve

01:26.390 --> 01:33.260
once the milliseconds have passed on the setTimeout. It looks like this.
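
NOTE
A minimal sketch of the sleep helper described above, assuming Node.js or any
runtime with Promise and setTimeout. The parameter name milliseconds is an
assumption; the transcript does not show the exact name used on screen.
```javascript
// Sketch of the sleep helper: returns a promise that resolves after a delay.
// setTimeout guarantees only a minimum delay, so the wait is approximate.
function sleep(milliseconds) {
  return new Promise((resolve) => setTimeout(resolve, milliseconds));
}
```
Inside an async function, `await sleep(1000)` would pause for roughly one
second before the next line runs.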

01:33.260 --> 01:36.980
And then we can just call it from anywhere in our code.

01:36.980 --> 01:45.230
Let's also try a console.log from the index page, because that actually fires off really fast

01:45.230 --> 01:45.950
as well.

01:46.610 --> 01:54.410
Let's just do it with "request fired" and the index variable here.

01:55.130 --> 01:57.260
And then we can see how fast it fires.

01:57.260 --> 02:01.550
First, without the sleep, you can see it fires off really fast.

02:01.550 --> 02:10.610
And then if we do an await of sleep with one second, or 1000 milliseconds, of waiting, we can see how fast

02:10.610 --> 02:11.510
it goes now.

02:11.870 --> 02:19.520
And you can see now it waits one second between each request, which is quite a lot slower,

02:19.520 --> 02:23.750
but also much less prone to getting you banned from websites.
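
NOTE
A rough sketch of how the sleep call could sit inside a sequential request
loop. The names scrapeSequentially and fakeRequest are hypothetical, and
fakeRequest only stands in for the real HTTP request so the example runs
offline; it is not the code shown in the video.
```javascript
// Same sleep helper as described in the transcript.
function sleep(milliseconds) {
  return new Promise((resolve) => setTimeout(resolve, milliseconds));
}
// Hypothetical stand-in for the real HTTP request (no network access needed).
async function fakeRequest(url) {
  return `<html>${url}</html>`;
}
// Requests each URL in turn and pauses `delayMs` after every request.
async function scrapeSequentially(urls, delayMs = 1000, fetchPage = fakeRequest) {
  const pages = [];
  for (let index = 0; index < urls.length; index++) {
    console.log(`Request fired ${index}`);
    pages.push(await fetchPage(urls[index]));
    await sleep(delayMs); // wait before the next request goes out
  }
  return pages;
}
```
With the default delay of 1000 milliseconds, three URLs would take a little
over three seconds, which is the slowdown the log output makes visible.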

02:24.980 --> 02:32.060
So we're also going to call this sleep function down in our descriptions here, just so we don't risk

02:32.060 --> 02:33.740
getting blocked here as well.

02:36.840 --> 02:37.830
So that's it.

02:37.830 --> 02:46.050
Now you have learned how to scrape a static website with pagination, how to get all of the pages and

02:46.050 --> 02:52.650
get all the URLs from these dynamic objects, the job description URLs, and scrape those as

02:52.650 --> 02:53.220
well.

02:53.310 --> 02:59.460
And you learned how to make sure your requests are being fired sequentially to avoid getting banned

02:59.460 --> 03:06.540
or blocked from websites, and also put a little sleep or waiting time in between each of your requests

03:06.540 --> 03:09.630
to avoid getting banned even further.

03:10.960 --> 03:11.890
So that's it.

03:11.890 --> 03:17.560
Now keep in mind there are some websites that do require JavaScript to actually render the content.

03:17.560 --> 03:22.270
In those cases, we can't just use a request to get the pure HTML.

03:22.270 --> 03:28.420
We do need to have a full browser running instead, which is where you would use something like Puppeteer

03:28.420 --> 03:32.020
to render the website and then get the data from it.

03:32.110 --> 03:38.110
I'm going to show you in another section how to use Puppeteer to scrape websites as well.

03:38.110 --> 03:39.400
So I'll see you there.

03:39.400 --> 03:41.800
And thank you so much for watching this section.

03:41.800 --> 03:43.750
I hope you got a lot out of it.
