WEBVTT

00:00.080 --> 00:06.980
Notice how much faster the tests run when we do this, compared with hitting the site itself

00:06.980 --> 00:08.810
while writing out our scraper.

00:09.140 --> 00:10.940
And at the same time,

00:11.650 --> 00:20.710
Craigslist has no idea that we are building a scraper for their site, and that's going to greatly

00:20.710 --> 00:23.330
reduce the likelihood of us getting banned.

00:23.350 --> 00:28.810
Actually, we can't get banned because we are not hitting Craigslist at all at the moment.

00:29.350 --> 00:36.950
Once you have written your scraper, you can run it at fixed intervals instead,

00:36.970 --> 00:39.580
maybe only once an hour or once a day.

00:39.580 --> 00:44.050
Then your likelihood of getting banned is quite small,

00:44.050 --> 00:49.540
as long as the scraper is not pulling a huge amount of data at once.

00:52.070 --> 00:59.690
You also have a safety net that confirms your scraper is working as it should, so you can keep

00:59.690 --> 01:04.250
making changes to the code and verify that it still works.

01:10.940 --> 01:17.900
For example, if I want to make a separate function for getting each of the details, I can do

01:17.900 --> 01:18.440
that.

01:18.530 --> 01:27.260
So we can write a function, for example getDatePosted, where we pass in the jQuery-style $ variable

01:27.260 --> 01:28.490
and the element.

01:29.450 --> 01:32.660
Then we simply return this expression.

01:34.390 --> 01:35.950
So I can say return.

01:37.710 --> 01:38.730
This one here.

01:38.730 --> 01:42.480
And then down here, we call getDatePosted.

01:43.430 --> 01:50.300
I pass in the two variables here, and I can see that all of my tests are still passing.

01:50.300 --> 01:57.410
So I have the security of knowing that my code still works, even though I just did this refactor.

01:57.950 --> 02:06.770
That lets you make changes to your code with confidence and keeps it easier to maintain.

02:06.800 --> 02:09.680
Let's also make one for getting the hood, the neighborhood.

02:09.680 --> 02:12.320
We write basically the same thing.

02:13.690 --> 02:17.170
Then we cut out the code that gets the hood.

02:20.110 --> 02:29.560
We return this one here, and then we call getHood and pass in the dollar sign and the element.

02:30.160 --> 02:37.420
All of my tests are still passing, so I know the code still works.
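
NOTE
A minimal sketch of the two extracted helpers, assuming the scraper uses a jQuery-like library such as Cheerio for $. The selectors and the makeStub$ stand-in are hypothetical; the actual selectors from the course code are not shown in the video.
```javascript
// Hypothetical helpers extracted from the scraper loop. The selectors
// "time.result-date" and ".result-hood" are assumptions, not the course code.
// `$` is a Cheerio-style query function and `element` is one listing row.
function getDatePosted($, element) {
  return $(element).find("time.result-date").attr("datetime");
}
function getHood($, element) {
  // Trim the whitespace around the neighborhood text.
  return $(element).find(".result-hood").text().trim();
}
// Hypothetical stand-in for Cheerio so this sketch runs without dependencies.
// It maps a row id and a selector to a fake node with text and attributes.
function makeStub$(rows) {
  return (element) => ({
    find: (selector) => {
      const node = rows[element][selector];
      return {
        attr: (name) => node.attrs[name],
        text: () => node.text,
      };
    },
  });
}
const $ = makeStub$({
  row1: {
    "time.result-date": { text: "", attrs: { datetime: "2019-01-01 12:00" } },
    ".result-hood": { text: " (downtown) ", attrs: {} },
  },
});
console.log(getDatePosted($, "row1")); // prints 2019-01-01 12:00
console.log(getHood($, "row1")); // prints (downtown)
```
Because each helper receives $ and the element as parameters, the tests can keep feeding it saved HTML while the scraper code is refactored.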

02:38.920 --> 02:47.320
And I have to tell you, in the long run that makes your code a lot nicer and easier

02:47.320 --> 02:53.710
to maintain than code that you build out and then suddenly don't want to touch

02:53.710 --> 02:56.770
because you don't know whether you're going to break something.

02:57.100 --> 03:01.450
So there are a lot of positive

03:02.090 --> 03:07.280
effects, or side effects, of doing TDD.

03:07.790 --> 03:13.670
The only negative, if you can call it a negative, is that it does take extra work.

03:13.700 --> 03:17.480
We did have to write all of these tests.

03:17.510 --> 03:18.200
And we had to

03:18.230 --> 03:23.990
figure out a way to read this data without hitting the actual site.

03:24.830 --> 03:28.730
So you have to do some extra work when you write tests.

03:28.730 --> 03:34.280
But it usually does pay off in the end.

03:36.000 --> 03:42.240
So that's been a short introduction to test-driven development and how to build a scraper

03:42.240 --> 03:45.270
without actually hitting the site all of the time.

03:45.990 --> 03:52.110
If you found this interesting, let me know if you'd like to learn more about how to

03:52.110 --> 03:53.100
do this.

03:53.700 --> 03:56.430
I'll be happy to make more examples.

03:56.430 --> 04:00.270
So see you, and I hope you enjoy the rest of the course.
