WEBVTT

00:00.130 --> 00:05.320
The first thing we need to do is clone my repository with the code.

00:05.330 --> 00:11.780
I'm going to leave a link in the description or the lecture notes where you can get the scraper.

00:12.020 --> 00:17.000
So we go ahead and run git clone followed by the URL.

00:18.310 --> 00:21.070
And then we simply go into the directory.

00:21.070 --> 00:23.650
We open it inside of Visual Studio Code.

00:25.190 --> 00:27.860
And it should be named cos scraper.

00:29.570 --> 00:30.500
There we go.

00:34.650 --> 00:44.040
And then we run npm install, or npm i, to get all the dependencies we have.

00:49.510 --> 00:57.160
And now the first thing you need to do is replace this URL inside of config, inside the

00:57.160 --> 01:02.430
config folder with your own MongoDB connection URL that you made.

01:02.440 --> 01:09.670
So go into mLab, if you use mLab, and create a new database.

01:11.990 --> 01:16.550
So go into mLab, and under MongoDB Deployments,

01:16.580 --> 01:18.140
create a new database.

01:19.390 --> 01:22.720
And call it something like Craigslist cars.

01:24.730 --> 01:27.640
I'll just show you how I made mine.

01:28.780 --> 01:32.020
I have my Craigslist cars database here.

01:32.200 --> 01:35.710
And initially you don't have any collections.

01:36.780 --> 01:40.770
I'll delete mine now, and then go ahead and make a user.

01:40.770 --> 01:42.300
Add a database user.

01:43.720 --> 01:46.180
And then you're ready to go.

01:46.180 --> 01:53.680
So then you have a URL with a user and a password, and then you can replace that URL inside here.
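
NOTE
A minimal sketch of how a connection URL with a user and a password can be put together.
It assumes the standard mongodb:// URL form; buildMongoUrl, the user, password, host, port,
and database name are all hypothetical placeholders, not values from the course repository.
```javascript
// Hypothetical helper: builds a MongoDB connection URL from its parts.
// Every value passed in below is a placeholder for illustration only.
function buildMongoUrl({ user, password, host, port, database }) {
  // Percent-encode the credentials so characters like "@" or ":" stay valid in the URL.
  const auth = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `mongodb://${auth}@${host}:${port}/${database}`;
}
const url = buildMongoUrl({
  user: "scraper",
  password: "p@ss:word",
  host: "example-host.invalid",
  port: 27017,
  database: "craigslist-cars",
});
console.log(url);
```
The resulting string is what would replace the URL in the config folder; most providers show a ready-made URL you can copy instead.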

01:54.450 --> 01:57.960
So fairly basic MongoDB stuff.

01:57.960 --> 02:00.180
If you use another provider, you can do that.

02:00.180 --> 02:02.580
Just replace this URL, of course.

02:03.120 --> 02:11.340
Now, inside the code itself, up here we have the model, and we have request and

02:11.340 --> 02:12.450
Cheerio.

02:12.480 --> 02:18.150
You should know those from previous parts of the course. Then there's the car model itself, which we're

02:18.150 --> 02:20.280
going to save to the database.

02:20.700 --> 02:28.650
It's a basic model with the URL where you can see the car's article or post, the timestamp

02:28.650 --> 02:33.780
the post was made, and the title of the car article itself.

02:34.470 --> 02:38.760
Then we have the URL where I get all of the different cars from.

02:40.700 --> 02:45.620
That is the one I showed here in the other episode.

02:45.930 --> 02:54.770
So here are all the cars we scrape, and then it basically goes and gets the different elements.

02:54.770 --> 03:01.190
I think I've shown that in other sections, so I will not go into too much depth about that.

03:02.150 --> 03:04.100
And then we have

03:05.100 --> 03:07.000
the part that inserts

03:07.020 --> 03:12.120
Craigslist cars into MongoDB, where we go over all of the results.

03:12.120 --> 03:19.710
For this array of results we have from Craigslist, we basically check whether the car is already

03:19.710 --> 03:20.670
in the database.

03:20.670 --> 03:26.430
So if there's already one car with this specific URL, we're not going to save it in the database so

03:26.430 --> 03:28.080
we don't get duplicates.

03:28.560 --> 03:36.840
And then I just fire off all of these requests to MongoDB and wait for the promises to resolve.

03:36.840 --> 03:45.390
And once the promises have resolved, I disconnect from MongoDB and the program exits.
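
NOTE
A minimal sketch of the duplicate check described here: look each car up by URL, save only the
ones not stored yet, and wait for all the promises. FakeCarStore and insertNewCars are hypothetical
names, and the in-memory store stands in for MongoDB; this is not the code from the repository.
```javascript
// In-memory stand-in for the cars collection (hypothetical; the real scraper talks to MongoDB).
class FakeCarStore {
  constructor() {
    this.byUrl = new Map();
  }
  async findOne(url) {
    return this.byUrl.get(url) || null;
  }
  async save(car) {
    this.byUrl.set(car.url, car);
  }
}
// Saves only the cars whose URL is not stored yet and resolves with the number saved.
async function insertNewCars(store, results) {
  const saved = await Promise.all(
    results.map(async (car) => {
      const existing = await store.findOne(car.url);
      if (existing) return false; // already stored, skip it to avoid a duplicate
      await store.save(car);
      return true;
    })
  );
  return saved.filter(Boolean).length;
}
// Usage: the first run stores both cars, the second run stores none.
const store = new FakeCarStore();
const page = [
  { url: "https://example.invalid/car/1", title: "Car 1" },
  { url: "https://example.invalid/car/2", title: "Car 2" },
];
insertNewCars(store, page)
  .then((first) => insertNewCars(store, page).then((second) => console.log(first, second)));
```
Because all lookups start at once, two results with the same URL in one batch could both pass the check; a unique index on the URL field would be a sturdier guard.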

03:45.510 --> 03:49.920
So if you run it with node index.js,

03:50.750 --> 03:52.610
it will say connected to MongoDB.

03:53.780 --> 03:54.980
And.

03:56.960 --> 03:58.580
Let's give it a little while.

03:58.580 --> 04:00.920
And then it says disconnected from MongoDB.

04:01.250 --> 04:05.300
And now if we check inside mLab,

04:09.170 --> 04:11.660
inside the collections,

04:11.710 --> 04:20.510
you can see 120 documents have been created, because we didn't have any cars in the collection before,

04:20.510 --> 04:23.510
but now it's been filled up with the first page.

04:23.810 --> 04:32.900
But if I run it again, it should not create 120 more because we already have the cars inside of the

04:32.900 --> 04:33.890
database.

04:34.910 --> 04:44.840
So if I go ahead and refresh the page, I can see we still have 120 documents, because no new cars have been

04:44.840 --> 04:47.750
added to this page yet

04:47.750 --> 04:50.180
in the one second I just waited.

04:50.600 --> 04:57.980
So instead we wait ten minutes before running the scraper again on Heroku, and then we gradually

04:57.980 --> 04:59.720
add new cars that we.

05:01.390 --> 05:10.680
So in the next section, let's see how you can deploy to Heroku and how you can run it periodically.

05:10.690 --> 05:17.200
But remember to replace this URL with your own, or else it's not going to work.

05:17.200 --> 05:19.240
And then I'll see you in the next section.
