WEBVTT

00:00.170 --> 00:00.920
Hello, everyone.

00:00.920 --> 00:07.190
In this section we're going to look at how to deploy a scraper to Google Cloud Platform and how to run

00:07.190 --> 00:09.680
it at a specific interval.

00:10.550 --> 00:17.480
So again, I have the same scraper that we introduced in a previous section, except that this time

00:17.480 --> 00:20.150
I have added a little REST API.

00:21.410 --> 00:28.580
So this means that we are going to run this scraper by doing a GET request on just the base URL here.

00:29.510 --> 00:35.690
So if I go ahead inside of Postman and I just run a GET request

00:36.600 --> 00:37.700
on the base URL,

00:38.730 --> 00:48.210
it's going to return "cars scraped", and it has saved the cars, or the Craigslist cars, into the MongoDB

00:48.210 --> 00:54.390
database right here, just like in our Heroku app. Except that in our Heroku app,

00:54.390 --> 01:03.780
we just had a Node.js application, and Heroku set up a cron job to run this Node.js application at a specific

01:03.780 --> 01:04.560
interval.

01:04.770 --> 01:12.060
The way it works inside of Google Cloud Platform is that you set up a cron.yaml file where you specify

01:12.060 --> 01:19.950
a specific URL that you want this cron job to do a GET request on, and then you specify a schedule for

01:19.950 --> 01:20.340
it.
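
NOTE
A minimal sketch of what such a cron.yaml might look like, assuming the App Engine cron format.
The description text and the "every 24 hours" schedule are illustrative assumptions; the video does not state the actual values.
```yaml
cron:
# Hypothetical job: send a GET request to the base URL once a day.
- description: "run the car scraper (illustrative)"
  url: /
  schedule: every 24 hours
```
The url field is the path that the scheduled GET request hits, so it has to match the endpoint that starts the scraper.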

01:23.160 --> 01:31.500
And this cron job is then going to do a GET request on a URL, which is just the base URL here.

01:31.740 --> 01:38.220
And then I set up a GET endpoint on this base URL using an Express REST API.

01:38.790 --> 01:44.160
And I simply initiate the main scraping function in here.
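
NOTE
A rough sketch of the endpoint described above, not the code from the video.
The names scrapeCars, createScrapeHandler and registerRoutes, the placeholder data, and the response text are all assumptions made for illustration.
```javascript
// Hypothetical stand-in for the course's main scraping function.
async function scrapeCars() {
  return [{ title: "example car" }]; // placeholder data, not real scraper output
}
// Builds the GET handler that the scheduled request will hit.
function createScrapeHandler(scrape) {
  return async function handleScrape(req, res) {
    try {
      const cars = await scrape();
      res.status(200).send(`cars scraped: ${cars.length}`);
    } catch (err) {
      // A non-2xx status lets the scheduler record the run as failed.
      res.status(500).send("scrape failed");
    }
  };
}
// Registers the handler on the base URL of any Express-style app object.
function registerRoutes(app, scrape) {
  app.get("/", createScrapeHandler(scrape));
}
// Wiring with Express (assumes the `express` package is installed):
//   const express = require("express");
//   const app = express();
//   registerRoutes(app, scrapeCars);
//   app.listen(process.env.PORT || 8080);
```
Keeping the handler separate from the Express wiring means it can be exercised with a fake app object, without starting a server.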

01:45.260 --> 01:53.120
So it's exactly the same as in our previous section, or the original scraper I created and deployed to Heroku.

01:53.210 --> 01:59.330
But now we have just a GET endpoint where we initiate the scraper instead.

02:02.260 --> 02:09.190
And that is because the only way that you can schedule a cron job inside of Google Cloud

02:09.190 --> 02:15.550
Console, or on a Google Cloud server, is by setting up this cron job.

02:15.550 --> 02:19.540
And the cron job will then do a GET request on a specific URL.

02:19.900 --> 02:24.040
You can't make the cron job just run a specific command.

02:24.070 --> 02:27.160
You have to point it at a URL instead.

02:28.600 --> 02:32.230
Now, deploying it is quite easy.

02:32.260 --> 02:34.990
You just go ahead into the console.

02:35.350 --> 02:39.940
And first of all, you run the app deploy command

02:40.480 --> 02:46.660
to deploy your app, that is, gcloud app deploy, to deploy your Node.js app.

02:47.020 --> 02:56.380
And then you run it a second time, adding the cron.yaml file, and that is going to run this, and then

02:56.380 --> 03:03.310
you will see that it's running some kind of cron job up here, and then it will give you a URL where you

03:03.310 --> 03:07.480
can view the tasks, or the cron jobs.

03:08.650 --> 03:15.470
And then inside this URL here you can see the cron job you have set up, with the frequency and when

03:15.470 --> 03:20.980
it was last run, and you can also set it to run now if you want to do that.

03:22.530 --> 03:28.800
So that is how you deploy a periodic scraper to Google App Engine.

03:29.070 --> 03:33.190
And I hope you got something out of this and you can use it.

03:33.210 --> 03:39.600
One more thing I forgot to say is that you can check out the code I made, of course, on the Google

03:39.600 --> 03:44.580
deploy branch inside of this car scraper code.

03:44.580 --> 03:51.390
So go ahead and clone the repository and switch over to the Google deploy branch and you can see all

03:51.390 --> 03:52.080
of the code.