WEBVTT

00:00.140 --> 00:07.280
So I got a question about how to deploy a puppeteer web scraper inside a Heroku dyno, which turns out

00:07.280 --> 00:11.390
to be a little more complicated than I originally thought it would be.

00:11.420 --> 00:18.080
Now, we already covered a section where I show you how to deploy a scraper on Heroku or Google Cloud.

00:18.080 --> 00:23.260
And this works fine for a regular web scraper with Cheerio and Request.

00:23.270 --> 00:30.170
But if we start using Puppeteer, then you need to add a so-called buildpack to your Heroku dyno.

00:30.200 --> 00:34.970
This is beginning to sound terribly complicated already, but don't worry.

00:34.970 --> 00:40.970
I'm going to show you step by step how to deploy a puppeteer web scraper to Heroku.

00:42.180 --> 00:42.630
Okay.

00:42.660 --> 00:51.030
Step one is to get my GitHub repository with the minimal puppeteer project and go ahead and clone this

00:51.030 --> 00:52.350
git repository.

00:53.110 --> 00:58.540
So let's go inside the console and say git clone and put in the URL.

00:59.950 --> 01:04.180
And then let's open up this project inside Visual Studio Code.

01:04.900 --> 01:11.370
So now that we have the project open, let me just give you a brief intro to how the project is.

01:11.380 --> 01:17.920
So we have this very simple file here, index.js, which has the Puppeteer dependency.

01:17.950 --> 01:23.500
We set headless to be true and we pass in this --no-sandbox argument.

01:23.530 --> 01:28.660
It's very important that you have the no sandbox argument or else it is not going to work.

01:29.140 --> 01:31.870
And then we go to the example.com page.

01:31.870 --> 01:38.740
We simply get the text from the page and then we do a console output to make sure that our puppeteer

01:38.740 --> 01:42.700
browser is working as it should be inside of Heroku.
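
NOTE
A minimal sketch of what the index.js described above might look like. It is not copied from the repository.
The names launchOptions and main, the https:// URL, and reading document.body.innerText are assumptions;
only headless: true, the --no-sandbox argument, example.com, and the console output come from the video.
```javascript
// Launch options as described in the video: headless mode plus --no-sandbox.
const launchOptions = {
  headless: true,
  args: ["--no-sandbox"],
};
async function main() {
  // Loaded inside main so that a missing dependency is reported by the catch below.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch(launchOptions);
  try {
    const page = await browser.newPage();
    await page.goto("https://example.com");
    // Assumption: "the text from the page" means the visible text of <body>.
    const text = await page.evaluate(() => document.body.innerText);
    console.log(text); // this is what should show up in the Heroku logs
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}
main().catch((err) => console.error("Scraper failed:", err));
```
If the buildpack is missing, the launch call is the most likely place for this sketch to fail, and the error lands in the logs.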

01:43.090 --> 01:49.360
So with that being said, let's go ahead inside of Heroku and create a new app that we can deploy this

01:49.360 --> 01:50.200
app to.

01:50.740 --> 01:54.450
Now inside Heroku, let's go ahead and make a new app.

01:54.460 --> 01:55.870
This is step two.

01:57.100 --> 02:01.870
So we say here I will call mine Puppeteer Udemy.

02:02.900 --> 02:06.680
And let's say region is Europe because that's where I'm located.

02:07.430 --> 02:13.460
Now we need to use the Heroku CLI in order to add this repository.

02:13.460 --> 02:22.820
So we go and take the existing git repository and we simply copy this command and go inside of the repository

02:23.030 --> 02:28.130
inside the terminal and we add it to our Heroku app.

02:29.000 --> 02:30.110
Just like that.

02:30.140 --> 02:32.840
Now we set the remote to be Heroku.

02:32.870 --> 02:40.400
Now we need to publish or push this app so we can say git push heroku master.

02:40.430 --> 02:46.040
Now, it is important that you do this push first before we add the buildpacks.

02:46.040 --> 02:53.840
Because I found out by testing myself that if I add the buildpack first for Puppeteer and then push

02:53.840 --> 03:00.020
up to Heroku, I simply don't get any output in my console, so it doesn't seem like it's running at

03:00.020 --> 03:00.700
all.

03:00.710 --> 03:06.440
So it's important we first do this push where we should be getting an error when we run it.

03:06.440 --> 03:13.190
So let's go inside a new terminal here and let's say heroku logs

03:14.130 --> 03:14.700
--tail.

03:14.700 --> 03:17.700
And then -a, and we call it puppeteer

03:18.090 --> 03:21.390
udemy, which is our Heroku app name.

03:25.620 --> 03:30.540
Okay, so now it's saying deployed to Heroku, and

03:31.760 --> 03:36.070
let's see inside the console. Now we see lots of errors here.

03:36.080 --> 03:40.820
So these errors come up because we don't have the buildpack added.

03:40.970 --> 03:47.330
But remember that you need to do this push first before we add the buildpack or else you're not going

03:47.330 --> 03:50.270
to get any console output at all.

03:50.270 --> 03:51.650
It's weird.

03:51.650 --> 03:58.250
It doesn't make sense, but I've tested this multiple times and this is what seems to happen every time.

03:58.250 --> 04:03.620
If I add the buildpack right away before pushing first.

04:06.090 --> 04:12.630
So with that, now let's go and add the buildpack and then let's do a little commit and push again.

04:12.660 --> 04:16.080
So let's add the Puppeteer buildpack here.

04:16.380 --> 04:18.600
It's called this one.

04:18.600 --> 04:21.180
jontewks, whatever.

04:21.180 --> 04:28.980
And then we say -a and have our Heroku app name, udemy.

04:29.070 --> 04:30.030
Just like that.

04:32.390 --> 04:39.370
Okay, so now I've added the buildpack and then we can run git push heroku master.

04:39.380 --> 04:43.730
I think I need to make a little commit before I can push.

04:44.090 --> 04:45.620
Yeah, I need to make a little commit.

04:45.620 --> 04:46.430
So let's do that.

04:46.430 --> 04:48.170
I just add a.

04:50.530 --> 04:52.180
Space like that.

04:52.180 --> 04:53.800
And let's go ahead.

04:54.590 --> 04:56.210
And commit that.

04:58.680 --> 05:00.060
With Buildpack.

05:02.420 --> 05:06.350
And then we can say git push heroku master.

05:09.350 --> 05:12.590
Okay, so now it is building the project again.

05:12.800 --> 05:17.210
This time with the buildpack added.

05:17.210 --> 05:22.430
So it's going to take a little more time because it is quite a big pack that we're adding.

05:22.430 --> 05:29.870
But now it should be running and giving us the results inside of the console.

05:29.870 --> 05:34.850
So we know for sure that now the puppeteer scraper inside Heroku is running.

05:38.330 --> 05:43.730
Okay, so now our build has finished and it's deployed to Heroku.

05:43.760 --> 05:50.570
So if we go ahead and check inside the other terminal where we had the Heroku logs running, we can

05:50.570 --> 05:54.920
now see the whole of the site that we decided to scrape.

05:55.280 --> 06:02.090
So just like that, that's how we deploy a puppeteer web scraper to Heroku.
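
NOTE
The order of the steps in this video matters, so here is a small sketch that only prints the commands in that order.
deploySteps is a hypothetical helper. The app name "puppeteer-udemy" and the buildpack "jontewks/puppeteer" are
assumptions based on what is said in the video, so substitute your own values. The video makes the extra commit
by adding a space to a file; this sketch swaps in git commit --allow-empty for the same purpose.
```javascript
// Hypothetical helper: returns the CLI steps from the video, in order.
function deploySteps(appName, buildpack) {
  return [
    `heroku git:remote -a ${appName}`, // point the "heroku" remote at the app
    "git push heroku master", // first push, before any buildpack is added
    `heroku buildpacks:add ${buildpack} -a ${appName}`, // now add the Puppeteer buildpack
    'git commit --allow-empty -m "with buildpack"', // a new commit so there is something to push
    "git push heroku master", // rebuild, this time with the buildpack
  ];
}
// Assumed values; nothing is executed, the steps are only printed.
const steps = deploySteps("puppeteer-udemy", "jontewks/puppeteer");
steps.forEach((step, i) => console.log(`${i + 1}. ${step}`));
```
In a second terminal, heroku logs --tail -a with your app name shows the errors after the first push and the scraped output after the second.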
