WEBVTT

00:03.830 --> 00:09.710
In this section, we are going to learn how to start up the Chromium browser with Puppeteer and how

00:09.710 --> 00:11.720
to make it visit a website.

00:11.750 --> 00:13.070
It's quite easy.

00:13.990 --> 00:17.940
First we are going to make a new file in the project folder.

00:17.950 --> 00:20.320
Let's call it index.js.

00:21.860 --> 00:27.950
Now, inside the file, let's import Puppeteer using require at the top of the file.

00:28.340 --> 00:33.080
So, const puppeteer = require('puppeteer').

00:35.510 --> 00:38.480
And then let's make an async function.

00:41.060 --> 00:46.430
Async/await is one of the newer and really cool features of JavaScript.

00:46.460 --> 00:50.450
I'm going to leave a link in the lecture content so you can read more about it.

00:50.450 --> 00:56.030
If you're not already familiar with this feature, or if you want to read more about the difference between

00:56.030 --> 00:58.880
using .then() chains and async/await.
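
NOTE
As a minimal sketch of the difference mentioned here, the same logic is written once with then and once with async/await. The fetchTitle helper and the value it resolves with are hypothetical stand-ins for any Promise-returning call.
```javascript
// Hypothetical helper: resolves with a string after a short delay.
function fetchTitle() {
  return new Promise((resolve) => setTimeout(() => resolve("Example title"), 10));
}
// Style 1: chaining a callback with then.
function withThen() {
  return fetchTitle().then((title) => title.toUpperCase());
}
// Style 2: the same logic with async/await.
async function withAwait() {
  const title = await fetchTitle();
  return title.toUpperCase();
}
```
Both functions return a Promise that resolves to the same value. The async/await version simply reads from top to bottom.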

01:01.850 --> 01:08.030
So the first thing we need to do is to tell Puppeteer to initialize, or open up, the Chromium browser.

01:08.030 --> 01:10.220
And doing this is really easy.

01:10.640 --> 01:17.630
We just write const browser = await puppeteer.launch().

01:17.750 --> 01:22.850
And then we pass in an option called headless, which we set to false.

01:23.940 --> 01:30.060
So now we have the browser inside this browser variable and we can access our browser and control it

01:30.060 --> 01:31.200
from this variable.

01:31.810 --> 01:35.890
So notice that we're using the headless: false option here.

01:36.190 --> 01:42.520
This means that the browser will not be hidden from you. If you try to launch it without the headless

01:42.520 --> 01:43.390
false option,

01:43.390 --> 01:45.400
the browser is just going to be hidden.

01:47.080 --> 01:48.070
Running the browser

01:48.070 --> 01:49.330
headless is great

01:49.330 --> 01:55.480
if you have already developed your scraper and you are running it on a server without a graphical interface.

01:55.720 --> 02:02.560
But now, while we are developing and debugging our scraper, it's a big advantage to see what

02:02.560 --> 02:07.690
Puppeteer is doing in the browser, because this way we can debug it more easily.
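
NOTE
One way to switch between the two modes is sketched below. The buildLaunchOptions helper and its development flag are hypothetical names used for illustration only. The headless option itself is the one described in the lecture.
```javascript
// Hypothetical helper: builds the options object passed to puppeteer.launch().
// The "development" flag is an assumption made for this illustration.
function buildLaunchOptions(development) {
  // headless: false shows the browser window; headless: true hides it.
  return { headless: !development };
}
// Usage, assuming Puppeteer is installed:
//   const browser = await puppeteer.launch(buildLaunchOptions(true));
console.log(buildLaunchOptions(true));
console.log(buildLaunchOptions(false));
```
While developing you would pass true to see the window, and on a server without a graphical interface you would pass false.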

02:09.700 --> 02:14.260
Now on to the next step, which is to open a new page or a new tab.

02:14.260 --> 02:17.320
And then we are going to actually visit a URL.

02:18.320 --> 02:18.800
Again,

02:18.800 --> 02:20.140
it is super easy.

02:20.150 --> 02:26.930
You just write const page = await browser.newPage().

02:28.190 --> 02:35.170
Here we use the browser variable to create a new page, stored in the page variable, which we can also control.

02:35.180 --> 02:38.930
And now we can finally make the page go to a URL.

02:39.110 --> 02:47.600
So we say await page.goto, and let's try out https://google.com.
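
NOTE
The steps described so far can be sketched as one function. This is not necessarily the exact code shown on screen. The openSite name is hypothetical, and passing the Puppeteer module in as a parameter is an assumption made so the flow can be exercised without a real browser. In index.js you would instead put const puppeteer = require("puppeteer") at the top of the file.
```javascript
// Sketch of the launch, new page, and goto steps from the lecture.
async function openSite(puppeteer, url) {
  // Start Chromium with a visible window.
  const browser = await puppeteer.launch({ headless: false });
  // Open a new tab in that browser.
  const page = await browser.newPage();
  // Navigate the tab to the given URL.
  await page.goto(url);
  return { browser, page };
}
// Usage, assuming Puppeteer is installed:
//   openSite(require("puppeteer"), "https://google.com");
```
Returning the browser and page objects lets later code keep controlling the same tab.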

02:49.390 --> 02:54.970
Now let's try to run this script and see whether we end up on google.com.

02:55.240 --> 02:59.800
So node, and then the file, index.js.

03:01.390 --> 03:09.340
So Puppeteer is now starting up the Chromium browser and going to the URL, and here we are on google.com.

03:10.830 --> 03:11.790
So there you go.

03:11.790 --> 03:18.270
That is how you instantiate a new Puppeteer-controlled Chromium browser, how you create a new tab or

03:18.270 --> 03:21.180
a new page, and then make it visit a URL.

03:21.840 --> 03:27.090
In the next section, we are going to take a closer look at the website we're scraping, the Craigslist

03:27.090 --> 03:30.960
website, and at what kind of data we are trying to scrape.