WEBVTT

00:01.450 --> 00:02.410
Okay, everyone.

00:02.410 --> 00:09.070
So now we are getting the number of guests that each room can accommodate inside of the listings.

00:09.100 --> 00:13.120
We also want to get the bedrooms, the beds and the baths.

00:13.270 --> 00:19.270
And the way we're going to do that is pretty much the same method we're using here.

00:19.690 --> 00:25.150
So if you want to go ahead and do that yourself, get some practical exercise out of it.

00:25.150 --> 00:28.870
I recommend that you pause the video and try to do that.

00:28.900 --> 00:36.040
One thing to keep in mind before you do it is that I made a mistake here: if this regular expression up here

00:36.040 --> 00:41.320
doesn't find any matches, this value becomes null.

00:41.320 --> 00:48.640
It's not an array with a length of zero, so you should check that it is not null instead of checking the length.
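
NOTE A minimal JavaScript illustration of the null check described above. The listing strings are made up and were not copied from Airbnb. String.prototype.match with a non-global regex returns null when nothing matches, so reading .length on the result would throw.
```javascript
// Hypothetical listing snippets, for illustration only.
const withGuests = "4 guests · 2 bedrooms · 3 beds · 1 bath";
const withoutGuests = "Studio · 1 bed · 1 bath";
// A non-global match returns an array on success...
const hit = withGuests.match(/\d+ guests?/);
// ...and null (not an empty array) when there is no match.
const miss = withoutGuests.match(/\d+ guests?/);
console.log(hit[0]); // prints: 4 guests
console.log(miss); // prints: null
// So check for null rather than length; miss.length would throw a TypeError.
console.log(miss !== null); // prints: false
```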

00:49.120 --> 00:55.420
Okay, so if you want to do this exercise yourself, go ahead and pause the video now.

00:57.320 --> 01:05.360
And now, for the rest of you who are lazy or just want to see how it's done, I'll go ahead and do it

01:05.360 --> 01:08.780
now, and you can follow along with me, of course.

01:09.170 --> 01:14.540
So we will write a function called returnMatches.

01:15.590 --> 01:19.310
Because what we're doing is so similar to this.

01:20.000 --> 01:26.990
And so we pass in the room text and the regular expression we're going to test, and we get the regular

01:26.990 --> 01:35.540
expression matches by running match on the room text with the regular expression.

01:35.540 --> 01:44.570
And by default, we're going to set the result here to N/A, not applicable.

01:45.860 --> 01:48.760
Just going to make the text a bit bigger for you.

01:49.930 --> 02:02.260
And then we check if the regex matches are not null, and if they're not null,

02:02.260 --> 02:06.520
we set the result to be the first match.

02:08.180 --> 02:15.410
And if the regex matches are null, then something is wrong, or our regex

02:15.440 --> 02:18.050
is not working the way we want it to.

02:18.080 --> 02:28.250
So we throw a "no regex matches" exception, and we say "No regex matches found" for the regex

02:28.250 --> 02:30.140
we're trying to use.

02:30.530 --> 02:34.280
So we can debug it later and look at the text and so on.

02:34.640 --> 02:38.060
And finally we will return the result.
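
NOTE Here is a sketch of the returnMatches helper as the narration describes it, in plain JavaScript. The code in the video may differ. The variable names and the format of the error message are assumptions.
```javascript
// Hypothetical reconstruction of the helper described in the narration.
function returnMatches(roomText, regex) {
  // match() returns an array of matches, or null if the regex finds nothing
  const regExMatches = roomText.match(regex);
  let result = "N/A"; // default from the narration: not applicable
  if (regExMatches !== null) {
    result = regExMatches[0]; // the first (whole) match
  } else {
    // Fail loudly so the faulty regex can be debugged later
    throw new Error(`No regex matches found for: ${regex}`);
  }
  return result;
}
console.log(returnMatches("2 guests · 1 bedroom", /\d+ bedroom/)); // prints: 1 bedroom
```
The else branch throws, so callers only ever get a real match or an exception. The "N/A" default is never actually returned.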

02:39.530 --> 02:40.280
Okay.

02:41.220 --> 02:50.820
And so we're going to go ahead and delete all this and we'll just say return matches, pass in the room

02:50.820 --> 02:55.470
text and the regular expression, which I just deleted.

02:55.710 --> 03:00.090
And that was the \d+ guest pattern.

03:02.000 --> 03:02.840
All right.

03:04.180 --> 03:08.470
And actually it's supposed to be guests allowed, not guest match.

03:09.610 --> 03:15.580
And yeah, now we can go ahead and write the one for the bedrooms.

03:17.110 --> 03:18.430
Return matches.

03:19.210 --> 03:20.590
Room text.

03:21.700 --> 03:24.530
And we pass in a number.

03:24.550 --> 03:29.980
I expect it to be a number followed by "bedroom". Then I paste.

03:32.760 --> 03:40.650
Return matches again, and then a number and "bath".

03:42.850 --> 03:43.750
Then we have beds.

03:43.780 --> 03:46.330
Also return matches.

03:47.230 --> 03:48.340
Room Text.

03:53.330 --> 03:53.690
Bed.

03:54.530 --> 03:55.090
Yeah.

03:55.880 --> 04:05.120
And then, instead of having these console logs down here, I want to return an object with all of

04:05.120 --> 04:08.540
the values we got here.
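
NOTE This sketch collects every value into one object instead of logging each one. parseRoomText and the sample text are hypothetical. The patterns add \b so that "bed" does not match inside "bedroom". That refinement is not in the narration.
```javascript
// Returns the first match of regex in roomText, or throws if there is none.
function returnMatches(roomText, regex) {
  const regExMatches = roomText.match(regex);
  if (regExMatches === null) {
    throw new Error(`No regex matches found for: ${regex}`);
  }
  return regExMatches[0];
}
// Hypothetical parser: returns all values for one listing as a single object.
function parseRoomText(roomText) {
  const guestsAllowed = returnMatches(roomText, /\d+ guests?\b/);
  const bedrooms = returnMatches(roomText, /\d+ bedrooms?\b/);
  const baths = returnMatches(roomText, /\d+ baths?\b/);
  // \b stops "2 bedrooms" from being read as "2 bed"
  const beds = returnMatches(roomText, /\d+ beds?\b/);
  return { guestsAllowed, bedrooms, baths, beds };
}
const result = parseRoomText("4 guests · 2 bedrooms · 3 beds · 1 bath");
console.log(result);
```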

04:13.480 --> 04:14.050
That's.

04:16.140 --> 04:21.360
And then instead I want to say const result.

04:22.110 --> 04:22.800
Here.

04:22.800 --> 04:33.150
So the object returned from the method here comes down here, and we want to console log that

04:33.150 --> 04:33.930
instead.

04:36.960 --> 04:38.760
And let's see.

04:38.760 --> 04:46.140
I also want to display the error if an exception is thrown, so we can

04:46.140 --> 04:50.280
see which regular expression is at fault here.

04:50.280 --> 04:59.160
And I also want to see the URL, so we can go and check the page that's failing,

04:59.910 --> 05:04.530
and I'll put that above the error.

05:04.530 --> 05:12.630
And another thing: oh, I forgot to put the price per night in my values.

05:12.630 --> 05:16.320
I think that's pretty important.

05:17.310 --> 05:17.820
Okay.

05:17.820 --> 05:28.200
Besides the URL, I also want to show the room text if I get an exception.

05:28.200 --> 05:36.300
So I write console.error with the room text, and now the room text,

05:36.330 --> 05:37.770
we need to...

05:39.710 --> 05:42.380
Actually, I forgot to have my selector here.

05:42.950 --> 05:44.450
That's a mistake.

05:45.350 --> 05:47.810
You probably have it in your

05:49.160 --> 05:49.760
code.

05:49.760 --> 05:50.990
That's just my mistake.

05:50.990 --> 05:52.100
Then I deleted it.

05:52.550 --> 05:56.810
But we actually need to move this room text variable

05:57.610 --> 06:02.350
to the outer scope so we can display it

06:02.710 --> 06:05.560
Display it inside of the catch here.
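
NOTE This sketch shows the error logging described above. roomText is declared outside the try block so the catch block can still print it. scrapeListing, the getRoomText callback (a stand-in for the Puppeteer and selector code) and the example URL are all hypothetical. Returning null on failure is an assumption.
```javascript
// Returns the first match of regex in roomText, or throws if there is none.
function returnMatches(roomText, regex) {
  const regExMatches = roomText.match(regex);
  if (regExMatches === null) {
    throw new Error(`No regex matches found for: ${regex}`);
  }
  return regExMatches[0];
}
function scrapeListing(url, getRoomText) {
  // Declared in the outer scope so the catch block can log it.
  let roomText = "";
  try {
    roomText = getRoomText();
    const baths = returnMatches(roomText, /\d+ baths?/);
    return { url, baths };
  } catch (err) {
    console.error(url); // which page failed
    console.error(roomText); // the text the regex was run against
    console.error(err); // the message names the regex at fault
    return null;
  }
}
const url = "https://example.com/rooms/1"; // hypothetical URL
console.log(scrapeListing(url, () => "2 guests · 1 bath"));
console.log(scrapeListing(url, () => "2 guests · 1 shared bath"));
```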

06:09.120 --> 06:09.960
Like that.

06:13.360 --> 06:15.850
And let's see how it goes.

06:35.090 --> 06:35.930
Okay.

06:35.940 --> 06:40.230
roomText.match is not a function.

06:40.230 --> 06:43.230
Yeah, that's because I need to call .text() on it.

06:43.950 --> 06:45.630
Sorry, guys.

06:47.630 --> 06:51.080
That is probably only my code that has that error.

06:51.350 --> 06:56.000
I don't know why I missed the room text definition.

07:04.620 --> 07:06.850
Okay, so let's see.

07:06.870 --> 07:08.310
One bedroom, one bed,

07:08.340 --> 07:09.240
one bath.

07:10.050 --> 07:13.050
Yeah, that seems to match what we have.

07:14.130 --> 07:18.300
And okay, so we get an error here.

07:18.660 --> 07:23.410
You see, we get "No regex matches found" for bath.

07:23.430 --> 07:32.730
So if we go ahead and look at this listing, we see it says "1 shared bath".

07:33.570 --> 07:48.930
So the regex for a number, a space and "bath" is going to fail on this example because, well, it's

08:48.930 --> 08:51.120
looking for a number directly followed by "bath".
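
NOTE A quick check of why the bath pattern fails, using hypothetical listing text. /\d+ bath/ needs the digit and the space to be followed directly by "bath", so the word "shared" in between breaks the match.
```javascript
// The original pattern: digits, one space, then "bath".
const bathRegex = /\d+ bath/;
console.log(bathRegex.test("1 bath")); // prints: true
console.log(bathRegex.test("1 shared bath")); // prints: false
```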

07:52.320 --> 07:57.630
And, well, you can make "shared" into an optional word instead.

07:57.960 --> 08:01.710
Let's try and copy this.

08:06.150 --> 08:06.950
Let's see.

08:06.960 --> 08:07.680
Let's see.

08:12.140 --> 08:13.700
Am I getting it all?

08:13.700 --> 08:15.530
Is this all of it?

08:19.560 --> 08:20.340
Let's see.

08:21.210 --> 08:23.880
It looks like it's missing the values we want.

08:23.910 --> 08:25.440
I can't really see those.

08:31.950 --> 08:32.160
Yeah.

08:32.160 --> 08:32.460
Okay.

08:32.460 --> 08:33.270
It's up here.

08:34.750 --> 08:37.150
So, uh, let me see.

08:39.310 --> 08:41.650
So we have baths, right?

08:42.070 --> 08:47.590
And that does not match this one.

08:47.590 --> 08:49.630
So maybe if we put in

08:50.460 --> 08:52.530
"shared" as an optional word,

08:52.830 --> 08:54.030
it can work.

08:54.030 --> 08:59.160
So I put the word in parentheses with a question mark after it.

09:00.960 --> 09:02.600
Let's see if it detects the text,

09:02.610 --> 09:03.350
1 shared bath.

09:03.360 --> 09:04.530
It doesn't.

09:05.160 --> 09:07.830
Maybe if I put the question mark before.

09:10.640 --> 09:11.660
Uh, come on. Question

09:11.660 --> 09:12.020
mark

09:12.170 --> 09:12.950
after.

09:16.070 --> 09:19.640
Oh yeah, that's because I also have a space in front of the group,

09:19.640 --> 09:21.320
so there are two spaces.

09:21.320 --> 09:26.090
So if I put the space inside the group instead, it's going to detect both of them.
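
NOTE A sketch of the fix being described. It assumes the final pattern is /\d+ (shared )?bath/, because the exact regex typed in the video is not visible in this transcript. With the space inside the optional group, the pattern needs only one space when "shared" is missing. twoSpaceRegex is a hypothetical version of the mistake.
```javascript
// Space inside the optional group: "shared " is either fully present or absent.
const bathRegex = /\d+ (shared )?bath/;
// Space outside the group demands two spaces when "shared" is missing.
const twoSpaceRegex = /\d+ (shared)? bath/;
for (const text of ["1 bath", "1 shared bath"]) {
  console.log(text, bathRegex.test(text), twoSpaceRegex.test(text));
}
// prints: 1 bath true false, then: 1 shared bath true true
```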

09:26.360 --> 09:34.700
And now we're getting into, I would say, more advanced regular expressions. Of course, regular expressions

09:34.700 --> 09:41.780
can be a lot worse than this, but we're getting a little deeper into regular expressions now,

09:41.900 --> 09:51.530
and there are other examples you're going to run into where your regular expression will

09:51.530 --> 09:53.510
fail as well.

09:55.130 --> 09:58.970
So some of the rooms also just have studio.

09:59.000 --> 10:00.440
It just says studio here.

10:00.440 --> 10:05.000
So we could also make an OR clause for studio here.
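
NOTE A sketch of the OR clause mentioned above, which the video does not implement. Alternation with | lets one pattern accept either a bedroom count or the word "Studio". The pattern and sample strings are hypothetical. The i flag makes "studio" match regardless of case.
```javascript
// Hypothetical pattern: "<n> bedroom(s)" OR "studio", case-insensitive.
const bedroomRegex = /\d+ bedrooms?|studio/i;
const listings = ["2 guests · 1 bedroom · 1 bed", "2 guests · Studio · 1 bed"];
for (const text of listings) {
  const match = text.match(bedroomRegex);
  // Fall back to "N/A" when neither alternative matches
  console.log(match ? match[0] : "N/A");
}
// prints: 1 bedroom, then: Studio
```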

10:06.080 --> 10:13.310
However, I don't want to make this into a regular expression course and go into that much

10:13.310 --> 10:13.870
detail.

10:13.880 --> 10:19.490
I think you guys by now are getting the point of how this works.

10:20.180 --> 10:25.310
So guys, this concludes the section on Puppeteer scraping.

10:25.460 --> 10:33.080
There's a lot more functions inside of Puppeteer, but I think we covered the most important things,

10:33.080 --> 10:43.460
which is basically getting the page and parsing the HTML, the markup, so we can get the data.

10:43.610 --> 10:50.630
Obviously, the most important feature of these browser automation frameworks that you

10:50.630 --> 10:57.960
have, like Nightmare.js and Puppeteer, is that they render the JavaScript, so pages whose content we normally

10:57.960 --> 11:01.680
can't see, like Airbnb, are now visible to us.
