WEBVTT

00:01.330 --> 00:07.300
Okay, guys, so now it's time for us to scrape the rooms we have, or...

00:07.300 --> 00:08.080
No, sorry.

00:08.080 --> 00:11.560
The number of guests allowed in the rooms.

00:11.980 --> 00:18.580
I've removed all the code I showed you in the previous lecture because I just wanted to show you why

00:18.580 --> 00:22.690
we are going to use regular expressions now.

00:23.980 --> 00:24.360
Um.

00:25.720 --> 00:26.680
So.

00:27.970 --> 00:29.350
Let's get onto it.

00:29.350 --> 00:37.210
So on my right side, I have my favorite site for checking regular expressions.

00:37.510 --> 00:42.400
And I'm going to be showing you in a moment how we are going to use this.

00:42.790 --> 00:45.460
Let's first try and see.

00:47.250 --> 00:50.580
Let's first just try and run the script.

01:02.360 --> 01:11.210
So I'm just going to copy this into a blank page while the scraper is still opening new pages here.

01:12.380 --> 01:15.740
I'm going to go into Chrome Developer Tools.

01:18.100 --> 01:20.470
And now.

01:21.700 --> 01:29.440
Each of these elements is contained inside a div with an ID called room.

01:29.440 --> 01:35.890
And basically, all the text for the posting is inside this selector.

01:36.370 --> 01:47.320
So if we go room dot text, we get a bunch of text here, and at the top we have the information that

01:47.320 --> 01:48.550
we're looking for.

01:49.670 --> 01:53.330
3 guests, 1 bedroom, 1 bed, 1 bath.

01:53.570 --> 01:58.010
So basically, we're going to take all of this text.

02:00.020 --> 02:03.630
And paste it into the site

02:03.660 --> 02:09.320
regex101 to test the regular expression we're going to write.

02:09.680 --> 02:11.700
So let's try and see.

02:11.720 --> 02:19.310
So if you're not familiar with regular expressions, it's basically a way to find patterns inside of

02:19.310 --> 02:20.150
text.

02:20.750 --> 02:27.930
So, for example, \d+ means we are looking for a number of any length (one or more digits).

00:02:27.950 --> 00:02:31.400

02:45.310 --> 02:53.460
Now notice that I'm not putting guest behind because some rooms may be just have one guest only allowed.

02:53.470 --> 03:02.110
So if we have the guest with the S in the end it's going to not detect a text if it has only guest.

03:02.860 --> 03:11.200
So just to make an example, if it has one guest, this regular expression is not going to detect it.

03:11.200 --> 03:15.190
So we need to have only guest instead.

03:15.970 --> 03:19.990
But it is going to be fine for what we're using it for.

03:22.600 --> 03:26.540
So we see that this regular expression is working fine.

03:26.560 --> 03:35.530
Obviously, if the, um, the owner of the room is typing somewhere else, uh, a number and guest after,

03:35.530 --> 03:38.020
we're going to detect it as well.

03:38.620 --> 03:45.100
But most likely the information we want is right in the top of this text.

03:45.100 --> 03:53.560
So it's highly likely that the first result of this regular expression is going to be what we want.

03:54.820 --> 04:03.640
So with that being said, let's just copy that regular expression we have here and go inside of our

04:03.640 --> 04:04.360
code.

04:06.240 --> 04:18.840
So let me just make a room text variable for the selector containing all of the rooms text, and then

04:18.840 --> 04:23.910
we are going to run a regular expression on that and return the results.

04:24.450 --> 04:26.160
So let's see.

04:26.160 --> 04:27.210
We can run.

04:30.630 --> 04:32.640
Guest matches.

04:32.790 --> 04:34.620
Room, text.

04:35.010 --> 04:36.180
Match.

04:36.780 --> 04:47.670
And then we put in our regular expression with two forward slashes and type in our regular expression

04:48.840 --> 04:50.010
like so.

04:51.090 --> 04:59.190
And just to make sure that we're not going to get a index out of bounds error, I will check if the

04:59.220 --> 05:04.200
guest matches has a length over zero.

05:05.040 --> 05:15.470
And if it does, we will assign a variable, let's call it guest allowed by default.

05:15.480 --> 05:24.660
I will say not applicable in case we can't find this regular expression and I will say guest allowed

05:24.660 --> 05:30.010
equals guest matches and we get the first match.

05:31.480 --> 05:34.600
So that should be good enough.

05:35.530 --> 05:40.330
Let's try and print the guest allowed.

05:41.710 --> 05:52.660
And let's also just return the values we have gotten so far price per night and guests allowed like

05:52.660 --> 05:52.990
so.

05:52.990 --> 05:54.490
Let's see how it run.

05:56.590 --> 05:58.450
Let's go and run this script.

06:03.260 --> 06:03.680
Hmm.

06:03.680 --> 06:04.400
That is.

06:04.400 --> 06:05.280
That's odd.

06:05.300 --> 06:06.440
It's not running.

06:06.860 --> 06:15.380
I notice sometimes that it is not running all the time, and we'll try and debug why that is happening.

06:16.220 --> 06:18.470
Let's try and run it again.

06:20.250 --> 06:22.320
So now it seems to be running.

06:22.860 --> 06:23.790
So there we go.

06:23.790 --> 06:27.150
We get the crowns and the number of guest.

06:27.810 --> 06:35.700
I think the reason why the index page is sometimes getting stuck in this scraper that I've noticed that

06:35.700 --> 06:41.940
sometimes is because we don't have the wait until in that page.

06:41.940 --> 06:45.480
So let me just go ahead and add that as well.

06:47.370 --> 06:54.420
And, um, yeah, as you can see, it is getting the price and the number of guests that is allowed

06:54.420 --> 06:58.720
in the room seems to be going nicely too.

06:58.920 --> 07:01.740
Guest four Guest.

07:02.800 --> 07:06.430
Yeah, it seems to be running just fine.

07:09.590 --> 07:14.750
Okay, everyone, in the next section, we are going to get the rest of the values, but it is pretty

07:14.750 --> 07:22.490
much going to be just like we are getting the the guest, the number of guests allowed.

07:22.490 --> 07:26.390
So see you in the next lecture where we get the rest of the values.
