WEBVTT

00:00.960 --> 00:02.080
In the previous lecture,

00:02.200 --> 00:08.720
we extracted these links, and as you can see, the output is not very readable.

00:08.760 --> 00:09.160
Okay.

00:09.720 --> 00:16.480
Also, the code you see here is not very user-friendly.

00:16.480 --> 00:20.720
We also need to modify this code.

00:22.840 --> 00:28.840
To improve the data that we are extracting, we need to improve the code

00:28.840 --> 00:31.400
so that we get a better result than this.

00:31.800 --> 00:34.280
As you can see, we have a request here.

00:34.280 --> 00:38.720
The request uses the GET method here.

00:38.760 --> 00:42.000
But we don't need to have this function here at all.

00:43.680 --> 00:46.600
We have the link here.

00:47.960 --> 00:50.040
Now...

00:52.640 --> 00:54.840
Okay, we have this one.

00:55.560 --> 00:57.120
So I need to improve this.

00:58.160 --> 01:00.400
I need to cut this from here.

01:00.440 --> 01:00.800
Okay.

01:00.940 --> 01:03.140
And then add it here.

01:03.860 --> 01:06.740
So I'll remove this function from here.

01:06.980 --> 01:09.820
Right now, I'm trying to restructure this code in a better way.

01:09.860 --> 01:10.380
Okay.

01:10.420 --> 01:11.860
So we have the URL here.

01:12.780 --> 01:14.980
And let's create another function.

01:15.740 --> 01:23.980
Let's name this function extract_links.

01:24.340 --> 01:24.780
Okay.

01:26.060 --> 01:28.900
The URL is going to be the parameter here.

01:30.060 --> 01:40.220
Inside this function, we are going to use the response that we have here.

01:40.580 --> 01:48.620
Or maybe I'll remove that from here. Actually, let's leave it there for now.

01:48.620 --> 01:50.340
We have the response.

01:50.580 --> 01:53.980
It is going to be equal to...

01:56.420 --> 01:56.900
Okay.

01:56.940 --> 02:01.390
requests.get here.

02:01.390 --> 02:04.150
Inside the parentheses, I'm going to pass the URL.

02:04.190 --> 02:04.590
Okay.

02:04.870 --> 02:06.950
Now I have the response here.

02:07.550 --> 02:10.150
And let's return something.

02:10.390 --> 02:17.590
And that thing is re.findall.

02:17.630 --> 02:19.510
So I'll remove that from there.

02:19.590 --> 02:21.350
Remove this one as well.

02:23.190 --> 02:31.470
And in here, I'm just going to return the result.

02:31.510 --> 02:31.710
Okay.

02:31.750 --> 02:33.150
We have the response.

02:33.150 --> 02:35.950
We have the content of the response.

02:36.510 --> 02:39.270
And we are returning that from here.

02:40.030 --> 02:41.190
That looks perfect.

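NOTE
The refactor above moves the request and the regex search into a single extract_links function. The sketch below is an approximation, not the lecture's exact code: it swaps in the standard-library urllib.request for the requests library used in the video so it runs without extra installs, the href regex is an assumption because this lecture does not show the pattern, and parse_links is a hypothetical helper split out so the parsing can be tried on a string without a network connection.
```python
import re
import urllib.request
# Assumed pattern: captures whatever sits between href=" and the next double quote.
HREF_PATTERN = re.compile(r'href="(.*?)"')
def parse_links(html):
    """Return every double-quoted href value found in an HTML string."""
    return HREF_PATTERN.findall(html)
def extract_links(url):
    """Download the page at url and return the raw href values on it."""
    with urllib.request.urlopen(url) as response:
        # The body arrives as bytes; decode it so the str pattern can search it.
        html = response.read().decode(errors="ignore")
    return parse_links(html)
```
Calling href_links = extract_links(target_url), where target_url is a placeholder for the address of your own test server, should return a list like the one printed before.
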
02:41.950 --> 02:48.270
Now, instead of printing href_links, which no longer exists, we need to do something else.

02:49.510 --> 02:52.910
So let's create href_links again here.

02:53.030 --> 02:55.270
So we have href_links.

02:55.510 --> 02:56.030
Okay.

02:56.550 --> 02:59.350
This href_links variable is equal to extract_links,

02:59.390 --> 03:02.410
the function that we just created.

03:02.770 --> 03:09.210
And I'm going to pass the URL as an argument here.

03:09.370 --> 03:11.770
So URL.

03:11.810 --> 03:25.650
And now if I print the retrieved links, not the URL, save it, come back here, and hit

03:25.650 --> 03:26.170
Enter.

03:26.570 --> 03:28.730
Now you can see that it still works.

03:28.730 --> 03:33.530
And we have the same output that we had before.

03:34.050 --> 03:36.450
Again, you can see that we have the same result.

03:36.610 --> 03:37.050
Okay.

03:38.290 --> 03:48.690
Now, instead of just printing the list of links here, I want to use a for loop to iterate over them and get

03:48.730 --> 03:50.370
more details.

03:54.650 --> 03:54.930
So.

03:58.010 --> 04:07.030
Let's use a for loop here, for link in href_links, and in here...

04:07.030 --> 04:15.630
Now if I print the link, you will see each link on its own line.

04:15.910 --> 04:23.390
So now you can see that it looks better.

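NOTE
Iterating with a for loop prints each link on its own line instead of one long list. A minimal sketch; print_links is a hypothetical helper, and href_links holds made-up sample values standing in for an extract_links result.
```python
# Made-up sample data in place of a real extract_links() result.
href_links = ["index.php", "index.php?page=login.php", "https://www.youtube.com/"]
def print_links(links):
    """Print each link on a separate line."""
    for link in links:
        print(link)
print_links(href_links)
```
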
04:23.390 --> 04:28.630
But you still see a lot of things that we really don't need.

04:28.630 --> 04:31.430
For example, we have index.php.

04:31.470 --> 04:40.990
We have a lot of things that don't need to be here, Wikipedia for example.

04:42.070 --> 04:44.870
And a lot of things.

04:44.910 --> 04:46.430
Let me move back here.

04:46.430 --> 04:54.390
For example, I am going to find something like this one.

04:54.390 --> 04:57.910
Not that one, actually.

05:00.130 --> 05:01.890
I am going to find the login page.

05:01.930 --> 05:04.330
Okay, so we use login.php.

05:04.810 --> 05:09.930
So what I want is this part of the link,

05:10.170 --> 05:12.810
and then to add something new at the end.

05:13.370 --> 05:15.890
So here you see we have a lot of things.

05:15.930 --> 05:17.650
HTTP, index.php.

05:18.210 --> 05:22.170
And these are links that are not what I want.

05:22.210 --> 05:24.570
For example, we have youtube.com.

05:24.570 --> 05:26.010
We don't want that at all.

05:26.450 --> 05:26.810
Okay.

05:32.130 --> 05:39.370
So let's come back here and try to solve this in a better way.

05:42.010 --> 05:47.890
To solve this and remove some of the links that we don't need,

05:49.370 --> 05:56.970
we need to use... Actually, first let me explain something that you see here.

05:57.010 --> 05:59.130
For example, we have index.php,

05:59.170 --> 06:05.780
where page is equal to login.php, and you can see that it doesn't look like a complete URL.

06:07.180 --> 06:10.220
We need to handle this one first.

06:10.220 --> 06:10.620
Okay.

06:11.020 --> 06:15.940
To handle that, we need to import something called urlparse.

06:15.980 --> 06:18.580
Okay, so let's import

06:19.620 --> 06:29.260
urlparse, and you can see that it looks like it is not available.

06:41.620 --> 06:47.260
So as you can see, it is not available.

06:47.300 --> 06:48.580
It looks like it is not available.

06:48.780 --> 06:50.460
So let's remove this.

06:50.500 --> 06:59.100
In Python 3, instead of urlparse, we use its updated version, the urllib.parse module.

06:59.160 --> 06:59.560
Okay.

06:59.800 --> 07:12.160
So I'm going to write from urllib.parse import something else.

07:12.160 --> 07:16.840
And that is going to be urljoin.

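NOTE
The urlparse module from Python 2 became urllib.parse in Python 3, which is why the first import fails. This short sketch shows how urljoin resolves the kinds of links seen above; the target address is a hypothetical lab server, not one taken from the lecture.
```python
from urllib.parse import urljoin
# Hypothetical target address; the trailing slash matters.
target_url = "http://10.0.2.4/mutillidae/"
# A relative link is resolved inside the target directory.
login_link = urljoin(target_url, "index.php?page=login.php")
# A link that starts with / replaces the whole path.
root_link = urljoin(target_url, "/phpmyadmin/")
# A complete URL to another site is returned unchanged.
external_link = urljoin(target_url, "https://www.youtube.com/")
print(login_link)
print(root_link)
print(external_link)
```
Without the trailing slash, urljoin treats mutillidae as a file name, so index.php would resolve to http://10.0.2.4/index.php instead.
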
07:16.880 --> 07:17.280
Okay.

07:17.600 --> 07:20.920
Now we can use urljoin to join some of these.

07:20.920 --> 07:24.880
For example, you see we have index.php, this one.

07:24.880 --> 07:30.960
Also, most of these links don't have the http:// and domain part at the beginning.

07:30.960 --> 07:32.280
So we need to handle that.

07:32.400 --> 07:35.640
We can handle that very easily right here.

07:35.760 --> 07:40.280
And I'm going to say that link is equal to urljoin.

07:41.560 --> 07:43.080
So in here I'm going to join two things.

07:43.080 --> 07:45.760
One of them is going to be the target URL.

07:46.080 --> 07:49.920
And I'm going to join it with link.

07:50.200 --> 07:55.600
So let's save it and re-execute this.

07:55.600 --> 08:00.980
Now you can see that those incomplete links are gone; they are now full URLs.

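NOTE
Inside the loop, link is rebound to its absolute form before it is used. A sketch under the same assumptions (hypothetical target address, made-up sample links); absolutize is a hypothetical helper that collects the results in a list rather than only printing them.
```python
from urllib.parse import urljoin
target_url = "http://10.0.2.4/mutillidae/"  # hypothetical lab address
# Made-up sample of raw href values, as the regex might return them.
href_links = ["index.php", "index.php?page=login.php", "https://www.php.net/"]
def absolutize(base_url, links):
    """Return every link resolved against base_url, in the original order."""
    full_links = []
    for link in links:
        link = urljoin(base_url, link)
        full_links.append(link)
    return full_links
for link in absolutize(target_url, href_links):
    print(link)
```
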
08:01.020 --> 08:01.420
Okay.

08:01.620 --> 08:09.660
You also see some links printed here, like this OWASP one, that we don't really need.

08:09.860 --> 08:12.700
I want only these links.

08:13.620 --> 08:21.060
So you see we have OWASP here, OWASP, OWASP.

08:22.020 --> 08:29.900
There are also links like this hacker one, and a lot of other things that we really don't need.

08:29.900 --> 08:32.540
For example, www.php.net.

08:32.580 --> 08:33.660
I don't need that.

08:33.780 --> 08:36.420
So I need to handle this as well.

08:36.580 --> 08:44.420
To handle this, we can use an if statement here.

08:44.900 --> 08:56.540
In here, I'm going to say that if the target URL is in link, then print that link for

08:56.540 --> 08:56.860
me.

08:57.180 --> 08:58.660
Otherwise not.

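NOTE
The if statement keeps a link only when the target URL appears inside it, which drops external sites such as youtube.com and php.net. A minimal sketch of that substring filter combined with urljoin; same_site_links is a hypothetical helper and the links are made-up sample data. A plain substring check can still accept an external URL that merely contains the target address, for example inside a query string.
```python
from urllib.parse import urljoin
target_url = "http://10.0.2.4/mutillidae/"  # hypothetical lab address
# Made-up raw links: two on the target site, two external.
href_links = ["index.php?page=login.php", "https://www.youtube.com/", "http://www.php.net/", "show-log.php"]
def same_site_links(base_url, links):
    """Resolve each link and keep only those that contain base_url."""
    kept = []
    for link in links:
        link = urljoin(base_url, link)
        if base_url in link:
            kept.append(link)
    return kept
for link in same_site_links(target_url, href_links):
    print(link)
```
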
09:01.640 --> 09:07.040
So I'll save it and execute this once again.

09:07.320 --> 09:16.200
Now you can see that I have only the links that I need.

09:16.560 --> 09:16.840
Okay.

09:16.880 --> 09:23.040
These all have the target URL at the beginning.

09:23.680 --> 09:26.080
And this is what I have,

09:26.120 --> 09:27.560
or rather, what I need.

09:27.560 --> 09:35.040
And if you wanted to do this attack on a real website, or rather a live website,

09:35.560 --> 09:39.280
not a local test website like Mutillidae,

09:39.320 --> 09:48.080
you can give the link of that specific website here, and it will give you the appropriate links that you

09:48.080 --> 09:48.840
need.

09:49.240 --> 09:51.960
Now you can use each of these links.

09:52.000 --> 10:00.290
For example, with this one, you see I'm logged in, and right now I am on this web page.

10:00.770 --> 10:08.290
And if I go into something else... let's see if we can find something really useful.

10:10.170 --> 10:10.490
Okay.

10:10.490 --> 10:13.050
These are index.php, installation.php...

10:15.490 --> 10:16.050
Um.

10:28.850 --> 10:29.810
Anything you want.

10:29.850 --> 10:30.170
Okay.

10:30.210 --> 10:31.410
It will give you.

10:34.650 --> 10:36.010
Show logs.

10:38.090 --> 10:41.170
So you can see that we are in the logs here.

10:41.690 --> 10:47.530
And there is another page that you need to find.

10:47.570 --> 10:50.250
So perfect.

10:50.570 --> 10:56.170
Now you can see that we handled this very easily.

10:56.210 --> 10:56.810
Okay.

10:56.850 --> 10:57.890
So thanks for watching.

10:57.890 --> 10:59.490
And I will see you in the next lecture.
