WEBVTT

00:00.530 --> 00:05.660
In this rerecorded video, I'm going to go over wget and HTTrack.

00:06.380 --> 00:08.840
I'm recording this video for a couple of reasons.

00:08.840 --> 00:16.220
One is the audio quality was not consistent with the new equipment I've been using for this course,

00:16.220 --> 00:23.420
and also the old one did have Trace Labs Linux, whereas the course now is using CSI Linux.

00:23.420 --> 00:25.790
So let's take a look.

00:26.210 --> 00:31.280
So HTTrack and wget are two great tools if you need to clone a website.

00:31.280 --> 00:36.650
And the reason why you might need to clone a website is say we're doing an investigation on a website

00:36.650 --> 00:38.780
and we want to do more research on it.

00:38.810 --> 00:42.200
Well, having an offline copy can come in really handy.

00:42.230 --> 00:48.680
Of course, we could do things like use the Internet Archive to archive it, but also having the offline

00:48.680 --> 00:50.660
copy comes in really handy.

00:50.660 --> 00:52.250
So let's take a look.

00:52.250 --> 00:55.730
Both are going to be terminal-based tools here.

00:55.970 --> 01:00.110
So I'm just going to open a terminal here.

01:00.110 --> 01:03.260
Open this up here, and let me see if I could

01:05.570 --> 01:07.670
make this a little bit larger.

01:10.250 --> 01:13.730
Actually let's just go ahead and get into this.

01:13.730 --> 01:18.320
So the first command I'm going to do is wget.

01:20.330 --> 01:20.900
Okay.

01:20.930 --> 01:23.870
And this is to clone a website.

01:23.900 --> 01:26.330
Well actually before I do wget let's do this.

01:26.330 --> 01:31.970
Let's create a folder. Say we want to clone Google for an investigation.

01:32.000 --> 01:36.830
I'm going to make a directory for it first so it's easier to find.

01:37.520 --> 01:40.670
So I could do mkdir space.

01:40.670 --> 01:42.590
And then you can name the folder whatever you want.

01:42.620 --> 01:45.890
In this case, I'm going to name it Google.

01:45.890 --> 01:46.730
And hit enter.

01:46.730 --> 01:48.890
And now we have a folder called Google.

01:48.890 --> 01:53.540
So now I'm going to change directory: cd space Google.

01:54.110 --> 01:57.320
And that's going to take me into the Google folder.

01:57.320 --> 02:01.220
So from here I could do wget space.

02:01.340 --> 02:08.720
And I want to do dash m because I'm going to mirror the website, and then www.google.com.

02:10.700 --> 02:11.480
okay.

02:11.510 --> 02:16.670
And we can see it actually going through the process here of cloning that site.
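
NOTE
A minimal sketch of the steps just shown, assuming a Bash-like shell on the CSI Linux machine. The TARGET, MIRROR_CMD and RUN names are illustrative and not from the video; the RUN=1 guard is added here so the mirror only starts when you explicitly ask for it.
```shell
# Hypothetical variable: the site mirrored in the demonstration.
TARGET="www.google.com"
# Create a working folder for the investigation and move into it.
mkdir -p Google
cd Google || exit 1
# -m (--mirror) turns on recursive download with timestamping.
MIRROR_CMD="wget -m $TARGET"
echo "Would run: $MIRROR_CMD"
# Only start the download when RUN=1 is set (guard added for this sketch).
if [ "${RUN:-0}" = "1" ]; then
  $MIRROR_CMD
fi
```
wget saves the copy under a folder named after the host, so the files land in Google/www.google.com. Press Ctrl+C to stop a mirror early, as done in the video.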

02:23.600 --> 02:29.180
So while that's running, I'm actually just going to jump in there and we're going to take a look at

02:29.180 --> 02:29.330
it.

02:29.330 --> 02:32.420
So let me click on the File Explorer here.

02:32.990 --> 02:33.920
Wrong one.

02:39.680 --> 02:41.390
Let me close that out here.

02:43.790 --> 02:44.330
Okay.

02:44.360 --> 02:50.690
And I'm actually just going to stop that, because that should be enough information there.

02:50.720 --> 02:51.860
Let me minimize that.

02:51.890 --> 02:53.480
I'm going to go to home here.

02:54.080 --> 02:56.900
And then we're going to go to Google and we can see this.

02:56.930 --> 03:00.830
It created a folder in here, www.google.com.

03:00.830 --> 03:02.930
So that's the site that I actually cloned.

03:03.620 --> 03:09.170
So if I go in here I can see all the different things that it started indexing off of that site here.

03:09.650 --> 03:14.360
And somewhere in here is going to be the index page index.html.

03:14.360 --> 03:20.390
If I click on here, we can see that it actually did in fact clone Google along with a lot of other

03:20.390 --> 03:22.580
stuff off of that page.

03:22.610 --> 03:26.570
So if I left this running, it eventually would have cloned the entire site.

03:26.570 --> 03:31.340
But again, for this demonstration, I didn't really need to actually do that.

03:31.340 --> 03:35.600
So that's wget.
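
NOTE
One way to locate the index page from the terminal instead of the file explorer. This is only a sketch: list_index_pages is a hypothetical helper, and the 'index.html*' pattern is an assumption about how the saved pages are named.
```shell
# Hypothetical helper: list index pages inside a mirrored folder.
list_index_pages() {
  find "$1" -type f -name 'index.html*' | sort
}
# Example call, assuming the mirror folder from the video exists here.
if [ -d www.google.com ]; then
  list_index_pages www.google.com
fi
```
Any path it prints can be opened in a browser to view the offline copy.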

03:37.520 --> 03:40.790
Now the other program we could use is called HTTrack.

03:40.790 --> 03:42.380
And it essentially does the same thing.

03:42.410 --> 03:44.090
It clones websites.

03:44.090 --> 03:45.860
So let me expand this out.

03:45.860 --> 03:47.330
I'm going to create a new folder here.

03:47.330 --> 03:48.980
Again I'm going to name it Google two.

03:49.010 --> 03:56.180
So mkdir, to make the directory, Google two.

03:56.390 --> 03:56.900
Okay.

03:56.930 --> 03:58.610
And let's change directory: cd

03:58.880 --> 04:01.430
Google two.

04:01.460 --> 04:03.890
And now we're in the Google two folder.

04:03.890 --> 04:09.230
So for HTTrack, I'm going to type in httrack.

04:09.260 --> 04:16.880
And then I'm going to do dash dash help.

04:16.880 --> 04:22.730
And this gives us all the different options we could run for HTTrack.

04:26.040 --> 04:26.490
Okay.

04:26.490 --> 04:37.290
And we can see there's a lot of different options that we could use to clone these sites here.

04:37.290 --> 04:38.670
So we could use a proxy.

04:38.700 --> 04:44.520
We could do dash w to mirror a website, or dash dash mirror.

04:44.520 --> 04:46.980
And it gives us examples here.

04:49.950 --> 04:53.730
So HTTrack is another really useful program here.

04:55.350 --> 04:57.990
So let's take a look at this.

05:00.570 --> 05:02.850
So I'm going to do.

05:04.920 --> 05:06.810
httrack

05:08.640 --> 05:15.210
dash w, and let's try www.google.com.

05:19.380 --> 05:19.710
Okay.

05:19.710 --> 05:23.430
And we can see it's actually cloning that website here.
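
NOTE
A hedged sketch of the HTTrack run. The video changes into the new folder first and runs the command there; this version uses the -O output-path option instead, which is a swap made here. TARGET, OUT_DIR, HTTRACK_CMD and RUN are illustrative names, and Google2 is an assumed spelling of the folder name.
```shell
# Hypothetical variables: the site and the output folder.
TARGET="www.google.com"
OUT_DIR="Google2"
# -w mirrors the site (same as --mirror); -O sets the output path.
HTTRACK_CMD="httrack $TARGET -w -O $OUT_DIR"
echo "Would run: $HTTRACK_CMD"
# Only run when httrack is installed and RUN=1 is set (guard added for this sketch).
if command -v httrack >/dev/null 2>&1 && [ "${RUN:-0}" = "1" ]; then
  $HTTRACK_CMD
fi
```
Running httrack --help first, as in the video, lists the available options.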

05:24.990 --> 05:30.960
And we can see the different URLs within google.com that it's actually cloning.

05:31.050 --> 05:33.300
So again I'm going to minimize this.

05:33.330 --> 05:34.710
We're going to go to home here.

05:34.710 --> 05:36.900
And we're going to go to the Google two folder.

05:36.900 --> 05:40.560
And we can see it actually starting to index everything here.

05:42.330 --> 05:44.940
So, just like with wget.

05:44.940 --> 05:49.320
So in this case HTTrack is doing a full clone.

05:49.320 --> 05:51.180
So I can actually see the images there.

05:51.180 --> 05:56.250
I could probably see the images with wget also if I let it run long enough.

05:56.580 --> 06:03.000
But as you see, if I go back into the terminal here, we should see it still kind of going through.

06:03.030 --> 06:03.570
There we go.

06:03.570 --> 06:04.860
And it finished.

06:06.060 --> 06:10.080
So HTTrack again gives us a clone of that site.

06:10.080 --> 06:13.080
And if we're doing an investigation we can kind of go through the index.

06:13.080 --> 06:17.490
We could take a look at all the different files and everything else about that.

06:17.490 --> 06:21.930
We could basically deconstruct that website's HTML pages.

06:22.080 --> 06:27.300
And again, the two useful programs for that are HTTrack and wget.

06:27.300 --> 06:32.550
And you run those through the terminal on your CSI Linux machine.

06:32.580 --> 06:34.170
Thank you so much for watching.

06:34.170 --> 06:35.550
I'll see you in the next video.
