WEBVTT

00:00.280 --> 00:06.160
In the introduction of this section, I said that we are trying to get job listings from Craigslist,

00:06.340 --> 00:09.220
but what data are we trying to scrape exactly?

00:09.250 --> 00:11.920
Well, here I am on a Craigslist site.

00:11.920 --> 00:16.600
It is the San Francisco Bay Craigslist, to be exact.

00:17.230 --> 00:22.540
So I'm going to go into the jobs section and then go to software / QA / DBA.

00:23.930 --> 00:27.050
And in here we can see a listing of jobs.

00:27.260 --> 00:31.820
So you can see they have a date that the job was posted.

00:32.090 --> 00:35.060
They also have a title of the job.

00:35.150 --> 00:39.050
And then they have a neighborhood in parentheses.

00:39.970 --> 00:45.130
Then we can also click on the job listing and then we get to a job description.

00:46.020 --> 00:46.920
In here,

00:46.920 --> 00:52.380
we have the full job description, and then we also have the compensation.

00:54.590 --> 00:55.560
So it's not a lot

00:55.580 --> 00:59.390
you're getting paid for this job, but they obviously haven't filled it out.

00:59.900 --> 01:06.770
Some job listings do not have a neighborhood filled out and some do not have compensation filled out.

01:06.980 --> 01:10.380
But we're going to scrape it from wherever we can.

01:10.400 --> 01:14.120
So if we're lucky, we get that data as well.

01:14.540 --> 01:21.080
If not, then at least we have the job description, the job title, and the date it was posted.

01:22.350 --> 01:25.420
So this is the data we're trying to scrape from Craigslist.

01:25.440 --> 01:32.700
We're going to get the date it was posted, the job title, the neighborhood, and then we get the URL

01:32.730 --> 01:40.230
of the job description, and we get the full content of the job description and the compensation.

01:41.620 --> 01:46.780
So that's a brief overview of the data we're trying to scrape. In the next section,

01:47.020 --> 01:52.450
I'm going to show you what our data structure will look like for these scraping results.
