WEBVTT

00:01.590 --> 00:08.520
I hope you did well in the last project, which was, again, a classification project. Now, in this

00:08.520 --> 00:13.870
particular project, we will again be applying classification, but to an industry-level problem.

00:14.670 --> 00:19.890
So this project is regarding flagging of junk properties.

00:20.190 --> 00:27.810
So flagging a property as possibly junk beforehand can help businesses prioritize their efforts and

00:27.810 --> 00:34.990
focus them on more probable successes rather than getting bogged down with the weight of unsaleable portfolios.

00:35.400 --> 00:43.410
So these property sellers, they are trying to flag certain properties which don't really get sold easily

00:43.650 --> 00:51.180
so that they can target something which is more beneficial and could give success faster.

00:51.420 --> 00:57.780
So in this project, we will be making use of the data from the past operations where many properties

00:57.780 --> 01:01.500
were found to be junk after their purchase for renovation.

01:02.070 --> 01:10.920
So this particular dataset contains around 62,000 rows of data and it has around 31 different

01:10.920 --> 01:14.950
attributes regarding the location of the property and other details.

01:16.050 --> 01:23.640
So your task here is to build a predictive model for predicting whether a property should be marked

01:23.640 --> 01:26.850
as junk on the basis of the listing details.

01:26.850 --> 01:31.890
And the different preliminary assessments.

01:32.130 --> 01:40.620
Now, your model will be evaluated on the basis of the AUC score, which has to be more than

01:40.620 --> 01:41.930
0.65.
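
NOTE
A minimal sketch of the evaluation criterion, assuming the score mentioned here is the
ROC AUC. The labels and predicted probabilities are made up for illustration and are
not taken from the project dataset; auc_score is a hypothetical helper name.
```python
def auc_score(y_true, y_score):
    """Probability that a random junk property outranks a random non-junk one."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    # Count pairwise wins; a tie counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
# Hypothetical labels (1 = junk) and predicted junk probabilities.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
auc = auc_score(y_true, y_score)
meets_target = auc > 0.65  # threshold stated in the project brief
print(round(auc, 3), meets_target)
```
The pairwise loop is only practical for small samples. On the full dataset a library
routine such as scikit-learn's roc_auc_score would likely be the better choice.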

01:41.970 --> 01:44.700
That is what you have to do for this particular project.

01:45.270 --> 01:47.550
So let's have a look at the dataset.

01:47.580 --> 01:50.900
So here you can see this particular column over here, junk.

01:51.300 --> 01:53.360
This is the target column.

01:53.370 --> 01:59.490
So you need to classify these properties so that you get a value, either zero or one here.

02:00.000 --> 02:08.190
And here are different columns with different types of values, such as impedance style, price details,

02:08.190 --> 02:16.290
listing, the material, the channel, the zip code, the insurance, the build type, architecture and

02:16.290 --> 02:18.740
other different information represented.

02:19.020 --> 02:23.870
So what you will have to do is, again, you can see there are a lot of categorical variables.

02:24.300 --> 02:32.160
So your main focus here would be converting these variables into dummy variables and then finding out

02:32.160 --> 02:34.750
the variables which are actually important.
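
NOTE
A minimal sketch of turning one categorical column into dummy (0/1 indicator)
variables. The listings, column names and the to_dummies helper are hypothetical and
only illustrate the idea; they are not the project's actual columns.
```python
def to_dummies(rows, column):
    """Replace one categorical column with 0/1 indicator columns."""
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = int(row[column] == level)
        encoded.append(new_row)
    return encoded
# Hypothetical listings; names and values are illustrative only.
listings = [
    {"price": 120000, "channel": "online", "junk": 0},
    {"price": 45000, "channel": "agent", "junk": 1},
    {"price": 80000, "channel": "online", "junk": 0},
]
encoded = to_dummies(listings, "channel")
print(encoded[1])
```
With pandas available, pandas.get_dummies does the same job on a whole DataFrame and
is the more usual tool for a dataset of this size.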

02:36.570 --> 02:47.310
So I hope you will be working on this particular project and will be able to solve it very easily,

02:47.310 --> 02:54.690
as this is not a very complex problem, but again, it has a lot of tasks which you will have to perform

02:54.690 --> 02:57.150
so that you can achieve a good result.

02:57.870 --> 02:58.440
Thank you.