WEBVTT

00:00.570 --> 00:07.350
Hi, I hope you have worked on the first project, which is the house price prediction problem. The

00:07.350 --> 00:13.740
next project, which we will work on under supervised learning, is again a regression project, which

00:13.740 --> 00:18.090
is based on property hazard score prediction.

00:18.450 --> 00:23.610
So in this particular project, we will be using supervised learning and regression.

00:24.240 --> 00:26.970
And this is an industry-level project.

00:26.980 --> 00:34.110
So the dataset will be a little more complex in nature in comparison

00:34.110 --> 00:35.050
to the previous one.

00:35.730 --> 00:44.910
So here we will be given a large dataset which contains around forty thousand rows of properties and

00:44.910 --> 00:46.650
thirty-four various tasks.

00:46.830 --> 00:49.680
That is, thirty-four columns of data.

00:50.190 --> 00:57.890
So this particular dataset deals with a problem which is the property hazard problem.

00:58.920 --> 01:06.750
So it says that there are different properties in a particular location and one particular organization

01:06.750 --> 01:12.260
is trying to do the maintenance of these properties.

01:12.270 --> 01:16.860
They are trying to restore the properties.

01:17.100 --> 01:24.720
Now, when this organization takes up the task of restoring a property, there are certain hazards

01:24.720 --> 01:30.870
which are associated with the property which is being restored. There could be a case where the property

01:30.870 --> 01:34.550
has zero risk and it is completely fine to restore it.

01:34.800 --> 01:40.550
Well, there could be a few properties where it would be very difficult to restore the property and

01:40.560 --> 01:43.860
there could be different hazards which could be possible.

01:43.860 --> 01:48.690
Or maybe the company might lose a lot of money by investing in that property.

01:48.690 --> 01:52.160
And still, it would not be good enough for selling to someone else.

01:52.770 --> 01:58.180
So there are different risk levels which will be associated with these properties.

01:58.410 --> 02:04.890
So here we have around forty thousand eight hundred properties and thirty-four different tasks, or

02:04.890 --> 02:13.050
thirty-four different risk factors associated with different things, from which they are creating a generalized

02:13.050 --> 02:14.800
single risk score.

02:15.810 --> 02:24.360
So this risk score actually depends on these thirty-four different tasks, right?

02:25.080 --> 02:30.440
These are tasks taken up in heavy equipment maintenance by the manual crew.

02:30.630 --> 02:34.110
So the manual crew is actually performing these tasks.

02:34.500 --> 02:40.680
So each of these tasks has been given a hazard score, which is eventually used to decide the level

02:40.680 --> 02:46.350
of safety checks and caution while planning for the maintenance process of these properties.

02:47.280 --> 02:54.660
So your task here is to build a predictive model for predicting the hazard score, given other information

02:54.660 --> 02:59.670
related to the maintenance. So you have been given these thirty-four columns, which have different

02:59.670 --> 03:02.520
maintenance tasks and different properties.

03:02.700 --> 03:08.820
And out of that, you need to make sure that you create a model which is able to find an accurate risk score.

03:09.870 --> 03:15.540
Now, one thing to note here is that the risk score is not a continuous value.

03:15.990 --> 03:19.660
So what we have been dealing with till now are continuous values.

03:19.830 --> 03:21.880
So now where does this go?

03:22.560 --> 03:26.940
Should this be taken up as a classification problem or regression problem?

03:26.940 --> 03:34.640
But because these are actual numbers, and a lot of different numbers, it is not fit to be a classification problem.

03:35.520 --> 03:39.390
That is why you will be using regression for it.

03:39.630 --> 03:45.370
So, for example, what you can do is use the objective function, which is count:poisson.

03:45.890 --> 03:55.510
So count:poisson will actually be helpful in applying regression to count numbers.

03:55.530 --> 03:58.500
That is, something which will not have floating-point values.

03:58.710 --> 04:03.240
So you will be dealing with something which is in a whole number or integer form.
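
NOTE
A minimal sketch of why a Poisson objective suits whole-number targets, assuming the
objective mentioned here is XGBoost's count:poisson. It is not the course code: the
function name poisson_nll is hypothetical. It computes the Poisson negative
log-likelihood that such an objective minimizes, using a log link so that the
predicted mean stays positive.
```python
import math
def poisson_nll(y_true, raw_pred):
    """Mean Poisson negative log-likelihood for raw (log-scale) predictions."""
    total = 0.0
    for y, f in zip(y_true, raw_pred):
        mu = math.exp(f)  # log link: predicted mean is always positive
        # -log P(y | mu) = mu - y*log(mu) + log(y!), where log(mu) == f
        total += mu - y * f + math.lgamma(y + 1)
    return total / len(y_true)
```
For a single integer target y, the loss is smallest when the predicted mean equals y,
which is why the fitted model tracks count-valued hazard scores.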

04:04.250 --> 04:04.520
Right.

04:04.590 --> 04:06.000
So we can use that.

04:06.360 --> 04:11.580
Apart from that, for this project, you will be working with mean

04:11.690 --> 04:12.270
absolute error.

04:12.960 --> 04:18.570
So what you can do is find out the mean absolute error for the models which you have created,

04:18.930 --> 04:26.730
and you can find out the score as one minus the mean absolute error divided by five point four.

04:26.940 --> 04:30.800
You don't really need to think much about how this scoring is happening.

04:31.740 --> 04:39.330
You just need to make sure that the score, which you get after this particular calculation comes out

04:39.330 --> 04:41.610
to be more than zero point five.

04:41.640 --> 04:45.840
And that would be a good model, which you would have generated.
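
NOTE
A minimal sketch of the score described above, assuming it means 1 - MAE / 5.4
(the other reading, (1 - MAE) / 5.4, could never exceed 0.5). The names hazard_score
and scale are illustrative and do not come from the course code.
```python
def hazard_score(y_true, y_pred, scale=5.4):
    """Return 1 - MAE / scale for two equal-length sequences of numbers."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean absolute error: average size of the prediction mistakes
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return 1.0 - mae / scale
```
Under this reading, a score above 0.5 corresponds to a mean absolute error below 2.7.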

04:46.920 --> 04:56.490
So, again, there is no specific rule or no specific threshold value that your model has to be top

04:56.490 --> 04:59.880
notch or should have an accuracy of this or

05:00.020 --> 05:02.960
a mean absolute error of that. Nothing like that.

05:03.860 --> 05:07.610
This is the minimal point: the score has to be at least zero point five.

05:08.360 --> 05:10.130
That is greater than zero point five.

05:10.520 --> 05:12.740
And apart from that, there is no limit.

05:12.740 --> 05:18.020
You can keep improving your model, implement different models, compare different models, stack them

05:18.020 --> 05:23.700
together, do whatever you feel like and build your models.

05:23.720 --> 05:31.340
And again, my model, which I will be sharing as the code, is just one of the solutions.

05:31.670 --> 05:34.670
There is no absolute perfect solution.

05:35.090 --> 05:40.880
You can try on your own to create your own good solution for that.

05:41.420 --> 05:43.250
OK, so thanks.

05:44.960 --> 05:52.310
Please work on building this particular model for this particular problem, and I will be sharing the solution

05:52.310 --> 05:54.290
in the next video for this one.

05:54.650 --> 05:55.070
Thank you.
