WEBVTT

00:01.710 --> 00:08.740
OK, so let us first discuss the solution for the first project, that is, the house price prediction problem.

00:09.330 --> 00:17.860
So here, what we have done is we have imported all the required libraries and we have loaded the data.

00:18.270 --> 00:24.690
So here we have the dataset, that is, this house dataset, and this dataset contains around twenty-one

00:24.690 --> 00:30.360
thousand rows and twenty-one columns. Out of these,

00:30.720 --> 00:33.300
we have selected these different data types.

00:33.300 --> 00:33.790
Right.

00:34.320 --> 00:39.060
So we have got the data types, which include object.

00:39.330 --> 00:41.640
So we have got this date.

00:42.330 --> 00:50.940
And next, what we are doing here is checking that we have only this date column, which is a timestamp.

00:50.940 --> 00:53.850
So we will basically ignore this date column.

00:54.360 --> 01:02.400
So what we are doing here is we are checking if there are any columns which have not-a-number

01:02.430 --> 01:03.030
(NaN) values.

01:03.390 --> 01:11.070
And then we see that there are no particular columns, and no particular rows, which

01:11.070 --> 01:14.190
have NaN values.

01:15.360 --> 01:22.790
So we see that the data is pretty much structured and it does not have any NaN or null values.

01:23.190 --> 01:29.780
So we can basically jump in to finding the correlation between the features and the target.

01:29.810 --> 01:38.280
So now, when we are finding out the correlations, we have simply got the feature values and the

01:38.280 --> 01:43.230
target values and we are finding out the correlation.

01:43.260 --> 01:50.100
So this is the blank correlation list, which we have created, and we are putting the Pearson correlation

01:50.100 --> 01:57.780
inside this, using this pearsonr function, and comparing each of those values with the target value.

01:57.810 --> 02:04.470
So we are comparing all the features with the target value to find out the correlation values.

02:05.670 --> 02:12.360
So next, what we are doing is we are converting all this data, which we have got for the correlation

02:12.360 --> 02:13.890
into a DataFrame.

02:14.310 --> 02:24.210
And we have created this correlations DataFrame for that, and for each correlation we are checking

02:24.240 --> 02:31.380
and getting the values, the absolute values, and we are sorting based on these correlation

02:31.380 --> 02:31.980
values.
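
NOTE
A minimal sketch of the ranking step described above: compute the Pearson correlation of each feature with the target, then sort by absolute value.
It uses only the standard library as a stand-in for the pearsonr function and the DataFrame used in the video.
The column names and the toy numbers are hypothetical, not taken from the real house dataset.
```python
import math
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
def rank_features(features, target):
    """Return (name, r) pairs sorted by |r|, strongest correlation first."""
    scores = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)
# Hypothetical toy data (assumption): two features and a price column.
toy_features = {
    "sqft_living": [1000, 1500, 2000, 2500, 3000],
    "condition": [3, 5, 2, 4, 3],
}
toy_price = [200000, 310000, 390000, 520000, 600000]
ranking = rank_features(toy_features, toy_price)
```
Sorting on the absolute value keeps strongly negative correlations near the top, which a plain descending sort would push to the bottom.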

02:33.270 --> 02:43.290
So here we can see the maximum correlation value which we have; you can see that here it is

02:43.290 --> 02:53.610
70 percent correlation, and it goes down to zero point zero two, which is basically two percent correlation

02:53.610 --> 02:54.590
between the values.

02:54.840 --> 03:00.840
So you can see the minimum correlation that we have is between long and price and between condition and price,

03:01.080 --> 03:07.500
while the maximum correlation which we have is seventy percent, between square feet

03:07.500 --> 03:08.790
living and the price.

03:10.740 --> 03:17.000
So we can see the top five features, which are the features most correlated with the target price.

03:17.280 --> 03:23.220
And so what we do is we plot the best two regressors jointly.

03:23.220 --> 03:25.440
So we plot this.

03:25.470 --> 03:26.970
So how do we do that?

03:27.300 --> 03:36.270
So we are taking this Y column and this X column. This X column would consist of square feet living

03:36.300 --> 03:42.180
and grade, which are the top two columns which have the maximum correlation.

03:44.040 --> 03:52.710
And we are again sorting the values so that we can see a proper plot with these values and we are creating

03:53.040 --> 03:55.650
this y value y es shape.

03:55.650 --> 04:00.110
So we are getting the values with respect to that index.

04:02.310 --> 04:07.380
So we are getting the X values and Y values corresponding to these. Next,

04:07.380 --> 04:11.280
we are generating the different plots for that.

04:11.280 --> 04:16.720
So we are generating plots for square feet living, grade, and price.

04:16.740 --> 04:18.520
So here we have the data.

04:18.540 --> 04:20.500
So this is the square feet data.

04:20.520 --> 04:23.730
This is the grade data and this is the price data.

04:24.210 --> 04:30.980
So here you can see how these plots are very much similar to each other.

04:31.410 --> 04:39.030
And you can see that these square feet and grade plots are almost similar, the only difference being the

04:39.420 --> 04:42.100
range and the scale, a little bit.

04:42.360 --> 04:45.540
Otherwise, everything else seems to be very similar.

04:46.890 --> 04:52.440
So next, what we will be doing is we will fit the model first.

04:52.590 --> 04:57.360
So we are generating the base model using linear regression.

04:57.630 --> 04:59.700
So when we are generating this base model,

05:00.300 --> 05:09.230
we are taking some of the top columns; that is, you can see here we have square feet living, grade,

05:09.240 --> 05:14.620
square feet above, square feet living, bathrooms, view, bedrooms.

05:14.640 --> 05:16.080
So let's see here.

05:18.120 --> 05:23.180
We have floors, all of these.

05:24.030 --> 05:27.080
So these are the different columns which we are taking up.

05:28.590 --> 05:39.530
So now we are getting the X and Y columns and then performing cross-validation on this with train test split.

05:39.540 --> 05:42.010
So we are getting the split for this data.

05:42.030 --> 05:49.320
We are simply getting the split of the data with a test size of zero point two.

05:49.320 --> 05:56.130
That is, 20 percent of the data will belong to the test data and the rest will be a part of our training

05:56.130 --> 05:56.760
dataset.

05:58.140 --> 06:01.920
So here we have X train, X test, Y train, and Y test.
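
NOTE
A minimal sketch of the 80/20 split described above, written with the standard library only.
The helper name split_train_test, the seed, and the toy rows are assumptions for illustration; the video itself uses a library split function.
```python
import random
def split_train_test(rows, targets, test_size=0.2, seed=42):
    """Shuffle the row indices once, then hold out `test_size` of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [rows[i] for i in train_idx]
    x_test = [rows[i] for i in test_idx]
    y_train = [targets[i] for i in train_idx]
    y_test = [targets[i] for i in test_idx]
    return x_train, x_test, y_train, y_test
# Hypothetical toy data (assumption): ten rows of (square feet, grade) and ten prices.
rows = [(1000 + 100 * i, 5 + i % 4) for i in range(10)]
prices = [150000 + 30000 * i for i in range(10)]
x_train, x_test, y_train, y_test = split_train_test(rows, prices, test_size=0.2)
```
Features and targets are indexed with the same shuffled indices, so each row stays paired with its own price after the split.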

06:02.610 --> 06:11.520
So next, what we have done is we have fit this regression, and then we have predicted the values, and we

06:11.520 --> 06:16.620
are scoring this regression and the score gives us zero point seven zero.

06:18.600 --> 06:27.090
So here we are getting around a 70 percent prediction score, which is neither a very good score nor a

06:27.090 --> 06:28.200
very bad score.

06:28.230 --> 06:35.670
This is a decent point, but yes, because we have implemented the linear model, the linear regression,

06:36.090 --> 06:40.080
it can be improved a lot more, for sure.
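
NOTE
A small sketch of what a regression score of this kind measures, assuming it is the coefficient of determination (R squared), which is what scikit-learn's LinearRegression.score returns.
It fits a one-feature least-squares line in plain Python; the toy numbers are hypothetical and the real model in the video uses several columns.
```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
def r2_score(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
# Hypothetical toy data (assumption): a noisy, roughly linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
score = r2_score(ys, [slope * x + intercept for x in xs])
```
A score of 1.0 means the predictions match the targets exactly, and 0.0 means the model does no better than always predicting the mean price.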

06:40.110 --> 06:41.460
So let's go ahead.

06:42.900 --> 06:49.860
So we are calculating the root mean squared error for this, which comes out to be one nine three six one five.

06:50.130 --> 06:57.000
Now, one thing to note here is that the root mean squared error is built from the squared

06:57.000 --> 07:03.670
errors, and house prices are again amounts which are large values.

07:03.690 --> 07:10.080
That is why the RMSE is also a large value. What you are aiming for is to bring it closer to zero.

07:10.410 --> 07:14.460
The value is large because it is a house price.

07:14.460 --> 07:17.800
If it was an interest rate, it would have been a very small value.

07:18.150 --> 07:22.700
So there is nothing to be scared about in this particular value.

07:22.710 --> 07:23.730
It's just the scale.
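
NOTE
A short sketch of the scale point made above: the same relative error gives a very different RMSE depending on the units of the target.
The prices, interest rates, and the 10 percent error are hypothetical numbers chosen for illustration.
```python
import math
def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
# Hypothetical targets (assumption): house prices and interest rates.
prices = [300000, 450000, 600000]
rates = [0.03, 0.045, 0.06]
# Every prediction is off by the same 10 percent in both cases.
price_preds = [p * 1.1 for p in prices]
rate_preds = [r * 1.1 for r in rates]
price_rmse = rmse(prices, price_preds)
rate_rmse = rmse(rates, rate_preds)
```
Here price_rmse is in the tens of thousands while rate_rmse is a small fraction, even though both models are wrong by the same proportion.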

07:24.930 --> 07:27.960
So next, what we are doing is we are implementing XGBoost.

07:28.920 --> 07:35.190
So here I have taken n estimators as one hundred and the learning rate as zero point zero eight, and I have trained

07:35.550 --> 07:37.080
this particular model.

07:37.110 --> 07:45.540
And after training I get the score as eighty-one point three seven percent, which is almost

07:45.540 --> 07:48.660
close to eighty-five percent.

07:48.660 --> 07:54.810
Eighty-four percent. Now, what we can do: this is just a simple implementation

07:54.810 --> 08:01.560
using the XGBoost regressor. What you will be doing is, you can similarly implement a random forest,

08:01.560 --> 08:10.140
extremely randomized trees, or you could implement different algorithms stacked together, and these

08:10.140 --> 08:13.280
could achieve a much better result.

08:13.800 --> 08:19.070
So that is something which you will have to do, and you will have to implement more models.

08:19.080 --> 08:24.120
You will have to compare different models and then get better results out of them.