WEBVTT

00:01.670 --> 00:09.020
In the last session, we saw how we can implement linear regression. But have you given it a thought?

00:09.170 --> 00:11.810
When we implement linear regression,

00:12.080 --> 00:17.990
each and every column which we have in the data will get a coefficient.

00:19.010 --> 00:25.760
Now, there is a chance that one of the columns does not really need a coefficient.

00:26.300 --> 00:28.090
That coefficient

00:28.100 --> 00:31.740
should have been zero or not present at all.

00:33.410 --> 00:42.200
So in such cases, it is very difficult to find out which column needs to be removed, or we cannot really find

00:42.200 --> 00:48.830
out what the value should be for that particular column, because linear regression will

00:48.830 --> 00:55.580
always try to find a coefficient for every column which we have provided, which will lead to overfitting.

00:56.780 --> 01:04.730
The model will have low accuracy on new data if it overfits: the model tries to capture the noise in

01:04.730 --> 01:10.160
the training data and will try to find a coefficient even where no real relationship is present.

01:11.510 --> 01:17.390
The noise is the data points that do not really represent the true properties of the data, but random

01:17.390 --> 01:24.410
chance. It will actually impact the coefficient values and lead to larger coefficients.

01:25.640 --> 01:32.250
So learning this noise makes the model more flexible, at the risk of overfitting.

01:32.930 --> 01:35.240
So, for the linear regression line we saw earlier,

01:39.770 --> 01:46.310
we will get something which resembles this. The line will still be linear, but it

01:46.310 --> 01:48.410
will try to overfit the data.

01:49.630 --> 01:58.520
Hence, we need to do something about it, and that is what regularization is. Regularization shrinks

01:58.540 --> 02:08.830
the coefficients towards zero and discourages learning a more complex or flexible model, so it will

02:08.830 --> 02:10.750
avoid the risk of overfitting.

02:11.410 --> 02:16.860
So to avoid overfitting in linear regression, we will apply regularization to it.

02:18.090 --> 02:21.090
Now, let us see what the different methods of regularization are.

02:21.780 --> 02:24.060
There are two methods of regularization.

02:24.390 --> 02:29.480
One is ridge regression and the second one is lasso regression.

02:30.120 --> 02:35.070
The penalty for ridge regression is the summation of the squared coefficients,

02:35.280 --> 02:40.440
and for lasso regression it is the summation of the absolute values of the coefficients.

02:41.010 --> 02:51.360
Hence, it will actually take the original cost of the linear equation and add a little penalty value

02:51.360 --> 02:51.780
to it.
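
NOTE
Written out, the two penalties just described could look like the sketch below. The slide is not part of this transcript, so the symbols are assumptions in standard textbook notation: beta_j for the coefficients, lambda for the penalty strength, and a squared-error cost for the original linear regression.
```latex
\text{Ridge:}\quad \min_{\beta}\; \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^{2} \;+\; \lambda \sum_{j=1}^{p}\beta_j^{2}
\text{Lasso:}\quad \min_{\beta}\; \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^{2} \;+\; \lambda \sum_{j=1}^{p}\lvert\beta_j\rvert
```
In both cases the first term is the usual linear regression cost and the second term is the added penalty. With lambda equal to zero, both reduce to plain linear regression.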

02:52.350 --> 02:58.540
Now here we have this coefficient term, which is what is actually causing the regularization.

02:59.190 --> 03:02.700
So this is the regularization term which has been added to it.

03:03.360 --> 03:05.370
Now, one thing to remember is

03:06.400 --> 03:15.460
that ridge regression will try to minimize the coefficients, but ridge regression will not be able to bring

03:15.460 --> 03:22.390
the coefficients to exactly zero; hence there will always be a little coefficient value left.

03:23.560 --> 03:31.570
Lasso regression, in contrast, will try to bring the coefficients to zero if possible; hence the columns which

03:31.570 --> 03:36.250
are actually not required will get a coefficient value of zero.

03:36.280 --> 03:44.140
So those columns, through their coefficients, have no impact on the regression model which we have

03:44.140 --> 03:44.620
provided.

03:45.850 --> 03:47.560
So let's compare both of these.

03:47.920 --> 03:55.420
So ridge will shrink the coefficients for the least important predictors to very close to zero, while lasso

03:55.540 --> 04:02.310
has the effect of forcing some of the coefficient estimates to be exactly equal to zero when the tuning parameter

04:02.320 --> 04:04.060
lambda is sufficiently large.

04:04.510 --> 04:07.180
So ridge will never make them exactly zero,

04:07.180 --> 04:14.200
and the final model will include all the predictors in the case of ridge regression. But in the case of lasso regression,

04:14.410 --> 04:22.020
the method will apply the penalty and bring some of the coefficients down to zero.

04:22.210 --> 04:30.010
Hence it will also allow us to do variable selection, so we can use this for feature selection as well.

04:31.840 --> 04:34.380
So let us look at the code itself.

04:36.880 --> 04:40.570
So here is the code for ridge and lasso regression.

04:41.200 --> 04:49.930
So for implementing ridge and lasso, we will import Ridge and Lasso from scikit-learn's linear_model, and

04:50.200 --> 04:58.960
we will import GridSearchCV. We could have implemented ridge or lasso just the way we implemented the linear

04:58.960 --> 04:59.440
model,

05:00.430 --> 05:02.230
simply by saying

05:04.610 --> 05:08.460
model dot fit and then doing predict. Here,

05:08.840 --> 05:14.960
we want to try different values of lambda, because we do not know what the value of lambda should be

05:14.960 --> 05:15.150
there.

05:15.530 --> 05:18.380
So that is why we are using GridSearchCV.

05:19.600 --> 05:30.160
Now, what it will do is create different models with a different lambda value every time, so that we

05:30.160 --> 05:36.910
can compare the models with different values and select the one which gives the best results

05:36.910 --> 05:37.270
to us.

05:38.020 --> 05:47.140
So here what we are doing is creating a variable for lambda, which contains the values between one

05:47.140 --> 05:47.890
and one hundred.

05:48.610 --> 05:51.940
The number of values which we are generating is one hundred.

05:52.160 --> 05:53.860
Remember the linspace function?

05:54.070 --> 06:01.170
The linspace function generates a given number of values; that is, in the case of linspace,

06:01.180 --> 06:03.980
we provide the number of values we want to generate.

06:04.240 --> 06:08.590
And in the case of arange, we used to provide the step size.
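
NOTE
As a small sketch of the difference just described: the variable names lambdas and steps are assumptions for illustration, since the notebook itself is not shown in this transcript.
```python
import numpy as np
# linspace: give start, stop, and the NUMBER of values (stop is included).
lambdas = np.linspace(1, 100, 100)
# arange: give start, stop, and the STEP size (stop is excluded, so we stop at 101).
steps = np.arange(1, 101, 1)
print(lambdas[:5])
print(len(lambdas), len(steps))
```
Both calls cover one to one hundred here; linspace returns floats, while arange with integer arguments returns integers.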

06:09.770 --> 06:17.360
So that is what we are doing: we are generating values from one to 100 and we are using those. And how

06:17.360 --> 06:20.450
are we using those? We are creating a parameter

06:21.810 --> 06:23.790
value. This parameter is a

06:25.020 --> 06:32.400
dictionary, and it contains the alpha values. These alpha values contain all the lambda values which

06:32.400 --> 06:40.660
we have created, that is, values from one to one hundred. And we are creating an object of the Ridge model.

06:41.100 --> 06:45.540
And in this Ridge model, we can define whether we want to fit an intercept or not.

06:45.570 --> 06:52.800
So here we are saying we want to fit the intercept, and we have created a model object of ridge regression.

06:53.940 --> 06:58.240
Then we provide all the details to the grid search.

06:59.670 --> 07:03.590
So in the grid search, we give the type of model we want to run.

07:04.440 --> 07:07.610
Then we give the parameter values which we want to put in.

07:08.310 --> 07:10.980
Then we give the cross-validation value,

07:10.980 --> 07:13.420
that is, the number of folds which we want to have.

07:13.650 --> 07:19.620
Now, earlier, we used train_test_split, which simply splits the data into a training and

07:19.650 --> 07:20.720
a testing dataset.

07:21.000 --> 07:24.530
But now we are actually using the cross validation method.

07:24.750 --> 07:30.900
It will create a certain number of folds, and out of those folds, each fold will get to be the

07:30.900 --> 07:36.670
testing data once, and all of the data points will get to be training data at some point of time.
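
NOTE
To see the fold rotation in a small case, here is a minimal sketch using scikit-learn's KFold on ten toy rows. The toy data and the choice of five folds are assumptions for illustration only; the lecture itself uses ten folds on the course data.
```python
import numpy as np
from sklearn.model_selection import KFold
X = np.arange(20).reshape(10, 2)          # ten toy rows, two columns
test_counts = np.zeros(len(X), dtype=int)
for fold, (train_idx, test_idx) in enumerate(KFold(n_splits=5).split(X), start=1):
    test_counts[test_idx] += 1            # count how often each row is used for testing
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
print(test_counts)
```
Every row lands in the testing fold exactly once and in the training folds for all the other rounds.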

07:37.950 --> 07:44.160
And we are giving the scoring method. The scoring method which we have chosen now is negative

07:44.160 --> 07:46.030
mean absolute error.

07:46.710 --> 07:50.390
So we have just given the declaration of the grid search.

07:50.640 --> 07:58.230
So this grid search will run this model with all the values of the parameter, all the combinations of values

07:58.230 --> 08:01.520
in the parameters, with ten folds.

08:01.950 --> 08:11.580
So the total number of models which will be created is 100, and all the hundred models will be run ten

08:11.580 --> 08:12.890
times, as

08:12.900 --> 08:17.600
we are doing 10-fold cross-validation. So the total number of model

08:17.610 --> 08:20.010
runs will be 1,000.

08:21.450 --> 08:27.960
So now we are doing the grid search fit, so it is training the model on each of the alpha values which

08:27.960 --> 08:33.420
we have provided, and after that it will give us the best estimator.

08:33.930 --> 08:38.190
So here we can see that the best estimator has alpha equal to one.
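
NOTE
Putting the steps above together, a minimal sketch of the ridge grid search could look like this. The synthetic data from make_regression is a stand-in and an assumption, since the course dataset is not part of this transcript, so the best alpha and score printed here will differ from the ones in the lecture.
```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
# Synthetic stand-in data (assumption): 200 rows, 6 columns, only 3 of them informative.
X, y = make_regression(n_samples=200, n_features=6, n_informative=3, noise=10.0, random_state=0)
lambdas = np.linspace(1, 100, 100)        # 100 candidate lambda values
param_grid = {"alpha": lambdas}           # scikit-learn calls lambda "alpha"
ridge = Ridge(fit_intercept=True)
grid = GridSearchCV(estimator=ridge, param_grid=param_grid, cv=10, scoring="neg_mean_absolute_error")
grid.fit(X, y)                            # 100 alphas x 10 folds = 1000 fits, plus one final refit
print(grid.best_estimator_)
print(grid.best_score_)
```
The score is the negative of the mean absolute error, so the best model is the one whose score is closest to zero.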

08:38.850 --> 08:45.400
So we will look at the cv results, and we can also create a function.

08:45.520 --> 08:50.790
So here I have defined a function which will basically take the results and it will

08:52.070 --> 09:01.310
compare the results and give us the parameter values for which the model has performed the best. Here

09:01.310 --> 09:04.100
I can give the number of models which I want to have.

09:04.280 --> 09:06.730
So I want to have the top three models.

09:06.800 --> 09:09.730
So I've given three as the default value.

09:09.980 --> 09:14.840
You can change the number of models when you call the function itself.
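
NOTE
The reporting helper is not shown in this transcript, so the following is only a sketch of what such a function might look like. The name report, the parameter n_top, and the toy numbers in demo are all assumptions; the dictionary keys are the ones GridSearchCV stores in cv_results_.
```python
import numpy as np
def report(results, n_top=3):
    """Print the n_top best-ranked candidates from a cv_results_-style dict and return their indices."""
    ranks = np.asarray(results["rank_test_score"])
    picked = []
    for rank in range(1, n_top + 1):
        # Several candidates can share a rank, so collect every match.
        for idx in np.flatnonzero(ranks == rank):
            print(f"Model with rank: {rank}")
            print(f"Mean validation score: {results['mean_test_score'][idx]:.3f} (std: {results['std_test_score'][idx]:.3f})")
            print(f"Parameters: {results['params'][idx]}")
            picked.append(int(idx))
    return picked
# Tiny hand-made stand-in for grid.cv_results_ (toy numbers, for illustration only).
demo = {"rank_test_score": [2, 1, 3], "mean_test_score": [-2.0, -1.0, -3.0], "std_test_score": [0.1, 0.1, 0.1], "params": [{"alpha": 2.0}, {"alpha": 1.0}, {"alpha": 3.0}]}
top = report(demo, n_top=2)
```
With a fitted grid search object, the call would be report(grid.cv_results_, 5) to list the top five models.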

09:15.230 --> 09:17.000
So here I am running the method.

09:18.350 --> 09:19.610
I'm calling report.

09:20.740 --> 09:27.640
I pass it the grid search cv results, and the number of results I want is five.

09:28.350 --> 09:30.260
So here I'm checking the model.

09:30.540 --> 09:33.840
So the first model has alpha value one.

09:33.960 --> 09:41.610
The second model has alpha value two, the third best model has alpha value three, and the fifth best model

09:41.610 --> 09:42.890
has alpha value five.

09:43.620 --> 09:48.470
And it is usually not sequential in nature; it is just by chance that

09:48.480 --> 09:50.720
it is coming out to be one, two, three, four, five.

09:51.120 --> 09:58.940
And you can compare the performance. The model with the best mean validation

09:58.950 --> 10:01.440
score is at minus six point six zero seven.

10:02.860 --> 10:10.330
And after that, we have minus six point six one two. When we are comparing the mean validation score

10:10.360 --> 10:11.760
of mean absolute error,

10:13.090 --> 10:15.740
the value has to be near to zero.

10:16.870 --> 10:20.080
So the best one would be the one nearest to zero.

10:22.130 --> 10:29.030
Now, here we can get the best estimator's alpha value, and with the best estimator, we can fit the model

10:29.030 --> 10:32.460
again and get the coefficients.

10:32.750 --> 10:35.060
So here you can see the coefficients.

10:35.070 --> 10:40.980
They are five point eight zero, minus two point five nine, minus five point seven one, one point one one.

10:41.210 --> 10:44.990
So let us compare these coefficients with the ones which we had earlier.

10:45.020 --> 10:49.310
So here we have five point eight, minus 2.5, minus five point seven.

10:51.890 --> 10:54.180
Yes, five point one one, minus two point six.

10:54.200 --> 10:59.210
This is four point forty, so you can see the coefficient values have actually reduced.

11:03.830 --> 11:10.890
So it has improved the coefficient values. Now we can run the lasso regression.

11:11.330 --> 11:14.930
So in lasso regression, the process would be entirely the same.

11:15.200 --> 11:17.360
We will again give the lambda values.

11:17.360 --> 11:23.330
We will again do the grid search: create the grid search object, give the model details, give the

11:23.330 --> 11:29.180
parameter details, give the number of cross-validation folds that we want, then the scoring criteria.

11:31.300 --> 11:37.450
Then we are calling the grid search fit, we are fitting the model, then we are finding out the

11:37.450 --> 11:38.270
best estimator.

11:39.360 --> 11:42.730
After this, we are finding out the ranking of the models.

11:42.860 --> 11:50.010
Here, you can see the best model has a value of minus six point seven three four five four, whereas for ridge

11:50.050 --> 11:53.030
it was minus six point six zero seven.

11:53.290 --> 11:59.890
So here ridge regression has actually performed better than lasso regression.

12:01.180 --> 12:03.730
And again, we can find out the

12:06.130 --> 12:12.430
intercept values and coefficient values, and based on the displayed coefficient values, you can see that

12:13.030 --> 12:18.020
the values will be down to zero where the coefficients are not really required.

12:18.340 --> 12:20.650
So we'll compare those with ridge.

12:20.650 --> 12:21.970
So let's compare those.

12:24.310 --> 12:26.530
So here the model name is.

12:28.470 --> 12:28.950
Would be.

12:29.610 --> 12:36.170
So we compare that and find out the coefficient values, so I'll just run this piece of code. Here

12:36.180 --> 12:41.900
you can see I have created a lasso model and I've given the best estimator's alpha to it.

12:42.330 --> 12:46.740
Then I have fitted the new model with the alpha value of the best estimator.

12:47.310 --> 12:50.520
And then I've got the list of the

12:51.540 --> 12:53.200
columns and the coefficients.

12:53.460 --> 13:02.100
So here you can see that the coefficients have actually been brought down a lot, and the coefficients one, five

13:02.340 --> 13:10.920
and six have been shrunk to exact zeros, while the coefficient of X three has been reduced to minus zero

13:10.920 --> 13:17.730
point zero zero five, which is very low in comparison to the coefficients which we had earlier in the case

13:17.730 --> 13:18.330
of ridge.

13:18.660 --> 13:23.550
So here you can see that with ridge the values were reduced to a lower value.

13:23.910 --> 13:31.070
But lasso has done a great job of reducing the coefficients to zero.
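
NOTE
The contrast between the two methods can be reproduced on toy data. This is a sketch under assumptions: the data is randomly generated so that only the first three columns matter, and the alpha values 1.0 and 0.5 are picked by hand rather than by grid search, so the numbers will not match the lecture's output.
```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Only the first three columns drive y; the last three columns are unrelated to y.
y = 5.0 * X[:, 0] - 3.0 * X[:, 1] + 2.0 * X[:, 2] + rng.normal(scale=1.0, size=200)
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)
names = [f"X{i}" for i in range(1, 7)]
for name, r, l in zip(names, ridge.coef_, lasso.coef_):
    print(f"{name}: ridge={r:+.3f}  lasso={l:+.3f}")
# Columns whose lasso coefficient is exactly zero are candidates to drop.
dropped = [name for name, c in zip(names, lasso.coef_) if c == 0.0]
print("Columns lasso set to zero:", dropped)
```
Ridge keeps a small nonzero coefficient on every column, while lasso sets the unrelated columns to exactly zero, which is what makes it usable for feature selection.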

13:32.200 --> 13:39.100
So this is how we can run linear regression with lasso regularization and actually find out which columns

13:39.100 --> 13:46.810
are going to be dropped. So we can drop these columns and then decide on further analysis and how we

13:46.810 --> 13:48.370
want to go ahead with the process.

13:49.360 --> 13:56.350
So after this, we will be learning about logistic regression, which is a classification model, in

13:56.350 --> 13:57.190
the next session.

13:57.440 --> 14:04.150
And once we are done with logistic regression, you will get to know how we will run that, and then

14:04.150 --> 14:06.230
we will discuss tree methods.

14:06.460 --> 14:12.160
So that will give us another model which will help us find a similar kind of coefficient values,

14:13.330 --> 14:17.500
which we can use further to find out which columns can be dropped.