WEBVTT

00:01.240 --> 00:10.240
In this session, we will implement stacking. For stacking, we will be using different non-linear

00:10.240 --> 00:18.420
models in the first tier, and we will be providing a linear model in the second tier.

00:18.880 --> 00:27.280
You can use several layers in a stacking model and decide for yourself how you want to stack.

00:27.520 --> 00:30.570
Stacking is completely a creative thing.

00:30.820 --> 00:38.350
You can try different combinations, you can try different things and create your own model using stacking.

00:39.610 --> 00:47.290
So let us begin. The first thing which we will be doing is importing pandas and NumPy, and we

00:47.290 --> 00:54.620
will import whatever libraries we want for the implementation.

00:54.880 --> 01:04.360
So I will be using k-nearest neighbors, and I will have random forest and logistic regression.

01:05.020 --> 01:14.260
I will be evaluating it using accuracy_score, and for model selection I will be using KFold and

01:14.380 --> 01:15.460
train_test_split.

01:17.720 --> 01:28.320
Next, we will be using the census income dataset, which we have also used previously. In this dataset,

01:28.790 --> 01:35.720
we have different values as greater than 50K or less than or equal to 50K.

01:35.990 --> 01:47.480
And based on the details, we have to find out whether the class has to be greater

01:47.480 --> 01:52.020
than 50 thousand or less than or equal to 50 thousand.

01:53.120 --> 01:55.670
We have already seen the data set.

01:55.670 --> 01:58.000
So let me show you the data set again.

01:58.790 --> 02:01.040
So let me run this.

02:02.740 --> 02:04.420
And let me add the.

02:12.310 --> 02:14.290
So this is the data that we have.

02:17.020 --> 02:19.630
And here we have converted the variable.

02:21.020 --> 02:29.600
Next, we will be deleting the education column because it contains the same information as the one presented

02:29.600 --> 02:32.720
in the ordinal column, which is education

02:32.720 --> 02:33.230
num.

02:35.380 --> 02:45.160
After this, we will be creating sub-columns using the categorical columns like workclass,

02:45.160 --> 02:51.710
marital status, occupation, relationship, race, sex, and native country.

02:51.940 --> 02:57.900
So these are the columns which we will be converting from categorical columns to dummy columns.

02:58.720 --> 03:00.630
So these are the steps associated.

03:00.640 --> 03:03.170
So first we have deleted the education column.

03:03.700 --> 03:12.940
After that, we have selected all the object columns, and over these object columns we have

03:13.430 --> 03:14.050
run

03:15.020 --> 03:15.620
a loop.

03:17.480 --> 03:25.370
This logic checks if a particular category contains more than 99 values; only then will it be converted

03:25.760 --> 03:27.970
into a dummy variable.

03:27.980 --> 03:29.780
Otherwise it will skip that.

03:30.320 --> 03:37.200
You can choose any frequency based on your dataset, based on the number of values present in your dataset.

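NOTE
The frequency check described above can be sketched roughly as follows.
This is a minimal sketch, not the session's actual code: the toy data frame,
the column list and the name min_count are assumptions made for illustration.
```python
import pandas as pd
# Toy frame; the values and their counts are made up for illustration.
df = pd.DataFrame({"workclass": ["Private"] * 120 + ["State-gov"] * 100 + ["Never-worked"] * 3})
categorical_cols = ["workclass"]  # hypothetical list of object columns
min_count = 99  # threshold mentioned in the session
for col in categorical_cols:
    counts = df[col].value_counts()
    for category, n in counts.items():
        if n > min_count:  # only categories seen more than 99 times get a dummy column
            df[f"{col}_{category}"] = (df[col] == category).astype(int)
    df = df.drop(columns=col)  # the original text column is no longer needed
print(list(df.columns))
```
Rare categories such as "Never-worked" get no column of their own here,
which keeps the number of dummy columns small.
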
03:37.940 --> 03:45.580
And after that, we have done the train test split and retrieved the dataset.

03:45.590 --> 03:55.820
So we have got the train and the test dataset, and the train set would be divided into X train and y train

03:55.820 --> 04:00.230
further, by dropping the y column and by keeping only the y column.

04:00.590 --> 04:08.990
And similarly, the test will be divided into X test and y test data frames by removing the y column

04:08.990 --> 04:09.670
or

04:11.100 --> 04:12.570
keeping only the y column.

04:13.770 --> 04:20.790
These are the columns which have been converted from the original categorical classes; these

04:20.790 --> 04:25.890
categorical classes have been converted into one-hot encoded or dummy variables.

04:26.850 --> 04:30.180
Next is the list of classifiers which I will be using.

04:30.210 --> 04:33.060
So these are different algorithms which I will be using.

04:33.570 --> 04:37.830
The first one is the KNeighborsClassifier.

04:38.010 --> 04:42.080
Next is the random forest, then the XGBClassifier.

04:42.630 --> 04:48.090
These are the details of the same. The first one has 50 neighbors.

04:48.600 --> 04:55.620
Next, I have created a random forest with two hundred estimators and balanced, with another random forest

04:55.620 --> 04:57.780
with a hundred estimators.

04:57.780 --> 04:58.410
And then,

05:00.140 --> 05:07.280
for the XGBClassifier, I have created both with one hundred estimators and the learning rate being

05:07.280 --> 05:09.620
0.1 and 0.01.

05:10.920 --> 05:13.500
And this is the list of algorithms which I have.

05:16.580 --> 05:26.710
Next, I have simply counted the number of rows of data

05:26.760 --> 05:32.900
present in the training dataset, so it has 26,048 rows of data.

05:33.890 --> 05:39.110
Next, I will create layer one. This layer one will contain

05:40.110 --> 05:42.510
all zeros, with

05:43.890 --> 05:48.510
the number of rows being 26,048.

05:49.680 --> 05:54.480
This is the data frame that will basically contain the

05:55.600 --> 05:59.050
predictions which we will be making from layer one.

06:00.080 --> 06:08.980
So we will be training models on X train and y train and then making predictions in columns

06:09.000 --> 06:14.840
one, two, three, four and five using the K-fold method.

06:15.410 --> 06:23.630
And the K-fold method will allow us to have out-of-fold predictions in this particular dataset so that

06:23.630 --> 06:27.470
we can further train a linear model on them.

06:28.850 --> 06:36.050
So the next thing which we are doing here is creating K-folds. Here we are creating ten

06:36.050 --> 06:45.080
splits out of the total data, and we are beginning with the fold being one, the fold value being one.

06:46.020 --> 06:55.440
And out of the K-fold split, we have taken the number of rows, and we

06:55.440 --> 07:05.700
have taken out the train chunk and the left-out chunk from the K-fold split.

07:08.990 --> 07:20.000
So the K-fold split will basically create folds, and it will create ten folds out of the entire dataset.

07:21.250 --> 07:29.050
So it will divide this entire dataset into a training dataset and a left-out chunk.

07:29.930 --> 07:39.080
Now, what we will be doing is printing the fold detail, the number of the fold, whether it is

07:39.080 --> 07:42.890
the first or second fold, or which fold we are running this for.

07:43.460 --> 07:48.590
And further, internally, it will loop over all the algorithms that we have,

07:48.590 --> 07:52.520
that is, these five algorithms which we have with us.

07:54.330 --> 08:03.990
So this loop will go over the splits and over the algorithms. It will take the train data

08:03.990 --> 08:04.650
for the fold.

08:05.690 --> 08:08.300
And it will start and get the

08:09.290 --> 08:17.660
X value and the y value of the training dataset, and then it will decide upon the left-out chunk, that is

08:17.660 --> 08:24.710
the size of the chunk, and based on the left-out chunk, it will keep an index into the entire dataset.

08:25.010 --> 08:27.830
Then it goes over all the K-fold splits

08:28.790 --> 08:37.050
and, by fitting the algorithm, predicts the probability for the left-out chunk.

08:38.040 --> 08:42.050
So it will train on the X train and y train data

08:43.180 --> 08:47.020
and predict the X left-out chunk values.

08:48.130 --> 08:55.630
And after we have predicted these X left-out chunk values, we will fill the layer one data frame,

08:56.110 --> 08:58.420
which contains these blank values.

08:58.450 --> 09:07.420
So as of now, we will fill in the left-out chunk part of the data frame with the values,

09:07.420 --> 09:09.700
that is, the values which we have predicted.

09:10.850 --> 09:19.690
And then we will update the value of fold to fold plus one so that it will again loop over the second fold.

09:19.940 --> 09:28.520
So first of all, it will consider the first fold as the test value, or the value which needs to be

09:28.520 --> 09:29.520
predicted later,

09:29.840 --> 09:30.960
that is, the left-out chunk.

09:31.610 --> 09:35.330
Then it will consider the next fold as the left-out chunk, and so on.

09:36.610 --> 09:40.340
So it will keep a note of the fold and then work on it.

09:40.600 --> 09:46.360
So here it is working on the first fold and predicting the values for the five

09:48.140 --> 09:50.260
algorithms, then it is

09:51.700 --> 09:54.520
going over the second fold and

09:55.910 --> 10:02.360
again working on the five algorithms. It will be doing this for all the ten folds which we have

10:02.360 --> 10:07.220
in the data, and here we have all the predicted values which we have from the

10:09.010 --> 10:10.480
folds which we have run.

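NOTE
The fold loop described above can be sketched roughly as follows.
This is a minimal sketch under stated assumptions: the data is synthetic
(make_classification stands in for the census X train and y train), only two
of the five classifiers are shown, and their settings are illustrative.
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
# Synthetic stand-in for X_train / y_train (an assumption, not the census data).
X_train, y_train = make_classification(n_samples=300, n_features=8, random_state=0)
models = [
    KNeighborsClassifier(n_neighbors=15),
    RandomForestClassifier(n_estimators=50, random_state=0),
]
# Layer one starts as all zeros: one row per training row, one column per model.
layer1 = np.zeros((X_train.shape[0], len(models)))
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, left_out_idx) in enumerate(kf.split(X_train), start=1):
    for col, model in enumerate(models):
        # Fit on the nine training folds only.
        model.fit(X_train[train_idx], y_train[train_idx])
        # Fill the left-out rows with the predicted probability of class 1.
        layer1[left_out_idx, col] = model.predict_proba(X_train[left_out_idx])[:, 1]
print(layer1.shape)
```
Each row of layer1 is filled exactly once, by models that never saw that row
during fitting, which is what makes the values safe inputs for the next layer.
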
10:12.320 --> 10:20.780
Now, what we will be doing is, now that we have all of these values, we want to create the next layer. This

10:20.800 --> 10:25.280
layer will be trained on top of this entire data.

10:25.490 --> 10:31.490
So now these become our X values and the y value remains the same.

10:31.910 --> 10:36.350
So now, using these values, we will apply a logistic regression

10:36.350 --> 10:42.160
on top of this. You can apply linear regression in case it is a regression problem.

10:42.170 --> 10:48.050
But the problem which we are solving is a classification problem, that is, whether the value is greater than

10:48.050 --> 10:49.940
50K or less than or equal to 50K.

10:50.480 --> 10:53.300
So that is why we are applying logistic regression.

10:55.030 --> 10:59.360
So we will consider these as the X values for the next step.

10:59.800 --> 11:07.600
So what we will be doing now is, again, we are considering the layer two data, which is taken from this

11:07.600 --> 11:08.770
particular data frame.

11:08.770 --> 11:10.390
We have combined this entire thing.

11:10.750 --> 11:18.060
And now for layer two, these values become zero values.

11:18.070 --> 11:27.280
So we have created a new data frame, layer two, which will contain zero values.

11:28.000 --> 11:35.440
And these zero values will now be filled in by the predicted probability, by the

11:37.220 --> 11:43.370
model which we have created here. So till now, what we have done is,

11:44.440 --> 11:52.900
we have iterated over all of these folds, and from all of these folds we have predicted the values here,

11:53.410 --> 11:56.140
and now what we are doing here is,

11:57.740 --> 12:09.320
here we are again creating a data frame, layer two, where we have zero values, and we are iterating over

12:09.320 --> 12:13.620
the algorithms and we are generating the layer two data.

12:16.340 --> 12:22.910
Now, in case of layer two, we will create a logistic regression, and this logistic regression will be

12:22.910 --> 12:24.620
fit on layer one.

12:26.290 --> 12:34.630
And this model, which has been generated by fitting on layer one,

12:35.140 --> 12:40.870
will now be used for predicting the probability on the layer two data.

12:42.910 --> 12:43.480
Now,

12:44.490 --> 12:47.490
once we predict the probability of layer two data,

12:48.810 --> 12:50.370
then we can

12:52.750 --> 12:59.440
check the accuracy of it. From the data, we can see that the accuracy score comes out to be

12:59.440 --> 13:00.640
92 percent.

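NOTE
The second layer described above can be sketched roughly as follows.
This is a minimal sketch, not the session's code: the data is synthetic, only
two base models are shown, and cross_val_predict is swapped in for the manual
fold loop to keep the example short. The accuracy it prints will differ from
the figure quoted in the session.
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
# Synthetic stand-in for the census data (an assumption for illustration).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
models = [
    KNeighborsClassifier(n_neighbors=15),
    RandomForestClassifier(n_estimators=50, random_state=0),
]
# Layer one: out-of-fold probabilities for the training rows, one column per model.
layer1 = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=10, method="predict_proba")[:, 1]
    for m in models
])
# Layer two: each base model refit on all training rows, then applied to the test rows.
layer2 = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for m in models
])
# The linear second-tier model is fit on layer one and evaluated on layer two.
meta = LogisticRegression()
meta.fit(layer1, y_train)
accuracy = accuracy_score(y_test, meta.predict(layer2))
print(f"stacked accuracy: {accuracy:.3f}")
```
Any other classifier could replace LogisticRegression in the last step,
since the second layer only sees the columns of predicted probabilities.
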
13:01.330 --> 13:04.940
Now let us apply a simple classifier.

13:05.290 --> 13:08.530
So this is the XGBClassifier, which I have here.

13:10.220 --> 13:18.070
So this has n estimators one hundred and learning rate 0.1, and fitting this classifier on layer one and layer two

13:18.080 --> 13:24.700
and y data, it gives us the same accuracy, which is zero point nine.

13:25.520 --> 13:33.530
So here you can see that instead of applying the XGBClassifier, we have applied a very simple model,

13:33.530 --> 13:37.640
which is logistic regression, and it is also giving the same accuracy.

13:37.640 --> 13:39.970
There is not much difference in both of these.

13:40.730 --> 13:43.280
So we can use any of the models.

13:43.280 --> 13:53.110
We can either use a simple linear model or a nonlinear model in the second layer.

13:53.300 --> 13:57.880
So it is completely up to us what we are choosing to implement here.

13:59.220 --> 14:07.590
So this is about the stacking implementation. All you need to take care of is that the data which

14:07.590 --> 14:12.480
you will be using needs to be transformed properly.

14:12.690 --> 14:20.730
And after transformation, you will simply apply layer one; in layer one you will use different

14:21.000 --> 14:22.410
nonlinear models,

14:22.410 --> 14:25.470
and in layer two you will use a linear model.
