WEBVTT

00:01.150 --> 00:02.290
Hi there.

00:02.320 --> 00:08.470
So far, we have discussed linear regression, we have discussed logistic regression, we have

00:08.470 --> 00:18.060
talked about decision trees and ensemble methods. Among the ensemble methods, we have learned about bagging

00:18.070 --> 00:20.110
and how we can implement bagging

00:20.170 --> 00:26.820
using random forest. Next under the ensemble learning methods is boosting machines.

00:27.070 --> 00:32.250
So let us learn about boosting machines and see how those can be implemented.

00:32.920 --> 00:35.410
So let us get started.

00:35.590 --> 00:38.340
So this is what boosting looks like.

00:38.590 --> 00:41.560
So as discussed earlier, we will be having

00:42.580 --> 00:52.120
sequential models, where the first model will be learning from this particular data,

00:52.120 --> 00:56.530
will be learning from X, and will try to predict the target value.

00:57.560 --> 01:04.310
After this model, the output from this model, that is the error which we will be getting from this

01:04.310 --> 01:08.610
particular model, will be pushed into the next model to be predicted.

01:08.780 --> 01:15.150
So the target for the second model will actually be the error from the first model.

01:15.560 --> 01:21.470
So each model will keep on improving on the previous model in the next iterations.

01:21.600 --> 01:29.280
So for the last model, we would expect that it has no error present in it.

01:30.290 --> 01:37.910
Now, if there is no error present in the last model, then that would mean that we have somewhere

01:38.270 --> 01:41.060
overfitted on the training data.

01:41.450 --> 01:49.430
So we should know where we need to stop, and we should also know that there has to be some kind

01:49.430 --> 01:54.370
of regularization which needs to be applied to this boosting algorithm too.

01:55.770 --> 02:03.630
So the major point to be remembered about boosting is that these models are generated in a sequential

02:03.630 --> 02:10.190
order, in comparison to bagging, where the models are generated in parallel.

02:11.710 --> 02:18.380
Boosting is again created by using different decision tree stumps.

02:18.670 --> 02:26.540
Now, these decision tree stumps do not try to fit the data to any specific pattern, unlike a linear model,

02:26.710 --> 02:29.950
which would always try to figure out a linear equation.

02:30.160 --> 02:33.850
So here there is no specific pattern which we are trying to fit.

02:34.100 --> 02:39.220
We are just trying to create small models,

02:40.850 --> 02:48.200
so that each of these weak models is good at something. We just want to create more and more of them,

02:48.620 --> 02:54.600
each of which will be able to emphasize one specific pattern.

02:54.800 --> 03:02.570
So each model will learn something and push it forward to the next model, and that model will try to

03:02.570 --> 03:07.890
improve on the patterns which were missed by the previous model.

03:08.180 --> 03:14.570
So there will be a certain amount of weightage which will be given in the next model to the items

03:14.570 --> 03:18.350
which model one could not predict.

03:18.620 --> 03:27.000
So the values which were misclassified or not predicted properly by model one will be improved upon in

03:27.000 --> 03:29.900
model two by giving higher weightage to them.

03:30.320 --> 03:35.450
So this is how each model will try to learn some part of the patterns which are present.

03:35.710 --> 03:42.770
And somehow the entire combination will be able to learn all the patterns this way and will be able

03:42.770 --> 03:46.250
to predict better in comparison to a normal decision

03:46.250 --> 03:46.550
tree.

03:48.120 --> 03:56.250
So here the input variables remain the same in all the weak learners. The input variable will always be X,

03:56.490 --> 04:02.610
while the target variable will be changed to the error value from the previous model.

04:03.600 --> 04:12.300
The target value for the first model will be Y, and the target value for the next model will be

04:12.300 --> 04:16.440
Y minus W1, then minus W2, W3, and so on,

04:16.680 --> 04:19.860
where each W is the prediction which is made by

04:20.900 --> 04:25.220
each weak learner, that is, model one, model two, model three, and so on.
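
NOTE
Editor's sketch of the idea above: each model in the sequence is trained on the error left by the previous one.
This is a minimal illustration under assumptions, not the lecturer's code: the toy data X and Y, the helper names
fit_stump, predict_stump and boost, and the choice of squared error are all hypothetical.
```python
# Hypothetical toy data: one input X and a target Y (not from the lecture).
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
def fit_stump(xs, targets):
    """Return (threshold, left_mean, right_mean) minimising squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]
def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm
def boost(xs, ys, n_models=3):
    """Fit n_models stumps in sequence; each one targets the previous error."""
    residual = list(ys)  # the target of model one is Y itself
    stumps, errors = [], []
    for _ in range(n_models):
        stump = fit_stump(xs, residual)
        stumps.append(stump)
        # New target: old target minus this model's prediction (Y - W1, ...).
        residual = [r - predict_stump(stump, x) for x, r in zip(xs, residual)]
        errors.append(sum(r * r for r in residual))
    return stumps, errors
stumps, errors = boost(X, Y)
print(errors)  # squared error remaining after each model
```
The printed list should not increase from one model to the next, which is also why a stopping point is needed before the error on the training data reaches zero.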

04:28.460 --> 04:38.420
Now, let us see how the algorithm works. So to find weak rules, we apply base learning algorithms

04:38.570 --> 04:40.310
with a different distribution.

04:40.820 --> 04:47.090
Each time the base learning algorithm is applied, it generates a new weak prediction rule.

04:48.200 --> 04:50.270
This is an iterative process.

04:50.420 --> 04:58.400
And after many iterations, the boosting algorithm combines these weak rules into a single strong prediction rule.

04:58.850 --> 05:00.440
What are these weak rules?

05:00.650 --> 05:07.940
These weak rules are the small decision stumps, or these small weak learners, or the small models which

05:07.940 --> 05:09.790
we are connecting here.

05:10.490 --> 05:14.900
Each model is a weak rule which we are combining together.

05:17.470 --> 05:25.180
Now, for choosing the right distribution, here are the steps. In the first step, the base

05:25.180 --> 05:32.680
learner takes all of the distributions and assigns equal weight or attention to each observation.

05:33.610 --> 05:41.200
In step two, if there is any prediction error caused by the first base learner, then

05:41.200 --> 05:45.480
we pay higher attention to the observations having prediction error.

05:46.090 --> 05:49.110
Then we apply the next base learning algorithm.

05:49.510 --> 05:58.870
So in step three, we keep on repeating step two until we reach a higher

05:58.870 --> 06:04.390
accuracy or the limit of the base learning algorithm is reached.

06:04.570 --> 06:12.340
So as long as we are able to find higher accuracy and we are able to reduce the error,

06:12.340 --> 06:17.350
we will just keep on improving on the previous base learner.

06:17.440 --> 06:23.920
So we will keep giving the higher priority or more attention to the previous misclassified points.

06:24.700 --> 06:28.900
As long as we are able to improve, we continue. Once there is a position where

06:29.960 --> 06:33.600
there is no improvement happening, then we can stop.
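
NOTE
Editor's sketch of steps one and two above: start with equal weights, then pay higher attention to the observations
that were predicted wrongly. The lecture does not give a formula, so this assumes the standard AdaBoost update;
the function name reweight and the ten toy observations are hypothetical.
```python
import math
def reweight(weights, y_true, y_pred):
    """One AdaBoost-style round: return (alpha, new_weights).
    Assumes the weighted error is strictly between 0 and 1."""
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1 - err) / err)  # the say of this weak learner
    # Misclassified points are scaled up, correctly classified ones scaled down.
    raw = [w * math.exp(alpha if t != p else -alpha)
           for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [w / total for w in raw]
# Step 1: equal weight for each of 10 hypothetical observations.
n = 10
weights = [1.0 / n] * n
y_true = [+1, +1, +1, +1, +1, -1, -1, -1, -1, -1]
# Hypothetical first learner that gets three of the plus points wrong.
y_pred = [+1, +1, -1, -1, -1, -1, -1, -1, -1, -1]
# Step 2: pay higher attention to the observations with prediction error.
alpha, weights = reweight(weights, y_true, y_pred)
print(round(alpha, 3), [round(w, 3) for w in weights])
```
After this round the three misclassified observations together carry half of the total weight, so the next base learner is pushed to get them right.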

06:34.250 --> 06:41.540
So finally, it combines the outputs from the weak learners and creates a strong learner, which eventually

06:41.540 --> 06:51.020
improves the prediction power of the model. Boosting pays higher focus on the examples which are misclassified

06:51.170 --> 06:54.690
or have higher errors by the preceding weak rules.

06:55.190 --> 07:03.890
So each model here is a weak learner, or is called a weak rule, which are combined together to form

07:03.890 --> 07:06.440
a strong learner, and each of

07:07.220 --> 07:09.380
them will misclassify something.

07:09.380 --> 07:10.490
So there is one rule.

07:10.520 --> 07:16.310
So this rule will be able to classify some part of the data correctly and it will be misclassifying

07:16.310 --> 07:17.120
some other part.

07:17.390 --> 07:23.930
Now the part which was misclassified will be given more weight, it will be given more attention, and

07:23.960 --> 07:27.370
this particular model will try to classify it properly.

07:28.340 --> 07:36.290
Now, again, there will be a slight improvement; then we will combine more than one and there will

07:36.290 --> 07:39.110
be a greater number of rules created here.

07:39.380 --> 07:42.770
Now, again, these models would have missed out on something.

07:42.950 --> 07:50.900
So those predictions which were misclassified, those will be, again, taken up and those will be given

07:50.900 --> 07:53.510
higher weightage by this particular model.

07:53.660 --> 07:56.450
And this model will again try to improve.

07:57.110 --> 08:05.570
So this is how each and every model will be a small rule, or will have some property, which will be capturing

08:05.570 --> 08:12.020
some different patterns so that when combined all together, they will work as a strong learner.

08:16.190 --> 08:25.430
Now, let us consider what AdaBoost is. AdaBoost is one of the basic algorithms in boosting,

08:25.670 --> 08:30.510
which will allow us to understand how this entire thing works.

08:30.800 --> 08:34.120
So let us start with this method.

08:34.310 --> 08:37.310
So let us consider the box one.

08:37.490 --> 08:38.630
This is box one.

08:38.630 --> 08:39.680
This is box two.

08:39.950 --> 08:41.060
This is box three.

08:41.210 --> 08:42.680
And this is box four.

08:44.860 --> 08:53.410
Now, in case of box one, you can see that we have assigned equal weight to each data point and applied

08:53.410 --> 08:58.210
a decision stump to classify them as plus or minus.

08:58.960 --> 09:07.240
The decision stump D1 has generated a vertical line at the left side to classify the data points.

09:07.540 --> 09:13.270
So this is the first decision stump, which is classifying the data. Now,

09:13.360 --> 09:20.590
it classifies these two points correctly, but it has misclassified these three

09:21.580 --> 09:29.200
plus signs, and classified all the negative signs correctly, right? So we see that this vertical

09:29.200 --> 09:33.990
line has incorrectly predicted the three plus signs as minus.

09:34.090 --> 09:39.090
And in such a case, we will assign higher weights to these three plus signs.

09:39.280 --> 09:44.670
So we will provide higher weights to these three plus signs and apply another decision stump.

09:45.430 --> 09:47.420
So what will happen in the next round?

09:47.650 --> 09:48.770
In the next round,

09:48.820 --> 09:55.000
because we have provided more weight to these three plus signs,

09:55.270 --> 10:00.390
these are bigger in size as compared to the rest of the data points.

10:02.020 --> 10:08.110
Now, in this case, the second decision stump will try to predict them correctly, the three plus

10:08.110 --> 10:08.980
signs correctly.

10:09.160 --> 10:11.470
So a new vertical line,

10:11.470 --> 10:14.590
D2, at the right-hand side has been added.

10:14.770 --> 10:24.550
So what this has done is it has classified these three previously misclassified plus signs correctly

10:24.760 --> 10:27.040
and also these two plus signs correctly.

10:27.040 --> 10:31.450
But now it has misclassified these three negative signs.

10:31.840 --> 10:37.720
So now what we will do is give a higher weight to these minus signs.

10:39.460 --> 10:45.500
So now what we have done is we have given higher weights to these three minus signs.

10:45.820 --> 10:47.260
So now what will happen?

10:47.260 --> 10:54.800
We will have our decision stump D3, which will predict these misclassified points correctly.

10:55.030 --> 11:01.300
Now, this time, a horizontal line would be generated to classify the plus and the minus signs, based

11:01.300 --> 11:03.940
on the higher weight of the misclassified observations.

11:04.180 --> 11:05.380
So now what will happen?

11:05.380 --> 11:12.490
It has generated this line, which has classified these three plus signs correctly and these three

11:12.490 --> 11:15.750
minus signs correctly because they had a higher weight this time.

11:16.210 --> 11:24.760
So now we have these two plus signs which have been misclassified and one minus sign, which has been

11:24.760 --> 11:25.660
misclassified.

11:26.050 --> 11:34.450
So now, in box four, we have combined D1, D2, and D3 to form a

11:34.450 --> 11:39.520
strong prediction, having a complex rule as compared to the individual weak learners.

11:40.390 --> 11:49.350
So on combining these three decision stumps, we can see that now these points have been classified properly.

11:49.540 --> 11:55.900
So these negative signs have been classified as negative and these positive signs have been classified

11:55.900 --> 11:56.800
as positive.

11:58.390 --> 12:05.840
So the algorithm has classified these observations quite well as compared to any individual weak learner.
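
NOTE
Editor's sketch of box four: three decision stumps, each wrong on three points, are combined by a weighted vote.
Everything here is an assumption for illustration: the coordinates are invented to mimic the picture, the stump
thresholds are chosen by hand rather than learned, and the weights 0.42, 0.65 and 0.92 are hypothetical values.
```python
# Hypothetical coordinates chosen to mimic the four-box picture.
points = [(1, 2), (2, 4), (4, 8), (5, 6), (6, 9),   # plus signs
          (3, 1), (4, 3), (6, 2), (8, 7), (9, 3)]   # minus signs
labels = [+1] * 5 + [-1] * 5
def d1(p): return +1 if p[0] < 2.5 else -1   # vertical line on the left
def d2(p): return +1 if p[0] < 7.5 else -1   # vertical line on the right
def d3(p): return +1 if p[1] > 5.0 else -1   # horizontal line
# Hypothetical weights (the say of each stump), not taken from the lecture.
stumps = [(0.42, d1), (0.65, d2), (0.92, d3)]
def strong(p):
    """Weighted vote of the three weak learners."""
    score = sum(alpha * h(p) for alpha, h in stumps)
    return +1 if score > 0 else -1
def mistakes(h):
    """Count the points that classifier h labels wrongly."""
    return sum(1 for p, y in zip(points, labels) if h(p) != y)
print([mistakes(h) for _, h in stumps], mistakes(strong))  # prints [3, 3, 3] 0
```
On this toy data each stump alone makes three mistakes, while the weighted combination makes none, which is the behaviour the four boxes are meant to show.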