WEBVTT

00:01.470 --> 00:06.960
The next model we will be discussing is K-nearest neighbors (KNN).

00:08.420 --> 00:13.590
From the name, you can tell that this algorithm is about nearest neighbors.

00:14.420 --> 00:23.810
Here we take a data point and find the points that are nearest to it,

00:23.810 --> 00:25.730
that is, the nearest neighbors of that particular data point.

00:27.580 --> 00:35.500
Based on those nearest neighbors, we decide which class the particular point should be

00:35.500 --> 00:43.870
assigned to, or what value the particular data point should get.

00:44.320 --> 00:48.320
In other words, what should be predicted for that particular point?

00:49.150 --> 00:50.010
So let us look at an example.

00:53.780 --> 01:02.410
Here K is the number of neighbors. Let's say we have this particular data, and we have set K

01:02.750 --> 01:03.480
to be three.

01:04.370 --> 01:10.820
So in this particular data, K is three, and we want to find the class of this particular star,

01:11.150 --> 01:20.450
so we take the majority vote among these three neighbors. Hence this red star actually belongs

01:20.450 --> 01:21.970
to the purple

01:22.190 --> 01:24.980
class, that is, Class B.

01:26.330 --> 01:34.970
Now, if we set K to six, we consider this larger region, that is, the nearest

01:34.970 --> 01:35.940
six points.

01:36.680 --> 01:45.440
When we consider the nearest six in this scenario, the majority vote goes to the yellow

01:45.440 --> 01:47.390
class, that is, Class A.

01:48.680 --> 01:57.200
So we can see how the value of K, the number of nearest neighbors, decides what the predicted value

01:57.200 --> 02:00.420
of the point we are looking at should be.

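NOTE A minimal Python sketch of the voting step described above, assuming Euclidean distance. The toy points and the helper name knn_classify are hypothetical, chosen so that K = 3 and K = 6 disagree, as in the example.
```python
import math
from collections import Counter
def knn_classify(train, query, k):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query and keep the first k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels of those k neighbors; the most common label wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
# Hypothetical toy data: ((x, y), label)
train = [
    ((1.0, 0.0), "B"),
    ((0.0, 1.2), "B"),
    ((-1.5, 0.0), "A"),
    ((0.0, -2.0), "A"),
    ((2.5, 0.0), "A"),
    ((-2.0, 2.0), "A"),
]
query = (0.0, 0.0)
print(knn_classify(train, query, k=3))  # B: two of the three nearest points are B
print(knn_classify(train, query, k=6))  # A: four of the six nearest points are A
```
Only K changed between the two calls, yet the predicted class flips.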
02:01.290 --> 02:09.800
A small value of K overfits, because it learns very localized patterns.

02:10.340 --> 02:14.690
It is not able to find the general pattern.

02:15.470 --> 02:22.720
That is the reason: because we are looking at a very small number of neighbors for this point,

02:23.030 --> 02:30.170
the model is not able to identify that this point actually belongs to Class A.

02:32.080 --> 02:39.700
Next, when K is very large, the model becomes too generalized.

02:40.390 --> 02:43.240
That means it misses complex patterns; it underfits.

02:43.510 --> 02:50.750
So let's say that K is actually a very high value, such as ten.

02:51.610 --> 03:01.840
In that case, the nearest ten points would be one, two, three, four, five, six, seven, eight,

03:02.410 --> 03:04.000
nine and ten,

03:04.750 --> 03:06.520
that is, all of the points.

03:07.060 --> 03:12.960
If we then take a majority vote, the two classes actually have five points each.

03:13.510 --> 03:17.790
So the vote cannot really tell us which class this point belongs to.

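NOTE A short Python sketch of why a very large K over-generalizes, assuming Euclidean distance and hypothetical 1-D data with five points per class, as in the example. When K equals the number of points, every query gets the same vote counts, so the vote says nothing about where the query is.
```python
import math
from collections import Counter
def neighbor_votes(train, query, k):
    """Return the label counts among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest)
# Hypothetical data: five "A" points on the left, five "B" points on the right.
train = [((float(x), 0.0), "A") for x in range(-5, 0)] + [((float(x), 0.0), "B") for x in range(1, 6)]
k_all = len(train)  # K = 10 uses every point
near_a = neighbor_votes(train, (-4.0, 0.0), k_all)
near_b = neighbor_votes(train, (4.0, 0.0), k_all)
print(near_a == near_b)  # True: the counts no longer depend on where the query is
print(neighbor_votes(train, (-4.0, 0.0), 3))  # with K = 3, only "A" neighbors
```
With K = 10 both classes get five votes, so the majority vote ends in a tie for every query.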
03:18.880 --> 03:27.260
Hence, we need to find a value of K that is neither very large nor very small.

03:28.060 --> 03:33.140
So we have to make sure that the value of K is acceptable.

03:34.030 --> 03:41.800
To do this, we usually compute the results for different values of K.

03:42.490 --> 03:44.870
We compute the results

03:44.890 --> 03:52.690
for K equal to one, K equal to two, and so on up to 10 or 20, depending on the amount of data we have.

03:54.020 --> 04:02.900
Then we compare which particular value gives the best result, and that is the K

04:02.900 --> 04:04.160
we select.

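NOTE One way to "compare the results" for several values of K is leave-one-out accuracy: predict each point from all the other points and count the hits. This Python sketch assumes Euclidean distance and hypothetical 1-D data containing one noisy "B" point inside the "A" group; the lecture does not say which evaluation it uses.
```python
import math
from collections import Counter
def knn_classify(train, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all the other points."""
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        correct += int(knn_classify(rest, point, k) == label)
    return correct / len(data)
# Hypothetical 1-D data; the "B" point at 1.2 sits among the "A" points.
data = (
    [((x,), "A") for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
    + [((1.2,), "B")]
    + [((x,), "B") for x in (5.0, 5.5, 6.0, 6.5, 7.0)]
)
scores = {k: loo_accuracy(data, k) for k in (1, 3, 5, 7, 9)}
# max() returns the first K with the top score, i.e. the smallest K among ties.
best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```
Here K = 1 is fooled by the noisy point, while K = 3 smooths it out.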
04:08.810 --> 04:12.260
Now, let us look at a few properties of KNN.

04:13.290 --> 04:22.080
KNN can be used for both classification and regression predictive problems.

04:23.040 --> 04:30.840
In the case of classification, we take the nearest K points and take

04:30.840 --> 04:36.900
a majority vote among them, and then decide which class the point belongs to.

04:38.970 --> 04:46.570
In the case of regression, we consider these neighboring points and

04:46.570 --> 04:49.710
take a weighted average of their values.

04:52.640 --> 05:01.340
A weighted average helps balance the values, because some neighbors might be close and some points

05:01.340 --> 05:02.640
can be far away.

05:03.200 --> 05:10.640
So a weighted average, that is, an average weighted by distance, lets closer neighbors influence the value more

05:10.640 --> 05:14.530
when we compute the regression prediction.

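NOTE A minimal Python sketch of KNN regression, comparing a plain mean with a distance-weighted mean. It assumes inverse-distance weights (1 / d), which is one common choice but not necessarily the one the lecture has in mind; the data points are hypothetical.
```python
import math
def knn_regress(train, query, k, weighted=True):
    """Predict a number for `query` from its k nearest neighbors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    if not weighted:
        return sum(value for _, value in nearest) / len(nearest)
    # Inverse-distance weights: closer neighbors count more.
    weights = []
    for point, value in nearest:
        d = math.dist(point, query)
        if d == 0:
            return value  # exact match: use its value directly
        weights.append(1.0 / d)
    return sum(w * value for w, (_, value) in zip(weights, nearest)) / sum(weights)
# Hypothetical 1-D data: two close neighbors and one far-away neighbor.
train = [((1.0,), 10.0), ((2.0,), 20.0), ((6.0,), 60.0)]
print(knn_regress(train, (1.5,), k=3, weighted=False))  # 30.0, pulled up by the far point
print(knn_regress(train, (1.5,), k=3))                  # about 17.37, dominated by the close points
```
The far neighbor at 6.0 drags the plain mean to 30, while the weighted version stays near the values of the two close neighbors.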
05:17.160 --> 05:21.420
It is mainly used for classification predictive problems.

05:22.200 --> 05:25.510
Why is it mainly used for classification predictive problems?

05:25.740 --> 05:32.220
It is mainly used for classification problems because it is a distance-based

05:32.220 --> 05:39.520
algorithm, and it might miss some patterns because it only looks at the nearest points.

05:39.690 --> 05:43.110
That is, it only looks in a circular region around the point.

05:45.950 --> 05:54.380
It is called a lazy learning algorithm. Why is it called a lazy learning algorithm? Because there is

05:54.380 --> 05:56.830
no specialized training phase.

05:58.350 --> 06:09.630
In this algorithm, all of the data is used at classification time, so it does not need an explicit training

06:09.990 --> 06:17.430
period. When we actually make a prediction, we simply provide all the data points, and these

06:17.430 --> 06:18.480
data points

06:18.480 --> 06:21.990
act as the training set.

06:21.990 --> 06:29.480
If we remove one of the data points, a new point might be classified differently.

06:29.640 --> 06:34.680
And if we add other data points, it might again be classified differently.

06:35.370 --> 06:39.100
That is why it is called a lazy learning algorithm.

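NOTE A minimal Python sketch of what "lazy" means in practice. The class LazyKNN and its methods are hypothetical names: fit() only stores the data, add() makes a new point usable immediately, and all distance work happens inside predict(). Euclidean distance and the toy points are assumptions.
```python
import math
from collections import Counter
class LazyKNN:
    """A minimal lazy k-NN classifier: 'training' only stores the data."""
    def __init__(self, k):
        self.k = k
        self.points = []  # list of ((features...), label)
    def fit(self, data):
        # No model is built here; we just keep a copy of the data.
        self.points = list(data)
        return self
    def add(self, point, label):
        # New data is usable immediately, with no retraining step.
        self.points.append((point, label))
    def predict(self, query):
        # All the work happens at prediction time.
        nearest = sorted(self.points, key=lambda item: math.dist(item[0], query))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
model = LazyKNN(k=1).fit([((0.0, 0.0), "A"), ((4.0, 4.0), "B")])
print(model.predict((1.5, 1.5)))  # A: (0, 0) is the closest point
model.add((2.0, 2.0), "B")
print(model.predict((1.5, 1.5)))  # B: the newly added point is now the closest
```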
06:41.010 --> 06:48.870
It is also a non-parametric learning algorithm, that is, it does not assume anything about the underlying

06:48.870 --> 06:49.170
data.

06:50.350 --> 06:59.770
It makes predictions based on the most closely matching data points: it considers the closest data points

07:00.040 --> 07:01.910
to the point in question.

07:01.930 --> 07:11.140
And because closeness is measured by distance, we need to scale the

07:11.150 --> 07:15.670
data. We scale the data so that

07:17.060 --> 07:26.660
all the features contribute comparably to the distance. For example, there could be a feature named

07:26.660 --> 07:34.520
age and another feature named amount. Age will range from, let's say, zero to a hundred,

07:35.060 --> 07:37.790
while the amount might range in lakhs.

07:38.600 --> 07:41.860
So in that case, we need scaling.

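NOTE A small Python sketch of why scaling matters, assuming min-max scaling to [0, 1] (one common choice; standardization is another) and hypothetical (age, amount) rows. Without scaling, the amount column dominates the distance and a person 30 years older looks "nearest".
```python
import math
def min_max_scale(rows, ref):
    """Scale each column of `rows` to [0, 1] using the min/max of the reference rows."""
    cols = list(zip(*ref))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # A constant column is mapped to 0.0 to avoid dividing by zero.
    return [tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)) for row in rows]
def nearest_index(points, query):
    """Index of the point closest to `query` (Euclidean distance)."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))
# Hypothetical (age, amount) rows; amount is numerically far larger than age.
train = [(31, 260_000), (60, 205_000), (20, 100_000), (70, 1_100_000)]
query = (30, 200_000)
print(nearest_index(train, query))  # 1: the 60-year-old wins because only amount matters
scaled_train = min_max_scale(train, train)
scaled_query = min_max_scale([query], train)[0]
print(nearest_index(scaled_train, scaled_query))  # 0: after scaling, age counts too
```
The scaling ranges come from the training rows only, and the same ranges are reused for the query.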
07:45.220 --> 07:57.190
Now, let us discuss a few pros. First, the algorithm is uncomplicated and easy to apply.

07:57.310 --> 07:59.360
It is not complicated in nature.

07:59.530 --> 08:01.110
It is very easy to apply.

08:01.130 --> 08:07.120
You can see that if we have the data points and we know the distances, we can easily apply

08:07.120 --> 08:11.140
the algorithm by hand and find out what the prediction for the point should be.

08:13.440 --> 08:21.600
Next, there are only two parameters to provide to the algorithm: the value of K and the distance metric.

08:22.930 --> 08:30.040
All we need for this algorithm is the value of K and the distance between the points, so that we can

08:30.040 --> 08:36.910
decide how many points to consider, find the nearest points, and

08:36.910 --> 08:38.560
then make our decision.

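NOTE A minimal Python sketch showing that K and the distance metric really are the two inputs. The metric is passed in as a function; Euclidean and Manhattan distance are common examples, and the two toy points are hypothetical, picked so that the two metrics disagree about which point is nearest.
```python
import math
from collections import Counter
def euclidean(a, b):
    return math.dist(a, b)
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))
def knn_classify(train, query, k, metric):
    """Majority vote among the k nearest points; the only knobs are k and `metric`."""
    nearest = sorted(train, key=lambda item: metric(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
train = [((3.0, 0.0), "A"), ((2.0, 2.0), "B")]
print(knn_classify(train, (0.0, 0.0), k=1, metric=euclidean))  # B: 2.83 < 3
print(knn_classify(train, (0.0, 0.0), k=1, metric=manhattan))  # A: 3 < 4
```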
08:40.960 --> 08:46.700
Next, it works with any number of classes, not just binary classification.

08:47.080 --> 08:54.280
In this case, we might have any number of classes. We just compute the distances

08:54.520 --> 08:59.000
and take the majority class among the neighbors, and we are good to go.

08:59.030 --> 09:03.100
We can simply use this algorithm for five classes or ten classes.

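NOTE A minimal Python sketch of multi-class voting, assuming Euclidean distance and hypothetical three-class data. The same vote works for any number of labels. The tie-break rule (prefer the tied label whose neighbor is closest) is an assumption, not something the lecture specifies.
```python
import math
from collections import Counter
def knn_classify(train, query, k):
    """Majority vote over k neighbors; works for any number of class labels."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    tied = {label for label, count in votes.items() if count == top}
    # Tie-break (assumed): return the first tied label in distance order.
    for _, label in nearest:
        if label in tied:
            return label
# Hypothetical data with three classes.
train = [
    ((0.0, 0.0), "cat"), ((0.0, 1.0), "cat"),
    ((5.0, 5.0), "dog"), ((5.0, 6.0), "dog"),
    ((10.0, 0.0), "bird"), ((10.0, 1.0), "bird"),
]
print(knn_classify(train, (0.5, 0.5), k=3))  # cat
print(knn_classify(train, (9.0, 0.5), k=3))  # bird
print(knn_classify(train, (7.5, 2.5), k=2))  # bird: 1-1 tie, the bird point is closer
```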
09:04.900 --> 09:13.330
It is also easy to add new data to the model, because it is a lazy learner, so it does not need any specific

09:13.330 --> 09:20.970
training. We can easily add more data to the model and then analyze new points accordingly.

09:23.660 --> 09:31.040
Now, let us discuss a few cons. First, it is computationally expensive, because

09:31.040 --> 09:38.450
it is a distance-based algorithm: for every prediction,

09:38.450 --> 09:45.560
it has to compute the distance from the new point to all the stored points and then find the nearest

09:45.770 --> 09:50.870
ones among all of them, and only then can it make its decision.

09:51.740 --> 09:55.060
That is why it is computationally expensive.

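NOTE A Python sketch of the cost of one brute-force prediction, assuming hypothetical random 2-D data. It counts distance computations: heapq.nsmallest avoids fully sorting the points, but every stored point still needs one distance, so each query costs work proportional to the number of training points.
```python
import heapq
import math
import random
def k_nearest(train, query, k):
    """Brute-force neighbor search that also counts distance computations."""
    calls = 0
    def dist(item):
        nonlocal calls
        calls += 1
        return math.dist(item[0], query)
    # nsmallest keeps only k candidates, but still evaluates every point once.
    nearest = heapq.nsmallest(k, train, key=dist)
    return nearest, calls
random.seed(0)
train = [((random.random(), random.random()), i) for i in range(10_000)]
nearest, calls = k_nearest(train, (0.5, 0.5), k=5)
print(calls)  # 10000: one distance per stored point, for a single prediction
```
Index structures such as k-d trees can reduce this in low dimensions, but the brute-force version shows the baseline cost.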
09:56.510 --> 10:04.130
Next, it has a high memory storage requirement, because it has to keep all of the training data to compute the distances,

10:04.430 --> 10:10.720
so it needs a lot of memory. Then, it is hard to work with categorical features.

10:10.910 --> 10:13.800
So let's say we have certain categorical features.

10:13.820 --> 10:15.650
With numerical features,

10:15.650 --> 10:18.580
values like age and amount,

10:18.890 --> 10:22.280
it is very easy to find the distances.

10:22.550 --> 10:25.250
But let's say we have categorical features.

10:25.520 --> 10:29.060
One is gender, and another is

10:29.270 --> 10:37.450
whether someone is married or not. Then it is very difficult to define

10:37.460 --> 10:39.950
the distance between two such values.

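NOTE A Python sketch of two common workarounds for categorical features, under hypothetical category names. Integer codes invent an order between categories, one-hot codes put every pair of distinct categories at the same distance, and for records with several categorical fields one can simply count mismatching fields.
```python
import math
STATUSES = ["single", "married", "divorced"]  # hypothetical category list
def as_integer(status):
    return (float(STATUSES.index(status)),)
def one_hot(status):
    return tuple(1.0 if s == status else 0.0 for s in STATUSES)
def mismatch_distance(a, b):
    """Simple matching distance: count the categorical fields that differ."""
    return sum(x != y for x, y in zip(a, b))
# Integer codes: "single" looks twice as far from "divorced" as from "married".
print(math.dist(as_integer("single"), as_integer("married")))   # 1.0
print(math.dist(as_integer("single"), as_integer("divorced")))  # 2.0
# One-hot codes: every pair of distinct categories is the same distance apart.
print(math.dist(one_hot("single"), one_hot("married")) == math.dist(one_hot("single"), one_hot("divorced")))  # True
# Hypothetical (gender, marital status) records that differ in one field.
print(mismatch_distance(("female", "married"), ("male", "married")))  # 1
```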
10:43.830 --> 10:50.100
Next limitation: a large number of features. With many features, it has

10:50.100 --> 10:54.370
to compute the distance based on all the features being considered.

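NOTE A Python sketch of one reason many features hurt KNN, assuming points drawn uniformly at random (hypothetical data). As the number of dimensions grows, the nearest and farthest points end up at almost the same distance, so "nearest" carries less information.
```python
import math
import random
def contrast(dim, n=200, seed=0):
    """Ratio of nearest to farthest distance from a random query to n random points."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(p, query) for p in points]
    return min(dists) / max(dists)
for dim in (2, 10, 100, 1000):
    # A ratio close to 1 means all points look about equally far away.
    print(dim, round(contrast(dim), 3))
```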
10:54.600 --> 11:00.660
So the distance is computed in a very high-dimensional space, which is why it can

11:00.660 --> 11:05.550
be difficult to work with such data.

11:07.080 --> 11:15.720
Also, the algorithm is very sensitive to the scaling of the data, and there might be

11:15.760 --> 11:19.650
irrelevant features which might skew the distances.

11:20.070 --> 11:30.450
So let's say we have features like age and amount, and we do not scale them properly. Then

11:30.780 --> 11:39.330
the distances might get biased towards the amount, and we might not really capture the changes

11:39.330 --> 11:42.250
in the age of the person.

11:42.480 --> 11:46.640
That is why it is very important to make sure that the data is scaled.

11:51.610 --> 11:54.430
So this is all about KNN.

11:55.850 --> 11:59.270
Next, we will learn how we can implement KNN.

12:00.560 --> 12:07.490
And always remember, KNN is a supervised learning algorithm. The algorithms we have

12:07.490 --> 12:10.150
learned so far are all supervised learning algorithms.

12:10.550 --> 12:17.030
There is another algorithm named K-means, which is an unsupervised learning algorithm, and which we will

12:17.030 --> 12:18.450
be learning very soon.

12:18.860 --> 12:26.490
People often get confused between K-nearest neighbors and the K-means algorithm.

12:26.660 --> 12:34.760
So we will discuss the comparison, and we will revisit it again and again while we are discussing

12:34.760 --> 12:39.850
K-means, so that there is no confusion between the two concepts.