WEBVTT

00:01.260 --> 00:01.580
Hi.

00:02.100 --> 00:08.570
Now that we have covered all the supervised learning algorithms and learned how we can actually create

00:08.600 --> 00:15.810
a pipeline out of them, let us have a look at the next wave of machine learning, which is called unsupervised

00:15.810 --> 00:16.160
learning.

00:16.620 --> 00:21.000
Till now, we have learned about supervised learning algorithms.

00:21.270 --> 00:26.780
Those algorithms allowed us to make predictions about the data.

00:26.820 --> 00:35.280
So in those models, we basically gave the input value and the output value to the machine to learn

00:35.280 --> 00:42.270
from, so that when new input data is given to it, it will be able to make predictions and

00:42.270 --> 00:44.360
find out the output values again.

00:44.850 --> 00:49.520
But in case of unsupervised learning, we don't need that output value.

00:49.530 --> 00:55.080
We don't need those values now because we don't want to predict anything.

00:55.350 --> 00:57.750
We just want to group things together.

00:57.780 --> 01:00.050
We just want to form clusters here.

01:00.390 --> 01:02.040
So let us have a look at this.

01:03.360 --> 01:10.420
So unsupervised learning means finding patterns in the data.

01:10.920 --> 01:20.280
Supervised learning needed output or target values to be provided, as it wanted to predict a specific

01:20.280 --> 01:26.730
value or label, such as cat or dog, prices, or what kind of fruit it is.

01:26.730 --> 01:28.540
Is it a mango or is it a banana?

01:28.680 --> 01:33.120
So it wanted those details because it actually had to predict that.

01:33.600 --> 01:42.390
So we used to give our entire weather data to the machine so that it can predict what the temperature will

01:42.390 --> 01:45.810
be tomorrow, or whether I should play or not.

01:46.050 --> 01:50.970
So these kinds of decisions we had to make; these kinds of output values we had to provide.

01:51.990 --> 01:59.490
But now unsupervised learning does not require any output values because it does not want to predict

01:59.520 --> 02:00.300
any labels.

02:00.750 --> 02:06.680
It only requires input values and it wants to group the items.

02:07.170 --> 02:10.900
So we will give several inputs here.

02:11.550 --> 02:19.140
And based on this input data, the machine will try to find different patterns and segregate these elements

02:19.140 --> 02:20.400
into different groups.

02:20.610 --> 02:27.220
It will just group them together and it will not really know if this is an apple or this is a banana.

02:27.360 --> 02:31.860
So it will not know what is present in the information.

02:32.070 --> 02:37.800
It will only know how to segregate these on the basis of the pattern which is present.

02:38.700 --> 02:41.530
Now, why do we need unsupervised learning?

02:41.970 --> 02:46.730
We need unsupervised learning to create more focused marketing campaigns.

02:47.160 --> 02:56.220
So let's say I have a new product coming out and I want to find out which people will actually be interested

02:56.220 --> 02:57.160
in the coffee product.

02:57.570 --> 03:02.460
So for that, I will have to find out different groups of people

03:02.460 --> 03:10.380
and their likings, and I will have to segregate people on the basis of their likings and based on their

03:10.380 --> 03:13.500
likings or based on their daily activities.

03:13.740 --> 03:20.100
I can say that these kinds of people will actually be interested in coffee, or these kinds of people will

03:20.100 --> 03:20.870
be interested in.

03:21.480 --> 03:30.780
So we will find different patterns and different characteristics about the people or regarding the

03:30.780 --> 03:34.270
product so that we can categorize them differently.

03:34.590 --> 03:35.280
So let's say

03:35.280 --> 03:37.880
I have some details about the products.

03:38.310 --> 03:38.820
Let us say

03:38.820 --> 03:46.440
I have some products, and I get to know whether something is in-store level or not, whether something

03:46.440 --> 03:48.570
is hardware or software,

03:49.520 --> 03:57.560
or how much memory space something needs. So based on all these details, I can actually classify, I can

03:57.560 --> 04:03.810
actually cluster things together into different types of products.

04:04.070 --> 04:11.290
I can cluster or group different types of products together and create bunches of them.

04:12.860 --> 04:21.510
So we can find clusters in data, find patterns, group similar items together, and do product categorization.

04:21.770 --> 04:24.100
And next thing is anomaly detection.

04:24.980 --> 04:30.120
What anomaly detection helps us achieve is this. Let us say

04:30.170 --> 04:36.390
we have daily transactions, and we have monthly transactions of a few thousand rupees.

04:36.860 --> 04:44.300
Now, if somehow there is one month when I make a transaction of one lakh rupees, then it creates

04:44.300 --> 04:46.130
a red flag for my bank.

04:46.280 --> 04:51.800
And my bank will immediately communicate to me that there is a very high amount of transaction which

04:51.800 --> 04:52.580
has been made.

04:53.270 --> 05:00.920
So this amount will actually be varying for every person, based on their normal transactions.

05:02.430 --> 05:11.880
Hence, anomaly detection is the process of finding out some different pattern, or something which is different

05:12.030 --> 05:18.560
from the normal behavior of someone's data. Next is dimensionality reduction.

05:18.780 --> 05:24.440
We have discussed feature selection in the case of supervised learning.

05:24.720 --> 05:33.120
We have been working on feature selection by different methods like using feature importance by removing

05:33.120 --> 05:35.700
columns based on different ideas.

05:36.000 --> 05:40.440
But now there is the need of dimensionality reduction.

05:40.440 --> 05:46.420
Let's say we are not able to remove columns based on all of the methods which we have known so far.

05:46.460 --> 05:55.000
Now we can use dimensionality reduction to reduce the size of data to a very large extent.

05:55.290 --> 06:00.570
So how we will do that, we will be learning in unsupervised learning.

06:02.080 --> 06:10.300
Now, let us see a few important things which we need to be aware of while working on unsupervised learning

06:10.300 --> 06:10.860
problems.

06:11.200 --> 06:20.140
So the first thing which we need to consider is scaling of data. Scaling of data is also required in

06:20.140 --> 06:20.980
case of

06:21.960 --> 06:31.680
KNN and SVM, because there again, the information is being dealt with in terms of distance, so

06:31.680 --> 06:41.610
it is important to scale the data in KNN and SVM and in all the distance-based algorithms which are

06:41.610 --> 06:44.280
majorly present in the unsupervised learning.

06:45.810 --> 06:49.390
So why do we actually need scaling of data?

06:50.380 --> 06:58.510
Now, let us say the data is normally or uniformly distributed; then standardization is a suitable

06:58.510 --> 06:58.900
method.

06:59.110 --> 07:00.950
Now let us see what happens here.

07:01.120 --> 07:04.730
So let us say this is un-normalized house data.

07:05.170 --> 07:12.120
So here we have data about the years old, that is, how many years old the house is.

07:12.760 --> 07:19.840
And here we have details about the number of rooms. Now, based on the data which is present here,

07:20.080 --> 07:23.060
how can you say that the data is distributed?

07:23.500 --> 07:30.880
We can simply say that the data has a horizontal distribution; the points lie along a horizontal line.

07:32.460 --> 07:42.240
Now here, if we want to find out the distance, or we want to see how different

07:42.300 --> 07:50.070
numbers of rooms are impacting, or how the years are impacting some particular value, then it will be very

07:50.070 --> 07:56.760
difficult to find this based on the number of rooms, because there is no change in the number of rooms.

07:57.000 --> 08:00.800
The change in the number of rooms is not prominent enough.

08:01.410 --> 08:07.140
But if we normalize the data, this is the normalized data here.

08:07.140 --> 08:10.550
The range of the data has been changed.

08:10.560 --> 08:17.670
Earlier, the range for the number of rooms was zero to hundred, and the years old was zero to hundred.

08:19.020 --> 08:26.970
Now, here, the values of the houses have been changed, and the number of rooms now ranges from zero

08:26.970 --> 08:33.300
to one, and the years old ranges from zero point zero to zero point four.

08:33.600 --> 08:37.250
Now you can see how the data is really lying.

08:37.530 --> 08:43.620
Now we can actually find out two clusters that this type of data is different and this type of data

08:43.620 --> 08:44.110
is different.

08:44.430 --> 08:48.960
Now, if I want to find out where this point actually lies.

08:50.300 --> 08:55.760
So this point is very far away from these points, but it is.

08:56.970 --> 09:02.640
There is no impact of the number of rooms; we cannot see anything which is coming out of the number of

09:02.640 --> 09:05.840
rooms here if we want to find out any distances.

09:06.250 --> 09:12.260
If I want to learn something about this data on the basis of the number of rooms, it is very difficult

09:12.280 --> 09:12.590
here.

09:12.900 --> 09:19.590
But if I want to have a look at this particular data on the basis of the number of rooms, I can easily

09:19.590 --> 09:22.740
see that for this one, the number of rooms is zero point five.

09:23.220 --> 09:26.880
So I can easily understand this based on this normalized data.

09:27.030 --> 09:35.380
And the same impact comes into the picture when we are dealing with different machine learning algorithms.

09:35.670 --> 09:44.280
So what will happen is if data is not normalized, then one particular column will have all the impact

09:44.280 --> 09:53.010
on the decision which would be made, and the column which has a very small range will really get diminished.

09:53.220 --> 09:55.410
The impact of it will be lost.
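
NOTE
A minimal sketch of the point above, assuming Euclidean distance and made-up house rows of (years_old, number_of_rooms). The names euclidean, min_max_columns and houses are illustrative only and do not come from the lecture.
```python
import math
def euclidean(a, b):
    """Straight-line distance between two rows of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
def min_max_columns(rows):
    """Rescale every column to [0, 1]; assumes no column is constant."""
    cols = list(zip(*rows))
    scaled_cols = [[(x - min(c)) / (max(c) - min(c)) for x in c] for c in cols]
    return [tuple(r) for r in zip(*scaled_cols)]
# Hypothetical houses: (years_old, number_of_rooms)
houses = [(10, 2), (12, 8), (90, 2)]
raw_ab = euclidean(houses[0], houses[1])  # differ mainly in rooms
raw_ac = euclidean(houses[0], houses[2])  # differ only in years
scaled = min_max_columns(houses)
scaled_ab = euclidean(scaled[0], scaled[1])
scaled_ac = euclidean(scaled[0], scaled[2])
print(raw_ab, raw_ac)        # the years column dominates the raw distances
print(scaled_ab, scaled_ac)  # after scaling, both columns carry similar weight
```
Before scaling, the large-range years column decides which houses look far apart; after scaling, a big difference in rooms counts about as much as a big difference in years.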

09:56.410 --> 10:04.060
So that is why it is very important to scale the data. Now, in case the data is normally distributed,

10:04.270 --> 10:07.260
then we will follow the standardization procedure.

10:07.480 --> 10:13.290
And if the data is not normally distributed, then we will go for normalization of the data.

10:15.320 --> 10:21.320
Now you can see the changes which normalization and standardization will bring.

10:21.650 --> 10:30.080
So if this is the actual data, then normalization will try to bring it between zero and one, bring

10:30.080 --> 10:38.420
the data into the range of zero to one on both the X and Y axes, while in the case of standardization, it will

10:38.420 --> 10:43.640
try to rescale the data to zero mean and unit standard deviation, so that one standard deviation

10:44.580 --> 10:48.130
lies between minus one and plus one on the axes.

10:48.450 --> 10:52.410
So this is the difference between normalization and standardization.

10:55.120 --> 11:01.360
So you can simply have a look at the distribution of the data and based on the distribution of data,

11:01.360 --> 11:06.490
you can decide if you want to do standardization of the data or normalization of the data.

11:06.940 --> 11:09.360
So here are the formulas for that.

11:09.370 --> 11:18.550
If you want to do normalization, then it can be done by X minus X min, divided by X max

11:18.550 --> 11:19.870
minus X min.

11:19.870 --> 11:23.620
Standardization will be done by X minus mu, divided by sigma.

11:25.310 --> 11:29.660
Here, mu is the mean of the data, and sigma is the standard deviation.
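
NOTE
The two formulas above can be sketched in plain Python as follows. This is only an illustration: the function names and the years_old values are made up, and using the population standard deviation (pstdev) for sigma is an assumption, since the lecture does not say which variant is meant.
```python
from statistics import mean, pstdev
def min_max_normalize(values):
    """Normalization: (x - x_min) / (x_max - x_min), giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]
def standardize(values):
    """Standardization: (x - mu) / sigma, giving zero mean and unit std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]
years_old = [5, 20, 40, 80, 100]  # hypothetical house ages
print(min_max_normalize(years_old))
print(standardize(years_old))
```
Note that standardized values are centered on zero but are not limited to the range minus one to plus one; points more than one standard deviation from the mean fall outside it.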

11:32.630 --> 11:38.900
Now, here you can see how that changes. So here we have the original data.

11:40.120 --> 11:47.620
Here we have different columns with different ranges, so when we normalize the data, you can see

11:48.190 --> 11:51.760
all the data points have been brought to the same range.

11:51.910 --> 11:56.020
So now the impact will not get diminished.

11:56.030 --> 12:02.770
So if we do normalization of this data, then the impact of the first, second and third column, which

12:02.770 --> 12:05.500
was getting diminished, will not be diminished.

12:06.370 --> 12:10.210
So this is how normalization and standardization help us.

12:13.210 --> 12:21.190
Now, next, we need to know what clustering is. Clustering is a process to create groups based on

12:21.190 --> 12:22.530
similarity measure.

12:23.150 --> 12:25.360
OK, so what is similarity measure?

12:25.570 --> 12:31.850
A similarity measure is the criterion on the basis of which the items in a group are similar.

12:32.110 --> 12:38.890
So let's say we have some fruits and vegetables and we want to find out the similarity measure.

12:39.100 --> 12:47.810
The similarity measure would be the taste of the item which we have or if it is a juicy fruit or vegetable.

12:47.980 --> 12:49.830
So there are different types.

12:49.960 --> 12:51.730
So based on that, we can decide.

12:52.030 --> 12:59.050
So let's say we are deciding on the basis of taste, then apples and potatoes will be brought into the

12:59.050 --> 13:01.270
same category because that is just using.

13:03.130 --> 13:09.850
So the principle of maximization of intra-cluster similarity and minimization of inter-cluster similarity

13:09.850 --> 13:11.770
is the only principle here.

13:11.980 --> 13:16.400
So what we want to do is group items together.

13:16.420 --> 13:18.490
So how will we group them together?

13:18.790 --> 13:28.240
We will try to maximize the distance between two different clusters and minimize the distance between

13:28.240 --> 13:31.780
the points or data points which are belonging to the same cluster.

13:34.390 --> 13:39.170
Now, here we have some details of how we can group the data.

13:39.400 --> 13:46.570
So let's say we have these data points so you can easily see that for these data points.

13:46.570 --> 13:48.110
There are two groups present.

13:48.430 --> 13:52.550
One is this particular group and another is this particular group.

13:52.960 --> 13:58.480
So here we can easily group this data into two groups, one and two.

13:58.690 --> 14:04.840
But now let us say we want to have a greater number of groups; then what will happen?

14:04.870 --> 14:08.620
These are the two farthest groups so we can create these two groups.

14:08.860 --> 14:12.730
Now, let's say I want to create four clusters out of this data.

14:13.600 --> 14:17.470
So to create four clusters out of the data,

14:17.740 --> 14:23.410
the farthest ones will again be subdivided; they will belong to two different clusters.

14:23.710 --> 14:27.270
Now, these points belong to a similar area.

14:27.280 --> 14:29.170
So these will be clustered together.

14:30.940 --> 14:38.020
And these three points are a little separate from these points, so they will be considered in a different

14:38.020 --> 14:38.420
class.

14:38.710 --> 14:42.970
Similarly, these three points are a little far away from these two.

14:43.120 --> 14:45.520
So these will be considered in another class.

14:45.760 --> 14:53.920
Now, let us say we want to include a greater number of clusters; then these points also seem to be different

14:53.920 --> 14:54.790
from each other.

14:54.830 --> 14:56.970
They still have a little distance between them.

14:57.280 --> 15:02.680
So we can increase the number of clusters; we can create six clusters like this.

15:05.480 --> 15:12.440
And if we start to find out a greater number of clusters,

15:12.740 --> 15:19.910
then it will kind of create clusters which don't really exist, because by now it seems like the data

15:19.910 --> 15:21.440
has been clustered properly.

15:23.190 --> 15:27.080
Now, one more thing is here, if you see this particular data.

15:28.200 --> 15:35.700
Here, when we see this particular data, let us say this data actually was in the form of

15:37.090 --> 15:37.900
this data.

15:39.350 --> 15:45.970
Then think about it: would you be able to find the clusters?

15:46.310 --> 15:54.860
It would have been very difficult to find out clusters from this kind of data, while it is very simple

15:54.860 --> 16:00.650
to find out clusters using this kind of data, because now the data is present in a similar scale.

16:01.430 --> 16:05.900
Now the data has been either standardized or normalized.

16:06.020 --> 16:13.130
So the distance is clearly visible here. Earlier, the distance was not clearly visible because the scales

16:13.130 --> 16:16.730
were so different, the points were overlapping.

16:16.880 --> 16:21.300
So no clustering could have been possible in this particular situation.

16:21.800 --> 16:26.080
That is why normalization or standardization is so important.

16:28.270 --> 16:29.560
Now, you can see here.

16:31.700 --> 16:40.490
We have these three clusters created, so the distance between this and this line is called the inter-cluster

16:40.520 --> 16:47.750
distance, that is, the distance between two different clusters, and the distance between points in the same

16:47.750 --> 16:50.710
cluster is called the intra-cluster distance.

16:53.300 --> 17:00.740
And what is the definition of clustering? The principle of maximization of the intra-cluster similarity

17:00.950 --> 17:08.360
and minimization of the inter-cluster similarity; that is, we want to minimize this particular distance

17:08.540 --> 17:10.910
and maximize this distance.

17:11.980 --> 17:16.480
That is exactly what we have done here by creating the clusters.
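
NOTE
A small sketch of the two distances just described. As an assumption for illustration, intra-cluster distance is taken as the mean pairwise distance inside one cluster and inter-cluster distance as the distance between the two centroids; other definitions exist. The cluster points are made up.
```python
import math
from itertools import combinations
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))
def intra_cluster_distance(points):
    """Mean distance over all pairs of points in one cluster (needs 2+ points)."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)
def inter_cluster_distance(a, b):
    """Distance between the centroids of two clusters."""
    return math.dist(centroid(a), centroid(b))
cluster_a = [(0, 0), (0, 1), (1, 0)]        # hypothetical points
cluster_b = [(10, 10), (10, 11), (11, 10)]  # hypothetical points
intra_a = intra_cluster_distance(cluster_a)
inter_ab = inter_cluster_distance(cluster_a, cluster_b)
print(intra_a, inter_ab)
```
A good clustering in the sense used here keeps the first number small and the second number large.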

17:21.810 --> 17:29.910
Now, let us have a look at the different types of distance measures. One is single link.

17:31.730 --> 17:40.520
Single link is to find out distance between the closest points when we find out the distance between

17:40.520 --> 17:44.900
the closest points, it is called single link distance.

17:45.740 --> 17:52.940
If we want to find out the distance between the farthest points, then it is called complete link.

17:54.710 --> 18:00.590
In this, we are finding out the distance between the farthest points, the points which are

18:01.070 --> 18:02.660
farthest from each other.

18:03.860 --> 18:12.380
Average link is when we want to find out the average of the distances between all the pairs.

18:12.620 --> 18:19.240
So what we will do is basically consider all the pairs.

18:19.250 --> 18:24.890
So I will pair this with this, also this with this, also this with this.

18:25.230 --> 18:30.140
So I will create different pairs and then take the average of all the

18:31.360 --> 18:40.210
distances present. Centroid distance is when I will find out the distance from the centroid, like this. It is called

18:40.220 --> 18:45.700
centroid distance, so I will create the centroid and find the distance from the centroid.
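
NOTE
The four distance types above (single link, complete link, average link, centroid) can be sketched as below, assuming Euclidean distance between points. The function names and the two example clusters are hypothetical and chosen only to make the arithmetic easy to follow.
```python
import math
from itertools import product
def single_link(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(math.dist(p, q) for p, q in product(a, b))
def complete_link(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(math.dist(p, q) for p, q in product(a, b))
def average_link(a, b):
    """Average distance over all cross-cluster pairs."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))
def centroid_link(a, b):
    """Distance between the centroids (coordinate-wise means) of the clusters."""
    ca = tuple(sum(c) / len(a) for c in zip(*a))
    cb = tuple(sum(c) / len(b) for c in zip(*b))
    return math.dist(ca, cb)
a = [(0, 0), (1, 0)]  # hypothetical cluster
b = [(3, 0), (5, 0)]  # hypothetical cluster
print(single_link(a, b), complete_link(a, b), average_link(a, b), centroid_link(a, b))
```
For these two clusters the closest pair is 2 apart, the farthest pair is 5 apart, and the cross-cluster distances 3, 5, 2 and 4 average to 3.5.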

18:47.710 --> 18:54.610
This is about unsupervised learning. I will be discussing different unsupervised learning algorithms

18:54.790 --> 18:56.140
from the next session.