WEBVTT

00:01.030 --> 00:05.990
Hi, in this session, we will be implementing hierarchical clustering.

00:06.340 --> 00:13.840
So let us begin. The first thing we will do is import the different libraries, such as import

00:14.780 --> 00:17.310
pandas as pd.

00:20.570 --> 00:28.760
numpy as np, and then we will import matplotlib

00:31.350 --> 00:32.580
dot pyplot

00:35.120 --> 00:42.260
as plt. Next, we will import seaborn.

01:03.080 --> 01:06.170
So let us move forward from this.

01:08.900 --> 01:13.900
We don't really need this one. Let us import from

01:17.190 --> 01:18.420
sklearn

01:19.620 --> 01:20.350
dot

01:21.500 --> 01:23.330
preprocessing

01:27.340 --> 01:28.320
import

01:29.890 --> 01:35.230
StandardScaler. This particular class will help us scale the data.

01:37.240 --> 01:45.160
Next, from sklearn.cluster, we will import

01:48.000 --> 01:50.190
AgglomerativeClustering.

02:01.670 --> 02:07.760
And lastly, we will import silhouette_score, which is a metric for evaluation.
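
NOTE
A sketch of the import cell as described, assuming the conventional aliases `np`, `pd`, and `plt`. The video also runs `import seaborn as sns`, which is only needed for the final plot. It is left out here so the cell runs without seaborn installed.
```python
# Core data-handling and plotting libraries (aliases are the usual conventions)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# scikit-learn pieces named in the video
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
```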

02:28.960 --> 02:30.790
Let me run this.

02:40.410 --> 02:42.090
sklearn.metrics.

02:45.470 --> 02:53.210
So after this, we will be importing the Iris dataset, which is the dataset we will be using

02:53.210 --> 02:55.670
for this particular tutorial.

02:56.030 --> 02:59.780
So let me define the path.

03:02.150 --> 03:06.140
It is equal to the filename, that is, Iris

03:07.260 --> 03:08.490
.csv.

03:10.020 --> 03:19.350
Next, I will create the data frame: iris is equal to pd.read

03:21.210 --> 03:22.080
_csv,

03:25.030 --> 03:26.970
and inside this, I give the path.

03:27.610 --> 03:36.520
So it should be read like this, and I will call iris.head() so that you can see the data.

03:37.120 --> 03:38.980
So this is the data that we have.

03:39.520 --> 03:42.100
So this data contains Id,

03:43.510 --> 03:51.400
sepal length, sepal width, petal length, petal width, and the species of the

03:52.540 --> 04:00.460
flower. Now, because we don't want to classify anything and are just implementing the

04:00.460 --> 04:01.690
clustering algorithm,

04:02.180 --> 04:09.520
for a clustering algorithm we don't need the target values, and we also don't need this Id column because it

04:09.520 --> 04:11.230
is a completely redundant column.

04:11.240 --> 04:13.090
So I will drop these columns.

04:13.630 --> 04:18.640
So I simply say iris dot drop.

04:23.680 --> 04:30.730
And in this, I will give a list of the columns which I want to drop, that is, Id,

04:33.210 --> 04:38.040
and the other one is Species.

04:41.720 --> 04:51.800
Next, I will give the axis, which is 1, and inplace equal to True.
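
NOTE
A minimal sketch of the loading and dropping steps. The video reads a local `Iris.csv` with `pd.read_csv`. That file is not included here, so this sketch rebuilds an equivalent frame from scikit-learn's bundled copy of the dataset (`load_iris`). The column names are assumed to follow the common Kaggle layout of that CSV, and the `Id` values are synthetic.
```python
import pandas as pd
from sklearn.datasets import load_iris
# Stand-in for: iris = pd.read_csv("Iris.csv")  (column names are assumptions)
raw = load_iris()
iris = pd.DataFrame(
    raw.data,
    columns=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"],
)
iris.insert(0, "Id", list(range(1, len(iris) + 1)))  # synthetic 1-based Id column
iris["Species"] = raw.target_names[raw.target]       # species name for each row
# Clustering is unsupervised: drop the redundant Id and the target Species
iris.drop(["Id", "Species"], axis=1, inplace=True)
print(iris.head())
```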

04:53.550 --> 04:58.860
Next, I will print the dataset for you, so I'll say iris.head().

05:00.350 --> 05:09.350
And this is the dataset which we have now. Now let us create clusters out of the first two columns.

05:10.350 --> 05:11.190
So.

05:12.310 --> 05:19.750
For this, first of all, let us create the dendrogram, so that we can actually visualize

05:19.750 --> 05:23.010
how hierarchical clustering works.

05:23.560 --> 05:29.800
So we will import the library from SciPy.

05:29.810 --> 05:35.380
So we will say from scipy.cluster

05:36.830 --> 05:37.520
dot

05:38.880 --> 05:39.880
hierarchy

05:41.210 --> 05:44.420
import

05:46.570 --> 05:47.320
den-

05:49.250 --> 05:50.270
dro-

05:51.590 --> 05:52.130
gram,

05:53.490 --> 06:04.120
and we also want linkage from it. So we will import these, and now we will need to scale this particular

06:04.120 --> 06:07.210
data too. So how will we scale the data?

06:07.210 --> 06:11.430
To scale this data, we have already imported this particular class.

06:13.950 --> 06:17.130
So we will simply say iris

06:18.790 --> 06:20.860
_std

06:21.930 --> 06:25.650
is equal to pd.DataFrame(

06:28.350 --> 06:29.280
and then

06:30.550 --> 06:31.240
we give

06:34.270 --> 06:41.460
StandardScaler().fit_transform. So we scale the data with this scaler, and inside it we give the dataset, so iris,

06:41.710 --> 06:43.540
so this will simply scale the data.

06:43.580 --> 06:46.630
Now I will have to give the columns.

06:46.630 --> 06:58.600
So the argument will be columns equal to... I can put this into a list, so I give list(iris.columns

06:58.600 --> 06:58.900
).

07:00.600 --> 07:06.870
So iris.columns gives me the columns, and I'm converting them into list form.

07:07.770 --> 07:10.370
Next, let me display the data for you.

07:10.380 --> 07:16.820
So I say iris, sorry, iris_std

07:18.150 --> 07:24.730
.head(), so that we can see what exactly is present in this data.

07:25.320 --> 07:26.250
So this is the

07:27.580 --> 07:29.590
data which has been scaled now.
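
NOTE
A sketch of the scaling step, assuming the four feature columns left after the drop (again rebuilt from `load_iris`, with assumed column names). `StandardScaler` rescales each column to mean 0 and standard deviation 1. `fit_transform` returns a NumPy array, which is why it is wrapped back in `pd.DataFrame` with the original column names.
```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
# Four numeric feature columns, as left after dropping Id and Species
iris = load_iris(as_frame=True).data
iris.columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
# fit_transform returns a NumPy array, so wrap it back into a labelled DataFrame
iris_std = pd.DataFrame(StandardScaler().fit_transform(iris), columns=list(iris.columns))
print(iris_std.head())
```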

07:34.890 --> 07:42.110
Next, let us create a dendrogram out of it. So to create a dendrogram, let us first create

07:42.120 --> 07:42.680
the figure for it.

07:42.690 --> 07:45.750
So we will say plt.figure(

07:46.840 --> 07:53.110
and inside this, we will give the figsize. The figsize is,

07:55.270 --> 07:57.250
let us say, (10, 5).

07:58.770 --> 08:01.560
And we want to have plt

08:02.640 --> 08:05.120
.xlabel.

08:06.660 --> 08:09.090
So in the xlabel, I want to give

08:13.100 --> 08:13.960
'Samples',

08:17.400 --> 08:21.400
or simply say 'Index' for simplicity.

08:22.140 --> 08:23.970
And next, we will have.

08:26.810 --> 08:28.700
Or just a simple index.

08:32.540 --> 08:35.850
Next, we will have the ylabel.

08:41.700 --> 08:44.610
So in the ylabel, let us have 'Distance'.

08:47.360 --> 08:49.220
Next, we will create.

08:50.100 --> 08:51.180
the linkage.

08:52.870 --> 09:00.610
So this will create the linkage for us. We have given linkage the dataset and the method which

09:00.610 --> 09:08.590
we want to create the linkage with, and then in dendrogram, we want to plot the linkage which we have

09:08.590 --> 09:16.340
created and provide the rotation and other display details for the leaf labels.
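
NOTE
A sketch of the dendrogram cell. The video does not say which linkage method it passes, so Ward linkage is assumed here, and the `leaf_rotation`/`leaf_font_size` values are illustrative. The Agg backend line only lets the sketch run without a display. Drop it in a notebook.
```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
X = StandardScaler().fit_transform(load_iris().data)
# Each row of Z records one merge: (cluster a, cluster b, merge distance, new cluster size)
Z = linkage(X, method="ward")  # Ward linkage is an assumption
plt.figure(figsize=(10, 5))
plt.xlabel("Index")
plt.ylabel("Distance")
dendrogram(Z, leaf_rotation=90, leaf_font_size=8)
plt.show()
```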

09:17.110 --> 09:18.670
So we will just plot this.

09:19.960 --> 09:21.730
It will take a little time.

09:22.880 --> 09:28.040
And this is the dendrogram which has been created. So here you can see the

09:29.450 --> 09:35.600
nodes of the clusters, the values which are present, and the indexes.

09:36.570 --> 09:41.670
So here you can see that this is the x-axis, and this shows the index.

09:46.610 --> 09:52.240
You can see that these are the clusters which have been created. So if we want to have two clusters,

09:52.240 --> 09:55.980
we can simply cut the tree at a distance equal to 15,

09:56.320 --> 09:59.680
and these would be the two clusters which we get.

09:59.890 --> 10:06.580
If we want three clusters, then we can cut it at this point, and then we will have one, two, and

10:06.580 --> 10:07.360
three clusters,

10:07.360 --> 10:09.760
that is, green, red, and blue.
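
NOTE
Cutting the tree at a chosen height can also be done in code with SciPy's `fcluster`, instead of reading the plot. This sketch assumes the same Ward linkage on all four standardized features. The height of 15 mentioned above depends on the linkage used, so the first call asks for a fixed number of clusters instead.
```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
X = StandardScaler().fit_transform(load_iris().data)
Z = linkage(X, method="ward")  # Ward linkage is an assumption
# Cut so that at most 3 flat clusters remain (fcluster labels start at 1)
labels_k3 = fcluster(Z, t=3, criterion="maxclust")
# Alternatively, cut at a fixed merge distance, like reading the y-axis of the plot
labels_h = fcluster(Z, t=15, criterion="distance")
print(np.bincount(labels_k3)[1:], len(np.unique(labels_h)))
```
With `criterion="distance"`, every merge above height `t` is undone. This is the programmatic version of drawing a horizontal line across the dendrogram.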

10:11.760 --> 10:18.180
These have been highlighted, so these are the actual clusters which are present in this dataset,

10:18.690 --> 10:21.610
as depicted here.

10:22.290 --> 10:28.590
So now we will go ahead and build the clusters.

10:28.590 --> 10:35.310
For that, we will again take a subset of the data so that we can visualize it more clearly.

10:35.340 --> 10:39.510
So for that, let's take only these two columns.

10:40.970 --> 10:42.320
So we will say

10:43.580 --> 10:44.510
iris

10:45.630 --> 10:47.760
is equal to iris.

11:09.680 --> 11:13.580
We have updated the dataset. Now we can look at its

11:14.990 --> 11:16.070
.describe().

11:20.780 --> 11:29.390
So here you can see the values now. We could scale this data again, or we could have simply done this

11:29.390 --> 11:31.410
from the standardized data.

11:31.430 --> 11:33.380
So let us do it with the standardized data.

11:42.430 --> 11:47.920
So here you can see we have simply used the standardized data which we had created earlier.

11:51.210 --> 11:59.280
Now we will build the clusters. For creating the clusters, we will evaluate

12:00.000 --> 12:03.510
several different numbers of clusters, so

12:04.420 --> 12:13.510
we create this particular for loop, in which we iterate over numbers of clusters ranging from 2 to 19, and

12:13.660 --> 12:16.460
we have created the clustering model object.

12:16.900 --> 12:22.870
This model object is an AgglomerativeClustering object, where we have provided

12:22.870 --> 12:30.970
the number of clusters, the affinity, which is the distance metric which we have used,

12:31.330 --> 12:34.120
and the type of linkage which we have decided on.

12:35.250 --> 12:44.550
And we have provided the data: we fit the model and predict the labels using the standardized

12:44.550 --> 12:47.850
data, which has only these two columns for now.

12:48.890 --> 12:52.190
And we have gathered the labels in cluster_labels.

12:54.170 --> 13:00.230
After getting these values, we have calculated the silhouette score

13:01.630 --> 13:02.650
using the

13:03.650 --> 13:08.390
standardized dataset and the labels which have been generated.

13:09.680 --> 13:18.560
Next, we print the score for each number of clusters, so when we run this, it will

13:18.560 --> 13:20.580
give us the score for every number of clusters.

13:20.840 --> 13:28.640
Now, we already know that the score which is closest to one is the best silhouette score.
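
NOTE
Here is why values near 1 are best. For a point i, let a(i) be its mean distance to the other points in its own cluster, and b(i) its mean distance to the points of the nearest other cluster. Then s(i) = (b(i) - a(i)) / max(a(i), b(i)), and the silhouette score is the mean of s(i) over all points, so it lies between -1 and 1. This small NumPy sketch computes it by hand on made-up toy data and compares the result with scikit-learn's `silhouette_score`. The helper name is hypothetical.
```python
import numpy as np
from sklearn.metrics import silhouette_score
def silhouette_by_hand(X, labels):
    """Mean silhouette; assumes every cluster has at least two points."""
    # Full pairwise Euclidean distance matrix
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from a(i)
        a = D[i, same].mean()
        # b(i): smallest mean distance to any other cluster
        b = min(D[i, labels == other].mean() for other in set(labels) if other != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
print(silhouette_by_hand(X, labels), silhouette_score(X, labels))
```
Two tight, well-separated groups give a score close to 1, as expected from the formula.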

13:28.910 --> 13:34.380
And unfortunately, here the score is only around 0.4.

13:34.850 --> 13:39.920
So we will select this particular silhouette score, which appears to be the maximum one

13:40.980 --> 13:48.420
out of all the silhouette scores which we have, so we will select n equal to three.
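
NOTE
A sketch of the loop described above, assuming Ward linkage and the first two standardized columns (sepal length and width in `load_iris`). The video passes `affinity=`. scikit-learn 1.2 renamed that parameter to `metric` and later releases removed `affinity`, so this sketch simply relies on the default Euclidean distance. Its printed scores are not guaranteed to match the ones in the video.
```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler
# First two features (sepal length and width), standardized column by column
X2 = StandardScaler().fit_transform(load_iris().data[:, :2])
scores = {}
for n_clusters in range(2, 20):  # 2 to 19 clusters, as in the video
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    cluster_labels = model.fit_predict(X2)
    scores[n_clusters] = silhouette_score(X2, cluster_labels)
    print(n_clusters, round(scores[n_clusters], 3))
best_k = max(scores, key=scores.get)
print("best number of clusters:", best_k)
```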

13:49.330 --> 13:54.460
So let us actually implement this for three clusters.

13:56.100 --> 13:57.980
So we will copy this.

14:09.340 --> 14:15.820
In this, we will put three, which is the best number of clusters which we have found.

14:18.930 --> 14:22.470
So we can see the silhouette average value for this,

14:23.680 --> 14:28.320
which comes out to be 0.40, which is exactly the value we got earlier.

14:29.590 --> 14:33.620
So you can see that even after rerunning this particular method

14:33.640 --> 14:39.040
again, it did not change the silhouette value; it came out to be the same.

14:39.520 --> 14:49.330
That is because agglomerative clustering, or we can say hierarchical clustering, always gives the

14:49.330 --> 14:53.110
same results, no matter how many times we run it,

14:54.160 --> 15:00.880
because it does not depend on any random initialization; it depends only on the distances between the

15:00.880 --> 15:02.500
points, which do not change.
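
NOTE
The determinism claim is easy to check by fitting the same model twice and comparing the labels. This sketch assumes the same Ward setup on the first two standardized columns. Note that `AgglomerativeClustering` takes no `random_state` parameter at all.
```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler
X2 = StandardScaler().fit_transform(load_iris().data[:, :2])
# The merge order depends only on the pairwise distances, so repeated fits agree
run1 = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X2)
run2 = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X2)
print(np.array_equal(run1, run2), silhouette_score(X2, run1))
```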

15:05.320 --> 15:10.870
Next, we will create a plot.

15:12.530 --> 15:16.160
So to create this particular plot, let us

15:17.790 --> 15:25.080
get the labels for this. So these are the labels which we have.

15:26.720 --> 15:34.430
And let us add these labels to the dataset which we already have. So we will say iris

15:35.600 --> 15:41.750
underscore std, and we will say label,

15:42.790 --> 15:48.010
and in label, we will put the values from the cluster labels.
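
NOTE
A sketch of attaching the labels to the standardized frame so they can drive the plot's colors. The column names and the new `label` column name are assumptions, and Ward linkage on the first two columns is carried over from the earlier steps.
```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
cols = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]  # assumed names
iris_std = pd.DataFrame(StandardScaler().fit_transform(load_iris().data), columns=cols)
# Cluster on the first two columns only, as in the video
cluster_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(iris_std[cols[:2]])
# Store the labels as a new column so they can be used as the plot's hue
iris_std["label"] = cluster_labels
print(iris_std["label"].value_counts())
```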

15:51.670 --> 15:55.790
Next, we will plot it.

15:56.620 --> 16:01.990
So for that, we will say sns.lmplot.

16:03.180 --> 16:09.770
And inside this, we will give the details. We don't want to fit a regression line, so we will say

16:09.940 --> 16:10.470
fit_reg

16:12.280 --> 16:13.330
equal to

16:14.430 --> 16:15.450
False.

16:19.570 --> 16:20.950
Next, we will

16:22.600 --> 16:30.600
provide the x and y values, so the x and y values will be the columns which we have here.

16:34.740 --> 16:42.660
So these are the x and y values: x is equal to this, and this is the y one.

16:44.610 --> 16:45.750
Next, we will

16:46.900 --> 16:49.990
provide the dataset, so data is

16:50.920 --> 16:51.970
iris.

16:54.140 --> 16:55.370
And for the...

16:57.580 --> 17:00.820
Now, this, again, has to be iris_std.

17:02.450 --> 17:05.990
And this also has to be the standard data.

17:07.630 --> 17:15.730
Next, we will select the hue, so we will give hue as the cluster labels which

17:15.730 --> 17:16.240
we have.

17:21.970 --> 17:24.250
Or we can simply give this one too.

17:29.910 --> 17:32.420
So let's create this plot.

17:34.280 --> 17:42.140
So this is the plot which has been generated, and here you can see that we have got three clear clusters

17:42.140 --> 17:42.800
created.
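
NOTE
The video draws this plot with seaborn's `lmplot` (`fit_reg=False`, `hue` set to the cluster labels). As a dependency-light alternative, this sketch draws the same kind of picture with plain matplotlib, using one scatter call per cluster. The column names and Ward linkage are assumptions carried over from the earlier steps, and the Agg backend line can be dropped in a notebook.
```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
cols = ["SepalLengthCm", "SepalWidthCm"]  # assumed names for the first two features
iris_std = pd.DataFrame(StandardScaler().fit_transform(load_iris().data[:, :2]), columns=cols)
iris_std["label"] = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(iris_std)
fig, ax = plt.subplots(figsize=(6, 5))
for label, group in iris_std.groupby("label"):
    # One color per cluster, playing the role of seaborn's hue
    ax.scatter(group[cols[0]], group[cols[1]], label=f"cluster {label}")
ax.set_xlabel(cols[0])
ax.set_ylabel(cols[1])
ax.legend()
plt.show()
```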

17:45.350 --> 17:54.020
So this is how we can use and implement hierarchical clustering for generating clusters for any type of

17:54.020 --> 17:54.630
dataset.

17:55.220 --> 18:01.980
Now we can use a different number of variables here.

18:02.000 --> 18:07.770
We have used only two because it is easy to visualize these two variables.

18:08.270 --> 18:15.500
But we could have used all four of these, and it would have created clusters in the same way.

18:16.310 --> 18:20.300
But the main drawback here is that we cannot really apply this,

18:20.690 --> 18:28.970
or I should say it is not advisable to apply this, to a large dataset, because then it would take

18:28.970 --> 18:33.020
a lot of time and memory to compute the distances and build all the clusters.
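
NOTE
A back-of-the-envelope illustration of the cost: common agglomerative implementations, SciPy's `linkage` included, start from all n(n - 1)/2 pairwise distances. At 8 bytes per float64 distance, this hypothetical helper shows how fast that grows.
```python
def pairwise_distance_cost(n_points: int):
    """Number of pairwise distances and their size in GB as float64 (8 bytes each)."""
    pairs = n_points * (n_points - 1) // 2
    return pairs, pairs * 8 / 1e9
for n in (150, 10_000, 100_000):
    pairs, gigabytes = pairwise_distance_cost(n)
    print(f"{n:>7} points: {pairs:>13,} distances, about {gigabytes:.1f} GB")
```
For the 150-row Iris data that is 11,175 distances. For 100,000 rows it is 4,999,950,000 distances, roughly 40 GB, before any merging work begins.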

18:33.620 --> 18:43.310
So this is all about hierarchical clustering. Next, we will implement k-means clustering and compare it

18:43.310 --> 18:48.290
to hierarchical clustering, once we have learned about k-means clustering.