WEBVTT

00:01.200 --> 00:08.310
Now that we have covered unsupervised learning, let us have a discussion on the different types of evaluation

00:08.310 --> 00:13.230
parameters that we have. Clustering is used for

00:14.790 --> 00:17.820
grouping different types of data points.

00:18.210 --> 00:24.000
Now here you can see we have different types of data points present and these have been clustered by

00:24.000 --> 00:25.690
different methods, which are shown here.

00:27.120 --> 00:32.430
These different methods allow us to create clusters with different properties.

00:33.450 --> 00:41.700
Now, this particular documentation, which is present on GitHub, provides a complete overview of

00:41.700 --> 00:44.410
how each algorithm can be implemented.

00:45.030 --> 00:53.730
So when we scroll down, we can directly see the implementations, like DBSCAN, hierarchical clustering,

00:56.820 --> 00:57.780
K-means,

00:59.480 --> 01:00.470
and DBSCAN.

01:01.630 --> 01:09.130
All of these have been implemented here, and you can see the implementation and its details

01:09.130 --> 01:11.830
by opening the corresponding demo.

01:19.490 --> 01:22.190
Here is the entire implementation of this.

01:26.310 --> 01:33.060
This is another documentation page with the different types of examples that we have. Here you can actually

01:33.060 --> 01:40.620
find different examples for classification problems, clustering problems, covariance estimation,

01:41.630 --> 01:48.410
different dataset examples, decision trees, ensemble methods and the other methods that are available,

01:48.710 --> 01:51.620
and there are also different exercises available.

01:51.630 --> 01:56.690
So you can try these out and see how well you can work on them.

01:57.220 --> 02:01.890
Next, we will try to have a look at the demonstrations.

02:01.930 --> 02:09.200
So here we have different demonstrations, for K-means and the different types of algorithms that are

02:09.200 --> 02:09.660
present.

02:10.460 --> 02:16.040
So here we can see hierarchical clustering and the dendrogram that has been created here,

02:17.110 --> 02:19.120
and how we can plot this.

02:23.890 --> 02:25.800
Again, you can go to clustering

02:26.960 --> 02:28.520
and see the

02:29.940 --> 02:32.310
implementation of K-means here.

02:33.500 --> 02:38.930
This has the complete implementation of K-means with the plot.

02:44.180 --> 02:46.340
From there, you can again go up

02:48.990 --> 02:55.170
and again have a look at the different clustering algorithms, the demos, the examples. So it has

02:55.170 --> 02:57.680
a lot of information that you can use.

02:57.690 --> 03:01.310
And here is a comparison of different clustering algorithms.

03:01.590 --> 03:05.790
So a lot of examples are present here for your exploration.

03:06.480 --> 03:11.930
This is the website for scikit-learn, which has all this information available.

03:13.430 --> 03:20.210
Then you can go to the user guide, which is linked above. The user guide contains details of

03:20.210 --> 03:28.720
all the supervised learning algorithms, unsupervised learning algorithms, how we can do model selection,

03:28.730 --> 03:30.940
how we can evaluate the models,

03:31.980 --> 03:39.240
like cross-validation, tuning the hyperparameters, metrics and scoring, model persistence, validation

03:39.370 --> 03:39.720
curves,

03:41.400 --> 03:50.100
how we can do dataset transformations, how we can load different datasets. So all of these things are given

03:50.100 --> 03:57.100
on this particular website, which is very nice and very nicely explained; everything is provided.

03:58.020 --> 04:00.580
So we'll go to clustering here, and under clustering

04:00.720 --> 04:05.640
we have this last option, which is clustering performance evaluation. Let us go through this.

04:06.300 --> 04:13.530
So we have several metrics which are present for clustering performance evaluation, first one being

04:13.530 --> 04:15.960
the adjusted Rand index.

04:17.600 --> 04:26.480
Given the ground truth class assignments and our clustering algorithm's assignments

04:26.480 --> 04:33.590
of the same samples, the adjusted Rand index is a function that measures the similarity of the two assignments,

04:33.590 --> 04:37.170
ignoring permutations and with chance normalization.

04:37.640 --> 04:45.080
So it will basically compare the clustering with the actual labels that we have provided and tell us

04:45.260 --> 04:47.090
how good our clustering is.

04:47.690 --> 04:51.560
But this is used only when we already have the labels.
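
NOTE
A minimal sketch of the adjusted Rand index described above, assuming scikit-learn is installed.
The label lists are made-up illustrations, not data from the lecture. The first comparison shows
that swapping cluster ids (a permutation) does not change the score.
```python
from sklearn.metrics import adjusted_rand_score
# Hypothetical ground truth classes and cluster assignments for six samples.
labels_true = [0, 0, 0, 1, 1, 1]
labels_swapped = [1, 1, 1, 0, 0, 0]  # same grouping, only the cluster ids differ
labels_mixed = [0, 0, 1, 1, 2, 2]  # a grouping that only partly matches the classes
ari_perfect = adjusted_rand_score(labels_true, labels_swapped)
ari_mixed = adjusted_rand_score(labels_true, labels_mixed)
print(ari_perfect, ari_mixed)  # the first value is 1.0, the second is lower
```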

04:53.010 --> 05:00.870
So the advantage is that random label assignments have an ARI score close to zero point

05:00.900 --> 05:10.550
zero for any number of clusters and samples, which is not the case for the raw Rand index or the V-measure,

05:10.860 --> 05:12.390
so this can be used.

05:13.630 --> 05:23.620
Then there is another index, which is the mutual information based score, which can again be found

05:24.070 --> 05:28.980
using sklearn.metrics.adjusted_mutual_info_score.

05:29.290 --> 05:31.990
So this is the score which you can actually use.

05:35.350 --> 05:38.200
The score for a perfect labeling would be one.
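
NOTE
A small sketch of the mutual information based score, assuming scikit-learn is installed.
The labels are illustrative only. It checks the statement that a perfect labeling scores one,
even when the cluster ids are permuted.
```python
from sklearn.metrics import adjusted_mutual_info_score
# Hypothetical labels: the predicted clusters match the classes up to renaming.
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [1, 1, 1, 0, 0, 0]
ami = adjusted_mutual_info_score(labels_true, labels_pred)
print(ami)  # approximately 1.0 for a perfect labeling
```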

05:40.760 --> 05:49.790
Next, we have homogeneity, completeness and V-measure. These are different measures

05:50.150 --> 05:50.910
which we have.

05:51.310 --> 05:58.850
So this has the concept of a homogeneity score and a completeness score. Homogeneity tells us how

05:58.850 --> 06:01.820
homogeneous the data within the clusters is.
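
NOTE
A hedged sketch of homogeneity, completeness and V-measure, assuming scikit-learn is installed.
The labels are invented for illustration: every cluster contains a single class (fully homogeneous),
but each class is split across two clusters (not complete).
```python
from sklearn.metrics import homogeneity_completeness_v_measure
# Hypothetical labels: each cluster is pure, but each class is split in two.
labels_true = [0, 0, 0, 1, 1, 1]
labels_split = [0, 0, 1, 2, 2, 3]
h, c, v = homogeneity_completeness_v_measure(labels_true, labels_split)
print(h, c, v)  # h is about 1.0, c is below 1.0, v lies between them
```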

06:05.160 --> 06:17.340
So we can find this out, and the one which is most important and most widely used is the silhouette coefficient.

06:17.940 --> 06:24.990
The silhouette coefficient is computed by comparing the mean distance between a sample and all other points in the same

06:24.990 --> 06:33.480
cluster with the mean distance between the sample and all other points belonging to the next nearest

06:33.480 --> 06:34.050
cluster.

06:34.440 --> 06:41.520
So based on these distances, that is, the intra-cluster distance and the nearest-cluster distance,

06:41.790 --> 06:48.540
it computes what is called the silhouette score. It ranges from minus one to one, where

06:48.810 --> 06:52.440
one is the perfect cluster formation.

06:52.770 --> 07:00.060
Negative one means that the clustering is incorrect, and zero means that the clusters are

07:00.060 --> 07:01.420
overlapping in nature.
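
NOTE
A worked sketch of the silhouette value for one sample, in plain Python. The helper name
silhouette_of and the 1-D points are hypothetical. It uses a (mean distance within the same
cluster) and b (mean distance to the nearest other cluster) and returns (b - a) / max(a, b).
It assumes every cluster has at least two points.
```python
def silhouette_of(i, points, labels):
    # a: mean distance from sample i to the other points in its own cluster
    same = [abs(points[i] - p) for j, p in enumerate(points) if labels[j] == labels[i] and j != i]
    a = sum(same) / len(same)
    # b: smallest mean distance from sample i to the points of any other cluster
    others = set(labels) - {labels[i]}
    b = min(
        sum(abs(points[i] - p) for j, p in enumerate(points) if labels[j] == k) / labels.count(k)
        for k in others
    )
    return (b - a) / max(a, b)
points = [0.0, 1.0, 10.0, 11.0]  # two well separated groups on a line
labels = [0, 0, 1, 1]
s0 = silhouette_of(0, points, labels)  # a = 1.0, b = 10.5
print(s0)  # (10.5 - 1.0) / 10.5, close to one
```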

07:01.690 --> 07:05.460
And here you can again find the implementation of the same.

07:05.880 --> 07:12.480
We will be using this implementation when we implement different models in our lectures.

07:13.660 --> 07:22.270
So the simple implementation is: we will write from sklearn import metrics, and from metrics we

07:22.270 --> 07:25.500
will simply use metrics.silhouette_score.

07:25.930 --> 07:33.340
We will give the values, the labels and the metric that we want to use for the distance calculation,

07:33.700 --> 07:37.130
and it will return a value, which should be near to one.
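
NOTE
A minimal usage sketch of metrics.silhouette_score, assuming scikit-learn is installed.
The synthetic blobs, the choice of KMeans and every parameter value are illustrative assumptions,
not the lecture's own dataset. The call takes the values, the labels and the distance metric.
```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
# Three well separated synthetic blobs (illustrative centers and spread).
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]], cluster_std=0.5, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
score = metrics.silhouette_score(X, labels, metric="euclidean")
print(score)  # near one here, because the blobs are far apart
```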

07:37.840 --> 07:45.810
So this is one very important and most widely used index, which is the silhouette coefficient.

07:46.060 --> 07:52.120
So this is what we will be using for this particular session and for all the

07:53.560 --> 07:56.950
clustering algorithms which we will be implementing.

07:57.490 --> 08:07.210
So this is about the sklearn library and how you can go through this library and explore the different algorithms

08:07.210 --> 08:10.540
and the examples which are available under it.

08:11.080 --> 08:18.340
So you can go to the examples and see different examples available for all types of problems which are

08:18.340 --> 08:18.850
present.

08:20.130 --> 08:27.510
Similarly, you can go to the user guide, and under the user guide you can look up any algorithm which

08:27.510 --> 08:35.830
you want to find out about and learn about it. Next, in case you want to learn about the API itself,

08:35.850 --> 08:44.820
you can simply go to the API and it will show the methods and the different commonly used modules

08:44.970 --> 08:46.590
of this particular API.

08:48.390 --> 08:57.020
So this is about sklearn and how we can actually use different coefficients and, most importantly, the silhouette

08:57.060 --> 09:00.810
index for finding out the goodness of a cluster.

09:01.350 --> 09:02.710
So thank you.

09:02.880 --> 09:10.380
In the next session, we will be learning about hierarchical clustering and then we will talk about different

09:10.380 --> 09:12.360
clustering methods which are available.