WEBVTT

00:01.050 --> 00:08.100
Now in this session, we will discuss the next unsupervised learning algorithm, the clustering

00:08.100 --> 00:10.500
algorithm named DBSCAN.

00:23.110 --> 00:29.340
Whatever drawbacks we had in the case of K-means will not be there in the case of DBSCAN.

00:31.000 --> 00:33.160
So let us discuss DBSCAN.

00:33.340 --> 00:46.050
In the case of K-means clustering, we needed to specify the number of clusters, and we could create very good

00:46.210 --> 00:48.370
clusters with K-means.

00:48.580 --> 00:55.230
And we would fix the initial points (the centroids), based on which the clusters would be created.

00:55.600 --> 01:01.680
Now, because of those centroids, the clusters would depend heavily on the random initialization.

01:03.240 --> 01:13.320
But in the case of DBSCAN, we will not be declaring how many clusters there are; DBSCAN itself

01:13.620 --> 01:19.540
figures out the number of clusters, so instead, other parameters become important.

01:19.770 --> 01:28.590
We will need two parameters: the first is epsilon, which is the neighborhood size.

01:29.100 --> 01:35.910
And the second parameter is the minimum number of points required to form a cluster.
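
NOTE
A minimal sketch of how epsilon and the minimum number of points drive DBSCAN, assuming 2-D points and Euclidean distance. The helper names region_query and dbscan are hypothetical and not a library API; label -1 marks noise (outliers) and 0, 1, ... mark clusters.
```python
import math
def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]
def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and grow it outwards.
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not dense itself
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep expanding
                seeds.extend(j_neighbors)
        cluster_id += 1
    return labels
```
For example, dbscan([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)], eps=1.5, min_pts=3) returns [0, 0, 0, 0, 1, 1, 1, 1, -1]: the two dense groups become clusters without the number of clusters being given, and the isolated point is left as noise.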

01:37.080 --> 01:44.910
Now, DBSCAN will detect the number of clusters by itself, and it also does not make

00:44.910 --> 00:53.940
the assumption of spherical clusters, which was the case with K-means. DBSCAN can

00:53.940 --> 02:02.160
be used for anomaly detection, and it is not affected by outliers, because the clusters will

02:02.160 --> 02:07.380
be formed based on the minimum number of points, which we will be declaring.

02:08.420 --> 02:16.120
So if there are any outliers, then based on this minimum number of points, we can actually identify

02:16.430 --> 02:21.440
those outliers and leave them out of the clusters.
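
NOTE
DBSCAN separates outliers by sorting points into three kinds: core points (at least min_pts neighbors within eps, counting the point itself), border points (not core, but within eps of a core point) and noise (everything else). This is a small sketch of that classification, assuming 2-D points and Euclidean distance; classify_points is a hypothetical helper.
```python
import math
def classify_points(points, eps, min_pts):
    """Tag each point as 'core', 'border' or 'noise' using DBSCAN's definitions."""
    # Neighborhood of each point: indices within eps (a point counts as its own neighbor).
    neighborhoods = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps] for p in points
    ]
    is_core = [len(nb) >= min_pts for nb in neighborhoods]
    kinds = []
    for i, nb in enumerate(neighborhoods):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in nb):
            kinds.append("border")  # not dense itself, but close to a core point
        else:
            kinds.append("noise")  # an outlier: it joins no cluster
    return kinds
```
For example, classify_points([(0, 0), (0, 1), (1, 0), (2, 0), (9, 9)], eps=1.0, min_pts=3) gives ['core', 'border', 'core', 'border', 'noise']; the point at (9, 9) is the outlier that would be kept aside.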

02:22.880 --> 02:29.880
So for this, we will go to the same website where we can actually visualize the clustering.

02:31.850 --> 02:35.840
So this is the website for visualizing the clustering.

02:36.230 --> 02:42.030
So here we can simply select the type of data which we want.

02:42.350 --> 02:44.770
So let's try the smiley face again.

02:46.790 --> 02:48.650
Now, for this smiley face,

02:49.890 --> 02:57.210
we will have to provide the epsilon value and the minimum points value. The minimum points value

02:57.420 --> 03:03.430
will actually define how many points are required to actually form clusters.

03:04.140 --> 03:07.470
So for this, the minimum points value is already set.

03:07.470 --> 03:15.870
And the epsilon value is given as one, which defines how big a neighborhood we want to have. The higher

03:15.870 --> 03:23.520
the value of epsilon, the more points will be considered when creating the neighborhood.
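
NOTE
To make the effect of epsilon concrete, this sketch counts how many points fall inside the epsilon-neighborhood of one point as epsilon grows. The toy points and the neighbor_count helper are hypothetical, and Euclidean distance is assumed.
```python
import math
def neighbor_count(points, center, eps):
    """Number of points within distance eps of center (the eps-neighborhood size)."""
    return sum(1 for q in points if math.dist(center, q) <= eps)
points = [(0, 0), (0.5, 0), (1, 0), (2, 0), (4, 0)]
for eps in (0.6, 1.0, 2.0, 4.0):
    # prints: 0.6 2, then 1.0 3, then 2.0 4, then 4.0 5 (one pair per line)
    print(eps, neighbor_count(points, (0, 0), eps))
```
With a minimum points value of 4, the point at (0, 0) would only become a core point once epsilon reaches 2.0, which is why a larger epsilon lets clusters form more easily.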

03:23.700 --> 03:25.560
So let's run this.

03:31.640 --> 03:38.150
Now, once I begin this, it will randomly pick a starting point and start creating the clusters.

04:26.340 --> 04:34.040
Here you can see how easily DBSCAN was able to find all of the clusters which were

04:34.050 --> 04:34.500
present.

04:36.560 --> 04:46.010
Now, let us look at this in more detail by running another visualization. So we go to Restart

04:46.010 --> 04:49.190
and this time we select the Pimpled Smiley.

04:51.160 --> 04:54.590
And let's decrease the epsilon value.

04:54.970 --> 04:57.790
So now the epsilon value is 0.60.

05:53.610 --> 05:57.580
So here you can see it is currently checking different points.

05:57.870 --> 06:03.750
So it started checking these points, which are present in the middle, but because it was not able

06:03.750 --> 06:11.670
to find four points within one neighborhood, it could not really create clusters out of these points.

06:12.090 --> 06:15.270
So let me increase the epsilon value for this one.

06:20.170 --> 06:26.500
Now, you can see that because the epsilon value was low, the neighborhoods were small,

06:26.500 --> 06:28.930
so a larger number of clusters was created.

06:29.290 --> 06:34.900
Now, let me increase the value of epsilon.

06:38.160 --> 06:39.570
Now, let's run this.

06:42.220 --> 06:48.640
And let us set the minimum number of points for a cluster to three.

07:06.010 --> 07:11.080
With a higher value of epsilon, you can see the cluster-making process is also slower.

07:14.660 --> 07:23.000
And you will see that it is checking each and every point for cluster creation, but wherever

07:23.000 --> 07:27.800
it finds at least three points, it will create a cluster.

07:45.240 --> 07:52.970
But you can see it has detected all of these points as outliers, so let us try one more scenario.

07:55.710 --> 08:03.180
Now, let us have a look at the examples. As we saw in the visualization, similarly here you can

08:03.180 --> 08:09.680
see that it is able to detect the clusters, even though they are not spherical in nature.

08:09.870 --> 08:12.840
It will detect the clusters in whatever shape

08:12.840 --> 08:18.120
they take, and it can detect the different patterns present in the data.

08:18.330 --> 08:21.690
And it will not try to create spherical clusters.

08:21.690 --> 08:25.200
And it will also isolate the outliers.

08:25.200 --> 08:31.920
So it will keep the outliers aside, so that we will know that these are actually not a part of any cluster.
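
NOTE
A sketch of DBSCAN finding non-spherical clusters: two concentric rings plus one far-away point. The ring data is hypothetical toy data and dbscan here is a compact, hypothetical re-implementation (breadth-first cluster expansion, Euclidean distance). Because both rings share the same center, a centroid-based method such as K-means cannot separate them this way.
```python
import math
from collections import deque
def ring(radius, n):
    """n points evenly spaced on a circle of the given radius (toy data)."""
    return [(radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
            for k in range(n)]
def dbscan(points, eps, min_pts):
    """Compact DBSCAN: -1 marks noise, 0..k-1 mark clusters."""
    nbrs = [[j for j, q in enumerate(points) if math.dist(p, q) <= eps] for p in points]
    labels = [-1] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != -1 or len(nbrs[i]) < min_pts:
            continue  # already clustered, or not a core point
        labels[i] = cluster
        queue = deque([i])
        while queue:
            j = queue.popleft()
            if len(nbrs[j]) < min_pts:
                continue  # border point: joins the cluster but does not expand it
            for k in nbrs[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    queue.append(k)
        cluster += 1
    return labels
data = ring(1, 20) + ring(4, 40) + [(0.0, 10.0)]
labels = dbscan(data, eps=0.8, min_pts=3)
print(sorted(set(labels)))  # [-1, 0, 1]
```
Here the inner ring gets label 0, the outer ring gets label 1 and the lone point at (0, 10) is kept aside as noise (-1).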

08:33.730 --> 08:41.620
So here is a comparison of DBSCAN and K-means. DBSCAN is able to detect these two rings separately,

08:41.620 --> 08:45.070
while K-means tries to create spherical clusters.

08:45.340 --> 08:48.820
The same thing applies to these two half-moons.

08:49.270 --> 08:54.530
And here again, DBSCAN was able to find both of the different clusters.

08:54.550 --> 08:55.720
So this one is different,

08:56.680 --> 08:58.570
and so they are different clusters.

08:58.870 --> 09:03.170
DBSCAN was able to group these together, while here in K-means

09:03.190 --> 09:06.790
it just cuts right through a cluster and creates spherical clusters.

09:08.660 --> 09:15.080
Here you can see that three different clusters were created in both DBSCAN and K-means, which are spherical

09:15.080 --> 09:15.580
clusters.

09:15.590 --> 09:19.140
So here DBSCAN has performed as well as the K-means algorithm.

09:19.700 --> 09:24.830
And here you can see the most important example, which is a uniform distribution.

09:25.070 --> 09:32.330
So DBSCAN does not impose clusters if there are no distinct clusters in the data.

09:32.360 --> 09:41.060
So here there are actually no clusters, and DBSCAN is able to detect that, while K-means tries to

09:41.060 --> 09:45.210
create clusters whether they are present or not.
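
NOTE
DBSCAN only starts a cluster from a core point, so on evenly spread data there is no dense region to split off. This sketch uses a hypothetical uniform 6 x 6 grid and a hypothetical count_core_points helper: with a small epsilon there are no core points at all (everything is noise), and with a larger epsilon almost every point is a core point and they all chain into a single cluster. Either way, DBSCAN does not carve uniform data into several artificial groups the way K-means with a fixed k does.
```python
import math
def count_core_points(points, eps, min_pts):
    """How many points have at least min_pts neighbors (themselves included) within eps."""
    return sum(
        1 for p in points
        if sum(1 for q in points if math.dist(p, q) <= eps) >= min_pts
    )
# Uniform toy data: a 6 x 6 grid with spacing 1.0.
grid = [(float(x), float(y)) for x in range(6) for y in range(6)]
print(count_core_points(grid, eps=0.9, min_pts=4))  # 0: no dense region, so no clusters
print(count_core_points(grid, eps=1.0, min_pts=4))  # 32: all but the 4 corners, one connected cluster
```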

09:48.720 --> 09:56.140
Now, to check the quality of the clusters which we have created, we have a scoring metric called the silhouette

09:56.220 --> 10:00.980
score; it is also called the silhouette index.

10:01.470 --> 10:09.270
If the index value is high, it means that the object is well matched to its own cluster and poorly

10:09.270 --> 10:11.000
matched to the neighboring clusters.

10:11.370 --> 10:19.560
So the silhouette coefficient is calculated using the mean intra-cluster distance and the mean nearest-cluster distance.

10:20.130 --> 10:30.850
That is, a and b. So the silhouette coefficient is defined as s(i) = (b(i) - a(i)) divided by the maximum

10:30.870 --> 10:32.200
of a(i) and b(i).

10:34.560 --> 10:42.770
So here, a(i) is the average dissimilarity of the i-th object to all other objects in the same cluster.

10:42.960 --> 10:53.490
And b(i) compares it with the other clusters: it is the average distance of that particular point

10:53.490 --> 10:56.880
to all objects in the closest neighboring cluster.

10:57.510 --> 11:00.830
So this is how the silhouette coefficient is defined.
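
NOTE
A minimal pure-Python sketch of the silhouette formula s(i) = (b(i) - a(i)) / max(a(i), b(i)), assuming Euclidean distance. The functions here are hypothetical helpers written for illustration; scikit-learn provides silhouette_score and silhouette_samples in sklearn.metrics, but this is not their code. A point alone in its cluster gets s = 0, which is a common convention.
```python
import math
def silhouette_samples(points, labels):
    """Per-point silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i))."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        if labels.count(own) == 1:
            scores.append(0.0)  # singleton cluster: a(i) is undefined, score it as 0
            continue
        # Mean distance from p to the members of each cluster (p itself excluded).
        mean_dist = {}
        for c in clusters:
            members = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            mean_dist[c] = sum(math.dist(p, q) for q in members) / len(members)
        a = mean_dist[own]  # a(i): mean intra-cluster distance
        b = min(mean_dist[c] for c in clusters if c != own)  # b(i): nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores
def silhouette_score(points, labels):
    """Mean silhouette over all points; needs at least two clusters."""
    s = silhouette_samples(points, labels)
    return sum(s) / len(s)
```
For two tight, far-apart pairs, silhouette_score([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1]) is about 0.90, while deliberately mixing the pairs with labels [0, 1, 0, 1] gives a negative score.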

11:01.080 --> 11:04.380
So we want this score to be close to one.

11:04.620 --> 11:10.140
If the silhouette score is close to one, it means that the clusters are well formed.

11:10.410 --> 11:17.520
And if the silhouette score is minus one, it means that the points have actually been placed in the wrong

11:17.820 --> 11:18.470
clusters.

11:18.480 --> 11:25.620
So instead of being placed in its own cluster, a point has been misclassified or misplaced in a different

11:25.620 --> 11:26.190
cluster.

11:26.550 --> 11:36.040
And when the silhouette index value is zero, it means that the clusters are overlapping.

11:37.500 --> 11:40.260
So this is about the range of the value.

11:40.260 --> 11:44.550
That is, if s(i) is close to one, the sample is well clustered.

11:44.730 --> 11:50.520
If s(i) is zero, the sample could be assigned to either of the two clusters closest to it,

11:50.550 --> 11:57.630
as it lies equally far away from both of them, which indicates overlapping clusters.

11:57.870 --> 12:00.420
And if s(i) is minus one,

12:00.720 --> 12:04.250
then it means that the sample is misclassified.

12:04.500 --> 12:14.070
So one is the perfect score, and we want the silhouette score to be as close to one as possible.
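
NOTE
A worked example with made-up distances a and b (not values from the lecture), showing how the silhouette formula maps onto the three cases: close to one, zero, and negative.
```latex
\begin{aligned}
s(i) &= \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1 \\
a = 2,\; b = 8 &:\quad s = \tfrac{8 - 2}{8} = 0.75 \quad \text{(well clustered)} \\
a = b = 5 &:\quad s = \tfrac{5 - 5}{5} = 0 \quad \text{(on the boundary between two clusters)} \\
a = 8,\; b = 2 &:\quad s = \tfrac{2 - 8}{8} = -0.75 \quad \text{(probably in the wrong cluster)}
\end{aligned}
```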

12:14.790 --> 12:17.190
So this is about the silhouette score.

12:17.490 --> 12:26.580
Next, we will implement the DBSCAN algorithm so that you will get a picture and be able

12:26.580 --> 12:34.200
to compare the different clustering algorithms: hierarchical clustering, K-means and DBSCAN.

12:34.350 --> 12:43.830
And as we already discussed, DBSCAN is used when we don't have spherical clusters, and K-means

12:43.830 --> 12:49.590
works better when we have spherical clusters and we actually know the number of clusters.

12:49.740 --> 12:56.220
If we don't know the number of clusters, then we can easily use DBSCAN by providing

12:56.220 --> 12:59.160
the epsilon and minimum number of points values.

12:59.820 --> 13:06.980
DBSCAN is also used when we have outliers and we want to isolate them, and we want to

13:06.990 --> 13:09.120
consider only the remaining points when forming the clusters.

13:10.760 --> 13:17.810
And regarding hierarchical clustering, it is helpful when we have a

13:17.810 --> 13:19.040
smaller dataset.

13:19.280 --> 13:24.020
So this is one restriction which hierarchical clustering has.

13:24.230 --> 13:30.950
Otherwise, hierarchical clustering is a very nice clustering method because it does not change the

13:30.950 --> 13:35.000
clusters with each and every run, unlike K-means and, for border points,

13:35.000 --> 13:40.400
DBSCAN. So it gives the same clusters every time we run the clustering.

13:41.360 --> 13:46.360
So this is one difference between these three clustering algorithms.

13:46.610 --> 13:49.240
So next we will see the implementation.

13:49.430 --> 13:51.770
So you will get a clearer picture of the same.