WEBVTT

00:00.780 --> 00:07.020
In this session, we will implement dimensionality reduction, so the first thing which we will be having

00:07.170 --> 00:08.850
a look at is t-SNE.

00:09.600 --> 00:20.400
t-SNE is an algorithm which allows us to convert a very large dataset into a small dataset, that is, into a

00:20.400 --> 00:24.360
very small number of components.

00:25.440 --> 00:34.700
Now, the components generated by t-SNE are not as good as the ones which are generated by PCA, but t-SNE

00:35.370 --> 00:37.980
is usually used for

00:39.220 --> 00:40.930
visualization purposes.

00:41.110 --> 00:51.730
So in case we want to visualize the clusters in the data, then we may use t-SNE; otherwise, the most frequently

00:51.730 --> 00:53.730
used one is PCA.

00:54.710 --> 01:05.930
So for this, we will start the implementation by importing pandas, NumPy, PCA from sklearn.decomposition, and TSNE

01:05.930 --> 01:08.180
from sklearn.manifold.

01:09.350 --> 01:17.690
So for this particular problem, we will be using the digits dataset; this is the handwritten digits

01:17.690 --> 01:18.250
data set.

01:18.800 --> 01:27.860
So this dataset consists of different values; it consists of small images, small black-and-white images

01:28.220 --> 01:33.620
which contain sixty-four pixels each, that is,

01:34.830 --> 01:36.780
eight across eight.

01:37.470 --> 01:43.420
That is eight times eight, which is sixty-four. So this is the dataset.

01:44.450 --> 01:47.400
So here you can see there are different values available.

01:47.690 --> 01:54.720
So when we combine these values, we actually get a digit from any one of these rows.

01:56.090 --> 01:59.030
So now we will visualize this.

01:59.060 --> 02:07.640
So for this, we can actually get the data and print it using plt.imshow, giving the

02:07.640 --> 02:09.260
data in a grid form.

02:09.560 --> 02:11.960
So here I have printed one of the values.

02:11.970 --> 02:14.680
So here you can see that this looks like five.

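NOTE
The grid view described above can be sketched roughly as follows. This is a minimal
illustration, assuming the data comes from sklearn.datasets.load_digits and that the
plotting call is plt.imshow. The helper names row_to_image and show_digit are
hypothetical and are not taken from the session.
```python
import numpy as np
from sklearn.datasets import load_digits
#
# Load the handwritten digits: each row holds 64 pixel values (an 8 x 8 image).
digits = load_digits()
X = digits.data
y = digits.target
#
def row_to_image(row):
    """Reshape one flat row of 64 pixel values into an 8 x 8 grid."""
    return np.asarray(row).reshape(8, 8)
#
def show_digit(index):
    """Plot one digit as a grayscale grid (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    plt.imshow(row_to_image(X[index]), cmap="gray")
    plt.title(f"label: {y[index]}")
    plt.show()
#
image = row_to_image(X[0])
```
Calling show_digit with different index values reproduces the kind of browsing done in the next few cues.
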
02:15.380 --> 02:17.210
Next, we will have a look at.

02:18.510 --> 02:19.050
Inthe.

02:22.240 --> 02:27.870
Here you can see that this looks somewhat like one. We will try

02:28.930 --> 02:29.950
40.

02:31.530 --> 02:33.450
Which looks like seven.

02:34.510 --> 02:35.860
Next, we have.

02:37.150 --> 02:38.140
Two hundred.

02:39.200 --> 02:39.920
Which.

02:42.080 --> 02:44.420
does not just look like one, but it is one.

02:45.420 --> 02:46.590
Next, we have.

02:47.500 --> 02:48.610
See, 40.

02:50.500 --> 02:56.160
So this one is an eight, so you can see this is somewhat highlighted towards the end.

02:56.530 --> 02:59.770
So this is what we have in the images.

03:00.130 --> 03:07.810
So now we will import TSNE, and from the TSNE class we can basically run t-SNE by giving the number

03:07.810 --> 03:08.830
of components.

03:10.160 --> 03:14.720
and a random state value, and after this, we will get the

03:15.630 --> 03:22.710
TSNE object, and we can simply do fit_transform on top of that and get the data frame from

03:22.710 --> 03:27.180
this, this data frame will now contain the transformed values.

03:27.360 --> 03:32.330
And we are adding the digits from the data frame which we had created.

03:33.150 --> 03:37.770
This is the data which we have obtained, and on visualizing it,

03:37.770 --> 03:40.950
we can see here, these are the digits.

03:42.530 --> 03:48.950
So here you can see these are the clusters which have been formed internally, so here you can see that

03:49.100 --> 03:54.590
the value six is present here while we have the value nine.

03:55.930 --> 03:57.160
Value zero here.

03:58.620 --> 03:59.910
This is the value one.

04:01.030 --> 04:05.790
This is value four. Now, you can relate how one and four look somewhat similar.

04:05.830 --> 04:08.110
That is why these points are closer to each other.

04:08.800 --> 04:11.560
Here we have green, which is two.

04:12.710 --> 04:14.120
Here we have three.

04:15.850 --> 04:19.810
This is seven, and it has a few points of nine

04:20.950 --> 04:28.810
grouped closer to it, so you can see how the numbers which are closer in shape are presented

04:29.470 --> 04:30.540
near to each other.

04:31.890 --> 04:34.090
Now we will see how we can implement this.

04:34.710 --> 04:41.820
So for t-SNE, we only have to do this: we will simply import TSNE and then call fit_transform on

04:41.820 --> 04:42.460
top of it.

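NOTE
The t-SNE step described above can be sketched as follows. This is a minimal example
under stated assumptions: two components, random_state=42, and only the first 300
digits (to keep the run short) are choices made for this sketch, and the column names
tsne_1, tsne_2 and digit are hypothetical.
```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
#
digits = load_digits()
# Assumption: a 300-row subset so the embedding finishes quickly.
X, y = digits.data[:300], digits.target[:300]
#
# Give the number of components and a random state, then fit and transform in one call.
tsne = TSNE(n_components=2, random_state=42)
embedding = tsne.fit_transform(X)
#
# Put the transformed values in a data frame and attach the digit labels for colouring.
tsne_df = pd.DataFrame(embedding, columns=["tsne_1", "tsne_2"])
tsne_df["digit"] = y
```
A scatter plot of tsne_1 against tsne_2, coloured by digit, gives the kind of cluster picture discussed above.
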
04:42.810 --> 04:44.670
Now the next is PCA.

04:44.970 --> 04:48.240
So let us see how we can implement this.

04:48.810 --> 04:55.860
So for this, we will import sklearn.decomposition, and we will import FactorAnalysis

04:55.860 --> 05:04.170
from it. We will get the data frame from the same handwritten digits dataset and we will see the correlation

05:04.170 --> 05:04.730
matrix.

05:05.010 --> 05:09.870
So here you can see the central values are actually highly correlated.

05:11.340 --> 05:19.770
And we will then have a look, and we will scale the data. After scaling the data, we are taking out the

05:20.150 --> 05:21.520
principal component.

05:21.990 --> 05:24.960
So the principal components come out to be 30.

05:26.300 --> 05:34.580
So now we will have a look at this, we will fit the PCA on top of it, and these are the PCA components

05:34.580 --> 05:35.480
which are generated.

05:36.110 --> 05:45.830
Now, these components have a shape of thirty by sixty-four, that is, 30 values for each of those sixty-four columns,

05:45.830 --> 05:46.460
which we have.

05:47.970 --> 05:55.590
Now we will find out the PCA explained variance ratio, so this will show how much variance is explained

05:55.590 --> 05:58.080
by this particular PC.

05:58.590 --> 06:05.330
So now here we are applying a cumulative sum and rounding of the PCA explained variance ratio.

06:05.700 --> 06:12.030
And this will give us how much variance is explained by a combination of factors.

06:12.570 --> 06:18.330
So here you can see that the first component is able to explain twelve point six percent of variance.

06:19.470 --> 06:26.130
The first two components are able to explain 20 percent plus, three components are able to explain

06:26.130 --> 06:32.840
thirty-three percent plus, four components are able to explain 42 percent, next 48, and so on.

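NOTE
The cumulative explained variance discussed above can be sketched like this. It is an
illustration only: the session does not name the scaler, so StandardScaler is an
assumption, and the exact percentages depend on how the data was scaled, so none are
shown here.
```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
#
X = load_digits().data
# Assumption: standardise every pixel column before PCA.
X_scaled = StandardScaler().fit_transform(X)
#
# Fit 30 principal components; components_ then has shape (30, 64).
pca = PCA(n_components=30)
pca.fit(X_scaled)
#
# Running total of explained variance, in percent, rounded to one decimal place.
cumulative = np.round(np.cumsum(pca.explained_variance_ratio_) * 100, 1)
```
The entry cumulative[k - 1] is the share of variance explained by the first k components together.
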
06:33.270 --> 06:40.950
Now, here, because this is image data, the variance explained by PCA is much lower in comparison to the usual

06:40.950 --> 06:42.570
cases which we will be seeing.

06:43.080 --> 06:50.070
So what we will be doing for that is, we will select the top components, using whatever number

06:50.070 --> 06:52.720
of components we want to have.

06:52.740 --> 06:59.840
So here we can select 11 or five or 20, whatever we want, and then we can apply fit on X.

07:00.120 --> 07:08.270
So once we do the fit on X, the PCA will get the details of how it needs to apply the transformation.

07:08.550 --> 07:15.540
And after that, after the PCA has been trained, we can simply do a pca.transform.

07:15.900 --> 07:24.030
And with this transform, we will be able to transform our dataset from sixty-four columns of data

07:24.240 --> 07:29.100
to a smaller number of columns of data; here I have chosen 11 columns.

07:30.400 --> 07:38.590
Going further, if we want to look at a single component, then we can simply take pca.components_ and the

07:38.590 --> 07:43.210
index of the component we want; so for one of the first components, we will be looking up its

07:43.210 --> 07:43.680
index.

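NOTE
The fit, transform and single-component lookup described above can be sketched as
follows. This is a minimal example: 11 components matches the number chosen in the
session, while fitting on the unscaled digits data is an assumption made to keep the
sketch short.
```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
#
X = load_digits().data
#
# Train the PCA on X, then transform the 64 columns down to 11.
pca = PCA(n_components=11)
pca.fit(X)
X_reduced = pca.transform(X)
#
# A single component is one row of components_: 64 weights, one per original column.
first_component = pca.components_[0]
```
Calling pca.fit_transform(X) combines the two steps when the same data is used for both.
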
07:44.440 --> 07:52.540
Now, let us try to implement this for another dataset, which has numeric values present

07:52.540 --> 07:52.780
in it.

07:54.040 --> 08:01.570
So now we are picking another dataset, which is existing base, so we have read the CSV file and we

08:01.570 --> 08:05.690
have filtered out the columns of the object data type.

08:05.710 --> 08:07.670
So we don't want the object dtype.

08:07.700 --> 08:09.780
We want only the numeric columns.

08:10.180 --> 08:13.270
So we have filtered out the columns.

08:15.070 --> 08:21.640
So this is df.drop, so we have dropped all of those columns from this. The next thing will be, we will

08:21.640 --> 08:28.060
run the algorithm on top of it so we can simply run it like this.

08:28.390 --> 08:31.270
So these are the components which we have received from this.

08:32.200 --> 08:38.560
And now we will find out the explained variance, so this is the explained variance and this is the

08:38.560 --> 08:40.140
cumulative sum of the variance explained.

08:40.420 --> 08:48.010
So now here you can see that we have 30 percent explained by the first one and forty-seven point five

08:48.010 --> 08:51.080
percent explained by the first two components, and so on.

08:51.490 --> 09:00.340
So this is how we get to decide how many components we want to have. Usually, 90 to 95 percent variance

09:00.340 --> 09:02.980
explained is good enough, so we can count: zero,

09:02.980 --> 09:11.350
one, two, three, four, five, six, seven, eight, nine, so we can keep 10 components out

09:11.350 --> 09:13.020
of it or 11 components.

09:13.030 --> 09:17.140
That is completely up to us, and just as we did earlier,

09:17.530 --> 09:24.610
we can simply do the fit on the principal components by providing the number of components we want

09:24.610 --> 09:35.230
to have finally, and do a fit and transform to get the updated data with a smaller number of columns generated.

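NOTE
The workflow above (drop non-numeric columns, check cumulative variance, refit with the
chosen number of components) can be sketched as follows. The CSV from the session is
not available, so this uses a small synthetic data frame. The column names, the 0.90
threshold, and the use of select_dtypes are assumptions for illustration only.
```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
#
# Hypothetical stand-in for the CSV: six numeric columns plus one text column.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 6)), columns=[f"num_{i}" for i in range(6)])
df["label"] = ["a", "b"] * 25
#
# Drop every non-numeric column so that only numeric columns remain.
non_numeric = df.select_dtypes(exclude="number").columns
numeric_df = df.drop(columns=non_numeric)
#
# Cumulative explained variance over all components.
cumulative = np.cumsum(PCA().fit(numeric_df).explained_variance_ratio_)
#
# Smallest number of components whose cumulative variance reaches 90 percent.
k = int(np.searchsorted(cumulative, 0.90) + 1)
#
# Refit with k components and transform to the reduced set of columns.
reduced = PCA(n_components=k).fit_transform(numeric_df)
```
Comparing numeric_df.shape with reduced.shape shows how many columns were removed.
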
09:35.530 --> 09:42.700
Now let us have a look at the actual size of the data, so let us have a look at the actual size of the

09:43.010 --> 09:43.740
data frame.

09:45.280 --> 09:49.450
So the actual size of the frame was.

09:52.760 --> 09:55.340
df.shape.

09:57.300 --> 09:59.370
So, thirty-two.

10:00.630 --> 10:04.840
So it was 32 columns and we removed a few.

10:05.010 --> 10:07.230
So let's have a look at the.

10:09.230 --> 10:11.230
size of this data frame.

10:23.080 --> 10:30.160
So here we can see that we have 18 columns now instead of 32 columns, so we have reduced the number

10:30.160 --> 10:30.760
of columns.

10:31.000 --> 10:38.290
And similarly, when we have a very huge dataset, in that case we will be able to reduce a lot more

10:38.290 --> 10:43.890
columns, which will help us a lot in reducing the complexity of our models.

10:44.230 --> 10:51.250
So I hope you will be able to implement these algorithms and maybe use them in any of your projects,

10:51.430 --> 10:52.960
which you will be working on.

10:53.230 --> 10:55.540
And I hope this will be really helpful to you.

10:56.260 --> 10:56.680
Thank you.