WEBVTT

00:00.990 --> 00:07.260
In this session, we will discuss the stacking, unstacking, and melting of data, so the first

00:07.260 --> 00:11.310
thing which we will do is import the required libraries.

00:13.360 --> 00:20.440
So we are importing seaborn as sns, pandas as pd, and numpy as np.

00:25.150 --> 00:33.040
So now we will go ahead with looking at different row-to-column transformations, so stacking, unstacking,

00:33.040 --> 00:39.300
and melting each have their own documentation.

00:39.370 --> 00:47.370
So you can refer to those in case you want to check the full documentation and learn more about them.

00:47.680 --> 00:52.240
But we will cover only what is actually required and the important parts of that.

00:53.150 --> 01:00.320
So this is the tips dataset, so we are loading the tips dataset from the seaborn library.

01:01.580 --> 01:10.460
So the dataset contains the total bill, tip, sex, smoker, day, time, and size columns of this particular

01:10.460 --> 01:10.740
data.

01:11.270 --> 01:13.780
And this is the detail of the bills.

01:13.790 --> 01:18.590
The total bill: what is that amount?

01:18.620 --> 01:20.240
This is the bill amount.

01:20.720 --> 01:22.450
This is the gender of the person.

01:22.460 --> 01:29.690
This is whether the person is a smoker or not, then the day on which someone has visited the restaurant,

01:30.020 --> 01:36.050
then the time when they have visited and the size of the group of people who have visited.

01:37.870 --> 01:43.510
So for this dataset, we will check the details further.

01:43.720 --> 01:47.180
So we will, first of all, group the data.

01:47.410 --> 01:51.950
So here what we are doing is we have this entire data.

01:52.150 --> 01:56.800
So now we are grouping the data by day and then by sex.

01:57.310 --> 02:04.370
And we are creating an aggregation on the size of the group of people.

02:04.600 --> 02:05.860
So let me run this.

02:08.220 --> 02:16.110
So here you can see what happens is that we have the data, so it has grouped all the data

02:16.590 --> 02:22.110
by day, so we have the details for each and every day which we have.

02:22.410 --> 02:29.640
And for each day, it has further created a further subgroup of male and female.

02:29.970 --> 02:36.270
And for male and female, it has created a size value which has been aggregated.

02:37.360 --> 02:38.770
So this aggregation.

02:41.340 --> 02:49.690
So this aggregation is actually the sum of the size column, so it is adding up the values for it.

02:49.950 --> 02:57.540
So it is actually finding all the Sundays, then for all the Sundays, finding all the females,

02:57.840 --> 03:04.090
and it is just adding up the number of people who were there in the group.

03:04.530 --> 03:06.570
So this is how we use groupby.
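
NOTE
A minimal sketch of the grouping described above. It assumes a small hand-made frame in place of the real tips data; the name tips_gb and every value are illustrative, not taken from the session.
```python
import pandas as pd
# Hypothetical miniature stand-in for the tips data; the values are invented.
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Sun", "Sun", "Sun"],
    "sex": ["Male", "Female", "Female", "Female", "Male"],
    "size": [2, 3, 4, 2, 3],
})
# Group by day, then by sex, and add up the party sizes in each subgroup.
tips_gb = tips.groupby(["day", "sex"])[["size"]].sum()
print(tips_gb)
```
The result carries a two-level index (day, then sex), which is what the following cues set out to reshape.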

03:08.520 --> 03:17.070
Now we are able to find out all these values, but it is a little difficult to analyze this

03:17.070 --> 03:17.990
kind of data.

03:18.210 --> 03:24.080
So we want the data to be in a column form where we can actually analyze it more easily.

03:24.510 --> 03:33.130
OK, because what happened now is that when we created this grouping, there is no specific indexing available,

03:33.480 --> 03:40.340
so I cannot easily find a particular value based on some particular index.

03:40.620 --> 03:46.800
I want the data to look something like this so that I can retrieve certain values from it and apply

03:46.800 --> 03:49.860
filters on those based on the columns which we have.

03:50.580 --> 03:52.500
So what we can do is.

03:53.620 --> 04:02.560
We can use unstacking, so we have saved this as the tips groupby data, so what we are doing

04:02.560 --> 04:10.470
for that is we are applying the unstack method on top of it, and we are creating an unstacked version from this.

04:10.660 --> 04:11.890
So let me run this.

04:13.890 --> 04:19.210
So here I have the unstacked version of this groupby.

04:19.740 --> 04:26.370
So what happened here is it has unstacked male and female, and we have the indexing.

04:26.370 --> 04:34.560
So now the index, instead of being a two-level index, is just a single-level index.

04:35.460 --> 04:41.340
So it has only one level of indexing, which is Thursday, Friday, Saturday, Sunday, which I can

04:41.340 --> 04:42.370
refer to easily.

04:42.540 --> 04:47.090
Now it is more understandable and we can use it more easily.

04:47.490 --> 04:49.860
So this is an unstacked version of it.
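
NOTE
A minimal sketch of the default unstack described above, again assuming a hand-made stand-in for the tips data; the names tips_gb and tips_unstack and all values are illustrative.
```python
import pandas as pd
# Hypothetical miniature stand-in for the tips data; the values are invented.
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Sun", "Sun", "Sun"],
    "sex": ["Male", "Female", "Female", "Female", "Male"],
    "size": [2, 3, 4, 2, 3],
})
tips_gb = tips.groupby(["day", "sex"])[["size"]].sum()
# With no argument, unstack moves the innermost index level (sex) into the columns.
tips_unstack = tips_gb.unstack()
print(tips_unstack)
# The row index is now a single level holding only the days.
print(tips_unstack.index.tolist())
```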

04:50.400 --> 04:58.110
Now when we are not giving the parameter, so by default what it is doing is it is giving Thursday, Friday,

04:58.110 --> 04:59.280
Saturday, Sunday here.

04:59.280 --> 05:01.700
And it has unstacked male and female.

05:02.040 --> 05:05.160
So this is how it actually was: the first level here.

05:05.160 --> 05:07.670
We had Thursday, Friday, Saturday, and Sunday.

05:07.950 --> 05:11.050
Then we have male and female as the next level.

05:11.490 --> 05:19.190
So on unstacking, it has removed male and female from the index and kept only the first level, which was actually the day.

05:19.590 --> 05:24.490
Now we will unstack on the basis of an argument, which is zero.

05:24.840 --> 05:30.440
So when we unstack on the basis of this, what happens is it is unstacking the level

05:31.620 --> 05:32.220
zero.

05:32.760 --> 05:36.030
So what it does is it is unstacking level zero.

05:37.520 --> 05:42.380
So it is unstacking level zero and keeping level one.

05:43.990 --> 05:50.920
So male and female remain as they are, and it has Thursday, Friday, Saturday,

05:50.920 --> 05:51.300
Sunday.

05:51.310 --> 05:58.570
So we have Thursday, Friday, Saturday, Sunday in the columns. When we are unstacking on the basis

05:58.570 --> 05:59.130
of one,

05:59.380 --> 06:03.570
what it does is it unstacks level number one.

06:03.760 --> 06:06.280
So the level number one, which is.

06:08.110 --> 06:17.890
female and male, gets unstacked, and it is represented in the columns. So whichever level you want to unstack,

06:17.890 --> 06:21.970
you need to provide that as an input parameter for this function,

06:22.120 --> 06:23.050
the unstack function.
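
NOTE
A minimal sketch of choosing which index level to unstack, assuming the same hand-made stand-in data; the names by_day_cols and by_sex_cols are illustrative.
```python
import pandas as pd
# Hypothetical miniature stand-in for the tips data; the values are invented.
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Sun", "Sun", "Sun"],
    "sex": ["Male", "Female", "Female", "Female", "Male"],
    "size": [2, 3, 4, 2, 3],
})
tips_gb = tips.groupby(["day", "sex"])[["size"]].sum()
# Level 0 is day: the days move to the columns and sex stays in the index.
by_day_cols = tips_gb.unstack(0)
# Level 1 is sex: Female and Male move to the columns and day stays in the index.
by_sex_cols = tips_gb.unstack(1)
print(by_day_cols)
print(by_sex_cols)
```
Here level 1 is also the innermost level, so unstack(1) gives the same result as calling unstack() with no argument.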

06:24.980 --> 06:33.800
Now, further, what we can do is, if we want to look at a new object, like when we are looking at

06:33.800 --> 06:40.870
the columns of the tips unstack, what we get is this multi-level kind of indexing.

06:41.720 --> 06:44.280
So this is the tips unstack.

06:44.900 --> 06:46.940
So in the tips unstack,

06:47.180 --> 06:50.390
We have this multilevel column name.

06:53.290 --> 06:57.970
So it is kind of difficult to understand how this actually works.

06:58.240 --> 07:05.920
So when we talk about the columns, the columns have size, and under size there is male and female.

07:07.590 --> 07:15.060
So this is kind of difficult to interpret how we will filter out the values, how we will apply any

07:15.060 --> 07:19.030
operations on the column; that becomes difficult in this particular case.

07:19.650 --> 07:21.660
So what we can do is.

07:23.670 --> 07:34.150
We can use tips unstack, which was the DataFrame name, and we have this size, Male, which is the tuple

07:34.170 --> 07:35.100
name of the

07:36.600 --> 07:43.530
column. So when we run this, we can get the column as it is, so I'm just giving the tuple,

07:43.830 --> 07:48.480
this particular thing, where this is the first column level and this is my second column level.
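
NOTE
A minimal sketch of picking one column out of multi-level columns by its tuple name, assuming the same hand-made stand-in data; tips_unstack and male_sizes are illustrative names.
```python
import pandas as pd
# Hypothetical miniature stand-in for the tips data; the values are invented.
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Sun", "Sun", "Sun"],
    "sex": ["Male", "Female", "Female", "Female", "Male"],
    "size": [2, 3, 4, 2, 3],
})
tips_unstack = tips.groupby(["day", "sex"])[["size"]].sum().unstack()
# Each column is named by a tuple: (first column level, second column level).
print(tips_unstack.columns.tolist())
# Passing the full tuple returns that single column as a Series indexed by day.
male_sizes = tips_unstack[("size", "Male")]
print(male_sizes)
# Passing only the first level returns the sub-frame that sits under it.
print(tips_unstack["size"])
```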

07:49.110 --> 07:59.290
So I can use this, or what I can do is I can simply copy this entire DataFrame and then

07:59.310 --> 08:03.570
dive into the different levels and use that.

08:03.720 --> 08:08.500
But this is kind of a tricky thing, and it would take some time to understand also.

08:08.790 --> 08:11.990
So there is a better thing.

08:12.000 --> 08:13.230
What we can do is.

08:14.270 --> 08:20.000
Instead of using this kind of a structure, well, if it is convenient, you can use this

08:20.270 --> 08:21.140
kind of a structure.

08:21.440 --> 08:26.150
If not convenient, then what we can do is we can simply use stacking.

08:29.160 --> 08:34.410
So how can we do the stacking? So this is the tips unstack DataFrame.

08:35.420 --> 08:40.520
This is what we have generated, so the tips unstack has the columns

08:41.500 --> 08:44.890
size male and size female, so these are the columns.

08:45.280 --> 08:52.270
So when we do the stack, what it does is it again stacks the columns back for me.

08:52.960 --> 08:56.200
So I again get the multi-level kind of structure.

08:58.300 --> 09:06.130
Now, if you want to stack with the level as zero, it will stack from the zeroth level.

09:07.880 --> 09:14.650
And if you want to stack on the basis of one, then it will stack the first level.
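
NOTE
A minimal sketch of stacking the unstacked frame back, assuming the same hand-made stand-in data; restacked and stacked_level0 are illustrative names. Some pandas 2.x releases may print a FutureWarning about the stack implementation here.
```python
import pandas as pd
# Hypothetical miniature stand-in for the tips data; the values are invented.
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Sun", "Sun", "Sun"],
    "sex": ["Male", "Female", "Female", "Female", "Male"],
    "size": [2, 3, 4, 2, 3],
})
tips_unstack = tips.groupby(["day", "sex"])[["size"]].sum().unstack()
# With no argument, stack moves the innermost column level (sex) back into the index.
restacked = tips_unstack.stack()
print(restacked)
# Stacking column level 0 instead moves the "size" label into the index
# and leaves Female and Male as the columns.
stacked_level0 = tips_unstack.stack(0)
print(stacked_level0)
```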

09:18.320 --> 09:28.040
So if we want to melt the data now, then either we can use stacking and unstacking for changing the

09:28.040 --> 09:31.040
levels of data, or we can use melting.

09:31.730 --> 09:33.710
So let us see how melting works.

09:33.980 --> 09:39.910
So we have this data, which has first, last, height, and weight.

09:40.280 --> 09:44.570
So all the data is just in the form which we usually like to work with.

09:45.140 --> 09:48.890
Now what we can do is we can use .melt.

09:50.020 --> 09:55.560
And we are giving the id variables, and we are giving the id variables on the basis of which

09:55.600 --> 09:59.900
we want to melt it, so we are giving the first and last name.

10:00.130 --> 10:02.140
So what will it do with it?

10:02.140 --> 10:02.890
It will simply

10:04.370 --> 10:14.900
change the layout here, so it will put height and weight, the column names themselves, into a column, and it will

10:14.900 --> 10:16.280
give the value in front of each.

10:16.700 --> 10:18.190
So this is how it will melt it.
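
NOTE
A minimal sketch of the melt call described above. The column names first, last, height, and weight come from the session; the frame name cheese and the row values are illustrative assumptions.
```python
import pandas as pd
# Hypothetical rows; only the column names are taken from the session.
cheese = pd.DataFrame({
    "first": ["John", "Mary"],
    "last": ["Doe", "Bo"],
    "height": [5.5, 6.0],
    "weight": [130, 150],
})
# Keep first and last as identifiers; height and weight become rows
# in a "variable" column, each with its number in a "value" column.
melted = cheese.melt(id_vars=["first", "last"])
print(melted)
```
Two rows with two measured columns turn into four rows, one per person and measurement.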

10:18.200 --> 10:26.810
Just by melting a column, it will increase the number of rows; when we melt something, it

10:26.810 --> 10:31.280
will increase the number of rows, and when we are

10:32.610 --> 10:40.440
converting this, we can also look at this as cheese.set_index, so if you will just run this particular

10:40.440 --> 10:41.160
piece of code.

10:41.610 --> 10:46.590
So here we are just setting first and last as the index names.

10:48.590 --> 10:50.400
So this is what we can do.

10:50.420 --> 10:54.800
We can have first and last as the index names, and

10:55.910 --> 11:02.990
when we do this set_index, it will simply convert this, and we will stack them together, and we can

11:02.990 --> 11:05.200
convert it into the original.
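
NOTE
A minimal sketch of the set_index-then-stack route described above, assuming the same hypothetical rows as a stand-in; stacked and wide are illustrative names. Some pandas 2.x releases may print a FutureWarning about the stack implementation here.
```python
import pandas as pd
# Hypothetical rows; only the column names are taken from the session.
cheese = pd.DataFrame({
    "first": ["John", "Mary"],
    "last": ["Doe", "Bo"],
    "height": [5.5, 6.0],
    "weight": [130, 150],
})
# Make first and last the index, then stack the remaining columns into rows.
stacked = cheese.set_index(["first", "last"]).stack()
print(stacked)
# Unstacking the innermost level brings back the wide height/weight layout.
wide = stacked.unstack()
print(wide)
```
The stacked result holds the same four values that melt produces, but keyed by a three-level index instead of identifier columns.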

11:05.960 --> 11:16.040
So it is a better use of the stack, because it kind of works flawlessly; melt might even be replaced with stacking.

11:16.040 --> 11:23.750
That is my personal choice, though, so I would say you can follow what you like and follow what you

11:23.750 --> 11:33.050
are more comfortable with, but I think stacking works more flawlessly and it's a lot more versatile

11:33.050 --> 11:33.680
in nature.

11:35.280 --> 11:40.410
Now, the other thing which we want to learn here is to convert into dummy variables.

11:40.680 --> 11:46.070
Now, we have discussed categorical variables.

11:46.350 --> 11:54.150
So let's say we have some variable as gender and it has two values, male and female, and we want to

11:54.150 --> 11:56.220
convert that into zero and one.

11:56.820 --> 11:59.760
So what we can do is we can create a dummy.

12:00.870 --> 12:03.870
So how we can do it is, here we have the data.

12:05.480 --> 12:11.360
So it has this column sex, which has the values Female and Male, so what I want to do here is I want to

12:11.360 --> 12:12.520
get the dummies for this.

12:12.520 --> 12:15.310
So what I can do is I can simply use get_dummies.

12:16.730 --> 12:25.510
So it will convert the column sex to sex_Male and sex_Female, and wherever

12:25.520 --> 12:32.360
the value of sex was Male, it will set the value to one under the Male column.

12:32.660 --> 12:38.710
And wherever the value was Female, under the Female column it will be one.

12:39.260 --> 12:43.660
So this is how we can create dummy columns instead of having this categorical data.
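
NOTE
A minimal sketch of get_dummies on the sex column, assuming a tiny hand-made frame with invented values. The drop_first variant at the end is one possible way to avoid a redundant column and is not necessarily the method the session has in mind.
```python
import pandas as pd
# Hypothetical miniature frame; the values are invented.
tips = pd.DataFrame({"sex": ["Female", "Male", "Male"], "size": [2, 3, 2]})
# One 0/1 column per category, named sex_<category>.
dummies = pd.get_dummies(tips, columns=["sex"], dtype=int)
print(dummies)
# Assumption: drop_first=True drops the first category (Female here),
# since its value is implied by the remaining column.
reduced = pd.get_dummies(tips, columns=["sex"], drop_first=True, dtype=int)
print(reduced)
```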

12:43.820 --> 12:49.180
Now the same thing can be done when we have multiple categories.

12:49.190 --> 12:51.440
Let's say we have, say,

12:51.690 --> 12:58.640
nine; then we can have as many categories as we want, so we can simply use get_dummies for

12:58.640 --> 12:58.990
that.

12:59.210 --> 13:09.190
And there are other methods also by which we can convert something into a dummy and then put certain

13:09.320 --> 13:15.680
filters on top of that so that we don't have an extra number of categories

13:15.680 --> 13:22.850
and too many columns. We will see, when we talk about feature selection, how we reduce the

13:22.850 --> 13:23.780
number of columns.

13:23.990 --> 13:29.930
But for now, you just need to know that we have this get_dummies function, which helps us in converting

13:30.230 --> 13:33.260
the categorical columns into dummy variables.

13:33.530 --> 13:42.770
And for the ordinal columns, we will have another process which will allow us to convert those

13:42.950 --> 13:46.460
ordinal values into numeric values.

13:47.860 --> 13:52.230
In the next session, we will learn about somebody else.

13:52.350 --> 13:55.180
We will see how we can generate that.