WEBVTT

00:00.450 --> 00:06.660
Hello, everyone, in this session, we will discuss about sorting and merging of data frames, so the

00:06.660 --> 00:12.090
first thing which we will discuss is sorting data by one or more columns in ascending or descending

00:12.270 --> 00:12.700
order.

00:13.260 --> 00:18.510
Then we will discuss how we can combine data frames vertically or horizontally.

00:18.870 --> 00:24.640
Then we will learn about merging them, using keys left, right, in and out the joint.

00:25.050 --> 00:26.760
OK, so let's begin with it.

00:26.790 --> 00:30.890
So initially we will import Fondas and the number of.

00:35.760 --> 00:44.550
After importing these, I will be creating a new data frame, so this data frame is created from values.

00:46.500 --> 00:53.010
It has a shape of 20 cross food and it has values from two to eight.

00:53.430 --> 00:57.060
And the column names are A, B, C and the.

00:59.790 --> 01:06.300
This is the data frame which has been created, this data frame has 20 rows.

01:07.470 --> 01:16.560
And four columns now we will try to sort these using column E to sort these using column E, we will

01:16.560 --> 01:18.690
use D.F. dot sort values.

01:18.870 --> 01:26.820
So our values is a function which is used to sort of values in a Dufrene and we give the column by which

01:26.820 --> 01:33.340
we want to sort this out as the initial, the first parameter and the next barometer is in place.

01:33.930 --> 01:40.320
So in place operator here is means that if we give in place equal to do, then it will change the data

01:40.320 --> 01:41.130
frame itself.

01:41.580 --> 01:44.700
So if I change this.

01:46.020 --> 01:48.480
Then it will modify the data frame itself.

01:48.720 --> 01:54.270
So if I print the data frame, the data frame will be sorted with respect to it.

01:54.780 --> 02:00.960
Now, another thing, what I can do is I could have not given in basically due to if I did not give

02:00.960 --> 02:05.400
in physical truth, then the changes will not take effect in the device.

02:05.400 --> 02:11.640
So in that case, if I am not giving in basically true, then I would have to write it in a form like

02:11.640 --> 02:13.500
D.F. is equal to the sort.

02:13.830 --> 02:15.300
And without this.

02:17.270 --> 02:22.630
So if I am not writing in, please, then I have to give this particular format to make it effective,

02:22.970 --> 02:29.820
otherwise I can simply the idea of sort values, I'm simply saying, please, equal to two.

02:30.480 --> 02:37.840
Now, this is for sorting one column if you want to sort this column in descending order.

02:38.270 --> 02:41.850
And then what we can do is there is of the bottom of it, just ascending.

02:42.140 --> 02:45.440
So if by default ascending is said to be true.

02:46.390 --> 02:53.220
But in case we want to sorted in descending order, then we will have to write a sending people to Fuld's.

02:53.500 --> 02:59.320
So if you will put this ascending hold for us, then we saw this in descending order.

02:59.320 --> 03:03.040
So you can see that the column is decreasing.

03:03.580 --> 03:05.080
Now on a IS.

03:05.110 --> 03:09.310
We can also sort the values on the basis of multiple columns.

03:09.490 --> 03:15.850
So when we are sorting columns on the basis of multiple columns, in that case we will have to provide

03:16.000 --> 03:20.410
the argument of column which was given here.

03:20.800 --> 03:27.040
Initially we were giving just one column name here, but now we will have to give a list of column names.

03:27.220 --> 03:29.440
So we will give a list of column names.

03:29.710 --> 03:36.210
And based on the list of column names, we can give ascending values again in form of list.

03:36.640 --> 03:38.210
So you can see the difference here.

03:38.230 --> 03:40.180
Initially we were giving just.

03:41.590 --> 03:49.510
A Anderson default falls now we are giving with the values in a list for them, so once we do this,

03:50.020 --> 03:52.510
the values are sorted accordingly.

03:52.510 --> 03:55.270
So it has been sorted based on that.

03:55.510 --> 04:03.670
And on top of that, the values are sorted on the basis of the and then the values have been sorted

04:03.670 --> 04:11.320
on the basis of C, B in ascending order, the C in descending order of.

04:13.240 --> 04:22.180
Next thing, what we can do as of now, this source, the data by Colombe in ascending order but sorting

04:22.180 --> 04:24.350
by golems happens in descending order.

04:24.550 --> 04:33.280
So initially it was sort all the values of B in ascending order and then the values of B, that here

04:33.280 --> 04:35.950
we have three values as two.

04:36.250 --> 04:36.640
Right.

04:36.760 --> 04:46.630
So these values will be sorted for the on the basis of C, so it will force or B and after sorting B,

04:46.810 --> 04:53.160
the values corresponding to B will be sorted and these values will be sorted accordingly.

04:55.260 --> 05:04.920
Now we'll look at how we can combine different data set, so here I'm creating a data frame V1 and this

05:04.920 --> 05:10.320
data frame consists of volumes of letters and numbers.

05:10.330 --> 05:14.250
So there are letters and the numbers, one and two.

05:14.610 --> 05:16.690
And I've given the column names also.

05:17.550 --> 05:21.150
Similarly, I have created another data frame with value.

05:21.180 --> 05:23.070
See these three four.

05:23.340 --> 05:26.150
Gandal So I just run this now.

05:28.120 --> 05:33.640
So these are the values which has been created, these are the things which have been created now,

05:34.360 --> 05:41.520
as you always remember, those are equivalent to Zettl and columns are always equivalent to one.

05:41.770 --> 05:43.510
So we always did this.

05:43.510 --> 05:46.990
We just remembered using this rose.

05:48.150 --> 05:57.990
Former columns, and this is written in a way zero one one, so whenever I see Zettl, it means we are

05:57.990 --> 06:02.640
working with rules and whenever I see one, it means we are working with columns.

06:06.880 --> 06:14.790
So here what we are doing is we are concatenating, we are combining, we are just fisting them together.

06:14.920 --> 06:22.120
So we are basting our data from one data frame two together and we are basting them on the basis of

06:22.120 --> 06:26.370
access to this means that we are facing them always.

06:26.530 --> 06:29.380
So that means the rules of the event.

06:29.380 --> 06:36.190
One will be present at the top and after that the rules of data frame will be present.

06:36.880 --> 06:42.350
Now, whenever the values for the data frame one are missing.

06:42.580 --> 06:47.400
So as you see in data frame one, we do not have the column animals present.

06:47.830 --> 06:49.210
So what happens is.

06:50.160 --> 06:53.910
Whatever the value is not present, it gives not a number there.

06:54.690 --> 06:59.300
So here we have concatenated these on the basis of X is zero.

07:00.250 --> 07:06.950
OK, so in case more columns were present in data frame to all those columns will be added here and

07:06.980 --> 07:09.200
the values will be given as not a number.

07:09.900 --> 07:16.730
Now, similarly, if we are doing the same thing now, if you one more thing to notice here is the indexes

07:16.730 --> 07:18.680
have we have not been modified.

07:18.890 --> 07:22.340
So in my first data frame, the index was zero one.

07:22.340 --> 07:24.070
Here again, the index was zero one.

07:24.080 --> 07:29.320
So while sticking them together, it has actually retained the original indexing.

07:29.600 --> 07:35.180
So if you want to make the indexing again, correct the index, then what we can do is we can simply

07:35.390 --> 07:38.000
ignore index and mean said that through.

07:38.270 --> 07:39.920
So when we said this thing to.

07:41.260 --> 07:42.430
What happens is.

07:43.660 --> 07:45.780
Let me just run this again.

07:47.080 --> 07:55.660
So when we put this concatenated this, so what happens is it will simply retain the create a new index

07:55.660 --> 07:59.680
for this so it will change the indexes from zero one zero one two zero one, two, three.

08:00.340 --> 08:05.840
Now, another thing which we can try here is I'm creating a new data frame, data frame three.

08:06.070 --> 08:14.260
Now, this data frame consists of a column Unimin and another column name, and it has three rows of

08:14.260 --> 08:17.940
data, which is board volume monkies, your Staiger John.

08:18.160 --> 08:21.050
So these are the values which are present in this particular detainee.

08:22.910 --> 08:27.030
So we are trying to concatenate these by axis one.

08:27.290 --> 08:31.510
Now, what this means is that we will have this particular data frame.

08:32.030 --> 08:33.800
We will have the data frame to.

08:35.380 --> 08:41.050
This particular data frame, this is the data pointing to a different data frame to heal forced.

08:43.770 --> 08:46.350
So this is day two, sorry.

08:53.660 --> 08:59.410
This is the data frame do this is to be the frame to be, and what we're trying to do is we are going

08:59.420 --> 09:01.760
to be needing data from two would be to frame three.

09:01.910 --> 09:08.540
Now, what happens is the full data frame has two rules and three columns.

09:09.020 --> 09:12.970
The second data frame has three rows and two columns.

09:13.190 --> 09:18.440
So as a result, then we get five columns.

09:18.650 --> 09:22.040
That is letter number, animal animal.

09:22.250 --> 09:28.790
I mean, this is what we get by concatenate them with access one, which is column based concatenation.

09:29.150 --> 09:34.690
And what we get is we again get because the rule number is two here and three here.

09:34.700 --> 09:36.380
So it creates three rules.

09:36.620 --> 09:45.370
And wherever we don't have values from this particular data frame, it makes them as not a number.

09:45.530 --> 09:48.170
So it improves the value as not a number.

09:48.350 --> 09:54.980
And once we have done that, once we have done all the sorting, once we have done the combination,

09:55.490 --> 10:00.770
whatever, merging, whatever we are trying to do, what we can do is we can then improve the values,

10:00.890 --> 10:02.450
how we view the values.

10:02.450 --> 10:03.560
We see that little.

10:03.710 --> 10:10.000
But for now, just try to understand that these values will get generated and it will not have an empty

10:10.070 --> 10:10.340
value.

10:10.340 --> 10:11.270
It will have more than that.

10:12.470 --> 10:14.720
Now, the next thing what we can try is.

10:16.100 --> 10:21.620
So we Maskin got donated by Accies one, similarly, we are trying to concatenated Dufrene one, we

10:21.630 --> 10:22.490
did frame three.

10:22.640 --> 10:27.350
So what it does is it again creates the values, not no matter what.

10:27.350 --> 10:28.610
The values were not present.

10:32.740 --> 10:40.390
Now, let us discuss how we can move to different data frames, so here I am creating other data from

10:40.390 --> 10:41.410
the data frame.

10:41.410 --> 10:48.430
One contains one column customer data, which contains volumes one to five, then out of the column

10:48.430 --> 10:49.570
contains the product.

10:50.140 --> 10:54.880
Then that is another data frame which contains the customer I.D..

10:56.180 --> 10:57.920
And along with the customer itis.

10:59.400 --> 11:01.800
It contains the state now.

11:01.970 --> 11:03.960
Now, let me run this.

11:05.300 --> 11:12.800
So this is the data frame when this is the data frame, too, so here we have customarily and you can

11:12.800 --> 11:15.890
see that for the similarity, one, two, three, four, five.

11:15.900 --> 11:17.120
These are unique values.

11:17.420 --> 11:20.480
And we have video footage which are kind of repeating.

11:21.320 --> 11:26.000
Then here we have another the customer I.D., which are different from the customer I.D., which are

11:26.000 --> 11:27.230
present in this data.

11:27.260 --> 11:27.770
Name one.

11:29.920 --> 11:38.890
So we are next trying to merge these data frames, so why we are merging the data frames, what we are

11:38.890 --> 11:43.570
doing is we are merging the data from one with data frame two.

11:44.930 --> 11:52.460
And we are merging it on customer already, so we are seeing feed thought much and we say if one the

11:52.640 --> 11:59.930
to, then we give the key on which we want to merge these ideas and how we want to merge this.

12:00.170 --> 12:02.810
So this is the type of joint which we are trying to have.

12:03.140 --> 12:04.600
So this is in our joint.

12:05.360 --> 12:14.780
So when we say in our joint, that means that we want to keep only the rules which are common between

12:14.780 --> 12:17.390
this Dufrene one and the same two.

12:17.750 --> 12:22.010
And we want to keep the commonness on the basis of the customer.

12:22.700 --> 12:24.830
So we have customer varieties.

12:24.830 --> 12:29.140
One, two, three, four, five year and we have three, four, five, six, seven.

12:29.390 --> 12:33.190
So the common customer IDs are three, four and five.

12:33.440 --> 12:39.080
So we are expecting to have only three rules in this particular little thing.

12:39.110 --> 12:42.090
So while I run this, I get three rules from the state of Maine.

12:42.500 --> 12:50.660
Now, the next is we want to join on customer idea again, but we are wanting to join it on the outer

12:50.660 --> 12:50.970
join.

12:51.140 --> 12:58.520
So what our vision gives us, it gives us rules from all of the things, all the rules from the date

12:58.520 --> 12:58.940
of names.

12:59.120 --> 13:00.170
So we will get.

13:01.330 --> 13:07.730
Customer added one, two, seven, all of these customer IDs would be present.

13:08.020 --> 13:14.560
So once we've done this, we get customer I.D. one to seven and all the details in front of them now

13:14.560 --> 13:20.230
because customer IDs, six and seven were missing in the frame one.

13:20.380 --> 13:23.830
So the product values have been imputed as not the number here.

13:24.190 --> 13:28.180
And because the customer IDs, one and two are missing in between two.

13:28.390 --> 13:31.830
So that is the reason why they see this coming as not the number here.

13:33.120 --> 13:37.390
Now, next is customarily merging on customarily an.

13:38.640 --> 13:43.740
I joined this group, so I left join will basically give us.

13:45.060 --> 13:50.110
These are the two data frames which we have and we are keeping data from one on the left hand side.

13:50.340 --> 13:54.990
So this way we will get the columns which are from the.

13:56.950 --> 14:03.280
We will get the rules which are present in the next day, the family, so the rules are one to five.

14:04.430 --> 14:10.660
So when we are doing left joined, we are getting close one to five and the details which are missing

14:10.670 --> 14:16.010
on the right side, thebus that has been imputed, that has been created.

14:16.580 --> 14:23.650
Now, similarly, when we have this right joint, so the columns of the rules are picked from the right

14:23.660 --> 14:27.700
inside the ring, which is date of name to which has rules three to seven.

14:27.710 --> 14:29.060
So all of these are present.

14:29.330 --> 14:35.720
And the product, the product, the definition does not have the rule six and seven.

14:35.930 --> 14:38.450
So that is why the number has been created here.

14:39.520 --> 14:42.830
Now, a similar kind of things you can try to own here.

14:42.850 --> 14:50.530
We have more combinations so you can see we have the frame 20 frame three in the frame four.

14:50.740 --> 14:56.990
So now here what we have is in the frame three, we have values repeated.

14:57.940 --> 15:01.120
So the date of the T has to twice.

15:02.090 --> 15:03.830
And it has a lot on unbottled.

15:04.830 --> 15:12.760
While the Dufrene Ford has won the bidding twice and it doesn't have an number here.

15:13.530 --> 15:15.000
So what we do with.

15:17.250 --> 15:25.070
So now when we merge these different three days of rainfall, one customer arrive and have inner join,

15:25.380 --> 15:33.510
then what we get as a result is because the Zatarain three ended up rainfall has customer already one.

15:36.760 --> 15:37.330
For.

15:38.600 --> 15:47.060
In phone, so we get one, two and four, and the repetitions with the president are coming as it is

15:47.420 --> 15:54.230
so because the best humanity has repetition in the deafening for so it is taking what the values and

15:54.620 --> 15:55.940
values on both of these.

15:56.210 --> 16:00.280
And because the cost when I was presenting the different three twice.

16:00.470 --> 16:04.430
So it is generating values on top of these and facing the values accordingly.

16:05.240 --> 16:11.130
Similarly, all the violence will be imputed when we are generating a left join order, I join.

16:11.480 --> 16:14.490
So please try all these scenarios.

16:15.260 --> 16:15.770
Thank you.
