WEBVTT

00:01.440 --> 00:08.700
Now, let us go ahead and look at different details of this data. So let me take this data,

00:08.700 --> 00:15.820
with its columns age, job, marital, education, default, balance; all of these columns are present.

00:16.260 --> 00:20.640
So the first thing we can do is check the different columns it has.

00:20.910 --> 00:24.530
So we can say df.columns.

00:26.720 --> 00:33.650
This gives us the list of the different columns that are present, and an important function is df.info(),

00:33.700 --> 00:38.180
which gives us top-level information about the data frame.
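
NOTE A minimal pandas sketch of the two calls above. The five-row frame is made up for illustration and only borrows some column names mentioned in the video; the real dataset is not reproduced here.
```python
import pandas as pd
# Made-up sample; the dataset in the video has 45,211 rows and 17 columns.
df = pd.DataFrame({
    "age": [58, 44, 33, 47, 80],
    "job": ["management", "technician", "entrepreneur", "blue-collar", "retired"],
    "marital": ["married", "single", "married", "married", "married"],
    "education": ["tertiary", "secondary", "secondary", "unknown", "primary"],
})
# Column labels as a plain list.
print(df.columns.tolist())
# Row count, each column's dtype and non-null count, and memory usage.
df.info()
```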

00:40.620 --> 00:51.240
So here we can see that there are 45,211 rows of data, and

00:52.470 --> 00:55.140
these are the different columns.

00:56.430 --> 01:01.290
In the info output, this first one is age.

01:04.300 --> 01:05.560
Then there are

01:06.830 --> 01:18.950
the job, the marital status, the education level, default, balance, housing details,

01:18.950 --> 01:24.570
loan, and the other details related to each record.

01:24.620 --> 01:26.120
Next, what we can check is

01:32.520 --> 01:40.200
df.describe(), which gives us a summary of the numerical columns. Out of the numerical columns,

01:40.200 --> 01:47.510
we can see that the age column has a count of 45,211.

01:47.760 --> 01:55.830
We also get the mean age and the standard deviation; the minimum age is 18, and the maximum age

01:55.830 --> 01:56.960
is ninety five.

01:57.360 --> 02:04.680
And you can see that the 75th percentile has a value of 48,

02:04.920 --> 02:10.580
which means that there is a huge gap between the 75th percentile and the maximum value.

02:10.890 --> 02:16.290
So we can consider getting rid of the outlier values in the age column.
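
NOTE The transcript does not say how the age outliers would be removed. One common choice, assumed here, is the 1.5 * IQR rule built on the quartiles that df.describe() reports. The ages below are made up.
```python
import pandas as pd
# Hypothetical ages; the real column has 45,211 values.
df = pd.DataFrame({"age": [18, 25, 31, 35, 39, 42, 48, 51, 60, 95]})
# count, mean, std, min, quartiles and max for each numeric column
print(df.describe())
q1 = df["age"].quantile(0.25)
q3 = df["age"].quantile(0.75)
iqr = q3 - q1
# Keep rows whose age lies within 1.5 * IQR below Q1 or above Q3.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
trimmed = df[df["age"].between(lower, upper)]
print(len(df), len(trimmed))
```
Here the gap between Q3 and the maximum is what flags 95; other thresholds (for example a fixed upper age) are equally valid choices.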

02:17.530 --> 02:25.120
So these are a few insights we can get from these columns and their summary statistics. Apart

02:25.120 --> 02:30.970
from that, we can check the data types of the columns, so we can use df.dtypes.

02:34.240 --> 02:42.130
This gives us the data type of each column, so we can see that age is an integer and job is object here;

02:42.130 --> 02:50.140
the object data type means that these are strings, and you can see the types of the remaining columns.

02:50.560 --> 02:57.130
So we can check whether a particular column is numeric or non-numeric and take care

02:57.130 --> 02:57.670
of that.

03:00.530 --> 03:08.600
So here it seems that all the data types are fine, but in case we have a column that should actually

03:08.600 --> 03:15.290
be numeric in nature but is stored as strings, then we can change the data type of the column itself.
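
NOTE A sketch of checking dtypes and converting a column that was read in as strings. The "balance" values are hypothetical; pd.to_numeric is one way to convert, and astype would also work when every value is clean.
```python
import pandas as pd
# Hypothetical frame where "balance" arrived as strings.
df = pd.DataFrame({
    "age": [58, 44, 33],
    "job": ["management", "technician", "entrepreneur"],
    "balance": ["2143", "29", "2"],
})
# age is an integer dtype; job and balance hold strings (object, or str in newer pandas)
print(df.dtypes)
# Parse the strings as numbers; errors="coerce" would turn unparseable entries into NaN.
df["balance"] = pd.to_numeric(df["balance"], errors="coerce")
print(df.dtypes)  # balance is now numeric
```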

03:15.770 --> 03:17.870
So these are the different things we can do.

03:19.320 --> 03:24.270
Then we can check the shape of the data frame using df.shape.

03:26.910 --> 03:29.100
So this is the shape of the data frame.

03:29.110 --> 03:32.610
There are 17 columns and this many rows.
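
NOTE A small sketch of df.shape on a made-up frame. shape is an attribute holding a (rows, columns) tuple, so it is written without parentheses.
```python
import pandas as pd
# Hypothetical three-row, four-column sample.
df = pd.DataFrame({
    "age": [58, 44, 33],
    "job": ["management", "technician", "entrepreneur"],
    "marital": ["married", "single", "married"],
    "education": ["tertiary", "secondary", "secondary"],
})
rows, cols = df.shape  # unpack the (rows, columns) tuple
print(rows, cols)  # 3 4
```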

03:33.750 --> 03:41.220
Now, if you want to filter out some data from this particular data frame, then there are several ways

03:41.220 --> 03:41.940
of doing it.

03:42.210 --> 03:46.210
One is using integer location, so we can say df.iloc,

03:47.530 --> 03:57.220
and inside this we can give the position values. So let us say I want to view the

03:57.220 --> 04:04.510
row at position 1 and the column at position 2.

04:06.590 --> 04:08.270
So it has the value of single.

04:10.470 --> 04:15.690
We can see the same thing using loc, with the row label and the column name.

04:18.640 --> 04:21.480
So you can see this value is also single.
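
NOTE A sketch of the positional and label-based lookups above, on a made-up frame whose third column is "marital". Positions count from 0, so iloc[1, 2] is the second row and third column.
```python
import pandas as pd
df = pd.DataFrame({
    "age": [58, 44, 33],
    "job": ["management", "technician", "entrepreneur"],
    "marital": ["married", "single", "married"],
})
# iloc is purely positional: row position 1, column position 2.
print(df.iloc[1, 2])  # single
# loc uses labels: row label 1 (the default integer index) and the column name.
print(df.loc[1, "marital"])  # single
```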

04:21.520 --> 04:29.140
Now let us say we want a sequence of values. Let's say I want to see from the fourth row onward, and

04:29.140 --> 04:31.660
I want to see the values in the fifth column.

04:33.430 --> 04:34.850
So I can do it this way.

04:35.320 --> 04:38.870
Using iloc, though, is not really recommended.

04:39.070 --> 04:47.770
So instead of using iloc, we usually use the loc function, because iloc creates

04:48.490 --> 04:52.780
a dependency on column positions and causes confusion while working with data frames.

04:52.960 --> 04:59.250
So instead of using iloc, we actually work with the column names themselves.
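
NOTE One reading of the range example (rows from the fourth onward, values in the fifth column), shown both positionally and by name on a made-up frame. Note that iloc slice ends are exclusive while loc slice ends are inclusive.
```python
import pandas as pd
# Made-up sample; "default" is the fifth column (position 4).
df = pd.DataFrame({
    "age": [58, 44, 33, 47, 80, 35],
    "job": ["management", "technician", "entrepreneur", "blue-collar", "retired", "management"],
    "marital": ["married", "single", "married", "married", "married", "single"],
    "education": ["tertiary", "secondary", "secondary", "unknown", "primary", "tertiary"],
    "default": ["no", "no", "no", "no", "no", "yes"],
})
# Positional: rows from position 3 (the fourth row) onward, column position 4.
by_position = df.iloc[3:, 4]
# Label-based: same cells, but it names the column, so it still works
# if the column order changes. The end label 5 is included.
by_label = df.loc[3:5, "default"]
print(by_position.tolist())  # ['no', 'no', 'yes']
print(by_label.tolist())     # ['no', 'no', 'yes']
```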

05:00.370 --> 05:02.050
Let us try to filter the data.

05:03.030 --> 05:08.220
So for filtering, we can use

05:09.470 --> 05:20.360
df and then give the column name in square brackets, for example df['age']. This gives us the column. Now let's

05:20.360 --> 05:25.070
say we don't want to get only one column, but multiple columns.

05:25.280 --> 05:27.500
In that case, we will have to pass a list of column names.

05:32.550 --> 05:43.470
So we give the first column in the list, then the next column which I want, which is

05:47.380 --> 05:51.200
So I can get these columns now.
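
NOTE A sketch of single- and multi-column selection on a made-up frame. One name in brackets gives a Series; a list of names (double brackets) gives a DataFrame.
```python
import pandas as pd
df = pd.DataFrame({
    "age": [58, 44, 33],
    "job": ["management", "technician", "entrepreneur"],
    "education": ["tertiary", "secondary", "secondary"],
})
# A single column name returns a Series.
ages = df["age"]
# A list of names returns a DataFrame with just those columns, in that order.
subset = df[["age", "education"]]
print(type(ages).__name__, type(subset).__name__)  # Series DataFrame
```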

05:51.330 --> 05:57.060
Now let's say we don't want just these columns.

05:57.080 --> 06:04.330
We don't want all the details of the data; we want to filter them on the basis of a particular condition.

06:04.930 --> 06:09.620
So in that case, we will have to use loc.

06:10.000 --> 06:11.200
So what we do is.

06:12.910 --> 06:17.290
We can simply say df.loc.

06:18.490 --> 06:24.530
Inside this, we give the rows which we are fetching.

06:25.120 --> 06:27.320
So we can give the condition.

06:27.800 --> 06:33.130
So let's say we want to give a condition on df.

06:36.540 --> 06:37.110
Now.

06:38.060 --> 06:43.890
We have to give each condition in round brackets, and then we combine the conditions one at a time.

06:44.090 --> 06:45.910
Now, what is the condition which I want to have?

06:45.920 --> 06:46.510
Let us see.

06:46.520 --> 06:50.580
I want the rows where the age of the person is more than 75.

06:51.080 --> 06:56.120
OK, so for that I can simply say df,

06:58.570 --> 07:02.350
and then the column name, so df['age'].

07:03.380 --> 07:08.460
I want this age to be greater than 75.

07:09.450 --> 07:18.750
Now I get all the data where the age is greater than 75. Now let's say I want another condition

07:18.780 --> 07:22.950
as well; in that case, I will add another condition.

07:22.950 --> 07:26.580
So let us say I want an AND condition, OK?

07:26.790 --> 07:34.620
And I want to give another condition, which is having education as primary.

07:36.060 --> 07:37.550
So I would say df,

07:38.660 --> 07:40.490
and then the column name,

07:41.990 --> 07:42.980
Education.

07:44.740 --> 07:47.830
And this education value has to be primary.

07:48.220 --> 07:50.440
So I say is equal to

07:51.640 --> 07:52.000
'primary'.

07:53.820 --> 07:54.960
OK, so.

07:56.550 --> 07:59.520
Like this, I can get the rows where age is greater than

07:59.550 --> 08:01.690
75 and education is primary.

08:01.980 --> 08:02.790
And like this,

08:02.790 --> 08:04.950
I can keep on adding different conditions.
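
NOTE A sketch of the two filters above on made-up data. Each comparison produces a boolean Series; & combines them element by element (| would give OR), and each comparison needs its own parentheses because & binds more tightly than > and ==.
```python
import pandas as pd
df = pd.DataFrame({
    "age": [58, 44, 80, 77, 90, 35],
    "education": ["tertiary", "secondary", "primary", "secondary", "primary", "primary"],
})
# One condition: keep rows where age is greater than 75.
older = df.loc[df["age"] > 75]
# Two conditions joined with &: age greater than 75 AND education equal to "primary".
older_primary = df.loc[(df["age"] > 75) & (df["education"] == "primary")]
print(older_primary)
```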

08:05.190 --> 08:10.160
Now, once I'm adding conditions, let's say I want to have these conditions,

08:10.350 --> 08:14.910
and I also want to choose the columns which I want to have.

08:15.150 --> 08:19.170
So what I can do is keep these conditions.

08:19.380 --> 08:26.550
So I have given these conditions; now after these conditions, I just give a comma, and after that

08:26.610 --> 08:29.850
the columns which I want to have.

08:29.850 --> 08:34.130
So I want age and education.

08:34.140 --> 08:37.500
So let me give age and education: I can give age,

08:38.500 --> 08:43.330
then education, and let me run this.

08:46.080 --> 08:53.310
So I get the age and education columns based on the criteria which I have.
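
NOTE A sketch of loc with a row condition before the comma and a list of columns after it. The data is made up; the "marital" column is there only to show that it gets left out.
```python
import pandas as pd
df = pd.DataFrame({
    "age": [58, 44, 80, 77, 90, 35],
    "marital": ["married", "single", "married", "single", "married", "married"],
    "education": ["tertiary", "secondary", "primary", "secondary", "primary", "primary"],
})
# df.loc[row condition, column list]
subset = df.loc[(df["age"] > 75) & (df["education"] == "primary"), ["age", "education"]]
print(subset)
```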

08:56.300 --> 09:04.550
Now, these conditions work just fine. In case we want to have a negative

09:04.550 --> 09:05.140
condition,

09:05.450 --> 09:10.540
in that case, we can simply put the negation sign, a tilde (~), in front of the condition.

09:11.420 --> 09:17.390
So we just put the tilde in front of it, and it will give everything apart from this particular condition.
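
NOTE A sketch of negating a combined condition with ~ on made-up data. Python's own `not` cannot be used here, because the truth value of a whole Series is ambiguous and pandas raises a ValueError.
```python
import pandas as pd
df = pd.DataFrame({
    "age": [58, 44, 80, 77, 90, 35],
    "education": ["tertiary", "secondary", "primary", "secondary", "primary", "primary"],
})
mask = (df["age"] > 75) & (df["education"] == "primary")
# ~ flips every True/False in the mask, so this keeps all the other rows.
everything_else = df.loc[~mask]
print(everything_else)
```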

09:18.390 --> 09:24.900
Now, another thing we can do: let's say we want to drop a particular column. In case

09:24.900 --> 09:30.690
we want to drop a particular column, we can say df.drop,

09:33.180 --> 09:34.930
and in this function,

09:35.190 --> 09:38.330
we can give the list of columns which we want to drop.

09:38.940 --> 09:43.250
So I give it the list of column names,

09:45.630 --> 09:47.790
each in quotes.

09:49.850 --> 10:05.040
After that, I give the axis; I set axis equal to 1, since we are dropping columns. This is what we want.

10:05.080 --> 10:06.440
So I can simply run this.

10:08.380 --> 10:14.010
So once I run this, it will remove those two columns from the data frame.
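
NOTE A sketch of dropping two columns from a made-up frame. axis=1 tells drop to look for the labels among the columns; the default axis=0 would look for them in the row index. The columns= keyword is an equivalent, more explicit spelling.
```python
import pandas as pd
df = pd.DataFrame({
    "age": [58, 44, 33],
    "job": ["management", "technician", "entrepreneur"],
    "marital": ["married", "single", "married"],
})
trimmed = df.drop(["age", "job"], axis=1)
same = df.drop(columns=["age", "job"])
print(trimmed.columns.tolist())  # ['marital']
```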

10:14.380 --> 10:23.190
Now, one thing you will notice here is if we run df.head() again.

10:25.330 --> 10:25.870
So.

10:27.290 --> 10:35.450
If you check, the head still has those columns, even though we dropped them. So to actually make

10:35.450 --> 10:40.610
this change permanent, we have to assign the result back to df; otherwise we will see the same

10:43.370 --> 10:48.410
thing over and over again.

10:48.420 --> 10:54.420
Without that, drop just shows you what the result would be.

10:54.470 --> 10:56.960
So this is one way.

10:57.520 --> 10:59.690
And another way is this:

11:01.150 --> 11:03.770
we can write inplace equal to True.

11:04.860 --> 11:12.170
So we can simply type inplace=True inside the drop call.
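
NOTE A sketch of why the dropped columns came back, on a made-up frame: drop returns a new DataFrame, so the change only sticks if you assign it back or pass inplace=True.
```python
import pandas as pd
df = pd.DataFrame({
    "age": [58, 44],
    "job": ["management", "technician"],
    "marital": ["married", "single"],
})
# drop returns a new DataFrame and leaves df untouched, so this line alone changes nothing.
df.drop(columns=["job"])
print(df.columns.tolist())  # ['age', 'job', 'marital']
# Option 1: assign the returned frame back to df.
df = df.drop(columns=["job"])
# Option 2: modify df in place. This returns None, so do not assign the result.
df.drop(columns=["marital"], inplace=True)
print(df.columns.tolist())  # ['age']
```
Reassigning keeps the code in a chainable style; inplace=True does the same job without the assignment.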

11:13.240 --> 11:15.420
So we can simply say inplace

11:16.620 --> 11:28.530
equal to True. Another way of removing a column is to simply delete it.

11:29.480 --> 11:35.780
So what we can do is simply say del, and then we can give the name

11:39.870 --> 11:48.120
of the data frame and then the column name, age. Then it will delete the age column from the data frame.

11:48.840 --> 11:51.480
So if we look at df again,

11:55.620 --> 12:02.820
now df will not have the age column. Another thing we can do is delete the

12:02.820 --> 12:05.040
entire data frame using del,

12:05.670 --> 12:10.470
by simply saying del df, like this.
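
NOTE A sketch of both uses of del on a made-up frame. del df["age"] removes the column from df itself; del df removes the variable name, and the DataFrame is freed once nothing else refers to it.
```python
import pandas as pd
df = pd.DataFrame({
    "age": [58, 44],
    "job": ["management", "technician"],
})
# Remove one column in place; nothing is returned.
del df["age"]
remaining = df.columns.tolist()
print(remaining)  # ['job']
# Remove the name df altogether; using df after this raises NameError.
del df
```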

12:11.280 --> 12:14.460
So these are a few basic operations which we have.

12:14.790 --> 12:19.200
Apart from this, there are several other operations we would have to do.

12:19.530 --> 12:27.480
These are operations which will allow us to analyze the data initially and view the details

12:27.510 --> 12:28.010
properly.

12:28.310 --> 12:34.550
But we will have to do a lot more on top of these basic operations.

12:34.740 --> 12:43.940
So we will see the entire process, which we will be following for data transformation and data cleaning.

12:44.160 --> 12:46.400
So let us go to that.

12:46.440 --> 12:52.440
We are going to see this in the next session: how we will work with actual data and what are the things

12:52.440 --> 12:53.740
you need to take care of.

12:53.760 --> 12:58.560
What are the things you need to consider while analyzing and modifying your data?

12:58.620 --> 13:01.130
That is something we will look at after this session.
