WEBVTT

00:01.370 --> 00:08.330
In this session, we will learn about pandas and how we can work with it, so we will

00:08.330 --> 00:15.970
implement in pandas, in practical detail, what we have learned so far in pieces.

00:16.850 --> 00:18.300
So let us begin with that.

00:18.800 --> 00:24.710
So first of all, we'll import NumPy and pandas. After importing NumPy and pandas,

00:24.710 --> 00:31.640
I'm just creating a new DataFrame with age, FICO, city, ID, rating, balance and children as columns

00:32.210 --> 00:33.580
and these other details.

00:33.590 --> 00:43.210
Now you can see that age has values which are numeric in nature, and it has string values as well.

00:43.220 --> 00:49.940
And "34", which is again a number, but its data type is string. Apart from that, there is the string

00:50.300 --> 00:53.030
"missing", which is present in age.

00:53.690 --> 01:01.130
Then we have FICO range, which is entirely in string form but is created out of numbers and a hyphen.

01:02.260 --> 01:03.940
Apart from that, we have cities.

01:05.730 --> 01:06.000
Mm.

01:07.370 --> 01:12.620
Rating, and balance, which also contains NaN (not a number) values.

01:13.690 --> 01:18.800
And children has values ranging from zero to five.

01:19.240 --> 01:26.560
And we have created 15 such rows, so I just run this and create a data frame from the dictionary

01:26.560 --> 01:28.540
using pd.DataFrame, then printing

01:28.540 --> 01:35.080
the head of the data frame. You can see we have these particular values, and you can see in each column

01:35.080 --> 01:40.570
we are not able to find any missing values as of now.
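
NOTE
A minimal sketch of this setup, assuming lowercase column names (age, FICO, city, ID, rating, balance, children) and a few hypothetical rows rather than the lecture's actual 15-row dataset. It also shows why isnull() does not flag the "missing" placeholder in age: pandas only treats real NaN values as null.
```python
import numpy as np
import pandas as pd
# Hypothetical rows; only the column layout follows the lecture.
raw = {
    "ID": [1, 2, 3, 4],
    "age": [25, "34", "missing", 41],  # mixed ints and strings
    "FICO": ["100-150", "150-200", "200-250", "250-300"],
    "city": ["Chennai", "Delhi", "Mumbai", "Kolkata"],
    "rating": ["good", "bad", "pathetic", "excellent"],
    "balance": [1200.5, np.nan, 830.0, 2100.0],
    "children": [0, 2, 1, 5],
}
my_data = pd.DataFrame(raw)
print(my_data.head())
# "missing" is an ordinary string, so only the NaN in balance is counted here.
print(my_data.isnull().sum())
```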

01:40.780 --> 01:45.070
So let me just extend this a little bit and see all these 15 rows.

01:45.880 --> 01:56.140
So when I see 15 rows, yes, there are missing values at row 11 and row index 14, and there is a

01:56.200 --> 01:58.780
NaN value in balance

01:59.700 --> 02:07.050
in row 9. So these are a few values which we will need to keep in consideration and try to modify

02:07.050 --> 02:08.040
and try to impute.

02:09.070 --> 02:15.730
Now, the next step is checking the data types. So when I check my_data.dtypes,

02:15.730 --> 02:23.050
I will be getting the data types of the columns which I have, or the features or attributes

02:23.050 --> 02:23.750
that I have.

02:24.480 --> 02:26.420
Now, what are these data types?

02:26.440 --> 02:30.180
ID is int64, which is perfectly fine.

02:30.850 --> 02:39.260
Age is object, which means that age is a string, but age has to be numeric because age is a number.

02:39.260 --> 02:39.640
Right.

02:39.790 --> 02:42.090
And here all these values are numbers.

02:42.250 --> 02:44.770
So we have to convert age into a number.
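
NOTE
The dtype check can be reproduced on a tiny hypothetical frame (column names assumed): a column that mixes Python ints and strings is stored with the generic object dtype, while an all-integer column gets an integer dtype such as int64.
```python
import pandas as pd
df = pd.DataFrame({"ID": [1, 2, 3], "age": [25, "34", "missing"]})
print(df.dtypes)
# age holds mixed Python objects, so pandas does not treat it as numeric yet.
print(pd.api.types.is_numeric_dtype(df["age"]))  # False
```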

02:45.040 --> 02:49.660
That is the first point which we got. The second point is FICO.

02:49.840 --> 02:52.170
FICO is not a numeric type.

02:52.180 --> 02:54.310
It is a string.

02:54.490 --> 03:00.490
It is a string because it is a range: there is a hyphen in between and there are two numbers

03:00.730 --> 03:01.210
in here.

03:01.540 --> 03:11.940
Now, we cannot really apply any mathematical operation on a string, especially one holding two numbers together.

03:12.400 --> 03:18.160
So if we want to apply a mathematical operation, we need to have the 100 separate and the 150

03:18.160 --> 03:25.660
separate, so that I can either multiply by some number or divide by some number or do any

03:25.660 --> 03:27.320
mathematical operation on top of it.

03:27.670 --> 03:30.610
So the next task which I have in hand is

03:31.630 --> 03:41.320
to split these. So I want to have 100 and 150 in two different columns, or if I don't want to

03:41.320 --> 03:49.090
create two different columns out of it, what I can do is take the mean of the FICO value and create

03:49.090 --> 03:51.160
a new column with only the mean value.

03:51.520 --> 03:57.790
Because you see, the range for all of these is 50: 150 minus 100 is 50, 300 minus

03:57.790 --> 04:01.180
250 is 50, 250 minus 200 is 50.

04:01.450 --> 04:05.020
So the difference between these values is always 50.

04:05.290 --> 04:06.600
So the range is fixed.

04:06.820 --> 04:14.320
So what we can do is take the mean value and create a column just for the mean.

04:14.320 --> 04:15.550
If we don't want a mean,

04:15.550 --> 04:19.030
we can create two columns such as FICO min and FICO max.

04:19.240 --> 04:24.040
So this is completely our choice, how we want to modify this particular column.
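
NOTE
Both options can be sketched on hypothetical FICO strings: split each range at the hyphen into numeric min and max columns, confirm every range is 50 wide, and take the midpoint as a single mean column. The variable names are illustrative, not the lecture's.
```python
import pandas as pd
# Hypothetical FICO ranges in the lecture's "low-high" string format.
fico = pd.Series(["100-150", "150-200", "250-300"])
bounds = fico.str.split("-", expand=True).astype(float)
fico_min, fico_max = bounds[0], bounds[1]
# Every range has the same width, so the midpoint loses little information.
print((fico_max - fico_min).unique())  # [50.]
fico_mean = (fico_min + fico_max) / 2
print(fico_mean.tolist())  # [125.0, 175.0, 275.0]
```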

04:25.590 --> 04:32.880
The next column is city. Now, city holds string values, so it is a categorical

04:32.880 --> 04:33.210
column.

04:33.520 --> 04:39.540
Now, what we know for a categorical column is that we will be creating dummies out of it.

04:40.360 --> 04:45.930
That is, we will be doing one-hot encoding, which basically means we will be creating zero/one

04:45.930 --> 04:46.580
columns.

04:46.800 --> 04:52.190
So we will have a column named Chennai with zero or one values below it,

04:52.200 --> 04:55.230
based on whether that value is present or not.

04:58.230 --> 05:06.480
Similarly, we will have another column for Mumbai; in that column, again, we'll have zero and one values depending

05:06.480 --> 05:09.900
on whether the value is Mumbai for that particular row.

05:10.410 --> 05:16.750
So each row, instead of having one city column,

05:16.770 --> 05:23.490
will get four columns, one for each of the values Chennai, Delhi, Mumbai and Kolkata.
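
NOTE
One way to produce the four zero/one city columns is pandas' get_dummies, sketched here on a hypothetical city column. The dtype=int argument requests 0/1 integers instead of booleans, and the columns come out in sorted order of the city names.
```python
import pandas as pd
# Hypothetical city values; the lecture's data uses these four cities.
city = pd.Series(["Chennai", "Delhi", "Mumbai", "Kolkata", "Mumbai"], name="city")
dummies = pd.get_dummies(city, prefix="city", dtype=int)
print(dummies)
# Each row has exactly one 1 across the four city columns.
print(dummies.sum(axis=1).tolist())  # [1, 1, 1, 1, 1]
```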

05:24.790 --> 05:31.870
So we'll see how we do that. Next we have the rating column. This rating column again has

05:31.870 --> 05:34.720
values like excellent, bad, excellent, pathetic and good.

05:34.870 --> 05:36.040
So these values

05:38.140 --> 05:38.740
have

05:40.040 --> 05:41.570
an order present.

05:42.110 --> 05:46.670
So there is an order present in these values, so these are ordinal values.

05:46.850 --> 05:51.120
So rating is an ordinal column, or ordinal variable.

05:51.350 --> 05:55.290
So we will convert it from a string type to a numeric type.

05:55.700 --> 06:02.120
We will give specific numbers to good, pathetic, excellent and bad based on the sequence which is present,

06:02.360 --> 06:07.000
so we can give bad as 1, then pathetic as 0.

06:07.160 --> 06:14.720
We can give good, let us say, 2, and then excellent 3, with 3 being the best rating and

06:14.720 --> 06:17.050
0 being the worst.
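
NOTE
A minimal sketch of the ordinal encoding just described, assuming a lowercase rating column and using Series.map with the numbers given here (pathetic 0, bad 1, good 2, excellent 3). Any label missing from the dictionary would come out as NaN, which makes typos easy to spot.
```python
import pandas as pd
rating = pd.Series(["good", "bad", "pathetic", "excellent"])
# Higher number means a better rating.
order = {"pathetic": 0, "bad": 1, "good": 2, "excellent": 3}
rating_num = rating.map(order)
print(rating_num.tolist())  # [2, 1, 0, 3]
```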

06:17.300 --> 06:20.330
Next is balance, which is floating point.

06:20.340 --> 06:21.080
So this is fine.

06:21.080 --> 06:26.210
So balance has a floating point value, so it does not need to be converted into a numeric form.

06:26.360 --> 06:31.700
Similarly, children is already in integer form, so it does not have to be converted into a numeric form;

06:31.700 --> 06:33.120
it is already numeric.

06:33.510 --> 06:41.750
Now, as we look at these values, age has these missing values and balance also has these NaN

06:41.750 --> 06:47.580
(not a number) values, which means that we will have to impute these values here.

06:48.680 --> 06:55.820
So what we do is, once we have checked these data types, we will start modifying these values

06:55.820 --> 06:56.460
one by one.

06:56.720 --> 07:03.700
So here, if you see, for my_data.age, what I am doing here is

07:03.720 --> 07:06.620
converting these values into a numeric form.

07:06.620 --> 07:07.840
Now, how do I do it?

07:08.060 --> 07:11.060
I do it with the pandas function to_numeric.

07:12.020 --> 07:16.190
So the to_numeric function basically converts

07:17.360 --> 07:28.640
a mixture of strings and numeric types into numeric data. Let us see what happens if I just

07:28.640 --> 07:31.580
simply do it this way; I just remove this

07:31.910 --> 07:35.330
errors='coerce' argument, because we don't know what it does as of now.

07:35.510 --> 07:42.410
So once I run this, you can see that it gives me an error: unable to parse string "missing"

07:42.410 --> 07:43.160
at position 11.

07:43.820 --> 07:45.620
So at position 11,

07:45.620 --> 07:51.740
it found the string "missing", and it is not able to understand what number it should place instead

07:51.740 --> 07:52.900
of the string "missing".

07:53.180 --> 07:55.850
So it is not able to convert it properly.

07:56.060 --> 08:05.510
So to resolve this issue, we have errors='coerce'. What it does is, whenever it finds an error,

08:05.660 --> 08:08.290
it converts that value into NaN (not a number).
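
NOTE
The failure and the fix can be reproduced on a hypothetical age series. With the default errors="raise", to_numeric stops at the first unparseable string. With errors="coerce", that entry becomes NaN and the column becomes float64, because NaN is a float.
```python
import pandas as pd
age = pd.Series([25, "34", "missing", 41])  # hypothetical values
try:
    pd.to_numeric(age)  # default errors="raise"
except ValueError as exc:
    print("raise:", exc)  # the message names the string "missing"
age_num = pd.to_numeric(age, errors="coerce")  # unparseable values become NaN
print(age_num.tolist())  # [25.0, 34.0, nan, 41.0]
print(age_num.dtype)  # float64
```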

08:09.250 --> 08:17.140
So I'll just run this for you. When I run this, it has converted these values into NaN.

08:17.140 --> 08:19.390
You can see at 11 there is NaN.

08:19.390 --> 08:21.160
At 14, there is NaN.

08:21.430 --> 08:23.940
And let us check the data type again.

08:24.190 --> 08:25.780
So let me run this one again.

08:25.790 --> 08:31.000
So if you see the dtypes, the data type of age should have changed to a number.

08:33.350 --> 08:41.090
Let me do this again. So here you see the data type has changed to floating point, so we have resolved

08:41.090 --> 08:42.180
this particular issue.

08:42.680 --> 08:50.540
Now, next, what we can do here is perform different operations, like creating a new column,

08:50.750 --> 08:54.450
or we can generate columns using the existing columns.

08:54.680 --> 08:58.810
So here what I'm doing is creating a column for a constant variable.

08:59.220 --> 09:02.030
I'm giving it the value 100.

09:02.330 --> 09:03.380
So I'll run this.

09:03.590 --> 09:06.800
And when I run this, you can see the data.

09:07.100 --> 09:11.020
So I have created a constant variable which has the value

09:11.040 --> 09:18.380
100. How we create this is: I just give the column name, in square brackets, which I

09:18.380 --> 09:20.600
want to create, and the value in front of it.

09:20.750 --> 09:23.390
So it has created this constant variable with value 100.

09:24.290 --> 09:28.280
Next, I give a new column name, which is

09:28.280 --> 09:29.410
balance log.

09:29.600 --> 09:37.250
The formula for it is np.log; I'm just applying the mathematical function log to the balance

09:37.250 --> 09:40.070
value, so it picks up the balance value,

09:41.830 --> 09:46.300
takes its log and puts it in the balance log column.

09:46.610 --> 09:54.250
Now, another thing is that wherever the value of balance is NaN, the value of balance log

09:54.520 --> 09:55.420
also comes out

09:55.450 --> 09:56.390
as NaN.

09:56.770 --> 10:04.870
So when we are doing such operations, what we will have to do is, if we are imputing the values in

10:04.870 --> 10:12.220
the balance column, we will also have to impute new values in the balance log column. Or what we can do

10:12.220 --> 10:18.920
is first impute the values in the balance column and then generate the balance log column,

10:18.940 --> 10:22.480
so that it automatically gets the values from the balance column.
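
NOTE
A sketch of both derived columns on hypothetical balance values (column names such as constant_var and balance_log are assumptions). A scalar assignment is broadcast to every row, and np.log propagates NaN. Filling balance first and then recomputing the log leaves no gaps. The mean fill here is an illustrative choice, the same strategy the lecture later applies to age.
```python
import numpy as np
import pandas as pd
my_data = pd.DataFrame({"balance": [1000.0, np.nan, 100.0]})  # hypothetical
my_data["constant_var"] = 100  # a scalar is broadcast to every row
# Deriving first: the log of NaN is NaN, so the gap is copied into balance_log.
my_data["balance_log"] = np.log(my_data["balance"])
print(my_data["balance_log"].isna().sum())  # 1
# Imputing first (mean fill, illustrative), then deriving: no gaps remain.
my_data["balance"] = my_data["balance"].fillna(my_data["balance"].mean())
my_data["balance_log"] = np.log(my_data["balance"])
print(my_data["balance_log"].isna().sum())  # 0
```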

10:23.770 --> 10:30.880
Now, the next thing is the age children ratio. Here what we are simply doing is getting

10:30.880 --> 10:37.930
the age data and the children data, and we're just dividing the age data by the children data so that we

10:37.930 --> 10:41.050
get the ratio.

10:42.480 --> 10:47.760
This is, again, the same thing: because age has NaN here,

10:48.970 --> 10:54.840
NaN has been generated in the ratio, so these are a few things which we need to handle accordingly.

10:57.850 --> 11:06.760
Now we can do as many complex calculations as we want to create new columns, and wherever we have missing

11:06.760 --> 11:09.370
values, we will have to impute those values.

11:09.520 --> 11:12.790
Now, how we will impute those values, we will look at a little later.

11:14.040 --> 11:22.150
Here we have another cell: my_data.age. Now we will check if there is any null value in this age column.

11:22.560 --> 11:28.780
Now, as we already know, there are some NaN values in this particular age column.

11:29.070 --> 11:36.570
So when we do isnull, and I'll just run this, you can see that wherever the value was NaN, we

11:36.570 --> 11:37.530
are getting True.

11:39.700 --> 11:46.160
And when I run sum on top of that, it gives me the count of those values.

11:46.330 --> 11:49.450
So in total there are five such NaN values.
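
NOTE
A sketch of the null check on hypothetical rows (column names assumed). isnull() marks NaN as True, and sum() counts the Trues. The ratio inherits NaN from age. Separately, a zero in children produces inf, which isnull() does not flag, so np.isinf is worth checking as well.
```python
import numpy as np
import pandas as pd
my_data = pd.DataFrame({
    "age": [30.0, np.nan, 45.0, np.nan, 20.0],  # hypothetical values
    "children": [2, 1, 3, 0, 0],
})
my_data["age_children_ratio"] = my_data["age"] / my_data["children"]
mask = my_data["age"].isnull()
print(mask.tolist())  # [False, True, False, True, False]
print(mask.sum())  # 2
# NaN in age flows into the ratio.
print(my_data["age_children_ratio"].isnull().sum())  # 2
# 20 / 0 gives inf, which isnull() does not count.
print(np.isinf(my_data["age_children_ratio"]).sum())  # 1
```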

11:50.020 --> 11:57.730
So what we need to do is impute values; we need to fill these places so that whenever we

11:57.730 --> 12:03.400
are performing any mathematical operation or whenever we are creating a model on top of it, it does

12:03.400 --> 12:09.460
not have any null values, because if there were null values, the model would not be able to process

12:09.460 --> 12:11.670
them during training.

12:12.310 --> 12:20.830
So whenever we are training, we make sure that no column has any null values or any missing

12:20.860 --> 12:21.430
values.

12:22.900 --> 12:24.400
So what we do here is.

12:26.130 --> 12:35.730
For my_data.loc, I'm filtering locations in my my_data data frame, and what am

12:35.730 --> 12:35.840
I filtering on?

12:35.930 --> 12:39.570
I'm filtering on my_data.age.

12:40.590 --> 12:49.680
So I'm filtering on the my_data age column, and what am I checking? I am checking wherever it is null.

12:50.920 --> 12:56.440
So this is basically filtering out the rows where age is null.

13:01.940 --> 13:07.780
So this is checking where the my_data age column has a null value.

13:08.150 --> 13:13.460
It is just filtering out those five rows for me, and

13:14.860 --> 13:19.850
it is taking the age column out of them, so we are simply getting these values.

13:20.050 --> 13:22.690
So let me run only this much piece of code.

13:28.860 --> 13:35.310
So here you can see we have filtered out only the values which are NaN in the age column.

13:35.490 --> 13:41.390
So now what we are doing is simply pushing in my_data.age.mean().

13:41.400 --> 13:48.150
So we are taking the mean of the my_data age column and pushing it inside these five rows

13:48.150 --> 13:48.660
of data.

13:49.570 --> 13:56.530
Similarly, what I'm doing here is taking the my_data age children ratio and updating it

13:56.740 --> 14:01.040
with the updated my_data.age and my_data.children.

14:01.180 --> 14:07.420
So now there will not be any null values, which were generated earlier because of the null

14:07.420 --> 14:07.970
values.
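
NOTE
A minimal sketch of the .loc mean imputation just described, on hypothetical data (column names assumed): select only the null rows of age, assign the column mean, then recompute the ratio so the derived column picks up the filled values.
```python
import numpy as np
import pandas as pd
my_data = pd.DataFrame({"age": [30.0, np.nan, 40.0, np.nan], "children": [2, 1, 4, 2]})
# .loc[row_mask, column] targets only the null rows of age.
# Series.mean() skips NaN, so the mean here is (30 + 40) / 2 = 35.
my_data.loc[my_data["age"].isnull(), "age"] = my_data["age"].mean()
# Recompute the derived column from the imputed age so it has no NaN either.
my_data["age_children_ratio"] = my_data["age"] / my_data["children"]
print(my_data)
print(my_data.isnull().sum().sum())  # 0
```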

14:08.920 --> 14:12.250
So I ran this, and then ran this again.

14:12.400 --> 14:16.250
So now you can see there is no null value present in this particular column.

14:16.280 --> 14:22.890
And when I check my_data again, you can see that the values which were NaN, that is, the 11th row

14:23.740 --> 14:24.760
and the 14th

14:24.780 --> 14:33.080
row, have been imputed with the mean of the age column, and the same has been updated in the

14:33.170 --> 14:33.610
age children

14:33.610 --> 14:33.970
ratio.

14:36.170 --> 14:40.580
Now, the next thing we can do is this: here we have the rating score.

14:40.650 --> 14:43.460
So rating is an ordinal column.

14:43.760 --> 14:45.040
So what did we decide?

14:45.050 --> 14:48.730
We will give some scores to the ordinal values.

14:48.740 --> 14:55.640
So instead of creating dummy columns, what we do for ordinal variables is assign different

14:55.640 --> 14:57.640
values to the column itself.

14:58.160 --> 15:05.600
So what I'm doing here is: for pathetic, I'm assigning -1; for bad, I'm assigning

15:05.600 --> 15:15.500
0; for good or excellent, we are assigning 1, so that wherever the value for rating

15:16.010 --> 15:17.140
is pathetic,

15:17.330 --> 15:18.800
it will be replaced by -1.

15:19.130 --> 15:22.430
Wherever it is bad, it will be replaced by 0.

15:22.640 --> 15:25.850
Wherever it is good or excellent, it will be replaced by 1.

15:26.720 --> 15:28.310
So now what I do here is.

15:29.180 --> 15:31.150
I check my_data rating score.

15:32.120 --> 15:39.380
This is a new column which I'm creating, my_data rating score, and I'm checking with np.where. This is a

15:39.380 --> 15:44.650
new format for you: np.where checks a particular condition.

15:44.930 --> 15:48.740
So where is it checking? It is checking in my_data

15:48.740 --> 15:52.850
rating; it is checking the values in my_data.rating.

15:53.070 --> 15:54.290
Now, what is it checking?

15:54.290 --> 15:59.330
It is checking the value inside my_data.rating with isin,

15:59.540 --> 16:07.280
that is, whether the value in my_data.rating is a part of this particular list. If the value is a part

16:07.280 --> 16:10.670
of the list, good or excellent, then 1, otherwise 0.

16:12.370 --> 16:19.150
So wherever the value is good or excellent, it will be changed to 1, and all the other values

16:19.150 --> 16:19.920
will be converted to 0.

16:20.590 --> 16:26.500
Now see, wherever the value is good or excellent, it has been converted to 1, which is just fine.

16:26.840 --> 16:31.900
Now, the other situation was that bad has to be converted to 0, which is also fine.

16:31.900 --> 16:36.040
That also has been done by putting 0 in the else condition.

16:36.220 --> 16:41.800
Now, the only thing that we need to make sure is that wherever the value is pathetic, it has to

16:41.800 --> 16:43.270
be replaced by minus one.

16:43.450 --> 16:50.080
So what we do is, in my_data.loc, wherever my_data.rating is pathetic,

16:51.610 --> 16:55.980
we are picking up the values of rating score, so we are getting the rating score values for those rows.

16:56.430 --> 17:00.080
So again, we will run only this much piece of code.

17:03.050 --> 17:10.550
So let's run this first, and then run this piece of code, so you can see the filtered-out rating values;

17:10.580 --> 17:11.320
that's what I do.

17:11.690 --> 17:17.540
Because of the np.where condition, the current rating score values for pathetic have been converted to 0.

17:17.910 --> 17:25.580
Now, what we do is simply pick the rating score for these rows and replace it with -1.

17:27.300 --> 17:28.710
And when we run this,

17:31.810 --> 17:39.520
now you can see the values have been updated: in the rating column we have these values, and

17:39.520 --> 17:42.790
for those respective ratings, the score has been updated.
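
NOTE
The two-step scoring can be sketched on a hypothetical rating column (the rating_score name is assumed). np.where gives 1 for good or excellent and 0 for everything else. A boolean .loc selection then overwrites the pathetic rows with -1, leaving bad at 0.
```python
import numpy as np
import pandas as pd
my_data = pd.DataFrame({"rating": ["good", "bad", "pathetic", "excellent", "bad"]})
# Step 1: 1 for good/excellent, 0 for everything else (bad and pathetic for now).
my_data["rating_score"] = np.where(my_data["rating"].isin(["good", "excellent"]), 1, 0)
# Step 2: overwrite only the pathetic rows with -1.
my_data.loc[my_data["rating"] == "pathetic", "rating_score"] = -1
print(my_data["rating_score"].tolist())  # [1, 0, -1, 1, 0]
```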

17:44.460 --> 17:51.840
Now, we could have done this in several other ways, but we just wanted to introduce these methods to you. Now let

17:51.840 --> 17:53.940
me print the head again.

17:55.820 --> 17:59.540
Now, in this state, we have corrected the age column.

18:00.450 --> 18:07.370
The ratings have been updated, and the age children ratio has been updated based on the new age.

18:07.950 --> 18:12.630
Now, the next thing which we can work on is the FICO value here.

18:12.630 --> 18:19.860
The FICO value is a string, and each string has numbers with a hyphen in between.

18:21.200 --> 18:28.310
So what we can do is split these values. To split a string, we have split.

18:29.680 --> 18:35.440
So let's try to use that: my_data.FICO.split, with the hyphen as the separator.

18:36.280 --> 18:44.720
So when we run this, we get "'Series' object has no attribute 'split'", which means that this cannot be split

18:44.890 --> 18:47.840
because pandas considers it a Series object.

18:48.160 --> 18:52.810
So we need to first treat it as a string and then do the split.

18:53.810 --> 18:59.380
So here we are converting it into a string first and then performing the split on top of it.
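
NOTE
The error and its fix can be reproduced on a hypothetical FICO series. A Series has no split method of its own. The .str accessor applies the string split to each element, which is assumed here to be what "converting it into a string first" refers to.
```python
import pandas as pd
fico = pd.Series(["100-150", "150-200"])  # hypothetical values
try:
    fico.split("-")  # a Series has no split method of its own
except AttributeError as exc:
    print(exc)
parts = fico.str.split("-")  # .str applies the split to each element
print(parts.tolist())  # [['100', '150'], ['150', '200']]
```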

19:01.510 --> 19:05.800
So once we get a list of the numbers,

19:06.930 --> 19:14.910
these lists would have to be converted into a data frame. So what we can do is: we have the expand

19:14.970 --> 19:22.420
parameter present in split, which allows us to convert these lists into a complete data frame.

19:22.770 --> 19:26.310
So what we can do is convert it into data frame form.

19:26.320 --> 19:32.340
So let me run this with expand, and you can see that a new data frame has been created, where the

19:32.340 --> 19:34.620
first column, index 0,

19:34.620 --> 19:39.720
has the first value from the list, and the second one has the second value from

19:39.720 --> 19:40.110
the list.

19:42.570 --> 19:49.350
Now, here we have got the data frame, so we can finally take the DataFrame and

19:49.350 --> 19:57.900
convert it into float, because this is still a string: it has been split into two different

19:57.900 --> 20:00.840
values, but each value is still a string.

20:01.080 --> 20:06.240
So we have to convert this into a number, into floating point, and then use it further.

20:07.370 --> 20:15.260
So what we do here is take my_data.FICO and convert it into a string. After converting it

20:15.260 --> 20:15.990
into a string,

20:16.010 --> 20:23.570
we split it on the basis of the hyphen and expand it so that it converts into a data frame. That data frame

20:23.570 --> 20:26.390
is still made of strings.

20:26.420 --> 20:28.730
So we have to convert it into floating point.

20:28.940 --> 20:32.110
So we are applying astype(float) on top of it.

20:33.050 --> 20:40.670
And then I'm assigning it to k, so this is my new data frame, which I have created.
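
NOTE
The whole chain can be sketched on a hypothetical FICO column. With expand=True, split returns a DataFrame with columns 0 and 1 instead of a Series of lists. astype(float) then turns both string columns into numbers.
```python
import pandas as pd
my_data = pd.DataFrame({"FICO": ["100-150", "150-200", "250-300"]})  # hypothetical
# expand=True gives one column per piece (0 and 1) instead of lists.
k = my_data["FICO"].str.split("-", expand=True).astype(float)
print(k)
print(k.dtypes.tolist())
```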

20:42.170 --> 20:49.130
So let me run this, and let me print it for you: let me print k.

20:53.120 --> 21:00.800
So k has these two values. Now what I'm doing is simply creating new columns, f1 and f2. You

21:00.800 --> 21:05.560
can name them FICO1, FICO2, FICO min, FICO max, anything like that.

21:05.870 --> 21:11.120
I simply put the values of k index 0 and k index 1 into them.

21:12.910 --> 21:19.690
So after that, I will simply delete the FICO column, because I have already extracted the FICO values from it, and

21:19.730 --> 21:24.280
I don't need this particular column now; I don't want these kinds of values.

21:24.280 --> 21:28.080
I can just use the new FICO values which I have created.

21:28.390 --> 21:31.590
So I'm just deleting it using del my_data['FICO'].

21:31.600 --> 21:40.600
I can use different methods for deletion, like drop on the FICO column, so that we save my_data as

21:40.930 --> 21:45.250
the result of my_data.drop, so we can do something like that.
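
NOTE
A sketch of both deletion styles on a hypothetical frame. del removes the column in place. drop returns a new DataFrame, so its result has to be assigned back. The f1/f2 names follow the lecture; the data is made up.
```python
import pandas as pd
my_data = pd.DataFrame({"ID": [1, 2], "FICO": ["100-150", "150-200"]})  # hypothetical
k = my_data["FICO"].str.split("-", expand=True).astype(float)
my_data["f1"] = k[0]  # lower bound of the range
my_data["f2"] = k[1]  # upper bound of the range
# Option 1, in place:  del my_data["FICO"]
# Option 2, drop returns a new DataFrame, so assign it back.
my_data = my_data.drop(columns=["FICO"])
print(my_data.columns.tolist())  # ['ID', 'f1', 'f2']
```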

21:46.360 --> 21:48.730
So now let me print my_data.

21:50.230 --> 21:53.740
So now we don't have the FICO column; it has been removed.

21:54.140 --> 21:57.700
Now there is a rating column, which also needs to be removed.

21:58.760 --> 22:03.590
Apart from that, we have this city column, which we need to take care of.

22:04.430 --> 22:06.150
So how do we do that?

22:06.170 --> 22:11.680
We can use np.where, which we simply use where we want to check a value.

22:11.930 --> 22:17.600
So we are checking my_data.city values, and we are checking if my_data.city is Mumbai.

22:18.170 --> 22:23.370
So wherever my_data.city is equal to Mumbai,

22:23.690 --> 22:27.350
we are putting 1, and where it is not equal to Mumbai,

22:27.860 --> 22:29.350
we are putting 0 there.

22:30.050 --> 22:38.510
So that will make the city Mumbai column, where the values will be 1 or 0 based on whether the value

22:38.510 --> 22:40.540
is Mumbai in the city column.

22:41.660 --> 22:42.650
So I just ran this.

22:43.040 --> 22:50.060
Similarly, I have another condition where I'm creating the new column city Chennai, and I'm checking whether my_data.city

22:50.060 --> 22:52.110
is equal to Chennai.

22:52.370 --> 22:58.690
So wherever this condition is true, the value will be 1 in city Chennai,

22:58.700 --> 23:04.820
and wherever it is false, the value will be 0.

23:06.020 --> 23:10.240
So basically, the comparison gives a boolean series, which will have either True or False values.

23:10.460 --> 23:11.540
Let me run this for you.

23:24.630 --> 23:30.480
So when we are checking this, it is giving True and False values, and what we are doing is applying

23:30.480 --> 23:31.640
astype(int).

23:32.250 --> 23:38.370
So when True is converted to int, it will be converted to 1, and when False

23:38.370 --> 23:40.820
is converted to int, it will be converted to 0.

23:40.950 --> 23:44.610
So automatically we will get the same thing which we were achieving with np.where.

23:45.300 --> 23:47.400
So it will give just the same thing.

23:47.470 --> 23:48.660
So let me run this again.

23:49.350 --> 23:55.200
And now we are simply checking another condition in the same way.

23:55.200 --> 23:56.960
So we are just doing the same thing.

23:58.680 --> 24:02.310
Now, our next thing is my_data Kolkata.

24:03.270 --> 24:07.800
So we are creating another column for Kolkata, and then we are using del on the city column.

24:07.840 --> 24:13.200
So let me just run this; we delete the city column and we see the head.

24:13.380 --> 24:20.250
So now what has happened is the city column has been removed, and where the city column value was Chennai,

24:20.330 --> 24:23.850
it has 1 here; where the city column value was Mumbai,

24:23.850 --> 24:28.980
it has 1 there; wherever the city column value was Kolkata, it has 1 there.

24:29.160 --> 24:35.100
And when the city column value was Delhi, it is all zeros, which signifies the fourth one.
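
NOTE
A sketch of the manual dummy columns on a hypothetical city column (the city_mumbai style names are assumed). np.where and a boolean comparison cast with astype(int) give the same 0/1 result. Leaving out a Delhi column makes Delhi the all-zeros baseline.
```python
import numpy as np
import pandas as pd
my_data = pd.DataFrame({"city": ["Chennai", "Delhi", "Mumbai", "Kolkata"]})  # hypothetical
# np.where(condition, value_if_true, value_if_false)
my_data["city_mumbai"] = np.where(my_data["city"] == "Mumbai", 1, 0)
# Same result: a boolean Series cast to int (True becomes 1, False becomes 0).
my_data["city_chennai"] = (my_data["city"] == "Chennai").astype(int)
my_data["city_kolkata"] = (my_data["city"] == "Kolkata").astype(int)
del my_data["city"]
print(my_data)  # the Delhi row is all zeros
```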

24:37.150 --> 24:40.150
Now, the next thing is get_dummies.

24:41.020 --> 24:44.670
Another thing we can do is apply this to the rating column.

24:44.980 --> 24:49.110
Now here what we are doing is getting dummies for the rating column.

24:49.120 --> 24:57.610
All we are doing is simply calling pd.get_dummies, taking my_data.rating, and we

24:57.610 --> 24:59.080
are setting drop_first

24:59.980 --> 25:06.420
to True, which means that I want to remove one of the values. I'm also giving prefix as rating,

25:06.430 --> 25:08.990
so I'm getting a prefix on the column names. So I'll just run this.

25:09.310 --> 25:16.060
So here is what it has done: we have four ratings, one was good, one was bad, one was pathetic,

25:16.060 --> 25:16.790
one was excellent.

25:17.050 --> 25:26.320
So when I used drop_first, it dropped the bad rating column, because, as we have discussed, it will

25:26.320 --> 25:31.960
mean that if all three of these columns are zeros, we are pointing towards bad.

25:32.170 --> 25:34.890
So we don't really need to have the bad column here.

25:35.230 --> 25:39.870
And prefix simply means that I want rating as a prefix here.

25:40.210 --> 25:43.240
So in case I don't want drop_first,

25:44.130 --> 25:47.580
I'll run this again, and we will get the bad column.

25:48.450 --> 25:54.150
So it had just dropped the first column. I'll just put it back again, and you can see it has been

25:54.150 --> 25:54.530
removed.
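
NOTE
A sketch of the drop_first behavior on hypothetical ratings. get_dummies sorts the categories, so "bad" comes first alphabetically and is the one dropped. A bad row is then identified by zeros in all remaining rating columns.
```python
import pandas as pd
rating = pd.Series(["good", "bad", "pathetic", "excellent"], name="rating")
full = pd.get_dummies(rating, prefix="rating", dtype=int)
reduced = pd.get_dummies(rating, prefix="rating", drop_first=True, dtype=int)
print(full.columns.tolist())
print(reduced.columns.tolist())
# Row 1 ("bad") is all zeros once the bad column is dropped.
print(reduced.loc[1].tolist())  # [0, 0, 0]
```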

25:56.360 --> 26:02.810
So after this, once we have created the dummy columns, we can concatenate the dummy columns

26:03.050 --> 26:04.710
and we can remove the rating column.

26:04.730 --> 26:06.740
And here is the data which we have.

26:07.040 --> 26:11.200
So this is how we can work: we can create these rating dummy columns.

26:11.420 --> 26:14.300
And we can also create this rating score.

26:14.600 --> 26:22.850
But the preferred method is to create a rating score, which will create one column only and not multiple

26:22.850 --> 26:24.400
columns for ordinal data.

26:24.650 --> 26:34.010
This is just to show how get_dummies is used, but we will be using the rating score method for any ordinal

26:34.010 --> 26:39.500
variable, and we will be creating dummies for any categorical variable.
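
NOTE
The concatenate-then-remove step can be sketched on a hypothetical frame: pd.concat with axis=1 places the dummy columns next to the original ones (rows align on the index), and the now-redundant rating column is dropped.
```python
import pandas as pd
my_data = pd.DataFrame({"ID": [1, 2, 3], "rating": ["good", "bad", "excellent"]})  # hypothetical
dummies = pd.get_dummies(my_data["rating"], prefix="rating", drop_first=True, dtype=int)
# axis=1 joins side by side; rows are matched on the shared index.
my_data = pd.concat([my_data, dummies], axis=1).drop(columns=["rating"])
print(my_data.columns.tolist())  # ['ID', 'rating_excellent', 'rating_good']
```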

26:41.780 --> 26:47.000
So this is it for this particular session. In the next session, we will learn about feature selection and

26:47.000 --> 26:53.870
the steps which we follow while we are creating the features and why we are selecting the features.
