WEBVTT

00:00.990 --> 00:04.330
In this session, we will discuss feature selection.

00:04.650 --> 00:10.620
So before actually discussing feature selection, let us discuss the types of problems we will

00:10.620 --> 00:15.890
be dealing with and the type of data which we will have for these kinds of problems.

00:16.590 --> 00:19.030
So the problems would be of two types.

00:19.560 --> 00:24.480
One problem could be where we are trying to find out a continuous value.

00:24.490 --> 00:31.320
Just take an amount: when you are trying to find out an amount, it can have a value like one, or two

00:31.320 --> 00:33.480
thousand three hundred, or

00:34.790 --> 00:40.460
nine ninety-nine, or one million, or any other amount value.

00:41.800 --> 00:48.680
Now, this amount value can be an integer or it can be a floating point, so that is a continuous value.

00:49.780 --> 00:50.920
But it is a numerical value.

00:52.370 --> 00:58.870
Another type of problem would be that we are trying to find out if a loan has been approved or not.

01:00.030 --> 01:07.260
So in this case, the two types of values would be either the loan has been approved or the loan has

01:07.260 --> 01:08.730
not been approved.

01:09.510 --> 01:13.350
So if the loan has been approved, we can have a value of one.

01:13.350 --> 01:16.790
If the loan has not been approved, the value is zero.

01:18.730 --> 01:26.410
So we have these two types of problems. Now, when we are solving a problem in a supervised learning manner,

01:26.740 --> 01:29.300
we have two types of things.

01:30.490 --> 01:32.650
One is the predictor.

01:33.650 --> 01:42.860
A predictor is a value which we use for finding a relation between these values, so that we

01:42.860 --> 01:47.560
can derive a formula between these values, and using that formula

01:47.570 --> 01:49.120
we can calculate the amount.

01:50.230 --> 01:58.570
It is simply just like a normal equation. Let us see, in a normal equation, what we will have:

01:58.570 --> 02:03.580
we will have some X1 value, some X2 value, some X3 value, an X4 value and an X5 value.

02:03.910 --> 02:13.510
And we can form a formula where we can have something like B1 into X1 plus B2 into X2 plus B3

02:13.510 --> 02:16.510
into X3 plus B4 into X4 plus

02:16.720 --> 02:20.860
B5 into X5 is equal to the Y value.
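
NOTE
A minimal sketch of the spoken formula, not part of the session. The weights B1..B5 and the values X1..X5 are hypothetical numbers chosen only so the sum is easy to follow.
```python
# Hypothetical weights B1..B5 and feature values X1..X5 (illustration only).
weights = [2.0, 0.5, 1.0, 3.0, 0.25]
features = [1.0, 4.0, 2.0, 1.0, 8.0]
def linear_formula(b, x):
    # B1*X1 + B2*X2 + ... + B5*X5, the spoken formula written as a sum.
    return sum(bi * xi for bi, xi in zip(b, x))
y_value = linear_formula(weights, features)
print(y_value)
```
Each term contributes 2, 2, 2, 3 and 2 here, so the Y value is their total.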

02:22.640 --> 02:30.290
Similarly, we can create a formula such as B1 X1 plus B2 X2 plus B3 X3 plus

02:30.560 --> 02:39.200
B4 X4 plus B5 X5 is equal to some probability value, which will tell us if something

02:39.200 --> 02:46.010
should fall into one or zero. If the probability is high, then we can say it is one.

02:46.280 --> 02:52.710
If the probability is close to zero, that is, the probability of having something is low, then it will be a zero.

02:53.000 --> 02:56.960
It is something like, let's say, the probability of the loan getting approved

02:58.220 --> 03:04.190
is 90 percent. Then we can say the loan will be approved, and we can put the label, we can put the

03:04.190 --> 03:06.620
class, as one.

03:07.670 --> 03:14.910
Or say the probability of the loan being approved is six point five percent.

03:15.320 --> 03:17.540
That means the loan will not be approved.

03:18.470 --> 03:23.960
So in that case, the value for approval will be zero.

03:25.280 --> 03:31.970
So here we are calculating the probabilities, and based on the probability we are deciding if it will be zero

03:32.360 --> 03:32.810
or one.
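
NOTE
A minimal sketch of turning a probability into a class, not part of the session. It assumes the logistic (sigmoid) function to map the weighted sum to a probability and a 0.5 cut-off; the session names neither, so both are assumptions, as are the function names.
```python
import math
def probability(b, x):
    # Assumption: squash the weighted sum into the range 0..1 with the logistic function.
    z = sum(bi * xi for bi, xi in zip(b, x))
    return 1.0 / (1.0 + math.exp(-z))
def to_class(p, threshold=0.5):
    # High probability is labelled one, low probability is labelled zero.
    return 1 if p >= threshold else 0
print(to_class(0.90))   # 90 percent chance of approval
print(to_class(0.065))  # 6.5 percent chance of approval
```
With this cut-off, the 90 percent case is labelled one and the 6.5 percent case is labelled zero.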

03:34.260 --> 03:36.900
So these are the types of problems which we will have.

03:38.190 --> 03:44.700
Now, all of these problems will need certain algorithms, and we will use those algorithms to

03:44.910 --> 03:47.340
calculate the solutions of these problems.

03:48.590 --> 03:56.090
And to find out the Y values based on a certain algorithm, based on a certain formula which we will apply

03:56.090 --> 04:04.130
on these X values. Now, repeating again the terms which we use for X and Y: the terms which

04:04.130 --> 04:11.760
are used for X are independent values, because we want the X values to be independent of each other.

04:12.140 --> 04:15.420
We don't want X values to be dependent on each other.

04:15.680 --> 04:22.220
We don't want the age to change according to the dependents value or to change according to the gender

04:22.280 --> 04:22.640
values.

04:23.060 --> 04:26.370
These values have to be completely independent from each other.

04:27.300 --> 04:32.760
Another thing is these values are also called features or attributes.

04:34.280 --> 04:36.980
These values are also input values.

04:38.450 --> 04:45.470
Then there are the values which are the target values or the label values, the values which are output values,

04:45.560 --> 04:57.620
which we want to find out. These are called labels, predicted values, target values, or sometimes also output classes.

04:58.340 --> 05:02.280
So these are the terms which are used for the Y value.

05:02.420 --> 05:04.690
The Y value is what we want to find out,

05:05.030 --> 05:10.240
and the X values are the values which we are using to find out the Y value.

05:11.850 --> 05:12.870
Let us go further.

05:17.320 --> 05:24.860
So why do we need feature selection? We need feature selection because of the curse of dimensionality.

05:25.510 --> 05:32.890
Now here, when I look at this dataset, I have one Y value which I want to predict.

05:34.070 --> 05:40.440
And I have certain X values. Now, when I have only five columns for my X values,

05:40.670 --> 05:42.530
the problem is pretty simple.

05:43.600 --> 05:49.160
You can even solve this problem by simply calculating by hand.

05:49.780 --> 05:58.240
So it is easy to solve when the volume of data and the number of columns are small, and it is also less

05:58.240 --> 05:59.750
complex for calculation.

06:00.520 --> 06:08.500
But when we have a lot of columns, when we have a lot of features which we use for calculating these

06:08.500 --> 06:12.400
values, then the complexity of the problem increases.

06:13.090 --> 06:17.570
Then the amount of calculation which is needed also increases.

06:18.430 --> 06:25.660
This is the reason why we don't want to have any irrelevant column which does not actually provide any

06:25.660 --> 06:30.540
useful information, but only increases the complexity of the problem.

06:33.080 --> 06:41.660
So as the dimensionality of the feature space increases, the number of configurations can grow exponentially

06:42.290 --> 06:47.510
and thus the number of configurations covered by an observation decreases.

06:48.240 --> 06:51.620
That is, look at the number of rows of data.

06:51.800 --> 06:58.460
The ratio of the rows of data to the number of columns is also decreasing, because

06:58.460 --> 07:00.500
the rows of data will still remain the same,

07:00.920 --> 07:08.780
but the number of columns which we have keeps increasing. And when the number of X values increases,

07:09.170 --> 07:18.430
we need to have a sufficient number of rows of data so that we can find out an adequate weight for each of them.
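
NOTE
A minimal sketch of the curse of dimensionality, not part of the session. It assumes every feature takes only two values, so the count of possible configurations is 2 to the power of the number of columns; the row count and the function name max_coverage are hypothetical.
```python
def max_coverage(rows, columns):
    # Assumption: binary features, so there are 2**columns possible configurations.
    configurations = 2 ** columns
    # At best, each row of data covers one distinct configuration.
    return min(1.0, rows / configurations)
rows = 1000  # hypothetical fixed number of rows of data
for columns in (5, 10, 20, 30):
    print(columns, 2 ** columns, max_coverage(rows, columns))
```
The rows stay fixed while the configurations double with every added column, so the share of configurations the data can cover shrinks quickly.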

07:19.500 --> 07:26.700
So this is why we need fewer features, and that is why we need to select the features which

07:26.700 --> 07:31.490
give a good amount of information and drop those by which nothing relevant is given.

07:33.530 --> 07:42.560
So what we are saying here is that in data modeling, irrelevant or partially relevant features can negatively

07:42.560 --> 07:48.890
impact model performance. When we have any irrelevant or partially relevant features,

07:49.160 --> 07:57.710
what happens is that they bring in extra information, and that extra piece of information would also be needed

07:57.980 --> 07:58.820
in the calculation.

08:00.510 --> 08:10.650
So a lot more complex algorithm would need to be used to consider those features and how they impact

08:10.660 --> 08:11.390
the Y value.

08:12.360 --> 08:17.550
That is why we need to reduce these irrelevant or partially relevant features.

08:17.550 --> 08:22.860
We don't want these features which don't give enough information, but only increase the complexity

08:22.860 --> 08:23.210
of the model.

08:24.210 --> 08:30.120
So the most important aspect of data modeling is feature creation and feature selection.

08:30.300 --> 08:36.150
So what we do is we take into consideration the different types of information which we have, the different

08:36.480 --> 08:43.590
features or parameters we have, and try to create a lot of features. And when we are creating a lot of

08:43.590 --> 08:44.220
features,

08:44.460 --> 08:51.960
the need for adequate feature selection also increases. We do need feature creation, because

08:51.960 --> 08:54.560
we need to have an adequate amount of information.

08:54.870 --> 09:01.920
But then once we have created these features, we also need to select the features which are actually

09:01.920 --> 09:02.610
important.

09:04.130 --> 09:11.480
And there is a common business rule which says that we want the models to be simple and explainable.

09:12.640 --> 09:20.350
We lose the explainability when we have a lot of features, so we want the models to be simple

09:20.350 --> 09:20.960
in nature.

09:21.160 --> 09:24.460
We don't want to work towards a complex model.

09:25.550 --> 09:32.030
And the other thing is garbage in, garbage out, which means that most of the time we will have many

09:32.030 --> 09:39.860
non-informative features, for example, name or ID attributes, and these are poor quality input, which

09:39.860 --> 09:42.170
will produce poor quality output.
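
NOTE
A minimal sketch of spotting name-like or ID-like columns, not part of the session. The toy rows, the column names and the helper id_like_columns are hypothetical, and the rule used (a different value on every row) is only a rough heuristic, since a continuous numeric column can also be unique on every row.
```python
# Hypothetical toy rows; the column names are illustrative only.
rows = [
    {"id": 101, "name": "Asha", "income": 40, "approved": 1},
    {"id": 102, "name": "Ravi", "income": 40, "approved": 1},
    {"id": 103, "name": "Meena", "income": 15, "approved": 0},
    {"id": 104, "name": "John", "income": 15, "approved": 0},
]
def id_like_columns(rows, target):
    # Flag columns whose value differs on every row, as a name or an ID does.
    columns = [c for c in rows[0] if c != target]
    return [c for c in columns if len({r[c] for r in rows}) == len(rows)]
print(id_like_columns(rows, target="approved"))
```
Columns flagged this way are candidates to review and drop before modeling, not an automatic removal list.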

09:43.520 --> 09:51.850
So in case we have some features which are irrelevant in nature, they just lead to a bad performance.

09:52.700 --> 09:59.430
And if we have a lot of features and they do not have enough information, they do not have the actual

09:59.430 --> 10:01.150
pattern which we are looking for,

10:02.270 --> 10:07.400
then, no matter how many features we have, we will not have good results.

10:08.340 --> 10:12.690
So this is the reason why we need to select the good features.

10:15.530 --> 10:18.630
Now, what are the benefits of performing feature selection?

10:19.010 --> 10:20.010
The benefits?

10:20.210 --> 10:22.130
First, it reduces overfitting.

10:23.630 --> 10:28.690
Now, we have not really discussed what overfitting is.

10:30.270 --> 10:33.780
You can understand overfitting

10:34.780 --> 10:36.580
by just

10:37.520 --> 10:46.670
thinking about a simple example. Say we are preparing for an exam and we have two options.

10:47.360 --> 10:55.230
One option is to learn from the entire syllabus, and another option is to cram from the past years'

10:55.250 --> 10:56.240
question papers.

10:57.540 --> 11:04.920
So what will happen is, when we have learned from the past years' question papers, we will be able to

11:04.920 --> 11:08.160
answer only the questions which we have learned earlier.

11:09.230 --> 11:15.860
So this causes overfitting. That is, basically, we have learned from the past papers, and obviously we have

11:15.860 --> 11:21.940
learned from a certain set of questions only, so we will be able to answer only those specific questions.

11:22.930 --> 11:29.710
Whereas when we learn from the entire course curriculum, what happens is we go through each and every topic,

11:29.920 --> 11:37.440
which enables us to answer the extra questions, or questions which could be asked in a different way.

11:38.800 --> 11:48.550
So a model which learns from a lot of data, a model which does not just stick to specific topics,

11:48.760 --> 11:52.950
learns more broadly and performs better.

11:54.010 --> 12:00.770
So we don't want any model to learn from only these specific things; we want it to learn from a vaster

12:00.790 --> 12:08.360
space so that it is able to answer more accurately, it is able to predict more accurately.

12:08.920 --> 12:11.650
So this is what Overfitting is.

12:11.800 --> 12:18.010
Overfitting is when something learns from a very small space and is not able to answer for a larger

12:18.010 --> 12:18.930
space of data.
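
NOTE
A minimal sketch of the exam analogy, not part of the session. The "past papers", the rule Y equals 2 times X, and the names crammer and learner are hypothetical; the crammer memorises answers, while the learner fits one slope by least squares through the origin.
```python
# Hypothetical past papers: question x with answer y, where the true rule is y = 2 * x.
past_papers = {1: 2, 2: 4, 3: 6}
def crammer(x):
    # Memorises past answers only; returns None for a question it has never seen.
    return past_papers.get(x)
def learner(x):
    # Estimates one slope from the same data (least squares through the origin).
    slope = sum(q * a for q, a in past_papers.items()) / sum(q * q for q in past_papers)
    return slope * x
print(crammer(2), learner(2))    # both can answer a question seen before
print(crammer(10), learner(10))  # only the learner can answer an unseen question
```
The crammer is perfect on the questions it has seen and has nothing to say about a new one, which is the behaviour described here as overfitting.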

12:20.330 --> 12:28.460
So when we reduce the features, when we do a good feature selection, what happens is the model will

12:28.460 --> 12:34.970
not learn from something which is irrelevant in nature and actually focus more on learning from the

12:35.150 --> 12:43.340
vaster space and the more important things. Then, it improves accuracy, because the data is less misleading

12:43.340 --> 12:43.850
in nature.

12:43.880 --> 12:45.810
The data is less misleading,

12:45.830 --> 12:47.980
so it will improve the accuracy also.

12:48.200 --> 12:54.900
And because we will not have extra data points to learn from, we will have less training time.

12:55.580 --> 12:56.180
And again,

12:56.200 --> 12:59.750
it reduces the complexity of the model, which is the main thing.

12:59.750 --> 13:01.530
We want to have a simpler model.

13:01.760 --> 13:04.370
So that is what it will allow us to achieve.

13:06.990 --> 13:09.990
Now, there are different methods of feature selection.

13:10.320 --> 13:16.500
One is univariate selection, another one is feature importance, and then there is the correlation matrix

13:16.500 --> 13:17.220
with heatmap.

13:18.030 --> 13:25.380
So first of all, univariate selection uses the filter method, and this method is basically implemented

13:25.380 --> 13:29.370
using SelectKBest from the scikit-learn library.
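
NOTE
A minimal standard-library sketch of the idea behind univariate selection, not part of the session and not scikit-learn's API. It scores each feature on its own against the target with the absolute Pearson correlation and keeps the top k; the function select_k_best and the toy data are hypothetical stand-ins.
```python
import math
def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length lists.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no spread, so it gets a score of zero.
    return cov / (sx * sy) if sx and sy else 0.0
def select_k_best(columns, target, k):
    # Score every feature independently, then keep the k highest scores.
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
# Hypothetical toy data: x1 follows the target, x2 loosely opposes it, x3 is constant.
columns = {"x1": [1, 2, 3, 4, 5], "x2": [5, 3, 4, 1, 2], "x3": [2, 2, 2, 2, 2]}
target = [2, 4, 6, 8, 10]
print(select_k_best(columns, target, k=2))
```
A filter method like this looks at one feature at a time, so it is cheap but cannot see features that are only useful in combination.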

13:30.940 --> 13:39.250
And this is the mapping. If we have the features in a continuous form and the output is also continuous

13:39.250 --> 13:45.820
in nature, then we will use Pearson's correlation. In case the features are continuous in nature and the

13:45.820 --> 13:53.980
response, the output, is a classification problem, then we will use LDA. In case the input is categorical

13:53.980 --> 13:56.920
in nature but the output is continuous,

13:56.920 --> 14:04.420
we will use ANOVA. And in case the input is categorical in nature and the output is also categorical

14:04.420 --> 14:06.620
in nature, then we will use chi-square.
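
NOTE
A minimal sketch of this mapping as a lookup table, not part of the session. ANOVA and chi-square are named in the session; Pearson's correlation and LDA are the usual entries for the two continuous-input cases and are assumed here. The names UNIVARIATE_TEST and choose_test are hypothetical.
```python
# (input type, output type) gives the statistic used to score a feature.
UNIVARIATE_TEST = {
    ("continuous", "continuous"): "Pearson's correlation",  # assumed entry
    ("continuous", "categorical"): "LDA",                   # assumed entry
    ("categorical", "continuous"): "ANOVA",
    ("categorical", "categorical"): "chi-square",
}
def choose_test(input_type, output_type):
    # Raises KeyError for a combination that is not in the table.
    return UNIVARIATE_TEST[(input_type, output_type)]
print(choose_test("categorical", "categorical"))
print(choose_test("categorical", "continuous"))
```
Keeping the mapping in one table makes it easy to check which statistic applies before scoring any feature.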

14:07.810 --> 14:15.660
Now, SelectKBest is something we don't use that often because we already have a lot more methods.

14:16.000 --> 14:19.350
So this is something that you can explore more in case you want.

14:19.570 --> 14:25.890
But we will go towards the major and easier things which we can implement just like that.

14:26.050 --> 14:28.550
So we will talk about those more in depth.