WEBVTT

00:02.290 --> 00:05.330
So let us have a look at the solution to this particular problem.

00:05.740 --> 00:07.030
So here we are.

00:07.060 --> 00:15.400
First of all, we import the important libraries, and we have also loaded the data set into this data

00:15.400 --> 00:15.760
frame.

00:16.000 --> 00:21.340
And this is the data that we have: the satisfaction level, last evaluation, number of projects, average monthly hours,

00:21.340 --> 00:26.950
the time spent in the company, work accident, left, promotion, and all of the columns which we just discussed

00:26.950 --> 00:28.600
in the last video.

00:30.430 --> 00:35.620
So let us first have a look at the value counts so that we can see what the distribution of

00:35.620 --> 00:36.070
our target is.

00:37.180 --> 00:43.990
So our target variable has the values zero and one: zero means that the person has not left the company, and

00:43.990 --> 00:46.040
one means that the person has left the company.

00:46.420 --> 00:54.250
And here you can clearly see that there are around seven thousand rows where the person does not

00:54.250 --> 00:57.550
leave the company, but three thousand rows where the person leaves the company.

00:59.500 --> 01:06.910
So when we look at the ratio, it is two point four, which implies that there is a huge difference

01:06.910 --> 01:07.640
between these classes.

01:07.640 --> 01:12.640
So we will probably have to use a weighted classification.
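
NOTE
A minimal sketch of the class-balance check described above, in plain Python.
The label counts are rounded, illustrative figures (not the real data), and the
class_weight formula is one common assumption, not necessarily the lecturer's.
```python
from collections import Counter
# Illustrative labels only: 0 = stayed, 1 = left the company.
labels = [0] * 7000 + [1] * 3000
counts = Counter(labels)
# Ratio of the majority class to the minority class.
ratio = counts[0] / counts[1]
# Hypothetical per-class weights: inversely proportional to class frequency.
class_weight = {cls: len(labels) / (len(counts) * n) for cls, n in counts.items()}
print(counts, round(ratio, 2), class_weight)
```
A ratio well above one suggests weighting the minority class more heavily during training.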

01:13.240 --> 01:18.610
So next we will have a look at different columns.

01:18.640 --> 01:23.950
So there are only two columns which have the object type, that is, the non-numerical columns.

01:24.190 --> 01:29.540
So the object columns are sales and salary.

01:30.910 --> 01:35.680
So the next thing which we will be doing is one-hot encoding.

01:36.880 --> 01:40.600
So this is the same code which you have seen several times.

01:40.780 --> 01:45.000
So we are converting this into a dummy variable.

01:45.010 --> 01:48.850
So we have converted the categorical columns into dummy variables.
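
NOTE
A plain-Python sketch of what converting a column into dummy variables means.
The helper one_hot and the sample rows are hypothetical; only the column name
"salary" comes from the lecture, which most likely uses a library call instead.
```python
def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = int(row[column] == cat)
        encoded.append(new_row)
    return encoded
# Made-up rows for illustration.
rows = [{"salary": "low", "left": 1}, {"salary": "high", "left": 0}, {"salary": "medium", "left": 0}]
encoded = one_hot(rows, "salary")
print(encoded[0])
```
Each category becomes its own column, so the number of columns grows after encoding.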

01:50.620 --> 01:58.360
Now, when we have a look at the shape of this particular data, we can see the number of columns and

01:58.450 --> 02:00.790
around 10000 rows of it.

02:02.890 --> 02:15.250
Next, we are splitting the data into X_train and y_train by dropping the left column from the data frame and

02:15.820 --> 02:20.190
keeping only the left column in y_train.

02:23.250 --> 02:29.340
Next is simply some code which will give you a report of the model which we have just run.

02:29.760 --> 02:37.050
And next, what we are doing here is we are defining the different parameters which we have, for example.

02:37.080 --> 02:40.080
So here I am simply implementing XGBoost.

02:40.440 --> 02:44.620
You will be implementing different algorithms and comparing amongst them.

02:45.030 --> 02:47.130
So this is the process which we will be following.

02:47.140 --> 02:54.930
You will be implementing different algorithms: logistic regression, decision trees, random forest, XGBoost,

02:55.620 --> 02:57.300
naive Bayes.

02:57.720 --> 03:03.650
You will try all of those and then find out which works best on this particular dataset.

03:04.050 --> 03:07.920
And after that, you will be fine-tuning that particular model.

03:07.920 --> 03:12.640
If it still does not perform, then you will be stacking different models.

03:14.490 --> 03:19.270
So these are the different types of parameters which are present in XGBoost.

03:20.340 --> 03:28.500
What I have done here is I have, first of all, implemented RandomizedSearchCV to find out how

03:28.500 --> 03:36.870
my model would actually perform, and I have limited n_iter so that out of so many combinations

03:36.870 --> 03:46.410
only n_iter combinations will be created, and I will be getting the model performance out of only

03:46.780 --> 03:47.640
these.
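
NOTE
A stdlib-only sketch of the idea behind a randomized search: evaluate only
n_iter randomly chosen combinations instead of the full grid. The helper
sample_candidates and all candidate values are assumptions for illustration.
```python
import itertools
import random
# Hypothetical search space; the names mirror common XGBoost parameters,
# but the candidate values are made up.
param_grid = {"n_estimators": [100, 500, 900], "max_depth": [4, 8, 12], "learning_rate": [0.05, 0.1, 0.3]}
def sample_candidates(grid, n_iter, seed=0):
    """Pick n_iter distinct parameter combinations at random from the full grid."""
    keys = sorted(grid)
    all_combos = [dict(zip(keys, values)) for values in itertools.product(*(grid[k] for k in keys))]
    return random.Random(seed).sample(all_combos, n_iter)
candidates = sample_candidates(param_grid, n_iter=5)
print(len(candidates), "of", 3 ** 3, "combinations will be evaluated")
```
Each sampled combination would then be scored with cross-validation and the best one kept.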

03:49.080 --> 03:53.760
So I am simply fitting the model and getting the report out of it. The report

03:53.760 --> 03:58.290
states that I have eighty three percent accuracy here.

03:59.220 --> 04:02.630
Next is eighty three point seven, then eighty three point six.

04:02.910 --> 04:05.700
So the best one comes out to be eighty three point seven.

04:05.700 --> 04:12.690
And these are the parameters which I have obtained from this. Now, to fine-tune this particular model,

04:12.690 --> 04:16.600
I will be tuning these parameters sequentially.

04:17.940 --> 04:24.410
So what I have done here is I have selected one parameter at a time, or two or three parameters at a time,

04:24.770 --> 04:28.680
and then I am simply fine-tuning each and every parameter like that.

04:29.160 --> 04:32.460
So first, I have tuned the number of estimators.

04:32.850 --> 04:36.100
I have fit the model and got the values regarding that.

04:36.120 --> 04:40.470
So once I fit the model, I find that n_estimators of nine hundred looks best.

04:40.470 --> 04:43.960
But even n_estimators of one hundred also looks very good.

04:45.060 --> 04:51.240
So what we do is we select something which we find is useful enough.

04:51.240 --> 04:55.350
There is not much difference between n_estimators of

04:55.350 --> 04:56.600
nine hundred and a hundred.

04:56.910 --> 05:02.580
So we are picking 500 because that is a good number for n_estimators.

05:02.580 --> 05:02.790
Right.

05:02.840 --> 05:03.630
We will keep fine-tuning.

05:04.620 --> 05:08.980
So what we do for that is we will fix the learning rate again.

05:09.000 --> 05:17.850
And here I have just kept the learning rate at zero point one, and I will be picking the gamma and max_depth values

05:17.850 --> 05:20.490
and fine-tuning these two parameters.

05:20.940 --> 05:29.280
So here, I have fixed the learning rate, n_estimators and subsample, and I am fine-tuning on top of it.

05:29.670 --> 05:33.630
And then I get these values regarding the parameters.

05:33.630 --> 05:35.520
I get the max_depth to be 12.

05:35.520 --> 05:37.950
I am going to do it again next.

05:37.950 --> 05:39.660
I will fix these two values

05:39.660 --> 05:42.480
and fine-tune some other parameters, and so on.

05:42.480 --> 05:44.050
This cycle would keep on going.
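
NOTE
A sketch of the sequential tuning cycle described above: tune one parameter,
fix its best value, then move to the next. The function tune_sequentially and
toy_score are hypothetical; toy_score is a made-up stand-in for a real
cross-validated score and does not reflect the lecture's actual results.
```python
def tune_sequentially(score, grid, start):
    """Tune one parameter at a time, fixing each best value before moving on."""
    best = dict(start)
    for name, values in grid.items():
        best[name] = max(values, key=lambda v: score({**best, name: v}))
    return best
# Hypothetical scoring function; a real run would train and validate a model here.
def toy_score(p):
    return -abs(p["n_estimators"] - 500) / 1000 - abs(p["max_depth"] - 12) / 10 - p["gamma"]
grid = {"n_estimators": [100, 500, 900], "max_depth": [4, 8, 12], "gamma": [0.0, 0.1, 0.2]}
best = tune_sequentially(toy_score, grid, {"n_estimators": 100, "max_depth": 4, "gamma": 0.0})
print(best)
```
This is cheaper than searching every combination, at the cost of possibly missing interactions between parameters.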

05:44.340 --> 05:46.500
So this is something that you will have to work on.

05:47.880 --> 05:55.950
And finally, when I keep on doing this entire process again and again, I end up with various values

05:55.950 --> 05:59.220
for these hyperparameters, which I keep fixing.

06:00.090 --> 06:04.420
And at the end I am left with an optimized model.

06:04.440 --> 06:09.090
So here you can see the mean validation score is eighty four point one.

06:09.930 --> 06:14.910
When I look at the next one here, again, the mean validation score is eighty four point one with a standard

06:14.910 --> 06:17.520
deviation of zero point zero two one.

06:20.040 --> 06:27.150
If we compare this with the previous one, here we have a mean of zero point eight four one with a standard

06:27.150 --> 06:29.750
deviation of zero point zero one seven.

06:31.290 --> 06:38.170
Here we have zero point eight four zero with standard deviation of zero point zero zero.

06:38.370 --> 06:44.580
So if you compare between these different models, you can pick any model out of these where we

06:44.580 --> 06:52.740
have this eighty four point one, which seems to be a better one, and select that particular model.

06:52.750 --> 06:57.540
Next, what I am doing is I am simply getting the best model out of it.

06:58.080 --> 07:03.400
And from the best model, I will simply find out the cross-validation score.

07:03.750 --> 07:09.960
So these are the different cross-validation scores for this particular model, and I am getting the mean and

07:09.960 --> 07:16.110
standard deviation out of them. The mean score comes out to be three point sixty six and the standard

07:16.110 --> 07:18.590
deviation comes out to be zero point one one.
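
NOTE
A small sketch of summarising cross-validation scores by their mean and
standard deviation. The fold scores are made up for illustration, and the
use of the population standard deviation (pstdev) is an assumption.
```python
from statistics import mean, pstdev
# Made-up fold scores; real values would come from cross-validating the best model.
cv_scores = [0.82, 0.85, 0.84, 0.83, 0.86]
cv_mean = mean(cv_scores)
cv_std = pstdev(cv_scores)
print(f"mean={cv_mean:.3f} std={cv_std:.3f}")
```
A small standard deviation relative to the mean suggests the model performs consistently across folds.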

07:19.020 --> 07:21.480
So this is the model

07:21.690 --> 07:29.490
which I have for you. You can try a different strategy, combine different models together and create

07:29.490 --> 07:32.880
your own model, but this looks like a good base for reference.