WEBVTT

00:01.290 --> 00:08.010
This particular project is relevant to the automobile industry, so here we will be working on the safe

00:08.020 --> 00:09.520
driver classification.

00:09.900 --> 00:18.090
So this is regarding the insurance companies, who place a lot of importance on data science, as predictive

00:18.090 --> 00:21.090
algorithms help them keep the premium low.

00:21.420 --> 00:31.080
So data has always been at the core of what insurance companies do for analyzing the claims, to find

00:31.080 --> 00:34.980
out what claim is relevant, which claim is not relevant.

00:35.340 --> 00:42.030
So here this particular data set contains around 30,000 rows and around 17 different features related

00:42.030 --> 00:47.760
to the driver itself so that the insurance company can actually predict if the driver would be a safe

00:47.760 --> 00:52.680
driver or not, and then decide if they should be giving out the insurance or not.

00:53.820 --> 00:56.070
So this is the particular data set, and here

00:56.080 --> 00:57.990
you need to perform a classification.

01:00.750 --> 01:08.310
So here we have this target value, which is zero or one: one being the driver is safe and insurance

01:08.310 --> 01:09.240
would be provided,

01:09.240 --> 01:13.230
and zero being the driver is unsafe. And relevant to this,

01:13.440 --> 01:18.560
there are different columns such as gender, engine details, credit history.

01:18.960 --> 01:23.100
Then we have years of experience, claims, marital status,

01:23.100 --> 01:29.330
vehicle type, and different other values, and there are different range-type columns.

01:29.490 --> 01:34.240
So all these columns you need to convert into numeric type.

01:34.290 --> 01:37.470
You need to find out a good way to convert these into numeric.

01:38.460 --> 01:42.570
Then there is other information, like the credit history bucket.

01:42.780 --> 01:46.680
So these are different columns which you have for this particular data.

01:46.890 --> 01:52.950
And your target here is to find out if the driver is safe or not.

01:54.060 --> 02:02.760
So the main concern here would be to convert these different categorical variables into numeric variables,

02:02.760 --> 02:08.160
that is, to create dummy variables, and how you handle this age criteria.

02:08.400 --> 02:14.310
So maybe, first of all, you will have to find out different age criteria and then see what are the

02:14.310 --> 02:15.480
ranges present.

02:15.480 --> 02:20.430
And then maybe you can create dummies for each range or something like that.

02:20.460 --> 02:24.960
So this is something which you need to think about on how you handle this.

02:25.490 --> 02:26.670
So thank you.

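NOTE
The dummy-variable idea described above can be sketched roughly as follows.
This is a minimal illustration in plain Python, not the course's own solution.
The rows, the column names (age_range, gender, safe) and the range labels are
hypothetical stand-ins, and make_dummies is a helper invented for this sketch.
```python
# Hypothetical sample rows; column names and range labels are assumptions, not the real data set.
rows = [
    {"age_range": "16-25", "gender": "male", "safe": 0},
    {"age_range": "26-39", "gender": "female", "safe": 1},
    {"age_range": "40-64", "gender": "female", "safe": 1},
]
def make_dummies(records, column):
    """Return (categories, encoded rows) with one 0/1 indicator per category of `column`."""
    # First find out which categories (for example, which age ranges) are present.
    categories = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        # Copy every other column unchanged and drop the categorical one.
        out = {k: v for k, v in r.items() if k != column}
        # Add one numeric indicator column per category.
        for c in categories:
            out[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(out)
    return categories, encoded
age_categories, encoded_rows = make_dummies(rows, "age_range")
_, encoded_rows = make_dummies(encoded_rows, "gender")
print(encoded_rows[0])
```
After both calls every remaining value is numeric, so the rows could be passed
to a classifier. Whether one indicator per age range is the best encoding for
this data is left open, as the narration suggests.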