WEBVTT

00:01.080 --> 00:07.410
Now, let us discuss the second project which we have. The second project is for a classification

00:07.410 --> 00:07.860
problem.

00:12.090 --> 00:19.140
For this project, again, you will have to handle outliers, find the missing values, create new columns,

00:19.140 --> 00:22.460
find relevant columns and try different algorithms.

00:22.890 --> 00:26.630
But now the metrics which you will be using will be different.

00:27.850 --> 00:33.790
And another thing which you will have to keep in mind is the classes in this particular data.

00:35.140 --> 00:40.990
This is the data which we have, and in this particular data we have around

00:43.390 --> 00:52.840
thirty-two columns, and out of these columns the target value is Class. As you can see, the count

00:52.840 --> 01:00.310
of the number of rows is two lakh eighty-four thousand eight hundred and seven, while the sum of the

01:00.320 --> 01:08.860
Class value is four hundred and ninety-two, which clearly shows that there are a large number of zero values

01:09.010 --> 01:13.650
and only a few rows of data which have a one present in them.
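
NOTE
Editorial sketch, not part of the spoken lecture. It illustrates the check described above:
for a 0/1 label column, the row count gives the dataset size and the column sum gives the
number of positive (fraud) rows. The toy labels and the helper name class_balance are
hypothetical; they are not the real dataset.
```python
from collections import Counter
def class_balance(labels):
    """Return (rows per class, fraction of rows whose label is 1)."""
    counts = Counter(labels)
    total = sum(counts.values())
    positive_rate = counts.get(1, 0) / total if total else 0.0
    return counts, positive_rate
# Hypothetical toy labels: 0 = genuine, 1 = fraud.
toy_labels = [0] * 998 + [1] * 2
counts, positive_rate = class_balance(toy_labels)
# For 0/1 labels, the column sum equals the number of positive rows.
fraud_rows = sum(toy_labels)
print(len(toy_labels), fraud_rows, positive_rate)
```
A very small positive rate like this is what the lecture means by the data being imbalanced.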

01:14.500 --> 01:23.320
And this dataset is a credit card fraud dataset, which clearly shows that most of the users are genuine.

01:23.500 --> 01:30.130
while only four hundred and ninety-two examples are there which are fraud data.

01:31.220 --> 01:32.020
The aim here,

01:32.290 --> 01:40.240
the main aim, will be to find out these fraudulent data points correctly.

01:40.690 --> 01:48.880
Hence we will be looking out for the recall of these classes, and hence the metric which should be used

01:48.880 --> 01:51.960
for this particular problem should be recall.
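
NOTE
Editorial sketch, not part of the spoken lecture. It shows why recall, rather than accuracy,
is the informative metric on imbalanced labels: a model that always predicts "genuine" scores
high accuracy but zero recall on the fraud class. The toy data and function names are
hypothetical.
```python
def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
# Hypothetical toy data: 98 genuine rows, 2 fraud rows.
y_true = [0] * 98 + [1] * 2
# A useless baseline that never flags fraud.
always_genuine = [0] * 100
baseline_accuracy = accuracy(y_true, always_genuine)
baseline_recall = recall(y_true, always_genuine)
print(baseline_accuracy, baseline_recall)
```
The baseline looks good on accuracy and finds none of the frauds, which is the case recall exposes.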

01:53.290 --> 02:02.590
Another thing which should be taken care of is that this data is imbalanced in nature, hence for using

02:02.710 --> 02:11.470
and selecting the class weight, it should be selected appropriately. Further, you can decide if you want to

02:11.470 --> 02:14.230
keep this Time column or not.
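
NOTE
Editorial sketch, not part of the spoken lecture. On the class-weight point above, one common
heuristic (an assumption here, the lecture does not name a specific scheme) is the "balanced"
weighting n_samples / (n_classes * rows_in_class), which gives the rare class a larger weight.
The helper name balanced_class_weights and the toy labels are hypothetical.
```python
from collections import Counter
def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * rows_in_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}
# Hypothetical toy labels: 998 genuine rows, 2 fraud rows.
toy_labels = [0] * 998 + [1] * 2
weights = balanced_class_weights(toy_labels)
print(weights)
```
Weights like these would typically be passed to whichever classifier is being tried, if it
accepts per-class weights; other weightings are equally valid choices to experiment with.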

02:14.500 --> 02:21.370
Compare all of these columns which are present and select the columns which are important.

02:22.210 --> 02:28.210
You will be applying the same techniques which have been discussed, or you can explore other

02:28.210 --> 02:28.830
techniques.

02:29.050 --> 02:35.770
The main target here is to find out the fraudulent credit card transactions correctly.

02:35.930 --> 02:42.510
That is, to predict those four hundred and ninety-two points correctly.

02:42.910 --> 02:45.370
So hence you will split the data.

02:45.370 --> 02:54.310
Or, if you want to, you can use cross-validation, and you can use different mechanisms, but the main target

02:54.310 --> 02:58.680
here is to identify these as accurately as possible.
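
NOTE
Editorial sketch, not part of the spoken lecture. When splitting imbalanced data, a stratified
split (an assumption here, the lecture only says to split the data) keeps the fraud share
roughly equal in the train and test parts, so the test part is not left without fraud rows.
The function name stratified_split and the toy labels are hypothetical.
```python
import random
def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling test_fraction of each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # At least one row per class goes to the test part; a class with a
        # single row would therefore be absent from the train part.
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)
# Hypothetical toy labels: 990 genuine rows, 10 fraud rows.
toy_labels = [0] * 990 + [1] * 10
train_idx, test_idx = stratified_split(toy_labels)
test_frauds = sum(toy_labels[i] for i in test_idx)
train_frauds = sum(toy_labels[i] for i in train_idx)
print(len(train_idx), len(test_idx), train_frauds, test_frauds)
```
The same per-class idea carries over to cross-validation folds.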

03:00.370 --> 03:04.670
So this is the next project, which is the classification project.

03:04.990 --> 03:05.590
Thank you.