WEBVTT

00:01.450 --> 00:02.250
Hello, everyone.

00:03.730 --> 00:09.820
Now, we have gathered all of the important skills which are required to begin our journey into data

00:09.820 --> 00:10.300
science.

00:11.780 --> 00:20.690
We know what Python is and how to write code in it, and we are well aware of statistics, so now

00:20.840 --> 00:21.650
is the point

00:21.650 --> 00:24.550
when we will begin with the actual machine learning.

00:26.280 --> 00:34.130
And you must know that a major part of the machine learning lifecycle goes into data preparation.

00:36.140 --> 00:38.360
We have a lot of data in hand.

00:38.390 --> 00:49.970
We have data from the booths, from different websites or feedback forums, and these data are present

00:49.970 --> 00:52.820
in structured form and also in unstructured form.

00:54.430 --> 01:02.260
Structured forms could be Excel files or databases, and unstructured forms could be, maybe, a tweet.

01:03.430 --> 01:13.600
So all of this data needs to be collected and cleaned, and some specific data needs to be selected, which

01:13.600 --> 01:15.870
could actually be used for data.

01:17.190 --> 01:25.230
So in this particular module, we will discuss data preparation, which will include the following.

01:28.030 --> 01:31.310
What are the different types of features available?

01:32.440 --> 01:38.830
And once we have identified the different types of features, we will check if there is any type of

01:38.830 --> 01:41.680
missing, noisy, or outlier data present.

01:43.550 --> 01:49.160
And we will handle this type of data. Once we handle this data,

01:49.370 --> 01:58.340
whatever we have is complete data, but some of these columns might be useful and some may not

01:58.340 --> 01:58.550
be.

01:59.900 --> 02:07.700
And sometimes you might want to create new columns so that we can retrieve important information from

02:07.700 --> 02:07.940
that.

02:08.830 --> 02:16.810
So for that particular task, we will be creating new columns using the existing columns in hand, we

02:16.810 --> 02:23.200
will be selecting some columns out of all the features which we have, and we will be removing some

02:23.200 --> 02:25.090
columns which are not really required.

02:25.980 --> 02:32.310
So all of these things we will be doing as a part of data preparation, and finally,

02:33.320 --> 02:40.730
we will convert everything into numerical form. Now, there are different types of data: structured data and

02:40.730 --> 02:41.700
textual data.

02:41.900 --> 02:49.300
So as part of data preparation, we will learn how to handle numerical, categorical, structured data,

02:49.490 --> 02:52.310
the data which we get in tabular form.

02:52.730 --> 02:57.050
And we will also learn how we can work on text data.

02:57.410 --> 03:04.460
So these are two things which we will be learning in data preparation, and we will see how we can work

03:04.820 --> 03:09.620
using two very important libraries, that is, NumPy and pandas.

03:09.800 --> 03:18.140
And we will also learn how we can analyze the data using visualization libraries like Matplotlib and

03:18.170 --> 03:18.670
Seaborn.

03:20.750 --> 03:29.870
And for text data, we will see how we can use libraries like NLTK so that we can retrieve important

03:29.870 --> 03:33.090
attributes, important features from the text data.

03:33.740 --> 03:38.240
So I hope you will learn a lot from this particular session.

03:38.960 --> 03:39.590
Thank you.