WEBVTT

00:02.290 --> 00:07.240
Now the next thing which we need to understand is the shapes of distributions.

00:08.270 --> 00:15.880
Now a data distribution shows all the possible values of the data and how often each value occurs.

00:20.200 --> 00:22.000
So what is a distribution?

00:22.180 --> 00:28.150
So let's say we have this particular data, which is 3, 4, 5, 7, 8, 8, 9,

00:28.150 --> 00:31.810
11, 12, 14, 15, 17, 18, 19, 19.

00:32.110 --> 00:38.190
So what we do here is we plot this on a graph.

00:39.760 --> 00:42.310
And we try to create a curve out of it.

00:44.070 --> 00:51.930
So three occurs once and then four occurs once, so these values will have just a constant height,

00:52.230 --> 00:55.890
and at eight, because it has a higher number of occurrences,

00:56.100 --> 00:59.700
the curve will go up a little and then come down.

00:59.700 --> 01:02.880
Then nine has one value, eleven has one value, and so on.
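
NOTE
Editorial sketch, not part of the spoken lecture: one possible way to tabulate how often each value in the example data occurs. It assumes Python and uses only the standard library; the variable names are illustrative.
```python
from collections import Counter
# The fifteen values read out in the lecture.
data = [3, 4, 5, 7, 8, 8, 9, 11, 12, 14, 15, 17, 18, 19, 19]
# Count how often each distinct value occurs.
frequency = Counter(data)
# Print a crude text histogram, one row per distinct value.
for value in sorted(frequency):
    print(f"{value:>2} | {'#' * frequency[value]}")
```
Only 8 and 19 occur twice, so their rows are the only ones with two marks; every other row has one.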

01:04.910 --> 01:05.450
So.

01:07.450 --> 01:10.360
These are the different types of distributions which we have.

01:11.290 --> 01:20.260
For normally distributed data, the mean, median and mode are all equal and therefore are all appropriate measures

01:20.260 --> 01:22.690
of central tendency. For skewed data,

01:22.930 --> 01:26.790
the median may be a more appropriate measure of central tendency.

01:28.640 --> 01:31.040
So we have the bell-shaped curve.

01:31.400 --> 01:38.330
So initially the smaller values have a low number of occurrences, and then the number of

01:38.330 --> 01:42.320
occurrences actually increases and then it goes down again.

01:42.710 --> 01:47.420
It could be something like, let's say height of students in a class.

01:47.690 --> 01:54.860
So there would be fewer students with low height and fewer students with very high

01:54.860 --> 01:55.190
height.

01:55.700 --> 02:02.030
But then the average height would be the most common one for the students; it would not be, you know, a smaller number,

02:02.030 --> 02:04.220
it would be a middle number.

02:05.740 --> 02:10.240
This is called the bell-shaped curve, which is also the normal distribution.

02:11.020 --> 02:17.050
Then there could be a triangular distribution; there could be a uniform distribution, where all the values

02:17.050 --> 02:18.850
have the same number of occurrences.

02:19.420 --> 02:21.490
Then there is the reverse J curve,

02:21.760 --> 02:24.010
the J curve, and right skewed.

02:24.850 --> 02:32.230
Right skewed because it is kind of stretched out and skewed out towards the right hand side.

02:33.010 --> 02:37.510
This is left skewed because it is stretched out towards the left hand side.

02:38.620 --> 02:42.790
Bimodal, which has two modes,

02:43.240 --> 02:43.770
that is,

02:43.780 --> 02:50.620
it has two peaks to it, and multimodal, which has multiple peaks to it.

02:52.980 --> 03:00.300
Now for normally distributed data, which is basically this kind of data, the mean, median and mode

03:00.600 --> 03:01.560
are the same.

03:03.010 --> 03:09.250
So all of these, the mean, median and mode, are appropriate measures of central tendency.

03:09.250 --> 03:15.340
While when we talk about the left skewed or right skewed, the mean value

03:17.210 --> 03:21.230
and the mode value will be different from each other.

03:21.440 --> 03:25.310
So median will be a more appropriate measure.

03:25.340 --> 03:26.810
We'll see how that happens.

03:27.050 --> 03:32.400
But for now, you can understand that median will be a more appropriate measure of central tendency.

03:34.200 --> 03:36.330
Now let us look at normal distribution.

03:37.510 --> 03:45.010
So a normal distribution, sometimes called the bell curve, is a distribution that occurs naturally in

03:45.010 --> 03:46.180
many situations.

03:46.870 --> 03:50.770
For example, the bell curve is seen in scores of students.

03:51.520 --> 03:57.310
So when we are talking about scores of students, the bulk of the students will score the average,

03:57.430 --> 03:58.300
the C value.

03:58.930 --> 04:04.720
Basically, there would be a lot of students who will score the average value.

04:04.900 --> 04:11.200
While there will be a smaller number of students who will score B or D, and there will be a very small

04:11.200 --> 04:14.170
number of students who will be scoring an F or an A.

04:15.010 --> 04:19.160
So this will create a distribution which will resemble the bell curve.

04:20.230 --> 04:22.480
The bell curve is symmetrical in nature.

04:22.840 --> 04:27.850
And half of the data will fall on the left side of the mean and half of the data will fall on the right

04:27.850 --> 04:28.360
side of the mean.

04:30.020 --> 04:33.060
And many groups will follow this type of pattern.

04:34.350 --> 04:38.880
So this is a highly used distribution, the normal distribution.

04:39.240 --> 04:42.690
So let us see some characteristics of the bell curve.

04:43.020 --> 04:44.520
It is created on the basis of two parameters,

04:44.520 --> 04:46.710
the mean and the standard deviation.

04:48.470 --> 04:50.090
And it can be followed

04:51.270 --> 04:52.710
for various purposes,

04:53.010 --> 04:55.380
like, let us say, scholarship distribution.

04:55.770 --> 04:58.190
So let us say we have this bell curve.

04:59.410 --> 05:00.580
This is the bell curve.

05:01.030 --> 05:04.240
And in this bell curve, we have this kind of a distribution.

05:04.450 --> 05:12.130
So the largest number of students will have the C rating, and then there will be fewer students who will

05:12.130 --> 05:19.630
have the B or D, and there will be a very small number of students

05:20.140 --> 05:24.640
who would have got an F or an A. Right. Now,

05:24.730 --> 05:27.100
let us say we want to distribute scholarships.

05:28.150 --> 05:36.340
Then what we can do is we can select the students on the basis of the topmost ratings.

05:36.760 --> 05:44.710
So this is a distribution where the middlemost area, which is the mean value plus one standard deviation

05:44.710 --> 05:46.600
and minus one standard deviation,

05:46.600 --> 05:51.900
the area between these two, has 68.27% of the data.

05:53.250 --> 06:01.710
While between the mean plus twice the standard deviation and the mean minus twice the standard deviation,

06:01.950 --> 06:12.030
this entire area has about 95% of the entire data, and the mean minus three standard deviations to the mean plus three

06:12.030 --> 06:15.690
standard deviations has 99.73% of the data.
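
NOTE
Editorial sketch, not part of the spoken lecture: the three percentages can be checked from the normal cumulative distribution function. This assumes Python 3.8 or later for statistics.NormalDist; the helper name share_within is made up for this illustration.
```python
from statistics import NormalDist
# Standard normal: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```
Because every normal distribution is a shifted and scaled copy of the standard one, the same shares hold for any mean and standard deviation.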

06:16.290 --> 06:25.050
So what we can do is we can give the scholarship to any student who has scored more than the mean plus the standard

06:25.050 --> 06:25.650
deviation.

06:27.010 --> 06:31.780
Or let's say we want to fire people from the organization.

06:32.620 --> 06:41.830
Then we can pick the people with a rating less than the mean minus the standard deviation and do that.

06:43.590 --> 06:48.060
This is one of the ways in which we can monitor the performance of the people at work.

06:50.840 --> 06:57.050
So the bell curve can be followed for various purposes, like scholarship distribution, finding the best

06:57.050 --> 07:01.940
performers or the worst performers, or finding out the segments of the data.
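
NOTE
Editorial sketch, not part of the spoken lecture: one way the mean plus or minus one standard deviation cutoffs could be applied. The names and scores are hypothetical, invented only for illustration, and the population standard deviation (pstdev) is an assumption; the lecture does not say which variant is meant.
```python
from statistics import mean, pstdev
# Hypothetical scores, invented purely for illustration.
scores = {"Asha": 52, "Ben": 61, "Chen": 64, "Dev": 66, "Eli": 68, "Fay": 70, "Gus": 73, "Hana": 88}
values = list(scores.values())
center = mean(values)
spread = pstdev(values)
# Candidates scoring above the mean plus one standard deviation.
top = [name for name, score in scores.items() if score > center + spread]
# Candidates scoring below the mean minus one standard deviation.
bottom = [name for name, score in scores.items() if score < center - spread]
print("upper cutoff:", round(center + spread, 2), "selected:", top)
print("lower cutoff:", round(center - spread, 2), "selected:", bottom)
```
With this made-up data only one name lands in each tail; most scores sit within one standard deviation of the mean.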

07:05.940 --> 07:07.410
Now what is kurtosis?

07:08.610 --> 07:17.250
Kurtosis is a statistical measure that defines how heavily the tails of a distribution differ from

07:17.250 --> 07:19.140
the tails of a normal distribution.

07:19.470 --> 07:23.480
So we have this particular distribution and it has this tail.

07:24.470 --> 07:24.790
Right.

07:25.100 --> 07:29.510
This tail is like the tail of a normal distribution.

07:29.870 --> 07:37.370
Now, the tail may be highly extended, or the tail may not be extended and it is a very small tail.

07:37.610 --> 07:42.680
So if you want to understand how the tail is, then we can use kurtosis for that.

07:44.300 --> 07:51.080
So kurtosis identifies whether the tails of a given distribution contain extreme values or

07:51.080 --> 07:51.350
not.
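
NOTE
Editorial sketch, not part of the spoken lecture: a common way to put a number on tail weight is excess kurtosis, the fourth standardized moment minus 3, so that a normal distribution scores 0. The population formula and both sample lists below are assumptions chosen for illustration.
```python
from statistics import mean
def excess_kurtosis(values):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment (variance)
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0
# Hypothetical data: evenly spread values, with no extreme tail values.
light_tails = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Hypothetical data: mostly central values plus two extreme ones.
heavy_tails = [-10, -1, 0, 0, 0, 0, 0, 1, 10]
print("light tails:", round(excess_kurtosis(light_tails), 2))
print("heavy tails:", round(excess_kurtosis(heavy_tails), 2))
```
A negative result indicates lighter tails than a normal distribution and a positive result indicates heavier tails.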

07:53.790 --> 07:56.100
So these are the types of kurtosis.

07:56.310 --> 07:58.150
One is leptokurtic,

07:59.350 --> 08:01.210
which is having a

08:03.390 --> 08:12.810
larger number of values in the tails, and it is having more values in the middle as well.

08:13.260 --> 08:16.950
So in this, the middlemost values will have a very high peak.

08:19.300 --> 08:23.200
While in the case of mesokurtic, it is just like a normal distribution.

08:23.770 --> 08:30.270
And this is platykurtic, which has a very small number of values in the tails.

08:32.360 --> 08:36.910
And the leptokurtic has many extreme values in the tails.

08:40.300 --> 08:42.340
Now let us discuss skewness.

08:43.580 --> 08:50.630
We saw this in the image of the skewness: we had the left skewed and we had the right skewed.

08:51.440 --> 08:53.360
So let us discuss the skewness.

08:53.960 --> 09:00.650
So skewness refers to the distortion or asymmetry that deviates from a symmetrical bell curve, or a normal distribution,

09:00.650 --> 09:01.700
in a set of data.

09:02.000 --> 09:05.450
If the curve is shifted towards the left or to the right,

09:06.800 --> 09:14.150
it is said to be skewed. Skewness can be quantified as a representation of the extent to which a given

09:14.150 --> 09:16.670
distribution varies from the normal distribution.

09:16.940 --> 09:22.910
So we have a normal distribution and we want to see how much this normal distribution has been pulled

09:22.910 --> 09:24.890
towards the right or the left-hand side.

09:25.160 --> 09:29.240
We calculate this using skewness. A normal distribution

09:29.240 --> 09:30.800
will have skewness zero.
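
NOTE
Editorial sketch, not part of the spoken lecture: one common skewness measure is the third standardized moment. The population formula and the three small data sets are assumptions made for illustration.
```python
from statistics import mean
def skewness(values):
    """Population skewness: third central moment divided by variance to the power 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5
symmetric = [1, 2, 3, 4, 5]                    # mirror image around 3
right_skewed = [1, 1, 2, 2, 3, 10]             # long tail towards larger values
left_skewed = [-x for x in right_skewed]       # same shape, mirrored
for name, sample in (("symmetric", symmetric), ("right", right_skewed), ("left", left_skewed)):
    print(name, round(skewness(sample), 3))
```
A symmetric sample gives zero, a tail stretched to the right gives a positive value, and a tail stretched to the left gives a negative value.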

09:32.720 --> 09:40.310
So here we have negatively skewed, normal with no skew, and positively skewed. A normal distribution has no

09:40.310 --> 09:40.760
skew.

09:40.970 --> 09:45.110
So for this, the mean, median and mode will be at the same value.

09:46.510 --> 09:49.060
And this is a perfectly symmetrical distribution.

09:49.840 --> 09:55.270
When we have the negatively skewed, it is skewed towards the negative side, towards the left.

09:56.080 --> 10:00.970
And here we have the positively skewed where it is skewed towards the right hand side.

10:01.630 --> 10:09.730
So when we have the negatively skewed distribution, the mode will be higher than the median

10:10.150 --> 10:13.390
and the median will be higher than the mean value.

10:14.310 --> 10:22.110
While in the case of positively skewed, the mean will be greater than the median and the median will be greater

10:22.110 --> 10:23.490
than the mode value.
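
NOTE
Editorial sketch, not part of the spoken lecture: the ordering of mean, median and mode can be checked on small samples. Both samples are hypothetical, and the ordering is a rule of thumb for single-peaked data rather than a guarantee for every data set.
```python
from statistics import mean, median, mode
# Hypothetical right-skewed sample: a few large values stretch the right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image of the same sample, giving a left-skewed shape.
left_skewed = [16 - x for x in right_skewed]
for name, sample in (("right skewed", right_skewed), ("left skewed", left_skewed)):
    print(name, "mode:", mode(sample), "median:", median(sample), "mean:", mean(sample))
```
In the right-skewed sample the mode is smallest and the mean is largest; mirroring the data reverses the order.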

10:30.590 --> 10:36.560
Now the next thing which we need to talk about is correlation and covariance.

10:37.280 --> 10:39.350
This is a very important topic.

10:40.360 --> 10:49.030
So both of these measures capture the relationship and the dependency between two variables.

10:49.900 --> 10:52.290
So let's say we have two variables.

10:52.300 --> 10:56.500
One is age of a person and another is the income of the person.

10:57.190 --> 11:02.890
So in an ideal situation, as the age of the person increases, the

11:03.860 --> 11:06.260
income of the person will also increase.

11:07.250 --> 11:14.150
So here there is a positive relationship between the age and the income.

11:14.900 --> 11:15.590
While

11:15.800 --> 11:17.240
when we talk about

11:18.450 --> 11:26.460
the age and the health of the person, as the age of the person increases, the health will deteriorate.

11:27.450 --> 11:29.490
So what are these measures?

11:29.490 --> 11:31.230
Correlation and covariance.

11:31.500 --> 11:35.760
They measure the relationship and dependency between two variables.

11:35.760 --> 11:42.150
That is, whether the two variables are dependent on each other, or whether they are changing together or not.

11:42.870 --> 11:49.440
Covariance indicates the direction of a linear relationship between the variables, so covariance

11:49.440 --> 11:55.440
will tell whether the values are changing in the positive direction or they are changing in the

11:55.440 --> 11:55.880
negative

11:55.890 --> 12:00.630
direction. And correlation will give the direction also.

12:00.810 --> 12:04.710
And along with that it will also give the strength of the relationship.

12:05.040 --> 12:11.550
It tells whether the values are very highly dependent on each other and are strongly changing together, or there

12:11.550 --> 12:17.370
is only a very minor relationship between the changes in the values.
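
NOTE
Editorial sketch, not part of the spoken lecture: sample covariance and the Pearson correlation coefficient written out by hand. The age, income and health figures are hypothetical, and the choice of the sample (n minus 1) formula is an assumption.
```python
from math import sqrt
from statistics import mean
def covariance(xs, ys):
    """Sample covariance: average product of deviations, using n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
def correlation(xs, ys):
    """Pearson correlation: covariance rescaled to lie between -1 and 1."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))
# Hypothetical figures, invented purely for illustration.
age = [25, 30, 35, 40, 45, 50]
income = [30, 36, 41, 47, 52, 60]   # in thousands
health = [90, 85, 82, 75, 70, 62]   # arbitrary health score
print("age vs income:", round(covariance(age, income), 1), round(correlation(age, income), 4))
print("age vs health:", round(covariance(age, health), 1), round(correlation(age, health), 4))
# Changing units rescales the covariance but leaves the correlation unchanged.
income_units = [v * 1000 for v in income]
print("rescaled:", round(covariance(age, income_units), 1), round(correlation(age, income_units), 4))
```
The sign of both numbers gives the direction; only the correlation, which is unit-free, can be read as a strength.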

12:18.540 --> 12:26.820
So here you can see that we have these two values, and here the correlation coefficient has a value of

12:26.820 --> 12:31.320
0.9996, which is a very, very high value.

12:31.590 --> 12:36.510
So here there is a strong positive linear correlation.

12:36.840 --> 12:41.100
So these values are changing together and they're changing very strongly.

12:42.260 --> 12:42.670
Okay.

12:42.860 --> 12:44.630
In a positive way.

12:44.630 --> 12:46.730
Here the value is negative.

12:46.730 --> 12:47.680
It is close to minus one.

12:47.690 --> 12:52.640
So it is a strong negative relationship, negative because it has a negative sign.

12:52.910 --> 12:56.360
So when one value increases, the other value decreases.

12:57.390 --> 13:05.550
And here in strong positive relationship, when one value increases, the other value also increases.

13:06.660 --> 13:12.170
And if the value had been close to zero, it would mean that the values have a weak relationship

13:12.180 --> 13:12.600
between them.

13:13.350 --> 13:15.300
So if the value is lower,

13:16.710 --> 13:20.000
here it is 0.64 and there it is 0.99,

13:20.220 --> 13:23.460
So a lower value means that they have a weak relationship.

13:24.090 --> 13:31.260
Strong relationship is depicted by a higher value, close to one or close to minus one.

13:31.800 --> 13:40.230
While a weak value would be closer to 0.5 or something like that. If the value is very close to zero,

13:40.470 --> 13:44.640
this will mean that there is no linear relationship between these two values.

13:46.430 --> 13:52.820
So here at -0.58, it means that there is a negative relationship, but it is weak.

13:53.830 --> 13:57.700
Here the relationship value is 0.64.

13:57.700 --> 14:02.740
So there is a positive relationship, but it is a weak relationship.

14:04.630 --> 14:08.590
So this is about the descriptive statistics.

14:09.430 --> 14:09.760
Thank you.
