WEBVTT

00:02.290 --> 00:04.720
In this session, we will discuss the central limit theorem.

00:06.130 --> 00:10.160
So, first of all, let us discuss the frequency distribution.

00:10.510 --> 00:12.610
What is the probability distribution?

00:13.650 --> 00:22.500
So the frequency distribution gives the exact frequency or the number of times a data point occurs,

00:23.160 --> 00:30.780
while the probability distribution gives the probability of occurrence of the given data point. When

00:30.780 --> 00:38.490
the number of test cases is large, the frequency distribution and probability distribution

00:38.610 --> 00:40.350
are similar in shape.

00:42.390 --> 00:47.230
So let us see what the difference is between the frequency distribution and the probability distribution.

00:48.000 --> 00:50.320
So let's say we have these data points.

00:51.000 --> 00:54.390
Now, these data points range from 10 to 18.

00:55.660 --> 01:07.100
And the frequencies of these data points are two tens, three elevens, five twelves, six thirteens, four

01:07.300 --> 01:16.550
fourteens, zero 15 values, fourteen 16 values, ten 17 values and six 18 values.

01:17.410 --> 01:23.560
And the total number of these values is 50.

01:24.500 --> 01:32.280
So there are in total 50 values, and this is the number of times each particular value occurs.

01:32.840 --> 01:36.500
So 16 occurs 14 times in the data.

01:36.620 --> 01:39.140
So 16 is the mode of the data.

01:40.010 --> 01:45.620
Similarly, we can find out the values and how often they occur and calculate the mean and median.

01:47.090 --> 01:55.670
When you want to find out the relative frequency, the relative frequency will give you the probability of

01:56.090 --> 01:57.290
the number occurring.

01:58.040 --> 01:59.450
So let us say

01:59.600 --> 02:03.070
we want to find out the relative frequency of ten occurring.

02:03.590 --> 02:08.660
So the relative frequency of ten occurring will be two out of 50.

02:09.580 --> 02:17.770
So two times out of 50, ten will occur; three times out of 50, 11 will occur.

02:18.920 --> 02:28.280
Five times out of 50 will be the occurrence of 12. Similarly, 14 times out of 50, 16 will occur.

02:28.790 --> 02:33.440
So these are the relative frequencies.

02:33.500 --> 02:40.430
These are the fractions of the time at which these data values occur, and the complete sum of these

02:40.430 --> 02:43.250
relative frequencies comes out to be one.

02:43.900 --> 02:51.070
And when we divide these values by the total, we actually get the probability of the value occurring.

02:51.860 --> 02:59.150
So the frequency actually helps us in deriving the probability of the data occurring.

03:00.610 --> 03:09.940
So the probability of ten occurring is zero point zero four, which means that there is a four percent chance

03:10.240 --> 03:11.800
of ten occurring

03:12.740 --> 03:21.080
when we randomly pick a value from this particular population. Similarly, there is a 12 percent

03:21.080 --> 03:23.870
chance of 13 occurring.

03:24.910 --> 03:33.720
Here there is a 28 percent chance of 16 occurring when we take a sample out of this data.
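
A minimal sketch of this calculation in Python, assuming the counts are as narrated (the zero for 15 and the ten for 17 are inferred so that the total comes to 50, and the variable names are only for illustration):

```python
# Frequency table as narrated: value -> number of times it occurs.
# Assumption: the counts for 15 (zero) and 17 (ten) are inferred so the total is 50.
counts = {10: 2, 11: 3, 12: 5, 13: 6, 14: 4, 15: 0, 16: 14, 17: 10, 18: 6}

total = sum(counts.values())

# Relative frequency = frequency divided by the total number of values.
relative = {value: count / total for value, count in counts.items()}

# The mode is the value with the highest frequency.
mode = max(counts, key=counts.get)

print("total:", total)
print("mode:", mode)
for value in (10, 13, 16):
    print(value, "occurs with probability", relative[value])
```

The relative frequencies add up to one, which is why they can be read as probabilities when we pick a value at random.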

03:35.140 --> 03:42.880
So this is what the probability is and this is what the frequency is, and we can plot these together,

03:42.880 --> 03:46.460
frequency distribution and the probability distribution.

03:46.660 --> 03:53.960
So when we plot the frequency distribution, the area under the frequency distribution actually provides

03:53.960 --> 03:56.040
the probability of the value occurring.

04:03.020 --> 04:11.150
So here you can see this is the frequency distribution plot where the frequency of every value is given

04:11.870 --> 04:14.540
here. The frequency is almost zero here;

04:14.540 --> 04:19.460
here the frequency is 20, and here the frequency is almost 22.

04:20.000 --> 04:26.480
And out of this, the probability distribution plot is created, which is this line, the curve, which is

04:26.480 --> 04:26.800
given.

04:27.110 --> 04:30.750
And this will give the probability of a particular value occurring.

04:30.950 --> 04:38.180
So if we want to find out the probability of one occurring, then we can find out the area below this

04:38.180 --> 04:46.030
particular point, below this particular curve, and it will give the probability of one occurring.

04:46.250 --> 04:51.580
And similarly, if we want to find out the probability of a value occurring below 1.0,

04:51.860 --> 04:57.230
then we can find out the area below this particular part of the curve.

04:57.530 --> 04:59.750
Then it will give the probability of a value below that point occurring.

05:00.260 --> 05:05.510
Similarly, if you want to find out the probability of a value occurring between A and B, then we can find

05:05.510 --> 05:12.260
out the area below the curve between A and B, and it will give the probability of a value lying between

05:12.260 --> 05:13.370
A and B.

05:14.560 --> 05:18.400
So this is what the distribution curve will provide us.

05:19.760 --> 05:27.500
Now, what is the distribution of sample means? Now we are getting close to the central limit theorem.

05:27.860 --> 05:30.890
Now we know what the frequency distribution is.

05:31.130 --> 05:33.580
We know what probability distribution is.

05:33.740 --> 05:37.400
So let's see what is the distribution of sample means.

05:38.030 --> 05:47.420
So the distribution of sample means is the collection of sample means for all the possible random samples

05:47.570 --> 05:49.400
of a particular size n.

05:50.930 --> 05:58.820
So what we are doing is we are creating a collection of sample means.

05:58.910 --> 06:00.970
We are creating a lot of samples.

06:01.160 --> 06:06.620
Let us say we create a hundred samples. From those hundred samples,

06:06.620 --> 06:09.200
we calculate different means.

06:09.440 --> 06:15.220
We take out the different means, and then we plot these means together.

06:16.070 --> 06:23.440
So we collect the sample means for all the possible random samples of a particular size

06:23.450 --> 06:28.400
n. So the samples which we will be taking from the population will have a common size.

06:28.580 --> 06:34.730
So the size will be n, and because the samples will be taken randomly, the values in the samples

06:34.730 --> 06:35.480
will be different.

06:36.680 --> 06:43.010
So because the values of the samples will be different, the mean value will also be different for all

06:43.010 --> 06:43.880
the samples.

06:44.950 --> 06:51.730
So we can obtain these samples and the means from the population and then we plot them together.

06:52.250 --> 07:00.400
Now it is not a distribution of scores, but a distribution of statistics, because we are not taking

07:00.400 --> 07:02.540
the mean of the entire population.

07:02.770 --> 07:06.240
We are not taking values from the entire population.

07:06.460 --> 07:14.980
We are only plotting the means of all this data, the means of the different samples.

07:16.440 --> 07:24.420
Now, when we do this, what will happen is the distribution will be kind of in a normal form. The

07:24.430 --> 07:33.570
distribution will be normal in nature, the curve will be normal in nature, because when we take out samples,

07:33.750 --> 07:39.900
the samples will be ranging, let's say we have a population and the values of the population range

07:39.910 --> 07:41.130
from one to a hundred.

07:42.080 --> 07:49.780
So now when we take out the samples, the sample values will be randomly distributed from one to a hundred.

07:51.070 --> 08:00.040
And we are taking the mean of these samples. Now, the mean of the samples will not always be the

08:00.040 --> 08:02.500
same. The mean of a sample sometimes would be 50.

08:02.500 --> 08:05.470
Sometimes it will be 51, sometimes it will be 52.

08:05.470 --> 08:06.980
Sometimes it will be 48.

08:07.000 --> 08:08.590
Sometimes it will be 42.

08:08.590 --> 08:16.330
Now, based on the sample and how the random sample is generated, those values will vary. A larger number

08:16.330 --> 08:24.700
of samples will have mean values closer to the center value, which is 50, and there will be very few samples

08:24.910 --> 08:30.250
for which somehow the mean value will be near the edges.

08:31.060 --> 08:38.280
So this is the reason why the curve which will be created from these means will be normal in nature.

08:38.680 --> 08:46.330
So it will be almost perfectly normal if either the population from which the samples are drawn is normal

08:46.600 --> 08:50.170
or the number of observations in each sample is relatively large.

08:51.250 --> 08:58.030
That is, basically, the size of the sample is large. So if the size of the sample is large,

08:58.240 --> 09:02.170
then the distribution has a mean that is equal to the population mean.
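
As a rough sketch of this idea, we can simulate it in Python. The population of whole numbers from one to a hundred follows the example above; the sample size of 30, the 2000 samples, the fixed seed and the 40 to 60 band are assumptions chosen only for illustration:

```python
import random
import statistics

rng = random.Random(0)  # fixed seed, only so the sketch is repeatable

population = list(range(1, 101))   # values from one to a hundred
sample_size = 30                   # n: assumed size of each sample
num_samples = 2000                 # assumed number of samples drawn

# Draw each sample at random (with replacement) and keep only its mean.
sample_means = [
    statistics.fmean(rng.choices(population, k=sample_size))
    for _ in range(num_samples)
]

population_mean = statistics.fmean(population)
mean_of_means = statistics.fmean(sample_means)

# Share of sample means that land near the center rather than near the edges.
near_center = sum(40 <= m <= 60 for m in sample_means) / num_samples

print("population mean:", population_mean)
print("mean of sample means:", mean_of_means)
print("share of sample means between 40 and 60:", near_center)
```

Most of the sample means should cluster around the population mean of 50.5, with only a few near the edges, which is the behaviour described above.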

09:04.400 --> 09:08.620
So let's look at some properties of normal curves.

09:09.140 --> 09:11.450
So what are the properties of the normal curve?

09:11.810 --> 09:19.520
A normal distribution is bell-shaped. The mean, median and mode will be equal and will be located in

09:19.520 --> 09:20.950
the center of the distribution.

09:21.560 --> 09:27.000
Then a normal distribution is unimodal, that is, it will have a single peak.

09:27.380 --> 09:29.060
It will be symmetrical in nature.

09:31.170 --> 09:37.880
Then the curve is symmetric about the mean, which is equivalent to saying that its shape is the same on both

09:37.890 --> 09:38.680
sides of the center of the curve.

09:38.970 --> 09:44.970
So if we have this line passing through the center, the shape of the curve towards the left will be the same as the

09:45.000 --> 09:46.530
shape of the curve on the right.

09:47.590 --> 09:53.380
Then the curve is continuous, that is, the values will range from some value to another value, but

09:53.380 --> 09:55.390
there will not be any gaps in between.

09:57.170 --> 10:00.420
Then the curve will never touch the x-axis. Here

10:00.440 --> 10:06.110
you can see it going on endlessly, but it is never touching the x-axis.

10:07.910 --> 10:14.670
So theoretically, no matter how far in either direction the curve extends, it never meets the x-axis.

10:16.330 --> 10:23.590
The total area under the normal distribution curve is equal to one, or a hundred percent. So the area under this

10:23.590 --> 10:29.320
entire curve is the probability, actually. So the probability of a value occurring under this

10:29.320 --> 10:30.040
curve will be as follows.

10:31.510 --> 10:35.860
It will range either from zero to 100 or from zero

10:35.860 --> 10:42.250
to one: zero to one will be the probability, and the probability percentage will be zero to 100 percent.

10:42.730 --> 10:43.050
Right.

10:43.390 --> 10:47.680
So this is about the normal curve, and the area under the curve

10:47.770 --> 10:50.320
gives the probability of a point falling under the curve.

10:50.950 --> 10:52.990
So if we have any point,

10:54.200 --> 11:00.540
then the area under the curve bounded by this particular line will give the probability of values up to that point occurring.

11:02.000 --> 11:08.240
Now, there are a few things about the normal curve. One is the area under the part of the normal curve

11:08.240 --> 11:10.400
that lies within one standard deviation of the mean.

11:10.910 --> 11:12.910
Now, this is the mean value.

11:12.950 --> 11:16.850
This is the mean, median and mode value, the central value.

11:17.910 --> 11:23.700
Now, this is one standard deviation away in the negative direction and one standard deviation away in the positive

11:23.700 --> 11:24.180
direction.

11:24.510 --> 11:29.790
So this is basically the area within one standard deviation.

11:30.120 --> 11:36.270
This one standard deviation area contains approximately sixty eight percent of the data.

11:38.090 --> 11:44.300
This one standard deviation region contains approximately 68 percent of the data.

11:46.490 --> 11:49.070
Then the two standard deviations.

11:50.260 --> 11:57.850
This is the two standard deviation area. It will contain almost 95 percent of the data, and the three standard

11:57.850 --> 12:02.380
deviation area will contain almost ninety nine point seven percent.

12:03.550 --> 12:12.660
This is the rule which is already defined for the normal curve, or the bell curve; this is the percentage

12:12.670 --> 12:13.340
distribution.
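
A small sketch to check these percentages, using `NormalDist` from Python's standard `statistics` module. The helper name `share_within` is only an illustration and is not something from the lecture:

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, "standard deviation(s):", round(share_within(k) * 100, 1), "percent")
```

The three areas come out at roughly 68, 95 and 99.7 percent, which matches the rule above.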

12:13.810 --> 12:16.980
So how can we get help from this? Now,

12:17.050 --> 12:20.480
I think we have already discussed this example, but let us talk about it again.

12:21.040 --> 12:23.870
So we have this kind of distribution in the normal curve.

12:24.040 --> 12:31.850
So let's say we want to find out the top two percent of people, the top

12:32.530 --> 12:39.340
two point two percent of people, who should be given a raise or should be given extra rewards for their

12:39.340 --> 12:39.820
hard work.

12:40.360 --> 12:41.630
So what will we do?

12:41.650 --> 12:44.880
We will find out the ratings and see the top ratings.

12:45.040 --> 12:50.800
And from the top ratings, we will find out the top two point two eight percent of people who are performing,

12:51.160 --> 12:54.170
and then we will be giving them a reward.

12:54.310 --> 12:57.610
So how can we do it? We can simply find out the mean value.

12:58.830 --> 13:01.020
Because we cannot really find this value, right?

13:01.040 --> 13:03.420
How will we know what it is going to be?

13:04.110 --> 13:10.310
So what we can do is we can simply find the mean value and we can find out the standard deviation.

13:10.470 --> 13:18.060
And based on the mean value plus two standard deviations, we can declare that whoever is getting a

13:18.080 --> 13:20.580
rating above the mean plus two standard deviations,

13:20.790 --> 13:21.890
we will give them a reward.

13:23.640 --> 13:30.030
Or, let's say, someone who has a rating below the mean minus two standard deviations, we will give them

13:30.030 --> 13:33.710
additional training so that they perform better next quarter.

13:34.770 --> 13:36.450
We can actually do something like that.
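
A minimal sketch of this cutoff idea in Python, assuming cutoffs at the mean plus and minus two standard deviations. The function name `split_by_two_sd` and the ratings are hypothetical and made up only for illustration:

```python
import statistics

def split_by_two_sd(ratings):
    """Return (reward, training) lists using mean +/- 2 standard deviations as cutoffs."""
    mean = statistics.fmean(ratings)
    sd = statistics.pstdev(ratings)      # standard deviation of the whole group
    upper = mean + 2 * sd                # above this: reward
    lower = mean - 2 * sd                # below this: additional training
    reward = [r for r in ratings if r > upper]
    training = [r for r in ratings if r < lower]
    return reward, training

# Hypothetical ratings, made up for illustration only.
ratings = [50] * 18 + [90, 10]
reward, training = split_by_two_sd(ratings)
print("reward:", reward)
print("training:", training)
```

For a normal curve, only about 2.3 percent of the values lie above the mean plus two standard deviations, so a cutoff like this picks out a small group at each end.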

13:38.140 --> 13:40.740
So this is the population distribution.

13:40.870 --> 13:47.620
So this is the normal population distribution curve, and for this particular curve, the mean is

13:47.800 --> 13:51.120
zero point one two; the mean is zero point one two.

13:51.940 --> 13:55.600
This is the distribution, and the mean value is zero point one two.

13:56.840 --> 14:05.990
The standard deviation is twenty point zero six. This is the standard deviation. For this plot, we have the mean

14:05.990 --> 14:10.070
as zero point one two and the standard deviation as twenty point zero six.

14:11.170 --> 14:17.370
Now, what we are doing is we are taking out several samples, so we will take, I would say, 50,

14:17.380 --> 14:23.760
100 or 200 or 500 samples out of this population data.

14:24.640 --> 14:28.010
Out of this entire data, we will take out different samples.

14:28.300 --> 14:35.190
Now we are taking out samples, so the values will anyhow range from minus hundred to plus hundred.

14:35.710 --> 14:40.040
OK, but we need to find out some sample values out of it.

14:40.300 --> 14:44.020
So what we are doing is we are taking out a few samples.

14:44.350 --> 14:48.150
So the values of the samples, you can see the distributions are different.

14:49.850 --> 14:56.500
See this, there are different distributions for the samples; they are not exactly the same.

14:57.290 --> 15:03.920
So what happens is the standard deviation is also slightly different for the samples. Here,

15:03.930 --> 15:06.470
the standard deviation is nineteen point something.

15:06.470 --> 15:09.700
Here, the standard deviation is eighteen point something and so on.

15:09.950 --> 15:13.580
So different samples will have different standard deviations.

15:14.730 --> 15:19.890
Similarly, these samples which we have generated will have some mean value.

15:20.910 --> 15:26.580
Now, this mean value will also be different. Here, the mean value is minus zero point one eight

15:26.580 --> 15:30.650
one. Here, the mean value is minus one point seven seven.

15:30.930 --> 15:33.740
Here, the mean value is zero point three.

15:33.930 --> 15:36.390
Here, the mean value is minus zero point zero four.

15:36.660 --> 15:38.980
So these mean values are also different.

15:39.810 --> 15:48.240
So what happens is when we take five hundred or two hundred or any number of such samples, then we

15:48.240 --> 15:49.530
create the.

15:51.950 --> 15:59.680
We create the sampling distribution out of it, which is the distribution of the sample means. So here

15:59.690 --> 16:04.910
what we have done is we have created a plot of all the means

16:05.950 --> 16:13.300
of the samples that we have obtained. So here we have certain mean values, and we have only plotted

16:13.300 --> 16:17.230
these x-bar values in this particular chart.

16:18.570 --> 16:22.080
OK, so now we have plotted these

16:23.070 --> 16:24.880
sample means here.

16:25.430 --> 16:29.100
Now, this is where the central limit theorem comes in.

16:30.820 --> 16:37.930
So what have we done? We had one population. Out of this population, we have taken out different

16:37.930 --> 16:44.700
samples. The sample size we are expecting to be more than 30, or at least 30, so that it is a good

16:44.710 --> 16:46.540
representative of the population.

16:47.540 --> 16:54.350
And we have created several samples. From these samples, we have calculated the mean value of each

16:54.350 --> 16:58.910
and every sample, so the mean value of each sample comes out to be x-bar.

16:58.940 --> 17:02.660
So we have, let's say, one hundred samples, one to one hundred.

17:02.660 --> 17:05.090
And we have the x-bar values from these samples.

17:05.780 --> 17:09.440
Now, from the samples, we have generated a plot.

17:10.250 --> 17:18.750
And this is actually a plot of the mean values, which are the x-bar values from these samples.

17:19.100 --> 17:24.950
So this is a plot of the different mean values from the different samples that we created.

17:24.960 --> 17:29.060
We created a distribution of means from 160 random samples.

17:29.060 --> 17:35.090
We have taken 160 random samples and taken out the mean value and plotted them here.

17:35.690 --> 17:38.890
Now each sample consists of one hundred and three observations.

17:39.140 --> 17:42.230
So the sample size here is one hundred and three.

17:42.530 --> 17:44.050
That is the n value.

17:44.180 --> 17:46.940
The n value here is one hundred and three.

17:47.510 --> 17:50.680
And the number of samples is 160.

17:51.230 --> 17:55.910
Now, from this sampling distribution, this distribution of means.

17:57.530 --> 18:00.890
The distribution of means is also called the sampling distribution.

18:01.160 --> 18:08.290
So from this sampling distribution, the mean is zero point zero eight.

18:09.770 --> 18:16.040
The calculated mean of this sampling distribution is zero point zero eight. Let us compare it with the

18:16.040 --> 18:17.050
population mean.

18:17.950 --> 18:20.890
Here, the mean is zero point one two.

18:22.680 --> 18:26.010
And the standard deviation is twenty point zero six.

18:27.380 --> 18:31.360
Here, the standard error is one point seven four.

18:31.880 --> 18:32.580
Now let us see.

18:33.800 --> 18:34.250
So.

18:36.180 --> 18:38.970
The central limit theorem states that

18:40.020 --> 18:42.780
the distribution of the sample mean,

18:43.990 --> 18:46.910
the sampling distribution, should be nearly normal.

18:47.860 --> 18:52.840
Which is true: the sampling distribution which we have created, which is the distribution of sample

18:52.840 --> 18:55.000
means, is nearly normal.

18:57.970 --> 19:04.240
And the mean of the sampling distribution should be approximately equal to the population mean.

19:05.110 --> 19:06.820
That is, the mean of the sampling

19:06.820 --> 19:11.500
distribution should be approximately equal to the population mean, which is zero point one two.

19:11.720 --> 19:12.520
So what is it?

19:15.450 --> 19:19.010
The mean of the sampling distribution is zero point zero eight.

19:20.290 --> 19:26.200
And the mean of the population is zero point one two, which is almost similar.

19:26.350 --> 19:28.360
The means are close to each other.

19:30.130 --> 19:36.970
Now, the standard error is the standard deviation of the sampling distribution. The standard error which has

19:36.970 --> 19:41.700
been calculated is actually the standard deviation of the sampling distribution.

19:42.980 --> 19:46.790
Standard error is the standard deviation of the sampling distribution.

19:47.030 --> 19:48.480
So what does that signify?

19:48.860 --> 19:57.920
This standard error is actually equal to the standard deviation of the population divided by the square

19:57.920 --> 20:01.030
root of the size of the sample.

20:01.910 --> 20:05.690
So twenty point zero six, which is the

20:07.440 --> 20:09.300
standard deviation of the population,

20:11.080 --> 20:14.770
twenty point zero six, the standard deviation of the population.

20:18.510 --> 20:22.470
And here we have the square root of one zero three, which is the

20:23.990 --> 20:27.380
one hundred and three observations. So here,

20:28.780 --> 20:29.650
one hundred and

20:32.010 --> 20:35.430
three is the n, which is the size of each sample.

20:36.840 --> 20:39.170
So this gives us roughly one point nine eight,

20:41.160 --> 20:41.730
which is close to

20:42.750 --> 20:48.150
one point seven four, so this is also roughly equivalent. So this is what we are getting from the

20:49.230 --> 20:50.430
central limit theorem.
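
The arithmetic here can be sketched in a few lines of Python. The three numbers are the ones quoted in the lecture; the variable names are only for illustration:

```python
import math

population_sd = 20.06   # standard deviation of the population, as quoted
n = 103                 # observations in each sample, as quoted
observed_se = 1.74      # standard deviation of the sample means, as quoted

# Standard error predicted by the formula: sigma / sqrt(n).
predicted_se = population_sd / math.sqrt(n)

print("predicted:", round(predicted_se, 2))
print("observed:", observed_se)
print("difference:", round(predicted_se - observed_se, 2))
```

The formula gives a value a little under two, while the observed value is 1.74, so the two are close but not identical. A gap of this size is not surprising when only 160 sample means are used to estimate it.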

20:51.590 --> 20:55.460
So we can find out the characteristics of the population

20:57.010 --> 21:02.950
from the sampling distribution. From the samples, we can create a sampling distribution

21:03.190 --> 21:05.810
and find out the characteristics of the population.

21:06.100 --> 21:12.400
So this is what the central limit theorem is, which is that the distribution of the sample mean should be nearly

21:12.400 --> 21:12.890
normal,

21:12.890 --> 21:18.790
and the mean of the sampling distribution should be approximately equal to the population mean.

21:19.480 --> 21:25.960
That is, the mean of this sampling distribution will be almost equal to

21:25.960 --> 21:27.910
the mean of the population distribution.

21:28.960 --> 21:36.550
And the standard error, which is the standard deviation of the sampling distribution, will be equivalent

21:36.550 --> 21:37.000
to the

21:37.990 --> 21:46.300
standard deviation of the population distribution divided by the square root of the sample size, the

21:46.300 --> 21:48.900
number of observations in each sample.
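
As a rough check of these statements, here is a small simulation sketch in Python. The mean, standard deviation, sample size and number of samples are the ones quoted in the lecture; modelling the population as a normal distribution and fixing the random seed are assumptions made only for this illustration:

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed, only so the sketch is repeatable

population_mean = 0.12   # as quoted in the lecture
population_sd = 20.06    # as quoted in the lecture
n = 103                  # observations per sample, as quoted
num_samples = 160        # number of samples, as quoted

# Assumption: the population is modelled as a normal distribution.
sample_means = [
    statistics.fmean([rng.gauss(population_mean, population_sd) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
standard_error = statistics.stdev(sample_means)   # spread of the sample means
predicted_se = population_sd / math.sqrt(n)       # sigma / sqrt(n)

print("mean of sample means:", mean_of_means)
print("observed standard error:", standard_error)
print("predicted standard error:", predicted_se)
```

The mean of the sample means should land near the population mean, and the observed standard error should land near the population standard deviation divided by the square root of n, though neither will match exactly with only 160 samples.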

21:50.300 --> 21:53.840
So this is what we have learned from the central limit theorem.