WEBVTT

00:01.050 --> 00:02.380
Hello, dear.

00:02.430 --> 00:06.300
So far, we have discussed descriptive statistics.

00:07.340 --> 00:15.320
In descriptive statistics, we have learned about the measures of central tendency like mean, median,

00:15.500 --> 00:25.280
and mode. We have talked about the different types of distributions that we have and how the distributions

00:25.280 --> 00:31.160
vary with respect to the kurtosis which is present in them,

00:32.270 --> 00:35.300
or with respect to the skewness.

00:36.850 --> 00:45.580
Apart from that, we have discussed the measures of spread, which are variance and standard deviation.

00:46.630 --> 00:54.910
Now, all of these are part of descriptive statistics, which basically provide a summary of the data.

00:56.730 --> 01:05.220
But usually what we will be working with is not descriptive statistics, but inferential

01:05.220 --> 01:06.040
statistics.

01:06.900 --> 01:10.190
So let us look at what inferential statistics is.

01:14.890 --> 01:25.120
So inferential statistics consists of making inferences from samples to populations, hypothesis testing,

01:25.300 --> 01:31.900
determining relationships among variables, and making predictions, all of which are based on probability

01:31.900 --> 01:32.320
theory.

01:34.010 --> 01:40.640
In case of descriptive statistics, we organize the data and we summarize the data.

01:41.990 --> 01:50.900
In case of inferential statistics, we will be taking up a sample and based on the sample, we will

01:50.900 --> 02:00.590
make different inferences regarding the population data because usually we will not be able to find

02:00.590 --> 02:03.770
out everything about the population.

02:04.550 --> 02:11.870
Finding out the measures regarding the population is a very tedious task.

02:13.020 --> 02:18.570
For example, suppose we are talking about the salaries of all the people in the world.

02:19.790 --> 02:26.180
Now, calculating the salaries for all the people in the world will take a lot of effort.

02:27.250 --> 02:35.110
So instead of calculating and summing up the salaries of all people of the world, what we do is we

02:35.110 --> 02:39.250
take small samples of people from all over the world.

02:39.430 --> 02:47.980
And then, based on the data from these samples, we try to estimate the salaries which would

02:47.980 --> 02:50.660
belong to all the people of the world.

02:51.220 --> 02:53.680
So this is what inferential statistics is.

02:54.250 --> 03:00.790
We will have some sample data, and based on that sample we will try to find out about the population.

03:01.880 --> 03:09.440
Again, we will try to compare different samples and find out if they belong to the population or not

03:09.620 --> 03:14.010
or if there is some relationship between these two samples or not.

03:14.180 --> 03:18.920
So when we do these kinds of things, it is called inferential statistics.

03:21.610 --> 03:30.400
So why do we need this? Descriptive statistics describes the data, while inferential statistics

03:30.400 --> 03:38.440
allows us to make predictions, that is, inferences, from the data which we have. Now, with inferential

03:38.440 --> 03:39.250
statistics,

03:39.400 --> 03:46.690
you can take the data from samples and generalize about the population; that is, we can take the mean of

03:46.690 --> 03:51.830
the sample and then try to find out what would be the mean of the population itself.

03:52.090 --> 03:56.410
For example, I have a sample of people who are diabetic.

03:56.770 --> 04:05.590
So from their diabetes level, I can find out about the diabetes level of people across the whole population.

04:08.150 --> 04:09.650
So why do we need this?

04:11.130 --> 04:19.680
Inferential statistics provides a way of going from a sample to a population, inferring the parameters

04:19.680 --> 04:23.410
of a population from the statistics of the sample.

04:23.700 --> 04:30.630
So based on the statistics which belong to the sample, we are trying to find out the parameters of the

04:30.630 --> 04:31.880
entire population.

04:34.200 --> 04:41.960
Now, it is usually necessary for a researcher to work with samples rather than the whole population,

04:42.750 --> 04:50.910
but one difficulty is that a sample is generally not identical to the population which it comes from.

04:51.690 --> 04:59.280
So this is a difficulty, because the sample has to be representative of the whole population.

04:59.580 --> 05:04.950
We need to make sure that the sample is as similar to the population as possible, and that

05:06.050 --> 05:09.050
The sample actually represents the population.

05:09.800 --> 05:14.030
Now, another difficulty is that no two samples are the same.

05:14.300 --> 05:19.840
When we are taking some random values from a population, the values tend to differ.

05:20.180 --> 05:23.850
We cannot always get the same sample from the population.

05:24.470 --> 05:28.460
So how can we know which sample gives the best description?

05:28.460 --> 05:31.320
Which sample will actually describe the population?

05:31.880 --> 05:38.750
So this is why we need rules which will actually be able to relate the samples to the population.

05:41.630 --> 05:48.560
Now, what are the different estimation techniques? How do we actually estimate these values? There are

05:48.560 --> 05:57.230
two different approaches to estimation, one being point estimation and the other being interval

05:57.230 --> 05:57.920
estimation.

05:59.720 --> 06:07.580
In point estimation, we give one value for a characteristic, which is hopefully close to the unknown

06:07.580 --> 06:07.990
value.

06:08.660 --> 06:15.830
So in case of point estimation, what happens is it will try to provide a value for the mean, or it will try

06:15.830 --> 06:18.920
to provide a value for the standard deviation.

06:19.640 --> 06:25.610
But from that value, we are not really sure if the standard deviation of the sample

06:25.790 --> 06:32.180
will actually be the standard deviation of the population, or if the mean which we have calculated for the

06:32.180 --> 06:40.700
sample will actually be the mean of the population because the samples are generated randomly and because

06:40.700 --> 06:44.750
these samples are generated randomly, the values in the samples would differ.

06:44.990 --> 06:47.370
So the mean will vary each and every time.

06:48.110 --> 06:55.940
So we need to find a value which would be exactly equal to that of the population, or which is as close

06:55.940 --> 06:57.500
as possible to the population.
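
NOTE
A minimal sketch of the point above, assuming a made-up population of 10,000 values centred near 20 (the numbers and names are illustrative, not from the lecture). It draws several random samples and shows that each sample mean is a slightly different point estimate of the population mean.
```python
import random
import statistics
rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical population: 10,000 values centred near 20
population = [rng.gauss(20, 5) for _ in range(10_000)]
population_mean = statistics.mean(population)
# Each random sample of 50 values gives its own point estimate of the mean
sample_means = [statistics.mean(rng.sample(population, 50)) for _ in range(5)]
print("population mean:", round(population_mean, 2))
print("sample means:", [round(m, 2) for m in sample_means])
```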

06:59.110 --> 07:07.090
So we cannot actually expect to find the precise value describing the population when using only data

07:07.100 --> 07:07.850
of the sample.

07:08.230 --> 07:15.820
So when we are finding out some point estimates from the sample, we cannot be completely sure that

07:16.090 --> 07:21.460
the sample mean which we have calculated will always be equal to the population mean.

07:22.210 --> 07:27.970
This is something which we can very easily see because we are not sure how the sample has been generated

07:27.970 --> 07:31.700
and if this sample actually represents the population.

07:31.840 --> 07:35.770
So we need to draw several samples and do the calculation for each.

07:35.950 --> 07:43.780
And from those several samples, we will have to find the mean which will actually represent the population

07:43.790 --> 07:44.020
mean.

07:45.170 --> 07:50.210
So we cannot say much about a population from just one single sample.

07:50.900 --> 07:52.170
So what do we do now?

07:53.310 --> 08:00.220
The other method which we have is interval estimation. Now, what does interval estimation

08:00.300 --> 08:00.630
do?

08:01.080 --> 08:04.360
It gives an interval of likely values.

08:04.530 --> 08:12.750
So instead of saying that the mean is 20, it will say that the mean value for the population lies between

08:12.960 --> 08:16.120
18 and 22.

08:16.410 --> 08:25.180
And I am 90 percent sure that this will be correct, or I am 95 percent sure that this interval is correct.

08:25.770 --> 08:32.820
So what I'm trying to say here is that when I say 95 percent confident, I mean that I'm

08:32.820 --> 08:41.360
95 percent confident that the population mean is around 20, or rather, that the mean is between 18 and 22.

08:42.240 --> 08:52.800
So this means that if we do this sampling a hundred times, then out of those hundred samples, about 95 percent

08:52.800 --> 08:57.580
of the time, the interval we compute from the sample will contain the population mean.
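
NOTE
A minimal sketch of what 95 percent confidence means under repeated sampling. The true mean of 20, standard deviation of 5, sample size of 50 and the normal approximation for the interval are all assumptions made for illustration, not values from the lecture. It builds one interval per sample and counts how often the interval contains the true mean; the share should come out close to 0.95.
```python
import random
import statistics
rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SD, N, TRIALS = 20.0, 5.0, 50, 1000  # assumed values
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for 95 percent
hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    half_width = z * statistics.stdev(sample) / N ** 0.5
    # Does this sample's interval contain the true population mean?
    if mean - half_width <= TRUE_MEAN <= mean + half_width:
        hits += 1
coverage = hits / TRIALS
print("share of intervals containing the true mean:", coverage)
```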

08:58.580 --> 09:03.850
This is what interval estimation will actually allow us to say.

09:05.830 --> 09:15.250
So it will give you an interval of likely values where the width of the interval, which is from 18

09:15.250 --> 09:15.890
to 22.

09:15.910 --> 09:18.160
So here the width is four.

09:19.550 --> 09:26.000
So the width of the interval will depend on the confidence we require to have in this interval.

09:26.690 --> 09:30.020
So this will basically depend on how much confidence we want.

09:30.890 --> 09:34.350
So what will happen if I am very, very, very confident?

09:35.060 --> 09:44.090
So if I am saying that I'm 99 percent confident, I will have to have a larger window to be 99 percent

09:44.090 --> 09:44.630
confident.

09:44.630 --> 09:52.440
I will have to say something like: I'm 99 percent sure that the value lies between fifteen and twenty

09:52.490 --> 09:52.810
five.

09:53.420 --> 10:01.850
So in that case, I can be 99 percent sure. Now, for being 95 percent sure, the window will be smaller, because

10:01.850 --> 10:04.690
I am decreasing the window now.

10:04.910 --> 10:13.670
So by decreasing the window of the estimation, I will be a little less sure because I have to be very

10:13.670 --> 10:14.470
precise here.

10:14.900 --> 10:22.370
So if I have something like 18 to 22, so now I can be only 95 percent sure.

10:23.790 --> 10:32.280
And then when I'm saying my mean value is between 19 and 21, in that case, I am only 90 percent

10:32.280 --> 10:33.200
sure about this.

10:33.600 --> 10:36.600
So this is what interval estimation will provide us.

10:36.780 --> 10:41.530
The smaller the window we are trying to get, the less confidence we have.
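
NOTE
A minimal sketch of the trade-off between window size and confidence, assuming a hypothetical sample of 50 values and a normal approximation for the interval (neither comes from the lecture). For the same sample, the 90 percent interval is the narrowest and the 99 percent interval is the widest.
```python
import random
import statistics
rng = random.Random(2)  # fixed seed so the run is repeatable
sample = [rng.gauss(20, 5) for _ in range(50)]  # hypothetical sample
mean = statistics.mean(sample)
std_err = statistics.stdev(sample) / len(sample) ** 0.5
widths = {}
for level in (0.90, 0.95, 0.99):
    # Two-sided critical value for the chosen confidence level
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    widths[level] = 2 * z * std_err
    print(f"{level:.0%}: {mean - z * std_err:.2f} to {mean + z * std_err:.2f}")
```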

10:42.880 --> 10:49.570
So that is what usually happens, but we will actually see how we will derive this confidence, how

10:49.570 --> 10:55.750
we will be deriving the values and the intervals. So all these things are what we will be learning in

10:55.750 --> 10:59.230
these estimation techniques and in inferential statistics.

11:03.190 --> 11:09.250
So one thing to remember is that statistics never proves anything.

11:10.260 --> 11:23.270
So with statistics, can I say that all the people who are drinking coffee will actually have insomnia?

11:24.060 --> 11:26.030
No, I cannot say something like that.

11:26.370 --> 11:31.180
So it will not give me a causal relationship.

11:31.380 --> 11:32.780
It will not prove anything.

11:32.790 --> 11:35.910
It will just indicate a relationship: if

11:36.180 --> 11:38.760
the intake of coffee is high,

11:38.910 --> 11:41.280
then the chances of insomnia are also high.

11:41.430 --> 11:49.130
But I cannot say that the insomnia is caused by the

11:50.470 --> 11:51.620
high amount of coffee.

11:52.890 --> 11:59.850
So you see the difference here, when I say it will never prove anything.

12:00.000 --> 12:03.420
So here I am saying that it will not give a causal relationship.

12:03.870 --> 12:09.640
So an association does not necessarily indicate a cause-and-effect relationship.

12:09.900 --> 12:17.100
So what I'm trying to say is that I cannot say that coffee will cause insomnia, but I can only say

12:17.100 --> 12:21.780
using statistics that coffee and insomnia go hand in hand.

12:22.620 --> 12:24.000
So that is something which I can say.

12:25.570 --> 12:34.090
Now, statistics can always be wrong. However, there are things that researchers can do to improve

12:34.090 --> 12:40.090
the likelihood that the statistical analysis is correctly identifying a relationship between the variables.

12:40.450 --> 12:47.680
So here, what we will be doing is we will be trying to find out and trying to make as correct predictions

12:47.680 --> 12:48.570
as possible.

12:48.760 --> 12:50.930
But we are not really sure about that.

12:50.950 --> 12:56.580
We cannot be 100 percent sure, but we will try to be as precise as possible.

13:00.040 --> 13:03.190
So we will learn about video tapes in the next session.