WEBVTT

00:01.420 --> 00:04.840
In this session, we will discuss hypothesis testing.

00:07.290 --> 00:15.630
Till now, we have discussed the normal distribution, and we have discussed how we can locate

00:15.630 --> 00:23.520
a particular point on a normal distribution and find out the probability of that point occurring, and

00:23.730 --> 00:32.550
the probability of anything occurring below a particular point, that is, the

00:32.800 --> 00:35.970
percentage of values occurring below a particular point.

00:36.210 --> 00:41.640
And we saw how we can find the confidence interval at a particular confidence level.

00:43.370 --> 00:48.470
Now, hypothesis testing is defined in two terms.

00:49.560 --> 00:57.300
So in the case of hypothesis testing, we will have two terms: one will be the null hypothesis and the other

00:57.450 --> 01:00.120
will be the alternate hypothesis.

01:02.150 --> 01:08.360
The null hypothesis will be that the sample statistic is equal to the

01:09.370 --> 01:10.840
population statistic.

01:12.000 --> 01:19.890
So, for example, one null hypothesis would be that the average marks after an extra

01:19.890 --> 01:23.390
class are the same as before the class.

01:23.550 --> 01:25.380
So let's say we have a problem.

01:25.380 --> 01:33.630
We have a situation where we have certain students and these students are being given some sessions,

01:33.660 --> 01:38.420
some extra classes, and they obtain some marks.

01:38.490 --> 01:41.330
The students are scoring some marks in an exam.

01:42.830 --> 01:51.320
So we're trying to find out if the students will score just the same after some extra classes are

01:51.320 --> 01:52.840
given to the students or not.

01:53.710 --> 02:02.680
OK, so the null hypothesis will be that the students' marks will remain just the same, even after we provide

02:02.680 --> 02:03.960
an extra class to the students.

02:04.210 --> 02:07.350
So the treatment which we are providing here.

02:07.510 --> 02:14.450
So the null hypothesis always says that we are providing some treatment to the sample.

02:14.740 --> 02:16.830
We are doing something to the sample.

02:17.050 --> 02:20.470
But still, the sample does not show any response.

02:21.220 --> 02:23.760
The sample does not show any change.

02:24.340 --> 02:28.060
So the treatment here is that we are providing an extra class.

02:28.300 --> 02:29.410
Still, the

02:30.450 --> 02:36.480
metric which we are checking, the value which we are checking, that is, the average marks, the average

02:36.480 --> 02:38.490
marks remain just the same.

02:40.010 --> 02:47.870
So this is the null hypothesis: the null hypothesis is that whatever we were seeing earlier,

02:47.870 --> 02:55.330
before a particular treatment was applied, is still applicable even after providing the treatment.

02:57.090 --> 03:07.230
And the alternate hypothesis is that after we have provided an extra class, there will be a

03:07.380 --> 03:16.200
significant difference in the marks, the average marks, in comparison to the marks which were obtained

03:16.680 --> 03:17.980
before the extra class.

03:18.780 --> 03:27.120
So basically, what we are trying to say here is that the null hypothesis states that something is just the same

03:27.120 --> 03:29.580
even after a treatment has been provided.

03:29.940 --> 03:34.470
And the alternate hypothesis is that there will be a significant difference.

03:36.940 --> 03:40.300
Now, what do we mean when we say significant difference?

03:41.420 --> 03:50.180
So suppose I am saying that initially the students' marks were, let's say, 15 marks out of 20, so initially students

03:50.180 --> 03:55.040
were scoring 15 marks and now students are scoring 16.

03:56.000 --> 04:04.610
So this 16 is not a significant difference; a change of a very small value is not a significant difference.

04:04.940 --> 04:13.080
This difference can anyhow be present with a change in the sample, or within the confidence interval.

04:13.160 --> 04:19.160
Even if we have a confidence interval, the changed value could be present in the

04:19.160 --> 04:20.250
confidence interval.

04:20.480 --> 04:23.270
So a small change is not a significant change.
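
NOTE
Editor's sketch, not part of the spoken lecture. A minimal illustration, under
stated assumptions, of why a move from 15 to 16 marks may stay inside a
confidence interval while 18 does not. Only the means 15, 16 and 18 come from
the lecture; the standard deviation, the sample size and the helper name
confidence_interval are hypothetical choices made for this example.
```python
from statistics import NormalDist
BASELINE_MEAN = 15.0   # average marks before the extra class (from the lecture)
ASSUMED_SD = 4.0       # assumption: spread of marks, not given in the lecture
ASSUMED_N = 25         # assumption: number of students in the sample
def confidence_interval(mean, sd, n, level=0.95):
    """Return the (low, high) z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    margin = z * sd / n ** 0.5
    return mean - margin, mean + margin
low, high = confidence_interval(BASELINE_MEAN, ASSUMED_SD, ASSUMED_N)
for new_mean in (16.0, 18.0):
    inside = low <= new_mean <= high
    print(new_mean, "inside interval" if inside else "outside interval")
```
With these assumed numbers the interval is roughly 13.4 to 16.6, so 16 falls
inside it and 18 falls outside. Different assumptions would move the boundary.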

04:23.490 --> 04:31.260
A significant change will be if the confidence interval entirely changes, if we are able to say that

04:31.260 --> 04:37.400
the population which the sample was earlier representing is now not the same population.

04:37.610 --> 04:45.850
The new value of the average, the new mean, is now, let's say, 18 or 19.

04:45.950 --> 04:49.490
So now the population itself has changed.

04:49.700 --> 04:55.130
So now the population mean has also changed after the treatment which we have provided.

04:56.640 --> 05:02.370
So this is the difference between the null hypothesis and the alternate hypothesis: the null hypothesis

05:02.520 --> 05:06.840
is used to negate a treatment, to say that the treatment has not impacted anything,

05:07.080 --> 05:12.120
while the alternate hypothesis is to say that the treatment has actually made a significant difference.

05:14.630 --> 05:22.130
Now, let us look at how hypothesis testing is defined. Hypothesis testing is an inferential procedure

05:22.310 --> 05:27.960
that uses the sample data to evaluate the credibility of hypotheses about a population.

05:28.610 --> 05:35.930
So in this scenario, we will, first of all, state a particular statement and then we will try

05:35.930 --> 05:40.220
to evaluate if that particular statement is correct or not.

05:41.000 --> 05:46.100
Now, remember why we are creating this and why we are evaluating this.

05:46.490 --> 05:49.270
We cannot say a hypothesis is correct.

05:49.700 --> 05:52.760
We can only reject the null hypothesis.

05:52.940 --> 05:59.120
That is, we can only say that the statement is incorrect on the basis of evidence.

05:59.270 --> 06:02.800
We cannot say that the statement is correct in this scenario.

06:03.600 --> 06:09.980
Now, a hypothesis is an educated guess about something in the world around you.

06:10.580 --> 06:15.470
It should be testable in nature, either by experiment or by observation.

06:16.480 --> 06:23.710
For example, a new medicine that you think might work, or a way of teaching that might be better, or

06:23.710 --> 06:27.320
a possible location of a new species. Or let's say

06:27.340 --> 06:33.180
we want to find out if introducing music improves learning or not.

06:33.370 --> 06:36.870
So we introduce some music at the beginning of the videos.

06:37.060 --> 06:40.290
And will that improve your learning or not?

06:41.230 --> 06:49.690
So obviously, anything which we can put to a test can be a hypothesis.

06:58.360 --> 07:06.910
Now, what is a hypothesis statement? If you are going to propose a hypothesis, it is important to

07:06.910 --> 07:08.130
write the statement.

07:08.980 --> 07:18.330
The statement will look like if I do something, then something may happen to the dependent variable.

07:18.520 --> 07:24.790
That is, if we do something to an independent variable, then something will happen to the dependent

07:24.790 --> 07:25.360
variable.

07:26.600 --> 07:32.780
For example, if I decrease the amount of water given to the herbs, then the herbs will increase in

07:32.780 --> 07:33.170
size.

07:34.310 --> 07:43.310
So here the amount of water is the independent variable, while the size of the herb is the dependent

07:43.310 --> 07:47.910
variable, which changes with the amount of water which we are giving to it.

07:48.740 --> 07:55.040
Similarly, if I give patients counselling in addition to the medication, then their overall depression

07:55.040 --> 07:56.480
scale will decrease.

07:56.690 --> 08:03.140
Or if I give the exam at noon instead of seven, then the students' test scores will improve. So we can have

08:03.140 --> 08:05.870
these kinds of different statements created.

08:07.010 --> 08:16.550
Now, a good hypothesis will always have an if-then statement, and it will include both the independent

08:16.550 --> 08:23.810
and the dependent variable, and it has to be testable by experiment, survey, or any other scientifically

08:23.810 --> 08:24.170
sound technique.

08:27.250 --> 08:30.550
Now, here we have seen this curve before.

08:31.530 --> 08:32.850
Earlier, we used to

08:34.240 --> 08:42.900
analyze this particular plot for finding out the critical region or for finding out the confidence interval,

08:43.660 --> 08:52.500
but now we will be using this to test a hypothesis so we will have a particular significance level.

08:53.810 --> 08:59.660
This significance level will say how significantly the value is

09:00.680 --> 09:03.030
different from the original value,

09:03.050 --> 09:07.500
how much significant difference is present between the new value and the original value.

09:07.910 --> 09:12.490
So in this scenario, what happens is we will have a mean value.

09:12.500 --> 09:16.280
We will have some value which is present inside this curve.

09:16.340 --> 09:19.790
We will have some value which is present in this distribution.

09:20.950 --> 09:28.000
So let us say we have a group of students who were

09:28.220 --> 09:31.660
studying for an average of three hours.

09:32.860 --> 09:38.410
So the average of three hours that the students were studying was the mean value.

09:39.630 --> 09:45.530
So a student learns for three hours on average, so the mean value becomes three here.

09:47.310 --> 09:56.460
Now we want to find out the effect of a change in the learning pattern. We want to change the learning pattern

09:56.460 --> 10:04.860
of those students and introduce a scenario where we will train students by converting the lectures

10:04.860 --> 10:10.740
into music, so we will convert the content of the lectures into a music form.

10:10.920 --> 10:17.610
And then we want to check if the average learning time of the students, the time for which they learn continuously,

10:17.610 --> 10:18.570
changes or not.

10:19.290 --> 10:27.140
So what we can do is this: we will have the original data, where the mean of the learning time

10:27.150 --> 10:31.260
will be three hours, and we provide a treatment.

10:31.530 --> 10:37.830
That is, we change the learning process by introducing music in the lectures.

10:38.140 --> 10:48.030
And we check if the new mean of the learning time of the students

10:48.240 --> 10:49.350
has changed or not.

10:50.070 --> 10:51.810
So now let's say

10:51.990 --> 10:56.430
the new mean is actually three point five hours.

10:56.430 --> 11:01.500
The students now learn for three point five hours instead of three hours.

11:02.990 --> 11:03.440
So.

11:04.410 --> 11:13.260
How we accept or reject this hypothesis actually depends on whether this three point five hours

11:13.410 --> 11:17.340
actually falls in this significant region or not.

11:18.490 --> 11:20.740
And this, again, is defined by the

11:21.650 --> 11:27.770
confidence which we want to have, by the significance level which we want to have, the p-value threshold

11:27.770 --> 11:28.820
which we want to set.

11:29.770 --> 11:36.940
So if the p-value threshold we want to have is zero point zero five, then it should fall in this particular

11:36.940 --> 11:37.350
region.

11:37.720 --> 11:44.830
If we want to have a threshold of zero point zero one in p-value, then it should fall in this particular

11:44.830 --> 11:45.240
region.

11:45.460 --> 11:52.230
And if we want an even smaller threshold, then it should fall in this region.

11:53.980 --> 11:56.980
So based on that, we will have different values.
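
NOTE
Editor's sketch, not part of the spoken lecture. The regions pointed at on the
plot can be expressed as critical z-values on a standard normal curve. This
assumes a two-tailed test, which the lecture does not state explicitly. The
levels 0.05 and 0.01 are the ones named in the lecture; 0.001 is an assumed
example of an even smaller level, and two_tailed_critical_z is a hypothetical
helper name.
```python
from statistics import NormalDist
def two_tailed_critical_z(alpha):
    """z beyond which the two tails together hold probability alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)
# Smaller alpha pushes the rejection region further out into the tails.
critical = {alpha: two_tailed_critical_z(alpha) for alpha in (0.05, 0.01, 0.001)}
for alpha, z in critical.items():
    print(f"alpha={alpha}: reject when |z| > {z:.3f}")
```
The cut-offs come out near 1.960, 2.576 and 3.291, so a stricter threshold
demands a sample mean further from the original mean before we reject.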

12:00.720 --> 12:01.980
So let us check for the.

12:03.520 --> 12:10.510
Now, the general structure of hypothesis testing: every hypothesis test will have a null hypothesis,

12:10.510 --> 12:17.680
an alternate hypothesis, and the data which we have. The null hypothesis, again, will be a statement

12:17.860 --> 12:21.900
about the population parameter, which we want to prove wrong.

12:22.300 --> 12:28.870
And the alternate hypothesis is a general statement about the population parameter, which is opposed to

12:29.170 --> 12:31.090
the null hypothesis.

12:31.330 --> 12:37.120
So the null hypothesis and the alternate hypothesis have to be opposite statements.

12:37.390 --> 12:45.340
These statements should be contradictory in nature, and the null hypothesis is something which we want to

12:45.340 --> 12:49.090
actually evaluate and we want to prove wrong.

12:51.080 --> 12:59.780
So we want to reject the null hypothesis, which is why we want to create this null hypothesis as something

12:59.780 --> 13:06.200
which we want to prove wrong, and the alternate hypothesis will always be opposite to the null hypothesis.

13:08.880 --> 13:17.340
So now let us see how to state the null hypothesis clearly. We have a question: a

13:17.370 --> 13:25.230
researcher thinks that if knee surgery patients go to physical therapy twice a week instead of three

13:25.230 --> 13:28.580
times, their recovery period will be longer.

13:29.310 --> 13:33.970
The average recovery time for a knee surgery patient is eight point two weeks.

13:34.620 --> 13:36.750
So this is what a researcher thinks.

13:36.960 --> 13:44.040
A researcher thinks that if someone goes for therapy twice a week instead of three times,

13:44.190 --> 13:46.170
then the recovery period will be longer.

13:47.180 --> 13:50.270
So they want to prove this wrong.

13:50.480 --> 13:58.130
So if the researcher is wrong, then the recovery time is less than or equal to eight point two weeks.

13:59.730 --> 14:08.610
So the researcher wants to say that if the physical therapy is reduced to

14:08.610 --> 14:15.450
twice a week, then the recovery period would be longer.

14:15.840 --> 14:22.120
So what we want to do here is create a null hypothesis and try to prove the researcher

14:22.120 --> 14:28.290
wrong here. To actually evaluate whether what the researcher thinks is correct or not,

14:28.290 --> 14:32.520
the null hypothesis will oppose this statement by the researcher.

14:32.910 --> 14:40.320
So the researchers say that if knee surgery patients go for physical therapy twice a week, then the

14:40.320 --> 14:41.890
recovery period will be longer.

14:41.910 --> 14:49.790
So the opposite of this will be that the recovery period will be less than or equal to eight point two

14:49.800 --> 14:50.130
weeks.

14:50.670 --> 14:53.190
So this is what the null hypothesis will be.

14:54.420 --> 15:01.560
A null hypothesis will state the opposite of what the researcher thinks, so that we can actually reject

15:01.980 --> 15:03.370
the null hypothesis.

15:03.630 --> 15:09.390
So we are saying that the recovery time is less than or equal to eight point two weeks.

15:10.360 --> 15:16.180
And the alternate would be completely opposite to this statement. So what would the opposite of this statement

15:16.180 --> 15:17.040
be? Here,

15:17.140 --> 15:20.490
the statement is: mu is less than or equal to eight point two weeks.

15:20.740 --> 15:24.840
So the opposite of this will be: mu is greater than eight point two weeks.

15:25.120 --> 15:28.460
So the average recovery time is more than eight point two weeks.

15:29.230 --> 15:34.780
So this is what the null hypothesis and the alternate hypothesis look like.
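
NOTE
Editor's sketch, not part of the spoken lecture. The knee surgery hypotheses
(null: mu is at most 8.2 weeks, alternate: mu is greater than 8.2 weeks) imply a
right-tailed test. The lecture gives only the 8.2 weeks figure; the sample mean,
standard deviation, sample size and the choice of a z-test with a known standard
deviation are all assumptions made for illustration.
```python
from statistics import NormalDist
MU_0 = 8.2           # from the lecture: average recovery time in weeks
SAMPLE_MEAN = 8.9    # assumption: mean recovery time with therapy twice a week
SIGMA = 1.8          # assumption: known population standard deviation
N = 36               # assumption: number of patients sampled
ALPHA = 0.05         # assumption: chosen significance level
# H0: mu <= 8.2   versus   H1: mu > 8.2 (right-tailed test)
z = (SAMPLE_MEAN - MU_0) / (SIGMA / N ** 0.5)
p_value = 1 - NormalDist().cdf(z)   # probability in the upper tail only
reject_h0 = p_value < ALPHA
print(f"z={z:.3f}, p={p_value:.4f}, reject H0: {reject_h0}")
```
Because the alternate hypothesis only looks at longer recovery times, only the
upper tail is used. With these made-up numbers the null hypothesis is rejected.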

15:36.770 --> 15:44.560
So what we want to do here is we want to check if the researcher is right or wrong.

15:45.590 --> 15:54.620
So using hypothesis testing, we can only show whether the researcher is wrong or not; we cannot say that

15:54.650 --> 15:58.830
the researcher is correct, but we can only say that the researcher is wrong.

15:59.690 --> 16:03.320
So for that, we evaluate this null hypothesis.

16:03.650 --> 16:06.750
We state that the researcher is actually wrong.

16:06.770 --> 16:10.000
So we created this statement that the researcher is wrong.

16:10.520 --> 16:15.230
So we can either reject this or accept this.

16:16.660 --> 16:19.330
So this is what the null hypothesis will allow us to do.

16:19.840 --> 16:29.470
Now let us look further and create more statements. We have another study, of the status of critically

16:29.470 --> 16:37.150
ill children in a pediatric intensive care unit, which examined 16 children.

16:37.390 --> 16:40.170
So the number of children examined is 16 here.

16:40.780 --> 16:48.100
And in these children [inaudible], it was found that the mean number of missing teeth was one

16:48.100 --> 16:52.000
point two, and the standard deviation was one point nine.

16:53.950 --> 17:01.270
Now, extensive analysis has established that the mean number of such teeth in the wider population of children

17:01.570 --> 17:03.910
is one point four. So,

17:05.220 --> 17:14.550
for example, we have the mean value of one point two and the standard deviation of one point nine, and the

17:15.550 --> 17:24.280
actual value is one point four. So we want to check if the mean value for the critically ill children

17:24.280 --> 17:27.410
is different from that of the actual population or not.

17:27.880 --> 17:35.560
So this is the actual population and we want to check if this mean value, which we have found out is

17:35.680 --> 17:37.310
different from this one or not.

17:38.260 --> 17:45.550
So we want to show here that both of these belong to two different populations instead of

17:45.560 --> 17:46.540
the same population.

17:48.570 --> 17:56.070
So for that, we will actually have to show that this one point two is significantly different from

17:56.070 --> 18:01.750
this one point four. So let us see the null hypothesis which we will be creating.

18:02.520 --> 18:08.850
It is a statement about the population parameter that we would like to prove wrong, if possible.

18:08.860 --> 18:10.930
So something which we want to prove wrong.

18:11.650 --> 18:14.910
OK, and what we want to check here is whether

18:16.320 --> 18:19.470
the mean for the critically ill children is different from the

18:21.710 --> 18:25.520
actual mean of the number of teeth.

18:26.580 --> 18:31.410
So our null hypothesis will be that the mean is one point four.

18:35.520 --> 18:37.920
And the alternate hypothesis will be

18:41.010 --> 18:44.850
that this is not equal to one point four.

18:46.290 --> 18:53.060
OK, so the null hypothesis will be that both of these are actually the same and the children are not different,

18:54.150 --> 18:59.400
the mean for both the populations is the same, and we want to prove this wrong.

18:59.910 --> 19:02.790
So we will say that both of these have the same mean,

19:02.790 --> 19:09.570
that is, one point four, and the alternate hypothesis will be that the populations are different

19:09.990 --> 19:16.200
and the mean of these 16 children is not equal to one point four.
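
NOTE
Editor's sketch, not part of the spoken lecture. One possible way to test the
null hypothesis mu = 1.4 against mu != 1.4 for the 16 children is a one-sample
t statistic. The lecture defers the actual tests to the next session, so the
choice of a t-test and the 0.05 level are assumptions. The sample size, mean,
standard deviation and population value are the figures quoted in the lecture.
```python
import math
N = 16               # from the lecture: children examined
SAMPLE_MEAN = 1.2    # from the lecture: sample mean
SAMPLE_SD = 1.9      # from the lecture: sample standard deviation
MU_0 = 1.4           # from the lecture: value for the wider population
T_CRITICAL = 2.131   # t table value, two-tailed, alpha 0.05, 15 degrees of freedom
# H0: mu = 1.4   versus   H1: mu != 1.4
standard_error = SAMPLE_SD / math.sqrt(N)
t_stat = (SAMPLE_MEAN - MU_0) / standard_error
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"t={t_stat:.3f}, reject H0: {reject_h0}")
```
Under these assumptions the statistic is about -0.42, well inside the critical
value, so this sample alone would not let us reject the null hypothesis.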

19:18.830 --> 19:26.480
Now, what is probability theory? Probability theory allows us to calculate the exact probability

19:26.720 --> 19:30.260
that chance was the real reason for the relationship.

19:32.670 --> 19:41.400
That is, in the data which we have, the mean number of missing teeth for the children was one

19:41.400 --> 19:44.580
point two, with a standard deviation of one point nine.

19:44.620 --> 19:49.380
So here the mean which came out, one point two, may actually be due to chance,

19:49.590 --> 19:52.530
and actually the mean is one point four.

19:52.740 --> 19:54.630
It is just by chance

19:54.640 --> 19:56.190
that it came out to be one point two.

19:59.680 --> 20:07.420
So probability theory allows us to produce test statistics, and a test statistic is a number that

20:07.420 --> 20:12.170
is used to decide whether to accept or reject the null hypothesis.

20:12.940 --> 20:17.670
So probability theory allows us to accept or reject the null hypothesis.

20:17.950 --> 20:23.170
So we will be able to either accept this or we will be able to reject this hypothesis.

20:23.560 --> 20:27.300
So this is the only thing which we can prove, correct or wrong.

20:29.440 --> 20:35.890
We cannot say anything about this one, so we will be either accepting the null hypothesis or rejecting

20:35.890 --> 20:37.090
the null hypothesis.

20:41.620 --> 20:47.680
So we will learn about the tests and how these are conducted in the next session.
