WEBVTT

00:00.950 --> 00:04.780
In this session, we will discuss the Z test and the t test.

00:04.820 --> 00:07.930
First of all, let us understand some important terms.

00:08.360 --> 00:16.590
So the p-value is the probability that is obtained from the Z value or t value

00:16.820 --> 00:26.210
when we look these values up in the Z or t table. This probability is the probability of obtaining

00:26.210 --> 00:35.000
a particular sample mean by chance. If it is less than the alpha value, then the sample mean falls in the rejection region, also

00:35.000 --> 00:36.590
known as a critical region.

00:38.400 --> 00:41.370
So let us have a look at the diagram.

00:43.250 --> 00:44.690
So this is the diagram.

00:46.430 --> 00:50.690
And we are checking that we have a particular mean value.

00:51.860 --> 00:58.550
And we want to compare a different mean, and we want to check if the new mean is actually falling in the

00:58.550 --> 01:08.240
same distribution or not. Let us see the example of the teeth: the children would have the

01:08.240 --> 01:09.080
mean value.

01:09.240 --> 01:13.350
The population mean value for the children's missing teeth was 1.4.

01:13.580 --> 01:17.330
So on average, 1.4 teeth were missing for the children.

01:17.330 --> 01:19.790
The average number of missing teeth was 1.4.

01:20.900 --> 01:30.080
Now, we want to check: if the average number of missing teeth in a new sample is some other value, which is somewhat

01:30.080 --> 01:36.950
higher, then is this value significantly different from this 1.4 or not?

01:37.990 --> 01:46.240
So if the value is not significantly different, then

01:46.240 --> 01:51.880
it will fall in this region only; then it will be part of the same distribution.

01:53.440 --> 02:01.990
But suppose the value was not 1.2, but was actually, let us say, 0.5.

02:02.940 --> 02:07.440
If the value was zero point five, then maybe the value would have fallen here.

02:08.690 --> 02:15.400
Then what would happen is that the value is actually falling into the critical or alpha region.

02:17.870 --> 02:24.380
So when the value is falling in this region, which is the critical region (this is the lower tail and this is the

02:24.380 --> 02:29.780
upper tail): if the value is, let's say, 0.5 or 0.0, then it will fall in the

02:29.780 --> 02:30.380
lower tail.

02:30.620 --> 02:34.940
If the value is three or four, then the value will lie in the upper tail.

02:35.720 --> 02:43.430
So if the value is falling in that region, then we will say that the mean belongs to a new scenario.

02:43.670 --> 02:47.870
Say we have some kind of illness in the children.

02:47.870 --> 02:52.010
And because of the illness, the number of missing teeth has actually changed.

02:52.280 --> 02:55.940
Usually the children have 1.4 missing teeth.

02:56.180 --> 03:03.200
But now in this scenario, because the children are ill, they have fewer missing teeth, or they

03:03.200 --> 03:04.660
have more missing teeth.

03:06.110 --> 03:14.000
So now the population has entirely changed. In this case, the mean of the population of

03:14.000 --> 03:19.610
the children, of all of the ill children's missing teeth, is actually different from this 1.4.

03:19.630 --> 03:22.790
Now the mean is some other value.

03:23.810 --> 03:30.230
So if the value is significantly different, then it should fall in the critical regions.

03:31.040 --> 03:38.150
So this is what we are checking in the case of the Z test or the t test, basically in the case of hypothesis

03:38.150 --> 03:46.070
testing: whether the value which we are checking is significantly

03:46.070 --> 03:53.060
different from this data, or whether it differs just by chance. If the value is significantly different, this means

03:53.060 --> 03:58.910
that it is not by chance; actually there is something which is different in the scenarios or in

03:59.170 --> 04:03.380
the samples, which is the reason why these means are coming out to be different.

04:04.570 --> 04:11.090
If the means are not coming out to be significantly different, that will mean that it is just by chance that

04:11.110 --> 04:17.680
the value comes out to be something here or something nearby, and not exactly the mean

04:17.680 --> 04:18.040
value.

04:22.350 --> 04:29.700
So if the Z score or t score is greater than the critical value, that is, when the p-value is less

04:29.700 --> 04:38.010
than the critical alpha value, then it is said that the mean is significantly different from the population

04:38.010 --> 04:42.040
mean, hence pointing out that the population is entirely different.
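
NOTE
Editor's sketch, not part of the spoken lecture: one possible way to turn a Z score
into a two-sided p-value with Python's standard library. The function name
two_sided_p_value and the sample z-scores are hypothetical, chosen only for illustration.
```python
from statistics import NormalDist
def two_sided_p_value(z):
    """Two-sided p-value for a z-score under the standard normal distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
# Hypothetical z-scores, not taken from the lecture data.
for z in (0.5, 1.96, 3.0):
    print(f"z = {z:.2f}, p = {two_sided_p_value(z):.4f}")
```
A z-score near the center gives a large p-value; a z-score far out in either tail
gives a small one, which is what places it in the critical region.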

04:42.270 --> 04:49.620
That is what we just said: if the value of the new mean is significantly different

04:50.070 --> 04:52.320
and is lying in this region,

04:53.710 --> 05:01.750
this critical region, then it means that these two samples belong to different populations:

05:02.080 --> 05:04.220
one population is the healthy kids.

05:04.480 --> 05:09.720
For them, the number of missing teeth is 1.4, and for the unhealthy

05:09.720 --> 05:16.980
kids, the number of missing teeth is either four or five, which means that this belongs to another

05:17.090 --> 05:21.390
population, which has a completely different number of missing teeth.

05:22.270 --> 05:30.340
So this means that the illness has actually changed the population. With illness, the children have

05:30.340 --> 05:33.730
different characteristics, and without illness, the children have different characteristics.

05:35.540 --> 05:39.420
So this is what we can show using the Z test or the t test.

05:43.410 --> 05:49.680
So this means that there was an impact from the treatment on the existing population; thus the null

05:49.680 --> 05:51.370
hypothesis is rejected.

05:51.680 --> 05:58.590
Now, if there is no significant difference between the sample mean and the population mean, that means the populations

05:58.590 --> 05:59.520
are the same.

06:01.430 --> 06:07.640
And there is no difference, there is no impact of the treatment on the existing population.

06:07.820 --> 06:13.580
The population is just as it was, and the null hypothesis cannot be rejected.

06:15.220 --> 06:16.120
And also.

06:17.090 --> 06:19.740
We cannot accept the null hypothesis.

06:20.060 --> 06:25.680
We can only reject the null hypothesis on the basis of the evidence.

06:26.540 --> 06:29.280
We cannot say that the null hypothesis is correct.

06:29.600 --> 06:33.010
We can only say that the null hypothesis is incorrect.

06:36.040 --> 06:38.800
Now, let us have a look at one example.

06:39.430 --> 06:47.230
So on a 25-point satisfaction scale, men and women differed by about five points.

06:48.070 --> 06:53.370
The means were 18.75 for men

06:53.770 --> 06:58.090
and 23.5 for women. Now,

06:58.090 --> 07:03.160
they were not identical, but how likely is a five-point difference to occur just by chance?

07:03.520 --> 07:10.120
So here what we are trying to say is that there was a sort of scaling system and there was a satisfaction

07:10.120 --> 07:17.620
scale which was set up, and men rated their satisfaction at 18.75 out of

07:17.620 --> 07:22.750
25 points; they gave 18.75 and women gave 23.5.

07:24.010 --> 07:31.960
Now, again, these two values are two different means, belonging to two different datasets.

07:32.990 --> 07:40.820
Now, out of these two different data sets, we want to find out: do these two belong to the

07:40.820 --> 07:45.380
same population, or is it actually different for men and women?

07:45.650 --> 07:48.930
Is the satisfaction criteria different for men and women?

07:49.550 --> 07:51.800
So we will check.

07:52.160 --> 07:56.720
So we will find out if these populations are similar or different.

07:58.400 --> 08:06.280
So an analysis was conducted, and the p-value for the gender comparison was found to be

08:06.290 --> 08:06.750
0.11.

08:07.980 --> 08:13.210
Which means that the probability was found to be 11 percent.

08:13.590 --> 08:20.110
Thus there was about an 11 percent chance that this five-point difference would occur by chance.

08:20.370 --> 08:23.790
So there was an 11 percent chance that this would occur by chance.

08:24.510 --> 08:27.140
Now the p-value is greater than 0.05.

08:27.360 --> 08:29.420
So we would fail to reject the null.

08:29.640 --> 08:30.900
That is,

08:32.040 --> 08:39.660
this is not a significant difference; thus there is no evidence that males and females differ

08:39.660 --> 08:46.050
in their satisfaction. If the probability was lower than 0.05,

08:47.160 --> 08:55.050
if the probability was 0.05 or lower, that is, there was at most about a five

08:55.050 --> 09:02.010
percent chance that this would occur by chance, then we would have

09:02.010 --> 09:05.670
rejected the null hypothesis that these populations are the same.

09:05.880 --> 09:11.980
We would say that these populations are actually different and they have different satisfaction levels.

09:14.960 --> 09:16.860
Let us look at another example.

09:17.030 --> 09:23.780
So we are comparing how males and females differ with respect to how likely they would be to recommend

09:23.780 --> 09:25.080
an online course.

09:25.520 --> 09:26.540
OK, so.

09:28.570 --> 09:34.130
Measurement is on a five-point scale.

09:34.450 --> 09:42.000
The men have given 4.1 and the women have given 3.1 here.

09:42.040 --> 09:47.110
Men have given a 4.1 rating and women have given a 3.1 rating.

09:48.510 --> 09:50.490
Now, the null hypothesis states

09:51.710 --> 09:57.470
that there is no difference between men and women in the recommendation of an online course, that

09:57.470 --> 10:05.150
is, men and women will equally recommend an online course. It is not as if, when we target men,

10:05.150 --> 10:10.490
they will recommend more of online courses and women will not recommend online courses.

10:10.700 --> 10:13.570
So we just want to check if this is the scenario or not.

10:13.880 --> 10:18.410
So the null hypothesis says that they are no different.

10:20.060 --> 10:27.290
Now, on a five-point scale, they differ by about one point: men give 4.3 and

10:27.290 --> 10:31.160
women give 3.1 as the recommendation value.

10:31.760 --> 10:38.750
Now, these values are not identical, but we want to find out how much is the chance that this one

10:38.750 --> 10:41.090
point difference is by chance.

10:41.750 --> 10:48.620
We want to find out if these two populations are actually different, if men and women will give different

10:48.890 --> 10:55.820
amount of recommendations, or that it is just by chance that the samples which we were taking and the

10:55.820 --> 10:58.340
people whom we were targeting gave different answers.

11:00.650 --> 11:08.090
So for this, an analysis was conducted, and the p-value for the gender comparison came out to be

11:08.090 --> 11:12.980
0.03. This means that there was a three percent chance that this would occur by chance.

11:13.310 --> 11:18.960
This means that the value is less than 0.05, the p-value threshold of 0.05.

11:19.160 --> 11:24.970
So we will reject the null hypothesis; the results are actually significant.

11:24.990 --> 11:28.660
There is a significant difference between four point three and three point one.

11:29.060 --> 11:35.060
That is why we are rejecting this hypothesis that both would recommend equally.

11:35.690 --> 11:40.460
So now what we will do, we will target men for recommending our course.

11:41.770 --> 11:45.760
Thus, there is evidence that male and female differ in their recommendations.
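
NOTE
Editor's sketch, not part of the spoken lecture: the decision rule applied in the two
examples, written as a small Python function. The name decide and the default
alpha of 0.05 are assumptions for illustration; the p-values 0.11 and 0.03 are the
ones quoted in this session.
```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a p-value at significance level alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"
# p-values quoted in the two examples from this session.
print("satisfaction example, p = 0.11:", decide(0.11))
print("recommendation example, p = 0.03:", decide(0.03))
```
Note that the function never returns "accept": a large p-value only means the
evidence is not strong enough to reject the null hypothesis.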

11:47.380 --> 11:54.280
So now you can understand how we are doing this: we basically have two types of data sets, and

11:54.370 --> 12:00.160
based on these two mean values, we are just trying to find out whether the difference between these values

12:00.160 --> 12:02.380
is actually significant or not.

12:02.680 --> 12:09.910
So if the difference is significant enough, then we will say that they

12:09.910 --> 12:15.500
belong to two different populations, that is, it is not occurring by chance.

12:15.700 --> 12:23.470
And if it is not different enough, then we will say that this is just by chance. This is calculated

12:23.470 --> 12:25.570
on the basis of this probability value.

12:25.750 --> 12:28.060
And this is something which we will have to decide.

12:29.290 --> 12:34.690
So these are the different significance levels. We decide on a significance level: there are

12:34.690 --> 12:40.300
0.01, 0.05 and 0.1 as significance levels.

12:40.510 --> 12:46.080
So these are the Z score values which define the significance levels.

12:46.360 --> 12:52.090
Typically, the significance level which is chosen is 0.05.

12:54.400 --> 13:01.030
So the Z score value is 1.96. If the Z value is greater than 1.96,

13:01.060 --> 13:04.600
then it is significantly different.

13:04.870 --> 13:09.180
If it is not greater than 1.96, then it is not significantly different.

13:09.190 --> 13:11.100
That is what we define here.
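
NOTE
Editor's sketch, not part of the spoken lecture: where a value such as 1.96 comes
from, assuming a two-sided Z test. The helper name critical_z is hypothetical; it
uses the inverse normal CDF from Python's standard library.
```python
from statistics import NormalDist
def critical_z(alpha):
    """Two-sided critical z value for significance level alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)
# The three significance levels mentioned in this session.
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}, critical z = {critical_z(alpha):.3f}")
```
Under this assumption, a smaller alpha pushes the critical value further into the
tails, so a larger Z score is needed before the difference counts as significant.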

13:11.680 --> 13:13.090
And these are the values.

13:14.300 --> 13:18.860
So we can directly calculate this on the basis of the Z values or t values.

13:20.890 --> 13:22.240
So let us get back.

13:25.510 --> 13:33.160
So this is what we have for now, and we will discuss the different types of errors and what is

13:33.160 --> 13:40.140
the meaning of significance in the next session.