WEBVTT

00:01.220 --> 00:08.660
Hi, in this session we will discuss the next set of tests, which is the chi-square test.

00:09.440 --> 00:12.260
So we have two types of test under chi-square.

00:12.560 --> 00:20.750
The first one is the test for goodness of fit and the other one is the test for independence.

00:21.620 --> 00:29.380
And these two tests are non-parametric hypothesis tests using the chi-square statistic.

00:30.200 --> 00:37.460
So we will learn what the chi-square statistic is, how we can perform these tests, and when we are

00:37.460 --> 00:39.530
actually going to perform these tests.

00:42.580 --> 00:45.980
So first of all, let us talk about non-parametric.

00:46.240 --> 00:51.130
What do we mean when we say these are non-parametric hypothesis tests?

00:53.170 --> 01:00.310
So the term non-parametric refers to the fact that these chi-square tests do not require assumptions

01:00.310 --> 01:07.050
about population parameters, nor do they test hypotheses about population parameters.

01:09.100 --> 01:19.720
Now, the tests which we have conducted so far, that is, the z-test and the t-test, both of these tests were

01:19.720 --> 01:23.340
revolving around the population parameters.

01:24.250 --> 01:31.180
We were trying to find out either what the population parameter should be, or we were trying to compare

01:31.180 --> 01:38.500
a particular sample with the population to find out if this sample actually belonged to the population

01:38.500 --> 01:39.120
or not.

01:40.640 --> 01:48.980
But in this case, we are not comparing with the population, we are not trying to find out something

01:48.980 --> 01:53.520
about the population; we are just analyzing the data itself.

01:55.620 --> 02:04.980
So previous examples of hypothesis tests, such as the t-tests, are parametric tests, and they include assumptions

02:04.980 --> 02:12.690
about parameters and hypotheses about the parameters, that is, whether the mean of

02:12.690 --> 02:18.840
one population is greater than some value or not, or whether the means of two populations are equal or not,

02:19.050 --> 02:25.030
or whether the mean of one population is greater than that of another population after a particular treatment.

02:25.350 --> 02:27.420
So these kinds of problems were there.

02:28.430 --> 02:36.860
For example, we were considering that if there is a kid who has some disease, then he will have

02:36.870 --> 02:39.890
a smaller or larger number of broken teeth.

02:40.790 --> 02:44.570
So that is one assumption which we are making here.

02:45.020 --> 02:54.830
We wanted to compare if a kid who is ill is different from the population of the kids who are not ill,

02:55.010 --> 02:59.500
by comparing the population mean for the number of broken teeth.

02:59.750 --> 03:01.220
So that was what we were doing.

03:01.220 --> 03:04.550
We were comparing two samples of children.

03:04.820 --> 03:12.290
One sample had kids who have a particular disease, and another sample was of the kids who did not have

03:12.290 --> 03:13.040
any disease.

03:13.220 --> 03:16.860
And we were comparing the mean number of broken teeth between them.

03:17.750 --> 03:25.280
So there we were using these two samples to analyze the whole population: suppose we have all those

03:25.280 --> 03:32.360
children in the world and we take samples and we find out the mean of all the children who are ill and

03:32.360 --> 03:39.700
all the children who are not ill; then would the mean number of their broken teeth be equal or not?

03:39.740 --> 03:42.380
That is what we were trying to find out there.

03:42.800 --> 03:45.950
But here we are trying to find out a different thing.

03:45.970 --> 03:46.850
Now, what is that?

03:47.890 --> 03:54.580
The most obvious difference between the chi-square tests and the t-test hypotheses we have considered is

03:54.580 --> 03:56.230
the nature of data.

03:56.350 --> 03:59.380
So in the t-test we had one type of data.

03:59.380 --> 04:01.990
And in chi-square, we will have a different type of data.

04:02.200 --> 04:05.090
What type of data that would be, we will see in a moment.

04:05.380 --> 04:10.840
Now, in case of Chi Square, the data are frequencies rather than numerical scores.

04:12.510 --> 04:17.460
So in case of the t-test, what did we have?

04:18.060 --> 04:25.790
We had different children and different data about the children, and for each child, we

04:25.800 --> 04:27.780
had the number of broken teeth.

04:27.780 --> 04:32.560
One child will have one number, another child will have another number of broken teeth, and so on.

04:33.210 --> 04:39.400
Now, there we have a particular numerical score, or we can say particular marks in an exam that

04:39.400 --> 04:40.500
the child would obtain.

04:40.740 --> 04:45.060
So these are the different things which we were comparing earlier in case of the z-test and the t-test.

04:45.540 --> 04:53.580
But now, in case of Chi Square, we are not comparing the numerical value, but we are actually comparing

04:53.580 --> 05:01.110
the frequency of the data that for a particular thing, there are 20 children available or there are

05:01.110 --> 05:02.580
20 children who are ill.

05:03.750 --> 05:10.530
So we are talking about frequency here instead of the data related to them, not about the height of

05:10.530 --> 05:15.300
the children, but actually the number of the children who would represent this kind of data would be

05:15.300 --> 05:15.560
that.

05:15.660 --> 05:18.900
And to clarify more, we will take some examples of that.

05:18.930 --> 05:21.780
You'll have a better understanding of this.

05:24.110 --> 05:30.810
So let's go further. The first test which we will be talking about is the chi-square test of

05:30.810 --> 05:31.980
goodness of fit.

05:33.410 --> 05:41.750
Now, the chi-square test for goodness of fit uses the frequency data from a sample to test a hypothesis.

05:43.120 --> 05:49.780
And what is this hypothesis here? What is the hypothesis in case of goodness of fit? The hypothesis

05:49.930 --> 05:54.440
is about the shape or proportions of the population.

05:54.640 --> 06:04.660
So here we are trying to find out if a particular sample which we have is having a particular shape

06:04.660 --> 06:05.230
or not.

06:06.360 --> 06:12.990
That is, if the population will follow a particular type of distribution or not, that is what we are

06:12.990 --> 06:16.590
trying to find out here and we are trying to find out.

06:17.010 --> 06:22.080
Let's say we have a sample X and we have the entire population.

06:22.440 --> 06:24.660
So we want to find out whether the

06:25.870 --> 06:32.610
distribution that we are getting from this particular sample would fit into a particular distribution

06:32.610 --> 06:32.910
or not.

06:33.180 --> 06:35.130
That is why it is known as goodness of fit.

06:36.320 --> 06:43.220
We are trying to find out: if we have this particular sample with 10 or 20 values, then will it fit

06:43.220 --> 06:46.950
a normal distribution, or will it fit some other type of distribution?

06:47.270 --> 06:49.540
So that is what we are trying to find out, OK.

06:50.090 --> 06:56.860
Now, here, the chi-square test is used to test whether a sample of data came from a population with

06:56.870 --> 06:58.160
a specific distribution.

06:58.160 --> 07:02.270
That is, whether it belongs to a specific distribution or not.

07:03.390 --> 07:09.820
Now, every individual in the sample is classified into one category on the scale of measurement.

07:10.110 --> 07:15.270
So we will have different categories in case of this particular test.

07:15.780 --> 07:21.720
Earlier, we had just some numerical values, but here we will have certain categories.

07:21.870 --> 07:29.130
And based on those categories, we will have a number of people allocated to each particular category,

07:29.400 --> 07:31.860
maybe 20 male and 30 female.

07:32.910 --> 07:35.020
So this kind of data would be available.

07:36.310 --> 07:44.980
Now, the data, called observed frequencies, simply count how many individuals from the sample are in

07:44.980 --> 07:45.790
each category.
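
NOTE A minimal Python sketch of turning raw category labels into observed frequencies. The sample list is hypothetical; the 20 male / 30 female split is only the illustrative figure mentioned in the lesson, not real survey data.
```python
from collections import Counter
# Hypothetical raw sample: one category label per surveyed person.
sample = ["male"] * 20 + ["female"] * 30
# Observed frequencies: how many individuals fall in each category.
observed = Counter(sample)
print(observed["male"], observed["female"])
```
The counts, not the individual labels, are what the chi-square tests work with.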

07:45.820 --> 07:52.660
So basically we get to know how many people belong to category A, category B, category C and

07:52.660 --> 07:53.010
so on.

07:53.920 --> 07:54.430
So.

07:55.970 --> 07:58.760
So what does the chi-square test for goodness of fit test for?

07:59.510 --> 08:09.210
The null hypothesis specifies the proportion of the population that should be in each category.

08:09.230 --> 08:14.370
That is how much of a particular population should belong to each category.

08:14.510 --> 08:20.960
Whether each category should have an equal number of people, or there should be a larger number of people

08:20.960 --> 08:24.500
in one category and a smaller number of people in another category.

08:24.680 --> 08:27.710
So this is what the null hypothesis would be about.

08:29.230 --> 08:37.690
And the proportions from the null hypothesis are used to compute the expected frequencies that describe

08:37.690 --> 08:43.240
how the sample would appear if it were in perfect agreement with the null hypothesis.
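
NOTE A small Python sketch of how expected frequencies follow from the null-hypothesis proportions: each expected count is the sample size times the hypothesised proportion. The function name, the 50-person sample and the 50/50 split are assumptions made for illustration.
```python
def expected_counts(null_proportions, n):
    """Expected frequency per category: n times the proportion stated by H0."""
    return {category: n * p for category, p in null_proportions.items()}
# Hypothetical survey of 50 people with H0: equal shares of male and female.
expected = expected_counts({"male": 0.5, "female": 0.5}, 50)
print(expected)
```
Expected counts need not be whole numbers, because they describe an idealised sample rather than real people.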

08:43.420 --> 08:49.840
So basically, we create a null hypothesis, which will imply that, let us say, there should be an equal number

08:49.840 --> 08:57.790
of males and females for a particular survey, and based on having that equal number of males and females,

08:57.790 --> 09:04.130
this is what the null hypothesis would suggest: that we should have equally many males and females

09:04.130 --> 09:07.210
in this particular survey giving a particular opinion.

09:08.340 --> 09:17.190
Now we want to find out if the actual data, which we have, because what we have now is a sample of

09:17.190 --> 09:17.640
the data.

09:18.500 --> 09:23.010
And a sample might have a smaller number of males and a larger number of females.

09:23.600 --> 09:24.860
That is a possibility.

09:25.250 --> 09:31.820
But because this is a sample, we cannot say anything straight away about the number of people in the population,

09:32.000 --> 09:36.170
because maybe when I took a sample, there were nine males

09:36.170 --> 09:37.610
and twenty-one females.

09:37.820 --> 09:43.740
But in the population there were actually four hundred males and four hundred females, which is an equal

09:43.740 --> 09:43.920
number.

09:44.720 --> 09:51.290
So here we want to find out if the distribution which the sample is showing fits into the

09:51.290 --> 09:53.540
distribution which the population would have.

09:53.930 --> 09:58.700
So we want to find out what the distribution of the population would be.

10:02.910 --> 10:04.280
So let us see here.

10:05.810 --> 10:10.280
So here we have frequency data for different eye colors.

10:11.820 --> 10:19.080
And it says that there are 12 people with blue eyes, 21 people with brown eyes, three people with

10:19.080 --> 10:22.770
green eyes and four people with other eye color.

10:23.820 --> 10:31.340
Now we want to find out if, let us say, all eye colors are equally present.

10:31.350 --> 10:35.520
And so we are comparing the people in the entire world.

10:35.700 --> 10:38.130
And this is the sample which we have.

10:38.130 --> 10:46.650
And based on the sample, we want to derive whether people in the world have each eye color in the same

10:46.740 --> 10:53.420
quantity, that is, the same proportion of eye colors, or not. That is, let us say:

10:53.940 --> 10:56.700
twenty-five percent of people will have blue eyes, the next

10:56.700 --> 10:59.460
twenty-five percent of people will have brown eyes, the next

10:59.730 --> 11:03.150
twenty-five percent will have green eyes, and the remaining

11:03.180 --> 11:05.510
twenty-five percent will have other colors.
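
NOTE A hedged Python sketch that works the eye-color data through the goodness-of-fit calculation. The lesson states the counts and the 25% null proportions but does not carry out this computation itself, so the resulting statistic is an illustration rather than a figure from the lesson.
```python
# Observed eye-color counts quoted in the lesson.
observed = {"blue": 12, "brown": 21, "green": 3, "other": 4}
n = sum(observed.values())  # 40 people in the sample
# H0 from the lesson: each of the four categories holds 25% of the population.
expected = {color: 0.25 * n for color in observed}
# Chi-square statistic: sum over categories of (observed - expected)^2 / expected.
chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
degrees_of_freedom = len(observed) - 1
print(n, chi_square, degrees_of_freedom)
```
With three degrees of freedom, the statistic can be compared against 7.815, the 0.05 table value the lesson quotes later for the dice example.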

11:06.810 --> 11:08.660
So this is what we want to test here.

11:08.700 --> 11:11.020
This is something which we are trying to do here.

11:11.310 --> 11:15.720
So this is what the null hypothesis would be: that all the eye colors occur in the same proportion.

11:16.730 --> 11:20.600
Now, this is what we would try to accept or reject.

11:21.900 --> 11:28.650
So this is the main objective here: to find out about the population, that is,

11:29.710 --> 11:32.620
using the frequency data which we have.

11:33.900 --> 11:39.670
So this is what the chi-square test provides us and helps us in finding.

11:41.850 --> 11:50.140
Now, let us have a look at an example. Consider a standard bag of milk chocolate M&M's; there

11:50.160 --> 11:52.820
are six different colors in it.

11:53.070 --> 11:54.390
So what are different colors?

11:54.420 --> 11:57.910
We have red, orange, yellow, green, blue and brown.

11:58.320 --> 12:01.470
So these are the six colors which we have now.

12:01.470 --> 12:09.690
Suppose that we are curious about the distribution of these colors, that is, about the production of

12:09.690 --> 12:10.440
the M&M's.

12:10.710 --> 12:18.060
Does it happen in equal quantities, or are they producing more of a particular color in comparison to

12:18.060 --> 12:18.840
some other color?

12:19.140 --> 12:25.290
So we are trying to compare these categories, these colors as a category and we are trying to find

12:25.290 --> 12:31.650
out if the categories are appearing in the same proportion, or whether there is a particular

12:31.650 --> 12:37.890
category, say orange or green, which is being produced more while the rest are being produced less.

12:37.950 --> 12:40.100
So that is something which we are trying to compare here.

12:41.760 --> 12:45.700
So do all the six colors occur in equal proportion?

12:46.170 --> 12:51.490
Now, this is the type of question that can be asked with a goodness-of-fit test.

12:52.440 --> 12:59.760
That is, what is the proportion of candies being produced in each of the different colors,

13:00.120 --> 13:04.760
or what are the proportions of different eye colors present in the world?

13:05.960 --> 13:12.280
Or, let us say, what is the proportion of each mode of transportation people are using, comparing

13:12.390 --> 13:17.720
a two-wheeler, a four-wheeler, that is, a car, or maybe a bus or train.

13:17.900 --> 13:20.140
So how are people actually commuting?

13:20.150 --> 13:21.200
What is the proportion?

13:21.890 --> 13:24.440
So these are the different kinds of questions which could be asked.

13:26.660 --> 13:30.740
Now, let us ask a question: why do we need chi-square?

13:32.160 --> 13:41.190
So we begin by noting the setting and why the goodness-of-fit test is appropriate. So we will find

13:41.190 --> 13:44.560
out what the need for this test is.

13:45.150 --> 13:52.440
So here we have a variable which is categorical. Now, because this is categorical data and there are

13:52.440 --> 13:56.280
different levels of data present in this particular case,

13:57.330 --> 13:59.690
that is why we will be using the chi-square test.

14:00.600 --> 14:06.690
OK, so whenever we have categorical data with certain levels, then we can use chi-square.

14:08.100 --> 14:15.780
Now, we assume that the M&M's we count will be a simple random sample from the population of all

14:15.780 --> 14:20.720
M&M's, and we will assume that all the counts will be the same.

14:20.730 --> 14:25.390
That is, all of these M&M colors will be appearing in the same proportion.

14:25.740 --> 14:27.570
So that is what we are assuming.

14:29.570 --> 14:37.010
So once we have made the assumption that all the six colors are present in the same proportion,

14:37.340 --> 14:38.930
then we will be

14:41.280 --> 14:43.190
creating the null and alternate hypotheses.

14:44.570 --> 14:49.100
Now, what are the null and alternate hypotheses? These will be

14:50.230 --> 14:57.670
reflecting the assumption that we are making about the population. So we are testing whether the colors

14:57.670 --> 15:05.200
will occur in equal proportions or not. So this will be our null hypothesis: that all the colors, all

15:05.200 --> 15:09.070
the colors of the M&M's, are occurring in the same proportion.

15:09.070 --> 15:16.060
That is, when the company is actually producing these products, they are producing them in equal quantities

15:16.060 --> 15:18.640
and randomly pushing them into different packages.

15:18.880 --> 15:22.110
That is why the numbers are coming out to be different.

15:22.600 --> 15:26.740
But actually the quantities of M&M's being produced are the same.

15:26.830 --> 15:28.020
Now, OK.

15:29.420 --> 15:35.630
So what is the null hypothesis? The null hypothesis is: if p1 is the population proportion of red

15:35.630 --> 15:44.510
candies, p2 is the population proportion of orange candies, and so on, then p1 would be equal

15:44.520 --> 15:49.250
to p2, which is equal to p3, which is equal to p4, which is equal to p5, which is equal to p6.

15:49.250 --> 15:55.040
That is, all the proportions, all the numbers of candies being produced, are the same.

15:56.120 --> 16:00.850
The production is the same for all the six colors.

16:01.010 --> 16:02.720
This is the null hypothesis.

16:04.080 --> 16:11.040
What is the alternate hypothesis? The alternate hypothesis is that at least one of these proportions

16:12.950 --> 16:21.260
is not equal to the others, that is, at least one of the population proportions is not equal to one sixth

16:21.500 --> 16:23.030
of the entire population.
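
NOTE The two hypotheses can be written compactly as below. The lesson names p1 as the red proportion and p2 as the orange proportion; the assignment of p3 to p6 to the remaining four colors is an assumption of this sketch.
```latex
H_0:\; p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = \tfrac{1}{6}
\qquad\text{versus}\qquad
H_a:\; p_i \neq \tfrac{1}{6} \ \text{for at least one } i \in \{1,\dots,6\}
```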

16:24.170 --> 16:29.120
So this is what our null hypothesis and alternate hypothesis will be.

16:31.560 --> 16:36.820
Now we have to find out the actual and the expected counts.

16:37.230 --> 16:38.730
So what are the actual counts?

16:39.030 --> 16:44.590
The actual counts will be the counts of the M&M's which we find in a particular sample.

16:46.650 --> 16:51.810
And the expected count will be the sum of all the values divided by six.

16:53.260 --> 17:00.700
That is the expected count. So the actual count is the number of candies of each of the six

17:00.700 --> 17:07.870
colors which we find in the frequency data, and the expected count refers to what we would expect

17:07.870 --> 17:10.110
if the null hypothesis were true.

17:10.810 --> 17:18.070
So we will let n be the size of our sample, and the expected number of red candies will be n by six.

17:18.070 --> 17:23.530
If the entire sample is n, then the number of red candies will be n by six,

17:23.530 --> 17:26.980
the number of orange candies will be n by six, and so on.

17:28.210 --> 17:28.540
Right.

17:28.900 --> 17:35.920
So for this example, the expected number of candies for each of the six colors will simply be n times

17:35.930 --> 17:41.500
one sixth, or n by six, that is, the size of the entire sample divided by six.

17:42.440 --> 17:43.650
So let us go further.

17:43.850 --> 17:48.110
Now, let us apply this to find out the chi-square statistic.

17:49.390 --> 17:56.110
So if the null hypothesis were true, then the expected count for each of these colors would be one

17:56.110 --> 18:03.570
by six into six hundred, where the total number of candies is six hundred.

18:03.790 --> 18:09.220
So let's say the total number of candies for this particular data, the sample data which we have,

18:09.400 --> 18:10.460
is six hundred.

18:10.690 --> 18:18.520
So for this entire six hundred, we expect that there should be one by six of six hundred, that is,

18:20.680 --> 18:23.490
a hundred candies of each and every color.

18:24.880 --> 18:30.580
But what are the actual values which are coming out? The actual values show that blue is two hundred and twelve,

18:30.580 --> 18:36.820
orange is one hundred and forty-seven, green is one hundred and three, red is 50, yellow is forty-six and brown

18:36.820 --> 18:37.750
is 42.

18:38.800 --> 18:47.890
So these are the frequencies, and we were expecting a hundred of each. So we will now use this

18:47.890 --> 18:51.580
in the calculation of the statistic. How will we find that out?

18:52.000 --> 18:57.310
We will find that out by subtracting the expected value from the observed value.

18:58.740 --> 19:03.190
taking the square of that, and dividing by the expected value.
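
NOTE The calculation described in words here is the usual chi-square statistic. The symbols are notation chosen for this sketch: O_i is the observed count in category i, E_i the expected count under the null hypothesis, and k the number of categories.
```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```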

19:03.810 --> 19:05.700
So what is the value here?

19:05.730 --> 19:11.900
For blue, the observed is two hundred and twelve and the expected is one hundred, so two hundred and twelve minus hundred,

19:12.360 --> 19:14.110
whole square, divided by one hundred.

19:14.490 --> 19:18.730
Similarly, we will calculate the value for all the different categories.

19:18.750 --> 19:21.090
That is all the different colors which we have.

19:22.660 --> 19:24.700
And then we will take the sum of these.

19:25.790 --> 19:29.480
So the sum comes out to be two thirty-five point four two.

19:32.960 --> 19:39.330
This two thirty-five point four two is the chi-square statistic.
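
NOTE A short Python sketch reproducing the M&M calculation with the six counts read out in the lesson. The variable names are illustrative only.
```python
# Observed M&M counts by color, as read out in the lesson.
observed = {"blue": 212, "orange": 147, "green": 103, "red": 50, "yellow": 46, "brown": 42}
n = sum(observed.values())  # 600 candies
expected = n / len(observed)  # 100 per color under H0
# One (observed - expected)^2 / expected term per color.
terms = {color: (count - expected) ** 2 / expected for color, count in observed.items()}
chi_square = sum(terms.values())
print(round(chi_square, 2))
```
The blue term alone contributes more than half of the total, which is why the statistic ends up so far above the table values.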

19:39.350 --> 19:41.300
It is the chi-square statistic value.

19:43.340 --> 19:48.190
So this is the value which we have found for the chi-square statistic.

19:49.550 --> 19:53.820
Now we will find out the degrees of freedom. Now, what are the degrees of freedom?

19:53.930 --> 19:56.180
This is the table which we will be having.

19:56.180 --> 20:02.110
This table will have the degrees of freedom on the left

20:02.120 --> 20:05.960
side, in the vertical column.

20:09.160 --> 20:18.250
The p-value will be present in the row here, in the top row, so we will be comparing it to

20:18.250 --> 20:19.480
zero point zero five.

20:20.340 --> 20:26.430
For this particular scenario, now, what will be the degrees of freedom? We have those

20:26.430 --> 20:27.780
six values here.

20:28.940 --> 20:35.510
So we have six values, then we can freely select five values out of this and one value would have to

20:35.510 --> 20:36.110
be fixed.

20:37.360 --> 20:43.180
That is, any one value will have to be fixed, and we will be able to freely select any

20:44.290 --> 20:45.130
five values.

20:46.420 --> 20:48.760
So here, the degrees of freedom will be five.

20:49.570 --> 20:53.710
That is six minus one, the number of categories minus one.

20:55.140 --> 21:02.790
So the degrees of freedom are five, and we are looking at the p-value of zero point zero five.

21:03.850 --> 21:09.640
So for five degrees of freedom, the value for zero point zero five is eleven point zero seven, and what is the

21:09.850 --> 21:10.840
chi-square value which we have?

21:10.840 --> 21:12.130
We have 235.42.

21:13.200 --> 21:18.330
Now, if we look further along the row for the smallest p-value, the

21:19.590 --> 21:23.220
smallest p-value in the table is zero point zero zero one.

21:24.370 --> 21:31.810
This gives us twenty point five one five, which is still far less in comparison to the chi-square

21:31.870 --> 21:33.070
statistic which we got.

21:33.790 --> 21:35.620
So what does this imply?

21:36.040 --> 21:42.910
This implies that we have a very small p value and hence we will be able to reject the null hypothesis.

21:43.180 --> 21:49.990
Even if the statistic had been anything greater than eleven point zero seven zero, we would have

21:49.990 --> 21:55.660
rejected the null hypothesis, considering the significance level to be zero point zero five, which is a standard

21:55.660 --> 21:56.290
significance level.
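
NOTE A minimal Python sketch of the table lookup and decision for the M&M example. The two critical values are the ones quoted in the lesson for five degrees of freedom; the dictionary and function names are hypothetical helpers, not part of any library.
```python
# Critical values quoted in the lesson for 5 degrees of freedom.
CRITICAL_DF5 = {0.05: 11.070, 0.001: 20.515}
def reject_null(statistic, critical_value):
    """Reject H0 when the statistic exceeds the table's critical value."""
    return statistic > critical_value
chi_square = 235.42  # statistic computed in the lesson for the M&M sample
degrees_of_freedom = 6 - 1  # number of categories minus one
decisions = {alpha: reject_null(chi_square, cv) for alpha, cv in CRITICAL_DF5.items()}
print(degrees_of_freedom, decisions)
```
Because the statistic exceeds even the 0.001 column, the p-value is below 0.001 and the null hypothesis is rejected at either level.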

21:57.550 --> 22:05.170
OK, so this is the reason why we are rejecting the null hypothesis and we are saying we are clearly

22:05.170 --> 22:15.250
able to see that the different colors of M&M's are not occurring in the same proportion, but actually have

22:15.250 --> 22:17.460
some different proportions associated.

22:17.740 --> 22:20.660
So they are not producing these colors equally.

22:20.920 --> 22:25.760
They are creating a few colors in greater number and a few colors

22:25.760 --> 22:28.920
in lesser number. Now, the reason could be anything.

22:29.320 --> 22:36.520
Maybe the production cost is impacting or maybe the popularity of those colors, but they are creating

22:36.520 --> 22:37.990
them in different proportions.

22:38.800 --> 22:43.600
You see, this is something which we have gained from this chi-square test, and we were

22:43.600 --> 22:45.380
able to derive something from this.

22:46.150 --> 22:51.820
So this is how you will apply the chi-square statistic for goodness of fit to any example.

22:54.820 --> 23:03.430
So that was the first example. We will discuss the second example now. So what is this example?

23:03.550 --> 23:05.660
So here we have another problem.

23:05.950 --> 23:12.220
So earlier we were curious about the number of M&M's, whether they are producing them in the same

23:12.220 --> 23:18.160
number. The next problem which we have is a casino game which involves rolling three

23:18.160 --> 23:18.730
dice.

23:19.780 --> 23:24.420
And the winnings are basically proportional to the total number of sixes rolled.

23:25.710 --> 23:27.720
So if someone has.

23:28.970 --> 23:35.210
So how would we be finding out the winnings? We find out the winnings based on the number of sixes

23:35.210 --> 23:38.130
they will be rolling with these three dice.

23:39.680 --> 23:45.830
Now, suppose a gambler plays the game a hundred times with the following observed counts.

23:45.860 --> 23:51.050
So these are the observed counts for a particular gambler: he had zero sixes

23:51.320 --> 23:59.310
48 times, one six 35 times, two sixes 15 times, and three sixes 3 times.

24:00.140 --> 24:07.320
So we want to find out if this gambler is playing with fair dice or not.

24:07.640 --> 24:10.250
So is there something wrong with the dice or not?

24:11.410 --> 24:19.360
So the casino becomes suspicious of the gambler and wishes to find out if the dice are fair or

24:19.360 --> 24:19.600
not.

24:19.930 --> 24:21.970
So how do they conclude that?

24:22.060 --> 24:24.820
What do they find out by these observations?

24:26.200 --> 24:28.370
So, again, what do we want to find out here?

24:28.750 --> 24:31.150
Here we have these four categories.

24:33.050 --> 24:40.580
We have these four categories, and we have the expected counts and the observed counts. Now, from these

24:40.580 --> 24:47.420
observed counts, we want to find out if this is actually a good fit to the distribution which

24:47.420 --> 24:49.100
we expect for fair dice or not.

24:49.760 --> 24:53.300
Now, these are the expected counts, which have been generated from the probabilities.
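
NOTE A hedged Python sketch of how expected counts can be generated from the probabilities, assuming three fair dice so that the number of sixes in one play follows a binomial distribution with three trials and success probability one sixth. The function name and its parameters are illustrative; the rounded figures 58, 34.5, 7 and 0.5 appear to be the ones the lesson's arithmetic uses.
```python
from math import comb
def expected_sixes_counts(plays, dice=3, p_six=1 / 6):
    """Expected number of plays showing k sixes, for k = 0..dice, assuming fair dice."""
    return [plays * comb(dice, k) * p_six ** k * (1 - p_six) ** (dice - k) for k in range(dice + 1)]
expected = expected_sixes_counts(100)
print([round(e, 2) for e in expected])
```
The expected count for three sixes is well below one, so that category is very sensitive to even a handful of observed triples.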

24:54.600 --> 25:02.010
Now, how do we find out the chi-square value? The value will again be calculated as the observed value

25:02.400 --> 25:06.180
minus the expected value, whole square,

25:07.070 --> 25:13.500
divided by the expected value, and the summation of this calculation over all the four categories. That

25:13.530 --> 25:14.960
is right.

25:15.080 --> 25:21.860
So how do we find that out? 48 minus 58, whole square, divided by 58, plus 35 minus 34.5,

25:21.860 --> 25:26.180
whole square, divided by 34.5, and so on.

25:29.640 --> 25:30.150
Now.

25:31.610 --> 25:33.680
What do we get from them now?

25:33.700 --> 25:40.040
This is equal to one point seven two plus zero point zero zero seven plus nine point one four plus

25:40.040 --> 25:44.440
twelve point five, which is equal to twenty three point three six seven.

25:44.450 --> 25:47.720
So the value which we get is twenty three point three six seven.
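
NOTE A Python sketch of the dice calculation. The observed and expected counts below are inferred from the four terms read out in the lesson (1.72, 0.007, 9.14 and 12.5) and should be treated as an assumption. Summing the unrounded terms gives about 23.37; the lesson's 23.367 comes from rounding each term before adding.
```python
# Observed and expected counts for 0, 1, 2 and 3 sixes.
# Assumption: these values are inferred from the terms read out in the lesson.
observed = [48, 35, 15, 3]
expected = [58, 34.5, 7, 0.5]
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(terms)
degrees_of_freedom = len(observed) - 1
print([round(t, 3) for t in terms], round(chi_square, 3))
```
Most of the statistic comes from the last two categories, where double and triple sixes turned up more often than fair dice would suggest.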

25:48.980 --> 25:55.760
Now, the degrees of freedom will be four minus one, so the degrees of freedom are three.

25:57.690 --> 25:59.880
So, check for degrees of freedom three

26:00.860 --> 26:03.110
and the p-value of zero point zero five.

26:04.480 --> 26:09.220
For this, the statistic has to be greater than seven point eight one five for the null hypothesis to be rejected.

26:10.170 --> 26:17.880
Now, because twenty three point three six seven is greater than seven point eight one five, we

26:17.880 --> 26:19.600
can reject the null hypothesis.

26:19.950 --> 26:26.300
This means that the dice are not fair, right?

26:26.340 --> 26:29.470
The null hypothesis was that the dice are fair.

26:29.490 --> 26:32.180
That is, the counts will follow the expected distribution.

26:32.490 --> 26:36.260
But the result is significant at this significance level.

26:36.270 --> 26:44.490
So we can easily see that the dice are not fair, and we can let go of the gambler and give him back his

26:44.490 --> 26:46.590
dice and have a good thing.

26:46.830 --> 26:47.190
Right.

26:47.310 --> 26:49.210
So this is what we will be doing here.

26:49.380 --> 26:54.660
So we are able to find out if there is something wrong with a particular distribution using the goodness

26:54.750 --> 26:55.260
of fit test.

26:56.920 --> 27:03.970
So this is how we use the goodness-of-fit test. The next thing we will be checking is the

27:05.300 --> 27:07.050
chi-square test in one more example.

27:07.560 --> 27:12.830
OK, so let us have a look at one more example which we have.

27:13.770 --> 27:23.190
OK, so in this example, two hundred and fifty-six visual artists were surveyed to find out their zodiac signs.

27:23.700 --> 27:30.150
So we want to find out if the visual artists have an even distribution across

27:30.150 --> 27:30.810
these zodiac signs.

27:31.410 --> 27:36.450
That is, there is no particular Zodiac sign which makes a good visual artist.

27:36.750 --> 27:37.040
Right.

27:37.230 --> 27:42.150
If there were a difference in the distribution, it would simply mean that for a particular zodiac sign,

27:42.150 --> 27:44.460
people will make good visual artists.

27:44.460 --> 27:47.400
So the count might be really high in that particular zodiac sign.

27:48.450 --> 27:55.110
So what we do here is, again, we will have the categories as those 12 Zodiac signs, which we have,

27:55.650 --> 27:57.300
what will be the degrees of freedom?

27:57.300 --> 28:00.090
The degree of freedom will be 12 minus one.

28:01.300 --> 28:07.360
These are the observed values, these are the expected values. The expected value is two hundred and fifty-six by

28:07.360 --> 28:10.090
twelve, which is the expected value.
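
NOTE A tiny Python sketch of the expected count and degrees of freedom for the zodiac example, assuming the 256 surveyed artists are spread evenly over the 12 signs under the null hypothesis. The per-sign observed counts are shown on screen in the lesson but not read out, so they are not reproduced here.
```python
n_artists = 256  # sample size stated in the lesson
n_signs = 12  # one category per zodiac sign
expected_per_sign = n_artists / n_signs  # about 21.33 artists per sign under H0
degrees_of_freedom = n_signs - 1
print(round(expected_per_sign, 2), degrees_of_freedom)
```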

28:11.150 --> 28:17.870
So how do we find out the residual value? We find the observed value minus the expected value.

28:17.900 --> 28:24.140
We take the whole square of this value and then divide it by the expected value. The results which

28:24.140 --> 28:28.790
we get, we sum all of these and we get five point zero nine four.

28:31.860 --> 28:39.030
Now, we consider the p-value: if the p-value is very small, then we will reject the

28:39.030 --> 28:41.070
null hypothesis, like this:

28:41.070 --> 28:48.720
All the zodiac signs are equal. And if the p-value is very large, then the null hypothesis should

28:48.720 --> 28:49.800
not be rejected.

28:49.810 --> 28:50.140
Right.

28:50.610 --> 28:51.570
So what do we have?

28:51.570 --> 28:57.060
We have degrees of freedom 11 and the value five point zero nine four.

28:59.360 --> 29:01.100
Degree of Freedom 11.

29:02.830 --> 29:05.140
And the value was five point.

29:06.310 --> 29:07.150
Zero nine.

29:07.420 --> 29:11.430
So here you can see that five point zero nine lies somewhere here,

29:13.260 --> 29:20.310
right between zero point nine five and zero point nine zero, which means that we cannot reject the

29:20.310 --> 29:21.340
null hypothesis.
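
NOTE A hedged Python sketch for computing the p-value directly instead of bracketing it in the table. It uses the standard finite-series forms of the chi-square upper tail for integer degrees of freedom; the function name is hypothetical and only the standard library is used. The inputs 5.094 and 11 are the statistic and degrees of freedom from the lesson.
```python
from math import erfc, exp, pi, sqrt
def chi_square_p_value(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite series in odd powers of sqrt(x).
    root = sqrt(x)
    tail = erfc(root / sqrt(2))  # equals 2 * P(Z >= sqrt(x)) for standard normal Z
    density = exp(-x / 2) / sqrt(2 * pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + 2 * density * total
p_zodiac = chi_square_p_value(5.094, 11)
print(round(p_zodiac, 3))
```
The result lands between 0.90 and 0.95, matching the two table columns the lesson points to, so the null hypothesis is not rejected.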

29:22.170 --> 29:22.650
So.

29:24.110 --> 29:31.730
We can easily see that all the Zodiac signs have equal possibility of having a talented visual artist

29:31.730 --> 29:36.590
That is, there is no particular zodiac sign which will create a better visual artist.

29:37.820 --> 29:40.920
So this is what we gain from the goodness-of-fit test.

29:41.120 --> 29:45.320
The next thing which we will be learning about is the chi-square test of independence, which

29:45.320 --> 29:47.720
will be taken up in the next session.

29:48.170 --> 29:48.710
Thank you.