1
00:00:00,060 --> 00:00:04,270
So we've already covered the first two steps of the hypothesis testing procedure.

2
00:00:04,290 --> 00:00:08,940
We've learned how to state null and alternative hypotheses as opposite statements.

3
00:00:08,940 --> 00:00:14,460
And we talked in the last lecture about how to determine the level of significance or the alpha value

4
00:00:14,460 --> 00:00:19,470
based on how important it is that we don't commit a type one error.

5
00:00:19,500 --> 00:00:24,010
Now we want to talk about the third step, which is how to calculate the test statistic.

6
00:00:24,030 --> 00:00:29,070
Now, regardless of what kind of test we're running (and we'll talk a little bit more about that later

7
00:00:29,070 --> 00:00:30,240
in this lecture),

8
00:00:30,270 --> 00:00:36,120
the short answer here is that our test statistic is always given by this formula here.

9
00:00:36,120 --> 00:00:41,910
So we're looking at the test statistic formula for a T score because we're assuming that population

10
00:00:41,910 --> 00:00:45,750
standard deviation is unknown, which is almost always the case.

11
00:00:45,750 --> 00:00:51,390
If we do happen to know population standard deviation, then we can calculate a Z score instead of a

12
00:00:51,390 --> 00:00:56,610
T score, in which case the formula is exactly the same, except that we have Z equals instead of T

13
00:00:56,610 --> 00:00:57,240
equals.

14
00:00:57,240 --> 00:01:01,860
And instead of using sample standard deviation, we would use population standard deviation.

15
00:01:01,860 --> 00:01:07,530
But again, since we almost never know population standard deviation, we'll assume here that we are

16
00:01:07,530 --> 00:01:09,280
calculating a T score.
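
Since the slide itself is not part of this transcript, here is a sketch of the two formulas being described, written in the usual textbook notation. The symbols are assumptions: x-bar is the sample mean, mu-zero is the hypothesized value, s is the sample standard deviation, sigma is the population standard deviation, and n is the sample size.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
\qquad \text{and, when } \sigma \text{ is known,} \qquad
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
```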

17
00:01:09,300 --> 00:01:15,150
So you may recognize here that the values in this test statistic formula are the sample mean.

18
00:01:15,150 --> 00:01:20,490
So the mean that we calculate from the sample that we take from our population and then this mu sub

19
00:01:20,490 --> 00:01:25,050
zero value here, or mu naught, is our hypothesized value.

20
00:01:25,050 --> 00:01:30,450
So in the last lecture when we were talking about determining the level of significance, we gave these

21
00:01:30,450 --> 00:01:37,830
hypothesis statements where we said that the alternative hypothesis was mu greater than 14, where we

22
00:01:37,830 --> 00:01:43,950
said that we were hypothesizing that mean order processing time was longer than 14 hours, which means

23
00:01:43,950 --> 00:01:50,040
that the null hypothesis is the opposite statement, which is that mu is less than or equal to

24
00:01:50,070 --> 00:01:51,090
14.

25
00:01:51,090 --> 00:01:55,410
That mean order processing time is less than or equal to 14 hours.

26
00:01:55,440 --> 00:02:04,200
This value right here in our hypothesis statements, this is mu naught. It is our hypothesized value.

27
00:02:04,200 --> 00:02:10,690
So we have sample mean, we have the hypothesized value and then here in the denominator you may recognize

28
00:02:10,710 --> 00:02:16,200
the formula for standard error, the standard deviation of the sampling distribution of the sample mean

29
00:02:16,200 --> 00:02:21,660
which is given by sample standard deviation divided by the square root of the sample size,

30
00:02:21,660 --> 00:02:27,900
n. So let's say here again that we're working with this set of hypothesis statements. Again, the idea here

31
00:02:27,900 --> 00:02:34,800
is that we are running a warehouse, a distribution center, we are shipping out orders and we're hypothesizing

32
00:02:34,800 --> 00:02:38,160
here that mean order processing time is longer than 14 hours.

33
00:02:38,160 --> 00:02:43,800
So from the time the customer places the order online to the time that we actually ship the order

34
00:02:43,800 --> 00:02:47,370
out of the warehouse is greater than 14 hours.

35
00:02:47,370 --> 00:02:55,830
Let's say that we pull from our warehouse a simple random sample, and we use a sample size of 50 orders

36
00:02:55,830 --> 00:03:02,280
and we calculate a sample mean of maybe we'll say 16.5 hours.

37
00:03:02,280 --> 00:03:08,880
So for our 50 order sample, we found that the mean order processing time was 16 and a half hours.

38
00:03:08,880 --> 00:03:13,950
We already said that our hypothesized value was 14.

39
00:03:14,400 --> 00:03:20,310
And then let's say we also calculated from that sample of 50 orders that we pulled a sample standard

40
00:03:20,310 --> 00:03:24,480
deviation equal to, let's maybe say 1.5 hours.

41
00:03:24,480 --> 00:03:28,410
So that's enough information to calculate the test statistic.

42
00:03:28,410 --> 00:03:39,000
We would say that t is equal to our sample mean 16.5 minus the hypothesized value 14 divided by sample

43
00:03:39,000 --> 00:03:44,880
standard deviation, which is 1.5 hours divided by square root of sample size.

44
00:03:44,970 --> 00:03:50,130
We said that our sample size was n equals 50, so we divide by square root 50.

45
00:03:50,130 --> 00:03:56,490
And if we use a calculator to find the value of this fraction, here we get an approximate T score of

46
00:03:56,490 --> 00:04:00,300
about 11.79 and that's it.

47
00:04:00,300 --> 00:04:07,470
That is the value of our test statistic and we are done with the third step of our hypothesis test.
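
The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name `t_statistic` and its parameter names are made up for illustration, and the numbers are the ones from the warehouse example.

```python
import math


def t_statistic(sample_mean, hypothesized_mean, sample_sd, n):
    """Return (sample mean - hypothesized value) / standard error."""
    standard_error = sample_sd / math.sqrt(n)  # s / sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# Values from the order processing example: n = 50, mean = 16.5, s = 1.5
t = t_statistic(sample_mean=16.5, hypothesized_mean=14, sample_sd=1.5, n=50)
print(round(t, 2))  # 11.79
```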

48
00:04:07,470 --> 00:04:13,350
But we still want to take this opportunity to talk about one extra factor in our hypothesis testing

49
00:04:13,350 --> 00:04:19,050
procedure before we go forward, because we'll need this, which is the idea of a one tailed test versus

50
00:04:19,050 --> 00:04:20,130
a two tailed test.

51
00:04:20,130 --> 00:04:26,640
So if you remember from before, we talked about our null and alternative hypotheses and we said that

52
00:04:26,640 --> 00:04:32,910
we had three options, we could use an equal sign in the null hypothesis, in which case the alternative

53
00:04:32,910 --> 00:04:38,400
hypothesis has to have a not equal to sign or in the null we could have less than or equal to, in which

54
00:04:38,400 --> 00:04:43,500
case the alternative has a greater than sign or the null has a greater than or equal to, in which case

55
00:04:43,500 --> 00:04:45,510
the alternative has a less than sign.

56
00:04:45,750 --> 00:04:51,270
These are our only three options when we state our null and alternative hypotheses.

57
00:04:51,270 --> 00:04:55,920
Well, these three options correspond to one and two tailed tests.

58
00:04:55,920 --> 00:04:59,940
So if we use this first option here, where the null has

59
00:05:00,090 --> 00:05:03,100
an equal sign and the alternative has a not equal to sign,

60
00:05:03,120 --> 00:05:08,820
that means we're using a two tailed test, which we also sometimes call a non directional test or a

61
00:05:08,820 --> 00:05:09,790
two sided test.

62
00:05:09,810 --> 00:05:12,420
Those are three different ways of saying the same thing.

63
00:05:12,450 --> 00:05:18,900
In contrast, if we use either the second or the third option, then we're conducting a one tailed test,

64
00:05:18,900 --> 00:05:22,470
also called a directional test, also called a one sided test.
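
As a small illustration, the mapping from the sign in the alternative hypothesis to the kind of test can be written as a lookup. The function name `tail_type` and the string labels are hypothetical, chosen only for this sketch.

```python
def tail_type(alternative_sign):
    """Map the sign used in the alternative hypothesis to the kind of test."""
    kinds = {
        "!=": "two-tailed",  # null uses =
        ">": "one-tailed",   # null uses <=
        "<": "one-tailed",   # null uses >=
    }
    return kinds[alternative_sign]


print(tail_type("!="))  # two-tailed
print(tail_type(">"))   # one-tailed
```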

65
00:05:22,500 --> 00:05:29,280
Now, previously when we talked about these three options, we said that this first option here, option

66
00:05:29,280 --> 00:05:32,700
number one, was our most conservative option.

67
00:05:32,730 --> 00:05:39,450
Using this option says that we don't really have an idea or a presupposition about the direction or

68
00:05:39,450 --> 00:05:41,560
the directionality of our test.

69
00:05:41,580 --> 00:05:47,330
We're not sure if we should use a greater than sign or a less than sign in our alternative hypothesis.

70
00:05:47,340 --> 00:05:51,180
We really don't have any preconceived idea about the direction.

71
00:05:51,180 --> 00:05:58,650
But the reason in actuality that this is more conservative, we can see visually if we look at the normal

72
00:05:58,650 --> 00:05:59,600
distribution here.

73
00:05:59,610 --> 00:06:01,620
So we talked about this before too.

74
00:06:01,650 --> 00:06:08,010
If we have our probability distribution, then we have our region of acceptance in the middle here that

75
00:06:08,010 --> 00:06:12,060
is based on the confidence level percentage.

76
00:06:12,060 --> 00:06:16,950
And if we're running a two tailed test or a non directional test, then we're going to have a region

77
00:06:16,950 --> 00:06:21,720
of rejection over here in the lower tail and a region of rejection over here in the upper tail.

78
00:06:21,750 --> 00:06:28,890
The amount of area under the curve in this lower tail region of rejection is alpha over two and the

79
00:06:28,890 --> 00:06:31,650
amount of area in the upper tail is alpha over two.

80
00:06:31,680 --> 00:06:36,210
Remember that alpha is given by one minus the confidence level.

81
00:06:36,210 --> 00:06:46,230
So if we choose a confidence level of let's say 95%, then by definition the alpha value has to be 5%,

82
00:06:46,230 --> 00:06:53,640
which means then that we would have 2.5% of all the area under the curve in this lower tail and 2.5%

83
00:06:53,640 --> 00:06:55,770
of all the area under the curve in the upper tail.

84
00:06:55,770 --> 00:07:00,380
So this lower tail region of rejection would make up two and one half percent of total area.

85
00:07:00,390 --> 00:07:04,320
The upper tail region of rejection would make up two and one half percent of total area.

86
00:07:04,470 --> 00:07:09,690
And then this whole region of acceptance in the middle would be 95% of the area.

87
00:07:09,720 --> 00:07:16,080
Contrast that with a directional test in either direction where we have a one tailed test.

88
00:07:16,080 --> 00:07:20,720
So if we have this situation here, this is situation number two.

89
00:07:20,730 --> 00:07:26,310
So this is two and three, this is one and two.

90
00:07:26,310 --> 00:07:30,480
And then while we're at it here, we'll say that this is scenario three.

91
00:07:30,720 --> 00:07:36,300
So instead of two here, the alternative hypothesis states that we think the population mean is greater

92
00:07:36,300 --> 00:07:38,880
than some hypothesized value.

93
00:07:38,880 --> 00:07:44,430
And because we use this greater than sign, that means we're using what we call an upper tailed test.

94
00:07:44,430 --> 00:07:48,740
And the entire region of rejection is squeezed into this upper tail.

95
00:07:48,750 --> 00:07:52,410
There is no other region of rejection down here in the lower tail.

96
00:07:52,410 --> 00:07:57,150
So we just have the region of acceptance on the left and the region of rejection on the right.

97
00:07:57,150 --> 00:08:03,660
Whereas if we state in our alternative hypothesis that we think the population mean is less than some

98
00:08:03,660 --> 00:08:10,110
hypothesized value, so less than, that means we're running a lower tailed test, and it pushes the entire

99
00:08:10,110 --> 00:08:12,270
region of rejection into the lower tail.

100
00:08:12,270 --> 00:08:18,410
So we just have the singular region of rejection on the left and the region of acceptance on the right.

101
00:08:18,420 --> 00:08:25,140
Now, going back here to this idea of the two tailed test being more conservative than either of the

102
00:08:25,140 --> 00:08:27,090
one tailed directional tests.

103
00:08:27,180 --> 00:08:34,620
The reason we say it's more conservative is because let's say that we have chosen a 95% confidence level,

104
00:08:34,620 --> 00:08:38,010
like we said, and therefore an alpha value of 5%.

105
00:08:38,220 --> 00:08:43,409
That means for a two tailed test, if we're running a two tailed test based on the fact that we have

106
00:08:43,409 --> 00:08:49,080
stated null and alternative hypotheses with these equal and not equal to signs, if we're running a

107
00:08:49,080 --> 00:08:55,500
two tailed test, then that means that this region of acceptance in the middle here is 95% of the area,

108
00:08:55,500 --> 00:08:59,460
because that 95% area is centered around the mean.

109
00:08:59,460 --> 00:09:03,060
It's symmetrical here in the middle of this distribution.

110
00:09:03,060 --> 00:09:11,040
That means that the boundary here between this lower tail region of rejection and the region of acceptance

111
00:09:11,040 --> 00:09:17,330
is here at 2.5% or at 0.0250.

112
00:09:17,340 --> 00:09:22,530
That would be the area we look up to find our Z score, if we were thinking about this in terms of Z scores.

113
00:09:22,530 --> 00:09:28,500
And then the boundary here at the right edge of the region of acceptance and the lower edge of this

114
00:09:28,500 --> 00:09:37,950
upper region of rejection, this boundary right here would be at a cumulative area of 97.5% or 0.9750.

115
00:09:37,950 --> 00:09:42,990
We can see that two and one half percent over on the left, the two and one half percent over on the

116
00:09:42,990 --> 00:09:49,680
right above 97.5% and the 95% in the middle between 2.5% and 97.5%.

117
00:09:49,680 --> 00:09:55,890
But if we run a one tailed or directional test, let's say we're running an upper tail test here and

118
00:09:55,890 --> 00:09:59,580
we look at this boundary again with the same

119
00:10:00,000 --> 00:10:01,650
confidence level of 95%.

120
00:10:01,680 --> 00:10:09,210
That means that we have the entire 95% over here, which means that this Z score has to be associated

121
00:10:09,210 --> 00:10:12,630
with 0.9500.

122
00:10:12,660 --> 00:10:18,060
It is the Z score that gives 0.9500 in the body of the Z table.

123
00:10:18,060 --> 00:10:25,860
So these are the Z scores that give 0.0250 and 0.9750 in the body of the Z table.

124
00:10:25,860 --> 00:10:29,450
So that is the boundary value we'd be looking for.

125
00:10:29,460 --> 00:10:35,160
And then similarly with this lower tail test, again, keeping our confidence level the same at 95%,

126
00:10:35,250 --> 00:10:40,290
that means the entire alpha value, 5% has to get squeezed into this lower tail.

127
00:10:40,290 --> 00:10:50,850
And so this boundary right here has to be associated with a Z score that gives 0.0500 in the body of

128
00:10:50,850 --> 00:10:51,740
the Z table.
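
Instead of reading a printed Z table, the boundary values for all three scenarios can be looked up with Python's standard library. This is a sketch that assumes the standard normal distribution, matching the Z table discussion here; with a T score the boundaries would come from a t distribution instead. The variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05        # 1 - confidence level of 95%
z = NormalDist()    # standard normal: mean 0, standard deviation 1

# inv_cdf takes the area to the left (the "body of the Z table" value)
# and returns the Z score that sits at that boundary.
two_tailed = (z.inv_cdf(alpha / 2), z.inv_cdf(1 - alpha / 2))  # areas 0.0250, 0.9750
upper_tailed = z.inv_cdf(1 - alpha)                            # area 0.9500
lower_tailed = z.inv_cdf(alpha)                                # area 0.0500

print([round(v, 3) for v in two_tailed])  # [-1.96, 1.96]
print(round(upper_tailed, 3))             # 1.645
print(round(lower_tailed, 3))             # -1.645
```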

129
00:10:51,750 --> 00:10:59,520
Now, what we see ultimately is that we are more likely to land in the region of acceptance with a two

130
00:10:59,520 --> 00:11:00,420
tailed test.

131
00:11:00,450 --> 00:11:06,930
It's going to be harder to reject the null hypothesis, harder to land within the region of rejection

132
00:11:06,960 --> 00:11:13,470
if we're using a two tailed test, because we can see here that with this two tailed test on this upper

133
00:11:13,470 --> 00:11:19,890
side, we have to clear a bar of 0.975 in order to reject the null hypothesis.

134
00:11:19,890 --> 00:11:24,400
Whereas here with the upper tail test we only have to clear a bar of 0.95.

135
00:11:24,420 --> 00:11:31,170
In other words, our test statistic doesn't have to be as extreme when we run this upper tail test as

136
00:11:31,170 --> 00:11:32,910
when we run this two tailed test.

137
00:11:32,910 --> 00:11:39,570
And similarly here with the lower tail test, the test statistic does not need to be as extreme on the

138
00:11:39,570 --> 00:11:46,760
low end in order to be to the left of 0.05 as it does to be to the left of 0.025.

139
00:11:46,770 --> 00:11:53,880
So running a one tailed test makes it easier to reject the null hypothesis, which makes it easier to

140
00:11:53,880 --> 00:11:56,490
lend support to our alternative hypothesis.

141
00:11:56,490 --> 00:12:03,120
And so we've kind of lowered the bar, we've lowered the strictness of our test by running a one tailed

142
00:12:03,120 --> 00:12:03,570
test.

143
00:12:03,570 --> 00:12:09,960
And that's why we said previously that it's super important that we have some idea, some evidence ahead

144
00:12:09,960 --> 00:12:15,690
of time about the directionality of the test if we're going to move forward with one of these pairs

145
00:12:15,690 --> 00:12:17,250
of hypothesis statements.

146
00:12:17,280 --> 00:12:22,170
Otherwise, if we don't really have any evidence of directionality either way or we want to be more

147
00:12:22,170 --> 00:12:27,840
conservative, we want to be safer and set the bar for ourselves as high as we can to make sure that

148
00:12:27,840 --> 00:12:34,350
our conclusion is more rock solid, then we should run a two tailed non directional test.

149
00:12:34,350 --> 00:12:41,460
In other words, if we sort of line up these top two distributions here and we extend the boundary in

150
00:12:41,460 --> 00:12:45,720
this second distribution between the region of acceptance and the region of rejection, we sort of pull

151
00:12:45,720 --> 00:12:48,780
that up here into the first distribution.

152
00:12:48,780 --> 00:12:51,870
Maybe that boundary is right around here.

153
00:12:51,870 --> 00:12:57,420
If we extend that up, then what we're saying is that if we're running a two tailed test, we have to

154
00:12:57,420 --> 00:13:03,150
find a test statistic that clears this boundary in order to reject the null hypothesis.

155
00:13:03,150 --> 00:13:08,730
But if we were running a one tailed test, we would only need to find a test statistic that clears

156
00:13:08,730 --> 00:13:12,840
this bar, this boundary in order to reject the null.

157
00:13:12,840 --> 00:13:18,900
And so any test statistics that we find between these two, any test statistics that fall within this

158
00:13:18,900 --> 00:13:24,990
interval, we will fail to reject the null if we're running a two tailed test but succeed at rejecting

159
00:13:24,990 --> 00:13:27,270
the null if we're running a one tailed test.
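
To make that interval concrete, here is a rough sketch comparing the two decisions for a test statistic that falls between the two boundaries. It assumes a Z score and the standard normal distribution; the function `rejects_null` and the value 1.8 are hypothetical, picked only because 1.8 lies between the one tailed and two tailed boundaries at alpha = 0.05.

```python
from statistics import NormalDist


def rejects_null(test_stat, alpha, tail):
    """Return True if test_stat lands in the region of rejection."""
    z = NormalDist()
    if tail == "two":
        # alpha is split evenly between the lower and upper tails
        return abs(test_stat) > z.inv_cdf(1 - alpha / 2)
    if tail == "upper":
        # all of alpha sits in the upper tail
        return test_stat > z.inv_cdf(1 - alpha)
    if tail == "lower":
        # all of alpha sits in the lower tail
        return test_stat < z.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


stat = 1.8  # hypothetical test statistic between the two boundaries
print(rejects_null(stat, 0.05, "two"))    # False: fail to reject the null
print(rejects_null(stat, 0.05, "upper"))  # True: reject the null
```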

160
00:13:27,270 --> 00:13:33,180
And so that's why we see that two tailed test as being more conservative. With all of this out of the

161
00:13:33,180 --> 00:13:33,540
way,

162
00:13:33,540 --> 00:13:39,090
and now that we know how to calculate a test statistic, we can move on to the last couple of steps

163
00:13:39,090 --> 00:13:45,240
of the hypothesis testing procedure, which will be determining whether or not this test statistic clears

164
00:13:45,240 --> 00:13:51,570
one of these bars, these hurdles, and lands us in one of these regions of rejection instead of the

165
00:13:51,570 --> 00:13:52,620
region of acceptance.

166
00:13:52,620 --> 00:13:58,170
In other words, is our test statistic severe enough to land us within a region of rejection?

167
00:13:58,170 --> 00:13:59,430
That's what we're going to look at next.

168
00:13:59,430 --> 00:14:04,230
How to determine whether or not the test statistic shows us enough significance.

169
00:14:04,230 --> 00:14:10,770
If it does, then we'll be landing within one of these regions of rejection, which will allow us to

170
00:14:10,770 --> 00:14:15,360
reject the null hypothesis and lend support for our alternative hypothesis.

171
00:14:15,360 --> 00:14:21,570
So now that we have this test statistic, we'll move on to determining these regions of acceptance and

172
00:14:21,570 --> 00:14:28,470
rejection and looking at whether or not it'll be acceptable for us to reject the null hypothesis.