1
00:00:00,150 --> 00:00:03,030
After we state the null and alternative hypotheses,

2
00:00:03,030 --> 00:00:08,039
the next step in the hypothesis testing procedure is to determine the level of significance.

3
00:00:08,070 --> 00:00:15,600
Now, we said before that the level of significance is equal to or the same as the alpha level, which,

4
00:00:15,600 --> 00:00:18,710
remember, is the complement of the confidence level.

5
00:00:18,720 --> 00:00:23,610
So when we say choose a level of significance, we could also think about this as choosing a confidence

6
00:00:23,610 --> 00:00:24,010
level.

7
00:00:24,030 --> 00:00:31,740
The most common levels we'll choose will be a confidence level of 90%, a confidence level of 95%, or

8
00:00:31,740 --> 00:00:42,390
a confidence level of 99%, which means then that the alpha values associated with these are 10%, 5%

9
00:00:42,390 --> 00:00:43,880
and 1%.
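As a quick sketch of that relationship, alpha is just one minus the confidence level. The function name alpha_from_confidence is only an illustrative choice here, not something from the lecture.

```python
def alpha_from_confidence(confidence_level: float) -> float:
    """Return alpha (the level of significance) as the complement of the confidence level."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    # Round away floating-point noise such as 0.050000000000000044.
    return round(1.0 - confidence_level, 10)


# The three most common confidence levels mentioned above.
for confidence_level in (0.90, 0.95, 0.99):
    alpha = alpha_from_confidence(confidence_level)
    print(f"confidence {confidence_level:.0%} -> alpha {alpha:.0%}")
```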

10
00:00:43,890 --> 00:00:49,590
So when we say choose a level of significance or choose an alpha value, we're talking about choosing

11
00:00:49,590 --> 00:00:50,970
this value right here.

12
00:00:50,970 --> 00:00:55,890
The level we choose depends on how strict we feel like we need to be with our test.

13
00:00:55,890 --> 00:01:01,740
And a lot of that is dictated by our willingness to make what are called type one and type two errors.

14
00:01:01,740 --> 00:01:03,960
So let's actually start there.

15
00:01:03,960 --> 00:01:09,810
Let's say, to take an example, that we have stated the null and alternative hypotheses this way.

16
00:01:09,810 --> 00:01:16,530
Maybe we're running a shipping department at a warehouse, and our alternative hypothesis

17
00:01:16,530 --> 00:01:21,510
is that the mean order processing time is longer than 14 hours.

18
00:01:21,510 --> 00:01:26,940
So from the time the customer places the order to the time that we send the shipment out of the warehouse

19
00:01:26,940 --> 00:01:33,490
on its way to the customer, we're saying that the mean time to process that order is longer than 14

20
00:01:33,490 --> 00:01:33,960
hours.

21
00:01:33,960 --> 00:01:39,780
So that's what we're hypothesizing, which means the null hypothesis is the opposite statement that

22
00:01:39,780 --> 00:01:42,870
the mean time is less than or equal to 14 hours.

23
00:01:42,870 --> 00:01:45,930
So this is the pair of hypothesis statements that we're working with.

24
00:01:45,930 --> 00:01:49,170
This was step one of the hypothesis testing procedure.

25
00:01:49,170 --> 00:01:52,740
And now step two is to choose our significance level.

26
00:01:53,010 --> 00:01:58,110
So to pick our significance level, we really have to be aware of these type one and type two errors.

27
00:01:58,110 --> 00:02:01,980
So to understand these errors, let's start here.

28
00:02:01,980 --> 00:02:07,620
In any hypothesis testing procedure, we can really end up with four scenarios and we're outlining those

29
00:02:07,620 --> 00:02:08,880
in this table here.

30
00:02:08,880 --> 00:02:14,700
We can either have the scenario where the null hypothesis is true or the scenario where the null hypothesis

31
00:02:14,700 --> 00:02:15,450
is false.

32
00:02:15,450 --> 00:02:21,750
And in both of those cases, we could reject the null hypothesis or we could accept the null hypothesis.

33
00:02:21,750 --> 00:02:24,450
In other words, fail to reject the null.

34
00:02:24,480 --> 00:02:28,440
Now two of these combinations are correct choices, right?

35
00:02:28,440 --> 00:02:34,320
If the null hypothesis is true and we accept it, that's a correct decision.

36
00:02:34,320 --> 00:02:39,780
And what we're saying here is that we may not know it based on the sample that we pull and the statistics

37
00:02:39,780 --> 00:02:41,640
that we calculate associated with that sample.

38
00:02:41,640 --> 00:02:46,800
But let's say in actuality, in the real world, whether we know it or not, the null hypothesis is

39
00:02:46,800 --> 00:02:48,180
actually true.

40
00:02:48,360 --> 00:02:52,050
And as a result of our testing procedure, we accept the null.

41
00:02:52,080 --> 00:02:53,880
That's a correct decision.

42
00:02:53,880 --> 00:02:57,000
We have no issue in that scenario.

43
00:02:57,030 --> 00:02:59,880
We also have one other correct scenario.

44
00:02:59,880 --> 00:03:05,640
This scenario is in actuality, in the real world, it is actually the case that the null hypothesis

45
00:03:05,640 --> 00:03:08,880
is false, which means that we should reject it.

46
00:03:08,880 --> 00:03:11,970
If it's false, hopefully we would reject the null.

47
00:03:12,000 --> 00:03:13,500
That would be the correct thing to do.

48
00:03:13,500 --> 00:03:19,620
And so if the null hypothesis is false and we do reject it, we make a correct decision here.

49
00:03:19,620 --> 00:03:22,200
We'll come back to the idea of power in a second.

50
00:03:22,200 --> 00:03:27,210
But all we want to say right now is that rejecting the null when it's false or accepting the null when

51
00:03:27,210 --> 00:03:29,130
it's true are both correct decisions.

52
00:03:29,130 --> 00:03:31,020
So there's no problem there.

53
00:03:31,020 --> 00:03:35,850
But these other two scenarios are wrong decisions, right?

54
00:03:35,850 --> 00:03:38,520
If the null hypothesis is actually true,

55
00:03:38,520 --> 00:03:44,400
but we take a sample and we calculate statistics for that sample, and the sample leads us to believe

56
00:03:44,400 --> 00:03:49,650
that the null hypothesis is actually false and therefore we reject it, in other words if we reject the null when it's

57
00:03:49,650 --> 00:03:52,590
actually true, that's a wrong decision.

58
00:03:52,590 --> 00:03:56,310
And we call that type of wrong decision a type one error.

59
00:03:56,310 --> 00:04:01,320
So if we reject the null when it's actually true, we make a type one error or we commit a type one

60
00:04:01,320 --> 00:04:07,980
error, and the probability of committing a type one error is equivalent to the alpha value, where

61
00:04:07,980 --> 00:04:13,110
that alpha value is the complement of the confidence level, which we've talked about before and which

62
00:04:13,110 --> 00:04:19,019
we outlined here, which means that when we decide on an alpha value, what we're really deciding on

63
00:04:19,019 --> 00:04:22,860
is how much we want to risk committing a type one error.

64
00:04:22,860 --> 00:04:29,730
In other words, if we choose an alpha value equal to 5%, we're saying that 5% of the time or one out

65
00:04:29,730 --> 00:04:36,600
of 20 times, we will reject the null hypothesis when it's actually true; 5% of the time we'll commit

66
00:04:36,600 --> 00:04:37,890
a type one error.

67
00:04:37,890 --> 00:04:45,660
Whereas if we choose a confidence level of 99% and therefore an alpha value of 1% or a level of significance

68
00:04:45,660 --> 00:04:52,950
of 1%, what we're choosing there is that we are willing to commit a type one error one out of 100

69
00:04:52,980 --> 00:04:54,540
times, or 1% of the time.
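To make "alpha is the probability of a type one error" concrete, here is a minimal simulation sketch. It assumes things the lecture hasn't specified: a one-tailed Z test, a known population standard deviation of 3 hours, samples of 36 orders, and a true mean sitting exactly at 14 hours so that the null is true. All of those numbers, and the function name, are hypothetical.

```python
import random
from statistics import NormalDist


def type_one_error_rate(alpha, trials=5000, n=36, mu0=14.0, sigma=3.0, seed=0):
    """Estimate how often we reject a TRUE null (mean = mu0) in an upper-tailed Z test."""
    rng = random.Random(seed)                  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha)   # boundary of the upper-tail rejection region
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a world where the null hypothesis really is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z_score = (sum(sample) / n - mu0) / standard_error
        if z_score > z_crit:
            rejections += 1                    # rejected a true null: a type one error
    return rejections / trials


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: simulated type one error rate = {type_one_error_rate(alpha):.3f}")
```

Under these assumptions the simulated rates should land close to the chosen alpha values, which is the "one out of 20" and "one out of 100" idea described above.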

70
00:04:54,540 --> 00:04:59,610
Now, while it's true that we always want to be as confident as possible about our result, that is,

71
00:04:59,670 --> 00:05:05,350
ideally we would always choose 99% confidence instead of some lower confidence level,

72
00:05:05,360 --> 00:05:12,410
remember that picking a higher confidence level and therefore a lower alpha value comes at a cost because

73
00:05:12,410 --> 00:05:16,940
the lower the alpha value and we looked at this before when we talked about confidence intervals, the

74
00:05:16,940 --> 00:05:19,890
lower the alpha value, the wider the confidence interval.

75
00:05:19,910 --> 00:05:26,180
In other words, as we decrease the alpha value by increasing the confidence level, it becomes more

76
00:05:26,180 --> 00:05:32,300
difficult to reject the null hypothesis because the region of rejection is shrinking.

77
00:05:32,330 --> 00:05:38,630
Remember before we looked at sort of a graph of the confidence level, in contrast with the alpha value,

78
00:05:38,660 --> 00:05:43,520
we put our confidence level percentage in the middle of our normal distribution, and we said that these

79
00:05:43,520 --> 00:05:50,020
two little regions on either side of this confidence level area together made up the alpha value.

80
00:05:50,030 --> 00:05:55,910
So half of our alpha value is in this lower tail on the left and half of the alpha value is in the upper

81
00:05:55,910 --> 00:05:56,720
tail on the right.

82
00:05:56,720 --> 00:06:01,190
So that's why we say alpha over two, alpha over two, for each of these tails.

83
00:06:01,220 --> 00:06:05,240
This area in the middle here is what we call the region of acceptance.

84
00:06:05,240 --> 00:06:09,350
These two tails on either side are the regions of rejection.

85
00:06:09,350 --> 00:06:15,830
And what we mean there is that if we find a Z score or a T score that puts us outside of either of these

86
00:06:15,830 --> 00:06:22,640
boundaries here, this boundary on the left and this boundary on the right, if we find a Z score or

87
00:06:22,640 --> 00:06:27,290
a T score, depending on which one we're supposed to be using, that puts us above this boundary over

88
00:06:27,290 --> 00:06:29,330
here in this right side tail,

89
00:06:29,330 --> 00:06:31,220
then we're in the region of rejection.

90
00:06:31,220 --> 00:06:33,350
We will reject the null hypothesis.

91
00:06:33,350 --> 00:06:38,990
Similarly, if we find a Z score or T score that puts us to the left of this boundary on the lower

92
00:06:38,990 --> 00:06:44,330
edge, such that we're in the region of rejection on this lower tail, then we will reject the null

93
00:06:44,330 --> 00:06:45,170
hypothesis.

94
00:06:45,170 --> 00:06:51,320
But if our Z score or a T score puts us in this middle area here in between these two boundaries, we're

95
00:06:51,320 --> 00:06:56,870
inside that region of acceptance and we will accept the null hypothesis, also known as we will fail

96
00:06:56,870 --> 00:06:57,800
to reject the null.
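The decision rule just described can be sketched in a few lines. This is a sketch for the Z score case only, using the standard normal distribution from Python's standard library; a T score would need the t distribution instead, which isn't covered here. The function name is just an illustrative choice.

```python
from statistics import NormalDist


def two_tailed_decision(z_score: float, alpha: float) -> str:
    """Say which region a Z score falls in for a two-tailed test."""
    # Each tail holds alpha / 2 of the area, so the upper boundary sits at 1 - alpha / 2.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    if abs(z_score) > z_crit:
        return "reject the null"          # in one of the two regions of rejection
    return "fail to reject the null"      # in the region of acceptance


print(two_tailed_decision(2.30, alpha=0.05))   # beyond the upper boundary of about 1.96
print(two_tailed_decision(-1.20, alpha=0.05))  # between the two boundaries
print(two_tailed_decision(2.30, alpha=0.01))   # same Z score, but the boundary is now about 2.58
```

Notice that the same Z score of 2.30 is rejected at alpha of 5% but not at alpha of 1%.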

97
00:06:57,800 --> 00:07:05,150
And so the point is that as this confidence level increases, so as we widen this confidence level percentage

98
00:07:05,150 --> 00:07:10,760
in the middle, this confidence interval and these two boundaries push out toward either side.

99
00:07:10,760 --> 00:07:16,850
So as that confidence level grows, these boundaries push out away from the mean in the center and that

100
00:07:16,850 --> 00:07:18,350
alpha value decreases.

101
00:07:18,350 --> 00:07:22,730
So these tails get smaller, the area in these tails gets smaller.

102
00:07:22,730 --> 00:07:28,880
That means that our region of rejection gets smaller, so we are less likely to end up with a Z score

103
00:07:28,880 --> 00:07:32,900
or a T score that's actually in one of these regions of rejection.

104
00:07:32,900 --> 00:07:36,260
We're more likely to wind up in the region of acceptance.

105
00:07:36,260 --> 00:07:43,460
And so as this alpha value decreases or as this confidence level grows, we are less likely to reject

106
00:07:43,460 --> 00:07:47,480
the null, which means we are less likely to make a type one error.
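To see those boundaries pushing out, we can compute the two-tailed critical Z value for each of the common confidence levels. This is a small sketch using the standard normal distribution; the function name is hypothetical.

```python
from statistics import NormalDist


def two_tailed_critical_z(confidence_level: float) -> float:
    """Return the positive boundary of the region of acceptance for a two-tailed Z test."""
    alpha = 1 - confidence_level
    # Half of alpha goes in each tail.
    return NormalDist().inv_cdf(1 - alpha / 2)


for confidence_level in (0.90, 0.95, 0.99):
    boundary = two_tailed_critical_z(confidence_level)
    print(f"{confidence_level:.0%} confidence: reject only if |z| > {boundary:.3f}")
```

The boundaries come out at roughly 1.645, 1.960, and 2.576, so the higher the confidence level, the farther out a Z score has to be before we reject.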

107
00:07:47,480 --> 00:07:53,840
Now, the fourth scenario we can end up with is this one here, where the null hypothesis is actually truly

108
00:07:53,840 --> 00:07:57,500
false, but we accept it anyway when we shouldn't.

109
00:07:57,500 --> 00:08:03,650
We call that, as you may expect, a type two error and the probability of making a type two error we

110
00:08:03,650 --> 00:08:04,760
call beta.

111
00:08:04,790 --> 00:08:10,280
So if the null hypothesis is false and we're hoping to make a correct conclusion, which of course we

112
00:08:10,280 --> 00:08:17,930
are, then we need to fall in the region of rejection because we need to reject the null in order to

113
00:08:17,930 --> 00:08:20,540
make a correct decision when that null is false.

114
00:08:20,540 --> 00:08:26,570
But as the alpha value decreases, so as the confidence level increases and the alpha value decreases, and these

115
00:08:26,570 --> 00:08:31,760
two boundaries here push out further away from the mean toward the tails, this region of acceptance

116
00:08:31,760 --> 00:08:32,960
in the middle grows.

117
00:08:32,960 --> 00:08:37,549
We are more likely to accept the null instead of reject it, because it's going to be harder for us

118
00:08:37,549 --> 00:08:43,909
to find a Z score or T score that's going to put us in these smaller regions of rejection.

119
00:08:43,909 --> 00:08:46,880
And so we're more likely to make a type two error.

120
00:08:46,880 --> 00:08:53,390
What we realize then is that as we reduce the risk of committing a type one error, we increase the

121
00:08:53,390 --> 00:08:56,540
risk of committing a type two error and vice versa.

122
00:08:56,540 --> 00:09:00,440
So as alpha increases, beta decreases and vice versa.

123
00:09:00,440 --> 00:09:06,680
And in fact, the only way to decrease the probability of committing both errors at the same time will

124
00:09:06,680 --> 00:09:11,480
be to increase our sample size, which again is not always possible.

125
00:09:11,480 --> 00:09:18,290
So we're always balancing these two risks against each other, alpha and beta, unless we can increase

126
00:09:18,290 --> 00:09:23,270
our sample size, in which case we'll be able to reduce the risk of committing a type one error and

127
00:09:23,270 --> 00:09:26,510
reduce the risk of committing a type two error at the same time.
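Here is a rough sketch of that trade-off for our order processing example. It assumes an upper-tailed Z test with a known standard deviation of 3 hours and a true mean of 15 hours; neither number comes from the lecture, they're just stand-ins so we have something to calculate.

```python
from statistics import NormalDist


def beta_upper_tailed(alpha, n, mu0=14.0, true_mean=15.0, sigma=3.0):
    """Probability of a type two error: failing to reject when the true mean is above mu0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)              # rejection boundary under the null
    shift = (true_mean - mu0) / (sigma / n ** 0.5)      # how far the true mean sits from mu0, in standard errors
    # We fail to reject whenever the Z score lands below the boundary.
    return std_normal.cdf(z_crit - shift)


# Same sample size, smaller and smaller alpha: beta goes up.
for alpha in (0.10, 0.05, 0.01):
    print(f"n = 36,  alpha = {alpha:.2f}: beta = {beta_upper_tailed(alpha, n=36):.3f}")

# Same alpha, bigger sample: beta goes down without touching alpha.
print(f"n = 100, alpha = 0.05: beta = {beta_upper_tailed(0.05, n=100):.3f}")
```

With these made-up numbers, tightening alpha from 10% to 1% raises beta, while increasing the sample from 36 to 100 orders lowers beta at the same alpha.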

128
00:09:26,510 --> 00:09:32,570
Otherwise, alpha increases as beta decreases and vice versa, which means that most of the time, because

129
00:09:32,570 --> 00:09:37,310
we're always trying to take the best sample that we can and in theory we're using the largest sample

130
00:09:37,310 --> 00:09:38,300
we're able to,

131
00:09:38,330 --> 00:09:42,560
we're always being forced to decide which type of error is more dangerous.

132
00:09:42,560 --> 00:09:45,200
And the answer really depends on the situation.

133
00:09:45,200 --> 00:09:50,420
What we really need to ask ourselves when we're trying to determine this level of significance is what's

134
00:09:50,420 --> 00:09:52,310
our worst case scenario?

135
00:09:52,310 --> 00:09:58,370
So to take an example, let's think about a factory that produces car parts.

136
00:09:58,370 --> 00:09:59,420
So our

137
00:09:59,530 --> 00:10:00,910
factory is making

138
00:10:01,610 --> 00:10:04,220
parts for automobiles.

139
00:10:04,220 --> 00:10:07,910
And let's say that this factory has a quality control process in place.

140
00:10:07,910 --> 00:10:14,180
So as they produce parts, maybe some employees in the factory are responsible for quality control before

141
00:10:14,180 --> 00:10:15,680
each part is approved.

142
00:10:15,710 --> 00:10:23,090
In a situation like this one, where the null is that a part is good, the factory wants a low alpha value because that means they have to reject

143
00:10:23,090 --> 00:10:24,320
fewer parts.

144
00:10:24,320 --> 00:10:27,050
More parts are going to pass quality control.

145
00:10:27,050 --> 00:10:32,450
They'll reject fewer parts, which means that the factory will save money.

146
00:10:32,450 --> 00:10:38,600
It's going to cost less to make each part, and they'll be able to increase profit, which is definitely

147
00:10:38,600 --> 00:10:39,820
an important goal.

148
00:10:39,830 --> 00:10:44,750
But what that means is that more defective parts are likely to get through, which means they'll be

149
00:10:44,750 --> 00:10:49,940
putting more defective parts onto cars and those cars are less safe for consumers.

150
00:10:49,940 --> 00:10:57,620
So if we are the consumer, we actually might want a higher alpha value at that factory because of course

151
00:10:57,620 --> 00:11:00,200
we want our cars as safe as possible.

152
00:11:00,200 --> 00:11:05,660
But that might mean that we have to pay a little bit more money for the car because the car part factory

153
00:11:05,660 --> 00:11:08,330
has a stricter quality control process in place.

154
00:11:08,330 --> 00:11:13,070
So as the consumer, a higher alpha value means increased

155
00:11:13,760 --> 00:11:14,780
safety

156
00:11:15,660 --> 00:11:17,360
but higher costs.

157
00:11:17,370 --> 00:11:20,160
So we're balancing these two things against each other.

158
00:11:20,160 --> 00:11:24,690
And what it really comes down to is the seriousness of the situation.

159
00:11:24,690 --> 00:11:31,680
If we're making car parts or airplane parts or if we're testing medications for human beings, we're

160
00:11:31,680 --> 00:11:37,680
probably going to want a very high confidence level, a very low alpha value, because the stakes are

161
00:11:37,680 --> 00:11:38,250
really high.

162
00:11:38,250 --> 00:11:40,890
It's really, really important we get this right.

163
00:11:40,890 --> 00:11:46,860
But if we're conducting a survey about whether employees at our company prefer chunky peanut butter

164
00:11:46,860 --> 00:11:52,050
versus smooth peanut butter, well, the stakes of that particular hypothesis testing procedure are

165
00:11:52,050 --> 00:11:52,800
not very high.

166
00:11:52,800 --> 00:11:59,040
We might be totally comfortable settling for a 90% confidence level, and we're very willing to commit

167
00:11:59,040 --> 00:12:02,790
a type one error 10% of the time or one in ten times.

168
00:12:02,790 --> 00:12:03,900
The stakes are not high.

169
00:12:03,900 --> 00:12:05,340
It's okay to be less confident.

170
00:12:05,340 --> 00:12:09,840
But if the stakes are very high, if it's something important, if it's a matter of health and safety,

171
00:12:09,840 --> 00:12:14,400
we're probably going to tend toward this higher confidence level, which means this lower alpha value

172
00:12:14,400 --> 00:12:20,430
or lower level of significance in order to reduce the probability that we commit a type one error,

173
00:12:20,430 --> 00:12:25,540
which means reduce the probability that we reject the null hypothesis when it's actually true.

174
00:12:25,560 --> 00:12:32,040
Now, before we wrap up here, we want to come back really quickly to this idea of power that we said

175
00:12:32,040 --> 00:12:33,240
we would revisit.

176
00:12:33,270 --> 00:12:35,910
We give this scenario, rejecting the null when it's false, a special name.

177
00:12:35,910 --> 00:12:42,090
We call it the power of the hypothesis test, because this is really exactly what we want to do.

178
00:12:42,120 --> 00:12:47,850
Remember here that the alternative hypothesis is kind of how we formulate our thinking, and then we

179
00:12:47,850 --> 00:12:50,670
just state the opposite claim for the null.

180
00:12:50,700 --> 00:12:57,870
So here in this simple example, we hypothesized that our order processing time is more than 14 hours.

181
00:12:57,870 --> 00:13:04,350
So what we're looking to do in our hypothesis testing procedure is to pull a sample from the population.

182
00:13:04,350 --> 00:13:09,390
So we would pull a sample of orders that are placed by our customers and then we would collect data

183
00:13:09,390 --> 00:13:14,760
from that sample to see how long it took to process each of those orders in the sample that would allow

184
00:13:14,760 --> 00:13:19,050
us to calculate a sample mean and a sample standard deviation.

185
00:13:19,050 --> 00:13:24,150
And then from there we'd use those statistics and follow the rest of the hypothesis testing procedure,

186
00:13:24,150 --> 00:13:26,490
which we'll look at in the next few lectures.

187
00:13:26,490 --> 00:13:32,250
But the idea at the end of the procedure is that if we want to support this claim that's being made

188
00:13:32,250 --> 00:13:38,010
by the alternative hypothesis, then what we're hoping for is that the null hypothesis is wrong.

189
00:13:38,010 --> 00:13:38,790
It's false.

190
00:13:38,790 --> 00:13:44,400
If we want this alternative hypothesis to be correct, to be true, that means we want the null hypothesis

191
00:13:44,400 --> 00:13:45,510
to be false.

192
00:13:45,510 --> 00:13:51,810
So we want the null hypothesis to be false and we want the sample that we took to support the fact that

193
00:13:51,810 --> 00:13:53,430
the null hypothesis is false.

194
00:13:53,430 --> 00:14:00,420
And if it does, if the statistics we pull from our sample give us enough evidence that we can reject

195
00:14:00,420 --> 00:14:01,110
the null

196
00:14:01,110 --> 00:14:05,880
(and again, we'll talk about what enough evidence means and how we can make a conclusion that we can

197
00:14:05,880 --> 00:14:11,910
reject the null), if the null hypothesis is false as we want and the sample supports our ability to reject

198
00:14:11,910 --> 00:14:14,790
the null, then we make the correct decision here.

199
00:14:14,790 --> 00:14:21,210
And the conclusion of our hypothesis test is that we give support to the idea that order processing

200
00:14:21,240 --> 00:14:23,070
time is greater than 14 hours.

201
00:14:23,070 --> 00:14:28,860
We lend support to the hypothesis we sought to investigate in the first place, and we call that the

202
00:14:28,860 --> 00:14:30,810
power of the hypothesis test.

203
00:14:30,810 --> 00:14:36,720
It's the probability that we will reject the null when it's false, which is exactly what we're hoping

204
00:14:36,720 --> 00:14:37,380
to do.

205
00:14:37,380 --> 00:14:42,210
That would be the best possible outcome of the whole hypothesis test.

206
00:14:42,210 --> 00:14:50,730
So the higher the power of our test, the better off we are, and the power of the test is equal to

207
00:14:50,730 --> 00:14:52,770
one minus beta,

208
00:14:52,800 --> 00:14:55,740
where beta is the probability that we make a type two error.
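As a small sketch of power equals one minus beta, we can put numbers on it for the order processing example. The true mean of 15 hours, the standard deviation of 3 hours, and the sample of 36 orders are all hypothetical values chosen for illustration, and the test is assumed to be an upper-tailed Z test.

```python
from statistics import NormalDist


def beta_and_power(alpha, n=36, mu0=14.0, true_mean=15.0, sigma=3.0):
    """Return (beta, power) for an upper-tailed Z test when the null is actually false."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = (true_mean - mu0) / (sigma / n ** 0.5)
    beta = std_normal.cdf(z_crit - shift)   # chance we fail to reject a false null
    power = 1 - beta                        # chance we correctly reject it
    return beta, power


beta, power = beta_and_power(alpha=0.05)
print(f"beta = {beta:.3f}, power = {power:.3f}")
```

Under these assumptions we'd correctly reject the null a bit under two thirds of the time, and the two probabilities always add up to one.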

209
00:14:55,740 --> 00:15:01,200
So this value becomes important to us as we go forward with our hypothesis testing procedure.

210
00:15:01,200 --> 00:15:07,500
So just to recap, we said that step one of the procedure was to state null and alternative hypotheses.

211
00:15:07,500 --> 00:15:08,670
We did that up here

212
00:15:08,670 --> 00:15:14,490
for our example. Step two was to choose a level of significance, and that's what we've talked about

213
00:15:14,490 --> 00:15:15,060
here.

214
00:15:15,060 --> 00:15:20,460
Choosing a level of significance means choosing an alpha value, which in turn of course means picking

215
00:15:20,460 --> 00:15:21,660
a confidence level.

216
00:15:21,660 --> 00:15:25,500
If the stakes are high, it's important that we pick a higher confidence level.

217
00:15:25,500 --> 00:15:29,520
If the stakes aren't as high, we might be able to get away with a lower confidence level.

218
00:15:29,520 --> 00:15:33,150
So that's what we're considering when we choose a level of significance.

219
00:15:33,150 --> 00:15:39,840
Of course, once we pick a confidence level, that comes with the associated alpha value or level of significance.

220
00:15:39,840 --> 00:15:45,300
And so we are consciously making this decision where based on the level of significance or alpha value

221
00:15:45,300 --> 00:15:52,110
that we choose, we are defining our willingness to make a type one error, which means our willingness

222
00:15:52,110 --> 00:15:55,620
to reject the null hypothesis when it's actually true.

223
00:15:55,620 --> 00:16:01,980
So if we've chosen a level of significance of 10%, we're saying that we're willing to be wrong in this

224
00:16:01,980 --> 00:16:02,850
specific way

225
00:16:02,880 --> 00:16:08,730
10% of the time. We're willing to reject the null when it's actually true 10% of the time, if that's

226
00:16:08,730 --> 00:16:14,100
our chosen level of significance. In this example, with these hypothesis statements, we're hypothesizing

227
00:16:14,100 --> 00:16:14,460
that order

228
00:16:14,640 --> 00:16:18,720
processing time is longer than 14 hours. If the null hypothesis is true,

229
00:16:18,750 --> 00:16:23,670
it means that order processing time is actually less than or equal to 14 hours.

230
00:16:23,670 --> 00:16:28,530
But rejecting the null hypothesis would mean saying that we found support

231
00:16:28,530 --> 00:16:33,090
that order processing time was longer than 14 hours, even though it actually isn't.

232
00:16:33,090 --> 00:16:38,880
So what we're saying there is that 10% of the time we're willing to make the conclusion that order processing time

233
00:16:38,880 --> 00:16:41,730
is longer than 14 hours when it actually isn't.

234
00:16:41,730 --> 00:16:44,200
It's actually less than or equal to 14 hours.

235
00:16:44,220 --> 00:16:49,530
More specifically, when we say we're willing to be wrong 10% of the time, what we mean is that we

236
00:16:49,530 --> 00:16:56,040
are willing for one out of ten samples that we pull from our population to lead us to this specific

237
00:16:56,040 --> 00:16:58,170
type of wrong conclusion.

238
00:16:58,320 --> 00:17:02,220
So those are the first two steps of the hypothesis testing procedure.

239
00:17:02,220 --> 00:17:04,829
Step one: state our hypothesis statements.

240
00:17:04,829 --> 00:17:07,260
Step two: determine the level of significance.

241
00:17:07,260 --> 00:17:11,520
And once we've done those first two steps, our next step, which we'll look at in the next lecture,

242
00:17:11,520 --> 00:17:14,400
will be calculating our test statistic.