1
00:00:00,060 --> 00:00:05,250
Now that we understand the idea of the central limit theorem, we can use it in inferential statistics

2
00:00:05,250 --> 00:00:07,230
to start testing hypotheses.

3
00:00:07,230 --> 00:00:13,680
And the focus of this lesson is going to be the hypothesis testing procedure that we'll follow each time.

4
00:00:13,680 --> 00:00:18,120
And more specifically, we'll look at this first step of our hypothesis testing procedure.

5
00:00:18,120 --> 00:00:23,280
So in statistics, as we've already talked about, the whole idea is that we collect a sample from the

6
00:00:23,280 --> 00:00:28,440
population, we investigate that sample and then we try to make a clear statement about whether or not

7
00:00:28,440 --> 00:00:33,300
that sample supports some original statement that we've made about our population.

8
00:00:33,300 --> 00:00:40,920
So this process is the hypothesis testing process or the process of inferential statistics, because

9
00:00:40,920 --> 00:00:45,840
we're using information about the sample to make inferences about the population.

10
00:00:45,840 --> 00:00:52,980
Now, the first step in this process is to state our hypothesis, which is a statement of expectation

11
00:00:52,980 --> 00:00:59,070
about some population parameter that we develop for the purpose of testing it, and we can hypothesize

12
00:00:59,070 --> 00:01:00,000
about anything.

13
00:01:00,000 --> 00:01:06,990
So our hypothesis might be that the mean order processing time is 17 hours from the time the customer places

14
00:01:06,990 --> 00:01:08,970
the order to the time that we ship it out.

15
00:01:08,970 --> 00:01:14,550
Or we might hypothesize that employees in one department of our company work longer hours than employees

16
00:01:14,550 --> 00:01:15,660
in another department.

17
00:01:15,660 --> 00:01:18,240
Our hypothesis can be anything we want it to be.

18
00:01:18,240 --> 00:01:22,620
It's just that we hypothesize about whatever it is we want to test.

19
00:01:22,620 --> 00:01:28,200
And what we need to realize is that in this hypothesis testing process, in the process of doing inferential

20
00:01:28,200 --> 00:01:33,510
statistics, we actually always create a pair of opposing hypothesis statements.

21
00:01:33,510 --> 00:01:40,020
One of those statements is the null hypothesis and the other is the alternative hypothesis.

22
00:01:40,020 --> 00:01:43,410
We denote the null hypothesis as H sub zero.

23
00:01:43,410 --> 00:01:49,530
Sometimes we call it H naught. We denote the alternative hypothesis as H sub A, which we can remember if we think

24
00:01:49,530 --> 00:01:51,720
of the A as standing for alternative.

25
00:01:51,720 --> 00:01:56,640
Now the alternative hypothesis is the abnormality we're looking for in the data.

26
00:01:56,640 --> 00:01:59,010
It's the significant result that we're hoping to find.

27
00:01:59,010 --> 00:02:04,740
Once we have our alternative hypothesis, then we always state the opposite claim and that opposite

28
00:02:04,740 --> 00:02:06,720
claim is the null hypothesis.

29
00:02:06,720 --> 00:02:12,300
So as an example, let's say we want to test the hypothesis that employees in the billing department

30
00:02:12,300 --> 00:02:14,910
of our company are working more than full time.

31
00:02:14,910 --> 00:02:16,890
They're working more than 40 hours a week.

32
00:02:16,890 --> 00:02:25,770
In that case, our alternative hypothesis would be that the mean number of hours worked is greater than

33
00:02:25,770 --> 00:02:26,580
40.

34
00:02:26,580 --> 00:02:33,120
Therefore, the null hypothesis is the opposite statement, which means that the null hypothesis is

35
00:02:33,120 --> 00:02:37,320
that the employees work less than or equal to 40 hours a week.
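In conventional notation, writing μ for the population mean number of hours worked per week in the billing department (the symbol is our labeling choice, not the lesson's), this pair of hypotheses reads:

```latex
\begin{aligned}
H_0 &: \mu \le 40 \\
H_A &: \mu > 40
\end{aligned}
```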

36
00:02:37,320 --> 00:02:43,890
Now, interestingly, when we think about hypotheses, we always tend to think in terms of the alternative

37
00:02:43,890 --> 00:02:47,220
hypothesis, because that's the thing that we're really trying to test.

38
00:02:47,220 --> 00:02:51,960
We're wondering here, do employees in the billing department work more than 40 hours a week?

39
00:02:51,960 --> 00:02:56,160
So we sort of frame up our thinking in terms of the alternative hypothesis.

40
00:02:56,160 --> 00:03:01,920
But when it comes to this hypothesis testing procedure, what we're actually going to be doing is testing

41
00:03:01,920 --> 00:03:02,970
the null hypothesis.

42
00:03:02,970 --> 00:03:08,670
So our focus going forward is going to be on the null hypothesis, even though when we think about what

43
00:03:08,670 --> 00:03:12,180
it is that we want to test, we think about the alternative hypothesis.

44
00:03:12,180 --> 00:03:13,560
That's not wrong.

45
00:03:13,560 --> 00:03:18,150
It's just that we frame up our thinking in terms of the alternative hypothesis, then state

46
00:03:18,150 --> 00:03:20,820
that the null hypothesis is the opposite.

47
00:03:20,820 --> 00:03:26,910
And then, going forward, we'll focus on the null hypothesis, that opposite statement, and work on testing

48
00:03:26,910 --> 00:03:31,020
that hypothesis instead of the alternative hypothesis

49
00:03:31,020 --> 00:03:34,650
as we work through this hypothesis testing process.

50
00:03:34,650 --> 00:03:40,680
Now when it comes to setting up the null and alternative hypothesis, there are only three ways that

51
00:03:40,680 --> 00:03:42,090
we can set these up.

52
00:03:42,090 --> 00:03:48,030
This second option here is the one we used in our example where we said that employees in the billing

53
00:03:48,030 --> 00:03:50,160
department worked more than 40 hours a week.

54
00:03:50,160 --> 00:03:56,430
The alternative hypothesis has the mean set greater than some value, which means that the null hypothesis

55
00:03:56,430 --> 00:03:57,390
is the opposite

56
00:03:57,390 --> 00:04:01,320
claim: that the mean is less than or equal to that same value.

57
00:04:01,560 --> 00:04:07,920
We can have the opposite situation where we set the alternative hypothesis as the mean less than some

58
00:04:07,920 --> 00:04:13,620
value, in which case the null hypothesis is that the mean is greater than or equal to that same value

59
00:04:13,620 --> 00:04:14,400
that we chose.

60
00:04:14,400 --> 00:04:19,950
And then the remaining option, the first one listed here, is that we choose an alternative hypothesis where the mean is not equal to

61
00:04:19,950 --> 00:04:20,730
some value.

62
00:04:20,730 --> 00:04:25,980
So in this case, maybe we don't have a suspicion that the employees in the billing department work

63
00:04:25,980 --> 00:04:27,420
more than 40 hours a week.

64
00:04:27,420 --> 00:04:31,740
Maybe we just think that they work some number of hours other than 40.

65
00:04:31,740 --> 00:04:36,930
We're not sure if they work fewer than 40 hours or more than 40 hours, but we don't think that they

66
00:04:36,960 --> 00:04:39,030
work exactly 40 hours a week.

67
00:04:39,030 --> 00:04:44,160
Our suspicion, our hypothesis, the thing we want to test is that the employees in the billing department

68
00:04:44,160 --> 00:04:45,900
do not work 40 hours a week.

69
00:04:45,900 --> 00:04:52,500
So our alternative hypothesis would be that the mean is not equal to 40, and therefore the null hypothesis

70
00:04:52,500 --> 00:04:56,220
that goes along with that would be that the mean is equal to 40.
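The three allowed pairs can be summarized in the same notation, assuming μ for the population mean and μ₀ for the constant being tested (40 hours in the running example); both symbols are a labeling convention added here:

```latex
\begin{array}{lll}
\text{Type} & H_0 & H_A \\
\hline
\text{non-directional} & \mu = \mu_0 & \mu \ne \mu_0 \\
\text{directional (greater)} & \mu \le \mu_0 & \mu > \mu_0 \\
\text{directional (less)} & \mu \ge \mu_0 & \mu < \mu_0
\end{array}
```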

71
00:04:56,220 --> 00:04:59,760
So again, in this example we said that the billing department works

72
00:04:59,850 --> 00:05:05,160
more than 40 hours a week, which means that we described a relationship between the mean and some

73
00:05:05,160 --> 00:05:06,180
constant value.

74
00:05:06,210 --> 00:05:10,650
We can also set up hypothesis statements that compare two groups.

75
00:05:10,650 --> 00:05:16,770
So if we think that the billing department works more hours than the payroll department, we could say

76
00:05:16,770 --> 00:05:17,640
something like this.

77
00:05:17,640 --> 00:05:25,170
So the mean for the billing department is greater than the mean for the payroll department, in which

78
00:05:25,170 --> 00:05:30,660
case the null hypothesis is the opposite statement that the mean for the billing department is less

79
00:05:30,660 --> 00:05:34,110
than or equal to the mean for the payroll department.
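For the two-group comparison, using μ_billing and μ_payroll as illustrative labels for the two department means, the pair becomes:

```latex
\begin{aligned}
H_0 &: \mu_{\text{billing}} \le \mu_{\text{payroll}} \\
H_A &: \mu_{\text{billing}} > \mu_{\text{payroll}}
\end{aligned}
```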

80
00:05:34,110 --> 00:05:41,190
And we could compare means like this with all three of these different statement types, these different

81
00:05:41,190 --> 00:05:43,170
pairs of hypothesis statements.

82
00:05:43,170 --> 00:05:50,310
But that being said, it's important for us to say that we should only ever use these second and third

83
00:05:50,310 --> 00:05:56,280
types here when we have good reason to believe that there's this kind of directionality that we're indicating

84
00:05:56,280 --> 00:05:56,730
here.

85
00:05:56,730 --> 00:06:01,680
In other words, we only want to set up these kinds of directional hypothesis statements, where we use

86
00:06:01,680 --> 00:06:06,660
either a greater-than or less-than symbol in the alternative hypothesis, when we have a pretty strong

87
00:06:06,660 --> 00:06:12,930
suspicion that that is the directionality that actually exists in the population. To continue on with

88
00:06:12,930 --> 00:06:18,780
this example we've been using here, maybe we have data from previous pay stubs for the employees in

89
00:06:18,780 --> 00:06:22,950
our billing department to indicate that they do tend to work more than 40 hours a week.

90
00:06:22,950 --> 00:06:26,280
And what we're doing here is just testing the current pay period.

91
00:06:26,280 --> 00:06:31,500
We already have some kind of prior knowledge, some strong suspicion that they are, in fact, working

92
00:06:31,500 --> 00:06:32,370
more than 40 hours.

93
00:06:32,370 --> 00:06:34,680
And we're looking to test that here.

94
00:06:34,680 --> 00:06:40,890
So we need to have some kind of suspicion for some good reason if we're going to use either of these

95
00:06:40,890 --> 00:06:42,720
sets of hypothesis statements.

96
00:06:42,720 --> 00:06:49,290
Otherwise, we really need to use this first one because this first pair of hypothesis statements has

97
00:06:49,290 --> 00:06:51,270
no directionality to it at all.

98
00:06:51,270 --> 00:06:56,490
And so it's sort of a more conservative approach to take. What we're saying here with this alternative

99
00:06:56,490 --> 00:07:02,100
hypothesis that the mean is not equal to some value is that we don't know if the mean is less than that

100
00:07:02,100 --> 00:07:03,960
value or greater than that value.

101
00:07:03,960 --> 00:07:07,140
We don't really have any suspicion at all about the directionality.

102
00:07:07,140 --> 00:07:08,430
We're not sure which way it goes.

103
00:07:08,430 --> 00:07:12,330
We're just saying maybe the mean is not equal to that particular value.

104
00:07:12,330 --> 00:07:17,370
Whereas with these two sets of hypothesis statements, we're being more specific, we're indicating

105
00:07:17,370 --> 00:07:21,780
directionality, we're saying no, we think the mean is greater than some value, not less than the

106
00:07:21,780 --> 00:07:23,970
value, not equal to the value, but greater than the value.

107
00:07:23,970 --> 00:07:26,820
Or here the mean is less than that value.

108
00:07:26,850 --> 00:07:29,700
We're sort of presupposing that directionality.

109
00:07:29,700 --> 00:07:34,320
And so we really need to have some kind of a good reason to think that this is the case

110
00:07:34,320 --> 00:07:38,970
if we're going to use either one of these pairs of hypothesis statements.

111
00:07:38,970 --> 00:07:40,950
Otherwise we should use this first one.

112
00:07:40,950 --> 00:07:42,870
It's just more conservative.

113
00:07:42,870 --> 00:07:49,260
It doesn't presuppose too much when we don't really have any evidence for a particular direction.

114
00:07:49,260 --> 00:07:56,640
So to summarize here, the first step in our hypothesis testing procedure is to state the null and

115
00:07:56,640 --> 00:07:58,050
alternative hypotheses.

116
00:07:58,080 --> 00:08:02,880
Remember that we'll frame up our thinking around the alternative hypothesis, so we'll sort of think

117
00:08:02,880 --> 00:08:04,350
about what we want to test.

118
00:08:04,350 --> 00:08:10,920
That'll be our alternative hypothesis, and then the null hypothesis will be the opposite of that statement.

119
00:08:11,040 --> 00:08:17,280
But that being said, the alternative hypothesis always needs to state that the mean is not equal to some

120
00:08:17,280 --> 00:08:20,040
value, greater than a value, or less than a value.

121
00:08:20,070 --> 00:08:26,160
We can't say in the alternative hypothesis that the mean is equal to some value, less than or equal to it,

122
00:08:26,160 --> 00:08:27,630
or greater than or equal to it.

123
00:08:27,630 --> 00:08:33,929
We need to stick with these symbols in the alternative hypothesis, which means that the null hypothesis

124
00:08:33,929 --> 00:08:39,659
will be the opposite statement, and therefore we'll only ever see these other symbols in the null hypothesis.

125
00:08:39,659 --> 00:08:45,660
So we'll save equal to, less than or equal to, and greater than or equal to for the null hypothesis statement.

126
00:08:45,660 --> 00:08:48,780
So we frame up our thinking around the alternative hypothesis.

127
00:08:48,780 --> 00:08:54,270
We state it as the mean being not equal to some value, greater than some value, or less than some value.

128
00:08:54,270 --> 00:08:59,730
Then we make the null hypothesis the opposite statement with the opposite sign, which means that we

129
00:08:59,730 --> 00:09:05,970
will end up with either this first pair here of hypothesis statements, this second pair, or this third

130
00:09:05,970 --> 00:09:06,450
pair.
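The rules above (only ≠, >, or < in the alternative; the null takes the complementary sign; default to the non-directional pair without a good reason for a direction) can be captured in a small helper. This is a minimal sketch; the function name, the operator strings, and the output format are illustrative assumptions, not a standard API.

```python
# A minimal sketch: derive the null hypothesis from the alternative's operator.
# The names and string formats here are hypothetical, chosen for illustration.

# Only these three operators may appear in the alternative hypothesis;
# each maps to the complementary operator used in the null hypothesis.
NULL_FOR_ALTERNATIVE = {
    "!=": "==",
    ">": "<=",
    "<": ">=",
}


def hypothesis_pair(value, alternative_op="!="):
    """Return (H0, HA) strings for a hypothesis about a population mean.

    Defaults to the non-directional alternative ("!="), the conservative
    choice when there is no good prior reason to expect a direction.
    """
    if alternative_op not in NULL_FOR_ALTERNATIVE:
        # "==", "<=" and ">=" belong in the null hypothesis, never the alternative.
        raise ValueError(
            f"alternative must use one of {sorted(NULL_FOR_ALTERNATIVE)}, "
            f"got {alternative_op!r}"
        )
    null_op = NULL_FOR_ALTERNATIVE[alternative_op]
    return f"H0: mean {null_op} {value}", f"HA: mean {alternative_op} {value}"


# Billing example: prior pay stubs suggest more than 40 hours a week.
print(hypothesis_pair(40, ">"))  # ('H0: mean <= 40', 'HA: mean > 40')
# No directional suspicion: fall back to the two-sided pair.
print(hypothesis_pair(40))  # ('H0: mean == 40', 'HA: mean != 40')
```

If the test is later run with SciPy, `scipy.stats.ttest_1samp` (in SciPy 1.6 and later) exposes the same three choices through its `alternative` argument: `'two-sided'`, `'greater'`, and `'less'`.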

131
00:09:06,450 --> 00:09:12,540
And once we have our pair of hypothesis statements, we can move on to step two, which is about determining

132
00:09:12,540 --> 00:09:13,830
the level of significance.

133
00:09:13,830 --> 00:09:19,800
But keep in mind that as we're going through this hypothesis testing procedure, it's important to always

134
00:09:19,800 --> 00:09:27,780
remember that the goal here of this whole process is to provide support for the claim that's being made

135
00:09:27,780 --> 00:09:30,150
by our alternative hypothesis.

136
00:09:30,150 --> 00:09:34,860
The goal is not to prove that our alternative hypothesis is true.

137
00:09:34,860 --> 00:09:41,430
This hypothesis testing procedure can never actually prove the statement being made in our alternative

138
00:09:41,430 --> 00:09:42,060
hypothesis.

139
00:09:42,060 --> 00:09:46,560
We can never prove either of these hypothesis statements to be wrong or right.

140
00:09:46,560 --> 00:09:50,550
All we can do is lend support or, thought about the other way,

141
00:09:50,550 --> 00:09:53,760
fail to lend support to the statement that we're trying to make.

142
00:09:53,760 --> 00:09:59,580
So if we go through this whole process, the best we can do is find support for, or fail to

143
00:09:59,620 --> 00:10:02,380
find support for, the alternative hypothesis.

144
00:10:02,380 --> 00:10:08,470
And the way that that's going to manifest itself in the end is with sort of this inverted language,

145
00:10:08,470 --> 00:10:13,780
which can be a little confusing, but we will either reject the null hypothesis, and you can imagine

146
00:10:13,780 --> 00:10:19,180
that if we reject the null hypothesis, that's going to give support to the alternative hypothesis since

147
00:10:19,180 --> 00:10:25,030
these are opposite statements, or if this process doesn't give us enough evidence to reject the null,

148
00:10:25,060 --> 00:10:28,870
then the conclusion we'll make is that we failed to reject the null.

149
00:10:28,870 --> 00:10:31,600
We didn't have enough evidence to reject the null.

150
00:10:31,600 --> 00:10:34,120
We couldn't say that the null was wrong.

151
00:10:34,120 --> 00:10:39,640
And if we can't say that the null is wrong, that means that we don't really have any support to throw

152
00:10:39,640 --> 00:10:41,530
behind the alternative hypothesis,

153
00:10:41,530 --> 00:10:42,790
our original statement.

154
00:10:42,790 --> 00:10:48,550
So just remember that this whole process is about support only, not proof, and that what we're going

155
00:10:48,550 --> 00:10:54,910
to do is break down each of these steps to eventually get to the conclusion where we try to reject the

156
00:10:54,910 --> 00:11:01,090
null in order to lend support to the alternative or fail to reject the null, which means that we've

157
00:11:01,090 --> 00:11:03,910
failed to support the alternative.
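The inverted language of the final conclusion can be sketched as a tiny function. Here `enough_evidence` is a hypothetical stand-in for the outcome of the later steps of the procedure, which this lesson does not cover yet; note that neither branch claims to prove anything.

```python
# A minimal sketch of how the conclusion is worded.
# `enough_evidence` is a hypothetical placeholder for the result of later steps.


def state_conclusion(enough_evidence: bool) -> str:
    if enough_evidence:
        # Rejecting H0 lends support to HA, because the two statements are opposites.
        return "Reject H0: the data support the alternative hypothesis."
    # Not rejecting H0 does not show H0 is true; it only means HA lacks support.
    return "Fail to reject H0: the data do not support the alternative hypothesis."


print(state_conclusion(True))
print(state_conclusion(False))
```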

