1
00:00:00,090 --> 00:00:07,110
We talked briefly before about this idea that a sample mean, let's say we call it X bar might be a

2
00:00:07,110 --> 00:00:11,140
good or a bad estimate of the associated population mean,

3
00:00:11,160 --> 00:00:14,700
mu. It really just depends on the sample that we take.

4
00:00:14,700 --> 00:00:21,510
We might get lucky and pull a random sample that does a good job representing the population and therefore

5
00:00:21,540 --> 00:00:24,480
that sample mean is close to the population mean.

6
00:00:24,480 --> 00:00:31,590
Or we might get unlucky or just do a bad job and find a non representative sample where the sample mean

7
00:00:31,590 --> 00:00:36,000
does not do a good job estimating that associated population parameter,

8
00:00:36,000 --> 00:00:40,200
the population mean. This is what we call a point estimate.

9
00:00:40,200 --> 00:00:45,780
So the sample mean is a point estimate for the population mean because these are single points.

10
00:00:45,780 --> 00:00:53,040
They're single values. In the same way, sample standard deviation is a point estimate for population

11
00:00:53,040 --> 00:00:54,120
standard deviation.

12
00:00:54,120 --> 00:00:59,520
Now, the benefit of using just a point estimate is that it's pretty easy to do, right?

13
00:00:59,520 --> 00:01:05,400
If this is what we're doing here, we can pull a sample, we calculate sample standard deviation, and

14
00:01:05,400 --> 00:01:11,070
we say, okay, based on the value we found for this standard deviation for this single sample, we

15
00:01:11,070 --> 00:01:16,110
estimate that population standard deviation is equivalent to that sample standard deviation that we

16
00:01:16,110 --> 00:01:16,410
found.

17
00:01:16,410 --> 00:01:17,820
It's pretty easy to do.

18
00:01:17,820 --> 00:01:23,610
But the drawback, of course, is that calculating just that one point estimate doesn't really tell

19
00:01:23,610 --> 00:01:26,940
us how good or bad that estimate really is.

20
00:01:26,940 --> 00:01:29,310
Again, the point estimate could be good.

21
00:01:29,340 --> 00:01:30,870
It could be a bad estimate.

22
00:01:30,870 --> 00:01:32,910
We wouldn't really know either way.

23
00:01:32,910 --> 00:01:35,610
So in contrast to a

24
00:01:36,370 --> 00:01:40,180
point estimate, which doesn't necessarily do a great job,

25
00:01:40,420 --> 00:01:48,550
what we may instead choose to do is calculate an interval estimate, which gives us a range

26
00:01:48,550 --> 00:01:52,450
of values in which the population parameter may lie.

27
00:01:52,480 --> 00:01:55,860
Of course, finding an interval estimate is a little harder to do.

28
00:01:55,870 --> 00:02:00,880
It takes a little more work than the point estimate, but it gives us a lot more information.

29
00:02:00,880 --> 00:02:06,610
So if we use an interval estimate and we touched on this very briefly in a previous lesson, but if

30
00:02:06,610 --> 00:02:12,250
we use an interval estimate, we're able to make statements like "I'm 95% confident that the population

31
00:02:12,250 --> 00:02:14,620
mean lies in the interval A to B."

32
00:02:14,650 --> 00:02:24,100
Now, that confidence we just mentioned, the 95% confidence, is what's called the confidence

33
00:02:24,100 --> 00:02:28,230
level, or we'll call it CL for short.

34
00:02:28,240 --> 00:02:35,650
And when it comes to confidence levels, we will very often use either 90% confidence, 95% confidence

35
00:02:35,650 --> 00:02:38,530
or 99% confidence.

36
00:02:38,530 --> 00:02:41,410
Those are the most commonly used confidence levels.

37
00:02:41,410 --> 00:02:47,200
And what the confidence level really tells us is the probability that an interval estimate will actually

38
00:02:47,200 --> 00:02:49,120
include the population parameter.

39
00:02:49,120 --> 00:02:56,830
So for instance, if we choose a 95% confidence level and using that 95% confidence level, the interval

40
00:02:56,830 --> 00:03:00,790
we calculate is let's say 11 to 15.

41
00:03:00,790 --> 00:03:07,480
Then what we're saying is that there's a 95% chance that the population parameter, we'll say the population

42
00:03:07,480 --> 00:03:11,470
mean, lies somewhere between 11 and 15.

43
00:03:11,470 --> 00:03:16,660
Or another way to put that is that for 95% of the samples that we could take,

44
00:03:16,660 --> 00:03:24,130
so if we were to take 100 samples, then 95% of them would give us an interval where the population

45
00:03:24,130 --> 00:03:26,210
mean lies inside of that interval.

46
00:03:26,230 --> 00:03:31,870
5% of the samples we take will give us an interval that does not contain the population mean.

47
00:03:31,870 --> 00:03:37,030
So let's say that the population mean is actually, we'll say, 13.

48
00:03:37,030 --> 00:03:44,020
What we're saying is that 95% of our samples will give us a confidence interval around 13 like this

49
00:03:44,020 --> 00:03:45,250
one, 11 to 15.

50
00:03:45,250 --> 00:03:53,800
But 5% of our samples might give us an interval like, we'll say, 14 to 18.

51
00:03:53,800 --> 00:04:01,300
And obviously the interval 14 to 18 does not include the population mean, 13. 13 lies outside of the

52
00:04:01,300 --> 00:04:02,960
interval, 14 to 18.
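
The "out of many samples" reading above can be checked with a small simulation. This is only a sketch under assumptions that are not from the lesson: the population is taken to be normal with mean 13 and standard deviation 2, each sample has 25 values, and the population standard deviation is treated as known so that a Z-based interval applies. The function name `coverage` is hypothetical.

```python
import random
from statistics import NormalDist


def coverage(mu=13.0, sigma=2.0, n=25, confidence_level=0.95,
             trials=10_000, seed=0):
    """Fraction of simulated samples whose interval contains mu."""
    rng = random.Random(seed)
    alpha = 1 - confidence_level
    # Z score that leaves alpha / 2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Half-width of the interval (sigma assumed known in this sketch).
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


print(coverage())  # fraction of intervals that captured the mean
```

The printed fraction should land close to 0.95, with the remaining intervals missing the population mean entirely, like the 14 to 18 example.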

53
00:04:02,980 --> 00:04:07,990
Now, before we go further, you might be wondering why wouldn't we always just choose the largest or

54
00:04:07,990 --> 00:04:09,550
highest confidence level?

55
00:04:09,550 --> 00:04:12,940
Don't we always want to be as confident as we possibly can be?

56
00:04:12,970 --> 00:04:14,560
The answer is yes.

57
00:04:14,560 --> 00:04:14,980
Of course,

58
00:04:14,980 --> 00:04:16,930
we want to be as confident as we can be.

59
00:04:16,930 --> 00:04:21,010
But choosing a higher confidence level comes with drawbacks.

60
00:04:21,010 --> 00:04:28,180
For example, in order to achieve this higher confidence level, we may need to take a larger sample.

61
00:04:28,180 --> 00:04:31,510
And taking a larger sample might not always be possible.

62
00:04:31,510 --> 00:04:36,670
So there are other factors we have to balance against our level of confidence. In general,

63
00:04:36,670 --> 00:04:41,770
yes, we want the highest confidence level we can get, but not at the expense of some other important

64
00:04:41,770 --> 00:04:45,490
aspect of our sampling and hypothesis testing procedure.

65
00:04:45,490 --> 00:04:49,450
And so we kind of have to balance those two things against each other, and we'll talk about that in

66
00:04:49,450 --> 00:04:50,950
more detail later on.

67
00:04:51,100 --> 00:04:57,880
Now, this idea of confidence level is directly related to a really important value in statistics,

68
00:04:57,880 --> 00:04:59,800
which is called the alpha value.

69
00:04:59,830 --> 00:05:06,130
We indicate it with the Greek letter alpha, and the alpha value is simply the percentage that remains

70
00:05:06,130 --> 00:05:08,140
outside of this confidence level.

71
00:05:08,140 --> 00:05:13,600
So the alpha value associated with a 90% confidence level is 10%.

72
00:05:13,630 --> 00:05:20,590
The alpha value associated with a 95% confidence level is 5%, and the alpha value associated with a

73
00:05:20,590 --> 00:05:23,530
99% confidence level is 1%.

74
00:05:23,530 --> 00:05:27,340
We also call this the level of significance.

75
00:05:27,340 --> 00:05:33,040
So the level of significance, or, and we'll talk about this later on too, we can also call this the probability

76
00:05:33,040 --> 00:05:36,430
of making a Type I error.

77
00:05:36,430 --> 00:05:44,500
But either way, for now, we'll just recognize that the alpha value is always equal to one minus the

78
00:05:44,500 --> 00:05:45,460
confidence level.

79
00:05:45,460 --> 00:05:52,660
Now we can always visualize the alpha value as the total area under the normal distribution outside

80
00:05:52,660 --> 00:05:54,940
of what's called the confidence interval.

81
00:05:54,940 --> 00:05:57,550
Here, we're going to be talking about the confidence interval for the mean.

82
00:05:57,550 --> 00:05:59,710
We'll get to that confidence interval in a second.

83
00:05:59,710 --> 00:06:05,230
But if we look here at a normal distribution, so this is the standard normal distribution with a

84
00:06:05,230 --> 00:06:08,110
mean of zero and a standard deviation of one.

85
00:06:08,110 --> 00:06:13,150
We can show here as an example, a confidence level of 90%.

86
00:06:13,150 --> 00:06:20,500
What that means is that 90% of the area under this distribution or the area under the curve is contained

87
00:06:20,500 --> 00:06:26,590
within these boundaries that we've set, which means, of course, that 10% of the area under the curve

88
00:06:26,590 --> 00:06:29,380
lies outside of these two boundaries.

89
00:06:29,380 --> 00:06:35,670
So if 10% is the alpha value, it's the area outside of this 90% that we've set,

90
00:06:35,710 --> 00:06:41,530
centered here in the middle, then that means that in this lower tail we're going to have this 5% value,

91
00:06:41,530 --> 00:06:42,910
alpha divided by two.

92
00:06:42,910 --> 00:06:46,650
And in the upper tail, we're going to have alpha divided by two or 5%.

93
00:06:46,720 --> 00:06:54,130
So we take the total 10% that lies outside of this 90%, and we divide it exactly in half to put half

94
00:06:54,130 --> 00:06:58,300
of the alpha value in the lower tail and half of the alpha value in the upper tail.

95
00:06:58,300 --> 00:07:00,280
So everything's evenly distributed here.

96
00:07:00,280 --> 00:07:05,740
The 90% in the middle, half the alpha value in the lower tail, half the alpha value in the upper tail,

97
00:07:05,740 --> 00:07:11,410
and in this case with a 90% confidence level, that 90% goes in the middle, which means we have 5%

98
00:07:11,410 --> 00:07:13,240
on the left and 5% on the right.

99
00:07:13,240 --> 00:07:17,950
What we can say then, because this is the standard normal distribution,

100
00:07:17,980 --> 00:07:26,950
Z, we know that to find this boundary right here between the lower 5% and that middle 90%, we would

101
00:07:26,950 --> 00:07:30,010
need to look up in the body of our Z table

102
00:07:30,010 --> 00:07:32,230
this 5% value here.

103
00:07:32,230 --> 00:07:41,680
If we locate that 5% value in the Z table, we find a Z score of -1.65. To find this boundary right here,

104
00:07:41,680 --> 00:07:48,100
this upper boundary, we have the 5% in this lower tail and then the middle 90%, which means this boundary

105
00:07:48,100 --> 00:07:54,940
right here separates 95% of the area on the left from this last 5% of the area on the right.

106
00:07:55,030 --> 00:08:01,120
So we would look up 95% in the body of our Z table or 0.9500.

107
00:08:01,120 --> 00:08:07,990
And when we find that value in the body, we see that it's associated with the Z score of positive 1.65.

108
00:08:08,020 --> 00:08:14,440
Now, of course, it makes sense here that we get exactly opposite Z values because this middle 90%

109
00:08:14,440 --> 00:08:16,690
is perfectly centered at the mean.

110
00:08:16,690 --> 00:08:18,400
It's symmetric around the mean.

111
00:08:18,400 --> 00:08:24,940
And so the distance from the mean to this lower boundary here is the same as the distance from the mean

112
00:08:24,940 --> 00:08:26,890
to this upper boundary here.

113
00:08:26,890 --> 00:08:29,080
And so we're going to get opposite Z scores.

114
00:08:29,110 --> 00:08:32,380
That being said, now is a good time to mention.

115
00:08:32,380 --> 00:08:39,190
Remember we said that 90%, 95% and 99% are the most commonly used confidence levels.

116
00:08:39,190 --> 00:08:45,730
Well, because we're always centering that percentage of the area in the middle of this standard normal

117
00:08:45,730 --> 00:08:46,600
distribution,

118
00:08:46,600 --> 00:08:52,060
that means we're going to have these little symmetric lower and upper tails on either side of that middle

119
00:08:52,060 --> 00:08:57,100
90%, 95% or 99%, which means we're always going to get opposite Z scores.

120
00:08:57,100 --> 00:09:05,440
So when we choose a 90% confidence level, we will always find the Z scores -1.65 and positive 1.65

121
00:09:05,440 --> 00:09:08,440
that bound that middle 90% of the area.

122
00:09:08,590 --> 00:09:16,660
If we use a 95% confidence level, our Z scores are positive and negative 1.96. And if we use a 99% confidence

123
00:09:16,660 --> 00:09:20,890
level, our Z scores are always positive and negative 2.58.

124
00:09:20,890 --> 00:09:27,340
So because we very commonly use these confidence levels, that means of course that we will very commonly

125
00:09:27,340 --> 00:09:29,200
use these Z scores.

126
00:09:29,200 --> 00:09:37,090
So we'll see these values 1.65, 1.96 and 2.58 come up all the time over and over again as we start

127
00:09:37,090 --> 00:09:39,640
to work through the hypothesis testing process.

128
00:09:39,640 --> 00:09:41,620
So it's worth pointing them out here.
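
The link between confidence level, alpha, and these Z scores can be reproduced without a printed Z table. This is a minimal sketch using Python's standard library; the function name `z_critical` is made up for illustration. It returns slightly more precise values than the table lookups quoted above, which round to 1.65 and 2.58.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Positive Z score bounding the middle `confidence_level` of the area."""
    alpha = 1 - confidence_level          # area outside the interval
    upper_tail = alpha / 2                # half of alpha in each tail
    # Area to the left of the upper boundary is 1 - alpha / 2.
    return NormalDist().inv_cdf(1 - upper_tail)


for cl in (0.90, 0.95, 0.99):
    z = z_critical(cl)
    print(f"CL={cl:.2f}  alpha={1 - cl:.2f}  z=+/-{z:.3f}")
```

The lower boundary is just the negative of the returned value, because the middle area is symmetric around the mean of zero.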

129
00:09:41,620 --> 00:09:47,050
Now, with all that background, let's talk about this main idea here, which is the confidence interval

130
00:09:47,050 --> 00:09:47,710
for the mean.

131
00:09:47,710 --> 00:09:53,620
Going back to this idea of point estimates versus interval estimates, the confidence interval is the

132
00:09:53,620 --> 00:09:55,630
interval estimate that we're talking about.

133
00:09:55,630 --> 00:10:00,610
Now, the formula that we use to find it and we'll break this down is this formula right here.

134
00:10:00,610 --> 00:10:05,650
So this is the interval from A to B, this is the left edge of the interval A or the lower edge.

135
00:10:05,680 --> 00:10:08,200
This is the upper edge or the right edge of the interval,

136
00:10:08,230 --> 00:10:13,900
B. So we have this interval from A on the left side to B on the right side, and it's equal to, you

137
00:10:13,900 --> 00:10:20,500
can see here, the sample mean plus or minus the T score that we find, multiplied by

138
00:10:20,500 --> 00:10:26,380
this value here, which you may recognize as standard error, which is sample standard deviation divided

139
00:10:26,380 --> 00:10:28,240
by the square root of the sample size.

140
00:10:28,240 --> 00:10:34,090
So sometimes you'll see the formula written this way where we have standard error written out as sample

141
00:10:34,090 --> 00:10:36,820
standard deviation divided by square root of sample size.

142
00:10:36,820 --> 00:10:42,100
And sometimes you'll see the formula written this way where we have SE sub x-bar, which indicates standard

143
00:10:42,100 --> 00:10:42,550
error.
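
The on-screen formula is not captured in this transcript. Based on the spoken description, the two equivalent forms are presumably the following, where the symbol names are a reconstruction: x-bar is the sample mean, t is the T score, s is the sample standard deviation, and n is the sample size.

```latex
(a, b) = \bar{x} \pm t \cdot \frac{s}{\sqrt{n}}
\qquad\text{or}\qquad
(a, b) = \bar{x} \pm t \cdot SE_{\bar{x}},
\quad\text{where } SE_{\bar{x}} = \frac{s}{\sqrt{n}}
```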

144
00:10:42,550 --> 00:10:45,340
But either way, here's the first thing we want to point out.

145
00:10:45,340 --> 00:10:52,690
We are showing in our confidence interval formula a T score both here and here in both representations

146
00:10:52,690 --> 00:10:53,650
of the formula.

147
00:10:53,650 --> 00:10:56,260
But up to now we've been talking about Z scores.

148
00:10:56,260 --> 00:11:01,390
We talked about the standard normal distribution here with Z. We talked about these Z scores associated

149
00:11:01,390 --> 00:11:02,950
with each confidence level.

150
00:11:02,950 --> 00:11:10,000
Well, we can use a Z score if we happen to know population standard deviation, but whenever population

151
00:11:10,000 --> 00:11:15,010
standard deviation is unknown, which of course, as you can imagine in the real world, it is almost

152
00:11:15,010 --> 00:11:16,300
always unknown.

153
00:11:16,300 --> 00:11:20,350
Then we need to use a T score instead of a Z score.

154
00:11:20,350 --> 00:11:26,530
But if you did happen to know population standard deviation, then instead of using T here in these

155
00:11:26,530 --> 00:11:28,810
formulas we would use Z

156
00:11:29,630 --> 00:11:30,640
in its place.

157
00:11:30,650 --> 00:11:35,510
But assuming, of course, that population standard deviation is unknown, which is most likely the

158
00:11:35,510 --> 00:11:40,490
case, then we'll need to use a T score and this will be our formula for the confidence interval.

159
00:11:40,520 --> 00:11:46,820
Basically, the idea for using a T score when population standard deviation is unknown is that sample

160
00:11:46,820 --> 00:11:53,780
standard deviation is of course a less reliable predictor of population standard deviation than population

161
00:11:53,780 --> 00:11:55,300
standard deviation itself.

162
00:11:55,310 --> 00:12:03,320
So because s here may not be perfectly accurate, we need to use a more conservative t value instead

163
00:12:03,320 --> 00:12:09,260
of a z value to find the confidence interval. In the same way, even if we did know population standard

164
00:12:09,260 --> 00:12:15,830
deviation, we would still have to use a T score if our sample size was small, because in the same

165
00:12:15,830 --> 00:12:22,340
way we would consider that small sample size to be maybe not so reliable and so we would use a more

166
00:12:22,340 --> 00:12:25,060
conservative t score instead of a Z score.

167
00:12:25,070 --> 00:12:30,950
So just to take a simple example here, so we can show how to calculate the confidence interval, let's

168
00:12:30,950 --> 00:12:38,300
say that the mean exam score of a sample of ten randomly selected students is 86.7.

169
00:12:38,300 --> 00:12:47,240
So we'll say that our sample size is n equals ten, that our sample mean is 86.7.

170
00:12:47,240 --> 00:12:51,770
And we'll say that we calculated a sample standard deviation of

171
00:12:52,650 --> 00:12:54,030
5.72.

172
00:12:54,030 --> 00:13:00,150
And what we want to do now is find a confidence interval for the population mean at a confidence level

173
00:13:00,150 --> 00:13:01,480
of 99%.

174
00:13:01,500 --> 00:13:07,470
So we want to be really confident here in our result and let's see what that looks like.

175
00:13:07,470 --> 00:13:13,680
So because population standard deviation is unknown, and even if we did know it because our sample

176
00:13:13,680 --> 00:13:19,560
size is small, and by small we mean n is less than 30, our sample size is ten,

177
00:13:19,560 --> 00:13:25,620
we will have to use a T score instead of a Z score. Because we know that our sample size is n equals

178
00:13:25,620 --> 00:13:25,950
ten,

179
00:13:25,950 --> 00:13:31,260
that means degrees of freedom is equal to ten minus one, or nine.

180
00:13:31,260 --> 00:13:38,460
And so if we pull up our t table and this is just a section of the T table that we can use, we look

181
00:13:38,460 --> 00:13:41,310
for degrees of freedom of nine.

182
00:13:41,310 --> 00:13:46,440
We see that over here on the left hand side and then we want 99% confidence.

183
00:13:46,440 --> 00:13:53,550
So if we come down along the bottom to 99% confidence, we see 99% confidence here.

184
00:13:53,850 --> 00:13:57,210
We have degrees of freedom of nine.

185
00:13:57,210 --> 00:14:04,410
And so if we find the intersection of that row and column, we get this value here, 3.25.

186
00:14:04,410 --> 00:14:11,280
Now, if we substitute what we know into our confidence interval formula right here, let's say that

187
00:14:11,280 --> 00:14:15,330
we get the sample mean, which we know is 86.7.

188
00:14:15,330 --> 00:14:21,510
So 86.7 plus or minus the T score that we found 3.25.

189
00:14:21,510 --> 00:14:24,180
So 3.25.

190
00:14:24,180 --> 00:14:30,420
And then we multiply that by sample standard deviation, which we calculated from our sample to be 5.72.

191
00:14:30,420 --> 00:14:38,370
So 5.72 divided by the square root of sample size, our sample size is n equals ten.

192
00:14:38,370 --> 00:14:40,500
So we get square root ten.

193
00:14:40,500 --> 00:14:48,510
And if we use a calculator to find this value, what we get here is 86.7 plus or minus

194
00:14:48,510 --> 00:14:58,440
this: 3.25 times 5.72 divided by root ten is approximately 5.88, which means then that our confidence

195
00:14:58,440 --> 00:15:04,200
interval is A to B: on the lower end of the confidence interval,

196
00:15:04,200 --> 00:15:12,780
we have 86.7 minus 5.88, and on the upper end of the confidence interval we have 86.7 plus

197
00:15:13,830 --> 00:15:14,940
5.88.

198
00:15:14,940 --> 00:15:26,370
And the result there is a confidence interval A to B that is equal to on the lower end, 80.82 and on

199
00:15:26,370 --> 00:15:30,810
the upper end 92.58.

200
00:15:30,810 --> 00:15:34,520
And that is how we calculate a confidence interval.
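
The arithmetic in this example can be double-checked with a few lines of code. This is a minimal sketch: the helper name `t_interval` is made up for illustration, and the T score 3.25 is passed in by hand as the table value read in the lesson rather than computed.

```python
from math import sqrt


def t_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) for x_bar +/- t_crit * s / sqrt(n)."""
    standard_error = s / sqrt(n)
    margin = t_crit * standard_error
    return x_bar - margin, x_bar + margin


# Values from the lesson: n = 10, sample mean 86.7, sample sd 5.72,
# and t = 3.25 for 9 degrees of freedom at 99% confidence.
low, high = t_interval(x_bar=86.7, s=5.72, n=10, t_crit=3.25)
print(round(low, 2), round(high, 2))  # 80.82 92.58
```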

201
00:15:34,530 --> 00:15:41,070
What it says here, the conclusion that we're basically making is that we are investigating some population

202
00:15:41,070 --> 00:15:42,090
of students.

203
00:15:42,090 --> 00:15:49,170
We took a random sample of ten of those students and from those ten students we calculated a mean exam

204
00:15:49,170 --> 00:15:50,700
score of 86.7.

205
00:15:50,700 --> 00:15:53,670
So we added up the exam score for each of those ten students.

206
00:15:53,670 --> 00:15:58,350
And for those ten students, we calculated a sample mean of 86.7.

207
00:15:58,350 --> 00:16:01,260
So the mean test score was 86.7.

208
00:16:01,260 --> 00:16:05,910
And from that sample we calculated a sample standard deviation of 5.72.

209
00:16:05,910 --> 00:16:11,280
Now, if we were just using a point estimate like we talked about at the beginning, we could stop there

210
00:16:11,280 --> 00:16:14,490
and say that we expect that the population mean,

211
00:16:14,490 --> 00:16:19,890
so for all of the students, not just these ten, but for all the students in our population, we expect

212
00:16:19,890 --> 00:16:22,890
that the mean exam score is about 86.7.

213
00:16:22,890 --> 00:16:28,920
But again, going back to the original point, this particular sample of ten students might be a really

214
00:16:28,920 --> 00:16:31,620
bad estimate of the larger population.

215
00:16:31,620 --> 00:16:37,170
It might not be very representative, it might do a bad job estimating population mean. It might also

216
00:16:37,170 --> 00:16:40,380
do a really good job, but we don't really know either way.

217
00:16:40,380 --> 00:16:45,960
It's just a quick and dirty way to find an estimate for population mean, but it doesn't say anything

218
00:16:45,960 --> 00:16:49,770
about how confident we are that this is a good estimate or a bad estimate.

219
00:16:49,770 --> 00:16:52,980
So instead we compute this confidence interval.

220
00:16:52,980 --> 00:17:00,000
And because we used a confidence level of 99%, remember we looked up this 99% confidence level in our

221
00:17:00,000 --> 00:17:00,780
t table.

222
00:17:00,780 --> 00:17:05,790
We had to use a T score instead of a Z score because number one, we didn't know the population standard

223
00:17:05,790 --> 00:17:06,480
deviation.

224
00:17:06,480 --> 00:17:09,810
And even if we did know it, we were taking a small sample.

225
00:17:09,810 --> 00:17:11,970
We had a sample size smaller than 30.

226
00:17:11,970 --> 00:17:15,540
So either of those factors would have required us to use a T score.

227
00:17:15,540 --> 00:17:20,520
We had both of them, so we certainly had to use a T score instead of a Z score, but we looked up that

228
00:17:20,520 --> 00:17:24,900
T score and now this confidence interval is an interval estimate.

229
00:17:24,930 --> 00:17:32,040
It tells us that we are 99% certain that the population mean, the mean exam score for the entire group

230
00:17:32,040 --> 00:17:32,520
of students,

231
00:17:32,520 --> 00:17:38,820
the entire population, falls somewhere between 80.82 and 92.58.

232
00:17:38,820 --> 00:17:45,180
And as you can see in a lot of ways, that gives us a lot more information than just this single point

233
00:17:45,180 --> 00:17:51,270
estimate where we calculated a sample mean of 86.7 off of just one sample of only ten students.

234
00:17:51,270 --> 00:17:56,430
Now, remember, before we talked about trade-offs with the confidence level. Here, we chose a 99% confidence

235
00:17:56,430 --> 00:17:58,380
level and we found an interval.

236
00:17:58,380 --> 00:17:59,280
Let's round here.

237
00:17:59,280 --> 00:18:08,160
We found an interval that said that the population mean was probably between about 81 and 93, just

238
00:18:08,160 --> 00:18:11,310
rounding these values to the nearest whole number to make it a little easier to see.

239
00:18:11,310 --> 00:18:16,680
But going back to those trade offs, what happens is if we're willing to give up some confidence, so

240
00:18:16,680 --> 00:18:21,840
maybe instead of choosing 99%, we're instead willing to settle for a lower level of confidence and

241
00:18:21,840 --> 00:18:23,490
choose 90% confidence.

242
00:18:23,490 --> 00:18:29,430
Decreasing our confidence level will actually narrow this confidence interval.

243
00:18:29,430 --> 00:18:35,190
So maybe the confidence interval associated with 90% and I haven't actually calculated it, but maybe

244
00:18:35,190 --> 00:18:37,950
it would be something like 85

245
00:18:38,710 --> 00:18:42,670
to 90 instead of 81 to 93.

246
00:18:42,700 --> 00:18:47,920
In other words, the higher the confidence, the wider the confidence interval, the lower the confidence,

247
00:18:47,920 --> 00:18:49,600
the narrower the confidence interval.

248
00:18:49,600 --> 00:18:55,390
And we like a narrower confidence interval because it really hones in on, or gets us closer to, or gives

249
00:18:55,390 --> 00:19:01,090
us a smaller interval in which our population parameter or population mean likely lies.

250
00:19:01,090 --> 00:19:06,070
But we can't be quite as confident about that narrower interval.
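
The 85 to 90 figure above is only a guess, since the 90% interval is not actually calculated in the lesson. As a rough sketch, it can be computed for the same sample. The 90% T score of 1.833 for 9 degrees of freedom is an assumption taken from a standard t table, not from the lesson, and the helper name `t_interval` is made up for illustration.

```python
from math import sqrt


def t_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) for x_bar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Same sample as the lesson: n = 10, mean 86.7, sd 5.72 (9 degrees of freedom).
low99, high99 = t_interval(86.7, 5.72, 10, t_crit=3.25)   # 99%, from the lesson
low90, high90 = t_interval(86.7, 5.72, 10, t_crit=1.833)  # 90%, assumed table value

print(f"99%: {low99:.2f} to {high99:.2f}, width {high99 - low99:.2f}")
print(f"90%: {low90:.2f} to {high90:.2f}, width {high90 - low90:.2f}")
```

With these inputs the 90% interval comes out to roughly 83.4 to 90.0, which is narrower than the 99% interval, as described.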

251
00:19:06,100 --> 00:19:12,790
To give you another example, if you ask me to find an interval around the mean height of all American

252
00:19:12,790 --> 00:19:18,910
females, so the entire population is every female in America, and we're interested in the mean height

253
00:19:18,910 --> 00:19:27,400
of that population, I might be able to tell you that I am 99% confident that that population mean lies

254
00:19:27,400 --> 00:19:34,390
somewhere between four feet ten inches and let's say six feet three inches tall.

255
00:19:34,420 --> 00:19:39,010
We can be virtually guaranteed that the population mean is somewhere in this interval.

256
00:19:39,010 --> 00:19:40,630
This is a really wide interval.

257
00:19:40,630 --> 00:19:48,490
But if I start to narrow that interval and I maybe find an interval like five feet four inches to five

258
00:19:48,490 --> 00:19:56,110
feet, six inches, well, I really narrowed the interval, but maybe I'm only 90% confident or even

259
00:19:56,110 --> 00:20:01,510
less confident that the mean height of all American females is somewhere in that narrower interval.

260
00:20:01,510 --> 00:20:07,360
So the higher the confidence level, the wider the interval has to be by definition; the narrower the

261
00:20:07,360 --> 00:20:10,780
confidence interval, the lower our confidence level.

262
00:20:10,780 --> 00:20:16,180
So that is how we calculate the confidence interval for the mean of a population.

263
00:20:16,180 --> 00:20:22,120
Next, we'll use all of the information that we've gathered about Z scores and T scores, how to take

264
00:20:22,120 --> 00:20:27,400
samples and calculate confidence intervals, and we will turn toward hypothesis testing and the steps

265
00:20:27,400 --> 00:20:31,510
that we take to investigate a hypothesis from beginning to end.

