1
00:00:00,090 --> 00:00:05,670
So we've said that the standard normal distribution is the Z distribution, and that we can use the

2
00:00:05,670 --> 00:00:10,110
Z table to look up probabilities associated with that Z distribution.

3
00:00:10,140 --> 00:00:14,190
Now we want to talk about what's called the Student's T distribution.

4
00:00:14,190 --> 00:00:20,040
So it is also a probability distribution, like the standard normal distribution represented by Z.

5
00:00:20,070 --> 00:00:22,270
So we're talking about the Student's T distribution.

6
00:00:22,290 --> 00:00:28,680
It is also associated with its own T table in the same way that the Z distribution is associated with

7
00:00:28,680 --> 00:00:29,760
the Z table.

8
00:00:29,760 --> 00:00:35,250
And of course the T table gives us probabilities associated with this T distribution.

9
00:00:35,280 --> 00:00:41,640
Now the T distribution is similar to the Z distribution in the sense that it is symmetrical, it's bell

10
00:00:41,640 --> 00:00:45,160
shaped and it's centered around this mean of zero.

11
00:00:45,180 --> 00:00:51,690
The mean is always zero, but the T distribution is flatter and wider than the standard normal distribution,

12
00:00:51,690 --> 00:00:58,320
which means that more of the area under the T distribution is pushed out toward the tails, whereas

13
00:00:58,320 --> 00:01:02,320
more of the area in the Z distribution is collected around the mean.

14
00:01:02,340 --> 00:01:07,920
What that means, of course, is that the standard deviation of the T distribution is larger than the

15
00:01:07,920 --> 00:01:11,940
standard deviation of the Z distribution or the standard normal distribution.

16
00:01:11,970 --> 00:01:18,750
Now, the reason that we have this T distribution is because if we're ever using a small enough sample,

17
00:01:18,780 --> 00:01:24,450
the Z distribution becomes unreliable, as we've seen before, and we can kind of intuitively think

18
00:01:24,450 --> 00:01:25,170
about this.

19
00:01:25,170 --> 00:01:30,960
The smaller our sample size, the less reliably that sample represents the population.

20
00:01:30,960 --> 00:01:37,200
And so because the sample is less reliable, of course, the standard deviation is going to be larger,

21
00:01:37,200 --> 00:01:42,050
since we can't be as certain that we'll find a value around the mean.

22
00:01:42,060 --> 00:01:47,100
In other words, a smaller sample means less certainty, which means a larger standard deviation, which

23
00:01:47,100 --> 00:01:51,030
means more area is pushed out toward the tails of the distribution.

24
00:01:51,060 --> 00:01:57,030
Now, it turns out that when we're taking small samples, like a sample size of ten or a sample size

25
00:01:57,030 --> 00:02:02,850
of 15, that sample size is too small to reliably use the Z distribution.

26
00:02:02,850 --> 00:02:05,220
So instead we'll use the T distribution.

27
00:02:05,220 --> 00:02:12,240
But if we keep increasing that sample size, the magic number is a sample size of about 30.

28
00:02:12,270 --> 00:02:19,950
A sample size of 30, at which point this T distribution matches almost exactly the standard normal

29
00:02:19,950 --> 00:02:20,780
distribution.

30
00:02:20,790 --> 00:02:26,070
In other words, you can almost imagine here with this T distribution, the smaller the sample is,

31
00:02:26,070 --> 00:02:31,740
as we said before, the larger the standard deviation, which means the flatter this blue curve becomes.

32
00:02:31,740 --> 00:02:38,130
So the tip of the blue curve, the tip of the T distribution moves down like this and the curve gets

33
00:02:38,130 --> 00:02:41,310
flatter as our sample size gets smaller.

34
00:02:41,310 --> 00:02:48,510
But as the sample size gets larger, the peak of this blue curve increases, it moves up, which pulls

35
00:02:48,510 --> 00:02:54,270
in the area from the left and right sides, bringing it closer in toward the mean and pulling area

36
00:02:54,270 --> 00:02:57,900
away from the tails and in toward the center of the distribution.

37
00:02:57,900 --> 00:03:06,480
And so the top of the distribution starts to rise, and it's at this sample size, N equals 30, where the

38
00:03:06,480 --> 00:03:13,380
T distribution has risen enough to match almost exactly the standard normal distribution, the Z distribution.

39
00:03:13,380 --> 00:03:20,130
And so what we can say is that for sample sizes of 30 or greater, in other words, N greater than

40
00:03:20,130 --> 00:03:26,730
or equal to 30, the values we would get from the T distribution and Z distribution are virtually identical.

41
00:03:26,730 --> 00:03:33,360
And so at that point we can revert to our Z distribution and use Z scores looking up values in the Z

42
00:03:33,360 --> 00:03:36,540
table, but for samples that are smaller than 30.

43
00:03:36,540 --> 00:03:42,960
So when N is less than 30, that's when the T distribution is going to give us more accurate values.

44
00:03:42,960 --> 00:03:46,860
And so we want to find T scores and look them up in the T table.

45
00:03:46,860 --> 00:03:53,970
So we'll use values of T when N is less than 30, we'll use values of Z when N is greater than or equal

46
00:03:53,970 --> 00:03:54,940
to 30.

47
00:03:54,960 --> 00:04:01,200
Now, to go back to this point of the T distribution, we said that the peak of this distribution,

48
00:04:01,200 --> 00:04:08,190
the height of it increases or moves up as sample size increases and moves down as sample size decreases.

49
00:04:08,220 --> 00:04:13,710
Effectively, what we're saying there is that the shape of the T distribution changes based on the sample

50
00:04:13,710 --> 00:04:20,910
size and more specifically, we say that it changes based on what we call the degrees of freedom of

51
00:04:20,910 --> 00:04:28,440
the sample, and the degrees of freedom is equal to n minus one, where n is the sample size.

52
00:04:28,440 --> 00:04:32,880
So if our sample size is 20, then we'll have 19 degrees of freedom.

53
00:04:32,880 --> 00:04:36,630
If our sample size is ten, we'll have nine degrees of freedom.
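To make that concrete, here's a small sketch of the degrees of freedom calculation. It also leans on one fact we haven't derived in this lesson, so treat it as an assumption here: for more than two degrees of freedom, the standard deviation of the T distribution is the square root of df / (df - 2), which is always a bit bigger than the 1 we get for Z. The function names are just ours, for illustration.

```python
import math


def degrees_of_freedom(sample_size: int) -> int:
    """Degrees of freedom for a single sample: n - 1."""
    if sample_size < 2:
        raise ValueError("need at least two observations")
    return sample_size - 1


def t_std_dev(df: int) -> float:
    """Standard deviation of a T distribution (assumed formula, valid for df > 2)."""
    if df <= 2:
        raise ValueError("standard deviation is not finite for df <= 2")
    return math.sqrt(df / (df - 2))


# Smaller samples give fewer degrees of freedom and a wider T distribution.
for n in (10, 20, 31):
    df = degrees_of_freedom(n)
    print(f"n={n}, df={df}, t std dev={t_std_dev(df):.3f} (Z std dev is 1)")
```

As the sample size grows, that standard deviation shrinks toward 1, which is the flattening and rising of the curve we described.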

54
00:04:36,630 --> 00:04:42,150
Keep in mind that sample size isn't the only condition that determines when we should use a T score

55
00:04:42,150 --> 00:04:43,290
versus a Z score.

56
00:04:43,290 --> 00:04:49,380
For instance, whenever population standard deviation is unknown, we want to use a T score instead

57
00:04:49,380 --> 00:04:50,250
of a Z score.
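Putting the two conditions we've mentioned so far together, the decision could be sketched like this. It's only a rule of thumb built from what we've said in this lesson (N less than 30, or population standard deviation unknown, means T), and the function name and arguments are our own.

```python
def choose_score(sample_size: int, population_sd_known: bool) -> str:
    """Return 't' or 'z' using the rule of thumb from this lesson."""
    if not population_sd_known:
        # Unknown population standard deviation: use a T score.
        return "t"
    # Known population standard deviation: the sample size decides.
    return "z" if sample_size >= 30 else "t"


print(choose_score(15, population_sd_known=True))   # small sample
print(choose_score(50, population_sd_known=True))   # large sample, sd known
print(choose_score(50, population_sd_known=False))  # sd unknown
```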

58
00:04:50,250 --> 00:04:56,940
So we'll be using the T score and the T table very often, which is why it's critical that we look at

59
00:04:56,940 --> 00:04:59,910
it here along with the Z table before we move

60
00:04:59,980 --> 00:05:00,490
forward.

61
00:05:00,520 --> 00:05:06,180
So when it comes to looking up values in the T table, here is our T table.

62
00:05:06,190 --> 00:05:13,600
Notice that along this left hand side here, down this first column, we show degrees of freedom, starting

63
00:05:13,600 --> 00:05:15,750
at one all the way up to 30.

64
00:05:15,760 --> 00:05:21,940
Now, you can find T tables online for larger degrees of freedom.

65
00:05:21,940 --> 00:05:27,820
But the reason that the standard T table stops here is because, remember, this is our magic number,

66
00:05:27,820 --> 00:05:34,210
at which point the Z table can take over because the values from the T and Z tables are almost identical

67
00:05:34,210 --> 00:05:38,710
for larger sample sizes, specifically sample sizes greater than or equal to 30.

68
00:05:38,710 --> 00:05:43,510
So in our standard T table, we just list degrees of freedom from 1 to 30.

69
00:05:43,510 --> 00:05:49,810
So looking up values in the T table requires us to know the degrees of freedom, which again is given

70
00:05:49,810 --> 00:05:56,470
by n minus one where n is the sample size and to know either upper tail probability which we see here

71
00:05:56,470 --> 00:06:02,390
across the top of the T table, or confidence level, which we see here along the bottom of the table.

72
00:06:02,410 --> 00:06:06,700
Now, we'll talk more later about upper tail probability and confidence level.

73
00:06:06,730 --> 00:06:11,410
All we want to focus on right now is just how to locate values in the T table.

74
00:06:11,410 --> 00:06:13,510
So we understand that at minimum.

75
00:06:13,510 --> 00:06:18,490
And then later on, we'll dive into these two concepts further so that we can understand them, pair

76
00:06:18,490 --> 00:06:23,350
them with degrees of freedom, and find the correct value within the body of the T table.

77
00:06:23,350 --> 00:06:29,860
So when it comes to just locating a value in the T table, we look for the intersection in the body

78
00:06:29,860 --> 00:06:34,660
of the table between degrees of freedom and either upper tail probability or confidence level.

79
00:06:34,660 --> 00:06:40,000
So let's say we know that our upper tail probability is 0.15 or 15%.

80
00:06:40,060 --> 00:06:43,090
That puts us in this third column here.

81
00:06:43,090 --> 00:06:45,940
And let's say we know that we have six degrees of freedom.

82
00:06:46,120 --> 00:06:51,460
Well, of course we would come down here to six degrees of freedom over to this third column and we

83
00:06:51,460 --> 00:06:57,970
would locate this value right here, 1.134 as the intersection of six degrees of freedom, meaning a

84
00:06:57,970 --> 00:07:06,370
sample size of seven, and 0.15 upper tail probability, or 15% of the area under the T distribution in

85
00:07:06,370 --> 00:07:07,390
the upper tail.
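If we didn't have the table in front of us, we could approximate the same lookup numerically. This is only a rough sketch, not how the table was actually built: it integrates the T density with Simpson's rule and then searches by bisection for the T value that leaves the requested area in the upper tail. All of the function names are ours.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of the T distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail_area(t: float, df: int) -> float:
    """P(T > t) for t >= 0: one half minus the area between 0 and t."""
    steps = 2 * max(100, int(50 * t))  # even number of Simpson intervals
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(upper_tail: float, df: int) -> float:
    """T value that leaves `upper_tail` of the area to its right."""
    if not 0 < upper_tail < 0.5:
        raise ValueError("upper_tail must be between 0 and 0.5")
    lo, hi = 0.0, 1.0
    while upper_tail_area(hi, df) > upper_tail:  # widen until bracketed
        lo, hi = hi, hi * 2
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if upper_tail_area(mid, df) > upper_tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Six degrees of freedom, 0.15 upper tail probability, as in the example.
print(round(t_critical(0.15, 6), 3))
```

For six degrees of freedom and 0.15 upper tail probability, this should land on roughly the same 1.134 we read from the body of the table.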

86
00:07:07,390 --> 00:07:12,850
Now, we said that we could use upper tail probability or confidence level, and the reason is because

87
00:07:12,880 --> 00:07:17,680
upper tail probability and confidence level are kind of two different ways of saying the same thing

88
00:07:17,680 --> 00:07:19,810
or describing the same scenario.

89
00:07:19,810 --> 00:07:25,420
So an upper tail probability of 0.15, if we come all the way down to the bottom of the column here,

90
00:07:25,420 --> 00:07:33,520
we see this 70% figure for confidence level, a confidence level of 70% and an upper tail probability

91
00:07:33,850 --> 00:07:36,790
of 0.15 mean the same thing.

92
00:07:36,790 --> 00:07:42,250
What you can notice if you compare all of the confidence levels with all of the upper tail probabilities,

93
00:07:42,370 --> 00:07:46,240
is that if we look here at confidence level, let's say we have 70% here.

94
00:07:46,240 --> 00:07:50,740
If we subtract 70% from 100%, we get 30%.

95
00:07:50,740 --> 00:07:54,490
If we divide that value by two, we get 15%.

96
00:07:54,490 --> 00:08:00,670
And we see that that is the value here for upper tail probability 0.15 and that holds across the board.

97
00:08:00,670 --> 00:08:07,930
So if we go over here to a 99% confidence level, if we subtract that from 100%, we get 1%.

98
00:08:07,960 --> 00:08:11,560
If we divide 1% by two, we get half a percent.

99
00:08:11,560 --> 00:08:16,900
And if we come up to the top of the column here, we can see that that upper tail probability is half

100
00:08:16,900 --> 00:08:17,800
of 1%.
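That subtract-and-halve relationship is easy to write down as a pair of small helpers. This is just a sketch of the arithmetic we walked through, with levels written as decimals (0.70 for 70%); the function names are our own.

```python
def upper_tail_from_confidence(confidence_level: float) -> float:
    """Subtract the confidence level from 1, then split what's left in two."""
    return (1.0 - confidence_level) / 2.0


def confidence_from_upper_tail(upper_tail: float) -> float:
    """Go the other way: remove both tails from the total area of 1."""
    return 1.0 - 2.0 * upper_tail


for level in (0.70, 0.90, 0.95, 0.99):
    tail = upper_tail_from_confidence(level)
    print(f"confidence {level:.0%} -> upper tail {tail:.3f}")
```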

101
00:08:17,800 --> 00:08:24,700
So really, we're looking up the same thing, whether we look up 0.15 upper tail probability or 70%

102
00:08:24,700 --> 00:08:26,050
for the confidence level.

103
00:08:26,050 --> 00:08:30,430
But sometimes we'll have confidence level and sometimes we'll have upper tail probability.

104
00:08:30,430 --> 00:08:35,650
And so we include both of those in the T table so that we can quickly find whatever value it is that

105
00:08:35,650 --> 00:08:40,419
we're working with and then look up the corresponding value from the body of the T table.

106
00:08:40,419 --> 00:08:45,130
Now, the only other thing we really want to say at this point is that, of course all of these values

107
00:08:45,130 --> 00:08:53,740
are important and useful, but the values that we will use by far the most often are given by 90%,

108
00:08:53,740 --> 00:08:59,350
95%, or 99% confidence.

109
00:08:59,350 --> 00:09:08,350
And of course, those are associated with upper tail probabilities of 5%, 2.5% and half of a percent,

110
00:09:09,140 --> 00:09:10,160
respectively.

111
00:09:10,190 --> 00:09:15,020
Again, like we said earlier, we'll talk about these things later, but it is very common for us to

112
00:09:15,020 --> 00:09:19,870
choose a confidence level of 90%, 95% or 99%.

113
00:09:19,880 --> 00:09:25,580
Those are common confidence levels to work with, sort of think about them as industry standards.

114
00:09:25,580 --> 00:09:31,430
And so these are the values we'll see the most, which means that we will very often find ourselves

115
00:09:31,580 --> 00:09:39,890
within these columns of the T table, not that we won't use the other columns ever, but these will

116
00:09:39,890 --> 00:09:43,600
probably be the columns we'll use most often.

117
00:09:43,610 --> 00:09:47,120
And so we just really want to be aware of these things.

118
00:09:47,120 --> 00:09:51,740
In particular, 0.05 upper tail probability is equal to a 90% confidence level.

119
00:09:51,830 --> 00:09:59,570
0.025 upper tail probability is the same thing as a 95% confidence level, and 0.005 upper tail probability

120
00:09:59,570 --> 00:10:02,300
is equivalent to a 99% confidence level.

121
00:10:02,300 --> 00:10:08,060
If we can be very comfortable remembering these relationships, then that'll help us a lot moving forward.

122
00:10:08,060 --> 00:10:10,250
So that's it for the Student's T distribution.

123
00:10:10,250 --> 00:10:16,220
We just wanted to introduce this idea that the T distribution is a bell-shaped curve, symmetric, centered

124
00:10:16,220 --> 00:10:22,010
at the mean of zero, just like the standard normal distribution, but that the exact shape of the T

125
00:10:22,010 --> 00:10:25,730
distribution will change depending on sample size.

126
00:10:25,730 --> 00:10:31,280
And we will rely heavily on the T distribution, especially when we're taking small samples or when

127
00:10:31,280 --> 00:10:36,200
the population standard deviation is unknown, which it very, very commonly is.

128
00:10:36,200 --> 00:10:40,340
So we'll rely on the T distribution and therefore the T table very often.

129
00:10:40,340 --> 00:10:45,680
But now that we understand the difference between the standard normal distribution and the T distribution,

130
00:10:45,680 --> 00:10:51,740
going forward we'll have the foundation we need to start investigating hypotheses and making statements

131
00:10:51,740 --> 00:10:58,580
about our level of confidence surrounding the values of the population parameters, specifically the

132
00:10:58,580 --> 00:11:01,340
actual value of the population mean.