1
00:00:00,120 --> 00:00:07,020
So we left off talking about the standard normal distribution and this formula here for Z, where we

2
00:00:07,020 --> 00:00:14,550
say that Z is equal to X minus mu, divided by sigma, where X represents the random variable that we

3
00:00:14,550 --> 00:00:20,130
started with, Z is the random variable specifically associated with the standard normal distribution,

4
00:00:20,130 --> 00:00:23,040
mu is the mean, and sigma is the standard deviation.

5
00:00:23,040 --> 00:00:29,610
And the point that we want to make now is specifically that if our random variable x is normally distributed,

6
00:00:29,610 --> 00:00:36,630
we can always use this formula to convert that normal distribution into the standard normal distribution

7
00:00:36,630 --> 00:00:38,070
associated with Z.

8
00:00:38,070 --> 00:00:43,980
So if we pull back up just a normal distribution, we said that sometimes we have normal distributions

9
00:00:43,980 --> 00:00:49,140
like this with this bell shape, but we can have normal distributions that are a little shorter and

10
00:00:49,140 --> 00:00:49,770
wider.

11
00:00:49,770 --> 00:00:54,270
So the distribution might be shorter than this one, but we can also have normal distributions that

12
00:00:54,270 --> 00:00:57,480
are taller and narrower than this one.

13
00:00:57,480 --> 00:01:02,880
They're all normally distributed, but they just have different measures of spread, they have different

14
00:01:02,880 --> 00:01:03,990
standard deviations.

15
00:01:03,990 --> 00:01:09,360
They're still normal, but some are more spread out and some are less spread out, whereas the standard

16
00:01:09,360 --> 00:01:15,480
normal distribution Z always has that same mean of zero and standard deviation of one.

17
00:01:15,480 --> 00:01:20,310
So that measure of spread for the standard normal distribution is always the same.

18
00:01:20,310 --> 00:01:25,920
Standard deviation is one, whereas we can have infinitely many normal distributions represented by

19
00:01:25,920 --> 00:01:29,910
X with different measures of spread with different standard deviations.

20
00:01:29,910 --> 00:01:36,240
The point here is that regardless of the normal distribution that we have, regardless of the distribution

21
00:01:36,240 --> 00:01:42,960
of X with its own mean and standard deviation, we want to use this formula, which, if you remember

22
00:01:42,960 --> 00:01:51,030
from the last lecture, uses a shift of mu and a scaling of sigma to change the random variable x into

23
00:01:51,030 --> 00:01:52,410
the random variable z.

24
00:01:52,410 --> 00:01:57,270
We always want to use this formula to convert all of the values that make up the data

25
00:01:57,270 --> 00:02:05,130
set X into their corresponding values in the data set Z, and we can see this formula working in action.

26
00:02:05,130 --> 00:02:11,070
Remember that throughout the last few lectures we've been looking at a normal distribution that has

27
00:02:11,070 --> 00:02:15,870
a mean of five and a standard deviation of three.

28
00:02:15,870 --> 00:02:21,450
This normal distribution is obviously not the standard normal distribution, because the standard normal

29
00:02:21,450 --> 00:02:24,570
distribution has a mean of zero and a standard deviation of one.

30
00:02:24,570 --> 00:02:28,620
And this normal distribution has a mean of five and a standard deviation of three.

31
00:02:28,650 --> 00:02:33,000
So we're dealing with a normal distribution, but it is not the standard normal distribution.

32
00:02:33,090 --> 00:02:40,290
So these two pieces of information here come from our random variable x, which is normally distributed,

33
00:02:40,380 --> 00:02:43,560
and we want to convert X into Z.

34
00:02:43,560 --> 00:02:47,490
We want to change this distribution into the standard normal distribution Z.
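As a rough sketch of what that conversion does to a whole data set, the snippet below simulates values of X with mean five and standard deviation three and standardizes each one. The sample size and seed are arbitrary assumptions for illustration, not something from the lecture.

```python
from statistics import NormalDist, fmean, pstdev

MU, SIGMA = 5, 3  # mean and standard deviation from the lecture's example

# Simulated values of X (sample size and seed are arbitrary choices).
xs = NormalDist(MU, SIGMA).samples(10_000, seed=0)

# Apply z = (x - mu) / sigma to every value.
zs = [(x - MU) / SIGMA for x in xs]

print(round(fmean(xs), 2), round(pstdev(xs), 2))  # close to 5 and 3
print(round(fmean(zs), 2), round(pstdev(zs), 2))  # close to 0 and 1
```

Because the values are simulated, the standardized mean and spread come out near zero and one rather than exactly equal to them.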

35
00:02:47,490 --> 00:02:53,280
And you can see that if we just take a few test values here, let's say that the random variable X takes

36
00:02:53,280 --> 00:02:55,050
on a value of five.

37
00:02:55,050 --> 00:03:04,170
Well, if we plug in five for x, five for mu, and three for sigma, what we get is Z equal to five minus

38
00:03:04,170 --> 00:03:07,200
five divided by three or zero.

39
00:03:07,230 --> 00:03:11,910
So the Z score associated with x equals five.

40
00:03:11,910 --> 00:03:15,960
We'll put a little five subscript here to indicate that x was equal to five.

41
00:03:15,990 --> 00:03:19,290
The z score associated with x equals five is zero.

42
00:03:19,290 --> 00:03:22,620
Let's look at what happens if we set x equal to eight.

43
00:03:22,620 --> 00:03:31,350
So we say z for the x value eight is equal to eight minus five divided by three, or three

44
00:03:31,350 --> 00:03:33,420
divided by three, which is equal to one.

45
00:03:33,420 --> 00:03:37,290
If we pick, let's say x equal to two.

46
00:03:37,950 --> 00:03:42,690
We get two minus five divided by three, which is equal to negative one.

47
00:03:42,690 --> 00:03:47,310
And if we pick maybe x equals 6.5.

48
00:03:48,370 --> 00:03:59,800
We get 6.5 minus five divided by three, which is equal to 1.5 divided by three, or one half, or 0.5.
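The four calculations above can be sketched in a few lines of Python. The function name z_score is just an illustrative choice; the mean of five and standard deviation of three are the values from this example.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

MU, SIGMA = 5, 3  # the lecture's example distribution

for x in (5, 8, 2, 6.5):
    print(x, z_score(x, MU, SIGMA))
```

This prints z-scores of 0.0, 1.0, -1.0, and 0.5 for the four test values.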

49
00:03:59,830 --> 00:04:02,880
Now, here's what these values are actually telling us.

50
00:04:02,890 --> 00:04:10,420
The value we get for Z is giving us the distance of whatever X value we picked, whether 5, 8, 2, or 6.5.

51
00:04:10,570 --> 00:04:16,899
It's giving us the distance of that particular value of the random variable x away from the mean in

52
00:04:16,899 --> 00:04:18,970
terms of standard deviations.

53
00:04:18,970 --> 00:04:24,550
So when we picked x is equal to five, the mean of the random variable x is five.

54
00:04:24,550 --> 00:04:29,640
So the distance between the x value five and its mean is of course zero.

55
00:04:29,650 --> 00:04:35,050
So the z score we get is zero because the distance between the x value we picked and the mean is zero.

56
00:04:35,050 --> 00:04:41,500
So we get a Z score of zero, but if we pick X is equal to eight, the distance between eight and the

57
00:04:41,500 --> 00:04:42,820
mean five is three.

58
00:04:42,820 --> 00:04:46,000
We see that here, eight minus five in the numerator is three.

59
00:04:46,000 --> 00:04:50,350
So we could say that the distance between x equals eight and its mean is three.

60
00:04:50,350 --> 00:04:55,960
But we always want to give the distance in terms of standard deviations and that's why we divide by

61
00:04:55,960 --> 00:04:57,610
the standard deviation sigma.

62
00:04:57,610 --> 00:05:01,330
So when we take three divided by three, the result is one.

63
00:05:01,330 --> 00:05:08,830
And so this value tells us that the distance of eight away from the mean five in terms of standard deviations

64
00:05:08,830 --> 00:05:09,790
is one.

65
00:05:09,790 --> 00:05:16,120
In other words, X equals eight is one standard deviation away from the mean, more specifically one

66
00:05:16,120 --> 00:05:20,770
standard deviation above the mean because we got a value of positive one.

67
00:05:21,010 --> 00:05:26,830
But when we pick x is equal to two and we calculate the associated Z score, we say that the distance

68
00:05:26,830 --> 00:05:32,710
between x equals two and its mean x equals five is two minus five or negative three.

69
00:05:32,710 --> 00:05:39,640
So the distance between the specific value of x that we chose and its own mean is negative three. That

70
00:05:39,640 --> 00:05:40,750
negative three value,

71
00:05:40,750 --> 00:05:45,520
the fact that it's negative, tells us that the specific value of x that we picked is below the mean.

72
00:05:45,520 --> 00:05:51,250
It is less than the mean, which of course we see x equals two is less than x equals five the mean.

73
00:05:51,250 --> 00:05:56,710
So we get a distance of negative three and then when we divide by the standard deviation of three,

74
00:05:56,710 --> 00:05:58,990
the result we get is negative one.

75
00:05:59,170 --> 00:06:05,170
And this result tells us that x equals two is one standard deviation below the mean.

76
00:06:05,170 --> 00:06:07,420
So it's one standard deviation away from the mean.

77
00:06:07,420 --> 00:06:13,420
The negative sign tells us one standard deviation below the mean, whereas over here the positive one

78
00:06:13,420 --> 00:06:18,460
tells us one standard deviation away from the mean and specifically one standard deviation above the

79
00:06:18,460 --> 00:06:18,970
mean.

80
00:06:18,970 --> 00:06:22,120
And then here we see x equals 6.5.

81
00:06:22,150 --> 00:06:26,110
The distance away from the mean five is one and a half, right?

82
00:06:26,110 --> 00:06:29,890
6.5 minus five is 1.5 or one and one half.

83
00:06:30,070 --> 00:06:34,570
So our distance away from the mean for this particular value of x is 1.5.

84
00:06:34,570 --> 00:06:37,570
But what is that distance in terms of standard deviations?

85
00:06:37,570 --> 00:06:42,130
Well, it's half of a standard deviation because one standard deviation is three.

86
00:06:42,130 --> 00:06:44,890
So what is 1.5 of three?

87
00:06:44,890 --> 00:06:47,020
1.5 is half of three.

88
00:06:47,050 --> 00:06:50,320
And so the result that we get is 0.5 or one half.

89
00:06:50,320 --> 00:06:55,690
6.5 is half of a standard deviation away from the mean.

90
00:06:55,690 --> 00:07:02,470
And so this standardized Z score is 0.5 or half of a standard deviation above the mean.

91
00:07:02,470 --> 00:07:07,630
That is what we're calculating. These values here that we're calculating are Z scores.

92
00:07:07,630 --> 00:07:13,870
And the Z score always tells us how many standard deviations we are away from the mean when we choose

93
00:07:13,870 --> 00:07:16,030
a particular value of our random variable.

94
00:07:16,210 --> 00:07:21,940
If we get a positive value for the Z score, it means the particular value we chose is above the mean.

95
00:07:22,030 --> 00:07:26,290
If we get a negative value, it means the value we chose is below the mean.

96
00:07:26,320 --> 00:07:31,990
If we get zero, it means the value we chose is exactly at the mean, it is the mean itself.
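To make the sign rule concrete, here is a small sketch that turns a value of X into a plain-language description. The helper name describe_position and its wording are assumptions made for illustration.

```python
def describe_position(x, mu, sigma):
    """Return the z-score of x and a short description of where x sits."""
    z = (x - mu) / sigma
    if z > 0:
        side = "above"
    elif z < 0:
        side = "below"
    else:
        return z, "exactly at the mean"
    # abs(z) is the distance; the sign only decides above or below.
    return z, f"{abs(z):g} standard deviation(s) {side} the mean"

for x in (5, 8, 2, 6.5):
    print(x, describe_position(x, mu=5, sigma=3))
```

The distance comes from the absolute value of the z-score, and the direction comes from its sign.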

97
00:07:31,990 --> 00:07:37,030
Now we don't call the distribution of Z the standard normal distribution for nothing.

98
00:07:37,030 --> 00:07:44,830
It really is standardized, so much so that we actually have a full table of values for every different

99
00:07:44,830 --> 00:07:47,680
Z score that we can possibly calculate.

100
00:07:47,680 --> 00:07:50,470
And here's what that big table looks like.

101
00:07:50,470 --> 00:07:55,570
Now, there's a lot going on here, but it's actually pretty simple when we break it down. What we

102
00:07:55,570 --> 00:08:00,700
see in this whole left side of the table here is this first column of Z scores.

103
00:08:00,700 --> 00:08:03,040
And these values are all negative.

104
00:08:03,040 --> 00:08:04,330
You see that in the first row.

105
00:08:04,330 --> 00:08:11,680
We start with -3.4 and we go all the way down here to the bottom where we see -0.1 and then 0.0.

106
00:08:11,920 --> 00:08:18,370
So these are all negative Z scores, which we already said we find when we pick a particular value that

107
00:08:18,370 --> 00:08:19,660
is below the mean.

108
00:08:19,660 --> 00:08:25,390
We saw that here when we calculated the Z score for X equals two, whereas this whole right hand side

109
00:08:25,390 --> 00:08:29,500
of the table we see here in this column, the Z scores are all positive.

110
00:08:29,500 --> 00:08:33,490
They start at 0.0 and move all the way up to 3.4.

111
00:08:33,760 --> 00:08:36,640
So you can almost think about the normal distribution.

112
00:08:36,640 --> 00:08:42,730
And if we're starting at the left edge of the normal distribution, we are starting here with this most

113
00:08:42,730 --> 00:08:45,280
negative value, -3.4.

114
00:08:45,370 --> 00:08:47,950
And as we move from left to right along the

115
00:08:48,000 --> 00:08:49,020
normal distribution,

116
00:08:49,020 --> 00:08:56,670
we move this way in our Z table until we get all the way down here to the bottom at 0.0, where we reach

117
00:08:56,670 --> 00:09:01,170
the halfway point of the normal distribution or the mean of the normal distribution.

118
00:09:01,170 --> 00:09:06,180
Because remember, a Z score of zero is associated with the mean.

119
00:09:06,360 --> 00:09:13,500
We saw that here and we see that this Z score of zero is giving us exactly 0.5.

120
00:09:13,500 --> 00:09:20,280
And that's because 50% of the area under the curve of the normal distribution is to the left of this

121
00:09:20,310 --> 00:09:20,910
Z score.

122
00:09:20,910 --> 00:09:22,800
And we'll talk more about that in a second.

123
00:09:22,800 --> 00:09:27,480
But we move all the way down through the table here, all the way through these negative Z scores,

124
00:09:27,480 --> 00:09:33,090
from the most negative Z score to a just slightly negative Z score of -0.1.

125
00:09:33,090 --> 00:09:36,900
We hit this 0.0, which is right at the mean or right at the middle.

126
00:09:36,990 --> 00:09:42,690
And then we continue on through this right side of the table starting at 0.0,

127
00:09:42,720 --> 00:09:43,110
the mean.

128
00:09:43,110 --> 00:09:51,690
So you can see how these two values match between the tables because they're both at 0.0.

129
00:09:51,780 --> 00:09:58,350
And then we continue on through positive Z scores, starting with a slightly positive Z score of 0.1

130
00:09:58,350 --> 00:10:02,940
until we get to an extremely positive Z score of 3.4.

131
00:10:02,940 --> 00:10:07,740
And of course, up here we're at the far right edge of the standard normal distribution.

132
00:10:07,740 --> 00:10:14,790
The idea here with these Z tables, the value that we're getting out of the body of the Z table is the

133
00:10:14,790 --> 00:10:21,720
percentage of area underneath the curve that falls to the left of the Z score that we look up.

134
00:10:21,990 --> 00:10:28,500
So we already said here that for a Z score equal to zero, we could look in either side of the table

135
00:10:28,500 --> 00:10:30,900
and we see this 0.0 value.

136
00:10:30,900 --> 00:10:35,790
So we go to 0.0 that takes us out to the first decimal place.

137
00:10:35,790 --> 00:10:42,840
But then we have to look across the top row of the table to get the second decimal place for a Z score

138
00:10:42,840 --> 00:10:44,040
of exactly zero.

139
00:10:44,040 --> 00:10:49,680
We want to stay in this first column here to get a true Z score of 0.00.

140
00:10:49,680 --> 00:10:54,390
And so we go to the last row, this first column, and we see here 0.5.

141
00:10:54,390 --> 00:11:01,440
That means that 50% of the data, 50% of the area under the standard normal distribution lies to the

142
00:11:01,440 --> 00:11:03,750
left of Z equals zero.

143
00:11:03,750 --> 00:11:08,400
If we want to look up a Z score of one, well, this is a positive one here.

144
00:11:08,400 --> 00:11:10,440
So we're going to be in this right hand side of the table.

145
00:11:10,440 --> 00:11:12,240
We're looking for a positive one.

146
00:11:12,240 --> 00:11:16,680
Well, we find that at 1.00.

147
00:11:16,680 --> 00:11:22,860
And so we see this value right here, which is 0.8413.

148
00:11:22,860 --> 00:11:30,960
That tells us that about 84% of the data or 84.13% of the area under the standard normal distribution

149
00:11:30,960 --> 00:11:33,960
lies to the left of Z equals one.

150
00:11:33,960 --> 00:11:39,600
Now, if we look for Z equals negative one, because we have a negative Z score, we're in this left

151
00:11:39,600 --> 00:11:52,140
side table here, if we go to Z equals -1.00, that puts us here in this first column at 0.1587.

152
00:11:52,140 --> 00:11:58,380
This tells us that the area under the standard normal distribution to the left of negative one is about

153
00:11:58,380 --> 00:12:01,950
15.87% of the total area.
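If no printed table is at hand, the same three lookups can be reproduced with Python's standard library. This is only a sketch: the printed table is the tool used in the lecture, and the helper name table_value is an assumption. NormalDist() with no arguments is the standard normal distribution, and its cdf method gives the area to the left of a z-score.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def table_value(z):
    """Area to the left of z, rounded to four places like a printed table."""
    return round(Z.cdf(z), 4)

for z in (0.00, 1.00, -1.00):
    print(z, table_value(z))
```

The three values come out as 0.5, 0.8413, and 0.1587, matching the cells read from the table.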

154
00:12:01,980 --> 00:12:05,580
Now, here's an interesting point to make note of.

155
00:12:05,580 --> 00:12:11,970
Now that we have the Z score for both positive one and negative one, if we just sketch a super simple

156
00:12:12,000 --> 00:12:17,220
normal distribution, we'll say that it looks like this (it doesn't need to be perfect), but we have here

157
00:12:17,400 --> 00:12:18,240
our

158
00:12:18,970 --> 00:12:26,120
mean, and we just found the percentage of area below Z equals one and Z equals negative one.

159
00:12:26,140 --> 00:12:30,610
So let's say that this right here is one standard deviation above the mean.

160
00:12:30,610 --> 00:12:33,700
We'll say that's one, and then symmetrically,

161
00:12:33,730 --> 00:12:38,240
this is one standard deviation below the mean, or Z equals negative one.

162
00:12:38,260 --> 00:12:46,540
What these values tell us is that 15.87% of the total area under this curve is going to fall to the

163
00:12:46,540 --> 00:12:49,900
left of Z equals negative one.

164
00:12:49,930 --> 00:12:56,410
So this area right here makes up 15.87% of the total area under the curve.

165
00:12:56,500 --> 00:13:03,820
Another way to put this is that the probability that we find a value below one standard deviation below

166
00:13:03,820 --> 00:13:06,550
the mean is 15.87%.

167
00:13:06,550 --> 00:13:12,010
Or in the case of this normal distribution here, this random variable x, remember we found a z score

168
00:13:12,010 --> 00:13:19,210
of negative one for this value x equals two, which means that the probability or the likelihood that

169
00:13:19,210 --> 00:13:25,530
we get a value of X in our data set that is less than two is about 15.87%.
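A quick way to check that claim is to compute the probability twice: once directly on the distribution of X, and once by converting to a z-score first. This sketch assumes the lecture's mean of five and standard deviation of three.

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=3)  # the lecture's example distribution
Z = NormalDist()               # the standard normal distribution

p_direct = X.cdf(2)   # P(X < 2) computed on X itself

z = (2 - 5) / 3       # z-score for x = 2, which is -1
p_via_z = Z.cdf(z)    # P(Z < -1) computed on the standard normal

print(round(p_direct, 4), round(p_via_z, 4))
```

Both routes give about 0.1587, which is why one standard table is enough for every normal distribution.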

170
00:13:25,570 --> 00:13:31,240
Now, remember, the standard normal distribution is symmetric, which means that if the likelihood

171
00:13:31,240 --> 00:13:38,830
of us getting a value below one standard deviation is 15.87%, that means that finding a value above

172
00:13:38,830 --> 00:13:40,030
one standard deviation.

173
00:13:40,030 --> 00:13:46,930
So over here, the area that is symmetric to the one that we already sketched over here, the probability

174
00:13:46,930 --> 00:13:53,400
of us finding a value above one standard deviation above the mean also has to be 15.87%.

175
00:13:53,410 --> 00:13:59,350
And of course that makes sense because when we looked here in the Z table for the value associated with

176
00:13:59,350 --> 00:14:03,780
a Z score of 1.00, we got 0.8413.

177
00:14:03,790 --> 00:14:11,020
So we're saying that the probability that we find a value for X that's less than one standard deviation

178
00:14:11,020 --> 00:14:14,050
above the mean is 84.13%.

179
00:14:14,050 --> 00:14:23,560
So this whole area here, everything below one standard deviation above the mean or to the left of a

180
00:14:23,770 --> 00:14:25,840
Z score of positive one.

181
00:14:25,840 --> 00:14:29,290
This whole area is 84.13%.

182
00:14:29,290 --> 00:14:33,190
So that is 84.13%.

183
00:14:33,190 --> 00:14:39,460
And we're saying this value in blue, which was symmetric to the blue area that we sketched on the left,

184
00:14:39,460 --> 00:14:42,060
that is 15.87%.

185
00:14:42,070 --> 00:14:49,990
So if we add to that 15.87%, notice that what we get here

186
00:14:51,250 --> 00:14:52,670
is 100%.

187
00:14:52,690 --> 00:14:59,080
Which, of course makes sense because the area under the distribution always has to be equal to one

188
00:14:59,080 --> 00:15:00,240
or 100%.

189
00:15:00,250 --> 00:15:06,700
And so when we pick some Z score like Z equals one, and we look at all the area to the left of Z equals

190
00:15:06,700 --> 00:15:13,630
one, and all the area to the right of Z equals one, those two areas need to sum to one or those probabilities

191
00:15:13,630 --> 00:15:15,570
need to sum to 100%.
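Since the table only gives the area to the left, the area to the right comes from subtracting from one. A minimal sketch of that complement, using the standard library in place of the printed table:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

left_of_one = Z.cdf(1.0)        # area to the left of z = 1
right_of_one = 1 - left_of_one  # complementary area to the right of z = 1

print(round(left_of_one, 4), round(right_of_one, 4))
```

The right-hand area comes out to about 0.1587, the same as the area to the left of z equals negative one.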

192
00:15:15,610 --> 00:15:23,140
They're complementary probabilities and we can see that with any two associated Z scores in this table.

193
00:15:23,140 --> 00:15:31,690
Let's say for instance, we pick 1.53 that would give us this value right here, 0.9370.

194
00:15:31,690 --> 00:15:40,270
And then if we go find -1.53, so we go over here to the negative side of the table, -1.53 gives us

195
00:15:40,270 --> 00:15:45,340
this cell here, which we see is 0.0630.

196
00:15:45,340 --> 00:15:51,820
And if we add 0.9370 to 0.0630, the result we get is one.

197
00:15:51,820 --> 00:15:58,990
So the values we get from the Z table for opposite Z scores are always going to add to one which makes

198
00:15:58,990 --> 00:16:05,890
sense given our understanding of the area under the probability distribution, always adding up to one.
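The same check works for any pair of opposite z-scores. The sketch below uses 1.53 from the lecture plus a couple of extra values chosen only for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# 1.53 is the lecture's example; the other z-scores are arbitrary extras.
pairs = {z: (Z.cdf(z), Z.cdf(-z)) for z in (1.53, 0.25, 2.0)}

for z, (left_pos, left_neg) in pairs.items():
    print(z, round(left_pos, 4), round(left_neg, 4), round(left_pos + left_neg, 4))
```

For 1.53 the two areas are about 0.9370 and 0.0630, and every pair sums to one.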

199
00:16:05,890 --> 00:16:09,700
We can also think about the values in the Z table as percentiles.

200
00:16:09,700 --> 00:16:16,270
So if the value that we get from the Z table is let's take this value here 0.9370, that tells us that

201
00:16:16,270 --> 00:16:24,940
the data point that was associated with a Z score of 1.53 is greater than 93.7% of the data points,

202
00:16:24,940 --> 00:16:29,910
or it's at the 93.7th percentile of all of our data.

203
00:16:29,920 --> 00:16:37,090
Or to say it another way, whatever value of the random variable x we used to arrive at a Z score of

204
00:16:37,090 --> 00:16:47,980
1.53 is a value of X larger than 93.7% of the other values of X and less than 6.3%, which we see right

205
00:16:47,980 --> 00:16:50,560
here of all of the other values of X.
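Reading a table value as a percentile is just a change of units from a fraction to a percentage. A small sketch, where the helper name percentile_from_z is an assumption:

```python
from statistics import NormalDist

def percentile_from_z(z):
    """Percentage of the distribution that lies below the given z-score."""
    return 100 * NormalDist().cdf(z)

below = percentile_from_z(1.53)  # share of values smaller than this one
above = 100 - below              # share of values larger than this one

print(round(below, 1), round(above, 1))
```

For a z-score of 1.53 this gives roughly 93.7 percent below and 6.3 percent above.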

206
00:16:50,560 --> 00:16:55,240
Now, keep in mind too, that we can also use this Z table to work backwards.

207
00:16:55,240 --> 00:17:02,860
So if we want to know, for instance, what the threshold is or the cutoff for the top 30% of our data,

208
00:17:02,860 --> 00:17:07,750
well, top 30% means that we have to be in that 70th percentile.

209
00:17:07,750 --> 00:17:10,690
100% minus 30% is 70%.

210
00:17:10,690 --> 00:17:13,300
We have to beat that 70% cutoff.

211
00:17:13,300 --> 00:17:20,619
So if we come here to the positive side of the Z table and we locate the first value that is greater

212
00:17:20,619 --> 00:17:30,040
than 0.7, we see that it's right here, this cell right here, 0.7019.

213
00:17:30,040 --> 00:17:34,840
This is the first cell we see that is greater than 0.7.

214
00:17:34,840 --> 00:17:39,160
Just to the left of it, we see the first value that is less than 0.7.

215
00:17:39,160 --> 00:17:47,350
So we know that if we want to be within the top 30%, we'll need a Z score of at least 0.53 because

216
00:17:47,350 --> 00:17:55,600
we get the 0.5 from the row header over here, 0.5 and we get the 0.03 from the column header up here.

217
00:17:55,600 --> 00:18:06,040
So 0.5 plus 0.03, or 0.53, is the smallest Z score we can get that's associated with the top 30% of our data.

218
00:18:06,040 --> 00:18:12,460
So if we choose a value of X and we calculate its Z score and we get a Z score that is greater than

219
00:18:12,460 --> 00:18:19,030
or equal to 0.53, we know based on the table here that we're going to be in the top 30% of our data.
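Working backwards can be sketched two ways: a table-style search in steps of 0.01, and the exact inverse that the standard library provides as inv_cdf. The final line, which converts the cutoff back into a value of X using x = mu + z * sigma, is a rearrangement of the z-score formula added here for illustration, with the lecture's mean of five and standard deviation of three assumed.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution
target = 0.70     # top 30% means reaching at least the 70th percentile

# Table-style search: step through z = 0.00, 0.01, ... and stop at the
# first cumulative area that reaches the target.
z_table = next(k / 100 for k in range(0, 350) if Z.cdf(k / 100) >= target)

# Exact inverse of the cumulative area, without two-decimal rounding.
z_exact = Z.inv_cdf(target)

# Rearranged formula x = mu + z * sigma, assuming mu = 5 and sigma = 3.
x_cutoff = 5 + z_table * 3

print(z_table, round(z_exact, 4), round(x_cutoff, 2))
```

The table-style search lands on 0.53, while the exact inverse is slightly smaller, about 0.5244, because the table only resolves z-scores to two decimal places.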

220
00:18:19,030 --> 00:18:25,750
So that's the idea of a Z score, the formula we use to calculate it, what it's actually telling us,

221
00:18:25,870 --> 00:18:31,630
how to look up the Z scores that we find in a Z table, what the information in the Z table actually

222
00:18:31,630 --> 00:18:36,190
tells us, and how to work backwards from the Z table to the Z score.

223
00:18:36,280 --> 00:18:42,880
We will use these tables all the time and a quick search online for Z table will pull these up right

224
00:18:42,880 --> 00:18:43,180
away.

225
00:18:43,180 --> 00:18:46,390
They're so standard that you'll be able to find the Z table everywhere.

226
00:18:46,390 --> 00:18:50,020
But this is what we're going to be doing all the time going forward.

227
00:18:50,020 --> 00:18:55,060
We're going to have some random variable, we'll call it X, we're going to know its mean and its standard

228
00:18:55,060 --> 00:18:55,870
deviation.

229
00:18:55,870 --> 00:19:02,260
But instead of using the normal distribution that's associated with X, which might be shorter and wider

230
00:19:02,260 --> 00:19:08,440
or taller and narrower than the standard normal distribution that represents Z, we're going to convert

231
00:19:08,440 --> 00:19:16,690
all values of X into values of Z to essentially find the associated value of Z for every value of X.

232
00:19:16,690 --> 00:19:21,160
We'll calculate here a Z score for any value of X that we need.

233
00:19:21,280 --> 00:19:27,190
That'll give us the value of Z that's associated with a particular value of X that we're interested

234
00:19:27,190 --> 00:19:27,460
in.

235
00:19:27,460 --> 00:19:32,800
In other words, it'll give us our Z score for that particular value of X. And once we have that Z

236
00:19:32,800 --> 00:19:40,210
score, then we have these Z tables available to us as a tool which will allow us to quickly look up

237
00:19:40,210 --> 00:19:47,680
all of these probability, percentile, and threshold values that are associated with any specific Z score that

238
00:19:47,680 --> 00:19:48,820
we've calculated.

