1
00:00:00,150 --> 00:00:05,460
All we want to do here is learn how to convert from the normal distribution to what's called the standard

2
00:00:05,460 --> 00:00:06,620
normal distribution.

3
00:00:06,630 --> 00:00:13,560
So previously we looked at a normal distribution and we said that the normal distribution was this symmetric

4
00:00:13,560 --> 00:00:17,430
bell-shaped curve that had both its mean and its median

5
00:00:17,430 --> 00:00:21,360
at the center of the distribution here, where the mean is mu.

6
00:00:21,390 --> 00:00:27,060
We said also that the normal distribution followed the empirical rule which told us the percentage of

7
00:00:27,060 --> 00:00:32,220
area that we could expect to find under each section of the curve where these sections here are broken

8
00:00:32,220 --> 00:00:35,730
up based on the standard deviation of the data.

9
00:00:35,730 --> 00:00:40,500
So this blue area here represents one standard deviation around the mean.

10
00:00:40,500 --> 00:00:46,020
So from the mean minus one standard deviation on the left to the mean plus one standard deviation on

11
00:00:46,020 --> 00:00:46,650
the right.

12
00:00:46,650 --> 00:00:53,280
If we add on to that and include this red area, that represents the interval from the mean minus two

13
00:00:53,280 --> 00:00:57,990
standard deviations on the left all the way over here to the mean plus two standard deviations on the

14
00:00:57,990 --> 00:00:58,440
right.

15
00:00:58,440 --> 00:01:02,190
And then we have our interval for three standard deviations around the mean as well.
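
As a rough check on the empirical rule described above, the area within k standard deviations of the mean can be computed with the error function. This is only a sketch and is not taken from the video; the helper name `area_within` is made up for illustration.

```python
import math


def area_within(k: float) -> float:
    """Area under a normal curve within k standard deviations of the mean."""
    # For any normal distribution, P(|X - mu| <= k * sigma) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The three areas come out to roughly 68%, 95%, and 99.7%, and they do not depend on the particular mean or standard deviation.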

16
00:01:02,190 --> 00:01:08,820
But with a normal distribution as we saw before, the mean and the standard deviation can take on any

17
00:01:08,820 --> 00:01:09,540
values.

18
00:01:09,540 --> 00:01:15,270
In fact, we said that the normal distribution could always be identified as capital N and then mean

19
00:01:15,270 --> 00:01:17,370
and variance.

20
00:01:17,370 --> 00:01:22,140
So for instance, if we had a mean of five.

21
00:01:22,840 --> 00:01:25,210
And a standard deviation of three.

22
00:01:25,210 --> 00:01:29,540
We could indicate that normal distribution as capital N of five comma nine.

23
00:01:29,560 --> 00:01:32,710
Because standard deviation is sigma, variance is sigma squared.

24
00:01:32,710 --> 00:01:36,970
So if our standard deviation is three, then our variance is three squared or nine.

25
00:01:36,970 --> 00:01:40,180
And in fact, we looked at that exact normal distribution.

26
00:01:40,180 --> 00:01:48,190
But the standard normal distribution is the special normal distribution where the mean is always zero

27
00:01:48,400 --> 00:01:50,920
and the standard deviation is always one.

28
00:01:50,920 --> 00:01:57,100
And we indicate the standard normal distribution with the capital letter Z instead of just any normal

29
00:01:57,100 --> 00:01:59,200
distribution with the capital letter N.

30
00:01:59,200 --> 00:02:03,250
So if we see capital Z, that's indicating the standard normal distribution.

31
00:02:03,250 --> 00:02:11,290
And of course if we were writing this as a normal distribution, we could write it as capital N of zero

32
00:02:11,290 --> 00:02:16,240
comma one, because in the standard normal distribution, the mean is zero, the standard deviation

33
00:02:16,240 --> 00:02:19,270
is one, therefore the variance is one squared or one.

34
00:02:19,270 --> 00:02:25,090
So the standard normal distribution is just equal to the normal distribution that we've looked at before,

35
00:02:25,090 --> 00:02:30,940
but with this particular mean and standard deviation, which means that if we're sketching the standard

36
00:02:30,940 --> 00:02:35,350
normal distribution here, we have to indicate that the mean is at zero.

37
00:02:35,350 --> 00:02:41,530
And then by definition, the lower boundary of one standard deviation around the mean is negative one,

38
00:02:41,530 --> 00:02:48,550
the upper boundary is positive one, and then we have negative two and negative three and positive two

39
00:02:48,550 --> 00:02:54,340
and positive three for the boundaries around the two standard deviations around the mean interval and

40
00:02:54,340 --> 00:02:56,560
three standard deviations around the mean interval.

41
00:02:56,560 --> 00:03:03,340
So when we have a standard normal distribution, then these boundaries are always negative three, negative

42
00:03:03,370 --> 00:03:05,920
two, negative one, zero, one, two, three.

43
00:03:05,920 --> 00:03:12,250
Whereas in this example here for the normal distribution with mean five and standard deviation three,

44
00:03:12,250 --> 00:03:15,310
those values would have been a mean of five.

45
00:03:15,310 --> 00:03:22,900
And then with a standard deviation of three, we would have had eight, 11, and 14, and then here two,

46
00:03:22,900 --> 00:03:25,270
negative one, and negative four.
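
The boundary values for both curves can be reproduced with a few lines of Python. This is a minimal sketch, and the function name `boundaries` is a hypothetical helper, not something from the video.

```python
def boundaries(mean, std_dev):
    """Return mean + k * std_dev for k = -3, -2, -1, 0, 1, 2, 3."""
    return [mean + k * std_dev for k in range(-3, 4)]


# Standard normal distribution: mean 0, standard deviation 1.
print(boundaries(0, 1))  # [-3, -2, -1, 0, 1, 2, 3]

# The example normal distribution: mean 5, standard deviation 3.
print(boundaries(5, 3))  # [-4, -1, 2, 5, 8, 11, 14]
```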

47
00:03:25,300 --> 00:03:31,450
Now what we want to understand at this point is how we can take any normal distribution and convert

48
00:03:31,450 --> 00:03:33,640
it into the standard normal distribution.

49
00:03:33,640 --> 00:03:35,410
In other words, to standardize it.

50
00:03:35,410 --> 00:03:42,820
If we could have a way to convert any normal distribution into this super standardized normal distribution

51
00:03:42,820 --> 00:03:48,550
with this mean of zero and standard deviation of one, that would certainly make comparing normal distributions

52
00:03:48,550 --> 00:03:49,270
easier.

53
00:03:49,270 --> 00:03:53,200
It would make identifying values more clear.

54
00:03:53,200 --> 00:03:55,420
We could make our reporting more clear.

55
00:03:55,420 --> 00:03:58,360
It would just give us a good standard from which to start.

56
00:03:58,360 --> 00:04:04,210
Instead of starting with something like this, this normal distribution we looked at earlier with mean

57
00:04:04,210 --> 00:04:09,310
five and standard deviation three, where we have all of these different numbers and it's much harder

58
00:04:09,310 --> 00:04:11,950
to see that this is a normal distribution.

59
00:04:11,950 --> 00:04:17,829
And of course, as you may expect, we already have a way to convert from a normal distribution into

60
00:04:17,829 --> 00:04:19,480
a standard normal distribution.

61
00:04:19,480 --> 00:04:25,270
And it goes back to what we looked at earlier when we learned about transforming random variables.

62
00:04:25,270 --> 00:04:31,660
Remember that we talked about how to apply a shift or a scale to a random variable, and that's all

63
00:04:31,660 --> 00:04:32,770
we're going to do here.

64
00:04:32,770 --> 00:04:35,290
We're going to start with our normal distribution,

65
00:04:35,290 --> 00:04:41,740
N, and we're going to apply a shift and then a scale in order to transform our normal distribution into

66
00:04:41,740 --> 00:04:43,330
the standard normal distribution.

67
00:04:43,330 --> 00:04:49,630
So if we start with a normal distribution and instead of representing that random variable with N,

68
00:04:49,630 --> 00:04:55,960
let's say we're representing it with X, then the first thing we would do is apply a shift to X where

69
00:04:55,960 --> 00:04:57,790
we subtract out the mean.

70
00:04:57,790 --> 00:05:00,670
Now let's talk about what this looks like visually.

71
00:05:00,670 --> 00:05:07,960
So this is in fact a sketch of the normal distribution with mean five and standard deviation three.

72
00:05:07,960 --> 00:05:14,200
So what we're starting with here is the normal distribution of the random variable here with mean five

73
00:05:14,200 --> 00:05:17,470
and standard deviation three or variance nine.

74
00:05:17,470 --> 00:05:20,200
This is what its probability distribution looks like.

75
00:05:20,200 --> 00:05:27,310
Now if we apply a shift by subtracting out the mean from every data point in our dataset that's making

76
00:05:27,310 --> 00:05:32,050
this distribution, it's going to move every data point to the left by five units.

77
00:05:32,050 --> 00:05:33,310
The mean is five.

78
00:05:33,310 --> 00:05:39,820
So when we subtract out the mean, the graph will shift and the curve moves over to the left.

79
00:05:39,820 --> 00:05:45,220
And now we have a distribution that is centered at zero that has a mean of zero.

80
00:05:45,220 --> 00:05:55,570
So we moved the mean from five here to zero here by applying this shift to all of the data points in

81
00:05:55,570 --> 00:05:56,290
our dataset.

82
00:05:56,290 --> 00:06:03,280
So now we have the mean of the standard normal distribution, which is that mean of zero, but we still

83
00:06:03,280 --> 00:06:05,980
have a standard deviation of three.

84
00:06:05,980 --> 00:06:08,410
We still have this wide stretched curve.

85
00:06:08,410 --> 00:06:14,350
So this curve we have right now, we started with the normally distributed random variable with mean

86
00:06:14,350 --> 00:06:15,790
five, standard deviation three.

87
00:06:15,790 --> 00:06:22,210
Now this curve that we're showing is the normally distributed variable with mean zero and standard deviation

88
00:06:22,320 --> 00:06:22,720
three.
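
The effect of the shift alone can be checked with simulated data. The sketch below assumes a sample drawn with `random.gauss` as a stand-in for the data set in the video; the sample size and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable

# Simulated draws from a normal distribution with mean 5, standard deviation 3.
x = [random.gauss(5, 3) for _ in range(50_000)]

# Shift: subtract the mean from every data point.
shifted = [xi - 5 for xi in x]

print("mean after shift:", statistics.fmean(shifted))
print("standard deviation after shift:", statistics.pstdev(shifted))
```

The mean lands close to zero while the standard deviation stays close to three, which is why the scale step is still needed.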

89
00:06:22,770 --> 00:06:27,580
So we have the first piece of getting this over to a standard normal distribution.

90
00:06:27,600 --> 00:06:32,450
Now all we have to do is apply a scale, and here's what that scale looks like.

91
00:06:32,460 --> 00:06:37,760
We will divide every data point in the data set by the standard deviation.

92
00:06:37,770 --> 00:06:43,380
When we do that, when we divide by the standard deviation, that means every data point in the set

93
00:06:43,380 --> 00:06:46,470
is going to get pulled in closer to the mean in this case.

94
00:06:46,470 --> 00:06:50,190
And the resulting curve looks like this.

95
00:06:50,190 --> 00:06:56,580
Now, this final curve, after both the shift and the scale, is the normally distributed random variable

96
00:06:56,580 --> 00:07:05,370
with mean zero and standard deviation one, which of course we know to be equal to the standard normal

97
00:07:05,370 --> 00:07:06,270
distribution.
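
Written out as a short derivation, and using the standard rules for shifting and scaling a random variable (the notation here is an assumption, since the on-screen formulas are not part of the transcript):

```latex
\begin{aligned}
X &\sim N(\mu, \sigma^2), \qquad Z = \frac{X - \mu}{\sigma} \\
E[Z] &= \frac{E[X] - \mu}{\sigma} = \frac{\mu - \mu}{\sigma} = 0 \\
\operatorname{Var}(Z) &= \frac{\operatorname{Var}(X)}{\sigma^2} = \frac{\sigma^2}{\sigma^2} = 1
\end{aligned}
```

A shift and a positive scale of a normal random variable is still normal, so Z follows N(0, 1). For the example with mean five and standard deviation three, this is Z = (X - 5) / 3.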

98
00:07:06,270 --> 00:07:12,930
So that's really all that we want to understand: how we transform one normal distribution into the standard

99
00:07:12,930 --> 00:07:17,070
normal distribution, regardless of the mean and standard deviation that we started with.

100
00:07:17,070 --> 00:07:24,540
And going forward, this formula here after we've applied the shift and the scale is going to be our

101
00:07:24,540 --> 00:07:25,920
next point of focus.

102
00:07:25,920 --> 00:07:32,070
So we want to imagine that we started with the random variable x, and after these two transformations

103
00:07:32,070 --> 00:07:35,370
we transformed it into the random variable Z.

104
00:07:35,370 --> 00:07:42,210
And now if we rework this formula a little bit, if we switch it around and consolidate this side here,

105
00:07:42,210 --> 00:07:49,110
we get Z is equal to X minus the mean, all divided by the standard deviation.

106
00:07:49,290 --> 00:07:55,710
And it's this formula that we're going to be focusing on going forward as we talk about what we call

107
00:07:55,710 --> 00:08:00,120
the Z scores that are associated with the standard normal distribution.
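
The formula can be sketched as a small Python function. The name `z_score` and the positivity check on the standard deviation are illustrative assumptions, not something shown in the video.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Standardize x: subtract the mean (shift), then divide by the standard deviation (scale)."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev


# Boundaries of the normal distribution with mean 5 and standard deviation 3.
for x in (-4, -1, 2, 5, 8, 11, 14):
    print(x, "->", z_score(x, 5, 3))
```

Each boundary of the mean five, standard deviation three curve maps onto the matching boundary of the standard normal curve, from negative three through positive three.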

