1
00:00:00,090 --> 00:00:06,030
So we've said that measures of central tendency attempt to measure or indicate some kind of center point

2
00:00:06,030 --> 00:00:07,290
of the data set.

3
00:00:07,290 --> 00:00:12,720
But now we want to talk about measurements of dispersion or measurements of spread, which really at

4
00:00:12,720 --> 00:00:17,770
their core, attempt to quantify how much the data is spread out around the mean.

5
00:00:17,790 --> 00:00:25,290
So let's return first to this idea of a population, which we indicate with capital N, or a sample, which

6
00:00:25,290 --> 00:00:27,270
we indicate with lowercase n.

7
00:00:27,570 --> 00:00:33,600
If we are looking at an entire population, for instance, maybe all of the students in our school,

8
00:00:33,600 --> 00:00:39,660
if we're thinking about students at the school, 100% of those students make up the population.

9
00:00:39,660 --> 00:00:43,770
And we would indicate that entire student population with capital N.

10
00:00:44,070 --> 00:00:51,030
If we want to take a sample of students in the school in an attempt to find a smaller representative

11
00:00:51,030 --> 00:00:58,710
group of that larger population, we would take some subsection, some sample of that larger population

12
00:00:58,710 --> 00:01:01,740
and we would indicate that sample with lowercase n.

13
00:01:01,770 --> 00:01:06,570
So if there are 1000 students in the school, then capital N is equal to 1000.

14
00:01:06,600 --> 00:01:12,360
If we're taking a sample of 100 students, then we would say lowercase n is equal to 100.
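
As a rough illustration of the population and sample sizes just described, here is a minimal Python sketch. The student ID numbers, the fixed seed, and the choice of simple random sampling without replacement are all assumptions made for the example; the lecture only says that we take a sample.

```python
import random

# Hypothetical population: ID numbers for 1000 students (capital N = 1000).
population = list(range(1, 1001))
N = len(population)

# Draw 100 distinct students at random (lowercase n = 100).
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 100)
n = len(sample)

print(N, n)
```

Every student in the sample also belongs to the population, so the sample is a subsection of the larger group.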

15
00:01:12,390 --> 00:01:19,200
And of course, as we already know, the mean for both the population and the sample we can find using

16
00:01:19,200 --> 00:01:20,430
the same formula.

17
00:01:20,460 --> 00:01:23,190
Remember that these two formulas are identical.

18
00:01:23,190 --> 00:01:29,910
The only difference is that when we indicate a population mean, we usually do so with this Greek letter

19
00:01:29,910 --> 00:01:36,570
mu, and when we indicate the mean for a sample, we usually do so with this x bar notation.

20
00:01:36,660 --> 00:01:43,080
And as we're summing up all of the data points in the population, we indicate that population here

21
00:01:43,080 --> 00:01:49,500
with capital N, and we divide by capital N, the number of subjects in the population.

22
00:01:49,500 --> 00:01:55,080
Whereas with the sample mean formula, of course we're using lowercase n to indicate the number of

23
00:01:55,080 --> 00:01:56,640
subjects in the sample.

24
00:01:56,640 --> 00:02:02,040
But otherwise these formulas are going to calculate the same thing just one for the population and one

25
00:02:02,040 --> 00:02:02,930
for the sample.
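
The on-screen formulas are not captured in this transcript. Written in the standard notation the narration describes (mu and capital N for the population, x bar and lowercase n for the sample), they would look something like this:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```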

26
00:02:02,940 --> 00:02:08,370
Now, when it comes to measurements of dispersion or measurements of spread, we're talking about how

27
00:02:08,370 --> 00:02:11,370
much the data is spread out around the mean.

28
00:02:11,370 --> 00:02:15,510
When we looked at measures of central tendency, we talked about mean, we talked about median, we

29
00:02:15,510 --> 00:02:16,440
talked about mode.

30
00:02:16,440 --> 00:02:18,990
But here we're really focusing on mean.

31
00:02:18,990 --> 00:02:23,490
And when we're talking about measures of spread, we're interested in how the data is spread out around

32
00:02:23,490 --> 00:02:25,140
this mean value.

33
00:02:25,170 --> 00:02:31,290
Is it very tightly clustered around the mean where all the data sits really, really close to that mean

34
00:02:31,290 --> 00:02:35,490
value, or is the data really spread out far from the mean?

35
00:02:35,490 --> 00:02:37,200
Does it have a larger range?

36
00:02:37,200 --> 00:02:41,700
Is a significant amount of the data far away from this mean value?

37
00:02:41,700 --> 00:02:46,710
That's what we're talking about when we say measures of dispersion or measurements of spread, how much

38
00:02:46,710 --> 00:02:48,270
is the data spread out?

39
00:02:48,270 --> 00:02:51,030
How much is the data dispersed from the mean?

40
00:02:51,030 --> 00:02:56,370
And when it comes to those measurements of dispersion or measurements of spread, we look at two things:

41
00:02:56,370 --> 00:03:02,370
variance and standard deviation, both of which are very closely related to one another.

42
00:03:02,370 --> 00:03:08,160
Let's start by looking at this idea of variance, and again, we can calculate the variance for the

43
00:03:08,160 --> 00:03:14,460
population, capital N, or for the sample, lowercase n. And with both variance and standard deviation,

44
00:03:14,460 --> 00:03:19,770
the smaller the value we find, then of course the smaller the variance or the smaller the standard

45
00:03:19,770 --> 00:03:23,880
deviation and the more tightly clustered the data is around the mean.

46
00:03:23,880 --> 00:03:28,710
If we find a larger value for variance or a larger value for standard deviation, which we'll look at

47
00:03:28,710 --> 00:03:33,840
in a second, that means that the data is more spread out, that it is farther away from the mean.

48
00:03:33,840 --> 00:03:39,840
Now, when it comes to calculating variance, if we're talking about population variance, we indicate

49
00:03:39,840 --> 00:03:46,200
that with this Greek letter sigma and when we talk about population variance, we call that value sigma

50
00:03:46,200 --> 00:03:46,830
squared.

51
00:03:46,830 --> 00:03:52,260
In the same way, when we talk about sample variance, we talk about that as this value s squared.

52
00:03:52,260 --> 00:03:56,670
So we use sigma squared for population variance, s squared for sample variance.

53
00:03:56,670 --> 00:04:02,610
And you'll notice that just like the formulas with the mean, these two formulas are essentially identical,

54
00:04:02,610 --> 00:04:09,480
except that here in the population formula we are using the population mean, mu. You'll see that value here,

55
00:04:09,480 --> 00:04:15,960
mu, and with the sample we're using the sample mean value, x bar, that we already calculated up here when

56
00:04:15,960 --> 00:04:17,370
we found sample mean.

57
00:04:17,370 --> 00:04:23,040
And of course we're using capital N in the population formula and lowercase n in the sample formula.

58
00:04:23,130 --> 00:04:30,960
Now these variance formulas look a little complicated, but really all they're calculating is the mean

59
00:04:30,960 --> 00:04:33,300
of the squared deviations.

60
00:04:33,300 --> 00:04:34,890
So what do we mean by that?

61
00:04:34,890 --> 00:04:37,830
Well, let's look at this population variance formula here.

62
00:04:37,830 --> 00:04:45,180
We already know from the formula we looked at for mean that x sub i is just one data point in our population,

63
00:04:45,180 --> 00:04:48,270
one value, one subject in our population.

64
00:04:48,270 --> 00:04:51,210
And we know that mu here is population mean.

65
00:04:51,210 --> 00:04:59,610
So when we take x sub i minus mu, this value right here is giving us the distance between this one data point

66
00:05:00,190 --> 00:05:02,250
and the mean of the data set.

67
00:05:02,260 --> 00:05:08,410
So if we found that the mean of the population was four and this particular data point is ten, we would

68
00:05:08,410 --> 00:05:13,570
take ten minus four and we'd get a value of six here, which just indicates that the particular data

69
00:05:13,570 --> 00:05:21,160
point of ten is six units away, or a distance of six away, from this population mean. We call that the

70
00:05:21,160 --> 00:05:21,910
deviation.

71
00:05:21,910 --> 00:05:26,020
How much does ten deviate from the mean of four?

72
00:05:26,050 --> 00:05:31,180
Well, it deviates by six units or this idea of distance of six.

73
00:05:31,180 --> 00:05:35,290
So this is just the deviation of each particular data point away from the mean.

74
00:05:35,290 --> 00:05:38,710
And of course, once we find that deviation, then we square it.

75
00:05:38,710 --> 00:05:46,060
So this whole value here, the quantity x sub i minus mu, squared, we say that that is the squared deviation.

76
00:05:46,180 --> 00:05:52,510
And then this notation here just tells us to add up all of the squared deviations, and then we're dividing

77
00:05:52,510 --> 00:05:58,840
by the number of subjects in the population, capital N. But adding up all of the squared deviations and

78
00:05:58,840 --> 00:06:01,600
then dividing by the number of squared deviations.

79
00:06:01,600 --> 00:06:03,940
That's just like what we did when we took the mean.

80
00:06:03,940 --> 00:06:09,370
We added up all the data points and then we divided by N. Here we're adding up all the squared deviations

81
00:06:09,370 --> 00:06:10,510
and dividing by N.

82
00:06:10,510 --> 00:06:16,150
So this whole part of the formula here, this sigma notation and the divide by capital N is just taking

83
00:06:16,150 --> 00:06:18,520
the mean of the squared deviations.

84
00:06:18,520 --> 00:06:23,380
So this variance formula is really just calculating the mean of the squared deviations.
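
To make "the mean of the squared deviations" concrete, here is a minimal Python sketch. The function name `population_variance` and the four data values are made up for illustration; the values were chosen so that the mean is four and the data includes the points two and ten used in the narration.

```python
def population_variance(values):
    """Mean of the squared deviations from the population mean."""
    N = len(values)
    mu = sum(values) / N                                   # population mean
    squared_deviations = [(x - mu) ** 2 for x in values]   # (x_i - mu)^2
    return sum(squared_deviations) / N                     # divide by capital N

population = [1, 2, 3, 10]  # hypothetical population whose mean is 4
result = population_variance(population)
print(result)
```

Here the deviations are -3, -2, -1 and 6, the squared deviations are 9, 4, 1 and 36, and their mean is 50 / 4 = 12.5.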

85
00:06:23,380 --> 00:06:26,110
And of course that's the same for sample variance.

86
00:06:26,110 --> 00:06:31,510
And so you can start to get an idea that what we're really doing here is we're kind of trying to find

87
00:06:31,510 --> 00:06:36,190
a mean distance away from the mean itself.

88
00:06:36,190 --> 00:06:43,270
In other words, in simple terms, on average, how far is each data point away from the mean away from

89
00:06:43,270 --> 00:06:45,250
the center of the data set?

90
00:06:45,250 --> 00:06:50,860
Because if we can find an average of how far away each data point is from that center point, then we

91
00:06:50,860 --> 00:06:55,690
can start to get an idea that, oh, the data is really spread out, or, if that variance value is small,

92
00:06:55,690 --> 00:06:57,430
the data is really close together.

93
00:06:57,460 --> 00:07:03,010
The only thing about this concept of variance that tends to throw people off is this squared exponent

94
00:07:03,010 --> 00:07:03,220
here,

95
00:07:03,220 --> 00:07:09,820
the fact that we're squaring the deviations. There are a couple of reasons why we square this deviation

96
00:07:09,820 --> 00:07:10,360
value.

97
00:07:10,360 --> 00:07:15,670
First, we might get a value x sub i that is less than the mean.

98
00:07:15,670 --> 00:07:20,530
So thinking about our example before we said that maybe the mean of the population is four.

99
00:07:20,530 --> 00:07:25,570
Well, if this particular data point in the population is two, we're going to take two minus four,

100
00:07:25,570 --> 00:07:27,070
we're going to get a negative two.

101
00:07:27,070 --> 00:07:31,900
Our deviation is now negative, and things can start to get a little strange

102
00:07:31,900 --> 00:07:37,750
if we're working with negative values and trying to find a mean of both negative and positive values.

103
00:07:37,750 --> 00:07:43,960
So one reason that we square this deviation is to force all of these deviations to be positive.

104
00:07:43,960 --> 00:07:48,340
If we get a deviation of negative two and we square it, it becomes a positive four.

105
00:07:48,340 --> 00:07:53,710
And so then when we start adding up all of these squared deviations, we're going to be adding only

106
00:07:53,710 --> 00:07:54,790
positive values.

107
00:07:54,790 --> 00:07:56,950
It makes this calculation a lot cleaner.

108
00:07:57,070 --> 00:08:02,170
So we square this value to get everything to be positive, which makes our calculation easier.
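
One way to see why the negative deviations are a problem: measured from the mean, the raw deviations always cancel out to zero, so their average says nothing about spread. The sketch below uses a made-up population with mean four; the variable names are illustrative only.

```python
population = [1, 2, 3, 10]                 # hypothetical data with mean 4
mu = sum(population) / len(population)

deviations = [x - mu for x in population]  # a mix of negative and positive values
squared = [d ** 2 for d in deviations]     # every squared value is non-negative

total_raw = sum(deviations)      # negatives and positives cancel
total_squared = sum(squared)     # nothing cancels after squaring

print(total_raw, total_squared)
```

The raw deviations sum to zero, while the squared deviations sum to 50.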

109
00:08:02,170 --> 00:08:09,400
And the other reason we do it is because squaring the deviations magnifies the larger deviations and

110
00:08:09,400 --> 00:08:14,080
forces them to have a bigger effect on the variance than smaller deviations.

111
00:08:14,080 --> 00:08:19,630
So you can imagine here, let's say that we find a deviation, this x sub i minus mu value.

112
00:08:19,630 --> 00:08:22,750
If we find a deviation of ten and we square it,

113
00:08:22,750 --> 00:08:26,890
now the squared deviation for that particular data point is 100.

114
00:08:27,040 --> 00:08:32,350
Whereas if we find a deviation of two and we square it, our squared deviation is four.

115
00:08:32,350 --> 00:08:38,740
So we're talking about the difference in deviations being two versus ten, but the difference in squared

116
00:08:38,740 --> 00:08:41,320
deviations being four versus 100.

117
00:08:41,320 --> 00:08:43,870
Now all of a sudden that difference is much, much larger.

118
00:08:43,870 --> 00:08:48,010
So it's really magnifying those values that are further from the mean.

119
00:08:48,010 --> 00:08:54,220
And that's going to help give us a clear picture of how varied, how dispersed the data really is away

120
00:08:54,220 --> 00:08:55,420
from the mean.
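
The magnifying effect can be checked with the same two deviations from the narration, two and ten. This is only a small arithmetic sketch, and the variable names are made up.

```python
small, large = 2, 10

ratio_raw = large / small                  # how many times bigger before squaring
ratio_squared = large ** 2 / small ** 2    # how many times bigger after squaring

print(ratio_raw, ratio_squared)
```

Before squaring, the larger deviation is 5 times the smaller one. After squaring it is 25 times as large, so points far from the mean carry much more weight in the variance.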

121
00:08:55,420 --> 00:08:58,690
So that's the variance concept.

122
00:08:58,690 --> 00:09:03,100
And then our other measure of dispersion is what's called standard deviation.

123
00:09:03,310 --> 00:09:09,610
And luckily standard deviation is very, very easy to calculate once we calculate variance and once

124
00:09:09,610 --> 00:09:15,910
we understand what variance is, because standard deviation is just the square root of variance.

125
00:09:15,910 --> 00:09:20,680
So we use this Greek letter sigma to indicate standard deviation.

126
00:09:20,680 --> 00:09:26,560
So notice that variance for the population was sigma squared, but standard deviation for the population

127
00:09:26,560 --> 00:09:27,760
is just sigma.

128
00:09:27,760 --> 00:09:30,190
It is the square root of sigma squared.

129
00:09:30,190 --> 00:09:35,920
So to find standard deviation, we just take the square root of sigma squared, which of course is variance.

130
00:09:35,920 --> 00:09:41,260
So we could also replace this sigma squared value underneath the square root with the entire right hand

131
00:09:41,260 --> 00:09:47,050
side of this population variance formula that would give us another formula for standard deviation.

132
00:09:47,050 --> 00:09:48,580
And then same thing over here.

133
00:09:48,580 --> 00:09:54,640
For sample standard deviation, we indicate that with s, which of course is just the square root of

134
00:09:54,640 --> 00:09:58,660
s squared, and we know that s squared is sample variance.

135
00:09:58,660 --> 00:09:59,380
So

136
00:09:59,510 --> 00:10:05,030
once we calculate population variance, to find population standard deviation, we just take the square

137
00:10:05,030 --> 00:10:06,790
root of population variance.

138
00:10:06,800 --> 00:10:13,130
Or once we've calculated sample variance to find sample standard deviation, we just take the square

139
00:10:13,130 --> 00:10:17,330
root of sample variance and that'll give us sample standard deviation.
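
The on-screen formulas are not part of this transcript. Putting the narration into standard notation, and using the n minus one denominator that the worked example later in the video uses for the sample, the two standard deviations would be written roughly as:

```latex
\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}
\qquad\qquad
s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```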

140
00:10:17,450 --> 00:10:22,190
Now some people wonder why we would ever bother using standard deviation.

141
00:10:22,190 --> 00:10:27,920
Why would we bother taking the square root to get a standard deviation value when this standard deviation

142
00:10:27,920 --> 00:10:30,530
value feels so similar to variance?

143
00:10:30,530 --> 00:10:34,850
And we've already calculated variance like what's the point of standard deviation?

144
00:10:34,880 --> 00:10:41,660
Well, there's a reason that we like to find the standard deviation value, and in fact we usually work with standard

145
00:10:41,660 --> 00:10:44,120
deviation instead of variance.

146
00:10:44,120 --> 00:10:50,330
So we'll talk much, much more about mean and standard deviation than we ever will about mean and variance

147
00:10:50,330 --> 00:10:57,020
and going forward we'll usually just use variance as a step on our way to finding standard deviation,

148
00:10:57,020 --> 00:10:58,520
which we're more interested in.

149
00:10:58,790 --> 00:11:04,400
The reason we're more interested in standard deviation is because the units of standard deviation will

150
00:11:04,400 --> 00:11:08,390
match the units of the mean and the units of our data set.

151
00:11:08,420 --> 00:11:15,770
So for instance, if you think about maybe a data set where all the data is given in terms of centimeters,

152
00:11:15,770 --> 00:11:24,160
so let's say that we have some sample of people's heights and that data is given to us in terms of centimeters.

153
00:11:24,170 --> 00:11:30,320
Well, when we go to find variance, the units on this x sub i value are going to be centimeters.

154
00:11:30,320 --> 00:11:35,630
The units on x bar, the mean, are also going to be centimeters, because all we did up here to find

155
00:11:35,630 --> 00:11:42,380
the mean was add up a bunch of centimeter values and then divide by some whole number,

156
00:11:42,380 --> 00:11:46,610
n. And so the units of x bar are going to be in centimeters.

157
00:11:46,610 --> 00:11:49,430
We're going to have a mean in terms of centimeters.

158
00:11:49,430 --> 00:11:56,210
And so then when we calculate variance, the units of x sub i minus x bar are also going to be centimeters.

159
00:11:56,210 --> 00:12:01,490
But then when we square that value, we're going to end up with centimeters squared.

160
00:12:01,490 --> 00:12:07,880
We're going to add up all those centimeter squared values, and we're going to have a centimeter squared

161
00:12:07,880 --> 00:12:10,190
unit value in this numerator.

162
00:12:10,280 --> 00:12:16,940
And then we'll divide by a whole number and we'll get a value for variance, s squared, in terms of centimeters

163
00:12:16,940 --> 00:12:17,630
squared.

164
00:12:17,960 --> 00:12:24,920
So if our units for the mean and for the data itself are in centimeters, then variance by definition

165
00:12:24,920 --> 00:12:26,870
is going to be in centimeter squared.

166
00:12:26,870 --> 00:12:31,430
But of course, as you can imagine, when we take the square root of this, when we take the square

167
00:12:31,430 --> 00:12:37,370
root of variance to get standard deviation, the square root of centimeter squared will just be centimeters

168
00:12:37,370 --> 00:12:37,970
again.

169
00:12:37,970 --> 00:12:42,320
And so the units of standard deviation are going to be back in terms of centimeters.

170
00:12:42,320 --> 00:12:48,890
And therefore the units of standard deviation and mean are always going to match one another, which

171
00:12:48,890 --> 00:12:50,510
is really nice.

172
00:12:50,510 --> 00:12:55,130
It's going to make these two values easier to interpret and compare and match up.

173
00:12:55,130 --> 00:13:01,580
And so that's a big part of the reason and one easy to understand reason for why we like to work with

174
00:13:01,580 --> 00:13:05,780
standard deviation, with the mean instead of variance with the mean.
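
Here is a small sketch of the units argument, using Python's standard `statistics` module. The three heights are hypothetical values picked so the arithmetic is easy to follow; `statistics.variance` and `statistics.stdev` compute the sample versions, which divide by n minus one.

```python
import statistics

heights_cm = [160, 170, 180]  # hypothetical sample of heights, in centimeters

mean_cm = statistics.mean(heights_cm)            # same units as the data: cm
variance_cm2 = statistics.variance(heights_cm)   # squared units: cm^2
stdev_cm = statistics.stdev(heights_cm)          # back to the data's units: cm

print(mean_cm, variance_cm2, stdev_cm)
```

The mean is 170 cm and the standard deviation is 10 cm, so the two can be read on the same scale. The variance of 100 is in centimeters squared, which is harder to compare directly with the heights.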

175
00:13:05,780 --> 00:13:11,990
So all that being said, now that we understand mean, variance, and standard deviation, and how we're calculating

176
00:13:11,990 --> 00:13:18,620
these, let's just do one quick example. To make our calculation easy to do by hand,

177
00:13:18,620 --> 00:13:25,430
let's pick a really super simple data set and say that we have some data set where the values are,

178
00:13:25,460 --> 00:13:29,690
we'll just pick something really easy, one, two, three, four and five.

179
00:13:29,690 --> 00:13:31,760
Let's say that this is sample data.

180
00:13:31,760 --> 00:13:38,990
We've taken a sample of our population and so we can see that we have five subjects in the sample data

181
00:13:38,990 --> 00:13:42,260
and so we can say lowercase n is equal to five.

182
00:13:42,260 --> 00:13:45,620
These are our five sampled data points.

183
00:13:45,620 --> 00:13:52,220
And so if we take the mean, we should be able to see right away that the mean is three because we have

184
00:13:52,220 --> 00:13:59,000
this balancing point for the data here of three, but we could also use this formula to calculate sample

185
00:13:59,000 --> 00:14:01,550
mean by adding up these values.

186
00:14:01,550 --> 00:14:05,180
If we add up one, two, three, four and five, we get 15.

187
00:14:05,180 --> 00:14:09,500
And if we divide by n equals five, we get 15 over five and we find three.

188
00:14:09,500 --> 00:14:14,750
So we can say that x bar, the mean of the sample is equal to three.

189
00:14:14,750 --> 00:14:17,120
So we have our sample mean.

190
00:14:17,120 --> 00:14:23,780
Now we want to talk about how to find the variance of that sample just so we can see this formula in

191
00:14:23,780 --> 00:14:24,440
action.

192
00:14:24,440 --> 00:14:30,620
So again, we're going to start by looking at the deviation of each data point in the sample.

193
00:14:30,620 --> 00:14:34,700
So we're going to take this x sub i value for our first data point.

194
00:14:34,700 --> 00:14:42,350
So that's one. We'll take one minus three, that's x sub i minus x bar, and that's our first deviation

195
00:14:42,350 --> 00:14:43,280
for the sample.

196
00:14:43,280 --> 00:14:48,980
So if we want to calculate here sample variance, we'll say s squared, and we'll do it down here.

197
00:14:48,980 --> 00:14:55,490
And our first deviation we said was one minus three, so one minus three quantity squared and we're

198
00:14:55,490 --> 00:14:58,190
just going to add up all of the squared deviations.

199
00:14:58,190 --> 00:14:59,360
So next we have

200
00:14:59,470 --> 00:15:01,380
two minus three quantity squared.

201
00:15:01,390 --> 00:15:04,130
So two minus three squared.

202
00:15:04,270 --> 00:15:11,440
Then we take our third data point and we get three minus three quantity squared and then our fourth

203
00:15:11,440 --> 00:15:12,340
data point, four minus three quantity squared.

204
00:15:13,460 --> 00:15:18,550
And then our last data point five minus three quantity squared.

205
00:15:18,560 --> 00:15:23,300
So this is the sum of all of the squared deviations.

206
00:15:23,300 --> 00:15:28,700
And then we're going to divide that by n minus one for sample variance.

207
00:15:28,700 --> 00:15:30,870
So we already know lowercase n is five.

208
00:15:30,890 --> 00:15:36,340
There are five data points in the sample, so n minus one is five minus one or four.

209
00:15:36,350 --> 00:15:38,510
So we get four in the denominator.
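
Written out, the setup described so far would look roughly like this. The symbols follow the narration: s squared for sample variance, x bar equals three, and n equals five.

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
    = \frac{(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2}{5 - 1}
```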

210
00:15:38,540 --> 00:15:44,120
Now, before we go forward, let's talk really briefly about this n minus one value, because you may

211
00:15:44,120 --> 00:15:50,120
be tempted to think that this denominator should just be lowercase n in the same way that the denominator

212
00:15:50,120 --> 00:15:55,040
for population variance is capital N and not capital N minus one.

213
00:15:55,040 --> 00:15:58,820
So the question is why is this lowercase n minus one?

214
00:15:59,090 --> 00:16:05,510
Well, the reason is because, of course, if we have our entire population, if we know all the data

215
00:16:05,510 --> 00:16:11,550
from the population, then we can calculate a perfectly accurate population mean, mu.

216
00:16:11,570 --> 00:16:17,300
And we can go through and find a perfectly accurate population variance and population standard deviation.

217
00:16:17,300 --> 00:16:23,330
But when we're taking a sample, of course, remember that taking a sample automatically introduces

218
00:16:23,330 --> 00:16:30,950
some variability into our calculations because of course we might take one sample as we did here and

219
00:16:30,950 --> 00:16:38,540
find a mean X bar equals three, but we might take a different sample of five subjects and find a different

220
00:16:38,540 --> 00:16:38,870
sample

221
00:16:38,870 --> 00:16:40,130
mean for that sample.

222
00:16:40,160 --> 00:16:43,220
Maybe that sample mean we find is x bar equals five.

223
00:16:43,220 --> 00:16:46,430
We could find another sample mean, x bar equals four.

224
00:16:46,430 --> 00:16:53,990
So the mean that we find, the sample mean, depends on the exact sample that we pull from our population.

225
00:16:54,200 --> 00:16:58,640
And the sample mean might change each time depending on the sample we pull.

226
00:16:58,640 --> 00:17:02,000
So we've got some variability here in this sample mean.

227
00:17:02,000 --> 00:17:07,609
So when we jump down here then to calculating sample variance, what we're doing here when we find these

228
00:17:07,609 --> 00:17:13,609
deviations is we're taking all of the values in our sample like up here, one, two, three, four and

229
00:17:13,609 --> 00:17:20,329
five, for x sub i, and we're finding the distance between those values and this sample mean, x bar

230
00:17:20,329 --> 00:17:21,530
equals three.

231
00:17:21,530 --> 00:17:26,510
But of course by the nature of the sample mean we calculated, x bar,

232
00:17:26,510 --> 00:17:35,240
we have already in a way minimized the distance, minimized the deviation of all of these sample values

233
00:17:35,240 --> 00:17:42,050
from x bar because of course this x bar value we calculated this sample mean that we calculated is by

234
00:17:42,050 --> 00:17:50,150
definition the single value we can find that is as close as possible to all of these values from our

235
00:17:50,150 --> 00:17:50,900
sample.

236
00:17:50,900 --> 00:17:54,470
It is the balancing point of all the values in our sample.

237
00:17:54,470 --> 00:17:59,210
And so we're already in a way minimizing these deviations.

238
00:17:59,210 --> 00:18:04,640
So it's kind of this idea that we are going to potentially underestimate deviation.

239
00:18:04,640 --> 00:18:11,540
And so the reason that we divide by n minus one is to correct at least a little bit for that bias.

240
00:18:11,540 --> 00:18:19,040
In other words, dividing by N minus one helps us undo a little bit of the bias that we introduced when

241
00:18:19,040 --> 00:18:25,430
we minimized the deviation in our sample by calculating this x bar value and then finding the deviation

242
00:18:25,430 --> 00:18:30,470
of each of these data points in the sample from that sample mean that we calculated.

243
00:18:30,470 --> 00:18:36,440
So that's why when we find population variance, we divide just by population capital N. But when we

244
00:18:36,440 --> 00:18:42,440
find sample variance because of the nature of sampling and the bias and variability that we introduce,

245
00:18:42,440 --> 00:18:45,830
we divide instead by lowercase n minus one.

246
00:18:45,830 --> 00:18:50,480
And so in this case, this example we're working with, there were five subjects in the sample, so

247
00:18:50,480 --> 00:18:53,150
we divide by five minus one or four.
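
The bias being described can be checked on a tiny case. The sketch below is not from the video and rests on several assumptions: it treats one, two, three, four and five as an entire population, lists every possible ordered sample of size two drawn with replacement, and averages the two candidate formulas over all of those samples. The helper `variance` and its `ddof` parameter are made-up names for this illustration.

```python
from itertools import product

def variance(values, ddof):
    """Sum of squared deviations from the mean, divided by len(values) - ddof."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - ddof)

population = [1, 2, 3, 4, 5]                  # treated here as the whole population
true_variance = variance(population, ddof=0)  # population variance, divide by N

# Every possible ordered sample of size 2, drawn with replacement (25 samples).
samples = list(product(population, repeat=2))

# Average each candidate formula over all possible samples.
avg_divide_by_n = sum(variance(s, ddof=0) for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(variance(s, ddof=1) for s in samples) / len(samples)

print(true_variance, avg_divide_by_n, avg_divide_by_n_minus_1)
```

This prints `2.0 1.0 2.0`. In this setup, dividing by n underestimates the population variance on average, while dividing by n minus one matches it on average.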

248
00:18:53,150 --> 00:18:56,090
So then all we have to do is calculate.

249
00:18:56,090 --> 00:19:00,710
Here we get one minus three is a negative two, and when we square that we get four.

250
00:19:01,070 --> 00:19:02,660
Two minus three is a negative one.

251
00:19:02,660 --> 00:19:04,790
When we square it, we get one.

252
00:19:04,910 --> 00:19:07,190
Here, we have zero. Here we have four

253
00:19:07,190 --> 00:19:13,280
minus three is one, squared is positive one, and five minus three is two, quantity squared is positive

254
00:19:13,280 --> 00:19:13,820
four.

255
00:19:13,940 --> 00:19:17,480
So we get ten and then we divide by four.

256
00:19:17,570 --> 00:19:22,790
And of course the value here then is ten over four or five over two.

257
00:19:22,790 --> 00:19:26,540
Or we could also write that as 2.5.

258
00:19:26,540 --> 00:19:32,600
So our sample variance then if this is our sample, our sample variance is 2.5.

259
00:19:32,600 --> 00:19:38,510
And then of course if we wanted to find standard deviation, all we have to do is take the square root

260
00:19:38,540 --> 00:19:39,680
of that value.

261
00:19:39,680 --> 00:19:48,890
And so we would say then that if we put up here sample variance is five halves, then sample standard

262
00:19:48,890 --> 00:19:56,990
deviation is the square root of five halves, and that value is approximately

263
00:19:56,990 --> 00:19:58,040
equal to.

264
00:19:58,650 --> 00:19:59,870
1.58.

265
00:19:59,880 --> 00:20:03,570
And this is, of course, like we said, 2.5.
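
The hand calculation can be double-checked with a few lines of Python. This is a minimal sketch that follows the same steps as the narration (mean, squared deviations, divide by n minus one, square root); the variable names are made up for the example.

```python
import math

sample = [1, 2, 3, 4, 5]
n = len(sample)

x_bar = sum(sample) / n                                   # sample mean
squared_deviations = [(x - x_bar) ** 2 for x in sample]   # 4, 1, 0, 1, 4
sample_variance = sum(squared_deviations) / (n - 1)       # 10 / 4
sample_std = math.sqrt(sample_variance)                   # square root of variance

print(x_bar, sample_variance, round(sample_std, 2))
```

This prints `3.0 2.5 1.58`, matching the values found by hand.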

266
00:20:03,600 --> 00:20:09,660
So hopefully that gives you an idea of what we're doing when we calculate variance or standard deviation

267
00:20:09,660 --> 00:20:15,720
for a population or a sample, both of which, variance and standard deviation, are measurements

268
00:20:15,720 --> 00:20:22,110
of dispersion or measurements of spread that give us an idea of how much the data is spread out around

269
00:20:22,110 --> 00:20:28,020
the mean, whether the data is spread out far from the mean, with a wide range of the data away

270
00:20:28,020 --> 00:20:33,660
from the mean or whether that data is super tightly clustered around the mean.

