1
00:00:05,470 --> 00:00:11,110
Welcome everyone to this section of the course discussing joint distributions, more specifically,

2
00:00:11,110 --> 00:00:12,820
covariance and correlation.

3
00:00:13,740 --> 00:00:18,900
When studying mathematics for data science, we're often interested in the relationships between two

4
00:00:18,900 --> 00:00:21,030
data dimensions or data features.

5
00:00:21,150 --> 00:00:26,970
Data-driven organizations often need to understand interactions between two data features in order to

6
00:00:26,970 --> 00:00:29,550
make decisions about the features that are linked.

7
00:00:30,810 --> 00:00:37,080
Joint distributions are going to allow us to mathematically quantify the relationship between two distributions

8
00:00:37,080 --> 00:00:38,010
of data.

9
00:00:38,100 --> 00:00:44,190
So here we can see a visualization of the relationship between the total bill that somebody pays after

10
00:00:44,190 --> 00:00:46,280
a meal and the tip that they leave.

11
00:00:46,290 --> 00:00:49,010
And we can see that they seem to be linked somehow.

12
00:00:49,020 --> 00:00:53,190
So there's an intuition already that as the total bill increases, so does the tip.

13
00:00:54,430 --> 00:00:59,560
So this section of the course is going to focus on two specific properties of joint distributions,

14
00:00:59,560 --> 00:01:01,600
covariance and correlation.

15
00:01:01,780 --> 00:01:06,680
In general, these terms measure how much two data features move together.

16
00:01:06,700 --> 00:01:13,150
For example, the total bill moving along with the tip, and likewise the tip moving along with the total bill.

17
00:01:14,660 --> 00:01:19,520
So it's a common task to try to understand correlations between two data variables.

18
00:01:19,550 --> 00:01:24,860
For example, most businesses are going to keep track of how correlated price changes are with overall

19
00:01:24,860 --> 00:01:25,850
sales revenue.

20
00:01:27,460 --> 00:01:32,680
When thinking about covariance and correlation, we're typically discussing it in terms of two data

21
00:01:32,680 --> 00:01:33,470
features.

22
00:01:33,490 --> 00:01:37,990
Now, I want to point something out about the notation, especially if we're thinking about something

23
00:01:37,990 --> 00:01:42,880
like machine learning. When we're talking about covariance and correlation and the formulas behind them,

24
00:01:42,880 --> 00:01:47,170
we commonly notate the two data features as X and Y,

25
00:01:47,200 --> 00:01:53,410
since if one were to plot these features, you could plot one on the x axis and the other on the y axis.

26
00:01:54,020 --> 00:01:59,720
So, for example, if we look back at our TIPS data set here, we can see Total Bill is a feature and

27
00:01:59,720 --> 00:02:01,940
we're plotting it along the x axis.

28
00:02:01,940 --> 00:02:05,860
And Tip is also a feature that we can plot along the Y axis.

29
00:02:05,870 --> 00:02:11,090
So as we continue to actually show you the formulas for covariance and correlation, we can think of

30
00:02:11,090 --> 00:02:13,850
total bill as X and tip as Y.

31
00:02:15,690 --> 00:02:20,430
Now, before we do a deep dive into the details of covariance and correlation in this lecture, we want

32
00:02:20,430 --> 00:02:23,040
to do a high level overview of these concepts.

33
00:02:23,040 --> 00:02:27,960
We'll also understand the motivation behind understanding these concepts in the context of using data

34
00:02:27,960 --> 00:02:29,310
science for business.

35
00:02:30,480 --> 00:02:32,400
So let's begin with covariance.

36
00:02:32,880 --> 00:02:35,610
The formula for covariance is the following.

37
00:02:35,610 --> 00:02:44,820
The covariance between two features X and Y is equal to one over n times the sum of x_i minus the

38
00:02:44,820 --> 00:02:49,700
average value of x, multiplied by y_i minus the average value of y.

39
00:02:49,710 --> 00:02:54,180
And we're going to break this down in just a second, but I want you to start building out an intuition

40
00:02:54,180 --> 00:02:55,230
for covariance.
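
Written out, the population covariance formula read aloud above can be sketched as follows. The symbols \bar{x} and \bar{y} for the average values of x and y are standard notation assumed here, since the slide itself is not part of the transcript.

```latex
\operatorname{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```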

41
00:02:55,230 --> 00:03:01,200
And keep in mind covariance can actually go from negative infinity all the way to positive infinity.

42
00:03:01,200 --> 00:03:07,350
And you can see a visualization of the general relationship for something that has negative covariance.

43
00:03:07,350 --> 00:03:12,240
So you can see it's kind of a negative slope there. For covariance hovering around zero, there really

44
00:03:12,240 --> 00:03:15,540
is no clear linear relationship. And then for covariance that is greater than zero,

45
00:03:15,540 --> 00:03:21,240
you can see a positive slope relationship. And you can kind of tell from the formula here that it feels

46
00:03:21,240 --> 00:03:23,730
similar to calculating the slope of a line.

47
00:03:23,730 --> 00:03:25,500
But let's break it down a little further.

48
00:03:26,530 --> 00:03:28,270
Before we continue, a quick note.

49
00:03:28,270 --> 00:03:33,850
If you were to Google the definition or formula for covariance, you are going to sometimes see this

50
00:03:33,850 --> 00:03:40,330
as n minus one, and other times you may see it as just n, like we're showing here. To be a little more

51
00:03:40,330 --> 00:03:47,020
specific, n minus one is for sample covariance, while n is for population covariance.

52
00:03:47,020 --> 00:03:49,600
But we'll discuss that in more detail later on.

53
00:03:49,600 --> 00:03:53,290
But right now we'll just show you the formula for population covariance.
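
For comparison, the sample version mentioned here differs only in the divisor. This is the standard textbook form, written with the assumed notation s_{xy}, where \bar{x} and \bar{y} are the averages of x and y; it is not shown in the transcript.

```latex
s_{xy} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```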

54
00:03:54,840 --> 00:03:56,880
So how do we break this down?

55
00:03:56,910 --> 00:04:00,750
Well, we're looking at the covariance of data feature X with data feature Y.

56
00:04:00,780 --> 00:04:05,050
For example, I could say, what's the covariance between the total bill and the tip?

57
00:04:06,350 --> 00:04:11,890
Then you'll notice we're dividing by n, where n is the actual number of data points.

58
00:04:11,900 --> 00:04:17,810
You should immediately notice how this implies that X and Y need to have matching length, essentially the

59
00:04:17,810 --> 00:04:20,899
same number of data points, and that makes sense.

60
00:04:20,899 --> 00:04:26,840
You can't really calculate the covariance between two data features that don't have matching data points.

61
00:04:26,840 --> 00:04:32,240
So for example, if you went to one restaurant and had a couple of tips and then you went to another

62
00:04:32,240 --> 00:04:37,400
restaurant that had a ton of instances of total bill, you can't calculate the covariance between those

63
00:04:37,400 --> 00:04:40,130
because they don't actually share the same length.

64
00:04:42,460 --> 00:04:47,290
And you should also notice that this feels a lot like the same calculation for an average.

65
00:04:47,290 --> 00:04:52,510
So I want you to keep in mind that you're calculating some sort of average value because you're taking

66
00:04:52,510 --> 00:04:55,390
a sum and then dividing it by the number of points.

67
00:04:57,060 --> 00:04:59,430
And remember, this is the sum notation.

68
00:04:59,430 --> 00:05:05,310
So what this is doing is it states to take the sum of the calculation on the right hand side for every

69
00:05:05,310 --> 00:05:06,900
point in X and Y.

70
00:05:06,900 --> 00:05:12,390
So you're going to start with x_1 and y_1, basically the first row of data, then x_2 and y_2,

71
00:05:12,390 --> 00:05:16,410
the second row of data, and you're going to go on for every single row of data.

72
00:05:16,410 --> 00:05:21,120
So that would be every instance, for example, of total bill and tip, all the way until you reach x_n

73
00:05:21,120 --> 00:05:22,320
and y_n.

74
00:05:22,350 --> 00:05:24,600
Essentially that goes all the way to n.

75
00:05:24,630 --> 00:05:30,300
Here, i is just a placeholder for one, two, three, four and so on, all the way to n rows or n data points.
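
The row-by-row sum described above can be sketched in plain Python. This is a minimal illustration, not the course's own code: the function name `population_covariance` and the four data rows are made up for the example.

```python
def population_covariance(xs, ys):
    """Population covariance: the average of (x_i - mean_x) * (y_i - mean_y)."""
    # The formula pairs values row by row, so both features need the same length.
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of data points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    total = 0.0
    # i runs over every row: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)
    for i in range(n):
        total += (xs[i] - mean_x) * (ys[i] - mean_y)
    return total / n


# Made-up values for illustration only (not the real tips data set).
total_bill = [10.0, 20.0, 30.0, 40.0]
tip = [1.0, 3.0, 4.0, 6.0]

cov = population_covariance(total_bill, tip)
print(cov)  # 20.0
```

The length check mirrors the point made earlier: two features with different numbers of rows cannot be paired up, so the sum is not defined for them.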

76
00:05:31,020 --> 00:05:34,230
Now let's take a look at what's happening inside this sum.

77
00:05:34,230 --> 00:05:36,030
It's an interesting calculation.

78
00:05:36,030 --> 00:05:42,510
Notice it's the difference between each individual point and the mean value for the data series.

79
00:05:42,510 --> 00:05:46,200
Then you're going to multiply that calculation between X and Y together.

80
00:05:47,570 --> 00:05:53,600
This entire formula should feel familiar if you've already seen the formula for just straight variance.

81
00:05:54,990 --> 00:05:59,360
Notice how it's actually variance, but with another data feature.

82
00:05:59,370 --> 00:06:05,280
So if you think back to the formula for variance, it was extremely similar except you squared it.

83
00:06:05,310 --> 00:06:09,400
In this case, you're essentially doing variance between two features.

84
00:06:09,420 --> 00:06:11,730
Thus the name covariance.
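
The connection can be seen by putting X in place of Y in the covariance formula: the product becomes a square and the result is the variance. The notation below is the conventional one and is an assumption, with \bar{x} standing for the average of x.

```latex
\operatorname{Var}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
= \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})
= \operatorname{Cov}(X, X)
```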

85
00:06:15,410 --> 00:06:21,380
So now that we've briefly explored the formula, what information does Covariance actually report back

86
00:06:21,380 --> 00:06:21,960
to us?

87
00:06:21,980 --> 00:06:27,050
So if we take the total bills and tips, take those values, plug them into the formula, how do we

88
00:06:27,050 --> 00:06:29,090
actually interpret this final value?

89
00:06:30,340 --> 00:06:36,720
Recall that variance reports back the average squared difference of the values from the mean.

90
00:06:36,730 --> 00:06:42,850
So thinking back to just variance, if we were to visualize this and calculate it, we're just talking

91
00:06:42,850 --> 00:06:47,260
about what's the average squared difference of the values from the mean.

92
00:06:48,040 --> 00:06:51,220
So visualizing this I think helps out a lot.

93
00:06:51,310 --> 00:06:57,910
Imagine that we have a bunch of values and all of them fall between the values one and

94
00:06:57,910 --> 00:06:58,450
nine.

95
00:06:58,480 --> 00:07:03,970
Notice we have no values at one, one value at two, three values at four, two values at five,

96
00:07:03,970 --> 00:07:06,610
one value at seven and one value at nine.

97
00:07:06,610 --> 00:07:12,250
So again, just talking about variance for right now, we'll expand this concept into covariance in

98
00:07:12,250 --> 00:07:15,670
just a second, but I want you to visualize variance first.

99
00:07:15,670 --> 00:07:18,850
Then it's going to become super easy to visualize covariance.

100
00:07:18,850 --> 00:07:22,960
So thinking back on variance, what's the formula actually asking us to do?

101
00:07:23,080 --> 00:07:26,200
Well, first off, I've got to calculate the average value.

102
00:07:26,800 --> 00:07:31,930
So I go ahead and calculate the average value here and notice it's hovering around five.

103
00:07:33,110 --> 00:07:39,980
Then what I'm going to do is for every single point, I'm going to take that point, subtract the average

104
00:07:39,980 --> 00:07:41,630
value, and then square that.

105
00:07:41,630 --> 00:07:47,090
And if you start thinking about this visually, taking the square of this is actually the same thing

106
00:07:47,090 --> 00:07:51,600
as taking the area of a square from that point to the average value.

107
00:07:51,620 --> 00:07:56,150
So here what we've done is we've shown you those actual squares visually.

108
00:07:56,240 --> 00:08:02,360
Notice how we're essentially constructing squares from each point all the way to the average value.

109
00:08:02,360 --> 00:08:04,160
And again, this is for variance.

110
00:08:05,340 --> 00:08:10,440
And then what you're going to end up having here are all those squares, and then you can end up summing

111
00:08:10,440 --> 00:08:12,810
them up and then taking the average value.

112
00:08:12,810 --> 00:08:18,690
And that's essentially showing you, Hey, for all these points, what's the average value of your distance

113
00:08:18,690 --> 00:08:22,650
from the mean after you perform this squaring operation?

114
00:08:23,370 --> 00:08:25,080
So that's variance.

115
00:08:25,080 --> 00:08:30,540
And hopefully this visual, if you take the time, gives you a little bit of intuition behind what the

116
00:08:30,540 --> 00:08:32,010
calculation is doing.

117
00:08:32,010 --> 00:08:36,750
From a visual standpoint, again, you're taking the points, calculating the mean.

118
00:08:36,750 --> 00:08:42,090
And then if you look inside the formula, what it's actually asking you to do is take each individual

119
00:08:42,090 --> 00:08:46,170
point, go to the mean, and then take that difference and square it.

120
00:08:46,200 --> 00:08:50,880
Remember, because you're squaring it, that's always going to be a positive value and then you're going

121
00:08:50,880 --> 00:08:52,230
to take the average there.
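
As a rough check of the squares picture, here is a small Python sketch. The values are my best reading of the narration (one 2, three 4s, two 5s, one 7, one 9), so treat the list itself as an assumption rather than the exact data on the slide.

```python
# Values as best read from the narration: one 2, three 4s, two 5s, one 7, one 9.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean = sum(values) / n

# Each term is the area of a square whose side runs from the point to the mean.
square_areas = [(v - mean) ** 2 for v in values]

# Population variance is the average of those areas.
variance = sum(square_areas) / n

print(mean)          # 5.0
print(square_areas)  # [9.0, 1.0, 1.0, 1.0, 0.0, 0.0, 4.0, 16.0]
print(variance)      # 4.0
```

Every entry in `square_areas` is zero or positive, which is why variance can never be negative.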

122
00:08:54,160 --> 00:09:00,100
So the same concept of area can actually be applied to covariance. It's just going to be in two dimensions,

123
00:09:00,100 --> 00:09:03,970
X and Y instead of a singular dimension, just X.

124
00:09:05,130 --> 00:09:09,780
So let's take a look again at covariance and try to understand what's happening here.

125
00:09:09,940 --> 00:09:14,880
Remember, this time it's going to be pretty much the same calculation and you're building out now a

126
00:09:14,880 --> 00:09:19,050
rectangle instead of a perfect square and you're just doing it along two dimensions.

127
00:09:20,150 --> 00:09:22,710
So you're going to start off again with X and Y.

128
00:09:22,730 --> 00:09:27,430
So we can think of X feature as total bill and Y feature as tip.

129
00:09:27,440 --> 00:09:30,830
And then what we need to do is start off by calculating the mean.

130
00:09:30,830 --> 00:09:35,510
But instead of just a single mean like we did on that single line for X, now we're going to have two

131
00:09:35,510 --> 00:09:37,430
means one for X and one for Y.

132
00:09:37,850 --> 00:09:39,530
So I'll go ahead and plot that out.

133
00:09:39,530 --> 00:09:41,720
And conveniently here it falls in the middle.

134
00:09:41,750 --> 00:09:45,170
But obviously, depending on your actual data set, it may fall somewhere else.

135
00:09:45,170 --> 00:09:50,930
But here I'm just showing you this blue dot is going to represent the value of x mean lined up with

136
00:09:50,930 --> 00:09:52,460
a value at y mean.

137
00:09:54,040 --> 00:09:56,830
Then you're going to go through each of your data points.

138
00:09:56,830 --> 00:10:02,260
So let's go through a single data point, x_i and y_i, and I'll go ahead and plot that out.

139
00:10:03,360 --> 00:10:06,390
Now take a look at the actual equation.

140
00:10:06,870 --> 00:10:12,930
This equation is just the area of a square in this particular case, and technically, it could be a

141
00:10:12,930 --> 00:10:13,990
rectangle, too.

142
00:10:14,010 --> 00:10:18,450
So before we had a perfect square, but this time we have a rectangle.

143
00:10:18,480 --> 00:10:24,740
It's just going to be x_i minus x mean, multiplied by y_i minus y mean.

144
00:10:24,750 --> 00:10:28,230
And again, that's just one side times the other.

145
00:10:28,230 --> 00:10:32,340
So that's again just calculating a rectangular area.

146
00:10:32,490 --> 00:10:34,410
This time it's in two dimensions.

147
00:10:34,410 --> 00:10:40,500
So it's not going to be a perfect square, because last time it was x_i minus x average, squared.

148
00:10:40,500 --> 00:10:43,530
And in this case we're just doing it for two dimensions.

149
00:10:43,530 --> 00:10:47,160
But the same sort of visualization and intuition applies.

150
00:10:48,450 --> 00:10:53,490
And then next, you simply look at the slope of the line between the point and the mean to see whether this is

151
00:10:53,490 --> 00:10:55,420
going to be negative versus positive.

152
00:10:55,440 --> 00:11:01,470
And therein lies one of the main differences between variance for a single feature versus covariance

153
00:11:01,470 --> 00:11:02,860
between two features.

154
00:11:02,880 --> 00:11:06,960
Notice this time the value could actually be negative.

155
00:11:06,990 --> 00:11:12,840
Last time, with the square operation, it was always going to be positive. This time the sign is going to

156
00:11:12,840 --> 00:11:16,230
match the sign of the slope between the two points.

157
00:11:16,230 --> 00:11:21,510
So notice how this is a negative slope, meaning if you were to actually perform this mathematical operation,

158
00:11:21,510 --> 00:11:23,810
you're going to have a negative value.

159
00:11:23,820 --> 00:11:28,050
So we can kind of treat this area as a negative area for covariance.

160
00:11:29,290 --> 00:11:34,290
So if it's a negative slope, like in this example, we can treat this total area as negative covariance.

161
00:11:34,300 --> 00:11:39,580
And keep in mind, eventually we're going to do this for a bunch of rectangular or square values and

162
00:11:39,580 --> 00:11:40,930
then you can add them up.

163
00:11:41,820 --> 00:11:48,270
So, for example, at this point we can see that the slope between x_i, y_i and the actual values of the

164
00:11:48,270 --> 00:11:51,530
mean X mean and Y mean, that's going to be positive.

165
00:11:51,540 --> 00:11:57,690
So I'm going to fill in this rectangle as having positive covariance and notice the slope basically

166
00:11:57,690 --> 00:11:58,980
falls in line with that.

167
00:12:00,340 --> 00:12:05,830
And now what you're going to end up doing is you keep doing this for all the points x_i and y_i until you

168
00:12:05,830 --> 00:12:07,170
get to the last row.

169
00:12:07,180 --> 00:12:12,730
And so you just keep drawing these rectangles in the same way you drew the squares for variance earlier.

170
00:12:14,230 --> 00:12:18,460
And again, note how the formula indicates we're doing this for all X and Y points.

171
00:12:19,420 --> 00:12:23,080
And then what you're going to end up doing is you're going to have a set of series.

172
00:12:23,080 --> 00:12:29,620
So in this particular example, I'm showing you a series set where they're doing all those rectangles.

173
00:12:29,620 --> 00:12:34,910
And in this case, a lot of them actually have a positive slope or a positive covariance.

174
00:12:34,930 --> 00:12:39,970
Notice if you were just to look at the black points and kind of ignore those green rectangles.

175
00:12:39,970 --> 00:12:45,100
For right now, the general trend looks to be like a positive slope.

176
00:12:45,100 --> 00:12:50,560
So in this case you have a positive covariance and we can see for this particular example, the covariance

177
00:12:50,560 --> 00:12:52,480
is 8.04.

178
00:12:52,540 --> 00:12:57,220
There's also a correlation value, but we'll talk more about correlation later on.

179
00:12:57,220 --> 00:13:02,980
But I want you to get this idea that when you see a bunch of points here with kind of this general positive

180
00:13:02,980 --> 00:13:08,920
trend or positive slope, if you were to do this calculation of drawing those rectangles, which is

181
00:13:08,920 --> 00:13:14,170
literally what the formula is asking you to do, you end up just summing those up and then taking the

182
00:13:14,170 --> 00:13:19,480
average of the area of those rectangles, and that would supply you with the covariance value.

183
00:13:19,480 --> 00:13:24,910
So hopefully this intuition and visualization really helps you understand what's going on behind the

184
00:13:24,910 --> 00:13:25,570
scenes.

185
00:13:26,210 --> 00:13:29,780
Let's take a look at a series with a negative covariance.

186
00:13:29,780 --> 00:13:34,940
You can still see there are some points that have that positive slope, but the majority of them look

187
00:13:34,940 --> 00:13:36,560
to have a negative slope.

188
00:13:36,560 --> 00:13:42,170
So again, when you sum these all up and take the average, you end up getting a negative covariance.

189
00:13:42,170 --> 00:13:46,460
And in this case the negative covariance is -0.119.

190
00:13:46,460 --> 00:13:48,890
And again, we'll talk more about correlation later on.

191
00:13:48,890 --> 00:13:50,750
You'll notice that the correlation is also negative.

192
00:13:52,370 --> 00:13:54,830
And then for points where they're a little more flat,

193
00:13:54,860 --> 00:13:58,830
you can see here we have two data series with almost zero covariance.

194
00:13:58,850 --> 00:14:03,860
So essentially what's happening here is your positive rectangle area is going to start balancing out

195
00:14:03,860 --> 00:14:08,750
with the negative rectangle area and that ends up summing to close to zero.
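
The balancing of positive and negative rectangles can be made concrete with a tiny sketch. The four points and the helper name `signed_areas` are invented for illustration; they are chosen so the cancellation is exact, whereas real data would usually only come close to zero.

```python
def signed_areas(xs, ys):
    """Return the signed rectangle area (x_i - mean_x) * (y_i - mean_y) for each point."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]


# Made-up points at the four corners of a square around the mean (2, 2).
xs = [1.0, 1.0, 3.0, 3.0]
ys = [1.0, 3.0, 1.0, 3.0]

areas = signed_areas(xs, ys)
positive_area = sum(a for a in areas if a > 0)
negative_area = sum(a for a in areas if a < 0)
covariance = sum(areas) / len(areas)

print(areas)          # [1.0, -1.0, -1.0, 1.0]
print(positive_area)  # 2.0
print(negative_area)  # -2.0
print(covariance)     # 0.0
```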

196
00:14:11,200 --> 00:14:15,010
So again, we're going to take a deeper dive into covariance later on in this section.

197
00:14:15,010 --> 00:14:20,080
But I want you to keep in mind that visualization of those positive and negative rectangular areas to

198
00:14:20,080 --> 00:14:23,530
give yourself an intuition of how covariance is actually calculated.

199
00:14:23,530 --> 00:14:29,380
And I would highly encourage you to literally just grab a pen and a piece of paper and diagram this out.

200
00:14:29,380 --> 00:14:35,110
And you'll notice the equation here is essentially the equation of the averages of the areas of rectangles.

201
00:14:35,110 --> 00:14:41,470
So even though this formula at first looks really intimidating, it's basically kind of elementary school

202
00:14:41,470 --> 00:14:42,490
arithmetic here.

203
00:14:42,520 --> 00:14:46,060
It's just calculating the average areas of all these rectangles.

204
00:14:46,060 --> 00:14:50,500
But it's being clever in the fact that it's using things like the average x value and the average Y

205
00:14:50,500 --> 00:14:51,130
value.

206
00:14:53,200 --> 00:15:00,760
So in review what covariance is, it's simply the sum of the area of these rectangles that you then

207
00:15:00,760 --> 00:15:06,550
average out where the slope between the lines is going to indicate whether it's a positive area value

208
00:15:06,550 --> 00:15:08,440
or a negative area value.

209
00:15:08,440 --> 00:15:12,400
And then just again, simply average out the sum of those areas.

210
00:15:13,790 --> 00:15:19,370
I would highly encourage you to try it out yourself on some simple examples of positive and negative

211
00:15:19,370 --> 00:15:20,780
covariance series.

212
00:15:20,780 --> 00:15:26,990
So just see if you can create your own little data set with two or three rows and then go by hand,

213
00:15:26,990 --> 00:15:33,380
maybe plot them out just generally by hand and then see how that actually ends up reflecting the formula

214
00:15:33,380 --> 00:15:34,460
of covariance.

215
00:15:34,460 --> 00:15:39,890
So the formula for covariance can seem very intimidating at first, but it's literally just drawing

216
00:15:39,890 --> 00:15:44,240
rectangles and then taking the average value of the area of those rectangles.

217
00:15:45,350 --> 00:15:51,320
So now that we understand the intuition behind covariance, let's explore the related topic of correlation.

218
00:15:52,320 --> 00:15:53,970
So what is correlation?

219
00:15:54,390 --> 00:16:00,060
Well, correlation is actually very related to covariance and even uses the formula for covariance or

220
00:16:00,060 --> 00:16:02,790
the resulting covariance in its calculation.

221
00:16:02,790 --> 00:16:09,000
And as we learn about correlation, note how its possible range differs from covariance.

222
00:16:09,000 --> 00:16:16,830
So remember that covariance could go from negative infinity all the way to positive infinity while correlation

223
00:16:16,830 --> 00:16:19,410
has to be between negative one and one.

224
00:16:19,410 --> 00:16:24,840
So essentially any data points that you feed into a correlation calculation are always going to spit

225
00:16:24,840 --> 00:16:27,360
out a value that's between negative one and one.

226
00:16:28,880 --> 00:16:31,160
So here is the correlation formula.

227
00:16:31,580 --> 00:16:32,500
Pretty simple, right?

228
00:16:32,510 --> 00:16:34,280
It's just using covariance.

229
00:16:34,820 --> 00:16:41,330
So it's simply the calculation for covariance divided by the product of the standard deviations of the

230
00:16:41,330 --> 00:16:42,290
two series.

231
00:16:42,290 --> 00:16:47,720
So on the bottom here, you have the standard deviation of the x data series multiplied by the standard

232
00:16:47,720 --> 00:16:52,700
deviation of the Y series, and you're just taking that value and then dividing the covariance value

233
00:16:52,700 --> 00:16:53,300
by it.

234
00:16:53,570 --> 00:16:55,490
So why would you do this?

235
00:16:55,940 --> 00:17:01,760
Well, let's expand this formula by showing the calculations for standard deviations and covariance.

236
00:17:02,630 --> 00:17:07,760
So if I were to plug those back in, plugging in the formula for covariance and the formulas for the

237
00:17:07,760 --> 00:17:12,890
standard deviations, I get something that looks like this: the full formula for calculating the correlation.
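
The expansion described here would look roughly like the following, using population formulas throughout. The symbols \rho, \sigma_X, \sigma_Y, \bar{x} and \bar{y} are conventional notation assumed for this sketch, since the slide is not reproduced in the transcript.

```latex
\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
= \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
       {\sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}\;
        \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Note that the 1/n factors cancel between the numerator and the denominator, so the result does not depend on whether n or n minus one is used, as long as the same choice is made everywhere.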

238
00:17:14,079 --> 00:17:20,079
Recall for covariance that X and Y are separate data series, meaning they could have very different

239
00:17:20,079 --> 00:17:22,440
ranges of min and max values.

240
00:17:22,450 --> 00:17:29,140
For example, the total bill could go from maybe $0 all the way to thousands of dollars, but the tip

241
00:17:29,140 --> 00:17:33,220
itself maybe only goes from $0 to maybe hundreds of dollars.

242
00:17:33,220 --> 00:17:38,800
So that's already an issue, where we're tracking the covariance between two series that could have a relationship,

243
00:17:38,800 --> 00:17:41,140
but their values are super different.

244
00:17:41,140 --> 00:17:47,020
That's going to mean it's going to be hard to actually interpret the raw covariance number because again,

245
00:17:47,020 --> 00:17:50,200
it can go from negative infinity all the way to positive infinity.

246
00:17:51,110 --> 00:17:58,010
So instead it would be great to standardize the covariance so that it always has values between negative

247
00:17:58,010 --> 00:17:59,300
one and one.

248
00:17:59,300 --> 00:18:04,970
And I mentioned at the very start that businesses tend to just report correlation instead of covariance.

249
00:18:04,970 --> 00:18:09,830
And the reason is that you always know it's going to be between negative one and one and it's

250
00:18:09,830 --> 00:18:11,450
essentially standardized that way.

251
00:18:11,450 --> 00:18:17,510
So it makes it way, way easier to interpret without needing to understand the ranges of the original

252
00:18:17,510 --> 00:18:18,950
data feature values.

253
00:18:20,880 --> 00:18:25,620
This can cause issues in trying to determine the significance of covariance values, since it can take

254
00:18:25,620 --> 00:18:29,130
any value, as I just mentioned, which is why we use correlation.

255
00:18:30,400 --> 00:18:34,360
So the formula for correlation actually standardizes this for us.

256
00:18:34,480 --> 00:18:39,460
It uses the denominator to standardize the covariance between negative one and one.

257
00:18:39,550 --> 00:18:45,400
So this is all to say if you're following this formula, you'll realize by using the mathematical trick

258
00:18:45,400 --> 00:18:51,280
of dividing covariance by the product of the standard deviations of series X and series

259
00:18:51,280 --> 00:18:55,750
Y, you end up standardizing covariance to always fall between negative one and one.

260
00:18:55,750 --> 00:18:59,200
And when I say standardized covariance, I'm really just talking about correlation.

261
00:18:59,200 --> 00:19:00,070
That's all it is.

262
00:19:00,070 --> 00:19:04,510
It's standardized covariance so that it has to fall between negative one and one.

263
00:19:04,510 --> 00:19:08,950
You can easily then interpret that number and you don't even need to know anything about the ranges

264
00:19:08,950 --> 00:19:10,000
of X and Y.
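
To see the standardizing effect, here is a small sketch that measures the same tips in dollars and in cents. The function names and the data are made up for illustration; the point is only that covariance changes with the unit while correlation does not.

```python
import math


def population_covariance(xs, ys):
    """Average of (x_i - mean_x) * (y_i - mean_y) over all rows."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    # Standard deviation is the square root of a feature's covariance with itself.
    std_x = math.sqrt(population_covariance(xs, xs))
    std_y = math.sqrt(population_covariance(ys, ys))
    return population_covariance(xs, ys) / (std_x * std_y)


# Made-up values for illustration only.
total_bill = [10.0, 20.0, 30.0, 40.0]
tip_dollars = [1.0, 3.0, 4.0, 6.0]
tip_cents = [t * 100 for t in tip_dollars]  # same tips, different unit

cov_dollars = population_covariance(total_bill, tip_dollars)
cov_cents = population_covariance(total_bill, tip_cents)
corr_dollars = correlation(total_bill, tip_dollars)
corr_cents = correlation(total_bill, tip_cents)

print(cov_dollars, cov_cents)  # covariance grows 100x with the unit change
print(round(corr_dollars, 4), round(corr_cents, 4))  # correlation is unchanged
```

This sketch does not guard against a feature with zero standard deviation (all values identical), where the division is undefined.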

265
00:19:11,360 --> 00:19:16,940
So this makes it much easier, again, to interpret a correlation value rather than a covariance value,

266
00:19:16,940 --> 00:19:21,890
which is why in the real world you pretty much always see correlation being reported rather than covariance

267
00:19:21,890 --> 00:19:22,790
being reported.

268
00:19:24,800 --> 00:19:32,120
And you can check out here what the different correlations look like for different distributions or

269
00:19:32,120 --> 00:19:34,150
joint distributions of features.

270
00:19:34,160 --> 00:19:39,770
And notice that technically speaking, there's going to be some strange relationships between features

271
00:19:39,770 --> 00:19:42,110
that sometimes have zero correlation.

272
00:19:42,110 --> 00:19:43,670
So something to watch out for.

273
00:19:43,700 --> 00:19:49,190
For example, you can see that we kind of have these ring- or x-looking values, or something that looks

274
00:19:49,190 --> 00:19:53,540
like a sine wave, and depending how you calculate correlation, when you standardize it, it's going

275
00:19:53,540 --> 00:19:54,800
to end up being zero.

276
00:19:54,800 --> 00:19:56,840
So it's just something to look out for.

277
00:19:56,840 --> 00:20:01,760
But in general, when you're talking about correlation and joint distribution between two features,

278
00:20:01,760 --> 00:20:04,820
you should think of one as perfectly correlated.

279
00:20:04,820 --> 00:20:06,350
So one value goes with the other.

280
00:20:06,350 --> 00:20:11,540
They rise together along a straight line, and negative one means perfectly correlated the other way.

281
00:20:11,540 --> 00:20:15,740
So as one increases, the other decreases and then the values in between.

282
00:20:18,090 --> 00:20:22,890
Notice again how certain data sets with interesting behaviors can still have a zero correlation value.

283
00:20:22,890 --> 00:20:30,240
So it is helpful to actually visualize a joint distribution along with reporting its correlation

284
00:20:30,240 --> 00:20:30,840
value.
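
A quick way to convince yourself of this is a made-up example where y is completely determined by x, yet the correlation comes out as zero. The data below (y equal to x squared on a symmetric range) is an assumption chosen for illustration and is not one of the plots from the slide.

```python
import math

# Made-up data: y is completely determined by x (y = x squared), but not linearly.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Positive and negative rectangle areas cancel out exactly for this shape.
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
corr = cov / (std_x * std_y)

print(cov, corr)
```

Both numbers come out at zero even though knowing x tells you y exactly, which is why a plot is worth looking at alongside the reported correlation.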

285
00:20:32,260 --> 00:20:37,870
So we've now explored covariance and correlation and understand their ability to inform us of the relationship

286
00:20:37,870 --> 00:20:39,400
between two data series.

287
00:20:39,400 --> 00:20:44,140
And as I keep mentioning, you're most commonly going to see correlation used in organizations as the

288
00:20:44,140 --> 00:20:49,270
metric being reported since it is standardized to be between negative one and one.

289
00:20:50,260 --> 00:20:53,410
Okay, so let's take a deeper dive into covariance and correlation.

