1
00:00:05,440 --> 00:00:09,100
Welcome everyone to this section of the course on data distributions.

2
00:00:10,230 --> 00:00:14,760
In this section, we're going to be exploring concepts that allow us to better understand different

3
00:00:14,760 --> 00:00:16,200
distributions of data.

4
00:00:17,380 --> 00:00:21,700
Recall our discussions of data visualizations and specifically histograms

5
00:00:21,730 --> 00:00:28,210
earlier in the course. You should notice that this is actually a visualization of how this particular

6
00:00:28,210 --> 00:00:30,000
dataset is distributed.

7
00:00:30,010 --> 00:00:34,960
So we can almost think of it as modeling the distribution of the data itself.

8
00:00:34,990 --> 00:00:41,740
For example, I can see the distribution of the total bill from the tips data set that we've been exploring.

9
00:00:41,920 --> 00:00:48,520
This allows me to realize that most of the values are going to fall between $10 and $30, meaning that

10
00:00:48,520 --> 00:00:55,330
if I were to randomly pick a bill value, it has a higher likelihood of falling between $10 and $30,

11
00:00:55,330 --> 00:00:57,970
than being above $45.

12
00:00:59,170 --> 00:01:05,830
When thinking about our real world data sets, we can use data distributions to model outcomes depending

13
00:01:05,830 --> 00:01:08,380
on how the data itself is actually distributed.

14
00:01:08,500 --> 00:01:13,390
In this section, we're going to be exploring how to understand distributions and some different types

15
00:01:13,390 --> 00:01:14,860
of common distributions.

16
00:01:16,110 --> 00:01:21,540
So we're going to start off with some higher level ideas behind data distributions, including probability

17
00:01:21,540 --> 00:01:28,080
mass functions and the discrete uniform distribution as an example of a PMF, or probability mass function.

18
00:01:28,080 --> 00:01:34,380
And then we'll discuss probability density functions and the continuous uniform distribution as an example

19
00:01:34,380 --> 00:01:37,390
of a PDF or probability density function.

20
00:01:37,410 --> 00:01:40,620
Then we'll also talk about cumulative distribution functions.

21
00:01:41,160 --> 00:01:46,350
After that, we'll take a quick tour of some specific distributions like binomial distribution, Bernoulli

22
00:01:46,350 --> 00:01:48,330
distribution, and Poisson distribution.

23
00:01:49,710 --> 00:01:54,420
As you encounter real world data sets, you're going to begin to see behavior and patterns in the distribution

24
00:01:54,420 --> 00:02:00,810
of the data that are actually common characteristics of a particular data distribution. Matching a

25
00:02:00,810 --> 00:02:06,570
real world data set to a particular distribution that is some theoretical mathematical model can actually

26
00:02:06,570 --> 00:02:08,370
be a very powerful tool.

27
00:02:09,300 --> 00:02:14,160
For example, understanding that certain data sets in the real world, such as the height of people

28
00:02:14,190 --> 00:02:20,370
are normally distributed, allows you to apply statistical operations related to normal distributions.

29
00:02:20,820 --> 00:02:25,860
As a quick note, we actually cover normal distributions, otherwise known as Gaussian distributions

30
00:02:25,860 --> 00:02:31,050
in the next section because it's actually a very specific and important distribution that shows up a

31
00:02:31,050 --> 00:02:32,400
lot in real life.

32
00:02:33,740 --> 00:02:38,840
So another example could be the binomial distribution, which can help you model events in which you

33
00:02:38,840 --> 00:02:44,960
have a sequence of independent event outcomes. For example, that could be used by a casino to model

34
00:02:44,960 --> 00:02:50,840
out the probability of winning on games such as blackjack or baccarat, which could help the casino

35
00:02:50,840 --> 00:02:55,970
itself manage expectations of future revenues from certain tables.

36
00:02:57,580 --> 00:03:02,320
Before we dive deeper into data distribution, let's give you an intuition of how to think about data

37
00:03:02,320 --> 00:03:07,330
distributions, especially the differences between probability mass functions and probability density

38
00:03:07,330 --> 00:03:10,150
functions, which are often confusing to beginners

39
00:03:10,360 --> 00:03:12,160
when talking about data distributions.

40
00:03:13,380 --> 00:03:17,350
Let's begin with probability mass functions, otherwise known as PMFs.

41
00:03:18,900 --> 00:03:24,750
Now, the formal definition of a probability mass function is, quote, a function that gives the probability

42
00:03:24,750 --> 00:03:28,950
that a discrete random variable is exactly equal to some value.

43
00:03:29,250 --> 00:03:31,590
Now, that's a lot of words.

44
00:03:31,740 --> 00:03:35,790
And this definition can seem very intimidating and confusing at first glance.

45
00:03:35,790 --> 00:03:41,430
But let's break it down and we'll realize it's actually pretty straightforward and it unlocks a lot

46
00:03:41,430 --> 00:03:43,410
of potential use cases for us.

47
00:03:44,510 --> 00:03:46,590
Let's start off with that very last word.

48
00:03:46,610 --> 00:03:47,450
Function.

49
00:03:47,720 --> 00:03:53,480
A function simply takes an input and then applies some transformation and returns the output.

50
00:03:53,510 --> 00:04:00,140
For example, I could define a function as f of x, where x is what you pass into the function and I

51
00:04:00,140 --> 00:04:03,470
could define it as f of x is equal to x plus one.

52
00:04:03,560 --> 00:04:10,010
That's a simple function that simply adds one to the input x, so that f of two is equal to three because

53
00:04:10,010 --> 00:04:11,990
two plus one is equal to three.

54
00:04:12,230 --> 00:04:16,310
Hopefully you already realize that and knew the definition of that sort of function.

55
00:04:17,410 --> 00:04:24,400
That means a probability function is similar, except it's going to return a probability value.

56
00:04:24,550 --> 00:04:27,120
For a probability mass function,

57
00:04:27,130 --> 00:04:33,880
it returns the probability that a discrete random variable is exactly equal to some value.

58
00:04:34,000 --> 00:04:36,100
Let's dive into that idea a little further.

59
00:04:37,190 --> 00:04:43,670
This means a probability mass function can be written as the following: f of x is equal to the probability

60
00:04:43,670 --> 00:04:47,900
that capital X is equal to some particular value x.

61
00:04:47,900 --> 00:04:54,440
That is a particular probability value, where capital X is the discrete random variable and lowercase x is the

62
00:04:54,440 --> 00:04:55,460
certain value.

63
00:04:56,940 --> 00:05:02,280
So let's explore a simple example of a discrete random variable in order to fully understand probability

64
00:05:02,280 --> 00:05:03,300
mass functions.

65
00:05:03,330 --> 00:05:09,390
Imagine you are running a security test program on 100 employees, one of whom is the CEO.

66
00:05:10,470 --> 00:05:16,590
The security test program is going to send a test phishing email to one random employee out of the 100

67
00:05:16,590 --> 00:05:17,760
total employees.

68
00:05:18,090 --> 00:05:24,600
The question arises what is the probability that the CEO is going to be the one to receive the test

69
00:05:24,600 --> 00:05:25,860
phishing email.

70
00:05:26,040 --> 00:05:30,510
Now, you can kind of figure this out in your head, but I want to frame it in terms of probability

71
00:05:30,510 --> 00:05:31,590
mass functions.

72
00:05:33,230 --> 00:05:36,620
Again, notice a few distinct features of the situation.

73
00:05:36,710 --> 00:05:44,120
There are a discrete number of employees 100, and we should also mention they're discrete because there's

74
00:05:44,120 --> 00:05:46,640
really no employees between two employees.

75
00:05:46,640 --> 00:05:52,690
So if you have employee Bob and employee Cindy, there's no employee value that's between Bob and Cindy.

76
00:05:52,700 --> 00:05:56,810
Thus they're discrete and you have 100 separate employees.

77
00:05:57,020 --> 00:06:01,760
We're also interested in the probability of a particular event outcome.

78
00:06:01,850 --> 00:06:08,660
So the discrete number of employees gives you an idea that it should be probability mass instead of

79
00:06:08,660 --> 00:06:13,610
probability density, which will be for continuous situations, which we'll talk about later on.

80
00:06:13,880 --> 00:06:19,070
And again, we're interested in the probability of a particular event outcome where the phishing email

81
00:06:19,070 --> 00:06:22,400
happens to go specifically to the CEO.

82
00:06:23,460 --> 00:06:26,340
This means we have a probability mass function.

83
00:06:26,520 --> 00:06:32,340
I'm looking for f of x as my function, where it's going to be the probability that capital X is equal to a

84
00:06:32,340 --> 00:06:33,390
particular value.

85
00:06:33,420 --> 00:06:37,860
In this case, I'm looking for the probability that X is equal to CEO.

86
00:06:37,890 --> 00:06:43,620
Essentially, what's the probability that this particular event of sending the email happens to go to

87
00:06:43,620 --> 00:06:44,580
the CEO?

88
00:06:46,130 --> 00:06:52,850
If we know there are only 100 employees and they all have an equal or uniform chance of being picked,

89
00:06:52,850 --> 00:06:55,250
then the odds of picking the CEO,

90
00:06:55,250 --> 00:07:01,880
that is, the probability that X happens to be equal to the CEO, are simply one out of 100 or

91
00:07:01,880 --> 00:07:06,620
0.01 or 1% chance of picking the CEO, which kind of makes sense.

92
00:07:06,620 --> 00:07:11,810
You probably already knew in your head that if you're going to have an equal chance of picking any employee

93
00:07:11,810 --> 00:07:16,340
and only one of them is the CEO, then your chances are one over n or one over 100.
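
The one over n reasoning above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the lecture: the function name `discrete_uniform_pmf` and the employee labels are hypothetical, and it assumes every listed outcome is unique and equally likely.

```python
from fractions import Fraction

def discrete_uniform_pmf(x, outcomes):
    """Return P(X = x) when every outcome in `outcomes` is equally likely."""
    outcomes = list(outcomes)
    if x not in outcomes:
        return Fraction(0)  # values outside the outcome set have zero probability
    return Fraction(1, len(outcomes))

# Hypothetical labels: one CEO plus 99 other employees (100 in total).
employees = ["CEO"] + [f"employee_{i}" for i in range(1, 100)]

p_ceo = discrete_uniform_pmf("CEO", employees)
print(p_ceo, float(p_ceo))  # 1/100 0.01
```

Using `Fraction` keeps the result exact, so the probabilities over all 100 employees add up to exactly one.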

94
00:07:17,830 --> 00:07:20,740
And in this case, since the outcomes were discrete,

95
00:07:20,770 --> 00:07:24,580
again, we label this as a probability mass function.

96
00:07:24,880 --> 00:07:29,320
It is the discrete outcomes that allow us to model this as a mass function.

97
00:07:29,320 --> 00:07:34,420
And in this particular case we had what's known as a discrete uniform distribution.

98
00:07:34,510 --> 00:07:36,600
Keep in mind that's not always the case.

99
00:07:36,610 --> 00:07:41,740
For example, maybe the CEO could have multiple emails because he or she is so important.

100
00:07:41,740 --> 00:07:47,710
And in that case, maybe it's not going to be uniform because the CEO actually has a higher chance of

101
00:07:47,740 --> 00:07:48,640
being picked.

102
00:07:48,640 --> 00:07:51,430
But in this case, we kept it discrete uniform.
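
To make the non-uniform case concrete, here is a small sketch under assumed numbers that are not from the lecture: 99 employees with one email address each and a CEO with three addresses, where the test picks one address uniformly at random. The helper name `pmf_from_counts` is hypothetical.

```python
from fractions import Fraction

def pmf_from_counts(address_counts):
    """Build a PMF over people when one address is picked uniformly at random."""
    total = sum(address_counts.values())
    return {person: Fraction(count, total) for person, count in address_counts.items()}

# Assumed setup for illustration: 99 employees with one address, a CEO with three.
address_counts = {f"employee_{i}": 1 for i in range(1, 100)}
address_counts["CEO"] = 3

pmf = pmf_from_counts(address_counts)
print(pmf["CEO"])         # 1/34 (that is, 3 out of 102 addresses)
print(pmf["employee_1"])  # 1/102
print(sum(pmf.values()))  # 1
```

The outcomes are still discrete, so this is still a probability mass function. It is just no longer uniform, because the CEO's probability is three times that of any other employee.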

103
00:07:52,730 --> 00:07:58,700
When visualizing probability mass functions, you're going to see distinct probability points for

104
00:07:58,700 --> 00:07:59,610
each outcome.

105
00:07:59,630 --> 00:08:05,360
For example, if we only had ten employees with equal chances of being chosen, what would the PMF distribution

106
00:08:05,360 --> 00:08:06,500
visual look like?

107
00:08:07,640 --> 00:08:13,520
Well, the probability mass function for ten employees with a uniform chance could be visualized as

108
00:08:13,520 --> 00:08:14,300
the following.

109
00:08:14,330 --> 00:08:20,490
Here we're looking at a visualization of the PMF for the case where there's only ten employees.

110
00:08:20,510 --> 00:08:25,990
So keep a note here that there's no line connecting each particular employee.

111
00:08:26,000 --> 00:08:30,890
That's because each employee is discrete: employee number one, employee number two, three, and so

112
00:08:30,890 --> 00:08:31,280
on.

113
00:08:31,280 --> 00:08:33,770
There is no employee 2.5.

114
00:08:33,770 --> 00:08:38,690
And then on the Y axis, I can see the probability of picking the employee, which happens to be equal

115
00:08:38,690 --> 00:08:43,100
to one over n or one over ten, which is the same thing as 0.1.
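
As a rough stand-in for the visual described above, this sketch builds the ten-employee PMF and prints one row per employee. The text rendering is an assumption made for illustration and is not the plot shown in the lecture.

```python
n = 10  # ten employees, each equally likely to be chosen

# One probability point per employee; there is no entry for "employee 2.5".
pmf = {employee: 1 / n for employee in range(1, n + 1)}

for employee, probability in pmf.items():
    bar = "*" * round(probability * 50)  # crude text bar, same height for everyone
    print(f"employee {employee:>2}: {probability:.1f} {bar}")
```

With a plotting library such as matplotlib, a stem plot (`plt.stem`) of these points would show the same picture: ten separate points at a height of 0.1, with no line connecting them.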

116
00:08:45,260 --> 00:08:48,560
So what about probability density functions?

117
00:08:49,620 --> 00:08:55,680
So what happens in situations where the outcomes are not discrete, for example, instances where the

118
00:08:55,680 --> 00:08:57,450
outcomes are continuous?

119
00:08:57,480 --> 00:09:02,340
In this particular case, that means we're dealing with a probability density function.

120
00:09:04,500 --> 00:09:10,290
The formal definition of a probability density function is the following: a function whose value at any

121
00:09:10,290 --> 00:09:15,780
given sample in the sample space can be interpreted as providing a relative likelihood that the value

122
00:09:15,780 --> 00:09:18,390
of the random variable would be close to that sample.

123
00:09:18,660 --> 00:09:20,030
Holy smokes, that's a lot.

124
00:09:20,040 --> 00:09:21,600
So again, let's break it down.

125
00:09:22,550 --> 00:09:25,880
This appears actually kind of similar to the PMF definition.

126
00:09:25,880 --> 00:09:31,490
But the key difference is that we're looking and working with a continuous random variable instead of

127
00:09:31,490 --> 00:09:34,170
what's known as a discrete random variable.

128
00:09:34,190 --> 00:09:36,950
This requires a small shift in our mindset.

129
00:09:38,260 --> 00:09:43,660
Imagine that we're going to now send out the test phishing emails to multiple accounts using a third

130
00:09:43,660 --> 00:09:44,740
party service.

131
00:09:44,860 --> 00:09:50,980
Typically, once we hit send, the emails are sent out within 1 to 3 minutes after we actually push

132
00:09:50,980 --> 00:09:51,670
the button.

133
00:09:51,940 --> 00:09:54,620
Note how we're now framing this problem differently.

134
00:09:54,640 --> 00:09:57,400
It's now actually a continuous random variable.

135
00:09:57,400 --> 00:10:04,990
If we're thinking about the question how long after I hit send will the phishing email actually take

136
00:10:04,990 --> 00:10:06,460
to go out to the person?

137
00:10:06,460 --> 00:10:10,300
So now I'm not even thinking about who's the phishing email going to hit.

138
00:10:10,300 --> 00:10:16,090
Instead, I'm thinking about the question after you hit send, how long does it take for the email to

139
00:10:16,090 --> 00:10:17,620
actually reach the inbox?

140
00:10:17,620 --> 00:10:23,920
So, for example, what's the probability that the email takes over 60 seconds to go out?

141
00:10:25,840 --> 00:10:31,150
So again, what if we wanted to know the probability that the emails get sent at exactly the one minute

142
00:10:31,150 --> 00:10:36,160
mark? And to be very clear, when I say exactly one minute, I mean precisely.

143
00:10:36,160 --> 00:10:37,690
Exactly one minute.

144
00:10:37,690 --> 00:10:43,000
So 60.000000 seconds, etc., after we hit send.

145
00:10:43,750 --> 00:10:47,620
What you should be thinking about is: does that question even make sense?

146
00:10:47,740 --> 00:10:54,700
It's probably almost impossible that using this third party provider, the email I sent is going to

147
00:10:54,700 --> 00:11:03,070
be sent at exactly 60.0000 seconds, etc. And the reason for that is because we're now dealing with

148
00:11:03,070 --> 00:11:11,110
a continuous feature or a continuous random variable, because now we're dealing with time instead of

149
00:11:11,110 --> 00:11:13,240
the separate discrete employees.

150
00:11:13,240 --> 00:11:20,920
I now have a continuous random variable that could be 60 seconds, 61, 62, and there actually are

151
00:11:20,920 --> 00:11:25,480
values in between two points in that random variable.

152
00:11:25,480 --> 00:11:31,240
There are values between 60 seconds and 61 seconds, and there's kind of an infinite amount of values

153
00:11:31,240 --> 00:11:33,040
if I keep getting precise enough.

154
00:11:33,040 --> 00:11:36,340
Hopefully you remember that from our discussions of continuous features.

155
00:11:37,720 --> 00:11:44,530
So for continuous random variables, we switch our thinking from exact outcomes, which we did for discrete

156
00:11:44,530 --> 00:11:51,760
random variables like exactly employee number one or exactly the CEO to either approximations or interval

157
00:11:51,760 --> 00:11:56,230
ranges, since those make way more sense for continuous random variables.

158
00:11:56,260 --> 00:11:58,840
Thus the term the density function.

159
00:11:58,840 --> 00:12:05,440
And if you go back to the original definition, that's why it said close to or approximate a certain

160
00:12:05,440 --> 00:12:06,000
value.

161
00:12:06,010 --> 00:12:09,980
So we're not looking for an exact match to a discrete outcome.

162
00:12:10,000 --> 00:12:15,880
Instead, it starts making a lot more sense for continuous random variables to talk about ranges or

163
00:12:15,880 --> 00:12:16,720
intervals.

164
00:12:18,150 --> 00:12:23,310
For example, it'd be more useful to know what's the probability that the time it takes to send the

165
00:12:23,310 --> 00:12:25,860
email is less than one minute.

166
00:12:25,920 --> 00:12:30,510
So this allows us again, to answer the question what's the probability that emails are sent out in

167
00:12:30,510 --> 00:12:32,340
one minute or less?

168
00:12:32,340 --> 00:12:37,560
And now I'm dealing with a range or an interval, and that makes a lot more sense for a continuous random

169
00:12:37,560 --> 00:12:38,200
variable.

170
00:12:38,220 --> 00:12:41,940
Thus we can think about it as a probability density function.
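
The shift from exact values to intervals can be sketched as follows. This assumes, purely for illustration, that the send delay is uniformly distributed between 60 and 180 seconds (the 1 to 3 minute window mentioned earlier). The lecture does not state which distribution the delay actually follows, and the function name is hypothetical.

```python
def uniform_interval_probability(a, b, low=60.0, high=180.0):
    """P(a <= T <= b) for a delay T assumed uniform on [low, high] seconds."""
    left = max(a, low)    # clip the interval to the possible range
    right = min(b, high)
    if right <= left:
        return 0.0        # empty overlap, or a single exact point
    return (right - left) / (high - low)

print(uniform_interval_probability(60, 90))   # 0.25
print(uniform_interval_probability(0, 120))   # 0.5
print(uniform_interval_probability(60, 60))   # 0.0
```

Under this assumption an interval such as "between 60 and 90 seconds" has a meaningful probability, while the single exact value of 60 seconds has probability zero, which is why ranges are the natural question for a continuous random variable.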

171
00:12:43,450 --> 00:12:49,180
There are many different types of probability density functions, such as a normal distribution or something

172
00:12:49,180 --> 00:12:50,800
like a Boltzmann distribution.

173
00:12:52,340 --> 00:12:58,040
In fact, there are actually so many different types of distributions, both probability mass functions

174
00:12:58,040 --> 00:12:59,720
and probability density functions.

175
00:12:59,720 --> 00:13:04,640
You can actually view a list of them on Wikipedia, and I would encourage you to visit this link or

176
00:13:04,640 --> 00:13:10,730
just Google search list of probability distributions on Wikipedia and check them all out and explore

177
00:13:10,760 --> 00:13:11,210
them.

178
00:13:11,210 --> 00:13:16,550
Maybe even be a little confused by them, but then understand that they're actually all just examples

179
00:13:16,550 --> 00:13:18,590
of either a PMF or a PDF.

180
00:13:18,590 --> 00:13:22,370
And we're going to dive into a few of them in this particular section of the course.

181
00:13:23,610 --> 00:13:29,220
The key insight for understanding data distributions is that once you can actually connect a real world

182
00:13:29,220 --> 00:13:35,100
set of events or data points to a particular data distribution, you can leverage the properties of

183
00:13:35,100 --> 00:13:37,830
that data distribution to calculate probabilities.

184
00:13:38,720 --> 00:13:44,030
For example, in the very first example that I just talked about in this lecture, with probability

185
00:13:44,030 --> 00:13:50,540
mass functions and security phishing emails, I had 100 employees, all with an equal chance of being

186
00:13:50,540 --> 00:13:51,260
picked.

187
00:13:51,500 --> 00:13:56,070
That's my real world situation. If I think about the theoretical match to it,

188
00:13:56,090 --> 00:13:59,540
it's actually a discrete uniform distribution.

189
00:13:59,630 --> 00:14:05,570
It's discrete because all the employees were discrete and it's uniform since they all had the same probability

190
00:14:05,570 --> 00:14:06,800
of being chosen.

191
00:14:07,900 --> 00:14:13,540
That means I can actually look up the properties of a discrete uniform distribution on something like

192
00:14:13,540 --> 00:14:18,730
Wikipedia, and then I'll realize that there's actually a bunch of formulas already defined.

193
00:14:18,730 --> 00:14:25,060
And in fact, a discrete uniform distribution has a property that the probability of a single discrete

194
00:14:25,060 --> 00:14:30,850
outcome is simply one over n where n is the number of discrete event possibilities.

195
00:14:30,880 --> 00:14:34,030
In our example, n was equal to 100 employees.

196
00:14:34,030 --> 00:14:38,320
Thus the CEO had a one over 100 probability of being picked.

197
00:14:38,560 --> 00:14:43,870
That can seem super obvious to anybody who doesn't even know what data distributions are.

198
00:14:43,870 --> 00:14:50,530
But as the data distributions get more complex, it becomes necessary to actually look up the formula.

199
00:14:51,930 --> 00:14:58,260
So again, a discrete uniform distribution is a simpler use case where one over n may have been obvious

200
00:14:58,260 --> 00:15:01,460
even if you didn't realize it was a discrete uniform distribution.

201
00:15:01,470 --> 00:15:07,080
But keep in mind there are many real world situations that match up to more complex data distributions

202
00:15:07,080 --> 00:15:12,840
like a binomial distribution, trying to figure out the odds of winning a hand at blackjack or something

203
00:15:12,840 --> 00:15:13,560
like that.

204
00:15:13,710 --> 00:15:19,290
So calculating those event probabilities would be difficult without referencing prior works or proofs.

205
00:15:19,290 --> 00:15:24,150
So we want to make sure we take advantage of all the work mathematicians have already done for us in

206
00:15:24,150 --> 00:15:24,870
history.

207
00:15:24,870 --> 00:15:31,560
That way we already can connect real world data sets to theoretical data distributions and then we can

208
00:15:31,560 --> 00:15:32,910
match up the properties.

209
00:15:34,230 --> 00:15:38,760
So again, fortunately for us, mathematicians of the past such as Bernoulli have already

210
00:15:38,760 --> 00:15:39,940
done the hard work for us.

211
00:15:39,960 --> 00:15:45,180
We can stand on their shoulders to quickly assess what formulas to use once we know the data distribution

212
00:15:45,180 --> 00:15:46,440
that we're actually working with.

213
00:15:47,610 --> 00:15:52,560
Let's continue exploring data distributions as well as taking a deeper dive into the concepts such as

214
00:15:52,590 --> 00:15:55,380
PMFs and PDFs discussed in this lecture.

