1
00:00:00,150 --> 00:00:05,190
Before we get to talking about probability mass functions, let's lay some foundation and talk

2
00:00:05,190 --> 00:00:06,780
about functions in general.

3
00:00:06,810 --> 00:00:12,600
For instance, let's think about this graph here, which is the graph of the function

4
00:00:12,720 --> 00:00:15,770
f of x equals x minus one.

5
00:00:15,780 --> 00:00:19,560
We want to think about functions as inputs and outputs.

6
00:00:19,560 --> 00:00:25,710
And what they tell us is that for any value we input for x on the right-hand side here, the function

7
00:00:25,710 --> 00:00:30,140
will return to us a value for the left-hand side, the f of x value.

8
00:00:30,150 --> 00:00:36,660
So for instance, if we pick x equals one, then the function gives back to us one minus one, or zero.

9
00:00:36,660 --> 00:00:45,120
So let's set up a table of inputs and outputs down here. We'll have our inputs x and our

10
00:00:45,120 --> 00:00:46,560
outputs F of x.

11
00:00:46,560 --> 00:00:52,860
We said that if we pick an input of x equals one, our output is one minus one or zero.

12
00:00:52,980 --> 00:00:57,090
If we pick an input of x equals two, we get two minus one or one.

13
00:00:57,090 --> 00:01:00,030
So an input of two gives an output of one.

14
00:01:00,030 --> 00:01:05,790
An input of three would give an output of two, and so on. And of course that makes sense, because this function

15
00:01:05,790 --> 00:01:10,950
here is just telling us whatever value we put in for x, we're going to subtract one from it to get

16
00:01:10,950 --> 00:01:11,460
the output.

17
00:01:11,460 --> 00:01:17,550
And so we can see that all of the outputs are one less than the inputs, and we can see that in

18
00:01:17,550 --> 00:01:18,810
the graph as well.

19
00:01:18,840 --> 00:01:22,200
All of these input-output pairs then become coordinate points.

20
00:01:22,200 --> 00:01:26,040
So we have the coordinate point (1, 0), and we see that

21
00:01:26,990 --> 00:01:27,800
right here.

22
00:01:27,830 --> 00:01:30,730
We also have the coordinate point (2, 1).

23
00:01:30,740 --> 00:01:32,930
We see that point here at two,

24
00:01:33,050 --> 00:01:38,180
one. We see that it's along this line, and we have the point (3, 2).

25
00:01:38,210 --> 00:01:44,420
So we go out to X equals three and up to two and we sketch in that point.

26
00:01:44,510 --> 00:01:51,140
So what this function is saying is that if we pick a value X equals two, we come up to the graph of

27
00:01:51,140 --> 00:01:56,630
the function and then we move straight over to the vertical axis and we see that the value we get back

28
00:01:56,630 --> 00:01:57,320
is one.

29
00:01:57,320 --> 00:02:04,640
So when we pick two, we get back a value of one, or if we pick three, we come up to the function

30
00:02:04,640 --> 00:02:06,200
at x equals three.

31
00:02:06,200 --> 00:02:11,180
And then from here we go over to the vertical axis and we find two.

32
00:02:11,180 --> 00:02:13,970
So if the input is three, the output must be two.

33
00:02:13,970 --> 00:02:16,100
If the input is two, the output must be one.

34
00:02:16,100 --> 00:02:22,100
If the input is one here, the output must be zero, so that gives us the pair one, zero.

35
00:02:22,100 --> 00:02:27,980
So we can get these input-output pairs just from the function, as we showed with our table here,

36
00:02:27,980 --> 00:02:33,830
without ever having the graph. We can also get the input-output pairs from the graph itself, and the

37
00:02:33,830 --> 00:02:38,000
input output pairs from the graph will always match the values we get in our table.
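As a small illustration, the input/output table for f of x equals x minus one can be generated in Python. This is only a sketch: the function name `f` and the input list `[1, 2, 3]` are just the example values used in this lecture.

```python
def f(x):
    """The algebraic function f(x) = x - 1: subtract one from the input."""
    return x - 1

# Build the input/output table for the inputs used in the lecture.
table = [(x, f(x)) for x in [1, 2, 3]]

for x, fx in table:
    print(f"x = {x} -> f(x) = {fx}")
```

Each (x, f(x)) pair printed here is one of the coordinate points plotted on the graph.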

38
00:02:38,000 --> 00:02:44,510
Those are the values generated when we plug values of x into our function. Well, a probability mass

39
00:02:44,510 --> 00:02:51,950
function works the same way, except that we plug in values the random variable can take on, and what's

40
00:02:51,950 --> 00:02:54,170
returned to us is a probability.

41
00:02:54,170 --> 00:02:59,720
So here in this scenario, what was returned to us was a value along the vertical axis here, a value

42
00:02:59,720 --> 00:03:02,120
for f of x. In a probability mass function,

43
00:03:02,120 --> 00:03:07,760
what we get back is a probability: the probability that that input value occurs.

44
00:03:07,760 --> 00:03:12,380
So as an example, let's look at this probability mass function.

45
00:03:12,560 --> 00:03:16,820
This function models the probability of each possible number of heads

46
00:03:16,820 --> 00:03:19,250
when we flip a coin two times.

47
00:03:19,250 --> 00:03:22,370
Now, we've already looked at probability with coin flips.

48
00:03:22,370 --> 00:03:29,810
So we know that when we flip a coin two times, we can flip heads and then heads, heads and

49
00:03:29,810 --> 00:03:30,710
then tails,

50
00:03:30,710 --> 00:03:35,330
tails and then heads, or tails and then tails.

51
00:03:35,330 --> 00:03:40,910
There are four possible equally likely outcomes when we flip a coin two times.

52
00:03:40,910 --> 00:03:46,520
Now, if we think about the number of heads we can get over the course of two coin flips, we can either

53
00:03:46,520 --> 00:03:49,670
get two heads, which we see in this outcome,

54
00:03:49,670 --> 00:03:53,210
flip heads once, which we see in these two outcomes, or flip heads zero times.

55
00:03:53,210 --> 00:03:59,240
So the only possible options are flipping heads zero times, one time or two times.

56
00:03:59,240 --> 00:04:00,890
And that's what we see here.

57
00:04:00,890 --> 00:04:05,630
In the probability mass function, x can be zero, one, or two.

58
00:04:05,630 --> 00:04:13,100
So this part of the function here represents all of the possible outcomes of our discrete random variable.

59
00:04:13,100 --> 00:04:18,800
Now, it's important that we say discrete random variable because we're only going to use probability

60
00:04:18,800 --> 00:04:24,140
mass functions to model discrete random variables, not continuous random variables.

61
00:04:24,140 --> 00:04:30,320
And the reason is that we have to be able to assign a specific probability to each outcome.

62
00:04:30,320 --> 00:04:36,080
And remember, with continuous random variables, there are infinitely many in-between

63
00:04:36,080 --> 00:04:36,560
values we could measure.

64
00:04:36,560 --> 00:04:42,950
So in this example, we can flip heads zero times, one time, or two times. But if we had a

65
00:04:42,950 --> 00:04:47,990
continuous random variable, we might need to look at the probability of zero times, but then also

66
00:04:47,990 --> 00:04:56,900
a quarter of a flip, or half of a flip, or 0.4782935 flips: every value, an infinite number of values

67
00:04:56,900 --> 00:05:01,250
between zero and one, and an infinite number of values between zero and two.

68
00:05:01,250 --> 00:05:07,340
And we can't assign a specific, nonzero probability to every one of those infinitely many values and still have the total add up to one.

69
00:05:07,340 --> 00:05:13,460
So we only use probability mass functions to model discrete random variables where we have this countable

70
00:05:13,460 --> 00:05:17,630
set of discrete values that our random variable can take on.

71
00:05:17,630 --> 00:05:23,570
So we have to have a discrete random variable, and all of the values it can take on are listed over

72
00:05:23,570 --> 00:05:25,520
here in this part of the function.

73
00:05:25,520 --> 00:05:30,710
This part of the function gives the probability that each of these outcomes occurs.

74
00:05:30,710 --> 00:05:37,730
So this part here is p of x, the probability that that specific value of x occurs.

75
00:05:37,730 --> 00:05:43,880
And we can see from the little chart we made that, of the four equally likely outcomes, we flip

76
00:05:43,880 --> 00:05:48,620
heads zero times in exactly one of them.

77
00:05:48,620 --> 00:05:54,050
So there's a one-in-four chance that we'll flip heads zero times when we flip a coin two times.

78
00:05:54,050 --> 00:05:58,070
And so a one in four chance is a 0.25 chance.

79
00:05:58,070 --> 00:05:59,480
The same thing happens here.

80
00:05:59,480 --> 00:06:05,720
Flipping heads two times happens in only one of these four possible scenarios.

81
00:06:05,720 --> 00:06:09,710
So that probability is also one fourth or 0.25.

82
00:06:09,710 --> 00:06:15,440
But flipping heads exactly once happens in two of the four outcomes.

83
00:06:15,440 --> 00:06:22,400
So the probability that x equals one is two over four, or one half, or 0.5.
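The counting argument above can be sketched in Python by listing the four equally likely outcomes and counting heads in each. This is a minimal sketch assuming a fair coin; exact fractions are used so the results match the one-fourth and one-half values read off the chart.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of two fair coin flips:
# ('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')
outcomes = list(product("HT", repeat=2))

# Count how many outcomes give each number of heads x.
counts = Counter(outcome.count("H") for outcome in outcomes)

# p(x) = (number of outcomes with x heads) / (total number of outcomes)
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in pmf.items():
    print(f"p({x}) = {p} = {float(p)}")
```

This prints p(0) = 1/4, p(1) = 1/2, and p(2) = 1/4, the same three rows as the probability mass function.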

84
00:06:22,400 --> 00:06:26,450
So we've matched up each of these probabilities with the corresponding

85
00:06:26,660 --> 00:06:27,500
outcome.

86
00:06:27,500 --> 00:06:33,620
It's important to note, then, that another requirement of a probability mass function is that all of

87
00:06:33,620 --> 00:06:41,000
these P of X values here have to add to one because remember, total probability is always one or 100%.

88
00:06:41,120 --> 00:06:45,770
Every individual probability value has to fall between zero and one.

89
00:06:46,070 --> 00:06:51,320
So 0.25, 0.5, and 0.25 all fall between zero and one.

90
00:06:51,320 --> 00:06:57,800
And these values add up to one: 0.25 plus 0.5 plus 0.25 is one.

91
00:06:57,800 --> 00:07:00,440
So we have the requirement that these values add to one.

92
00:07:00,440 --> 00:07:05,360
We have the requirement that we're dealing with a discrete random variable, and we said this a little

93
00:07:05,360 --> 00:07:11,360
bit already, but let's be more explicit: all of these p of x values have to be nonnegative.

94
00:07:11,360 --> 00:07:17,030
So we can contrast that with this kind of function, which we might have learned about in an algebra

95
00:07:17,030 --> 00:07:17,570
course.

96
00:07:17,570 --> 00:07:23,600
In this kind of function, we can find negative values of f of x. For instance, if we were to set

97
00:07:23,600 --> 00:07:29,750
X equal to one half, our input would be one half and our output would be one half minus one or negative

98
00:07:29,750 --> 00:07:30,320
one half.

99
00:07:30,320 --> 00:07:32,450
And we see that along the graph here.

100
00:07:32,450 --> 00:07:39,860
If we pick this x equals one half value and we connect it to our graph and then go straight over to

101
00:07:39,860 --> 00:07:44,990
the vertical axis, we see that we get a value of negative one half for f of x.

102
00:07:44,990 --> 00:07:49,220
This function is perfectly well defined for that input.

103
00:07:49,220 --> 00:07:53,420
We could say the input is one half, the output is negative one half.

104
00:07:53,420 --> 00:07:55,250
That's totally valid.

105
00:07:55,250 --> 00:08:02,600
That satisfies this function f of x, but for probability mass functions we can never find a negative

106
00:08:02,600 --> 00:08:04,040
output like this.

107
00:08:04,040 --> 00:08:08,570
In our case, that means we can never get a negative probability.

108
00:08:08,570 --> 00:08:13,160
So all of these values right here have to be zero or positive.

109
00:08:13,160 --> 00:08:19,790
They cannot be negative, which makes sense because negative probability is really nonsensical.

110
00:08:20,000 --> 00:08:21,980
So those are our three requirements.

111
00:08:21,980 --> 00:08:26,810
We have to be dealing with a discrete random variable, which means we have a countable set of values.

112
00:08:26,810 --> 00:08:28,940
Our set of values is not continuous.

113
00:08:28,940 --> 00:08:30,800
We don't have a continuous random variable.

114
00:08:30,800 --> 00:08:36,980
And all of these probability outputs here have to fall between zero and one, and in total they

115
00:08:36,980 --> 00:08:38,270
have to add to one.

116
00:08:38,270 --> 00:08:41,330
By definition, of course, that means they can't be negative.
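The numeric requirements can be checked mechanically. The sketch below assumes the PMF is stored as a Python dict mapping each value of x to its probability (so the countable set of values is the dict's keys); `is_valid_pmf` is a hypothetical helper name, not a standard library function. For contrast, it also tries to treat the outputs of f of x equals x minus one as "probabilities".

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Hypothetical helper: check that every probability in a {value: probability}
    dict lies between 0 and 1 and that the probabilities sum to exactly 1."""
    all_in_range = all(0 <= p <= 1 for p in pmf.values())
    sums_to_one = sum(pmf.values()) == 1
    return all_in_range and sums_to_one

coin_pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(is_valid_pmf(coin_pmf))

# Using f(x) = x - 1 as "probabilities" fails: f(0) = -1 is negative,
# and the outputs -1, 0, 1 sum to 0 rather than 1.
not_a_pmf = {x: Fraction(x - 1) for x in [0, 1, 2]}
print(is_valid_pmf(not_a_pmf))
```

Exact fractions keep the sum-to-one check exact; with floating-point probabilities, a tolerance-based comparison such as `math.isclose` would be the safer choice.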

117
00:08:41,360 --> 00:08:47,090
So with that said, the only other thing we want to say about probability mass functions is that sometimes

118
00:08:47,090 --> 00:08:52,310
we'll see them written this way, and sometimes we'll see an extra line added to the function

119
00:08:52,310 --> 00:08:59,210
that says the function is zero for any other values, or "zero

120
00:08:59,210 --> 00:09:00,020
otherwise."

121
00:09:00,020 --> 00:09:05,720
And if we have this line, all it's doing is clarifying for us that the probability that X takes on

122
00:09:05,720 --> 00:09:09,830
any value other than zero, one, or two is zero.

123
00:09:09,860 --> 00:09:14,690
It doesn't really change anything because our probabilities still add to one.

124
00:09:14,690 --> 00:09:20,990
We're just stating explicitly, with this extra piece of information, that these are indeed the

125
00:09:20,990 --> 00:09:27,200
only values x can take, which confirms for us that the total probability of this probability mass

126
00:09:27,200 --> 00:09:30,200
function is contained in these first three rows here.

127
00:09:30,200 --> 00:09:32,420
So sometimes we'll see this, sometimes we won't.

128
00:09:32,420 --> 00:09:34,160
We don't necessarily have to have it.

129
00:09:34,160 --> 00:09:38,930
It's just important that we meet all of the other requirements: a discrete random variable, with probabilities

130
00:09:38,930 --> 00:09:42,500
between zero and one that all sum to one.
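Written as code, the piecewise PMF with its optional "zero otherwise" line might look like the sketch below. The function name `p` is just an illustrative choice; the last `return` plays the role of the "zero otherwise" row.

```python
from fractions import Fraction

def p(x):
    """Piecewise PMF for the number of heads in two fair coin flips."""
    if x == 0:
        return Fraction(1, 4)
    if x == 1:
        return Fraction(1, 2)
    if x == 2:
        return Fraction(1, 4)
    return Fraction(0)  # "zero otherwise": any value other than 0, 1, or 2

print(p(1), p(3), p(-1))
```

Adding the final branch doesn't change the total: p(0) + p(1) + p(2) is still one, and every other input simply returns zero.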

131
00:09:42,500 --> 00:09:48,860
Now, just like with this function, this algebraic function where we were able to sketch its graph,

132
00:09:48,860 --> 00:09:52,100
let's talk about the graph of a probability mass function.

133
00:09:52,460 --> 00:09:57,710
So really we're setting it up on horizontal and vertical axes, just like we did over here.

134
00:09:57,710 --> 00:10:04,370
And we always include all possible values of x, our discrete random variable, along the horizontal axis.

135
00:10:04,370 --> 00:10:08,390
So here, the possible values for x are zero, one, or two.

136
00:10:08,390 --> 00:10:14,210
So we mark zero, one, and two along the horizontal axis, and then along the vertical axis we'll

137
00:10:14,210 --> 00:10:17,750
mark off increments of our probability P of X.

138
00:10:17,750 --> 00:10:25,880
And then in order to graph this probability mass function, all we do is add in these bars here, which

139
00:10:25,880 --> 00:10:33,410
show that the probability that X is zero is 0.25 or 25% or one in four.

140
00:10:33,410 --> 00:10:41,030
So we draw in this bar, this line at x equals zero and extending up to 0.25, whereas the probability

141
00:10:41,030 --> 00:10:43,780
that x equals one is 0.5.

142
00:10:43,790 --> 00:10:50,780
So at X equals one, we sketch in this bar extending up to 0.5 along the vertical axis and then again

143
00:10:50,780 --> 00:10:56,840
here at x equals two, since the probability associated with x equals two is 0.25, we sketch in this

144
00:10:56,840 --> 00:10:59,450
bar all the way up to 0.25.
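The bar graph can be approximated in plain text, with one bar per value of x and bar length proportional to p of x. This is a rough stand-in for a real plot (a plotting library such as matplotlib's `pyplot.bar` would draw the actual bars); `text_bar_chart` and its `width` parameter are hypothetical names chosen for this sketch, with `width` being the number of characters a probability of one would occupy.

```python
from fractions import Fraction

pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def text_bar_chart(pmf, width=20):
    """Return one text bar per value of x; bar length is proportional to p(x)."""
    lines = []
    for x in sorted(pmf):  # values of x go along the "horizontal axis"
        bar = "#" * round(pmf[x] * width)
        lines.append(f"x = {x} | {bar} {float(pmf[x])}")
    return lines

for line in text_bar_chart(pmf):
    print(line)
```

The bar at x equals one comes out twice as long as the bars at zero and two, matching the 0.5 versus 0.25 heights in the graph.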

145
00:10:59,450 --> 00:11:06,380
So, seeing the way these two representations match each other, we should be able to sketch, from this function,

146
00:11:06,380 --> 00:11:09,110
the graph of the probability mass function.

147
00:11:09,110 --> 00:11:14,270
But we should also be able to take this graph and define an algebraic form up here for

148
00:11:14,300 --> 00:11:16,460
the probability mass function, f of x.

149
00:11:16,460 --> 00:11:21,110
Obviously, if we were starting from the graph, we could start here along the horizontal axis to see

150
00:11:21,110 --> 00:11:26,150
that x is defined at zero, one, or two, and we could set up this part

151
00:11:26,820 --> 00:11:29,940
of our probability mass function.

152
00:11:29,940 --> 00:11:35,760
And then for each of those values, zero, one, and two, we would find the corresponding probability, and

153
00:11:35,760 --> 00:11:37,020
we would set up

154
00:11:37,710 --> 00:11:42,990
these pieces. Then we would just write f of x equals, with our open brace here,

155
00:11:42,990 --> 00:11:49,620
and then our three scenarios defining the probability for each possible value of the discrete random

156
00:11:49,620 --> 00:11:50,070
variable.

157
00:11:50,070 --> 00:11:56,490
So we should be able to move back and forth between this algebraic representation and this visual graphical

158
00:11:56,490 --> 00:11:57,480
representation.

159
00:11:57,660 --> 00:12:02,610
And then the only other thing we want to say is that just like when we talked about probability before

160
00:12:02,610 --> 00:12:08,340
and we learned about complementary probability and "and" and "or" probability, we can answer those same kinds

161
00:12:08,340 --> 00:12:10,590
of questions with a probability mass function.

162
00:12:10,590 --> 00:12:17,400
So whether we have this representation of F of X or this graphical representation of F of X, we should

163
00:12:17,400 --> 00:12:22,350
of course be able to state the probability that X is equal to zero or x is equal to one or x is equal

164
00:12:22,350 --> 00:12:23,070
to two.

165
00:12:23,070 --> 00:12:28,290
But we should also be able to find the probability, let's say, that X is equal to zero or one.

166
00:12:28,290 --> 00:12:35,010
We know from the addition rule, since x can't be zero and one at the same time, that the probability that x is zero or one has to be 0.25 plus

167
00:12:35,010 --> 00:12:35,820
0.5.

168
00:12:35,820 --> 00:12:49,830
So the probability that x is zero or one would be equal to 0.25 plus 0.5 or 0.75.

169
00:12:49,890 --> 00:12:54,900
Or we could ask something like, what's the probability that x is not one?

170
00:12:54,900 --> 00:12:59,790
So the probability that X is not one.

171
00:13:00,030 --> 00:13:03,210
Well, that would be everything but this 0.5 here.

172
00:13:03,210 --> 00:13:06,450
So we would take 0.25 plus 0.25.

173
00:13:07,500 --> 00:13:16,950
And we would get 0.5. Or, using the complement rule, we could calculate the probability of not one as one minus the probability of

174
00:13:16,950 --> 00:13:19,950
x equal to one, or one minus 0.5.

175
00:13:19,950 --> 00:13:22,570
And we get the same answer, 0.5.
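Both calculations can be reproduced directly from the PMF. This is a minimal sketch assuming the PMF is stored as a dict of exact fractions: the "or" question adds the probabilities of outcomes that can't happen together, and the "not one" question is computed both by summing the remaining outcomes and by the complement rule.

```python
from fractions import Fraction

pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Addition rule: x = 0 and x = 1 can't both happen, so their probabilities add.
p_zero_or_one = pmf[0] + pmf[1]

# "x is not one", computed two ways:
# 1) add up the probabilities of every other outcome
p_not_one_direct = sum(p for x, p in pmf.items() if x != 1)
# 2) complement rule: 1 - P(x = 1)
p_not_one_complement = 1 - pmf[1]

print(float(p_zero_or_one))         # 0.75
print(float(p_not_one_direct))      # 0.5
print(float(p_not_one_complement))  # 0.5
```

The two "not one" answers agree because the PMF's probabilities sum to one.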

176
00:13:22,590 --> 00:13:28,830
And of course, we could answer these kinds of questions from this representation of F of X, but also

177
00:13:28,830 --> 00:13:35,040
from the graphical representation of f of x. For instance, for this probability-of-zero-or-one question,

178
00:13:35,040 --> 00:13:41,460
we would just identify here the probability as 0.25 and here the probability as 0.5.

179
00:13:41,460 --> 00:13:44,280
We would add those together to get 0.75.

180
00:13:44,400 --> 00:13:50,370
So that's the idea of a probability mass function: how it's related to the original idea of a function

181
00:13:50,370 --> 00:13:56,910
as a set of inputs and outputs, how its outputs are measures of probability, how we can

182
00:13:56,910 --> 00:14:02,340
only use it for discrete random variables, where the probabilities all fall between zero and

183
00:14:02,340 --> 00:14:08,610
one and sum to one, and how we can represent probability mass functions both algebraically and graphically,

184
00:14:08,610 --> 00:14:13,560
and use both representations to answer all different kinds of probability questions.

