1
00:00:00,120 --> 00:00:06,510
Cumulative distribution functions model the probability that some random variable will take on a value

2
00:00:06,510 --> 00:00:09,260
less than or equal to some specific value that we set.

3
00:00:09,270 --> 00:00:11,360
So we model them this way.

4
00:00:11,370 --> 00:00:12,990
This is what the equation looks like.

5
00:00:12,990 --> 00:00:19,200
We distinguish between probability mass functions and probability density functions and cumulative distribution

6
00:00:19,200 --> 00:00:24,840
functions by using this capital F. So with probability mass and probability density functions, we

7
00:00:24,840 --> 00:00:26,550
used a lowercase f.

8
00:00:26,550 --> 00:00:32,549
If you see a capital F, it means that we have transitioned to a cumulative distribution function instead.

9
00:00:32,549 --> 00:00:39,300
And what this says here is that the cumulative distribution function is equal to the probability

10
00:00:39,300 --> 00:00:45,900
that our discrete random variable capital X takes on some value less than or equal to this value of

11
00:00:45,900 --> 00:00:47,310
x that we set.
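Written out, the definition just described would look roughly like this. The subscript X on the capital F is a common notation and is an assumption here, since the on-screen equation is not part of the transcript.

```latex
F_X(x) = P(X \le x)
```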

12
00:00:47,340 --> 00:00:54,630
Now, we actually understand this better if we look at the graphs of the distributions for both discrete

13
00:00:54,630 --> 00:00:56,670
and continuous random variables.

14
00:00:56,670 --> 00:00:59,370
So let's start with a discrete random variable.

15
00:00:59,370 --> 00:01:02,040
So we're talking about discrete random variables here.

16
00:01:02,370 --> 00:01:07,140
We have already looked at the probability mass function for a discrete random variable.

17
00:01:07,140 --> 00:01:10,440
This is the one we've been using throughout these lessons.

18
00:01:10,440 --> 00:01:15,750
It's the probability mass function associated with rolling one six sided die one time.

19
00:01:15,750 --> 00:01:21,240
And it shows us that the probability that we get a one, two, three, four, five or six when we roll

20
00:01:21,240 --> 00:01:23,610
the die is always the same.

21
00:01:23,610 --> 00:01:26,310
That's why we see that all of these bars have equal heights.

22
00:01:26,310 --> 00:01:28,590
The probability is always one over six.

23
00:01:28,590 --> 00:01:33,540
The probability of rolling a one is one over six, the probability of rolling a two is one over six,

24
00:01:33,540 --> 00:01:40,350
etc. So this is the probability distribution showing the probability mass function for the discrete

25
00:01:40,350 --> 00:01:41,340
random variable.

26
00:01:41,340 --> 00:01:47,550
This graph on the right, on the other hand, is the cumulative distribution function that's associated

27
00:01:47,550 --> 00:01:50,100
with this same probability mass function.

28
00:01:50,100 --> 00:01:56,610
So if we were to transfer this information from the probability distribution over here to this cumulative

29
00:01:56,610 --> 00:02:00,180
distribution, this is how one graph translates to the other.

30
00:02:00,180 --> 00:02:05,700
In other words, this is the same probability here, the probability associated with rolling a six sided

31
00:02:05,700 --> 00:02:06,870
die one time.

32
00:02:06,870 --> 00:02:09,150
We're just showing the cumulative distribution.

33
00:02:09,150 --> 00:02:15,270
And so what we're saying here, when we look at the graph or what this equation over here tells us is

34
00:02:15,270 --> 00:02:19,020
that we can define this kind of cumulative probability.

35
00:02:19,020 --> 00:02:25,500
So instead of just saying what is the probability that we roll a one or roll a four or roll a six,

36
00:02:25,500 --> 00:02:31,560
this is saying what is the probability that we roll a number less than or equal to two, less than or

37
00:02:31,560 --> 00:02:34,410
equal to four or less than or equal to six, for example.

38
00:02:34,410 --> 00:02:39,630
So that word cumulative is appropriate because the probability accumulates.

39
00:02:39,630 --> 00:02:47,400
And that idea of accumulation means that the cumulative distribution is never going to decrease.

40
00:02:47,400 --> 00:02:53,610
And in fact, because we know that probability is always defined between zero and one over here on the

41
00:02:53,610 --> 00:03:00,120
left edge of the graph of our cumulative distribution function, the probability that we see is zero.

42
00:03:00,150 --> 00:03:05,340
The value of the graph along the vertical axis here is at zero, and that probability is always going

43
00:03:05,340 --> 00:03:10,920
to increase and increase and increase as we move to the right until eventually it gets to one.

44
00:03:10,920 --> 00:03:16,230
We see here we have six over six or one, the probability tops out at one.

45
00:03:16,530 --> 00:03:23,220
And so we say that the cumulative distribution has these properties, that the limit as x gets close

46
00:03:23,220 --> 00:03:29,550
to negative infinity is zero, and that the limit as x gets close to positive infinity is one.
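As a sketch, those two limit properties could be written like this, using the same F with a subscript X notation (an assumed notation, since the slide itself is not in the transcript):

```latex
\lim_{x \to -\infty} F_X(x) = 0,
\qquad
\lim_{x \to +\infty} F_X(x) = 1
```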

47
00:03:29,730 --> 00:03:34,890
Now this limit concept comes from calculus, and that's really beyond the scope of this course.

48
00:03:34,890 --> 00:03:40,260
But technically all this is really saying is that we're at a probability of zero over here on the left.

49
00:03:40,260 --> 00:03:42,060
That's where this zero comes from.

50
00:03:42,180 --> 00:03:44,460
And one over here on the right.

51
00:03:44,460 --> 00:03:46,190
That's where this one comes from.

52
00:03:46,200 --> 00:03:51,450
In other words, our cumulative distribution, the graph of our cumulative distribution is always going

53
00:03:51,450 --> 00:03:58,320
to start at zero and always increase and always end at positive one, which makes sense because again,

54
00:03:58,320 --> 00:04:01,890
we are accumulating probability as we move to the right.

55
00:04:01,890 --> 00:04:08,130
If we're rolling a six sided die one time, the probability that we roll a value less than one is of

56
00:04:08,130 --> 00:04:12,510
course going to be zero, because the lowest value we can roll is a one itself.

57
00:04:12,510 --> 00:04:17,670
That's why we see this zero value along the horizontal axis for these values

58
00:04:17,670 --> 00:04:23,730
less than one. These empty circles here indicate that this value is not a part of the graph, whereas

59
00:04:23,730 --> 00:04:27,990
these solid filled in circles indicate that this is a part of the graph.

60
00:04:27,990 --> 00:04:33,870
So what this tells us is that the probability that we roll a value less than or equal to one, we come

61
00:04:33,870 --> 00:04:40,410
over to one along the horizontal axis and then we go up and find the solid part of the graph, which

62
00:04:40,410 --> 00:04:43,080
is not this empty part here, but the solid part here.

63
00:04:43,080 --> 00:04:49,440
We come over to the vertical axis and we see that this is one over six.

64
00:04:49,440 --> 00:04:55,260
So the probability that we roll a value less than or equal to one is one sixth, the probability that

65
00:04:55,260 --> 00:04:59,870
we roll a value, let's say less than or equal to three, we come over to three.

66
00:05:00,390 --> 00:05:02,910
We go up until we find a solid part of the graph.

67
00:05:02,910 --> 00:05:04,710
So we skip past this empty circle.

68
00:05:04,740 --> 00:05:06,210
We come up here.

69
00:05:06,240 --> 00:05:09,710
Then we go over along to the vertical axis.

70
00:05:09,720 --> 00:05:11,940
This is at three over six.

71
00:05:12,090 --> 00:05:17,310
So the probability that we roll a value less than or equal to three is three over six.

72
00:05:17,310 --> 00:05:22,590
And that makes sense because less than or equal to three means we're rolling a one, a two or three,

73
00:05:22,590 --> 00:05:28,440
which represents half of the total possibilities and three over six is equal to one half.

74
00:05:28,440 --> 00:05:35,640
So we can see how those dice rolls are accumulating and eventually we get to x equals six here where

75
00:05:35,640 --> 00:05:42,060
we can say we come up to the solid part of the graph and then we come over to the vertical axis and

76
00:05:42,060 --> 00:05:46,590
we can see the probability six over six or one there, this is one.

77
00:05:46,830 --> 00:05:53,010
So the probability that we roll a value less than or equal to six is of course, one.

78
00:05:53,010 --> 00:05:56,640
So that's what the cumulative distribution function looks like

79
00:05:56,640 --> 00:05:58,860
for a discrete random variable.

80
00:05:58,860 --> 00:06:01,650
We are adding up all of those probabilities as we go.

81
00:06:01,680 --> 00:06:07,440
It would be like taking this graph of the probability mass function and adding each of these probabilities

82
00:06:07,440 --> 00:06:08,190
as we go.

83
00:06:08,190 --> 00:06:12,830
So the probability that we roll a one or less is one sixth.

84
00:06:12,900 --> 00:06:18,360
Then when we add another one sixth to that, we can say that the probability of rolling a two or less

85
00:06:18,390 --> 00:06:20,760
is two over six or one third.

86
00:06:20,790 --> 00:06:28,080
The probability of rolling a three or less is one sixth plus one sixth plus one sixth or three over

87
00:06:28,080 --> 00:06:29,430
six or one half.
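The adding-as-we-go idea can be sketched in a few lines of Python. This is only an illustration for the fair six-sided die from the lesson; the names `faces`, `pmf`, and `cdf` are made up for this example.

```python
from fractions import Fraction
from itertools import accumulate

# PMF of one fair six-sided die: every face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
pmf = {face: Fraction(1, 6) for face in faces}

# The CDF at each face is the running total of the PMF up to that face.
running_totals = accumulate(pmf[face] for face in faces)
cdf = dict(zip(faces, running_totals))

for face in faces:
    print(f"P(X <= {face}) = {cdf[face]}")
```

Using `Fraction` keeps the values exact, so three over six shows up as one half rather than a rounded decimal.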

88
00:06:29,430 --> 00:06:34,440
To give us yet another view on this, let's look at continuous random variables.

89
00:06:34,440 --> 00:06:36,420
So it's the same idea.

90
00:06:36,420 --> 00:06:41,700
We can use this cumulative distribution function for both discrete and continuous random variables.

91
00:06:41,700 --> 00:06:47,460
When we have a continuous random variable, the graph of our probability density function might look

92
00:06:47,460 --> 00:06:53,730
something like this, and if we transfer that over into a cumulative distribution function, we see

93
00:06:53,730 --> 00:06:59,790
how this probability accumulates starting down at zero and continuing to accumulate probability all

94
00:06:59,790 --> 00:07:07,650
the way up until we get to one, at which point it levels off and never rises above this value of one.

95
00:07:07,650 --> 00:07:14,520
And we can imagine here that this sort of mean value in the probability density function is represented

96
00:07:14,520 --> 00:07:17,220
here in the cumulative distribution function.

97
00:07:17,220 --> 00:07:19,800
So we have those visuals.

98
00:07:19,800 --> 00:07:26,610
If we wanted to plug into the actual function here, let's say we wanted to model the probability for

99
00:07:26,610 --> 00:07:29,670
this discrete random variable representing the die roll.

100
00:07:29,670 --> 00:07:34,380
We wanted to model the probability of rolling something less than or equal to four.

101
00:07:34,410 --> 00:07:40,530
Then our cumulative distribution function here would be capital F and then capital X, and then we would

102
00:07:40,530 --> 00:07:48,570
say of four, with lowercase x equal to four, is the probability that the discrete random variable X takes on

103
00:07:48,570 --> 00:07:51,330
a value less than or equal to four.

104
00:07:51,450 --> 00:07:57,030
If we wanted to say here for the continuous random variable, let's say that maybe this right here is

105
00:07:57,030 --> 00:08:03,120
x equals two and maybe roughly the same here as something like this x equals two. Then we could do

106
00:08:03,120 --> 00:08:08,640
the exact same thing for the continuous random variable and say that the cumulative distribution function

107
00:08:08,640 --> 00:08:16,320
we have capital F, capital X of two is equal to the probability that the continuous random variable

108
00:08:16,320 --> 00:08:20,070
represented by capital X is less than or equal to two.

109
00:08:20,100 --> 00:08:22,530
We would get that cumulative probability.

110
00:08:22,530 --> 00:08:29,130
The probability that this continuous random variable is taking on any value less than two or the exact

111
00:08:29,130 --> 00:08:29,880
value of two.

112
00:08:29,910 --> 00:08:32,549
So anything up to and including two.

113
00:08:32,549 --> 00:08:40,710
And we could find that probability on the graph by coming up from two here to the corresponding point

114
00:08:40,710 --> 00:08:43,770
on the graph and then coming over here to the vertical axis.

115
00:08:43,770 --> 00:08:51,870
And this value along the vertical axis, let's say that's about 0.85 based on where we put in this one

116
00:08:51,870 --> 00:08:57,630
here, I'm just looking at the scale between zero and one and estimating that that's about 0.85.

117
00:08:57,630 --> 00:09:03,090
And so maybe if this is the shape of the cumulative distribution function, then the probability that

118
00:09:03,090 --> 00:09:09,060
the continuous random variable takes on a value less than or equal to two is approximately 0.85 or 85%.
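Here is a small sketch of reading a continuous cumulative distribution function at x equals two in Python. The lesson does not say which distribution the curve comes from, so this assumes a normal distribution with mean 1 and standard deviation 1 purely as a stand-in; with that assumption the value happens to land close to the estimate read off the graph.

```python
from statistics import NormalDist

# Assumption: a normal distribution with mean 1 and standard deviation 1.
# The curve in the lesson is not specified; this is only a stand-in.
dist = NormalDist(mu=1.0, sigma=1.0)

# F_X(2) = P(X <= 2), the accumulated probability up to and including 2.
p_at_most_two = dist.cdf(2.0)
print(round(p_at_most_two, 4))
```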

119
00:09:09,600 --> 00:09:16,080
So again, instead of just modeling the probability of the discrete random variable, taking on a specific

120
00:09:16,080 --> 00:09:21,960
value, taking on the specific value x equals three or for the continuous random variable in the probability

121
00:09:21,960 --> 00:09:28,140
density function, the probability that we take on a value between, let's say one and two.

122
00:09:28,170 --> 00:09:30,330
So take on some value between one and two.

123
00:09:30,330 --> 00:09:35,370
That's what we did before with probability mass functions and probability density functions for discrete

124
00:09:35,370 --> 00:09:37,590
and continuous random variables respectively.

125
00:09:37,590 --> 00:09:42,840
Now, when we switch to cumulative distribution functions, we're looking at the probability that the

126
00:09:42,840 --> 00:09:50,640
random variable takes on some value up to some specified value instead of

127
00:09:50,640 --> 00:09:55,740
the probability of an exact value or the probability of some value in an interval.

128
00:09:55,740 --> 00:09:59,240
And then the only other thing we want to say about this is that we can

129
00:09:59,600 --> 00:10:05,900
use a cumulative distribution to find the probability that we do get a value in a particular interval.

130
00:10:05,900 --> 00:10:07,850
And we use this formula here.
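The formula being pointed at is most likely the standard interval identity below. The lowercase a and b for the endpoints match the letters used later in the lesson; the exact on-screen layout is an assumption.

```latex
P(a < X \le b) = F_X(b) - F_X(a)
```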

131
00:10:07,850 --> 00:10:13,730
So if we have a cumulative distribution function, whether for a discrete random variable or a continuous

132
00:10:13,730 --> 00:10:19,280
random variable, we can use the values along the cumulative distribution function to find the probability

133
00:10:19,280 --> 00:10:22,340
that the random variable takes on a value in that interval.

134
00:10:22,340 --> 00:10:27,020
So, for instance, going back to this discrete random variable, because we have concrete values here,

135
00:10:27,020 --> 00:10:35,330
it's easy to see if we want to know the probability that when we roll the die, we get a value, let's

136
00:10:35,330 --> 00:10:39,290
say between two and four.

137
00:10:39,290 --> 00:10:43,460
And notice the notation here: less than, and less than or equal to.

138
00:10:43,490 --> 00:10:50,030
So what we're saying is that we're looking for the probability that X is between two and four, not

139
00:10:50,030 --> 00:10:52,400
including two, but including four.

140
00:10:52,400 --> 00:10:58,910
So if we have an interval here, two, three, four, we're not including two, but we are including

141
00:10:58,910 --> 00:10:59,270
four.

142
00:10:59,270 --> 00:11:03,590
So the only values that would actually fall within this interval are the values three and four.

143
00:11:03,590 --> 00:11:10,490
So we set that up and then we look at our cumulative distribution indicated by this capital F and we

144
00:11:10,490 --> 00:11:15,650
find the value of that cumulative distribution function at b. So b is four.

145
00:11:15,650 --> 00:11:18,380
We come here to four and we see that up at four

146
00:11:18,380 --> 00:11:21,590
here we're at four over six, which is two thirds.

147
00:11:21,590 --> 00:11:28,160
So we get two thirds and then we subtract the value of the cumulative distribution function evaluated

148
00:11:28,160 --> 00:11:30,950
at a. Well, a is two.

149
00:11:30,950 --> 00:11:36,620
So we come up here to two, we come up to the solid circle, come over to the vertical axis, and we

150
00:11:36,620 --> 00:11:39,410
see there two over six or one third.

151
00:11:39,410 --> 00:11:45,590
So we subtract one third and the result then, two over three minus one over three, is one

152
00:11:46,290 --> 00:11:47,010
third.

153
00:11:47,010 --> 00:11:52,470
So this tells us that the probability that we get a value between two and four, not including two,

154
00:11:52,470 --> 00:11:54,900
but including four is one third.
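The same subtraction can be checked with a short Python sketch. The helper names `die_cdf` and `interval_probability` are made up for this example, and the die is assumed to be fair.

```python
from fractions import Fraction

def die_cdf(x):
    """F_X(x) = P(X <= x) for one roll of a fair six-sided die."""
    faces_at_or_below = sum(1 for face in range(1, 7) if face <= x)
    return Fraction(faces_at_or_below, 6)

def interval_probability(a, b):
    """P(a < X <= b) = F_X(b) - F_X(a): excludes a, includes b."""
    return die_cdf(b) - die_cdf(a)

# Probability of rolling more than two but at most four (a three or a four).
print(interval_probability(2, 4))
```

Because `die_cdf` counts faces at or below x, it also works for values between the faces, for example 3.5 gives the same result as 3.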

155
00:11:54,930 --> 00:12:01,950
And of course, that makes sense because we said here that this expression showing this interval implies

156
00:12:01,950 --> 00:12:03,750
getting either a three or a four.

157
00:12:03,780 --> 00:12:06,870
Those are the only two values that fall within this interval.

158
00:12:06,870 --> 00:12:12,660
And of course, out of six possibilities when we roll the die, getting anywhere from a one to a six,

159
00:12:12,660 --> 00:12:14,310
there are six possibilities.

160
00:12:14,310 --> 00:12:19,650
There are two possibilities of those six that meet this criteria that fall within this interval.

161
00:12:19,740 --> 00:12:24,660
So out of the six total possibilities, there are two that fall within the interval.

162
00:12:24,660 --> 00:12:30,090
And of course that simplifies to one third. One third of our possibilities

163
00:12:30,090 --> 00:12:31,890
meet this interval, meet this criteria.

164
00:12:31,890 --> 00:12:35,700
And so the probability that we fall within this interval is one third.

165
00:12:35,700 --> 00:12:41,730
So we can also use our cumulative distribution, whether it's for a discrete or continuous random variable

166
00:12:41,730 --> 00:12:48,030
to calculate the probability that the random variable will take on a value in this particular interval.

