1
00:00:00,120 --> 00:00:05,610
The first kind of plot we want to talk about is the scatterplot, also called a scatter graph or scatter

2
00:00:05,610 --> 00:00:11,730
chart, which shows the relationship between two dimensions or characteristics or measurements of a

3
00:00:11,730 --> 00:00:12,210
group.

4
00:00:12,210 --> 00:00:17,040
So as we walk through this idea, let's just take a set of data.

5
00:00:17,040 --> 00:00:23,280
Let's say we're running a small business here and we ship out products to customers as they order them

6
00:00:23,280 --> 00:00:24,600
from our website.

7
00:00:24,600 --> 00:00:29,670
And maybe we keep track of lots of different pieces of data about our shipments.

8
00:00:29,670 --> 00:00:32,759
And let's say this is just a snapshot here of some of that data.

9
00:00:32,759 --> 00:00:36,090
So we record the state that we're shipping to.

10
00:00:36,120 --> 00:00:39,030
We categorize those states by regions.

11
00:00:39,030 --> 00:00:45,030
So we group states into the West, Midwest, Southeast, Northeast, and other regions.

12
00:00:45,030 --> 00:00:49,110
And then we also record the number of items we send in each shipment.

13
00:00:49,110 --> 00:00:53,730
So for this first order here in this first row, the person ordered three items.

14
00:00:53,730 --> 00:00:55,590
So the item count is three.

15
00:00:55,590 --> 00:00:58,890
Their order total is $2.08.

16
00:00:58,890 --> 00:01:01,710
So they bought three very inexpensive items.

17
00:01:01,710 --> 00:01:09,780
And with just that data alone, we can create a scatterplot that compares the number of items sent and

18
00:01:09,780 --> 00:01:14,010
the order total, the total amount of money for that item count.

19
00:01:14,010 --> 00:01:20,400
So here's what that scatterplot might look like if we put item count along the horizontal axis here.

20
00:01:20,400 --> 00:01:26,730
So all the way from zero items up to ten items, we can see in our data here that the most items we've

21
00:01:26,730 --> 00:01:32,490
recorded for any shipment is ten items, and then we place order total along the vertical axis here.

22
00:01:32,490 --> 00:01:36,300
So starting at zero along the bottom up to $100.

23
00:01:36,300 --> 00:01:44,550
So item count here is a plain number, order total is in dollars, and we can see here if we skim through the order

24
00:01:44,550 --> 00:01:52,320
totals that this data is in fact sorted by order total, from the smallest to the largest order, with the most expensive

25
00:01:52,320 --> 00:01:59,520
order being $99.69, which is why this vertical axis spans from $0 to $100.

26
00:01:59,520 --> 00:02:05,910
And then if we plot each order as a data point, what we create is a scatterplot.

27
00:02:05,910 --> 00:02:12,510
So taking this first order here, the customer ordered three items and the order total was $2.08.

28
00:02:12,510 --> 00:02:17,610
So along the horizontal axis here, we could locate three items.

29
00:02:17,610 --> 00:02:22,650
If this is two items and this is four items, then three items should be about halfway between.

30
00:02:22,650 --> 00:02:31,650
And then if this is $0 and this is $20, then $2.08 should be very close to the bottom of our vertical

31
00:02:31,650 --> 00:02:33,960
axis here, very close to $0.

32
00:02:33,960 --> 00:02:36,960
And in fact, we see that data point right there.

33
00:02:36,960 --> 00:02:39,720
That is the data point for this first order.

34
00:02:39,720 --> 00:02:46,470
And we could take any other order in our data set here and locate its point on the scatter graph.

35
00:02:46,470 --> 00:02:54,660
For instance here, this order that got sent to Georgia, the person ordered nine items and spent $45.24.

36
00:02:54,660 --> 00:02:58,350
So nine items should be right about halfway between eight and ten.

37
00:02:58,350 --> 00:03:05,400
And if we go up to $45.24: if this is $40 and this is $60, then $50 should be right about here.

38
00:03:05,400 --> 00:03:07,740
$45 should be right about here.

39
00:03:07,740 --> 00:03:15,660
It looks like this is the point associated with that order of nine items for a total of $45.24.
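Locating a point by eye is linear interpolation between two tick marks on each axis. The short Python sketch below makes that explicit; the helper name `axis_fraction` is hypothetical, and the tick values are the ones read off the plot in this walkthrough.

```python
def axis_fraction(value, lo, hi):
    """Return how far `value` sits between the tick marks `lo` and `hi`:
    0.0 at `lo`, 1.0 at `hi` (plain linear interpolation)."""
    if hi == lo:
        raise ValueError("tick marks must differ")
    return (value - lo) / (hi - lo)

# First order: 3 items between the ticks at 2 and 4 items.
print(axis_fraction(3, 2, 4))                  # 0.5, i.e. halfway
# $2.08 between the $0 and $20 ticks: very close to the bottom.
print(round(axis_fraction(2.08, 0, 20), 3))   # 0.104
# Georgia order: 9 items between 8 and 10, $45.24 between $40 and $60.
print(axis_fraction(9, 8, 10))                 # 0.5
print(round(axis_fraction(45.24, 40, 60), 3)) # 0.262
```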

40
00:03:15,660 --> 00:03:21,210
So all of these orders have been plotted as individual points in this scatterplot.

41
00:03:21,210 --> 00:03:25,680
Now, a couple of things we really want to remember as we're creating scatter plots.

42
00:03:25,680 --> 00:03:32,130
The first is the most general point: what we're doing here is essentially comparing two characteristics

43
00:03:32,130 --> 00:03:34,290
of the same set of items, if you will.

44
00:03:34,290 --> 00:03:41,730
So here in this particular example, we're comparing item count and order total for individual orders.

45
00:03:41,730 --> 00:03:46,830
And we could just as easily compare different characteristics of the same set of orders.

46
00:03:46,830 --> 00:03:52,080
For instance, maybe we also keep track of the time of day when the order was placed.

47
00:03:52,110 --> 00:03:59,010
We could make a different scatterplot that shows time of day along the horizontal axis and item count

48
00:03:59,010 --> 00:04:04,050
along the vertical axis or time of day along the horizontal axis and order total along the vertical

49
00:04:04,050 --> 00:04:09,780
axis. Regardless of which two characteristics of our orders we pick,

50
00:04:09,810 --> 00:04:17,100
when we create a scatterplot we want, if we can, to put the independent variable along

51
00:04:17,100 --> 00:04:20,880
the horizontal axis and the dependent variable along the vertical axis.

52
00:04:20,880 --> 00:04:27,780
Now that's not going to be a perfect science because there may not be direct causation between the two

53
00:04:27,780 --> 00:04:29,610
characteristics that we're comparing.

54
00:04:29,610 --> 00:04:36,060
But if we can think about in general one characteristic causing the other, we want to put the characteristic

55
00:04:36,060 --> 00:04:41,760
that acts as the cause along the horizontal axis and the outcome, or effect, along the vertical axis.

56
00:04:41,760 --> 00:04:46,170
So an easy example might be height versus weight of people.

57
00:04:46,170 --> 00:04:47,730
So if we think about

58
00:04:49,020 --> 00:04:49,770
height

59
00:04:50,710 --> 00:04:51,640
versus

60
00:04:52,970 --> 00:04:53,780
weight.

61
00:04:53,870 --> 00:04:58,470
This is a good example because it reveals the shortcomings of this idea.

62
00:04:58,520 --> 00:05:04,670
But in general, we have some intuition that the taller someone is, the more they're going to weigh.

63
00:05:04,730 --> 00:05:08,090
Of course, that's a very broad generalization.

64
00:05:08,090 --> 00:05:14,180
We often see taller people who weigh less than shorter people or shorter people who weigh more than

65
00:05:14,180 --> 00:05:15,090
taller people.

66
00:05:15,110 --> 00:05:18,550
So obviously, there's not perfect causation here.

67
00:05:18,560 --> 00:05:24,020
Being taller doesn't necessarily mean that you will be heavier, but it probably does make a little

68
00:05:24,020 --> 00:05:27,780
more sense to put height along the horizontal axis.

69
00:05:27,800 --> 00:05:35,390
Because in general, as a very broad idea, as height increases, weight is likely to increase.

70
00:05:35,390 --> 00:05:40,830
So height is the cause, and weight is the effect or outcome.

71
00:05:40,850 --> 00:05:46,070
Now, if we switch these, putting weight along the horizontal axis and height

72
00:05:46,070 --> 00:05:50,160
along the vertical axis, it's usually not going to cause a big problem.

73
00:05:50,180 --> 00:05:51,770
Same thing with this example here.

74
00:05:51,770 --> 00:05:57,080
We could flip item count and order total, putting item count along the vertical axis and order total along

75
00:05:57,080 --> 00:06:02,120
the horizontal axis, and we'd still be able to do the math that explores this relationship.

76
00:06:02,120 --> 00:06:03,320
Same with height and weight.

77
00:06:03,320 --> 00:06:10,430
But if possible, if we can somewhat identify that a height increase tends to cause a weight increase,

78
00:06:10,430 --> 00:06:15,170
we would put height along the horizontal axis and weight along the vertical axis, in the same way that

79
00:06:15,170 --> 00:06:19,640
maybe, in general, as item count increases, order total increases.

80
00:06:19,640 --> 00:06:25,160
We like to generally put that independent variable along the horizontal axis and the dependent variable

81
00:06:25,160 --> 00:06:26,480
along the vertical axis.

82
00:06:26,480 --> 00:06:32,120
But again, this is not a perfect science and switching these two isn't usually going to cause us trouble.

83
00:06:32,120 --> 00:06:37,490
Along these lines, we also want to remember a saying we've heard before:

84
00:06:38,170 --> 00:06:39,400
correlation

85
00:06:40,510 --> 00:06:42,490
does not necessarily equal

86
00:06:43,980 --> 00:06:45,000
causation.

87
00:06:45,000 --> 00:06:51,750
So just because weight tends to increase as height increases doesn't mean that height perfectly dictates

88
00:06:51,750 --> 00:06:58,350
weight or just because order total tends to increase as item count increases doesn't mean that increasing

89
00:06:58,350 --> 00:07:01,620
the item count is guaranteed to increase the order total.

90
00:07:01,650 --> 00:07:06,990
Two things can be correlated, meaning that they can increase together or they can decrease together,

91
00:07:06,990 --> 00:07:08,820
or they can move in opposite directions.

92
00:07:08,820 --> 00:07:14,730
But the relationship between the two characteristics, the correlation between the two characteristics

93
00:07:14,730 --> 00:07:20,730
doesn't necessarily mean that a change in one characteristic is actually the cause of the change

94
00:07:20,730 --> 00:07:27,150
in the other characteristic. There could be some third, confounding variable or some other outside force

95
00:07:27,150 --> 00:07:32,490
that's causing the correlation between the two characteristics, or the correlation we're seeing might

96
00:07:32,490 --> 00:07:33,810
just be coincidental.

97
00:07:33,840 --> 00:07:36,960
There's no guarantee necessarily of causation.

98
00:07:36,960 --> 00:07:42,900
So we need to be really careful: just because we plot our data on a scatterplot and maybe

99
00:07:42,900 --> 00:07:49,950
see some general trend, that does not prove that we somehow have causation between the two characteristics.

100
00:07:49,950 --> 00:07:54,750
All we can really say is that we see some kind of a trend in the data.

101
00:07:54,750 --> 00:08:00,390
In this case, the example we're using, it appears that as item count increases, order total

102
00:08:00,390 --> 00:08:06,090
increases or as order total increases, item count increases because that's the direction we're seeing

103
00:08:06,090 --> 00:08:07,920
in this trend line that we've plotted.

104
00:08:07,950 --> 00:08:11,610
Now, that being said, let's talk about this trend line a little more.

105
00:08:11,610 --> 00:08:17,820
When we're dealing with a scatterplot, we call this the trend line, or sometimes the regression

106
00:08:17,820 --> 00:08:23,190
line, the best-fit line, the line of best fit, or even the least squares line. Those are all basically names

107
00:08:23,190 --> 00:08:27,990
for the same thing, which is this line that indicates the trend through the data.

108
00:08:27,990 --> 00:08:32,610
And we actually have mathematical formulas for calculating this line.

109
00:08:32,610 --> 00:08:39,630
Now, we can do this by hand with this set of formulas here, but of course it's going to be much easier

110
00:08:39,630 --> 00:08:46,650
for us to plug our data into a calculator or software and have it calculate the equation of this trend

111
00:08:46,650 --> 00:08:47,730
line for us.

112
00:08:47,730 --> 00:08:53,880
But even if we do that, these are the formulas that our software or our calculator will use to find

113
00:08:53,880 --> 00:08:55,170
the equation of this line.
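The summation formulas that a calculator or software package applies can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual least-squares formulas M = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and B = (Σy − MΣx) / n, the same ones this section works through by hand; the function name `least_squares_line` and the sample points are hypothetical.

```python
from typing import Sequence, Tuple

def least_squares_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (m, b) for the least-squares trend line y_hat = m*x + b.

    m = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
    b = (sum(y) - m*sum(x)) / n
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of the products x*y
    sum_x2 = sum(x * x for x in xs)              # sum of the squares x^2
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n
    return m, b

# Hypothetical points lying exactly on y = 2x + 1, so the fit should recover m = 2, b = 1.
print(least_squares_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```

On Python 3.10 or later, the standard library's `statistics.linear_regression(xs, ys)` returns a slope and intercept that should match, which makes a handy cross-check.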

114
00:08:55,170 --> 00:09:01,560
Now, if you've ever taken algebra, you know that the equation of a line is given by y equals Mx

115
00:09:01,560 --> 00:09:02,400
plus B.

116
00:09:02,400 --> 00:09:09,900
Now M, the value in front of the x, is the slope of the line.

117
00:09:09,900 --> 00:09:13,860
It tells us the rate at which the line is increasing or decreasing.

118
00:09:13,860 --> 00:09:20,340
So a larger positive value for M means the line is steeper, it's increasing faster, whereas a smaller positive value for

119
00:09:20,340 --> 00:09:23,820
M means the line is shallower, it's increasing more slowly.

120
00:09:23,820 --> 00:09:27,780
A negative value for M indicates that the trend is decreasing.

121
00:09:27,780 --> 00:09:31,320
The line would slope downward instead, going in the opposite direction.

122
00:09:31,320 --> 00:09:36,840
B is the point at which the line intersects the vertical axis.

123
00:09:36,840 --> 00:09:45,270
So if we picture the vertical axis here, then B is this value right here.

124
00:09:45,270 --> 00:09:51,330
So in our case it looks like B is a little less than ten because we have here zero and 20 along the

125
00:09:51,330 --> 00:09:56,310
vertical axis, and this looks like it's a little less than halfway from 0 to 20.

126
00:09:56,310 --> 00:10:02,010
So I'm guessing that the value of B is a little less than ten, and we'll actually calculate this value

127
00:10:02,010 --> 00:10:02,820
to find out.

128
00:10:02,820 --> 00:10:10,170
So if we have the slope and we have the y intercept, then we can plug in any value of x and this equation

129
00:10:10,170 --> 00:10:12,840
will return to us a value for y.

130
00:10:12,840 --> 00:10:20,730
So while this formula in algebra is normally just y equals Mx plus B, the little hat on the y here,

131
00:10:20,730 --> 00:10:22,170
we call this y hat.

132
00:10:22,200 --> 00:10:27,870
This indicates that we're talking specifically about a trend line and we want to distinguish between

133
00:10:27,870 --> 00:10:34,020
the equation of a trend line and the equation of just a regular line, because we want to be super aware

134
00:10:34,020 --> 00:10:39,480
of the fact that the equation of the trend line is sort of this predictive trend.

135
00:10:39,480 --> 00:10:44,790
And what we notice here, if we look at this trend line, is that it actually doesn't pass through a

136
00:10:44,790 --> 00:10:47,370
single point in our data set, right?

137
00:10:47,370 --> 00:10:49,020
We plotted all these points.

138
00:10:49,020 --> 00:10:52,920
This little point down here was the point for three items and $2.08.

139
00:10:52,920 --> 00:10:58,110
This point right here was the point for nine items and $45.24.

140
00:10:58,110 --> 00:11:06,090
This line is supposed to predict the value for order total when we pick a particular value for item

141
00:11:06,090 --> 00:11:06,570
count.

142
00:11:06,570 --> 00:11:12,570
So what this trend line is really saying is that let's say we consider an item count of eight.

143
00:11:12,570 --> 00:11:18,090
If we come here to item count of eight, we come straight up from the horizontal axis until we meet

144
00:11:18,090 --> 00:11:23,190
the trend line and then we head straight over to the vertical axis.

145
00:11:23,190 --> 00:11:30,510
From that point it looks like we get to a value here of maybe roughly 65, let's say.

146
00:11:30,510 --> 00:11:36,030
If that's the case, then what this trend line is telling us is that the expected order total for an

147
00:11:36,030 --> 00:11:39,570
order with eight items is about $65.
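Reading a value off the trend line amounts to plugging x into y hat = Mx + B. A minimal Python sketch, assuming the coefficients this section calculates further on (M ≈ 7.3132, B ≈ 8.8063); the function name `predicted_total` is hypothetical. It gives about $67.31 for eight items, in the same neighborhood as the rough $65 read off the graph.

```python
# Coefficients assumed from the calculation later in this section (rounded to 4 places).
M = 7.3132  # slope: extra dollars per additional item
B = 8.8063  # y-intercept

def predicted_total(item_count: float) -> float:
    """Expected order total (y hat) for a given item count."""
    return M * item_count + B

print(round(predicted_total(8), 2))  # 67.31
```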

148
00:11:39,570 --> 00:11:43,500
But of course, if we look at our raw data, we don't see any

149
00:11:43,570 --> 00:11:47,540
orders where the item count is eight and the total is $65.

150
00:11:47,560 --> 00:11:54,760
In fact, if we scan through here, the only time where we have an item count of eight is these three

151
00:11:54,760 --> 00:11:55,840
orders right here.

152
00:11:55,840 --> 00:12:01,060
And we can see that the order total for each of those is over $95 each time.

153
00:12:01,060 --> 00:12:06,880
In fact, we see those three data points right here, and those values aren't anywhere close to $65.

154
00:12:06,880 --> 00:12:13,150
We can also scan through this list and see orders that have a total near $65.

155
00:12:13,150 --> 00:12:22,660
What we find is that the two closest orders are these right here, where the totals were $58.37 and $70.01

156
00:12:22,660 --> 00:12:23,620
and the item counts

157
00:12:23,620 --> 00:12:27,850
were ten and six items respectively, not eight items.
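The gap between an actual order and the trend line is usually called a residual: the actual total minus the expected total. A small Python sketch, again assuming the coefficients M ≈ 7.3132 and B ≈ 8.8063 calculated later in this section (the helper name `residual` is hypothetical), shows how far the two orders just mentioned sit from the line.

```python
# Trend-line coefficients assumed from the calculation later in this section.
M, B = 7.3132, 8.8063

def residual(item_count, order_total):
    """Actual order total minus the trend line's expected total for that item count."""
    return order_total - (M * item_count + B)

# The two orders whose totals are closest to $65, as (item count, order total).
for count, total in [(10, 58.37), (6, 70.01)]:
    print(count, total, round(residual(count, total), 2))
# 10 58.37 -23.57
# 6 70.01 17.32
```

The ten-item order falls about $23.57 below the line and the six-item order about $17.32 above it, which is why neither lands on it.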

158
00:12:27,850 --> 00:12:34,720
So what we need to realize is that the trend line does not reproduce the actual, specific

159
00:12:34,720 --> 00:12:37,150
data values in our data set.

160
00:12:37,150 --> 00:12:43,840
What it does instead is compute what it believes are the expected values for order total based on item

161
00:12:43,840 --> 00:12:51,160
count or vice versa. It gives us the line that minimizes the sum of the squared vertical distances from the points to the line. So

162
00:12:51,160 --> 00:12:56,380
if we're trying to make real-world decisions using a trend line, we need to be really careful,

163
00:12:56,380 --> 00:13:02,470
because it only gives us a guideline that we can use to estimate what we might see.

164
00:13:02,500 --> 00:13:07,690
Of course, using this example, we know that the trend line estimates that if someone orders eight

165
00:13:07,690 --> 00:13:13,510
items, we can probably expect that order to be around $65 on average.

166
00:13:13,510 --> 00:13:20,860
But as we can see from our data, we are certainly not guaranteed that exact order total if we pick

167
00:13:20,860 --> 00:13:22,000
an eight item order.

168
00:13:22,000 --> 00:13:26,530
So we can somewhat use this trend line through a scatterplot to make predictions.

169
00:13:26,530 --> 00:13:30,550
But we just have to be aware of the fact that it's only an expected value.

170
00:13:30,550 --> 00:13:34,270
It's only an average for order total based on item count.

171
00:13:34,300 --> 00:13:38,020
It's not perfectly predictive of every order we'll get.

172
00:13:38,020 --> 00:13:45,550
If it was perfectly predictive, then every single dot in our scatterplot would be exactly along this

173
00:13:45,550 --> 00:13:46,090
line.

174
00:13:46,090 --> 00:13:52,690
So that being said, let's just walk through this math using the table we've created here so that we

175
00:13:52,690 --> 00:13:59,230
can see how to calculate M and B, and then use this equation to plot the trend line.

176
00:13:59,230 --> 00:14:03,760
So if we have our raw data here, we need a few things.

177
00:14:03,760 --> 00:14:09,490
We need the value of N, which is the count of data points in our set.

178
00:14:09,520 --> 00:14:15,850
If we count the number of rows here, we have 25 orders.

179
00:14:15,850 --> 00:14:17,560
We've indicated that here.

180
00:14:17,560 --> 00:14:21,670
So in our case, N is equal to 25.

181
00:14:21,850 --> 00:14:26,140
So let's look at what it looks like to plug everything into the formula for M.

182
00:14:26,170 --> 00:14:28,260
We start with 25.

183
00:14:28,270 --> 00:14:33,340
Then we multiply that by the sum of all of the products of X and Y.

184
00:14:33,340 --> 00:14:39,670
So we're going to say here that item count represents X and that order total represents Y, which makes

185
00:14:39,670 --> 00:14:45,490
sense because we put item count along the horizontal axis, which is the x axis, and we put order total

186
00:14:45,490 --> 00:14:48,070
along the vertical axis, which is the Y axis.

187
00:14:48,070 --> 00:14:55,540
So what we're doing here with this sum of all the products of X and Y is we're multiplying X by Y for

188
00:14:55,540 --> 00:14:56,680
every order.

189
00:14:56,680 --> 00:15:00,430
So we're multiplying item count and order total for each order.

190
00:15:00,430 --> 00:15:06,970
And we see that here in this column, the product of count and total, that is, item count times order total.

191
00:15:06,970 --> 00:15:16,450
So if we multiply three by $2.08, we get $6.24 or just 6.24 without the units of dollars.

192
00:15:16,450 --> 00:15:19,360
If we multiply six by five, we get 30.

193
00:15:19,360 --> 00:15:23,020
If we multiply one by 9.65, we get 9.65.

194
00:15:23,020 --> 00:15:28,510
So we're just multiplying X and Y or these two values to get the product.
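The two helper columns in the table, X·Y and X², are built row by row. A minimal Python sketch using only the first three rows read aloud here (3 items for $2.08, 6 items for $5.00, 1 item for $9.65); the full 25-row table isn't reproduced in this transcript, and the function name `helper_columns` is hypothetical.

```python
# First three rows of the order table, as (item count x, order total y).
rows = [(3, 2.08), (6, 5.00), (1, 9.65)]

def helper_columns(rows):
    """Return the (x*y, x**2) pair for each (x, y) row; x*y is rounded to cents."""
    return [(round(x * y, 2), x ** 2) for x, y in rows]

for (x, y), (xy, x2) in zip(rows, helper_columns(rows)):
    print(f"x={x}  y={y:.2f}  x*y={xy:.2f}  x^2={x2}")
# x=3  y=2.08  x*y=6.24  x^2=9
# x=6  y=5.00  x*y=30.00  x^2=36
# x=1  y=9.65  x*y=9.65  x^2=1
```

Summing each column over all 25 rows gives the Σxy and Σx² totals used in the slope formula.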

195
00:15:28,720 --> 00:15:35,800
Then this here, the sum of all the products means we take all these products and we sum them, we add

196
00:15:35,800 --> 00:15:36,640
them together.

197
00:15:37,030 --> 00:15:41,500
And if we sum all of these values, we get 8,806.38.

198
00:15:41,620 --> 00:15:47,350
So we multiply 25 by 8,806.38.

199
00:15:47,770 --> 00:15:51,670
Then we subtract the sum of all of the x values multiplied by the sum of all of the y values.

200
00:15:51,670 --> 00:15:53,620
So we've said x is item count.

201
00:15:53,620 --> 00:15:58,480
If we sum all of these item counts, we get a sum of 138 right here.

202
00:15:58,600 --> 00:16:04,870
So that's 138 multiplied by the sum of all the y's, and y is order total.

203
00:16:04,870 --> 00:16:11,710
So if we sum all of these order totals, we get a sum of 1,229.38.

204
00:16:11,710 --> 00:16:23,920
So we multiply 138 by 1,229.38. Then we divide by N, which we already said is the count of orders,

205
00:16:23,920 --> 00:16:29,020
25, multiplied by the sum of all of the x-squared values.

206
00:16:29,020 --> 00:16:31,420
In other words, X is item count.

207
00:16:31,420 --> 00:16:34,150
So we want to take every item count and square it.

208
00:16:34,150 --> 00:16:36,580
So we take three, we square it, we get nine.

209
00:16:36,580 --> 00:16:43,120
That's this column here, count squared. We take six, we square it, and we get 36.

210
00:16:43,300 --> 00:16:47,470
One squared is one, zero squared is zero, four squared is 16.

211
00:16:47,470 --> 00:16:54,010
So we square all of the X's and once we've squared them all, we add them together and the total is

212
00:16:54,010 --> 00:16:55,540
1038.

213
00:16:55,570 --> 00:16:57,850
That is the sum of all the x squared.

214
00:16:57,850 --> 00:17:05,950
So we multiply 25 by 1,038, and then we subtract from this the sum of all the X's, squared.

215
00:17:05,950 --> 00:17:08,800
And once we get the sum of all the X's, then we square that value.

216
00:17:08,800 --> 00:17:13,300
Well, the sum of all the X's we already found, we take all the X's, we add them together, we get

217
00:17:13,300 --> 00:17:14,500
138.

218
00:17:14,500 --> 00:17:16,869
So that is the sum of all the X's.

219
00:17:16,869 --> 00:17:17,890
But then we square that.

220
00:17:17,890 --> 00:17:20,859
So we're taking 138 and squaring it.

221
00:17:20,859 --> 00:17:26,079
And here's the sum squared value 138 squared is 19,044.

222
00:17:26,079 --> 00:17:28,390
So we're subtracting

223
00:17:29,190 --> 00:17:30,180
19,044.
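The denominator mixes two quantities that are easy to confuse: the sum of the squares, Σx², and the square of the sum, (Σx)². A quick Python check using the totals quoted in this section (n = 25, Σx = 138, Σx² = 1,038), plus the first three item counts (3, 6, 1) to show the two really differ:

```python
n = 25          # number of orders
sum_x = 138     # total of the item-count column
sum_x2 = 1038   # total of the count-squared column

square_of_sum = sum_x ** 2                 # add first, then square
denominator = n * sum_x2 - square_of_sum   # n * sum(x^2) - (sum x)^2
print(square_of_sum, denominator)          # 19044 6906

# The two are not interchangeable. With just the first three item counts:
first_counts = [3, 6, 1]
print(sum(c ** 2 for c in first_counts))   # 46  (sum of the squares)
print(sum(first_counts) ** 2)              # 100 (square of the sum)
```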

224
00:17:30,210 --> 00:17:37,400
And if we use a calculator to do the math here, what we get is approximately 7.3132

225
00:17:37,410 --> 00:17:43,380
when we round to four decimal places. That's the value of M, the slope of the trend line,

226
00:17:43,380 --> 00:17:49,290
which means that for every one unit we move to the right along the horizontal axis, we'll move up about

227
00:17:49,290 --> 00:17:57,180
7.3 units along the vertical axis, meaning that for every one item we add to item count, we increase

228
00:17:57,180 --> 00:18:00,420
order total by about $7.31.

229
00:18:00,450 --> 00:18:02,830
That's what that slope is telling us.
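The slope calculation can be checked in Python from the column totals quoted in this section (n = 25, Σx = 138, Σy = 1,229.38, Σxy = 8,806.38, Σx² = 1,038). This is a sketch of the same formula, not a reproduction of any particular calculator's workflow:

```python
n = 25
sum_x, sum_y = 138, 1229.38
sum_xy, sum_x2 = 8806.38, 1038

numerator = n * sum_xy - sum_x * sum_y    # 25*8806.38 - 138*1229.38
denominator = n * sum_x2 - sum_x ** 2     # 25*1038 - 138^2 = 6906
m = numerator / denominator
print(round(m, 4))                        # 7.3132

# Interpretation: each additional item adds about $7.31 to the expected order total.
```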

230
00:18:02,850 --> 00:18:09,090
Now all we have to do is calculate B using this formula. To do that, we start with the sum of all the

231
00:18:09,090 --> 00:18:14,460
y's, which we know is 1,229.38. So we get

232
00:18:15,210 --> 00:18:16,020
1,229.38.

233
00:18:16,110 --> 00:18:17,520
And then we subtract.

234
00:18:17,520 --> 00:18:20,850
We use the value of M that we just calculated.

235
00:18:20,850 --> 00:18:23,950
So M is about 7.3132.

236
00:18:23,970 --> 00:18:33,810
So we say 7.3132, and then we multiply that by the sum of the x's, which we know is 138, so 138.

237
00:18:34,200 --> 00:18:42,990
And then we divide this whole thing by the number of orders, N equals 25, so we divide by 25.

238
00:18:42,990 --> 00:18:51,960
And the result there is approximately 8.8063 when we round to four decimal places, which means that

239
00:18:51,960 --> 00:19:03,000
the equation of our trend line using this formula here is y hat approximately equal to 7.3132 times

240
00:19:03,000 --> 00:19:08,250
x plus 8.8063.
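The intercept follows from the same totals and the rounded slope. A minimal Python sketch, assuming M = 7.3132 as computed in this section:

```python
n = 25
sum_x, sum_y = 138, 1229.38
m = 7.3132                        # slope, rounded to four decimal places

b = (sum_y - m * sum_x) / n       # (1229.38 - 7.3132*138) / 25
print(round(b, 4))                # 8.8063
print(f"y_hat = {m}x + {round(b, 4)}")
# y_hat = 7.3132x + 8.8063
```

Using the unrounded slope instead changes B only past the fourth decimal place, so the rounded equation is the same either way.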

241
00:19:08,250 --> 00:19:13,530
Now, we've already plotted the trend line here, but if we wanted to use this equation to plot it because

242
00:19:13,530 --> 00:19:18,300
let's say we didn't have it yet, we just had the scatterplot and now we want to sketch in this trend

243
00:19:18,300 --> 00:19:19,950
line, here's what we would do.

244
00:19:19,980 --> 00:19:24,930
Remember that B here is the y intercept.

245
00:19:24,930 --> 00:19:28,830
This is the value where the trend line intersects the Y axis.

246
00:19:28,830 --> 00:19:35,610
So we would come here along the y axis, the vertical axis, and we would locate about 8.8, and that's

247
00:19:35,610 --> 00:19:37,560
this value we saw earlier.

248
00:19:37,560 --> 00:19:38,670
It's about 8.8.

249
00:19:38,670 --> 00:19:41,340
Remember before we said we thought it was a little less than ten?

250
00:19:41,340 --> 00:19:42,900
Well, it's about 8.8.

251
00:19:42,900 --> 00:19:45,150
So we would start there with this point.

252
00:19:45,150 --> 00:19:54,210
We'd put a point there along the axis, and then this slope of 7.3132 tells us, like we said

253
00:19:54,210 --> 00:20:01,020
before, that for every one unit we move over along the horizontal axis, we move up about 7.3 units along

254
00:20:01,020 --> 00:20:08,880
the vertical axis, which means that when we move out to X equals one from 8.8, we should go up about

255
00:20:08,880 --> 00:20:09,930
7.3.

256
00:20:09,930 --> 00:20:16,320
So at x equals one, we should get about 8.8 plus 7.3 is 16.1.

257
00:20:16,470 --> 00:20:20,520
So we would come up and we would locate about

258
00:20:21,390 --> 00:20:23,550
16.1, right about here.

259
00:20:23,550 --> 00:20:28,430
We could see that that's maybe at 16 here along the vertical axis.

260
00:20:28,440 --> 00:20:33,690
Then we would move over one more unit to X equals two and we would move up another 7.3.

261
00:20:33,690 --> 00:20:45,390
So from 16.1, adding 7.3, again we get about 23.4 and we can see here at two a value of about 23.4.

262
00:20:45,390 --> 00:20:48,240
So you can see that line starting to sketch out.

263
00:20:48,240 --> 00:20:49,890
That's how to do it by hand.
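The by-hand procedure (start at the intercept, then step right one unit at a time, adding M each step) can be written as a short Python helper. The function name `sketch_points` is hypothetical; it simply evaluates y hat = Mx + B at a few x values, using the coefficients found in this section.

```python
def sketch_points(m, b, x_values):
    """Return (x, y_hat) pairs to plot by hand; each +1 step in x adds m to y_hat."""
    return [(x, round(m * x + b, 4)) for x in x_values]

for x, y in sketch_points(7.3132, 8.8063, range(3)):
    print(x, y)
# 0 8.8063
# 1 16.1195
# 2 23.4327
```

Only two points are strictly needed to draw the line; x = 0 and x = 10, for example, give about 8.81 and 81.94, which spans the whole item-count axis.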

264
00:20:49,920 --> 00:20:55,980
Of course, if we just plug this line into any graphing calculator, computer algebra system or software

265
00:20:55,980 --> 00:21:02,370
program, it will plot this line for us without us having to find points to plot it out manually.

266
00:21:02,370 --> 00:21:10,410
But once we have even just two points, then we can sketch in a line that passes through those two points

267
00:21:10,410 --> 00:21:12,870
and we see that we get the trend line.

268
00:21:12,870 --> 00:21:18,300
Now we'll talk much more about scatter plots and trend lines when we talk later in the course about

269
00:21:18,300 --> 00:21:18,960
regression.

270
00:21:18,960 --> 00:21:25,110
But this is the general idea behind using a set of raw data to create a scatterplot,

271
00:21:25,110 --> 00:21:31,110
how we can read and interpret a scatterplot, and then how to use these formulas to actually build the

272
00:21:31,110 --> 00:21:35,280
equation of the trend line that runs through the data in our plot.

