1
00:00:00,120 --> 00:00:03,090
Now we want to look at another type of plot, which is the line plot.

2
00:00:03,090 --> 00:00:09,720
Line plots are great when we want to create a sketch of a data set where there's a continuous

3
00:00:09,720 --> 00:00:12,090
relationship between the data points.

4
00:00:12,090 --> 00:00:15,630
For example, time and temperature are both great examples of this.

5
00:00:15,630 --> 00:00:22,560
So for instance, if we have the average high temperature in degrees Fahrenheit for each month in a

6
00:00:22,560 --> 00:00:29,460
city, then we can display each month along the horizontal axis of our line plot and then sketch in

7
00:00:29,460 --> 00:00:33,540
each of these average high temperatures in line with the corresponding month.

8
00:00:33,540 --> 00:00:36,120
And our line plot would look something like this.

9
00:00:36,120 --> 00:00:42,060
So we have the months along the horizontal axis and then along the vertical axis, we have this range

10
00:00:42,060 --> 00:00:48,480
of temperatures in degrees Fahrenheit and we can see that here in January, we've plotted 69 degrees

11
00:00:48,480 --> 00:00:49,260
Fahrenheit.

12
00:00:49,260 --> 00:00:55,740
In February, we've plotted 73 degrees, in March, we've plotted 78 degrees, etc. And once we plot

13
00:00:55,740 --> 00:00:58,950
all of those points, we connect them with a line.

14
00:00:59,100 --> 00:01:05,430
Now, realize here that connecting all of the data points in a scatterplot with a line like this one

15
00:01:05,430 --> 00:01:06,660
wouldn't make sense.

16
00:01:06,660 --> 00:01:10,920
Because if you remember when we looked at scatter plots, the dots were all over the place.

17
00:01:10,920 --> 00:01:14,520
They don't necessarily connect to each other in a logical way.

18
00:01:14,550 --> 00:01:21,510
The crucial thing here is that we use a line plot when the values in between each point on our line

19
00:01:21,510 --> 00:01:22,920
plot make sense.

20
00:01:22,920 --> 00:01:26,850
So, for instance, the average high in January is 69 degrees Fahrenheit.

21
00:01:26,850 --> 00:01:30,090
The average high in February is 73 degrees Fahrenheit.

22
00:01:30,120 --> 00:01:37,740
It makes sense to think about the temperature increasing from 69 degrees to 73 degrees between January

23
00:01:37,740 --> 00:01:38,490
and February.

24
00:01:38,490 --> 00:01:44,190
Those in between values of 70, 71, 72 degrees, for example, make sense.

25
00:01:44,190 --> 00:01:50,280
And so connecting these temperature values with a line gives us a logical kind of plot.

26
00:01:50,280 --> 00:01:54,300
Whereas connecting the points in a scatterplot with a line doesn't make sense.

27
00:01:54,300 --> 00:01:58,950
That's why we used the trend line to sketch a trend through the scatterplot.

28
00:01:58,950 --> 00:02:05,250
Whereas we use a line like this one to connect data points where the values in between each point make

29
00:02:05,250 --> 00:02:08,190
sense in the context of the kind of data we have.

30
00:02:08,190 --> 00:02:10,800
So we said this works well for things like temperature.

31
00:02:10,800 --> 00:02:13,500
Also for any kind of measurement of time.

32
00:02:13,500 --> 00:02:19,560
If we have some measurement of time along the vertical axis, maybe in minutes, let's say here

33
00:02:19,560 --> 00:02:24,480
we have 60 minutes, 70 minutes, 80 minutes, 90 minutes instead of degrees Fahrenheit.

34
00:02:24,510 --> 00:02:29,370
The values in between there make sense. Between 60 minutes and 70 minutes,

35
00:02:29,370 --> 00:02:31,800
we have all those continuous time values.

36
00:02:31,800 --> 00:02:39,030
So it's logical if we have data in terms of time to use a line plot for a representation of that data

37
00:02:39,030 --> 00:02:39,570
set.

38
00:02:39,570 --> 00:02:46,470
In other words, we use line plots to show changes over time or a connection between data points.

39
00:02:46,470 --> 00:02:52,410
So if showing that connection doesn't really make sense, then a line plot probably isn't the best plot

40
00:02:52,410 --> 00:02:53,070
to use.

41
00:02:53,070 --> 00:02:58,740
To give a counterexample, let's say we're keeping track of the number of times that the Summer Olympics

42
00:02:58,740 --> 00:03:00,720
has been hosted on each continent.

43
00:03:00,720 --> 00:03:05,610
And so along the horizontal axis, we have continents like Europe,

44
00:03:06,410 --> 00:03:11,390
Asia, North America, etc.

45
00:03:11,390 --> 00:03:17,060
And then we have a count along the vertical axis for the number of times that each continent has hosted

46
00:03:17,060 --> 00:03:18,380
the Summer Olympic Games.

47
00:03:18,410 --> 00:03:24,290
Well, in theory, we could plot the value for each continent, plot the value for Europe, for Asia,

48
00:03:24,290 --> 00:03:26,930
for North America, and connect them with a line.

49
00:03:26,930 --> 00:03:33,230
But it doesn't really make sense to connect those discrete data points with a line because they're not

50
00:03:33,230 --> 00:03:35,300
really related to each other.

51
00:03:35,300 --> 00:03:41,060
Or another way to put it, there's no in between values between Europe and Asia or between Asia and

52
00:03:41,060 --> 00:03:43,100
North America that make sense.

53
00:03:43,100 --> 00:03:47,720
These are just discrete buckets or discrete categories that are separate from one another.

54
00:03:47,720 --> 00:03:54,020
And so we wouldn't connect those data points with a line graph, whereas something like these temperature

55
00:03:54,020 --> 00:03:59,540
values have those in between values, we can connect them with a line plot and it makes a lot more sense.

56
00:03:59,570 --> 00:04:05,060
Now, whenever we have a line plot, we want to be able to interpret what we're seeing.

57
00:04:05,060 --> 00:04:12,710
Of course, each point along the line graph gives us a particular value, but we can also look at the

58
00:04:12,710 --> 00:04:16,190
slope or the change between certain points.

59
00:04:16,190 --> 00:04:21,800
For instance, let's look here at the change between August and September.

60
00:04:21,800 --> 00:04:29,240
So here we're at August and then here is September and then this is October.

61
00:04:29,240 --> 00:04:34,820
What we can see is that the change between August and September is much less than the change between

62
00:04:34,820 --> 00:04:36,350
September and October.

63
00:04:36,350 --> 00:04:39,590
And we see that reflected in the slope of each line.

64
00:04:39,620 --> 00:04:45,590
The slope between August and September is much shallower, whereas the slope between September and October

65
00:04:45,590 --> 00:04:46,790
is much steeper.

66
00:04:46,790 --> 00:04:53,840
So the steeper each segment is, the bigger the change, in this case in temperature. The more shallow

67
00:04:53,870 --> 00:04:58,250
the particular segment, the less change in temperature we have.

68
00:04:58,250 --> 00:05:05,030
So a line plot can help us see where the greatest rate of change occurs and where the least amount of

69
00:05:05,030 --> 00:05:06,050
change occurs.

70
00:05:06,050 --> 00:05:11,330
For instance, the change up here between June, July, August, September is pretty shallow.

71
00:05:11,330 --> 00:05:12,500
There's not a lot changing.

72
00:05:12,500 --> 00:05:14,180
The temperature is pretty consistent.

73
00:05:14,180 --> 00:05:20,450
But to go, for instance, from March to June, the temperature is changing significantly every month.

74
00:05:20,450 --> 00:05:25,790
To go from September to December, the temperature's falling off significantly every month.

75
00:05:25,790 --> 00:05:32,030
So we want to pay attention both to the values along the line plot, but also to the change in between

76
00:05:32,030 --> 00:05:37,310
each value, because that slope, depending on how shallow or how steep it is, gives us a good idea

77
00:05:37,310 --> 00:05:40,880
of the rate of change in that part of the data.
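
To make the idea of slope concrete, here is a minimal Python sketch that computes the month-to-month change for the three values quoted earlier in this lesson (69, 73, and 78 degrees Fahrenheit for January, February, and March). The function name `monthly_changes` and the variable names are assumptions made for illustration. Because the months are evenly spaced, the change between two neighboring months is the slope of that segment.

```python
# Average highs quoted in the lesson for the first three months (degrees Fahrenheit).
months = ["Jan", "Feb", "Mar"]
avg_high_f = [69, 73, 78]


def monthly_changes(values):
    """Return the change between each pair of consecutive values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


changes = monthly_changes(avg_high_f)
segments = [f"{a}->{b}" for a, b in zip(months, months[1:])]

for segment, change in zip(segments, changes):
    print(f"{segment}: {change:+d} degrees per month")

# The segment with the largest absolute change is the steepest one on the plot.
steepest = max(zip(segments, changes), key=lambda pair: abs(pair[1]))
print("Steepest segment:", steepest[0])
```

A larger absolute change corresponds to a steeper segment, and a change near zero corresponds to a nearly flat one.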

78
00:05:40,880 --> 00:05:45,860
The last thing we want to say about line plots is that there's another type of line plot we want to

79
00:05:45,860 --> 00:05:49,910
be aware of, and that specific kind of plot is called an ogive.

80
00:05:49,910 --> 00:05:54,620
And it looks like this. Basically, it's just an accumulating line plot.

81
00:05:54,620 --> 00:06:01,070
So for this particular example, we again have months along the horizontal axis.

82
00:06:01,810 --> 00:06:05,120
And then we have revenue along the vertical axis.

83
00:06:05,140 --> 00:06:11,980
Remember that all of these graphs and plots we're looking at are all about displaying information in

84
00:06:11,980 --> 00:06:16,870
the way that makes the most sense or communicates the data most clearly.

85
00:06:16,900 --> 00:06:23,560
So to take this example, if we have revenue data, in other words, the revenue that our company earns

86
00:06:23,560 --> 00:06:29,290
each month of the year, we could certainly take that data and create a line plot and that would be

87
00:06:29,290 --> 00:06:33,280
useful because we'd be able to see the revenue earned each month.

88
00:06:33,280 --> 00:06:39,670
But maybe instead what we're trying to communicate is more about the total revenue for the year and

89
00:06:39,670 --> 00:06:42,130
less about our month to month revenue.

90
00:06:42,160 --> 00:06:47,530
As you can imagine, the line plot would do a better job communicating the month to month revenue,

91
00:06:47,620 --> 00:06:53,230
whereas if we use an ogive where the values in our data set accumulate, then we'll get a much better

92
00:06:53,230 --> 00:06:55,780
picture of revenue for the entire year.

93
00:06:55,780 --> 00:07:02,500
So what the ogive shows here for December is not just the revenue we earned in December, it's the revenue

94
00:07:02,500 --> 00:07:04,600
we earned over the course of the entire year.

95
00:07:04,810 --> 00:07:11,020
This value here representing November is the revenue we've earned up through November. This data

96
00:07:11,020 --> 00:07:15,130
point above October is the revenue we've earned up through October.

97
00:07:15,130 --> 00:07:21,400
So every month that goes by as we earn more and more revenue, this line is always going to grow, assuming

98
00:07:21,400 --> 00:07:22,750
our revenue is always positive.
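
As a rough sketch of how the points of an ogive are built, the Python snippet below turns monthly revenue into running totals. The revenue figures are made-up numbers chosen only for illustration (they are not from this lesson), and the variable names are assumptions.

```python
from itertools import accumulate

# Hypothetical monthly revenue, January through December, in thousands of dollars.
# These figures are invented for illustration only.
monthly_revenue = [12, 15, 9, 20, 18, 22, 25, 19, 17, 21, 30, 35]

# Each ogive point is the running total up through that month.
ogive_values = list(accumulate(monthly_revenue))

# December's point is the total for the whole year.
december_is_yearly_total = ogive_values[-1] == sum(monthly_revenue)

# As long as no month has negative revenue, the running total never decreases.
never_decreases = all(
    later >= earlier for earlier, later in zip(ogive_values, ogive_values[1:])
)

print(ogive_values)
print(december_is_yearly_total, never_decreases)
```

A regular line plot would use `monthly_revenue` directly, while the ogive uses `ogive_values`, so the same data gives two different pictures.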

99
00:07:22,750 --> 00:07:28,630
And you can imagine how this ogive of revenue gives us a different picture of what's happening than we

100
00:07:28,630 --> 00:07:34,090
get from a line plot of revenue. Which one we should use really just depends on what we're trying

101
00:07:34,090 --> 00:07:37,960
to communicate and which of these does a better job of communicating

102
00:07:37,960 --> 00:07:38,890
our point.

103
00:07:38,920 --> 00:07:45,010
Keep in mind here also that we're showing two different kinds of line plots and ogives here. In this line

104
00:07:45,010 --> 00:07:45,550
plot,

105
00:07:45,550 --> 00:07:52,930
we've sketched in these points indicated by these dots along the way, whereas this one has no dots.

106
00:07:52,930 --> 00:07:55,360
It's just a collection of line segments.

107
00:07:55,360 --> 00:07:58,390
We can sketch a line plot or an ogive

108
00:07:58,390 --> 00:08:04,900
either way. We can include these little dots as points for the line plot or for the ogive, or we can

109
00:08:04,900 --> 00:08:09,820
exclude them and have a line that looks like this, just a collection of a bunch of segments. For both

110
00:08:09,820 --> 00:08:12,970
a line plot and an ogive, both are acceptable.

111
00:08:12,970 --> 00:08:18,430
And then finally, we just want to say that in the same way that a line plot is only appropriate for

112
00:08:18,430 --> 00:08:23,080
certain types of data, ogives are also only appropriate for certain types of data.

113
00:08:23,080 --> 00:08:28,300
For instance, even though a line plot works really well for temperature data, an ogive doesn't really

114
00:08:28,300 --> 00:08:30,130
make sense at all for temperature data.

115
00:08:30,130 --> 00:08:36,039
If we're working here with this temperature data, we would show a temperature of 69 for January, but

116
00:08:36,039 --> 00:08:42,309
then a temperature of 69 plus 73 or 142 degrees for February.

117
00:08:42,309 --> 00:08:49,270
And in the context of temperature, it makes no sense to show an accumulated 142 degrees for February.
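
The arithmetic here can be checked with a short Python sketch that accumulates the January and February values quoted in this lesson. The variable names are assumptions made for illustration.

```python
from itertools import accumulate

# Average highs quoted in the lesson for January and February (degrees Fahrenheit).
avg_high_f = [69, 73]

# Accumulating temperatures the way an ogive would.
running_total = list(accumulate(avg_high_f))

print(running_total)  # [69, 142]
```

The second value, 142, is a sum of two averages and does not describe any temperature that actually occurred.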

118
00:08:49,270 --> 00:08:55,630
So this kind of temperature data is nonsensical in an ogive, but revenue data works really well.

119
00:08:55,630 --> 00:09:00,400
So again, just paying attention to which kind of plot makes sense for the kind of data we have.

120
00:09:00,400 --> 00:09:06,430
And then with both line plots and ogives, the last thing we want to say is that we can also plot multiple

121
00:09:06,430 --> 00:09:08,650
series on the same graph.

122
00:09:08,650 --> 00:09:16,270
So for example, with this temperature data, this blue line plot could represent one particular city

123
00:09:16,270 --> 00:09:19,570
and the red line plot could represent another city.

124
00:09:19,570 --> 00:09:25,930
Plotting them together on the same set of axes gives us a great way to compare average high temperatures

125
00:09:25,930 --> 00:09:27,460
across multiple cities.

126
00:09:27,460 --> 00:09:31,720
So line plots are particularly useful for seeing the comparison that way.

127
00:09:31,720 --> 00:09:34,660
And of course, we can do the same thing with an ogive.

128
00:09:34,690 --> 00:09:40,720
Maybe instead of plotting revenue accumulation over the course of a year for our entire business, we

129
00:09:40,720 --> 00:09:45,550
want to divide the business up into different departments or different lines of business, different

130
00:09:45,550 --> 00:09:46,990
segments of our business.

131
00:09:46,990 --> 00:09:53,470
For instance, maybe we have a business where we sell products online and also in person at conferences

132
00:09:53,470 --> 00:09:59,770
all over the country where we set up a booth and sell product. And so we could plot in an ogive here revenue

133
00:09:59,770 --> 00:10:05,860
accumulation for our online business in blue over the course of the year, and then revenue accumulation

134
00:10:05,860 --> 00:10:10,480
for our in-person business at conferences in red over the course of the year.

135
00:10:10,480 --> 00:10:17,050
And that would allow us to see the comparison between those two parts of our business in the same graph,

136
00:10:17,050 --> 00:10:18,790
in the same ogive.

