1
00:00:00,090 --> 00:00:06,900
Box and whisker plots, which we also sometimes call box plots or box charts, are a great way to display

2
00:00:06,900 --> 00:00:11,520
data when we want to show both the median and spread of the data at the same time.

3
00:00:11,640 --> 00:00:16,250
So in general, a box and whisker plot might look something like this.

4
00:00:16,260 --> 00:00:23,250
We always have the plot itself, which is this blue part above, and then we have it set next to a number

5
00:00:23,250 --> 00:00:23,800
line.

6
00:00:23,820 --> 00:00:29,100
Think of this number line as a reference, so we know where the box plot is actually positioned.

7
00:00:29,130 --> 00:00:34,980
For instance, we know that the left edge of this box and whisker plot is sitting at the value two and

8
00:00:34,980 --> 00:00:38,420
the right edge of the box and whisker plot is sitting at 14.

9
00:00:38,430 --> 00:00:40,440
So this number line is for reference.

10
00:00:40,440 --> 00:00:42,940
It tells us where this box plot is positioned.

11
00:00:42,960 --> 00:00:48,510
Now, when we talk about these, we need to know that this rectangular box in the center, this is the

12
00:00:48,510 --> 00:00:49,500
box part.

13
00:00:49,620 --> 00:00:54,720
And then these little extensions on the left and right hand side are called the whiskers.

14
00:00:54,720 --> 00:01:00,360
So this extension here out to the dot on the left and then this little linear extension here, out to

15
00:01:00,360 --> 00:01:01,410
the dot on the right.

16
00:01:01,410 --> 00:01:04,709
These are the whiskers attached to the box in the middle.

17
00:01:04,739 --> 00:01:10,500
Keep in mind that sometimes we'll see box and whisker plots shown vertically, but the concept is exactly

18
00:01:10,500 --> 00:01:10,830
the same.

19
00:01:10,830 --> 00:01:15,960
We have the box in the center and then the whiskers extend up at the top and down at the bottom and

20
00:01:15,960 --> 00:01:18,150
there's a number line for reference.

21
00:01:18,150 --> 00:01:20,250
So we have the box and we have the whiskers.

22
00:01:20,250 --> 00:01:25,680
And the great thing about a box and whisker plot, again, remember, with all of these plots we're just looking

23
00:01:25,680 --> 00:01:31,380
for the best way to communicate the data, to communicate the point we're trying to make that we see

24
00:01:31,380 --> 00:01:32,130
in the data.

25
00:01:32,130 --> 00:01:37,440
And the great thing about the box and whisker plot is that it does show us a lot of data

26
00:01:37,440 --> 00:01:43,320
at the same time. This picture looks simple, but it's communicating a lot of different things all

27
00:01:43,320 --> 00:01:43,890
at once.

28
00:01:44,010 --> 00:01:50,970
So this far left edge of the whisker represents the minimum value of the data.

29
00:01:50,970 --> 00:01:54,060
So this is the minimum in the data.

30
00:01:54,060 --> 00:01:59,850
So we know that whatever data set we use to create this box and whisker plot, the smallest value in

31
00:01:59,850 --> 00:02:01,470
the data set is two.

32
00:02:01,470 --> 00:02:09,960
And then, as you may expect, this far right edge of the right whisker represents the maximum point

33
00:02:10,860 --> 00:02:11,850
of the data set.

34
00:02:11,850 --> 00:02:15,800
So we know that the maximum value in this data set is 14.

35
00:02:15,810 --> 00:02:18,330
So that's already two pieces of information.

36
00:02:18,330 --> 00:02:26,610
And then if you remember, we talked about this before, we know that the range of a data set is always

37
00:02:26,610 --> 00:02:31,890
equal to the maximum minus the minimum.

38
00:02:31,890 --> 00:02:38,700
So because we know that the maximum here is 14 and the minimum is two, we can say that the range for

39
00:02:38,700 --> 00:02:43,620
this data set is 14 minus two or 12.

40
00:02:43,620 --> 00:02:48,480
And very quickly, we can know from the box plot that the range of the data is 12.
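As a quick check of that arithmetic, here is a minimal Python sketch. The variable names are just illustrative, and the two values are the whisker ends read off the example plot.

```python
# Whisker ends read off the example box plot in the lesson.
minimum = 2   # far left edge of the left whisker
maximum = 14  # far right edge of the right whisker

# Range is always maximum minus minimum.
data_range = maximum - minimum
print("range:", data_range)
```

So the range comes straight from the two whisker ends, with no need to see the underlying data.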

41
00:02:48,480 --> 00:02:50,790
So now we have three pieces of information.

42
00:02:51,060 --> 00:02:58,680
We also know that this line in the middle of the box, there will always be one line here in the middle

43
00:02:58,680 --> 00:03:02,220
of the box, not necessarily in the center of the box.

44
00:03:02,220 --> 00:03:05,370
As you can see, this one here is not centered within the box.

45
00:03:05,370 --> 00:03:08,100
It's just contained somewhere inside the box.

46
00:03:08,100 --> 00:03:12,210
But this is always representative of the median.

47
00:03:12,210 --> 00:03:15,240
So in our case, the median is at 11.

48
00:03:15,240 --> 00:03:20,010
And then finally, the box plot also gives us the quartiles of the data.

49
00:03:20,010 --> 00:03:25,110
If you remember, earlier we talked about the quartiles of a data set and the interquartile range.

50
00:03:25,110 --> 00:03:27,750
Well, the left edge of the box,

51
00:03:27,750 --> 00:03:34,410
so this point right here, always represents the first quartile, which we call Q1.

52
00:03:34,500 --> 00:03:40,680
As you know, the median is by definition Q2, and then the right edge of the box,

53
00:03:40,680 --> 00:03:44,220
so in our case, this point right here, represents

54
00:03:44,810 --> 00:03:46,980
Q3, or the third quartile.

55
00:03:47,000 --> 00:03:51,150
So for this particular data set, we know the first quartile is five.

56
00:03:51,170 --> 00:03:53,510
The third quartile is 13.

57
00:03:53,510 --> 00:03:57,730
And the second quartile, which is equivalent to the median, is of course 11.

58
00:03:57,740 --> 00:04:08,690
But then, if you remember, the interquartile range, or IQR, is always the third quartile minus the first

59
00:04:08,690 --> 00:04:11,150
quartile, or Q3 minus Q1.

60
00:04:11,540 --> 00:04:20,570
And so in our case we can quickly see that's 13 minus five, or eight.

61
00:04:20,570 --> 00:04:26,870
We know that our interquartile range is eight based on the fact that the quartiles are represented by

62
00:04:26,870 --> 00:04:31,460
the left edge, this median line and the right edge of the box.
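The same calculation can be sketched in a few lines of Python. The names are illustrative only, and the three values are the box edges and median line read off the example plot.

```python
# Quartiles read off the example box plot in the lesson.
q1 = 5       # left edge of the box (first quartile)
median = 11  # line inside the box (second quartile, Q2)
q3 = 13      # right edge of the box (third quartile)

# Interquartile range is always Q3 minus Q1, which is the width of the box.
iqr = q3 - q1
print("IQR:", iqr)
```

Notice that the median is not needed for the IQR. Only the two edges of the box matter.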

63
00:04:31,460 --> 00:04:36,080
Remember that the quartiles divide the data set into quarters.

64
00:04:36,080 --> 00:04:47,030
So we know that 25% of the data has to be contained between two and five, that another 25% of the data

65
00:04:47,030 --> 00:04:55,970
has to be between five and 11, that another 25% has to be between 11 and 13, and that the last 25%

66
00:04:55,970 --> 00:05:03,290
of the data has to be between 13 and 14, which means then that we can also say that the middle 50%

67
00:05:03,290 --> 00:05:04,370
of the data set.

68
00:05:04,370 --> 00:05:10,460
In other words, if we wrote out all the values in the data set, maybe it was something like two,

69
00:05:10,460 --> 00:05:14,120
three, three, four, et cetera.

70
00:05:14,120 --> 00:05:17,120
All the way up to the last value of 14.

71
00:05:17,120 --> 00:05:19,040
So we took all the values in our data set.

72
00:05:19,040 --> 00:05:21,080
We line them up in ascending order.

73
00:05:21,110 --> 00:05:23,720
Maybe there are 100 data points.

74
00:05:23,720 --> 00:05:31,010
Well, the middle 50 data points, that middle 50% of the data, are data points that span from a value

75
00:05:31,010 --> 00:05:36,860
of five to a value of 13, since those are the values that define the edges of the box.
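To make the "middle 50%" idea concrete, here is a small Python sketch. The eight values below are a made-up data set, not the one behind the plot in the lesson, chosen only so that it has the same minimum, quartiles, and maximum (2, 5, 11, 13, 14).

```python
# Hypothetical data set (an assumption for illustration only).
data = sorted([2, 4, 6, 10, 12, 12, 14, 14])

# Box edges read off the example plot.
q1, q3 = 5, 13

# Keep only the values that fall inside the box, between Q1 and Q3.
in_box = [x for x in data if q1 <= x <= q3]
share_in_box = len(in_box) / len(data)

print("values inside the box:", in_box)
print("share of the data inside the box:", share_in_box)
```

With this made-up data, four of the eight values land inside the box, which is the middle half. With real data that contains repeated values sitting exactly on a quartile, the count may not come out to exactly half.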

76
00:05:36,860 --> 00:05:42,680
So you can see that with a quick box and whisker plot, we have the minimum, maximum, and range.

77
00:05:42,680 --> 00:05:44,300
That's three pieces of information.

78
00:05:44,300 --> 00:05:51,170
The three quartiles and IQR, that's four more pieces of information, or seven total, plus the median.

79
00:05:51,200 --> 00:05:52,700
Of course, that's the same as Q2.

80
00:05:52,700 --> 00:05:57,710
But if we think about it as a separate thing now we have eight pieces of information, plus we can think

81
00:05:57,710 --> 00:06:03,410
about each quarter of the data, one being represented by the left whisker, one being represented inside

82
00:06:03,410 --> 00:06:08,480
the box to the left of this median line, one being represented by the inside of the box to the right

83
00:06:08,480 --> 00:06:12,860
of the median line and the last quarter being represented by this right whisker.

84
00:06:12,860 --> 00:06:15,260
But we have those four quarters of the data.

85
00:06:15,260 --> 00:06:21,530
And so then we can quickly identify the middle 50% of the data or the upper 75% of the data or lower

86
00:06:21,530 --> 00:06:22,850
75% of the data.

87
00:06:22,850 --> 00:06:30,170
So we have tons of information just from this box and whisker plot available to us at a glance quickly.

88
00:06:30,170 --> 00:06:35,720
So if this is the kind of information that we're trying to communicate based on the data set and the

89
00:06:35,720 --> 00:06:40,730
goals that we have, this box and whisker plot idea might be a really good choice.

90
00:06:40,730 --> 00:06:46,910
And then the only other thing we want to say about box and whisker plots is that we often represent

91
00:06:46,910 --> 00:06:53,960
a plot like this with what's called a five number summary, because we can essentially pull five critical

92
00:06:53,960 --> 00:06:58,490
numbers from this plot and present them in a table instead.

93
00:06:58,490 --> 00:07:01,250
So those five numbers are these values here.

94
00:07:01,250 --> 00:07:06,680
One, two, three, four and five.

95
00:07:06,680 --> 00:07:12,080
In other words, minimum value, left edge of the box, median, right edge of the box, maximum.

96
00:07:12,080 --> 00:07:16,850
If we collect those values in a five number summary, it looks like this.

97
00:07:16,850 --> 00:07:18,740
We show the minimum

98
00:07:18,740 --> 00:07:19,160
here.

99
00:07:19,160 --> 00:07:25,460
Our minimum value was two, so we're showing a minimum value of 2. Q1, or the first quartile, the boundary

100
00:07:25,460 --> 00:07:31,880
between the lower 25% of the data and the second 25% or second quarter of the data that's right there

101
00:07:31,880 --> 00:07:32,540
at five.

102
00:07:32,540 --> 00:07:37,790
So we have the first quartile, we have the median, which we could also call Q2

103
00:07:37,790 --> 00:07:42,410
or the second quartile, and then we have

104
00:07:42,410 --> 00:07:45,470
Q3, the third quartile, and then the maximum.

105
00:07:45,470 --> 00:07:51,290
This five number summary is essentially displaying all of the same information as the box and whisker

106
00:07:51,290 --> 00:07:51,590
plot.

107
00:07:51,590 --> 00:07:55,100
It's a little less visual, but the numbers are all right there.

108
00:07:55,100 --> 00:07:59,630
So these are two different ways of representing the same thing.
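Here is one way the five number summary could be computed in Python, as a sketch rather than a definitive method. The function name `five_number_summary` and the data set are hypothetical, and the quartiles are found as the medians of the lower and upper halves of the sorted data. That is one common classroom convention, assumed here. Software packages often use other quartile rules and can give slightly different values.

```python
from statistics import median


def five_number_summary(values):
    """Return min, Q1, median, Q3, max using the median-of-halves rule.

    Assumes at least two values. With an odd count, the middle value
    is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return {
        "min": data[0],
        "Q1": median(lower),
        "median": median(data),
        "Q3": median(upper),
        "max": data[-1],
    }


# Hypothetical data set chosen to match the plot in the lesson.
summary = five_number_summary([2, 4, 6, 10, 12, 12, 14, 14])

# Present the five numbers as a small table.
for label, value in summary.items():
    print(f"{label:<8}{value}")

# The range and IQR follow directly from the summary.
print("range   ", summary["max"] - summary["min"])
print("IQR     ", summary["Q3"] - summary["Q1"])
```

For this made-up data the summary matches the values read off the plot: 2, 5, 11, 13, and 14.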

