1
00:00:05,400 --> 00:00:09,390
Welcome everyone to this section of the course on visualizing data.

2
00:00:10,860 --> 00:00:14,530
Visualizing data is a key aspect of data science.

3
00:00:14,550 --> 00:00:21,330
It's very important to be able to convey the information that you've worked on or have studied to others,

4
00:00:21,360 --> 00:00:26,790
especially people who may not actually have your full technical knowledge or understanding of how to

5
00:00:26,790 --> 00:00:30,240
view the raw data or perform a statistical analysis.

6
00:00:30,330 --> 00:00:35,940
You should think of data visualizations as bridging a gap between the practitioners of data science

7
00:00:35,940 --> 00:00:38,460
and key decision makers or leadership.

8
00:00:39,950 --> 00:00:44,780
As we continue learning about data visualization throughout this section, you should always keep in

9
00:00:44,780 --> 00:00:50,390
mind what is the information that I want to share or the story I am trying to tell?

10
00:00:50,390 --> 00:00:56,330
And how does this specific visualization help in conveying that information to others?

11
00:00:56,480 --> 00:01:00,620
You should think of data visualization as just another form of communication.

12
00:01:02,120 --> 00:01:08,570
Now, data visualization is especially crucial in organizations because often decisions are made based

13
00:01:08,570 --> 00:01:12,470
on the final visualization or interpretation of the data.

14
00:01:12,770 --> 00:01:19,010
Keep in mind that typically when discussing data science, we're using that to actually make a decision

15
00:01:19,010 --> 00:01:21,200
or perform a specific action.

16
00:01:21,230 --> 00:01:27,860
The end result is typically not a data visualization, but the actual decision made using that data

17
00:01:27,860 --> 00:01:28,880
visualization.

18
00:01:30,140 --> 00:01:35,030
And again, remember that the purpose of data science at an enterprise level is to use it

19
00:01:35,030 --> 00:01:38,760
to make key decisions and improve products or services.

20
00:01:38,780 --> 00:01:44,750
Sometimes beginners get confused and think that data visualization itself is the final end product,

21
00:01:44,750 --> 00:01:47,030
but that's typically not going to be the case.

22
00:01:47,030 --> 00:01:50,750
It's a tool for communicating what decisions should be made.

23
00:01:52,500 --> 00:01:57,600
So again, think of data visualization as an interface between those who work directly with data sources

24
00:01:57,600 --> 00:02:01,170
and those who make higher level strategic decisions in a business.

25
00:02:02,830 --> 00:02:08,919
A key part of data visualization is understanding what type of plot, chart or visualization to use,

26
00:02:08,919 --> 00:02:11,410
which is the purpose of this section of the course.

27
00:02:12,890 --> 00:02:15,690
Keep in mind, you should also be careful with visualizations.

28
00:02:15,710 --> 00:02:19,530
Often information doesn't actually need a pretty visual to be useful.

29
00:02:19,550 --> 00:02:25,610
Sometimes you really just need to report simple metrics like the average or mean, and in that case

30
00:02:25,610 --> 00:02:31,260
the number itself is probably useful enough and you don't need to make some sort of beautiful visualization.

31
00:02:31,280 --> 00:02:33,260
Remember, we're not really artists here.

32
00:02:33,260 --> 00:02:36,380
We're not trying to make the prettiest visualization of all time.

33
00:02:36,380 --> 00:02:40,190
We're trying to make the clearest communication possible.

34
00:02:42,120 --> 00:02:44,520
To understand which data visualization to use,

35
00:02:44,520 --> 00:02:49,860
let's have a quick tour of the different data visualization categories we're going to be covering in

36
00:02:49,860 --> 00:02:55,980
this section, including scatter plots, line plots, distribution plots and categorical plots.

37
00:02:57,140 --> 00:02:58,790
For each of these plot types

38
00:02:58,790 --> 00:03:04,160
in this introduction lecture, we're going to slowly construct the plot step by step, and we're also

39
00:03:04,160 --> 00:03:08,660
briefly going to mention some more advanced variations for certain plot types.

40
00:03:08,910 --> 00:03:15,110
So let's begin with the scatterplot, which is very likely a plot or visualization that you've seen

41
00:03:15,110 --> 00:03:15,770
before.

42
00:03:17,060 --> 00:03:23,370
A typical scatterplot is going to represent two dimensions, essentially two features of your data.

43
00:03:23,390 --> 00:03:27,230
For example, the height versus weight of a group of people.

44
00:03:27,230 --> 00:03:31,190
So we have two dimensions, height and weight, or two features, height and weight.

45
00:03:31,370 --> 00:03:36,860
Now, a scatterplot can actually reveal relationships between two data features.

46
00:03:36,860 --> 00:03:43,010
So think of that as the key indicator of using a scatterplot when you want to reveal the relationship

47
00:03:43,010 --> 00:03:45,230
visually between two features.

48
00:03:46,300 --> 00:03:48,280
So imagine the following data set.

49
00:03:49,030 --> 00:03:51,670
Here we have what's known as the tips data set.

50
00:03:51,700 --> 00:03:53,320
We see a total bill,

51
00:03:53,350 --> 00:03:59,200
a tip, the sex of the client that actually visited the restaurant, whether or not they were a smoker,

52
00:03:59,200 --> 00:04:03,770
the day they visited, the time, whether it was lunch or dinner and the overall party size.

53
00:04:03,790 --> 00:04:07,060
So we have lots of features in this data set to work with.

54
00:04:08,240 --> 00:04:14,450
Now, you may be wondering, for example, is there a relationship between the size of the tip left

55
00:04:14,450 --> 00:04:16,070
and the total bill?

56
00:04:17,649 --> 00:04:23,680
So now we're going to analyze two specific features, and we can do this visually with a scatterplot.

57
00:04:24,940 --> 00:04:28,210
So we end up graphing one feature along one axis.

58
00:04:28,240 --> 00:04:34,780
For example, the x axis can be total bill, and then we add in the other feature along the y axis,

59
00:04:34,780 --> 00:04:36,070
such as the tip.

60
00:04:36,130 --> 00:04:40,390
And then for every data point, you actually start plotting them as a dot.

61
00:04:40,390 --> 00:04:43,720
And keep in mind you can use other symbols besides a circular dot.

62
00:04:44,720 --> 00:04:50,690
So, for example, we have one point with a total bill of a little less than $10 and a tip size that

63
00:04:50,690 --> 00:04:52,070
was around $5.

64
00:04:52,430 --> 00:04:57,530
Then we keep going through the data set marking down the points of total bill versus tip.

65
00:04:57,980 --> 00:05:00,560
And you do this for the entire data set.

66
00:05:00,560 --> 00:05:07,430
And here we can now visually see a general trend that as your total bill increases, typically your

67
00:05:07,430 --> 00:05:11,450
tip is also going to increase, which intuitively makes sense.

68
00:05:11,450 --> 00:05:17,210
But obviously there's going to be data features where the relationship is unclear and it's a lot easier

69
00:05:17,210 --> 00:05:20,480
to digest, so to speak, in a visual format.

70
00:05:21,570 --> 00:05:25,470
So again, the tip tends to increase as the total bill increases.

71
00:05:25,470 --> 00:05:28,710
And that's the sort of story you can tell with a scatterplot.

72
00:05:28,860 --> 00:05:33,690
Keep in mind that the relationship may also show something like the inverse, where one feature tends

73
00:05:33,690 --> 00:05:36,480
to decrease as another feature increases.

74
00:05:37,780 --> 00:05:40,630
So what are the considerations for scatter plots?

75
00:05:41,450 --> 00:05:46,700
Well, you could add a trend line, so you could actually use a simple linear regression model that

76
00:05:46,700 --> 00:05:48,890
we're going to learn about later on in this course.

77
00:05:48,890 --> 00:05:56,240
And you could plot that on top of the scatter points to really drive home the story that there is some

78
00:05:56,240 --> 00:05:58,820
sort of relationship here between these two features.
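
The trend line described above can be sketched as an ordinary least-squares fit of a straight line. This is only a minimal illustration: the function name `fit_trend_line` and the sample numbers are hypothetical and are not taken from the tips data set, and a plotting library would normally draw the resulting line on top of the scatter points.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample values, made up for illustration only.
total_bills = [10.0, 20.0, 30.0, 40.0]
tips = [2.0, 3.0, 5.0, 6.0]

slope, intercept = fit_trend_line(total_bills, tips)
print(f"tip = {slope:.2f} * total_bill + {intercept:.2f}")
```

A positive slope matches the story told by the scatter plot: the tip tends to increase as the total bill increases.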

79
00:05:59,890 --> 00:06:03,790
You can also add transparency for stacked scatter points.

80
00:06:03,790 --> 00:06:07,120
So take a look at the area inside the circle.

81
00:06:07,120 --> 00:06:13,000
You'll notice that right now there's actually a lot of dots that are stacked right on top of each other.

82
00:06:13,000 --> 00:06:19,060
And for really large data sets, it may be unclear how many dots are actually stacked on top.

83
00:06:19,060 --> 00:06:23,950
So what you can do is add a little bit of transparency for those dots.

84
00:06:23,950 --> 00:06:29,800
That means dots that are stacked on top of each other will appear darker than a singular data point

85
00:06:29,800 --> 00:06:30,640
or dot.

86
00:06:31,490 --> 00:06:38,150
So here we're now adding some transparency, which allows you to see where the data points are actually

87
00:06:38,150 --> 00:06:39,880
stacked on top of each other.

88
00:06:39,890 --> 00:06:46,760
So again, if you have a lot of data points, use transparency to indicate where your data is clustered.

89
00:06:48,770 --> 00:06:51,290
You could also add more variables.

90
00:06:51,290 --> 00:06:57,790
So previously we showed you using scatter plots for two features that is total bill and tip.

91
00:06:57,800 --> 00:07:02,030
But remember, we also had the sex of the person visiting the restaurant.

92
00:07:02,120 --> 00:07:08,690
So in that case, we can add another feature using something like color, such as orange for male and

93
00:07:08,690 --> 00:07:09,770
blue for female.

94
00:07:09,800 --> 00:07:13,970
And now you can add in even more features into your visualization.

95
00:07:14,180 --> 00:07:18,770
Remember, you should only do this if that's important to the final decision being made.

96
00:07:20,260 --> 00:07:22,360
And you can do this again with shapes.

97
00:07:22,360 --> 00:07:27,130
So now you can see we're actually putting in a lot of information visually into this plot.

98
00:07:27,250 --> 00:07:33,160
We're not just plotting out total bill versus tip, but we're also showing the sex of the person through

99
00:07:33,160 --> 00:07:37,450
color and then using shape to indicate whether or not they were a smoker.

100
00:07:37,450 --> 00:07:38,830
So notice we have smoker

101
00:07:38,830 --> 00:07:39,130
yes

102
00:07:39,130 --> 00:07:43,540
as a circle and a cross or X for non-smoker.

103
00:07:43,720 --> 00:07:46,330
Keep in mind, this can get very busy very quickly.

104
00:07:46,330 --> 00:07:53,050
So you typically want to try to show the minimum number of features in order to get your story across.

105
00:07:53,050 --> 00:07:55,450
You shouldn't just add features for the sake of it.

106
00:07:57,620 --> 00:07:59,930
Now let's move on to line plots.

107
00:08:00,860 --> 00:08:06,950
Sometimes we already know there should be a continuous relationship between points along an axis.

108
00:08:07,070 --> 00:08:13,160
This is where we can use a simple line to indicate a known relationship between points along a data

109
00:08:13,160 --> 00:08:13,790
feature.

110
00:08:15,600 --> 00:08:22,830
So for example, in our previous example of using the total bill versus tips, it does not make sense

111
00:08:22,830 --> 00:08:29,490
to draw a line between the points because remember, the actual visitors to the restaurant are all separate

112
00:08:29,490 --> 00:08:30,290
from each other.

113
00:08:30,300 --> 00:08:35,580
There's no relationship between a person that showed up on Sunday for dinner versus another person that

114
00:08:35,580 --> 00:08:37,309
showed up on Saturday for lunch.

115
00:08:37,320 --> 00:08:40,470
You shouldn't be drawing a line between those two dots.

116
00:08:41,640 --> 00:08:46,980
So again, here, the line connecting total bills of separate parties doesn't actually represent anything.

117
00:08:46,980 --> 00:08:52,050
So a line plot would be incorrect on charting total bill versus tip.

118
00:08:53,390 --> 00:08:59,420
So here a line is misleading and this is an example of when you should not use a line plot because it's

119
00:08:59,420 --> 00:09:03,290
trying to indicate some known relationship between each party's

120
00:09:03,290 --> 00:09:09,890
bill. You should only be using line plots when you know for sure that there is a relationship before

121
00:09:09,890 --> 00:09:11,540
you actually plot out the data.

122
00:09:13,170 --> 00:09:17,550
So a trend line would have been the correct way to address a need to show a linear fit.

123
00:09:17,550 --> 00:09:19,350
And we saw the trend line earlier.

124
00:09:20,160 --> 00:09:23,280
That is the trend line on top of your scatterplot.

125
00:09:23,310 --> 00:09:26,580
This is not what we mean with a line plot.

126
00:09:26,790 --> 00:09:33,000
A line plot is going to assume prior knowledge of some sort of relationship in the data.

127
00:09:33,180 --> 00:09:38,650
Here we are not assuming some prior knowledge of a relationship between total bill and tip.

128
00:09:38,670 --> 00:09:43,170
It's only after we plotted the points that we decide to add a trend line.

129
00:09:43,200 --> 00:09:46,440
This is, again, not the line plot that we're talking about.

130
00:09:46,620 --> 00:09:49,080
So what is the line plot that we're discussing here?

131
00:09:49,620 --> 00:09:51,900
And when is the line appropriate?

132
00:09:53,010 --> 00:09:58,770
When we already know for certain that there is some sort of continuous relationship between data points

133
00:09:58,770 --> 00:10:00,030
along a feature.

134
00:10:00,240 --> 00:10:02,850
A very common example is timestamps.

135
00:10:02,850 --> 00:10:08,130
So we know for certain there are continuous times in between each timestamped point.

136
00:10:09,090 --> 00:10:11,460
So imagine that you have the following data set.

137
00:10:11,550 --> 00:10:18,480
You have a year, a month, and then the passengers that happened to fly, maybe in the thousands, for that

138
00:10:18,480 --> 00:10:20,580
particular month of that particular year.

139
00:10:20,580 --> 00:10:28,290
So the way to read this is that in January of 1949, you had 112,000 passengers fly on an airplane.

140
00:10:28,290 --> 00:10:35,490
And then in February of 1949, you had 118,000 passengers fly on an airplane and so on and so on.

141
00:10:36,090 --> 00:10:40,530
Notice that we already know the way times and dates work.

142
00:10:40,530 --> 00:10:46,920
So I know there's a relationship between January to February, to March, to April, to May, etc.,

143
00:10:46,920 --> 00:10:48,150
throughout the years.

144
00:10:48,150 --> 00:10:54,600
So it makes sense to draw a line between the actual time stamps, since I already know that's a continuous

145
00:10:54,600 --> 00:10:56,160
linear relationship.

146
00:10:59,230 --> 00:11:05,470
So we know flights occurred between years and if you wanted to, you could just add a dot there for

147
00:11:05,470 --> 00:11:10,360
the passengers total per year or per month or over the entire time series.

148
00:11:10,360 --> 00:11:15,910
But as I mentioned, you know that flights occurred throughout the days between those months.

149
00:11:15,910 --> 00:11:21,520
And you know there's a linear relationship or continuous relationship as you move along the time

150
00:11:21,520 --> 00:11:26,260
axis, in which case you can use a line plot.

151
00:11:26,260 --> 00:11:32,020
The line indicates that you have that continuous knowledge, that you know time moves forward and

152
00:11:32,020 --> 00:11:38,890
you know that there is some sort of relationship time wise along an axis between the data point in January

153
00:11:38,890 --> 00:11:41,710
to February, to March to April and so on.

154
00:11:41,710 --> 00:11:47,200
So this indicates that you know there is some sort of continuous relationship or knowledge

155
00:11:47,200 --> 00:11:53,590
for a particular data feature and that there is a linkage between the separate data points, the link

156
00:11:53,590 --> 00:11:57,310
here being that they're continuous on some sort of time series.

157
00:11:59,240 --> 00:12:02,050
A line plot also makes it easy to stack features.

158
00:12:02,050 --> 00:12:07,360
So if you wanted to, you could separate out all those data points by month and then you can actually

159
00:12:07,360 --> 00:12:09,310
stack them via different colors.

160
00:12:09,310 --> 00:12:15,880
So you can see over the years how many travelers were there in January versus June or July versus November

161
00:12:15,880 --> 00:12:16,770
and December.

162
00:12:16,780 --> 00:12:22,510
So if we showed you this plot and you quickly did an analysis, what's the story being told here?

163
00:12:22,660 --> 00:12:28,930
Well, you'll notice it looks like these cooler colors, like the greens and the blues that are actually

164
00:12:28,930 --> 00:12:33,070
related to the summer months tend to be higher than the rest of the months.

165
00:12:33,070 --> 00:12:37,000
And that makes sense because people would tend to travel more during the summer.

166
00:12:37,000 --> 00:12:43,510
And we can now see a line plot indicating a story here that you expect higher travel during the summer

167
00:12:43,510 --> 00:12:45,640
months regardless of the year.

168
00:12:45,640 --> 00:12:49,390
That was always the case from 1950 all the way to 1960.

169
00:12:49,390 --> 00:12:55,030
And the overall trend is that passengers keep growing year over year.

170
00:12:55,030 --> 00:12:57,670
So we're telling actually two stories here.

171
00:12:57,670 --> 00:13:03,730
The passengers are growing year over year, but also summer months are always popular regardless of

172
00:13:03,730 --> 00:13:05,110
the year you're looking at.

173
00:13:06,990 --> 00:13:09,660
Now let's discuss distribution plots.

174
00:13:10,720 --> 00:13:16,900
Recall our discussions on measurements of dispersion of data such as variance or standard deviation.

175
00:13:17,140 --> 00:13:23,050
Distribution plots allow us to visualize the dispersion of data across a feature or variable.

176
00:13:23,200 --> 00:13:27,850
One of the most common ways to do this is through a plot known as a histogram.

177
00:13:29,490 --> 00:13:32,220
Let's revisit our tips data set.

178
00:13:33,590 --> 00:13:38,900
So let's imagine we wanted to know what is the distribution of total bill amounts?

179
00:13:39,020 --> 00:13:42,920
Is everything super wide, ranging from $1 to $1,000?

180
00:13:43,520 --> 00:13:48,190
Or do most bills tend to have the same amount, somewhere between $10 and $20?

181
00:13:48,200 --> 00:13:50,150
How could we actually answer this?

182
00:13:50,180 --> 00:13:56,690
Well, you could just report back a standard deviation or variance along that feature, but that doesn't

183
00:13:56,690 --> 00:14:00,610
really tell you the whole story because then you may also need to include the mean.

184
00:14:00,620 --> 00:14:05,420
So what is the visualization we could use to show the distribution of a feature?

185
00:14:06,750 --> 00:14:11,760
And again, I could answer this through some statistical metrics, like telling you the mean total bill

186
00:14:11,760 --> 00:14:13,500
is $18.79,

187
00:14:13,500 --> 00:14:17,410
the standard deviation is $8.90,

188
00:14:17,430 --> 00:14:18,360
minimum values,

189
00:14:18,360 --> 00:14:19,440
maximum values.

190
00:14:19,440 --> 00:14:24,180
But this is sometimes a little confusing and it's hard to actually internalize.

191
00:14:25,500 --> 00:14:30,270
So instead of using this information, let's try a visualization.

192
00:14:31,490 --> 00:14:38,300
So a histogram is going to count the number of occurrences of data points within a range along the x

193
00:14:38,300 --> 00:14:38,980
axis.

194
00:14:38,990 --> 00:14:45,020
Then it will create a bar of the height of the count of the occurrences within that range in the x axis

195
00:14:45,020 --> 00:14:47,360
where the height moves along the y axis.

196
00:14:47,600 --> 00:14:54,530
You should always know that the x axis feature is continuous and this has to be the case for a histogram.

197
00:14:54,530 --> 00:14:56,390
Again, for a distribution plot,

198
00:14:56,390 --> 00:14:59,630
the x axis feature should be continuous.

199
00:15:01,060 --> 00:15:05,500
So we end up getting a distribution or histogram that looks like this.

200
00:15:05,530 --> 00:15:09,190
So we have the total bill on the x axis.

201
00:15:09,190 --> 00:15:16,720
And remember, total bill is continuous because you go from $20 to $21 to $22 and then there's also cents

202
00:15:16,720 --> 00:15:17,560
in between those.

203
00:15:17,560 --> 00:15:21,550
So $20.01, $20.02, etc.

204
00:15:22,830 --> 00:15:29,390
And you can also change the bin size in order to get a different visualization around the distribution.

205
00:15:29,400 --> 00:15:36,540
It depends on how granular you want to be on the actual width of the bin for the x axis feature.

206
00:15:36,570 --> 00:15:42,510
Notice that the y axis here is just a count of the number of occurrences that fell between that particular

207
00:15:42,510 --> 00:15:43,620
width of the bin.

208
00:15:44,620 --> 00:15:50,080
So if we were to take a specific look at one of these, for example, this highest bar, we're taking

209
00:15:50,080 --> 00:15:57,870
a look at dollar total bill values that are somewhere between, I would say $12 and maybe $16.

210
00:15:57,880 --> 00:16:02,770
So again, take a look at the x axis width here of this actual bin.

211
00:16:02,890 --> 00:16:06,820
So we're taking a look at some of those values in between $10 and $20.

212
00:16:06,850 --> 00:16:09,180
It doesn't go from $10 to $20.

213
00:16:09,190 --> 00:16:11,080
It goes from somewhere in between that.

214
00:16:11,080 --> 00:16:16,120
So maybe from somewhere around $12 to somewhere around $16 or $17.

215
00:16:16,120 --> 00:16:21,790
And then we count how many total bill instances were between those two values.

216
00:16:21,790 --> 00:16:27,310
And then it ends up being something like a little less than 70, and you're doing that for all the bins.

217
00:16:27,310 --> 00:16:29,440
And that's how a histogram works.

218
00:16:29,440 --> 00:16:36,580
And again, remember, you can change that bin size to try to get a more granular feel on your data.

219
00:16:36,580 --> 00:16:44,380
So here we have wider bin sizes, which means we're going to have less vertical stacking, so to speak,

220
00:16:44,380 --> 00:16:47,530
but you could diminish that bin size.

221
00:16:47,530 --> 00:16:54,600
So look at smaller pieces along the x axis and get a more granular visualization.

222
00:16:54,610 --> 00:17:00,100
Again, it's still showing the same data, but you're being a little more specific on the ranges along

223
00:17:00,100 --> 00:17:05,980
those bins and it's up to you to decide what's the best bin size to choose and what's the story you're

224
00:17:05,980 --> 00:17:08,020
trying to tell with the distribution plot.
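
The counting step behind a histogram, and the effect of changing the bin size, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function name `histogram_counts` is hypothetical, the bill amounts are made-up sample values rather than the tips data set, and each bin is treated as the half-open range from its left edge up to (but not including) its right edge.

```python
def histogram_counts(values, bin_start, bin_width, num_bins):
    """Count how many values fall into each equal-width bin.

    Bin i covers [bin_start + i * bin_width, bin_start + (i + 1) * bin_width).
    Values outside the covered range are ignored.
    """
    counts = [0] * num_bins
    for value in values:
        index = int((value - bin_start) // bin_width)
        if 0 <= index < num_bins:
            counts[index] += 1
    return counts


# Hypothetical total bill amounts in dollars, made up for illustration only.
bills = [8.5, 12.0, 13.75, 15.2, 16.0, 18.3, 21.1, 24.9, 31.4]

# Wider bins ($10 each) give a coarse view of the distribution.
wide = histogram_counts(bills, bin_start=0, bin_width=10, num_bins=4)

# Narrower bins ($5 each) show the same data with more granularity.
narrow = histogram_counts(bills, bin_start=0, bin_width=5, num_bins=8)

print(wide)
print(narrow)
```

Each count becomes the height of one bar along the y axis. Both lists describe the same data; only the width of the ranges along the x axis differs.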

225
00:17:10,849 --> 00:17:15,339
Now, keep in mind, there are actually many more types of plots that can display distribution.

226
00:17:15,349 --> 00:17:21,230
There is a box and whisker plot, and there's also a KDE, known as a kernel density estimation plot.

227
00:17:21,290 --> 00:17:25,430
By far the most common you're probably going to encounter, though, is the histogram.

228
00:17:27,160 --> 00:17:29,440
Now let's talk about categorical plots.

229
00:17:30,390 --> 00:17:35,170
Categorical plots simply display some metric per category.

230
00:17:35,190 --> 00:17:41,100
For example, you can have a mean value per category or a count per category.

231
00:17:41,310 --> 00:17:46,740
There are many variations of these types of plots, but one of the most common is the simple bar plot.

232
00:17:47,920 --> 00:17:52,540
Be careful not to confuse the bar plot for the histogram we just saw earlier.

233
00:17:52,570 --> 00:17:59,350
They may appear similar, but you should carefully look at the x axis feature. For a bar chart or bar

234
00:17:59,350 --> 00:18:00,970
plot, that is, a categorical plot,

235
00:18:01,000 --> 00:18:04,540
the feature along that x axis should be categorical.

236
00:18:04,630 --> 00:18:08,620
Recall for the histogram, it was a continuous feature.

237
00:18:09,800 --> 00:18:15,320
So again, taking a look back at this tips data set that we've been playing around with, maybe we're

238
00:18:15,320 --> 00:18:19,940
trying to answer a question like what is the total expenditure per day?

239
00:18:19,940 --> 00:18:27,010
So the sum of the total bill per day, per Sunday, per Monday, per Tuesday, etc.

240
00:18:27,020 --> 00:18:33,410
So, for example, what was the total amount of revenue or expenditure for each category of day?

241
00:18:33,440 --> 00:18:33,920
Notice

242
00:18:33,920 --> 00:18:38,720
day is not really continuous in the same way because if we're thinking in terms of weekdays, there's

243
00:18:38,720 --> 00:18:42,770
not really a weekday between Monday and Tuesday or Tuesday and Wednesday.

244
00:18:42,770 --> 00:18:46,010
So we can think of each of these days as a category.

245
00:18:46,010 --> 00:18:52,040
And then we want some metric per category, such as the total expenditure or the sum of the total bill

246
00:18:52,040 --> 00:18:52,790
per day.
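
The "metric per category" idea behind a bar plot can be sketched as a simple aggregation. This is a minimal illustration: the function name `sum_per_category` is hypothetical and the records are made-up sample values, not rows from the tips data set. A plotting library would then draw one bar per category with the summed value as its height.

```python
from collections import defaultdict


def sum_per_category(records):
    """Sum the numeric value for each category in (category, value) pairs."""
    totals = defaultdict(float)
    for category, value in records:
        totals[category] += value
    return dict(totals)


# Hypothetical (day, total_bill) records, made up for illustration only.
records = [
    ("Thur", 10.0),
    ("Fri", 5.0),
    ("Sat", 20.0),
    ("Sat", 15.5),
    ("Sun", 18.0),
    ("Thur", 7.5),
]

totals = sum_per_category(records)
for day, total in totals.items():
    print(f"{day}: {total:.2f}")
```

Swapping the sum for a mean or a count gives the other per-category metrics mentioned earlier.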

247
00:18:54,160 --> 00:18:58,660
So if we were to do this for the data set, and keep in mind this dataset actually only has four days,

248
00:18:58,660 --> 00:19:03,580
Thursday, Friday, Saturday and Sunday, we can get a bar plot or a categorical plot.

249
00:19:03,580 --> 00:19:08,670
And again, the whole purpose here is to show some sort of metric per day.

250
00:19:08,680 --> 00:19:13,630
And what's nice about the visualization is it makes it a lot easier to compare different categories

251
00:19:13,630 --> 00:19:14,500
to each other.

252
00:19:14,500 --> 00:19:20,830
So here we can really quickly tell that Saturday and Sunday tend to be the highest expenditure days

253
00:19:20,830 --> 00:19:23,920
and surprisingly Friday is actually quite low.

254
00:19:23,920 --> 00:19:30,370
And here we can quickly see the total sums per day in a visual format, allowing me to quickly compare

255
00:19:30,370 --> 00:19:32,050
days to each other.

256
00:19:32,050 --> 00:19:37,810
I could have given you the same metrics just in terms of numbers per day, but with a visualization,

257
00:19:37,810 --> 00:19:39,220
it's very easy to see

258
00:19:39,220 --> 00:19:40,090
the story here.

259
00:19:40,090 --> 00:19:45,370
Saturday and Sunday are very popular, followed by Thursday, and Friday is surprisingly low.

260
00:19:46,900 --> 00:19:49,150
Keep in mind, you can always add in more features.

261
00:19:49,150 --> 00:19:55,930
So for example, I could separate this out via another category, and now I have a category per category.

262
00:19:55,930 --> 00:20:02,980
So essentially a metric per category per category, a metric per day per sex, male or female.

263
00:20:02,980 --> 00:20:06,010
And you can keep adding more features via color or design.

264
00:20:06,010 --> 00:20:10,750
But again, we really want to make sure that we're showing the simplest visualization possible in order

265
00:20:10,750 --> 00:20:13,120
to get the information we're interested in across.

266
00:20:14,910 --> 00:20:19,950
And again, remember that there are a lot more types of visualizations than the ones shown here on our

267
00:20:19,950 --> 00:20:21,020
quick little tour.

268
00:20:21,030 --> 00:20:27,180
But always keep in mind what is the information I want to share or the story I'm trying to tell, and

269
00:20:27,180 --> 00:20:32,160
how does this specific visualization help in conveying that information to others?

270
00:20:32,820 --> 00:20:38,040
Okay, let's take a much deeper dive into data visualizations throughout the next lectures.

