1
00:00:00,540 --> 00:00:07,470
Hi, and welcome to the lesson where we talk about the confusion matrix and classification report, two very

2
00:00:07,470 --> 00:00:12,150
important reports that help us analyze the performance of our classification model.

3
00:00:12,720 --> 00:00:15,160
So like I said, a confusion matrix.

4
00:00:15,180 --> 00:00:22,020
Basically, it's used on classification models where we have categorical outputs, not continuous values,

5
00:00:22,020 --> 00:00:27,330
like a regression model, and it's often just performed on the test dataset, although you can perform

6
00:00:27,330 --> 00:00:28,440
it on the training dataset.

7
00:00:28,890 --> 00:00:32,310
There's often less point to doing it on the training dataset.

8
00:00:32,310 --> 00:00:38,130
You want to actually see and assess the model's performance on unseen data, which is the test dataset,

9
00:00:38,550 --> 00:00:43,770
and the confusion matrix is very easily generated with a scikit-learn function.

10
00:00:43,860 --> 00:00:45,240
You can just import it like this.

11
00:00:45,690 --> 00:00:52,830
And the two input arguments that this confusion matrix function takes are the ground truth

12
00:00:52,830 --> 00:00:56,670
labels, which are your y_test labels, or y labels.

13
00:00:57,270 --> 00:01:00,840
And these are test labels, I should say, and the predicted labels.

14
00:01:01,110 --> 00:01:07,710
This is when you input the X test into your model and you get some predictions out of it.

15
00:01:08,010 --> 00:01:11,580
So these are the two input arguments that the confusion matrix needs.
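
As a minimal sketch of the call described here, assuming scikit-learn is installed and using made-up toy labels for a three-class problem (not the lesson's MNIST data), the two arguments are passed in the order ground truth first, predictions second. The names y_test and y_pred are illustrative.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels for a 3-class problem (not the lesson's MNIST data).
y_test = [0, 0, 1, 1, 2, 2, 2]   # ground truth labels
y_pred = [0, 1, 1, 1, 2, 0, 2]   # labels predicted by the model

# In scikit-learn's output, rows are true classes and columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Each row sums to the number of test samples that truly belong to that class, so the diagonal holds the correct predictions.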

16
00:01:12,660 --> 00:01:16,050
So this is a confusion matrix. Confusing, isn't it?

17
00:01:16,590 --> 00:01:17,970
So what do these numbers mean?

18
00:01:18,210 --> 00:01:22,470
Well, let's move on to the next slide, and I'll explain that to you.

19
00:01:22,590 --> 00:01:28,110
So firstly, disregard this color bar here; we'll get to that shortly.

20
00:01:28,560 --> 00:01:31,140
What I want you to understand, firstly, is the axes here.

21
00:01:31,500 --> 00:01:36,990
So on the top here you have the predicted labels, and on the right you have the true labels, true meaning

22
00:01:36,990 --> 00:01:37,920
the ground truth labels.

23
00:01:38,070 --> 00:01:43,800
So why do we have a diagonal of large numbers going through this confusion matrix?

24
00:01:44,310 --> 00:01:45,870
Well, it's quite simple, actually.

25
00:01:46,480 --> 00:01:48,810
So let's take a look at the top left corner here.

26
00:01:49,230 --> 00:01:50,310
Nine seven three.

27
00:01:50,910 --> 00:01:59,190
What this is saying is that our model predicted class zero nine hundred and seventy-three times when

28
00:01:59,190 --> 00:02:00,590
it actually was class zero.

29
00:02:00,600 --> 00:02:05,880
So whenever we fed it digits that were actually zero, the classifier,

30
00:02:06,150 --> 00:02:12,960
the model, predicted that it was zero nine hundred and seventy-three times. By times I mean instances,

31
00:02:12,960 --> 00:02:20,310
because remember, our test dataset had 10,000 numbers in it, images in it.

32
00:02:20,730 --> 00:02:26,880
And if you divide 10,000 by 10, roughly, because the classes aren't evenly distributed, you get roughly

33
00:02:26,880 --> 00:02:33,600
a thousand or so images of each digit in the test dataset.

34
00:02:34,320 --> 00:02:39,260
So we have roughly a thousand zeros, a thousand ones, a thousand twos, a thousand threes and so on.

35
00:02:39,270 --> 00:02:40,830
Roughly; it's not exactly a thousand.

36
00:02:41,730 --> 00:02:43,020
So what is this thing here?

37
00:02:43,500 --> 00:02:44,640
This is what the model predicts.

38
00:02:44,640 --> 00:02:50,280
The model predicted zero nine hundred and seventy three times when it was actually zero.

39
00:02:50,730 --> 00:02:52,020
So what about these numbers here?

40
00:02:52,020 --> 00:02:54,390
This two, this one, this one, this two?

41
00:02:54,900 --> 00:03:02,490
Well, what it's saying here is that when the input was actually zero, twice the model incorrectly predicted

42
00:03:02,490 --> 00:03:08,820
that it was a two, and sometimes it predicted that a zero was a five or a six, or a seven twice in particular,

43
00:03:08,850 --> 00:03:09,480
was it?

44
00:03:09,870 --> 00:03:17,820
So this breakdown in this table helps us understand where our model is weak, as you can see clearly.

45
00:03:18,480 --> 00:03:22,460
That's why there is a diagonal in the middle, because one corresponds to this one here,

46
00:03:22,460 --> 00:03:27,450
and two corresponds to this two here. That gives us this large diagonal in the middle, which is

47
00:03:27,450 --> 00:03:30,540
desirable because that means your model is performing very well.

48
00:03:31,410 --> 00:03:33,060
However, let's take a look at something here.

49
00:03:33,450 --> 00:03:36,200
So let's look at 884.

50
00:03:36,220 --> 00:03:37,320
884 here.

51
00:03:38,130 --> 00:03:41,850
What this is telling us is that, for the fives here,

52
00:03:41,970 --> 00:03:45,590
when the ground truth was five, when it truly was a five, our model

53
00:03:45,600 --> 00:03:46,170
got it right

54
00:03:46,560 --> 00:03:48,780
eight hundred and eighty-four times.

55
00:03:49,140 --> 00:03:49,980
That's very good.

56
00:03:50,970 --> 00:03:59,310
And in the red box, this tells us that twice the model thought a five was a zero, three times the model

57
00:03:59,310 --> 00:04:02,640
thought a five was a three, and so on for this row.

58
00:04:03,030 --> 00:04:04,370
So hopefully

59
00:04:04,380 --> 00:04:08,250
this gives you a good understanding of how to read the confusion matrix.

60
00:04:08,280 --> 00:04:09,450
It's actually quite simple.

61
00:04:09,900 --> 00:04:15,600
So if you're confused by this, just go over this lesson one more time, and hopefully it will

62
00:04:15,600 --> 00:04:16,500
all make sense.

63
00:04:17,400 --> 00:04:19,400
So let's take a look at this.

64
00:04:19,410 --> 00:04:23,310
What main issues can we deduce from the model's performance?

65
00:04:23,580 --> 00:04:29,790
Immediately, what you should do is look at the larger values that aren't on the diagonal. The two large

66
00:04:29,790 --> 00:04:30,240
values

67
00:04:30,240 --> 00:04:32,070
here are this nine and this seven.

68
00:04:32,640 --> 00:04:40,590
What this is telling us is that our model has a problem with thinking that fours are actually nines, because

69
00:04:40,590 --> 00:04:46,920
we know that in this row here, all of these images were actually fours, but

70
00:04:46,920 --> 00:04:49,620
nine times, which is not that much to be fair.

71
00:04:50,070 --> 00:04:55,200
Nine times our model thought that a four was a nine. And let's look at the seven here now.

72
00:04:55,860 --> 00:04:59,570
So this means that when the input

73
00:04:59,640 --> 00:05:06,340
image was a seven, the model was right most of the time, but seven times our model thought a seven was a

74
00:05:06,340 --> 00:05:06,640
two.

75
00:05:07,360 --> 00:05:09,370
So that's what this summary is here.

76
00:05:10,240 --> 00:05:16,900
So again, I hope this helps you understand how to analyze performance, how to find weaknesses

77
00:05:16,900 --> 00:05:17,470
of a model.

78
00:05:17,800 --> 00:05:20,830
Try to figure out what your model is good at and weak at.

79
00:05:21,130 --> 00:05:22,960
And maybe where we need to strengthen the model.
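
The idea of scanning for large values off the diagonal can be sketched in a few lines of plain Python. This is only an illustration under assumed data: the 3x3 matrix below is hypothetical (rows are true classes, columns are predicted classes), and top_confusions is a helper name invented for this sketch.

```python
# Hypothetical 3x3 confusion matrix (rows = true class, columns = predicted class).
cm = [
    [50, 2, 1],
    [3, 45, 9],
    [0, 7, 40],
]

def top_confusions(matrix, k=2):
    """Return the k largest off-diagonal cells as (count, true_class, predicted_class)."""
    cells = [
        (count, true_cls, pred_cls)
        for true_cls, row in enumerate(matrix)
        for pred_cls, count in enumerate(row)
        if true_cls != pred_cls  # skip the diagonal, which holds correct predictions
    ]
    return sorted(cells, reverse=True)[:k]

worst = top_confusions(cm)
for count, true_cls, pred_cls in worst:
    print(f"true {true_cls} predicted as {pred_cls}: {count} times")
```

Sorting the off-diagonal cells this way surfaces the class pairs the model mixes up most often, which is where to look first when strengthening it.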

80
00:05:25,890 --> 00:05:32,190
So before we move ahead with some more detailed metrics, let's take a look at a simple binary classification

81
00:05:32,190 --> 00:05:34,000
problem. In this problem,

82
00:05:34,020 --> 00:05:41,250
we have a classifier that predicts whether a patient has a disease based on an X-ray scan, so immediately

83
00:05:41,550 --> 00:05:43,740
there are four possible outcomes here.

84
00:05:44,100 --> 00:05:49,710
These four possible outcomes are true negatives, false positives, false negatives and true positives.

85
00:05:50,190 --> 00:05:55,410
So this is a confusion matrix here, actually for a binary classification problem.

86
00:05:55,710 --> 00:06:00,420
And as you can see, we have the labels here with what the model predicted: predicted

87
00:06:00,420 --> 00:06:01,590
no, predicted

88
00:06:01,950 --> 00:06:02,400
yes.

89
00:06:02,970 --> 00:06:05,640
And then we have the ground truth answers here:

90
00:06:05,670 --> 00:06:06,560
no, yes.

91
00:06:07,590 --> 00:06:09,360
So true negatives.

92
00:06:09,360 --> 00:06:10,770
What do true negatives mean?

93
00:06:11,280 --> 00:06:16,500
True negatives mean that the model correctly predicted no, and the patient didn't have the disease.

94
00:06:17,220 --> 00:06:19,080
Now what about false negatives?

95
00:06:19,260 --> 00:06:21,480
Well, that's pretty bad in this case.

96
00:06:21,960 --> 00:06:25,890
That means the patients actually had the disease, but the model predicted no.

97
00:06:26,580 --> 00:06:33,210
Now, in this case, the model predicted five times, which are the false positives, that the patient had

98
00:06:33,210 --> 00:06:33,780
the disease.

99
00:06:34,200 --> 00:06:37,740
However, in fact, he or she didn't.

100
00:06:38,430 --> 00:06:44,790
And also for true positives, which is a good outcome, it means that the model is working well at detecting

101
00:06:44,790 --> 00:06:47,250
whether a person has the disease right here.

102
00:06:47,340 --> 00:06:50,340
Ninety. So immediately now,

103
00:06:50,670 --> 00:06:52,290
What can we deduce from this?

104
00:06:52,680 --> 00:06:56,760
Now, just to be clear, just so I don't confuse you about these totals here:

105
00:06:56,760 --> 00:07:02,550
the total for the true negative label is just the sum of these cells here, and likewise for the true positive label.

106
00:07:02,870 --> 00:07:07,290
These are basically how many patients had the disease, how many didn't have the disease, and what the

107
00:07:07,290 --> 00:07:08,190
model was predicting.

108
00:07:08,550 --> 00:07:13,380
So the model predicted 50 people didn't have the disease and ninety five people had the disease.

109
00:07:13,380 --> 00:07:14,970
And this is the reality here.

110
00:07:15,750 --> 00:07:17,610
So let's take a look at accuracy.

111
00:07:18,030 --> 00:07:20,610
How do we determine accuracy in this case here?

112
00:07:21,390 --> 00:07:27,690
Well, accuracy is given simply by true positives plus true negatives.

113
00:07:28,230 --> 00:07:34,770
That's 40 plus 90 divided by the sample size, which is 145, which gives us 89 percent.

114
00:07:35,010 --> 00:07:41,730
So based on this, we can see the overall accuracy for this model, which is a good thing to use to baseline

115
00:07:41,730 --> 00:07:42,200
models,

116
00:07:42,360 --> 00:07:43,050
a good metric.

117
00:07:43,380 --> 00:07:48,240
However, it doesn't give us all the information as we have seen before, but our accuracy in this case

118
00:07:48,240 --> 00:07:50,700
is 89 percent, or

119
00:07:50,700 --> 00:07:51,060
0.897.

120
00:07:51,810 --> 00:07:55,020
So what about a misclassification rate or error rate?

121
00:07:55,470 --> 00:08:01,530
That's simply one minus the accuracy, which is basically 10 percent or zero point one zero three.

122
00:08:01,950 --> 00:08:04,770
Now, let's take a look at some other metrics.

123
00:08:05,250 --> 00:08:11,460
So the first one we'll talk about is true positive rate, also called sensitivity and most often called

124
00:08:11,460 --> 00:08:12,120
recall.

125
00:08:12,630 --> 00:08:15,180
So recall is a very important metric.

126
00:08:15,570 --> 00:08:21,240
It basically tells us when it's a yes, meaning that when the ground truth is yes, how often are we

127
00:08:21,240 --> 00:08:22,110
predicting yes?

128
00:08:22,470 --> 00:08:28,950
It's a very important metric, especially with the disease case in that you don't want to miss a disease.

129
00:08:29,250 --> 00:08:34,950
So when the person has the disease, how often are we predicting that the person has that disease?

130
00:08:35,400 --> 00:08:41,940
And in this case, we can just derive it from our stats here. We can see that the model

131
00:08:41,940 --> 00:08:48,690
predicted 90 times that the patient had the disease, when in reality there were 100 patients who

132
00:08:48,690 --> 00:08:49,350
had the disease.

133
00:08:49,680 --> 00:08:51,080
So we missed 10 percent.

134
00:08:51,150 --> 00:08:52,590
But it's simple to work out.

135
00:08:52,890 --> 00:08:58,860
It's just true positives over the true positive labels, which is a hundred here, our

136
00:08:59,370 --> 00:09:00,540
total number of actual positives.

137
00:09:01,080 --> 00:09:02,290
So that's how we get this

138
00:09:02,310 --> 00:09:02,610
0.9.

139
00:09:03,150 --> 00:09:05,700
Now let's take a look at false positive rate.

140
00:09:06,570 --> 00:09:09,750
Well, false positive rate again is quite simple.

141
00:09:10,020 --> 00:09:12,270
It's when it's a no, basically.

142
00:09:12,360 --> 00:09:15,450
So when a patient doesn't have the disease, how often are we predicting?

143
00:09:15,450 --> 00:09:15,900
Yes.

144
00:09:16,350 --> 00:09:18,600
So basically, this tells us how often we get a false alarm.

145
00:09:18,750 --> 00:09:23,940
In a way, it gives us an indication of how many times it returns a yes

146
00:09:24,420 --> 00:09:26,580
when the patient doesn't actually have the disease.

147
00:09:26,940 --> 00:09:35,340
Well, the false positive rate is 11 percent, so that means 11 percent of the patients who don't have the disease

148
00:09:35,340 --> 00:09:37,440
are predicted as yes.

149
00:09:38,220 --> 00:09:43,920
So if this was a very high number, you would know that this isn't a very reliable test for finding

150
00:09:43,920 --> 00:09:45,450
that disease.

151
00:09:46,110 --> 00:09:50,010
Now let's talk about specificity or the true negative rate.

152
00:09:50,430 --> 00:09:51,890
And this is the formula for it.

153
00:09:52,080 --> 00:09:57,570
It's true negatives divided by the total number of true negative labels.

154
00:09:58,170 --> 00:10:04,560
So in layman's terms, what this is telling us is: when it's actually a no,

155
00:10:04,560 --> 00:10:05,190
how often

156
00:10:05,190 --> 00:10:06,210
do we predict a no?

157
00:10:06,810 --> 00:10:13,350
So it's very important if you're being tested. OK, imagine the scenario.

158
00:10:13,740 --> 00:10:19,290
You're going for a COVID test and you don't have the disease, and you want to know the probability of the test

159
00:10:19,290 --> 00:10:20,130
coming back no.

160
00:10:20,490 --> 00:10:24,580
Well, that's basically what specificity measures, and in this case it's about

161
00:10:24,660 --> 00:10:24,990
0.89.

162
00:10:25,470 --> 00:10:32,400
So as you can see, these are very important metrics when evaluating performances of classification

163
00:10:32,400 --> 00:10:32,820
models.

164
00:10:34,350 --> 00:10:36,180
Now let's take a look at precision.

165
00:10:36,820 --> 00:10:42,360
Precision is another metric. It's given by true positives divided by how many times you predicted yes.

166
00:10:42,990 --> 00:10:48,540
So what this is telling us here is: when we predict yes, how often is it right?

167
00:10:49,500 --> 00:10:58,920
So when we correctly predict 90 times that a patient has the disease, and we predicted 95 positives in total, that

168
00:10:58,920 --> 00:11:01,870
tells us basically that about ninety-five percent of our yeses

169
00:11:01,890 --> 00:11:02,520
are right.
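
The metrics walked through so far can be reproduced with plain arithmetic. The sketch below uses the counts quoted in the lesson (40 true negatives, 5 false positives, 90 true positives, 145 samples); the 10 false negatives are inferred from the 100 actual positives minus the 90 found, and the variable names are illustrative.

```python
# Counts from the lesson's binary example; fn is inferred as 100 - 90.
tn, fp, fn, tp = 40, 5, 10, 90
total = tn + fp + fn + tp            # sample size, 145

accuracy = (tp + tn) / total         # how often the model is right overall
error_rate = 1 - accuracy            # misclassification rate
recall = tp / (tp + fn)              # true positive rate / sensitivity
fpr = fp / (fp + tn)                 # false positive rate
specificity = tn / (tn + fp)         # true negative rate, equal to 1 - fpr
precision = tp / (tp + fp)           # share of predicted yeses that are right

for name, value in [
    ("accuracy", accuracy), ("error rate", error_rate), ("recall", recall),
    ("false positive rate", fpr), ("specificity", specificity), ("precision", precision),
]:
    print(f"{name}: {value:.3f}")
```

Note that recall, the false positive rate and specificity divide by actual class totals (the ground truth), while precision divides by the number of predicted positives.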

170
00:11:03,150 --> 00:11:05,130
So it's a very good metric again to look at.

171
00:11:06,390 --> 00:11:08,640
So this is a summary of some of these metrics.

172
00:11:09,030 --> 00:11:13,420
Precision, recall, and you may see another one here, a new one called F1.

173
00:11:14,160 --> 00:11:18,430
Well, before we recap recall and precision and talk about the trade-offs,

174
00:11:18,480 --> 00:11:20,970
let me start with what F1 is.

175
00:11:21,420 --> 00:11:26,550
F1 is basically the harmonic mean of precision and recall.

176
00:11:26,940 --> 00:11:33,220
So it's a way to encapsulate both precision and recall, which are two very important metrics in one.

177
00:11:33,240 --> 00:11:37,590
And this is the formula, how it's worked out, how it's calculated.

178
00:11:38,250 --> 00:11:43,350
So it's basically an overall score that takes into consideration both recall and precision.
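
As a small sketch of the harmonic mean described here, F1 can be computed as 2 * precision * recall / (precision + recall). The helper name harmonic_f1 is made up for this example, and the inputs reuse the precision (90/95) and recall (90/100) from the disease example.

```python
def harmonic_f1(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention chosen for this sketch when both are zero
    return 2 * precision * recall / (precision + recall)

precision = 90 / 95   # true positives over predicted positives
recall = 90 / 100     # true positives over actual positives

f1 = harmonic_f1(precision, recall)
print(f"F1: {f1:.3f}")
```

Because it is a harmonic mean, F1 is pulled toward the lower of the two values, so a model cannot score well by being strong on only one of them.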

179
00:11:43,500 --> 00:11:48,210
So just to recap what recall is, or true positive rate or sensitivity:

180
00:11:48,780 --> 00:11:53,640
Basically, it's saying, when it's a yes, how often do we predict yes?

181
00:11:53,970 --> 00:12:00,650
So in other words, how good is our classifier at identifying occurrences of our class,

182
00:12:00,690 --> 00:12:03,570
or finding the true positives in the class?

183
00:12:04,290 --> 00:12:09,180
Precision is, when we predict yes, how often is that yes actually correct.

184
00:12:10,170 --> 00:12:14,610
So the trade-offs here: sometimes we can accept a lower precision with high recall.

185
00:12:15,120 --> 00:12:16,980
Let's take a look at disease prediction.

186
00:12:17,280 --> 00:12:19,140
Let's look at COVID-19 testing.

187
00:12:19,560 --> 00:12:25,140
We want to identify persons with the disease as much as possible at the expense of false positives.

188
00:12:25,950 --> 00:12:33,750
What this means here is that we don't want to miss people with COVID-19, because if they

189
00:12:33,750 --> 00:12:38,730
have the disease yet they get a negative test, they go back out in public and they can spread the

190
00:12:38,730 --> 00:12:39,510
virus around.

191
00:12:39,900 --> 00:12:46,380
So in this case, we know false positives are relatively acceptable because of the cost of having

192
00:12:46,380 --> 00:12:48,900
missed a person with the disease and giving them a false negative.

193
00:12:49,380 --> 00:12:55,470
So what this means is that we will have a high recall with a low precision in this case.

194
00:12:57,830 --> 00:13:00,710
So now let's take a look at classification reports.

195
00:13:01,310 --> 00:13:02,920
You can see a classification report.

196
00:13:02,960 --> 00:13:07,650
Basically, it gives us the precision, recall, F1 score and support.

197
00:13:07,670 --> 00:13:09,730
Support is simply the number of occurrences of each class.

198
00:13:09,740 --> 00:13:16,490
It's not that important as a metric, it's just a class count. And the report basically gives

199
00:13:16,490 --> 00:13:17,880
us that for each class.

200
00:13:17,900 --> 00:13:18,710
Here you can see.
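
A minimal sketch of generating such a report, assuming scikit-learn is installed and using made-up toy labels for a three-class problem rather than the MNIST predictions shown in the lesson. The names y_test and y_pred are illustrative.

```python
from sklearn.metrics import classification_report

# Hypothetical toy labels for a 3-class problem (not the lesson's MNIST data).
y_test = [0, 0, 1, 1, 2, 2, 2]   # ground truth labels
y_pred = [0, 1, 1, 1, 2, 0, 2]   # labels predicted by the model

# Text table with precision, recall, F1 score and support for each class.
print(classification_report(y_test, y_pred))

# The same numbers as a nested dict, keyed by class label as a string.
report = classification_report(y_test, y_pred, output_dict=True)
print(report["2"]["precision"], report["2"]["recall"])
```

The dictionary form is convenient when the per-class numbers need to be logged or compared across models instead of just printed.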

201
00:13:19,190 --> 00:13:25,310
So you can see for our MNIST digit classifier, we're getting very good scores in precision, recall

202
00:13:25,310 --> 00:13:26,210
and F1.

203
00:13:26,660 --> 00:13:27,720
So this is quite good.

204
00:13:27,740 --> 00:13:32,750
However, in reality, with a lot of real-world datasets, it's not going to be this nice.

205
00:13:32,900 --> 00:13:37,280
You're going to have some classes performing quite poorly, some performing much better.

206
00:13:37,820 --> 00:13:45,140
So the classification report is a very good tool and metric that you can use to analyze

207
00:13:45,140 --> 00:13:49,520
performance, especially in terms of looking at precision and recall specifically.

208
00:13:50,180 --> 00:13:54,830
So now, let's go ahead and train the MNIST model that we've done before.

209
00:13:55,220 --> 00:14:01,340
But now we're going to generate a classification report and a confusion matrix using both Keras and

210
00:14:01,340 --> 00:14:01,910
PyTorch.

211
00:14:02,330 --> 00:14:04,490
So I'll see you in the next code lessons.

212
00:14:04,640 --> 00:14:05,090
Thank you.