1
00:00:00,420 --> 00:00:07,050
Hello, everyone, and welcome to this new section in which we look at other methods of evaluating our

2
00:00:07,050 --> 00:00:11,740
model other than the binary accuracy, which we've seen so far.

3
00:00:11,760 --> 00:00:17,670
So in this section we'll look at how to compute the true positives, false positives, true negatives,

4
00:00:17,670 --> 00:00:24,540
false negatives, the precision, the recall, the area under the curve, and how to come up with confusion

5
00:00:24,540 --> 00:00:26,100
matrices like this one.

6
00:00:26,100 --> 00:00:34,320
And finally, how to plot out an ROC curve like this one, which lets us select the threshold more

7
00:00:34,320 --> 00:00:35,080
efficiently.

8
00:00:35,100 --> 00:00:41,160
Don't forget to subscribe and hit the notification button so you never miss amazing content like this.

9
00:00:42,000 --> 00:00:48,990
Let's now look at other ways of evaluating our model besides the accuracy, which we've seen so far.

10
00:00:49,740 --> 00:00:57,480
To better understand why relying only on the accuracy isn't always a great idea, we have to take

11
00:00:57,480 --> 00:01:06,240
into consideration the fact that our model has 94% accuracy on the test set.

12
00:01:06,570 --> 00:01:17,700
Now, this means that six out of every 100 predictions are wrong.

13
00:01:19,020 --> 00:01:28,050
Now, what if I get to the hospital and I'm told that I don't have malaria when in fact I actually have

14
00:01:28,050 --> 00:01:29,130
this disease?

15
00:01:29,910 --> 00:01:38,310
That is, the model predicts uninfected, but I actually have the parasite in my bloodstream.

16
00:01:38,910 --> 00:01:46,380
Now, this particular situation becomes very dangerous because the patient gets back home thinking he

17
00:01:46,380 --> 00:01:52,740
or she doesn't need any treatment, whereas that patient actually has this parasite.

18
00:01:52,770 --> 00:02:01,560
So even with 94% accuracy, we wouldn't be able to protect ourselves from such harmful model

19
00:02:01,560 --> 00:02:02,460
predictions.

20
00:02:03,030 --> 00:02:10,100
Now, let's put another example out here.

21
00:02:10,110 --> 00:02:14,810
So in another example, we can have a situation where the actual label is uninfected.

22
00:02:14,820 --> 00:02:22,500
So actually you do not have this parasite, but the model predicts that you have the parasite.

23
00:02:22,770 --> 00:02:31,830
Now, in this case, although we have a wrong prediction, the situation is actually less harmful

24
00:02:31,830 --> 00:02:35,580
than in the previous case, since,

25
00:02:36,690 --> 00:02:40,170
at least, you are actually uninfected.

26
00:02:40,650 --> 00:02:53,070
Let's consider negative to mean uninfected and positive to mean parasitized, that is, to contain

27
00:02:53,070 --> 00:02:54,020
the parasite.

28
00:02:54,030 --> 00:02:55,110
That's P.

29
00:02:55,380 --> 00:02:59,500
So here we have positive and then we have negative.

30
00:02:59,520 --> 00:03:04,560
So negative is uninfected and positive is parasitized.

31
00:03:04,590 --> 00:03:11,850
Now with these labels, you will find that the first situation, where we actually have the parasite and

32
00:03:11,850 --> 00:03:19,780
the model predicts uninfected, is known as a false negative.

33
00:03:19,800 --> 00:03:22,530
So here we have this false negative.

34
00:03:24,850 --> 00:03:31,150
And this is because the model predicts negative when it isn't actually negative.

35
00:03:31,180 --> 00:03:38,260
So since we have this wrong prediction for negative, we call it a false negative.

36
00:03:38,290 --> 00:03:46,330
And in the second case, where the model predicts parasitized, that is, positive, when it's actually

37
00:03:46,330 --> 00:03:50,030
negative, we call this a false positive.

38
00:03:50,050 --> 00:03:52,240
So here we have FP.

39
00:03:52,600 --> 00:03:55,990
And here we have FN.

40
00:03:56,170 --> 00:04:05,470
Now, there are two other scenarios, TN and TP. We have the TN, the true negatives,

41
00:04:05,470 --> 00:04:13,810
and the TP, the true positives. For the TN, we have the model predicting negative

42
00:04:13,810 --> 00:04:16,240
when the sample is actually negative.

43
00:04:16,240 --> 00:04:21,280
That is, we have the model saying this is uninfected when actually it's uninfected.

44
00:04:21,310 --> 00:04:28,600
So that's a true negative. And for the true positive, the model predicts positive, that's parasitized,

45
00:04:28,600 --> 00:04:30,580
when actually it's parasitized.

46
00:04:30,580 --> 00:04:32,350
So that's a true positive.

47
00:04:32,530 --> 00:04:39,130
Now hopefully you've understood the concepts of true negatives, true positives, false negatives and

48
00:04:39,130 --> 00:04:40,750
false positives.

49
00:04:41,740 --> 00:04:49,300
We can then summarize all this information in a matrix known as the confusion matrix. In this confusion

50
00:04:49,300 --> 00:04:49,780
matrix,

51
00:04:49,780 --> 00:04:55,480
we have the number of true negatives here, the number of true positives, the

52
00:04:55,480 --> 00:04:59,440
number of false negatives and the number of false positives.
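
To make the four counts concrete, here is a minimal pure-Python sketch that tallies them from lists of actual and predicted labels. It assumes labels are encoded as 1 for parasitized (positive) and 0 for uninfected (negative); `confusion_counts` and `confusion_matrix` are hypothetical helper names, and the layout (rows = actual, columns = predicted) is one common convention, not necessarily the one on the slide.

```python
def confusion_counts(y_true, y_pred):
    """Count TN, FP, FN, TP. Assumes 1 = parasitized (positive), 0 = uninfected (negative)."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1  # healthy patient told they are infected
        else:
            fn += 1  # infected patient told they are healthy
    return {"TN": tn, "FP": fp, "FN": fn, "TP": tp}


def confusion_matrix(y_true, y_pred):
    """2x2 layout: rows = actual (0, 1), columns = predicted (0, 1)."""
    c = confusion_counts(y_true, y_pred)
    return [[c["TN"], c["FP"]],
            [c["FN"], c["TP"]]]


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # {'TN': 3, 'FP': 1, 'FN': 1, 'TP': 3}
```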

53
00:04:59,980 --> 00:05:10,720
This means that if we have a test set of, say, 2,750 different data points and

54
00:05:10,720 --> 00:05:17,170
we evaluate our model on it, we'll be able to get the number of true negatives, the number

55
00:05:17,170 --> 00:05:23,980
of false negatives, the number of true positives, and the number of false positives, and hence better evaluate

56
00:05:24,280 --> 00:05:25,390
this model.

57
00:05:26,020 --> 00:05:33,310
So take this example, where we've evaluated two models on the test set: Model A produces

58
00:05:33,310 --> 00:05:39,460
this confusion matrix and Model B produces this other confusion matrix. Here we see that

59
00:05:39,460 --> 00:05:41,170
for the true negatives and true positives,

60
00:05:41,170 --> 00:05:47,860
we have 1,000 plus 1,000, that is, 2,000 correctly predicted data points.

61
00:05:47,860 --> 00:05:52,000
And here we also have 1,000 plus 1,000, that is, 2,000 correctly predicted data points.

62
00:05:52,000 --> 00:06:04,150
And then for Model A, we have 700 false positives and 50 false negatives, whereas for Model B

63
00:06:04,150 --> 00:06:09,670
we have 50 false positives and 700 false negatives.
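
The comparison between the two models can be sketched in a few lines, using the counts read off the two confusion matrices (TN = 1,000 and TP = 1,000 for both, with the FP and FN counts swapped). The dictionary layout is just an illustrative assumption, not the course's code.

```python
# Counts from the two confusion matrices discussed above
models = {
    "Model A": {"TN": 1000, "TP": 1000, "FP": 700, "FN": 50},
    "Model B": {"TN": 1000, "TP": 1000, "FP": 50, "FN": 700},
}

for name, c in models.items():
    total = sum(c.values())
    correct = c["TN"] + c["TP"]
    print(f"{name}: {correct}/{total} correct, FP={c['FP']}, FN={c['FN']}")

# When false negatives are the costlier mistake, prefer the model with fewer of them
preferred = min(models, key=lambda name: models[name]["FN"])
print("Fewest false negatives:", preferred)  # Model A
```

Both models get the same 2,000 out of 2,750 points right, so only the breakdown of the errors separates them.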

64
00:06:10,180 --> 00:06:17,680
Recall that we defined negative to be uninfected and positive to be parasitized, and hence if we are

65
00:06:17,680 --> 00:06:25,930
to choose between models A and B, we'll try to choose the model which minimizes the number of false

66
00:06:25,930 --> 00:06:26,950
negatives.

67
00:06:27,250 --> 00:06:34,630
So we are not saying that we shouldn't minimize the number of false positives; we have to try

68
00:06:34,630 --> 00:06:38,440
to minimize all the false predictions.

69
00:06:38,440 --> 00:06:46,480
But with false negatives, we are telling a sick person that he or she isn't sick.

70
00:06:47,850 --> 00:06:54,540
This, at least, is worse than telling a healthy person that he or she is sick.

71
00:06:54,690 --> 00:06:58,860
And so we'll prioritize minimizing the number of false negatives.

72
00:06:58,860 --> 00:07:07,590
And based on this prioritization, we are going to prefer Model A, since it has the smaller number of

73
00:07:07,590 --> 00:07:08,760
false negatives.

74
00:07:08,970 --> 00:07:13,470
And so we would choose Model A over Model B.

75
00:07:14,280 --> 00:07:21,900
Now, as a quick note, you may decide to say negative is for parasitized and positive is for uninfected.

76
00:07:22,290 --> 00:07:27,900
It isn't required that the labels be tied together like this.

77
00:07:27,900 --> 00:07:34,380
But for clarity purposes, it's better to look at it this way, since testing negative means

78
00:07:34,380 --> 00:07:38,100
you are uninfected and testing positive means you are parasitized.

79
00:07:39,480 --> 00:07:45,720
Now you should also note that depending on the kind of problem you want to solve, in some cases you

80
00:07:45,720 --> 00:07:53,010
will want to prioritize minimizing the number of false positives over the number of false negatives.

81
00:07:53,010 --> 00:07:56,970
So this actually depends on the problem you're trying to solve.

82
00:07:57,930 --> 00:08:02,850
But in this case, we are prioritizing the number of false negatives.

83
00:08:03,180 --> 00:08:11,010
Now, based on what we've seen so far, we're going to introduce several new performance metrics and

84
00:08:11,010 --> 00:08:14,400
they take these different formulas.

85
00:08:14,400 --> 00:08:19,980
Right here we have the precision, which is the number of true positives divided

86
00:08:19,980 --> 00:08:24,470
by the number of true positives plus the number of false positives. The recall is the number of

87
00:08:24,510 --> 00:08:29,250
true positives divided by the number of true positives plus the number of false negatives.

88
00:08:29,250 --> 00:08:34,620
The accuracy is the number of true negatives plus the number of true positives, divided by the number of

89
00:08:34,620 --> 00:08:39,270
true negatives plus true positives plus false negatives plus false positives.
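
The three formulas translate directly into code. This is a minimal sketch with hypothetical function names, applied to Model A's counts from the earlier example; it assumes none of the denominators is zero.

```python
def precision(tp, fp):
    # TP / (TP + FP); assumes tp + fp > 0
    return tp / (tp + fp)


def recall(tp, fn):
    # TP / (TP + FN); assumes tp + fn > 0
    return tp / (tp + fn)


def accuracy(tn, tp, fn, fp):
    # (TN + TP) / (TN + TP + FN + FP)
    return (tn + tp) / (tn + tp + fn + fp)


# Model A from the earlier example: TN=1000, TP=1000, FP=700, FN=50
p = precision(1000, 700)
r = recall(1000, 50)
a = accuracy(1000, 1000, 50, 700)
print(f"precision={p:.3f} recall={r:.3f} accuracy={a:.3f}")
# precision=0.588 recall=0.952 accuracy=0.727
```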

90
00:08:39,270 --> 00:08:43,380
Let's stop at these first three for now.

91
00:08:43,560 --> 00:08:44,940
Now what do you notice?

92
00:08:44,940 --> 00:08:50,670
You'll notice that the precision has true positives in both the numerator and the denominator,

93
00:08:50,670 --> 00:08:53,420
and so does the recall.

94
00:08:53,430 --> 00:08:59,840
What differentiates them is that in the precision we have false positives in the denominator, and in

95
00:08:59,850 --> 00:09:02,880
the recall we have false negatives in the denominator.

96
00:09:02,910 --> 00:09:11,550
This means that if the number of false negatives is high, we have, let's say, a constant,

97
00:09:11,580 --> 00:09:17,000
a constant K divided by a high value.

98
00:09:17,010 --> 00:09:23,850
And a constant divided by a high value gives a low output.

99
00:09:24,090 --> 00:09:31,740
And so a high number of false negatives gives a low recall.

100
00:09:31,740 --> 00:09:39,300
And likewise, a high number of false positives gives a low precision.

101
00:09:39,390 --> 00:09:45,470
Now, in our case, we're trying to minimize the number of false negatives.

102
00:09:45,480 --> 00:09:50,730
And since we're trying to minimize the number of false negatives, it means that we're trying to maximize

103
00:09:50,730 --> 00:10:01,410
the recall, since minimizing this denominator maximizes the overall ratio, TP over TP plus FN,

104
00:10:01,830 --> 00:10:07,800
and so here we're trying to prioritize the recall over the precision.

105
00:10:08,370 --> 00:10:12,870
Now, if you look at the accuracy, you'll notice that we have TN plus

106
00:10:12,870 --> 00:10:14,310
TP in the numerator, and TN plus

107
00:10:14,310 --> 00:10:16,080
TP right here in the denominator.

108
00:10:16,080 --> 00:10:16,920
Besides TN plus

109
00:10:16,920 --> 00:10:19,620
TP, the denominator only has FN plus FP.

110
00:10:19,860 --> 00:10:28,950
If you look closely enough, you'll see that the accuracy doesn't give any priority to either the

111
00:10:28,950 --> 00:10:32,080
false negatives or the false positives.

112
00:10:32,100 --> 00:10:34,580
It treats the two the same.

113
00:10:34,590 --> 00:10:41,400
But as we've seen, when solving real-world problems, we would often

114
00:10:41,400 --> 00:10:43,000
have to prioritize.

115
00:10:43,020 --> 00:10:48,510
Hence, the accuracy may not always be the best metric for our problem.
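
A quick way to see that the accuracy treats the two error types the same is to swap FP and FN, as Models A and B do. This sketch reuses the counts from the earlier example; the helper names are hypothetical.

```python
def accuracy(c):
    return (c["TN"] + c["TP"]) / (c["TN"] + c["TP"] + c["FN"] + c["FP"])


def recall(c):
    return c["TP"] / (c["TP"] + c["FN"])


model_a = {"TN": 1000, "TP": 1000, "FP": 700, "FN": 50}
model_b = {"TN": 1000, "TP": 1000, "FP": 50, "FN": 700}

# Swapping FP and FN leaves the accuracy unchanged...
print(accuracy(model_a) == accuracy(model_b))  # True
# ...but the recall exposes Model B's extra false negatives.
print(round(recall(model_a), 3), round(recall(model_b), 3))  # 0.952 0.588
```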

116
00:10:48,720 --> 00:10:58,140
In our case, we find that using the recall is even better than using the accuracy.

117
00:10:58,230 --> 00:11:09,330
With the recall, we get to see whether our model does well

118
00:11:09,330 --> 00:11:13,140
at minimizing the number of false negatives.

119
00:11:14,310 --> 00:11:18,300
Now we also have the F1 score, which is two times the precision times

120
00:11:18,300 --> 00:11:24,600
the recall, divided by the precision plus the recall, and the specificity, the number

121
00:11:24,600 --> 00:11:28,950
of true negatives divided by the number of true negatives plus the number of false positives.
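
The F1 score and the specificity can be sketched the same way. This minimal example (hypothetical helper names) uses Model A's counts; note that F1 = 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN).

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)


def specificity(tn, fp):
    # TN / (TN + FP): the fraction of truly uninfected samples labelled uninfected
    return tn / (tn + fp)


tn, tp, fp, fn = 1000, 1000, 700, 50  # Model A
p = tp / (tp + fp)
r = tp / (tp + fn)
print(round(f1_score(p, r), 3), round(specificity(tn, fp), 3))  # 0.727 0.588
```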

122
00:11:29,820 --> 00:11:37,280
And then we also have this ROC plot right here. ROC stands for receiver operating characteristic.

123
00:11:37,290 --> 00:11:40,980
Here we have the true positive rate and the false positive rate.

124
00:11:41,190 --> 00:11:44,580
The true positive rate is the number of true positives

125
00:11:44,580 --> 00:11:46,830
divided by the number of true positives plus the number

126
00:11:46,960 --> 00:11:50,350
of false negatives, which happens to be the recall.

127
00:11:50,350 --> 00:11:56,290
And then the false positive rate is the number of false positives divided by the number of false positives,

128
00:11:56,290 --> 00:12:03,460
plus the number of true negatives, which, if you look carefully, is equal to one minus

129
00:12:03,460 --> 00:12:06,430
the specificity which has been defined right here.
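
The relationship FPR = 1 - specificity follows because FP / (FP + TN) and TN / (TN + FP) share a denominator and sum to one. A small sketch with Model A's counts (hypothetical helper names) confirms it numerically.

```python
def true_positive_rate(tp, fn):
    return tp / (tp + fn)  # same formula as the recall


def false_positive_rate(fp, tn):
    return fp / (fp + tn)


def specificity(tn, fp):
    return tn / (tn + fp)


tn, tp, fp, fn = 1000, 1000, 700, 50  # Model A
fpr = false_positive_rate(fp, tn)
print(round(fpr, 3), round(1 - specificity(tn, fp), 3))  # 0.412 0.412
```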

130
00:12:07,420 --> 00:12:17,050
Before getting into this ROC plot which we've put out here, let's recall the two models that

131
00:12:17,050 --> 00:12:26,380
we described previously, Model A and Model B, where Model A had a smaller number of false

132
00:12:26,380 --> 00:12:29,230
negatives as compared to Model B.

133
00:12:30,310 --> 00:12:38,740
Now, let's say we pick

134
00:12:38,740 --> 00:12:45,850
Model B and we're interested in reducing this number of false negatives

135
00:12:45,850 --> 00:12:46,750
right here.

136
00:12:47,590 --> 00:12:53,410
Then one solution could be to modify the threshold.

137
00:12:53,500 --> 00:13:02,970
Say we have a threshold of 0.5: below 0.5, we consider

138
00:13:02,970 --> 00:13:09,250
the output negative, that is, uninfected, and above 0.5, positive, that is, parasitized.

139
00:13:10,120 --> 00:13:14,740
Then what I could do here is reduce this threshold.

140
00:13:14,740 --> 00:13:19,840
So say I take this threshold down to 0.2, right here.

141
00:13:20,020 --> 00:13:23,020
So I've reduced this threshold now to 0.2.

142
00:13:23,260 --> 00:13:34,090
You'll see that for many more of the predictions, our model is going to say the output is parasitized,

143
00:13:34,090 --> 00:13:37,230
since the threshold has now been reduced.

144
00:13:37,240 --> 00:13:45,310
This means that if we have a model prediction of 0.3, which initially would have been uninfected,

145
00:13:45,310 --> 00:13:49,720
the model now sees it as parasitized.

146
00:13:50,560 --> 00:13:59,110
And so this now makes it more difficult for our model to produce false negatives, since our model now

147
00:13:59,110 --> 00:14:08,380
tends to predict that a given input image is parasitized.
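
The effect of lowering the threshold can be sketched on a handful of made-up model outputs. The probabilities and labels below are hypothetical, and this sketch treats a score exactly equal to the threshold as positive; that tie-breaking choice is an assumption.

```python
def classify(probabilities, threshold=0.5):
    # 1 = parasitized (positive), 0 = uninfected (negative)
    return [1 if p >= threshold else 0 for p in probabilities]


def count_false_negatives(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)


# Hypothetical model outputs (probability of "parasitized") and true labels
probs = [0.9, 0.3, 0.45, 0.1, 0.25, 0.7]
y_true = [1, 1, 1, 0, 0, 1]

for threshold in (0.5, 0.2):
    preds = classify(probs, threshold)
    print(threshold, preds, "FN =", count_false_negatives(y_true, preds))
# 0.5 [1, 0, 0, 0, 0, 1] FN = 2
# 0.2 [1, 1, 1, 0, 1, 1] FN = 0
```

At 0.2 both false negatives disappear, but the uninfected sample scored 0.25 becomes a false positive, which is the trade-off the ROC plot helps manage.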

148
00:14:09,190 --> 00:14:16,570
That said, we now need to look for a way to automate this process.

149
00:14:16,570 --> 00:14:24,700
That is, we want to be able to choose this threshold correctly, because if, let's say,

150
00:14:24,700 --> 00:14:34,660
we take a threshold of 0.01, it means that anytime our model predicts less than 0.01, it's

151
00:14:34,660 --> 00:14:44,410
uninfected, and anything greater than 0.01 is parasitized, and this will be very bad for the overall

152
00:14:44,410 --> 00:14:53,350
model performance, as the model would now predict most of the time that the input is parasitized.

153
00:14:54,340 --> 00:15:04,720
And so our aim here is to pick this threshold such that the number of true positives and true negatives

154
00:15:04,720 --> 00:15:07,960
we have right here doesn't get reduced.

155
00:15:10,000 --> 00:15:17,980
Now, the way we can approach this is by using the ROC plot, where you actually

156
00:15:17,980 --> 00:15:26,140
have the different true positive rates and false positive rates you would get at each threshold.
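
Each point on the ROC plot is one (FPR, TPR) pair computed at one threshold. Here is a minimal sketch on hypothetical scores and labels; `roc_point` is an illustrative name, not a library function, and scores equal to the threshold count as positive.

```python
def roc_point(y_true, probs, threshold):
    """Return (FPR, TPR) when scores >= threshold are labelled positive."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if t == 1 and pred == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return fp / (fp + tn), tp / (tp + fn)


# Hypothetical labels (1 = parasitized) and predicted probabilities
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
probs = [0.9, 0.3, 0.45, 0.1, 0.25, 0.7, 0.6, 0.05]

for threshold in (0.5, 0.2, 0.1):
    print(threshold, roc_point(y_true, probs, threshold))
# 0.5 (0.25, 0.5)
# 0.2 (0.5, 1.0)
# 0.1 (0.75, 1.0)
```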

157
00:15:26,140 --> 00:15:31,450
So each point on this plot is just a given threshold.

158
00:15:31,480 --> 00:15:36,250
Now let's suppose that the threshold 0.5 is about here.

159
00:15:36,250 --> 00:15:37,180
So let's pick this.

160
00:15:37,180 --> 00:15:39,310
Let's suppose that this is 0.5.

161
00:15:39,310 --> 00:15:42,310
We could pick another threshold.

162
00:15:42,310 --> 00:15:47,590
Let's say this one is 0.2, and so on and so forth.

163
00:15:47,590 --> 00:15:49,300
Let's say this one is 0.1.

164
00:15:51,580 --> 00:15:57,700
Now, we could have another model with this different ROC plot, another one with this kind of ROC

165
00:15:57,730 --> 00:15:58,360
plot.

166
00:15:58,360 --> 00:16:06,970
But note that overall, our aim is to ensure that this false positive rate is minimized and the true

167
00:16:06,970 --> 00:16:08,410
positive rate is maximized.

168
00:16:08,410 --> 00:16:14,590
So if we have an ROC plot like this, let's redraw this.

169
00:16:14,980 --> 00:16:21,820
If we have this kind of ROC plot, one that goes straight up like this and then across this way,

170
00:16:21,820 --> 00:16:24,550
we have this right here.

171
00:16:24,550 --> 00:16:32,830
So if we have this kind of ROC plot, then we will be able to pick out this threshold right here,

172
00:16:33,130 --> 00:16:42,910
say 0.6. We'll be able to pick this threshold because at this point,

173
00:16:42,910 --> 00:16:46,780
the true positive rate is at its highest value, that is, one.

174
00:16:46,990 --> 00:16:51,640
And the false positive rate is at its lowest possible value, that is, zero.

175
00:16:51,910 --> 00:16:54,460
So here we have one and then zero.

176
00:16:54,460 --> 00:16:58,480
So here we go, we have this and then we pick out this threshold.

177
00:16:58,510 --> 00:17:02,590
Now this threshold could be 0.5, 0.4, or whatever.

178
00:17:02,590 --> 00:17:08,140
We pick whichever value leads us to this point.

179
00:17:09,100 --> 00:17:15,190
Nonetheless, most times we won't get this kind of plot; we'll get plots which look

180
00:17:15,190 --> 00:17:19,840
like this, this and so on and so forth.

181
00:17:19,870 --> 00:17:26,200
Now, let's suppose we're given a plot like this one. Let me redraw the two axes

182
00:17:26,200 --> 00:17:27,550
right here.

183
00:17:27,940 --> 00:17:36,010
If we want to make sure that our recall is maximized, which is our case, we will try to ensure

184
00:17:36,010 --> 00:17:40,300
that we pick points around here.

185
00:17:40,300 --> 00:17:44,360
So we try to pick points toward the top right here.

186
00:17:44,380 --> 00:17:46,630
Let's take some of this off.

187
00:17:47,230 --> 00:17:53,080
So as we were saying, if we want to maximize our recall, then normally

188
00:17:53,080 --> 00:17:58,870
we'll pick threshold values which take us around this region, because it's around this region

189
00:17:58,870 --> 00:18:01,210
that our recall is maximized.

190
00:18:01,240 --> 00:18:02,560
Let's do this.

191
00:18:02,560 --> 00:18:03,790
so you can see that clearly.

192
00:18:05,410 --> 00:18:06,070
Okay.

193
00:18:06,070 --> 00:18:12,640
Now, the problem with picking a point around here is that the false

194
00:18:12,640 --> 00:18:14,620
positive rate increases.

195
00:18:14,620 --> 00:18:20,800
So you need to find that balance between this false positive rate and true positive rate.

196
00:18:20,800 --> 00:18:28,240
So it would be more logical to pick a point around here, so we could pick around this region

197
00:18:28,240 --> 00:18:28,840
instead.

198
00:18:30,060 --> 00:18:32,630
of this previous region right here.

199
00:18:32,640 --> 00:18:34,380
Now, look at this region.

200
00:18:34,680 --> 00:18:35,660
Let's say we pick this point.

201
00:18:35,680 --> 00:18:36,840
We pick this point.

202
00:18:37,080 --> 00:18:39,990
Your false positive rate is now smaller,

203
00:18:39,990 --> 00:18:48,030
while your true positive rate, or your recall, stays high, though it isn't the best possible recall

204
00:18:48,030 --> 00:18:48,930
we could have.

205
00:18:48,930 --> 00:18:54,240
But trying to focus on getting that recall of one will lead you into trouble.

206
00:18:54,240 --> 00:18:59,670
Since getting a recall of one in this case will increase our false positive rate.

207
00:19:00,210 --> 00:19:07,440
And then if we're dealing with a problem where we're trying to maximize the precision, then in that

208
00:19:07,440 --> 00:19:12,210
case, we want to ensure that this false positive rate is minimized.

209
00:19:12,210 --> 00:19:16,050
And so in this kind of problem, we want to pick a point around here.

210
00:19:16,050 --> 00:19:23,070
So we want to pick this kind of value, since at least our false positive rate is minimized.

211
00:19:23,070 --> 00:19:28,830
But then if you go and pick a point around here, you would have a false positive rate

212
00:19:28,830 --> 00:19:29,550
of zero.

213
00:19:29,580 --> 00:19:36,810
But doing this will get you into trouble, because you're going to have a true positive

214
00:19:36,810 --> 00:19:38,550
rate, which is very small.

215
00:19:38,550 --> 00:19:40,470
And so you need to get that balance.

216
00:19:41,370 --> 00:19:47,280
Now, if you have a problem where you don't really need to prioritize

217
00:19:47,280 --> 00:19:53,820
the precision or the recall, and working with the accuracy is just fine, then you could pick out this point

218
00:19:53,820 --> 00:19:54,480
right here.

219
00:19:54,480 --> 00:20:01,530
So this point is for when you don't need to prioritize either, this one is for when you're trying to prioritize

220
00:20:01,530 --> 00:20:05,040
the recall, and this one is for when you're trying to prioritize the precision.

221
00:20:05,760 --> 00:20:09,990
But the great thing is that with this tool, the ROC plot,

222
00:20:09,990 --> 00:20:17,490
we are able to pick out this point and then automatically get the threshold we need to work with.

223
00:20:18,330 --> 00:20:24,870
And so when making predictions, we may not use 0.5, but rather a certain

224
00:20:24,870 --> 00:20:28,020
threshold which suits the objectives

225
00:20:28,020 --> 00:20:29,790
we set initially.
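
One way to automate the "pick a point on the ROC plot" step for a recall-first problem is to take the threshold with the highest TPR whose FPR stays within a chosen budget. This rule, the `max_fpr` budget, and the helper names are assumptions for illustration; the lecture picks the point visually, and other criteria (for example Youden's J, TPR minus FPR) are also common.

```python
def roc_points(y_true, probs, thresholds):
    """(threshold, FPR, TPR) for each threshold; scores >= threshold count as positive."""
    points = []
    for th in thresholds:
        tp = sum(1 for t, p in zip(y_true, probs) if t == 1 and p >= th)
        fn = sum(1 for t, p in zip(y_true, probs) if t == 1 and p < th)
        fp = sum(1 for t, p in zip(y_true, probs) if t == 0 and p >= th)
        tn = sum(1 for t, p in zip(y_true, probs) if t == 0 and p < th)
        points.append((th, fp / (fp + tn), tp / (tp + fn)))
    return points


def pick_threshold(points, max_fpr):
    """Highest-recall threshold whose FPR is within max_fpr (raises ValueError if none)."""
    allowed = [pt for pt in points if pt[1] <= max_fpr]
    # Prefer higher TPR; on ties, prefer lower FPR
    best = max(allowed, key=lambda pt: (pt[2], -pt[1]))
    return best[0]


# Hypothetical labels (1 = parasitized) and predicted probabilities
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
probs = [0.9, 0.3, 0.45, 0.1, 0.25, 0.7, 0.6, 0.05]
points = roc_points(y_true, probs, [0.7, 0.5, 0.3, 0.2, 0.1])
print(pick_threshold(points, max_fpr=0.5))  # 0.3
```

Here 0.3 and 0.2 both reach a recall of 1.0, and the tie-break keeps 0.3 because its false positive rate (0.25) is lower.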

226
00:20:29,790 --> 00:20:35,060
Now let's move to the area under the curve.

227
00:20:35,070 --> 00:20:38,550
We generally use the area under the curve when comparing two models.

228
00:20:39,450 --> 00:20:46,290
Here we have this model. Let's not call it Model A, since it's different from the Model A

229
00:20:46,290 --> 00:20:48,690
we saw earlier; let's call this one model alpha.

230
00:20:48,690 --> 00:20:54,150
And then we have this other model, which we shall call model beta.

231
00:20:54,150 --> 00:20:56,940
So we have model alpha and model beta.

232
00:20:58,170 --> 00:21:03,540
It's clear that model beta is better because it gives us better options.

233
00:21:03,780 --> 00:21:10,350
You'll see that if I find myself here, I get a better

234
00:21:10,830 --> 00:21:16,710
true positive rate to false positive rate balance than when I find myself at this position.

235
00:21:16,890 --> 00:21:23,400
And so if we are comparing these two models, we could make use of the area under the curve by calculating

236
00:21:23,400 --> 00:21:25,050
this area covered here.

237
00:21:25,050 --> 00:21:26,880
So let's bound this.

238
00:21:26,880 --> 00:21:36,150
And then for alpha, we have this area under the curve, popularly known as AUC,

239
00:21:36,150 --> 00:21:39,480
which gives us this.

240
00:21:39,480 --> 00:21:46,350
And then for beta, we're going to have this area under the curve now which covers this area, plus

241
00:21:46,350 --> 00:21:49,470
this extra area right here.

242
00:21:49,470 --> 00:21:56,400
And so in general, if we have two models and then we want to compare them, then we could use this

243
00:21:56,400 --> 00:22:03,360
area under the curve, since it shows us how much freedom we have in playing around with the thresholds.
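
Given ROC points sorted by increasing FPR, the area under the curve can be approximated with the trapezoidal rule. The ROC points for alpha and beta below are made up for illustration. Keras's `AUC` metric approximates the same area over a fixed grid of thresholds, so its value can differ slightly from an exact computation.

```python
def auc_trapezoid(fpr, tpr):
    """Area under a ROC curve from points sorted by increasing FPR."""
    area = 0.0
    for i in range(1, len(fpr)):
        # Trapezoid between consecutive points
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area


# Hypothetical ROC points for two models, each running from (0, 0) to (1, 1)
alpha_fpr, alpha_tpr = [0.0, 0.25, 0.5, 1.0], [0.0, 0.5, 0.75, 1.0]
beta_fpr, beta_tpr = [0.0, 0.25, 0.5, 1.0], [0.0, 0.75, 1.0, 1.0]

print(auc_trapezoid(alpha_fpr, alpha_tpr), auc_trapezoid(beta_fpr, beta_tpr))
# 0.65625 0.8125
```

Beta's curve sits above alpha's at every FPR, so its larger AUC matches the visual comparison; a random classifier (the diagonal) scores 0.5 and a perfect one scores 1.0.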

244
00:22:03,750 --> 00:22:09,900
Let's now get back to the code and see how we're going to implement these new metrics we've just talked about.

245
00:22:09,900 --> 00:22:17,340
So right here we have the metrics: the false positives, the false negatives,

246
00:22:17,340 --> 00:22:26,580
the true positives, the true negatives, the precision, the recall,

247
00:22:26,580 --> 00:22:29,730
and the AUC.

248
00:22:29,730 --> 00:22:34,770
And since we're dealing with a binary classification problem here, we have the binary accuracy. We

249
00:22:34,770 --> 00:22:36,390
run that, and that's fine.

250
00:22:36,810 --> 00:22:44,430
And then right here, instead of having just one metric, we define this metrics list, which will contain

251
00:22:44,430 --> 00:22:47,010
the different metrics we've just talked about.

252
00:22:47,010 --> 00:22:51,300
So we have the true positive false positive right up to the AUC.

253
00:22:51,300 --> 00:22:55,380
Let's run this, compile, and then fit our model.

254
00:22:55,530 --> 00:23:01,230
You'll see that as we train we have the true positives, false positives, true negatives, false negatives,

255
00:23:01,230 --> 00:23:06,750
accuracy, precision, recall, and AUC scores being reported to us.

256
00:23:06,780 --> 00:23:13,650
With these results obtained, let's go ahead and evaluate our model.

257
00:23:13,890 --> 00:23:16,500
So let's get to the model evaluation.

258
00:23:16,500 --> 00:23:22,770
We run this on the test data, and our model gets evaluated.

259
00:23:23,640 --> 00:23:24,420
There we go.

260
00:23:24,420 --> 00:23:26,100
Our model has been evaluated.

261
00:23:26,100 --> 00:23:28,800
As you can see, we have the

262
00:23:28,940 --> 00:23:31,100
loss, 0.35.

263
00:23:31,130 --> 00:23:35,560
Number of true positives 1323.

264
00:23:35,570 --> 00:23:37,220
Number of false positives.

265
00:23:37,760 --> 00:23:38,870
Number of true negatives.

266
00:23:38,870 --> 00:23:39,770
False negatives.

267
00:23:39,770 --> 00:23:40,640
Accuracy.

268
00:23:40,640 --> 00:23:44,030
Precision, recall, and AUC.

269
00:23:44,210 --> 00:23:46,310
So that's what we have for this model.
