1
00:00:02,450 --> 00:00:06,260
In this video tutorial, we will look at mean average precision.

2
00:00:06,590 --> 00:00:12,830
So mean average precision is a metric that is used to evaluate the performance of object detection models

3
00:00:12,830 --> 00:00:14,030
in computer vision.

4
00:00:14,720 --> 00:00:23,240
For example, say I fine-tune the YOLOv5 model on a fire dataset so that I can detect fire in images,

5
00:00:23,240 --> 00:00:25,310
videos, or in a live webcam feed.

6
00:00:25,310 --> 00:00:32,210
So after I have fine-tuned the YOLOv5 model on the fire dataset, and I now have a fire detection model,

7
00:00:32,210 --> 00:00:36,350
how can I evaluate the performance of my fine-tuned model?

8
00:00:36,380 --> 00:00:42,530
So to evaluate the performance of the fine-tuned model, I will be calculating a metric called mean

9
00:00:42,530 --> 00:00:43,550
average precision.

10
00:00:43,670 --> 00:00:49,580
So mean average precision is a commonly used metric to evaluate the performance of object detection

11
00:00:49,580 --> 00:00:50,240
models.

12
00:00:50,360 --> 00:00:57,350
Many object detection models, such as Faster R-CNN, MobileNet SSD, and the YOLO series, which includes YOLOv5,

13
00:00:57,380 --> 00:01:05,270
YOLOv6, YOLOv7, YOLOv8, and YOLOv9, use mean average precision to evaluate their models.

14
00:01:05,540 --> 00:01:10,130
So the mean average precision formula is based on the following sub-metrics.

15
00:01:10,130 --> 00:01:15,200
Or we can say that the following are the building blocks of mean average precision, which include the confusion

16
00:01:15,200 --> 00:01:18,770
matrix, intersection over union, precision, and recall.

17
00:01:18,890 --> 00:01:25,370
So to calculate the precision and recall values, we need to calculate true positives and false positives.

18
00:01:25,370 --> 00:01:31,910
And to calculate the true positives and false positives, we need to have a confusion matrix and to

19
00:01:31,910 --> 00:01:34,550
calculate true positives and false positives

20
00:01:34,550 --> 00:01:39,470
in the confusion matrix, we need to get the intersection over union.

21
00:01:39,650 --> 00:01:43,160
So these four things are quite interlinked with each other.

22
00:01:44,140 --> 00:01:49,420
So the confusion matrix is based on the following four attributes, which are given over here.

23
00:01:49,420 --> 00:01:49,990
In other words,

24
00:01:49,990 --> 00:01:53,650
to calculate the confusion matrix, we need to calculate true positives,

25
00:01:53,890 --> 00:01:58,000
false positives, false negatives, and true negatives.

26
00:01:58,000 --> 00:02:00,010
So what is a true positive?

27
00:02:00,010 --> 00:02:03,460
So for example I have a car in an image.

28
00:02:03,460 --> 00:02:05,230
So my model predicted a label.

29
00:02:05,230 --> 00:02:11,380
So my model predicted that there is a car inside this image and it matches correctly with my ground

30
00:02:11,380 --> 00:02:11,590
truth.

31
00:02:11,590 --> 00:02:18,610
So in my ground truth annotation, I have annotated

32
00:02:18,670 --> 00:02:24,700
the image as containing a car, and my model predicted that there is a car in the image as well.

33
00:02:24,700 --> 00:02:28,720
So my model prediction matches with my ground truth as well.

34
00:02:28,720 --> 00:02:35,680
So if my model prediction matches with the ground truth, then we can say that this is a true positive

35
00:02:35,680 --> 00:02:36,400
prediction.

36
00:02:37,660 --> 00:02:44,860
So, to understand the confusion matrix, let's

37
00:02:44,860 --> 00:02:47,320
take an example of this classification problem.

38
00:02:47,560 --> 00:02:51,250
A model has to identify whether there are hot dogs in the image.

39
00:02:51,250 --> 00:02:55,480
The prediction can be either correct or incorrect.

40
00:02:55,480 --> 00:03:01,390
So for example, you can see that my model predicted that there is a hot dog inside this

41
00:03:01,390 --> 00:03:01,900
image.

42
00:03:01,900 --> 00:03:02,380
Okay.

43
00:03:02,380 --> 00:03:07,930
So this is a true positive because we can see that there is a hot dog inside this image okay.

44
00:03:09,090 --> 00:03:10,710
And then next is true negative.

45
00:03:11,160 --> 00:03:15,780
The model does not predict a label, and it is not a part of the ground truth.

46
00:03:15,810 --> 00:03:19,620
Okay, so if I just show you a true negative over here.

47
00:03:21,180 --> 00:03:28,020
So now you can see that, um, we have our donut over here, and my model does not make any prediction.

48
00:03:28,020 --> 00:03:34,320
Like, for example, over here we need to identify whether there are hot dogs in the image or not.

49
00:03:34,320 --> 00:03:34,800
Okay.

50
00:03:34,800 --> 00:03:40,530
So like there is a donut and my model does not make any prediction over here okay.

51
00:03:40,530 --> 00:03:44,220
So this is the true negative because there is no hot dog inside this image.

52
00:03:44,220 --> 00:03:47,160
There is a donut and my model does not make any prediction.

53
00:03:47,520 --> 00:03:51,720
So the model does not predict a label, and it is not a part of the ground truth.

54
00:03:51,990 --> 00:03:56,100
So here we are trying to identify whether there are hot dogs inside this image.

55
00:03:56,100 --> 00:03:57,840
And you can see there is no hot dog.

56
00:03:57,870 --> 00:03:58,710
There is a donut.

57
00:03:58,710 --> 00:04:04,350
And my model's prediction is not hot dog; my model does not predict any label over here.

58
00:04:04,590 --> 00:04:06,360
So this is true negative.

59
00:04:07,290 --> 00:04:09,450
The third one is false positive.

60
00:04:09,690 --> 00:04:13,950
The model predicted a label, but it is not a part of the ground truth.

61
00:04:13,950 --> 00:04:16,230
So you can see over here false positive.

62
00:04:16,290 --> 00:04:19,860
So here you can see we have an ice cream inside this image.

63
00:04:19,890 --> 00:04:22,410
But my model predicted that there is a hot dog.

64
00:04:22,620 --> 00:04:26,130
So there is no hot dog inside this image.

65
00:04:26,130 --> 00:04:29,160
But there is an ice cream inside this image.

66
00:04:29,160 --> 00:04:30,600
So this is a false positive.

67
00:04:31,110 --> 00:04:35,640
The model predicted a label, that is, my model predicted that there is a hot dog, but it is not a part

68
00:04:35,640 --> 00:04:36,390
of the ground truth.

69
00:04:36,390 --> 00:04:38,640
There is no hot dog inside this image.

70
00:04:40,190 --> 00:04:42,230
And the fourth one is false negative.

71
00:04:42,680 --> 00:04:47,720
The model does not predict a label, but it is a part of the ground truth.

72
00:04:47,750 --> 00:04:52,700
Okay, so now you can see that my model does not predict anything like you can see in this image.

73
00:04:52,700 --> 00:04:53,600
The false negative.

74
00:04:53,630 --> 00:04:58,100
There is a hot dog inside this image, like we can see over here.

75
00:04:58,490 --> 00:05:00,920
But my model does not predict anything.

76
00:05:00,920 --> 00:05:04,280
Like my model is unable to predict that there is a hot dog.

77
00:05:04,280 --> 00:05:08,750
So the model does not predict a label, but it is a part of the ground truth, like there is a hot

78
00:05:08,750 --> 00:05:09,890
dog inside this image.

79
00:05:10,070 --> 00:05:15,440
The hot dog is a part of the ground truth, but my model is unable to predict anything, like the

80
00:05:15,440 --> 00:05:17,510
model does not predict that there is a hot dog.

81
00:05:19,450 --> 00:05:25,510
So if you combine all four of these, true positives, true negatives, false positives, and false negatives,

82
00:05:25,510 --> 00:05:27,790
you will have the confusion matrix.

83
00:05:28,270 --> 00:05:31,270
So the next thing is intersection over union.

84
00:05:31,690 --> 00:05:36,820
So intersection over union indicates the overlap of predicted bounding box coordinates to the ground

85
00:05:36,820 --> 00:05:38,560
truth bounding box coordinates.

86
00:05:38,560 --> 00:05:41,380
So now you can see that here we have a stop sign.

87
00:05:41,380 --> 00:05:45,370
So the green color bounding box represents the ground truth bounding box.

88
00:05:46,440 --> 00:05:51,810
And this maroon color bounding box represents the predicted bounding box, okay.

89
00:05:52,200 --> 00:05:52,890
So.

90
00:05:53,970 --> 00:05:59,880
Using intersection over union, we find the overlap of this predicted bounding box with the ground truth

91
00:05:59,880 --> 00:06:00,510
bounding box.

92
00:06:00,510 --> 00:06:00,960
Okay.

93
00:06:01,800 --> 00:06:08,490
So say I get a higher value of IOU; the IOU value ranges from 0 to 1, and I get an IOU value of

94
00:06:08,490 --> 00:06:09,570
0.9.

95
00:06:09,570 --> 00:06:15,960
So a higher value indicates that the predicted bounding box coordinates closely resemble the ground

96
00:06:15,960 --> 00:06:17,580
truth bounding box coordinates.

97
00:06:17,610 --> 00:06:23,760
For example, in this case, I will be getting a higher value because my predicted bounding

98
00:06:23,760 --> 00:06:29,250
box coordinates, which you can see in the maroon color, closely resemble my ground truth bounding

99
00:06:29,250 --> 00:06:29,970
box coordinates.

100
00:06:29,970 --> 00:06:35,940
Like, you can see that my predicted bounding box is very close to the ground truth bounding box like

101
00:06:35,940 --> 00:06:39,870
they very closely resemble each other; the predicted bounding box

102
00:06:39,870 --> 00:06:43,470
very closely resembles the ground truth bounding box.

103
00:06:43,470 --> 00:06:50,490
So in this case we will have a higher IOU, and we use intersection over union to calculate the overlap

104
00:06:50,490 --> 00:06:55,800
between the predicted bounding box coordinates and the ground truth bounding box coordinates.

105
00:06:55,950 --> 00:07:02,790
Okay, so the IOU (intersection over union) metric evaluates the correctness of a prediction.

106
00:07:02,790 --> 00:07:06,990
So intersection over union value ranges from 0 to 1.
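
As a minimal sketch of the IOU calculation described here (not code shown in the video), assuming each box is given as (x1, y1, x2, y2) corner coordinates. The function name `iou` and the two example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if the boxes overlap at all.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero so disjoint boxes give no overlap.
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Hypothetical boxes: ground truth and a slightly shifted prediction.
ground_truth = (10, 10, 110, 110)
prediction = (20, 20, 120, 120)
print(f"IOU = {iou(ground_truth, prediction):.3f}")
```

Identical boxes give 1.0 and boxes that do not touch give 0.0, which matches the 0 to 1 range mentioned above.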

107
00:07:06,990 --> 00:07:07,560
Okay.

108
00:07:07,560 --> 00:07:13,350
With the help of an IOU threshold, we can decide whether the prediction is a true positive, false positive,

109
00:07:13,380 --> 00:07:15,120
false negative, or true negative.

110
00:07:15,120 --> 00:07:20,340
As I told you at the start, to calculate true positives and false positives, we need to calculate intersection

111
00:07:20,340 --> 00:07:21,180
over union.

112
00:07:21,510 --> 00:07:25,170
So the intersection over union metric evaluates the correctness of a prediction.

113
00:07:25,170 --> 00:07:29,520
So you can see over here we have set the threshold as 0.5.

114
00:07:29,520 --> 00:07:32,910
And I am getting an IOU value of 0.96.

115
00:07:32,910 --> 00:07:36,930
Like you can see that in the red color I have the ground truth bounding box.

116
00:07:36,930 --> 00:07:40,710
And in the blue color this is my predicted bounding box.

117
00:07:40,890 --> 00:07:44,610
So now we can see that the IOU is 0.96.

118
00:07:44,610 --> 00:07:49,350
Like you can see, the higher IOU means that the predicted bounding box, which is in the

119
00:07:49,350 --> 00:07:52,890
blue color, closely resembles the ground truth bounding box.

120
00:07:53,340 --> 00:07:59,490
Okay, so there you can see that we are getting a higher IOU, and you can see that our IOU value is

121
00:07:59,490 --> 00:08:03,120
above the IOU threshold, which is 0.5 over here.

122
00:08:03,390 --> 00:08:07,320
So as our IOU value is above the IOU threshold.

123
00:08:07,410 --> 00:08:09,750
So we can say that this is a true positive.

124
00:08:10,760 --> 00:08:17,000
Now you can see that here we have threshold 0.5 and I am getting an IOU value of 0.22.

125
00:08:17,030 --> 00:08:23,300
So now you can see that, uh, my IOU value is less than the IOU threshold which we have defined over

126
00:08:23,300 --> 00:08:23,660
here.

127
00:08:23,660 --> 00:08:25,400
So this is a false positive.

128
00:08:25,400 --> 00:08:29,150
And as I told you at the start, what is false positive.

129
00:08:29,180 --> 00:08:33,080
The model predicted a label, but it is not a part of the ground truth.

130
00:08:33,500 --> 00:08:37,640
So if I just show you like you can see that model predicted hot dog.

131
00:08:37,640 --> 00:08:40,040
But there is an ice cream in that image.

132
00:08:40,040 --> 00:08:43,580
So now you can see that the model makes a prediction over here.

133
00:08:43,910 --> 00:08:46,400
Like but this is not a part of the ground truth.

134
00:08:46,400 --> 00:08:49,850
And the IOU which we got is 0.22.

135
00:08:49,850 --> 00:08:53,240
And it is less than the IOU threshold, so this is a false positive.

136
00:08:54,620 --> 00:08:56,810
Now the next thing is false negative.

137
00:08:56,810 --> 00:09:02,390
So if I show you over here the model does not predict a label, but it is a part of the ground truth.

138
00:09:02,510 --> 00:09:04,070
Like you can see over here.

139
00:09:04,070 --> 00:09:05,870
I have a bird inside this image.

140
00:09:06,140 --> 00:09:09,260
Like I have the ground truth image over here.

141
00:09:09,410 --> 00:09:11,780
But my model is unable to predict anything.

142
00:09:11,780 --> 00:09:15,290
The model does not predict the label bird here.

143
00:09:15,290 --> 00:09:18,200
So in this case, the model predicted that there is a bird.

144
00:09:18,200 --> 00:09:21,770
But over here the model is unable to predict anything.

145
00:09:21,770 --> 00:09:23,720
So this is a false negative.

146
00:09:23,720 --> 00:09:26,930
But like you can see we have a bird inside this image.

147
00:09:26,930 --> 00:09:30,560
But my model is unable to predict that there is a bird.

148
00:09:30,560 --> 00:09:33,020
So this is false negative.

149
00:09:33,020 --> 00:09:39,140
And you can see that IOU is 0.00 which is less than the threshold which we have defined 0.5.
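
The threshold rule from these examples can be sketched as follows. This is only an illustration: the name `classify_detection` is hypothetical, and treating an IOU exactly equal to the threshold as a true positive is an assumption, since the video only covers values above and below 0.5.

```python
def classify_detection(iou_value, iou_threshold=0.5):
    """Label one predicted box by its IOU with the ground truth box."""
    # Assumption: an IOU equal to the threshold counts as a true positive.
    return "TP" if iou_value >= iou_threshold else "FP"


# The two IOU values used in the examples above.
for value in (0.96, 0.22):
    print(value, classify_detection(value))
```

A ground truth box that no prediction matches, like the bird with an IOU of 0.00, is counted as a false negative rather than being labelled by this function.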

150
00:09:40,580 --> 00:09:43,130
So next is precision and recall.

151
00:09:43,130 --> 00:09:45,290
We will discuss the precision and recall.

152
00:09:45,290 --> 00:09:52,520
So precision measures how well you can find true positives out of all positive predictions,

153
00:09:52,520 --> 00:09:53,030
okay.

154
00:09:53,390 --> 00:09:59,060
So precision is equal to true positives divided by true positives plus false positives.

155
00:09:59,060 --> 00:10:06,110
So precision is basically how well we can find true positives out of all positive predictions.

156
00:10:06,500 --> 00:10:13,010
So for example, if I set the threshold 0.5 and this is my ground truth, uh, bounding box and this

157
00:10:13,010 --> 00:10:17,630
is my predicted bounding box, and I am getting an IOU value of 0.7.

158
00:10:17,630 --> 00:10:21,050
So IOU value 0.7 is greater than the threshold.

159
00:10:21,050 --> 00:10:22,400
So this is a true positive.

160
00:10:22,880 --> 00:10:29,240
And here you can see that I am just getting an IOU value of 0.3 which is less than the threshold which

161
00:10:29,240 --> 00:10:30,890
I have defined 0.5.

162
00:10:30,890 --> 00:10:32,960
So this is the false positive.

163
00:10:34,880 --> 00:10:36,710
So next we have recall.

164
00:10:36,740 --> 00:10:42,290
So recall measures how well we can find true positives out of all ground truth.

165
00:10:42,470 --> 00:10:47,630
So recall is equal to true positives divided by true positives plus false negatives.

166
00:10:47,660 --> 00:10:52,520
So recall measures how well it can find true positives out of all ground truths.

167
00:10:52,550 --> 00:10:53,000
Okay.

168
00:10:53,000 --> 00:10:56,000
And the value of recall ranges from 0 to 1.
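
The two formulas can be written down as a short sketch. The function names and the example counts (5 true positives, 2 false positives, 7 false negatives) are hypothetical and chosen only to show the arithmetic.

```python
def precision(true_positives, false_positives):
    """TP / (TP + FP): share of predictions that are correct."""
    total_predictions = true_positives + false_positives
    return true_positives / total_predictions if total_predictions else 0.0


def recall(true_positives, false_negatives):
    """TP / (TP + FN): share of ground truth objects that were found."""
    total_ground_truths = true_positives + false_negatives
    return true_positives / total_ground_truths if total_ground_truths else 0.0


# Hypothetical counts for one class.
tp, fp, fn = 5, 2, 7
print(f"precision = {precision(tp, fp):.2f}")
print(f"recall    = {recall(tp, fn):.2f}")
```

Note that TP + FN is simply the number of ground truth annotations for the class, which is why recall is described as true positives out of all ground truths.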

169
00:10:56,480 --> 00:10:59,570
So next we will see how we can calculate average precision.

170
00:10:59,600 --> 00:11:03,890
So please remember average precision is not the average of precision.

171
00:11:04,370 --> 00:11:08,900
Average precision is basically the area under the precision recall curve.

172
00:11:09,440 --> 00:11:12,140
So here we will be calculating the average precision.

173
00:11:12,140 --> 00:11:15,950
Again, average precision is not the average of precision.

174
00:11:15,980 --> 00:11:16,730
Please remember.

175
00:11:18,460 --> 00:11:18,730
Suppose

176
00:11:18,730 --> 00:11:25,030
we are using the YOLOv8 model to do object detection on an image; like you can see,

177
00:11:25,030 --> 00:11:28,780
we are using an object detection model to do object detection on this image.

178
00:11:28,780 --> 00:11:32,170
You can see over here we already have the ground truth image.

179
00:11:32,170 --> 00:11:33,820
So this is the ground truth image.

180
00:11:33,820 --> 00:11:35,980
Like you can say this is our input image.

181
00:11:35,980 --> 00:11:40,360
We already have the ground truth image like we have the input image with annotations.

182
00:11:40,360 --> 00:11:45,190
Like you can see, this is my ground truth image with annotations.

183
00:11:45,190 --> 00:11:49,630
Or you can say this is my, uh, input image with annotations.

184
00:11:49,630 --> 00:11:52,990
Like you can see that we have annotated two persons over here.

185
00:11:52,990 --> 00:11:55,240
We have annotated all the dogs.

186
00:11:55,240 --> 00:11:58,210
We have annotated a teddy bear over here.

187
00:11:58,210 --> 00:12:03,700
So my ground truth image has two persons, this one and this one, and 12 dogs.

188
00:12:03,700 --> 00:12:05,980
So if you count all these, these are 12 dogs.

189
00:12:05,980 --> 00:12:08,710
And I have one teddy like you can see over here.

190
00:12:08,710 --> 00:12:10,510
And this is one truck.

191
00:12:11,860 --> 00:12:12,340
Okay.

192
00:12:12,340 --> 00:12:16,990
So now I will be using the YOLOv8 model to do object detection on this image.

193
00:12:16,990 --> 00:12:24,790
And from the YOLOv8 model I have these predictions: the YOLOv8 model predicted seven dogs, while

194
00:12:24,790 --> 00:12:26,980
I have 12 dogs in my ground truth image.

195
00:12:26,980 --> 00:12:32,920
But my object detection model predicted that there are seven dogs. Three teddies, while there is one teddy in

196
00:12:32,920 --> 00:12:36,460
my ground truth image, but my YOLOv8 model predicted that there are three.

197
00:12:36,460 --> 00:12:39,430
Like this is the teddy, which is a wrong prediction.

198
00:12:39,430 --> 00:12:41,410
This is the teddy, which is a wrong prediction.

199
00:12:41,410 --> 00:12:47,170
One person, although there are two persons, but it detected only one person. One sheep.

200
00:12:47,170 --> 00:12:48,010
There is no sheep.

201
00:12:48,010 --> 00:12:49,510
This is a wrong prediction.

202
00:12:49,510 --> 00:12:51,370
And one truck; there is one truck.

203
00:12:52,740 --> 00:12:55,260
So these are the predictions from the YOLO model.

204
00:12:55,260 --> 00:12:59,820
So like you can see that we have dog, truck,

205
00:13:00,660 --> 00:13:01,500
person, teddy.

206
00:13:01,500 --> 00:13:03,690
So we have multiple classes inside.

207
00:13:03,720 --> 00:13:05,820
Inside this image okay.

208
00:13:06,360 --> 00:13:06,780
Okay.

209
00:13:06,780 --> 00:13:10,740
So we will be calculating the average precision class-wise.

210
00:13:10,740 --> 00:13:13,980
First we'll calculate average precision for the dog class.

211
00:13:13,980 --> 00:13:16,680
Then we will calculate average precision for the Teddy class.

212
00:13:16,680 --> 00:13:19,320
Then we will calculate average precision for the person class.

213
00:13:19,320 --> 00:13:23,100
And then we will calculate the average precision for the truck class okay.

214
00:13:24,930 --> 00:13:26,370
Like you can see over here.

215
00:13:26,370 --> 00:13:29,730
I have just added all the predictions over here.

216
00:13:30,330 --> 00:13:33,330
All the detections, like you can see; this is the field.

217
00:13:33,330 --> 00:13:34,560
This is false positives.

218
00:13:34,560 --> 00:13:39,150
And here you can see we have true positive true positive false positive.

219
00:13:39,150 --> 00:13:43,170
This is a false positive prediction as well.

220
00:13:43,170 --> 00:13:45,000
There is not a dog over here.

221
00:13:45,330 --> 00:13:48,360
So like you can see we have the true positive true positive.

222
00:13:48,360 --> 00:13:52,650
And here we have the confidence score which is provided over here as well.

223
00:13:52,800 --> 00:13:58,470
So these are true positive true positive true positive false positive true positive true positive and

224
00:13:58,470 --> 00:13:59,160
false positive.

225
00:13:59,340 --> 00:14:02,850
So now I will show you how you can calculate precision and recall.

226
00:14:03,150 --> 00:14:05,790
There is a mistake, like you can see over here.

227
00:14:06,090 --> 00:14:07,980
Uh there we have it.

228
00:14:07,980 --> 00:14:08,940
We should have 12.

229
00:14:10,280 --> 00:14:10,820
Okay.

230
00:14:11,730 --> 00:14:14,430
So we will be dividing it one divided by 12.

231
00:14:14,460 --> 00:14:17,430
Okay, so I told you what is precision?

232
00:14:18,980 --> 00:14:19,880
Precision is

233
00:14:21,070 --> 00:14:25,300
how well you find true positives out of all positive predictions.

234
00:14:25,330 --> 00:14:25,840
Okay.

235
00:14:28,510 --> 00:14:30,400
Or if I show you over here.

236
00:14:30,400 --> 00:14:37,780
So I'm using the precision formula: true positives out of all positive

237
00:14:37,780 --> 00:14:38,920
predictions okay.

238
00:14:38,920 --> 00:14:44,530
And if I show you recall formula over here, recall measures how well you can find true positives out

239
00:14:44,530 --> 00:14:45,700
of all ground truth.

240
00:14:46,090 --> 00:14:49,720
So like you can see that I have 12 ground truth.

241
00:14:50,020 --> 00:14:51,790
Uh, annotations.

242
00:14:51,790 --> 00:14:55,420
I have 12 annotations for the dog class in the input image.

243
00:14:55,420 --> 00:14:57,550
Like you can see we have 12 over here.

244
00:14:58,390 --> 00:14:58,690
Okay.

245
00:14:58,690 --> 00:15:02,530
And recall is how well you can find true positive out of all ground truth okay.

246
00:15:03,100 --> 00:15:08,200
So here you can see this should not be 16; here we should have 12.

247
00:15:08,320 --> 00:15:09,820
Here we will also have 12.

248
00:15:09,850 --> 00:15:11,770
So here we will also have 12.

249
00:15:11,770 --> 00:15:20,830
So we will be dividing it by 12 because we have 12 annotations in the input image

250
00:15:20,830 --> 00:15:22,150
for the dog class.

251
00:15:22,150 --> 00:15:22,480
Okay.

252
00:15:22,480 --> 00:15:27,070
So you can see the annotations in the input image for the dog class.

253
00:15:28,660 --> 00:15:29,230
Okay.

254
00:15:29,230 --> 00:15:33,010
So now you can see that in the first prediction this is true positive.

255
00:15:33,010 --> 00:15:35,500
And we have the cumulative true positive one.

256
00:15:35,500 --> 00:15:37,810
And currently we have false positive as zero.

257
00:15:37,810 --> 00:15:41,950
So one divided by one plus zero which you can see our formula over here.

258
00:15:41,950 --> 00:15:45,250
And we have the total ground truths for the dog class as 12.

259
00:15:45,250 --> 00:15:49,480
So if we divide one by 12, this comes out to be 0.08.

260
00:15:49,480 --> 00:15:50,890
So you can check it as well.

261
00:15:50,890 --> 00:15:53,020
One divided by 12 is 0.08.

262
00:15:53,170 --> 00:15:55,360
So now next is 0.91.

263
00:15:55,360 --> 00:15:56,650
This is a false positive.

264
00:15:56,650 --> 00:15:58,210
And here you can see we have one.

265
00:15:58,210 --> 00:16:01,510
So now in the formula we get the answer 0.5.

266
00:16:01,510 --> 00:16:04,450
And if I divide one by

267
00:16:05,250 --> 00:16:09,720
12, it is 0.08, because in recall we take the cumulative true positives.

268
00:16:09,900 --> 00:16:12,270
Now here you can see we have true positive again.

269
00:16:12,270 --> 00:16:14,520
So the cumulative true positives come to two.

270
00:16:14,520 --> 00:16:16,770
So you can see over here.

271
00:16:18,020 --> 00:16:19,100
Two divided by five.

272
00:16:19,100 --> 00:16:21,800
It turns out to be 0.66.

273
00:16:25,060 --> 00:16:25,570
Apologies,

274
00:16:25,570 --> 00:16:26,950
there is a mistake here as well.

275
00:16:26,950 --> 00:16:28,300
So here we should have.

276
00:16:29,570 --> 00:16:31,280
One over here as well.

277
00:16:31,280 --> 00:16:34,400
So I just add this over here.

278
00:16:35,330 --> 00:16:35,630
Okay.

279
00:16:35,870 --> 00:16:36,860
So.

280
00:16:37,430 --> 00:16:41,990
So now you can see if you divide two divided by three it comes out to be 0.66.

281
00:16:41,990 --> 00:16:45,500
Because here we should have three, because two plus one.

282
00:16:45,740 --> 00:16:46,130
Okay.

283
00:16:46,130 --> 00:16:47,390
And here we have two.

284
00:16:47,390 --> 00:16:53,240
And if you divide two by twelve, you will get 0.16 in the output, okay.

285
00:16:54,950 --> 00:16:56,750
Then you can see we have false positive.

286
00:16:56,750 --> 00:16:59,930
And in a similar way you will make these calculations as well.

287
00:16:59,930 --> 00:17:05,480
And we will make this calculation for all the predictions that you get from the YOLOv8 model.
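
The running calculation walked through above can be sketched like this. It assumes the detections are already sorted by confidence score, highest first, and already labelled as true or false positives. The function name is hypothetical; the three flags (true positive, false positive, true positive) and the 12 ground truth dogs follow the first rows of the walkthrough.

```python
def precision_recall_points(is_true_positive, num_ground_truths):
    """Cumulative (precision, recall) after each detection.

    is_true_positive: list of booleans, one per detection, ordered by
    descending confidence (assumed to be sorted already).
    """
    points = []
    cumulative_tp = 0
    cumulative_fp = 0
    for flag in is_true_positive:
        if flag:
            cumulative_tp += 1
        else:
            cumulative_fp += 1
        precision = cumulative_tp / (cumulative_tp + cumulative_fp)
        recall = cumulative_tp / num_ground_truths
        points.append((precision, recall))
    return points


# First three dog detections from the walkthrough: TP, FP, TP, with 12 ground truths.
for precision, recall in precision_recall_points([True, False, True], 12):
    print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

With standard rounding the third row prints 0.67 and 0.17, where the hand calculation in the video truncates the same fractions (2/3 and 2/12) to 0.66 and 0.16.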

288
00:17:05,480 --> 00:17:10,340
And now we will plot the precision-recall graph, like you can see over here.

289
00:17:10,550 --> 00:17:17,030
So here we will be calculating average precision using the Pascal VOC 11-point interpolation

290
00:17:17,030 --> 00:17:17,510
method.

291
00:17:17,510 --> 00:17:23,000
So to calculate average precision I will be using Pascal VOC 11 point interpolation method.

292
00:17:23,240 --> 00:17:28,340
The Pascal VOC 11-point interpolation method was introduced in the 2007

293
00:17:28,340 --> 00:17:35,060
Pascal VOC challenge, where precision values are recorded over 11 equally spaced

294
00:17:36,300 --> 00:17:37,440
recall values.

295
00:17:37,440 --> 00:17:39,660
Average precision is defined as follows.

296
00:17:39,690 --> 00:17:40,440
Average precision.

297
00:17:40,470 --> 00:17:45,240
Average precision is one divided by 11 times the sum of the 11 interpolated precision values.

298
00:17:45,240 --> 00:17:47,520
So on the x axis you will have the

299
00:17:48,700 --> 00:17:51,910
recall, and on the y axis you will have the precision.

300
00:17:54,120 --> 00:17:57,510
So precision values are interpolated across 11 recall values.

301
00:17:57,510 --> 00:18:02,310
The interpolated precision is the maximum precision corresponding to any recall value greater than or equal to the

302
00:18:02,760 --> 00:18:03,780
current recall value.

303
00:18:03,780 --> 00:18:08,010
So you can see that from the values which I have got over here.

304
00:18:08,130 --> 00:18:08,700
Okay.

305
00:18:08,910 --> 00:18:13,440
So I have been using these values and I have created this precision recall graph.

306
00:18:13,890 --> 00:18:14,370
Okay.

307
00:18:14,460 --> 00:18:20,010
So basically I told you at the start average precision is the area under the precision recall curve

308
00:18:20,250 --> 00:18:20,490
okay.

309
00:18:20,730 --> 00:18:26,790
So like you can see, I have plotted the final interpolated graph and calculated average precision for

310
00:18:26,790 --> 00:18:27,690
the dog class.

311
00:18:28,080 --> 00:18:32,010
So you can see that I have created this precision recall curve over here.

312
00:18:32,010 --> 00:18:35,370
And as I told you here as well:

313
00:18:36,600 --> 00:18:41,220
The interpolated precision is the maximum precision corresponding to the recall value.

314
00:18:41,220 --> 00:18:44,880
So if I just see over here this is the maximum precision.

315
00:18:44,880 --> 00:18:47,760
Over here I can see this maximum precision value over here.

316
00:18:47,760 --> 00:18:48,720
So.

317
00:18:49,380 --> 00:18:54,420
The interpolated precision is the maximum precision corresponding to the recall value, so this is the

318
00:18:54,420 --> 00:18:55,410
maximum precision.

319
00:18:55,470 --> 00:18:59,250
And here you can see I've calculated the average precision for the dog class.

320
00:18:59,250 --> 00:19:05,580
So if you can see over here at 0.0 we have 1.0.

321
00:19:05,580 --> 00:19:15,930
So we have only one value for recall 0.0, which is 1.0, and for the four values 0.1, 0.2, 0.3, 0.4, I am at 0.71.

322
00:19:15,930 --> 00:19:18,180
So four multiplied by 0.71.

323
00:19:18,180 --> 00:19:26,940
And in the next six values like 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, I am at zero.

324
00:19:26,940 --> 00:19:28,470
So six multiplied by zero.

325
00:19:28,470 --> 00:19:33,720
And I am just getting average precision for the dog class as 34.9%.
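
Here is a minimal sketch of the 11-point interpolation described above, not the code behind the slides. The function name `eleven_point_ap` is hypothetical, and the order of true and false positives in `flags` is an assumption: the video does not give the full table, so the order was chosen only so that the curve has a similar shape (precision 1.0 at recall 0, about 0.71 up to recall 0.4, then 0).

```python
def eleven_point_ap(precisions, recalls):
    """Pascal VOC 11-point interpolated average precision."""
    total = 0.0
    for i in range(11):
        recall_level = i / 10  # 0.0, 0.1, ..., 1.0
        # Interpolated precision: best precision at any recall >= this level.
        candidates = [p for p, r in zip(precisions, recalls) if r >= recall_level]
        total += max(candidates) if candidates else 0.0
    return total / 11


# Hypothetical ordering of 7 dog detections (5 TP, 2 FP) against 12 ground truths.
flags = [True, False, False, True, True, True, True]
num_ground_truths = 12

precisions, recalls = [], []
cumulative_tp = 0
for rank, flag in enumerate(flags, start=1):
    cumulative_tp += flag  # True counts as 1, False as 0
    precisions.append(cumulative_tp / rank)
    recalls.append(cumulative_tp / num_ground_truths)

print(f"AP = {eleven_point_ap(precisions, recalls):.3f}")
```

This prints about 0.351 rather than 0.349, because the code keeps 5/7 unrounded while the hand calculation uses 0.71: (1.0 + 4 × 0.71 + 6 × 0) / 11 ≈ 0.349.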

326
00:19:35,290 --> 00:19:40,330
So in the same way, I will get average precision for the person class and just calculate this manually

327
00:19:40,330 --> 00:19:41,110
on paper.

328
00:19:41,110 --> 00:19:47,260
So with the same approach, I'm just calculating the precision and recall values, which I just followed

329
00:19:47,260 --> 00:19:48,130
over here.

330
00:19:48,670 --> 00:19:52,690
And I'm just drawing the precision-recall curve.

331
00:19:52,690 --> 00:19:56,800
And you can see over here I'm just getting a value of 0.545.

332
00:19:56,800 --> 00:20:00,910
And in the same way I'm also calculating the average precision for the truck class.

333
00:20:01,210 --> 00:20:04,450
And here I'm just getting the average precision value as one.

334
00:20:04,780 --> 00:20:06,280
I'm following the same approach

335
00:20:06,430 --> 00:20:07,360
which I just told you about.

336
00:20:07,360 --> 00:20:10,900
And here I'm just calculating the average precision of the sheep and teddy class.

337
00:20:11,290 --> 00:20:14,590
And the same approach which I showed you about.

338
00:20:15,800 --> 00:20:16,580
And here I am calculating

339
00:20:16,580 --> 00:20:18,140
the mean average precision.

340
00:20:18,230 --> 00:20:23,450
So mean average precision is the average of average precision over all detected classes.

341
00:20:23,450 --> 00:20:27,470
So you can see that, if I just show above over here.

342
00:20:28,320 --> 00:20:33,870
Okay, so my model detected dog teddy person sheep truck.

343
00:20:33,870 --> 00:20:37,500
So five different classes that my model detected.

344
00:20:37,890 --> 00:20:38,430
Okay.

345
00:20:38,880 --> 00:20:39,810
And.

346
00:20:41,660 --> 00:20:45,890
I just calculated the average precision for each of these classes as well.

347
00:20:45,890 --> 00:20:50,000
Like I showed you over here, I computed the average precision for each of these classes.

348
00:20:50,000 --> 00:20:52,070
So we have a total of five classes.

349
00:20:52,070 --> 00:20:56,150
And I just computed the average precision for each class, which I showed you above.

350
00:20:56,540 --> 00:20:56,840
Okay.

351
00:20:56,840 --> 00:21:01,400
And I'm getting a mean average precision score of 47.88%.
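The averaging step can be sketched as follows. This is only an illustration: the helper name mean_average_precision is made up, the person and truck values are the ones quoted in the tutorial, and the dog, sheep, and teddy values are placeholders, so the result is not the tutorial's 47.88%.

```python
def mean_average_precision(ap_per_class):
    """Mean of the per-class average precision values."""
    if not ap_per_class:
        raise ValueError("need at least one class")
    return sum(ap_per_class.values()) / len(ap_per_class)


ap_per_class = {
    "person": 0.545,  # value quoted in the tutorial
    "truck": 1.0,     # value quoted in the tutorial
    "dog": 0.30,      # placeholder, not the tutorial's number
    "sheep": 0.25,    # placeholder, not the tutorial's number
    "teddy": 0.40,    # placeholder, not the tutorial's number
}
map_score = mean_average_precision(ap_per_class)
print(f"mAP over {len(ap_per_class)} classes: {map_score:.4f}")
```

Every class carries the same weight in the mean, regardless of how many objects of that class appear in the data.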

352
00:21:01,400 --> 00:21:09,140
So here you can see that what I am using is the Pascal VOC 11-point interpolation

353
00:21:09,140 --> 00:21:09,560
method.

354
00:21:09,560 --> 00:21:16,340
So I have used the Pascal VOC 11-point interpolation method to calculate average precision, okay.

355
00:21:16,340 --> 00:21:20,660
So I calculated average precision using the Pascal VOC 11-point interpolation method.

356
00:21:23,060 --> 00:21:23,570
Okay.

357
00:21:23,570 --> 00:21:24,680
So.

358
00:21:26,170 --> 00:21:31,900
So here you can see that in the Pascal VOC 11-point interpolation method, average precision is calculated at an IOU

359
00:21:31,930 --> 00:21:33,580
threshold of 0.5.

360
00:21:33,580 --> 00:21:34,030
Okay.

361
00:21:34,030 --> 00:21:40,840
But currently, the 101-point interpolation method is used to calculate average precision.

362
00:21:40,900 --> 00:21:49,270
So currently, in object detection models like the YOLO series, including YOLOv7 and YOLOv6, the 101-point interpolation

363
00:21:49,270 --> 00:21:57,820
method is used to calculate average precision. MS COCO introduced the 101-point interpolation method to calculate

364
00:21:57,820 --> 00:21:59,830
average precision in 2014.

365
00:21:59,830 --> 00:22:04,690
It is a better approximation of the area under the precision-recall curve, okay.
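To see how sampling more recall levels changes the estimate, here is a rough sketch that applies the same interpolation rule at 11 and at 101 evenly spaced recall levels. The function name interpolated_ap and the sample curve are assumptions for illustration; this is not the pycocotools implementation.

```python
def interpolated_ap(recalls, precisions, num_points=101):
    """N-point interpolated AP over evenly spaced recall levels in [0, 1]."""
    total = 0.0
    for i in range(num_points):
        level = i / (num_points - 1)
        # Highest precision at any recall >= this level, else 0.
        candidates = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / num_points


# Hypothetical curve: TP, FP, TP against 4 ground-truth boxes.
recalls = [0.25, 0.25, 0.5]
precisions = [1.0, 0.5, 2 / 3]

ap_11 = interpolated_ap(recalls, precisions, num_points=11)
ap_101 = interpolated_ap(recalls, precisions, num_points=101)
print(f"11-point AP:  {ap_11:.4f}")
print(f"101-point AP: {ap_101:.4f}")
```

With only 11 levels, the step in the curve at recall 0.25 is seen only by the coarse samples at 0.2 and 0.3; the 101-level version samples the curve every 0.01, so it follows the area under the curve more closely.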

366
00:22:07,220 --> 00:22:09,800
In the Pascal VOC 11-point interpolation method,

367
00:22:09,800 --> 00:22:14,180
average precision is calculated at an IOU threshold of 0.5, like I showed you above.

368
00:22:14,180 --> 00:22:19,100
So we have calculated the average precision considering an IOU threshold of 0.5.

369
00:22:19,520 --> 00:22:23,210
But currently, the approach that is followed is the MS COCO

370
00:22:23,210 --> 00:22:26,660
101-point interpolation average precision.

371
00:22:26,660 --> 00:22:33,530
So in this case, the average precision, or you can say the mean average precision, is calculated

372
00:22:33,800 --> 00:22:40,280
considering a set of ten different IOU thresholds, which range from 0.5 to 0.95.

373
00:22:40,280 --> 00:22:47,270
So in the current MS COCO 101-point interpolation method, the mean average precision is calculated from

374
00:22:47,270 --> 00:22:54,200
IOU 0.5 to 0.95 with a step size of 0.05.

375
00:22:54,500 --> 00:23:00,050
So the COCO mean average precision is calculated for a set of ten different IOU thresholds

376
00:23:00,050 --> 00:23:01,250
and then averaged.

377
00:23:01,250 --> 00:23:07,790
The thresholds range from 0.5 to 0.95 in steps of 0.05.
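The ten-threshold scheme can be sketched like this. Everything here is illustrative: the iou helper, the two boxes, and the per-threshold AP values are made-up assumptions, not outputs of a real model or of the COCO tooling.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten IOU thresholds: 0.50, 0.55, ..., 0.95.
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# A hypothetical prediction that covers 78% of the ground-truth box.
ground_truth = (0, 0, 100, 100)
prediction = (0, 0, 100, 78)
overlap = iou(prediction, ground_truth)
# The same detection is a true positive at loose thresholds only.
matched_at = [t for t in thresholds if overlap >= t]
print(f"IOU {overlap:.2f} counts as a match at thresholds {matched_at}")

# Hypothetical AP values for one class, one per threshold.
ap_at_threshold = [0.70, 0.68, 0.65, 0.60, 0.55, 0.48, 0.40, 0.30, 0.18, 0.06]
ap_50_95 = sum(ap_at_threshold) / len(ap_at_threshold)
print(f"AP averaged over IOU 0.50:0.95 = {ap_50_95:.3f}")
```

Because a detection must overlap the ground truth more tightly to count at the higher thresholds, the averaged score rewards accurate box localization, which a single threshold of 0.5 does not.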

378
00:23:09,020 --> 00:23:13,970
So before MS COCO, the mean average precision was calculated at an IOU threshold of 0.5.

379
00:23:13,970 --> 00:23:20,750
But now the mean average precision is calculated at IOU thresholds from 0.5 to 0.95, with a step

380
00:23:20,750 --> 00:23:23,750
size or step frequency of 0.05.

381
00:23:23,840 --> 00:23:29,360
At present, the 101-point interpolation average precision introduced by MS COCO

382
00:23:29,390 --> 00:23:34,040
is accepted as the standard metric.

383
00:23:34,310 --> 00:23:38,300
So currently, the 101-point interpolation average precision introduced by MS COCO

384
00:23:38,330 --> 00:23:45,080
is accepted as the standard metric, but in this tutorial we used the Pascal VOC

385
00:23:45,080 --> 00:23:48,920
11-point interpolation method to calculate average precision.

386
00:23:48,920 --> 00:23:50,630
So that's all from this tutorial.

387
00:23:50,630 --> 00:23:51,710
Thank you for watching.