1
00:00:02,940 --> 00:00:10,860
In this video tutorial, I will be testing the YOLOv9 model performance on image, video, and on the live

2
00:00:10,860 --> 00:00:11,970
webcam feed.

3
00:00:11,970 --> 00:00:14,250
So let's get started with it.

4
00:00:14,700 --> 00:00:18,990
I have divided this complete tutorial into six different steps.

5
00:00:18,990 --> 00:00:22,530
In step number one, I will clone the YOLOv9 GitHub repo.

6
00:00:23,160 --> 00:00:28,260
In the step number two, I will create a virtual environment which is recommended if you don't want

7
00:00:28,260 --> 00:00:30,120
to disturb your Python packages.

8
00:00:30,600 --> 00:00:35,490
Then after creating a virtual environment in the step number three, I will install all the required

9
00:00:35,490 --> 00:00:41,970
packages or libraries that are required to do object detection on image, video, and on the live webcam

10
00:00:41,970 --> 00:00:42,480
feed.

11
00:00:42,870 --> 00:00:46,530
In the step number four, I will do object detection on image.

12
00:00:46,530 --> 00:00:52,350
I will be showing you how you can play with different hyperparameters and adjust them as

13
00:00:52,350 --> 00:00:53,850
per your requirements.

14
00:00:53,850 --> 00:00:57,480
Then in the step number five, we will be doing object detection on video.

15
00:00:57,480 --> 00:01:02,310
And in the last step I will show you how you can do object detection on the live webcam feed.

16
00:01:02,310 --> 00:01:04,500
So let's get started with it.

17
00:01:05,010 --> 00:01:10,650
So over here you can see that I have created a project by the name Testing YOLOv9 Model Performance.

18
00:01:10,650 --> 00:01:12,390
In my local app directory.

19
00:01:12,390 --> 00:01:17,100
You can create a project in any of your directories, and I am using PyCharm Community Edition.

20
00:01:17,100 --> 00:01:20,190
And I have Python 3.10 installed.

21
00:01:20,520 --> 00:01:21,030
Okay.

22
00:01:22,990 --> 00:01:23,170
Going to

23
00:01:23,170 --> 00:01:26,020
step number one, we need to clone the YOLOv9 GitHub repo.

24
00:01:26,020 --> 00:01:28,000
So I will just go over here.

25
00:01:30,030 --> 00:01:37,290
YOLOv9 GitHub, and I will just open the first link from here, and just go over here and just copy the

26
00:01:37,290 --> 00:01:40,470
URL to the clipboard, and then go back to PyCharm.

27
00:01:40,470 --> 00:01:48,810
And over here I will just write git clone and just paste the link, the URL which I

28
00:01:48,810 --> 00:01:49,320
copied.

29
00:01:49,560 --> 00:01:51,420
Just paste the link over here.

30
00:01:51,540 --> 00:01:54,600
So now I will clone the complete YOLOv9 GitHub repo.

31
00:01:59,050 --> 00:02:02,230
So now in the next step, I will just go to the cloned folder.

32
00:02:02,230 --> 00:02:04,450
So if I just, uh, refresh.

33
00:02:04,450 --> 00:02:04,690
Okay.

34
00:02:04,690 --> 00:02:06,010
No need, no need to refresh.

35
00:02:06,010 --> 00:02:08,080
We can find this over here.

36
00:02:08,080 --> 00:02:12,790
So we have the whole YOLOv9 repository cloned over here.

37
00:02:12,790 --> 00:02:15,850
And we have all the required files we can see over here.

38
00:02:16,120 --> 00:02:16,420
Okay.

39
00:02:16,420 --> 00:02:16,780
So.

40
00:02:17,370 --> 00:02:20,280
So in the next step I will just go to the cloned folder.

41
00:02:20,280 --> 00:02:23,550
So I will just write cd, change directory, yolov9.

42
00:02:23,550 --> 00:02:25,800
So now we are going to the cloned folder.

43
00:02:26,190 --> 00:02:28,200
So now we are here at this cloned folder.

44
00:02:28,200 --> 00:02:28,470
Now.

45
00:02:30,670 --> 00:02:33,370
So now we are done with step number one.

46
00:02:33,370 --> 00:02:36,310
Then in step number two, we will create a virtual environment.

47
00:02:36,790 --> 00:02:37,180
Okay.

48
00:02:37,180 --> 00:02:40,480
So to create a virtual environment I will write over here.

49
00:02:42,500 --> 00:02:45,140
python -m.

50
00:02:48,340 --> 00:02:53,290
Virtual environment, and I will be creating a virtual environment by the name YOLO v9 testing.

51
00:02:53,290 --> 00:02:55,750
So this will be the name of my virtual environment.

52
00:02:55,780 --> 00:02:56,980
Yolo v9 testing.

53
00:03:02,430 --> 00:03:04,110
This will take a few more seconds.

54
00:03:12,490 --> 00:03:12,790
It will.

55
00:03:12,790 --> 00:03:14,830
Let's wait for it to get completed.

56
00:03:14,830 --> 00:03:16,000
Okay, so it's done.

57
00:03:16,540 --> 00:03:17,770
So if I just.

58
00:03:18,880 --> 00:03:19,540
Your ballot.

59
00:03:19,540 --> 00:03:22,150
Here we have this YOLOv9.

60
00:03:24,040 --> 00:03:25,630
Let's move forward here.

61
00:03:27,820 --> 00:03:28,210
In

62
00:03:28,210 --> 00:03:30,730
the next step, we will go to this Scripts directory.

63
00:03:30,730 --> 00:03:31,960
So I will just write.

64
00:03:34,450 --> 00:03:37,000
First we need to go to the YOLO v9 testing.

65
00:03:37,000 --> 00:03:40,330
So I will just cd into the YOLOv9 testing folder.

66
00:03:41,360 --> 00:03:41,750
Okay.

67
00:03:41,750 --> 00:03:44,300
And then we need to go to the scripts directory.

68
00:03:44,300 --> 00:03:47,480
So I will just write CD scripts.

69
00:03:50,040 --> 00:03:51,990
And I will just activate this.

70
00:03:54,080 --> 00:03:56,150
Okay, so it's now activated.

71
00:03:56,900 --> 00:03:59,000
So now I will just go and.

72
00:04:03,390 --> 00:04:04,290
But that's done.

73
00:04:04,290 --> 00:04:09,390
So now in the next step, what we'll do is we will install all the required packages that we

74
00:04:09,390 --> 00:04:12,120
have in the requirements.txt file.

75
00:04:12,120 --> 00:04:18,600
So if I open the requirements.txt file, we can see all the required packages that are required to do

76
00:04:18,600 --> 00:04:22,950
object detection on image, video, and on the live webcam feed using YOLOv9

77
00:04:22,950 --> 00:04:24,930
are listed over here.

78
00:04:25,140 --> 00:04:32,880
To install all these required packages we just need to write pip install -r requirements.txt

79
00:04:32,880 --> 00:04:34,530
and let's enter.

80
00:04:48,050 --> 00:04:51,290
So this package installation will take time.

81
00:04:51,290 --> 00:04:55,430
So you can see over here I have written -r.

82
00:04:55,430 --> 00:04:58,460
So this is basically a hyphen r.

83
00:04:58,460 --> 00:04:59,900
You can say that hyphen r.

84
00:04:59,900 --> 00:05:07,760
So -r stands for requirement: pip will install each library that is listed in this

85
00:05:07,820 --> 00:05:09,680
requirements.txt file.

86
00:05:10,190 --> 00:05:13,400
So this package installation will take some time.

87
00:05:13,400 --> 00:05:15,230
So let's wait for it to get completed.

88
00:05:19,670 --> 00:05:23,090
Now you can see that all the required packages have been installed.

89
00:05:23,090 --> 00:05:25,850
So now we will do object detection on image.

90
00:05:26,090 --> 00:05:29,540
So I will just go to the GitHub repo over here.

91
00:05:30,370 --> 00:05:32,770
So if I just go down over here.

92
00:05:32,770 --> 00:05:35,260
So now we will be using the YOLOv9 models.

93
00:05:35,260 --> 00:05:41,890
And to do object detection on image using the YOLOv9 model we will be using the detect.py script

94
00:05:41,890 --> 00:05:42,340
file.

95
00:05:42,430 --> 00:05:46,510
And then we will use this command okay.

96
00:05:46,930 --> 00:05:49,630
So let's move forward here.

97
00:05:49,630 --> 00:05:51,370
So detect.py.

98
00:05:51,370 --> 00:05:53,860
So let's see how we can do object detection on image.

99
00:05:53,860 --> 00:05:57,670
So I will be using python detect.py.

100
00:05:58,090 --> 00:06:02,230
And I have just created a folder over here by the name resources.

101
00:06:02,290 --> 00:06:05,800
So here I've added three different images okay.

102
00:06:05,800 --> 00:06:08,110
So now we will be doing object detection on these images.

103
00:06:08,110 --> 00:06:13,630
And I have also added two videos over here as well okay so that's fine.

104
00:06:13,900 --> 00:06:16,690
So then I will just define

105
00:06:17,710 --> 00:06:19,840
the source over here.

106
00:06:22,120 --> 00:06:25,270
We'll go to the resources folder.

107
00:06:29,140 --> 00:06:33,700
And then I will be doing object detection on image one dot jpg first.

108
00:06:34,390 --> 00:06:37,480
Okay, so I just need to pass the file name over here.

109
00:06:37,480 --> 00:06:38,440
The spelling of

110
00:06:39,510 --> 00:06:40,920
resources is

111
00:06:40,920 --> 00:06:41,280
correct.

112
00:06:42,090 --> 00:06:45,210
So let me add weights over here as well.

113
00:06:45,210 --> 00:06:47,730
So if you just go over here.

114
00:06:48,210 --> 00:06:53,340
So if you just go to releases over here.

115
00:06:53,730 --> 00:06:57,240
So now you can see that YOLOv9 comes with four different models.

116
00:06:57,240 --> 00:07:03,540
But currently we have weights available for two models, which are YOLOv9 compact and YOLOv9

117
00:07:03,540 --> 00:07:04,140
extended.

118
00:07:04,290 --> 00:07:07,950
So I will be downloading the YOLO v9 compact model weights.

119
00:07:07,950 --> 00:07:11,880
And we will be doing testing using the YOLO v9 compact model weights.

120
00:07:11,910 --> 00:07:15,120
So now you can see that YOLOv9 comes with different models.

121
00:07:15,120 --> 00:07:20,310
But currently the weights of the YOLOv9 compact model and the YOLOv9 extended model are available.

122
00:07:20,310 --> 00:07:23,040
And we will be using the YOLO v9 compact model weights.

123
00:07:23,040 --> 00:07:29,550
Although the YOLOv9 extended model gives better accuracy than the YOLOv9 compact model,

124
00:07:29,550 --> 00:07:33,480
like, the YOLOv9 extended model outperforms

125
00:07:33,480 --> 00:07:36,900
all the other YOLOv9 models, like you can see over here.

126
00:07:39,210 --> 00:07:46,020
Well, now you can see the file is downloaded and I will just go over here, go under downloads.

127
00:07:46,650 --> 00:07:49,380
Copy this from here and go into my

128
00:07:50,680 --> 00:07:55,300
folder over here, and I'll just paste it here.

129
00:07:59,710 --> 00:08:03,040
And so there we have that model.

130
00:08:03,340 --> 00:08:05,110
And that's all you need to do.

131
00:08:06,820 --> 00:08:07,240
Okay.

132
00:08:07,240 --> 00:08:09,310
So let's go backward here.

133
00:08:09,550 --> 00:08:14,170
And if you just see over here we have the model weights available over here.

134
00:08:14,530 --> 00:08:17,860
And if I just go below down over here.

135
00:08:25,450 --> 00:08:26,110
Okay.

136
00:08:26,830 --> 00:08:29,260
So I will just be passing the model weights.

137
00:08:29,260 --> 00:08:32,830
So for the model weights I will be writing weights.

138
00:08:34,780 --> 00:08:38,260
Then we will just go to the weights directory.

139
00:08:38,260 --> 00:08:41,770
And over here we have the YOLOv9 weights.

140
00:08:42,600 --> 00:08:43,200
Bad.

141
00:08:43,200 --> 00:08:47,670
Seen it all and let's see how it goes.

142
00:08:49,230 --> 00:08:51,090
So there is some mistake over here.

143
00:08:56,540 --> 00:08:59,180
Here is our error, and let's see how it works now.

144
00:09:14,780 --> 00:09:16,760
So this will take a few more seconds.

145
00:09:18,020 --> 00:09:23,810
I don't have a GPU available, so I'm testing it on my CPU.

146
00:09:23,810 --> 00:09:24,920
Okay now.

147
00:09:25,070 --> 00:09:25,580
So here.

148
00:09:29,420 --> 00:09:32,510
Hopefully it will work fine, but let's see.

149
00:09:45,040 --> 00:09:45,430
Okay.

150
00:09:45,430 --> 00:09:50,020
So now we have the output in the runs directory, for image one.

151
00:09:50,020 --> 00:09:54,520
So this is our output image which you can see over here.

152
00:09:55,090 --> 00:09:55,570
Okay.

153
00:09:56,140 --> 00:10:04,150
So now you can see over here we are able to detect the person and the dog, okay.

154
00:10:04,150 --> 00:10:10,600
And here you can see that the car is quite blurred, but our model is also able to detect

155
00:10:10,600 --> 00:10:11,890
that car as well.

156
00:10:12,250 --> 00:10:14,500
But, uh, this is a wrong detection.

157
00:10:14,500 --> 00:10:16,210
This is actually a backpack.

158
00:10:17,110 --> 00:10:20,830
Okay, so apart from that one wrong detection, the rest look fine.

159
00:10:21,430 --> 00:10:22,000
Okay.

160
00:10:22,000 --> 00:10:30,580
So, for example, if, uh, you got a case where you have very small objects, okay.

161
00:10:30,790 --> 00:10:36,520
and you have to decrease the confidence value. For example, currently my confidence threshold

162
00:10:36,520 --> 00:10:38,770
is being set as 0.25.

163
00:10:39,310 --> 00:10:43,120
Like you can see over here, the default confidence value is 0.25.

164
00:10:43,600 --> 00:10:45,940
So what does this confidence value mean?

165
00:10:46,030 --> 00:10:46,870
So.

166
00:10:47,510 --> 00:10:54,830
After doing object detection using YOLOv9, we will only be drawing the bounding boxes around

167
00:10:54,830 --> 00:10:55,220
those,

168
00:10:55,220 --> 00:11:01,880
around those objects that have a confidence value above the confidence threshold, that is, above 25%.

169
00:11:02,030 --> 00:11:02,360
Okay.

170
00:11:02,360 --> 00:11:03,710
So this is the confidence threshold.

171
00:11:03,710 --> 00:11:11,240
So for all the detected objects that have a confidence below 25%, we will not be drawing a bounding

172
00:11:11,240 --> 00:11:12,140
box around them.

173
00:11:12,530 --> 00:11:12,860
Okay.

174
00:11:12,860 --> 00:11:18,530
So now you can see here the confidence value is 0.90, 0.78.

175
00:11:18,530 --> 00:11:20,120
Like something here as well.

176
00:11:20,120 --> 00:11:22,130
Like it's not visible over here.

177
00:11:22,130 --> 00:11:26,720
But we can see currently the backpack is 0.48, and

178
00:11:28,050 --> 00:11:33,960
All the objects that have bounding boxes drawn around them have a confidence value above 25%.

179
00:11:33,960 --> 00:11:34,380
Okay.

180
00:11:34,740 --> 00:11:39,390
So like we are only drawing bounding boxes around those objects which have a confidence value above

181
00:11:39,390 --> 00:11:40,410
25%.

182
00:11:40,410 --> 00:11:45,780
So what does this confidence value mean? It tells us how confident the model is.

183
00:11:45,930 --> 00:11:51,690
For example, the confidence value over here is 0.90 for the dog.

184
00:11:51,690 --> 00:11:55,800
So this means that the model is 90% confident that this is a dog.

185
00:11:56,310 --> 00:11:57,030
Here is person.

186
00:11:57,030 --> 00:11:59,610
For the person, the confidence value is 0.78.

187
00:11:59,730 --> 00:12:04,560
This means the model is 78% confident that this is a person.

188
00:12:04,560 --> 00:12:04,980
Okay.

189
00:12:05,400 --> 00:12:08,910
And this is what we mean by confidence value.

190
00:12:08,910 --> 00:12:12,330
So and here I have defined the confidence threshold as 0.25.

191
00:12:12,330 --> 00:12:13,350
So this means.

192
00:12:14,000 --> 00:12:19,940
All the objects that have a confidence value, all the detected objects that have a confidence value

193
00:12:19,940 --> 00:12:23,720
above 25%, we will be drawing bounding boxes around them.

194
00:12:23,720 --> 00:12:29,180
Like you can see, we have drawn bounding boxes for a dog, for a person, for a car, and that's it.
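The confidence threshold described above can be sketched in a few lines of Python. This is only an illustration, not the code inside detect.py: the Detection class and the filter_by_confidence function are hypothetical names, the 0.90, 0.78 and 0.48 scores are the ones read out in the video, and the 0.07 handbag score is an invented value used to show what a lower threshold lets through.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical container for one detected object."""
    label: str
    confidence: float


def filter_by_confidence(detections, conf_thres=0.25):
    """Keep only the detections whose confidence is above conf_thres."""
    return [d for d in detections if d.confidence > conf_thres]


detections = [
    Detection("dog", 0.90),
    Detection("person", 0.78),
    Detection("backpack", 0.48),
    Detection("handbag", 0.07),  # invented low score, for illustration only
]

# With the 0.25 threshold used in the video, the low-scoring handbag is dropped.
kept_default = [d.label for d in filter_by_confidence(detections)]

# Lowering the threshold to 0.05 lets the handbag through as well.
kept_low = [d.label for d in filter_by_confidence(detections, conf_thres=0.05)]

print(kept_default)
print(kept_low)
```

Lowering the threshold never removes a box, it can only add weaker ones, which is why extra and duplicate boxes start to appear at very small values.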

195
00:12:29,510 --> 00:12:35,300
Okay, so there might be some cases where you have to decrease this confidence value to do object

196
00:12:35,300 --> 00:12:35,900
detection.

197
00:12:35,900 --> 00:12:36,380
Okay.

198
00:12:36,740 --> 00:12:37,790
For example.

199
00:12:41,560 --> 00:12:50,200
If I decrease the confidence value to 0.10, let's see what results I get from here.

200
00:12:50,200 --> 00:12:54,190
So by default the confidence is 0.25.

201
00:12:54,850 --> 00:12:58,000
So I just need to write conf-thres.

202
00:12:59,320 --> 00:13:02,950
So now I have decreased the confidence threshold to 0.10.

203
00:13:02,950 --> 00:13:10,360
So what I mean over here is that all the objects that my YOLOv9 model detects,

204
00:13:10,360 --> 00:13:16,720
all the detected objects which have a confidence value above 10%, will have bounding boxes drawn around

205
00:13:16,720 --> 00:13:16,990
them.

206
00:13:16,990 --> 00:13:17,470
Okay.

207
00:13:23,090 --> 00:13:25,790
But now you can see, I don't see any

208
00:13:27,320 --> 00:13:28,460
backpack over here.

209
00:13:30,080 --> 00:13:33,230
Let's decrease the confidence value to 0.05.

210
00:13:33,230 --> 00:13:43,070
Like all the detected objects that have a confidence value above 5% will have bounding boxes drawn around

211
00:13:43,070 --> 00:13:43,310
them.

212
00:14:00,460 --> 00:14:03,400
So now you can see over here okay.

213
00:14:03,400 --> 00:14:06,520
So now you can see we have detected a handbag over here as well.

214
00:14:06,940 --> 00:14:07,300
Okay.

215
00:14:07,300 --> 00:14:12,910
So let's decrease the confidence value to 0.01, that is 1%.

216
00:14:12,910 --> 00:14:18,340
So this means that after doing object detection using YOLOv9, all the detected objects that have a

217
00:14:18,340 --> 00:14:22,990
confidence value above 1% will have bounding boxes drawn around them.

218
00:14:31,800 --> 00:14:33,990
So if I just, like, refresh over here.

219
00:14:35,680 --> 00:14:39,760
So now you can see that this is what I want to show you over here.

220
00:14:40,000 --> 00:14:45,100
So if you just see over here, let me open this image from here as well.

221
00:14:59,140 --> 00:15:01,000
So this is the output image which we have got.

222
00:15:01,000 --> 00:15:01,150
Now.

223
00:15:01,150 --> 00:15:04,810
You can see over here, uh this is the backpack okay.

224
00:15:04,810 --> 00:15:06,640
So this is a correct detection.

225
00:15:06,640 --> 00:15:11,650
This is the person over here like you can see over here.

226
00:15:11,650 --> 00:15:15,160
We have detected person two times okay.

227
00:15:15,550 --> 00:15:20,860
So now you can see there are two persons, like it has detected the person two times.

228
00:15:20,950 --> 00:15:23,470
Although there is only one person okay.

229
00:15:23,470 --> 00:15:28,870
Like you can see over here there are two persons, like it has detected the person two times, but there is only

230
00:15:28,870 --> 00:15:29,830
one person.

231
00:15:29,980 --> 00:15:31,870
Okay so.

232
00:15:31,870 --> 00:15:38,230
Plus, you can see over here, we have detected two backpacks, like you can see one over

233
00:15:38,230 --> 00:15:39,850
here, one over here.

234
00:15:40,270 --> 00:15:40,600
Okay.

235
00:15:40,600 --> 00:15:44,980
So now you can see it has detected three backpacks, although there is only one backpack.

236
00:15:44,980 --> 00:15:48,010
So like you can see that we have, uh.

237
00:15:49,310 --> 00:15:54,170
multiple bounding boxes around a single object, like there is one person, but you can see we have two

238
00:15:54,170 --> 00:15:54,950
bounding boxes.

239
00:15:54,950 --> 00:15:59,990
There is one backpack, but you can see we have three bounding boxes around the backpacks.

240
00:15:59,990 --> 00:16:00,410
Okay.

241
00:16:04,020 --> 00:16:04,590
So.

242
00:16:04,590 --> 00:16:10,140
So to remove these overlapping bounding boxes, like you can see that for the person we have two overlapping

243
00:16:10,140 --> 00:16:10,800
bounding boxes.

244
00:16:10,800 --> 00:16:14,400
For backpack we have three overlapping bounding boxes now.

245
00:16:14,400 --> 00:16:19,410
So to remove these overlapping bounding boxes we use a technique called non-max suppression.

246
00:16:19,410 --> 00:16:25,050
So non-max suppression is a technique that is used to remove, uh, overlapping bounding boxes that

247
00:16:25,050 --> 00:16:27,780
may arise from the object detection algorithm.

248
00:16:28,710 --> 00:16:29,220
Okay.

249
00:16:31,050 --> 00:16:33,390
So what is the main idea of non-max suppression?

250
00:16:33,390 --> 00:16:39,570
So the main idea of non-max suppression is to retain only the bounding boxes that have a higher confidence

251
00:16:39,570 --> 00:16:40,860
score than the others.

252
00:16:41,190 --> 00:16:41,580
Okay.

253
00:16:41,580 --> 00:16:47,460
So what will happen is that we will be using non-max suppression to remove these multiple bounding

254
00:16:47,460 --> 00:16:50,550
boxes that have been drawn around a single object. For example,

255
00:16:50,670 --> 00:16:53,190
We have only one person, but we have two bounding boxes.

256
00:16:53,280 --> 00:16:55,890
Like the model has detected two persons. Plus,

257
00:16:55,890 --> 00:16:59,250
You can see we have only one backpack, and the model has detected three backpacks.

258
00:16:59,250 --> 00:16:59,670
Okay.

259
00:16:59,790 --> 00:17:04,140
So now you can see we have multiple bounding boxes for one single object.

260
00:17:04,140 --> 00:17:10,050
So to remove these overlapping, redundant, or extra bounding boxes, we will be using

261
00:17:10,050 --> 00:17:13,620
a technique called non-max suppression, which will retain only one box per object.

262
00:17:13,620 --> 00:17:18,420
So out of these multiple bounding boxes for a single object like the person, where we have detected

263
00:17:18,420 --> 00:17:23,550
two persons, we will retain only one person, and the box that is retained is the one that has the

264
00:17:23,550 --> 00:17:25,440
highest confidence value.

265
00:17:25,440 --> 00:17:25,860
Okay.

266
00:17:26,100 --> 00:17:28,290
So what will I be using for this?

267
00:17:28,290 --> 00:17:31,110
I will be using a technique called non-max suppression.

268
00:17:31,110 --> 00:17:35,880
And here you can see we have an IOU threshold for non-max suppression.

269
00:17:35,880 --> 00:17:39,570
So this IOU threshold is what non-maximum suppression uses.

270
00:17:43,150 --> 00:17:48,640
The value of this IOU threshold for non-maximum suppression varies from 0 to 1.

271
00:17:48,700 --> 00:17:51,070
So this value will be between 0 to 1.

272
00:17:51,650 --> 00:17:55,220
So if we set a higher threshold value, like currently it's in the middle,

273
00:17:55,220 --> 00:18:00,530
but if we set a higher threshold value like 0.8 or 0.9, what will happen?

274
00:18:00,560 --> 00:18:02,510
This will result in a.

275
00:18:04,610 --> 00:18:09,320
So what will happen is that these overlapping bounding boxes will not be removed.

276
00:18:09,320 --> 00:18:14,600
If we set the IOU threshold for non-max suppression to a high value, since it varies from 0 to 1,

277
00:18:14,600 --> 00:18:17,420
and I set the value to 0.8 or 0.9,

278
00:18:17,420 --> 00:18:23,090
then these overlapping bounding boxes will not be removed, or very few will be removed.

279
00:18:23,090 --> 00:18:29,510
And all the overlapping bounding boxes or the maximum number of overlapping bounding boxes will be retained.

280
00:18:32,970 --> 00:18:33,300
Okay.

281
00:18:33,300 --> 00:18:38,790
And if I just set the IOU threshold value low, like 0.1 or 0.2,

282
00:18:38,790 --> 00:18:43,320
this will result in all the overlapping bounding boxes being removed.

283
00:18:43,320 --> 00:18:46,950
And this will result in fewer but more accurate detections.

284
00:18:46,950 --> 00:18:51,600
So if I set the IOU threshold for non-max suppression to a very low value,

285
00:18:51,600 --> 00:18:55,710
this will result in the overlapping bounding boxes being removed.

286
00:18:55,710 --> 00:18:59,310
And this will result in fewer but more accurate detections.
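The effect of the IOU threshold can be sketched with a small, self-contained version of non-max suppression in plain Python. This is a simplified illustration and not the implementation used by the YOLOv9 repository: the iou and nms function names are hypothetical, boxes are assumed to be (x1, y1, x2, y2) tuples, all boxes are treated as one class, the example coordinates and scores are invented, and the 0.45 default is an assumption.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thres=0.45):
    """Return indices of the boxes to keep, highest score first.

    A box is dropped when its IOU with an already kept box exceeds iou_thres.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


# Invented example: two heavily overlapping "person" boxes and one far-away box.
boxes = [(100, 100, 200, 300), (110, 105, 205, 310), (400, 400, 500, 500)]
scores = [0.78, 0.40, 0.90]

low = nms(boxes, scores, iou_thres=0.1)   # duplicate person box is removed
high = nms(boxes, scores, iou_thres=0.9)  # duplicate person box survives
print(low, high)
```

The two person boxes overlap with an IOU of about 0.80, so a threshold of 0.1 removes the weaker one while a threshold of 0.9 keeps both. One caveat worth keeping in mind: a very low threshold can also remove the box of a genuinely different object that happens to stand close to another one, so fewer boxes does not always mean better results.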

287
00:19:00,800 --> 00:19:02,510
So let's test this out.

288
00:19:02,510 --> 00:19:07,040
So if I just set the IOU threshold over here.

289
00:19:12,540 --> 00:19:13,740
0.1.

290
00:19:13,800 --> 00:19:13,980
Okay.

291
00:19:13,980 --> 00:19:17,610
So currently you can see that we have only one person, but two persons are detected.

292
00:19:17,610 --> 00:19:20,280
We have only one backpack but three backpacks are detected.

293
00:19:20,640 --> 00:19:24,990
So let's see if I just set this threshold value okay.

294
00:19:24,990 --> 00:19:28,470
And I will set the IOU threshold.

295
00:19:29,850 --> 00:19:32,670
So let's see, if I just set a small value of the IOU threshold,

296
00:19:32,670 --> 00:19:34,440
will it affect my result?

297
00:19:34,440 --> 00:19:37,620
Will it remove the overlapping bounding boxes that are being drawn?

298
00:19:47,600 --> 00:19:49,700
So now you can see that we have only one person.

299
00:19:49,700 --> 00:19:51,680
Now we have one backpack.

300
00:19:51,680 --> 00:19:52,340
And okay.

301
00:19:52,340 --> 00:19:56,420
So now you can see that here we have only okay okay.

302
00:19:56,420 --> 00:20:00,710
So now you can see that in the previous case we had two persons and three backpacks.

303
00:20:00,710 --> 00:20:03,620
But now you can see we have only one person and one backpack.

304
00:20:03,620 --> 00:20:07,610
This means all the overlapping bounding boxes have been removed.

305
00:20:07,610 --> 00:20:09,410
And if I just show you over here.

306
00:20:12,090 --> 00:20:16,080
So now, if I just show you this output over here as well.

307
00:20:21,660 --> 00:20:24,450
Well, now you can see over here like.

308
00:20:25,290 --> 00:20:27,420
The is.

309
00:20:30,670 --> 00:20:35,050
So now you can see we have only one bounding box around the person like you can see over here.

310
00:20:35,050 --> 00:20:36,730
We have only one handbag.

311
00:20:36,730 --> 00:20:37,090
Okay.

312
00:20:37,090 --> 00:20:39,310
Over here we have only one dog.

313
00:20:39,310 --> 00:20:42,130
Like all the overlapping bounding boxes have been removed.

314
00:20:42,130 --> 00:20:42,670
Okay.

315
00:20:42,730 --> 00:20:45,820
So now you can see that my output looks very fine.

316
00:20:45,820 --> 00:20:48,370
Like I have not changed the confidence threshold.

317
00:20:48,730 --> 00:20:52,210
I only adjusted the IOU threshold for non-max suppression.

318
00:20:52,210 --> 00:20:55,060
And this has removed all the overlapping bounding boxes.

319
00:20:55,390 --> 00:20:59,350
Like you can see that all the overlapping bounding boxes have been removed.

320
00:20:59,350 --> 00:20:59,770
Okay.

321
00:20:59,800 --> 00:21:07,750
So like you can see, I have set a very small value for the IOU threshold for non-max suppression, which you

322
00:21:07,750 --> 00:21:09,190
can see over here.

323
00:21:11,330 --> 00:21:16,820
Okay, I've just set the value 0.1 and this resulted in all the overlapping bounding boxes being removed.

324
00:21:16,820 --> 00:21:22,100
And if I just set the IOU threshold for non-max suppression to a high value like 0.9,

325
00:21:22,100 --> 00:21:25,340
the overlapping bounding boxes will be retained.

326
00:21:25,340 --> 00:21:29,930
The maximum number of overlapping bounding boxes will be retained, okay.

327
00:21:34,770 --> 00:21:37,860
So let me show you if I set the threshold value high.

328
00:21:49,100 --> 00:21:49,400
And.

329
00:21:54,090 --> 00:21:55,890
Like if I just show you over here.

330
00:22:02,030 --> 00:22:05,330
So now you can see that there are so many overlapping bounding boxes.

331
00:22:05,330 --> 00:22:08,600
Like for the dog there are two, and for the person there are multiple as well.

332
00:22:08,600 --> 00:22:11,990
Multiple handbags and backpacks are being detected.

333
00:22:12,020 --> 00:22:14,750
Okay, so there are so many overlapping bounding boxes.

334
00:22:14,750 --> 00:22:20,900
So with the help of this IOU threshold value for non-max suppression,

335
00:22:20,930 --> 00:22:25,100
you can vary your results.

336
00:22:25,100 --> 00:22:25,400
You can.

337
00:22:25,400 --> 00:22:26,990
This will affect your results very much.

338
00:22:26,990 --> 00:22:31,670
Like you can see now, if I set a high value of the IOU threshold for non-max suppression,

339
00:22:31,670 --> 00:22:36,230
two persons, six backpacks, and two handbags are being detected.

340
00:22:36,230 --> 00:22:36,620
Okay.

341
00:22:37,010 --> 00:22:41,900
So in the same way, you can do object detection on your other images as well.

342
00:22:42,290 --> 00:22:44,690
So you can try this out over here.

343
00:22:47,950 --> 00:22:54,250
So this is another concept which I wanted to explain, about non-max suppression and the IOU threshold.

344
00:22:54,340 --> 00:22:57,220
So you can try it out with your own images as well.

345
00:22:57,250 --> 00:23:01,540
So now let's do object detection on the second image over here as well.

346
00:23:03,530 --> 00:23:06,890
So I'm just using the default confidence threshold and IOU threshold now.

347
00:23:07,460 --> 00:23:08,330
Nothing different.

348
00:23:18,200 --> 00:23:20,090
Okay, so we have the results now.

349
00:23:20,090 --> 00:23:20,780
Over here.

350
00:23:22,230 --> 00:23:24,450
So now you can see that we have detected person.

351
00:23:24,450 --> 00:23:25,590
Person.

352
00:23:25,770 --> 00:23:26,160
Okay.

353
00:23:26,160 --> 00:23:27,960
We have detected these two person as well.

354
00:23:27,960 --> 00:23:30,390
And the sports ball is detected as well.

355
00:23:30,390 --> 00:23:33,180
So the results look quite good over here as well.

356
00:23:33,180 --> 00:23:36,060
Like you can see the person over here and over here is quite blurred.

357
00:23:36,060 --> 00:23:38,490
But our model is able to detect it as well.

358
00:23:38,490 --> 00:23:40,860
In this way you can test on other images as well.

359
00:23:40,860 --> 00:23:44,670
So now I will be testing on a video over here as well.

360
00:23:44,670 --> 00:23:45,420
So.

361
00:23:53,140 --> 00:23:55,690
I will just set video one dot mp4.

362
00:23:56,590 --> 00:23:57,010
Okay.

363
00:23:57,010 --> 00:23:58,180
And here.

364
00:23:58,180 --> 00:24:01,510
So now if I want to show the output.

365
00:24:02,500 --> 00:24:03,040
Bye.

366
00:24:03,580 --> 00:24:05,860
So I will just write view-img.

367
00:24:05,860 --> 00:24:09,580
So this will be showing me the live results over here as well.

368
00:24:29,180 --> 00:24:32,060
So now you can see over here this is a wrong detection.

369
00:24:32,060 --> 00:24:36,050
So now you can see that we are able to detect the cars over here as well.

370
00:24:36,050 --> 00:24:37,220
So I'm using CPU.

371
00:24:37,220 --> 00:24:38,690
So the detection is very slow.

372
00:24:38,690 --> 00:24:41,630
Like you can see that uh the inference is very slow.

373
00:24:41,630 --> 00:24:45,380
But you can see that we are able to detect a car over here as well.

374
00:24:45,740 --> 00:24:47,750
So the results look like.

375
00:24:47,750 --> 00:24:47,990
Good.

376
00:24:47,990 --> 00:24:49,070
So in this way.

377
00:24:49,550 --> 00:24:51,800
So this works.

378
00:24:54,730 --> 00:24:58,300
In the same way you can test on other videos as well.

379
00:25:02,240 --> 00:25:06,740
So if you are running on CPU, you can imagine that the processing will be very slow.

380
00:25:06,740 --> 00:25:13,070
But if we try it on a GPU, the frame rate will be very high and the processing will be very fast. On

381
00:25:13,070 --> 00:25:13,520
CPU,

382
00:25:13,550 --> 00:25:18,530
the FPS or the frame rate is very low and the processing is very slow as well.

383
00:25:21,250 --> 00:25:23,230
Now you can see I've tested on another video.

384
00:25:23,230 --> 00:25:30,070
We are able to detect a person and a handbag over here as well, and the detection results look quite impressive.

385
00:25:34,060 --> 00:25:38,020
Okay, so in this way you can test on other images and videos as well.

386
00:25:38,020 --> 00:25:42,280
So let's go on and do detection using live webcam feed.

387
00:25:45,820 --> 00:25:48,160
To do object detection using the live webcam feed,

388
00:25:48,190 --> 00:25:55,030
we will set the source to zero if you are using your internal webcam, like I am using my HP laptop's internal

389
00:25:55,030 --> 00:25:55,630
webcam.

390
00:25:55,630 --> 00:25:59,770
But if you are using an external webcam, then you can set the source to 1 or 2.

391
00:26:00,010 --> 00:26:04,420
Okay, but if you have only one external webcam, then you can set the source value to one.

392
00:26:05,020 --> 00:26:05,470
Okay.

393
00:26:05,470 --> 00:26:07,150
And let's run this up now.

394
00:26:25,540 --> 00:26:26,920
And now you can see over here.

395
00:26:26,920 --> 00:26:33,010
I am able to do object detection on the live webcam feed over here as well, and I'm just running

396
00:26:33,010 --> 00:26:34,240
it on my CPU machine.

397
00:26:34,240 --> 00:26:37,540
So you can see, uh, the detection is very slow.

398
00:26:37,810 --> 00:26:43,330
So that's how you can do this object detection on the live webcam feed as well,

399
00:26:43,570 --> 00:26:45,820
like I am doing over here.

400
00:26:45,820 --> 00:26:50,380
And you can see the frame size is 480 by 640, okay.

401
00:26:53,820 --> 00:26:58,380
So 640 is the width and 480 is the frame height.

402
00:26:58,410 --> 00:27:01,050
You can see that 640 is the frame width and 480 is the frame height.

403
00:27:01,260 --> 00:27:05,310
So let's see, I have this, uh, pair of scissors.

404
00:27:05,310 --> 00:27:08,160
So let's see if it is able to detect the scissor or not.

405
00:27:08,400 --> 00:27:08,640
Okay.

406
00:27:08,640 --> 00:27:09,750
So it says knife.

407
00:27:09,780 --> 00:27:10,950
This is not a knife.

408
00:27:10,950 --> 00:27:11,880
This is a scissor.

409
00:27:12,300 --> 00:27:14,130
And let's see.

410
00:27:14,280 --> 00:27:14,670
Okay.

411
00:27:14,670 --> 00:27:17,490
So now you can see that it's detected as scissors now.

412
00:27:18,000 --> 00:27:18,420
Okay.

413
00:27:18,420 --> 00:27:23,520
So now you can see that here we have overlapping detections; it's not a cell phone, okay.

414
00:27:26,920 --> 00:27:29,200
So now you can see it's going to detect it as scissors.

415
00:27:29,200 --> 00:27:31,360
And that is fine.

416
00:27:31,360 --> 00:27:34,270
Like now you can see that it's being detected as scissors.

417
00:27:34,510 --> 00:27:38,110
So in this way you can do object detection on the live webcam feed as well.

418
00:27:38,110 --> 00:27:39,700
And that's all from this tutorial.

419
00:27:39,730 --> 00:27:40,990
Thank you for watching.