1
00:00:02,000 --> 00:00:08,000
In this video tutorial, we will look at how we can train the YOLO 11 object detection model on a custom

2
00:00:08,000 --> 00:00:09,000
data set.

3
00:00:09,000 --> 00:00:15,000
We will train the YOLO 11 object detection model on a Personal Protective Equipment data set, which

4
00:00:15,000 --> 00:00:17,000
is available on Roboflow.

5
00:00:17,000 --> 00:00:22,000
So here is a quick demo or demo of how our results will look like at the end.

6
00:00:22,000 --> 00:00:27,000
After we train the YOLO 11 object detection model on the Personal Protective Equipment data set.

7
00:00:27,000 --> 00:00:33,000
So over here you can see that we are able to detect the protective helmet, protective boots, safety

8
00:00:33,000 --> 00:00:34,000
vest.

9
00:00:34,000 --> 00:00:39,000
Along with this, we will also be able to detect the gloves and the eyewear.

10
00:00:39,000 --> 00:00:44,000
So this is a quick demo of how our results will look at the end.

11
00:00:44,000 --> 00:00:46,000
So let's move towards the coding part.

12
00:00:47,000 --> 00:00:51,000
So over here you can see that we have created a Google Colab notebook.

13
00:00:51,000 --> 00:00:53,000
So I have written all the code already.

14
00:00:53,000 --> 00:00:56,000
So you can see over here we have written all the code.

15
00:00:56,000 --> 00:00:58,000
I have done the training as well.

16
00:00:58,000 --> 00:01:02,000
And you can see we have some results over here as well.

17
00:01:02,000 --> 00:01:09,000
And then we will test the fine-tuned model on random images as well as on videos.

18
00:01:09,000 --> 00:01:13,000
Okay, so we will cover all this in this video tutorial.

19
00:01:13,000 --> 00:01:15,000
So let's get started with this.

20
00:01:15,000 --> 00:01:22,000
So before we run the script, please make sure that you have selected the runtime as T4 GPU.

21
00:01:22,000 --> 00:01:22,000
Yes.

22
00:01:22,000 --> 00:01:23,000
Okay.

23
00:01:23,000 --> 00:01:27,000
And now you can see we have the T4 GPU available over here.

24
00:01:27,000 --> 00:01:31,000
So Google Colab offers a free GPU, like you can see over here.

25
00:01:31,000 --> 00:01:35,000
It can last around one hour 20 minutes, which is enough for this task.

26
00:01:35,000 --> 00:01:43,000
And we have GPU RAM of 15 GB available over here and system RAM of 12.7 GB, which is enough for this

27
00:01:43,000 --> 00:01:44,000
tutorial and project.

28
00:01:44,000 --> 00:01:45,000
Okay.

29
00:01:45,000 --> 00:01:47,000
So let's first run this.

30
00:01:47,000 --> 00:01:51,000
So this will tell me whether I am using the GPU memory or not.

31
00:01:51,000 --> 00:01:53,000
So currently I'm using the GPU memory.

32
00:01:53,000 --> 00:01:53,000
Okay.

33
00:01:53,000 --> 00:01:58,000
But no process is running because I'm not training the YOLO model yet.

34
00:01:59,000 --> 00:02:01,000
Then we need to install the Ultralytics package.

35
00:02:02,000 --> 00:02:02,000
Okay.

36
00:02:04,000 --> 00:02:08,000
So over here you can see we are installing the Ultralytics package.

37
00:02:08,000 --> 00:02:10,000
So this will take some time.

38
00:02:10,000 --> 00:02:11,000
Okay.

39
00:02:11,000 --> 00:02:12,000
So now it's done.

40
00:02:12,000 --> 00:02:15,000
So you can just hide the output so your notebook looks cleaner.

41
00:02:15,000 --> 00:02:20,000
So now we will check whether Ultralytics is working fine.

42
00:02:22,000 --> 00:02:31,000
So now you can see over here Ultralytics 8.3.1 version is currently running.

43
00:02:31,000 --> 00:02:33,000
Over here we have Python 3.10 available.

44
00:02:33,000 --> 00:02:37,000
And we have CUDA available with a Tesla T4 GPU.

45
00:02:37,000 --> 00:02:42,000
And we have two CPU cores and 12.7 GB of RAM available.

46
00:02:42,000 --> 00:02:44,000
Okay so that looks fine.

47
00:02:44,000 --> 00:02:49,000
So we are using Ultralytics version 8.3.1.

48
00:02:49,000 --> 00:02:49,000
Okay.

49
00:02:49,000 --> 00:02:53,000
So now from Ultralytics I will import YOLO.

50
00:02:54,000 --> 00:02:58,000
So when you import YOLO you can just call any YOLO model.

51
00:02:58,000 --> 00:03:01,000
You can call YOLO model over here as well.

52
00:03:01,000 --> 00:03:04,000
You can call YOLO model over there as well.

53
00:03:04,000 --> 00:03:10,000
So now, to display any input or output image in the Google Colab notebook, we will import the Image class

54
00:03:10,000 --> 00:03:12,000
from IPython.display.

55
00:03:12,000 --> 00:03:15,000
Then I will be downloading the data set from Roboflow.

56
00:03:15,000 --> 00:03:19,000
So let me just show you this data set over here as well.

57
00:03:23,000 --> 00:03:30,000
So we will be using this data set which is available on Roboflow over here okay.

58
00:03:30,000 --> 00:03:39,000
So in order to use this data set to fine-tune your YOLO model, you just need to log in over here

59
00:03:39,000 --> 00:03:40,000
on Roboflow.

60
00:03:40,000 --> 00:03:45,000
So you can just click over here and you can sign in with Google, with your GitHub account, or

61
00:03:45,000 --> 00:03:48,000
with any other email like Yahoo email as well.

62
00:03:48,000 --> 00:03:53,000
So I will just sign in with my Google account over here.

63
00:03:53,000 --> 00:03:54,000
Okay.

64
00:03:54,000 --> 00:03:58,000
So now you can see over here we will use the version two over here.

65
00:03:58,000 --> 00:04:00,000
And you can see here is our data set.

66
00:04:00,000 --> 00:04:06,000
We have uh, around 3290 images available over here.

67
00:04:06,000 --> 00:04:16,000
And we have around 2,271 images in our train set, 637 images in our validation set and 327 images

68
00:04:16,000 --> 00:04:18,000
in our test set.

69
00:04:18,000 --> 00:04:23,000
So we have our data set split so that we are using 70% of the data for the training purpose,

70
00:04:23,000 --> 00:04:30,000
20% of the data for the validation purpose, and 10% of the data for the test purpose.

71
00:04:30,000 --> 00:04:31,000
Okay.

72
00:04:37,000 --> 00:04:40,000
So this is what our data set looks like.

73
00:04:41,000 --> 00:04:42,000
Okay.

74
00:04:42,000 --> 00:04:44,000
So now I will just download the data set.

75
00:04:44,000 --> 00:04:47,000
Now in order to download this data set you can just click over here.

76
00:04:47,000 --> 00:04:54,000
Then you can select the YOLOv8 format; you can select either the YOLOv8 or the YOLO 11 format.

77
00:04:55,000 --> 00:04:55,000
Okay.

78
00:04:55,000 --> 00:04:58,000
And then you can click on show download code over here.

79
00:05:09,000 --> 00:05:15,000
And then you can just copy this code from here and just add this code over here.

80
00:05:15,000 --> 00:05:16,000
Okay.

81
00:05:16,000 --> 00:05:21,000
So now this will download the data set.

82
00:05:21,000 --> 00:05:26,000
If you run the cell over here, it will download this data set from Roboflow.

83
00:05:26,000 --> 00:05:30,000
And now you can see that our data set has started to download.

84
00:05:30,000 --> 00:05:32,000
So this will take some time.

85
00:05:37,000 --> 00:05:38,000
Now it's almost

86
00:05:38,000 --> 00:05:38,000
done.

87
00:05:38,000 --> 00:05:39,000
Okay.

88
00:05:39,000 --> 00:05:40,000
That looks good.

89
00:05:40,000 --> 00:05:43,000
So now we have downloaded the data set over here.

90
00:05:43,000 --> 00:05:44,000
Okay.

91
00:05:44,000 --> 00:05:46,000
So we have the train, test, and validation sets.

92
00:05:46,000 --> 00:05:49,000
So here is our data.yaml file.

93
00:05:49,000 --> 00:05:53,000
So over here you can see that we have the number of classes as seven.

94
00:05:53,000 --> 00:05:55,000
So we have a total of seven classes.

95
00:05:55,000 --> 00:06:01,000
We have the dust mask class, eyewear class, glove, protective boots, protective helmet, safety vest

96
00:06:01,000 --> 00:06:02,000
and shield class.

97
00:06:02,000 --> 00:06:10,000
Okay, so these are the different classes that we have in our data set over here okay.

98
00:06:10,000 --> 00:06:18,000
So one thing you can do first before you go ahead: here are our train, test, and validation images.

99
00:06:20,000 --> 00:06:23,000
So you can just copy path from here okay.

100
00:06:23,000 --> 00:06:28,000
And just update this path first.

101
00:06:29,000 --> 00:06:30,000
Okay.

102
00:06:30,000 --> 00:06:36,000
And then you can just click over here, copy the path and just update this path over here.

103
00:06:36,000 --> 00:06:43,000
And then you can just click over here Copy Path and just update this path over here okay.

104
00:06:44,000 --> 00:06:46,000
So here is the link of the data set as well.

105
00:06:46,000 --> 00:06:53,000
You can just view this data set by copying this link and opening it in another tab over here

106
00:06:53,000 --> 00:06:54,000
as well.

107
00:06:54,000 --> 00:06:55,000
Okay.

108
00:06:55,000 --> 00:06:58,000
So that looks good.

109
00:06:58,000 --> 00:06:58,000
Okay.

110
00:07:01,000 --> 00:07:06,000
So now if I just write over here, basically our data set is being saved into the variable dataset.

111
00:07:06,000 --> 00:07:13,000
And if I just write over here dataset.location, this will tell me the location of the data set over

112
00:07:13,000 --> 00:07:13,000
here.
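
The download step described above could look roughly like the sketch below. It assumes the `roboflow` Python package and the snippet that Roboflow shows under "show download code"; the API key, workspace name, and project name are hypothetical placeholders that are not taken from the video, so copy the real values from your own Roboflow account.

```python
# Hypothetical placeholders: none of these values come from the video.
# Replace them with the values from Roboflow's "show download code" dialog.
API_KEY = "YOUR_API_KEY"
WORKSPACE = "your-workspace"
PROJECT = "your-ppe-project"
VERSION = 2               # the video uses version 2 of the data set
EXPORT_FORMAT = "yolov8"  # the export format selected in the video


def download_dataset():
    """Download the data set and return the folder it was saved to."""
    # Imported lazily so this sketch can be read without the package installed
    # (install it with `pip install roboflow`).
    from roboflow import Roboflow

    rf = Roboflow(api_key=API_KEY)
    project = rf.workspace(WORKSPACE).project(PROJECT)
    dataset = project.version(VERSION).download(EXPORT_FORMAT)
    return dataset.location


def data_yaml_path(location):
    """Build the path of the data.yaml file inside the downloaded folder."""
    return f"{location}/data.yaml"
```

Calling `data_yaml_path(download_dataset())` in a notebook cell would give the path that is later passed to training.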

113
00:07:14,000 --> 00:07:14,000
Okay.

114
00:07:14,000 --> 00:07:18,000
So now we will train the YOLO 11 model on a custom data set.

115
00:07:18,000 --> 00:07:22,000
So you just need to add the model over here.

116
00:07:22,000 --> 00:07:23,000
Okay.

117
00:07:23,000 --> 00:07:26,000
So currently we are using the YOLO 11 model.

118
00:07:26,000 --> 00:07:28,000
So you just need to write YOLO 11.

119
00:07:28,000 --> 00:07:30,000
We are using YOLO 11 medium model.

120
00:07:30,000 --> 00:07:33,000
So you just need to write YOLO 11 medium model.

121
00:07:33,000 --> 00:07:37,000
And we will be training or fine-tuning the YOLO 11 model for,

122
00:07:37,000 --> 00:07:42,000
like we can say, 50 epochs, because this will be enough, okay.

123
00:07:42,000 --> 00:07:45,000
And task is equal to detect, mode is equal to train.

124
00:07:45,000 --> 00:07:49,000
And I'm just passing the location of the data.yaml file over here.

125
00:07:49,000 --> 00:07:50,000
Okay.

126
00:07:50,000 --> 00:07:52,000
And the default image size.

127
00:07:52,000 --> 00:08:00,000
Basically, the YOLO 11 model was trained on an image size of

128
00:08:00,000 --> 00:08:01,000
640 by 640.

129
00:08:01,000 --> 00:08:06,000
So it's better if we fine tune the YOLO 11 model on the same size.

130
00:08:06,000 --> 00:08:11,000
So, plus, we can fine-tune it for 40 epochs.

131
00:08:11,000 --> 00:08:13,000
Like that will be fine as well.

132
00:08:14,000 --> 00:08:16,000
So let's run this.
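
The training settings described above can be sketched with the Ultralytics Python API as follows. The video appears to pass CLI-style arguments (task, mode, model, data, epochs, image size); this sketch uses the equivalent `YOLO(...).train(...)` call instead, and the data.yaml path is a hypothetical placeholder.

```python
# Settings mentioned in the video: 40 epochs and a 640 image size.
# The data path is a hypothetical placeholder for the downloaded data.yaml.
TRAIN_ARGS = {
    "data": "/content/your-dataset/data.yaml",
    "epochs": 40,
    "imgsz": 640,
}

# Pretrained medium checkpoint; "yolo11n.pt" would select the nano model.
MODEL_WEIGHTS = "yolo11m.pt"


def train(model_weights=MODEL_WEIGHTS, train_args=TRAIN_ARGS):
    """Fine-tune a pretrained YOLO 11 detection model on a custom data set."""
    # Imported lazily; requires `pip install ultralytics`.
    from ultralytics import YOLO

    model = YOLO(model_weights)        # downloads the weights if missing
    return model.train(**train_args)   # results go to a runs/detect/train* folder
```

Keeping the arguments in a dictionary makes it easy to change the number of epochs or swap the model without touching the training call.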

133
00:08:18,000 --> 00:08:22,000
So I'm just showing you how you can train the YOLO 11 model.

134
00:08:34,000 --> 00:08:39,000
Now you can see it's downloading the model over here as well.

135
00:08:39,000 --> 00:08:39,000
Okay.

136
00:08:52,000 --> 00:08:55,000
So now you can see that the training has started.

137
00:08:55,000 --> 00:09:03,000
So let's wait for the training to complete, or I will just show you after it completes

138
00:09:03,000 --> 00:09:05,000
ten epochs or so.

139
00:09:05,000 --> 00:09:05,000
Okay.

140
00:09:05,000 --> 00:09:08,000
So let's wait for it to complete.

141
00:09:12,000 --> 00:09:15,000
So you can see that training is still in progress.

142
00:09:15,000 --> 00:09:21,000
Currently, 30 epochs have been processed out of 40.

143
00:09:21,000 --> 00:09:24,000
And if we just see over here, I made one change.

144
00:09:24,000 --> 00:09:31,000
Previously I was using YOLO 11 medium model, but now I'm using YOLO 11 nano model, which is the smallest

145
00:09:31,000 --> 00:09:37,000
in size and it is the fastest, but it is the least accurate among the YOLO 11 models.

146
00:09:37,000 --> 00:09:43,000
The reason is that, to process an epoch, the YOLO 11 medium model was taking around

147
00:09:43,000 --> 00:09:45,000
3.5 minutes.

148
00:09:45,000 --> 00:09:48,000
So it would consume a lot of time.

149
00:09:48,000 --> 00:09:55,000
So I just shifted to the YOLO 11 nano model, which is taking around 53 seconds to process an epoch.

150
00:09:56,000 --> 00:09:56,000
Okay.

151
00:09:56,000 --> 00:10:02,000
And currently 30 epochs, 31 epochs have been processed.

152
00:10:02,000 --> 00:10:06,000
So let me try to explain a few things to you in the meantime.

153
00:10:06,000 --> 00:10:11,000
So now over here you can see that we are using YOLO 11.

154
00:10:11,000 --> 00:10:14,000
We can just see the data set that we are using.

155
00:10:14,000 --> 00:10:15,000
These are the different parameters.

156
00:10:16,000 --> 00:10:18,000
We are using the YOLO 11 nano model.

157
00:10:19,000 --> 00:10:25,000
And we are just training or fine-tuning our YOLO 11 nano model on this Personal Protective

158
00:10:25,000 --> 00:10:26,000
equipment data set.

159
00:10:26,000 --> 00:10:28,000
We have set the epochs to 40.

160
00:10:28,000 --> 00:10:30,000
The patience is set to 100.

161
00:10:30,000 --> 00:10:36,000
Here you can see the mean average precision with the

162
00:10:36,000 --> 00:10:45,000
IoU threshold, the intersection over union threshold, at 50% or 0.5 is 0.421, and the mean average precision

163
00:10:45,000 --> 00:10:47,000
when the IoU threshold,

164
00:10:47,000 --> 00:10:51,000
the intersection over union threshold, varies from 0.5 to 0.95

165
00:10:51,000 --> 00:10:53,000
is 0.238.

166
00:10:53,000 --> 00:10:56,000
So the patience value is set to 100.

167
00:10:56,000 --> 00:10:58,000
So if the mean average precision

168
00:10:58,000 --> 00:11:05,000
does not improve for 100 epochs, then the training will stop early.

169
00:11:05,000 --> 00:11:06,000
Okay.
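
The patience idea described above can be sketched in a few lines of plain Python. This is a simplified illustration of early stopping under the assumption that training stops once the tracked score has not improved for `patience` epochs; it is not the exact logic inside Ultralytics, and the function name is hypothetical.

```python
def stopping_epoch(scores_per_epoch, patience):
    """Return the epoch index at which early stopping would trigger, or None.

    scores_per_epoch: validation score (for example mAP) after each epoch.
    patience: number of epochs to wait without any improvement.
    """
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(scores_per_epoch):
        if score > best_score:
            # New best score: remember when it happened.
            best_score = score
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row.
            return epoch
    return None


# With 40 epochs and patience 100, early stopping can never trigger.
never_stops = stopping_epoch([0.1] * 40, patience=100)
# With a small patience, a plateau stops the run.
stops_at = stopping_epoch([0.1, 0.2, 0.3, 0.3, 0.3, 0.3], patience=3)
```

Since the run in the video lasts 40 epochs and patience is 100, training runs for all 40 epochs regardless of how the score develops.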

170
00:11:06,000 --> 00:11:07,000
So this is what

171
00:11:07,000 --> 00:11:11,000
patience reflects. Then we have a batch size equal to 16.

172
00:11:11,000 --> 00:11:14,000
Like in each batch we will pass 16

173
00:11:14,000 --> 00:11:18,000
images to our model, okay.

174
00:11:18,000 --> 00:11:22,000
So in each batch we will pass 16 images to our model

175
00:11:22,000 --> 00:11:22,000
for training.

176
00:11:22,000 --> 00:11:24,000
So what does this mean.

177
00:11:24,000 --> 00:11:28,000
So this is the number of batches: 142 is the number of batches.

178
00:11:28,000 --> 00:11:28,000
Okay.

179
00:11:29,000 --> 00:11:34,000
So in each epoch we pass 142 batches okay.

180
00:11:34,000 --> 00:11:37,000
So like you can see over here, in each batch

181
00:11:37,000 --> 00:11:39,000
we have 16 images.

182
00:11:39,000 --> 00:11:43,000
So if I just show you over here the number of images.

183
00:11:43,000 --> 00:11:47,000
So we have 2271 images in the training.

184
00:11:47,000 --> 00:11:47,000
Okay.

185
00:11:47,000 --> 00:11:55,000
Like you can see over here, we have 2,271 images in the training.

186
00:11:55,000 --> 00:11:56,000
You can see over here.

187
00:11:56,000 --> 00:11:57,000
Okay.

188
00:11:57,000 --> 00:12:04,000
So what we can do here is divide this number by 16.

189
00:12:04,000 --> 00:12:04,000
Okay.

190
00:12:04,000 --> 00:12:08,000
So now let me see if I just open the calculator from here.

191
00:12:13,000 --> 00:12:17,000
So as I told you that in each batch we are passing 16 images.

192
00:12:17,000 --> 00:12:23,000
So if you divide 2271 by 16,

193
00:12:24,000 --> 00:12:28,000
like you can see over here, this is the number of batches.

194
00:12:28,000 --> 00:12:30,000
So we have 142.

195
00:12:30,000 --> 00:12:32,000
And if I just show you over here.

196
00:12:34,000 --> 00:12:39,000
Like you can see over here we have 142 batches as well.

197
00:12:39,000 --> 00:12:39,000
Okay.

198
00:12:39,000 --> 00:12:42,000
And over here in the validation we have around.

199
00:12:44,000 --> 00:12:45,000
In the validation

200
00:12:45,000 --> 00:12:49,000
we have around 637 images, like you can see over here.

201
00:12:49,000 --> 00:12:50,000
Okay.

202
00:12:50,000 --> 00:12:59,000
So if I just divide 637 by 16, it is equal to around 40.

203
00:12:59,000 --> 00:13:07,000
But you might notice one thing over here: the number of batches should be 40.

204
00:13:07,000 --> 00:13:11,000
But the number of batches over here is 20, because the batch size is doubled for validation.

205
00:13:11,000 --> 00:13:18,000
So you can say that we are just passing 32 images in each batch over here in the validation.

206
00:13:18,000 --> 00:13:19,000
Okay.
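
The batch arithmetic above can be checked with a short calculation. This sketch only uses the numbers quoted in the video (2,271 training images, 637 validation images, batch sizes 16 and 32); the helper name is hypothetical. The last batch may be only partly filled, which is why the result is rounded up.

```python
import math


def num_batches(num_images, batch_size):
    """Number of batches per epoch; the last batch may be only partly filled."""
    return math.ceil(num_images / batch_size)


train_batches = num_batches(2271, 16)       # 2271 / 16 = 141.94, rounded up
val_batches_at_16 = num_batches(637, 16)    # what batch size 16 would give
val_batches_at_32 = num_batches(637, 32)    # what the validation log shows

print(train_batches, val_batches_at_16, val_batches_at_32)  # 142 40 20
```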

207
00:13:19,000 --> 00:13:26,000
And if you just go over here, we are fine-tuning our YOLO 11 model

208
00:13:26,000 --> 00:13:27,000
using the default image size.

209
00:13:27,000 --> 00:13:29,000
Basically, the YOLO 11

210
00:13:29,000 --> 00:13:33,000
model was trained using

211
00:13:33,000 --> 00:13:37,000
an image size of 640 by 640, where 640 is the width and 640 is the height.

212
00:13:37,000 --> 00:13:44,000
And we are just fine-tuning the YOLO 11 model with the same image size of 640 by 640.

213
00:13:44,000 --> 00:13:52,000
So if our images are larger than 640 by 640, they will be resized to 640 by 640.

214
00:13:52,000 --> 00:13:57,000
So it is better if you don't change the value of the image size and keep the default.

215
00:13:57,000 --> 00:14:02,000
Because if you just vary those values, like if you increase the image size or decrease the image size,

216
00:14:02,000 --> 00:14:06,000
the performance of the model might get compromised as well.

217
00:14:06,000 --> 00:14:06,000
Okay.

218
00:14:07,000 --> 00:14:09,000
So we have the batch size 16.

219
00:14:09,000 --> 00:14:15,000
And in the train3 folder our results will be saved, okay.

220
00:14:15,000 --> 00:14:17,000
So if I just go to the runs folder.

221
00:14:17,000 --> 00:14:21,000
So in the train3 folder we have our weights and everything, okay.

222
00:14:22,000 --> 00:14:25,000
So we are using the pre-trained model.

223
00:14:25,000 --> 00:14:30,000
So pretrained is true, and the optimizer is selected automatically, okay.

224
00:14:30,000 --> 00:14:33,000
Verbose is equal to true, because you can see we are getting the details.

225
00:14:33,000 --> 00:14:36,000
Like after each epoch, we are getting the mean average precision.

226
00:14:36,000 --> 00:14:38,000
So we are just getting all these details.

227
00:14:38,000 --> 00:14:40,000
So this is because,

228
00:14:40,000 --> 00:14:43,000
by default, verbose is set to true.

229
00:14:43,000 --> 00:14:43,000
Okay.

230
00:14:47,000 --> 00:14:49,000
And if we just go ahead.

231
00:14:49,000 --> 00:14:50,000
Um.

232
00:14:56,000 --> 00:14:56,000
Okay.

233
00:14:57,000 --> 00:14:58,000
So.

234
00:15:01,000 --> 00:15:07,000
And then these are all the other parameters.

235
00:15:07,000 --> 00:15:09,000
You can just change the values

236
00:15:09,000 --> 00:15:10,000
of the parameters over here.

237
00:15:10,000 --> 00:15:12,000
These are the default values.

238
00:15:13,000 --> 00:15:17,000
I was just trying to show you one thing.

239
00:15:17,000 --> 00:15:19,000
The maximum number of detections.

240
00:15:19,000 --> 00:15:20,000
Okay.

241
00:15:20,000 --> 00:15:24,000
So you can see that the maximum number of detections is set to 300.

242
00:15:24,000 --> 00:15:30,000
So it will detect a maximum of 300 objects in a frame or in an image.

243
00:15:30,000 --> 00:15:30,000
Okay.

244
00:15:30,000 --> 00:15:36,000
And this is the non-max suppression intersection over union value, which is set to 0.7, okay.

245
00:15:36,000 --> 00:15:37,000
So.

246
00:15:40,000 --> 00:15:45,000
So basically non-max suppression is a technique that is used to remove redundant or overlapping bounding

247
00:15:45,000 --> 00:15:46,000
boxes.

248
00:15:46,000 --> 00:15:57,000
So all the boxes that have an intersection over union above 0.7 with a higher-confidence box are removed

249
00:15:57,000 --> 00:15:58,000
and treated as duplicate detections.

250
00:15:59,000 --> 00:15:59,000
Okay.

251
00:16:02,000 --> 00:16:02,000
Okay.

252
00:16:02,000 --> 00:16:09,000
So if the intersection over union of two bounding boxes is above 0.7, then the bounding box

253
00:16:09,000 --> 00:16:13,000
which has the lower confidence score will be suppressed.

254
00:16:13,000 --> 00:16:16,000
So this is what we mean by IOU over here.
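
To make the non-max suppression step concrete, here is a minimal pure-Python sketch of IoU and a greedy NMS loop with the 0.7 threshold mentioned above. It is an illustration of the general technique, not the implementation used inside Ultralytics; the box format `(x1, y1, x2, y2)` and the example boxes are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # A box survives only if it does not overlap any kept box above the threshold.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes (IoU = 90 / 110, about 0.82) and one separate box.
example_boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
example_scores = [0.9, 0.8, 0.7]
kept_indices = nms(example_boxes, example_scores)  # [0, 2]
```

The second box is suppressed because it overlaps the higher-confidence first box with an IoU above 0.7, while the third box does not overlap anything and is kept.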

255
00:16:17,000 --> 00:16:21,000
And then these are all the other parameters.

256
00:16:21,000 --> 00:16:23,000
So this is what we need to know.

257
00:16:24,000 --> 00:16:27,000
So let's see how far our training has progressed for now.

258
00:16:27,000 --> 00:16:30,000
So we are done with 37 epochs.

259
00:16:30,000 --> 00:16:30,000
Okay.

260
00:16:30,000 --> 00:16:32,000
So this will take around two minutes more.

261
00:16:32,000 --> 00:16:34,000
So let's wait for this to finish.

262
00:16:34,000 --> 00:16:39,000
And you can see over here the mean average precision with the intersection over union threshold

263
00:16:39,000 --> 00:16:48,000
at 0.5 or 50% is 89.8%, and the mean average precision when the intersection over union threshold varies

264
00:16:48,000 --> 00:16:55,000
from 50 to 95, or 0.5 to 0.95, is 0.617 or 61.7%.

265
00:16:55,000 --> 00:16:59,000
So these are all calculated on the validation set.

266
00:17:00,000 --> 00:17:07,000
So basically, we have 637 images in the validation set.

267
00:17:07,000 --> 00:17:10,000
So we are not using the validation images for the training.

268
00:17:10,000 --> 00:17:16,000
So these are all the mean average precision scores on the validation data set okay.

269
00:17:18,000 --> 00:17:19,000
So.

270
00:17:22,000 --> 00:17:27,000
So as we are not training our model on the validation data set, we can just use the scores and

271
00:17:27,000 --> 00:17:29,000
see how our model performs.

272
00:17:29,000 --> 00:17:30,000
Okay.

273
00:17:31,000 --> 00:17:37,000
So basically, the mean average precision score on the validation data set tells us how well our

274
00:17:37,000 --> 00:17:38,000
model generalizes.

275
00:17:38,000 --> 00:17:39,000
Okay.

276
00:17:45,000 --> 00:17:47,000
So now you can see that the training is done.

277
00:17:47,000 --> 00:17:53,000
And you can see that we are getting the mean average precision scores for all the

278
00:17:53,000 --> 00:17:57,000
classes; these are the scores that are obtained on the validation data set.

279
00:17:57,000 --> 00:18:03,000
And our best model is saved into this train3 folder.

280
00:18:03,000 --> 00:18:06,000
So these are the best model weights that

281
00:18:06,000 --> 00:18:07,000
we have over here.

282
00:18:07,000 --> 00:18:15,000
Okay, so now you can see that the mean average precision score is low only for the shield class, like

283
00:18:15,000 --> 00:18:23,000
it's 0.763, and the mean average precision when IoU varies from 50 to 95 is 0.467; for all the other classes

284
00:18:23,000 --> 00:18:25,000
it's much higher.

285
00:18:25,000 --> 00:18:25,000
Okay.

286
00:18:25,000 --> 00:18:28,000
And the precision and recall scores are also a bit low.

287
00:18:29,000 --> 00:18:29,000
Okay.

288
00:18:31,000 --> 00:18:36,000
The reason might be that, for the shield class, we don't have very much data available,

289
00:18:36,000 --> 00:18:39,000
both in the training as well as in the validation.

290
00:18:39,000 --> 00:18:46,000
Like you can see over here, in the validation, we have 637 images and we have only 25

291
00:18:46,000 --> 00:18:47,000
images for the shield class.

292
00:18:48,000 --> 00:18:51,000
Similarly, in the training, we also have very little data for the shield class.

293
00:18:51,000 --> 00:18:56,000
If we increase the data, we might get better results.

294
00:18:56,000 --> 00:18:56,000
Okay.

295
00:18:59,000 --> 00:19:05,000
So now if I just go over here, we have the precision-recall curve and the confusion matrix.

296
00:19:05,000 --> 00:19:08,000
So let me just show you the precision curve first over here.

297
00:19:14,000 --> 00:19:22,000
So one thing you might notice is what happens as the confidence value increases. But first, what is

298
00:19:22,000 --> 00:19:23,000
precision?

299
00:19:24,000 --> 00:19:28,000
Precision is basically about the true positives.

300
00:19:28,000 --> 00:19:34,000
Like how many correct positive predictions we got out of all positive predictions, like true positives

301
00:19:34,000 --> 00:19:37,000
divided by true positives plus false positives.

302
00:19:37,000 --> 00:19:44,000
So in short, you can say that precision is the fraction of our predictions that are correct.

303
00:19:44,000 --> 00:19:50,000
Okay, so now you can see that as we increase the confidence threshold the precision value continues

304
00:19:50,000 --> 00:19:52,000
to increase okay.

305
00:19:53,000 --> 00:19:59,000
Now you can see that for each class we are getting

306
00:19:59,000 --> 00:20:02,000
a separate precision curve, okay.

307
00:20:02,000 --> 00:20:07,000
And as we increase the confidence threshold the precision value is increasing.

308
00:20:07,000 --> 00:20:12,000
Like you can clearly see over here, generally for all the classes.

309
00:20:15,000 --> 00:20:23,000
So this means that at a higher confidence threshold the model is making very few false positive predictions.

310
00:20:23,000 --> 00:20:27,000
As we increase the confidence threshold, the model becomes more certain.

311
00:20:27,000 --> 00:20:33,000
And although it might make fewer predictions, they will be correct predictions.

312
00:20:33,000 --> 00:20:40,000
Okay, so as we increase the confidence threshold, the model makes very few false positive predictions, as

313
00:20:40,000 --> 00:20:42,000
we can clearly see over here.

314
00:20:44,000 --> 00:20:53,000
While we can see that when we have a low confidence threshold like 0.0, we have a

315
00:20:53,000 --> 00:20:54,000
very low precision score.

316
00:20:54,000 --> 00:20:59,000
This is because we are just getting many false positives in our predictions.

317
00:20:59,000 --> 00:21:05,000
When we get more false positives in our prediction, our precision value drops.

318
00:21:05,000 --> 00:21:11,000
And when we get fewer false positives, like when we increase the confidence score, we

319
00:21:11,000 --> 00:21:13,000
definitely don't get very many false positives.

320
00:21:13,000 --> 00:21:16,000
So our precision increases.

321
00:21:16,000 --> 00:21:18,000
So this is how they relate.

322
00:21:18,000 --> 00:21:22,000
And you can see over here this blue curve is for all classes.

323
00:21:22,000 --> 00:21:30,000
And we can see over here that at a 0.959 confidence threshold, we are getting a precision score of 1.00.

324
00:21:30,000 --> 00:21:33,000
So this means that at a 0.959 confidence threshold

325
00:21:33,000 --> 00:21:36,000
we are getting a precision score of 1.00.

326
00:21:36,000 --> 00:21:39,000
And here we have the recall-confidence curve.

327
00:21:41,000 --> 00:21:47,000
So we can see that the recall value for all the classes drops as we increase the confidence scores.

328
00:21:47,000 --> 00:21:53,000
Like you can see, at a low confidence score, the recall value is high.

329
00:21:53,000 --> 00:21:57,000
And as we increase the confidence score the recall value drops.

330
00:22:05,000 --> 00:22:11,000
So basically, recall tells us the fraction of all ground truths that are detected as true positives.

331
00:22:11,000 --> 00:22:11,000
Okay.

332
00:22:11,000 --> 00:22:13,000
So what does this mean?

333
00:22:13,000 --> 00:22:20,000
Like when we have a low confidence score; in recall, we are not worried about false positives,

334
00:22:20,000 --> 00:22:20,000
okay.

335
00:22:20,000 --> 00:22:23,000
In recall, we are only worried about the true positives.

336
00:22:23,000 --> 00:22:29,000
So when we have a low confidence score like 0.0 we are getting more predictions.

337
00:22:29,000 --> 00:22:36,000
And in general, when we have more predictions, we might have a lot of false positives.

338
00:22:36,000 --> 00:22:40,000
But we also have many true positives in our predictions as well.

339
00:22:40,000 --> 00:22:46,000
So in recall, we are only concerned with the number of true positives okay.

340
00:22:46,000 --> 00:22:51,000
So as we decrease the confidence score we might get a lot of true positives.

341
00:22:51,000 --> 00:22:55,000
So we have a high recall at low confidence.

342
00:22:55,000 --> 00:23:04,000
And when we increase the confidence score, like to 1.00, we might not get

343
00:23:04,000 --> 00:23:06,000
very many predictions; the prediction rate drops.

344
00:23:06,000 --> 00:23:11,000
Like if we have the confidence score 0.0, we are getting, for example, ten predictions.

345
00:23:11,000 --> 00:23:17,000
And when we make the confidence score 1.0, we might be getting around 1 or 2 predictions.

346
00:23:17,000 --> 00:23:20,000
So when the prediction rate drops,

347
00:23:20,000 --> 00:23:26,000
you can say that we might get fewer true positives, okay.

348
00:23:26,000 --> 00:23:31,000
So therefore the recall score also drops, because we are getting fewer true positives.

349
00:23:36,000 --> 00:23:43,000
So because when we increase the confidence score, like to 1.0 or 0.8, there will be fewer detections, okay,

350
00:23:43,000 --> 00:23:48,000
because it will only draw bounding boxes around those detected objects which have a

351
00:23:48,000 --> 00:23:50,000
confidence above 0.8.

352
00:23:50,000 --> 00:23:52,000
So definitely the predictions will drop.

353
00:23:52,000 --> 00:23:59,000
And although we might get few false positives, we are also getting

354
00:23:59,000 --> 00:24:01,000
very few true positives as well.

355
00:24:01,000 --> 00:24:05,000
So therefore our recall score will drop.

356
00:24:08,000 --> 00:24:14,000
But one benefit is that when we increase the confidence score, you might get fewer predictions,

357
00:24:14,000 --> 00:24:16,000
but they will be more accurate predictions.

358
00:24:16,000 --> 00:24:17,000
So remember this.

359
00:24:17,000 --> 00:24:21,000
When you increase the confidence score you might get fewer predictions, but they will be more accurate

360
00:24:21,000 --> 00:24:22,000
predictions.
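
To make this trade-off concrete, here is a minimal Python sketch. It is not the notebook's code: the detections, their confidence scores, and the ground-truth count are made-up values, and `precision_recall_at` is a hypothetical helper used only for illustration.

```python
# Hypothetical detections: (confidence, is_true_positive).
# These values are invented for illustration, not taken from the tutorial.
detections = [
    (0.95, True), (0.90, True), (0.85, False), (0.70, True), (0.60, True),
    (0.45, False), (0.40, True), (0.30, False), (0.20, True), (0.10, False),
]
NUM_GROUND_TRUTH = 8  # assumed number of real objects in the images


def precision_recall_at(threshold, dets=detections, num_gt=NUM_GROUND_TRUTH):
    """Keep detections at or above the threshold, then compute precision and recall."""
    kept = [is_tp for conf, is_tp in dets if conf >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    # By convention here, precision is 1.0 when nothing is predicted.
    precision = tp / (tp + fp) if kept else 1.0
    recall = tp / num_gt
    return len(kept), precision, recall


for threshold in (0.0, 0.5, 0.8):
    n, p, r = precision_recall_at(threshold)
    print(f"conf>={threshold:.1f}: {n} predictions, precision={p:.2f}, recall={r:.2f}")
```

With these made-up numbers, raising the threshold keeps fewer predictions, recall goes down, and precision goes up, which is the behaviour described above.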

361
00:24:22,000 --> 00:24:23,000
Okay.

362
00:24:24,000 --> 00:24:26,000
So this is the confusion matrix.

363
00:24:26,000 --> 00:24:31,000
So the confusion matrix basically tells us how our model handles different classes.

364
00:24:31,000 --> 00:24:38,000
So for example, for the dust mask class, you can see that 282 times our model detected

365
00:24:38,000 --> 00:24:45,000
correctly that there is a dust mask, while 16 times when there was a dust mask our model was unable

366
00:24:45,000 --> 00:24:48,000
to detect it; our model detected nothing.

367
00:24:48,000 --> 00:24:53,000
And 128 times when the person was wearing eyewear,

368
00:24:53,000 --> 00:24:59,000
our model detected correctly that the person is wearing eyewear, while one time when the person

369
00:24:59,000 --> 00:25:03,000
was wearing eyewear, our model detected it as a shield.

370
00:25:03,000 --> 00:25:10,000
And 18 times when the person was wearing eyewear, our model detected nothing.

371
00:25:10,000 --> 00:25:17,000
Okay, similarly, 725 times when the person was wearing a protective helmet, our model detected correctly

372
00:25:17,000 --> 00:25:22,000
that the person is wearing a protective helmet, and 23 times when the person is wearing a protective

373
00:25:22,000 --> 00:25:26,000
helmet, our model is unable to detect anything.

374
00:25:26,000 --> 00:25:26,000
Okay.

375
00:25:26,000 --> 00:25:31,000
And one time when the person is wearing a protective helmet, our model detected it as a shield, and two times when the person

376
00:25:31,000 --> 00:25:33,000
is wearing a protective helmet,

377
00:25:33,000 --> 00:25:36,000
our model detected it as a safety vest.

378
00:25:36,000 --> 00:25:40,000
So the confusion matrix tells us how our model handles different classes.

379
00:25:40,000 --> 00:25:40,000
Okay.

380
00:25:40,000 --> 00:25:42,000
So you can just run this.

381
00:25:43,000 --> 00:25:45,000
There will not be very much difference.

382
00:25:48,000 --> 00:25:48,000
Okay.

383
00:25:50,000 --> 00:25:50,000
Okay.

384
00:25:51,000 --> 00:25:51,000
That's fine.

385
00:25:54,000 --> 00:25:54,000
Okay.

386
00:25:54,000 --> 00:25:58,000
So now over here, this is the normalized confusion matrix.

387
00:25:59,000 --> 00:25:59,000
Okay.

388
00:26:02,000 --> 00:26:07,000
So now you can see 93% of the time our model detected correctly that there is a dust mask.

389
00:26:07,000 --> 00:26:14,000
While 7% of the time our model detected incorrectly, or our model was unable to detect anything.

390
00:26:14,000 --> 00:26:21,000
80% of the time, our model detected correctly that there is eyewear, and 1% of the time when there

391
00:26:21,000 --> 00:26:24,000
is eyewear, our model detected it as a shield.

392
00:26:24,000 --> 00:26:28,000
So this is how we can use the confusion matrix.
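
As a rough sketch of how the normalized values relate to the raw counts, the Python snippet below takes the protective helmet counts as best they can be made out from the narration and divides each one by the total for that true class. The `normalize` helper is hypothetical, and any cells not mentioned in the video are assumed to be zero, so the percentages are illustrative only.

```python
# Counts for the true "protective helmet" class as read out in the video.
# Assumption: any confusion-matrix cells not mentioned are zero.
helmet_counts = {
    "protective helmet": 725,    # detected correctly
    "background (missed)": 23,   # nothing detected
    "shield": 1,                 # confused with shield
    "safety vest": 2,            # confused with safety vest
}


def normalize(counts):
    """Divide each count by the total for this true class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


for label, fraction in normalize(helmet_counts).items():
    print(f"{label}: {fraction:.1%}")
```

Under these assumptions, the diagonal entry for the helmet class comes out at roughly 96.5%.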

393
00:26:28,000 --> 00:26:28,000
Okay.

394
00:26:28,000 --> 00:26:34,000
And then this tells us how much data we have in our data set.

395
00:26:34,000 --> 00:26:41,000
Like you can see over here, for the shield class we have very little data. And for the

396
00:26:42,000 --> 00:26:42,000
eyewear,

397
00:26:42,000 --> 00:26:43,000
we also have less data.

398
00:26:43,000 --> 00:26:50,000
But for the protective boots, protective helmet, and safety vest, we have a huge amount of data, like

399
00:26:50,000 --> 00:26:50,000
these.

400
00:26:50,000 --> 00:26:53,000
We have around 2,000 instances for the glove.

401
00:26:53,000 --> 00:26:54,000
Okay.

402
00:26:54,000 --> 00:26:57,000
So now you can see these are the results over here.

403
00:26:57,000 --> 00:27:02,000
Now we can see that the mean average precision is increasing.

404
00:27:02,000 --> 00:27:04,000
And the loss is decreasing.

405
00:27:04,000 --> 00:27:08,000
So these are the model predictions on the training batches, okay.

406
00:27:08,000 --> 00:27:11,000
And these are the model predictions for the validation batch.

407
00:27:11,000 --> 00:27:13,000
And this looks good.

408
00:27:13,000 --> 00:27:15,000
Like, the model is able to detect protective boots and gloves.

409
00:27:15,000 --> 00:27:18,000
So we have not used the validation data for the training.

410
00:27:18,000 --> 00:27:22,000
So it's always better to take a look and see how our model is performing.

411
00:27:22,000 --> 00:27:27,000
Okay, so now what I can do over here is these are my model weights.

412
00:27:27,000 --> 00:27:33,000
So one thing I can do over here is save these model weights to Google Drive.

413
00:27:33,000 --> 00:27:36,000
So now what I will do is I will just right now.

414
00:27:41,000 --> 00:27:45,000
I just need to mount Google Drive in the Google Colab.

415
00:27:46,000 --> 00:27:48,000
Uh, yes.

416
00:27:48,000 --> 00:27:48,000
That.

417
00:27:48,000 --> 00:27:48,000
Yeah.

418
00:27:50,000 --> 00:27:54,000
So what I can do is just copy this from here.

419
00:27:57,000 --> 00:28:00,000
And I can just run this over here.

420
00:28:04,000 --> 00:28:06,000
I will just connect to the Google Drive now.

421
00:28:10,000 --> 00:28:11,000
Okay.

422
00:28:17,000 --> 00:28:17,000
So this will take some time.

423
00:28:18,000 --> 00:28:20,000
In the meantime, I will just open my Drive over here.

424
00:28:29,000 --> 00:28:30,000
This is my drive.

425
00:28:33,000 --> 00:28:37,000
Okay, so if I just refresh this, we have the drive over here.

426
00:28:38,000 --> 00:28:38,000
Okay.

427
00:28:43,000 --> 00:28:46,000
Okay, so one thing I can do is I will just, like, move.

428
00:28:46,000 --> 00:28:48,000
And I want to switch.

429
00:28:48,000 --> 00:28:50,000
I want to move these weights.

430
00:28:51,000 --> 00:28:52,000
Okay.

431
00:28:55,000 --> 00:29:00,000
From there to, uh, in which folder I want to move this, I can just, uh.

432
00:29:02,000 --> 00:29:02,000
Oh.

433
00:29:02,000 --> 00:29:03,000
This here.

434
00:29:06,000 --> 00:29:07,000
Okay.

435
00:29:08,000 --> 00:29:09,000
So let's see.

436
00:29:11,000 --> 00:29:17,000
And now if I just go to Yolov7 over here, I should have best.pt.

437
00:29:24,000 --> 00:29:24,000
Yeah.

438
00:29:24,000 --> 00:29:26,000
Where we have our weights.

439
00:29:26,000 --> 00:29:28,000
So one thing I can do over here is.

440
00:29:32,000 --> 00:29:35,000
So you can just copy this thing from here.

441
00:29:38,000 --> 00:29:40,000
Oh, we only need this link.

442
00:29:40,000 --> 00:29:41,000
Like and share.

443
00:29:47,000 --> 00:29:51,000
So what we can do is you can just delete this cell.

444
00:30:03,000 --> 00:30:06,000
So let's see if it works on.

445
00:30:12,000 --> 00:30:14,000
So now you can see we are able to download the model weights.

446
00:30:14,000 --> 00:30:19,000
And you can see we have our best model weights over here.

447
00:30:19,000 --> 00:30:22,000
Okay, so now we'll just pass those best model weights.

448
00:30:22,000 --> 00:30:25,000
And we will now validate our fine-tuned model.

449
00:30:25,000 --> 00:30:27,000
So we'll see how our model generalizes.

450
00:30:28,000 --> 00:30:31,000
Okay so.

451
00:30:37,000 --> 00:30:41,000
So now you can see over here we have our 637 images.

452
00:30:41,000 --> 00:30:44,000
And we are just passing those images in a batch of 16.

453
00:30:44,000 --> 00:30:45,000
Okay.

454
00:30:45,000 --> 00:30:48,000
And so the total number of batches is 40.
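
As a quick check of that arithmetic, here is a small Python sketch. The variable names are only illustrative; the numbers 637 and 16 are the ones mentioned above.

```python
import math

num_images = 637   # validation images, as stated in the video
batch_size = 16    # batch size, as stated in the video

# The last batch is allowed to be smaller, so we round up.
num_batches = math.ceil(num_images / batch_size)
last_batch_size = num_images - (num_batches - 1) * batch_size

print(num_batches)      # 40
print(last_batch_size)  # 13
```

So 39 full batches of 16 images plus one final batch of 13 images gives the 40 batches shown in the validation log.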

455
00:30:48,000 --> 00:30:53,000
And you can see over here that only for the shield class we are getting a slightly lower mean average precision.

456
00:30:53,000 --> 00:30:59,000
For all the other classes we are getting good precision, recall, and mean average precision scores, okay.

457
00:30:59,000 --> 00:31:03,000
And our results are saved over here.

458
00:31:04,000 --> 00:31:07,000
Okay, so we have the confusion matrix, the F1 curve, and all those.

459
00:31:08,000 --> 00:31:13,000
So now we will just test our model, or perform inference,

460
00:31:13,000 --> 00:31:17,000
with our fine-tuned model on the test dataset images.

461
00:31:17,000 --> 00:31:17,000
Okay.

462
00:31:20,000 --> 00:31:26,000
So this means that we have these images and we are just trying to perform inference on these test dataset

463
00:31:26,000 --> 00:31:26,000
images.

464
00:31:26,000 --> 00:31:30,000
So we have around 322 images in our test dataset.

465
00:31:30,000 --> 00:31:30,000
Okay.

466
00:31:30,000 --> 00:31:32,000
So it will take a few seconds.

467
00:31:32,000 --> 00:31:35,000
So now I will not display all these images over here.

468
00:31:35,000 --> 00:31:39,000
I will just try to save like three images over here.

469
00:31:42,000 --> 00:31:44,000
And let's see what our results look like.

470
00:31:44,000 --> 00:31:48,000
Basically, this is the unseen data.

471
00:31:48,000 --> 00:31:50,000
We have not trained our model on the test data set.

472
00:31:50,000 --> 00:31:58,000
So our model is able to detect the glove, dust mask, helmet, safety vest, glove, and protective boots.

473
00:31:58,000 --> 00:31:58,000
Results.

474
00:31:58,000 --> 00:31:59,000
Good.

475
00:31:59,000 --> 00:32:00,000
Dust mask, glove.

476
00:32:00,000 --> 00:32:04,000
So now you can see that our results look quite promising over here.

477
00:32:04,000 --> 00:32:04,000
Okay.

478
00:32:04,000 --> 00:32:09,000
And now I will just download a sample image from my drive and see how our model performs.

479
00:32:13,000 --> 00:32:16,000
Okay, so I'm just setting a confidence threshold of 0.25.

480
00:32:16,000 --> 00:32:18,000
And I want to save the results.

481
00:32:20,000 --> 00:32:24,000
And let me just display these results over here.

482
00:32:27,000 --> 00:32:33,000
So now you can see we are able to detect the safety vest, protective boots, safety vest, and protective helmet.

483
00:32:33,000 --> 00:32:34,000
Like we are just getting good results.

484
00:32:34,000 --> 00:32:40,000
So now I will download two videos from my drive directly into this Google Colab notebook so that

485
00:32:40,000 --> 00:32:42,000
I can test this fine tuned model.

486
00:32:42,000 --> 00:32:43,000
Okay.

487
00:32:50,000 --> 00:32:53,000
Okay, so in the meantime, let's run this as well.

488
00:33:03,000 --> 00:33:07,000
So now the complete video is being divided into 310 frames.

489
00:33:07,000 --> 00:33:11,000
And we are doing detection on each of the frames one by one.

490
00:33:11,000 --> 00:33:13,000
And in the output we have the complete video.

491
00:33:22,000 --> 00:33:26,000
Okay, so let me just display this output video over here.

492
00:33:36,000 --> 00:33:41,000
So this will take a few seconds before we have the output video being displayed over here.

493
00:33:43,000 --> 00:33:44,000
Uh okay.

494
00:33:45,000 --> 00:33:49,000
But you can see our output video is basically saved over here.

495
00:33:50,000 --> 00:33:50,000
This is our output.

496
00:33:51,000 --> 00:33:54,000
Now you can see the size is around 92 MB.

497
00:33:54,000 --> 00:33:57,000
So it will just compress the size like you can see over here.

498
00:33:57,000 --> 00:33:59,000
It's around 3.7 MB.

499
00:33:59,000 --> 00:34:03,000
And now we will display this output video over here in Google Colab notebook.

500
00:34:12,000 --> 00:34:14,000
So this will take some more time

501
00:34:14,000 --> 00:34:16,000
before we have the output.

502
00:34:19,000 --> 00:34:22,000
So here is our output video in the Google Colab notebook.

503
00:34:22,000 --> 00:34:27,000
And let me download this video and show you what the results look like.

504
00:34:28,000 --> 00:34:29,000
Okay.

505
00:34:37,000 --> 00:34:42,000
Now you can see over here we are able to detect the safety vest and protective helmet.

506
00:34:43,000 --> 00:34:45,000
The results look quite promising.

507
00:34:45,000 --> 00:34:51,000
Like you can see over here, we are able to detect the safety vest, protective helmet, and protective boots.

508
00:34:51,000 --> 00:34:54,000
Okay, these results look quite promising to me.

509
00:34:54,000 --> 00:34:56,000
Like, very good results.

510
00:34:56,000 --> 00:34:58,000
Like, you can clearly see over here.

511
00:34:59,000 --> 00:35:05,000
In the same way, we can test our model on this other video over here as well.

512
00:35:11,000 --> 00:35:14,000
So now you can see the complete video is divided into 267 frames.

513
00:35:14,000 --> 00:35:18,000
And we are doing detection on each of the frames one by one.

514
00:35:25,000 --> 00:35:27,000
I'm just saving the notebook.

515
00:35:34,000 --> 00:35:35,000
Okay.

516
00:35:35,000 --> 00:35:37,000
So let's run this over here.

517
00:35:42,000 --> 00:35:43,000
Okay.

518
00:35:43,000 --> 00:35:51,000
So now our output video is being saved over here, in detect/predict4.

519
00:35:51,000 --> 00:35:54,000
And I'm just trying to display the output video over here.

520
00:35:55,000 --> 00:35:56,000
in the Google Colab notebook.

521
00:35:56,000 --> 00:35:59,000
So this will take a few seconds.

522
00:36:02,000 --> 00:36:08,000
So in short, in this tutorial we have learned how we can fine-tune or train the YOLO 11 model

523
00:36:08,000 --> 00:36:11,000
on a Personal Protective Equipment data set.

524
00:36:11,000 --> 00:36:19,000
You can also train or fine-tune the model on any other data set as well, and our results look quite

525
00:36:19,000 --> 00:36:22,000
promising, as you can clearly see over here.

526
00:36:25,000 --> 00:36:29,000
So this will take a few seconds before we have the output over here.

527
00:36:36,000 --> 00:36:38,000
So here is our output video.

528
00:36:38,000 --> 00:36:43,000
Let me download this video and just open this video now.

529
00:36:48,000 --> 00:36:55,000
So over here you can see that we are able to detect the safety vest, protective boots, and protective helmet.

530
00:36:55,000 --> 00:36:56,000
So our results look quite good.

531
00:36:56,000 --> 00:37:03,000
In this tutorial we have seen how we can train or fine-tune the YOLO 11 model on any custom

532
00:37:03,000 --> 00:37:03,000
data set.

533
00:37:03,000 --> 00:37:06,000
That is all from this tutorial.

534
00:37:06,000 --> 00:37:07,000
Thank you for watching.

