1
00:00:03,060 --> 00:00:10,050
In this video tutorial, we will learn how we can fine-tune or train the YOLOv10 model on a custom

2
00:00:10,050 --> 00:00:10,920
data set.

3
00:00:11,370 --> 00:00:18,300
YOLOv10 is the latest state-of-the-art real-time object detection model that outperforms all the

4
00:00:18,300 --> 00:00:24,090
other YOLO models in terms of speed, accuracy, and parameter efficiency.

5
00:00:24,750 --> 00:00:26,430
So let's get started with it.

6
00:00:26,430 --> 00:00:30,930
So now here you can see this is the YOLOv10 GitHub repository.

7
00:00:30,930 --> 00:00:33,030
You can find all the details over here.

8
00:00:33,030 --> 00:00:34,260
Like you can see over here.

9
00:00:34,260 --> 00:00:37,560
The latency and accuracy trade-off is being presented over here.

10
00:00:37,560 --> 00:00:39,210
And what basically is latency?

11
00:00:39,450 --> 00:00:43,770
Well, latency is basically the time taken to do object detection on an input image.
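A minimal sketch of how latency per image could be measured, assuming the detector is any Python callable. The names `mean_latency_ms` and `fake_detector` are hypothetical stand-ins and are not part of the YOLOv10 repository.

```python
import time


def mean_latency_ms(fn, *args, warmup=3, runs=20):
    """Return the mean wall-clock time of fn(*args) in milliseconds."""
    # Warm-up calls are not timed, so one-off setup cost is excluded.
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


def fake_detector(image):
    # Hypothetical stand-in for a real model call on one input image.
    return sum(image)


latency = mean_latency_ms(fake_detector, list(range(1000)))
print(f"{latency:.3f} ms per image")
```

With a real model, the callable would wrap the inference call, and the number would depend on the hardware and image size.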

12
00:00:43,770 --> 00:00:46,680
So now you can see here we have latency in milliseconds.

13
00:00:46,680 --> 00:00:48,570
And here we have the average precision.

14
00:00:48,570 --> 00:00:54,180
So you can see that YOLOv10 models outperform all the other models

15
00:00:54,330 --> 00:00:56,310
in terms of accuracy.

16
00:00:56,340 --> 00:01:00,960
Here you can see the average precision, and in terms of latency as well.

17
00:01:00,960 --> 00:01:08,070
Like you can see that the inference latency of the YOLOv10 model is much better as compared

18
00:01:08,070 --> 00:01:09,810
to other YOLO models.

19
00:01:09,810 --> 00:01:15,270
Like, you can see that the YOLOv10 model takes less time to do object detection on an input image,

20
00:01:15,270 --> 00:01:16,590
like you can see over here.

21
00:01:16,920 --> 00:01:22,200
Other YOLO models take much more time to do object detection on an input image, which can

22
00:01:22,200 --> 00:01:23,520
be seen over here.

23
00:01:23,520 --> 00:01:29,820
So the inference latency of the YOLOv10 model is very good as compared to other YOLO models.

24
00:01:30,680 --> 00:01:33,590
And over here you can see the size and accuracy

25
00:01:33,590 --> 00:01:36,770
trade-off is being presented, and here you can see the number of parameters.

26
00:01:36,770 --> 00:01:38,870
And this is represented in millions.

27
00:01:38,870 --> 00:01:41,030
Like you can see that YOLOv10...

28
00:01:41,300 --> 00:01:43,370
In YOLOv10,

29
00:01:43,370 --> 00:01:48,080
we adopt an efficiency-driven design strategy, which reduces the computational overhead.

30
00:01:48,080 --> 00:01:50,960
And we optimize various model components as well.

31
00:01:50,960 --> 00:01:57,200
So in this case, fewer parameters are used as compared to other YOLO models.

32
00:01:57,200 --> 00:02:03,320
And YOLOv10 also outperforms all the other YOLO models in terms of accuracy as well.

33
00:02:03,320 --> 00:02:08,930
And you can see over here that YOLOv10 comes with six different models: YOLOv10 nano, small,

34
00:02:08,930 --> 00:02:10,910
medium, large, extra large.

35
00:02:11,000 --> 00:02:11,780
Over here.

36
00:02:11,780 --> 00:02:16,010
And here you can see, if you want to train the YOLOv10 model on a custom data set,

37
00:02:16,010 --> 00:02:18,140
All the steps are provided over here.

38
00:02:18,140 --> 00:02:19,550
You can check this guide.

39
00:02:19,550 --> 00:02:24,440
Or you can run predictions on an input image using the pre-trained model weights.

40
00:02:24,440 --> 00:02:26,720
Or fine-tune your model weights as well.

41
00:02:26,720 --> 00:02:33,230
Plus, here you can find all the details, or you can export the model into ONNX format or into some

42
00:02:33,230 --> 00:02:34,280
other format as well.

43
00:02:34,280 --> 00:02:36,740
And here you can see if you want to do the citations.

44
00:02:36,740 --> 00:02:38,690
All this stuff is being presented over here.

45
00:02:38,690 --> 00:02:44,390
So you can check this GitHub repo, and all the people who have contributed to

46
00:02:44,390 --> 00:02:46,370
this repo are also mentioned over here.

47
00:02:46,670 --> 00:02:48,680
What updates they have made, like.

48
00:02:49,810 --> 00:02:56,410
Like you can see over here, this GitHub user has integrated SORT object tracking

49
00:02:56,410 --> 00:02:57,070
as well.

50
00:02:57,070 --> 00:02:59,620
So all these contributions are mentioned over here.

51
00:02:59,620 --> 00:03:04,870
And this GitHub user has added a Hugging Face demo over here.

52
00:03:04,870 --> 00:03:09,130
Colab demo, all these videos you can find over here as well.

53
00:03:09,130 --> 00:03:09,700
Okay.

54
00:03:09,700 --> 00:03:15,490
And in this video tutorial we'll be fine-tuning the YOLOv10 model or training the YOLOv10 model

55
00:03:15,490 --> 00:03:17,590
on a personal protective equipment data set.

56
00:03:17,590 --> 00:03:19,690
And you can find this data set on Roboflow.

57
00:03:19,690 --> 00:03:22,450
So like you can see all the details over here.

58
00:03:22,450 --> 00:03:28,540
So we will have seven different classes in our data set, which includes dust mask, safety

59
00:03:28,540 --> 00:03:31,870
vest, safety boots over here.

60
00:03:32,960 --> 00:03:34,310
Like you can see over here.

61
00:03:34,310 --> 00:03:35,720
We have the safety boots.

62
00:03:35,720 --> 00:03:36,710
Safety helmet.

63
00:03:36,710 --> 00:03:38,660
Vest, safety mask.

64
00:03:38,690 --> 00:03:41,270
Eyeglasses, and these gloves here.

65
00:03:41,270 --> 00:03:43,880
So they are seven different classes in our data set.

66
00:03:43,880 --> 00:03:49,280
And our data set is quite well balanced; you can find all the details in the

67
00:03:49,280 --> 00:03:50,450
health check over here.

68
00:03:50,750 --> 00:03:53,690
So the dataset throughput and we will be using this data set.

69
00:03:53,870 --> 00:04:00,350
So to export this data set from Roboflow

70
00:04:00,350 --> 00:04:02,210
into your Google Colab notebook.

71
00:04:02,360 --> 00:04:06,710
You need to create an account on Roboflow.

72
00:04:06,710 --> 00:04:11,840
Then you just need to click on Download Dataset, and there you can download the

73
00:04:11,840 --> 00:04:16,940
data set to your local system, or you can just click on Show Download Code and convert

74
00:04:16,940 --> 00:04:21,230
this dataset into YOLOv8 format, since the YOLOv10 format is not available over here.

75
00:04:21,380 --> 00:04:23,150
So we can use the YOLOv8 format.

76
00:04:23,150 --> 00:04:23,810
And.

77
00:04:25,600 --> 00:04:25,810
Here.

78
00:04:25,810 --> 00:04:31,450
Now, if you just add this code into your Google Colab notebook, you will be able to download

79
00:04:31,450 --> 00:04:34,990
the data set from Roboflow into the Google Colab notebook.

80
00:04:35,440 --> 00:04:35,860
Okay.

81
00:04:36,250 --> 00:04:39,760
So now in the first step, you need to clone the YOLOv10 GitHub repo.

82
00:04:39,790 --> 00:04:45,280
I have already trained the model on this personal protective equipment data set, because if I start training

83
00:04:45,280 --> 00:04:47,170
over here, this will take so much time.

84
00:04:47,290 --> 00:04:50,020
But I will explain to you all the results that we got.

85
00:04:50,020 --> 00:04:56,320
Now we install all the required packages that we need to do object detection or to

86
00:04:56,320 --> 00:04:57,520
fine tune the model.

87
00:04:57,520 --> 00:05:02,320
So these are all the packages that are required, and these will be installed one by one.

88
00:05:03,540 --> 00:05:07,470
Then over here in the step number three we'll download all the pre-trained model weights.

89
00:05:07,470 --> 00:05:10,470
So let's wait until all these packages get installed.

90
00:05:13,040 --> 00:05:17,480
Well, you can see all the required packages are installed. It's necessary that you install all the

91
00:05:17,480 --> 00:05:21,710
required packages before you start training the model on a custom data set.

92
00:05:21,710 --> 00:05:27,230
So now if I just go over here into the repository and click over here, you can find all the model weights

93
00:05:27,230 --> 00:05:27,950
over here.

94
00:05:28,040 --> 00:05:29,930
We just need to copy the link address.

95
00:05:29,930 --> 00:05:33,920
And you just need to add all these addresses over here.

96
00:05:33,920 --> 00:05:40,700
And this will download each of the model weights into this weights folder over here.

97
00:05:40,700 --> 00:05:42,800
So a weights folder will be created over here.

98
00:05:42,800 --> 00:05:45,650
And inside this you can have all the model weights.

99
00:05:46,100 --> 00:05:48,260
YOLOv10 model weights added over there.
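The download step can be sketched roughly as follows, assuming you have copied the link addresses of the weight files from the repository's release page yourself. The function name `download_weights` is hypothetical, and the URLs are parameters rather than values taken from the video.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_weights(urls, target_dir="weights"):
    """Download each URL into target_dir and return the saved paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # creates the weights folder
    saved = []
    for url in urls:
        # Use the last part of the URL as the local file name.
        destination = target / url.rsplit("/", 1)[-1]
        if not destination.exists():
            urlretrieve(url, destination)
        saved.append(destination)
    return saved


# Usage (the list is a placeholder; paste the copied link addresses here):
# download_weights(["<copied link address of a .pt file>"])
```

Files that already exist in the folder are skipped, so the cell can be run again without downloading everything twice.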

100
00:05:48,740 --> 00:05:49,220
Okay.

101
00:05:49,640 --> 00:05:51,770
So now uh.

102
00:05:54,710 --> 00:05:59,900
Now you can see over here a weights folder has been created, and you can find all the model weights

103
00:05:59,900 --> 00:06:00,620
over here.

104
00:06:00,950 --> 00:06:01,250
Okay.

105
00:06:01,250 --> 00:06:05,030
So now we'll download the data set from Roboflow into this Google Colab notebook.

106
00:06:05,060 --> 00:06:05,480
Okay.

107
00:06:05,480 --> 00:06:10,460
So you can simply copy this code over here and just add this code over here.

108
00:06:10,460 --> 00:06:14,840
So this will download the data set from Roboflow into this Colab notebook.
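The download code that Roboflow shows usually looks similar to the sketch below, assuming the `roboflow` Python package is installed. The API key, workspace name, project name, and version number are placeholders that you would take from your own Roboflow account; none of them come from the video, and the wrapper name `download_roboflow_dataset` is hypothetical.

```python
def download_roboflow_dataset(api_key, workspace, project, version,
                              export_format="yolov8"):
    """Download a Roboflow dataset version and return its local folder."""
    # Imported lazily so this file loads even where roboflow is not installed.
    from roboflow import Roboflow

    rf = Roboflow(api_key=api_key)
    dataset = (
        rf.workspace(workspace)
        .project(project)
        .version(version)
        .download(export_format)  # YOLOv8 format, as chosen in the video
    )
    return dataset.location


# Usage (all four values are placeholders for your own account details):
# folder = download_roboflow_dataset("YOUR_API_KEY", "your-workspace",
#                                    "your-project", 1)
```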

109
00:06:14,930 --> 00:06:16,640
So this will take some time.

110
00:06:30,830 --> 00:06:33,620
Now you can see the data set is being downloaded over here.

111
00:06:33,920 --> 00:06:37,760
So this will take more time over here before it gets to 100%.

112
00:06:39,950 --> 00:06:40,400
Again.

113
00:06:43,660 --> 00:06:44,410
Okay.

114
00:06:45,600 --> 00:06:47,760
Well, now you can see we have the data set over here.

115
00:06:47,820 --> 00:06:53,130
Now one thing you need to do before you go ahead is rename this folder,

116
00:06:53,130 --> 00:06:55,020
and you just need to open the data.yaml

117
00:06:55,020 --> 00:06:56,640
file over here.

118
00:06:58,600 --> 00:07:02,920
As you open the data.yaml file, you just need to update this path over here.

119
00:07:02,920 --> 00:07:09,190
So you just need to click over here and copy path over here and just add those paths over here.

120
00:07:09,640 --> 00:07:13,210
So these are the two things that you must do.
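The path update can also be done in code instead of by hand. This is a minimal sketch that assumes the file has top-level `train:` and `val:` keys, as Roboflow's YOLOv8 export usually does. The function name `set_split_paths` and the example folder paths are hypothetical.

```python
from pathlib import Path


def set_split_paths(yaml_path, train_dir, val_dir):
    """Rewrite the top-level train/val entries of a data.yaml file."""
    replacements = {"train": train_dir, "val": val_dir}
    lines = []
    for line in Path(yaml_path).read_text().splitlines():
        key = line.split(":", 1)[0].strip()
        # Only touch top-level keys (lines that are not indented).
        if key in replacements and not line.startswith((" ", "\t")):
            line = f"{key}: {replacements[key]}"
        lines.append(line)
    Path(yaml_path).write_text("\n".join(lines) + "\n")


# Usage (paths are placeholders; copy the real ones from the Colab file browser):
# set_split_paths("data.yaml", "/content/<dataset>/train/images",
#                 "/content/<dataset>/valid/images")
```

All other lines, such as the class count and class names, are written back unchanged.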

121
00:07:15,410 --> 00:07:21,140
Okay, so you just need to update these two paths over here, and then you just need to set this

122
00:07:21,140 --> 00:07:23,090
folder as your current directory again.

123
00:07:23,450 --> 00:07:24,050
Okay.

124
00:07:24,590 --> 00:07:29,690
And then we will train the YOLOv10 model on the custom data set.

125
00:07:29,690 --> 00:07:32,870
So now you can see that we are doing object detection.

126
00:07:32,870 --> 00:07:33,890
The mode is training.

127
00:07:33,890 --> 00:07:37,820
Currently we are training the model for 50 epochs.

128
00:07:37,820 --> 00:07:39,950
And the batch size is set to 16.

129
00:07:39,950 --> 00:07:43,220
The batch size is set in powers of two.

130
00:07:43,220 --> 00:07:48,470
So if you have a very large data set, then please make sure that you reduce the batch size to

131
00:07:48,470 --> 00:07:48,950
eight.

132
00:07:49,100 --> 00:07:49,430
Okay.

133
00:07:49,430 --> 00:07:51,200
So this is the batch size.

134
00:07:51,200 --> 00:07:56,060
It basically means the size of the chunks in which we pass the data to the model for training.
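As a small worked example of what the batch size means, the sketch below counts how many chunks the model sees in one epoch. The figure of 1000 training images is a made-up number for illustration, not the size of the data set in the video.

```python
import math


def iterations_per_epoch(num_images, batch_size):
    """Number of batches needed to pass every image through the model once."""
    return math.ceil(num_images / batch_size)


# 1000 images is a hypothetical data set size.
print(iterations_per_epoch(1000, 16))  # 63
print(iterations_per_epoch(1000, 8))   # 125
```

A smaller batch size needs less memory per step but more steps per epoch.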

135
00:07:56,480 --> 00:07:57,080
Okay.

136
00:07:58,070 --> 00:08:00,590
Then I'm just using the YOLOv10 nano model.

137
00:08:00,590 --> 00:08:03,980
And here I'm just passing the path to the data.yaml file.

138
00:08:03,980 --> 00:08:08,720
In the YAML file I have the path for training and validation images set.
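The settings described above can be collected into one training command. This sketch only builds the command text for the `yolo` command-line tool and does not start training. The weight file path and the data.yaml path are assumptions about the folder layout, and the helper name `yolo_command` is hypothetical.

```python
import shlex


def yolo_command(**options):
    """Build a yolo CLI call from key=value options."""
    return ["yolo"] + [f"{key}={value}" for key, value in options.items()]


train_cmd = yolo_command(
    task="detect",                 # we are doing object detection
    mode="train",                  # the mode is training
    epochs=50,
    batch=16,
    model="weights/yolov10n.pt",   # assumed location of the nano weights
    data="data.yaml",              # assumed path to the data set YAML file
)
print(shlex.join(train_cmd))
# yolo task=detect mode=train epochs=50 batch=16 model=weights/yolov10n.pt data=data.yaml
```

In a Colab notebook the printed line would be run in a cell with a leading exclamation mark.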

139
00:08:09,200 --> 00:08:15,500
Okay, so now here I have trained the model for 50 epochs on the personal protective equipment data

140
00:08:15,500 --> 00:08:15,860
set.

141
00:08:15,860 --> 00:08:19,280
And you can see that these are the precision scores

142
00:08:19,340 --> 00:08:20,600
we have got over here.

143
00:08:20,600 --> 00:08:25,100
And here we have the recall score, and here we have the mean average precision with

144
00:08:25,100 --> 00:08:28,190
IoU at 50%, or 0.50.

145
00:08:28,190 --> 00:08:33,320
And here we have the mean average precision when IoU varies from 0.5 to 0.95.
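IoU (intersection over union) is the overlap measure behind these two mean average precision columns. Here is a minimal sketch of how it can be computed for two boxes, assuming boxes are given as (x1, y1, x2, y2) corner coordinates. The two example boxes are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)       # hypothetical predicted box
ground_truth = (5, 0, 15, 10)    # hypothetical annotated box
score = iou(predicted, ground_truth)
print(round(score, 3))   # 0.333
print(score >= 0.5)      # False: not a match at the 0.5 threshold
```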

146
00:08:33,350 --> 00:08:33,980
Okay.

147
00:08:34,400 --> 00:08:34,790
Uh.

148
00:08:35,890 --> 00:08:37,480
If the precision score is low,

149
00:08:37,480 --> 00:08:42,970
it means that there are many false positives, and if the recall score is low, it means that there

150
00:08:42,970 --> 00:08:44,140
are many false negatives.

151
00:08:44,140 --> 00:08:50,080
But in all the cases the precision score is good, except in the shield case, where the precision score is

152
00:08:50,080 --> 00:08:50,560
a bit lower.

153
00:08:50,560 --> 00:08:53,050
So it means there are not many false positives, okay.
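The link between these scores and the false positives and false negatives can be written out directly. This is a small sketch with made-up counts; the numbers are not taken from the training run in the video.

```python
def precision(tp, fp):
    """Share of predictions that were correct; drops as false positives grow."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of real objects that were found; drops as false negatives grow."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 80 correct detections, 20 wrong ones, 5 missed objects.
print(round(precision(tp=80, fp=20), 3))  # 0.8
print(round(recall(tp=80, fn=5), 3))      # 0.941
```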

154
00:08:53,530 --> 00:08:59,980
So now, after training, we have these result graphs.

155
00:08:59,980 --> 00:09:06,760
So now you can see that this graph shows how many instances of the

156
00:09:06,790 --> 00:09:09,070
dust mask we have in our training set.

157
00:09:09,070 --> 00:09:10,270
So these are not the images.

158
00:09:10,270 --> 00:09:15,130
These are the number of annotations for the dust masks we have in our training data, and for the eye wear.

159
00:09:15,130 --> 00:09:20,140
And you can see for the shield, we have very few annotation instances.

160
00:09:20,320 --> 00:09:24,550
For the shield class, we have very few annotation instances.

161
00:09:24,550 --> 00:09:29,650
So that is why you can see that the score is not good and the precision score is also low as

162
00:09:29,650 --> 00:09:31,780
compared to other classes.

163
00:09:33,100 --> 00:09:33,640
Okay.

164
00:09:33,640 --> 00:09:37,000
And here you can see that we have the precision-confidence curve.

165
00:09:37,000 --> 00:09:42,130
So now you can see the precision score over here at different confidence scores, okay.

166
00:09:42,130 --> 00:09:44,350
So this is the precision score over here.

167
00:09:44,350 --> 00:09:48,790
This is the recall score at different confidence thresholds over here.

168
00:09:48,790 --> 00:09:51,100
And this is the precision-recall curve.

169
00:09:51,100 --> 00:09:53,830
And here we have the F1-confidence curve.

170
00:09:53,830 --> 00:09:58,810
So the F1-confidence curve basically tells us the trade-off between the precision and recall scores.
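The F1 score combines precision and recall into one number using their harmonic mean. This is a short sketch with made-up precision and recall values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values at one confidence threshold.
print(round(f1_score(0.8, 0.6), 3))  # 0.686
```

Because it is a harmonic mean, the F1 score stays low whenever either precision or recall is low.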

171
00:09:58,810 --> 00:10:01,210
And here we have the confusion matrix.

172
00:10:01,330 --> 00:10:03,430
So let us see the normalized one.

173
00:10:03,430 --> 00:10:10,570
So now you can see that the confusion matrix shows us that, in 282

174
00:10:10,570 --> 00:10:16,030
cases when there was a dust mask, the model predicted correctly, while 26 times when there was the

175
00:10:16,030 --> 00:10:21,130
user or the person was wearing a dust mask, the model was unable to predict anything.

176
00:10:21,130 --> 00:10:21,640
Okay.

177
00:10:21,640 --> 00:10:27,970
And for the glove, 551 times when the person was wearing a glove, the model

178
00:10:27,970 --> 00:10:30,310
predicted correctly that the person is wearing a glove.

179
00:10:30,310 --> 00:10:37,090
And one time when the person was wearing a glove, the model had a false positive prediction and

180
00:10:37,090 --> 00:10:38,890
predicted that it was eye wear.

181
00:10:38,890 --> 00:10:42,820
And three times when the person was wearing a glove,

182
00:10:42,820 --> 00:10:47,020
the model predicted a protective helmet; three times when the person was wearing a glove, the model

183
00:10:47,020 --> 00:10:52,810
predicted a safety vest; and one time when the person was wearing a glove, the model predicted a

184
00:10:52,810 --> 00:10:53,080
shield.

185
00:10:53,080 --> 00:10:54,670
So these are all the false positives.

186
00:10:54,670 --> 00:11:00,400
And 72 times when the person was wearing a glove, the model predicted background.

187
00:11:00,400 --> 00:11:04,150
That is, the model was unable to detect anything, so this was a false negative.

188
00:11:04,150 --> 00:11:06,400
Similarly, you can go for other classes as well.

189
00:11:06,400 --> 00:11:08,140
And these are the normalized scores.

190
00:11:08,140 --> 00:11:13,150
Like 92% of the times when the person was wearing a mask, a dust mask, the model predicted correctly,

191
00:11:13,210 --> 00:11:18,850
while 8% of the times when the person was wearing a dust

192
00:11:18,850 --> 00:11:21,490
mask, the model was unable to detect anything.
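The normalized scores follow from the raw counts mentioned earlier: each count is divided by the total for the true class. This sketch uses the dust mask and glove counts read out in the video; the function name `normalize` is hypothetical.

```python
def normalize(counts):
    """Turn the raw counts for one true class into fractions of its total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Counts for cases where the true class was dust mask.
dust_mask = normalize({"dust mask": 282, "background": 26})
# Counts for cases where the true class was glove.
glove = normalize({"glove": 551, "eye wear": 1, "protective helmet": 3,
                   "safety vest": 3, "shield": 1, "background": 72})

print(f"{dust_mask['dust mask']:.0%}")   # 92%
print(f"{dust_mask['background']:.0%}")  # 8%
print(f"{glove['glove']:.0%}")           # 87%
```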

193
00:11:22,770 --> 00:11:28,710
Okay, so 67% of the times when the person was wearing eye wear, the model was able to detect correctly,

194
00:11:28,710 --> 00:11:35,430
while 33% of the times when the person was wearing eye wear, the model was unable to detect anything.

195
00:11:35,430 --> 00:11:38,670
So you can see this chart over here as well.

196
00:11:38,670 --> 00:11:42,540
And over here you can see the mean average precision is continuously improving.

197
00:11:42,540 --> 00:11:44,550
So we have trained the model for 50 epochs.

198
00:11:44,550 --> 00:11:46,830
But you can see that there is a continuous increase.

199
00:11:46,830 --> 00:11:51,870
So if you just train the model for 100 epochs you can see a better mean average precision score.

200
00:11:51,870 --> 00:11:56,190
And you can see the loss is continuously decreasing.

201
00:11:56,190 --> 00:11:59,010
So these are the model predictions on the validation batch.

202
00:11:59,010 --> 00:12:01,470
So we have not used the validation images for training.

203
00:12:01,470 --> 00:12:07,230
So it's always better to have a look and see how our model is performing on the validation batch.

204
00:12:07,530 --> 00:12:07,920
Okay.

205
00:12:07,920 --> 00:12:09,660
And the result looks quite promising.

206
00:12:10,740 --> 00:12:13,560
So after training, I placed the model weights onto my drive.

207
00:12:13,560 --> 00:12:18,210
So I will directly download the model weights from the drive into this Google Colab notebook over here.

208
00:12:18,750 --> 00:12:19,260
Okay.

209
00:12:20,820 --> 00:12:22,470
I have already downloaded the model weights now.

210
00:12:22,470 --> 00:12:25,440
And you can find the weights over here.

211
00:12:28,490 --> 00:12:29,030
And.

212
00:12:33,460 --> 00:12:35,410
Well, now you can see here we have the best model weights.

213
00:12:36,280 --> 00:12:36,640
All right.

214
00:12:36,640 --> 00:12:43,300
I will download a random image for model testing from my drive into this Google Colab notebook.

215
00:12:43,300 --> 00:12:44,530
And let's do the testing.

216
00:12:44,530 --> 00:12:47,740
We are doing object detection, but the mode is not training; it is prediction.

217
00:12:47,740 --> 00:12:53,980
So for all the predictions that have a confidence above 0.25, the bounding boxes will be drawn

218
00:12:53,980 --> 00:12:55,990
around them and we want to save the output.

219
00:12:55,990 --> 00:12:58,750
And we are just using the best model weights we have got from the training.

220
00:12:58,750 --> 00:13:03,400
And then, for the test, we are using this image one, which we have downloaded from the Google

221
00:13:03,430 --> 00:13:03,970
Drive.
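The effect of the 0.25 confidence setting can be illustrated with a simple filter. This sketch assumes each detection is a dictionary with a label and a confidence; the three detections are made up and the function name `filter_by_confidence` is hypothetical.

```python
def filter_by_confidence(detections, threshold=0.25):
    """Keep only the detections whose confidence reaches the threshold."""
    return [d for d in detections if d["confidence"] >= threshold]


# Hypothetical raw predictions for one image.
raw = [
    {"label": "protective helmet", "confidence": 0.91},
    {"label": "glove", "confidence": 0.12},
    {"label": "safety vest", "confidence": 0.40},
]
kept = filter_by_confidence(raw)
print([d["label"] for d in kept])  # ['protective helmet', 'safety vest']
```

Only the kept detections would get a bounding box in the saved output image.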

222
00:13:12,140 --> 00:13:16,220
Now you can see we are able to detect the protective helmets, but the gloves are not detected.

223
00:13:16,370 --> 00:13:19,790
So if we train the model for more epochs, the results might improve.

224
00:13:20,390 --> 00:13:24,500
So now let's test on some other image as well and see what results we are getting over there.

225
00:13:28,370 --> 00:13:31,100
So now you can see over here we are able to detect the safety vest.

226
00:13:31,100 --> 00:13:35,030
Protective boots, safety vest, protective helmets.

227
00:13:35,030 --> 00:13:37,760
So the result over here looks quite promising.

228
00:13:37,760 --> 00:13:42,470
So now we will download some random demo videos from the drive into this Google Colab notebook.

229
00:13:42,470 --> 00:13:46,010
And let's see what results we get on these demo videos.

230
00:13:53,610 --> 00:13:58,200
So now you can see over here I'm doing testing on these demo videos over here.

231
00:13:59,370 --> 00:13:59,550
On.

232
00:13:59,550 --> 00:14:01,170
Let's see how it works.

233
00:14:04,530 --> 00:14:09,630
Now you can see that the complete video is being divided into 310 frames, and we are doing object detection

234
00:14:09,630 --> 00:14:11,070
on each of the frames one by one.

235
00:14:11,070 --> 00:14:14,880
And in the output we will get the combined output video.

236
00:14:15,120 --> 00:14:15,750
Okay.

237
00:14:17,880 --> 00:14:22,080
So meanwhile, the object detection on this input video gets completed.

238
00:14:22,080 --> 00:14:22,770
So it's done.

239
00:14:22,770 --> 00:14:26,670
And I have already displayed the output video over here.

240
00:14:26,670 --> 00:14:28,380
So let me just download this.

241
00:14:30,730 --> 00:14:32,500
Okay, so it's downloading.

242
00:14:32,500 --> 00:14:35,140
Let me navigate my screen towards the output.

243
00:14:35,590 --> 00:14:36,640
Let me show you this.

244
00:14:38,920 --> 00:14:41,980
Well, now you can see over here we are able to detect the safety vest,

245
00:14:42,100 --> 00:14:43,990
protective helmet, protective boots.

246
00:14:43,990 --> 00:14:46,060
And the results look quite promising to me.

247
00:14:46,060 --> 00:14:47,560
And these are the good results.

248
00:14:47,740 --> 00:14:49,480
Okay, so.

249
00:14:51,130 --> 00:14:54,250
In the same way I tested on the other video as well.

250
00:14:54,250 --> 00:14:57,010
And here we have the results over here.

251
00:14:57,010 --> 00:15:00,220
So let me just download this as well.

252
00:15:01,540 --> 00:15:01,750
Okay.

253
00:15:01,750 --> 00:15:03,730
Let me navigate my screen towards it.

254
00:15:04,060 --> 00:15:06,100
So now you can see that this is another video.

255
00:15:06,100 --> 00:15:09,400
And we are able to detect the safety vest and protective boots.

256
00:15:09,400 --> 00:15:10,000
Very good.

257
00:15:10,000 --> 00:15:12,280
The results look very promising to me.

258
00:15:12,550 --> 00:15:14,560
So that's all from this tutorial.

259
00:15:14,590 --> 00:15:15,700
Thank you for watching.

260
00:15:15,700 --> 00:15:16,300
Bye bye.