1
00:00:03,000 --> 00:00:09,000
In this video tutorial, we will learn how we can fine tune the YOLO 11 instance segmentation model

2
00:00:09,000 --> 00:00:11,000
on a custom data set.

3
00:00:11,000 --> 00:00:17,000
We will fine-tune the YOLO 11 instance segmentation model for pothole detection.

4
00:00:17,000 --> 00:00:19,000
And here is a quick demo.

5
00:00:19,000 --> 00:00:22,000
So you can see that we are able to detect potholes in this video.

6
00:00:22,000 --> 00:00:25,000
And you can see that our results look quite good.

7
00:00:25,000 --> 00:00:33,000
So we will be fine tuning the YOLO 11 instance segmentation model on potholes dataset.

8
00:00:33,000 --> 00:00:36,000
And we will be detecting the potholes in images and videos.

9
00:00:36,000 --> 00:00:40,000
And you can also detect the potholes on the live feed as well.

10
00:00:40,000 --> 00:00:47,000
And, uh, we will be training, or fine-tuning, the YOLO 11 segmentation model on a potholes data set,

11
00:00:47,000 --> 00:00:49,000
which is available on Kaggle.

12
00:00:49,000 --> 00:00:51,000
So let me first show you the data set.

13
00:00:51,000 --> 00:00:54,000
And so here you can see we are at Kaggle.

14
00:00:54,000 --> 00:00:59,000
So here you can see we have the pothole image data set available over here okay.

15
00:00:59,000 --> 00:01:03,000
So you can see we have around 600 images over here.

16
00:01:03,000 --> 00:01:05,000
And you can see we have different potholes.

17
00:01:05,000 --> 00:01:07,000
And all these images are annotated as well.

18
00:01:07,000 --> 00:01:12,000
So we have the annotation results in a .yaml file over here.

19
00:01:13,000 --> 00:01:14,000
Okay.

20
00:01:14,000 --> 00:01:17,000
So you can see over here um that here we have the data set.

21
00:01:17,000 --> 00:01:20,000
So uh we just first need to download this data set.

22
00:01:20,000 --> 00:01:24,000
So you just need to sign in if you already have an account on Kaggle.

23
00:01:24,000 --> 00:01:29,000
But if you don't have an account on Kaggle you can, uh, create your own account on Kaggle.

24
00:01:29,000 --> 00:01:30,000
It's very simple.

25
00:01:30,000 --> 00:01:35,000
So you can see I've just, uh, logged in with my own account on Kaggle and you can just click on download

26
00:01:35,000 --> 00:01:38,000
over here and you will be able to download this data set.

27
00:01:38,000 --> 00:01:40,000
So it will be downloaded in zip format.

28
00:01:40,000 --> 00:01:41,000
Okay.

29
00:01:41,000 --> 00:01:43,000
So I've already downloaded this data set.

30
00:01:43,000 --> 00:01:46,000
So I will not download it again.

31
00:01:46,000 --> 00:01:48,000
After this I will just go to Roboflow.

32
00:01:50,000 --> 00:01:56,000
Uh, we are using Roboflow so that we can convert our data set, um, into a format so that we can,

33
00:01:56,000 --> 00:01:59,000
uh, train our YOLO 11 instance segmentation model.

34
00:02:00,000 --> 00:02:00,000
Okay.

35
00:02:05,000 --> 00:02:06,000
So I just need to sign in.

36
00:02:06,000 --> 00:02:12,000
But if you haven't created your account, you can just create an account on Roboflow as well.

37
00:02:15,000 --> 00:02:18,000
So this will take a few seconds and I will sign in with Google.

38
00:02:18,000 --> 00:02:23,000
But if you haven't created an account, you can just create an account with your GitHub account, with your

39
00:02:23,000 --> 00:02:26,000
work email, or use whatever you want.

40
00:02:27,000 --> 00:02:31,000
So there are multiple ways through which you can create an account on Roboflow.

41
00:02:32,000 --> 00:02:37,000
Okay, so here you can see all your projects and you can see I've just created this project six days

42
00:02:37,000 --> 00:02:37,000
ago as well.

43
00:02:37,000 --> 00:02:39,000
But I will not be using this.

44
00:02:39,000 --> 00:02:45,000
I will just create a new project so I can show you how you can, uh, create, convert your data set

45
00:02:45,000 --> 00:02:49,000
into a format so that you can fine tune the YOLO 11 instance segmentation model.

46
00:02:49,000 --> 00:02:58,000
So, potholes detection and segmentation.

47
00:02:58,000 --> 00:03:03,000
So this is the name of my project.

48
00:03:03,000 --> 00:03:05,000
Pothole detection segmentation.

49
00:03:05,000 --> 00:03:07,000
And we are detecting the pothole.

50
00:03:07,000 --> 00:03:10,000
And you can see we have the instance segmentation.

51
00:03:10,000 --> 00:03:11,000
Okay.

52
00:03:11,000 --> 00:03:14,000
So detect multiple objects and their actual shape.

53
00:03:14,000 --> 00:03:16,000
Best for measurements and odd shapes.

54
00:03:16,000 --> 00:03:17,000
Okay.

55
00:03:17,000 --> 00:03:18,000
So let's get started.

56
00:03:18,000 --> 00:03:20,000
Create a public project from here.

57
00:03:22,000 --> 00:03:23,000
Okay.

58
00:03:23,000 --> 00:03:27,000
So now you can see we have instance segmentation project not object detection.

59
00:03:27,000 --> 00:03:29,000
Then we need to upload data.

60
00:03:29,000 --> 00:03:31,000
So I will just click on select folder from here.

61
00:03:31,000 --> 00:03:32,000
Okay.

62
00:03:32,000 --> 00:03:35,000
So um you can see over here I have this folder.

63
00:03:35,000 --> 00:03:38,000
And I will just upload this complete folder.

64
00:03:38,000 --> 00:03:41,000
Because inside this folder I have the train validation folders.

65
00:03:41,000 --> 00:03:44,000
So I will just upload this complete folder over here.

66
00:03:44,000 --> 00:03:48,000
And you can see we have around 1564 files.

67
00:03:48,000 --> 00:03:54,000
So we have around 750 images, um, 760 images; around half of the files in this folder

68
00:03:55,000 --> 00:03:57,000
are images, and we have the annotations.

69
00:03:57,000 --> 00:03:59,000
So basically our data set is annotated.

70
00:03:59,000 --> 00:04:02,000
So half of our files are annotations.

71
00:04:02,000 --> 00:04:04,000
And half of the files are basically images.

72
00:04:04,000 --> 00:04:05,000
Okay.

73
00:04:05,000 --> 00:04:08,000
So you can just upload this complete data set over here.

74
00:04:08,000 --> 00:04:10,000
And this will, uh, not take very much time.

75
00:04:10,000 --> 00:04:11,000
Like okay.

76
00:04:11,000 --> 00:04:15,000
So there is a video as well uh in this folder.

77
00:04:15,000 --> 00:04:19,000
So we need to extract one frame per second okay.

78
00:04:19,000 --> 00:04:20,000
That's fine.

79
00:04:25,000 --> 00:04:29,000
So all this data set will be uploaded over here.

80
00:04:29,000 --> 00:04:30,000
So this will take some time.

81
00:04:30,000 --> 00:04:35,000
And let's wait and see as this data set gets uploaded over here.

82
00:04:51,000 --> 00:04:53,000
So the files are uploading.

83
00:04:53,000 --> 00:04:56,000
You can see the files are being processed over here.

84
00:04:57,000 --> 00:04:57,000
Okay.

85
00:04:57,000 --> 00:04:59,000
So let's see.

86
00:05:00,000 --> 00:05:02,000
Hopefully this will finish very soon.

87
00:05:04,000 --> 00:05:05,000
Okay.

88
00:05:05,000 --> 00:05:11,000
So now over here you can see that, uh, we will be using 70% of our data for the training, 20% of

89
00:05:11,000 --> 00:05:14,000
the data for the validation purpose, and 10% of the data for the test purpose.

90
00:05:15,000 --> 00:05:15,000
Okay.

91
00:05:15,000 --> 00:05:24,000
So we have around 796 images, and out of 796 images, 780 images are annotated and 16 images are not

92
00:05:24,000 --> 00:05:25,000
annotated.

93
00:05:25,000 --> 00:05:29,000
So I will not try to annotate them; we have only one class, as we are detecting potholes.

94
00:05:30,000 --> 00:05:32,000
So 780 images are enough.

95
00:05:32,000 --> 00:05:38,000
So I'm now uploading these files to the project which I have created, like potholes detection.

96
00:05:38,000 --> 00:05:41,000
So this will take few more seconds over here.

97
00:05:45,000 --> 00:05:49,000
Okay, so let's see how it goes.

98
00:05:50,000 --> 00:05:54,000
So now you can see that the files are being uploaded over here.

99
00:05:59,000 --> 00:06:05,000
Okay, so now you can see in my data set I have 780 images.

100
00:06:05,000 --> 00:06:08,000
Um, I don't want to rotate my images.

101
00:06:09,000 --> 00:06:10,000
Just close this.

102
00:06:11,000 --> 00:06:18,000
Okay, so now you can see over here our images are being generated and you can see over here we have

103
00:06:18,000 --> 00:06:19,000
detected pothole.

104
00:06:19,000 --> 00:06:20,000
Pothole.

105
00:06:20,000 --> 00:06:20,000
Okay.

106
00:06:23,000 --> 00:06:24,000
We have detected pothole.

107
00:06:24,000 --> 00:06:24,000
Pothole.

108
00:06:24,000 --> 00:06:28,000
And you can see that the name of the class is very large.

109
00:06:28,000 --> 00:06:30,000
So we can just update this as well.

110
00:06:30,000 --> 00:06:35,000
So you can see that all the images over here are updated.

111
00:06:35,000 --> 00:06:35,000
Okay.

112
00:06:36,000 --> 00:06:41,000
And we can just check this so you can see that the annotations are quite good.

113
00:06:41,000 --> 00:06:42,000
Like you can see over here.

114
00:06:42,000 --> 00:06:45,000
All the images are annotated.

115
00:06:45,000 --> 00:06:46,000
Okay.

116
00:06:46,000 --> 00:06:49,000
So next thing we can just go to generate over here.

117
00:06:49,000 --> 00:06:50,000
Okay.

118
00:06:50,000 --> 00:06:55,000
So we have one class and we have 780 images.

119
00:06:55,000 --> 00:06:56,000
Okay.

120
00:06:56,000 --> 00:07:03,000
So we are taking 70% of the images like 544 images for the training, 156 images for the validation

121
00:07:03,000 --> 00:07:06,000
and 10% images like 80 images for the test purpose.
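As a quick check of the split quoted here, the snippet below recomputes the share of each subset from the image counts mentioned in the video. It is only a sketch: the dictionary keys are illustrative names, not anything Roboflow exposes.

```python
# Split sizes quoted in the tutorial (train / validation / test).
split_counts = {"train": 544, "valid": 156, "test": 80}

# Total number of annotated images across all three splits.
total = sum(split_counts.values())

# Share of the data set that lands in each split, as a percentage.
split_percent = {
    name: round(100 * count / total, 1) for name, count in split_counts.items()
}

print(total)
print(split_percent)
```

The shares come out close to, but not exactly at, 70/20/10, which is expected when a percentage split is applied to a whole number of images.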

122
00:07:06,000 --> 00:07:06,000
Okay.

123
00:07:06,000 --> 00:07:08,000
So now we will do some pre-processing.

124
00:07:08,000 --> 00:07:12,000
We will resize each image to 640x640.

125
00:07:12,000 --> 00:07:13,000
Okay.

126
00:07:15,000 --> 00:07:20,000
So one thing we can say is we can add further pre-processing step as well.

127
00:07:21,000 --> 00:07:22,000
Or we can just modify the class name.

128
00:07:23,000 --> 00:07:26,000
Like we can just update this class name to pothole, okay.

129
00:07:28,000 --> 00:07:29,000
Okay.

130
00:07:29,000 --> 00:07:31,000
There are other pre-processing steps as well.

131
00:07:31,000 --> 00:07:36,000
So if you don't have enough data for the training, uh, you can just increase the size of your training

132
00:07:36,000 --> 00:07:40,000
data by converting your images to grayscale.

133
00:07:40,000 --> 00:07:41,000
Okay.

134
00:07:41,000 --> 00:07:48,000
And, uh, like following other pre-processing steps, like crop some images, uh, or just isolate

135
00:07:48,000 --> 00:07:48,000
objects.

136
00:07:48,000 --> 00:07:53,000
So there are multiple pre-processing steps that you can follow if you don't have enough data for the

137
00:07:53,000 --> 00:07:54,000
training purpose.

138
00:07:54,000 --> 00:07:55,000
Okay.

139
00:07:57,000 --> 00:08:04,000
And we are resizing our images to size 640x640, because our YOLO 11 model has been trained on

140
00:08:04,000 --> 00:08:06,000
an image size of 640 cross 640.

141
00:08:06,000 --> 00:08:08,000
So it's always better that,

142
00:08:08,000 --> 00:08:09,000
when you fine-tune your model,

143
00:08:09,000 --> 00:08:12,000
you just, uh, have the images at size 640 by

144
00:08:13,000 --> 00:08:19,000
640, because if you increase the size or decrease the size, um, that might affect your results.

145
00:08:19,000 --> 00:08:24,000
Okay, so and if you don't have enough training data, then you can increase the size of the training

146
00:08:24,000 --> 00:08:29,000
data by, uh, rotating

147
00:08:29,000 --> 00:08:36,000
these images, or cropping these images and rescaling these images, or, uh, increasing the brightness

148
00:08:36,000 --> 00:08:36,000
of these images.

149
00:08:36,000 --> 00:08:40,000
So there are multiple augmentation steps over here that you can follow.

150
00:08:41,000 --> 00:08:41,000
Okay.

151
00:08:43,000 --> 00:08:49,000
Now I will just, uh, generate the data set over here, and then I will export this data set from Roboflow

152
00:08:49,000 --> 00:08:51,000
into my Google Colab notebook.

153
00:08:54,000 --> 00:08:58,000
So now you can see over here my data set is ready.

154
00:08:58,000 --> 00:09:00,000
So now I will just download this data set.

155
00:09:00,000 --> 00:09:05,000
So I just want to download this data set in YOLO 11 format, okay.

156
00:09:05,000 --> 00:09:09,000
And, um, it will show me the download code over here.

157
00:09:12,000 --> 00:09:16,000
So I will just copy this code from here okay.

158
00:09:17,000 --> 00:09:21,000
And I will just go over here okay.

159
00:09:21,000 --> 00:09:25,000
So this is a Google Colab notebook which I have created for this project.

160
00:09:25,000 --> 00:09:26,000
And I have written all the code.

161
00:09:26,000 --> 00:09:31,000
I will just show you a step by step guide how you can fine tune the YOLO 11 instance segmentation model

162
00:09:31,000 --> 00:09:33,000
for pothole detection.

163
00:09:33,000 --> 00:09:39,000
And you can just remove this and you can just add this, uh, code over here.

164
00:09:39,000 --> 00:09:45,000
So now if you run this cell, you will be able to download the data set from Roboflow into your Google Colab

165
00:09:45,000 --> 00:09:45,000
notebook.

166
00:09:45,000 --> 00:09:52,000
Okay, so first of all, before running the script, please make sure that you have selected the runtime

167
00:09:52,000 --> 00:09:53,000
as T4 GPU.

168
00:09:53,000 --> 00:09:56,000
And then, um, we will just run these cells.

169
00:09:56,000 --> 00:09:57,000
Okay.

170
00:09:57,000 --> 00:10:02,000
So first we will install the Ultralytics package and OpenCV Python package.

171
00:10:02,000 --> 00:10:09,000
Uh so we are using uh YOLO 11 model and YOLO 11 model is available under Ultralytics package okay.

172
00:10:09,000 --> 00:10:12,000
So therefore we are installing the Ultralytics package.

173
00:10:12,000 --> 00:10:18,000
So you can just simply run the YOLO 11 model by installing the Ultralytics package.

174
00:10:18,000 --> 00:10:23,000
You don't need to clone the GitHub repository, although you can clone the GitHub repository and you

175
00:10:23,000 --> 00:10:25,000
can just follow this lengthy procedure as well.

176
00:10:25,000 --> 00:10:27,000
But this is a very simple procedure.

177
00:10:27,000 --> 00:10:34,000
Like you just install the Ultralytics package and simply, uh, run the YOLO 11 model from the command line

178
00:10:34,000 --> 00:10:36,000
interface or using Python script.

179
00:10:37,000 --> 00:10:37,000
Okay.

180
00:10:39,000 --> 00:10:42,000
So now from Ultralytics we will import YOLO.

181
00:10:42,000 --> 00:10:48,000
And if we import YOLO, then we can use the YOLO 11 model, or any other model

182
00:10:48,000 --> 00:10:50,000
as well of YOLO series.

183
00:10:50,000 --> 00:10:51,000
Okay.

184
00:10:51,000 --> 00:10:54,000
And then we have from IPython.display import Image.

185
00:10:54,000 --> 00:10:59,000
If you want to display an image in a Google Colab notebook, we need the Image class.

186
00:10:59,000 --> 00:11:04,000
Okay, so now we are downloading the data set from Roboflow into the Google Colab notebook.

187
00:11:04,000 --> 00:11:07,000
Okay, so here you can see this is the API key.

188
00:11:07,000 --> 00:11:11,000
You can just create your account on Roboflow and get your own API key as well.

189
00:11:11,000 --> 00:11:13,000
Please don't use my API key.

190
00:11:13,000 --> 00:11:15,000
Try to get your own API key as well.

191
00:11:19,000 --> 00:11:25,000
Okay, so now you can see the data set is downloaded and you can see we have provided our data set over

192
00:11:25,000 --> 00:11:26,000
here.

193
00:11:26,000 --> 00:11:27,000
Okay.

194
00:11:27,000 --> 00:11:30,000
So now over here you can see we have our data set.

195
00:11:30,000 --> 00:11:34,000
And if you just open the data.yaml file over here.

196
00:11:38,000 --> 00:11:39,000
It's taking so much time.

197
00:11:49,000 --> 00:11:51,000
That means let's save this first.

198
00:11:56,000 --> 00:11:59,000
And this shouldn't take so much time.

199
00:12:04,000 --> 00:12:04,000
Let's do it.

200
00:12:06,000 --> 00:12:09,000
Okay, so here you can see we have the .yaml file over here.

201
00:12:09,000 --> 00:12:10,000
So you can.

202
00:12:10,000 --> 00:12:15,000
First thing you can do is you can just update the path of the train images.

203
00:12:18,000 --> 00:12:20,000
Validation images.

204
00:12:26,000 --> 00:12:32,000
And the test images. If you just skip the test path,

205
00:12:32,000 --> 00:12:36,000
This works as well because during training we will be using the training images.

206
00:12:36,000 --> 00:12:36,000
And.

207
00:12:36,000 --> 00:12:40,000
For the validation we will be using the validation data set images okay.

208
00:12:40,000 --> 00:12:47,000
Okay, so please make sure to update the data.yaml file paths for the training, the validation, and

209
00:12:47,000 --> 00:12:48,000
for the test as well.
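To make the path update concrete, here is a minimal sketch that builds the text of a data.yaml with absolute paths for the three splits. The folder layout (train/valid/test, each with an images subfolder) and the example location /content/potholes are assumptions for illustration; check them against the folder that was actually downloaded.

```python
from pathlib import PurePosixPath


def build_data_yaml(dataset_dir: str, class_names: list[str]) -> str:
    """Return data.yaml text pointing at the train/valid/test image folders.

    The subfolder names are an assumed layout, not something the video confirms.
    """
    root = PurePosixPath(dataset_dir)
    lines = [
        f"train: {root / 'train' / 'images'}",
        f"val: {root / 'valid' / 'images'}",
        f"test: {root / 'test' / 'images'}",
        f"nc: {len(class_names)}",      # number of classes
        f"names: {class_names}",        # class names, here a single class
    ]
    return "\n".join(lines) + "\n"


# Hypothetical dataset location inside a Colab session.
yaml_text = build_data_yaml("/content/potholes", ["pothole"])
print(yaml_text)
```

The returned text can then be written over the existing data.yaml, or the same three paths can simply be edited by hand as shown in the video.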

210
00:12:48,000 --> 00:12:49,000
Okay.

211
00:12:49,000 --> 00:12:52,000
And you can see we have only one class which is pothole.

212
00:12:52,000 --> 00:12:52,000
Okay.

213
00:12:52,000 --> 00:12:54,000
So everything looks perfect.

214
00:12:54,000 --> 00:12:54,000
Okay.

215
00:12:54,000 --> 00:12:59,000
So now over here we have the dataset location over here as well.

216
00:12:59,000 --> 00:13:00,000
Okay.

217
00:13:00,000 --> 00:13:04,000
And one thing we can do over here is we can just copy this from here.

218
00:13:04,000 --> 00:13:07,000
And we can just add this path over here.

219
00:13:08,000 --> 00:13:08,000
Okay.

220
00:13:08,000 --> 00:13:16,000
So you can download the YOLO 11 segmentation model from, uh, the YOLO 11 GitHub repository, or it will be directly

221
00:13:16,000 --> 00:13:17,000
downloaded over here.

222
00:13:17,000 --> 00:13:21,000
So you write model = YOLO("yolo11s-seg.pt").

223
00:13:21,000 --> 00:13:24,000
And this will download the YOLO 11 segmentation model.

224
00:13:24,000 --> 00:13:30,000
YOLO 11 comes with five different models YOLO 11 nano, YOLO 11 small, YOLO 11 medium, YOLO 11 large

225
00:13:30,000 --> 00:13:31,000
and YOLO 11 extra large.

226
00:13:31,000 --> 00:13:33,000
So we will be using YOLO 11 small model.

227
00:13:33,000 --> 00:13:38,000
If you want to get more accurate results, you can use the higher YOLO models.

228
00:13:38,000 --> 00:13:44,000
But, um, so what is the difference between YOLO 11 small models and YOLO 11 large

229
00:13:44,000 --> 00:13:45,000
models?

230
00:13:45,000 --> 00:13:46,000
So the YOLO

231
00:13:46,000 --> 00:13:49,000
11 small models are faster, but they are less accurate.

232
00:13:49,000 --> 00:13:56,000
But YOLO 11 large models are more accurate, but they are, uh, not very fast;

233
00:13:56,000 --> 00:14:02,000
the inference speed is slow for the YOLO 11 large models.

234
00:14:02,000 --> 00:14:02,000
Okay.

235
00:14:02,000 --> 00:14:05,000
So basically we are trying to perform instance segmentation.

236
00:14:05,000 --> 00:14:11,000
And first we will train the YOLO 11, uh, segmentation model for pothole detection.

237
00:14:11,000 --> 00:14:17,000
And here you can see we have the data.yaml file path over here. In the data.yaml file,

238
00:14:17,000 --> 00:14:21,000
we have the train, validation, and test image paths over here, okay.

239
00:14:22,000 --> 00:14:27,000
And we are trying to fine tune the YOLO 11 instance segmentation model for 50 epochs which you can find

240
00:14:27,000 --> 00:14:28,000
over here.

241
00:14:28,000 --> 00:14:29,000
Okay.

242
00:14:29,000 --> 00:14:32,000
So uh, we are not changing any of the parameters.

243
00:14:32,000 --> 00:14:33,000
Like you can see over here.

244
00:14:33,000 --> 00:14:35,000
Our patience is being set to 100.

245
00:14:35,000 --> 00:14:39,000
And, uh, our batch size is being set to 16.

246
00:14:39,000 --> 00:14:41,000
Okay, so what does batch size mean?

247
00:14:41,000 --> 00:14:46,000
So we will be passing our data to the model in batches of 16 images, okay.

248
00:14:46,000 --> 00:14:49,000
I will later show you what this means as well.
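The training call described here could look roughly like the sketch below. The settings are the ones mentioned in the video (50 epochs, patience 100, batch size 16, image size 640); the function name fine_tune and the way the data.yaml path is passed in are illustrative choices, not the notebook's actual code.

```python
# Training settings as stated in the video; only the epoch count differs
# from the defaults the speaker says are left unchanged.
TRAIN_ARGS = {
    "epochs": 50,
    "imgsz": 640,
    "batch": 16,      # images per batch
    "patience": 100,  # epochs without improvement before early stopping
}


def fine_tune(data_yaml: str, weights: str = "yolo11s-seg.pt"):
    """Fine-tune a YOLO 11 segmentation checkpoint on a custom data set."""
    # Imported lazily so the settings above can be inspected
    # even where the ultralytics package is not installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

Calling fine_tune with the path to the updated data.yaml starts training; on a Colab GPU runtime this can take a while, which is why the video reuses saved weights.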

249
00:14:49,000 --> 00:14:50,000
Okay.

250
00:14:52,000 --> 00:14:59,000
So here you can see I've already fine tuned the YOLO 11 instance segmentation model for pothole detection.

251
00:14:59,000 --> 00:15:02,000
And you can see I have just fine tuned it for 50 epochs.

252
00:15:02,000 --> 00:15:06,000
I will not run the training again because this takes a lot of time as well.

253
00:15:06,000 --> 00:15:14,000
And you can see over here we are getting a mean average precision at IoU 0.5, or 50%, of 0.908, or 90.8%.

254
00:15:14,000 --> 00:15:22,000
And the mean average precision with IoU varying from 0.5 to 0.95 is 0.623, like 62.3%.
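For readers who want to see what the IoU thresholds refer to, the sketch below computes the intersection over union of two boxes and lists the ten thresholds from 0.5 to 0.95 in steps of 0.05 over which the second metric is averaged. The two boxes are made-up values, and the helper box_iou is a plain-Python illustration rather than the Ultralytics implementation.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds 0.50, 0.55, ..., 0.95.
iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Hypothetical ground-truth box and predicted box.
ground_truth = (0, 0, 10, 10)
prediction = (2, 0, 12, 10)

iou = box_iou(ground_truth, prediction)

# Thresholds at which this prediction would still count as a match.
passed = [t for t in iou_thresholds if iou >= t]
print(iou, passed)
```

A prediction like this one counts as correct at the looser thresholds but not at the stricter ones, which is why the averaged score is lower than the score at IoU 0.5 alone.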

255
00:15:22,000 --> 00:15:25,000
And these are the bounding box prediction results.

256
00:15:25,000 --> 00:15:28,000
And here we have the mask prediction results over here.

257
00:15:28,000 --> 00:15:31,000
And they are quite good as well okay.

258
00:15:32,000 --> 00:15:33,000
Okay.

259
00:15:33,000 --> 00:15:37,000
So over here if you just see over here our batch size was set to 16.

260
00:15:37,000 --> 00:15:38,000
As I told you.

261
00:15:39,000 --> 00:15:42,000
Okay, so, um, if I just show you one thing.

262
00:15:42,000 --> 00:15:43,000
Um.

263
00:15:43,000 --> 00:15:47,000
So I told you that we will be passing our data in batches of 16.

264
00:15:47,000 --> 00:15:47,000
Okay.

265
00:15:47,000 --> 00:15:49,000
So over here.

266
00:15:53,000 --> 00:15:59,000
So as we will be passing our data in batches of 16, uh, here you can see over here, uh, and our default

267
00:15:59,000 --> 00:16:01,000
image size is 640x640.

268
00:16:01,000 --> 00:16:02,000
We are not changing this.

269
00:16:02,000 --> 00:16:08,000
So if you just see over here, we have 543 images in our training data.

270
00:16:08,000 --> 00:16:11,000
And you can see 34 out of 34.

271
00:16:11,000 --> 00:16:12,000
So what does this mean.

272
00:16:12,000 --> 00:16:16,000
So if I just open my calculator over here okay.

273
00:16:16,000 --> 00:16:18,000
So we will be passing our data in batches of 16 images.

274
00:16:18,000 --> 00:16:22,000
So if I just divide 543 by 16.

275
00:16:24,000 --> 00:16:27,000
So we have 33.93, okay.

276
00:16:27,000 --> 00:16:32,000
So in each epoch we will be passing 34 batches, okay.

277
00:16:32,000 --> 00:16:39,000
So our complete data will be passed in 34 batches, and in each batch we will be passing up to 16 images.

278
00:16:39,000 --> 00:16:42,000
Okay, so this is what it means over here.
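The arithmetic behind the 34/34 counter can be reproduced as below. In Ultralytics the batch setting is the number of images per batch, so 543 training images with a batch size of 16 give 34 batches per epoch, the last one being smaller. The variable names are illustrative.

```python
import math

num_train_images = 543  # training images reported in the log
batch_size = 16         # images per batch

# Number of batches needed to see every image once (one epoch).
batches_per_epoch = math.ceil(num_train_images / batch_size)

# The final batch holds whatever is left over after the full batches.
last_batch_size = num_train_images - (batches_per_epoch - 1) * batch_size

print(batches_per_epoch, last_batch_size)
```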

279
00:16:42,000 --> 00:16:43,000
Okay.

280
00:16:43,000 --> 00:16:46,000
And, um, our results are quite good.

281
00:16:46,000 --> 00:16:52,000
Like, uh, for the bounding box predictions, we are getting a mean average precision with IOU threshold

282
00:16:52,000 --> 00:16:56,000
as 0.5 or 50% as 90.8%.

283
00:16:56,000 --> 00:17:02,000
And for the mask prediction, we are getting a mean average precision with IoU threshold 0.5 or 50%

284
00:17:02,000 --> 00:17:04,000
of 0.91, or 91%.

285
00:17:04,000 --> 00:17:05,000
Good.

286
00:17:05,000 --> 00:17:10,000
And after I have, as I told you, that I have already trained the YOLO 11 instance segmentation model

287
00:17:10,000 --> 00:17:13,000
on the potholes, uh, data set.

288
00:17:13,000 --> 00:17:15,000
So I've just saved the model weights on my drive.

289
00:17:15,000 --> 00:17:21,000
So I will directly download the weights from my drive into this Google Colab notebook over here okay.

290
00:17:26,000 --> 00:17:26,000
Okay.

291
00:17:26,000 --> 00:17:31,000
So over here now you can see that, um, these are some results over here.

292
00:17:31,000 --> 00:17:39,000
So now you can see over here, as we increase the confidence value...

293
00:17:39,000 --> 00:17:42,000
Now you can see over here on the x axis we have the confidence value.

294
00:17:42,000 --> 00:17:44,000
And on the y axis we have the precision score.

295
00:17:44,000 --> 00:17:50,000
So you can see as the confidence increases um our precision value increases.

296
00:17:51,000 --> 00:17:51,000
Okay.

297
00:17:51,000 --> 00:17:54,000
So what does precision mean.

298
00:17:54,000 --> 00:18:02,000
Precision is basically the number of true positives divided by all the positive predictions.

299
00:18:03,000 --> 00:18:07,000
So if you just search for precision in computer vision.

300
00:18:07,000 --> 00:18:09,000
So you can see over here.

301
00:18:17,000 --> 00:18:20,000
Okay, so precision measures

302
00:18:20,000 --> 00:18:25,000
how well you find true positives out of all positive predictions, okay.

303
00:18:26,000 --> 00:18:27,000
Okay.

304
00:18:28,000 --> 00:18:31,000
So here you can see we have the general formula.

305
00:18:31,000 --> 00:18:32,000
Uh okay.

306
00:18:32,000 --> 00:18:32,000
Over here.

307
00:18:32,000 --> 00:18:37,000
So you can see that, uh, true positives and false positives.

308
00:18:40,000 --> 00:18:42,000
[inaudible]

309
00:18:42,000 --> 00:18:43,000
Okay.

310
00:18:43,000 --> 00:18:49,000
So here you can see that, uh, it is correct predictions divided by total predictions.

311
00:18:49,000 --> 00:18:49,000
Okay.

312
00:18:49,000 --> 00:18:53,000
So this is precision correct predictions divided by total predictions okay.

313
00:18:53,000 --> 00:19:00,000
So over here, for example, uh, you have ten ground truths and you made five predictions.

314
00:19:00,000 --> 00:19:02,000
And out of those five predictions, four are

315
00:19:02,000 --> 00:19:03,000
true positives.

316
00:19:03,000 --> 00:19:06,000
And there is one false positive, okay.

317
00:19:06,000 --> 00:19:12,000
And there are six false negatives like we are unable to detect uh, ground truth six times okay.

318
00:19:12,000 --> 00:19:14,000
So like precision will be 80%.

319
00:19:14,000 --> 00:19:17,000
And uh recall we will discuss later okay.

320
00:19:17,000 --> 00:19:23,000
So basically we uh check how many correct predictions are made out of total predictions.

321
00:19:23,000 --> 00:19:23,000
Okay.

322
00:19:23,000 --> 00:19:29,000
So as you can see that as we increase the confidence score okay.

323
00:19:29,000 --> 00:19:34,000
So when we increase the confidence score, our predictions might decrease.

324
00:19:34,000 --> 00:19:34,000
Okay.

325
00:19:34,000 --> 00:19:40,000
But when we increase the confidence score, our model makes more correct predictions.

326
00:19:40,000 --> 00:19:47,000
If we can say that if our detection has a confidence score of 0.90, this means that the model is 90%

327
00:19:47,000 --> 00:19:52,000
confident, uh, that, uh, this is a box or this is a pothole.

328
00:19:52,000 --> 00:19:58,000
Okay, so as we increase the confidence threshold, our precision score will generally increase.

329
00:19:58,000 --> 00:19:59,000
Okay.

330
00:19:59,000 --> 00:20:04,000
Because when we increase the confidence score, we will be getting, uh, more correct predictions,

331
00:20:04,000 --> 00:20:07,000
and, uh, there will be less false positives.

332
00:20:07,000 --> 00:20:08,000
Okay.

333
00:20:08,000 --> 00:20:15,000
So over here and we can see that when, uh, we have a confidence score of 0.946, then we have a precision

334
00:20:15,000 --> 00:20:17,000
score of 1.00.

335
00:20:17,000 --> 00:20:18,000
Okay.

336
00:20:18,000 --> 00:20:24,000
And over here you can see that as we increase the confidence score, our recall value begins to decrease.

337
00:20:24,000 --> 00:20:25,000
Okay.

338
00:20:25,000 --> 00:20:29,000
And, uh, and you can see that, uh.

339
00:20:30,000 --> 00:20:32,000
And when we have a confidence score like 0.0,

340
00:20:32,000 --> 00:20:35,000
Then we have a maximum recall.

341
00:20:35,000 --> 00:20:36,000
So if you just see over here.

342
00:20:36,000 --> 00:20:40,000
Recall is basically correct predictions out of total ground truth.

343
00:20:40,000 --> 00:20:47,000
So you can see over here if we have ten ground truth uh, images of dogs or ten ground truth of dogs,

344
00:20:47,000 --> 00:20:53,000
and we make ten predictions out of this, we have eight true positives and two false positives and two

345
00:20:54,000 --> 00:20:54,000
false negatives.

346
00:20:54,000 --> 00:20:57,000
So we have like, uh, 80% of recall.

347
00:20:57,000 --> 00:21:03,000
And you can see that if you make five predictions, like, we have ten ground truths for the

348
00:21:03,000 --> 00:21:05,000
dogs and we make five predictions.

349
00:21:05,000 --> 00:21:11,000
And out of five predictions we have, uh, four true positives and we have six false negatives.

350
00:21:11,000 --> 00:21:15,000
And in that case we might have a higher precision, but the recall is 40%.
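The two worked examples from the video can be checked with a few lines of Python. This is a minimal sketch with illustrative function names; it only restates the standard definitions, precision as true positives over all predictions and recall as true positives over all ground truths.

```python
def precision(tp: int, fp: int) -> float:
    """Correct predictions divided by all predictions made."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Correct predictions divided by all ground-truth objects."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Ten ground truths, five predictions: 4 true positives, 1 false positive,
# 6 ground truths missed.
p_few = precision(tp=4, fp=1)
r_few = recall(tp=4, fn=6)

# Ten ground truths, ten predictions: 8 true positives, 2 false positives,
# 2 ground truths missed.
p_many = precision(tp=8, fp=2)
r_many = recall(tp=8, fn=2)

print(p_few, r_few, p_many, r_many)
```

Both cases have the same precision, but making fewer predictions leaves more ground truths undetected, so recall drops from 80% to 40%.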

351
00:21:16,000 --> 00:21:16,000
Okay.

352
00:21:16,000 --> 00:21:24,000
So when we increase the confidence score like you can see, you know that our predictions will definitely

353
00:21:24,000 --> 00:21:25,000
decrease.

354
00:21:25,000 --> 00:21:25,000
Okay.

355
00:21:25,000 --> 00:21:29,000
So When we increase the confidence score.

356
00:21:29,000 --> 00:21:31,000
Our predictions will decrease.

357
00:21:31,000 --> 00:21:33,000
And you can see that, uh.

358
00:21:33,000 --> 00:21:36,000
Uh, we will not be able to detect all the ground truths.

359
00:21:36,000 --> 00:21:39,000
So therefore the recall value will decrease.

360
00:21:39,000 --> 00:21:48,000
So when we have, uh, a confidence score of 0.00, then we have, um, maximum recall at confidence

361
00:21:48,000 --> 00:21:52,000
score 0.00, which is 0.96.

362
00:21:52,000 --> 00:21:56,000
So we have a maximum value of recall of 0.96 at confidence 0.00.
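The trade-off between the two curves can be illustrated with a small threshold sweep. The detections below are made-up values (a confidence score plus a flag saying whether the detection matches a ground truth), not results from the pothole model, and the helper name is illustrative.

```python
def precision_recall_at(threshold, detections, num_ground_truth):
    """Precision and recall when only detections with conf >= threshold are kept."""
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    # With no detections left, precision is reported as 1.0 by convention here.
    p = tp / (tp + fp) if kept else 1.0
    r = tp / num_ground_truth
    return p, r


# Hypothetical detections: (confidence, matches a ground-truth pothole?).
detections = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.60, True), (0.40, False), (0.30, True), (0.10, False),
]
num_ground_truth = 6

for threshold in (0.0, 0.5, 0.75):
    p, r = precision_recall_at(threshold, detections, num_ground_truth)
    print(threshold, round(p, 3), round(r, 3))
```

In this toy data, raising the threshold discards the low-confidence false positives first, so precision rises, while the true positives that fall below the threshold are lost, so recall falls. Recall is highest at a threshold of 0.0, where every detection is kept.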

363
00:21:56,000 --> 00:21:59,000
So these are predictions for the bounding boxes.

364
00:21:59,000 --> 00:22:05,000
And as this is instance segmentation, we not only predict the bounding boxes, we also predict the masks as well.

365
00:22:05,000 --> 00:22:07,000
So these are the graph plots for the mask.

366
00:22:07,000 --> 00:22:11,000
And you can see they are quite similar with the bounding boxes as well.

367
00:22:11,000 --> 00:22:14,000
So over here we have the confusion matrix.

368
00:22:14,000 --> 00:22:17,000
So the confusion matrix tells us how the model handles the different classes.

369
00:22:17,000 --> 00:22:23,000
So 29 times when there is a pothole, our model predicted correctly that there is a pothole.

370
00:22:23,000 --> 00:22:29,000
And 51% of times when there is a pothole, our model is unable to detect that there is a pothole.

371
00:22:30,000 --> 00:22:30,000
Okay.

372
00:22:30,000 --> 00:22:35,000
And 89% of the time, our model detected correctly that there is a pothole, and 11% of the time our

373
00:22:35,000 --> 00:22:38,000
model is unable to detect that there is a pothole.

374
00:22:38,000 --> 00:22:41,000
So these are the model predictions on the validation batch.

375
00:22:41,000 --> 00:22:45,000
We are not using the validation dataset images for training.

376
00:22:45,000 --> 00:22:51,000
So it's always better to have a look and see how our model performs on the validation dataset images.

377
00:22:51,000 --> 00:22:54,000
So you can see that the results look quite promising.

378
00:22:54,000 --> 00:22:59,000
And over here you can see that the mean average precision tends

379
00:22:59,000 --> 00:23:04,000
to increase as the number of epochs increases.

380
00:23:04,000 --> 00:23:05,000
Okay.

381
00:23:05,000 --> 00:23:08,000
And now we will validate the fine tuned model.

382
00:23:08,000 --> 00:23:12,000
This lets us see how well our model generalizes.

383
00:23:12,000 --> 00:23:17,000
So we are validating our model on the validation dataset images.

384
00:23:17,000 --> 00:23:24,000
And you can see that for the box predictions we are getting a mean average precision at IoU 0.50 of 0.909,

385
00:23:24,000 --> 00:23:25,000
so around 90.9%.

386
00:23:25,000 --> 00:23:32,000
And for the mask predictions, we are getting a mean average precision at IoU 0.50 of 0.91, so around 91%.
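The "IoU 0.50" part of this metric means a prediction only counts as correct when it overlaps a ground truth by at least 50%. Here is a minimal sketch of that overlap check for boxes. The `box_iou` helper and the coordinates are assumptions for illustration. The full mean average precision also matches predictions to ground truths and averages precision over recall levels, which this sketch leaves out.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical ground-truth pothole box and two candidate predictions.
ground_truth = (10, 10, 50, 50)
close_prediction = (12, 12, 52, 52)
loose_prediction = (20, 20, 60, 60)

IOU_THRESHOLD = 0.5
close_iou = box_iou(close_prediction, ground_truth)
loose_iou = box_iou(loose_prediction, ground_truth)

print(f"close: IoU={close_iou:.2f}, counts as a match: {close_iou >= IOU_THRESHOLD}")
print(f"loose: IoU={loose_iou:.2f}, counts as a match: {loose_iou >= IOU_THRESHOLD}")
```

For the mask metric the idea is the same, except that the overlap is measured between the predicted and the ground-truth mask pixels instead of rectangles.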

387
00:23:39,000 --> 00:23:44,000
Okay, so now we will perform inference on the test dataset images over here.

388
00:23:44,000 --> 00:23:45,000
Okay.

389
00:23:45,000 --> 00:23:49,000
So what you can do over here is I'm just checking one thing.

390
00:23:49,000 --> 00:23:49,000
Okay.

391
00:23:49,000 --> 00:23:50,000
That's fine.

392
00:23:50,000 --> 00:23:54,000
So you can just go over here and just copy this part from here.

393
00:23:54,000 --> 00:23:57,000
And you can just add this part over here.

394
00:23:57,000 --> 00:24:02,000
And you can just perform inference on the test dataset images as well over here.

395
00:24:05,000 --> 00:24:06,000
Okay.

396
00:24:07,000 --> 00:24:10,000
And you can just display these results over here as well.

397
00:24:10,000 --> 00:24:14,000
I've just written some code using the matplotlib library over here.

398
00:24:15,000 --> 00:24:16,000
Okay.

399
00:24:16,000 --> 00:24:20,000
So it will display all the model predictions on the test dataset images over here.

400
00:24:20,000 --> 00:24:22,000
So now this will

401
00:24:22,000 --> 00:24:24,000
take a few seconds.

402
00:24:25,000 --> 00:24:26,000
Okay.

403
00:24:35,000 --> 00:24:39,000
So here you can see we have the model predictions on the test

404
00:24:39,000 --> 00:24:40,000
dataset images.

405
00:24:40,000 --> 00:24:43,000
And we will see over here as well.

406
00:24:54,000 --> 00:24:56,000
So this will take a few more seconds.

407
00:24:56,000 --> 00:24:59,000
And let's see what results we get over here.

408
00:25:05,000 --> 00:25:09,000
After this we will test our model on these two videos.

409
00:25:09,000 --> 00:25:09,000
Okay.

410
00:25:10,000 --> 00:25:18,000
So let's see. I think this shouldn't take this long, but let's wait.

411
00:25:22,000 --> 00:25:25,000
So these are the model predictions on the test dataset images.

412
00:25:25,000 --> 00:25:27,000
And you can see that the results look quite good.

413
00:25:27,000 --> 00:25:31,000
We are able to detect the potholes everywhere.

414
00:25:31,000 --> 00:25:31,000
Okay.

415
00:25:32,000 --> 00:25:34,000
So that doesn't look quite good.

416
00:25:36,000 --> 00:25:36,000
Yeah.

417
00:25:36,000 --> 00:25:40,000
But here you can see there are many potholes, and we are not able to detect all of them.

418
00:25:40,000 --> 00:25:46,000
So if you train your model for a higher number of epochs, you may be able to detect these

419
00:25:46,000 --> 00:25:47,000
as well.

420
00:25:47,000 --> 00:25:48,000
Okay.

421
00:25:48,000 --> 00:25:53,000
Or you can just decrease the confidence threshold over here and test this as well.

422
00:25:54,000 --> 00:25:56,000
You might detect more of the potholes that way.

423
00:25:56,000 --> 00:25:56,000
Okay.

424
00:25:56,000 --> 00:26:00,000
So let's test our model on this sample video over here.

425
00:26:01,000 --> 00:26:01,000
Okay.

426
00:26:08,000 --> 00:26:10,000
So you can just run this as well.

427
00:26:10,000 --> 00:26:15,000
And you will be able to detect the potholes in this video.

428
00:26:18,000 --> 00:26:23,000
So now you can see that the complete video is being divided into 324 frames, and we are detecting

429
00:26:24,000 --> 00:26:27,000
potholes in each of the frames one by one.

430
00:26:51,000 --> 00:26:55,000
So now I will display this video in the Google Colab notebook over here.

431
00:26:56,000 --> 00:27:00,000
And then I will just show you the results as well over here.

432
00:27:12,000 --> 00:27:13,000
Well here is our output video.

433
00:27:13,000 --> 00:27:16,000
I will just simply download this video from here.

434
00:27:17,000 --> 00:27:18,000
And

435
00:27:19,000 --> 00:27:21,000
here you can see our output.

436
00:27:24,000 --> 00:27:27,000
So you can see over here we are able to detect the potholes.

437
00:27:27,000 --> 00:27:29,000
And our results look quite promising.

438
00:27:29,000 --> 00:27:36,000
You can see we are able to detect the potholes over here. In the same way, you can test on the other

439
00:27:36,000 --> 00:27:37,000
video as well.

440
00:27:37,000 --> 00:27:40,000
I've already done the testing and here I have the output video.

441
00:27:40,000 --> 00:27:45,000
So I will just simply download this output video and show you what the results look like.

442
00:27:49,000 --> 00:27:53,000
So over here you can see we are able to detect the potholes as well.

443
00:27:53,000 --> 00:27:59,000
So in this tutorial we have seen how we can fine tune the YOLO 11 instance segmentation model for

444
00:27:59,000 --> 00:28:00,000
pothole detection.

445
00:28:00,000 --> 00:28:07,000
You can also fine tune the YOLO 11 instance segmentation model to detect any other object as

446
00:28:07,000 --> 00:28:07,000
well.

447
00:28:07,000 --> 00:28:13,000
So you can just fine tune the YOLO 11 instance segmentation model on any other custom dataset as well.

448
00:28:13,000 --> 00:28:14,000
So that's all from this tutorial.

449
00:28:14,000 --> 00:28:16,000
Thank you for watching.