1
00:00:03,000 --> 00:00:10,000
In this video tutorial, we will look at how we can fine tune the YOLO 11 classification model on a

2
00:00:10,000 --> 00:00:11,000
custom data set.

3
00:00:11,000 --> 00:00:17,000
We will fine-tune the YOLO 11 image classification model for plant classification.

4
00:00:17,000 --> 00:00:21,000
So here you can see we have the data set available on Roboflow.

5
00:00:21,000 --> 00:00:26,000
If I just go to the overview over here we have this plant classification data set.

6
00:00:26,000 --> 00:00:29,000
And it contains around 3800 images.

7
00:00:29,000 --> 00:00:33,000
We can see over here the data set consists of 3764 images.

8
00:00:33,000 --> 00:00:36,000
You can download this project by clicking over here.

9
00:00:36,000 --> 00:00:41,000
But to access this dataset, you need to log in to Roboflow

10
00:00:41,000 --> 00:00:42,000
with your account.

11
00:00:42,000 --> 00:00:47,000
You can easily create your account on Roboflow with your GitHub account, with your Gmail ID, or

12
00:00:47,000 --> 00:00:49,000
with any other work email as well.

13
00:00:50,000 --> 00:00:55,000
So you can see over here we have 3764 images available in our data set.

14
00:00:55,000 --> 00:01:03,000
And we have a lot of different classes of plants available over here, like Tulip Time and Ruby

15
00:01:03,000 --> 00:01:05,000
and melon

16
00:01:05,000 --> 00:01:09,000
and nanas; there are different classes available over here.

17
00:01:09,000 --> 00:01:13,000
And we have three versions of the data set available over here.

18
00:01:13,000 --> 00:01:19,000
Originally, our data set consists of 3764 images, as seen over here as well.

19
00:01:19,000 --> 00:01:24,000
But I'm not the author of this data set.

20
00:01:24,000 --> 00:01:26,000
The author of this data set applied the augmentation.

21
00:01:27,000 --> 00:01:32,000
So now you can see that clockwise rotation, counter-clockwise rotation, and

22
00:01:32,000 --> 00:01:34,000
noise up to 0.5% of pixels.

23
00:01:35,000 --> 00:01:39,000
So three augmentation steps are being applied over here.

24
00:01:39,000 --> 00:01:44,000
And in the pre-processing, we have resized the images to 640 x 640.

25
00:01:44,000 --> 00:01:49,000
So remember that augmentation is always applied to the training data.

26
00:01:49,000 --> 00:01:52,000
Augmentation is not applied on the validation data or test data set.

27
00:01:53,000 --> 00:01:57,000
So we always apply augmentation only on the training data.

28
00:01:57,000 --> 00:02:00,000
So we have 3764 images.

29
00:02:00,000 --> 00:02:02,000
And after applying augmentation,

30
00:02:02,000 --> 00:02:06,000
our training data size increased to 7905 images.

31
00:02:06,000 --> 00:02:12,000
We have 753 images in our validation data set, and we have 376 images in our test dataset.

32
00:02:12,000 --> 00:02:18,000
In an ideal scenario in computer vision projects, we usually take 70% of the images in the training

33
00:02:18,000 --> 00:02:24,000
data, 20% of the images in the validation data, and 10% of the images in the test data set.

34
00:02:24,000 --> 00:02:27,000
Okay, so we have the three versions of our data set available over here.

35
00:02:27,000 --> 00:02:32,000
We will be using the third version, but we have the other versions available over here as well.

36
00:02:34,000 --> 00:02:34,000
Okay.

37
00:02:35,000 --> 00:02:41,000
And if you just go to Analytics, you can see over here whether your classes are balanced or not.

38
00:02:41,000 --> 00:02:46,000
So if you see a red line over here this means that that class is not balanced.

39
00:02:46,000 --> 00:02:48,000
So we have seen all the green lines.

40
00:02:48,000 --> 00:02:53,000
So this means that all our classes are balanced.

41
00:02:54,000 --> 00:02:59,000
So like you can see over here we have balanced classes.

42
00:02:59,000 --> 00:03:05,000
We originally have 3764 images in our complete data set.

43
00:03:05,000 --> 00:03:10,000
So we have a total of 3764 images in our complete data set.

44
00:03:10,000 --> 00:03:15,000
The complete data set consists of training, validation and test data sets over here.

45
00:03:15,000 --> 00:03:21,000
And we have 88% of the images in the training set, because after applying augmentation on the training data,

46
00:03:21,000 --> 00:03:24,000
our training data set size increased to 7905.

47
00:03:25,000 --> 00:03:29,000
And we have 753 images in the validation data set.

48
00:03:29,000 --> 00:03:32,000
And we have 376 images in the test data set.

49
00:03:32,000 --> 00:03:37,000
As I told you, always remember that we apply augmentation only on the training data.

50
00:03:37,000 --> 00:03:43,000
So in order to export this data set from Roboflow into our Google Colab notebook, you can click

51
00:03:43,000 --> 00:03:50,000
on download, and you can just select the folder structure or the OpenAI CLIP version, and you can just click

52
00:03:50,000 --> 00:03:52,000
on show download code from here.

53
00:03:54,000 --> 00:03:55,000
Okay, I just made a mistake.

54
00:03:55,000 --> 00:04:04,000
We need to select folder structure from here; then you can just click on show download code from here,

55
00:04:04,000 --> 00:04:07,000
and you can just copy this from here.

56
00:04:10,000 --> 00:04:15,000
And just copy this from here, and you will just add this over here.

57
00:04:15,000 --> 00:04:16,000
Okay.

58
00:04:16,000 --> 00:04:20,000
So here you can see we have the complete Google Colab notebook.

59
00:04:20,000 --> 00:04:22,000
I have written down all the code over here.

60
00:04:22,000 --> 00:04:26,000
Before running this script, please make sure that you have selected a GPU runtime.

61
00:04:26,000 --> 00:04:29,000
And first we will install the Ultralytics package.

62
00:04:29,000 --> 00:04:31,000
Then we will import all the required libraries.

63
00:04:31,000 --> 00:04:34,000
So YOLO 11 is available under the Ultralytics package.

64
00:04:34,000 --> 00:04:35,000
Okay.

65
00:04:35,000 --> 00:04:41,000
Previously, while using YOLOv7, YOLOv6, or YOLOv5, you needed to clone the complete

66
00:04:41,000 --> 00:04:43,000
GitHub repository of that model.

67
00:04:43,000 --> 00:04:48,000
But YOLO 11 and YOLOv8 are available under the Ultralytics package.

68
00:04:48,000 --> 00:04:54,000
So this makes it quite easy: we just install the Ultralytics package, and then we can access any

69
00:04:54,000 --> 00:04:55,000
of those models as well.

70
00:04:59,000 --> 00:05:04,000
So we will be using the YOLO 11 model through a Python script.

71
00:05:04,000 --> 00:05:09,000
You can also run the YOLO 11 model from the command-line interface.

72
00:05:09,000 --> 00:05:11,000
Okay, so you have both options.

73
00:05:11,000 --> 00:05:11,000
Okay.

74
00:05:14,000 --> 00:05:15,000
So we have installed the Ultralytics package.

75
00:05:15,000 --> 00:05:18,000
And let's see if I can run this version over here.

76
00:05:18,000 --> 00:05:21,000
So we have the Ultralytics 8.3.1 version.

77
00:05:21,000 --> 00:05:23,000
And we have Python version 3.10.

78
00:05:23,000 --> 00:05:31,000
We have CUDA available with a Tesla T4 GPU, around 15 GB of VRAM, and we have two CPUs

79
00:05:31,000 --> 00:05:32,000
available over here.

80
00:05:32,000 --> 00:05:40,000
And out of 112 GB of disk space, about 36 GB is currently occupied.

81
00:05:42,000 --> 00:05:46,000
So now I will just download this data set from Roboflow into this over here.

82
00:05:55,000 --> 00:05:59,000
So this will take a few seconds.

83
00:05:59,000 --> 00:06:03,000
Now it's extracting the dataset, and it's 100% done.

84
00:06:03,000 --> 00:06:04,000
Okay, so we can just hide the output.

85
00:06:04,000 --> 00:06:07,000
So here you can see we have the plant classification data set.

86
00:06:07,000 --> 00:06:10,000
We have the train, test, and validation folders over here.

87
00:06:10,000 --> 00:06:11,000
Okay.

88
00:06:11,000 --> 00:06:13,000
For each of the classes.

89
00:06:13,000 --> 00:06:14,000
Okay.

90
00:06:14,000 --> 00:06:15,000
So that's not good.

91
00:06:15,000 --> 00:06:16,000
Okay.

92
00:06:19,000 --> 00:06:21,000
So here's the link of the data set over here.

93
00:06:24,000 --> 00:06:25,000
Okay.

94
00:06:25,000 --> 00:06:26,000
So you can just check all the details.

95
00:06:26,000 --> 00:06:30,000
And now I will just download the YOLO 11 classification model.

96
00:06:30,000 --> 00:06:33,000
So YOLO 11 comes with five different models:

97
00:06:33,000 --> 00:06:38,000
YOLO 11 nano, YOLO 11 small, YOLO 11 medium, YOLO 11 large,

98
00:06:38,000 --> 00:06:40,000
and YOLO 11 extra large.

99
00:06:40,000 --> 00:06:40,000
Okay.

100
00:06:40,000 --> 00:06:44,000
So YOLO 11 nano is the fastest, but it is less accurate.

101
00:06:44,000 --> 00:06:47,000
And YOLO 11 extra large is the most accurate,

102
00:06:47,000 --> 00:06:51,000
but it takes more inference time compared to the other YOLO 11 models.

103
00:06:51,000 --> 00:06:57,000
So I'm using the YOLO 11 small model. Oh, sorry, I just mistakenly ran this cell again.

104
00:06:57,000 --> 00:07:01,000
Okay, so I'm using YOLO 11 Small classification model over here.

105
00:07:04,000 --> 00:07:08,000
If you want to get better accuracy, you can use the larger models as well.

106
00:07:08,000 --> 00:07:09,000
Okay.

107
00:07:10,000 --> 00:07:12,000
So you can see we have downloaded the model.

108
00:07:12,000 --> 00:07:13,000
Okay.

109
00:07:13,000 --> 00:07:15,000
So I have already done the training.

110
00:07:15,000 --> 00:07:18,000
So I will not do the training again because it takes a lot of time.

111
00:07:18,000 --> 00:07:24,000
As you can see over here, each epoch takes around 1 minute 22 seconds.

112
00:07:24,000 --> 00:07:29,000
So if you are training for 60 epochs, this will take at least 80 minutes.

113
00:07:29,000 --> 00:07:34,000
Okay, so we have 7905 images

114
00:07:34,000 --> 00:07:38,000
in the training data, and we have 376 images in the test data set.

115
00:07:38,000 --> 00:07:39,000
And.

116
00:07:39,000 --> 00:07:39,000
Okay.

117
00:07:40,000 --> 00:07:45,000
And for the validation data, it shows None, but I don't know why it's getting None.

118
00:07:45,000 --> 00:07:48,000
So we have 27 classes over here.

119
00:07:48,000 --> 00:07:51,000
We have a batch size of 16 over here.

120
00:07:52,000 --> 00:07:52,000
Okay.

121
00:07:55,000 --> 00:08:01,000
So over here, you can see that we have 7905 images in the training data.

122
00:08:01,000 --> 00:08:07,000
So we are passing our data in batches of 16.

123
00:08:07,000 --> 00:08:11,000
So if we just open the calculator over here.

124
00:08:13,000 --> 00:08:16,000
And 7905 divided by 16.

125
00:08:17,000 --> 00:08:19,000
So we get around 494.

126
00:08:19,000 --> 00:08:22,000
So we will be passing data in batches of 16.

127
00:08:22,000 --> 00:08:26,000
So that is 494 full batches, or 495 counting the last partial one.

128
00:08:26,000 --> 00:08:33,000
So in each epoch we will be passing 495 batches of up to 16 images to our model for the training.

129
00:08:33,000 --> 00:08:34,000
Okay.

130
00:08:36,000 --> 00:08:43,000
So over here you can see I have fine-tuned the YOLO 11 classification model on this image data set for

131
00:08:43,000 --> 00:08:45,000
60 epochs.

132
00:08:45,000 --> 00:08:46,000
Okay.

133
00:08:46,000 --> 00:08:53,000
So we have fine-tuned the YOLO 11 classification model to classify plants.

134
00:08:53,000 --> 00:08:58,000
And we have fine-tuned the YOLO 11 image classification model on this plant classification data

135
00:08:58,000 --> 00:09:01,000
set for 60 epochs.

136
00:09:01,000 --> 00:09:03,000
Okay, so I will not run the training again.

137
00:09:03,000 --> 00:09:08,000
And you can see we are just getting a good accuracy score of 97.1%.

138
00:09:08,000 --> 00:09:10,000
Where is the link?

139
00:09:10,000 --> 00:09:10,000
Okay.

140
00:09:10,000 --> 00:09:14,000
So I think there is an issue with the link.

141
00:09:14,000 --> 00:09:18,000
I just need to add the link over here.

142
00:09:19,000 --> 00:09:24,000
So I already saved the model weights to my Drive, and we are directly downloading the model

143
00:09:24,000 --> 00:09:27,000
weights from the drive into this Google Colab notebook.

144
00:09:27,000 --> 00:09:29,000
So if I just click this link.

145
00:09:35,000 --> 00:09:35,000
Okay.

146
00:09:35,000 --> 00:09:38,000
But I don't need to download the model from here.

147
00:09:38,000 --> 00:09:43,000
So one thing I can do is I will just add the link over here.

148
00:09:46,000 --> 00:09:51,000
So I have added the link over here, and you can see that if you just run this cell,

149
00:09:51,000 --> 00:09:57,000
I'm downloading the best model weights from my Drive into this Google Colab notebook.

150
00:09:57,000 --> 00:09:57,000
Okay.

151
00:09:57,000 --> 00:09:59,000
So here you can see the best model weights.

152
00:09:59,000 --> 00:10:02,000
And here are my training results.

153
00:10:02,000 --> 00:10:06,000
So after you perform the training, you have this confusion matrix.

154
00:10:06,000 --> 00:10:10,000
So the confusion matrix basically tells us how our model handles the different classes.

155
00:10:10,000 --> 00:10:17,000
So over here you can see that for this bayam class, 11 times our model correctly predicted

156
00:10:17,000 --> 00:10:23,000
that this is bayam, but one time when there was bayam, our model predicted it as another class.

157
00:10:23,000 --> 00:10:24,000
Okay.

158
00:10:24,000 --> 00:10:29,000
Similarly, 11 times our model correctly predicted that this is a mangga.

159
00:10:29,000 --> 00:10:32,000
And 2 times when there was a mangga,

160
00:10:32,000 --> 00:10:34,000
our model predicted it as papaya.

161
00:10:34,000 --> 00:10:35,000
Okay.

162
00:10:36,000 --> 00:10:42,000
And for the others, as you can see, 27 times our model correctly predicted that it is the jambu.

163
00:10:42,000 --> 00:10:48,000
And our model predicted it correctly every time; there is no misclassification over here.

164
00:10:49,000 --> 00:10:54,000
So over here you can see that 92% of the time our model correctly predicted that this is bayam.

165
00:10:54,000 --> 00:10:57,000
And 8% of the time our model misclassified

166
00:10:57,000 --> 00:10:58,000
it as

167
00:10:58,000 --> 00:10:58,000
another class.

168
00:10:59,000 --> 00:11:01,000
So this is a normalized confusion matrix.

169
00:11:01,000 --> 00:11:07,000
And similarly we have other classes: 70% of the time our model correctly predicted that it is the mangga.

170
00:11:07,000 --> 00:11:09,000
And 30% of the time when there is a mangga,

171
00:11:09,000 --> 00:11:16,000
our model predicted it as papaya, and 7% of the time when there is a mangga, our model predicted it as another class.

172
00:11:16,000 --> 00:11:16,000
Okay.

173
00:11:16,000 --> 00:11:19,000
So this is a similar thing.

174
00:11:19,000 --> 00:11:25,000
Okay, so these are the model predictions on the validation batch, as I'm not training the model on the

175
00:11:25,000 --> 00:11:26,000
validation data set

176
00:11:26,000 --> 00:11:27,000
images.

177
00:11:27,000 --> 00:11:32,000
So it's always better to take a look and see how our model performs on the validation data set images.

178
00:11:32,000 --> 00:11:35,000
And you can see that, um, the results look quite promising.

179
00:11:35,000 --> 00:11:39,000
Like our model makes some correct predictions as well.

180
00:11:40,000 --> 00:11:40,000
Okay.

181
00:11:43,000 --> 00:11:48,000
So like you can see okay.

182
00:11:56,000 --> 00:11:59,000
So here are the training results over here.

183
00:12:00,000 --> 00:12:02,000
Okay, that is my mistake.

184
00:12:02,000 --> 00:12:04,000
Okay, that is my mistake.

185
00:12:04,000 --> 00:12:06,000
Uh, okay.

186
00:12:06,000 --> 00:12:09,000
So just give me a moment.

187
00:12:13,000 --> 00:12:16,000
So now here you can see our training and validation loss results.

188
00:12:16,000 --> 00:12:21,000
Previously, I just mistakenly ran this cell, but I have not run the training in this session.

189
00:12:21,000 --> 00:12:24,000
So I do not have this folder.

190
00:12:24,000 --> 00:12:28,000
When you run the training, you will have the runs folder appearing over here, and there

191
00:12:28,000 --> 00:12:30,000
you can just view these results.

192
00:12:30,000 --> 00:12:34,000
So I have already run training before recording this tutorial.

193
00:12:34,000 --> 00:12:37,000
So over here you can see that the training

194
00:12:37,000 --> 00:12:39,000
loss is continuously decreasing.

195
00:12:39,000 --> 00:12:41,000
Validation loss is also continuously decreasing.

196
00:12:41,000 --> 00:12:44,000
Our accuracy is increasing continuously.

197
00:12:44,000 --> 00:12:49,000
So if we run the model training for a higher number of epochs, we can expect that the model accuracy

198
00:12:49,000 --> 00:12:50,000
will further improve.

199
00:12:50,000 --> 00:12:51,000
Okay.

200
00:12:51,000 --> 00:12:56,000
So we can download the best model weights from here as well.

201
00:12:56,000 --> 00:13:00,000
So if you just run this cell, the best model weights will be downloaded.

202
00:13:01,000 --> 00:13:06,000
Okay, so now you can see the best model weights are being downloaded over here.

203
00:13:10,000 --> 00:13:13,000
So here we can see the best.

204
00:13:13,000 --> 00:13:16,000
So here we can see the best model weights appearing over here.

205
00:13:17,000 --> 00:13:17,000
Okay.

206
00:13:17,000 --> 00:13:19,000
So let's see.

207
00:13:19,000 --> 00:13:24,000
So now I will just load the trained classification model over here.

208
00:13:24,000 --> 00:13:26,000
And I will just specify the test folder path.

209
00:13:27,000 --> 00:13:31,000
So now we will test our fine-tuned model on these test dataset images.

210
00:13:31,000 --> 00:13:38,000
We will test the performance of our fine-tuned YOLO 11 image classification model on the test dataset

211
00:13:38,000 --> 00:13:40,000
images which we have over here.

212
00:13:40,000 --> 00:13:44,000
Okay so I will just specify the test folder path.

213
00:13:44,000 --> 00:13:49,000
So now I will just go through each class folder in this directory over here.

214
00:13:49,000 --> 00:13:49,000
Okay.

215
00:13:49,000 --> 00:13:56,000
So you can see we have different class folders in this test directory over here.

216
00:13:56,000 --> 00:13:57,000
You can just check all these.

217
00:13:57,000 --> 00:13:58,000
Okay.

218
00:13:58,000 --> 00:14:02,000
So now we will predict on each image in the class folders.

219
00:14:02,000 --> 00:14:07,000
So now we will do the prediction on each of the images in these class folders.

220
00:14:07,000 --> 00:14:11,000
And then you will see what results we get.

221
00:14:11,000 --> 00:14:13,000
And let's run this cell.

222
00:14:13,000 --> 00:14:15,000
So here you can see I've just added all the code.

223
00:14:15,000 --> 00:14:21,000
So we are just looping through each folder in this test directory.

224
00:14:21,000 --> 00:14:25,000
And we are just testing the model performance then.

225
00:14:25,000 --> 00:14:27,000
And let's see what score we get.

226
00:14:27,000 --> 00:14:29,000
Whether we are making good predictions or not.

227
00:14:29,000 --> 00:14:33,000
So here you can see we have the results okay.

228
00:14:34,000 --> 00:14:40,000
So now we are looping through each of the folders in this test directory.

229
00:14:40,000 --> 00:14:47,000
And we are making model predictions on each of the images that we have in these

230
00:14:47,000 --> 00:14:49,000
folders over here.

231
00:14:49,000 --> 00:14:50,000
Okay.

232
00:14:50,000 --> 00:14:55,000
And over here you can see that if the ground truth is bayam, as you can see over here,

233
00:14:56,000 --> 00:15:02,000
our predicted class is also bayam, and we have a confidence score of 1.00, which means 100%.

234
00:15:02,000 --> 00:15:05,000
So you can see we have multiple images inside each of these folders.

235
00:15:05,000 --> 00:15:12,000
Similarly, if our ground truth is jambu, our predicted class is also jambu, and we have a confidence

236
00:15:12,000 --> 00:15:13,000
score of 1.00, which means 100%.

237
00:15:13,000 --> 00:15:18,000
And if we go below, similarly, our ground

238
00:15:18,000 --> 00:15:23,000
truth matches our predicted class, and our confidence score is 100% as well.

239
00:15:23,000 --> 00:15:28,000
Okay, but in some cases the confidence score might decrease as well.

240
00:15:28,000 --> 00:15:34,000
So let's see if there is a scenario where we are getting a low confidence score.

241
00:15:34,000 --> 00:15:35,000
Okay.

242
00:15:35,000 --> 00:15:41,000
So you can see that for the ground truth lower, our predicted class is lower as well,

243
00:15:41,000 --> 00:15:43,000
but the confidence score has decreased to 50%.

244
00:15:43,000 --> 00:15:48,000
So the model is 50% confident that this is lower. And here our ground truth is lower,

245
00:15:48,000 --> 00:15:52,000
our predicted class is lower, but the model is 70% confident that this is lower.

246
00:15:52,000 --> 00:15:58,000
So there can be some false positives or wrong predictions as well.

247
00:15:59,000 --> 00:16:05,000
So over here, you can see that we have tested our model's performance on all these test data set images.

248
00:16:05,000 --> 00:16:10,000
You can also test the model performance on some random image or any other image as well.

249
00:16:10,000 --> 00:16:13,000
And you can see how well your model performs.

250
00:16:13,000 --> 00:16:13,000
Okay.

251
00:16:14,000 --> 00:16:20,000
So basically we are looping through each of the folders we have in the test directory.

252
00:16:20,000 --> 00:16:24,000
And we are making model predictions on each of these images.

253
00:16:24,000 --> 00:16:30,000
And then we are checking whether our predicted class matches the ground truth and what confidence

254
00:16:30,000 --> 00:16:31,000
score we are getting.

255
00:16:31,000 --> 00:16:37,000
So in this tutorial, we have learned how we can fine-tune the YOLO 11 classification model for plant

256
00:16:37,000 --> 00:16:38,000
classification.

257
00:16:38,000 --> 00:16:39,000
And we have fine-tuned

258
00:16:39,000 --> 00:16:44,000
the YOLO 11 model on this plant classification data set available on Roboflow.

259
00:16:44,000 --> 00:16:51,000
You can also fine-tune the YOLO 11 image classification model on any other data set, or on your custom data

260
00:16:51,000 --> 00:16:56,000
set as well, and you can test the model performance on other random images as well.

261
00:16:56,000 --> 00:16:57,000
So that's all from this tutorial.

262
00:16:57,000 --> 00:16:58,000
Thank you for watching.