1
00:00:03,000 --> 00:00:09,000
In this video tutorial, we will look at how we can fine-tune the YOLO 11 pose estimation model for

2
00:00:09,000 --> 00:00:11,000
human activity recognition.

3
00:00:11,000 --> 00:00:13,000
And here is a quick demo.

4
00:00:13,000 --> 00:00:16,000
You can see that we are able to detect a fall.

5
00:00:16,000 --> 00:00:22,000
Along with this, we will also be able to detect other body positions like sitting, standing, walking

6
00:00:22,000 --> 00:00:24,000
and falling down as well.

7
00:00:24,000 --> 00:00:26,000
So let me first show you the data set.

8
00:00:26,000 --> 00:00:33,000
And here you can see that we will be using the Human Activity Keypoints dataset available on Roboflow.

9
00:00:33,000 --> 00:00:37,000
We have a total of 1,608 images in our dataset.

10
00:00:37,000 --> 00:00:42,000
And you can see here that we have fall down, sitting, standing, and walking.

11
00:00:42,000 --> 00:00:45,000
So we have these different classes over here.

12
00:00:45,000 --> 00:00:47,000
And two versions of the dataset are available:

13
00:00:47,000 --> 00:00:49,000
this one

14
00:00:49,000 --> 00:00:52,000
and this one. We will be using the second version.

15
00:00:52,000 --> 00:00:54,000
In the second version no augmentation is applied.

16
00:00:55,000 --> 00:00:59,000
That works because we have enough images in the training data.

17
00:00:59,000 --> 00:01:09,000
So we have 70% of the images, that is, 1,126 images, in the training data and 20%, that is, 324 images, in the

18
00:01:09,000 --> 00:01:09,000
validation data.

19
00:01:09,000 --> 00:01:17,000
And we have 158 images in the test set, which account for 10% of our total dataset.

20
00:01:17,000 --> 00:01:24,000
And I think fine-tuned models are also available here.

21
00:01:24,000 --> 00:01:31,000
And under Analytics, we can run the dataset health check to look at class balance.

22
00:01:31,000 --> 00:01:36,000
You can see that we have a good number of images for the sitting, fall down, and standing classes.

23
00:01:36,000 --> 00:01:39,000
But the walking class is underrepresented.

24
00:01:39,000 --> 00:01:40,000
Like this.

25
00:01:40,000 --> 00:01:44,000
The dataset is not fully balanced: the walking class is underrepresented.

26
00:01:44,000 --> 00:01:53,000
So in an ideal scenario, we should add more images for the walking class

27
00:01:53,000 --> 00:02:00,000
to increase its number of images, or apply augmentations to the walking class, and try

28
00:02:00,000 --> 00:02:04,000
to make it at least equal to the standing class.

29
00:02:04,000 --> 00:02:12,000
Okay, so in all three splits, training, validation, and test, the walking class is

30
00:02:12,000 --> 00:02:15,000
underrepresented, so the dataset is not completely balanced.

31
00:02:15,000 --> 00:02:16,000
Okay.

32
00:02:16,000 --> 00:02:21,000
And here are some insights, like where the majority of our annotations lie, okay.

33
00:02:21,000 --> 00:02:25,000
And our median image ratio is 1280×720, okay.

34
00:02:25,000 --> 00:02:28,000
So these are the different images we have.

35
00:02:28,000 --> 00:02:30,000
You can see over here.

36
00:02:31,000 --> 00:02:37,000
So now you can see that the images are properly annotated as well.

37
00:02:37,000 --> 00:02:42,000
So here we have the Human Activity Keypoints dataset that is available on Roboflow.

38
00:02:42,000 --> 00:02:48,000
To use this dataset in your project or in a Google Colab notebook, you need to create

39
00:02:48,000 --> 00:02:52,000
an account on Roboflow, and then you will be able to download this dataset.

40
00:02:52,000 --> 00:02:58,000
Or you can export this dataset from Roboflow into the Google Colab notebook: just select

41
00:02:58,000 --> 00:03:05,000
the format here, click "show download code", copy the code from here, and add

42
00:03:05,000 --> 00:03:07,000
it to your Google Colab notebook.

43
00:03:07,000 --> 00:03:11,000
And you will be able to download this dataset from Roboflow into the Google Colab notebook.

44
00:03:11,000 --> 00:03:12,000
Alternatively,

45
00:03:12,000 --> 00:03:17,000
you can download the zip file to your computer by clicking on Continue.

46
00:03:17,000 --> 00:03:23,000
This will start downloading the zip file to your computer, but I already have the zip file, so

47
00:03:23,000 --> 00:03:24,000
I don't need this.

48
00:03:24,000 --> 00:03:25,000
I will cancel it.

49
00:03:28,000 --> 00:03:33,000
Okay, so here I have created the complete notebook, and it's ready to run.

50
00:03:33,000 --> 00:03:38,000
So before running the script, please make sure that you have selected a GPU runtime, such as the T4 GPU.

51
00:03:38,000 --> 00:03:41,000
So in step one, we will install the Ultralytics package.

52
00:03:46,000 --> 00:03:48,000
So now you can see the Ultralytics package is installed.

53
00:03:48,000 --> 00:03:50,000
So now we will import Ultralytics.

54
00:03:50,000 --> 00:03:55,000
And from Ultralytics we will import YOLO so that we can access the YOLO 11 model.

55
00:03:56,000 --> 00:04:02,000
So now we are running the Ultralytics checks to see which version is installed, the Python version,

56
00:04:02,000 --> 00:04:06,000
and whether we are using a GPU or not.

57
00:04:06,000 --> 00:04:10,000
So we have the latest Ultralytics version, 8.3.12, available.

58
00:04:10,000 --> 00:04:19,000
We are using Python 3.10 and we have CUDA available: a Tesla T4 GPU with 15 GB of RAM, and we have two

59
00:04:19,000 --> 00:04:27,000
CPUs available. Currently we have a total disk space of 112 GB, of which 36.6 GB is

60
00:04:27,000 --> 00:04:27,000
occupied.
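The setup steps above can be sketched roughly as follows, assuming a Colab GPU runtime in which `pip install ultralytics` has already been run. `check_environment` is a hypothetical helper name; `ultralytics.checks()` is the Ultralytics call that prints the version, Python, CUDA/GPU, CPU, RAM and disk summary described in the video.

```python
# Step 1 (run once in a Colab cell):  !pip install ultralytics


def check_environment() -> str:
    """Hypothetical helper: print the Ultralytics environment report and return the version."""
    # Imported lazily so this snippet can be loaded even where Ultralytics is not installed.
    import ultralytics

    # Prints Ultralytics/Python/torch versions, the CUDA device, CPU count, RAM and disk usage.
    ultralytics.checks()
    return ultralytics.__version__
```

Calling `check_environment()` in the Colab runtime should print a report like the one read out above.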

61
00:04:28,000 --> 00:04:35,000
So I tried to download the dataset from Roboflow into the Google Colab notebook using this piece of code,

62
00:04:35,000 --> 00:04:43,000
which you get when you click on Download Dataset and select the format; you copy

63
00:04:43,000 --> 00:04:44,000
this code and add it here.
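The Roboflow export snippet generally has the shape sketched below. The workspace and project slugs are placeholders you must copy from your own "show download code" dialog, `version=2` matches the second dataset version used here, and the export format string is an assumption: `"yolov8"` produces the YOLO keypoint layout that YOLO11 reads, and newer `roboflow` releases may also offer `"yolov11"`.

```python
def download_from_roboflow(api_key: str, workspace: str, project: str,
                           version: int = 2, fmt: str = "yolov8") -> str:
    """Download one version of a Roboflow dataset and return its local folder."""
    # pip install roboflow  (imported lazily so the file loads without the package)
    from roboflow import Roboflow

    rf = Roboflow(api_key=api_key)
    dataset = rf.workspace(workspace).project(project).version(version).download(fmt)
    # The folder holds data.yaml plus train/, valid/ and test/ subfolders.
    return dataset.location
```

Keep the API key out of shared notebooks, for example by reading it from Colab secrets or an environment variable.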

64
00:04:44,000 --> 00:04:48,000
But it gave me an error; I can show you as well.

65
00:04:48,000 --> 00:04:48,000
So.

66
00:04:51,000 --> 00:04:57,000
If you try to download the dataset using this piece of code, it may give you an error.

67
00:05:01,000 --> 00:05:06,000
Okay, so previously it was giving me an error, but now it is working fine.

68
00:05:06,000 --> 00:05:09,000
So you can use this way as well to download the data set.

69
00:05:09,000 --> 00:05:16,000
Now you can see we have downloaded the data set from Roboflow into our Google Colab notebook.

70
00:05:16,000 --> 00:05:19,000
So that works pretty well.

71
00:05:19,000 --> 00:05:20,000
I'm just amazed.

72
00:05:21,000 --> 00:05:23,000
There is also another option.

73
00:05:24,000 --> 00:05:29,000
I have also downloaded this dataset in zip format, and I have placed

74
00:05:29,000 --> 00:05:31,000
it in my Google Drive.

75
00:05:31,000 --> 00:05:35,000
So now I'm downloading this dataset directly from my Drive here.

76
00:05:35,000 --> 00:05:35,000
Okay.

77
00:05:35,000 --> 00:05:43,000
So either you can download this dataset and create a new directory for it,

78
00:05:43,000 --> 00:05:47,000
and unzip the dataset there.
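If you take the zip route, the unzip step can be sketched with the standard library. This assumes Google Drive is already mounted in Colab (`from google.colab import drive; drive.mount("/content/drive")`); the zip path and the `dataset` target folder are hypothetical names.

```python
import zipfile
from pathlib import Path


def unzip_dataset(zip_path: str, target_dir: str = "dataset") -> Path:
    """Extract the exported Roboflow zip and return the path to its data.yaml."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
    # Exports normally put data.yaml at the root; search recursively in case of a nested folder.
    matches = sorted(target.rglob("data.yaml"))
    if not matches:
        raise FileNotFoundError(f"no data.yaml found under {target}")
    return matches[0]
```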

79
00:05:52,000 --> 00:05:59,000
So you can see we have a dataset folder, and here is the data.yaml file.

80
00:05:59,000 --> 00:06:00,000
Okay.

81
00:06:02,000 --> 00:06:05,000
Let me just save this script for now.

82
00:06:09,000 --> 00:06:14,000
So there are two ways, and you can follow either of them, okay.

83
00:06:23,000 --> 00:06:28,000
So either you can download this dataset from Drive into your Google Colab notebook

84
00:06:28,000 --> 00:06:32,000
and unzip it, and you can see we have this dataset.

85
00:06:32,000 --> 00:06:38,000
Or, as you can see here, we have downloaded the dataset from Roboflow into the Google Colab notebook.

86
00:06:38,000 --> 00:06:45,000
So if you export the dataset from Roboflow into the Google Colab

87
00:06:45,000 --> 00:06:49,000
notebook, then you need to update the paths in the data.yaml file here.

88
00:06:49,000 --> 00:06:51,000
You can see we have the class names.

89
00:06:51,000 --> 00:06:56,000
And if I check the other data.yaml file as well,

90
00:06:56,000 --> 00:06:59,000
you can see we have the same things available here too.

91
00:07:00,000 --> 00:07:01,000
So.

92
00:07:03,000 --> 00:07:04,000
So it's your choice.

93
00:07:04,000 --> 00:07:06,000
You can use either approach.

94
00:07:06,000 --> 00:07:07,000
Okay.

95
00:07:07,000 --> 00:07:12,000
But one thing you need to make sure of: if you want to use the dataset you downloaded from Roboflow

96
00:07:12,000 --> 00:07:15,000
into your Google Colab notebook,

97
00:07:15,000 --> 00:07:19,000
you need to update the training and validation paths.

98
00:07:19,000 --> 00:07:23,000
So you can go here and copy the path.

99
00:07:24,000 --> 00:07:27,000
Make sure to update both paths here.

100
00:07:29,000 --> 00:07:31,000
Just copy the path from here.

101
00:07:35,000 --> 00:07:35,000
Okay.

102
00:07:35,000 --> 00:07:39,000
So please make sure that you update the training and validation paths here.
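A minimal sketch of that path update using only the standard library. `set_split_paths` is a hypothetical helper; it assumes `train:` and `val:` each sit on their own unindented line in data.yaml, as in a typical Roboflow export.

```python
from pathlib import Path


def set_split_paths(yaml_path: str, train: str, val: str) -> None:
    """Rewrite the top-level `train:` and `val:` entries of a YOLO data.yaml in place."""
    p = Path(yaml_path)
    new_lines = []
    for line in p.read_text().splitlines():
        if line.startswith("train:"):
            line = f"train: {train}"
        elif line.startswith("val:"):
            line = f"val: {val}"
        new_lines.append(line)  # all other keys (test, nc, names, kpt_shape...) stay untouched
    p.write_text("\n".join(new_lines) + "\n")
```

For example, `set_split_paths("dataset/data.yaml", "/content/dataset/train/images", "/content/dataset/valid/images")`, where both paths are placeholders for the ones you copy from the Colab file browser. A YAML library such as PyYAML would be more robust for unusual files; the line-based edit just avoids an extra dependency.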

103
00:07:39,000 --> 00:07:44,000
After this, you can download the YOLO 11 pose estimation model.

104
00:07:44,000 --> 00:07:50,000
So I'm directly downloading the YOLO 11 pose estimation model from the GitHub repository into this Google Colab

105
00:07:50,000 --> 00:07:51,000
notebook.

106
00:07:51,000 --> 00:07:58,000
And then I'm loading the YOLO 11 model and saving it in the variable model.

107
00:07:58,000 --> 00:08:01,000
And I'm using the YOLO 11 small pose estimation model.

108
00:08:01,000 --> 00:08:07,000
Currently, YOLO 11 comes in five different sizes: nano, small, medium, large, and extra large.

109
00:08:07,000 --> 00:08:12,000
As for the difference between these models, YOLO 11 nano is the fastest, but it is the least

110
00:08:12,000 --> 00:08:13,000
accurate.

111
00:08:13,000 --> 00:08:17,000
YOLO 11 extra large is the most accurate among the YOLO 11 models,

112
00:08:17,000 --> 00:08:20,000
but it takes the most inference time.

113
00:08:20,000 --> 00:08:24,000
Okay, so we are currently using the YOLO 11 small pose estimation model here.
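Loading a pretrained checkpoint can be sketched as below. The `yolo11{n,s,m,l,x}-pose.pt` naming follows the Ultralytics release assets, and `YOLO(...)` downloads the file on first use; `pose_weights_name` and `load_pose_model` are hypothetical helper names.

```python
YOLO11_SIZES = {"n": "nano", "s": "small", "m": "medium", "l": "large", "x": "extra large"}


def pose_weights_name(size: str = "s") -> str:
    """Return the checkpoint filename of a YOLO11 pose model of the given size."""
    if size not in YOLO11_SIZES:
        raise ValueError(f"size must be one of {sorted(YOLO11_SIZES)}")
    return f"yolo11{size}-pose.pt"


def load_pose_model(size: str = "s"):
    """Load pretrained YOLO11 pose weights (downloaded automatically if missing)."""
    from ultralytics import YOLO  # lazy import: only needed when actually loading

    return YOLO(pose_weights_name(size))
```

`model = load_pose_model("s")` corresponds to the small model used here; swapping in `"n"` or `"x"` trades accuracy for speed as described above.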

114
00:08:24,000 --> 00:08:29,000
And I have downloaded the model and saved it in the variable model.

115
00:08:29,000 --> 00:08:30,000
Okay.

116
00:08:31,000 --> 00:08:37,000
I will not run the training again because I have already done it. I was not recording

117
00:08:37,000 --> 00:08:39,000
this video then; I was just preparing this notebook,

118
00:08:39,000 --> 00:08:41,000
and I ran the training at that point.

119
00:08:41,000 --> 00:08:47,000
So now, if you want to fine-tune the YOLO 11 pose estimation model

120
00:08:47,000 --> 00:08:50,000
on the Human Activity dataset,

121
00:08:51,000 --> 00:08:52,000
you call model.train().

122
00:08:52,000 --> 00:08:58,000
Here you pass the data.yaml file path; we are fine-tuning the YOLO 11 pose estimation model

123
00:08:58,000 --> 00:09:02,000
with mode set to train, for 50 epochs.

124
00:09:02,000 --> 00:09:04,000
And the batch size is set to 8.
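The training call can be sketched as follows, using the 50 epochs and batch size 8 from the video. `imgsz=640` is an assumption (the Ultralytics default, not stated in the video), and the helper names are hypothetical. In the Python API, calling `model.train()` already selects training mode; `mode=train` is how the same run is spelled on the command line, e.g. `yolo pose train data=data.yaml model=yolo11s-pose.pt epochs=50 batch=8`.

```python
def train_args(data_yaml: str, epochs: int = 50, batch: int = 8, imgsz: int = 640) -> dict:
    """Keyword arguments for model.train(), with the epoch and batch values from the video."""
    return {"data": data_yaml, "epochs": epochs, "batch": batch, "imgsz": imgsz}


def fine_tune(data_yaml: str, weights: str = "yolo11s-pose.pt"):
    """Fine-tune pretrained YOLO11 pose weights on the dataset described by data_yaml."""
    from ultralytics import YOLO  # lazy import so the helpers load without Ultralytics

    model = YOLO(weights)
    return model.train(**train_args(data_yaml))
```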

125
00:09:04,000 --> 00:09:11,000
So you can see here that we have 1,126 images in our training data.

126
00:09:11,000 --> 00:09:15,000
So these images will be passed to the model in batches of eight, okay.

127
00:09:15,000 --> 00:09:18,000
So here you can see.

128
00:09:18,000 --> 00:09:23,000
If I open the calculator here

129
00:09:27,000 --> 00:09:35,000
and divide 1126 by 8, I get 140.75, which rounds up to 141.

130
00:09:35,000 --> 00:09:38,000
So in each epoch, we will be passing our data in 141 batches.

131
00:09:38,000 --> 00:09:44,000
And each batch holds 8 images (the last one holds the remaining 6), which covers all 1,126 images.
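The batch arithmetic above as a small check, assuming the last partial batch is kept rather than dropped: 140 full batches of 8 cover 1,120 images, and the 141st batch holds the remaining 6.

```python
import math


def batches_per_epoch(num_images: int, batch_size: int) -> tuple[int, int]:
    """Return (number of batches per epoch, size of the last batch)."""
    n_batches = math.ceil(num_images / batch_size)
    last = num_images - (n_batches - 1) * batch_size
    return n_batches, last


print(batches_per_epoch(1126, 8))  # (141, 6)
```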

132
00:09:44,000 --> 00:09:44,000
Okay.

133
00:09:44,000 --> 00:09:49,000
So here you can see the mean average precision scores.

134
00:09:49,000 --> 00:09:52,000
In this case, we are doing two things.

135
00:09:52,000 --> 00:09:53,000
First, we are predicting bounding boxes.

136
00:09:53,000 --> 00:09:58,000
Second, we are estimating keypoints.

137
00:09:58,000 --> 00:09:58,000
Okay.

138
00:09:58,000 --> 00:10:03,000
So this is the mean average precision at an IoU threshold of 0.5 for the bounding boxes.

139
00:10:03,000 --> 00:10:11,000
And this is the mean average precision at a 0.5 threshold for the keypoint predictions, where keypoints are matched by object keypoint similarity (OKS) rather than box IoU, okay.

140
00:10:12,000 --> 00:10:15,000
And it will run for 60 epochs.

141
00:10:15,000 --> 00:10:20,000
And you can see here, as I told you, that we have four different classes in our dataset.

142
00:10:20,000 --> 00:10:24,000
And here you can see the results.

143
00:10:24,000 --> 00:10:29,000
For the walking class, as I told you, the dataset contains fewer images, so we cannot

144
00:10:29,000 --> 00:10:34,000
expect the same performance for the walking class as for the other classes.

145
00:10:34,000 --> 00:10:39,000
So now you can see the fall down, sitting, standing, and walking classes here.

146
00:10:39,000 --> 00:10:43,000
And you can see here that even for the walking class, we are getting a good mean average

147
00:10:43,000 --> 00:10:44,000
precision score.

148
00:10:44,000 --> 00:10:48,000
But for the pose estimation predictions,

149
00:10:48,000 --> 00:10:53,000
you can see that the pose mAP is low for the walking class.

150
00:10:53,000 --> 00:10:57,000
For all the other classes, it is above 90%.

151
00:10:57,000 --> 00:11:04,000
But for the walking class it is lower, so in terms of pose estimation,

152
00:11:04,000 --> 00:11:07,000
the performance for the walking class is compromised.

153
00:11:07,000 --> 00:11:07,000
Okay.

154
00:11:08,000 --> 00:11:10,000
Then we validate our model

155
00:11:10,000 --> 00:11:12,000
on the validation dataset.

156
00:11:12,000 --> 00:11:17,000
We validate our model to see how well it generalizes.
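Validation can be sketched like this, assuming the fine-tuned weights sit at the usual Ultralytics output location such as `runs/pose/train/weights/best.pt` (your run folder may differ). For pose models, the returned metrics object reports box and keypoint results separately; `validate` is a hypothetical helper name.

```python
def validate(weights: str, data_yaml: str) -> dict:
    """Validate fine-tuned pose weights and return the headline mAP50 values."""
    from ultralytics import YOLO  # lazy import so the helper loads without Ultralytics

    model = YOLO(weights)
    metrics = model.val(data=data_yaml, split="val")
    return {
        "box_mAP50": metrics.box.map50,    # bounding boxes, IoU threshold 0.5
        "pose_mAP50": metrics.pose.map50,  # keypoints, OKS threshold 0.5
    }
```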

157
00:11:17,000 --> 00:11:25,000
So now you can see that we are getting good mean average precision scores here, okay.

158
00:11:25,000 --> 00:11:30,000
So the results look promising to me.

159
00:11:32,000 --> 00:11:34,000
So now here we have the confusion matrix.

160
00:11:34,000 --> 00:11:41,000
So 88% of the time when the person has fallen down, our model predicted correctly that the person

161
00:11:41,000 --> 00:11:42,000
has fallen down,

162
00:11:42,000 --> 00:11:47,000
while 2% of the time when the person has fallen down, our model predicted that the person is sitting,

163
00:11:47,000 --> 00:11:51,000
and one time,

164
00:11:51,000 --> 00:11:54,000
when the person had fallen down,

165
00:11:54,000 --> 00:11:58,000
our model was unable to detect anything.

166
00:12:02,000 --> 00:12:06,000
Okay, while explaining, I made a mistake.

167
00:12:06,000 --> 00:12:08,000
This is not 88%.

168
00:12:08,000 --> 00:12:09,000
This is 88,

169
00:12:09,000 --> 00:12:11,000
meaning 88 instances.

170
00:12:11,000 --> 00:12:11,000
Okay.

171
00:12:11,000 --> 00:12:17,000
So in 88 instances when the person has fallen down, our model predicted correctly that the person

172
00:12:17,000 --> 00:12:23,000
has fallen down, and two times when the person has fallen down,

173
00:12:23,000 --> 00:12:29,000
our model predicted that the person is sitting, while one time when the person has fallen down, our

174
00:12:29,000 --> 00:12:36,000
model was unable to predict anything. And 104 times when the person is sitting,

175
00:12:36,000 --> 00:12:44,000
our model predicted correctly that they are sitting, while 74 times when the person is standing,

176
00:12:44,000 --> 00:12:50,000
our model predicted correctly that the person is standing, while nine

177
00:12:50,000 --> 00:12:53,000
times when the person is standing,

178
00:12:53,000 --> 00:13:03,000
our model predicted that the person is walking, while one time when the person is standing, our model

179
00:13:03,000 --> 00:13:05,000
predicted nothing.

180
00:13:05,000 --> 00:13:14,000
Okay, and for 41 instances when the person is walking, our model predicted correctly that the person

181
00:13:14,000 --> 00:13:22,000
is walking, while five times when the person is walking, our model predicted that the person is standing.

182
00:13:22,000 --> 00:13:27,000
Okay, so this is how you can read the confusion matrix.

183
00:13:27,000 --> 00:13:30,000
The confusion matrix tells us how our model handles the different classes.

184
00:13:30,000 --> 00:13:33,000
And this is the normalized confusion matrix.

185
00:13:33,000 --> 00:13:34,000
Here we have the percentages.

186
00:13:34,000 --> 00:13:39,000
So 97% of the time when the person has fallen down,

187
00:13:39,000 --> 00:13:42,000
our model predicted that the person has fallen down,

188
00:13:42,000 --> 00:13:45,000
while 2% of the time when the person has fallen down,

189
00:13:45,000 --> 00:13:54,000
our model predicted that the person is sitting, while 100% of the time when the person is sitting, our

190
00:13:54,000 --> 00:14:02,000
model predicted correctly that the person is sitting, while 88% of the time when the person is standing,

191
00:14:02,000 --> 00:14:08,000
our model predicted correctly that the person is standing, while 11% of the time when the person is

192
00:14:08,000 --> 00:14:14,000
standing, our model predicted that the person is walking, while 1% of the time when the person is

193
00:14:14,000 --> 00:14:18,000
standing, our model was unable to predict anything.

194
00:14:18,000 --> 00:14:24,000
And 87% of the time when the person is walking, our model predicted correctly that the person is walking,

195
00:14:24,000 --> 00:14:30,000
while 11% of the time when the person is walking, our model predicted that the person is standing.

196
00:14:30,000 --> 00:14:33,000
So in this way you can use the confusion matrix.
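To see how the normalized view follows from the raw counts, here is a sketch that converts one row of counts (one true class) into percentages. It uses only the two rows whose counts are fully read out in the video, fall down and standing; `normalize_row` is a hypothetical helper, and "background" stands for "no detection".

```python
def normalize_row(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw confusion-matrix counts for one true class into percentages."""
    total = sum(counts.values())
    return {pred: 100 * n / total for pred, n in counts.items()}


# Counts read out in the video: true class -> {predicted class: count}.
fall_down = {"fall down": 88, "sitting": 2, "background": 1}
standing = {"standing": 74, "walking": 9, "background": 1}

for name, row in [("fall down", fall_down), ("standing", standing)]:
    print(name, {pred: round(pct) for pred, pct in normalize_row(row).items()})
```

Rounded, this gives 97 / 2 / 1 for fall down and 88 / 11 / 1 for standing, matching the percentages in the normalized matrix.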

197
00:14:33,000 --> 00:14:35,000
So here comes the precision-confidence curve.

198
00:14:38,000 --> 00:14:43,000
Before we look at this curve, let's understand what precision is.

199
00:14:43,000 --> 00:14:48,000
Okay, so if I search for "precision computer vision"

200
00:14:51,000 --> 00:14:54,000
and go over here.

201
00:14:54,000 --> 00:14:54,000
Okay.

202
00:14:57,000 --> 00:15:05,000
So here you can see that precision is the number of correct predictions divided by the total number of predictions.

203
00:15:05,000 --> 00:15:06,000
Okay.

204
00:15:06,000 --> 00:15:14,000
In other words, precision is the number of true positives divided by the total number of

205
00:15:14,000 --> 00:15:15,000
positive predictions, that is, TP / (TP + FP).
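The precision formula as a small function. Returning 0.0 when there are no predictions at all is a convention chosen here, not something the video specifies.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP): the fraction of predictions that are correct."""
    predicted = tp + fp
    return tp / predicted if predicted else 0.0


print(precision(4, 1))  # 0.8
```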

206
00:15:15,000 --> 00:15:16,000
Okay.

207
00:15:16,000 --> 00:15:20,000
So here we have the formula for precision.

208
00:15:20,000 --> 00:15:27,000
So if you look at this curve, as we increase the confidence threshold, the precision increases.

209
00:15:27,000 --> 00:15:34,000
Because when we increase the confidence threshold, our false positives decrease.

210
00:15:34,000 --> 00:15:36,000
We get fewer false positives.

211
00:15:36,000 --> 00:15:43,000
So when we increase the confidence threshold, we get fewer false positives, and a larger share of the remaining predictions

212
00:15:43,000 --> 00:15:44,000
are true positives.

213
00:15:44,000 --> 00:15:52,000
Okay, and when a larger share of our predictions are true positives, we get a better

214
00:15:52,000 --> 00:15:53,000
precision value.

215
00:15:53,000 --> 00:15:54,000
Okay.

216
00:15:54,000 --> 00:16:00,000
So here you can see that when we have a low confidence threshold, we may get a lot of

217
00:16:00,000 --> 00:16:01,000
false positives.

218
00:16:01,000 --> 00:16:05,000
So we get a low precision value at a low confidence threshold.

219
00:16:05,000 --> 00:16:12,000
As we increase the confidence threshold, our precision value increases because, when we raise

220
00:16:12,000 --> 00:16:18,000
the confidence threshold, the predictions we drop are mostly false positives, so the share of true positives rises.

221
00:16:18,000 --> 00:16:19,000
Okay.

222
00:16:19,000 --> 00:16:24,000
So when we increase the confidence threshold, the precision value generally increases.

223
00:16:24,000 --> 00:16:30,000
And you can see here that for all classes, the precision value reaches 1.00.

224
00:16:31,000 --> 00:16:35,000
That is a perfect value of 1, at a confidence of 0.975.

225
00:16:35,000 --> 00:16:42,000
So at a 0.975 confidence threshold, we are getting a precision value of 1 for all classes, as you

226
00:16:42,000 --> 00:16:44,000
can see over here as well.

227
00:16:45,000 --> 00:16:51,000
But you can see that for the walking class, since we have fewer images for it, as I showed

228
00:16:51,000 --> 00:16:55,000
you in the training data, the class is imbalanced, or underrepresented.

229
00:16:55,000 --> 00:17:00,000
So you can see the precision score is low for the walking class, even at higher confidence thresholds.

230
00:17:01,000 --> 00:17:05,000
Even at higher confidence thresholds, the precision score stays low for the walking class,

231
00:17:05,000 --> 00:17:10,000
while for all the other classes we are getting a very good precision score.

232
00:17:10,000 --> 00:17:13,000
Okay, so here is the simple rule.

233
00:17:13,000 --> 00:17:18,000
As you increase the confidence threshold, you will get fewer false positives, and a larger share of your predictions will

234
00:17:18,000 --> 00:17:19,000
be true positives.

235
00:17:19,000 --> 00:17:20,000
And

236
00:17:20,000 --> 00:17:24,000
when a larger share of your predictions are true positives, your precision value continues to increase.

237
00:17:24,000 --> 00:17:28,000
And when you have a low confidence threshold, you get more false positives.

238
00:17:28,000 --> 00:17:33,000
And when you get more false positives, your precision value is low.
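To make the threshold effect concrete, the sketch below sweeps a confidence threshold over a handful of detections. The scores and correctness flags are invented for illustration, not taken from this model.

```python
# Made-up detections for illustration: (confidence, is_true_positive).
detections = [(0.95, True), (0.90, True), (0.85, True), (0.70, False),
              (0.60, True), (0.40, False), (0.30, False), (0.20, True)]


def counts_at(threshold: float) -> tuple[int, int]:
    """Count true and false positives among the detections kept at this threshold."""
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    tp = sum(kept)
    return tp, len(kept) - tp


for t in (0.1, 0.5, 0.8):
    tp, fp = counts_at(t)
    print(f"threshold={t}: TP={tp}, FP={fp}, precision={tp / (tp + fp):.3f}")
```

With these numbers, TP/FP go from 5/3 to 4/1 to 3/0, so precision rises from 0.625 to 0.8 to 1.0. True positives also drop; precision rises only because false positives drop faster.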

239
00:17:35,000 --> 00:17:37,000
And here we have the recall-confidence curve.

241
00:17:38,000 --> 00:17:44,000
So in recall, we take the correct predictions, that is, the number of true positives, divided by the total number of ground

242
00:17:44,000 --> 00:17:49,000
truth annotations, which includes true positives plus false negatives: TP / (TP + FN).

243
00:17:49,000 --> 00:17:53,000
Okay, false negatives are objects that the model fails to predict.

244
00:17:53,000 --> 00:17:57,000
For example, if there is a cat and the model does not predict anything,

245
00:17:57,000 --> 00:18:04,000
this is a false negative, because the model is unable to detect that there is a cat in the image.

246
00:18:05,000 --> 00:18:11,000
So in the recall-confidence curve, you can see that when we increase the confidence threshold,

247
00:18:11,000 --> 00:18:14,000
our recall value tends to decrease.

248
00:18:14,000 --> 00:18:21,000
Because if you look at the formula here, as we increase the confidence threshold, we may still be

249
00:18:21,000 --> 00:18:24,000
getting some true positives.

250
00:18:24,000 --> 00:18:29,000
But when we increase the confidence threshold, some of our predictions go missing.

251
00:18:29,000 --> 00:18:32,000
We get fewer predictions.

252
00:18:32,000 --> 00:18:39,000
Okay, so when we increase the confidence threshold, our predictions decrease. For example, if with

253
00:18:39,000 --> 00:18:41,000
a confidence threshold of 0.2

254
00:18:41,000 --> 00:18:48,000
I am getting ten predictions, and I raise the confidence threshold to 0.9, I might get only

255
00:18:48,000 --> 00:18:49,000
5 or 6 predictions.

256
00:18:49,000 --> 00:18:54,000
So as we increase the confidence threshold, our true positives decrease.

257
00:18:54,000 --> 00:18:59,000
And we might miss some objects as well.

258
00:18:59,000 --> 00:19:06,000
For example, if there is a cat in the image and we detect it with a confidence score of 0.6,

259
00:19:06,000 --> 00:19:09,000
and I set the confidence threshold to 0.8,

260
00:19:09,000 --> 00:19:12,000
we will no longer detect the cat in the image.

261
00:19:12,000 --> 00:19:14,000
So the prediction is missing.

262
00:19:14,000 --> 00:19:16,000
So we get a false negative,

263
00:19:16,000 --> 00:19:19,000
as we are unable to detect anything.

264
00:19:19,000 --> 00:19:26,000
So when we increase the confidence threshold, our predictions decrease, and we will not be detecting

265
00:19:26,000 --> 00:19:28,000
everything in the image.

266
00:19:28,000 --> 00:19:32,000
And there will be false negatives, meaning we are unable to detect a ground truth object.

267
00:19:32,000 --> 00:19:36,000
So our recall value decreases.
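Recall and the cat example can be sketched the same way; the single cat detected at confidence 0.6 is the hypothetical case from the narration, and returning 0.0 when there is no ground truth is a convention chosen here.

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN): the share of ground-truth objects that were detected."""
    total_gt = tp + fn
    return tp / total_gt if total_gt else 0.0


# Hypothetical image with one ground-truth cat, detected at confidence 0.6.
cat_confidence = 0.6
for threshold in (0.5, 0.8):
    detected = cat_confidence >= threshold
    tp, fn = (1, 0) if detected else (0, 1)
    print(f"threshold={threshold}: {'TP' if detected else 'FN'}, recall={recall(tp, fn)}")
```

At threshold 0.5 the cat counts as a true positive (recall 1.0); at 0.8 it is filtered out and becomes a false negative (recall 0.0).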

268
00:19:36,000 --> 00:19:37,000
Okay.

269
00:19:40,000 --> 00:19:44,000
So as we increase the confidence threshold, the recall value will decrease.

270
00:19:44,000 --> 00:19:53,000
And you can see as well that at a confidence threshold of 0.0,

271
00:19:53,000 --> 00:19:56,000
we are getting the best recall score of 0.96.

272
00:19:56,000 --> 00:19:56,000
Okay.

273
00:19:56,000 --> 00:20:03,000
So now, for example, if you have ten ground-truth annotations of a cat,

274
00:20:03,000 --> 00:20:08,000
and you make five predictions, four of which are correct,

275
00:20:08,000 --> 00:20:11,000
we get four true positives and six false negatives,

276
00:20:11,000 --> 00:20:16,000
because we are unable to detect six of the ground-truth cats, so the recall value will be 4 / 10 = 40%.

277
00:20:16,000 --> 00:20:16,000
Okay.

278
00:20:16,000 --> 00:20:20,000
We might get a higher precision (here 4 / 5 = 80%), but the recall value will be poor, okay.
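For the ten-cat example, the two formulas give:

```latex
\text{Precision} = \frac{TP}{TP + FP} = \frac{4}{4 + 1} = 0.8,
\qquad
\text{Recall} = \frac{TP}{TP + FN} = \frac{4}{4 + 6} = 0.4
```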

279
00:20:21,000 --> 00:20:27,000
So when we increase the confidence threshold, many of the predictions go missing.

280
00:20:27,000 --> 00:20:33,000
And when many of the predictions go missing, we will not be detecting all the

281
00:20:33,000 --> 00:20:33,000
ground truths.

282
00:20:33,000 --> 00:20:36,000
They become false negatives.

283
00:20:36,000 --> 00:20:43,000
And as the false negatives continue to increase, the recall value will

284
00:20:43,000 --> 00:20:43,000
decrease.

285
00:20:43,000 --> 00:20:44,000
Okay.

286
00:20:44,000 --> 00:20:51,000
So as you increase the confidence threshold, the recall value keeps going down because there

287
00:20:51,000 --> 00:20:53,000
will be fewer predictions.

288
00:20:53,000 --> 00:20:56,000
And here are the model predictions on a validation batch.

289
00:20:56,000 --> 00:20:59,000
We are not using the validation dataset images for training.

290
00:20:59,000 --> 00:21:02,000
So it's always good to have a look and see how our model is performing.

291
00:21:02,000 --> 00:21:09,000
So we are able to detect the person's sitting position, fallen down, and standing as well.

292
00:21:09,000 --> 00:21:11,000
So we are getting good results here.

293
00:21:11,000 --> 00:21:15,000
So I have already trained the model and saved the model weights to my Drive.

294
00:21:15,000 --> 00:21:20,000
So I'm directly downloading the model weights from drive into this Google Colab notebook.

295
00:21:22,000 --> 00:21:28,000
So you can also validate your model to see how it generalizes.

296
00:21:30,000 --> 00:21:32,000
Okay, I mistakenly ran the training.

297
00:21:32,000 --> 00:21:35,000
I don't need to run it, so I'll just stop it.

298
00:21:35,000 --> 00:21:36,000
So.

299
00:21:38,000 --> 00:21:39,000
Here we are.

300
00:21:39,000 --> 00:21:40,000
Okay.

301
00:21:40,000 --> 00:21:40,000
So we are here.

302
00:21:40,000 --> 00:21:45,000
So basically you can also download the model here.

303
00:21:45,000 --> 00:21:48,000
So after training, I placed the model weights on my Drive.

304
00:21:48,000 --> 00:21:54,000
So I'm downloading the model weights directly from Drive into this Google Colab notebook over

305
00:21:54,000 --> 00:21:54,000
here.

306
00:21:55,000 --> 00:21:55,000
Okay.

307
00:21:59,000 --> 00:22:04,000
So now I will perform inference on the test dataset images here.

308
00:22:09,000 --> 00:22:12,000
So here I'm performing inference on the test dataset images.

309
00:22:12,000 --> 00:22:18,000
And let me plot the results we get on the test dataset images.
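Inference on the test images can be sketched as follows. The weights path and image folder are placeholders, `conf=0.25` is the Ultralytics default rather than a value from the video, and `save=True` makes Ultralytics write the annotated images (boxes, class labels and keypoints) to a `runs/pose/predict*` folder. `predict_test_images` is a hypothetical helper name.

```python
def predict_test_images(weights: str, image_dir: str, conf: float = 0.25):
    """Run the fine-tuned pose model on a folder of images; return (path, labels) per image."""
    from ultralytics import YOLO  # lazy import so the helper loads without Ultralytics

    model = YOLO(weights)
    results = model.predict(source=image_dir, conf=conf, save=True)
    summary = []
    for r in results:
        # r.boxes.cls holds class indices; r.names maps them to "fall down", "sitting", ...
        labels = [r.names[int(c)] for c in r.boxes.cls]
        summary.append((r.path, labels))
    return summary
```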

310
00:22:25,000 --> 00:22:27,000
This will take a few seconds.

311
00:22:27,000 --> 00:22:27,000
And let's.

312
00:22:36,000 --> 00:22:40,000
So I'm displaying all my results in this Google Colab notebook here.

313
00:22:52,000 --> 00:22:56,000
So here you can see our results.

314
00:22:56,000 --> 00:23:01,000
Here the person is walking, and here the person is standing; that works.

315
00:23:01,000 --> 00:23:03,000
Here the person has fallen down.

316
00:23:03,000 --> 00:23:05,000
Here the person is sitting.

317
00:23:05,000 --> 00:23:08,000
So our results look quite promising to me.

318
00:23:08,000 --> 00:23:08,000
Yeah.

319
00:23:09,000 --> 00:23:15,000
So now I will download a random video from my Drive into this Google Colab notebook, and I will

320
00:23:15,000 --> 00:23:18,000
test my model's performance on this video.

321
00:23:18,000 --> 00:23:22,000
The video has a total of 272 frames.

322
00:23:22,000 --> 00:23:29,000
And we are making predictions on each frame one by one: the person is walking,

323
00:23:29,000 --> 00:23:30,000
falling, fallen down.
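A frame-by-frame video sketch, assuming OpenCV (`cv2`) is available, as it is in Colab. The output filename and the `mp4v` codec are assumptions, since the notebook's own video code is not shown; `annotate_video` is a hypothetical helper name.

```python
def annotate_video(weights: str, src: str, dst: str = "output.mp4") -> int:
    """Run the pose model on every frame of a video, write an annotated copy, return frame count."""
    import cv2
    from ultralytics import YOLO

    model = YOLO(weights)
    cap = cv2.VideoCapture(src)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back to 30 if FPS metadata is missing
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    cap.release()

    writer = cv2.VideoWriter(dst, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    frames = 0
    # stream=True yields one Results object per frame instead of keeping them all in memory.
    for r in model.predict(source=src, stream=True, verbose=False):
        writer.write(r.plot())  # plot() returns the annotated frame as a BGR array
        frames += 1
    writer.release()
    return frames
```

For a clip like the one in the video, the returned count should equal its number of frames (272 here).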

324
00:23:31,000 --> 00:23:37,000
Okay, so let me display the output video here and see what our results look like.

325
00:23:43,000 --> 00:23:45,000
So here we have the output video.

326
00:23:45,000 --> 00:23:48,000
Let me download this video and show you how it works.

327
00:23:55,000 --> 00:23:56,000
So you can see the person is coming.

328
00:23:56,000 --> 00:23:58,000
We are able to detect the person is walking.

329
00:23:58,000 --> 00:24:00,000
And here the person falls down.

330
00:24:00,000 --> 00:24:05,000
So you can see that we are able to detect that the person has fallen down and our results look quite

331
00:24:05,000 --> 00:24:06,000
promising.

332
00:24:06,000 --> 00:24:06,000
Okay.

333
00:24:06,000 --> 00:24:10,000
So let's watch again: here we can see the person coming in.

334
00:24:10,000 --> 00:24:11,000
The person is walking.

335
00:24:11,000 --> 00:24:12,000
That's fine.

336
00:24:12,000 --> 00:24:13,000
Now the person has fallen down.

337
00:24:13,000 --> 00:24:16,000
And our model predicted that the person has fallen down.

338
00:24:16,000 --> 00:24:19,000
So you can test the model on other videos as well.

339
00:24:19,000 --> 00:24:25,000
So in this tutorial, we have seen how we can fine-tune the YOLO 11 pose estimation model for human

340
00:24:25,000 --> 00:24:26,000
activity recognition.

341
00:24:26,000 --> 00:24:28,000
So that's all from this video tutorial.

342
00:24:28,000 --> 00:24:30,000
Thank you for watching.