1
00:00:03,000 --> 00:00:09,000
In this video tutorial, we will learn how we can create a Streamlit application to do object detection,

2
00:00:09,000 --> 00:00:13,000
instance segmentation and pose estimation using Ultralytics

3
00:00:13,000 --> 00:00:14,000
YOLO 11.

4
00:00:14,000 --> 00:00:20,000
Here is a quick demo of the Streamlit application that we will be creating in this tutorial.

5
00:00:20,000 --> 00:00:26,000
So let me first give you a quick demo of the Streamlit application that we are building in this tutorial.

6
00:00:26,000 --> 00:00:32,000
So we can perform object detection, instance segmentation and pose estimation with the help of this

7
00:00:32,000 --> 00:00:33,000
Streamlit application.

8
00:00:33,000 --> 00:00:40,000
And we will be able to perform object detection, instance segmentation and pose estimation on images

9
00:00:40,000 --> 00:00:41,000
and on videos.

10
00:00:41,000 --> 00:00:43,000
You can upload any image.

11
00:00:43,000 --> 00:00:48,000
You can click on Browse files over here and upload any image from here.

12
00:00:49,000 --> 00:00:53,000
Okay, as you upload the image, an image will appear over here.

13
00:00:53,000 --> 00:01:00,000
And if you want to perform object detection, you can see over here that by default the object

14
00:01:00,000 --> 00:01:02,000
detection is selected over here.

15
00:01:02,000 --> 00:01:04,000
And then you can click on Detect object.

16
00:01:04,000 --> 00:01:07,000
And over here you will see an output image.

17
00:01:07,000 --> 00:01:09,000
Once the object detection has been performed.

18
00:01:10,000 --> 00:01:13,000
So now you can see over here we have the output image.

19
00:01:13,000 --> 00:01:19,000
And you can see we are able to detect the bus, person, backpack and handbag.

20
00:01:19,000 --> 00:01:24,000
So you can see the detection results over here, and over here you can see we

21
00:01:24,000 --> 00:01:29,000
have the bounding box coordinates in the form of tensors.

22
00:01:29,000 --> 00:01:35,000
Okay, so we have the bounding box coordinates for each of the detected objects in the form of tensors.

23
00:01:35,000 --> 00:01:35,000
Okay.

24
00:01:35,000 --> 00:01:39,000
So as you expand this, you will see all the bounding box coordinates.

25
00:01:39,000 --> 00:01:45,000
And here you can see we are able to detect a person, backpack, handbag, bus and

26
00:01:45,000 --> 00:01:46,000
all the things.

27
00:01:46,000 --> 00:01:49,000
Okay, so currently I'm using the YOLO 11 nano model.

28
00:01:49,000 --> 00:01:55,000
And if you use the larger YOLO 11 models, you will generally get better results.

29
00:01:56,000 --> 00:02:00,000
Going ahead, you can perform segmentation as well.

30
00:02:00,000 --> 00:02:06,000
So if you want to perform the segmentation on the same image, you can click on segmentation from here.

31
00:02:06,000 --> 00:02:09,000
And you can just go down from here and click on Detect Object.

32
00:02:10,000 --> 00:02:11,000
Okay.

33
00:02:13,000 --> 00:02:16,000
So now you can see over here we are able to perform the segmentation.

34
00:02:16,000 --> 00:02:19,000
We are able to detect the person and the backpack.

35
00:02:19,000 --> 00:02:21,000
And we are able to draw the mask as well.

36
00:02:21,000 --> 00:02:27,000
So basically, in instance segmentation we can draw a mask around each of the detected objects.

37
00:02:27,000 --> 00:02:32,000
And you can see over here we are able to draw a blue mask around each of the detected objects,

38
00:02:32,000 --> 00:02:33,000
okay.

39
00:02:33,000 --> 00:02:36,000
In the same way you can also perform pose estimation.

40
00:02:36,000 --> 00:02:41,000
So click on pose estimation from here and, going below, click on Detect object.

41
00:02:45,000 --> 00:02:48,000
So now you can see over here we are able to perform the pose estimation.

42
00:02:48,000 --> 00:02:53,000
And you can see that we have drawn the key points around each of the detected objects.

43
00:02:53,000 --> 00:02:57,000
Like you can see over here we are able to perform pose estimation.

44
00:02:57,000 --> 00:03:01,000
And we are able to draw the key points around each of the detected objects.

45
00:03:02,000 --> 00:03:02,000
Okay.

46
00:03:02,000 --> 00:03:09,000
In the same way, if you want to upload some other image, you can just click on upload files from here

47
00:03:09,000 --> 00:03:11,000
and you can upload this image from here.

48
00:03:15,000 --> 00:03:16,000
Okay.

49
00:03:16,000 --> 00:03:22,000
And after uploading this image, if you want to perform pose estimation, I have already selected pose

50
00:03:22,000 --> 00:03:22,000
estimation.

51
00:03:22,000 --> 00:03:24,000
And you can click on Detect Objects.

52
00:03:24,000 --> 00:03:27,000
So now you can see over here we are able to detect the object.

53
00:03:27,000 --> 00:03:31,000
Plus we are also able to draw the key points around each object.

54
00:03:31,000 --> 00:03:32,000
So we have performed pose estimation.

55
00:03:32,000 --> 00:03:33,000
In the same way.

56
00:03:33,000 --> 00:03:38,000
If you want to perform segmentation you can just click on segmentation from here and you will be able

57
00:03:38,000 --> 00:03:40,000
to perform segmentation.

58
00:03:40,000 --> 00:03:45,000
Now you can see over here we are able to detect the object as well as we are able to draw a mask around

59
00:03:45,000 --> 00:03:47,000
each of the detected objects over here.

60
00:03:47,000 --> 00:03:48,000
Okay.

61
00:03:48,000 --> 00:03:54,000
So we have seen how we can perform object detection, instance segmentation and pose estimation

62
00:03:55,000 --> 00:03:56,000
on the image.

63
00:03:56,000 --> 00:03:58,000
Let's go to the video part.

64
00:03:58,000 --> 00:04:05,000
So if you want to perform object detection, instance segmentation and pose estimation on video, you

65
00:04:05,000 --> 00:04:07,000
can just select video from here as the source.

66
00:04:07,000 --> 00:04:09,000
And you can see this is the default video.

67
00:04:09,000 --> 00:04:15,000
Let's first see how we can perform object detection, instance segmentation and pose estimation on a

68
00:04:15,000 --> 00:04:16,000
video.

69
00:04:16,000 --> 00:04:19,000
So in my video directory I have added two different videos.

70
00:04:19,000 --> 00:04:26,000
You can add as many videos as you want; for all the videos you add to your video directory, you will be

71
00:04:27,000 --> 00:04:31,000
able to perform object detection, instance segmentation and pose estimation on those videos.

72
00:04:31,000 --> 00:04:37,000
Okay, so if you select video one and want to perform object detection, let's click on Detect

73
00:04:37,000 --> 00:04:37,000
Object.

74
00:04:37,000 --> 00:04:38,000
From here.

75
00:04:38,000 --> 00:04:40,000
And here you can see the output.

76
00:04:40,000 --> 00:04:43,000
We are able to detect the person over here.

77
00:04:44,000 --> 00:04:47,000
So this one is a wrong detection.

78
00:04:47,000 --> 00:04:49,000
Currently I'm using the YOLO 11 nano model.

79
00:04:49,000 --> 00:04:55,000
And if you want better results, you can use the larger YOLO 11 models, which include YOLO 11 large.

80
00:04:55,000 --> 00:04:57,000
Or YOLO 11 extra large.

81
00:04:57,000 --> 00:05:02,000
The reason I'm using a small YOLO model like YOLO 11 nano is that I'm currently running the

82
00:05:02,000 --> 00:05:08,000
Streamlit application on my CPU, and I don't have a GPU in my local machine.

83
00:05:08,000 --> 00:05:15,000
YOLO 11 nano is the fastest, but it is the least accurate, while YOLO 11 extra large is the most accurate.

84
00:05:15,000 --> 00:05:17,000
But it takes more inference time.

85
00:05:17,000 --> 00:05:21,000
Okay, so there is a trade-off between speed and accuracy.

86
00:05:22,000 --> 00:05:27,000
Okay, so if you want to go with another video like you can see over here, you can click on Detect

87
00:05:27,000 --> 00:05:32,000
Video Objects and you can see over here we are able to detect the person and the bicycle.

88
00:05:32,000 --> 00:05:35,000
And the detection results look quite promising.

89
00:05:35,000 --> 00:05:36,000
Okay.

90
00:05:36,000 --> 00:05:41,000
And if you want to perform segmentation you can click on segmentation from here and click on Detect

91
00:05:41,000 --> 00:05:42,000
Video Objects.

92
00:05:42,000 --> 00:05:45,000
And now you can see that we are able to perform segmentation.

93
00:05:45,000 --> 00:05:48,000
We are able to draw a mask around each of the detected objects.

94
00:05:49,000 --> 00:05:53,000
The persons are in the blue mask, which you can clearly see over here.

95
00:05:53,000 --> 00:05:58,000
And if you want to perform pose estimation, you can just click on Pose Estimation and click on Detect

96
00:05:58,000 --> 00:05:59,000
Video Objects.

97
00:05:59,000 --> 00:06:02,000
And now you can see that we are able to detect the objects.

98
00:06:02,000 --> 00:06:06,000
And we are also able to draw key points around each of the objects over here.

99
00:06:07,000 --> 00:06:13,000
You can also adjust the confidence threshold; if you increase the confidence threshold to 0.93, that is

100
00:06:13,000 --> 00:06:18,000
93%, you can see there are no detections.

101
00:06:18,000 --> 00:06:22,000
And if I reduce the confidence to 70, let's see.

102
00:06:22,000 --> 00:06:28,000
So now you can see that it will only detect the objects that have a confidence score above 70%

103
00:06:28,000 --> 00:06:30,000
or 0.7.

104
00:06:30,000 --> 00:06:35,000
So all the objects that have confidence score above 70% will be detected.

105
00:06:35,000 --> 00:06:37,000
So what does the confidence score mean?

106
00:06:37,000 --> 00:06:42,000
The confidence score means how confident the model is that this is a person or this is a bicycle,

107
00:06:42,000 --> 00:06:43,000
okay.

108
00:06:44,000 --> 00:06:48,000
So if you increase the confidence threshold, there will be fewer false positives.

109
00:06:48,000 --> 00:06:55,000
But if you decrease the confidence threshold, maybe to 25, or if you further decrease the confidence,

110
00:06:55,000 --> 00:06:57,000
down to 10% or 0.10,

111
00:06:57,000 --> 00:07:02,000
you will get more detections, but there will be more false positives as well.

112
00:07:02,000 --> 00:07:07,000
Okay, but if you increase the confidence threshold, the detections will be more accurate and there

113
00:07:07,000 --> 00:07:11,000
will be fewer false positives, but there will be fewer detections as well.
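To make the threshold behaviour concrete, here is a minimal sketch in plain Python. The detection list, its scores and the helper name filter_by_confidence are invented for illustration; in the app itself the threshold is simply handed to the model.
```python
# Hypothetical detections as (label, confidence score) pairs; the values are invented.
detections = [("person", 0.91), ("bicycle", 0.72), ("person", 0.45), ("handbag", 0.12)]
def filter_by_confidence(items, threshold):
    """Keep only the detections whose score is at least the threshold."""
    return [(label, score) for label, score in items if score >= threshold]
# A high threshold keeps few detections; a low one keeps more, including weak ones.
high = filter_by_confidence(detections, 0.70)
low = filter_by_confidence(detections, 0.10)
print(len(high), len(low))
```
With these made-up scores, raising the threshold drops the two weakest detections, which is the trade-off between fewer false positives and fewer detections overall.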

114
00:07:11,000 --> 00:07:12,000
Okay.

115
00:07:12,000 --> 00:07:15,000
So this is how our Streamlit application works.

116
00:07:16,000 --> 00:07:19,000
And we will be creating this Streamlit application in this tutorial.

117
00:07:19,000 --> 00:07:21,000
So let's get started with this.

118
00:07:23,000 --> 00:07:28,000
So over here you can see that I have just created a project in PyCharm Community Edition.

119
00:07:28,000 --> 00:07:31,000
And I have a requirements.txt file.

120
00:07:31,000 --> 00:07:35,000
And we have the Ultralytics and Streamlit packages over here.

121
00:07:35,000 --> 00:07:39,000
So to create this application we only require two packages.

122
00:07:39,000 --> 00:07:41,000
One is Ultralytics and the other is Streamlit.

123
00:07:41,000 --> 00:07:45,000
And to install these, you can simply write pip install

124
00:07:47,000 --> 00:07:52,000
-r requirements.txt.

125
00:07:52,000 --> 00:07:56,000
And if you run this you will be able to install these two packages.

126
00:07:56,000 --> 00:08:00,000
So I've already installed these packages, and you can see over here: Requirement already satisfied.

127
00:08:00,000 --> 00:08:04,000
But if you are installing these packages for the first time this will take some time.

128
00:08:04,000 --> 00:08:08,000
Next we will create a new file, main.py.

129
00:08:10,000 --> 00:08:16,000
In step number one, we will import all the libraries that we require in this project.

130
00:08:20,000 --> 00:08:22,000
So we require the OpenCV Python package.

131
00:08:22,000 --> 00:08:24,000
We require the Streamlit library.

132
00:08:35,000 --> 00:08:36,000
To define the path.

133
00:08:38,000 --> 00:08:42,000
Of our root directory we require the pathlib library.

134
00:08:42,000 --> 00:08:43,000
From there we import Path.

135
00:08:45,000 --> 00:08:48,000
Then we require the sys library.

136
00:08:50,000 --> 00:08:53,000
Then, to do object detection using Ultralytics YOLO 11,

137
00:08:54,000 --> 00:08:59,000
we require the YOLO class, so from ultralytics

138
00:08:59,000 --> 00:09:01,000
we will import YOLO.

139
00:09:06,000 --> 00:09:10,000
And to display any image we require Image from PIL.

140
00:09:14,000 --> 00:09:15,000
Okay.

141
00:09:21,000 --> 00:09:27,000
So let's create a page layout with the help of Streamlit.

142
00:09:28,000 --> 00:09:31,000
So we will now be creating an initial page layout.

143
00:09:34,000 --> 00:09:37,000
So we'll write st.set_page_config.

144
00:09:39,000 --> 00:09:42,000
And the page title will be.

145
00:09:45,000 --> 00:09:46,000
YOLO 11.

146
00:09:47,000 --> 00:09:49,000
We can also add a page icon.

147
00:10:02,000 --> 00:10:05,000
Next we will define a header for our Streamlit application.

148
00:10:11,000 --> 00:10:15,000
The header will be object detection level.

149
00:10:18,000 --> 00:10:20,000
We can also create a sidebar.

150
00:10:20,000 --> 00:10:26,000
It will hold all our settings, like which task we want to perform, what the confidence threshold should be,

151
00:10:26,000 --> 00:10:30,000
and whether we want to do object detection on an image or on a video.

152
00:10:30,000 --> 00:10:33,000
So you need to create a sidebar for this.

153
00:10:39,000 --> 00:10:45,000
In the sidebar we will also add a header with the name model configurations.

154
00:10:53,000 --> 00:10:53,000
In the sidebar,

155
00:10:53,000 --> 00:10:58,000
we will also give the user the option to choose whether they want to perform object detection, instance

156
00:10:58,000 --> 00:11:00,000
segmentation, or pose estimation.

157
00:11:13,000 --> 00:11:16,000
So the user will have the option to choose the model type.

158
00:11:28,000 --> 00:11:32,000
We are just adding radio buttons for that.

159
00:11:37,000 --> 00:11:38,000
Detection.

160
00:11:38,000 --> 00:11:39,000
Segmentation.

161
00:11:42,000 --> 00:11:43,000
Pose estimation.
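The layout steps described so far might look roughly like the sketch below. The Streamlit module is passed in as a parameter (in main.py you would call build_layout(st) after import streamlit as st) so that the function can be tried without a running app; the function name, the header text and the radio label are assumptions, not the exact strings typed in the video.
```python
TASKS = ["Detection", "Segmentation", "Pose Estimation"]
def build_layout(st):
    """Configure the page, add the headers and return the task chosen in the sidebar."""
    st.set_page_config(page_title="YOLO 11")  # a page_icon argument can be added here too
    st.header("Object Detection")  # header text is an assumption
    st.sidebar.header("Model Configurations")
    # Radio buttons let the user pick exactly one of the three tasks.
    return st.sidebar.radio("Select Task", TASKS)
```
Returning the selected task keeps the later model selection independent of how the widget is drawn.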

162
00:11:48,000 --> 00:11:54,000
We will also give the user the option to select the confidence threshold with the help of a slider.

163
00:12:03,000 --> 00:12:05,000
The confidence value will be in the form of a float.

164
00:12:22,000 --> 00:12:24,000
The minimum value of confidence will be 25.

165
00:12:24,000 --> 00:12:28,000
The default value will be 40 and the maximum value will be 100.

166
00:12:30,000 --> 00:12:36,000
We will divide it by 100 because our model takes the confidence value as a number between

167
00:12:36,000 --> 00:12:43,000
0 and 1, like 0.1 is 10%, 0.25 is 25%, 0.90 is 90%.
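As a rough sketch of the slider and the division by 100: the helper names and the slider label below are assumptions, while the bounds 25, 100 and the default 40 are the values stated in the video. The Streamlit module is again passed in as st so the conversion can be checked on its own.
```python
def percent_to_confidence(percent):
    """Convert a slider percentage such as 40 into the 0..1 value the model expects."""
    return float(percent) / 100
def confidence_from_slider(st):
    """Sidebar slider with minimum 25, maximum 100 and default 40."""
    percent = st.sidebar.slider("Select Model Confidence Value", 25, 100, 40)
    return percent_to_confidence(percent)
```
Keeping the slider in whole percentages makes it easier to read for the user, while the model still receives a float.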

168
00:12:43,000 --> 00:12:43,000
Okay.

169
00:12:44,000 --> 00:12:45,000
So we write.

170
00:12:49,000 --> 00:12:52,000
So let's see what our Streamlit application looks like.

171
00:12:52,000 --> 00:12:56,000
So we'll write streamlit run main.py.

172
00:13:23,000 --> 00:13:25,000
So this will take some time.

173
00:13:25,000 --> 00:13:27,000
Currently this will take some time.

174
00:13:28,000 --> 00:13:29,000
Let me see if there is an error.

175
00:13:31,000 --> 00:13:32,000
It looks fine.

176
00:13:38,000 --> 00:13:38,000
Okay.

177
00:13:38,000 --> 00:13:42,000
So here you can see we have our Streamlit application.

178
00:13:42,000 --> 00:13:46,000
You can see over here we have the model configurations: Detection.

179
00:13:46,000 --> 00:13:47,000
Segmentation.

180
00:13:47,000 --> 00:13:48,000
Pose estimation.

181
00:13:48,000 --> 00:13:49,000
Okay.

182
00:13:49,000 --> 00:13:50,000
Default value.

183
00:13:50,000 --> 00:13:52,000
Why are we seeing this?

184
00:13:59,000 --> 00:13:59,000
Okay.

185
00:13:59,000 --> 00:14:03,000
So the maximum value is 100 and default value is 40.

186
00:14:05,000 --> 00:14:06,000
I have just made these changes.

187
00:14:06,000 --> 00:14:07,000
Let's see.

188
00:14:08,000 --> 00:14:10,000
Let me refresh this up.

189
00:14:13,000 --> 00:14:13,000
Okay.

190
00:14:13,000 --> 00:14:14,000
Again I think.

191
00:14:18,000 --> 00:14:18,000
Oh yeah.

192
00:14:20,000 --> 00:14:21,000
Let me refresh this up.

193
00:14:24,000 --> 00:14:25,000
So you can see.

194
00:14:25,000 --> 00:14:28,000
Default value is 40 and the minimum value is 25.

195
00:14:28,000 --> 00:14:30,000
And maximum value is 100.

196
00:14:30,000 --> 00:14:30,000
Okay.

197
00:14:30,000 --> 00:14:33,000
So let's complete this application now.

198
00:14:39,000 --> 00:14:45,000
First we need to get the absolute path of the current file.

199
00:14:45,000 --> 00:14:45,000
Wide.

200
00:15:00,000 --> 00:15:07,000
And we need to get the parent directory of the current file.

201
00:15:09,000 --> 00:15:10,000
Which is.

202
00:15:19,000 --> 00:15:22,000
Next we will add the root path

203
00:15:25,000 --> 00:15:27,000
to the sys.path list,

204
00:15:28,000 --> 00:15:29,000
if it's not already there.

205
00:15:41,000 --> 00:15:44,000
I will just write: if root is not in

206
00:15:46,000 --> 00:15:48,000
sys.path,

207
00:15:51,000 --> 00:15:53,000
then sys.path

208
00:15:53,000 --> 00:15:54,000
dot append.

209
00:15:59,000 --> 00:15:59,000
Okay.

210
00:16:04,000 --> 00:16:07,000
Next we need to get the relative path

211
00:16:12,000 --> 00:16:14,000
of the root directory

212
00:16:17,000 --> 00:16:23,000
with respect to the current working directory.

213
00:16:41,000 --> 00:16:42,000
Okay.

214
00:16:42,000 --> 00:16:48,000
So we have just got the path to the root directory with respect to the current working directory.
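The path steps above can be sketched with pathlib and sys as follows. The function name project_root and the fallback to the absolute path are assumptions added for illustration; in main.py the same steps would typically be applied to Path(__file__) and Path.cwd().
```python
import sys
from pathlib import Path
def project_root(file_path, cwd):
    """Return the directory of file_path relative to cwd and register it on sys.path."""
    root = Path(file_path).resolve().parent  # absolute path of the file, then its parent
    if str(root) not in sys.path:  # add the root only if it is not already there
        sys.path.append(str(root))
    try:
        return root.relative_to(Path(cwd).resolve())
    except ValueError:
        # The app was started from somewhere outside the project; keep the absolute path.
        return root
```
The try/except is there because relative_to raises ValueError when the root is not located under the current working directory.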

215
00:16:48,000 --> 00:16:50,000
That looks fine.

216
00:16:50,000 --> 00:16:51,000
Next.

217
00:16:53,000 --> 00:16:55,000
So we need to define the sources.

218
00:16:55,000 --> 00:17:03,000
We can pass our input in the form of an image as well as in the form of a video.

219
00:17:10,000 --> 00:17:13,000
Our input will be in the form of an image or a video.

220
00:17:16,000 --> 00:17:18,000
So let's define a source list.

221
00:17:41,000 --> 00:17:47,000
We also need to define some default images that will appear if the user doesn't upload an image.

222
00:17:47,000 --> 00:17:49,000
What should be the default image appearing?

223
00:17:50,000 --> 00:17:53,000
I will set some image configurations over here.

224
00:17:58,000 --> 00:18:03,000
So first we need to create a directory by the name images over here.

225
00:18:11,000 --> 00:18:14,000
And inside this directory I will add three images.

226
00:18:16,000 --> 00:18:21,000
So this is the default detected image that will appear in our Streamlit application.

227
00:18:21,000 --> 00:18:23,000
This is image one.

228
00:18:23,000 --> 00:18:24,000
This is the default image as well.

229
00:18:24,000 --> 00:18:28,000
And this is another image which will appear in our images directory.

230
00:18:29,000 --> 00:18:29,000
Okay.

231
00:18:34,000 --> 00:18:40,000
So in the root over here we have defined a relative path of the root directory with respect to the current

232
00:18:40,000 --> 00:18:41,000
working directory.

233
00:18:50,000 --> 00:18:54,000
And we need to define our default image over here.

234
00:18:56,000 --> 00:18:58,000
So we can go to images directory.

235
00:18:58,000 --> 00:19:00,000
And over here.

236
00:19:03,000 --> 00:19:05,000
We have image one dot jpg.

237
00:19:08,000 --> 00:19:12,000
Then we have default detect image.

238
00:19:20,000 --> 00:19:26,000
And we have detected image one dot okay.

239
00:19:30,000 --> 00:19:33,000
Similarly we can do for the video as well.

240
00:19:33,000 --> 00:19:37,000
For this I will just create another directory by the name videos over here.

241
00:19:47,000 --> 00:19:50,000
And over here I will add two videos.

242
00:19:52,000 --> 00:19:52,000
Okay.

243
00:20:03,000 --> 00:20:10,000
We'll first define video directory equal to root slash videos, as the name of the directory is videos,

244
00:20:11,000 --> 00:20:12,000
okay.

245
00:20:16,000 --> 00:20:18,000
And then we will create a dictionary.

246
00:20:25,000 --> 00:20:26,000
We'll write

247
00:20:26,000 --> 00:20:27,000
Video one.

248
00:20:31,000 --> 00:20:33,000
Video directory.

249
00:20:36,000 --> 00:20:37,000
Video one dot.

250
00:20:42,000 --> 00:20:44,000
Then we will write video two.

251
00:20:47,000 --> 00:20:52,000
Video directory, video two dot mp4.

252
00:20:53,000 --> 00:20:53,000
Okay.
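Put together, the source list and the image and video settings might look like the sketch below. The constant names and the exact file names (image1.jpg, detectedimage1.jpg, video1.mp4, video2.mp4) are assumptions based on how they are spoken in the video, and ROOT stands in for the relative root path computed earlier.
```python
from pathlib import Path
ROOT = Path(".")  # stands in for the relative root directory computed earlier
# Sources offered to the user
IMAGE = "Image"
VIDEO = "Video"
SOURCES_LIST = [IMAGE, VIDEO]
# Image configuration (file names are assumptions)
IMAGES_DIR = ROOT / "images"
DEFAULT_IMAGE = IMAGES_DIR / "image1.jpg"
DEFAULT_DETECT_IMAGE = IMAGES_DIR / "detectedimage1.jpg"
# Video configuration: add one entry per file placed in the videos directory
VIDEO_DIR = ROOT / "videos"
VIDEOS_DICT = {
    "video 1": VIDEO_DIR / "video1.mp4",
    "video 2": VIDEO_DIR / "video2.mp4",
}
```
Using a dictionary means that adding another video to the app only takes one extra entry.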

253
00:20:57,000 --> 00:21:02,000
Next we need to define the model configurations, for which we will create a weights folder over here.

254
00:21:02,000 --> 00:21:08,000
And I will add my object detection, instance segmentation and pose estimation YOLO 11 models.

255
00:21:08,000 --> 00:21:10,000
So I will just write model configuration.

256
00:21:18,000 --> 00:21:26,000
So first what I will do is I will just create a new directory by the name weights over here.

257
00:21:27,000 --> 00:21:32,000
And inside this I will add the models. What we can do is go over here.

258
00:21:34,000 --> 00:21:34,000
Okay.

259
00:21:34,000 --> 00:21:37,000
So you can just go to the YOLO 11

260
00:21:37,000 --> 00:21:37,000
GitHub page.

261
00:21:39,000 --> 00:21:46,000
And if you want to download any object detection model or instance segmentation or pose estimation model,

262
00:21:46,000 --> 00:21:51,000
you can simply click over here and you can see that it starts downloading.

263
00:21:51,000 --> 00:21:55,000
And if you want to download any segmentation model, you can simply click over here.

264
00:21:55,000 --> 00:21:57,000
And it will now start downloading.

265
00:21:57,000 --> 00:22:01,000
And if you want to download a pose estimation model, you can simply click over here.

266
00:22:01,000 --> 00:22:02,000
It will start downloading.

267
00:22:05,000 --> 00:22:08,000
I have already downloaded them, so I'm not downloading them again.

268
00:22:15,000 --> 00:22:18,000
So I'm just adding these three models over here.

269
00:22:20,000 --> 00:22:25,000
So you can see we have the YOLO 11 pose estimation model and the YOLO 11 segmentation model.

270
00:22:29,000 --> 00:22:30,000
So I will just write root.

271
00:22:30,000 --> 00:22:34,000
And you just need to go to the weights directory over here.

272
00:22:42,000 --> 00:22:44,000
So we just need to go to model directory.

273
00:22:44,000 --> 00:22:48,000
And here we have the detection model as YOLO 11 nano.

274
00:22:48,000 --> 00:22:52,000
So YOLO 11 nano dot pt.

275
00:22:57,000 --> 00:23:01,000
So in case of your custom model.

276
00:23:07,000 --> 00:23:08,000
You will write detection

277
00:23:12,000 --> 00:23:13,000
underscore model.

278
00:23:13,000 --> 00:23:17,000
And here you will write model underscore directory.

279
00:23:17,000 --> 00:23:25,000
And over here you will add your custom model weights, dot pt.

280
00:23:25,000 --> 00:23:28,000
And you will just comment out this line of code.

281
00:23:28,000 --> 00:23:29,000
And uncomment this one.

282
00:23:32,000 --> 00:23:36,000
Next we need to define the segmentation model.

283
00:23:48,000 --> 00:23:51,000
Next we need to also define the pose estimation model.

284
00:23:59,000 --> 00:24:00,000
Okay.

285
00:24:00,000 --> 00:24:01,000
So that things work.

286
00:24:06,000 --> 00:24:09,000
So we are good for now.

287
00:24:15,000 --> 00:24:18,000
Next we need to add the code for the case where

288
00:24:18,000 --> 00:24:22,000
the user selects the detection, segmentation or pose estimation model.

289
00:24:37,000 --> 00:24:39,000
Now if the model type is

290
00:24:45,000 --> 00:24:47,000
if model type is equal to detection,

291
00:24:49,000 --> 00:24:52,000
model path is equal to Path

292
00:24:55,000 --> 00:24:57,000
detection model. Elif,

293
00:25:02,000 --> 00:25:03,000
if the model type is

294
00:25:03,000 --> 00:25:05,000
segmentation,

295
00:25:07,000 --> 00:25:09,000
model path is

296
00:25:10,000 --> 00:25:22,000
Path segmentation model. And elif model type is equal to pose estimation,

297
00:25:27,000 --> 00:25:30,000
model path is Path

298
00:25:32,000 --> 00:25:33,000
pose estimation

299
00:25:33,000 --> 00:25:33,000
model.
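The weights settings and the if/elif selection described above can be sketched like this. The weight file names are the standard Ultralytics YOLO11 nano names; the constant names, the task strings and the function name select_model_path are assumptions, and the weights directory is written relative to the project root.
```python
from pathlib import Path
MODEL_DIR = Path("weights")
DETECTION_MODEL = MODEL_DIR / "yolo11n.pt"
# For a custom model, point DETECTION_MODEL at your own .pt file instead.
SEGMENTATION_MODEL = MODEL_DIR / "yolo11n-seg.pt"
POSE_ESTIMATION_MODEL = MODEL_DIR / "yolo11n-pose.pt"
def select_model_path(model_type):
    """Map the task chosen in the sidebar to the matching weights file."""
    if model_type == "Detection":
        return DETECTION_MODEL
    elif model_type == "Segmentation":
        return SEGMENTATION_MODEL
    elif model_type == "Pose Estimation":
        return POSE_ESTIMATION_MODEL
    raise ValueError(f"Unknown model type: {model_type}")
```
Raising an error for an unexpected task name is an extra safety net that is not mentioned in the video.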

300
00:25:42,000 --> 00:25:44,000
So next we will load the YOLO model.

301
00:25:46,000 --> 00:25:48,000
We'll write try,

302
00:25:50,000 --> 00:25:53,000
YOLO model path.

303
00:25:56,000 --> 00:25:59,000
And write except Exception as e,

304
00:26:02,000 --> 00:26:04,000
st.error,

305
00:26:07,000 --> 00:26:12,000
and here we write: unable to load model.

306
00:26:12,000 --> 00:26:15,000
Check the specified path.

307
00:26:21,000 --> 00:26:21,000
Okay.

308
00:26:27,000 --> 00:26:29,000
Simply write this over here.

309
00:26:32,000 --> 00:26:36,000
And we can also display the error in our Streamlit application.
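A minimal sketch of this try/except pattern is shown below. To keep it runnable without the libraries installed, the loader and the error reporter are passed in as parameters; the function name load_model is an assumption, and in main.py the arguments would be the YOLO class from ultralytics and st.error from Streamlit.
```python
def load_model(model_path, loader, report_error):
    """Try to build the model; on failure report the problem and return None."""
    try:
        return loader(model_path)
    except Exception as e:
        report_error(f"Unable to load model. Check the specified path: {model_path}")
        report_error(str(e))  # also show the underlying error message
        return None
# In main.py this would be called roughly as:
#   model = load_model(model_path, YOLO, st.error)
```
Returning None on failure lets the rest of the script decide whether to stop or to continue without a model.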

310
00:26:42,000 --> 00:26:46,000
Now we can set up the image and video configuration.

311
00:27:05,000 --> 00:27:10,000
We will give the user the option to select whether they want to perform object detection, instance segmentation

312
00:27:10,000 --> 00:27:13,000
and pose estimation on an image or on a video.

313
00:27:13,000 --> 00:27:15,000
So we'll give the user the option.

314
00:27:23,000 --> 00:27:26,000
We can write st.sidebar.radio.

315
00:27:29,000 --> 00:27:30,000
Like source.

316
00:27:34,000 --> 00:27:35,000
Source list.

317
00:27:39,000 --> 00:27:40,000
Sorry.

318
00:27:40,000 --> 00:27:43,000
We initialize source image equal to None.

319
00:27:46,000 --> 00:27:47,000
So if that.

320
00:27:47,000 --> 00:27:51,000
If the user selects an image.

321
00:27:59,000 --> 00:28:01,000
We will ask the user to upload an image.

322
00:28:16,000 --> 00:28:20,000
And the Streamlit application will accept images of type

323
00:28:24,000 --> 00:28:25,000
JPG, PNG,

324
00:28:28,000 --> 00:28:29,000
JPEG,

325
00:28:31,000 --> 00:28:32,000
BMP.

326
00:28:36,000 --> 00:28:38,000
These are the image types I know of.

327
00:28:38,000 --> 00:28:42,000
If you know some other, you can also add it over here as well.

328
00:28:44,000 --> 00:28:48,000
Next we will create two columns: one will contain the default image and the other will contain the detected

329
00:28:48,000 --> 00:28:52,000
image, or rather one will contain the

330
00:28:52,000 --> 00:28:57,000
image that the user has uploaded and the other our output image with the detections.

331
00:29:03,000 --> 00:29:05,000
So first we will go to column one.

332
00:29:12,000 --> 00:29:17,000
So if the source image is None, that is, the user hasn't uploaded an image,

333
00:29:23,000 --> 00:29:26,000
then we will define a default image path over here.

334
00:29:37,000 --> 00:29:39,000
And we will say.

335
00:29:42,000 --> 00:29:43,000
This is our default image.

336
00:29:57,000 --> 00:29:58,000
You can also open this image.

337
00:30:10,000 --> 00:30:14,000
And we can also display this image in our Streamlit application using st.image.

338
00:30:16,000 --> 00:30:17,000
Right.

339
00:30:17,000 --> 00:30:18,000
Default image path.

340
00:30:21,000 --> 00:30:24,000
And our caption will be Default Image.

341
00:30:48,000 --> 00:30:48,000
Else.

342
00:30:48,000 --> 00:30:55,000
If the user uploads an image and source image is not None, then we will display the uploaded image.

343
00:30:59,000 --> 00:31:06,000
So if the user uploads an image, then we will display the source image.

344
00:31:09,000 --> 00:31:13,000
Caption will be uploaded.

345
00:31:13,000 --> 00:31:14,000
Image.

346
00:31:22,000 --> 00:31:22,000
Okay.

347
00:31:25,000 --> 00:31:27,000
And now we will add except as well.

348
00:31:27,000 --> 00:31:32,000
Except Exception as e.

349
00:31:36,000 --> 00:31:37,000
Okay.

350
00:31:38,000 --> 00:31:39,000
What will be.

351
00:31:42,000 --> 00:31:43,000
Any.

352
00:31:53,000 --> 00:31:54,000
Error occurred.

353
00:31:56,000 --> 00:31:57,000
While.

354
00:32:03,000 --> 00:32:06,000
opening the image, and we will

355
00:32:08,000 --> 00:32:10,000
display the error as well.

356
00:32:13,000 --> 00:32:16,000
Now we'll create another column as well.

357
00:32:31,000 --> 00:32:33,000
If source image is None,

358
00:32:36,000 --> 00:32:42,000
then we'll define the default detected image path over here.

359
00:33:03,000 --> 00:33:04,000
Okay.

360
00:33:10,000 --> 00:33:14,000
And here we'll pass the default detected image.

361
00:33:16,000 --> 00:33:17,000
Path.

362
00:33:19,000 --> 00:33:21,000
Caption will be detected.

363
00:33:21,000 --> 00:33:22,000
Image.

364
00:33:39,000 --> 00:33:49,000
Else, if the source image is there and the user clicks the sidebar button, which is

365
00:33:51,000 --> 00:33:53,000
Detect Objects,

366
00:33:57,000 --> 00:34:02,000
then we will use model.predict.

367
00:34:03,000 --> 00:34:04,000
Okay.

368
00:34:05,000 --> 00:34:08,000
And here we will pass the uploaded image.

369
00:34:11,000 --> 00:34:12,000
And.

370
00:34:16,000 --> 00:34:18,000
The first uploaded image over here.

371
00:34:19,000 --> 00:34:20,000
Then.

372
00:34:28,000 --> 00:34:32,000
And we also pass the confidence value.

373
00:34:36,000 --> 00:34:37,000
It'sand.

374
00:34:42,000 --> 00:34:45,000
Let's plot the result as well.

375
00:35:25,000 --> 00:35:30,000
And we'll also show the bounding box coordinates there as well.

376
00:35:31,000 --> 00:35:32,000
Right.

377
00:35:32,000 --> 00:35:34,000
With st.expander.

378
00:36:02,000 --> 00:36:04,000
And we will display the error.

379
00:36:08,000 --> 00:36:13,000
And we can also add an except section over here as well.

380
00:36:27,000 --> 00:36:28,000
Worthwhile.

381
00:36:35,000 --> 00:36:35,000
Okay.

382
00:36:35,000 --> 00:36:36,000
That works.
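The prediction step for the second column might be sketched as follows. model.predict with a conf argument, results[0].plot(), the boxes attribute and box.data are Ultralytics calls, and st.image, st.expander, st.write and st.error are Streamlit calls; the function names, the captions and the error text are assumptions. The model and the Streamlit module are passed in so the logic can be tried with stand-ins.
```python
def detect_and_plot(model, image, confidence):
    """Run the model on one image; return the annotated image and the boxes."""
    results = model.predict(image, conf=confidence)
    first = results[0]  # one input image gives one result
    annotated_bgr = first.plot()  # Ultralytics draws the detections on a BGR array
    return annotated_bgr, first.boxes
def show_detection(st, model, image, confidence):
    """Display the annotated image and list each box inside an expander."""
    try:
        annotated_bgr, boxes = detect_and_plot(model, image, confidence)
        st.image(annotated_bgr, caption="Detected Image", channels="BGR")
        with st.expander("Detection Results"):
            for box in boxes:
                st.write(box.data)  # tensor with the box coordinates, score and class
    except Exception as e:
        st.error("Error occurred while running detection.")
        st.error(str(e))
```
Passing channels="BGR" to st.image is one way to handle the colour order; reversing the last axis of the array before displaying it is another.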

383
00:36:38,000 --> 00:36:43,000
Now, let's test how our Streamlit application works on an image.

384
00:36:43,000 --> 00:36:48,000
And if it works fine, then we'll go ahead with the video as well.

385
00:36:52,000 --> 00:36:55,000
Okay, so let's test our Streamlit application.

386
00:36:58,000 --> 00:37:03,000
So here you can see that this is a Streamlit application that we have created so far.

387
00:37:03,000 --> 00:37:08,000
So first, let's do detection. I will just upload this image.

388
00:37:10,000 --> 00:37:12,000
And if I click on detect object from here.

389
00:37:12,000 --> 00:37:14,000
So this is the uploaded image.

390
00:37:14,000 --> 00:37:15,000
Okay.

391
00:37:15,000 --> 00:37:19,000
So now you can see over here we are able to detect the person, purse, and backpack.

392
00:37:19,000 --> 00:37:24,000
And here you can see we have the bounding box coordinates for each detected object in the form of a

393
00:37:24,000 --> 00:37:25,000
tensor.

394
00:37:25,000 --> 00:37:29,000
We can also do segmentation and let's click on detect object from here.

395
00:37:31,000 --> 00:37:34,000
So now you can see that we are able to perform instance segmentation.

396
00:37:34,000 --> 00:37:38,000
And we have drawn a mask around each of the detected objects.

397
00:37:38,000 --> 00:37:40,000
We can also perform pose estimation.

398
00:37:40,000 --> 00:37:43,000
And let's click on detect object from here.

399
00:37:46,000 --> 00:37:49,000
So now you can see we are able to perform pose estimation.

400
00:37:49,000 --> 00:37:52,000
And we have drawn key points on each of the detected objects.

401
00:37:52,000 --> 00:37:58,000
And here you can see we have the bounding box coordinates appearing over here for each of the objects.

402
00:37:58,000 --> 00:38:00,000
Okay so we are done with the image part.

403
00:38:00,000 --> 00:38:05,000
Let's go ahead and move on to the video part as well.

404
00:38:05,000 --> 00:38:06,000
Okay.

405
00:38:12,000 --> 00:38:13,000
So we can write elif

406
00:38:16,000 --> 00:38:20,000
source radio is equal to Video.

407
00:38:22,000 --> 00:38:23,000
Copy.

408
00:38:23,000 --> 00:38:23,000
Write:

409
00:38:23,000 --> 00:38:32,000
source video is equal to st.sidebar.selectbox.

410
00:38:38,000 --> 00:38:40,000
Choose a video.

411
00:38:50,000 --> 00:38:52,000
The videos dictionary dot keys.

412
00:38:59,000 --> 00:39:02,000
And write: with open, the videos dictionary

413
00:39:05,000 --> 00:39:10,000
dot get, source video, in read-binary mode,

414
00:39:13,000 --> 00:39:24,000
as the video file. So we need to read the video as bytes so we can perform object detection on the video.

415
00:39:37,000 --> 00:39:41,000
st.video will display the video in our Streamlit application.
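
The selection step above can be sketched as follows, assuming the videos live in one local folder. The function names, the `.mp4` filter, and the select box label are hypothetical; only `st.sidebar.selectbox`, `st.video`, and reading the file in binary mode come from the narration.

```python
from pathlib import Path


def build_videos_dict(video_dir, suffix=".mp4"):
    """Map a display name to the path of each video file in video_dir."""
    return {p.stem: p for p in sorted(Path(video_dir).glob(f"*{suffix}"))}


def read_video_bytes(videos, name):
    """Read the selected video as bytes, which st.video accepts directly."""
    with open(videos[name], "rb") as video_file:
        return video_file.read()


def video_sidebar(videos):
    """Let the user pick a video in the sidebar and preview it on the page."""
    import streamlit as st  # lazy import keeps the helpers above usable alone

    source_video = st.sidebar.selectbox("Choose a video", list(videos.keys()))
    st.video(read_video_bytes(videos, source_video))
    return source_video
```

A dictionary keeps the names shown in the sidebar separate from the file paths, so the select box only ever displays the keys.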

416
00:39:56,000 --> 00:39:59,000
So if the user clicks on this button right here,

417
00:40:02,000 --> 00:40:02,000
And.

418
00:40:09,000 --> 00:40:12,000
Then we'll add a try and except block.

419
00:40:34,000 --> 00:40:35,000
Okay.

420
00:40:42,000 --> 00:40:44,000
So we'll create an empty frame for now.

421
00:40:48,000 --> 00:40:51,000
Now we'll just create a while loop over here.

422
00:41:11,000 --> 00:41:14,000
And if we are able to read the frame then.

423
00:41:16,000 --> 00:41:19,000
We'll just resize the frame over here.

424
00:41:48,000 --> 00:41:55,000
Next we will predict the objects in the image using YOLO 11.

425
00:41:57,000 --> 00:41:59,000
Write: result is equal to model dot

426
00:41:59,000 --> 00:42:06,000
predict, with the image, and conf is equal to the confidence value.

427
00:42:10,000 --> 00:42:16,000
Next we will plot the detected objects on the video frame.

428
00:42:34,000 --> 00:42:36,000
We have already created an empty frame.

429
00:42:36,000 --> 00:42:39,000
We will be plotting our detected objects on that frame.

430
00:43:16,000 --> 00:43:23,000
And if the video frames end, then we'll release the video capture and we will break the while loop.

431
00:43:25,000 --> 00:43:29,000
And we can also add an except condition here.

432
00:43:46,000 --> 00:43:48,000
Error loading video.

433
00:43:53,000 --> 00:43:58,000
Okay, so we will display the error over here as well.
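
The video loop described above can be sketched like this, assuming OpenCV reads the file and a Streamlit placeholder shows each annotated frame. The function names and the 720 by 405 frame size are assumptions for illustration; the video does not state them in this part.

```python
def stream_detections(capture, model, conf, show, resize=None):
    """Read frames until the video ends, predict on each, and pass the plot to show."""
    frames = 0
    try:
        while capture.isOpened():
            ok, frame = capture.read()
            if not ok:  # no more frames: leave the loop
                break
            if resize is not None:
                frame = resize(frame)
            results = model.predict(frame, conf=conf)
            show(results[0].plot())  # annotated frame as a BGR array
            frames += 1
    finally:
        capture.release()  # always free the video handle
    return frames


def run_in_streamlit(video_path, model, conf):
    """Wire the loop to OpenCV and a Streamlit placeholder."""
    import cv2  # lazy imports keep stream_detections usable on its own
    import streamlit as st

    placeholder = st.empty()  # the "empty frame" that each result overwrites
    try:
        capture = cv2.VideoCapture(str(video_path))
        stream_detections(
            capture,
            model,
            conf,
            show=lambda img: placeholder.image(img, channels="BGR"),
            resize=lambda f: cv2.resize(f, (720, 405)),  # assumed display size
        )
    except Exception as exc:
        st.sidebar.error(f"Error loading video: {exc}")
```

Writing every frame into the same placeholder makes the page look like a playing video instead of a growing list of images.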

434
00:43:59,000 --> 00:44:02,000
So now we have created our Streamlit application.

435
00:44:02,000 --> 00:44:06,000
Now it's time to test our Streamlit application.

436
00:44:06,000 --> 00:44:11,000
So we can simply write over here: streamlit run main.py.

437
00:44:12,000 --> 00:44:20,000
Okay so let's test our Streamlit application and let's see if we are able to do detection on image and

438
00:44:20,000 --> 00:44:21,000
video as well.

439
00:44:21,000 --> 00:44:23,000
We have tested on images already.

440
00:44:23,000 --> 00:44:26,000
Now we will mainly test on video.

441
00:44:31,000 --> 00:44:34,000
So first of all I will just click on video from here.

442
00:44:34,000 --> 00:44:41,000
And I can just select video two and click on Detect Video Objects from here.

443
00:44:41,000 --> 00:44:46,000
And now you can see over here we are able to detect a person and a bicycle.

444
00:44:46,000 --> 00:44:49,000
And the detection results look quite promising.

445
00:44:50,000 --> 00:44:53,000
If you want to perform segmentation you can select segmentation.

446
00:44:53,000 --> 00:44:53,000
From here.

447
00:44:53,000 --> 00:44:57,000
You can select any video that you have saved in the video directory.

448
00:44:57,000 --> 00:45:00,000
And then you will click Detect Video Objects.

449
00:45:00,000 --> 00:45:03,000
Now you can see that we are able to perform instance segmentation.

450
00:45:03,000 --> 00:45:06,000
We are able to detect traffic lights, a bicycle, and a person.

451
00:45:06,000 --> 00:45:11,000
And you can see over here we have drawn a mask around each of the detected objects.

452
00:45:11,000 --> 00:45:12,000
So this is what we do

453
00:45:12,000 --> 00:45:13,000
in instance segmentation.

454
00:45:13,000 --> 00:45:18,000
In instance segmentation, we draw a mask around each of the detected objects.

455
00:45:18,000 --> 00:45:24,000
Similarly, we can perform pose estimation and click on Detect Video Objects from here.

456
00:45:24,000 --> 00:45:26,000
So now you can see we have performed pose estimation.

457
00:45:26,000 --> 00:45:30,000
And we have drawn key points on each of the detected objects.

458
00:45:30,000 --> 00:45:35,000
Similarly you can adjust the confidence threshold to 79 as well.

459
00:45:35,000 --> 00:45:40,000
And you can see now, as I increase the confidence threshold, the number of detections has decreased.

460
00:45:40,000 --> 00:45:42,000
But now the remaining detections are more reliable.

461
00:45:42,000 --> 00:45:47,000
But if I decrease the confidence score, there can be some false positives as well,

462
00:45:47,000 --> 00:45:49,000
but there will be a larger number of detections.
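
To illustrate the trade-off just described, here is a small sketch of how a confidence threshold filters detections. The labels and scores are made up for illustration and are not output from the video, and treating 79 on the slider as 0.79 assumes the slider is a percentage.

```python
def filter_by_confidence(detections, threshold):
    """Keep only (label, score) pairs whose score meets the threshold."""
    return [(label, score) for label, score in detections if score >= threshold]


# Made-up scores for illustration only.
detections = [
    ("person", 0.91),
    ("bicycle", 0.83),
    ("traffic light", 0.52),
    ("handbag", 0.31),
]

low = filter_by_confidence(detections, 0.25)   # permissive: keeps weak candidates
high = filter_by_confidence(detections, 0.79)  # strict: fewer, higher-scoring boxes
print(len(low), len(high))  # 4 2
```

A higher threshold drops the low-scoring boxes, which removes likely false positives but can also discard real objects the model was unsure about.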

463
00:45:51,000 --> 00:45:53,000
Similarly you can do it for image as well.

464
00:45:53,000 --> 00:45:55,000
We have already tried on images.

465
00:45:58,000 --> 00:45:59,000
And detect objects.

466
00:46:01,000 --> 00:46:05,000
And let me see if I have some other image over here as well.

467
00:46:08,000 --> 00:46:12,000
Let's use this image and let's click on detect objects.

468
00:46:12,000 --> 00:46:15,000
So now you can see over here we are able to detect the objects.

469
00:46:15,000 --> 00:46:20,000
And we are also able to draw the key points on each of the detected objects because we have selected

470
00:46:20,000 --> 00:46:21,000
pose estimation.

471
00:46:21,000 --> 00:46:28,000
And if I just select segmentation, you can see we are able to draw a mask around each of the detected objects.

472
00:46:31,000 --> 00:46:37,000
So now you can see over here with the help of instance segmentation, we are able to draw a mask around

473
00:46:37,000 --> 00:46:38,000
each of the detected objects.

474
00:46:38,000 --> 00:46:43,000
And if I click on detection from here and click on Detect Objects,

475
00:46:43,000 --> 00:46:47,000
you can see that I am able to detect different objects like a person and a bus.

476
00:46:47,000 --> 00:46:52,000
And over here, if I just expand this, I have all the bounding box coordinate information in the form

477
00:46:52,000 --> 00:46:54,000
of tensors over here.

478
00:46:54,000 --> 00:46:56,000
So that's all from this tutorial.

479
00:46:56,000 --> 00:47:01,000
In this tutorial we have learned how we can create a Streamlit application to do object detection,

480
00:47:01,000 --> 00:47:07,000
instance segmentation and pose estimation with the help of Ultralytics YOLO 11 on images as well as

481
00:47:07,000 --> 00:47:08,000
on videos.

482
00:47:08,000 --> 00:47:09,000
That's all from this video.

483
00:47:09,000 --> 00:47:11,000
Thank you for watching.

