1
00:00:02,000 --> 00:00:03,000
Hello everyone.

2
00:00:03,000 --> 00:00:08,000
In this video tutorial, we will run YOLOv8 in Google Colab.

3
00:00:08,000 --> 00:00:12,000
YOLOv8 is faster and more accurate than

4
00:00:12,000 --> 00:00:18,000
YOLOv7. YOLOv8 is trained for object detection, image segmentation, and image classification tasks.

5
00:00:19,000 --> 00:00:24,000
We will run object detection and segmentation tasks in this tutorial.

6
00:00:24,000 --> 00:00:28,000
So here are the different models of YOLOv8.

7
00:00:28,000 --> 00:00:29,000
You can see over here.

8
00:00:29,000 --> 00:00:33,000
This snapshot is taken from the GitHub repository of YOLOv8.

9
00:00:33,000 --> 00:00:36,000
So we have five different models.

10
00:00:36,000 --> 00:00:44,000
The first model is YOLOv8n. YOLOv8n is the smallest in size and the fastest, but it is less

11
00:00:44,000 --> 00:00:45,000
accurate.

12
00:00:45,000 --> 00:00:49,000
YOLOv8x, in contrast, is larger in size and

13
00:00:49,000 --> 00:00:57,000
not as fast as YOLOv8n, but it is more accurate than all the other YOLOv8 models.

14
00:00:57,000 --> 00:01:03,000
Okay, so before we run this script, please make sure you have selected GPU as the runtime.

15
00:01:03,000 --> 00:01:05,000
To check this, click on Runtime.

16
00:01:05,000 --> 00:01:11,000
Go to Change runtime type and make sure that the hardware accelerator is set to GPU.

17
00:01:11,000 --> 00:01:17,000
If it is set to None or TPU, please select GPU and click on Save.

18
00:01:18,000 --> 00:01:21,000
Now we will first import all the required libraries.

19
00:01:21,000 --> 00:01:26,000
So first of all, we will write from IPython.display import Image.

20
00:01:26,000 --> 00:01:33,000
We need this import because we want to display input and output images in our

21
00:01:33,000 --> 00:01:41,000
Google Colab notebook. To display any input or output image in the Google Colab notebook, we require

22
00:01:41,000 --> 00:01:45,000
this line: from IPython.display import Image.

23
00:01:45,000 --> 00:01:48,000
So let's run this cell to import this library.

24
00:01:49,000 --> 00:01:56,000
Now, to run object detection, image classification, and image segmentation tasks, we will install ultralytics.

25
00:01:56,000 --> 00:01:59,000
So here I'm doing pip install ultralytics.

26
00:02:00,000 --> 00:02:03,000
So it might take a few seconds to install.

27
00:02:05,000 --> 00:02:12,000
YOLOv8 is the first iteration of YOLO to have an official package, so we can set up YOLOv8

28
00:02:12,000 --> 00:02:17,000
either with pip install ultralytics or by cloning the GitHub repo.

29
00:02:18,000 --> 00:02:18,000
Okay.

30
00:02:18,000 --> 00:02:22,000
But here we would prefer to do pip install ultralytics.

31
00:02:22,000 --> 00:02:26,000
Now we will check whether the GPU is available or not.

32
00:02:26,000 --> 00:02:27,000
To check this,

33
00:02:28,000 --> 00:02:29,000
we run

34
00:02:29,000 --> 00:02:33,000
import torch and torch.cuda.is_available().

35
00:02:33,000 --> 00:02:36,000
So if we have the GPU available, it will print True.

36
00:02:36,000 --> 00:02:39,000
If the GPU is not available, it will print False.

37
00:02:39,000 --> 00:02:45,000
Now, with torch.__version__, we are checking the torch version, which is 1.13.1, the latest PyTorch version,

38
00:02:45,000 --> 00:02:48,000
with CUDA 11.6 (cu116).
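
The GPU and version check described above can be wrapped in one small helper. This is only a sketch: the function name describe_device is made up for illustration, and the import is guarded so the cell still runs on a machine where PyTorch is not installed.

```python
def describe_device() -> str:
    """Summarize whether PyTorch can see a CUDA GPU (helper name is illustrative)."""
    try:
        import torch  # preinstalled on Colab; may be missing elsewhere
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # Index 0 is the first visible GPU, which is the only one on a standard Colab runtime.
        return f"GPU available: {torch.cuda.get_device_name(0)} (torch {torch.__version__})"
    return f"No GPU visible, running on CPU (torch {torch.__version__})"


print(describe_device())
```

If this reports that no GPU is visible in Colab, the runtime type is probably still set to None or TPU.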

39
00:02:48,000 --> 00:02:53,000
So first of all, we will perform or implement the object detection task.

40
00:02:53,000 --> 00:02:57,000
We will start with object detection for the sample image.

41
00:02:57,000 --> 00:02:57,000
Okay.

42
00:02:57,000 --> 00:03:05,000
To run object detection on the sample image, I will use a CLI command, which is

43
00:03:05,000 --> 00:03:05,000
yolo,

44
00:03:05,000 --> 00:03:09,000
task is equal to detect, mode is equal to predict.

45
00:03:09,000 --> 00:03:15,000
We are predicting because we are using a pre-trained model, yolov8s.pt, over here.

46
00:03:15,000 --> 00:03:22,000
So if we want to fine-tune our YOLOv8 model on some dataset, then we will set mode to train.

47
00:03:22,000 --> 00:03:28,000
But here we are only doing the predictions, so we will choose mode as predict and currently we are

48
00:03:28,000 --> 00:03:30,000
performing detection.

49
00:03:30,000 --> 00:03:32,000
So we will choose task is equal to detect.

50
00:03:32,000 --> 00:03:36,000
If we are performing segmentation, then task is equal to segment.

51
00:03:36,000 --> 00:03:40,000
If we are performing classification, then task is equal to classify.
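
To make the shape of that command concrete, here is a small sketch that assembles it as a string. The yolo entry point and its key=value arguments (task, mode, model, source) come from the Ultralytics CLI; the helper build_yolo_command and the file name image.jpg are placeholders made up for this example. The task values the CLI accepts are detect, segment, and classify.

```python
import shlex


def build_yolo_command(task, mode, model, source, **options):
    """Assemble a `yolo` CLI call from key=value pairs (helper name is illustrative)."""
    parts = ["yolo", f"task={task}", f"mode={mode}", f"model={model}", f"source={source}"]
    # Extra switches such as save_txt=True are appended in the same key=value form.
    parts += [f"{key}={value}" for key, value in options.items()]
    # Quote each part so paths containing spaces survive the shell.
    return " ".join(shlex.quote(part) for part in parts)


cmd = build_yolo_command("detect", "predict", "yolov8s.pt", "image.jpg")
cmd_with_labels = build_yolo_command(
    "detect", "predict", "yolov8s.pt", "image.jpg", save_txt=True
)
print(cmd)
print(cmd_with_labels)
```

In a Colab cell the printed line would be run with a leading exclamation mark, since it is a shell command rather than Python.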

52
00:03:42,000 --> 00:03:48,000
And we are using the YOLOv8 model and testing it on a sample image.

53
00:03:48,000 --> 00:03:51,000
And this is the sample image which I am passing over here.

54
00:03:51,000 --> 00:03:59,000
So I have uploaded a sample image and a demo video from my system to the Google Colab notebook

55
00:03:59,000 --> 00:04:08,000
so I can test the YOLO model on a sample image, which is this one, and on a demo video as well.

56
00:04:08,000 --> 00:04:10,000
Okay, so this is the sample which I am using.

57
00:04:10,000 --> 00:04:12,000
So let's run this cell first.

58
00:04:16,000 --> 00:04:20,000
It might take a few seconds, so it will run soon.

59
00:04:20,000 --> 00:04:21,000
Okay.

60
00:04:21,000 --> 00:04:23,000
The model has been downloaded from here.

61
00:04:24,000 --> 00:04:25,000
And.

62
00:04:26,000 --> 00:04:29,000
It completed the work in just a few seconds.

63
00:04:30,000 --> 00:04:37,000
So our model has detected 11 persons, one bicycle, one motorcycle, one bus, two traffic

64
00:04:37,000 --> 00:04:42,000
lights, two backpacks, and three handbags in 15.5 milliseconds.

65
00:04:42,000 --> 00:04:43,000
It is fast.

66
00:04:44,000 --> 00:04:49,000
Okay, so our output image is saved in runs/detect/predict.

67
00:04:49,000 --> 00:04:57,000
So this is our runs folder; inside detect, inside predict, we have our image.jpg. To display this image,

68
00:04:57,000 --> 00:05:01,000
click here, choose Copy path, and just paste it over here.

69
00:05:02,000 --> 00:05:08,000
Okay, so you know that at the start we imported Image to display an input

70
00:05:08,000 --> 00:05:11,000
or output image in the Google Colab notebook.

71
00:05:11,000 --> 00:05:16,000
So I'm displaying the output image in the Google Colab notebook using Image.

72
00:05:16,000 --> 00:05:17,000
So just run this cell.

73
00:05:20,000 --> 00:05:22,000
This is our output image.

74
00:05:22,000 --> 00:05:25,000
Our model has successfully detected persons here.

75
00:05:25,000 --> 00:05:27,000
This is a wrong prediction the model has made.

76
00:05:27,000 --> 00:05:28,000
This is the backpack.

77
00:05:28,000 --> 00:05:29,000
This is the bus.

78
00:05:29,000 --> 00:05:30,000
These are the traffic lights.

79
00:05:30,000 --> 00:05:32,000
These are all the persons.

80
00:05:32,000 --> 00:05:34,000
So the.

81
00:05:34,000 --> 00:05:37,000
Yes, there is a wrong prediction over here.

82
00:05:37,000 --> 00:05:38,000
Over here as well.

83
00:05:38,000 --> 00:05:40,000
These are the errors which the model has made.

84
00:05:40,000 --> 00:05:44,000
Back and back like this.

85
00:05:44,000 --> 00:05:49,000
So a few predictions are wrong, but the majority of the predictions are correct.

86
00:05:51,000 --> 00:05:51,000
Okay.

87
00:05:51,000 --> 00:05:57,000
So suppose we want to save the bounding box coordinate information, like what the coordinates of this

88
00:05:57,000 --> 00:06:00,000
bounding box are, or what the coordinates of that bounding box are.

89
00:06:00,000 --> 00:06:06,000
So to save the bounding box information, like the coordinates of all the bounding boxes which we have here,

90
00:06:06,000 --> 00:06:10,000
we will set save_txt is equal to True. By setting

91
00:06:10,000 --> 00:06:17,000
save_txt is equal to True, we will save the bounding box coordinate information along with the output image

92
00:06:17,000 --> 00:06:18,000
as well.

93
00:06:18,000 --> 00:06:23,000
So let's set save_txt is equal to True and just run this cell.

94
00:06:23,000 --> 00:06:24,000
Now.

95
00:06:24,000 --> 00:06:27,000
It will save the bounding box information as well.

96
00:06:30,000 --> 00:06:31,000
Okay.

97
00:06:31,000 --> 00:06:35,000
So our output is saved in the predict2 labels folder.

98
00:06:35,000 --> 00:06:36,000
Let's see.

99
00:06:37,000 --> 00:06:42,000
It is in runs/detect/predict2, and here is the output image.
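
Each run lands in a new numbered folder (predict, predict2, predict3, and so on), so it can help to look up the newest one in code instead of copying the path by hand. The helper below is a sketch under that naming assumption; latest_run_dir is a made-up name, and the demo runs on a temporary folder rather than a real runs/detect directory.

```python
import re
import tempfile
from pathlib import Path


def latest_run_dir(root):
    """Return the predict, predict2, ... folder with the highest number, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    best, best_n = None, 0
    for child in root.iterdir():
        match = re.fullmatch(r"predict(\d*)", child.name)
        if child.is_dir() and match:
            n = int(match.group(1) or 1)  # plain "predict" counts as run 1
            if n > best_n:
                best, best_n = child, n
    return best


# Demo on a throwaway folder that mimics runs/detect after three runs.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("predict", "predict2", "predict10"):
        (Path(tmp) / name).mkdir()
    latest_name = latest_run_dir(tmp).name

print(latest_name)
```

Comparing the numbers rather than the folder names matters here, because a plain string sort would place predict10 before predict2.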

100
00:06:42,000 --> 00:06:45,000
So just go over here, choose Copy path,

101
00:06:46,000 --> 00:06:48,000
And just paste it over here.

102
00:06:48,000 --> 00:06:52,000
Now just run this cell and see our output image.

103
00:06:53,000 --> 00:06:55,000
So this is our output image.

104
00:06:55,000 --> 00:07:01,000
But we have used save_txt is equal to True over here, which will save the bounding box coordinates.

105
00:07:02,000 --> 00:07:03,000
You can see the information.

106
00:07:03,000 --> 00:07:11,000
So here we have the bounding box information. Here, 21 different objects are detected in this image.

107
00:07:11,000 --> 00:07:15,000
You can see there are 21 different objects in this image.

108
00:07:15,000 --> 00:07:18,000
And here is the coordinate information for all of them.

109
00:07:18,000 --> 00:07:25,000
So 24, 3, 1, 0 represent the classes; that is the first term.

110
00:07:25,000 --> 00:07:31,000
The first term in every line represents the class: 24 represents a class, 3 represents a

111
00:07:31,000 --> 00:07:35,000
class, 1 represents a class, 0 represents the person class.

112
00:07:35,000 --> 00:07:39,000
So the first term represents the class.

113
00:07:39,000 --> 00:07:42,000
This one represents the x1, y1.

114
00:07:42,000 --> 00:07:42,000
This.

115
00:07:42,000 --> 00:07:44,000
This one represents x1.

116
00:07:44,000 --> 00:07:49,000
This represents y1, so the second term is basically x1.

117
00:07:49,000 --> 00:07:53,000
The third term is y1, so x1 and y1 are these points.

118
00:07:53,000 --> 00:07:55,000
And let me draw over here.

119
00:07:55,000 --> 00:07:59,000
I think I have the drawing option over here.

120
00:08:03,000 --> 00:08:03,000
Okay.

121
00:08:03,000 --> 00:08:04,000
Just give me a minute.

122
00:08:04,000 --> 00:08:06,000
I will draw it.

123
00:08:06,000 --> 00:08:07,000
So that I can

124
00:08:08,000 --> 00:08:09,000
explain it better.

125
00:08:09,000 --> 00:08:13,000
I think I can explain it in a much better way.

126
00:08:15,000 --> 00:08:16,000
Okay.

127
00:08:17,000 --> 00:08:18,000
Just give me a minute, guys.

128
00:08:21,000 --> 00:08:21,000
Okay.

129
00:08:21,000 --> 00:08:24,000
So like, you can see this blue line over here.

130
00:08:25,000 --> 00:08:28,000
So x1 is at these coordinates over here.

131
00:08:29,000 --> 00:08:31,000
These are the x1 coordinates over here.

132
00:08:32,000 --> 00:08:34,000
Now just there, all these.

133
00:08:35,000 --> 00:08:37,000
So if you can see this blue line.

134
00:08:38,000 --> 00:08:42,000
So these are the x1 y1 coordinates.

135
00:08:42,000 --> 00:08:43,000
And the last two are.

136
00:08:45,000 --> 00:08:47,000
Let me just unselect it.

137
00:08:48,000 --> 00:08:48,000
Okay.

138
00:08:48,000 --> 00:08:50,000
I'm just trying to isolate it now.

139
00:08:50,000 --> 00:08:54,000
Okay, so the last two are these:

140
00:08:54,000 --> 00:08:59,000
this is the x2 and this is the y2 coordinate, the x2 and y2 coordinates.

141
00:08:59,000 --> 00:09:01,000
So where are the x2 and y2 points?

142
00:09:01,000 --> 00:09:02,000
These are the.

143
00:09:05,000 --> 00:09:08,000
These are the x2 and y2 points.

144
00:09:08,000 --> 00:09:11,000
Over here you can see these are the x2 and y2 points.

145
00:09:11,000 --> 00:09:17,000
So in this way the bounding box coordinate information is saved in that text file with respect to each

146
00:09:17,000 --> 00:09:18,000
of the classes.
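
One caution, stated as my understanding of the Ultralytics output rather than something shown on screen: in a save_txt label file for detection, the four numbers after the class ID are normally the box center x, center y, width, and height, each normalized to the 0 to 1 range, not pixel corner points. The sketch below converts such a line into pixel corners (x1, y1, x2, y2). The sample lines and the 1000 by 500 image size are invented for illustration; class 0 is person, as mentioned above.

```python
from collections import Counter


def parse_label_line(line, img_w, img_h):
    """Convert 'cls xc yc w h' (normalized) into (cls, (x1, y1, x2, y2)) in pixels."""
    fields = line.split()
    cls = int(fields[0])
    xc, yc, w, h = (float(v) for v in fields[1:5])
    x1 = (xc - w / 2) * img_w  # left edge
    y1 = (yc - h / 2) * img_h  # top edge
    x2 = (xc + w / 2) * img_w  # right edge
    y2 = (yc + h / 2) * img_h  # bottom edge
    return cls, (x1, y1, x2, y2)


# Invented sample lines in the assumed format, for a 1000 x 500 pixel image.
sample = """0 0.5 0.5 0.25 0.5
24 0.25 0.75 0.125 0.25"""

boxes = [parse_label_line(line, 1000, 500) for line in sample.splitlines()]
per_class = Counter(cls for cls, _ in boxes)  # how many boxes each class ID has

for cls, corners in boxes:
    print(cls, corners)
print(per_class)
```

Counting the lines of the real label file in the same way should give the number of detected objects, 21 for the sample image here.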

147
00:09:20,000 --> 00:09:21,000
Okay.

148
00:09:21,000 --> 00:09:22,000
So.

149
00:09:23,000 --> 00:09:26,000
But if we want to crop each of the

150
00:09:27,000 --> 00:09:28,000
detected objects,

151
00:09:28,000 --> 00:09:31,000
for example, if you want to crop each of the detected objects,

152
00:09:31,000 --> 00:09:36,000
then for this we can just write save_crop is equal to True.

153
00:09:36,000 --> 00:09:37,000
So here I have written.

154
00:09:39,000 --> 00:09:40,000
Just remove this thing.

155
00:09:40,000 --> 00:09:42,000
These are not required.

156
00:09:43,000 --> 00:09:47,000
So here I've written save_crop

157
00:09:47,000 --> 00:09:53,000
is equal to True, so it will crop each of the detected objects.

158
00:09:53,000 --> 00:09:55,000
Okay, so let's run this cell.

159
00:09:56,000 --> 00:10:02,000
It will crop each of the detected objects separately and save each one in a separate image file.

160
00:10:03,000 --> 00:10:04,000
Okay, let me show you.

161
00:10:04,000 --> 00:10:07,000
So our results are saved in runs/detect/predict3.

162
00:10:07,000 --> 00:10:14,000
So just go here, into predict3, and the cropped images are over here, so you can see that we have

163
00:10:17,000 --> 00:10:18,000
backpack,

164
00:10:18,000 --> 00:10:18,000
bicycle,

165
00:10:18,000 --> 00:10:20,000
bus, handbag,

166
00:10:20,000 --> 00:10:20,000
motorcycle,

167
00:10:20,000 --> 00:10:22,000
person, traffic light.

168
00:10:22,000 --> 00:10:24,000
So you can see over here.

169
00:10:25,000 --> 00:10:31,000
So each detected person's bounding box is saved over here; as you can see,

170
00:10:31,000 --> 00:10:36,000
we have a separate image for each person, as you can see over here.

171
00:10:37,000 --> 00:10:43,000
Each detected person's bounding box is saved separately as an image in a .jpg file.

172
00:10:43,000 --> 00:10:50,000
To do this we just write save_crop is equal to True, which saves each of the bounding boxes separately.
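
As a quick check on the cropped output, the crops can be counted per class. This sketch assumes the layout seen in the file browser, a crops folder inside the run folder with one subfolder per class name; count_crops is a made-up helper, and the demo builds a fake folder tree instead of reading a real run.

```python
import tempfile
from pathlib import Path


def count_crops(run_dir):
    """Count .jpg crops per class folder under <run_dir>/crops (assumed layout)."""
    crops_root = Path(run_dir) / "crops"
    if not crops_root.is_dir():
        return {}
    counts = {}
    for class_dir in sorted(crops_root.iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() == ".jpg"
            )
    return counts


# Demo on a throwaway folder that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("person/a.jpg", "person/b.jpg", "bus/a.jpg"):
        path = Path(tmp) / "crops" / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    demo_counts = count_crops(tmp)

print(demo_counts)
```

On a real run, the path to pass in would be the run folder itself, for example the predict3 folder mentioned above.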

173
00:10:50,000 --> 00:10:51,000
Okay.

174
00:10:51,000 --> 00:10:53,000
So let me display this over here as well.

175
00:10:55,000 --> 00:10:56,000
Okay.

176
00:10:56,000 --> 00:10:58,000
So in the next step, suppose we want to

177
00:10:59,000 --> 00:11:06,000
remove this label. Here, person is the label, and 0.89 is the confidence.

178
00:11:06,000 --> 00:11:11,000
So if I want to remove the label of person and the 0.89, which is the confidence.

179
00:11:11,000 --> 00:11:13,000
So what I need to do is.

180
00:11:14,000 --> 00:11:19,000
So hide_labels will remove the label, and hide_conf

181
00:11:19,000 --> 00:11:22,000
will hide the confidence value.

182
00:11:22,000 --> 00:11:27,000
hide_labels will hide the label, which is person, bus, or traffic light.

183
00:11:27,000 --> 00:11:31,000
Confidence means how sure the model is that this prediction is correct.

184
00:11:31,000 --> 00:11:37,000
So if we write hide_conf is equal to True, it will hide the confidence value, which varies

185
00:11:37,000 --> 00:11:38,000
from 0 to 1.

186
00:11:38,000 --> 00:11:46,000
So if the confidence value is 0.50, it means the model is very much sure that this prediction is correct.

187
00:11:46,000 --> 00:11:51,000
While if the confidence value is 0.10, it means the model is not sure whether the prediction is

188
00:11:51,000 --> 00:11:53,000
correct or not.
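
To show how a confidence value is typically used, here is a tiny sketch that keeps only the detections at or above a threshold. The person score of 0.89 is the one visible in the output image; the other two entries and the helper keep_confident are invented for illustration.

```python
def keep_confident(detections, threshold=0.5):
    """Keep (label, confidence) pairs whose confidence is at least the threshold."""
    return [(label, conf) for label, conf in detections if conf >= threshold]


# Only the person score comes from the video; the rest is made up.
detections = [("person", 0.89), ("bus", 0.55), ("handbag", 0.10)]
confident = keep_confident(detections, threshold=0.5)
print(confident)
```

Raising the threshold trades missed objects for fewer wrong boxes, which is the same trade-off as the conf setting used later in the Python section.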

189
00:11:53,000 --> 00:11:54,000
Okay.

190
00:11:56,000 --> 00:11:57,000
So now, if I run this cell,

191
00:12:00,000 --> 00:12:01,000
Uh.

192
00:12:01,000 --> 00:12:04,000
Here, you can see that it might take a few seconds.

193
00:12:05,000 --> 00:12:05,000
Run.

194
00:12:05,000 --> 00:12:08,000
So our results are saved in predict4.

195
00:12:08,000 --> 00:12:12,000
So if I go to this over here.

196
00:12:13,000 --> 00:12:15,000
And just copy this.

197
00:12:16,000 --> 00:12:21,000
And just paste it over here and display this output image.

198
00:12:21,000 --> 00:12:31,000
So we can see that in our predictions, the class labels and the confidence values are not there.

199
00:12:31,000 --> 00:12:32,000
We cannot see them.

200
00:12:32,000 --> 00:12:36,000
So we have hidden the class labels and the confidence values.

201
00:12:37,000 --> 00:12:41,000
Okay, so now we can run object detection on videos as well.

202
00:12:41,000 --> 00:12:44,000
So here is a demo video which I have uploaded over here.

203
00:12:44,000 --> 00:12:48,000
You can upload any video of your choice by

204
00:12:48,000 --> 00:12:54,000
right-clicking over here and clicking on the Upload option, which will upload any image or video of your choice

205
00:12:54,000 --> 00:12:55,000
into your local directory.

206
00:12:56,000 --> 00:12:58,000
Okay, so let's run this cell.

207
00:12:59,000 --> 00:13:00,000
In the source,

208
00:13:00,000 --> 00:13:06,000
you just need to pass the path of your video. We already know how to get the video path:

209
00:13:06,000 --> 00:13:07,000
you just click on these

210
00:13:08,000 --> 00:13:12,000
three dots over here, choose Copy path, and just

211
00:13:13,000 --> 00:13:20,000
paste this path over here and run the cell. You can see that we get the predictions in each

212
00:13:20,000 --> 00:13:21,000
of the frames.

213
00:13:21,000 --> 00:13:25,000
These are the predictions in each of the frames: in some frames it is detecting three cars,

214
00:13:25,000 --> 00:13:28,000
in some frames it is detecting five cars, and in some frames

215
00:13:28,000 --> 00:13:30,000
it is detecting seven cars.

216
00:13:32,000 --> 00:13:35,000
So it's not a very big video.

217
00:13:35,000 --> 00:13:38,000
So it will complete very soon, in a few seconds.

218
00:13:40,000 --> 00:13:45,000
So our script has run and our output is saved in runs/detect/predict5.

219
00:13:45,000 --> 00:13:48,000
So here, over here our output is saved.

220
00:13:48,000 --> 00:13:50,000
So just copy this over here.

221
00:13:51,000 --> 00:13:54,000
Copy path and just paste this over here.

222
00:13:56,000 --> 00:13:58,000
And just run this.

223
00:14:00,000 --> 00:14:02,000
I will have the output video over here.

224
00:14:04,000 --> 00:14:05,000
I'm just checking it.

225
00:14:07,000 --> 00:14:11,000
And now we just get the output over here.

226
00:14:11,000 --> 00:14:18,000
So it might take a few seconds more to run, but it will not take more than a few seconds.

227
00:14:18,000 --> 00:14:19,000
So just wait.

228
00:14:19,000 --> 00:14:24,000
And as we get the output demo, we will download it and show it to you.

229
00:14:24,000 --> 00:14:27,000
Then we'll move towards the image segmentation task.

230
00:14:27,000 --> 00:14:34,000
So in just a few moments more it will be downloaded and can be seen over here.

231
00:14:42,000 --> 00:14:45,000
So here we have our output demo video.

232
00:14:45,000 --> 00:14:50,000
Let's download it over here and see what results we got from this.

233
00:14:50,000 --> 00:14:53,000
So we have downloaded it.

234
00:14:54,000 --> 00:14:56,000
Let me just play this.

235
00:15:00,000 --> 00:15:02,000
I do love going to the spaceship.

236
00:15:03,000 --> 00:15:04,000
Okay.

237
00:15:04,000 --> 00:15:09,000
So you can see that we are able to run detections.

238
00:15:09,000 --> 00:15:16,000
You can see that the car is detected, the bus is detected, and the truck. This is a wrong detection, basically

239
00:15:16,000 --> 00:15:18,000
because we are using a small model.

240
00:15:18,000 --> 00:15:21,000
So detections are not very accurate.

241
00:15:21,000 --> 00:15:28,000
But if we use the large YOLOv8 model, the detections will be very good, and the accuracy will also

242
00:15:28,000 --> 00:15:29,000
be very good.

243
00:15:29,000 --> 00:15:32,000
But we will have a compromise in terms of speed.

244
00:15:32,000 --> 00:15:33,000
So.

245
00:15:34,000 --> 00:15:36,000
So these are the results of our detection.

246
00:15:36,000 --> 00:15:39,000
Now let's move towards the segmentation part.

247
00:15:46,000 --> 00:15:49,000
So here we have the image segmentation.

248
00:15:49,000 --> 00:15:52,000
Now we will implement the image segmentation.

249
00:15:52,000 --> 00:16:00,000
So we will make just one change in the CLI command: instead of setting task

250
00:16:00,000 --> 00:16:01,000
equal to detect,

251
00:16:01,000 --> 00:16:06,000
now we'll set task is equal to segment for segmentation, and everything else will be the same: mode is

252
00:16:06,000 --> 00:16:10,000
equal to predict, and for the model

253
00:16:10,000 --> 00:16:18,000
we will again write yolov8s, but here we will add -seg for segmentation.

254
00:16:18,000 --> 00:16:22,000
Similarly, in the case of classification, we just add -cls.

255
00:16:22,000 --> 00:16:24,000
So in the simple case, for detection, we just write

256
00:16:25,000 --> 00:16:31,000
yolov8s.pt, but in the case of segmentation we add -seg, and for classification

257
00:16:31,000 --> 00:16:38,000
we add -cls. And in the source we pass our input image path, which we have passed over here:

258
00:16:38,000 --> 00:16:39,000
image1.jpg.
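
The naming pattern for the pre-trained weights can be written down as a tiny helper. The file names follow the published YOLOv8 checkpoints (sizes n, s, m, l, x, with -seg or -cls added before .pt); the function weights_name itself is just an illustration, not part of the ultralytics package.

```python
# Size letters: nano, small, medium, large, extra large.
SIZES = ("n", "s", "m", "l", "x")
# Suffix added to the base name for each task.
SUFFIXES = {"detect": "", "segment": "-seg", "classify": "-cls"}


def weights_name(size="s", task="detect"):
    """Return the pre-trained weights file name for a model size and task."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}, expected one of {SIZES}")
    if task not in SUFFIXES:
        raise ValueError(f"unknown task {task!r}, expected one of {tuple(SUFFIXES)}")
    return f"yolov8{size}{SUFFIXES[task]}.pt"


print(weights_name("s", "detect"))    # yolov8s.pt
print(weights_name("s", "segment"))   # yolov8s-seg.pt
print(weights_name("x", "classify"))  # yolov8x-cls.pt
```

Switching from the small to the extra large model is then a one-letter change, at the cost of slower inference.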

259
00:16:39,000 --> 00:16:45,000
Now just run this cell and see what output image we get.

260
00:16:45,000 --> 00:16:47,000
So it might take a few seconds.

261
00:16:47,000 --> 00:16:50,000
So let's see what we get in the output.

262
00:16:52,000 --> 00:16:57,000
So it is fusing layers, and here our output image is saved.

263
00:16:57,000 --> 00:17:00,000
So I have already set this path over here.

264
00:17:00,000 --> 00:17:07,000
And if we run this cell, we get our output image, which is like this. Now we can see the segmentation

265
00:17:07,000 --> 00:17:08,000
is implemented.

266
00:17:08,000 --> 00:17:12,000
So the masks here are colored per class, similar to semantic segmentation.

267
00:17:12,000 --> 00:17:18,000
As you can see, all the persons have the same color mask, which is maroon,

268
00:17:18,000 --> 00:17:24,000
while the bus, the traffic lights have some different color masks.

269
00:17:24,000 --> 00:17:27,000
But for the same kind of object, the mask color is the same.

270
00:17:27,000 --> 00:17:30,000
Okay, so we can also hide the labels and confidence

271
00:17:30,000 --> 00:17:36,000
values over here by setting hide_labels is equal to True and hide_conf is equal to True.

272
00:17:36,000 --> 00:17:40,000
So just run this cell and display this output image over here.

273
00:17:43,000 --> 00:17:45,000
So it might take a few seconds.

274
00:17:45,000 --> 00:17:47,000
Then let's see what results we get.

275
00:17:48,000 --> 00:17:51,000
You can see that now we have neither the labels nor the confidence values.

276
00:17:51,000 --> 00:17:54,000
Now we can run segmentation on video as well.

277
00:17:54,000 --> 00:17:57,000
So now let's run the segmentation on video.

278
00:17:58,000 --> 00:18:00,000
So the script is running.

279
00:18:00,000 --> 00:18:04,000
It might take a few seconds to display, but let's see.

280
00:18:05,000 --> 00:18:10,000
So these are the detections in each of the frames, where the model is doing the detections in each of

281
00:18:10,000 --> 00:18:17,000
the frames, like six cars, five trucks, and the time taken to detect these objects is 11.4 milliseconds,

282
00:18:17,000 --> 00:18:19,000
13.3 milliseconds.

283
00:18:19,000 --> 00:18:22,000
So that is the time taken for each frame.
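
Those per-frame times translate directly into a rough frames-per-second figure. This is simple arithmetic on the two numbers read out above, and it assumes the printed milliseconds cover only the model's work on one frame, not video decoding or writing the output file.

```python
def frames_per_second(ms_per_frame):
    """Convert a per-frame time in milliseconds into frames per second."""
    return 1000.0 / ms_per_frame


for ms in (11.4, 13.3):
    print(f"{ms} ms per frame is about {frames_per_second(ms):.1f} frames per second")
```

So the end-to-end speed of the whole script will be lower than these figures suggest.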

284
00:18:22,000 --> 00:18:25,000
So the timing can also be seen over here.

285
00:18:25,000 --> 00:18:29,000
So the process is going on.

286
00:18:29,000 --> 00:18:32,000
Let's see; as it finishes, we will display

287
00:18:32,000 --> 00:18:35,000
our output video over here as well.

288
00:18:38,000 --> 00:18:42,000
So it might take a few more seconds to complete.

289
00:18:42,000 --> 00:18:44,000
But let's wait and see the results.

290
00:18:45,000 --> 00:18:47,000
So our results are saved in runs,

291
00:18:47,000 --> 00:18:48,000
segment,

292
00:18:48,000 --> 00:18:49,000
predict3.

293
00:18:49,000 --> 00:18:56,000
So I've already set this path, and my video name was demo.mp4, so my output video name

294
00:18:56,000 --> 00:18:58,000
is also demo.mp4.

295
00:18:58,000 --> 00:19:02,000
Okay, so let's see the results over here.

296
00:19:02,000 --> 00:19:03,000
What kind of results do we get?

297
00:19:08,000 --> 00:19:10,000
So here is our output demo video.

298
00:19:10,000 --> 00:19:14,000
Let's just play it so you can see that the segmentation is implemented.

299
00:19:14,000 --> 00:19:20,000
We can see the mask over each of the objects. For example, objects of the same class, like

300
00:19:20,000 --> 00:19:26,000
cars, have the same color mask, which is orange, while the truck has a light green color mask

301
00:19:26,000 --> 00:19:29,000
and all the trucks have this light green color mask.

302
00:19:29,000 --> 00:19:32,000
So we are able to implement segmentation on video as well.

303
00:19:33,000 --> 00:19:34,000
Now export the model.

304
00:19:34,000 --> 00:19:41,000
Now let's see how we can export any model, pre-trained or fine-tuned, to any of the supported formats.

305
00:19:41,000 --> 00:19:45,000
For example, I want to export the model to the ONNX format.

306
00:19:45,000 --> 00:19:53,000
So for this I will just write format is equal to onnx, and all the other things will remain the same.

307
00:19:53,000 --> 00:19:57,000
I will just set format is equal to onnx, spelled O, double N, X.

308
00:19:57,000 --> 00:19:59,000
And here I will write my model name.

309
00:19:59,000 --> 00:20:06,000
It can be a pre-trained model or a fine-tuned model, so you can write your own model name and just set

310
00:20:06,000 --> 00:20:10,000
format is equal to onnx.

311
00:20:10,000 --> 00:20:13,000
So our model will be converted into the ONNX format.

312
00:20:13,000 --> 00:20:15,000
So just run this cell over here.

313
00:20:17,000 --> 00:20:19,000
And see what results we get.

314
00:20:21,000 --> 00:20:22,000
Might.

315
00:20:22,000 --> 00:20:23,000
It might take a few seconds.

316
00:20:24,000 --> 00:20:25,000
So just.

317
00:20:26,000 --> 00:20:30,000
So currently we are converting a detection model into the ONNX format.

318
00:20:30,000 --> 00:20:36,000
In the next part we will see how we can convert a segmentation model into the ONNX format.

319
00:20:36,000 --> 00:20:38,000
So let's see.

320
00:20:38,000 --> 00:20:46,000
So here you can see that we have passed yolov8s.pt over here, and it is converted into the .onnx

321
00:20:46,000 --> 00:20:48,000
format over here.

322
00:20:48,000 --> 00:20:52,000
Okay, let's convert the segmentation model into the .onnx format.

323
00:20:52,000 --> 00:20:57,000
So here I have written the name of the segmentation model and format is equal to onnx,

324
00:20:57,000 --> 00:21:01,000
and let's see whether it is converted into the ONNX format or not.

325
00:21:03,000 --> 00:21:09,000
It might take a few seconds. Let's see. So we can see that our segmentation model, which is

326
00:21:09,000 --> 00:21:13,000
yolov8s-seg, is converted into the ONNX format.

327
00:21:14,000 --> 00:21:18,000
So now we have done all this using command line interface.

328
00:21:18,000 --> 00:21:22,000
But if I want to do all this in Python, we can also do this.

329
00:21:22,000 --> 00:21:24,000
So let's start with an image.

330
00:21:24,000 --> 00:21:27,000
For an image, we first import the package.

331
00:21:27,000 --> 00:21:29,000
Let's import YOLO, and

332
00:21:30,000 --> 00:21:35,000
here I will initialize YOLO and assign it to this variable, which I've named model.

333
00:21:35,000 --> 00:21:37,000
So you can write any variable name here.

334
00:21:37,000 --> 00:21:42,000
You can write your own name, or here I can write my own name, which is Moinard, as well.

335
00:21:42,000 --> 00:21:46,000
So you can initialize YOLO with any variable name here.

336
00:21:46,000 --> 00:21:48,000
I've initialized YOLO with the variable name model.

337
00:21:48,000 --> 00:21:55,000
Then I have used the predict method, which basically takes all the parameters of the command line interface,

338
00:21:55,000 --> 00:21:59,000
like model, mode, and task.

339
00:21:59,000 --> 00:22:05,000
So it takes all the parameters of the command line interface, and in the model.predict function

340
00:22:05,000 --> 00:22:12,000
I just pass the source, which is my input image path, and set save is equal to True, which will save

341
00:22:12,000 --> 00:22:16,000
my output image, and set the confidence value to 0.5.

342
00:22:16,000 --> 00:22:22,000
And if I want to save the bounding box coordinates, I will write save_txt is equal to True, which

343
00:22:22,000 --> 00:22:25,000
will save the bounding box coordinates and just run this cell.
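
Put together, the Python version of the prediction step could look like the sketch below. YOLO, predict, and the save, conf, and save_txt arguments are the ones named above; the wrapper run_prediction, the dictionary of options, and the file name image.jpg are placeholders made up for this example. The ultralytics import sits inside the function so the cell can be defined even before the package is installed.

```python
def run_prediction(weights, source, **options):
    """Load a YOLOv8 model and run predict on one image or video (wrapper is illustrative)."""
    from ultralytics import YOLO  # needs `pip install ultralytics`

    model = YOLO(weights)  # e.g. the pre-trained yolov8s.pt
    return model.predict(source=source, **options)


# The settings described above: save the annotated output, keep detections
# with confidence of at least 0.5, and write the box coordinates to a text file.
predict_options = {"save": True, "conf": 0.5, "save_txt": True}

# In Colab, after installing ultralytics and uploading an image:
# results = run_prediction("yolov8s.pt", "image.jpg", **predict_options)
```

For a video, the same call works with the video path as the source, and save_txt can simply be set to False if the coordinates are not needed.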

344
00:22:26,000 --> 00:22:29,000
So here we will get the output image.

345
00:22:29,000 --> 00:22:33,000
It might take a few seconds, but we will get the output image.

346
00:22:33,000 --> 00:22:36,000
So let's see what results we get over here.

347
00:22:36,000 --> 00:22:41,000
So our output image is saved in runs,

348
00:22:41,000 --> 00:22:42,000
detect, predict6.

349
00:22:42,000 --> 00:22:51,000
So if I pass this path over here and just run this, let's see whether, using this method, we are able to

350
00:22:51,000 --> 00:22:52,000
get the results.

351
00:22:52,000 --> 00:22:57,000
So you can see that my results are here and the detections are quite fine.

352
00:22:57,000 --> 00:23:00,000
So now we can use the same method for video as well.

353
00:23:00,000 --> 00:23:05,000
So here I'm just passing the path of my input video and setting the confidence value.

354
00:23:05,000 --> 00:23:07,000
Now I just don't want the coordinates.

355
00:23:07,000 --> 00:23:12,000
So let's set save_txt is equal to False and just run this cell.

356
00:23:12,000 --> 00:23:19,000
So this will save the output video, with the detections drawn on it.

357
00:23:19,000 --> 00:23:20,000
So let's see.

358
00:23:20,000 --> 00:23:22,000
What do we get over here?

359
00:23:22,000 --> 00:23:26,000
So here we are just performing detections.

360
00:23:26,000 --> 00:23:29,000
You can also perform segmentation as well.

361
00:23:29,000 --> 00:23:31,000
So it's your choice.

362
00:23:31,000 --> 00:23:32,000
So.

363
00:23:33,000 --> 00:23:36,000
It will take a few minutes to run.

364
00:23:36,000 --> 00:23:39,000
So let's wait a few minutes until it finishes.

365
00:23:39,000 --> 00:23:43,000
Currently it is still running, and it's about to finish.

366
00:23:43,000 --> 00:23:47,000
So as it finishes, we will see the output results.

367
00:23:47,000 --> 00:23:49,000
What results do we get over here?

368
00:23:49,000 --> 00:23:55,000
And then we will move towards the export section, how we can export our model into the ONNX format.

369
00:23:55,000 --> 00:23:57,000
So it has run successfully.

370
00:23:57,000 --> 00:24:02,000
So I have already run this cell and here is the output video for the detections.

371
00:24:02,000 --> 00:24:07,000
You can see that the detections are working quite fine and the results are very good.

372
00:24:07,000 --> 00:24:12,000
Now we can see how we can export the model into the ONNX format as well.

373
00:24:12,000 --> 00:24:15,000
So to export the model into the ONNX format,

374
00:24:15,000 --> 00:24:20,000
I will just remove the model.predict line and add the model.export line.

375
00:24:20,000 --> 00:24:22,000
So let me show you what it is.

376
00:24:22,000 --> 00:24:27,000
So to convert the model into the ONNX format, just remove this line and, instead of this line, add

377
00:24:27,000 --> 00:24:29,000
the model.export line.

378
00:24:29,000 --> 00:24:38,000
So, as I said, instead of model.predict I have added model.export, and the format I want to convert

379
00:24:38,000 --> 00:24:40,000
my model to is the ONNX format.

380
00:24:40,000 --> 00:24:46,000
So I'm just writing the name of the format, and here is my model name, and just running the export.
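
The export step in Python can be sketched the same way. YOLO and export with format set to onnx are the calls named above; the wrapper export_to_onnx and the helper expected_onnx_path are made up for this example, and the second one only encodes my assumption that the exported file keeps the weights file name with an .onnx extension, which matches what shows up in the file browser here.

```python
from pathlib import Path


def export_to_onnx(weights):
    """Load a YOLOv8 model and export it to ONNX (wrapper is illustrative)."""
    from ultralytics import YOLO  # needs `pip install ultralytics`

    model = YOLO(weights)  # pre-trained or fine-tuned weights
    return model.export(format="onnx")


def expected_onnx_path(weights):
    """Guess where the exported file lands: same name, .onnx extension (an assumption)."""
    return Path(weights).with_suffix(".onnx")


print(expected_onnx_path("yolov8s.pt"))
print(expected_onnx_path("yolov8s-seg.pt"))

# In Colab, after installing ultralytics:
# export_to_onnx("yolov8s.pt")
```

The same function works for the segmentation weights, since only the file name changes.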

381
00:24:46,000 --> 00:24:49,000
So just run this and see what results we get.

382
00:24:50,000 --> 00:24:57,000
So just let me tell you, I will upload this sample image and video, as well as this Colab file, to the

383
00:24:57,000 --> 00:25:03,000
attachments, so you can just download them from there, run the script, and see what results you get.

384
00:25:03,000 --> 00:25:04,000
So now it is done.

385
00:25:04,000 --> 00:25:08,000
So you can see that here is my YOLOv8 model,

386
00:25:09,000 --> 00:25:10,000
the .onnx file.

387
00:25:10,000 --> 00:25:14,000
So the model is successfully exported into the ONNX format.

388
00:25:14,000 --> 00:25:15,000
So this is all from this video.

389
00:25:15,000 --> 00:25:17,000
See you in the next video.

390
00:25:17,000 --> 00:25:18,000
Till then, bye bye.

