1
00:00:03,250 --> 00:00:10,480
In this video tutorial, we will learn how to do object detection using YOLOv10 in Google Colab.

2
00:00:11,140 --> 00:00:16,030
YOLOv10 was developed by researchers at Tsinghua University in China.

3
00:00:16,450 --> 00:00:23,170
YOLOv10 is a state-of-the-art, real-time object detection model that outperforms all the other YOLO

4
00:00:23,170 --> 00:00:29,140
models in terms of average precision, parameter efficiency, and inference speed.

5
00:00:29,470 --> 00:00:35,830
So in this tutorial, we will look at how we can do object detection using YOLOv10 in Google Colab.

6
00:00:35,860 --> 00:00:39,850
So this is the official repository which you can see over here.

7
00:00:40,120 --> 00:00:44,830
You can find all the graphs over here, like average precision and latency.

8
00:00:44,980 --> 00:00:50,470
The latency and accuracy comparison graphs, where you can see that this one is for YOLOv10.

9
00:00:50,470 --> 00:00:56,380
And you can see that YOLOv10 outperforms all the other YOLO models in terms of accuracy as well

10
00:00:56,380 --> 00:00:58,030
as in latency.

11
00:00:58,030 --> 00:01:02,710
That is, it takes less time compared to other YOLO models.

12
00:01:02,710 --> 00:01:04,060
So what, basically, is latency?

13
00:01:04,060 --> 00:01:09,040
Latency is basically the time taken to do object detection on an input image.

14
00:01:10,250 --> 00:01:10,700
Okay.

15
00:01:10,730 --> 00:01:17,150
So, as you can see, YOLOv10 takes less inference time compared to other YOLO

16
00:01:17,150 --> 00:01:21,890
models, as you can see over here.

17
00:01:21,920 --> 00:01:23,990
This is the number of parameters, in millions.

18
00:01:23,990 --> 00:01:29,450
So you can see that, basically, the architecture of

19
00:01:29,570 --> 00:01:31,430
YOLOv10 reduces the computational overhead.

20
00:01:31,430 --> 00:01:36,470
And you can see that there are fewer parameters involved in YOLOv10 as compared

21
00:01:36,470 --> 00:01:37,970
to other YOLO models.

22
00:01:37,970 --> 00:01:43,700
And in terms of accuracy, we can clearly see that YOLOv10 outperforms all the other YOLO models.

23
00:01:43,760 --> 00:01:44,000
Okay.

24
00:01:44,150 --> 00:01:50,480
And you can see that YOLOv10 comes with six different models: YOLOv10 nano, small, medium, large,

25
00:01:50,480 --> 00:01:52,100
extra large, all this stuff.

26
00:01:52,100 --> 00:01:57,170
And here you can see all the scripts are written, how we can do object detection using it and how we

27
00:01:57,170 --> 00:02:02,450
can export the model to ONNX format, how we can train the model on any custom dataset, and how

28
00:02:02,450 --> 00:02:03,650
we can do the predictions.

29
00:02:03,650 --> 00:02:08,030
And here you can find Hugging Face, all these things as well.

30
00:02:08,030 --> 00:02:09,950
And here is the citation.

31
00:02:10,250 --> 00:02:12,380
The codebase is built with Ultralytics.

32
00:02:12,380 --> 00:02:14,180
So they have provided all the details.

33
00:02:14,180 --> 00:02:16,370
And you can check out these details, okay.

34
00:02:16,370 --> 00:02:20,120
And you can simply search it on GitHub and you will find this repository.

35
00:02:20,120 --> 00:02:24,530
And if you just click over here, you will find the paper option as well.

36
00:02:24,530 --> 00:02:25,940
And you can review this paper.

37
00:02:25,940 --> 00:02:28,610
And I have already made a video tutorial as well.

38
00:02:28,610 --> 00:02:30,920
In it, I have explained the crux of this paper.

39
00:02:30,920 --> 00:02:33,710
Like what approach they have adopted.

40
00:02:33,710 --> 00:02:34,790
Like uh.

41
00:02:35,620 --> 00:02:40,150
YOLOv10 follows the consistent dual assignment strategy.

42
00:02:40,150 --> 00:02:44,200
Plus they also follow an efficiency-driven design strategy as well.

43
00:02:44,200 --> 00:02:46,360
So I've already explained all these things.

44
00:02:46,360 --> 00:02:53,500
What novel approach they adopt so that they can boost the accuracy and also decrease

45
00:02:53,500 --> 00:02:56,470
the latency as well, the inference latency.

46
00:02:56,920 --> 00:02:59,800
So I have already explained all these details in another video.

47
00:02:59,800 --> 00:03:03,070
You can check the video, Introduction to YOLOv10.

48
00:03:04,690 --> 00:03:07,720
So now, in step number one, we need to clone this GitHub repo.

49
00:03:07,720 --> 00:03:11,590
So you can simply copy this link over here and add this link over here.

50
00:03:11,590 --> 00:03:15,670
And we will just write git clone and set the cloned folder as our current directory.

51
00:03:15,700 --> 00:03:21,040
cd stands for change directory, but please make sure that you have selected the runtime as GPU

52
00:03:21,040 --> 00:03:22,390
so we are good to go ahead.

53
00:03:23,900 --> 00:03:26,300
So first of all, we'll clone this repository over here.

54
00:03:26,630 --> 00:03:27,050
Uh, okay.

55
00:03:27,050 --> 00:03:29,510
So now you can see we have the cloned repository over here.

56
00:03:29,510 --> 00:03:31,970
And now we will install all the required packages.

57
00:03:33,670 --> 00:03:39,700
So now all the required packages will be installed that are required to do object detection on an image,

58
00:03:39,700 --> 00:03:41,410
video or a live webcam feed.

59
00:03:41,410 --> 00:03:43,840
So this will install all the required packages over here.

60
00:03:44,260 --> 00:03:48,460
So this will take a few seconds before all the required packages get installed.

61
00:03:52,080 --> 00:03:53,640
So, while it gets installed,

62
00:03:53,670 --> 00:03:56,220
let me show you the pre-trained model weights.

63
00:03:56,220 --> 00:03:57,180
Where are they available?

64
00:03:57,180 --> 00:04:02,010
So if you just click over here, you can find all the pre-trained model weights over here.

65
00:04:02,010 --> 00:04:08,610
YOLOv10 small, YOLOv10 nano, YOLOv10 extra large and medium, and the YOLOv10 large model here.

66
00:04:08,610 --> 00:04:12,210
So you can simply copy the link address over here, okay.

67
00:04:12,540 --> 00:04:19,920
And if you just paste this link address over here, simply, if you just paste this link address

68
00:04:19,920 --> 00:04:24,330
over here, you will find the link to the YOLOv10 nano model weights, okay.

69
00:04:25,080 --> 00:04:31,080
So you simply need to click over here and copy the link address.

70
00:04:31,080 --> 00:04:32,880
And you just need to add the link address.

71
00:04:32,880 --> 00:04:35,610
All these model weight link addresses over here.

72
00:04:35,610 --> 00:04:37,980
And I have already written the script.

73
00:04:37,980 --> 00:04:43,800
It will download the pre-trained model weights over here into this YOLOv10 repository.

74
00:04:43,800 --> 00:04:46,950
So currently all the required packages are getting installed.

75
00:04:47,040 --> 00:04:50,520
So this will take a few more seconds before they get installed.

76
00:04:53,880 --> 00:04:55,920
So now you can see all the packages are installed.

77
00:04:55,920 --> 00:04:58,650
So now we will download the model weights over there.

78
00:04:58,650 --> 00:05:00,180
So you can see we run this cell.

79
00:05:01,860 --> 00:05:02,130
And all.

80
00:05:02,130 --> 00:05:06,000
You can see that all the models will get downloaded over here.

81
00:05:06,000 --> 00:05:06,450
Fine.

82
00:05:07,080 --> 00:05:07,890
Okay.

83
00:05:07,920 --> 00:05:10,350
Let it get downloaded and I will show you.

84
00:05:11,190 --> 00:05:14,850
So now you can see the weights directory has been created over here.

85
00:05:14,850 --> 00:05:18,720
And we have added all the model weights over here, like you can find over here.

86
00:05:19,170 --> 00:05:19,530
Okay.

87
00:05:19,530 --> 00:05:22,650
So now we will be doing inference on an image using the pre-trained model.

88
00:05:22,650 --> 00:05:26,580
I have uploaded an input image to my Google Drive.

89
00:05:26,580 --> 00:05:31,590
So I have just added the link over here and I'm directly downloading the input image from drive into

90
00:05:31,590 --> 00:05:33,360
this Google Colab notebook over here.

91
00:05:36,330 --> 00:05:36,870
Well, good.

92
00:05:37,620 --> 00:05:37,890
Okay.

93
00:05:37,890 --> 00:05:39,660
So now we can see we have the image.

94
00:05:39,660 --> 00:05:43,500
So we will be doing object detection on this input image.

95
00:05:43,650 --> 00:05:44,280
Okay.

96
00:05:44,610 --> 00:05:51,510
So now you can see we have the bus, traffic lights, persons, bicycle, and all the stuff over here.

97
00:05:51,510 --> 00:05:54,150
Like you can see we have the first pathway as well.

98
00:05:54,150 --> 00:05:57,210
So let's do object detection on this input image.

99
00:05:57,210 --> 00:06:00,540
So we are just setting, right over here, task is equal to detect.

100
00:06:00,540 --> 00:06:04,350
We are doing prediction and we want to save the output image.

101
00:06:04,350 --> 00:06:08,520
And here you will pass the model weights. You can use any other model weights over here.

102
00:06:08,520 --> 00:06:12,630
Like you can find all the different model weights which YOLOv10 comes with.

103
00:06:12,660 --> 00:06:14,820
You can use any other model weights as well.

104
00:06:14,820 --> 00:06:17,400
And here you just pass the name of the input image.

105
00:06:17,610 --> 00:06:19,710
If you are doing object detection on a video,

106
00:06:19,710 --> 00:06:22,530
you will pass the video name over here, okay.

107
00:06:22,530 --> 00:06:23,460
And let's see.

108
00:06:27,230 --> 00:06:28,310
Now how it works.

109
00:06:32,820 --> 00:06:38,310
So now you can see we have detections of one bicycle, 12 persons, one train, three traffic

110
00:06:38,310 --> 00:06:45,390
lights, two backpacks, and four handbags are detected, and the inference time taken is 95.5 milliseconds,

111
00:06:45,390 --> 00:06:48,090
the time taken to do prediction on this input image.

112
00:06:48,840 --> 00:06:51,120
So let me display this output image over here.

113
00:06:51,600 --> 00:06:53,880
So here we have the output image.

114
00:06:53,880 --> 00:06:57,180
And you can see over here the bicycle, train, traffic lights.

115
00:06:57,180 --> 00:07:01,230
These persons, which are very far away, are also detected with a very good confidence score as well.

116
00:07:01,230 --> 00:07:03,720
Like you can see over here, and the handbag, backpack.

117
00:07:03,720 --> 00:07:06,510
All this stuff is also detected and it looks good.

118
00:07:07,080 --> 00:07:10,980
Now we will be doing inference on a video using the pre-trained model.

119
00:07:11,220 --> 00:07:13,380
Okay, so let's do this.

120
00:07:13,380 --> 00:07:18,600
I'm downloading a sample video from my Drive directly to this Google Colab notebook over here.

121
00:07:23,660 --> 00:07:27,710
Okay, so let's see how it goes.

122
00:07:33,690 --> 00:07:39,000
So now you can see the complete video is being divided into 1314 frames.

123
00:07:39,000 --> 00:07:42,120
And we are doing object detection on each of the frames as well.

124
00:07:42,120 --> 00:07:46,410
And then in the output we get a complete video output as well.

125
00:07:46,440 --> 00:07:46,770
Okay.

126
00:07:46,770 --> 00:07:51,420
So now it will take few seconds to do object detection on this complete video.

127
00:07:56,170 --> 00:07:58,870
The object detection on this input video is done.

128
00:07:58,870 --> 00:08:02,140
Like you can see for all the 1340 frames.

129
00:08:02,140 --> 00:08:02,710
It's done.

130
00:08:02,710 --> 00:08:05,020
And here we have the output video.

131
00:08:05,110 --> 00:08:09,340
So let me download this output video and show you on the screen how it looks.

132
00:08:10,360 --> 00:08:10,630
Uh.

133
00:08:22,310 --> 00:08:22,760
Us.

134
00:08:22,880 --> 00:08:25,250
Well, now you can see over here.

135
00:08:25,430 --> 00:08:31,340
Here we have the output video, and as you can see over here, we are able to detect the

136
00:08:31,340 --> 00:08:32,810
cars over here.

137
00:08:32,900 --> 00:08:33,410
Okay.

138
00:08:33,410 --> 00:08:35,990
And the truck over here is also detected.

139
00:08:35,990 --> 00:08:41,570
We have detected... this is a false positive, as was shown previously.

140
00:08:41,870 --> 00:08:44,270
But you can see the detection results are good.

141
00:08:44,270 --> 00:08:46,220
Like, we are able to detect the cars and the truck.

142
00:08:46,280 --> 00:08:52,040
So now you can see over here we are able to do object detection on an image as well as on

143
00:08:52,040 --> 00:08:53,210
the video.

144
00:08:53,210 --> 00:08:56,900
And YOLOv10 gives quite good results as well.

145
00:08:57,230 --> 00:08:59,390
So that's all from this tutorial.

146
00:08:59,390 --> 00:09:05,960
In this tutorial we have seen how we can do object detection on an image and on a video using

147
00:09:05,960 --> 00:09:06,980
YOLOv10.

148
00:09:07,160 --> 00:09:07,640
Okay.

149
00:09:07,640 --> 00:09:08,690
Thank you for watching.