1
00:00:03,110 --> 00:00:08,600
YOLOv10 is the latest state-of-the-art object detection model, developed

2
00:00:08,600 --> 00:00:12,410
by the researchers at Tsinghua University in China.

3
00:00:13,280 --> 00:00:13,820
So YOLOv10

4
00:00:13,820 --> 00:00:18,320
introduces a novel approach to real-time object detection.

5
00:00:19,010 --> 00:00:26,630
It addresses deficiencies in both the post-processing and the model architecture found

6
00:00:26,630 --> 00:00:32,330
in earlier YOLO versions, including YOLOv9, YOLOv8, and YOLOv7.

7
00:00:32,900 --> 00:00:36,710
So what are the problems in previous versions?

8
00:00:36,740 --> 00:00:45,170
The previous YOLO models, including YOLOv9, YOLOv8, and YOLOv7, used the non-maximum suppression technique

9
00:00:45,170 --> 00:00:46,580
in post-processing

10
00:00:46,580 --> 00:00:50,450
during inference. I will explain what non-maximum suppression is later on.

11
00:00:50,450 --> 00:00:57,860
But remember, the previous models used non-maximum suppression in post-processing during inference,

12
00:00:58,340 --> 00:01:03,320
which leads to inefficiencies and increased inference latency.

13
00:01:04,190 --> 00:01:07,880
So YOLOv10 eliminates the need for non-maximum suppression.

14
00:01:07,880 --> 00:01:11,180
In YOLOv10, we do not use the non-maximum suppression technique.

15
00:01:11,420 --> 00:01:17,540
Non-maximum suppression leads to inefficiencies and increases the inference latency.

16
00:01:18,080 --> 00:01:21,560
So YOLOv10 eliminates the need for non-maximum suppression.

17
00:01:21,560 --> 00:01:29,120
Along with this, architecture enhancements are made in YOLOv10, which include optimizing

18
00:01:29,120 --> 00:01:30,620
various model components.

19
00:01:31,540 --> 00:01:38,980
In short, it achieves state-of-the-art performance with significantly reduced computational

20
00:01:38,980 --> 00:01:39,550
overhead.

21
00:01:39,700 --> 00:01:47,080
When tested on the MS COCO benchmark dataset, YOLOv10 shows a superior

22
00:01:47,080 --> 00:01:52,960
accuracy-latency trade-off compared with the other YOLO models, including YOLOv9, YOLOv8,

23
00:01:52,960 --> 00:01:54,010
and YOLOv7.

24
00:01:54,040 --> 00:01:55,690
In this tutorial,

25
00:01:56,920 --> 00:02:01,360
we will see what YOLOv10 is and how it works.

26
00:02:01,360 --> 00:02:06,220
We will also see what architecture enhancements are made in YOLOv10.

27
00:02:06,220 --> 00:02:11,680
And we will compare the performance of YOLOv10 with other YOLO models as well.

28
00:02:11,680 --> 00:02:14,170
That's all we will cover in this tutorial.

29
00:02:14,170 --> 00:02:15,460
So let's get started.

30
00:02:15,460 --> 00:02:22,060
YOLOv10 is a real-time, state-of-the-art object detection model introduced in the paper

31
00:02:22,060 --> 00:02:24,370
"YOLOv10: Real-Time End-to-End Object Detection."

32
00:02:24,370 --> 00:02:26,560
So this paper is available online.

33
00:02:26,560 --> 00:02:31,240
I have added a snapshot of this paper, so you can review the complete paper as well.

34
00:02:31,240 --> 00:02:35,050
In this tutorial, I will try to present the crux of this paper,

35
00:02:35,050 --> 00:02:36,790
that is, what is inside it.

36
00:02:36,790 --> 00:02:39,640
So I will focus on its key ideas.

37
00:02:39,640 --> 00:02:46,180
YOLOv10, released in May 2024, is a new advancement in the field of real-time object detection.

38
00:02:46,180 --> 00:02:52,000
Basically, YOLOv10 tries to address the issues faced by the previous models.

39
00:02:52,000 --> 00:02:55,510
So what enhancements are made in YOLOv10?

40
00:02:55,840 --> 00:03:01,450
Model architecture enhancements are made, and YOLOv10 also addresses the post-processing issues:

41
00:03:01,570 --> 00:03:02,020
in particular,

42
00:03:02,020 --> 00:03:04,720
it eliminates the need for non-maximum suppression.

43
00:03:07,350 --> 00:03:07,590
So

44
00:03:08,510 --> 00:03:10,310
what is YOLOv10?

45
00:03:10,310 --> 00:03:12,770
YOLOv10 is a cutting-edge

46
00:03:13,220 --> 00:03:18,620
computer vision architecture designed for real-time object detection, and it is built upon the advancements

47
00:03:18,620 --> 00:03:19,970
of its predecessors.

48
00:03:20,600 --> 00:03:23,660
The YOLOv10 model achieves a higher

49
00:03:23,810 --> 00:03:31,070
mean average precision than earlier YOLO models such as YOLOv9, YOLOv8, and YOLOv7 when benchmarked

50
00:03:31,070 --> 00:03:33,020
against the MS COCO dataset.

51
00:03:33,020 --> 00:03:38,330
You can see it here, and I will explain it in detail as well: this axis

52
00:03:38,330 --> 00:03:41,330
is the accuracy, measured as average precision, and this one is the latency.

53
00:03:41,330 --> 00:03:46,010
So we can see the trade-off between latency and accuracy here.

54
00:03:46,010 --> 00:03:47,930
And you can see this red line.

55
00:03:47,960 --> 00:03:55,910
As you can see, YOLOv10 outperforms all the previous models in terms of accuracy

56
00:03:55,910 --> 00:03:58,580
as well as latency.

57
00:03:58,580 --> 00:04:00,530
So what is latency?

58
00:04:00,620 --> 00:04:05,600
Latency is the time taken to perform object detection on an input image.

59
00:04:05,600 --> 00:04:06,830
So this is latency.

60
00:04:06,830 --> 00:04:12,170
Latency is the time it takes to perform object detection on an input image.
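To make this definition concrete, here is a minimal Python sketch of how per-image latency could be measured. The `dummy_detector` function is a hypothetical stand-in for a real model, and the warm-up and run counts are arbitrary choices. On a GPU you would also need to synchronize the device before reading the clock, because GPU work runs asynchronously.

```python
import statistics
import time
from typing import Callable, List


def measure_latency_ms(detect: Callable[[object], object], image: object,
                       warmup: int = 3, runs: int = 20) -> float:
    """Median wall-clock time, in milliseconds, of one detect(image) call."""
    # Warm-up calls are discarded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        detect(image)
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        detect(image)
        timings.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive to occasional slow runs than the mean.
    return statistics.median(timings)


def dummy_detector(image: object) -> list:
    # Hypothetical placeholder: burns a little CPU time and detects nothing.
    _ = sum(i * i for i in range(20_000))
    return []


print(f"latency: {measure_latency_ms(dummy_detector, image=None):.3f} ms")
```

For a detector that still relies on non-maximum suppression, the timed call should include that post-processing step, since it is part of the time needed to get final detections.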

61
00:04:12,170 --> 00:04:19,580
So we can say that YOLOv10 takes less time than other YOLO models, as you can see

62
00:04:19,580 --> 00:04:20,330
over here.

63
00:04:20,600 --> 00:04:26,870
From here we can see that YOLOv10 takes much less time than other YOLO

64
00:04:26,870 --> 00:04:33,230
models: it performs object detection on an input image or input frame more quickly than

65
00:04:33,230 --> 00:04:34,250
other YOLO models.

66
00:04:34,250 --> 00:04:40,640
Plus, as you can see here, YOLOv10 gives better accuracy in comparison

67
00:04:40,640 --> 00:04:42,350
to other YOLO

68
00:04:42,440 --> 00:04:44,000
models.

69
00:04:44,450 --> 00:04:47,540
So that is the comparison.

70
00:04:48,690 --> 00:04:53,880
YOLOv10 adopts various strategies to tackle the limitations of previous YOLO models.

71
00:04:53,880 --> 00:04:58,110
Earlier YOLO models rely on non-maximum suppression for post-processing during inference.

72
00:04:58,110 --> 00:05:05,430
As I told you, earlier models rely on the non-maximum suppression technique during post-processing

73
00:05:05,430 --> 00:05:10,860
at inference time, which leads to inefficiencies and increases inference latency.

74
00:05:10,860 --> 00:05:17,970
Basically, if I use non-maximum suppression in post-processing during inference,

75
00:05:17,970 --> 00:05:21,060
this will lead to higher latency.

76
00:05:21,060 --> 00:05:25,200
It will take more time to perform object detection on an input image.

77
00:05:25,500 --> 00:05:33,180
It can also compromise accuracy, so it comes with inefficiencies.

78
00:05:33,240 --> 00:05:38,340
To address these limitations,

79
00:05:38,340 --> 00:05:42,300
YOLOv10 comes with a consistent dual assignment strategy.

80
00:05:42,300 --> 00:05:48,060
YOLOv10 adopts a consistent dual assignment strategy, which eliminates the need for non-maximum

81
00:05:48,120 --> 00:05:50,310
suppression during inference.

82
00:05:50,310 --> 00:05:53,640
And it significantly reduces the inference latency.

83
00:05:53,640 --> 00:05:59,190
So in YOLOv10, we have eliminated the non-maximum suppression technique.

84
00:05:59,190 --> 00:06:04,740
By removing non-maximum suppression, you can see that the latency has been reduced,

85
00:06:04,740 --> 00:06:05,310
and not just a bit:

86
00:06:05,550 --> 00:06:08,250
the latency has been reduced significantly.

87
00:06:08,250 --> 00:06:13,530
Plus, you can see that the accuracy has improved: YOLOv10 outperforms all the other object detection

88
00:06:13,530 --> 00:06:14,220
models.

89
00:06:15,400 --> 00:06:17,020
So non-maximum

90
00:06:17,080 --> 00:06:24,850
suppression is eliminated during inference in YOLOv10, and this significantly reduces latency

91
00:06:24,850 --> 00:06:28,300
while retaining competitive performance.

92
00:06:28,780 --> 00:06:36,010
YOLOv10 also incorporates an efficiency-accuracy driven design strategy, which involves optimizing various components

93
00:06:36,010 --> 00:06:40,540
of the model to minimize computational overhead and enhance performance.

94
00:06:40,540 --> 00:06:45,340
So YOLOv10 adopts an efficiency-accuracy driven design strategy.

95
00:06:45,340 --> 00:06:49,450
When we adopt this strategy,

96
00:06:49,450 --> 00:06:53,410
we optimize various components of the YOLO model.

97
00:06:53,410 --> 00:06:58,330
We will discuss which components are optimized and what architecture enhancements are made in YOLOv10

98
00:06:58,510 --> 00:07:00,190
later, as we go ahead.

99
00:07:00,190 --> 00:07:06,310
As we adopt the efficiency-accuracy driven design strategy and enhance

100
00:07:06,310 --> 00:07:10,930
various components of the model, we reduce computational overhead and enhance performance.

101
00:07:10,930 --> 00:07:16,360
We can see this in the graph as well: the YOLOv10 models

102
00:07:16,360 --> 00:07:21,550
have fewer parameters than the other models.

103
00:07:21,550 --> 00:07:22,930
The parameters are in millions.

104
00:07:22,930 --> 00:07:26,620
Here, the red color is for YOLOv10.

105
00:07:26,620 --> 00:07:31,330
And you can see that YOLOv10 uses fewer parameters

106
00:07:31,570 --> 00:07:31,960
than

107
00:07:32,940 --> 00:07:39,510
the other YOLO models, and YOLOv10 has better accuracy than all the other YOLO

108
00:07:40,770 --> 00:07:41,460
models.

109
00:07:43,010 --> 00:07:48,710
So by adopting the efficiency-accuracy driven design strategy,

110
00:07:48,710 --> 00:07:50,780
we have minimized the computational overhead:

111
00:07:50,780 --> 00:07:57,320
fewer parameters are used, and we have enhanced the model's performance as well.

112
00:08:00,890 --> 00:08:06,830
We have been discussing how non-maximum suppression is eliminated

113
00:08:06,830 --> 00:08:09,380
from post-processing during inference.

114
00:08:09,380 --> 00:08:14,930
We are saying that YOLOv10 eliminates the need for non-maximum suppression during inference.

115
00:08:14,930 --> 00:08:16,730
So what is non-maximum suppression?

116
00:08:18,470 --> 00:08:24,380
Non-maximum suppression is a post-processing technique used in object detection to remove redundant

117
00:08:24,380 --> 00:08:26,210
or overlapping bounding boxes.

118
00:08:26,480 --> 00:08:29,840
Its main aim is to retain only the bounding boxes with the highest confidence scores,

119
00:08:29,840 --> 00:08:33,740
while overlapping bounding boxes with lower confidence scores are suppressed, or removed.

120
00:08:34,070 --> 00:08:39,710
You can see that we have detected a truck, or a car, in this image, as you can see over

121
00:08:39,710 --> 00:08:40,100
here.

122
00:08:40,100 --> 00:08:46,340
After doing object detection with a YOLO model, the output we get is that

123
00:08:46,340 --> 00:08:51,770
the object, the truck here, is detected, but you can see

124
00:08:51,770 --> 00:08:54,410
that we have multiple bounding boxes as well.

125
00:08:54,410 --> 00:09:01,190
Earlier YOLO models, like YOLOv9, YOLOv8, and YOLOv7, used the

126
00:09:01,190 --> 00:09:05,630
non-maximum suppression technique during inference

127
00:09:05,810 --> 00:09:09,620
so that they could remove redundant or overlapping bounding boxes.

128
00:09:09,620 --> 00:09:11,270
Here there is only one truck,

129
00:09:11,270 --> 00:09:15,050
so there should be only one bounding box, as you can see here.

130
00:09:15,530 --> 00:09:15,770
So.

131
00:09:16,970 --> 00:09:22,940
Earlier YOLO models use non-maximum suppression to remove redundant or overlapping

132
00:09:22,940 --> 00:09:26,990
bounding boxes, and the bounding box with the highest confidence score among all these bounding

133
00:09:26,990 --> 00:09:28,070
boxes is retained.

134
00:09:28,070 --> 00:09:32,330
This bounding box has a higher confidence score than all the other bounding boxes, so

135
00:09:32,330 --> 00:09:33,080
it is retained.
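The greedy procedure described above can be sketched in plain Python as follows. This is a generic textbook version of non-maximum suppression, not the exact code used in any YOLO release; boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the 0.5 IoU (intersection over union) threshold is an arbitrary example value.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # highest-scoring box still in play
        keep.append(best)
        # Suppress every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Three overlapping detections of one truck, plus one separate object.
boxes = [(10, 10, 110, 60), (12, 8, 108, 62), (15, 12, 115, 58), (200, 200, 250, 250)]
scores = [0.90, 0.75, 0.60, 0.80]
print(nms(boxes, scores))
```

Here the two lower-scoring truck boxes overlap the best one with an IoU of roughly 0.84 to 0.89, so only indices 0 and 3 survive. The sort plus pairwise IoU checks are exactly the extra work that an NMS-free detector avoids at inference time.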

136
00:09:33,290 --> 00:09:34,070
So.

137
00:09:34,820 --> 00:09:40,310
Previous models used non-maximum suppression during inference so that they could eliminate

138
00:09:40,310 --> 00:09:42,830
the overlapping bounding boxes.

139
00:09:42,830 --> 00:09:45,230
But this leads to inefficiencies,

140
00:09:45,230 --> 00:09:48,830
and it also increases latency.

141
00:09:48,830 --> 00:09:53,030
So in YOLOv10, we have eliminated the need for non-maximum suppression.

142
00:09:54,790 --> 00:09:58,120
Now we will discuss how YOLOv10 works.

143
00:09:58,120 --> 00:10:02,470
YOLOv10 introduces a novel training strategy and architecture enhancements.

144
00:10:02,500 --> 00:10:05,380
Let's discuss the main components of how YOLOv10 works.

145
00:10:05,440 --> 00:10:11,110
YOLOv10 introduces an NMS-free training strategy with dual label assignments.

146
00:10:11,110 --> 00:10:13,000
This is a snapshot from the paper. I have

147
00:10:13,000 --> 00:10:15,760
added the crux of it here:

148
00:10:15,760 --> 00:10:17,320
what is inside the paper.

149
00:10:17,620 --> 00:10:24,790
So YOLOv10 adopts a novel training strategy, and it introduces some architecture enhancements

150
00:10:24,790 --> 00:10:27,970
as well.

151
00:10:28,240 --> 00:10:33,370
As I told you, YOLOv10 eliminates the need for non-maximum suppression.

152
00:10:33,370 --> 00:10:41,710
Traditional YOLO models like YOLOv9 and YOLOv8 employ a one-to-many assignment strategy

153
00:10:41,710 --> 00:10:44,950
during training, which makes it necessary to use non-maximum suppression.

154
00:10:44,950 --> 00:10:51,760
So earlier models like YOLOv9 and YOLOv8 used the non-maximum suppression technique.

155
00:10:51,760 --> 00:10:58,090
The earlier models adopted a one-to-many assignment strategy, which made it

156
00:10:58,090 --> 00:11:02,950
necessary to use non-maximum suppression during inference. As I told you, we use non-maximum suppression

157
00:11:02,950 --> 00:11:08,740
to filter out redundant or overlapping bounding boxes, which leads to inefficiencies

158
00:11:08,740 --> 00:11:10,990
and increased inference latency.

159
00:11:11,350 --> 00:11:14,650
So earlier YOLO models adopt a one-to-many assignment strategy.

160
00:11:14,980 --> 00:11:21,730
YOLOv10 adopts a dual label assignment strategy that incorporates both

161
00:11:21,730 --> 00:11:25,870
the one-to-many assignment strategy and the one-to-one matching approach.

162
00:11:25,870 --> 00:11:31,450
Now you may be wondering: YOLOv10 adopts both one-to-many and one-to-one matching approaches,

163
00:11:31,450 --> 00:11:31,840
yet

164
00:11:31,840 --> 00:11:38,410
with a one-to-many assignment strategy, non-maximum suppression is needed during inference

165
00:11:38,410 --> 00:11:40,630
to filter the predicted bounding boxes.

166
00:11:40,630 --> 00:11:46,990
But even though YOLOv10 adopts both approaches, it does not use non-maximum suppression during

167
00:11:46,990 --> 00:11:47,560
inference.

168
00:11:47,560 --> 00:11:49,510
So what is one-to-one matching?

169
00:11:49,840 --> 00:11:51,400
In one-to-one matching, the

170
00:11:51,400 --> 00:11:57,250
model assigns a single prediction to each ground-truth instance, which eliminates the need for non-maximum suppression.

171
00:11:57,250 --> 00:12:01,960
So with one-to-one matching, we don't use non-maximum suppression.

172
00:12:02,470 --> 00:12:08,920
But this results in weaker supervision, which can cause suboptimal accuracy and slower convergence.

173
00:12:08,920 --> 00:12:12,790
So if we use one-to-one matching, we do not need non-maximum suppression,

174
00:12:12,790 --> 00:12:16,450
but this results in some compromise on accuracy.

175
00:12:16,630 --> 00:12:23,920
One-to-many assignment, on the other hand, provides richer supervisory signals,

176
00:12:23,920 --> 00:12:26,950
but it requires non-maximum suppression for inference.

177
00:12:26,950 --> 00:12:31,180
So with one-to-many assignment, we use non-maximum suppression for inference.

178
00:12:31,180 --> 00:12:33,340
But this leads to

179
00:12:34,610 --> 00:12:36,410
increased latency.

180
00:12:36,410 --> 00:12:41,570
So with one-to-one matching, we don't use non-maximum suppression, but we compromise on accuracy.

181
00:12:41,570 --> 00:12:45,470
With one-to-many assignment, we don't compromise on accuracy.

182
00:12:45,470 --> 00:12:50,330
We use non-maximum suppression during inference, but this leads to increased latency.
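The difference between the two assignment schemes can be illustrated with a toy matching-score matrix. In this hypothetical sketch, `match[i][j]` scores how well prediction `i` fits ground-truth object `j`; one-to-many keeps the top `k` predictions per object as positives, while one-to-one keeps only the single best one. It deliberately ignores conflicts where the same prediction is the best match for two objects, which a real assigner would have to handle.

```python
from typing import Dict, List


def assign_one_to_many(match: List[List[float]], k: int) -> Dict[int, List[int]]:
    """For each ground truth j, return the k predictions with the highest match[i][j]."""
    num_preds, num_gts = len(match), len(match[0])
    assignment: Dict[int, List[int]] = {}
    for j in range(num_gts):
        # sorted() is stable, so ties keep their original prediction order.
        ranked = sorted(range(num_preds), key=lambda i: match[i][j], reverse=True)
        assignment[j] = ranked[:k]
    return assignment


def assign_one_to_one(match: List[List[float]]) -> Dict[int, List[int]]:
    """One-to-one matching as top-1 selection: a single positive per ground truth."""
    return assign_one_to_many(match, k=1)


# Rows: 4 predictions; columns: 2 ground-truth objects (made-up scores).
match = [
    [0.9, 0.0],
    [0.7, 0.1],
    [0.6, 0.0],
    [0.0, 0.8],
]
print(assign_one_to_many(match, k=3))  # several positives per object: duplicates need NMS
print(assign_one_to_one(match))        # one positive per object: no NMS needed
```

Because the one-to-many head is trained to score several predictions highly for the same object, its outputs contain near-duplicates at inference; the one-to-one head is trained to score just one, which is why it can skip non-maximum suppression.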

183
00:12:50,600 --> 00:12:58,040
So how does YOLOv10 address this issue, avoiding non-maximum suppression without compromising on accuracy

184
00:12:58,040 --> 00:13:01,310
while also reducing the inference latency?

185
00:13:03,640 --> 00:13:08,830
YOLOv10 cleverly combines these strategies by introducing an additional one-to-one head.

186
00:13:08,830 --> 00:13:12,850
So basically, YOLOv10 introduces an additional one-to-one head

187
00:13:13,060 --> 00:13:13,900
that

188
00:13:15,350 --> 00:13:20,060
mirrors the structure and optimization objectives of the original one-to-many branch.

189
00:13:20,330 --> 00:13:25,430
During training, both heads are jointly optimized, so the model benefits from the rich supervision of the one-to-many

190
00:13:25,430 --> 00:13:26,780
assignments.

191
00:13:26,810 --> 00:13:30,110
During inference, the YOLOv10 model

192
00:13:30,110 --> 00:13:35,150
uses only the one-to-one head, thus bypassing the need for non-maximum suppression.

193
00:13:35,150 --> 00:13:42,230
So while doing inference, YOLOv10 only adopts the one-to-one approach; it bypasses the need for

194
00:13:42,230 --> 00:13:47,480
non-maximum suppression and achieves high efficiency without adding inference cost.

195
00:13:47,480 --> 00:13:49,460
So this is how YOLOv10 works.
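The training-versus-inference behavior of the two heads can be sketched as a toy Python class. Everything here is hypothetical and heavily simplified: the backbone and heads are plain functions on lists of floats rather than neural network layers, and no loss computation is shown. The point is only the control flow: both heads run during training, and only the one-to-one head runs at inference.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Features = List[float]
Head = Callable[[Features], List[float]]


@dataclass
class DualHeadDetector:
    """Toy detector: a shared backbone/neck feeding two heads with the same structure."""
    backbone: Callable[[List[float]], Features]
    one_to_many_head: Head
    one_to_one_head: Head
    training: bool = True

    def forward(self, image: List[float]) -> Tuple[Optional[List[float]], List[float]]:
        feats = self.backbone(image)
        o2o_out = self.one_to_one_head(feats)
        if self.training:
            # Training: both heads produce outputs so both can be supervised jointly.
            return self.one_to_many_head(feats), o2o_out
        # Inference: the one-to-many head is skipped; one prediction per object, no NMS.
        return None, o2o_out


model = DualHeadDetector(
    backbone=lambda img: [2.0 * x for x in img],
    one_to_many_head=lambda f: [sum(f)],
    one_to_one_head=lambda f: [max(f)],
)
print(model.forward([1.0, 2.0]))  # training mode: both heads run
model.training = False
print(model.forward([1.0, 2.0]))  # inference mode: one-to-one head only
```

Since the extra head is dropped after training, the deployed model pays no inference cost for it, which matches the point made above.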

196
00:13:51,090 --> 00:13:55,440
Another thing is consistent matching. You can see all the details in the paper,

197
00:13:55,440 --> 00:13:59,670
but the crux is that it is a key component of the dual assignment strategy.

198
00:13:59,670 --> 00:14:07,080
As I told you, YOLOv10 adopts a dual assignment strategy that combines one-to-one matching and one-to-many

199
00:14:07,080 --> 00:14:09,990
assignment, as you can see here.

200
00:14:10,290 --> 00:14:15,180
Here you can see the dual label assignment strategy, with both the one-to-many and one-to-one approaches.

201
00:14:15,180 --> 00:14:21,390
And after the one-to-many and one-to-one branches, there is a consistent matching metric.

202
00:14:21,510 --> 00:14:27,060
So YOLOv10

203
00:14:27,060 --> 00:14:31,980
adopts a novel approach called the dual label assignment strategy.

204
00:14:31,980 --> 00:14:35,040
And here we have the complete architecture, showing how it works.

205
00:14:35,040 --> 00:14:39,840
A key component of the dual label assignment strategy is the consistent matching metric.

206
00:14:39,990 --> 00:14:45,480
The consistent matching metric

207
00:14:45,480 --> 00:14:52,770
is used to evaluate the concordance between predictions and ground-truth instances, that is, to calculate

208
00:14:52,770 --> 00:14:58,200
how well a prediction matches the ground truth, so you can see how close they are

209
00:14:58,200 --> 00:15:05,340
or how far away they are: how close our predictions are to the ground truth, or how

210
00:15:05,340 --> 00:15:07,080
far our predictions are from the ground truth.
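As I understand the paper's formulation, the matching metric has the form m = s · p^α · IoU(b̂, b)^β, where p is the classification score, b̂ and b are the predicted and ground-truth boxes, and s is a spatial prior (1 if the prediction's anchor point lies inside the object, else 0). "Consistent" means the one-to-one branch uses α_o2o = r·α_o2m and β_o2o = r·β_o2m, which makes its metric a power of the one-to-many metric, so both branches rank predictions the same way. The sketch below uses r = 1; the α = 0.5 and β = 6.0 values are an assumption (borrowed from YOLOv8's task-aligned assigner defaults), and the candidate predictions are made up.

```python
from typing import List, Tuple


def matching_metric(s: float, p: float, iou: float, alpha: float, beta: float) -> float:
    """m = s * p**alpha * IoU**beta; s is a 0/1 spatial prior, p the class score."""
    return s * (p ** alpha) * (iou ** beta)


ALPHA_O2M, BETA_O2M = 0.5, 6.0  # assumed values, not taken from the video
R = 1.0                          # consistency factor r
ALPHA_O2O, BETA_O2O = R * ALPHA_O2M, R * BETA_O2M

# Made-up candidates for one object: (spatial prior s, class score p, IoU with the ground truth).
candidates: List[Tuple[float, float, float]] = [
    (1.0, 0.90, 0.70),
    (1.0, 0.60, 0.85),
    (0.0, 0.95, 0.90),  # anchor point outside the object, so m = 0
]

o2m = [matching_metric(s, p, u, ALPHA_O2M, BETA_O2M) for s, p, u in candidates]
o2o = [matching_metric(s, p, u, ALPHA_O2O, BETA_O2O) for s, p, u in candidates]
best_o2m = max(range(len(candidates)), key=lambda i: o2m[i])
best_o2o = max(range(len(candidates)), key=lambda i: o2o[i])
print(best_o2m, best_o2o)  # both branches pick the same prediction
```

Note how the large β lets box overlap dominate: the second candidate wins despite its lower class score, and the one-to-one head's positive is also the one-to-many head's best sample.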

211
00:15:09,080 --> 00:15:12,590
Now let's see what architecture enhancements are made in YOLOv10.

212
00:15:12,620 --> 00:15:19,700
Traditionally, the components of YOLO models consist of the stem, the downsampling

213
00:15:19,700 --> 00:15:22,430
layers, the stages with basic building blocks, and the head.

214
00:15:22,970 --> 00:15:29,390
YOLOv10 focuses on optimizing the other three parts, apart from the stem, to enhance efficiency.

215
00:15:29,390 --> 00:15:34,100
YOLOv10 introduces a lightweight classification head.

216
00:15:34,100 --> 00:15:38,810
The lightweight classification head is designed to reduce computational redundancy,

217
00:15:38,810 --> 00:15:41,810
ensuring that the model operates more efficiently.

218
00:15:42,440 --> 00:15:44,600
Spatial-channel decoupled downsampling:

219
00:15:44,630 --> 00:15:51,200
in YOLOv10, spatial-channel decoupled downsampling separates spatial reduction from channel expansion,

220
00:15:51,200 --> 00:15:55,790
making the process more efficient. Rank-guided block design:

221
00:15:55,790 --> 00:16:01,670
the rank-guided block design further streamlines the architecture, enhancing overall efficiency.

222
00:16:01,670 --> 00:16:02,960
Large-kernel convolution:

223
00:16:02,960 --> 00:16:07,670
large-kernel convolution is used to improve the model's capacity to capture detailed features,

224
00:16:07,670 --> 00:16:13,430
and an effective partial self-attention module improves the model's capability with minimal computational cost.

225
00:16:13,460 --> 00:16:16,250
So these are the architecture enhancements made in YOLOv10.
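The efficiency gain from decoupled downsampling can be seen with simple weight counting. As I understand it, the idea is to use a 1×1 pointwise convolution to change the channel count and a stride-2 depthwise convolution for the spatial reduction, instead of one regular 3×3 stride-2 convolution that does both. The sketch counts weights only (no bias or batch-norm), and the 64 to 128 channel sizes are an arbitrary example.

```python
def conv_weights(k: int, c_in: int, c_out: int, groups: int = 1) -> int:
    """Number of weights in a k x k convolution (bias ignored; stride does not change it)."""
    return k * k * (c_in // groups) * c_out


C_IN, C_OUT = 64, 128

# Regular 3x3, stride-2 convolution: changes channels and resolution at once.
standard = conv_weights(3, C_IN, C_OUT)

# Decoupled: 1x1 pointwise conv for channels, then a 3x3 depthwise (groups = channels) stride-2 conv.
decoupled = conv_weights(1, C_IN, C_OUT) + conv_weights(3, C_OUT, C_OUT, groups=C_OUT)

print(standard, decoupled, f"{decoupled / standard:.1%}")
```

With these sizes the decoupled version needs 9,344 weights instead of 73,728, about 12.7% of the original, because the expensive 3×3 spatial filtering is applied per channel rather than across all channel pairs.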

226
00:16:18,030 --> 00:16:23,520
Now let's compare YOLOv10's performance with other baseline models,

227
00:16:23,520 --> 00:16:26,460
such as YOLOv8.

228
00:16:26,520 --> 00:16:28,260
You can skip this one.

229
00:16:28,290 --> 00:16:33,810
We are not using the YOLOv10-B model in the comparison, because we are comparing

230
00:16:33,810 --> 00:16:36,060
with YOLOv8 models.

231
00:16:36,060 --> 00:16:42,690
So we are skipping it, and we will use the YOLOv10-N, YOLOv10-S, YOLOv10-M,

232
00:16:42,690 --> 00:16:47,190
YOLOv10-L, and YOLOv10-X models for the comparison.

233
00:16:48,210 --> 00:16:54,120
Compared with these baseline models, YOLOv10 demonstrates improvements of 1.2%,

234
00:16:54,120 --> 00:17:01,290
1.4%, 0.5%, 0.3%, and 0.5% in average precision at the N, S, M, L, and X scales, as you can see here.

235
00:17:01,380 --> 00:17:03,390
Here we have the validation average precision.

236
00:17:03,390 --> 00:17:06,990
You can compare the YOLOv10-N model's average precision

237
00:17:06,990 --> 00:17:08,610
with YOLOv8-N here.

238
00:17:08,820 --> 00:17:14,670
And here you can compare the YOLOv10-S model's average precision with the YOLOv8-S model's.

239
00:17:14,670 --> 00:17:20,100
After doing this comparison, you will see that YOLOv10 achieves the average precision improvements

240
00:17:20,100 --> 00:17:28,080
provided here with 28%, 36%, 41%, 44%, and 57% fewer parameters.

241
00:17:28,080 --> 00:17:34,560
You can see the parameters, given in millions: the YOLOv10-N

242
00:17:34,560 --> 00:17:35,370
(nano) model

243
00:17:35,370 --> 00:17:41,640
uses 2.3 million parameters, while the YOLOv8-N model uses 3.2 million, so

244
00:17:41,790 --> 00:17:47,310
the YOLOv10 model clearly uses fewer parameters. The YOLOv10-S (small) model uses 7.2 million parameters,

245
00:17:47,310 --> 00:17:53,370
while the YOLOv8-S model uses 11.2 million parameters, so the YOLOv10 model again uses fewer

246
00:17:53,370 --> 00:17:54,240
parameters.
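The "fewer parameters" percentages follow directly from the parameter counts quoted above. A quick check in Python, using only the N and S numbers read out in this video:

```python
def percent_fewer(new: float, old: float) -> float:
    """How much smaller `new` is than `old`, in percent."""
    return (1.0 - new / old) * 100.0


# Parameters in millions: (YOLOv10, YOLOv8), as read from the comparison table.
params_m = {"N": (2.3, 3.2), "S": (7.2, 11.2)}

for scale, (v10, v8) in params_m.items():
    print(f"{scale}: {percent_fewer(v10, v8):.0f}% fewer parameters")
```

This prints 28% for N and 36% for S, matching the first two figures in the list above.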

247
00:17:55,660 --> 00:18:05,890
So, with 28%, 36%, 41%, 44%, and 57% fewer parameters, YOLOv10 also uses

248
00:18:06,280 --> 00:18:20,140
23%, 24%, 25%, 27%, and 38% fewer calculations and has 70%, 65%, 50%, 41%, and 37% lower

249
00:18:20,140 --> 00:18:20,710
latencies.

250
00:18:20,710 --> 00:18:25,300
As you can see here, for latency in milliseconds, the YOLOv10-N

251
00:18:25,300 --> 00:18:33,280
model has 1.84 ms while the YOLOv8-N model has 6.16 ms; the YOLOv10-S model has 2.49 ms, and the

252
00:18:33,280 --> 00:18:33,820
YOLOv8-S

253
00:18:33,850 --> 00:18:36,640
model has 7.07 ms.

254
00:18:36,640 --> 00:18:42,250
So you can see that YOLOv10 has lower latency than the other

255
00:18:42,250 --> 00:18:42,640
models.
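The latency figures can be checked the same way, and they can also be expressed as a speed-up factor. The numbers below are the N and S latencies read out above, in milliseconds:

```python
from typing import Tuple


def latency_stats(v10_ms: float, v8_ms: float) -> Tuple[float, float]:
    """Return (percent lower latency, speed-up factor) of YOLOv10 relative to YOLOv8."""
    reduction = (1.0 - v10_ms / v8_ms) * 100.0
    speedup = v8_ms / v10_ms
    return reduction, speedup


# Latency in milliseconds: (YOLOv10, YOLOv8), as read from the comparison table.
latency_ms = {"N": (1.84, 6.16), "S": (2.49, 7.07)}

for scale, (v10, v8) in latency_ms.items():
    reduction, speedup = latency_stats(v10, v8)
    print(f"{scale}: {reduction:.0f}% lower latency, {speedup:.1f}x faster")
```

This reproduces the 70% and 65% reductions quoted for the N and S models, which correspond to roughly 3.3x and 2.8x speed-ups.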

256
00:18:44,980 --> 00:18:49,630
In conclusion, YOLOv10 represents a significant advancement in real-time object detection, delivering

257
00:18:49,990 --> 00:18:52,810
state-of-the-art performance in terms of speed and accuracy.

258
00:18:52,810 --> 00:18:58,600
YOLOv10 introduces NMS-free training and adopts an efficiency-accuracy driven model design

259
00:18:58,600 --> 00:18:59,080
strategy.

260
00:18:59,080 --> 00:19:04,090
In the efficiency-accuracy driven model design strategy, we optimize various model components so that

261
00:19:04,090 --> 00:19:08,950
we can reduce the number of parameters and get more accurate results.

262
00:19:08,950 --> 00:19:14,530
So YOLOv10 improves accuracy while reducing computational redundancy and latency.

263
00:19:14,530 --> 00:19:20,110
In a competitive analysis against baseline models such as YOLOv8 and YOLOv7,

264
00:19:20,560 --> 00:19:26,680
YOLOv10 demonstrates superior performance in average precision, parameter efficiency, and inference

265
00:19:26,680 --> 00:19:27,010
speed.

266
00:19:27,280 --> 00:19:29,020
Thank you for watching this tutorial.