1
00:00:00,330 --> 00:00:00,990
Welcome back.

2
00:00:01,230 --> 00:00:03,480
This is a lecture on YOLO.

3
00:00:03,540 --> 00:00:09,870
This is a high-level overview of YOLO, which is my favorite object detector, so let's

4
00:00:09,870 --> 00:00:10,590
get started.

5
00:00:11,880 --> 00:00:17,730
So YOLO stands for You Only Look Once and was developed by two brilliant researchers, Joseph Redmon

6
00:00:17,730 --> 00:00:20,190
and Ali Farhadi in 2015.

7
00:00:20,310 --> 00:00:26,250
And this is basically a diagram that has become famous for YOLO because it illustrates how good

8
00:00:26,250 --> 00:00:27,870
and how simple it was.

9
00:00:28,890 --> 00:00:34,890
So YOLO is a single-stage detector, hence the name You Only Look Once; single stage meaning that it

10
00:00:34,890 --> 00:00:40,010
actually saves a lot of time compared to R-CNNs and even Faster R-CNNs.

11
00:00:40,440 --> 00:00:45,750
Because the biggest drawback with Faster R-CNN was that it was quite slow.

12
00:00:45,870 --> 00:00:53,940
It wasn't practical to run this on live video because the frame rate on even the powerful GPUs used back then was

13
00:00:53,940 --> 00:00:59,430
in the single digits; it was about seven frames per second, usually, even on the fastest Nvidia GPUs at that time.

14
00:01:00,330 --> 00:01:06,690
So SSDs did dramatically improve that, and they were also using a single-stage technique;

15
00:01:06,690 --> 00:01:10,530
however, they weren't nearly as accurate as R-CNNs.

16
00:01:11,100 --> 00:01:13,350
And that's where YOLO is so important.

17
00:01:13,890 --> 00:01:14,430
YOLO,

18
00:01:14,550 --> 00:01:19,200
and now YOLOv4 and YOLOv5, attempted to solve both of those problems and strike a balance

19
00:01:19,200 --> 00:01:20,760
between speed and accuracy.

20
00:01:21,210 --> 00:01:28,200
And I actually would say now YOLO versions four and five and YOLOX surpass all other object detectors

21
00:01:28,650 --> 00:01:33,150
not just in performance, I mean, not just in speed, but in accuracy as well.

22
00:01:34,140 --> 00:01:36,030
So there are many flavors of YOLO.

23
00:01:36,600 --> 00:01:41,700
YOLO has had tremendous success in real-world applications and has many, many different versions right

24
00:01:41,700 --> 00:01:41,970
now.

25
00:01:42,690 --> 00:01:49,410
Some examples are Tiny YOLO, YOLOv2, YOLOv3, YOLOv4, YOLOv5, and Scaled

26
00:01:49,410 --> 00:01:55,830
YOLO. YOLO with various backbones is very customizable, and the research world and research community

27
00:01:55,830 --> 00:02:00,960
has really taken advantage of that and has explored so many different versions of YOLO.

28
00:02:01,560 --> 00:02:05,970
The most popular flavor of YOLO used in industry, at least

29
00:02:06,840 --> 00:02:10,110
maybe before 2019, was YOLOv3.

30
00:02:10,140 --> 00:02:13,980
However, now it's the Ultralytics implementation of YOLOv5.

31
00:02:14,490 --> 00:02:19,960
YOLOv5, which we'll talk about later on in a few lectures, is technically YOLOv4.

32
00:02:19,980 --> 00:02:26,910
However, it's YOLOv4 ported to PyTorch, with a number of optimizations that make training quite

33
00:02:26,910 --> 00:02:28,560
quick, as well as inference and testing.

34
00:02:28,920 --> 00:02:34,110
It's really, really well done by Ultralytics, and actually, I encourage you to use that; you can go to

35
00:02:34,110 --> 00:02:35,280
the GitHub and check it out.

36
00:02:35,400 --> 00:02:38,610
I'll actually be teaching a lecture on that later on in the course anyway.

37
00:02:39,360 --> 00:02:45,330
And because of how common YOLO is, there are a number of YOLO models pre-trained on the COCO dataset that are

38
00:02:45,330 --> 00:02:48,270
readily available out there that you can use in your code.

39
00:02:48,300 --> 00:02:54,750
Right now, it's quite easy to use, and these are the 80 classes that it's been trained on in the

40
00:02:54,750 --> 00:02:55,560
COCO dataset.

41
00:02:56,490 --> 00:02:59,850
So how does YOLO compare to other object detectors?

42
00:03:00,420 --> 00:03:01,450
Well, let's take a look.

43
00:03:01,470 --> 00:03:05,820
So currently, the best-performing YOLO models are YOLOv4,

44
00:03:06,300 --> 00:03:09,130
YOLOv5 (technically, YOLOv4 and YOLOv5 are the same, but YOLOv5 is a little

45
00:03:09,330 --> 00:03:11,870
faster, generally), and YOLOX.

46
00:03:11,910 --> 00:03:14,030
And you can see this was from the publication.

47
00:03:14,040 --> 00:03:18,450
This image here, the top right one, is from the YOLOv4 paper.

48
00:03:18,900 --> 00:03:25,110
And you can see the speeds here on the V100 GPU were magnificent.

49
00:03:25,110 --> 00:03:29,160
It was like 110 frames per second, and it was still getting very high accuracies.

50
00:03:29,730 --> 00:03:35,910
You can see the EfficientDet models with their backbones were getting better mAP scores.

51
00:03:36,580 --> 00:03:36,960
However,

52
00:03:41,070 --> 00:03:42,270
they were quite slow.

53
00:03:43,230 --> 00:03:47,610
And then you can see in this diagram here, this is the bottom left diagram.

54
00:03:48,120 --> 00:03:49,080
YOLOv5.

55
00:03:49,080 --> 00:03:52,380
This is the diagram used to compare it to EfficientDet.

56
00:03:52,860 --> 00:03:58,530
You can see it's actually beating a lot of the EfficientDet models in terms of faster inference and

57
00:03:58,530 --> 00:04:01,050
better mAP score as well, so it's quite remarkable.

58
00:04:01,680 --> 00:04:07,980
And this diagram here is from the YOLOX research paper, which again showed it beating EfficientDet

59
00:04:07,980 --> 00:04:11,670
Lite, though not the non-Lite versions of EfficientDet.

60
00:04:12,750 --> 00:04:14,940
You can see this is a small version here.

61
00:04:14,970 --> 00:04:20,750
This is a larger version here as well, using the full EfficientDet model, YOLOv5

62
00:04:20,750 --> 00:04:25,500
with different backbones as well, and large and small models.

63
00:04:26,070 --> 00:04:31,680
And you can see YOLOX-L has achieved quite a good mAP score as well.

64
00:04:32,790 --> 00:04:34,590
So how does YOLO work?

65
00:04:34,800 --> 00:04:43,230
Well, prior object detection systems repurposed classifiers or localizers to perform

66
00:04:43,230 --> 00:04:48,690
detection, and then they applied the model to an image at multiple locations and scales.

67
00:04:49,200 --> 00:04:54,120
So high-scoring regions of the image are considered object detections.

68
00:04:54,960 --> 00:04:59,760
So instead, YOLO applies a single neural network

69
00:04:59,900 --> 00:05:06,190
to the full image. This network divides the image into regions and predicts bounding boxes and probabilities

70
00:05:06,190 --> 00:05:10,450
for each region: probabilities of different classes, to be specific.

71
00:05:10,990 --> 00:05:17,110
These bounding boxes are then weighted by the predicted probabilities, and this method has several

72
00:05:17,110 --> 00:05:22,000
advantages over classifier-based systems because it looks at the whole image

73
00:05:22,180 --> 00:05:29,170
at test time, so its predictions are informed by the global context of the image, and also it makes predictions

74
00:05:29,170 --> 00:05:31,540
with a single network evaluation.

75
00:05:31,990 --> 00:05:36,460
Unlike R-CNNs, which require about 2,000 for a single image.

76
00:05:37,150 --> 00:05:43,450
So this makes it extremely fast: more than a thousand times faster than the original R-CNN and 100x

77
00:05:43,450 --> 00:05:48,820
faster than Fast R-CNN. And that brings us to the end of this lesson.

78
00:05:48,850 --> 00:05:53,820
However, I really do encourage you to read the YOLOv3 paper.

79
00:05:53,830 --> 00:05:55,390
It's actually quite hilarious.

80
00:05:55,840 --> 00:06:01,300
The researchers definitely don't take themselves too seriously, and it's quite an entertaining read

81
00:06:01,420 --> 00:06:02,530
to be honest.

82
00:06:03,130 --> 00:06:04,330
So we'll stop there.

83
00:06:04,480 --> 00:06:09,880
And in the next section, we'll take a deeper look at how exactly YOLO works.

84
00:06:10,390 --> 00:06:12,430
So I'll see you in the next section.

85
00:06:12,580 --> 00:06:13,000
Thank you.