1
00:00:00,240 --> 00:00:07,230
Hi, and welcome to the lecture on object detection. In this section, we'll basically be taking a high-level

2
00:00:07,230 --> 00:00:09,930
overview of the principles of object detection.

3
00:00:10,470 --> 00:00:11,670
So let's get started.

4
00:00:11,790 --> 00:00:14,610
So firstly, what is object detection?

5
00:00:15,150 --> 00:00:21,600
Well, so far we've dealt extensively with image classification tasks, that is, examining an image as

6
00:00:21,600 --> 00:00:23,820
a whole to tell you what the image contains.

7
00:00:24,210 --> 00:00:27,400
An example of that is basically the CIFAR dataset.

8
00:00:27,420 --> 00:00:32,430
You can see these are all different images of airplanes, automobiles, birds and so on.

9
00:00:32,970 --> 00:00:39,210
So in an image classification task, you're just telling us what the image is. We're not saying

10
00:00:39,390 --> 00:00:42,720
what the objects in the image are specifically. Like,

11
00:00:42,720 --> 00:00:48,150
I know we're saying it's a car, but we're not actually saying it's a car and a wall, or a

12
00:00:48,150 --> 00:00:51,900
bird and the tugboat behind it or a cage and so on.

13
00:00:52,380 --> 00:00:56,550
So let's take a look a bit more closely at what object detection gives us.

14
00:00:57,450 --> 00:01:04,830
So what if we wanted to individually classify the objects in an image like this?

15
00:01:04,980 --> 00:01:10,560
You can see now we have cars with a bounding box around each car in the scene.

16
00:01:10,620 --> 00:01:13,560
It's a bit hard to see the cars, but it actually works fairly well.

17
00:01:14,160 --> 00:01:19,180
Then you have people being drawn over with this box and the bus as well.

18
00:01:19,200 --> 00:01:23,820
So this is an example of an object detection algorithm, and it works quite well.

19
00:01:23,850 --> 00:01:29,400
I believe this is YOLO version three being used here, and it's working quite well.

20
00:01:30,070 --> 00:01:32,010
And then you can see another example of it here.

21
00:01:32,220 --> 00:01:36,240
It's identifying individual objects and drawing these bounding boxes around them.

22
00:01:36,660 --> 00:01:39,450
You have a bowl, cup, mouse, bottle, wine glass.

23
00:01:39,990 --> 00:01:41,480
This is quite cool, isn't it?

24
00:01:41,490 --> 00:01:46,770
And object detection is one of the most useful algorithms in computer vision.

25
00:01:47,250 --> 00:01:50,220
It has such a wide range of applications.

26
00:01:51,090 --> 00:01:54,660
And here's the last example, which just detects a cup in an image.

27
00:01:56,220 --> 00:01:58,890
So you may be thinking this is hard and it is hard.

28
00:01:59,600 --> 00:02:00,690
Take a look at this scene here.

29
00:02:00,990 --> 00:02:03,150
There are so many different objects to identify.

30
00:02:03,690 --> 00:02:11,370
So as you might rightfully infer, this is a challenging problem because we need to do two things: localization,

31
00:02:12,150 --> 00:02:16,050
that is, determining where the boundaries of this object lie.

32
00:02:16,530 --> 00:02:20,430
So we need to figure out a way to draw a bounding box around the object

33
00:02:20,680 --> 00:02:21,480
we have identified.

34
00:02:22,230 --> 00:02:25,080
And then secondly, we need to classify what our object is.

35
00:02:25,440 --> 00:02:28,050
So this one's a bowl, this one's a cup, and a mouse.
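
[Editor's sketch, not part of the spoken lecture: the two requirements just described, localization and classification, mean each detection carries both a box and a label. The minimal Python below only illustrates that output structure. The class names, field names, coordinates, scores, and the 0.5 threshold are all hypothetical and do not come from any particular detector or library.]

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates (hypothetical layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so a malformed box never yields a negative area.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


@dataclass(frozen=True)
class Detection:
    """One detected object: where it is (box) and what it is (label)."""
    label: str
    score: float  # assumed confidence in [0, 1]
    box: BoundingBox


def filter_detections(detections: List[Detection], min_score: float = 0.5) -> List[Detection]:
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in detections if d.score >= min_score]


# Made-up detections for the desk scene mentioned in the lecture.
detections = [
    Detection("bowl", 0.91, BoundingBox(40, 60, 200, 180)),
    Detection("cup", 0.84, BoundingBox(220, 50, 300, 150)),
    Detection("mouse", 0.32, BoundingBox(310, 120, 380, 170)),
]

kept = filter_detections(detections)
for d in kept:
    print(d.label, d.score, d.box.area())
```

[An image classifier would return a single label for the whole image. The list of label-plus-box records is what distinguishes detection.]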

36
00:02:28,470 --> 00:02:32,490
So we need an algorithm, a computer vision deep learning algorithm, preferably.

37
00:02:33,090 --> 00:02:36,040
Actually, that's pretty much the only way we can do this.

38
00:02:36,260 --> 00:02:42,900
We can use Haar cascade classifiers, which you would have seen in the OpenCV lessons, to do detection.

39
00:02:43,410 --> 00:02:46,620
However, those don't work out too well in the real world

40
00:02:47,010 --> 00:02:49,380
in many tasks, and they're quite hard to train.

41
00:02:50,280 --> 00:02:54,030
So let's take a deeper look at object detectors.

42
00:02:55,140 --> 00:03:01,950
So object detectors are usually trained on one or more classes and seek to produce a bounding box

43
00:03:01,950 --> 00:03:04,590
around the object class that's been identified.

44
00:03:05,250 --> 00:03:07,200
And there are many kinds of object detectors.

45
00:03:07,560 --> 00:03:12,150
There are some non-deep-learning-based ones, as I mentioned, Haar cascade classifiers.

46
00:03:12,570 --> 00:03:15,930
And then also the sliding window algorithms, such as histogram of gradients.

47
00:03:16,410 --> 00:03:22,200
However, these preceded the very cool, deep learning methods that we use nowadays.

48
00:03:22,740 --> 00:03:29,460
And these deep learning methods include things like R-CNNs, SSDs, YOLO, which is very good, Detectron

49
00:03:29,460 --> 00:03:30,660
and Detectron2.

50
00:03:30,690 --> 00:03:35,940
Well, it's very good. EfficientDet is actually quite good as well, and also RetinaNet.

51
00:03:36,630 --> 00:03:40,830
And then in the deep learning world, we have two main types of object detectors.

52
00:03:41,250 --> 00:03:48,210
We have two-shot, which uses two stages, a region proposal stage and then a classification stage, and

53
00:03:48,210 --> 00:03:53,010
then single-shot, which does localization and image classification at once.

54
00:03:53,670 --> 00:03:58,770
So now let's take a look at the formal definition from Wikipedia for object detection.

55
00:03:59,340 --> 00:04:05,790
So object detection is a computer technology related to computer vision and image processing that deals

56
00:04:05,790 --> 00:04:13,260
with detecting instances of semantic objects of a certain class, such as humans, buildings, or cars, in

57
00:04:13,260 --> 00:04:14,820
digital images and videos.

58
00:04:15,660 --> 00:04:20,610
Well-researched domains of object detection include face detection, which we have seen previously,

59
00:04:21,060 --> 00:04:28,080
and pedestrian detection. Nowadays there are loads of other applications you can use object detection for, and

60
00:04:28,080 --> 00:04:32,280
these include image retrieval, video surveillance and many, many others.

61
00:04:33,180 --> 00:04:39,090
So now that you've got a deeper understanding of object detection, I want to take a look at something

62
00:04:39,090 --> 00:04:40,920
called object segmentation.

63
00:04:41,340 --> 00:04:48,270
This is so that you understand the difference between object detection and segmentation tasks. So it's

64
00:04:48,270 --> 00:04:49,470
related to object detection.

65
00:04:49,560 --> 00:04:53,970
However, it sometimes relies on some very different algorithms to do segmentation.

66
00:04:54,330 --> 00:04:56,700
So what exactly is object segmentation?

67
00:04:57,180 --> 00:04:59,340
Well, segmentation firstly involves

68
00:04:59,780 --> 00:05:03,740
pixel-level classification, and let's take a look at this example here.

69
00:05:04,280 --> 00:05:10,460
So what this means here is that this is a person here, and you can see every pixel of the person is

70
00:05:10,460 --> 00:05:11,840
highlighted in a red color.

71
00:05:12,290 --> 00:05:16,280
And so far, you can see that other people in the scene are also in this red color.

72
00:05:16,910 --> 00:05:20,420
Notice that the sidewalk has been identified as a class on its own.

73
00:05:20,840 --> 00:05:27,350
And it's all been shaded pink, and the road has been identified and also been shaded, like a pinkish, darker,

74
00:05:27,380 --> 00:05:33,110
I think it's gray. And you can see the cars being blue and trees being green.

75
00:05:33,890 --> 00:05:36,680
So that's what pixel level classification is.

76
00:05:37,070 --> 00:05:39,910
And this is an example of segmentation.

77
00:05:39,920 --> 00:05:42,080
So let's take a look at the differences here.

78
00:05:42,410 --> 00:05:49,940
So within object recognition, there is image classification and object localization, which relates

79
00:05:49,940 --> 00:05:51,290
to object detection.

80
00:05:51,740 --> 00:05:55,040
And then afterwards, we have something called instance segmentation.

81
00:05:55,640 --> 00:06:01,550
Now there are two types of segmentation. There's one called semantic segmentation that we see on the right too.

82
00:06:02,510 --> 00:06:06,710
Notice the difference between semantic segmentation and instance segmentation.

83
00:06:07,250 --> 00:06:13,850
Now, in instance segmentation, we have the individual sheep being classified and shaded in different

84
00:06:13,850 --> 00:06:14,330
colors.

85
00:06:14,930 --> 00:06:17,030
They are all going to be the same class, sheep.

86
00:06:17,040 --> 00:06:21,110
But now we have sheep one, sheep two, sheep three, and dog one.

87
00:06:21,770 --> 00:06:26,980
Whereas previously with semantic segmentation, which is what we saw in the previous image here, that

88
00:06:26,990 --> 00:06:27,590
was this one.

89
00:06:28,160 --> 00:06:35,210
This was an example of semantic segmentation. So you can see that all the objects in a class are the same

90
00:06:35,210 --> 00:06:37,220
color and they all merge into one another.
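
[Editor's sketch, not part of the spoken lecture: the difference between the two kinds of segmentation can be shown with per-pixel labels. In the minimal Python below, an instance mask gives every pixel an instance ID, and a semantic mask keeps only the class name, so separate sheep merge into one region. The tiny 4x6 grid, the IDs, and the function names are hypothetical.]

```python
from collections import Counter
from typing import Dict, List

# Hypothetical 4x6 instance mask: each number is an instance ID, 0 is background.
instance_mask: List[List[int]] = [
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
    [3, 3, 0, 4, 4, 0],
]

# Which class each instance belongs to: sheep one, two, three, and dog one.
instance_to_class: Dict[int, str] = {1: "sheep", 2: "sheep", 3: "sheep", 4: "dog"}


def to_semantic(mask: List[List[int]], classes: Dict[int, str],
                background: str = "background") -> List[List[str]]:
    """Collapse instance IDs to class names, losing the per-object identity."""
    return [[classes.get(pixel, background) for pixel in row] for row in mask]


def count_instances(classes: Dict[int, str]) -> Counter:
    """Count objects per class, which only the instance view makes possible."""
    return Counter(classes.values())


semantic_mask = to_semantic(instance_mask, instance_to_class)
counts = count_instances(instance_to_class)

for row in semantic_mask:
    print(row)
print(dict(counts))
```

[After collapsing, the three sheep can no longer be told apart from the semantic mask alone, which is why counting or tracking individual objects needs the instance view.]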

91
00:06:37,910 --> 00:06:44,900
So if you want to do some sort of fine-grained analysis, and allow analysis of individual objects like

92
00:06:44,900 --> 00:06:50,660
this, instance segmentation would be the method to use, and you can see an example of it here.

93
00:06:51,080 --> 00:06:52,790
So this is what I wanted to talk about.

94
00:06:52,790 --> 00:06:58,160
So we have classification, which is just saying that the image has a cat. Classification

95
00:06:58,160 --> 00:07:01,550
plus localization is actually drawing a bounding box around a cat.

96
00:07:02,030 --> 00:07:06,560
Similarly, object detection is basically built on top of that same thing.

97
00:07:07,040 --> 00:07:12,110
However, object detection can now do multiple classes and multiple boxes in an image.

98
00:07:12,560 --> 00:07:18,350
So generally, we always refer to this as object detection, and we don't really use classification

99
00:07:18,350 --> 00:07:21,060
and localization as an algorithm in computer vision.

100
00:07:21,470 --> 00:07:23,390
It's just the way to get to this.

101
00:07:24,020 --> 00:07:27,770
And then we have instance segmentation and also semantic segmentation.

102
00:07:28,100 --> 00:07:34,130
Instance segmentation is the more advanced version of segmentation because we're now drawing bounding boxes

103
00:07:34,130 --> 00:07:35,450
around individual cats.

104
00:07:36,710 --> 00:07:42,260
So like I said previously, object detection has a wide range of use cases.

105
00:07:42,890 --> 00:07:49,640
This is a screenshot I took from the Roboflow site. Roboflow is a very cool computer vision company.

106
00:07:49,640 --> 00:07:51,290
They have an annotation tool.

107
00:07:51,290 --> 00:07:56,480
They even provide models, so many different models that you can train.

108
00:07:56,960 --> 00:07:58,010
It's really, really good.

109
00:07:58,010 --> 00:08:01,790
And the guys who are behind Roboflow are always making new updates.

110
00:08:01,790 --> 00:08:07,940
So I encourage you to check it out. I've learned a lot from these guys. And you can see how many different

111
00:08:07,940 --> 00:08:08,570
tasks.

112
00:08:08,750 --> 00:08:15,740
Things like gas leak detection, flare stack monitoring, augmented reality, bean counting, which

113
00:08:15,740 --> 00:08:23,160
is just kind of niche, to be fair, traffic counting, pothole IDs, soccer player tracking, self-driving cars,

114
00:08:23,180 --> 00:08:29,870
the big one, remote tech support, tennis line tracking, hard hat detection, that's something a lot of people

115
00:08:29,870 --> 00:08:30,770
are doing nowadays.

116
00:08:31,220 --> 00:08:38,510
Satellite imagery, logo ID, sushi identifier. I mean, there is an unlimited number of tasks.

117
00:08:39,080 --> 00:08:42,500
It's like, if you're a computer vision startup,

118
00:08:42,500 --> 00:08:49,070
it's such a good thing just to go into an industry, talk to people, figure out what they would want

119
00:08:49,250 --> 00:08:55,940
to use cameras to identify or to detect. There are so many different things out there that you wouldn't

120
00:08:55,940 --> 00:09:01,940
think about, things like this, and managing equipment, like just identifying where certain equipment

121
00:09:01,940 --> 00:09:04,220
is, using cameras, is very useful.

122
00:09:05,030 --> 00:09:08,390
So that's it for an intro into object detectors.

123
00:09:08,780 --> 00:09:12,110
Next, we'll take a look at the history of early object detectors.

124
00:09:12,470 --> 00:09:19,190
So we'll go through a quick overview of Haar cascade classifiers and sliding window with histogram of gradients.

125
00:09:19,220 --> 00:09:21,350
So stay tuned for that lesson.

126
00:09:21,470 --> 00:09:22,250
Thank you for watching.
