1
00:00:00,270 --> 00:00:06,420
So now let's talk about some early object detectors. These include Haar cascade classifiers

2
00:00:06,870 --> 00:00:09,090
and sliding windows with HOGs (histograms of oriented gradients).

3
00:00:09,570 --> 00:00:13,140
So let's take a look at the history of these early detectors.

4
00:00:13,800 --> 00:00:17,010
So first, we will start with Haar cascade classifiers.

5
00:00:17,700 --> 00:00:23,010
So object detection algorithms have been around since the early days of computer vision, and the first

6
00:00:23,010 --> 00:00:24,660
good attempts were in the early 90s.

7
00:00:25,200 --> 00:00:32,760
However, the first really successful implementation of object detection was in 2001, with the introduction

8
00:00:32,760 --> 00:00:37,860
of Haar cascade classifiers, used by Viola and Jones for face detection.

9
00:00:38,400 --> 00:00:44,670
It was surprisingly fast and easy to use, and it got pretty good accuracy, to be honest.

10
00:00:45,090 --> 00:00:50,790
It's still used today in a lot of applications. In fact, when we use OpenCV and we load that frontal

11
00:00:50,790 --> 00:00:51,660
face detector,

12
00:00:52,140 --> 00:00:53,290
it actually works quite well.

13
00:00:53,310 --> 00:00:58,170
I mean, the deep learning methods nowadays, or dlib and a bunch of the other algorithms, they

14
00:00:58,170 --> 00:01:00,510
do perform better, but they're not substantially better.

15
00:01:01,320 --> 00:01:02,400
So how does it work?

16
00:01:02,430 --> 00:01:07,650
Well, it feeds Haar features, which we'll discuss in the next slide,

17
00:01:08,190 --> 00:01:13,950
into a series of cascading classifiers and then uses a series of steps to determine if a face has been

18
00:01:13,950 --> 00:01:19,620
detected. Haar cascade classifiers are very effective.

19
00:01:19,800 --> 00:01:24,930
However, they are very difficult to develop, train, and optimize, because they rely on

20
00:01:24,930 --> 00:01:27,780
some manual parameter tweaking to adjust sensitivity.

21
00:01:28,320 --> 00:01:31,560
So it's not always easy to work with.

22
00:01:31,560 --> 00:01:35,670
And that's why it never took off for general object detection tasks.

23
00:01:35,670 --> 00:01:42,990
But it worked very well with faces, which tend to have limited intra-class variability,

24
00:01:43,380 --> 00:01:50,070
because we know a face has two eyes, a nose, a roughly circular outline, those types of things, so it works

25
00:01:50,070 --> 00:01:50,700
well with that.

26
00:01:51,900 --> 00:01:57,330
So let's talk about how we train Haar cascade classifiers and how they actually work.

27
00:01:57,900 --> 00:02:04,080
Firstly, they're trained using positive images, that is, images with faces, and images without faces.

28
00:02:04,230 --> 00:02:10,770
The images without faces are the negative images. Features are extracted using rectangular blocks of various

29
00:02:10,770 --> 00:02:11,220
shapes.

30
00:02:11,250 --> 00:02:16,570
Those are the Haar features I mentioned in the previous slide, and this is what they look like here.

31
00:02:16,590 --> 00:02:22,080
You can see there are various forms of edge detectors, line detectors, four-rectangle features,

32
00:02:22,080 --> 00:02:23,850
and there are actually many others, to be honest.

33
00:02:24,660 --> 00:02:31,140
And they use something called integral images, which speeds up this process tremendously.
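
Here's a minimal Python sketch of the integral-image idea, assuming a grayscale image stored as a list of lists. The function names (`integral_image`, `rect_sum`, `two_rect_edge_feature`) are hypothetical, not OpenCV's API, and the edge feature is just one example of a two-rectangle Haar feature.

```python
def integral_image(img):
    """Summed-area table with a leading row/column of zeros:
    ii[y][x] = sum of img[0..y-1][0..x-1], so lookups need no bounds checks."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]                  # running sum of row y up to column x
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y): always four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_edge_feature(ii, x, y, w, h):
    """Hypothetical vertical-edge Haar feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


img = [[255, 255, 0, 0] for _ in range(4)]  # bright left half, dark right half
ii = integral_image(img)
print(two_rect_edge_feature(ii, 0, 0, 4, 4))  # 2040: strong response to the edge
```

The point of the table is that a rectangle sum costs the same four lookups no matter how big the rectangle is, which is why evaluating thousands of Haar features per window stays cheap.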

34
00:02:31,620 --> 00:02:35,660
And the researchers also used AdaBoost as a boosting technique.

35
00:02:35,700 --> 00:02:41,760
So you use a number of weak classifiers to build a strong classifier, improving accuracy

36
00:02:41,760 --> 00:02:43,140
with a reduced set of features.
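
To make "weak classifiers into a strong classifier" concrete, here's a textbook discrete AdaBoost sketch with one-feature threshold classifiers ("decision stumps") as the weak learners. This is a generic version, not the exact Viola-Jones training loop, where each candidate feature would be a Haar feature response; all names here are hypothetical.

```python
import math


def train_stump(X, y, weights):
    """Pick the (feature, threshold, polarity) stump with the lowest weighted error.
    X: list of feature vectors, y: labels in {+1, -1}, weights: sample weights."""
    best = None
    for f in range(len(X[0])):
        for thresh in sorted(set(row[f] for row in X)):
            for pol in (1, -1):
                err = sum(wt for row, label, wt in zip(X, y, weights)
                          if (pol if row[f] >= thresh else -pol) != label)
                if best is None or err < best[0]:
                    best = (err, f, thresh, pol)
    return best


def adaboost(X, y, rounds):
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        err, f, thresh, pol = train_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)    # better stumps get a bigger vote
        ensemble.append((alpha, f, thresh, pol))
        # Misclassified samples gain weight, correct ones lose it, then renormalize.
        for i, (row, label) in enumerate(zip(X, y)):
            pred = pol if row[f] >= thresh else -pol
            weights[i] *= math.exp(-alpha * label * pred)
        total = sum(weights)
        weights = [wt / total for wt in weights]
    return ensemble


def strong_classify(ensemble, row):
    """Weighted vote of all weak classifiers."""
    score = sum(alpha * (pol if row[f] >= thresh else -pol)
                for alpha, f, thresh, pol in ensemble)
    return 1 if score >= 0 else -1
```

Each round forces the next stump to focus on the samples the previous ones got wrong, and only the features picked by some stump are ever used, which is where the "reduced set of features" comes from.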

37
00:02:43,590 --> 00:02:49,740
So Viola and Jones applied this method using several stages of various features to detect faces.

38
00:02:49,740 --> 00:02:51,600
So you can see an illustration of it here.

39
00:02:51,960 --> 00:02:58,560
Stage one has one feature, then we move on to stage two, and as you can see,

40
00:02:58,560 --> 00:02:58,680
from there

41
00:02:58,680 --> 00:03:03,600
we keep increasing the number of features we're extracting, all the way up to

42
00:03:03,600 --> 00:03:04,110
28.
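
The reason a cascade is fast is early rejection: a window is thrown away at the first stage it fails, so most background windows never reach the expensive later stages. A minimal sketch, assuming each stage is a scoring function plus a threshold; the toy stages below are hypothetical stand-ins for boosted Haar-feature classifiers.

```python
def run_cascade(stages, window):
    """stages: list of (score_fn, threshold) pairs, cheapest first.
    Returns (accepted, number_of_stages_evaluated)."""
    for i, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, i  # early rejection: later stages never run
    return True, len(stages)


# Hypothetical toy stages operating on a single number instead of an image window.
stages = [(lambda w: w, 1), (lambda w: 2 * w, 10), (lambda w: w - 3, 5)]
print(run_cascade(stages, 0))  # (False, 1): rejected by the first, cheapest stage
print(run_cascade(stages, 8))  # (True, 3): passed every stage
```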

43
00:03:06,000 --> 00:03:10,620
So now let's talk about histograms of oriented gradients (HOG) with SVMs and sliding windows.

44
00:03:11,160 --> 00:03:14,760
So firstly, sliding windows. The idea seems intuitive.

45
00:03:14,790 --> 00:03:15,630
Do you know what it is?

46
00:03:16,080 --> 00:03:23,190
It's a method where we extract segments of our image piece by piece, using a rectangular

47
00:03:23,190 --> 00:03:24,120
extraction window.

48
00:03:24,600 --> 00:03:30,000
So remember when we covered convolution filters and we were multiplying the kernel by the

49
00:03:30,000 --> 00:03:30,360
image?

50
00:03:30,720 --> 00:03:35,850
It's the exact same pattern: we move left to right and keep going down one step at a time.

51
00:03:35,850 --> 00:03:37,140
So again, that's one step.

52
00:03:37,290 --> 00:03:42,210
And that's how it works, at different image sizes as well.
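
A minimal sketch of that scanning order: left to right, then one step down, exactly like a convolution kernel. The helper names `sliding_windows` and `crop` are hypothetical, and the image is assumed to be a list of rows.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield the (x, y) top-left corner of every window position, row by row."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield x, y


def crop(img, x, y, w, h):
    """Cut the w x h window at (x, y) out of a list-of-rows image."""
    return [row[x:x + w] for row in img[y:y + h]]


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
for x, y in sliding_windows(4, 3, 2, 2, stride=1):
    patch = crop(img, x, y, 2, 2)  # each patch would be fed to the classifier
```

Running it at several image sizes is just a matter of resizing the image and repeating the loop.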

53
00:03:42,750 --> 00:03:49,800
So with histograms of oriented gradients, what we're doing here is calculating HOGs in

54
00:03:49,800 --> 00:03:52,660
the window here, and this is what HOGs look like.

55
00:03:52,680 --> 00:03:55,110
You can actually see it a bit here.

56
00:03:55,740 --> 00:04:02,490
It's basically the direction of the change in intensity in that square, in that region

57
00:04:02,730 --> 00:04:03,380
we're looking at.

58
00:04:04,020 --> 00:04:10,440
And then once we have the HOG features extracted from the block here, we can feed them into an SVM

59
00:04:10,440 --> 00:04:14,850
classifier to classify whether the object was in that window or not.
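
The SVM step boils down to a linear score, w · x + b, on the window's feature vector: positive means "object here". Here's a minimal sketch with a small hinge-loss trainer (plain sub-gradient descent) so the example is self-contained; the names and hyperparameters are assumptions, and in practice you'd use a proper solver such as scikit-learn's `LinearSVC`.

```python
def svm_decision(w, b, x):
    """Linear SVM score w . x + b; positive means 'object in this window'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svm_predict(w, b, x):
    return 1 if svm_decision(w, b, x) >= 0 else -1


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch sub-gradient descent on the regularized hinge loss
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))), labels in {+1, -1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, label in zip(X, y):
            if label * svm_decision(w, b, x) < 1:  # inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= label * x[j] / n
                grad_b -= label / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy 2-D "descriptors": background windows near (0, 0), object windows near (1, 1).
X = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
y = [-1, -1, 1, 1]
w, b = train_linear_svm(X, y)
```

With real HOG features, each `x` would be the full descriptor of one window rather than two numbers.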

60
00:04:15,960 --> 00:04:22,920
So as you can see, earlier object detection methods were basically these manual feature extraction

61
00:04:22,920 --> 00:04:24,900
methods combined with sliding windows.

62
00:04:25,800 --> 00:04:28,680
So whether it's Haar cascade classifiers or HOGs,

63
00:04:29,670 --> 00:04:30,810
that's basically what they are.

64
00:04:31,680 --> 00:04:38,370
But we can eliminate some of the complexity by using sliding windows with CNNs as well.

65
00:04:38,850 --> 00:04:40,320
However, think about something here.

66
00:04:40,540 --> 00:04:46,410
Using a sliding window across an image results in hundreds, even thousands of classifications done

67
00:04:46,800 --> 00:04:47,730
on a single image.

68
00:04:47,820 --> 00:04:53,340
Because there are so many positions a sliding window can take over an image, especially if it's

69
00:04:53,340 --> 00:04:54,030
a large image.

70
00:04:54,540 --> 00:04:57,330
Then there's the scaling issue: what size window do we use?

71
00:04:57,780 --> 00:04:59,940
Is pyramiding robust enough, that is,

72
00:05:00,230 --> 00:05:07,520
scaling the image to different scales? We have 1x, 2x, 4x, 8x, getting smaller, I should

73
00:05:07,520 --> 00:05:08,600
say, not larger.

74
00:05:09,740 --> 00:05:13,310
And what if the object isn't the shape we set the window up to be?

75
00:05:13,670 --> 00:05:19,670
Do we use windows of different aspect ratios? Do we use rectangular windows or square windows? So you can

76
00:05:19,910 --> 00:05:21,830
easily see how this could just blow up.

77
00:05:22,370 --> 00:05:23,000
And basically,

78
00:05:23,210 --> 00:05:28,160
it can be millions of classifications per image, especially if it's large.

79
00:05:28,700 --> 00:05:29,920
So here's an illustration.

80
00:05:29,930 --> 00:05:32,930
How do you slide the window? Left to right, right to left?

81
00:05:33,320 --> 00:05:36,890
Do you take a stride of two, that is, jump two pixels at a time?
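
A quick way to see the blow-up is to count window positions. This sketch assumes an image pyramid with downscale factors 1, 2, 4 and 8, three hypothetical window shapes (square, tall, wide), and a fixed stride; `count_windows` is a hypothetical helper.

```python
def count_windows(img_w, img_h, window_sizes, scales, stride):
    """Count sliding-window positions over an image pyramid.
    window_sizes: (width, height) window shapes, i.e. several aspect ratios.
    scales: downscale factors, e.g. [1, 2, 4, 8] = full, half, quarter, eighth size."""
    total = 0
    for s in scales:
        w, h = int(img_w / s), int(img_h / s)  # size of this pyramid level
        for win_w, win_h in window_sizes:
            if win_w > w or win_h > h:
                continue  # window no longer fits at this level
            total += ((w - win_w) // stride + 1) * ((h - win_h) // stride + 1)
    return total


shapes = [(64, 64), (64, 128), (128, 64)]
print(count_windows(640, 480, shapes, [1, 2, 4, 8], stride=1))  # 774312
print(count_windows(640, 480, shapes, [1, 2, 4, 8], stride=2))  # 194648
```

So even a modest 640 by 480 image needs hundreds of thousands of classifier calls with a one-pixel stride; a stride of two cuts that roughly by four, at the cost of coarser localization.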

82
00:05:37,370 --> 00:05:38,450
So many questions.

83
00:05:38,690 --> 00:05:41,450
So how do we use HOGs with sliding windows?

84
00:05:42,020 --> 00:05:45,920
Well, let's take a look at how HOG with sliding windows works.

85
00:05:46,370 --> 00:05:54,080
So using a pixel grid, basically, we take this box here and compute the gradient vector,

86
00:05:54,080 --> 00:05:56,780
or edge orientation, at each pixel.

87
00:05:56,930 --> 00:05:58,720
That's how we get the histograms.

88
00:05:59,450 --> 00:06:04,130
For an 8 by 8 cell, this generates 64 gradient vectors, which are then represented as a histogram.
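
Here's a minimal sketch of the per-pixel gradient step for one cell, using the centred [-1, 0, 1] difference in x and y and the unsigned angle in [0, 180). Clamping at the cell border is a simplification assumed here; a real implementation would read neighbouring pixels from the surrounding image. `cell_gradients` is a hypothetical name.

```python
import math


def cell_gradients(cell):
    """cell: 2-D list of grayscale intensities (e.g. 8 x 8).
    Returns one (magnitude, angle_in_degrees) pair per pixel, angle in [0, 180)."""
    h, w = len(cell), len(cell[0])
    grads = []
    for y in range(h):
        for x in range(w):
            # Centred differences; indices are clamped at the border of the cell.
            gx = cell[y][min(x + 1, w - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, h - 1)][x] - cell[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            grads.append((mag, ang))
    return grads


ramp = [[10 * x for x in range(8)] for _ in range(8)]  # brightness rises left to right
grads = cell_gradients(ramp)  # 64 (magnitude, angle) pairs, all pointing at 0 degrees
```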

89
00:06:05,120 --> 00:06:07,610
Each cell's histogram is split into angular bins.

90
00:06:08,060 --> 00:06:12,320
So the angular bins basically capture which direction the gradient was pointing in that region, so you could have

91
00:06:12,320 --> 00:06:13,160
different angles.

92
00:06:13,580 --> 00:06:16,610
Remember, gradients measure the change in intensity.

93
00:06:16,880 --> 00:06:23,270
So going from this white to this darker color, the gradient will point in some direction,

94
00:06:23,270 --> 00:06:24,680
and that's how you get angles here.

95
00:06:25,820 --> 00:06:31,280
So we can use different numbers of bins. Dalal and Triggs, for example, used nine bins from zero

96
00:06:31,280 --> 00:06:34,580
to 180 degrees, but you can choose whatever binning you want here.

97
00:06:35,720 --> 00:06:42,920
And this effectively reduces 64 vectors to just nine values, because we just have these bins, spaced

98
00:06:42,920 --> 00:06:47,870
20 degrees apart here, zero to 20, 20 to 40, and so on, so we can compress them down into these bins.
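
A minimal sketch of the binning, assuming nine 20-degree bins over [0, 180) where each gradient adds its magnitude to the bin covering its angle. The original HOG descriptor splits each vote between the two nearest bins; this sketch uses the simpler hard assignment, and `orientation_histogram` is a hypothetical name.

```python
def orientation_histogram(grads, n_bins=9):
    """grads: list of (magnitude, unsigned angle in [0, 180)) pairs.
    With 9 bins, bin k covers [20k, 20k + 20) degrees."""
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for mag, ang in grads:
        idx = int(ang // bin_width) % n_bins  # % also maps an angle of exactly 180 to bin 0
        hist[idx] += mag
    return hist


hist = orientation_histogram([(1.0, 10.0), (2.0, 25.0), (3.0, 179.9), (4.0, 0.0)])
# hist == [5.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0]
```

However many pixels are in the cell, the output is always nine numbers, which is the compression described above.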

99
00:06:49,340 --> 00:06:55,190
And since it stores binned gradient magnitudes, it's relatively robust to small variations, which is a good thing.

100
00:06:56,510 --> 00:07:02,540
So we then normalize the gradients to ensure invariance to illumination changes,

101
00:07:03,020 --> 00:07:09,050
that is, brightness and contrast. For example, in the images right here, you can see that if we divide the vectors

102
00:07:09,050 --> 00:07:14,210
by the gradient magnitude, we get 0.074, and this is normalization.

103
00:07:14,630 --> 00:07:17,360
So this is a good illustration of normalization here.

104
00:07:17,480 --> 00:07:21,290
You can go through the mathematics if you want and take a look at it as well.
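
The normalization above is dividing a vector by its length (L2 norm). A minimal sketch, with a small `eps` assumed to avoid dividing by zero on flat patches. Doubling the contrast doubles every gradient, but the normalized vector comes out the same, which is the illumination invariance we're after.

```python
import math


def l2_normalize(vec, eps=1e-6):
    """Divide by the vector's length; eps guards against an all-zero vector."""
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]


a = l2_normalize([3.0, 4.0])  # approximately [0.6, 0.8]
b = l2_normalize([6.0, 8.0])  # same gradients at double the contrast: also ~[0.6, 0.8]
```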

105
00:07:22,460 --> 00:07:28,790
So instead of normalizing each cell individually, a method called block normalization was used.

106
00:07:29,240 --> 00:07:34,640
This takes neighbouring cells into account, so the normalization considers larger segments

107
00:07:34,640 --> 00:07:35,290
of the window.

108
00:07:35,300 --> 00:07:36,620
So something like this here.
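
A minimal sketch of block normalization, assuming 2 by 2 blocks of cells that slide one cell at a time (so blocks overlap). Each block's four 9-bin histograms are concatenated into 36 values and L2-normalized together. `block_normalize` is a hypothetical name.

```python
import math


def block_normalize(cell_hists, block=2, eps=1e-6):
    """cell_hists: 2-D grid (rows x cols) of per-cell histograms (lists of floats).
    Returns one L2-normalized vector per overlapping block x block group of cells."""
    rows, cols = len(cell_hists), len(cell_hists[0])
    out = []
    for r in range(rows - block + 1):
        for c in range(cols - block + 1):
            vec = []
            for dr in range(block):
                for dc in range(block):
                    vec.extend(cell_hists[r + dr][c + dc])  # concatenate the cells
            norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
            out.append([v / norm for v in vec])
    return out


# A 64 x 128 detection window with 8 x 8 cells is a grid of 16 rows x 8 columns of cells.
grid = [[[1.0] * 9 for _ in range(8)] for _ in range(16)]
descriptor = [v for blk in block_normalize(grid) for v in blk]
print(len(descriptor))  # 3780
```

With 15 by 7 overlapping blocks of 36 values each, that gives the 3,780-value descriptor used by the original HOG person detector; because blocks overlap, each cell shows up in several of them, normalized against different neighbours.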

109
00:07:38,300 --> 00:07:38,670
Oops.

110
00:07:39,980 --> 00:07:44,720
So that's basically it for histograms of oriented gradients with sliding windows.

111
00:07:44,890 --> 00:07:48,390
That was our illustration of early object detection.

112
00:07:48,410 --> 00:07:54,560
I don't expect you to need to know much more about this, because we don't really use these methods

113
00:07:54,560 --> 00:07:55,010
anymore.

114
00:07:55,640 --> 00:08:00,800
But it's nevertheless good to understand the history, so that when we move into the deep learning

115
00:08:00,800 --> 00:08:05,210
parts, you can appreciate what the deep learning methods bring to the table.

116
00:08:05,720 --> 00:08:11,570
So next, we'll take a look at some of the metrics we use to assess object detector performance.

117
00:08:12,020 --> 00:08:18,140
We'll start with intersection over union, because this is the basis of the metric that is

118
00:08:18,140 --> 00:08:21,800
used to analyze how well our bounding boxes are located.
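
As a preview, intersection over union is the overlap area of two boxes divided by the area they cover together. A minimal sketch, assuming boxes given as (x1, y1, x2, y2) corner coordinates; `iou` is a hypothetical helper name.

```python
def iou(box_a, box_b):
    """Boxes as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero if the boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7: overlap of 1 over a combined area of 7
```

A perfect prediction scores 1.0 and a box that misses entirely scores 0.0.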

119
00:08:22,700 --> 00:08:23,540
So that's it

120
00:08:23,540 --> 00:08:25,640
for this lesson. I'll see you in the next section.

121
00:08:25,830 --> 00:08:26,050
Thanks.