1
00:00:00,060 --> 00:00:06,380
So now let's move on to the lesson on face and eye detection using Haar cascade classifiers.

2
00:00:06,390 --> 00:00:08,400
So this is a pretty cool and fun project.

3
00:00:08,400 --> 00:00:11,970
We're getting into the more nifty areas of OpenCV.

4
00:00:12,510 --> 00:00:14,740
So in this lesson, what are we going to do?

5
00:00:14,910 --> 00:00:20,910
We're going to go through how you detect faces using a Haar cascade classifier that's been previously

6
00:00:20,910 --> 00:00:24,480
trained to detect faces and also one that can detect eyes.

7
00:00:25,050 --> 00:00:30,630
And just in case you're wondering, you actually can access your webcam in Colab, and I'm going to show

8
00:00:30,630 --> 00:00:36,540
you how to do that and you can use your webcam picture assuming your laptop or desktop has a webcam.

9
00:00:37,140 --> 00:00:42,660
Get a snapshot of yourself and you can run your face detection and eye detection on that image.

10
00:00:42,810 --> 00:00:43,710
So that's pretty cool.

11
00:00:44,370 --> 00:00:48,670
So now let's just look at our libraries and imshow function, as well as the downloads.

12
00:00:48,690 --> 00:00:50,520
We're downloading not just images.

13
00:00:50,580 --> 00:00:54,480
This time we're downloading the Haar cascade classifiers from my Google Drive.

14
00:00:55,050 --> 00:01:01,080
So these classifiers are basically models, and these are models that have been trained on data

15
00:01:01,080 --> 00:01:04,260
previously; I will briefly explain how they are trained.

16
00:01:05,220 --> 00:01:10,680
One model has been trained to identify faces, and the other model detects eyes.

17
00:01:11,220 --> 00:01:14,070
So let's go ahead and do that. I've actually already run this code.

18
00:01:14,070 --> 00:01:17,550
So it's going to ask me whether to rename or replace the files; I'll press A.

19
00:01:21,400 --> 00:01:25,240
OK, so firstly, what is object detection?

20
00:01:25,990 --> 00:01:32,920
Object detection is the ability of an algorithm, a computer vision algorithm, to identify individual

21
00:01:32,920 --> 00:01:37,610
objects in an image like this and correctly classify the object class.

22
00:01:37,690 --> 00:01:40,210
So this picture illustrates what I'm trying to say.

23
00:01:40,780 --> 00:01:49,390
Instead of saying this entire image is a bicycle, a dog, a tree, and a porch,

24
00:01:49,660 --> 00:01:51,010
which image classifiers do.

25
00:01:51,790 --> 00:01:59,170
This one is actually going to draw bounding boxes over individual objects, which is a much harder thing

26
00:01:59,170 --> 00:02:04,840
to do for computer vision, and then draw these boxes around them with a class label.

27
00:02:05,500 --> 00:02:07,210
So this is pretty cool and pretty useful.

28
00:02:07,210 --> 00:02:12,850
And this is actually a picture from a YOLO detector, not a Haar cascade classifier; YOLO

29
00:02:12,850 --> 00:02:14,170
is currently state of the art.

30
00:02:15,220 --> 00:02:18,130
So why are we talking about Haar cascade classifiers?

31
00:02:18,910 --> 00:02:24,670
Well, they were the first real working object detectors that worked fairly well and very

32
00:02:24,670 --> 00:02:25,210
efficiently.

33
00:02:25,840 --> 00:02:29,410
They were developed by Viola and Jones in 2001.

34
00:02:29,920 --> 00:02:35,830
The method has probably been used in so many camera systems and embedded systems for face detection just because

35
00:02:35,830 --> 00:02:36,610
it's so fast.

36
00:02:37,240 --> 00:02:39,550
And I'll explain briefly how it works.

37
00:02:40,030 --> 00:02:46,690
So what it does is use the concept of sliding windows to basically slide these small feature windows across the image.

38
00:02:46,690 --> 00:02:55,090
It slides across a picture like this, does a convolution on top of the image, and extracts those features

39
00:02:55,090 --> 00:02:56,170
and those features.

40
00:02:57,130 --> 00:03:01,870
There are lots of different ones: edge features, line features, four-rectangle features, and a bunch of others.

41
00:03:02,470 --> 00:03:04,930
Those features will correspond to a face.

42
00:03:04,930 --> 00:03:10,600
Combinations of those features together correspond to a face, and those classifiers are trained to

43
00:03:10,600 --> 00:03:13,860
identify the different sequences.

44
00:03:13,870 --> 00:03:21,670
I can probably describe it as the sequence of values that corresponds to a person's face, or at least

45
00:03:22,000 --> 00:03:22,960
to whatever it's trained on.

46
00:03:23,530 --> 00:03:27,220
And to train this, you basically just need a bunch of positive images.

47
00:03:27,220 --> 00:03:30,910
That's images where the object is present, and negative images, where it isn't.

48
00:03:31,330 --> 00:03:35,350
So that's how it learns to differentiate when a face is there and when a face isn't there.
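
To make the edge features described above a little more concrete, here is a minimal sketch in plain Python. It is not OpenCV's implementation; the function names and the toy 4x4 patch are made up for illustration. It computes a two-rectangle "vertical edge" feature (right half minus left half) using an integral image, which is the trick Viola and Jones used to make rectangle sums cheap.

```python
def integral_image(gray):
    """Return a summed-area table with one extra row and column of zeros."""
    rows, cols = len(gray), len(gray[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += gray[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def rect_sum(table, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return table[y + h][x + w] - table[y][x + w] - table[y + h][x] + table[y][x]


def vertical_edge_feature(table, x, y, w, h):
    """Two-rectangle feature: right half minus left half (w should be even)."""
    half = w // 2
    left = rect_sum(table, x, y, half, h)
    right = rect_sum(table, x + half, y, half, h)
    return right - left


# Hypothetical toy patch: dark on the left, bright on the right.
patch = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
flat = [[50] * 4 for _ in range(4)]

table = integral_image(patch)
edge_response = vertical_edge_feature(table, 0, 0, 4, 4)
flat_response = vertical_edge_feature(integral_image(flat), 0, 0, 4, 4)
print(edge_response, flat_response)
```

The patch with a dark-to-bright edge gives a large response, while the flat patch gives zero. A trained cascade combines many such responses, at many positions and sizes, to decide whether a window looks like a face.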

49
00:03:36,430 --> 00:03:41,460
So let's start using a Haar cascade classifier in OpenCV.

50
00:03:42,160 --> 00:03:48,940
So the first thing we do is use cv2.CascadeClassifier, and we point it at the path where

51
00:03:49,270 --> 00:03:51,640
we have the XML model.

52
00:03:52,120 --> 00:03:58,000
So you can see here, where we downloaded it, that it's created a folder and it stores these

53
00:03:58,000 --> 00:04:02,080
models, where we actually have one for cars as well as a full human body.

54
00:04:03,040 --> 00:04:09,730
So we can just point to the path of that XML model. Next, we load our image, which we've done previously

55
00:04:09,730 --> 00:04:11,920
before, and we convert it to grayscale.

56
00:04:11,950 --> 00:04:14,110
Now this step isn't necessary.

57
00:04:14,500 --> 00:04:21,190
However, Haar cascade classification does work a lot faster on a grayscale image.

58
00:04:21,820 --> 00:04:28,840
And there's one thing I should note. It's kind of annoying, but a Haar cascade classifier isn't an

59
00:04:28,840 --> 00:04:32,880
image classifier, which you'll learn about when we go on to convolutional neural nets.

60
00:04:32,890 --> 00:04:39,400
A classifier tends to classify the entire image: you feed an input image in and it tells

61
00:04:39,400 --> 00:04:43,090
you what's in that image, or what class the image belongs to,

62
00:04:43,510 --> 00:04:45,430
I should say.

63
00:04:46,510 --> 00:04:49,780
So an object detector, though, is what a Haar cascade really is.

64
00:04:49,780 --> 00:04:52,090
It should be called the Haar cascade object detector.

65
00:04:52,450 --> 00:04:55,750
But nevertheless, it's called a Haar cascade classifier, so we'll go with it.

66
00:04:56,260 --> 00:05:03,100
So we've created our face classifier object here, and that has a function called detectMultiScale.

67
00:05:03,610 --> 00:05:05,440
This is where we feed in the input image.

68
00:05:05,560 --> 00:05:10,520
That's the first parameter. We can also set scaleFactor as well as minNeighbors.

69
00:05:10,520 --> 00:05:15,490
So scaleFactor and minNeighbors are configuration parameters that adjust the sensitivity.

70
00:05:15,970 --> 00:05:21,220
So if you lower this, you can get more boxes on the face, and the same goes for scaleFactor.

71
00:05:21,220 --> 00:05:23,080
You can lower this or increase it.

72
00:05:23,710 --> 00:05:27,880
It depends on the type of image and the type or size of the faces in the image.

73
00:05:29,110 --> 00:05:33,370
With lower values you will probably get more sensitive detection.

74
00:05:34,480 --> 00:05:40,930
So that's just this line, and this line basically grabs the faces and extracts them into an array.

75
00:05:41,650 --> 00:05:44,650
So if faces is basically empty, that's what this check means.

76
00:05:44,650 --> 00:05:46,800
Here it's effectively none; you can use that as well.

77
00:05:46,810 --> 00:05:49,630
And then it will print "No faces found".

78
00:05:49,720 --> 00:05:56,080
So at least we know no faces have been found, and we won't wonder whether the code is working or not.

79
00:05:56,740 --> 00:06:03,610
If it does detect faces, we just iterate through them, and this here, x, y, w, and h (width and height),

80
00:06:04,180 --> 00:06:06,350
that's the bounding box dimensions.

81
00:06:07,100 --> 00:06:11,110
Well, not really the dimensions: this is the x and y starting point.

82
00:06:11,200 --> 00:06:13,300
And this is the width and height.

83
00:06:13,540 --> 00:06:17,560
So, you know, let's say x and y is this point here where I'm pointing.

84
00:06:18,340 --> 00:06:19,200
And the.

85
00:06:19,320 --> 00:06:25,690
The width is this direction and the height is this direction, down. So we use that to draw the rectangle, using

86
00:06:25,690 --> 00:06:30,740
the cv2.rectangle function to draw the bounding box over it, and we just draw it in a pink color.

87
00:06:30,780 --> 00:06:33,690
That's what this RGB, or BGR I should say,

88
00:06:34,110 --> 00:06:39,120
color value corresponds to. This is the thickness, and we just use the imshow function,

89
00:06:39,120 --> 00:06:42,900
like we've done many times before, to show the final image.

90
00:06:43,230 --> 00:06:46,230
So that's it working for Mr. Donald Trump.

91
00:06:47,310 --> 00:06:50,070
So now let's move on to face and eye detection.

92
00:06:50,730 --> 00:06:52,200
So what's different in this code?

93
00:06:52,620 --> 00:06:59,700
In this code, we're creating two objects here, a face classifier and an eye classifier. Next, what we do:

94
00:07:00,090 --> 00:07:02,340
We just run it and get the faces as well.

95
00:07:02,970 --> 00:07:06,630
If no faces are found, that should mean no eyes are found.

96
00:07:07,110 --> 00:07:11,640
However, if a face has been found, we do draw a box around it like we did previously.

97
00:07:12,180 --> 00:07:13,920
However, there are these new lines of code.

98
00:07:14,010 --> 00:07:14,940
Let's see what they do.

99
00:07:15,660 --> 00:07:16,680
What is this doing here?

100
00:07:16,710 --> 00:07:23,880
This is actually cropping the face region out of the grayscale image we created here when we loaded the image.

101
00:07:24,870 --> 00:07:30,780
It's cropping the face out and then it's actually doing it similarly for the color image as well.

102
00:07:31,170 --> 00:07:34,040
So we can test it either on the color or gray.

103
00:07:34,050 --> 00:07:38,010
But what we'll do is use the gray, and I'll show you why we crop the color one out afterward.

104
00:07:38,850 --> 00:07:41,040
So now we just feed the ROI

105
00:07:41,040 --> 00:07:41,660
gray.

106
00:07:41,670 --> 00:07:47,700
That's the region of the face only, into the eye classifier here, because we know it's the face.

107
00:07:47,820 --> 00:07:55,140
So what we do is just feed our face input, that extracted face, into the eye classifier detector.

108
00:07:56,250 --> 00:08:02,010
And then for the eyes, if we get any eyes, we just loop through the eyes and draw rectangles around

109
00:08:02,010 --> 00:08:02,170
them.
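
One detail worth spelling out about the cropping step above: because the eye classifier only sees the cropped face, the eye boxes it returns are relative to the top-left corner of that crop, not to the full image. Drawing onto the cropped color region takes care of the offset implicitly (a NumPy slice is a view onto the original image), which is presumably why the color image is cropped as well. The alternative is to add the face offset back explicitly, as in this small sketch; the function name and the example boxes are hypothetical.

```python
def eyes_to_image_coords(face_box, eye_boxes):
    """Shift eye boxes from face-crop coordinates to full-image coordinates."""
    fx, fy, _fw, _fh = face_box
    return [(fx + ex, fy + ey, ew, eh) for (ex, ey, ew, eh) in eye_boxes]


# Hypothetical detections: one face, and two eyes found inside its crop.
face = (100, 50, 80, 80)
eyes_in_crop = [(15, 20, 18, 18), (47, 20, 18, 18)]

eyes_in_image = eyes_to_image_coords(face, eyes_in_crop)
print(eyes_in_image)
```

Only x and y change; the width and height of each eye box stay the same.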

110
00:08:02,820 --> 00:08:09,210
So that's what this gives us, and you can see how badly it does: it thinks Donald Trump has two left eyes,

111
00:08:09,450 --> 00:08:12,480
or actually, that's his right eye, so two right eyes.

112
00:08:12,900 --> 00:08:17,520
And that's because the parameters were probably set too sensitive.

113
00:08:18,340 --> 00:08:23,910
That said, we use the default parameters here, so we can perhaps set something different.

114
00:08:23,970 --> 00:08:27,570
Let's try 1.5, and let's try 7.

115
00:08:28,530 --> 00:08:29,820
So let's see what that gives us.

116
00:08:32,380 --> 00:08:32,710
OK.

117
00:08:32,870 --> 00:08:36,220
And it actually eliminated the eyes, which is probably not what we wanted.

118
00:08:36,760 --> 00:08:38,950
So let's reduce the minimum neighbors first.

119
00:08:39,310 --> 00:08:41,210
That didn't help.

120
00:08:41,230 --> 00:08:42,520
Let's go back to 1.3.

121
00:08:45,690 --> 00:08:46,620
That didn't help again.

122
00:08:47,250 --> 00:08:52,860
So actually, I'm not sure if these are the default values, but we can just leave it and go back

123
00:08:52,860 --> 00:08:54,030
to the default here.

124
00:08:54,810 --> 00:08:56,960
So let's see, let's try something else.

125
00:08:56,970 --> 00:08:57,420
Let's try

126
00:08:57,600 --> 00:08:59,910
1.1 and 3.

127
00:09:02,700 --> 00:09:03,250
There we go.

128
00:09:03,270 --> 00:09:07,520
So we know that lower values mean it's more sensitive.

129
00:09:07,530 --> 00:09:08,790
So let's see if we can get it to...

130
00:09:08,820 --> 00:09:10,050
Get two eyes.

131
00:09:10,050 --> 00:09:10,590
And there we go.

132
00:09:10,620 --> 00:09:12,200
So these are the settings to use.

133
00:09:12,210 --> 00:09:15,540
Hopefully, you can get a feel for what these settings do.

134
00:09:16,200 --> 00:09:20,430
You can read the OpenCV documentation to get the explicit definition of what they do.

135
00:09:21,030 --> 00:09:25,530
I already told you that this is the scale factor and this is the minimum neighbors.

136
00:09:25,950 --> 00:09:30,960
The docs may have a deeper explanation of how to interpret those parameters for your images.

137
00:09:31,890 --> 00:09:35,940
In the meantime, though, this is the result of playing with those parameters.
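
For some intuition on the tuning above: scaleFactor controls how much the image is shrunk between successive detection passes, so a value close to 1 means many more scales get searched (slower, but more chances to match a face or eye of any size), while minNeighbors is how many overlapping candidate boxes are required before a detection is kept. The sketch below is a simplified model, not OpenCV's exact rounding; the function name and the 240-pixel image and 24-pixel window sizes are assumptions for illustration.

```python
def pyramid_levels(image_side, window_side, scale_factor):
    """Count how many image scales fit before the image is smaller than the window."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    levels = 0
    side = float(image_side)
    while side >= window_side:
        levels += 1
        side /= scale_factor  # shrink the image for the next pass
    return levels


# Hypothetical sizes: a 240-pixel image searched with a 24-pixel window.
fine_levels = pyramid_levels(240, 24, 1.1)
coarse_levels = pyramid_levels(240, 24, 1.5)
print(fine_levels, coarse_levels)
```

Under this simplified model, a factor of 1.1 searches several times as many scales as 1.5, which fits what happened in the lesson: the larger factor skipped over the scale at which the eyes would have matched.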

138
00:09:36,840 --> 00:09:39,930
So now let's take a look at Google Colab.

139
00:09:40,560 --> 00:09:42,360
This is the code here for using Google

140
00:09:42,360 --> 00:09:44,430
Colab to take snapshots from a camera.

141
00:09:45,030 --> 00:09:49,890
Now, I'm not going to go through this code because, actually, I probably haven't inspected

142
00:09:49,890 --> 00:09:51,030
the code properly before.

143
00:09:52,260 --> 00:09:58,670
But basically, it's a boilerplate function that Google provided for Colab.

144
00:09:59,070 --> 00:10:01,170
You know what Colab is;

145
00:10:01,170 --> 00:10:05,040
That's what we're using right now, to use Colab to get snapshots from a webcam.

146
00:10:05,550 --> 00:10:11,250
So let's run this code, and you'll see this function, take_photo.

147
00:10:12,120 --> 00:10:14,520
And now let's use the take_photo function right here.

148
00:10:15,030 --> 00:10:20,130
So I'm going to run this block of code, and it's going to ask you for permission to access your webcam.

149
00:10:21,000 --> 00:10:21,660
So what do you do?

150
00:10:21,690 --> 00:10:28,470
You hit Allow, which gives the browser permission to use the webcam, and it should pop up here.

151
00:10:28,500 --> 00:10:28,840
Yes.

152
00:10:28,860 --> 00:10:29,400
Here it is.

153
00:10:29,970 --> 00:10:31,890
So let's capture a picture.

154
00:10:32,820 --> 00:10:34,050
You stay still.

155
00:10:36,240 --> 00:10:39,180
OK, not the best picture, but we got a picture of myself.

156
00:10:39,900 --> 00:10:42,180
So now let's reuse the code.

157
00:10:42,180 --> 00:10:47,260
We'll use the code from above on my photo, which has been saved as photo.jpg right here.

158
00:10:47,280 --> 00:10:47,790
You can see.

159
00:10:48,600 --> 00:10:49,260
Let's run the

160
00:10:49,510 --> 00:10:56,520
Haar cascade classifier on that image, and you can see it thinks my pimple is an eye, which is unfortunate.

161
00:10:57,780 --> 00:11:02,040
But I showed you how to adjust the sensitivity of this here.

162
00:11:02,790 --> 00:11:05,360
So that concludes this lesson.

163
00:11:05,370 --> 00:11:07,410
I've also put in some additional code here.

164
00:11:07,410 --> 00:11:12,480
So if you wanted to use your local webcam to do it in a live video, you can run this code here.

165
00:11:12,510 --> 00:11:15,240
This code actually opens the webcam here.

166
00:11:15,810 --> 00:11:22,260
And `while True`, which means that while the webcam is open and receiving images, it loads the images

167
00:11:22,260 --> 00:11:25,620
here and runs this face detector function that we created here.

168
00:11:26,280 --> 00:11:30,660
This face detector function is basically the same code as right here.

169
00:11:31,950 --> 00:11:39,630
However, we've now put it into a function, and that function basically just plots the image on your

170
00:11:39,630 --> 00:11:40,530
live webcam.

171
00:11:41,250 --> 00:11:48,120
So that's for if you have OpenCV installed locally on your machine, probably using Conda or one of the other packages.

172
00:11:48,120 --> 00:11:54,300
There are so many different ways to install OpenCV on many different systems, which is

173
00:11:54,300 --> 00:12:01,110
why I avoided that messy install setup in this course, because Colab is actually just super nice and

174
00:12:01,110 --> 00:12:01,700
easy to use.

175
00:12:01,710 --> 00:12:08,070
I use Colab, and many of my computer vision and data science colleagues use Colab or Jupyter notebooks,

176
00:12:08,070 --> 00:12:10,860
which are connected to a virtual machine, so it's roughly the same thing.

177
00:12:11,550 --> 00:12:13,260
That's instead of using local machines.

178
00:12:13,500 --> 00:12:15,450
So that's it for this lesson.

179
00:12:15,480 --> 00:12:22,200
Hope you enjoyed it, and we're going to move on now to vehicle and pedestrian detection, also using

180
00:12:22,200 --> 00:12:23,490
Haar cascade classifiers.

181
00:12:24,060 --> 00:12:25,290
So welcome.

182
00:12:25,650 --> 00:12:26,860
I'll see you in the next lesson.

183
00:12:26,910 --> 00:12:27,260
Thank you.