1
00:00:00,330 --> 00:00:06,540
And welcome to Lesson 17, where we take a look at using vehicle and pedestrian detectors, which

2
00:00:06,540 --> 00:00:10,440
are actually also Haar cascade classifiers, which I mentioned in the previous lesson.

3
00:00:11,340 --> 00:00:16,740
So go ahead and open your vehicle and pedestrian IPython notebook.

4
00:00:17,430 --> 00:00:19,440
And first, let's run

5
00:00:19,440 --> 00:00:25,050
this code to load all our classifiers and images, and to load our libraries and functions.

6
00:00:25,710 --> 00:00:27,450
So let's get that out of the way.

7
00:00:27,460 --> 00:00:32,190
And while that loads, I'm just going to tell you what we're going to do differently in this lesson

8
00:00:32,520 --> 00:00:39,300
is that we're going to run a pre-trained Haar cascade classifier to detect pedestrians and then actually

9
00:00:39,300 --> 00:00:42,930
use it on a video, so you can actually see it run on a video within Colab.

10
00:00:43,440 --> 00:00:49,470
So this lesson doubles as a tutorial on how to load videos and show videos in

11
00:00:49,740 --> 00:00:49,950
Colab.

12
00:00:50,520 --> 00:00:53,130
So let's scroll down and we'll see how that works.

13
00:00:53,970 --> 00:01:01,560
So firstly, before I move on to the videos, I want to illustrate how

14
00:01:01,560 --> 00:01:06,990
videos are processed in computer vision. Videos aren't processed differently to images.

15
00:01:07,260 --> 00:01:10,650
That's because a video is treated as a sequence of images.

16
00:01:11,190 --> 00:01:13,380
So remember, videos have frame rates.

17
00:01:13,800 --> 00:01:17,760
A frame rate is essentially how many images are captured per second.

18
00:01:18,240 --> 00:01:22,240
So standard videos are 24 to 30 frames per second.

19
00:01:22,260 --> 00:01:28,410
Most webcams and digital cameras do 30 frames per second, but movie cameras tend to do 24.

20
00:01:28,410 --> 00:01:31,320
But most videos

21
00:01:31,320 --> 00:01:34,110
you would have come across online are at 30 frames per second.
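
To make the frame-rate idea concrete, here is a minimal sketch of the arithmetic, assuming a constant frame rate. The helper name frames_in_clip is made up for illustration and is not part of the notebook.

```python
def frames_in_clip(duration_seconds: float, fps: float) -> int:
    """Number of frames a clip contains at a constant frame rate."""
    return round(duration_seconds * fps)


# A 15-second clip, at the two frame rates mentioned in the lesson.
print(frames_in_clip(15, 30))  # 450
print(frames_in_clip(15, 24))  # 360
```

So any per-frame processing, such as running a detector, happens hundreds of times even for a short clip.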

22
00:01:35,010 --> 00:01:42,530
So how this works: you use the cv2.VideoCapture function and point it to your video.

23
00:01:42,540 --> 00:01:47,250
This video was downloaded above, in case you're wondering where it came from, from my Google Drive

24
00:01:47,700 --> 00:01:48,390
using gdown.

25
00:01:49,110 --> 00:01:55,140
So you can see the file here, an MP4 file. MP4 is a

26
00:01:55,140 --> 00:02:01,380
video format, along with AVI, MOV and a bunch of others.

27
00:02:01,380 --> 00:02:03,240
We won't get into video formats here.

28
00:02:03,780 --> 00:02:10,830
What I will tell you is that OpenCV directly opens MP4 and AVI files, and it can load images directly

29
00:02:10,830 --> 00:02:12,570
from that video as a sequence.

30
00:02:13,110 --> 00:02:14,820
So what do we do here?

31
00:02:15,600 --> 00:02:17,320
We use something called cap.read.

32
00:02:17,370 --> 00:02:22,680
Remember, we created this object called cap that's pointing to a video capture.

33
00:02:22,680 --> 00:02:29,370
read allows us to read the first frame of a video, and ret basically says, True or False, whether

34
00:02:29,370 --> 00:02:30,750
it was successfully read.

35
00:02:31,590 --> 00:02:37,880
So if ret is True, we can run this Haar cascade classifier, which I've typed out

36
00:02:37,890 --> 00:02:45,480
here for you just so you can see it. Afterwards, we convert to grayscale, run that through our

37
00:02:45,720 --> 00:02:51,150
body detector, or people detector, whatever you want to call it, and then draw bounding boxes on

38
00:02:51,150 --> 00:02:51,440
it.

39
00:02:51,450 --> 00:02:57,510
And then we have cap.release, which basically tells the program that we're no

40
00:02:57,510 --> 00:02:58,500
longer looking at that video.

41
00:02:58,510 --> 00:02:59,910
So we release the video.

42
00:03:00,270 --> 00:03:07,590
So we basically stop all of the processing that allowed us to open that video, and then we show

43
00:03:07,590 --> 00:03:09,960
the frame after we draw the bounding boxes on it.
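
Here is a rough sketch of those steps (read one frame, release, convert to grayscale, detect, draw boxes). It is not the notebook's exact code: the function names, the paths passed in, and the scaleFactor and minNeighbors values are all assumptions for illustration.

```python
def read_first_frame(cap):
    """Return the first frame from a cv2.VideoCapture-like object, or None.

    cap.read() returns (ret, frame); ret is False when no frame could be read.
    """
    ret, frame = cap.read()
    cap.release()  # we only wanted one frame, so let go of the video
    return frame if ret else None


def box_pedestrians_in_first_frame(video_path, cascade_path):
    """Detect bodies in the first frame and draw yellow boxes around them.

    Both paths are placeholders supplied by the caller (assumption).
    """
    import cv2  # imported here so read_first_frame works without OpenCV

    frame = read_first_frame(cv2.VideoCapture(video_path))
    if frame is None:
        return None

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # the cascade runs on grayscale
    detector = cv2.CascadeClassifier(cascade_path)
    # scaleFactor and minNeighbors are illustrative values, not the notebook's
    bodies = detector.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=3)
    for (x, y, w, h) in bodies:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 255), 2)  # BGR yellow
    return frame
```

The returned frame is an ordinary image array, so it can be shown with whatever image display helper the notebook already uses.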

44
00:03:10,500 --> 00:03:17,520
So let's run this block of code, and you can see it got two of the pedestrians here with these yellow

45
00:03:17,520 --> 00:03:18,900
boxes, but missed this one.

46
00:03:19,500 --> 00:03:25,830
But this is pretty good for such an old technology. Haar cascade classifiers are

47
00:03:25,830 --> 00:03:26,970
quite outdated right now.

48
00:03:27,390 --> 00:03:32,280
YOLO and all the others we'll do later in this course are much more advanced and state of

49
00:03:32,280 --> 00:03:34,290
the art because they take advantage of deep learning.

50
00:03:34,830 --> 00:03:36,180
This still does work pretty well.

51
00:03:36,780 --> 00:03:43,140
Haar cascade classifiers can actually work fairly well for singular objects, that is, one class of

52
00:03:43,140 --> 00:03:47,340
object that doesn't change, that doesn't have too much variation in it.

53
00:03:48,840 --> 00:03:52,560
People are a deformable class, which means that they can be in different positions.

54
00:03:53,130 --> 00:03:58,380
And it does work fairly well on this screenshot here, but it misses this person here.

55
00:03:59,160 --> 00:04:02,880
So now, let's run this on a 15-second clip.

56
00:04:03,540 --> 00:04:07,010
So to do that, we're going to use something called cv2.VideoWriter.

57
00:04:07,530 --> 00:04:12,630
What that does is... actually, let me run this first, because this block of code takes about a minute to run.

58
00:04:13,440 --> 00:04:16,950
So we run this, and while

59
00:04:16,950 --> 00:04:18,000
we're waiting for the output,

60
00:04:18,630 --> 00:04:19,800
I'll explain this code to you.

61
00:04:19,950 --> 00:04:28,170
So we use the same cv2.VideoCapture function to load a video, and what we do

62
00:04:28,170 --> 00:04:33,330
here is use cap.get to get the width and height of the video frames.

63
00:04:33,840 --> 00:04:37,950
That's so that we can actually create our video writer with that information.

64
00:04:37,950 --> 00:04:38,720
You can see it here.

65
00:04:38,740 --> 00:04:39,540
The W and H.

66
00:04:40,080 --> 00:04:42,710
What does cv2.VideoWriter do?

67
00:04:42,720 --> 00:04:47,190
It allows us to write a sequence of images to a video, which is pretty cool.

68
00:04:47,200 --> 00:04:53,250
So if you have some process that's creating a whole sequence of images, you can actually use

69
00:04:53,250 --> 00:04:57,450
cv2.VideoWriter to create that video.

70
00:04:58,020 --> 00:04:59,820
So what we do is specify the output file

71
00:04:59,950 --> 00:05:06,880
over here in the first parameter. In the second parameter, we specify what format we're going to use.

72
00:05:07,360 --> 00:05:08,110
It's a bit odd.

73
00:05:08,170 --> 00:05:14,110
I'm not even sure why it's coded like this, but we specify that it's going to be MJPG.

74
00:05:14,500 --> 00:05:15,580
OK, so that's

75
00:05:16,270 --> 00:05:20,050
the video writer encoder that we're using.

76
00:05:21,010 --> 00:05:22,450
30 is the frames per second.

77
00:05:22,450 --> 00:05:24,400
So that's how you do the timing of the video.

78
00:05:25,210 --> 00:05:32,140
And that's it for now. We'll use the out object later on in this thing, in

79
00:05:32,140 --> 00:05:36,100
this loop, I should say. I shouldn't call it a thing, even though it is a thing, isn't it?

80
00:05:36,820 --> 00:05:38,230
Well, moving on.

81
00:05:38,440 --> 00:05:43,480
So we have the body detector, which we have seen before in the previous lessons, where we just

82
00:05:43,480 --> 00:05:49,570
create it using cv2.CascadeClassifier and point it to the body detector, which is what we would

83
00:05:49,570 --> 00:05:51,160
have done up here as well.

84
00:05:52,000 --> 00:05:52,870
in case you missed it.

85
00:05:53,680 --> 00:05:59,080
And then we just draw the bounding boxes here, and now you can see we're doing something called

86
00:05:59,150 --> 00:06:00,270
out.write.

87
00:06:00,280 --> 00:06:03,370
out.write allows us to write each frame while the loop runs.

88
00:06:04,120 --> 00:06:06,250
Each frame is written with out.write.

89
00:06:06,580 --> 00:06:08,620
And then we do cap.release and

90
00:06:08,620 --> 00:06:10,150
out.release to save the video.
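
The loop just described can be sketched roughly like this. It is an illustration rather than the notebook's exact code: the function names, the paths, the default fps of 30, and the detector parameters are assumptions. The OpenCV calls themselves (cap.get, cv2.VideoWriter_fourcc, cv2.VideoWriter, out.write, release) are the ones mentioned above.

```python
def write_annotated_frames(cap, out, annotate):
    """Read every frame from cap, pass it through annotate, write it to out.

    cap behaves like cv2.VideoCapture and out like cv2.VideoWriter.
    Returns the number of frames written.
    """
    frames_written = 0
    while True:
        ret, frame = cap.read()
        if not ret:  # no more frames (or the read failed)
            break
        out.write(annotate(frame))
        frames_written += 1
    cap.release()
    out.release()  # releasing the writer finalises the output file
    return frames_written


def detect_bodies_in_video(video_path, cascade_path, out_path, fps=30):
    """Wire the loop above to OpenCV. All paths and fps are assumptions."""
    import cv2  # imported here so the loop above can be tested without OpenCV

    cap = cv2.VideoCapture(video_path)
    # The writer needs the frame size up front, so ask the capture for it.
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*"MJPG")  # Motion JPEG, pairs with .avi
    out = cv2.VideoWriter(out_path, fourcc, fps, (w, h))
    detector = cv2.CascadeClassifier(cascade_path)

    def annotate(frame):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # 1.2 and 3 are illustrative scaleFactor / minNeighbors values
        for (x, y, bw, bh) in detector.detectMultiScale(gray, 1.2, 3):
            cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 255), 2)
        return frame

    return write_annotated_frames(cap, out, annotate)
```

Keeping the read/write loop separate from the detector means the same loop can be reused for the car model by passing a different cascade path.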

91
00:06:10,780 --> 00:06:13,300
So while we do this, actually it finished.

92
00:06:13,300 --> 00:06:18,640
It took 42 seconds, 47 seconds, and you can see the walking AVI file was created.

93
00:06:19,120 --> 00:06:22,570
So let's take a look and see how we play this file within Colab.

94
00:06:23,650 --> 00:06:26,200
So Colab is a bit weird.

95
00:06:26,710 --> 00:06:28,600
It doesn't play AVI files.

96
00:06:28,600 --> 00:06:31,150
It plays only MP4 files.

97
00:06:31,440 --> 00:06:37,480
It can't display AVI properly, so we have to use something called FFmpeg, which is a very cool

98
00:06:38,020 --> 00:06:42,910
terminal tool for video conversion between file formats, and for editing and splicing.

99
00:06:43,270 --> 00:06:44,220
I use it quite a bit.

100
00:06:44,230 --> 00:06:45,520
It's very efficient.

101
00:06:46,510 --> 00:06:53,350
So what we do is run this, and it basically converts this file to an MP4 file.

102
00:06:54,010 --> 00:06:55,900
So let's go ahead and run this.

103
00:06:58,090 --> 00:07:06,190
We use the -y argument here because sometimes these terminal programs prompt you

104
00:07:06,610 --> 00:07:07,510
to type yes.

105
00:07:08,200 --> 00:07:09,610
So this automatically types

106
00:07:09,790 --> 00:07:15,550
yes, so it can proceed with the execution, and it takes seven seconds to run.

107
00:07:15,550 --> 00:07:16,300
So that's pretty quick.
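
As a rough sketch, the same conversion can be driven from Python. The helper names and file names are hypothetical, and the notebook may pass extra FFmpeg options that aren't shown here; only -y (answer yes to the overwrite prompt) and -i (input file) are used.

```python
import subprocess


def ffmpeg_convert_command(src_path, dst_path):
    """Build an ffmpeg command that converts src_path into dst_path.

    -y answers "yes" to the overwrite prompt; -i names the input file.
    ffmpeg picks the output container from the dst_path extension.
    """
    return ["ffmpeg", "-y", "-i", src_path, dst_path]


def convert_video(src_path, dst_path):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_convert_command(src_path, dst_path), check=True)
```

For example, convert_video("output.avi", "output.mp4") would produce an MP4 next to the AVI, assuming ffmpeg is installed and on the PATH.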

108
00:07:17,110 --> 00:07:23,290
And then what we do now: this is how IPython, well, this Colab notebook, plays videos.

109
00:07:23,290 --> 00:07:29,110
So we just import these functions here, including HTML, so we can get the HTML controls for the video,

110
00:07:29,890 --> 00:07:35,450
and then we load our video here by calling open, pointing it to the file, and using read

111
00:07:35,470 --> 00:07:36,100
to read it.

112
00:07:36,910 --> 00:07:39,360
You don't actually have to understand all of this code.

113
00:07:39,370 --> 00:07:41,860
A lot of this code is boilerplate code.

114
00:07:42,280 --> 00:07:46,240
You just have to know how to use it when you want to develop your computer vision applications.

115
00:07:47,230 --> 00:07:48,160
So we set the data

116
00:07:48,160 --> 00:07:49,650
URL, it's just here.

117
00:07:49,660 --> 00:07:50,590
This is

118
00:07:50,590 --> 00:07:54,670
what points to it here, right?

119
00:07:54,700 --> 00:07:55,480
You can actually see it here.

120
00:07:56,110 --> 00:07:58,300
So this brings up the controls for the video.

121
00:07:58,300 --> 00:07:59,470
So let's run this now.
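
A minimal sketch of that boilerplate, assuming the video has already been converted to MP4. The function names and the default width are made up for illustration; the pieces from the lesson are open and read on the file, a base64 data URL, and IPython's HTML display object.

```python
from base64 import b64encode


def video_data_url(mp4_bytes):
    """Encode raw MP4 bytes as a data URL the browser can play directly."""
    return "data:video/mp4;base64," + b64encode(mp4_bytes).decode("ascii")


def video_player_html(data_url, width=600):
    """HTML for a video element; the controls attribute adds play/pause."""
    return (
        f'<video width="{width}" controls>'
        f'<source src="{data_url}" type="video/mp4">'
        "</video>"
    )


def show_video(mp4_path, width=600):
    """Return an IPython HTML object that renders the player in a notebook."""
    from IPython.display import HTML  # available inside Colab and Jupyter

    with open(mp4_path, "rb") as f:
        data_url = video_data_url(f.read())
    return HTML(video_player_html(data_url, width))
```

In a notebook cell, show_video("output.mp4") as the last expression would render the player. Because the whole file is embedded in the page, this approach suits short clips best.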

122
00:08:00,430 --> 00:08:04,810
And now that loads the controls, or the video player essentially, our HTML video player,

123
00:08:07,510 --> 00:08:08,860
which should come up shortly.

124
00:08:09,370 --> 00:08:09,940
There it is.

125
00:08:10,660 --> 00:08:16,450
So if you press play now, it took a little while to come up, but you can see it plays a video where we

126
00:08:16,450 --> 00:08:17,200
get the pedestrians.

127
00:08:17,200 --> 00:08:18,040
So this is pretty cool.

128
00:08:18,700 --> 00:08:22,510
So now you can do a lot of testing within Colab, testing on videos here.

129
00:08:22,900 --> 00:08:27,280
So you don't always have to go through the messy setup of these things on your local system.

130
00:08:27,700 --> 00:08:31,420
You can use the free processing power offered by Google.

131
00:08:32,860 --> 00:08:35,530
So now let's move on to vehicle detection.

132
00:08:35,740 --> 00:08:40,630
So I'm not going to go through all of the code again, line by line like I did, because it's the same

133
00:08:40,630 --> 00:08:41,020
standard code.

134
00:08:41,020 --> 00:08:46,540
All we do differently is load the cars video and then point to the Haar cascade car model.

135
00:08:47,500 --> 00:08:53,080
And we load this, and you can see we just get the one car right here.

136
00:08:53,110 --> 00:08:57,070
This is just one of the cars identified, out of a bunch of them.

137
00:08:57,610 --> 00:08:59,830
So the vehicle detector isn't too great.

138
00:09:00,430 --> 00:09:04,360
So let's also test this on the 15-second clip of the cars.

139
00:09:07,060 --> 00:09:09,770
I actually believe this wasn't 15 seconds.

140
00:09:09,790 --> 00:09:15,520
Actually, maybe this isn't 15 seconds and I just pasted the title in, but we'll soon

141
00:09:15,520 --> 00:09:16,030
find out.

142
00:09:17,380 --> 00:09:18,580
Now let's convert the video.

143
00:09:20,980 --> 00:09:28,600
Let's load it up and create the video HTML, you know, bring up the HTML controls.

144
00:09:31,380 --> 00:09:31,890
There we go.

145
00:09:32,610 --> 00:09:34,980
So it's loaded a 12-second video.

146
00:09:35,610 --> 00:09:38,220
OK, close enough. So you can see it.

147
00:09:38,470 --> 00:09:39,810
It's not too bad, actually.

148
00:09:40,080 --> 00:09:46,680
It gets the cars in most cases, not all the time, but it generally works decently well.

149
00:09:47,430 --> 00:09:55,290
So that concludes this lesson on using Haar cascade classifiers for pedestrian detection, as well

150
00:09:55,290 --> 00:09:59,070
as using them on videos within Colab, which I think is pretty cool.

151
00:09:59,850 --> 00:10:05,940
So we'll stop now and then we'll move on to the next lesson where we take a look at perspective transforms,

152
00:10:05,940 --> 00:10:08,460
which is a slightly more advanced concept.

153
00:10:08,610 --> 00:10:11,790
It's a bit more advanced than this concept in OpenCV.

154
00:10:12,540 --> 00:10:14,760
So I'll see you in the next lesson.

155
00:10:14,880 --> 00:10:15,300
Thank you.