1
00:00:00,060 --> 00:00:07,020
Hi and welcome to Lesson 24, where we take a look at motion tracking using the mean

2
00:00:07,020 --> 00:00:08,820
shift and CAMShift algorithms.

3
00:00:09,390 --> 00:00:12,210
So go ahead and open that notebook.

4
00:00:12,720 --> 00:00:14,020
So let's look at our libraries here.

5
00:00:14,040 --> 00:00:14,400
You can run this cell.

6
00:00:14,400 --> 00:00:18,060
I've already run it, so I don't want to overwrite the images and have to wait.

7
00:00:18,630 --> 00:00:21,480
So let me explain to you what tracking is.

8
00:00:21,870 --> 00:00:29,580
So imagine you have a moving person or a moving vehicle in, say, a CCTV video, and you want to focus

9
00:00:29,580 --> 00:00:36,960
in on that person. Basically, you want to draw a box and move that box along with the person as he or she,

10
00:00:36,960 --> 00:00:39,210
or the car, moves in the video.

11
00:00:39,840 --> 00:00:41,160
That's what tracking is.

12
00:00:41,670 --> 00:00:47,670
So let's take a look at two very old algorithms, because mean shift and CAMShift have been around for

13
00:00:47,670 --> 00:00:51,570
at least 10 years, and we'll see how they work.

14
00:00:52,170 --> 00:00:55,080
So let me explain to you what mean shift is.

15
00:00:55,590 --> 00:01:03,000
So the principle behind mean shift is this: imagine we establish a window, which is over here.

16
00:01:03,000 --> 00:01:07,240
Let's wait for it to loop back; we establish a window up here.

17
00:01:07,860 --> 00:01:08,670
And what you do is

18
00:01:08,710 --> 00:01:14,460
move this window iteratively toward the most intense parts of the frame.

19
00:01:15,030 --> 00:01:16,050
But how is that done?

20
00:01:16,680 --> 00:01:24,990
Well, we consider the histogram of, let's say, the intensities,

21
00:01:24,990 --> 00:01:29,400
the color intensities, in the initial bounding box that we established.

22
00:01:30,120 --> 00:01:35,880
And then you just set some criteria, move around, and look for the next brightest spot in

23
00:01:35,880 --> 00:01:36,360
that image.

24
00:01:36,840 --> 00:01:38,340
And you just move it iteratively.

25
00:01:38,340 --> 00:01:44,220
So you're moving toward the densest area of intensity in this case, and you can use anything you want.

26
00:01:44,220 --> 00:01:51,630
You can use the intensity of red, green, or blue, or, say, saturation in the HSV color space.
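
As a rough illustration of the window-moving idea just described, here is a minimal NumPy sketch of a single mean shift search over a 2D weight map (for example, a back-projected hue image). It is not OpenCV's implementation: the function name `mean_shift`, the `(x, y, w, h)` window layout, and the defaults `max_iter=10` and `eps=1.0` are assumptions chosen for illustration. Each iteration recenters the window on the weighted centroid of the pixels inside it.

```python
import numpy as np

def mean_shift(weights, window, max_iter=10, eps=1.0):
    """Hypothetical helper (not cv2.meanShift): move window (x, y, w, h) toward
    the weighted centroid of `weights` until the shift is < eps or max_iter runs."""
    x, y, w, h = window
    H, W = weights.shape
    for _ in range(max_iter):
        patch = weights[y:y + h, x:x + w]
        total = patch.sum()
        if total == 0:
            break  # nothing inside the window to follow
        rows, cols = np.indices(patch.shape)
        # Weighted centroid, relative to the window's top-left corner
        cy = (rows * patch).sum() / total
        cx = (cols * patch).sum() / total
        # Shift that puts the window center on the centroid
        dx = int(round(cx - (w - 1) / 2))
        dy = int(round(cy - (h - 1) / 2))
        # Clamp so the window stays inside the image
        new_x = min(max(x + dx, 0), W - w)
        new_y = min(max(y + dy, 0), H - h)
        moved = np.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < eps:
            break
    return x, y, w, h

weights = np.zeros((100, 100))
weights[60:70, 70:80] = 1.0  # a bright 10x10 blob
print(mean_shift(weights, (55, 50, 20, 20)))  # prints (65, 55, 20, 20)
```

Starting from a window that only partly overlaps the blob, the window settles with its center on the blob's center after three iterations.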

27
00:01:52,170 --> 00:01:53,820
So that's effectively what it is.

28
00:01:53,820 --> 00:01:57,450
And you can read more about this algorithm here if you click the link.

29
00:01:58,080 --> 00:02:01,800
But now let's take a look at how the actual algorithm is implemented.

30
00:02:02,910 --> 00:02:10,020
So since it's a video, we do all of our video writing back here so we can display it in Colab.

31
00:02:11,310 --> 00:02:12,930
This is our initial location.

32
00:02:13,830 --> 00:02:19,410
So we hardcode the initial location of the bounding box that we want to use and call that our track window.

33
00:02:20,280 --> 00:02:23,000
And then what we do is set up the ROI (region of interest) for tracking.

34
00:02:23,010 --> 00:02:23,820
So we have that.

35
00:02:23,820 --> 00:02:25,500
We have the track window's parameters here.

36
00:02:25,950 --> 00:02:27,720
So we extract the initial frame.

37
00:02:28,200 --> 00:02:29,670
We convert it to HSV.

38
00:02:29,910 --> 00:02:35,200
We apply a mask to filter for the most intense areas of that image.

39
00:02:35,200 --> 00:02:40,370
You can see it's zero to 180 in H, which is hue.

40
00:02:40,980 --> 00:02:45,330
And then you look at the saturation and value as well, so we're looking at the brightest areas

41
00:02:45,330 --> 00:02:45,990
in the image.
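
A minimal NumPy stand-in for the masking step, assuming the notebook does something like `cv2.inRange(hsv_roi, lower, upper)`. The bounds used below (hue 0 to 180, saturation at least 60, value at least 32) are assumed values in the spirit of OpenCV's mean shift tutorial, not necessarily the notebook's, and `in_range` is a hypothetical name.

```python
import numpy as np

def in_range(hsv, lower, upper):
    """Hypothetical NumPy stand-in for cv2.inRange: 255 where every channel
    lies in [lower, upper] (inclusive), 0 elsewhere."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    inside = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return np.where(inside, 255, 0).astype(np.uint8)

# Four toy HSV pixels: bright, too unsaturated, too dark, exactly on the bounds
hsv = np.array([[[10, 200, 200], [10, 20, 200]],
                [[170, 255, 10], [90, 60, 32]]], dtype=np.uint8)
mask = in_range(hsv, (0, 60, 32), (180, 255, 255))
```

Low-saturation and very dark pixels are masked out, so their unreliable hue values do not pollute the histogram computed next.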

42
00:02:46,680 --> 00:02:53,010
And then we calculate the histogram of the HSV image with the mask; you've seen all of this before.

43
00:02:53,790 --> 00:02:56,640
And then we normalize it. Normalizing it

44
00:02:56,640 --> 00:03:00,930
just makes sure that, from frame to frame, it's consistently in the same range,

45
00:03:01,140 --> 00:03:02,820
so we don't have any big jumps.
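
To make the histogram and normalization steps concrete, here is a NumPy sketch under two assumptions: 180 hue bins over [0, 180) (OpenCV's 8-bit hue range) and min-max scaling to [0, 255], which is what `cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)` does. In OpenCV the raw histogram would come from `cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])`; the name `hue_histogram` is hypothetical.

```python
import numpy as np

def hue_histogram(hsv_roi, mask, bins=180):
    """Hypothetical helper: hue histogram over pixels where mask > 0,
    min-max scaled so the smallest bin is 0 and the largest is 255."""
    hue = hsv_roi[..., 0][mask > 0]
    hist, _ = np.histogram(hue, bins=bins, range=(0, 180))
    hist = hist.astype(np.float32)
    lo, hi = hist.min(), hist.max()
    if hi > lo:  # skip scaling when every bin is equal (e.g. empty mask)
        hist = (hist - lo) / (hi - lo) * 255.0
    return hist

roi = np.zeros((2, 2, 3), dtype=np.uint8)
roi[..., 0] = [[10, 10], [20, 99]]                       # toy hue values
roi_mask = np.array([[255, 255], [255, 0]], dtype=np.uint8)  # hue 99 is masked out
roi_hist = hue_histogram(roi, roi_mask)
```

Here hue 10 appears twice and hue 20 once among the unmasked pixels, so after scaling bin 10 holds 255 and bin 20 holds 127.5.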

46
00:03:03,780 --> 00:03:06,600
And then we set up the termination criteria for the algorithm.

47
00:03:07,050 --> 00:03:11,790
So either 10 iterations, or the window moves by less than one point.

48
00:03:11,820 --> 00:03:12,780
So what does that mean?

49
00:03:21,970 --> 00:03:25,300
That's our termination criteria for stopping the search at that point.

50
00:03:25,900 --> 00:03:31,780
So, you know, either we stop after ten iterations, or when the movement is less than one

51
00:03:31,780 --> 00:03:34,330
point, and then we stop iterating for that frame.
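
In OpenCV the criteria are usually passed as a tuple such as `(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)`. The pure-Python sketch below, with hypothetical helper names, spells out how the two conditions combine: whichever is met first ends the search for that frame.

```python
def should_stop(iteration, shift, max_iter=10, eps=1.0):
    """Hypothetical helper: True once max_iter iterations have run (COUNT)
    or the last window shift was smaller than eps (EPS)."""
    return iteration >= max_iter or shift < eps

def iterations_run(shifts, max_iter=10, eps=1.0):
    """Given the shift produced by each successive iteration, return how many
    iterations run before the combined criteria end the search (or all of
    them, if neither criterion is met)."""
    for i, shift in enumerate(shifts, start=1):
        if should_stop(i, shift, max_iter, eps):
            return i
    return len(shifts)

print(iterations_run([5, 3, 0.4, 2]))  # prints 3: the shift drops below 1
print(iterations_run([5] * 20))        # prints 10: the iteration cap is hit
```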

52
00:03:35,350 --> 00:03:36,460
So then we run the loop.

53
00:03:36,610 --> 00:03:38,760
We apply mean shift right here.

54
00:03:39,280 --> 00:03:40,930
dst is our image here.

55
00:03:41,770 --> 00:03:47,950
I should add that after the frame is converted to HSV, we use calcBackProject, which

56
00:03:47,950 --> 00:03:50,470
computes the back projection of the histogram.

57
00:03:50,860 --> 00:03:52,210
So we used that here.

58
00:03:52,210 --> 00:03:53,670
And then this is dst.

59
00:03:53,680 --> 00:04:00,130
That output is now fed into the mean shift tracking function along with the track window

60
00:04:00,130 --> 00:04:03,970
and the termination criteria we established before, outside of the loop.
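
Back projection can be sketched in NumPy for the single-channel hue case: each pixel is replaced by the normalized histogram value of its hue bin, so pixels whose color was common in the original ROI become bright. This assumes 180 bins covering hue 0 to 179, so a hue value indexes its bin directly. It mirrors the idea of `cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)`; the name `back_project` is hypothetical.

```python
import numpy as np

def back_project(hsv, hue_hist):
    """Hypothetical helper: map each pixel to the histogram value of its hue bin.
    Assumes one bin per hue value (len(hue_hist) == 180 for OpenCV's 0-179 hue)."""
    hist = np.asarray(hue_hist)
    # The hue value is used directly as the bin index
    bins = np.clip(hsv[..., 0].astype(np.intp), 0, len(hist) - 1)
    return hist[bins]

hist = np.zeros(180)
hist[10], hist[20] = 255.0, 127.5  # toy ROI histogram: hue 10 common, hue 20 less so
hsv = np.zeros((2, 2, 3), dtype=np.uint8)
hsv[..., 0] = [[10, 20], [30, 10]]
prob = back_project(hsv, hist)
```

The resulting map is what the mean shift step climbs: the window drifts toward where these values are densest.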

61
00:04:04,690 --> 00:04:10,690
And then we draw the tracking window, a white window in this case, and we just write it out

62
00:04:10,690 --> 00:04:12,160
to a video so we can see it.

63
00:04:12,550 --> 00:04:14,080
So run this.

64
00:04:14,080 --> 00:04:16,300
It takes about three seconds to run, pretty quick.

65
00:04:17,860 --> 00:04:18,700
The video's pretty short.

66
00:04:18,700 --> 00:04:19,700
It's about 15 seconds.

67
00:04:19,700 --> 00:04:25,090
So you can tell this is a very efficient algorithm, given that it ran way faster than the actual

68
00:04:25,090 --> 00:04:25,810
length of the video.

69
00:04:26,290 --> 00:04:33,490
Now we use FFmpeg to convert the AVI to an MP4 file with H.264, because that's

70
00:04:33,490 --> 00:04:36,730
what we can display in Colab with the HTML5 video player.
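
In a Colab cell the conversion is typically a one-liner such as `!ffmpeg -y -i output.avi -vcodec libx264 output.mp4`. The sketch below builds the same argument list in Python so it can be handed to `subprocess.run`. The file names are hypothetical placeholders, and it assumes an ffmpeg build with libx264; H.264 in an MP4 container is broadly playable in browser HTML5 video players, whereas AVI generally is not.

```python
import shlex

def ffmpeg_to_mp4_cmd(src="output.avi", dst="output.mp4"):
    """Hypothetical helper: argument list that re-encodes `src` to H.264
    video in an MP4 container. File names are placeholders."""
    return ["ffmpeg", "-y",        # overwrite dst if it already exists
            "-i", src,             # input video written by OpenCV
            "-vcodec", "libx264",  # H.264 encoder
            dst]

cmd = ffmpeg_to_mp4_cmd()
print(shlex.join(cmd))  # prints: ffmpeg -y -i output.avi -vcodec libx264 output.mp4
# To actually run it (requires ffmpeg on PATH), after `import subprocess`:
# subprocess.run(cmd, check=True)
```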

71
00:04:40,250 --> 00:04:43,220
And when that's done, you can actually view it.

72
00:04:43,940 --> 00:04:47,180
I've already done this before, so you can just look at my output here.

73
00:04:47,720 --> 00:04:51,980
So this is the output here. You can see this is the window we established

74
00:04:51,980 --> 00:04:54,620
here initially, and it's tracking correctly.

75
00:04:55,130 --> 00:04:59,780
This is another random window we established here that just looks for the brightest areas in the image,

76
00:04:59,780 --> 00:05:03,920
and you'll see, as I move my mouse over it,

77
00:05:04,000 --> 00:05:06,500
the actual window initially tracks the big truck here.

78
00:05:08,240 --> 00:05:10,250
So it is just sort of working.

79
00:05:10,430 --> 00:05:11,120
Not ideal.

80
00:05:11,120 --> 00:05:12,830
But you can see how it works.

81
00:05:13,460 --> 00:05:14,930
Now let's take a look at CAMShift.

82
00:05:15,170 --> 00:05:18,200
CAMShift is basically almost the same as mean shift.

83
00:05:18,650 --> 00:05:22,880
However, it adapts the window's size and orientation, and it returns a rotated rectangle as a result.

84
00:05:23,330 --> 00:05:26,060
So it's a more effective way of tracking.
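
For intuition about what CAMShift adds on top of mean shift, here is a NumPy sketch of the kind of moment computation CAMShift-style trackers perform once the window has converged: second-order central moments of the weights give the blob's orientation, and the covariance eigenvalues give its spread. This is a simplified illustration, not OpenCV's `CamShift` code; the helper name is hypothetical, and reporting each axis length as four standard deviations is an assumption chosen for illustration. Coordinates are image coordinates (x = column, y = row, y pointing down).

```python
import numpy as np

def blob_orientation(weights):
    """Hypothetical helper: center, (major, minor) axis lengths, and major-axis
    angle in degrees of a non-negative 2D weight map with nonzero total weight."""
    weights = np.asarray(weights, dtype=float)
    rows, cols = np.indices(weights.shape)
    m00 = weights.sum()
    cx = (cols * weights).sum() / m00
    cy = (rows * weights).sum() / m00
    # Second-order central moments, normalized by total weight (a covariance)
    mu20 = ((cols - cx) ** 2 * weights).sum() / m00
    mu02 = ((rows - cy) ** 2 * weights).sum() / m00
    mu11 = ((cols - cx) * (rows - cy) * weights).sum() / m00
    # Angle of the major axis, measured from the +x (column) axis
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    # Eigenvalues of [[mu20, mu11], [mu11, mu02]]: variance along each axis
    half_diff = np.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    lam_major = (mu20 + mu02) / 2 + half_diff
    lam_minor = max((mu20 + mu02) / 2 - half_diff, 0.0)
    # Assumption: report each axis as 4 standard deviations (2 on each side)
    axes = (4 * np.sqrt(lam_major), 4 * np.sqrt(lam_minor))
    return (cx, cy), axes, float(np.degrees(theta))

diag = np.zeros((50, 50))
for i in range(10, 40):
    diag[i, i] = 1.0  # a thin diagonal streak
center, axes, angle = blob_orientation(diag)
```

For the diagonal streak the angle comes out at 45 degrees with a near-zero minor axis, which is exactly the information an axis-aligned mean shift window cannot express.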

85
00:05:26,510 --> 00:05:28,280
So let's take a look at that.

86
00:05:29,520 --> 00:05:33,350
The CAMShift algorithm is implemented in mostly the same way.

87
00:05:33,410 --> 00:05:39,590
However, instead of meanShift we use CamShift there, and almost everything else remains exactly

88
00:05:39,590 --> 00:05:40,040
the same.

89
00:05:41,270 --> 00:05:46,040
We draw it differently, because now, instead of drawing with cv2.rectangle, we have to get the box

90
00:05:46,040 --> 00:05:50,210
points and draw them as a polygon with polylines for the rotated rectangle,

91
00:05:50,720 --> 00:05:58,100
since the cv2.rectangle function only gives us rectangles in a zero- or 90-degree

92
00:05:58,100 --> 00:05:58,820
orientation.
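
A NumPy sketch of turning a rotated rectangle `((cx, cy), (w, h), angle)`, the form `cv2.CamShift` returns, into four corner points, which is the job `cv2.boxPoints` does before `cv2.polylines` draws them. The helper name is hypothetical, and the corner order here may differ from OpenCV's.

```python
import numpy as np

def rotated_rect_corners(center, size, angle_deg):
    """Hypothetical helper: four corners of a rotated rectangle
    given as ((cx, cy), (w, h), angle in degrees)."""
    (cx, cy), (w, h) = center, size
    t = np.radians(angle_deg)
    c, s = np.cos(t), np.sin(t)
    # Corners of the unrotated box, relative to its center
    offsets = np.array([[-w / 2, -h / 2],
                        [ w / 2, -h / 2],
                        [ w / 2,  h / 2],
                        [-w / 2,  h / 2]])
    # Rotate every offset by the angle, then move back to the center
    rotation = np.array([[c, -s],
                         [s,  c]])
    return offsets @ rotation.T + np.array([cx, cy])

corners = rotated_rect_corners((10, 20), (4, 2), 0)
rotated = rotated_rect_corners((10, 20), (4, 2), 90)
```

In the notebook itself, the equivalent OpenCV calls would be along the lines of `pts = np.int32(cv2.boxPoints(ret))` followed by `cv2.polylines(frame, [pts], True, 255, 2)`, where `ret` is the rotated rectangle returned by `cv2.CamShift`.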

93
00:05:59,960 --> 00:06:04,340
So let's run this, which I've already done, and I've already converted it before.

94
00:06:05,090 --> 00:06:08,390
So let's look at the output of CAMShift.

95
00:06:08,780 --> 00:06:09,740
And you can see immediately

96
00:06:09,740 --> 00:06:12,560
we have the rotated rectangle here.

97
00:06:13,640 --> 00:06:16,400
And you can see it's focusing on the bright spot here, which is not what we want.

98
00:06:16,400 --> 00:06:17,710
We wanted it to focus on the car.

99
00:06:17,720 --> 00:06:22,280
So let's see if it ends up doing that when the video cuts to the next scene.

100
00:06:23,990 --> 00:06:24,920
No, it does not.

101
00:06:26,030 --> 00:06:27,290
But that is OK here.

102
00:06:29,480 --> 00:06:37,370
I should add one confusing thing: if we go back up to the beginning here, this box on the truck, this

103
00:06:37,370 --> 00:06:39,530
box here, is actually hardcoded into the video.

104
00:06:39,530 --> 00:06:44,030
It's not actually from the mean shift or CAMShift we're running; our window is actually this

105
00:06:44,030 --> 00:06:44,540
box here.

106
00:06:44,930 --> 00:06:47,330
That previous box was just there to illustrate

107
00:06:47,360 --> 00:06:50,210
other tracking methods prior to this.

108
00:06:50,360 --> 00:06:51,980
So you can take a look.

109
00:06:53,090 --> 00:06:53,890
Sorry,

110
00:06:53,900 --> 00:06:55,880
this is the actual mean shift box here.

111
00:06:56,570 --> 00:07:01,850
This other box is also hardcoded into the video; it was

112
00:07:01,850 --> 00:07:03,680
using different initial criteria.

113
00:07:04,550 --> 00:07:06,970
So that's it for this lesson.

114
00:07:06,980 --> 00:07:12,470
The next lesson takes a look at object tracking using a more advanced method called optical flow.

115
00:07:12,980 --> 00:07:14,240
So stay tuned for that.

116
00:07:14,450 --> 00:07:14,870
Thank you.
