1
00:00:01,000 --> 00:00:02,000
Hello everyone.

2
00:00:02,000 --> 00:00:08,000
This lecture presents an introduction to multiple object tracking using the SORT and DeepSORT algorithms.

3
00:00:08,000 --> 00:00:15,000
SORT stands for Simple Online and Realtime Tracking. Both SORT and DeepSORT are multiple object

4
00:00:15,000 --> 00:00:16,000
tracking algorithms.

5
00:00:16,000 --> 00:00:20,000
DeepSORT is an extension of the SORT algorithm.

6
00:00:20,000 --> 00:00:26,000
Basically, the authors of the SORT algorithm proposed the DeepSORT algorithm to address the

7
00:00:26,000 --> 00:00:28,000
issues in the SORT algorithm.

8
00:00:28,000 --> 00:00:34,000
In this lecture, we will see what the issues in the SORT algorithm are and how DeepSORT solves

9
00:00:34,000 --> 00:00:36,000
those issues.

10
00:00:36,000 --> 00:00:41,000
We will start from scratch and try to explain

11
00:00:41,000 --> 00:00:45,000
what tracking is and why we need an object tracker.

12
00:00:45,000 --> 00:00:50,000
Then we will also look at the applications of object tracking, the types of trackers, and so on.

13
00:00:50,000 --> 00:00:52,000
So let's get started.

14
00:00:55,000 --> 00:00:56,000
So the following are the objectives.

15
00:00:56,000 --> 00:01:01,000
Or you can say that we have divided this whole lecture into eight different parts.

16
00:01:01,000 --> 00:01:04,000
So in the first part, we will see an introduction to tracking.

17
00:01:04,000 --> 00:01:07,000
In the second part, we will see why we need an object tracker.

18
00:01:07,000 --> 00:01:11,000
In the third part I will discuss the applications of tracking.

19
00:01:11,000 --> 00:01:17,000
In the fourth part, we will see the types of trackers we have, which include single object trackers and

20
00:01:17,000 --> 00:01:18,000
multiple object trackers.

21
00:01:18,000 --> 00:01:21,000
In the fifth part, we will see what the DeepSORT algorithm is.

22
00:01:21,000 --> 00:01:28,000
In the sixth part, we will discuss the SORT algorithm, which is the Simple Online and Realtime Tracking algorithm.

23
00:01:28,000 --> 00:01:35,000
In the seventh part we will see the issues in the SORT algorithm and in the eighth part we will see

24
00:01:35,000 --> 00:01:38,000
how the DeepSORT algorithm solves these issues.

25
00:01:38,000 --> 00:01:41,000
So let's get started with the first part.

26
00:01:42,000 --> 00:01:45,000
So this slide presents an introduction to tracking.

27
00:01:45,000 --> 00:01:45,000
So what

28
00:01:45,000 --> 00:01:52,000
is object tracking? Object tracking is a method to track the detected objects

29
00:01:52,000 --> 00:01:57,000
throughout the frames using their spatial and temporal features.

30
00:01:57,000 --> 00:02:03,000
So what do we do in tracking? In tracking, we get the initial set of detections from the object

31
00:02:03,000 --> 00:02:07,000
detection algorithm, which can be YOLOv7 or YOLOv8.

32
00:02:07,000 --> 00:02:14,000
In the next step, we assign the detected objects a unique ID and then track them throughout the frames

33
00:02:14,000 --> 00:02:18,000
of the video feed while maintaining the assigned unique ID.

34
00:02:18,000 --> 00:02:22,000
So the object tracking algorithm assigns a unique ID randomly.

35
00:02:22,000 --> 00:02:24,000
Basically, it can be any random number.

36
00:02:24,000 --> 00:02:26,000
Unique ID can be any random number.

37
00:02:26,000 --> 00:02:28,000
It does not need to follow any particular order.

38
00:02:28,000 --> 00:02:31,000
For example, the first object can be assigned an ID of 30.

39
00:02:31,000 --> 00:02:36,000
The second object can be assigned an ID of 35, 40 or 100.

40
00:02:36,000 --> 00:02:37,000
Okay.

41
00:02:37,000 --> 00:02:43,000
So let's see what the tracking process looks like. Tracking can be considered

42
00:02:43,000 --> 00:02:45,000
a two-step process.

43
00:02:45,000 --> 00:02:51,000
In the first step, we do the detection and localization of the object using any type of object detector

44
00:02:51,000 --> 00:02:55,000
algorithm, which can be YOLOv7, YOLOv8 or YOLOR.

45
00:02:55,000 --> 00:03:02,000
After doing the detections, in the second step, using a motion predictor, we predict the future motion of

46
00:03:02,000 --> 00:03:05,000
the object using its past information.

47
00:03:05,000 --> 00:03:08,000
So this is how we implement tracking.

48
00:03:08,000 --> 00:03:11,000
And tracking is basically a two-step process.

49
00:03:11,000 --> 00:03:12,000
Let's move towards the next slide.

50
00:03:15,000 --> 00:03:18,000
Let's see why we need an object tracker.

51
00:03:18,000 --> 00:03:24,000
When I started reading about object tracking, I wondered why we need an object tracker.

52
00:03:24,000 --> 00:03:27,000
Why can't we use only an object detector?

53
00:03:27,000 --> 00:03:30,000
Let's discuss this with an example.

54
00:03:31,000 --> 00:03:34,000
So why can't we use an object detector only?

55
00:03:34,000 --> 00:03:41,000
Suppose that in a video feed we are detecting a car. Whenever the car gets occluded or overlapped

56
00:03:41,000 --> 00:03:42,000
by something,

57
00:03:42,000 --> 00:03:45,000
for example by a truck, the detector will fail.

58
00:03:45,000 --> 00:03:53,000
But if we have a tracker with it, we will be able to predict the future motion as well as track

59
00:03:53,000 --> 00:03:56,000
the car by assigning a unique ID.

60
00:03:56,000 --> 00:04:00,000
So in object tracking, we assign a unique ID to each of the objects

61
00:04:00,000 --> 00:04:07,000
we want to track, and we maintain that ID for as long as the object is in the frame.

62
00:04:09,000 --> 00:04:13,000
Let's explore the applications of tracking.

63
00:04:13,000 --> 00:04:18,000
Let's look at the first application, which is traffic monitoring. Trackers can be used to monitor

64
00:04:18,000 --> 00:04:21,000
traffic and track vehicles on the road.

65
00:04:21,000 --> 00:04:25,000
They can be used to judge traffic, detect violations and much more.

66
00:04:25,000 --> 00:04:31,000
There are other applications of tracking, which include automatic number plate recognition and so

67
00:04:31,000 --> 00:04:31,000
on.

68
00:04:31,000 --> 00:04:35,000
We can also use object trackers in sports.

69
00:04:35,000 --> 00:04:42,000
Trackers can be used in sports for ball tracking, player detection or shuttlecock tracking

70
00:04:42,000 --> 00:04:43,000
in case of badminton.

71
00:04:43,000 --> 00:04:49,000
This in turn can be used to detect fouls, scores and much more in the game.

72
00:04:49,000 --> 00:04:54,000
So in games, object tracking has many applications. For example, in basketball, we can count

73
00:04:54,000 --> 00:04:55,000
the number of scores.

74
00:04:55,000 --> 00:05:04,000
We can also see if there is a foul in football, or we can track the ball in the case of cricket

75
00:05:04,000 --> 00:05:05,000
or in case of football.

76
00:05:05,000 --> 00:05:11,000
We can also track the football or in case of badminton, we can also track the shuttlecock.

77
00:05:11,000 --> 00:05:18,000
Tracking can also be applied to multi-camera surveillance. Here,

78
00:05:18,000 --> 00:05:20,000
the core idea is re-identification.

79
00:05:20,000 --> 00:05:26,000
For example, if a person is being tracked in one camera with an ID and the person goes out of the frame

80
00:05:26,000 --> 00:05:33,000
and reappears in another camera, the person will retain the same ID that they had previously.

81
00:05:33,000 --> 00:05:40,000
This application can help in identifying objects that appear in a different camera and can also be

82
00:05:40,000 --> 00:05:42,000
used in intrusion detection.

83
00:05:43,000 --> 00:05:45,000
Let's explore the types of trackers.

84
00:05:45,000 --> 00:05:47,000
So there are two types of trackers:

85
00:05:47,000 --> 00:05:50,000
single object trackers and multiple object trackers.

86
00:05:51,000 --> 00:05:58,000
In a single object tracker, we track only a single object, no matter how many objects are present in the

87
00:05:58,000 --> 00:05:59,000
frame.

88
00:05:59,000 --> 00:06:03,000
Single object trackers are actually very fast.

89
00:06:03,000 --> 00:06:11,000
Some of the single object trackers are CSRT and many more, which are built using computer

90
00:06:11,000 --> 00:06:11,000
vision.

91
00:06:12,000 --> 00:06:19,000
In a multiple object tracker, we can track multiple objects present in a frame at the same time,

92
00:06:19,000 --> 00:06:24,000
even of different classes while maintaining a high speed.

93
00:06:24,000 --> 00:06:29,000
So multiple object trackers have proved to be more accurate.

94
00:06:29,000 --> 00:06:38,000
Some of the commonly used multiple object trackers are the DeepSORT, SORT and CenterTrack algorithms.

95
00:06:40,000 --> 00:06:40,000
Hi, guys.

96
00:06:40,000 --> 00:06:41,000
Let's look at

97
00:06:41,000 --> 00:06:43,000
what the DeepSORT algorithm is.

98
00:06:43,000 --> 00:06:50,000
So DeepSORT is a computer vision algorithm used to track objects while assigning each tracked

99
00:06:50,000 --> 00:06:52,000
object a unique ID.

100
00:06:52,000 --> 00:06:55,000
So DeepSORT is an extension of the SORT algorithm.

101
00:06:55,000 --> 00:07:02,000
DeepSORT was introduced by the authors of the SORT algorithm to address the issues in the SORT algorithm.

102
00:07:02,000 --> 00:07:10,000
So the DeepSORT algorithm introduces deep learning into the SORT algorithm by adding an appearance

103
00:07:10,000 --> 00:07:17,000
descriptor to reduce identity switches, hence making the tracking more efficient and more accurate.

104
00:07:19,000 --> 00:07:19,000
Guys.

105
00:07:19,000 --> 00:07:26,000
As I told you, DeepSORT introduces deep learning into the SORT algorithm by adding an appearance descriptor,

106
00:07:26,000 --> 00:07:28,000
as you can see here.

107
00:07:29,000 --> 00:07:34,000
Here we have added an appearance descriptor into the SORT algorithm.

108
00:07:34,000 --> 00:07:36,000
So this becomes the DeepSORT algorithm.

109
00:07:36,000 --> 00:07:44,000
So the appearance descriptor helps us reduce identity switches and hence makes the tracking more efficient.

110
00:07:44,000 --> 00:07:45,000
So.

111
00:07:46,000 --> 00:07:47,000
Note this.

112
00:07:47,000 --> 00:07:54,000
The conclusion is that DeepSORT basically introduces deep learning into the SORT algorithm by adding appearance

113
00:07:54,000 --> 00:08:01,000
descriptors, which helps to reduce the identity switches and hence makes the tracking more efficient.

114
00:08:01,000 --> 00:08:04,000
So here you can see that in the SORT algorithm,

115
00:08:04,000 --> 00:08:08,000
we have added an appearance descriptor over here.

116
00:08:08,000 --> 00:08:15,000
So this now becomes our DeepSORT algorithm, because deep learning is introduced into the SORT algorithm.

117
00:08:17,000 --> 00:08:19,000
Let's look at the SORT algorithm.

118
00:08:19,000 --> 00:08:26,000
So SORT is an approach to object tracking where Kalman filters and the Hungarian algorithm are used to track

119
00:08:26,000 --> 00:08:27,000
objects.

120
00:08:27,000 --> 00:08:31,000
SORT consists of four components, which are as follows.

121
00:08:31,000 --> 00:08:34,000
So the first component is detection.

122
00:08:34,000 --> 00:08:40,000
In the first step, the detection of all the objects that need to be tracked is done

123
00:08:40,000 --> 00:08:44,000
using YOLOv5, YOLOv7 or YOLOv8.

124
00:08:45,000 --> 00:08:49,000
So after doing detection, in the next step, estimation is done.

125
00:08:49,000 --> 00:08:55,000
So in the estimation step, the detections are propagated from the current frame to the next frame to estimate

126
00:08:55,000 --> 00:09:02,000
the position of the target in the next frame using a Gaussian distribution and a constant velocity model.

127
00:09:03,000 --> 00:09:07,000
The estimation is done using the Kalman filter.
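
[Editor's sketch] The estimation step described above can be illustrated with a minimal constant velocity Kalman filter for a single coordinate, for example the x position of a box centre. This is only a sketch under stated assumptions: the function names `predict` and `update`, the noise values `q` and `r`, and the two-element state (position, velocity) are illustrative choices, not the exact state or parameters used by SORT.

```python
def predict(x, v, P, dt=1.0, q=0.01):
    """Constant velocity prediction for one coordinate.

    x, v : current position and velocity estimates
    P    : 2x2 covariance [[pxx, pxv], [pxv, pvv]]
    q    : assumed process noise added to the diagonal (illustrative value)
    """
    x_pred = x + v * dt  # position moves with the velocity; velocity stays constant
    pxx, pxv, pvv = P[0][0], P[0][1], P[1][1]
    # P' = F P F^T + Q, with F = [[1, dt], [0, 1]] and Q = q * I
    pxx_new = pxx + 2 * dt * pxv + dt * dt * pvv + q
    pxv_new = pxv + dt * pvv
    pvv_new = pvv + q
    return x_pred, v, [[pxx_new, pxv_new], [pxv_new, pvv_new]]


def update(x, v, P, z, r=1.0):
    """Correct the prediction with a measured position z (assumed noise r)."""
    pxx, pxv, pvv = P[0][0], P[0][1], P[1][1]
    s = pxx + r                # innovation variance
    k0, k1 = pxx / s, pxv / s  # Kalman gain for position and velocity
    y = z - x                  # innovation: measurement minus prediction
    x_new = x + k0 * y
    v_new = v + k1 * y
    # P_new = (I - K H) P, with H = [1, 0]
    P_new = [[(1 - k0) * pxx, (1 - k0) * pxv],
             [pxv - k1 * pxx, pvv - k1 * pxv]]
    return x_new, v_new, P_new


# Example: a target at x = 10 moving 2 units per frame.
x, v, P = predict(10.0, 2.0, [[1.0, 0.0], [0.0, 1.0]])
x, v, P = update(x, v, P, z=14.0)  # the detector reports the target at 14
```

The predicted position lies at 12; after the update, the estimate is pulled part of the way towards the measurement at 14, and the velocity estimate increases accordingly.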

128
00:09:10,000 --> 00:09:12,000
The third step is data association.

129
00:09:12,000 --> 00:09:19,000
So in data association, the cost matrix is basically computed as the intersection over union (IoU) distance

130
00:09:19,000 --> 00:09:25,000
between each detection and all predicted bounding boxes from the existing targets.
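
[Editor's sketch] The IoU cost matrix described above can be sketched as follows. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the helper names `iou`, `cost_matrix` and `assign` are hypothetical. The lecture names the Hungarian algorithm for the assignment; to keep the example short and dependency-free, this sketch swaps in a brute-force search over permutations, which gives the same optimal result for small square matrices but is not the Hungarian algorithm itself.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cost_matrix(detections, predictions):
    """IoU distance (1 - IoU) between every detection and every predicted box."""
    return [[1.0 - iou(d, p) for p in predictions] for d in detections]


def assign(cost):
    """Minimum-cost one-to-one assignment for a small square cost matrix.

    Brute force stand-in for the Hungarian algorithm (assumption: equal
    numbers of detections and predictions, and only a handful of each).
    Returns (detection_index, prediction_index) pairs.
    """
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return list(enumerate(best))


detections = [(0, 0, 10, 10), (20, 20, 30, 30)]
predictions = [(21, 21, 31, 31), (1, 1, 11, 11)]
matches = assign(cost_matrix(detections, predictions))
```

Here detection 0 is matched to prediction 1 and detection 1 to prediction 0, because those pairs overlap the most.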

131
00:09:25,000 --> 00:09:33,000
In the fourth step, the creation and deletion of track identities is done. When an object enters

132
00:09:33,000 --> 00:09:39,000
a frame, it is assigned a unique ID, and when the object leaves the frame, the unique ID is

133
00:09:39,000 --> 00:09:41,000
removed from the list.
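
[Editor's sketch] The creation and deletion of track identities can be sketched with a small bookkeeping class. Everything here is an illustrative assumption: the class name `TrackManager`, the `max_age` parameter (how many frames a track may go unmatched before it is removed), and the use of a simple counter for new IDs, which is only one way of guaranteeing uniqueness.

```python
class TrackManager:
    """Keeps the list of live track IDs (hypothetical helper, not a library API)."""

    def __init__(self, max_age=1):
        self.max_age = max_age  # frames a track may stay unmatched before deletion
        self.next_id = 1        # counter used to hand out unique IDs
        self.tracks = {}        # track ID -> frames since the track was last matched

    def step(self, matched_ids, num_unmatched_detections):
        """Process one frame and return the IDs created in this frame."""
        for track_id in list(self.tracks):
            if track_id in matched_ids:
                self.tracks[track_id] = 0      # seen again: reset its age
            else:
                self.tracks[track_id] += 1     # not seen in this frame
                if self.tracks[track_id] > self.max_age:
                    del self.tracks[track_id]  # object has left: remove its ID
        new_ids = []
        for _ in range(num_unmatched_detections):
            self.tracks[self.next_id] = 0      # new object: assign a fresh ID
            new_ids.append(self.next_id)
            self.next_id += 1
        return new_ids


manager = TrackManager(max_age=1)
manager.step(set(), 2)  # two objects enter the frame and get IDs 1 and 2
manager.step({1}, 0)    # object 2 is missed once and is kept for now
manager.step({1}, 0)    # object 2 is missed again and its ID is removed
```

After these three frames only ID 1 remains in `manager.tracks`, and the next new object would receive ID 3.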

134
00:09:43,000 --> 00:09:49,000
The SORT algorithm performs very well in terms of tracking precision and accuracy.

135
00:09:49,000 --> 00:09:53,000
But there are issues in the SORT algorithm.

136
00:09:53,000 --> 00:09:59,000
The SORT algorithm fails in cases of occlusion and different viewpoints.

137
00:09:59,000 --> 00:10:06,000
And also, despite the effectiveness of the Kalman filter, it returns a relatively high number of identity

138
00:10:06,000 --> 00:10:07,000
switches.

139
00:10:07,000 --> 00:10:10,000
So these are the two issues in the SORT algorithm.

140
00:10:10,000 --> 00:10:17,000
So the authors of the SORT algorithm proposed the DeepSORT algorithm to address these two issues.

141
00:10:18,000 --> 00:10:22,000
So we have seen that there are two issues in the SORT algorithm.

142
00:10:22,000 --> 00:10:25,000
The first one is that the SORT algorithm fails in case of occlusion.

143
00:10:25,000 --> 00:10:31,000
And the second one is that the SORT algorithm returns a high number of identity switches.

144
00:10:31,000 --> 00:10:36,000
So the authors of the SORT algorithm proposed DeepSORT to address these issues.

145
00:10:36,000 --> 00:10:43,000
So in DeepSORT, another distance metric is introduced based on the appearance of the object, which

146
00:10:43,000 --> 00:10:47,000
is the appearance feature vector or deep appearance descriptor.

147
00:10:47,000 --> 00:10:54,000
So DeepSORT uses a better association metric, which combines both motion and appearance descriptors.

148
00:10:55,000 --> 00:11:01,000
So DeepSORT can be defined as a tracking algorithm which tracks objects not only based on the velocity

149
00:11:01,000 --> 00:11:07,000
and motion of the object, but also based on the appearance of the object.
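
[Editor's sketch] A combined motion and appearance metric of the kind described above might look like the following. The appearance descriptor is assumed to be a plain list of floats produced by some deep network (not shown), the appearance distance is assumed to be a cosine distance, and the weighted sum with the parameter `weight` is an illustrative way of combining the two costs. The names `cosine_distance` and `combined_cost` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero appearance feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def combined_cost(motion_cost, appearance_cost, weight=0.5):
    """Weighted sum of a motion-based cost and an appearance-based cost.

    weight is an assumed tuning parameter: 1.0 uses motion only,
    0.0 uses appearance only.
    """
    return weight * motion_cost + (1.0 - weight) * appearance_cost


# After an occlusion the predicted box may be far from the new detection
# (high motion cost), while the object still looks the same (low appearance cost).
track_feature = [0.9, 0.1, 0.4]
detection_feature = [0.8, 0.2, 0.5]
cost = combined_cost(motion_cost=0.9,
                     appearance_cost=cosine_distance(track_feature, detection_feature))
```

Because the appearance cost is small here, the combined cost stays lower than the motion cost alone, which is how an appearance term can help keep the same ID through an occlusion.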

150
00:11:09,000 --> 00:11:13,000
The following is the block diagram of the DeepSORT algorithm.

151
00:11:13,000 --> 00:11:20,000
DeepSORT introduces deep learning into the SORT algorithm by adding an appearance descriptor,

152
00:11:20,000 --> 00:11:22,000
which you can see over here.

153
00:11:24,000 --> 00:11:31,000
So the appearance descriptor reduces identity switches, hence making the tracking very much

154
00:11:31,000 --> 00:11:32,000
more efficient.

155
00:11:32,000 --> 00:11:34,000
So this is all for this lecture.

156
00:11:34,000 --> 00:11:41,000
See you in the next lecture, in which we will implement DeepSORT tracking in Google Colab, and we will

157
00:11:41,000 --> 00:11:45,000
also implement the SORT algorithm in the upcoming lectures.

158
00:11:45,000 --> 00:11:47,000
So till then, bye bye.

