1
00:00:00,330 --> 00:00:05,310
In this section on corrective measures, we'll look at data augmentation.

2
00:00:05,610 --> 00:00:14,820
In the CSRNet paper, the authors propose a data augmentation scheme where they simply crop nine patches

3
00:00:14,820 --> 00:00:21,840
from each image at different locations, each a quarter of the size of the original image.

4
00:00:22,410 --> 00:00:28,020
Now the first four patches contain four quarters of the image without overlapping.

5
00:00:28,790 --> 00:00:32,810
The other five patches are randomly cropped from the input image.

6
00:00:33,080 --> 00:00:38,770
After that, they mirror the patches so that they double the training set.
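
As a rough illustration of the mirroring step (a minimal sketch, not the paper's code: `mirror` is a hypothetical helper, and the image is assumed to be a plain list of pixel rows), a horizontal flip simply reverses each row. The same flip would have to be applied to the density map so that it stays aligned with the image.

```python
def mirror(rows):
    """Flip a 2D grid (a list of rows) left to right."""
    return [row[::-1] for row in rows]


# Toy 2 x 3 "image"; a real image would also carry a channel dimension.
image = [[1, 2, 3],
         [4, 5, 6]]
flipped = mirror(image)
```

Flipping twice gives back the original grid, which is a quick sanity check for this kind of helper.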

7
00:00:38,780 --> 00:00:39,800
So, that said:

8
00:00:39,800 --> 00:00:44,480
Suppose we have this image right here; we're going to obtain this first patch.

9
00:00:44,510 --> 00:00:48,130
You could see clearly that it contains this region right here.

10
00:00:48,140 --> 00:00:55,040
Then we could also obtain this other patch, which happens to be this region, then another patch which

11
00:00:55,040 --> 00:00:59,300
is this region around this, then another which is this.

12
00:00:59,300 --> 00:01:05,450
So for these first four, we ensure that they don't overlap, so they don't touch each other.

13
00:01:05,450 --> 00:01:11,090
Then for the remaining five, we just crop from any position in the image at random.

14
00:01:11,090 --> 00:01:18,950
So we have this. Notice that this now matches up with this, or rather, this overlaps with this.

15
00:01:18,950 --> 00:01:21,110
This too also overlaps with this.

16
00:01:21,110 --> 00:01:28,640
We also have this one, which doesn't overlap with any, and then this one, no, it overlaps, because

17
00:01:28,640 --> 00:01:30,290
we have this point here.

18
00:01:30,290 --> 00:01:32,720
It overlaps with this and overlaps with this.

19
00:01:32,810 --> 00:01:35,450
Anyway, we pick the remaining five at random.

20
00:01:35,450 --> 00:01:36,710
So there we go.

21
00:01:37,190 --> 00:01:45,050
Now, once we have this, as you could see, we've put all these different segments of the image together

22
00:01:45,290 --> 00:01:48,110
and this is how we double our data.

23
00:01:48,380 --> 00:01:52,390
So for this image, we now have this new image which is created.

24
00:01:52,400 --> 00:01:54,920
Now this process is repeated.

25
00:01:54,920 --> 00:02:03,350
The same is done for the density map, to generate a density map in which these parts have been rearranged, as we could

26
00:02:03,350 --> 00:02:05,030
see right here.

27
00:02:05,660 --> 00:02:13,190
And then, to train our model with these sizes being equal, we will simply resize

28
00:02:13,190 --> 00:02:13,520
this.

29
00:02:13,520 --> 00:02:15,050
So we have this now.

30
00:02:15,050 --> 00:02:21,980
Also note that we could define our model such that it takes an arbitrary size, so we could define our

31
00:02:21,980 --> 00:02:31,970
model so that it takes a size and then generates an output eight times smaller than this image or this

32
00:02:31,970 --> 00:02:33,620
input image size.

33
00:02:34,070 --> 00:02:39,500
So this means that instead of having this in X and in Y, we just have None.

34
00:02:40,850 --> 00:02:42,050
And then None.

35
00:02:43,370 --> 00:02:50,900
Right here, knowing full well that the input we get is going to be divided by eight, because we're

36
00:02:50,900 --> 00:02:53,810
going to pass through these two

37
00:02:53,810 --> 00:02:58,520
max pooling layers in the VGG model right here.

38
00:02:58,820 --> 00:03:01,360
Recall, we actually had three, not two.

39
00:03:01,370 --> 00:03:02,960
So let's do this.

40
00:03:02,960 --> 00:03:14,660
Let's get our base model summary so we can see our three max pooling layers.

41
00:03:15,200 --> 00:03:15,920
There we go.

42
00:03:15,920 --> 00:03:18,020
We have this one, this and this.

43
00:03:18,020 --> 00:03:25,490
So clearly, no matter the input shape we have here, we would always have an output which is eight

44
00:03:25,490 --> 00:03:27,680
times smaller than the input.
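
To make the factor of eight concrete, here is a small sketch of the arithmetic. It assumes each of the three max pooling layers halves the height and width with floor division, which is an assumption about the layer settings; `output_size` is a hypothetical helper, not something from the notebook.

```python
def output_size(height, width, num_pools=3, pool_size=2):
    """Spatial size after `num_pools` poolings that each divide by `pool_size`."""
    for _ in range(num_pools):
        height //= pool_size
        width //= pool_size
    return height, width


full = output_size(768, 1024)   # original image size
tiled = output_size(576, 768)   # size of the re-tiled, augmented image
```

Both input sizes come out eight times smaller along each axis, which is why the density map does not need a fixed input shape.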

45
00:03:28,370 --> 00:03:36,530
And this means that we can take inputs with arbitrary sizes, like, say, 500, because in the case

46
00:03:36,530 --> 00:03:48,410
where we actually recreate a dataset, we're going to have a new image which goes from 768 by 1024, that's

47
00:03:48,410 --> 00:03:55,160
height by width, to 576 by 768.

48
00:03:55,160 --> 00:04:00,290
So this means that we could take an input of this shape and we could also take an input of that shape,

49
00:04:00,290 --> 00:04:02,360
since this doesn't

50
00:04:04,220 --> 00:04:06,560
need the output to be fixed.

51
00:04:06,590 --> 00:04:07,970
Now, let's run this.

52
00:04:10,480 --> 00:04:11,260
And there we go.

53
00:04:11,350 --> 00:04:11,680
See that?

54
00:04:11,680 --> 00:04:12,670
Everywhere is None.

55
00:04:12,670 --> 00:04:15,460
So we could pass in arbitrary values right here.

56
00:04:15,880 --> 00:04:17,290
Now, that said, let's go

57
00:04:17,290 --> 00:04:22,300
ahead and implement the data augmentation scheme described in the paper. First,

58
00:04:22,300 --> 00:04:29,140
here we have this get rand, which lets us pass in a minimum value and a maximum value, and then generates

59
00:04:29,140 --> 00:04:30,790
a random number in this range.

60
00:04:30,790 --> 00:04:37,390
So we could take, for example, get rand of say, 0 to 30.

61
00:04:39,320 --> 00:04:40,730
It gives us 19.

62
00:04:41,180 --> 00:04:47,900
Then we have this in H and in W, which is the same as in X and in Y.

63
00:04:48,230 --> 00:04:49,880
Okay, so there we go.

64
00:04:50,060 --> 00:04:51,320
We have run this.

65
00:04:51,800 --> 00:04:56,870
Now we have the code for getting the different patches which we saw on the slides.

66
00:04:57,170 --> 00:05:05,660
Recall that these first four patches right here are at fixed positions

67
00:05:05,660 --> 00:05:08,900
because we want to be sure that they don't overlap.

68
00:05:09,200 --> 00:05:10,460
So that's it.

69
00:05:10,460 --> 00:05:13,970
And then we have now the last five patches.

70
00:05:16,070 --> 00:05:16,370
Right.

71
00:05:16,610 --> 00:05:21,620
The last five random patches from the image.

72
00:05:21,620 --> 00:05:25,310
So we have this: first four, last five.

73
00:05:25,370 --> 00:05:27,290
First four, last five.

74
00:05:27,290 --> 00:05:29,710
We're going to see why we're using this shortly.

75
00:05:29,750 --> 00:05:32,510
So firstly, we have our first patch.

76
00:05:32,510 --> 00:05:43,400
We decide to take the h min coordinate as the height divided by eight, and then the w min coordinate

77
00:05:43,400 --> 00:05:45,920
as the width divided by eight.

78
00:05:46,040 --> 00:05:53,540
Now, note that if we have an image, we will have the height dimension and then the width dimension.

79
00:05:54,320 --> 00:05:57,260
So we could decide to have this.

80
00:05:57,260 --> 00:06:01,670
This is the height and then there's the width.

81
00:06:02,030 --> 00:06:03,770
So this is the origin.

82
00:06:03,770 --> 00:06:05,900
Obviously we have this origin.

83
00:06:05,900 --> 00:06:09,380
Then we go up to the height divided by eight.

84
00:06:09,380 --> 00:06:14,180
So let's say we have this position for the width dimension.

85
00:06:14,180 --> 00:06:15,510
We have the width divided by eight.

86
00:06:15,530 --> 00:06:20,240
Suppose we have this position, so we find ourselves around this position right here.

87
00:06:20,270 --> 00:06:28,340
Now what we do is shift a certain number of steps to the right and then shift a number of steps

88
00:06:28,340 --> 00:06:29,250
downward.

89
00:06:29,270 --> 00:06:30,050
And there we go.

90
00:06:30,050 --> 00:06:31,280
We have our patch.

91
00:06:31,730 --> 00:06:36,290
Now, the way we get this number of steps to the right and downward is by simply using the fact that

92
00:06:36,290 --> 00:06:40,460
in the paper it was stated that each of the patches has to be a quarter of the size of the image.

93
00:06:40,460 --> 00:06:48,350
So we shift downward by a quarter of the height and to the right by a quarter of the width.

94
00:06:48,350 --> 00:06:50,840
And then that's how we obtain the patch simply.

95
00:06:50,840 --> 00:06:53,000
So normally this should be a bit bigger.

96
00:06:53,000 --> 00:06:54,680
So we should have this.

97
00:06:56,880 --> 00:06:57,600
And that's it.

98
00:06:57,600 --> 00:06:59,640
That's why we have this right here.

99
00:07:00,210 --> 00:07:06,750
And then for the others we take the height divided by two while the width is divided by eight, the height divided by eight

100
00:07:06,750 --> 00:07:11,010
with the width divided by two, and the height divided by two with the width divided by two, all to ensure that

101
00:07:11,010 --> 00:07:12,840
they actually never meet.

102
00:07:12,870 --> 00:07:18,300
Now, initially, the h max and w max are set to zeros.

103
00:07:18,330 --> 00:07:24,300
Then later on we do the addition which we just spoke of, where we have patch one, index zero; the

104
00:07:24,300 --> 00:07:25,410
patch one is this.

105
00:07:25,410 --> 00:07:33,480
So we're taking this coordinate, which is this, plus a quarter of the height. Then, to complete the computation

106
00:07:33,480 --> 00:07:34,770
for the patch one.

107
00:07:34,770 --> 00:07:39,690
we have the w max, which is

108
00:07:40,680 --> 00:07:46,050
patch one, index one, that is, the w min which we calculated right here,

109
00:07:48,120 --> 00:07:54,600
and that's the width divided by eight, plus the width divided by four.

110
00:07:55,410 --> 00:08:00,120
And this is repeated for all these four patches right here.

111
00:08:01,140 --> 00:08:05,580
Now, the difference with the other five patches is that they are obtained at random.

112
00:08:05,580 --> 00:08:12,960
So we use the get rand function, which we've defined already, to obtain these min values right here.

113
00:08:12,960 --> 00:08:22,910
But then we ensure that this doesn't cross a certain limit because if we pick the values for this h

114
00:08:22,920 --> 00:08:33,150
min and w min to be around the borders, then it may happen that when computing the h max and w max we may have

115
00:08:33,150 --> 00:08:35,070
to go out of the image.

116
00:08:35,070 --> 00:08:43,950
So to avoid this, we make sure that these random values generated range from zero to three quarters of the

117
00:08:43,950 --> 00:08:45,920
height and three quarters of the width.

118
00:08:45,930 --> 00:08:48,090
So that's why we make this choice right here.

119
00:08:48,090 --> 00:08:54,660
And similar to what we've seen already, initially these max values are zeros, and then we simply go

120
00:08:54,660 --> 00:09:00,990
ahead and add the height divided by four and then the width divided by four to obtain the h max and

121
00:09:00,990 --> 00:09:02,500
the w max.

122
00:09:02,520 --> 00:09:09,180
Once this is done, we have this patches list, which now takes in each and every patch right here: patch

123
00:09:09,180 --> 00:09:11,480
one, patch two, up to patch nine.

124
00:09:11,490 --> 00:09:19,290
Then we shuffle this so that when we're creating our image, we get something that

125
00:09:19,290 --> 00:09:24,840
doesn't resemble previous augmentations of the same image.

126
00:09:25,290 --> 00:09:32,610
Now, the reason why we're doing the shuffling is mainly because of these four patches right here, because

127
00:09:32,610 --> 00:09:35,400
the others are already obtained randomly.

128
00:09:35,400 --> 00:09:40,620
So for these four, which are fixed, we just do the shuffling to change the positions.
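
The patch selection described so far can be sketched roughly as follows. This is a sketch under stated assumptions, not the notebook's code: the names `get_rand` and `get_patches` follow what is said aloud, the coordinate order [h_min, w_min, h_max, w_max] and the exact four fixed origins are assumptions, and each patch is taken as a quarter of the height by a quarter of the width.

```python
import random


def get_rand(min_value, max_value):
    """Random integer in [min_value, max_value], both ends included."""
    return random.randint(min_value, max_value)


def get_patches(height, width):
    """Return nine patches as [h_min, w_min, h_max, w_max], in shuffled order."""
    patch_h, patch_w = height // 4, width // 4

    # Four fixed origins, spaced so the patches never overlap (assumed layout).
    origins = [
        (height // 8, width // 8),
        (height // 2, width // 8),
        (height // 8, width // 2),
        (height // 2, width // 2),
    ]
    # Five random origins, capped at three quarters of each dimension so that
    # adding the patch size never leaves the image.
    for _ in range(5):
        origins.append((get_rand(0, 3 * height // 4),
                        get_rand(0, 3 * width // 4)))

    patches = [[h, w, h + patch_h, w + patch_w] for h, w in origins]
    random.shuffle(patches)  # so the fixed patches change position each time
    return patches


patches = get_patches(768, 1024)
```

For a 768 by 1024 image every patch is 192 by 256, and the shuffle only changes the order of the list, not the coordinates themselves.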

129
00:09:40,860 --> 00:09:43,170
Now, once we have this, we have these patches.

130
00:09:43,200 --> 00:09:51,000
Now note that also what's interesting is that from this we could obtain the corresponding patches for

131
00:09:51,000 --> 00:09:58,770
the density maps by simply dividing the patches by eight, since all the positions actually match up,

132
00:09:58,770 --> 00:10:05,500
given that we divide the input image size by eight to obtain the density map.
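
As a small sketch of that scaling (assuming the coordinate order [h_min, w_min, h_max, w_max] and integer division; `to_map_patch` is a hypothetical helper name), the density map patch is just the image patch with every coordinate divided by eight.

```python
def to_map_patch(patch, factor=8):
    """Scale image patch coordinates down to density map coordinates."""
    return [coordinate // factor for coordinate in patch]


image_patch = [96, 128, 288, 384]      # [h_min, w_min, h_max, w_max]
map_patch = to_map_patch(image_patch)  # same region on the 8x smaller map
```

A 192 by 256 image patch therefore corresponds to a 24 by 32 region of the density map.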

133
00:10:05,520 --> 00:10:08,850
Now we could run this and there we go.

134
00:10:08,850 --> 00:10:09,870
Here are our patches.

135
00:10:09,870 --> 00:10:18,120
We have the values for the h min, w min, h max, w max, and so on and so forth.

136
00:10:18,150 --> 00:10:25,950
Now if we take this off, we see that we always have the same values.

137
00:10:25,950 --> 00:10:28,440
The first four values are always the same.

138
00:10:28,950 --> 00:10:30,660
Let's take this now.

139
00:10:33,020 --> 00:10:33,800
There we go.

140
00:10:35,990 --> 00:10:39,860
We now have the coordinates for all the different patches.

141
00:10:40,280 --> 00:10:44,060
After getting all the different patches, we can now read our image.

142
00:10:44,180 --> 00:10:48,920
It's basically what we've been doing already, which was simply reading this image.

143
00:10:48,920 --> 00:10:54,410
And then for the image patch, there's this list we create.

144
00:10:54,410 --> 00:11:04,580
And right here we are going to store the actual image patches based on those coordinates into this image

145
00:11:04,580 --> 00:11:06,050
patch right here.

146
00:11:06,050 --> 00:11:09,170
So that's why we have image patch dot append. We take

147
00:11:09,170 --> 00:11:12,290
the image array, which is basically the image,

148
00:11:12,290 --> 00:11:17,690
and then index it based on those dimensions of the patch, or rather these coordinates of the patches.

149
00:11:17,690 --> 00:11:21,380
So if you take this patch, for example, let's take this one because it's the image.

150
00:11:21,380 --> 00:11:32,630
If we take this patch, for example, what we'll do is we'll simply take the h min, w min, h max and

151
00:11:32,630 --> 00:11:37,610
w max, or rather h min, h max.

152
00:11:37,610 --> 00:11:39,860
This isn't w anymore.

153
00:11:39,890 --> 00:11:46,100
This is h min, h max, w min, w max.

154
00:11:46,280 --> 00:11:52,520
So we pick out this i over this range of all the different patch coordinates.

155
00:11:52,520 --> 00:11:57,740
So here we have the patch coordinates and then here we actually get these patches.

156
00:11:57,740 --> 00:12:07,010
So this means, for example, that in this case we could have something like zero to, say, 272. So let's,

157
00:12:07,010 --> 00:12:17,600
let's take this 0 to 192 and then this could be from 0 to 256.

158
00:12:18,290 --> 00:12:19,430
So there we go.

159
00:12:20,270 --> 00:12:21,740
Basically, that's what we're doing.

160
00:12:21,770 --> 00:12:25,110
We're just picking out a portion of this image.

161
00:12:26,540 --> 00:12:32,190
Let's take that back, and we'll find now that we have this image patch. You see, we just convert it

162
00:12:32,290 --> 00:12:37,940
to a tensor, and then we redo the same process, but for the map patch now.

163
00:12:37,940 --> 00:12:46,010
So we create a map patch where, based on what we got from the density map, we actually pick out

164
00:12:46,010 --> 00:12:47,300
this patch from it.

165
00:12:48,440 --> 00:12:55,130
Once that's done, we now have this final step where we define this new image.

166
00:12:55,130 --> 00:13:00,890
So there's this new image now which will contain all the different patches which we've gathered throughout

167
00:13:00,890 --> 00:13:03,970
these nine different positions here.

168
00:13:03,980 --> 00:13:05,960
So we're going to have this.

169
00:13:05,960 --> 00:13:07,520
We'll define this in new.

170
00:13:07,520 --> 00:13:09,050
You see, it takes a shape.

171
00:13:09,500 --> 00:13:18,080
We could do this calculation to obtain the shape in the case where we have an input of 768 by 1024

172
00:13:18,080 --> 00:13:19,420
by three.

173
00:13:19,430 --> 00:13:27,920
Now this is going to give us the shape simply because if we have 192 by 256 and we have

174
00:13:27,920 --> 00:13:34,180
to repeat this nine times, it simply means we are repeating this height dimension.

175
00:13:34,190 --> 00:13:35,410
Let's take this off.

176
00:13:35,420 --> 00:13:38,060
We're doing this height dimension three times.

177
00:13:38,060 --> 00:13:41,210
So it's just like doing this three times.

178
00:13:41,210 --> 00:13:44,510
We're repeating this also three times.

179
00:13:44,510 --> 00:13:50,190
So if you multiply this by three, multiplied by three, you obtain this, which happens to be three

180
00:13:50,190 --> 00:13:52,940
quarters of the initial input.
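
The shape calculation can be checked with a short sketch (assuming each patch is a quarter of the height by a quarter of the width and the nine patches are laid out three by three; `mosaic_shape` is a hypothetical helper name).

```python
def mosaic_shape(height, width, channels=3):
    """Shape of the 3 x 3 mosaic built from quarter-size patches."""
    patch_h, patch_w = height // 4, width // 4
    return 3 * patch_h, 3 * patch_w, channels


shape = mosaic_shape(768, 1024)
```

For a 768 by 1024 by 3 input this gives 576 by 768 by 3, that is, three quarters of the original height and width.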

181
00:13:53,240 --> 00:14:00,550
Now, once we have this, we go ahead to create this new image which we call new.

182
00:14:01,670 --> 00:14:10,010
Now, that said, suppose we have this new image array, which is going to collect all those different

183
00:14:10,010 --> 00:14:10,820
patches.

184
00:14:10,940 --> 00:14:14,420
It means we have these nine different positions.

185
00:14:14,810 --> 00:14:15,980
So there we go.

186
00:14:16,040 --> 00:14:17,300
We have this.

187
00:14:18,890 --> 00:14:20,000
We have that.

188
00:14:20,000 --> 00:14:22,760
And then we have this right here now.

189
00:14:22,760 --> 00:14:24,110
So we have each patch.

190
00:14:24,110 --> 00:14:28,670
One, two, three, four, five, six, seven, eight, nine different patches to fit.

191
00:14:28,670 --> 00:14:34,970
Based on the patches we've seen previously, to fit all of those in, we simply,

192
00:14:34,970 --> 00:14:39,230
for example, go from 0 to 192 in the height.

193
00:14:39,230 --> 00:14:47,740
And then in the width, when i is zero, we go from zero to, in this case, we go from 0 to

194
00:14:47,740 --> 00:14:48,350
256.

195
00:14:48,350 --> 00:14:50,600
So that's how we obtain this.

196
00:14:51,170 --> 00:14:59,210
So this patch, we select this patch and we put in the value of the image patch, which we had seen

197
00:14:59,210 --> 00:15:01,660
previously recorded as image patches.

198
00:15:01,670 --> 00:15:07,620
Now there are nine different image patches which we want to fit into these different boxes right here.

199
00:15:07,640 --> 00:15:14,390
Now when i is zero we have the image patch zero, so we have the zeroth image patch, which will be fitted

200
00:15:14,390 --> 00:15:14,930
in here.

201
00:15:14,930 --> 00:15:16,160
So that's it.

202
00:15:16,280 --> 00:15:22,960
Now, still at i equals zero, we move to this one; we now shift the height.

203
00:15:22,970 --> 00:15:24,920
Basically what we're doing is we're shifting the height.

204
00:15:24,920 --> 00:15:27,530
So we now move to 192.

205
00:15:28,400 --> 00:15:32,240
So we're going from 192 to 192 times two, basically.

206
00:15:32,240 --> 00:15:34,850
So we have 384.

207
00:15:35,120 --> 00:15:36,440
So this is the next.

208
00:15:36,440 --> 00:15:39,890
And then, since i is zero, we still maintain this.

209
00:15:39,890 --> 00:15:42,680
So basically we're still going from zero to 256.

210
00:15:42,680 --> 00:15:44,090
So now this is what we occupy.

211
00:15:44,450 --> 00:15:47,870
And then when we move to this one, i is still zero.

212
00:15:47,870 --> 00:15:49,160
We have two times this.

213
00:15:49,160 --> 00:15:52,430
So we have 384 to 576.

214
00:15:52,430 --> 00:15:55,090
And then we have this portion right here.

215
00:15:55,100 --> 00:16:04,850
Now when i equals one, you will see that we still go from 0 to 192 for the height,

216
00:16:04,850 --> 00:16:07,580
so the height from this to this.

217
00:16:07,580 --> 00:16:10,400
But then the width now goes from 256.

218
00:16:10,400 --> 00:16:18,890
So instead of starting from zero, now we go from 256 to 256 plus 256.

219
00:16:18,890 --> 00:16:25,280
So that's 512.

220
00:16:25,280 --> 00:16:27,950
Now with this we see that we fall in this box right here.

221
00:16:27,950 --> 00:16:29,690
So basically repeat this.

222
00:16:29,690 --> 00:16:35,960
We have this, we have this. When i equals two, now we move to this, this and this.

223
00:16:35,960 --> 00:16:39,110
So that's how we fill out all these different boxes.

224
00:16:39,110 --> 00:16:41,630
And the same is repeated for the density map.
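
The filling order walked through above can be sketched with plain nested lists. This is a minimal sketch, not the notebook's function: `tile_patches` is a hypothetical name, patches are assumed to be 2D grids without a channel dimension, and the patch index `3 * i + j` is an assumption about how the nine patches are ordered.

```python
def tile_patches(patches, patch_h, patch_w):
    """Place nine patch_h x patch_w grids into a 3 x 3 mosaic, column by column."""
    mosaic = [[0] * (3 * patch_w) for _ in range(3 * patch_h)]
    for i in range(3):        # i shifts the width: 0, patch_w, 2 * patch_w
        for j in range(3):    # j shifts the height: 0, patch_h, 2 * patch_h
            patch = patches[3 * i + j]
            for r in range(patch_h):
                row = mosaic[j * patch_h + r]
                row[i * patch_w:(i + 1) * patch_w] = patch[r]
    return mosaic


# Nine toy 2 x 3 patches; patch k is filled with the value k.
toy_patches = [[[k] * 3 for _ in range(2)] for k in range(9)]
mosaic = tile_patches(toy_patches, patch_h=2, patch_w=3)
```

With i equal to zero the first three patches go down the first column, and only when i becomes one does the width offset move to the next column, which matches the 0 to 192, 192 to 384, 384 to 576 walk in the height.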

225
00:16:43,090 --> 00:16:44,340
Now that that's understood,

226
00:16:44,350 --> 00:16:45,550
we can run this.

227
00:16:46,150 --> 00:16:47,100
And there we go.

228
00:16:47,110 --> 00:16:50,320
We see that we now have our image.

229
00:16:50,320 --> 00:16:54,100
Let's run this again so we could see how it's modified.

230
00:16:55,300 --> 00:16:56,710
Notice that it's going to change.

231
00:16:56,710 --> 00:16:58,030
So you see.

232
00:16:58,270 --> 00:16:58,900
There we go.

233
00:16:58,900 --> 00:17:06,670
So we see that we have actually done what's expected: getting these different patches and then forming

234
00:17:06,670 --> 00:17:11,560
this new image, which will multiply our dataset size by two.

235
00:17:12,550 --> 00:17:20,140
Then the last step now is just for these mappings,

236
00:17:20,140 --> 00:17:23,620
that's the Gaussian mapping which we've seen previously.

237
00:17:23,830 --> 00:17:30,760
So for our density maps, we should just verify that they match the input.

238
00:17:30,790 --> 00:17:32,170
You see this head right here.

239
00:17:32,200 --> 00:17:34,000
Let's start from here: we have a head.

240
00:17:34,030 --> 00:17:36,160
So you notice that we have this right here.

241
00:17:36,310 --> 00:17:39,820
If we go down, we see this head; it matches up with this. This head

242
00:17:39,820 --> 00:17:40,450
here

243
00:17:41,010 --> 00:17:43,150
matches up with this.

244
00:17:44,290 --> 00:17:45,940
This other one here.

245
00:17:46,090 --> 00:17:48,430
This one is a head right here.

246
00:17:48,460 --> 00:17:49,660
Matches up with this.

247
00:17:49,690 --> 00:17:51,220
See this one, this, this.

248
00:17:51,220 --> 00:17:52,110
So that's it.

249
00:17:52,120 --> 00:17:55,360
Now, notice that at the center there's no head.

250
00:17:55,360 --> 00:17:58,030
So that's why you see the center looks empty right here.

251
00:17:58,630 --> 00:18:04,360
And then maybe this is considered to be a head, because it's very close to a head; it's not very clear

252
00:18:04,360 --> 00:18:05,010
from here.

253
00:18:05,020 --> 00:18:09,880
Now, looking at this top, you see there are many people looking at this, you see, because there

254
00:18:09,880 --> 00:18:13,450
are many people here, we have these different points.

255
00:18:13,470 --> 00:18:14,950
Then we have this point here.

256
00:18:14,950 --> 00:18:17,230
Let's look for a different image.

257
00:18:17,560 --> 00:18:19,480
Let's take this one.

258
00:18:22,160 --> 00:18:23,230
Run that.

259
00:18:23,240 --> 00:18:24,060
Okay.

260
00:18:24,080 --> 00:18:29,470
Now, this image is better because there's some portions where there is no head.

261
00:18:29,480 --> 00:18:31,960
So this one, for example, is very empty.

262
00:18:31,970 --> 00:18:32,930
This is empty.

263
00:18:32,960 --> 00:18:36,410
See, now this portion is where you have this head.

264
00:18:36,440 --> 00:18:36,770
You see?

265
00:18:36,770 --> 00:18:37,370
You have this.

266
00:18:37,640 --> 00:18:38,240
Yeah.

267
00:18:38,690 --> 00:18:40,430
See where there's so many heads?

268
00:18:40,460 --> 00:18:42,410
See this and that.

269
00:18:42,410 --> 00:18:43,220
So that's it.

270
00:18:43,220 --> 00:18:46,340
That's how we multiply this by two.

271
00:18:46,370 --> 00:18:50,030
Now, let's integrate this into our data generator.

272
00:18:50,540 --> 00:18:56,300
Now, before integrating it into our data generator, let's note that we also defined this new function,

273
00:18:56,300 --> 00:18:58,600
generate new, which takes in a patch.

274
00:18:58,610 --> 00:19:02,810
Now note that this patch could be an image patch or a density map patch.

275
00:19:02,840 --> 00:19:09,650
It takes in this height dimension and width dimension. Note that the density map and the image will have different

276
00:19:09,650 --> 00:19:10,250
dimensions.

277
00:19:10,250 --> 00:19:12,200
So that's why we're taking this.

278
00:19:12,200 --> 00:19:14,870
And also even the image could have different dimensions.

279
00:19:14,900 --> 00:19:19,790
In the long run, we could have a dataset with different heights and widths compared to what we have

280
00:19:19,790 --> 00:19:20,420
now.

281
00:19:20,450 --> 00:19:22,940
Now, if it's an image, it's true.

282
00:19:22,970 --> 00:19:27,020
If it's not, we have this right here.

283
00:19:28,400 --> 00:19:31,000
And the reason why we're doing this is obvious.

284
00:19:31,010 --> 00:19:35,090
When it's an image, we have this channel dimension.

285
00:19:35,270 --> 00:19:37,940
When it's a map, we have no channel dimension.

286
00:19:37,940 --> 00:19:39,020
So that's it.

287
00:19:39,050 --> 00:19:47,810
Now we repeat the same steps by just making sure that what we have now here are those variables which

288
00:19:47,810 --> 00:19:52,340
will vary depending on the input image right here.

289
00:19:53,990 --> 00:19:55,590
Now, that has been understood.

290
00:19:55,610 --> 00:19:59,590
Let's look at this integration in our data generator.

291
00:19:59,600 --> 00:20:02,870
So here we have the X and Y.

292
00:20:02,900 --> 00:20:06,090
So X took this, Y took that, that was fine.

293
00:20:06,110 --> 00:20:11,710
Now we define those, we get our patches using the get patches function, which we've defined already.

294
00:20:11,720 --> 00:20:17,210
Once we get our patches, we go through the same process where we have the image patch and the map patches

295
00:20:17,210 --> 00:20:19,490
which we actually obtain.

296
00:20:19,490 --> 00:20:27,220
And then the next step is to apply the generate new and then get this image, this new image generated.

297
00:20:27,230 --> 00:20:33,800
We resize it to get the original size of the image.

298
00:20:33,800 --> 00:20:38,690
Now, we had defined this to be in X and in Y.

299
00:20:38,690 --> 00:20:42,380
So basically we have here in H.

300
00:20:43,940 --> 00:20:58,370
So here we have in H, in W; here we have in H divided by eight and, here, in W divided by eight.

301
00:20:58,370 --> 00:20:59,480
So that's fine.

302
00:21:00,230 --> 00:21:03,620
Once this is done, we stack the two

303
00:21:04,880 --> 00:21:09,680
elements of our list, both for the inputs and for the outputs.

304
00:21:09,680 --> 00:21:11,480
So we could run this now.

305
00:21:12,140 --> 00:21:13,070
That's fine.

306
00:21:13,460 --> 00:21:19,250
Our base model has been run already, so we could go on with that.

307
00:21:20,330 --> 00:21:27,950
And also, one point we didn't mention is that we took off the reshape in favor of this,

308
00:21:27,950 --> 00:21:33,080
because with the reshape we are obliged to specify its outputs.

309
00:21:33,080 --> 00:21:35,970
Whereas now with this we don't have to specify the outputs.

310
00:21:35,990 --> 00:21:43,190
Given that, what we're just doing is we're taking off this last dimension, and to take that off,

311
00:21:43,190 --> 00:21:44,830
it just suffices to do this.

312
00:21:44,840 --> 00:21:52,550
That is, take all the previous dimensions, and then for the last dimension, take just index zero.

313
00:21:53,600 --> 00:22:00,020
And with this okay, we can now train. Let's take this back and train.

314
00:22:03,680 --> 00:22:05,150
We're getting this error.

315
00:22:05,540 --> 00:22:06,230
That would be

316
00:22:06,230 --> 00:22:16,130
because here, in this generator, we made the error of doing this resize based on the modified version

317
00:22:16,130 --> 00:22:21,940
of the height. Note that we had modified the height to match up with the new image generated.

318
00:22:21,950 --> 00:22:26,060
So this height had changed; it's no longer the initial height.

319
00:22:26,060 --> 00:22:26,690
So that's why.

320
00:22:26,690 --> 00:22:31,370
Here, we should take X and then Y: the X,

321
00:22:32,060 --> 00:22:32,990
Yeah, y.

322
00:22:32,990 --> 00:22:33,650
So that's it.

323
00:22:34,100 --> 00:22:35,450
Run this again.

324
00:22:40,900 --> 00:22:42,280
And it works just fine.

325
00:22:43,180 --> 00:22:44,520
We've gone out of memory.

326
00:22:44,530 --> 00:22:46,450
We just have to restart.

327
00:22:46,960 --> 00:22:51,150
Now note that depending on your computing power, restarting may not help.

328
00:22:51,160 --> 00:22:55,840
So in that case, you may have to reduce these dimensions.

329
00:22:55,840 --> 00:22:56,050
Right?

330
00:22:56,170 --> 00:23:04,960
So you'd have to resize the inputs such that we have smaller inputs which can fit into the GPU's memory.

331
00:23:05,680 --> 00:23:13,540
Nonetheless, with this GPU, when we restart and clear our memory, we could actually train on the

332
00:23:14,110 --> 00:23:16,990
inputs and the augmented inputs.

333
00:23:16,990 --> 00:23:18,880
So let's run this.

334
00:23:19,210 --> 00:23:20,500
Everything is going to work fine.

335
00:23:20,500 --> 00:23:24,460
Now run our data generator on that.

336
00:23:25,240 --> 00:23:28,120
Define the model, go ahead and train.

337
00:23:29,560 --> 00:23:35,320
Now we see that it trains. We've cleared our memory, so there is now enough space for it to train

338
00:23:35,320 --> 00:23:42,550
on this dataset, which has been augmented. After training for around 38 epochs, here are the results

339
00:23:42,550 --> 00:23:43,480
we obtain.

340
00:23:44,770 --> 00:23:51,130
Let's now pause the training and see what kind of maps it generates.

341
00:23:51,130 --> 00:23:54,940
So, let's test on this training set data right here.

342
00:23:55,900 --> 00:24:02,230
The density map generated is not bad, although the number of people predicted is not exactly the same

343
00:24:02,230 --> 00:24:02,680
as this.

344
00:24:02,680 --> 00:24:08,320
But the model is learning to generate these density maps correctly.

345
00:24:08,680 --> 00:24:14,680
Now let's test on another, say 245, which is what we had in the slides.

346
00:24:15,190 --> 00:24:17,290
So we have 74 right here.

347
00:24:19,390 --> 00:24:22,930
And obviously this will get better as we continue with training.

348
00:24:22,960 --> 00:24:27,040
Now let's test on an image which is not in our dataset.

349
00:24:27,430 --> 00:24:29,140
We'll test with this image.

350
00:24:29,570 --> 00:24:34,210
We should call people to read the several test images which we used.

351
00:24:34,510 --> 00:24:41,120
Basically, we run this and then we observe the density map, which is generated from the input image.

352
00:24:41,140 --> 00:24:45,010
You see how it looks close to the input image we have.

353
00:24:45,010 --> 00:24:50,180
And also note that this image doesn't look like a typical image from our training dataset.

354
00:24:50,200 --> 00:25:00,370
So our model is learning quite well and generalizing even when the test set doesn't look like the training

355
00:25:00,370 --> 00:25:00,850
set.

356
00:25:01,000 --> 00:25:06,340
Now, also note that in practice you will have to, depending on the kind of problem you want to solve,

357
00:25:06,370 --> 00:25:08,620
get as much data as possible.

358
00:25:08,620 --> 00:25:15,160
So if you have, say, a camera or CCTV which is placed in this kind of position, it's better

359
00:25:15,160 --> 00:25:18,520
for you to train on more of this kind of data.

360
00:25:18,520 --> 00:25:24,910
Whereas if your CCTV is placed such that the images look more like this, then it's better to train

361
00:25:24,910 --> 00:25:26,050
on this kind of data.

362
00:25:26,050 --> 00:25:34,000
So it's very important to take note that the model isn't of great use if you don't pick out the right

363
00:25:34,000 --> 00:25:35,860
data to train that model with.

364
00:25:36,010 --> 00:25:37,150
So that's it.

365
00:25:37,420 --> 00:25:41,080
Let's now count the number of people we have in this image.

366
00:25:41,440 --> 00:25:46,780
Now, after segmenting this image into these different parts, we count the number of people: seven

367
00:25:46,780 --> 00:25:53,360
plus 29 plus 26 plus 20 plus 18, 14, 14, 14, 17, 28.

368
00:25:53,380 --> 00:25:57,880
This gives us a total of 187 people.

369
00:25:58,000 --> 00:26:04,030
And we see that our model did quite well because it predicted 180 people.

370
00:26:04,030 --> 00:26:06,370
So we missed out on seven people.
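
The count can be double-checked with a few lines of arithmetic, using the per-part counts and the predicted total exactly as read out above (the variable names are just for illustration).

```python
# Manual head counts for the ten parts of the image, as read out above.
part_counts = [7, 29, 26, 20, 18, 14, 14, 14, 17, 28]
predicted = 180  # total predicted by the model

actual = sum(part_counts)
missed = actual - predicted
relative_error = missed / actual
```

The manual total is 187, so the model is off by seven people, a relative error of a little under four percent.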

371
00:26:07,150 --> 00:26:13,420
And this is also normal because we have this pole right here, so this person, for example,

372
00:26:13,430 --> 00:26:16,060
wasn't very clearly visible, and so on.

373
00:26:16,060 --> 00:26:23,530
So it's also important to note that in a case where you are going to be predicting people where there

374
00:26:23,530 --> 00:26:29,170
are poles like this, it's important to have as much data as possible where you have this kind of

375
00:26:29,170 --> 00:26:35,020
pole, so that our model learns to differentiate between the poles and the people's heads.

376
00:26:35,020 --> 00:26:39,280
Like this pole seems to block this person's head right here.

377
00:26:39,430 --> 00:26:44,320
So it's very important to take note of the dataset you're working with.

378
00:26:44,530 --> 00:26:51,520
So it's very important to work with a dataset which is representative of the kind of data your model

379
00:26:51,520 --> 00:26:53,950
will be receiving once it has been deployed.
