1
00:00:00,180 --> 00:00:06,120
Hello, everyone, and welcome to this new and exciting session in which we're going to visualize the

2
00:00:06,120 --> 00:00:09,510
feature maps of a convolutional neural network.

3
00:00:09,780 --> 00:00:17,910
One very important part of building robust deep learning models involves understanding how these models

4
00:00:17,910 --> 00:00:22,340
work, that is, understanding what goes on in their different hidden layers.

5
00:00:22,350 --> 00:00:30,510
So in this section, we'll focus on taking a model that has already been pretrained and then generating

6
00:00:30,510 --> 00:00:31,500
its feature maps,

7
00:00:31,500 --> 00:00:35,860
so we get to see exactly what goes on under the hood.

8
00:00:35,880 --> 00:00:40,350
The pretrained model we'll be using here is VGG16.

9
00:00:40,350 --> 00:00:45,330
So we'll simply copy this, get back to our code, and paste it in.

10
00:00:46,080 --> 00:00:47,040
There we go.

11
00:00:47,070 --> 00:00:49,650
We have our VGG16.

12
00:00:49,650 --> 00:00:57,480
We're not going to keep the top, so we'll take all this off, keep our input shape, and take this input

13
00:00:57,480 --> 00:00:58,590
tensor argument off.

14
00:00:58,590 --> 00:01:02,060
We're not going to include the top, so we set include_top to False.

15
00:01:02,070 --> 00:01:03,000
That's it.

16
00:01:03,000 --> 00:01:08,820
Then we'll define this input shape using our image size here.

17
00:01:08,820 --> 00:01:14,610
So we take the image size from our configuration; that's fine.

18
00:01:14,610 --> 00:01:17,760
And we add this here.

19
00:01:18,000 --> 00:01:19,050
Okay, so that's it.

20
00:01:19,050 --> 00:01:26,010
We set this and give it the name vgg_backbone.

21
00:01:26,220 --> 00:01:29,280
So we have vgg_backbone right here.

22
00:01:30,690 --> 00:01:36,090
We can check out the summary with vgg_backbone.summary().

23
00:01:37,440 --> 00:01:38,370
Run that.

24
00:01:40,230 --> 00:01:41,700
There we go.

25
00:01:42,270 --> 00:01:47,220
You see, it has about 14.7 million parameters.
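
As a quick sanity check on that 14.7 million figure, the sketch below recounts the parameters of VGG16's convolutional base by hand. It assumes the standard VGG16 layout (3x3 kernels with biases, filter counts 64, 128, 256, 512, 512, and parameter-free 2x2 max pooling); the configuration list and the function name are illustrative, not taken from the session's code.

```python
# Standard VGG16 convolutional base (include_top=False).
# Numbers are filter counts of 3x3 conv layers; "M" marks a 2x2 max-pooling layer.
VGG16_CONV_BASE = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                   512, 512, 512, "M", 512, 512, 512, "M"]


def conv_base_params(config, in_channels=3, kernel=3):
    """Count weights plus biases of every conv layer; pooling adds nothing."""
    total = 0
    channels = in_channels
    for item in config:
        if item == "M":
            continue  # max pooling has no trainable parameters
        total += kernel * kernel * channels * item + item  # kernel weights + one bias per filter
        channels = item
    return total


total = conv_base_params(VGG16_CONV_BASE)
print(f"{total:,}")  # 14,714,688, i.e. about 14.7 million
```

The count does not depend on the 256 by 256 input size, because convolution weights are shared across all spatial positions.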

26
00:01:47,640 --> 00:01:53,580
We now move on to the next step, where we're going to create another model that will let us visualize

27
00:01:53,580 --> 00:01:54,740
these feature maps.

28
00:01:54,750 --> 00:01:56,840
Now, let's explain how this works.

29
00:01:56,850 --> 00:02:00,660
Recall that we have this VGG right here.

30
00:02:00,660 --> 00:02:02,520
So we have our VGG.

31
00:02:02,520 --> 00:02:07,140
What the VGG does is take in an input image.

32
00:02:07,140 --> 00:02:11,280
So we have an input image and then it produces a single output.

33
00:02:11,310 --> 00:02:21,210
Now, since we have not included the top, we have this output, which is eight by eight by 512.

34
00:02:21,210 --> 00:02:28,050
So here we have eight by eight by 512.

35
00:02:29,960 --> 00:02:34,310
That's when it takes in this 256 by 256 by 3 input.
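
To see where eight by eight by 512 comes from, here is a small pure-Python sketch that propagates a 256 by 256 by 3 input through the VGG16 layout. It assumes what Keras' VGG16 uses: 3x3 convolutions with "same" padding (height and width unchanged) and 2x2 max pooling with stride 2 (height and width halved). The names are illustrative.

```python
# VGG16 convolutional base: filter counts of 3x3 convs, "M" = 2x2 max pooling.
VGG16_CONV_BASE = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                   512, 512, 512, "M", 512, 512, 512, "M"]


def conv_base_shapes(height, width, channels, config):
    """Return the (height, width, channels) output of every hidden layer, in order."""
    shapes = []
    for item in config:
        if item == "M":
            height, width = height // 2, width // 2  # 2x2 pooling with stride 2
        else:
            channels = item  # "same"-padded conv keeps height and width
        shapes.append((height, width, channels))
    return shapes


shapes = conv_base_shapes(256, 256, 3, VGG16_CONV_BASE)
print(len(shapes))  # 18 hidden layers after the input
print(shapes[0])    # (256, 256, 64), the first conv layer
print(shapes[-1])   # (8, 8, 512), the single output without the top
```

Five poolings divide 256 by 2 to the power 5, which is 32, giving 8; the list also has one entry per hidden layer, which is what the multi-output model will expose.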

36
00:02:34,580 --> 00:02:39,350
Now, this model has only one single output.

37
00:02:40,930 --> 00:02:47,900
But we are interested in visualizing the hidden layers, that is, what goes on inside the VGG model.

38
00:02:47,920 --> 00:02:51,660
So what we'll do now is create a new model.

39
00:02:51,670 --> 00:02:58,810
We'll create a new model right here, which instead has many different outputs.

40
00:02:58,810 --> 00:03:03,850
And these different outputs will come from the different hidden layers.

41
00:03:03,850 --> 00:03:07,720
So we could take this one, and it now becomes an output.

42
00:03:07,720 --> 00:03:13,250
This one becomes an output, this one too, and so on and so forth.

43
00:03:13,270 --> 00:03:20,680
So basically, the outputs of the hidden layers, that is, our

44
00:03:20,680 --> 00:03:23,940
feature maps, will become our outputs.

45
00:03:23,950 --> 00:03:31,060
So now we have this model with this as its input and these as its outputs.

46
00:03:31,060 --> 00:03:35,010
So these will now be our different outputs instead of just a single output.

47
00:03:35,020 --> 00:03:37,360
Now we have these different outputs.

48
00:03:37,360 --> 00:03:40,970
There'll be 18 outputs in total: 13 convolutional layers and 5 max pooling layers.

49
00:03:40,990 --> 00:03:48,200
Now, you may also decide to pick specific outputs, so you may want to take only the conv layers, or

50
00:03:48,250 --> 00:03:49,720
only the outputs of the conv layers.

51
00:03:49,720 --> 00:03:57,040
So you'd omit the max pooling layers here, here, here, and here, along with this one.

52
00:03:57,040 --> 00:04:00,070
But it all depends on you, and we'll see how to do this.

53
00:04:00,070 --> 00:04:04,750
So let's get back to the code, where we're now going to build our feature maps.

54
00:04:04,750 --> 00:04:12,070
So we'll take the feature maps and put them in a list: we get layer.output

55
00:04:12,070 --> 00:04:18,310
for layer in vgg_backbone.layers.

56
00:04:18,310 --> 00:04:25,120
So we have vgg_backbone.layers right here, the layers of this VGG backbone model.

57
00:04:25,120 --> 00:04:28,570
We're going to get all its layers starting from this one.

58
00:04:28,570 --> 00:04:29,800
We're going to skip the input.

59
00:04:29,800 --> 00:04:35,140
So we'll simply start from index one here, and that will be it.

60
00:04:35,140 --> 00:04:45,100
So we take these layers, and from here we'll build this new model, which we'll call feature_map_model,

61
00:04:45,100 --> 00:04:54,130
using Keras Model. It takes as its inputs the VGG backbone's input.

62
00:04:54,340 --> 00:04:58,600
So we have vgg_backbone.input here.

63
00:04:58,600 --> 00:04:59,860
It's the VGG backbone's

64
00:04:59,860 --> 00:05:06,460
input, the same input as before, but now with the difference that the outputs are these feature maps here.

65
00:05:06,460 --> 00:05:15,220
So we don't have just this one output; all these other hidden layer outputs will now become our

66
00:05:15,220 --> 00:05:17,770
outputs or be part of our outputs.

67
00:05:17,770 --> 00:05:20,530
So here we have feature maps.

68
00:05:20,530 --> 00:05:21,280
So that's it.

69
00:05:21,280 --> 00:05:22,720
We've built this new model.
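
For reference, here is a minimal sketch of the backbone and the multi-output model described so far, assuming TensorFlow's bundled Keras (tf.keras.applications.VGG16 and tf.keras.Model). IM_SIZE is a hypothetical stand-in for the image size read from the configuration. It passes weights=None so it builds offline; the session itself uses the pretrained ImageNet weights, which you would get with weights="imagenet".

```python
import tensorflow as tf

IM_SIZE = 256  # hypothetical stand-in for the configured image size

vgg_backbone = tf.keras.applications.VGG16(
    include_top=False,                  # drop the fully connected classifier
    weights=None,                       # random init so this runs offline; use "imagenet" for pretrained
    input_shape=(IM_SIZE, IM_SIZE, 3),
)

# layers[0] is the input layer, so slicing from 1 keeps only the hidden layers.
feature_maps = [layer.output for layer in vgg_backbone.layers[1:]]

# Same input as the backbone, but one output per hidden layer instead of a single output.
feature_map_model = tf.keras.Model(inputs=vgg_backbone.input, outputs=feature_maps)

print(len(feature_map_model.outputs))  # 18
```

No new weights are created here: the new model reuses the backbone's layers and simply exposes their intermediate outputs.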

70
00:05:22,720 --> 00:05:31,630
We'll view the summary with feature_map_model.summary(), and there we go. So that's it.

71
00:05:32,320 --> 00:05:39,410
It looks similar to what we had, but if we had picked, say, from model one to just model four, or

72
00:05:39,460 --> 00:05:44,620
sorry, from layer one to layer four, then you'd see it's shorter, because that's all we'd need for

73
00:05:44,620 --> 00:05:45,580
our new model.

74
00:05:45,580 --> 00:05:51,340
But since we're going right up to the end, you see that we actually go through the whole VGG model,

75
00:05:51,340 --> 00:06:00,100
but with the difference that now we have many outputs and just this one

76
00:06:00,100 --> 00:06:06,670
single input, unlike before, where we had one input and one output.

77
00:06:06,850 --> 00:06:13,630
So from here, we have this new model, which we've just designed, starting from

78
00:06:13,630 --> 00:06:17,320
layer one. Run that again, and we have that.

79
00:06:17,320 --> 00:06:23,260
And now let's move on to passing an input to this model.

80
00:06:23,260 --> 00:06:31,210
So what we want to do now is take this input image and pass it into our model.

81
00:06:31,210 --> 00:06:39,010
And now, since our model outputs the different feature maps, that is, the different hidden layer outputs, we

82
00:06:39,010 --> 00:06:46,750
will be able to visualize what's going on inside our VGG model.

83
00:06:48,830 --> 00:06:54,650
Now, to get this output, we're going to use something similar to the testing we've already seen.

84
00:06:54,770 --> 00:06:56,010
Recall what we did.

85
00:06:56,030 --> 00:07:00,410
We carried out this testing here, where we read this image.

86
00:07:00,410 --> 00:07:05,500
We can simply copy this part, where we read the image and then pass it to our model to get the output.

87
00:07:05,510 --> 00:07:09,080
But now, in our case (let's reduce this),

88
00:07:09,080 --> 00:07:15,480
the model we'll be working with is our newly created feature_map_model.

89
00:07:15,500 --> 00:07:18,320
So let's put it here, and that's it.

90
00:07:18,320 --> 00:07:27,380
We have our test image, we resize it, we pass it into feature_map_model, and then you'll see that

91
00:07:28,040 --> 00:07:31,850
when we run this, we can check our feature maps.
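
Before the image reaches the model, it needs a leading batch axis, which is why each feature map's shape starts with a batch dimension. A minimal NumPy sketch, with a random array as a hypothetical stand-in for the resized test image (the session reads and resizes a real image in the earlier testing code):

```python
import numpy as np

# Hypothetical stand-in for the test image after reading and resizing to 256 x 256.
test_image = np.random.rand(256, 256, 3).astype("float32")

# Keras models expect (batch, height, width, channels), so add a batch of one.
batch = np.expand_dims(test_image, axis=0)
print(batch.shape)  # (1, 256, 256, 3)
```

With the model built in this session, feature_map_model.predict(batch) would then return a list with one array per hidden layer.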

92
00:07:31,850 --> 00:07:38,780
So we'll say for i in range(len(f_maps)),

93
00:07:39,050 --> 00:07:43,340
we want to print out f_maps.shape.

94
00:07:43,730 --> 00:07:49,430
And we get an error: 'list' object has no attribute 'shape'.

95
00:07:49,700 --> 00:07:55,700
Okay, it has to be f_maps[i], so let's index with i. We run that, and you see this.
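
The error comes from asking the Python list returned by the model for its shape; each element of that list is an array with its own shape. A small NumPy sketch, with zero arrays as hypothetical stand-ins for the first three VGG16 outputs of one 256 by 256 image:

```python
import numpy as np

# Hypothetical stand-ins for the model output: a plain Python list of arrays.
f_maps = [np.zeros((1, 256, 256, 64)), np.zeros((1, 256, 256, 64)), np.zeros((1, 128, 128, 64))]

try:
    f_maps.shape  # the list itself has no shape attribute
except AttributeError as err:
    message = str(err)
    print(message)  # 'list' object has no attribute 'shape'

shapes = []
for i in range(len(f_maps)):
    shapes.append(f_maps[i].shape)  # index into the list first, then read the array's shape
    print(f_maps[i].shape)
```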

96
00:07:55,700 --> 00:08:01,460
So you see that the outputs start from here, from this one, instead of from the input. It actually

97
00:08:01,460 --> 00:08:09,290
starts from here, since we decided to start from this layer, because

98
00:08:09,290 --> 00:08:12,560
we don't want to include the input as part of our output.

99
00:08:12,560 --> 00:08:14,300
So that's logical.

100
00:08:14,300 --> 00:08:15,170
We have that.

101
00:08:15,170 --> 00:08:19,400
We start from this one, right up to this very last one here.

102
00:08:19,670 --> 00:08:23,420
Now, here we've picked both the conv layers and the max pooling layers.

103
00:08:23,600 --> 00:08:31,820
So we have all of these now, and we can visualize these different feature maps right here.

104
00:08:32,750 --> 00:08:35,720
Now let's check this length; let's print it out.

105
00:08:35,720 --> 00:08:36,650
Let's check this length.

106
00:08:36,660 --> 00:08:38,990
You see, we have 18 different outputs.

107
00:08:39,110 --> 00:08:42,080
We get back here and we modify this.

108
00:08:42,080 --> 00:08:48,710
So we'll say that we'll only do this if is_conv(layer.name)

109
00:08:48,710 --> 00:08:51,050
is True.

110
00:08:51,050 --> 00:08:58,740
So if this is True, then we're going to add this layer's output to the outputs.

111
00:08:58,760 --> 00:09:01,280
Now we could also do this.

112
00:09:01,280 --> 00:09:03,500
We could simply write it like this, and that's fine.

113
00:09:03,500 --> 00:09:03,890
Now.

114
00:09:03,890 --> 00:09:09,360
So what this means is that we're only going to take the conv layers as part of our output.

115
00:09:09,380 --> 00:09:14,330
Now we define is_conv, which takes in a layer name.

116
00:09:14,330 --> 00:09:20,150
So basically, the layer name is what we have here. You see why it's important to always give your

117
00:09:20,150 --> 00:09:27,350
layers names: here the name is used to differentiate between the different

118
00:09:28,010 --> 00:09:30,290
types of layers.

119
00:09:30,290 --> 00:09:38,180
So here we have this layer name, and we check:

120
00:09:38,180 --> 00:09:52,790
if "conv" is in the layer name, we return True; else we return False.

121
00:09:52,790 --> 00:09:54,170
So we run that.

122
00:09:54,800 --> 00:09:56,750
There we go. We run this again.

123
00:09:56,750 --> 00:10:03,620
It looks similar to what we had before, but one thing you will notice now is that this length is reduced.

124
00:10:03,620 --> 00:10:05,420
So we've taken off five.

125
00:10:05,750 --> 00:10:12,830
We've gone from 18 to 13, so we've taken off five layers, which correspond to the max pooling layers, since

126
00:10:12,830 --> 00:10:17,980
"conv" isn't in their names right here.
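
The name-based filter can be checked without loading the model. This sketch rebuilds the names Keras gives VGG16's hidden layers (block1_conv1 through block5_pool) and applies an is_conv test like the one described; the name list is reconstructed for illustration, and in the real code you would filter vgg_backbone.layers[1:] by layer.name.

```python
# Hidden-layer names Keras assigns in VGG16 (include_top=False), input layer excluded.
CONVS_PER_BLOCK = [2, 2, 3, 3, 3]
layer_names = []
for block, n_convs in enumerate(CONVS_PER_BLOCK, start=1):
    layer_names += [f"block{block}_conv{c}" for c in range(1, n_convs + 1)]
    layer_names.append(f"block{block}_pool")


def is_conv(layer_name):
    """True for convolution layers, judged only by the layer's name."""
    return "conv" in layer_name


conv_names = [name for name in layer_names if is_conv(name)]
pool_names = [name for name in layer_names if "pool" in name]
print(len(layer_names), len(conv_names), len(pool_names))  # 18 13 5
```

In the model-building line this becomes [layer.output for layer in vgg_backbone.layers[1:] if is_conv(layer.name)].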

127
00:10:17,990 --> 00:10:18,950
So that's it.

128
00:10:18,950 --> 00:10:19,850
We have that.

129
00:10:19,850 --> 00:10:23,210
We could also decide to say, okay, we want to take only the max pools.

130
00:10:23,210 --> 00:10:29,380
So we'll say if "pool" is in the name. We run that and check this length.

131
00:10:29,390 --> 00:10:30,740
You see, now there are only five.

132
00:10:30,770 --> 00:10:31,670
There we go.

133
00:10:31,790 --> 00:10:32,810
So that's it.

134
00:10:33,170 --> 00:10:37,370
Let's get back to conv, and we have that.

135
00:10:37,670 --> 00:10:44,570
Okay, so now we've run this and we have the different shapes. Now, to carry out the final visualization,

136
00:10:44,570 --> 00:10:47,060
you see we have these f_maps here.

137
00:10:47,060 --> 00:10:50,750
So we're going to go through each and every feature map.

138
00:10:50,750 --> 00:11:02,000
So for i in range of the length of our feature maps, we create this figure and then specify the

139
00:11:02,000 --> 00:11:03,080
figsize.

140
00:11:03,080 --> 00:11:09,620
So we have figsize equal to 256 by 256.

141
00:11:09,620 --> 00:11:15,200
Now that we have that, we call this method right here.

142
00:11:15,350 --> 00:11:20,330
And then from here, since we're going through each and every feature map, it's important for us to get the

143
00:11:20,330 --> 00:11:21,470
feature size.

144
00:11:21,470 --> 00:11:25,310
So we want to get these values for each feature map.

145
00:11:25,430 --> 00:11:33,770
Now for this, we simply have f_maps, and then i here.

146
00:11:33,770 --> 00:11:41,330
So we pick a feature map, and then we get shape[1].

147
00:11:41,330 --> 00:11:43,400
So this will permit us to get this value.

148
00:11:43,400 --> 00:11:47,270
So this is shape zero, shape one, shape two, shape

149
00:11:47,500 --> 00:11:48,040
three.

150
00:11:48,040 --> 00:11:52,690
So we get f_size, the feature map size.

151
00:11:52,750 --> 00:11:55,570
Then we get the number of channels.

152
00:11:55,570 --> 00:12:00,910
So we have n_channels equal to f_maps

153
00:12:01,180 --> 00:12:06,790
[i].shape[3]: index three, because the indices go 0, 1, 2, 3.

154
00:12:06,790 --> 00:12:08,690
So that's how we get the number of channels.
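
Since each feature map has the layout (batch, height, width, channels), both values come straight from the shape tuple. A NumPy sketch with a hypothetical first-block map:

```python
import numpy as np

# Hypothetical first-block output for one image: (batch, height, width, channels).
fmap = np.zeros((1, 256, 256, 64))

f_size = fmap.shape[1]      # height; equal to the width for square inputs
n_channels = fmap.shape[3]  # index 3 because shape indices run 0, 1, 2, 3
print(f_size, n_channels)   # 256 64
```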

155
00:12:08,710 --> 00:12:10,990
Now we have this already set.

156
00:12:12,040 --> 00:12:19,030
We want to be able to visualize this such that all these channels are aligned

157
00:12:19,030 --> 00:12:19,960
on a single line.

158
00:12:19,960 --> 00:12:26,860
So, because we have these 512 maps of 16 by 16 here, let's check an earlier one.

159
00:12:26,860 --> 00:12:32,980
Here, we have 64 images of 256 by 256.

160
00:12:32,980 --> 00:12:35,590
So let's suppose that this is one of them.

161
00:12:35,590 --> 00:12:35,890
Yeah.

162
00:12:35,890 --> 00:12:42,670
We have this 256 by 256 map right here.

163
00:12:42,910 --> 00:12:49,630
And then we have 64 of these for the 64 different channels right here.

164
00:12:49,900 --> 00:12:53,440
Now what we want to do is take this one.

165
00:12:53,440 --> 00:12:58,930
Let's take this one, put it here, take this other one and align it.

166
00:13:00,920 --> 00:13:09,830
Take the next one and align it, so we can visualize them all in one line, up to the very last one right here.

167
00:13:10,250 --> 00:13:20,420
So to do this, we'll create another array, which we'll call joint_maps:

168
00:13:20,420 --> 00:13:23,580
joint_maps equals np.ones.

169
00:13:23,600 --> 00:13:28,430
We initialize it that way, and for the size, we take the feature size.

170
00:13:28,430 --> 00:13:35,810
So we have f_size, and then this is for the width... this is for the height, actually.

171
00:13:35,810 --> 00:13:41,420
So we know that in the case of 256 by 256, the height is 256.

172
00:13:41,420 --> 00:13:49,610
So this height is 256, this distance is 256, but now the width is going to change.

173
00:13:49,610 --> 00:13:55,360
So the width is no longer 256 but 256 times, in this case, 64.

174
00:13:55,370 --> 00:14:04,430
So here we have 256, and here we have f_size times the number of channels.

175
00:14:04,460 --> 00:14:05,690
So that's how we do that.

176
00:14:05,690 --> 00:14:10,190
So with that, we now have this joint_maps array, which is initialized to ones.

177
00:14:10,190 --> 00:14:13,610
So we have this array now set.
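
The size of this array grows with the number of channels, which matters later when plotting. A sketch of the allocation for the first VGG16 block on a 256 by 256 input, assuming square feature maps:

```python
import numpy as np

# First VGG16 block on a 256 x 256 input: 256 x 256 maps, 64 channels.
f_size, n_channels = 256, 64

# Height stays f_size; the width holds all channels side by side.
joint_maps = np.ones((f_size, f_size * n_channels))  # np.ones defaults to float64

print(joint_maps.shape)   # (256, 16384)
print(joint_maps.nbytes)  # 33554432 bytes, i.e. 32 MiB for this one layer
```

Deeper layers are much smaller: a 16 by 16 map with 512 channels gives an array of only 16 by 8192.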

178
00:14:13,640 --> 00:14:20,090
Now the next step will be to fill in these values, these outputs, these features, in this array.

179
00:14:21,020 --> 00:14:25,710
Now that we understand how this joint_maps array was created,

180
00:14:25,730 --> 00:14:33,840
we go ahead and fill in this information, all these different features, in this joint_maps array.

181
00:14:33,860 --> 00:14:37,730
So here, we'll go through the different channels.

182
00:14:37,730 --> 00:14:44,330
So for j in range(n_channels).

183
00:14:44,330 --> 00:14:50,690
So basically, we have n_channels here, and then we fill in joint_maps.

184
00:14:51,020 --> 00:14:58,220
Now, the way we'll do this is to keep the height fixed, and then in the

185
00:14:58,220 --> 00:15:07,610
width dimension, we'll fill in this information such that as we go from one channel to another, we are

186
00:15:07,610 --> 00:15:10,930
going to skip 256 steps.

187
00:15:10,940 --> 00:15:21,740
So here we will have f_size, our feature map size, which is 256: f_size times j, and then

188
00:15:21,740 --> 00:15:28,300
we go to f_size times (j plus one).

189
00:15:28,310 --> 00:15:32,450
So what this means is that we fix the height.

190
00:15:32,450 --> 00:15:35,570
As we've said already, we will fix this height here.

191
00:15:35,570 --> 00:15:37,550
The height is fixed at 256.

192
00:15:37,550 --> 00:15:38,660
That's it here.

193
00:15:38,660 --> 00:15:43,790
So we take all elements along the height, and then we handle the width.

194
00:15:45,960 --> 00:15:52,980
When j equals zero, for example, we have zero up to zero plus one.

195
00:15:52,980 --> 00:15:57,360
So here we have zero, and zero plus one.

196
00:15:57,840 --> 00:16:04,190
Zero times f_size is zero, and one times f_size is 256.

197
00:16:04,200 --> 00:16:10,920
So this means that in the height we have 256, and in the width

198
00:16:10,920 --> 00:16:12,480
we have 256.

199
00:16:12,780 --> 00:16:23,670
Now, when j goes to one, one times f_size is 256, so we start at 256, and then one plus

200
00:16:23,670 --> 00:16:26,490
one is two, two times 256 is 512.

201
00:16:26,490 --> 00:16:29,070
So now we've skipped 256 steps at this point.

202
00:16:29,070 --> 00:16:33,540
We're at this point now, we get this one, and then we repeat the same process again.

203
00:16:33,540 --> 00:16:46,240
When j equals two, we have 512, and then we move to... this should be 768, not 256, here:

204
00:16:46,260 --> 00:16:51,660
768. So now we go from 512 to 768.
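
Written out, the fill step for channel j of a square map with side F (here F = 256) and C channels writes to the column block that starts at jF and ends just before (j+1)F; the leading 0 selects the single image in the batch, an indexing choice assumed here for clarity:

```latex
\text{joint\_maps}\big[\,:,\; jF : (j+1)F\,\big] \leftarrow \text{f\_maps}[i]\big[0, :, :, j\big],
\qquad j = 0, 1, \dots, C - 1
```

So j = 0, 1, 2 gives columns 0 to 256, 256 to 512 and 512 to 768, matching the walkthrough.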

205
00:16:51,660 --> 00:16:55,920
And so that's how we're going to fill up these different positions right here.

206
00:16:56,130 --> 00:17:02,880
Now, once we have this set, we go ahead and pass in the data we have to fill in here.

207
00:17:02,880 --> 00:17:10,230
So we have f_maps, and then we take in i. If we consider the case where j equals zero, we've

208
00:17:10,230 --> 00:17:13,940
picked this zone from 0 to 256, and we've collected the full height.

209
00:17:13,950 --> 00:17:21,840
Then we have this patch right here, and to fill up this patch, we have our feature maps, which we've seen

210
00:17:21,840 --> 00:17:22,500
already.

211
00:17:22,500 --> 00:17:26,820
But then we are going to take a particular feature map.

212
00:17:26,970 --> 00:17:32,970
This i picks that particular feature map. Then, once we've picked the particular feature map,

213
00:17:32,970 --> 00:17:40,290
we can now go ahead and set this here to the values of the feature maps while selecting the particular

214
00:17:40,290 --> 00:17:41,100
channel.

215
00:17:41,250 --> 00:17:47,280
So when j equals zero, for example, we take channel zero: we take all the values

216
00:17:47,280 --> 00:17:52,320
along the preceding axes and then pick out this channel zero.

217
00:17:52,320 --> 00:17:53,610
So here we index with j.

218
00:17:53,610 --> 00:17:54,390
So that's it.
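
Putting the allocation and the loop together, here is a NumPy sketch of the tiling for a single feature map. The function name tile_channels is hypothetical; it assumes square maps and a batch of one image, and the check uses a small random map so it runs quickly.

```python
import numpy as np


def tile_channels(fmap):
    """Place the channels of a (1, H, W, C) feature map side by side in one 2-D array.

    Assumes square maps (H == W) and a batch of one image.
    """
    f_size = fmap.shape[1]
    n_channels = fmap.shape[3]
    joint_maps = np.ones((f_size, f_size * n_channels))
    for j in range(n_channels):
        # Full height; channel j fills columns f_size*j up to (but excluding) f_size*(j+1).
        joint_maps[:, f_size * j : f_size * (j + 1)] = fmap[0, :, :, j]
    return joint_maps


fmap = np.random.rand(1, 16, 16, 4)  # small hypothetical map so the check runs quickly
joint_maps = tile_channels(fmap)
print(joint_maps.shape)  # (16, 64)
```

The same layout can also be produced without the loop, with fmap[0].transpose(0, 2, 1).reshape(f_size, -1), which moves the channel axis before the width axis and then flattens them together.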

219
00:17:54,390 --> 00:18:02,130
We have our joint_maps, which has now been filled, and we can take this off. Now we

220
00:18:02,130 --> 00:18:03,540
are ready to plot our image.

221
00:18:03,540 --> 00:18:06,650
So we call imshow and pass in joint_maps.

222
00:18:06,660 --> 00:18:13,110
Now, if we pass all of this, it's going to be very RAM consuming.

223
00:18:13,110 --> 00:18:18,150
So we select the full height and then pick a range of columns.

224
00:18:18,150 --> 00:18:22,200
So we'll go from zero to, for example, 512.

225
00:18:22,200 --> 00:18:23,580
So we'll have that.

226
00:18:23,580 --> 00:18:28,950
And now we could run this, but before running it, we need to set up the different axes.

227
00:18:28,950 --> 00:18:37,410
Here we have axis = plt.subplot, with the length of the feature maps first.

228
00:18:37,410 --> 00:18:44,640
So basically, if we have 13 feature maps, we're going to have 13 different subplots.

229
00:18:44,790 --> 00:18:45,930
Then one here,

230
00:18:45,930 --> 00:18:49,410
and then here we have i plus one.

231
00:18:49,410 --> 00:18:50,190
So that is it.

232
00:18:50,190 --> 00:18:54,810
So once we have this set, we can now run this and see what we get.
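
A minimal matplotlib sketch of this plotting step, with small random arrays as hypothetical stand-ins for the tiled joint_maps of three layers. Unlike the session, it creates one figure outside the loop and uses a modest figsize, since a 256 by 256 inch canvas is very memory hungry; the Agg backend line only makes it run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical tiled maps for three layers: (f_size, f_size * n_channels).
all_joint_maps = [np.random.rand(32, 32 * 8), np.random.rand(16, 16 * 16), np.random.rand(8, 8 * 32)]

fig = plt.figure(figsize=(12, 6))
for i, joint_maps in enumerate(all_joint_maps):
    # subplot(n_rows, n_cols, index): one row per layer, index starts at 1.
    plt.subplot(len(all_joint_maps), 1, i + 1)
    plt.imshow(joint_maps[:, :64])  # full height, only the first columns, to limit memory
    plt.axis("off")
```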

233
00:18:55,100 --> 00:18:58,410
It takes a while to run, and now it's complete.

234
00:18:58,410 --> 00:19:00,420
We could visualize the results here.

235
00:19:01,170 --> 00:19:04,530
Let's simply scroll down and you see this result.

236
00:19:04,530 --> 00:19:10,230
You see that for the initial layers, we have these low-level features that have been extracted.

237
00:19:10,230 --> 00:19:13,020
We can see this clearly here.

238
00:19:13,020 --> 00:19:21,480
And as we go further or deeper into the network, the features we're extracting start

239
00:19:21,480 --> 00:19:23,610
to become higher-level features.

240
00:19:23,610 --> 00:19:36,480
So you see here, this one focuses... see here, we extract this region here, unlike before,

241
00:19:36,480 --> 00:19:40,380
where we were more focused on edges.

242
00:19:41,940 --> 00:19:42,780
So that's it.

243
00:19:42,780 --> 00:19:51,060
We keep going deeper and see the outputs we get, and that's it.

244
00:19:51,060 --> 00:19:54,390
We have just visualized a trained model's feature maps.

245
00:19:54,930 --> 00:20:02,400
Now, another thing we could do is set the weights at the beginning here to None.

246
00:20:02,400 --> 00:20:11,520
So we don't want the pretrained weights. We run this, run this again, and check out

247
00:20:11,520 --> 00:20:14,940
what the model is going to produce.

248
00:20:17,240 --> 00:20:19,070
Here's what we get.

249
00:20:19,070 --> 00:20:19,820
Let's scroll.

250
00:20:19,820 --> 00:20:21,280
So you get to see this.

251
00:20:21,290 --> 00:20:22,670
You see the inputs.

252
00:20:23,990 --> 00:20:26,480
And as we go deeper.

253
00:20:27,020 --> 00:20:34,460
We notice that not much meaningful information is being extracted from the inputs.
