1
00:00:06,520 --> 00:00:07,840
Hi and welcome back.

2
00:00:08,560 --> 00:00:12,220
We're about to take a look at filter and class maximization.

3
00:00:12,730 --> 00:00:16,030
So open notebook 10 here and we'll get started.

4
00:00:16,960 --> 00:00:22,510
So to recap quickly, what we're going to take a look at is filter maximization and class maximization.

5
00:00:23,080 --> 00:00:29,680
So just to remind you of what filter maximization is: basically, we're trying to find the input

6
00:00:29,680 --> 00:00:33,190
that maximizes the output of that particular filter.

7
00:00:33,460 --> 00:00:38,410
This is a way we can inspect our filters and see what they actually look for, as opposed to actually

8
00:00:38,410 --> 00:00:42,180
visualizing them and seeing how they look, which doesn't tell us that much.

9
00:00:42,190 --> 00:00:43,630
That's what we did in the previous lesson.

10
00:00:44,680 --> 00:00:50,530
So to do this, we have to build a loss function that maximizes the value of a given

11
00:00:50,530 --> 00:00:54,550
filter, so we can specify which filter we want in any convolutional layer.

12
00:00:55,270 --> 00:01:01,270
And next, we will use stochastic gradient descent (technically ascent) to adjust the values of the

13
00:01:01,270 --> 00:01:05,170
input image so that it maximizes the filter's activation value.
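
A minimal sketch of the idea just described, not the notebook's code: it uses a hypothetical 1-D filter with a ReLU in plain Python instead of a real CNN layer, and the names `filter_activation`, `activation_gradient` and `maximize_input` are made up for illustration. The filter weights stay fixed; only the input is updated by stepping along the gradient.

```python
import random


def filter_activation(x, w):
    """Mean ReLU response of a 1-D filter w slid across the input x."""
    k = len(w)
    n_out = len(x) - k + 1
    responses = [sum(w[j] * x[i + j] for j in range(k)) for i in range(n_out)]
    return sum(max(0.0, r) for r in responses) / n_out


def activation_gradient(x, w):
    """Gradient of filter_activation with respect to every input value."""
    k = len(w)
    n_out = len(x) - k + 1
    grad = [0.0] * len(x)
    for i in range(n_out):
        response = sum(w[j] * x[i + j] for j in range(k))
        if response > 0.0:  # ReLU passes gradient only where it is active
            for j in range(k):
                grad[i + j] += w[j] / n_out
    return grad


def maximize_input(w, length=16, steps=100, lr=0.1, seed=0):
    """Gradient ascent on the input: the filter weights never change."""
    rng = random.Random(seed)
    x = [rng.uniform(-0.1, 0.1) for _ in range(length)]  # small random seed input
    start = filter_activation(x, w)
    for _ in range(steps):
        grad = activation_gradient(x, w)
        # Step up the gradient, then clip so the values stay in [-1, 1].
        x = [max(-1.0, min(1.0, xi + lr * gi)) for xi, gi in zip(x, grad)]
    return x, start, filter_activation(x, w)


edge_filter = [-1.0, 0.0, 1.0]  # hypothetical filter that responds to rising values
best_input, before, after = maximize_input(edge_filter)
print(f"activation before: {before:.4f}, after: {after:.4f}")
```

In the real notebook the library handles the gradients through the whole network; the loop above only shows why the input, rather than the weights, is what gets optimized.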

14
00:01:05,890 --> 00:01:10,320
So as I said, we're going to use a different library for this.

15
00:01:10,330 --> 00:01:11,390
We're using Keras.

16
00:01:11,420 --> 00:01:15,850
However, we're using a library called tf-keras-vis.

17
00:01:15,850 --> 00:01:17,680
I call it Keras-vis for short.

18
00:01:18,340 --> 00:01:20,570
And you can open this link if you want to check it out.

19
00:01:20,590 --> 00:01:25,240
It's a very good library that has been used in the research community quite a bit.

20
00:01:25,630 --> 00:01:31,390
It allows you to perform a number of types of activation maps, which we'll take a look at shortly.

21
00:01:31,390 --> 00:01:35,200
We'll take a look at Grad-CAM and some of these here, Score-CAM as well.

22
00:01:35,860 --> 00:01:41,800
And also, it allows you to do all these types of visualizations, so it's quite cool and quite useful.

23
00:01:42,310 --> 00:01:45,010
So we're going to take a very short tour of it here.

24
00:01:45,370 --> 00:01:49,210
So let's install Keras-vis in our Colab notebook.

25
00:01:49,210 --> 00:01:53,710
So we just do pip install. Notice the exclamation mark here.

26
00:01:53,740 --> 00:01:56,650
It's not needed if pip is the only thing in the cell.

27
00:01:56,660 --> 00:02:03,250
But if you have other code, like Python code, you do need to put the exclamation mark so that the Colab interpreter

28
00:02:03,250 --> 00:02:05,800
knows you're trying to run bash here.

29
00:02:06,010 --> 00:02:11,500
And this is what we do, and the Keras-vis package has been installed.

30
00:02:11,620 --> 00:02:14,860
So now let's load those packages.

31
00:02:16,360 --> 00:02:20,020
We're also loading TensorFlow as tf, since we're not using just Keras.

32
00:02:20,020 --> 00:02:21,970
We're using TensorFlow a bit in the back end.

33
00:02:22,540 --> 00:02:25,780
However, it's a bit interchangeable, as you have seen before.

34
00:02:26,860 --> 00:02:28,060
So what are we going to do?

35
00:02:28,690 --> 00:02:33,400
We haven't done this before, but we're going to load a pre-trained network.

36
00:02:34,180 --> 00:02:40,210
What that means is: remember when we loaded our previous models, where we trained a model and we

37
00:02:40,210 --> 00:02:41,950
loaded it into another notebook?

38
00:02:42,550 --> 00:02:44,560
Well, that's basically what we're doing here.

39
00:02:44,740 --> 00:02:45,790
That's a pre-trained model.

40
00:02:46,300 --> 00:02:51,550
Pre-trained means it's been previously trained. TensorFlow, or more precisely TensorFlow

41
00:02:51,550 --> 00:02:58,630
Keras comes with some pre-trained models, some advanced CNNs that have been trained on the best computer

42
00:02:58,630 --> 00:03:03,910
vision dataset that exists, ImageNet, which has well over a million images.

43
00:03:04,870 --> 00:03:06,190
So it's pretty good.

44
00:03:06,280 --> 00:03:10,450
It's pretty cool that we can actually load these networks and test them, which we'll take a look at

45
00:03:10,690 --> 00:03:11,580
shortly as well.

46
00:03:11,590 --> 00:03:16,450
We have a lesson later on where we actually do it, just right here.

47
00:03:16,450 --> 00:03:16,750
Sorry.

48
00:03:17,260 --> 00:03:23,590
Where we do look at some pre-trained classic CNNs like VGG, ResNet, Inception and a few

49
00:03:23,740 --> 00:03:24,520
others as well.

50
00:03:25,540 --> 00:03:26,890
So don't worry about it yet.

51
00:03:26,890 --> 00:03:30,730
So what we're doing here, we're just loading this model.

52
00:03:31,090 --> 00:03:35,770
It already knows the architecture because we're specifying the architecture and the model weights that

53
00:03:35,770 --> 00:03:36,130
we want.

54
00:03:36,130 --> 00:03:41,260
We want the ImageNet weights, and include_top means that we're including the full model.

55
00:03:41,740 --> 00:03:48,550
If we set include_top to False, it just loads everything except the

56
00:03:48,860 --> 00:03:49,350
top layers, actually.

57
00:03:49,900 --> 00:03:52,630
And that allows us to use this model for other applications.

58
00:03:52,630 --> 00:03:54,000
But we'll get into that later on.

59
00:03:54,160 --> 00:03:55,840
I don't want to confuse you guys yet.

60
00:03:56,740 --> 00:03:59,140
So let's load our pre-trained model.

61
00:03:59,590 --> 00:04:02,320
It has to download the weights here and there we go.

62
00:04:02,320 --> 00:04:05,430
It's quite fast and we've got this very big model.

63
00:04:05,440 --> 00:04:08,530
You can see it has 138 million parameters.

64
00:04:09,040 --> 00:04:09,850
That is quite big.

65
00:04:10,000 --> 00:04:16,810
Actually, it's one of the biggest models; many others have fewer parameters, mainly because this

66
00:04:16,810 --> 00:04:18,700
isn't a very well optimized model.

67
00:04:19,360 --> 00:04:25,630
VGG has been known to be a bulky model, but it's a simple model and it does most tasks quite

68
00:04:25,630 --> 00:04:25,890
well.

69
00:04:25,900 --> 00:04:29,380
So it's like a good old, reliable model that you can use.

70
00:04:29,980 --> 00:04:34,000
So we've loaded it here, and you can see there's a number of convolutional layers here.

71
00:04:34,450 --> 00:04:40,030
The 16 in VGG16 stands for 16 layers, even though the max pooling layers might confuse you.

72
00:04:40,030 --> 00:04:45,210
But if you add them up, it's 16 layers with weights.
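
As a quick check on the layer count and model size, here is a small sketch that tallies the weight layers and parameters from the standard VGG16 layout (3x3 convolutions, a 224x224x3 input, five 2x2 max-pool stages, 1000 output classes). The function name `vgg16_summary` is made up for this example; it is plain arithmetic, not a call into Keras.

```python
def vgg16_summary():
    """Count weight layers and parameters in the standard VGG16 layout."""
    # (number of 3x3 conv layers, output channels) for each of the five blocks
    conv_blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
    dense_units = [4096, 4096, 1000]  # fully connected "top" of the network

    params = 0
    weight_layers = 0
    channels = 3  # RGB input
    for n_convs, filters in conv_blocks:
        for _ in range(n_convs):
            params += 3 * 3 * channels * filters + filters  # kernel + bias
            channels = filters
            weight_layers += 1

    # Five max-pool layers halve 224 five times: 224 / 2**5 = 7
    features = 7 * 7 * channels
    for units in dense_units:
        params += features * units + units  # weights + bias
        features = units
        weight_layers += 1

    return weight_layers, params


layers, params = vgg16_summary()
print(layers, params)  # 16 138357544
```

The max-pooling layers carry no weights, which is why they are not part of the 16, and most of the parameters sit in the three fully connected layers.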

73
00:04:46,180 --> 00:04:49,400
So what we're going to do, we're going to take a look.

74
00:04:49,420 --> 00:04:55,270
We're going to extract this layer here, because we want to visualize filters in that last layer,

75
00:04:55,390 --> 00:04:59,440
because those would be the most interesting filters to visualize. Remember, the final ones learn

76
00:04:59,440 --> 00:05:04,780
the most advanced patterns and structure.

77
00:05:05,350 --> 00:05:05,650
So.

78
00:05:05,720 --> 00:05:09,280
If you visualize the input that maximizes these filters,

79
00:05:09,820 --> 00:05:14,500
you'll see some cool patterns; you'll see what the CNN has been learning to recognize.

80
00:05:15,760 --> 00:05:21,880
So to do that, we just specify the layer name here and we create a function called model_modifier.

81
00:05:22,630 --> 00:05:24,250
This takes the current model.

82
00:05:24,250 --> 00:05:25,540
That's the model we loaded here.

83
00:05:26,140 --> 00:05:30,310
And what we do is specify the layer name that we want to extract.

84
00:05:30,790 --> 00:05:37,390
So we take layer_name here, give it as the input to this function called get_layer, and we get our target

85
00:05:37,390 --> 00:05:38,080
layer here.

86
00:05:38,590 --> 00:05:40,660
Next, we create a new model.

87
00:05:41,110 --> 00:05:46,600
For this new model, we use the keras.Model function, where we specify the inputs to be the current

88
00:05:46,600 --> 00:05:52,090
model's inputs, that's the model we loaded here, and the outputs to be the target layer's, that's the

89
00:05:52,090 --> 00:05:54,100
output of that layer right there.

90
00:05:55,300 --> 00:05:57,430
So it's a bit confusing so far.

91
00:05:57,560 --> 00:06:00,010
I will admit, a lot of

92
00:06:00,010 --> 00:06:05,110
researchers would get confused sometimes when looking at this code, because it's not

93
00:06:05,110 --> 00:06:10,190
straightforward and it's not something that we commonly do as computer vision practitioners.

94
00:06:10,210 --> 00:06:15,070
This is more of a research topic, but it's quite a good one because it helps you.

95
00:06:15,070 --> 00:06:17,710
It reinforces how CNNs actually learn.

96
00:06:18,250 --> 00:06:23,590
So I think it's worthwhile to understand and learn it if you want to expand and extend your knowledge.

97
00:06:24,430 --> 00:06:26,620
So lastly, what do we do here?

98
00:06:29,420 --> 00:06:34,520
We specify that we want the new model, that's the model we created up here,

99
00:06:35,360 --> 00:06:40,640
to have its last layer use a linear activation, from Keras activations.

100
00:06:40,820 --> 00:06:43,950
So like I said, this isn't as straightforward, but this is just how we do it.

101
00:06:43,970 --> 00:06:47,060
So let's run this and create that function.

102
00:06:47,120 --> 00:06:51,140
Next, we're going to create the activation maximization.

103
00:06:51,650 --> 00:06:54,050
So to do that, Keras-vis

104
00:06:54,470 --> 00:07:02,630
has a very cool class called ActivationMaximization that simply takes the

105
00:07:02,630 --> 00:07:03,080
model

106
00:07:03,080 --> 00:07:08,300
and the model_modifier function, and we specify that we're not going to clone the model, otherwise it would

107
00:07:08,300 --> 00:07:09,590
occupy extra space in memory.

108
00:07:10,130 --> 00:07:14,690
So we do that, and there we go, we have that.

109
00:07:14,840 --> 00:07:19,070
Next, we're going to specify that we want to look at the seventh filter of the output.

110
00:07:19,430 --> 00:07:24,530
So we have to create a loss function that basically just returns the output for that filter.

111
00:07:24,530 --> 00:07:26,810
Now, it's not a real loss function.

112
00:07:26,810 --> 00:07:32,850
The gradient ascent, which is done in the ActivationMaximization object here, does

113
00:07:32,930 --> 00:07:38,180
the rest of it, and it just uses the outputs from this fake loss function.
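
To make the shape of this "fake" loss concrete, here is a rough sketch in plain Python rather than TensorFlow. It assumes the layer output is a nested list indexed as [row][column][filter], and the helper name `make_filter_score` is hypothetical; in the notebook the library decides how the returned values are reduced and differentiated.

```python
def make_filter_score(filter_index):
    """Build a score function that looks only at one filter's activations."""
    def score(feature_map):
        # feature_map[row][col][channel]: keep the chosen channel everywhere
        values = [pixel[filter_index] for row in feature_map for pixel in row]
        return sum(values) / len(values)  # average activation of that filter
    return score


# Toy 2x2 feature map with 3 filters per position (made-up numbers)
feature_map = [
    [[0.0, 1.0, 5.0], [0.0, 3.0, 5.0]],
    [[0.0, 2.0, 5.0], [0.0, 2.0, 5.0]],
]
score_filter_1 = make_filter_score(1)
print(score_filter_1(feature_map))  # 2.0
```

Nothing here compares a prediction with a label, which is why it is not a loss in the usual sense: it only reports how strongly one filter fired.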

114
00:07:38,960 --> 00:07:40,630
Now we're ready to visualize.

115
00:07:40,640 --> 00:07:43,340
So I won't scroll down and spoil the visualization

116
00:07:43,640 --> 00:07:50,550
surprise for you, even though you might see it in your notebook, and if you already

117
00:07:50,550 --> 00:07:51,770
have seen it, it's spoiled.

118
00:07:52,210 --> 00:07:55,790
It could spoil the surprise, but either way, I want to go through the code for you.

119
00:07:55,820 --> 00:07:57,110
So we're just using some,

120
00:07:57,680 --> 00:08:04,610
we're importing some callbacks here. This callback, Print, well, a callback

121
00:08:04,610 --> 00:08:09,800
is basically a function that operates during your training process.

122
00:08:09,800 --> 00:08:18,140
And we can do things like extract different metrics, do early stopping, and many other things.

123
00:08:18,140 --> 00:08:21,560
We will talk about callbacks later on, but it's in this chapter,

124
00:08:21,560 --> 00:08:23,180
so I should at least mention what it is.

125
00:08:23,780 --> 00:08:24,560
It's just a way.

126
00:08:24,680 --> 00:08:30,950
It is just a way we can execute functions during training if we want to do some more monitoring or

127
00:08:30,950 --> 00:08:35,120
advanced early stopping or anything else like that.

128
00:08:35,810 --> 00:08:40,940
So we're using this one, called Print, which just prints at intervals of 50, which

129
00:08:40,940 --> 00:08:41,690
you will see below.

130
00:08:41,690 --> 00:08:44,150
You can see the steps here: 50, 100, 150.

131
00:08:44,600 --> 00:08:45,620
That's all it does here.

132
00:08:46,370 --> 00:08:51,230
So we get an activation, and basically we can just use that activation.

133
00:08:52,400 --> 00:08:53,690
We use that activation here.

134
00:08:53,690 --> 00:08:56,750
We convert it to a NumPy array here so we can visualize it.

135
00:08:57,350 --> 00:09:00,800
Next, these are subplot arguments; I'm not going to go into this.

136
00:09:00,800 --> 00:09:03,440
This is just for the rendering and visualization part of it.

137
00:09:06,210 --> 00:09:10,500
We feed those in there and then we just plot basically.

138
00:09:10,620 --> 00:09:13,770
So the maximization is the training loop,

139
00:09:13,890 --> 00:09:19,080
effectively, if you want to call it a training loop, that's performing gradient ascent so that we can

140
00:09:19,080 --> 00:09:24,960
actually maximize that filter. It doesn't take that long to run, as you can see.

141
00:09:26,670 --> 00:09:27,480
And there we go.

142
00:09:27,660 --> 00:09:33,300
So this is the input image that would maximize this filter.

143
00:09:33,990 --> 00:09:35,580
What this means, let's zoom in a bit.

144
00:09:36,720 --> 00:09:40,320
This means here that this filter is looking for this pattern in the image.

145
00:09:40,860 --> 00:09:42,960
It sort of looks like a snakeskin pattern, doesn't it?

146
00:09:43,410 --> 00:09:43,950
Interesting.

147
00:09:44,520 --> 00:09:49,980
Or maybe some sort of rock texture, but it's interesting that this filter is looking for that, and

148
00:09:50,130 --> 00:09:52,740
that is the input image that maximizes that filter's output.

149
00:09:53,280 --> 00:09:54,180
So that's pretty cool.

150
00:09:54,390 --> 00:09:57,030
Now let's take a look at some random filter numbers.

151
00:09:57,450 --> 00:09:59,010
So let's do three filters here.

152
00:09:59,010 --> 00:10:00,840
So we just specify three random numbers.

153
00:10:00,840 --> 00:10:03,690
You can choose different numbers up to 512,

154
00:10:03,720 --> 00:10:10,200
since there are only 512 filters in this last layer. And I say "only", but that's

155
00:10:10,200 --> 00:10:12,690
actually quite a lot, 512 of them.

156
00:10:13,350 --> 00:10:19,890
So we create our loss functions just like we did before, but now it's

157
00:10:19,890 --> 00:10:21,030
for these three filters.

158
00:10:21,960 --> 00:10:28,770
Next, we create a seed input, because for visualizing multiple convolutional

159
00:10:29,370 --> 00:10:31,710
filters, it requires one.

160
00:10:31,710 --> 00:10:35,220
This package requires us to set the seed input value.

161
00:10:35,610 --> 00:10:37,950
So we just create a random image here.

162
00:10:38,160 --> 00:10:44,640
That's with these dimensions and random values, using this tf.random.uniform function, and that creates

163
00:10:44,640 --> 00:10:45,330
our seed input.

164
00:10:45,900 --> 00:10:48,060
Next, we can sort of visualize.

165
00:10:48,060 --> 00:10:50,400
So I'm not going to go into this code in detail.

166
00:10:50,400 --> 00:10:53,460
It's just basically the same code we saw before.

167
00:10:53,460 --> 00:10:57,630
But now it's a for loop that does the iterations for each

168
00:10:59,630 --> 00:11:00,070
filter.

169
00:11:00,470 --> 00:11:01,550
So let's go.

170
00:11:02,510 --> 00:11:09,470
This will take a bit longer to generate, so we'll kick back, relax and wait for it. It should

171
00:11:09,470 --> 00:11:11,030
take about a minute maximum.

172
00:11:11,540 --> 00:11:12,170
Maybe less.

173
00:11:13,730 --> 00:11:14,200
There we go.

174
00:11:14,210 --> 00:11:20,450
So that actually was quite fast, and we can see these are the inputs that maximize the other filters.

175
00:11:21,170 --> 00:11:22,760
Honestly, I'm not sure what I'm looking at.

176
00:11:22,760 --> 00:11:26,660
This looks like something from like a coral reef or an alien planet.

177
00:11:27,230 --> 00:11:30,360
This one looks like... I do not know.

178
00:11:30,380 --> 00:11:32,270
I wouldn't hazard a guess.

179
00:11:32,540 --> 00:11:40,620
Looks like a frog to me. And this one is this weird pattern, almost like, I don't know. Who knows

180
00:11:40,620 --> 00:11:42,260
what that filter was actually looking for?

181
00:11:42,650 --> 00:11:43,550
But that's what it is.

182
00:11:44,390 --> 00:11:47,720
So next, we'll take a look at class maximization.

183
00:11:48,230 --> 00:11:50,570
So class maximization is actually quite cool.

184
00:11:50,570 --> 00:11:54,740
It's one of my favorite types of visualizations to do. In class maximization,

185
00:11:54,980 --> 00:12:02,270
instead of looking at individual filters that an input maximizes, we are now looking at what input

186
00:12:02,270 --> 00:12:08,780
image would maximize the network's output saying that it belongs to a specific class.

187
00:12:09,800 --> 00:12:14,360
So to take a look at how we do that, this is basically starting from scratch again.

188
00:12:14,840 --> 00:12:17,330
So we load all the packages if you want.

189
00:12:18,080 --> 00:12:22,910
They're already loaded from before, but the code is here by default, so you can just run it

190
00:12:22,910 --> 00:12:25,760
straight from class maximization if you wanted.

191
00:12:26,480 --> 00:12:29,390
So we load the pre-trained model, VGG

192
00:12:29,390 --> 00:12:29,960
16.

193
00:12:29,960 --> 00:12:33,800
So let's do that here, and we get our model loaded.

194
00:12:34,490 --> 00:12:37,730
We have to define the same model_modifier function.

195
00:12:38,150 --> 00:12:42,830
We also have to define our activation maximization or create that object.

196
00:12:44,120 --> 00:12:46,550
Next, we have to create a loss function here.

197
00:12:47,000 --> 00:12:49,050
So notice I have 20 here.

198
00:12:49,370 --> 00:12:51,740
This is specific to a class.

199
00:12:52,460 --> 00:12:56,900
So notice when we loaded in the network, we didn't.

200
00:12:56,990 --> 00:12:58,330
We set include_top here to

201
00:12:58,370 --> 00:13:01,010
True, which means we have the softmax output.
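
One commonly given reason for swapping a final softmax for a linear activation before maximizing a class is that softmax gradients shrink toward zero once the network is confident. The sketch below illustrates this with made-up logits in plain Python; `softmax` and `softmax_self_gradient` are illustrative helpers, not library functions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_self_gradient(logits, k):
    """Derivative of softmax output k with respect to logit k: p_k * (1 - p_k)."""
    p = softmax(logits)[k]
    return p * (1.0 - p)


for target_logit in (0.0, 5.0, 20.0):
    logits = [target_logit, 0.0, 0.0]
    grad = softmax_self_gradient(logits, 0)
    # With a linear output, the derivative of logit 0 with respect to itself is 1.
    print(f"logit={target_logit:5.1f}  softmax grad={grad:.2e}  linear grad=1")
```

As the target logit grows, the softmax gradient fades while the linear gradient stays constant, so gradient ascent keeps receiving a usable signal.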

202
00:13:01,400 --> 00:13:07,730
So now we can actually specify which class we want the output to be, because we have one output

203
00:13:07,730 --> 00:13:10,370
node for each class in the ImageNet dataset.

204
00:13:10,520 --> 00:13:12,830
So you can see the ImageNet classes here.

205
00:13:12,830 --> 00:13:19,640
If you want to try different numbers or combinations, there are a thousand classes, and there's

206
00:13:19,640 --> 00:13:21,290
quite a lot of things going on here.

207
00:13:21,680 --> 00:13:27,460
So a lot of animals and a lot of common household objects, and other objects as well.

208
00:13:27,470 --> 00:13:29,900
So it's quite extensive, to be fair.

209
00:13:30,770 --> 00:13:34,160
So let's create our loss function for that specific class.
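
As a rough illustration of a class-specific loss, the sketch below picks one output node per image from a batch of model outputs. It is plain Python with made-up numbers, and `make_class_score` is a hypothetical helper rather than part of tf-keras-vis.

```python
def make_class_score(class_indices):
    """Return a function that reads one class output per image in a batch."""
    def score(outputs):
        # outputs[i] holds the per-class values for image i
        return [row[c] for row, c in zip(outputs, class_indices)]
    return score


# Toy batch: 2 images, 4 classes each (the real model has 1000 per image)
outputs = [
    [0.1, 2.5, -0.3, 0.0],
    [1.2, 0.4, 0.9, 3.1],
]
score = make_class_score([1, 3])
print(score(outputs))  # [2.5, 3.1]
```

Maximizing these selected values pushes each input image toward whatever the network associates with its chosen class.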

210
00:13:35,180 --> 00:13:37,800
And now we can run our maximization.

211
00:13:37,820 --> 00:13:43,460
So it's very similar to what we did before. We have our activation maximization function; we input

212
00:13:43,460 --> 00:13:51,410
the loss function and the print interval here, set up the subplot arguments, get our activation out of it

213
00:13:51,530 --> 00:13:54,840
and convert it into an image that we can visualize with Matplotlib.

214
00:13:55,640 --> 00:14:04,280
And let's run that and take a look. By the way, this class 20 was, I believe,

215
00:14:04,280 --> 00:14:05,890
a bird, if I remember correctly.

216
00:14:05,900 --> 00:14:09,620
Yes, it is a bird, and you can see it's a type of bird.

217
00:14:09,690 --> 00:14:13,630
This is the actual bird, what it's supposed to look like.

218
00:14:13,640 --> 00:14:15,440
But think about this.

219
00:14:15,440 --> 00:14:20,210
This is the image that maximizes the CNN output for that specific class.

220
00:14:20,840 --> 00:14:25,400
It doesn't actually look like that; it just looks like variations of a bird on an image.

221
00:14:25,850 --> 00:14:33,120
So this sort of gives you an indication that CNNs are learning differently from how our human brains learn

222
00:14:33,140 --> 00:14:37,890
about what's in images. So we can do some more classes here.

223
00:14:37,910 --> 00:14:42,350
We're going to do one for the goldfish, a bear, and also an assault rifle.

224
00:14:42,740 --> 00:14:46,700
These are the class numbers for those: 1, 294 and 413.

225
00:14:47,240 --> 00:14:52,630
You can double-check that here and see that class 1 was goldfish.

226
00:14:52,790 --> 00:14:53,510
That's correct.

227
00:14:54,920 --> 00:15:03,110
So let's run that and create our seed input, because it's multiple visualizations we're

228
00:15:03,110 --> 00:15:06,500
doing, and now we can visualize using that for loop.

229
00:15:08,760 --> 00:15:11,130
And now we wait. It shouldn't take too long.

230
00:15:20,130 --> 00:15:21,010
All right, there we go.

231
00:15:21,030 --> 00:15:27,150
So let's take a look at what our CNN, what a big CNN thinks a goldfish looks like.

232
00:15:28,560 --> 00:15:32,070
That doesn't look like a goldfish to me, but maybe it is.

233
00:15:32,070 --> 00:15:39,330
Maybe it's an amalgamation of different goldfish-type objects or different goldfishes, maybe two different

234
00:15:39,330 --> 00:15:39,780
kinds.

235
00:15:39,800 --> 00:15:40,350
Who knows?

236
00:15:40,350 --> 00:15:40,810
But it does.

237
00:15:40,830 --> 00:15:45,030
I mean, it does look a little more like a wheel to me, but you can kind of see some of the goldfish

238
00:15:45,390 --> 00:15:47,350
features, maybe two fins a bit.

239
00:15:48,220 --> 00:15:54,360
Now this is a bear, and you can see this is what the CNN thinks the ideal bear looks like, which

240
00:15:54,360 --> 00:15:56,720
is basically, I'm seeing many bears here.

241
00:15:56,760 --> 00:15:57,870
How many bears do you see?

242
00:15:58,260 --> 00:15:59,310
It's quite weird.

243
00:16:00,060 --> 00:16:05,310
And the assault rifle, you can definitely see outlines of a gun, but basically, this is what the

244
00:16:05,310 --> 00:16:08,430
CNN thinks the perfect assault rifle sort of looks like.

245
00:16:09,360 --> 00:16:11,100
So that's it for this lesson.

246
00:16:11,100 --> 00:16:16,800
I hope you enjoyed it, because I think it's quite cool to do this, and you can easily adapt this code

247
00:16:17,280 --> 00:16:18,540
that we've seen here to

248
00:16:18,660 --> 00:16:24,630
load any model that you've trained before, and you can start visualizing

249
00:16:24,630 --> 00:16:27,180
class maximization from it.

250
00:16:27,630 --> 00:16:29,850
You can look at filter maximization.

251
00:16:29,850 --> 00:16:35,370
So I think it'd be quite useful, especially in research, especially in master's projects or undergrad

252
00:16:35,370 --> 00:16:35,850
projects.

253
00:16:36,270 --> 00:16:42,540
If you want to go a bit extra and show your professor or supervisors that you've done a little bit

254
00:16:42,540 --> 00:16:44,460
more research and understanding.

255
00:16:45,030 --> 00:16:46,740
I think this would be quite useful for that.

256
00:16:47,080 --> 00:16:54,600
So we'll stop there and we'll now move on to the Grad-CAM visualization, and we'll also look at

257
00:16:54,600 --> 00:16:56,370
Grad-CAM++ and Faster...

258
00:16:57,780 --> 00:16:59,070
What was it called again?

259
00:17:00,630 --> 00:17:05,610
It's actually listed here: Faster Score-CAM.

260
00:17:05,850 --> 00:17:09,780
So we'll stop there and I'll see you in the next lesson.

261
00:17:09,930 --> 00:17:10,350
Thank you.
