1
00:00:00,510 --> 00:00:01,650
Hi, and welcome back.

2
00:00:02,160 --> 00:00:07,440
In this section, we'll take a look at filter activations for convolutional neural networks.

3
00:00:08,100 --> 00:00:10,620
This might appear confusing to you guys.

4
00:00:10,710 --> 00:00:16,560
However, I'll try to step through this section quite slowly so that you get a good understanding of

5
00:00:16,980 --> 00:00:22,680
what these filter activations look like and how the filters respond to different images.

6
00:00:23,160 --> 00:00:24,150
So let's get started.

7
00:00:25,530 --> 00:00:32,730
So imagine an input image, like a random image that you're feeding into your CNN,

8
00:00:33,330 --> 00:00:35,010
a trained CNN, I should say.

9
00:00:35,700 --> 00:00:37,500
So what do you expect to happen?

10
00:00:37,620 --> 00:00:39,510
Well, what should happen in theory?

11
00:00:39,510 --> 00:00:45,090
What you should know is that when you feed this image into the network, certain filters are going

12
00:00:45,090 --> 00:00:50,070
to be activated here. Imagine this is a visualization of this filter here.

13
00:00:50,490 --> 00:00:55,400
So imagine this filter finds this pattern, whatever pattern I'm drawing here.

14
00:00:55,410 --> 00:00:56,790
It's a random pattern.

15
00:00:57,450 --> 00:00:58,980
It finds this pattern in the image.

16
00:00:59,430 --> 00:01:04,680
That means that basically the filter turns on at that point.

17
00:01:05,340 --> 00:01:07,500
So there are two things we can visualize here.

18
00:01:08,070 --> 00:01:14,730
We can visualize the image, basically the pixels or areas of the image that corresponded

19
00:01:14,850 --> 00:01:16,020
to that filter turning on.

20
00:01:16,680 --> 00:01:20,580
Or we can actually visualize the actual activations as they go through.

21
00:01:21,240 --> 00:01:22,340
So let's take a look at both.

22
00:01:22,830 --> 00:01:25,020
So let's take a look at this image here.

23
00:01:25,680 --> 00:01:28,270
Look at the image on the left, the bottom left here.

24
00:01:28,290 --> 00:01:29,670
What exactly are we looking at?

25
00:01:30,180 --> 00:01:35,520
Well, if you look carefully, you probably would notice that this is the outline of a cat, somewhat.

26
00:01:35,520 --> 00:01:37,200
You can see two eyes right here.

27
00:01:37,650 --> 00:01:40,290
And a bit of it you can see in this image.

28
00:01:40,470 --> 00:01:42,000
It's the most clear, in my opinion.

29
00:01:42,750 --> 00:01:44,040
So what is this?

30
00:01:44,310 --> 00:01:53,610
Well, imagine we fed an image into a network and we wanted to see which region of the

31
00:01:53,610 --> 00:01:55,560
image matched that filter.

32
00:01:56,160 --> 00:01:57,540
Whoops, let me explain this right.

33
00:01:57,540 --> 00:01:58,320
So you get this.

34
00:01:58,800 --> 00:02:03,360
Imagine each one of these is the image passed through each filter in our network.

35
00:02:03,360 --> 00:02:05,640
And remember our example network.

36
00:02:05,640 --> 00:02:07,020
We had 22 filters.

37
00:02:07,530 --> 00:02:11,340
These are the filter outputs for each one when we pass in the input image.

38
00:02:11,760 --> 00:02:13,320
And this on the right here.

39
00:02:13,800 --> 00:02:19,860
As we progress deeper into the network, as you remember, the feature maps get smaller and smaller

40
00:02:20,190 --> 00:02:23,310
as we progress through the network because of the max pooling layers.

41
00:02:23,730 --> 00:02:27,990
So that's why they're more blurry and contain fewer pixels.

42
00:02:28,590 --> 00:02:32,020
So there's one thing you should notice, and you can notice it quite quickly.

43
00:02:32,650 --> 00:02:40,470
The outputs tend to be sparse and localized, meaning that they're scattered like this and basically a little

44
00:02:40,470 --> 00:02:44,880
bit region-specific in some cases, and some of them didn't even turn on, like this.

45
00:02:45,030 --> 00:02:47,370
Five and six right here didn't even turn on.

46
00:02:48,390 --> 00:02:49,080
What does that mean?

47
00:02:49,710 --> 00:02:54,360
I mean, sometimes we can use this method of visualizing filter activations to spot dead filters,

48
00:02:54,360 --> 00:02:55,980
that is, filters that don't turn on for any input.

49
00:02:56,430 --> 00:03:01,530
It can happen depending on the initialization of the weights. You might have filters that do nothing

50
00:03:01,530 --> 00:03:04,110
and just add complexity to the network.

51
00:03:05,040 --> 00:03:06,810
So let's take a look at something else here.

52
00:03:07,620 --> 00:03:08,640
Here we have an image.

53
00:03:09,030 --> 00:03:09,660
It's a --.

54
00:03:09,660 --> 00:03:15,270
In case you can't tell, because it's quite small. And you can see that filters are turning on in

55
00:03:15,270 --> 00:03:21,000
many different regions. Like the second filter here turns on, maybe for the blue sky in the background.

56
00:03:21,780 --> 00:03:26,760
This one turns on for what looks like the sand, the beach and some of the landscape.

57
00:03:27,450 --> 00:03:30,810
This one turns on for the sea. So essentially, you can see what it's learning.

58
00:03:30,810 --> 00:03:34,020
A lot of the filters correspond to many different parts of the image.

59
00:03:34,770 --> 00:03:40,140
So in the previous section, what we were looking at, and this is the part I don't want you to be confused

60
00:03:40,140 --> 00:03:40,500
about.

61
00:03:41,040 --> 00:03:46,830
We were looking at the raw weights of the filters themselves, just looking at the patterns.

62
00:03:47,220 --> 00:03:52,830
Now we're seeing which parts of the image turn on when we apply those feature detectors to it.

63
00:03:53,580 --> 00:03:56,880
So here, with the filter activations, we look at the output.

64
00:03:57,540 --> 00:04:02,640
So again, here what we're doing, which is filter activations, is looking at the output generated

65
00:04:03,060 --> 00:04:05,580
by applying a filter to an input image.

66
00:04:06,420 --> 00:04:09,270
So let's take a look at these filter activations here.

67
00:04:09,870 --> 00:04:12,660
This was the one I showed you in the previous slide, the one on the left.

68
00:04:13,170 --> 00:04:14,310
And you can see different.

69
00:04:14,340 --> 00:04:15,360
These are the different layers.

70
00:04:15,360 --> 00:04:20,310
So this block of filters, and this block, and this block are progressive CNN layers.

71
00:04:20,310 --> 00:04:26,190
And you remember, as we progress through the CNN, the max pool, the max pooling function

72
00:04:26,610 --> 00:04:31,320
basically, and even just applying a conv filter, reduces the output size.

73
00:04:31,350 --> 00:04:34,170
That's why these look a lot more pixelated than this.

74
00:04:34,770 --> 00:04:39,660
That's what we're visualizing here. So we can see, as we progress from the early layers on the left to the

75
00:04:39,670 --> 00:04:42,360
deeper layers on the right, the feature maps show less detail.

76
00:04:43,080 --> 00:04:48,510
This shows that the image dimensions get smaller, which is what I just mentioned, as we progress through

77
00:04:48,510 --> 00:04:49,080
the CNN.

78
00:04:49,080 --> 00:04:53,430
In most cases, I mean. We can use padding and keep the resolution consistent, theoretically.

79
00:04:55,590 --> 00:05:01,110
So what this does, and this is important now, is it enables the upper layers to learn more

80
00:05:01,110 --> 00:05:07,200
complex patterns, because the upper layers here, the ones that are sparse in this case, correspond

81
00:05:07,200 --> 00:05:10,330
to multiple different patterns of the input layers.

82
00:05:10,350 --> 00:05:16,080
I know this could seem confusing, because it did confuse me the first time I learned this on my own:

83
00:05:16,590 --> 00:05:20,370
even though the feature maps are a lot smaller here,

84
00:05:20,820 --> 00:05:27,210
these feature maps correspond to a number of different low-level filters here, so they correspond to

85
00:05:27,570 --> 00:05:33,600
much more complicated pattern detectors. Like this little dot right here might actually correspond to

86
00:05:34,290 --> 00:05:38,040
something quite complicated, like a tree or palm tree or something.

87
00:05:38,520 --> 00:05:42,850
Mainly because it's a combination of the previous filters.

88
00:05:43,320 --> 00:05:45,090
So I hope you understood that.

89
00:05:45,990 --> 00:05:47,340
Now, here's a simple example.

90
00:05:47,580 --> 00:05:49,950
This is an example of the letter

91
00:05:50,010 --> 00:05:51,950
G, or it looks like a nine.

92
00:05:52,320 --> 00:05:56,820
I think it's a nine, actually, because it's the MNIST dataset, so it's going to be a digit.

93
00:05:56,910 --> 00:06:02,340
So this digit here has been input into, basically, a CNN.

94
00:06:02,670 --> 00:06:04,590
And this is actually what we're going to do in our code

95
00:06:04,590 --> 00:06:09,570
lesson. We're going to see the activations for each filter as the image passes through the CNN.

96
00:06:10,080 --> 00:06:14,550
Now the bright spots correspond to activations and the dark spots correspond to no activations.

97
00:06:14,970 --> 00:06:17,670
So you can see this is a dead filter, or at least, maybe not a dead filter,

98
00:06:17,670 --> 00:06:23,400
just a case where it doesn't respond to anything with the number nine but might respond to a one or two. Who

99
00:06:23,400 --> 00:06:25,020
knows, we have to experiment and see.

100
00:06:25,710 --> 00:06:31,170
So you can actually see which parts of the image or which filter is responding to which parts of the

101
00:06:31,170 --> 00:06:31,500
image.

102
00:06:31,590 --> 00:06:34,350
So this is a pretty cool experiment to do.

103
00:06:34,680 --> 00:06:36,840
So let's take a look at how we implemented this.

104
00:06:37,350 --> 00:06:43,110
We just create a model, like we did before, the standard model that takes an input image and produces outputs.

105
00:06:43,620 --> 00:06:47,730
But we have to stop the model and only output the feature maps, because that's what we want to visualize.

106
00:06:48,840 --> 00:06:51,360
Then we normalize the output image.

107
00:06:51,480 --> 00:06:54,630
That means between zero and one or whatever scale we want to use.

108
00:06:54,630 --> 00:06:58,170
But zero to one works well for the visualizations.

109
00:06:58,710 --> 00:07:04,620
And then we propagate a new input, or any input you want, through our new model.

110
00:07:05,070 --> 00:07:07,500
And then we extract the feature map response.

111
00:07:07,890 --> 00:07:12,450
Remember, what we're visualizing here is the outputs of the feature maps, essentially, which we want to use

112
00:07:12,450 --> 00:07:13,680
Matplotlib to visualize.

113
00:07:13,860 --> 00:07:20,970
So we'll stop there and we'll move on to implementing the filter activations and the filter visualization

114
00:07:21,360 --> 00:07:24,370
in code using Keras. Afterwards,

115
00:07:24,390 --> 00:07:27,960
what we're going to do is take a look at maximizing filters.

116
00:07:28,740 --> 00:07:29,910
So that's going to be interesting.

117
00:07:29,910 --> 00:07:31,770
So stay tuned for that lesson.

118
00:07:31,950 --> 00:07:32,400
Thank you.
