1
00:00:00,120 --> 00:00:00,410
Hi.

2
00:00:00,510 --> 00:00:05,910
Welcome back to the visualization part of this course, helping you understand what CNNs learn.

3
00:00:06,450 --> 00:00:09,600
So in this section, we'll take a look at maximizing filters.

4
00:00:10,110 --> 00:00:15,960
This basically is trying to tell us what image will make our filter fire fully.

5
00:00:16,230 --> 00:00:18,270
So let's take a look at how we do this.

6
00:00:18,960 --> 00:00:26,160
So as I said, what input maximizes our filter? To truly understand what our filter is looking for,

7
00:00:26,700 --> 00:00:33,180
it's best if we can find the input image that results in that specific filter obtaining its maximum

8
00:00:33,180 --> 00:00:34,530
real output score.

9
00:00:35,370 --> 00:00:40,680
So remember, this is a network here, and what we're trying to find is this real output at the end

10
00:00:40,680 --> 00:00:40,950
here.

11
00:00:41,310 --> 00:00:47,970
We're trying to find what image here will cause the maximum values that we can possibly get here.

12
00:00:48,660 --> 00:00:49,770
So it's a bit tricky, isn't it?

13
00:00:49,780 --> 00:00:51,480
So let's take a look at how we do that.

14
00:00:51,930 --> 00:00:58,200
So now suppose we had a filter whose maximum response, its feature map output,

15
00:00:58,800 --> 00:01:00,500
was to chins.

16
00:01:00,690 --> 00:01:07,770
OK, so you would think something like this might then be the input image that maximizes that particular

17
00:01:07,770 --> 00:01:09,810
filter that's looking for a chin?

18
00:01:10,500 --> 00:01:14,880
Well, in reality, it actually looks more like this, kind of a mess.

19
00:01:14,880 --> 00:01:16,080
Isn't it kind of confusing?

20
00:01:16,650 --> 00:01:18,030
But that's actually what the

21
00:01:18,030 --> 00:01:18,870
CNN is learning.

22
00:01:18,880 --> 00:01:22,740
That's what the filters are learning, so it's pretty interesting to analyze each of these results.

23
00:01:23,250 --> 00:01:29,310
There were a lot of good papers around CNN visualization, and that's how Google Deep Dream, which

24
00:01:29,310 --> 00:01:31,080
we'll discuss later on in this course.

25
00:01:31,650 --> 00:01:37,530
That's how that algorithm came about, trying to basically manipulate filter maximization and class

26
00:01:37,530 --> 00:01:39,360
maximization and those sorts of things.

27
00:01:40,050 --> 00:01:47,520
So viewing the input that maximizes each filter gives us a nice visualization of the CNN's modular

28
00:01:47,850 --> 00:01:50,580
hierarchical decomposition of its visual space.

29
00:01:51,030 --> 00:01:52,230
Pretty big sounding words.

30
00:01:52,230 --> 00:01:57,570
But what I'm trying to say here is that firstly, it's basically just encoding direction and color.

31
00:01:58,080 --> 00:02:03,660
Now these direction and color filters then get combined into basic grids and spots and structures

32
00:02:03,660 --> 00:02:05,520
and more complicated patterns.

33
00:02:05,940 --> 00:02:11,580
So that's basically how the CNN builds up its ability to recognize complex

34
00:02:11,580 --> 00:02:12,060
patterns.

35
00:02:12,540 --> 00:02:18,750
So you can see conv 1 here looks quite like this, like textures and that type of stuff.

36
00:02:19,620 --> 00:02:19,780
In conv 2,

37
00:02:19,810 --> 00:02:23,970
we can see it's a little bit more structured, but doesn't look that much different to conv 1.

38
00:02:23,980 --> 00:02:29,490
In conv 3 you can see a bit more distinct patterns, and definitely in conv 4

39
00:02:29,520 --> 00:02:34,320
you can see a lot more definite patterns in the image, as you can see.

40
00:02:34,320 --> 00:02:38,760
This looks like some sort of cloth pattern, and this looks like it's the same thing, but different

41
00:02:38,760 --> 00:02:39,330
angles.

42
00:02:39,330 --> 00:02:42,750
This one looks like a bunch of weird swirls of blobs in the middle.

43
00:02:43,230 --> 00:02:48,720
This one looks like a bunch of little circles, so you can see it's interesting patterns that the higher

44
00:02:49,230 --> 00:02:49,980
layers learn.

45
00:02:51,120 --> 00:02:55,800
So this is basically a summary of filter maximization here that we're seeing.

46
00:02:56,280 --> 00:02:59,730
Now, let's take a look at how we can implement this in code.

47
00:02:59,730 --> 00:03:02,220
So firstly, we have to load a trained model.

48
00:03:02,220 --> 00:03:05,550
So we have a model, a model that we've trained previously.

49
00:03:06,150 --> 00:03:11,310
Then we define a loss function that seeks to maximize the activation of a specific filter.

50
00:03:11,910 --> 00:03:16,560
We give it the filter index here specifically, and actually give it a name as well.

51
00:03:17,430 --> 00:03:22,740
And then a small trick here is that we normalize the gradient of the pixels of the input image, which

52
00:03:22,740 --> 00:03:29,190
avoids very small and very large gradients to ensure a smooth gradient ascent process.
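
The normalization trick can be written down roughly as follows. This is only a sketch: the lecture doesn't say which norm is used, so dividing by the root-mean-square of the gradient is an assumption (it is one common choice), and the symbols are illustrative. Here x is the input image, L is the filter's mean activation, eta is the step size, and epsilon is a small constant to avoid dividing by zero.

```latex
g = \frac{\partial \mathcal{L}}{\partial x}, \qquad
\hat{g} = \frac{g}{\sqrt{\operatorname{mean}(g^{2})} + \epsilon}, \qquad
x \leftarrow x + \eta\,\hat{g}
```

Because the normalized gradient always has roughly unit scale, every update moves the image by a comparable amount, whether the raw gradient was tiny or huge.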

53
00:03:29,760 --> 00:03:35,850
Now, this will sound a bit tricky, so it's fully understandable if you don't understand what is going

54
00:03:35,850 --> 00:03:36,790
on right now.

55
00:03:37,260 --> 00:03:41,430
However, hopefully when we start to execute this in code, it will make a lot more sense.

56
00:03:41,430 --> 00:03:44,640
This is a pretty high level explanation of what we're doing.

57
00:03:45,090 --> 00:03:51,240
But basically, in layman's terms, what we're trying to do is that we're trying to find a way to create

58
00:03:51,240 --> 00:03:58,140
a loss function that we maximize by using the gradient ascent process instead of the descent process.

59
00:03:58,650 --> 00:04:05,070
So instead of trying to find a minimum of the loss, we're trying to find an input that maximizes

60
00:04:05,070 --> 00:04:06,210
that particular filter.
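
To make the idea a bit more concrete, here is a minimal sketch of gradient ascent on the input. It is not the course's code and it doesn't load a trained CNN. It assumes a single hypothetical 3x3 filter with a ReLU, applied to one 3x3 patch, in plain Python so the gradient can be written by hand. The function names, the bias value, the step size, and the RMS normalization are all assumptions made for illustration.

```python
import math
import random


def activation(weights, bias, image):
    """ReLU response of one filter on one flattened image patch."""
    pre = sum(w * x for w, x in zip(weights, image)) + bias
    return max(0.0, pre)


def input_gradient(weights, bias, image):
    """Gradient of the activation with respect to each input pixel."""
    pre = sum(w * x for w, x in zip(weights, image)) + bias
    # ReLU only passes gradient while the filter is firing.
    return list(weights) if pre > 0 else [0.0] * len(weights)


def normalize(grads, eps=1e-5):
    """Scale the gradient by its root-mean-square value (assumed choice)."""
    rms = math.sqrt(sum(g * g for g in grads) / len(grads))
    return [g / (rms + eps) for g in grads]


def maximize_filter(weights, bias=0.1, steps=20, step_size=1.0, seed=0):
    """Gradient ascent on the input: returns the final image and the scores."""
    rng = random.Random(seed)
    # Start from a gray image with a little noise.
    image = [0.5 + 0.01 * rng.uniform(-1, 1) for _ in weights]
    scores = [activation(weights, bias, image)]
    for _ in range(steps):
        grads = normalize(input_gradient(weights, bias, image))
        # Ascent: step along the gradient (descent would subtract it).
        image = [x + step_size * g for x, g in zip(image, grads)]
        scores.append(activation(weights, bias, image))
    return image, scores


# Hypothetical 3x3 vertical-edge filter, flattened row by row.
edge_filter = [-1.0, 0.0, 1.0] * 3
best_image, scores = maximize_filter(edge_filter)
print("first score:", scores[0], "last score:", scores[-1])
```

In this toy case the image drifts toward dark pixels on the left and bright pixels on the right, which is the pattern a vertical-edge filter responds to. With a real network the gradient would come from automatic differentiation through the whole model, and the resulting pixel values would normally be rescaled or clipped before display.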

61
00:04:06,690 --> 00:04:09,810
So hopefully that does make a little more sense to you now.

62
00:04:10,350 --> 00:04:15,150
And we'll go on, though, after we go into the code and do that.

63
00:04:15,660 --> 00:04:20,980
And then we'll take a look at maximizing class activations, which I find is pretty cool.

64
00:04:21,000 --> 00:04:24,600
Actually, this is probably the coolest part of visualizations, in my opinion.

65
00:04:24,810 --> 00:04:26,520
So I'll see you in the next section.

66
00:04:26,640 --> 00:04:27,090
Thank you.