1
00:00:00,060 --> 00:00:06,270
Hi, and welcome back to the course. We're about to take a look at a very cool algorithm called neural

2
00:00:06,270 --> 00:00:11,580
style transfer, which allows us to copy the artistic style of one image onto another.

3
00:00:11,760 --> 00:00:13,110
So let's take a look at this.

4
00:00:13,710 --> 00:00:21,090
So here's an example of what I mean. Here we take a regular image of myself, then take

5
00:00:21,090 --> 00:00:27,120
this artistic image here, and then copy that style onto the image of me.

6
00:00:27,120 --> 00:00:28,350
So you can see this image

7
00:00:28,350 --> 00:00:33,270
now copies the artistic style elements, like these lines that are being drawn.

8
00:00:33,900 --> 00:00:36,560
It's been copied all across my image.

9
00:00:36,590 --> 00:00:40,290
Even my hair has picked up that style a bit.

10
00:00:40,290 --> 00:00:41,070
So it's pretty cool.

11
00:00:41,820 --> 00:00:49,140
So neural style transfer was probably the algorithm that really sparked the whole AI art revolution.

12
00:00:49,680 --> 00:00:57,990
It was introduced by Leon Gatys et al. in 2015 in a paper titled "A Neural Algorithm of Artistic Style".

13
00:00:58,560 --> 00:01:03,630
Neural style transfer is commonly abbreviated as NST.

14
00:01:04,230 --> 00:01:06,460
And it went viral after that.

15
00:01:06,510 --> 00:01:08,790
I mean, there have been loads of improvements.

16
00:01:09,210 --> 00:01:16,860
Loads of startups and mobile apps have aimed to bring this ability to users.

17
00:01:17,610 --> 00:01:23,940
So, like I said, neural style transfer enables us to copy the artistic style of one image

18
00:01:23,940 --> 00:01:30,150
onto another. It copies things like color patterns, color combinations and brushstrokes from the

19
00:01:30,150 --> 00:01:34,980
original source artistic image, and then applies those effects to your image.

20
00:01:35,280 --> 00:01:36,210
Pretty cool, isn't it?

21
00:01:36,720 --> 00:01:42,450
This one was generated by Deep Dream Generator, and you can take a look at some more art from Deep

22
00:01:42,450 --> 00:01:43,030
Dream Generator.

23
00:01:43,050 --> 00:01:43,920
It's quite cool.

24
00:01:44,340 --> 00:01:46,470
There's also a startup called Wombo.

25
00:01:46,470 --> 00:01:47,070
It isn't that new,

26
00:01:47,070 --> 00:01:53,520
but it's been pushing this forward recently, with some new algorithms that are generating very,

27
00:01:53,520 --> 00:01:59,280
very beautiful art, in my opinion. I'm actually going to get some of these printed out and framed.

28
00:01:59,910 --> 00:02:01,380
They're really quite good.

29
00:02:02,340 --> 00:02:06,360
So I would encourage you to check these sites out and have some fun playing with them.

30
00:02:06,370 --> 00:02:07,530
I mean, I do a lot.

31
00:02:08,220 --> 00:02:09,540
So how does this work?

32
00:02:09,550 --> 00:02:11,130
How do we actually get the style

33
00:02:11,130 --> 00:02:12,930
onto the content image?

34
00:02:13,770 --> 00:02:17,160
Well, basically, we just need two images here: a content image and a style image.

35
00:02:17,730 --> 00:02:18,990
And then what do we do?

36
00:02:19,260 --> 00:02:23,580
Well, simply put, we take the style of one image and transfer it to the other.

37
00:02:24,060 --> 00:02:25,260
But how do we do it?

38
00:02:25,500 --> 00:02:29,440
Well, neural style transfer uses neural networks.

39
00:02:29,460 --> 00:02:37,020
What we do is take a pre-trained network, something like VGG19, Inception or ResNet.

40
00:02:37,680 --> 00:02:43,920
Then we define and combine three loss functions, which we minimize. These loss functions are the

41
00:02:43,920 --> 00:02:47,910
content loss, the style loss and the total variation loss.

42
00:02:48,630 --> 00:02:54,360
So the first of those functions we'll talk about is the content loss. What we're doing here

43
00:02:54,360 --> 00:03:01,500
is trying to minimize the distance in content between the generated image and the content

44
00:03:01,500 --> 00:03:01,770
image.

45
00:03:01,780 --> 00:03:03,240
So what are we doing exactly?

46
00:03:03,810 --> 00:03:09,600
Well, the content loss function measures how similar the generated image is to the content image.

47
00:03:10,080 --> 00:03:17,100
It uses the Euclidean distance, that's the L2 norm, to measure the difference between the features of the content

48
00:03:17,100 --> 00:03:18,750
image and the generated image.

49
00:03:19,320 --> 00:03:23,620
So we're looking at features here and comparing them via Euclidean distance.

50
00:03:24,180 --> 00:03:30,030
We build this loss function using a pre-trained network like VGG16 or VGG19.

51
00:03:30,810 --> 00:03:35,650
Then we select a higher-level layer that serves as our content layer.

52
00:03:35,680 --> 00:03:41,130
The reason we use a higher-level layer is that higher-level layers capture more of the image's structure,

53
00:03:41,130 --> 00:03:44,790
so they capture content better than the lower-level layers do.

54
00:03:45,480 --> 00:03:50,940
We then compute the activations of this layer. Activations here means we're looking at the outputs of the

55
00:03:50,940 --> 00:03:52,740
filters in that layer.

56
00:03:53,190 --> 00:04:00,140
If the same features activate in both the content image and the generated image, that tends to mean they're similar.

57
00:04:00,150 --> 00:04:05,850
So we just compute the activations for both, compare them using this loss function, and we

58
00:04:05,850 --> 00:04:07,920
try to minimize the distance between them.

59
00:04:08,460 --> 00:04:15,630
So generally, what this is saying is that our generated image has to look similar

60
00:04:15,630 --> 00:04:16,720
to our content image.

61
00:04:16,740 --> 00:04:18,360
That's what this loss function is doing.

62
00:04:19,320 --> 00:04:24,060
So as you can see above, we just take the L2 norm between these activations.
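The content loss described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's Keras or PyTorch code: the feature maps are hypothetical random arrays standing in for the activations of one higher-level layer of a network like VGG19, assumed to have shape (channels, height, width). Gatys et al. also scale this sum by 1/2; a constant factor like that is simply absorbed into the content weight.

```python
import numpy as np


def content_loss(gen_features: np.ndarray, content_features: np.ndarray) -> float:
    """Squared Euclidean (L2) distance between two activations of the same layer.

    Both arrays are assumed to have shape (channels, height, width).
    """
    diff = gen_features - content_features
    return float(np.sum(diff ** 2))


# Hypothetical activations standing in for a higher-level layer's output.
rng = np.random.default_rng(0)
content_feats = rng.standard_normal((4, 8, 8))

print(content_loss(content_feats, content_feats))  # 0.0: identical content
print(content_loss(content_feats + 0.1, content_feats))  # grows as features drift apart
```

Minimizing this value pushes the generated image's activations toward the content image's activations at that layer.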

63
00:04:24,270 --> 00:04:24,660
All right.

64
00:04:25,560 --> 00:04:28,590
So now let's talk about the style loss.

65
00:04:29,250 --> 00:04:35,400
The style loss is quite important. What it does is measure how different the generated

66
00:04:35,400 --> 00:04:40,050
image is, in terms of style features, from our style image.

67
00:04:40,680 --> 00:04:41,910
So how are we doing that?

68
00:04:42,060 --> 00:04:46,370
This is a bit more complex and needs something called the Gram matrix.

69
00:04:46,380 --> 00:04:47,760
So we'll talk about that shortly.

70
00:04:48,420 --> 00:04:51,300
So the style loss uses multiple layers.

71
00:04:51,810 --> 00:04:56,460
And the reason we use multiple layers is that we need to preserve a multi-scale representation.

72
00:04:57,000 --> 00:04:59,880
This allows it to capture low-level features,

73
00:05:00,170 --> 00:05:07,060
things like edges in the style, mid-level features such as blobs and curves, and then more high-level

74
00:05:07,070 --> 00:05:13,700
complex patterns. The style representation is given by something called the Gram matrix.

75
00:05:13,850 --> 00:05:15,620
So we'll take a look at that shortly.

76
00:05:16,160 --> 00:05:18,800
For now, just imagine what style is.

77
00:05:19,190 --> 00:05:20,450
Think of things

78
00:05:20,450 --> 00:05:24,730
like brushstrokes or different color patterns.

79
00:05:24,800 --> 00:05:27,200
All of these things are captured in the features here.

80
00:05:27,740 --> 00:05:30,820
But we now need a way to measure the correlations between those.

81
00:05:30,830 --> 00:05:33,410
So let's take a look at what the Gram matrix gives us.

82
00:05:34,280 --> 00:05:40,160
For our style loss, unlike the content loss, we don't just need to find the difference

83
00:05:40,160 --> 00:05:41,180
between the activations.

84
00:05:41,180 --> 00:05:47,000
We need to do something more, and that something more is finding the correlations between these activations

85
00:05:47,450 --> 00:05:50,300
across different channels of the same layer.

86
00:05:50,780 --> 00:05:51,770
So what does that mean?

87
00:05:52,580 --> 00:05:57,660
Well, imagine this is a layer of the network here, and these are its filters.

88
00:05:57,710 --> 00:06:03,740
What the Gram matrix does is measure the correlation between filters.

89
00:06:04,220 --> 00:06:09,940
So, looking at the filters here, you can see the correlations in the graph below, but in

90
00:06:09,950 --> 00:06:10,780
matrix form.

91
00:06:10,790 --> 00:06:16,820
And after we compute the correlations, we can visualize what they are between each of the filters

92
00:06:16,820 --> 00:06:17,090
here.
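Here is a sketch of the Gram matrix computation. It assumes a layer's activations are stored as a NumPy array of shape (channels, height, width). Each filter's activation map is flattened into a row, and the matrix of inner products between rows gives the filter-to-filter correlations. The three-filter layer is a hypothetical toy example.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Channel-by-channel correlations of one layer's activations.

    features: array of shape (channels, height, width).
    Returns a (channels, channels) matrix G where G[i, j] is the inner product
    of filter i's and filter j's activation maps, summed over all positions.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)  # one row per filter, spatial layout dropped
    return flat @ flat.T


# Hypothetical 3-filter layer on a 2x2 grid.
feats = np.array([
    [[1.0, 0.0], [1.0, 0.0]],  # filter 0
    [[2.0, 0.0], [2.0, 0.0]],  # filter 1 fires wherever filter 0 fires
    [[0.0, 1.0], [0.0, 1.0]],  # filter 2 fires somewhere else
])
G = gram_matrix(feats)
print(G[0, 1])  # 4.0: filters 0 and 1 are correlated
print(G[0, 2])  # 0.0: filters 0 and 2 never fire together
```

Because the sum runs over all spatial positions, the Gram matrix records which filters fire together but not where they fire. That is why it captures style rather than layout.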

93
00:06:17,750 --> 00:06:19,130
So why is this important?

94
00:06:20,360 --> 00:06:26,600
Well, remember, each filter activates upon seeing an example of a feature, such as

95
00:06:26,600 --> 00:06:27,620
a cat's nose.

96
00:06:28,280 --> 00:06:35,870
Now, if another filter in that layer activates upon detecting a cat, that would mean those two filters

97
00:06:36,350 --> 00:06:36,980
correlate.

98
00:06:37,670 --> 00:06:41,060
That means we'd be able to detect a cat.

99
00:06:41,720 --> 00:06:45,380
This is important in building our style loss.

100
00:06:45,680 --> 00:06:49,430
OK, so let's talk a bit more about the Gram matrix.

101
00:06:49,430 --> 00:06:56,390
We want to capture the style, but not the global arrangement of the content. Remember, the global

102
00:06:56,390 --> 00:07:03,170
arrangement is the layout of the image, and that's what's represented in the content image.

103
00:07:03,590 --> 00:07:04,310
That's what we want.

104
00:07:04,310 --> 00:07:05,420
That's the actual picture.

105
00:07:05,840 --> 00:07:12,380
From the style image, we simply want to copy style elements like brushes, paint strokes and other artistic elements.

106
00:07:12,380 --> 00:07:16,760
So to do that, we must rely on the correlations between our filters.

107
00:07:17,150 --> 00:07:24,350
To do that, we take the L2 norm of the difference between the Gram matrices of the generated image's and the style image's layer activations.

108
00:07:25,520 --> 00:07:32,360
We minimize the loss between the style of the output image and our original style image, thereby forcing

109
00:07:32,360 --> 00:07:37,520
the style of the output image to correlate with the style of the style image.
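The multi-layer style loss can be sketched as a weighted sum of per-layer Gram-matrix distances. This sketch assumes the normalisation 1/(4 N² M²) from the original Gatys et al. paper (N filters, M positions per feature map) and equal layer weights. The layer shapes and random activations are hypothetical stand-ins for VGG layers such as conv1_1 through conv5_1, the style layers used in that paper. The demo shuffles the spatial positions of the style features, and the style loss stays at (numerically) zero, which illustrates that it ignores global arrangement.

```python
import numpy as np


def gram_matrix(features):
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T


def layer_style_loss(gen_features, style_features):
    """Squared L2 distance between Gram matrices, normalised as in Gatys et al."""
    c, h, w = gen_features.shape
    n, m = c, h * w  # number of filters, size of each feature map
    diff = gram_matrix(gen_features) - gram_matrix(style_features)
    return float(np.sum(diff ** 2) / (4.0 * n ** 2 * m ** 2))


def style_loss(gen_layers, style_layers, layer_weights):
    """Weighted sum of per-layer style losses (low-, mid- and high-level layers)."""
    return sum(
        w * layer_style_loss(g, s)
        for g, s, w in zip(gen_layers, style_layers, layer_weights)
    )


rng = np.random.default_rng(0)
shapes = [(4, 16, 16), (8, 8, 8), (16, 4, 4)]  # shallow -> deep, hypothetical sizes
style_feats = [rng.standard_normal(s) for s in shapes]

# "Generated" features: the same activations with their spatial positions shuffled.
gen_feats = []
for f in style_feats:
    c, h, w = f.shape
    perm = rng.permutation(h * w)
    gen_feats.append(f.reshape(c, h * w)[:, perm].reshape(c, h, w))

weights = [1.0 / len(shapes)] * len(shapes)
print(style_loss(gen_feats, style_feats, weights))  # ~0: layout changed, style kept
```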

110
00:07:37,850 --> 00:07:39,020
Now that's a bit of a mouthful.

111
00:07:39,020 --> 00:07:41,150
I read it out and I hope it made sense to you.

112
00:07:41,600 --> 00:07:44,430
Take your time and read this paragraph over.

113
00:07:44,430 --> 00:07:49,190
If it doesn't, just pause the video and take a look, but I'll try to explain it to you one more time.

114
00:07:49,580 --> 00:07:55,460
So what we're trying to do is minimize the loss between the style of the output image and the style

115
00:07:55,460 --> 00:08:02,770
of the original style image. The style loss captures the artistic style, just as the content loss we talked about

116
00:08:02,850 --> 00:08:05,420
earlier captures the content.

117
00:08:05,570 --> 00:08:11,390
So we're forcing the output image to correlate with the style of the style image and also correlate

118
00:08:11,390 --> 00:08:15,380
with the content of the content image as well.

119
00:08:16,340 --> 00:08:23,000
That brings us to something called total variation loss. The total variation loss is included because

120
00:08:23,000 --> 00:08:25,580
it reduces noisy, pixelated outputs.

121
00:08:26,060 --> 00:08:32,630
It allows us to maintain some smoothness and spatial continuity, so we get nicer looking images.

122
00:08:33,200 --> 00:08:38,750
Unlike the content loss and the style loss, the total variation loss doesn't compare against another image.

123
00:08:39,350 --> 00:08:45,380
It operates on the output image only, with the sole goal of enhancing that image itself.
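Here is a minimal sketch of a total variation loss. It assumes the image is a NumPy array of shape (height, width, channels). This version sums squared differences between vertically and horizontally adjacent pixels. Other implementations use absolute differences or other exponents, so treat the exact form as an assumption.

```python
import numpy as np


def total_variation_loss(image: np.ndarray) -> float:
    """Sum of squared differences between neighbouring pixels.

    image: array of shape (height, width, channels). Smooth images score low,
    noisy or pixelated ones score high. No content or style image is needed.
    """
    dy = image[1:, :, :] - image[:-1, :, :]  # vertical neighbours
    dx = image[:, 1:, :] - image[:, :-1, :]  # horizontal neighbours
    return float(np.sum(dy ** 2) + np.sum(dx ** 2))


flat = np.full((4, 4, 1), 0.5)  # perfectly smooth image
checker = (np.indices((4, 4)).sum(axis=0) % 2).astype(float)[:, :, None]  # maximally noisy

print(total_variation_loss(flat))     # 0.0
print(total_variation_loss(checker))  # 24.0
```

The function takes only the output image, which matches the point that this term is about the generated image's own smoothness.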

124
00:08:46,040 --> 00:08:50,510
So this is what the equation looks like, and we can assign weights to the content and style.

125
00:08:50,870 --> 00:08:58,730
So if you want more style and less content, you can adjust these weights accordingly, or vice versa if

126
00:08:58,730 --> 00:09:04,490
you want to increase the content and reduce the style: just make this weight bigger than that one.

127
00:09:05,840 --> 00:09:12,380
So we now have all three loss functions, and we need to combine them.

128
00:09:12,830 --> 00:09:18,590
The final loss function is the sum of all the previous loss functions, with each component

129
00:09:18,590 --> 00:09:18,980
weighted.

130
00:09:20,300 --> 00:09:23,120
This weighting allows us to tweak the resulting image.
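The weighted combination can be written as L_total = α·L_content + β·L_style + γ·L_TV. The sketch below uses the hypothetical parameter names alpha, beta and gamma for the three weights, along with made-up loss values. It shows how changing the weights shifts which term dominates.

```python
def total_loss(content: float, style: float, tv: float,
               alpha: float, beta: float, gamma: float) -> float:
    """Weighted sum of the three loss terms: alpha*content + beta*style + gamma*tv."""
    return alpha * content + beta * style + gamma * tv


# Hypothetical loss values for one candidate image.
content, style, tv = 2.0, 0.5, 10.0

# Favour style: the style term now contributes most of the objective.
print(total_loss(content, style, tv, alpha=1.0, beta=10.0, gamma=0.01))
# Favour content: the content term now dominates instead.
print(total_loss(content, style, tv, alpha=10.0, beta=1.0, gamma=0.01))
```

Only the ratios between the weights matter for the balance the optimizer strikes, so implementations often fix one weight and tune the others.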

131
00:09:23,600 --> 00:09:29,800
So if you want more content and less style, you can adjust the weights appropriately. To implement it,

132
00:09:29,810 --> 00:09:32,390
we iterate for a preset number of steps.

133
00:09:32,420 --> 00:09:39,370
So we set the number of epochs beforehand and then, using gradient descent, sometimes with the L-BFGS

134
00:09:39,410 --> 00:09:40,010
algorithm,

135
00:09:40,490 --> 00:09:42,260
we minimize our loss function.
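To show the iteration end to end, here is a toy sketch of plain gradient descent over the image's pixels. It makes a big simplifying assumption: the "network" is the identity, so the features are the raw pixels, which lets the gradients be written out by hand. Real implementations instead run the image through a pre-trained network and get gradients from automatic differentiation (for example `tf.GradientTape` in TensorFlow/Keras or `loss.backward()` in PyTorch), often with an optimizer such as L-BFGS or Adam. The images, weights, learning rate and step count are all hypothetical.

```python
import numpy as np

# Toy assumption: the "network" is the identity, so features are raw pixels
# with shape (channels, height, width). Real NST uses a pre-trained CNN.


def gram(f: np.ndarray) -> np.ndarray:
    flat = f.reshape(f.shape[0], -1)
    return flat @ flat.T


def loss_and_grad(x, content, style_gram, alpha, beta, gamma):
    """Weighted content + style + total variation loss and its gradient w.r.t. x."""
    c, h, w = x.shape
    n, m = c, h * w
    # Content loss: squared L2 distance to the content image.
    diff = x - content
    loss = alpha * np.sum(diff ** 2)
    grad = alpha * 2.0 * diff
    # Style loss: normalised squared distance between Gram matrices.
    d = gram(x) - style_gram
    loss += beta * np.sum(d ** 2) / (4.0 * n ** 2 * m ** 2)
    grad += beta * (d @ x.reshape(c, m)).reshape(c, h, w) / (n ** 2 * m ** 2)
    # Total variation loss: squared differences between neighbouring pixels.
    dy = x[:, 1:, :] - x[:, :-1, :]
    dx = x[:, :, 1:] - x[:, :, :-1]
    loss += gamma * (np.sum(dy ** 2) + np.sum(dx ** 2))
    tv_grad = np.zeros_like(x)
    tv_grad[:, 1:, :] += 2.0 * dy
    tv_grad[:, :-1, :] -= 2.0 * dy
    tv_grad[:, :, 1:] += 2.0 * dx
    tv_grad[:, :, :-1] -= 2.0 * dx
    grad += gamma * tv_grad
    return float(loss), grad


rng = np.random.default_rng(0)
content_img = rng.random((3, 8, 8))  # hypothetical content image
style_img = rng.random((3, 8, 8))    # hypothetical style image
style_target = gram(style_img)

x = content_img.copy()  # start from the content image
learning_rate = 0.01
num_steps = 200         # preset number of iterations
history = []
for _ in range(num_steps):
    loss, grad = loss_and_grad(x, content_img, style_target,
                               alpha=1.0, beta=1e3, gamma=0.1)
    history.append(loss)
    x -= learning_rate * grad  # gradient descent step on the pixels

print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

Starting from the content image is a common choice, since the content term is already at zero and the optimizer only has to add style and smoothness. Starting from random noise also works but usually needs more iterations.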

136
00:09:43,190 --> 00:09:44,480
So that's it.

137
00:09:45,080 --> 00:09:48,140
That's basically neural style transfer in a nutshell.

138
00:09:48,650 --> 00:09:53,210
This is a deep topic, and the research paper is actually very tough to read.

139
00:09:53,930 --> 00:09:59,330
It's quite complex, and implementing it in code is also quite complex.

140
00:09:59,960 --> 00:10:05,810
However, we'll now be doing that in both Keras and PyTorch. We'll run through the code, and you'll

141
00:10:06,130 --> 00:10:07,220
be able to train it.

142
00:10:07,350 --> 00:10:12,410
Sorry, not train it, but you'll be able to execute it and run it on your own test images.

143
00:10:12,830 --> 00:10:13,640
So it's pretty fun.

144
00:10:14,000 --> 00:10:17,390
It's basically like Deep Dream Generator, but with your own code.

145
00:10:18,110 --> 00:10:20,450
OK, so I'll see you in those lessons.

146
00:10:20,600 --> 00:10:21,050
Thank you.