1
00:00:00,120 --> 00:00:07,140
Hi there and welcome to this new session in which we are going to delve deep into generative adversarial

2
00:00:07,140 --> 00:00:08,580
neural networks.

3
00:00:09,090 --> 00:00:15,600
In the previous session, we saw how we made use of variational autoencoders to generate

4
00:00:15,600 --> 00:00:23,420
these kinds of outputs, which are in fact MNIST digits, from just an input noise.

5
00:00:23,460 --> 00:00:33,780
So what we had was this kind of encoder-decoder structure, where we had some input images here, like

6
00:00:33,780 --> 00:00:42,600
this one, for example, some inputs, and then we trained the model such that the outputs look like these

7
00:00:42,600 --> 00:00:43,560
inputs.

8
00:00:44,480 --> 00:00:52,220
And then once training is done, we could take off this encoder and then just pass in a noise signal

9
00:00:52,220 --> 00:00:55,370
in here and then generate new

10
00:00:56,730 --> 00:01:03,210
outputs like what we have here, which look like those from the original data set.

11
00:01:04,080 --> 00:01:13,350
And in this section, we'll see how to build a new category of generative models

12
00:01:13,350 --> 00:01:15,000
known as GANs.

13
00:01:15,600 --> 00:01:19,920
And we'll use them to generate images like the ones we have here.

14
00:01:19,950 --> 00:01:26,610
You should know that all those images are images of people who do not actually exist.

15
00:01:27,540 --> 00:01:34,740
But before we dive into practice and see how to build models which can build these kinds of realistic

16
00:01:34,740 --> 00:01:43,250
looking images, we shall start by understanding how these generative adversarial neural networks, that is,

17
00:01:43,260 --> 00:01:44,760
GANs, work.

18
00:01:44,790 --> 00:01:51,600
Now, these GANs were first introduced in this paper by Goodfellow et al., where the GAN architecture

19
00:01:51,600 --> 00:01:53,220
was first proposed.

20
00:01:53,820 --> 00:02:01,140
And to understand how GANs work, let's make use of this figure from Louis Bouchard's post.

21
00:02:01,170 --> 00:02:05,790
Now let's suppose we have a bank which produces real money.

22
00:02:05,820 --> 00:02:09,420
You see here we have this real $100 bill.

23
00:02:09,810 --> 00:02:16,050
And then on this other end, we have the thief who produces fake money.

24
00:02:16,080 --> 00:02:23,250
You see here, this hundred dollar bill has this man with a moustache, which is not the case for the

25
00:02:23,250 --> 00:02:24,660
real dollar bill.

26
00:02:25,470 --> 00:02:34,650
And because differences like this one and, say, this compared to this are very clear, or can

27
00:02:34,650 --> 00:02:41,670
be easily seen, this police officer is able to detect that this money is fake.

28
00:02:42,720 --> 00:02:51,510
But when this police officer detects that the bank note is fake, he or she says that it's fake

29
00:02:51,510 --> 00:02:59,160
because it has a moustache, for example, because it has this word "fake" written on it, because this

30
00:02:59,180 --> 00:03:03,240
hundred has a fake around it and stuff like that.

31
00:03:03,240 --> 00:03:13,920
So this tells the forger, or the thief, what needs to be improved in order to make sure that the next

32
00:03:13,920 --> 00:03:22,680
time this thief presents this fake money to the police officer, the officer thinks it may be this

33
00:03:22,680 --> 00:03:23,700
real money.

34
00:03:25,660 --> 00:03:31,750
And so if we suppose that real money takes a value of one and fake money takes a value of zero.

35
00:03:31,780 --> 00:03:33,310
Let's change the color.

36
00:03:33,340 --> 00:03:35,530
Fake money takes a value of zero.

37
00:03:35,530 --> 00:03:42,400
Then let's say in the first year of production of these fake dollar bills, the police officer correctly

38
00:03:42,400 --> 00:03:46,720
says, okay, this is a zero and this is a one.

39
00:03:46,720 --> 00:03:49,360
It's all very evident at the beginning.

40
00:03:49,360 --> 00:03:58,750
But then with time, this thief gains experience and now produces fake money, which looks just like

41
00:03:58,750 --> 00:03:59,800
the real money.

42
00:04:00,730 --> 00:04:09,340
And this now leaves the police officer unable to distinguish between the real and the fake any

43
00:04:09,340 --> 00:04:10,000
longer.

44
00:04:11,020 --> 00:04:19,120
Now, although we do not advocate for these kinds of malpractices, it turns out that this is the way

45
00:04:19,120 --> 00:04:21,130
GANs actually work.

46
00:04:21,910 --> 00:04:29,740
And in our case, we replace this thief with a generator model, which

47
00:04:29,740 --> 00:04:30,940
is a neural network.

48
00:04:30,940 --> 00:04:32,830
So this is going to be a neural network.

49
00:04:32,830 --> 00:04:41,860
And we replace this police officer with a discriminator, which is also a neural network, specifically

50
00:04:41,860 --> 00:04:50,650
a binary classifier, which takes in some input and says whether it is real or not.

51
00:04:51,460 --> 00:05:00,550
Whereas our generator here takes in some random inputs and then learns to output

52
00:05:01,810 --> 00:05:08,620
these bank notes such that the discriminator thinks that they are real.
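The two roles just described can be sketched in a few lines of Python. This is only a minimal illustration, not the course's implementation: the function names and the parameters `a`, `b`, `w`, `c` are hypothetical, and the "networks" are reduced to a single linear map each. The convention that 1 means real and 0 means fake follows the explanation above.

```python
import math
import random

def sigmoid(s):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

def generator(z, a=1.0, b=0.0):
    """The 'thief': maps a random noise value z to a fake sample.
    A single linear map stands in for a neural network (assumption)."""
    return a * z + b

def discriminator(x, w=1.0, c=0.0):
    """The 'police officer': a binary classifier whose output is read as
    the probability that x is real (1 = real, 0 = fake)."""
    return sigmoid(w * x + c)

random.seed(0)
z = random.gauss(0.0, 1.0)      # random input noise
fake = generator(z)             # the forged sample
score = discriminator(fake)     # discriminator's belief that it is real
print(f"noise={z:.3f}  fake={fake:.3f}  P(real)={score:.3f}")
```

In a real GAN both functions would be deep networks and the samples would be images, but the interface stays the same: noise in and a sample out for the generator, a sample in and a score between 0 and 1 out for the discriminator.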

53
00:05:10,160 --> 00:05:17,090
Now we shall head on to GAN Lab, which is a project by Minsuk Kahng et al.

54
00:05:17,570 --> 00:05:20,680
And we'll consider a very simple example.

55
00:05:20,690 --> 00:05:24,260
So here, we suppose that the data distribution is this one here.

56
00:05:24,290 --> 00:05:25,040
See that?

57
00:05:25,040 --> 00:05:28,130
And then notice how we have a generator.

58
00:05:28,490 --> 00:05:38,270
We have a discriminator, and then the generator takes in some random noise and then outputs this sample,

59
00:05:38,270 --> 00:05:45,380
and then the discriminator takes in the sample and says whether it is real or it's fake.

60
00:05:45,770 --> 00:05:52,910
Now, apart from these fake samples, it also takes in the real samples and then also says whether it's

61
00:05:52,910 --> 00:05:54,000
real or fake.

62
00:05:54,020 --> 00:06:02,870
Now, the weights of the generator and the discriminator are updated such that, after some time,

63
00:06:02,930 --> 00:06:08,780
the samples which are produced will look very much like these right here.

64
00:06:09,080 --> 00:06:14,480
So let's go ahead and click on Run and we'll see what we get.

65
00:06:17,220 --> 00:06:19,160
You see how we start with the training?

66
00:06:19,170 --> 00:06:20,650
Let's take all this off.

67
00:06:20,670 --> 00:06:22,410
See, we start with the training for now.

68
00:06:22,410 --> 00:06:22,950
The fix.

69
00:06:22,950 --> 00:06:23,460
Let's.

70
00:06:23,460 --> 00:06:25,170
Let's pause and start over.

71
00:06:25,680 --> 00:06:26,880
Let's restart that.

72
00:06:26,910 --> 00:06:27,460
Okay.

73
00:06:27,480 --> 00:06:30,170
You see, initially we get these kinds of outputs.

74
00:06:30,180 --> 00:06:31,530
You see this output.

75
00:06:31,800 --> 00:06:32,910
You can look at this here.

76
00:06:32,910 --> 00:06:42,480
So this is the kind of sample generated, and this is our real data right here.

77
00:06:43,530 --> 00:06:51,690
And one thing we notice is that, as time goes on, you have here both the greens and the purples,

78
00:06:51,690 --> 00:06:57,260
which are considered to be real, and then here the purples are considered to be fake.

79
00:06:57,270 --> 00:07:05,340
So this discriminator now starts making errors when it comes to saying whether a given sample is real

80
00:07:05,340 --> 00:07:06,150
or not.

81
00:07:07,290 --> 00:07:16,320
Whereas on the other hand, the generator is now producing samples which look much like the real samples.

82
00:07:17,310 --> 00:07:24,270
And it's because of this competition between the generator and the discriminator that they are actually called

83
00:07:24,270 --> 00:07:28,700
GANs: generative adversarial.

84
00:07:28,710 --> 00:07:34,260
This "adversarial" comes from this competition between the generator and the discriminator.

85
00:07:35,580 --> 00:07:41,910
And it now leads to the generator producing samples, which look very much like the real samples.

86
00:07:42,510 --> 00:07:43,950
One other point.

87
00:07:43,950 --> 00:07:50,580
You should notice that as we carried out this training, overall, we have two main parts.

88
00:07:50,580 --> 00:07:53,100
We have this block right here.

89
00:07:53,340 --> 00:07:55,130
Oh, let's take this block.

90
00:07:55,140 --> 00:07:57,060
We have this block.

91
00:07:57,990 --> 00:08:06,480
This is the block wherein the discriminator takes in the real samples and then outputs some value.

92
00:08:06,510 --> 00:08:12,870
The output from this is used to update the parameters of the discriminator.

93
00:08:12,870 --> 00:08:22,620
And then, when the discriminator takes in these fake samples,

94
00:08:22,620 --> 00:08:23,580
this

95
00:08:24,840 --> 00:08:29,700
output here is now used to update the generator.

96
00:08:29,700 --> 00:08:39,660
So we have this block wherein we update our discriminator, we have the block wherein we update our

97
00:08:39,660 --> 00:08:40,920
generator.
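The two update blocks can be sketched as an alternating loop. This is a toy, pure-Python illustration under several assumptions that are not from the session: the real data is drawn from a 1-D Gaussian, the generator is `G(z) = a*z + b`, the discriminator is `D(x) = sigmoid(w*x + c)`, and both are trained with the standard binary cross-entropy losses using hand-derived gradients. All names and hyperparameters are hypothetical. Note that in this sketch the discriminator block scores both real and fake samples, as in the standard formulation.

```python
import math
import random

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0    # generator parameters (assumed linear generator)
    w, c = 0.1, 0.0    # discriminator parameters (assumed logistic classifier)
    for _ in range(steps):
        x_real = rng.gauss(4.0, 0.5)   # assumed real data distribution
        z = rng.gauss(0.0, 1.0)        # random noise fed to the generator
        x_fake = a * z + b

        # Block 1: update the discriminator so that real -> 1 and fake -> 0.
        # Loss: -log D(x_real) - log(1 - D(x_fake))
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w -= lr * ((d_real - 1.0) * x_real + d_fake * x_fake)
        c -= lr * ((d_real - 1.0) + d_fake)

        # Block 2: update the generator so that the discriminator says "real".
        # Loss: -log D(G(z)); the discriminator is held fixed in this block.
        d_fake = sigmoid(w * x_fake + c)
        grad_score = d_fake - 1.0      # derivative of the loss w.r.t. the logit
        a -= lr * grad_score * w * z
        b -= lr * grad_score * w
    return a, b, w, c

a, b, w, c = train_toy_gan()
print(f"generator: a={a:.3f}, b={b:.3f}   discriminator: w={w:.3f}, c={c:.3f}")
```

Such a tiny model is not guaranteed to converge, and GAN training is known to oscillate, so the printed values are only meant to show the two blocks taking turns.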

98
00:08:42,120 --> 00:08:47,400
So with that now we could pause this and you could change the data distribution.

99
00:08:48,770 --> 00:08:57,230
And so again, compare this with the VAEs, where we had this encoder and then the decoder, where if

100
00:08:57,230 --> 00:09:04,610
we have a distribution like this one, so we have some input, we

101
00:09:04,610 --> 00:09:10,880
want to be able to get outputs which look similar to this input distribution.

102
00:09:11,690 --> 00:09:23,960
And then, after training this encoder-decoder, we now break this up and then make use of only this decoder

103
00:09:23,960 --> 00:09:27,910
to now generate outputs

104
00:09:28,940 --> 00:09:36,800
whose distribution is similar to that of these real inputs right here.

105
00:09:38,030 --> 00:09:45,140
Again, another important point to note is the fact that after training, after a certain number of

106
00:09:45,140 --> 00:09:52,160
epochs, at the point when the reals and the fakes look very similar, the discriminator becomes

107
00:09:52,160 --> 00:09:56,420
somewhat confused. If you look here, you find some green patches.

108
00:09:56,420 --> 00:09:59,120
You see, you have some green patches, purple patches.

109
00:09:59,120 --> 00:10:01,370
Here we have some green patches and purple patches.

110
00:10:01,370 --> 00:10:07,520
So it's no longer able to distinguish between the reals and the fakes.

111
00:10:08,540 --> 00:10:14,780
And so, unlike at the beginning, where it was able to say this is a one and this

112
00:10:14,780 --> 00:10:20,570
is a zero, now it sees this as a 0.5 and this as a 0.5.

113
00:10:20,570 --> 00:10:21,860
It becomes confused.
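The 0.5 can be checked numerically. The sketch below relies on a result from the Goodfellow et al. paper that is not spelled out in this session: for a fixed generator, the best possible discriminator outputs p_real(x) / (p_real(x) + p_fake(x)). The Gaussian distributions and the function names are assumptions chosen only for illustration.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    coeff = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mean) / std) ** 2)

def optimal_discriminator(x, real_mean, fake_mean, std=1.0):
    """Best achievable P(real | x) when both densities are known."""
    p_real = gaussian_pdf(x, real_mean, std)
    p_fake = gaussian_pdf(x, fake_mean, std)
    return p_real / (p_real + p_fake)

# Early in training: the fakes are far from the real data, so a real-looking
# sample is confidently scored close to 1.
early = optimal_discriminator(4.0, real_mean=4.0, fake_mean=0.0)

# Late in training: the fakes match the real data, so every sample scores 0.5.
late = optimal_discriminator(4.0, real_mean=4.0, fake_mean=4.0)

print(f"early={early:.4f}  late={late:.4f}")
```

When the two distributions coincide, no classifier can do better than a coin flip, which is the "confused" state described above.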

114
00:10:23,180 --> 00:10:28,640
And so, in fact, the aim of our GAN training process will be to

115
00:10:30,420 --> 00:10:34,830
ensure that the generator wins the fight.

116
00:10:36,030 --> 00:10:45,690
It should be noted that most of the very cool applications of GANs are in the domain of image generation.

117
00:10:45,690 --> 00:10:50,900
And so we'll look at some of these applications in this article by Jonathan Hui.

118
00:10:51,030 --> 00:10:54,000
GANs can be used in creating anime characters.

119
00:10:54,000 --> 00:11:02,760
You see here we have these anime characters, which have been generated automatically using GANs.

120
00:11:02,970 --> 00:11:11,820
And so this means that, similar to what we had here, we are going to have a real data set which contains

121
00:11:11,820 --> 00:11:15,630
images similar to what we have right here.

122
00:11:15,660 --> 00:11:18,510
Let's get back here similar to what we have right here.

123
00:11:18,510 --> 00:11:25,800
And then we're going to have a generator which will learn over time to generate images which will

124
00:11:25,800 --> 00:11:28,650
look like the real images.

125
00:11:28,650 --> 00:11:32,700
And that's how we get to have images like this one.

126
00:11:33,060 --> 00:11:39,790
Now, from here, we also have pose-guided person image generation.

127
00:11:39,810 --> 00:11:46,170
Here, for example, you'll notice that we have this input image and then if we want to have this same

128
00:11:46,170 --> 00:11:52,950
person but with a different pose, then we could pass in this pose and then get this

129
00:11:52,950 --> 00:11:54,620
output right here.

130
00:11:54,630 --> 00:12:01,920
So you see that we have this input, that is, this image with this different pose, and then this is being

131
00:12:01,920 --> 00:12:02,850
generated.

132
00:12:02,850 --> 00:12:07,020
Another application is in cross-domain translation.

133
00:12:07,020 --> 00:12:14,730
So you see here we have this input where we have, let's say, three zebras, and then we are able to transform

134
00:12:14,730 --> 00:12:21,510
this input image automatically into this other domain where we instead have horses.

135
00:12:21,510 --> 00:12:29,940
And so this output has been generated from this input, and this

136
00:12:29,940 --> 00:12:33,300
could be done in the reverse direction, as we can see here.

137
00:12:33,570 --> 00:12:42,000
You see, you could go from zebra to horse, as we had here, and then from horse to zebra.

138
00:12:42,030 --> 00:12:43,770
Then we have another example.

139
00:12:43,770 --> 00:12:51,780
This is StarGAN, which allows us to carry out translations or transformations where we modify specific

140
00:12:51,780 --> 00:12:53,880
high level features of an image.

141
00:12:53,880 --> 00:12:59,490
So right here, you see we have this input and then we add, for example, blonde hair.

142
00:12:59,550 --> 00:13:06,490
So it changes. Then the gender is modified, so that this male becomes a female. Then aged:

143
00:13:06,630 --> 00:13:13,590
he's aged. And talking about age, you could build an application which shows you

144
00:13:13,590 --> 00:13:19,410
what you would look like after, say, 20 or 50 years.

145
00:13:19,590 --> 00:13:21,480
Then here we have pale skin.

146
00:13:21,480 --> 00:13:24,420
Some other modifications we could have

147
00:13:24,420 --> 00:13:27,810
here are angry, happy, fearful.

148
00:13:27,810 --> 00:13:29,430
And so that's it.

149
00:13:29,470 --> 00:13:32,700
You see, making use of this StarGAN,

150
00:13:32,700 --> 00:13:36,990
you could carry out these kinds of transformations on an input image.

151
00:13:37,110 --> 00:13:45,180
Next, we have this PixelDTGAN, which creates clothing images and styles from an input image.

152
00:13:45,180 --> 00:13:50,820
So you see we have the source image and we have these different images which have been generated from

153
00:13:50,820 --> 00:13:52,080
this source.

154
00:13:52,470 --> 00:14:02,220
We have other examples right here, as you can see, and then we have super resolution. Now, for super resolution,

155
00:14:03,300 --> 00:14:13,260
GANs have been used to increase the resolution of an input image while making these higher-resolution

156
00:14:13,260 --> 00:14:15,570
images more realistic.

157
00:14:15,570 --> 00:14:24,120
So you see this image, for example, this bicubic one right here, which has been obtained using

158
00:14:24,120 --> 00:14:25,500
the bicubic method.

159
00:14:25,500 --> 00:14:32,880
And then we have this other image, which is obtained using a ResNet, and now you'll notice that there is

160
00:14:33,030 --> 00:14:38,490
some difference with the kind of outputs we get using

161
00:14:38,490 --> 00:14:49,740
a GAN. Notice how this image here looks more realistic, or looks much more like the original, as compared

162
00:14:49,740 --> 00:14:56,650
to what these other two non-GAN methods produced. To get an even clearer view of the difference:

163
00:14:56,670 --> 00:15:01,890
Notice this part here where we have this water boring.

164
00:15:01,980 --> 00:15:10,830
You'll notice how this looks much more realistic as compared to using a classical neural network like

165
00:15:10,830 --> 00:15:11,620
the ResNet.

166
00:15:11,640 --> 00:15:19,920
Then from here, we move on to the next application, which is that of generating faces, but this time around

167
00:15:19,920 --> 00:15:22,200
very high definition faces.

168
00:15:22,200 --> 00:15:29,280
So you see, you have 1024 by 1024 images generated. You see these images of

169
00:15:29,360 --> 00:15:34,700
people who do not actually exist, which have been obtained using ProGAN.

170
00:15:34,910 --> 00:15:36,740
That's a progressive GAN.

171
00:15:36,770 --> 00:15:40,700
Now we move to the next.

172
00:15:40,700 --> 00:15:48,170
We have StyleGAN, which comes with even better resolution and with some styling.

173
00:15:48,200 --> 00:15:56,690
So from here we go on to high-resolution image synthesis, where we take, you see, the semantic map.

174
00:15:56,690 --> 00:16:01,730
And then from here we generate this output we have right here.

175
00:16:02,360 --> 00:16:12,620
Then next is GauGAN, which, as we've seen already, takes these kinds of semantic maps and then

176
00:16:13,640 --> 00:16:15,660
produces this output.

177
00:16:15,690 --> 00:16:17,470
You see, this is the ground truth.

178
00:16:17,480 --> 00:16:21,920
This is the exact output and this is what the GAN produces.

179
00:16:22,070 --> 00:16:24,620
So you see, from here we are able to produce this.

180
00:16:24,620 --> 00:16:26,750
From this we are able to produce this.

181
00:16:26,780 --> 00:16:34,850
Now, this kind of technology could be applied in video compression in the sense that during a video

182
00:16:34,850 --> 00:16:41,090
call where we have this input, so suppose now that this is a sender and then this is a

183
00:16:41,090 --> 00:16:45,470
receiver, separated by this dotted line right here.

184
00:16:45,470 --> 00:16:53,090
So we have this input right here, and then we carry out key point extraction, where we get these key points,

185
00:16:53,090 --> 00:16:53,960
as you could see here.

186
00:16:53,960 --> 00:16:57,200
And then this is what's transmitted over the network.

187
00:16:57,200 --> 00:17:04,670
So instead of transmitting this input, we transmit these key points, and then, making use of some key

188
00:17:04,670 --> 00:17:05,870
frame which has been passed

189
00:17:05,870 --> 00:17:12,290
initially, we combine those key points with the key frame to produce

190
00:17:12,290 --> 00:17:21,290
this output right here, which looks like the original input which we wanted to pass.

191
00:17:21,290 --> 00:17:29,960
And so now, at the receiving end, we are able to get this here at a much lower bandwidth, since we are

192
00:17:29,960 --> 00:17:36,020
taking in only the key points and not this whole input image.

193
00:17:36,020 --> 00:17:43,340
Then we also have applications in text-to-image, where we could pass in a text like "this flower has long

194
00:17:43,340 --> 00:17:51,440
thin yellow petals and a lot of yellow anthers in the center", and this generates this kind of output.

195
00:17:51,800 --> 00:17:58,670
Now, talking about those kinds of applications, we could check out craiyon.com, which is in fact

196
00:17:58,690 --> 00:18:05,810
a DALL-E mini model, where we'll be able to create much more realistic outputs.

197
00:18:06,020 --> 00:18:13,940
So let's click on this, and while this is loading, we could check out the other applications. On to the

198
00:18:13,940 --> 00:18:16,310
next application. Text-to-image,

199
00:18:16,310 --> 00:18:18,890
we've seen this already. Face synthesis:

200
00:18:19,820 --> 00:18:23,570
you see right here that, with a single input image,

201
00:18:23,570 --> 00:18:26,270
we create faces in different viewing angles.

202
00:18:26,270 --> 00:18:31,430
So for example, we can use this to transform images so that they will be easier for face recognition.

203
00:18:31,430 --> 00:18:40,280
So if we suppose that we get in this kind of input, let's say we get this kind of input we could generate

204
00:18:40,280 --> 00:18:48,500
or we could convert this, for example, into this one here, such that it will be easier for a face recognition

205
00:18:48,500 --> 00:18:50,660
model to do its job.

206
00:18:51,620 --> 00:18:56,390
Now from here we go to the next one: image inpainting.

207
00:18:56,480 --> 00:19:00,500
Okay, so right here we have this input.

208
00:19:00,500 --> 00:19:01,790
Let's increase this.

209
00:19:02,090 --> 00:19:03,550
Take this off.

210
00:19:03,560 --> 00:19:06,260
So as we were saying, let's pick this one.

211
00:19:06,260 --> 00:19:11,750
For example, we have this input right here, but we have this patch which has been taken off and now

212
00:19:11,750 --> 00:19:21,020
making use of a GAN, we are able to generate an output which will be like this input without the patch.

213
00:19:21,020 --> 00:19:28,760
So it's just as if the GAN takes this patch off, and as you can see, the GAN does this job quite well.

214
00:19:29,450 --> 00:19:31,400
Let's reduce this.

215
00:19:31,400 --> 00:19:32,450
There we go.

216
00:19:33,680 --> 00:19:41,570
Before we move on, let's get back to this output created at craiyon.com, and you can see that this

217
00:19:41,570 --> 00:19:47,600
DALL-E mini model produces even more realistic images.

218
00:19:48,260 --> 00:19:56,030
Then we move on to DiscoGAN, where we could create outputs which match the style of a given

219
00:19:56,030 --> 00:19:56,750
input.

220
00:19:56,750 --> 00:20:05,360
So supposing you want to go out on a little trip, you want to, say, take this bag, and you wanted to

221
00:20:05,360 --> 00:20:08,960
get some ideas of the kind of shoes you could put on.

222
00:20:08,960 --> 00:20:16,040
Then you could call on the DiscoGAN, and you'll get this kind of output based on your

223
00:20:16,040 --> 00:20:16,640
input.

224
00:20:16,880 --> 00:20:26,060
So this is a model which is similar to the CycleGAN, which we have already seen right here,

225
00:20:26,060 --> 00:20:28,970
where we were able to move from one

226
00:20:29,590 --> 00:20:31,270
domain to another.

227
00:20:31,930 --> 00:20:37,030
One other fun project will be to generate emojis from input images.

228
00:20:37,030 --> 00:20:43,840
So here we have this input image, for example, and we have this emoji which is generated from the

229
00:20:43,840 --> 00:20:44,380
input.

230
00:20:45,160 --> 00:20:49,080
Then another very interesting application is deblurring.

231
00:20:49,090 --> 00:20:57,010
So right here we suppose that we have this input image which is clearly blurred and then we want to

232
00:20:57,220 --> 00:20:58,270
deblur this image.

233
00:20:58,280 --> 00:21:04,210
You see that we're able to produce these kinds of images making use of GANs, and you can see from these

234
00:21:04,210 --> 00:21:07,750
images here that GANs do this job quite well.

235
00:21:07,930 --> 00:21:10,780
Another application is in photo editing.

236
00:21:11,470 --> 00:21:19,420
And so now you do not need to be an expert in photo editing to carry out some of these photo edits.

237
00:21:20,080 --> 00:21:23,560
All you need now is a GAN and you're good to go.

238
00:21:23,740 --> 00:21:30,400
Apart from image generation, GANs too can be used in music generation, though in this course we shall

239
00:21:30,400 --> 00:21:32,950
focus on image generation.

240
00:21:32,950 --> 00:21:39,550
Then, in the medical domain, GANs could be used in anomaly detection. And that's it for this section.

241
00:21:39,550 --> 00:21:47,050
In the next section we are going to look at how these GANs are actually trained in practice and the type

242
00:21:47,050 --> 00:21:50,590
of loss functions we use when training these GANs.