1
00:00:00,170 --> 00:00:02,280
Hi there and welcome to the session.

2
00:00:02,300 --> 00:00:08,660
In this session we are going to look at the GAN loss function and also how GANs are trained.

3
00:00:09,110 --> 00:00:14,330
Then finally we'll look at common GAN training problems.

4
00:00:14,720 --> 00:00:21,710
So for a brief summary of what we've seen already, we'll consider that we have these two distributions

5
00:00:21,710 --> 00:00:22,460
right here.

6
00:00:22,490 --> 00:00:31,280
This distribution, shown by the black dotted line, represents the real data, which is similar

7
00:00:31,280 --> 00:00:32,480
to what we had here.

8
00:00:32,480 --> 00:00:36,170
So it is similar to this real data right here.

9
00:00:36,200 --> 00:00:43,910
And then this other distribution in green represents the fake data which is going to be generated by

10
00:00:43,910 --> 00:00:45,070
our generator.

11
00:00:45,080 --> 00:00:55,280
So, getting back here, we see this here, which is our discriminator.

12
00:00:55,280 --> 00:01:04,580
So right here we have the discriminator D. Then we have, as we've said already, the real and the fake

13
00:01:04,580 --> 00:01:05,870
distributions.

14
00:01:06,530 --> 00:01:14,120
Now, once we start training, we have this discriminator which sees that most of the real data

15
00:01:14,150 --> 00:01:21,260
is getting a score of one; that is, it classifies this data as real with a probability of one.

16
00:01:21,260 --> 00:01:31,580
And then as we approach this generated data, as we get samples from our generated distribution,

17
00:01:31,580 --> 00:01:36,620
we find that the discriminator now outputs a zero.

18
00:01:36,950 --> 00:01:45,620
Now, as we keep on training, the generator gets better, and the generator's samples now look a

19
00:01:45,620 --> 00:01:47,950
bit more like the real data.

20
00:01:47,960 --> 00:01:49,760
See, they get closer.

21
00:01:49,760 --> 00:01:56,120
This distance here becomes smaller compared to what we had before.

22
00:01:56,120 --> 00:02:02,160
So we've trained and we've gotten to this point where now this discriminator here still sees

23
00:02:02,160 --> 00:02:11,580
the real data as one, but now sometimes classifies this fake data as one too.

24
00:02:11,580 --> 00:02:18,690
So you see the samples around here are classified as one, though

25
00:02:18,690 --> 00:02:21,300
we'll still have some samples classified as zero.

26
00:02:21,300 --> 00:02:27,630
And then once we get to convergence, we have this one half here.

27
00:02:27,630 --> 00:02:36,330
So we have this classifier now which is unable to differentiate between the real and the generated

28
00:02:36,330 --> 00:02:40,950
or fake data because the distributions look very much alike.

29
00:02:40,950 --> 00:02:43,530
And so that's it for the recap of the previous session.

30
00:02:43,530 --> 00:02:47,730
Now we're going to dive into the GAN loss function.

31
00:02:47,730 --> 00:02:53,220
So right here we have this training algorithm which we could see here.

32
00:02:53,220 --> 00:02:54,000
Let's expand this.

33
00:02:54,030 --> 00:03:00,450
We have this training algorithm and, if you notice, we have these two loss functions here, one for the

34
00:03:00,450 --> 00:03:03,270
discriminator and the other for the generator.

35
00:03:03,270 --> 00:03:07,500
But these could be combined into one equation right here.

36
00:03:07,530 --> 00:03:09,810
Let's take this here.

37
00:03:09,810 --> 00:03:16,500
And now we have our two-player minimax game with this value function V.
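
For reference, the value function V shown on screen at this point is not reproduced in the transcript. Assuming it is the standard form from the original GAN paper, with p_data as the real data distribution and p_z as the noise distribution (both symbols are assumptions here), it would read:

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The first term involves only D, while the second involves both D and G, which matches the way the two losses are separated later in the session.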

38
00:03:16,530 --> 00:03:24,750
Now, when we talk about this minimax game right here, it refers to what goes on between the

39
00:03:24,750 --> 00:03:27,240
generator and the discriminator.

40
00:03:27,240 --> 00:03:31,470
So our two players in this case are our generator and the discriminator.

41
00:03:31,470 --> 00:03:37,890
Now one thing you could also do with this GAN Lab is you could provide your own distribution.

42
00:03:37,890 --> 00:03:46,950
So let's say, for example, we apply this distribution and start the training, and the generator now,

43
00:03:46,950 --> 00:03:53,040
together with the discriminator, starts to play this game where at the end, at convergence, we expect

44
00:03:53,040 --> 00:03:57,660
the generator to produce outputs which are similar to the real data.

45
00:03:57,660 --> 00:04:04,230
Now, coming back to this equation, you'll notice that we have min over G and max over D.

46
00:04:04,260 --> 00:04:14,430
So to understand this notation, you can consider that we are minimizing this expression right here

47
00:04:15,660 --> 00:04:23,400
by updating the parameters of G, and then we maximize this expression by updating the parameters of

48
00:04:23,400 --> 00:04:26,490
D, which is our discriminator, where G is our generator.

49
00:04:27,270 --> 00:04:35,160
And then if we try to separate these two, starting with the minimization, for

50
00:04:35,160 --> 00:04:36,600
example, for the generator.

51
00:04:36,600 --> 00:04:43,260
Look at this first expression, this expression right here. Let's

52
00:04:43,260 --> 00:04:43,980
have that.

53
00:04:43,980 --> 00:04:47,040
So in this expression, there is actually no G.

54
00:04:47,130 --> 00:04:53,910
So we could make use of only this second term when we're trying to minimize this whole expression with respect to

55
00:04:53,910 --> 00:04:54,840
parameters of G.

56
00:04:54,990 --> 00:04:59,580
So that's why, if you check the algorithm given to

57
00:04:59,670 --> 00:05:00,440
us right here,

58
00:05:00,450 --> 00:05:08,280
you find that for the loss for G, you have only that second expression.

59
00:05:08,370 --> 00:05:12,700
Now for D, let's get back here, for D.

60
00:05:12,850 --> 00:05:15,660
D is in both terms.

61
00:05:15,690 --> 00:05:18,450
That is, in both this expression and this other expression.

62
00:05:18,450 --> 00:05:24,060
So that's why we have the combination of the two in our algorithm right here.

63
00:05:24,180 --> 00:05:30,180
Now, to better explain or better understand this in depth, let's consider this here.

64
00:05:30,180 --> 00:05:33,660
So we've extracted for the D and for the G.

65
00:05:33,750 --> 00:05:36,030
You should also note something. Let's get back here.

66
00:05:36,030 --> 00:05:41,820
When we talk about updating the discriminator or updating the generator, it is basically our gradient descent

67
00:05:41,820 --> 00:05:42,330
step.

68
00:05:42,330 --> 00:05:48,900
Remember, if we have, for example, theta, or rather, let's take this off.

69
00:05:48,900 --> 00:05:58,980
If we have theta at a particular step, let's say step i, or to get theta at step i plus one, then

70
00:05:58,980 --> 00:06:03,040
we would have theta i as the previous step, or let's say theta i minus one.

71
00:06:03,490 --> 00:06:10,780
Let's take this off. So we want to get theta i, which is equal to theta i minus one, minus the learning

72
00:06:10,780 --> 00:06:20,980
rate times the partial derivative of the loss with respect to our theta i minus one.

73
00:06:20,980 --> 00:06:23,770
So this is basically what we have in here.

74
00:06:23,770 --> 00:06:35,110
And this expression here, where we have this inverted triangle, the nabla symbol, is

75
00:06:35,110 --> 00:06:39,010
actually this section of our gradient descent.

76
00:06:39,220 --> 00:06:43,570
And so this means that we're finding the partial derivative of our loss.

77
00:06:43,600 --> 00:06:46,930
This is the loss with respect to the parameters

78
00:06:46,960 --> 00:06:50,890
theta. Then we have the same for the generator.

79
00:06:50,920 --> 00:06:58,480
Now, let's get back here, where we're going to understand in depth what all these different expressions

80
00:06:58,480 --> 00:07:04,870
actually mean. Right here we have this real sample and this fake sample.

81
00:07:04,870 --> 00:07:07,930
So these two here, we have real and fake.

82
00:07:07,930 --> 00:07:16,810
And then we pass this real sample into the discriminator, which we hope we

83
00:07:16,810 --> 00:07:19,390
will train to see that this is a one.

84
00:07:19,390 --> 00:07:25,450
And then there is the fake sample, fake because this is a person who doesn't exist, whereas this person

85
00:07:25,450 --> 00:07:26,380
actually exists.

86
00:07:26,380 --> 00:07:34,270
And we train a GAN, which we'll see later on in this course, to produce this type of output.

87
00:07:34,270 --> 00:07:36,310
So let's get back here.

88
00:07:36,310 --> 00:07:39,580
We have our discriminator. Then, from here,

89
00:07:39,610 --> 00:07:44,770
let's instead move this, this way.

90
00:07:44,770 --> 00:07:47,350
So let's take this right here.

91
00:07:47,560 --> 00:07:48,100
Yeah.

92
00:07:48,190 --> 00:07:50,170
Okay, so we have that.

93
00:07:50,560 --> 00:07:53,350
And then here we have this right here.

94
00:07:53,350 --> 00:07:55,720
Now this is our fake sample.

95
00:07:55,720 --> 00:07:57,880
And then we have our generator.

96
00:07:57,880 --> 00:07:59,410
Let's change the color for the generator.

97
00:07:59,410 --> 00:08:01,810
So our generator produces this.

98
00:08:01,810 --> 00:08:06,610
And then we have some random noise which we send in here, because we just want to be able to

99
00:08:06,610 --> 00:08:08,680
produce this from some random noise.

100
00:08:08,680 --> 00:08:15,820
So we have this random noise right here, which goes into our G, which produces that fake

101
00:08:15,820 --> 00:08:16,420
sample.

102
00:08:16,420 --> 00:08:24,130
And then we have our discriminator, which is able to classify, that is, to say whether an input is fake or not.

103
00:08:24,490 --> 00:08:28,330
Okay, so getting back here, let's take this off.

104
00:08:28,630 --> 00:08:30,010
Getting back here.

105
00:08:30,910 --> 00:08:36,770
When we want to train our discriminator, we're going to have an input X right here.

106
00:08:36,800 --> 00:08:38,330
You see this input x?

107
00:08:38,350 --> 00:08:42,310
Now this input X is this real sample right here.

108
00:08:42,310 --> 00:08:46,900
So this is our x i, which is this here.

109
00:08:46,900 --> 00:08:49,870
And this is our discriminator, which takes in this input.

110
00:08:49,900 --> 00:08:55,450
Then, once the discriminator takes in this input, it is expected to output a one.

111
00:08:55,450 --> 00:09:04,060
And if you get back here, you find you're told to update the discriminator by ascending

112
00:09:04,090 --> 00:09:09,520
its stochastic gradient, and then to update the generator by descending its stochastic gradient.

113
00:09:09,550 --> 00:09:16,180
Now, ascending and descending are actually different. What

114
00:09:16,180 --> 00:09:22,240
we've seen already, where we had theta equals theta minus the learning rate times the partial derivative of the

115
00:09:22,240 --> 00:09:27,070
loss with respect to theta is gradient descent.

116
00:09:28,220 --> 00:09:30,730
Which is actually what we're doing for the generator.

117
00:09:30,740 --> 00:09:34,910
But when we talk about gradient ascent, it's actually this.

118
00:09:34,940 --> 00:09:38,030
Instead, we have a plus instead of a minus.

119
00:09:38,060 --> 00:09:38,870
See that?

120
00:09:38,960 --> 00:09:48,620
So let's plot our loss function here with respect to theta. For our gradient

121
00:09:48,620 --> 00:09:50,750
descent, that's classical gradient descent.

122
00:09:50,780 --> 00:09:53,750
What we want to do is to minimize this loss.

123
00:09:53,750 --> 00:09:58,250
But for our gradient ascent, we want to instead maximize this loss.
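
To make the difference between the two update rules concrete, here is a minimal sketch in plain Python. The toy function f(theta) = (theta - 3)^2, the learning rate of 0.1, and the helper names `grad`, `descent_step` and `ascent_step` are all assumptions chosen for illustration; they do not come from the GAN algorithm itself.

```python
def f(theta):
    # Toy function standing in for the loss (an assumption for illustration).
    return (theta - 3.0) ** 2

def grad(theta):
    # Derivative of f with respect to theta.
    return 2.0 * (theta - 3.0)

def descent_step(theta, lr=0.1):
    # Gradient descent: theta <- theta - lr * dL/dtheta (moves towards lower f).
    return theta - lr * grad(theta)

def ascent_step(theta, lr=0.1):
    # Gradient ascent: theta <- theta + lr * dL/dtheta (moves towards higher f).
    return theta + lr * grad(theta)

theta_down = 0.0
for _ in range(50):
    theta_down = descent_step(theta_down)

theta_up = ascent_step(0.0)

print("after 50 descent steps:", theta_down, "f =", f(theta_down))
print("after 1 ascent step:   ", theta_up, "f =", f(theta_up))
```

Repeated descent steps drive theta towards the minimizer of the toy function, while a single ascent step from the same starting point increases the function value. The only difference between the two rules is the sign in front of the learning rate.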

124
00:09:59,730 --> 00:10:07,290
And then getting back here before we move on, we'll consider this plot of the log function.

125
00:10:07,290 --> 00:10:15,300
So here if we have X and we have log of x, then we have a plot which looks like this.

126
00:10:15,330 --> 00:10:17,310
Oops, let's have this.

127
00:10:18,480 --> 00:10:19,230
There we go.

128
00:10:19,230 --> 00:10:20,880
And this is one.

129
00:10:21,150 --> 00:10:23,820
So when x equals one, the log is zero.

130
00:10:23,820 --> 00:10:31,130
And then as X takes smaller values approaching zero, the log goes towards negative infinity.

131
00:10:31,140 --> 00:10:38,550
So as x goes towards zero, obviously from the right, going in this direction,

132
00:10:38,550 --> 00:10:39,510
then

133
00:10:40,800 --> 00:10:43,320
the log of x goes towards negative infinity.
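
As a quick numerical check of the plot being described, the following sketch evaluates the natural log at a few points between 0 and 1. The particular sample points are arbitrary choices for illustration.

```python
import math

# Points approaching zero from the right, starting at x = 1.
xs = [1.0, 0.5, 0.1, 1e-3, 1e-9]
logs = [math.log(x) for x in xs]

for x, value in zip(xs, logs):
    print(f"x = {x:g}  log(x) = {value:.3f}")
```

The value at x = 1 is zero, and every smaller x gives a more negative result, which is the behaviour used in the rest of the argument.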

134
00:10:43,320 --> 00:10:44,040
So that's it.

135
00:10:44,310 --> 00:10:46,410
Now let's get back here.

136
00:10:46,980 --> 00:10:50,790
For the discriminator, as we've seen, we're doing gradient ascent.

137
00:10:50,790 --> 00:10:53,040
So we're trying to maximize this.

138
00:10:53,040 --> 00:10:59,550
Remember, from our minimax loss function, we try to maximize with respect to our discriminator.

139
00:10:59,550 --> 00:11:08,850
So when the discriminator takes in a real sample, it ideally gives us an output of one, and

140
00:11:09,270 --> 00:11:14,160
with an output of one, the log of one, as we'll see here, is going to give us zero.

141
00:11:14,160 --> 00:11:15,810
So the log of one is zero.

142
00:11:15,810 --> 00:11:17,780
So this will give us a zero.

143
00:11:17,790 --> 00:11:26,730
Now, you should note that all the outputs of our discriminator range between 0 and 1, so our discriminator

144
00:11:26,730 --> 00:11:29,700
is our usual classifier.

145
00:11:29,700 --> 00:11:32,190
So it's going to output values between 0 and 1.

146
00:11:32,190 --> 00:11:40,360
And so getting a value of zero is the highest you could get as an output after passing through the log.

147
00:11:40,360 --> 00:11:42,010
So let me explain.

148
00:11:42,110 --> 00:11:47,140
We have this discriminator which outputs a value between 0 and 1. Let's plot it out.

149
00:11:47,140 --> 00:11:48,580
Let's replot it here.

150
00:11:48,580 --> 00:11:53,710
So the discriminator outputs values between 0 and 1.

151
00:11:53,710 --> 00:11:54,430
See that?

152
00:11:54,430 --> 00:11:55,960
So this is what the discriminator outputs.

153
00:11:55,960 --> 00:12:03,520
And so when you take the log of these values between 0 and 1, the maximum value

154
00:12:03,520 --> 00:12:07,930
for the log will be a zero.

155
00:12:08,260 --> 00:12:13,330
You see that the maximum log value over all these values between 0 and 1 will be zero.

156
00:12:13,330 --> 00:12:19,120
So when you have a zero, it means you've maximized this and that falls in line with what you expect

157
00:12:19,120 --> 00:12:25,750
because we want to maximize this expression we have right here.

158
00:12:25,990 --> 00:12:26,740
Okay?

159
00:12:26,740 --> 00:12:32,950
So, getting back here, for the reals we want to output a one, and the log of one will give us the maximum

160
00:12:32,950 --> 00:12:35,170
possible value, which in this case is zero.

161
00:12:35,170 --> 00:12:38,740
Now, for this one here, we have a z.

162
00:12:38,770 --> 00:12:41,530
Remember, this x is this real image here.

163
00:12:41,650 --> 00:12:46,870
The Z here is our random noise, which we have seen already here.

164
00:12:46,870 --> 00:12:53,770
So when this random noise passes through our generator, we output a fake sample.

165
00:12:53,800 --> 00:12:58,660
See that? The generator takes in the random noise.

166
00:12:58,690 --> 00:12:59,860
Let's have this here.

167
00:12:59,890 --> 00:13:02,920
This generator outputs this fake sample.

168
00:13:02,920 --> 00:13:10,990
And once it outputs this fake sample, we take it and pass it into our discriminator.

169
00:13:10,990 --> 00:13:16,060
And we expect our discriminator to produce, instead of a one this time around,

170
00:13:16,750 --> 00:13:17,650
a zero.

171
00:13:17,920 --> 00:13:18,700
See that?

172
00:13:19,150 --> 00:13:29,320
And in our case, the log of one minus zero is the log of one, and the log of one is zero.

173
00:13:29,350 --> 00:13:35,620
That's the highest possible value we could get when we're dealing with logs of values in the range 0 to 1.

174
00:13:35,620 --> 00:13:37,450
So we're maximizing this.

175
00:13:37,490 --> 00:13:38,170
You see that?

176
00:13:38,200 --> 00:13:41,380
Now, if that's understood, we could move on to the generator.

177
00:13:41,380 --> 00:13:45,570
For the generator, we want to instead minimize this expression.

178
00:13:45,580 --> 00:13:52,270
So since we are trying to minimize this expression, we would expect the output from this to be negative

179
00:13:52,270 --> 00:13:54,470
infinity, as the lowest possible value.

180
00:13:54,490 --> 00:14:00,910
But let's put in these values and see how we obtain this negative infinity.

181
00:14:00,940 --> 00:14:03,490
Now we have Z, which is our random noise.

182
00:14:04,030 --> 00:14:08,950
When the random noise passes through G, it outputs this fake sample again.

183
00:14:08,950 --> 00:14:12,100
And now, this time around, we expect the discriminator

184
00:14:13,500 --> 00:14:18,080
to consider this fake sample to be a real sample.

185
00:14:18,090 --> 00:14:25,280
So what we're saying here is that in one instance, we want our discriminator to see this as real.

186
00:14:25,290 --> 00:14:29,670
In another instance, we want the discriminator to see this as fake.

187
00:14:30,810 --> 00:14:38,310
And depending on the instance, we're going to update the parameters of the corresponding

188
00:14:38,340 --> 00:14:39,270
network.

189
00:14:39,270 --> 00:14:46,890
So in the case where we want to see this as a fake, we want to update the parameters of the discriminator

190
00:14:47,520 --> 00:14:50,010
such that it sees this as fake.

191
00:14:50,010 --> 00:14:55,650
And then in the case where we want this to be seen as real by the discriminator, we will be updating

192
00:14:55,650 --> 00:14:57,690
the parameters of the generator.

193
00:14:57,690 --> 00:15:05,310
So in fact, what's going to be happening here is, let's change this color, when training the discriminator,

194
00:15:05,310 --> 00:15:07,200
we're going to freeze the generator.

195
00:15:07,200 --> 00:15:08,640
So we're not going to update these parameters.

196
00:15:08,640 --> 00:15:10,680
We're going to update the parameters of the discriminator.

197
00:15:10,680 --> 00:15:13,060
Now, when training,

198
00:15:13,060 --> 00:15:19,300
let's get back, when training our generator, we're going to freeze this here.

199
00:15:20,180 --> 00:15:25,320
When training the generator, we're going to freeze the discriminator and update the generator's parameters so that

200
00:15:25,340 --> 00:15:29,270
it's able to fool the discriminator into thinking that this is a real sample.

201
00:15:29,270 --> 00:15:33,560
And when the discriminator thinks it's a real sample, it's going to output a one.

202
00:15:33,770 --> 00:15:40,160
Now, log of one minus one is log of zero, and log of zero tends to negative infinity.

203
00:15:40,190 --> 00:15:41,090
See that?

204
00:15:41,090 --> 00:15:43,580
And this is the minimum possible value.

205
00:15:43,580 --> 00:15:46,550
So we're minimizing this expression.
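
The two cases walked through above can be checked numerically. The sketch below is an illustration only: the function names `discriminator_objective` and `generator_objective`, and the small `eps` used to avoid taking the log of exactly zero, are assumptions rather than anything defined in the session.

```python
import math

def safe_log(p, eps=1e-12):
    # Clamp to eps so that log(0) does not raise; an illustrative safeguard.
    return math.log(max(p, eps))

def discriminator_objective(d_real, d_fake):
    # log D(x) + log(1 - D(G(z))); the discriminator wants this as HIGH as possible.
    return safe_log(d_real) + safe_log(1.0 - d_fake)

def generator_objective(d_fake):
    # log(1 - D(G(z))); the generator wants this as LOW as possible.
    return safe_log(1.0 - d_fake)

# Ideal discriminator: real scored as 1, fake scored as 0, giving the maximum of 0.
print(discriminator_objective(1.0, 0.0))
# Confused discriminator: both scored as 0.5, giving a lower value.
print(discriminator_objective(0.5, 0.5))
# The more the discriminator is fooled (D(G(z)) close to 1), the lower the generator's value.
for d_fake in (0.0, 0.5, 0.999):
    print(d_fake, generator_objective(d_fake))
```

The discriminator's value peaks at zero when it is perfectly right, and the generator's value becomes more and more negative as the discriminator's score on fake samples approaches one.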

206
00:15:46,550 --> 00:15:52,580
So, getting back here, we have, for a number of training iterations, that's basically the outer training

207
00:15:53,750 --> 00:15:55,640
loop.

208
00:15:53,750 --> 00:15:55,640
We're going to do k steps.

209
00:15:55,670 --> 00:15:59,060
We're going to update the discriminator for k steps.

210
00:15:59,250 --> 00:16:01,370
See: for k steps, do this.

211
00:16:01,400 --> 00:16:03,950
We sample a mini-batch of noise.

212
00:16:03,950 --> 00:16:07,870
We get the noise, then a mini-batch of real samples.

213
00:16:07,880 --> 00:16:17,600
Then from here we get the generator outputs, and then we obtain the loss, which

214
00:16:17,600 --> 00:16:20,730
we use to update the discriminator parameters.

215
00:16:20,730 --> 00:16:29,090
And then for these k steps, I think in here they say they used k equals one as the number of steps to

216
00:16:29,100 --> 00:16:30,630
apply.

217
00:16:30,630 --> 00:16:33,810
So you could modify this, although in practice k equals one is fine.

218
00:16:33,810 --> 00:16:41,250
So, getting back here, after going through k steps for the discriminator, we now update the generator.

219
00:16:41,250 --> 00:16:44,010
So we sample a mini-batch of m noise samples again.

220
00:16:44,010 --> 00:16:49,260
So just like this, then we obtain this output from the generator.

221
00:16:49,290 --> 00:16:51,690
This time around, we expect it to fool the discriminator.

222
00:16:51,690 --> 00:16:57,720
So we're going to update the generator by descending this stochastic gradient such that it fools the

223
00:16:57,720 --> 00:16:58,590
discriminator.
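
The alternating procedure just described can be sketched end to end in plain Python. This is a toy illustration under stated assumptions, not the networks from the paper: the data is one-dimensional, the generator is a hypothetical linear map G(z) = a * z + b, the discriminator is a hypothetical logistic model D(x) = sigmoid(w * x + c), the real data is drawn from a normal distribution with mean 4, and the gradients are written out by hand.

```python
import math
import random

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def discriminator_step(d, g, reals, noise, lr):
    """One gradient ASCENT step on mean[log D(x) + log(1 - D(G(z)))]."""
    w, c = d
    a, b = g                                  # generator is frozen in this step
    grad_w = grad_c = 0.0
    for x, z in zip(reals, noise):
        fake = a * z + b
        p_real = sigmoid(w * x + c)
        p_fake = sigmoid(w * fake + c)
        grad_w += (1.0 - p_real) * x - p_fake * fake
        grad_c += (1.0 - p_real) - p_fake
    m = len(reals)
    return (w + lr * grad_w / m, c + lr * grad_c / m)

def generator_step(d, g, noise, lr):
    """One gradient DESCENT step on mean[log(1 - D(G(z)))]."""
    w, c = d                                  # discriminator is frozen in this step
    a, b = g
    grad_a = grad_b = 0.0
    for z in noise:
        p_fake = sigmoid(w * (a * z + b) + c)
        grad_a += -p_fake * w * z
        grad_b += -p_fake * w
    m = len(noise)
    return (a - lr * grad_a / m, b - lr * grad_b / m)

def train(iterations=200, k=1, m=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    d = (0.0, 0.0)                            # (w, c)
    g = (1.0, 0.0)                            # (a, b)
    for _ in range(iterations):
        for _ in range(k):                    # k discriminator updates
            noise = [rng.gauss(0.0, 1.0) for _ in range(m)]
            reals = [rng.gauss(4.0, 1.0) for _ in range(m)]
            d = discriminator_step(d, g, reals, noise, lr)
        noise = [rng.gauss(0.0, 1.0) for _ in range(m)]
        g = generator_step(d, g, noise, lr)   # then one generator update
    return d, g

if __name__ == "__main__":
    d, g = train()
    print("discriminator (w, c):", d)
    print("generator (a, b):    ", g)
```

The discriminator step adds the gradient (ascent) and the generator step subtracts it (descent), and each step leaves the other player's parameters untouched, which is the freezing described earlier. Such a small model is only meant to show the structure of the loop; it says nothing about how well a real GAN converges.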

224
00:16:58,590 --> 00:17:00,210
And that's it for this section.

225
00:17:00,210 --> 00:17:07,410
In the next section, we are going to get into some practice and see how to get the kind of results

226
00:17:07,410 --> 00:17:08,610
we had here.

227
00:17:08,610 --> 00:17:15,090
So these are the real samples, and what we will obtain will be something like this.

228
00:17:15,120 --> 00:17:21,450
See, we're able to generate these fake outputs, these people who do not actually exist, and they look

229
00:17:21,450 --> 00:17:22,620
pretty realistic.

230
00:17:22,650 --> 00:17:28,620
Then one point we have to note before we move on is that the type of neural network used in this original

231
00:17:28,620 --> 00:17:34,230
GAN paper is a classic artificial neural network, and these were the kinds of outputs they got.

232
00:17:34,260 --> 00:17:38,610
But in the practical session, we're going to be making use of the DCGAN.

233
00:17:38,610 --> 00:17:42,120
So instead of the simple GAN, we'll make use of the DCGAN.

234
00:17:42,150 --> 00:17:45,450
The DC actually stands for Deep Convolutional.

235
00:17:45,450 --> 00:17:46,920
Um, and that's it.

236
00:17:46,960 --> 00:17:54,180
DC, deep convolutional. So DCGAN means deep convolutional generative adversarial network.

237
00:17:54,210 --> 00:18:00,540
Okay, so here we're going to be looking at neural networks which are convolution-based.

238
00:18:00,540 --> 00:18:02,460
And so with that, see you in the next section.