1
00:00:00,170 --> 00:00:08,120
Hello, everyone, and welcome to this new section in which we are going to be building our own variational

2
00:00:08,120 --> 00:00:11,000
autoencoder models from scratch.

3
00:00:11,000 --> 00:00:20,630
Previously we saw how variational autoencoders could be used in helping to generate new images where

4
00:00:20,630 --> 00:00:30,230
we build out this encoder decoder structure such that we could produce outputs which are similar to

5
00:00:30,230 --> 00:00:34,160
the inputs while being entirely new images.

6
00:00:34,520 --> 00:00:41,750
In this section, what we'll be doing is building our own variational autoencoders and generating

7
00:00:41,780 --> 00:00:43,700
our own images.

8
00:00:43,970 --> 00:00:46,250
The data we'll be using to train

9
00:00:46,250 --> 00:00:54,260
our variational autoencoder is the MNIST dataset, which you can get from TensorFlow Datasets.

10
00:00:54,260 --> 00:01:02,310
So right here we load this dataset and then we concatenate both the training and the test datasets.

11
00:01:02,310 --> 00:01:10,020
So generally we have a dataset made up of, say, x_train and y_train, x_test and y_test.

12
00:01:10,020 --> 00:01:14,430
But since here we are not going to be making use of the outputs,

13
00:01:14,440 --> 00:01:16,260
that is, y_train and y_test,

14
00:01:16,290 --> 00:01:22,230
we just take the two image arrays and concatenate them, since we won't be having a test set.

15
00:01:22,230 --> 00:01:24,180
So basically we have that.

16
00:01:24,180 --> 00:01:30,630
And then one preprocessing step we take is that we divide

17
00:01:30,630 --> 00:01:36,900
these values by 255, so we normalize our dataset.
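
As a rough sketch of this preparation step, assuming NumPy and small stand-in arrays in place of the real download (the name `prepare_digits` and the added channel axis are assumptions, not taken from the video's notebook):

```python
import numpy as np

def prepare_digits(x_train, x_test):
    """Merge the two image arrays and scale the pixels to [0, 1]."""
    # The labels are not needed, so only the images are concatenated.
    digits = np.concatenate([x_train, x_test], axis=0)
    # Add a trailing channel axis so each image is 28 x 28 x 1 (assumed step).
    digits = np.expand_dims(digits, axis=-1)
    # Pixel values are 0..255, so dividing by 255 normalizes them.
    return digits.astype("float32") / 255.0

# Stand-in arrays with the MNIST image shape (hypothetical data).
fake_train = np.full((6, 28, 28), 255, dtype=np.uint8)
fake_test = np.zeros((1, 28, 28), dtype=np.uint8)

mnist_digits = prepare_digits(fake_train, fake_test)
print(mnist_digits.shape, mnist_digits.min(), mnist_digits.max())
```

With the real data, the two arrays could come from a loader such as `tf.keras.datasets.mnist.load_data()`, which returns 60,000 training and 10,000 test images, 70,000 in total.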

18
00:01:36,900 --> 00:01:45,030
So let's run this and then once our dataset has been downloaded, what we do now is we convert this

19
00:01:45,030 --> 00:01:49,470
dataset into the TensorFlow data format.

20
00:01:49,470 --> 00:02:03,600
So we have our dataset, which is tf.data.Dataset.from_tensor_slices.

21
00:02:03,600 --> 00:02:10,590
So you see that we take that and then we pass in our MNIST digits, which we've already downloaded.

22
00:02:10,590 --> 00:02:13,950
So we run that and that should be fine.

23
00:02:14,490 --> 00:02:20,880
Now we could check out the length of this dataset and you see it should have 70,000.

24
00:02:20,880 --> 00:02:25,770
So we have 70,000 different data points which make up our dataset.

25
00:02:26,490 --> 00:02:29,310
From here we're going to define the batch size.

26
00:02:29,310 --> 00:02:33,210
So we will have a batch size of 128.

27
00:02:33,660 --> 00:02:34,590
That's it.

28
00:02:34,590 --> 00:02:39,510
And then we'll go through the usual steps, starting with shuffling our dataset.

29
00:02:39,540 --> 00:02:44,130
We have our dataset, we shuffle, we batch, and then we prefetch.
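
The shuffle, batch and prefetch steps could look roughly like this. It is a sketch on random stand-in images; the shuffle buffer size of 1024 is an assumption, since the video does not state the value used:

```python
import numpy as np
import tensorflow as tf

BATCH_SIZE = 128

# Random stand-in for the normalized digits (hypothetical data).
mnist_digits = np.random.rand(512, 28, 28, 1).astype("float32")

# One dataset element per image.
dataset = tf.data.Dataset.from_tensor_slices(mnist_digits)

train_dataset = (
    dataset
    .shuffle(buffer_size=1024, reshuffle_each_iteration=True)  # assumed buffer size
    .batch(BATCH_SIZE)
    .prefetch(tf.data.AUTOTUNE)
)

first_batch = next(iter(train_dataset))
print(len(dataset), first_batch.shape)
```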

30
00:02:44,130 --> 00:02:48,420
Now if you're new to this, you could check out the previous sections in this course.

31
00:02:48,420 --> 00:02:55,350
Anyway, we have these three steps, as we've said, and now we can run this.

32
00:02:56,510 --> 00:02:57,530
So that's it.

33
00:02:57,550 --> 00:02:59,300
We could call this the train

34
00:03:00,180 --> 00:03:01,080
dataset.

35
00:03:01,560 --> 00:03:02,970
There we go.

36
00:03:03,000 --> 00:03:03,690
You see that?

37
00:03:03,690 --> 00:03:07,590
We have this train dataset here, and we can see its shape.

38
00:03:07,590 --> 00:03:14,430
So it's all 28 by 28 by one images we have in our dataset, and there are 70,000 of them.

39
00:03:14,430 --> 00:03:16,770
So that's fine.

40
00:03:17,850 --> 00:03:21,210
Now, getting to the modeling, we're going to start with the encoder.

41
00:03:21,240 --> 00:03:26,340
You can recall that what we had seen so far was, oops.

42
00:03:26,370 --> 00:03:34,500
What we had seen so far was this model or this encoder model, which takes in an input image right here

43
00:03:34,500 --> 00:03:40,530
and then outputs the mean and the variance.

44
00:03:40,530 --> 00:03:47,670
So we have the mean and the variance, and then these two are

45
00:03:49,070 --> 00:03:56,810
combined via the reparameterization technique, where we have mu plus sigma

46
00:03:57,630 --> 00:04:01,500
times a random value drawn from a normal distribution.

47
00:04:01,500 --> 00:04:05,820
And then this Z is passed into a decoder here.

48
00:04:06,090 --> 00:04:13,080
So we pass this into a decoder and then we get an output image, such that the difference between

49
00:04:13,080 --> 00:04:15,210
these two is minimized.

50
00:04:15,240 --> 00:04:20,760
Now, that said, let's get back to the code and we design our encoder.

51
00:04:20,760 --> 00:04:24,960
So our encoder here is going to be a very simple convnet.

52
00:04:24,990 --> 00:04:28,080
We'll start by defining the latent dimension.

53
00:04:28,890 --> 00:04:31,190
Um, there we go.

54
00:04:31,200 --> 00:04:32,580
Let's just put this right here.

55
00:04:32,580 --> 00:04:35,880
So latent dimension will be two.

56
00:04:36,180 --> 00:04:40,620
We have that, and then we get back here for our encoder.

57
00:04:40,620 --> 00:04:44,820
We're going to start, as we said, with this encoder input.

58
00:04:44,820 --> 00:04:46,680
So we have an encoder input.

59
00:04:46,680 --> 00:04:55,650
And this has a shape which we're going to set to 28 by 28 by one,

60
00:04:55,650 --> 00:04:59,380
the same as that of the images in our dataset.

61
00:04:59,380 --> 00:05:00,640
So that's the input.

62
00:05:00,670 --> 00:05:01,780
We've seen this already.

63
00:05:01,780 --> 00:05:15,760
And then from here we'll define a Conv2D layer, which has 32 filters, a three by three kernel,

64
00:05:15,970 --> 00:05:17,410
and ReLU activation.

65
00:05:17,530 --> 00:05:23,770
We're supposing that you already have some background knowledge of convnets. So: activation

66
00:05:23,770 --> 00:05:29,740
ReLU, number of strides equal to two, and the padding

67
00:05:29,770 --> 00:05:30,640
"same".

68
00:05:30,640 --> 00:05:34,300
So we're going to build this very basic convnet.

69
00:05:34,330 --> 00:05:36,730
Now, this takes in the encoder inputs.

70
00:05:36,760 --> 00:05:41,800
Recall that we're using the Keras functional API right here.

71
00:05:41,800 --> 00:05:44,690
So we have encoder inputs.

72
00:05:44,710 --> 00:05:46,960
We then create another conv layer.

73
00:05:46,960 --> 00:05:49,330
We'll just basically copy this and paste it out.

74
00:05:49,330 --> 00:05:55,780
And then what we'll have here is an increased number of channels.

75
00:05:55,780 --> 00:06:01,540
So we have 64 here, and from here we'll go ahead and flatten our outputs.

76
00:06:01,570 --> 00:06:09,460
Now note that this layer takes in x, so we should change its input to x. From here we move on to Flatten, so we

77
00:06:09,460 --> 00:06:13,660
have Flatten, and this takes in x.

78
00:06:13,660 --> 00:06:15,940
So now the output is flattened.

79
00:06:16,630 --> 00:06:23,530
We are now going to output both the mean and the standard deviation which we are going to use in sampling.

80
00:06:24,340 --> 00:06:28,900
But before that we'll pass this into another dense layer.

81
00:06:28,900 --> 00:06:39,310
So here we have this Dense layer with, say, 16 outputs and ReLU activation, and then we

82
00:06:39,310 --> 00:06:45,850
take in x. Okay, so we have that, and now (let's just copy from here)

83
00:06:45,880 --> 00:06:49,600
we're ready to get the mean and the standard deviation.

84
00:06:49,930 --> 00:06:51,160
Copy that.

85
00:06:51,430 --> 00:06:53,620
Paste it out here and this.

86
00:06:53,650 --> 00:06:58,180
Okay, so here we have the mean and we'll have the standard deviation.

87
00:06:58,180 --> 00:07:00,880
So we have Dense, activation

88
00:07:00,880 --> 00:07:02,500
ReLU, for the standard deviation.

89
00:07:02,500 --> 00:07:09,490
And then here, remember, the output size has to be the latent dimension.

90
00:07:09,490 --> 00:07:17,110
So here, instead of 16, we now have the latent dimension, which we've already fixed right here to

91
00:07:17,110 --> 00:07:17,740
be two.

92
00:07:18,580 --> 00:07:24,940
Now, the reason we set these activations to ReLU here is simply the

93
00:07:24,940 --> 00:07:29,950
fact that the standard deviation must be a positive number (the mean, strictly speaking, can be negative).

94
00:07:29,950 --> 00:07:37,570
So because a dense layer on its own may output negative numbers here, we want to always make sure that

95
00:07:37,570 --> 00:07:40,060
the values we get are positive.

96
00:07:40,090 --> 00:07:40,990
Now.

97
00:07:42,210 --> 00:07:49,950
The problem with the standard deviation in particular is that it's usually a very small number

98
00:07:49,950 --> 00:07:59,180
between 0 and 1, very far away from one, meaning that the number is instead

99
00:07:59,190 --> 00:08:00,180
much closer to zero.

100
00:08:00,180 --> 00:08:03,720
So we'll have a number very close to zero like this.

101
00:08:04,730 --> 00:08:14,170
But then the problem with working with ReLU is that having to compute derivatives around

102
00:08:14,180 --> 00:08:19,730
zero can lead to numerical instability during training.

103
00:08:19,730 --> 00:08:27,380
And so what we want to do instead is to map this range of values or this possible range of values that

104
00:08:27,380 --> 00:08:30,840
the standard deviation can take to a larger range.

105
00:08:30,860 --> 00:08:34,490
Now, to carry out this mapping, we have to use a function

106
00:08:35,580 --> 00:08:44,430
which is both continuous in this range and monotonic, that is, either always increasing or always decreasing.

107
00:08:44,460 --> 00:08:49,380
Now, one great function for this task will be the log function.

108
00:08:51,600 --> 00:08:59,340
This log function will map values of x in the range 0 to 1 to values in the range

109
00:08:59,940 --> 00:09:02,070
(let's open it up here)

110
00:09:02,070 --> 00:09:08,880
starting at negative infinity, since the limit of the log as x goes to zero is negative infinity.

111
00:09:08,880 --> 00:09:11,370
If you plot out the log, you will have something like this.

112
00:09:12,180 --> 00:09:13,830
Um, let's have something like this.

113
00:09:13,830 --> 00:09:19,320
So you find that as x goes towards zero, the log goes towards negative infinity.

114
00:09:19,320 --> 00:09:23,820
So that's why we have negative infinity here, and the log of one

115
00:09:24,990 --> 00:09:26,100
is zero.
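
To make this mapping concrete, here is a tiny sketch using only Python's standard `math` module; the sample sigma values are made up for illustration:

```python
import math

# Made-up standard deviations between 0 (exclusive) and 1 (inclusive).
sigmas = [1e-6, 1e-3, 0.1, 0.5, 1.0]

# The natural log spreads these tightly packed values over a wide negative range.
log_sigmas = [math.log(s) for s in sigmas]

for s, log_s in zip(sigmas, log_sigmas):
    print(f"sigma = {s:<8g} log(sigma) = {log_s:.3f}")
```

The values keep their order (the log is increasing), and the closer sigma gets to zero, the further its log moves towards negative infinity.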

116
00:09:26,550 --> 00:09:31,710
So we go from this range to this larger range.

117
00:09:31,710 --> 00:09:36,090
Hence we can have a much more stable training process.

118
00:09:36,660 --> 00:09:44,190
And so what we'll do now is, instead of relying on this ReLU activation to ensure that our standard

119
00:09:44,190 --> 00:09:52,140
deviation is always positive, we'll have the network output the log of the standard deviation

120
00:09:52,140 --> 00:09:57,120
squared, which is the log of the variance.

121
00:09:58,080 --> 00:10:03,620
So we're going to take this off for both the standard deviation and the mean.

122
00:10:03,630 --> 00:10:04,770
Let's take that off.

123
00:10:04,770 --> 00:10:10,560
And then right here we have log_var.

124
00:10:10,590 --> 00:10:16,740
That's the log of the standard deviation squared, and then we'll move on to the sampling

125
00:10:16,740 --> 00:10:19,050
process, where we're going to obtain z.

126
00:10:19,260 --> 00:10:27,220
Remember, z is equal to mu, which is the mean, plus the standard deviation times epsilon, where

127
00:10:27,220 --> 00:10:32,260
epsilon here is a random number drawn from a standard normal distribution.

128
00:10:32,500 --> 00:10:38,920
We are now going to create the sampling layer which takes in the mean and the log of the variance and

129
00:10:38,920 --> 00:10:40,810
then outputs the Z.

130
00:10:40,840 --> 00:10:45,460
So here we have Z equals sampling.

131
00:10:46,060 --> 00:10:47,170
There we go.

132
00:10:47,170 --> 00:10:54,940
We're now taking as inputs the mean and the log of the variance, and that's it.

133
00:10:54,940 --> 00:10:59,320
So from here we're going to create the sampling layer.

134
00:10:59,320 --> 00:11:02,080
So let's go ahead and create the sampling layer.

135
00:11:02,170 --> 00:11:06,190
So we have here sampling, sampling.

136
00:11:06,220 --> 00:11:07,510
There we go.

137
00:11:07,510 --> 00:11:09,520
And this is a layer.

138
00:11:09,940 --> 00:11:11,590
Okay, so we have that.

139
00:11:11,590 --> 00:11:17,950
Then we just have this call method, which takes in our inputs.

140
00:11:18,190 --> 00:11:20,650
Let's just have our inputs.

141
00:11:20,650 --> 00:11:27,820
And then from those inputs, what we'll have is the mean and the variance.

142
00:11:28,060 --> 00:11:32,680
So basically we extract the mean and the variance from these inputs.

143
00:11:32,680 --> 00:11:39,130
Remember, we have the mean and the log; let's say mean and log_var, actually.

144
00:11:39,130 --> 00:11:47,740
So we have mean and log_var, which we extract from the inputs.

145
00:11:49,730 --> 00:11:57,770
If you remember what we have here, the way we obtain z is mu plus sigma times this random number right

146
00:11:57,770 --> 00:11:58,260
here.

147
00:11:58,280 --> 00:12:07,160
Now let's see how, when we're given mu and the log of the variance,

148
00:12:07,190 --> 00:12:11,570
we are able to obtain this sigma. We already have mu,

149
00:12:11,570 --> 00:12:12,880
but now we need to get sigma.

150
00:12:12,890 --> 00:12:17,630
Now note that sigma is the standard deviation.

151
00:12:17,630 --> 00:12:21,710
Let's just write sigma, and sigma itself

152
00:12:23,080 --> 00:12:27,310
is equal to the square root of the variance.

153
00:12:27,460 --> 00:12:29,770
You see that it's equal to square root of the variance.

154
00:12:29,770 --> 00:12:38,620
And then we know that the variance, written like this, can be written as e, the

155
00:12:38,620 --> 00:12:43,570
exponential, to the power of the log of the variance.

156
00:12:44,650 --> 00:12:52,480
Generally, we know that x is equal to e to the log of x, for positive x.

157
00:12:52,480 --> 00:12:56,230
So here we have E to the log of the variance and then we also have the square root.

158
00:12:56,230 --> 00:12:57,810
So let's have this here.

159
00:12:57,820 --> 00:13:03,100
Now this is equal to e to the log of the variance

160
00:13:03,100 --> 00:13:06,520
(let's just write the variance as V), and all of this to the power of a half.

161
00:13:07,340 --> 00:13:14,570
We also know that e to the power of x, all of this to the power

162
00:13:14,570 --> 00:13:16,740
of a, is equal to

163
00:13:16,790 --> 00:13:18,980
e to the power of a times x.

164
00:13:19,010 --> 00:13:19,640
See that?

165
00:13:19,640 --> 00:13:28,280
So here this is going to be equal to e to the power of a half times log V, that is, a half times the log

166
00:13:28,280 --> 00:13:29,360
variance.
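
Written out as one chain, the derivation just described reads as follows. It uses only the two identities mentioned, x = e^{log x} for positive x and (e^{x})^{a} = e^{ax}, with the symbol V for the variance as in the narration:

```latex
\sigma \;=\; \sqrt{V}
       \;=\; \left(e^{\log V}\right)^{1/2}
       \;=\; e^{\frac{1}{2}\log V},
\qquad V = \sigma^{2} > 0 .
```

So given the log of the variance, sigma is recovered by halving it and exponentiating.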

167
00:13:29,930 --> 00:13:37,100
So now that we have the log of the variance, to obtain sigma this is basically what we need

168
00:13:37,100 --> 00:13:37,700
to do.

169
00:13:38,000 --> 00:13:51,110
So let's get back, and then we have the mean plus sigma, which is the exponential

170
00:13:51,110 --> 00:14:00,170
of a half, so 0.5, times the log of the variance.

171
00:14:00,950 --> 00:14:07,560
And then we need to multiply this sigma by our random value.

172
00:14:07,560 --> 00:14:17,580
So we have tf.random.normal, and then we specify a shape, which is simply the batch size by the

173
00:14:17,910 --> 00:14:20,670
latent dimension.
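
A minimal sketch of such a sampling layer is shown here, assuming TensorFlow 2.x. It takes the noise shape from the mean tensor, which is batch size by latent dimension; the notebook in the video may pass those two sizes explicitly instead:

```python
import tensorflow as tf

class Sampling(tf.keras.layers.Layer):
    """Draws z = mean + sigma * epsilon, with sigma = exp(0.5 * log_var)."""

    def call(self, inputs):
        mean, log_var = inputs
        # epsilon ~ N(0, 1), same shape as mean: (batch_size, latent_dim).
        epsilon = tf.random.normal(shape=tf.shape(mean))
        return mean + tf.math.exp(0.5 * log_var) * epsilon

# Quick check with a very negative log variance: sigma is then almost zero,
# so the sample should land practically on the mean.
mean = tf.constant([[1.0, -2.0], [0.5, 3.0]])
log_var = tf.fill((2, 2), -100.0)
z = Sampling()([mean, log_var])
print(z.numpy())
```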

174
00:14:21,150 --> 00:14:26,360
Now that we have this, we can now go ahead and define our encoder model.

175
00:14:26,400 --> 00:14:34,680
We'll call this encoder_model, which is a TensorFlow Model, and we have encoder_inputs for the input.

176
00:14:34,710 --> 00:14:35,640
There we go.

177
00:14:35,670 --> 00:14:36,540
That's it here.

178
00:14:36,540 --> 00:14:45,150
And then for the outputs, we have this list made of z, the mean and the log variance.

179
00:14:45,150 --> 00:14:49,500
So we have mean and log var.

180
00:14:49,560 --> 00:14:51,270
Okay, so let's give it a name.

181
00:14:51,270 --> 00:14:53,580
We'll call this encoder.
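
Putting the encoder steps together, a sketch could look like this. It assumes TensorFlow 2.x and strides of two in both conv layers; variable names such as `encoder_inputs` and `encoder_model` follow the narration but are not copied from the notebook, and the sampling layer is repeated so the block runs on its own:

```python
import tensorflow as tf

layers = tf.keras.layers
LATENT_DIM = 2

class Sampling(layers.Layer):
    """z = mean + exp(0.5 * log_var) * epsilon."""
    def call(self, inputs):
        mean, log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(mean))
        return mean + tf.math.exp(0.5 * log_var) * epsilon

encoder_inputs = layers.Input(shape=(28, 28, 1))
# Two strided convolutions: 28x28 -> 14x14 -> 7x7, with more channels each time.
x = layers.Conv2D(32, 3, activation="relu", strides=2, padding="same")(encoder_inputs)
x = layers.Conv2D(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Flatten()(x)
x = layers.Dense(16, activation="relu")(x)

# No activation on these two heads: the mean may be negative, and the
# log variance may take any real value.
mean = layers.Dense(LATENT_DIM)(x)
log_var = layers.Dense(LATENT_DIM)(x)
z = Sampling()([mean, log_var])

encoder_model = tf.keras.Model(encoder_inputs, [z, mean, log_var], name="encoder")
encoder_model.summary()
```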

182
00:14:53,610 --> 00:14:57,450
Now, once we have this, we can get a summary of it.

183
00:14:57,450 --> 00:14:59,550
So we have encoder_model

184
00:14:59,550 --> 00:15:01,020
dot summary,

185
00:15:01,910 --> 00:15:04,490
which we can visualize right here.

186
00:15:04,520 --> 00:15:05,210
See that?

187
00:15:06,740 --> 00:15:07,370
From here.

188
00:15:07,370 --> 00:15:10,850
Now we will move on to defining our decoder.

189
00:15:12,220 --> 00:15:18,550
The decoder, as we've seen already, will take in the z here

190
00:15:19,410 --> 00:15:22,560
and then output the images.

191
00:15:24,310 --> 00:15:30,010
So we'll go ahead here and create the inputs, which we'll call latent_inputs.

192
00:15:30,010 --> 00:15:37,210
And there we have our TensorFlow Input, and the shape is going to be the same as that of z.

193
00:15:37,210 --> 00:15:40,930
So right here we have the latent dimension, and that's fine.

194
00:15:41,290 --> 00:15:43,630
Okay, so now we have this.

195
00:15:43,660 --> 00:15:46,750
We're going to take our Z, Let's get back here.

196
00:15:47,380 --> 00:15:59,890
So now we have to upsample this z, which has shape two, into an image of shape 28 by 28 by one.

197
00:16:00,070 --> 00:16:04,420
But generally, what we've been used to doing is downsampling.

198
00:16:04,420 --> 00:16:13,420
So in downsampling we have an input, we have some convnet layers, we stack them up, and then when

199
00:16:13,420 --> 00:16:21,070
we get towards the end we could flatten and then we have some dense layers with a specific output which

200
00:16:21,070 --> 00:16:23,440
matches the type of output we want to get.

201
00:16:23,530 --> 00:16:27,470
But in our case, we're now doing some sort of the opposite of this.

202
00:16:27,470 --> 00:16:34,790
So what we'll do is we'll go through some dense layer here, and then

203
00:16:34,790 --> 00:16:42,170
from this dense layer we'll pass this into a transposed convolution layer.

204
00:16:42,170 --> 00:16:49,700
So we've been using the convolution layer, but here we'll use the Conv2DTranspose layer.

205
00:16:49,910 --> 00:17:00,890
Essentially, what we have is this Conv2DTranspose layer with its weights, which upsamples its inputs.

206
00:17:01,670 --> 00:17:09,170
Now getting back to the code, we have this input, which has shape batch size by latent dimension, let's

207
00:17:09,170 --> 00:17:09,890
say two.

208
00:17:10,040 --> 00:17:18,860
And then from here we want to make use of a Conv2DTranspose layer, which takes inputs of shape batch size by

209
00:17:18,860 --> 00:17:23,060
some x, by some y, by some z.

210
00:17:23,090 --> 00:17:25,880
That's similar to the conv layer.

211
00:17:25,880 --> 00:17:30,800
So what we will have to do is we're going to reshape this actually.

212
00:17:30,800 --> 00:17:34,010
So we have to reshape this such that we have something like this.

213
00:17:34,010 --> 00:17:44,150
Now if we want to have an x, y, z such that x is, for example, seven, y is seven and z is, say,

214
00:17:44,150 --> 00:17:46,520
64, let's take that example,

215
00:17:47,390 --> 00:17:57,770
then we have to ensure that what we're getting after this has shape B by

216
00:17:57,770 --> 00:18:03,170
seven times seven times 64.

217
00:18:03,680 --> 00:18:08,270
You see, um, getting back here we have this.

218
00:18:08,270 --> 00:18:11,390
Let's maybe redraw this so it's clear.

219
00:18:11,390 --> 00:18:19,970
So what we're saying is we have this input and then what we intend to have is something like a batch

220
00:18:19,970 --> 00:18:25,360
size by seven, by seven by 64.

221
00:18:25,370 --> 00:18:31,520
Now, the reason why we're picking seven is because the output is 28 by 28.

222
00:18:31,520 --> 00:18:40,760
So, to be able to upsample, we can go from seven by seven, upsampled to 14 by 14, and then

223
00:18:40,760 --> 00:18:44,510
the 14 by 14 upsampled to 28 by 28.

224
00:18:44,510 --> 00:18:46,220
So that's why we're picking seven here.
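
The size bookkeeping behind the choice of seven can be checked with a few lines of plain Python. The two helper functions are hypothetical names; they encode the output-size rule Keras uses for "same" padding, where a strided convolution gives ceil(size / stride) and a strided transposed convolution gives size * stride:

```python
import math

def conv_same(size, stride):
    # Spatial output size of a Conv2D layer with padding="same".
    return math.ceil(size / stride)

def conv_transpose_same(size, stride):
    # Spatial output size of a Conv2DTranspose layer with padding="same".
    return size * stride

# Encoder: two stride-2 convolutions shrink the image.
encoder_sizes = [28]
for _ in range(2):
    encoder_sizes.append(conv_same(encoder_sizes[-1], 2))

# Decoder: two stride-2 transposed convolutions grow it back.
decoder_sizes = [7]
for _ in range(2):
    decoder_sizes.append(conv_transpose_same(decoder_sizes[-1], 2))

# Units the dense layer needs so its output can be reshaped to 7 x 7 x 64.
dense_units = 7 * 7 * 64

print(encoder_sizes, decoder_sizes, dense_units)
```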

225
00:18:46,220 --> 00:18:51,500
Now, this input here, you see, doesn't match this target shape.

226
00:18:51,500 --> 00:18:54,050
There's no way we could reshape it to become this.

227
00:18:54,050 --> 00:19:01,520
So what we'll do is pass it through a dense layer whose output has shape batch by seven times

228
00:19:01,520 --> 00:19:04,940
seven times 64.

229
00:19:04,940 --> 00:19:07,940
And now after reshaping, we could obtain this.

230
00:19:07,940 --> 00:19:16,760
So let's get into the code, and what we'll have is our Dense

231
00:19:16,760 --> 00:19:24,440
layer, with seven times seven times 64 units, and then

232
00:19:25,570 --> 00:19:27,280
we have the activation.

233
00:19:28,490 --> 00:19:29,480
The activation is

234
00:19:29,480 --> 00:19:33,670
ReLU, and this simply takes in the latent inputs.

235
00:19:33,680 --> 00:19:34,980
So we have here.

236
00:19:35,000 --> 00:19:36,230
Latent inputs.

237
00:19:36,230 --> 00:19:38,630
There we go. Then from here

238
00:19:38,630 --> 00:19:40,190
we do the reshaping.

239
00:19:40,190 --> 00:19:45,410
So we have x equals Reshape, and then we specify the shape.

240
00:19:45,410 --> 00:19:50,030
So we have seven by seven by 64.

241
00:19:50,030 --> 00:19:50,810
So that's it.

242
00:19:50,840 --> 00:19:52,340
Now we have that.

243
00:19:52,340 --> 00:19:54,350
We have X, there we go.

244
00:19:54,350 --> 00:19:57,080
We now start with our conv 2D transpose.

245
00:19:57,080 --> 00:19:58,610
So you see, we reshaped this into this.

246
00:19:58,610 --> 00:20:07,730
Now we can make use of our Conv2DTranspose, which takes in a number of filters and

247
00:20:07,730 --> 00:20:09,050
is very similar to the conv layer.

248
00:20:09,050 --> 00:20:17,690
So we have a number of filters, let's say 64 filters, and then the kernel size, three, and the activation,

249
00:20:18,230 --> 00:20:23,420
ReLU. So we're going to use the ReLU activation. The number of strides is

250
00:20:23,810 --> 00:20:27,500
two, and the padding is going to be "same".

251
00:20:27,500 --> 00:20:28,140
See?

252
00:20:28,140 --> 00:20:32,340
So it's quite similar to the conv layer, with the difference that now we're upsampling instead

253
00:20:32,340 --> 00:20:33,330
of downsampling.

254
00:20:33,330 --> 00:20:34,920
So we have that.

255
00:20:35,130 --> 00:20:40,170
Then from here we are going to change this to 32.

256
00:20:40,170 --> 00:20:50,100
So for the encoder, what we did was increase the number of channels, and then here we

257
00:20:50,100 --> 00:20:51,780
reduce the number of channels.

258
00:20:51,810 --> 00:21:01,770
Now in our final output layer, we're going to have this decoder output, which is going to have just

259
00:21:01,770 --> 00:21:03,150
one channel.

260
00:21:03,150 --> 00:21:12,600
So here we have an output which is 28 by 28 by one where the values lie between 0 and 1.

261
00:21:13,050 --> 00:21:18,720
So what we're going to do now is set the number of channels equal to one.

262
00:21:19,080 --> 00:21:24,120
The activation, instead of ReLU, will be sigmoid.

263
00:21:24,240 --> 00:21:31,680
So here we have sigmoid, and we're not going to use any strides, since we're not upsampling. So that's

264
00:21:31,680 --> 00:21:32,010
it.

265
00:21:32,040 --> 00:21:37,230
Now, the reason why we're using sigmoid here is quite simple: since we want our values to fall between

266
00:21:37,230 --> 00:21:43,440
0 and 1, we want that each and every time we have an input, we get something like this.

267
00:21:43,440 --> 00:21:52,290
So no matter the input we have, the sigmoid will always turn that input into

268
00:21:52,290 --> 00:21:55,590
a value which lies between 0 and 1.

269
00:21:55,590 --> 00:22:00,120
And that's basically what we want to do in this last layer right here.

270
00:22:00,120 --> 00:22:02,610
So that's why you see we're making use of the sigmoid.

271
00:22:02,640 --> 00:22:11,700
Now once that's done, we create our decoder model, which is a TensorFlow Model, and then we have our

272
00:22:11,700 --> 00:22:13,500
latent inputs.

273
00:22:13,530 --> 00:22:14,670
There we go.

274
00:22:14,670 --> 00:22:17,580
We have our decoder output.

275
00:22:18,030 --> 00:22:21,180
The name is decoder.
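
The decoder described over the last few minutes can be sketched as follows, assuming TensorFlow 2.x. The kernel size of three in the final layer is an assumption (the narration only gives its channel count, activation and the absence of strides), and the variable names are illustrative:

```python
import tensorflow as tf

layers = tf.keras.layers
LATENT_DIM = 2

latent_inputs = layers.Input(shape=(LATENT_DIM,))
# Dense layer sized so its output can be reshaped to 7 x 7 x 64.
x = layers.Dense(7 * 7 * 64, activation="relu")(latent_inputs)
x = layers.Reshape((7, 7, 64))(x)
# Two stride-2 transposed convolutions: 7x7 -> 14x14 -> 28x28.
x = layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(x)
# Final layer: one channel, no strides, sigmoid keeps pixels in [0, 1].
decoder_output = layers.Conv2DTranspose(1, 3, activation="sigmoid", padding="same")(x)

decoder_model = tf.keras.Model(latent_inputs, decoder_output, name="decoder")
decoder_model.summary()

# Decode a small batch of latent points to check the output shape and range.
images = decoder_model(tf.zeros((4, LATENT_DIM)))
print(images.shape)
```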

276
00:22:21,270 --> 00:22:25,500
And then we can get a summary of this model.

277
00:22:25,500 --> 00:22:27,360
So that's basically it.

278
00:22:27,690 --> 00:22:29,600
Let's run that.

279
00:22:29,600 --> 00:22:30,670
And there we go.

280
00:22:30,680 --> 00:22:34,250
See, we have our decoder model. Now, for the training,

281
00:22:34,250 --> 00:22:41,150
we are going to make use of the Adam optimizer with a learning rate of 0.001, and we're going to

282
00:22:41,150 --> 00:22:43,190
train over 20 epochs.

283
00:22:44,300 --> 00:22:51,320
Now, as we've seen already, our loss is made of two parts: the reconstruction and the regularization

284
00:22:51,320 --> 00:22:51,920
part.

285
00:22:51,950 --> 00:22:59,030
Now for the reconstruction part, our aim is to minimize the difference between the output image

286
00:22:59,030 --> 00:23:01,100
and the input image.

287
00:23:02,080 --> 00:23:05,590
So we'll go ahead and start with the reconstruction loss.

288
00:23:05,620 --> 00:23:08,450
We have our custom loss.

289
00:23:08,470 --> 00:23:09,350
There we go.

290
00:23:09,370 --> 00:23:12,200
It takes in y_pred... or rather,

291
00:23:12,220 --> 00:23:12,970
let's start with

292
00:23:13,000 --> 00:23:13,350
y_true.

293
00:23:13,390 --> 00:23:14,620
It takes in y_true

294
00:23:14,650 --> 00:23:15,970
and y_pred.

295
00:23:16,450 --> 00:23:21,550
And then the reconstruction loss itself is defined such that we have recons...

296
00:23:21,730 --> 00:23:23,470
let's just say loss

297
00:23:23,620 --> 00:23:27,160
reconstruction is equal to

298
00:23:27,770 --> 00:23:33,020
the tf.keras.losses binary

299
00:23:33,220 --> 00:23:35,590
cross-entropy loss.

300
00:23:35,620 --> 00:23:40,780
Okay, so our outputs remember, our outputs range between 0 and 1.

301
00:23:40,780 --> 00:23:44,020
So we could use the binary cross entropy loss here.

302
00:23:44,770 --> 00:23:47,440
Feel free to test out other losses.

303
00:23:47,440 --> 00:23:50,800
So we have that and then we pass in y.

304
00:23:50,830 --> 00:23:53,080
True and y pred.

305
00:23:53,110 --> 00:23:59,950
Now, once we make use of this, we are going to sum all the values.

306
00:23:59,950 --> 00:24:01,460
Let's say we have something like this.

307
00:24:01,460 --> 00:24:05,300
Let's suppose that it was a five by five output we were having.

308
00:24:05,300 --> 00:24:07,880
So we'd have something like this.

309
00:24:07,880 --> 00:24:09,140
One, two, three, four, five.

310
00:24:09,170 --> 00:24:10,130
That's fine.

311
00:24:10,280 --> 00:24:11,690
Five.

312
00:24:12,830 --> 00:24:14,030
There we go.

313
00:24:14,360 --> 00:24:16,370
You see, we have this five by five here.

314
00:24:16,370 --> 00:24:24,410
So with this binary cross entropy, we'll be able to get the difference for each and every position

315
00:24:24,410 --> 00:24:24,840
here.

316
00:24:24,860 --> 00:24:27,400
Now, we need to sum all of these.

317
00:24:27,410 --> 00:24:30,650
So what we'll have is, um.

318
00:24:30,650 --> 00:24:32,000
Let's take this off.

319
00:24:33,300 --> 00:24:39,320
We'll take this off and we'll have reduce_sum.

320
00:24:39,330 --> 00:24:47,070
So this reduce_sum will now sum all these different loss values we get

321
00:24:47,070 --> 00:24:47,550
here.

322
00:24:47,550 --> 00:24:54,540
So now we sum over each and every position; we have 28 by 28 positions.

323
00:24:54,570 --> 00:24:58,380
We also need to specify the axis here.

324
00:24:58,380 --> 00:25:02,670
So the axes we're going to work with are one and two.

325
00:25:02,670 --> 00:25:09,300
Now, to understand why we're having this, let's take a look at the shape of the output, which is B by 28

326
00:25:09,300 --> 00:25:12,060
by 28 by one.

327
00:25:12,300 --> 00:25:20,550
But where we're actually computing the loss is along these axes here.

328
00:25:20,550 --> 00:25:27,000
So we specify one and two, because the axes are numbered zero, one, two, three.

329
00:25:27,000 --> 00:25:32,500
So this is where we want to compute our loss.

330
00:25:32,500 --> 00:25:40,270
So it's on these two axes. That said, we have that, and then we now look for the mean; so after summing

331
00:25:40,270 --> 00:25:45,220
up, we take the mean, that is, we average the values, and that should be it.

332
00:25:45,220 --> 00:25:49,540
So that's it for our loss reconstruction.
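
The reconstruction term just described could be sketched like this, assuming TensorFlow 2.x. The function name `reconstruction_loss` is hypothetical, and the two constant "predictions" are made up only to show that a closer output gives a smaller loss:

```python
import tensorflow as tf

def reconstruction_loss(y_true, y_pred):
    # binary_crossentropy averages over the last (channel) axis, so a
    # (B, 28, 28, 1) batch gives one loss value per pixel: (B, 28, 28).
    per_pixel = tf.keras.losses.binary_crossentropy(y_true, y_pred)
    # Sum over the two spatial axes (1 and 2) to get one value per image.
    per_image = tf.reduce_sum(per_pixel, axis=(1, 2))
    # Average over the batch.
    return tf.reduce_mean(per_image)

# All-white target images and two constant predictions (made-up data).
y_true = tf.ones((2, 28, 28, 1))
close_loss = reconstruction_loss(y_true, tf.fill((2, 28, 28, 1), 0.99))
far_loss = reconstruction_loss(y_true, tf.fill((2, 28, 28, 1), 0.5))
print(float(close_loss), float(far_loss))
```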

333
00:25:49,540 --> 00:25:50,950
The next step.

334
00:25:50,950 --> 00:25:52,420
Let's take all this off.

335
00:25:52,420 --> 00:25:57,730
The next step will be this loss regularization. Getting back here,

336
00:25:57,730 --> 00:25:59,770
You see, we have this sum.

337
00:25:59,950 --> 00:26:08,710
See, the sum of the variance plus the square of the mean, minus one, minus the log of the

338
00:26:08,710 --> 00:26:09,430
variance.

339
00:26:09,430 --> 00:26:11,860
So let's get back here.

340
00:26:11,860 --> 00:26:13,930
And we have this negative half here.

341
00:26:13,930 --> 00:26:21,340
If we take this negative and multiply it by each and every one of these, we'll have log var plus one

342
00:26:21,520 --> 00:26:25,030
minus the mean squared, minus the var.

343
00:26:25,630 --> 00:26:36,330
Before we continue, remember again that we can get sigma j as e to the log of sigma j.

344
00:26:36,960 --> 00:26:37,620
See that?

345
00:26:37,620 --> 00:26:39,210
So also, Yeah, yeah.

346
00:26:39,210 --> 00:26:41,880
No, let's rewrite this better.

347
00:26:41,880 --> 00:26:49,440
So we have Sigma J equals e to the log of Sigma J.

348
00:26:50,010 --> 00:26:51,990
Okay, so that's it.

349
00:26:52,140 --> 00:26:54,240
Um, let's get back here.

350
00:26:54,240 --> 00:26:57,000
We have our loss.

351
00:26:57,330 --> 00:27:03,810
We'll still have this mean and sum, so we sum and then find the average.

352
00:27:04,200 --> 00:27:07,800
Let's have that, and then we'll have

353
00:27:09,300 --> 00:27:11,850
-0.5

354
00:27:13,350 --> 00:27:17,430
times the log var.

355
00:27:18,670 --> 00:27:20,440
Let's get here.

356
00:27:20,470 --> 00:27:21,280
We'll need a log.

357
00:27:21,280 --> 00:27:21,880
Var.

358
00:27:22,810 --> 00:27:28,420
Oh, let's say mean, and then let's get exactly what we had here.

359
00:27:28,570 --> 00:27:34,950
Remember, in this encoder model, we output the mean and the log var.

360
00:27:34,960 --> 00:27:36,670
So let's have mean.

361
00:27:36,790 --> 00:27:37,810
There we go.

362
00:27:37,810 --> 00:27:42,760
We have mean and log var.

363
00:27:45,350 --> 00:27:47,360
Okay, so we have this set.

364
00:27:47,390 --> 00:27:49,380
Now, we could make use of it right here.

365
00:27:49,400 --> 00:27:54,140
We have, as we've said already, log var plus one.

366
00:27:54,230 --> 00:27:58,860
So we just have plus one minus the mean square.

367
00:27:58,880 --> 00:28:14,300
So tf.math.square of the mean, and then minus tf.math.exp, that is, e to the power of log var.

368
00:28:14,330 --> 00:28:15,110
See that?

369
00:28:15,110 --> 00:28:23,540
And now we return the reconstruction loss and the regularization loss.

370
00:28:23,690 --> 00:28:24,610
So that's it.

371
00:28:24,680 --> 00:28:26,930
So this is our custom loss.

372
00:28:26,960 --> 00:28:30,620
We run this, we're getting this error.

373
00:28:32,190 --> 00:28:34,800
Let's add this and run that again.

374
00:28:34,920 --> 00:28:36,540
Okay, So that's fine.

375
00:28:36,540 --> 00:28:44,220
And we're now set to start the training, but before going on, we also have to specify the axis

376
00:28:44,220 --> 00:28:45,120
for the sum.

377
00:28:45,120 --> 00:28:54,150
So right here we have this axis, which is equal to one, and let's explain why we specify this axis to be

378
00:28:54,150 --> 00:28:54,660
one.

379
00:28:55,380 --> 00:29:01,380
Now the shape of the log var and the mean is this batch by two.

380
00:29:01,380 --> 00:29:05,460
And so the shape of this whole expression will still be this.

381
00:29:05,460 --> 00:29:12,210
And so if you're computing this here, you'll get this kind of

382
00:29:12,210 --> 00:29:12,930
output.

383
00:29:13,920 --> 00:29:17,220
And so since for the loss, we need a single value.

384
00:29:17,250 --> 00:29:22,770
We need to sum all the values we get along this axis.

385
00:29:22,770 --> 00:29:25,860
So that's why you see, we specify the axis to be one.
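
To make the axis choice concrete, here is a minimal NumPy sketch of the regularization term as described: minus one half times (log var plus one, minus the mean squared, minus e to the log var), summed over axis 1 and averaged over the batch. NumPy stands in for the TensorFlow ops, and the function name `regularization_loss` and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def regularization_loss(mean, log_var):
    """KL-style regularization for a (B, latent_dim) mean and log variance."""
    # Per-dimension term, still shaped (B, latent_dim)
    per_dim = -0.5 * (log_var + 1.0 - np.square(mean) - np.exp(log_var))
    per_example = per_dim.sum(axis=1)  # sum over the latent axis -> shape (B,)
    return per_example.mean()          # average over the batch -> scalar

# Toy inputs (assumption): a batch of 3 with a two-dimensional latent space.
log_var = np.zeros((3, 2))
zero_loss = regularization_loss(np.zeros((3, 2)), log_var)  # mean 0, variance 1
shifted_loss = regularization_loss(np.ones((3, 2)), log_var)  # mean 1, variance 1
```

When the mean is zero and the variance is one, the term vanishes. Moving the mean to one in both latent dimensions adds 0.5 per dimension, which gives 1.0 after the sum over axis 1.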

386
00:29:25,860 --> 00:29:29,130
We have this input, which we defined.

387
00:29:29,160 --> 00:29:36,100
Then we have the encoder model, which outputs Z, the mean and the log variance, which we're not going to

388
00:29:36,100 --> 00:29:36,910
make use of.

389
00:29:36,910 --> 00:29:42,970
And then we have the decoder which takes in Z and then produces this output.

390
00:29:42,970 --> 00:29:49,000
So here we have this model, which contains both the encoder and the decoder.

391
00:29:49,000 --> 00:29:52,210
Now we're going to go ahead and build our custom training block.

392
00:29:52,210 --> 00:29:56,980
We suppose that you already have an idea of how this works.

393
00:29:56,980 --> 00:30:05,050
So we have that training block which takes in our X batch, and then we're going to make use of tensorflow's

394
00:30:05,050 --> 00:30:06,100
gradient tape.

395
00:30:06,100 --> 00:30:12,250
So we will have this: with tf.GradientTape.

396
00:30:12,400 --> 00:30:13,300
Gradient tape.

397
00:30:13,300 --> 00:30:14,140
There we go.

398
00:30:14,140 --> 00:30:15,910
As a recorder.

399
00:30:17,250 --> 00:30:20,100
We're going to pass this batch into the encoder.

400
00:30:20,100 --> 00:30:24,780
So we have our encoder model, which takes in X batch.

401
00:30:24,780 --> 00:30:31,920
And then what it outputs is Z, the mean and the log variance.

402
00:30:32,190 --> 00:30:33,570
Here we have this.

403
00:30:33,600 --> 00:30:34,110
Okay?

404
00:30:34,110 --> 00:30:35,610
So we have that set.

405
00:30:35,610 --> 00:30:44,190
And now, once we get this, we take the Z from here and pass it into our decoder.

406
00:30:44,190 --> 00:30:51,360
So we have our decoder model which takes in Z and then what it outputs is our Y predicted.

407
00:30:51,720 --> 00:30:52,470
See that?

408
00:30:52,680 --> 00:31:00,780
And now from here we could obtain our loss by simply calling on our custom loss method, which we've

409
00:31:00,780 --> 00:31:01,530
defined.

410
00:31:01,590 --> 00:31:02,460
It takes in the Y.

411
00:31:02,490 --> 00:31:03,210
True.

412
00:31:03,240 --> 00:31:08,460
It takes in the y pred.

413
00:31:08,490 --> 00:31:16,450
It takes in, as we've defined here, the mean, and it takes in the log variance.

414
00:31:16,480 --> 00:31:19,780
Now this y true.

415
00:31:19,990 --> 00:31:21,280
Y true.

416
00:31:21,310 --> 00:31:24,090
Happens to be the batch.

417
00:31:24,100 --> 00:31:28,150
Remember, we have this here.

418
00:31:28,180 --> 00:31:31,000
This encoder takes in that.

419
00:31:31,850 --> 00:31:34,290
So we have our input image here.

420
00:31:34,310 --> 00:31:37,840
Let's take this off and draw it a bit clearer here.

421
00:31:37,850 --> 00:31:41,300
So here we have this and we have this.

422
00:31:41,300 --> 00:31:47,030
So we have our encoder and we have our decoder, we have our input image, which is what we expect to

423
00:31:47,030 --> 00:31:47,780
have here.

424
00:31:47,780 --> 00:31:52,340
So our y true is what we pass as input here, which is this x batch.

425
00:31:52,340 --> 00:31:56,990
So that's why you see we specify y true as x batch, and y pred is what the decoder produces.

426
00:31:56,990 --> 00:32:01,340
So we're going to compare the y pred and then the Y.

427
00:32:01,370 --> 00:32:05,360
True, which is, as we've said already, the X batch.

428
00:32:05,870 --> 00:32:07,340
Um, that's fine.

429
00:32:07,340 --> 00:32:17,960
We get back to the code, we have that set, and now we have our partial derivatives,

430
00:32:17,960 --> 00:32:21,560
which we'll get by making use of the recorder.

431
00:32:22,010 --> 00:32:24,770
So we have recorder.gradient.

432
00:32:24,800 --> 00:32:31,100
It takes in the loss, it takes in the overall model's trainable weights.

433
00:32:31,100 --> 00:32:33,720
So we have trainable weights.

434
00:32:33,750 --> 00:32:34,230
Okay.

435
00:32:34,230 --> 00:32:40,590
So, talking about the overall model, which we've yet to define, we have it right here.

436
00:32:40,590 --> 00:32:44,160
It takes in this input 28 by 28 by one.

437
00:32:44,190 --> 00:32:51,720
It takes the input, passes it into the encoder model, gets the output from the encoder model, and then passes it

438
00:32:52,080 --> 00:32:53,550
into the decoder model.

439
00:32:53,550 --> 00:33:01,230
And then from there we create our model, which takes in the input and this output right here.

440
00:33:01,230 --> 00:33:08,520
So from here, let's get the summary and we see that we have exactly what we expect.

441
00:33:08,520 --> 00:33:15,600
So we have this model which takes, as we've said already, this input, the encoder outputs, Z, the

442
00:33:15,600 --> 00:33:22,320
mean and log var, and then the decoder outputs this image right here.

443
00:33:22,860 --> 00:33:30,480
So let's get back to our training, as we were saying, where we have these partial derivatives from

444
00:33:30,480 --> 00:33:34,560
the loss and the trainable weights.

445
00:33:34,950 --> 00:33:36,060
So that's it.

446
00:33:37,140 --> 00:33:44,100
We then go ahead with the gradient descent step, we apply the optimizer, so we have optimizer dot

447
00:33:44,100 --> 00:33:46,320
apply gradients.

448
00:33:46,320 --> 00:33:53,970
There we go, which takes in our partial derivatives and our trainable weights.

449
00:33:53,970 --> 00:33:55,200
So there we go.

450
00:33:55,200 --> 00:34:03,060
We have the partial derivatives, which we've just calculated right here, derivatives.

451
00:34:03,270 --> 00:34:09,000
And then the trainable weights, trainable weights.

452
00:34:09,000 --> 00:34:09,930
There we go.

453
00:34:10,080 --> 00:34:11,400
Okay, so that's it.

454
00:34:11,430 --> 00:34:16,020
We could from here just simply return the loss.
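
The training step described above can be sketched as follows. This is a sketch under stated assumptions, not the lesson's exact code: the encoder and decoder here are tiny dense stand-ins, the `Sampling` layer and the Adam learning rate are assumptions, the per-pixel loss is assumed to be binary cross-entropy, and the custom loss is assumed to return the sum of the two terms. Only the overall flow (encoder, decoder, custom loss, `recorder.gradient`, `optimizer.apply_gradients`) follows the transcript.

```python
import tensorflow as tf

LATENT_DIM = 2  # the transcript uses a two-dimensional latent space

class Sampling(tf.keras.layers.Layer):
    """Reparameterization: z = mean + exp(0.5 * log_var) * epsilon."""
    def call(self, inputs):
        mean, log_var = inputs
        epsilon = tf.random.normal(tf.shape(mean))
        return mean + tf.exp(0.5 * log_var) * epsilon

# Tiny stand-in encoder and decoder (assumptions, not the lesson's architecture).
enc_in = tf.keras.Input(shape=(28, 28, 1))
h = tf.keras.layers.Flatten()(enc_in)
mean = tf.keras.layers.Dense(LATENT_DIM)(h)
log_var = tf.keras.layers.Dense(LATENT_DIM)(h)
z = Sampling()([mean, log_var])
encoder_model = tf.keras.Model(enc_in, [z, mean, log_var])

dec_in = tf.keras.Input(shape=(LATENT_DIM,))
d = tf.keras.layers.Dense(28 * 28, activation="sigmoid")(dec_in)
decoder_model = tf.keras.Model(dec_in, tf.keras.layers.Reshape((28, 28, 1))(d))

# Overall model: input -> encoder -> decoder.
vae_in = tf.keras.Input(shape=(28, 28, 1))
vae_z, _, _ = encoder_model(vae_in)
vae = tf.keras.Model(vae_in, decoder_model(vae_z))

optimizer = tf.keras.optimizers.Adam(1e-3)  # learning rate is an assumption

def custom_loss(y_true, y_pred, mean, log_var):
    bce = tf.keras.losses.binary_crossentropy(y_true, y_pred)  # (B, 28, 28)
    loss_rec = tf.reduce_mean(tf.reduce_sum(bce, axis=(1, 2)))
    loss_reg = tf.reduce_mean(tf.reduce_sum(
        -0.5 * (log_var + 1 - tf.square(mean) - tf.exp(log_var)), axis=1))
    return loss_rec + loss_reg

def training_block(x_batch):
    with tf.GradientTape() as recorder:
        z, mean, log_var = encoder_model(x_batch)
        y_pred = decoder_model(z)
        # y_true is the input batch itself: the VAE reconstructs its input.
        loss = custom_loss(x_batch, y_pred, mean, log_var)
    partial_derivatives = recorder.gradient(loss, vae.trainable_weights)
    optimizer.apply_gradients(zip(partial_derivatives, vae.trainable_weights))
    return loss

x_batch = tf.random.uniform((8, 28, 28, 1))  # random stand-in for an MNIST batch
loss_value = training_block(x_batch)
```

Because `vae` is built from `encoder_model` and `decoder_model`, its trainable weights are the same variables that the tape watched during the forward pass, so one gradient call updates both halves.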

455
00:34:16,800 --> 00:34:22,590
Now we're going to run this. We haven't run this here.

456
00:34:22,620 --> 00:34:23,250
Okay?

457
00:34:23,250 --> 00:34:24,600
So we have that already.

458
00:34:24,630 --> 00:34:27,810
Now we could define our neural learn method.

459
00:34:28,560 --> 00:34:34,020
So we have here neural learn which will take in the number of epochs.

460
00:34:34,230 --> 00:34:44,040
And then from here, for the epoch in range epochs.

461
00:34:44,370 --> 00:34:45,570
There we go.

462
00:34:45,600 --> 00:34:57,000
We're going to start by printing out the training starts for epoch number, whatever epoch we are at.

463
00:34:57,030 --> 00:35:03,630
So we format that to take in the epoch plus one since we're going to start from zero.

464
00:35:03,630 --> 00:35:09,570
Or we could just take this one from here and then there we go.

465
00:35:11,020 --> 00:35:12,970
Now we have to add plus one here.

466
00:35:13,150 --> 00:35:13,480
Okay.

467
00:35:13,480 --> 00:35:15,490
So we have that.

468
00:35:15,490 --> 00:35:26,950
And now we're going to do: for step, x batch in enumerate of

469
00:35:26,950 --> 00:35:29,940
the train data set, which we've defined already.

470
00:35:29,950 --> 00:35:36,850
Let's get back to the top and we see we have our train data set, so we're going to go through this

471
00:35:36,850 --> 00:35:38,050
train data set.

472
00:35:38,500 --> 00:35:40,480
Let's get back here.

473
00:35:41,110 --> 00:35:42,130
There we go.

474
00:35:42,130 --> 00:35:50,680
So we go through our train data set, we take a specific batch and then we compute the loss.

475
00:35:50,680 --> 00:35:55,810
So we have that training block, which we've defined here.

476
00:35:56,110 --> 00:35:58,120
Our training block has been defined here.

477
00:35:58,120 --> 00:36:04,090
So not only do we compute the loss, but we also apply the gradient descent step for that specific batch.

478
00:36:04,090 --> 00:36:07,870
So we're doing this for each and every batch of our training data set.

479
00:36:07,900 --> 00:36:12,730
Now, what this takes in is our batch, simple as that.

480
00:36:12,740 --> 00:36:17,810
Now once we have this, the next thing we'll do is we'll print out our loss.

481
00:36:17,810 --> 00:36:24,680
So the training loss is... there we go.

482
00:36:24,680 --> 00:36:25,820
We have loss.

483
00:36:25,820 --> 00:36:30,650
And then once the training is complete, we could simply print out.

484
00:36:32,020 --> 00:36:34,270
Training complete.

485
00:36:35,950 --> 00:36:38,410
Okay, so we have that set.
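
The loop just described can be sketched in plain Python. The function name `neural_learn`, passing the dataset and training step as arguments, and returning the per-epoch losses are all assumptions made so the example is self-contained. In the lesson the loop takes only the number of epochs and uses the train dataset and training block defined earlier.

```python
def neural_learn(epochs, train_dataset, training_block):
    """Run training_block on every batch for the given number of epochs."""
    epoch_losses = []
    for epoch in range(1, epochs + 1):  # start from 1 so the printed epoch is 1-based
        print("Training starts for epoch number {}".format(epoch))
        loss = None
        for step, x_batch in enumerate(train_dataset):
            # Each call computes the loss and applies one gradient descent step.
            loss = training_block(x_batch)
        print("Training loss is:", loss)
        epoch_losses.append(loss)
    print("Training complete!")
    return epoch_losses

# Toy stand-ins (assumptions): two "batches" and a fake training step.
toy_dataset = [[1.0, 2.0], [3.0, 4.0]]
history = neural_learn(2, toy_dataset, lambda batch: sum(batch))
```

The printed loss is the loss of the last batch in each epoch, which matches printing once after the inner loop.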

486
00:36:38,440 --> 00:36:42,300
Now let's run this and then we have neural learn.

487
00:36:42,310 --> 00:36:44,830
Let's train for epochs.

488
00:36:46,670 --> 00:36:48,470
Um, yeah, we're getting this error.

489
00:36:48,480 --> 00:36:49,670
Let's take that off.

490
00:36:50,150 --> 00:36:50,390
That's

491
00:36:50,390 --> 00:36:52,370
fine. Let's run this again.

492
00:36:52,640 --> 00:36:56,220
So training is now complete, and here's what we get.

493
00:36:56,240 --> 00:37:03,830
You see that the loss drops, and we can now get straight into testing out our model.

494
00:37:03,860 --> 00:37:09,200
Now, before testing, let's recall that our VAE is composed of two units.

495
00:37:09,200 --> 00:37:10,970
That's our encoder.

496
00:37:10,970 --> 00:37:14,510
And then the decoder right here.

497
00:37:14,870 --> 00:37:25,280
Now we've trained this model end to end to make sure that the inputs look very similar to this output

498
00:37:25,280 --> 00:37:26,270
produced.

499
00:37:26,420 --> 00:37:33,470
Now, if we want to generate new outputs, what we'll do is we are going to cut off, or we are going

500
00:37:33,470 --> 00:37:43,370
to take off this region here and focus only on the decoder. Now, to generate an image at random, or to

501
00:37:43,370 --> 00:37:46,050
generate a digit, in our case, at random,

502
00:37:46,050 --> 00:37:50,190
we'll just have to pass in a Z in here.

503
00:37:50,220 --> 00:37:52,320
That's a random value of Z.

504
00:37:52,350 --> 00:37:56,760
Remember, Z is mu plus sigma times

505
00:37:57,680 --> 00:37:58,580
epsilon.

506
00:37:58,580 --> 00:38:04,400
So we have to pass this value of Z in here and then get a random output.

507
00:38:05,360 --> 00:38:08,240
Now, remember, Z is two dimensional.

508
00:38:08,240 --> 00:38:12,670
So Z is this vector made of two values.

509
00:38:12,680 --> 00:38:23,330
And so that said, we will define the first value here, which we will call grid X, and we use the

510
00:38:23,330 --> 00:38:27,620
Linspace method to get values from a scale.

511
00:38:27,620 --> 00:38:37,250
Let's add a cell above this, so we will have a scale, which will take a value of one, and then we

512
00:38:37,250 --> 00:38:40,970
will have n equal to, say, 16 different values.

513
00:38:40,970 --> 00:38:48,200
So here we go from -1 to 1 and then we'll have 16 values in between.

514
00:38:48,590 --> 00:38:51,920
We repeat this for grid y now.

515
00:38:51,920 --> 00:38:55,940
So this is the first element and this is the next element.

516
00:38:55,940 --> 00:39:01,530
So here we will have different elements so that we could generate different images.
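
A minimal sketch of the two grids, assuming NumPy's `linspace` (the transcript only names "the linspace method", so the library is an assumption). The variable names `scale`, `n`, `grid_x` and `grid_y` follow the transcript.

```python
import numpy as np

scale = 1  # half-width of the sampled range
n = 16     # number of values per latent dimension

# n evenly spaced values from -scale to scale, both endpoints included
grid_x = np.linspace(-scale, scale, n)
grid_y = np.linspace(-scale, scale, n)
```

Each pair of one value from `grid_x` and one from `grid_y` is a two-dimensional latent vector, so the two grids together give n times n points to decode.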

517
00:39:02,700 --> 00:39:03,780
There we go.

518
00:39:03,780 --> 00:39:11,130
Let's run this and then print out our grid X and our grid y.

519
00:39:12,690 --> 00:39:18,780
Now that we have this done, the next thing we'll do is we'll plot out our different images, which

520
00:39:18,780 --> 00:39:21,540
we shall generate using our decoder.

521
00:39:21,540 --> 00:39:24,660
So here we have this figure.

522
00:39:24,690 --> 00:39:32,400
We define the figsize, let's say five by five.

523
00:39:32,400 --> 00:39:33,750
And then.

524
00:39:35,310 --> 00:39:38,070
for i in grid x,

525
00:39:39,130 --> 00:39:43,110
and for j in grid

526
00:39:43,120 --> 00:39:44,140
y,

527
00:39:45,290 --> 00:39:49,120
we define the different subplots: subplot,

528
00:39:49,550 --> 00:39:49,920
Um.

529
00:39:50,090 --> 00:39:53,090
Five by five by K plus one.

530
00:39:53,120 --> 00:39:55,180
Let's define k right here.

531
00:39:55,190 --> 00:39:56,630
So K equals zero.

532
00:39:56,720 --> 00:39:58,340
Okay, so that's it.

533
00:39:58,340 --> 00:40:00,260
This should be spelled grid.

534
00:40:00,290 --> 00:40:02,330
We have grid that way.

535
00:40:02,330 --> 00:40:08,630
And now we're ready to use these grid values.

536
00:40:08,630 --> 00:40:13,310
That's the values of I and J to generate new images.

537
00:40:13,670 --> 00:40:18,580
Okay, so what we'll do now is this is plt dot subplot.

538
00:40:18,590 --> 00:40:27,050
So what we'll do now is we'll have our input, which is a tf.constant, and then we have i and j.

539
00:40:27,080 --> 00:40:28,850
So these two values.

540
00:40:28,850 --> 00:40:39,890
And then from here we have the output, which comes from our VAE, but notice how we pick out layers two.

541
00:40:39,920 --> 00:40:46,820
So, to better understand this, you should get back here, where we defined our model.

542
00:40:46,820 --> 00:40:54,230
So here you see, this is the variational autoencoder, which is made of different layers.

543
00:40:54,230 --> 00:41:01,130
So this is layer zero, layer one, and then layer two.

544
00:41:01,940 --> 00:41:08,060
If you do vae.layers, let's say zero,

545
00:41:11,720 --> 00:41:14,700
See you have this input layer right here.

546
00:41:14,720 --> 00:41:24,290
Now, if you change this and say, anyway, let's say for i in range three.

547
00:41:24,950 --> 00:41:25,820
There we go.

548
00:41:25,820 --> 00:41:27,170
We're going to print this out.

549
00:41:27,170 --> 00:41:28,670
So we're going to print out layers

550
00:41:28,700 --> 00:41:29,270
i.

551
00:41:31,150 --> 00:41:32,110
Let's run that.

552
00:41:32,800 --> 00:41:39,840
And we see that we have this input layer, we have this functional model, this other functional model.

553
00:41:39,850 --> 00:41:44,920
But you should note that's basically our encoder and decoder.

554
00:41:45,070 --> 00:41:45,790
Okay.

555
00:41:45,790 --> 00:41:46,630
So that's set.

556
00:41:46,660 --> 00:41:49,930
We're going to make use of our layers here.

557
00:41:49,930 --> 00:41:55,960
So we have layers two, which is to say that we're actually using the decoder.

558
00:41:55,960 --> 00:42:01,780
So our decoder is going to take in the input, see that it takes in the inputs.

559
00:42:01,930 --> 00:42:03,070
There we go.

560
00:42:03,070 --> 00:42:05,320
And then we'll have.

561
00:42:06,690 --> 00:42:10,780
To select our first axis here.

562
00:42:10,800 --> 00:42:12,420
So that's why we have this.

563
00:42:12,420 --> 00:42:20,550
And then we have to select this other axis right here, such that our output, now, you see,

564
00:42:20,550 --> 00:42:25,440
if we do this, if we have only zero here,

565
00:42:25,590 --> 00:42:28,470
See, this is not going to be transformed to 28 by 28.

566
00:42:28,470 --> 00:42:29,980
So basically that's why we do this.

567
00:42:30,000 --> 00:42:33,630
Now, once we have that, we do the imshow.

568
00:42:34,920 --> 00:42:37,200
And then we have our output.

569
00:42:37,320 --> 00:42:41,850
Our cmap is gray.

570
00:42:42,930 --> 00:42:44,370
There we go.

571
00:42:44,700 --> 00:42:47,000
plt.axis

572
00:42:47,790 --> 00:42:48,250
Um.

573
00:42:48,360 --> 00:42:48,960
off.

574
00:42:51,280 --> 00:42:54,430
And then k plus equals one.

575
00:42:54,460 --> 00:42:55,090
Okay.

576
00:42:55,090 --> 00:43:02,230
So we're basically increasing the value of k by one, such that we could have these as different

577
00:43:02,230 --> 00:43:03,010
subplots.

578
00:43:03,010 --> 00:43:05,980
So let's run this now and then see what we get.

579
00:43:06,910 --> 00:43:08,480
We're getting this error here.

580
00:43:08,490 --> 00:43:16,230
So this is actually because we have many more values than what we defined here

581
00:43:16,230 --> 00:43:17,820
in these subplots.

582
00:43:17,820 --> 00:43:23,990
So what we should have here is n, and here should be n as well. With that set,

583
00:43:24,000 --> 00:43:27,300
Let's run this again and see what we get.
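
The reason for the error can be seen without plotting anything. The sketch below walks the same double loop and records the subplot index that would be requested at each step. The `fake_decoder` function is a hypothetical stand-in for the trained decoder, and the indexing that drops the batch and channel axes is an assumption about how the 28 by 28 image is extracted.

```python
import numpy as np

n = 16
grid_x = np.linspace(-1, 1, n)
grid_y = np.linspace(-1, 1, n)

def fake_decoder(z):
    """Hypothetical stand-in for the trained decoder: (1, 2) -> (1, 28, 28, 1)."""
    return np.full((1, 28, 28, 1), z.sum())

k = 0
subplot_indices = []
images = []
for i in grid_x:
    for j in grid_y:
        z = np.array([[i, j]])             # one latent vector, batch axis first
        out = fake_decoder(z)[0][:, :, 0]  # drop batch and channel axes -> (28, 28)
        subplot_indices.append(k + 1)      # index that subplot(n, n, k + 1) would receive
        images.append(out)
        k += 1
```

The counter reaches n times n, which is 256 here, so a 5 by 5 subplot grid with only 25 slots cannot hold all the images, while an n by n grid can.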

584
00:43:28,790 --> 00:43:37,400
Now, as you can see, we are able to generate these digits, making use of just this Z vector right

585
00:43:37,400 --> 00:43:39,960
here, which is composed of two numbers.

586
00:43:39,980 --> 00:43:51,110
Now, one thing you can notice here in this latent space is that as we go from values of -1 to 1 in

587
00:43:51,110 --> 00:44:02,000
this two-dimensional latent space, the outputs are generated such that in each line, as

588
00:44:02,000 --> 00:44:07,550
you can see, we have one digit, which is being

589
00:44:08,610 --> 00:44:11,900
slowly morphed into another.

590
00:44:11,910 --> 00:44:13,830
You see here we start with a nine.

591
00:44:13,830 --> 00:44:21,570
But as we change values in this latent space, you see how slowly we get to eight.

592
00:44:21,570 --> 00:44:24,200
And then here you see eight.

593
00:44:24,210 --> 00:44:29,850
And at this point you start with a nine and then you slowly get into fives and so on and so forth.

594
00:44:29,850 --> 00:44:36,510
You can look at this horizontally as well as vertically, see that we get to six and that.

595
00:44:36,510 --> 00:44:42,720
And then you could also look at this diagonal, you see nine, eight, three, two, six, and then

596
00:44:42,720 --> 00:44:43,880
you get to zeros.

597
00:44:43,890 --> 00:44:45,330
So, um.

598
00:44:46,360 --> 00:44:49,510
for this first line, or these first lines,

599
00:44:49,510 --> 00:44:54,550
you can look at these first lines as going from 9 to 8.

600
00:44:54,790 --> 00:45:02,830
Then here it's like going from 9 to 5, but passing through 3, and so on and so forth.

601
00:45:02,860 --> 00:45:09,550
So basically this is how we generate images using variational autoencoders.