1
00:00:00,080 --> 00:00:03,290
In this section on generating new samples,

2
00:00:03,290 --> 00:00:10,580
what we are going to do is essentially take an input image like this one

3
00:00:10,760 --> 00:00:14,540
and combine it with a mask,

4
00:00:15,270 --> 00:00:17,910
which, as you can see here, covers specifically the boots.

5
00:00:17,910 --> 00:00:27,090
So if we take out just this part and just this other part from the input image, we have this mask.

6
00:00:27,300 --> 00:00:29,970
Obviously our model can generate this.

7
00:00:29,970 --> 00:00:39,330
So if we take this and pass it into our trained segmentation model, the SegFormer model, we can

8
00:00:39,330 --> 00:00:43,140
obtain this mask containing only the boots.

9
00:00:43,140 --> 00:00:52,470
And when we pass this, together with this prompt, into our inpainting model, which is available in the diffusers library

10
00:00:52,470 --> 00:00:53,700
of Hugging Face,

11
00:00:53,700 --> 00:01:03,150
we could obtain a new sample where we've modified the region where we have the black

12
00:01:03,150 --> 00:01:03,660
boots.

13
00:01:03,660 --> 00:01:08,160
So we go from these black boots to these red colored boots.

14
00:01:08,610 --> 00:01:16,380
And so, given that we have the input image, or we could get some input image from our validation dataset,

15
00:01:16,710 --> 00:01:23,310
we've trained our SegFormer model, and obviously from this too we could obtain the mask.

16
00:01:23,310 --> 00:01:25,650
And then we also have this prompt.

17
00:01:25,980 --> 00:01:32,430
It means that it's now very easy for us to generate or edit our images

18
00:01:32,430 --> 00:01:35,070
in this kind of automatic manner.

19
00:01:35,070 --> 00:01:40,950
Now, it should be noted that the mask generated by the SegFormer model obviously will contain the other

20
00:01:40,950 --> 00:01:42,030
parts of the image.

21
00:01:42,030 --> 00:01:44,010
So you would expect to have pants.

22
00:01:44,010 --> 00:01:45,930
So let's change the color.

23
00:01:45,930 --> 00:01:50,760
So we would expect to have pants and not just only the boots.

24
00:01:50,760 --> 00:01:58,410
So if we were to pick out only the pants, then we would have this zone picked out, and we would have this zone

25
00:01:58,410 --> 00:01:59,700
too picked out.

26
00:01:59,700 --> 00:02:03,570
But now that we're dealing only with the boots, that's why we've picked out only this zone.

27
00:02:03,570 --> 00:02:12,480
So what we'll do now is write code which will take the output mask generated from the SegFormer

28
00:02:12,480 --> 00:02:21,360
model and then only pick a given class, so all the other classes will be turned to

29
00:02:21,360 --> 00:02:22,320
zeros.

30
00:02:22,320 --> 00:02:28,620
And then only that specific class will be given a value of one.

31
00:02:28,620 --> 00:02:31,920
And then we'll obviously have a prompt related to the class.

32
00:02:31,920 --> 00:02:38,370
So this means that if we wanted to deal only with the pants, or if we picked the pants on the

33
00:02:38,370 --> 00:02:43,710
mask here, then we'd change this "boots" in the prompt and replace it with "pants".

34
00:02:43,710 --> 00:02:47,370
So instead of boots we would have red colored pants.

35
00:02:47,370 --> 00:02:49,440
Obviously we could change the color.

36
00:02:49,440 --> 00:02:51,630
So we could say green color pants.

37
00:02:51,660 --> 00:02:56,760
We could modify the whole prompt and obtain our desired outputs.

38
00:02:56,760 --> 00:03:02,640
So this is how easy it is now to edit our images once we have a trained model.

39
00:03:02,640 --> 00:03:06,630
Now let's dive into the code and see how we could generate this mask.

40
00:03:06,630 --> 00:03:14,790
When given an input image like this one, we'll start with our generate_inputs method, which will take

41
00:03:14,790 --> 00:03:20,370
in the image path, and then a specific mask ID.

42
00:03:20,730 --> 00:03:21,690
So that's it.

43
00:03:21,690 --> 00:03:30,120
Now if we look at label2id or id2label, let's check out all the different labels we have.

44
00:03:30,120 --> 00:03:30,540
Boots.

45
00:03:30,540 --> 00:03:33,390
Boots is actually this one here, seven.

46
00:03:33,390 --> 00:03:41,010
So when running this we're going to specify seven because we want to be able to edit our input image

47
00:03:41,010 --> 00:03:48,750
based on the boots. We could check out this image, 0003, and see that it contains boots.

48
00:03:48,780 --> 00:03:55,320
Now make sure that the specific ID matches what you have in the image.

49
00:03:55,320 --> 00:03:56,640
So that's it.

50
00:03:56,640 --> 00:03:58,290
There we go.

51
00:03:58,290 --> 00:04:01,320
We have written this out.

52
00:04:02,880 --> 00:04:03,810
So that was it.

53
00:04:03,840 --> 00:04:11,820
We have our source image, which we can read with OpenCV.

54
00:04:12,360 --> 00:04:17,550
Here we have the image path and then we have the mask.

55
00:04:17,550 --> 00:04:24,420
So the way we get our mask is by simply taking our model, which we've trained, and

56
00:04:24,420 --> 00:04:26,580
then passing in our source image.

57
00:04:26,580 --> 00:04:30,510
So we pass in the source image to obtain the output.

58
00:04:30,510 --> 00:04:33,900
Now, note that we need to carry out some pre-processing.

59
00:04:33,900 --> 00:04:39,570
So we need to modify this code such that we have the pre-processed source image

60
00:04:39,570 --> 00:04:42,600
going into the model before we obtain the mask.

61
00:04:42,840 --> 00:04:46,980
That said, we copy and paste the code which we've written already.

62
00:04:47,310 --> 00:04:49,770
Here we have our image.

63
00:04:50,130 --> 00:04:53,610
Instead of sample file path, we have the image path.

64
00:04:53,640 --> 00:05:02,460
Take this off and then we resize, we cast, we normalize, and then we get the transpose.

65
00:05:02,460 --> 00:05:08,580
We expand, that is, we add the extra dimension, and then we obtain the output.

66
00:05:08,970 --> 00:05:14,790
So we return our source image and we also return our output.
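
The pre-processing steps just listed (resize, cast, normalize, transpose, add the batch dimension) could be sketched roughly as follows. This is only a sketch under assumptions: the 512 by 512 target size and the ImageNet mean and standard deviation are not stated in the video, and the plain NumPy nearest-neighbour resize is a stand-in for whatever library resize the actual notebook uses.

```python
import numpy as np

# Assumed values: the transcript does not state the input size or the statistics.
TARGET_SIZE = (512, 512)  # (height, width)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def resize_nearest(image, size):
    """Nearest-neighbour resize in plain NumPy (stand-in for a library resize)."""
    height, width = size
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows][:, cols]


def preprocess(image, size=TARGET_SIZE):
    """Turn an (H, W, 3) uint8 image into a (1, 3, h, w) float32 batch."""
    resized = resize_nearest(image, size)                  # resize
    scaled = resized.astype(np.float32) / 255.0            # cast
    normalized = (scaled - MEAN) / STD                     # normalize
    channels_first = np.transpose(normalized, (2, 0, 1))   # transpose to (3, h, w)
    return np.expand_dims(channels_first, axis=0)          # add the batch dimension
```

Note that the transpose here is applied to the three-dimensional image before the batch dimension is added, so the permutation has three entries.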

67
00:05:15,120 --> 00:05:17,310
We could go ahead and test this out.

68
00:05:17,310 --> 00:05:22,500
So let's create this new cell with generate_inputs.

69
00:05:23,220 --> 00:05:26,730
And then we specify the path.

70
00:05:26,730 --> 00:05:30,060
And then the mask ID is seven as we've just seen.

71
00:05:30,060 --> 00:05:33,750
So let's run this and then see what we get.

72
00:05:34,580 --> 00:05:39,860
We get this error: transpose expects a vector of size three, but the input is a vector of size four.

73
00:05:39,860 --> 00:05:44,300
So let's get back to the code and see what we have there.

74
00:05:44,300 --> 00:05:49,130
Well, let's first of all take this off and then run this again.

75
00:05:49,130 --> 00:05:54,320
Then what we'll do now is take off these two dimensions and resize our output.

76
00:05:54,320 --> 00:06:02,120
So once we get the output, we are going to do the argmax to get the category with the highest

77
00:06:02,120 --> 00:06:02,810
score.

78
00:06:02,810 --> 00:06:10,550
Then once we get this argmax we resize our output and then finally we take off the batch dimension.

79
00:06:10,550 --> 00:06:13,580
So now we have our resized output.

80
00:06:13,580 --> 00:06:15,560
There we go.

81
00:06:15,590 --> 00:06:18,650
We could check out this resized output.

82
00:06:18,650 --> 00:06:21,470
So instead of output now we have resized output.

83
00:06:21,470 --> 00:06:22,880
Let's check that out.

84
00:06:22,880 --> 00:06:24,890
We now have the right shapes.
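
The post-processing just described (argmax over the classes, resize back to the image size, then drop the batch dimension) could look roughly like this. It is a sketch, assuming the model output is laid out as (batch, num_classes, height, width); the function name is hypothetical, and the nearest-neighbour resize in plain NumPy stands in for the resize call used in the notebook.

```python
import numpy as np


def logits_to_label_map(logits, out_size):
    """Turn (1, num_classes, h, w) scores into an (H, W) map of class ids."""
    labels = np.argmax(logits, axis=1)             # (1, h, w): best class per pixel
    height, width = out_size
    rows = np.arange(height) * labels.shape[1] // height
    cols = np.arange(width) * labels.shape[2] // width
    resized = labels[:, rows][:, :, cols]          # (1, H, W): nearest-neighbour resize
    return resized[0]                              # drop the batch dimension
```

Nearest-neighbour is used on purpose: interpolating class ids would invent ids that the model never predicted.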

85
00:06:24,890 --> 00:06:35,240
So we could go ahead now and create this mask where all the other regions are black, while only this

86
00:06:35,240 --> 00:06:37,880
part where we have the boots is actually white.

87
00:06:37,880 --> 00:06:45,860
So essentially what we want to do here is say that all the other regions will take

88
00:06:45,860 --> 00:06:54,470
up values of zero, while the region where we have the boots will take up a value of 255.

89
00:06:54,590 --> 00:06:57,140
So that's exactly what we want to do.

90
00:06:57,170 --> 00:07:05,660
Now, that said, let's consider that we have this input as our mask, generated by

91
00:07:05,660 --> 00:07:06,380
the model.

92
00:07:06,380 --> 00:07:10,190
Now obviously here you see that

93
00:07:10,190 --> 00:07:16,100
we have different values everywhere except for this position here, where we have the value seven.

94
00:07:16,100 --> 00:07:17,480
Let's change the color.

95
00:07:17,840 --> 00:07:19,100
There we go.

96
00:07:19,100 --> 00:07:19,550
Okay.

97
00:07:19,550 --> 00:07:24,950
So all through we have different values, except for this one where we have the value seven.

98
00:07:24,950 --> 00:07:31,370
So it's similar to this scenario here, where we could have different masks or different mask values

99
00:07:31,370 --> 00:07:32,600
generated by the model.

100
00:07:32,600 --> 00:07:37,700
And then, for a given region like this region, we have the value seven.

101
00:07:37,700 --> 00:07:40,190
So we're just simplifying it this way.

102
00:07:40,190 --> 00:07:51,080
Now what we want to do is take this 7 to 255; we want to send this to 255.

103
00:07:51,080 --> 00:07:53,960
And we also want to send all the rest to zero.

104
00:07:53,960 --> 00:08:04,700
So what we'll do first and foremost is add this to negative seven times ones.

105
00:08:04,700 --> 00:08:11,450
So we will have ones which take up the exact shape of our input.

106
00:08:11,630 --> 00:08:18,740
So here, if we add this up like this, then what we'll be getting will be another

107
00:08:18,740 --> 00:08:19,370
matrix.

108
00:08:19,370 --> 00:08:21,200
Let's draw another matrix below.

109
00:08:21,200 --> 00:08:29,450
So our output matrix now will be such that we will have all other values different from

110
00:08:29,450 --> 00:08:29,750
zero.

111
00:08:29,750 --> 00:08:34,400
So here this x is different from zero.

112
00:08:34,400 --> 00:08:36,290
So here we would have x.

113
00:08:36,290 --> 00:08:37,460
We would have x.

114
00:08:37,460 --> 00:08:38,810
We would have x.

115
00:08:38,930 --> 00:08:40,310
We would have x.

116
00:08:40,550 --> 00:08:43,040
We would have x x.

117
00:08:43,040 --> 00:08:50,420
And then at that position where we had the seven, because we subtracted seven, we are going

118
00:08:50,420 --> 00:08:51,560
to have a zero.

119
00:08:51,560 --> 00:08:54,560
So all through here we have x and x.

120
00:08:54,650 --> 00:08:56,660
Let's change back the color.

121
00:08:56,660 --> 00:09:00,470
Take this and then see here we would have a zero.

122
00:09:00,620 --> 00:09:04,190
So all these values are both positive and negative values.

123
00:09:04,190 --> 00:09:06,560
They could be positive or they could be negative.

124
00:09:06,800 --> 00:09:08,390
It doesn't matter.

125
00:09:08,480 --> 00:09:14,990
They are all different from zero, except obviously for this position where we had the seven.

126
00:09:14,990 --> 00:09:16,040
Let's get into the code.

127
00:09:16,040 --> 00:09:17,540
We're going to define a mask.

128
00:09:17,540 --> 00:09:19,550
So here we'll define a mask.

129
00:09:19,550 --> 00:09:27,050
And now this mask will be such that we take in our resized output and

130
00:09:27,050 --> 00:09:31,850
add to that negative the mask ID.

131
00:09:32,120 --> 00:09:34,280
So it's like doing negative seven.

132
00:09:34,280 --> 00:09:36,290
Negative seven, that's this part.

133
00:09:36,290 --> 00:09:39,890
So this is the resized output here.

134
00:09:39,920 --> 00:09:41,840
This is the mask ID.

135
00:09:41,840 --> 00:09:47,000
And then this is some ones which take up the same shape as the resized output.

136
00:09:47,000 --> 00:09:58,910
So now we will have that mask ID times the ones, or actually ones_like

137
00:09:58,910 --> 00:10:03,830
of our resized output.

138
00:10:03,830 --> 00:10:04,640
There we go.

139
00:10:04,640 --> 00:10:10,760
So now we take this and add it to this, and we expect to have this kind of matrix where we have

140
00:10:10,760 --> 00:10:13,730
all different values except for here where we have a zero.

141
00:10:13,730 --> 00:10:22,460
Now getting back to our illustration, the next step will be to give this a value of 255 and then all

142
00:10:22,460 --> 00:10:23,930
the rest a value of zero.

143
00:10:23,930 --> 00:10:30,440
So first things first, given that we have both negatives and positives, what we could do is we

144
00:10:30,440 --> 00:10:32,660
could multiply this by itself.

145
00:10:32,660 --> 00:10:33,980
Or better still, we could

146
00:10:34,260 --> 00:10:39,420
square this. Well, saying "square" could be confused with matrix multiplication.

147
00:10:39,420 --> 00:10:42,390
But essentially what we're saying is let's copy this.

148
00:10:42,390 --> 00:10:44,700
So we just paste it out here.

149
00:10:45,210 --> 00:10:48,360
Essentially what we're saying is we have the square.

150
00:10:48,360 --> 00:10:51,720
Let's take this two off and this other two off.

151
00:10:51,720 --> 00:10:52,380
There we go.

152
00:10:52,380 --> 00:10:56,400
So what we're saying is we're doing an element-wise multiplication.

153
00:10:56,400 --> 00:10:57,840
So it's not matrix multiplication.

154
00:10:57,840 --> 00:11:01,800
We take this value multiply by itself and so on and so forth.

155
00:11:01,800 --> 00:11:07,050
Now obviously, again, all the values will still be non-zero, except for here where we have zero times

156
00:11:07,050 --> 00:11:07,590
zero.

157
00:11:07,590 --> 00:11:10,380
But with the difference that now all these values are positive.

158
00:11:10,380 --> 00:11:14,190
So x now is going to be strictly greater than zero.

159
00:11:14,190 --> 00:11:18,300
So we go from x being different from zero to x strictly greater than zero.

160
00:11:18,300 --> 00:11:22,440
So we would have a new one here. Let's space that out.

161
00:11:22,440 --> 00:11:23,760
Copy this.

162
00:11:23,760 --> 00:11:24,810
And there we go.

163
00:11:24,810 --> 00:11:29,250
So now our x in this case our x is strictly greater than zero.

164
00:11:29,280 --> 00:11:30,870
Take off this two. Okay.

165
00:11:30,870 --> 00:11:31,980
So that's it.

166
00:11:31,980 --> 00:11:34,260
We now have this new matrix.

167
00:11:34,260 --> 00:11:38,460
And then the next step will be to multiply this by a very large number.

168
00:11:38,460 --> 00:11:42,690
So let's say we multiply this by ten to the power of ten.

169
00:11:42,810 --> 00:11:45,390
So if we multiply this by ten to the power of ten.

170
00:11:45,390 --> 00:11:55,560
And then we clip the output in such a way that the max will be 255 and the min will be

171
00:11:55,560 --> 00:11:56,460
zero.

172
00:11:56,490 --> 00:11:58,590
Then what we'll be getting.

173
00:11:58,590 --> 00:12:00,420
Now let's take this off.

174
00:12:00,420 --> 00:12:02,820
What we'll be getting now, that is, after the clipping.

175
00:12:02,820 --> 00:12:07,710
So again, we multiply all these values by ten to the power of ten.

176
00:12:07,710 --> 00:12:15,450
So after multiplying by ten to the power of ten, and then clipping such that the max is 255 and

177
00:12:15,450 --> 00:12:19,320
the min value is zero, we would obtain a new matrix.

178
00:12:19,320 --> 00:12:22,050
So let's copy this again.

179
00:12:23,310 --> 00:12:24,540
And paste it out here.

180
00:12:24,540 --> 00:12:26,940
So we would obtain this new matrix.

181
00:12:27,540 --> 00:12:34,950
And this new matrix will now have values of x equal to 255.

182
00:12:34,950 --> 00:12:38,220
So all the values of x are now equal to 255.

183
00:12:38,220 --> 00:12:47,250
And so finally, what we could do is, for each and every value here, subtract 255 from

184
00:12:47,250 --> 00:12:48,780
each and every value.

185
00:12:48,810 --> 00:12:51,240
Then all these values,

186
00:12:51,240 --> 00:12:54,570
all these values of x, now become zero.

187
00:12:54,570 --> 00:13:00,330
And where we had zero it becomes -255. Obviously, to convert that to 255, we just simply need to multiply

188
00:13:00,330 --> 00:13:01,920
by negative one.

189
00:13:01,920 --> 00:13:07,710
So now we subtract 255.

190
00:13:07,710 --> 00:13:09,360
And then we add a negative sign here.

191
00:13:09,360 --> 00:13:15,420
So now we'll have all zeros, because obviously we've subtracted 255 from all those values.

192
00:13:15,420 --> 00:13:24,480
But here we would have 255, and all the rest zeros.

193
00:13:24,480 --> 00:13:26,160
And here zero.

194
00:13:26,160 --> 00:13:32,940
So that's how we can simply convert this input mask to this mask where we have only zeros

195
00:13:32,970 --> 00:13:40,890
for all the non-required regions, that is, for all the regions where the category we

196
00:13:40,890 --> 00:13:43,620
want to work with isn't represented.
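
The whole illustration can be summarized in one chain of element-wise operations. The symbols below are notation introduced here, not from the video: M is the matrix of class ids produced by the model, k is the chosen mask ID (seven in the example), 1 is the matrix of ones with the same shape as M, and B is the final mask.

```latex
\[
D = M - k\,\mathbf{1}, \qquad
S = D \odot D, \qquad
C = \operatorname{clip}\!\left(10^{10}\, S,\; 0,\; 255\right), \qquad
B = -\left(C - 255\,\mathbf{1}\right) = 255\,\mathbf{1} - C
\]
\[
B_{ij} =
\begin{cases}
255, & M_{ij} = k \\
0, & M_{ij} \neq k
\end{cases}
\]
```

The second line holds because class ids are integers: wherever M differs from k, the difference is at least one in absolute value, so the squared and scaled entry is far above 255 and clips to exactly 255.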

197
00:13:43,620 --> 00:13:47,100
So getting back to the code, we are going to implement this.

198
00:13:47,640 --> 00:13:49,920
Obviously we've already done this first part.

199
00:13:49,920 --> 00:13:53,250
So we already have this matrix.

200
00:13:53,250 --> 00:13:56,220
So let's go ahead and multiply it by itself.

201
00:13:56,220 --> 00:13:58,440
So we could define our new mask.

202
00:13:58,440 --> 00:14:04,860
Now it's simply going to be multiplied by itself to make sure that we only have positive

203
00:14:04,860 --> 00:14:05,400
values.

204
00:14:05,400 --> 00:14:07,770
So we have mask times mask.

205
00:14:07,770 --> 00:14:13,170
And then after multiplying by itself we multiply by a very large number.

206
00:14:13,380 --> 00:14:14,490
Let's get back to the code.

207
00:14:14,490 --> 00:14:14,910
We'll

208
00:14:14,910 --> 00:14:16,860
multiply this by a very large number.

209
00:14:16,860 --> 00:14:23,160
We'll have 1e10 times

210
00:14:23,220 --> 00:14:24,840
this mask.

211
00:14:25,020 --> 00:14:27,660
So now we've multiplied by that very large number.

212
00:14:27,660 --> 00:14:29,160
The next thing will be to clip.

213
00:14:29,160 --> 00:14:36,420
So just as we have seen here, we're going to clip and make sure that our values fall between

214
00:14:36,420 --> 00:14:37,950
0 and 255.

215
00:14:37,950 --> 00:14:44,100
And since obviously all the non-zero values will be greater than 255, they will come down to 255, and we only

216
00:14:44,100 --> 00:14:46,260
have the zeros which will remain zero.

217
00:14:46,260 --> 00:14:49,680
So getting back to the code, let's reconnect.

218
00:14:49,680 --> 00:14:52,710
Getting back to the code, we have np.clip.

219
00:14:53,100 --> 00:15:00,600
And then we specify our min and the max.

220
00:15:00,600 --> 00:15:03,840
So here we have the max which is 255.

221
00:15:03,840 --> 00:15:04,560
There we go.

222
00:15:04,560 --> 00:15:05,400
So that's it.

223
00:15:05,400 --> 00:15:09,000
So now we've clipped getting back to our explanations.

224
00:15:09,000 --> 00:15:15,300
The next thing we want to do is subtract 255 and then multiply by negative one.

225
00:15:15,300 --> 00:15:23,520
So what we could do is simply take 255 minus this.

226
00:15:23,520 --> 00:15:30,450
So let's get back and just do 255 minus that. Okay.

227
00:15:30,450 --> 00:15:32,160
So now we have our mask.
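
Putting the steps above together, a minimal NumPy sketch might look like this. The function names are hypothetical and the notebook's own variable names may differ. The second function is not what the video does: it is an equivalent shortcut using a direct comparison, included only to check the arithmetic version.

```python
import numpy as np


def class_mask_arithmetic(label_map, mask_id):
    """Build a 0/255 mask for one class using the shift, square, scale, clip trick."""
    labels = label_map.astype(np.float64)
    shifted = labels + (-mask_id) * np.ones_like(labels)  # zero only where the class is
    squared = shifted * shifted                           # element-wise, so no negatives
    scaled = 1e10 * squared                               # push every non-zero far past 255
    clipped = np.clip(scaled, 0, 255)                     # non-zeros become exactly 255
    return (255 - clipped).astype(np.uint8)               # flip: class is 255, the rest 0


def class_mask_where(label_map, mask_id):
    """Same result via a direct comparison (a swapped-in alternative, not from the video)."""
    return np.where(label_map == mask_id, 255, 0).astype(np.uint8)
```

Both functions return a uint8 array of the same height and width as the label map, which is the form an image writer or an inpainting pipeline typically expects for a mask.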

228
00:15:32,160 --> 00:15:38,850
And then after obtaining the mask, let's print out the mask shape instead.

229
00:15:38,850 --> 00:15:39,270
Now.

230
00:15:39,270 --> 00:15:44,400
So after obtaining the mask, what we'll do is use OpenCV to write this.

231
00:15:44,400 --> 00:15:45,570
So we could check that out.

232
00:15:45,570 --> 00:15:50,430
So we write our mask to mask.jpg.

233
00:15:50,430 --> 00:15:52,350
And then we pass in the mask. Okay.

234
00:15:52,350 --> 00:15:54,330
So let's run that and see what we get.

235
00:15:55,020 --> 00:16:03,150
Now we open up our mask, and normally we should get... oh, we get this instead of what we expect.

236
00:16:03,180 --> 00:16:09,120
The reason we are getting this output, instead of, say, the region specific to only the boots, is

237
00:16:09,120 --> 00:16:15,930
because the model doesn't perform well at categorizing this region as boots.

238
00:16:15,930 --> 00:16:18,780
So let's check out shoes.

239
00:16:18,900 --> 00:16:22,650
If we go to shoes, there we go.

240
00:16:22,650 --> 00:16:23,370
39.

241
00:16:23,370 --> 00:16:27,060
So let's replace that ID with 39 and see what we get.

242
00:16:27,420 --> 00:16:29,490
Let's say 39.

243
00:16:29,490 --> 00:16:32,610
Run that again and check out mask.

244
00:16:34,250 --> 00:16:40,640
Normally you should get that, because the model confuses the boots with a shoe.

245
00:16:42,090 --> 00:16:44,910
Just close this and check that again.

246
00:16:46,850 --> 00:16:47,750
There we go.

247
00:16:47,750 --> 00:16:54,620
As you could see, now we have this region, 39, which is all in white, while the rest

248
00:16:54,620 --> 00:16:55,550
is in black.

249
00:16:55,550 --> 00:17:02,330
So that's it for this part. We're able to generate our inputs, as we had seen already from

250
00:17:02,330 --> 00:17:03,920
our initial schematic.

251
00:17:03,920 --> 00:17:09,980
We have this input, the original image, and then we have this region.

252
00:17:09,980 --> 00:17:13,730
Now, the next step will be to carry out the inpainting.

253
00:17:13,730 --> 00:17:20,030
That is, to take this input image, pass it to our inpainting model, and then edit our input

254
00:17:20,030 --> 00:17:27,710
image such that we have this new image. The exact inpainting model we'll be using is the Stable Diffusion

255
00:17:27,710 --> 00:17:35,570
2 inpainting model from Stability AI, which is made available on Hugging Face via the diffusers

256
00:17:35,570 --> 00:17:36,290
library.

257
00:17:36,290 --> 00:17:45,080
So as usual, when working with Hugging Face, you have these simple examples which can help you

258
00:17:45,080 --> 00:17:47,540
run the code very easily.

259
00:17:47,540 --> 00:17:50,270
So here we have how it works.

260
00:17:50,270 --> 00:17:54,200
You see, you can have an image, you have the mask and then you have some output.

261
00:17:54,200 --> 00:17:57,140
So let's go ahead and test this out.

262
00:17:57,680 --> 00:17:59,900
Here we have some notes.

263
00:17:59,900 --> 00:18:06,110
Despite not being a dependency, we highly recommend you to install xFormers for memory efficient attention.

264
00:18:06,410 --> 00:18:07,820
That's for better performance.

265
00:18:07,820 --> 00:18:13,580
And if you have low GPU RAM, make sure to add pipe.enable_attention_slicing() after sending it to

266
00:18:13,580 --> 00:18:17,000
CUDA for less VRAM usage, to the cost of speed.

267
00:18:17,030 --> 00:18:22,970
Okay, so that said, let's simply copy this or we have already installed this.

268
00:18:22,970 --> 00:18:24,530
So you could also copy this and install.

269
00:18:24,530 --> 00:18:25,910
But we've already installed it.

270
00:18:25,910 --> 00:18:28,880
So we'll simply copy this and then get back to the code.

271
00:18:29,620 --> 00:18:31,000
From the code we copied.

272
00:18:31,000 --> 00:18:36,910
First thing, we are going to import the StableDiffusionInpaintPipeline from the diffusers library.

273
00:18:36,970 --> 00:18:44,650
Then we create our simple pipeline, specifying the exact model ID, that's from Stability AI.

274
00:18:44,680 --> 00:18:48,400
Then the PyTorch data type is float16.

275
00:18:48,850 --> 00:18:54,670
We are going to be working with our GPU, so we have pipe.to("cuda").

276
00:18:54,700 --> 00:18:57,850
Then we specify the prompt.

277
00:18:57,850 --> 00:18:59,590
But we're going to get back to this.

278
00:18:59,590 --> 00:19:07,840
Now it should be noted that, as we had seen in this model card, we're told that, let's scroll

279
00:19:07,840 --> 00:19:08,710
down.

280
00:19:08,860 --> 00:19:14,950
We're told that, despite it not being a dependency, they recommend installing xFormers for memory efficient

281
00:19:14,950 --> 00:19:15,760
attention.

282
00:19:15,760 --> 00:19:17,050
So.

283
00:19:17,800 --> 00:19:22,690
We are going to do this immediately after we send this to CUDA.

284
00:19:22,690 --> 00:19:25,090
So let's save this.

285
00:19:25,540 --> 00:19:26,410
There we go.

286
00:19:26,410 --> 00:19:31,690
So that's why you see, we have this extra line right here which was not in the model card.

287
00:19:31,720 --> 00:19:39,160
Then from here we have the image which is generated using the prompt and the original

288
00:19:39,160 --> 00:19:39,880
image.

289
00:19:39,910 --> 00:19:43,240
And it turns out to be a PIL image, as specified here.

290
00:19:43,240 --> 00:19:48,940
Then we have the mask, and we generate the images, which we can then save.

291
00:19:48,940 --> 00:19:51,340
So essentially we're taking in an image.

292
00:19:51,340 --> 00:19:54,850
We take in a prompt, we take in a mask, and then

293
00:19:54,850 --> 00:20:00,430
we are editing this original image and generating a new, edited image.

294
00:20:00,430 --> 00:20:04,480
So that said we need this image and the mask image.

295
00:20:04,480 --> 00:20:07,840
So we're going to call our generate_inputs method.

296
00:20:07,840 --> 00:20:10,180
Let's copy this here.

297
00:20:10,360 --> 00:20:11,380
There we go.

298
00:20:11,380 --> 00:20:13,600
We call our generate_inputs method.

299
00:20:13,810 --> 00:20:16,510
And then here we specify 39.

300
00:20:16,510 --> 00:20:17,380
There we go.

301
00:20:17,380 --> 00:20:18,910
And that's it.

302
00:20:18,910 --> 00:20:23,860
So now in our generate_inputs method we output NumPy arrays.

303
00:20:23,860 --> 00:20:27,640
So we need to convert these into PIL images.

304
00:20:27,640 --> 00:20:34,210
So that said, we have Image.fromarray.

305
00:20:34,390 --> 00:20:35,980
That's our source image.

306
00:20:36,610 --> 00:20:37,900
Let's take off the shape.

307
00:20:38,500 --> 00:20:39,940
We have the image from our array.

308
00:20:39,940 --> 00:20:45,670
And we also have the mask image, again with Image.fromarray.

309
00:20:45,670 --> 00:20:50,200
And then we take off the shape.

310
00:20:50,200 --> 00:20:51,190
There we go.

311
00:20:51,190 --> 00:20:53,620
So now we have our PIL images.

312
00:20:53,620 --> 00:20:56,350
Let's run our generate_inputs and see what we get.

313
00:20:56,770 --> 00:20:58,510
There we go.

314
00:20:58,510 --> 00:21:06,250
Let's have this image and then the mask image, and there we go.

315
00:21:06,250 --> 00:21:09,160
So let's run this and then see what we have as output.

316
00:21:09,160 --> 00:21:11,380
But before running, let's update the prompt.

317
00:21:11,380 --> 00:21:21,190
So instead of "Face of a yellow cat", let's say we have "red colored shoe", because this is a shoe,

318
00:21:21,220 --> 00:21:22,600
"high resolution".

319
00:21:22,600 --> 00:21:25,480
And let's take this off. Okay.

320
00:21:26,620 --> 00:21:27,280
That's fine.

321
00:21:27,280 --> 00:21:30,040
So we have "red colored shoe, high resolution".

322
00:21:30,040 --> 00:21:32,950
And we're going to save this in.

323
00:21:32,950 --> 00:21:37,240
We're going to save this as, let's say, shoe. After running,

324
00:21:37,240 --> 00:21:42,460
as you could see here, we have the shoe, which is not really that red.

325
00:21:42,460 --> 00:21:44,500
Maybe it looks more brownish.

326
00:21:44,710 --> 00:21:47,530
And then you could also compare with the original.

327
00:21:47,530 --> 00:21:50,740
So you see the editing which we've done on the image.

328
00:21:50,740 --> 00:21:56,200
Now the image size could be changed so we could resize our image so it looks like the original.

329
00:21:56,200 --> 00:21:59,080
Now let's modify again.

330
00:21:59,080 --> 00:22:07,390
So before, we took the mask ID to be 39. Let's take 13, which represents the coat, and

331
00:22:07,390 --> 00:22:08,410
modify the code.

332
00:22:08,410 --> 00:22:17,230
So let's say we have a green colored coat.

333
00:22:19,040 --> 00:22:22,550
A green colored nice looking coat.

334
00:22:22,550 --> 00:22:24,920
And let's run that again and see what we get.

335
00:22:24,920 --> 00:22:31,010
Well, normally we should separate this so we don't have to create the whole pipeline each time

336
00:22:31,010 --> 00:22:31,520
we run this.

337
00:22:31,520 --> 00:22:36,590
So we'll separate the part where we create our pipeline from where we actually generate the inputs,

338
00:22:36,590 --> 00:22:38,630
or rather we actually generate a new image.
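
The separation just mentioned could be sketched as two functions: one that builds the pipeline once, and one that can be called again for every new prompt. The function names and the choice of pipe.enable_attention_slicing() as the extra memory-saving line are assumptions; the video may use the xFormers option instead. Running build_pipeline needs torch, diffusers, a CUDA GPU and a model download.

```python
MODEL_ID = "stabilityai/stable-diffusion-2-inpainting"


def build_pipeline(model_id=MODEL_ID, device="cuda"):
    """Create the inpainting pipeline once; this is the slow step."""
    # Heavy imports stay inside the function so this cell loads without a GPU stack.
    import torch
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    )
    pipe = pipe.to(device)
    pipe.enable_attention_slicing()  # assumed choice: less VRAM, somewhat slower
    return pipe


def edit_region(pipe, source_rgb, mask, prompt):
    """Inpaint the white (255) region of `mask` in `source_rgb` according to `prompt`."""
    from PIL import Image

    image = Image.fromarray(source_rgb)   # (H, W, 3) uint8 array, RGB channel order
    mask_image = Image.fromarray(mask)    # (H, W) uint8 array with values 0 or 255
    result = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
    height, width = source_rgb.shape[:2]
    return result.resize((width, height))  # PIL expects (width, height)
```

One thing to watch: an array read with OpenCV is in BGR order, so it would need converting to RGB before Image.fromarray if the colours are to match the original.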

339
00:22:38,630 --> 00:22:46,130
So let's go ahead and paste this out here then before running here.

340
00:22:46,130 --> 00:22:47,390
Now we have coat.

341
00:22:47,390 --> 00:22:48,920
So we have coats.

342
00:22:48,920 --> 00:22:50,000
There we go.

343
00:22:50,000 --> 00:22:52,940
And then let's run this.

344
00:22:52,940 --> 00:22:55,670
So we already have our pipeline.

345
00:22:55,670 --> 00:22:57,860
Then we now have this coat.

346
00:22:58,040 --> 00:23:00,770
Then we're going to go ahead and resize our output.

347
00:23:00,770 --> 00:23:03,230
So here we have image resize.

348
00:23:03,770 --> 00:23:09,260
We have width init and then height init.

349
00:23:09,260 --> 00:23:12,170
So instead of height width we have width height.

350
00:23:12,470 --> 00:23:14,000
And that looks fine.

351
00:23:14,000 --> 00:23:19,580
After resizing we could just simply display and then visualize that.

352
00:23:19,580 --> 00:23:22,220
So let's run this and see what we get.

353
00:23:22,730 --> 00:23:27,470
We get this output, but we don't have that green color.

354
00:23:27,470 --> 00:23:30,110
So we're going to modify this prompt a little here.

355
00:23:30,110 --> 00:23:38,030
We say "photorealistic", because here we just had "green colored nice looking coat, high resolution"

356
00:23:38,030 --> 00:23:40,160
without taking into consideration the whole image.

357
00:23:40,160 --> 00:23:50,900
So we have "a photorealistic photo of a woman wearing a green colored",

358
00:23:51,470 --> 00:24:00,110
let's say, "nice looking coat, high resolution", and let's say "all green, high resolution".

359
00:24:00,110 --> 00:24:00,800
There we go.

360
00:24:00,800 --> 00:24:02,900
Let's run that again and see what we get.

361
00:24:03,200 --> 00:24:04,250
While that's running.

362
00:24:04,250 --> 00:24:06,170
We could change this to coat.

363
00:24:06,170 --> 00:24:13,640
And then we modify the code of generate_inputs such that this takes in the mask label.

364
00:24:13,640 --> 00:24:22,400
And then here we'll define mask ID as label2id of the mask label.

365
00:24:22,400 --> 00:24:23,270
There we go.

366
00:24:23,270 --> 00:24:29,330
So when we pass in coat, label2id produces the ID, which in this case is 13.
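
The lookup from label to ID could be sketched like this. Only the three IDs mentioned in the video are listed, the exact label spellings are assumptions, and resolve_mask_id is a hypothetical helper name; in practice the mapping would come from the model configuration.

```python
# Only the ids mentioned in the video; the exact label strings are assumed.
id2label = {7: "boots", 13: "coat", 39: "shoes"}
label2id = {label: idx for idx, label in id2label.items()}


def resolve_mask_id(mask_label, mapping=label2id):
    """Translate a class label such as "coat" into its integer mask ID."""
    try:
        return mapping[mask_label]
    except KeyError:
        known = ", ".join(sorted(mapping))
        raise ValueError(f"Unknown label {mask_label!r}; known labels: {known}") from None
```

Failing loudly on an unknown label helps here, because a wrong ID would silently produce an all-black mask rather than an error.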

367
00:24:29,330 --> 00:24:30,380
So let's get back.

368
00:24:30,410 --> 00:24:32,000
You see now it's much better.

369
00:24:32,000 --> 00:24:40,280
We have a coat now which looks much greener compared to what we had just

370
00:24:40,280 --> 00:24:41,060
before.

371
00:24:41,180 --> 00:24:48,110
Let's run this again with the modifications which we just made. We've already run this.

372
00:24:48,110 --> 00:24:49,880
Let's run this again and see what we get.

373
00:24:50,330 --> 00:24:55,520
We have our edited image, edited only in this zone, and that's it.

374
00:24:55,520 --> 00:25:04,280
So we've just seen how we could go from an input image, which is this input image right here, to

375
00:25:04,280 --> 00:25:12,980
an edited version of that image based on a specific mask, which was generated from our trained

376
00:25:13,520 --> 00:25:14,660
SegFormer model.