WEBVTT

00:00.935 --> 00:04.800
-: All right, let me show you how to train a product

00:04.800 --> 00:06.570
into Stable Diffusion.

00:06.570 --> 00:08.673
So this is using DreamBooth.

00:09.630 --> 00:14.020
You know, this is the Shivam Shrirao.

00:15.323 --> 00:16.950
(chuckles) I can never pronounce it.

00:16.950 --> 00:21.630
This is their, you know, specific Colab notebook,

00:21.630 --> 00:23.553
which is really, really powerful.

00:24.630 --> 00:26.580
Colab, the reason we use this

00:26.580 --> 00:29.070
is it actually gives us access to a GPU,

00:29.070 --> 00:31.470
which is needed in order to run this.

00:31.470 --> 00:33.570
And even with the GPU, it takes quite a while.

00:33.570 --> 00:36.120
So buckle up.

00:36.120 --> 00:39.120
So we're gonna go through, first thing you wanna do,

00:39.120 --> 00:40.380
I'm not actually gonna run this

00:40.380 --> 00:41.400
'cause it'll take a long time.

00:41.400 --> 00:42.870
I've already run it in the background,

00:42.870 --> 00:44.910
but I'll kind of walk you through it.

00:44.910 --> 00:48.510
So first you need to kind of install the code

00:48.510 --> 00:53.510
to train DreamBooth and the Diffusers library.

00:53.918 --> 00:57.450
You also need to log into Hugging Face.

00:57.450 --> 01:01.620
So you need to put your Hugging Face token in here.

01:01.620 --> 01:04.470
You can get that by logging in.

01:04.470 --> 01:06.750
I'm going to, yeah, I've already changed mine here,

01:06.750 --> 01:08.670
so you know, yours would work,

01:08.670 --> 01:11.840
but if you try this, it won't work.

01:11.840 --> 01:14.850
But you'll have your own and you can just kind of

01:14.850 --> 01:19.320
visit Hugging Face and go to the API section to get that.

01:19.320 --> 01:21.546
I don't recommend saving the model with gdrive,

01:21.546 --> 01:23.340
it takes up four to five gigs,

01:23.340 --> 01:25.980
so it can take up quite a lot of your gdrive,

01:25.980 --> 01:28.480
but if you're gonna use it again, you can do that.

01:29.998 --> 01:32.700
The model that you're using is here.

01:32.700 --> 01:36.513
I use Stable Diffusion version 1.5.

01:38.280 --> 01:39.900
You know, I think there are later versions,

01:39.900 --> 01:42.960
I don't know if DreamBooth specifically works with them,

01:42.960 --> 01:46.080
but this is the one I found works pretty well.

01:46.080 --> 01:48.720
And then you just wanna kind of create a directory

01:48.720 --> 01:50.850
for outputting the weights.

01:50.850 --> 01:53.610
Alright, so I'm gonna start training

01:53.610 --> 01:56.190
and there's a few things you need to change

01:56.190 --> 01:57.023
before you do that.

01:57.023 --> 02:00.450
So one is like, at least specifically here, you know,

02:00.450 --> 02:02.730
if you're training on a person,

02:02.730 --> 02:05.043
like if you're making one of those AI photo,

02:06.510 --> 02:08.970
you know, photo profile things

02:08.970 --> 02:12.240
where you train yourself into the AI

02:12.240 --> 02:14.580
and then put yourself into different situations,

02:14.580 --> 02:18.690
then you would use this class instance of a person.

02:18.690 --> 02:21.510
But in this case, we're doing a product

02:21.510 --> 02:23.797
and then we're calling it the product zwx.

02:26.130 --> 02:29.640
So that's what we're going to use in the actual prompt, zwx.

02:29.640 --> 02:32.760
And you just run that and then you can upload your photos.

02:32.760 --> 02:37.760
So here I have, you know, all the photos that I've uploaded

02:38.970 --> 02:43.970
and it actually takes a little bit of time to do that.

02:44.130 --> 02:45.461
But you know, that's done.

02:45.461 --> 02:48.480
You could also, if you know how to structure it,

02:48.480 --> 02:50.820
you can actually upload it into the instance data directory.

02:50.820 --> 02:54.540
But, you know, if you're not as technical,

02:54.540 --> 02:58.620
then just run this and click "choose files".

02:58.620 --> 03:00.960
All right, so this is where

03:00.960 --> 03:03.240
the actual training happens, right?

03:03.240 --> 03:06.750
So, you know, put the model name in, et cetera.

03:06.750 --> 03:09.090
You don't need to change anything around this.

03:09.090 --> 03:12.360
Just make sure that this is a sample prompt.

03:12.360 --> 03:15.270
So this is what it's going to use to run afterwards.

03:15.270 --> 03:17.850
Although obviously you can run your own prompts

03:17.850 --> 03:20.370
and you just need to kind of make sure that fits.

03:20.370 --> 03:25.290
So I have the product as zwx or dwx and you run that

03:25.290 --> 03:26.730
and then it starts the training.

03:26.730 --> 03:29.823
So it took a little bit of a while.

03:31.020 --> 03:34.250
In this case I think it was, you know, 14...

03:35.460 --> 03:37.440
Yeah, 26 minutes.

03:37.440 --> 03:41.343
So not terrible, especially for how valuable it is.

03:43.050 --> 03:44.430
And then this is where you kind of run it.

03:44.430 --> 03:45.840
So if you already have some weights,

03:45.840 --> 03:48.120
if you've saved them in the past, you know,

03:48.120 --> 03:51.000
you could use them here.

03:51.000 --> 03:53.340
But this is where it really gets interesting.

03:53.340 --> 03:55.200
So what I trained it on

03:55.200 --> 03:57.030
were pictures of my daughter's purse.

03:57.030 --> 03:58.533
So she's three, right?

03:59.400 --> 04:03.960
And she has like this little kinda purse that she got

04:03.960 --> 04:06.660
with sequins, purple and turquoise.

04:06.660 --> 04:08.820
And the really cool thing here is that

04:08.820 --> 04:11.280
these are not real pictures of the purse, right?

04:11.280 --> 04:12.870
Like I uploaded them earlier.

04:12.870 --> 04:17.870
These are actually, kind of, AI-generated photos

04:17.880 --> 04:19.830
and they look similar to the ones I've uploaded

04:19.830 --> 04:23.730
because that's kind of in the training data now.

04:23.730 --> 04:27.180
Like I uploaded one that was next to my kitchen tiles,

04:27.180 --> 04:29.610
one on a wooden surface, et cetera.

04:29.610 --> 04:32.370
You wanna provide different formats

04:32.370 --> 04:36.360
and the more different scenarios and angles

04:36.360 --> 04:38.580
you put the product in, the better

04:38.580 --> 04:41.400
because then it's gonna be able to replicate that.

04:41.400 --> 04:44.070
Cool, so that kinda shows that it works,

04:44.070 --> 04:46.620
but then the way you actually use it is

04:46.620 --> 04:48.900
when you run inference down here.

04:48.900 --> 04:50.504
So you can set a seed

04:50.504 --> 04:53.400
and that makes it reproducible, so you run the same thing

04:53.400 --> 04:55.080
and it'll get the same result.

04:55.080 --> 04:59.040
But what I wanted to do, just to impress my daughter

04:59.040 --> 05:00.870
was have Princess Elsa from Frozen

05:00.870 --> 05:05.250
carrying zwx product, right?

05:05.250 --> 05:06.660
You can have negative prompts,

05:06.660 --> 05:09.540
you can change the guidance scale, all these other things,

05:09.540 --> 05:12.180
but you can see that, you know,

05:12.180 --> 05:13.770
in some cases it doesn't get it right,

05:13.770 --> 05:16.683
but in other cases it kind of gets a bit creepy.

05:18.000 --> 05:20.040
Here it kind of looks like the purse,

05:20.040 --> 05:21.600
but then some of 'em turn out great.

05:21.600 --> 05:24.150
So you can see here that this really looks

05:24.150 --> 05:26.550
much like the purse and then it's pretty much got

05:26.550 --> 05:28.770
Princess Elsa here as well.

05:28.770 --> 05:31.020
So, you know, this is kind of a basic prompt

05:31.020 --> 05:32.640
and you can do more fine-tuning on this.

05:32.640 --> 05:34.530
You can provide more images,

05:34.530 --> 05:36.660
you can actually do this with multiple products.

05:36.660 --> 05:39.960
But just wanted to kinda show you how you could do this.

05:39.960 --> 05:44.070
You know, if you have this use case of generating ads

05:44.070 --> 05:49.070
for a client or, you know, generating photos

05:49.290 --> 05:52.623
where you actually have to have the real product inserted,

05:53.970 --> 05:58.053
now you can train the AI on this specific concept.

06:00.121 --> 06:00.954
And that's it.

06:00.954 --> 06:04.740
So this takes up a lot of space and you know,

06:04.740 --> 06:08.610
you can just about run it on a Google Colab for free.

06:08.610 --> 06:11.640
But you know, typically once you're doing this more often

06:11.640 --> 06:15.780
you would do this on your own GPU-enabled computer

06:15.780 --> 06:17.580
or in the cloud.

06:17.580 --> 06:21.240
And this is basically the same code

06:21.240 --> 06:25.890
that these AI profile photo companies use,

06:25.890 --> 06:27.660
but obviously, we're using it

06:27.660 --> 06:29.910
to train the product in this case.

06:29.910 --> 06:31.760
So, yeah, hopefully that was helpful.

06:32.640 --> 06:34.770
It's a really powerful use case

06:34.770 --> 06:37.740
and obviously if you can kind of get it to work

06:37.740 --> 06:39.690
for the more powerful Stable Diffusions

06:39.690 --> 06:41.987
or as the models get better,

06:41.987 --> 06:44.703
then you can be much more flexible with this.