WEBVTT

00:00.510 --> 00:03.510
-: Hey, I'm gonna show you how to make AI profile pictures

00:03.510 --> 00:04.890
like you might've seen on Twitter.

00:04.890 --> 00:08.340
So there are a few of these tools out there that do this.

00:08.340 --> 00:10.230
And to be honest, it's probably better

00:10.230 --> 00:12.180
and easier just to use those tools,

00:12.180 --> 00:14.280
especially if you don't have access to a GPU.

00:14.280 --> 00:17.370
But you know, I think there's a lot you can do

00:17.370 --> 00:19.500
in terms of flexibility once you understand

00:19.500 --> 00:21.540
this format and how it works.

00:21.540 --> 00:22.890
We're using DreamBooth.

00:22.890 --> 00:25.020
DreamBooth is a way to fine-tune

00:25.020 --> 00:27.120
the Stable Diffusion model

00:27.120 --> 00:30.300
to train it on your own images.

00:30.300 --> 00:32.130
So you could train it on a specific style

00:32.130 --> 00:34.320
or you could train it on pictures of yourself.

00:34.320 --> 00:35.790
Like I have.

00:35.790 --> 00:39.720
The main thing to note here, I'm just gonna connect,

00:39.720 --> 00:44.520
is when you connect, you wanna view your resources

00:44.520 --> 00:46.210
and change runtime type

00:47.130 --> 00:50.190
and you need to make sure that you're running with a GPU.

00:50.190 --> 00:52.470
So the reason why you need this is

00:52.470 --> 00:55.830
you're running Stable Diffusion and also training the model

00:55.830 --> 00:57.930
and that is resource intensive.
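
NOTE
Editor's sketch, not code from the video: one way to confirm the runtime
really has a GPU before starting a long training run. It assumes an NVIDIA
GPU whose driver ships the nvidia-smi command; gpu_available is a
hypothetical helper name.
```python
import shutil
import subprocess
#
def gpu_available() -> bool:
    """Return True if nvidia-smi is on PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
#
if __name__ == "__main__":
    if gpu_available():
        print("GPU detected")
    else:
        print("No GPU found: change the runtime type first")
```
If this reports no GPU, switch the runtime type before running the training cells.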

00:57.930 --> 00:59.700
So I'm not actually gonna run this,

00:59.700 --> 01:01.410
I've run it previously,

01:01.410 --> 01:03.120
just 'cause it takes quite a long time.

01:03.120 --> 01:04.890
But you'll have access to this code

01:04.890 --> 01:07.680
to be able to run it in a couple hours yourself.

01:07.680 --> 01:12.210
So we're just, you know, first thing just downloading

01:12.210 --> 01:13.950
the install requirements.

01:13.950 --> 01:15.090
So that's pretty quick.

01:15.090 --> 01:17.130
And then you need a Hugging Face token.

01:17.130 --> 01:18.513
So this one is mine.

01:20.075 --> 01:22.050
I'm gonna delete this afterwards

01:22.050 --> 01:23.730
just so you guys can't access it. (chuckles)

01:23.730 --> 01:28.530
But in order to get it, you could just go into Hugging Face.

01:29.790 --> 01:33.930
This is the original Stable Diffusion, the 1.5 version.

01:33.930 --> 01:37.470
And then I'm just gonna go to the settings.

01:37.470 --> 01:40.410
I think first you have to accept this model card,

01:40.410 --> 01:42.510
which I've already done so it's not showing up for me.

01:42.510 --> 01:46.140
But if you go to Access Tokens

01:46.140 --> 01:49.680
and then you know, you can create a new access token here.

01:49.680 --> 01:52.410
So that's where you would get this.

01:52.410 --> 01:55.140
And then if you wanna invalidate and refresh,

01:55.140 --> 01:56.853
you can do that here as well.

01:58.350 --> 02:00.090
And then this token will stop working.

02:00.090 --> 02:02.100
So that's fine 'cause I'm not running this.
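
NOTE
Editor's sketch, not code from the video: instead of pasting the access
token into a notebook cell (and having to invalidate it afterwards), it could
be read from an environment variable. The variable name HF_TOKEN and the
helper get_hf_token are assumptions made for this illustration.
```python
import os
#
def get_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the token found in the environment mapping, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set {var_name} to a Hugging Face access token first")
    return token
#
if __name__ == "__main__":
    try:
        get_hf_token()
        print("token found")
    except RuntimeError as err:
        print(err)
```
This keeps the token out of the shared notebook, so nothing needs deleting before publishing it.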

02:02.100 --> 02:03.960
The model name that you're using

02:03.960 --> 02:05.670
is Stable Diffusion 1.5.

02:05.670 --> 02:07.800
And then you have like an output directory,

02:07.800 --> 02:10.650
which is gonna kinda store this locally.

02:10.650 --> 02:15.240
There's just enough RAM and storage in the computer

02:15.240 --> 02:17.970
that Google Colab gives you to store all this.

02:17.970 --> 02:20.460
But it's kind of getting to the limits.

02:20.460 --> 02:24.060
Okay, so the main thing you need to do is

02:24.060 --> 02:26.730
you need to edit this part here so you know,

02:26.730 --> 02:28.680
you can put in different concepts.

02:28.680 --> 02:32.100
I would say only do one concept at a time

02:32.100 --> 02:35.730
and then like generate the image and then cut out that part.

02:35.730 --> 02:37.980
Because when you try and combine multiple concepts,

02:37.980 --> 02:39.660
it tends not to work very well

02:39.660 --> 02:41.430
and takes a really long time for training.

02:41.430 --> 02:42.750
So here we go.

02:42.750 --> 02:44.700
What we're doing is, the instance prompt

02:44.700 --> 02:47.640
is "photo of ukj person."

02:47.640 --> 02:49.110
And "person" is the type of class.

02:49.110 --> 02:50.580
So if you're doing a style,

02:50.580 --> 02:51.930
like a certain type of art style,

02:51.930 --> 02:53.400
you could change that here.

02:53.400 --> 02:58.290
And then the reason why we use ukj or you know, zwx

02:58.290 --> 03:00.090
is because that's like pretty far away

03:00.090 --> 03:01.650
from an actual word, right?

03:01.650 --> 03:04.680
It's not a real word and so we won't get confused.

03:04.680 --> 03:06.780
And you know, there's different types of classes.

03:06.780 --> 03:10.350
So like product, person, or you could have style.

03:10.350 --> 03:14.010
Alright, so that's just gonna run and when you run that,

03:14.010 --> 03:15.930
it will do the JSON dump here,

03:15.930 --> 03:18.300
to kind of use this concept list when it's training.
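
NOTE
Editor's sketch, not the notebook's exact cell: roughly what a one-concept
list and its JSON dump could look like. The instance prompt comes from the
video; the key names, the class prompt wording, the directory paths, and
the write_concepts helper are assumptions.
```python
import json
import os
import tempfile
#
# One concept only, as recommended in the video.
concepts_list = [
    {
        "instance_prompt": "photo of ukj person",  # rare token + class word
        "class_prompt": "photo of a person",       # assumed wording
        "instance_data_dir": "data/ukj",           # hypothetical path
        "class_data_dir": "data/person",           # hypothetical path
    }
]
#
def write_concepts(concepts, path):
    """Dump the concept list as JSON so a training script can read it back."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(concepts, f, indent=4)
    return path
#
if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "concepts_list.json")
    print(write_concepts(concepts_list, target))
```
For a style instead of a person, the class word in both prompts would change accordingly.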

03:18.300 --> 03:22.170
And now what you can do is upload the actual images.

03:22.170 --> 03:25.530
And what I've done, I just went on my Facebook account

03:25.530 --> 03:29.070
and downloaded 30, I think it was 30.

03:29.070 --> 03:31.680
Actually, only 20 images of myself,

03:31.680 --> 03:33.090
just screenshotted them.

03:33.090 --> 03:34.830
And then I uploaded them.

03:34.830 --> 03:36.450
I would say typically you need at least 10,

03:36.450 --> 03:38.430
otherwise it doesn't work very well.

03:38.430 --> 03:40.500
20 is fine, 30 is great.

03:40.500 --> 03:43.080
The more you upload, the longer it takes, obviously.

03:43.080 --> 03:44.610
But about 20 was fine.
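
NOTE
Editor's sketch, not code from the video: a quick check of how many
training images are in a folder, using the rough thresholds mentioned above
(at least 10, 20 is fine, 30 is great). The helper names and the list of
file extensions are assumptions.
```python
import tempfile
from pathlib import Path
#
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}
#
def count_images(folder):
    """Count files in the folder whose extension looks like an image."""
    return sum(
        1
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
#
def describe(count):
    """Map an image count to the rough guidance given in the video."""
    if count < 10:
        return "too few: results tend to be poor below 10"
    if count < 20:
        return "minimum met, more is better"
    if count < 30:
        return "fine"
    return "great, but training takes longer"
#
if __name__ == "__main__":
    demo = tempfile.mkdtemp()
    for i in range(20):
        Path(demo, f"photo_{i}.png").write_bytes(b"")
    print(count_images(demo), describe(count_images(demo)))
```
The count says nothing about variety, which still has to be checked by eye.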

03:44.610 --> 03:48.010
The main thing to make sure of is that the images are not all

03:48.900 --> 03:51.420
very similar to each other,

03:51.420 --> 03:53.550
so you wanna make sure that the images

03:53.550 --> 03:55.830
have some diversity, you know, some full body,

03:55.830 --> 03:58.530
some close up, you know, some of you laughing,

03:58.530 --> 04:00.060
some of you wearing a hat,

04:00.060 --> 04:01.980
wearing different clothes, in different locations.

04:01.980 --> 04:04.860
I think that tends to improve the results

04:04.860 --> 04:06.810
because then it can kind of see the common threads

04:06.810 --> 04:08.880
of like what makes you, you

04:08.880 --> 04:12.120
when it kind of trains.

04:12.120 --> 04:14.370
This is the actual training code here.

04:14.370 --> 04:18.660
So nothing to change other than the sample_prompt.

04:18.660 --> 04:21.420
And so what I wanted to do is I wanted to make

04:21.420 --> 04:23.310
an illustration of myself in the style of

04:23.310 --> 04:25.800
Spider-Man: Into the Spider-Verse.

04:25.800 --> 04:27.390
I generated this prompt.

04:27.390 --> 04:29.430
I would say like this is really good

04:29.430 --> 04:31.620
to combine fine-tuning with prompting.

04:31.620 --> 04:34.140
So this is an actual prompt that I used to use

04:34.140 --> 04:37.560
in order to make Into the Spider-Verse type images.

04:37.560 --> 04:40.260
And then this is kind of like doubling up, right?

04:40.260 --> 04:41.700
So you know, we have the prompt

04:41.700 --> 04:43.650
and then we have the image of me

04:43.650 --> 04:45.630
and then, you know, that way

04:45.630 --> 04:48.630
I don't have to like kind of describe myself in this prompt.

04:48.630 --> 04:51.690
You can actually get me from the training data.

04:51.690 --> 04:53.550
Cool, so you can also save the weights

04:53.550 --> 04:54.870
and reuse them if you want.

04:54.870 --> 04:56.280
I haven't saved them in this case, since

04:56.280 --> 04:58.290
it takes up quite a lot of space

04:58.290 --> 05:01.200
in your Google Drive or locally.

05:01.200 --> 05:03.330
And then when you run this,

05:03.330 --> 05:06.930
then it's generating the example images after the training.

05:06.930 --> 05:10.770
So just to see here, this took about 10,

05:10.770 --> 05:13.140
I guess this is 10 minutes.

05:13.140 --> 05:15.325
It's not, oh no, sorry, it's 10 hours. Yeah.

05:15.325 --> 05:17.520
(chuckles) So it's quite a while I was running it.

05:17.520 --> 05:20.430
Yeah, I think it's 10 minutes,

05:20.430 --> 05:23.313
but sometimes it takes a long time to run.

05:24.870 --> 05:27.540
I've run it before and it took half a day.

05:27.540 --> 05:29.550
And so I think that's probably the longest.

05:29.550 --> 05:31.110
You just make sure you have the GPU

05:31.110 --> 05:32.880
and that's what makes it faster.

05:32.880 --> 05:34.590
Cool. So it did a pretty good job, right?

05:34.590 --> 05:36.570
Like this is a decent image of me,

05:36.570 --> 05:39.630
obviously this is kind of cheating, showing me in a mask,

05:39.630 --> 05:42.960
but you know, this is kind of taking my beard

05:42.960 --> 05:45.420
and my eyebrows and my hair and stuff

05:45.420 --> 05:48.960
and yeah, I think it's done a good job of matching the style

05:48.960 --> 05:51.990
and putting me into the Spider-Verse.

05:51.990 --> 05:53.640
You can also save the weights and kind of

05:53.640 --> 05:55.560
use that for...

05:55.560 --> 05:59.340
In some of the web UIs, like AUTOMATIC1111, if you want.

05:59.340 --> 06:03.270
But you can also just keep running it in this Google Colab.

06:03.270 --> 06:06.720
So this part here is to actually kind of run the pipeline

06:06.720 --> 06:09.390
and then you can set a seed from here.

06:09.390 --> 06:11.370
And that's just so you get the same image every time

06:11.370 --> 06:12.660
if you use the same prompt.
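
NOTE
Editor's sketch, an analogy rather than the notebook's code: why a fixed
seed gives the same image. Image generation starts from random noise, so
seeding the generator fixes that starting point. Python's random module
stands in here for the pipeline's real generator; initial_noise and the seed
values are hypothetical.
```python
import random
#
def initial_noise(seed, size=4):
    """Draw `size` Gaussian values from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]
#
if __name__ == "__main__":
    # Same seed: identical starting noise. Different seed: different noise.
    print(initial_noise(1234) == initial_noise(1234))
    print(initial_noise(1234) == initial_noise(4321))
```
Changing the seed while keeping the prompt is a simple way to get variations.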

06:12.660 --> 06:15.780
And then I've used this specific prompt here,

06:15.780 --> 06:19.290
illustration ukj person in the style of Spider-Man,

06:19.290 --> 06:20.640
blah, blah, blah, right?

06:20.640 --> 06:22.560
So that's what it's come back with.

06:22.560 --> 06:25.350
And you can edit this so you can say, you know,

06:25.350 --> 06:26.910
I want guidance scale.

06:26.910 --> 06:30.450
Like you know, the higher the guidance scale,

06:30.450 --> 06:31.800
the closer it is to the prompt,

06:31.800 --> 06:35.100
the lower, the more kind of creative it can be.
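
NOTE
Editor's sketch, not code from the video: how a guidance scale is commonly
applied, assuming the pipeline uses classifier-free guidance as Stable
Diffusion pipelines typically do. The model predicts noise once without the
prompt and once with it, and the scale controls how far the result is pushed
toward the prompted prediction. Plain lists stand in for tensors;
apply_guidance and the numbers are illustrative.
```python
def apply_guidance(uncond, cond, scale):
    """Move from the unconditional prediction toward the prompt-conditioned one."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]
#
if __name__ == "__main__":
    uncond = [0.0, 1.0]  # prediction without the prompt
    cond = [1.0, 3.0]    # prediction with the prompt
    for scale in (1.0, 7.5):
        print(scale, apply_guidance(uncond, cond, scale))
```
A scale of 1 returns the prompted prediction unchanged; larger values exaggerate the difference between the two predictions.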

06:35.100 --> 06:37.230
There's also number of inference steps.

06:37.230 --> 06:39.420
So, you know, if these steps are higher,

06:39.420 --> 06:43.140
it tends to be higher quality images, but not always.

06:43.140 --> 06:45.690
I think it's good to experiment with this.

06:45.690 --> 06:47.070
And then number of samples as well.

06:47.070 --> 06:49.560
So I've generated four images here

06:49.560 --> 06:52.413
and yeah, you can see it's done a pretty cool job here.

06:53.640 --> 06:56.163
Cool. So, that's how this works.