WEBVTT

00:00.900 --> 00:02.700
What is Stable Diffusion?

00:02.700 --> 00:05.370
This is one of the most interesting image generation models

00:05.370 --> 00:08.310
'cause it's the only one that's open source

00:08.310 --> 00:11.760
and still state-of-the-art in terms of generation.

00:11.760 --> 00:13.890
The diffusion models all kind

00:13.890 --> 00:16.740
of use the same underlying infrastructure.

00:16.740 --> 00:18.750
The way that they're trained is you take tons

00:18.750 --> 00:20.160
and tons of images from the internet

00:20.160 --> 00:21.690
like pictures of cats

00:21.690 --> 00:24.120
and you have the captions, the alt tags

00:24.120 --> 00:25.470
that people have added to them

00:25.470 --> 00:27.420
that say what's in the image.

00:27.420 --> 00:29.850
Then you kind of add lots of noise

00:29.850 --> 00:31.110
in the training process

00:31.110 --> 00:33.691
until it becomes just a random image of noise.

00:33.691 --> 00:36.570
And then if you train it on enough of those things

00:36.570 --> 00:38.280
you can basically reverse the process.

00:38.280 --> 00:40.821
So you could take some random noise

00:40.821 --> 00:42.600
and then give it a caption

00:42.600 --> 00:44.040
which is the prompt

00:44.040 --> 00:45.510
and then it will reverse.

00:45.510 --> 00:47.340
So it will kind of become

00:47.340 --> 00:49.789
slowly diffused into the image.
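
NOTE
The noising half of the training process described above can be sketched in a few
lines of Python. This is only a toy illustration, not Stable Diffusion's actual code:
the "image" is a flat list of numbers, and the linear beta schedule, the step count,
and every function name here are assumptions made for the example.
```python
import math
import random
# Forward (noising) process on a toy "image": a flat list of pixel values.
# The linear beta schedule and all names here are illustrative assumptions.
def make_alpha_bars(steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Cumulative product of (1 - beta_t): how much of the original image survives at step t."""
    alpha_bars, running = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / (steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars
def add_noise(pixels: list[float], alpha_bar: float, rng: random.Random) -> list[float]:
    """Blend the image with Gaussian noise: sqrt(a) * x + sqrt(1 - a) * noise."""
    keep, mix = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [keep * p + mix * rng.gauss(0.0, 1.0) for p in pixels]
rng = random.Random(0)
image = [1.0] * 8  # stand-in for a real picture
alpha_bars = make_alpha_bars(steps=1000)
slightly_noisy = add_noise(image, alpha_bars[0], rng)      # early step: still mostly image
almost_pure_noise = add_noise(image, alpha_bars[-1], rng)  # last step: almost all noise
```
During training, a network learns to predict the noise that was mixed in, given the
caption. Generation then runs the other way: start from pure noise and remove the
predicted noise step by step, guided by the prompt.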

00:49.789 --> 00:53.370
And the really cool thing about Stable Diffusion

00:53.370 --> 00:55.189
as I mentioned, is that it's open source.

00:55.189 --> 00:57.660
It was released by Stability AI

00:57.660 --> 01:00.660
and a bunch of other partners like Runway ML

01:00.660 --> 01:02.280
and it came out

01:02.280 --> 01:04.380
I think a university in Munich

01:04.380 --> 01:06.000
was one of the research partners.

01:06.000 --> 01:06.891
So

01:06.891 --> 01:10.230
it's pretty intense

01:10.230 --> 01:13.650
that a model that is this good is free

01:13.650 --> 01:15.780
and available to run on your home computer

01:15.780 --> 01:17.400
if you have a GPU

01:17.400 --> 01:18.977
or you have one of the new kind of

01:18.977 --> 01:20.460
M1 or M2 Macs.

01:20.460 --> 01:21.630
It's state of the art,

01:21.630 --> 01:23.580
like it's as good as Midjourney

01:23.580 --> 01:24.750
as good as DALL-E

01:24.750 --> 01:27.330
which was a real surprise to everyone.

01:27.330 --> 01:28.590
But in many ways it's better.

01:28.590 --> 01:30.060
It has more functionality

01:30.060 --> 01:32.190
because it's open source, so people have added to it.

01:32.190 --> 01:34.050
So DreamBooth, for example

01:34.050 --> 01:36.390
is only really available in Stable Diffusion

01:36.390 --> 01:39.090
where you can train yourself into the AI.

01:39.090 --> 01:41.280
A few of the other features, like I mentioned DreamBooth

01:41.280 --> 01:42.810
there is a picture of me

01:42.810 --> 01:44.700
except as like a dwarf

01:44.700 --> 01:47.370
a dwarf with armor on.

01:47.370 --> 01:48.203
You know

01:48.203 --> 01:49.650
pretty crazy you can do that.

01:49.650 --> 01:52.680
You can use this for your profile pictures, whatever.

01:52.680 --> 01:53.820
It has CFG scale.

01:53.820 --> 01:55.410
So it kind of gives you a few more

01:55.410 --> 01:57.270
parameters than you get with DALL-E.

01:57.270 --> 01:59.130
You can basically tell the model

01:59.130 --> 02:01.290
how close you want it to be to your prompt.

02:01.290 --> 02:03.000
And you can also add negative prompts.

02:03.000 --> 02:04.770
So you know, this is a photograph

02:04.770 --> 02:06.840
of an astronaut riding a horse

02:06.840 --> 02:08.730
except not in space, right

02:08.730 --> 02:10.683
we added the negative prompt of space.
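
NOTE
To make the CFG scale and negative prompt ideas above concrete, here is a minimal
sketch of classifier-free guidance in plain Python. The numbers are made up and the
function name is hypothetical; in a real pipeline the three predictions would come
from the denoising network at each step. The assumption illustrated is the common
approach where a negative prompt takes the place of the empty-prompt baseline.
```python
# Classifier-free guidance on toy noise predictions (plain lists of floats).
# In a real pipeline these would come from the denoising network; the numbers
# below are made up purely for illustration.
def guide(baseline: list[float], prompted: list[float], cfg_scale: float) -> list[float]:
    """Push the prediction away from the baseline and toward the prompt."""
    return [b + cfg_scale * (p - b) for b, p in zip(baseline, prompted)]
prompt_pred = [0.8, 0.2]    # prediction given "an astronaut riding a horse"
empty_pred = [0.5, 0.5]     # prediction given an empty prompt
negative_pred = [0.5, 0.9]  # prediction given the negative prompt "space"
# A scale of 1.0 just returns the prompted prediction unchanged.
same_as_prompt = guide(empty_pred, prompt_pred, cfg_scale=1.0)
# A higher scale exaggerates the difference, so the image sticks closer to the prompt.
stronger = guide(empty_pred, prompt_pred, cfg_scale=7.5)
# With a negative prompt as the baseline, the result is pushed away from "space".
away_from_space = guide(negative_pred, prompt_pred, cfg_scale=7.5)
```
A low scale leaves the model more freedom, while a high scale follows the prompt more
literally, usually at some cost in variety.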

02:11.640 --> 02:14.490
Technical ability is the roadblock here.

02:14.490 --> 02:16.830
You do probably need to know how to code

02:16.830 --> 02:18.240
or at least, you know, not be afraid

02:18.240 --> 02:20.280
of modifying a few lines here and there.

02:20.280 --> 02:22.290
And, you know,

02:22.290 --> 02:24.090
obviously if you want to get it running

02:24.090 --> 02:25.170
to power your website

02:25.170 --> 02:26.003
or something like that, you can.

02:26.003 --> 02:28.590
But if you want your own version running,

02:28.590 --> 02:31.020
you either need a very good computer

02:31.020 --> 02:31.980
and, you know,

02:31.980 --> 02:34.440
or you need to understand

02:34.440 --> 02:37.749
how cloud hosting like Google Cloud Console

02:37.749 --> 02:39.660
or AWS works.

02:39.660 --> 02:40.793
Because

02:40.793 --> 02:42.750
you can run into a lot of kind of

02:42.750 --> 02:44.493
setup issues, shall we say.

02:45.360 --> 02:48.030
But that said, once you've got it,

02:48.030 --> 02:49.020
you can use it for pretty much

02:49.020 --> 02:50.400
anything like profile pictures.

02:50.400 --> 02:52.830
A lot of the profile picture websites that

02:52.830 --> 02:54.480
are out there, or, you know,

02:54.480 --> 02:56.760
automatic stock photo websites

02:56.760 --> 02:58.050
they use Stable Diffusion.

02:58.050 --> 02:59.758
You can actually build a business around this

02:59.758 --> 03:01.410
because it's open source.

03:01.410 --> 03:03.810
You're not in danger of anyone turning it off.

03:03.810 --> 03:05.520
You could also use it for product placement.

03:05.520 --> 03:07.038
So I haven't seen anyone

03:07.038 --> 03:07.950
you know

03:07.950 --> 03:09.510
at least kind of publicly

03:09.510 --> 03:11.280
talking about this

03:11.280 --> 03:13.800
but this is something I tried early on.

03:13.800 --> 03:15.137
So this,

03:15.137 --> 03:17.070
this picture on the right

03:17.070 --> 03:20.010
is actually my daughter's purse

03:20.010 --> 03:22.170
but being held by Elsa from Frozen.

03:22.170 --> 03:23.160
And this isn't a real picture

03:23.160 --> 03:25.710
this is purely generated by AI except

03:25.710 --> 03:28.590
I trained Stable Diffusion on lots of pictures

03:28.590 --> 03:30.450
of my daughter's purse.

03:30.450 --> 03:33.090
So I can put that purse in any situation now

03:33.090 --> 03:34.980
and use it in my prompts.

03:34.980 --> 03:36.603
So really, really cool stuff.

03:38.070 --> 03:40.260
The thing with Stable Diffusion,

03:40.260 --> 03:41.093
I think

03:41.093 --> 03:42.360
is that

03:42.360 --> 03:45.000
the learning curve is very steep.

03:45.000 --> 03:46.290
You know, if you don't know how to code

03:46.290 --> 03:47.790
or even if you do know how to code,

03:47.790 --> 03:49.230
you can run into a lot of issues.

03:49.230 --> 03:51.150
It takes a lot of processing power to run.

03:51.150 --> 03:53.310
You could very quickly rack up a lot

03:53.310 --> 03:55.290
of cost if you're hosting this yourself,

03:55.290 --> 03:58.080
so those are the things to really watch out for.

03:58.080 --> 03:59.100
But otherwise

03:59.100 --> 04:00.810
it's honestly magic.

04:00.810 --> 04:01.890
You know, try it,

04:01.890 --> 04:03.330
use it, play around with it.

04:03.330 --> 04:04.530
Experiment.

04:04.530 --> 04:06.963
This is a real gift that we have this.
