WEBVTT

00:00.510 --> 00:02.130
-: Hey, let's talk about DALL-E 3's

00:02.130 --> 00:04.200
capabilities and limitations.

00:04.200 --> 00:05.940
So first, the capabilities.

00:09.810 --> 00:11.160
One thing it's really good at is

00:11.160 --> 00:13.320
that you can talk to the model directly

00:13.320 --> 00:15.600
or you can ask it to change certain things

00:15.600 --> 00:17.490
about the image, and it's gonna rewrite the prompt

00:17.490 --> 00:19.140
and then regenerate the image.

00:19.140 --> 00:22.620
So quite useful for iterating in natural language.

00:22.620 --> 00:24.330
Makes it much more accessible.

00:24.330 --> 00:26.550
It also has inpainting, so you can select certain

00:26.550 --> 00:28.620
parts of the image and just prompt for those,

00:28.620 --> 00:31.170
and that makes it much easier. If you already

00:31.170 --> 00:33.210
have an image you like, you can improve it,

00:33.210 --> 00:34.890
which is really helpful.

00:34.890 --> 00:37.470
Finally, you have access via an API.

00:37.470 --> 00:40.230
You can call DALL-E programmatically, and that

00:40.230 --> 00:43.440
helps you scale some of the use cases you need.

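NOTE
A minimal sketch of the programmatic access mentioned above, assuming the official `openai` Python package and an OPENAI_API_KEY environment variable. The helper names `build_image_request` and `generate_image` are hypothetical, and the size list reflects what the DALL-E 3 endpoint accepted at the time of writing, so check the current documentation before relying on it.
```python
from typing import Any
# Sizes accepted by the DALL-E 3 endpoint at the time of writing (assumption).
ALLOWED_SIZES = {"1024x1024", "1792x1024", "1024x1792"}
def build_image_request(prompt: str, size: str = "1024x1024") -> dict[str, Any]:
    """Assemble keyword arguments for a single DALL-E 3 generation call."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    # DALL-E 3 generates one image per request, so n stays at 1.
    return {"model": "dall-e-3", "prompt": prompt, "size": size, "n": 1}
def generate_image(prompt: str, size: str = "1024x1024") -> tuple[str, str]:
    """Call the API and return (image_url, revised_prompt)."""
    from openai import OpenAI  # imported lazily; needs `pip install openai`
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.images.generate(**build_image_request(prompt, size))
    image = response.data[0]
    return image.url, image.revised_prompt
```
Calling `generate_image("a watercolor fox reading a newspaper")` should return the image URL together with the prompt as the model rewrote it, which is a handy way to see the prompt rewriting described earlier. Wrapping that call in a loop over a list of prompts is one way to scale a use case.
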
00:43.440 --> 00:46.620
The limitations are that it has limited functionality,

00:46.620 --> 00:48.300
like none of these things are possible.

00:48.300 --> 00:49.590
You don't have weighted terms,

00:49.590 --> 00:52.620
you don't have any control over the parameters.

00:52.620 --> 00:54.630
You don't have a way to fine-tune it.

00:54.630 --> 00:56.460
Like for example, this is an image from

00:56.460 --> 00:58.860
Stable Diffusion where I fine-tuned it on my face.

00:58.860 --> 01:01.140
It could make a picture of me.

01:01.140 --> 01:03.120
You can't do any of that in DALL-E.

01:03.120 --> 01:06.000
The other thing that I don't really like about it

01:06.000 --> 01:09.420
is it tends to make the images more cartoony.

01:09.420 --> 01:11.490
It can try and do realism, but it doesn't do

01:11.490 --> 01:13.200
a very good job of it compared

01:13.200 --> 01:16.080
to Stable Diffusion or Midjourney.

01:16.080 --> 01:18.450
The other big problem, this is what a lot of people

01:18.450 --> 01:21.240
see, is that the content moderation is excessive.

01:21.240 --> 01:24.030
You can't create images of public figures.

01:24.030 --> 01:26.640
You can't use the styles of artists who worked

01:26.640 --> 01:28.200
in the past 100 years.

01:28.200 --> 01:31.230
So it really limits what you can do with the model.

01:31.230 --> 01:32.850
But overall, it's a very good model.

01:32.850 --> 01:36.300
It's very good at specifically placing

01:36.300 --> 01:38.340
different objects in different parts of the image.

01:38.340 --> 01:41.340
It seems to understand a little bit more how

01:41.340 --> 01:43.140
to place text in an image as well.

01:43.140 --> 01:45.390
But because of the limitations,

01:45.390 --> 01:47.550
I don't know anyone who's seriously using

01:47.550 --> 01:49.863
it in production for anything important.