WEBVTT

00:00.600 --> 00:02.883
-: Hey, let's talk about DALL-E 3.

00:03.750 --> 00:06.510
So DALL-E 3 is a diffusion model at its core

00:06.510 --> 00:08.970
and that means it can generate images

00:08.970 --> 00:12.480
from random noise given a prompt

00:12.480 --> 00:15.660
and the diffusion model works by first taking lots

00:15.660 --> 00:17.640
of images and adding random noise to them

00:17.640 --> 00:19.830
and then seeing if the model can work backwards.

00:19.830 --> 00:21.060
That's how it's trained.

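NOTE
A minimal sketch of the noising idea described above, assuming the common DDPM-style formulation. DALL-E 3 is closed source, so this is not its actual training code, and the names add_noise, remove_noise and alpha_bar are illustrative only. A toy image is mixed with Gaussian noise; a model would be trained to predict that noise so the mix can be undone.
```python
import math
import random
def add_noise(pixels, alpha_bar, rng):
    """Forward step: blend the clean pixels with Gaussian noise.
    alpha_bar near 1 keeps most of the image; near 0 leaves mostly noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    keep, mix = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    noisy = [keep * p + mix * n for p, n in zip(pixels, noise)]
    return noisy, noise
def remove_noise(noisy, predicted_noise, alpha_bar):
    """Working backwards: recover the clean pixels from a noise estimate.
    In real training a neural network would supply predicted_noise."""
    keep, mix = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - mix * n) / keep for x, n in zip(noisy, predicted_noise)]
rng = random.Random(0)
image = [0.1, 0.5, 0.9]  # a toy three-pixel "image" (assumed values)
noisy, noise = add_noise(image, alpha_bar=0.5, rng=rng)
# With a perfect noise prediction, the original pixels come back.
restored = remove_noise(noisy, noise, alpha_bar=0.5)
```
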
00:21.060 --> 00:23.640
The DALL-E model is closed source,

00:23.640 --> 00:25.410
so we don't know exactly how it works,

00:25.410 --> 00:26.520
but some people conjecture

00:26.520 --> 00:29.580
there's also a transformer model in there

00:29.580 --> 00:31.080
as well as the diffusion model.

00:31.080 --> 00:33.630
So that's why it's so good at being able

00:33.630 --> 00:36.390
to place certain things in different parts of the image,

00:36.390 --> 00:39.180
unlike a lot of image generation models

00:39.180 --> 00:42.363
because it actually has some reasoning to it.

00:43.740 --> 00:45.630
DALL-E 3 is created by OpenAI.

00:45.630 --> 00:47.790
They also make ChatGPT.

00:47.790 --> 00:49.590
The first version of DALL-E was released before Midjourney

00:49.590 --> 00:51.870
and Stable Diffusion,

00:51.870 --> 00:53.130
and DALL-E 2 was the one

00:53.130 --> 00:56.430
that was first really good at image generation.

00:56.430 --> 00:58.950
Kinda kicked off that whole craze in 2022,

00:58.950 --> 01:02.760
but version three only came out in 2023.

01:02.760 --> 01:06.450
And then it's also available through ChatGPT

01:06.450 --> 01:08.760
and via the API.

01:08.760 --> 01:11.700
What are some key features of this compared

01:11.700 --> 01:13.740
to other image generation tools?

01:13.740 --> 01:15.480
One is that it can do inpainting

01:15.480 --> 01:18.660
and it can do that directly inline in the ChatGPT interface,

01:18.660 --> 01:19.650
which is really cool.

01:19.650 --> 01:21.540
So you can scrub out part of the image

01:21.540 --> 01:24.000
and then replace it by prompting.

01:24.000 --> 01:25.290
It's really good at that actually,

01:25.290 --> 01:27.180
probably the best at that.

01:27.180 --> 01:30.150
It can also create prompts.

01:30.150 --> 01:31.800
This isn't specifically DALL-E 3,

01:31.800 --> 01:33.210
this is ChatGPT doing this,

01:33.210 --> 01:34.740
but ChatGPT can write a prompt

01:34.740 --> 01:38.160
for DALL-E 3 in the ChatGPT interface

01:38.160 --> 01:39.660
and you can see that prompt.

01:39.660 --> 01:43.020
So, you can click the "i" icon when you click into an image

01:43.020 --> 01:45.150
and it'll tell you what the prompt was

01:45.150 --> 01:46.530
and that's really helpful.

01:46.530 --> 01:49.290
The other thing is that you can just edit by talking.

01:49.290 --> 01:50.123
So you can say,

01:50.123 --> 01:51.990
"Oh no, I wanted this dog to look realistic."

01:51.990 --> 01:55.710
And it can change the way the images look. Cool.

01:55.710 --> 01:58.140
One of the problems is that there's no negative prompting.

01:58.140 --> 02:00.750
You can't really get rid of things in the image like you can

02:00.750 --> 02:02.820
with Stable Diffusion or Midjourney.

02:02.820 --> 02:05.460
There's no classifier-free guidance setting.

02:05.460 --> 02:07.830
There's no real control at all on the parameters

02:07.830 --> 02:10.800
that you're using as you would get from some

02:10.800 --> 02:12.570
of the other platforms.

02:12.570 --> 02:13.770
There's no way to fine-tune it

02:13.770 --> 02:15.720
like you can with Stable Diffusion.

02:15.720 --> 02:18.270
There's also a limited use of artist names.

02:18.270 --> 02:20.910
You can't use any artist in living memory

02:20.910 --> 02:22.560
like in the last 100 years.

02:22.560 --> 02:25.620
There's also a lot of content moderation, so you can't,

02:25.620 --> 02:28.020
for example, generate a picture of Donald Trump.

02:29.340 --> 02:31.230
Some of the use cases, given that some

02:31.230 --> 02:33.180
of the weaknesses turn into strengths here.

02:33.180 --> 02:34.770
It's really great for children's stories

02:34.770 --> 02:38.880
'cause it never generates anything really bad or sensitive.

02:38.880 --> 02:39.990
So, that's helpful.

02:39.990 --> 02:41.310
It's good for blog images

02:41.310 --> 02:45.060
'cause it has like a cartoony vibe or style.

02:45.060 --> 02:48.030
And this is an example of a blog image that I generated.

02:48.030 --> 02:49.380
It's also quite fun for memes

02:49.380 --> 02:52.410
'cause it's quite good at text on images

02:52.410 --> 02:55.860
and it's quite good at understanding concepts.

02:55.860 --> 02:57.750
I've been using it mostly for that.

02:57.750 --> 03:01.950
I don't use DALL-E 3 really that much in a serious capacity

03:01.950 --> 03:04.680
just because Stable Diffusion is much better

03:04.680 --> 03:06.330
to build your business on.

03:06.330 --> 03:09.000
It works quite well and it's open source.

03:09.000 --> 03:09.833
You don't have to worry

03:09.833 --> 03:11.520
about content moderation (indistinct).

03:11.520 --> 03:13.770
I would say DALL-E 3 is very good though

03:13.770 --> 03:16.653
and worth using for some use cases.
