WEBVTT

00:00.930 --> 00:02.820
All right, this is a really fun one,

00:02.820 --> 00:05.760
which is Gemini 2.0 image generation.

00:05.760 --> 00:07.350
And what makes this fun

00:07.350 --> 00:09.450
is not that it's an image generation model,

00:09.450 --> 00:11.340
which is neither here nor there.

00:11.340 --> 00:13.410
There are a lot of image generation models.

00:13.410 --> 00:17.700
But what's cool about this is that it has native editing.

00:17.700 --> 00:22.560
When you say, make me an image with the words hello Mike,

00:22.560 --> 00:25.050
it actually really understands the prompt intent

00:25.050 --> 00:26.400
and goes straight into an image.

00:26.400 --> 00:31.140
It is not actually calling an image generation model,

00:31.140 --> 00:32.610
like ChatGPT is.

00:32.610 --> 00:36.120
It's much better, I found, at prompt adherence.

00:36.120 --> 00:40.290
But a really cool thing that I think is the killer feature

00:40.290 --> 00:42.870
is that it can take an existing image,

00:42.870 --> 00:46.920
and because it understands a lot about what's in that image

00:46.920 --> 00:49.380
and it's not converting it into a prompt,

00:49.380 --> 00:52.140
you can get some pretty cool results.

00:52.140 --> 00:53.850
I'm just gonna upload an image here.

00:53.850 --> 00:56.100
This is from a marketing website

00:56.100 --> 01:01.100
and it's in a specific '90s nostalgic, pixelated gaming style.

01:01.110 --> 01:06.110
So I'm just gonna say, create another image of this style,

01:07.320 --> 01:10.126
but it's a major city.

01:10.126 --> 01:13.680
(keyboard clacking)

01:13.680 --> 01:15.810
And if I do this in ChatGPT,

01:15.810 --> 01:17.430
it tends to give pretty bad results

01:17.430 --> 01:20.340
because it misunderstands the image style.

01:20.340 --> 01:24.000
But because we're not converting into a prompt

01:24.000 --> 01:26.490
and then passing that to an image generation model,

01:26.490 --> 01:29.430
which might lose some of the information along the way,

01:29.430 --> 01:33.180
it's really bringing that same style across,

01:33.180 --> 01:34.170
which is very cool.

01:34.170 --> 01:35.490
And then, I'm just gonna say,

01:35.490 --> 01:40.490
make it of Times Square but in the same style.

01:40.532 --> 01:41.939
(keyboard clacking)

01:41.939 --> 01:42.772
Okay.

01:51.510 --> 01:53.430
There you go. Very cool.

01:53.430 --> 01:54.810
All right, I'm gonna try one more.

01:54.810 --> 01:57.773
And this is another really interesting way to use it.

01:57.773 --> 02:01.350
Because it understands what's in the image,

02:01.350 --> 02:03.990
it can also do character consistency.

02:03.990 --> 02:07.020
So I'm just gonna find a headshot.

02:07.020 --> 02:08.043
Let me find one.

02:11.040 --> 02:12.340
Yeah, there's my headshot.

02:14.580 --> 02:18.794
I'm gonna say, make a passport photo of this guy.

02:18.794 --> 02:21.794
(keyboard clacking)

02:26.010 --> 02:28.020
And there you go. (laughs)

02:28.020 --> 02:28.950
It's pretty good,

02:28.950 --> 02:31.320
considering that we didn't fine-tune a model here.

02:31.320 --> 02:33.720
This has been possible with fine-tuning for a while,

02:33.720 --> 02:36.720
but this is one-shot character consistency,

02:36.720 --> 02:38.580
which is pretty powerful.

02:38.580 --> 02:41.070
So yeah, I encourage you to mess around with it

02:41.070 --> 02:43.740
and think about how you wanna use it

02:43.740 --> 02:47.640
because there are many new applications you can create

02:47.640 --> 02:54.930
with this type of one-shot image-to-image

02:54.930 --> 02:56.940
or text-to-image model.

02:56.940 --> 02:58.860
Because it's working natively,

02:58.860 --> 03:01.290
it can do so much more and it's really surprising

03:01.290 --> 03:04.890
how much more it can do without losing information

03:04.890 --> 03:07.080
by first translating into a prompt

03:07.080 --> 03:08.640
for an image generation model.

03:08.640 --> 03:10.050
It's all the same model now

03:10.050 --> 03:12.153
and it feels completely different.
