WEBVTT

00:00.720 --> 00:04.350
The second principle to try is to specify the format.

00:04.350 --> 00:06.930
Define what rules you want the AI to follow,

00:06.930 --> 00:10.080
and set a required structure for the response.

00:10.080 --> 00:12.120
All right, so we're gonna look at this

00:12.120 --> 00:13.560
through text models first,

00:13.560 --> 00:15.780
and then we're gonna go onto images.

00:15.780 --> 00:17.850
Looking for a list of 10 product names

00:17.850 --> 00:20.670
for a pair of shoes that can fit any foot size.

00:20.670 --> 00:22.500
And this works fine.

00:22.500 --> 00:24.420
But if you wanted to engineer the response

00:24.420 --> 00:26.370
to make it more reliable at scale,

00:26.370 --> 00:28.770
say if you're running this hundreds of thousands of times,

00:28.770 --> 00:30.720
like if you're making a product name generator

00:30.720 --> 00:31.920
that people can use,

00:31.920 --> 00:35.100
then you want to put a little bit more work into it.

00:35.100 --> 00:38.220
We've applied all the five principles to this prompt here.

00:38.220 --> 00:39.510
And specifically we're gonna look at

00:39.510 --> 00:41.580
how we specified format.

00:41.580 --> 00:44.820
And so this is a prompt template if you wanna grab it.

00:44.820 --> 00:47.880
But specifically, the place where we've said

00:47.880 --> 00:50.850
what format we want is this section here,

00:50.850 --> 00:53.130
where we said comma-separated list.

00:53.130 --> 00:57.030
And all of the examples also provide a comma-separated list.

00:57.030 --> 01:00.390
And we've also then told it we need three product names.

01:00.390 --> 01:02.640
We don't want a big list of 10 things.

01:02.640 --> 01:05.970
Now the reason why this is important is that the response

01:05.970 --> 01:08.880
to a prompt isn't the end of the task typically.

01:08.880 --> 01:11.550
It's usually just the input of the next step in the chain,

01:11.550 --> 01:12.930
which could also be another prompt,

01:12.930 --> 01:15.843
or it could be a human created task as well.

01:17.190 --> 01:18.960
And what impact on the results does it have?

01:18.960 --> 01:20.910
Depending on how you want the format,

01:20.910 --> 01:22.890
you could have it as a numbered list.

01:22.890 --> 01:25.770
If we change the prompt to ask for a numbered list,

01:25.770 --> 01:27.330
it comes out like this.

01:27.330 --> 01:28.770
But then if you're a developer

01:28.770 --> 01:31.530
and you're working on a product, you typically want

01:31.530 --> 01:34.170
the data to be structured in a way that you can parse it

01:34.170 --> 01:37.380
and use it in the next step, like displaying it in the UI.

01:37.380 --> 01:40.590
So JSON is a very common format,

01:40.590 --> 01:44.520
and in fact there's actually a mode you can use in the OpenAI API

01:44.520 --> 01:47.100
to force it to return JSON.

01:47.100 --> 01:48.420
And then there's also YAML,

01:48.420 --> 01:50.280
which is more human readable,

01:50.280 --> 01:53.520
a little bit easier to parse in some cases.

01:53.520 --> 01:56.940
Whatever the data format you need, you can specify that.

01:56.940 --> 01:58.620
And one of the main things you want to do

01:58.620 --> 02:00.900
with prompt engineering is test different ways

02:00.900 --> 02:02.370
to make sure it always comes back

02:02.370 --> 02:05.310
with the required response format.
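
NOTE
A minimal sketch of checking the response format before the next step in the chain,
assuming the model was asked for a JSON array of exactly three product names.
The function name parse_product_names and the sample string are hypothetical, not from the video.
```python
import json
def parse_product_names(raw: str, expected: int = 3) -> list[str]:
    """Parse a model response that should be a JSON array of product names."""
    names = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError("expected a JSON array of strings")
    if len(names) != expected:
        raise ValueError(f"expected {expected} names, got {len(names)}")
    return [n.strip() for n in names]
# Hypothetical response text, standing in for real model output.
sample = '["OmniFit", "FlexiStep", "AnySole"]'
print(parse_product_names(sample))
```
Running a check like this over many test responses is one way to see how often a prompt comes back in the required format.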

02:05.310 --> 02:06.930
So let's look at this in image models,

02:06.930 --> 02:08.370
'cause it's a little bit different.

02:08.370 --> 02:09.630
We're gonna look at Stable Diffusion.

02:09.630 --> 02:12.540
So this is prompting Stable Diffusion XL.

02:12.540 --> 02:15.120
And we can see, you know, we have a shoe

02:15.120 --> 02:17.670
that fits any foot size and that's fine.

02:17.670 --> 02:20.670
But you can see one of the responses it came back

02:20.670 --> 02:22.350
with was like a diagram.

02:22.350 --> 02:23.760
We actually want product photography.

02:23.760 --> 02:25.530
So how do we specify that?

02:25.530 --> 02:27.270
With a bit of prompt engineering,

02:27.270 --> 02:30.000
we have this prompt that is much more reliable

02:30.000 --> 02:33.060
at giving the same format back every time.

02:33.060 --> 02:36.870
And the part where we specified format specifically,

02:36.870 --> 02:39.330
if we go past the template here,

02:39.330 --> 02:41.550
is this section here of the prompt:

02:41.550 --> 02:43.710
we have photography, extremely detailed,

02:43.710 --> 02:46.740
studio lighting, 35 millimeter DSLR.

02:46.740 --> 02:49.890
So this is a way for us to keep it very consistent.

02:49.890 --> 02:52.470
If we append this on the end of every prompt,

02:52.470 --> 02:54.090
then we're gonna get a pretty consistent

02:54.090 --> 02:55.773
visual style every time.
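
NOTE
A minimal sketch of appending one format suffix to every image prompt, assuming prompts are plain strings.
FORMAT_SUFFIX and with_format are hypothetical names, and the suffix reuses only some of the modifiers read out above.
```python
FORMAT_SUFFIX = "photography, extremely detailed, studio lighting"
def with_format(subject: str, suffix: str = FORMAT_SUFFIX) -> str:
    """Return the subject with the shared format modifiers appended."""
    # Trim whitespace and any trailing comma so the join stays clean.
    return f"{subject.strip().rstrip(',')}, {suffix}"
subjects = ("a pair of shoes that can fit any foot size", "a running shoe,")
for prompt in (with_format(s) for s in subjects):
    print(prompt)
```
Keeping the suffix in one place means a change of format, say to a model shoot, only has to be made once.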

02:56.820 --> 02:58.740
We've defined what rules we would like it to follow,

02:58.740 --> 03:00.710
we've, you know, told it what sort of structure

03:00.710 --> 03:02.460
the response should come back in.

03:02.460 --> 03:05.640
And in image prompting it can get a little bit complicated

03:05.640 --> 03:09.510
because sometimes the format also blurs into the style.

03:09.510 --> 03:11.370
If you use product photography,

03:11.370 --> 03:14.790
then that's gonna necessitate certain types of style.

03:14.790 --> 03:18.600
And similarly, if you tell it you want the style of a painting,

03:18.600 --> 03:20.640
then it's gonna clash with product photography

03:20.640 --> 03:22.380
and sometimes you get unreliable results.

03:22.380 --> 03:25.320
So it's really a case of testing both the first principle

03:25.320 --> 03:26.940
and the second principle together,

03:26.940 --> 03:29.133
and seeing which combination works well.

03:30.750 --> 03:32.880
Okay, let's look at the impact on results.

03:32.880 --> 03:35.310
Product photography is not the only thing we could do here.

03:35.310 --> 03:37.590
There's different types of product photography.

03:37.590 --> 03:39.870
So in this case, if we wanted more of a model shoot,

03:39.870 --> 03:41.337
with someone actually wearing it,

03:41.337 --> 03:43.320
we could achieve that.

03:43.320 --> 03:45.000
If we wanted street wear,

03:45.000 --> 03:47.130
now that's a different vibe entirely,

03:47.130 --> 03:50.160
and you can see how it changes the style of the shoe

03:50.160 --> 03:52.230
as well alongside the format.

03:52.230 --> 03:54.480
So it takes some testing to get what

03:54.480 --> 03:56.370
you want in terms of your vision.

03:56.370 --> 03:58.830
And then the final one here is like Vogue Magazine.

03:58.830 --> 04:02.070
You can see that we have the classic problem with AI

04:02.070 --> 04:04.710
where she's got three legs. (laughs)

04:04.710 --> 04:06.750
Again, with AI, you need to test,

04:06.750 --> 04:10.170
you need to see how different changes to the prompt

04:10.170 --> 04:13.170
come back in terms of reliability of the results.

04:13.170 --> 04:15.630
And then also the quality of the image.

04:15.630 --> 04:17.220
I think it's a very high quality image,

04:17.220 --> 04:20.130
but it's obviously AI given that she has three legs.

04:20.130 --> 04:23.100
Maybe that's part of the alien vibe, I'm not sure.

04:23.100 --> 04:25.890
But whatever it is, getting the right format back

04:25.890 --> 04:28.110
will help you achieve the final task.

04:28.110 --> 04:32.760
Because if you're using these images for a magazine shoot,

04:32.760 --> 04:35.160
or if you're using them on your website,

04:35.160 --> 04:37.410
or if you're using them on social media, you need to be able

04:37.410 --> 04:40.590
to specify what type of format you're expecting.

04:40.590 --> 04:41.730
Is it a model shoot?

04:41.730 --> 04:44.070
Do you actually want someone wearing these shoes?

04:44.070 --> 04:46.680
Or do you just want some product photography

04:46.680 --> 04:48.063
of the shoes themselves?