WEBVTT

00:00.200 --> 00:00.560
All right.

00:00.560 --> 00:05.000
We're going to create XYZ prompt grids with Flux and fal.ai.

00:05.040 --> 00:07.120
So this is what a prompt grid looks like.

00:07.440 --> 00:15.400
Typically what you're going to do is try a few different variables or parameters on one side, and on

00:15.400 --> 00:20.880
the other side, you combine them together to see how that combination affects the final image output.

00:20.920 --> 00:25.120
So in this case on the left hand side here we have inference steps.

00:25.240 --> 00:27.840
So that's how much compute we put towards the image.

00:27.840 --> 00:32.600
And you can see at the top here with only six steps it doesn't make a very good image.

00:32.640 --> 00:34.080
It's fuzzy and messed up.

00:34.080 --> 00:39.320
But then down at 50 steps here it's very crisp and clean but doesn't actually look that good in my opinion.

00:39.360 --> 00:43.000
Like I prefer the softer edge here, like around 28 steps.

00:43.640 --> 00:45.200
And then this is guidance scale.

00:45.200 --> 00:51.160
So this is how closely the image adheres to the prompt.

00:51.160 --> 00:54.120
And you can see that there are some kind of sweet spots here.

00:54.120 --> 00:58.840
I really like 28 steps and 3.5 guidance scale.

00:58.880 --> 01:05.240
And it really gives you a good feel for what you want out of an image and if you can make all those

01:05.240 --> 01:08.720
combinations together, then you can pick where on the map you want to be.

01:09.200 --> 01:11.320
That's what we're trying to do today.

01:11.640 --> 01:18.160
Let's go through and run it from the beginning so you can understand how this works.

01:18.360 --> 01:21.680
So I'm just going to run this to install the fal client.

01:22.080 --> 01:24.440
And that's what we're going to use to run this.

01:24.520 --> 01:30.560
We also want to get NumPy and Matplotlib.

01:30.600 --> 01:33.680
This is going to help us do some of the plotting of the images.

01:34.520 --> 01:37.960
And then we also need here the environment.

01:37.960 --> 01:42.760
So if you don't have a fal key in your .env file locally, you just go to fal.ai.

01:42.760 --> 01:48.760
Get the keys, create an account, make sure that you've got money in there, and then load the API

01:48.760 --> 01:49.520
key here.

01:50.000 --> 01:54.600
One other thing we need in terms of setup is nest_asyncio.

01:54.640 --> 01:57.080
That's because we're going to run this asynchronously.

01:57.280 --> 02:02.920
And that's just needed for Jupyter Notebook specifically so that it doesn't get confused and stuck in

02:02.920 --> 02:03.400
the loop.

02:04.000 --> 02:04.240
All right.

02:04.240 --> 02:06.030
So that's all the setup we need.

02:06.430 --> 02:09.190
Now let's actually start creating it.

02:09.230 --> 02:14.270
We need to import asyncio and fal_client.

02:14.510 --> 02:16.590
Those are the two things we need right now.

02:17.150 --> 02:18.790
And then we're going to create a function.

02:18.790 --> 02:23.150
So we're going to say async def generate_images.

02:24.270 --> 02:29.110
And it's going to take the model and then the prompt.

02:33.790 --> 02:36.790
And then the model here is just a string.

02:37.550 --> 02:38.950
The prompt is a string as well.

02:39.750 --> 02:42.550
All right then we want to set some guidance scales.

02:42.550 --> 02:45.430
You could actually pass these through if you want.

02:45.590 --> 02:48.390
Here I'm just going to copy the values that I used.

02:51.550 --> 02:53.150
And you can set whatever you want in here.

02:53.150 --> 02:54.310
You can set more of them.

02:54.310 --> 02:56.230
But obviously that's just going to run for longer.

02:56.830 --> 02:59.950
And then we want to create the handlers.

02:59.950 --> 03:01.510
So let me just get rid of this.

03:01.950 --> 03:05.630
So I'm going to say handlers equals await.

03:08.150 --> 03:08.830
asyncio dot

03:08.870 --> 03:09.470
gather.

03:15.750 --> 03:19.550
So this is going to create all of the different API requests.

03:21.550 --> 03:27.190
And then I'm going to say fal_client.submit_async and then submit the model.

03:27.790 --> 03:28.910
Submit the prompt.

03:28.910 --> 03:32.870
And then we've got the guidance scale and the number of inference steps.

03:32.870 --> 03:35.630
And then we want to set a seed just so it's reproducible.

03:35.870 --> 03:40.070
So when we run this again, we're going to get the same images.

03:40.070 --> 03:40.230
Yeah.

03:40.230 --> 03:44.390
So it's going to go through and loop through every guidance scale.

03:44.750 --> 03:49.790
And then it's going to also loop through every inference step and kind of multiply them together this

03:49.790 --> 03:50.110
way.

03:50.270 --> 03:52.190
So hopefully that makes sense to you.

03:52.350 --> 03:55.310
We're basically just creating all of the different requests.
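
NOTE
Editor's sketch, not the code typed in the video: one way the "multiply them together" step could be written.
The value lists, the seed of 42, and the helper name build_request_arguments are hypothetical; the argument
keys (prompt, guidance_scale, num_inference_steps, seed) are assumed to match what the Flux endpoint accepts.
```python
from itertools import product
# Hypothetical values: the video only mentions 6, 28 and 50 steps and a 3.5 guidance scale.
guidance_scales = [1.5, 3.5, 5.5, 7.5]
inference_steps = [6, 12, 28, 40, 50]
def build_request_arguments(prompt, guidance_scales, inference_steps, seed=42):
    """Return one argument dict per (steps, guidance scale) pair, steps varying slowest."""
    return [
        {"prompt": prompt, "guidance_scale": scale, "num_inference_steps": steps, "seed": seed}
        for steps, scale in product(inference_steps, guidance_scales)
    ]
requests_to_send = build_request_arguments("a rhino and a bar", guidance_scales, inference_steps)
# 5 step values times 4 guidance scales gives 20 requests.
print(len(requests_to_send))
```
Each dict would then be handed to the submit call, so the fixed seed is the only thing shared across all cells.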

03:55.430 --> 04:02.230
And then with the async API we just want to gather the actual results.

04:02.270 --> 04:04.190
So that's result_async here.

04:04.350 --> 04:09.540
And we're just passing in handler.request_id, and that's for handler in handlers.

04:09.580 --> 04:13.180
So I got the handlers here that's creating all the requests.

04:13.180 --> 04:17.100
And then we want to wait for the results to come back.

04:17.380 --> 04:21.380
And then we're going to pass back the results, guidance scales, and inference steps.
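
NOTE
Editor's sketch of the two-stage gather pattern described here, using stand-in coroutines so it runs without
an API key. fake_submit, fake_result, generate_all, the example.invalid URLs, and the shape of the result dict
are all hypothetical; in the video the real calls go through the fal client instead.
```python
import asyncio
async def fake_submit(arguments):
    """Stand-in for submitting a request; returns a made-up request ID."""
    await asyncio.sleep(0.01)
    return f"request-{arguments['num_inference_steps']}"
async def fake_result(request_id):
    """Stand-in for fetching a finished result by its request ID."""
    await asyncio.sleep(0.01)
    return {"request_id": request_id, "images": [{"url": f"https://example.invalid/{request_id}.png"}]}
async def generate_all(argument_list):
    # First gather: send every request concurrently and collect the IDs.
    request_ids = await asyncio.gather(*[fake_submit(a) for a in argument_list])
    # Second gather: wait for every result; the output order matches the input order.
    return await asyncio.gather(*[fake_result(r) for r in request_ids])
# In a plain script asyncio.run works; in Jupyter you would use "await generate_all(...)" instead.
results = asyncio.run(generate_all([{"num_inference_steps": s} for s in (6, 28, 50)]))
print([r["request_id"] for r in results])
```
Because gather preserves order, the results line up with the parameter combinations that produced them.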

04:21.820 --> 04:22.060
All right.

04:22.100 --> 04:23.540
So let's see what this looks like.

04:23.580 --> 04:28.940
If we pass flux dev and we pass this prompt of a rhino and a bar.

04:29.180 --> 04:32.260
And then we're just going to say results guidance scales inference steps.

04:32.260 --> 04:34.660
We're going to come out of doing this here.

04:34.660 --> 04:41.740
We can just do await generate_images and then see if this runs.

04:46.700 --> 04:51.780
So this should take five seconds to run all those images, which is the benefit of doing it asynchronously;

04:51.820 --> 04:54.180
it would take a lot longer otherwise.

04:54.460 --> 04:58.860
And now I guess we can see what these results look like.

04:59.620 --> 05:02.860
You can see we've got all the different images here.

05:04.860 --> 05:09.180
If we open one of them, this is like a fuzzy one with a low inference step.

05:09.220 --> 05:09.620
Cool.

05:09.660 --> 05:10.020
All right.

05:10.060 --> 05:13.500
So now let's display them in an image grid.

05:13.540 --> 05:15.380
So to do that we're going to use PIL.

05:15.700 --> 05:17.820
We just need to import Image, ImageDraw, and ImageFont.

05:17.860 --> 05:20.460
We also need requests to download the images.

05:20.580 --> 05:23.300
And then we're going to start creating this function here, display_image_grid.

05:23.340 --> 05:25.820
We're going to take results, guidance scales, and inference steps.

05:25.820 --> 05:27.300
And we're going to do something with them.

05:27.780 --> 05:32.500
First we need to create the image dimensions which is 400 by 300.

05:32.660 --> 05:34.820
And the margin we want between the images.

05:34.820 --> 05:37.700
And then we can use that to figure out the total width and height.

05:37.940 --> 05:40.660
And then we want to do scale factor.

05:41.020 --> 05:47.340
This is basically to create a higher resolution image and scale it down to make it not look really crappy.
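
NOTE
Editor's sketch of the sizing arithmetic, under stated assumptions. Only the 400 by 300 cell size comes from
the video; the margin of 50, the scale factor of 2, the 5 by 4 layout, the choice to leave a margin on both
sides, and the function name canvas_size are hypothetical.
```python
def canvas_size(n_rows, n_cols, image_width, image_height, margin, scale_factor):
    """Return the (width, height) of the oversized canvas that is drawn on before downscaling."""
    # Assumption: one margin on each side of the grid, leaving room for axis labels.
    total_width = n_cols * image_width + 2 * margin
    total_height = n_rows * image_height + 2 * margin
    # Drawing at scale_factor times the final size and shrinking afterwards smooths text and edges.
    return total_width * scale_factor, total_height * scale_factor
# Hypothetical layout: 5 rows of inference steps, 4 columns of guidance scales.
print(canvas_size(5, 4, 400, 300, 50, 2))
```
The final grid would then be this canvas resized back down by the same scale factor.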

05:48.460 --> 05:50.260
And then we have this combined image.

05:50.260 --> 05:51.220
So Image.new.

05:51.420 --> 05:54.900
And then it's creating like a white image essentially.

05:55.620 --> 05:55.860
Okay.

05:55.900 --> 05:59.300
And then we're going to draw on that, and we're going to get the fonts.

05:59.620 --> 06:01.340
This is like a bit of a pain.

06:02.380 --> 06:04.380
I'm just going to copy these font names in.

06:04.860 --> 06:06.300
You don't need to worry too much about this.

06:06.340 --> 06:09.060
It's just for show.

06:09.940 --> 06:14.420
And then this is just some kind of boilerplate to load the fonts.

06:14.460 --> 06:19.180
I don't think it's that important, but it's just what happened when I was coding it.

06:20.460 --> 06:21.060
There we go.

06:21.420 --> 06:22.260
So that should work.

06:22.260 --> 06:24.740
Hopefully, even if you're on a PC.

06:25.580 --> 06:28.820
And now we need this function I'm just going to copy this function in.

06:29.380 --> 06:34.700
And what this function does is it just draws text on the side.

06:34.700 --> 06:36.020
This is actually the hardest part.

06:36.620 --> 06:38.820
And we're going to use that later on.

06:39.420 --> 06:39.740
All right.

06:39.820 --> 06:43.420
Now we just need to add the images to the grid right.

06:46.780 --> 06:47.300
Yeah.

06:47.340 --> 06:48.580
Let's call this grid.

06:50.380 --> 06:50.740
All right.

06:50.780 --> 06:57.740
So to do that we're just going to say for index, result in enumerate(results).

06:58.180 --> 07:00.380
And then we're going to get the row.

07:05.820 --> 07:06.500
Five by four.

07:06.540 --> 07:08.980
And then the column.

07:11.620 --> 07:14.580
And then we're going to get the URL for this one.

07:15.650 --> 07:16.890
Download the image.

07:18.250 --> 07:24.290
And then let me just copy this across because this is like a specific way I found it works.

07:26.930 --> 07:28.530
So this is getting the URL.

07:29.370 --> 07:31.810
It's going to get the image from fal.

07:31.850 --> 07:35.410
It's going to download that image using the requests library.

07:35.770 --> 07:40.050
And then we're going to get the content of that image and then scale it and do resampling.

07:40.610 --> 07:48.290
So again don't worry if you don't know what any of this does because it's not super important.

07:48.290 --> 07:50.970
This is really just to make it display nicely.

07:51.290 --> 07:54.130
And then this bit here is just going to calculate the position.

07:54.130 --> 07:59.930
So it's going to take the column multiplied by the image width by the scale factor plus the margin multiplied

07:59.930 --> 08:01.250
by the scale factor again.

08:01.250 --> 08:04.090
And then the same thing for the y as well.
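
NOTE
Editor's sketch of the position calculation as spoken: column times image width times scale factor, plus
margin times scale factor, and the same for y. The row-major ordering (index counting across a row first),
the margin of 50, the scale factor of 2, and the name cell_position are assumptions, not the video's code.
```python
def cell_position(index, n_cols, image_width, image_height, margin, scale_factor):
    """Return the top-left (x, y) pixel of grid cell number index on the oversized canvas."""
    # Assumption: results are ordered row by row, so divmod splits the index into row and column.
    row, col = divmod(index, n_cols)
    x = col * image_width * scale_factor + margin * scale_factor
    y = row * image_height * scale_factor + margin * scale_factor
    return x, y
# First cell sits one scaled margin in from the corner.
print(cell_position(0, 4, 400, 300, 50, 2))
# Sixth cell (index 5) is in the second row, second column of a 4-column grid.
print(cell_position(5, 4, 400, 300, 50, 2))
```
If the requests were generated in a different order, the divmod line is the only part that would change.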

08:05.050 --> 08:07.730
And now we're going to add the labels as well.

08:07.730 --> 08:14.130
So for the bottom row, we just want to add the text.

08:14.930 --> 08:18.930
And then if we're on the leftmost column we're going to add the text as well.

08:18.970 --> 08:23.250
Don't ask me what this does because this is.

08:27.050 --> 08:27.490
Yeah.

08:27.890 --> 08:28.610
Should work.

08:29.410 --> 08:29.610
Yeah.

08:29.650 --> 08:33.730
Because this is just basically adding the text on the left hand side.

08:33.730 --> 08:36.210
This was like a lot of messing around to get it working.

08:36.210 --> 08:39.050
But I'm sure that you could do a better job here.

08:39.810 --> 08:40.010
All right.

08:40.050 --> 08:46.570
And then the final image is just going to resize and resample.

08:48.290 --> 08:52.610
And then rather than return, we want to just display that final image.

08:55.170 --> 08:55.770
Okay.

08:55.890 --> 08:58.170
And then display image grid.

08:59.330 --> 09:02.730
These need to be here, I believe.

09:02.770 --> 09:03.210
Yeah.

09:03.730 --> 09:04.010
Cool.

09:04.010 --> 09:06.370
So just to summarize there's a lot going on here.

09:06.370 --> 09:07.170
But you don't.

09:07.250 --> 09:08.690
None of it's important.

09:08.730 --> 09:10.890
All this is doing is just creating the grid.

09:11.450 --> 09:13.210
And there are lots of different ways to create a grid.

09:13.210 --> 09:15.050
So there's no secret sauce in here.

09:15.050 --> 09:16.730
It's just overly complicated.

09:17.010 --> 09:21.520
It's just figuring out where the images should go, how to scale them to the right size and then how

09:21.520 --> 09:23.040
to add text onto them.

09:23.160 --> 09:25.240
So if we run this, hopefully it should work.

09:27.440 --> 09:29.160
So it's just getting all those images.

09:37.320 --> 09:40.680
The reason it takes some time is it's got to download the images from fal.

09:43.680 --> 09:44.480
And here we go.

09:45.200 --> 09:45.920
Now we have the grid.

09:45.960 --> 09:49.840
And because we used the same seed it's the exact same image.

09:49.880 --> 09:52.240
And that's the nice thing about reproducibility here.

09:52.680 --> 09:54.040
So we could actually change.

09:54.080 --> 09:54.840
Now we have this.

09:54.840 --> 09:59.520
We could go and change the different steps that we put in here the guidance scales.

09:59.520 --> 10:01.600
Or we could use different parameters as well.

10:01.640 --> 10:04.320
Like for example we could change something about the prompt.

10:04.360 --> 10:11.200
Maybe we want like a lion and a rhino and a chicken and just kind of see how they change things

10:11.240 --> 10:11.920
over time.

10:11.920 --> 10:17.240
This is a technique I use all the time to figure out how much to tune one way or another, a parameter,

10:17.240 --> 10:21.560
or what a prompt looks like with different variables in the template.
