WEBVTT

00:00.000 --> 00:01.260
-: And in this video we're gonna cover

00:01.260 --> 00:03.360
how you can do prompt chaining.

00:03.360 --> 00:05.400
So prompt chaining is a process

00:05.400 --> 00:08.130
of decomposing a task into a series of steps,

00:08.130 --> 00:10.890
where each LLM call will basically process

00:10.890 --> 00:12.690
of the output of the previous one.

00:12.690 --> 00:14.370
You can think of it like a series of steps

00:14.370 --> 00:15.720
you can see in this diagram.

00:15.720 --> 00:18.180
So let's go and install a couple of packages.

00:18.180 --> 00:20.850
So we'll install OpenAI Pydantic,

00:20.850 --> 00:22.320
then we're gonna do some imports,

00:22.320 --> 00:23.940
so we input the operating system,

00:23.940 --> 00:26.640
typing Pydantic and OpenAI.

00:26.640 --> 00:28.650
Then you will need to set your OpenAI key.

00:28.650 --> 00:30.270
And we've also created a client.

00:30.270 --> 00:33.630
We are going to generate a story and a plot outline.

00:33.630 --> 00:35.310
And then with those plot outlines,

00:35.310 --> 00:36.900
we're going to then generate

00:36.900 --> 00:39.450
a bunch of different actual stories.

00:39.450 --> 00:40.590
Now the first thing we're gonna do

00:40.590 --> 00:42.903
is also generate our story-based model.

00:44.550 --> 00:48.720
And we're gonna have a title story author.

00:48.720 --> 00:50.550
And I also want plurals.

00:50.550 --> 00:54.630
So we've got the class stories, which is a stories key

00:54.630 --> 00:56.070
with a list of stories.

00:56.070 --> 00:59.910
So we've got the title story and author.

00:59.910 --> 01:01.380
We've also got a function

01:01.380 --> 01:04.830
that I'm gonna set up called serial_chain_workflow,

01:04.830 --> 01:08.160
which takes a topic and the number of stories.

01:08.160 --> 01:10.460
I'm just gonna generate a doc string for this.

01:13.110 --> 01:17.973
And then we're then going to have the stories plot prompt.

01:21.420 --> 01:23.280
And this is going to be an f-string,

01:23.280 --> 01:27.663
which will say, generate number of stories about topic,

01:28.980 --> 01:31.380
and we'll say generate, instead of stories,

01:31.380 --> 01:35.790
we'll say generate story

01:35.790 --> 01:40.590
plot ideas for stories about this topic.

01:40.590 --> 01:44.640
These should be short and concise for a children's book.

01:44.640 --> 01:46.170
Why not?

01:46.170 --> 01:51.170
And we'll also say later on these plot ideas

01:54.630 --> 01:57.783
will be used to generate full stories.

01:59.884 --> 02:01.530
And we'll just split this onto a couple of different lines

02:01.530 --> 02:03.483
'cause that's, I'll read a bit better.

02:08.700 --> 02:10.500
Okay. The next thing we're gonna do is we're going

02:10.500 --> 02:14.550
to use the structured outputs API from OpenAI.

02:14.550 --> 02:15.660
And this basically allows us

02:15.660 --> 02:18.150
to use those Pydantic models that we had before.

02:18.150 --> 02:20.730
So we'll call it stories_plot_response,

02:20.730 --> 02:22.770
and we're gonna actually use the

02:22.770 --> 02:25.710
client.beta.chat completions.

02:25.710 --> 02:28.470
And then we use the .parse function instead.

02:28.470 --> 02:30.090
The .parse function allows us

02:30.090 --> 02:33.270
to specifically get a Pydantic model.

02:33.270 --> 02:36.000
So it will be able to look for a story

02:36.000 --> 02:38.790
or a list of stories as a Pydantic model.

02:38.790 --> 02:41.320
And how that works is you use the response format

02:43.410 --> 02:47.040
and then you basically end up putting the response format

02:47.040 --> 02:48.420
and then you put in what you want there.

02:48.420 --> 02:51.090
So we are also gonna want one more Pydantic model.

02:51.090 --> 02:52.200
So I'm just gonna go up here

02:52.200 --> 02:54.807
and we're gonna add class StoryPlot.

02:56.820 --> 02:59.090
We'll have class StoryPlots,

03:02.948 --> 03:06.270
and that will just be a list of story plots.

03:06.270 --> 03:07.290
And then in our scenario

03:07.290 --> 03:11.223
I want a response format of story plots.

03:12.300 --> 03:15.693
We're gonna use the messages as well for this.

03:17.310 --> 03:19.140
And the messages that we're gonna put in here

03:19.140 --> 03:23.190
is the StoryPlots prompt.

03:23.190 --> 03:26.820
We're also going to set the temperature

03:26.820 --> 03:31.820
to be around 0.6 for a bit more creativity.

03:32.070 --> 03:35.643
We're also going to set the model to be GPT-4o mini.

03:37.710 --> 03:42.060
Then after that, we're gonna go, if not stories.choices

03:42.060 --> 03:47.060
or stories_plot.choices0.message.past,

03:47.640 --> 03:50.160
We're actually gonna raise a value error.

03:50.160 --> 03:52.710
And the reason why we're doing this is sometimes if you look

03:52.710 --> 03:56.190
at the plot response, it has a past chat completion,

03:56.190 --> 03:59.610
but in the .choices that might not actually exist

03:59.610 --> 04:02.220
and the .parse also might be a none type.

04:02.220 --> 04:03.510
You can see that here.

04:03.510 --> 04:06.570
So it's good to have some validation error in here.

04:06.570 --> 04:08.460
And then we can just go and test this out.

04:08.460 --> 04:10.980
So we have three story plots.

04:10.980 --> 04:13.050
Okay, so we've done some sequential work,

04:13.050 --> 04:14.400
but the next thing that we want to do

04:14.400 --> 04:15.660
is for each of these plots,

04:15.660 --> 04:19.470
we also want to then generate a story.

04:19.470 --> 04:22.290
So for example, we could have here, instead of returning,

04:22.290 --> 04:23.520
we could then just store this

04:23.520 --> 04:27.250
and say, stories as a python list.

04:30.270 --> 04:32.523
And then we could loop over each story,

04:33.930 --> 04:36.153
and then we could generate a story prompt.

04:41.010 --> 04:43.680
And there you go. So now we're also then saying that we want

04:43.680 --> 04:47.490
to generate a response format of a story

04:47.490 --> 04:50.670
using that client.beta.chat_completion.parse.

04:50.670 --> 04:52.650
And we're parsing in the story prompt,

04:52.650 --> 04:53.700
which is generate a story

04:53.700 --> 04:55.890
based on the following plot plot plot,

04:55.890 --> 04:59.280
and then we have the story response, which is coming back.

04:59.280 --> 05:01.320
And then from that, then what we wanna do is we want

05:01.320 --> 05:04.920
to check does that story response exist?

05:04.920 --> 05:07.500
And if not, when maybe we could raise an error.

05:07.500 --> 05:10.320
If you wanted to be a little bit less secure,

05:10.320 --> 05:11.250
you could do something like this

05:11.250 --> 05:12.900
but you could just continue,

05:12.900 --> 05:17.100
if it is a valid story, we add it to the stories list.

05:17.100 --> 05:20.040
And then what we can then do is return a Pydantic model,

05:20.040 --> 05:22.080
which is our stories.

05:22.080 --> 05:25.170
So I'm just gonna go and give that a go.

05:25.170 --> 05:26.970
And then we'll also save that.

05:26.970 --> 05:29.640
So we'll do serial_chain_workflow, the story,

05:29.640 --> 05:33.693
and we'll do something like data science and engineering.

05:35.820 --> 05:36.720
And we'll see,

05:36.720 --> 05:40.770
now we should get back three stories

05:40.770 --> 05:42.210
which have been generated from their plots.

05:42.210 --> 05:44.910
And you can see that this is increasing the amount of time

05:44.910 --> 05:46.320
that this takes to run.

05:46.320 --> 05:49.230
So obviously this chain is a sequential chain,

05:49.230 --> 05:52.920
and the fact is, is we're first generating all the plots.

05:52.920 --> 05:53.970
If there's an error

05:53.970 --> 05:56.430
and we don't have any plots, then we raise the error.

05:56.430 --> 05:58.320
Then we have a Python list here.

05:58.320 --> 06:00.360
And for each individual plot,

06:00.360 --> 06:03.240
what we then do is generate a story here.

06:03.240 --> 06:05.100
And we then, if we don't have a story,

06:05.100 --> 06:08.040
we just continue to the next iteration of that for loop.

06:08.040 --> 06:10.740
But if we do have a story, we append that in.

06:10.740 --> 06:12.300
And then we return a Pydantic model

06:12.300 --> 06:13.560
at the end with these stories.

06:13.560 --> 06:15.840
And you can see here now we've got these different stories.

06:15.840 --> 06:17.167
So what we'll do is we'll go into the stories

06:17.167 --> 06:19.950
and we'll do .stories,

06:19.950 --> 06:22.110
and then you can see we've got our Python list,

06:22.110 --> 06:24.450
and I can do square bracket 0.

06:24.450 --> 06:28.053
And then we can, for example, do the .title.

06:29.010 --> 06:29.887
And then you can see we've got

06:29.887 --> 06:32.070
"Mia and the Missing Crystal."

06:32.070 --> 06:34.670
And we can also do things like get the entire story.

06:36.870 --> 06:38.910
Okay, so you've learned about prompt chaining,

06:38.910 --> 06:40.950
which is basically a series of input

06:40.950 --> 06:43.020
and output pairs going into the next one,

06:43.020 --> 06:44.580
we've covered this also in LangChain,

06:44.580 --> 06:45.900
but I thought it's good for you to see

06:45.900 --> 06:49.770
how you can do this automatically using Open AI's package

06:49.770 --> 06:52.100
and using the structured outputs API.

06:52.100 --> 06:54.000
In the next video, we'll cover routing,

06:54.000 --> 06:56.400
how you can easily, given a user query,

06:56.400 --> 06:59.400
route different queries to different types of endpoints,

06:59.400 --> 07:01.290
whether these are different LLM calls

07:01.290 --> 07:02.430
or different functions

07:02.430 --> 07:05.160
so that your program can determine at runtime

07:05.160 --> 07:07.770
what to do with specific types of user queries.

07:07.770 --> 07:09.520
Cool, I'll see you in the next one.
