WEBVTT

00:00.080 --> 00:01.440
Hey there, Eden here.

00:01.440 --> 00:08.480
And in this video I want to introduce you to the with_structured_output method, which is the preferred

00:08.480 --> 00:12.600
way to get structured output out of large language models.

00:12.640 --> 00:19.800
This is a much more reliable method to get structured output, and it is the modern method of getting

00:19.800 --> 00:20.760
it now.

00:20.880 --> 00:26.200
It's more reliable because it leverages function calling of large language models.

00:26.280 --> 00:30.720
Now we're going to elaborate and go very, very deep on function calling.

00:30.720 --> 00:32.240
So don't worry about that.

00:32.280 --> 00:36.760
In this video I'm simply going to show you the interface and how to use it.

00:36.760 --> 00:40.320
And at the end of the video we're going to be comparing the two methods.

00:40.320 --> 00:42.960
And we're going to really see the differences between them.

00:43.160 --> 00:43.680
All right.

00:43.680 --> 00:45.000
So let's go to the code.

00:45.560 --> 00:48.120
And I now want to run everything.

00:48.120 --> 00:50.160
So I will have a fresh trace.

00:50.520 --> 00:58.200
So I'm going to remind you that the output parsing actually was the last step we did where we leveraged

00:58.200 --> 01:01.290
the parse method of the Pydantic output parser.

01:01.890 --> 01:09.050
This entire process relied on the fact that we injected the formatting instructions for the LLM

01:09.050 --> 01:09.890
into our original prompt.

01:10.370 --> 01:14.570
We hoped for the best, meaning we hoped that the LLM was going to follow them.

01:14.770 --> 01:21.530
And then LangChain is going to have an easy time parsing it and turning it into a Pydantic object.

01:21.570 --> 01:21.850
Right.

01:21.890 --> 01:24.530
So you can see the call was successful.

01:24.930 --> 01:31.250
And by the way, it was successful because we used a model which is pretty strong, which is GPT-4,

01:31.810 --> 01:37.490
even though it's not the latest GPT-5, and it's more than capable of doing this task.

01:37.610 --> 01:44.970
Now, in the early days when we had GPT-3.5 or GPT-3.5 Turbo, that wasn't the case, and we'd actually

01:44.970 --> 01:46.810
get a lot of errors when doing it.

01:46.810 --> 01:49.050
And it was actually very error prone.

01:49.050 --> 01:55.570
And it's actually nice to see that as the models become better, the applications inherently become

01:55.570 --> 01:57.010
more reliable, right?

01:57.260 --> 02:00.700
So let me show you the trace in LangSmith.

02:00.740 --> 02:03.780
Just to remind you of everything we did in the previous video.

02:04.740 --> 02:07.100
And let me open up the trace.

02:07.300 --> 02:11.180
And here you can see the last call we made.

02:11.380 --> 02:19.500
And here you can see the format instructions, which tell it to follow our AgentResponse

02:19.500 --> 02:20.540
schema.

02:20.580 --> 02:20.900
Okay.

02:20.940 --> 02:26.260
So in the end it's just some text we inject into the prompt being sent to the LLM here.

02:26.500 --> 02:28.620
Now notice this text by the way.

02:28.820 --> 02:32.260
It was sent over and over and over again to the LLM.

02:32.300 --> 02:37.660
Even though we didn't need it, we only needed this part of the prompt in the last iteration when we

02:37.660 --> 02:41.620
wanted the answer from the LLM after it executed the tool.

02:41.780 --> 02:44.460
And in the previous steps, we didn't need it.

02:44.500 --> 02:47.540
We didn't need it for the reasoning to choose the search tool.

02:47.580 --> 02:50.700
So this is only for the output schema, right?

02:50.740 --> 02:54.980
So this is problem number one or optimization number one we can make here.

02:55.430 --> 02:55.950
All right.

02:55.950 --> 03:03.310
So after the LLM responded with the answer, hopefully in the format we requested, we ran the

03:03.310 --> 03:06.150
runnable lambda to get the answer field.

03:06.150 --> 03:12.670
And then we ran the runnable lambda, which used the .parse method of the Pydantic output parser,

03:12.710 --> 03:18.670
which took this JSON and simply cast it into a Pydantic object.
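
NOTE
The prompt-injection flow recapped above can be sketched with the standard library alone.
This is an illustrative stand-in, not LangChain's actual parser: FORMAT_INSTRUCTIONS,
build_prompt, parse_reply and both sample replies are hypothetical names and data.
It shows why the approach depends on the model replying with nothing but the requested JSON.
```python
import json
# Hypothetical formatting instructions appended to every prompt.
FORMAT_INSTRUCTIONS = (
    'Respond only with a JSON object with the keys "answer" (string) '
    'and "sources" (list of URL strings).'
)
def build_prompt(question: str) -> str:
    """Append the formatting instructions to the user's question."""
    return f"{question}\n\n{FORMAT_INSTRUCTIONS}"
def parse_reply(reply: str) -> dict:
    """Parse the raw model text; raises ValueError if it is not the expected JSON."""
    data = json.loads(reply)  # json.JSONDecodeError is a subclass of ValueError
    if "answer" not in data:
        raise ValueError("reply is missing the 'answer' key")
    return data
# A reply that follows the instructions parses cleanly.
good_reply = '{"answer": "42", "sources": []}'
print(parse_reply(good_reply)["answer"])
# A reply with extra prose around the JSON breaks the parser.
chatty_reply = 'Sure! Here is the JSON: {"answer": "42", "sources": []}'
try:
    parse_reply(chatty_reply)
except ValueError:
    print("parsing failed: the model added text around the JSON")
```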

03:20.070 --> 03:20.470
All right.

03:20.470 --> 03:24.750
So let's go now and let's switch to the structured output.

03:25.110 --> 03:29.870
So the first thing I want to do is to get rid of this PydanticOutputParser import.

03:29.870 --> 03:33.070
And let me remove also the output parser object.

03:33.070 --> 03:33.590
Here.

03:33.590 --> 03:35.070
We do not need it anymore.

03:35.390 --> 03:36.390
And here's the magic.

03:36.390 --> 03:39.550
I'm going to create a variable called structured_llm.

03:39.870 --> 03:41.750
And I'm going to use the llm

03:41.790 --> 03:47.590
we initialized with GPT-4, and I'm going to use the with_structured_output method.

03:47.830 --> 03:53.030
And I'm going to pass it the Pydantic class of our AgentResponse.

03:53.200 --> 03:56.000
So what's this with_structured_output method?

03:56.160 --> 04:03.560
It's going to create for us a new version of the model, a new instance of the model that is specially configured

04:03.560 --> 04:11.600
to produce output in a specific structured format, which is going to be the AgentResponse schema,

04:11.840 --> 04:18.440
which is going to be the blueprint which defines exactly what kind of structured data we want the model

04:18.440 --> 04:19.120
to return.

04:20.040 --> 04:24.360
Now, structured_llm is the new model instance.

04:24.360 --> 04:31.440
And when we ask it something, it will always try to respond with data that exactly

04:31.440 --> 04:35.840
matches the schema of the AgentResponse object.

04:36.360 --> 04:39.960
So normally LLMs just return plain text.

04:40.120 --> 04:41.760
So this can be unpredictable.

04:41.760 --> 04:45.280
This was the case in older and less capable models.

04:45.280 --> 04:50.440
And it can cause problems downstream when we try to parse it in the application layer.

04:50.480 --> 04:57.850
Now conceptually, you can think about this line as sort of injecting into the prompt, like we did previously

04:57.850 --> 05:00.130
with the schema instructions.

05:00.130 --> 05:05.770
However, this is leveraging the function calling capabilities of LLMs, which is going to make everything

05:05.770 --> 05:07.050
much more reliable.

05:07.050 --> 05:09.930
And we're going to be elaborating on that in future sections.

05:09.930 --> 05:11.330
So don't worry about it.

05:11.530 --> 05:12.010
All right.

05:12.010 --> 05:15.090
So now let me get rid of those format instructions.

05:15.090 --> 05:16.330
We do not need them anymore.

05:16.330 --> 05:20.090
And I'm simply putting some empty strings here.

05:20.330 --> 05:24.770
So remember this special prompt, the special ReAct prompt with the format instructions?

05:24.770 --> 05:27.610
So right now it's going to be populated with empty strings.

05:27.610 --> 05:29.890
We can also use the original ReAct prompt.

05:29.930 --> 05:31.050
It doesn't really matter.

05:31.050 --> 05:32.650
So I'm just going to continue.

05:33.010 --> 05:33.410
All right.

05:33.410 --> 05:40.570
So it's important that when we're going to run our reasoning engine with the create_react_agent function,

05:40.570 --> 05:45.450
notice we're passing in the regular LLM without the structured output.

05:45.450 --> 05:50.820
So we don't pass in the structured_llm variable. We want to use that special instance with the structured

05:50.820 --> 05:57.220
output only in the last iteration after the LLM has created the answer, and we just want to format it

05:57.220 --> 06:00.940
nicely as a Pydantic object of the AgentResponse class.

06:01.220 --> 06:01.820
All right.

06:01.820 --> 06:06.380
So let me now remove the parse output line here.

06:06.380 --> 06:07.420
We don't need it.

06:07.660 --> 06:13.980
And the last execution in our chain is going to be piping it to the structured_llm here.

06:14.300 --> 06:23.180
And in this last step here in our chain the structured_llm is going to convert the text output to structured

06:23.180 --> 06:23.780
data.
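
NOTE
The wiring described above can be sketched roughly as follows. This is a minimal sketch under
assumptions, not the exact code from the video: the field names answer and sources come from
the trace, but the Source model, the field descriptions, and the build_final_chain helper are
hypothetical, and agent_executor stands for the ReAct agent executor that was built with the
regular LLM. It assumes the langchain-openai, langchain-core and pydantic packages are installed
and that OPENAI_API_KEY is set when the function is called.
```python
from typing import List
from pydantic import BaseModel, Field
class Source(BaseModel):
    """Hypothetical shape of one source entry."""
    url: str = Field(description="URL of a source used for the answer")
class AgentResponse(BaseModel):
    """Schema the final model call should fill in."""
    answer: str = Field(description="The agent's answer to the query")
    sources: List[Source] = Field(default_factory=list, description="Sources used for the answer")
def build_final_chain(agent_executor):
    """Pipe the agent's plain-text output into a structured-output model call."""
    # Imported lazily so the schema above can be used without LangChain installed.
    from langchain_core.runnables import RunnableLambda
    from langchain_openai import ChatOpenAI
    llm = ChatOpenAI(model="gpt-4", temperature=0)
    # New runnable that returns AgentResponse instances instead of plain text.
    structured_llm = llm.with_structured_output(AgentResponse)
    # The agent executor returns a dict; keep only its "output" text.
    extract_output = RunnableLambda(lambda result: result["output"])
    # No format instructions and no explicit parser are needed in this chain.
    return agent_executor | extract_output | structured_llm
```
Invoking the returned chain with the agent's usual input would be expected to yield an
AgentResponse instance rather than a string.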

06:23.780 --> 06:27.340
And this is going to be based on the tool calling capabilities.

06:27.500 --> 06:30.460
And again I do not want to dive deep here in tool calling.

06:30.500 --> 06:35.020
We're going to have an entire dedicated section for that where we're going to see underneath the hood

06:35.060 --> 06:36.340
how everything is working.

06:36.340 --> 06:44.100
But what's important to see here is that the structured output here is replacing (a) the injection of

06:44.100 --> 06:51.960
the schema into the prompt, and (b) the parsing itself, which we previously executed explicitly.

06:51.960 --> 06:57.920
So to recap, this is the interface of using the with_structured_output method.

06:58.080 --> 06:58.600
All right.

06:58.600 --> 07:01.080
So let me go now and execute the code.

07:01.320 --> 07:04.160
And I'm going to fast forward everything.

07:04.360 --> 07:07.040
And what's important to see here is the output.

07:07.040 --> 07:09.760
And then we're going to explore the trace.

07:11.880 --> 07:12.560
Right.

07:12.720 --> 07:14.440
So here we can see the output.

07:14.440 --> 07:18.680
And we can see we got the output which adheres to the Pydantic object.

07:18.680 --> 07:22.640
We can see we got the answer and we can see the sources.

07:22.840 --> 07:27.600
So everything is as we expect it to be, like in the previous run.

07:27.600 --> 07:29.520
So let me go now to LangSmith.

07:29.520 --> 07:34.400
And let's see now the second trace and notice something interesting here.

07:34.600 --> 07:38.560
Notice here the prompt that we sent to the LLM.

07:38.600 --> 07:41.840
In every iteration, we don't have the format instructions.

07:42.280 --> 07:48.010
So we didn't give any information about the schema and what the answer should look like.

07:48.050 --> 07:52.730
We're leaving everything to the LLM, and this is actually also saving us tokens.

07:52.930 --> 07:58.930
So let me go to the runnable lambda which is going to extract the result here.

07:59.090 --> 08:04.170
And this is ultimately what the LLM is going to return in the output field, I remind you.

08:04.450 --> 08:07.250
And here we can see this is simply text here.

08:07.250 --> 08:15.330
And in the final part of our chain we're going to now make another LLM call with this as input.

08:15.330 --> 08:17.050
So let me go and show you it.

08:17.290 --> 08:21.970
And you can see we have something which is new here which is tools.

08:21.970 --> 08:25.610
And we can see here it's called AgentResponse.

08:25.650 --> 08:32.410
We can see here the Pydantic schema of our AgentResponse, and what the LLM returned to us here,

08:32.450 --> 08:38.810
the structured LLM, is an AgentResponse object with the following fields.

08:38.810 --> 08:41.890
So you can see it right over here we have the answer field.

08:41.890 --> 08:47.420
It has the sources field with the URLs, and it's going to return it as JSON, by the way.

08:47.420 --> 08:49.140
And now it's displayed as YAML.

08:49.140 --> 08:53.260
So this is how LangSmith chooses to display the JSON response.

08:53.420 --> 08:58.940
And the important and cool thing to show you here is that, remember the Pydantic output parser?

08:58.940 --> 09:05.220
We didn't put it explicitly here, so LangChain added it for us when we used with_structured_output.

09:05.660 --> 09:09.540
And it's simply going to take the JSON.

09:09.540 --> 09:11.340
And I remind you this is not YAML.

09:11.380 --> 09:12.500
This is JSON.

09:12.500 --> 09:15.300
You can go and click to see it as JSON.

09:15.300 --> 09:22.220
So LangChain is going to take this JSON response from the LLM which adheres to the schema which the

09:22.220 --> 09:24.860
LLM used function calling to create.

09:24.860 --> 09:30.940
And it's going to use the Pydantic output parser in order to convert it into a Pydantic object.
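
NOTE
What the trace shows can be approximated with the standard library: a tool definition carrying
the schema, and a parsing step that turns the tool call's JSON arguments into a typed object.
This is only a sketch. The dictionary follows the general shape of an OpenAI function-calling
tool definition, but the descriptions, the url key, the sample arguments and the
parse_tool_arguments helper are hypothetical, and dataclasses stand in for the Pydantic models.
```python
import json
from dataclasses import dataclass, field
from typing import List
@dataclass
class Source:
    url: str
@dataclass
class AgentResponse:
    answer: str
    sources: List[Source] = field(default_factory=list)
# Tool definition sent alongside the prompt; the schema details are illustrative.
AGENT_RESPONSE_TOOL = {
    "type": "function",
    "function": {
        "name": "AgentResponse",
        "description": "Final answer with the sources used.",
        "parameters": {
            "type": "object",
            "properties": {
                "answer": {"type": "string"},
                "sources": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {"url": {"type": "string"}},
                        "required": ["url"],
                    },
                },
            },
            "required": ["answer"],
        },
    },
}
def parse_tool_arguments(arguments_json: str) -> AgentResponse:
    """Turn the JSON arguments of a tool call into a typed object."""
    data = json.loads(arguments_json)
    sources = [Source(url=item["url"]) for item in data.get("sources", [])]
    return AgentResponse(answer=data["answer"], sources=sources)
# Hypothetical tool-call arguments, shaped like the JSON shown in the trace.
sample_arguments = '{"answer": "Example answer", "sources": [{"url": "https://example.com"}]}'
response = parse_tool_arguments(sample_arguments)
print(response.answer, [s.url for s in response.sources])
```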

09:30.980 --> 09:33.420
It's going to be very similar to what we did before.

09:33.620 --> 09:36.260
However, here LangChain is doing it for us.

09:37.340 --> 09:37.820
Cool.

09:38.100 --> 09:41.710
Let's go back to the code, and before we commit everything,

09:41.750 --> 09:48.110
let's discuss the differences between Pydantic parsing and with_structured_output.

09:48.110 --> 09:54.110
So as far as ease of use, Pydantic parsing is harder because we need to inject the format instructions into the prompt.

09:54.110 --> 09:55.350
Then we need to parse it.

09:55.350 --> 10:00.750
And when we use with_structured_output, we only need to plug in the Pydantic object and that's it.

10:01.310 --> 10:09.430
Now as far as reliability, and this is the main difference, with_structured_output is much more reliable

10:09.470 --> 10:12.590
because it leverages tool calling of LLMs.

10:12.590 --> 10:15.910
And I'll be showing you this in later sections of the course.

10:15.910 --> 10:21.870
So this is the main reason why it's preferred to use with_structured_output.

10:21.910 --> 10:29.150
Now when it comes to model support, the original Pydantic parsing supports all models.

10:29.190 --> 10:36.270
I mean all models that have decent reasoning, because it doesn't rely on function calling, while the structured

10:36.270 --> 10:38.790
output method relies on function calling.

10:39.120 --> 10:45.240
Now, it's important to note that all the state of the art models these days have function calling,

10:45.240 --> 10:47.400
so this is really a non-issue now.

10:47.720 --> 10:51.240
And lastly, regarding control, the prompt control.

10:51.560 --> 10:57.640
While the Pydantic output parser, the original way, has a bit more control because we really control

10:57.680 --> 11:04.240
what goes into the prompt sent to the LLM, we can still use all of these features when describing the

11:04.280 --> 11:05.480
Pydantic object.

11:05.480 --> 11:10.640
So in the end, when using with_structured_output, we can still get the same results here.

11:10.880 --> 11:11.320
All right.

11:11.320 --> 11:18.120
And just to reiterate, the preferred way to go in output parsing and getting structured output is to

11:18.160 --> 11:20.360
use the with_structured_output method.

11:20.640 --> 11:21.000
All right.

11:21.000 --> 11:27.440
Let me now open Claude Code and let me commit everything to the repo so you can find the code.

11:27.760 --> 11:34.360
So I'm going to write in the prompt: create a commit from my new changes in the main.py file.

11:36.200 --> 11:36.640
All right.

11:36.640 --> 11:38.570
So it's going to write the commit.

11:38.930 --> 11:44.530
Let me now go and push it to the repo and let me show you the commit and the diff.

11:46.010 --> 11:47.650
So let's go to the repo.

11:49.490 --> 11:53.410
And I'm going to choose the branch of react search agent.

11:54.010 --> 11:54.450
Here.

11:54.450 --> 11:56.930
Let me go now to the commit list.

11:57.930 --> 12:01.650
And here is the final commit: refactor output parsing.

12:02.010 --> 12:07.930
Here you can see: replace PydanticOutputParser with the with_structured_output method and downgrade the

12:07.930 --> 12:09.290
model to GPT-4.

12:09.690 --> 12:12.450
And yeah so this is the diff.

12:12.490 --> 12:17.130
You can see it here. I will be linking it, along with the traces, in the video's

12:17.130 --> 12:17.890
resources.

12:18.090 --> 12:24.130
Now I just want to remind you that in this video, I only wanted to show you the interface of how to

12:24.130 --> 12:24.890
use it.

12:24.890 --> 12:28.490
And I didn't explain anything about function calling.

12:28.490 --> 12:30.850
And we have an entire section dedicated to that.

12:30.850 --> 12:33.850
So don't worry if you don't understand how this is working under the hood.

12:33.890 --> 12:35.490
We're going to soon figure it out.