WEBVTT

00:00.000 --> 00:01.350
-: Okay, so in this video we're gonna have a look

00:01.350 --> 00:03.120
at the chat completions endpoint

00:03.120 --> 00:05.880
versus the responses API endpoint.

00:05.880 --> 00:07.440
What I want you to do is open

00:07.440 --> 00:10.440
the chat_completions_vs_ responses_api notebook.

00:10.440 --> 00:11.880
And we're just gonna run this

00:11.880 --> 00:13.620
and have a look at the differences.

00:13.620 --> 00:17.100
So the main thing is that you've got this .create method,

00:17.100 --> 00:18.420
and it's happening on both

00:18.420 --> 00:21.630
either the chat.completions part of OpenAI,

00:21.630 --> 00:24.930
or it's happening on the responses part of OpenAI.

00:24.930 --> 00:27.780
Now, if you go and have a look at the differences,

00:27.780 --> 00:30.330
the state management for chat completions

00:30.330 --> 00:33.240
is you have a messages array or a Python list.

00:33.240 --> 00:35.730
And in every call it's completely stateless.

00:35.730 --> 00:38.910
You have to manually add the previous messages.

00:38.910 --> 00:41.790
In the responses API, by default,

00:41.790 --> 00:44.040
it is a stateful back service.

00:44.040 --> 00:47.280
So OpenAI will hold your messages

00:47.280 --> 00:50.340
and you just have to use the previous response ID

00:50.340 --> 00:51.360
inside of that.

00:51.360 --> 00:53.190
But remember, we also overcame that

00:53.190 --> 00:55.620
by setting the stories equal to false

00:55.620 --> 00:58.620
to make the response's API endpoint stateless.

00:58.620 --> 01:01.530
And then we manually optimize the message history

01:01.530 --> 01:03.930
so we could do custom things like token counting.

01:03.930 --> 01:07.200
The chat completions endpoint provides a messages array

01:07.200 --> 01:10.290
and you have to extract that with this choices list,

01:10.290 --> 01:12.240
with the responses API,

01:12.240 --> 01:15.120
you don't have an input for the messages,

01:15.120 --> 01:17.010
it's just the input parameter.

01:17.010 --> 01:20.430
So the responses API is designed for building agents

01:20.430 --> 01:23.010
and will replace the current assistance API,

01:23.010 --> 01:25.440
and it offers a smoother development experience

01:25.440 --> 01:27.960
by holding the chat context between turns.

01:27.960 --> 01:29.940
If we're looking at a minimal example,

01:29.940 --> 01:32.280
and I'm gonna basically just explain through this code

01:32.280 --> 01:33.240
so you can see what's happening.

01:33.240 --> 01:37.860
So you've got OpenAI, we've got that gpt-4.1-mini,

01:37.860 --> 01:39.420
and then we're setting up a client,

01:39.420 --> 01:41.940
in the client.chat.completions.create,

01:41.940 --> 01:44.010
everything is saying from the model's perspective,

01:44.010 --> 01:46.800
but we have this messages argument

01:46.800 --> 01:49.170
and we paste our messages in here,

01:49.170 --> 01:52.410
and then, you know, when we want to then add those

01:52.410 --> 01:56.040
to extract the messages before we do responses.choices

01:56.040 --> 01:59.610
and get this first one out, .message.content.

01:59.610 --> 02:01.380
Then we can create a follow-up request

02:01.380 --> 02:03.750
where we pass in the original conversation

02:03.750 --> 02:06.030
plus our additional message as well.

02:06.030 --> 02:07.590
Tell me another joke.

02:07.590 --> 02:10.530
And then we can also get the messages from here

02:10.530 --> 02:15.530
via that .choices[0].message.content.

02:15.540 --> 02:18.510
If we look at the responses API like we have before,

02:18.510 --> 02:20.970
you've got rid of the messages argument here

02:20.970 --> 02:23.850
for the .create, and we've replaced it with input,

02:23.850 --> 02:25.710
but the item is basically the same.

02:25.710 --> 02:29.730
It can either be a textual string, like as a prompt,

02:29.730 --> 02:32.130
or it can be a list of messages.

02:32.130 --> 02:34.830
We also have the previous response ID

02:34.830 --> 02:37.890
where basically we don't have to manually save the state

02:37.890 --> 02:39.030
of the message history,

02:39.030 --> 02:41.880
we're just using that previous response ID.

02:41.880 --> 02:45.180
So just to summarize with the chat completions API endpoint,

02:45.180 --> 02:47.730
you manually have to handle the conversation state,

02:47.730 --> 02:50.070
and it uses the messages parameter

02:50.070 --> 02:54.210
and returns an output under the choices messages.content.

02:54.210 --> 02:55.890
With the responses API,

02:55.890 --> 02:57.720
you don't have to manage message history

02:57.720 --> 02:59.130
unless you want to,

02:59.130 --> 03:01.260
and uses the input parameter

03:01.260 --> 03:03.270
and returns the aggregated output

03:03.270 --> 03:05.310
as the .output_text property.

03:05.310 --> 03:07.710
Cool. All right, I'll see you in the next video.
