WEBVTT

00:00.000 --> 00:00.930
Alright, welcome back.

00:00.930 --> 00:02.490
So in this video, we're gonna have a look

00:02.490 --> 00:06.330
at how you can use streaming inside of OpenAI's API

00:06.330 --> 00:07.800
and what streaming is.

00:07.800 --> 00:09.720
The notebook that you'll be following along with

00:09.720 --> 00:12.570
is inside the OpenAI features and functionality folder.

00:12.570 --> 00:15.720
It's called streaming with responses API.

00:15.720 --> 00:18.270
So open that up and at the start,

00:18.270 --> 00:21.570
we're gonna do a pip install on the OpenAI client.

00:21.570 --> 00:25.350
And to give you a bit of information about streaming,

00:25.350 --> 00:28.530
basically rather than waiting for the entire response,

00:28.530 --> 00:32.130
you can use OpenAI streaming to get all of the changes

00:32.130 --> 00:36.960
that are happening in real time inside of OpenAI's API

00:36.960 --> 00:39.000
and get those events streamed directly

00:39.000 --> 00:41.460
to you via your client so that you can give that

00:41.460 --> 00:42.960
to users immediately.

00:42.960 --> 00:45.420
Just like you can see here when I've got some information

00:45.420 --> 00:48.840
and I say generate an article outline,

00:48.840 --> 00:51.690
ChatGPT won't wait to respond with everything,

00:51.690 --> 00:54.090
it will actually stream back things in real time.

00:54.090 --> 00:55.200
And so we'll see,

00:55.200 --> 00:58.950
make it about digital marketing, really long.

00:58.950 --> 01:00.480
And you'll see that we don't have to wait

01:00.480 --> 01:02.370
for the entire response to come back, right?

01:02.370 --> 01:05.430
It's just streamed directly back to us in the UI.

01:05.430 --> 01:08.370
And this is essentially what streaming architecture is

01:08.370 --> 01:11.010
where rather than the entire response coming back in one go

01:11.010 --> 01:14.580
and waiting for the end response, we can stream the changes

01:14.580 --> 01:18.270
that happen in real time and increment those delta changes

01:18.270 --> 01:19.980
to become a final output.

01:19.980 --> 01:23.280
So OpenAI exposes a couple of different event types,

01:23.280 --> 01:24.750
when the response was created,

01:24.750 --> 01:27.000
what the output_text.delta is,

01:27.000 --> 01:28.410
when the response is completed,

01:28.410 --> 01:30.510
and whether there is an error as well.
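
NOTE
A minimal Python sketch of branching on the event types just listed. The type strings are the names the Responses API streams; the describe_event helper and the sample events are hypothetical stand-ins, so nothing here calls the API. The message attribute on the error event is assumed from the client library's error event shape.
```python
from types import SimpleNamespace
def describe_event(event):
    """Return a short label for one streaming event, keyed on event.type."""
    if event.type == "response.created":
        return "started"
    if event.type == "response.output_text.delta":
        return "text: " + event.delta
    if event.type == "response.completed":
        return "finished"
    if event.type == "error":
        # Assumption: error events carry a human-readable `message` field.
        return "error: " + event.message
    return "ignored: " + event.type
# Hypothetical sample events standing in for a live stream.
sample_events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Sure"),
    SimpleNamespace(type="response.in_progress"),
    SimpleNamespace(type="response.completed"),
]
labels = [describe_event(e) for e in sample_events]
print(labels)
```
Returning a label rather than printing inside the helper keeps the branching easy to reuse, for example when deciding what to send to a front end.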

01:30.510 --> 01:31.740
So let's follow along

01:31.740 --> 01:33.930
and just have a look at this coding example.

01:33.930 --> 01:37.020
We have GPT-4.1 mini

01:37.020 --> 01:39.390
and you're going to need to set your OpenAI key,

01:39.390 --> 01:40.710
so make sure to do that.

01:40.710 --> 01:42.870
Now, you'll see there's only one change that we made

01:42.870 --> 01:45.330
to get streaming, which is adding the stream argument

01:45.330 --> 01:49.590
to the client.responses.create and setting that to true.

01:49.590 --> 01:51.060
And then basically, all you do is,

01:51.060 --> 01:53.100
you iterate over the stream

01:53.100 --> 01:55.050
and then you can print the details.
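
NOTE
A rough sketch of the loop just described, assuming the openai Python package and an OPENAI_API_KEY environment variable for the live call. The helper names print_event_types and open_live_stream and the demo events are made up for illustration; the demo runs offline and does not contact the API.
```python
from types import SimpleNamespace
def print_event_types(stream):
    """Print the type of every event in the stream and return them as a list."""
    seen = []
    for event in stream:
        print(event.type)
        seen.append(event.type)
    return seen
def open_live_stream(prompt, model="gpt-4.1-mini"):
    """Start a real streamed request; stream=True is the only streaming-specific argument."""
    from openai import OpenAI  # imported lazily so the offline demo below needs no install
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    return client.responses.create(model=model, input=prompt, stream=True)
# Offline demo with hypothetical events; pass open_live_stream("...") instead for a real run.
demo_types = ("response.created", "response.in_progress", "response.output_item.added")
demo_stream = [SimpleNamespace(type=t) for t in demo_types]
seen_types = print_event_types(demo_stream)
```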

01:55.050 --> 01:56.940
So you can see we've got this event,

01:56.940 --> 01:58.770
which is a response stream event

01:58.770 --> 02:00.480
and we can see there is a type of this

02:00.480 --> 02:01.650
and I'm gonna run this now

02:01.650 --> 02:04.200
and we'll see the stream events coming in in real time.

02:04.200 --> 02:06.630
So you can see here, you've got response.created,

02:06.630 --> 02:10.050
then it had that in_progress, then output_item.added.

02:10.050 --> 02:11.700
So all of these come in in real time.

02:11.700 --> 02:13.950
So I'll just try and run it again so you can have a look

02:13.950 --> 02:16.650
and I'll scroll back up rather quickly.

02:16.650 --> 02:19.800
So let's go and do this one more time and have a look.

02:19.800 --> 02:20.880
And you can see, there you go.

02:20.880 --> 02:22.410
So these are all the events that are coming in

02:22.410 --> 02:24.870
and you've got things like response.created,

02:24.870 --> 02:25.980
and response.in_progress.

02:25.980 --> 02:28.410
You can use all these different types of events

02:28.410 --> 02:31.260
to figure out how you want to talk to your front end.

02:31.260 --> 02:32.280
The next thing I'm gonna show you is

02:32.280 --> 02:34.440
how you can build the text up in real time.

02:34.440 --> 02:36.180
So we've got the same thing happening here

02:36.180 --> 02:39.600
where we're gonna request that the model do a tongue twister

02:39.600 --> 02:43.620
and give us some output whilst using streaming.

02:43.620 --> 02:45.390
So you've got that stream is equal to true.

02:45.390 --> 02:47.550
We set our text to be an empty string.

02:47.550 --> 02:50.160
And then, as we iterate over the stream,

02:50.160 --> 02:51.630
if the event type is equal

02:51.630 --> 02:55.500
to response.output_text.delta,

02:55.500 --> 02:58.800
we take the original text and we apply the delta to it.

02:58.800 --> 03:00.270
So have a look at what you get from this

03:00.270 --> 03:01.140
when you're generating this.

03:01.140 --> 03:03.600
So, we started with the sure,

03:03.600 --> 03:05.850
then we added the exclamation mark, then the here,

03:05.850 --> 03:07.710
then the it, then the is.

03:07.710 --> 03:10.140
So basically, that's what's happening is,

03:10.140 --> 03:11.970
we are setting stream to true,

03:11.970 --> 03:14.280
we're iterating over the stream.

03:14.280 --> 03:16.560
And then what we're doing is, when the event type

03:16.560 --> 03:19.470
is equal to this output_text.delta,

03:19.470 --> 03:22.080
we are then just basically building up this Python variable

03:22.080 --> 03:24.960
in real time and printing it out every single time

03:24.960 --> 03:26.490
the delta is changing.
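
NOTE
A small sketch of building up the text from deltas, as walked through above. The build_text helper is a hypothetical name, and the demo events are invented to mirror the pieces mentioned in the video; with a real stream you would pass the object returned by client.responses.create(..., stream=True) instead.
```python
from types import SimpleNamespace
def build_text(stream):
    """Append each output-text delta to a running string, printing it as it grows."""
    text = ""
    for event in stream:
        if event.type == "response.output_text.delta":
            text += event.delta
            print(text)
    return text
# Hypothetical deltas standing in for a live stream.
demo_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Sure"),
    SimpleNamespace(type="response.output_text.delta", delta="!"),
    SimpleNamespace(type="response.output_text.delta", delta=" Here"),
    SimpleNamespace(type="response.output_text.delta", delta=" it"),
    SimpleNamespace(type="response.output_text.delta", delta=" is"),
    SimpleNamespace(type="response.completed"),
]
final_text = build_text(demo_stream)
```
Events of any other type are skipped, so the same loop keeps working when the stream includes lifecycle events around the text.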

03:26.490 --> 03:28.680
Cool. So in summary, to use streaming, you just have

03:28.680 --> 03:30.870
to set the stream parameter equal to true

03:30.870 --> 03:32.250
in your API request.

03:32.250 --> 03:35.523
You will need to handle the event types and the deltas.