WEBVTT

00:00.080 --> 00:04.100
In this video, you're going to learn how to do streaming in LangChain.

00:04.100 --> 00:09.650
Streaming is a concept that allows you to get the output of a large language model, token by token,

00:09.650 --> 00:12.020
rather than waiting for the entire response.
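
NOTE
A minimal, stdlib-only sketch of the idea above. It assumes a hypothetical fake_llm_stream generator in place of a real model. The non-streaming path waits for the whole answer, while the streaming path handles each token as soon as it is produced.
```python
import time
from typing import Iterator
def fake_llm_stream(text: str, delay: float = 0.0) -> Iterator[str]:
    # Hypothetical stand-in for an LLM: yields the answer one word-sized "token" at a time.
    for i, word in enumerate(text.split(" ")):
        if delay:
            time.sleep(delay)  # simulate per-token generation latency
        yield word if i == 0 else " " + word
answer = "Streaming returns tokens as they are generated"
# Non-streaming: collect everything first, then use the result.
full = "".join(fake_llm_stream(answer))
# Streaming: act on each token the moment it arrives.
for token in fake_llm_stream(answer, delay=0.01):
    print(token, end="", flush=True)
print()
```
Real LLM tokens are usually sub-word pieces rather than whole words. Splitting on spaces is only a simplification.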

00:12.020 --> 00:16.430
First, we've installed the packages we need, such as langchain, langchain-openai, and

00:16.460 --> 00:21.380
langchain-community, and we've also added a section here for setting your API key.
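
NOTE
You can install the packages mentioned here with `pip install langchain langchain-openai langchain-community`. For the API key, here is a minimal sketch that prompts only when the key is missing. ensure_openai_api_key is a hypothetical helper name. ChatOpenAI reads OPENAI_API_KEY from the environment by default.
```python
import os
from getpass import getpass
def ensure_openai_api_key() -> str:
    # Prompt (without echoing the input) only if OPENAI_API_KEY is not already set.
    if not os.environ.get("OPENAI_API_KEY"):
        os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
    return os.environ["OPENAI_API_KEY"]
```
Calling ensure_openai_api_key() once near the top of the notebook keeps the key itself out of the notebook source.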

00:21.410 --> 00:25.910
Now we need to import the ChatOpenAI class, which comes from

00:25.910 --> 00:26.600
the langchain_openai package.

00:26.630 --> 00:28.490
I'm going to start my Jupyter notebook.

00:28.490 --> 00:35.210
We're also going to import HumanMessage from langchain_core.messages.

00:36.020 --> 00:37.730
Next we're going to set up the chat model.

00:37.730 --> 00:40.220
So I'm going to set model equal to ChatOpenAI().

00:40.700 --> 00:46.190
Then, rather than just invoking it as before, we're also going to add a parameter

00:46.190 --> 00:47.540
to ChatOpenAI.

00:47.660 --> 00:49.880
So we'll turn on streaming by setting streaming=True.

00:49.880 --> 00:54.560
Then, instead of calling .invoke(), we're going to call .stream().

00:54.560 --> 00:56.270
Now, what does .stream() give you?

00:56.300 --> 00:57.770
Well, it gives you chunks back.

00:57.800 --> 01:01.790
You can see it returns an iterator of BaseMessageChunk objects, right?

01:01.820 --> 01:03.320
So it's going to return an iterator.

01:03.320 --> 01:11.060
So what we can then do is write for chunk in model.stream(...), print chunk.content. A better way to

01:11.060 --> 01:13.790
do this is to flush the output stream each time we print.

01:13.790 --> 01:17.810
So we'll just change this so that each print flushes the output stream.

01:22.970 --> 01:23.810
And here you go.

01:23.810 --> 01:25.190
That's exactly what we're looking for.

01:25.220 --> 01:25.400
Right.

01:25.400 --> 01:28.670
So you can see here the content coming in chunk by chunk.

01:28.700 --> 01:33.320
You can also adapt this to work with the HumanMessage, SystemMessage, or AIMessage classes

01:33.320 --> 01:34.670
that you learned about previously.

01:34.670 --> 01:39.680
So, for example, we can write for chunk in model.stream(...).

01:39.680 --> 01:42.650
And then we will pass a list of messages.

01:42.680 --> 01:46.700
And we're going to have a HumanMessage that says, "What is the capital of the moon?"

01:46.730 --> 01:49.310
We all know there is no capital of the moon, but just bear with me.

01:49.310 --> 01:52.190
Then we're going to do exactly what we had here:

01:52.190 --> 01:56.720
printing each chunk's content and flushing the output as well.

01:56.720 --> 02:02.690
Note that you do have to pass a list if you're providing a message history to .stream().

02:02.690 --> 02:07.730
And you'll see that we get the same output as before, but we're

02:07.730 --> 02:10.160
now able to pass in different types of messages.

02:10.160 --> 02:15.590
In the next video, you're going to learn how to extract structured information using a large

02:15.590 --> 02:19.370
language model with a concept called an output parser.
