WEBVTT

00:00.000 --> 00:00.833
-: And in this lesson,

00:00.833 --> 00:02.910
we're gonna cover a concept called routing.

00:02.910 --> 00:05.580
Routing is a workflow pattern that you can use with LLMs

00:05.580 --> 00:08.850
to dictate, when a user input hits various bits

00:08.850 --> 00:12.060
of your system, where that user query should go next.

00:12.060 --> 00:14.430
It's almost like a classification happening,

00:14.430 --> 00:15.930
and then you're basically assigning it

00:15.930 --> 00:18.240
to a different task downstream.

00:18.240 --> 00:19.500
There are a couple of different use cases,

00:19.500 --> 00:21.090
but generally it's when you have different types

00:21.090 --> 00:23.100
of customer service or inquiries,

00:23.100 --> 00:25.500
different types of LLMs or model configurations to deal

00:25.500 --> 00:27.030
with different types of tasks.

00:27.030 --> 00:29.640
And also evaluating whether a request meets certain

00:29.640 --> 00:31.290
guidelines or triggers.

00:31.290 --> 00:33.480
So the first thing we're gonna do is a pip install.

00:33.480 --> 00:36.019
We're going to install OpenAI and Pydantic,

00:36.019 --> 00:37.890
and we're gonna do a couple of imports.

00:37.890 --> 00:39.390
And then we're also going to allow you

00:39.390 --> 00:40.740
to put your OpenAI key,

00:40.740 --> 00:45.280
and we've set up a client with a model of GPT-4o.

00:45.280 --> 00:46.650
Okay. So the next one we need

00:46.650 --> 00:48.090
to do is we're gonna get a couple

00:48.090 --> 00:49.830
of different endpoints that we're gonna hit.

00:49.830 --> 00:51.180
The first is a function.

00:51.180 --> 00:54.030
So we're gonna create a function called classify_as_spam,

00:55.080 --> 00:59.130
and we're gonna take in a user_query, which is a string.

00:59.130 --> 01:01.580
And what we're gonna then do is return a boolean.

01:03.090 --> 01:05.100
For now, I'm just gonna put this as a pass.

01:05.100 --> 01:08.850
We're also gonna have a generate_summary function,

01:08.850 --> 01:11.340
which is also gonna take in a user_query,

01:11.340 --> 01:13.449
but instead of returning a boolean,

01:13.449 --> 01:14.370
we're going to return a string

01:14.370 --> 01:15.780
and we're just gonna put pass.

01:15.780 --> 01:19.680
So let's build up the classify_as_spam function first.

01:19.680 --> 01:22.680
So the prompt that we're gonna use is a spam_prompt,

01:22.680 --> 01:26.740
and we're gonna say, "You are a spam classifier

01:28.050 --> 01:31.830
that takes a user query

01:31.830 --> 01:36.830
and returns True if the user query is spam

01:37.380 --> 01:39.720
and False otherwise."

01:39.720 --> 01:40.980
Cool. So we've got our prompt.

01:40.980 --> 01:42.780
Then what we're gonna do is we're gonna make our

01:42.780 --> 01:43.797
spam_response,

01:43.797 --> 01:48.797
and we're gonna use the client.beta.chat.completions.parse

01:48.870 --> 01:51.803
and remember, we're gonna use that response_format

01:52.737 --> 01:55.137
and we're gonna use a response_format of a bool.

01:56.610 --> 01:57.870
And then as well as that,

01:57.870 --> 02:01.230
we're also going to put the model=MODEL

02:01.230 --> 02:03.360
so we're gonna use the messages,

02:03.360 --> 02:05.250
and then we're gonna have a role of user

02:05.250 --> 02:07.110
with the content of that prompt.

02:07.110 --> 02:09.660
We're also going to turn down the tokens.

02:09.660 --> 02:11.850
We're also going to turn down the temperature

02:11.850 --> 02:14.730
and set that to zero to make it more deterministic.

02:14.730 --> 02:18.413
And then we also need to add in the user_query.

02:21.019 --> 02:21.852
"Here is the user query." So we'll go back up to the prompt.

02:23.130 --> 02:25.560
Here is the user query and remember to make sure

02:25.560 --> 02:27.243
that is an f-string as well.
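
NOTE
A minimal sketch of the prompt once the user query has been added to it.
The helper name build_spam_prompt is hypothetical (the lesson builds the
prompt inline), and using an f-string for the interpolation is an assumption.
```python
# Hypothetical helper: embed the user query in the spam prompt with an f-string.
def build_spam_prompt(user_query: str) -> str:
    return (
        "You are a spam classifier that takes a user query and returns "
        "True if the user query is spam and False otherwise.\n"
        f"Here is the user query: {user_query}"
    )
# Example input is made up for illustration.
print(build_spam_prompt("Win a free cruise, click now!"))
```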

02:28.110 --> 02:29.670
We're also gonna add in some validation here.

02:29.670 --> 02:33.690
So we'll say, if not spam_response.choices[0].message.content,

02:33.690 --> 02:36.647
or not spam_response.choices[0]

02:50.583 --> 02:55.203
What I want to do is .message.parsed.

03:04.402 --> 03:05.235
There we go.

03:05.235 --> 03:08.133
And then we're gonna do our generate_summary function.

03:09.420 --> 03:11.193
So we'll have a summary_prompt.

03:14.610 --> 03:16.980
We'll also use the chat completions,

03:16.980 --> 03:19.350
but in this case, we're just gonna return the content.

03:19.350 --> 03:21.600
We're also going to define a router.

03:21.600 --> 03:23.790
This router's gonna define all the different endpoints

03:23.790 --> 03:25.290
which the function can go into.

03:25.290 --> 03:26.820
So we're gonna have a destination.

03:26.820 --> 03:28.590
We're gonna set the Literal,

03:28.590 --> 03:30.810
which basically means this string either has

03:30.810 --> 03:33.360
to be a type of spam.

03:33.360 --> 03:36.210
And we're gonna change these actually to the real ones above

03:36.210 --> 03:39.360
to classify_as_spam and generate_summary.

03:39.360 --> 03:41.430
So this router has to have a destination,

03:41.430 --> 03:43.680
which is either a key of destination

03:43.680 --> 03:45.780
with a value of classify_as_spam

03:45.780 --> 03:49.110
or a key of destination with a value of generate_summary.
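
NOTE
A rough sketch of a router type whose destination is limited to two
string literals. The lesson uses Pydantic for this; the version here
swaps in a standard-library dataclass with a manual check so it runs
without extra installs. The class name Router and the alias Destination
are assumptions for illustration.
```python
from dataclasses import dataclass
from typing import Literal, get_args
# The two allowed destinations named in the lesson.
Destination = Literal["classify_as_spam", "generate_summary"]
@dataclass
class Router:
    destination: Destination
    def __post_init__(self) -> None:
        # A dataclass does not enforce Literal at runtime, so check by hand.
        if self.destination not in get_args(Destination):
            raise ValueError(f"Unknown destination: {self.destination!r}")
print(Router(destination="generate_summary"))
```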

03:49.110 --> 03:51.450
We're then gonna make a function called router,

03:51.450 --> 03:55.667
which takes in a user_query and then will return a router.

03:55.667 --> 04:00.667
So what we're then gonna do is say you have a router_prompt.

04:01.132 --> 04:03.663
You are a router. Here is the user query.

04:03.663 --> 04:07.200
And then we're going to basically say,

04:07.200 --> 04:10.500
we then have a response format of a router,

04:10.500 --> 04:14.160
which is then going to specifically give us a router.

04:14.160 --> 04:15.810
So this would give us a router

04:15.810 --> 04:17.583
and tell us where we need to go.

04:18.510 --> 04:20.670
And then what we're then gonna do is we're then gonna use

04:20.670 --> 04:23.580
this router to actually do that type of routing.

04:23.580 --> 04:25.860
So we're gonna have try and except,

04:25.860 --> 04:28.650
and then we're gonna basically, for now,

04:28.650 --> 04:31.650
we're gonna put a print of a router

04:31.650 --> 04:34.020
and let's put something in that would be useful

04:34.020 --> 04:36.000
to actually do something with.

04:36.000 --> 04:38.820
So we'll say, "I want to make a summary

04:38.820 --> 04:40.623
of the following text:

04:42.480 --> 04:44.580
This is awesome,

04:44.580 --> 04:46.017
how are you doing?"

04:46.890 --> 04:50.220
And then what we'll see is we'll get back a router object

04:50.220 --> 04:52.590
and notice how it decided to send this

04:52.590 --> 04:55.530
to the destination key with a string literal

04:55.530 --> 04:57.000
of generate_summary.

04:57.000 --> 04:58.200
So we could store this

04:58.200 --> 05:01.457
as the destination_router_result

05:03.872 --> 05:07.500
and then what we could then do is print this out.

05:07.500 --> 05:08.940
So you can now see that we could go

05:08.940 --> 05:10.953
to the .destination of this.

05:15.990 --> 05:18.066
And we've got the generate_summary.

05:18.066 --> 05:19.175
So we could extend this

05:19.175 --> 05:22.508
to then basically get the router choice.

05:23.375 --> 05:24.986
And then we could basically do something like this

05:24.986 --> 05:25.865
if the destination

05:25.865 --> 05:28.890
at destination_router.destination is equal

05:28.890 --> 05:29.723
to generate_summary,

05:29.723 --> 05:32.580
we call generate_summary and pass in the user_query.

05:32.580 --> 05:35.820
So let's go and avoid hard-coding this user_query.

05:35.820 --> 05:38.040
So we'll move this user_query out

05:38.040 --> 05:42.123
from the router function up here,

05:45.480 --> 05:47.180
and then we'll put that back here.

05:48.090 --> 05:52.680
And then if it's not, then it's classify_as_spam.

05:52.680 --> 05:55.443
And then we'll put else raise ValueError.
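
NOTE
A minimal sketch of the dispatch step described here. The two functions
are stubs that make no LLM calls, and the keyword check and truncation
inside them are placeholders, not the lesson's logic. The function name
route and its signature are assumptions for illustration.
```python
from typing import Union
# Stubs standing in for the LLM-backed functions; they make no API calls.
def classify_as_spam(user_query: str) -> bool:
    return "free money" in user_query.lower()
def generate_summary(user_query: str) -> str:
    return user_query[:40]
def route(destination: str, user_query: str) -> Union[bool, str]:
    # Send the query to whichever destination the router picked.
    if destination == "generate_summary":
        return generate_summary(user_query)
    elif destination == "classify_as_spam":
        return classify_as_spam(user_query)
    else:
        raise ValueError(f"Unknown destination: {destination!r}")
user_query = "I want to make a summary of the following text: This is awesome, how are you doing?"
print(route("generate_summary", user_query))
```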

05:56.850 --> 05:59.820
So now what's happening is we have a user_query,

05:59.820 --> 06:02.610
it goes into the router function.

06:02.610 --> 06:05.430
And remember all this router function is doing is having a

06:05.430 --> 06:08.070
router prompt, which is determining

06:08.070 --> 06:10.200
what thing is gonna happen next.

06:10.200 --> 06:14.460
And then we use the client.beta.chat.completions.parse

06:14.460 --> 06:16.027
with that router prompt to decide,

06:16.027 --> 06:17.910
"Hey, given this user query,

06:17.910 --> 06:20.250
where are we gonna send that user query?"

06:20.250 --> 06:22.710
It will then return back the router here

06:22.710 --> 06:24.750
with the string literals of where

06:24.750 --> 06:26.970
that specific thing can go to.

06:26.970 --> 06:28.380
And then as well as that,

06:28.380 --> 06:31.380
then what we also do is once we have that destination,

06:31.380 --> 06:34.440
then we can specifically route the user_query

06:34.440 --> 06:36.180
to a specific destination.

06:36.180 --> 06:38.400
This is a really useful pattern,

06:38.400 --> 06:40.140
and it's similar to function calling,

06:40.140 --> 06:43.440
but we are basically just redirecting the user query.

06:43.440 --> 06:46.320
We're not necessarily asking the LLM

06:46.320 --> 06:49.170
to generate function arguments or keyword arguments.

06:49.170 --> 06:52.830
We're simply doing a navigational step in determining

06:52.830 --> 06:54.270
what should happen next.

06:54.270 --> 06:56.970
And we have quite a lot of explicit control

06:56.970 --> 06:58.560
over this routing pattern.

06:58.560 --> 07:00.330
Alright, so in the next one we'll have a look

07:00.330 --> 07:04.593
at parallelization and how we can parallelize tasks with LLMs.
