WEBVTT

00:00.000 --> 00:00.900
Hey, welcome back.

00:00.900 --> 00:02.910
And in this video we're gonna have a look at a technique

00:02.910 --> 00:04.470
called parallelization.

00:04.470 --> 00:07.650
This is basically where you have some prompt inputs.

00:07.650 --> 00:10.200
You send that out to multiple LLM call providers.

00:10.200 --> 00:13.290
So in this example we've got one, two, three call providers,

00:13.290 --> 00:14.790
which are happening in parallel,

00:14.790 --> 00:16.830
and then we're using an LLM or some type

00:16.830 --> 00:19.800
of functional aggregator to aggregate these results

00:19.800 --> 00:21.720
and then producing an output.

00:21.720 --> 00:24.420
So good examples of this are where you're asking

00:24.420 --> 00:26.850
an LLM to generate multiple answers

00:26.850 --> 00:28.830
and then synthesizing a new answer,

00:28.830 --> 00:30.630
or reviewing a piece of code

00:30.630 --> 00:33.480
for both security vulnerabilities and stylistic improvements

00:33.480 --> 00:34.560
at the same time.

00:34.560 --> 00:37.440
Or maybe analyzing some text for emotional tone,

00:37.440 --> 00:39.300
intent, and potential biases,

00:39.300 --> 00:42.690
with each aspect handled by a different dedicated LLM.

00:42.690 --> 00:46.080
So firstly, we're gonna import OpenAI, Pydantic,

00:46.080 --> 00:47.820
and also nest-asyncio.

00:47.820 --> 00:49.560
We are using nest-asyncio here

00:49.560 --> 00:52.140
because we're gonna run some async code inside

00:52.140 --> 00:53.490
of the Jupyter notebook.

00:53.490 --> 00:57.360
We also are importing asyncio, the os package,

00:57.360 --> 00:59.730
typing for Literal, a Pydantic BaseModel,

00:59.730 --> 01:02.100
and an AsyncOpenAI client.

01:02.100 --> 01:04.800
Then you'll just need to update your OpenAI key here.

01:04.800 --> 01:06.630
And then we also have the client

01:06.630 --> 01:07.800
and the model that we'll be using.

01:07.800 --> 01:10.290
So I'm just gonna run all these imports.

01:10.290 --> 01:11.820
Now, the first thing that we're gonna do is

01:11.820 --> 01:16.110
we're gonna have an LLM analyze the decision

01:16.110 --> 01:17.910
of what we're currently doing

01:17.910 --> 01:20.370
and help decide our future direction.

01:20.370 --> 01:23.010
So think of a business coach or an executive coach.

01:23.010 --> 01:24.810
The way that we're gonna do that is we're gonna have

01:24.810 --> 01:27.360
a function called analyze_decision,

01:27.360 --> 01:31.020
and then we're gonna pass in a role, which is a string,

01:31.020 --> 01:33.000
and we're also gonna have the context,

01:33.000 --> 01:34.410
which will be a string.

01:34.410 --> 01:36.663
And then what this will return is a string.

01:37.830 --> 01:40.200
We're also gonna have an aggregator function.

01:40.200 --> 01:45.200
We're going to call this aggregate to a final answer,

01:45.540 --> 01:49.200
which is going to take a role as a string, a context as a string,

01:49.200 --> 01:50.940
and a list of answers,

01:50.940 --> 01:52.530
which were generated earlier,

01:52.530 --> 01:54.390
and it's going to return a string.

01:54.390 --> 01:56.940
Now we're also gonna have an entry point into this,

01:56.940 --> 01:59.547
which will be an async def main function,

01:59.547 --> 02:02.340
and we're just gonna put pass there for now.

02:02.340 --> 02:04.590
So let's work on analyze_decision.

02:04.590 --> 02:06.570
So we're gonna have a prompt,

02:06.570 --> 02:08.520
and our prompt is gonna be an f-string.

02:09.930 --> 02:12.330
And inside of our prompt, we're gonna say, you are a role

02:12.330 --> 02:13.470
who's analyzing the decision.

02:13.470 --> 02:15.600
The context is the context.

02:15.600 --> 02:19.500
We're gonna say you are analyzing the potential

02:19.500 --> 02:24.500
for a user and helping them like a career coach.

02:26.880 --> 02:29.020
Then we're gonna set up our chat response

02:31.650 --> 02:32.640
and we're going to use

02:32.640 --> 02:37.640
await client.beta.chat.completions.parse.

02:39.870 --> 02:42.900
We're gonna pass in the model and the messages,

02:42.900 --> 02:45.180
and then we're gonna return the content.
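
NOTE
Editor's sketch of the analysis step described above, not the code shown on screen.
It assumes the prompt wording is roughly what was dictated, and it adds a hypothetical
call_llm parameter (not in the video, which uses a global AsyncOpenAI client) so the
example runs offline. echo_llm is a stand-in for the real model call.
```python
import asyncio
from typing import Awaitable, Callable
# Hypothetical alias: any async function that maps a prompt to a reply string.
LLMCall = Callable[[str], Awaitable[str]]
def build_analysis_prompt(role: str, context: str) -> str:
    """Build the analysis prompt; the wording approximates the narration."""
    return (
        f"You are a {role} who is analyzing a decision.\n"
        f"The context is: {context}\n"
        "You are analyzing the potential for a user and helping them like a career coach."
    )
async def analyze_decision(role: str, context: str, call_llm: LLMCall) -> str:
    """Send the prompt to the injected LLM callable and return its text."""
    return await call_llm(build_analysis_prompt(role, context))
async def echo_llm(prompt: str) -> str:
    """Offline stand-in for a real model call (assumption for this sketch)."""
    await asyncio.sleep(0)
    return f"[stub reply to {len(prompt)} chars]"
```
In the real notebook, call_llm would wrap the client call and return the message content.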

02:45.180 --> 02:47.820
Now for the aggregation, we're gonna do something similar,

02:47.820 --> 02:49.830
but we now have the answers.

02:49.830 --> 02:51.690
So we'll say this is the context,

02:51.690 --> 02:55.120
the answers are, and then we'll say

02:57.330 --> 02:59.493
in sum, or we'll say here,

03:00.660 --> 03:04.770
your job is to aggregate the answers

03:04.770 --> 03:07.083
and to provide a final answer.
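
NOTE
Editor's sketch of the aggregation prompt described above. The function name
build_aggregation_prompt and the numbered-list layout are assumptions made for
illustration; only the role, context, and answers inputs come from the narration.
```python
from typing import List
def build_aggregation_prompt(role: str, context: str, answers: List[str]) -> str:
    """Combine the earlier answers into a single prompt for the aggregator."""
    # Number the answers so the aggregator can tell them apart.
    numbered = "\n".join(f"{i}. {a}" for i, a in enumerate(answers, start=1))
    return (
        f"You are a {role}.\n"
        f"This is the context: {context}\n"
        f"The answers are:\n{numbered}\n"
        "Your job is to aggregate the answers and to provide a final answer."
    )
```
The resulting string would be sent to the model the same way as the analysis prompt.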

03:08.880 --> 03:11.580
And then what we need to do is use this prompt.

03:11.580 --> 03:13.470
So we're then gonna use that prompt.

03:13.470 --> 03:17.220
After that, we are going to have some code in main,

03:17.220 --> 03:19.170
which is gonna provide some context.

03:19.170 --> 03:21.300
So they're looking for a job.

03:21.300 --> 03:23.100
The role is career coach,

03:23.100 --> 03:25.320
and we're gonna add some additional context here.

03:25.320 --> 03:30.320
So for example, like they currently have Next.js experience

03:30.540 --> 03:35.540
for two years and SQL experience for three years.

03:36.540 --> 03:40.140
They are thinking about whether to apply

03:40.140 --> 03:43.950
to a SQL role or a Next.js role.

03:43.950 --> 03:46.950
And then we also then need to specifically run

03:46.950 --> 03:48.030
our main function.

03:48.030 --> 03:50.250
So rather than just running it like this,

03:50.250 --> 03:54.543
we are going to use asyncio.run(main()).

03:55.920 --> 03:59.460
And then what's happening here is we have a context,

03:59.460 --> 04:02.850
we have a role, and then we are using asyncio.gather

04:02.850 --> 04:07.440
to basically run three LLM calls in parallel.

04:07.440 --> 04:09.720
And then once all of those come back,

04:09.720 --> 04:12.360
we are then aggregating all of those answers

04:12.360 --> 04:15.060
into a final answer and printing that out.
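
NOTE
Editor's sketch of the fan-out and aggregate flow described above, using stubs so it
runs without an API key. fake_analyze, fake_aggregate, run_parallel, and timed_run are
hypothetical names, and the 0.2 second sleep only simulates network latency.
```python
import asyncio
import time
from typing import List, Tuple
async def fake_analyze(role: str, context: str, delay: float = 0.2) -> str:
    """Stub for one LLM call; the sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return f"{role}: advice on {context}"
async def fake_aggregate(answers: List[str]) -> str:
    """Stub aggregator; a real one would prompt another LLM with the answers."""
    await asyncio.sleep(0)
    return " | ".join(answers)
async def run_parallel(role: str, context: str, n_calls: int = 3) -> str:
    # gather runs the calls concurrently and returns results in call order.
    answers = await asyncio.gather(*(fake_analyze(role, context) for _ in range(n_calls)))
    return await fake_aggregate(list(answers))
def timed_run() -> Tuple[str, float]:
    """Run the pipeline once and report the final answer and elapsed seconds."""
    start = time.perf_counter()
    final = asyncio.run(run_parallel("career coach", "SQL vs Next.js"))
    return final, time.perf_counter() - start
```
Because the three stub calls overlap, the total wait is close to one delay rather than three.
Inside a notebook, asyncio.run needs nest_asyncio.apply() first, as mentioned earlier.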

04:15.060 --> 04:17.760
So the idea here is that you have a series of steps

04:17.760 --> 04:21.000
that are being run in parallel, not sequentially,

04:21.000 --> 04:24.270
and then you take the outputs of all of those

04:24.270 --> 04:27.870
and you aggregate that with another LLM or another function.

04:27.870 --> 04:30.840
And you can see here it's decided some information here

04:30.840 --> 04:35.010
and is giving us an aggregated and synthesized response.

04:35.010 --> 04:37.770
So the idea here is that you're really trying to do

04:37.770 --> 04:40.290
lots of separate tasks in parallel

04:40.290 --> 04:43.410
and then synthesize those into a final output

04:43.410 --> 04:44.270
with another LLM.

04:44.270 --> 04:46.620
In the next video, we'll have a look at how we can use

04:46.620 --> 04:48.900
LLMs to be orchestration workers

04:48.900 --> 04:51.720
and to dynamically create tasks at runtime,

04:51.720 --> 04:53.910
which will then be handled and executed.

04:53.910 --> 04:55.660
Cool. I'll see you in the next one.