WEBVTT

00:00.510 --> 00:03.090
-: Hi, let's learn about parallelization.

00:03.090 --> 00:05.850
Let's install OpenAI.

00:05.850 --> 00:08.640
Just gonna get the OpenAI client.

00:08.640 --> 00:10.350
That's the main thing we need.

00:10.350 --> 00:14.490
So let me just paste this in and we're gonna run this.

00:14.490 --> 00:16.710
It's gonna ask you for the secret key

00:16.710 --> 00:21.710
and you're gonna go into your account into API keys

00:22.530 --> 00:25.680
and you're gonna get an API key and paste in here.

00:25.680 --> 00:30.680
So if you don't know where that is, then here is the link,

00:30.840 --> 00:33.390
platform.openai, api-keys.

00:33.390 --> 00:36.450
All right, so now we have the secret key locally.

00:36.450 --> 00:39.000
Now we can run our...

00:39.000 --> 00:44.000
So just going to create the clients.

00:45.210 --> 00:46.620
We're gonna create two clients.

00:46.620 --> 00:48.840
One is the client and one is the async_client,

00:48.840 --> 00:50.640
and I'll explain what that means.

00:50.640 --> 00:52.920
But broadly speaking, what we're gonna show you

00:52.920 --> 00:56.730
is how to do requests a normal way to OpenAI

00:56.730 --> 00:59.070
and how to do it asynchronously.

00:59.070 --> 01:02.100
Asynchronously just means it can run in parallel.

01:02.100 --> 01:06.021
That means you could run one, two, three, maybe 100

01:06.021 --> 01:09.660
or 1,000 API calls all at the same time.

01:09.660 --> 01:11.970
And it's just gonna be a lot faster

01:11.970 --> 01:14.850
than if you run them one after another.

01:14.850 --> 01:16.380
It might not be obvious to you

01:16.380 --> 01:18.060
if you didn't study computer science

01:18.060 --> 01:19.050
why that might be the case.

01:19.050 --> 01:21.170
But I think about it like a checkout line

01:21.170 --> 01:22.560
in the supermarket.

01:22.560 --> 01:24.300
If you just have one teller,

01:24.300 --> 01:26.310
one person checking the groceries

01:26.310 --> 01:28.110
before you go out the door,

01:28.110 --> 01:29.820
like, you know, if you have one person

01:29.820 --> 01:31.230
you can pay for the groceries,

01:31.230 --> 01:33.330
then every single person in the store

01:33.330 --> 01:35.940
has to wait up in a line for that person to finish,

01:35.940 --> 01:38.190
and they're processed one by one.

01:38.190 --> 01:39.870
If you hire more people in the checkout,

01:39.870 --> 01:42.240
and if you have 10 of those people

01:42.240 --> 01:43.920
where you can pay for your groceries,

01:43.920 --> 01:46.230
then they can process 10 people at a time.

01:46.230 --> 01:48.480
So even if one person's being slow,

01:48.480 --> 01:50.880
like, all the rest of the people don't have to wait.

01:50.880 --> 01:55.197
And that's essentially what asynchronous API calls is for.

01:55.197 --> 01:56.280
And what we're gonna do,

01:56.280 --> 01:58.950
I'm just gonna paste in a prompt here

01:58.950 --> 02:02.580
from a project that I run.

02:02.580 --> 02:05.910
So I run a company called Rally, askrally.com,

02:05.910 --> 02:08.460
and we did this experiment with media diets.

02:08.460 --> 02:10.350
And the way this experiment worked

02:10.350 --> 02:12.000
is we tried to predict the US election

02:12.000 --> 02:13.170
to see who voted for Trump,

02:13.170 --> 02:16.080
who voted for Kamala Harris using personas.

02:16.080 --> 02:18.780
So here we have Sophia Martinez,

02:18.780 --> 02:22.080
she's a 40-year-old emergency room nurse from San Francisco.

02:22.080 --> 02:23.940
I want to see who she would vote for,

02:23.940 --> 02:26.070
Kamala Harris or Donald Trump.

02:26.070 --> 02:28.950
And so that's the simple prompt.

02:28.950 --> 02:30.090
You know, our actual system prompt

02:30.090 --> 02:32.190
is a little bit more complexness, (chuckles)

02:32.190 --> 02:35.910
but it's like a basically decent version.

02:35.910 --> 02:39.580
So what we're gonna do is we're going to get the response

02:40.890 --> 02:42.970
and we're gonna do chat.completions.create.

02:42.970 --> 02:47.070
We're gonna pass in the model, which is GPT-4.1

02:47.070 --> 02:50.250
and we need to pass in that system prompt.

02:50.250 --> 02:53.060
So we're gonna put in...

02:54.360 --> 02:58.550
Oh sorry, I already have the content, so...

02:59.670 --> 03:01.080
Yeah, yeah, there we go.

03:01.080 --> 03:06.080
It's gonna be role system and then role user.

03:07.620 --> 03:09.750
The system prompt is going here

03:09.750 --> 03:13.800
'cause that's telling it how to act as the persona.

03:13.800 --> 03:15.690
And then you have the user prompt,

03:15.690 --> 03:17.970
which is the question that we're asking.

03:17.970 --> 03:20.790
The prompt format we're gonna use here...

03:20.790 --> 03:22.840
Let's just get that back as JSON

03:25.680 --> 03:29.310
and then gonna print the responses.

03:29.310 --> 03:30.720
We also want to time it as well.

03:30.720 --> 03:32.500
Let me just do the start time

03:36.930 --> 03:39.330
and that should all run.

03:39.330 --> 03:40.440
So we're gonna start the timer.

03:40.440 --> 03:41.760
We're gonna get this response

03:41.760 --> 03:44.370
and then we're gonna print out at the end.

03:44.370 --> 03:45.660
How long it took?

03:45.660 --> 03:48.000
Okay, so we've got an issue here.

03:48.000 --> 03:51.090
Ah yeah, we're passing on the wrong structure.

03:51.090 --> 03:52.560
Okay, here we go.

03:52.560 --> 03:54.847
So they're thinking about the vote.

03:54.847 --> 03:57.510
"Oh boy, it's not even a hard decision for me.

03:57.510 --> 04:00.090
Donald Trump brings back all those chaotic nights

04:00.090 --> 04:02.130
at the hospital mask fights, misinformation,

04:02.130 --> 04:03.240
the tension between coworkers."

04:03.240 --> 04:05.940
So not a big Donald Trump fan (chuckles)

04:05.940 --> 04:07.140
as she votes for Harris, right?

04:07.140 --> 04:09.900
So that vote came in two seconds,

04:09.900 --> 04:11.880
which is really great and it's quite fast.

04:11.880 --> 04:14.010
But if you needed to poll maybe 100

04:14.010 --> 04:16.680
or 1,000 of these personas,

04:16.680 --> 04:18.120
it's gonna take a long time.

04:18.120 --> 04:21.660
One of the things we've done with Rally is we've set it up

04:21.660 --> 04:25.110
so that it pulls lots of different personas all at once

04:25.110 --> 04:27.030
and I'll just kinda show you how that works.

04:27.030 --> 04:31.059
So what we're gonna do is gonna run_multiple_queries.

04:31.059 --> 04:32.880
We're gonna count the total time taken

04:32.880 --> 04:35.730
and we're gonna tally the votes, okay?

04:35.730 --> 04:37.770
For i in range of number of runs,

04:37.770 --> 04:40.203
so here, we wanna run 10 times.

04:41.220 --> 04:44.400
We're going to measure the start time.

04:44.400 --> 04:46.023
We're gonna get the response.

04:51.840 --> 04:53.030
And then...

04:54.990 --> 04:57.753
Yeah, we use the same json_object stop,

05:00.540 --> 05:02.760
so that makes it easier to tally.

05:02.760 --> 05:05.320
I'm gonna count the total time taken

05:06.210 --> 05:08.360
and then we also want to pass the response,

05:09.900 --> 05:12.270
so it's gonna count all the votes.

05:12.270 --> 05:15.243
All right, and let's figure out the average time,

05:16.080 --> 05:18.330
the results after a number of runs,

05:18.330 --> 05:21.870
so we'll count the votes and then the total time taken.

05:21.870 --> 05:23.540
Okay, so that's...

05:24.630 --> 05:28.113
It seems well, so let's actually run the function.

05:29.520 --> 05:31.680
And it's gonna run this 10 times.

05:31.680 --> 05:33.060
It's gonna tally the votes.

05:33.060 --> 05:34.950
So you can see it's taking more than two seconds now

05:34.950 --> 05:37.020
'cause we're running it 10 times.

05:37.020 --> 05:39.210
This is just to really show you how slow it can be

05:39.210 --> 05:40.770
to run things in sequence.

05:40.770 --> 05:42.993
And we're gonna show you async in a second.

05:47.280 --> 05:49.230
So if it takes two seconds each,

05:49.230 --> 05:51.873
it should take just over 20 seconds potentially.

05:53.730 --> 05:56.070
But some API calls take longer than others.

05:56.070 --> 05:58.890
So you can see here total time 26 seconds,

05:58.890 --> 06:01.230
took 2.6 seconds on average

06:01.230 --> 06:03.840
and it voted for Kamala nine times.

06:03.840 --> 06:05.520
But if you're trying different personas,

06:05.520 --> 06:07.080
you might get some votes for Trump.

06:07.080 --> 06:08.520
It's also really useful just to see

06:08.520 --> 06:12.000
how often this persona votes for Kamala or votes for Trump

06:12.000 --> 06:14.640
because then you can average out the votes

06:14.640 --> 06:17.943
and see if sometimes they vote for another candidate.

06:19.380 --> 06:22.290
All right, so now how do we do async?

06:22.290 --> 06:25.560
The way we do it is a little bit different.

06:25.560 --> 06:28.950
We say async def instead of just def.

06:28.950 --> 06:31.830
To create the formula, we have to make an async formula.

06:31.830 --> 06:34.620
So we're just gonna call this make a single query,

06:34.620 --> 06:37.680
measure the time, and then we're gonna get the response.

06:37.680 --> 06:40.770
And it's exactly the same and always

06:40.770 --> 06:43.980
except for just one small thing,

06:43.980 --> 06:47.340
which is we have to await the response

06:47.340 --> 06:49.740
and we have to use the async_client.

06:49.740 --> 06:51.540
So that if we try and use the normal client,

06:51.540 --> 06:52.830
that's not gonna work.

06:52.830 --> 06:56.580
But in this case, you just need async_client

06:56.580 --> 06:58.020
and you just need to await.

06:58.020 --> 07:00.420
And if you use await within a function,

07:00.420 --> 07:02.760
but you have this async at the beginning,

07:02.760 --> 07:04.680
then that's gonna give you an error as well.

07:04.680 --> 07:07.080
But that's really only major difference

07:07.080 --> 07:08.703
for creating the function.

07:10.080 --> 07:12.720
And then we're gonna return the vote and the time taken.

07:12.720 --> 07:15.000
All right, so this is an async function

07:15.000 --> 07:18.030
and that means you can run this multiple times in parallel.

07:18.030 --> 07:19.140
So let's do that.

07:19.140 --> 07:22.143
So let's just say run_multiple_queries.

07:24.479 --> 07:26.070
And then the way that you do async

07:26.070 --> 07:28.860
to make it run concurrently

07:28.860 --> 07:30.570
is that you create the tasks.

07:30.570 --> 07:35.460
So here we're making a single query for the range, right?

07:35.460 --> 07:38.310
So this is gonna make 10 tasks.

07:38.310 --> 07:41.433
And then you gather at the end with asyncio.

07:42.450 --> 07:46.980
We just do a wait asyncio.gather and then *tasks.

07:46.980 --> 07:48.780
And what that does is it spreads out the tasks,

07:48.780 --> 07:52.170
it will run all 10 of them at the same time, so that's...

07:52.170 --> 07:54.210
You can think of this as like setting the clock.

07:54.210 --> 07:57.060
It's gonna start running and then it's gonna finish

07:57.060 --> 08:00.600
and you'll get the results when all 10 have completed

08:00.600 --> 08:03.780
or if one of them failed, whatever it is.

08:03.780 --> 08:05.460
Cool, we're gonna count our votes.

08:05.460 --> 08:07.890
We're gonna count the times as well

08:07.890 --> 08:09.990
and then we're gonna iterate through them.

08:11.370 --> 08:12.930
But yeah, it looks like it's predicting

08:12.930 --> 08:14.760
the right things here.

08:14.760 --> 08:15.593
Just looking at the code.

08:15.593 --> 08:17.607
So it's gonna tell you the votes for us.

08:17.607 --> 08:19.380
It's gonna tell you the individual times.

08:19.380 --> 08:21.990
It's gonna give us the average individual time

08:21.990 --> 08:24.940
and then let's just like print out some stuff here as well.

08:26.813 --> 08:29.370
All right, so now we're gonna run this async.

08:29.370 --> 08:30.570
Let's see if this works.

08:36.300 --> 08:37.740
There we go. It was much faster.

08:37.740 --> 08:38.700
It took four seconds.

08:38.700 --> 08:42.000
What this means is it's on average gonna...

08:42.000 --> 08:45.720
Like the whole 10 queries is gonna take

08:45.720 --> 08:47.640
just as long as like the longest query.

08:47.640 --> 08:49.380
'Cause they all start at the same time,

08:49.380 --> 08:50.640
like starting a race, right?

08:50.640 --> 08:52.260
They all start on the same startling line

08:52.260 --> 08:54.720
and then when the last one crosses the finish line,

08:54.720 --> 08:56.670
that's when we finish.

08:56.670 --> 08:59.670
So sometimes the responses take a long time.

08:59.670 --> 09:02.220
Like, on average it was 2.6 seconds this time

09:02.220 --> 09:04.080
or on average it was 3.6 seconds.

09:04.080 --> 09:07.500
But in this case, the longest one finished at 4.3.

09:07.500 --> 09:10.530
So it was much, much faster than what we had before.

09:10.530 --> 09:12.660
All right, where it took 26 seconds

09:12.660 --> 09:14.310
to do the same number of queries.

09:15.210 --> 09:17.850
There is a limit to how many you can run per minute.

09:17.850 --> 09:18.780
It changes all the time,

09:18.780 --> 09:20.850
so I'd check that in the documentation.

09:20.850 --> 09:23.190
You could run hundreds or thousands of these.

09:23.190 --> 09:26.040
You can see Kamala got 10 votes, we've got the same result,

09:26.040 --> 09:27.870
but we just ran them in parallel.

09:27.870 --> 09:30.030
Really useful concept to understand.

09:30.030 --> 09:32.670
Definitely dig a bit deeper into that if you're interested.

09:32.670 --> 09:35.190
But we have the ability now

09:35.190 --> 09:38.760
to run lots and lots of things all at once

09:38.760 --> 09:41.730
and that is very helpful for specific cases,

09:41.730 --> 09:43.440
like what we have with Rally.

09:43.440 --> 09:45.240
For example, I just ran a project

09:45.240 --> 09:46.800
for, like, a custom project for a client

09:46.800 --> 09:51.800
where we were running a survey for 5,000 AIs.

09:52.170 --> 09:54.900
So 5,000 AI personas like this.

09:54.900 --> 09:58.020
And you can imagine if we took two seconds each

09:58.020 --> 10:00.090
for 5,000 personas,

10:00.090 --> 10:01.620
that's gonna take us a couple of days.

10:01.620 --> 10:04.560
Whereas, you know, I could run it in 10 minutes or something

10:04.560 --> 10:06.240
because I was running it asynchronous

10:06.240 --> 10:09.450
and I was doing a set of personas at a time.

10:09.450 --> 10:11.520
I think I was running 100 at a time

10:11.520 --> 10:14.280
and then that only let me run the whole thing

10:14.280 --> 10:17.160
in just a few batches, which was really great.

10:17.160 --> 10:19.680
All right, so hey, hopefully you find this useful

10:19.680 --> 10:20.880
and you think about this

10:20.880 --> 10:24.003
when you're running a long query like that.
