WEBVTT

00:00.000 --> 00:01.740
-: I'm gonna talk about SAMMO,

00:01.740 --> 00:04.380
which is a new library from Microsoft.

00:04.380 --> 00:06.870
It's quite useful for writing prompts

00:06.870 --> 00:08.397
and also testing and optimizing.

00:08.397 --> 00:10.860
And I'll walk you through this.

00:10.860 --> 00:13.890
The way that you call a model with SAMMO

00:13.890 --> 00:18.123
is you need to import runners OpenAIChat,

00:19.200 --> 00:20.640
this is for OpenAI.

00:20.640 --> 00:22.770
And then you just need the generate text component

00:22.770 --> 00:24.600
and the output component.

00:24.600 --> 00:28.020
And the way it works is you put in your model ID,

00:28.020 --> 00:29.700
your OpenAI API key,

00:29.700 --> 00:31.980
which here it's getting from the environment,

00:31.980 --> 00:35.070
and then that you also set up a cache file.

00:35.070 --> 00:36.600
So you don't need to worry too much about this.

00:36.600 --> 00:39.180
This basically just means every time you run,

00:39.180 --> 00:40.410
it's gonna look in that cache file.

00:40.410 --> 00:42.750
And if you've already run the exact same prompt before,

00:42.750 --> 00:44.640
then it's gonna get that result.

00:44.640 --> 00:47.160
And you can set a timeout so it does things like timeouts

00:47.160 --> 00:50.430
and retries automatically, which is quite nice.

00:50.430 --> 00:53.880
And I'm just gonna tell it's talk bug sheets

00:53.880 --> 00:57.450
and then tell it in my job.

00:57.450 --> 00:58.530
And here we go.

00:58.530 --> 01:01.140
So we just put the output, we put in that program

01:01.140 --> 01:03.570
that we made, which was with the generate text.

01:03.570 --> 01:06.300
This is our prompt and then this is our system prompt.

01:06.300 --> 01:08.400
And then we're just passing the winner,

01:08.400 --> 01:11.760
and we get a nicely formatted output here.

01:11.760 --> 01:13.800
Don't worry by the way, we don't have any inputs

01:13.800 --> 01:14.633
in this case.

01:14.633 --> 01:19.633
If we added a variable in here, then see something coming.

01:19.830 --> 01:21.480
Okay, cool.

01:21.480 --> 01:24.000
All right, so you're gonna see that in a second.

01:24.000 --> 01:27.330
The other thing you can do is you can pass in history.

01:27.330 --> 01:29.370
So if this is like a really easy way

01:29.370 --> 01:31.530
of handling the message stream,

01:31.530 --> 01:34.200
which is sometimes a bit of a pain.

01:34.200 --> 01:38.400
So here we're saying write a four line poem about my job,

01:38.400 --> 01:41.883
and we're passing the first program at the history.

01:43.260 --> 01:47.913
And then just gonna run that.

01:50.190 --> 01:51.023
And here we go.

01:51.023 --> 01:52.920
It's still using the Shakespeare stuff,

01:52.920 --> 01:54.240
which is really helpful.

01:54.240 --> 01:56.490
So it's talking about prompt engineering

01:56.490 --> 01:59.580
because I've told it my job in the previous message,

01:59.580 --> 02:01.440
but it's also still talking like Shakespeare,

02:01.440 --> 02:02.490
like the system prompt.

02:02.490 --> 02:03.940
I didn't have to add that in.

02:05.250 --> 02:06.840
Cool, so that's how that works.

02:06.840 --> 02:09.120
One thing that it does, which is quite fun

02:09.120 --> 02:11.130
and quite useful I think for debugging is

02:11.130 --> 02:16.130
you can take the outputs, and then you can do a plot trace.

02:16.800 --> 02:19.280
So if we just a...

02:29.130 --> 02:30.260
Just need to...

02:33.287 --> 02:36.630
Assign it to a variable, and then we can see,

02:36.630 --> 02:39.937
here we go, you can see, the first text we put in here,

02:39.937 --> 02:43.620
"Hello, my name is," and then the response.

02:43.620 --> 02:46.500
And then we pass that into generate text,

02:46.500 --> 02:48.988
and we have the other prompt here,

02:48.988 --> 02:50.460
and we get the final result.

02:50.460 --> 02:53.400
Just really helpful, especially if you have nested programs.

02:53.400 --> 02:56.880
Just very helpful to do this, in terms of the plots,

02:56.880 --> 03:00.460
and that's gonna make it a lot easier to debug more complex.

03:02.607 --> 03:06.690
And the other thing which I find quite useful is being able

03:06.690 --> 03:09.210
to run it in loops and kind of join stuff together.

03:09.210 --> 03:11.883
One thing you can do is use this union,

03:15.420 --> 03:20.420
port from components, port union, and then...

03:27.265 --> 03:29.130
we can see N equals five.

03:29.130 --> 03:31.380
We're gonna run through this five times

03:31.380 --> 03:35.523
and say fruits equals, generate text,

03:36.900 --> 03:38.940
what is the name of a fruit?

03:38.940 --> 03:43.923
And just going to look through that.

03:44.940 --> 03:49.940
We'll say randomness is 0.9, say seed equals I.

03:51.390 --> 03:53.580
So what this is gonna do is gonna give us a different

03:53.580 --> 03:56.250
response every time because we're setting a new seed

03:56.250 --> 04:00.967
each time and bringing everything together with this union,

04:04.140 --> 04:05.820
we could also assign this to a variable

04:05.820 --> 04:09.335
which might make it a little bit easier to understand.

04:09.335 --> 04:11.790
What we're doing is we're getting all the fruits

04:11.790 --> 04:15.540
and we're spinning up a new API call for each one

04:15.540 --> 04:17.940
and then the union brings them all together,

04:17.940 --> 04:21.120
so it makes more sense if we plot the program

04:21.120 --> 04:25.110
'cause now we've assigned the program to a variable,

04:25.110 --> 04:26.310
we can plot that program.

04:26.310 --> 04:30.360
So this is like the plot of the prompts

04:30.360 --> 04:32.125
that we looked at before.

04:32.125 --> 04:32.958
This actually shows like the inputs

04:32.958 --> 04:35.130
and outputs and the responses.

04:35.130 --> 04:37.800
But in this case, this is plotting the program.

04:37.800 --> 04:40.170
And what this library is trying

04:40.170 --> 04:43.140
to do really is just make prompts into programs.

04:43.140 --> 04:44.760
So there's inputs, there's outputs,

04:44.760 --> 04:47.550
so even if you haven't run it yet, you can see

04:47.550 --> 04:49.320
how are these different things joined together.

04:49.320 --> 04:51.900
And here we have four different generated texts.

04:51.900 --> 04:54.540
This has a seed of four, this has a seed of three.

04:54.540 --> 04:58.290
This has a seed of two, seed of one, seed of zero.

04:58.290 --> 05:00.120
And then we're gonna join all them together.

05:00.120 --> 05:03.993
And what we get is this nicely formatted output at the end.

05:05.010 --> 05:06.813
You can see these are the responses.

05:09.450 --> 05:11.460
Sometimes it just says apple, but then sometimes

05:11.460 --> 05:14.613
it gives you a bit of a longer response.

05:16.170 --> 05:17.003
Cool, okay.

05:17.003 --> 05:19.440
Now where it starts to get a little bit more complicated,

05:19.440 --> 05:23.790
but also more useful is in the stuff that it's done

05:23.790 --> 05:27.900
to make it easier to run through a bunch of responses

05:27.900 --> 05:30.090
and bring 'em all together into a batch.

05:30.090 --> 05:32.610
The way this works, I'm just gonna import,

05:32.610 --> 05:35.033
actually I'll copy and paste this first, quicker.

05:37.333 --> 05:39.690
Just gonna import for each as well as a template.

05:39.690 --> 05:41.610
And then extracting Regex.

05:41.610 --> 05:43.920
So these are helpful little programs,

05:43.920 --> 05:48.824
and what we're gonna do is, in this case,

05:48.824 --> 05:51.060
we're gonna make a bunch of OpenAI models

05:51.060 --> 05:52.170
as just as an example.

05:52.170 --> 05:57.170
So extract Regex, generate the text, and say generate a list

06:03.300 --> 06:04.950
Five models.

06:04.950 --> 06:09.950
I map each model in all text.

06:11.430 --> 06:12.263
And.

06:18.000 --> 06:22.443
And what we want to do is run, say, model blurbs.

06:25.680 --> 06:29.463
We're gonna create some output, we're gonna say ForEach.

06:30.420 --> 06:33.660
So it's gonna iterate through the different models.

06:33.660 --> 06:37.083
I'm gonna say each model.

06:42.690 --> 06:45.303
And then we're gonna say generate why is...

06:54.180 --> 06:55.013
good model?

06:57.390 --> 06:59.820
Cool, what have we done here?

06:59.820 --> 07:03.930
We are creating a, basically like a program

07:03.930 --> 07:08.430
that iterates through all of the models

07:08.430 --> 07:10.230
that we generate in this first prompt.

07:10.230 --> 07:14.550
Previously we were iterating through using the same prompt,

07:14.550 --> 07:16.500
here, what is the name of the fruit?

07:16.500 --> 07:19.110
Or in this case we could change this.

07:19.110 --> 07:24.110
We could say generate a name of the model from OpenAI.

07:25.260 --> 07:28.140
So you could change this to, you want to see,

07:28.140 --> 07:29.370
bring everything together.

07:29.370 --> 07:32.370
Now we didn't use any variables in this print,

07:32.370 --> 07:33.930
we just changed the seed.

07:33.930 --> 07:38.340
So we had to do a loop, but with SAMMO's ability

07:38.340 --> 07:42.540
to run things asynchronously, we didn't have to do a loop.

07:42.540 --> 07:45.723
We could run all of these things at once,

07:46.653 --> 07:48.000
and input the variables here.

07:48.000 --> 07:50.550
This is the inputs, it's just the model.

07:50.550 --> 07:52.050
And it's gonna bring in all

07:52.050 --> 07:53.640
these different models into them.

07:53.640 --> 07:56.140
And that's what we're doing with the ForEach.

07:56.140 --> 07:57.870
What this is gonna do is, it's gonna generate

07:57.870 --> 08:00.990
five different models, we're going to extract it

08:00.990 --> 08:03.630
by using this ExtractRegex, that's gonna pull all

08:03.630 --> 08:05.400
the different model tags together.

08:05.400 --> 08:07.530
And then we're gonna generate

08:07.530 --> 08:10.350
a description of that model for each.

08:10.350 --> 08:13.833
So it's a little bit of a different way of doing things.

08:15.870 --> 08:20.870
Error here, I have a problem with my Regex,

08:21.930 --> 08:23.180
just figure this one out.

08:26.010 --> 08:28.170
Oh I didn't actually give it the Regex.

08:28.170 --> 08:29.250
That would help.

08:29.250 --> 08:34.250
So if we need to pull different types

08:34.800 --> 08:39.060
of output from from the prompt description,

08:39.060 --> 08:42.060
and then we need to kind of tell it what the format is.

08:42.060 --> 08:44.430
And in this case we told it to wrap it in model tags.

08:44.430 --> 08:46.353
Actually we could say wrap each model,

08:49.408 --> 08:54.090
and/or, that way we know that each model

08:54.090 --> 08:56.280
that we get back is be wrapped to these tags,

08:56.280 --> 08:58.260
and therefore we can use the Regex to look for them.

08:58.260 --> 09:00.990
So that's gonna extract those five out.

09:00.990 --> 09:04.410
So hopefully you understand we're gonna see

09:04.410 --> 09:08.490
how that works now because we can do the trace.

09:08.490 --> 09:10.860
This is potentially complicated otherwise,

09:10.860 --> 09:13.380
but we can see it all happening.

09:13.380 --> 09:17.793
We have, in this case, this prompt,

09:18.690 --> 09:20.850
generate a list of five models.

09:20.850 --> 09:24.540
Now I wrapped them in the tags, and this is what it output.

09:24.540 --> 09:28.560
It said we've got GPT-3, GPT-4, DALL-E, Codex, and CLIP.

09:28.560 --> 09:32.070
Then we've extracted the Regex says given as this output,

09:32.070 --> 09:35.130
this list, 'cause these things were wrapped

09:35.130 --> 09:38.280
in the model tags and then it worked.

09:38.280 --> 09:39.840
And then we went through the ForEach,

09:39.840 --> 09:41.910
and for the ForEachs we had,

09:41.910 --> 09:44.190
so why is this model a good model?

09:44.190 --> 09:48.270
And then this is the prompt pulling in this format here.

09:48.270 --> 09:52.360
And then what that's gone through is

09:54.210 --> 09:57.750
hold everything together and then for every prompt

09:57.750 --> 10:01.743
we basically have the full response for each of them.

10:03.300 --> 10:04.833
So hopefully that makes sense.

10:18.300 --> 10:21.573
What we get in the end is, here we go, here's the app.

10:22.410 --> 10:25.200
GPT-3 excels due to its vast tracking data,

10:25.200 --> 10:26.910
advanced architecture, et cetera.

10:26.910 --> 10:30.570
DALL-E does this, CLIP does this.

10:30.570 --> 10:33.453
So everything is pulled at the end.

10:35.490 --> 10:37.170
Cool, so hopefully that makes sense.

10:37.170 --> 10:38.970
It depends on your preference in terms

10:38.970 --> 10:40.020
of how you'd like to do these things.

10:40.020 --> 10:43.200
But I think this is quite an interesting way to approach it

10:43.200 --> 10:47.130
where you're getting a full kind of visual understanding

10:47.130 --> 10:48.690
of how everything is coming together.

10:48.690 --> 10:50.850
And you can click into these and say,

10:50.850 --> 10:54.480
Hey look, GPT-4 is pulling in here,

10:54.480 --> 10:56.670
and then that's giving the description here,

10:56.670 --> 11:00.540
and then that's showing up here in the final thing.

11:00.540 --> 11:02.760
Like I find when you're do building these things

11:02.760 --> 11:05.820
in LangChain or DS-Pi, it's pretty hard

11:05.820 --> 11:10.470
to understand what's actually happening in these models

11:10.470 --> 11:12.180
or these kind of chains that you're bringing together.

11:12.180 --> 11:14.940
So even though this is still a little bit complicated

11:14.940 --> 11:17.010
to understand if you're not used to it,

11:17.010 --> 11:21.810
it is so much more useful once you get the hang of it,

11:21.810 --> 11:24.780
it's much more easy to, so part of the program

11:24.780 --> 11:27.060
like I just did here, you can see like

11:27.060 --> 11:28.680
how everything comes together,

11:28.680 --> 11:30.630
and the difference between plotting the program

11:30.630 --> 11:32.730
and plotting the prompt is that here

11:32.730 --> 11:34.830
we've actually generated the five models,

11:34.830 --> 11:36.540
whereas plotting the program,

11:36.540 --> 11:38.070
it's just the structure of the program.

11:38.070 --> 11:41.190
It's not, yeah, it hasn't actually generated the models yet,

11:41.190 --> 11:42.023
if that makes sense.

11:42.023 --> 11:43.860
It's just showing you what's gonna happen.

11:43.860 --> 11:45.360
You bring everything together.

11:47.002 --> 11:48.840
It's a little bit easier to understand.

11:48.840 --> 11:52.740
Okay, so let's make use of some of the paralyzation.

11:52.740 --> 11:55.110
The next thing we're gonna do is we're gonna bring

11:55.110 --> 12:00.110
in this data table.

12:03.240 --> 12:05.730
And what a data table lets you do is bring everything

12:05.730 --> 12:07.740
together and run it in parallel

12:07.740 --> 12:10.740
like I would do with Pandas data frame.

12:10.740 --> 12:12.330
This is quite often what I do,

12:12.330 --> 12:15.360
but we're gonna recreate the runner,

12:15.360 --> 12:19.570
except in this case we're gonna say rate set that

12:21.384 --> 12:25.503
as, so it's gonna run six things at a time,

12:27.180 --> 12:30.130
and just reiterate through some numbers.

12:34.499 --> 12:36.270
So the numbers you want to go through,

12:37.216 --> 12:38.287
through the numbers one six.

12:40.290 --> 12:45.060
And then let's create a variable for this output.

12:45.060 --> 12:50.060
So the output will be, we wanna generate text,

12:53.130 --> 12:58.130
and then template and then the template is,

12:59.340 --> 13:04.340
it's gonna be this, only the corresponding Greek letter

13:06.270 --> 13:07.413
in English.

13:10.410 --> 13:12.333
And then we're gonna give it the,

13:14.100 --> 13:16.200
sorry, it needs to be double curly braces.

13:17.340 --> 13:20.280
Okay, what are we doing here?

13:20.280 --> 13:22.430
Which is different from what we did before.

13:25.740 --> 13:29.820
So what we did before was we were relying

13:29.820 --> 13:33.720
on the output of that first prompt which gave us the models,

13:33.720 --> 13:35.180
and then we're stuffing those models.

13:35.180 --> 13:38.190
In this case we've created a list of numbers

13:38.190 --> 13:41.460
and put them through into our prompt template

13:41.460 --> 13:43.020
using the double curly braces.

13:43.020 --> 13:44.880
So it's just a little bit different

13:44.880 --> 13:46.830
in terms of the approach.

13:46.830 --> 13:50.670
We're gonna have a runner, and then we're gonna create

13:50.670 --> 13:53.253
a data table of those numbers.

13:54.300 --> 13:57.063
So this, hopefully it works.

13:57.930 --> 13:58.763
Yeah, here we go.

13:58.763 --> 14:02.280
So we can see that we get the inputs and the outputs,

14:02.280 --> 14:04.140
and it's pretty easy to understand.

14:04.140 --> 14:06.210
Now alpha, beta, gamma, delta, epsilon

14:06.210 --> 14:09.210
these are the Greek letters for each number.

14:09.210 --> 14:12.120
But you can see here where it says minibatches,

14:12.120 --> 14:14.670
it's basically running all five at the same time,

14:14.670 --> 14:15.630
asynchronously.

14:15.630 --> 14:18.810
So it's gonna be much faster if you're doing a lot of data.

14:18.810 --> 14:22.350
And I think a really powerful feature that SAMMO has.

14:22.350 --> 14:24.540
So that's like one of the major reasons

14:24.540 --> 14:26.310
why you would go through this extra effort

14:26.310 --> 14:29.400
to set it up in this way if you're not familiar,

14:29.400 --> 14:31.680
because I find that quite often I'm having

14:31.680 --> 14:34.410
to rewrite my code very early on,

14:34.410 --> 14:36.000
when I'm using the OpenAI API,

14:36.000 --> 14:37.260
then I'm like, oh, I have to go back

14:37.260 --> 14:39.900
and make this asynchronous 'cause it's super slow.

14:39.900 --> 14:42.270
But with SAMMO, it has that kind of out of the box,

14:42.270 --> 14:44.100
and you can set up a data table,

14:44.100 --> 14:46.560
and then you can just boom, smash through things,

14:46.560 --> 14:48.030
which is really nice.

14:48.030 --> 14:52.410
You can also create a data table from Pandas data frames.

14:52.410 --> 14:55.470
And then you have also an ability to optimize

14:55.470 --> 14:57.720
your AB test prompts with SAMMO as well,

14:57.720 --> 15:00.693
which we're gonna get into in the next sections.