WEBVTT

00:00.020 --> 00:02.180
Role prompting is one of those common techniques.

00:02.180 --> 00:04.070
It's really easy to do.

00:04.100 --> 00:09.980
What it is, is just asking the LLM to play the role of a specific celebrity or someone

00:10.160 --> 00:14.660
that is a domain expert, but you could also just ask it to play a specific job role.

00:14.660 --> 00:18.020
So: "You are a math teacher, answer this question."

00:18.350 --> 00:22.310
There is some evidence that it actually does improve the intelligence of the model slightly.

00:22.310 --> 00:28.100
Say you can get better math scores when you tell it to act like a math teacher, but the way that I

00:28.100 --> 00:31.950
tend to use it is really just to get it in the right style.

00:32.040 --> 00:38.070
Because some things are subjective about our responses, and some organizations want something in the

00:38.070 --> 00:45.330
style of Hemingway, whereas another organization might want something in the style of Enid Blyton or

00:45.330 --> 00:47.010
whatever your preferences are.

00:47.040 --> 00:53.070
There is no one good answer or right answer, but with role playing, you at least get to a good subjective

00:53.280 --> 00:54.270
preference.

00:54.510 --> 00:56.580
So let me show you an example.

00:57.060 --> 01:01.530
Now we have in here the system prompt, the role setting, and then the normal

01:01.530 --> 01:03.000
and the role prompt here.

01:03.270 --> 01:10.800
And just to show you how those things come together, for the normal prompt we're just passing in here:

01:10.800 --> 01:13.830
Brainstorm a list of product names for a shoe that fits any foot size.

01:14.010 --> 01:15.780
That's the task that we wanted to do.

01:16.050 --> 01:21.370
Whereas with the role prompt, we are telling it: you are Elon Musk, you're brainstorming

01:21.370 --> 01:24.070
new products, and we're giving it some examples.

01:24.280 --> 01:30.310
And this is what I think is necessary to make role prompting work these days, because previously you

01:30.310 --> 01:34.720
used to be able to just say you are Elon Musk or whatever, and it would do a decent job, or at least

01:34.720 --> 01:35.590
a better job.

01:35.980 --> 01:40.540
But what I'm finding now is that role prompting doesn't make that much of a difference, unless you give

01:40.540 --> 01:44.930
it some examples of what Elon Musk would name this product.

01:44.990 --> 01:46.790
So I made up some fake products here.

01:46.820 --> 01:49.580
So this is a refrigerator that dispenses beer.

01:49.580 --> 01:55.670
And then I just lazily came up with names like Beer Fridge X, like SpaceX, or Not a Beer Fridge, like Not a

01:55.670 --> 01:59.930
Flamethrower, which are products that Musk has invented or named.

01:59.930 --> 02:01.820
So that's really helpful.

02:01.820 --> 02:07.070
I think if you can even make some synthetic examples of what you want and put them in,

02:07.070 --> 02:13.160
then you can get it into the type of character that you want, as with this example

02:13.160 --> 02:13.370
here.
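
NOTE
Editor's sketch of the role prompt described above: a role statement, one
synthetic example, then the new task. This is a minimal illustration, not the
instructor's notebook. The function name build_role_prompt, the EXAMPLES list,
and the exact prompt wording are assumptions reconstructed from the audio, and
no model is called here.
```python
# Few-shot examples as (description, names) pairs. The names are a best reading
# of the spoken transcript, not a copy of the instructor's notebook.
EXAMPLES = [
    ("A refrigerator that dispenses beer", ["Beer Fridge X", "Not a Beer Fridge"]),
]
def build_role_prompt(description: str, examples=EXAMPLES) -> str:
    """Combine the role statement, the worked examples, and the new task."""
    lines = ["You are Elon Musk. You are brainstorming new products."]
    for example_description, names in examples:
        lines.append(f"Product description: {example_description}")
        lines.append(f"Product names: {', '.join(names)}")
    # End on an open "Product names:" slot so the model continues the pattern.
    lines.append(f"Product description: {description}")
    lines.append("Product names:")
    return "\n".join(lines)
print(build_role_prompt("A shoe that fits any foot size"))
```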

02:13.490 --> 02:21.650
So I'm gonna run this, and you'll see that if we look at the role-play answer, we're getting something

02:21.650 --> 02:26.420
much more like Elon Musk: Universal Fit Shoe, Completely Fit, all that.

02:26.900 --> 02:33.330
Whereas with the previous example, where we were just asking it without all of these things, we were

02:33.330 --> 02:37.080
just getting Adaptive Fit, Stride, Universal Fit, Flexi Fit.

02:37.080 --> 02:38.790
So I'm not getting Musk-like names.

02:39.240 --> 02:41.280
So that's role prompting in a nutshell.

02:41.280 --> 02:43.680
But that's not where it ends, right?

02:43.710 --> 02:45.420
We need some evaluation metric.

02:45.420 --> 02:53.310
And typically for this type of evaluation metric you need an LLM to check, essentially

02:53.310 --> 02:59.620
as a judge, whether you are following Elon Musk's style or not. So we have another prompt here which says: as a

02:59.620 --> 03:03.310
language model with knowledge of Elon Musk's style and naming conventions, evaluate the following

03:03.310 --> 03:07.750
product names and determine if they sound like they're created by Elon Musk.

03:08.200 --> 03:13.060
So we have the product name, and then we need to provide the evaluation as a percentage.

03:13.060 --> 03:19.040
And then we have some templates here that we've set, like Product X, or the Product 3, or Not a Product.

03:19.520 --> 03:24.800
So I find that it really helps if you're giving it some examples, just like adding examples to the previous

03:24.800 --> 03:26.240
prompt really helped as well.

03:26.660 --> 03:31.640
And what we're responding with here is, instead of the product description and product names, we're responding

03:31.640 --> 03:32.810
with an explanation.

03:32.810 --> 03:34.760
And then the Musk likelihood.

03:35.060 --> 03:40.100
Now the reason I do it this way round, by the way, is that if you ask it to provide an explanation

03:40.100 --> 03:46.610
first, or some analysis or thinking first, it does a better job of getting an accurate percentage

03:46.610 --> 03:51.560
or a reliable percentage that you know will come back again and again if you run it multiple times.

03:52.280 --> 03:55.820
And so this is like a small chain-of-thought implementation.
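
NOTE
Editor's sketch of the explanation-first judge prompt described above, plus a
parser for the percentage it returns. This is an illustration under stated
assumptions, not the instructor's code: JUDGE_TEMPLATE, build_judge_prompt,
parse_likelihood, and the "Musk likelihood:" label are hypothetical names, and
the reply at the end is hand-written rather than real model output.
```python
import re
# The explanation is requested before the score, so the model reasons first.
JUDGE_TEMPLATE = (
    "You are a language model with knowledge of Elon Musk's style and naming conventions.\n"
    "Evaluate the following product name and determine if it sounds like it was created by Elon Musk.\n"
    "Product name: {name}\n"
    "Respond in this format:\n"
    "Explanation: <your reasoning>\n"
    "Musk likelihood: <percentage>%"
)
def build_judge_prompt(product_name: str) -> str:
    """Fill the judge template with one candidate product name."""
    return JUDGE_TEMPLATE.format(name=product_name)
def parse_likelihood(reply: str):
    """Return the percentage as an int from 0 to 100, or None if it is missing."""
    match = re.search(r"Musk likelihood:\s*(\d{1,3})\s*%", reply, re.IGNORECASE)
    if match is None:
        return None
    value = int(match.group(1))
    return value if 0 <= value <= 100 else None
# Hand-written reply used only to exercise the parser.
sample_reply = "Explanation: Playful and bold.\nMusk likelihood: 90%"
print(parse_likelihood(sample_reply))
```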

03:56.390 --> 03:56.690
All right.

03:56.690 --> 03:58.160
So we're going to run that.

03:58.340 --> 04:06.170
And just so you can see, what we're doing is running the role-play names against the standard names and just seeing

04:06.750 --> 04:08.100
which ones are more Musk-like.

04:08.130 --> 04:09.240
The role-play names do really well.

04:09.240 --> 04:12.960
We have 90%, 100% likely to be Elon Musk.

04:12.990 --> 04:14.370
We also have the explanation,

04:14.370 --> 04:17.940
so if we don't agree, we can just check this and see.

04:18.150 --> 04:19.950
I think that that works really well.

04:20.070 --> 04:24.960
And then if we look at the standard names, we've got 60%, 50%, 70%.

04:24.960 --> 04:26.820
So we can really tell the difference.

04:26.820 --> 04:31.990
Like, this name still sounds somewhat like Musk according to it, because it's playful and

04:31.990 --> 04:37.630
suggests innovation, but lacks the boldness and whimsical nature often found in Musk's branding.

04:37.810 --> 04:38.050
Yeah.

04:38.050 --> 04:43.540
So yeah, we could maybe optimize the percentages here by changing this prompt.

04:43.540 --> 04:45.730
And obviously this is a prompt as well.

04:45.730 --> 04:47.920
So we need to optimize this prompt too.

04:48.160 --> 04:54.340
But it's usually much easier to optimize an evaluation prompt because you can rewrite the percentages

04:54.340 --> 04:58.330
yourself and say, this should have been 20%, and then you could test

04:58.390 --> 05:02.140
this evaluator again and get to a prompt that gets it right

05:02.140 --> 05:05.710
and that accurately reproduces the 20%.

05:06.370 --> 05:08.140
So now you have this.

05:08.140 --> 05:09.970
You could A/B test your original prompt.

05:10.240 --> 05:14.980
Now you can evaluate your list of names and then go back and just check whether it's actually following

05:14.980 --> 05:15.880
that role or not.
