WEBVTT

00:00.440 --> 00:00.860
Emotion

00:00.860 --> 00:07.520
prompting is one of my favorite techniques because it's quite fun and easy and it just works.

00:07.520 --> 00:14.480
Quite often it's only a small percentage increase, typically in accuracy, maybe around 5 to 10%.

00:14.480 --> 00:16.640
But it's very easy to implement.

00:16.640 --> 00:23.300
And what we're talking about when we talk about emotion prompting is specifically this paper.

00:23.300 --> 00:26.510
"Large Language Models Understand and Can Be Enhanced by Emotional Stimuli."

00:26.510 --> 00:34.670
So the trick is essentially to use things from psychology that motivate people emotionally and apply them

00:34.670 --> 00:35.390
to LLMs.

00:35.390 --> 00:41.660
And the reason this works is that they've learned from the internet that when someone puts "this is

00:41.660 --> 00:48.290
very important to my career" in a message, then the response that follows is slightly more diligent,

00:48.290 --> 00:48.470
right?

00:48.470 --> 00:54.710
Like, people actually do pay attention to emotional cues in terms of how important a task

00:54.710 --> 00:57.380
is and then how much effort they should put into the response.

00:57.420 --> 01:02.610
Another thing that falls in this bucket is when people put instructions in all caps, or they

01:02.610 --> 01:08.460
threaten to kill a person if the response isn't returned in JSON.

01:08.550 --> 01:09.360
Things like that.

01:09.480 --> 01:12.390
That was a lot more common back in the day.

01:12.390 --> 01:13.740
It's less needed now,

01:13.740 --> 01:14.430
to be honest.

01:14.430 --> 01:19.350
Models are much better at instruction following, but it's still something that's worth testing.

01:19.380 --> 01:23.100
One of the really famous ones was that developers used to put,

01:23.100 --> 01:31.350
"please output the full code because I don't have any fingers," and it did actually work in terms of

01:31.350 --> 01:34.320
getting GPT-4 to output the full code.

01:34.410 --> 01:40.440
So anyway, that's a really fun one for you to test, but I'm just going to give you a really simple

01:40.440 --> 01:41.700
example here.

01:41.790 --> 01:45.180
So we just have a get completion function.

01:45.180 --> 01:46.440
It just returns the response.

01:46.440 --> 01:52.000
And what we've asked for is a 2,000-word detailed explanation of photosynthesis.

01:52.030 --> 01:55.690
What you'll find quite often, when you count the words

01:55.690 --> 01:57.160
(this is our evaluation metric,

01:57.250 --> 02:00.370
just splitting the text and then counting the length,

02:00.370 --> 02:05.050
so that should give us a rough approximation of the word count), is that it doesn't get anywhere

02:05.050 --> 02:06.280
near 2,000 words.
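
NOTE A minimal sketch of the word-count metric described above: split the text on whitespace and count the pieces. The names count_words and TARGET_WORDS are hypothetical, not taken from the video, and whitespace splitting is only a rough approximation of a true word count.
```python
def count_words(text: str) -> int:
    """Approximate the word count by splitting on any run of whitespace."""
    return len(text.split())
# Target length requested in the prompt (assumed constant name).
TARGET_WORDS = 2000
sample = "Photosynthesis converts light energy into chemical energy."
shortfall = TARGET_WORDS - count_words(sample)
print(f"{count_words(sample)} words, {shortfall} short of the target")
```
In practice you would pass the model's response text to count_words instead of the sample string.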

02:06.280 --> 02:10.660
Most LLMs will only write about 1,000 words, or just over a thousand.

02:10.660 --> 02:13.030
And you can see that here.

02:13.390 --> 02:18.400
We're only getting 700 words on photosynthesis, but our emotion prompt is working.

02:18.400 --> 02:19.810
It is actually much better.

02:19.810 --> 02:24.430
It's 1069 words and this is non-deterministic.

02:24.430 --> 02:27.820
So if you run this again you're going to get a different result.

02:27.820 --> 02:33.130
But we want to see on average does the emotion prompt improve performance.

02:33.130 --> 02:35.920
And there's two things by the way that make this an emotion prompt.

02:35.920 --> 02:42.880
One is that it's all in caps, which is interpreted as shouting

02:42.880 --> 02:46.330
in text, like on the internet, if you put something in all caps.

02:46.450 --> 02:48.700
And then we've also said "or someone dies."
02:49.180 --> 02:51.580
So here we go, 739 words.

02:51.580 --> 02:54.310
And let's see if the emotion prompt does better.

02:54.400 --> 02:54.580
Yeah.

02:54.580 --> 02:55.480
So it does better again,

02:55.480 --> 02:57.370
949 words.

02:57.850 --> 02:58.120
All right.

02:58.210 --> 02:59.260
This isn't a real test though.

02:59.260 --> 03:02.080
We want to run this multiple times.

03:02.200 --> 03:07.120
Quite often I will do something like this where we set the runs to 30.

03:07.120 --> 03:10.710
And then we'll test the standard prompt 30 times.

03:10.710 --> 03:14.190
Then you're going to test the emotion prompt 30 times and average the results.

03:14.190 --> 03:19.170
And the reason why you do this asynchronously is that it would take a really long time to do one after

03:19.170 --> 03:23.850
another, whereas this allows us to test all 30 at the same time.

03:23.850 --> 03:26.340
And they all come back roughly at the same time as well.

03:26.820 --> 03:27.960
So it's much quicker.

03:27.960 --> 03:34.470
I'm just going to hit run on this, and asyncio will handle the async part for us, which is nice,

03:34.470 --> 03:39.260
and we're going to get this nice average word count at the end for this standard prompt and for the

03:39.260 --> 03:40.310
emotion prompt.
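
NOTE A minimal sketch of the asynchronous test loop described above, assuming asyncio.gather is used to launch all runs at once. fake_completion is a hypothetical stand-in for a real async LLM call so the example runs offline; it ignores the prompt, so the two averages here will not show a real difference.
```python
import asyncio
import random
RUNS = 30  # runs per prompt, as in the video
async def fake_completion(prompt: str) -> str:
    """Hypothetical stub for an async LLM call; returns filler text of random length."""
    await asyncio.sleep(0.01)  # simulate network latency
    return " ".join(["word"] * random.randint(700, 1100))
async def average_word_count(prompt: str, runs: int = RUNS) -> float:
    """Launch all runs concurrently, then average the word counts of the responses."""
    responses = await asyncio.gather(*(fake_completion(prompt) for _ in range(runs)))
    return sum(len(r.split()) for r in responses) / runs
async def main() -> None:
    standard = await average_word_count("Write a 2000 word detailed explanation of photosynthesis.")
    emotion = await average_word_count("WRITE A 2000 WORD DETAILED EXPLANATION OF PHOTOSYNTHESIS OR SOMEONE DIES.")
    print(f"standard: {standard:.0f} words, emotion: {emotion:.0f} words")
if __name__ == "__main__":
    asyncio.run(main())
```
To run a real comparison, replace fake_completion with an async call to your model provider's client and keep the rest unchanged.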

03:43.760 --> 03:45.380
So this is just running.

03:45.980 --> 03:48.620
It should take about half a minute, something like that.

03:49.340 --> 03:49.730
Okay.

03:49.730 --> 03:51.800
So that took a little bit longer than expected.

03:51.830 --> 03:54.590
It's being slow today, I guess, at one minute 40 seconds.

03:54.590 --> 04:03.960
But we can see that over the 30 runs we are getting a 12% improvement in length just by adding this one

04:04.530 --> 04:07.500
little threat to the end here: "or someone dies."
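
NOTE A small sketch of how a relative improvement like the 12% figure could be computed from two average word counts. The function name and the two averages below are illustrative assumptions, not the numbers from the video.
```python
def percent_improvement(baseline: float, treatment: float) -> float:
    """Relative change of treatment over baseline, in percent."""
    return (treatment - baseline) / baseline * 100
# Illustrative averages only (assumed values).
standard_avg = 800.0
emotion_avg = 896.0
print(f"{percent_improvement(standard_avg, emotion_avg):.0f}% longer")
```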

04:08.220 --> 04:10.200
So that is emotion prompting.

04:10.200 --> 04:12.210
It's really easy to implement.

04:12.210 --> 04:18.450
You can test different threats if you'd like to, but essentially it's the same thing: when you are looking

04:18.450 --> 04:19.470
for a small boost,

04:19.560 --> 04:26.010
you can sometimes get it from putting a bit of human emotion or motivation into the prompt.
