WEBVTT

00:00.690 --> 00:03.153
-: All right, I'm gonna walk you through a technique

00:03.153 --> 00:05.137
that I call Personas of Thought.

00:05.137 --> 00:08.340
I think that's just a play on the chain of thought idea

00:08.340 --> 00:10.470
where you would give an LLM

00:10.470 --> 00:13.200
some tokens to think about the task first

00:13.200 --> 00:14.250
before doing it.

00:14.250 --> 00:18.420
And this adaptation that I've been using more recently is

00:18.420 --> 00:20.430
that you ask it to come up with a bunch

00:20.430 --> 00:22.701
of different personalities or personas.

00:22.701 --> 00:25.903
Personas are experts who will all have

00:25.903 --> 00:28.920
a different opinion on the task,

00:28.920 --> 00:31.470
and then you can aggregate their opinions together

00:31.470 --> 00:32.610
to get the final result.

00:32.610 --> 00:34.560
And I find that it's really good for diversity

00:34.560 --> 00:37.050
of thought and uniqueness of insights,

00:37.050 --> 00:40.470
rather than just the generic bland stuff

00:40.470 --> 00:42.390
that you get from an LLM.

00:42.390 --> 00:46.830
So here's the code that I'm running here.

00:46.830 --> 00:48.660
Just have a question that I want to ask.

00:48.660 --> 00:50.310
Like I'm trying to come up with an idea

00:50.310 --> 00:52.860
of whether I should use this product name

00:52.860 --> 00:54.690
for a new product I invented,

00:54.690 --> 00:56.430
and I just call OpenAI.

00:56.430 --> 00:57.480
This is (indistinct),

00:58.680 --> 01:00.750
and the prompt is very simple.

01:00.750 --> 01:02.370
It's just I want a paragraph response

01:02.370 --> 01:04.260
to the following question.

01:04.260 --> 01:07.350
And the typical sycophantic stuff, it's like,

01:07.350 --> 01:09.360
it's a strong choice, it's really good.

01:09.360 --> 01:11.640
And yeah, it's catchy and easy to remember,

01:11.640 --> 01:13.350
and I actually think this is a good name.

01:13.350 --> 01:16.140
But yeah, this is a typical LLM response.

01:16.140 --> 01:19.800
It's, you know, very rarely useful information

01:19.800 --> 01:22.260
if you just prompt it naively.

01:22.260 --> 01:25.747
So the next one we're gonna try is the experts prompt.

01:25.747 --> 01:29.310
I'm gonna run this, but this is very different.

01:29.310 --> 01:31.380
I want a paragraph response to the following question.

01:31.380 --> 01:32.910
But first, I want you to name a number

01:32.910 --> 01:34.860
of world-class experts past or present

01:34.860 --> 01:36.870
who would be great at answering this.

01:36.870 --> 01:38.070
Then for each expert,

01:38.070 --> 01:40.080
I want you to answer critically from their perspective,

01:40.080 --> 01:41.580
given their background experience.

01:41.580 --> 01:43.530
And then finally combine all of that together

01:43.530 --> 01:46.090
into a single response, as if they'd collaborated

01:47.040 --> 01:49.890
in writing a joint anonymous answer.

01:49.890 --> 01:52.200
And I've given it some structure in terms of the format.
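
NOTE
A minimal sketch of the experts prompt described above, not the code shown on screen.
The template wording paraphrases what is read out; the format section, the function
name build_experts_prompt, and the example question are assumptions for illustration.
```python
# Hypothetical template paraphrasing the experts prompt read out in the video.
EXPERTS_TEMPLATE = (
    "I want a paragraph response to the following question.\n"
    "First, name {n} world-class experts, past or present, who would be great at answering it.\n"
    "Then, for each expert, answer critically from their perspective, given their background and experience.\n"
    "Finally, combine everything into a single response, as if they had collaborated on a joint anonymous answer.\n"
    "Format:\nExperts: <list>\nFeedback: <one entry per expert>\nFinal response: <one paragraph>\n"
    "Question: {question}"
)
def build_experts_prompt(question: str, n: int = 5) -> str:
    """Fill the template; the returned string is what you would send to the chat model."""
    return EXPERTS_TEMPLATE.format(n=n, question=question.strip())
if __name__ == "__main__":
    # Assumed example question; the exact wording used in the video is not shown.
    print(build_experts_prompt("Is 'OmniFit' a good name for a new shoe?"))
```
Swapping "world-class experts" for "demographic personas" in the template would give the personas variant discussed later.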

01:52.200 --> 01:54.150
So we'll see what we get.

01:54.150 --> 01:57.660
It came up with, in this case, a Nike designer,

01:57.660 --> 01:59.850
the former FDA commissioner, an inventor,

01:59.850 --> 02:01.800
the Harvard Business School Professor,

02:01.800 --> 02:04.290
Tim Ferriss, Tony Hsieh.

02:04.290 --> 02:06.240
These are all fairly relevant, which is great,

02:06.240 --> 02:07.073
and it's good at coming up

02:07.073 --> 02:09.930
with good ideas here for inspiration,

02:09.930 --> 02:12.600
and then it's given individual feedback for each.

02:12.600 --> 02:14.670
Tinker Hatfield thought it was good,

02:14.670 --> 02:17.670
but this one, this guy David Kessler,

02:17.670 --> 02:18.810
he was worried about that,

02:18.810 --> 02:21.480
whether it resonates with consumers' health needs,

02:21.480 --> 02:23.040
and that makes sense

02:23.040 --> 02:26.070
because he's an expert in consumer health.

02:26.070 --> 02:28.680
So that's the sort of thing that just wouldn't come up

02:28.680 --> 02:31.620
and didn't come up in the previous feedback

02:31.620 --> 02:32.940
that we got from the LLM,

02:32.940 --> 02:34.140
but because it's been pushed

02:34.140 --> 02:36.960
to get a diverse crowd of opinions,

02:36.960 --> 02:40.080
it ends up being quite a useful opinion.

02:40.080 --> 02:41.115
So then you can go through

02:41.115 --> 02:43.800
and you can look and see which feedback you agree with

02:43.800 --> 02:45.390
and I find that that's really great

02:45.390 --> 02:46.920
for brainstorming in general,

02:46.920 --> 02:50.640
but then also we get this final response at the end

02:50.640 --> 02:54.120
and it has some of that diversity of thought in here.

02:54.120 --> 02:55.740
It says, "However, it's essential the name

02:55.740 --> 02:57.300
is distinctive enough,"

02:57.300 --> 03:00.450
and then, yeah, it doesn't specifically mention

03:00.450 --> 03:02.220
the health benefits in this case,

03:02.220 --> 03:05.280
but it's not very sycophantic,

03:05.280 --> 03:08.490
like it does bring a bunch of these ideas together,

03:08.490 --> 03:10.020
which is really helpful.

03:10.020 --> 03:11.580
I find this technique works really well.

03:11.580 --> 03:14.640
Now, this is another type of prompt

03:14.640 --> 03:18.450
which is the exact same thing as the experts prompt,

03:18.450 --> 03:19.890
but with just regular people,

03:19.890 --> 03:22.290
so Personas Prompt is what I call it.

03:22.290 --> 03:24.960
Name a number of demographic personas

03:24.960 --> 03:26.430
who would be relevant for answering this.

03:26.430 --> 03:29.040
What we're trying to do is get like a crowd of people,

03:29.040 --> 03:31.140
and instead of them being experts,

03:31.140 --> 03:33.150
we just want regular people.

03:33.150 --> 03:35.460
And what are the relevant demographics?

03:35.460 --> 03:39.480
So in this case, we do have a footwear industry expert,

03:39.480 --> 03:41.580
but we also have a busy parent and an athlete

03:41.580 --> 03:43.980
'cause they're gonna have different perspectives.

03:43.980 --> 03:45.930
Then it's a fashion conscious person.

03:45.930 --> 03:49.380
So we're gonna tackle this from multiple angles.

03:49.380 --> 03:52.230
You can see it does sound, you know, direct and trendy,

03:52.230 --> 03:55.020
but it doesn't specifically mention footwear.

03:55.020 --> 03:57.900
Omnifit might be too vague, in this case.

03:57.900 --> 03:59.850
It suggests convenience for busy parents,

03:59.850 --> 04:02.370
however, it doesn't clarify the product's purpose.

04:02.370 --> 04:05.790
Getting a much better broad sense

04:05.790 --> 04:07.680
of the problems with this name,

04:07.680 --> 04:10.590
which I think you didn't really get in the previous ones.

04:10.590 --> 04:13.080
And then, in this case, it says we do think

04:13.080 --> 04:14.790
it's a promising name,

04:14.790 --> 04:17.670
however, it needs to clearly convey

04:17.670 --> 04:20.310
the product's unique selling proposition.

04:20.310 --> 04:23.280
I think that that's a much more powerful final response,

04:23.280 --> 04:24.690
in my opinion.

04:24.690 --> 04:25.523
Cool.

04:25.523 --> 04:26.400
So bringing all that together,

04:26.400 --> 04:28.893
like here are the three responses side by side,

04:29.760 --> 04:31.740
and in general, whenever I run this,

04:31.740 --> 04:33.930
even for any type of question,

04:33.930 --> 04:35.580
I'm finding I'm getting much better results

04:35.580 --> 04:36.420
with the persona one

04:36.420 --> 04:39.510
and sometimes the expert one as well.

04:39.510 --> 04:42.540
We can judge this, like we can use an LLM judge.

04:42.540 --> 04:44.940
In this case, this is just another prompt

04:44.940 --> 04:47.070
where I'm asking it to compare the two responses,

04:47.070 --> 04:50.010
and I want to make sure that the responses are unique,

04:50.010 --> 04:51.210
that they're non-obvious

04:51.210 --> 04:53.610
and that they sound human, not like an AI.

04:53.610 --> 04:55.890
And then it's gonna choose which one is better.
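
NOTE
A rough sketch of what a pairwise judge prompt along these lines might look like.
The three criteria (unique, non-obvious, sounds human) come from the talk; the
"Winner: N" output convention and the names build_judge_prompt and parse_winner
are assumptions, not the code used in the video.
```python
import re
# Hypothetical judge template; only the three criteria are taken from the talk.
JUDGE_TEMPLATE = (
    "Compare the two responses to the question below.\n"
    "Prefer the response that is unique, non-obvious, and sounds human rather than like an AI.\n"
    "Give brief reasoning, then end with a line 'Winner: 1' or 'Winner: 2'.\n"
    "Question: {question}\nResponse 1: {first}\nResponse 2: {second}"
)
def build_judge_prompt(question: str, first: str, second: str) -> str:
    """Build the text sent to the judging model for one pair of responses."""
    return JUDGE_TEMPLATE.format(question=question, first=first, second=second)
def parse_winner(judge_output: str) -> int:
    """Return 1 or 2 from the judge's text; raise if no verdict line is found."""
    match = re.search(r"Winner:\s*([12])\b", judge_output)
    if match is None:
        raise ValueError("no 'Winner: 1' or 'Winner: 2' line in judge output")
    return int(match.group(1))
```
Asking for a fixed verdict line is one way to make the result machine-readable, so the check can run across many questions without reading each one by hand.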

04:55.890 --> 04:59.430
This is like a common format for judging responses

04:59.430 --> 05:02.910
because I wanna run this across a bunch of questions

05:02.910 --> 05:04.560
and not just one question,

05:04.560 --> 05:07.410
and I don't wanna just like manually check each one.

05:07.410 --> 05:10.380
So I'm automating the checking, in this case.

05:10.380 --> 05:13.890
This code basically just runs,

05:13.890 --> 05:15.810
it just runs the judge prompt

05:15.810 --> 05:19.263
and we can see if we just run this now.

05:20.825 --> 05:24.570
In the naive versus experts prompts,

05:24.570 --> 05:26.790
it chose the experts, number two.

05:26.790 --> 05:29.220
In naive versus personas, it chose number two.

05:29.220 --> 05:32.040
And then in experts versus personas it chose number one.

05:32.040 --> 05:34.110
So what that means is they were both an improvement

05:34.110 --> 05:35.730
over the naive prompt,

05:35.730 --> 05:36.840
the first one we tried,

05:36.840 --> 05:39.390
but the experts one was better.

05:39.390 --> 05:41.100
And it comes with some reasoning here.

05:41.100 --> 05:43.260
You can see like why it chose that,

05:43.260 --> 05:45.150
but it says response one provided

05:45.150 --> 05:48.360
more comprehensive analysis, which is useful.

05:48.360 --> 05:49.193
Cool.

05:49.193 --> 05:50.130
You'll get something different when you run it

05:50.130 --> 05:51.870
and especially if you change the question,

05:51.870 --> 05:56.870
but I find the judge also is pretty good at agreeing

05:57.630 --> 05:59.370
with what my assessment would be,

05:59.370 --> 06:01.410
which in this case I did think

06:01.410 --> 06:03.630
that the personas one was a little bit better,

06:03.630 --> 06:05.850
but in this case, it's pretty clear

06:05.850 --> 06:08.940
that they're both better than the naive prompt.

06:08.940 --> 06:10.920
All right, so bring those together.

06:10.920 --> 06:12.090
I'm just gonna run this.

06:12.090 --> 06:15.360
All this does is take all the things we just did

06:15.360 --> 06:17.490
and put it into a function called run_test

06:17.490 --> 06:19.050
that takes in a question.

06:19.050 --> 06:21.960
And we're just gonna run through every question here.

06:21.960 --> 06:23.177
So there's five different questions.

06:23.177 --> 06:26.220
And we just wanna see generally how often does

06:26.220 --> 06:29.880
the persona prompt win versus the expert prompt.

06:29.880 --> 06:34.110
And so, right now we're just setting up the naive response,

06:34.110 --> 06:36.330
and then setting up the experts response,

06:36.330 --> 06:38.100
and then setting up the personas response

06:38.100 --> 06:39.720
of that same question,

06:39.720 --> 06:43.560
and then it judges them over here.

06:43.560 --> 06:45.030
This is where it's judging.

06:45.030 --> 06:46.560
I mean it's printing stuff out for us,

06:46.560 --> 06:48.690
and then it returns the final results.
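
NOTE
A sketch of how a run_test function like the one described could be structured.
Only the name run_test and the overall flow (three responses, then pairwise judging)
come from the talk; the signature, the ask and judge callables, and the tuple
result format are assumptions so the example runs without any API calls.
```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple
def run_test(
    question: str,
    prompts: Dict[str, str],
    ask: Callable[[str], str],
    judge: Callable[[str, str, str], int],
) -> List[Tuple[str, str, str]]:
    """Answer the question with every prompt style, then judge each pair once."""
    # prompts maps a style name to a template containing a {question} placeholder.
    responses = {name: ask(template.format(question=question)) for name, template in prompts.items()}
    results = []
    for first, second in combinations(responses, 2):
        choice = judge(question, responses[first], responses[second])  # judge returns 1 or 2
        results.append((first, second, first if choice == 1 else second))
    return results
```
Here ask would wrap the model call and judge would wrap the judge prompt; passing them in as parameters keeps the comparison logic testable with stand-in functions.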

06:48.690 --> 06:52.500
So you can see that, in this case, we have the question,

06:52.500 --> 06:54.030
what's the best way to learn a new skill?

06:54.030 --> 06:56.280
And the naive response is useful,

06:56.280 --> 06:57.780
but the expert response,

06:57.780 --> 06:59.580
it pulled in all these heavyweights.

06:59.580 --> 07:00.413
All right.

07:00.413 --> 07:02.670
Pulled in Malcolm Gladwell, Angela Duckworth,

07:02.670 --> 07:05.250
K. Anders Ericsson, Tim Ferriss.

07:05.250 --> 07:06.780
Tim Ferriss is pretty common,

07:06.780 --> 07:08.790
actually, in all these questions,

07:08.790 --> 07:10.173
but also people like Rich,

07:11.190 --> 07:13.230
and they all have a different opinion.

07:13.230 --> 07:14.569
Malcolm Gladwell talks his shop.

07:14.569 --> 07:17.100
You know, he talks about the 10,000 hours.

07:17.100 --> 07:18.630
Angela Duckworth talks about grit,

07:18.630 --> 07:19.950
and her book was called "Grit."

07:19.950 --> 07:21.390
And so yeah, as you expect,

07:21.390 --> 07:24.480
but because you're forcing it to be a little bit broader,

07:24.480 --> 07:26.733
it does tend to work a lot better.

07:52.020 --> 07:53.670
You can see, at the end here,

07:53.670 --> 07:58.670
the actual personas response is quite useful.

08:00.960 --> 08:02.743
Actually, we can pause, open a text editor,

08:02.743 --> 08:07.743
and we can see here goes the final response for personas.

08:10.020 --> 08:11.880
And yeah, I think this is really useful.

08:11.880 --> 08:13.050
Like it talks about structured learning,

08:13.050 --> 08:15.420
hands-on practice, community engagement.

08:15.420 --> 08:16.980
This is the sort of thing I would recommend

08:16.980 --> 08:18.450
people do as well.

08:18.450 --> 08:20.550
And practical skills, diving into real world projects

08:20.550 --> 08:21.810
and seeking mentorship.

08:21.810 --> 08:24.090
That's exactly what I do to learn a new skill.

08:24.090 --> 08:26.490
And you can see as well that, in this case,

08:26.490 --> 08:28.290
the personas prompt beat everything.

08:28.290 --> 08:31.860
The experts beat naive, and then personas beat experts.

08:31.860 --> 08:34.170
Let's go on through the rest of the questions now.

08:34.170 --> 08:35.820
We'll see how it gets at the end.

08:46.080 --> 08:47.340
What we're gonna do at the end is

08:47.340 --> 08:49.920
just add up how many wins each one got

08:49.920 --> 08:52.110
and then also how many times they were tested.

08:52.110 --> 08:54.183
When we run naive_vs_experts,

08:55.285 --> 08:58.350
we'll count one for naive, in terms of trials,

08:58.350 --> 09:01.230
and one for experts, in terms of trials.

09:01.230 --> 09:03.810
And then we're also gonna count, in this case,

09:03.810 --> 09:05.940
if naive won,

09:05.940 --> 09:08.190
and then we're gonna increment the wins for naive,

09:08.190 --> 09:10.200
or if experts won, then we're gonna increment the wins for experts.

09:10.200 --> 09:14.550
So what will happen is we'll get the accurate percentage

09:14.550 --> 09:16.470
of how many times each one won

09:16.470 --> 09:20.340
when it was being tested against one of the other prompts.
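
NOTE
A small sketch of the win and trial counting described here, under the assumption
that each judged pair is recorded as a (prompt_a, prompt_b, winner) tuple. The
function name tally and that tuple format are illustrative, not the code in the video.
The sample data mirrors the learn-a-new-skill question, where experts beat naive
and personas beat both.
```python
from collections import Counter
def tally(results):
    """results: iterable of (prompt_a, prompt_b, winner) tuples, one per judged pair."""
    wins, trials = Counter(), Counter()
    for prompt_a, prompt_b, winner in results:
        if winner not in (prompt_a, prompt_b):
            raise ValueError(f"winner {winner!r} is not one of the compared prompts")
        trials[prompt_a] += 1  # both sides of a comparison count as one trial each
        trials[prompt_b] += 1
        wins[winner] += 1
    return {name: wins[name] / trials[name] for name in trials}
if __name__ == "__main__":
    one_question = [
        ("naive", "experts", "experts"),
        ("naive", "personas", "personas"),
        ("experts", "personas", "personas"),
    ]
    print(tally(one_question))  # {'naive': 0.0, 'experts': 0.5, 'personas': 1.0}
```
Dividing wins by trials rather than by the number of questions matters because each prompt appears in two comparisons per question.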

09:20.340 --> 09:22.793
And we're just gonna see what that looks like at the end.

09:36.030 --> 09:38.160
Okay, everything's finished running now.

09:38.160 --> 09:42.090
So I'm just gonna run this and here we go.

09:42.090 --> 09:45.030
So on average, across the different trials,

09:45.030 --> 09:48.060
the naive prompt only won one in 10 times.

09:48.060 --> 09:51.360
So sometimes it is the better one, but it's pretty rare.

09:51.360 --> 09:53.580
Experts prompt was much better.

09:53.580 --> 09:55.470
It won 60% of the time.

09:55.470 --> 09:59.136
And the personas prompt won 80% of the time in the trials.

09:59.136 --> 10:01.290
This is pretty common, actually,

10:01.290 --> 10:03.000
I'm finding, across the different questions

10:03.000 --> 10:05.250
I try this on, when I'm measuring it,

10:05.250 --> 10:08.250
the personas prompt actually tends to be the best

10:08.250 --> 10:09.690
and it's like wisdom of the crowds,

10:09.690 --> 10:10.560
so I'm not sure.

10:10.560 --> 10:11.910
I think with the experts prompt,

10:11.910 --> 10:14.820
it does tend to be better than what you naively get,

10:14.820 --> 10:17.010
but it sometimes goes a bit over the top

10:17.010 --> 10:21.540
with the details of what Tim Ferriss,

10:21.540 --> 10:23.430
or what Daniel Kahneman, or whatever.

10:23.430 --> 10:25.590
All these famous people will specifically push

10:25.590 --> 10:26.970
like their 10,000 hours,

10:26.970 --> 10:30.600
and I think that harms the diversity of thought sometimes.

10:30.600 --> 10:32.400
Whereas when it's looking for regular people

10:32.400 --> 10:33.900
it's having to be a bit more creative

10:33.900 --> 10:36.390
with how those people would approach

10:36.390 --> 10:38.100
answering that question.

10:38.100 --> 10:40.980
Also this is just meant to be a way

10:40.980 --> 10:43.020
to demonstrate how it works.

10:43.020 --> 10:45.120
I use this technique in lots of different ways.

10:45.120 --> 10:46.623
So just like chain of thought,

10:47.852 --> 10:50.580
it doesn't necessarily have to be applied exactly like this.

10:50.580 --> 10:52.710
It's just more of a concept that you can try.

10:52.710 --> 10:54.540
And you can do this in ChatGPT as well.

10:54.540 --> 10:58.020
But in general, what I tend to do is

10:58.020 --> 11:01.500
I'll create the personas first as like a separate step,

11:01.500 --> 11:04.380
and then I will run each one of those in parallel,

11:04.380 --> 11:05.700
and then I'll aggregate at the end.

11:05.700 --> 11:08.100
So that tends to be how I approach it.
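
NOTE
A minimal sketch of the three-step variant described here: generate personas,
answer as each persona in parallel, then aggregate. The function name
personas_of_thought, the prompt wording, and the ask callable (a stand-in for
whatever model call you use) are all assumptions, not the speaker's code.
```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List
def personas_of_thought(question: str, ask: Callable[[str], str], n: int = 4) -> str:
    """Three steps: generate personas, answer as each persona in parallel, aggregate."""
    listing = ask(f"List {n} demographic personas relevant to this question, one per line: {question}")
    personas: List[str] = [line.strip() for line in listing.splitlines() if line.strip()]
    def answer_as(persona: str) -> str:
        return ask(f"Answer critically as this persona: {persona}\nQuestion: {question}")
    # Threads suit this step because each call mostly waits on the network.
    with ThreadPoolExecutor(max_workers=max(1, len(personas))) as pool:
        opinions = list(pool.map(answer_as, personas))  # map preserves input order
    joined = "\n".join(f"{p}: {o}" for p, o in zip(personas, opinions))
    return ask(f"Combine these opinions into one paragraph answering '{question}':\n{joined}")
```
Splitting the steps gives each persona its own full response budget, at the cost of more calls than the single-prompt version shown in the video.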

11:08.100 --> 11:09.960
I just wanted to show a simple version here

11:09.960 --> 11:12.840
where it just does it all in one prompt.

11:12.840 --> 11:13.673
Cool.

11:13.673 --> 11:15.290
Yeah, hopefully that's useful for you guys.