WEBVTT

00:00.160 --> 00:01.360
Hey there, Eden here.

00:01.360 --> 00:08.040
And in this video we're going to be discussing system prompts and their importance in context engineering.

00:08.200 --> 00:13.240
And I know you've probably read a thousand times on Twitter and on LinkedIn

00:13.280 --> 00:18.720
that system prompts are important, that you should work on them, and that you should iterate on them.

00:18.720 --> 00:20.600
And you should make them really, really good.

00:20.760 --> 00:27.120
And telling someone to have a good system prompt is the most generic advice in AI engineering.

00:27.320 --> 00:32.960
So instead of saying to you again that system prompts are important, I want to show you some example

00:33.000 --> 00:37.280
system prompts from the state-of-the-art agents out there.

00:37.480 --> 00:46.000
So this repository here contains leaked system prompts of all of the famous

00:46.000 --> 00:47.640
state-of-the-art agents.

00:47.800 --> 00:54.200
And most of the agents here are coding agents like Claude Code, like Cursor, like Devin.

00:54.320 --> 01:00.880
But you can also find other agents here, like the Comet assistant in the Comet browser by Perplexity.

01:02.360 --> 01:07.440
And this is a very popular repository. At the time of recording,

01:07.480 --> 01:10.160
it has almost 90,000 stars.

01:11.000 --> 01:14.080
Let me go here, for example, to Claude Code.

01:14.640 --> 01:17.840
And here you can see the Claude Code system prompt.

01:19.560 --> 01:22.960
We can see we have almost 200 lines here.

01:25.120 --> 01:28.000
And here we can see all of the tool descriptions.

01:28.160 --> 01:30.240
And this is 500 words.

01:30.360 --> 01:36.040
Now all of these tool descriptions are going to be injected into the system prompt.

01:36.240 --> 01:42.240
Let's go have a look at Cursor and check out the Cursor agent.

01:45.160 --> 01:47.760
This prompt is also around 200 lines.

01:48.080 --> 01:49.840
You can see the structure here.

01:50.240 --> 01:53.600
And let's go and check out Devin.

01:55.680 --> 02:00.480
And Devin has 400 lines in its prompt.

02:00.840 --> 02:08.390
And the goal of this video is not to analyze each prompt and tell you all of the techniques

02:08.390 --> 02:14.110
they used in the prompts, because we could have an entire course dedicated only to that.

02:14.110 --> 02:22.350
But my point here is just to show you that system prompts are important. This repository is continuously being

02:22.390 --> 02:30.790
updated because the system prompts keep evolving: LLMs are evolving all the time, and so do

02:30.790 --> 02:31.830
the system prompts.

02:31.830 --> 02:39.150
And a lot of engineering resources keep going into curating those system prompts and making them better

02:39.150 --> 02:39.950
and better.

02:39.950 --> 02:42.550
And this is an iterative process.

02:42.550 --> 02:43.070
All right.

02:43.070 --> 02:49.470
So I hope by this part of the video you agree with me that system prompts are important.

02:49.470 --> 02:55.430
So I want to talk a bit about best practices for curating and crafting those system prompts.

02:55.430 --> 03:02.390
And I like the analogy between writing a system prompt and giving someone directions to go somewhere.

03:02.790 --> 03:07.990
If we just say something like "go over there," they'll be confused.

03:08.030 --> 03:09.910
They won't know where to go.

03:10.430 --> 03:18.590
However, if we give them a 50-page manual with every possible turn and street, we're going to

03:18.630 --> 03:24.190
overwhelm them with information, and they won't get where they need to go either.

03:24.270 --> 03:26.430
So here we want to be clear.

03:26.430 --> 03:27.870
We want to be specific.

03:28.150 --> 03:33.350
And we want to give just enough information to get them where they need to go.

03:33.830 --> 03:36.070
So here comes the hard part.

03:36.110 --> 03:38.310
We need to find that sweet spot.

03:38.550 --> 03:46.030
So when we write system prompts, we're looking for what Anthropic refers to as the Goldilocks

03:46.030 --> 03:46.470
zone.

03:46.630 --> 03:53.070
So it's going to be not too vague, not too detailed, but just about right.

03:53.070 --> 03:59.750
So you can see this on the scale over here: on the very far left we have too specific, and on the very

03:59.750 --> 04:01.590
far right too vague.

04:01.590 --> 04:03.750
So we want to be right here in the middle.

04:04.070 --> 04:10.020
So let's break down this too-specific prompt at the very far left.

04:10.460 --> 04:18.980
And the core problem here is that we are treating the LLM like a deterministic state machine rather than

04:19.020 --> 04:20.900
an intelligent agent.

04:21.500 --> 04:24.420
And we hard-code logic.

04:24.420 --> 04:31.660
So we can see stuff here like "if user intent is incident resolution, ask three follow-up questions."

04:31.860 --> 04:34.700
So why do we need exactly three questions?

04:34.740 --> 04:36.980
What if two is going to suffice?

04:37.100 --> 04:39.260
Or what if we need five questions?

04:39.660 --> 04:42.700
And we can see we have exhaustive enumeration.

04:42.700 --> 04:48.140
So we try to list every possible escalation scenario, which is impossible to complete.

04:48.140 --> 04:55.860
And it forces the model through predetermined paths that may not match the reality of the input.

04:56.220 --> 05:02.660
And it's also going to be a maintenance nightmare, because every new edge case is going to require

05:02.660 --> 05:04.780
a prompt modification.

05:04.780 --> 05:11.100
And by the way, if we have predetermined steps, then maybe an autonomous agent is not

05:11.100 --> 05:12.020
going to be the answer.

05:12.020 --> 05:14.220
Maybe we just need an authentic workflow.

05:16.020 --> 05:22.740
All right, let's move to the opposite end of the spectrum where we have a very vague prompt.

05:23.100 --> 05:30.660
And the core problem here is that we provide insufficient signal for consistent behavior.

05:30.980 --> 05:35.700
So in this example we have no actionable guidance.

05:35.860 --> 05:40.740
So "assist in a manner consistent with the principles and essence of the company brand."

05:40.900 --> 05:47.300
What are those principles? We have here a false assumption of shared context.

05:47.300 --> 05:55.100
And this assumes that the model knows the bakery and knows the customer service norms, which it actually

05:55.100 --> 05:59.180
doesn't. And the prompt really has undefined boundaries.

05:59.380 --> 06:03.540
And the statement here says "escalate to a human if needed."

06:03.740 --> 06:05.300
When is it needed?

06:05.300 --> 06:07.380
So the model needs to guess this.

06:07.660 --> 06:10.700
And in this prompt we have no framework.

06:10.700 --> 06:15.700
We have no structure for approaching problems in a systematic way.

06:15.940 --> 06:22.260
So this is going to leave us with inconsistent behavior, which means that different runs are going to

06:22.300 --> 06:26.500
produce wildly different approaches to solving the same problem.

06:26.780 --> 06:33.700
And the main problem with this prompt is that it's essentially saying something like "do the right thing"

06:33.900 --> 06:38.180
without defining what "right" really means in this context.

06:38.700 --> 06:44.380
Now, in the middle prompt, we start by providing a clear identity and scope.

06:44.980 --> 06:50.820
And this is immediately going to establish boundaries, because it's customer support.

06:50.860 --> 06:52.180
It's not marketing.

06:52.180 --> 06:53.220
It's not sales.

06:53.460 --> 06:55.740
And it's going to give the domains.

06:55.740 --> 07:02.540
So it's going to be orders and basic questions, but not complex business operations.

07:02.940 --> 07:07.660
And then we go and we empower rather than constrain.

07:08.420 --> 07:16.010
So instead of prescribing exactly which tool to use in which situation, we establish a goal,

07:16.010 --> 07:18.890
which is efficient and professional resolution.

07:18.890 --> 07:26.770
And the heuristic here is that we trust the agent to select the appropriate tools when

07:26.810 --> 07:27.370
needed.

07:27.730 --> 07:32.410
And here we also provide a reasoning framework and not a flowchart.

07:32.610 --> 07:40.370
So we have this four-step response framework: identify the core issue, gather

07:40.370 --> 07:46.530
the necessary context, provide a clear resolution, and confirm customer satisfaction.

07:46.810 --> 07:48.970
Now this is guidance.

07:49.290 --> 07:54.130
It's something which should work across a lot of scenarios.

07:54.290 --> 07:58.650
And it's not some rigid branching logic like we had before.

07:58.690 --> 08:04.490
Now, in the last part of the system prompt, we establish clear boundaries.

08:04.770 --> 08:07.450
And we have some principles to follow here.

08:07.930 --> 08:12.410
So if we have multiple solutions we need to choose the simplest one.

08:12.810 --> 08:15.250
So this is a heuristic.

08:15.250 --> 08:19.850
So it's something we're hoping is going to be the right thing to do.

08:20.050 --> 08:25.570
And by the way, it actually reminds me of the greedy algorithm approach in computer science.
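
NOTE
A minimal sketch, assuming Python, of how the four parts of the middle prompt described above (identity and scope, goal, response framework, boundaries) could be assembled. "Sunrise Bakery", the section titles, the escalation rule, and build_system_prompt are hypothetical and are not the wording shown in the video.
```python
# Illustrative only: the bakery name and all wording below are assumptions, not the prompt from the video.
SECTIONS = {
    "Identity and scope": (
        "You are the customer support assistant for Sunrise Bakery. "
        "You help with orders and basic questions, not marketing, sales, "
        "or complex business operations."
    ),
    "Goal": "Resolve each request efficiently and professionally, choosing tools as needed.",
    "Response framework": (
        "1. Identify the core issue.\n"
        "2. Gather the necessary context.\n"
        "3. Provide a clear resolution.\n"
        "4. Confirm customer satisfaction."
    ),
    "Boundaries and principles": (
        "If several solutions would work, choose the simplest one. "
        "Escalate to a human when a request falls outside the scope above."
    ),
}
def build_system_prompt(sections):
    """Join titled sections into one system prompt string."""
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections.items())
SYSTEM_PROMPT = build_system_prompt(SECTIONS)
print(SYSTEM_PROMPT)
```
Keeping each part as its own entry makes it easier to iterate on one guideline at a time.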

08:26.650 --> 08:34.770
Now I want to elaborate on why this prompt is far superior to the other system prompts.

08:34.810 --> 08:43.410
The very specific prompt tried to do the thinking for the model, and that actually made things worse when situations

08:43.410 --> 08:51.610
did not match the exact script. And the too-vague prompt didn't give the LLM enough to work with.

08:52.170 --> 08:58.850
However, the middle prompt really took advantage of what state-of-the-art large language models are

08:58.850 --> 09:05.890
really, really good at, which is to recognize patterns and to apply general rules to specific situations

09:05.890 --> 09:06.330
here.

09:06.850 --> 09:15.810
Now, the prompt handles new situations very well because it teaches principles instead of giving specific

09:15.810 --> 09:16.330
rules.

09:16.530 --> 09:22.960
So it should work even when it encounters something new, because it has a framework which

09:22.960 --> 09:24.200
is still going to work.

09:24.480 --> 09:30.560
The middle prompt is also very efficient: it doesn't waste any words.

09:30.760 --> 09:34.240
So each guideline covers many different situations.

09:34.640 --> 09:36.680
The principles here are compressed.

09:36.800 --> 09:43.080
So instead of having hundreds or thousands of edge cases, we have simple sentences here.

09:43.080 --> 09:49.440
And we don't have any repeated or overlapping instructions, which can give contradictory

09:49.440 --> 09:51.800
instructions to the large language model.

09:51.840 --> 09:52.280
Hey there.

09:52.320 --> 09:53.440
Eden here popping out.

09:53.440 --> 09:55.040
So I hope you enjoyed this video.

09:55.080 --> 09:57.920
I just want to reiterate the video's goals.

09:58.120 --> 10:04.200
The first goal is to simply show you that system prompts are really, really important and I hope you

10:04.240 --> 10:05.280
got this message.

10:05.400 --> 10:11.800
And the second one is to give you an example of a good system prompt and compare it to some bad system

10:11.800 --> 10:12.360
prompts.

10:12.560 --> 10:15.200
So I'd love to get your feedback on this video.

10:15.200 --> 10:19.920
And if you like this kind of content, please let me know and I'll be making more theoretical content

10:19.920 --> 10:20.640
like this.
