WEBVTT

00:00.080 --> 00:07.800
Next up is a topic that I love, and it is one of the latest buzzwords in AI: context engineering.

00:08.000 --> 00:14.080
I think the term was coined by a few people, but definitely made famous by a Google engineer called

00:14.080 --> 00:14.640
Phil Schmid.

00:14.680 --> 00:16.520
We'll look at his post in a second.

00:16.760 --> 00:23.160
And it really comes down to this observation that, look, LLMs are constrained.

00:23.200 --> 00:26.760
They have an input in the form of a set of tokens.

00:26.800 --> 00:29.480
They're predicting what would most likely come next.

00:29.480 --> 00:34.520
And that input set of tokens has to sit within something called the context window.

00:34.560 --> 00:41.080
The maximum number of tokens that it can squeeze in and still pay attention across all of this input.

00:41.080 --> 00:48.000
And that input needs to include everything like the conversation so far, the chat history, descriptions

00:48.000 --> 00:52.640
of any tools that need to be called, any tools that have been called, and the results of them being

00:52.640 --> 00:58.680
called as part of this particular conversation, along with any RAG that you've done, anything

00:58.680 --> 01:01.660
you've shoved in the system prompt and the system prompt itself.

01:01.780 --> 01:06.020
All of that stuff, it all has to go in this context window.

01:06.020 --> 01:08.060
And so that is a constraint.
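
NOTE
A minimal sketch of the constraint described above, not a real tokenizer.
The function names are hypothetical, and tokens are approximated by counting
whitespace-separated words, which is an assumption made purely for illustration.
```python
# Hypothetical sketch: every part of the input shares one token budget.
def estimate_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (real tokenizers differ).
    return len(text.split())
#
def context_usage(parts: dict[str, str], window: int) -> dict[str, int]:
    # Count tokens per part, then report the total and what is left of the window.
    usage = {name: estimate_tokens(text) for name, text in parts.items()}
    usage["total"] = sum(usage.values())
    usage["remaining"] = window - usage["total"]
    return usage
#
parts = {
    "system_prompt": "You are a helpful sales assistant.",
    "tool_descriptions": "search_web: look up a company online.",
    "chat_history": "User: find prospects in Berlin.",
    "rag_results": "Acme GmbH is a logistics firm based in Berlin.",
}
usage = context_usage(parts, window=50)
```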

01:08.340 --> 01:14.820
And also, it's not like you want to use up all of the context window, because the more

01:14.820 --> 01:21.740
you shove in there, the more you're at risk of losing what people call coherence of the model, which

01:21.740 --> 01:26.980
is basically saying how reliable is it at generating outputs consistent with your goals?

01:27.100 --> 01:33.140
The more you confuse it by putting lots of stuff in there, the less likely you are to get reliable

01:33.140 --> 01:33.980
outcomes.

01:34.100 --> 01:39.860
And so adding in tools, which feels like I'm equipping my model with more and more abilities to do

01:39.900 --> 01:45.940
different things, this has to be a good thing, but no, giving it more choice can lead to less predictable

01:45.940 --> 01:48.860
outcomes and can reduce accuracy.

01:49.020 --> 01:55.900
And so the term context engineering came about to represent the art and science of getting it right,

01:56.260 --> 02:01.000
picking the right input tokens, the right information to pack into the space

02:01.000 --> 02:05.080
you've got, to give your model the greatest chance of getting the right answer.

02:05.120 --> 02:11.360
Picking the right tool, giving the right reasoning, or giving the right kind of background

02:11.360 --> 02:12.920
information from your RAG.

02:13.000 --> 02:15.080
Getting it right by

02:15.120 --> 02:20.160
putting the best possible cut of information in your inputs.

02:20.160 --> 02:23.320
And that is what context engineering is all about.

02:23.320 --> 02:30.160
And here is the seminal blog post from Phil Schmid, the Google engineer from Google DeepMind, who

02:30.440 --> 02:35.640
really was one of the first to expand on this idea of context engineering and express it.

02:35.640 --> 02:40.360
And he did so visually with that Venn diagram there that shows all the kinds of things that you pack

02:40.360 --> 02:47.120
into the context, to the input tokens, and includes what people like to call short-term memory,

02:47.120 --> 02:51.640
which is just referring to the conversation history, the stuff we've been putting into the memory in

02:52.320 --> 02:57.400
with longer-term memory, which is really what people use to describe all of the other databases that

02:57.400 --> 03:04.060
you might have that allow your LLM to have some persistent knowledge of its interactions

03:04.380 --> 03:05.980
or of the information that it needs.

03:07.060 --> 03:13.380
The system prompt, which typically contains your RAG. Available tools. Structured output.

03:13.380 --> 03:19.300
You see, I'm thrilled to see that Phil included that, my favorite piece, in his diagram, and he cunningly

03:19.300 --> 03:24.380
showed it as overlapping with available tools, probably to recognize that most of the time structured

03:24.380 --> 03:28.220
output is in fact implemented behind the scenes, just as another tool.

03:28.220 --> 03:29.260
It's just the way it works.

03:29.260 --> 03:31.460
But as far as we're concerned, it is the output.

03:31.780 --> 03:38.220
And the user prompt as well, of course, which most importantly is the input that

03:38.220 --> 03:40.140
the LLM is actually responding to.

03:40.180 --> 03:46.020
So figuring out the right decisions and trade offs to get this most efficiently packed into your context

03:46.020 --> 03:46.540
window.

03:46.580 --> 03:47.860
That's what it's all about.

03:47.860 --> 03:50.420
And you should read this blog post because it's terrific.

03:50.420 --> 03:53.620
And if you Google context engineering, this is probably the first thing that comes up.

03:53.620 --> 03:56.380
But there's lots of other good material on it.

03:56.380 --> 03:59.560
And I'll put a link to Phil Schmid's post in the course resources.

03:59.600 --> 04:05.680
Now you're probably thinking, okay, so what's the process that I should follow in order to get the

04:05.680 --> 04:06.560
best context

04:06.560 --> 04:07.200
engineering?

04:07.200 --> 04:11.960
What is the waterfall that you take to get the context to be optimized?

04:12.360 --> 04:18.960
And of course, the answer is it's not like there is a fixed process like an A, B, C, D.

04:19.280 --> 04:25.360
This is a classic example with LLMs of where what you need to do is experiment.

04:25.360 --> 04:27.840
This is about R&amp;D. Context

04:27.840 --> 04:32.440
engineering is trial-and-error R&amp;D. And the North Star,

04:32.480 --> 04:36.160
the way that you do it, is by having an evaluation.

04:36.160 --> 04:41.920
You need to have a metric, something which allows you to measure how effective is your agent system

04:41.920 --> 04:44.520
towards your ultimate commercial objective.

04:44.600 --> 04:49.200
And once you've got that metric, which we'll be talking about a bit later, once you've got that evaluation

04:49.200 --> 04:55.880
metric, you can then use that to test different approaches for your context. Test using RAG in a different

04:55.880 --> 04:56.160
way.

04:56.200 --> 04:59.580
Test different combinations of tools to figure out what does best.

04:59.700 --> 05:04.460
Maybe try tools versus structured outputs to see which one performs better.

05:04.580 --> 05:08.860
And so start with a metric and then run lots of experiments.

05:08.860 --> 05:11.900
There is no shortcut to experimentation.

05:11.900 --> 05:14.380
That's how to do great context engineering.
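
NOTE
A minimal sketch of "start with a metric, then run lots of experiments".
All names here are hypothetical, and the "agents" are plain stand-in functions
rather than real LLM calls, so this only illustrates the shape of the loop.
```python
from typing import Callable
#
def accuracy(agent: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    # Fraction of test cases where the agent's answer matches the expected one.
    hits = sum(1 for question, expected in cases if agent(question) == expected)
    return hits / len(cases)
#
def best_config(configs: dict[str, Callable[[str], str]], cases: list[tuple[str, str]]) -> tuple[str, dict[str, float]]:
    # Score every candidate context configuration with the same metric.
    scores = {name: accuracy(agent, cases) for name, agent in configs.items()}
    return max(scores, key=scores.get), scores
#
# Assumption: a tiny hand-made test set and a lookup table standing in for RAG.
cases = [("capital of France?", "Paris"), ("capital of Italy?", "Rome")]
facts = {"capital of France?": "Paris", "capital of Italy?": "Rome"}
configs = {
    "no_rag": lambda q: "Paris",
    "with_rag": lambda q: facts.get(q, "unknown"),
}
winner, scores = best_config(configs, cases)
```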

05:14.700 --> 05:15.300
All right.

05:15.300 --> 05:16.820
And the other big topic for today.

05:16.860 --> 05:20.020
Sub agents, which we've already met a bit.

05:20.060 --> 05:23.580
So a few things that sub agents are all about.

05:23.780 --> 05:32.060
It's all about dividing your bigger agentic problem into small steps that you can test independently.

05:32.060 --> 05:37.420
That a sub agent, another AI agent, can manage itself and be responsible for.

05:37.820 --> 05:45.580
It's about being able to reuse the same AI agent as part of multiple larger agentic flows.

05:45.580 --> 05:50.700
If you've got one task, perhaps it might be, say, finding prospects.

05:51.020 --> 05:55.620
If that task is something that you might want to be able to reuse, maybe behind an MCP server, maybe

05:55.620 --> 05:57.920
in another project, maybe in another couple of projects.

05:57.920 --> 06:04.400
Then building that out as a subagent that carries out its task, is tested and evaluated, meets the

06:04.400 --> 06:09.360
criteria, then that is great for reusing it in different agentic flows.

06:09.360 --> 06:12.040
That's another good reason to use a subagent.

06:12.320 --> 06:16.120
And of course, super important: context engineering.

06:16.280 --> 06:21.680
If you've got like 12 different tools that you're equipping the same model with, it might get to a

06:21.680 --> 06:23.800
point where it's starting to lose coherence.

06:23.800 --> 06:26.720
It's not able to track all these different bits of functionality.

06:26.760 --> 06:32.520
You might find that there's a sensible group of them, that one particular task only needs that group

06:32.520 --> 06:35.880
of tools, and no other task needs that group of tools.

06:35.880 --> 06:39.160
That sounds like that's going to be a great thing to experiment with.

06:39.200 --> 06:40.720
Try breaking that off.

06:40.760 --> 06:47.400
Have that be a separate subagent that is then offered up as a single tool to the bigger agent, and

06:47.400 --> 06:49.840
now you've just optimized your context.

06:49.880 --> 06:54.160
The context window for the subagent is very efficient, and you've reduced the window,

06:54.200 --> 06:58.750
the amount of tokens you need for the overall agent as well.

06:58.750 --> 07:00.950
So that can be a great trade to make.
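
NOTE
A minimal sketch of offering a subagent to the bigger agent as a single tool.
The Tool and Agent classes are hypothetical, not any platform's API, and the
lambdas are stubs standing in for real tool and LLM calls.
```python
from dataclasses import dataclass, field
from typing import Callable
#
@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]
#
@dataclass
class Agent:
    name: str
    tools: list[Tool] = field(default_factory=list)
    def tool_context(self) -> str:
        # Tool descriptions that would be packed into this agent's context.
        return "\n".join(f"{t.name}: {t.description}" for t in self.tools)
    def as_tool(self, description: str, run: Callable[[str], str]) -> Tool:
        # Present the whole agent to a parent agent as one tool.
        return Tool(self.name, description, run)
#
prospecting = Agent("prospecting", [
    Tool("search_web", "Search the web for companies.", lambda q: "results"),
    Tool("lookup_contact", "Find a contact at a company.", lambda q: "contact"),
    Tool("score_lead", "Score how promising a lead is.", lambda q: "score"),
])
main = Agent("main", [
    prospecting.as_tool("Find sales prospects for a region.", lambda region: f"prospects for {region}"),
    Tool("send_email", "Send an email.", lambda body: "sent"),
])
```
In this sketch the main agent's context carries two tool descriptions instead of four.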

07:00.990 --> 07:05.310
And now n8n gives us a particularly easy way to implement subagents.

07:05.310 --> 07:11.550
They can be implemented using these things called sub workflows, which we've actually already seen

07:11.590 --> 07:13.190
because I did that yesterday.

07:13.390 --> 07:19.910
When we built out the prospecting subagent, I called it a sub agent rather sneakily without even

07:19.910 --> 07:21.230
defining a sub agent.

07:21.270 --> 07:24.110
We called it a prospecting subagent.

07:24.110 --> 07:25.190
That's what we created.

07:25.190 --> 07:26.990
And I set it up as a workflow.

07:27.310 --> 07:27.990
And I had it.

07:27.990 --> 07:30.030
I added the trigger.

07:30.030 --> 07:33.790
I had it be something that was triggered by a different workflow.

07:33.990 --> 07:38.910
So it effectively was a sub workflow because it could be called by another workflow.

07:38.910 --> 07:44.990
And doing it that way is a very nice, easy, simple way to say all of this workflow is itself a sub

07:44.990 --> 07:45.710
agent.

07:45.710 --> 07:48.150
And so n8n makes it very simple for us.

07:48.150 --> 07:53.590
So it's easy to build up these workflows, test them, give them a limited number of tools that make

07:53.590 --> 07:55.230
sense for the task at hand.

07:55.330 --> 08:00.090
Have it be something that we can evaluate, have it be something we can test, we can sign it off,

08:00.090 --> 08:01.930
we can publish it to production.

08:01.930 --> 08:07.130
And then that is something we can use from a different agent, which will have a more efficient use

08:07.130 --> 08:10.730
of its context, because we farmed everything out to the subagent.

08:10.770 --> 08:12.090
It makes total sense.

08:12.090 --> 08:16.210
And I should mention you don't need to have a sub workflow to have a sub agent.

08:16.250 --> 08:20.170
A sub agent can be as simple as, hey, you're in a workflow.

08:20.210 --> 08:26.330
You put down one AI agent, the output you put into another AI agent, that output you put into a third

08:26.370 --> 08:31.890
AI agent, and then you've just got like three AI agents in a row and you can say, these are three

08:31.930 --> 08:35.210
sub agents that are working together on a bigger task.

08:35.210 --> 08:38.290
So, you know, you can organize agents in many different ways.

08:38.610 --> 08:44.370
A sub agent is really just a name for saying you've got a smaller part of your bigger agentic problem,

08:44.370 --> 08:49.170
and that is itself an AI agent, which, you know, seems a very reasonable thing to do.

08:49.330 --> 08:55.470
It's just that it's quite nice and elegant to use a sub workflow as your subagent, but it's not

08:55.470 --> 08:55.950
required.

08:55.990 --> 09:00.230
Okay, so those are the good ways to use subagents or the good reasons to use them.

09:00.830 --> 09:02.550
A few things to watch out for.

09:02.950 --> 09:09.830
First of all, beware of the thing that I called the human trap, which I mentioned all the way back

09:09.830 --> 09:12.510
in week one, week one, day two.

09:12.550 --> 09:18.510
I do believe, and that is when you see people that are anthropomorphizing, they are coming up with

09:18.550 --> 09:23.150
agents with human like roles because it just sort of sounds good.

09:23.150 --> 09:24.230
It feels right.

09:24.230 --> 09:29.110
We should have these different agents that have a junior this and a senior this and do this and this

09:29.110 --> 09:31.910
and this, because that's how our org is structured.

09:31.910 --> 09:38.070
And that's why they choose the agents to have those responsibilities, human like responsibilities.

09:38.070 --> 09:43.310
It might turn out that that is a good way to do it in order to divide your problem into smaller steps.

09:43.310 --> 09:48.110
Maybe there was a good reason why humans were put into those roles, but still, it shouldn't be

09:48.150 --> 09:50.510
first and foremost why you do it.

09:50.510 --> 09:56.730
You should divide into subagents because it gives you independently testable steps because it improves

09:56.730 --> 09:59.850
your evals, because it fixes a particular problem.

09:59.850 --> 10:05.010
Not just because it sounds like those are responsibilities that should make sense.

10:05.090 --> 10:13.170
So the other thing to be aware of with subagents is that it constrains the way that your agentic platform

10:13.170 --> 10:13.770
runs.

10:13.810 --> 10:18.210
You are putting some constructs around what has to happen.

10:18.210 --> 10:23.890
This overall agent is going to only be able to call these fixed subagents.

10:24.130 --> 10:28.850
And in doing so, you are reducing some of the flexibility that you are giving your LLM.

10:28.890 --> 10:32.050
It's not able to decide, you know what, I'm going to do things a bit differently.

10:32.090 --> 10:33.210
I'm going to call this tool.

10:33.210 --> 10:35.210
And then the other one over here and then this one.

10:35.210 --> 10:38.970
But rather you're packaging your tools into fixed subagents.

10:38.970 --> 10:45.130
So whilst you're giving this the benefit of the smaller, independently testable steps, you are removing

10:45.130 --> 10:50.490
some of the flexibility. That is the trade off that you are making, and you just have to be cognizant

10:50.490 --> 10:51.730
of that trade off.

10:52.050 --> 10:57.270
And typically when you do it this way with subagents, you do have more to build.

10:57.270 --> 11:01.510
It's like you've got to build each of these subagents, you've got to test them all independently.

11:01.510 --> 11:03.150
So it's more work.

11:03.150 --> 11:09.750
Building with subagents is harder than just simply setting up one main AI agent, equipping it with

11:09.750 --> 11:12.790
a ton of tools and saying, hey, this is your goal.

11:12.830 --> 11:14.510
Go do what agents do.

11:14.550 --> 11:18.110
Go run these tools in a loop with a goal.

11:18.150 --> 11:22.550
Make it happen. That is easy to code and wonderful when it works.

11:22.910 --> 11:29.270
So building subagents is harder work and takes longer, but of course it's easier to test it because

11:29.270 --> 11:34.350
you've got these much more constrained, independently testable steps.

11:34.350 --> 11:40.910
And typically it is, as a result, more bulletproof, more predictable because you've got it on

11:40.910 --> 11:41.590
rails.

11:41.790 --> 11:48.310
It's worth pointing out this trade off between being more flexible, more autonomous, and being more

11:48.310 --> 11:50.110
bulletproof and reliable.

11:50.110 --> 11:54.970
That's something I just mentioned a few minutes ago as well, in the context of structured outputs versus

11:54.970 --> 11:55.690
tools.

11:55.690 --> 11:59.730
So you'll see this sort of familiar pattern; we're coming back to that again with subagents.

11:59.730 --> 12:05.410
And in fact, many of the decisions that you make with context engineering come down to this

12:05.410 --> 12:09.490
trade off between autonomy, flexibility and reliability.

12:09.490 --> 12:16.050
And I'll tell you that when I see real systems implemented in production, they tend to lean heavily

12:16.050 --> 12:19.250
towards this more bulletproof approach.

12:19.290 --> 12:24.930
Structured outputs, subagents, dividing your problem into smaller, testable units.

12:25.130 --> 12:28.650
That's how Agentic AI is being deployed to production today.

12:29.130 --> 12:34.850
The only exception is the more high tech tools like Claude Code, where you're more willing to just

12:34.850 --> 12:37.610
give a bunch of tools to a model and let it go at it.

12:37.610 --> 12:42.050
And you have some tolerance for failure because you need to see the innovation.

12:42.050 --> 12:47.530
So understanding these trade offs and keeping them front of mind, that is really what it

12:47.530 --> 12:52.490
takes to move from being someone who's got some experience in this field to being a pro.