WEBVTT

1
00:00:00.690 --> 00:00:02.460
<v Instructor>Before we talk about deep agents,</v>

2
00:00:02.460 --> 00:00:06.030
we need first to understand a bit of taxonomy

3
00:00:06.030 --> 00:00:09.300
and to talk about what kind of agents do we have

4
00:00:09.300 --> 00:00:12.000
currently in production in the industry.

5
00:00:12.000 --> 00:00:15.930
So let me simplify this and give you how I view things.

6
00:00:15.930 --> 00:00:18.630
So we have the domain of agents

7
00:00:18.630 --> 00:00:21.540
and here are going to be all types of agents

8
00:00:21.540 --> 00:00:23.160
we have now in the industry.

9
00:00:23.160 --> 00:00:26.970
Now they can be agents, they can be agentic applications

10
00:00:26.970 --> 00:00:30.510
like for example this hybrid RAG architecture,

11
00:00:30.510 --> 00:00:34.620
where we have an LLM which is deciding which step to make,

12
00:00:34.620 --> 00:00:39.210
whether to make a search, whether to rephrase the query

13
00:00:39.210 --> 00:00:40.530
before the retrieval.

14
00:00:40.530 --> 00:00:43.770
And we have an LLM making a decision what's going

15
00:00:43.770 --> 00:00:46.080
to be the next step to be executed.

16
00:00:46.080 --> 00:00:49.380
Now under this umbrella of agents,

17
00:00:49.380 --> 00:00:54.380
we have here the ReAct agent where we have an LLM,

18
00:00:54.480 --> 00:00:58.050
which is deciding whether to use a tool or not.

19
00:00:58.050 --> 00:01:01.860
We execute this tool, then we either decide that we want

20
00:01:01.860 --> 00:01:03.960
to return a response to the user if we have enough

21
00:01:03.960 --> 00:01:06.540
information and if not, this agent continues

22
00:01:06.540 --> 00:01:09.960
an execute tool gets the result, the observation

23
00:01:09.960 --> 00:01:14.130
and simply goes on and on and on until it gets an answer.

24
00:01:14.130 --> 00:01:17.280
And by the way, this is what started this all,

25
00:01:17.280 --> 00:01:19.140
this all agent paradigm started

26
00:01:19.140 --> 00:01:22.620
from this ReAct algorithm, this ReAct loop here.

27
00:01:22.620 --> 00:01:24.840
Now when I showed you the taxonomy

28
00:01:24.840 --> 00:01:27.960
and show you how I view the words of agents,

29
00:01:27.960 --> 00:01:32.130
I refer to the ReAct agent as a shallow agent.

30
00:01:32.130 --> 00:01:35.640
Now the reason why I refer to it as a shallow agent

31
00:01:35.640 --> 00:01:39.870
is because this agent doesn't have the capabilities

32
00:01:39.870 --> 00:01:41.730
of going really, really deep.

33
00:01:41.730 --> 00:01:44.100
So for example, to perform a very deep

34
00:01:44.100 --> 00:01:47.640
and thorough research, even though it has a Search tool

35
00:01:47.640 --> 00:01:49.680
and a Wiki tool and a Tavily tool,

36
00:01:49.680 --> 00:01:53.700
if I want to make a decent research,

37
00:01:53.700 --> 00:01:55.980
this agent is not going to be enough.

38
00:01:55.980 --> 00:01:59.190
And the reason for it is from this agent design

39
00:01:59.190 --> 00:02:02.760
and from the reality of modern LLMs.

40
00:02:02.760 --> 00:02:06.120
So this agent is based on function calling.

41
00:02:06.120 --> 00:02:08.970
And in this decision arrow here we have here,

42
00:02:08.970 --> 00:02:11.580
there is a function call that the LLM makes

43
00:02:11.580 --> 00:02:13.860
in order to choose the tool to execute.

44
00:02:13.860 --> 00:02:15.180
Now don't get me wrong,

45
00:02:15.180 --> 00:02:17.730
there is no problem with function calling.

46
00:02:17.730 --> 00:02:19.890
Actually there is a bit of problem with function calling.

47
00:02:19.890 --> 00:02:22.170
I discussed it when I talk about code mode.

48
00:02:22.170 --> 00:02:24.870
But the inherent limitation of this agent,

49
00:02:24.870 --> 00:02:26.160
what makes it shallow

50
00:02:26.160 --> 00:02:30.090
and not being able to perform deep tasks is that

51
00:02:30.090 --> 00:02:32.910
it has a very limited context window.

52
00:02:32.910 --> 00:02:36.300
Now we know LLMs have context window. This is not new.

53
00:02:36.300 --> 00:02:39.720
But because of this architecture, every time we're going

54
00:02:39.720 --> 00:02:42.720
to be making a decision then to execute the tool,

55
00:02:42.720 --> 00:02:45.060
then to plug the result into the LLM,

56
00:02:45.060 --> 00:02:48.660
we are going to be bloating the context.

57
00:02:48.660 --> 00:02:50.880
And if we're going to have one iteration,

58
00:02:50.880 --> 00:02:52.710
it's going to be maybe reasonable.

59
00:02:52.710 --> 00:02:55.200
But as we have more iterations,

60
00:02:55.200 --> 00:02:58.320
this context window keeps growing and growing and growing

61
00:02:58.320 --> 00:03:01.290
and this is going to be leading to context rot.

62
00:03:01.290 --> 00:03:04.830
So we can have context confusion, context contradiction,

63
00:03:04.830 --> 00:03:08.550
context pollution, and a lot of things that eventually

64
00:03:08.550 --> 00:03:11.490
are going to degrade the LLM's performance

65
00:03:11.490 --> 00:03:16.490
and are going to get our agents to go off rails.

66
00:03:16.500 --> 00:03:18.870
Just to emphasize, I'm talking here about the use case

67
00:03:18.870 --> 00:03:23.310
where we have a complex task, a deep task where we need

68
00:03:23.310 --> 00:03:27.150
to continue and continue and continue and gather information

69
00:03:27.150 --> 00:03:29.040
and deduce and process.

70
00:03:29.040 --> 00:03:31.830
And we have something which is long running,

71
00:03:31.830 --> 00:03:33.660
not simply book me a flight.

72
00:03:33.660 --> 00:03:35.580
We have something much more complex like

73
00:03:35.580 --> 00:03:36.660
to implement a feature

74
00:03:36.660 --> 00:03:39.930
or to implement a deep research on a topic.

75
00:03:39.930 --> 00:03:43.110
So up until now I just talk about the degradation

76
00:03:43.110 --> 00:03:45.170
of the performance of the LLM,

77
00:03:45.170 --> 00:03:47.490
of the quality of the results.

78
00:03:47.490 --> 00:03:49.380
Of course it's going to be costing us more money

79
00:03:49.380 --> 00:03:51.960
because it's more tokens for each call we make.

80
00:03:51.960 --> 00:03:55.050
So the LLM call is going to be heavier.

81
00:03:55.050 --> 00:03:56.850
So it's going to contain more tokens,

82
00:03:56.850 --> 00:03:59.670
it's going to cost us more, it's going to be also slower.

83
00:03:59.670 --> 00:04:03.720
And this agent, this ReAct agent architecture,

84
00:04:03.720 --> 00:04:05.190
while it is amazing

85
00:04:05.190 --> 00:04:07.980
and it's the basis for basically everything,

86
00:04:07.980 --> 00:04:12.980
it's not really usable for complex tasks, those deep tasks.

87
00:04:12.990 --> 00:04:15.540
It is actually very good for shallow tasks

88
00:04:15.540 --> 00:04:17.850
that don't require a lot of iterations.

89
00:04:17.850 --> 00:04:19.770
And we actually have a lot of those kinds

90
00:04:19.770 --> 00:04:21.120
of agents in production.

91
00:04:21.120 --> 00:04:22.770
I've worked with many customers

92
00:04:22.770 --> 00:04:25.410
and a lot of them have this kind of architecture.

93
00:04:25.410 --> 00:04:28.590
And for a lot of use cases, it is actually enough

94
00:04:28.590 --> 00:04:30.810
and we don't really need more than that.

95
00:04:30.810 --> 00:04:33.270
Let's go and look at the world of agents again

96
00:04:33.270 --> 00:04:36.990
and we explain this shallow and ReAct part here.

97
00:04:36.990 --> 00:04:40.050
So now let's start to talk about deep agents.

98
00:04:40.050 --> 00:04:43.710
And deep agents are agents that are able

99
00:04:43.710 --> 00:04:46.830
to perform those kinds of long horizon tasks.

100
00:04:46.830 --> 00:04:50.460
Those tasks that are complex that require a lot

101
00:04:50.460 --> 00:04:53.370
of iterations, a lot of processing.

102
00:04:53.370 --> 00:04:55.020
Those agents are long running.

103
00:04:55.020 --> 00:04:59.220
They can run for minutes, for hours, even for days.

104
00:04:59.220 --> 00:05:01.170
They can get even user input.

105
00:05:01.170 --> 00:05:04.320
They can stop their execution, they can then resume it

106
00:05:04.320 --> 00:05:06.210
after they receive the user input.

107
00:05:06.210 --> 00:05:09.240
And those are what are called deep agents.

108
00:05:09.240 --> 00:05:12.150
So we have here deep research agents

109
00:05:12.150 --> 00:05:14.370
and here for example, we have Perplexity

110
00:05:14.370 --> 00:05:17.280
and we have here a deep research feature.

111
00:05:17.280 --> 00:05:20.250
And this is going to trigger a deep research agent run.

112
00:05:20.250 --> 00:05:23.310
Now deep research agents are very, very common today.

113
00:05:23.310 --> 00:05:26.400
So almost every other vendor have kind of support

114
00:05:26.400 --> 00:05:27.840
for research agents.

115
00:05:27.840 --> 00:05:31.350
Here we can see in Claude Code the research option.

116
00:05:31.350 --> 00:05:35.550
And here we can see in ChatGPT also a research option.

117
00:05:35.550 --> 00:05:38.640
And all of those are going to trigger deep agents

118
00:05:38.640 --> 00:05:40.470
that are going to perform research.

119
00:05:40.470 --> 00:05:43.260
And every one of them is implemented a bit differently

120
00:05:43.260 --> 00:05:44.910
by the vendors.

121
00:05:44.910 --> 00:05:49.020
We have also open source deep research agents,

122
00:05:49.020 --> 00:05:51.660
for example GPT Researcher, which at the time

123
00:05:51.660 --> 00:05:53.010
of recording this video

124
00:05:53.010 --> 00:05:55.740
is one of the most popular open source projects.

125
00:05:55.740 --> 00:05:59.130
And I actually review it in depth in my LangGraph course.

126
00:05:59.130 --> 00:06:01.740
Alright, let's go back to the taxonomy here.

127
00:06:01.740 --> 00:06:05.280
So we talked about deep research agents as an example,

128
00:06:05.280 --> 00:06:09.900
and we also have coding agents like Claude Code, like Devin,

129
00:06:09.900 --> 00:06:14.220
like Cursor, like Gemini CLI, and the list goes on and on

130
00:06:14.220 --> 00:06:18.180
because coding agents are actually one of the best examples

131
00:06:18.180 --> 00:06:21.060
of how deep agents can be very, very effective

132
00:06:21.060 --> 00:06:22.770
and useful for the industry

133
00:06:22.770 --> 00:06:26.250
and actually used by millions of developers.

134
00:06:26.250 --> 00:06:28.620
And here I'm going to be focusing on Claude Code

135
00:06:28.620 --> 00:06:32.430
because it is currently the leading coding agent.

136
00:06:32.430 --> 00:06:36.690
Now Claude Code is extremely good, it's very,

137
00:06:36.690 --> 00:06:40.830
very useful and it can actually be used for a wide variety

138
00:06:40.830 --> 00:06:43.050
of tasks which are long-running.

139
00:06:43.050 --> 00:06:47.130
They need to perform deep, they need scale.

140
00:06:47.130 --> 00:06:50.010
And this is the top coding agent

141
00:06:50.010 --> 00:06:52.020
at least when making this video.

142
00:06:52.020 --> 00:06:54.330
So we can go and ask Claude

143
00:06:54.330 --> 00:06:57.900
or any other coding agent to implement us an application

144
00:06:57.900 --> 00:07:00.690
or a feature in the application and those agents can go

145
00:07:00.690 --> 00:07:02.730
and run and actually accomplish it.

146
00:07:02.730 --> 00:07:06.630
They can go and not only code the required code

147
00:07:06.630 --> 00:07:09.930
but also to run tests, to test the application,

148
00:07:09.930 --> 00:07:12.360
to open up a browser, to take screenshots

149
00:07:12.360 --> 00:07:14.580
and to do the very same things

150
00:07:14.580 --> 00:07:17.040
that a normal software engineer would do.

151
00:07:17.040 --> 00:07:22.040
So those coding agents are going to be a subset

152
00:07:22.170 --> 00:07:25.560
of deep agents which are tailor-made for coding.

153
00:07:25.560 --> 00:07:27.480
So this is their specialty.

154
00:07:27.480 --> 00:07:30.570
And as a side note here, deep agents are

155
00:07:30.570 --> 00:07:33.060
what today drive innovation

156
00:07:33.060 --> 00:07:36.510
and they are the top tier technology we have right now.

157
00:07:36.510 --> 00:07:39.720
And it's actually very interesting to see the fact

158
00:07:39.720 --> 00:07:42.600
that LLMs now are becoming better and better,

159
00:07:42.600 --> 00:07:46.140
but currently their improvements are pretty gradual.

160
00:07:46.140 --> 00:07:49.920
So we don't have this huge leap of reasoning

161
00:07:49.920 --> 00:07:51.780
or other capabilities

162
00:07:51.780 --> 00:07:54.780
that we see from each new model that comes out.

163
00:07:54.780 --> 00:07:56.550
The quality is getting better.

164
00:07:56.550 --> 00:08:00.210
Yes, the reasoning is better, the capabilities are better,

165
00:08:00.210 --> 00:08:01.530
but it grows gradually.

166
00:08:01.530 --> 00:08:02.940
It doesn't grow exponentially.

167
00:08:02.940 --> 00:08:04.980
However, we are now at the point

168
00:08:04.980 --> 00:08:09.480
that abstraction on abstractions of agents, on agents

169
00:08:09.480 --> 00:08:11.670
in the application layer, if we know how

170
00:08:11.670 --> 00:08:13.710
to use those LLMs correctly,

171
00:08:13.710 --> 00:08:15.870
we can achieve some very impressive

172
00:08:15.870 --> 00:08:19.590
and very capable machines that can automate

173
00:08:19.590 --> 00:08:23.070
a lot of human work, which requires a lot of reasoning.

174
00:08:23.070 --> 00:08:26.340
Things that we once thought that are impossible

175
00:08:26.340 --> 00:08:29.160
because if I were to tell you five years ago, for example,

176
00:08:29.160 --> 00:08:32.340
that we are going to have a technology that is able

177
00:08:32.340 --> 00:08:37.340
to create a working pretty beautiful application from zero

178
00:08:37.470 --> 00:08:40.290
with AI with no human intervention,

179
00:08:40.290 --> 00:08:42.360
you would probably say that I'm crazy.

180
00:08:42.360 --> 00:08:44.760
And the point I'm trying to make here is

181
00:08:44.760 --> 00:08:48.870
that the application layer today, the application layer

182
00:08:48.870 --> 00:08:51.330
that implements those kinds of deep agents,

183
00:08:51.330 --> 00:08:55.710
this is what's driving now the technology, not the LLMs

184
00:08:55.710 --> 00:08:57.060
that are getting better,

185
00:08:57.060 --> 00:09:01.260
but the application layer that we build as developers on top

186
00:09:01.260 --> 00:09:03.840
of them, how to harness those LLMs.

187
00:09:03.840 --> 00:09:06.810
And today, this is what pushes the boundaries

188
00:09:06.810 --> 00:09:09.330
and pounds on the innovation.

189
00:09:09.330 --> 00:09:12.270
This type of work, the application layer work,

190
00:09:12.270 --> 00:09:15.150
this deep agents implementation.

191
00:09:15.150 --> 00:09:18.450
And let's go back to the discussion on deep agents

192
00:09:18.450 --> 00:09:21.480
and what makes an agent a deep agent.

193
00:09:21.480 --> 00:09:24.240
So there isn't really a clear definition,

194
00:09:24.240 --> 00:09:28.410
but an agent that is able to perform a complex task,

195
00:09:28.410 --> 00:09:30.180
in my opinion is a deep agent.

196
00:09:30.180 --> 00:09:32.760
And in order for an agent to go

197
00:09:32.760 --> 00:09:35.580
and perform these kinds of research tasks

198
00:09:35.580 --> 00:09:38.610
or coding tasks which are complex

199
00:09:38.610 --> 00:09:40.980
and require long running,

200
00:09:40.980 --> 00:09:44.160
we need to have certain capabilities of those agents

201
00:09:44.160 --> 00:09:47.880
that are going to make sure that we get a quality result.

202
00:09:47.880 --> 00:09:51.300
Now it all boils down to context engineering

203
00:09:51.300 --> 00:09:55.470
and smart context management, which is going to allow us

204
00:09:55.470 --> 00:09:58.410
to solve this problem of the context bloat

205
00:09:58.410 --> 00:10:00.330
and context accumulation.

206
00:10:00.330 --> 00:10:04.350
And it's going to help us scale those agents

207
00:10:04.350 --> 00:10:07.320
to perform a lot of things concurrently.

208
00:10:07.320 --> 00:10:08.670
And to achieve this

209
00:10:08.670 --> 00:10:10.950
and to achieve this capability of performing

210
00:10:10.950 --> 00:10:13.530
and executing on long horizon tasks,

211
00:10:13.530 --> 00:10:17.160
most deep agents implement the following.

212
00:10:17.160 --> 00:10:21.480
First, they have a planning tool which is going to plan

213
00:10:21.480 --> 00:10:24.030
what the deep agent is going to do.

214
00:10:24.030 --> 00:10:27.270
They also have a subagents capability.

215
00:10:27.270 --> 00:10:30.660
So this means they can spawn specialized workers

216
00:10:30.660 --> 00:10:33.810
that are going to work in isolated context

217
00:10:33.810 --> 00:10:36.540
and going to help the deep agent scale

218
00:10:36.540 --> 00:10:39.300
and perform a lot of tasks concurrently

219
00:10:39.300 --> 00:10:41.550
without bloating the context.

220
00:10:41.550 --> 00:10:43.680
Thirdly, they have a file system,

221
00:10:43.680 --> 00:10:46.740
where they can write intermediate results

222
00:10:46.740 --> 00:10:49.530
and shared states between those agents

223
00:10:49.530 --> 00:10:52.170
and everything is not going to blow to the context

224
00:10:52.170 --> 00:10:54.210
because it's going to be written to the disc.

225
00:10:54.210 --> 00:10:58.650
And thirdly, they have a monstrous system prompt.

226
00:10:58.650 --> 00:11:01.980
If you're going to pick up today one deep agent,

227
00:11:01.980 --> 00:11:05.520
it's going to probably implement those kinds of four ideas.

228
00:11:05.520 --> 00:11:07.680
And in the next video, I'll be diving

229
00:11:07.680 --> 00:11:09.060
and explaining those ideas,

230
00:11:09.060 --> 00:11:11.110
including show you implementations of it.

