1
00:00:00,000 --> 00:00:07,699
What are AI agents?

2
00:00:07,699 --> 00:00:11,079
And to start explaining that, we have to look at the various shifts that we're seeing in

3
00:00:11,079 --> 00:00:12,479
the field of generative AI.

4
00:00:12,479 --> 00:00:19,520
And the first shift I would like to talk to you about is this move from monolithic models

5
00:00:19,520 --> 00:00:28,680
to compound AI systems.

6
00:00:28,680 --> 00:00:33,299
So models on their own are limited by the data they've been trained on.

7
00:00:33,299 --> 00:00:42,360
So that impacts what they know about the world and what sort of tasks they can solve.

8
00:00:42,360 --> 00:00:47,040
They are also hard to adapt, so you could tune a model, but it would take an investment

9
00:00:47,040 --> 00:00:54,040
in data and in resources.

10
00:00:54,040 --> 00:00:57,380
So let's take a concrete example to illustrate this point.

11
00:00:57,380 --> 00:01:02,779
I want to plan a vacation for this summer, and I want to know how many vacation days

12
00:01:02,779 --> 00:01:08,300
are at my disposal.

13
00:01:08,300 --> 00:01:22,059
What I can do is take my query, feed that into a model that can generate a response.

14
00:01:22,059 --> 00:01:26,900
I think we can all expect that this answer will be incorrect, because the model doesn't

15
00:01:26,900 --> 00:01:32,860
know who I am and does not have access to this sensitive information about me.

16
00:01:32,860 --> 00:01:37,419
So models on their own could be useful for a number of tasks, as we've seen in other

17
00:01:37,419 --> 00:01:38,419
videos.

18
00:01:38,419 --> 00:01:40,220
So they can help with summarizing documents.

19
00:01:40,220 --> 00:01:44,319
They can help me with creating first drafts for emails and different reports I'm trying

20
00:01:44,319 --> 00:01:45,459
to do.

21
00:01:45,459 --> 00:01:50,459
But the magic gets unlocked when I start building systems around the model and actually take

22
00:01:50,459 --> 00:01:54,980
the model and integrate it into the existing processes I have.

23
00:01:55,059 --> 00:02:00,660
So if we were to design a system to solve this, I would have to give the model access

24
00:02:00,660 --> 00:02:05,099
to the database where my vacation data is stored.

25
00:02:05,099 --> 00:02:09,419
So that same query would get fed into the language model.

26
00:02:09,419 --> 00:02:15,380
The difference now is the model would be prompted to create a search query.

27
00:02:15,380 --> 00:02:20,300
And that would be a search query that can go into the database that I have.

28
00:02:20,300 --> 00:02:26,100
So that would go and fetch the information from the database, output an answer, and then

29
00:02:26,100 --> 00:02:31,179
that would go back into the model that can generate a sentence to answer.

30
00:02:31,179 --> 00:02:35,639
So Maya, you have 10 days left in your vacation database.

31
00:02:35,639 --> 00:02:44,339
So the answer that I would get here would be correct.

32
00:02:44,339 --> 00:02:47,460
This is an example of a compound AI system.

33
00:02:47,460 --> 00:02:51,699
And it recognizes that certain problems are better solved when you apply the principles

34
00:02:51,699 --> 00:02:57,779
of system design.

35
00:02:57,779 --> 00:03:00,139
So what does that mean?

36
00:03:00,139 --> 00:03:04,100
By the term system, you can understand there's multiple components.

37
00:03:04,100 --> 00:03:06,899
So systems are inherently modular.

38
00:03:06,899 --> 00:03:07,899
I can have a model.

39
00:03:07,899 --> 00:03:13,820
I can choose between tuned models, large language models, image generation models.

40
00:03:13,820 --> 00:03:17,020
But also I have programmatic components that can come around it.

41
00:03:17,580 --> 00:03:20,020
So I can have output verifiers.

42
00:03:20,020 --> 00:03:25,059
I can have programs that can take a query and then break it down to increase the chances

43
00:03:25,059 --> 00:03:26,860
of the answer being correct.

44
00:03:26,860 --> 00:03:29,580
I can combine that with searching databases.

45
00:03:29,580 --> 00:03:32,059
I can combine that with different tools.

46
00:03:32,059 --> 00:03:38,059
So when we're talking about a system approach, I can break down what I desire my program

47
00:03:38,059 --> 00:03:42,500
to do and pick the right components to be able to solve that.

48
00:03:42,500 --> 00:03:46,779
And this is inherently easier to solve for than tuning a model.

49
00:03:46,820 --> 00:03:50,179
So that makes this much faster and quicker to adapt.

50
00:03:56,580 --> 00:03:57,460
OK.

51
00:03:57,460 --> 00:04:03,139
So the example I used below is an example of a compound AI system.

52
00:04:03,139 --> 00:04:09,020
You also might be familiar with retrieval-augmented generation, which is one of the most popular

53
00:04:09,020 --> 00:04:13,539
and commonly used compound AI systems out there.

54
00:04:13,539 --> 00:04:19,859
Most RAG systems, and the example I used below, are defined in a certain way.

55
00:04:19,859 --> 00:04:24,380
So if I bring a very different query, let's say I ask about the weather in this example

56
00:04:24,380 --> 00:04:31,660
here, it's going to fail because the path that this program has to follow is to always

57
00:04:31,660 --> 00:04:36,179
search my vacation policy database, and that has nothing to do with the weather.

58
00:04:36,179 --> 00:04:40,899
So when we say the path to answer a query, we are talking about something called the

59
00:04:40,899 --> 00:04:45,140
control logic of a program.

60
00:04:45,140 --> 00:04:51,899
So compound AI systems, we said most of them have programmatic control logic.

61
00:04:51,899 --> 00:04:57,540
So that was something that I defined myself as the human.

62
00:04:57,540 --> 00:05:01,980
Now, let's talk about where agents come in.

63
00:05:01,980 --> 00:05:07,940
One other way of controlling the logic of a compound AI system is to put a large language

64
00:05:07,940 --> 00:05:09,820
model in charge.

65
00:05:10,059 --> 00:05:14,579
And this is only possible because we're seeing tremendous improvements in the capabilities

66
00:05:14,579 --> 00:05:16,940
of reasoning of large language models.

67
00:05:16,940 --> 00:05:21,500
So large language models, you can feed them complex problems, and you can prompt them

68
00:05:21,500 --> 00:05:25,739
to break them down and come up with a plan on how to tackle it.

69
00:05:25,739 --> 00:05:30,459
Another way to think about it is on one end of the spectrum, I'm telling my system to

70
00:05:30,459 --> 00:05:36,339
think fast, act as programmed, don't deviate from the instructions I've given you.

71
00:05:36,339 --> 00:05:42,140
And on the other end of the spectrum, you're designing your system to think slow.

72
00:05:42,140 --> 00:05:47,540
So create a plan, attack each part of the plan, see where you get stuck, see if you

73
00:05:47,540 --> 00:05:49,420
need to readjust the plan.

74
00:05:49,420 --> 00:05:53,739
So I might give you a complex question, and if you would just give me the first answer

75
00:05:53,739 --> 00:05:57,600
that pops into your head, very likely that answer might be wrong.

76
00:05:57,600 --> 00:06:01,940
But you have higher chances of success if you break it down, understand where you

77
00:06:01,940 --> 00:06:05,779
need external help to solve some parts of the problem, and maybe take an afternoon to

78
00:06:05,779 --> 00:06:07,100
solve it.

79
00:06:07,100 --> 00:06:13,140
And when we put LLMs in charge of the logic, this is when we're talking about an agentic

80
00:06:13,140 --> 00:06:15,420
approach.

81
00:06:15,420 --> 00:06:21,899
So let's break down the components of LLM agents.

82
00:06:21,899 --> 00:06:26,500
The first capability is the ability to reason, which we talked about.

83
00:06:26,500 --> 00:06:31,339
So this is putting the model at the core of how problems are being solved.

84
00:06:31,339 --> 00:06:35,579
The model will be prompted to come up with a plan and to reason about each step of the

85
00:06:35,579 --> 00:06:37,899
process along the way.

86
00:06:37,899 --> 00:06:41,779
Another capability of agents is the ability to act.

87
00:06:41,779 --> 00:06:47,739
And this is done by external programs that are known in the industry as tools.

88
00:06:47,739 --> 00:06:52,940
So tools are external pieces of the program, and the model can define when to call them

89
00:06:52,940 --> 00:06:57,540
and how to call them in order to best execute the solution to the question they've been

90
00:06:57,540 --> 00:06:58,839
asked.

91
00:06:58,839 --> 00:07:05,940
So an example of a tool could be search, searching the web, searching a database at their disposal.

92
00:07:05,940 --> 00:07:10,100
Another example could be a calculator to do some math.

93
00:07:10,100 --> 00:07:15,279
This could be a piece of program code that might manipulate the database.

94
00:07:15,279 --> 00:07:19,880
This can also be another language model that maybe you're trying to do a translation task

95
00:07:19,880 --> 00:07:23,119
and you want a model that is able to do that.

96
00:07:23,119 --> 00:07:25,519
And there's so many other possibilities of what you can do there.

97
00:07:25,519 --> 00:07:30,160
So these could be APIs, basically any piece of external program you want to give your

98
00:07:30,160 --> 00:07:32,320
model access to.

99
00:07:32,320 --> 00:07:37,160
Third capability, that is the ability to access memory.

100
00:07:37,160 --> 00:07:39,480
And the term memory can mean a couple of things.

101
00:07:39,480 --> 00:07:45,200
So we talked about the models thinking through the problem, kind of how you think out loud

102
00:07:45,200 --> 00:07:47,799
when you're trying to solve through a problem.

103
00:07:47,799 --> 00:07:52,760
So those inner logs can be stored and can be useful to retrieve at different points

104
00:07:52,760 --> 00:07:53,760
in time.

105
00:07:53,760 --> 00:07:58,760
But also this could be the history of conversations that you as a human had when interacting with

106
00:07:58,760 --> 00:07:59,760
the agent.

107
00:07:59,760 --> 00:08:03,480
And that would allow us to make the experience much more personalized.

108
00:08:03,480 --> 00:08:07,640
So the way of configuring AI agents, there's many ways to approach it.

109
00:08:07,640 --> 00:08:13,760
One of the most popular ways of going about it is through something called ReAct, which

110
00:08:13,760 --> 00:08:20,940
as you can tell by the name, combines the reasoning and act components of LLM agents.

111
00:08:20,940 --> 00:08:23,519
So let's make this very concrete.

112
00:08:24,279 --> 00:08:25,839
What happens when I configure a ReAct agent?

113
00:08:25,839 --> 00:08:28,959
You have your user query.

114
00:08:28,959 --> 00:08:31,920
That gets fed into a model, so an LLM.

115
00:08:31,920 --> 00:08:33,599
The LLM is given a prompt.

116
00:08:33,599 --> 00:08:39,599
So the instructions it's given is, don't give me the first answer that pops to you.

117
00:08:39,599 --> 00:08:40,719
Think slow.

118
00:08:40,719 --> 00:08:44,559
Plan your work.

119
00:08:44,559 --> 00:08:46,760
And then try to execute something.

120
00:08:46,760 --> 00:08:48,299
Try to act.

121
00:08:48,299 --> 00:08:53,039
And when you want to act, you can define whether you want to use external tools to help

122
00:08:53,039 --> 00:08:58,320
you come up with the solution, once you call a tool and you get an answer, maybe it gave

123
00:08:58,320 --> 00:09:03,479
you the wrong answer or it came up with an error, you can observe that.

124
00:09:03,479 --> 00:09:08,400
So the LLM would observe the answer, would determine if it does answer the question at

125
00:09:08,400 --> 00:09:13,640
hand or whether it needs to iterate on the plan and tackle it differently up until I

126
00:09:13,640 --> 00:09:18,859
get to a final answer.

127
00:09:18,859 --> 00:09:22,340
So let's go back and make this very concrete again.

128
00:09:22,340 --> 00:09:25,340
Let's talk about my vacation example.

129
00:09:25,340 --> 00:09:29,159
And as you can tell, I'm really excited to go on one.

130
00:09:29,159 --> 00:09:31,619
So I want to take the rest of my vacation days.

131
00:09:31,619 --> 00:09:33,859
I'm planning to go to Florida next month.

132
00:09:33,859 --> 00:09:36,340
I'm planning on being outdoors a lot.

133
00:09:36,340 --> 00:09:37,739
And I'm prone to burning.

134
00:09:37,739 --> 00:09:43,979
So I want to know, what is the number of two-ounce sunscreen bottles that I should bring with

135
00:09:43,979 --> 00:09:44,979
me?

136
00:09:44,979 --> 00:09:46,260
And this is a complex problem.

137
00:09:46,260 --> 00:09:49,700
So there's a number of things to plan.

138
00:09:49,700 --> 00:09:52,659
One is how many vacation days am I planning to take?

139
00:09:52,659 --> 00:09:57,099
And maybe that is information the system can retrieve from its memory because I asked that

140
00:09:57,099 --> 00:09:58,700
question before.

141
00:09:58,700 --> 00:10:01,059
Two is how many hours do I plan to be in the sun?

142
00:10:01,059 --> 00:10:02,880
I said I plan to be in there a lot.

143
00:10:02,880 --> 00:10:08,059
So maybe that would mean looking into the weather forecast for next month in Florida

144
00:10:08,059 --> 00:10:12,340
and seeing what is the average sun hours that are expected.

145
00:10:12,340 --> 00:10:17,640
Three is maybe going to a public health website to understand what is the recommended dosage

146
00:10:17,640 --> 00:10:19,760
of sunscreen per hour in the sun.

147
00:10:19,760 --> 00:10:24,299
And then four, doing some math to be able to determine how much of that sunscreen fits

148
00:10:24,299 --> 00:10:25,859
into two-ounce bottles.

149
00:10:25,859 --> 00:10:27,580
So that's quite complicated.

150
00:10:27,580 --> 00:10:31,799
But what's really powerful here is there's so many paths that can be explored in order

151
00:10:31,799 --> 00:10:32,880
to solve a problem.

152
00:10:32,880 --> 00:10:35,080
So this makes the system quite modular.

153
00:10:35,080 --> 00:10:39,020
And I can hit it with much more complex problems.

154
00:10:39,020 --> 00:10:45,280
So going back to the concept of compound AI systems, compound AI systems are here to stay.

155
00:10:45,280 --> 00:10:49,479
What we're going to observe this year is that they're going to become more agentic.

156
00:10:49,479 --> 00:11:00,880
The way I like to think about it is you have a sliding scale of LLM autonomy.

157
00:11:00,880 --> 00:11:08,460
And the person defining the system would examine what trade-offs they want in terms of autonomy

158
00:11:08,460 --> 00:11:10,119
in the system.

159
00:11:10,119 --> 00:11:15,200
For certain problems, especially problems that are narrow, well-defined, so you don't

160
00:11:15,200 --> 00:11:19,440
expect someone to ask about the weather when they mean to ask about vacations.

161
00:11:19,440 --> 00:11:24,159
So a narrow problem set, you can define a narrow system like this one.

162
00:11:24,159 --> 00:11:28,679
It's more efficient to go the programmatic route because every single query will be answered

163
00:11:28,679 --> 00:11:29,679
the same way.

164
00:11:29,679 --> 00:11:35,700
If I were to apply the agentic approach here, there might be unnecessary looping and iteration.

165
00:11:35,700 --> 00:11:40,460
So for narrow problems, a programmatic approach can be more efficient than going the agentic

166
00:11:40,460 --> 00:11:41,460
route.

167
00:11:41,460 --> 00:11:48,380
But if I expect to have a system accomplish very complex tasks, like, say, trying to solve

168
00:11:48,380 --> 00:11:55,020
GitHub issues independently and handle a variety of queries, a spectrum of queries, this is

169
00:11:55,020 --> 00:11:59,179
where an agentic route could be helpful because it would take you too much effort to configure

170
00:11:59,179 --> 00:12:01,659
every single path in the system.

171
00:12:01,659 --> 00:12:04,820
And we're still in the early days of agentic systems.

172
00:12:04,940 --> 00:12:09,299
So we're seeing rapid progress when you combine the effects of system design with agentic

173
00:12:09,299 --> 00:12:10,979
behavior.

174
00:12:10,979 --> 00:12:14,979
And of course, you will have a human in the loop in most cases as the accuracy is improving.