1
00:00:07,000 --> 00:00:11,000
So we all know what Retrieval Augmented Generation is.

2
00:00:11,000 --> 00:00:13,000
Let's just do a quick refresher.

3
00:00:13,000 --> 00:00:17,000
Retrieval Augmented Generation is a powerful and popular pipeline

4
00:00:17,000 --> 00:00:20,000
that enhances responses from a large language model.

5
00:00:20,000 --> 00:00:24,000
It does this by incorporating relevant data retrieved from a vector database,

6
00:00:24,000 --> 00:00:29,000
adding it as context to the prompt, and sending it to the LLM for generation.

7
00:00:29,000 --> 00:00:34,000
This allows the LLM to ground its response in concrete and accurate information,

8
00:00:34,000 --> 00:00:38,000
and that improves the quality and reliability of the response.

9
00:00:38,000 --> 00:00:40,000
Let me quickly sketch it out.

10
00:00:40,000 --> 00:00:47,000
So let's say we have a user, or an application even,

11
00:00:47,000 --> 00:00:51,000
and they send a query.

12
00:00:51,000 --> 00:00:53,000
Now, without Retrieval Augmented Generation,

13
00:00:53,000 --> 00:01:01,000
this query is going to go and get itself interpolated into a prompt.

14
00:01:01,000 --> 00:01:07,000
And from there, that's going to hit the LLM,

15
00:01:07,000 --> 00:01:14,000
and that's going to generate an output.

16
00:01:14,000 --> 00:01:18,000
To make this RAG, we can add a vector database.

17
00:01:18,000 --> 00:01:21,000
So instead of just going directly and getting itself interpolated into the prompt,

18
00:01:21,000 --> 00:01:24,000
it's going to hit this vector DB,

19
00:01:24,000 --> 00:01:29,000
and the response from that vector DB is going to be used as context for the prompt.

20
00:01:29,000 --> 00:01:33,000
Now, in this typical RAG pipeline, we call the LLM only once,

21
00:01:33,000 --> 00:01:37,000
and we use it solely to generate a response.
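
As a rough sketch of the pipeline just drawn: the query hits a vector store, the top match becomes context in the prompt, and the LLM is called exactly once. Everything named here is an assumption for illustration only: the sample documents are made up, `embed` is a toy bag-of-words stand-in for a real embedding model, and `call_llm` is a placeholder rather than a real model call.

```python
import math
import re
from collections import Counter

# Made-up documents standing in for the contents of a vector database.
DOCS = [
    "Employees may work remotely up to three days per week.",
    "Expense reports are due by the fifth of each month.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Interpolate the retrieved context and the query into one prompt."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"

def call_llm(prompt: str) -> str:
    """Placeholder for the single LLM call (hypothetical, no real model)."""
    return f"[LLM output for a {len(prompt)}-character prompt]"

def rag_answer(query: str) -> str:
    """Typical RAG: retrieve, build the prompt, call the LLM once."""
    context = retrieve(query, DOCS)
    return call_llm(build_prompt(query, context))

if __name__ == "__main__":
    print(rag_answer("How many days can employees work remotely?"))
```

Note that the LLM appears only in the last step; it plays no part in choosing what gets retrieved.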

22
00:01:37,000 --> 00:01:41,000
But what if we could leverage the LLM not just for responses,

23
00:01:41,000 --> 00:01:43,000
but also for additional tasks,

24
00:01:43,000 --> 00:01:47,000
like deciding which vector database to query if we have multiple databases,

25
00:01:47,000 --> 00:01:50,000
or even determining the type of response to give?

26
00:01:50,000 --> 00:01:54,000
Should it answer with text, generate a chart, or even provide a code snippet?

27
00:01:54,000 --> 00:01:58,000
And that would all be dependent on the context of that query.

28
00:01:58,000 --> 00:02:05,000
So this is where the agentic RAG pipeline comes into play.

29
00:02:05,000 --> 00:02:09,000
In agentic RAG, we use the LLM as an agent,

30
00:02:09,000 --> 00:02:12,000
and the LLM goes beyond just generating a response.

31
00:02:12,000 --> 00:02:16,000
It takes on an active role and can make decisions that will improve both

32
00:02:16,000 --> 00:02:19,000
the relevance and accuracy of the retrieved data.

33
00:02:19,000 --> 00:02:24,000
Now let's explore how we can augment the initial process with an agent

34
00:02:24,000 --> 00:02:26,000
and a couple of different sources of data.

35
00:02:26,000 --> 00:02:32,000
So instead of just one single source, let's add a second.

36
00:02:32,000 --> 00:02:38,000
And the first one can be, you know, internal documentation, right?

37
00:02:38,000 --> 00:02:46,000
And the second one can be general industry knowledge.

38
00:02:46,000 --> 00:02:49,000
Now in the internal documentation, we're going to have things like policies,

39
00:02:49,000 --> 00:02:50,000
procedures, and guidelines.

40
00:02:50,000 --> 00:02:54,000
And the general knowledge base will have things like industry standards,

41
00:02:54,000 --> 00:02:57,000
best practices, and public resources.

42
00:02:57,000 --> 00:03:01,000
So how can we get the LLM to use the vector database that contains the data

43
00:03:01,000 --> 00:03:04,000
that will be most relevant to the query?

44
00:03:04,000 --> 00:03:11,000
Let's add that agent into this pipeline.

45
00:03:11,000 --> 00:03:15,000
Now this agent can intelligently decide which database to query

46
00:03:15,000 --> 00:03:17,000
based on the user's question.

47
00:03:17,000 --> 00:03:19,000
And the agent isn't making a random guess.

48
00:03:19,000 --> 00:03:23,000
It's leveraging the LLM's language understanding capabilities

49
00:03:23,000 --> 00:03:27,000
to interpret the query and determine its context.

50
00:03:27,000 --> 00:03:29,000
So if an employee asks,

51
00:03:29,000 --> 00:03:32,000
"What's the company's policy on remote work during the holidays?"

52
00:03:32,000 --> 00:03:34,000
the agent would route that to the internal documentation.

53
00:03:34,000 --> 00:03:38,000
And that response will be used as context for the prompt.

54
00:03:38,000 --> 00:03:40,000
But if the question is more general, like,

55
00:03:40,000 --> 00:03:44,000
"What are the industry standards for remote work in tech companies?"

56
00:03:44,000 --> 00:03:47,000
the agent's going to route that to the general knowledge database.

57
00:03:47,000 --> 00:03:50,000
And that context is going to be used within that prompt.

58
00:03:50,000 --> 00:03:53,000
Powered by an LLM and properly trained,

59
00:03:53,000 --> 00:03:55,000
the agent analyzes the query,

60
00:03:55,000 --> 00:03:58,000
and based on the understanding of the content and the context,

61
00:03:58,000 --> 00:04:00,000
decides which database to use.

62
00:04:00,000 --> 00:04:03,000
But users aren't always going to ask questions that are

63
00:04:03,000 --> 00:04:07,000
genuinely relevant to anything we have in our vector DBs.

64
00:04:07,000 --> 00:04:11,000
So what if someone asks a question that is just totally out of left field?

65
00:04:11,000 --> 00:04:14,000
Like, "Who won the World Series in 2015?"

66
00:04:14,000 --> 00:04:19,000
At that point, the agent can route it to a fail-safe.

67
00:04:21,000 --> 00:04:27,000
So because the agent is able to recognize the context of the query,

68
00:04:27,000 --> 00:04:31,000
it can recognize that the question isn't covered by either of the two databases we have.

69
00:04:31,000 --> 00:04:34,000
It could route it to the fail-safe and return back,

70
00:04:34,000 --> 00:04:38,000
"Sorry, I don't have the information you're looking for."
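
The routing described above can be sketched roughly like this, with an agent step that picks the internal documentation, the general knowledge base, or the fail-safe before any retrieval happens. This is a minimal sketch under stated assumptions: the `route` function uses keyword rules purely as an offline stand-in for the LLM's decision, and the source contents, function names, and `call_llm` placeholder are all hypothetical.

```python
import re

# Made-up contents for the two data sources from the sketch.
SOURCES = {
    "internal": ["Policy: remote work during the holidays requires manager approval."],
    "general": ["Many tech companies publish hybrid remote work guidelines."],
}

FAILSAFE_MESSAGE = "Sorry, I don't have the information you're looking for."

def route(query: str) -> str:
    """Stand-in for the agent's routing decision.

    A real agent would ask the LLM to choose, given the query and a
    description of each source. Keyword rules are used here only so the
    sketch runs without a model.
    """
    words = set(re.findall(r"[a-z]+", query.lower()))
    if words & {"company", "our", "policy", "procedure", "guideline"}:
        return "internal"
    if words & {"industry", "standards", "practices"}:
        return "general"
    return "failsafe"

def retrieve(source: str, query: str) -> list[str]:
    """Toy retrieval: return every document in the chosen source."""
    return SOURCES[source]

def call_llm(prompt: str) -> str:
    """Placeholder for the generation call (hypothetical, no real model)."""
    return f"[LLM output for a {len(prompt)}-character prompt]"

def agentic_rag(query: str) -> str:
    """Route first, then retrieve from the chosen source and generate."""
    source = route(query)
    if source == "failsafe":
        # Out-of-scope question: skip retrieval and generation entirely.
        return FAILSAFE_MESSAGE
    context = "\n".join(retrieve(source, query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return call_llm(prompt)

if __name__ == "__main__":
    for q in [
        "What's the company's policy on remote work during the holidays?",
        "What are the industry standards for remote work in tech companies?",
        "Who won the World Series in 2015?",
    ]:
        print(route(q), "->", agentic_rag(q))
```

The design point is that the routing decision happens before retrieval, so an out-of-scope question never touches either database.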

71
00:04:38,000 --> 00:04:43,000
This agentic RAG pipeline can be used in customer support systems and legal tech.

72
00:04:43,000 --> 00:04:46,000
For example, a lawyer can source answers to their questions

73
00:04:46,000 --> 00:04:48,000
from their internal briefs,

74
00:04:48,000 --> 00:04:52,000
and then, in another query, pull results from public case law databases.

75
00:04:52,000 --> 00:04:55,000
The agent can be utilized in a ton of ways.

76
00:04:55,000 --> 00:04:59,000
Agentic RAG is an evolution in how we enhance the RAG pipeline

77
00:04:59,000 --> 00:05:02,000
by moving beyond simple response generation

78
00:05:02,000 --> 00:05:04,000
to more intelligent decision-making.

79
00:05:04,000 --> 00:05:07,000
By allowing an agent to choose the best data sources

80
00:05:07,000 --> 00:05:10,000
and potentially even incorporate external information,

81
00:05:10,000 --> 00:05:13,000
like real-time data or third-party services,

82
00:05:13,000 --> 00:05:16,000
we can create a pipeline that's more responsive,

83
00:05:16,000 --> 00:05:19,000
more accurate, and more adaptable.

84
00:05:19,000 --> 00:05:21,000
This approach opens up so many possibilities

85
00:05:21,000 --> 00:05:25,000
for applications in customer service, legal tech, healthcare,

86
00:05:25,000 --> 00:05:27,000
virtually any field.

87
00:05:27,000 --> 00:05:29,000
As the technology continues to evolve,

88
00:05:29,000 --> 00:05:32,000
we will see AI systems that truly understand context

89
00:05:32,000 --> 00:05:34,000
and can deliver amazing value.