WEBVTT

1
00:00:00.180 --> 00:00:01.920
<v ->Hey there, Eden here.</v>

2
00:00:01.920 --> 00:00:06.390
And up until now, we saw two ways to do RAG in the course.

3
00:00:06.390 --> 00:00:08.640
The first one was two-step RAG,

4
00:00:08.640 --> 00:00:12.900
and this was done with the LangChain Expression Language.

5
00:00:12.900 --> 00:00:14.160
And here, the retrieval

6
00:00:14.160 --> 00:00:16.650
always happens before the generation,

7
00:00:16.650 --> 00:00:18.510
simple and predictable.

8
00:00:18.510 --> 00:00:20.370
So here we have a lot of control

9
00:00:20.370 --> 00:00:22.740
over when the retrieval is going to happen.

10
00:00:22.740 --> 00:00:26.370
It's not that flexible because it's always going to happen.

11
00:00:26.370 --> 00:00:28.890
We're always going to retrieve the documents.

12
00:00:28.890 --> 00:00:30.390
And it's really, really fast

13
00:00:30.390 --> 00:00:33.360
because we don't have an LLM deciding

14
00:00:33.360 --> 00:00:35.670
whether we need to do retrieval or not.

15
00:00:35.670 --> 00:00:37.290
And the second thing we saw,

16
00:00:37.290 --> 00:00:39.810
which LangChain calls agentic RAG,

17
00:00:39.810 --> 00:00:42.420
but I call it a RAG agent.

18
00:00:42.420 --> 00:00:45.960
So here we took our ReAct agent

19
00:00:45.960 --> 00:00:48.360
and we simply gave it a retrieval tool.

20
00:00:48.360 --> 00:00:51.300
And here the LLM decided

21
00:00:51.300 --> 00:00:55.200
when and how to retrieve during the reasoning process.

22
00:00:55.200 --> 00:00:57.150
And here we have much less control

23
00:00:57.150 --> 00:00:59.070
because we don't really control

24
00:00:59.070 --> 00:01:01.260
when the retrieval is going to happen.

25
00:01:01.260 --> 00:01:02.760
So it's really flexible

26
00:01:02.760 --> 00:01:05.190
because the LLM decides what to do,

27
00:01:05.190 --> 00:01:07.740
and the latency here in the RAG agent

28
00:01:07.740 --> 00:01:10.050
is higher than in the two-step RAG,

29
00:01:10.050 --> 00:01:12.120
because here we have another LLM call

30
00:01:12.120 --> 00:01:14.040
before we even do the retrieval.

31
00:01:14.040 --> 00:01:18.360
And if we have a bunch of LLM calls and then the retrieval,

32
00:01:18.360 --> 00:01:20.220
the latency can vary a lot.

33
00:01:20.220 --> 00:01:22.980
And there is also a hybrid architecture,

34
00:01:22.980 --> 00:01:26.220
so it's going to combine elements from the agentic RAG

35
00:01:26.220 --> 00:01:27.540
and the two-step RAG.

36
00:01:27.540 --> 00:01:29.400
And this is actually something

37
00:01:29.400 --> 00:01:31.920
I'm going to show you in the LangGraph section,

38
00:01:31.920 --> 00:01:33.360
because we're going to be implementing

39
00:01:33.360 --> 00:01:35.580
this kind of approach with LangGraph.

40
00:01:35.580 --> 00:01:39.300
Now, the question that comes up is which approach is better?

41
00:01:39.300 --> 00:01:42.030
And the answer for this question

42
00:01:42.030 --> 00:01:44.910
is it depends on your use case.

43
00:01:44.910 --> 00:01:47.340
However, hybrid RAG is something

44
00:01:47.340 --> 00:01:50.430
which is going to combine the best of both worlds.

45
00:01:50.430 --> 00:01:54.270
And from my experience working with production systems

46
00:01:54.270 --> 00:01:57.330
and with customers in enterprises,

47
00:01:57.330 --> 00:02:00.240
this kind of architecture usually wins,

48
00:02:00.240 --> 00:02:03.510
and it is most commonly used, at least today.

49
00:02:03.510 --> 00:02:05.790
So in this hybrid RAG approach,

50
00:02:05.790 --> 00:02:09.240
we're going to introduce intermediate steps

51
00:02:09.240 --> 00:02:11.040
such as query pre-processing,

52
00:02:11.040 --> 00:02:14.220
retrieval validation, and post-generation checks.

53
00:02:14.220 --> 00:02:16.560
And these systems offer more flexibility

54
00:02:16.560 --> 00:02:17.760
than the fixed pipeline

55
00:02:17.760 --> 00:02:20.730
while maintaining some control over execution.

56
00:02:20.730 --> 00:02:24.690
And in this approach, the first benefit that we get

57
00:02:24.690 --> 00:02:27.210
is that it has query enhancement.

58
00:02:27.210 --> 00:02:29.670
So it's going to take the original query,

59
00:02:29.670 --> 00:02:31.260
it's going to make it better,

60
00:02:31.260 --> 00:02:33.690
so that it works better for the retrieval process.

61
00:02:33.690 --> 00:02:35.580
Then after we retrieve the documents,

62
00:02:35.580 --> 00:02:36.960
we are going to review them

63
00:02:36.960 --> 00:02:39.570
and we're going to be validating those documents

64
00:02:39.570 --> 00:02:41.610
to see that they're actually relevant

65
00:02:41.610 --> 00:02:44.490
and that they actually can answer the question

66
00:02:44.490 --> 00:02:45.360
that was asked.

67
00:02:45.360 --> 00:02:48.180
And thirdly, there is this answer validation

68
00:02:48.180 --> 00:02:50.250
to see that there are no hallucinations

69
00:02:50.250 --> 00:02:53.370
and that the answer actually answers the question.
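
NOTE A minimal sketch of the three checks just described, assuming plain
Python callables stand in for the LLM and retriever calls. The names
hybrid_rag, rewrite, is_relevant, is_grounded, and max_attempts are
hypothetical, not LangChain or LangGraph APIs, and this is not the
course's implementation, only an illustration of the fixed control flow.
```python
from typing import Callable, List
def hybrid_rag(
    question: str,
    rewrite: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[str, List[str]], bool],
    max_attempts: int = 2,
) -> str:
    """Fixed control flow; each callable stands in for an LLM or retriever call."""
    query = rewrite(question)  # 1. query enhancement
    # 2. retrieval validation: keep only documents judged relevant
    docs = [d for d in retrieve(query) if is_relevant(question, d)]
    if not docs:
        return "I could not find relevant documents."
    for _ in range(max_attempts):
        answer = generate(question, docs)
        if is_grounded(answer, docs):  # 3. answer validation
            return answer
    return "I could not produce a grounded answer."
# Toy stand-ins (assumptions) so the sketch runs without a model or vector store.
CORPUS = ["LangGraph builds stateful graphs.", "Bananas are yellow."]
def keyword_relevant(question: str, doc: str) -> bool:
    # Relevant if any question word longer than 4 characters appears in the document.
    words = [w for w in question.lower().strip("?").split() if len(w) > 4]
    return any(w in doc.lower() for w in words)
def demo(question: str) -> str:
    return hybrid_rag(
        question,
        rewrite=lambda q: q.lower().strip("?"),
        retrieve=lambda q: CORPUS,
        is_relevant=keyword_relevant,
        generate=lambda q, docs: docs[0],
        is_grounded=lambda a, docs: a in docs,
    )
```
The order of the steps is decided by the code, as in two-step RAG, while
each step may still use an LLM for judgment, as in the RAG agent.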

70
00:02:53.370 --> 00:02:55.800
And you can see the architecture right over here.

71
00:02:55.800 --> 00:02:57.270
And I don't want to cover it now

72
00:02:57.270 --> 00:03:01.260
because we go very, very deep into this later in this course

73
00:03:01.260 --> 00:03:03.330
and we see how to implement everything here

74
00:03:03.330 --> 00:03:04.680
from the ground up,

75
00:03:04.680 --> 00:03:07.620
and we're going to have a very deep understanding of it.

76
00:03:07.620 --> 00:03:11.520
So the short answer is that this hybrid approach,

77
00:03:11.520 --> 00:03:13.590
is actually what's currently used

78
00:03:13.590 --> 00:03:16.410
in production systems for enterprises.

79
00:03:16.410 --> 00:03:19.170
And this kind of architecture is what I recommend

80
00:03:19.170 --> 00:03:21.720
because it captures the best of both worlds.

81
00:03:21.720 --> 00:03:26.610
The RAG agent we saw before is way, way too flexible.

82
00:03:26.610 --> 00:03:28.860
It's way, way too flexible

83
00:03:28.860 --> 00:03:32.790
because we give the LLM complete freedom over what to do.

84
00:03:32.790 --> 00:03:35.760
For a lot of production applications, this is not what we want.

85
00:03:35.760 --> 00:03:38.070
And let's talk about the use cases of RAG,

86
00:03:38.070 --> 00:03:39.750
because usually we want to use it

87
00:03:39.750 --> 00:03:42.930
for question answering over documents,

88
00:03:42.930 --> 00:03:47.310
over documentation, over some internal documents

89
00:03:47.310 --> 00:03:52.110
or some knowledge base that we want to ground our answers in.

90
00:03:52.110 --> 00:03:54.720
And an agent is going to be overkill for that.

91
00:03:54.720 --> 00:03:56.880
So to summarize, this RAG agent

92
00:03:56.880 --> 00:03:59.610
that we saw here and we implemented,

93
00:03:59.610 --> 00:04:02.070
I don't really think it's the best solution,

94
00:04:02.070 --> 00:04:04.653
and I would definitely not use it in production.

