WEBVTT

00:00.160 --> 00:00.680
Hey there.

00:00.720 --> 00:01.360
Ethan here.

00:01.400 --> 00:02.880
Hope you're enjoying this section.

00:02.880 --> 00:04.360
Learning about rag retrieval.

00:04.360 --> 00:05.480
Augmentation generation.

00:05.720 --> 00:11.800
So in this video I want to discuss and explore a bit the link chain documentation.

00:12.280 --> 00:15.760
Now I want to start by saying that I love link chain.

00:15.760 --> 00:19.600
I love the ecosystem and link graph and everything around it.

00:19.880 --> 00:26.000
However, I do have some criticism to link chain and how they manifested their documentation.

00:26.320 --> 00:30.120
So I've been with link chain since the very, very beginning.

00:30.120 --> 00:32.000
I've seen all the iterations.

00:32.000 --> 00:36.320
I've seen it how it evolved from from really the very, very beginning.

00:36.640 --> 00:40.720
And they've made tons and a lot of changes to the documentation.

00:41.000 --> 00:49.760
And specifically, after version 1.0, they removed tons of documentation and things that I think are

00:49.760 --> 00:55.280
super, super important and are still in the source code and are not planned to be deprecated.

00:55.400 --> 01:00.770
Not only that, I think their approach they took, for example, for direct documentation, which I'm

01:00.770 --> 01:03.090
going to show you here, is wrong.

01:03.290 --> 01:05.970
And I don't really believe that's the best practice.

01:06.290 --> 01:12.130
And this is why what I teach in this section is a bit different than what you see in the documentation,

01:12.130 --> 01:13.410
and I want to show you it.

01:13.410 --> 01:19.690
So if I go to the documentation here, if you go under the tutorials, you have a bunch of documentation,

01:19.730 --> 01:21.530
you have a semantic search.

01:21.530 --> 01:27.330
So here you can see you have a tutorial on building a semantic search engine with link chain.

01:27.370 --> 01:33.530
This is pretty much what we did in the previous video where we discussed about the ingestion and the

01:33.570 --> 01:34.290
retrieval.

01:34.330 --> 01:34.970
Right.

01:35.010 --> 01:40.610
So you can see here you have the concepts of documents, text, feature embeddings, vector stores,

01:40.650 --> 01:41.370
retrievers.

01:41.370 --> 01:42.650
We've discussed all of them.

01:42.650 --> 01:43.090
Right.

01:43.450 --> 01:48.370
So in the examples here, here is that example for example of how to load a document.

01:48.730 --> 01:52.130
So here you can see how you can load a PDF document.

01:52.290 --> 01:54.530
Here we are printing the page content.

01:54.570 --> 01:55.810
Those are the things we did.

01:55.850 --> 01:59.790
They discuss about splitting here about embedding.

02:00.110 --> 02:01.870
About what is the embedding output.

02:01.870 --> 02:08.190
So things that we've already discussed and how we can go and add the embeddings into the vector store

02:08.190 --> 02:10.230
with the add documents function.

02:10.230 --> 02:12.470
But we don't really see an application.

02:12.470 --> 02:14.550
So we get here only snippets.

02:14.590 --> 02:19.830
And here in this example you can see that the taking a retrieval function which is going to use the

02:19.830 --> 02:21.710
vector store similarity search.

02:21.710 --> 02:23.550
And they wrap it as a chain.

02:23.710 --> 02:29.590
So I've never seen this syntax before which is actually quite surprising to see it like this.

02:29.990 --> 02:36.870
Here you can see some examples of them using the retriever and then invoking it here it's a batch invoke.

02:36.910 --> 02:38.630
So this is the first thing.

02:38.630 --> 02:41.070
So let's go to build a Rag application.

02:41.310 --> 02:45.350
And then it leads us to build a Rag agent link chain.

02:45.390 --> 02:53.110
And here they actually use a react agent with a searching tool a similarity search tool.

02:53.150 --> 02:53.750
Right.

02:53.790 --> 03:00.400
And first in this tutorial they show you how to build a Rag agent, which is a react agent with a search

03:00.400 --> 03:02.600
tool to perform similarity search.

03:02.840 --> 03:08.200
Then they show a two step rag gene, which is very similar to what we did in the course.

03:08.200 --> 03:09.520
But there's a caveat here.

03:09.720 --> 03:10.760
So let me show you.

03:10.800 --> 03:14.600
So we discussed about indexing about retrieval and generation.

03:14.600 --> 03:16.360
So we really know all about it.

03:16.360 --> 03:18.440
And basically what they did here.

03:18.560 --> 03:24.000
So here they are discussing indexing which we discussed already splitting the documents.

03:24.280 --> 03:26.840
All right here let's talk about the retrieval and generation.

03:26.840 --> 03:28.560
So this is the interesting part here.

03:28.600 --> 03:34.240
So in their example they write a function which they call it retrieve context which is going to receive

03:34.280 --> 03:37.120
a query and it's going to retrieve.

03:37.520 --> 03:40.240
And it's going to return the retrieve docs.

03:40.400 --> 03:44.880
And they take these functions and they wrap it as a tool with some structured output.

03:45.000 --> 03:49.600
So in this implementation they create a retrieve context tool.

03:49.720 --> 03:51.000
So here's the tool list.

03:51.000 --> 03:54.680
And they create a react agent with one tool.

03:54.800 --> 03:56.220
And they tell it in the prompt.

03:56.220 --> 03:59.540
You have access to a tool that retrieves context from a blog post.

03:59.580 --> 04:02.140
Use this tool to help answer user queries.

04:02.540 --> 04:05.420
So I don't like this approach at all.

04:05.740 --> 04:09.940
We leave the decision whether to call this tool or not to the agent.

04:10.060 --> 04:17.660
I worked with hundreds of customers and I never seen anything like this in production because we do

04:17.660 --> 04:20.220
not want to leave this to the LLM.

04:20.260 --> 04:26.540
The create agent here is a react agent, and this is something which is pretty autonomous.

04:26.540 --> 04:28.700
So it has all the freedom to do what it wants.

04:28.700 --> 04:30.380
And I discussed this in the course.

04:30.380 --> 04:37.700
And if we have an application for example a customer support, then we do not want the agent to go and

04:37.700 --> 04:40.580
answer anything else besides our business logic.

04:40.860 --> 04:44.780
So here we have a lot of room to fail.

04:44.980 --> 04:50.740
And this agent can be very easily be manipulated to do nonsense.

04:50.940 --> 04:51.900
Let's go and continue.

04:51.940 --> 04:57.030
Here you can see, by the way, an example of this run where you ask a question with tool calling,

04:57.070 --> 04:58.150
it decides what to call.

04:58.150 --> 05:00.110
It calls the retrieval tool.

05:00.270 --> 05:06.310
And this is actually very redundant and expensive and costly when it comes to tokens.

05:06.310 --> 05:12.430
And it adds the latency because, for example, if we have a customer support agent on our application

05:12.430 --> 05:16.470
which knows our business logic, we don't really need tool calling here.

05:16.470 --> 05:23.630
This is just adding overhead, because in this case, we always want to go and query our knowledge base,

05:23.630 --> 05:25.990
which is the vector store in this case.

05:25.990 --> 05:29.950
So doing it with tool calls is actually quite redundant here.

05:29.990 --> 05:35.350
And now what I like about this doc is the fact that they actually discuss this trade off here.

05:35.350 --> 05:39.750
So here they're mentioning the benefits and drawbacks when using this.

05:40.110 --> 05:42.070
So in the above Agentic rag.

05:42.070 --> 05:46.230
So we used our react agent with a search tool formulation.

05:46.230 --> 05:48.350
We allow the LLM to use its discretion.

05:48.350 --> 05:53.920
So this is what I told you which was problematic in generating a tool call to help to answer the user

05:53.920 --> 05:54.480
queries.

05:54.520 --> 05:57.600
This is good general purpose solution but comes with trade offs.

05:57.640 --> 05:59.880
Okay, let's see the trade offs.

05:59.960 --> 06:01.840
Okay let's start with the reduce control.

06:02.120 --> 06:08.040
The LM may skip searches when they are actually needed or issue extra searches when unnecessary.

06:08.160 --> 06:12.600
And this is a really big problem because we are using the react agent here.

06:12.600 --> 06:14.400
And here we have two inference calls.

06:14.400 --> 06:20.560
When a search is performed, it requires one call to generate the query and another to produce the final

06:20.560 --> 06:21.400
response here.

06:21.400 --> 06:23.480
So this is something that adds latency.

06:23.520 --> 06:25.960
Why they say it's good search only when needed.

06:26.000 --> 06:31.800
The LM can handle greetings, follow ups and simple queries without triggering unnecessary searches.

06:32.040 --> 06:39.520
So this is true, but it can also answer things that are non-relevant and can actually cause problems

06:39.520 --> 06:40.440
to the company.

06:40.480 --> 06:45.680
If somebody wants to try to jailbreak the agent, you don't really want to allow it.

06:45.720 --> 06:47.320
Second contextual search.

06:47.800 --> 06:53.940
And here they say contextual search queries by treating a search as a tool with a query input the alien

06:53.940 --> 06:57.220
craft its own queries that incorporates conversational context.

06:57.260 --> 06:58.620
This is semi-true.

06:58.620 --> 07:02.900
So this is really an advantage because it's using function calling under the hood.

07:03.140 --> 07:09.380
So this query that is going to be embedded and going to be performing the similarity search on.

07:09.540 --> 07:15.620
So it's going to be changing constantly according to the user input according to the user input and

07:15.620 --> 07:17.140
the conversation history.

07:17.140 --> 07:18.340
So this is a smart move.

07:18.380 --> 07:23.500
But we can actually implement this with an expression language and make this deterministic as well.

07:23.540 --> 07:25.420
And here we have multiple searches allowed.

07:25.460 --> 07:29.900
The LLM can execute several searches in support for a single user query.

07:29.900 --> 07:37.500
So later in the course I talk about real Agentic Rag, which is based on Landgraf, which is going to

07:37.500 --> 07:42.540
be based on research papers, which is actually going to do something much better than this, and it's

07:42.540 --> 07:44.580
going to be much more deterministic.

07:44.620 --> 07:45.140
All right.

07:45.180 --> 07:51.830
Then they discuss another common approach is a two step chain in which we always run a search, potentially

07:51.830 --> 07:53.110
using raw user query.

07:53.150 --> 07:58.070
This is what we did in the course and incorporate the results as context for a single query.

07:58.110 --> 07:59.310
This is exactly what we did.

07:59.350 --> 08:02.710
This results in a single inference called per query.

08:02.870 --> 08:06.190
Buying reduced latency at the expense of flexibility.

08:06.230 --> 08:08.230
Because it's fixed, it's always doing.

08:08.230 --> 08:11.350
It's always retrieving the documents then making an LM call.

08:11.390 --> 08:16.790
In this approach, we no longer call the model in a loop, but instead make just a single pass.

08:16.830 --> 08:21.870
We can implement this chain by removing the tools from the agent, and instead incorporating the retrieval.

08:21.870 --> 08:24.350
Step into a custom prompt and hear.

08:24.350 --> 08:29.870
What they did is to create an agent without any tools, and then injected this functionality to do the

08:29.870 --> 08:36.830
similarity search to get the relevant documents inside the middleware so it can work.

08:37.310 --> 08:42.950
But it's going like this and they create agent is in fact running in a loop.

08:42.950 --> 08:48.390
So I do not like this solution because when you're going to use the create agent, in fact you do not

08:48.390 --> 08:50.130
know what's going under the hood.

08:50.170 --> 08:53.570
You can know if you'll dive into the source code and you'll analyze it.

08:53.570 --> 08:55.450
But those things are changing.

08:55.530 --> 08:59.730
So if you update the package, this can suddenly change and break your application.

08:59.850 --> 09:01.490
So this is not explicit here.

09:01.490 --> 09:03.930
And this is way, way too abstracted.

09:04.050 --> 09:09.330
And I think in order to implement something which is robust and it's production ready, you really need

09:09.330 --> 09:11.090
to have control on everything.

09:11.210 --> 09:12.930
So this was ragged long chain.

09:13.250 --> 09:18.050
Now they have here a section custom rack agent under Landgraf.

09:18.050 --> 09:20.010
And this tutorial is excellent.

09:20.010 --> 09:23.250
And I actually go and show this architecture in this course.

09:23.250 --> 09:25.850
This architecture is based on papers.

09:26.050 --> 09:28.690
This one is proved to work well.

09:28.930 --> 09:34.930
It has a lot of things like checking hallucinations, checking answers which are relevant to the question.

09:34.930 --> 09:40.850
And we are going to cover this really, really in depth in the course in the graph section.

09:40.850 --> 09:43.170
So we're going to cover it later.

09:43.170 --> 09:44.650
So I like this one.

09:44.650 --> 09:46.610
So this is it for this video.

09:46.650 --> 09:48.410
We'd love to get your feedback on it.
