WEBVTT

00:00.120 --> 00:06.520
Alrighty, let me go and tag this solution as the one without LangChain Expression Language.

00:06.520 --> 00:09.200
And it's the simple function based approach.

00:09.400 --> 00:15.600
And let's now go and implement the LangChain Expression Language version of our RAG retrieval.

00:15.600 --> 00:20.600
So for the LangChain Expression Language version, let's go and add another import.

00:20.600 --> 00:26.760
So I imported the StrOutputParser that we got to know in previous sections.

00:27.320 --> 00:33.160
And let's go now and also import something which is called a RunnablePassthrough.

00:33.560 --> 00:41.920
And as its name suggests, it's a runnable that lets its input pass through when it's invoked.

00:41.920 --> 00:43.720
So it's not going to change it.

00:44.000 --> 00:47.520
So let me go to its actual implementation.

00:47.760 --> 00:52.520
So right now we are actually going to the source code of the LangChain ecosystem.

00:52.560 --> 00:59.400
And we can see that this runnable actually behaves almost exactly like the identity function.

00:59.400 --> 01:07.630
So it does not change the input, except that it has the ability to be configured such that we add additional

01:07.670 --> 01:09.030
keys to the output,

01:09.190 --> 01:11.230
if the input is a dictionary.

01:11.950 --> 01:16.590
And this is going to be very helpful for us, and we'll see it later in this video.

01:17.190 --> 01:26.830
And I also want to import itemgetter, which is a Python utility function from the operator module that

01:26.830 --> 01:34.590
creates a callable object to fetch items from an object using indexing. We can actually use lambda

01:34.590 --> 01:38.550
functions to do so, but it's more convenient to do it with itemgetter.

01:38.550 --> 01:40.830
So we're going to also use this later.
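
NOTE
A minimal sketch of the itemgetter idea in plain Python, to go with the explanation above.
The sample dictionary and the variable names are assumptions made for illustration only.
```python
from operator import itemgetter
# itemgetter("question") builds a callable that performs obj["question"]
get_question = itemgetter("question")
# A lambda that does the same lookup
get_question_lambda = lambda d: d["question"]
payload = {"question": "What is Pinecone?"}
print(get_question(payload))
print(get_question_lambda(payload))
```
Both callables return the same string, which is why the two forms are interchangeable in a chain.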

01:43.190 --> 01:46.430
So I'm going to create a new function.

01:46.630 --> 01:52.510
And this function is going to be called create retrieval chain with LangChain Expression Language.

01:52.710 --> 01:57.310
Now notice that this function does not receive any arguments.

01:57.550 --> 02:04.350
And the reason for this is that this function is going to eventually return us a LangChain chain.

02:04.630 --> 02:08.230
So the chain is going to be a LangChain runnable.

02:08.230 --> 02:12.910
So we can go and use the invoke method on what this function will return.

02:13.070 --> 02:16.350
So this is why this function does not receive any input.

02:16.550 --> 02:22.750
The input to the chain is going to be the input to the invoke function, like we know from previous

02:22.750 --> 02:24.110
videos of this course.

02:24.390 --> 02:28.110
And we're going to see this very very soon after we implement this function.

02:28.470 --> 02:28.990
All right.

02:28.990 --> 02:34.870
So here I listed all of the advantages of using the LangChain Expression Language.

02:35.150 --> 02:40.590
So you're going to see that it's going to be much more declarative and composable.

02:40.590 --> 02:44.150
So we can go and use it later with other LangChain components.

02:44.190 --> 02:50.990
And because it uses the Runnable interface, we have streaming support, async support, batch processing

02:50.990 --> 02:53.070
support and type safety.

02:53.310 --> 02:54.830
So this is very very cool.

02:55.030 --> 03:02.380
But I think the most important advantage is that we get much better observability with LangSmith

03:02.420 --> 03:07.900
once we have a runnable object, once we bind everything under a LangChain chain.

03:08.100 --> 03:10.940
We're going to see this when we trace everything with LangSmith.

03:11.260 --> 03:11.700
All right.

03:11.700 --> 03:14.260
Let's go and start implementing this function.

03:14.260 --> 03:17.660
So I'm going to create a variable called retrieval chain.

03:17.660 --> 03:20.980
And this is what's going to be the return value from this function.

03:20.980 --> 03:22.820
So let's go and return it.

03:26.700 --> 03:29.540
Right now let's start writing this retrieval chain.

03:29.780 --> 03:33.260
And I pasted these three piped objects.

03:33.460 --> 03:41.260
Now those parts of the chain correspond, in the original implementation, to steps three, four and five

03:41.300 --> 03:41.700
here.

03:41.700 --> 03:48.860
So I piped the prompt template to the LLM and the LLM to the StrOutputParser.

03:49.060 --> 03:55.780
So the prompt template is going to be the prompt template of the prompt that we wrote with the user's

03:55.780 --> 04:02.820
question and with the retrieved context, and that is what we're going to pipe into the LLM.

04:02.820 --> 04:06.740
So this means we're going to invoke the LLM with this input.

04:06.940 --> 04:14.180
And once we get a response from the LLM, we're going to use the StrOutputParser simply to access the

04:14.220 --> 04:17.580
.content attribute of the response here.

04:17.700 --> 04:23.700
So what we see so far is stuff that we already did in previous sections of this course.

04:23.900 --> 04:31.140
So now the question is how do we get to populate this prompt template with the context field.

04:31.140 --> 04:37.420
And with the question field, just like we did in the naive implementation in steps one, two and three.

04:37.940 --> 04:46.180
So we want somehow to invoke the retriever then to pipe the result of the retriever with the retrieved

04:46.180 --> 04:52.860
documents to the format documents function, and then to pipe that to the prompt template.

04:53.100 --> 04:57.900
So eventually the format documents is going to return us a string.

04:58.140 --> 05:01.490
And this is going to be piped to the prompt template.

05:01.490 --> 05:04.850
And once we have that, everything should work here.
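
NOTE
A rough sketch of the formatting step described above, in plain Python.
The Doc class, the sample texts and the prompt wording are all assumptions; the course's own format documents function may differ.
```python
from dataclasses import dataclass
@dataclass
class Doc:
    # Stand-in for a retrieved document; only page_content is modeled here
    page_content: str
def format_docs(docs):
    # Join the document texts into one string for the prompt's context slot
    return "\n\n".join(doc.page_content for doc in docs)
docs = [Doc("Pinecone is a vector database."), Doc("It stores embeddings.")]
context = format_docs(docs)
# Hypothetical prompt wording, used only to show where the string ends up
prompt = f"Answer using this context:\n{context}\n\nQuestion: What is Pinecone?"
print(prompt)
```
The point is only that the retrieved documents become a single string before they reach the prompt template.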

05:04.890 --> 05:05.250
Hey there.

05:05.290 --> 05:05.850
Eden here.

05:05.850 --> 05:06.490
Popping out.

05:06.490 --> 05:07.930
I hope you're enjoying the course.

05:07.930 --> 05:11.930
I just want to give you a quick heads up on the next couple of minutes.

05:12.250 --> 05:15.330
So the next part is a bit tricky.

05:15.690 --> 05:17.490
It's nothing complicated.

05:17.490 --> 05:22.010
It's just a lot of syntactic sugar with the LangChain Expression Language.

05:22.010 --> 05:27.410
So I have to tell you that from my personal experience, it took me a while to understand the next couple

05:27.450 --> 05:28.330
of concepts.

05:28.450 --> 05:34.290
However, once you understand them, and I hope you will by the end of this video, you'll have a much deeper

05:34.330 --> 05:39.450
understanding of LangChain and how LangChain is actually working under the hood.

05:39.450 --> 05:42.650
So I just want to give you a quick heads up before we continue.

05:42.810 --> 05:48.250
So it's okay if you don't understand everything, right from the first viewing of this video.

05:48.490 --> 05:54.450
So I do recommend watching this video a couple of times from the beginning to the end, until you finally

05:54.450 --> 06:01.840
understand what's happening under the hood. However, we have a couple of problems

06:01.840 --> 06:02.360
right now.

06:02.760 --> 06:06.920
So I remind you, the format docs is a function.

06:07.280 --> 06:09.680
It's not a LangChain runnable.

06:09.720 --> 06:11.640
It doesn't have the invoke method.

06:11.640 --> 06:17.160
And if you try for yourself to call invoke on this function, you will get an error.

06:17.280 --> 06:20.880
But I promise you, we're going to run this and this is going to work.

06:20.880 --> 06:28.240
Because when we use regular Python functions in a LangChain Expression Language chain, LangChain

06:28.240 --> 06:34.400
automatically converts those regular Python functions into RunnableLambdas.

06:35.000 --> 06:42.040
So when we write something like retriever pipe format docs pipe prompt template, under the hood

06:42.280 --> 06:51.440
LangChain automatically converts format docs into a RunnableLambda that is going to run the format

06:51.480 --> 06:51.880
docs function.

06:51.920 --> 06:57.200
And this RunnableLambda is a runnable, and it adheres to the Runnable interface.

06:57.200 --> 06:58.680
So we can invoke it.

06:58.720 --> 07:01.920
We can stream with it, we can batch process with it.

07:01.920 --> 07:04.600
So it's really going to fix the issue

07:04.640 --> 07:05.600
I've just discussed.
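
NOTE
A toy model of the coercion idea just described, in plain Python.
This is not LangChain's source code; the Step class and the fake retriever are hypothetical stand-ins that only imitate how a plain function can be wrapped when it is piped.
```python
class Step:
    # Toy stand-in for a runnable: wraps a callable and exposes invoke
    def __init__(self, func):
        self.func = func
    def invoke(self, value):
        return self.func(value)
    def __or__(self, other):
        # Coerce a plain function on the right-hand side into a Step
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))
def fake_retriever(query):
    # Hypothetical retriever that returns a fixed list of texts
    return ["doc about " + query, "another doc"]
def format_docs(docs):
    # A regular function: it has no invoke method of its own
    return "\n".join(docs)
chain = Step(fake_retriever) | format_docs
print(chain.invoke("pinecone"))
```
The plain function never gains an invoke method itself; the pipe wraps it, which is the same idea as the automatic conversion to a RunnableLambda.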

07:06.000 --> 07:06.400
Cool.

07:06.400 --> 07:09.600
So now we need to solve another issue.

07:09.760 --> 07:12.920
And I promise you this is the last issue we're going to be solving today.

07:13.480 --> 07:19.480
So the prompt template, I remind you, needs to receive two arguments.

07:19.520 --> 07:22.440
It needs to receive the question of the user.

07:22.440 --> 07:25.760
And it needs to also receive the context which is a string.

07:26.320 --> 07:34.520
So right now the output of this part over here is not going to do this for us.

07:34.520 --> 07:45.720
So we need to find some kind of way to take this output and somehow to attribute it to the key of context.

07:45.920 --> 07:53.080
So the result of this, which is going to go into the prompt template, is going to be under the context

07:53.120 --> 07:56.200
key when we invoke the prompt template.

07:56.200 --> 08:04.230
So we are going to take this and we are going to wrap it under a RunnablePassthrough, and we are

08:04.230 --> 08:06.430
going to use the assign method.

08:06.790 --> 08:14.270
So the gist of it is that RunnablePassthrough.assign creates a new dictionary that is going to

08:14.270 --> 08:21.150
combine the original input with the new computed field that we're going to explicitly mention.

08:21.510 --> 08:27.590
And here the input is going to be the question key with the value "What is Pinecone?".

08:27.590 --> 08:31.430
Because this is eventually how we're going to be executing and running this chain.

08:31.750 --> 08:33.910
Because we're using RunnablePassthrough,

08:33.950 --> 08:36.150
the input is not going to change.

08:36.350 --> 08:42.950
So the output of it still is going to have a dictionary of the key question with the value of what is

08:42.950 --> 08:43.590
Pinecone.

08:43.830 --> 08:52.990
However, now we want to add to this dictionary a new key, context, with the value of this

08:52.990 --> 08:54.230
chain right over here.

08:54.430 --> 08:57.830
So this is now a LangChain chain which is going to be invoked.

08:58.070 --> 09:02.470
So we have the retriever and the format documents function that we already know.

09:02.750 --> 09:08.910
And we have another addition, the itemgetter, which we called with the value of question.

09:09.350 --> 09:13.590
So this is equivalent to using a lambda function like this.

09:13.870 --> 09:17.550
And it simply pulls out just the question string.

09:17.710 --> 09:19.630
So that's all that it's doing.

09:19.870 --> 09:20.350
Okay.

09:20.390 --> 09:22.750
So I know this is a lot to digest.

09:22.750 --> 09:25.910
So I want to reiterate on it one more time.

09:25.910 --> 09:26.870
And this is okay.

09:26.870 --> 09:29.470
If you want to watch this video one more time.

09:30.310 --> 09:32.750
It also took me a while to understand this.

09:32.750 --> 09:37.070
So I want to talk about this step of the pipeline right here.

09:37.510 --> 09:43.310
So the input for it is going to be a dictionary with the key of question.

09:43.310 --> 09:45.150
And it's going to have some value.

09:45.150 --> 09:50.070
So this is going to be the input when we now invoke this chain.

09:50.070 --> 09:51.990
And this is the first part of the chain.

09:52.390 --> 09:57.990
So we see that the first step is wrapped entirely with a RunnablePassthrough.

09:58.150 --> 10:01.060
So this means we do not change the input.

10:01.060 --> 10:07.860
So the result of invoking this is still going to have a dictionary with the key of question.

10:07.860 --> 10:10.220
And it's going to have this value inside it.

10:10.220 --> 10:17.060
And because we used the assign method and we gave it a keyword argument of context,

10:17.260 --> 10:21.860
this means we are now adding to this output dictionary

10:22.100 --> 10:23.140
another key.

10:23.380 --> 10:25.660
And the key is going to be context.

10:25.940 --> 10:31.060
And the value of the key is going to be the result of running this chain over here.

10:31.500 --> 10:34.780
I remind you, we're still inside a RunnablePassthrough.

10:34.980 --> 10:43.220
So this means that the input for running this chain, this subchain, is going to be the original input.

10:43.420 --> 10:50.620
So when we use the itemgetter with question, it's simply going to fish out the question value.

10:50.860 --> 10:55.500
And it's going to pipe this string into the retriever.

10:55.500 --> 10:59.380
And then the output to the format documents and so on and so on.

10:59.540 --> 11:05.100
So after running the RunnablePassthrough step over here with this input,

11:05.140 --> 11:12.980
the output of this step is going to be a dictionary with the key of question and the key of context.

11:12.980 --> 11:19.500
And the context now is going to hold the relevant context after we retrieve the documents and format

11:19.500 --> 11:20.380
them nicely.

11:20.660 --> 11:24.620
And this is eventually what we're going to be piping to the prompt template.

11:24.780 --> 11:29.220
So this is how everything is playing nicely with each other.
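
NOTE
A plain-Python sketch of the assign step walked through above.
The assign helper below only mimics the behavior described in the video (keep the input dictionary, add a computed key); it is not the library's implementation, and the fake retriever and sample texts are assumptions.
```python
from operator import itemgetter
def assign(**computed):
    # Mimics the described behavior: keep the input dict, add computed keys
    def run(inputs):
        extra = {key: func(inputs) for key, func in computed.items()}
        return {**inputs, **extra}
    return run
def fake_retriever(question):
    # Hypothetical retriever returning fixed texts
    return ["Pinecone is a vector database.", "It stores embeddings."]
def format_docs(docs):
    return "\n\n".join(docs)
def context_subchain(inputs):
    # Same order as in the video: pull the question, retrieve, then format
    return format_docs(fake_retriever(itemgetter("question")(inputs)))
step = assign(context=context_subchain)
result = step({"question": "What is Pinecone?"})
print(result)
```
The result keeps the original question key and gains a context key, which is the dictionary shape the prompt template expects.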

11:29.740 --> 11:37.900
I know this was a bit overwhelming, so I think an easy way to also understand it is to compare our

11:38.340 --> 11:44.500
naive and simple version to this LangChain Expression Language version here.

11:44.540 --> 11:49.900
Okay, so try to think about which step lives where.

11:51.860 --> 11:53.940
Hey there Eden here popping out again.

11:54.060 --> 11:59.730
I told you this part was tricky, so if you didn't fully understand it, please don't freak out.

12:00.170 --> 12:00.890
Trust me.

12:01.170 --> 12:02.290
Try to have a break.

12:02.330 --> 12:04.810
Maybe go drink some coffee, then return.

12:04.810 --> 12:06.410
And then watch this video again.

12:06.810 --> 12:12.330
And this should be much clearer when you're going to be writing the code and running it and seeing it

12:12.330 --> 12:13.090
for yourself.

12:13.650 --> 12:14.650
So let's continue.

12:16.330 --> 12:16.770
Cool.

12:16.770 --> 12:23.090
So let's go now and run this baby and see that everything is working and we're getting a nice result.

12:23.450 --> 12:28.850
So I'm going to write in the main part over here that we are going to run the LangChain Expression

12:28.850 --> 12:30.250
Language implementation.

12:30.250 --> 12:34.650
It has a bunch of advantages which we are going to see after we run it.

12:34.810 --> 12:41.050
And in order to run everything, we simply need to call the create retrieval chain function.

12:41.250 --> 12:45.570
This is going to return us a LangChain chain, which is a runnable.

12:45.570 --> 12:47.330
So it has the invoke method.

12:47.330 --> 12:54.890
So the chain with LCEL variable, we can actually go and invoke it with the input of the dictionary with

12:54.890 --> 12:57.530
the key of question and the user's query.

12:57.850 --> 13:01.320
Let's go now and print the result.

13:04.600 --> 13:08.600
And let's go now and run everything and let's see what we get.

13:12.240 --> 13:15.040
So let me go and show the output.

13:19.480 --> 13:19.960
All right.

13:19.960 --> 13:24.320
So now it's running the chain without LCEL.

13:24.640 --> 13:27.800
After it finishes, it's going to run with LCEL.

13:32.240 --> 13:32.720
All right.

13:32.720 --> 13:39.560
And we can see the answer, which is very similar to the answer of our original implementation here.

13:40.280 --> 13:43.000
So the results are the same.

13:43.040 --> 13:44.720
The implementation is different.

13:45.080 --> 13:51.680
And now I want to show you how much better it is as far as observability and tracing.

13:51.840 --> 13:55.040
So I want to go and switch to LangSmith.

13:55.160 --> 13:59.440
And I want to show you the trace of this chain execution.

13:59.880 --> 14:03.440
So I'm going to also link it in the video's resources.

14:03.440 --> 14:11.080
But we can see all the moving parts here of this RAG retrieval pipeline are in one place, under a

14:11.120 --> 14:15.560
LangChain RunnableSequence, because this is eventually a chain.

14:15.760 --> 14:23.080
So everything here is under one trace, which is super, super convenient to debug and to analyze later.

14:23.080 --> 14:30.000
We can see the original question and we can see the final output of this chain, which is this

14:30.000 --> 14:30.720
raw text

14:30.720 --> 14:34.040
over here. We can see how long everything took.

14:34.040 --> 14:37.920
We can see what was the bottleneck, what took most of the time.

14:37.920 --> 14:42.200
And this is why it's so convenient to structure everything under a chain here.

14:42.680 --> 14:43.080
All right.

14:43.120 --> 14:48.800
Now let me show you the tricky part I told you about with the RunnableAssign.

14:49.080 --> 14:54.040
And here you can see we are invoking it with the assigning of the context.

14:54.080 --> 15:02.750
We can see the input is a dictionary with the key of question, and the output is a dictionary with

15:02.750 --> 15:09.030
the key of context and the relevant context, and it also has the key of question.

15:09.070 --> 15:10.310
The original input.

15:10.630 --> 15:11.390
All right.

15:11.430 --> 15:17.830
Now this entire RunnablePassthrough has inside it three steps.

15:17.830 --> 15:21.510
We have the itemgetter, the retriever and the format docs.

15:21.750 --> 15:24.150
So let's start with the itemgetter.

15:24.150 --> 15:32.110
So here it's simply a RunnableLambda under the hood, which takes the input dictionary and returns

15:32.110 --> 15:34.390
us the question itself.

15:34.630 --> 15:36.910
So this is what's happening here.

15:37.070 --> 15:40.150
Now this is going to be piped to the retriever.

15:40.150 --> 15:41.910
So let's go check out the retriever.

15:43.550 --> 15:49.990
We can see now the retriever gets the user's query and it returns the relevant documents, which we

15:49.990 --> 15:50.990
limited to three.

15:51.510 --> 15:56.190
And then we go and we pipe this into the format docs.

15:56.310 --> 16:02.430
And the output of this subchain, we simply want to assign under the context key of the output.

16:02.750 --> 16:06.390
So this is what's happening in this overall chain here.

16:06.670 --> 16:13.470
So now we take this dictionary which has the key of context with the relevant context and the original

16:13.470 --> 16:14.190
question.

16:14.510 --> 16:19.470
And we simply go and pipe it to the chat prompt template.

16:19.750 --> 16:24.790
And then we go and pipe the chat prompt template into the LLM.

16:25.110 --> 16:28.550
And this is what's going to generate the answer here.

16:28.790 --> 16:32.710
So we have all of the augmented prompt with the relevant context.

16:32.990 --> 16:35.590
And finally we parse out the output.

16:36.350 --> 16:40.190
And you can go and get the code for this video.

16:40.190 --> 16:45.230
If you go to the repository in the branch project slash rag gist.

16:46.310 --> 16:48.310
Let's go here to the commits.

16:48.710 --> 16:53.510
And here you can see this commit, add LCEL-based retrieval chain.

16:53.750 --> 16:57.070
Here you can see all the code that we added here in this video.