WEBVTT

00:00.400 --> 00:00.840
Hey there.

00:00.880 --> 00:06.360
This is going to be a longer video because we're going to implement the Self-RAG paper end to end.

00:06.640 --> 00:09.880
So it's going to be much easier now that we have the structure.

00:09.880 --> 00:12.760
And we have been doing things very similar to this.

00:12.800 --> 00:16.440
Here we can see the graph that we're eventually going to build.

00:16.640 --> 00:23.520
Basically, we'll simply add conditional branching after the generate node.

00:23.520 --> 00:28.720
So instead of going from generate straight to the end, we'll add another layer of reflection.

00:29.440 --> 00:35.400
And in this video we're going to implement the hallucination grader and the answer grader chains.

00:35.440 --> 00:40.720
Both chains reflect on the answer: one determines whether the model hallucinated

00:40.720 --> 00:44.600
the answer, that is, whether the answer is not grounded in the documents,

00:44.880 --> 00:49.960
and the other determines whether the answer actually answers the original question.

00:50.680 --> 00:56.960
Once we have those, all we need to do is create the conditional edge that will run and trigger

00:56.960 --> 00:57.800
those chains.

00:58.080 --> 01:01.840
You might be thinking: hey, why am I not putting this into a node?

01:01.880 --> 01:07.320
A node that checks grounding in the documents, and a node that checks the answer against the question.

01:07.320 --> 01:10.760
Or maybe put them both into one node and then decide what to do.

01:11.160 --> 01:14.560
And the answer is: of course, I could do that if I wanted to.

01:14.840 --> 01:20.200
However, because in this step we decide whether to finish, to generate

01:20.200 --> 01:27.040
again, or to perform another search, this strongly implies that we need to choose the next

01:27.040 --> 01:27.560
step.

01:27.840 --> 01:32.360
So here, conditional branching felt more intuitive to me.

01:32.720 --> 01:35.040
Alrighty, let's go to the code.

01:37.600 --> 01:41.840
Alrighty, let's go and create a new file under chains.

01:42.160 --> 01:46.560
And we want to call it hallucination grader.

01:48.080 --> 01:54.560
In this file we're going to implement a chain that determines whether the answer we get

01:54.560 --> 01:55.640
back from the LLM,

01:55.680 --> 01:58.910
the generation, is grounded in the documents.

02:00.310 --> 02:03.150
So let's go and start with the imports.

02:03.150 --> 02:04.950
We want ChatPromptTemplate.

02:04.950 --> 02:13.150
We want to import BaseModel and Field from Pydantic, RunnableSequence for type hinting, and

02:13.150 --> 02:13.910
ChatOpenAI.

02:14.430 --> 02:18.110
Let's also initialize our LLM.

02:18.310 --> 02:22.270
And we're going to do the same trick with the structured output.

02:22.270 --> 02:27.310
So we'll create a class called GradeHallucinations, which inherits from BaseModel.

02:27.550 --> 02:34.150
And let's give it a quick description: it's a binary score for hallucination present in the generated

02:34.150 --> 02:34.710
answer.

02:35.350 --> 02:43.070
It's going to have only one attribute, binary_score, which is a boolean, and the description is

02:43.070 --> 02:51.110
that the answer is grounded in the facts, yes or no. Because we set the binary_score type hint to bool,

02:51.470 --> 02:58.350
the LangChain output parser will eventually cast the LLM's answer into a boolean.

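NOTE
A minimal stdlib sketch of that casting step, assuming the structured-output call hands back JSON arguments such as {"binary_score": "yes"}. The dataclass mirrors the GradeHallucinations schema described above; parse_grade and its accepted spellings are hypothetical stand-ins for the coercion LangChain and Pydantic perform, not their actual code.
```python
import json
from dataclasses import dataclass
@dataclass
class GradeHallucinations:
    """Binary score for hallucination present in the generated answer."""
    binary_score: bool  # answer is grounded in the facts, 'yes' or 'no'
# Hypothetical spellings accepted as booleans (an assumption, loosely like lax bool parsing)
TRUE_WORDS = {"yes", "y", "true", "1"}
FALSE_WORDS = {"no", "n", "false", "0"}
def parse_grade(raw_json: str) -> GradeHallucinations:
    """Cast the model's yes/no (or true/false) answer into a bool."""
    value = json.loads(raw_json)["binary_score"]
    if isinstance(value, bool):
        return GradeHallucinations(binary_score=value)
    text = str(value).strip().lower()
    if text in TRUE_WORDS:
        return GradeHallucinations(binary_score=True)
    if text in FALSE_WORDS:
        return GradeHallucinations(binary_score=False)
    raise ValueError(f"cannot interpret {value!r} as a binary score")
print(parse_grade('{"binary_score": "yes"}'))  # GradeHallucinations(binary_score=True)
```
Rejecting unknown spellings instead of defaulting to False keeps a malformed reply from silently reading as "hallucinated".
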
02:59.110 --> 02:59.750
All right.

02:59.750 --> 03:01.990
So now we want to take the LLM

03:02.110 --> 03:05.790
and call its with_structured_output method,

03:05.790 --> 03:09.430
passing in the GradeHallucinations class.

03:10.030 --> 03:16.590
So basically, LangChain will format the answer we get back from the LLM as the Pydantic class

03:16.590 --> 03:22.950
GradeHallucinations, which we just created and which has only one attribute, binary_score,

03:22.950 --> 03:23.910
which is a boolean.

03:24.430 --> 03:24.990
All right.

03:25.030 --> 03:27.310
Let's create the system prompt for our chain.

03:28.030 --> 03:36.350
You are a grader assessing whether an LLM generation is grounded in / supported by a set of documents.

03:36.790 --> 03:38.830
Give a binary score of yes or no.

03:38.950 --> 03:43.510
And yes means that the answer is grounded in / supported by the facts.

03:44.270 --> 03:47.790
Alrighty, now let's create the chat prompt template.

03:48.150 --> 03:55.430
For that we're going to use ChatPromptTemplate.from_messages, which receives

03:55.470 --> 04:01.030
a list of tuples, where the first element is the role, which

04:01.030 --> 04:06.110
here is system, and the second is the content: the system message we wrote above.

04:06.310 --> 04:09.670
The second message has the role of human,

04:10.070 --> 04:13.070
and its content is the set of facts.

04:13.070 --> 04:18.990
Then we plug in the documents, and then we plug in the LLM generation,

04:18.990 --> 04:21.710
which is what the LLM responded earlier.

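NOTE
A stdlib sketch of the two-message prompt just described, as the (role, content) tuples that ChatPromptTemplate.from_messages receives. The exact human-message wording and the format_messages helper are assumptions for illustration, not the LangChain API.
```python
SYSTEM = (
    "You are a grader assessing whether an LLM generation is grounded in / "
    "supported by a set of documents. Give a binary score 'yes' or 'no'. "
    "'Yes' means that the answer is grounded in / supported by the facts."
)
# (role, content) tuples: the same shape from_messages receives
HALLUCINATION_MESSAGES = [
    ("system", SYSTEM),
    ("human", "Set of facts: \n\n {documents} \n\n LLM generation: {generation}"),  # assumed wording
]
def format_messages(messages, **variables):
    """Hypothetical helper: fill each template's {placeholders} with the given values."""
    return [(role, content.format(**variables)) for role, content in messages]
filled = format_messages(
    HALLUCINATION_MESSAGES,
    documents="Agent memory can be short-term or long-term.",  # example input
    generation="Agents use short-term and long-term memory.",  # example input
)
print(filled[1][1])
```
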
04:22.150 --> 04:24.910
We have all the moving parts for our chain.

04:24.910 --> 04:30.310
So let's create the hallucination grader chain, which is going to take the hallucination prompt.

04:30.310 --> 04:33.990
And it's going to pipe it to the structured LLM grader.

04:34.270 --> 04:36.590
So eventually, what happens here is that

04:36.590 --> 04:39.070
we get a yes-or-no answer:

04:39.270 --> 04:43.470
whether the answer is indeed grounded in the documents that we plug in.

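NOTE
To show what piping the prompt into the grader means, a toy stdlib sketch: piping two steps builds a new step that feeds the first step's output into the second. Step and fake_grader are hypothetical; LangChain's real RunnableSequence also covers batching, streaming and async, which this omits.
```python
from typing import Any, Callable
class Step:
    """Toy runnable: wraps a function and supports | composition."""
    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn
    def invoke(self, value: Any) -> Any:
        return self.fn(value)
    def __or__(self, other: "Step") -> "Step":
        # Run self first, then feed its output into other
        return Step(lambda value: other.invoke(self.invoke(value)))
# Hypothetical prompt step: turns the input dict into one prompt string
prompt = Step(lambda d: f"Set of facts: {d['documents']} LLM generation: {d['generation']}")
# Hypothetical grader step standing in for the structured LLM grader
fake_grader = Step(lambda text: {"binary_score": "LLM generation:" in text})
hallucination_grader = prompt | fake_grader
print(hallucination_grader.invoke({"documents": "doc", "generation": "gen"}))
```
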
04:44.030 --> 04:44.910
Alrighty.

04:44.910 --> 04:47.470
So let's go now and write some tests.

04:47.670 --> 04:51.510
So we'll go to the test_chains.py file.

04:51.950 --> 04:56.670
And here we want to add some tests for the hallucination grader.

04:57.270 --> 05:04.350
First, we need to import the chain and the GradeHallucinations class.

05:05.310 --> 05:08.790
So now we are able to write the test function.

05:08.990 --> 05:12.150
So let's define test_hallucination_grader_answer_yes.

05:16.110 --> 05:18.710
And the question is going to be agent memory.

05:18.750 --> 05:20.750
We're going to retrieve the documents.

05:22.150 --> 05:26.790
Now we want to use the generation chain we've already imported.

05:27.070 --> 05:33.070
And as the context, we'll pass the retrieved documents, which should have information about the

05:33.110 --> 05:34.150
agent memory.

05:34.150 --> 05:35.910
And we're going to plug in the question.

05:36.550 --> 05:39.910
So the answer should be grounded in the documents.

05:40.270 --> 05:40.790
All right.

05:40.790 --> 05:48.030
So now we want to run the hallucination grader chain with the relevant documents we retrieved and the

05:48.030 --> 05:48.790
generation.

05:49.070 --> 05:55.390
And the result should be yes, because we passed valid documents. But I see I got an error.

05:55.750 --> 06:01.870
This is because my import above came before I loaded the environment variables.

06:01.870 --> 06:06.390
So let's fix that and rerun it.

06:10.030 --> 06:12.230
And we can see the test passed.

06:15.150 --> 06:15.910
All right.

06:15.910 --> 06:22.790
Now let's test the second case, where we do have a hallucinated answer.

06:23.590 --> 06:27.230
So for that I'm going to create a new test function.

06:27.390 --> 06:30.070
And I'm going to call it test_hallucination_grader_answer_no.

06:30.550 --> 06:33.670
And I'm going to put the exact same content here as before,

06:33.830 --> 06:36.470
but I'm going to change the assertion to no.

06:39.830 --> 06:45.150
And now, for the generation, I'm simply going to put some rubbish.

06:45.150 --> 06:49.620
So let's put in, for example, the string

06:50.060 --> 06:53.340
"In order to make pizza, we first need to start with the dough."

06:53.900 --> 06:55.380
So let's run it.

06:55.380 --> 06:58.460
And of course, we expect this test to pass,

06:58.460 --> 07:01.580
that is, we expect this to be graded as a hallucinated answer.

07:01.860 --> 07:04.980
So binary_score should be False.

07:08.100 --> 07:09.700
And it indeed passed.

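NOTE
An offline sketch of the two test cases. A real run needs an OpenAI key, so a hypothetical word-overlap heuristic stands in for the LLM grader here. Only the test structure follows the video: the same documents, a grounded answer versus the pizza sentence, and asserting True versus False.
```python
import re
from types import SimpleNamespace
def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))
def fake_hallucination_grader(documents: list, generation: str, threshold: float = 0.5):
    """Hypothetical stand-in: 'grounded' if most generation words appear in the documents."""
    doc_words = set().union(*(tokens(d) for d in documents))
    gen_words = tokens(generation)
    overlap = len(gen_words & doc_words) / max(len(gen_words), 1)
    return SimpleNamespace(binary_score=overlap >= threshold)
DOCS = ["Agent memory includes short-term memory and long-term memory stored in a vector store."]
def test_hallucination_grader_answer_yes():
    res = fake_hallucination_grader(DOCS, "Agent memory includes short-term and long-term memory.")
    assert res.binary_score
def test_hallucination_grader_answer_no():
    res = fake_hallucination_grader(DOCS, "In order to make pizza, we first need to start with the dough.")
    assert not res.binary_score
test_hallucination_grader_answer_yes()
test_hallucination_grader_answer_no()
```
With pytest the two test functions would be collected automatically. They are called directly at the end so the sketch also runs as a plain script.
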
07:10.100 --> 07:14.740
And we can remove line 62, because we're not actually using the generation there.

07:15.420 --> 07:18.980
So that's pretty much it for our tests.

07:19.300 --> 07:23.300
Let's just run everything together.

07:23.300 --> 07:27.620
So let's run all the tests as a sanity check that everything is working.

07:27.820 --> 07:33.340
So I'm going to select test_chains here and rerun all of the tests.

07:42.980 --> 07:45.900
And we can see that all of our tests passed.

07:46.420 --> 07:47.140
Amazing.

07:47.380 --> 07:48.100
Alrighty.

07:48.100 --> 07:52.420
So now let's create a new file and call it answer grader.

07:52.460 --> 07:54.260
Of course it's going to be under chains.

07:54.260 --> 07:59.220
And here we're going to implement a chain that will grade the answer and will determine whether this

07:59.220 --> 08:01.620
answer addresses the question or not.

08:02.020 --> 08:03.580
Very similar to before.

08:03.620 --> 08:05.620
Let's start with the imports.

08:09.820 --> 08:11.380
And let's initialize data.

08:11.940 --> 08:14.260
We'll create a class called GradeAnswer.

08:14.420 --> 08:18.260
And it's going to have only one attribute, binary_score.

08:19.220 --> 08:22.420
And in this binary_score's Field, let's give it

08:22.420 --> 08:26.460
the description: answer addresses the question, yes or no.

08:29.900 --> 08:32.340
And as always, let's initialize the LLM.

08:33.020 --> 08:35.660
And let's create the structured LLM grader.

08:35.660 --> 08:42.380
So this is the same LLM, but we use with_structured_output and give it the GradeAnswer Pydantic

08:42.380 --> 08:42.860
class.

08:43.060 --> 08:46.060
And now all we need to do is create a system prompt.

08:46.060 --> 08:51.340
And here it is: you are a grader assessing whether an answer addresses and resolves the question.

08:51.340 --> 08:53.100
Give a binary score yes or no.

08:53.380 --> 08:55.860
Yes means that the answer resolves the question.

08:57.260 --> 09:00.100
And finally we want to create the chat prompt template.

09:00.100 --> 09:03.060
So again, we'll use the from_messages method.

09:03.580 --> 09:06.740
And here we're going to plug in the system message.

09:06.780 --> 09:09.260
We're going to give it the role of system.

09:09.380 --> 09:15.980
And the next message is going to be a human message with the user's question plugged in alongside

09:15.980 --> 09:17.260
the LLM generation.

09:17.260 --> 09:19.900
So this is the answer the LLM generated.

09:20.260 --> 09:27.100
And finally, let's go and create the answer grader chain, which will take the answer prompt.

09:27.100 --> 09:30.620
And we'll pipe it into the structured LLM grader.

09:30.860 --> 09:36.820
So eventually we'll get back an object of the GradeAnswer class, which holds the information

09:36.820 --> 09:40.500
of true or false: whether the generation answered the question or not.

09:40.660 --> 09:45.220
So this chain is very similar to the hallucination grader chain.

09:45.220 --> 09:48.260
There's nothing new here in terms of prompt engineering techniques.

09:48.260 --> 09:52.220
We simply use structured output with a Pydantic class.

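NOTE
The two chains differ only in schema description, system prompt and human template, so the shared pattern can be sketched as one factory. This is a hypothetical refactoring, not how the video organizes the code. Here llm is any callable mapping a prompt string to a yes/no reply, and the human template wording is assumed.
```python
from dataclasses import dataclass
from typing import Callable
@dataclass
class BinaryGrade:
    binary_score: bool
def make_grader(system_prompt: str, human_template: str, llm: Callable[[str], str]):
    """Build a grader: fill the prompt, call the model, coerce its yes/no reply to bool."""
    def grade(**variables) -> BinaryGrade:
        prompt = system_prompt + "\n\n" + human_template.format(**variables)
        reply = llm(prompt).strip().lower()
        return BinaryGrade(binary_score=reply.startswith("yes"))
    return grade
ANSWER_SYSTEM = (
    "You are a grader assessing whether an answer addresses and resolves the question. "
    "Give a binary score 'yes' or 'no'. 'Yes' means that the answer resolves the question."
)
answer_grader = make_grader(
    ANSWER_SYSTEM,
    "User question: \n\n {question} \n\n LLM generation: {generation}",  # assumed wording
    llm=lambda prompt: "yes",  # stub model so the sketch runs offline
)
print(answer_grader(question="what is agent memory?", generation="Agent memory stores past interactions."))
```
The hallucination grader would be the same call with its own system prompt and a template containing {documents} and {generation}.
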
09:52.540 --> 09:53.140
All right.

09:53.140 --> 09:58.380
So now we have the chains that reflect on the generation.

09:58.580 --> 10:01.380
And now we want to incorporate them into our graph.

10:01.380 --> 10:03.460
So let's create a conditional branch.

10:03.940 --> 10:07.700
So I'm going to switch over to graph.py here.

10:08.060 --> 10:11.260
And let's go and import the answer grader.

10:11.260 --> 10:13.620
And we want to import the hallucination grader.

10:14.260 --> 10:17.220
And we now want to define a function.

10:17.220 --> 10:20.460
And this function will be the conditional edge function.

10:20.500 --> 10:24.300
We'll call it grade_generation_grounded_in_documents_and_question.

10:24.540 --> 10:28.060
Of course, it receives the state, and it returns a string.

10:28.060 --> 10:31.020
The string tells us which node we want to go to next.

10:31.900 --> 10:32.660
All right.

10:32.660 --> 10:33.740
So let's begin.

10:33.740 --> 10:36.460
And first, we want to check for hallucinations.

10:36.820 --> 10:43.610
So I'm simply going to extract the question, the documents and the generation from the state.

10:43.610 --> 10:48.210
So this is all the information we have so far when we get to this conditional edge.

10:48.610 --> 10:51.490
And let's go and run the hallucination grader chain.

10:51.730 --> 10:58.210
And here we run it with the retrieved documents, whether or not they include web search results, and

10:58.210 --> 10:59.930
the generation.

11:00.170 --> 11:05.690
Now, the response we get back has a binary_score attribute.

11:06.530 --> 11:10.290
So let's store it as the hallucination grade.

11:10.850 --> 11:17.050
And if this value equals True, this means that there is no hallucination.

11:17.050 --> 11:20.130
So we got a positive grade.

11:20.250 --> 11:21.210
So let's go.

11:21.250 --> 11:26.490
If this is true then this means that the generation is grounded in the documents.

11:26.890 --> 11:33.010
And if it is the case, what we want to do is to grade the generation against the question.

11:33.010 --> 11:34.970
So whether it answers the question or not.

11:35.010 --> 11:37.490
And for this we'll use the answer grader chain.

11:37.970 --> 11:43.610
And again we're going to have a score which is going to have the binary score attribute.

11:43.610 --> 11:45.530
And if this equals True,

11:45.530 --> 11:49.810
this means that the generation does address the question.

11:49.810 --> 11:52.370
And here I want to return "useful".

11:52.370 --> 11:54.890
And I'm not going to return the end node.

11:54.890 --> 12:00.810
I'm simply going to return the string "useful", because later I want to show you how to use the path map.

12:00.810 --> 12:04.050
So this is what I'm going to output for now.

12:04.250 --> 12:04.930
All right.

12:05.290 --> 12:12.210
However, if the answer does not address the question but it is grounded in the documents, then we

12:12.250 --> 12:15.490
simply want to return "not useful".

12:15.490 --> 12:19.730
In this case, the information in the vector store was not sufficient

12:19.730 --> 12:20.850
to answer the question.

12:20.850 --> 12:22.970
So we want to use external search.

12:23.010 --> 12:27.250
So later on, we'd like to execute the search tool.

12:27.290 --> 12:34.570
And if the answer is not even grounded in the documents, then what we want to do is to say that it's

12:34.570 --> 12:35.530
not supported.

12:35.530 --> 12:39.050
And in that case, we want to regenerate the answer from the documents.

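NOTE
A sketch of the routing logic described above, with the two graders passed in as parameters so it runs without an API key. In the video the function receives only the state and calls the chains directly. The state keys follow the narration; the extra parameters and the stub graders are assumptions.
```python
from types import SimpleNamespace
from typing import Any, Callable, Dict
Grader = Callable[[Dict[str, Any]], Any]
def grade_generation_grounded_in_documents_and_question(
    state: Dict[str, Any], hallucination_grader: Grader, answer_grader: Grader
) -> str:
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]
    # Step 1: is the generation grounded in the documents?
    score = hallucination_grader({"documents": documents, "generation": generation})
    if score.binary_score:
        # Step 2: grounded, so does it actually address the question?
        score = answer_grader({"question": question, "generation": generation})
        if score.binary_score:
            return "useful"
        return "not useful"  # grounded but insufficient: fall back to web search
    return "not supported"  # not grounded: regenerate
def stub(value: bool) -> Grader:
    """Hypothetical grader that ignores its input and returns a fixed score."""
    return lambda _inputs: SimpleNamespace(binary_score=value)
state = {"question": "what is agent memory?", "documents": ["..."], "generation": "..."}
print(grade_generation_grounded_in_documents_and_question(state, stub(True), stub(False)))
```
The answer grader only runs when the hallucination check passes. An ungrounded generation therefore costs one LLM call before routing, and a grounded one costs two.
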
12:39.690 --> 12:40.250
All right.

12:40.250 --> 12:43.410
So now let's create all the conditional edges.

12:43.930 --> 12:46.090
The source node is going to be generate.

12:46.090 --> 12:49.010
And from generate, what determines the next node

12:49.050 --> 12:50.050
is the function

12:50.050 --> 12:52.890
grade_generation_grounded_in_documents_and_question.

12:53.250 --> 12:58.210
However, now we're going to add a third argument, path_map, which is a dictionary.

12:58.210 --> 13:03.050
And here we're going to map "not supported", "useful" and "not useful",

13:03.090 --> 13:06.610
the strings that we return from the previous function.

13:07.050 --> 13:12.450
Because they don't represent real node names,

13:13.050 --> 13:16.410
we're going to map them to the node names.

13:16.410 --> 13:23.970
So "not supported" is going to go to generate, because we want to regenerate when the answer was not grounded

13:23.970 --> 13:31.930
in the documents; "useful" goes to the end node, returning the answer to the user; and "not useful"

13:31.930 --> 13:33.290
will go to web search.

13:33.330 --> 13:39.800
Since the vector store didn't have enough information to answer the question. But notice that

13:39.800 --> 13:45.960
the strings we return from the grade_generation_grounded_in_documents_and_question function

13:46.360 --> 13:51.040
are what is going to be displayed on the edges.

13:51.320 --> 13:55.560
This is a nice touch, and it really makes our graph even more explainable.

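NOTE
To make the path map concrete, here is a toy stdlib router: the condition function returns a label, and the dictionary translates it into the next node. The node names are assumptions. In LangGraph this dictionary is the path_map argument of add_conditional_edges, and the end node is the END constant.
```python
from typing import Callable, Dict
GENERATE, WEBSEARCH, END = "generate", "websearch", "__end__"  # assumed node names
PATH_MAP: Dict[str, str] = {
    "not supported": GENERATE,  # not grounded: regenerate
    "useful": END,              # grounded and answers the question: finish
    "not useful": WEBSEARCH,    # grounded but insufficient: search the web
}
def next_node(condition: Callable[[dict], str], state: dict, path_map: Dict[str, str]) -> str:
    """Resolve a condition's label to a node name, failing loudly on unmapped labels."""
    label = condition(state)
    if label not in path_map:
        raise KeyError(f"label {label!r} has no entry in the path map")
    return path_map[label]
print(next_node(lambda s: "not useful", {}, PATH_MAP))
```
Failing on an unmapped label catches typos between the function's return strings and the map keys.
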
13:56.160 --> 13:56.720
All right.

13:56.720 --> 13:59.920
So now let's run everything.

13:59.920 --> 14:02.160
So I'm going to go to main.py.

14:02.760 --> 14:07.000
And let's run this with the question "what is agent memory?"

14:07.240 --> 14:11.000
Here we're expecting sort of a happy flow.

14:11.000 --> 14:15.040
So the answer is going to be grounded in the documents,

14:15.040 --> 14:17.080
and it's going to answer the question.

14:17.280 --> 14:20.320
So here we can see the new graph that was created.

14:20.760 --> 14:24.440
And if we check the logs we can see exactly what is happening.

14:24.440 --> 14:28.960
So the flow we just built ran successfully.

14:28.960 --> 14:31.920
So the answer was indeed grounded in the documents.

14:31.920 --> 14:34.480
And the generation did answer the question.

14:34.880 --> 14:36.680
So this is very cool.

14:36.680 --> 14:39.920
And we now have a fairly complex workflow.

14:40.920 --> 14:47.160
And if we go to our project in LangSmith, let's have a look at the last execution.

14:47.200 --> 14:49.800
And we can see all of the nodes that ran.

14:50.160 --> 14:53.800
And we can see that at the end, after the generate node,

14:53.840 --> 14:56.480
we ran the grade generation function.

14:56.480 --> 15:02.440
So this is what triggered the conditional branch that finally gave us the answer.

15:03.400 --> 15:07.360
So you can see grade_generation_grounded_in_documents_and_question here.

15:07.640 --> 15:08.800
So it ran,

15:08.840 --> 15:14.480
and made two queries to the LLM, which decided whether this was the right answer to return to the

15:14.480 --> 15:14.960
user.

15:15.400 --> 15:21.600
And finally, if you want to check out the source code, then you are more than welcome to go and compare

15:21.600 --> 15:24.000
it with the code in the repository.

15:24.200 --> 15:26.800
So this is in the branch ten frag.

15:27.040 --> 15:32.560
And you can check it out, compare the files, and continue from there.