WEBVTT

00:00.440 --> 00:01.360
Alrighty.

00:01.400 --> 00:07.600
Now that we've implemented the retrieve node, we're going to implement the document grader node.

00:08.000 --> 00:12.480
When we enter this node, our state already holds the retrieved documents.

00:12.720 --> 00:19.560
Now we want to iterate over those documents and determine whether they are actually relevant to

00:19.600 --> 00:21.120
our question or not.

00:21.240 --> 00:27.840
For that we'll write a retrieval grader chain, which uses structured output

00:27.840 --> 00:34.440
from our LLM and turns it into a Pydantic object that tells us whether this document

00:34.440 --> 00:35.800
is relevant or not.

00:35.840 --> 00:42.000
If a document is not relevant, we want to filter it out and keep only the documents that are

00:42.000 --> 00:43.440
relevant to the question.

00:43.720 --> 00:46.760
If not all documents are relevant,

00:46.960 --> 00:51.880
meaning at least one document is not relevant to our query,

00:52.120 --> 00:55.160
then we set the web search flag to true,

00:55.160 --> 00:56.800
so that later on we'll go

00:56.840 --> 00:58.120
and search the web for this term.

00:58.360 --> 01:00.910
This is a simple heuristic we're using here.
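As a minimal sketch of this heuristic in plain Python, assuming each grade is the string "yes" or "no" (the function name needs_web_search is hypothetical, not from the course code):
```python
def needs_web_search(grades: list[str]) -> bool:
    """Return True if at least one retrieved document was graded as not relevant."""
    # Any grade other than "yes" counts as "not relevant" and triggers web search.
    return any(grade.strip().lower() != "yes" for grade in grades)
print(needs_web_search(["yes", "yes"]))  # False
print(needs_web_search(["yes", "no"]))   # True
```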

01:02.990 --> 01:03.870
Alrighty.

01:03.910 --> 01:05.790
So let's go to the chains package.

01:05.790 --> 01:11.470
Here we'll create a new Python file and call it retrieval_grader.

01:11.950 --> 01:18.590
This chain receives as input the original question and the retrieved document,

01:18.590 --> 01:22.990
and it determines whether the document is relevant to the question or not.

01:22.990 --> 01:26.990
We'll run this chain for each document we retrieve,

01:27.150 --> 01:29.910
And we'll be leveraging structured output for this.

01:30.510 --> 01:30.910
Cool.

01:30.950 --> 01:32.590
Let's start with the imports.

01:32.590 --> 01:35.670
First, we import ChatPromptTemplate.

01:35.990 --> 01:43.150
From Pydantic we import BaseModel and Field, and we also import the ChatOpenAI client.

01:43.870 --> 01:48.990
Then let's initialize the LLM with the default model and temperature set to zero.
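Conceptually, temperature rescales the model's next-token distribution before sampling; at zero, decoding becomes greedy, which reduces run-to-run variation (providers don't guarantee it removes it entirely). A small stdlib sketch of that idea, with made-up logits and a hypothetical helper name:
```python
import math
def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Next-token probabilities after temperature scaling; temperature 0 is treated as greedy (argmax)."""
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0))  # [1.0, 0.0, 0.0]
```
Lower temperatures concentrate probability on the top token, which is why a grader like this one is usually run at temperature zero.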

01:49.510 --> 01:55.710
Now we create a new class called GradeDocuments, which is a Pydantic model.

01:56.230 --> 02:03.460
It has a single field, binary_score, which is a string that will be

02:03.500 --> 02:04.300
either yes or no.

02:04.620 --> 02:10.100
In the field's description, we write that the documents are relevant to the question, 'yes' or

02:10.100 --> 02:10.420
'no'.

02:10.740 --> 02:17.140
This is important because the LLM uses the description we write here to

02:17.180 --> 02:20.180
decide whether this document is relevant or not.

02:20.460 --> 02:23.060
It also helps enforce the schema,

02:23.060 --> 02:26.460
since we want binary_score to be only yes or no.
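To see the idea without Pydantic installed, here is a hedged stdlib stand-in for the GradeDocuments schema: a dataclass whose field carries the description as metadata and rejects anything other than "yes" or "no". In the course code the class is a Pydantic BaseModel using Field(description=...); the __post_init__ check is an illustrative addition, not part of it.
```python
from dataclasses import dataclass, field, fields
@dataclass
class GradeDocuments:
    """Binary relevance score for a retrieved document (stdlib stand-in for the Pydantic model)."""
    binary_score: str = field(
        metadata={"description": "Documents are relevant to the question, 'yes' or 'no'"}
    )
    def __post_init__(self) -> None:
        # Enforce the schema: only "yes" or "no" are accepted.
        if self.binary_score not in ("yes", "no"):
            raise ValueError(f"binary_score must be 'yes' or 'no', got {self.binary_score!r}")
# The description travels with the field, similar to Field(description=...) in Pydantic.
DESCRIPTION = fields(GradeDocuments)[0].metadata["description"]
```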

02:27.220 --> 02:32.020
Next, we take the LLM and call its with_structured_output method,

02:32.020 --> 02:35.180
passing in the GradeDocuments class.

02:35.620 --> 02:40.820
Under the hood, this uses function calling,

02:41.140 --> 02:47.020
so every LLM call we make returns a Pydantic object

02:47.180 --> 02:50.500
and the LLM answers in the schema we want.

02:50.700 --> 02:57.540
It's important to note that the default model for ChatOpenAI is GPT-3.5.

02:57.900 --> 03:03.850
If you want to use with_structured_output, you have to make sure your model supports function

03:03.850 --> 03:05.850
calling; otherwise it's not going to work.
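To make "function calling under the hood" concrete, here is a simplified illustration, not LangChain's actual implementation: the model answers with a tool call whose arguments are a JSON string, and the client parses those arguments into the schema object. The payload shape and parse_tool_call are hypothetical.
```python
import json
from dataclasses import dataclass
@dataclass
class GradeDocuments:
    """Schema object the parsed arguments are loaded into."""
    binary_score: str
def parse_tool_call(tool_call: dict) -> GradeDocuments:
    """Turn a (simplified) function-calling payload into a typed object."""
    # The arguments arrive as a JSON string that should match the declared schema.
    args = json.loads(tool_call["arguments"])
    if args.get("binary_score") not in ("yes", "no"):
        raise ValueError(f"unexpected arguments: {args}")
    return GradeDocuments(binary_score=args["binary_score"])
# Hypothetical payload, loosely modeled on a function-calling response.
fake_call = {"name": "GradeDocuments", "arguments": '{"binary_score": "yes"}'}
result = parse_tool_call(fake_call)
print(result.binary_score)  # yes
```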

03:06.090 --> 03:12.010
I highly recommend checking out the with_structured_output function to see how LangChain

03:12.010 --> 03:12.970
implements this.

03:15.250 --> 03:17.850
Let's review the prompt we'll be sending to the LLM.

03:18.050 --> 03:25.290
The system prompt: "You are a grader assessing relevance of a retrieved document to a user question.

03:25.490 --> 03:33.530
If the document contains keywords or semantic meaning related to the question, grade it as relevant. Give

03:33.570 --> 03:39.050
a binary score 'yes' or 'no' to indicate whether the document is relevant to the question."
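Below is a hedged sketch of how the prompt's placeholders get filled, using plain str.format. The system text follows the prompt above; the human template wording and build_messages are assumptions. ChatPromptTemplate.from_messages accepts a similar list of (role, template) pairs and performs this substitution for you.
```python
SYSTEM = (
    "You are a grader assessing relevance of a retrieved document to a user question.\n"
    "If the document contains keywords or semantic meaning related to the question, grade it as relevant.\n"
    "Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question."
)
# Hypothetical human template; the placeholder names mirror the chain's inputs.
HUMAN_TEMPLATE = "Retrieved document:\n\n{document}\n\nUser question: {question}"
def build_messages(document: str, question: str) -> list[tuple[str, str]]:
    """Fill the placeholders, mimicking what a (role, template) prompt does."""
    return [
        ("system", SYSTEM),
        ("human", HUMAN_TEMPLATE.format(document=document, question=question)),
    ]
messages = build_messages("Agents store memories in a vector store.", "agent memory")
```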

03:39.970 --> 03:44.450
Now we use the ChatPromptTemplate.from_messages method.

03:44.570 --> 03:47.370
And here we're going to plug in the system message.

03:47.690 --> 03:51.090
Then we add a human message

03:51.090 --> 03:55.170
with a placeholder for the fetched document,

03:55.170 --> 03:59.510
which is the document whose relevance we want to determine.

03:59.750 --> 04:03.270
And we also want to plug in the original user's question.

04:04.150 --> 04:06.070
And finally let's create the chain.

04:06.070 --> 04:07.750
We'll call it retrieval_grader.

04:08.030 --> 04:10.750
It takes the grade_prompt

04:10.870 --> 04:15.230
and pipes it into the LLM with structured output.

04:15.510 --> 04:17.310
And that's it for the chain.
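The | operator composes steps so that each step's output becomes the next step's input. Here is a toy illustration of that idea; the Step class and the two stand-ins are hypothetical and only mimic LangChain's runnable composition, without calling a model.
```python
from typing import Any, Callable
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `a | b` composition."""
    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn
    def __or__(self, other: "Step") -> "Step":
        # Run self first, then feed its output into other.
        return Step(lambda x: other.fn(self.fn(x)))
    def invoke(self, x: Any) -> Any:
        return self.fn(x)
# Hypothetical stand-ins for grade_prompt and the structured-output LLM.
grade_prompt = Step(lambda inputs: f"Document: {inputs['document']}\nQuestion: {inputs['question']}")
structured_llm_grader = Step(lambda text: {"received": text, "binary_score": "yes"})
retrieval_grader = grade_prompt | structured_llm_grader
result = retrieval_grader.invoke({"document": "d", "question": "q"})
```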

04:17.710 --> 04:18.950
Pretty straightforward.

04:20.390 --> 04:22.230
And now it's time to write some tests.

04:22.590 --> 04:24.350
And I love writing tests.

04:24.350 --> 04:28.470
And I think it's a super important part of the software development cycle.

04:28.710 --> 04:36.510
It's important to note that writing tests for GenAI and LLM-based applications is tricky, because

04:36.550 --> 04:42.830
first of all, we're relying on a third-party LLM that is stochastic,

04:43.030 --> 04:46.030
so the answers we get are not deterministic.

04:46.190 --> 04:52.230
So every time we send a request to the LLM the answer we get back is not guaranteed to be exactly the

04:52.230 --> 04:52.590
same.

04:52.830 --> 04:54.390
So that's reason number one.

04:54.830 --> 05:02.620
Second, because it's a third party, we don't really have control over the availability and durability

05:02.900 --> 05:05.660
of that third-party application.

05:05.660 --> 05:12.460
We can get a number of errors, such as rate limiting, service unavailable, or even internal server

05:12.460 --> 05:14.620
errors when we deal with a third party.

05:14.820 --> 05:20.740
And lastly, it costs money, because each LLM invocation consumes tokens.

05:20.980 --> 05:28.140
We can use cheaper models to integrate these tests into our SDLC, or we can even use open-source models.

05:28.180 --> 05:33.180
But with open-source models, we have to handle the whole operations side ourselves:

05:33.180 --> 05:38.940
the deployment, scalability, and availability of the open-source model we're deploying.

05:39.020 --> 05:45.740
So for now, let's put all those testing problems aside and implement some tests for our application,

05:45.740 --> 05:47.380
which is still better than nothing.

05:47.460 --> 05:53.580
Even though we're not going to run them in a CI/CD pipeline, we can run them manually, and they will give

05:53.580 --> 05:58.130
us a sanity check that our application is working and doing what it's supposed to do.

05:59.970 --> 06:06.330
Lastly, I have to admit that the new top-tier models have recently become so

06:06.330 --> 06:07.210
much better.

06:07.250 --> 06:14.850
Judging by answer quality, latency, and cost, they keep getting better, faster, and cheaper.

06:15.370 --> 06:19.810
I've even seen companies integrating LLM tests into their CI/CD systems

06:20.010 --> 06:26.690
despite all the disadvantages I noted before, and there are ways to address and

06:26.690 --> 06:28.850
mitigate those disadvantages,

06:29.050 --> 06:32.450
but they're outside the scope of this course.

06:33.530 --> 06:38.130
So in the test file let's start writing the tests for this chain.

06:38.450 --> 06:42.490
So I'm going to remove this boilerplate test that we have right now.

06:42.930 --> 06:45.210
And let's start with a bunch of imports.

06:45.210 --> 06:50.010
First, we import load_dotenv to load the API keys and other environment variables.

06:50.290 --> 06:55.650
We also import the GradeDocuments class and the retrieval_grader chain.

06:55.850 --> 06:59.840
And of course, we also need the retriever to get us relevant documents.

07:00.520 --> 07:04.920
Let's start with the first test case, which is the happy flow,

07:04.920 --> 07:07.240
where we get a relevant document.

07:07.240 --> 07:09.880
We'll call it

07:10.080 --> 07:10.960
test_retrieval_grader_answer_yes.

07:11.760 --> 07:14.680
Here we take our example question,

07:14.680 --> 07:16.960
which is "agent memory".

07:17.480 --> 07:21.760
And we want to use the retriever to get us relevant documents.

07:21.760 --> 07:26.560
We take the first document we find and get its content.

07:26.800 --> 07:30.400
This document should be the one with the highest similarity score.

07:30.880 --> 07:33.760
I've accidentally used index 1 here,

07:33.800 --> 07:35.720
but you can use 1 or 0:

07:36.120 --> 07:40.720
the documents at the beginning of this list should all be relevant to this question.

07:40.960 --> 07:48.800
Because we populated our vector DB with documents about agent memory,

07:48.800 --> 07:52.640
this should be a document containing some information about agent memory.
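Why should index 0 or 1 be relevant? A retriever returns documents ordered from most to least similar to the query. Here is a hedged sketch with toy vectors, assuming cosine similarity as the ranking metric (the actual metric depends on the vector store and its configuration); cosine and top_k are illustrative helpers.
```python
import math
def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
def top_k(query: list[float], docs: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return document ids ordered from most to least similar, like a retriever's result list."""
    ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
    return ranked[:k]
# Toy 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
docs = {
    "agent_memory": [0.9, 0.1, 0.0],
    "agent_planning": [0.6, 0.6, 0.1],
    "pizza_recipe": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # pretend embedding of "agent memory"
print(top_k(query, docs))  # ['agent_memory', 'agent_planning', 'pizza_recipe']
```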

07:53.360 --> 08:00.830
Now we invoke the chain and get back a result, which is

08:00.830 --> 08:02.790
a GradeDocuments object,

08:02.790 --> 08:08.990
because, as a reminder, we're using structured output. The question is our original

08:08.990 --> 08:09.750
question,

08:09.950 --> 08:13.110
and the document is the document we retrieved.

08:13.630 --> 08:19.550
Finally, we assert that the result's binary_score is "yes".

08:20.110 --> 08:20.790
Alrighty.

08:20.790 --> 08:25.310
Let's go and run the tests and let's see what we get.

08:25.750 --> 08:30.230
It should pass and we can see everything passed as expected.

08:30.470 --> 08:35.710
And let's reverse the assert as a sanity check to see that the test indeed fails.

08:38.990 --> 08:40.350
And we can see it fails.

08:40.350 --> 08:42.510
So it's working as expected.

08:43.190 --> 08:43.990
All right.

08:44.030 --> 08:47.830
One final run, just to see that it's still working.

08:48.270 --> 08:54.270
If we want, we can also run it from the terminal with pytest . -s -v.

08:55.420 --> 09:00.660
We can see one test passed:

09:00.660 --> 09:01.380
test_retrieval_grader_answer_yes.

09:02.500 --> 09:03.060
All right.

09:03.060 --> 09:08.380
Now let's implement another test, which we'll call

09:08.380 --> 09:09.220
test_retrieval_grader_answer_no.

09:10.100 --> 09:13.820
And here we're going to have a very similar implementation.

09:13.820 --> 09:19.940
But instead of "agent memory", we switch the question to something unrelated to the

09:19.940 --> 09:21.060
retrieved documents,

09:21.060 --> 09:23.300
such as "how to make pizza".

09:24.380 --> 09:28.420
We still want to retrieve documents relevant to agent memory,

09:28.420 --> 09:33.340
so we use the retriever with the question from before.

09:34.060 --> 09:40.780
However, when we call the retrieval grader, we plug in a different question:

09:40.780 --> 09:41.780
"how to make pizza".

09:41.780 --> 09:46.300
We expect this question to be unrelated to the document, which is going

09:46.300 --> 09:47.900
to talk about agent memory.

09:49.540 --> 09:51.340
And the answer should be no here.
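The real tests call the OpenAI API and a live retriever. For an offline look at the same test shape, here is a hedged variant that swaps in a hypothetical FakeGrader (a simple word-overlap rule, nothing like the LLM's judgment) and a hard-coded document in place of the retriever's result:
```python
import re
from dataclasses import dataclass
@dataclass
class GradeDocuments:
    binary_score: str
class FakeGrader:
    """Hypothetical offline stand-in for retrieval_grader: 'yes' if question and document share a word."""
    def invoke(self, inputs: dict) -> GradeDocuments:
        q_words = set(re.findall(r"[a-z]+", inputs["question"].lower()))
        d_words = set(re.findall(r"[a-z]+", inputs["document"].lower()))
        return GradeDocuments(binary_score="yes" if q_words & d_words else "no")
retrieval_grader = FakeGrader()
# Stand-in for retriever.invoke("agent memory")[1].page_content
DOC_TXT = "Memory in LLM-powered agents covers short-term and long-term memory."
def test_retrieval_grader_answer_yes() -> None:
    res = retrieval_grader.invoke({"question": "agent memory", "document": DOC_TXT})
    assert res.binary_score == "yes"
def test_retrieval_grader_answer_no() -> None:
    # Same agent-memory document, unrelated question.
    res = retrieval_grader.invoke({"question": "how to make pizza", "document": DOC_TXT})
    assert res.binary_score == "no"
```
With pytest, both functions are collected automatically; swapping FakeGrader for the real chain turns them back into the live tests.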

09:53.660 --> 09:54.340
All right.

09:54.360 --> 10:00.640
Let's run all of our tests again, and we can see they all passed.

10:00.800 --> 10:03.760
So our unit tests are working as expected.

10:05.160 --> 10:05.960
Alrighty.

10:05.960 --> 10:08.760
So we know that our chain is working as expected.

10:08.760 --> 10:14.880
Now let's create a new file under nodes and call it grade_documents.

10:18.280 --> 10:22.960
Here we'll write our node implementation for grading all the documents,

10:22.960 --> 10:27.040
deciding whether to filter each one out or keep it.

10:27.560 --> 10:31.520
So we're going to define a function which will receive the state.

10:31.520 --> 10:35.800
And that state will already contain the fetched documents.

10:35.800 --> 10:38.400
We're going to iterate through all the documents.

10:38.400 --> 10:43.440
And our grader chain is going to decide for each document whether it's relevant or not.

10:43.600 --> 10:46.560
If it's not relevant, we're going to filter it out.

10:46.680 --> 10:53.200
Finally, if we found any document that's not relevant, we set the web search

10:53.200 --> 10:58.550
flag to true so that later on we can search the web for that query

10:58.550 --> 11:03.390
and get additional information, since not all of the documents are relevant to us.

11:04.030 --> 11:04.750
Alrighty.

11:04.750 --> 11:10.630
So you can pause the video, copy all the imports and the function signature, and then we can continue

11:10.630 --> 11:11.590
and implement it.

11:16.310 --> 11:16.870
Cool.

11:17.150 --> 11:23.350
So now we want to simply print out that we're checking the relevance of the documents.

11:23.910 --> 11:30.190
Now let's extract the original question and the fetched documents from the state.

11:30.910 --> 11:36.430
We create a new list called filtered_docs, and we'll append to this list

11:36.430 --> 11:41.990
every time we find a relevant doc. We also initialize a web_search boolean to False.

11:42.190 --> 11:49.190
If we find any document that is not relevant to the question, we'll toggle this boolean to True.

11:50.150 --> 11:56.300
Now let's iterate through all the documents, and for each document call the retrieval

11:56.300 --> 11:57.140
grader chain.

11:57.540 --> 12:02.780
We supply it with the original question and the document we're iterating over.

12:02.820 --> 12:09.340
Of course, we pass the document's page_content, and this gives us a score indicating whether this is a relevant

12:09.340 --> 12:10.700
document or not.

12:11.100 --> 12:14.540
We take the binary_score attribute of the score,

12:14.540 --> 12:15.780
which is yes or no.

12:15.780 --> 12:22.020
If it's yes, we append the document to filtered_docs because it's relevant to our question.

12:22.700 --> 12:27.220
If it's not, we toggle the web_search boolean to True

12:27.300 --> 12:31.380
and continue without adding the document to the list.

12:31.580 --> 12:34.500
Finally, we update our graph state.

12:34.780 --> 12:38.020
Here we set the documents

12:38.020 --> 12:39.980
to the filtered documents,

12:40.300 --> 12:43.020
the question to the original question,

12:43.020 --> 12:46.100
and we update the web search flag as well.
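Putting the steps together, here is a hedged, self-contained sketch of the node. The Document dataclass, the GraphState keys, and KeywordGrader are stand-ins (assumptions) so it runs offline; in the course, the grader is the LLM-backed retrieval_grader chain and the documents come from the retriever.
```python
from dataclasses import dataclass
from typing import Any, Dict, List, TypedDict
@dataclass
class Document:
    """Minimal stand-in for a LangChain Document (only page_content is used here)."""
    page_content: str
@dataclass
class GradeDocuments:
    binary_score: str
class GraphState(TypedDict):
    question: str
    documents: List[Document]
    web_search: bool
class KeywordGrader:
    """Hypothetical offline grader standing in for the LLM-backed retrieval_grader chain."""
    def invoke(self, inputs: Dict[str, str]) -> GradeDocuments:
        relevant = inputs["question"].lower() in inputs["document"].lower()
        return GradeDocuments(binary_score="yes" if relevant else "no")
retrieval_grader = KeywordGrader()
def grade_documents(state: GraphState) -> Dict[str, Any]:
    """Keep relevant documents; flag web search if any document was filtered out."""
    print("Checking document relevance to the question...")
    question = state["question"]
    documents = state["documents"]
    filtered_docs: List[Document] = []
    web_search = False
    for d in documents:
        score = retrieval_grader.invoke({"question": question, "document": d.page_content})
        if score.binary_score.lower() == "yes":
            filtered_docs.append(d)
        else:
            # Irrelevant document: skip it and fall back to web search later.
            web_search = True
    return {"documents": filtered_docs, "question": question, "web_search": web_search}
```
The node returns only the keys it updates, which is how a graph node typically merges its output back into the shared state.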

12:46.380 --> 12:48.300
And that's pretty much it:

12:48.340 --> 12:52.900
we've finished implementing the grade documents node.