WEBVTT

00:00.240 --> 00:01.200
Instructor: So in this video

00:01.200 --> 00:05.100
I wanna focus on the LLM as a reasoning engine.

00:05.100 --> 00:08.790
And specifically I wanna focus on the LLM call

00:08.790 --> 00:11.670
that we make to determine which tool to use

00:11.670 --> 00:14.100
or whether we have the answer or not.

00:14.100 --> 00:16.260
So we saw the prompt that we're sending,

00:16.260 --> 00:19.020
which is an augmentation of the original question

00:19.020 --> 00:21.600
with all our tools and showing

00:21.600 --> 00:24.659
and describing to the LLM how it should format its output

00:24.659 --> 00:28.860
with the Action, Action Input format.

00:28.860 --> 00:31.650
But now, after we added the agent scratchpad,

00:31.650 --> 00:34.170
I want to show you exactly what are the calls

00:34.170 --> 00:36.810
that are being sent to the LLM,

00:36.810 --> 00:40.413
and I want to show you why it is working as it is working.

00:42.510 --> 00:44.910
So to do that I want to create a new file

00:44.910 --> 00:47.400
and I'll call it callbacks.

00:47.400 --> 00:49.656
And this file is going to have all the logic

00:49.656 --> 00:52.890
that is going to log all our LLM calls.

00:52.890 --> 00:54.330
We're going to create a new class

00:54.330 --> 00:57.030
and we'll call it AgentCallbackHandler.

00:57.030 --> 00:58.920
And it's going to inherit from a class,

00:58.920 --> 01:01.140
which is called the BaseCallbackHandler.

01:01.140 --> 01:03.480
Now let's import it from LangChain.

01:03.480 --> 01:06.420
So what's the BaseCallbackHandler?

01:06.420 --> 01:08.520
I'm gonna go to Google, I'm going to search

01:08.520 --> 01:10.710
for callback handler LangChain

01:10.710 --> 01:13.530
and I'm going to click on the first result.

01:13.530 --> 01:15.990
So basically we have this base class

01:15.990 --> 01:17.520
that we can inherit from

01:17.520 --> 01:20.550
and override all of the functions that we have here

01:20.550 --> 01:24.210
that will be triggered on every interesting LangChain event.

01:24.210 --> 01:26.220
What's an interesting event in LangChain?

01:26.220 --> 01:29.063
Maybe a call to the LLM, a response from the LLM,

01:29.063 --> 01:33.300
maybe when we select a tool, or after we execute a tool,

01:33.300 --> 01:34.710
or when we get an error,

01:34.710 --> 01:37.050
or when we have a new token on the LLM.

01:37.050 --> 01:40.080
So all of this we can override

01:40.080 --> 01:42.060
and this helps us with tracing.

01:42.060 --> 01:45.960
So let's go back and I want to override on_llm_start.

01:45.960 --> 01:49.800
So that's when we send something to the LLM, and on_llm_end.

01:49.800 --> 01:52.503
So this is when we get a response from the LLM.

01:53.370 --> 01:56.283
Let's now import all those type hints.

02:03.540 --> 02:05.490
And now let's override those methods.

02:05.490 --> 02:08.460
So when we start and send something to the LLM,

02:08.460 --> 02:11.820
I want to print nicely what is the prompt for the LLM.

02:11.820 --> 02:14.580
And this I'm going to get from the prompts argument,

02:14.580 --> 02:16.110
and I'm going to take the first one

02:16.110 --> 02:19.020
because I'm assuming there is going to be one prompt.

02:19.020 --> 02:21.240
And once we get the answer from the LLM,

02:21.240 --> 02:22.650
I want to log it as well.

02:22.650 --> 02:24.930
So I'm simply going to write some nice prints

02:24.930 --> 02:26.130
over here as well.

02:26.130 --> 02:29.070
And I'm going to access the response object

02:29.070 --> 02:32.460
and get what was generated by the LLM.

02:32.460 --> 02:34.380
So you can just copy that.

02:34.380 --> 02:35.700
It's not very interesting.

02:35.700 --> 02:37.500
You can inspect those objects yourself.

02:37.500 --> 02:41.790
So I'm not going to go over and describe this entire object.

02:41.790 --> 02:43.740
Of course, all of this code is available

02:43.740 --> 02:45.510
in the course's resources.
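
NOTE
A minimal sketch of the callbacks file described above, not the exact code from the course resources.
The method names on_llm_start and on_llm_end and the prompts argument come from the video.
The helpers format_prompt and format_response and the banner text are hypothetical, added for illustration.
The try/except fallback is also an assumption, there only so the sketch runs without LangChain installed.
```python
from typing import Any, Dict, List
# Use the real base class when LangChain is installed.
try:
    from langchain_core.callbacks import BaseCallbackHandler
except ImportError:
    # Stand-in so the sketch still runs without LangChain (an assumption, not library code).
    class BaseCallbackHandler:
        pass
#
def format_prompt(prompt: str) -> str:
    """Frame the prompt text so it stands out in the console (hypothetical helper)."""
    return f"***Prompt to LLM was:***\n{prompt}\n*********"
#
def format_response(text: str) -> str:
    """Frame the generated text the same way (hypothetical helper)."""
    return f"***LLM Response:***\n{text}\n*********"
#
class AgentCallbackHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> Any:
        # Assumes there is exactly one prompt per call, as in the video.
        print(format_prompt(prompts[0]))
    #
    def on_llm_end(self, response: Any, **kwargs: Any) -> Any:
        # The response is expected to look like an LLMResult: a list of lists of generations.
        print(format_response(response.generations[0][0].text))
```
Any object exposing generations[0][0].text can be passed to on_llm_end to try the handler without calling a model.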

02:45.510 --> 02:49.950
So let's now go and import our AgentCallbackHandler.

02:49.950 --> 02:51.510
And now we want to use it

02:51.510 --> 02:53.825
and because we're logging LLM responses

02:53.825 --> 02:58.500
and LLM calls, so we're going to inject this object

02:58.500 --> 03:02.640
into the LLM variable, which is an instance of ChatOpenAI.

03:02.640 --> 03:05.430
So this one here receives an argument of callbacks

03:05.430 --> 03:06.930
and right over here we'll set it

03:06.930 --> 03:09.090
to the AgentCallbackHandler.

03:09.090 --> 03:13.650
So this will log all the responses and calls to the LLM.

03:13.650 --> 03:17.670
Let's not forget to instantiate an instance from this class.

03:17.670 --> 03:19.653
And let's run everything.

03:24.330 --> 03:26.850
So we can see now we have a couple of prints

03:26.850 --> 03:30.270
and we will soon see what they are.

03:30.270 --> 03:34.470
So we can see that we start with this prompt

03:34.470 --> 03:37.050
and this is the ReAct prompt you see right now.

03:37.050 --> 03:40.500
And here we plugged in all the tools that we have

03:40.500 --> 03:43.350
and we have also the tool names

03:43.350 --> 03:47.340
and we have also what is the original question.

03:47.340 --> 03:50.160
We see now what is the LLM response

03:50.160 --> 03:53.613
and this is what was sent to the output parser.

03:54.600 --> 03:56.310
And if we copy that,

03:56.310 --> 03:58.860
I want to show you in action how it works.

03:58.860 --> 04:01.770
So I copied that and I want to head over

04:01.770 --> 04:04.113
to the OpenAI Playground.

04:08.040 --> 04:11.400
And right over here we can put our prompt.

04:11.400 --> 04:13.410
So I'm simply going to plug it in

04:13.410 --> 04:15.063
and I'm going to execute it.

04:16.200 --> 04:17.820
Now you can see right here

04:17.820 --> 04:20.550
that the response we get from the LLM

04:20.550 --> 04:23.400
is not the response that we logged.

04:23.400 --> 04:25.920
And this is because this is the same response,

04:25.920 --> 04:29.460
except that here we have also an observation part

04:29.460 --> 04:32.760
and another thought and another final answer.

04:32.760 --> 04:36.180
Now I remind you that this is the first iteration

04:36.180 --> 04:37.620
of our ReAct loop.

04:37.620 --> 04:40.440
So we need this to select the correct tool.

04:40.440 --> 04:43.110
Now if we would've sent this to the output parser,

04:43.110 --> 04:45.570
then we saw already that it would generate

04:45.570 --> 04:47.430
some kind of error for us

04:47.430 --> 04:49.800
because it would have also a parsable action

04:49.800 --> 04:51.990
and also a final answer.

04:51.990 --> 04:53.940
Anyway, so in this scenario,

04:53.940 --> 04:56.640
we see that the LLM generated too much for us.

04:56.640 --> 04:59.280
We only need the first four lines here.

04:59.280 --> 05:01.020
So how do we fix it?

05:01.020 --> 05:02.940
With a stop token.

05:02.940 --> 05:05.250
So remember when we initialize the LLM,

05:05.250 --> 05:09.480
we plugged in a stop token of backslash-n Observation.

05:09.480 --> 05:11.640
That's exactly why we need it

05:11.640 --> 05:14.437
because once the LLM encounters the token

05:14.437 --> 05:16.860
of backslash-n Observation,

05:16.860 --> 05:18.630
it'll stop generating tokens

05:18.630 --> 05:20.400
and it'll not include it,

05:20.400 --> 05:22.680
leaving us with the first four lines

05:22.680 --> 05:25.350
of this response, which we need.
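
NOTE
A rough simulation of what the stop sequence does, assuming the stop string is a newline followed by Observation.
With the real API the model stops generating on the server side; this sketch only trims an already generated string.
The function apply_stop, the tool name get_text_length and the sample text are hypothetical, made up for illustration.
```python
from typing import List
#
def apply_stop(generated: str, stop: List[str]) -> str:
    """Cut the text at the earliest stop sequence; the stop sequence itself is dropped."""
    cut = len(generated)
    for seq in stop:
        idx = generated.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return generated[:cut]
#
# Hypothetical over-long completion, similar in shape to the one seen in the Playground.
raw = (
    "I need to find the length of the word.\n"
    "Action: get_text_length\n"
    "Action Input: DOG\n"
    "Observation: 3\n"
    "Thought: I now know the final answer\n"
    "Final Answer: 3"
)
trimmed = apply_stop(raw, ["\nObservation"])
print(trimmed)
```
The trimmed text ends at the Action Input line, so the output parser sees an action and no final answer.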

05:25.350 --> 05:27.690
So let's add in the stop sequences,

05:27.690 --> 05:30.060
the backslash-n Observation,

05:30.060 --> 05:31.720
and send the response

05:33.120 --> 05:36.210
and we can see that we get indeed the answer that we need.

05:36.210 --> 05:39.303
And this is the exact answer that our agent received.

05:40.170 --> 05:43.980
So let's head back to our logs of the LLM

05:43.980 --> 05:48.300
and we can see that in the LLM response, that's what we got.

05:48.300 --> 05:52.290
And now we went for the output parsing.

05:52.290 --> 05:55.920
We got this output, we used the output parser

05:55.920 --> 05:59.460
to parse the tool name and the tool inputs.

05:59.460 --> 06:03.060
Then we ran the tool and we got an observation,

06:03.060 --> 06:04.560
the result of the tool.

06:04.560 --> 06:06.996
So after that we start another iteration

06:06.996 --> 06:09.030
of the ReAct prompt.

06:09.030 --> 06:11.550
This time we sent in the agent scratchpad

06:11.550 --> 06:14.490
all of the history of what has been done so far.

06:14.490 --> 06:16.350
So we can see right here with the action,

06:16.350 --> 06:19.380
action input, and the observation from before.
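
NOTE
A sketch of how the scratchpad text could be assembled from the history, under stated assumptions.
The AgentAction dataclass here is a stand-in, not the LangChain class, and format_scratchpad is a hypothetical helper.
The idea is only that each step replays the raw LLM text, then the observation, then a fresh Thought prefix.
```python
from dataclasses import dataclass
from typing import List, Tuple
#
@dataclass
class AgentAction:
    """Stand-in holding the chosen tool, its input, and the raw LLM text (log)."""
    tool: str
    tool_input: str
    log: str
#
def format_scratchpad(steps: List[Tuple[AgentAction, str]]) -> str:
    """Replay each step's raw LLM text followed by its observation."""
    text = ""
    for action, observation in steps:
        text += action.log
        text += f"\nObservation: {observation}\nThought: "
    return text
#
# Hypothetical single step: the tool name and values are made up for illustration.
log = "I should measure the word.\nAction: get_text_length\nAction Input: DOG"
steps = [(AgentAction("get_text_length", "DOG", log), "3")]
print(format_scratchpad(steps))
```
On the first iteration the history is empty, so the scratchpad is an empty string.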

06:19.380 --> 06:22.230
And we can see that from here

06:22.230 --> 06:25.650
that the LLM responded to us with the answer.

06:25.650 --> 06:29.460
So it responded with "I now know the final answer,"

06:29.460 --> 06:32.400
"Final Answer:", and then the final answer.

06:32.400 --> 06:35.400
And then our output parser read that

06:35.400 --> 06:37.590
and it created an AgentFinish object

06:37.590 --> 06:39.150
with the final answer.

06:39.150 --> 06:42.240
And then we finished the iteration.

06:42.240 --> 06:45.390
So we didn't actually create a while loop,

06:45.390 --> 06:47.370
but let's go and do this

06:47.370 --> 06:50.280
because it's going to be a very easy fix.

06:50.280 --> 06:53.550
So I'm gonna go kind of to the top of the file

06:53.550 --> 06:55.470
before we execute anything,

06:55.470 --> 06:59.250
and I'll define that agent_step is going to be empty,

06:59.250 --> 07:02.010
and now I'm going to create a while loop

07:02.010 --> 07:04.590
while the agent_step is not an

07:04.590 --> 07:06.480
instance of AgentFinish.

07:06.480 --> 07:08.940
So as long as we have something to do

07:08.940 --> 07:12.810
and now I'm going to take everything and indent it.

07:12.810 --> 07:15.840
However, we don't really need the second invocation

07:15.840 --> 07:18.900
of the agent because we're going to do that

07:18.900 --> 07:21.180
and execute it as long as the agent_step

07:21.180 --> 07:22.980
is not an AgentFinish.

07:22.980 --> 07:24.690
So let's remove that.

07:24.690 --> 07:26.130
And that's pretty much it.

07:26.130 --> 07:30.420
We get to and execute line 96 once we finish this while loop.

07:30.420 --> 07:34.410
So the agent_step is definitely an AgentFinish.
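
NOTE
A self-contained sketch of the while loop described above, assuming scripted stand-ins instead of a real LLM.
AgentAction and AgentFinish are simplified dataclasses here, not the LangChain classes.
fake_agent, run_agent and the get_text_length tool are hypothetical names used only for illustration.
The sketch starts agent_step as None, which is one way to make it "empty".
```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union
#
@dataclass
class AgentAction:
    tool: str
    tool_input: str
#
@dataclass
class AgentFinish:
    output: str
#
def fake_agent(question: str, steps: List[Tuple[AgentAction, str]]) -> Union[AgentAction, AgentFinish]:
    """Scripted stand-in for prompt + LLM + output parser."""
    if not steps:
        # Nothing observed yet: pick a tool.
        return AgentAction(tool="get_text_length", tool_input=question)
    # An observation exists: answer with it.
    return AgentFinish(output=steps[-1][1])
#
def run_agent(question: str, tools: Dict[str, Callable[[str], str]]) -> str:
    intermediate_steps: List[Tuple[AgentAction, str]] = []
    agent_step: Union[AgentAction, AgentFinish, None] = None
    # Keep going as long as there is still something to do.
    while not isinstance(agent_step, AgentFinish):
        agent_step = fake_agent(question, intermediate_steps)
        if isinstance(agent_step, AgentAction):
            observation = tools[agent_step.tool](agent_step.tool_input)
            intermediate_steps.append((agent_step, observation))
    # Reaching this line means agent_step is an AgentFinish.
    return agent_step.output
#
tools = {"get_text_length": lambda text: str(len(text))}
print(run_agent("DOG", tools))
```
With a real agent, fake_agent would be replaced by the chain of prompt, LLM and output parser built earlier.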

07:34.410 --> 07:37.200
Anyway, let's go and format it a bit

07:37.200 --> 07:39.422
and let's now run it and test it

07:39.422 --> 07:42.303
and see that everything is working as expected.

07:43.920 --> 07:47.040
And essentially what we are running right now

07:47.040 --> 07:48.663
is the agent loop.

07:49.634 --> 07:54.450
So this is what happens when we initialize an agent

07:54.450 --> 07:56.520
with the LangChain built-in function.

07:56.520 --> 07:58.920
We give it a zero-shot ReAct description

07:58.920 --> 08:01.170
and this entire while loop,

08:01.170 --> 08:04.650
this is exactly what the agent executor is running.

08:04.650 --> 08:07.110
And if you want to support me in the course,

08:07.110 --> 08:08.580
I would appreciate very much

08:08.580 --> 08:10.620
if you can leave me a Udemy rating.

08:10.620 --> 08:12.630
This really motivates me to continue

08:12.630 --> 08:15.590
and create new videos for this course and enrich the content.

08:15.590 --> 08:17.476
And it also helps future students

08:17.476 --> 08:19.290
to decide whether this course

08:19.290 --> 08:21.150
is the right fit for them or not.

08:21.150 --> 08:24.060
So if you don't mind, I'd appreciate if you can pause,

08:24.060 --> 08:27.540
go to the rating section and leave me a Udemy review.

08:27.540 --> 08:29.940
Thank you so much and see you in the next video.
