WEBVTT

00:00.160 --> 00:03.200
And this may be more theory than you thought you were signing up for.

00:03.200 --> 00:04.840
But this is going to be really useful.

00:04.840 --> 00:09.720
This is going to give you some valuable intuition when we actually get to building agent systems.

00:09.760 --> 00:10.040
Okay.

00:10.080 --> 00:12.520
The third one is a simple one: chaining LLMs.

00:12.720 --> 00:18.760
Look, you probably know this one already, but you can write a complicated prompt to an LLM, like

00:18.760 --> 00:20.680
come up with a puzzle and then solve it.

00:20.680 --> 00:25.080
And sometimes it's a good thing to do because it gives the LLM a lot of flexibility.

00:25.120 --> 00:26.880
We like to use the word autonomy.

00:26.920 --> 00:31.680
It gives the LLM the ability to choose to go in different directions, to come up with two puzzles and

00:31.680 --> 00:33.160
solve them both if it wants to.

00:33.360 --> 00:39.160
You know, you're giving it quite a broad remit, but sometimes you want it to always have one puzzle,

00:39.160 --> 00:40.200
one solution.

00:40.200 --> 00:44.680
You want to be very careful to make sure it's a hard puzzle, and you want to frame that well, in which

00:44.680 --> 00:48.480
case you could divide this into two separate LLM calls.

00:48.480 --> 00:53.360
The first LLM call is just "come up with a puzzle", and you'd maybe refine that prompt.

00:53.400 --> 00:57.160
You'd add in some details, some specifics.

00:57.160 --> 00:59.600
So it gets really good at coming up with a puzzle.

00:59.760 --> 01:05.000
And then when it responds with the puzzle, you can build a second prompt to an LLM and just

01:05.000 --> 01:07.400
include that puzzle in your second prompt.

01:07.400 --> 01:09.560
You can say, hey, I'd like you to solve a puzzle.

01:09.560 --> 01:10.400
Here is the puzzle.

01:10.400 --> 01:11.880
And then you put the puzzle in there.

01:12.160 --> 01:15.640
Now when you do that, we would draw it graphically like this.

01:15.640 --> 01:20.200
We would show it as if it's like one LLM call calling another LLM call.

01:20.200 --> 01:24.960
But of course, what's really going on is we're just making an LLM call, taking the output and using

01:24.960 --> 01:26.480
that in our next prompt.

01:26.480 --> 01:30.520
And so that's how we would show it: as these two different workflow steps.

01:30.760 --> 01:32.920
And that's what we would call chaining LLMs.

01:32.960 --> 01:39.160
It's kind of obvious, but you use it a lot to get more control over your separate calls and to test

01:39.160 --> 01:40.240
them individually.
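
NOTE
Editor's sketch of the chaining idea described above, under stated assumptions.
The llm argument is a hypothetical stand-in for whatever client you use, and
fake_llm returns canned text so the example runs offline. The point it shows
is that our code, not the model, carries the first output into the second prompt.
```python
from typing import Callable
# Hypothetical stand-in for a real LLM client: takes a prompt, returns text.
LLM = Callable[[str], str]
def chain_puzzle(llm: LLM) -> tuple[str, str]:
    """Call the model twice: once to write a puzzle, once to solve it."""
    # Step 1: a narrow prompt that only asks for a puzzle.
    puzzle = llm("Come up with one hard logic puzzle. Reply with the puzzle only.")
    # Step 2: plain string formatting pastes the first output into the next prompt.
    solution = llm(f"I'd like you to solve a puzzle. Here is the puzzle:\n{puzzle}")
    return puzzle, solution
def fake_llm(prompt: str) -> str:
    """Canned replies (an assumption for illustration); swap in a real client call."""
    if prompt.startswith("Come up with"):
        return "What has keys but cannot open locks?"
    return "A piano."
puzzle, solution = chain_puzzle(fake_llm)
print(puzzle)
print(solution)
```
Because each step is its own call, each prompt can be refined and tested separately.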

01:40.280 --> 01:40.800
Okay.

01:40.840 --> 01:44.560
And the fourth one, everyone's favorite, is tools.

01:44.600 --> 01:47.760
Tools is something that feels super magical.

01:47.760 --> 01:50.040
And it is in fact really mundane.

01:50.200 --> 01:55.560
And again, this may be one of those things that you know, but there's no harm in emphasizing

01:55.560 --> 01:56.000
this.

01:56.040 --> 02:02.720
Let's suppose we have an LLM setup where the user asks a question like, what's the stock price of Google?

02:03.000 --> 02:06.320
And we want to call an LLM with that.

02:06.320 --> 02:12.160
So we've got some code or some software or a product like n8n, and it's sitting there.

02:12.200 --> 02:18.880
It takes that user question and it's going to make a call to an AI, to a large language model to get

02:18.880 --> 02:19.240
that.

02:19.280 --> 02:22.040
It's going to make a call to an LLM.

02:22.320 --> 02:26.480
And it's an LLM that has been equipped with tools, as we would say.

02:26.480 --> 02:28.560
It's got these extra powers.

02:28.560 --> 02:35.880
It's been told: you have the ability to look up stock prices using this product, this product that we

02:35.880 --> 02:37.880
saw last time called Market Stack.

02:38.160 --> 02:40.920
And it's able to make an API request.

02:40.960 --> 02:47.160
It's able to connect to the internet, make an API request and find out Google's stock price.

02:47.160 --> 02:53.080
And then in its response back to us, it says, hey, the price of Google stock is whatever.

02:53.280 --> 02:55.640
And it seems magical.

02:55.640 --> 03:01.200
And we know that LLMs are responsible for taking an input and generating output tokens based on pattern

03:01.200 --> 03:01.960
matching.

03:01.960 --> 03:07.960
So at what point in all of that did this LLM suddenly have the ability to say, oh, I'm not going

03:08.000 --> 03:09.080
to generate tokens.

03:09.080 --> 03:12.000
Instead of that, I'm going to connect to an API.

03:12.120 --> 03:13.600
I'm going to connect to the internet.

03:13.600 --> 03:14.760
Go and do something.

03:14.800 --> 03:15.720
Get back the results.

03:15.720 --> 03:17.880
And that's what I'm going to put in my output tokens.

03:17.960 --> 03:18.880
How did that happen?

03:18.880 --> 03:21.440
How did we equip an LLM with these tools?

03:21.440 --> 03:23.920
And of course you already know the answer to this,

03:23.920 --> 03:26.280
I expect. It is more mundane.

03:26.280 --> 03:31.480
It's not that GPT has this power to stop generating tokens and connect to the internet instead.

03:31.600 --> 03:36.240
No, no, what really happens is very much simpler. The

03:36.280 --> 03:43.880
n8n code, when it prompts the LLM, when it's building this message that's going to GPT that it's going

03:43.880 --> 03:44.960
to try and complete.

03:45.160 --> 03:48.560
It doesn't just say what's the stock price of Google.

03:48.560 --> 03:51.800
What it says is, look, you can do two things.

03:51.800 --> 03:58.120
You can either respond with an answer, tell me the answer to my question, or if you wish, you can

03:58.120 --> 04:05.800
respond and say, I need you to run the following tool on my behalf, and then I will call you

04:05.800 --> 04:08.720
a second time with the result of doing that.

04:09.200 --> 04:12.050
The tool you can call is looking up stock prices.

04:12.210 --> 04:16.450
The question from the user is what's the stock price of Google?

04:16.690 --> 04:20.450
And the LLM will get that whole input sequence.

04:20.450 --> 04:23.090
We've carefully packaged this input sequence.

04:23.090 --> 04:30.250
And the output it will give will say: I want to find the stock price of Google, please use the tool.

04:30.370 --> 04:35.810
And then you call it a second time with all of that and with the tool result, and it responds with the price.

04:35.810 --> 04:38.330
So it's just a clever trick.

04:38.370 --> 04:46.010
It's a conjuring trick. By calling an LLM and by massaging the input to explain the tools that it can

04:46.010 --> 04:51.330
use, and getting that output and interpreting it, and then doing stuff and calling it a second time,

04:51.330 --> 04:57.970
we're able to give this kind of illusion that an LLM is able to run tools, but it all comes down to

04:58.010 --> 04:59.090
clever prompting.

04:59.090 --> 05:02.610
Like so much with LLMs, it all comes down to clever prompting.
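
NOTE
Editor's sketch of the two-call trick described above. Everything named here
is an assumption for illustration: the JSON reply format, the get_stock_price
helper with its made-up price, and fake_llm, which stands in for a real model
so the example runs offline. The model only ever produces text; our code reads
that text, runs the lookup, and calls the model a second time with the result.
```python
import json
from typing import Callable
def get_stock_price(ticker: str) -> float:
    """Hypothetical lookup with a made-up price; a real one would call a stock API."""
    return {"GOOG": 100.0}[ticker]
# The instructions that "equip" the model with a tool are just more prompt text.
SYSTEM = (
    "You can either answer the question directly, or reply with JSON like "
    '{"tool": "get_stock_price", "ticker": "..."} '
    "and I will run the tool and call you again with the result."
)
def answer(llm: Callable[[str], str], question: str) -> str:
    prompt = f"{SYSTEM}\nQuestion: {question}"
    reply = llm(prompt)  # first call
    try:
        request = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # plain answer, no tool requested
    if not isinstance(request, dict) or request.get("tool") != "get_stock_price":
        return reply
    price = get_stock_price(request["ticker"])  # our code does the lookup
    # Second call: the same prompt plus the tool request and its result.
    return llm(f"{prompt}\nTool request: {reply}\nTool result: {price}")
def fake_llm(prompt: str) -> str:
    """Scripted model: asks for the tool first, then reports the result it was given."""
    if "Tool result:" in prompt:
        return "The price of Google stock is $" + prompt.rsplit("Tool result: ", 1)[1]
    return '{"tool": "get_stock_price", "ticker": "GOOG"}'
print(answer(fake_llm, "What's the stock price of Google?"))
```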

05:02.810 --> 05:05.170
The use of tools is a trick.

05:05.170 --> 05:11.850
And just in case you don't believe me, I always love showing this screenshot here, which I did in

05:12.090 --> 05:15.690
ChatGPT, and you can do it too. Read this prompt.

05:15.690 --> 05:17.490
I just wrote this in ChatGPT.

05:17.850 --> 05:20.170
You are a support agent for an airline.

05:20.170 --> 05:23.610
You answer user questions. Just respond

05:23.650 --> 05:30.050
only "use tool to fetch ticket price for London" to retrieve the ticket price for London, or for any city

05:30.050 --> 05:31.410
that you choose to name.

05:31.570 --> 05:33.250
Here's the user's question.

05:33.250 --> 05:34.370
I'd like to go to Paris.

05:34.370 --> 05:38.130
How much is a flight? And look at what ChatGPT responded to me.

05:38.170 --> 05:41.290
It responded: "use tool to fetch ticket price for Paris".

05:41.490 --> 05:48.850
It just responds with that, and hopefully that crystallizes for you this trick that is tool calling,

05:48.850 --> 05:50.370
right here in ChatGPT.

05:50.410 --> 05:54.090
And you can go and do it yourself to get that firsthand experience.
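
NOTE
Editor's sketch of the other half of the airline example: what calling code
might do with a reply like the one ChatGPT gave. The fare table and its prices
are invented for illustration, and the regular expression is only one
assumed way to spot the agreed phrase in the model's text.
```python
import re
# Hypothetical fare table standing in for a real pricing lookup.
TICKET_PRICES = {"london": 799, "paris": 899}
# Matches the phrase from the prompt and captures the city name.
TOOL_PATTERN = re.compile(r"use tool to fetch ticket price for (\w[\w ]*)", re.IGNORECASE)
def handle_reply(reply: str) -> str:
    """Run the tool if the reply asks for it; otherwise pass the reply through."""
    match = TOOL_PATTERN.search(reply)
    if match is None:
        return reply  # an ordinary answer, nothing to run
    city = match.group(1).strip().lower()
    price = TICKET_PRICES.get(city)
    if price is None:
        return f"No price found for {city}"
    return f"Ticket price for {city}: ${price}"
print(handle_reply("Use tool to fetch ticket price for Paris"))
```
In a full setup, the string returned here would be added to the next prompt rather than printed.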

05:54.090 --> 05:57.690
And then the fifth trick is this idea of an agent loop.

05:57.850 --> 06:06.050
This is this concept that you could have an LLM that is repeatedly called and is able to run tools continuously

06:06.050 --> 06:09.930
until it's finished, and it's achieved its goal and it's done.

06:10.090 --> 06:16.010
And in doing so you get this behavior where it appears to go off and do some work to achieve

06:16.010 --> 06:16.570
a goal.

06:16.930 --> 06:18.490
So here's an example.

06:18.490 --> 06:20.930
Supposing we had this prompt to an LLM.

06:21.050 --> 06:24.090
Your task is to find the current value of my portfolio.

06:24.330 --> 06:27.730
You are able to, number one, retrieve my portfolio.

06:27.890 --> 06:31.850
And number two, you can look up a share price, just by responding:

06:31.890 --> 06:33.850
Use my tool to retrieve portfolio.

06:34.050 --> 06:36.530
Use my tool to look up a share price.

06:36.810 --> 06:39.290
So this is given to it.

06:39.290 --> 06:41.050
And how does it complete?

06:41.050 --> 06:45.250
What are the tokens that it will put as the most likely next tokens?

06:45.290 --> 06:50.130
Well, if it's given this kind of prompt, the most likely thing it would want to do, given that its

06:50.130 --> 06:55.730
task is to find the current value, it's going to say, okay, I'm going to use tool to retrieve portfolio.

06:56.010 --> 06:56.370
Fine.

06:56.410 --> 06:57.450
Makes total sense.

06:57.690 --> 07:03.450
And now what we do is we take that completion and we put it into the prompt.

07:03.490 --> 07:05.330
We put the prompt, the same prompt as before.

07:05.450 --> 07:08.010
We add in use tool to retrieve portfolio.

07:08.010 --> 07:11.290
And then we add in the results of actually doing that.

07:11.330 --> 07:12.330
We go off and do it.

07:12.330 --> 07:13.610
We retrieve the portfolio.

07:13.610 --> 07:17.050
And let's say that the answer is three shares of Google stock.

07:17.170 --> 07:19.130
So we just put that in there in the prompt.

07:19.130 --> 07:20.770
And this is the second prompt

07:20.770 --> 07:22.810
we send the LLM. In a loop,

07:22.810 --> 07:23.690
we just keep going.

07:23.730 --> 07:26.570
We send this and it then says, okay,

07:26.610 --> 07:31.610
the most likely way to complete this input is: use tool to look up share price, Google.

07:31.890 --> 07:34.050
And we say fine, we take that.

07:34.050 --> 07:39.450
We go and look up the Google share price and we come up with a new prompt which has use tool to retrieve

07:39.490 --> 07:40.130
portfolio.

07:40.170 --> 07:45.610
Three shares of Google. Use tool to look up share price of Google. Share price is $100, or whatever.

07:46.010 --> 07:48.530
And we shove all of that in the prompt.

07:48.530 --> 07:53.690
And this entire thing that's written here goes in one prompt to the LLM.

07:53.930 --> 07:55.410
Every prompt is stateless.

07:55.410 --> 07:57.330
It has the full conversation so far.

07:57.730 --> 07:59.650
And now we continue our loop.

07:59.650 --> 08:00.730
We call it again.

08:00.930 --> 08:02.290
This time it responds:

08:02.290 --> 08:05.210
The value of your portfolio is $300.

08:05.250 --> 08:06.770
That's its final response.

08:06.770 --> 08:08.410
It doesn't call any tools.

08:08.570 --> 08:09.530
It's done.

08:09.530 --> 08:10.850
And there you have it.

08:10.970 --> 08:18.850
That is calling an LLM in a loop with tools to achieve a goal. That is agentic AI.
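
NOTE
Editor's sketch of the portfolio loop walked through above. The tool
functions, the exact reply phrases, the max_steps limit and fake_llm are all
assumptions for illustration; fake_llm is a scripted stand-in so the example
runs offline. Each call is stateless, so the full history is sent every time,
and the loop ends when the reply is not a tool request.
```python
from typing import Callable
def retrieve_portfolio() -> str:
    """Hypothetical tool; a real one would query a brokerage account."""
    return "3 shares of Google"
def look_up_share_price(name: str) -> str:
    """Hypothetical tool with a made-up price; a real one would call a price API."""
    return f"{name} share price is $100"
PORTFOLIO = "Use tool to retrieve portfolio"
PRICE = "Use tool to look up share price: "
TASK = (
    "Your task is to find the current value of my portfolio.\n"
    f"You can respond '{PORTFOLIO}' or '{PRICE}<name>'."
)
def run_agent(llm: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = TASK
    for _ in range(max_steps):
        reply = llm(transcript).strip()  # the whole conversation so far goes in each call
        if reply == PORTFOLIO:
            result = retrieve_portfolio()
        elif reply.startswith(PRICE):
            result = look_up_share_price(reply[len(PRICE):])
        else:
            return reply  # no tool requested, so this is the final answer
        # Append the tool request and its result, then go round the loop again.
        transcript += f"\n{reply}\n{result}"
    raise RuntimeError("agent did not finish within max_steps")
def fake_llm(prompt: str) -> str:
    """Scripted model that picks the next step from what the prompt already contains."""
    if "3 shares of Google" not in prompt:
        return PORTFOLIO
    if "share price is $100" not in prompt:
        return PRICE + "Google"
    return "The value of your portfolio is $300."
print(run_agent(fake_llm))
```
The max_steps limit is a safety choice in this sketch so a model that never stops asking for tools cannot loop forever.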