WEBVTT

00:00.120 --> 00:01.530
-: Welcome, in this video we're gonna have a look at

00:01.530 --> 00:02.790
tool calling and agents.

00:02.790 --> 00:05.550
We'll understand what tool or function calling is,

00:05.550 --> 00:08.190
how it works, and also look at building agents.

00:08.190 --> 00:10.050
Tool calling, also known as function calling

00:10.050 --> 00:12.390
enables large language models to interact

00:12.390 --> 00:14.730
with external code or services.

00:14.730 --> 00:17.640
The large language model can decide when to call a function

00:17.640 --> 00:20.730
you define and providing structured arguments.

00:20.730 --> 00:22.200
Basically this bridges the gap

00:22.200 --> 00:23.460
between the language understanding

00:23.460 --> 00:25.290
and actionable code generation.

00:25.290 --> 00:26.640
As you can see on the right here,

00:26.640 --> 00:28.500
the LLM has access to some tools

00:28.500 --> 00:30.240
and it decides when to call it.

00:30.240 --> 00:32.760
There are seven steps in the tool calling flow.

00:32.760 --> 00:34.050
Firstly, you have to define

00:34.050 --> 00:36.990
what tools the large language model has access to.

00:36.990 --> 00:40.650
You often do this in a structured form called JSON schema.

00:40.650 --> 00:42.870
After that, you then send a user message

00:42.870 --> 00:45.810
or an assistant message to the chat model

00:45.810 --> 00:49.350
and with an input and also the tool definitions.

00:49.350 --> 00:51.690
The model will then determine if tools are needed

00:51.690 --> 00:53.760
to be used to answer the query.

00:53.760 --> 00:56.640
And if they are, the model will return the function name

00:56.640 --> 00:58.440
and the structured arguments

00:58.440 --> 01:01.200
that it wants you as the developer to run.

01:01.200 --> 01:03.570
You will then have to be responsible for executing

01:03.570 --> 01:06.120
and running the code and running the functions.

01:06.120 --> 01:09.840
Then you will return the output to the chat model.

01:09.840 --> 01:12.900
The chat model will then produce a final response

01:12.900 --> 01:15.030
and it will incorporate the function results

01:15.030 --> 01:16.680
into a coherent answer.

01:16.680 --> 01:17.550
If you're looking on the right,

01:17.550 --> 01:19.710
you can see the user input comes in here.

01:19.710 --> 01:22.230
You define your tools, you query the model.

01:22.230 --> 01:23.700
The model then analyzes it,

01:23.700 --> 01:25.230
and if there is no tools needed,

01:25.230 --> 01:27.090
it generates a direct response.

01:27.090 --> 01:30.570
Otherwise it will do a function call, execute the function,

01:30.570 --> 01:31.437
return the results to the model,

01:31.437 --> 01:33.750
and then it will generate a final response.

01:33.750 --> 01:36.270
So this is what it looks like when you generate your tools.

01:36.270 --> 01:39.300
You'll have the type of a function, you'll have the name of

01:39.300 --> 01:42.840
get weather, the description, the parameters,

01:42.840 --> 01:45.030
such as a latitude and a longitude,

01:45.030 --> 01:48.000
and also you'll say what parameters are required.

01:48.000 --> 01:49.380
Okay, so we've looked at tool calling.

01:49.380 --> 01:52.260
We are aware that we can equip a language model

01:52.260 --> 01:55.200
with tools to enhance its capabilities.

01:55.200 --> 01:56.370
What is an agent though?

01:56.370 --> 01:57.780
An agent is an autonomous system

01:57.780 --> 02:00.660
that uses an LLM to achieve goals through a cycle

02:00.660 --> 02:02.460
of reasoning about the current state,

02:02.460 --> 02:04.140
planning appropriate actions,

02:04.140 --> 02:06.270
executing those actions via tools,

02:06.270 --> 02:09.120
observing those results and updating and understanding.

02:09.120 --> 02:11.880
Common components you'll find in an agent include tools,

02:11.880 --> 02:13.620
memory, planning.

02:13.620 --> 02:16.080
There's also this concept called the agentic loop,

02:16.080 --> 02:18.360
which is basically a big wild true loop

02:18.360 --> 02:20.640
with a series of things that happen inside of that.

02:20.640 --> 02:23.160
So at the start of that, it will do some reasoning.

02:23.160 --> 02:24.990
It might decide to use some tools,

02:24.990 --> 02:26.370
it will execute those tools,

02:26.370 --> 02:28.380
integrate those into the chat history.

02:28.380 --> 02:31.200
Then it will decide to continue in a loop if it needs to.

02:31.200 --> 02:34.380
Otherwise it will finish with a final answer.

02:34.380 --> 02:36.690
Let's have a look at how that would look in Python.

02:36.690 --> 02:39.390
We start with a Python list of messages,

02:39.390 --> 02:41.640
check the weather in Paris and Berlin.

02:41.640 --> 02:45.270
We enter our agentic loop, we get a model response.

02:45.270 --> 02:48.300
Then we check for all of the responses to outputs

02:48.300 --> 02:50.040
for a tool call.

02:50.040 --> 02:53.610
And if there is a tool call, we execute that function

02:53.610 --> 02:57.300
and else we just respond with an output text and we break.

02:57.300 --> 03:00.450
So when there isn't a function call to happen,

03:00.450 --> 03:02.793
we basically break out of the agentic loop.

03:04.290 --> 03:06.060
There's other advanced agent patterns.

03:06.060 --> 03:09.390
So for example, defining a custom successful criteria,

03:09.390 --> 03:12.090
continuing the loop until the objective is achieved.

03:12.090 --> 03:13.680
An example might be search the whether

03:13.680 --> 03:15.180
in five different cities.

03:15.180 --> 03:17.073
Also, there's key implementation patterns,

03:17.073 --> 03:18.480
so parallel tools.

03:18.480 --> 03:20.460
So tools can be executed in parallel.

03:20.460 --> 03:22.170
You also have the option to choose

03:22.170 --> 03:25.530
does the agent or LLM automatically choose the tools?

03:25.530 --> 03:26.550
Is there a tool required

03:26.550 --> 03:28.860
or a forced selection of certain tools?

03:28.860 --> 03:30.870
Strict mode, ensure function arguments

03:30.870 --> 03:33.210
match the exact schema correctly.

03:33.210 --> 03:34.410
And also you have streaming.

03:34.410 --> 03:36.630
So providing real time progress updates

03:36.630 --> 03:37.860
during function calls.

03:37.860 --> 03:39.600
Some best practices include writing

03:39.600 --> 03:42.570
clear function descriptions and parameter documentation.

03:42.570 --> 03:45.000
Keep the function scheme as simple and intuitive.

03:45.000 --> 03:46.710
Apply software engineering principles,

03:46.710 --> 03:49.470
so least surprise and no invalid states

03:49.470 --> 03:51.840
and prefer fewer, more powerful functions

03:51.840 --> 03:53.700
over many specialized ones.

03:53.700 --> 03:55.050
So the next thing that we're gonna have a look at

03:55.050 --> 03:57.600
is let's get some hands on and practice learning

03:57.600 --> 03:58.830
how to do tool calling.

03:58.830 --> 04:00.270
And then we'll be following that up

04:00.270 --> 04:02.040
by building agentic loops.

04:02.040 --> 04:04.860
I want you to go back into your GitHub repository

04:04.860 --> 04:07.650
and go to the openai features and functionality folder.

04:07.650 --> 04:11.460
Load in and start this notebook called tool calling.

04:11.460 --> 04:12.360
And we're gonna go through

04:12.360 --> 04:13.890
and write a couple of examples in here.

04:13.890 --> 04:15.510
You can either run the code

04:15.510 --> 04:17.550
or feel free to follow along

04:17.550 --> 04:19.320
as we code this up from scratch.

04:19.320 --> 04:20.550
Okay, so the first thing we're gonna do

04:20.550 --> 04:23.340
is we're gonna install OpenAI and Pydantic.

04:23.340 --> 04:25.140
Those are just some packages.

04:25.140 --> 04:27.870
After that, what you're gonna do is you're gonna import

04:27.870 --> 04:30.030
OpenAI from OpenAI,

04:30.030 --> 04:31.590
import JSON and requests,

04:31.590 --> 04:34.530
and we're gonna set the model to GPT-4.1 mini.

04:34.530 --> 04:36.960
You're also going to need to update your API key,

04:36.960 --> 04:39.120
so feel free to do that and then continue.

04:39.120 --> 04:40.800
Now we have a get weather function,

04:40.800 --> 04:45.030
which makes a get request to this open mateo forecast

04:45.030 --> 04:48.720
and it takes two arguments, latitude and longitude.

04:48.720 --> 04:50.370
We are then gonna build this up step by step

04:50.370 --> 04:52.710
to figure out exactly how to do tool calling.

04:52.710 --> 04:55.800
So firstly we have our tools Python list,

04:55.800 --> 04:59.670
and we need to add on all of the tools in here.

04:59.670 --> 05:00.660
We are gonna have one tool

05:00.660 --> 05:02.913
and we're gonna have to type the JSON schema.

05:04.080 --> 05:06.240
So the first thing that we need to do

05:06.240 --> 05:08.640
is we need to create a dictionary.

05:08.640 --> 05:13.640
And in that we're gonna have type is function.

05:16.440 --> 05:19.260
Then we're gonna set the name of our tool

05:19.260 --> 05:21.810
is equal to get_weather.

05:21.810 --> 05:22.770
Then once we've done that,

05:22.770 --> 05:24.900
we have a description of what that tool is.

05:24.900 --> 05:28.260
So this tool will allow you

05:28.260 --> 05:30.123
to easily get the weather.

05:32.550 --> 05:35.310
Then we need to set the parameters of our function.

05:35.310 --> 05:37.140
So we'll say parameters

05:37.140 --> 05:40.990
and we'll do a dictionary type object

05:43.020 --> 05:45.453
and we'll add a comma there, type object.

05:46.500 --> 05:48.520
And then we have properties

05:49.410 --> 05:51.480
and we have an object again,

05:51.480 --> 05:56.480
latitude, which is a type of number

05:57.180 --> 06:02.180
and has a description of latitude of the location.

06:03.390 --> 06:06.300
And we have a second argument or property,

06:06.300 --> 06:08.103
which is the longitude,

06:09.240 --> 06:13.533
which is type of number.

06:15.000 --> 06:19.413
Description is gonna be longitude

06:20.430 --> 06:21.843
of the location.

06:22.800 --> 06:27.090
And then that's technically enough at this point in time.

06:27.090 --> 06:28.710
You can also add on things

06:28.710 --> 06:30.870
like if you require certain parameters.

06:30.870 --> 06:33.180
So you can say both of these,

06:33.180 --> 06:37.710
the latitude and longitude are required parameters.

06:37.710 --> 06:39.510
We then set up our chat messages.

06:39.510 --> 06:41.220
So we've got our input messages

06:41.220 --> 06:42.420
and we're gonna go and fill that out.

06:42.420 --> 06:47.420
So we need to have a role of user.

06:47.910 --> 06:49.800
And the content that we're gonna have here

06:49.800 --> 06:53.673
is gonna be what is the weather like in London?

06:54.660 --> 06:55.493
Okay, great.

06:55.493 --> 06:56.940
So we have our input messages.

06:56.940 --> 06:59.670
The next thing we're gonna need to do is call the API.

06:59.670 --> 07:01.650
So we're gonna do response is equal to

07:01.650 --> 07:04.590
client.responses.create,

07:04.590 --> 07:05.910
and then we're gonna pass in

07:05.910 --> 07:08.320
our model is equal to model

07:11.010 --> 07:12.510
and that was defined earlier.

07:12.510 --> 07:16.980
We're also going to put the input is equal to input messages

07:16.980 --> 07:19.560
and we'll put the tools is equal to tools.

07:19.560 --> 07:21.810
And basically at this point when you have this tool

07:21.810 --> 07:22.830
is equal to tools

07:22.830 --> 07:25.680
that's providing this large language model,

07:25.680 --> 07:27.993
the ability to know which tools it can use.

07:29.520 --> 07:30.779
The next thing we're gonna need to do

07:30.779 --> 07:32.370
is we're gonna need to print the output.

07:32.370 --> 07:37.370
So we will do print get weather response output,

07:39.180 --> 07:41.190
and I'll do response to output.

07:41.190 --> 07:43.950
Now if we just run this, now, if we just run this,

07:43.950 --> 07:48.400
you'll see that basically the response is being generated

07:49.546 --> 07:51.540
and you'll see that it's actually got the output

07:51.540 --> 07:53.760
and it has this response function tool call

07:53.760 --> 07:55.440
and it has the arguments.

07:55.440 --> 07:57.180
So it's actually told us exactly

07:57.180 --> 07:59.010
what function it wants to execute.

07:59.010 --> 08:01.020
So notice it's got a call id,

08:01.020 --> 08:03.630
it's got the name, the type function call,

08:03.630 --> 08:05.340
and it has an ID as well here.

08:05.340 --> 08:07.530
So this ID for the function call ID

08:07.530 --> 08:09.150
and a status of completed.

08:09.150 --> 08:10.410
So the first thing we need to do

08:10.410 --> 08:13.170
is we need to tell the language model,

08:13.170 --> 08:15.543
let's go and get the tool call out from this.

08:16.590 --> 08:20.863
So tool call is equal to response.output [0]

08:24.450 --> 08:28.870
and we will say the args of this is equal to json.loads

08:29.785 --> 08:31.713
tool_call.arguments.

08:33.150 --> 08:34.230
And then we get the results,

08:34.230 --> 08:37.270
we'll say where the result is equal to get_weather

08:39.312 --> 08:41.740
and we'll do args latitude

08:43.183 --> 08:46.173
and args longitude.

08:47.400 --> 08:49.350
And then what we need to do is we need to add

08:49.350 --> 08:52.620
the function messages back into the tool.

08:52.620 --> 08:57.423
So we do input_messages.append tool_call.

08:58.290 --> 09:00.990
So that's the original message that it told us to invoke.

09:00.990 --> 09:04.530
And we also then will put in the fact that we've

09:04.530 --> 09:05.670
invoked that function.

09:05.670 --> 09:07.410
So the first one is to tell it yes,

09:07.410 --> 09:08.520
we're adding on the tool call.

09:08.520 --> 09:10.650
The LLM wants to make this tool call.

09:10.650 --> 09:14.250
The second message append is to tell the LLM,

09:14.250 --> 09:16.590
hey, we actually executed this tool call

09:16.590 --> 09:18.630
and here are the results, right?

09:18.630 --> 09:19.710
That is really important.

09:19.710 --> 09:24.630
So then we're doing a type of function call output.

09:24.630 --> 09:27.030
And then after that we're doing a call id.

09:27.030 --> 09:30.003
And you'll see here we're doing the tool_call.call_id.

09:31.650 --> 09:33.310
And then we do the output

09:34.170 --> 09:38.070
and we're just gonna string json.dumps

09:38.070 --> 09:39.713
the weather_result.

09:41.250 --> 09:43.650
Okay, so now that we've added that extra message history on,

09:43.650 --> 09:45.810
the only thing we also need to do now

09:45.810 --> 09:48.090
is we just need to make a new response to that.

09:48.090 --> 09:52.080
So we'll do client.responses.create,

09:52.080 --> 09:53.970
model is equal to model,

09:53.970 --> 09:56.733
input is equal to input messages.

09:57.600 --> 10:00.003
And we'll also do tools is equal to tools.

10:01.080 --> 10:03.330
And then we can get the response to output from this again.

10:03.330 --> 10:05.413
So we'll do response.output_text.

10:07.500 --> 10:09.300
And let's see what this brings back.

10:10.170 --> 10:11.250
So you've got our weather output

10:11.250 --> 10:13.230
and it said the current temperature in London

10:13.230 --> 10:15.720
is approximately 5.1 degrees Celsius.

10:15.720 --> 10:20.403
So to recap, we created a get weather function,

10:21.390 --> 10:22.950
we created our tool

10:22.950 --> 10:25.320
and we made that using JSON schema,

10:25.320 --> 10:28.380
we set up a input message which had a query

10:28.380 --> 10:31.590
which would naturally invoke one of our tools.

10:31.590 --> 10:34.800
We then added that into the client.responses.create.

10:34.800 --> 10:37.770
Noticing we're adding this tools parameter,

10:37.770 --> 10:41.610
which tells the model that it can use our get weather tool.

10:41.610 --> 10:43.470
We then looked at the output of that

10:43.470 --> 10:47.070
and we saw that it had this response function call tool,

10:47.070 --> 10:49.620
which showed the arguments, the call ID

10:49.620 --> 10:51.450
and the name of the tool.

10:51.450 --> 10:53.010
We then get the output of that,

10:53.010 --> 10:55.770
we load the arguments in from a string,

10:55.770 --> 10:57.570
and then we invoke our tool.

10:57.570 --> 11:00.390
After invoking our tool, we then tell the model,

11:00.390 --> 11:03.150
hey, we know you had that tool call that you wanted to make

11:03.150 --> 11:06.030
and here's the result of that tool call.

11:06.030 --> 11:08.010
Then we then call the model again,

11:08.010 --> 11:10.890
providing it with a model, the input messages and the tools.

11:10.890 --> 11:13.830
And then the model then uses the tool result to tell us,

11:13.830 --> 11:15.930
okay, great, yeah, here's the temperature.

11:16.920 --> 11:18.180
Alright, let's step it up a notch.

11:18.180 --> 11:21.510
So OpenAI also provides prebuilt tools.

11:21.510 --> 11:25.590
So we're gonna use this tools type web search preview,

11:25.590 --> 11:27.007
and we have an input message here,

11:27.007 --> 11:29.700
"What was a positive news story from today?"

11:29.700 --> 11:30.960
And we can just again,

11:30.960 --> 11:34.080
provide the tools to the language model

11:34.080 --> 11:36.570
and provide the input, which is a role of user.

11:36.570 --> 11:39.000
And the content is this input message.

11:39.000 --> 11:41.040
We're also going to, you know,

11:41.040 --> 11:44.010
optionally you could show more specific information

11:44.010 --> 11:46.230
if you want to go and have a look at, you know,

11:46.230 --> 11:48.090
a country or a city or region.

11:48.090 --> 11:48.990
And you can have a look at here.

11:48.990 --> 11:50.010
So when we run these,

11:50.010 --> 11:52.470
it will do a live web search tool call

11:52.470 --> 11:54.960
and it will return the results directly to us.

11:54.960 --> 11:57.630
So it's actually searching the web in real time

11:57.630 --> 11:59.580
using one of OpenAI's prebuilt tools.

11:59.580 --> 12:01.830
And you can see these are the different types of results

12:01.830 --> 12:05.010
it searched and basically this allows you to build

12:05.010 --> 12:07.020
easy web search functionality

12:07.020 --> 12:09.840
directly into your large language model applications.

12:09.840 --> 12:12.030
It's also possible to do streaming with tool calls.

12:12.030 --> 12:14.730
So just like our get weather example you can add on

12:14.730 --> 12:16.860
the stream is equal to true parameter

12:16.860 --> 12:19.920
and you'll get back a lots of different results.

12:19.920 --> 12:22.140
So the response created event,

12:22.140 --> 12:23.850
the response in progress event,

12:23.850 --> 12:26.070
and an output item added event.

12:26.070 --> 12:27.750
And then you'll also, the most important one

12:27.750 --> 12:29.640
is this response completed event

12:29.640 --> 12:31.560
that shows you the entire response.

12:31.560 --> 12:33.660
But you can build this up in real time

12:33.660 --> 12:35.310
with the Delta events.

12:35.310 --> 12:37.620
So just bear in mind that you can also use streaming

12:37.620 --> 12:40.110
if we're doing tool calling so that you don't have to wait

12:40.110 --> 12:41.940
for the entire tool result

12:41.940 --> 12:44.010
to update your users in real time.

12:44.010 --> 12:45.960
Cool, so we looked at how to build tools,

12:45.960 --> 12:49.440
how to define those functions, and also handling responses.

12:49.440 --> 12:51.990
And we had a brief look at the web search tool,

12:51.990 --> 12:54.300
which is a built-in tool by OpenAI.

12:54.300 --> 12:55.410
And also we looked at the fact

12:55.410 --> 12:57.630
that you can naturally use streaming

12:57.630 --> 13:00.750
when you are invoking tools so that you can respond to users

13:00.750 --> 13:01.800
in real time.

13:01.800 --> 13:03.030
In the next video we'll have a look at

13:03.030 --> 13:04.830
how you can build agents from scratch.

13:04.830 --> 13:06.330
Cool, see you in the next one.
