WEBVTT

00:00.160 --> 00:00.840
Hey there.

00:00.880 --> 00:01.520
Eden here.

00:01.520 --> 00:08.080
And in this video we're going to introduce the concept of function calling aka tool calling.

00:08.080 --> 00:10.520
And this is strictly a theoretical video.

00:10.560 --> 00:13.800
We're going to be doing some hands on very very soon.

00:14.000 --> 00:16.680
So I just want to introduce the concepts.

00:17.240 --> 00:26.160
So function calling or tool calling refers to the model's ability to produce a structured function call

00:26.200 --> 00:29.520
to an external function with its arguments.

00:29.680 --> 00:36.960
So instead of just generating plain text it's going to generate a well structured answer which is very

00:36.960 --> 00:37.920
easy to parse.

00:38.360 --> 00:44.360
And it's going to appear in a very special place in the response from the LM instead of the generation

00:44.360 --> 00:45.040
content.

00:45.080 --> 00:50.520
Now, it's important to note that function calling is a capability of certain llms.

00:50.560 --> 00:53.480
Not all llms support function calling.

00:53.640 --> 01:00.000
However, these days this is quite a standard of the state of the art models, so you can pretty much

01:00.040 --> 01:06.190
assume that all the big vendors like OpenAI, like anthropic, like Google, when they release a state

01:06.190 --> 01:13.110
of the art model, it's going to support function calling and function calling or tool calling was introduced

01:13.110 --> 01:22.870
back in 2023 by OpenAI, and developers needed to simply provide with the model a list of function definitions,

01:23.070 --> 01:30.070
which includes the names, the parameters, and the descriptions of the functions, and the large language

01:30.070 --> 01:37.630
model can choose to respond by outputting a JSON object, which is going to specify which function to

01:37.670 --> 01:41.630
call and with what arguments if it needs to.

01:42.070 --> 01:49.310
So behind the scenes, this is a model which is fine tuned to detect when a function should be invoked

01:49.310 --> 01:57.670
based on the user's request, and then formats its response as a valid JSON adhering to the function's

01:57.670 --> 01:58.230
schema.

01:59.030 --> 02:07.210
So, for example, if a user is going to ask the LM, what's the weather in Paris to an LM where we

02:07.250 --> 02:10.210
binded the function of get weather?

02:10.530 --> 02:17.530
Then the LM is going to respond with a JSON which has the following information.

02:17.570 --> 02:19.970
The name is going to be get current weather.

02:20.210 --> 02:26.210
The arguments are going to be with the location of Paris and the unit Fahrenheit or Celsius.

02:26.810 --> 02:34.450
And our application could then take this JSON, parse it, and actually go and execute this get current

02:34.450 --> 02:38.370
weather function which should be existing in our application.

02:38.930 --> 02:45.850
And we can then take this response, plug it in back to the LM and continue and so on.

02:46.210 --> 02:53.290
And what was the actual motivation for the LM vendors to implement into the models function?

02:53.290 --> 02:59.290
Calling capabilities is actually the react prompt, which we already saw in the course.

02:59.490 --> 03:03.880
So you might notice that this is not so reliable.

03:03.880 --> 03:05.000
This react prompt.

03:05.160 --> 03:10.120
It sometimes outputs us some bad output, which is hard to parse.

03:10.120 --> 03:16.680
And then our program fails and we can have a lot of problems using this prompt, even though it's very,

03:16.680 --> 03:19.360
very cool, it's not that reliable.

03:19.400 --> 03:24.040
However, function calling is more reliable and more deterministic.

03:24.080 --> 03:27.480
All the heavy lifting is done by the LM vendor.

03:27.480 --> 03:35.880
So all the reasoning and what we get back is a very parsable JSON object, which is very, very easy

03:35.880 --> 03:36.880
to work with.

03:37.000 --> 03:42.280
By the way, in future videos, I do dive deeper into the difference between function calling and the

03:42.280 --> 03:45.680
react prompt, so I really recommend you checking this out.

03:45.960 --> 03:55.000
Anyways, the two main capabilities that function calling gives us is one to connect our LM to external

03:55.040 --> 04:01.040
tools, but the second one is to get structured output from the LM.

04:01.280 --> 04:09.030
So it's going to leverage the LMS reasoning capabilities to extract information in certain fields and

04:09.030 --> 04:13.390
return it in a very organized JSON, just like we wanted.

04:13.390 --> 04:18.190
So we can then convert it into a Pydantic object and we can downstream to our application.

04:18.190 --> 04:19.830
And this is very reliable.

04:19.830 --> 04:22.310
And I actually covered this as well in the course.

04:22.430 --> 04:22.750
All right.

04:22.750 --> 04:26.710
Let's talk about the advantages of using function calling.

04:27.110 --> 04:32.190
So first and foremost we get structured and reliable integration.

04:32.190 --> 04:40.190
So because the model's output is a machine readable JSON file with a specific function name and arguments,

04:40.190 --> 04:45.510
it's very easy to parse it and it's less prone to misinterpretation.

04:45.870 --> 04:48.670
So opposed to the react prompt.

04:48.910 --> 04:56.030
Now this model underneath the hood has been fine tuned to adhere to the function schema strictly, which

04:56.030 --> 05:01.390
is going to reduce random formatting errors like we're used to with the react prompt.

05:01.510 --> 05:09.020
So this structured approach is clean, it's very efficient, and it's going to enable us with a very

05:09.020 --> 05:11.700
powerful and reliable tool usage.

05:11.860 --> 05:16.340
Now another advantage is that it's very easy on the tokens.

05:16.540 --> 05:25.580
So because function calling doesn't output us all the chain of thought that we saw in previous sections,

05:25.580 --> 05:30.020
and we don't really get this high reasoning intensive prompting.

05:30.540 --> 05:37.340
So the model can skip these verbose explanations and directly and only return the function call.

05:37.500 --> 05:43.220
And the only drawback I can see for function calling, which is totally worth it.

05:43.220 --> 05:46.620
By the way, we have opaque reasoning process.

05:46.820 --> 05:54.460
And when the model decides to call a function, it typically does so without exposing its chain of thought.

05:54.620 --> 05:58.660
So the reasoning remains internal to the LLM.

05:58.820 --> 06:04.120
And we as the developers, we see only the final function, the name, the arguments, but we don't

06:04.120 --> 06:09.040
really see the justification and we don't really understand why it did so.

06:09.160 --> 06:17.040
So this function calling becomes more like a black box decision where we don't have any intermediate

06:17.040 --> 06:18.800
rationale which is exposed.

06:19.000 --> 06:25.840
So this can make debugging and auditing the model's decision harder, because we don't really see why

06:26.000 --> 06:30.640
it chose that particular function with those specific arguments.

06:30.680 --> 06:37.920
However, having said that, de facto function calling now is the standard and nobody really uses the

06:37.920 --> 06:39.400
actual react prompt.

06:39.400 --> 06:42.800
They're using the function calling features of Llms.

06:42.840 --> 06:50.160
And today, we're at a point where LLM vendors like OpenAI, like Google, like anthropic, have really

06:50.160 --> 06:51.760
perfected function calling.

06:51.920 --> 06:56.200
And we get really reliable answers with function calling.

06:56.240 --> 07:01.800
I mean, much more reliable than the react prompt, which is going to enable us to build more robust

07:01.840 --> 07:04.040
AI agents and AI applications.
