WEBVTT

00:00.270 --> 00:03.750
-: So in order to understand MCP in depth,

00:03.750 --> 00:05.460
we really need to go back a bit

00:05.460 --> 00:09.030
and to talk about the history of LLMs

00:09.030 --> 00:11.910
and AI applications in general.

00:11.910 --> 00:14.250
So let me remind you what are LLMs.

00:14.250 --> 00:17.640
LLMs are simple token generators.

00:17.640 --> 00:20.610
They're guessing one token after the other

00:20.610 --> 00:23.430
and they're simply text generators.

00:23.430 --> 00:26.040
And this is something which is not that obvious

00:26.040 --> 00:29.490
because nowadays, with all the agentic behavior,

00:29.490 --> 00:31.710
people think that LLMs have superpowers

00:31.710 --> 00:33.390
and they can do tons of stuff,

00:33.390 --> 00:34.650
and this is not the case.

00:34.650 --> 00:37.410
LLMs can only output text.

00:37.410 --> 00:39.390
Or maybe if it's multimodal LLMs,

00:39.390 --> 00:42.690
they can also output pictures and other formats,

00:42.690 --> 00:47.010
but they certainly cannot go and perform actions.

00:47.010 --> 00:50.100
So those extra capabilities like searching the web,

00:50.100 --> 00:54.390
performing deep research or invoking a Python function,

00:54.390 --> 00:57.090
those are external tools that are integrated

00:57.090 --> 01:00.090
into the application which is running the LLM.

01:00.090 --> 01:01.710
So for example, in ChatGPT,

01:01.710 --> 01:05.130
you use maybe the ChatGPT application in your desktop,

01:05.130 --> 01:07.770
or you use the ChatGPT web application,

01:07.770 --> 01:12.150
so it has the LLM wrapped inside an application.

01:12.150 --> 01:14.580
Software engineers wrote those applications.

01:14.580 --> 01:17.550
Let me remind you how tool usage works.

01:17.550 --> 01:22.110
So tools like web searching is external code

01:22.110 --> 01:27.110
that are not part of the LLM, that software engineers wrote.

01:27.690 --> 01:31.650
So if you use ChatGPT and you toggle the web search option,

01:31.650 --> 01:34.770
so the software engineers that work at OpenAI,

01:34.770 --> 01:37.050
they wrote this functionality.

01:37.050 --> 01:39.420
And how do we get this ad hoc behavior

01:39.420 --> 01:42.120
of tool usage with LLMs?

01:42.120 --> 01:44.550
So what's happening underneath the hood,

01:44.550 --> 01:46.853
and this is probably what's happening with OpenAI

01:46.853 --> 01:49.170
and all the other chat applications

01:49.170 --> 01:50.730
that are leveraging LLMs,

01:50.730 --> 01:53.040
is that we leverage tool calling.

01:53.040 --> 01:57.060
So remember the LLMs are text and token generator.

01:57.060 --> 01:58.170
So what happens here

01:58.170 --> 02:01.350
is that we have a very fancy system prompt,

02:01.350 --> 02:04.740
yes, this all boils down into a system prompt,

02:04.740 --> 02:07.230
which instead, let's say in a question

02:07.230 --> 02:09.870
like, "What is the weather right now?"

02:09.870 --> 02:12.540
Instead of generating this answer

02:12.540 --> 02:14.460
for those kinds of use cases,

02:14.460 --> 02:16.890
it won't say that the weather right now

02:16.890 --> 02:18.939
is 25 degrees Celsius,

02:18.939 --> 02:20.550
but what it would do,

02:20.550 --> 02:22.980
it would generate the text, get weather,

02:22.980 --> 02:27.210
and open parenthesis with the arguments of the city

02:27.210 --> 02:29.400
that we want the weather to, okay?

02:29.400 --> 02:31.950
So instead of generating the answer for this

02:31.950 --> 02:33.060
and hallucinating it,

02:33.060 --> 02:36.330
because it doesn't have access to real world information,

02:36.330 --> 02:40.380
then it simply generates the tool call, the tool invocation.

02:40.380 --> 02:41.820
And this tool call is going to be

02:41.820 --> 02:44.880
in a very specific format that the vendor designs,

02:44.880 --> 02:47.700
which is going to be very easy to parse.

02:47.700 --> 02:49.140
It's going to be easy to parse

02:49.140 --> 02:50.820
the functions that need to be called.

02:50.820 --> 02:52.860
It's going to be easy to parse the arguments

02:52.860 --> 02:55.110
which we need to call the function with.

02:55.110 --> 02:57.390
And there are many variations of tool calling.

02:57.390 --> 03:00.000
Each vendor implements it differently.

03:00.000 --> 03:03.960
But it all boils down into a very special system prompt.

03:03.960 --> 03:07.260
And we reviewed in my launching course the react prompt,

03:07.260 --> 03:09.000
which is an example of one.

03:09.000 --> 03:13.320
And this is basically what's happening underneath the hood.

03:13.320 --> 03:16.470
The LLM application, for example, ChatGPT,

03:16.470 --> 03:19.110
then takes this output, it parses it,

03:19.110 --> 03:21.060
and if there is a tool call,

03:21.060 --> 03:23.850
it simply goes and invoke the functionality

03:23.850 --> 03:25.590
that the engineers wrote.

03:25.590 --> 03:28.500
So it can have, for example, a web search tool,

03:28.500 --> 03:30.360
and the ChatGPT application

03:30.360 --> 03:33.210
is going to be wrapped in a very special prompt

03:33.210 --> 03:36.030
that is going to output, when necessary,

03:36.030 --> 03:39.690
the invocation of a web search with the user's query.

03:39.690 --> 03:41.400
And for example, if I would go and ask,

03:41.400 --> 03:43.560
what is the stock price of NVIDIA,

03:43.560 --> 03:46.260
it would go and generate the tokens

03:46.260 --> 03:49.170
of search on the web, of web search

03:49.170 --> 03:51.690
and query to be NVIDIA's stock price.

03:51.690 --> 03:55.380
So after the ChatGPT application perform the tool call,

03:55.380 --> 03:57.960
then it will generate another LLM call

03:57.960 --> 04:00.780
with the result of that tool call

04:00.780 --> 04:02.730
and the original user's query.

04:02.730 --> 04:04.290
So this is the basic functionality

04:04.290 --> 04:06.750
of almost every given agent.

04:06.750 --> 04:08.340
And also, it's important to note

04:08.340 --> 04:10.770
that LLMs are statistical creatures,

04:10.770 --> 04:13.770
so they are guessing the token one after another.

04:13.770 --> 04:16.890
So this tool calling, this tool calling mechanism,

04:16.890 --> 04:18.660
which the LLM is going to output

04:18.660 --> 04:20.400
the correct tool that we need to invoke

04:20.400 --> 04:21.870
with the correct arguments,

04:21.870 --> 04:23.730
it does not work 100%,

04:23.730 --> 04:26.010
but it works a lot of the time

04:26.010 --> 04:28.200
and most of the time, is actually pretty good

04:28.200 --> 04:30.480
for agentic applications.

04:30.480 --> 04:34.740
So to summarize, LLMs are simple token generators

04:34.740 --> 04:36.150
and token predictors.

04:36.150 --> 04:39.720
And tool calling capability is actually ad hoc behavior

04:39.720 --> 04:41.640
that we add in the application layer.

04:41.640 --> 04:43.347
So now, let's go back to MCP,

04:43.347 --> 04:46.530
and MCP lets us focus on writing those tools

04:46.530 --> 04:49.440
and exposing them in MCP servers.

04:49.440 --> 04:51.300
So those tools that we write,

04:51.300 --> 04:53.970
they can be used on all other applications

04:53.970 --> 04:55.530
that supports function calling.

04:55.530 --> 04:59.220
So they can be supported by ChatGPT at the end.

04:59.220 --> 05:02.310
So they announced that they are going to support MCP.

05:02.310 --> 05:06.213
They can be supported by the cloud desktop or Cursor.