WEBVTT

00:00.150 --> 00:01.440
Eden: Hey there, Eden here,

00:01.440 --> 00:03.750
and in this video we'll explain the motivation

00:03.750 --> 00:06.420
of creating the LangGraph framework.

00:06.420 --> 00:09.990
Now it's going to be theoretical/philosophical

00:09.990 --> 00:14.310
and it's really going to explain why LangGraph was created.

00:14.310 --> 00:16.800
So first I want to start with a big shout out

00:16.800 --> 00:18.090
to the LangChain team

00:18.090 --> 00:21.180
that provided me some slides and some illustrations

00:21.180 --> 00:23.460
so I can use in this video.

00:23.460 --> 00:26.910
Alright, so let's talk about building AI systems

00:26.910 --> 00:30.030
and if we'll take a look on AI systems

00:30.030 --> 00:33.840
from the perspective of levels of autonomy

00:33.840 --> 00:36.210
that those applications have,

00:36.210 --> 00:38.040
then we have a spectrum.

00:38.040 --> 00:41.250
And in that spectrum, on the one end,

00:41.250 --> 00:43.740
we have deterministic code.

00:43.740 --> 00:46.710
And those are systems, where we as developers,

00:46.710 --> 00:48.030
we write the code.

00:48.030 --> 00:50.370
We do not integrate any LLN,

00:50.370 --> 00:52.860
so we only have deterministic code

00:52.860 --> 00:56.430
like the functions and flow that you see right over here.

00:56.430 --> 00:59.670
So we know exactly what is the output,

00:59.670 --> 01:01.650
which is going to be at every step,

01:01.650 --> 01:03.480
what's going to be the input,

01:03.480 --> 01:05.190
which steps we're going to take.

01:05.190 --> 01:09.660
And we basically have control, on the entire system.

01:09.660 --> 01:14.160
And while those systems are very resilient and reliable

01:14.160 --> 01:17.850
because we have control on their entire performance,

01:17.850 --> 01:20.610
however, they're not flexible at all.

01:20.610 --> 01:23.343
And this is because everything is hardcoded.

01:24.720 --> 01:27.390
And if we take a look at the spectrum,

01:27.390 --> 01:31.110
on the other hand, we have this idea of autonomous agents

01:31.110 --> 01:33.870
and those autonomous agents can do everything.

01:33.870 --> 01:36.627
They can make up their task of what to do,

01:36.627 --> 01:40.140
they can write code, execute that code

01:40.140 --> 01:43.920
then to reorder their tasks and to write another code.

01:43.920 --> 01:45.330
And they're very dynamic

01:45.330 --> 01:47.850
and they can really get a task

01:47.850 --> 01:49.620
from the beginning to the end

01:49.620 --> 01:51.780
and they're super flexible.

01:51.780 --> 01:53.730
They can receive prompts like,

01:53.730 --> 01:56.940
make be the number one YouTuber and they supposedly

01:56.940 --> 01:59.190
can actually do this.

01:59.190 --> 02:02.640
However, in reality, those kinds of systems,

02:02.640 --> 02:04.050
they don't really exist.

02:04.050 --> 02:09.050
So projects like RGPT GPT Engineer, BabyAGI,

02:09.660 --> 02:12.960
they were trying to implement something like this.

02:12.960 --> 02:17.280
Now, while those projects are very important to the industry

02:17.280 --> 02:19.560
and I really think they pound on innovation

02:19.560 --> 02:21.420
and pushing the boundaries,

02:21.420 --> 02:24.480
they're not very production oriented

02:24.480 --> 02:26.550
and we don't see any usage

02:26.550 --> 02:28.950
of those kinds of systems in production.

02:28.950 --> 02:31.530
Because there's simply too flexible

02:31.530 --> 02:33.270
and we don't have control

02:33.270 --> 02:36.090
because we rely too much on the LLM.

02:36.090 --> 02:40.800
And when we do that, then the LLM tends to scatter around

02:40.800 --> 02:43.980
and to not output us what we want.

02:43.980 --> 02:46.860
And the reason for this is that LLMs

02:46.860 --> 02:50.940
at the very basic level are simply statistical creatures

02:50.940 --> 02:53.250
guessing one token after the other.

02:53.250 --> 02:56.760
So autonomous agents, while flexible,

02:56.760 --> 02:59.010
they're not very reliable.

02:59.010 --> 03:02.700
All right, so those are the two ends of the spectrum

03:02.700 --> 03:07.530
and let's now talk about what's in between the spectrum.

03:07.530 --> 03:11.460
So one level after writing the terministic code

03:11.460 --> 03:14.760
is to integrate an LLM into that code.

03:14.760 --> 03:17.820
So we as developers, we still write the code

03:17.820 --> 03:20.070
and we control the control flow

03:20.070 --> 03:22.980
and we know exactly what's going to be executed.

03:22.980 --> 03:26.730
But inside of this flow, then the LLM can be used

03:26.730 --> 03:29.340
for example, maybe to summarize something

03:29.340 --> 03:32.940
or to extract the information, to extract some entities.

03:32.940 --> 03:35.787
But overall we have a lot of control over here.

03:35.787 --> 03:39.390
The LLM simply helps us in this process

03:39.390 --> 03:42.960
and controls only one output in this entire flow.

03:42.960 --> 03:45.690
And by only doing this, by integrating an LLM,

03:45.690 --> 03:47.730
we gain a lot of flexibility

03:47.730 --> 03:51.780
because the generation of LLMs, can be very creative

03:51.780 --> 03:53.970
and we still have most of the control

03:53.970 --> 03:55.890
on the development side.

03:55.890 --> 03:58.770
Now if we take this a step further,

03:58.770 --> 04:01.290
we can introduce the concept of chaining.

04:01.290 --> 04:04.980
And this is basically to take one output of an LLM

04:04.980 --> 04:08.010
and giving it as an input to another LLM.

04:08.010 --> 04:11.250
So to compose one call over the other.

04:11.250 --> 04:13.020
And here you can see an example

04:13.020 --> 04:16.650
of a RAG flow, Retrieval-Augmentation Generation

04:16.650 --> 04:20.610
where we give the first LLM our original question.

04:20.610 --> 04:22.830
We then integrate embeddings,

04:22.830 --> 04:25.985
so we take the question, we embed it

04:25.985 --> 04:29.640
and we retrieve with similar relevant documents

04:29.640 --> 04:33.210
which are probably going to help answer that question.

04:33.210 --> 04:36.240
And then we take the original prompt, we augment it

04:36.240 --> 04:38.970
and we send everything to the LLM

04:38.970 --> 04:41.010
which finally generates the answer.

04:41.010 --> 04:42.750
And this is just one example

04:42.750 --> 04:47.750
of chaining multiple LLM calls/using embeddings.

04:48.630 --> 04:51.210
So we have a lot of other use cases

04:51.210 --> 04:53.100
and a lot of other flow chains,

04:53.100 --> 04:55.590
but this is basically one step further

04:55.590 --> 04:58.140
which really gives us the possibility

04:58.140 --> 04:59.820
to build really cool things.

04:59.820 --> 05:02.850
So to summarize, in a chain we have LLMs

05:02.850 --> 05:07.200
that determine the output in a bunch of steps, not only one.

05:07.200 --> 05:10.337
And the more we progress to this autonomous agent

05:10.337 --> 05:11.910
end of the spectrum,

05:11.910 --> 05:15.810
we leverage LLMs to build complex systems.

05:15.810 --> 05:18.300
Alright, so let's now continue

05:18.300 --> 05:22.620
and let's now discuss the concept of an LLM router.

05:22.620 --> 05:26.280
And an LLM router is a kind of chain

05:26.280 --> 05:31.140
which leverages the LLM to decide where do we want to go.

05:31.140 --> 05:33.180
So we can use an LLM to decide

05:33.180 --> 05:36.420
whether we are going to execute code in Branch 1

05:36.420 --> 05:37.980
or code in Branch 2.

05:37.980 --> 05:41.490
So it can be for example to go and search in a database

05:41.490 --> 05:43.650
or to go and search over the web

05:43.650 --> 05:46.680
and the LLM can decide where to go.

05:46.680 --> 05:50.760
So we are using the reasoning power of an LLM here.

05:50.760 --> 05:53.460
Here for the first time, the LLM decides

05:53.460 --> 05:55.560
which steps do we need to take

05:55.560 --> 05:58.080
and this gives us even more flexibility

05:58.080 --> 06:00.183
of building those kinds of systems.

06:01.050 --> 06:03.480
However, you notice here that it's written

06:03.480 --> 06:07.470
that there are no cycles in an LLM router

06:07.470 --> 06:09.930
and you see this dotted line over here.

06:09.930 --> 06:12.240
So what is below this dotted line

06:12.240 --> 06:15.360
is what we consider an agent

06:15.360 --> 06:18.150
or an agentic application.

06:18.150 --> 06:21.450
Cool. So everything you see above this line

06:21.450 --> 06:25.770
is actually very well implemented within LangChain

06:25.770 --> 06:27.990
and I am a very big fan of LangChain

06:27.990 --> 06:30.720
and I think we can build very advanced systems

06:30.720 --> 06:32.670
with only those building blocks

06:32.670 --> 06:34.650
with the LangChain framework.

06:34.650 --> 06:37.200
Alright, back into this theoretical part,

06:37.200 --> 06:38.370
you see this gap here,

06:38.370 --> 06:41.520
right between this autonomous agent and the router.

06:41.520 --> 06:43.140
Then I'll give you a spoiler,

06:43.140 --> 06:45.873
LangGraph is positioned right over here.

06:47.400 --> 06:51.690
There have been a lot of discussions of what is an agent

06:51.690 --> 06:54.390
and what's the formal definition of an agent

06:54.390 --> 06:56.880
and what's an agentic application.

06:56.880 --> 06:59.340
And if you go and ask somebody what's an agent

06:59.340 --> 07:01.050
or what's an agentic application,

07:01.050 --> 07:03.300
you're going to hear from three people

07:03.300 --> 07:04.650
three different answers.

07:04.650 --> 07:06.930
And if you're going to ask them a day later,

07:06.930 --> 07:09.270
you'll get three more different answers.

07:09.270 --> 07:10.920
Now today, in my opinion,

07:10.920 --> 07:14.880
the definition of an agentic system or an agent,

07:14.880 --> 07:19.500
is very soft and there isn't a definitive answer for that.

07:19.500 --> 07:22.740
I do like what Andrew Ng from DeepLearningAI

07:22.740 --> 07:26.280
and Harrison Chase from LangChain wrote on the topic.

07:26.280 --> 07:30.600
But I think they all share a similar consensus

07:30.600 --> 07:32.820
of the core components of an agent.

07:32.820 --> 07:36.060
And in my opinion, if you really want to dummy down

07:36.060 --> 07:38.910
and to look at it at its core level,

07:38.910 --> 07:42.210
then an agent is essentially

07:42.210 --> 07:47.210
a control flow that an LLM decides where to go.

07:47.370 --> 07:49.560
So you can see in this picture over here,

07:49.560 --> 07:53.190
we have an LLM which decides whether to go to step one

07:53.190 --> 07:55.500
or whether to go to step two.

07:55.500 --> 08:00.450
So this is a very basic example of an agent.

08:00.450 --> 08:03.240
We're using the reasoning power of an LLM

08:03.240 --> 08:06.450
to decide where to go in this flow.

08:06.450 --> 08:07.717
And you might ask yourself,

08:07.717 --> 08:10.777
"Hey, what's the difference between an agent and a chain

08:10.777 --> 08:12.570
"and specifically even the router chain."

08:12.570 --> 08:13.920
Because I showed you earlier

08:13.920 --> 08:16.170
and I told you it wasn't an agent.

08:16.170 --> 08:19.140
And the main difference is that a chain

08:19.140 --> 08:20.550
is one directional.

08:20.550 --> 08:25.230
We move from the left to the right, while in an agent,

08:25.230 --> 08:28.140
we actually have those kinds of cycles.

08:28.140 --> 08:32.400
And those cycles we're going to soon see are very important

08:32.400 --> 08:35.790
and they are what's going to give our application

08:35.790 --> 08:37.290
agentic properties.

08:37.290 --> 08:41.280
And specifically agents, at least agents nowadays,

08:41.280 --> 08:45.150
they use function calling, to decide which steps to take.

08:45.150 --> 08:47.370
And function calling is a very cool feature

08:47.370 --> 08:50.640
of certain LLMs, where besides

08:50.640 --> 08:53.730
our query that we send the LLM, the question,

08:53.730 --> 08:57.270
we can send a description of tools.

08:57.270 --> 09:01.440
So those are functions that we can execute in our backend

09:01.440 --> 09:03.210
and we can orchestrate them

09:03.210 --> 09:06.060
and we send the LLM, also the description

09:06.060 --> 09:09.030
of those functions, their arguments, their title,

09:09.030 --> 09:10.560
what they're supposed to do

09:10.560 --> 09:12.720
and what their return value is

09:12.720 --> 09:16.710
and we can do that very easily with the tool decorator.

09:16.710 --> 09:20.040
And then the LLM, if it finds appropriate,

09:20.040 --> 09:22.890
it can tell us that we need to invoke these functions

09:22.890 --> 09:24.720
with those arguments.

09:24.720 --> 09:28.290
So, hopefully we'll go, we'll invoke those function

09:28.290 --> 09:29.760
with those arguments

09:29.760 --> 09:32.703
and we'll get back the answer that we want.

09:33.570 --> 09:36.660
Cool. So now let's talk about the most basic design

09:36.660 --> 09:38.220
for an agent.

09:38.220 --> 09:41.970
And it was first introduced in the ReAct paper

09:41.970 --> 09:46.170
and it really, I think changed our industry.

09:46.170 --> 09:50.100
So basically the algorithm for this type of agent,

09:50.100 --> 09:51.390
is very simple.

09:51.390 --> 09:54.060
So we start, we then use the LLM

09:54.060 --> 09:56.340
to decide whether we need to use a tool.

09:56.340 --> 09:59.550
So for example, to make a call to an API

09:59.550 --> 10:02.700
or a call to a database and query there

10:02.700 --> 10:05.100
and the LLM decides whether we need

10:05.100 --> 10:07.020
to call this tool or not.

10:07.020 --> 10:10.620
Then we go and call this tool with the arguments

10:10.620 --> 10:13.890
that the LLM chose us to execute this tool.

10:13.890 --> 10:17.512
We get an answer and then we feed back everything to the LLM

10:17.512 --> 10:20.760
which can decide whether to use another tool

10:20.760 --> 10:24.390
or whether to return the answer to the user.

10:24.390 --> 10:26.430
So I don't want to dive too deep

10:26.430 --> 10:30.060
into the ReAct algorithm and how does it work?

10:30.060 --> 10:33.300
I do actually do this in my LangChain course

10:33.300 --> 10:34.800
and my LangGraph course,

10:34.800 --> 10:37.290
where we do implement this kind of algorithm

10:37.290 --> 10:40.740
from zero and we really understand where it's coming from

10:40.740 --> 10:43.230
and what is this kind of magic

10:43.230 --> 10:45.150
because it really looks magical

10:45.150 --> 10:46.830
from the first time you see it.

10:46.830 --> 10:50.640
All right. So the ReAct paper, was super innovative

10:50.640 --> 10:54.150
and right away the LangChain team implemented very nicely

10:54.150 --> 10:55.410
within the LangChain framework

10:55.410 --> 10:57.663
and we started seeing those agents pop up.

10:58.590 --> 11:02.280
So those agents were actually very flexible.

11:02.280 --> 11:05.610
So the LLM can decide whether to invoke tool one

11:05.610 --> 11:08.793
or then to invoke tool two, or to invoke tool one

11:08.793 --> 11:11.610
and then tool two, et cetera.

11:11.610 --> 11:15.000
And it has all the permutation that it can wrap.

11:15.000 --> 11:17.250
However, it was too flexible.

11:17.250 --> 11:19.740
And since every permutation is allowed,

11:19.740 --> 11:22.560
then also this permutation is allowed.

11:22.560 --> 11:25.620
And if you've been developing agents for a while,

11:25.620 --> 11:28.020
then you're probably familiar with this error

11:28.020 --> 11:31.890
where the agent is in this sort of end the slope

11:31.890 --> 11:34.350
where it's invoking the same tool,

11:34.350 --> 11:36.090
over and over and over

11:36.090 --> 11:38.310
and simply is stuck.

11:38.310 --> 11:41.430
And there are many reasons for why this can happen.

11:41.430 --> 11:44.730
Maybe we didn't define our tools correctly.

11:44.730 --> 11:48.240
The LLM is non-deterministic or not strong enough.

11:48.240 --> 11:50.370
It chooses the correct tool to use

11:50.370 --> 11:53.550
or it gives it the incorrect arguments

11:53.550 --> 11:56.520
or it even hallucinates a tool that does not exist.

11:56.520 --> 11:58.620
So there are a lot of explanations

11:58.620 --> 12:00.633
of why those things may happen.

12:01.860 --> 12:04.620
And basically what I'm trying to show you here,

12:04.620 --> 12:06.810
is that we have here a problem,

12:06.810 --> 12:11.220
where we have an agent maybe that is flexible

12:11.220 --> 12:12.965
like in RGPT

12:12.965 --> 12:17.910
or even the ReAct algorithm, but it's not very reliable.

12:17.910 --> 12:22.910
And what we want is something which still is flexible

12:23.160 --> 12:24.750
but much more reliable.

12:24.750 --> 12:27.600
So we can actually use it in production systems

12:27.600 --> 12:31.140
and let users interact with it and get good results

12:31.140 --> 12:33.900
which work outside the scope of a demo.

12:33.900 --> 12:37.350
This is exactly, why LangGraph was created.

12:37.350 --> 12:38.760
And you see this gap here

12:38.760 --> 12:41.820
between the autonomous agent and the router.

12:41.820 --> 12:44.520
So here, LangGraph is positioned

12:44.520 --> 12:46.440
and the idea of LangGraph

12:46.440 --> 12:51.240
is to not give the entire freedom to the LLM

12:51.240 --> 12:54.360
rather to scope it and to take this freedom

12:54.360 --> 12:57.690
and take it down by one dimension

12:57.690 --> 13:00.060
and to still allow the LLM to have freedom

13:00.060 --> 13:02.130
but not to give it all the freedom.

13:02.130 --> 13:05.190
And with LangGraph, we represent our software,

13:05.190 --> 13:09.000
our agentic software as a graph with nodes and edges

13:09.000 --> 13:11.430
and we represent it as a state machine,

13:11.430 --> 13:13.440
which can have cycles,

13:13.440 --> 13:16.500
which will give us agentic properties

13:16.500 --> 13:18.870
and it will look like the agent can reason

13:18.870 --> 13:21.270
and it can think about what it needs to do.

13:21.270 --> 13:24.450
And it give us very advanced capabilities,

13:24.450 --> 13:27.390
but we control the flow as developers.

13:27.390 --> 13:30.390
The LLM can play a crucial role in this flow

13:30.390 --> 13:33.030
and decide where do we need to go in this flow.

13:33.030 --> 13:36.390
But we as developers, we decide this flow

13:36.390 --> 13:39.450
and by actually reducing one dimension

13:39.450 --> 13:41.820
of freedom from the LLM,

13:41.820 --> 13:45.750
we can actually gain a lot of reliability

13:45.750 --> 13:47.970
and we can architect our system

13:47.970 --> 13:51.960
such that it would be much more resilient and reliable

13:51.960 --> 13:55.470
and all thanks because we have the entire control

13:55.470 --> 13:56.580
of the flow.

13:56.580 --> 14:00.270
And with LangGraph, we architect our software,

14:00.270 --> 14:03.150
our agentic software as a state machine,

14:03.150 --> 14:05.130
where we have nodes and edges

14:05.130 --> 14:07.590
and this is displayed as a graph,

14:07.590 --> 14:09.210
like you can see right here.

14:09.210 --> 14:11.880
We have nodes and we have edges

14:11.880 --> 14:14.610
and LangGraph gives us a lot of support

14:14.610 --> 14:17.010
for building those kinds of graphs

14:17.010 --> 14:19.050
which are very opinionated

14:19.050 --> 14:21.570
for building agentic applications.

14:21.570 --> 14:22.927
And you might ask yourself,

14:22.927 --> 14:24.937
"Hey, why do we need LangGraph for that?

14:24.937 --> 14:27.157
"I can build it with airflow

14:27.157 --> 14:28.657
"or with network X

14:28.657 --> 14:31.470
"or another graph framework."

14:31.470 --> 14:34.860
And the reason is that LangGraph is very opinionated

14:34.860 --> 14:37.140
towards agentic applications

14:37.140 --> 14:40.170
and it's built to solve that problem.

14:40.170 --> 14:42.540
And it offers a lot of building blocks,

14:42.540 --> 14:46.470
like controllability, running nodes in parallel

14:46.470 --> 14:49.980
and having conditional branching with the LLMs

14:49.980 --> 14:52.380
and having persistence which is built in.

14:52.380 --> 14:54.900
So we can store our current state of the graph

14:54.900 --> 14:57.780
and what's being executed and what has been executed,

14:57.780 --> 15:00.060
which helps us to implement very easily,

15:00.060 --> 15:01.710
human in the loop flows

15:01.710 --> 15:03.900
where we integrate human feedback

15:03.900 --> 15:07.590
which calibrates our execution of our agent.

15:07.590 --> 15:10.680
Time traveling, which is to replay some step

15:10.680 --> 15:12.780
that didn't work correctly

15:12.780 --> 15:16.200
and even debugging and tooling for tracing

15:16.200 --> 15:19.890
because it comes integrated with LangSmith out of the box.

15:19.890 --> 15:22.230
And by the way, you can write inside LangGraph

15:22.230 --> 15:23.130
any code you want.

15:23.130 --> 15:25.003
So it doesn't have to be LangChain code,

15:25.003 --> 15:28.470
but you really can choose anything you want.

15:28.470 --> 15:31.140
By the way, I think one of the motivations

15:31.140 --> 15:34.770
for architecting the software as graphs

15:34.770 --> 15:39.770
is because, most of the papers on agentic applications

15:39.840 --> 15:43.020
and agentic behavior were illustrated as graphs.

15:43.020 --> 15:45.330
So it also feels very natural

15:45.330 --> 15:47.610
to describe those solutions as graphs

15:47.610 --> 15:51.390
and it is very readable and very easy to maintain

15:51.390 --> 15:53.463
and to test and to monitor.

15:54.690 --> 15:57.270
So basically in LangGraph, we as developers,

15:57.270 --> 16:00.600
we control the flow and we write what is the flow

16:00.600 --> 16:04.050
and we can integrate an LLM to decide where to go

16:04.050 --> 16:07.950
and what to execute in this flow and with cycles,

16:07.950 --> 16:09.270
this is important.

16:09.270 --> 16:11.190
And we represent this flow

16:11.190 --> 16:13.950
with nodes and with edges.

16:13.950 --> 16:16.890
So the LLM can use conditional branching

16:16.890 --> 16:18.480
to decide where to go

16:18.480 --> 16:21.990
and maybe to execute Node 1 or Node 2.

16:21.990 --> 16:23.850
This is our state machine

16:23.850 --> 16:26.820
and because we have a state machine, we need to have a state

16:26.820 --> 16:30.240
and the state is something which is shared across the nodes

16:30.240 --> 16:31.500
and across the edges.

16:31.500 --> 16:34.800
And it's going to save all of our intermediate results

16:34.800 --> 16:37.830
and it's going to give us useful information

16:37.830 --> 16:39.540
and to the LLM, useful information

16:39.540 --> 16:41.973
to decide where to go in that flow.