WEBVTT

00:00.870 --> 00:02.310
-: Hey there, Eden here,

00:02.310 --> 00:05.460
and in this video I want to talk about RAG applications

00:05.460 --> 00:06.330
in production.

00:06.330 --> 00:07.230
And specifically,

00:07.230 --> 00:09.330
I want to take the example I show in the course

00:09.330 --> 00:10.740
of the Documentation Helper,

00:10.740 --> 00:12.960
which takes the LangChain docs, ingests them,

00:12.960 --> 00:15.150
and creates a RAG system on top of them.

00:15.150 --> 00:17.850
And I want to show you the project of Chat LangChain,

00:17.850 --> 00:20.520
which takes this prototype that we build in this course

00:20.520 --> 00:22.290
and takes it to the next level

00:22.290 --> 00:24.390
because it implements a paradigm

00:24.390 --> 00:27.240
called Agentic RAG, which is implemented

00:27.240 --> 00:29.250
with LangChain and LangGraph,

00:29.250 --> 00:33.000
and it has a very elaborate system for refining our query

00:33.000 --> 00:36.780
and generating an optimized output that is very usable.

00:36.780 --> 00:37.943
I also like, by the way, the UI

00:37.943 --> 00:40.350
and the user experience of this application

00:40.350 --> 00:42.330
and I think it's a very good example.

00:42.330 --> 00:44.310
Now what's cool about this application is

00:44.310 --> 00:45.600
that it's open source

00:45.600 --> 00:47.640
and LangChain actually shows you the code

00:47.640 --> 00:51.210
and you can deploy it for your own use case as well.

00:51.210 --> 00:56.040
All right, so let's talk about chat.langchain.com.

00:56.040 --> 00:59.160
Now this is an application that LangChain has built,

00:59.160 --> 01:02.790
which is very similar to our documentation helper

01:02.790 --> 01:04.350
that we build in this course.

01:04.350 --> 01:07.770
And both applications give the user the ability to chat

01:07.770 --> 01:09.780
with the LangChain documentation.

01:09.780 --> 01:12.120
However, here, LangChain took it to the next level

01:12.120 --> 01:14.626
and they have a very advanced architecture

01:14.626 --> 01:17.010
of a RAG system over here.

01:17.010 --> 01:18.480
Now they're implementing something

01:18.480 --> 01:20.310
which is called Agentic RAG,

01:20.310 --> 01:23.130
which I talk about in my LangGraph course.

01:23.130 --> 01:27.120
Now before we dive into this implementation over here,

01:27.120 --> 01:29.430
let me just show you how it works.

01:29.430 --> 01:33.270
So we can ask questions about the LangChain documentation.

01:33.270 --> 01:36.960
So let's ask, for example, "What is LangChain?"

01:36.960 --> 01:39.900
And right now the system is going to generate

01:39.900 --> 01:41.190
a bunch of questions

01:41.190 --> 01:43.890
which are related to our original question.

01:43.890 --> 01:47.196
So the first question is to review the documentation

01:47.196 --> 01:50.700
and gather a comprehensive definition of LangChain,

01:50.700 --> 01:52.740
and then it's going to retrieve docs

01:52.740 --> 01:54.450
about that query.

01:54.450 --> 01:57.600
And it's going to do the same for two more queries.

01:57.600 --> 02:00.750
And the goal here is a heuristic that is going

02:00.750 --> 02:04.230
to retrieve better documents which are more relevant for us.

02:04.230 --> 02:06.180
So this is a very cool heuristic.

02:06.180 --> 02:08.280
So for each one of those subqueries,

02:08.280 --> 02:10.710
which are derived from our original query,

02:10.710 --> 02:12.960
we're going to perform some semantic search

02:12.960 --> 02:15.750
and we're going to retrieve documents that are going to help

02:15.750 --> 02:17.790
to answer this question

02:17.790 --> 02:20.610
and that's going to be the selected context.

02:20.610 --> 02:22.836
So it's going to be all the documents retrieved,

02:22.836 --> 02:24.780
which are probably going to be filtered

02:24.780 --> 02:27.720
or re-ranked by relevance.
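
NOTE
A minimal sketch of the multi-query heuristic described here, not the Chat LangChain code.
The toy corpus, keyword_search (a word-overlap stand-in for semantic search) and
multi_query_retrieve are all hypothetical names introduced for illustration.
```python
from typing import Callable
# Toy corpus standing in for the ingested documentation (hypothetical data).
DOCS = {
    "intro": "LangChain is a framework for building LLM applications",
    "agents": "Agents use an LLM to decide which tools to call",
    "rag": "Retrieval augmented generation grounds answers in retrieved documents",
}
def keyword_search(query: str, k: int = 2) -> list[tuple[str, float]]:
    """Stand-in for semantic search: score each doc by word overlap with the query."""
    q = set(query.lower().split())
    if not q:
        return []
    scored = [(doc_id, len(q & set(text.lower().split())) / len(q)) for doc_id, text in DOCS.items()]
    scored = [(doc_id, score) for doc_id, score in scored if score > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
def multi_query_retrieve(
    subqueries: list[str],
    search: Callable[[str], list[tuple[str, float]]] = keyword_search,
) -> list[str]:
    """Run every subquery, keep each doc once with its best score, then rank by that score."""
    best: dict[str, float] = {}
    for sub in subqueries:
        for doc_id, score in search(sub):
            best[doc_id] = max(score, best.get(doc_id, 0.0))
    return sorted(best, key=lambda doc_id: best[doc_id], reverse=True)
if __name__ == "__main__":
    print(multi_query_retrieve(["what is langchain", "langchain framework for llm applications"]))
```
Deduplicating by best score is one simple way to merge the per-subquery results.
A real system might use a dedicated re-ranking model instead.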

02:27.720 --> 02:30.108
So once we have the documents that are going

02:30.108 --> 02:33.180
to help answer the question, we are going to augment the query

02:33.180 --> 02:36.510
and then the system is going to produce the answer.

02:36.510 --> 02:38.250
Of course, once we produce the answer,

02:38.250 --> 02:41.160
we are also going to output the sources.

02:41.160 --> 02:44.340
Now at any point during the run,
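
NOTE
A small sketch of the augmentation step, assuming each retrieved document is a dict
with hypothetical "text" and "url" keys. The prompt wording is illustrative only and
is not taken from the Chat LangChain repository.
```python
def build_augmented_prompt(question: str, docs: list[dict[str, str]]) -> tuple[str, list[str]]:
    """Return the prompt to send to the model and the source URLs to show next to the answer."""
    # Number each context entry so the answer can cite it as [1], [2], ...
    context_lines = [f"[{i}] {doc['text']}" for i, doc in enumerate(docs, start=1)]
    sources = [doc["url"] for doc in docs]
    prompt = (
        "Answer the question using only the numbered context below.\n"
        "Cite the context entries you used, for example [1].\n"
        "Context:\n" + "\n".join(context_lines) + "\n"
        f"Question: {question}"
    )
    return prompt, sources
if __name__ == "__main__":
    example_docs = [{"text": "LangChain is a framework.", "url": "https://example.com/intro"}]
    prompt, sources = build_augmented_prompt("What is LangChain?", example_docs)
    print(prompt)
    print(sources)
```
Returning the sources alongside the prompt keeps the list the UI displays in sync
with the context the model actually saw.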

02:44.340 --> 02:47.040
we can check out the relevant context over here,

02:47.040 --> 02:47.873
for example.

02:47.873 --> 02:49.530
So those are the retrieved documents

02:49.530 --> 02:53.760
and you can see that the UI here is very, very natural

02:53.760 --> 02:57.450
and we can really understand what the system is doing

02:57.450 --> 03:02.070
and how it is built, and that there is no magic over here.

03:02.070 --> 03:05.010
So this really creates trust between the user

03:05.010 --> 03:06.870
and the system, which is very important

03:06.870 --> 03:08.790
for Generative AI applications.

03:08.790 --> 03:12.510
And this field is actually called Generative UI

03:12.510 --> 03:15.720
and it is the art of creating very smooth

03:15.720 --> 03:18.180
and nice user interfaces

03:18.180 --> 03:21.210
and user experiences for generative applications.

03:21.210 --> 03:22.302
And I do talk about it

03:22.302 --> 03:25.323
in the production section of this course.

03:26.190 --> 03:29.130
All right, let me test the coreference resolution here.

03:29.130 --> 03:31.410
So I'm going to ask, "Who created it?"

03:31.410 --> 03:34.650
Now we can see we got relevant subqueries over here,

03:34.650 --> 03:35.592
which show that the system understands

03:35.592 --> 03:38.970
that I am referring to LangChain.

03:38.970 --> 03:42.019
And by the way, this application also has the ability

03:42.019 --> 03:45.390
to search online, but we're not searching online

03:45.390 --> 03:46.683
for this kind of query.

03:53.280 --> 03:55.620
All right, so now we get the answer

03:55.620 --> 03:58.800
that LangChain was created by Harrison Chase.

03:58.800 --> 04:02.370
So we can see here the documentation link

04:02.370 --> 04:04.470
and yeah, we got here the answer

04:04.470 --> 04:05.303
and we can see

04:05.303 --> 04:08.310
that the coreference resolution is working.

04:08.310 --> 04:12.330
All right, so this application is actually open source.

04:12.330 --> 04:15.000
So let me just go and search it up on Google.

04:15.000 --> 04:17.520
I'm going to write chat LangChain GitHub

04:17.520 --> 04:21.060
and this is the repository right over here.

04:21.060 --> 04:24.000
So basically it's built with the stack of LangChain,

04:24.000 --> 04:28.950
LangGraph, and Next.js, which is going to be the front end,

04:28.950 --> 04:31.170
and you can check out the code.

04:31.170 --> 04:33.250
And if I go here to the back end

04:35.400 --> 04:37.560
and under the retrieval graph,

04:37.560 --> 04:39.510
we can check out the prompts

04:39.510 --> 04:42.360
that make up the application.

04:42.360 --> 04:45.150
And this is going to be a multi-agent system.

04:45.150 --> 04:48.180
So let's go to prompts.py over here.

04:48.180 --> 04:51.288
And here we can see we're downloading a bunch

04:51.288 --> 04:53.280
of prompts which are going to be used in the application,

04:53.280 --> 04:55.080
and we download them from the LangChain Hub,

04:55.080 --> 04:56.910
like we did before.

04:56.910 --> 04:59.190
We can see we have here the router prompt

04:59.190 --> 05:02.640
and I explained the router concept in this course.
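
NOTE
A rough sketch of the router idea: classify the question first, then dispatch to a
handler. The labels ("docs", "more_info", "general"), keyword_classifier and HANDLERS
are hypothetical. In a real system the classifier would be an LLM call driven by the
router prompt rather than keyword matching.
```python
from typing import Callable
Handler = Callable[[str], str]
def keyword_classifier(question: str) -> str:
    """Stand-in for the LLM call that a router prompt would drive (hypothetical labels)."""
    text = question.lower()
    if "langchain" in text or "langgraph" in text:
        return "docs"
    if len(text.split()) < 3:
        return "more_info"
    return "general"
def route(
    question: str,
    handlers: dict[str, Handler],
    classify: Callable[[str], str] = keyword_classifier,
) -> str:
    """Classify the question, then dispatch; unknown labels fall back to the 'general' handler."""
    label = classify(question)
    handler = handlers.get(label, handlers["general"])
    return handler(question)
HANDLERS: dict[str, Handler] = {
    "docs": lambda q: f"research: {q}",
    "more_info": lambda q: "Could you give more detail?",
    "general": lambda q: "I can only help with LangChain questions.",
}
if __name__ == "__main__":
    print(route("What is LangChain?", HANDLERS))
```
Passing the classifier in as a parameter makes it easy to swap the keyword stand-in
for a model call without touching the dispatch logic.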

05:02.640 --> 05:05.190
We have here the generate queries prompt,

05:05.190 --> 05:07.860
which is going to generate, from the user's original query,

05:07.860 --> 05:10.830
subqueries to search for.

05:10.830 --> 05:12.330
We have a bunch more prompts

05:12.330 --> 05:14.700
and I highly recommend you check them out

05:14.700 --> 05:16.080
because they're very good examples

05:16.080 --> 05:18.990
of proper prompt engineering.

05:18.990 --> 05:21.360
All right, so those are the prompts which are going

05:21.360 --> 05:23.400
to be used in the application.

05:23.400 --> 05:26.370
So let's check out now the rest of the logic here.

05:26.370 --> 05:28.620
And again, we're not going too much in depth here.

05:28.620 --> 05:30.470
I'm just trying to show you the idea.

05:31.410 --> 05:34.290
Let's go now to the retrieval graph

05:34.290 --> 05:37.080
and let's go here to graph.py.

05:37.080 --> 05:40.530
And here we have the actual usage of those prompts.

05:40.530 --> 05:44.310
And here we have an example of a multi-agent system,

05:44.310 --> 05:47.010
which is implemented with LangGraph.

05:47.010 --> 05:50.070
So right over here you see a LangGraph implementation.
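
NOTE
A dependency-free sketch of the graph idea: named nodes that each update a shared
state, connected by edges. This is not the LangGraph API and not the repository's
graph.py. Every node name and placeholder behavior below is an assumption made for
illustration.
```python
from typing import Any, Callable, Optional
State = dict[str, Any]
Node = Callable[[State], State]
def generate_queries(state: State) -> State:
    # Placeholder: a real node would ask a model for subqueries.
    question = state["question"]
    return {**state, "subqueries": [f"{question} overview", f"{question} examples"]}
def retrieve(state: State) -> State:
    # Placeholder: a real node would run semantic search for each subquery.
    return {**state, "docs": [f"doc for '{q}'" for q in state["subqueries"]]}
def respond(state: State) -> State:
    # Placeholder: a real node would call a model with the retrieved context.
    return {**state, "answer": f"Answer based on {len(state['docs'])} documents"}
NODES: dict[str, Node] = {"generate_queries": generate_queries, "retrieve": retrieve, "respond": respond}
# Each node maps to the next node to run; None marks the end of the graph.
EDGES: dict[str, Optional[str]] = {"generate_queries": "retrieve", "retrieve": "respond", "respond": None}
def run_graph(question: str, start: str = "generate_queries") -> State:
    """Walk the graph from the start node, passing the shared state along each edge."""
    state: State = {"question": question}
    current: Optional[str] = start
    while current is not None:
        state = NODES[current](state)
        current = EDGES[current]
    return state
if __name__ == "__main__":
    print(run_graph("LangChain")["answer"])
```
LangGraph adds conditional edges, checkpoints and streaming on top of this basic
node-and-edge pattern.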

05:50.070 --> 05:52.770
Now don't worry if you don't know what LangGraph is

05:52.770 --> 05:54.870
or what multi-agent systems are,

05:54.870 --> 05:56.970
because I do cover an introduction

05:56.970 --> 05:59.070
to those topics in this course.

05:59.070 --> 06:00.568
But I go over the essentials

06:00.568 --> 06:04.380
and go in depth on those topics in my LangGraph course,

06:04.380 --> 06:07.380
which builds right on top of this course.

06:07.380 --> 06:08.808
The only thing I wanted to show you

06:08.808 --> 06:11.670
in this GitHub repository and this example

06:11.670 --> 06:14.310
is that we have some advanced logic here,

06:14.310 --> 06:16.980
which is going to optimize our results.

06:16.980 --> 06:19.920
And this is how the LangChain team took the idea

06:19.920 --> 06:22.980
of the documentation helper and took it to the next level

06:22.980 --> 06:25.620
and implemented something which is production-ready,

06:25.620 --> 06:27.693
and produces high-quality results.