WEBVTT

00:00.200 --> 00:00.720
Hey, guys.

00:00.720 --> 00:01.280
Ethan here.

00:01.280 --> 00:06.360
And today we're going to introduce vector databases, embeddings, text splitters and a lot of cool

00:06.800 --> 00:07.160
stuff.

00:09.080 --> 00:12.520
This part is an introduction to vector databases.

00:12.960 --> 00:14.840
So we're going to introduce some new topics.

00:14.840 --> 00:18.480
In this course we're going to introduce embeddings and vector stores.

00:18.480 --> 00:20.200
We're going to be using Pinecone.

00:20.240 --> 00:26.560
We're going to introduce the retrieval QA chain and some LangChain classes like document loader and

00:26.560 --> 00:27.280
text splitter.

00:27.720 --> 00:30.680
So you can notice that we are introducing a lot of topics.

00:30.680 --> 00:35.760
And those are very important topics when it comes to developing an LLM-powered application.

00:35.960 --> 00:39.520
And in the rest of this course we'll see why those are so important.

00:40.000 --> 00:43.840
Now let's begin by talking about LangChain document loaders.

00:46.280 --> 00:53.000
Now remember, in the introduction video we discussed that LangChain is super powerful because it helps

00:53.000 --> 00:55.480
us interact with a lot of third parties.

00:55.720 --> 00:58.320
You can connect your Google Drive and read documents

00:58.320 --> 00:58.760
from there.

00:58.760 --> 01:03.110
You can connect your Notion notebook and read your notebooks from there.

01:03.150 --> 01:04.790
Or even your file system.

01:05.550 --> 01:11.470
So there are a lot of documents you can plug in and read with LangChain, which is able to digest them.

01:11.670 --> 01:18.910
So what LangChain actually implemented is a lot of wrappers around those third parties

01:18.910 --> 01:25.790
that make it super easy to connect to them and retrieve data from them that we can process with our

01:25.790 --> 01:26.310
LLM.

01:26.830 --> 01:30.350
So this data comes in the form of documents.

01:30.510 --> 01:32.590
That's the terminology in LangChain.

01:33.390 --> 01:36.950
And a document is simply something that holds text.

01:37.030 --> 01:41.510
So if we have a PowerPoint presentation, it's basically represented as text.

01:41.510 --> 01:46.790
Or if we have a text file, of course it's text. Whether it's an image, a PDF, whatever,

01:46.790 --> 01:48.670
everything is text.

01:48.670 --> 01:53.710
And the document loader class is the abstraction that is going to help us use this text data.

01:54.110 --> 02:00.470
We're going to load the data into documents, and we're going to work with documents and send them to

02:00.610 --> 02:01.290
the LLM.

02:01.450 --> 02:02.930
So that's how it's going to be.

02:02.930 --> 02:08.330
So it doesn't matter which document we have, if it represents something from a Notion notebook or

02:08.330 --> 02:13.930
something from our Google Drive, it doesn't really matter because we are working with an abstract thing

02:13.930 --> 02:15.370
that is called a document.
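
NOTE
Sketch, not from the video: a minimal illustration of the "document" idea described above. The Document dataclass and load_text_file helper are hypothetical names made up for this example and are not LangChain's API. The point is only that every source, whatever it is, ends up as the same text-plus-metadata shape.
```python
from dataclasses import dataclass, field
import os
import tempfile
@dataclass
class Document:
    """Hypothetical container: the text of a source plus where it came from."""
    page_content: str
    metadata: dict = field(default_factory=dict)
def load_text_file(path):
    """Hypothetical loader: read one text file into a list of Documents."""
    with open(path, encoding="utf-8") as handle:
        return [Document(page_content=handle.read(), metadata={"source": path})]
# Usage: write a small file, then load it back as a Document.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "notes.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("The Burj Khalifa is a skyscraper in Dubai.")
    docs = load_text_file(sample_path)
```
A loader for another source would return the same Document shape, so the code that follows it would not need to change.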

02:15.530 --> 02:21.690
Now, I know this might seem very abstract, but we're going to see the actual implementation of this

02:21.690 --> 02:23.690
and understand exactly what it is

02:23.890 --> 02:24.890
in this session.

02:24.890 --> 02:26.170
So don't worry about it.

02:26.530 --> 02:27.330
I promise you.

02:27.690 --> 02:28.050
Okay.

02:28.090 --> 02:33.770
Remember that annoying error that limited the number of tokens that we can use in the LLM?

02:34.410 --> 02:36.730
So this is quite common actually.

02:37.010 --> 02:39.610
And there are a couple of ways of dealing with this.

02:39.850 --> 02:45.050
We'll show one strategy for this and how to implement it in this session, and other strategies

02:45.050 --> 02:47.850
I elaborate on in the theoretical section of this course.

02:47.930 --> 02:50.810
So that brings us to text splitters.

02:51.570 --> 02:57.490
Basically, when we want to deal with long pieces of text, it's necessary to split them up into chunks.

02:57.810 --> 03:00.960
So it may sound simple to chunk it up.

03:01.160 --> 03:06.080
But there is a lot of complexity when it comes to this, because there are a lot of different files

03:06.080 --> 03:11.200
and there are a lot of different approaches, and we want to keep everything semantically related.

03:11.200 --> 03:15.480
So basically the text splitter will help us split the text into chunks.

03:15.480 --> 03:19.400
And if we want to combine it later, it will help us reassemble it.

03:19.760 --> 03:23.360
We'll see an example of this in this course and even in this session.

03:23.360 --> 03:28.880
So this is very, very useful when we're trying to work around that token limit because we can split our

03:28.880 --> 03:30.080
text into smaller parts.

03:30.080 --> 03:36.080
And then we have some kind of strategies to get around this token limit, but still process that large

03:36.080 --> 03:37.560
amount of data that we want.
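
NOTE
Sketch, not from the video: one simple way to chunk text and put it back together, assuming fixed-size character windows with a small overlap. The split_text and reassemble names, and the sizes used, are made up for this example. Splitters that try to keep each chunk semantically related usually cut on separators such as paragraphs or sentences instead of raw character counts.
```python
def split_text(text, chunk_size=40, chunk_overlap=10):
    """Split text into character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
def reassemble(chunks, chunk_overlap=10):
    """Rebuild the original text by dropping each later chunk's overlapping prefix."""
    if not chunks:
        return ""
    return chunks[0] + "".join(chunk[chunk_overlap:] for chunk in chunks[1:])
book = "Alice met John at the library. " * 5
chunks = split_text(book)
```
The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, as long as it is shorter than the overlap.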

03:37.880 --> 03:43.120
So now I want to show you a cool and very, very elegant way of solving this.

03:43.160 --> 03:48.480
The formal name for this method is retrieval-augmented generation, or in short, RAG.

03:48.640 --> 03:54.520
And the gist of it is that we're taking our original prompt, augmenting it with some relevant context.

03:54.520 --> 04:00.940
And that way the LLM is able to answer the original query when it has the correct context, and we'll

04:00.940 --> 04:04.380
even implement this ourselves end to end in this session.

04:04.380 --> 04:05.580
So stay tuned.

04:05.620 --> 04:07.020
It's going to be very, very cool.

04:08.100 --> 04:15.780
Let's say we have a huge file like a couple of gigs that we want our LLM to process, and we want it

04:15.780 --> 04:17.860
to give us answers about.

04:17.860 --> 04:23.660
For example, let's say it's a book and we want to ask questions about this book. Because this file is

04:23.660 --> 04:24.220
too big,

04:24.340 --> 04:29.620
of course we'll hit the token limit, because there are way more than 4K tokens in a book.

04:29.940 --> 04:34.900
And let's say that the answer to the question we want to ask resides in a specific part

04:34.900 --> 04:37.100
of the book, or in a couple of parts.

04:37.100 --> 04:40.460
But it's very narrowed down and it's in very specific places.

04:40.940 --> 04:49.660
So even if we split up the book into lots of chunks and take one chunk, or two or three chunks together,

04:49.820 --> 04:57.020
and send them as context to the LLM to answer the question, then we'll make a lot of redundant API

04:57.020 --> 05:02.530
calls, because the information for our answer is only in a few specific chunks.

05:02.730 --> 05:05.450
And if we have five chunks, it's okay.

05:05.690 --> 05:09.530
But what about if we have 1 million chunks?

05:09.730 --> 05:15.610
So if we have 1 million chunks, we'll make a lot of redundant API calls, which will cost us a lot

05:15.610 --> 05:16.210
of money.

05:16.730 --> 05:22.570
So what if there was a way to get, with some kind of magic,

05:22.610 --> 05:27.810
the relevant chunks that we need, the ones that contain the answer or have a high probability of containing

05:27.810 --> 05:31.170
the answer, and only send those chunks to the LLM?

05:31.490 --> 05:39.210
So that way we only make a couple of API calls or even one, and we save a lot of money and we get the

05:39.210 --> 05:42.890
response a lot faster and we're not doing any redundant work.

05:43.290 --> 05:48.610
So if it was possible to get those relevant chunks, then that would be amazing.

05:48.610 --> 05:53.210
It would save a lot of time, a lot of effort and a lot of resources and a lot of money.

05:53.930 --> 05:56.170
Luckily for us, there is a way to do it.

05:56.210 --> 05:57.850
It's elegant, and it's called retrieval

05:57.850 --> 05:58.690
augmentation.

05:59.470 --> 06:01.790
Now it's time to introduce embeddings.

06:02.230 --> 06:09.270
Basically, text embedding is a classic technique, quite old, but super super useful in the natural

06:09.270 --> 06:10.710
language processing world.

06:11.150 --> 06:18.590
The idea is to create a vector space from the text, such that the distance between the vectors in the

06:18.590 --> 06:20.630
space has a certain meaning.

06:21.070 --> 06:22.670
But wait, what's a vector?

06:22.710 --> 06:25.030
A vector is simply a sequence of numbers.

06:25.270 --> 06:31.950
But what's cool about a vector is that it can represent more complex objects like words, sentences,

06:31.950 --> 06:37.070
images, or audio files in a continuous high-dimensional space called an embedding space.

06:38.350 --> 06:46.630
Now, you can think of an embedding model as a black box which receives objects, and those are represented

06:46.630 --> 06:50.710
by text, does some things to them, and outputs

06:50.950 --> 06:57.550
an array of numbers which represents those objects in the vector space. As far as it concerns us,

06:57.550 --> 07:00.700
we don't care about what happens in the embedding model.

07:00.700 --> 07:09.540
All we care about is that text comes in and vectors come out. Now, a good embedding model takes texts

07:09.580 --> 07:16.420
which have similar semantic meaning and places their vectors in the vector space very, very

07:16.420 --> 07:17.300
close together.

07:17.300 --> 07:20.820
You can calculate the distance from each vector to another vector.

07:20.860 --> 07:26.620
Okay, so like in this example, if I have sentence one, which is "I want to order an extra large coffee,"

07:26.860 --> 07:28.020
I have sentence two,

07:28.060 --> 07:32.860
"I'll have a tall coffee," and I have sentence three, "Quiero pedir café extra grande,"

07:33.460 --> 07:40.140
Then in a good embedding, the vectors representing those sentences will be close together in the vector

07:40.140 --> 07:41.740
space or embedding space.

07:42.060 --> 07:48.260
Now it doesn't even matter that those sentences aren't even in the same language, because the semantic

07:48.260 --> 07:50.940
meaning here is pretty much the same.

07:51.220 --> 07:56.500
So the vectors representing them will have a small distance between each other.

07:56.980 --> 08:00.650
Now, how do we calculate the distance between the vectors?

08:01.050 --> 08:06.850
It's possible and it's basic math, but it's really, really boring and we don't care about it.

08:07.290 --> 08:13.010
All we need to know is that there are very smart people that did it for us with a lot of optimizations,

08:13.010 --> 08:15.410
and this thing is happening very, very fast.
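
NOTE
Sketch, not from the video: the video does not say which distance is used, so this assumes cosine distance, one common choice for comparing embeddings. The three vectors are made-up 3-dimensional stand-ins for real embeddings, which typically have hundreds or thousands of dimensions.
```python
import math
def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
def cosine_distance(a, b):
    """Smaller means closer in direction; about 0.0 for vectors pointing the same way."""
    return 1.0 - cosine_similarity(a, b)
# Made-up vectors, purely for illustration.
large_coffee = [0.9, 0.1, 0.0]
tall_coffee = [0.8, 0.2, 0.1]
burj_khalifa = [0.0, 0.1, 0.9]
coffee_gap = cosine_distance(large_coffee, tall_coffee)
tower_gap = cosine_distance(large_coffee, burj_khalifa)
```
With these numbers the two coffee vectors come out far closer to each other than either is to the tower vector, which is the behaviour a good embedding is expected to show.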

08:16.530 --> 08:22.450
Now let's take another example that is going to bring us closer to the solution to the problem that

08:22.450 --> 08:24.370
we introduced in the beginning of the video.

08:24.570 --> 08:25.010
Okay.

08:25.250 --> 08:27.730
So we have the text "How tall is the Burj Khalifa?"

08:28.210 --> 08:34.890
Now the text below is simply a copy paste of the text in Wikipedia describing the Burj Khalifa.

08:34.930 --> 08:36.250
This is the first paragraph.

08:36.610 --> 08:39.090
Now, with a good embedding model,

08:39.090 --> 08:44.490
if we embed both of those texts, then they will be very close together in the vector space,

08:44.490 --> 08:45.970
the vectors representing them, that is.

08:47.090 --> 08:50.010
So let's recap a bit about the relevant chunk thingy.

08:51.010 --> 08:56.810
So let's assume for this example that the LLM doesn't really know what the Burj Khalifa is.

08:57.010 --> 08:57.610
It doesn't.

08:57.650 --> 08:58.830
It was not trained on it.

08:58.830 --> 09:04.470
So this information is not available for it and it can't really go online and look for it.

09:04.510 --> 09:12.030
However, if we are somehow able to get these relevant chunks of data, and we'll do that by simply

09:12.030 --> 09:17.190
searching for the vectors similar to the query we asked, which is very easy to do.

09:17.230 --> 09:19.710
In fact, we'll show it very, very soon.

09:20.230 --> 09:23.230
So if we have that, we can simply tell the LLM:

09:23.270 --> 09:25.830
Hey, I want to know, how tall is the Burj Khalifa?

09:25.870 --> 09:28.430
Here is some information that is going to help you

09:28.430 --> 09:29.030
answer me.

09:29.070 --> 09:31.950
Please use this information and answer my question.

09:32.190 --> 09:36.830
Then the LLM will have no problem answering how tall the Burj Khalifa is.

09:37.110 --> 09:41.430
But who would be crazy enough to embed the entire contents of Wikipedia?

09:41.470 --> 09:42.670
I mean, that's insane.

09:42.670 --> 09:43.790
That's tons of data.

09:44.110 --> 09:46.910
Well, it turns out there are people who like to do it.

09:47.430 --> 09:54.830
So you can see right here that we have some people that took the contents of Wikipedia and embedded them

09:54.830 --> 09:58.150
and represented the paragraphs as vectors.

09:58.380 --> 10:03.020
So now if we want to use it in order to answer questions more accurately, we can.

10:04.220 --> 10:10.940
So let's say that in the vector space the query that we asked the LLM can be represented by the orange

10:10.940 --> 10:13.260
vector that you see in the square shape.

10:13.380 --> 10:13.780
Cool.

10:14.180 --> 10:21.300
So if there is a way to find its closest neighbors, its closest vectors that have the least amount

10:21.300 --> 10:25.180
of distance to it, then they would provide some very good context.

10:25.940 --> 10:32.580
And if those vectors represent some information from a data source,

10:32.900 --> 10:39.500
then we can take it and send it as the context for the query that we send to the LLM.

10:39.540 --> 10:47.980
So our prompt will contain our query plus this context information and tell the LLM to use that context

10:47.980 --> 10:49.620
in order to answer the question.

10:49.660 --> 10:50.100
Okay.

10:50.380 --> 10:55.140
So that's the way we really give it those relevant chunks we talked about.

10:55.460 --> 11:02.720
And a vector database is a database that stores those embeddings, those vectors, and is able

11:02.720 --> 11:07.120
to provide us, in almost no time, with the closest vectors to a vector we want.

11:07.600 --> 11:13.280
So it takes the embeddings and it simply persists them and makes it easy for us to use them later.
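
NOTE
Sketch, not from the video: a toy in-memory stand-in for a vector database, assuming Euclidean distance and brute-force search. TinyVectorStore and its methods are hypothetical names made up for this example and are not Pinecone's API. Real vector databases typically rely on specialised indexes rather than comparing the query against every stored vector.
```python
import math
class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector database (brute-force search)."""
    def __init__(self):
        self._rows = []  # list of (vector, text) pairs
    def add(self, vector, text):
        """Persist one embedding together with the chunk of text it represents."""
        self._rows.append((list(vector), text))
    def nearest(self, query, k=2):
        """Return the k stored texts whose vectors are closest to query."""
        ranked = sorted(self._rows, key=lambda row: math.dist(row[0], query))
        return [text for _, text in ranked[:k]]
# Made-up 2-dimensional vectors, purely for illustration.
store = TinyVectorStore()
store.add([0.9, 0.1], "chunk about coffee sizes")
store.add([0.8, 0.3], "chunk about espresso")
store.add([0.1, 0.9], "chunk about skyscrapers")
closest = store.nearest([0.15, 0.85], k=1)
```
Here the query vector sits next to the skyscraper chunk, so that chunk is the one returned as the closest neighbour.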

11:14.560 --> 11:19.920
So now let's recap everything we learned, and let's talk about the example we're trying to solve.

11:20.160 --> 11:22.880
We have the huge file representing the book.

11:23.080 --> 11:27.440
Now this file weighs a lot of gigabytes and it's very large.

11:27.720 --> 11:32.840
We can split it up into chunks, and LangChain helps us do it very, very easily.

11:33.040 --> 11:39.200
So we took the huge file of a couple of gigs, and we split it into thousands or millions of chunks of

11:39.200 --> 11:39.760
text.

11:40.120 --> 11:46.520
Now we can take all of these chunks and embed them using an embedding model and turn them into vectors,

11:46.520 --> 11:48.520
where each vector represents a chunk.

11:48.640 --> 11:54.360
So each vector is going to be a list of numbers, which is going to represent that given chunk that

11:54.360 --> 11:55.120
was embedded.

11:55.480 --> 12:02.150
Now we can take those embeddings and save them into a vector database like Pinecone, for example.

12:02.550 --> 12:07.710
Now, if you want to ask a question about the book, we can take that question as the query, embed

12:07.710 --> 12:14.270
it into a vector, place it into the vector space where all the embeddings of the chunks

12:14.270 --> 12:15.350
of the book exist.

12:15.630 --> 12:21.470
And now we can calculate what are the closest vectors to the query vector that we embedded.

12:21.950 --> 12:25.950
So those vectors are semantically close to our query vector.

12:26.150 --> 12:30.390
And they represent those relevant chunks we talked about.

12:30.870 --> 12:38.550
So now we can simply send this context of the relevant chunks with our query in our prompt.

12:38.590 --> 12:38.790
Okay.

12:38.830 --> 12:40.910
So we can plug it all in together.

12:41.630 --> 12:43.070
We'll send it to the LLM.

12:43.150 --> 12:46.910
And the final prompt will be "What did John do to Alice in the book?"

12:46.950 --> 12:54.230
And then we'll send along, as the context, the specific chunks that have this answer, that have this information.

12:54.230 --> 12:58.170
And then the LLM will easily be able to answer this question for us.
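
NOTE
Sketch, not from the video: the whole flow in miniature, from question to final prompt. It assumes a toy word-count "embedding" in place of a real embedding model, and the toy_embed, similarity and build_prompt names, the sample chunks and the prompt wording are all made up for this example. No LLM is called; the sketch stops at the prompt that would be sent.
```python
import math
import re
from collections import Counter
def toy_embed(text):
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))
def similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0
def build_prompt(question, chunks, k=1):
    """Pick the k chunks most similar to the question and put them in the prompt."""
    query_vector = toy_embed(question)
    ranked = sorted(chunks, key=lambda chunk: similarity(query_vector, toy_embed(chunk)), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Use this context to answer the question.\nContext:\n{context}\nQuestion: {question}"
# Made-up chunks standing in for pieces of the book.
chunks = [
    "Winter was cold and long.",
    "John gave Alice a letter before he left the village.",
    "The train station had been closed for years.",
]
prompt = build_prompt("What did John do to Alice in the book?", chunks)
```
Only the chunk that mentions John and Alice ends up in the prompt, so a single request carries the context needed to answer.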

12:58.210 --> 13:03.170
And now let's take a deep breath and digest everything we just talked about.

13:03.210 --> 13:04.290
I know it's a lot.

13:04.650 --> 13:08.170
Now don't worry if you didn't understand everything.

13:08.210 --> 13:08.930
It's okay.

13:09.250 --> 13:11.130
It's a lot of information.

13:11.410 --> 13:17.410
I do suggest watching this video one more time, but don't worry, because we will be implementing everything

13:17.410 --> 13:20.450
we've talked about and see how it's implemented in LangChain.

13:20.450 --> 13:23.570
So let's have a recap of what we discussed in this session.

13:23.970 --> 13:27.770
So it was a very theoretical discussion on the following topics.

13:28.530 --> 13:33.690
But don't worry because now we're going to implement everything.

13:33.970 --> 13:38.090
Now I know it sounds scary, but the fact is it's actually quite simple.

13:38.090 --> 13:43.170
And once you see the code, which you can glance at right now, you'll see that it is very simple

13:43.170 --> 13:47.010
and that LangChain is performing a lot of heavy lifting for us.

13:47.050 --> 13:49.130
This is why LangChain is so, so cool.

13:49.330 --> 13:53.490
So in the next videos, we're going to be implementing all of what we've discussed.

13:53.490 --> 13:57.130
And we'll be putting it into practice.

13:57.130 --> 13:58.130
So stay tuned.