WEBVTT

00:02.310 --> 00:04.560
-: All right, let's talk vector databases.

00:04.560 --> 00:06.570
So, this is how to do information retrievals,

00:06.570 --> 00:08.100
so you can make sure that your prompt

00:08.100 --> 00:11.370
has all the information that you need to complete the query.

00:11.370 --> 00:14.670
So, let's just install a couple of things.

00:14.670 --> 00:15.780
There's two here.

00:15.780 --> 00:19.230
One is OpenAI, which you need in order to do the query,

00:19.230 --> 00:20.490
and also to get the embeddings.

00:20.490 --> 00:22.050
I'll explain what that is in a second.

00:22.050 --> 00:25.710
And then there's also FAISS, F-A-I-S-S.

00:25.710 --> 00:30.710
That is, very specifically, a open source vector database

00:31.620 --> 00:33.480
that Facebook does for you.

00:33.480 --> 00:38.253
So, this is by Facebook, the actual AI lab at Facebook.

00:39.090 --> 00:40.920
All right, how does this work?

00:40.920 --> 00:45.000
I'm just going to bring this up.

00:45.000 --> 00:46.680
All right, oh, I'm sorry.

00:46.680 --> 00:48.993
We need to get OpenAI.

00:56.040 --> 00:58.743
And we need to initialize the client.

01:02.790 --> 01:04.030
And to pass in

01:05.550 --> 01:06.723
api_key.

01:08.490 --> 01:09.990
So, in order to get that,

01:09.990 --> 01:13.530
you just go to the API Keys part of your dashboard.

01:13.530 --> 01:17.217
When you run this, you can paste it in here and Submit.

01:17.217 --> 01:19.230
And now we have the secret_key

01:19.230 --> 01:21.453
and then we can create the OpenAI client.

01:22.560 --> 01:25.620
All right, so, now we have that,

01:25.620 --> 01:27.753
we wanna set the model that we're using.

01:30.210 --> 01:32.013
Just gonna use quite a simple model.

01:33.960 --> 01:37.140
The nice thing about vector queries

01:37.140 --> 01:39.090
is that they tend to make it

01:39.090 --> 01:41.592
so that you don't need a really good model,

01:41.592 --> 01:42.425
in order to answer the question.

01:42.425 --> 01:45.360
Because most of the time the answer

01:45.360 --> 01:48.870
is actually in the text that you're retrieving.

01:48.870 --> 01:50.820
The way that we find the text that we want

01:50.820 --> 01:53.760
is we get a vector embedding.

01:53.760 --> 01:56.610
And a vector embedding, I'll just show you what that means,

01:56.610 --> 02:00.300
is just this long string of numbers.

02:00.300 --> 02:02.533
And you can get this through the OpenAI API,

02:02.533 --> 02:04.590
but there are all other sources as well.

02:04.590 --> 02:07.443
So, just gonna put the input is the text.

02:09.480 --> 02:11.100
And

02:11.100 --> 02:11.933
model,

02:13.920 --> 02:16.900
this is the vector embedding model, not

02:20.010 --> 02:22.110
the LLM that we're using, all right?

02:22.110 --> 02:24.960
So, it's a different type of model.

02:24.960 --> 02:27.070
And then we're gonna get the embeddings

02:29.190 --> 02:32.277
and that's just from response.data.embedding.

02:39.468 --> 02:42.630
And then we're just gonna return

02:43.510 --> 02:45.630
that, okay, so now we have a function

02:45.630 --> 02:47.313
that we can put in some text.

02:53.070 --> 02:54.670
And your text

02:55.770 --> 02:58.830
goes here and we should get a big long string

02:58.830 --> 03:00.130
of numbers outta the back.

03:01.970 --> 03:06.330
Here we go. This is quite a few different numbers, right?

03:06.330 --> 03:08.760
Hundreds of numbers and each one responds

03:08.760 --> 03:10.803
to a specific dimension.

03:11.760 --> 03:13.380
And essentially what you can think

03:13.380 --> 03:17.190
of this as is like an address in space

03:17.190 --> 03:20.220
and the space of all possible concepts.

03:20.220 --> 03:21.750
Then things that are similar

03:21.750 --> 03:23.640
to each other will be close together.

03:23.640 --> 03:25.200
They'll have similar numbers, right?

03:25.200 --> 03:29.040
So they're, you think of like living next door to concepts

03:29.040 --> 03:30.060
that are similar to each other.

03:30.060 --> 03:32.340
So, the concept of king will be quite close

03:32.340 --> 03:33.900
to the concept of queen.

03:33.900 --> 03:36.180
So, what we can do, we don't need to worry too much

03:36.180 --> 03:39.390
about how these things work, but we can use them in order

03:39.390 --> 03:42.150
to find things that are similar in our database

03:42.150 --> 03:45.300
to what the user has asked.

03:45.300 --> 03:46.830
All right, what I'm gonna do is I'm gonna paste

03:46.830 --> 03:50.730
in this function here, which helps us chunk up our text,

03:50.730 --> 03:55.410
helps us split our text up into different sections.

03:55.410 --> 03:58.530
And this is just gonna split it into sentences specifically.

03:58.530 --> 04:00.240
I copied this from somewhere else,

04:00.240 --> 04:02.790
but I'm just gonna split the whole text up

04:02.790 --> 04:05.250
into individual sentences.

04:05.250 --> 04:08.100
And so, I'm just gonna paste the text in,

04:08.100 --> 04:10.470
stuff about Batman.

04:10.470 --> 04:12.300
So, here's a big long article about Batman,

04:12.300 --> 04:13.133
all the different Batmans,

04:13.133 --> 04:14.940
how they've changed over the years.

04:14.940 --> 04:18.420
So, if we wanted to make question and answer thing

04:18.420 --> 04:20.640
about Batman, you can see here, we split,

04:20.640 --> 04:22.170
we could put a lot more information here,

04:22.170 --> 04:26.160
but we split that page up into 10 different chunks.

04:26.160 --> 04:28.740
And that's typical, by the way, for vector databases

04:28.740 --> 04:31.290
because you wanna be able to pull back,

04:31.290 --> 04:33.810
like this is what's gonna go into the prompt.

04:33.810 --> 04:35.220
You want to be able to pull back,

04:35.220 --> 04:37.470
either it's like an individual chunk,

04:37.470 --> 04:39.660
like just a small sentence

04:39.660 --> 04:41.130
or it could be like a whole page.

04:41.130 --> 04:42.420
You need to decide how much

04:42.420 --> 04:43.770
information the prompt's gonna get.

04:43.770 --> 04:45.870
You don't wanna stuff the whole thing in the prompt

04:45.870 --> 04:47.460
because it just costs you money for no reason.

04:47.460 --> 04:48.960
It might confuse the (indistinct),

04:48.960 --> 04:51.240
but you also don't want like too small of a sentence.

04:51.240 --> 04:53.760
Like if you just had this bit from Adam West

04:53.760 --> 04:56.130
to Robert Pattinson, it doesn't have any information here.

04:56.130 --> 04:58.380
So, that's how you think about chunking

04:58.380 --> 05:01.680
and it really depends on your use case, right?

05:01.680 --> 05:05.220
We're gonna import numpy as np

05:05.220 --> 05:07.803
and then we're gonna import faiss.

05:09.390 --> 05:10.990
And what we need to do

05:12.090 --> 05:13.570
is create our

05:16.560 --> 05:19.920
vector database.

05:19.920 --> 05:24.333
So, I'm just gonna get the vector embeddings for the chunks.

05:28.680 --> 05:31.731
So, this is getting vector embeddings

05:31.731 --> 05:33.660
for each chunk, for all the chunk

05:33.660 --> 05:36.883
and chunks and then it's gonna turn that into an array,

05:36.883 --> 05:38.640
a NumPy array.

05:38.640 --> 05:42.000
And you can see what shape that is here.

05:46.533 --> 05:48.990
That this is basically like a list now

05:48.990 --> 05:51.363
of the different things in our array.

05:53.250 --> 05:55.890
Okay, so it's 1,500 numbers for each one

05:55.890 --> 05:58.470
and there's 10 of them, if that makes sense?

05:58.470 --> 06:00.000
All right, so let's create our database

06:00.000 --> 06:02.730
and this just creates it locally quite useful to work with

06:02.730 --> 06:05.340
and we just need to create an IndexFlatL2.

06:05.340 --> 06:07.020
Don't worry about what that means,

06:07.020 --> 06:09.810
but that's just gonna get the specific shape there

06:09.810 --> 06:11.580
and then say,

06:11.580 --> 06:16.020
so that tells it basically we want 1,536 dimensions

06:16.020 --> 06:19.200
and then we're just gonna add the vectors to the index

06:19.200 --> 06:20.970
and hopefully that works.

06:20.970 --> 06:23.367
Now, once we have everything added to the index

06:23.367 --> 06:25.500
and we could add more stuff if we want,

06:25.500 --> 06:27.810
we can now query that index.

06:27.810 --> 06:31.450
So, we can say we're gonna make a function for vector search

06:32.400 --> 06:34.450
and we just wanna pass in the query text

06:36.330 --> 06:38.250
and how many results we want back.

06:38.250 --> 06:39.513
We just call it K.

06:41.403 --> 06:43.353
And then we're gonna say, query_vector.

06:44.590 --> 06:49.140
We're gonna get the vector embeddings for the user's prompt.

06:49.140 --> 06:50.190
So, what did they ask?

06:50.190 --> 06:55.190
Like what's the address in space for the thing they asked?

06:55.200 --> 06:59.110
And then we're gonna find the distances, the indices

07:00.090 --> 07:02.040
for index search.

07:02.040 --> 07:03.240
And then we just gonna do,

07:03.240 --> 07:06.310
we're gonna, again, get this NumPy array

07:07.530 --> 07:11.853
and it just needs to pass in an array here,

07:12.912 --> 07:13.912
query_vector

07:17.670 --> 07:21.360
and K is what we're bringing back

07:21.360 --> 07:23.910
and that's just what FAISS needs.

07:23.910 --> 07:25.690
So, lemme just scroll down here

07:27.210 --> 07:29.493
and then what we're gonna get back is,

07:30.420 --> 07:33.300
I'm just gonna copy this across from previous

07:33.300 --> 07:35.550
piece of code because it's a little bit complicated

07:35.550 --> 07:36.780
but I'll explain it.

07:36.780 --> 07:40.000
So, we're just gonna get the chunks from the vector database

07:41.219 --> 07:44.280
and so it's just gonna get like a user ID

07:44.280 --> 07:48.000
that we get from the vector database to find out

07:48.000 --> 07:49.170
what chunk is the right chunk

07:49.170 --> 07:51.240
and then it's gonna zip the distances

07:51.240 --> 07:53.100
together and the indices.

07:53.100 --> 07:54.900
It's gonna make sense once you see it.

07:54.900 --> 07:59.580
So, now we have a way to search the vector databases

07:59.580 --> 08:01.857
for things that are similar.

08:01.857 --> 08:05.970
Gonna do search_results = vector_search

08:05.970 --> 08:10.068
we're just gonna say ("robert_pattinson")

08:10.068 --> 08:11.580
who was the latest Batman.

08:11.580 --> 08:12.603
So, lemme see.

08:13.650 --> 08:15.243
Print the search results.

08:18.330 --> 08:21.580
We just say for results in search_results

08:22.440 --> 08:26.430
print(result) 'cause if it's gonna come back

08:26.430 --> 08:28.080
with three results, if you remember?

08:28.080 --> 08:31.920
K equals three. So, let's see if this works. Here we go.

08:31.920 --> 08:33.370
So, we've got one chunk here.

08:35.580 --> 08:37.730
Just make this a little bit easier to read.

08:40.320 --> 08:41.700
Yeah, so we've got one chunk here

08:41.700 --> 08:45.690
and then you can see that it does talk about Pattinson.

08:45.690 --> 08:46.740
This one here, I think.

08:46.740 --> 08:48.870
Yeah, so Nirvana's "Something in the Way,"

08:48.870 --> 08:50.270
that's from the latest film.

08:53.220 --> 08:54.720
Yeah, here we go.

08:54.720 --> 08:56.190
So, you can see it's all over this chunk

08:56.190 --> 08:57.690
and therefore it was very similar

08:57.690 --> 08:59.337
to the word "Robert Pattinson."

09:00.240 --> 09:02.970
Similar here, this is the second chunk to came back with

09:02.970 --> 09:04.740
and Robert Pattinsons' all over it.

09:04.740 --> 09:06.570
And then same here, he's all over it as well.

09:06.570 --> 09:08.700
So, it's brought back the three most relevant

09:08.700 --> 09:10.560
things to Robert Pattinson.

09:10.560 --> 09:14.133
You can also see that the similarity is quite high.

09:15.150 --> 09:17.370
1.27 in this case.

09:17.370 --> 09:18.570
And this one's 1.3.

09:18.570 --> 09:21.270
So, these are the ones that were the closest in terms,

09:21.270 --> 09:23.310
so like more is worse, right?

09:23.310 --> 09:26.403
Those are the three closest in concept.

09:29.100 --> 09:32.070
So, now that we have that function,

09:32.070 --> 09:35.100
we can do something called search and chat

09:35.100 --> 09:38.670
or RAG, retrieval-augmented generation.

09:38.670 --> 09:41.220
And that just means doing one of these vector searches

09:41.220 --> 09:44.740
to query the database before we

09:45.660 --> 09:48.030
answer the user, right?

09:48.030 --> 09:50.820
So, this is a way to bring your documents

09:50.820 --> 09:53.760
into the actual kind of chatbot that you have.

09:53.760 --> 09:56.970
And this is what all the kind of chatbot-type companies do.

09:56.970 --> 09:59.850
It's a vector search finds similar documents

09:59.850 --> 10:02.360
or maybe past memories, whatever it is

10:02.360 --> 10:04.440
to actually supplement prompt.

10:04.440 --> 10:06.960
So, we're gonna get the search results,

10:06.960 --> 10:09.030
we're gonna do a vector search first.

10:09.030 --> 10:11.280
So, we've got our vector search function,

10:11.280 --> 10:13.980
just gonna put the chat prompt in there.

10:13.980 --> 10:18.000
And then K, so we'll get our search results

10:18.000 --> 10:20.100
and then maybe let's just print those

10:20.100 --> 10:22.050
so you can see what we go back.

10:22.050 --> 10:24.030
And then we wanna prompt,

10:24.030 --> 10:25.980
I'm just gonna copy this prompt across.

10:28.530 --> 10:30.660
So, this is taking the prompt

10:30.660 --> 10:32.100
and just passing in the context.

10:32.100 --> 10:33.480
So, passing the search results

10:33.480 --> 10:35.253
and then say, answer the question.

10:37.530 --> 10:39.300
Okay, so that's useful.

10:39.300 --> 10:43.380
And then we're gonna send a check completion there.

10:43.380 --> 10:44.640
We'll just say, "Please answer the questions

10:44.640 --> 10:46.800
prompted by the user, using the context provided to you."

10:46.800 --> 10:49.110
If you don't know the answer, say, "I don't know."

10:49.110 --> 10:50.850
So, that just stops it from hallucinating

10:50.850 --> 10:53.853
and not answering based on its own context.

10:55.620 --> 10:58.770
And that's quite useful when you're doing a RAG application.

10:58.770 --> 11:01.020
All right, so we're gonna print out the reply

11:02.700 --> 11:05.070
and let's ask a question.

11:05.070 --> 11:08.854
So, we'll do the tryout of a search and chat.

11:08.854 --> 11:10.604
I'm just gonna ask a question like,

11:12.403 --> 11:15.030
"What song was playing

11:15.030 --> 11:15.863
in Robert

11:18.630 --> 11:19.463
Batman."

11:24.600 --> 11:26.460
And let's see what we get back.

11:26.460 --> 11:27.810
So, it did the query quite quickly

11:27.810 --> 11:29.730
and it's got all this information now

11:29.730 --> 11:32.373
and it talks about the songs in here, I think.

11:36.300 --> 11:38.220
Yeah, Nirvana.

11:38.220 --> 11:40.440
Here we go, Nirvana's, "Something in the Way."

11:40.440 --> 11:43.410
So, it's managed to find that in the prompt

11:43.410 --> 11:46.410
and then now it can answer correctly.

11:46.410 --> 11:47.850
So, you can say, "The famous song playing

11:47.850 --> 11:49.350
in Robert Pattinson's, Batman,

11:49.350 --> 11:52.410
is Nirvana's 'Something in the Way.'"

11:52.410 --> 11:55.530
That is how retrieval-augmented generation works

11:55.530 --> 11:56.640
and it's very useful,

11:56.640 --> 11:58.860
once you see this, you can see it everywhere.

11:58.860 --> 12:01.260
So, a couple of different ways you can do this,

12:01.260 --> 12:03.180
I just use this with quite a small data set,

12:03.180 --> 12:06.240
but you can imagine, take this,

12:06.240 --> 12:07.650
and put in all of your documents,

12:07.650 --> 12:09.930
like maybe you wrote a book and you could pull in

12:09.930 --> 12:12.570
all the different sections of that book into the query

12:12.570 --> 12:14.520
so people can ask questions about your book.

12:14.520 --> 12:17.490
You could also make a chatbot with memory.

12:17.490 --> 12:20.340
So, if you save every memory in this vector database,

12:20.340 --> 12:23.070
then you say, "My name is Michael,"

12:23.070 --> 12:24.870
in the first conversation.

12:24.870 --> 12:25.800
Then a couple of months later

12:25.800 --> 12:27.090
you could say, "What was my name again?"

12:27.090 --> 12:29.760
And it would do a vector search on that database

12:29.760 --> 12:33.900
and it could say, "Okay, he asked for my name.

12:33.900 --> 12:35.550
There's another time where he asked for the name,

12:35.550 --> 12:38.220
that came back in the vector search and what did he say?

12:38.220 --> 12:40.950
He said his name was Michael." Super helpful.

12:40.950 --> 12:42.900
You can do RAG on a lot of applications,

12:42.900 --> 12:46.020
but this is hopefully just a simple demo for you.

12:46.020 --> 12:48.960
I wouldn't recommend, by the way, use FAISS in production

12:48.960 --> 12:50.790
because it's quite heavy, resource-wise.

12:50.790 --> 12:52.950
You'd have to host it on your own database.

12:52.950 --> 12:54.240
So, typically I'd tell you

12:54.240 --> 12:57.180
to use a hosted service like Pinecone

12:57.180 --> 12:59.823
or Supabase, a pgvector.
