WEBVTT

00:01.024 --> 00:03.450
-: Okay, let's learn about vector databases

00:03.450 --> 00:06.954
using Supabase and pgvector.

00:06.954 --> 00:10.920
So Supabase is a database host

00:10.920 --> 00:13.260
that hosts Postgres for you.

00:13.260 --> 00:15.990
It's one of the most popular ones. A really good solution.

00:15.990 --> 00:18.060
I use it for most of my products,

00:18.060 --> 00:22.092
and it has a integration with pgvector.

00:22.092 --> 00:26.880
Pgvector means you can turn your normal Postgres database,

00:26.880 --> 00:31.050
your SQL database into a vector database.

00:31.050 --> 00:32.130
So something you can query

00:32.130 --> 00:35.280
for retrieval-augmented generation or RAG.

00:35.280 --> 00:37.440
So I'm gonna show you what all that means.

00:37.440 --> 00:39.000
Don't worry about it too much right now,

00:39.000 --> 00:41.910
but you'll be able to see it just a way of being able

00:41.910 --> 00:44.250
to pull information from your database

00:44.250 --> 00:48.510
but based on similarity rather than just on keywords.

00:48.510 --> 00:51.360
We're just loading the environment.

00:51.360 --> 00:56.100
I have a environment file, like a .env file

00:56.100 --> 00:59.490
with the SUPABASE_URL, SUPABASE_KEY in there.

00:59.490 --> 01:03.180
You'll be able to get that from your Supabase project.

01:03.180 --> 01:04.980
So you need to create a Supabase project.

01:04.980 --> 01:06.870
You can usually create one for free

01:06.870 --> 01:09.690
and then you can get started that way.

01:09.690 --> 01:13.050
But now let's go into some

01:13.050 --> 01:15.540
of the code that we need to run.

01:15.540 --> 01:18.510
First, we need to be able to get vectors.

01:18.510 --> 01:21.330
It's a vector database, so we need to be able to get vectors

01:21.330 --> 01:26.330
and those are coming from the OpenAI library.

01:26.400 --> 01:29.130
So gonna create a client

01:29.130 --> 01:34.130
and that's gonna be getting the OpenAI api_key from.

01:36.180 --> 01:38.310
So that's also my .env file.

01:38.310 --> 01:42.840
And then just gonna use the gpt-4.1-mini model,

01:42.840 --> 01:47.700
as well as the text-embedding-small model for embeddings.

01:47.700 --> 01:50.800
Let's call this LLM_MODEL, VECTOR_MODEL.

01:50.800 --> 01:51.963
Gonna be this one.

01:53.040 --> 01:54.943
All right, so how do we get vector embeddings?

01:54.943 --> 01:56.417
Just gonna show you.

01:56.417 --> 02:00.880
So I'm gonna say get_vector_embedding.

02:00.880 --> 02:02.190
Gonna create a function

02:02.190 --> 02:03.740
where we just pass in some text

02:04.920 --> 02:07.622
and we get a response,

02:07.622 --> 02:11.310
which is using the client.embeddings.create.

02:11.310 --> 02:15.210
And then we just need to pass an input, which is the text

02:15.210 --> 02:18.450
and the model, which is the VECTOR_MODEL.

02:18.450 --> 02:22.770
And then we get the embeddings from response.data.

02:24.330 --> 02:26.463
And it's the first thing in the list.

02:27.330 --> 02:28.593
Then embedding.

02:29.700 --> 02:31.260
We're gonna return that.

02:31.260 --> 02:33.810
Okay, so just to show you how that works,

02:33.810 --> 02:34.685
let's just run one.

02:34.685 --> 02:36.480
Query_embedding

02:36.480 --> 02:39.540
and then you can just put in whatever text string you want.

02:39.540 --> 02:40.860
So whatever.

02:40.860 --> 02:43.380
I'm gonna show you what we get back.

02:43.380 --> 02:47.670
We'll show the length and then the actual embedding as well.

02:47.670 --> 02:52.490
Okay, so it's a 1,536 number list

02:54.510 --> 02:56.370
and each number is quite long, right?

02:56.370 --> 02:58.530
Like, it's to a lot of decimal places.

02:58.530 --> 03:00.930
But this represents basically the location

03:00.930 --> 03:03.420
in space for this concept.

03:03.420 --> 03:08.160
So if you put oranges, it's gonna be a different number.

03:08.160 --> 03:10.200
There's still the same number of dimensions,

03:10.200 --> 03:12.030
but concepts that are similar

03:12.030 --> 03:15.510
or close together will be literally close together

03:15.510 --> 03:16.770
in the numbers as well.

03:16.770 --> 03:20.040
So the benefit of using vectors

03:20.040 --> 03:21.810
is when you compare the numbers,

03:21.810 --> 03:24.990
you can actually figure out how close you are to a concept.

03:24.990 --> 03:26.880
That means you can search by similarities.

03:26.880 --> 03:30.630
You can say, you know, oranges are similar to other fruits,

03:30.630 --> 03:31.770
they're gonna be grouped together.

03:31.770 --> 03:33.540
Whereas if I search for cars,

03:33.540 --> 03:37.140
it's not gonna bring back an article about oranges.

03:37.140 --> 03:39.060
So that's the basics of vectors.

03:39.060 --> 03:42.750
But we have now the ability to create vectors,

03:42.750 --> 03:44.680
but we need to go into Supabase

03:45.690 --> 03:47.670
and then create our vector table.

03:47.670 --> 03:50.160
The code we need is,

03:50.160 --> 03:53.060
I'm just gonna put this in markdown, just this.

03:53.060 --> 03:55.440
We need to create a table with items

03:55.440 --> 03:58.200
and then we're gonna have a primary key

03:58.200 --> 04:01.650
and then we're gonna have the vector embeddings as well.

04:01.650 --> 04:03.630
And that we've said that this is 1,536

04:04.730 --> 04:07.380
'cause that's how many numbers we have.

04:07.380 --> 04:08.970
And then we're just gonna store the text.

04:08.970 --> 04:10.560
So when we do a search, we're gonna be able

04:10.560 --> 04:12.063
to bring back the text.

04:13.050 --> 04:16.380
All right, so let me show you where you go to run that.

04:16.380 --> 04:18.990
Okay, so this is Supabase

04:18.990 --> 04:22.560
and you would put that code in here.

04:22.560 --> 04:24.780
So ignore that for a minute.

04:24.780 --> 04:27.000
You would paste it in here and then you would hit run.

04:27.000 --> 04:28.950
Obviously not with this extra stuff.

04:28.950 --> 04:31.920
This is the other code we're going to need as well

04:31.920 --> 04:34.500
to enable the vector database.

04:34.500 --> 04:36.930
And then this is to create a function to be able

04:36.930 --> 04:39.720
to search the vector database as well.

04:39.720 --> 04:41.910
So I'll just go through this quickly.

04:41.910 --> 04:43.860
This creates the table

04:43.860 --> 04:46.050
and we'll be able to see that table in a second.

04:46.050 --> 04:50.010
And then this adds the extension for vector databases.

04:50.010 --> 04:52.710
This creates or replaces.

04:52.710 --> 04:54.570
This is basically like a function

04:54.570 --> 04:56.040
that you'll either create the function

04:56.040 --> 04:58.650
or replace it if it's already found

04:58.650 --> 05:00.600
and it's called match_items.

05:00.600 --> 05:04.740
And this match_items function is gonna query with the vector

05:04.740 --> 05:06.750
and then it's gonna have a match threshold,

05:06.750 --> 05:09.270
like similarity score that you can pass it

05:09.270 --> 05:11.820
and then it's gonna tell you how many results you come back.

05:11.820 --> 05:14.790
So when this creates a function for us to be able to call

05:14.790 --> 05:16.140
with our code.

05:16.140 --> 05:19.170
And what it returns is like a table

05:19.170 --> 05:21.120
with the similarity in the text,

05:21.120 --> 05:26.120
and then we have this is essentially how the search happens.

05:26.820 --> 05:30.360
So it matches the threshold, and that's the SQL.

05:30.360 --> 05:31.560
I've got that in the notebook.

05:31.560 --> 05:32.760
You'll be able to copy and paste that.

05:32.760 --> 05:34.770
You don't need to worry too much about it.

05:34.770 --> 05:38.280
But now we have a table.

05:38.280 --> 05:41.493
And now I'm just gonna actually, let me show you that table.

05:42.930 --> 05:44.103
So that's in here.

05:45.390 --> 05:47.130
You can see items

05:47.130 --> 05:48.527
and I've already added these items in

05:48.527 --> 05:50.040
so you can already see them.

05:50.040 --> 05:54.150
It's from my employee handbook, "Unicorn Enterprises,"

05:54.150 --> 05:56.550
and I have each kind of chapter

05:56.550 --> 06:00.150
of the handbook stored alongside the vectors.

06:00.150 --> 06:02.581
And if you look, you can see this is like

06:02.581 --> 06:07.581
that 156th number list that we'll be able

06:08.580 --> 06:10.440
to search by similarity.

06:10.440 --> 06:13.443
Cool. All right, so we have that.

06:14.400 --> 06:17.040
Now what do we need to do next?

06:17.040 --> 06:18.250
We need to

06:23.100 --> 06:24.960
actually load in the employee handbook.

06:24.960 --> 06:26.790
So I already had it in mind,

06:26.790 --> 06:29.823
but I'm just gonna paste this in.

06:30.660 --> 06:32.940
I just did this to make it simple, right?

06:32.940 --> 06:35.940
We have a few different chapters here and it's split.

06:35.940 --> 06:39.723
So we're gonna split that by two new lines.

06:41.670 --> 06:44.190
So this is going to break it into chunks.

06:44.190 --> 06:46.620
And each of these chunks is then gonna go into the database

06:46.620 --> 06:48.360
once we create it, right?

06:48.360 --> 06:52.593
So if you look, just run that,

06:53.970 --> 06:56.070
we can see that we've got paragraph zero,

06:56.070 --> 06:58.980
paragraph one, paragraph two, et cetera.

06:58.980 --> 07:01.530
So each one of these things could be searched.

07:01.530 --> 07:03.420
So if someone asks about work-life balance,

07:03.420 --> 07:05.790
then it'll return paragraph three,

07:05.790 --> 07:07.170
whereas someone talks about coffee,

07:07.170 --> 07:09.870
then maybe it will bring back paragraph four

07:09.870 --> 07:11.910
because that mentions coffee.

07:11.910 --> 07:15.120
Okay, so now I need a function to create the embeddings

07:15.120 --> 07:18.030
and then upload them into Supabase.

07:18.030 --> 07:23.030
The way that works is first, let's make a list

07:23.370 --> 07:26.160
and we'll say for chunk in chunks

07:26.160 --> 07:28.110
and we're gonna get the embedding

07:28.110 --> 07:29.850
and that's our get_vector_embeddings.

07:29.850 --> 07:31.620
We're just gonna pass in that chunk.

07:31.620 --> 07:33.090
So that gets it from OpenAI.

07:33.090 --> 07:37.920
And then we're gonna add the embeddings_append(embedding).

07:37.920 --> 07:39.930
Right? So that passes that in.

07:39.930 --> 07:41.227
And now we're gonna go through

07:41.227 --> 07:43.563
and we're gonna say for chunk,

07:44.520 --> 07:47.847
embedding in zip(chunks, embeddings).

07:51.840 --> 07:55.590
What that's gonna do is gonna join those two lists together

07:55.590 --> 07:57.723
and then give you both of them in a row.

07:59.250 --> 08:00.690
So we're gonna get the data,

08:00.690 --> 08:04.350
I'm gonna say supabase.table.

08:04.350 --> 08:06.783
And we call our table items.

08:08.280 --> 08:12.060
We just wanna insert the object,

08:12.060 --> 08:13.803
which is gonna be a text,

08:14.820 --> 08:19.680
chunks, embedding.

08:19.680 --> 08:21.780
And just make sure that you use the same names here,

08:21.780 --> 08:23.850
I use the same table name,

08:23.850 --> 08:26.430
use the same column names as well.

08:26.430 --> 08:28.290
Otherwise you'll run into some issue.

08:28.290 --> 08:31.170
So this executes like this.

08:31.170 --> 08:34.260
So that's gonna add everything to the table.

08:34.260 --> 08:39.260
If I do that, then what we'll see,

08:40.170 --> 08:41.700
if I refresh my table,

08:41.700 --> 08:46.050
you can see it's added that same thing again.

08:46.050 --> 08:48.760
Okay, that's where you need to go

08:49.920 --> 08:52.560
and if add this.

08:52.560 --> 08:55.230
You also need to make sure you ran this function as well.

08:55.230 --> 08:58.983
So let me just paste this in here for you.

09:00.360 --> 09:03.000
So like I said, you create the extension,

09:03.000 --> 09:04.320
you actually need to run that before

09:04.320 --> 09:05.400
or just add the extension.

09:05.400 --> 09:07.650
Oh yeah, I added the extension manually in Supabase.

09:07.650 --> 09:10.380
But you can just run this instead.

09:10.380 --> 09:11.910
You need to create your function.

09:11.910 --> 09:14.580
And now we're gonna actually use the function. All right?

09:14.580 --> 09:17.170
So we wanna be able to take query

09:18.150 --> 09:19.650
and we're just gonna ask,

09:19.650 --> 09:23.673
do we get free unicorn rides?

09:24.660 --> 09:26.340
'Cause that's something the employee

09:26.340 --> 09:29.130
might want to know about their job.

09:29.130 --> 09:30.660
So we have the query embedding

09:30.660 --> 09:35.070
and we have get_vector_embeddings,

09:35.070 --> 09:38.100
and we're just gonna embed the text, right?

09:38.100 --> 09:39.690
So that's gonna give us the embedding,

09:39.690 --> 09:41.070
and then we can just do this.

09:41.070 --> 09:44.973
We can get the results from Supabase.

09:46.497 --> 09:50.220
Use rpc, and this is gonna run our match_items function

09:50.220 --> 09:51.720
that we made.

09:51.720 --> 09:55.173
And we need to pass a few different things there.

09:56.940 --> 09:58.640
Sorry, this should be query_text.

09:59.586 --> 10:01.923
And soryy, this should be equals.

10:03.990 --> 10:05.890
All right, so what do we need to pass?

10:07.500 --> 10:11.223
We need to pass the query_embedding, that's one thing.

10:13.800 --> 10:15.693
So that's what we got there.

10:17.034 --> 10:20.163
We need to pass the match_threshold.

10:20.163 --> 10:22.320
So 0.4, let's just say,

10:22.320 --> 10:25.830
or we can change that, see how it looks in your results.

10:25.830 --> 10:29.880
And that's basically how similar the documents have to be

10:29.880 --> 10:31.830
in order to return.

10:31.830 --> 10:34.683
And then this is how many documents you want back.

10:36.450 --> 10:37.593
We can execute that.

10:39.120 --> 10:40.800
I recommend putting it in the try block

10:40.800 --> 10:42.393
so you can handle any errors.

10:43.230 --> 10:45.660
Yeah, so then we're gonna, if it works,

10:45.660 --> 10:49.440
then we're gonna say for results in results.

10:49.440 --> 10:52.920
It's gonna come back in a data attribute.

10:52.920 --> 10:54.925
And then let's just print.

10:54.925 --> 10:57.243
Gonna print out a bunch of stuff here.

10:58.260 --> 11:00.750
So we're gonna print the similarity score

11:00.750 --> 11:02.220
that's gonna come back

11:02.220 --> 11:04.563
and we're gonna print out the text as well.

11:05.850 --> 11:07.590
And then if there's any issues,

11:07.590 --> 11:10.830
then you can print out the error as well.

11:10.830 --> 11:12.303
So let's see if this works.

11:13.230 --> 11:14.850
There we go. It worked.

11:14.850 --> 11:18.360
So we've got two documents and we asked about unicorn rides.

11:18.360 --> 11:21.750
We can see it says unicorn here, it says unicorn there.

11:21.750 --> 11:24.570
It doesn't have to be the word unicorn, right?

11:24.570 --> 11:26.640
Like, it's not a keyword search.

11:26.640 --> 11:29.700
So you could say, do we get free rides

11:29.700 --> 11:34.700
of any mythical beasts?

11:35.700 --> 11:38.130
And because unicorns are quite often mentioned

11:38.130 --> 11:41.280
with mythical beasts, then yeah, here we go.

11:41.280 --> 11:42.750
Unicorn rides.

11:42.750 --> 11:45.000
Could also say are there any animals at work?

11:46.020 --> 11:48.450
And that didn't get any similarity there

11:48.450 --> 11:51.180
because nothing that was similar enough to animals.

11:51.180 --> 11:56.070
So if we drop that similarity down, then you can see, yeah,

11:56.070 --> 11:58.620
we did get petting zoo, unicorn.

11:58.620 --> 12:00.480
We didn't actually say the word animal there,

12:00.480 --> 12:04.110
but it was close enough, similarity score of 0.3,

12:04.110 --> 12:05.433
that it did come back.

12:06.750 --> 12:08.310
And then here we have mythical creature.

12:08.310 --> 12:09.570
We didn't say the word animals,

12:09.570 --> 12:12.393
but animals is similar to mythical creature.

12:13.770 --> 12:17.130
Cool. Let's just say let's go back to our query.

12:17.130 --> 12:20.673
Do we get free unicorn rides?

12:23.610 --> 12:27.600
Cool, so now we have a way to get the documents

12:27.600 --> 12:30.450
that are relevant and we want our LLM to be able

12:30.450 --> 12:33.750
to search based on those documents.

12:33.750 --> 12:36.300
We want it to answer based on those documents, right?

12:36.300 --> 12:37.590
But let's just say we have...

12:37.590 --> 12:40.290
Let's make a retrieve function.

12:40.290 --> 12:43.217
We're gonna pass in the query_text and match_count.

12:45.480 --> 12:46.950
This all the same stuff.

12:46.950 --> 12:49.260
So we're just gonna paste this in.

12:49.260 --> 12:50.160
It's just gonna go through

12:50.160 --> 12:51.860
and just give you all the results.

12:53.280 --> 12:55.992
So we have our retrieval function,

12:55.992 --> 12:57.153
we have a query,

12:58.260 --> 12:59.970
and then we're gonna create another function

12:59.970 --> 13:02.385
called search_and_chat.

13:02.385 --> 13:03.718
Search_and_chat.

13:04.590 --> 13:06.600
And this is gonna use our retrieval, right?

13:06.600 --> 13:08.743
So we're gonna put in our chat_prompt.

13:10.839 --> 13:13.860
A=2, we're getting two documents back.

13:13.860 --> 13:16.620
We can change that if we wanted more documents.

13:16.620 --> 13:19.713
So let's print out the user query,

13:20.700 --> 13:23.730
and then here's where the good stuff happens, right?

13:23.730 --> 13:27.600
We're gonna retrieve the documents first,

13:27.600 --> 13:30.870
and then pump that into the LLM.

13:30.870 --> 13:35.640
So I'm gonna say let's add some print stuff first actually

13:35.640 --> 13:39.480
just so can have a nice look at what it looks like.

13:39.480 --> 13:40.593
Makes sense to you.

13:41.640 --> 13:44.340
And then we're gonna prompt_with_context.

13:44.340 --> 13:47.070
So let's create our prompt,

13:47.070 --> 13:49.380
and we're gonna put the results

13:49.380 --> 13:52.740
of the vector search into the prompt

13:52.740 --> 13:57.740
so that we know we have context from our documents, right?

13:57.930 --> 14:00.450
So this the whole point of RAG, right?

14:00.450 --> 14:03.960
Be able to pull in information into your documents

14:03.960 --> 14:07.470
so that the LLM can make a better decision.

14:07.470 --> 14:11.673
So you can say, answer the question, put the chat_prompt.

14:12.960 --> 14:16.350
So this just separates it by new lines, right?

14:16.350 --> 14:18.210
And then we just pass in our prompts.

14:18.210 --> 14:21.153
So this just messages equals.

14:22.140 --> 14:24.373
I'm just gonna paste this in.

14:26.550 --> 14:29.640
This is like a standard prompt that I use.

14:29.640 --> 14:31.710
So we're saying please answer the questions provided

14:31.710 --> 14:34.530
by the user using the context available to you.

14:34.530 --> 14:37.050
If you don't answer, say no, I don't know.

14:37.050 --> 14:39.800
And then we're gonna pass in this prompt_with_context.

14:40.800 --> 14:43.980
And then in order to call the LLM, the final part,

14:43.980 --> 14:45.270
the most important part,

14:45.270 --> 14:50.270
response = client.chat.completions.create.

14:52.999 --> 14:56.220
We're gonna pass in the model, let's call this gpt-4.

14:56.220 --> 14:58.173
Oh yeah, we actually have model.

14:59.280 --> 15:02.040
I'm gonna we call it LLM_MODEL.

15:02.040 --> 15:04.413
And then let's pass in the messages.

15:05.460 --> 15:08.250
And that should be everything we need.

15:08.250 --> 15:09.603
So let's run through.

15:10.470 --> 15:12.923
So we're gonna search_and_chat.

15:12.923 --> 15:15.373
I'm gonna say, I'm gonna put that query in there.

15:16.290 --> 15:18.090
Do we get free unicorn rides?

15:18.090 --> 15:20.880
And it's going to do a vector search,

15:20.880 --> 15:23.310
find the documents from our employee handbook

15:23.310 --> 15:25.950
that's most relevant to unicorn rides.

15:25.950 --> 15:29.880
Then it's going to put that into the prompt

15:29.880 --> 15:31.380
and then send that prompt to the LLM,

15:31.380 --> 15:34.140
and then get the response and print it out for us.

15:34.140 --> 15:35.820
So let's see if this works.

15:35.820 --> 15:38.490
User query: do we get free unicorn rides?

15:38.490 --> 15:40.173
It retrieved these documents.

15:41.670 --> 15:46.670
And then, okay, so we've got three documents here.

15:47.460 --> 15:49.710
They all have quite high similarity.

15:49.710 --> 15:53.610
And it says yes, you get free unlimited unicorn rides.

15:53.610 --> 15:57.420
You can see that here you get unlimited unicorn rides.

15:57.420 --> 15:59.190
That was in the third document here.

15:59.190 --> 16:02.010
We actually got the right answer here.

16:02.010 --> 16:04.320
And by providing that context to the LLM,

16:04.320 --> 16:07.200
it knows something that it didn't know before.

16:07.200 --> 16:08.970
So that's the really powerful thing.

16:08.970 --> 16:10.440
Now, with RAG applications,

16:10.440 --> 16:12.090
obviously there's a lot of work to do.

16:12.090 --> 16:13.920
Like you wanna really think

16:13.920 --> 16:16.800
about what documents you put in here,

16:16.800 --> 16:19.560
check whether the search is doing a good job or not.

16:19.560 --> 16:21.870
Are you pulling in the relevant context?

16:21.870 --> 16:25.260
And if so, then is it using the context correctly?

16:25.260 --> 16:27.390
So that those are all different things you can test.

16:27.390 --> 16:30.960
But that is basically how RAG works

16:30.960 --> 16:33.690
with pgvector in Supabase.

16:33.690 --> 16:36.870
Now, you can also do this locally in Postgres as well.

16:36.870 --> 16:38.580
I just use Supabase 'cause I prefer it.

16:38.580 --> 16:41.970
But it does have a local version you can run

16:41.970 --> 16:43.820
without paying anything. It does have a free version.

16:43.820 --> 16:44.970
But if you want to,

16:44.970 --> 16:48.630
you can actually just use this with Postgres locally as well

16:48.630 --> 16:49.890
or post it on the server.

16:49.890 --> 16:51.390
So it's a very flexible,

16:51.390 --> 16:54.480
I would say probably the most flexible and scalable result.

16:54.480 --> 16:56.310
There are really good vector databases

16:56.310 --> 16:59.340
that are specifically optimized for vectors,

16:59.340 --> 17:03.150
Qdrant and Chroma, Pinecone and a bunch of others.

17:03.150 --> 17:04.500
But this is just what I use

17:04.500 --> 17:06.780
'cause I'm already in Postgres anyway

17:06.780 --> 17:09.480
and so it just makes sense for me to just add another table

17:09.480 --> 17:10.950
and give out another column,

17:10.950 --> 17:12.510
which is like the vector column.

17:12.510 --> 17:14.553
So I find that works quite well for me.