WEBVTT

00:00.030 --> 00:01.020
-: Okay, let's talk about

00:01.020 --> 00:04.230
how to do vector databases with Pinecone.

00:04.230 --> 00:06.370
So we're gonna install Pinecone OpenAI

00:07.620 --> 00:10.499
and then also this datasets library

00:10.499 --> 00:13.470
because we're gonna load datasets from there.

00:13.470 --> 00:14.880
And this is, basically just following

00:14.880 --> 00:17.970
the typical Pinecone tutorial.

00:17.970 --> 00:20.370
So I'm just gonna really explain

00:20.370 --> 00:21.770
how the code works as we go.

00:23.243 --> 00:24.510
So that's installing that.

00:24.510 --> 00:27.720
We also want to get an OpenAI API key

00:27.720 --> 00:32.720
and gonna need a Pinecone one as well.

00:32.730 --> 00:33.563
Some point.

00:33.563 --> 00:36.270
So run that when that's ready.

00:36.270 --> 00:37.420
Gonna run this as well.

00:39.240 --> 00:40.073
Okay.

00:41.250 --> 00:42.540
All right, so, once you run that,

00:42.540 --> 00:45.040
then you paste the API key up here

00:46.110 --> 00:48.063
and we've got this, okay.

00:48.960 --> 00:52.140
All right, then we're going to just make sure

00:52.140 --> 00:54.570
that OpenAI is working.

00:54.570 --> 00:57.600
Here's a query I created earlier.

00:57.600 --> 01:00.600
So who's the 12th person on the moon and when did they land?

01:03.000 --> 01:04.750
So we're just gonna get a response.

01:06.330 --> 01:09.123
Chat, completions, create.

01:11.190 --> 01:12.810
That's how you get a response.

01:12.810 --> 01:17.313
I'm just gonna say the model and the messages.

01:19.500 --> 01:23.313
And then, the messages would be,

01:25.890 --> 01:30.890
say the user role of the query in here.

01:33.660 --> 01:35.700
Right, so that should be everything.

01:35.700 --> 01:38.493
And then, when we get the response,

01:40.825 --> 01:43.620
you need to get the first thing on the list

01:45.750 --> 01:48.693
and it's message content.

01:50.040 --> 01:50.873
All right.

01:53.530 --> 01:55.140
Okay, so the 12th person on the,

01:55.140 --> 01:57.723
walk on the moon was astronaut Gene Cernan.

02:02.070 --> 02:03.303
Or Harrison Schmitt.

02:07.770 --> 02:09.123
Oh, Harrison Schmitt again.

02:11.400 --> 02:12.423
Okay, yeah, so-

02:14.760 --> 02:17.850
Yeah, you can see that it's not consistent

02:17.850 --> 02:19.830
and that's because it didn't have context.

02:19.830 --> 02:22.110
And so it's loosening a little bit

02:22.110 --> 02:23.669
and now it's saying the 13th person

02:23.669 --> 02:25.419
was Harrison Schmitt.

02:27.465 --> 02:29.793
And then I actually said who was the 15th.

02:32.100 --> 02:33.600
So, do you see what I mean?

02:33.600 --> 02:35.433
LOMs tend to hallucinate.

02:36.420 --> 02:40.920
So, if you don't have context, that can cause a big problem.

02:40.920 --> 02:43.440
And what you want to be able to do is search for context.

02:43.440 --> 02:45.540
So let's take, let's take like,

02:45.540 --> 02:50.070
a more realistic query that you might want to actually ask.

02:50.070 --> 02:51.300
I'm just gonna paste this in here.

02:51.300 --> 02:54.810
I've just created a function to able to call OpenAI,

02:54.810 --> 02:56.760
just complete function,

02:56.760 --> 02:59.100
and here's like a question we might have.

02:59.100 --> 03:00.330
What training methods shall you use

03:00.330 --> 03:02.070
for sentence transformers when I only have

03:02.070 --> 03:03.480
a pair of related sentences?

03:03.480 --> 03:05.940
So this might be a user query

03:05.940 --> 03:09.420
if they're asking your documentation and therefore,

03:09.420 --> 03:11.850
if you can provide a good response to that,

03:11.850 --> 03:14.250
then yeah, then that could be a good thing.

03:14.250 --> 03:16.983
So this is the example that you can use.

03:18.030 --> 03:22.950
Now, we need to get an embedding model

03:22.950 --> 03:26.010
in order to get context to put in the prompt.

03:26.010 --> 03:30.180
So just gonna, this is a different type of model

03:30.180 --> 03:34.470
from the normal OpenAI that you might be calling.

03:34.470 --> 03:36.150
I'm gonna show you what this looks like.

03:36.150 --> 03:38.993
So we get embeddings.

03:49.590 --> 03:50.823
We just need to put in.

03:52.260 --> 03:54.640
Okay, so we're gonna send in some texts

03:55.650 --> 03:58.200
so that we're sending in two different texts there.

04:00.450 --> 04:04.120
And then we're gonna give embed model and get the data.

04:08.670 --> 04:10.920
So what that gives us, is this embedding.

04:10.920 --> 04:11.940
Actually two of them, right?

04:11.940 --> 04:14.520
One for each sentence that we put in.

04:14.520 --> 04:19.520
And this is just a big list of numbers, 1,536 numbers.

04:19.830 --> 04:22.290
And what that allows you to do,

04:22.290 --> 04:24.450
is to search based on similarities.

04:24.450 --> 04:26.850
So if the numbers are close together, then you know that,

04:26.850 --> 04:29.850
that topic is similar to what you're searching for.

04:29.850 --> 04:32.790
We've created two vectors here, so you can see that.

04:32.790 --> 04:35.040
Yeah, two, list of two.

04:35.040 --> 04:36.810
And just to kinda show you

04:36.810 --> 04:41.640
that the length of embedding is 1,536 numbers.

04:41.640 --> 04:43.440
So there are different types of embedding,

04:43.440 --> 04:45.000
different providers.

04:45.000 --> 04:48.000
We, I can also kinda show you what the embedding looks like.

04:49.230 --> 04:50.680
So if you look at that there,

04:52.170 --> 04:54.663
then you can see this big list of numbers.

04:55.920 --> 04:58.140
All right, so now we have our embedding,

04:58.140 --> 05:00.123
we want to load the data set.

05:05.610 --> 05:08.340
And this is what the data looks like.

05:08.340 --> 05:12.510
So, it is just little transcripts of these YouTube videos.

05:12.510 --> 05:13.897
And you can see this is like

05:13.897 --> 05:16.200
"Training and Testing an Italian BERT-

05:16.200 --> 05:17.940
Transformers from Scratch."

05:17.940 --> 05:22.500
So what, this is, again, from the Python tutorial,

05:22.500 --> 05:25.020
what we're doing, is we're gonna load

05:25.020 --> 05:27.900
all of those transcripts,

05:27.900 --> 05:29.550
we're gonna join them together, right?

05:29.550 --> 05:32.460
So in this case, you can see that this only

05:32.460 --> 05:33.660
has like a little bit of text

05:33.660 --> 05:37.350
and it starts at zero seconds, ends at nine seconds.

05:37.350 --> 05:39.600
We're gonna join them together, based on two things.

05:39.600 --> 05:44.027
One is the number of sentences to combine, right?

05:45.540 --> 05:47.220
And then the number of sentences

05:47.220 --> 05:49.290
to stride over to create overlap.

05:49.290 --> 05:51.510
What that means is, we're gonna

05:51.510 --> 05:53.850
combine 20 sentences together,

05:53.850 --> 05:56.490
but there's gonna be an overlap of four sentences

05:56.490 --> 05:58.410
between that and the next sentence.

05:58.410 --> 06:02.130
So sentences one through twenty would be in the first chunk

06:02.130 --> 06:06.240
and then sentences 16 through onwards

06:06.240 --> 06:08.053
for the next 20, as to 36.

06:09.114 --> 06:12.090
That's gonna have some overlap between the two.

06:12.090 --> 06:13.080
The reason why you do that

06:13.080 --> 06:15.630
is so that you don't cut off in the middle of a sentence

06:15.630 --> 06:18.570
and we're just gonna append the data there.

06:18.570 --> 06:20.637
And then you'll see, you can see like the,

06:20.637 --> 06:24.540
the text is a lot longer now, right?

06:24.540 --> 06:26.940
So this is like a much bigger chunk of text

06:26.940 --> 06:28.410
and that means it's much more likely

06:28.410 --> 06:31.890
that the answer will be in one of these chunks of texts.

06:31.890 --> 06:34.650
You need to also set up your Pinecone API key.

06:34.650 --> 06:39.300
So you can get that from going into Pinecone, API Keys,

06:39.300 --> 06:42.180
and then this is the code to create an index.

06:42.180 --> 06:45.660
When you run that, then it's gonna, I'm not gonna run it

06:45.660 --> 06:48.120
because I've already created this index,

06:48.120 --> 06:51.289
but it will create an index like this in Pinecone.

06:51.289 --> 06:53.220
So you see, you've got employee handbook

06:53.220 --> 06:57.450
and then you've got, it just kind of sets this up for you.

06:57.450 --> 06:59.490
And you can see I've got the data in here already.

06:59.490 --> 07:01.860
These are the different texts, right?

07:01.860 --> 07:04.830
You can see there and it's got a title, URL,

07:04.830 --> 07:06.990
when it was published, et cetera.

07:06.990 --> 07:09.510
But the way that you load that stuff in,

07:09.510 --> 07:13.230
you run this PC index and describe it, right?

07:13.230 --> 07:15.480
So that, that's all the kind of setup.

07:15.480 --> 07:17.640
Then this takes a long time to run.

07:17.640 --> 07:20.160
So we're not gonna run it for the tutorial,

07:20.160 --> 07:22.170
but this is the batch size,

07:22.170 --> 07:25.170
this is how many embeddings we create and insert at once.

07:25.170 --> 07:27.780
So we're doing a hundred inserts at once,

07:27.780 --> 07:30.780
and it's gonna run through, based on the batch size,

07:30.780 --> 07:33.270
and it's gonna find the text that we need.

07:33.270 --> 07:37.170
It's gonna basically, join them together to get the text.

07:37.170 --> 07:39.450
It's gonna create the embeddings

07:39.450 --> 07:41.910
and it's gonna sleep for five seconds

07:41.910 --> 07:46.230
and then it's gonna put those embeddings into the vectors

07:46.230 --> 07:49.920
and then insert them or upsert them into Pinecone.

07:49.920 --> 07:52.410
You know, tons and tons of these sentences, right?

07:52.410 --> 07:53.700
That's the specific thing you're doing

07:53.700 --> 07:55.440
with this vector database.

07:55.440 --> 07:58.170
Then once you have that, you can search,

07:58.170 --> 08:01.590
so this is how you do a search in Pinecone.

08:01.590 --> 08:03.000
You create the embeddings

08:03.000 --> 08:05.550
for the new thing that you're searching.

08:05.550 --> 08:07.920
So I've just done that through OpenAI again,

08:07.920 --> 08:09.750
and you can see I've just got the query.

08:09.750 --> 08:12.450
And then you just query based on the namespace.

08:12.450 --> 08:15.513
So this is, you know, this is something you get from here,

08:17.430 --> 08:20.400
you can see that here, namespace,

08:20.400 --> 08:22.710
and then you pass in the vector

08:22.710 --> 08:24.960
and then you pass how many results you need.

08:24.960 --> 08:26.160
So in this case, I've got two,

08:26.160 --> 08:28.170
but you know, you can get more.

08:28.170 --> 08:29.880
And you can see that this is actually

08:29.880 --> 08:30.780
giving you a match score,

08:30.780 --> 08:33.000
it tells you how close it is to match,

08:33.000 --> 08:35.253
but also the text as well.

08:36.600 --> 08:39.930
And then with that, as with all RAG,

08:39.930 --> 08:42.000
you wanna be able to stuff that into a prompt

08:42.000 --> 08:44.940
and then let the LLM answer.

08:44.940 --> 08:47.430
So for example, we had information about a moon landings

08:47.430 --> 08:48.960
and then we'd be able to avoid

08:48.960 --> 08:50.100
the issue that we saw earlier.

08:50.100 --> 08:51.690
In this case, we're trying to avoid the issue

08:51.690 --> 08:53.610
of not knowing how to do

08:53.610 --> 08:55.830
a specific type of machine learning, right?

08:55.830 --> 08:59.610
So we're creating the embedding,

08:59.610 --> 09:03.420
then we're passing that into Pinecone, querying,

09:03.420 --> 09:05.040
we're getting the context,

09:05.040 --> 09:06.720
and then we put them into the prompt,

09:06.720 --> 09:11.190
so we say, here's the context, and then we put that in here.

09:11.190 --> 09:13.797
So let's kind of join all those things together,

09:13.797 --> 09:15.330
put the end of the prompt,

09:15.330 --> 09:17.370
and then, if it exceeds the limit,

09:17.370 --> 09:19.353
then reduce the context one by one.

09:20.280 --> 09:22.590
So this is just like a smart thing to do.

09:22.590 --> 09:24.870
If you end up getting too many documents,

09:24.870 --> 09:27.150
you don't want to get past a certain size

09:27.150 --> 09:29.040
of your prompt, then you can do that.

09:29.040 --> 09:30.900
This is less important to do nowadays,

09:30.900 --> 09:32.820
where the prompts have, you know,

09:32.820 --> 09:34.260
quite a lot of context window,

09:34.260 --> 09:36.570
but it used to be really important to be able to do that.

09:36.570 --> 09:38.760
And then when you query that, you're gonna retrieve it

09:38.760 --> 09:40.470
and then you're gonna be able to print that

09:40.470 --> 09:41.705
and what you get, is a much better answer, so you can see,

09:41.705 --> 09:46.440
"If you only have pairs of related sentences,

09:46.440 --> 09:47.730
you could go ahead and try training

09:47.730 --> 09:49.500
or fine tuning your sentence transformer

09:49.500 --> 09:51.390
using Natural Language Inference

09:51.390 --> 09:53.610
with multiple negative ranking loss."

09:53.610 --> 09:58.610
So this answer is from the actual context that we got.

09:58.890 --> 10:01.620
So you can see, "Pairs of related sentences

10:01.620 --> 10:03.570
you can go ahead and try training or fine tuning

10:03.570 --> 10:06.450
using LLI with multiple ranking loss."

10:06.450 --> 10:09.030
So we actually pulled that from the context and therefore,

10:09.030 --> 10:13.530
the LLM was able to answer and we didn't have to

10:13.530 --> 10:15.480
match the search query exactly, right?

10:15.480 --> 10:19.500
Like, search query is, if we look back up here,

10:19.500 --> 10:22.357
search query we're trying to answer is,

10:22.357 --> 10:24.330
"Which training methods should I use

10:24.330 --> 10:26.070
for sentence transformers when I only have

10:26.070 --> 10:27.810
a pair of related sentences?"

10:27.810 --> 10:31.920
So, it's not actually a direct keyword match,

10:31.920 --> 10:34.560
and that's what's really powerful about vector search,

10:34.560 --> 10:36.510
is that you just have to get it partially right

10:36.510 --> 10:39.570
and it's gonna tell you what things in your database

10:39.570 --> 10:41.610
are close enough, and then that's enough

10:41.610 --> 10:43.650
to give you the response.

10:43.650 --> 10:47.640
Cool, so that's how Vector search works with Pinecone.

10:47.640 --> 10:49.650
There are lots of different options.

10:49.650 --> 10:52.170
I recommend using something in production

10:52.170 --> 10:55.110
that's not just local, like Face, for example,

10:55.110 --> 10:58.110
is an open source library that you can use locally,

10:58.110 --> 10:59.940
but it's only really good for prototyping

10:59.940 --> 11:01.320
and if you wanna go to production,

11:01.320 --> 11:03.240
I would use something like Pinecone

11:03.240 --> 11:05.970
or you can use Supabase, which we also have a lecture on.

11:05.970 --> 11:09.300
Yeah, hopefully that introduces you to the topic.

11:09.300 --> 11:11.070
All of the Vector database is the same.

11:11.070 --> 11:13.170
So, which in terms of what they do,

11:13.170 --> 11:15.960
they just have different optimization algorithms.

11:15.960 --> 11:18.000
Some are faster, some are cheaper.

11:18.000 --> 11:21.060
So it is really important for you to check the

11:21.060 --> 11:22.890
up to date details, but like,

11:22.890 --> 11:24.510
the actual concept itself

11:24.510 --> 11:26.280
hasn't really changed since they came out.

11:26.280 --> 11:28.130
So, hopefully, that's useful for you.