WEBVTT

00:00.040 --> 00:06.920
Okay, so just to make this real for you, let me talk through a diagram for the small idea behind RAG:

00:07.200 --> 00:12.160
giving LLMs extra knowledge just by shoving it in the prompt.

00:12.400 --> 00:14.120
So let's say we're in a chat.

00:14.280 --> 00:15.320
This is a chat.

00:15.360 --> 00:17.320
The user is asking us a question.

00:17.320 --> 00:19.800
Maybe they're asking what the ticket price is to London.

00:19.800 --> 00:27.320
So in comes a question, and we have n8n or some other low-code or real-code platform that is

00:27.320 --> 00:32.120
responsible for figuring out how to answer that question, presumably using an LLM.

00:32.440 --> 00:40.400
Well, just before we answer that question, we first check whether we've got anything useful in our database

00:40.400 --> 00:42.400
that here I'm calling our knowledge base.

00:42.480 --> 00:47.720
You may remember that term because we used it briefly before with ElevenLabs, but it's basically just a database

00:47.720 --> 00:49.040
full of useful knowledge.

00:49.360 --> 00:52.600
And maybe what we could do is we could look for the word London.

00:52.600 --> 00:57.760
We could just sort of scan for the word London in our database, collect everything to do with London,

00:57.760 --> 01:00.830
and then we simply shove all of that in the prompt.

01:00.870 --> 01:04.790
The prompt says: the user is asking how much the ticket price is to London.

01:04.830 --> 01:07.470
Here's extra information that might be helpful.

01:07.470 --> 01:11.990
And then everything that has the word London in it in our knowledge base.

01:12.350 --> 01:13.990
And that goes to the LLM.

01:14.310 --> 01:16.910
The LLM predicts the next tokens.

01:16.910 --> 01:19.710
And its output will be consistent with all of this context.

01:19.710 --> 01:25.870
And as a result, the response is likely to answer the question: how much is the ticket price to London?

01:25.950 --> 01:27.790
But of course, you know the problem with this.

01:27.790 --> 01:33.110
The problem is: what if the user doesn't ask how much the ticket price is to London, but they say, how much

01:33.150 --> 01:39.990
does it cost to go to Heathrow Airport? Heathrow Airport is one of the larger airports

01:39.990 --> 01:47.310
in London, and the word Heathrow, say, isn't in our knowledge base as it happens, so we don't find

01:47.350 --> 01:48.150
anything.

01:48.150 --> 01:49.870
So the LLM has no clue.

01:50.270 --> 01:50.990
It's brittle.

01:51.030 --> 01:56.590
This small idea is a good idea, but it's brittle, and it depends on the exact text in the question.

01:56.590 --> 02:02.410
And it depends on whatever code we write to do that retrieving. It's very clunky.
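
NOTE
Editor's sketch of the small idea described above, in Python. It is only an
illustration under stated assumptions: the knowledge base entries, the prices in
them, and the helper names keyword_lookup and build_prompt are all hypothetical
and do not come from the lecture. It shows the keyword scan working for London
and finding nothing for Heathrow.
```python
# Hypothetical knowledge base: these entries are made up for illustration.
KNOWLEDGE_BASE = [
    "A ticket to London costs $799 return.",
    "London flights depart daily at 9am.",
    "A ticket to Paris costs $899 return.",
]
def keyword_lookup(keyword, knowledge_base):
    """Return every entry that contains the keyword, ignoring case."""
    return [entry for entry in knowledge_base if keyword.lower() in entry.lower()]
def build_prompt(question, context):
    """Put the user's question and any retrieved entries into one prompt string."""
    lines = ["The user is asking: " + question]
    if context:
        lines.append("Here's extra information that might be helpful:")
        lines.extend(context)
    return "\n".join(lines)
# The question mentions London, so the scan finds the two London entries.
london_hits = keyword_lookup("London", KNOWLEDGE_BASE)
london_prompt = build_prompt("How much is a ticket to London?", london_hits)
# The word Heathrow appears nowhere in the knowledge base, so nothing is found.
heathrow_hits = keyword_lookup("Heathrow", KNOWLEDGE_BASE)
heathrow_prompt = build_prompt("How much does it cost to go to Heathrow Airport?", heathrow_hits)
print(london_prompt)
print(heathrow_prompt)
```
The second prompt carries no extra information at all, which is the brittleness
in question: the lookup depends on the exact word in the question.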

02:02.650 --> 02:09.850
And so the big idea is all about figuring out how to do a kind of fuzzy lookup in the knowledge base,

02:09.890 --> 02:11.010
a fuzzy search.

02:11.050 --> 02:17.570
It's sometimes called a semantic search, a search for things that are relevant in our knowledge

02:17.570 --> 02:18.050
base.

02:18.210 --> 02:19.930
That's what it's all about.

02:20.330 --> 02:26.250
And that is where we have to take a tiny detour to talk about something called embedding models.

02:26.490 --> 02:27.770
Okay, here we go.

02:27.770 --> 02:28.570
An embedding model.

02:28.610 --> 02:34.210
Now, if this is all sounding technical and too nerdy for you, and you're thinking, is this really going

02:34.210 --> 02:34.530
somewhere?

02:34.530 --> 02:39.010
I just want to assure you, this is very commercially relevant and important.

02:39.010 --> 02:44.130
And in any commercial agent system you're going to build, you're going to make use

02:44.130 --> 02:44.450
of this.

02:44.490 --> 02:46.250
It's going to allow you to do more.

02:46.410 --> 02:50.210
It's going to allow you to accelerate because you've got this expertise.

02:50.330 --> 02:52.130
So hang on in there with me.

02:52.130 --> 02:52.930
It's worth it.

02:53.250 --> 02:53.810
Okay.

02:54.090 --> 03:00.480
An embedding model is a type of LLM, but it's different to the ones we normally work with.

03:00.520 --> 03:02.360
It's sometimes also called an encoder.

03:02.400 --> 03:03.480
An encoding model.

03:03.520 --> 03:04.160
An embedder.

03:04.160 --> 03:05.040
An embedding model.

03:05.080 --> 03:07.800
A vector embedding model. It goes by all those different names.

03:07.800 --> 03:09.120
It's the same thing.

03:09.160 --> 03:10.880
It's a special kind of LLM.

03:11.040 --> 03:17.640
Most LLMs that we work with take text, and they predict the most likely things to come after that text.

03:17.680 --> 03:18.800
That's what they're trained to do.

03:18.840 --> 03:20.480
That's the job of an LLM.

03:20.520 --> 03:23.760
But there is this different kind called an embedding model.

03:23.760 --> 03:31.600
And its job is to take an input of text and, rather than output what should come next, output a bunch

03:31.600 --> 03:38.080
of numbers, just come up with some numbers that represent the meaning of that text.

03:38.240 --> 03:41.680
A bunch of numbers to reflect the meaning of it.

03:41.720 --> 03:47.040
It turns language into numbers where those numbers represent the meaning of the text.

03:47.320 --> 03:49.280
It's not the same as tokens.

03:49.280 --> 03:54.480
The text always gets turned into tokens before it goes into any of these models.

03:54.480 --> 03:59.980
But the thing about these numbers is that, as I will explain in a second, they don't

03:59.980 --> 04:05.820
just represent the words, they represent the meaning of the whole bunch of text together.

04:05.860 --> 04:06.220
Okay.

04:06.260 --> 04:08.940
So let's say more about this.

04:08.940 --> 04:11.820
So the embedding model, its input is some text.

04:11.860 --> 04:14.580
The output is a list of numbers.

04:14.940 --> 04:19.620
And when we see a list of numbers we sometimes call it a vector.

04:19.780 --> 04:25.220
And it's a sort of nod to the fact that if there were three numbers, if it output three numbers,

04:25.220 --> 04:31.380
you could think of them as coordinates in space, like an x and a y and a z coordinate,

04:31.420 --> 04:34.500
of some point in space, like right here.

04:34.660 --> 04:37.540
And if it were three numbers, it could be that.

04:37.820 --> 04:41.540
Now it's typically not three numbers, it's more like a thousand numbers.

04:41.740 --> 04:50.340
And you could call that, and we often do, a point in a thousand-dimensional space.

04:50.740 --> 04:53.740
Now that sounds very sci-fi and very nerdy.

04:53.780 --> 04:57.340
And it's actually not as complicated as it sounds.

04:57.340 --> 05:02.770
It just means there are a thousand numbers that describe this thing, and if there were three, it would

05:02.770 --> 05:03.970
be a point in space.

05:03.970 --> 05:05.130
It's a thousand numbers.

05:05.130 --> 05:11.650
So a point in a thousand-dimensional space is what is represented by a thousand numbers.

05:11.650 --> 05:15.850
But you can get fancy and call it a point in a thousand dimensional space.

05:16.050 --> 05:21.410
Or sometimes people say a point in multidimensional space, or you can just say, hey, it's a bunch

05:21.410 --> 05:24.490
of numbers that represent the meaning of this text.

05:24.490 --> 05:25.930
So why is any of this important?

05:25.930 --> 05:31.210
Well, to explain that, just for a moment, let's pretend that we're always working in

05:31.210 --> 05:33.090
three dimensional space.

05:33.090 --> 05:34.810
There's always just three numbers.

05:34.850 --> 05:39.290
This model produces three numbers for each input text.

05:39.330 --> 05:40.970
It gets different inputs.

05:40.970 --> 05:42.210
Three numbers come out.

05:42.450 --> 05:43.570
Here's the thing.

05:43.850 --> 05:51.970
If you give it two paragraphs of text which have a similar meaning, then they will end up generating

05:51.970 --> 05:55.810
points, XYZ numbers that are close to each other.

05:56.090 --> 05:59.510
And what I mean by that is, if you were to look at them in space here,

05:59.510 --> 06:01.670
and these are all the different paragraphs of text,

06:01.710 --> 06:06.830
then the points that are closest together would represent text which has a similar meaning.

06:07.110 --> 06:08.270
That's what it is.

06:08.270 --> 06:12.590
So points close to each other have a similar meaning in this three dimensional world.

06:12.750 --> 06:16.830
And of course, the experts here will know that it's not exactly being close to each other.

06:16.830 --> 06:19.310
It's a calculation called cosine similarity.

06:19.310 --> 06:22.150
But that's a detail that we don't need to worry about right now.
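
NOTE
Editor's sketch, for anyone who does want the detail: cosine similarity measures
the angle between two vectors, giving 1 when they point the same way and 0 when
they are perpendicular. The function below is a minimal plain-Python version,
and the two pairs of 3D vectors are made up for illustration; they are not the
output of any real embedding model.
```python
import math
def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
# Two vectors pointing the same way: similarity is 1, up to floating-point rounding.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Two perpendicular vectors: similarity is 0.
perpendicular = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, perpendicular)
```
The same function works unchanged on a thousand numbers, because it only loops
over the pairs of values in the two lists.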

06:22.430 --> 06:27.830
We can just say points that are close to each other represent texts with similar meaning, and that

06:27.830 --> 06:33.350
applies in 3D space, but it also applies in a thousand dimensional space.

06:33.350 --> 06:40.350
You have a thousand numbers and you do the same calculation to say, how similar are these two vectors

06:40.350 --> 06:42.470
of numbers, these two lists of numbers.

06:42.470 --> 06:46.870
And that tells you how close the meanings of the inputs are.

06:47.230 --> 06:52.270
And when I say that the meaning is similar, I'm not saying that it has the same words in it.

06:52.270 --> 06:56.110
I'm literally saying that they represent things with a similar meaning.

06:56.110 --> 07:02.900
So, how much does it cost to go to London would have a similar meaning to what's the ticket price

07:02.900 --> 07:06.340
to Heathrow, even though the words are completely different.

07:06.340 --> 07:12.220
But the model has been trained so that it recognizes the underlying meaning, and those two things end

07:12.220 --> 07:16.780
up in a similar place in 3D space or in a thousand dimensional space.

07:16.780 --> 07:19.140
They end up being close to each other.

07:19.260 --> 07:20.500
That is the point.

07:20.500 --> 07:24.820
That is what an embedding model is able to do, and it's really good at it.

07:24.820 --> 07:27.060
And if you're wondering, how is it so good at it?

07:27.100 --> 07:30.900
It's good at it because it's been trained with lots and lots of data.

07:30.940 --> 07:35.740
It's been trained with tons of data, tons of things, and using clever tricks to give it information

07:35.740 --> 07:36.660
that's similar.

07:36.660 --> 07:43.900
And it's learned over time with lots of data how to associate similar things with similar points in

07:43.900 --> 07:44.500
space.

07:44.820 --> 07:47.980
Okay, so at this point I want you just to take my word for it.

07:48.100 --> 07:50.060
Let's assume that you believe me.

07:50.060 --> 07:54.460
There are these things, this type of LLM called an encoder or an embedding model.

07:54.460 --> 08:00.160
It's able to take text and give you a set of numbers that you can think of as a vector, such that if you

08:00.160 --> 08:04.280
find out how close it is to another vector, that will tell you how similar the meanings are.

08:04.600 --> 08:05.480
So what?

08:05.520 --> 08:07.120
Well, what can we do with this?

08:07.520 --> 08:08.000
Okay.

08:08.120 --> 08:13.240
Well, what we can do with this is something that people like to call a semantic search.

08:13.440 --> 08:20.840
It allows us to do a sort of fuzzy search over a big set of data to find

08:20.880 --> 08:25.240
stuff that's similar to the meaning of some text.

08:25.760 --> 08:27.520
And why is that helpful?

08:27.600 --> 08:32.440
Well, that's helpful if someone's got a question like: how much is the ticket price to London?

08:32.520 --> 08:40.520
This gives us a trick to look across tons of data and pluck out the data that might be relevant to that

08:40.520 --> 08:47.280
question, whether it's ticket prices to London or the cost to travel to Heathrow or whatever it happens

08:47.280 --> 08:47.800
to be.

08:47.840 --> 08:49.720
It's going to have a similar meaning.

08:49.720 --> 08:57.640
So that data is going to be given a vector which will be close to the vector of the question, and that

08:57.640 --> 08:58.880
is the big idea.
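
NOTE
Editor's sketch of the big idea, in Python. A real system would get its vectors
from an embedding model; here the three-number vectors are hand-picked toy
values, and the entry texts and the helper name semantic_search are hypothetical
too. The sketch only shows the ranking step: score every entry against the
question vector and keep the closest ones.
```python
import math
def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
# Hand-picked toy vectors standing in for embedding model output (hypothetical).
KNOWLEDGE_BASE = {
    "Ticket prices to London": [0.9, 0.1, 0.0],
    "The cost to travel to Heathrow": [0.8, 0.2, 0.1],
    "Our baggage allowance policy": [0.1, 0.9, 0.2],
}
def semantic_search(query_vector, knowledge_base, top_k=2):
    """Return the top_k entry texts whose vectors are most similar to the query."""
    ranked = sorted(
        knowledge_base,
        key=lambda text: cosine_similarity(query_vector, knowledge_base[text]),
        reverse=True,
    )
    return ranked[:top_k]
# Pretend this is the vector for the question about ticket prices to London.
question_vector = [0.85, 0.15, 0.05]
results = semantic_search(question_vector, KNOWLEDGE_BASE)
print(results)
```
With these toy values the London and Heathrow entries rank above the baggage
entry, even though the Heathrow entry never contains the word London.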