WEBVTT

00:00.120 --> 00:02.680
And so of course, this is the moment I bring it together.

00:02.680 --> 00:08.200
I give you the new diagram, and I hope everything will now click into place for you.

00:08.240 --> 00:10.920
We've got a user who's chatting with us.

00:10.920 --> 00:16.240
They ask a question; maybe they ask, how much does it cost to travel to Heathrow?

00:16.440 --> 00:17.560
Something like that.

00:17.600 --> 00:23.600
That isn't going to directly match anything in our database. The question comes to our code, whether that's our own software

00:23.600 --> 00:25.160
or n8n.

00:25.720 --> 00:26.240
Okay.

00:26.280 --> 00:27.360
What are we going to do?

00:27.400 --> 00:33.760
The first thing we're going to do is take that question, and we're going to call an embedding model

00:33.760 --> 00:38.960
and say, can you turn this question into a set of numbers, into a vector?

00:39.200 --> 00:43.440
Sometimes people call this vectorizing, which again sounds very sci-fi.

00:43.880 --> 00:50.040
So we're going to call an embedding model with the question and have it turn it into a set of numbers,

00:50.040 --> 00:51.040
a vector.
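
A minimal sketch of this step in Python. A real system would send the question to an embedding model, usually through a provider's API; the `toy_embed` function here is a hypothetical stand-in that hashes each word into one of 64 buckets, so it only captures word overlap, not meaning. It just shows the shape of the step: text in, a fixed-length list of numbers out.

```python
import hashlib
import math
import re


def toy_embed(text: str, dims: int = 64) -> list[float]:
    """Turn text into a fixed-length, unit-length vector.

    Hypothetical stand-in for a real embedding model: each word is hashed
    into one of `dims` buckets, so similar wording gives similar vectors,
    but no actual meaning is captured.
    """
    vector = [0.0] * dims
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Stable hash (Python's built-in hash() is randomized per process).
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dims
        vector[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vector))
    # Scale to unit length so vectors can be compared by direction.
    return [x / norm for x in vector] if norm else vector


question = "How much does it cost to travel to Heathrow?"
vec = toy_embed(question)
print(len(vec))  # 64
```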

00:51.480 --> 00:52.080
Okay.

00:52.440 --> 00:54.280
And then, I think you know what comes next.

00:54.320 --> 01:00.200
We then look in our database and we find information that might be relevant.

01:00.200 --> 01:01.400
And how do we do that?

01:01.440 --> 01:05.360
We've taken all of our database and we've turned everything into vectors.

01:05.360 --> 01:07.940
We've got vector versions of everything.

01:07.940 --> 01:11.300
And we say, here's the question: how much does it cost to go to Heathrow?

01:11.580 --> 01:12.300
Which things

00:12.340 --> 00:18.660
in our database have vectors that are close to this one? Those are likely to be relevant.

01:18.660 --> 01:19.660
It might not be.

01:19.700 --> 01:21.980
It might be about something different to do with Heathrow.

01:22.420 --> 01:24.900
It might not be to do with ticket prices at all.

01:24.940 --> 01:27.940
But hopefully some data will be relevant.

01:27.940 --> 01:33.700
So let's gather the things that are within a certain distance, or maybe the ten closest things,

00:33.740 --> 00:36.900
take them, and hopefully those are the relevant ones.

01:36.900 --> 01:42.100
And when you find them, you look up the original content, the text associated with them, not the

01:42.100 --> 01:45.940
vectors, but the text that was turned into those vectors in the first place.
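
Here's a minimal sketch of that lookup, assuming the "database" is just an in-memory list of `(text, vector)` pairs and using cosine similarity as the closeness measure (one common choice, assumed here). The hypothetical `retrieve` function supports both options mentioned above: the `k` closest chunks, or only those above a similarity cut-off. The chunk texts and three-number vectors are made up for illustration.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    k: int = 10,
    min_score: Optional[float] = None,
) -> list[str]:
    """Return the original text of the chunks closest to query_vec.

    k:         keep at most this many chunks ("the ten closest things")
    min_score: optional similarity cut-off ("within a certain distance")
    """
    # Brute-force scan: score every stored vector against the query.
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if min_score is not None:
        scored = [pair for pair in scored if pair[0] >= min_score]
    # Hand back the text that produced the vectors, not the vectors themselves.
    return [text for _, text in scored[:k]]


# Made-up chunks with hand-picked vectors, purely for illustration.
chunks = [
    ("Chunk about fares to Heathrow", [0.9, 0.1, 0.0]),
    ("Chunk about Heathrow terminals", [0.6, 0.8, 0.0]),
    ("Chunk about the office holiday policy", [0.0, 0.0, 1.0]),
]
query = [1.0, 0.0, 0.0]  # pretend this is the embedded question

print(retrieve(query, chunks, k=2))
# ['Chunk about fares to Heathrow', 'Chunk about Heathrow terminals']
print(retrieve(query, chunks, min_score=0.7))
# ['Chunk about fares to Heathrow']
```

Real vector databases avoid scoring every stored vector by using approximate nearest-neighbour indexes; the linear scan here is only for clarity.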

01:46.180 --> 01:49.380
And you take all of that text together and what do you do with it?

01:49.380 --> 01:54.300
You shove it into the prompt. You say: the user is asking, how much does it cost to go to Heathrow?

01:54.420 --> 01:59.660
Here's some content that might be relevant, and then lots of text that you've plucked out because it

01:59.660 --> 02:01.900
was close in vector space.
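
A sketch of that prompt assembly. The wording loosely follows the description above; the exact template and the `build_prompt` name are assumptions, and real systems vary a lot here. The point is that the result is a single plain string.

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a plain-text prompt from the user's question and retrieved text."""
    context = "\n\n".join(context_chunks)
    return (
        f"The user is asking: {question}\n\n"
        "Here is some content that might be relevant:\n\n"
        f"{context}\n\n"
        "Use the content above where it helps to answer the question."
    )


prompt = build_prompt(
    "How much does it cost to go to Heathrow?",
    ["Chunk about fares to Heathrow", "Chunk about Heathrow terminals"],
)
print(prompt)
```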

02:02.220 --> 02:05.980
And the LLM gives back an answer and it's the right answer and you're happy.

02:06.140 --> 02:08.500
And that is the big idea behind RAG.

02:08.540 --> 02:12.900
And at this point, I hope I've demystified it for you.

02:12.900 --> 02:14.200
It all connects.

02:14.200 --> 02:19.080
You don't need to know all of the gory details, but it's good to get the general idea. And to say more

00:19.080 --> 00:19.440
about it,

00:19.440 --> 00:21.240
and I'm probably telling you things you already know:

00:21.480 --> 00:23.840
the database at the bottom there,

00:23.840 --> 00:29.760
if it's a database which allows you to look up information based on its vector, and it's able to quickly

02:29.880 --> 02:34.600
query for things close to a vector, then it's often known as a vector database.

02:34.600 --> 02:39.920
And back in the day, there used to be just a small number of highly specialized databases that were

02:39.920 --> 02:40.880
good at this.

02:40.880 --> 02:48.360
But nowadays it's so popular that most of the mainstream databases support having vectors in them for

02:48.400 --> 02:49.560
looking up data.

02:49.560 --> 02:56.680
So databases like Postgres (with the pgvector extension) and MongoDB allow you to do this: to store vectors

00:56.680 --> 03:01.000
and to query based on vector similarity, as it's called.
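
For reference, one common way to score vector similarity is cosine similarity. This is shown as an example, not as what any particular database uses by default (many also offer Euclidean distance or dot product). For a query vector $\mathbf{q}$ and a stored vector $\mathbf{d}$, each with $n$ numbers:

```latex
\operatorname{sim}(\mathbf{q}, \mathbf{d})
  = \cos\theta
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{n} q_i d_i}
         {\sqrt{\sum_{i=1}^{n} q_i^{2}} \; \sqrt{\sum_{i=1}^{n} d_i^{2}}}
```

Scores near 1 mean the two vectors point in nearly the same direction, which is what "close in vector space" means under this measure.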

03:01.200 --> 03:02.640
So that's one thing.

03:02.680 --> 03:09.000
Another thing to mention: some people get confused about this whole vector thing, and

03:09.000 --> 03:15.920
they think that the information that ends up getting sent to the LLM on the right, that prompt, contains

03:15.920 --> 03:17.400
some vectors in there.

03:17.400 --> 03:20.700
So it's like: what are ticket prices to London? By the way,

00:20.740 --> 00:22.460
here's the vector,

00:22.460 --> 00:24.500
and here are vectors of similar information,

00:24.500 --> 00:25.900
and then lots of numbers.

03:25.900 --> 03:28.940
But that LLM on the right doesn't know anything about these vectors.

03:28.940 --> 03:32.940
That LLM on the right just wants a prompt, and it wants to predict the next tokens.

03:32.940 --> 03:35.140
It doesn't want you to send it a bunch of numbers.

03:35.380 --> 03:37.940
It's just expecting that you've

00:37.940 --> 00:40.140
selected relevant context.

03:40.140 --> 03:44.180
And it doesn't know that you used an embedding model and all this stuff in the middle.

03:44.460 --> 03:45.780
It's completely unaware of that.

03:45.780 --> 03:49.380
It just knows you've got a prompt with good relevant context.

03:49.500 --> 03:53.340
And so the embedding model has nothing to do with that

03:53.380 --> 03:54.260
LLM on the right.

03:54.300 --> 03:56.820
They are potentially completely unrelated.

03:56.860 --> 04:00.620
They could be from the same family; they could both be made by Google.

04:00.620 --> 04:01.540
But they don't need to be.

04:01.540 --> 04:03.380
They could be completely different.

04:03.420 --> 04:11.380
The only role of that embedding model is to help with this whole fuzzy lookup, this semantic search,

04:11.420 --> 04:17.620
a search for data based on its meaning, by turning everything into vectors and finding similar vectors

04:17.620 --> 04:18.740
in my knowledge base.

04:18.740 --> 04:21.620
But once we've done that, then vectors are forgotten.

04:21.620 --> 04:24.060
What goes to the LLM is text.

04:24.100 --> 04:27.290
It's actual information, relevant content.
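
A minimal sketch of that separation, with each component passed in as a plain function. The names `answer_with_rag`, `embed`, `search` and `generate` are hypothetical, not any library's API; in practice `embed` and `generate` would wrap calls to two possibly unrelated models, perhaps from different providers. Vectors only flow between `embed` and `search`; `generate` receives nothing but a string.

```python
from typing import Callable

Vector = list[float]


def answer_with_rag(
    question: str,
    embed: Callable[[str], Vector],
    search: Callable[[Vector, int], list[str]],
    generate: Callable[[str], str],
    k: int = 10,
) -> str:
    """Run one RAG turn with independently swappable components.

    embed:    the embedding model (text -> vector)
    search:   the vector lookup (vector -> original text chunks)
    generate: the chat LLM (prompt text -> answer text)
    """
    query_vec = embed(question)    # vectors are used here...
    chunks = search(query_vec, k)  # ...and here, only to find relevant text
    # From here on the vectors are gone: the LLM sees only a string.
    prompt = (
        f"The user is asking: {question}\n\n"
        "Here is some content that might be relevant:\n"
        + "\n".join(chunks)
    )
    return generate(prompt)


# Stub components, purely for illustration.
seen_prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return "stub answer"


result = answer_with_rag(
    "How much does it cost to go to Heathrow?",
    embed=lambda text: [0.1, 0.2, 0.3],
    search=lambda vec, k: ["Chunk about fares to Heathrow"][:k],
    generate=fake_llm,
)
print(result)  # stub answer
```

Because the embedding model and the LLM only meet through plain text, either one can be swapped without touching the other.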

04:28.090 --> 04:30.970
You probably knew that already, but it doesn't hurt to emphasize it.

04:30.970 --> 04:31.570
Okay.

04:31.610 --> 04:34.450
And so that's all I'm doing on RAG.

04:34.450 --> 04:36.410
And we're going to actually build some RAG,

04:36.410 --> 04:37.570
so you get to use it.

04:37.570 --> 04:42.290
But hopefully you see that the amazing thing about this is that you can use it to give the sort

04:42.290 --> 04:47.650
of illusion that you've got a model that knows about so much information that has knowledge of your

04:47.650 --> 04:54.050
entire company because it's able just to get the relevant stuff for any question that's asked.

04:54.050 --> 04:59.690
And so the big business use case for this is whenever you want something like an expert knowledge worker that

04:59.690 --> 05:05.290
has expertise about all of your company's products, about all of the employees at your company,

05:05.290 --> 05:06.410
anything like that.

05:06.450 --> 05:08.850
Then RAG is the go-to technique.

05:08.850 --> 05:13.450
So if you're building something like an HR system that needs to know all of your policies and all

05:13.450 --> 05:15.570
employee information, it could be RAG.

05:15.570 --> 05:21.130
If you're looking to build a knowledge worker that has full expertise of everything about what your

05:21.130 --> 05:23.410
company works on, that sounds like RAG.

05:23.570 --> 05:28.570
For anything to do with that kind of quick access to expertise and information,

05:28.610 --> 05:31.410
RAG is one of the go-to techniques.

05:31.410 --> 05:36.710
So as I said before, there is a whole cottage industry of extra techniques.

05:36.750 --> 05:42.150
There's graph RAG, there's hierarchical RAG, there's reranking, so much more.

05:42.150 --> 05:46.830
And if you feel hungry for this and you're going to be hammering me with questions about it all,

05:46.830 --> 05:52.070
then I should tell you that in my AI engineer core track, in week five, we spend the whole week on RAG,

05:52.070 --> 05:58.470
including the single most important thing about RAG, which is measurement: measuring it, evaluating

05:58.470 --> 05:58.790
it.

05:58.830 --> 06:06.150
RAG is incredibly hacky and experimental, and the only way to tame RAG and know whether you

06:06.190 --> 06:11.510
need to make changes, or use one technique over another, is to measure it: measure the quality of your

06:11.510 --> 06:16.030
retrieval, measure the quality of your responses. And I cover all of that

06:16.070 --> 06:18.350
in week five of the core track,

06:18.510 --> 06:22.110
if you're interested, if you've caught the bug.

06:22.710 --> 06:27.750
But for now, it's time to explore agentic RAG,

06:27.790 --> 06:30.110
such a hot new topic.

06:30.270 --> 06:35.750
And I wanted to do it by explaining the difference between traditional RAG, RAG

06:35.750 --> 06:39.670
of the original variety, and the new agentic RAG.
