WEBVTT

00:00.080 --> 00:05.600
Okay everybody, today is a purple day and that means it's a core learning day.

00:05.600 --> 00:08.200
And there's a lot of core for us to do.

00:08.360 --> 00:15.440
Now, the observant amongst you may have noticed that yesterday was a yellow day, but accidentally

00:15.680 --> 00:21.200
I made the slides with a purple strip on the front and the back page when it should have been

00:21.200 --> 00:22.200
a yellow strip.

00:22.200 --> 00:24.160
And when I realized I'd done that at the end,

00:24.160 --> 00:26.400
I almost went back and rerecorded.

00:26.760 --> 00:32.400
But something told me that I'm probably the only one that notices or cares about the color of this strip.

00:32.400 --> 00:38.400
But if you noticed and you cared about the fact that I got the wrong color strip yesterday, then

00:38.400 --> 00:39.240
many apologies.

00:39.240 --> 00:40.680
I won't make that mistake again.

00:40.680 --> 00:42.480
And I certainly haven't made that mistake today.

00:42.480 --> 00:43.720
It's a purple day.

00:43.720 --> 00:45.440
We've got core to go through.

00:45.720 --> 00:46.600
Let's get to it.

00:46.600 --> 00:53.160
So today I'm going to demystify and fully explain one of the hottest topics in generative AI, which

00:53.160 --> 01:01.160
is RAG, retrieval augmented generation, and then also the new juicy twist on it in the form of

01:01.200 --> 01:02.760
a rag.

01:02.800 --> 01:07.530
I will explain all of this. If you already know about both, then you can put me on 2x and

01:07.530 --> 01:11.890
zoom through this because it might be old hat, but for some of you this is going to be really interesting.

01:11.890 --> 01:15.570
It's a technical topic, but I'm not going to go deep on the technical side of it.

01:15.570 --> 01:19.690
I'm going to give you enough so you've got the intuition to know how to put this into practice to get

01:19.690 --> 01:22.250
better commercial outcomes.

01:23.050 --> 01:27.370
But first, before all of that, just a quick recap about APIs.

01:27.410 --> 01:33.010
Okay, so last time we got in the weeds about the terminology around APIs, and I warned you that I

01:33.010 --> 01:34.570
might review it a few times.

01:34.610 --> 01:35.890
And this is one of those times.

01:36.170 --> 01:39.810
So just to quickly recap some of this terminology stuff.

01:40.010 --> 01:45.050
So I know you understand this now, but this expression, calling an API, when you're talking about

01:45.050 --> 01:50.850
like a web API, what you mean is you're making a web request using HTTP.

01:51.290 --> 01:56.810
And the URL that you're hitting is what you call an endpoint, and the way you'd describe it is,

01:56.810 --> 02:03.890
look, I'm making an API call by making an HTTP request to an endpoint, to a particular URL.

02:04.490 --> 02:11.070
And then there's this concept of a webhook, which is if you're on the receiving end of an API and you've

02:11.070 --> 02:16.510
got a URL which you want someone to call as a kind of real time trigger to tell you about something,

02:16.510 --> 02:20.070
to tell you that an event has happened, they're going to call your webhook.

02:20.110 --> 02:25.390
Some people call them reverse APIs because it's like an API coming into you.

02:25.550 --> 02:30.350
You might think, okay, I can call a few APIs to do four different things, call four different

02:30.350 --> 02:30.950
APIs.

02:31.110 --> 02:35.710
And I'll publish some webhooks so that people can call me and tell me something's happened.

02:36.110 --> 02:39.070
A reverse API: that is a webhook.

02:39.150 --> 02:44.670
And so you might say, hey, I've set up a webhook, make an HTTP request here to tell me about this

02:44.670 --> 02:45.470
thing happening.

02:45.630 --> 02:51.590
Or you might say to someone, tell me your webhook that I can call to notify you about something. That

02:51.590 --> 02:52.550
is a webhook.
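
NOTE
Editor's sketch of the receiving end of a webhook, using only Python's standard library.
The function name handle_event, the "event" field, and port 8000 are assumptions made
for illustration; they do not come from the talk. The idea is simply that you publish
a URL, someone makes an HTTP request to it, and your code wakes up and reacts.
```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
def handle_event(raw_body: bytes) -> tuple[int, dict]:
    """Decide how to respond to one incoming webhook call (hypothetical logic)."""
    try:
        event = json.loads(raw_body)
    except ValueError:
        # Covers malformed JSON and bodies that are not valid UTF-8.
        return 400, {"error": "body must be JSON"}
    if not isinstance(event, dict) or "event" not in event:
        return 400, {"error": "missing 'event' field"}
    return 200, {"received": event["event"]}
class WebhookHandler(BaseHTTPRequestHandler):
    """Turns each incoming HTTP POST into a call to handle_event."""
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, reply = handle_event(self.rfile.read(length))
        payload = json.dumps(reply).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
def serve(port: int = 8000) -> None:
    """Start listening; the URL of this server is what you would hand out as your webhook."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
# Exercise the logic directly, without starting a server.
print(handle_event(b'{"event": "order_paid"}'))
```
Calling serve() would block and wait for requests, so the sketch leaves that to you.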

02:52.590 --> 02:53.230
Okay.

02:53.270 --> 02:56.990
And then there's this idea that there are different types of HTTP requests.

02:56.990 --> 02:58.430
It's known as the method.

02:58.510 --> 03:04.790
The most common are GET and POST, but there are some others, and you usually use GET when you're fetching

03:04.790 --> 03:10.680
stuff, which will come back to you in the form of JSON, and use POST when you're sending information, and

03:10.680 --> 03:15.360
I've just added in that little extra tidbit that we discovered yesterday, which is that the POST can

03:15.360 --> 03:17.800
contain a body which is some JSON.
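
NOTE
Editor's sketch of the two methods side by side, using Python's standard urllib.
The endpoint URL and the "question" field are hypothetical. Nothing is sent over
the network here; the code only builds the requests so you can see that a GET
carries no body while a POST carries a JSON body.
```python
import json
from urllib.request import Request
# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "https://api.example.com/tickets"
def build_get(url: str) -> Request:
    """Build a GET request: used for fetching data, often returned as JSON."""
    return Request(url, method="GET")
def build_post(url: str, payload: dict) -> Request:
    """Build a POST request whose body is the payload encoded as JSON."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")
get_req = build_get(ENDPOINT)
post_req = build_post(ENDPOINT, {"question": "How much is a ticket to London?"})
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data.decode("utf-8"))
```
To actually make the call you would pass either request to urllib.request.urlopen.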

03:18.040 --> 03:23.800
And now a POST could just be something written in some code that calls another

03:23.800 --> 03:24.280
component.

03:24.280 --> 03:25.920
It could just be part of a workflow.

03:25.920 --> 03:30.960
It might have nothing to do with LLMs, but the way we used it yesterday is that we built it into some

03:31.000 --> 03:31.560
LLMs.

03:31.560 --> 03:38.800
Because we gave an LLM in ElevenLabs a tool that allowed it to make an HTTP request

03:38.800 --> 03:42.640
to a webhook in order to pass something in, in the body.

03:42.640 --> 03:43.480
What did it pass in?

03:43.480 --> 03:46.160
It passed in a question and in.

03:46.480 --> 03:47.240
And it woke up.

03:47.240 --> 03:48.320
It got alerted.

03:48.360 --> 03:49.680
It got this webhook.

03:49.720 --> 03:56.120
It looked at that question, it passed it on to an LLM, and it gave back the answer after also using

03:56.120 --> 03:56.600
a tool.

03:56.840 --> 03:58.600
And that was our flow.

03:58.720 --> 04:04.960
And so this gives you the gist of what it means to run these kinds of endpoints

04:04.960 --> 04:06.080
and this webhook.

04:06.440 --> 04:10.760
And hopefully that gives you a good sense of the big ideas behind APIs.

04:10.920 --> 04:14.460
And now on the subject of big ideas.

04:14.740 --> 04:16.420
Total change in topic.

04:16.580 --> 04:18.660
New blank sheet of paper.

04:19.180 --> 04:24.900
I am going to give you the story of RAG and, as I say, apologies.

04:24.900 --> 04:29.940
If you already know about RAG then zoom through this or correct me if you disagree with me or whatever.

04:30.380 --> 04:35.260
Okay, so RAG is something that became massively popular probably a couple of years ago now.

04:35.300 --> 04:41.340
It's this idea that is unbelievably simple, so simple that you've probably

04:41.340 --> 04:43.980
thought of it yourself if you didn't already know it.

04:43.980 --> 04:51.380
But people were working feverishly to give LLMs more expertise in a particular area.

04:51.380 --> 04:55.780
If you're working on a particular commercial problem, you want to be able to have

04:55.780 --> 04:58.980
an LLM that is an expert in that commercial area.

04:58.980 --> 05:05.060
And back in the day, we used to do that by training LLMs further with more data for that area, which

05:05.060 --> 05:10.060
is something that's very expensive and time consuming and needs a lot of data.

05:10.420 --> 05:14.940
And this other idea came up, which is a sort of complementary idea.

05:14.980 --> 05:20.830
You can do it in parallel with working on training, or just do it instead of training.

05:20.830 --> 05:25.830
And it's much easier and it's much cheaper, although perhaps a little bit less powerful.

05:25.830 --> 05:30.550
It's just really quick and easy, and it's called RAG.

05:30.710 --> 05:37.430
And at the end of the day, all it's really about is making LLMs appear like they have more knowledge,

05:37.470 --> 05:42.310
appear like they have greater expertise, by just shoving extra stuff in the prompt.

05:42.310 --> 05:43.510
That's all it's really about.

05:43.510 --> 05:46.430
When I put it that way, you're probably thinking, what's all the fuss about?

05:46.430 --> 05:51.390
But it's a little bunch of tricks and techniques to do that really well, so

05:51.390 --> 05:57.510
that when you're asking a question, the LLM has the best possible context to give an answer, as if

05:57.510 --> 05:59.750
it's a total expert in that space.

05:59.750 --> 06:06.070
So, as I say, it's about making an LLM more knowledgeable, having more information,

06:06.070 --> 06:08.550
more expertise about an area.

06:08.750 --> 06:13.430
And RAG stands for retrieval augmented generation.

06:13.470 --> 06:15.830
Lots of long words for not very much.

06:15.830 --> 06:19.350
And it's basically a clever trick that happens to work really well.

06:19.350 --> 06:23.280
And I often like to tell people that at the end of the day, RAG is a hack.

06:23.480 --> 06:25.160
It's a big old hack.

06:25.320 --> 06:27.320
It's not super ingenious.

06:27.360 --> 06:32.080
There's a whole cottage industry of little techniques that are built up around RAG.

06:32.080 --> 06:37.920
And so people love to kind of rattle off different advanced techniques with long names and make

06:37.920 --> 06:43.120
it sound like it's ingenious and complicated, but it's actually pretty simple and it's very hacky,

06:43.400 --> 06:47.960
and it's very much a kind of trial and error, and it works really well.

06:48.040 --> 06:51.640
That's the thinking behind RAG.

06:51.920 --> 06:58.640
And I like to say that there is a small idea behind RAG and a big idea, and I like to

06:58.640 --> 06:59.520
go through them both.

06:59.640 --> 07:02.040
And if you've heard me say this before, I'm sorry.

07:02.240 --> 07:10.640
But the small idea is just basically saying, hey, can't we just make an LLM be more knowledgeable

07:10.640 --> 07:13.400
just by adding in stuff to the prompt?

07:13.400 --> 07:19.800
So if we are a travel airline and we're trying to answer questions about traveling somewhere, we could

07:19.800 --> 07:26.420
just shove all sorts of information about our travel tips and our ticket prices to different places,

07:26.420 --> 07:28.260
and we could just shove it all in the prompt.

07:28.260 --> 07:33.220
So the prompt could say, like the user is asking this question, how much does it cost to travel to

07:33.260 --> 07:33.860
Paris?

07:34.180 --> 07:36.700
And here's relevant background information.

07:36.740 --> 07:38.700
What might be relevant in answering the question.

07:38.700 --> 07:40.860
And then shove lots of stuff in there.

07:40.980 --> 07:46.300
And LLMs are very good when they get this big prompt: when they're predicting the tokens that will

07:46.300 --> 07:50.780
come next, they will predict ones that are consistent with all of this context.

07:50.780 --> 07:56.780
And so if the ticket prices to London are in this context, it is bound to include that in what it thinks

07:56.780 --> 07:57.700
should come next.

07:57.700 --> 08:02.380
And so this is just a very simple idea that leads to better outcomes.
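
NOTE
Editor's sketch of the small idea: put the question and all the background
information into one prompt. The function name build_prompt, the wording of the
prompt, and the facts and prices are made up for illustration. The resulting text
is what you would hand to whichever LLM you are using.
```python
def build_prompt(question: str, background: list[str]) -> str:
    """Put the user's question and every background fact into one prompt."""
    facts = "\n".join(f"- {fact}" for fact in background)
    return (
        f"The user is asking this question: {question}\n"
        "Here is background information that might be relevant in answering the question:\n"
        f"{facts}"
    )
# Made-up facts, for illustration only.
BACKGROUND = [
    "A ticket to London costs $799.",
    "A ticket to Paris costs $899.",
    "Pack an umbrella for London.",
]
prompt = build_prompt("How much does it cost to travel to London?", BACKGROUND)
print(prompt)
```
Note that every fact goes in, relevant or not, which is fine for three facts and not for three million.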

08:02.380 --> 08:07.420
You can try it yourself with ChatGPT. You can say, how much is a ticket to London? By the way,

08:07.420 --> 08:11.860
here's some extra stuff and then give it lots of information, including ticket prices to London and

08:11.860 --> 08:13.540
it will do a great job in answering it.

08:13.540 --> 08:14.380
No surprise.

08:14.380 --> 08:20.100
So obviously, the problem with what I've just said there is that it's fundamentally not scalable.

08:20.540 --> 08:27.550
If we were a travel agent and ticket prices to London was the thing being discussed, but we

08:27.550 --> 08:32.590
have ticket prices to every single city in the world, then that's a lot of ticket prices.

08:32.590 --> 08:37.710
And we couldn't take all that information and put all of it in the prompt.

08:37.710 --> 08:43.990
Or maybe we could, but it would be so crammed full of irrelevant information that we'd be really setting

08:43.990 --> 08:49.830
up the LLM to fail, because it's got to do this enormous job of reading through

08:49.870 --> 08:55.030
this essay of information where all we were really after is one small sentence somewhere in the middle

08:55.030 --> 08:57.390
of it with the ticket prices to London.

08:57.670 --> 09:01.790
So the big idea is all about saying, look, is there a way?

09:01.830 --> 09:08.590
Are there some tricks where rather than sending all the data that we've got, we can select a subset

09:08.590 --> 09:14.710
of data, a relevant subset that is most likely going to answer the question.

09:14.710 --> 09:17.470
It might not and we might give too much information.

09:17.470 --> 09:22.910
Maybe only a third of the information is useful, but as long as some useful information is in there,

09:22.910 --> 09:25.510
it's going to be better than no information at all.

09:25.510 --> 09:31.970
So that's the idea behind RAG: a trick to get a relevant subset of data.

09:31.970 --> 09:38.250
And it's just like a hacky trick where you can tweak lots of different ways of doing

09:38.250 --> 09:39.170
it, lots of settings.

09:39.170 --> 09:43.050
That might mean you send in more information or less information, and you're always trying to find

09:43.050 --> 09:50.010
a nice balance where you're not overwhelming the LLM with lots of irrelevant context, but you've got

09:50.410 --> 09:52.370
the right information in there.
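
NOTE
Editor's sketch of picking a relevant subset instead of sending everything.
The scoring here is plain word overlap between the question and each document,
which is a deliberately crude stand-in chosen for this sketch and not a method
named in the talk. The names select_relevant and top_k and the documents are
hypothetical. top_k is the kind of setting you tweak to send more or less context.
```python
import re
def words(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))
def select_relevant(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents sharing the most words with the question."""
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:top_k]
# Made-up documents, for illustration only.
DOCS = [
    "A ticket to London costs $799.",
    "A ticket to Tokyo costs $1400.",
    "Our baggage allowance is one suitcase per passenger.",
    "London is rainy in spring, so pack an umbrella.",
]
question = "How much is a ticket to London?"
chosen = select_relevant(question, DOCS, top_k=2)
print(chosen)
```
With top_k=2 the London price comes first and the Tokyo price tags along, so only
part of what gets sent is useful, which is the trade-off being described.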

09:52.370 --> 09:58.650
And that kind of trade-off is all part of doing RAG well, which many people are experts on after

09:58.690 --> 10:00.210
practicing for a long time.

10:00.410 --> 10:07.650
And the big idea behind this is that when you're trying to figure out what subset of the data is

10:07.650 --> 10:13.250
relevant to the question, well, you can actually use an LLM to help with that.

10:13.250 --> 10:18.410
Not the same LLM necessarily that you're going to ask the question to later, but you can use a different

10:18.410 --> 10:24.690
LLM to try to figure out, okay, of all of this information, what's most likely to be interesting

10:24.890 --> 10:29.570
for answering the question, how much is a ticket to London, and how can I find that information

10:29.570 --> 10:30.730
from all of this?

10:30.850 --> 10:33.210
That is the big idea behind RAG.