WEBVTT

00:00.040 --> 00:06.120
In the next couple of videos, we'll be splitting the LangChain documentation into smaller chunks,

00:06.280 --> 00:10.880
so we can provide it as context to our LLMs.

00:11.240 --> 00:13.280
We're then going to embed them.

00:13.280 --> 00:16.240
So we're going to turn those chunks into vectors.

00:16.240 --> 00:20.160
And those vectors are going to be indexed into our vector store.

00:20.480 --> 00:20.960
All right.

00:20.960 --> 00:24.040
So here we have the snippet for the chunking phase.

00:24.280 --> 00:27.240
And we start by logging everything.

00:27.840 --> 00:34.600
And the entire chunking process is handled by the RecursiveCharacterTextSplitter from LangChain.

00:35.080 --> 00:43.200
We're going to provide it with a chunk size of 4000 and a chunk overlap of 200, and it's going to recursively

00:43.200 --> 00:44.840
chunk up our documents.

00:45.200 --> 00:52.320
Now, I elaborate on the RecursiveCharacterTextSplitter in this video over here if you want to

00:52.320 --> 00:53.160
check it out.

00:53.520 --> 00:59.280
But the overall idea is that it chunks up our documents along their natural structure, so related text tends to stay together.

00:59.440 --> 01:03.360
So it first tries to split on paragraphs, then on new lines, and then on spaces and single characters if a piece is still too large.

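NOTE
A minimal pure-Python sketch of the recursive idea above, assuming the same separator order (paragraphs, new lines, spaces, single characters). It is not LangChain's implementation: recursive_split is a hypothetical name, and chunk_overlap is left out to keep it short.
```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces of at most chunk_size characters (sketch only)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *rest = separators
    if sep == "":
        # Last resort: no separator left, so cut into fixed-size slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # merge small neighbouring pieces into one chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # Still too large: retry this piece with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, tuple(rest)))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks
```
Coarse separators are tried first, so paragraphs stay intact whenever they fit; only oversized pieces get cut more finely.
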
01:03.560 --> 01:12.570
And we want to limit the chunk size to 4000 and the chunk overlap to 200, measured in characters: 4000 characters

01:12.570 --> 01:14.010
and 200 characters.

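NOTE
To make the overlap concrete, here is plain sliding-window arithmetic under a simplifying assumption: boundaries fall at fixed offsets, whereas the recursive splitter actually cuts at separators. overlapping_windows is a hypothetical helper.
```python
def overlapping_windows(text_length, chunk_size=4000, chunk_overlap=200):
    """Return (start, end) character offsets of fixed-size, overlapping windows."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts 3800 characters later
    windows, start = [], 0
    while True:
        end = min(start + chunk_size, text_length)
        windows.append((start, end))
        if end == text_length:
            break
        start += step
    return windows
print(overlapping_windows(10_000))  # [(0, 4000), (3800, 7800), (7600, 10000)]
```
Neighbouring windows share exactly 200 characters, so text near a boundary appears in both adjacent chunks.
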
01:14.650 --> 01:21.970
Once the text splitter object is ready, we simply call its built-in split_documents method

01:22.090 --> 01:24.850
and pass it the LangChain documents.

01:25.090 --> 01:29.730
Once it's finished, we're going to get back a larger list of documents.

01:29.730 --> 01:34.690
This time the documents are chunked up, and then we log everything.

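NOTE
A rough stand-in for what split_documents does, written without LangChain so it runs on its own. Document mirrors the shape of LangChain's Document (page_content plus metadata), but it, split_text, and the sample sources are hypothetical; in the real pipeline you would call text_splitter.split_documents(documents).
```python
import logging
from dataclasses import dataclass, field
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingestion")
@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)
def split_text(text, chunk_size=4000, chunk_overlap=200):
    # Simplified fixed-size splitter with overlap (no separator awareness).
    step = chunk_size - chunk_overlap
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
def split_documents(documents):
    # Every chunk gets a copy of its parent document's metadata (e.g. the source URL).
    chunks = [
        Document(page_content=piece, metadata=dict(doc.metadata))
        for doc in documents
        for piece in split_text(doc.page_content)
    ]
    log.info("Split %d documents into %d chunks", len(documents), len(chunks))
    return chunks
docs = [Document("a" * 10_000, {"source": "a.html"}), Document("b" * 3_000, {"source": "b.html"})]
chunks = split_documents(docs)  # 3 chunks from the first document + 1 from the second
```
Keeping metadata on each chunk is what later lets an answer be traced back to its source page.
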
01:34.850 --> 01:39.530
So this entire step in the pipeline is very simple to implement.

01:39.570 --> 01:42.170
LangChain is going to do all the heavy lifting for us.

01:42.370 --> 01:45.130
And I just want to mention a few things.

01:45.170 --> 01:49.490
First of all, this isn't a silver bullet chunking method.

01:49.490 --> 01:53.650
There are many chunking methods, and this is a very deep topic.

01:53.890 --> 02:00.330
If you want to explore it, there are many strategies, such as small-to-big and semantic chunking, and a lot

02:00.370 --> 02:03.450
of cool optimizations you can do in this phase.

02:03.570 --> 02:05.450
This is the first thing I wanted to mention.

02:05.450 --> 02:12.050
The second thing I want to say is that LLMs now have much larger context windows, which can reach these

02:12.050 --> 02:21.860
days, in 2025, 1 million tokens with Anthropic's models, and even, with Gemini 2.5, 2 million input

02:21.900 --> 02:22.460
tokens.

02:22.820 --> 02:30.460
So the chunking we're doing here is important, but it's not the main thing we should be focused

02:30.460 --> 02:30.820
on.

02:30.980 --> 02:37.100
And a lot of people are saying that RAG is dead because of those larger context windows.

02:37.220 --> 02:40.860
And I think it's important to mention that RAG is not dead.

02:40.900 --> 02:41.980
It's evolving,

00:42.100 --> 00:45.140
even with the rise of huge context windows.

02:45.300 --> 02:47.500
And I'll tell you why it's important.

02:47.500 --> 02:54.980
First of all, there's cost efficiency: feeding a million-token document into an LLM is significantly

02:54.980 --> 03:00.420
slower and far more expensive than simply retrieving the relevant snippets

03:00.460 --> 03:01.420
with RAG.

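NOTE
A back-of-the-envelope comparison of input size, assuming the common rule of thumb of roughly 4 characters per token for English text and a hypothetical top_k of 5 retrieved chunks of 4000 characters each. These are illustrative numbers, not measurements.
```python
CHARS_PER_TOKEN = 4            # rough heuristic; varies by tokenizer and language
chunk_chars, top_k = 4000, 5   # chunk_size from above; top_k is an assumption
rag_tokens = top_k * chunk_chars // CHARS_PER_TOKEN
full_context_tokens = 1_000_000
print(rag_tokens, full_context_tokens // rag_tokens)  # 5000 200
```
Under these assumptions the retrieved context is about 200 times smaller, and since input is billed per token, cost shrinks roughly in proportion.
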
03:02.860 --> 03:10.900
Another reason RAG is important is precision and noise reduction: RAG filters the context down to only

03:10.900 --> 03:13.540
the most relevant chunks.

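NOTE
A toy illustration of that filtering step: rank chunks by cosine similarity to the query vector and keep only the top k. The three-dimensional vectors are made up for readability; real embeddings have hundreds or thousands of dimensions, and a vector store would use an index rather than sorting everything.
```python
import math
def cosine(a, b):
    # Cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))
def top_k(query_vec, index, k=2):
    """Return the ids of the k chunks whose vectors are closest to the query."""
    ranked = sorted(index, key=lambda cid: cosine(query_vec, index[cid]), reverse=True)
    return ranked[:k]
index = {"chunk-a": [1.0, 0.0, 0.0], "chunk-b": [0.9, 0.1, 0.0], "chunk-c": [0.0, 0.0, 1.0]}
print(top_k([1.0, 0.05, 0.0], index))  # ['chunk-a', 'chunk-b']
```
Only the two relevant chunks reach the prompt; chunk-c, which points in an unrelated direction, is dropped.
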
03:13.580 --> 03:21.660
This can drastically reduce hallucinations and the positional biases found in long-context approaches. And

03:21.820 --> 03:24.260
retrieval with intelligent reordering,

03:24.260 --> 03:32.180
a strategy for retrieving documents, can improve answer quality while using fewer tokens

03:32.180 --> 03:35.700
than the full context, and there's published research supporting this.

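NOTE
A sketch of one common reordering strategy, assuming the retrieved chunks arrive sorted most-relevant first: the strongest chunks go to the start and end of the context, where long-context models tend to attend best, and the weakest land in the middle. LangChain offers a LongContextReorder document transformer built on the same idea; the function below is an illustration, not its source code.
```python
def reorder_for_long_context(ranked):
    """ranked: items sorted from most to least relevant."""
    front, back = [], []
    for i, item in enumerate(ranked):
        # Alternate: 1st to the front, 2nd to the back, 3rd to the front, ...
        (front if i % 2 == 0 else back).append(item)
    # Reverse the back half so the 2nd most relevant item ends up last.
    return front + back[::-1]
print(reorder_for_long_context(["r1", "r2", "r3", "r4", "r5"]))  # ['r1', 'r3', 'r5', 'r4', 'r2']
```
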
03:36.460 --> 03:46.020
I also want to mention that RAG enables a lot of user-facing features, such as showing the source of

03:46.020 --> 03:48.220
the pieces of information in the answer.

03:48.420 --> 03:52.060
So we can trace the answer back and find its origins.

03:52.220 --> 03:58.340
And this is critical for building trust in the AI system.

03:58.380 --> 04:01.980
And it's essential in regulated environments.

04:02.100 --> 04:09.500
So rather than saying RAG is dead, I think long-context models are complementary

04:09.500 --> 04:16.140
to it, and they support RAG more effectively by handling richer, longer prompt sequences.

04:17.340 --> 04:17.740
All right.

04:17.740 --> 04:25.060
And the TL;DR is that larger context windows don't kill RAG; they amplify it and

04:25.220 --> 04:26.500
magnify its strengths.

04:26.820 --> 04:29.860
Alrighty, let's go now and take all of these chunks.

04:29.860 --> 04:34.900
We want to turn them into vectors and index them in the vector store.
