WEBVTT

00:00.280 --> 00:00.880
Hey there.

00:00.920 --> 00:01.600
Eden here.

00:01.720 --> 00:06.480
So the first part of our RAG pipeline is to ingest the documentation.

00:06.640 --> 00:10.800
We're going to get the latest, up-to-date LangChain documentation.

00:10.960 --> 00:15.400
And we're going to ingest it and index it into our vector store.

00:15.800 --> 00:21.920
And in order to do this we're going to be using Tavily, which is going to make this tedious task

00:21.920 --> 00:27.600
of downloading the documentation a cinch. And before we dive in, I just want to give a bit of history.

00:27.960 --> 00:34.320
When I made this course in early 2022, I was doing this process manually, writing a lot of scripts,

00:34.320 --> 00:36.760
and it really was cumbersome.

00:36.800 --> 00:37.640
It broke.

00:37.640 --> 00:43.360
It had a lot of issues, and right now in 2025, things have changed a lot.

00:43.400 --> 00:49.640
And we have tools like Tavily, which are going to help us achieve this with just a bunch of API calls.

00:49.840 --> 00:50.920
Just to double-click on that:

00:51.120 --> 00:56.160
So I used to use Firecrawl in order to achieve this crawling capability.

00:56.400 --> 01:02.840
However, I found that the Firecrawl package was not very well maintained within the LangChain ecosystem, and

01:02.840 --> 01:04.440
it had some scaling issues.

01:04.920 --> 01:09.270
So that's the reason I chose to migrate from Firecrawl to Tavily Crawl.

01:09.950 --> 01:10.350
All right.

01:10.350 --> 01:14.150
So let's talk now about what we're going to be doing in the next couple of videos.

01:14.310 --> 01:18.070
So first I want you to get to know the API better.

01:18.070 --> 01:23.390
And we're going to have a walkthrough of how to use Tavily Map, Tavily Extract, and Tavily Crawl.

01:23.710 --> 01:30.830
And after that we're going to integrate it into our LangChain RAG ingestion pipeline, which is going to

01:30.870 --> 01:33.350
take the LangChain documentation.

01:33.350 --> 01:35.910
It's going to map all the URLs.

01:35.910 --> 01:38.790
And then it's going to concurrently scrape those URLs.

01:38.790 --> 01:43.470
And we're going to add some metadata and index everything into a vector store.
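
NOTE
A minimal sketch of the flow described above (map the URLs, scrape them concurrently, attach metadata, chunk for indexing).
It does not call any real crawling or vector store API: map_urls, scrape, chunk, ingest, and Doc are hypothetical stand-ins,
and the example.com URL is a placeholder. The real pipeline would replace them with the crawling service and LangChain.
```python
import asyncio
from dataclasses import dataclass, field
# Hypothetical stand-in for a scraped page (or chunk) plus its metadata.
@dataclass
class Doc:
    text: str
    metadata: dict = field(default_factory=dict)
# Placeholder "map" step: a real pipeline would ask the crawling service for the site's URLs.
async def map_urls(root: str) -> list[str]:
    return [f"{root}/page-{i}" for i in range(3)]
# Placeholder "scrape" step: returns fake page text and records the source URL as metadata.
async def scrape(url: str) -> Doc:
    await asyncio.sleep(0)  # yield control, as a real network call would
    return Doc(text=f"Content of {url}. " * 20, metadata={"source": url})
# Naive fixed-size chunker with overlap; a real pipeline would use a proper text splitter.
def chunk(doc: Doc, size: int = 200, overlap: int = 20) -> list[Doc]:
    step = size - overlap
    return [Doc(doc.text[i:i + size], dict(doc.metadata, start=i)) for i in range(0, len(doc.text), step)]
# Full flow: map, scrape concurrently, then chunk; the chunks are what would be embedded and indexed.
async def ingest(root: str) -> list[Doc]:
    urls = await map_urls(root)
    pages = await asyncio.gather(*(scrape(u) for u in urls))  # all scrapes run concurrently
    return [c for page in pages for c in chunk(page)]
chunks = asyncio.run(ingest("https://example.com/docs"))
```
Each resulting chunk keeps the "source" URL of its page, which is the kind of metadata that makes retrieved answers traceable.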

01:43.670 --> 01:49.230
So this entire process might look like a lot, but it's actually not a lot of code.

01:49.270 --> 01:53.550
And all the heavy lifting is going to be done by LangChain and Tavily.

01:53.710 --> 01:59.950
Tavily is going to be responsible for the crawling and scraping of the documentation, and LangChain

01:59.990 --> 02:06.710
is going to be responsible for chunking everything up, updating the metadata, and indexing everything

02:06.710 --> 02:07.350
seamlessly.

02:07.590 --> 02:09.830
By the way, in case you're worried about pricing,

02:09.830 --> 02:14.990
don't be, because Tavily offers us a very generous free tier, which is more than enough for this

02:14.990 --> 02:16.430
course and for this task.