WEBVTT

00:00.640 --> 00:02.000
Hi there.

00:02.000 --> 00:07.720
In this video, we're going to cover the imports and main classes we'll use during the

00:07.760 --> 00:10.960
ingestion phase of our RAG pipeline.

00:11.360 --> 00:14.920
We'll also define the environment variables we need.

00:15.080 --> 00:21.640
That means all the API keys and configuration we need. We'll initialize TavilyMap, TavilyExtract,

00:21.840 --> 00:23.160
and OpenAI embeddings.

00:23.160 --> 00:25.440
If you prefer, you can use open-source embeddings instead.

00:25.640 --> 00:28.760
We'll also be using the Pinecone vector store.

00:28.760 --> 00:32.840
You can use Chroma DB as well, or any other vector store you want.

00:33.080 --> 00:39.680
The beautiful thing about LangChain is that it gives us a single interface for all vector stores and

00:39.680 --> 00:42.360
all embedding models, so the code should be similar.
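
NOTE Sketch, not LangChain code: a hypothetical VectorStoreLike protocol and a toy in-memory store illustrate what a single vector store interface buys you. The method names add_texts and similarity_search echo LangChain's VectorStore methods; toy_embed is a made-up stand-in for a real embedding model.
```python
import math
from typing import Callable, Protocol
# Hypothetical protocol: any store exposing these two methods can be swapped in.
class VectorStoreLike(Protocol):
    def add_texts(self, texts: list[str]) -> None: ...
    def similarity_search(self, query: str, k: int = 4) -> list[str]: ...
class InMemoryStore:
    """Toy store: keeps (text, vector) pairs and ranks them by cosine similarity."""
    def __init__(self, embed: Callable[[str], list[float]]):
        self.embed = embed
        self.items: list[tuple[str, list[float]]] = []
    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self.items.append((text, self.embed(text)))
    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        q = self.embed(query)
        def cosine(v: list[float]) -> float:
            dot = sum(a * b for a, b in zip(q, v))
            norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in v))
            return dot / norm if norm else 0.0
        ranked = sorted(self.items, key=lambda item: cosine(item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
def toy_embed(text: str) -> list[float]:
    # Hypothetical stand-in for a real embedding model: counts of each vowel.
    return [float(text.count(c)) for c in "aeiou"]
def index_and_query(store: VectorStoreLike, texts: list[str], query: str) -> list[str]:
    # Ingestion code depends only on the interface, not on the concrete store.
    store.add_texts(texts)
    return store.similarity_search(query, k=1)
```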

00:43.640 --> 00:44.040
All right.

00:44.040 --> 00:52.080
Right now we have this boilerplate code, which imports asyncio and runs our main function

00:52.080 --> 00:52.480
here.
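
NOTE The boilerplate itself is not shown in this transcript; a minimal sketch of that shape, assuming an async main() run with asyncio.run(). The return value exists only for illustration.
```python
import asyncio
async def main() -> str:
    # Placeholder: the ingestion steps (crawl, split, embed, index) will go here.
    await asyncio.sleep(0)
    return "ok"
if __name__ == "__main__":
    print(asyncio.run(main()))
```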

00:52.920 --> 00:56.680
So let's run it as a sanity check just to see that everything is working.

00:56.880 --> 01:00.320
So I'm going to click this play button at the top right.

01:01.010 --> 01:05.810
And now it runs, and it's working.

01:06.410 --> 01:12.730
Let me show you the logger.py file, which I prepared in advance; you should have it as well.

01:14.770 --> 01:23.170
And here I simply define a bunch of colors and some nice printing functions, so we can see the logs

01:23.170 --> 01:24.730
in a readable way.

01:25.330 --> 01:29.890
That way we can log every step of our ingestion pipeline.

01:30.050 --> 01:35.650
So we have log_info, log_success, log_error, log_warning, and log_header.
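
NOTE The contents of logger.py are not shown in this transcript. A hypothetical sketch of such helpers using ANSI color codes (the colors, the [LEVEL] prefixes, and returning the line are all assumptions):
```python
# Hypothetical logger.py sketch: ANSI escape codes color each level in most terminals.
RESET = "\033[0m"
COLORS = {
    "info": "\033[94m",      # bright blue
    "success": "\033[92m",   # bright green
    "warning": "\033[93m",   # bright yellow
    "error": "\033[91m",     # bright red
    "header": "\033[1;95m",  # bold bright magenta
}
def _make_logger(level: str):
    def log(message: str) -> str:
        line = f"{COLORS[level]}[{level.upper()}] {message}{RESET}"
        print(line)
        return line  # returned as well, which makes the helper easy to test
    return log
log_info = _make_logger("info")
log_success = _make_logger("success")
log_warning = _make_logger("warning")
log_error = _make_logger("error")
log_header = _make_logger("header")
```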

01:36.010 --> 01:39.730
We'll import and use them in the ingestion file.

01:40.250 --> 01:43.250
All right, so let's start with the first import.

01:43.250 --> 01:49.370
We need the os module so we can read and reference environment variables.

01:49.650 --> 01:56.450
We need ssl because we're going to create an SSL context, and we'll also import some type-hinting objects.

01:59.490 --> 02:00.010
All right.

02:00.050 --> 02:07.930
Now we want to import the certifi package, which gives us a valid certificate bundle we can

02:07.930 --> 02:11.410
attach to the HTTP requests we're going to send.

02:12.530 --> 02:20.250
Like always, we'll import load_dotenv from dotenv, which loads the environment

02:20.250 --> 02:22.370
variables from the .env file.

02:23.490 --> 02:26.610
We'll soon see exactly which values we need there.

02:27.010 --> 02:32.130
Next, we want to import everything related to LangChain.

02:32.490 --> 02:37.770
Let me paste all those imports, and we'll go over them one by one.

02:38.370 --> 02:45.050
First, we import RecursiveCharacterTextSplitter, which is the LangChain helper class

02:45.050 --> 02:47.530
which is going to help us split the documents.
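
NOTE A simplified sketch of the recursive idea behind RecursiveCharacterTextSplitter: try the coarsest separator first and fall back to finer ones only for pieces that are still too long. This is not LangChain's implementation; it drops the separators from piece boundaries and ignores chunk overlap.
```python
def recursive_split(text: str, chunk_size: int, separators=("\n\n", "\n", " ", "")) -> list[str]:
    """Simplified recursive character splitting (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard-cut the text into fixed-size pieces.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too large: recurse with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```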

02:47.770 --> 02:54.850
I have an entire video dedicated to how this text splitter works and why it's effective,

02:54.850 --> 02:57.930
so feel free to check out this video in your free time.

02:58.210 --> 03:01.250
We also want to import from langchain_chroma.

03:01.690 --> 03:06.980
The Chroma class, which is the Chroma vector store, in case you want to index everything

03:06.980 --> 03:07.540
locally.

03:07.820 --> 03:14.140
However, I'll be using the Pinecone vector store, which is a cloud-based vector store, so it's

03:14.140 --> 03:16.540
going to come from the langchain_pinecone package.

03:16.740 --> 03:18.340
So you can choose either one.

03:18.500 --> 03:20.900
In the videos, I'm going to show the Pinecone vector store.

03:21.740 --> 03:29.380
I also want to import the Document class, which represents a text document with associated metadata.

03:29.660 --> 03:36.380
It's the core abstraction in LangChain for handling text data that can be processed, split, embedded

03:36.380 --> 03:37.340
or indexed.
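
NOTE A stand-in dataclass with the same two main fields as LangChain's Document (page_content and metadata), plus a hypothetical split_document helper showing how metadata travels with each chunk. This is an illustration, not the real class.
```python
from dataclasses import dataclass, field
from typing import Any
@dataclass
class Document:
    """Stand-in with the same two main fields as LangChain's Document."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)
def split_document(doc: Document, size: int) -> list[Document]:
    # Hypothetical helper: fixed-size chunks that each inherit a copy of the parent's metadata.
    return [
        Document(doc.page_content[i:i + size], {**doc.metadata, "chunk": n})
        for n, i in enumerate(range(0, len(doc.page_content), size))
    ]
```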

03:37.980 --> 03:43.700
If you want to dive deeper into documents, check out this video, where I explain why

03:43.700 --> 03:47.460
this class is important, why we need it, and show some examples of it.

03:48.340 --> 03:51.020
And I'll be using the OpenAI embeddings.

03:51.380 --> 03:56.180
By the way, I have students who used open-source embeddings, and those worked fine as well.

03:56.700 --> 03:57.180
All right.

03:57.180 --> 03:59.460
Let's go and import from Tavily.

03:59.500 --> 04:06.020
TavilyCrawl, which is going to be our main driver for getting the documentation, and we'll also import

04:06.020 --> 04:09.540
TavilyExtract and TavilyMap for the optional video later.

04:10.420 --> 04:12.740
And that's it for the LangChain imports.

04:12.740 --> 04:16.620
And now we want to import from our logger.py file.

04:16.620 --> 04:19.660
We want to import all the logging functions.

04:21.140 --> 04:21.620
Cool.

04:21.820 --> 04:24.460
So we're pretty much done with the imports.

04:24.860 --> 04:30.140
Now I want to load the environment variables from my .env file.

04:30.460 --> 04:33.900
And let me show you which environment variables I have.

04:34.380 --> 04:42.540
The OpenAI API key for the embeddings, the Pinecone API key for the Pinecone vector store, and then

04:42.540 --> 04:50.420
the LangChain API key, with LANGCHAIN_TRACING_V2 set to true and LANGCHAIN_PROJECT set to the name documentation

04:50.420 --> 04:50.940
helper.

04:51.140 --> 04:56.380
Those three environment variables let LangSmith trace our pipeline.

04:56.740 --> 04:59.100
And lastly, the Tavily API key,

04:59.100 --> 05:02.260
So we can use TavilyMap and TavilyExtract.
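
NOTE A sketch of a fail-fast check for the variables listed above, meant to run after load_dotenv() from python-dotenv. The exact names are assumed from the video (TAVILY_API_KEY is the conventional name for the Tavily key); missing_env_vars is a hypothetical helper.
```python
import os
from typing import Mapping, Optional, Sequence
# Names as described in the video; TAVILY_API_KEY is assumed for the Tavily key.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_PROJECT",
    "TAVILY_API_KEY",
)
def missing_env_vars(required: Sequence[str] = REQUIRED_VARS, env: Optional[Mapping[str, str]] = None) -> list[str]:
    # Report unset or empty variables by name only, never printing their values.
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
# Typical use after load_dotenv():
# missing = missing_env_vars()
# if missing:
#     raise SystemExit(f"Missing environment variables: {missing}")
```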

05:02.780 --> 05:05.150
As always, never share your API keys.

05:05.350 --> 05:09.110
I'll be revoking those API keys once I finish editing this video.

05:09.550 --> 05:10.070
Cool.

05:10.270 --> 05:17.030
Just like we did in the previous video, let me now configure the SSL context with a valid certificate

05:17.350 --> 05:19.390
using the certifi package.

05:19.550 --> 05:26.030
Again, this matters because we're going to make tons of API requests.

05:26.030 --> 05:29.910
And we don't want to run into a weird SSL certificate error.

05:30.590 --> 05:32.470
So this is defensive programming.

05:32.830 --> 05:39.670
Oh, and this reminds me: if you're using a corporate computer that has a VPN

05:39.710 --> 05:44.630
running, you might still get a certificate error.

05:44.950 --> 05:51.350
In that case, I recommend disabling the VPN so you can make those requests.
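
NOTE The previous video's exact code is not reproduced here. An assumed version of the certifi-based setup: build a verifying SSL context from certifi's CA bundle and export it via SSL_CERT_FILE and REQUESTS_CA_BUNDLE, falling back to the system trust store if certifi is not installed.
```python
import os
import ssl
try:
    import certifi  # third-party package: pip install certifi
    CA_BUNDLE = certifi.where()
except ImportError:
    CA_BUNDLE = None  # fall back to the system's default trust store
# Build a context that verifies server certificates and hostnames against the bundle.
ssl_context = ssl.create_default_context(cafile=CA_BUNDLE)
if CA_BUNDLE:
    # Assumption: libraries that honor these variables (e.g. requests) will use the same bundle.
    os.environ["SSL_CERT_FILE"] = CA_BUNDLE
    os.environ["REQUESTS_CA_BUNDLE"] = CA_BUNDLE
```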

05:53.230 --> 05:53.750
Cool.

05:53.790 --> 05:57.870
Let's go and initialize the classes that we imported.

05:58.110 --> 06:01.510
First, I want to initialize the OpenAIEmbeddings class.

06:01.710 --> 06:05.200
I'm going to give it the model text-embedding-3-small.

06:05.840 --> 06:12.280
If you want to see a cute progress bar while your text is being embedded into vectors, you can

06:12.320 --> 06:15.360
enable the show_progress_bar flag.

06:15.800 --> 06:18.640
So this flag is disabled by default.

06:18.640 --> 06:25.680
I'm simply setting it explicitly to show you that it exists. We also have the argument chunk_size

06:25.680 --> 06:26.760
set to 50.

06:27.160 --> 06:34.920
This limits how many text objects, or LangChain documents, we're going to embed in every request

06:34.960 --> 06:36.280
we send to OpenAI.

06:36.520 --> 06:43.040
Why is this important? If this number is too big, for example a thousand, then this

06:43.040 --> 06:48.480
means we send a thousand text objects to OpenAI to be embedded in a single request.

06:48.840 --> 06:55.720
Depending on our customer tier in OpenAI, and the same goes for any cloud provider

06:55.720 --> 06:57.920
that offers an embeddings model,

06:57.920 --> 07:01.720
we get a tokens-per-minute limit on how much we can embed.

07:02.160 --> 07:09.840
So here we're limiting the number of documents we embed per request, and we're capping

07:09.840 --> 07:10.920
it to 50.

07:10.960 --> 07:18.280
So it's going to be 50 LangChain documents, or text objects, embedded in a single request.

07:18.400 --> 07:20.680
So again this is for rate limiting.

07:20.680 --> 07:25.200
If we set this number too high, we're going to get rate limited.

07:25.200 --> 07:31.000
And if this number is too low, for example one, then it's going to take a lot longer

07:31.000 --> 07:33.280
to process and to embed everything.
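
NOTE Conceptually, chunk_size groups the texts into requests like this. A hypothetical sketch of the arithmetic, not the library's internal code: 120 texts with chunk_size=50 become 3 requests, while chunk_size=1 would mean 120 requests.
```python
import math
def batches(items: list[str], chunk_size: int) -> list[list[str]]:
    # Consecutive groups of at most chunk_size texts; each group is one embeddings request.
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
def request_count(n_texts: int, chunk_size: int) -> int:
    # Number of requests needed to embed n_texts texts.
    return math.ceil(n_texts / chunk_size)
```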

07:34.080 --> 07:34.480
All right.

07:34.480 --> 07:37.440
Let's talk about retry_min_seconds.

07:37.720 --> 07:44.960
Suppose a batch of ours fails for some reason. Maybe something is wrong with the payload.

07:44.960 --> 07:47.040
Maybe we are rate limited.

07:47.240 --> 07:49.880
A lot of the time, it's going to be rate limits.

07:50.000 --> 07:54.960
And when it's rate limits, there are a bunch of strategies for handling them.

07:55.200 --> 07:59.000
One strategy is to use this retry_min_seconds setting.

07:59.000 --> 08:06.530
With it set to 10, after a failure we wait at least 10 seconds before retrying.

08:06.730 --> 08:14.170
And if this number is very big, for example 60 seconds, a full minute, then after

08:14.170 --> 08:21.250
each failure we're going to wait at least one minute before retrying to embed those texts.
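
NOTE A generic sketch of retrying with a minimum wait and exponential backoff. OpenAIEmbeddings handles its retries internally and its exact formula is not reproduced here; call_with_retries, backoff_delay, and the max_seconds cap are hypothetical. The sleep parameter is injectable so the logic can be tested without actually waiting.
```python
import time
from typing import Callable, TypeVar
T = TypeVar("T")
def backoff_delay(attempt: int, min_seconds: float, max_seconds: float) -> float:
    # Exponential growth (1, 2, 4, 8, ... seconds) clamped to [min_seconds, max_seconds].
    return max(min_seconds, min(max_seconds, 2.0 ** attempt))
def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 5,
    min_seconds: float = 10.0,
    max_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:  # sketch only: real code should catch just retryable errors such as 429s
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait at least min_seconds before the next try.
            sleep(backoff_delay(attempt, min_seconds, max_seconds))
    raise AssertionError("unreachable")
```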

08:22.330 --> 08:24.450
So I want to show you this error over here.

08:24.450 --> 08:29.490
This is an error from my previous attempts at indexing this documentation.

08:29.810 --> 08:35.010
Now I am running a lot of concurrent batch requests.

08:35.010 --> 08:41.250
That means I'm running concurrent requests, where each request carries a batch of many documents

08:41.250 --> 08:41.770
here.

08:41.770 --> 08:45.690
And you can see that some of my batches are being rate limited.

08:46.050 --> 08:51.290
And you can see that in the error message I get a 429.

08:51.290 --> 08:55.250
This is the standard HTTP status code for rate limiting: 429 Too Many Requests.

08:55.250 --> 09:01.490
And you can see that the message states how long I need to wait for the rate limit

09:01.490 --> 09:04.530
to reset and my request to pass through.

09:04.850 --> 09:11.300
For example, 194 milliseconds, 500 milliseconds, etc.
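
NOTE A sketch of reading the suggested wait from such a message instead of always using a fixed minimum. The wording matched by the regex is an assumption based on the examples above; other providers, or a Retry-After response header, may report it differently. suggested_wait_seconds is a hypothetical helper.
```python
import re
# Assumed message format: "...Please try again in 194ms." or "...try again in 1.5s."
_RETRY_HINT = re.compile(r"try again in (\d+(?:\.\d+)?)(ms|s)\b")
def suggested_wait_seconds(message: str, default: float = 10.0) -> float:
    match = _RETRY_HINT.search(message)
    if match is None:
        return default  # no hint in the message: fall back to a fixed minimum wait
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000 if unit == "ms" else value
```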

09:11.580 --> 09:19.580
So if I put a high enough value as my minimum wait, it's a heuristic that makes it likely my requests are going to

09:19.620 --> 09:23.220
pass through, because they'll have waited long enough.

09:23.380 --> 09:26.540
So I can go and embed the rest of my documents.

09:26.700 --> 09:32.380
Of course, this is a trade-off, because if I wait too long, my entire processing time is going

09:32.380 --> 09:33.780
to be too long as well.

09:34.540 --> 09:41.500
Rate-limiting errors, and handling them, are very common when you take an application to production

09:41.500 --> 09:43.300
and when you handle scale.
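
NOTE One common way to keep concurrent batches under a rate limit is to cap how many requests are in flight at once. A sketch with asyncio.Semaphore; embed_batch is a hypothetical stand-in for the real API call, and max_concurrency is an illustrative parameter.
```python
import asyncio
async def embed_batch(batch: list[str]) -> list[list[float]]:
    # Hypothetical stand-in for one embeddings API request.
    await asyncio.sleep(0.01)
    return [[float(len(text))] for text in batch]
async def embed_all(batches: list[list[str]], max_concurrency: int = 3) -> list[list[float]]:
    semaphore = asyncio.Semaphore(max_concurrency)  # caps requests in flight
    async def run(batch: list[str]) -> list[list[float]]:
        async with semaphore:
            return await embed_batch(batch)
    results = await asyncio.gather(*(run(b) for b in batches))
    # gather preserves input order, so vectors line up with the original texts.
    return [vector for batch_vectors in results for vector in batch_vectors]
```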

09:43.820 --> 09:49.700
And rate limiting applies to basically every third-party service you're going to use that is cloud

09:49.780 --> 09:50.260
based.

09:50.580 --> 09:53.220
And it's very well known in the industry.

09:53.380 --> 09:57.580
There are many algorithms for calculating rate limits.

09:57.820 --> 10:00.660
Each vendor calculates rate limiting differently.

10:00.700 --> 10:05.540
You have token bucket, leaky bucket, and many other algorithms for this.

10:05.580 --> 10:10.700
We're not going to get into this, but overall, you should know and you should understand the concept.

10:10.700 --> 10:14.940
And especially in generative AI applications, we need to handle rate limiting.

10:15.060 --> 10:21.900
So here, setting retry_min_seconds to 10 is one way of handling it.
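
NOTE For reference only, since the video doesn't implement it: a textbook token bucket, one of the algorithms mentioned above. Each request spends `cost` tokens (for example, its estimated embedding tokens) and the bucket refills at a fixed rate; the TokenBucket class is hypothetical and the injectable clock is there for testing.
```python
import time
from typing import Callable
class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously at `rate` tokens per second."""
    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()
    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should wait (or back off) before trying again
```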

10:22.980 --> 10:28.620
All right, let me paste this snippet over here and cover the rest of the object

10:28.620 --> 10:29.620
initialization.

10:30.100 --> 10:36.700
What I have here, commented out, is the initialization of a local Chroma DB vector store.

10:37.140 --> 10:41.500
The persist_directory is where the database is going to be persisted.

10:41.660 --> 10:44.140
I set it to chroma_db.

10:44.180 --> 10:50.060
So this means that the DB is going to be stored under the current working directory of our project.

10:50.060 --> 10:52.940
And it's going to create a chroma_db directory.

10:52.940 --> 10:55.380
So you can see I have it right here on the left side.

10:55.700 --> 11:01.300
And I pass the embeddings object I created earlier as the embedding function.

11:01.460 --> 11:04.380
So this is in case you want to use Chroma.

11:04.660 --> 11:08.870
However, I'll be using the Pinecone cloud-based vector store.

11:09.030 --> 11:14.750
And here I'm giving it the index name langchain-docs-2025.

11:14.870 --> 11:18.310
Let me go to Pinecone and create an index.

11:18.310 --> 11:22.670
Let me call it langchain-docs-2025.

11:23.310 --> 11:30.590
For the embedding model, I want to select OpenAI's text-embedding-3-small, and I'll set the

11:30.590 --> 11:33.270
dimension to 1536.

11:33.910 --> 11:39.830
I'll go with the serverless option because I don't want to handle scaling and all those concerns

11:39.830 --> 11:40.390
myself.

11:40.870 --> 11:45.790
And let me go with the default cloud provider here, which is going to be AWS.

11:47.470 --> 11:50.110
And I'll go also with the default region.

11:50.910 --> 11:53.230
Let me click Create.

11:57.990 --> 12:00.710
And my index is live and running.

12:00.710 --> 12:03.310
And we can use it now.

12:03.310 --> 12:04.990
Let's go back to the code.

12:06.710 --> 12:14.110
As a reminder, when we created the index, we gave it a dimension that matches

12:14.110 --> 12:17.670
the text-embedding-3-small embedding size.
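
NOTE A small guard against mismatched dimensions, as a hypothetical helper. The 1536 value for text-embedding-3-small matches the index created in this video; Pinecone rejects vectors whose length differs from the index dimension, so checking up front fails before anything is embedded.
```python
# Only the model used in this video; extend the mapping for other models.
EMBEDDING_DIMENSIONS = {"text-embedding-3-small": 1536}
def check_index_dimension(model: str, index_dimension: int) -> None:
    # Raise early instead of discovering the mismatch on the first upsert.
    expected = EMBEDDING_DIMENSIONS[model]
    if index_dimension != expected:
        raise ValueError(f"Index dimension {index_dimension} does not match {model} ({expected})")
```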

12:17.710 --> 12:25.630
And here I'm initializing the rest of the objects: TavilyCrawl, which is going to take the URL and get

12:25.670 --> 12:27.070
the documentation from it.

12:27.630 --> 12:35.590
And also TavilyExtract and TavilyMap, for an alternative, more manual way of scraping and crawling the

12:35.750 --> 12:37.150
LangChain documentation.

12:37.150 --> 12:39.510
But that's for the optional video.

12:40.270 --> 12:42.510
Ah wow, this video was pretty long.

12:42.750 --> 12:45.510
We are done with all the initialization.

12:45.830 --> 12:52.110
Let's run everything to check that it all works and we don't get any initialization

12:52.230 --> 12:52.590
errors.

12:58.710 --> 13:02.790
And we can see it ran successfully and we didn't get any error.

13:03.470 --> 13:04.110
Amazing.