WEBVTT

00:00.320 --> 00:08.320
The Document class is one of LangChain's core building blocks, and it serves as a standard container

00:08.320 --> 00:09.920
for handling text.

00:10.160 --> 00:16.320
And it's designed to package a piece of text along with contextual information.

00:16.880 --> 00:20.400
So every document object consists of two main parts.

00:20.600 --> 00:27.320
We have the page content, and this holds the actual text like a paragraph from an article or a page

00:27.320 --> 00:29.440
from a PDF or any other text.

00:30.200 --> 00:32.320
And we have the metadata.

00:32.480 --> 00:38.800
And this is a dictionary that stores additional details about the text like the source where it's coming

00:38.800 --> 00:45.880
from, maybe the file name, or a URL, or a page number, or some custom tags that we want

00:45.880 --> 00:46.560
to add.
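
NOTE
A minimal sketch of the two parts just described, page content plus a metadata dictionary.
It assumes the class is importable as langchain_core.documents.Document and falls back to a
small stand-in dataclass when LangChain is not installed. The file name, page number, and
tags are hypothetical values chosen for illustration.
```python
from dataclasses import dataclass, field
try:
    from langchain_core.documents import Document  # the real class, if installed
except ImportError:
    @dataclass
    class Document:  # minimal stand-in so the sketch runs without LangChain
        page_content: str
        metadata: dict = field(default_factory=dict)
# page_content holds the text; metadata holds details about where it came from.
doc = Document(
    page_content="LangChain packages text together with context.",
    metadata={"source": "intro.pdf", "page": 1, "tags": ["example"]},  # hypothetical values
)
print(doc.page_content)
print(doc.metadata["source"], doc.metadata["page"])
```
The metadata is an ordinary dictionary, so any custom keys can be added next to source and page.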

00:46.960 --> 00:52.440
And this structure is super important for our LangChain RAG workflows.

00:52.640 --> 00:58.520
So when we load data from the various sources LangChain supports, and it supports tons of integrations,

00:58.920 --> 01:02.360
it's converted into a list of Document objects.

01:02.560 --> 01:08.920
And these can be broken down into smaller chunks later, which are themselves Document instances.

01:09.400 --> 01:17.840
So the attached metadata is super useful when we want to enable some filtering or retrieval logic later

01:17.880 --> 01:22.000
in the pipeline, which is crucial for advanced applications.
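
NOTE
A rough sketch of the chunking and filtering idea, under stated assumptions: split_into_chunks
is a hypothetical helper that does a naive fixed-size character split, not one of LangChain's
own text splitters, and the sources a.pdf and b.txt are made-up examples. Each chunk is itself
a Document that carries a copy of its parent's metadata, which is what makes filtering possible.
```python
from dataclasses import dataclass, field
try:
    from langchain_core.documents import Document  # the real class, if installed
except ImportError:
    @dataclass
    class Document:  # minimal stand-in so the sketch runs without LangChain
        page_content: str
        metadata: dict = field(default_factory=dict)
def split_into_chunks(doc, size):
    # Hypothetical helper: naive fixed-size character split. Every chunk is a
    # Document that inherits the parent's metadata plus its own chunk index.
    return [
        Document(page_content=doc.page_content[i:i + size],
                 metadata={**doc.metadata, "chunk": i // size})
        for i in range(0, len(doc.page_content), size)
    ]
# Two loaded documents from hypothetical sources.
docs = [
    Document(page_content="a" * 25, metadata={"source": "a.pdf"}),
    Document(page_content="b" * 10, metadata={"source": "b.txt"}),
]
chunks = [c for d in docs for c in split_into_chunks(d, 10)]
# Filter chunks by the metadata they inherited from their parent document.
from_pdf = [c for c in chunks if c.metadata["source"] == "a.pdf"]
print(len(chunks), len(from_pdf))
```
In a real pipeline the same metadata keys would typically drive filtering at retrieval time rather than a list comprehension.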
