1
00:00:06,100 --> 00:00:16,420
In this lesson, we are going to see the steps we will follow to create a multimodal LLM application

2
00:00:16,420 --> 00:00:18,850
using LangChain.

3
00:00:20,730 --> 00:00:24,720
So these are just the conceptual steps.

4
00:00:24,870 --> 00:00:31,770
And in the following lesson, the following video, we are going to see this in practice.

5
00:00:32,369 --> 00:00:41,250
You will see in these steps that some of them are familiar to you because they are similar to our traditional

6
00:00:41,250 --> 00:00:46,440
RAG technique, but some of them are also new and different.

7
00:00:46,710 --> 00:00:53,640
So the first thing we are going to do, as always, is to load the necessary modules, and you will

8
00:00:53,640 --> 00:01:00,390
see that we are going to use some new modules for this kind of application.

9
00:01:00,390 --> 00:01:08,910
And you will see that among all these new modules, one is especially important: a module

10
00:01:08,910 --> 00:01:10,830
called unstructured.

11
00:01:10,860 --> 00:01:19,080
You will see that we are going to use this module to extract the different multimodal elements of our

12
00:01:19,080 --> 00:01:19,710
document.

13
00:01:19,710 --> 00:01:28,770
So the unstructured module is able to extract the text elements, the table elements and the image

14
00:01:28,860 --> 00:01:31,230
elements in our project.

15
00:01:31,710 --> 00:01:36,000
We are going to use a sample PDF.

16
00:01:36,000 --> 00:01:37,410
A very simple one

17
00:01:37,560 --> 00:01:45,150
that we have created. It is like a financial statement with text, tables, and images.

18
00:01:45,840 --> 00:01:54,120
We will extract the text, image, and table elements with partition_pdf, which is one component

19
00:01:54,120 --> 00:01:55,470
of the module.

20
00:01:56,010 --> 00:02:07,920
We will see the extracted images going into an output folder that we create. We will store

21
00:02:07,920 --> 00:02:14,640
the texts, the tables, and the encoded images in three Python lists.

22
00:02:14,640 --> 00:02:19,740
And you will see how we encode the images.

23
00:02:19,740 --> 00:02:25,770
We will summarize the text and the tables using ChatGPT 3.

24
00:02:25,770 --> 00:02:31,320
And then we will summarize the images using GPT-4 Vision.

25
00:02:32,770 --> 00:02:39,310
Finally, we will apply the regular RAG technique using a vector database,

26
00:02:39,700 --> 00:02:43,150
a new component called a docstore

27
00:02:43,150 --> 00:02:48,970
(we will see what this is), and a multi-vector retriever.

28
00:02:49,000 --> 00:02:51,130
This is also a new element for us.

29
00:02:51,130 --> 00:02:53,650
You will see that in the exercise.

30
00:02:53,650 --> 00:03:00,940
And finally, using this multi-vector retriever as context, and this is very important,

31
00:03:00,940 --> 00:03:06,490
we can now ask anything about our private multimodal PDF file.

32
00:03:06,490 --> 00:03:07,030
Okay.

33
00:03:07,030 --> 00:03:12,910
So these are just the steps we are going to follow in the next video.

34
00:03:12,910 --> 00:03:19,570
And in the next video you will be able to see the PDF document with the different

35
00:03:19,570 --> 00:03:20,200
elements.

36
00:03:20,200 --> 00:03:25,720
And you will also see a notebook with the code and detailed explanations.

37
00:03:25,720 --> 00:03:30,880
So don't worry, this is just a quick summary of the steps we are going to follow.

38
00:03:30,880 --> 00:03:37,990
But in the notebook you have these steps and also detailed explanations about all of them.

39
00:03:37,990 --> 00:03:47,410
Okay, so in the next lesson we are going to see this project in practice. It is super interesting,

40
00:03:47,410 --> 00:03:50,320
this next video.