1
00:00:05,570 --> 00:00:13,850
In this lesson, we are going to talk about what I would say is a very good question, which is: if

2
00:00:13,850 --> 00:00:17,090
GPT-4 Vision is so good,

3
00:00:17,780 --> 00:00:22,850
why do we need multimodal LLM applications?

4
00:00:24,160 --> 00:00:25,030
And.

5
00:00:26,640 --> 00:00:34,680
In order to talk about this, we need to remember the limitations of the foundation LLMs.

6
00:00:34,680 --> 00:00:44,400
We talked about that in one of the initial lessons of part two of the program.

7
00:00:44,400 --> 00:00:53,430
So, if you remember, foundation LLMs like ChatGPT, for example, or Anthropic's models, or Falcon,

8
00:00:53,430 --> 00:01:00,300
Mistral, etc., have a limited context window, which means limited memory.

9
00:01:00,630 --> 00:01:06,150
And they have limitations regarding data privacy, data security, etc.

10
00:01:06,330 --> 00:01:12,120
Do you remember all these limitations of the regular foundation LLMs?

11
00:01:12,120 --> 00:01:21,810
So with the multimodal foundation LLMs, we have the same limitations.

12
00:01:21,810 --> 00:01:24,180
We have a limited context window.

13
00:01:24,210 --> 00:01:28,650
We have a problem with data privacy, we have a problem with security, etc.

14
00:01:28,650 --> 00:01:32,190
So the same companies that

15
00:01:33,200 --> 00:01:42,200
did not trust ChatGPT with their text because they have concerns about privacy, security, memory,

16
00:01:42,200 --> 00:01:50,210
etc., are going to be the same companies that are not going to trust GPT-4 Vision with their images

17
00:01:50,210 --> 00:02:00,200
or tables, for exactly the same reasons: privacy, security, memory length, etc. And this is

18
00:02:00,200 --> 00:02:10,669
very good for us, because this allows us to build multimodal LLM applications that can solve all these

19
00:02:10,669 --> 00:02:19,190
problems that GPT-4 Vision has for many potential customers, especially big companies, but also

20
00:02:19,190 --> 00:02:27,740
mid-sized companies, public administrations, and any kind of organization or professional that has

21
00:02:27,740 --> 00:02:30,170
a concern about these matters.

22
00:02:30,170 --> 00:02:43,850
Okay, so the most common use case would be to ask questions about private multimodal documents.

23
00:02:43,850 --> 00:02:44,660
Okay.

24
00:02:44,660 --> 00:02:53,930
So in your company, you want to use a multimodal LLM with your

25
00:02:54,660 --> 00:03:02,700
multimodal documents, with your PowerPoint presentations, your PDF documents, every

26
00:03:02,700 --> 00:03:03,960
kind of document.

27
00:03:03,960 --> 00:03:13,650
But you have a problem with privacy, security, memory length, etc. That's why we are going to use

28
00:03:14,160 --> 00:03:21,870
multimodal LLM applications like the one we are going to see in the example at the end of this block.

29
00:03:24,490 --> 00:03:36,460
So the first thing we are going to see, before creating a multimodal LLM app, is going to be the alternative

30
00:03:36,460 --> 00:03:42,670
ways we can use to create a multimodal LLM application today.

31
00:03:42,850 --> 00:03:50,290
And it is very important to understand how accurate these alternative ways are,

32
00:03:50,290 --> 00:03:58,810
in order to focus our attention on the most efficient and most accurate method of all.

33
00:03:59,830 --> 00:04:02,230
We'll see that in the next lesson.

