1
00:00:03,230 --> 00:00:03,620
Okay.

2
00:00:03,620 --> 00:00:09,560
So here you have our multimodal PDF.

3
00:00:10,100 --> 00:00:18,230
This is a fake financial statement we have created; as you can see, a very simple one.

4
00:00:18,230 --> 00:00:27,380
But we have included the different elements that we need in order to test our multimodal LLM application.

5
00:00:27,380 --> 00:00:29,300
We have images.

6
00:00:29,510 --> 00:00:31,400
We have text.

7
00:00:32,159 --> 00:00:39,270
We have tables, we have diagrams, and you will see how impressive the results are.

8
00:00:39,450 --> 00:00:47,790
And the LLM application, the multimodal LLM application, is here: on the right side of the screen you have the notebook

9
00:00:47,790 --> 00:00:49,680
that you will be able to download.

10
00:00:49,680 --> 00:00:54,240
You are also going to have this PDF available.

11
00:00:54,240 --> 00:01:03,870
In the notebook you can see detailed explanations and also the code that we are using for this

12
00:01:03,870 --> 00:01:04,440
exercise.

13
00:01:04,440 --> 00:01:13,230
So remember, we are going to do multimodal RAG with GPT-4 Vision and LangChain, okay.

14
00:01:13,230 --> 00:01:23,850
So this is going to be a multimodal LLM application that is using the RAG technique with GPT-4 Vision.

15
00:01:24,150 --> 00:01:29,100
Also GPT-3 and LangChain as the orchestration framework.

16
00:01:29,190 --> 00:01:34,830
LangChain is doing a very good job right now with multimodal LLM applications.

17
00:01:35,800 --> 00:01:36,100
First.

18
00:01:36,100 --> 00:01:38,620
We include here a short introduction.

19
00:01:38,620 --> 00:01:42,160
When do we need multimodal RAG?

20
00:01:42,970 --> 00:01:46,360
Standard RAG works with text-only files.

21
00:01:46,360 --> 00:01:52,600
But what if we want to use RAG with PDFs or slides that have text, images and tables?

22
00:01:52,600 --> 00:01:55,060
Then we use multimodal RAG.

23
00:01:55,090 --> 00:01:55,810
Okay.

24
00:01:55,810 --> 00:01:57,910
So these are the different steps.

25
00:01:57,910 --> 00:02:04,000
We have already talked about these in the previous slide.

26
00:02:04,000 --> 00:02:05,500
But here you have them.

27
00:02:05,560 --> 00:02:10,300
And these are the modules we are going to install.

28
00:02:10,300 --> 00:02:21,070
Remember that you will need to remove the pound sign in order to execute these cells

29
00:02:21,070 --> 00:02:22,270
in your computer.

30
00:02:22,270 --> 00:02:27,850
We use the pound sign here because we don't want to execute the

31
00:02:27,850 --> 00:02:29,020
cell again, okay.

32
00:02:29,020 --> 00:02:35,620
So the first thing, as always, is to create a virtual environment.

33
00:02:35,620 --> 00:02:36,070
Okay.

34
00:02:36,070 --> 00:02:43,990
After you have created the virtual environment, you install these modules: LangChain, OpenAI,

35
00:02:44,020 --> 00:02:48,040
Pydantic, lxml, ChromaDB.

36
00:02:48,070 --> 00:02:51,970
That is the vector database. Then tiktoken, in order to manage the tokens.

37
00:02:51,970 --> 00:02:53,860
And this is the important part.

38
00:02:53,860 --> 00:02:58,840
We are installing the unstructured module.

39
00:02:59,600 --> 00:03:06,170
In this case, we are installing the full version of the module, with the syntax you see

40
00:03:06,170 --> 00:03:06,770
here.

41
00:03:08,060 --> 00:03:11,360
So the unstructured module is the key.

42
00:03:11,720 --> 00:03:17,840
We will use it to extract all the relevant parts of the document, the text, the tables and the images.

43
00:03:17,840 --> 00:03:22,760
And ChromaDB will be our vector store or vector database.

44
00:03:22,760 --> 00:03:27,800
Remember, vector store and vector database are synonyms, okay.

45
00:03:28,920 --> 00:03:30,930
So this is important.

46
00:03:30,930 --> 00:03:39,630
In order to use the unstructured module, you will need to install two other modules, the Tesseract

47
00:03:39,630 --> 00:03:42,090
module and the Poppler module.

48
00:03:42,210 --> 00:03:48,450
Here you have the instructions to install these modules on your computer,

49
00:03:48,450 --> 00:03:56,700
if you are using a Mac, like we do, but also if you are using a different system like Windows

50
00:03:56,700 --> 00:03:57,870
or Linux or whatever.

51
00:03:57,870 --> 00:04:05,880
So these are the instructions, the steps to follow in order to install Tesseract and Poppler on your

52
00:04:05,880 --> 00:04:07,500
Mac with Homebrew.

53
00:04:07,500 --> 00:04:15,300
And if you are using a different system, here you have the links with instructions to do that

54
00:04:15,300 --> 00:04:17,010
on a different system.
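Before running the extraction, a quick sanity check is to confirm that the Tesseract and Poppler executables are reachable on your PATH. This is a minimal sketch, not part of the notebook; it assumes Tesseract installs a `tesseract` binary and Poppler installs `pdftoppm` and `pdfinfo`, which is the usual case for Homebrew and Linux packages.

```python
import shutil

# Executables that the unstructured PDF pipeline relies on (assumed names):
# "tesseract" comes from Tesseract OCR, "pdftoppm" and "pdfinfo" from Poppler.
REQUIRED_BINARIES = {
    "tesseract": "Tesseract OCR",
    "pdftoppm": "Poppler",
    "pdfinfo": "Poppler",
}


def check_system_deps(binaries=REQUIRED_BINARIES):
    """Return a dict mapping each executable name to True if it is on PATH."""
    return {name: shutil.which(name) is not None for name in binaries}


if __name__ == "__main__":
    for name, found in check_system_deps().items():
        status = "found" if found else "MISSING"
        print(f"{name:10s} ({REQUIRED_BINARIES[name]}): {status}")
```

If anything shows as missing, revisit the Homebrew (or other system) install step before running the notebook.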

55
00:04:17,010 --> 00:04:17,550
Okay.

56
00:04:18,060 --> 00:04:24,960
So here we say we will use a fake PDF file with text, tables and images.

57
00:04:24,960 --> 00:04:29,040
So this is the name of this PDF file you see here.

58
00:04:29,400 --> 00:04:35,670
And we describe what we are going to do in the code, okay.

59
00:04:35,670 --> 00:04:41,730
So this is an explanation of the steps you have here in the code.

60
00:04:41,730 --> 00:04:42,420
Okay.

61
00:04:42,990 --> 00:04:43,230
Uh.

62
00:04:44,120 --> 00:04:47,030
In summary, here is what we are doing.

63
00:04:48,020 --> 00:04:49,160
This is important.

64
00:04:49,160 --> 00:05:02,270
We are telling LangChain the name of the output folder where we want to save the images that unstructured

65
00:05:02,390 --> 00:05:03,200
extracts.

66
00:05:03,230 --> 00:05:10,970
Okay, so the unstructured module is going to come to this file and grab this image,

67
00:05:10,970 --> 00:05:19,550
and it's going to save the extracted image in the output path we are defining here. We are

68
00:05:19,550 --> 00:05:22,760
calling the output path "figures", okay.

69
00:05:22,760 --> 00:05:31,550
So this cell is going to create a new folder in our root directory called figures.

70
00:05:31,550 --> 00:05:40,970
And in this figures folder it is going to save the image elements that the unstructured module is

71
00:05:40,970 --> 00:05:42,740
going to extract.

72
00:05:42,740 --> 00:05:43,310
Okay.

73
00:05:44,150 --> 00:05:50,750
So this is where we configure this extraction, okay.

74
00:05:50,750 --> 00:06:03,980
So we say: okay, this is the name of the PDF file that you have to look for in the root directory

75
00:06:03,980 --> 00:06:05,750
in order to extract your elements.

76
00:06:05,750 --> 00:06:14,060
So you will need to have this PDF file in the same directory where you have this notebook,

77
00:06:14,060 --> 00:06:14,570
okay.

78
00:06:14,570 --> 00:06:18,650
And here is where we are configuring this extraction.

79
00:06:18,650 --> 00:06:22,610
We say: okay, extract images and also tables.

80
00:06:22,610 --> 00:06:26,360
And these are the limits we are defining, etc.

81
00:06:26,360 --> 00:06:34,850
You can learn more about all this in the unstructured module documentation.

82
00:06:34,850 --> 00:06:40,010
But these are very standard configurations, okay.

83
00:06:40,010 --> 00:06:49,160
So with this cell we are extracting the table, text and image elements of this PDF.
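The extraction cell described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's exact code: the PDF file name is hypothetical, and the keyword arguments follow the LangChain multimodal cookbook; newer unstructured releases rename some of them (for example the image output directory parameter), so check the documentation for your installed version. The import is done inside the function so the module loads even where unstructured is not installed.

```python
from pathlib import Path

PDF_NAME = "financial_statement.pdf"  # hypothetical file name for this sketch
OUTPUT_PATH = Path("figures")         # folder where extracted images are saved


def extract_pdf_elements(pdf_name=PDF_NAME, output_path=OUTPUT_PATH):
    """Partition a PDF into text, table and image elements with unstructured."""
    # Lazy import: requires `unstructured` plus Tesseract and Poppler.
    from unstructured.partition.pdf import partition_pdf

    output_path.mkdir(exist_ok=True)  # creates the figures folder if needed
    return partition_pdf(
        filename=str(Path.cwd() / pdf_name),  # PDF sits next to the notebook
        extract_images_in_pdf=True,           # save image elements to disk
        infer_table_structure=True,           # keep table layout for tables
        chunking_strategy="by_title",         # group text under section titles
        max_characters=4000,                  # hard cap on chunk size
        new_after_n_chars=3800,               # soft cap: start a new chunk
        combine_text_under_n_chars=2000,      # merge very small chunks
        image_output_dir_path=str(output_path),  # older parameter name (assumed)
    )
```

Calling `raw_pdf_elements = extract_pdf_elements()` would then return the list of elements that the next cells inspect.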

84
00:06:50,900 --> 00:07:01,280
Then we can see that we have stored all these elements in this variable, raw_pdf_elements.

85
00:07:01,340 --> 00:07:09,410
If we print what we have here, you will see that we have the text.

86
00:07:10,230 --> 00:07:13,830
Two text elements: this one and this one.

87
00:07:13,830 --> 00:07:16,380
And we have one table element.

88
00:07:16,380 --> 00:07:23,670
And if you look in your directory, you will see that we have also created the figures folder.

89
00:07:23,670 --> 00:07:28,650
And in the figures folder you already have the extracted images.

90
00:07:28,650 --> 00:07:33,930
And remember that in this particular PDF file:

91
00:07:34,740 --> 00:07:36,060
This is an image.

92
00:07:36,060 --> 00:07:37,470
This is an image.

93
00:07:37,470 --> 00:07:38,880
This is an image.

94
00:07:38,880 --> 00:07:40,230
This is an image.

95
00:07:40,230 --> 00:07:42,780
This is an image and this is an image.

96
00:07:42,780 --> 00:07:43,170
Okay.

97
00:07:43,170 --> 00:07:45,810
So I think we have eight images.

98
00:07:46,710 --> 00:07:52,470
So then we want to extract the relevant information.

99
00:07:52,470 --> 00:07:59,910
So we want to store the text, table and image elements in three Python lists.

100
00:08:00,800 --> 00:08:03,710
We cannot send the images as they are.

101
00:08:03,710 --> 00:08:10,820
We need to encode them as base64 text, and we will use the base64 module for that.

102
00:08:11,570 --> 00:08:16,880
And for the text and table elements, we will loop to add them to their lists.

103
00:08:16,880 --> 00:08:25,460
Okay, so we import base64 in order to encode the images as base64 strings.

104
00:08:25,460 --> 00:08:31,250
And then we create the text, table and image element

105
00:08:31,400 --> 00:08:32,299
Python lists,

106
00:08:32,299 --> 00:08:43,820
initially empty. We encode the images using base64, and then we loop through the text and table elements

107
00:08:43,820 --> 00:08:47,900
in order to create the two Python lists.

108
00:08:47,930 --> 00:08:54,200
Okay, we call the list for the text elements text_elements, and table_elements

109
00:08:54,200 --> 00:08:56,690
is the list for the table elements.

110
00:08:57,530 --> 00:09:01,070
Then what we are doing is extracting just the text.

111
00:09:01,070 --> 00:09:03,680
We don't want to store the raw classes.

112
00:09:03,680 --> 00:09:05,690
So this is how we do it.

113
00:09:05,690 --> 00:09:11,480
And then we can print the number of table elements and text elements we have.

114
00:09:11,480 --> 00:09:15,620
So as you can see we have one table in this PDF file.

115
00:09:15,620 --> 00:09:16,850
This is the table.

116
00:09:16,850 --> 00:09:21,200
And we have two text elements in this PDF file.

117
00:09:21,200 --> 00:09:25,940
So this is one text element and this is the second text element.
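The sorting step just described (loop over the elements, keep only their text, and put tables and text into separate lists) can be sketched like this. The two classes are stand-ins so the example runs without unstructured installed; with `chunking_strategy="by_title"`, unstructured typically returns `CompositeElement` objects for text chunks and `Table` objects for tables, but the sample strings are placeholders, not content from the PDF.

```python
class _Element:
    """Stand-in for an unstructured element: wraps text, str() returns it."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


class CompositeElement(_Element):
    pass


class Table(_Element):
    pass


def split_elements(raw_pdf_elements):
    """Sort elements into text and table lists, keeping only their text."""
    text_elements, table_elements = [], []
    for element in raw_pdf_elements:
        kind = type(element).__name__
        if kind == "Table":
            table_elements.append(str(element))  # store text, not the class
        elif kind == "CompositeElement":
            text_elements.append(str(element))
    return text_elements, table_elements


# Placeholder elements mimicking this PDF: two text chunks and one table.
raw_pdf_elements = [
    CompositeElement("Company overview ..."),
    Table("Revenue | Expenses ..."),
    CompositeElement("Outlook ..."),
]
text_elements, table_elements = split_elements(raw_pdf_elements)
print(len(table_elements), len(text_elements))
```

Checking the class name rather than using `isinstance` keeps the sketch independent of the library's import paths.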

118
00:09:27,780 --> 00:09:29,340
Regarding the images.

119
00:09:29,340 --> 00:09:34,050
They are currently stored in the figures folder.

120
00:09:35,810 --> 00:09:39,710
So if you check in the figures folder, you will see the extracted images.

121
00:09:39,920 --> 00:09:50,270
We will loop through that folder and check if the image file ends with .png or .jpg, and then we will

122
00:09:50,270 --> 00:09:59,960
provide the full path to the encode_image function in order to encode it in base64 format.

123
00:10:00,200 --> 00:10:06,080
And then we will append the encoded result to the image list.

124
00:10:06,170 --> 00:10:06,860
Okay.

125
00:10:07,340 --> 00:10:08,690
Encoded result.
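The image loop described above can be sketched with the standard library alone. The `encode_image` name comes from the video; `encode_bytes`, `is_image_file` and `encode_folder` are hypothetical helpers for this sketch, and accepting `.jpeg` in addition to `.png` and `.jpg` is an assumption. Note that base64 turns binary image data into ASCII text, which is what lets the image travel inside a text request.

```python
import base64
import os

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")  # .jpeg added as an assumption


def encode_bytes(data):
    """Return the base64 text for raw bytes."""
    return base64.b64encode(data).decode("utf-8")


def encode_image(image_path):
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return encode_bytes(image_file.read())


def is_image_file(filename):
    """True if the file name ends with a supported image extension."""
    return filename.lower().endswith(IMAGE_EXTENSIONS)


def encode_folder(output_path="figures"):
    """Loop over the figures folder and base64-encode every image in it."""
    image_elements = []
    for image_file in sorted(os.listdir(output_path)):
        if is_image_file(image_file):
            image_path = os.path.join(output_path, image_file)  # full path
            image_elements.append(encode_image(image_path))
    return image_elements
```

Sorting the file names keeps the order of `image_elements` stable between runs.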

126
00:10:09,390 --> 00:10:12,480
The following cell may take a few seconds to run.

127
00:10:12,600 --> 00:10:16,170
In my case, it was pretty

128
00:10:17,030 --> 00:10:23,540
fast, but consider that this is a very simple and short PDF file.

129
00:10:23,540 --> 00:10:29,720
So the longer your file, the slower this cell is going to run

130
00:10:29,720 --> 00:10:34,610
when you apply this code to other projects.

131
00:10:35,480 --> 00:10:39,440
So this is where

132
00:10:39,440 --> 00:10:40,520
we are

133
00:10:41,360 --> 00:10:44,510
doing all these operations, in this cell.

134
00:10:44,510 --> 00:10:50,450
And if we print the number of image elements, you will see that we have eight image elements.

135
00:10:50,450 --> 00:10:51,020
So this is

136
00:10:51,020 --> 00:10:57,500
1, 2, 3, 4, 5, 6. I

137
00:10:57,500 --> 00:10:58,670
don't know what the other ones are.

138
00:10:58,670 --> 00:11:00,320
Probably these ones are images.

139
00:11:01,220 --> 00:11:11,210
So now we can create three functions to summarize the text, the tables and the images. For the text and

140
00:11:11,210 --> 00:11:11,930
the tables,

141
00:11:11,930 --> 00:11:19,520
the functions are very similar, but for the images we are going to use GPT-4 Vision.

142
00:11:19,520 --> 00:11:25,370
So this is the main difference with images in the application.

143
00:11:25,370 --> 00:11:32,120
And it is important to note this and pay attention to how we set the URL.

144
00:11:32,120 --> 00:11:33,380
We will see this later.

145
00:11:33,380 --> 00:11:36,800
So first we install langchain-openai.

146
00:11:37,640 --> 00:11:43,940
Remember to put your OpenAI API key in the .env file.

147
00:11:43,940 --> 00:11:53,750
So you will have to place your .env file with your OpenAI API key in the root directory of this project.

148
00:11:54,610 --> 00:11:55,570
With this.

149
00:11:55,570 --> 00:11:57,340
You are familiar with this cell.

150
00:11:57,460 --> 00:12:02,830
And what we are getting here is the OpenAI key that is in the .env file.
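Notebooks like this one usually read the `.env` file with the python-dotenv package (`load_dotenv`). To show what that step does, here is a minimal standard-library stand-in; it is not python-dotenv itself and handles only simple `KEY=VALUE` lines. Like `load_dotenv` by default, it does not overwrite variables that are already set.

```python
import os


def load_env_text(text, environ=None):
    """Parse KEY=VALUE lines into environ (os.environ by default)."""
    environ = os.environ if environ is None else environ
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, value = line.split("=", 1)
        # Strip surrounding quotes; keep any value that is already set.
        environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))
    return environ


def load_env_file(path=".env", environ=None):
    """Read a .env file from disk and load it with load_env_text."""
    with open(path, encoding="utf-8") as env_file:
        return load_env_text(env_file.read(), environ)


# Typical use in the notebook's root directory:
# load_env_file()
# openai_api_key = os.environ["OPENAI_API_KEY"]
```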

151
00:12:02,830 --> 00:12:04,000
And then.

152
00:12:04,580 --> 00:12:12,080
We proceed with the creation of three functions to summarize the text, the table and the images, and

153
00:12:12,080 --> 00:12:20,000
you will see that the ones for text and tables are very similar, and we are already familiar with them.

154
00:12:20,000 --> 00:12:26,780
But if you go to the third one, which summarizes images, you see that we are using

155
00:12:27,750 --> 00:12:29,340
GPT-4.

156
00:12:29,370 --> 00:12:38,400
Okay, so here we are creating two different chains using GPT-3 and GPT-4.

157
00:12:38,400 --> 00:12:44,340
And you see we are loading both here, okay.
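The split between the two chains can be pictured as a small router: text and tables share one prompt and the text model, while images go to the vision model. In this sketch the models are passed in as plain callables so it runs without any API key; the prompt wording for text and tables is an assumption, while the image prompt paraphrases the one described in the video.

```python
TEXT_TABLE_PROMPT = (
    "You are an assistant tasked with summarizing tables and text. "
    "Give a concise summary of the following {kind}:\n\n{content}"
)
IMAGE_PROMPT = (
    "You are a bot that is good at analyzing images. "
    "Describe the contents of this image."
)


def summarize_element(kind, content, text_llm, vision_llm):
    """Route an element to the right model.

    text_llm(prompt) stands in for the GPT-3 chain; vision_llm(prompt,
    image_b64) stands in for the GPT-4 Vision call. Both are hypothetical.
    """
    if kind in ("text", "table"):
        return text_llm(TEXT_TABLE_PROMPT.format(kind=kind, content=content))
    if kind == "image":
        return vision_llm(IMAGE_PROMPT, content)  # content is base64 text
    raise ValueError(f"unknown element kind: {kind!r}")
```

Keeping the routing explicit also makes the cost difference visible: only the image branch touches the more expensive vision model.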

158
00:12:44,340 --> 00:12:54,090
So in this case the way we proceed in order to get a summary of each of the images is a little bit different.

159
00:12:54,090 --> 00:13:00,720
We are communicating with GPT-4 Vision, and we are telling it, you are a bot that is good

160
00:13:00,720 --> 00:13:02,310
at analyzing images.

161
00:13:02,310 --> 00:13:08,280
And what we want is for you to describe the contents of this image, okay?

162
00:13:08,280 --> 00:13:14,730
And this is the important thing to keep in mind: the way we are

163
00:13:15,990 --> 00:13:18,630
setting the URL of the image.

164
00:13:18,630 --> 00:13:19,260
Okay.

165
00:13:19,740 --> 00:13:21,600
So pay attention.

166
00:13:21,750 --> 00:13:31,980
We are saying, okay, this is an image, and remember to include the base64-encoded image, etc.
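The "URL" here is a data URL: instead of pointing to a web address, it embeds the base64 image inline as `data:<mime type>;base64,<payload>`. This sketch builds the message content in the text-plus-image_url shape used by OpenAI's chat API for vision models; in LangChain the list is typically wrapped as `HumanMessage(content=...)` before calling the model. The `build_image_message` helper and its defaults are assumptions for illustration, and the MIME type should match the actual file.

```python
def build_image_message(image_b64, prompt="Describe the contents of this image.",
                        mime="image/jpeg"):
    """Build the user-message content list for a vision chat model.

    The image travels inline as a data URL: "data:<mime>;base64,<payload>".
    """
    return [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{image_b64}"},
        },
    ]


# Hypothetical usage with LangChain (needs langchain-core and an API key):
# from langchain_core.messages import HumanMessage
# message = HumanMessage(content=build_image_message(image_b64))
```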

167
00:13:31,980 --> 00:13:32,700
Okay.

168
00:13:32,880 --> 00:13:40,890
So once we have that, we will create a summary for each text, table and image element.

169
00:13:40,890 --> 00:13:45,810
The following cells will also take some time to run.

170
00:13:46,260 --> 00:13:54,000
Be careful, because GPT-4 Vision is significantly more expensive than the regular GPT models.

171
00:13:54,000 --> 00:14:01,440
So this exercise is not going to be expensive to run because this is a very simple document.

172
00:14:01,440 --> 00:14:11,070
But once you start using GPT-4 Vision with long documents, be careful, because GPT-4

173
00:14:11,070 --> 00:14:13,560
Vision is more expensive.

174
00:14:13,710 --> 00:14:16,380
It also has

175
00:14:17,070 --> 00:14:20,280
some stability issues at times.

176
00:14:20,280 --> 00:14:26,820
OpenAI says that they are getting better and better at that.

177
00:14:26,820 --> 00:14:28,620
But pay attention to the cost.

178
00:14:28,620 --> 00:14:34,500
Pay attention also to the stability of the GPT-4 Vision API.

179
00:14:35,620 --> 00:14:43,300
So in order to save time and tokens, we will only summarize the first two elements of each kind.

180
00:14:43,330 --> 00:14:48,340
Well, this is not quite true, because we can remove this limit.

181
00:14:49,480 --> 00:14:50,500
In this exercise,

182
00:14:50,500 --> 00:14:52,690
we have processed all of them.

183
00:14:54,120 --> 00:14:55,050
So.

184
00:14:56,130 --> 00:15:00,120
Here we are processing the text elements.

185
00:15:01,530 --> 00:15:11,520
What we have included here is just a way to stop at the element you want.

186
00:15:11,520 --> 00:15:20,670
So let's say, for example, you have 20 text elements or 20 images, and you are just

187
00:15:20,670 --> 00:15:25,470
running an MVP, a sample or whatever.

188
00:15:25,470 --> 00:15:35,070
You are just trying out your application or showing it to a review team, or whatever.

189
00:15:35,070 --> 00:15:40,920
If you don't want to spend a lot of time or a lot of money on the application, you can limit the

190
00:15:40,920 --> 00:15:43,530
number of elements you process.

191
00:15:43,530 --> 00:15:50,640
In this case, we are going to process all the text elements we have, which are two, and all the tables

192
00:15:50,640 --> 00:15:55,260
we have, which is one, and all the images we have, which are eight.
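The "stop at the element you want" idea is just a slice before the summarization loop. Here is a minimal sketch; `summarize_elements` and its `limit` parameter are hypothetical names, and `summarize` stands in for whichever chain (text, table or vision) you would call.

```python
def summarize_elements(elements, summarize, limit=None):
    """Summarize elements in order, stopping after `limit` of them.

    limit=None processes everything (what this exercise does); a small
    limit keeps a demo or MVP cheap when a document has many elements.
    """
    selected = elements if limit is None else elements[:limit]
    return [summarize(element) for element in selected]
```

For example, `summarize_elements(image_elements, summarize_image, limit=2)` would send only the first two images to the vision model, where `summarize_image` is whatever function wraps your vision call.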

193
00:15:55,260 --> 00:15:55,980
Okay.

194
00:15:56,430 --> 00:16:03,150
So we do that, and after creating the summaries we can now proceed with the

195
00:16:03,760 --> 00:16:06,280
more or less typical RAG technique.

196
00:16:06,280 --> 00:16:06,640
Okay.

197
00:16:06,640 --> 00:16:12,850
So this is the important part, or one of the most important parts:

198
00:16:12,850 --> 00:16:14,800
we are going to use

199
00:16:15,960 --> 00:16:26,460
LangChain's MultiVectorRetriever to store all our documents, the summaries and also the embeddings

200
00:16:26,460 --> 00:16:28,560
in a vector database.

201
00:16:28,560 --> 00:16:35,580
So here you have a link to the MultiVectorRetriever documentation page

202
00:16:35,580 --> 00:16:36,510
in LangChain.

203
00:16:36,510 --> 00:16:43,230
There you will see interesting information and different examples.

204
00:16:43,500 --> 00:16:48,630
We will use Chroma for our vector database.

205
00:16:48,630 --> 00:16:57,150
And we will use a doc store to store the raw documents, the original documents. For that, we will use

206
00:16:57,150 --> 00:16:59,340
InMemoryStore from LangChain.

207
00:16:59,340 --> 00:16:59,610
Okay.

208
00:16:59,610 --> 00:17:10,230
This is also a new part of our multimodal RAG. We will provide the ID key later.

209
00:17:10,230 --> 00:17:12,150
Now it is just a string.

210
00:17:12,329 --> 00:17:15,329
You will see that in the code later.

211
00:17:15,480 --> 00:17:21,000
So here we are explaining everything that you are going to see in the code.

212
00:17:21,000 --> 00:17:21,329
Okay.

213
00:17:21,329 --> 00:17:24,630
So these are the steps and this is the code.

214
00:17:25,170 --> 00:17:26,339
So.

215
00:17:28,440 --> 00:17:29,730
Then we create.

216
00:17:29,730 --> 00:17:33,840
We will create the function to add documents to the retriever.

217
00:17:33,840 --> 00:17:42,510
We will create some IDs for our documents using the uuid4 function.

218
00:17:42,510 --> 00:17:50,970
Here you have an explanation of what this function is, what its purpose is in the exercise,

219
00:17:50,970 --> 00:17:57,150
etc. Then we will create a list of documents using the Document class.

220
00:17:58,710 --> 00:18:03,330
And then we add the documents to the vector database.

221
00:18:03,360 --> 00:18:06,900
We also store our raw documents in the doc store.

222
00:18:06,930 --> 00:18:11,790
Each raw document has the corresponding UUID.

223
00:18:11,820 --> 00:18:19,680
Okay, so as you can see, the connection between the vector database and the doc store is the

224
00:18:20,500 --> 00:18:21,280
IDs.

225
00:18:21,310 --> 00:18:23,170
Okay, so here you have it.
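To make the ID linkage concrete, here is a toy version of what adding documents to the multi-vector setup does. In the notebook, LangChain's MultiVectorRetriever with a Chroma vector store and an InMemoryStore doc store handles this; in this sketch a plain list and dict stand in for those stores, and `add_documents` is a hypothetical helper, not LangChain's method. Each original gets a `uuid4` ID, the summary is indexed with that ID in its metadata, and the original is stored under the same ID.

```python
import uuid

ID_KEY = "doc_id"  # the id_key string that links summaries to raw documents


def add_documents(vectorstore, docstore, summaries, originals):
    """Index summaries for search and store originals, linked by UUIDs.

    vectorstore: list of (summary_text, metadata) pairs, standing in for Chroma
    docstore:    dict of id -> original content, standing in for InMemoryStore
    """
    if len(summaries) != len(originals):
        raise ValueError("need exactly one summary per original document")
    doc_ids = [str(uuid.uuid4()) for _ in originals]
    for doc_id, summary in zip(doc_ids, summaries):
        vectorstore.append((summary, {ID_KEY: doc_id}))
    docstore.update(zip(doc_ids, originals))
    return doc_ids


vectorstore, docstore = [], {}
# Text and tables: the summary is indexed, the raw text goes to the doc store.
add_documents(vectorstore, docstore, ["summary of text"], ["raw text"])
# Images: as described in the video, the image summary goes to both stores.
add_documents(vectorstore, docstore, ["summary of image"], ["summary of image"])
```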

226
00:18:23,170 --> 00:18:26,020
We are importing the uuid module.

227
00:18:26,020 --> 00:18:36,010
And then we are using a more or less typical RAG with a couple of changes: the MultiVectorRetriever

228
00:18:36,010 --> 00:18:39,130
and also the InMemoryStore doc store.

229
00:18:39,160 --> 00:18:45,730
Okay, so here we are initializing a Chroma vector database and the doc store.

230
00:18:45,730 --> 00:18:48,970
We are initializing the MultiVectorRetriever.

231
00:18:48,970 --> 00:18:54,610
And here you have the function to add documents to the MultiVectorRetriever.

232
00:18:55,230 --> 00:19:01,170
Now we can add everything to the MultiVectorRetriever. For text and tables,

233
00:19:01,320 --> 00:19:06,330
summaries are stored in the vector database and raw documents are stored in the doc store.

234
00:19:06,420 --> 00:19:14,430
But for the images, the summaries are stored in the vector database, and the summaries, not the raw images,

235
00:19:14,430 --> 00:19:17,160
are also stored in the doc store.

236
00:19:17,190 --> 00:19:19,500
This is an important distinction.

237
00:19:19,500 --> 00:19:25,830
Here you have the raw documents for text and tables, but in the case of the images, what you have

238
00:19:25,830 --> 00:19:27,090
is the summary.

239
00:19:27,180 --> 00:19:27,930
Okay.

240
00:19:28,940 --> 00:19:35,810
So this is how we are adding everything into the MultiVectorRetriever.

241
00:19:35,810 --> 00:19:40,640
And after adding everything into the retriever, we can now retrieve the information.

242
00:19:40,640 --> 00:19:45,590
We can use this approach, but what we mostly want is to use this other approach.

243
00:19:45,590 --> 00:19:50,840
So we want to have the MultiVectorRetriever as the context.

244
00:19:50,840 --> 00:19:51,410
Okay.

245
00:19:51,410 --> 00:20:01,550
So when we configure our chain in order to get responses from our multimodal RAG application,

246
00:20:01,550 --> 00:20:07,730
when we define the context in the chain, we are going to use the MultiVectorRetriever.

247
00:20:07,730 --> 00:20:08,090
Okay.

248
00:20:08,090 --> 00:20:09,770
This is the main difference.
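What the retriever does as the chain's context can be sketched end to end without any services: match the question against the summaries, then hand back the linked raw documents from the doc store and put them in the prompt. Word overlap is a deliberately crude stand-in for embedding similarity, and the prompt wording is an assumption. In LangChain Expression Language, a typical text RAG chain has the shape `{"context": retriever, "question": RunnablePassthrough()} | prompt | model | StrOutputParser()`; the functions below mimic only the retrieval and prompt steps.

```python
def retrieve(question, vectorstore, docstore, id_key="doc_id", k=1):
    """Toy multi-vector retrieval: search summaries, return linked raw docs.

    vectorstore: list of (summary_text, metadata) pairs
    docstore:    dict of id -> original content
    """
    q_words = set(question.lower().split())

    def score(entry):
        summary, _ = entry
        return len(q_words & set(summary.lower().split()))

    best = sorted(vectorstore, key=score, reverse=True)[:k]
    return [docstore[metadata[id_key]] for _, metadata in best]


def build_prompt(question, context_docs):
    """Fill a simple RAG prompt with the retrieved context."""
    context = "\n\n".join(context_docs)
    return (
        "Answer the question based only on the following context:\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The key design point the sketch mirrors is that search happens over summaries, but the model sees the raw table or text that the summary points to.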

249
00:20:09,770 --> 00:20:16,880
As you can see, we are using the LangChain Expression Language here in order to create this chain.

250
00:20:16,880 --> 00:20:25,520
So once we have that, we can start asking questions about text, image or table elements in our

251
00:20:25,520 --> 00:20:26,330
document.

252
00:20:26,330 --> 00:20:35,150
And you will see in these examples how precise and accurate these multimodal LLM applications are.

253
00:20:35,150 --> 00:20:35,840
This is amazing.

254
00:20:35,840 --> 00:20:39,050
So we can ask, and this is a very easy question:

255
00:20:39,050 --> 00:20:40,520
What do you see in the images?

256
00:20:40,520 --> 00:20:46,760
And it responds more or less correctly.

257
00:20:46,760 --> 00:20:51,470
Then we can go, you know, into more detail like what is the name of the company?

258
00:20:51,890 --> 00:20:54,980
The name of the company is right.

259
00:20:55,960 --> 00:20:58,630
What is the product displayed in the image?

260
00:20:58,630 --> 00:21:01,180
So we are asking about this product.

261
00:21:01,210 --> 00:21:02,290
The product displayed

262
00:21:02,290 --> 00:21:04,300
in the image is a modern graphics card.

263
00:21:04,330 --> 00:21:10,300
This is exactly that. How much are the total expenses?

264
00:21:10,300 --> 00:21:14,020
So we are asking about this number in the table.

265
00:21:14,020 --> 00:21:17,080
And again the response is right.

266
00:21:17,080 --> 00:21:21,310
What is the ROI, the return on investment?

267
00:21:21,340 --> 00:21:23,260
Here you have this data.

268
00:21:23,740 --> 00:21:25,240
The answer is right

269
00:21:25,240 --> 00:21:31,150
again. How much did the company sell in 2023?

270
00:21:31,750 --> 00:21:37,420
Here we can see the answer.

271
00:21:38,110 --> 00:21:44,260
Well, the note I included here says: see that the previous answer can be seen as a mistake if we look

272
00:21:44,260 --> 00:21:45,550
at the bar chart.

273
00:21:45,550 --> 00:21:54,220
So if we look at this bar chart, you will see that the 22 million in sales corresponds to the year 2024.

274
00:21:54,220 --> 00:22:00,550
But it is not very clear because, for example, we don't know if these are sales, and here we have

275
00:22:00,550 --> 00:22:02,680
the label,

276
00:22:03,660 --> 00:22:04,950
22,

277
00:22:05,190 --> 00:22:06,030
in an image:

278
00:22:06,060 --> 00:22:07,530
22 million in sales.

279
00:22:07,530 --> 00:22:16,890
And since this is a financial statement, ChatGPT may assume that the financial statement is for the

280
00:22:16,890 --> 00:22:18,030
current year.

281
00:22:18,030 --> 00:22:22,140
So we are talking about the results of the year

282
00:22:23,660 --> 00:22:25,220
2023.

283
00:22:25,250 --> 00:22:30,980
Okay, so probably this is a little bit confusing even for a human.

284
00:22:30,980 --> 00:22:37,550
So this response may reflect the confusion that we are providing here in the document.

285
00:22:37,550 --> 00:22:43,940
So what I say here is: see that the previous answer can be seen as a mistake if we look at the bar chart.

286
00:22:43,940 --> 00:22:51,230
But we have to admit that the PDF is a bit confusing about it, since it highlights the 22 million sales

287
00:22:51,230 --> 00:22:52,700
in two different places.

288
00:22:53,930 --> 00:23:02,420
The interesting thing is that later I ask: and in 2022?

289
00:23:02,450 --> 00:23:06,500
So how much did the company sell the previous year?

290
00:23:06,740 --> 00:23:13,880
And now the answer is correct, considering the bar chart.

291
00:23:14,060 --> 00:23:15,710
So this is the answer.

292
00:23:15,710 --> 00:23:24,110
In 2022, the approximate value represented on the bar chart is 15 units.

293
00:23:24,230 --> 00:23:26,540
It's super interesting.

294
00:23:26,540 --> 00:23:32,330
So the note here says: see that now GPT-4,

295
00:23:32,840 --> 00:23:38,990
well, actually this is GPT-3, is taking the right sales data from the bar chart.

296
00:23:38,990 --> 00:23:39,680
Impressive.

297
00:23:39,680 --> 00:23:42,650
So let's change this to GPT...

298
00:23:43,520 --> 00:23:47,780
Well, no, this is actually GPT-4, because this is an image, okay.

299
00:23:47,780 --> 00:23:48,890
So this is correct.

300
00:23:48,890 --> 00:23:49,580
Very good.

301
00:23:49,580 --> 00:24:01,430
So I was very impressed by the performance of this very simple multimodal LLM application, because

302
00:24:01,430 --> 00:24:14,180
as you have seen, it uses very few resources and little code, and you can check the cost of this

303
00:24:14,240 --> 00:24:15,350
application.

304
00:24:15,350 --> 00:24:22,820
The cost of running this application with GPT-3 and GPT-4 is really small, and the performance is

305
00:24:22,820 --> 00:24:23,660
really good.

306
00:24:23,660 --> 00:24:35,000
So we were able to get the information in the images, the graphs and the tables, and of course

307
00:24:35,150 --> 00:24:35,810
the text.

308
00:24:35,810 --> 00:24:45,860
So it is really powerful because from now on you can start using the RAG technique with

309
00:24:45,860 --> 00:24:54,710
your private documents, the private data of your company, without exposing all these private

310
00:24:54,710 --> 00:25:03,170
images or graphs to the cloud, with ChatGPT or other models

311
00:25:03,170 --> 00:25:03,770
in the cloud.

312
00:25:03,770 --> 00:25:10,310
So you can have peace of mind about security, privacy, etc.

313
00:25:10,310 --> 00:25:14,450
So I think this is an amazing step.

314
00:25:14,450 --> 00:25:22,760
I think this coming year, we are going to see a lot of multimodal LLM applications and multimodal

315
00:25:23,240 --> 00:25:24,860
startups.

316
00:25:24,860 --> 00:25:27,950
And I think we are just in the beginning.

317
00:25:27,950 --> 00:25:32,030
So this is a super exciting moment.

318
00:25:32,030 --> 00:25:39,560
And I think all of us are very lucky to be exposed to this new technology at a moment when

319
00:25:39,680 --> 00:25:48,770
it is so new and has so many applications. So I am very, very excited about these new multimodal

320
00:25:48,770 --> 00:25:50,450
LLM applications.