5
00:00:15,550 --> 00:00:23,110
In this lesson, we are going to talk about the basic concepts associated with the RAG technique.

6
00:00:30,850 --> 00:00:31,870
So.

7
00:00:33,950 --> 00:00:41,420
One of the most important things to understand when we are studying the RAG technique is the limits

8
00:00:41,420 --> 00:00:44,060
of the context window.

9
00:00:44,720 --> 00:00:53,360
So remember that the context window is the maximum size of the context that we can give to an LLM.

10
00:00:54,570 --> 00:01:00,180
For example, an LLM like ChatGPT has the following context

11
00:01:00,180 --> 00:01:01,140
windows.

12
00:01:01,200 --> 00:01:13,500
ChatGPT 3.5, the current free version of ChatGPT, supports a context window of up to 3,000 words or

13
00:01:13,500 --> 00:01:16,590
less, around six pages of text.

14
00:01:17,010 --> 00:01:28,320
ChatGPT 4, the premium version of ChatGPT, supports a larger context window of approximately 6,000

15
00:01:28,320 --> 00:01:29,940
words, or 12 pages.

16
00:01:29,940 --> 00:01:31,080
They have increased.

17
00:01:31,080 --> 00:01:34,890
They are increasing these context window limits.

18
00:01:35,670 --> 00:01:41,310
So what limits does the context window impose on us?

19
00:01:42,250 --> 00:01:48,910
Remember that the context window prevents us from things like asking ChatGPT

20
00:01:48,940 --> 00:01:59,050
to summarize, for example, a 300-page report, or asking ChatGPT to use a big database, and things

21
00:01:59,050 --> 00:01:59,890
like this.

22
00:01:59,890 --> 00:02:09,340
As we saw, for example, in the professional application we previewed a few lessons

23
00:02:09,340 --> 00:02:17,740
before this, these kinds of limitations are a big problem for professional

24
00:02:17,740 --> 00:02:19,120
LLM applications.

25
00:02:19,120 --> 00:02:20,080
So.

26
00:02:21,010 --> 00:02:23,980
ChatGPT is limited.

27
00:02:23,980 --> 00:02:33,160
Foundation LLMs are limited for many operations, and that's why LLM applications are finding

28
00:02:33,160 --> 00:02:41,140
so much interest in the market with customers, because they can solve some limitations like this

29
00:02:41,140 --> 00:02:44,800
one that foundation LLMs have.

30
00:02:46,170 --> 00:02:50,640
How can we overcome the context window limits?

31
00:02:51,180 --> 00:03:01,260
You could train an LLM from scratch with your own data, but in reality this is impractical for most

32
00:03:01,260 --> 00:03:03,720
because it's hugely expensive.

33
00:03:04,350 --> 00:03:11,400
You could add your data to an already trained LLM, which is called fine-tuning.

34
00:03:11,400 --> 00:03:18,990
In reality, this is also impractical for most because it's very expensive and technically very complex.

35
00:03:18,990 --> 00:03:22,470
Or you can use the RAG technique.

36
00:03:23,630 --> 00:03:29,840
RAG is an acronym for Retrieval-Augmented Generation.

37
00:03:30,530 --> 00:03:31,700
Don't worry about the name.

38
00:03:31,700 --> 00:03:32,990
It's not important for us.

39
00:03:32,990 --> 00:03:35,960
We will refer to it as the RAG technique.

40
00:03:36,230 --> 00:03:45,740
And in the RAG technique we divide our data into small segments, allowing the LLM to use them within

41
00:03:45,740 --> 00:03:48,470
the limits of its context window.

42
00:03:48,500 --> 00:03:50,930
We will see more of that later.
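A minimal sketch of dividing data into small segments, assuming a simple word-based splitter. The function name `split_into_segments` and the 50-word segment size are illustrative assumptions, not something named in this lesson; real applications usually count tokens rather than words.

```python
def split_into_segments(text: str, max_words: int = 50) -> list[str]:
    """Split text into consecutive segments of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


# Stand-in for a long document (hypothetical data): 120 words.
report = "word " * 120
segments = split_into_segments(report, max_words=50)
print(len(segments))  # 3 segments: 50 + 50 + 20 words
```

Each segment is then small enough to be passed to the model without exceeding its context window.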

43
00:03:50,960 --> 00:03:57,680
This is the technique used today by virtually all LLM applications.

44
00:03:57,680 --> 00:04:04,220
So, an important note about the limits of the context window:

45
00:04:04,220 --> 00:04:08,150
this is the main problem that the RAG technique solves.

46
00:04:08,690 --> 00:04:13,670
So let's talk a little bit about the RAG technique itself.

47
00:04:15,550 --> 00:04:26,710
So with the RAG technique, we will divide our data, our private data, into small segments.

48
00:04:26,710 --> 00:04:33,220
So remember, for example, the SEC Insights application, the professional application we saw.

49
00:04:33,220 --> 00:04:43,570
Do you remember that it loaded up to ten PDFs with financial information from different companies?

50
00:04:43,570 --> 00:04:48,520
And then it could answer questions or make comparisons or whatever.

51
00:04:48,820 --> 00:04:49,330
Right.

52
00:04:49,330 --> 00:04:56,830
So how can an LLM application work with huge amounts of data like that?

53
00:04:56,830 --> 00:05:02,290
By using the RAG technique. We will see in detail how to do that.

54
00:05:02,290 --> 00:05:12,040
But in short, what the LLM application does is this: first, it divides the data into small segments.

55
00:05:12,040 --> 00:05:17,020
Then it will convert the small segments into numbers.

56
00:05:17,830 --> 00:05:23,890
And finally it will load these numbers into a vector database.

57
00:05:23,920 --> 00:05:26,710
We will see that these numbers are called embeddings.

58
00:05:26,710 --> 00:05:28,510
We will see that later.

59
00:05:28,840 --> 00:05:36,520
And then, when you ask the LLM application something, this question is called a query.

60
00:05:36,550 --> 00:05:44,470
So when you ask the LLM application something, the application goes to the vector database and

61
00:05:44,470 --> 00:05:49,360
searches only for the data that is relevant to your question.

62
00:05:49,360 --> 00:05:56,710
So it is going to use a technique called semantic similarity search.

63
00:05:56,710 --> 00:06:00,010
We'll see more about that later.
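The steps described so far (convert segments into numbers, store them, then search by similarity to the query) can be sketched as follows. This is a toy illustration only: the `embed` function here just counts words, whereas real applications use a learned embedding model, and `ToyVectorStore` is a hypothetical in-memory stand-in for a real vector database. The sample sentences are made-up data.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. Real systems use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, segment: str) -> None:
        # Store the segment together with its numeric representation.
        self.items.append((segment, embed(segment)))

    def search(self, query: str, top_k: int = 1) -> list[str]:
        # Rank stored segments by similarity to the query.
        query_vector = embed(query)
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(query_vector, item[1]),
            reverse=True,
        )
        return [segment for segment, _ in ranked[:top_k]]


store = ToyVectorStore()
for segment in [
    "Acme revenue grew 12 percent in 2022.",      # made-up example data
    "Globex opened a new office in Madrid.",
    "Acme hired a new chief financial officer.",
]:
    store.add(segment)

best = store.search("How much did Acme revenue grow?", top_k=1)
print(best)
```

Because word counts only capture shared words, this sketch is lexical rather than truly semantic; swapping `embed` for a real embedding model is what makes the search semantic.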

64
00:06:00,010 --> 00:06:11,950
So when using the RAG technique, it is said that the ability to speak comes from the foundation LLM, while the

65
00:06:11,950 --> 00:06:16,000
specific knowledge comes from the vector database.

66
00:06:16,270 --> 00:06:25,360
In other words, the foundation LLM acts like a person who knows how to speak, but is not familiar

67
00:06:25,360 --> 00:06:26,530
with your data.

68
00:06:27,350 --> 00:06:36,440
And the vector database acts as the expert knowledge that you add to that foundation LLM to make it

69
00:06:36,440 --> 00:06:41,390
behave like a person who knows how to speak about your data.

70
00:06:41,840 --> 00:06:46,850
So we will see that in LLM applications

71
00:06:47,850 --> 00:06:55,440
we are using LLMs like ChatGPT in order to be able to

72
00:06:56,020 --> 00:06:57,970
have a conversation.

73
00:06:58,150 --> 00:07:09,820
But these foundation models like ChatGPT do not have the ability to store private data.

74
00:07:10,790 --> 00:07:11,600
So

75
00:07:11,600 --> 00:07:17,480
we will use another tool, outside of the foundation

76
00:07:17,570 --> 00:07:22,730
LLM, called a vector database, to store the data.

77
00:07:22,730 --> 00:07:31,460
And we will store the data in a way that can be searched very quickly.

78
00:07:32,000 --> 00:07:41,540
So when the LLM application searches for data in a vector database, instead of reviewing all the data

79
00:07:41,540 --> 00:07:51,800
in the vector database, it only goes to the data that is associated with the question the

80
00:07:51,800 --> 00:07:53,030
user has asked.

81
00:07:53,030 --> 00:07:59,720
So it only looks for things that are similar to the question

82
00:07:59,720 --> 00:08:02,780
the user has asked.

83
00:08:04,180 --> 00:08:09,250
So this is what we call

84
00:08:09,340 --> 00:08:11,170
semantic similarity search.

85
00:08:11,170 --> 00:08:13,840
You will see more about that later.

86
00:08:13,870 --> 00:08:14,380
Okay.

87
00:08:14,380 --> 00:08:20,620
So those are some basic concepts about the RAG technique.

88
00:08:21,160 --> 00:08:31,030
One interesting note we can make here in this introduction is to talk a little bit about what is called

89
00:08:31,030 --> 00:08:33,190
in-context learning.

90
00:08:33,190 --> 00:08:34,030
Why?

91
00:08:34,659 --> 00:08:43,780
Because you will see in some other courses that there is some confusion between RAG and in-context

92
00:08:43,780 --> 00:08:44,680
learning.

93
00:08:45,390 --> 00:08:55,230
So with the RAG technique, we divide our data into small segments, thus allowing the LLM to use them

94
00:08:55,230 --> 00:08:58,380
within the limits of its context window.

95
00:08:58,410 --> 00:09:06,360
This is the technique, remember, used today by almost all LLM applications.

96
00:09:07,140 --> 00:09:08,580
In some media and in some other courses,

97
00:09:08,580 --> 00:09:14,760
the RAG technique is confused with the in-context learning

98
00:09:14,790 --> 00:09:15,720
technique.

99
00:09:15,900 --> 00:09:19,800
Next we will clarify the difference between the two.

100
00:09:20,370 --> 00:09:26,220
This clarification is only relevant if a student had this question.

101
00:09:26,220 --> 00:09:31,470
Otherwise it is irrelevant, a mere theoretical matter.

102
00:09:31,590 --> 00:09:42,330
So if you have heard about in-context learning, or you are confused about the difference between the concepts of RAG

103
00:09:42,330 --> 00:09:46,080
and in-context learning, this can be interesting for you.

104
00:09:46,080 --> 00:09:55,590
So with the RAG technique, our goal is going to be to combine information retrieval capability with

105
00:09:55,590 --> 00:10:01,680
language generation to answer questions using external information.

106
00:10:03,120 --> 00:10:13,470
The RAG technique uses a retrieval system to search for relevant documents or text snippets in a database.

107
00:10:13,920 --> 00:10:22,200
It then uses a generator model to formulate an answer based on the retrieved snippets.

108
00:10:22,200 --> 00:10:31,890
For example, if you ask a RAG model about a specific topic, it will first search its

109
00:10:31,890 --> 00:10:40,170
database to find relevant information, and then use that information to generate a coherent answer.
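As a rough sketch of the "retrieve, then generate" flow just described, the retrieved snippets are typically placed into the prompt that is sent to the generator model. The function name `build_rag_prompt`, the prompt wording, and the sample snippets are all assumptions made for illustration; the actual call to an LLM is left out.

```python
def build_rag_prompt(question: str, snippets: list[str]) -> str:
    """Combine retrieved snippets and the user's question into one prompt."""
    context = "\n".join(f"- {snippet}" for snippet in snippets)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical snippets, as if returned by a similarity search.
retrieved = ["Acme revenue grew 12 percent in 2022."]
rag_prompt = build_rag_prompt("How much did Acme revenue grow?", retrieved)
print(rag_prompt)
```

Only the few retrieved snippets travel inside the prompt, which is how the whole data set stays outside the context window.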

110
00:10:41,670 --> 00:10:52,590
In the case of the in-context learning technique, the goal is to adapt a pre-trained model to specific

111
00:10:52,590 --> 00:10:57,930
tasks by providing examples in the input context.

112
00:10:58,620 --> 00:11:03,120
The mechanism in the in-context learning technique is

113
00:11:03,680 --> 00:11:06,470
that the model is not retrained.

114
00:11:06,590 --> 00:11:15,620
Instead, a context is provided that includes examples of the desired task, and the model is expected

115
00:11:15,620 --> 00:11:20,030
to generalize from that context to respond appropriately.

116
00:11:20,330 --> 00:11:30,410
For example, with models like ChatGPT or ChatGPT 4, you can provide translation examples in the

117
00:11:30,410 --> 00:11:38,990
input context and then ask a translation question without providing the example explicitly.

118
00:11:39,770 --> 00:11:40,700
So.

119
00:11:41,750 --> 00:11:52,370
In ChatGPT, for example, we can enter a prompt saying "English: hello, Spanish: hola", and then

120
00:11:52,370 --> 00:11:57,890
ask a translation question without providing the example explicitly.
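A minimal sketch of such an in-context learning prompt, assuming a simple "English / Spanish" layout. The helper name `build_few_shot_prompt` and the exact formatting are illustrative assumptions; the point is only that the examples live in the prompt and the model is not retrained.

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Build a prompt with translation examples followed by a new query."""
    blocks = [f"English: {en}\nSpanish: {es}" for en, es in examples]
    # The final block leaves the Spanish side empty for the model to complete.
    blocks.append(f"English: {query}\nSpanish:")
    return "\n\n".join(blocks)


few_shot_prompt = build_few_shot_prompt([("hello", "hola")], "goodbye")
print(few_shot_prompt)
```

Note the contrast with RAG: nothing is retrieved from a database here, the prompt only demonstrates the task.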

121
00:11:58,430 --> 00:12:09,230
In summary, in-context learning is based on providing examples in the input context to guide the model

122
00:12:09,230 --> 00:12:11,480
in the desired task.

123
00:12:12,260 --> 00:12:21,530
RAG combines information retrieval with language generation to answer questions using external data.

124
00:12:22,370 --> 00:12:31,040
Both techniques seek to enhance the ability of language models to adapt to specific tasks and provide

125
00:12:31,040 --> 00:12:32,510
informed answers.

126
00:12:32,510 --> 00:12:38,300
However, they use different approaches and mechanisms to achieve this.

127
00:12:39,080 --> 00:12:43,040
So in short, whenever you

128
00:12:44,730 --> 00:12:46,560
read or hear

129
00:12:46,560 --> 00:12:52,560
that the RAG technique and the in-context learning technique are the same,

130
00:12:52,590 --> 00:12:53,850
that is wrong.

131
00:12:54,960 --> 00:13:03,870
In this program, we are going to be focused on the RAG technique because this is the most used technique

132
00:13:03,870 --> 00:13:07,140
in LLM applications right now.

133
00:13:07,380 --> 00:13:11,700
This is going to be our focus.

134
00:13:12,730 --> 00:13:21,250
In this introduction, I wanted to finish with a note about the importance of the RAG technique for

135
00:13:21,370 --> 00:13:23,350
LLM applications.

136
00:13:24,890 --> 00:13:25,760
So.

137
00:13:26,900 --> 00:13:32,720
The RAG technique right now is essential for creating LLM applications.

138
00:13:33,050 --> 00:13:33,980
Essential.

139
00:13:34,680 --> 00:13:40,290
And that's why we are going to focus on mastering this technique.

140
00:13:40,290 --> 00:13:41,820
So it is crucial.

141
00:13:41,820 --> 00:13:47,280
It is a core component of most LLM applications, the RAG technique.

142
00:13:47,490 --> 00:13:54,000
That's why we are going to focus our attention on it in this program.

143
00:13:54,900 --> 00:13:56,520
So in this lesson.

144
00:13:57,140 --> 00:14:01,460
We were just talking theory.

145
00:14:01,610 --> 00:14:10,430
I know it's a little bit messy, and I don't like the approach of too much theory

146
00:14:10,430 --> 00:14:11,750
at this point.

147
00:14:11,750 --> 00:14:19,340
But we thought it was interesting to just start talking about some concepts that we

148
00:14:19,340 --> 00:14:22,100
are going to be finding in the next lessons.

149
00:14:22,100 --> 00:14:31,550
And in the case of in-context learning, some students have posted this question to us,

150
00:14:31,550 --> 00:14:36,530
and it was important for us just to clarify it from the beginning.

151
00:14:36,530 --> 00:14:44,120
The main technique we use in LLM applications is the RAG technique, which is slightly different from

152
00:14:44,120 --> 00:14:45,200
in-context learning.

153
00:14:45,200 --> 00:14:47,960
So do not confuse the two.

154
00:14:47,960 --> 00:14:49,070
They are different.

155
00:14:49,070 --> 00:14:56,120
And the focus of this program is going to be on the most used technique in LLM applications, which

156
00:14:56,120 --> 00:14:58,220
is the RAG technique.

157
00:14:58,840 --> 00:15:06,550
In the next lesson, we are going to talk in more detail about the components of the RAG technique.

158
00:15:06,580 --> 00:15:14,560
We are still going to be talking in theory in a conceptual way, because it is important for you to

159
00:15:14,560 --> 00:15:21,790
understand what we are going to do before going into the practical side of it.

