WEBVTT

1
00:00:00.180 --> 00:00:02.370
<v ->Hey there, Idan here, and in this video,</v>

2
00:00:02.370 --> 00:00:04.950
we are going to be implementing the retrieval part.

3
00:00:04.950 --> 00:00:06.600
And let's go to the code.

4
00:00:06.600 --> 00:00:10.320
So I'm going to go to the backend directory here.

5
00:00:10.320 --> 00:00:14.640
Let's go and create here a Dundas core init file,

6
00:00:14.640 --> 00:00:16.920
just to turn it into a package.

7
00:00:16.920 --> 00:00:21.920
And let's go and let me go and create a core.py file.

8
00:00:22.230 --> 00:00:25.560
And here we're going to implement the retrieval part.

9
00:00:25.560 --> 00:00:27.360
All right, let's start with the imports.

10
00:00:27.360 --> 00:00:31.260
Let's import OS to handle environment variables.

11
00:00:31.260 --> 00:00:34.590
Let's go and import from typing,

12
00:00:34.590 --> 00:00:36.180
type Any and Dict,

13
00:00:36.180 --> 00:00:37.830
and those are going to be type-ins

14
00:00:37.830 --> 00:00:39.810
for our function declarations.

15
00:00:39.810 --> 00:00:42.037
We want to import load_dotenv

16
00:00:42.037 --> 00:00:43.800
to load environment variables.

17
00:00:43.800 --> 00:00:45.390
We saw all of this before.

18
00:00:45.390 --> 00:00:48.240
We want to import the create_agent function,

19
00:00:48.240 --> 00:00:50.220
which we saw before as well.

20
00:00:50.220 --> 00:00:53.820
And now we're introducing something new, the initChatModel.

21
00:00:53.820 --> 00:00:57.270
And the initChatModel is a very convenient way

22
00:00:57.270 --> 00:01:00.600
which we are going to use to initialize

23
00:01:00.600 --> 00:01:05.130
very quickly a chat client to make the LLM request.

24
00:01:05.130 --> 00:01:07.140
Now, this is a general function

25
00:01:07.140 --> 00:01:09.240
which is going to receive a string

26
00:01:09.240 --> 00:01:12.150
and it's going to return us the correct chat model.

27
00:01:12.150 --> 00:01:12.983
We're going to see it,

28
00:01:12.983 --> 00:01:15.120
it's very cool and it's very convenient to use.

29
00:01:15.120 --> 00:01:18.270
All right, let's go and also use ToolMessage.

30
00:01:18.270 --> 00:01:21.570
and we're going to be implementing everything,

31
00:01:21.570 --> 00:01:23.850
the entire retrieval pipeline,

32
00:01:23.850 --> 00:01:27.870
with an agent which is going to have a retrieval tool.

33
00:01:27.870 --> 00:01:31.410
So it's going to retrieve the relevant content,

34
00:01:31.410 --> 00:01:32.760
the document,

35
00:01:32.760 --> 00:01:37.760
and this retrieval is going to be marked as a tool message.

36
00:01:37.770 --> 00:01:39.930
So I remind you, a tool message

37
00:01:39.930 --> 00:01:41.910
is going to be a type of message

38
00:01:41.910 --> 00:01:44.310
that is containing tool execution,

39
00:01:44.310 --> 00:01:47.580
and in our case, it's going to be the retrieval tool,

40
00:01:47.580 --> 00:01:49.020
which we're going to implement.

41
00:01:49.020 --> 00:01:52.890
Let's import also tool, because we want to create this tool.

42
00:01:52.890 --> 00:01:55.890
And we want to use the PineconeVectorStore.

43
00:01:55.890 --> 00:01:57.090
You can use Chroma.

44
00:01:57.090 --> 00:01:59.610
I'm going to use in this video PineconeVectorStore

45
00:01:59.610 --> 00:02:01.500
for the retrieval itself.

46
00:02:01.500 --> 00:02:03.600
And of course, we need an embeddings model

47
00:02:03.600 --> 00:02:05.760
because we want to embed the query

48
00:02:05.760 --> 00:02:07.320
and turn it into a vector

49
00:02:07.320 --> 00:02:10.020
before we get the relevant context.

50
00:02:10.020 --> 00:02:14.400
All right. So now we want to load the environment variables,

51
00:02:14.400 --> 00:02:17.070
and up until now was just the inputs.

52
00:02:17.070 --> 00:02:20.010
So now we want to initialize the embeddings model,

53
00:02:20.010 --> 00:02:22.950
and I'm going to use text embedding 3 small.

54
00:02:22.950 --> 00:02:25.710
And I remind you, this must correlate

55
00:02:25.710 --> 00:02:29.310
and this must match the size of the vectors

56
00:02:29.310 --> 00:02:32.490
when you initialize your vector store in Pinecone.

57
00:02:32.490 --> 00:02:34.050
So make sure you're going to be using

58
00:02:34.050 --> 00:02:35.550
the same embedding model.

59
00:02:35.550 --> 00:02:38.310
And now we want to initialize the vector store,

60
00:02:38.310 --> 00:02:40.410
so we want to give it the index name.

61
00:02:40.410 --> 00:02:42.840
And here I re-recorded this video,

62
00:02:42.840 --> 00:02:46.260
so I gave it the langchain-docs-2026.

63
00:02:46.260 --> 00:02:48.420
And we want to pass the vector store,

64
00:02:48.420 --> 00:02:50.340
just like in the earlier video,

65
00:02:50.340 --> 00:02:52.710
an embeddings object.

66
00:02:52.710 --> 00:02:55.800
So it will know how to embed the text

67
00:02:55.800 --> 00:02:58.050
and work with the embedding.

68
00:02:58.050 --> 00:03:00.270
We want to also now init a chat model.

69
00:03:00.270 --> 00:03:02.640
And here, look how easy it is.

70
00:03:02.640 --> 00:03:05.790
We're simply going to tell that the model provider

71
00:03:05.790 --> 00:03:07.440
is going to be OpenAI

72
00:03:07.440 --> 00:03:11.730
and that we're going to be wanting to use GPT-5.2.

73
00:03:11.730 --> 00:03:15.570
So you can see here in the actual implementation itself,

74
00:03:15.570 --> 00:03:17.130
all of the strings, right?

75
00:03:17.130 --> 00:03:19.770
So you can see in order to use OpenAI,

76
00:03:19.770 --> 00:03:21.480
this is the string we need to write.

77
00:03:21.480 --> 00:03:22.800
Cool. So here you can see

78
00:03:22.800 --> 00:03:24.090
all of the supported models

79
00:03:24.090 --> 00:03:26.160
because it doesn't support all of the models,

80
00:03:26.160 --> 00:03:28.110
but the major ones are here.

81
00:03:28.110 --> 00:03:29.640
And it's very easy to use.

82
00:03:29.640 --> 00:03:30.900
If you wanted to use Gemini,

83
00:03:30.900 --> 00:03:34.680
we'd simply need to change this string

84
00:03:34.680 --> 00:03:37.230
to represent a Google gen AI

85
00:03:37.230 --> 00:03:39.840
and to write the Gemini version that we want

86
00:03:39.840 --> 00:03:41.970
according to the function implementation here.

87
00:03:41.970 --> 00:03:45.660
All right, so let's now go and implement our function.

88
00:03:45.660 --> 00:03:48.720
And our function is going to be called retrieve_context,

89
00:03:48.720 --> 00:03:50.160
and it's going to receive a query

90
00:03:50.160 --> 00:03:52.470
and this query is going to be the user query.

91
00:03:52.470 --> 00:03:55.410
Now this function is going to be a tool.

92
00:03:55.410 --> 00:03:58.320
So let's go and use the tool decorator.

93
00:03:58.320 --> 00:04:01.290
And we want to have the response format

94
00:04:01.290 --> 00:04:03.750
as content and artifact.

95
00:04:03.750 --> 00:04:05.550
So the response format,

96
00:04:05.550 --> 00:04:07.170
you can see it right over here,

97
00:04:07.170 --> 00:04:10.860
it can be either content or content_and_artifact,

98
00:04:10.860 --> 00:04:12.570
where the default is content.

99
00:04:12.570 --> 00:04:13.860
If it's going to be content,

100
00:04:13.860 --> 00:04:16.320
then this tool should only return one value.

101
00:04:16.320 --> 00:04:18.990
And if it's going to be content_and_artifact,

102
00:04:18.990 --> 00:04:21.720
this tool should return two values.

103
00:04:21.720 --> 00:04:24.030
Now this is going to be very convenient

104
00:04:24.030 --> 00:04:27.180
to mark other information

105
00:04:27.180 --> 00:04:29.760
which we want to downstream to the application

106
00:04:29.760 --> 00:04:31.530
and not send to the LLM.

107
00:04:31.530 --> 00:04:34.320
So I know you're probably not making any sense of it,

108
00:04:34.320 --> 00:04:36.300
but trust me, once we're going to debug it,

109
00:04:36.300 --> 00:04:37.560
you'll see the difference

110
00:04:37.560 --> 00:04:39.930
between content and between artifact.

111
00:04:39.930 --> 00:04:42.510
Anyways, just to summarize, this response format

112
00:04:42.510 --> 00:04:44.250
simply is going to mean that

113
00:04:44.250 --> 00:04:48.480
this tool is going to return two return values.

114
00:04:48.480 --> 00:04:49.313
All right.

115
00:04:49.313 --> 00:04:51.337
So the description is going to be as follows.

116
00:04:51.337 --> 00:04:53.070
"Retrieve relevant documentation

117
00:04:53.070 --> 00:04:55.710
to help answer user queries about LangChain"

118
00:04:55.710 --> 00:04:57.930
'cause it's going to be a LangChain retrieval tool.

119
00:04:57.930 --> 00:05:00.420
So this is going to help the agent

120
00:05:00.420 --> 00:05:02.790
determine whether to use this tool or not.

121
00:05:02.790 --> 00:05:06.300
And notice here that I didn't mention even the return values

122
00:05:06.300 --> 00:05:07.710
because it's going to be derived

123
00:05:07.710 --> 00:05:09.810
from this content and artifact.

124
00:05:09.810 --> 00:05:12.510
All right, so let's go do the implementation.

125
00:05:12.510 --> 00:05:15.270
So we want to take the vector, we want to embed it,

126
00:05:15.270 --> 00:05:18.780
then we want to find all its relevant documents

127
00:05:18.780 --> 00:05:21.030
that are most relevant to that vector,

128
00:05:21.030 --> 00:05:22.410
and then we want to return it.

129
00:05:22.410 --> 00:05:23.940
So let's go and do it.

130
00:05:23.940 --> 00:05:26.190
And we're going to take the vector store,

131
00:05:26.190 --> 00:05:27.940
we're going to use the as_retriever function

132
00:05:27.940 --> 00:05:30.480
like we saw in the previous section,

133
00:05:30.480 --> 00:05:33.900
and we're going to be using the invoke method here.

134
00:05:33.900 --> 00:05:36.420
Now I remind you, for a retriever,

135
00:05:36.420 --> 00:05:38.790
so this entire thing is a retriever,

136
00:05:38.790 --> 00:05:42.330
an invoke method is going to perform similarity search.

137
00:05:42.330 --> 00:05:45.450
And in similarity search, we can provide the query.

138
00:05:45.450 --> 00:05:46.890
So this is the question string

139
00:05:46.890 --> 00:05:50.430
that we want to embed and find the relevant documents.

140
00:05:50.430 --> 00:05:53.460
Here, we can also provide the number of documents

141
00:05:53.460 --> 00:05:55.140
we want to get at most.

142
00:05:55.140 --> 00:05:57.510
So this is the k=4 here.

143
00:05:57.510 --> 00:06:00.090
So this finally is going to give us

144
00:06:00.090 --> 00:06:03.690
a list of documents we can augment our prompt here.

145
00:06:03.690 --> 00:06:07.320
And now we want to go and we want to serialize everything.

146
00:06:07.320 --> 00:06:09.030
Just like we saw in the previous section,

147
00:06:09.030 --> 00:06:11.940
we want to go and to iterate over the documents.

148
00:06:11.940 --> 00:06:15.120
And we want to take the content of the documents

149
00:06:15.120 --> 00:06:18.510
and we want to take the sources of the documents

150
00:06:18.510 --> 00:06:20.550
which we indexed before.

151
00:06:20.550 --> 00:06:23.010
And eventually you want to get a big string

152
00:06:23.010 --> 00:06:25.320
that we want to attach to our prompt

153
00:06:25.320 --> 00:06:27.930
and this is the prompt augmentation here.

154
00:06:27.930 --> 00:06:30.810
So this is going to be the serialized context

155
00:06:30.810 --> 00:06:32.730
which we're going to pass

156
00:06:32.730 --> 00:06:35.310
and downstream eventually to the LLM.

157
00:06:35.310 --> 00:06:36.143
Amazing.

158
00:06:36.143 --> 00:06:38.550
And lastly, now, all we want to do

159
00:06:38.550 --> 00:06:42.060
is to return also the serialized content.

160
00:06:42.060 --> 00:06:43.620
So this is going to be the content

161
00:06:43.620 --> 00:06:45.510
from the content and artifact,

162
00:06:45.510 --> 00:06:47.460
and also the retrieved docs.

163
00:06:47.460 --> 00:06:49.590
So this is going to be an artifact.

164
00:06:49.590 --> 00:06:53.490
Now, why I am making this distinction?

165
00:06:53.490 --> 00:06:55.410
Now, notice what's going to happen now.

166
00:06:55.410 --> 00:06:57.540
We are going to be downstreaming

167
00:06:57.540 --> 00:07:01.980
the serialized documents here to the LLM.

168
00:07:01.980 --> 00:07:06.420
However, I want to downstream to the application

169
00:07:06.420 --> 00:07:07.860
and not the LLM,

170
00:07:07.860 --> 00:07:12.480
the retrieved documents in their LangChain documents form.

171
00:07:13.740 --> 00:07:16.410
So this is why we have here a distinction,

172
00:07:16.410 --> 00:07:20.670
because if we were to return only the serialized documents,

173
00:07:20.670 --> 00:07:25.140
then I wouldn't have this document object I can work with.

174
00:07:25.140 --> 00:07:28.050
So I want to keep that so I can work with.

175
00:07:28.050 --> 00:07:31.347
And eventually, this is going to go to the LLM,

176
00:07:31.347 --> 00:07:33.780
and this is not going to go to the LLM.

177
00:07:33.780 --> 00:07:35.760
This is going to stay in the application.

178
00:07:35.760 --> 00:07:36.990
And once we debug it,

179
00:07:36.990 --> 00:07:39.210
we're going to be seeing this in the traces.

180
00:07:39.210 --> 00:07:43.380
So this is the entire RetrieveContext tool here.

181
00:07:43.380 --> 00:07:46.740
So now we want to create an agent

182
00:07:46.740 --> 00:07:48.780
which is going to be using that tool.

183
00:07:48.780 --> 00:07:51.390
All right, so let me define a wrapper function

184
00:07:51.390 --> 00:07:54.390
because I want to wrap everything under a function.

185
00:07:54.390 --> 00:07:56.760
I'm going to call this function run_llm.

186
00:07:56.760 --> 00:07:58.260
It's going to receive a query

187
00:07:58.260 --> 00:08:00.810
and it's going to return us a dictionary.

188
00:08:00.810 --> 00:08:04.890
This function is going to run the RAG retrieval pipeline

189
00:08:04.890 --> 00:08:06.570
to answer a question.

190
00:08:06.570 --> 00:08:09.000
It's going to receive the user query

191
00:08:09.000 --> 00:08:11.670
and it's going to return a dictionary,

192
00:08:11.670 --> 00:08:13.890
which is going to have the answer,

193
00:08:13.890 --> 00:08:16.860
which is the generated answer by the LLM,

194
00:08:16.860 --> 00:08:19.410
and it's going to return context.

195
00:08:19.410 --> 00:08:21.840
So it's going to be the list of retrieved documents.

196
00:08:21.840 --> 00:08:23.280
And I'm going to give you a quick hint.

197
00:08:23.280 --> 00:08:26.070
This context here is going to be derived here

198
00:08:26.070 --> 00:08:28.260
from this retrieved_docs here.

199
00:08:28.260 --> 00:08:31.320
So this is why we need it and we'll soon go and see it.

200
00:08:31.320 --> 00:08:33.600
All right, so let's start create the agent

201
00:08:33.600 --> 00:08:35.640
with one retrieval tool

202
00:08:35.640 --> 00:08:38.100
and let's start with the system prompt.

203
00:08:38.100 --> 00:08:41.077
And the system prompt is going to go as follows.

204
00:08:41.077 --> 00:08:43.110
"You are a helpful AI assistant

205
00:08:43.110 --> 00:08:46.350
that answers questions about LangChain documentation.

206
00:08:46.350 --> 00:08:48.210
You have access to a tool

207
00:08:48.210 --> 00:08:50.490
that retrieves relevant documentation.

208
00:08:50.490 --> 00:08:53.010
Use the tool to find relevant information

209
00:08:53.010 --> 00:08:54.870
before answering questions.

210
00:08:54.870 --> 00:08:59.100
Always cite the sources you use in your answers.

211
00:08:59.100 --> 00:09:01.380
If you cannot find the answer

212
00:09:01.380 --> 00:09:04.500
in the retrieved documentation, say so."

213
00:09:04.500 --> 00:09:06.030
Cool. So the last part

214
00:09:06.030 --> 00:09:07.350
is actually very, very important

215
00:09:07.350 --> 00:09:09.420
because if the documentation helper

216
00:09:09.420 --> 00:09:12.480
doesn't know how to answer, we don't want it to hallucinate.

217
00:09:12.480 --> 00:09:14.340
So this is going to be helping with that.

218
00:09:14.340 --> 00:09:17.220
So let's go now and use the create_agent function

219
00:09:17.220 --> 00:09:18.330
like we saw before.

220
00:09:18.330 --> 00:09:19.890
We're going to give it the model.

221
00:09:19.890 --> 00:09:23.010
We're going to give it the tools of the retrieve_context.

222
00:09:23.010 --> 00:09:26.670
And we're going it the system_prompt we saw before.

223
00:09:26.670 --> 00:09:29.280
So this is going to run langref under the hood,

224
00:09:29.280 --> 00:09:30.450
I remind you.

225
00:09:30.450 --> 00:09:34.230
And now let's go and build this message list

226
00:09:34.230 --> 00:09:36.270
and this invocation from the query.

227
00:09:36.270 --> 00:09:37.980
So we want to take this query,

228
00:09:37.980 --> 00:09:39.690
which is a string right here,

229
00:09:39.690 --> 00:09:41.550
and we want to create a message,

230
00:09:41.550 --> 00:09:43.470
which is going to have the role of the user

231
00:09:43.470 --> 00:09:45.840
and the content is going to be the user query,

232
00:09:45.840 --> 00:09:48.090
simply building a message object.

233
00:09:48.090 --> 00:09:52.080
And now we want to invoke the graph

234
00:09:52.080 --> 00:09:55.530
and we want to give it the list of messages

235
00:09:55.530 --> 00:09:57.690
of this list over here.

236
00:09:57.690 --> 00:10:00.420
So I remind you, the create agent function,

237
00:10:00.420 --> 00:10:01.310
when you invoke it,

238
00:10:01.310 --> 00:10:05.670
it expects to receive a dictionary with a messages key.

239
00:10:05.670 --> 00:10:07.890
And we're going to get back a response,

240
00:10:07.890 --> 00:10:08.790
and the response,

241
00:10:08.790 --> 00:10:11.220
because we're going to run a lot of tool calls,

242
00:10:11.220 --> 00:10:12.623
and we're going to have tool messages,

243
00:10:12.623 --> 00:10:14.670
who are going to sit in the trace,

244
00:10:14.670 --> 00:10:17.310
we want to take from the response,

245
00:10:17.310 --> 00:10:19.500
we're going to have all the messages history,

246
00:10:19.500 --> 00:10:23.370
want to take the last message and to access its content.

247
00:10:23.370 --> 00:10:26.370
So far, we created a very simple agent,

248
00:10:26.370 --> 00:10:29.280
we invoke it, we got here the response.

249
00:10:29.280 --> 00:10:32.160
And now we want to also,

250
00:10:32.160 --> 00:10:33.840
we want to return the documents

251
00:10:33.840 --> 00:10:36.090
that help the agent answer the question,

252
00:10:36.090 --> 00:10:38.010
because we want to show the user

253
00:10:38.010 --> 00:10:40.200
where the answer was grounded for.

254
00:10:40.200 --> 00:10:42.480
So it's going to create trust with the user

255
00:10:42.480 --> 00:10:44.160
because they can go to the link

256
00:10:44.160 --> 00:10:46.860
where the answer was generated from.

257
00:10:46.860 --> 00:10:49.110
So this concept of showing the user

258
00:10:49.110 --> 00:10:51.030
where does the answer come from

259
00:10:51.030 --> 00:10:52.050
really creates trust,

260
00:10:52.050 --> 00:10:53.850
and it's a really important part

261
00:10:53.850 --> 00:10:56.280
of the agentic user experience.

262
00:10:56.280 --> 00:10:59.460
So we want now to take, I remind you,

263
00:10:59.460 --> 00:11:01.890
this tool which ran,

264
00:11:01.890 --> 00:11:04.860
it also returned the documents themselves.

265
00:11:04.860 --> 00:11:08.070
So we want now to access that, right?

266
00:11:08.070 --> 00:11:09.660
So I told you earlier

267
00:11:09.660 --> 00:11:13.350
that we used the content_and_artifact here.

268
00:11:13.350 --> 00:11:17.010
So in fact, we can go and access this variable here

269
00:11:17.010 --> 00:11:18.720
of retrieved_docs.

270
00:11:18.720 --> 00:11:22.440
We can access the value here from the tool message.

271
00:11:22.440 --> 00:11:24.570
And I'm going to show it in debug mode.

272
00:11:24.570 --> 00:11:28.320
So now I want to extract the context documents

273
00:11:28.320 --> 00:11:31.050
from the tool message artifacts here.

274
00:11:31.050 --> 00:11:33.690
So I'm going to initialize an empty list

275
00:11:33.690 --> 00:11:36.660
and I'm going to iterate through all the messages here.

276
00:11:36.660 --> 00:11:39.450
And I'm going to search for all the messages

277
00:11:39.450 --> 00:11:41.520
which are type ToolMessage

278
00:11:41.520 --> 00:11:44.760
and are going to have the attribute of artifact.

279
00:11:44.760 --> 00:11:47.010
So this means that this tool message

280
00:11:47.010 --> 00:11:48.840
is going to have an artifact field

281
00:11:48.840 --> 00:11:50.580
which is not going to be empty.

282
00:11:50.580 --> 00:11:52.830
So this is going to be only for the result

283
00:11:52.830 --> 00:11:56.040
of this tool execution right over here.

284
00:11:56.040 --> 00:11:57.420
So if that is the case,

285
00:11:57.420 --> 00:12:02.250
I want to go and I want to take the context docs,

286
00:12:02.250 --> 00:12:03.570
which is an empty list right now,

287
00:12:03.570 --> 00:12:06.630
and I want to append to it this entire list here,

288
00:12:06.630 --> 00:12:09.390
because the value of artifact is going to be a list.

289
00:12:09.390 --> 00:12:11.610
And finally, let's finish this function

290
00:12:11.610 --> 00:12:15.390
by returning an answer and a context.

291
00:12:15.390 --> 00:12:17.970
So the answer is going to be the LLM response.

292
00:12:17.970 --> 00:12:21.030
And the context now is going to be a list of documents

293
00:12:21.030 --> 00:12:23.910
which we took from the artifact of the tool execution.

294
00:12:23.910 --> 00:12:25.650
So this is the entire function.

295
00:12:25.650 --> 00:12:29.430
Now, let me go and implement an example

296
00:12:29.430 --> 00:12:31.500
so you can see everything I talked about

297
00:12:31.500 --> 00:12:33.150
and let's go and debug it.

298
00:12:33.150 --> 00:12:36.150
So I created if __name == '__main__':,

299
00:12:36.150 --> 00:12:38.820
and I'm going to run the LLM function

300
00:12:38.820 --> 00:12:41.040
with the query of "what are deep agents?"

301
00:12:41.040 --> 00:12:43.200
and I'm going to be printing the result.

302
00:12:43.200 --> 00:12:45.393
So let me now run it in debug.