1
00:00:00,020 --> 00:00:03,000
[MUSIC]

2
00:00:04,000 --> 00:00:07,750
Welcome to this video on understanding Reflexion agents.

3
00:00:08,640 --> 00:00:13,017
In this video, you'll analyze how Reflexion agents improve AI responses

4
00:00:13,017 --> 00:00:15,560
through iterative self-critique and tool use.

5
00:00:15,920 --> 00:00:19,883
You'll apply generator and reflector roles within the Reflexion workflow,

6
00:00:19,883 --> 00:00:22,983
integrate real-time information using external tools,

7
00:00:22,983 --> 00:00:27,990
and structure outputs to include citations and references for greater transparency.

8
00:00:28,780 --> 00:00:34,240
In this video on Reflexion agents with LangGraph, remember that the difference from reflection isn't just the "x".

9
00:00:34,240 --> 00:00:37,783
Reflexion agents build on the idea of reflection agents,

10
00:00:37,783 --> 00:00:40,550
which iteratively review and refine outputs.

11
00:00:40,820 --> 00:00:45,390
They go further by producing responses with citations, current information,

12
00:00:45,390 --> 00:00:49,220
and verifiable claims rather than just improved opinions.

13
00:00:49,890 --> 00:00:52,420
Here's a quick review of the reflection process.

14
00:00:52,520 --> 00:00:57,820
A query like "I need more minerals in my diet" enters a cycle of generation and reflection.

15
00:00:58,110 --> 00:01:01,930
The back-and-forth continues until a reasonable stopping point is reached.

16
00:01:02,030 --> 00:01:05,610
Eventually, the system produces a response such as,

17
00:01:05,617 --> 00:01:10,850
"To increase minerals in your diet, try eating foods like spinach (iron and magnesium),

18
00:01:10,850 --> 00:01:15,050
almonds (magnesium), and dairy products (calcium)."

19
00:01:15,500 --> 00:01:19,740
But what happens if new research comes out after the model is already trained?

20
00:01:19,990 --> 00:01:22,010
This is where Reflexion comes in.

21
00:01:22,270 --> 00:01:27,250
It allows the system to learn from new information, evaluate past responses,

22
00:01:27,250 --> 00:01:29,870
and iteratively improve even post-training.

23
00:01:31,130 --> 00:01:35,890
Here's why Reflexion is considered a powerful framework for building smarter agents.

24
00:01:36,340 --> 00:01:40,820
At its core, Reflexion is designed to support self-improving agents.

25
00:01:41,090 --> 00:01:43,400
These agents don't just reflect once.

26
00:01:43,480 --> 00:01:45,983
They continually analyze their own performance,

27
00:01:45,983 --> 00:01:48,590
learning and getting better with each iteration.

28
00:01:48,740 --> 00:01:53,230
One of their key strengths is the ability to find and fix their own weaknesses.

29
00:01:53,460 --> 00:01:56,810
After each run, the agent reflects on what went wrong

30
00:01:56,810 --> 00:01:59,900
and adjusts its reasoning or strategy before trying again.

31
00:02:00,000 --> 00:02:03,410
They also have the ability to incorporate external information.

32
00:02:03,570 --> 00:02:06,370
By calling tools like web search or APIs,

33
00:02:06,370 --> 00:02:08,883
Reflexion agents can bring in real-time data

34
00:02:08,883 --> 00:02:11,990
to improve the relevance and accuracy of their next attempt.

35
00:02:12,170 --> 00:02:16,940
And finally, Reflexion agents are able to support and justify their output.

36
00:02:17,130 --> 00:02:21,280
Because of the reflection cycle, they can back up their responses with citations

37
00:02:21,283 --> 00:02:24,380
or clearly explain the reasoning behind their answers.

38
00:02:24,910 --> 00:02:29,720
Let's understand Reflexion, a method for improving LLM responses.

39
00:02:29,940 --> 00:02:31,140
Start with a query.

40
00:02:31,259 --> 00:02:33,259
"I need more minerals in my diet."

41
00:02:33,370 --> 00:02:37,390
A generator or responder LLM creates the initial response.

42
00:02:38,179 --> 00:02:40,020
A system prompt sets the role.

43
00:02:40,020 --> 00:02:42,390
For example, "You are a fitness coach".

44
00:02:42,710 --> 00:02:45,390
Tell the LLM to critique its own output.

45
00:02:45,780 --> 00:02:49,880
Provide a tool input, like a search query, to refine the response.

46
00:02:50,020 --> 00:02:53,360
A structured system message guides the entire process.

47
00:02:53,470 --> 00:02:57,517
To help the model distinguish between tool outputs and its own responses,

48
00:02:57,517 --> 00:02:59,460
the output is clearly formatted.

49
00:02:59,720 --> 00:03:05,730
Each part is labeled, including the response, critique, and query, to avoid ambiguity.

50
00:03:06,070 --> 00:03:08,050
Instead of returning plain text,

51
00:03:08,050 --> 00:03:12,830
the LLM outputs a structured object based on a defined schema or data model.

52
00:03:13,600 --> 00:03:16,460
The user query is passed to the responder.

53
00:03:17,170 --> 00:03:21,590
Instead of returning raw text, the LLM outputs a structured object.

54
00:03:21,860 --> 00:03:25,410
This object follows a schema, which is represented as a table.

55
00:03:25,690 --> 00:03:30,210
Each field, like response and query, maps to an attribute of the object.

56
00:03:30,600 --> 00:03:33,350
This entire structure becomes an AI message.
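
The structured output described above can be sketched as a small data model. This is a minimal illustration only: the class name `ResponderOutput`, the field names `critique` and `search_queries`, and the sample JSON are assumptions for this sketch, and a plain dataclass stands in for whatever schema library the course actually uses.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResponderOutput:
    """Hypothetical schema: one attribute per labeled part of the output."""
    response: str                      # the answer shown to the user
    critique: str                      # the model's self-critique
    search_queries: List[str] = field(default_factory=list)  # tool input


def parse_responder_output(raw_json: str) -> ResponderOutput:
    """Turn the model's JSON text into a typed object, failing on missing fields."""
    data = json.loads(raw_json)
    return ResponderOutput(
        response=data["response"],
        critique=data["critique"],
        search_queries=list(data.get("search_queries", [])),
    )


# Sample model output, written by hand for illustration (not real model output).
raw = json.dumps({
    "response": "Try spinach, almonds, and dairy products.",
    "critique": "No sources are cited for these recommendations.",
    "search_queries": ["mineral-rich foods"],
})
parsed = parse_responder_output(raw)
print(parsed.search_queries)
```

Because each part has its own attribute, downstream steps can read the search queries without parsing free text.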

57
00:03:33,530 --> 00:03:36,500
The responder's output is passed to a search engine.

58
00:03:36,670 --> 00:03:40,130
The tool extracts the search query from the responder's output.

59
00:03:40,310 --> 00:03:44,410
Simultaneously, the HumanMessage and AIMessage from the responder

60
00:03:44,410 --> 00:03:47,790
are saved to a list called response_list.

61
00:03:48,370 --> 00:03:51,010
For each query, the tool will return information.

62
00:03:51,240 --> 00:03:54,730
For example, it might include the title, content, and URL.

63
00:03:54,910 --> 00:03:59,640
Here, one search result is shown, but the number of results is a parameter you can set.

64
00:03:59,890 --> 00:04:03,790
You also append the tool call result to the response list.
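
The bookkeeping described above can be sketched as follows. This is a hedged illustration, not the course's code: `Message` and `stub_search` are hypothetical stand-ins (no real search API is called), and the results are placeholders. Only the name `response_list` and the title, content, and URL fields come from the narration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Message:
    role: str      # "human", "ai", or "tool"
    content: str


def stub_search(query: str, max_results: int = 1) -> List[Dict[str, str]]:
    """Stand-in for a real search tool: returns placeholder results, no network."""
    return [
        {
            "title": f"Placeholder result {i + 1} for: {query}",
            "content": "Placeholder content.",
            "url": f"https://example.com/result-{i + 1}",
        }
        for i in range(max_results)
    ]


# The human message and the responder's AI message are saved first.
response_list: List[Message] = [
    Message("human", "I need more minerals in my diet."),
    Message("ai", "Try spinach, almonds, and dairy products."),
]

# Each search query from the responder yields tool results, which are appended too.
search_queries = ["mineral-rich foods"]
for query in search_queries:
    for result in stub_search(query, max_results=1):
        summary = f"{result['title']} | {result['content']} | {result['url']}"
        response_list.append(Message("tool", summary))

print([m.role for m in response_list])
```

Raising `max_results` appends more tool messages per query, which gives the revisor more material but also a longer context.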

65
00:04:05,000 --> 00:04:08,930
The tool will pass this output to the revisor via the response list.

66
00:04:09,110 --> 00:04:14,310
The revisor will use the response list, in particular the self-critique from the responder.

67
00:04:15,300 --> 00:04:19,680
The revisor modifies the input from the responder using the tool outputs.

68
00:04:19,940 --> 00:04:23,917
It then follows a set of instructions to revise the response,

69
00:04:23,917 --> 00:04:29,490
incorporate citations from the tool, and add references for the citations.

70
00:04:29,830 --> 00:04:33,200
Just like the generator, the revisor outputs a response.

71
00:04:33,410 --> 00:04:38,270
For example, it recommends mineral-rich foods, but with added references.

72
00:04:38,400 --> 00:04:42,440
Like the responder, it uses the same schema, represented as a table.

73
00:04:42,720 --> 00:04:45,550
This includes the revised response, references,

74
00:04:45,550 --> 00:04:48,760
self-critique, and the next set of search queries.

75
00:04:49,180 --> 00:04:53,880
A key difference is that the response now includes refined references.

76
00:04:54,270 --> 00:04:58,660
The revisor's output is passed to the tool, and the search queries are extracted.

77
00:04:59,000 --> 00:05:01,660
The output is also stored in the tool response.

78
00:05:01,920 --> 00:05:06,150
The tool response is processed by the revisor and added to the response list.

79
00:05:06,850 --> 00:05:10,990
The response list also includes past revisor AI messages.

80
00:05:11,530 --> 00:05:14,220
This process repeats itself iteratively.

81
00:05:14,720 --> 00:05:19,280
The revisor passes its output back to the tool, which then updates the response.

82
00:05:19,480 --> 00:05:24,040
This cycle continues for a predetermined number of iterations, after which you get the final output.
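
The iterative cycle can be sketched as a fixed-length loop. Everything here is an assumption made for illustration: `stub_search` and `stub_revisor` replace the real tool and LLM calls, `MAX_ITERATIONS = 2` is an arbitrary choice, and the dictionary keys are hypothetical field names.

```python
from typing import Any, Dict, List

MAX_ITERATIONS = 2  # assumed value; the narration only says "predetermined"


def stub_search(query: str) -> str:
    """Stand-in for the search tool (no network access)."""
    return f"[placeholder result for '{query}']"


def stub_revisor(response_list: List[Dict[str, str]], iteration: int) -> Dict[str, Any]:
    """Stand-in for the revisor LLM: cites every tool result seen so far."""
    tool_msgs = [m for m in response_list if m["role"] == "tool"]
    return {
        "response": f"Revision {iteration} using {len(tool_msgs)} tool result(s)",
        "references": [m["content"] for m in tool_msgs],
        "search_queries": [f"follow-up query {iteration}"],
    }


def run_reflexion_loop(question: str, first_queries: List[str],
                       max_iterations: int = MAX_ITERATIONS) -> Dict[str, Any]:
    response_list: List[Dict[str, str]] = [{"role": "human", "content": question}]
    queries = first_queries
    revision: Dict[str, Any] = {}
    for iteration in range(1, max_iterations + 1):
        # Tool step: run the pending queries and store each result.
        for query in queries:
            response_list.append({"role": "tool", "content": stub_search(query)})
        # Revisor step: revise using the whole history, then store the AI message.
        revision = stub_revisor(response_list, iteration)
        response_list.append({"role": "ai", "content": revision["response"]})
        queries = revision["search_queries"]
    return revision


final = run_reflexion_loop("I need more minerals in my diet.", ["mineral-rich foods"])
print(final["response"])
```

A fixed iteration count keeps cost bounded; a real agent might also stop early when the self-critique finds nothing left to fix.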

83
00:05:24,830 --> 00:05:26,670
In this video, you learned that:

84
00:05:26,679 --> 00:05:29,950
Reflexion agents build on reflection agents by iteratively

85
00:05:29,950 --> 00:05:34,430
improving responses using self-critiques, external tools, and citations.

86
00:05:34,790 --> 00:05:38,510
The reflection process involves a loop of generation, critique,

87
00:05:38,510 --> 00:05:42,130
and revision to enhance clarity, accuracy, and usefulness.

88
00:05:42,650 --> 00:05:46,080
Reflexion agents can identify and fix their own weaknesses,

89
00:05:46,080 --> 00:05:49,340
improving with each cycle by analyzing prior outputs.

90
00:05:49,490 --> 00:05:52,810
They can incorporate real-time data by calling external tools

91
00:05:52,810 --> 00:05:56,540
like web search APIs, enhancing the relevance of responses.

92
00:05:56,780 --> 00:05:59,950
Structured schema-based output helps agents distinguish

93
00:05:59,950 --> 00:06:04,090
between different components like response, critique, and tool query.

94
00:06:04,290 --> 00:06:08,310
The responder produces an output with fields such as query and response,

95
00:06:08,310 --> 00:06:11,410
which downstream components like the revisor can build on.

96
00:06:11,490 --> 00:06:15,510
The revisor refines the response by revising it, integrating tool outputs,

97
00:06:15,510 --> 00:06:17,930
and adding references to support the claims.

98
00:06:18,070 --> 00:06:20,920
This entire process operates in an iterative cycle

99
00:06:20,920 --> 00:06:23,380
with outputs and feedback passed through tools

100
00:06:23,380 --> 00:06:25,780
and stored in a response list across iterations.

101
00:06:26,690 --> 00:06:29,380
[MUSIC]