✅ Lab Setup (10–15 min)
Step 0 — Create project folder
Create: genai-section8-agents-lab/ with:
src/tools/memory/data/outputs/logs/reports/
Step 1 — Install dependencies
pip install fastapi uvicorn pydantic python-dotenv requests pip install jsonschema tenacity rich pip install chromadb sentence-transformers pip install networkx
Optional (if using APIs):
pip install openai anthropic google-generativeai
🧪 Part A — Agents vs Prompt-Based Systems (8.1)
A1) Same Task, Two Approaches: Prompt-Only vs Agent Loop
Goal: Feel the difference between a single prompt and an autonomous agent.
Step A1.1 — Choose a realistic task
Pick one:
“Plan a 3-day study schedule based on topics + constraints”
“Analyze a support issue and create a resolution checklist”
“Summarize docs and answer questions with citations (mini RAG)”
Step A1.2 — Run Prompt-Only baseline
One prompt
No tools
No memory
Save output:outputs/prompt_only.txt
Step A1.3 — Run an Agent loop version
Agent loop should:
create plan
execute steps
self-check
revise if needed
Save output:outputs/agent_loop.txt
✅ Deliverable: reports/prompt_vs_agent.md (1 page)
What failed in prompt-only?
What improved with agent loop?
A2) Introduce the 3 agent capabilities: Planning, Memory, Execution
Goal: Make each capability visible and measurable.
Step A2.1 — Planning test
Ask the agent: “Show your plan as steps with success criteria.”
Step A2.2 — Memory test
Give a preference:
“I prefer bullet answers and short responses.”
Later ask a different question and verify it remembers.
Step A2.3 — Execution test
Have it call a tool (even mocked) such as:
search docs
calculate totals
read a local file
✅ Deliverable: outputs/capabilities_demo.json
🧪 Part B — Agent Architectures (8.2)
B1) Implement ReAct Pattern (Reason + Act + Observe)
Goal: Build a ReAct agent that alternates between thinking and tool use.
Step B1.1 — Define tool schema contract
Create src/tool_schemas.py with JSON schemas for:
search_kb(query)calc(expression)save_note(title, content)read_note(title)
Step B1.2 — Build a simple tool server (FastAPI)
Create tools/server.py with endpoints:
POST /search_kb(search local documents indata/kb/)POST /calc(safe calculator)POST /notes/saveGET /notes/read?title=...
Run tools:
uvicorn tools.server:app --reload --port 8002
Step B1.3 — Implement ReAct loop
Create src/react_agent.py:
Loop for max_steps=6:
model decides: tool call OR final answer
validate tool call JSON
execute tool
append tool result as “observation”
repeat
✅ Deliverable: outputs/react_runs.json for 5 tasks:
“Find refund policy and summarize”
“Calculate 18% tax on 349”
“Save a note and retrieve it”
“Search KB and answer with citations”
“Create checklist based on KB”
B2) Planner–Executor Architecture
Goal: Separate “planning” from “doing” for better reliability.
Step B2.1 — Implement planner
Create src/planner.py:
Input: user goal
Output: JSON plan:
steps[]
tools_needed[]
success_criteria
Step B2.2 — Implement executor
Create src/executor.py:
Takes the plan
Executes each step with tools
Stores intermediate results
Stops and asks for help if blocked
Step B2.3 — Compare with ReAct
Run same 3 tasks with both:
response quality
fewer tool calls?
better structure?
✅ Deliverable: reports/react_vs_planner_executor.md
B3) Multi-Agent System (Simple but Real)
Goal: Build 2–3 agents with roles and a coordinator.
Step B3.1 — Define agents
Research Agent: retrieves facts from KB
Writer Agent: writes final answer
Verifier Agent: checks citations + consistency
Step B3.2 — Coordinator flow
Create src/multi_agent.py:
Research agent gathers sources
Writer drafts answer
Verifier checks for:
missing citations
contradictions
policy violations
Writer revises if needed
✅ Deliverable: outputs/multi_agent_demo.json for 2 tasks:
“Explain policy X and list exceptions”
“Compare two docs and summarize differences”
🧪 Part C — Building Practical Agents (8.3)
C1) Task Decomposition (Make It Automatic)
Goal: Teach the agent to break a goal into smaller tasks reliably.
Step C1.1 — Add decomposition prompt template
Require the agent to output:
sub_tasks[]
dependencies
estimated difficulty
Step C1.2 — Execute sub-tasks
Run sub-tasks sequentially, storing results.
✅ Deliverable: outputs/task_decomposition.json
C2) Long-Term Memory Strategies (Short-term vs Persistent)
Goal: Implement memory that persists across sessions.
Step C2.1 — Short-term memory (session)
Store:
last 5 user turns
last tool results
current plan
Step C2.2 — Long-term memory (vector store)
Use Chroma:
embed notes + preferences
store with metadata:
type = preference / fact / project
timestamp
Step C2.3 — Memory retrieval trigger
Before responding:
retrieve top-k memories relevant to user request
inject into context as “MEMORY: …”
✅ Deliverable: reports/memory_demo.md
show user preference remembered after restart
C3) Human-in-the-Loop (HITL) Control
Goal: Add safe stopping points and approvals.
Step C3.1 — Add “approval required” actions
Mark actions as high-risk:
writing emails to customers
changing task status
saving final report
Step C3.2 — Implement HITL gate
If action requires approval:
agent outputs a JSON request:
action_summaryproposed_changesrisk_levelapprove: true/false
Step C3.3 — Simulate human approval
Run 3 tasks where agent must ask for approval before final output.
✅ Deliverable: outputs/hitl_runs.json
🧪 Part D — Observability + Failure Recovery (Production-Ready Agents)
D1) Observability logging
Log every step to logs/agent.jsonl:
run_id, step_id
tool name + args
tool latency + result
model decision type
error if any
✅ Deliverable: log file with 3 full runs
D2) Failure recovery strategies
Simulate tool failures (20% random errors). Implement:
Retry tool (max 2)
If schema invalid → ask model to fix tool args
If still failing → fallback response + ask user
✅ Deliverable: reports/failure_recovery.md with examples
✅ Final Submission Checklist (Section 8 Lab)
Students submit:
Prompt-only vs agent loop comparison
ReAct agent implementation + 5 runs
Planner–Executor plan JSON + execution logs
Multi-agent demo with verifier pass
Task decomposition outputs
Long-term memory demo (Chroma)
HITL approval workflow runs
Observability logs + failure recovery report