Hands on Lab 8

✅ Lab Setup (10–15 min)

Step 0 — Create project folder

Create: genai-section8-agents-lab/ with:

src/
tools/
memory/
data/
outputs/
logs/
reports/

Step 1 — Install dependencies

pip install fastapi uvicorn pydantic python-dotenv requests
pip install jsonschema tenacity rich
pip install chromadb sentence-transformers
pip install networkx

Optional (if using APIs):

pip install openai anthropic google-generativeai

🧪 Part A — Agents vs Prompt-Based Systems (8.1)

A1) Same Task, Two Approaches: Prompt-Only vs Agent Loop

Goal: Feel the difference between a single prompt and an autonomous agent.

Step A1.1 — Choose a realistic task

Pick one:

“Plan a 3-day study schedule based on topics + constraints”
“Analyze a support issue and create a resolution checklist”
“Summarize docs and answer questions with citations (mini RAG)”

Step A1.2 — Run Prompt-Only baseline

One prompt
No tools
No memory
Save output: outputs/prompt_only.txt

Step A1.3 — Run an Agent loop version

Agent loop should:

create plan
execute steps
self-check
revise if needed
Save output: outputs/agent_loop.txt

✅ Deliverable: reports/prompt_vs_agent.md (1 page)

What failed in prompt-only?
What improved with agent loop?

A2) Introduce the 3 agent capabilities: Planning, Memory, Execution

Goal: Make each capability visible and measurable.

Step A2.1 — Planning test

Ask the agent: “Show your plan as steps with success criteria.”

Step A2.2 — Memory test

Give a preference:

“I prefer bullet answers and short responses.”
Later ask a different question and verify it remembers.

Step A2.3 — Execution test

Have it call a tool (even mocked) such as:

search docs
calculate totals
read a local file

✅ Deliverable: outputs/capabilities_demo.json

🧪 Part B — Agent Architectures (8.2)

B1) Implement ReAct Pattern (Reason + Act + Observe)

Goal: Build a ReAct agent that alternates between thinking and tool use.

Step B1.1 — Define tool schema contract

Create src/tool_schemas.py with JSON schemas for:

search_kb(query)
calc(expression)
save_note(title, content)
read_note(title)

Step B1.2 — Build a simple tool server (FastAPI)

Create tools/server.py with endpoints:

POST /search_kb (search local documents in data/kb/)
POST /calc (safe calculator)
POST /notes/save
GET /notes/read?title=...

Run tools:

uvicorn tools.server:app --reload --port 8002

Step B1.3 — Implement ReAct loop

Create src/react_agent.py:
Loop for max_steps=6:

model decides: tool call OR final answer
validate tool call JSON
execute tool
append tool result as “observation”
repeat

✅ Deliverable: outputs/react_runs.json for 5 tasks:

“Find refund policy and summarize”
“Calculate 18% tax on 349”
“Save a note and retrieve it”
“Search KB and answer with citations”
“Create checklist based on KB”

B2) Planner–Executor Architecture

Goal: Separate “planning” from “doing” for better reliability.

Step B2.1 — Implement planner

Create src/planner.py:

Input: user goal
Output: JSON plan:
- steps[]
- tools_needed[]
- success_criteria

Step B2.2 — Implement executor

Create src/executor.py:

Takes the plan
Executes each step with tools
Stores intermediate results
Stops and asks for help if blocked

Step B2.3 — Compare with ReAct

Run same 3 tasks with both:

response quality
fewer tool calls?
better structure?

✅ Deliverable: reports/react_vs_planner_executor.md

B3) Multi-Agent System (Simple but Real)

Goal: Build 2–3 agents with roles and a coordinator.

Step B3.1 — Define agents

Research Agent: retrieves facts from KB
Writer Agent: writes final answer
Verifier Agent: checks citations + consistency

Step B3.2 — Coordinator flow

Create src/multi_agent.py:

Research agent gathers sources
Writer drafts answer
Verifier checks for:
- missing citations
- contradictions
- policy violations
Writer revises if needed

✅ Deliverable: outputs/multi_agent_demo.json for 2 tasks:

“Explain policy X and list exceptions”
“Compare two docs and summarize differences”

🧪 Part C — Building Practical Agents (8.3)

C1) Task Decomposition (Make It Automatic)

Goal: Teach the agent to break a goal into smaller tasks reliably.

Step C1.1 — Add decomposition prompt template

Require the agent to output:

sub_tasks[]
dependencies
estimated difficulty

Step C1.2 — Execute sub-tasks

Run sub-tasks sequentially, storing results.

✅ Deliverable: outputs/task_decomposition.json

C2) Long-Term Memory Strategies (Short-term vs Persistent)

Goal: Implement memory that persists across sessions.

Step C2.1 — Short-term memory (session)

Store:

last 5 user turns
last tool results
current plan

Step C2.2 — Long-term memory (vector store)

Use Chroma:

embed notes + preferences
store with metadata:
- type = preference / fact / project
- timestamp

Step C2.3 — Memory retrieval trigger

Before responding:

retrieve top-k memories relevant to user request
inject into context as “MEMORY: …”

✅ Deliverable: reports/memory_demo.md

show user preference remembered after restart

C3) Human-in-the-Loop (HITL) Control

Goal: Add safe stopping points and approvals.

Step C3.1 — Add “approval required” actions

Mark actions as high-risk:

writing emails to customers
changing task status
saving final report

Step C3.2 — Implement HITL gate

If action requires approval:

agent outputs a JSON request:
- action_summary
- proposed_changes
- risk_level
- approve: true/false

Step C3.3 — Simulate human approval

Run 3 tasks where agent must ask for approval before final output.

✅ Deliverable: outputs/hitl_runs.json

🧪 Part D — Observability + Failure Recovery (Production-Ready Agents)

D1) Observability logging

Log every step to logs/agent.jsonl:

run_id, step_id
tool name + args
tool latency + result
model decision type
error if any

✅ Deliverable: log file with 3 full runs

D2) Failure recovery strategies

Simulate tool failures (20% random errors). Implement:

Retry tool (max 2)
If schema invalid → ask model to fix tool args
If still failing → fallback response + ask user

✅ Deliverable: reports/failure_recovery.md with examples

✅ Final Submission Checklist (Section 8 Lab)

Students submit:

Prompt-only vs agent loop comparison
ReAct agent implementation + 5 runs
Planner–Executor plan JSON + execution logs
Multi-agent demo with verifier pass
Task decomposition outputs
Long-term memory demo (Chroma)
HITL approval workflow runs
Observability logs + failure recovery report