Hands on Lab 7

✅ Lab Setup (10–15 min)

Step 0 — Create folder

Create: genai-section7-tools-lab/ with:

src/
tools/
data/
outputs/
logs/
reports/

Step 1 — Install dependencies

pip install fastapi uvicorn pydantic python-dotenv requests
pip install jsonschema tenacity rich
pip install transformers torch sentencepiece

Optional (if using APIs):

pip install openai anthropic google-generativeai

🧪 Part A — Tool-Using LLMs (7.1)

A1) Define Tool Schemas (Function Calling Contract)

Goal: Teach the model how to call tools with structured JSON arguments.

Step A1.1 — Create 3 tools (schemas first)

Create src/tool_schemas.py defining JSON schemas for:

get_weather(city: str, units: str)
convert_currency(amount: float, from: str, to: str)
search_faq(query: str) (local KB search)

Each schema includes:

name
description
parameters with types + required fields

✅ Deliverable: tool schema definitions (model-readable)

A2) Build a Structured Output Parser

Goal: Enforce that tool calls are valid JSON.

Step A2.1 — Create `src/json_parser.py`

Extract tool call JSON from model output
Validate JSON strictly
Validate schema with jsonschema

If invalid:

return an error object:
- error_type
- message
- raw_output

✅ Deliverable: parser that never crashes on bad model output

A3) Run First Tool Call (Single-Step)

Goal: LLM decides which tool to call.

Step A3.1 — Create `data/tool_tasks.json`

Add 10 tasks:

“What’s the weather in Chicago tomorrow?”
“Convert 120 USD to PKR”
“Search FAQ: refund policy”

Step A3.2 — Prompt the model to respond in tool-call JSON

Format:

{"tool":"convert_currency","arguments":{"amount":120,"from":"USD","to":"PKR"}}

Step A3.3 — Execute the tool

Run tool function
Print tool result
Ask model to produce final response based on tool result

✅ Deliverable: outputs/single_step_results.json

🧪 Part B — Designing Tools for LLMs (7.2)

B1) Build Tools as APIs (FastAPI Tool Server)

Goal: Wrap tools behind an API like real production systems.

Step B1.1 — Create `tools/server.py` (FastAPI)

Endpoints:

POST /weather
POST /convert
POST /faq

For the lab:

Weather can be mocked (static dictionary) to avoid external API dependency.
Currency can be mocked with sample rates in a dict.
FAQ search can be simple keyword match on data/faq.md.

✅ Deliverable: running tool server:

uvicorn tools.server:app --reload --port 8001

B2) Input Validation & Error Handling (Tool Layer)

Goal: Make tools robust to bad inputs from the model.

Step B2.1 — Add Pydantic validation

Examples:

City must be non-empty string
Currency must be 3 letters
Amount must be > 0

Step B2.2 — Add predictable error responses

Return JSON errors:

{"ok": false, "error": {"code": "INVALID_INPUT", "message": "..."}}

✅ Deliverable: reports/tool_validation.md with 5 bad-input tests

B3) Stateless vs Stateful Tools

Goal: Understand when a tool must remember state.

Step B3.1 — Stateless tool examples

Currency conversion
FAQ search

Step B3.2 — Create a stateful tool: “Task List Manager”

Add endpoints:

POST /tasks/add
GET /tasks/list
POST /tasks/complete

Store tasks in memory (for now).

✅ Deliverable: working stateful tool + example task list

🧪 Part C — Multi-Step Reasoning with Tools (7.3)

C1) Tool Chaining: Multi-Step Agent Loop

Goal: Build an agent that calls tools multiple times to solve one request.

Step C1.1 — Create multi-step tasks in `data/agent_tasks.json`

Examples:

“Plan my day in NYC: check weather, then create outfit recommendation.”
“Convert 200 USD to PKR, then summarize what I can buy in PKR (short).”
“Add 3 tasks to my list and mark one complete.”

Step C1.2 — Implement an agent loop `src/agent.py`

Loop:

Send current state + user goal to model
Model outputs either:
- tool call JSON OR
- final answer JSON
Validate tool call
Execute tool via API
Append tool result to conversation state
Repeat up to max_steps=6

✅ Deliverable: successful tool chaining for all 3 tasks

C2) Observability: Log Every Tool Call

Goal: Make tool usage debuggable.

Step C2.1 — Add structured logs to `logs/agent_runs.jsonl`

Each line includes:

timestamp
run_id
step
tool_name
args
response
latency_ms
error (if any)

Step C2.2 — Add a console trace (optional)

Use rich to print:

tool called
inputs
outputs
next decision

✅ Deliverable: log file with at least 3 complete runs

C3) Failure Recovery Strategies

Goal: Handle tool failures gracefully.

Step C3.1 — Simulate failures

Make tools randomly fail 20% of the time (return 500 or error JSON).

Step C3.2 — Add recovery behaviors in the agent

Retry tool call (max 2)
If input invalid → ask model to fix arguments
If repeated failures → fallback response and ask user

Step C3.3 — Validate recovery works

Run tasks multiple times and confirm agent completes.

✅ Deliverable: reports/failure_recovery.md showing:

failures encountered
how recovery fixed them

🧪 Part D — Mini Project: “Support Assistant with Tools” (45–60 min)

D1) Build a Customer Support Agent

Goal: Combine tool calling + state + safety.

Step D1.1 — Tools

FAQ search (KB)
Task creation (“create support ticket”)
Ticket status lookup (stateful)

Step D1.2 — Guardrails

Force JSON tool call schema
Block prompt injection patterns in user input
Output validation (no secrets, no system prompt leakage)

Step D1.3 — Test scenarios

“My refund isn’t processed—what do I do?”
“Create a ticket for my broken product.”
“Check status of ticket #123.”

✅ Deliverable: outputs/support_agent_demo.json with 3 full runs

✅ Final Submission Checklist (Section 7 Lab)

Students submit:

Tool schemas + strict JSON tool-call format
Tool server (FastAPI) with validation + predictable error outputs
Single-step tool calling results
Multi-step agent loop (tool chaining)
Observability logs (JSONL) showing each tool call + latency
Failure recovery report with retries + argument fixing
Mini project: support agent demo with tools + state + guardrails