✅ Lab Setup (10–15 min)
Step 0 — Create folder
Create: genai-section9-fullstack-lab/ with:
backend/frontend/data/outputs/logs/reports/
Step 1 — Install backend deps
pip install fastapi uvicorn pydantic python-dotenv httpx tenacity pip install jsonschema tiktoken pip install chromadb sentence-transformers pip install structlog rich
Optional APIs:
pip install openai anthropic google-generativeai
Step 2 — Create .env
In backend/.env:
PROVIDER=openai # or anthropic or gemini or mock OPENAI_API_KEY=... ANTHROPIC_API_KEY=... GEMINI_API_KEY=...
Step 3 — Frontend setup (choose one)
Option A (recommended): Next.js
npx create-next-app@latest frontend cd frontend npm install
Option B (simpler): Vite + React
npm create vite@latest frontend -- --template react cd frontend npm install
🧪 Part A — Backend Architecture (9.1)
A1) Create a Production-Style FastAPI Service
Goal: Build a backend with clean request lifecycle: validate → build prompt → call model → stream response → log → store state.
Step A1.1 — Scaffold backend files
Inside backend/ create:
app.py(FastAPI entry)schemas.py(Pydantic models)llm_client.py(provider adapters)chat_service.py(business logic)memory_store.py(session + persistent)observability.py(logging + trace IDs)
✅ Deliverable: backend scaffold
A2) Define Request/Response Contracts
Goal: Strict schemas prevent frontend/backend mismatches.
Step A2.1 — Create Pydantic models
In schemas.py define:
ChatRequest
session_id: struser_message: strstream: bool = Truemax_output_tokens: inttemperature: float
ChatChunk (for streaming)
type: "delta" | "meta" | "error" | "done"text: strrun_id: strtokens_used: optional
✅ Deliverable: strict request/stream contracts
A3) Implement LLM Provider Layer + Mock Mode
Goal: Students can run this even without API keys.
Step A3.1 — Build llm_client.py
Add providers:
openai_chat()anthropic_chat()gemini_chat()mock_chat()(returns predictable text chunks)
Step A3.2 — Add timeout + retry policies
Use tenacity:
retry 2 times on timeouts/429/5xx
exponential backoff
total deadline (e.g., 25 seconds)
✅ Deliverable: reliable LLM call layer
A4) Request Lifecycle Design
Goal: Make the backend behave like production.
Step A4.1 — Implement endpoint: POST /chat
Flow:
Validate input schema
Create
run_idLoad conversation state from memory store
Apply context pruning (Section 9.3)
Call LLM (streaming)
Emit SSE chunks back to frontend
Persist updated state
Log metrics
✅ Deliverable: working /chat endpoint
A5) Observability & Metrics
Goal: Debuggability is mandatory for LLM systems.
Step A5.1 — Log per request
Write JSON logs to logs/backend.jsonl including:
run_id
session_id
model/provider
latency_ms
retries
prompt_tokens, output_tokens (estimate ok)
error (if any)
✅ Deliverable: log file showing at least 5 runs
🧪 Part B — Frontend → LLM Integration (9.2)
B1) Build the Chat UI
Goal: Create a chat interface with messages, loading, error state.
Step B1.1 — UI components
Create:
message list (user vs assistant)
input box + send button
“typing” indicator
error banner with retry button
✅ Deliverable: basic chat UI
B2) Implement Streaming Responses (SSE)
Goal: Stream assistant tokens live like ChatGPT.
Step B2.1 — Use Server-Sent Events
Frontend should:
POST to
/chatwithstream=trueListen to SSE events
Append incoming “delta” chunks to the last assistant message
Step B2.2 — Handle stream end
When type="done", stop stream and enable input.
✅ Deliverable: smooth streaming chat
B3) UX Patterns for AI Apps
Goal: Make it feel professional.
Step B3.1 — Add 3 UX features
Pick 3:
“Stop generating” button
“Regenerate response”
Copy button
Message timestamps
Markdown rendering
“Sources” panel placeholder (for RAG later)
✅ Deliverable: reports/ux_patterns.md describing what you added
🧪 Part C — State, Memory & Context Management (9.3)
C1) Session Memory (Short-Term Conversation)
Goal: Store the last N turns per session.
Step C1.1 — Implement in-memory store (dict)
session_id -> list of messageskeep last 20 messages max
✅ Deliverable: conversation persists across messages
C2) Persistent Memory (Long-Term)
Goal: Persist memory across restarts.
Step C2.1 — Store chat history to disk
Options:
SQLite (simple)
JSONL file per session (simplest)
Chroma for “semantic memory” (advanced)
Start with: JSONL per session in data/sessions/.
✅ Deliverable: reload works after backend restart
C3) Context Pruning Strategies
Goal: Stay within token budget while keeping important context.
Step C3.1 — Implement token budget
Define:
total context budget: 1800 tokens
reserved output: 400 tokens
Step C3.2 — Add pruning strategies
Implement:
Recency window: keep last K messages
Summarize old history: compress earlier turns into a summary message
Drop low-value content: remove long tool logs, repeated confirmations
✅ Deliverable: reports/context_pruning.md
show before/after context length
explain which messages got removed/summarized
C4) Add “Memory Types” (Practical Pattern)
Goal: Separate chat history from user preferences.
Step C4.1 — Store user preferences separately
Example:
writing style
verbosity
formatting preferences
Let user say:
“From now on, keep answers in bullet points.”
Persist as preferences.json keyed by user or session.
✅ Deliverable: preference persists + affects later outputs
🧪 Part D — Reliability: Errors, Retries, and Fallbacks
D1) Simulate failures
Goal: Ensure app doesn’t break in real life.
Step D1.1 — Add a “chaos mode”
If CHAOS_MODE=true, randomly:
delay responses
throw 500
return rate-limit error
Step D1.2 — Frontend behavior
show toast “Temporary error”
allow retry
keep user message intact
✅ Deliverable: reports/reliability_tests.md with screenshots or logs
✅ Mini Project (End of Section 9): “Production Chat MVP”
Goal: Combine everything into a polished demo.
Requirements
FastAPI backend
Streaming chat frontend
Session memory
Persistent storage
Context pruning
Logging + retries
Clean UX (stop/regenerate/copy)
✅ Deliverable: short demo script + 3 example conversations saved in outputs/
✅ Final Submission Checklist (Section 9)
Students submit:
FastAPI
/chatendpoint + streaming SSE supportProvider layer with retries + mock mode
Backend JSONL logs with run_id + latency
Frontend chat UI + streaming integration
UX enhancements report
Session memory + persistent storage evidence
Context pruning report + token budget enforcement
Reliability tests (failure simulation + recovery)
Mini project demo conversations