Hands on Lab 9

✅ Lab Setup (10–15 min)

Step 0 — Create folder

Create: genai-section9-fullstack-lab/ with:

backend/
frontend/
data/
outputs/
logs/
reports/

Step 1 — Install backend deps

pip install fastapi uvicorn pydantic python-dotenv httpx tenacity
pip install jsonschema tiktoken
pip install chromadb sentence-transformers
pip install structlog rich

Optional APIs:

pip install openai anthropic google-generativeai

Step 2 — Create `.env`

In backend/.env:

PROVIDER=openai  # or anthropic or gemini or mock
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
GEMINI_API_KEY=...

Step 3 — Frontend setup (choose one)

Option A (recommended): Next.js

npx create-next-app@latest frontend
cd frontend
npm install

Option B (simpler): Vite + React

npm create vite@latest frontend -- --template react
cd frontend
npm install

🧪 Part A — Backend Architecture (9.1)

A1) Create a Production-Style FastAPI Service

Goal: Build a backend with clean request lifecycle: validate → build prompt → call model → stream response → log → store state.

Step A1.1 — Scaffold backend files

Inside backend/ create:

app.py (FastAPI entry)
schemas.py (Pydantic models)
llm_client.py (provider adapters)
chat_service.py (business logic)
memory_store.py (session + persistent)
observability.py (logging + trace IDs)

✅ Deliverable: backend scaffold

A2) Define Request/Response Contracts

Goal: Strict schemas prevent frontend/backend mismatches.

Step A2.1 — Create Pydantic models

In schemas.py define:

ChatRequest

session_id: str
user_message: str
stream: bool = True
max_output_tokens: int
temperature: float

ChatChunk (for streaming)

type: "delta" | "meta" | "error" | "done"
text: str
run_id: str
tokens_used: optional

✅ Deliverable: strict request/stream contracts

A3) Implement LLM Provider Layer + Mock Mode

Goal: Students can run this even without API keys.

Step A3.1 — Build `llm_client.py`

Add providers:

openai_chat()
anthropic_chat()
gemini_chat()
mock_chat() (returns predictable text chunks)

Step A3.2 — Add timeout + retry policies

Use tenacity:

retry 2 times on timeouts/429/5xx
exponential backoff
total deadline (e.g., 25 seconds)

✅ Deliverable: reliable LLM call layer

A4) Request Lifecycle Design

Goal: Make the backend behave like production.

Step A4.1 — Implement endpoint: `POST /chat`

Flow:

Validate input schema
Create run_id
Load conversation state from memory store
Apply context pruning (Section 9.3)
Call LLM (streaming)
Emit SSE chunks back to frontend
Persist updated state
Log metrics

✅ Deliverable: working /chat endpoint

A5) Observability & Metrics

Goal: Debuggability is mandatory for LLM systems.

Step A5.1 — Log per request

Write JSON logs to logs/backend.jsonl including:

run_id
session_id
model/provider
latency_ms
retries
prompt_tokens, output_tokens (estimate ok)
error (if any)

✅ Deliverable: log file showing at least 5 runs

🧪 Part B — Frontend → LLM Integration (9.2)

B1) Build the Chat UI

Goal: Create a chat interface with messages, loading, error state.

Step B1.1 — UI components

Create:

message list (user vs assistant)
input box + send button
“typing” indicator
error banner with retry button

✅ Deliverable: basic chat UI

B2) Implement Streaming Responses (SSE)

Goal: Stream assistant tokens live like ChatGPT.

Step B2.1 — Use Server-Sent Events

Frontend should:

POST to /chat with stream=true
Listen to SSE events
Append incoming “delta” chunks to the last assistant message

Step B2.2 — Handle stream end

When type="done", stop stream and enable input.

✅ Deliverable: smooth streaming chat

B3) UX Patterns for AI Apps

Goal: Make it feel professional.

Step B3.1 — Add 3 UX features

Pick 3:

“Stop generating” button
“Regenerate response”
Copy button
Message timestamps
Markdown rendering
“Sources” panel placeholder (for RAG later)

✅ Deliverable: reports/ux_patterns.md describing what you added

🧪 Part C — State, Memory & Context Management (9.3)

C1) Session Memory (Short-Term Conversation)

Goal: Store the last N turns per session.

Step C1.1 — Implement in-memory store (dict)

session_id -> list of messages
keep last 20 messages max

✅ Deliverable: conversation persists across messages

C2) Persistent Memory (Long-Term)

Goal: Persist memory across restarts.

Step C2.1 — Store chat history to disk

Options:

SQLite (simple)
JSONL file per session (simplest)
Chroma for “semantic memory” (advanced)

Start with: JSONL per session in data/sessions/.

✅ Deliverable: reload works after backend restart

C3) Context Pruning Strategies

Goal: Stay within token budget while keeping important context.

Step C3.1 — Implement token budget

Define:

total context budget: 1800 tokens
reserved output: 400 tokens

Step C3.2 — Add pruning strategies

Implement:

Recency window: keep last K messages
Summarize old history: compress earlier turns into a summary message
Drop low-value content: remove long tool logs, repeated confirmations

✅ Deliverable: reports/context_pruning.md

show before/after context length
explain which messages got removed/summarized

C4) Add “Memory Types” (Practical Pattern)

Goal: Separate chat history from user preferences.

Step C4.1 — Store user preferences separately

Example:

writing style
verbosity
formatting preferences

Let user say:

“From now on, keep answers in bullet points.”

Persist as preferences.json keyed by user or session.

✅ Deliverable: preference persists + affects later outputs

🧪 Part D — Reliability: Errors, Retries, and Fallbacks

D1) Simulate failures

Goal: Ensure app doesn’t break in real life.

Step D1.1 — Add a “chaos mode”

If CHAOS_MODE=true, randomly:

delay responses
throw 500
return rate-limit error

Step D1.2 — Frontend behavior

show toast “Temporary error”
allow retry
keep user message intact

✅ Deliverable: reports/reliability_tests.md with screenshots or logs

✅ Mini Project (End of Section 9): “Production Chat MVP”

Goal: Combine everything into a polished demo.

Requirements

FastAPI backend
Streaming chat frontend
Session memory
Persistent storage
Context pruning
Logging + retries
Clean UX (stop/regenerate/copy)

✅ Deliverable: short demo script + 3 example conversations saved in outputs/

✅ Final Submission Checklist (Section 9)

Students submit:

FastAPI /chat endpoint + streaming SSE support
Provider layer with retries + mock mode
Backend JSONL logs with run_id + latency
Frontend chat UI + streaming integration
UX enhancements report
Session memory + persistent storage evidence
Context pruning report + token budget enforcement
Reliability tests (failure simulation + recovery)
Mini project demo conversations