Hands on Lab 9


✅ Lab Setup (10–15 min)

Step 0 — Create folder

Create: genai-section9-fullstack-lab/ with:

Step 1 — Install backend deps

pip install fastapi uvicorn pydantic python-dotenv httpx tenacity
pip install jsonschema tiktoken
pip install chromadb sentence-transformers
pip install structlog rich

Optional APIs:

pip install openai anthropic google-generativeai

Step 2 — Create .env

In backend/.env:

PROVIDER=openai  # or anthropic or gemini or mock
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
GEMINI_API_KEY=...

Step 3 — Frontend setup (choose one)

Option A (recommended): Next.js

npx create-next-app@latest frontend
cd frontend
npm install

Option B (simpler): Vite + React

npm create vite@latest frontend -- --template react
cd frontend
npm install

🧪 Part A — Backend Architecture (9.1)

A1) Create a Production-Style FastAPI Service

Goal: Build a backend with clean request lifecycle: validate → build prompt → call model → stream response → log → store state.

Step A1.1 — Scaffold backend files

Inside backend/ create:

Deliverable: backend scaffold

A2) Define Request/Response Contracts

Goal: Strict schemas prevent frontend/backend mismatches.

Step A2.1 — Create Pydantic models

In schemas.py define:

ChatRequest

ChatChunk (for streaming)

Deliverable: strict request/stream contracts

A3) Implement LLM Provider Layer + Mock Mode

Goal: Students can run this even without API keys.

Step A3.1 — Build llm_client.py

Add providers:

Step A3.2 — Add timeout + retry policies

Use tenacity:

Deliverable: reliable LLM call layer

A4) Request Lifecycle Design

Goal: Make the backend behave like production.

Step A4.1 — Implement endpoint: POST /chat

Flow:

  1. Validate input schema

  2. Create run_id

  3. Load conversation state from memory store

  4. Apply context pruning (Section 9.3)

  5. Call LLM (streaming)

  6. Emit SSE chunks back to frontend

  7. Persist updated state

  8. Log metrics

Deliverable: working /chat endpoint

A5) Observability & Metrics

Goal: Debuggability is mandatory for LLM systems.

Step A5.1 — Log per request

Write JSON logs to logs/backend.jsonl including:

Deliverable: log file showing at least 5 runs

🧪 Part B — Frontend → LLM Integration (9.2)

B1) Build the Chat UI

Goal: Create a chat interface with messages, loading, error state.

Step B1.1 — UI components

Create:

Deliverable: basic chat UI

B2) Implement Streaming Responses (SSE)

Goal: Stream assistant tokens live like ChatGPT.

Step B2.1 — Use Server-Sent Events

Frontend should:

  1. POST to /chat with stream=true

  2. Listen to SSE events

  3. Append incoming “delta” chunks to the last assistant message

Step B2.2 — Handle stream end

When type="done", stop stream and enable input.

Deliverable: smooth streaming chat

B3) UX Patterns for AI Apps

Goal: Make it feel professional.

Step B3.1 — Add 3 UX features

Pick 3:

Deliverable: reports/ux_patterns.md describing what you added

🧪 Part C — State, Memory & Context Management (9.3)

C1) Session Memory (Short-Term Conversation)

Goal: Store the last N turns per session.

Step C1.1 — Implement in-memory store (dict)

Deliverable: conversation persists across messages

C2) Persistent Memory (Long-Term)

Goal: Persist memory across restarts.

Step C2.1 — Store chat history to disk

Options:

Start with: JSONL per session in data/sessions/.

Deliverable: reload works after backend restart

C3) Context Pruning Strategies

Goal: Stay within token budget while keeping important context.

Step C3.1 — Implement token budget

Define:

Step C3.2 — Add pruning strategies

Implement:

  1. Recency window: keep last K messages

  2. Summarize old history: compress earlier turns into a summary message

  3. Drop low-value content: remove long tool logs, repeated confirmations

Deliverable: reports/context_pruning.md

C4) Add “Memory Types” (Practical Pattern)

Goal: Separate chat history from user preferences.

Step C4.1 — Store user preferences separately

Example:

Let user say:

“From now on, keep answers in bullet points.”

Persist as preferences.json keyed by user or session.

Deliverable: preference persists + affects later outputs

🧪 Part D — Reliability: Errors, Retries, and Fallbacks

D1) Simulate failures

Goal: Ensure app doesn’t break in real life.

Step D1.1 — Add a “chaos mode”

If CHAOS_MODE=true, randomly:

Step D1.2 — Frontend behavior

Deliverable: reports/reliability_tests.md with screenshots or logs

✅ Mini Project (End of Section 9): “Production Chat MVP”

Goal: Combine everything into a polished demo.

Requirements

Deliverable: short demo script + 3 example conversations saved in outputs/

✅ Final Submission Checklist (Section 9)

Students submit:

  1. FastAPI /chat endpoint + streaming SSE support

  2. Provider layer with retries + mock mode

  3. Backend JSONL logs with run_id + latency

  4. Frontend chat UI + streaming integration

  5. UX enhancements report

  6. Session memory + persistent storage evidence

  7. Context pruning report + token budget enforcement

  8. Reliability tests (failure simulation + recovery)

  9. Mini project demo conversations