✅ Lab Setup (10–15 min)
Step 0 — Create folder
Create genai-section6-rag-lab/ with these subfolders:
data/docs/
data/queries/
indexes/
outputs/
src/
reports/
Step 1 — Install dependencies
pip install numpy pandas scikit-learn matplotlib
pip install sentence-transformers
pip install chromadb faiss-cpu
pip install rank-bm25 nltk
pip install transformers torch sentencepiece
pip install pydantic python-dotenv tqdm
If FAISS fails, skip it and use Chroma only.
Step 2 — Download NLTK tokenizer
python -c "import nltk; nltk.download('punkt')"
🧪 Part A — Why RAG is Needed (6.1)
A1) Hallucination Baseline (No Retrieval)
Goal: Show that “LLM-only” answers hallucinate when asked about facts the model has never seen.
Step A1.1 — Create a private knowledge base
Add 6–10 docs in data/docs/ (plain .txt):
“Company Policy Handbook”
“Product FAQ”
“Engineering Runbook”
“Incident Postmortem”
“HR Guidelines”
“Aviation Maintenance Notes” (optional domain)
Each doc: 500–1500 words.
Step A1.2 — Create evaluation questions
Create data/queries/questions.json with 15 questions:
10 answerable from docs
5 NOT answerable (to test honesty)
Each record:
id
question
answerable (true/false)
source_doc_hint (optional)
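The record layout above could be written and checked with a short script. This is a minimal sketch: the example questions, the file name hr_guidelines.txt, and the helper names validate_questions and write_questions are hypothetical and only illustrate the four fields.

```python
import json
from pathlib import Path

# Hypothetical example records; the question text is made up for illustration.
EXAMPLE_QUESTIONS = [
    {
        "id": "q01",
        "question": "How many days of notice does the leave policy require?",
        "answerable": True,
        "source_doc_hint": "hr_guidelines.txt",
    },
    {
        "id": "q11",
        "question": "What is the CEO's favorite programming language?",
        "answerable": False,
        "source_doc_hint": None,  # optional field, left empty for unanswerable questions
    },
]

REQUIRED_FIELDS = {"id": str, "question": str, "answerable": bool}


def validate_questions(records):
    """Return a list of error strings; an empty list means the records are well formed."""
    errors = []
    seen_ids = set()
    for i, rec in enumerate(records):
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(rec.get(field), expected_type):
                errors.append(f"record {i}: '{field}' missing or not {expected_type.__name__}")
        if rec.get("id") in seen_ids:
            errors.append(f"record {i}: duplicate id {rec.get('id')!r}")
        seen_ids.add(rec.get("id"))
    return errors


def write_questions(records, path="data/queries/questions.json"):
    """Validate the records, then write them as a JSON array."""
    errors = validate_questions(records)
    if errors:
        raise ValueError("; ".join(errors))
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(records, indent=2), encoding="utf-8")
```

Keeping answerable as a real boolean (not the string "true") makes the later scoring steps easier to filter.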
Step A1.3 — Run “LLM-only” answering
Use a small instruction model for generation:
google/flan-t5-base (light)
or any API model you already use
Prompt:
“Answer the question. If you don’t know, say you don’t know.”
Save outputs to:
outputs/baseline_llm_only.json
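One way to run Step A1.3 is sketched below, assuming the local google/flan-t5-base option. The function names (make_flan_generator, run_baseline, save_json) and the max_new_tokens value are illustrative choices, not part of the lab spec. The model is loaded lazily so the loop can also be driven by any other generate(prompt) function, such as an API wrapper.

```python
import json
from pathlib import Path

PROMPT = "Answer the question. If you don't know, say you don't know.\n\nQuestion: {question}"


def make_flan_generator(model_name="google/flan-t5-base", max_new_tokens=64):
    """Build a generate(prompt) -> str function backed by a local seq2seq model."""
    # Imported lazily so the rest of the script works without the heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def generate(prompt):
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return generate


def run_baseline(questions, generate):
    """Answer every question with no retrieved context."""
    results = []
    for q in questions:
        answer = generate(PROMPT.format(question=q["question"]))
        results.append({
            "id": q["id"],
            "question": q["question"],
            "answerable": q["answerable"],
            "answer": answer,
        })
    return results


def save_json(records, path):
    """Write records as a JSON array, creating the parent folder if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(records, indent=2), encoding="utf-8")
```

Typical use: load data/queries/questions.json, call run_baseline(questions, make_flan_generator()), then save_json(results, "outputs/baseline_llm_only.json").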
Step A1.4 — Score hallucinations manually (fast)
Add columns:
hallucinated (Y/N)
uncertain (Y/N)
correct (Y/N)
✅ Deliverable: reports/hallucination_baseline.md
Include counts and 2–3 examples of hallucination.
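The counts for the report can be tallied with a few lines once the Y/N columns are filled in. A minimal sketch, assuming each scored row is a dict with the three columns above; summarize_scores is a hypothetical helper name.

```python
from collections import Counter

SCORE_COLUMNS = ("hallucinated", "uncertain", "correct")


def summarize_scores(scored_rows):
    """Count the rows marked Y in each column and compute the hallucination rate."""
    counts = Counter()
    for row in scored_rows:
        for column in SCORE_COLUMNS:
            if str(row.get(column, "")).strip().upper() == "Y":
                counts[column] += 1
    total = len(scored_rows)
    summary = {column: counts[column] for column in SCORE_COLUMNS}
    summary["total"] = total
    summary["hallucination_rate"] = counts["hallucinated"] / total if total else 0.0
    return summary
```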
🧪 Part B — RAG Architecture (6.2)
B1) Document Ingestion Pipeline
Goal: Turn documents into chunks with metadata.
Step B1.1 — Build ingestion script src/ingest.py
Read all data/docs/*.txt and produce chunk records with:
chunk_id
doc_id
title
chunk_text
chunk_index
char_start, char_end
Step B1.2 — Implement chunking strategy
Use token-aware chunking:
chunk_size = 250 tokens
overlap = 40 tokens
Save:
outputs/chunks.jsonl
✅ Deliverable: chunk file + chunk count per document.
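A possible shape for src/ingest.py is sketched below. It makes three assumptions that the lab does not specify: a "token" is a whitespace-delimited word (swap in a model tokenizer for exact counts), doc_id is the file name without extension, and the title is the first non-empty line of the file. The chunk_id format is also just an illustration.

```python
import json
import re
from pathlib import Path

TOKEN_RE = re.compile(r"\S+")  # assumption: one whitespace-delimited word = one token


def chunk_text(text, chunk_size=250, overlap=40):
    """Split text into overlapping token windows, keeping character offsets."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    spans = [m.span() for m in TOKEN_RE.finditer(text)]
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(spans), step):
        window = spans[start:start + chunk_size]
        char_start, char_end = window[0][0], window[-1][1]
        chunks.append({
            "chunk_text": text[char_start:char_end],
            "char_start": char_start,
            "char_end": char_end,
        })
        if start + chunk_size >= len(spans):
            break  # this window already reached the end of the document
    return chunks


def ingest_docs(docs_dir="data/docs", out_path="outputs/chunks.jsonl"):
    """Chunk every .txt file, write one JSON record per line, return chunk counts per doc."""
    counts = {}
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", encoding="utf-8") as f:
        for path in sorted(Path(docs_dir).glob("*.txt")):
            text = path.read_text(encoding="utf-8")
            doc_id = path.stem
            lines = text.strip().splitlines()
            title = lines[0].strip() if lines else doc_id  # assumption: first line is the title
            chunks = chunk_text(text)
            for idx, chunk in enumerate(chunks):
                record = {
                    "chunk_id": f"{doc_id}-{idx:04d}",
                    "doc_id": doc_id,
                    "title": title,
                    "chunk_index": idx,
                    **chunk,
                }
                f.write(json.dumps(record) + "\n")
            counts[doc_id] = len(chunks)
    return counts
```

With chunk_size = 250 and overlap = 40 the window advances 210 tokens at a time, so each chunk repeats the last 40 tokens of the previous one.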
B2) Embeddings + Vector Store (Retriever)
Goal: Create a searchable index.
Step B2.1 — Generate embeddings for chunks
Use:
sentence-transformers/all-MiniLM-L6-v2
Save:
outputs/chunk_embeddings.npy
outputs/chunk_metadata.json
Step B2.2 — Store in Chroma
Create a Chroma collection:
name: rag_demo
Insert:
ids = chunk_id
embeddings
documents = chunk_text
metadata = {doc_id, title, chunk_index}
✅ Deliverable: persisted Chroma DB in indexes/chroma/.
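Steps B2.1 and B2.2 might be combined as in the sketch below. The helper names (load_chunks, build_chroma_payload, build_index) are hypothetical. Two choices here are assumptions rather than lab requirements: embeddings are L2-normalized so that Chroma's default distance ranks results the same way cosine similarity would, and upsert is used instead of a plain insert so the script can be re-run without duplicate-id problems.

```python
import json
from pathlib import Path


def load_chunks(path="outputs/chunks.jsonl"):
    """Read the chunk records written by the ingestion step."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def build_chroma_payload(chunks):
    """Arrange chunk records into the parallel lists that a Chroma collection expects."""
    return {
        "ids": [c["chunk_id"] for c in chunks],
        "documents": [c["chunk_text"] for c in chunks],
        "metadatas": [
            {"doc_id": c["doc_id"], "title": c["title"], "chunk_index": c["chunk_index"]}
            for c in chunks
        ],
    }


def build_index(chunks, persist_dir="indexes/chroma", collection_name="rag_demo",
                out_dir="outputs"):
    """Embed all chunks, save the raw arrays, and store everything in Chroma."""
    # Heavy dependencies are imported lazily, only when the index is actually built.
    import chromadb
    import numpy as np
    from sentence_transformers import SentenceTransformer

    payload = build_chroma_payload(chunks)
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    embeddings = model.encode(payload["documents"], normalize_embeddings=True)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    np.save(out / "chunk_embeddings.npy", embeddings)
    (out / "chunk_metadata.json").write_text(
        json.dumps(payload["metadatas"], indent=2), encoding="utf-8"
    )

    client = chromadb.PersistentClient(path=persist_dir)
    collection = client.get_or_create_collection(name=collection_name)
    # upsert overwrites existing ids, so re-running the script does not fail.
    collection.upsert(embeddings=embeddings.tolist(), **payload)
    return collection
```

Note that the Chroma keyword is metadatas (plural), with one dict per chunk.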
B3) Retriever → Generator Flow
Goal: Implement end-to-end RAG.
Step B3.1 — Write retriever function src/retrieve.py
Inputs:
query
top_k (default 4)
Steps:
embed query
vector search
return top_k chunks + scores
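The three steps above can be sketched as a single function. This is an illustration, not a required design: embed is any function that maps text to a vector, collection is the Chroma collection from B2, and the conversion from distance to score (1 / (1 + distance)) is an arbitrary monotonic choice so that higher always means more relevant.

```python
def make_embedder(model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Return an embed(text) function; the model is loaded lazily on first call of this helper."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda text: model.encode(text, normalize_embeddings=True)


def retrieve(query, embed, collection, top_k=4):
    """Embed the query, run a vector search, and return the top_k chunks with scores."""
    query_vec = embed(query)
    if hasattr(query_vec, "tolist"):
        query_vec = query_vec.tolist()  # numpy array -> plain list of floats
    result = collection.query(
        query_embeddings=[list(query_vec)],
        n_results=top_k,
        include=["documents", "metadatas", "distances"],
    )
    hits = []
    # query() returns one inner list per query; a single query was sent, so use index 0.
    for chunk_id, text, meta, distance in zip(
        result["ids"][0], result["documents"][0],
        result["metadatas"][0], result["distances"][0],
    ):
        hits.append({
            "chunk_id": chunk_id,
            "chunk_text": text,
            "doc_id": meta["doc_id"],
            "chunk_index": meta["chunk_index"],
            "distance": distance,
            "score": 1.0 / (1.0 + distance),  # smaller distance -> higher score
        })
    return hits
```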
Step B3.2 — Build context window manager
Implement “context builder”:
Sort by score (most relevant first)
Stop adding chunks when context exceeds token budget (e.g., 900 tokens)
Add citations like:
[doc_id#chunk_index]
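A context builder along these lines could look like the sketch below. It assumes each hit is a dict with doc_id, chunk_index, chunk_text, and a score where higher is better, and it approximates token counts with whitespace words; both are assumptions for illustration.

```python
def count_tokens(text):
    # Assumption: whitespace-delimited words approximate model tokens.
    return len(text.split())


def build_context(hits, token_budget=900):
    """Pack the highest-scoring chunks into the budget, each prefixed with its citation."""
    blocks = []
    used = 0
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        cost = count_tokens(hit["chunk_text"])
        if used + cost > token_budget:
            break  # stop at the first chunk that would exceed the budget
        citation = f"[{hit['doc_id']}#{hit['chunk_index']}]"
        blocks.append(f"{citation} {hit['chunk_text']}")
        used += cost
    return "\n\n".join(blocks), used
```

Stopping at the first chunk that does not fit keeps the context strictly in score order; skipping it and trying smaller, lower-ranked chunks is an alternative that fills the budget more tightly.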
Step B3.3 — Write generator src/generate.py
Prompt template:
System: “You are a grounded assistant. Use only provided context.”
User: question
Context: retrieved chunks with citations
Rules:
If answer not in context: say “Not found in provided documents.”
Provide bullet answer + citations.
Save outputs:
outputs/rag_answers.json
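The prompt template and rules can be assembled as in the sketch below. Because a model such as flan-t5 has no separate system role, this version flattens the system text, rules, context, and question into one string; that flattening, the citation regex, and the helper names build_prompt and answer_with_rag are assumptions for illustration. generate is any function that maps a prompt string to an answer string.

```python
import re

SYSTEM_PROMPT = "You are a grounded assistant. Use only provided context."
NOT_FOUND = "Not found in provided documents."
CITATION_RE = re.compile(r"\[[^\[\]\s]+#\d+\]")  # matches tags like [doc_id#3]


def build_prompt(question, context):
    """Combine system text, rules, retrieved context, and the question into one prompt."""
    return (
        f"{SYSTEM_PROMPT}\n"
        "Rules:\n"
        f"- If the answer is not in the context, reply exactly: {NOT_FOUND}\n"
        "- Answer in bullet points and cite sources like [doc_id#chunk_index].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer_with_rag(question, context, generate):
    """Return one answer record, including the citations found in the answer text."""
    if not context.strip():
        # Nothing was retrieved, so there is nothing to ground an answer on.
        return {"question": question, "answer": NOT_FOUND, "citations": []}
    answer = generate(build_prompt(question, context))
    citations = sorted(set(CITATION_RE.findall(answer)))
    return {"question": question, "answer": answer, "citations": citations}
```

Extracting citations from the answer text makes the later citation_correct check easier, since each tag can be compared against the chunks that were actually in the context.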
✅ Deliverable: working RAG pipeline producing answers with citations.
B4) Compare RAG vs LLM-only
Goal: Demonstrate improvement + reduced hallucination.
Step B4.1 — Run same 15 questions through RAG
Save results.
Step B4.2 — Score outputs
Add:
grounded (Y/N)
citation_correct (Y/N)
hallucinated (Y/N)
Step B4.3 — Write comparison summary
Compute:
hallucination rate drop
answer accuracy improvement on answerable questions
honesty improvement on unanswerable questions
✅ Deliverable: reports/rag_vs_baseline.md with a table.
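The three comparison numbers and the report table could be computed as sketched below. It assumes each scored row carries answerable (boolean) plus hallucinated and correct (Y/N). The definition of honesty used here, the share of unanswerable questions with hallucinated = N, is an assumption; adjust it if your scoring rubric differs.

```python
def rate(rows, column, value="Y"):
    """Fraction of rows whose column equals value (0.0 for an empty list)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) == value) / len(rows)


def metrics(rows):
    answerable = [r for r in rows if r["answerable"]]
    unanswerable = [r for r in rows if not r["answerable"]]
    return {
        "hallucination_rate": rate(rows, "hallucinated"),
        "accuracy_answerable": rate(answerable, "correct"),
        # Assumed definition: honest = did not hallucinate on an unanswerable question.
        "honesty_unanswerable": rate(unanswerable, "hallucinated", value="N"),
    }


def compare(baseline_rows, rag_rows):
    """Return baseline, RAG, and delta (RAG minus baseline) for each metric."""
    b, r = metrics(baseline_rows), metrics(rag_rows)
    return {name: {"baseline": b[name], "rag": r[name], "delta": r[name] - b[name]}
            for name in b}


def to_markdown_table(comparison):
    lines = ["| Metric | Baseline | RAG | Delta |", "| --- | --- | --- | --- |"]
    for name, v in comparison.items():
        lines.append(f"| {name} | {v['baseline']:.0%} | {v['rag']:.0%} | {v['delta']:+.0%} |")
    return "\n".join(lines)
```

A negative delta on hallucination_rate is the "hallucination rate drop"; positive deltas on the other two rows are improvements.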
🧪 Part C — Advanced RAG Techniques (6.3)
C1) Hybrid Search (Keyword + Vector)
Goal: Improve retrieval for exact terms and rare keywords.
Step C1.1 — Build BM25 index
Use rank-bm25 over chunk texts.
Step C1.2 — Retrieve with both methods
For each query:
vector top_k = 8
bm25 top_k = 8
Step C1.3 — Merge results
Combine and deduplicate:
Normalize scores
Weighted merge (example: 0.6 vector, 0.4 bm25)
✅ Deliverable: outputs/hybrid_retrieval.json and a short before/after example.
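One way to implement the merge from Step C1.3 is sketched below. Min-max normalization is an assumed choice (the lab only says "normalize scores"), and a chunk returned by only one retriever is given 0 for the other. The BM25 helper uses simple lowercase whitespace tokenization, which is also an assumption; its import is lazy so the merge logic runs without rank-bm25 installed.

```python
def bm25_top_k(query, chunk_ids, chunk_texts, top_k=8):
    """Score every chunk with BM25 and return {chunk_id: score} for the best top_k."""
    from rank_bm25 import BM25Okapi

    bm25 = BM25Okapi([text.lower().split() for text in chunk_texts])
    scores = bm25.get_scores(query.lower().split())
    ranked = sorted(zip(chunk_ids, scores), key=lambda kv: kv[1], reverse=True)[:top_k]
    return {chunk_id: float(score) for chunk_id, score in ranked}


def min_max_normalize(scores):
    """Rescale a {chunk_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {chunk_id: 1.0 for chunk_id in scores}  # all scores equal
    return {chunk_id: (s - lo) / (hi - lo) for chunk_id, s in scores.items()}


def hybrid_merge(vector_scores, bm25_scores, w_vector=0.6, w_bm25=0.4, top_k=8):
    """Weighted merge of two {chunk_id: score} mappings; higher scores must mean better."""
    v = min_max_normalize(vector_scores)
    b = min_max_normalize(bm25_scores)
    # The union of ids deduplicates chunks that both retrievers returned.
    merged = {
        chunk_id: w_vector * v.get(chunk_id, 0.0) + w_bm25 * b.get(chunk_id, 0.0)
        for chunk_id in set(v) | set(b)
    }
    # Sort by merged score, breaking ties by id so the output is deterministic.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

If the vector retriever reports distances rather than similarities, convert them to "higher is better" scores before calling hybrid_merge.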
C2) Re-ranking Strategies
Goal: Improve top results before generation.
Option A (No LLM): Cross-encoder re-ranker
Use:
cross-encoder/ms-marco-MiniLM-L-6-v2
Step C2.1 — Take top 12 candidates from hybrid
Step C2.2 — Re-rank to top 4 based on relevance score
Save:
before ranks vs after ranks
✅ Deliverable: reports/reranking_effect.md (show one query where it improves relevance).
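Steps C2.1 and C2.2 can be sketched as follows. The rerank function takes the scorer as a parameter (score_pairs, a hypothetical name) so it can be exercised without downloading the model; make_cross_encoder shows how the real cross-encoder could be plugged in. Candidates are assumed to arrive in hybrid-rank order, each with a chunk_text field.

```python
def make_cross_encoder(model_name="cross-encoder/ms-marco-MiniLM-L-6-v2"):
    """Return a function that scores a list of (query, passage) pairs."""
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)
    return model.predict


def rerank(query, candidates, score_pairs, top_n=4):
    """Re-order candidates by relevance score and record before/after ranks."""
    scores = score_pairs([(query, c["chunk_text"]) for c in candidates])
    rows = []
    for before_rank, (cand, score) in enumerate(zip(candidates, scores), start=1):
        rows.append({**cand, "before_rank": before_rank, "rerank_score": float(score)})
    rows.sort(key=lambda r: r["rerank_score"], reverse=True)
    for after_rank, row in enumerate(rows, start=1):
        row["after_rank"] = after_rank
    return rows[:top_n]
```

Typical use: rerank(query, hybrid_candidates[:12], make_cross_encoder(), top_n=4). The before_rank and after_rank fields give the comparison needed for the report.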
C3) Multi-Document Reasoning
Goal: Answer questions requiring combining multiple sources.
Step C3.1 — Create multi-doc questions (5)
Example types:
Compare two policies from different docs
Summarize what changed between the Postmortem and the Runbook
“What are the steps AND the exception rules?”
Step C3.2 — Retrieval strategy
Retrieve top_k=8
Ensure diversity:
max 2 chunks per doc
prefer covering multiple doc_ids
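The diversity rule can be sketched as a greedy filter over scored hits. It assumes each hit is a dict with doc_id and a score where higher is better; select_diverse is a hypothetical helper name.

```python
from collections import Counter


def select_diverse(hits, top_k=8, max_per_doc=2):
    """Keep the best-scoring hits while capping how many come from any one document."""
    per_doc = Counter()
    selected = []
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        if per_doc[hit["doc_id"]] >= max_per_doc:
            continue  # this document already contributed its share
        selected.append(hit)
        per_doc[hit["doc_id"]] += 1
        if len(selected) == top_k:
            break
    return selected
```

Because the cap discards some hits, it helps to retrieve more candidates than top_k (for example 2 to 3 times as many) before filtering, so the final list can still reach 8 chunks across several doc_ids.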
Step C3.3 — Generate answer with synthesis prompt
Prompt rules:
Must cite at least 2 distinct sources
Provide “combined explanation” section
Provide “source breakdown” section
✅ Deliverable: outputs/multidoc_answers.json + 2 strong examples.
🧪 Part D — Context Window Management (Production Practice)
D1) Token Budget Stress Test
Goal: Show what happens when context is too large.
Step D1.1 — Try with large top_k (e.g., 15)
Step D1.2 — Compare strategies
naive: concatenate all
smart: token-budget selection + diversity
summary: summarize chunks then insert
✅ Deliverable: reports/context_management.md with 3 outputs and conclusions.
✅ Final Submission Checklist (Section 6 Lab)
Students submit:
LLM-only baseline answers + hallucination scoring
Chunking/ingestion outputs with metadata
Vector store index (Chroma) + retrieval function
Full RAG pipeline answers with citations
RAG vs baseline comparison report
Hybrid search implementation + example improvement
Re-ranking report (before/after ranks)
Multi-document reasoning answers with multi-source citations
Context window management stress test report