Hands-on Lab 6


✅ Lab Setup (10–15 min)

Step 0 — Create folder

Create a project folder named genai-section6-rag-lab/ with:
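The subfolder list was not preserved here. Below is a minimal sketch that creates the folders this lab refers to later (data/docs/, data/queries/, src/, indexes/chroma/, outputs/, reports/). It assumes that is the full layout, and make_layout is a hypothetical helper name.

```python
from pathlib import Path

# Subfolders referenced later in this lab (assumed to be the full layout).
SUBDIRS = [
    "data/docs",
    "data/queries",
    "src",
    "indexes/chroma",
    "outputs",
    "reports",
]


def make_layout(root="genai-section6-rag-lab"):
    """Create the project root and its subfolders; safe to re-run."""
    created = []
    for sub in SUBDIRS:
        path = Path(root) / sub
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


if __name__ == "__main__":
    for p in make_layout():
        print(p)
```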

Step 1 — Install dependencies

pip install numpy pandas scikit-learn matplotlib
pip install sentence-transformers
pip install chromadb faiss-cpu
pip install rank-bm25 nltk
pip install transformers torch sentencepiece
pip install pydantic python-dotenv tqdm

If faiss-cpu fails to install, skip it and use Chroma only.

Step 2 — Download NLTK tokenizer

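python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"

Recent NLTK releases (3.9+) load the punkt_tab resource instead of punkt; downloading both covers older and newer versions.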

🧪 Part A — Why RAG is Needed (6.1)

A1) Hallucination Baseline (No Retrieval)

Goal: Show that LLM-only answers (no retrieval) hallucinate on facts the model has never seen.

Step A1.1 — Create a private knowledge base

Add 6–10 documents to data/docs/ as plain .txt files:

Each document should be 500–1,500 words.

Step A1.2 — Create evaluation questions

Create data/queries/questions.json with 15 questions:

Each record:
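The record fields are not listed here. Below is a sketch of one possible schema, assuming hypothetical fields id, question, answer, and source_doc, with a stdlib loader that fails loudly when a record has missing or unexpected keys.

```python
import json
from dataclasses import dataclass


@dataclass
class Question:
    # Hypothetical schema: rename fields to match your own questions.json.
    id: str
    question: str
    answer: str      # reference answer taken from your private docs
    source_doc: str  # file in data/docs/ that contains the answer


def load_questions(path="data/queries/questions.json"):
    with open(path, encoding="utf-8") as f:
        records = json.load(f)  # expects a JSON list of objects
    # Question(**r) raises TypeError on missing or unexpected keys.
    return [Question(**r) for r in records]
```

Under this assumed schema, one record looks like {"id": "q01", "question": "...", "answer": "...", "source_doc": "..."}.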

Step A1.3 — Run “LLM-only” answering

Use a small instruction-tuned model for generation:

Prompt:

Save outputs to:
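The model name, prompt text, and output path are not given above. Below is a minimal sketch assuming a Hugging Face transformers text-generation pipeline. The model Qwen/Qwen2.5-0.5B-Instruct, the prompt wording, and outputs/llm_only_answers.json are placeholder choices, not the lab's required ones.

```python
import json

# Hypothetical baseline prompt: the question alone, with no retrieved context.
BASELINE_PROMPT = (
    "Answer the question concisely.\n"
    "Question: {question}\n"
    "Answer:"
)


def build_baseline_prompt(question):
    return BASELINE_PROMPT.format(question=question)


def answer_llm_only(questions, model_name="Qwen/Qwen2.5-0.5B-Instruct",
                    out_path="outputs/llm_only_answers.json"):
    """questions: list of dicts with 'id' and 'question' keys (assumed schema)."""
    from transformers import pipeline  # heavy dependency, imported lazily

    generator = pipeline("text-generation", model=model_name)
    results = []
    for q in questions:
        out = generator(build_baseline_prompt(q["question"]),
                        max_new_tokens=128, do_sample=False,
                        return_full_text=False)  # return only the generated text
        results.append({"id": q["id"], "question": q["question"],
                        "answer": out[0]["generated_text"].strip()})
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2, ensure_ascii=False)
    return results
```

Greedy decoding (do_sample=False) keeps the baseline reproducible, so later RAG runs are compared against the same answers.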

Step A1.4 — Score hallucinations manually (fast)

Add columns:

Deliverable: reports/hallucination_baseline.md
Include counts and 2–3 examples of hallucination.
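The scoring columns are not listed. Below is a sketch for turning manual labels into the counts this report needs. It assumes a hypothetical label column with the values correct, hallucinated, and abstained.

```python
from collections import Counter

# Hypothetical manual labels; use whatever categories your rubric defines.
LABELS = ("correct", "hallucinated", "abstained")


def summarize_labels(rows):
    """rows: list of dicts with a manually filled 'label' key."""
    counts = Counter(r["label"] for r in rows)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"Unexpected labels: {unknown}")
    return {label: counts[label] for label in LABELS}


def to_markdown(counts, total):
    # Small Markdown table for reports/hallucination_baseline.md.
    lines = ["| Label | Count | Share |", "|---|---|---|"]
    for label, n in counts.items():
        lines.append(f"| {label} | {n} | {n / total:.0%} |")
    return "\n".join(lines)
```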

🧪 Part B — RAG Architecture (6.2)

B1) Document Ingestion Pipeline

Goal: Turn documents into chunks with metadata.

Step B1.1 — Build the ingestion script src/ingest.py

Read all data/docs/*.txt and produce chunk records with:

Step B1.2 — Implement chunking strategy

Use token-aware chunking:

Save:

Deliverable: chunk file + chunk count per document.
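The record fields and chunking parameters are not specified above. Below is a minimal sketch of src/ingest.py that packs whole sentences under a token budget with one sentence of overlap. The field names, the chunk ID scheme <doc>-NNN, max_tokens=200, and outputs/chunks.jsonl are all hypothetical, and the whitespace counter only stands in for a real tokenizer.

```python
import json
import re
from pathlib import Path


def count_tokens_ws(text):
    # Stand-in token counter (whitespace-separated words).
    return len(text.split())


def chunk_text(text, max_tokens=200, overlap_sentences=1, count_tokens=count_tokens_ws):
    """Pack whole sentences into chunks of at most max_tokens tokens, starting each
    new chunk with the last `overlap_sentences` sentences of the previous one.
    A single sentence longer than max_tokens becomes its own oversized chunk."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sent in sentences:
        if current and count_tokens(" ".join(current + [sent])) > max_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
            # Drop the overlap if it would push this sentence over budget.
            if current and count_tokens(" ".join(current + [sent])) > max_tokens:
                current = []
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks


def ingest(docs_dir="data/docs", out_path="outputs/chunks.jsonl",
           max_tokens=200, overlap_sentences=1, count_tokens=count_tokens_ws):
    """Write one JSON line per chunk; return the chunk count per document."""
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    counts = {}
    with out.open("w", encoding="utf-8") as f:
        for doc in sorted(Path(docs_dir).glob("*.txt")):
            chunks = chunk_text(doc.read_text(encoding="utf-8"), max_tokens,
                                overlap_sentences, count_tokens)
            for i, chunk in enumerate(chunks):
                record = {
                    "chunk_id": f"{doc.stem}-{i:03d}",  # hypothetical ID scheme
                    "doc_id": doc.stem,
                    "source": doc.name,
                    "chunk_index": i,
                    "n_tokens": count_tokens(chunk),
                    "text": chunk,
                }
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
            counts[doc.name] = len(chunks)
    return counts


if __name__ == "__main__":
    print(ingest())  # chunk count per document
```

For truly token-aware sizing, pass the embedding model's tokenizer, for example count_tokens=lambda t: len(tokenizer.tokenize(t)) with a Hugging Face AutoTokenizer.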

B2) Embeddings + Vector Store (Retriever)

Goal: Create a searchable index.

Step B2.1 — Generate embeddings for chunks

Use:

Save:
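The embedding model and output file are not named. Below is a sketch assuming sentence-transformers with the all-MiniLM-L6-v2 model (an assumed choice) and a hypothetical outputs/embeddings.json holding IDs plus vectors. The encoder is injectable, so the function can be exercised without downloading a model.

```python
import json


def embed_chunks(chunks, encode=None,
                 model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Return one embedding (list of floats) per chunk, in chunk order.

    encode maps a list of strings to a list of vectors. By default it wraps a
    SentenceTransformer model (assumed choice; any embedding model works).
    """
    if encode is None:
        from sentence_transformers import SentenceTransformer  # heavy; import lazily
        model = SentenceTransformer(model_name)

        def encode(texts):
            # normalize_embeddings=True makes dot product equal cosine similarity
            return model.encode(texts, batch_size=32,
                                normalize_embeddings=True).tolist()

    vectors = encode([c["text"] for c in chunks])
    if len(vectors) != len(chunks):
        raise ValueError("encoder returned a different number of vectors")
    return vectors


def save_embeddings(chunks, vectors, path="outputs/embeddings.json"):
    # Store chunk IDs next to the vectors so rows can be matched back to chunks.
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"ids": [c["chunk_id"] for c in chunks], "vectors": vectors}, f)
```

Use the same encode function for queries in the retriever; mixing embedding models between indexing and search makes the similarity scores meaningless.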

Step B2.2 — Store in Chroma

Create a Chroma collection:

Insert:

Deliverable: persisted Chroma DB in indexes/chroma/.
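The collection settings are not specified. Below is a sketch using chromadb.PersistentClient. It assumes the hypothetical collection name lab6_chunks, cosine distance, and chunk records shaped like {chunk_id, text, ...}. It uses upsert instead of add so that re-running the script overwrites existing IDs instead of duplicating them.

```python
def to_chroma_metadata(record):
    # Chroma metadata values must be str, int, float, or bool, so keep only
    # scalar fields and drop the text (it is stored separately as the document).
    return {k: v for k, v in record.items()
            if k != "text" and isinstance(v, (str, int, float, bool))}


def build_chroma_index(chunks, vectors, path="indexes/chroma", name="lab6_chunks"):
    import chromadb  # imported lazily

    client = chromadb.PersistentClient(path=path)  # persists the DB under `path`
    collection = client.get_or_create_collection(
        name=name, metadata={"hnsw:space": "cosine"})  # cosine distance
    collection.upsert(
        ids=[c["chunk_id"] for c in chunks],
        embeddings=vectors,
        documents=[c["text"] for c in chunks],
        metadatas=[to_chroma_metadata(c) for c in chunks],
    )
    return collection
```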

B3) Retriever → Generator Flow

Goal: Implement end-to-end RAG.

Step B3.1 — Write the retriever function in src/retrieve.py

Inputs:

Steps:

  1. embed query

  2. vector search

  3. return top_k chunks + scores
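The inputs are not listed. Below is a sketch of the three steps. It assumes the inputs are a query string, the Chroma collection, the same encode function used for the chunks, and top_k (default 4 here, a hypothetical value). With cosine space, Chroma returns distances, which the sketch converts to similarity scores.

```python
def format_query_result(raw):
    """Flatten a Chroma query() result for a single query into a ranked list.

    Chroma returns lists of lists (one inner list per query); with cosine space
    the distance is 1 - cosine similarity, so score = 1 - distance.
    """
    return [
        {"chunk_id": cid, "text": doc, "metadata": meta, "score": 1.0 - dist}
        for cid, doc, meta, dist in zip(raw["ids"][0], raw["documents"][0],
                                        raw["metadatas"][0], raw["distances"][0])
    ]


def retrieve(query, collection, encode, top_k=4):
    """encode: the function used to embed chunks (list[str] -> list[vector])."""
    query_vec = encode([query])[0]                          # 1. embed query
    raw = collection.query(query_embeddings=[query_vec],    # 2. vector search
                           n_results=top_k,
                           include=["documents", "metadatas", "distances"])
    return format_query_result(raw)                         # 3. top_k chunks + scores
```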

Step B3.2 — Build a context window manager

Implement a “context builder”:
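The builder's rules are not spelled out. Below is one possible sketch. It assumes a hypothetical 1,500-token budget, a whitespace token counter as a stand-in, and citation tags of the form [chunk_id].

```python
def build_context(hits, max_tokens=1500, count_tokens=lambda t: len(t.split())):
    """Add ranked chunks in order until the token budget is used up.

    Each chunk is prefixed with a citation tag like [chunk_id] so the generator
    can cite it. Returns the context string and the IDs actually included.
    """
    parts, used, cited = [], 0, []
    for hit in hits:
        block = f"[{hit['chunk_id']}] {hit['text']}"
        cost = count_tokens(block)
        if used + cost > max_tokens:
            break  # stop at the first chunk that does not fit; never cut one mid-text
        parts.append(block)
        used += cost
        cited.append(hit["chunk_id"])
    return "\n\n".join(parts), cited
```

Stopping at the first chunk that overflows keeps the context in rank order; a variant could skip it and try smaller, lower-ranked chunks instead.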

Step B3.3 — Write the generator in src/generate.py

Prompt template:

Rules:

Save outputs:

Deliverable: working RAG pipeline producing answers with citations.
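The prompt template and rules did not survive. Below is a hypothetical template that asks for answers grounded only in the context, bracketed chunk-ID citations, and an explicit "I don't know" when the context lacks the answer. It also includes a check that flags citations to chunks that were never retrieved, assuming chunk IDs shaped like <doc>-NNN.

```python
import re

# Hypothetical RAG prompt; the rules are an assumption, not the lab's exact text.
RAG_PROMPT = """You are a helpful assistant. Answer using ONLY the context below.
Cite every fact with its chunk ID in square brackets, e.g. [doc-000].
If the context does not contain the answer, reply exactly: I don't know.

Context:
{context}

Question: {question}
Answer:"""


def build_rag_prompt(question, context):
    return RAG_PROMPT.format(context=context, question=question)


def extract_citations(answer):
    # Bracketed chunk IDs such as [atlas-003] in the model's answer.
    return re.findall(r"\[([\w.-]+-\d{3})\]", answer)


def check_citations(answer, allowed_ids):
    """Split cited IDs into ones present in the context and invented ones."""
    cited = extract_citations(answer)
    valid = [c for c in cited if c in allowed_ids]
    invented = [c for c in cited if c not in allowed_ids]
    return valid, invented
```

An answer with a non-empty invented list is citing a chunk that was never in the context, which is a useful hallucination signal when scoring Part B4.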

B4) Compare RAG vs LLM-only

Goal: Demonstrate higher accuracy and less hallucination than the LLM-only baseline.

Step B4.1 — Run the same 15 questions through RAG

Save results.

Step B4.2 — Score outputs

Add:

Step B4.3 — Write comparison summary

Compute:

Deliverable: reports/rag_vs_baseline.md with a table.
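The metrics to compute are not listed. Below is a sketch that assumes the same three manual labels for both runs (correct, hallucinated, abstained) and reports accuracy, hallucination rate, and abstention rate side by side.

```python
def rates(labels):
    """labels: list of manual labels ('correct', 'hallucinated', 'abstained')."""
    n = len(labels)
    return {
        "accuracy": labels.count("correct") / n,
        "hallucination_rate": labels.count("hallucinated") / n,
        "abstention_rate": labels.count("abstained") / n,
    }


def comparison_table(baseline_labels, rag_labels):
    # Markdown table for reports/rag_vs_baseline.md; Change is RAG minus baseline.
    base, rag = rates(baseline_labels), rates(rag_labels)
    lines = ["| Metric | LLM-only | RAG | Change |", "|---|---|---|---|"]
    for metric in base:
        delta = rag[metric] - base[metric]
        lines.append(f"| {metric} | {base[metric]:.0%} | {rag[metric]:.0%} | {delta:+.0%} |")
    return "\n".join(lines)
```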

🧪 Part C — Advanced RAG Techniques (6.3)

C1) Hybrid Search (Keyword + Vector)

Goal: Improve retrieval for exact terms and rare keywords.

Step C1.1 — Build BM25 index

Use rank-bm25 over chunk texts.
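Below is a minimal sketch of a BM25 keyword retriever built on rank_bm25.BM25Okapi. It assumes chunk dicts with chunk_id and text keys and uses a simple regex tokenizer (an assumption; nltk.word_tokenize also works once punkt is downloaded). The class name BM25Retriever is hypothetical.

```python
import re


def tokenize(text):
    # Lowercased word tokens; the same tokenizer must be used for chunks and queries.
    return re.findall(r"\w+", text.lower())


class BM25Retriever:
    """Keyword retriever over chunk texts using rank_bm25.BM25Okapi."""

    def __init__(self, chunks):
        from rank_bm25 import BM25Okapi  # imported lazily
        self.chunks = chunks
        self.bm25 = BM25Okapi([tokenize(c["text"]) for c in chunks])

    def search(self, query, top_k=12):
        scores = self.bm25.get_scores(tokenize(query))  # one score per chunk
        ranked = sorted(range(len(self.chunks)), key=lambda i: scores[i], reverse=True)
        return [{"chunk_id": self.chunks[i]["chunk_id"], "score": float(scores[i])}
                for i in ranked[:top_k]]
```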

Step C1.2 — Retrieve with both methods

For each query:

Step C1.3 — Merge results

Combine and deduplicate:
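The merge rule is not specified. One common choice, swapped in here as an assumption, is Reciprocal Rank Fusion (RRF): each chunk scores the sum of 1/(k + rank) over the lists it appears in. This merges and deduplicates in one step without putting BM25 and cosine scores on the same scale. k=60 is a commonly used default.

```python
def reciprocal_rank_fusion(*rankings, k=60, top_n=None):
    """Merge ranked lists of chunk IDs with Reciprocal Rank Fusion (RRF).

    Each ID scores sum(1 / (k + rank)) over the lists it appears in (rank is
    1-based); an ID found by both retrievers collapses into one entry.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    merged = sorted(scores, key=scores.get, reverse=True)
    return merged[:top_n] if top_n else merged
```

For example, fusing a vector ranking [a, b, c] with a BM25 ranking [c, a, d] yields [a, c, b, d]. Because a and c appear in both lists, they rise above b and d.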

Deliverable: outputs/hybrid_retrieval.json and a short before/after example.

C2) Re-ranking Strategies

Goal: Improve top results before generation.

Option A (No LLM): Cross-encoder re-ranker

Use:

Step C2.1 — Take the top 12 candidates from hybrid search

Step C2.2 — Re-rank them and keep the top 4 by relevance score

Save:

Deliverable: reports/reranking_effect.md (show one query where it improves relevance).
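The re-ranker model is not named. Below is a sketch assuming sentence-transformers' CrossEncoder with cross-encoder/ms-marco-MiniLM-L-6-v2 (an assumed choice). Each result keeps its pre-rerank position in old_rank, so the before/after ranks for the report come straight from the output.

```python
def rerank(query, candidates, score_fn=None, top_n=4,
           model_name="cross-encoder/ms-marco-MiniLM-L-6-v2"):
    """Re-rank candidate chunks with a cross-encoder and keep the top_n.

    candidates: ranked list of dicts with 'chunk_id' and 'text' (e.g. top 12 from hybrid).
    score_fn: maps a list of (query, passage) pairs to relevance scores; by
    default it wraps sentence_transformers.CrossEncoder (assumed model choice).
    """
    if score_fn is None:
        from sentence_transformers import CrossEncoder  # imported lazily
        model = CrossEncoder(model_name)

        def score_fn(pairs):
            return model.predict(pairs).tolist()

    scores = score_fn([(query, c["text"]) for c in candidates])
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [dict(candidates[i], rerank_score=float(scores[i]), old_rank=i + 1)
            for i in order[:top_n]]
```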

C3) Multi-Document Reasoning

Goal: Answer questions requiring combining multiple sources.

Step C3.1 — Create 5 multi-document questions

Example types:

Step C3.2 — Retrieval strategy
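The strategy itself is not described. One option, assumed here, is to over-retrieve and then cap how many chunks any single document contributes, so the context spans several sources. The values top_k=6 and max_per_doc=2 are hypothetical.

```python
def diversify_by_document(hits, top_k=6, max_per_doc=2):
    """Pick up to top_k hits from a ranked list while capping chunks per document,
    so a multi-document question is not answered from a single source."""
    selected, per_doc = [], {}
    for hit in hits:
        doc = hit["doc_id"]
        if per_doc.get(doc, 0) < max_per_doc:
            selected.append(hit)
            per_doc[doc] = per_doc.get(doc, 0) + 1
            if len(selected) == top_k:
                break
    return selected
```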

Step C3.3 — Generate answer with synthesis prompt

Prompt rules:

Deliverable: outputs/multidoc_answers.json + 2 strong examples.
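The prompt rules are not listed. Below is a hypothetical synthesis prompt plus a check that an answer actually cites at least two distinct documents, assuming chunk IDs shaped like <doc>-NNN.

```python
import re

# Hypothetical synthesis prompt; the rules are an assumption, not the lab's exact text.
SYNTHESIS_PROMPT = """Answer the question by combining facts from the context below.
Rules:
- Use only the context; cite each fact with its chunk ID, e.g. [doc-000].
- Draw on every document the question needs, not just the first relevant one.
- If sources disagree, say so and cite both.
- If the context is insufficient, say what is missing.

Context:
{context}

Question: {question}
Answer:"""


def cited_documents(answer):
    """Distinct document IDs cited, assuming chunk IDs of the form <doc>-NNN."""
    return sorted(set(re.findall(r"\[([\w.-]+)-\d{3}\]", answer)))


def is_multi_source(answer, min_docs=2):
    return len(cited_documents(answer)) >= min_docs
```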

🧪 Part D — Context Window Management (Production Practice)

D1) Token Budget Stress Test

Goal: Show what happens when context is too large.

Step D1.1 — Try a large top_k (e.g., 15)

Step D1.2 — Compare strategies
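The strategies to compare are not listed. Below is a sketch of three plausible ones: keep all chunks, keep the top n, and keep chunks until a token budget is full. It assumes a hypothetical 1,500-token budget and a whitespace counter standing in for the generator's tokenizer. The strategy names are hypothetical.

```python
def count_tokens(text):
    # Stand-in: whitespace words. Swap in the generator's tokenizer for real counts.
    return len(text.split())


def apply_strategy(hits, strategy, budget=1500, top_n=4):
    """Return the hits kept under a context strategy:
    'all'    keep every retrieved chunk (may overflow the budget)
    'top_n'  keep only the top_n ranked chunks
    'budget' keep ranked chunks while the running total fits the budget
    """
    if strategy == "all":
        return list(hits)
    if strategy == "top_n":
        return list(hits[:top_n])
    if strategy == "budget":
        kept, used = [], 0
        for h in hits:
            cost = count_tokens(h["text"])
            if used + cost > budget:
                break
            kept.append(h)
            used += cost
        return kept
    raise ValueError(f"unknown strategy: {strategy}")


def stress_report(hits, budget=1500, top_n=4):
    # One row per strategy: (name, chunks kept, context tokens, fits budget?)
    rows = []
    for strategy in ("all", "top_n", "budget"):
        kept = apply_strategy(hits, strategy, budget, top_n)
        tokens = sum(count_tokens(h["text"]) for h in kept)
        rows.append((strategy, len(kept), tokens, tokens <= budget))
    return rows
```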

Deliverable: reports/context_management.md with 3 outputs and conclusions.

✅ Final Submission Checklist (Section 6 Lab)

Students submit:

  1. LLM-only baseline answers + hallucination scoring

  2. Chunking/ingestion outputs with metadata

  3. Vector store index (Chroma) + retrieval function

  4. Full RAG pipeline answers with citations

  5. RAG vs baseline comparison report

  6. Hybrid search implementation + example improvement

  7. Re-ranking report (before/after ranks)

  8. Multi-document reasoning answers with multi-source citations

  9. Context window management stress test report