Hands-on Lab 2

✅ Lab Setup (5–10 min)

Step 0 — Create folder

Create: genai-section2-lab/ with:

data/
outputs/
lab_section2.ipynb (recommended)

Step 1 — Install packages

pip install torch transformers tokenizers datasets accelerate sentencepiece matplotlib numpy pandas scikit-learn

Step 2 — Verify GPU (optional)

python -c "import torch; print('CUDA:', torch.cuda.is_available())"

🧪 Part A — Anatomy of Transformers (2.1)

A1) Visualize Self-Attention on a Sentence

Goal: Extract real attention weights from a transformer and visualize which tokens attend to which.

Step A1.1 — Load a small model

Use: bert-base-uncased (encoder) OR distilbert-base-uncased (lighter).

Step A1.2 — Prepare a sentence

Example:

“The mechanic inspected the engine because it was noisy.”

Step A1.3 — Run model with `output_attentions=True`

Tokenize sentence
Forward pass
Collect attention tensors: one (batch, heads, seq, seq) tensor per layer; stacked for a single sentence they form (layers, heads, seq, seq)

Step A1.4 — Plot one attention head as heatmap

Pick: layer 1, head 0
Plot token-to-token attention map
Save image to outputs/attention_heatmap.png
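Steps A1.3 and A1.4 could be sketched roughly as below. This is a minimal sketch, not a reference solution: the function names `extract_attentions`, `head_matrix`, and `plot_head` are made up for illustration, and passing `attn_implementation="eager"` assumes a recent transformers release in which the eager attention path is the one that returns attention weights.

```python
import numpy as np


def extract_attentions(sentence, model_name="distilbert-base-uncased"):
    """Return (tokens, attn) where attn has shape (layers, heads, seq, seq)."""
    # Heavy imports stay local so the helpers below work without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Assumption: eager attention is requested so the weights are returned.
    model = AutoModel.from_pretrained(model_name, attn_implementation="eager")
    model.eval()

    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    # outputs.attentions is a tuple: one (batch, heads, seq, seq) tensor per layer.
    attn = torch.stack(outputs.attentions).squeeze(1).numpy()
    return tokens, attn


def head_matrix(attn, layer, head):
    """Select one head's (seq, seq) map; every row should sum to about 1."""
    matrix = np.asarray(attn)[layer, head]
    if not np.allclose(matrix.sum(axis=-1), 1.0, atol=1e-4):
        raise ValueError("rows do not sum to 1; is this an attention map?")
    return matrix


def plot_head(tokens, matrix, path="outputs/attention_heatmap.png"):
    """Save a token-to-token heatmap (row token attends to column token)."""
    import matplotlib
    matplotlib.use("Agg")  # write to file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 5))
    image = ax.imshow(matrix, cmap="viridis")
    ax.set_xticks(range(len(tokens)))
    ax.set_xticklabels(tokens, rotation=90)
    ax.set_yticks(range(len(tokens)))
    ax.set_yticklabels(tokens)
    fig.colorbar(image, ax=ax)
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

Typical use: `tokens, attn = extract_attentions(sentence)`, then `plot_head(tokens, head_matrix(attn, 1, 0))`. The outputs/ folder from Step 0 must already exist.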

✅ Deliverable: 1 heatmap + 3 bullet observations (e.g., which token attends to “engine” / “noisy”).

A2) Understand Positional Encoding (By Breaking It)

Goal: Prove positions matter by scrambling token order and observing output changes.

Step A2.1 — Create two inputs

Original: “The cat sat on the mat”
Scrambled: “Mat the on sat cat the”

Step A2.2 — Compare embeddings (encoder model)

Extract last hidden state for both
Compute cosine similarity between:
- Sentence embeddings (mean pooling)
- Token embeddings (the same word in both sentences, e.g. “cat”)
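The comparison in Step A2.2 might look like the sketch below. The helper names `mean_pool`, `cosine`, and `encode` are illustrative assumptions; only `encode` touches the model, so the two math helpers can be checked on toy vectors first.

```python
import numpy as np


def mean_pool(hidden, mask):
    """Average token vectors, ignoring padding (mask is 1 for real tokens)."""
    hidden = np.asarray(hidden, dtype=float)        # (seq, dim)
    mask = np.asarray(mask, dtype=float)[:, None]   # (seq, 1)
    return (hidden * mask).sum(axis=0) / mask.sum()


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def encode(sentence, model_name="bert-base-uncased"):
    """Return (tokens, last_hidden_state, attention_mask) as NumPy arrays."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return tokens, hidden.numpy(), inputs["attention_mask"][0].numpy()
```

For the sentence score, call `encode` on both inputs and pass the pooled vectors to `cosine`. For the token score, look up the index of the same word in each token list and compare those two rows of the hidden states.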

Step A2.3 — Explain outcome

Same words but different order → different representations

✅ Deliverable: similarity scores + short explanation: “Why position changes meaning.”

A3) Encoder vs Decoder Architecture (Hands-On)

Goal: Experience the difference between “understanding” models and “generation” models.

Step A3.1 — Encoder task (BERT): Fill-Mask

Input:

“Transformers are [MASK] at understanding context.”

Use BERT fill-mask pipeline
Record top 5 predictions

Step A3.2 — Decoder task (GPT2): Generate

Prompt:

“Transformers are powerful because”

Use GPT2 text-generation
Generate 3 outputs at temperature 0.7 (with sampling enabled, e.g. do_sample=True)
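Steps A3.1 and A3.2 could be run with the transformers pipelines, roughly as sketched here. The function names are hypothetical, and `max_new_tokens=30` and the fixed seed are arbitrary choices for illustration, not part of the lab.

```python
def run_fill_mask(text, model_name="bert-base-uncased", top_k=5):
    """Return the top_k fill-mask predictions for a text with one [MASK]."""
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_name)
    return unmasker(text, top_k=top_k)


def run_generation(prompt, model_name="gpt2", n=3, temperature=0.7):
    """Sample n continuations of the prompt from a decoder-only model."""
    from transformers import pipeline, set_seed

    set_seed(0)  # arbitrary seed so reruns give the same samples
    generator = pipeline("text-generation", model=model_name)
    outputs = generator(
        prompt,
        do_sample=True,            # temperature only matters when sampling
        temperature=temperature,
        num_return_sequences=n,
        max_new_tokens=30,
    )
    return [item["generated_text"] for item in outputs]


def format_predictions(predictions):
    """Turn fill-mask results into 'token (score)' strings, best first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f'{p["token_str"]} ({p["score"]:.3f})' for p in ranked]
```

`format_predictions(run_fill_mask("Transformers are [MASK] at understanding context."))` gives a list that can be pasted straight into the deliverable.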

Step A3.3 — Compare in writing

Encoder: bidirectional understanding
Decoder: left-to-right generation

✅ Deliverable: a short comparison paragraph + outputs.

🧪 Part B — Tokens, Embeddings & Context Windows (2.2)

B1) Tokenization Strategies: Word vs Subword

Goal: See why subword tokenization exists.

Step B1.1 — Compare 2 tokenizers

Use:

GPT-2 tokenizer (BPE)
BERT tokenizer (WordPiece)

Step B1.2 — Test words that break naïve tokenizers

Try:

“unbelievable”
“internationalization”
“electroencephalography”
“bioinformatics”
A typo word: “mecahnical”

Step B1.3 — Print token lists + token counts

Count tokens per word for each tokenizer
Save results in a table
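One way to build the table for Step B1.3 is sketched below. `tokenization_table` and `load_hf_tokenizers` are hypothetical names; the table builder accepts any callable that maps a string to a list of tokens, so it can be tried with a toy tokenizer before the real ones are downloaded.

```python
def tokenization_table(words, tokenizers):
    """Build one row per (word, tokenizer) with the token list and its length.

    `tokenizers` maps a display name to any callable str -> list of tokens.
    """
    rows = []
    for word in words:
        for name, tokenize in tokenizers.items():
            tokens = tokenize(word)
            rows.append({
                "word": word,
                "tokenizer": name,
                "tokens": tokens,
                "count": len(tokens),
            })
    return rows


def load_hf_tokenizers():
    """Load the two tokenizers named in Step B1.1 (downloads on first use)."""
    from transformers import AutoTokenizer

    return {
        "gpt2 (BPE)": AutoTokenizer.from_pretrained("gpt2").tokenize,
        "bert-base-uncased (WordPiece)":
            AutoTokenizer.from_pretrained("bert-base-uncased").tokenize,
    }
```

The rows are plain dictionaries, so `pandas.DataFrame(rows)` turns them into a table that can be saved under outputs/.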

✅ Deliverable: tokenization table + 2 observations:

“Which tokenizer splits more?”
“How does a typo affect the tokens?”

B2) Embeddings & Semantic Meaning (Mini Semantic Search)

Goal: Prove embeddings cluster meaning.

Step B2.1 — Create 12 sentences (4 topics × 3 sentences)

Example topics:

Aviation maintenance
Finance
Health
Sports

Step B2.2 — Convert each sentence to embeddings

Use one:

sentence-transformers (recommended if allowed; it is not in the Step 1 install list, so add it with pip install sentence-transformers)
OR
mean-pool BERT last hidden states

Step B2.3 — Measure similarity

Compute cosine similarity matrix (12×12)
Identify top-2 most similar sentences for each one
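Step B2.3 reduces to a little linear algebra once the 12 embeddings are stacked into one array. A minimal NumPy sketch, with hypothetical function names:

```python
import numpy as np


def cosine_matrix(embeddings):
    """Pairwise cosine similarity for an (n, dim) array of embeddings."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-length rows
    return x @ x.T


def top_k_neighbors(sim, k=2):
    """Indices of the k most similar other rows for each row of sim."""
    sim = np.array(sim, dtype=float)   # copy so the caller's matrix is kept
    np.fill_diagonal(sim, -np.inf)     # a sentence is not its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]
```

If the embeddings capture topic, the two neighbors of each sentence should mostly come from its own topic group.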

Step B2.4 — Visualize embedding space

Use PCA (2D) or t-SNE
Plot 2D scatter (label points)
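For the 2D projection, the sketch below computes PCA directly with NumPy's SVD. This is a stand-in chosen to keep the example dependency-light; scikit-learn's `PCA(n_components=2).fit_transform(x)` should give the same projection up to the sign of each axis. `pca_2d` and `plot_points` are hypothetical names, and the output filename is an assumption.

```python
import numpy as np


def pca_2d(embeddings):
    """Project (n, dim) embeddings onto their first two principal components."""
    x = np.asarray(embeddings, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def plot_points(coords, labels, path="outputs/embedding_pca.png"):
    """Scatter the 2D points and write each label next to its point."""
    import matplotlib
    matplotlib.use("Agg")  # write to file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(coords[:, 0], coords[:, 1])
    for (x, y), label in zip(coords, labels):
        ax.annotate(label, (x, y))
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
```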

✅ Deliverable: similarity matrix + 2D plot + brief conclusion.

B3) Context Window Limits (Practical Experiment)

Goal: Experience how long context fails and why pruning matters.

Step B3.1 — Create a long prompt

Generate a paragraph repeated until ~1500–2500 tokens (scale this to the model you use; GPT-2, for example, accepts at most 1024 tokens)
Insert a key fact near the start:
- “The secret key is ORANGE-9281.”

Step B3.2 — Ask at the end

“What is the secret key?”
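The prompt for Steps B3.1 and B3.2 can be assembled programmatically. In this sketch `build_long_prompt` is a hypothetical helper, and `count_tokens` is whatever token counter matches your model, for example `lambda s: len(tokenizer(s)["input_ids"])` with a Hugging Face tokenizer.

```python
def build_long_prompt(filler, key_fact, question, target_tokens, count_tokens):
    """Put key_fact first, pad with filler up to target_tokens, then ask.

    `count_tokens` is any callable str -> int.
    """
    if count_tokens(filler) == 0:
        raise ValueError("filler must contain at least one token")

    parts = [key_fact]
    prompt = key_fact + "\n\n" + question
    # Re-count after each repeat; simple rather than fast, fine for a lab.
    while count_tokens(prompt) < target_tokens:
        parts.append(filler)
        prompt = " ".join(parts) + "\n\n" + question
    return prompt
```

Calling it twice with different `target_tokens` values gives the two settings for Step B3.3.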

Step B3.3 — Test with 2 settings

Short max context (or short prompt)
Long prompt near model limit

Observe:

Does it recall?
Does accuracy degrade?

✅ Deliverable: 2 outputs + explanation:

“Why long context reduces reliability”
Mention: attention cost grows quadratically with sequence length.
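The growth in attention cost is easy to put numbers on: standard self-attention computes one score per pair of tokens, per head, per layer. A small sketch, where the defaults of 12 layers and 12 heads match GPT-2 small and BERT-base and the function name is made up:

```python
def attention_score_entries(seq_len, heads=12, layers=12):
    """Attention scores computed in one forward pass: layers * heads * seq_len^2."""
    return layers * heads * seq_len * seq_len


for n in (256, 512, 1024):
    print(n, attention_score_entries(n))
```

Doubling the sequence length multiplies the count by four, which is the point to make in the explanation.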

🧪 Part C — How LLMs Are Trained (2.3)

C1) Pretraining Objectives (Mini Demo)

You’ll simulate what models learn during pretraining.

Option 1: Masked Language Modeling (BERT style)

Goal: Predict missing tokens.

Step C1.1 — Make 10 custom sentences (domain-specific)

Example:

“The aircraft engine requires regular inspection.”
“Hydraulic systems can leak under high pressure.”

Step C1.2 — Mask random tokens

Replace 15% of the tokens with [MASK].

Step C1.3 — Run BERT fill-mask predictions

Evaluate: how often does top-1 match your original?
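Masking and scoring for Option 1 could be sketched as below. This is a simplification: it replaces every selected token with [MASK], whereas actual BERT pretraining replaces only 80% of the selected tokens with [MASK] and leaves the rest random or unchanged. `mask_tokens` and `top1_accuracy` are hypothetical names, and whitespace splitting stands in for real tokenization.

```python
import random


def mask_tokens(tokens, rate=0.15, seed=0, mask_token="[MASK]"):
    """Mask round(rate * n) tokens (at least one); return (masked, positions)."""
    if not tokens:
        raise ValueError("need at least one token to mask")
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    n_mask = max(1, round(rate * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = [mask_token if i in positions else tok
              for i, tok in enumerate(tokens)]
    return masked, positions


def top1_accuracy(predicted, original):
    """Share of masked positions where the top prediction equals the original."""
    if not original:
        return 0.0
    hits = sum(p.strip().lower() == o.strip().lower()
               for p, o in zip(predicted, original))
    return hits / len(original)
```

For short sentences this masks a single token, which keeps the fill-mask call simple; with several masks in one sentence it may be easier to mask one position at a time.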

✅ Deliverable: masked sentence results + basic “accuracy”.

Option 2: Next Token Prediction (GPT style)

Goal: Predict the probability distribution over the next token.

Step C1.1 — Choose prompt

“In aviation maintenance, reliability depends on”

Step C1.2 — Generate with low temperature

temperature 0.1, top_p 0.9 (in transformers, both take effect only when sampling is enabled with do_sample=True)
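To see what these two settings do to a next-token distribution, the sketch below applies them to a toy list of logits. It illustrates the general technique in plain Python and is not the transformers implementation; the function names and the logits are invented for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the peak."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p.

    Returns {token_index: renormalized probability}.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
warm = softmax_with_temperature(toy_logits, 1.0)
cold = softmax_with_temperature(toy_logits, 0.1)
print(top_p_filter(warm, 0.9), top_p_filter(cold, 0.9))
```

At temperature 0.1 almost all probability sits on the top token, so sampling behaves nearly like greedy decoding.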

✅ Deliverable: outputs + note: “This is what pretraining optimizes.”

C2) Fine-Tuning vs Instruction Tuning (Hands-On)

Goal: See the difference between fine-tuning for a task and instruction-following behavior.

Step C2.1 — Create a tiny dataset (20 rows)

Make a CSV with:

instruction
input
output

Example tasks:

Convert text → JSON
Summarize into 1 sentence
Classify into labels
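The CSV for Step C2.1 can be written with the standard library. The two rows below are invented examples of the format, not lab data, and the function names are hypothetical.

```python
import csv

FIELDS = ["instruction", "input", "output"]

# Made-up sample rows; replace them with your own 20 examples.
example_rows = [
    {
        "instruction": "Convert the text to JSON with keys part and status.",
        "input": "The fuel pump is worn.",
        "output": '{"part": "fuel pump", "status": "worn"}',
    },
    {
        "instruction": "Classify the text as maintenance or finance.",
        "input": "Hydraulic systems can leak under high pressure.",
        "output": "maintenance",
    },
]


def write_dataset(rows, path):
    """Write rows to a CSV file with the three expected columns."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def read_dataset(path):
    """Read the CSV back as a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

The csv module quotes the JSON output column automatically, so commas and double quotes inside it survive the round trip.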

Step C2.2 — Run inference BEFORE tuning

Feed 5 examples to base model
Record outputs (often inconsistent)

Step C2.3 — Do a lightweight instruction tuning (conceptual + optional)

If you want a runnable approach without heavy compute:

Use a small model like google/flan-t5-small (already instruction-following)
Compare against a base model that is not instruction tuned (like raw GPT2)
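The comparison in Step C2.3 could be organized as in the sketch below. All function names are hypothetical. `compare_models` accepts any callable that maps a prompt to text, and the JSON check is only one possible signal for formatting consistency, useful for the text-to-JSON task. `max_new_tokens=64` is an arbitrary choice.

```python
import json


def build_prompt(row):
    """Join the instruction and input columns into one prompt string."""
    return f'{row["instruction"]}\n\n{row["input"]}'.strip()


def is_valid_json(text):
    """True if the model output parses as JSON."""
    try:
        json.loads(text)
        return True
    except (TypeError, ValueError):
        return False


def compare_models(rows, generators):
    """Run every generator on every row; `generators` maps label -> callable."""
    table = []
    for row in rows:
        prompt = build_prompt(row)
        record = {"prompt": prompt, "expected": row["output"]}
        for label, generate in generators.items():
            text = generate(prompt)
            record[label] = text
            record[f"{label}_valid_json"] = is_valid_json(text)
        table.append(record)
    return table


def load_generator(model_name, seq2seq):
    """Wrap a Hugging Face model as a prompt -> text callable (greedy decoding)."""
    from transformers import (AutoModelForCausalLM, AutoModelForSeq2SeqLM,
                              AutoTokenizer)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model_cls = AutoModelForSeq2SeqLM if seq2seq else AutoModelForCausalLM
    model = model_cls.from_pretrained(model_name)

    def generate(prompt):
        inputs = tokenizer(prompt, return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=64)[0]
        if not seq2seq:
            # Causal models echo the prompt; keep only the continuation.
            output = output[inputs["input_ids"].shape[1]:]
        return tokenizer.decode(output, skip_special_tokens=True)

    return generate
```

A possible call is `compare_models(rows, {"flan-t5-small": load_generator("google/flan-t5-small", True), "gpt2": load_generator("gpt2", False)})`.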

✅ Deliverable: “Before vs after” comparison table:

formatting consistency
instruction adherence

Full tuning with LoRA/PEFT fits better in Section 6+.

C3) RLHF (High-Level Intuition) — Mini Simulation

Goal: Understand RLHF as “preference optimization.”

Step C3.1 — Create 6 prompts

Example:

“Write a polite email declining a meeting.”
“Give a safe answer to a risky question.”

Step C3.2 — Generate 2 candidate answers each

Use two settings:

Candidate A: temperature 0.2 (safer)
Candidate B: temperature 0.9 (more creative)

Step C3.3 — Rank each pair (human preference)

Create a simple table:

Prompt
Answer A
Answer B
Winner + why

Step C3.4 — Explain RLHF mapping

Your preference labels ≈ training data for the “reward model”
Optimization pushes model toward preferred style
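One common way to turn pairwise preferences into a reward-model objective is the Bradley-Terry formulation, where the probability that the preferred answer wins is the sigmoid of the reward difference. The sketch below assumes that formulation for illustration; the lab itself only asks for the intuition, and the reward values in the usage line are invented.

```python
import math


def preference_probability(reward_winner, reward_loser):
    """Bradley-Terry probability that the preferred answer wins."""
    return 1.0 / (1.0 + math.exp(-(reward_winner - reward_loser)))


def preference_loss(pairs):
    """Mean negative log-likelihood over (reward_winner, reward_loser) pairs."""
    return -sum(math.log(preference_probability(w, l))
                for w, l in pairs) / len(pairs)


# Made-up rewards: the loss is small when winners already score higher.
print(preference_loss([(2.0, 0.0), (1.5, 0.5)]))
```

Training a reward model means adjusting it so this loss goes down on your Winner column; the policy is then optimized to produce answers that the reward model scores highly.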

✅ Deliverable: preference table + 5-line explanation of RLHF.

✅ Final Submission Checklist (Section 2 Lab)

Students submit:

Attention heatmap + observations
Positional encoding experiment results (similarity scores)
Encoder vs decoder outputs + comparison paragraph
Tokenization comparison table
Embedding similarity matrix + PCA/t-SNE plot
Context window experiment outputs + explanation
Pretraining objective demo results
Instruction tuning comparison (base vs instruction-following)
RLHF preference ranking table
