✅ Lab Setup (5–10 min)
Step 0 — Create your lab folder
Create a folder:
genai-section1-lab/
Inside, create:
notebooks/
(optional) data/
outputs/
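If you would rather script the folder setup than create it by hand, here is a minimal sketch using only the standard library. The function name create_lab_folders and the include_data flag are illustrative assumptions, not part of the lab.

```python
from pathlib import Path


def create_lab_folders(root="genai-section1-lab", include_data=True):
    """Create the lab folder and its subfolders; return the created paths."""
    root_path = Path(root)
    subfolders = ["notebooks", "outputs"]
    if include_data:  # data/ is treated as the optional folder here (assumption)
        subfolders.append("data")
    created = []
    for name in subfolders:
        folder = root_path / name
        # parents=True also creates the root; exist_ok=True makes reruns safe
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling create_lab_folders() from the directory where you want the lab to live creates everything in one go.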
Step 1 — Install dependencies
Run:
pip install numpy pandas scikit-learn matplotlib transformers torch sentencepiece
If you don’t have a GPU, no problem—this lab works on CPU.
🧪 Lab Part A — Discriminative vs Generative (1.1)
A1) Build a Discriminative Model (Text Classification)
Goal: Train a classifier that predicts a label (spam vs not spam).
This is discriminative: learns P(y | x).
Step A1.1 — Create a tiny dataset
Use any small dataset (you can paste this into a notebook/script):
Spam: “win money now”, “free cash prize”, “claim reward”
Not spam: “meeting at 3pm”, “project update”, “lunch tomorrow”
Step A1.2 — Vectorize and train Logistic Regression
Convert text → TF-IDF vectors
Train LogisticRegression()
Print accuracy + confusion matrix
✅ Deliverable: a simple accuracy score and a few predictions.
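One possible way to carry out Steps A1.1 and A1.2 with scikit-learn is sketched below. With only six messages there is no held-out set, so the score is training accuracy and should not be read as a measure of real-world quality. The two messages in new_messages are hypothetical examples added for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.pipeline import make_pipeline

# Tiny dataset from Step A1.1 (1 = spam, 0 = not spam).
texts = [
    "win money now", "free cash prize", "claim reward",
    "meeting at 3pm", "project update", "lunch tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF turns each message into a sparse vector; the classifier models P(y | x).
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# No held-out data exists here, so these are training-set metrics.
train_pred = model.predict(texts)
train_accuracy = accuracy_score(labels, train_pred)
train_confusion = confusion_matrix(labels, train_pred)
print("Training accuracy:", train_accuracy)
print("Confusion matrix (rows = true label, columns = predicted label):")
print(train_confusion)

# A few predictions on unseen messages (hypothetical examples).
new_messages = ["free money prize", "project meeting tomorrow"]
new_pred = model.predict(new_messages)
new_proba = model.predict_proba(new_messages)[:, 1]
for msg, label, p in zip(new_messages, new_pred, new_proba):
    name = "spam" if label == 1 else "not spam"
    print(f"{msg!r} -> {name} (P(spam) = {p:.2f})")
```

Words the vectorizer never saw during training are ignored at prediction time, so a message made entirely of new words gets a probability close to 0.5.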
A2) Build a Generative Model (Text Generation)
Goal: Generate text given a prompt.
This is generative: learns P(x) or P(next_token | previous_tokens).
Step A2.1 — Start with a trivial “template generator”
Create a simple rule-based generator:
Input: “Write a spam message about {topic}”
Output: a filled template like:
“Claim your {topic} reward now! Limited time!”
This shows: generation doesn’t require classification.
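A rule-based generator along these lines could look like the sketch below. The first template is the one from this step; the other two templates and the names TEMPLATES and generate_spam are made up for illustration.

```python
import random

TEMPLATES = [
    "Claim your {topic} reward now! Limited time!",          # from the step above
    "You have won a free {topic}! Reply today!",             # hypothetical
    "Last chance: get your {topic} prize before midnight!",  # hypothetical
]


def generate_spam(topic, seed=None):
    """Fill a randomly chosen template with the topic; no model or training involved."""
    rng = random.Random(seed)  # a fixed seed makes the choice repeatable
    template = rng.choice(TEMPLATES)
    return template.format(topic=topic)


for i in range(3):
    print(generate_spam("gift card", seed=i))
```

The generator produces new text without ever predicting a label, which is the point of this step.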
Step A2.2 — Use a small Transformer model locally
Use transformers with a lightweight model (example: distilgpt2).
Prompt: “Write a short email asking for a project update”
Generate 2–3 variations
Change temperature: 0.2 vs 1.0
✅ Deliverable: 6 outputs total (3 variations × 2 temperatures).
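The sketch below has two parts. softmax_with_temperature shows, on made-up toy scores, what the temperature setting does to next-token probabilities: the scores are divided by the temperature before being normalized. generate_variations is one way to call distilgpt2 through the transformers text-generation pipeline. Both function names and the toy scores are assumptions for illustration. The model is downloaded on first use, so that function is defined here but not called.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate_variations(prompt, temperature, n=3, max_new_tokens=40, seed=0):
    """Sample n continuations from distilgpt2 at the given temperature."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import pipeline, set_seed

    set_seed(seed)
    generator = pipeline("text-generation", model="distilgpt2")
    outputs = generator(
        prompt,
        do_sample=True,            # sampling must be on for temperature to matter
        temperature=temperature,
        num_return_sequences=n,
        max_new_tokens=max_new_tokens,
        pad_token_id=generator.tokenizer.eos_token_id,
    )
    return [o["generated_text"] for o in outputs]


# Toy scores for three candidate tokens (made up for illustration).
toy_logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

To produce the deliverable, call generate_variations("Write a short email asking for a project update", 0.2) and then again with 1.0. distilgpt2 is a base model rather than an instruction-tuned one, so expect it to continue the prompt text instead of obeying it.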
A3) Why Generative Models Matter (Mini Case Exercise)
Step A3.1 — Pick ONE real industry scenario
Choose one:
Healthcare: patient note summarization
Finance: compliance drafting assistant
Aviation/MRO: defect summary + recommended actions
Education: quiz generation
Step A3.2 — Write 3 prompts for that use case
Include:
A basic prompt
A constrained prompt (format + tone)
A safety prompt (avoid sensitive info)
✅ Deliverable: A “Prompt Pack” document (copy/paste).
🧪 Lab Part B — Evolution of Generative Models (1.2)
B1) N-grams: Build a Bigram Language Model
Goal: See how early generation worked.
Step B1.1 — Create a tiny corpus
Use 10–20 sentences (you can write them yourself).
Step B1.2 — Compute bigram probabilities
Tokenize words
Count pairs (wᵢ → wᵢ₊₁)
Convert counts to probabilities
Step B1.3 — Generate text
Start word: “the”
Sample next tokens using probabilities
Generate 15–30 words
✅ Deliverable: 3 generated samples + a short observation:
repetitive?
incoherent?
stuck loops?
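Steps B1.1 to B1.3 can be sketched with the standard library alone. The ten-sentence corpus below is invented for illustration, so swap in your own sentences. The function names train_bigram and generate are likewise assumptions.

```python
import random
from collections import Counter, defaultdict

# Hypothetical tiny corpus; replace with your own 10-20 sentences.
corpus = [
    "the team checked the engine",
    "the engine passed the test",
    "the pilot thanked the team",
    "the test found a small leak",
    "a small leak delayed the flight",
    "the flight left after the repair",
    "the repair took two hours",
    "the team signed the report",
    "the report listed the leak",
    "the pilot read the report",
]


def train_bigram(sentences):
    """Count word pairs, then normalize each row of counts into probabilities."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for current, nxt in zip(tokens, tokens[1:]):
            counts[current][nxt] += 1
    probs = {}
    for word, followers in counts.items():
        total = sum(followers.values())
        probs[word] = {nxt: c / total for nxt, c in followers.items()}
    return probs


def generate(probs, start="the", max_words=20, seed=None):
    """Sample one word at a time, conditioning only on the previous word."""
    rng = random.Random(seed)
    words = [start]
    while len(words) < max_words:
        followers = probs.get(words[-1])
        if not followers:  # dead end: this word never had a successor
            break
        candidates = list(followers)
        weights = [followers[w] for w in candidates]
        words.append(rng.choices(candidates, weights=weights, k=1)[0])
    return " ".join(words)


bigram_probs = train_bigram(corpus)
for seed in range(3):
    print(generate(bigram_probs, seed=seed))
```

Because each word depends only on the one before it, samples tend to end early at a dead-end word or wander between a few frequent pairs, which gives you material for the observation notes.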
B2) Neural Language Models (High-level, practical demo)
Goal: Understand the “learn embeddings + predict next word” concept.
Step B2.1 — Compare “bigram vs transformer”
Run the same prompt:
“The airplane maintenance team noticed that…”
Generate:
Bigram output
Transformer output
✅ Deliverable: A side-by-side comparison paragraph.
B3) Autoencoders vs VAEs (Concept → hands-on intuition)
Goal: Learn reconstruction vs generation.
Step B3.1 — Autoencoder idea (simple numeric demo)
Use a toy dataset:
2D points (clusters)
Train a simple neural autoencoder (optional) OR simulate conceptually:
Encode → compressed latent → decode
Show: it reconstructs input well.
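For a quick numeric version of encode → latent → decode without training a network, the sketch below uses PCA computed with an SVD. This is a swapped-in technique, not a neural autoencoder: a linear autoencoder trained with squared error recovers the same subspace, so PCA serves as a closed-form stand-in. The cluster positions and sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two 2D clusters lying roughly along the line y = x (made-up values).
cluster_a = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(100, 2))
cluster_b = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(100, 2))
points = np.vstack([cluster_a, cluster_b])

mean = points.mean(axis=0)
centered = points - mean

# The top right-singular vector is the best 1D linear direction for reconstruction.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
direction = vt[0]  # shape (2,)


def encode(x):
    """Compress each 2D point to a single latent number."""
    return (x - mean) @ direction


def decode(z):
    """Map latent numbers back to 2D points."""
    return np.outer(z, direction) + mean


latent = encode(points)
reconstructed = decode(latent)
mse = float(np.mean((points - reconstructed) ** 2))
baseline = float(np.mean(centered ** 2))  # error if every point were replaced by the mean

print("latent shape:", latent.shape)
print("reconstruction MSE:", round(mse, 4), "vs baseline:", round(baseline, 4))
```

The reconstruction error comes out far below the baseline because the data is nearly one-dimensional, which is what "reconstructs input well" means here.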
Step B3.2 — VAE intuition
Explain via experiment:
Sample random points in latent space
Decode them (conceptually or using a simple plotted distribution)
Show: VAEs enable sampling and smooth interpolation.
✅ Deliverable: A diagram/sketch + 5 bullet insight notes.
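The sampling and interpolation experiment can be walked through with a stand-in decoder, as in the sketch below. toy_decoder is hand-written and hypothetical: nothing is trained, and there is no encoder or KL term. It only illustrates what "sample in latent space, then decode" and "interpolate smoothly" look like once a VAE decoder exists.

```python
import numpy as np


def toy_decoder(z):
    """Hypothetical stand-in for a trained VAE decoder: 1D latent code -> 2D point."""
    z = np.asarray(z, dtype=float)
    return np.stack([z, np.tanh(z)], axis=-1)


rng = np.random.default_rng(0)

# Sampling: draw latent codes from the standard normal prior and decode them.
z_samples = rng.standard_normal(5)
generated = toy_decoder(z_samples)
print("generated points:")
print(np.round(generated, 3))

# Interpolation: walk in a straight line through latent space and decode each step.
z_path = np.linspace(-2.0, 2.0, 9)
interpolated = toy_decoder(z_path)
step_sizes = np.linalg.norm(np.diff(interpolated, axis=0), axis=1)
print("largest jump between neighbouring decoded points:", round(float(step_sizes.max()), 3))
```

Small steps in the latent code give small steps in the decoded output. A plain autoencoder makes no such promise for codes it never saw, which is a useful contrast for the insight notes.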
B4) GANs (High-level intuition experiment)
Goal: Understand Generator vs Discriminator game.
Step B4.1 — Use a “coin game” simulation
Simulate:
Real data distribution = normal(0,1)
Generator tries to mimic it
Discriminator tries to distinguish real vs fake
Even without training a GAN, plot:
Real distribution histogram
Fake distribution histogram (initial)
Then “improve” generator distribution manually (mean/variance closer)
✅ Deliverable: A 3-stage plot: fake improves over time.
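One way to run this simulation is sketched below. No GAN is trained: the generator's mean and standard deviation are set by hand for three stages, and the histogram overlap with the real data stands in for how hard the discriminator's job has become. The stage values, the overlap measure, and the function names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
real = rng.normal(0.0, 1.0, size=n)  # real data distribution = normal(0, 1)

# Hand-picked generator settings (name, mean, std); made up for illustration.
stages = [("initial", 3.0, 2.0), ("improving", 1.5, 1.5), ("close", 0.2, 1.1)]
bins = np.linspace(-6, 10, 41)


def overlap(a, b):
    """Shared area under two normalized histograms: near 0 = easy to tell apart, near 1 = hard."""
    ha, _ = np.histogram(a, bins=bins, density=True)
    hb, _ = np.histogram(b, bins=bins, density=True)
    width = bins[1] - bins[0]
    return float(np.sum(np.minimum(ha, hb)) * width)


fakes = {}
overlaps = []
for name, mean, std in stages:
    fakes[name] = rng.normal(mean, std, size=n)
    overlaps.append(overlap(real, fakes[name]))
    print(f"{name:>9}: mean={mean}, std={std}, overlap with real = {overlaps[-1]:.2f}")


def plot_stages():
    """Draw the 3-stage plot: real vs fake histogram for each generator setting."""
    import matplotlib.pyplot as plt  # imported lazily so the numbers above run without it

    fig, axes = plt.subplots(1, len(stages), figsize=(12, 3), sharey=True)
    for ax, (name, mean, std) in zip(axes, stages):
        ax.hist(real, bins=bins, alpha=0.5, density=True, label="real")
        ax.hist(fakes[name], bins=bins, alpha=0.5, density=True, label="fake")
        ax.set_title(f"{name} (mean={mean}, std={std})")
        ax.legend()
    fig.tight_layout()
    plt.show()
```

Call plot_stages() to produce the figure. As the overlap rises, a discriminator has less and less to go on, which is the equilibrium a trained GAN aims for.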
B5) Why Transformers Replaced Everything (Mini Benchmark)
Step B5.1 — Show “context understanding”
Prompt:
“John gave Tom his book because he was leaving. Who was leaving?”
Try:
Bigram generator (fails)
Transformer (usually better)
Step B5.2 — Show long-range dependency
Prompt:
“In the following paragraph, remember the secret code is 9281…”
Ask at end: “What was the secret code?”
✅ Deliverable: A short write-up on context + attention.
🧪 Lab Part C — Generative AI Landscape (1.3)
C1) Text vs Image vs Audio vs Video (Practical mapping)
Step C1.1 — Make a “modality worksheet”
Create a table with:
Modality
Input
Output
Common use cases
Risks
Fill it for:
Text, Image, Audio, Video
✅ Deliverable: 1 completed table.
C2) Closed-source vs Open-source (Engineering Decision Lab)
Step C2.1 — Pick a project scenario
Example:
“Customer support chatbot for airline TechOps”
or “Internal PDF Q&A assistant for policies”
Step C2.2 — Choose model approach
Write a 1-page decision:
Closed-source pros/cons (quality, ease, privacy, cost)
Open-source pros/cons (control, hosting, tuning, ops burden)
Final recommendation
✅ Deliverable: 1-page “Model Choice Memo”.
C3) Foundation Models: Build a simple “foundation → specialization” demo
Step C3.1 — Use a base model for multiple tasks
Using the same text model:
Summarize
Classify sentiment
Generate email
Extract JSON fields
Step C3.2 — Observe: same model, different behavior via prompts
Write 4 prompts and compare outputs.
✅ Deliverable: “Foundation Model Multi-Task Sheet”.
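To keep the four prompts comparable, it may help to build them from one shared input text, as in the sketch below. The template wording, the JSON keys, and the sample text are all hypothetical, so adapt them to your scenario. The sketch only builds the prompt strings and does not call a model.

```python
# Hypothetical prompt templates, one per task from Step C3.1.
TASK_PROMPTS = {
    "summarize": "Summarize the following text in one sentence.\nText: {text}\nSummary:",
    "sentiment": "Classify the sentiment of the following text as positive, negative, or neutral.\nText: {text}\nSentiment:",
    "email": "Write a short, polite email based on the following note.\nNote: {text}\nEmail:",
    "extract_json": "Return a JSON object with the keys sender, date, and request for the following text.\nText: {text}\nJSON:",
}


def build_prompts(text):
    """Fill every task template with the same input text."""
    return {task: template.format(text=text) for task, template in TASK_PROMPTS.items()}


# Made-up input text for illustration.
sample_text = "Maria asked on Monday whether the maintenance report could be sent by Friday."
prompts = build_prompts(sample_text)
for task, prompt in prompts.items():
    print(f"--- {task} ---")
    print(prompt)
    print()
```

Send each prompt to the same model and paste the outputs next to the prompts in your sheet. Only the instruction changes between the four, so any difference in behavior comes from the prompt.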
✅ Final Submission Checklist (What students should submit)
Discriminative classifier output (accuracy + predictions)
Generative outputs (temperature comparison)
Bigram generator samples + observations
Bigram vs Transformer comparison
Modality worksheet (text/image/audio/video)
Closed vs open-source model choice memo
Foundation model multi-task prompt sheet