Hands-on Lab 1

✅ Lab Setup (5–10 min)

Step 0 — Create your lab folder

Create a folder: genai-section1-lab/
Inside, create:
- notebooks/ (optional)
- data/
- outputs/

Step 1 — Install dependencies

Run:

pip install numpy pandas scikit-learn matplotlib transformers torch sentencepiece

If you don’t have a GPU, no problem—this lab works on CPU.
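To confirm the install worked, a short check like the one below can help. It is a minimal sketch: the helper name check_dependencies is made up for this lab, and the list uses import names, which differ from the pip names in one case (scikit-learn is imported as sklearn).

```python
import importlib.util

# Import names, not pip names: scikit-learn is imported as "sklearn".
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib",
            "transformers", "torch", "sentencepiece"]


def check_dependencies(names):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_dependencies(REQUIRED)
for name, found in status.items():
    print(f"{name:15s} {'ok' if found else 'MISSING'}")

# Only ask torch about the GPU if torch is actually installed.
if status["torch"]:
    import torch
    print("CUDA GPU available:", torch.cuda.is_available())
```

If any line prints MISSING, rerun the pip command above before continuing.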

🧪 Lab Part A — Discriminative vs Generative (1.1)

A1) Build a Discriminative Model (Text Classification)

Goal: Train a classifier that predicts a label (spam vs not spam).
This is discriminative: the model learns P(y | x), the probability of a label y given an input x.

Step A1.1 — Create a tiny dataset

Use any small dataset (you can paste this into a notebook/script):

Spam: “win money now”, “free cash prize”, “claim reward”
Not spam: “meeting at 3pm”, “project update”, “lunch tomorrow”

Step A1.2 — Vectorize and train Logistic Regression

Convert text → TF-IDF vectors
Train LogisticRegression()
Print accuracy + confusion matrix

✅ Deliverable: a simple accuracy score and a few predictions.
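One possible way to do steps A1.1 and A1.2 is sketched below, using the six example messages from this lab. Two assumptions to note: the accuracy is measured on the training data because six samples are too few for a meaningful train/test split, and the two new_messages are made-up test inputs.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.pipeline import make_pipeline

texts = ["win money now", "free cash prize", "claim reward",
         "meeting at 3pm", "project update", "lunch tomorrow"]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = not spam

# TF-IDF turns each message into a vector; logistic regression models P(y | x).
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Evaluated on the training set only (the dataset is too small to split).
train_pred = model.predict(texts)
print("Training accuracy:", accuracy_score(labels, train_pred))
print("Confusion matrix (rows = true, cols = predicted):")
print(confusion_matrix(labels, train_pred))

# Hypothetical unseen messages to inspect a few predictions.
new_messages = ["claim your free prize", "project meeting tomorrow"]
for msg, p_spam in zip(new_messages, model.predict_proba(new_messages)[:, 1]):
    print(f"{msg!r}: P(spam) = {p_spam:.2f}")
```

Words the vectorizer never saw during training (such as "your") are ignored at prediction time, which is one reason such a tiny dataset generalizes poorly.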

A2) Build a Generative Model (Text Generation)

Goal: Generate text given a prompt.
This is generative: the model learns the data distribution P(x), which language models factor into P(next_token | previous_tokens).

Step A2.1 — Start with a trivial “template generator”

Create a simple rule-based generator:

Input: “Write a spam message about {topic}”
Output: a filled template like:
- “Claim your {topic} reward now! Limited time!”

This shows that generation doesn’t require classification: the generator produces new text without predicting any label.
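A rule-based generator can be as small as the sketch below. The first template comes from the example above; the other two are invented here for variety, and the function name generate_spam is only illustrative.

```python
import random

TEMPLATES = [
    "Claim your {topic} reward now! Limited time!",            # from the lab text
    "You have been selected for a free {topic} prize. Act today!",  # invented
    "Win {topic} instantly. Offer ends soon!",                 # invented
]


def generate_spam(topic, rng=random):
    """Fill a randomly chosen template with the given topic."""
    return rng.choice(TEMPLATES).format(topic=topic)


rng = random.Random(0)  # fixed seed so reruns give the same picks
for _ in range(3):
    print(generate_spam("gift card", rng))
```

Nothing is learned here: the "model" is just a list of strings, which makes a useful baseline to compare with the Transformer in the next step.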

Step A2.2 — Use a small Transformer model locally

Use the transformers library with a lightweight model (for example, distilgpt2).

Prompt: “Write a short email asking for a project update”
Generate 3 variations
Compare two temperatures: 0.2 vs 1.0

✅ Deliverable: 6 outputs total (3 variations × 2 temperatures).
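The step above could be run with the Hugging Face text-generation pipeline, roughly as follows. This is a sketch, not the only way to do it: the helper names load_generator, generate, and main are made up for this lab, and max_new_tokens=40 is an arbitrary length limit.

```python
PROMPT = "Write a short email asking for a project update"
TEMPERATURES = (0.2, 1.0)
VARIATIONS = 3


def load_generator(model_name="distilgpt2"):
    # Imported here so the model is only downloaded when generation is requested.
    from transformers import pipeline
    return pipeline("text-generation", model=model_name)


def generate(generator, prompt, temperature, n=VARIATIONS):
    """Sample n continuations of the prompt at the given temperature."""
    outputs = generator(
        prompt,
        max_new_tokens=40,
        do_sample=True,            # sampling must be on for temperature to matter
        temperature=temperature,
        num_return_sequences=n,
        pad_token_id=generator.tokenizer.eos_token_id,  # GPT-2 has no pad token
    )
    return [o["generated_text"] for o in outputs]


def main():
    generator = load_generator()
    for temperature in TEMPERATURES:
        print(f"--- temperature = {temperature} ---")
        for i, text in enumerate(generate(generator, PROMPT, temperature), start=1):
            print(f"[{i}] {text}\n")
```

Call main() to print the outputs; the first call downloads the model weights. distilgpt2 is not instruction-tuned, so expect it to continue the prompt text rather than write the requested email. The point of the exercise is the contrast between the two temperatures.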

A3) Why Generative Models Matter (Mini Case Exercise)

Step A3.1 — Pick ONE real industry scenario

Choose one:

Healthcare: patient note summarization
Finance: compliance drafting assistant
Aviation/MRO: defect summary + recommended actions
Education: quiz generation

Step A3.2 — Write 3 prompts for that use case

Include:

A basic prompt
A constrained prompt (format + tone)
A safety prompt (avoid sensitive info)

✅ Deliverable: A “Prompt Pack” document (copy/paste).

🧪 Lab Part B — Evolution of Generative Models (1.2)

B1) N-grams: Build a Bigram Language Model

Goal: See how early generation worked.

Step B1.1 — Create a tiny corpus

Use 10–20 sentences (you can write them yourself).

Step B1.2 — Compute bigram probabilities

Tokenize words
Count pairs (wᵢ → wᵢ₊₁)
Convert counts to probabilities

Step B1.3 — Generate text

Start word: “the”
Sample next tokens using probabilities
Generate 15–30 words
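Steps B1.1 to B1.3 fit in a few lines of standard-library Python. The sketch below assumes whitespace tokenization and uses a made-up ten-sentence corpus; swap in your own sentences.

```python
import random
from collections import Counter, defaultdict

# Hypothetical tiny corpus (no punctuation, to keep tokenization trivial).
corpus = [
    "the team checked the engine",
    "the engine passed the test",
    "the pilot reported a fault",
    "the team fixed the fault",
    "a fault delayed the flight",
    "the flight left after the test",
    "the pilot thanked the team",
    "the report described the fault",
    "the engineer signed the report",
    "the engineer checked the flight log",
]


def train_bigram(sentences):
    """Count word pairs, then normalize counts into P(next | previous)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {word: n / sum(c.values()) for word, n in c.items()}
        for prev, c in counts.items()
    }


def generate(probs, start="the", max_words=20, rng=random):
    """Sample one word at a time, conditioning only on the previous word."""
    words = [start]
    while len(words) < max_words:
        next_probs = probs.get(words[-1])
        if not next_probs:
            break  # the last word was never followed by anything in the corpus
        choices, weights = zip(*next_probs.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)


probs = train_bigram(corpus)
print("P(next | 'the'):", probs["the"])
for seed in range(3):
    print(generate(probs, rng=random.Random(seed)))
```

Because each word depends only on the one before it, samples tend to wander between sentence fragments or stop early at words that only end sentences.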

✅ Deliverable: 3 generated samples + a short observation:

- Is the output repetitive?
- Is it incoherent?
- Does it get stuck in loops?

B2) Neural Language Models (High-level, practical demo)

Goal: Understand the “learn embeddings + predict next word” concept.

Step B2.1 — Compare “bigram vs transformer”

Run the same prompt:

“The airplane maintenance team noticed that…”

Generate:

Bigram output
Transformer output

✅ Deliverable: A side-by-side comparison paragraph.

B3) Autoencoders vs VAEs (Concept → hands-on intuition)

Goal: Learn reconstruction vs generation.

Step B3.1 — Autoencoder idea (simple numeric demo)

Use a toy dataset:

- 2D points (clusters)
- Train a simple neural autoencoder (optional) OR simulate conceptually: encode → compressed latent → decode
- Show that it reconstructs the input well
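For the optional hands-on version, a linear autoencoder in NumPy is enough to see the encode → latent → decode loop. The sketch below makes several assumptions: two made-up clusters along the line y = x, a 1-D latent, no bias terms or nonlinearity, and plain gradient descent with a hand-picked learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two clusters lying roughly along the line y = x.
cluster_a = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(100, 2))
cluster_b = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(100, 2))
X = np.vstack([cluster_a, cluster_b])

# Encoder compresses 2-D -> 1-D, decoder expands 1-D -> 2-D.
W_enc = rng.normal(scale=0.1, size=(2, 1))
W_dec = rng.normal(scale=0.1, size=(1, 2))


def reconstruct(points):
    return (points @ W_enc) @ W_dec


def mse(points):
    return float(np.mean((reconstruct(points) - points) ** 2))


initial_error = mse(X)

lr = 0.01
for step in range(2000):
    Z = X @ W_enc                    # latent codes, shape (200, 1)
    err = Z @ W_dec - X              # reconstruction error, shape (200, 2)
    # Gradients of 0.5 * mean squared reconstruction norm.
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_error = mse(X)
print(f"reconstruction MSE: {initial_error:.3f} -> {final_error:.3f}")
```

The remaining error is the spread perpendicular to the line y = x, which a single latent number cannot represent. Plotting X next to reconstruct(X) makes that visible.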

Step B3.2 — VAE intuition

Explain via experiment:

Sample random points in latent space
Decode them (conceptually or using a simple plotted distribution)
Show: VAEs enable sampling and smooth interpolation.
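The sampling and interpolation ideas can be tried without training anything. In the sketch below, decode is a hand-written stand-in for a trained VAE decoder (an assumption made purely for illustration); it maps a 1-D latent value to a 2-D point on a smooth curve.

```python
import numpy as np


def decode(z):
    """Hypothetical stand-in for a trained decoder: 1-D latent -> 2-D point."""
    z = np.asarray(z, dtype=float)
    return np.stack([2.0 * np.tanh(z), np.sin(z)], axis=-1)


rng = np.random.default_rng(0)

# Generation: draw latents from the standard normal prior, then decode.
z_samples = rng.standard_normal(5)
generated = decode(z_samples)
print("generated points:\n", generated)

# Interpolation: walk a straight line between two latent values.
z_start, z_end = -1.5, 1.5
z_path = np.linspace(z_start, z_end, 7)
interpolated = decode(z_path)
step_sizes = np.linalg.norm(np.diff(interpolated, axis=0), axis=1)
print("distance between consecutive decoded points:", step_sizes.round(2))
```

Small steps in latent space give small steps in output space, which is the "smooth interpolation" property. A real VAE learns the decoder from data and is trained so that latents drawn from the prior decode to plausible samples.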

✅ Deliverable: A diagram/sketch + 5 bullet insight notes.

B4) GANs (High-level intuition experiment)

Goal: Understand the generator vs. discriminator game.

Step B4.1 — Use a “coin game” simulation

Simulate:

Real data distribution = normal(0,1)
Generator tries to mimic it
Discriminator tries to distinguish real vs fake

Even without training a GAN, plot:

Real distribution histogram
Fake distribution histogram (initial)
Then “improve” the generator distribution manually (move its mean and variance closer to the real ones)

✅ Deliverable: A 3-stage plot showing the fake distribution moving closer to the real one.
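A possible version of this simulation is sketched below. The three (mean, std) settings for the fake distribution are illustrative choices, not learned values, and the overlap score is an assumption added here as a simple stand-in for how hard the discriminator's job is.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5000
real = rng.normal(0.0, 1.0, N)  # real data: normal(0, 1)

# (label, mean, std) of the fake distribution at each stage; values are made up.
stages = [("initial", 4.0, 0.5), ("improving", 1.5, 0.8), ("close", 0.0, 1.0)]
bins = np.linspace(-5, 7, 61)


def overlap(a, b):
    """Shared area under two density histograms (0 = disjoint, 1 = identical)."""
    ha, _ = np.histogram(a, bins=bins, density=True)
    hb, _ = np.histogram(b, bins=bins, density=True)
    return float(np.sum(np.minimum(ha, hb)) * (bins[1] - bins[0]))


fakes = {label: rng.normal(mean, std, N) for label, mean, std in stages}
overlaps = [overlap(real, fakes[label]) for label, _, _ in stages]
for (label, _, _), value in zip(stages, overlaps):
    print(f"{label:10s} overlap with real = {value:.2f}")


def plot_stages():
    """Draw the 3-stage figure (needs matplotlib)."""
    import matplotlib.pyplot as plt
    fig, axes = plt.subplots(1, len(stages), figsize=(12, 3), sharey=True)
    for ax, (label, _, _), value in zip(axes, stages, overlaps):
        ax.hist(real, bins=bins, alpha=0.5, density=True, label="real")
        ax.hist(fakes[label], bins=bins, alpha=0.5, density=True, label="fake")
        ax.set_title(f"{label} (overlap {value:.2f})")
        ax.legend()
    fig.tight_layout()
    plt.show()
```

Call plot_stages() to draw the figure. As the overlap approaches 1, even an ideal discriminator can do no better than guessing, which is the equilibrium a trained GAN aims for.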

B5) Why Transformers Replaced Everything (Mini Benchmark)

Step B5.1 — Show “context understanding”

Prompt:

“John gave Tom his book because he was leaving. Who was leaving?”

Try:
- Bigram generator (it only sees the previous word, so it fails)
- Transformer (usually better)

Step B5.2 — Show long-range dependency

Prompt:

“In the following paragraph, remember the secret code is 9281…”
Ask at the end: “What was the secret code?”
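To vary how far the question sits from the code, the prompt can be assembled programmatically. This is a small sketch: the filler sentence and the helper names build_recall_prompt and recalled are invented for this lab.

```python
def build_recall_prompt(secret_code, filler_sentences,
                        question="What was the secret code?"):
    """Put the code first, the filler in the middle, and the question last."""
    intro = f"In the following paragraph, remember the secret code is {secret_code}."
    return " ".join([intro, *filler_sentences, question])


def recalled(model_output, secret_code):
    """True if the model's answer contains the code verbatim."""
    return secret_code in model_output


# Hypothetical filler; any unrelated sentences work.
FILLER = "The hangar was busy and the team reviewed the maintenance log."

prompts = {n: build_recall_prompt("9281", [FILLER] * n) for n in (0, 10, 50)}
for n, prompt in prompts.items():
    print(f"{n:3d} filler sentences -> {len(prompt.split())} words")
```

Feed each prompt to the model and pass its answer to recalled. Keep in mind that GPT-2 family models such as distilgpt2 accept at most 1024 tokens, so very long filler will not fit in the context at all.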

✅ Deliverable: A short write-up on context + attention.

🧪 Lab Part C — Generative AI Landscape (1.3)

C1) Text vs Image vs Audio vs Video (Practical mapping)

Step C1.1 — Make a “modality worksheet”

Create a table with:

Modality
Input
Output
Common use cases
Risks

Fill it for:

Text, Image, Audio, Video

✅ Deliverable: 1 completed table.

C2) Closed-source vs Open-source (Engineering Decision Lab)

Step C2.1 — Pick a project scenario

Example:

“Customer support chatbot for airline TechOps”
or
“Internal PDF Q&A assistant for policies”

Step C2.2 — Choose model approach

Write a 1-page decision:

Closed-source pros/cons (quality, ease, privacy, cost)
Open-source pros/cons (control, hosting, tuning, ops burden)
Final recommendation

✅ Deliverable: 1-page “Model Choice Memo”.

C3) Foundation Models: Build a simple “foundation → specialization” demo

Step C3.1 — Use a base model for multiple tasks

Using the same text model:

Summarize
Classify sentiment
Generate email
Extract JSON fields

Step C3.2 — Observe: same model, different behavior via prompts

Write 4 prompts and compare outputs.
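One way to keep the four prompts comparable is to build them from a single source text, as sketched below. The sample message, the prompt wording, and the three JSON field names are all made up for illustration; parse_json_fields is a hypothetical helper for checking the extraction task's output.

```python
import json

# Hypothetical input message used for all four tasks.
SOURCE_TEXT = ("Hi team, this is Priya. The engine inspection report is overdue. "
               "Please send it by Friday.")

PROMPTS = {
    "summarize": "Summarize the following message in one sentence:\n{text}",
    "sentiment": ("Classify the sentiment of the following message as "
                  "positive, negative, or neutral:\n{text}"),
    "email": "Write a short, polite email reply to the following message:\n{text}",
    "extract_json": ('Extract the fields "sender", "date", and "request" from the '
                     "following message and return only a JSON object:\n{text}"),
}


def build_prompts(text):
    """Return one prompt per task, all built from the same source text."""
    return {task: template.format(text=text) for task, template in PROMPTS.items()}


def parse_json_fields(model_output, required=("sender", "date", "request")):
    """Return the parsed dict if the output is valid JSON with all fields, else None."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not all(key in data for key in required):
        return None
    return data


for task, prompt in build_prompts(SOURCE_TEXT).items():
    print(f"== {task} ==\n{prompt}\n")
```

Send each prompt to the same model and record the outputs side by side. For the extraction task, parse_json_fields gives a quick pass/fail check on whether the model followed the format.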

✅ Deliverable: “Foundation Model Multi-Task Sheet”.

✅ Final Submission Checklist (What students should submit)

Discriminative classifier output (accuracy + predictions)
Generative outputs (temperature comparison)
Prompt Pack (3 prompts for one industry scenario)
Bigram generator samples + observations
Bigram vs Transformer comparison
Autoencoder/VAE diagram + 5 insight notes
GAN 3-stage plot
Context + attention write-up
Modality worksheet (text/image/audio/video)
Closed vs open-source model choice memo
Foundation model multi-task prompt sheet