Hands-on Lab 1


✅ Lab Setup (5–10 min)

Step 0 — Create your lab folder

  1. Create a folder: genai-section1-lab/

  2. Inside, create:

    • notebooks/ (optional)

    • data/

    • outputs/
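If you prefer to script the folder setup, the layout above can be created with a few lines of Python. This is a minimal sketch; the helper name create_lab_folders is illustrative and not part of the lab.

```python
from pathlib import Path


def create_lab_folders(root="genai-section1-lab"):
    """Create the lab root plus its notebooks/, data/ and outputs/ subfolders."""
    root_path = Path(root)
    for sub in ("notebooks", "data", "outputs"):
        # parents=True also creates the root; exist_ok=True makes reruns safe
        (root_path / sub).mkdir(parents=True, exist_ok=True)
    return root_path


lab_root = create_lab_folders()
print(sorted(p.name for p in lab_root.iterdir() if p.is_dir()))
```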

Step 1 — Install dependencies

Run:

pip install numpy pandas scikit-learn matplotlib transformers torch sentencepiece

If you don’t have a GPU, no problem: everything in this lab runs on CPU.

🧪 Lab Part A — Discriminative vs Generative (1.1)

A1) Build a Discriminative Model (Text Classification)

Goal: Train a classifier that predicts a label (spam vs not spam).
This is discriminative: the model learns P(y | x), the probability of a label y given an input x.

Step A1.1 — Create a tiny dataset

Use any small labeled dataset of spam and non-spam messages (a dozen examples is enough); you can paste it directly into a notebook or script.

Step A1.2 — Vectorize and train Logistic Regression
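One possible way to carry out Steps A1.1 and A1.2 together is sketched below. The twelve messages and their labels are made-up placeholders, and TF-IDF features are one reasonable choice of vectorizer rather than a requirement of the lab. With so little data, the accuracy number will be noisy; the point is to see the workflow end to end.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data (not from the lab): 1 = spam, 0 = not spam
texts = [
    "Win a free prize now", "Claim your free cash reward",
    "Cheap loans, click here", "You won the lottery, reply now",
    "Limited offer, buy now", "Free gift card, click to claim",
    "Are we still meeting at noon?", "Please review the attached report",
    "Lunch tomorrow?", "The build passed, merging today",
    "Can you send me the slides?", "Happy birthday, see you tonight",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hold out a quarter of the data; stratify keeps both classes in each split
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Vectorize text into TF-IDF features, then fit logistic regression
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")

# A few predictions on unseen messages
new_messages = ["Free prize waiting, click now", "Can we move the meeting to 3pm?"]
predictions = model.predict(new_messages)
for message, pred in zip(new_messages, predictions):
    print(f"{'spam' if pred == 1 else 'not spam'}: {message}")
```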

Deliverable: a simple accuracy score and a few predictions.

A2) Build a Generative Model (Text Generation)

Goal: Generate text given a prompt.
This is generative: the model learns P(x), which autoregressive language models factor into P(next_token | previous_tokens).

Step A2.1 — Start with a trivial “template generator”

Create a simple rule-based generator:
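A rule-based generator can be as small as a set of templates with slots filled at random. The sketch below is one such version; the templates, slot names, and values are invented for illustration.

```python
import random

# Hypothetical templates and slot values (illustrative only)
TEMPLATES = [
    "Dear {name}, your {item} will arrive on {day}.",
    "Hi {name}, thanks for ordering the {item}. Expect it by {day}.",
]
SLOTS = {
    "name": ["Asha", "Ben", "Chen"],
    "item": ["laptop", "headset", "keyboard"],
    "day": ["Monday", "Wednesday", "Friday"],
}


def generate_message(rng):
    """Pick a template and fill every slot with a random option."""
    template = rng.choice(TEMPLATES)
    filled = {slot: rng.choice(options) for slot, options in SLOTS.items()}
    return template.format(**filled)


rng = random.Random(0)  # fixed seed so reruns give the same samples
samples = [generate_message(rng) for _ in range(3)]
for sample in samples:
    print(sample)
```

No labels and no training are involved: the "model" is just the templates plus a random choice.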

This shows that generation doesn’t require classification.

Step A2.2 — Use a small Transformer model locally

Use the transformers library with a lightweight model (for example, distilgpt2).
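A minimal sketch of the 3 prompts × 2 temperatures run follows. The three prompts and the temperature values 0.3 and 1.0 are assumptions chosen for illustration; substitute your own. The model is loaded inside generate_all, so nothing is downloaded until you call it.

```python
from itertools import product

# Hypothetical prompts and temperatures (replace with your own)
PROMPTS = [
    "The future of healthcare is",
    "Once upon a time in a small village,",
    "The best way to learn programming is",
]
TEMPERATURES = [0.3, 1.0]


def build_runs(prompts=PROMPTS, temperatures=TEMPERATURES):
    """Return every (prompt, temperature) pair: 3 x 2 = 6 runs."""
    return list(product(prompts, temperatures))


def generate_all(model_name="distilgpt2", max_new_tokens=40):
    """Generate one sampled continuation per (prompt, temperature) pair."""
    # Imported here so the model is only loaded when this function is called
    from transformers import pipeline, set_seed

    set_seed(0)
    generator = pipeline("text-generation", model=model_name)
    results = []
    for prompt, temperature in build_runs():
        out = generator(
            prompt,
            max_new_tokens=max_new_tokens,
            do_sample=True,            # sampling is required for temperature to matter
            temperature=temperature,
            pad_token_id=generator.tokenizer.eos_token_id,
        )
        results.append((prompt, temperature, out[0]["generated_text"]))
    return results


# To produce the six outputs (downloads the model on first use):
# for prompt, temperature, text in generate_all():
#     print(f"[T={temperature}] {text}\n")
```

Lower temperatures tend to give more repetitive, predictable text; higher temperatures give more varied but less coherent text.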

Deliverable: 6 outputs total (3 prompts × 2 temperatures).

A3) Why Generative Models Matter (Mini Case Exercise)

Step A3.1 — Pick ONE real industry scenario

Choose one:

Step A3.2 — Write 3 prompts for that use case

Include:

Deliverable: A “Prompt Pack” document (copy/paste).

🧪 Lab Part B — Evolution of Generative Models (1.2)

B1) N-grams: Build a Bigram Language Model

Goal: See how early generation worked.

Step B1.1 — Create a tiny corpus

Use 10–20 sentences (you can write them yourself).

Step B1.2 — Compute bigram probabilities

Step B1.3 — Generate text
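Steps B1.1 to B1.3 can be sketched in pure Python as follows. The ten-sentence corpus is a made-up placeholder, and the <s> and </s> boundary markers are a common convention rather than something the lab prescribes.

```python
import random
from collections import Counter, defaultdict

# Hypothetical tiny corpus (write your own 10-20 sentences)
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
    "the dog chased the ball",
    "a cat likes warm milk",
    "a dog likes long walks",
    "the ball rolled under the mat",
    "the milk spilled on the rug",
    "a cat sat under the table",
    "the dog slept on the mat",
]


def train_bigram(sentences):
    """Estimate P(next | previous) from bigram counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev, counter in counts.items():
        total = sum(counter.values())
        probs[prev] = {word: c / total for word, c in counter.items()}
    return probs


def generate(probs, rng, max_len=15):
    """Sample words one at a time until </s> or max_len is reached."""
    word, output = "<s>", []
    for _ in range(max_len):
        candidates = probs[word]
        word = rng.choices(list(candidates), weights=list(candidates.values()))[0]
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


probs = train_bigram(corpus)
rng = random.Random(0)
samples = [generate(probs, rng) for _ in range(3)]
for sample in samples:
    print(sample)
```

Each word is chosen by looking only at the single previous word, which is why samples are locally plausible but often drift or loop over longer spans.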

Deliverable: 3 generated samples + a short observation:

B2) Neural Language Models (High-level, practical demo)

Goal: Understand the “learn embeddings + predict next word” concept.

Step B2.1 — Compare “bigram vs transformer”

Run the same prompt:

Generate:

Deliverable: A side-by-side comparison paragraph.

B3) Autoencoders vs VAEs (Concept → hands-on intuition)

Goal: Learn reconstruction vs generation.

Step B3.1 — Autoencoder idea (simple numeric demo)

Use a toy dataset:
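One way to get the numeric intuition without training a network is a linear autoencoder, whose optimal encoder coincides with PCA. The sketch below therefore uses an SVD instead of gradient descent; that substitution, and the toy data (200 points near the line y = 2x), are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 two-dimensional points lying near the line y = 2x
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=200)])

mean = data.mean(axis=0)
centered = data - mean

# The top right-singular vector is the best 1-D linear code direction (PCA)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
direction = vt[0]  # shape (2,)


def encode(points):
    """Compress each 2-D point to a single number (the bottleneck)."""
    return (points - mean) @ direction


def decode(codes):
    """Map each 1-D code back to a 2-D point."""
    return np.outer(codes, direction) + mean


codes = encode(data)
reconstruction = decode(codes)
mse = np.mean((data - reconstruction) ** 2)
print(f"code shape: {codes.shape}, reconstruction MSE: {mse:.4f}")
```

The reconstruction error is small because the data is nearly one-dimensional; what is lost is the noise off the line.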

Step B3.2 — VAE intuition

Explain via experiment:
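A small experiment that captures the key difference: an autoencoder maps an input to a single code, while a VAE encoder outputs a distribution (a mean and a variance) and samples a code from it. The sketch below shows only that sampling step, using the reparameterization z = mu + sigma * eps. The values of mu and log_var are hypothetical stand-ins for what an encoder would output, and no decoder is included.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_latent(mu, log_var, n_samples, rng):
    """Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + sigma * eps


# Hypothetical encoder outputs for one input (2-D latent space)
mu = np.array([1.0, -0.5])
log_var = np.array([-2.0, 0.0])  # sigma is about 0.37 and 1.0

z = sample_latent(mu, log_var, 5000, rng)
print("sample mean:", z.mean(axis=0).round(2))
print("sample std: ", z.std(axis=0).round(2))

# For generation, a VAE skips the encoder and draws codes from the prior N(0, I);
# a trained decoder would turn each row of z_prior into a new sample.
z_prior = rng.standard_normal((3, 2))
print("codes drawn from the prior:\n", z_prior.round(2))
```

The same input produces a cloud of codes rather than one point, which is what lets a trained VAE generate new samples from codes it never saw during training.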

Deliverable: A diagram or sketch plus 5 bullet-point insights.

B4) GANs (High-level intuition experiment)

Goal: Understand the Generator vs Discriminator game.

Step B4.1 — Use a “coin game” simulation

Simulate:

Even without training a GAN, plot:
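One possible coin game is sketched below. It is not a GAN: the "discriminator" is just a comparison of heads rates between a real and a fake batch, and the "generator" nudges its coin bias in the direction that shrinks that gap. The real bias of 0.7, the starting fake bias of 0.1, and the step size are all assumed values.

```python
import random


def heads_rate(p_heads, n, rng):
    """Fraction of heads in n flips of a coin with bias p_heads."""
    return sum(rng.random() < p_heads for _ in range(n)) / n


def run_coin_game(p_real=0.7, p_fake=0.1, rounds=100, batch=500, lr=0.05, seed=0):
    """Return the fake coin's bias after each round of feedback."""
    rng = random.Random(seed)
    history = [p_fake]
    for _ in range(rounds):
        real_rate = heads_rate(p_real, batch, rng)
        fake_rate = heads_rate(p_fake, batch, rng)
        # Stand-in for discriminator feedback: how different do the batches look?
        gap = real_rate - fake_rate
        # Generator update: move the fake bias toward the real one, kept in [0, 1]
        p_fake = min(1.0, max(0.0, p_fake + lr * gap))
        history.append(p_fake)
    return history


history = run_coin_game()
stages = {
    "early": history[0],
    "middle": history[len(history) // 2],
    "late": history[-1],
}
for name, value in stages.items():
    print(f"{name:>6}: fake heads probability = {value:.3f} (real = 0.7)")
```

The three values in stages can be passed to a matplotlib bar chart, with a horizontal line at the real bias, to produce the 3-stage plot.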

Deliverable: A 3-stage plot showing the fake samples improving over time.

B5) Why Transformers Replaced Everything (Mini Benchmark)

Step B5.1 — Show “context understanding”

Prompt:

Step B5.2 — Show long-range dependency

Prompt:

Deliverable: A short write-up on context + attention.
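For the write-up, it can help to see why attention handles long-range dependencies: every token scores every other token directly, so distance in the sequence does not weaken the connection. The sketch below implements scaled dot-product attention in NumPy. The token vectors are hand-built for illustration, and the raw vectors are used as queries, keys, and values; a real Transformer applies learned projections first.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V along with the attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights


# Hypothetical 6-token sequence with 8-dimensional vectors.
# Tokens 0-4 are mutually orthogonal; the last token is a copy of the first,
# standing in for a word that refers back to the start of the sentence.
tokens = 3.0 * np.eye(6, 8)
tokens[5] = tokens[0]

output, weights = scaled_dot_product_attention(tokens, tokens, tokens)
print("attention from the last token:", weights[5].round(2))
```

The last token puts most of its weight on itself and on token 0, five positions away, and very little on its immediate neighbors.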

🧪 Lab Part C — Generative AI Landscape (1.3)

C1) Text vs Image vs Audio vs Video (Practical mapping)

Step C1.1 — Make a “modality worksheet”

Create a table with:

Fill it for:

Deliverable: 1 completed table.

C2) Closed-source vs Open-source (Engineering Decision Lab)

Step C2.1 — Pick a project scenario

Example:

Step C2.2 — Choose model approach

Write a 1-page decision:

Deliverable: 1-page “Model Choice Memo”.

C3) Foundation Models: Build a simple “foundation → specialization” demo

Step C3.1 — Use a base model for multiple tasks

Using the same text model:

Step C3.2 — Observe: same model, different behavior via prompts

Write 4 prompts and compare outputs.
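A small helper can keep the four prompts and their outputs together for the sheet. In the sketch below, the four tasks and prompt wordings are hypothetical examples, and echo_model is a placeholder; pass in any function that takes a prompt string and returns generated text.

```python
# Hypothetical tasks and prompt templates (choose your own four)
TASK_PROMPTS = {
    "summarize": "Summarize in one sentence:\n{text}\nSummary:",
    "classify": "Is the sentiment of this text positive or negative?\n{text}\nSentiment:",
    "translate": "Translate to French:\n{text}\nFrench:",
    "answer": "{text}\nQuestion: What product is mentioned?\nAnswer:",
}


def build_sheet(text, generate_fn):
    """Run the same model (generate_fn) on one text under each task prompt."""
    rows = []
    for task, template in TASK_PROMPTS.items():
        prompt = template.format(text=text)
        rows.append({"task": task, "prompt": prompt, "output": generate_fn(prompt)})
    return rows


def echo_model(prompt):
    """Placeholder generator; replace with a call to a real text model."""
    return f"<model output for a {len(prompt)}-character prompt>"


sheet = build_sheet("The new headphones arrived quickly and sound great.", echo_model)
for row in sheet:
    print(f"{row['task']:>9}: {row['output']}")
```

Because only the prompt changes between rows, any difference in behavior comes from the prompt rather than from a different model.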

Deliverable: “Foundation Model Multi-Task Sheet”.

✅ Final Submission Checklist (What students should submit)

  1. Discriminative classifier output (accuracy + predictions)

  2. Generative outputs (temperature comparison)

  3. Bigram generator samples + observations

  4. Bigram vs Transformer comparison

  5. Modality worksheet (text/image/audio/video)

  6. Closed vs open-source model choice memo

  7. Foundation model multi-task prompt sheet