Hands on Lab 4


✅ Lab Setup (10 min)

Step 0 — Create folder

Create: genai-section4-lab/ with:

Step 1 — Install dependencies

pip install python-dotenv pydantic fastapi uvicorn requests
pip install openai anthropic google-generativeai
pip install jsonschema rapidfuzz tiktoken

If you prefer open-source only:

pip install transformers torch sentencepiece

Step 2 — Add .env (optional APIs)

OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
GEMINI_API_KEY=...

🧪 Part A — Prompt Design Fundamentals (4.1)

A1) Build a “Prompt Template” with Roles

Goal: Create a reusable prompt structure using system/user/assistant roles.

Step A1.1 — Create data/system_prompt.txt

Write a system prompt that enforces:

Example requirements:

Step A1.2 — Create 3 user tasks in data/tasks.json

Include:

  1. Summarization task

  2. Structured extraction task

  3. Email drafting task

Each task has:

Deliverable: system_prompt.txt + tasks.json

A2) Zero-shot vs One-shot vs Few-shot (Reliability Test)

Goal: Measure how examples improve output consistency.

Step A2.1 — Choose ONE task: structured extraction

Example:

Extract: name, date, action_items from a paragraph.

Step A2.2 — Write three prompt variants

Create 3 files:

Step A2.3 — Run each prompt 5 times

Keep parameters fixed:

Step A2.4 — Score reliability

For each output check:

Deliverable: reports/fewshot_comparison.md with a table:
| Prompt type | JSON validity | Field completeness | Notes |

🧪 Part B — Advanced Prompting Techniques (4.2)

B1) Chain-of-Thought vs “Answer-Only” (Controlled Reasoning)

Goal: Improve reasoning accuracy while keeping outputs safe.

Step B1.1 — Use a reasoning task

Example:

“A flight has 3 legs… compute total delay impact…”

Step B1.2 — Create 2 prompts

Step B1.3 — Run both 5 times

Compare:

Deliverable: reports/cot_vs_direct.md
Include which is better for production and why.

Engineering note: In production, you often use hidden reasoning or “scratchpad” internally and return final answers only.

B2) Self-Consistency (Voting for Reliability)

Goal: Reduce randomness by sampling multiple outputs and choosing the best.

Step B2.1 — Pick a task with variability

Example:

Step B2.2 — Generate 7 outputs (same prompt)

Use:

Step B2.3 — Implement majority vote

Deliverable: reports/self_consistency.md including:

B3) Role Prompting + Constraints (Production Template)

Goal: Create a robust prompt that consistently produces structured results.

Step B3.1 — Write a role-based system instruction

Example roles:

Step B3.2 — Add hard constraints

Step B3.3 — Test against 5 varied inputs

Include:

Deliverable: outputs/role_constraints_results.json

🧪 Part C — Prompt Robustness & Safety (4.3)

C1) Prompt Injection Simulation (Red Team Test)

Goal: See how prompts fail and how to defend.

Step C1.1 — Create 8 malicious user inputs

Save in data/injection_tests.json. Examples:

Step C1.2 — Run baseline prompt (no defenses)

Record:

Deliverable: baseline results table in reports/injection_baseline.md

C2) Defensive Prompting (Blue Team Fix)

Goal: Improve resistance using prompt hierarchy + delimiters.

Step C2.1 — Update system prompt with:

Step C2.2 — Re-run same 8 tests

Compare baseline vs defended.

Deliverable: reports/injection_defended.md with:

C3) Input Validation (Before the Model)

Goal: Block risky inputs before they reach the LLM.

Step C3.1 — Build a validator in src/validators.py

Rules:

Step C3.2 — Apply validator to injection tests

Output:

Deliverable: outputs/input_validation_report.json

C4) Output Validation (After the Model)

Goal: Ensure the model output is safe + structured.

Step C4.1 — Define a JSON Schema

Create data/response_schema.json (example):

Step C4.2 — Validate every LLM response

If schema fails:

Deliverable: reports/output_validation.md showing:

🧪 Part D — Mini Project: “Prompt Pack” for a Real Feature (30–45 min)

Goal: Produce a reusable prompt library ready for a real app.

Step D1 — Choose ONE feature

Pick:

Step D2 — Build a Prompt Pack

Include:

Deliverable: reports/prompt_pack.md (copy/paste ready)

✅ Final Submission Checklist (Section 4 Lab)

Students submit:

  1. system_prompt.txt + tasks.json

  2. Zero/one/few-shot comparison report

  3. CoT vs answer-only report

  4. Self-consistency voting report

  5. Role + constraint outputs

  6. Injection baseline vs defended results

  7. Input validation report

  8. Output validation report + schema

  9. Prompt Pack mini-project