{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "qbf2pg4h6f",
   "metadata": {},
   "source": [
    "# DSPy Tutorial: Building and Optimizing AI Programs\n",
    "\n",
    "Welcome to this beginner-friendly tutorial on DSPy! 🚀\n",
    "\n",
    "## What is DSPy?\n",
    "\n",
    "DSPy (Declarative Self-improving Python) is a framework for programming with language models (LMs) that allows you to:\n",
    "- Build modular AI programs using composable modules\n",
    "- Automatically optimize prompts and few-shot examples\n",
    "- Evaluate and improve your programs systematically\n",
    "\n",
    "## What We'll Build\n",
    "\n",
    "In this tutorial, we'll create a joke-telling AI that:\n",
    "1. Generates jokes about any topic\n",
    "2. Learns what makes jokes funny through optimization\n",
    "3. Gets progressively better at telling jokes\n",
    "\n",
    "## The Evaluator-Optimizer Pattern\n",
    "\n",
    "In this tutorial, we'll use the evaluator-optimizer pattern, which is a powerful workflow for ensuring our AI program meets all requirements through iterative refinement. Here's how it works:\n",
    "\n",
    "1. **Generation**: An LLM performs the task (generating jokes in our case)\n",
    "2. **Evaluation**: A second LLM evaluates if the result meets our criteria (checking if jokes are funny)\n",
    "3. **Refinement**: If needed, the process repeats with adjustments until all requirements are met\n",
    "\n",
    "This pattern is particularly useful for:\n",
    "- Ensuring consistent quality in generated content\n",
    "- Incorporating synthetic feedback to improve outputs\n",
    "- Systematically optimizing prompts and examples\n",
    "\n",
    "## Tutorial Structure\n",
    "\n",
    "1. **Setup**: Install DSPy and configure language models\n",
    "2. **Basic Programs**: Create simple AI programs with signatures\n",
    "3. **Chain of Thought**: Add reasoning to improve outputs\n",
    "4. **Modular Programs**: Build reusable components\n",
    "5. **Evaluation**: Create metrics to measure performance\n",
    "6. **Optimization**: Automatically improve prompts\n",
    "\n",
    "Let's get started! 🎉"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9jq79iul1d",
   "metadata": {},
   "source": [
    "## 1. Setup and Installation\n",
    "\n",
    "First, let's install DSPy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e7e5c1ac",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m25.1.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m25.2\u001b[0m\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "# Install DSPy quietly (-q flag suppresses verbose output)\n",
    "!pip install dspy -q"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6l9a5i5ngq7",
   "metadata": {},
   "source": [
    "### Installation Complete! \n",
    "\n",
    "DSPy has been successfully installed. The `-q` flag was used to suppress verbose installation output, keeping our notebook clean."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "wtvrjhdinl",
   "metadata": {},
   "source": [
    "## 2. Configure Language Models\n",
    "\n",
    "Now let's import DSPy and set up our language model. We'll use OpenAI's GPT-4o-mini as our primary model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "30b65b58",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Hello! How can I assist you today?']"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Import necessary libraries\n",
    "import dspy\n",
    "import os\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "# Load environment variables from .env file (contains API keys)\n",
    "load_dotenv()\n",
    "\n",
    "# Initialize the OpenAI language model\n",
    "# - \"openai/gpt-4o-mini\" specifies the model to use\n",
    "# - api_key is loaded from environment variable for security\n",
    "openai_lm = dspy.LM(\"openai/gpt-4o-mini\", api_key=os.getenv(\"OPENAI_API_KEY\"))\n",
    "\n",
    "# Test the language model with a simple query\n",
    "openai_lm(\"Hello\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "tndmq404xx",
   "metadata": {},
   "source": [
    "### Language Model Configured!\n",
    "\n",
    "The OpenAI language model responded with a greeting! This confirms:\n",
    "- Your API key is correctly loaded from the .env file\n",
    "- The connection to OpenAI is working\n",
    "- DSPy's LM class successfully wraps the OpenAI API\n",
    "\n",
    "Notice that DSPy returns responses as a list - this allows for handling multiple completions if requested."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "na1bozxdl2",
   "metadata": {},
   "source": [
    "## 3. Creating Your First DSPy Program\n",
    "\n",
    "DSPy programs are built using **signatures** - simple declarations of what goes in and what comes out. Let's create a basic joke generator:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "e4e52520",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Prediction(\n",
      "    joke='Why do Python programmers prefer dark mode? Because light attracts bugs!'\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "# Configure DSPy to use our language model globally\n",
    "dspy.configure(lm=openai_lm)\n",
    "\n",
    "# Create a basic program using a signature string\n",
    "# Format: 'input -> output'\n",
    "basic_joke_program = dspy.Predict('topic -> joke')\n",
    "\n",
    "# Generate a joke about programming\n",
    "result = basic_joke_program(topic=\"python\")\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "s1vbub9j5oh",
   "metadata": {},
   "source": [
    "### Your First DSPy Program Works!\n",
    "\n",
    "The program generated a programming joke! Notice:\n",
    "- DSPy automatically created a prompt from the signature `'topic -> joke'`\n",
    "- The output is wrapped in a `Prediction` object with the field name `joke`\n",
    "- No manual prompt engineering was needed - DSPy handled it all"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "yk1whyzfnvc",
   "metadata": {},
   "source": [
    "### Understanding What Happened\n",
    "\n",
    "Let's inspect the actual prompt DSPy sent to the language model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "7cf60227",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\u001b[34m[2025-08-06T17:21:13.190779]\u001b[0m\n",
      "\n",
      "\u001b[31mSystem message:\u001b[0m\n",
      "\n",
      "Your input fields are:\n",
      "1. `topic` (str):\n",
      "Your output fields are:\n",
      "1. `joke` (str):\n",
      "All interactions will be structured in the following way, with the appropriate values filled in.\n",
      "\n",
      "[[ ## topic ## ]]\n",
      "{topic}\n",
      "\n",
      "[[ ## joke ## ]]\n",
      "{joke}\n",
      "\n",
      "[[ ## completed ## ]]\n",
      "In adhering to this structure, your objective is: \n",
      "        Given the fields `topic`, produce the fields `joke`.\n",
      "\n",
      "\n",
      "\u001b[31mUser message:\u001b[0m\n",
      "\n",
      "[[ ## topic ## ]]\n",
      "python\n",
      "\n",
      "Respond with the corresponding output fields, starting with the field `[[ ## joke ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.\n",
      "\n",
      "\n",
      "\u001b[31mResponse:\u001b[0m\n",
      "\n",
      "\u001b[32m[[ ## joke ## ]]\n",
      "Why do Python programmers prefer dark mode? Because light attracts bugs!\n",
      "\n",
      "[[ ## completed ## ]]\u001b[0m\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Inspect the last interaction with the language model\n",
    "# This shows the system prompt and user message DSPy generated\n",
    "openai_lm.inspect_history(n=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ma16lf93yb",
   "metadata": {},
   "source": [
    "### Behind the Scenes\n",
    "\n",
    "DSPy automatically generated a structured prompt! It:\n",
    "- Created a system message explaining the input/output fields\n",
    "- Formatted the user's input with special markers `[[ ## topic ## ]]`\n",
    "- Instructed the model to respond with the output field `[[ ## joke ## ]]`\n",
    "- Added completion markers to ensure proper parsing\n",
    "\n",
    "This structured approach makes outputs reliable and easy to parse."
   ]
  },
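  {
   "cell_type": "markdown",
   "id": "marker-parse-sketch",
   "metadata": {},
   "source": [
    "To see why the `[[ ## field ## ]]` markers make the response easy to parse, here is a minimal sketch that splits such a response into named fields. It is an illustration written for this tutorial, not DSPy's actual parser; `parse_fields` is a hypothetical helper name.\n",
    "\n",
    "```python\n",
    "PREFIX, SUFFIX = '[[ ## ', ' ## ]]'\n",
    "\n",
    "def parse_fields(response):\n",
    "    fields, current = {}, None\n",
    "    for line in response.splitlines():\n",
    "        stripped = line.strip()\n",
    "        if stripped.startswith(PREFIX) and stripped.endswith(SUFFIX):\n",
    "            # A marker line starts a new field\n",
    "            current = stripped[len(PREFIX):-len(SUFFIX)]\n",
    "            fields[current] = []\n",
    "        elif current is not None:\n",
    "            # Any other line belongs to the most recent field\n",
    "            fields[current].append(line)\n",
    "    # Join each field's lines and drop the 'completed' end marker\n",
    "    return {name: '\\n'.join(lines).strip()\n",
    "            for name, lines in fields.items() if name != 'completed'}\n",
    "\n",
    "response = '''[[ ## joke ## ]]\n",
    "Why do Python programmers prefer dark mode? Because light attracts bugs!\n",
    "\n",
    "[[ ## completed ## ]]'''\n",
    "\n",
    "print(parse_fields(response))\n",
    "```\n",
    "\n",
    "Because every field starts on its own marker line, the parser never has to guess where one output ends and the next begins."
   ]
  },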
  {
   "cell_type": "markdown",
   "id": "u0rgt3book",
   "metadata": {},
   "source": [
    "## 4. Creating More Detailed Signatures\n",
    "\n",
    "For better control, we can create signatures with descriptions and custom instructions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "fb6e4492",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Why do Python programmers prefer dark mode? Because light attracts bugs!\n"
     ]
    }
   ],
   "source": [
    "# Define custom instructions for our joke generator\n",
    "instructions = \"\"\"Tell a funny joke about the topic\"\"\"\n",
    "\n",
    "# Define input and output fields with descriptions\n",
    "fields = {\n",
    "    # Input field with description\n",
    "    \"topic\": (str, dspy.InputField(desc=\"The topic of the joke\")),\n",
    "    \n",
    "    # Output field with description  \n",
    "    \"joke\": (str, dspy.OutputField(desc=\"The joke that is being told\")),\n",
    "}\n",
    "\n",
    "# Create a signature programmatically\n",
    "joke_signature = dspy.make_signature(\n",
    "    signature_name=\"Joke\",\n",
    "    instructions=instructions,\n",
    "    signature=fields\n",
    ")\n",
    "\n",
    "# Create a program with our custom signature\n",
    "detailed_joke_program = dspy.Predict(joke_signature)\n",
    "\n",
    "# Test it out\n",
    "output = detailed_joke_program(topic=\"python\")\n",
    "print(output.joke)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "zbwwm27b0y8",
   "metadata": {},
   "source": [
    "### Custom Signatures Work!\n",
    "\n",
    "The joke generator now uses our custom instructions and field descriptions. This gives you more control over:\n",
    "- The instructions sent to the model\n",
    "- Descriptions for each input/output field\n",
    "- The overall behavior of your program\n",
    "\n",
    "The model generated the same joke, showing consistency in its humor database!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "k0f459k89wd",
   "metadata": {},
   "source": [
    "## 5. Using Different Language Models\n",
    "\n",
    "DSPy makes it easy to switch between different language models. Let's try Google's Gemini:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "ca9fc8e1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Why did the Python script need therapy?\n",
      "\n",
      "Because it had too many deeply nested if-else statements and couldn't handle the indentation!\n"
     ]
    }
   ],
   "source": [
    "# Initialize Google's Gemini model\n",
    "gemini_lm = dspy.LM(\"gemini/gemini-2.0-flash\", api_key=os.getenv(\"GEMINI_API_KEY\"))\n",
    "\n",
    "# Use a context manager to temporarily switch models\n",
    "# This doesn't change the global configuration\n",
    "with dspy.context(lm=gemini_lm):\n",
    "    output = detailed_joke_program(topic=\"python\")\n",
    "    print(output.joke)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "nlh3p3dt9bp",
   "metadata": {},
   "source": [
    "### Model Switching Success!\n",
    "\n",
    "Gemini generated the same joke! Key points:\n",
    "- The `dspy.context()` manager temporarily switches models\n",
    "- Your original configuration remains unchanged after the context\n",
    "- This is useful for comparing different models or using specialized models for specific tasks\n",
    "- Both models seem to know this popular programming joke!"
   ]
  },
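  {
   "cell_type": "markdown",
   "id": "context-override-sketch",
   "metadata": {},
   "source": [
    "The idea of a temporary, scoped override can be illustrated with a few lines of standard-library Python. This is a simplified sketch of the general technique, not how DSPy implements `dspy.context()`; the `settings` dictionary and the `context` function are hypothetical names used only for this example.\n",
    "\n",
    "```python\n",
    "from contextlib import contextmanager\n",
    "\n",
    "# Hypothetical global settings store (not DSPy's internals)\n",
    "settings = {'lm': 'openai/gpt-4o-mini'}\n",
    "\n",
    "@contextmanager\n",
    "def context(**overrides):\n",
    "    previous = dict(settings)      # remember the current configuration\n",
    "    settings.update(overrides)     # apply the temporary override\n",
    "    try:\n",
    "        yield\n",
    "    finally:\n",
    "        settings.clear()\n",
    "        settings.update(previous)  # restore it, even if the block raised\n",
    "\n",
    "with context(lm='gemini/gemini-2.0-flash'):\n",
    "    inside = settings['lm']\n",
    "after = settings['lm']\n",
    "\n",
    "print(inside, after)\n",
    "```\n",
    "\n",
    "Inside the `with` block the override is active; once the block exits, the earlier value is back in place."
   ]
  },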
  {
   "cell_type": "markdown",
   "id": "zcbyn4zgjrq",
   "metadata": {},
   "source": [
    "## 6. Chain of Thought Reasoning\n",
    "\n",
    "DSPy can use modules automatically add reasoning steps (or other common prompt engineering techniques) to improve output quality. Let's create a joke generator that explains its thinking:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "c4b66e7e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Reasoning: The joke should be about a common frustration or misconception related to Python programming. I'll focus on the indentation sensitivity of Python, which is a frequent source of errors for beginners.\n",
      "Joke: Why do Python programmers get paid so much?\n",
      "\n",
      "Because they have to be right all the time... about their indentation!\n"
     ]
    }
   ],
   "source": [
    "# Create a Chain of Thought program\n",
    "# This automatically adds a 'reasoning' field before the output\n",
    "cot_joke_program = dspy.ChainOfThought(joke_signature)\n",
    "\n",
    "# Configure to use Gemini for this example\n",
    "dspy.configure(lm=gemini_lm)\n",
    "\n",
    "# Generate a joke with reasoning\n",
    "output = cot_joke_program(topic=\"python\")\n",
    "print(f\"Reasoning: {output.reasoning}\")\n",
    "print(f\"Joke: {output.joke}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "stcr3vdwqfh",
   "metadata": {},
   "source": [
    "### DSPy Has Other Powerful Reasoning Modules!\n",
    "\n",
    "Beyond ChainOfThought, DSPy provides other reasoning modules:\n",
    "- `Predict` is a simpler module for direct predictions\n",
    "- `ChainOfThought` adds reasoning to the program\n",
    "- `ReAct` adds tool use to make your program agentic \n",
    "\n",
    "\n",
    "These modules give you flexibility in how much reasoning and transparency you want from your models. Other modules exist or you can make your own."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fyqgbduy755",
   "metadata": {},
   "source": [
    "## 7. Building Modular Programs\n",
    "\n",
    "DSPy allows you to create reusable modules. Let's build a more sophisticated joke generator:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "2d1d21ef",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Why do Python programmers get paid so much?\n",
      "\n",
      "Because they have to be right all the time... about their indentation!\n"
     ]
    }
   ],
   "source": [
    "# Define a signature using class syntax (alternative to make_signature)\n",
    "class JokeSignature(dspy.Signature):\n",
    "    \"\"\"Tell a funny joke about the topic\"\"\"\n",
    "    topic: str = dspy.InputField(desc=\"The topic of the joke\")\n",
    "    joke: str = dspy.OutputField(desc=\"The joke that is being told\")\n",
    "\n",
    "# Create a reusable module\n",
    "class JokeModule(dspy.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        # Initialize the chain of thought predictor with our joke signature\n",
    "        self.joke_generator = dspy.ChainOfThought(JokeSignature)\n",
    "\n",
    "    def forward(self, topic: str) -> str:\n",
    "        # Generate a joke about the topic\n",
    "        prediction = self.joke_generator(topic=topic)\n",
    "        return prediction.joke\n",
    "    \n",
    "# Instantiate our module\n",
    "joke_module = JokeModule()\n",
    "\n",
    "# Test it\n",
    "output = joke_module(topic=\"python\")\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7zcji0butnx",
   "metadata": {},
   "source": [
    "### Modular Programming with DSPy\n",
    "\n",
    "We've created a reusable `JokeModule` that:\n",
    "- Uses the cleaner class-based signature syntax\n",
    "- Encapsulates the joke generation logic\n",
    "- Returns just the joke string (not the full Prediction object)\n",
    "- Can be easily integrated into larger applications\n",
    "\n",
    "The module pattern is powerful for building complex AI systems with multiple components. Any business logic that is valid python can be in a module and you can join multiple modules together as needed."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3uye0fnrjqe",
   "metadata": {},
   "source": [
    "## 8. Creating a Dataset for Evaluation\n",
    "\n",
    "To optimize our programs, we need data to evaluate performance. Let's create a dataset of funny and unfunny jokes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "4476f1b3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training set size: 152\n",
      "Validation set size: 51\n",
      "Development set size: 51\n"
     ]
    }
   ],
   "source": [
    "# Import random for shuffling\n",
    "import random\n",
    "random.seed(69)  # Set seed for reproducibility\n",
    "\n",
    "# Dataset of professional comedian jokes (labeled as funny)\n",
    "# Source: Various famous comedians\n",
    "# https://inews.co.uk/light-relief/jokes/ricky-gervais-jokes-best-golden-globes-2020-host-controversial-funniest-the-office-135797\n",
    "# https://www.blackpoolgrand.co.uk/funniest-jokes-one-liners/\n",
    "# https://www.vulture.com/2018/01/dave-chappelle-bird-revelation-equanimity-best-jokes.html\n",
    "# https://www.scotsman.com/heritage-and-retro/heritage/billy-connollys-best-jokes-80-of-the-big-yins-funniest-jokes-and-one-liners-4458332\n",
    "# https://inews.co.uk/light-relief/jokes/funny-jokes-110-funniest-best-one-liners-192413\n",
    "\n",
    "funny_jokes = [\n",
    "    {\"topic\": \"Fishing\", \"joke\": \"Give a man a fish, and he'll probably follow you home expecting more fish.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Family\", \"joke\": \"Where there's a will – there's a relative!\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Holidays\", \"joke\": \"1st of December, World Aids Day….I don't think it'll ever take off like Christmas.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Drinking\", \"joke\": \"I like a drink as much as the next man. Unless the next man is Mel Gibson.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Celebrity\", \"joke\": \"It's gonna be a night of partying and heavy drinking. Or as Charlie calls it: breakfast.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Movies\", \"joke\": \"It seems like everything this year was three-dimensional, except the characters in The Tourist.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Religion\", \"joke\": \"You won't burn in hell. But be nice anyway.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Inspiration\", \"joke\": \"My greatest hero is Nelson Mandela. What a man. Incarcerated for 25 years, he was released in 1990 and he hasn't reoffended. I think he's going straight, which shows you prison does work.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Philosophy\", \"joke\": \"Remember, when you are dead, you do not know you are dead. It is only painful for others. The same applies when you are stupid.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Life\", \"joke\": \"Mondays are fine. It's your life that sucks.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Religion\", \"joke\": \"Remember, if you don't sin, then Jesus died for nothing.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Activism\", \"joke\": \"I could solve the world's problems if I… cared.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Identity\", \"joke\": \"I can have a go at the French cause I'm half French half English with a stupid name like Gervais. No I am, I'm half French half English and um I've got qualities of both, French and English which is good, so um… I am crap in bed but at least I've got bad breath.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Military\", \"joke\": \"Do commandos not wear pants? They must wear pants, don't they?\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Equality\", \"joke\": \"Same sex marriage is not a gay privilege, it's equal rights. Privilege would be something like gay people not paying taxes. Like churches don't.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Folklore\", \"joke\": \"I've never worked out what the moral of Humpty Dumpty is. I can only think of: Don't sit on a wall, if you're an egg.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Employment\", \"joke\": \"Avoid employing unlucky people – throw half of the pile of CVs in the bin without reading them.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Awards\", \"joke\": \"For any of you who don't know, the Golden Globes are just like the Oscars, but without all that esteem. The Golden Globes are to the Oscars what Kim Kardashian is to Kate Middleton. A bit louder, a bit trashier, a bit drunker, and more easily bought.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Workplace\", \"joke\": \"If your boss is getting you down, look at him through the prongs of a fork and imagine him in jail.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Humor\", \"joke\": \"I can't find someone funny whom I don't like. Hitler told great jokes.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Culture\", \"joke\": \"America champions the underdog. We champion the under dog until he's not the underdog anymore, and he annoys us.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Betrayal\", \"joke\": \"You have to be 100% behind someone, before you can stab them in the back.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Health\", \"joke\": \"Remember, being healthy is basically dying as slowly as possible.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Atheism\", \"joke\": \"I'd like to thank God for making me an atheist.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Music Industry\", \"joke\": \"Piracy doesn't kill music, boy bands do.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Wealth\", \"joke\": \"My wealth and happiness would suggest that God definitely does love me. If he existed of course. Which he doesn't.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Social Media\", \"joke\": \"Following someone on Twitter and asking them to tweet about something else is like stalking someone and asking them to go a different route.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Fame\", \"joke\": \"Please don't worship me. I'm just an ordinary guy, with lots of followers trying to spread my message. Sort of like Jesus Christ I guess.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Technology\", \"joke\": \"iPhones are Barbie Dolls for grown men. You carry them round, dress them up in little outfits, accessorise, & get a new one every year.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Generosity\", \"joke\": \"Give a man a fish, and he'll probably follow you home expecting more fish.\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Environment\", \"joke\": \"It seems to be true, particularly in middle America, that those most militant about using up fossil fuels, don't actually believe in fossils\", \"comedian\": \"Ricky Gervais\"},\n",
    "    {\"topic\": \"Drinking\", \"joke\": \"My father drank so heavily, when he blew on the birthday cake he lit the candles.\", \"comedian\": \"Les Dawson\"},\n",
    "    {\"topic\": \"Police\", \"joke\": \"I was in my car driving back from work. A police officer pulled me over and knocked on my window. I said, 'One minute I'm on the phone.'\", \"comedian\": \"Alan Carr\"},\n",
    "    {\"topic\": \"Overthinking\", \"joke\": \"I worry about ridiculous things, you know, how does a guy who drives a snowplough get to work in the morning… that can keep me awake for days.\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Relationships\", \"joke\": \"I used to go out with a giraffe. Used to take it to the pictures and that. You'd always get some bloke complaining that he couldn't see the screen.\", \"comedian\": \"Paul Merton\"},\n",
    "    {\"topic\": \"Music\", \"joke\": \"Here's a picture of me with REM. That's me in the corner.\", \"comedian\": \"Milton Jones\"},\n",
    "    {\"topic\": \"Optimism\", \"joke\": \"People say 'Bill, are you an optimist?' And I say, 'I hope so.'\", \"comedian\": \"Bill Bailey\"},\n",
    "    {\"topic\": \"Customer Service\", \"joke\": \"I rang up British Telecom and said: 'I want to report a nuisance caller.' He said: 'Not you again.'\", \"comedian\": \"Tim Vine\"},\n",
    "    {\"topic\": \"Obesity\", \"joke\": \"Life is like a box of chocolates. It doesn't last long if you're fat.\", \"comedian\": \"Joe Lycett\"},\n",
    "    {\"topic\": \"Religion\", \"joke\": \"We weren't very religious. On Hanukkah, my mother had our menorah on a dimmer.\", \"comedian\": \"Richard Lewis\"},\n",
    "    {\"topic\": \"Beauty\", \"joke\": \"My girlfriend is absolutely beautiful. Body like a Greek statue – completely pale, no arms.\", \"comedian\": \"Phil Wang\"},\n",
    "    {\"topic\": \"Weather\", \"joke\": \"Normally you have news, weather and travel. But not on snow day. On a snow day, the news is weather is travel.\", \"comedian\": \"Michael McIntyre\"},\n",
    "    {\"topic\": \"Personal Improvement\", \"joke\": \"I bought myself some glasses. My observational comedy improved.\", \"comedian\": \"Sara Pascoe\"},\n",
    "    {\"topic\": \"Sports\", \"joke\": \"If I was an Olympic athlete, I'd rather come in last than win the silver medal. You win the gold, you feel good. You win the bronze, you think, 'at least I got something.' But you win that silver, that's like, 'Congratulations, you almost won! Of all the losers, you came in first! You're the number one loser! No one lost ahead of you!'\", \"comedian\": \"Jerry Seinfeld\"},\n",
    "    {\"topic\": \"Identity\", \"joke\": \"My star sign is Pyrex. I was a test-tube baby.\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Marriage\", \"joke\": \"I always take my wife morning tea in my pyjamas. But is she grateful? No, she says she'd rather have it in a cup.\", \"comedian\": \"Eric Morecambe\"},\n",
    "    {\"topic\": \"Shopping\", \"joke\": \"A man walks into a chemist's and says, 'Can I have a bar of soap, please?' The chemist says, 'Do you want it scented?' And the man says, 'No, I'll take it with me now.'\", \"comedian\": \"Ronnie Barker\"},\n",
    "    {\"topic\": \"Crime\", \"joke\": \"Crime in multi-storey car parks. That is wrong on so many different levels.\", \"comedian\": \"Tim Vine\"},\n",
    "    {\"topic\": \"Social Class\", \"joke\": \"You know you're working class when your TV is bigger than your bookcase.\", \"comedian\": \"Rob Beckett\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"Owls haven't got necks, have they? An owl is essentially a one-piece unit.\", \"comedian\": \"Ross Noble\"},\n",
    "    {\"topic\": \"Fashion\", \"joke\": \"If you arrive fashionably late in Crocs, you're just late.\", \"comedian\": \"Joel Dommett\"},\n",
    "    {\"topic\": \"Technology\", \"joke\": \"My phone will ring at 2am and my wife'll look at me and go, \\\"Who's that calling at this time?\\\" I say, \\\"I don't know. If I knew that we wouldn't need the bloody phone.\\\"\", \"comedian\": \"Lee Evans\"},\n",
    "    {\"topic\": \"Philosophy\", \"joke\": \"I doubt there's a heaven; I think the people from hell have probably bought it for a timeshare.\", \"comedian\": \"Victoria Wood\"},\n",
    "    {\"topic\": \"Fitness\", \"joke\": \"I said to the gym instructor: \\\"Can you teach me to do the splits?\\\", He said: \\\"How flexible are you?\\\", I said: \\\"I can't make Tuesdays.\\\"\", \"comedian\": \"Tommy Cooper\"},\n",
    "    {\"topic\": \"Insurance\", \"joke\": \"Do Transformers get car, or life insurance?\", \"comedian\": \"Russell Howard\"},\n",
    "    {\"topic\": \"Police\", \"joke\": \"Alright lads, a giant fly is attacking the police station. I've called the SWAT team!\", \"comedian\": \"Greg Davies\"},\n",
    "    {\"topic\": \"Healthcare\", \"joke\": \"A good rule to remember for life is that when it comes to plastic surgery and sushi, never be attracted by a bargain.\", \"comedian\": \"Graham Norton\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"Two monkeys were getting into the bath. One said: 'Oo, oo, oo, aah aah aah.' The other replied: 'Well, put some cold in it then.'\", \"comedian\": \"Harry Hill\"},\n",
    "    {\"topic\": \"Suburban Life\", \"joke\": \"My parents did just well enough so I could grow up poor around white people. When Nas and them used to talk about the projects, I used to get jealous. It sounded fun. Everybody in the projects was poor, and that's fair. But if you were poor in Silver Spring, nigga, it felt like it was only happening to you.\", \"comedian\": \"Dave Chappelle\"},\n",
    "    {\"topic\": \"Cultural Identity\", \"joke\": \"What is Rachel willing to do, so that we blacks believe that she believes she is actually one of us? Bitch, are you willing to put a lien on your house so that you can invest in a mixtape that probably won't work out?\", \"comedian\": \"Dave Chappelle\"},\n",
    "    {\"topic\": \"Aging\", \"joke\": \"I don't like looking at my dick anymore. My dick looks distinguished. It's old, an old-looking dick. It's got salt-and-pepper hair all around it. My dick looks like Morgan Freeman in the '90s.\", \"comedian\": \"Dave Chappelle\"},\n",
    "    {\"topic\": \"Fatherhood\", \"joke\": \"This motherfucker calls me up in the middle of the night. It was one o'clock in the morning and he goes, 'Dad, don't be mad […] I'm at a party and my designated driver had too much to drink. Me and friends need you to come pick us up.' I said, 'Jesus Christ, it's one o'clock in the morning. Nigga, I am shit-faced!'\", \"comedian\": \"Dave Chappelle\"},\n",
    "    {\"topic\": \"Political Commentary\", \"joke\": \"Eight years later, I'm pulling up to the polls again. This time, I'm driving a brand-new Porsche because the Obama years were very good to me […] I walked up and saw a long, long line of dusty white people […] I stood with them in line, like all us Americans are required to do in a democracy. Nobody skips the line to vote. And I listened to them say naïve, poor white people things.\", \"comedian\": \"Dave Chappelle\"},\n",
    "    {\"topic\": \"Leadership\", \"joke\": \"This motherfucker [Donald Trump] grabbed the podium and he goes, 'You don't know how scary the things I read in my briefings are.' Holy shit, man, you ain't supposed to tell us that, bro!\", \"comedian\": \"Dave Chappelle\"},\n",
    "    {\"topic\": \"Religious Satire\", \"joke\": \"I respect everybody's beliefs, except Amish people. They are the only ones I can say clearly, 'Their God is wrong.' The speed limit is 75 miles an hour in Ohio, and one lane of traffic is blocked by a goddamned horse and buggy?\", \"comedian\": \"Dave Chappelle\"},\n",
    "    {\"topic\": \"Hollywood\", \"joke\": \"You think I go to a Hollywood meeting with all them white people by myself? I bring my nigga Mac Mittens from the streets […] He's not even qualified to listen to these meetings, he just makes me feel good.\", \"comedian\": \"Dave Chappelle\"},\n",
    "    {\"topic\": \"Comedy Culture\", \"joke\": \"The tough part of being a comedian and knowing the motherfucker is, everybody comes up to me like, 'Did you know? Did you know what Louis was doing?' No, bitch, I did not know.\", \"comedian\": \"Dave Chappelle\"},\n",
    "    {\"topic\": \"National Identity\", \"joke\": \"I could kill every white person in America at one time. You know how I'd do it? Just wait for the Super Bowl, and right when they sing the National Anthem, I'd have O.J. Simpson walk to the 50-yard line with them bad knees.\", \"comedian\": \"Dave Chappelle\"},\n",
    "    {\"topic\": \"Gender Relations\", \"joke\": \"I used to do shows for drug dealers that wanted to clean their money up. One time I did a real good set, and these motherfuckers called me into the back room. They gave me $25,000 in cash […] I jumped on the subway and started heading towards Brooklyn at one o'clock in the morning.\", \"comedian\": \"Dave Chappelle\"},\n",
    "    {\"topic\": \"Scottish Heritage\", \"joke\": \"Scottish-Americans tell you that if you want to identify tartans, it's easy – you simply look under the kilt, and if it's a quarter-pounder, you know it's a McDonald's.\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Judgement\", \"joke\": \"Before you judge a man, walk a mile in his shoes. After that who cares? He's a mile away and you've got his shoes!\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Weather\", \"joke\": \"I hate all those weathermen, too, who tell you that rain is bad weather. There's no such thing as bad weather, just the wrong clothing, so get yourself a sexy raincoat and live a little.\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Film Industry\", \"joke\": \"I'm a huge film star, but you have to hurry to the movies because I usually die in the first 15 f***ing minutes. I'm the only guy I know who died in a f***ing Muppet Movie.\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Appearance\", \"joke\": \"I always look skint. When I buy a Big Issue, people take it out of my hand and give me a pound.\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Sex Therapy\", \"joke\": \"One sex therapist claims that the most effective way to arouse your man is to spend 10 minutes licking his ears. Personally, I think its bollocks.\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Cinema\", \"joke\": \"When people say while watching a film 'did you see that? No tosser, I paid ten quid to come to the cinema and stare at the f***ing floor.\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Aeroplane Comfort\", \"joke\": \"I get claustrophobic easily and I don't get why aeroplane toilets don't f***ing have windows. I mean it's not as if anyone can f***ing see in. Unless of course you are the most determined pervert in the world.\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Astrology\", \"joke\": \"My star sign is Pyrex. I was a test-tube baby.\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Parenting\", \"joke\": \"Don't buy one of those baby intercoms. Babies pretend to be dead. They're bastards, and they do it on purpose.\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Common Sayings\", \"joke\": \"Why do people say 'Oh you want to have your cake and eat it too?' Dead right! What good is a cake if you can't eat it?\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Life Perception\", \"joke\": \"When people say 'life is short'. What the f***? Life is the longest damn thing anyone ever f***ing does! What can you do that's longer?\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Dating\", \"joke\": \"I like a woman with a head on her shoulders. I hate necks.\", \"comedian\": \"Steve Martin\"},\n",
    "    {\"topic\": \"Growing Up\", \"joke\": \"I have a lot of growing up to do. I realised that the other day inside my fort.\", \"comedian\": \"Zach Galifianakis\"},\n",
    "    {\"topic\": \"Employment\", \"joke\": \"I used to work at McDonald's making minimum wage. You know what that means when someone pays you minimum wage? You know what your boss was trying to say? 'Hey, if I could pay you less, I would, but it's against the law.'\", \"comedian\": \"Chris Rock\"},\n",
    "    {\"topic\": \"Love\", \"joke\": \"Love is like a fart. If you have to force it it's probably s***.\", \"comedian\": \"Stephen K. Amos\"},\n",
    "    {\"topic\": \"Convenience\", \"joke\": \"I like an escalator because an escalator can never break. It can only become stairs. There would never be an 'Escalator Temporarily Out of Order' sign, only 'Escalator Temporarily Stairs'.\", \"comedian\": \"Mitch Hedberg\"},\n",
    "    {\"topic\": \"Sports\", \"joke\": \"If I was an Olympic athlete, I'd rather come in last than win the silver medal. You win the gold, you feel good. You win the bronze, you think, 'at least I got something.' But you win that silver, that's like, 'Congratulations, you almost won! Of all the losers, you came in first! You're the number one loser! No one lost ahead of you!'\", \"comedian\": \"Jerry Seinfeld\"},\n",
    "    {\"topic\": \"Religion\", \"joke\": \"We weren't very religious. On Hanukkah, my mother had our menorah on a dimmer.\", \"comedian\": \"Richard Lewis\"},\n",
    "    {\"topic\": \"Beauty\", \"joke\": \"My girlfriend is absolutely beautiful. Body like a Greek statue – completely pale, no arms.\", \"comedian\": \"Phil Wang\"},\n",
    "    {\"topic\": \"Creation\", \"joke\": \"If God had written the Bible, the first line should have been 'It's round.'\", \"comedian\": \"Eddie Izzard\"},\n",
    "    {\"topic\": \"Self-Improvement\", \"joke\": \"I bought myself some glasses. My observational comedy improved.\", \"comedian\": \"Sara Pascoe\"},\n",
    "    {\"topic\": \"Politics\", \"joke\": \"Trump's nothing like Hitler. There's no way he could write a book.\", \"comedian\": \"Frankie Boyle\"},\n",
    "    {\"topic\": \"Social Class\", \"joke\": \"You know you're working class when your TV is bigger than your book case.\", \"comedian\": \"Rob Beckett\"},\n",
    "    {\"topic\": \"Conflict\", \"joke\": \"Most of my life is spent avoiding conflict. I hardly ever visit Syria.\", \"comedian\": \"Alex Horne\"},\n",
    "    {\"topic\": \"Relaxation\", \"joke\": \"A spa hotel? It's like a normal hotel, only in reception there's a picture of a pebble.\", \"comedian\": \"Rhod Gilbert\"},\n",
    "    {\"topic\": \"Health\", \"joke\": \"Life is like a box of chocolates. It doesn't last long if you're fat.\", \"comedian\": \"Joe Lycett\"},\n",
    "    {\"topic\": \"Career\", \"joke\": \"My Dad said, always leave them wanting more. Ironically, that's how he lost his job in disaster relief.\", \"comedian\": \"Mark Watson\"},\n",
    "    {\"topic\": \"Memory\", \"joke\": \"Apparently smoking cannabis can affect your short term memory. Well if that's true, what do you think smoking cannabis does?\", \"comedian\": \"Mickey P Kerr\"},\n",
    "    {\"topic\": \"Philosophy\", \"joke\": \"How many philosophers does it take to change a lightbulb?…. none. They're not really into that sort of thing. If it's that dark, light a candle.\", \"comedian\": \"Phil Cornwell\"},\n",
    "    {\"topic\": \"Marriage\", \"joke\": \"The first time I met my wife, I knew she was a keeper. She was wearing massive gloves.\", \"comedian\": \"Alun Cochrane\"},\n",
    "    {\"topic\": \"Childhood\", \"joke\": \"As a kid I was made to walk the plank. We couldn't afford a dog.\", \"comedian\": \"Gary Delaney\"},\n",
    "    {\"topic\": \"Misunderstanding\", \"joke\": \"Two fish in a tank. One says: 'How do you drive this thing?'\", \"comedian\": \"Peter Kay\"},\n",
    "    {\"topic\": \"Entertainment\", \"joke\": \"I saw a documentary on how ships are kept together. Riveting!\", \"comedian\": \"Stewart Francis\"},\n",
    "    {\"topic\": \"Music\", \"joke\": \"People who like trance music are very persistent. They don't techno for an answer.\", \"comedian\": \"Joel Dommett\"},\n",
    "    {\"topic\": \"Dating\", \"joke\": \"I used to go out with a giraffe. Used to take it to the pictures and that. You'd always get some bloke complaining that he couldn't see the screen. It's a giraffe, mate. What do you expect? 'Well he can take his hat off for a start!'\", \"comedian\": \"Paul Merton\"},\n",
    "    {\"topic\": \"Weather\", \"joke\": \"Normally you have news, weather and travel. But not on snow day. On a snow day, news is weather is travel.\", \"comedian\": \"Michael McIntyre\"},\n",
    "    {\"topic\": \"Music\", \"joke\": \"Here's a picture of me with REM. That's me in the corner.\", \"comedian\": \"Milton Jones\"},\n",
    "    {\"topic\": \"Sarcasm\", \"joke\": \"Someone showed me a photograph of my local MP the other day. 'Would you buy a second-hand car from this man?' they asked. 'Would you buy a second-hand car?' I replied.\", \"comedian\": \"Miles Jupp\"},\n",
    "    {\"topic\": \"Culture\", \"joke\": \"With stand-up in Britain, what you have to do is bloody swearing. In Germany, we don't have to swear. Reason being, things work.\", \"comedian\": \"Henning When\"},\n",
    "    {\"topic\": \"Learning\", \"joke\": \"I'm learning the hokey cokey. Not all of it. But – I've got the ins and outs.\", \"comedian\": \"Iain Stirling\"},\n",
    "    {\"topic\": \"Identity\", \"joke\": \"Roses are red, violets are blue, I'm a schizophrenic, and so am I.\", \"comedian\": \"Billy Connolly\"},\n",
    "    {\"topic\": \"Parenting\", \"joke\": \"My mother told me, you don't have to put anything in your mouth you don't want to. Then she made me eat broccoli, which felt like double standards.\", \"comedian\": \"Sarah Millican\"},\n",
    "    {\"topic\": \"Vengeance\", \"joke\": \"My therapist says I have a preoccupation with vengeance. We'll see about that.\", \"comedian\": \"Stewart Francis\"},\n",
    "    {\"topic\": \"Family\", \"joke\": \"I'm sure wherever my Dad is, he's looking down on us. He's not dead, just very condescending.\", \"comedian\": \"Jack Whitehall\"},\n",
    "    {\"topic\": \"Marriage\", \"joke\": \"'What's a couple?' I asked my mum. She said, 'Two or three'. Which probably explains why her marriage collapsed.\", \"comedian\": \"Josie Long\"},\n",
    "    {\"topic\": \"Injury\", \"joke\": \"The easiest time to add insult to injury is when you're signing somebody's cast.\", \"comedian\": \"Demetri Martin\"},\n",
    "    {\"topic\": \"Communication\", \"joke\": \"I was in my car driving back from work. A police officer pulled me over and knocked on my window. I said, 'One minute I'm on the phone.'\", \"comedian\": \"Alan Carr\"},\n",
    "    {\"topic\": \"Afterlife\", \"joke\": \"I doubt there's a heaven; I think the people from hell have probably bought it for a timeshare.\", \"comedian\": \"Victoria Wood\"},\n",
    "    {\"topic\": \"Flexibility\", \"joke\": \"I said to the gym instructor: 'Can you teach me to do the splits?' He said: 'How flexible are you?' I said: 'I can't make Tuesdays.'\", \"comedian\": \"Tommy Cooper\"},\n",
    "    {\"topic\": \"Misunderstanding\", \"joke\": \"A man walks into a chemist's and says, 'Can I have a bar of soap, please?' The chemist says, 'Do you want it scented?' And the man says, 'No, I'll take it with me now.'\", \"comedian\": \"Ronnie Barker\"},\n",
    "    {\"topic\": \"Humor\", \"joke\": \"It's really hard to define 'virtue signalling', as I was saying the other day to some of my Muslim friends over a fair-trade coffee in our local feminist bookshop.\", \"comedian\": \"Lucy Porter\"},\n",
    "    {\"topic\": \"Creation\", \"joke\": \"If we were truly created by God, then why do we still occasionally bite the insides of our own mouths?\", \"comedian\": \"Dara Ó Briain\"},\n",
    "    {\"topic\": \"Insurance\", \"joke\": \"Do Transformers get car, or life insurance?\", \"comedian\": \"Russell Howard\"},\n",
    "    {\"topic\": \"Emergency\", \"joke\": \"Alright lads, a giant fly is attacking the police station. I've called the SWAT team!\", \"comedian\": \"Greg Davies\"},\n",
    "    {\"topic\": \"Consumerism\", \"joke\": \"A good rule to remember for life is that when it comes to plastic surgery and sushi, never be attracted by a bargain.\", \"comedian\": \"Graham Norton\"},\n",
    "    {\"topic\": \"Family\", \"joke\": \"My father drank so heavily, when he blew on the birthday cake he lit the candles.\", \"comedian\": \"Les Dawson\"},\n",
    "    {\"topic\": \"Therapy\", \"joke\": \"I've been feeling suicidal so my therapist suggested I do CBT. Now I can ride a motorbike, how's that going to help?\", \"comedian\": \"Eric Lampaert\"},\n",
    "]\n",
    "\n",
    "# Dataset of generic, unfunny jokes (labeled as not funny)\n",
    "unfunny_jokes = [\n",
    "    {\"topic\": \"Science\", \"joke\": \"Why don't scientists trust atoms? Because they make up everything.\"},\n",
    "    {\"topic\": \"Field\", \"joke\": \"Why did the scarecrow win an award? Because he was outstanding in his field.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"Why do cows have hooves instead of feet? Because they lactose.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"What do you call fake spaghetti? An impasta.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"How does a penguin build its house? Igloos it together.\"},\n",
    "    {\"topic\": \"Halloween\", \"joke\": \"What do you get when you cross a snowman and a vampire? Frostbite.\"},\n",
    "    {\"topic\": \"Books\", \"joke\": \"Why was the math book sad? It had too many problems.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"What do you call cheese that isn't yours? Nacho cheese.\"},\n",
    "    {\"topic\": \"Skeletons\", \"joke\": \"Why don't skeletons fight each other? They don't have the guts.\"},\n",
    "    {\"topic\": \"Walls\", \"joke\": \"What did one wall say to the other wall? I'll meet you at the corner.\"},\n",
    "    {\"topic\": \"Transportation\", \"joke\": \"Why did the bicycle fall over? It was two-tired.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"What do you call a bear with no teeth? A gummy bear.\"},\n",
    "    {\"topic\": \"Gym\", \"joke\": \"Why don't some couples go to the gym? Because some relationships don't work out.\"},\n",
    "    {\"topic\": \"Factories\", \"joke\": \"What do you call a factory that makes good products? A satisfactory.\"},\n",
    "    {\"topic\": \"Golf\", \"joke\": \"Why did the golfer bring an extra pair of pants? In case he got a hole in one.\"},\n",
    "    {\"topic\": \"Cleaning\", \"joke\": \"What did the janitor say when he jumped out of the closet? Supplies!\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"What do you call a fish with no eyes? Fsh.\"},\n",
    "    {\"topic\": \"Charity\", \"joke\": \"Why don't oysters donate to charity? Because they are shellfish.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"What did the grape do when it got stepped on? Nothing but let out a little wine.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"Why was the big cat disqualified from the race? Because it was a cheetah.\"},\n",
    "    {\"topic\": \"Fashion\", \"joke\": \"What do you call a belt made of watches? A waist of time.\"},\n",
    "    {\"topic\": \"Body\", \"joke\": \"Why can't your nose be 12 inches long? Because then it would be a foot.\"},\n",
    "    {\"topic\": \"Sports\", \"joke\": \"Why don't some fish play basketball? Because they are afraid of the net.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"What do you call a pile of cats? A meowtain.\"},\n",
    "    {\"topic\": \"Coffee\", \"joke\": \"Why did the coffee file a police report? It got mugged.\"},\n",
    "    {\"topic\": \"Weather\", \"joke\": \"Why did the stadium get hot after the game? All the fans left.\"},\n",
    "    {\"topic\": \"Plates\", \"joke\": \"What did one plate say to the other plate? Lunch is on me.\"},\n",
    "    {\"topic\": \"Space\", \"joke\": \"How do you organize a space party? You planet.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"Why don't eggs tell jokes? They'd crack each other up.\"},\n",
    "    {\"topic\": \"Halloween\", \"joke\": \"How does a vampire start a letter? Tomb it may concern.\"},\n",
    "    {\"topic\": \"Technology\", \"joke\": \"Why did the computer go to the doctor? It had a virus.\"},\n",
    "    {\"topic\": \"Boomerangs\", \"joke\": \"What do you call a boomerang that doesn't come back? A stick.\"},\n",
    "    {\"topic\": \"Ghosts\", \"joke\": \"Why are ghosts bad at lying? Because you can see right through them.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"What do you get when you cross a sheep and a kangaroo? A woolly jumper.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"Why did the tomato turn red? Because it saw the salad dressing.\"},\n",
    "    {\"topic\": \"School\", \"joke\": \"Why did the math teacher take off points? Because the student's answer was too square.\"},\n",
    "    {\"topic\": \"Birds\", \"joke\": \"Why do seagulls fly over the ocean? Because if they flew over the bay, they'd be bagels.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"Why was the baby strawberry crying? Because its parents were in a jam.\"},\n",
    "    {\"topic\": \"Technology\", \"joke\": \"What do you call a droid that takes the long way around? R2 detour.\"},\n",
    "    {\"topic\": \"Fashion\", \"joke\": \"Why did the scarecrow get promoted? He was outstanding in his field.\"},\n",
    "    {\"topic\": \"Fashion\", \"joke\": \"What did one hat say to the other hat? You stay here, I'll go on ahead.\"},\n",
    "    {\"topic\": \"Fashion\", \"joke\": \"Why was the belt arrested? It held up a pair of pants.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"What do you call an alligator in a vest? An investigator.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"Why don't you see elephants hiding in trees? Because they're so good at it.\"},\n",
    "    {\"topic\": \"Books\", \"joke\": \"Why did the math book look sad? Because it had too many problems.\"},\n",
    "    {\"topic\": \"Bees\", \"joke\": \"Why do bees have sticky hair? Because they use honeycombs.\"},\n",
    "    {\"topic\": \"Music\", \"joke\": \"Why did the chicken join a band? Because it had the drumsticks.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"How do you catch a squirrel? Climb a tree and act like a nut.\"},\n",
    "    {\"topic\": \"Technology\", \"joke\": \"Why was the computer cold? It left its Windows open.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"What do you call a magic dog? A labracadabrador.\"},\n",
    "    {\"topic\": \"Sports\", \"joke\": \"Why don't some fish play basketball? Because they're afraid of the net.\"},\n",
    "    {\"topic\": \"Oceans\", \"joke\": \"What did one ocean say to the other ocean? Nothing, they just waved.\"},\n",
    "    {\"topic\": \"Dogs\", \"joke\": \"Why did the cowboy get a dachshund? Because he wanted to get a long little doggie.\"},\n",
    "    {\"topic\": \"Snowmen\", \"joke\": \"What do you call a snowman with a six-pack? An abdominal snowman.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"Why did the tomato turn red? Because it saw the salad dressing.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"How does a penguin build its house? Igloos it together.\"},\n",
    "    {\"topic\": \"Golf\", \"joke\": \"Why did the golfer bring extra pants? In case he got a hole in one.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"What do you call an alligator in a vest? An investigator.\"},\n",
    "    {\"topic\": \"Fashion\", \"joke\": \"Why do cows wear bells? Because their horns don't work.\"},\n",
    "    {\"topic\": \"Field\", \"joke\": \"Why did the scarecrow become a successful neurosurgeon? Because he was outstanding in his field.\"},\n",
    "    {\"topic\": \"Cleaning\", \"joke\": \"What did the janitor say when he jumped out of the closet? Supplies!\"},\n",
    "    {\"topic\": \"Science\", \"joke\": \"Why don't scientists trust atoms? Because they make up everything.\"},\n",
    "    {\"topic\": \"Skeletons\", \"joke\": \"Why did the skeleton go to the party alone? He had no body to go with him.\"},\n",
    "    {\"topic\": \"Transportation\", \"joke\": \"Why did the bicycle fall over? It was two-tired.\"},\n",
    "    {\"topic\": \"Technology\", \"joke\": \"Why did the computer go to the doctor? It had a virus.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"What did the grape do when it got stepped on? Nothing but let out a little wine.\"},\n",
    "    {\"topic\": \"Ghosts\", \"joke\": \"Why do ghosts like elevators? Because it lifts their spirits.\"},\n",
    "    {\"topic\": \"Science\", \"joke\": \"Why can't you trust an atom? Because they make up everything.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"What do you call fake spaghetti? An impasta.\"},\n",
    "    {\"topic\": \"Cleaning\", \"joke\": \"How do you make a tissue dance? Put a little boogie in it.\"},\n",
    "    {\"topic\": \"Charity\", \"joke\": \"Why don't oysters donate to charity? Because they are shellfish.\"},\n",
    "    {\"topic\": \"Boomerangs\", \"joke\": \"What do you call a boomerang that doesn't come back? A stick.\"},\n",
    "    {\"topic\": \"Books\", \"joke\": \"Why did the math book look sad? Because it had too many problems.\"},\n",
    "    {\"topic\": \"Skeletons\", \"joke\": \"Why don't skeletons fight each other? They don't have the guts.\"},\n",
    "    {\"topic\": \"Walls\", \"joke\": \"What did one wall say to the other wall? I'll meet you at the corner.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"What do you call a bear with no teeth? A gummy bear.\"},\n",
    "    {\"topic\": \"Plates\", \"joke\": \"What did one plate say to the other plate? Lunch is on me.\"},\n",
    "    {\"topic\": \"Space\", \"joke\": \"How do you organize a space party? You planet.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"Why don't eggs tell jokes? They'd crack each other up.\"},\n",
    "    {\"topic\": \"Halloween\", \"joke\": \"How does a vampire start a letter? Tomb it may concern.\"},\n",
    "    {\"topic\": \"Coffee\", \"joke\": \"Why did the coffee file a police report? It got mugged.\"},\n",
    "    {\"topic\": \"Golf\", \"joke\": \"Why did the golfer bring an extra pair of pants? In case he got a hole in one.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"What do you call a fish with no eyes? Fsh.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"Why did the tomato turn red? Because it saw the salad dressing.\"},\n",
    "    {\"topic\": \"Birds\", \"joke\": \"Why don't seagulls fly over the bay? Because then they'd be bagels.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"Why do cows have hooves instead of feet? Because they lactose.\"},\n",
    "    {\"topic\": \"Sports\", \"joke\": \"Why don't some fish play basketball? Because they're afraid of the net.\"},\n",
    "    {\"topic\": \"Field\", \"joke\": \"Why did the scarecrow win an award? Because he was outstanding in his field.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"What do you call cheese that isn't yours? Nacho cheese.\"},\n",
    "    {\"topic\": \"Transportation\", \"joke\": \"Why did the bicycle fall over? It was two-tired.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"How does a penguin build its house? Igloos it together.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"What do you call a pile of cats? A meowtain.\"},\n",
    "    {\"topic\": \"Fashion\", \"joke\": \"What did one hat say to the other hat? You stay here, I'll go on ahead.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"What do you call an alligator in a vest? An investigator.\"},\n",
    "    {\"topic\": \"Charity\", \"joke\": \"Why don't oysters donate to charity? Because they are shellfish.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"What did the grape do when it got stepped on? Nothing but let out a little wine.\"},\n",
    "    {\"topic\": \"Golf\", \"joke\": \"Why did the golfer bring an extra pair of pants? In case he got a hole in one.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"Why was the baby strawberry crying? Because its parents were in a jam.\"},\n",
    "    {\"topic\": \"Factories\", \"joke\": \"What do you call a factory that makes good products? A satisfactory.\"},\n",
    "    {\"topic\": \"Skeletons\", \"joke\": \"Why don't skeletons fight each other? They don't have the guts.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"What do you call a fish with no eyes? Fsh.\"},\n",
    "    {\"topic\": \"Gym\", \"joke\": \"Why don't some couples go to the gym? Because some relationships don't work out.\"},\n",
    "    {\"topic\": \"Field\", \"joke\": \"Why did the scarecrow win an award? Because he was outstanding in his field.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"What do you call fake spaghetti? An impasta.\"},\n",
    "    {\"topic\": \"Halloween\", \"joke\": \"How does a vampire start a letter? Tomb it may concern.\"},\n",
    "    {\"topic\": \"Technology\", \"joke\": \"Why did the computer go to the doctor? It had a virus.\"},\n",
    "    {\"topic\": \"Boomerangs\", \"joke\": \"What do you call a boomerang that doesn't come back? A stick.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"Why did the tomato turn red? Because it saw the salad dressing.\"},\n",
    "    {\"topic\": \"Birds\", \"joke\": \"Why do seagulls fly over the ocean? Because if they flew over the bay, they'd be bagels.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"Why was the baby strawberry crying? Because its parents were in a jam.\"},\n",
    "    {\"topic\": \"Technology\", \"joke\": \"What do you call a droid that takes the long way around? R2 detour.\"},\n",
    "    {\"topic\": \"Fashion\", \"joke\": \"Why did the scarecrow get promoted? He was outstanding in his field.\"},\n",
    "    {\"topic\": \"Fashion\", \"joke\": \"What did one hat say to the other hat? You stay here, I'll go on ahead.\"},\n",
    "    {\"topic\": \"Fashion\", \"joke\": \"Why was the belt arrested? It held up a pair of pants.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"What do you call an alligator in a vest? An investigator.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"Why don't you see elephants hiding in trees? Because they're so good at it.\"},\n",
    "    {\"topic\": \"Books\", \"joke\": \"Why did the math book look sad? Because it had too many problems.\"},\n",
    "    {\"topic\": \"Bees\", \"joke\": \"Why do bees have sticky hair? Because they use honeycombs.\"},\n",
    "    {\"topic\": \"Music\", \"joke\": \"Why did the chicken join a band? Because it had the drumsticks.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"How do you catch a squirrel? Climb a tree and act like a nut.\"},\n",
    "    {\"topic\": \"Technology\", \"joke\": \"Why was the computer cold? It left its Windows open.\"},\n",
    "    {\"topic\": \"Animals\", \"joke\": \"What do you call a magic dog? A labracadabrador.\"},\n",
    "    {\"topic\": \"Sports\", \"joke\": \"Why don't some fish play basketball? Because they're afraid of the net.\"},\n",
    "    {\"topic\": \"Oceans\", \"joke\": \"What did one ocean say to the other ocean? Nothing, they just waved.\"},\n",
    "    {\"topic\": \"Dogs\", \"joke\": \"Why did the cowboy get a dachshund? Because he wanted to get a long little doggie.\"},\n",
    "    {\"topic\": \"Snowmen\", \"joke\": \"What do you call a snowman with a six-pack? An abdominal snowman.\"},\n",
    "    {\"topic\": \"Food\", \"joke\": \"Why did the tomato turn red? Because it saw the salad dressing.\"}\n",
    "]\n",
    "\n",
    "# Convert to DSPy format\n",
    "dataset = []\n",
    "\n",
    "# Process funny jokes\n",
    "for row in funny_jokes:\n",
    "    topic, joke = row[\"topic\"], row[\"joke\"]\n",
    "    # Create DSPy Example with labels\n",
    "    dataset.append(dspy.Example(topic=topic, joke=joke, funny=True).with_inputs(\"topic\", \"joke\"))\n",
    "\n",
    "# Process unfunny jokes  \n",
    "for row in unfunny_jokes:\n",
    "    topic, joke = row[\"topic\"], row[\"joke\"]\n",
    "    dataset.append(dspy.Example(topic=topic, joke=joke, funny=False).with_inputs(\"topic\", \"joke\"))\n",
    "\n",
    "# Shuffle the dataset\n",
    "random.shuffle(dataset)\n",
    "\n",
    "# Split into 60% training, 20% validation, 20% dev\n",
    "num_items = len(dataset)\n",
    "train_index = int(0.6 * num_items)\n",
    "val_index = int(0.8 * num_items)\n",
    "\n",
    "trainset = dataset[:train_index]\n",
    "valset = dataset[train_index:val_index]\n",
    "devset = dataset[val_index:]\n",
    "\n",
    "print(f\"Training set size: {len(trainset)}\")\n",
    "print(f\"Validation set size: {len(valset)}\")\n",
    "print(f\"Development set size: {len(devset)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "l9xs66gl0ar",
   "metadata": {},
   "source": [
    "### Dataset Created Successfully!\n",
    "\n",
    "We've built a dataset of 254 jokes:\n",
    "- 127 professional comedian jokes labeled as funny\n",
    "- 127 generic dad jokes labeled as not funny\n",
    "- Split into 152 training examples (training data) and 51 validation examples (for testing), 51 development examples (holdout group)\n",
    "\n",
    "The `with_inputs()` method tells DSPy which fields are inputs vs outputs."
   ]
  },
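  {
   "cell_type": "markdown",
   "id": "split-sizes-sketch",
   "metadata": {},
   "source": [
    "The split sizes come from truncating the 60% and 80% cut points with `int()`. A minimal sketch of that arithmetic in plain Python follows; the helper name `split_sizes` is hypothetical and is not part of the notebook or of DSPy.\n",
    "\n",
    "```python\n",
    "def split_sizes(num_items, train_end=0.6, val_end=0.8):\n",
    "    # Truncate the cut points with int(), as the split cell does\n",
    "    train_index = int(train_end * num_items)\n",
    "    val_index = int(val_end * num_items)\n",
    "    # Sizes of the train, validation, and development slices\n",
    "    return train_index, val_index - train_index, num_items - val_index\n",
    "\n",
    "print(split_sizes(254))  # (152, 51, 51)\n",
    "```\n",
    "\n",
    "Because the development slice takes whatever remains, the three sizes always add up to the dataset size."
   ]
  },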
  {
   "cell_type": "markdown",
   "id": "35booaybsay",
   "metadata": {},
   "source": [
    "### Examining Our Data\n",
    "\n",
    "Let's look at an example from our training set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "109b0a00",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Example({'topic': 'Books', 'joke': 'Why did the math book look sad? Because it had too many problems.', 'funny': False}) (input_keys={'topic', 'joke'})\n",
      "\n",
      "Topic: Books\n",
      "Joke: Why did the math book look sad? Because it had too many problems.\n",
      "Funny: False\n"
     ]
    }
   ],
   "source": [
    "# Look at an example from our training set\n",
    "print(trainset[0])\n",
    "print(f\"\\nTopic: {trainset[0].topic}\")\n",
    "print(f\"Joke: {trainset[0].joke}\")\n",
    "print(f\"Funny: {trainset[0].funny}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1kucagcrg8",
   "metadata": {},
   "source": [
    "## 9. Creating a Joke Judge\n",
    "\n",
    "Let's create a program that can judge whether a joke is funny or not:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "bbd9b957",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Judge's reasoning: The joke plays on the double meaning of the word \"problems.\" In the context of a math book, \"problems\" refers to mathematical exercises. However, \"problems\" can also refer to difficulties or sources of sadness. The joke is funny because it personifies the math book and attributes its sadness to the abundance of mathematical problems it contains.\n",
      "\n",
      "Judge says funny: True\n",
      "Ground truth: False\n"
     ]
    }
   ],
   "source": [
    "# Create a joke judge with Chain of Thought reasoning\n",
    "# Input: topic and joke, Output: funny (boolean)\n",
    "joke_judge = dspy.ChainOfThought('topic, joke -> funny: bool')\n",
    "\n",
    "# Test on our first training example\n",
    "result = joke_judge(topic=trainset[0].topic, joke=trainset[0].joke)\n",
    "print(f\"Judge's reasoning: {result.reasoning}\")\n",
    "print(f\"\\nJudge says funny: {result.funny}\")\n",
    "print(f\"Ground truth: {trainset[0].funny}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10b5202a",
   "metadata": {},
   "source": [
    "Using an inline DSPy signature, we pass in the topic and the joke and get back a boolean verdict on whether the joke is funny. This judge will serve as the evaluation metric for our joke generator, an approach known as LLM-as-a-Judge."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "kvfu4c2lz2c",
   "metadata": {},
   "source": [
    "## 10. Evaluating Our Judge\n",
    "\n",
    "Now let's evaluate how well our judge performs on the held-out dev set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "5f742e4f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 26.00 / 51 (51.0%): 100%|██████████| 51/51 [00:00<00:00, 4329.71it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:39:29 INFO dspy.evaluate.evaluate: Average Metric: 26 / 51 (51.0%)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>topic</th>\n",
       "      <th>joke</th>\n",
       "      <th>example_funny</th>\n",
       "      <th>reasoning</th>\n",
       "      <th>pred_funny</th>\n",
       "      <th>exact_match</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Field</td>\n",
       "      <td>Why did the scarecrow become a successful neurosurgeon? Because he...</td>\n",
       "      <td>False</td>\n",
       "      <td>The joke plays on the double meaning of the word \"field.\" It refer...</td>\n",
       "      <td>True</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Ghosts</td>\n",
       "      <td>Why are ghosts bad at lying? Because you can see right through them.</td>\n",
       "      <td>False</td>\n",
       "      <td>The joke plays on the literal transparency of ghosts and the figur...</td>\n",
       "      <td>True</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Marriage</td>\n",
       "      <td>The first time I met my wife, I knew she was a keeper. She was wea...</td>\n",
       "      <td>True</td>\n",
       "      <td>The joke is funny because it sets up an expectation of a romantic ...</td>\n",
       "      <td>True</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Wealth</td>\n",
       "      <td>My wealth and happiness would suggest that God definitely does lov...</td>\n",
       "      <td>True</td>\n",
       "      <td>The joke plays on the common saying that wealth and happiness are ...</td>\n",
       "      <td>True</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Social Media</td>\n",
       "      <td>Following someone on Twitter and asking them to tweet about someth...</td>\n",
       "      <td>True</td>\n",
       "      <td>The joke uses an analogy to highlight the absurdity of asking some...</td>\n",
       "      <td>True</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          topic  \\\n",
       "0         Field   \n",
       "1        Ghosts   \n",
       "2      Marriage   \n",
       "3        Wealth   \n",
       "4  Social Media   \n",
       "\n",
       "                                                                    joke  \\\n",
       "0  Why did the scarecrow become a successful neurosurgeon? Because he...   \n",
       "1   Why are ghosts bad at lying? Because you can see right through them.   \n",
       "2  The first time I met my wife, I knew she was a keeper. She was wea...   \n",
       "3  My wealth and happiness would suggest that God definitely does lov...   \n",
       "4  Following someone on Twitter and asking them to tweet about someth...   \n",
       "\n",
       "   example_funny  \\\n",
       "0          False   \n",
       "1          False   \n",
       "2           True   \n",
       "3           True   \n",
       "4           True   \n",
       "\n",
       "                                                               reasoning  \\\n",
       "0  The joke plays on the double meaning of the word \"field.\" It refer...   \n",
       "1  The joke plays on the literal transparency of ghosts and the figur...   \n",
       "2  The joke is funny because it sets up an expectation of a romantic ...   \n",
       "3  The joke plays on the common saying that wealth and happiness are ...   \n",
       "4  The joke uses an analogy to highlight the absurdity of asking some...   \n",
       "\n",
       "   pred_funny exact_match  \n",
       "0        True              \n",
       "1        True              \n",
       "2        True   ✔️ [True]  \n",
       "3        True   ✔️ [True]  \n",
       "4        True   ✔️ [True]  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "            <div style='\n",
       "                text-align: center;\n",
       "                font-size: 16px;\n",
       "                font-weight: bold;\n",
       "                color: #555;\n",
       "                margin: 10px 0;'>\n",
       "                ... 46 more rows not displayed ...\n",
       "            </div>\n",
       "            "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Basic judge accuracy: 50.98%\n"
     ]
    }
   ],
   "source": [
    "# Import evaluation tools\n",
    "from dspy.evaluate import Evaluate\n",
    "\n",
    "# Define our evaluation metric (DSPy calls it as metric(example, prediction, trace))\n",
    "def exact_match(example, pred, trace=None):\n",
    "    \"\"\"Check if the predicted 'funny' label matches the ground truth\"\"\"\n",
    "    return pred.funny == example.funny\n",
    "\n",
    "# Create an evaluator\n",
    "evaluate = Evaluate(\n",
    "    metric=exact_match, \n",
    "    devset=devset, # the optimized judge hasn't seen this data yet\n",
    "    num_threads=8,  # Run evaluations in parallel\n",
    "    display_progress=True,\n",
    "    display_table=5  # Show first 5 results\n",
    ")\n",
    "\n",
    "# Evaluate our basic judge\n",
    "basic_judge_score = evaluate(joke_judge)\n",
    "print(f\"\\nBasic judge accuracy: {basic_judge_score}%\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "unftaio46po",
   "metadata": {},
   "source": [
    "### Basic Judge Performance: 51% Accuracy\n",
    "\n",
    "The unoptimized judge gets 26 of 51 right, which is no better than a coin flip! Looking at the results:\n",
    "- It labels nearly every joke as funny (even the dad jokes)\n",
    "- It correctly identifies the funny professional jokes\n",
    "- But it also rates simple puns as funny when they're labeled as not funny\n",
    "\n",
    "This shows why we need optimization: the basic judge is too generous!"
   ]
  },
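  {
   "cell_type": "markdown",
   "id": "judgebias01",
   "metadata": {},
   "source": [
    "One way to check the 'too generous' claim is to split the judge's errors by type. The sketch below is plain Python and does not call DSPy. The `labels` and `preds` lists are made-up illustrative values, not results from this notebook, and `confusion_counts` is a hypothetical helper.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def confusion_counts(labels, preds):\n",
    "    '''Count (label, prediction) pairs for a binary funny / not-funny judge.'''\n",
    "    return Counter(zip(labels, preds))\n",
    "\n",
    "\n",
    "# Made-up data: a judge that answers 'funny' for every joke.\n",
    "labels = [True, False, True, False, False, True]\n",
    "preds = [True] * len(labels)\n",
    "\n",
    "counts = confusion_counts(labels, preds)\n",
    "false_positives = counts[(False, True)]  # unfunny jokes judged funny\n",
    "false_negatives = counts[(True, False)]  # funny jokes judged unfunny\n",
    "accuracy = (counts[(True, True)] + counts[(False, False)]) / len(labels)\n",
    "\n",
    "print(f'false positives: {false_positives}, false negatives: {false_negatives}')\n",
    "print(f'accuracy: {accuracy:.2f}')\n",
    "```\n",
    "\n",
    "A judge that calls everything funny scores exactly the share of funny jokes in the data, with all of its errors being false positives. That is the pattern to look for in the full results table."
   ]
  },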
  {
   "cell_type": "markdown",
   "id": "5odcl0oig5",
   "metadata": {},
   "source": [
    "## 11. Optimizing with Bootstrap Few-Shot\n",
    "\n",
    "DSPy can automatically optimize a program by bootstrapping few-shot demonstrations for its prompt:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "id": "91c2ced1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimizing judge with Bootstrap Few-Shot...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  3%|▎         | 4/152 [00:00<00:00, 701.86it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.\n",
      "\n",
      "Optimized judge says funny: False\n",
      "Ground truth: False\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "# Import the Bootstrap optimizer\n",
    "from dspy.teleprompt import BootstrapFewShot\n",
    "\n",
    "# Create optimizer\n",
    "bootstrap_optimizer = BootstrapFewShot(metric=exact_match)\n",
    "\n",
    "# Compile (optimize) our judge with the training data\n",
    "print(\"Optimizing judge with Bootstrap Few-Shot...\")\n",
    "bootstrap_optimized_judge = bootstrap_optimizer.compile(\n",
    "    joke_judge, \n",
    "    trainset=trainset\n",
    ")\n",
    "\n",
    "# Test on the same example\n",
    "result = bootstrap_optimized_judge(topic=trainset[0].topic, joke=trainset[0].joke)\n",
    "print(f\"\\nOptimized judge says funny: {result.funny}\")\n",
    "print(f\"Ground truth: {trainset[0].funny}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "v6oeg9s5h7",
   "metadata": {},
   "source": [
    "### Bootstrap Optimization Complete!\n",
    "\n",
    "The optimizer:\n",
    "- Bootstrapped 4 demonstrations from the first 4 training examples it tried\n",
    "- These examples will be used as few-shot demonstrations\n",
    "- The optimized judge now correctly labels our test joke as not funny, which the basic judge got wrong\n",
    "\n",
    "Bootstrap works by running the base program on training examples, keeping the ones where its output passes the metric, and using those as demonstrations."
   ]
  },
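  {
   "cell_type": "markdown",
   "id": "bootsketch01",
   "metadata": {},
   "source": [
    "The selection step described above can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not DSPy's implementation: `bootstrap_demos`, `always_funny`, `label_matches` and `toy_trainset` are all hypothetical stand-ins, and the real optimizer also records full traces of the program's intermediate steps.\n",
    "\n",
    "```python\n",
    "def bootstrap_demos(program, trainset, metric, max_demos=4):\n",
    "    '''Keep examples on which the program's own output passes the metric.'''\n",
    "    demos = []\n",
    "    for example in trainset:\n",
    "        prediction = program(example['joke'])\n",
    "        if metric(example, prediction):\n",
    "            # Store the input together with the program's successful output.\n",
    "            demos.append({**example, 'predicted': prediction})\n",
    "        if len(demos) >= max_demos:\n",
    "            break\n",
    "    return demos\n",
    "\n",
    "\n",
    "# Toy stand-in for the unoptimized judge: it calls every joke funny.\n",
    "def always_funny(joke):\n",
    "    return True\n",
    "\n",
    "\n",
    "def label_matches(example, prediction):\n",
    "    return example['funny'] == prediction\n",
    "\n",
    "\n",
    "toy_trainset = [\n",
    "    {'joke': 'pun A', 'funny': False},\n",
    "    {'joke': 'story B', 'funny': True},\n",
    "    {'joke': 'pun C', 'funny': False},\n",
    "    {'joke': 'story D', 'funny': True},\n",
    "]\n",
    "\n",
    "demos = bootstrap_demos(always_funny, toy_trainset, label_matches)\n",
    "print([d['joke'] for d in demos])\n",
    "```\n",
    "\n",
    "Only the examples the toy program already gets right survive as demonstrations, so the quality of the bootstrapped set depends on both the base program and the metric."
   ]
  },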
  {
   "cell_type": "markdown",
   "id": "k7a8m59tszd",
   "metadata": {},
   "source": [
    "### Evaluating the Optimized Judge"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "9cb24f19",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 47.00 / 51 (92.2%): 100%|██████████| 51/51 [00:00<00:00, 4264.63it/s] "
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:45:48 INFO dspy.evaluate.evaluate: Average Metric: 47 / 51 (92.2%)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>topic</th>\n",
       "      <th>joke</th>\n",
       "      <th>example_funny</th>\n",
       "      <th>reasoning</th>\n",
       "      <th>pred_funny</th>\n",
       "      <th>exact_match</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Field</td>\n",
       "      <td>Why did the scarecrow become a successful neurosurgeon? Because he...</td>\n",
       "      <td>False</td>\n",
       "      <td>The joke is a pun based on the double meaning of the word \"field.\"...</td>\n",
       "      <td>True</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Ghosts</td>\n",
       "      <td>Why are ghosts bad at lying? Because you can see right through them.</td>\n",
       "      <td>False</td>\n",
       "      <td>The joke is a pun based on the literal transparency of ghosts and ...</td>\n",
       "      <td>False</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Marriage</td>\n",
       "      <td>The first time I met my wife, I knew she was a keeper. She was wea...</td>\n",
       "      <td>True</td>\n",
       "      <td>The joke is funny because it sets up an expectation of a romantic ...</td>\n",
       "      <td>True</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Wealth</td>\n",
       "      <td>My wealth and happiness would suggest that God definitely does lov...</td>\n",
       "      <td>True</td>\n",
       "      <td>The joke is based on irony and a contradiction. The speaker claims...</td>\n",
       "      <td>True</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Social Media</td>\n",
       "      <td>Following someone on Twitter and asking them to tweet about someth...</td>\n",
       "      <td>True</td>\n",
       "      <td>The joke uses an analogy to highlight the absurdity of trying to c...</td>\n",
       "      <td>True</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          topic  \\\n",
       "0         Field   \n",
       "1        Ghosts   \n",
       "2      Marriage   \n",
       "3        Wealth   \n",
       "4  Social Media   \n",
       "\n",
       "                                                                    joke  \\\n",
       "0  Why did the scarecrow become a successful neurosurgeon? Because he...   \n",
       "1   Why are ghosts bad at lying? Because you can see right through them.   \n",
       "2  The first time I met my wife, I knew she was a keeper. She was wea...   \n",
       "3  My wealth and happiness would suggest that God definitely does lov...   \n",
       "4  Following someone on Twitter and asking them to tweet about someth...   \n",
       "\n",
       "   example_funny  \\\n",
       "0          False   \n",
       "1          False   \n",
       "2           True   \n",
       "3           True   \n",
       "4           True   \n",
       "\n",
       "                                                               reasoning  \\\n",
       "0  The joke is a pun based on the double meaning of the word \"field.\"...   \n",
       "1  The joke is a pun based on the literal transparency of ghosts and ...   \n",
       "2  The joke is funny because it sets up an expectation of a romantic ...   \n",
       "3  The joke is based on irony and a contradiction. The speaker claims...   \n",
       "4  The joke uses an analogy to highlight the absurdity of trying to c...   \n",
       "\n",
       "   pred_funny exact_match  \n",
       "0        True              \n",
       "1       False   ✔️ [True]  \n",
       "2        True   ✔️ [True]  \n",
       "3        True   ✔️ [True]  \n",
       "4        True   ✔️ [True]  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "            <div style='\n",
       "                text-align: center;\n",
       "                font-size: 16px;\n",
       "                font-weight: bold;\n",
       "                color: #555;\n",
       "                margin: 10px 0;'>\n",
       "                ... 46 more rows not displayed ...\n",
       "            </div>\n",
       "            "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Bootstrap optimized judge accuracy: 92.16%\n",
      "Improvement: 80.8%\n"
     ]
    }
   ],
   "source": [
    "# Evaluate the optimized judge\n",
    "bootstrap_judge_score = evaluate(bootstrap_optimized_judge)\n",
    "print(f\"\\nBootstrap optimized judge accuracy: {bootstrap_judge_score}%\")\n",
    "# Relative improvement over the basic judge, formatted as a percentage\n",
    "print(f\"Improvement: {(bootstrap_judge_score - basic_judge_score) / basic_judge_score:.1%}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d1td8vzqv5",
   "metadata": {},
   "source": [
    "### Bootstrap Optimization Improved Accuracy to 92%!\n",
    "\n",
    "The optimized judge shows significant improvement:\n",
    "- From 51% (no better than chance) to 92% accuracy\n",
    "- A relative improvement of about 81% (41 percentage points) from just adding few-shot examples\n",
    "- The judge still misjudges some puns (the scarecrow joke in row 0, for example), but it's usable now\n",
    "\n",
    "The few-shot examples help the model distinguish professional jokes from dad jokes. The judge now agrees with our human labels often enough to be used as a metric."
   ]
  },
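  {
   "cell_type": "markdown",
   "id": "gainmath01",
   "metadata": {},
   "source": [
    "The two ways of stating the gain are easy to mix up, so here is the arithmetic spelled out. The counts are the ones reported by the two evaluation runs above (26 and 47 correct out of 51); the variable names are only illustrative.\n",
    "\n",
    "```python\n",
    "# Correct predictions out of 51 dev examples, before and after optimization.\n",
    "basic_correct, optimized_correct, total = 26, 47, 51\n",
    "\n",
    "basic_acc = 100 * basic_correct / total\n",
    "optimized_acc = 100 * optimized_correct / total\n",
    "\n",
    "absolute_gain = optimized_acc - basic_acc  # in percentage points\n",
    "relative_gain = 100 * (optimized_acc - basic_acc) / basic_acc  # in percent of the baseline\n",
    "\n",
    "print(f'basic: {basic_acc:.2f}%, optimized: {optimized_acc:.2f}%')\n",
    "print(f'absolute gain: {absolute_gain:.1f} points')\n",
    "print(f'relative gain: {relative_gain:.1f}%')\n",
    "```\n",
    "\n",
    "Reporting both numbers avoids ambiguity: '41 points better' and '81% better' describe the same change."
   ]
  },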
  {
   "cell_type": "markdown",
   "id": "0mhlty2306j",
   "metadata": {},
   "source": [
    "## 12. Using the Judge to Optimize Joke Generation\n",
    "\n",
    "Now let's use our optimized judge to create a better joke generator:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "id": "a6c7afe7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Example({'topic': 'Suburban Life', 'joke': \"My parents did just well enough so I could grow up poor around white people. When Nas and them used to talk about the projects, I used to get jealous. It sounded fun. Everybody in the projects was poor, and that's fair. But if you were poor in Silver Spring, nigga, it felt like it was only happening to you.\"}) (input_keys={'topic'})"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Define a metric that uses our judge to score generated jokes.\n",
    "# DSPy calls metrics as metric(example, prediction, trace=None).\n",
    "def judge_score(example, pred, trace=None):\n",
    "    \"\"\"Score generated jokes using our optimized judge\"\"\"\n",
    "    # Accept either DSPy objects or plain strings for the topic and the joke\n",
    "    topic = getattr(example, \"topic\", example)\n",
    "    joke = getattr(pred, \"joke\", pred)\n",
    "    judge_result = bootstrap_optimized_judge(topic=topic, joke=joke)\n",
    "    \n",
    "    # Return 1.0 if judge thinks it's funny, 0.0 otherwise\n",
    "    score = 1.0 if judge_result.funny else 0.0\n",
    "    return score\n",
    "\n",
    "# Create a dataset of topics from training data\n",
    "topic_trainset = [\n",
    "    dspy.Example(topic=example.topic, joke=example.joke).with_inputs(\"topic\")\n",
    "    for example in trainset if example.funny\n",
    "]\n",
    "\n",
    "# It doesn't matter what we put in the joke field, we're only using the topic\n",
    "topic_valset = [\n",
    "    dspy.Example(topic=example.topic, joke=example.joke).with_inputs(\"topic\")\n",
    "    for example in valset\n",
    "]\n",
    "\n",
    "# This is just a holdout set of fresh topics to do the final evaluation on\n",
    "topic_devset = [\n",
    "    dspy.Example(topic=example.topic, joke=example.joke).with_inputs(\"topic\")\n",
    "    for example in devset\n",
    "]\n",
    "\n",
    "topic_trainset[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "id": "d7144f52",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Topic: Suburban Life\n",
      "Joke: My parents did just well enough so I could grow up poor around white people. When Nas and them used to talk about the projects, I used to get jealous. It sounded fun. Everybody in the projects was poor, and that's fair. But if you were poor in Silver Spring, nigga, it felt like it was only happening to you.\n",
      "Score: 1.0\n"
     ]
    }
   ],
   "source": [
    "# Check whether the judge rates our joke as funny\n",
    "\n",
    "example_topic = topic_trainset[0].topic\n",
    "example_joke = topic_trainset[0].joke\n",
    "\n",
    "example_score = judge_score(example_topic, example_joke)\n",
    "\n",
    "print(f\"Topic: {example_topic}\")\n",
    "print(f\"Joke: {example_joke}\")\n",
    "\n",
    "print(f\"Score: {example_score}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "id": "0926f9fc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Evaluating baseline joke generator on dev set...\n",
      "Average Metric: 20.00 / 51 (39.2%): 100%|██████████| 51/51 [00:00<00:00, 4641.43it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:49:02 INFO dspy.evaluate.evaluate: Average Metric: 20.0 / 51 (39.2%)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>topic</th>\n",
       "      <th>joke</th>\n",
       "      <th>prediction</th>\n",
       "      <th>judge_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Field</td>\n",
       "      <td>Why did the scarecrow become a successful neurosurgeon? Because he...</td>\n",
       "      <td>Why did the scarecrow win an award?\\n\\nBecause he was outstanding ...</td>\n",
       "      <td>✔️ [1.000]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Ghosts</td>\n",
       "      <td>Why are ghosts bad at lying? Because you can see right through them.</td>\n",
       "      <td>Why did the ghost cross the road?\\n\\nTo get to the other sheet!</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Marriage</td>\n",
       "      <td>The first time I met my wife, I knew she was a keeper. She was wea...</td>\n",
       "      <td>A man is sitting at home when he hears the doorbell ring. He opens...</td>\n",
       "      <td>✔️ [1.000]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Wealth</td>\n",
       "      <td>My wealth and happiness would suggest that God definitely does lov...</td>\n",
       "      <td>They say money can't buy happiness, but it's a lot more comfortabl...</td>\n",
       "      <td>✔️ [1.000]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Social Media</td>\n",
       "      <td>Following someone on Twitter and asking them to tweet about someth...</td>\n",
       "      <td>I unfollowed my gym on social media. I wasn't getting results.</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Fitness</td>\n",
       "      <td>I said to the gym instructor: \"Can you teach me to do the splits?\"...</td>\n",
       "      <td>I'm on a seafood diet. I see food, and I eat it. Especially after ...</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Field</td>\n",
       "      <td>Why did the scarecrow win an award? Because he was outstanding in ...</td>\n",
       "      <td>Why did the scarecrow win an award?\\n\\nBecause he was outstanding ...</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Fashion</td>\n",
       "      <td>Why do cows wear bells? Because their horns don't work.</td>\n",
       "      <td>Why did the fashion model get fired from the grocery store? Becaus...</td>\n",
       "      <td>✔️ [1.000]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Ghosts</td>\n",
       "      <td>Why do ghosts like elevators? Because it lifts their spirits.</td>\n",
       "      <td>Why did the ghost cross the road?\\n\\nTo get to the other sheet!</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Animals</td>\n",
       "      <td>What do you call an alligator in a vest? An investigator.</td>\n",
       "      <td>Why don't scientists trust atoms?\\n\\nBecause they make up everything!</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          topic  \\\n",
       "0         Field   \n",
       "1        Ghosts   \n",
       "2      Marriage   \n",
       "3        Wealth   \n",
       "4  Social Media   \n",
       "5       Fitness   \n",
       "6         Field   \n",
       "7       Fashion   \n",
       "8        Ghosts   \n",
       "9       Animals   \n",
       "\n",
       "                                                                    joke  \\\n",
       "0  Why did the scarecrow become a successful neurosurgeon? Because he...   \n",
       "1   Why are ghosts bad at lying? Because you can see right through them.   \n",
       "2  The first time I met my wife, I knew she was a keeper. She was wea...   \n",
       "3  My wealth and happiness would suggest that God definitely does lov...   \n",
       "4  Following someone on Twitter and asking them to tweet about someth...   \n",
       "5  I said to the gym instructor: \"Can you teach me to do the splits?\"...   \n",
       "6  Why did the scarecrow win an award? Because he was outstanding in ...   \n",
       "7                Why do cows wear bells? Because their horns don't work.   \n",
       "8          Why do ghosts like elevators? Because it lifts their spirits.   \n",
       "9              What do you call an alligator in a vest? An investigator.   \n",
       "\n",
       "                                                              prediction  \\\n",
       "0  Why did the scarecrow win an award?\\n\\nBecause he was outstanding ...   \n",
       "1        Why did the ghost cross the road?\\n\\nTo get to the other sheet!   \n",
       "2  A man is sitting at home when he hears the doorbell ring. He opens...   \n",
       "3  They say money can't buy happiness, but it's a lot more comfortabl...   \n",
       "4         I unfollowed my gym on social media. I wasn't getting results.   \n",
       "5  I'm on a seafood diet. I see food, and I eat it. Especially after ...   \n",
       "6  Why did the scarecrow win an award?\\n\\nBecause he was outstanding ...   \n",
       "7  Why did the fashion model get fired from the grocery store? Becaus...   \n",
       "8        Why did the ghost cross the road?\\n\\nTo get to the other sheet!   \n",
       "9  Why don't scientists trust atoms?\\n\\nBecause they make up everything!   \n",
       "\n",
       "  judge_score  \n",
       "0  ✔️ [1.000]  \n",
       "1              \n",
       "2  ✔️ [1.000]  \n",
       "3  ✔️ [1.000]  \n",
       "4              \n",
       "5              \n",
       "6              \n",
       "7  ✔️ [1.000]  \n",
       "8              \n",
       "9              "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "            <div style='\n",
       "                text-align: center;\n",
       "                font-size: 16px;\n",
       "                font-weight: bold;\n",
       "                color: #555;\n",
       "                margin: 10px 0;'>\n",
       "                ... 41 more rows not displayed ...\n",
       "            </div>\n",
       "            "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Baseline evaluation complete!\n"
     ]
    }
   ],
   "source": [
    "# First evaluate the baseline joke generator before MIPRO optimization\n",
    "print(\"Evaluating baseline joke generator on dev set...\")\n",
    "\n",
    "# Run evaluation using the judge_score metric on the original joke_module\n",
    "evaluate = Evaluate(\n",
    "    metric=judge_score, \n",
    "    devset=topic_devset,\n",
    "    num_threads=8,  # Run evaluations in parallel\n",
    "    display_progress=True,\n",
    "    display_table=10  # Show first 10 results\n",
    ")\n",
    "\n",
    "# Evaluate the baseline joke generator\n",
    "baseline_results = evaluate(joke_module)\n",
    "\n",
    "# Print the results\n",
    "print(\"\\nBaseline evaluation complete!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "zr719ziu4nq",
   "metadata": {},
   "source": [
    "## 13. Advanced Optimization with MIPRO\n",
    "\n",
    "MIPRO (Multi-prompt Instruction Proposal Optimizer) can optimize the instructions themselves, not just the examples:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "id": "e44402a1",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: \n",
      "RUNNING WITH THE FOLLOWING HEAVY AUTO RUN SETTINGS:\n",
      "num_trials: 27\n",
      "minibatch: True\n",
      "num_fewshot_candidates: 18\n",
      "num_instruct_candidates: 9\n",
      "valset size: 51\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: \n",
      "==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=18 sets of demonstrations...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimizing joke generator with MIPRO...\n",
      "MIPRO will optimize:\n",
      "- The instruction text\n",
      "- Select the best few-shot examples\n",
      "- Balance between different optimization strategies\n",
      "Bootstrapping set 1/18\n",
      "Bootstrapping set 2/18\n",
      "Bootstrapping set 3/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  8%|▊         | 6/74 [00:00<00:00, 589.68it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 4 full traces after 6 examples for up to 1 rounds, amounting to 6 attempts.\n",
      "Bootstrapping set 4/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  3%|▎         | 2/74 [00:00<00:00, 479.82it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 1 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.\n",
      "Bootstrapping set 5/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  1%|▏         | 1/74 [00:00<00:00, 475.98it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.\n",
      "Bootstrapping set 6/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  1%|▏         | 1/74 [00:00<00:00, 484.95it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.\n",
      "Bootstrapping set 7/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  1%|▏         | 1/74 [00:00<00:00, 429.00it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.\n",
      "Bootstrapping set 8/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  7%|▋         | 5/74 [00:00<00:00, 548.17it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 3 full traces after 5 examples for up to 1 rounds, amounting to 5 attempts.\n",
      "Bootstrapping set 9/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  3%|▎         | 2/74 [00:00<00:00, 531.63it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 2 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.\n",
      "Bootstrapping set 10/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  8%|▊         | 6/74 [00:00<00:00, 595.06it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 3 full traces after 6 examples for up to 1 rounds, amounting to 6 attempts.\n",
      "Bootstrapping set 11/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  9%|▉         | 7/74 [00:00<00:00, 587.68it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 4 full traces after 7 examples for up to 1 rounds, amounting to 7 attempts.\n",
      "Bootstrapping set 12/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  3%|▎         | 2/74 [00:00<00:00, 483.66it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 1 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.\n",
      "Bootstrapping set 13/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      " 11%|█         | 8/74 [00:00<00:00, 576.78it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 4 full traces after 8 examples for up to 1 rounds, amounting to 8 attempts.\n",
      "Bootstrapping set 14/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  3%|▎         | 2/74 [00:00<00:00, 515.62it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 2 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.\n",
      "Bootstrapping set 15/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  8%|▊         | 6/74 [00:00<00:00, 609.25it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 4 full traces after 6 examples for up to 1 rounds, amounting to 6 attempts.\n",
      "Bootstrapping set 16/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  9%|▉         | 7/74 [00:00<00:00, 609.92it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 4 full traces after 7 examples for up to 1 rounds, amounting to 7 attempts.\n",
      "Bootstrapping set 17/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  5%|▌         | 4/74 [00:00<00:00, 503.20it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 3 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.\n",
      "Bootstrapping set 18/18\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  1%|▏         | 1/74 [00:00<00:00, 474.84it/s]\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: \n",
      "==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: \n",
      "Proposing N=9 instructions...\n",
      "\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.\n",
      "Error getting source code: unhashable type: 'dict'.\n",
      "\n",
      "Running without program aware proposer.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Tell a funny joke about the topic\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: 1: Generate a humorous joke related to the specified topic, suitable for a general adult audience. Be mindful of potentially sensitive content.\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: 2: You are a world-class comedian. Your task is to generate a joke based on a given topic. The jokes should be original, funny, and appropriate for a general adult audience. Avoid offensive or discriminatory humor. Aim for a joke that elicits laughter or amusement through clever wordplay, surprising twists, or relatable observations. Provide a joke that's a maximum of three sentences. After the joke, briefly explain the humor behind it (one sentence).\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: 3: You are a world-renowned comedian on stage at your biggest show ever. The crowd is enormous and the expectations are even higher. Your entire career rests on delivering a killer joke on the following topic: {topic}. Give me your best shot!\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: 4: You are a creative joke writer. Your task is to generate a funny joke based on a given topic. First, reason step by step about the topic and try to identify potential humorous angles, common stereotypes, or surprising twists related to the subject. The goal is to create a joke that is both relevant to the topic and genuinely funny. Finally, write the joke.\n",
      "\n",
      "Topic: {topic}\n",
      "Reasoning: Let's think step by step in order to\n",
      "Joke:\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: 5: Compose a humorous joke based on the provided topic, incorporating elements of surprise, irony, or wordplay. The joke should be suitable for a general audience but avoid offensive or insensitive content. Briefly outline your reasoning for choosing this particular joke structure and its connection to the topic before delivering the joke.\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: 6: Generate a joke on the following topic. First, reason step by step about the type of joke to generate, considering common associations and potential comedic angles related to the topic. Then, tell the joke.\n",
      "\n",
      "Topic: {topic}\n",
      "Reasoning: Let's think step by step in order to\n",
      "Joke:\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: 7: You are a world-class comedian known for your clever and original jokes. Your task is to generate a funny joke based on a given topic. Before responding with the joke, think step by step about the common associations, stereotypes, or humorous angles related to the topic. Then, craft a joke that is both relevant and funny.\n",
      "\n",
      "Topic: [Insert Topic Here]\n",
      "Reasoning: Let's think step by step in order to [Explain your thought process for creating a funny joke about the topic. Consider potential angles, stereotypes, or unexpected twists.]\n",
      "Joke: [Generated Joke]\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: 8: Craft a humorous joke based on the provided topic, incorporating elements of surprise, relatability, or absurdity to maximize comedic effect. Aim for a joke that is original and avoids offensive or insensitive content. Structure your response as \"Joke: [Your Joke Here]\". Before the joke, include a \"Reasoning:\" section where you outline your thought process for creating the joke, step-by-step, helping the user understand the comedic strategy you employed.\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: \n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: ==> STEP 3: FINDING OPTIMAL PROMPT PARAMETERS <==\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: We will evaluate the program over a series of trials with different combinations of instructions and few-shot examples to find the optimal combination using Bayesian Optimization.\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 1 / 34 - Full Evaluation of Default Program ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 23.00 / 51 (45.1%): 100%|██████████| 51/51 [00:00<00:00, 4419.71it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:32 INFO dspy.evaluate.evaluate: Average Metric: 23.0 / 51 (45.1%)\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 45.1\n",
      "\n",
      "/Users/hammermt/Codes/dspy-primer/.venv/lib/python3.13/site-packages/optuna/_experimental.py:32: ExperimentalWarning: Argument ``multivariate`` is an experimental feature. The interface can change in the future.\n",
      "  warnings.warn(\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 2 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 11.00 / 35 (31.4%): 100%|██████████| 35/35 [00:00<00:00, 4711.34it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:32 INFO dspy.evaluate.evaluate: Average Metric: 11.0 / 35 (31.4%)\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 31.43 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 2'].\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43]\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1]\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 45.1\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: =========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 3 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 16.00 / 35 (45.7%): 100%|██████████| 35/35 [00:00<00:00, 4811.09it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:32 INFO dspy.evaluate.evaluate: Average Metric: 16.0 / 35 (45.7%)\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 45.71 on minibatch of size 35 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 16'].\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71]\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1]\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 45.1\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: =========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 4 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 16.00 / 35 (45.7%): 100%|██████████| 35/35 [00:00<00:00, 4287.53it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:32 INFO dspy.evaluate.evaluate: Average Metric: 16.0 / 35 (45.7%)\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 45.71 on minibatch of size 35 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 1'].\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71]\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1]\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 45.1\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: =========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 5 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 13.00 / 35 (37.1%): 100%|██████████| 35/35 [00:00<00:00, 4694.17it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:32 INFO dspy.evaluate.evaluate: Average Metric: 13.0 / 35 (37.1%)\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 37.14 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 15'].\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14]\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1]\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 45.1\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: =========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:32 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 6 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 16.00 / 35 (45.7%): 100%|██████████| 35/35 [00:00<00:00, 4632.10it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:33 INFO dspy.evaluate.evaluate: Average Metric: 16.0 / 35 (45.7%)\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 45.71 on minibatch of size 35 with parameters ['Predictor 0: Instruction 6', 'Predictor 0: Few-Shot Set 1'].\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 45.1\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: =========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 34 - Full Evaluation =====\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 45.71) from minibatch trials...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 22.00 / 51 (43.1%): 100%|██████████| 51/51 [00:00<00:00, 4534.57it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:33 INFO dspy.evaluate.evaluate: Average Metric: 22.0 / 51 (43.1%)\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 45.1\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: =======================\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: \n",
      "\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 8 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 13.00 / 35 (37.1%): 100%|██████████| 35/35 [00:00<00:00, 4013.80it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:33 INFO dspy.evaluate.evaluate: Average Metric: 13.0 / 35 (37.1%)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 37.14 on minibatch of size 35 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 12'].\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 45.1\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: =========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 9 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 16.00 / 35 (45.7%): 100%|██████████| 35/35 [00:00<00:00, 4407.90it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:33 INFO dspy.evaluate.evaluate: Average Metric: 16.0 / 35 (45.7%)\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 45.71 on minibatch of size 35 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 5'].\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 45.1\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: =========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 10 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 15.00 / 35 (42.9%): 100%|██████████| 35/35 [00:00<00:00, 4719.52it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:33 INFO dspy.evaluate.evaluate: Average Metric: 15.0 / 35 (42.9%)\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 42.86 on minibatch of size 35 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 5'].\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 45.1\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 11 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 15.00 / 35 (42.9%): 100%|██████████| 35/35 [00:00<00:00, 4761.93it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:33 INFO dspy.evaluate.evaluate: Average Metric: 15.0 / 35 (42.9%)\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 42.86 on minibatch of size 35 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 16'].\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 45.1\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 12 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 17.00 / 35 (48.6%): 100%|██████████| 35/35 [00:00<00:00, 4327.47it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:33 INFO dspy.evaluate.evaluate: Average Metric: 17.0 / 35 (48.6%)\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 48.57 on minibatch of size 35 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 1'].\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 45.1\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 13 / 34 - Full Evaluation =====\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 47.14) from minibatch trials...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 25.00 / 51 (49.0%): 100%|██████████| 51/51 [00:00<00:00, 4696.56it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:33 INFO dspy.evaluate.evaluate: Average Metric: 25.0 / 51 (49.0%)\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: \u001b[92mNew best full eval score!\u001b[0m Score: 49.02\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 49.02\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: =======================\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: \n",
      "\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 14 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 19.00 / 35 (54.3%): 100%|██████████| 35/35 [00:00<00:00, 5210.69it/s]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:33 INFO dspy.evaluate.evaluate: Average Metric: 19.0 / 35 (54.3%)\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 54.29 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 8'].\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 49.02\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 15 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 18.00 / 35 (51.4%): 100%|██████████| 35/35 [00:00<00:00, 4947.78it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:33 INFO dspy.evaluate.evaluate: Average Metric: 18.0 / 35 (51.4%)\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 51.43 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 8'].\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 49.02\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 16 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 18.00 / 35 (51.4%): 100%|██████████| 35/35 [00:00<00:00, 4145.39it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:33 INFO dspy.evaluate.evaluate: Average Metric: 18.0 / 35 (51.4%)\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 51.43 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 6'].\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02]\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 49.02\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 17 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 18.00 / 35 (51.4%): 100%|██████████| 35/35 [00:00<00:00, 4916.46it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:34 INFO dspy.evaluate.evaluate: Average Metric: 18.0 / 35 (51.4%)\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 51.43 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 8'].\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43, 51.43]\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02]\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 49.02\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 18 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 15.00 / 35 (42.9%): 100%|██████████| 35/35 [00:00<00:00, 4262.50it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:34 INFO dspy.evaluate.evaluate: Average Metric: 15.0 / 35 (42.9%)\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 42.86 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 4'].\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43, 51.43, 42.86]\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02]\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 49.02\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 19 / 34 - Full Evaluation =====\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 52.38333333333333) from minibatch trials...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 26.00 / 51 (51.0%): 100%|██████████| 51/51 [00:00<00:00, 4287.11it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:34 INFO dspy.evaluate.evaluate: Average Metric: 26.0 / 51 (51.0%)\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: \u001b[92mNew best full eval score!\u001b[0m Score: 50.98\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98]\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: =======================\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: \n",
      "\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 20 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 17.00 / 35 (48.6%): 100%|██████████| 35/35 [00:00<00:00, 4808.56it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:34 INFO dspy.evaluate.evaluate: Average Metric: 17.0 / 35 (48.6%)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 48.57 on minibatch of size 35 with parameters ['Predictor 0: Instruction 6', 'Predictor 0: Few-Shot Set 8'].\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43, 51.43, 42.86, 48.57]\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98]\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 21 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 19.00 / 35 (54.3%): 100%|██████████| 35/35 [00:00<00:00, 725.25it/s] "
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:34 INFO dspy.evaluate.evaluate: Average Metric: 19.0 / 35 (54.3%)\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 54.29 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 8'].\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43, 51.43, 42.86, 48.57, 54.29]\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98]\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 22 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 17.00 / 35 (48.6%): 100%|██████████| 35/35 [00:00<00:00, 4002.20it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:34 INFO dspy.evaluate.evaluate: Average Metric: 17.0 / 35 (48.6%)\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 48.57 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 14'].\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43, 51.43, 42.86, 48.57, 54.29, 48.57]\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98]\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 23 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 15.00 / 35 (42.9%): 100%|██████████| 35/35 [00:00<00:00, 4165.74it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:34 INFO dspy.evaluate.evaluate: Average Metric: 15.0 / 35 (42.9%)\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 42.86 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 8'].\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43, 51.43, 42.86, 48.57, 54.29, 48.57, 42.86]\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98]\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:34 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 24 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 19.00 / 35 (54.3%): 100%|██████████| 35/35 [00:00<00:00, 4119.68it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.evaluate.evaluate: Average Metric: 19.0 / 35 (54.3%)\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 54.29 on minibatch of size 35 with parameters ['Predictor 0: Instruction 8', 'Predictor 0: Few-Shot Set 8'].\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43, 51.43, 42.86, 48.57, 54.29, 48.57, 42.86, 54.29]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 25 / 34 - Full Evaluation =====\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 54.29) from minibatch trials...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 26.00 / 51 (51.0%): 100%|██████████| 51/51 [00:00<00:00, 3332.49it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.evaluate.evaluate: Average Metric: 26.0 / 51 (51.0%)\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98, 50.98]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: =======================\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: \n",
      "\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 26 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 18.00 / 35 (51.4%): 100%|██████████| 35/35 [00:00<00:00, 3433.45it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.evaluate.evaluate: Average Metric: 18.0 / 35 (51.4%)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 51.43 on minibatch of size 35 with parameters ['Predictor 0: Instruction 8', 'Predictor 0: Few-Shot Set 10'].\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43, 51.43, 42.86, 48.57, 54.29, 48.57, 42.86, 54.29, 51.43]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98, 50.98]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 27 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 16.00 / 35 (45.7%): 100%|██████████| 35/35 [00:00<00:00, 3445.54it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.evaluate.evaluate: Average Metric: 16.0 / 35 (45.7%)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 45.71 on minibatch of size 35 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 8'].\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43, 51.43, 42.86, 48.57, 54.29, 48.57, 42.86, 54.29, 51.43, 45.71]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98, 50.98]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 28 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 15.00 / 35 (42.9%): 100%|██████████| 35/35 [00:00<00:00, 2058.83it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.evaluate.evaluate: Average Metric: 15.0 / 35 (42.9%)\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 42.86 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 7'].\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43, 51.43, 42.86, 48.57, 54.29, 48.57, 42.86, 54.29, 51.43, 45.71, 42.86]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98, 50.98]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 29 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 17.00 / 35 (48.6%): 100%|██████████| 35/35 [00:00<00:00, 3608.31it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.evaluate.evaluate: Average Metric: 17.0 / 35 (48.6%)\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 48.57 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 11'].\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43, 51.43, 42.86, 48.57, 54.29, 48.57, 42.86, 54.29, 51.43, 45.71, 42.86, 48.57]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98, 50.98]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 30 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 20.00 / 35 (57.1%): 100%|██████████| 35/35 [00:00<00:00, 3263.03it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.evaluate.evaluate: Average Metric: 20.0 / 35 (57.1%)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 57.14 on minibatch of size 35 with parameters ['Predictor 0: Instruction 8', 'Predictor 0: Few-Shot Set 17'].\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43, 51.43, 42.86, 48.57, 54.29, 48.57, 42.86, 54.29, 51.43, 45.71, 42.86, 48.57, 57.14]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98, 50.98]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 31 / 34 - Full Evaluation =====\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 57.14) from minibatch trials...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 26.00 / 51 (51.0%): 100%|██████████| 51/51 [00:00<00:00, 3039.70it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.evaluate.evaluate: Average Metric: 26.0 / 51 (51.0%)\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98, 50.98, 50.98]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: =======================\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: \n",
      "\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 32 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 21.00 / 35 (60.0%): 100%|██████████| 35/35 [00:00<00:00, 2391.82it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.evaluate.evaluate: Average Metric: 21.0 / 35 (60.0%)\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 60.0 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 17'].\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43, 51.43, 42.86, 48.57, 54.29, 48.57, 42.86, 54.29, 51.43, 45.71, 42.86, 48.57, 57.14, 60.0]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98, 50.98, 50.98]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 33 / 34 - Minibatch ==\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Average Metric: 16.00 / 35 (45.7%): 100%|██████████| 35/35 [00:00<00:00, 1990.36it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.evaluate.evaluate: Average Metric: 16.0 / 35 (45.7%)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 45.71 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 14'].\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [31.43, 45.71, 45.71, 37.14, 45.71, 37.14, 45.71, 42.86, 42.86, 48.57, 54.29, 51.43, 51.43, 51.43, 42.86, 48.57, 54.29, 48.57, 42.86, 54.29, 51.43, 45.71, 42.86, 48.57, 57.14, 60.0, 45.71]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98, 50.98, 50.98]\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 50.98\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: ==========================================\n",
      "\n",
      "\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 34 / 34 - Full Evaluation =====\n",
      "2025/08/06 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 60.0) from minibatch trials...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 32.00 / 51 (62.7%): 100%|██████████| 51/51 [00:00<00:00, 4818.65it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:50:36 INFO dspy.evaluate.evaluate: Average Metric: 32.0 / 51 (62.7%)\n",
      "2025/08/06 17:50:36 INFO dspy.teleprompt.mipro_optimizer_v2: \u001b[92mNew best full eval score!\u001b[0m Score: 62.75\n",
      "2025/08/06 17:50:36 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [45.1, 43.14, 49.02, 50.98, 50.98, 50.98, 62.75]\n",
      "2025/08/06 17:50:36 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 62.75\n",
      "2025/08/06 17:50:36 INFO dspy.teleprompt.mipro_optimizer_v2: =======================\n",
      "2025/08/06 17:50:36 INFO dspy.teleprompt.mipro_optimizer_v2: \n",
      "\n",
      "2025/08/06 17:50:36 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 62.75!\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "MIPRO optimization complete!\n"
     ]
    }
   ],
   "source": [
    "# Import MIPRO optimizer\n",
    "from dspy.teleprompt import MIPROv2\n",
    "\n",
    "# Create MIPRO optimizer with balanced settings\n",
    "mipro_optimizer = MIPROv2(\n",
    "    metric=judge_score,\n",
    "    num_threads=8,\n",
    "    max_bootstrapped_demos=4,  # Include some bootstrapped examples (generated by the model)\n",
    "    max_labeled_demos=4,      # Include some labeled examples (from our training data)\n",
    "    auto=\"heavy\",             # Heavy preset: the most thorough and slowest of \"light\", \"medium\", \"heavy\"\n",
    "    seed=69,\n",
    "    init_temperature=1\n",
    ")\n",
    "\n",
    "# Optimize with MIPRO\n",
    "print(\"Optimizing joke generator with MIPRO...\")\n",
    "print(\"MIPRO will optimize:\")\n",
    "print(\"- The instruction text\")\n",
    "print(\"- Select the best few-shot examples\")\n",
    "print(\"- Balance between different optimization strategies\")\n",
    "\n",
    "mipro_optimized_joke_program = mipro_optimizer.compile(\n",
    "    joke_module,\n",
    "    trainset=topic_trainset, # examples used to bootstrap demos and propose instructions\n",
    "    valset=topic_valset, # separate topics used to score each candidate program\n",
    "    requires_permission_to_run=False,\n",
    "    seed=69\n",
    ")\n",
    "\n",
    "print(\"\\nMIPRO optimization complete!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "0c09cd50",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 25.00 / 51 (49.0%): 100%|██████████| 51/51 [00:01<00:00, 45.13it/s]  "
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/08/06 17:52:36 INFO dspy.evaluate.evaluate: Average Metric: 25.0 / 51 (49.0%)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>topic</th>\n",
       "      <th>joke</th>\n",
       "      <th>prediction</th>\n",
       "      <th>judge_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Field</td>\n",
       "      <td>Why did the scarecrow become a successful neurosurgeon? Because he...</td>\n",
       "      <td>I went to a field the other day. It was a minefield. That's why th...</td>\n",
       "      <td>✔️ [1.000]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Ghosts</td>\n",
       "      <td>Why are ghosts bad at lying? Because you can see right through them.</td>\n",
       "      <td>Why did the ghost break up with his girlfriend?\\n\\nShe ghosted him.</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Marriage</td>\n",
       "      <td>The first time I met my wife, I knew she was a keeper. She was wea...</td>\n",
       "      <td>My wife told me to take the spider out instead of killing it. We w...</td>\n",
       "      <td>✔️ [1.000]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Wealth</td>\n",
       "      <td>My wealth and happiness would suggest that God definitely does lov...</td>\n",
       "      <td>A wealthy man is complaining to his friend, \"Last year, I spent $5...</td>\n",
       "      <td>✔️ [1.000]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Social Media</td>\n",
       "      <td>Following someone on Twitter and asking them to tweet about someth...</td>\n",
       "      <td>I unfollowed the baker on social media. All his posts were just ri...</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Fitness</td>\n",
       "      <td>I said to the gym instructor: \"Can you teach me to do the splits?\"...</td>\n",
       "      <td>I told my personal trainer I wanted to get in shape for summer. No...</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Field</td>\n",
       "      <td>Why did the scarecrow win an award? Because he was outstanding in ...</td>\n",
       "      <td>I went to a field the other day. It was a minefield. That's why th...</td>\n",
       "      <td>✔️ [1.000]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Fashion</td>\n",
       "      <td>Why do cows wear bells? Because their horns don't work.</td>\n",
       "      <td>I tried to put on my skinny jeans, but I think I pulled a muscle.....</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Ghosts</td>\n",
       "      <td>Why do ghosts like elevators? Because it lifts their spirits.</td>\n",
       "      <td>Why did the ghost break up with his girlfriend?\\n\\nShe ghosted him.</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Animals</td>\n",
       "      <td>What do you call an alligator in a vest? An investigator.</td>\n",
       "      <td>Why did the chicken cross the road? To get away from his crippling...</td>\n",
       "      <td>✔️ [1.000]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          topic  \\\n",
       "0         Field   \n",
       "1        Ghosts   \n",
       "2      Marriage   \n",
       "3        Wealth   \n",
       "4  Social Media   \n",
       "5       Fitness   \n",
       "6         Field   \n",
       "7       Fashion   \n",
       "8        Ghosts   \n",
       "9       Animals   \n",
       "\n",
       "                                                                    joke  \\\n",
       "0  Why did the scarecrow become a successful neurosurgeon? Because he...   \n",
       "1   Why are ghosts bad at lying? Because you can see right through them.   \n",
       "2  The first time I met my wife, I knew she was a keeper. She was wea...   \n",
       "3  My wealth and happiness would suggest that God definitely does lov...   \n",
       "4  Following someone on Twitter and asking them to tweet about someth...   \n",
       "5  I said to the gym instructor: \"Can you teach me to do the splits?\"...   \n",
       "6  Why did the scarecrow win an award? Because he was outstanding in ...   \n",
       "7                Why do cows wear bells? Because their horns don't work.   \n",
       "8          Why do ghosts like elevators? Because it lifts their spirits.   \n",
       "9              What do you call an alligator in a vest? An investigator.   \n",
       "\n",
       "                                                              prediction  \\\n",
       "0  I went to a field the other day. It was a minefield. That's why th...   \n",
       "1    Why did the ghost break up with his girlfriend?\\n\\nShe ghosted him.   \n",
       "2  My wife told me to take the spider out instead of killing it. We w...   \n",
       "3  A wealthy man is complaining to his friend, \"Last year, I spent $5...   \n",
       "4  I unfollowed the baker on social media. All his posts were just ri...   \n",
       "5  I told my personal trainer I wanted to get in shape for summer. No...   \n",
       "6  I went to a field the other day. It was a minefield. That's why th...   \n",
       "7  I tried to put on my skinny jeans, but I think I pulled a muscle.....   \n",
       "8    Why did the ghost break up with his girlfriend?\\n\\nShe ghosted him.   \n",
       "9  Why did the chicken cross the road? To get away from his crippling...   \n",
       "\n",
       "  judge_score  \n",
       "0  ✔️ [1.000]  \n",
       "1              \n",
       "2  ✔️ [1.000]  \n",
       "3  ✔️ [1.000]  \n",
       "4              \n",
       "5              \n",
       "6  ✔️ [1.000]  \n",
       "7              \n",
       "8              \n",
       "9  ✔️ [1.000]  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "            <div style='\n",
       "                text-align: center;\n",
       "                font-size: 16px;\n",
       "                font-weight: bold;\n",
       "                color: #555;\n",
       "                margin: 10px 0;'>\n",
       "                ... 41 more rows not displayed ...\n",
       "            </div>\n",
       "            "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# How did we do on topics the optimizer hadn't seen?\n",
    "optimized_results = evaluate(mipro_optimized_joke_program)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e73f3a83",
   "metadata": {},
   "source": [
    "### MIPRO Optimization Results\n",
    "\n",
    "In the evaluation above, the MIPRO-optimized joke generator passed the judge on 25 of 51 examples (49.0%), compared with 20 of 51 (39.2%) for the baseline program. That is a gain of 9.8 percentage points.\n",
    "\n",
    "Key observations:\n",
    "- Baseline program: 20/51 (39.2%)\n",
    "- MIPRO-optimized program: 25/51 (49.0%)\n",
    "- Improvement: +9.8 percentage points (a 25% relative improvement)\n",
    "\n",
    "The optimized program scores higher on these topics, which suggests that the instruction and few-shot examples MIPRO selected help with joke generation. Treat the exact numbers with caution: the set has only 51 examples, the metric comes from an LLM judge, and the optimizer's final full evaluation in the log above reported 62.75% for the same program.\n",
    "\n"
   ]
  },
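  {
   "cell_type": "markdown",
   "id": "mipro-gain-arithmetic",
   "metadata": {},
   "source": [
    "The results above quote both an absolute and a relative improvement, and the two are easy to confuse. As a rough sketch of the arithmetic, using notation introduced here rather than names from the notebook: let $n$ be the number of evaluation examples, $b$ the number the baseline program passes, and $o$ the number the optimized program passes.\n",
    "\n",
    "$$\\text{absolute gain (percentage points)} = 100 \\cdot \\frac{o - b}{n}$$\n",
    "\n",
    "$$\\text{relative gain (percent)} = 100 \\cdot \\frac{o - b}{b}$$\n",
    "\n",
    "With $n = 51$ and $b = 20$, each additional joke the judge accepts adds $100/51 \\approx 1.96$ percentage points of absolute gain and $100/20 = 5$ percent of relative gain, so a difference of only a few jokes moves both figures noticeably.\n"
   ]
  },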
  {
   "cell_type": "markdown",
   "id": "553swutx1mi",
   "metadata": {},
   "source": [
    "### Examining MIPRO's Optimized Instructions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "bfdb3b8c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "==================================================\n",
      "MESSAGE 1: SYSTEM\n",
      "==================================================\n",
      "Your input fields are:\n",
      "1. `topic` (str): The topic of the joke\n",
      "Your output fields are:\n",
      "1. `reasoning` (str): \n",
      "2. `joke` (str): The joke that is being told\n",
      "All interactions will be structured in the following way, with the appropriate values filled in.\n",
      "\n",
      "[[ ## topic ## ]]\n",
      "{topic}\n",
      "\n",
      "[[ ## reasoning ## ]]\n",
      "{reasoning}\n",
      "\n",
      "[[ ## joke ## ]]\n",
      "{joke}\n",
      "\n",
      "[[ ## completed ## ]]\n",
      "In adhering to this structure, your objective is: \n",
      "        Generate a humorous joke related to the specified topic, suitable for a general adult audience. Be mindful of potentially sensitive content.\n",
      "\n",
      "==================================================\n",
      "MESSAGE 2: USER\n",
      "==================================================\n",
      "This is an example of the task, though some input or output fields are not supplied.\n",
      "\n",
      "[[ ## topic ## ]]\n",
      "Religious Satire\n",
      "\n",
      "==================================================\n",
      "MESSAGE 3: ASSISTANT\n",
      "==================================================\n",
      "[[ ## reasoning ## ]]\n",
      "Not supplied for this particular example. \n",
      "\n",
      "[[ ## joke ## ]]\n",
      "I respect everybody's beliefs, except Amish people. They are the only ones I can say clearly, 'Their God is wrong.' The speed limit is 75 miles an hour in Ohio, and one lane of traffic is blocked by a goddamned horse and buggy?\n",
      "\n",
      "==================================================\n",
      "MESSAGE 4: USER\n",
      "==================================================\n",
      "This is an example of the task, though some input or output fields are not supplied.\n",
      "\n",
      "[[ ## topic ## ]]\n",
      "Workplace\n",
      "\n",
      "==================================================\n",
      "MESSAGE 5: ASSISTANT\n",
      "==================================================\n",
      "[[ ## reasoning ## ]]\n",
      "Not supplied for this particular example. \n",
      "\n",
      "[[ ## joke ## ]]\n",
      "If your boss is getting you down, look at him through the prongs of a fork and imagine him in jail.\n",
      "\n",
      "==================================================\n",
      "MESSAGE 6: USER\n",
      "==================================================\n",
      "This is an example of the task, though some input or output fields are not supplied.\n",
      "\n",
      "[[ ## topic ## ]]\n",
      "Creation\n",
      "\n",
      "==================================================\n",
      "MESSAGE 7: ASSISTANT\n",
      "==================================================\n",
      "[[ ## reasoning ## ]]\n",
      "Not supplied for this particular example. \n",
      "\n",
      "[[ ## joke ## ]]\n",
      "If God had written the Bible, the first line should have been 'It's round.'\n",
      "\n",
      "==================================================\n",
      "MESSAGE 8: USER\n",
      "==================================================\n",
      "[[ ## topic ## ]]\n",
      "Misunderstanding\n",
      "\n",
      "==================================================\n",
      "MESSAGE 9: ASSISTANT\n",
      "==================================================\n",
      "[[ ## reasoning ## ]]\n",
      "The joke plays on a common misunderstanding of a phrase, taking it literally for comedic effect.\n",
      "\n",
      "[[ ## joke ## ]]\n",
      "I told my wife she was drawing her eyebrows too high. She seemed surprised.\n",
      "\n",
      "==================================================\n",
      "MESSAGE 10: USER\n",
      "==================================================\n",
      "[[ ## topic ## ]]\n",
      "{topic}\n",
      "\n",
      "Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## joke ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Extract and display the optimized prompt\n",
    "# MIPRO modifies the instructions to improve performance\n",
    "prompt = {\n",
    "  name: dspy.ChatAdapter().format(\n",
    "    p.signature,\n",
    "    demos=p.demos, \n",
    "    inputs={k: f\"{{{k}}}\" for k in p.signature.input_fields},\n",
    "  )\n",
    "  for name, p in mipro_optimized_joke_program.named_predictors()\n",
    "}['joke_generator.predict']\n",
    "\n",
    "# Show the optimized instructions\n",
    "for i, message in enumerate(prompt):\n",
    "    role = message['role']\n",
    "    content = message['content']\n",
    "    \n",
    "    print(f\"{'='*50}\")\n",
    "    print(f\"MESSAGE {i+1}: {role.upper()}\")\n",
    "    print(f\"{'='*50}\")\n",
    "    print(content)\n",
    "    print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "75b3c07c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'role': 'system',\n",
       "  'content': 'Your input fields are:\\n1. `topic` (str): The topic of the joke\\nYour output fields are:\\n1. `reasoning` (str): \\n2. `joke` (str): The joke that is being told\\nAll interactions will be structured in the following way, with the appropriate values filled in.\\n\\n[[ ## topic ## ]]\\n{topic}\\n\\n[[ ## reasoning ## ]]\\n{reasoning}\\n\\n[[ ## joke ## ]]\\n{joke}\\n\\n[[ ## completed ## ]]\\nIn adhering to this structure, your objective is: \\n        Generate a humorous joke related to the specified topic, suitable for a general adult audience. Be mindful of potentially sensitive content.'},\n",
       " {'role': 'user',\n",
       "  'content': 'This is an example of the task, though some input or output fields are not supplied.\\n\\n[[ ## topic ## ]]\\nReligious Satire'},\n",
       " {'role': 'assistant',\n",
       "  'content': \"[[ ## reasoning ## ]]\\nNot supplied for this particular example. \\n\\n[[ ## joke ## ]]\\nI respect everybody's beliefs, except Amish people. They are the only ones I can say clearly, 'Their God is wrong.' The speed limit is 75 miles an hour in Ohio, and one lane of traffic is blocked by a goddamned horse and buggy?\"},\n",
       " {'role': 'user',\n",
       "  'content': 'This is an example of the task, though some input or output fields are not supplied.\\n\\n[[ ## topic ## ]]\\nWorkplace'},\n",
       " {'role': 'assistant',\n",
       "  'content': '[[ ## reasoning ## ]]\\nNot supplied for this particular example. \\n\\n[[ ## joke ## ]]\\nIf your boss is getting you down, look at him through the prongs of a fork and imagine him in jail.'},\n",
       " {'role': 'user',\n",
       "  'content': 'This is an example of the task, though some input or output fields are not supplied.\\n\\n[[ ## topic ## ]]\\nCreation'},\n",
       " {'role': 'assistant',\n",
       "  'content': \"[[ ## reasoning ## ]]\\nNot supplied for this particular example. \\n\\n[[ ## joke ## ]]\\nIf God had written the Bible, the first line should have been 'It's round.'\"},\n",
       " {'role': 'user', 'content': '[[ ## topic ## ]]\\nMisunderstanding'},\n",
       " {'role': 'assistant',\n",
       "  'content': '[[ ## reasoning ## ]]\\nThe joke plays on a common misunderstanding of a phrase, taking it literally for comedic effect.\\n\\n[[ ## joke ## ]]\\nI told my wife she was drawing her eyebrows too high. She seemed surprised.'},\n",
       " {'role': 'user',\n",
       "  'content': '[[ ## topic ## ]]\\n{topic}\\n\\nRespond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## joke ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.'}]"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "id": "aa923d64",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Saved optimized joke generator!\n"
     ]
    }
   ],
   "source": [
    "# Save optimized program\n",
    "mipro_optimized_joke_program.save(\"./mipro_optimized_joke_program/\", save_program=True)\n",
    "print(\"Saved optimized joke generator!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3smk3iozkip",
   "metadata": {},
   "source": [
    "### Programs Saved Successfully!\n",
    "\n",
    "The optimized program has been saved to JSON file:\n",
    "- `mipro_optimized_jokes.json` - Contains the MIPRO-optimized joke generator\n",
    "\n",
    "This file include the optimized prompts, few-shot examples, and all parameters needed to reproduce the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "id": "ec6c596f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Joke about programming:\n",
      "Why did the programmer quit his job? Because he didn't get arrays! (Inheritance)\n"
     ]
    }
   ],
   "source": [
    "# Load the saved program\n",
    "loaded_joke_program = dspy.load(\"./mipro_optimized_joke_program/\")\n",
    "\n",
    "# Generate a joke about programming\n",
    "result = loaded_joke_program(topic=\"Python\") \n",
    "print(f\"\\nJoke about programming:\\n{result}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0xuxila2geh",
   "metadata": {},
   "source": [
    "## 14. Comparing Our Joke Generators\n",
    "\n",
    "Let's test our joke generators side by side:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "id": "xkfdxedinbf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "==================================================\n",
      "TOPIC: Python\n",
      "==================================================\n",
      "\n",
      "BASIC: Why do Python programmers prefer dark mode?\n",
      "\n",
      "Because light attracts bugs!\n",
      "\n",
      "MIPRO: Why did the programmer quit his job? Because he didn't get arrays! (Inheritance)\n",
      "\n",
      "==================================================\n",
      "TOPIC: Coffee\n",
      "==================================================\n",
      "\n",
      "BASIC: Why did the coffee go to the police?\n",
      "\n",
      "It got mugged!\n",
      "\n",
      "MIPRO: I like my coffee how I like myself: dark, bitter, and too hot for you.\n",
      "\n",
      "==================================================\n",
      "TOPIC: Exercise\n",
      "==================================================\n",
      "\n",
      "BASIC: Why did the bicycle fall over? Because it was two tired!\n",
      "\n",
      "MIPRO: I hate when I lose my motivation to exercise. It's like, where do these extra 10 pounds keep coming from?\n"
     ]
    }
   ],
   "source": [
    "# Test all our joke generators\n",
    "test_topics = [\"Python\", \"Coffee\", \"Exercise\"]\n",
    "\n",
    "for topic in test_topics:\n",
    "    print(f\"\\n{'='*50}\")\n",
    "    print(f\"TOPIC: {topic}\")\n",
    "    print(f\"{'='*50}\")\n",
    "    \n",
    "    # Basic joke generator\n",
    "    basic_result = basic_joke_program(topic=topic)\n",
    "    print(f\"\\nBASIC: {basic_result.joke}\")\n",
    "    \n",
    "    # MIPRO optimized\n",
    "    mipro_result = mipro_optimized_joke_program(topic=topic)\n",
    "    print(f\"\\nMIPRO: {mipro_result}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ozmnfjximp",
   "metadata": {},
   "source": [
    "### Joke Quality Comparison\n",
    "\n",
    "Notice the progression in joke quality:\n",
    "\n",
    "**Basic Generator:**\n",
    "- Simple puns and wordplay\n",
    "- Generic dad joke style\n",
    "- No personality or edge\n",
    "\n",
    "**MIPRO Optimized:**\n",
    "- Mix of styles - sometimes puns, sometimes observational\n",
    "- More relatable and human\n",
    "- Better understanding of context\n",
    "\n",
    "The optimization process taught the models what makes professional comedians' jokes funnier than generic dad jokes!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7881u523lsp",
   "metadata": {},
   "source": [
    "## Summary: What We Learned\n",
    "\n",
    "In this tutorial, we explored DSPy's key concepts:\n",
    "\n",
    "### 1. **Signatures** - Simple Input/Output Declarations\n",
    "```python\n",
    "basic_joke_program = dspy.Predict('topic -> joke')\n",
    "```\n",
    "\n",
    "### 2. **Chain of Thought** - Automatic Reasoning\n",
    "```python\n",
    "cot_joke_program = dspy.ChainOfThought(joke_signature)\n",
    "```\n",
    "\n",
    "### 3. **Modules** - Reusable Components\n",
    "```python\n",
    "class JokeModule(dspy.Module):\n",
    "    def forward(self, topic: str) -> str:\n",
    "        # Your logic here\n",
    "```\n",
    "\n",
    "### 4. **Evaluation** - Systematic Performance Measurement\n",
    "```python\n",
    "evaluate = Evaluate(metric=exact_match, devset=valset)\n",
    "```\n",
    "\n",
    "### 5. **Optimization** - Automatic Prompt Improvement\n",
    "- **Bootstrap**: Selects effective few-shot examples\n",
    "- **MIPRO**: Optimizes both instructions and examples\n",
    "\n",
    "### Key Takeaways\n",
    "\n",
    "1. **No Manual Prompt Engineering**: DSPy handles prompt formatting automatically\n",
    "2. **Data-Driven Optimization**: Use labeled data to improve performance systematically\n",
    "3. **Modular Design**: Build complex AI systems from simple, reusable components\n",
    "4. **Automatic Improvement**: Optimizers find better prompts than manual tuning\n",
    "\n",
    "### Next Steps\n",
    "\n",
    "- Try different signatures and modules for your use case\n",
    "- Create custom evaluation metrics\n",
    "- Experiment with different optimizers\n",
    "- Build multi-step AI pipelines with multiple modules\n",
    "\n",
    "Happy building with DSPy! 🚀"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
