{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "f0f9eff6-2e9f-4ed6-baf5-57377aa3c044",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from dotenv import load_dotenv, find_dotenv\n",
    "_ = load_dotenv(find_dotenv())\n",
    "openai_api_key = os.environ[\"OPENAI_API_KEY\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "503a946e-bd0f-4d75-b211-894fc6666089",
   "metadata": {},
   "source": [
    "## Output Parsers"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b01d48a4-9f1f-4a2d-9d3e-e5ae0f019cee",
   "metadata": {},
   "source": [
    "Language models respond with plain text, but we often need the answer in a structured format, such as a JSON dictionary or an XML document. Output parsers handle this: they tell the model which format to use and convert its text response into a structured object."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5b95287-8289-478b-b5e4-a1b43e3a62af",
   "metadata": {},
   "source": [
    "**Example WITHOUT an output parser**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "9d38386e-9ba6-4940-a58c-86d1909d6c81",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_openai import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "dc7f7f5a-55d8-42c3-b783-251147761b32",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = OpenAI()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "2f929672-76e0-4ebb-8b8c-d9e770f8b657",
   "metadata": {},
   "outputs": [],
   "source": [
    "template_for_desired_output = \"\"\"\n",
    "Here is a book review: {book_review}.\n",
    "\n",
    "I want you to output three things about the review in a JSON dictionary:\n",
    "\n",
    "\"sentiment\": Is it positive or negative?\n",
    "\"positive\": What is positively highlighted about the book in the review?\n",
    "\"negative\": What is negatively highlighted about the book in the review?\n",
    "\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "cd06c8e9-feef-42e3-847a-d639b13d6de4",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.prompts import PromptTemplate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "d488a861-c65b-4ef9-a217-38f78b45ec98",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt_template = PromptTemplate(\n",
    "    input_variables=[\"book_review\"],\n",
    "    template=template_for_desired_output\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "35d30695-bd01-4886-a4a0-cdf8a74c2acc",
   "metadata": {},
   "outputs": [],
   "source": [
    "user_input = \"\"\"\n",
    "I bought the kindle edition of this book and I found it to be \n",
    "a terrible reading experience. Quote often it seemed like \n",
    "pieces of sentences were missing and occasionally when going \n",
    "from one page to another I would see the text of two lines \n",
    "superimposed on each other. I don't have a kindle so I used \n",
    "the android kindle app and kindle for the PC and both left \n",
    "much to be desired. Also I hated that if I flipped a few \n",
    "pages ahead that it would reset the location. Overall I would \n",
    "not recommend the e-book edition, the quality is horrendous. \n",
    "Also now that I see the prices, I paid more for the terrible \n",
    "kindle edition than the hardcover edition goes for. Anyway \n",
    "I would never recommend paying more for an e-book than a print \n",
    "book because you have less rights than with the print book. \n",
    "Also for a book like this I think there is value to being able \n",
    "to flip through it really quickly while the kindle interface \n",
    "is best for flipping through page by page.\n",
    "\n",
    "That being said, the quality of the book was great. It was full \n",
    "of all sorts of insights and experiences. They can all be \n",
    "summarized as don't give up, watch out for VC's but don't \n",
    "write them off, listen to your customers, be willing to change, \n",
    "make sure the initial founding team works together well, etc... \n",
    "But just listening the values does not do it justice. You \n",
    "really have to read the experiences. The book is full of all \n",
    "sorts of insights too, not just about entrepreneurship but \n",
    "also about the individual companies. For example I was really \n",
    "impressed about PayPal and the fraud stuff they did and how \n",
    "valuable that was. I just never knew. Overall I think the book \n",
    "was very well put together. Although some of the founders liked \n",
    "to talk a lot more than others and it droned on and on. But \n",
    "others were brief and insightful. I would definitely recommend \n",
    "this.\n",
    "\n",
    "If I bought the print edition I suspect I would be giving it \n",
    "5 stars. But really the kindle experience is probably worth \n",
    "0 stars. But the content is so good that I figure 4 stars is \n",
    "fair. Since at this time I see hardcover editions for $5 or \n",
    "$6 new I would say go grab one of those now!!! The book is \n",
    "definitely inspirational.\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "4f8c164b-604c-4ed5-992d-211a7a397df5",
   "metadata": {},
   "outputs": [],
   "source": [
    "question = prompt_template.format(book_review=user_input)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "23d9cc53-3392-4b91-bd92-668cdb5bc3ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = llm.invoke(question)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "d02d7351-208b-472e-8103-1fc0f5ff2869",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"sentiment\": \"Mixed\",\n",
      "    \"positive\": \"The book is full of insights and experiences that are valuable for entrepreneurs and individuals alike\",\n",
      "    \"negative\": \"The kindle edition has terrible quality, with missing sentences and overlapping text. The interface is not user-friendly and the price is higher than the hardcover edition.\"\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "a6676418-0bda-4d31-b660-b7128bdea9c3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'str'>\n"
     ]
    }
   ],
   "source": [
    "print(type(response))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65e789c7-269f-42de-9092-34bac8bab49f",
   "metadata": {},
   "source": [
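    "**As you can see, the response looks like JSON, but it is actually a plain Python string (`str`), not a dictionary, so its fields cannot be accessed by key.** Note also that the model answered \"Mixed\" although the prompt asked for positive or negative; nothing enforces the requested structure."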
   ]
  },
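  {
   "cell_type": "markdown",
   "id": "added-raw-string-check-md",
   "metadata": {},
   "source": [
    "A minimal sketch of why the raw string is inconvenient, assuming `response` still holds the text printed above. Key lookup fails on a `str`, and parsing by hand with `json.loads` only works when the model happens to return valid JSON and nothing else."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "added-raw-string-check-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "# The raw response is a str, so key lookup is not available on it.\n",
    "try:\n",
    "    response[\"sentiment\"]\n",
    "except TypeError as error:\n",
    "    print(f\"Cannot index a string by key: {error}\")\n",
    "\n",
    "# Manual parsing may succeed, but it breaks if the model adds any extra text.\n",
    "try:\n",
    "    manual_dict = json.loads(response)\n",
    "    print(type(manual_dict))\n",
    "except json.JSONDecodeError as error:\n",
    "    print(f\"Response was not valid JSON: {error}\")"
   ]
  },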
  {
   "cell_type": "markdown",
   "id": "eb4b4c18-ef0e-493a-8307-653d5593cba9",
   "metadata": {},
   "source": [
    "**Example WITH an output parser**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "1c52236c-d8dd-4a66-ab07-06aa92b40a9e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.output_parsers import PydanticOutputParser"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "c45214bc-4ebb-491a-8947-c36139126376",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.pydantic_v1 import BaseModel, Field"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "20b8daf9-f2b0-4a7a-b978-ddb870f188cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import List"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "abb4616d-0275-40cf-bbf5-1d6b24d0728e",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Reviews_Desired_Output_Structure(BaseModel):\n",
    "    sentiment: List[str] = Field(\n",
    "        description=\"Is it positive or negative?\"\n",
    "    )\n",
    "    positive: List[str] = Field(\n",
    "        description=\"What is positively highlighted about the book in the review?\"\n",
    "    )\n",
    "    negative: List[str] = Field(\n",
    "        description=\"What is negatively highlighted about the book in the review?\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "52330c1e-7e93-4d56-a334-4a09df9a769a",
   "metadata": {},
   "outputs": [],
   "source": [
    "output_parser = PydanticOutputParser(\n",
    "    pydantic_object=Reviews_Desired_Output_Structure\n",
    ")"
   ]
  },
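  {
   "cell_type": "markdown",
   "id": "added-format-instructions-md",
   "metadata": {},
   "source": [
    "To see what the parser will ask the model to do, it can help to print its format instructions. This is only an inspection step, assuming `output_parser` from the cell above; the exact wording of the instructions depends on the installed LangChain version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "added-format-instructions-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Text derived from the Pydantic model that gets injected into the prompt.\n",
    "print(output_parser.get_format_instructions())"
   ]
  },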
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "73a73f9a-9f33-45fb-b9b5-b69af9d3017d",
   "metadata": {},
   "outputs": [],
   "source": [
    "template_for_desired_output_with_parser = \"\"\"\n",
    "Here is a book review: {book_review}.\n",
    "\n",
    "I want you to output three things about the review in a JSON dictionary:\n",
    "\n",
    "{format_instructions}\n",
    "\n",
    "\"sentiment\": Is it positive or negative?\n",
    "\"positive\": What is positively highlighted about the book in the review?\n",
    "\"negative\": What is negatively highlighted about the book in the review?\n",
    "\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "045ef2f9-fa10-4bba-8caa-73dbd6864a32",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt_template_with_parser = PromptTemplate(\n",
    "    template=template_for_desired_output_with_parser,\n",
    "    input_variables=[\"book_review\"],\n",
    "    partial_variables={\n",
    "        \"format_instructions\": output_parser.get_format_instructions()\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "df4fc20d-4ac8-421d-a62e-9903eaa87129",
   "metadata": {},
   "outputs": [],
   "source": [
    "question = prompt_template_with_parser.format(book_review=user_input)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "25597a84-99b1-4059-820d-d965af6b25ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = llm.invoke(question)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "a1055c93-8ab4-47a9-8912-586bb458d3a0",
   "metadata": {},
   "outputs": [],
   "source": [
    "formatted_output = output_parser.parse(response)"
   ]
  },
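  {
   "cell_type": "markdown",
   "id": "added-parse-guard-md",
   "metadata": {},
   "source": [
    "The model is not guaranteed to follow the format instructions, so parsing can fail. A minimal sketch of guarding the parsing step, assuming `output_parser` and `response` from the cells above; `OutputParserException` is the exception LangChain parsers raise when the text cannot be parsed into the target structure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "added-parse-guard-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.exceptions import OutputParserException\n",
    "\n",
    "try:\n",
    "    formatted_output = output_parser.parse(response)\n",
    "except OutputParserException as error:\n",
    "    # Invalid JSON or missing fields end up here instead of crashing later.\n",
    "    print(f\"Could not parse the model response: {error}\")"
   ]
  },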
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "98484d2f-5f28-4afb-b006-1b62185581bf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class '__main__.Reviews_Desired_Output_Structure'>\n"
     ]
    }
   ],
   "source": [
    "print(type(formatted_output))"
   ]
  },
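  {
   "cell_type": "markdown",
   "id": "added-attribute-access-md",
   "metadata": {},
   "source": [
    "Because the parsed result is a Pydantic object, its fields can be read directly as attributes, with no JSON round trip. A short sketch, assuming `formatted_output` from the cell above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "added-attribute-access-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Each field declared on the model is a regular Python list of strings.\n",
    "print(formatted_output.sentiment)\n",
    "for item in formatted_output.positive:\n",
    "    print(\"+\", item)\n",
    "for item in formatted_output.negative:\n",
    "    print(\"-\", item)"
   ]
  },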
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "1f394fb6-b930-44d3-a12f-e6f03b1ea5fd",
   "metadata": {},
   "outputs": [],
   "source": [
    "json_output = formatted_output.json()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "e3c1e6f5-2e2a-4202-9ca3-56c7d3855cb6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "6dfe8577-b16f-4533-9bf0-9659c3f63325",
   "metadata": {},
   "outputs": [],
   "source": [
    "python_dict = json.loads(json_output)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "ecd344c0-2591-4eaa-abed-d2efc5274b17",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'sentiment': ['positive'], 'positive': ['Insights and experiences shared by founders', 'Valuable insights about entrepreneurship and individual companies', 'Inspirational'], 'negative': ['Poor quality of Kindle edition']}\n"
     ]
    }
   ],
   "source": [
    "print(python_dict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "78f06108-530a-4bcc-87f7-828b6412e1b7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'dict'>\n"
     ]
    }
   ],
   "source": [
    "print(type(python_dict))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
