{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "879e5710-c975-4688-a47b-bab09c743f3c",
   "metadata": {},
   "source": [
    "## Multimodal RAG with GPT4V and LangChain"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2a87a8d-1a8c-44b8-bf00-9c952fb580a5",
   "metadata": {},
   "source": [
    "#### When do we need Multimodal RAG\n",
    "Standard RAG is easy with text-only files, but what if we want to use RAG with pdfs or slides that have text, images, and tables? Then we use Multimodal RAG."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "161a41ec-623e-4711-acad-f166c6faca52",
   "metadata": {},
   "source": [
    "#### Multimodal RAG explained\n",
    "* Summarize text with the LLM model.\n",
    "* Summarize table with the LLM model.\n",
    "* Summarize images with the new Multimodal LLM model (GPT4V).\n",
    "* Convert summaries into numbers (embeddings) and store the embeddings in a multivector retriever (vector database).\n",
    "* Store the raw documents (the text and the summary of the images) in a DocumentStore.\n",
    "* When a question is asked, do similarity search to retrieve the most relevant docs and send the response to the LLM Model to format it properly using natural language."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "ebc913a1-54d9-40ee-8339-0f3a474c77cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "#!pip install langchain openai"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4a147265-5c72-433a-ad7c-65dbe335eb7e",
   "metadata": {},
   "outputs": [],
   "source": [
    "#!pip install pydantic lxml chromadb tiktoken"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "512b288a-711d-4dd1-bdc8-e55ea78aafe6",
   "metadata": {},
   "outputs": [],
   "source": [
    "#!pip install \"unstructured[all-docs]\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0526b432-7594-4e41-9e17-be900a2753c8",
   "metadata": {},
   "source": [
    "* The unstructured module is the key here. We will use it to extract all the relevant parts of the document (text, tables and images).\n",
    "* Chromadb will be our vector store."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "04b13762-8421-4696-979f-1fb1fe35f2a5",
   "metadata": {},
   "source": [
    "#### In order to use the unstructured module, we will need to install two other modules: tesseract and poppler\n",
    "* In MacOS with Homebrew:\n",
    "    * brew install tesseract\n",
    "    * brew install poppler\n",
    "* For other systems (Windows, etc):\n",
    "    * [info on how to install tesseract](https://tesseract-ocr.github.io/tessdoc/Installation.html)\n",
    "    * [info on how to install poppler](https://pdf2image.readthedocs.io/en/latest/installation.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "076da1c8-1791-47aa-b340-51ac3e7f4bd4",
   "metadata": {},
   "source": [
    "#### We will use a fake startupai-financial-report-v2.pdf file with text, tables and images"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fc0e708-5cc4-447e-8ffa-48d8714b2567",
   "metadata": {},
   "source": [
    "#### What we will do next:\n",
    "* Import partition_pdf from the unstructured package\n",
    "* We set the tesseract_cmd to the path where we store our tesseract.exe file\n",
    "* We set input_path and output_path\n",
    "* We then create the raw_pdf_elements and run the partition_pdf function from the unstructured package:\n",
    "    * we set the filename and path\n",
    "    * we instruct unstructured to extract all the relevant parts of the file (text, tables and images)\n",
    "    * we set chunking strategy\n",
    "    * we set the output path\n",
    "* The following cell can take a few seconds to run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "b9550ca4-4626-4447-bc04-c9d5baf7a20b",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "This function will be deprecated in a future release and `unstructured` will simply use the DEFAULT_MODEL from `unstructured_inference.model.base` to set default model name\n",
      "Some weights of the model checkpoint at microsoft/table-transformer-structure-recognition were not used when initializing TableTransformerForObjectDetection: ['model.backbone.conv_encoder.model.layer2.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer3.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer4.0.downsample.1.num_batches_tracked']\n",
      "- This IS expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "- This IS NOT expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n"
     ]
    }
   ],
   "source": [
    "from typing import Any\n",
    "import os\n",
    "from unstructured.partition.pdf import partition_pdf\n",
    "import pytesseract\n",
    "import os\n",
    "\n",
    "pytesseract.pytesseract.tesseract_cmd = r'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'\n",
    "\n",
    "input_path = os.getcwd()\n",
    "output_path = os.path.join(os.getcwd(), \"figures\")\n",
    "\n",
    "# Get elements\n",
    "raw_pdf_elements = partition_pdf(\n",
    "    filename=os.path.join(input_path, \"startupai-financial-report-v2.pdf\"),\n",
    "    extract_images_in_pdf=True,\n",
    "    infer_table_structure=True,\n",
    "    chunking_strategy=\"by_title\",\n",
    "    max_characters=4000,\n",
    "    new_after_n_chars=3800,\n",
    "    combine_text_under_n_chars=2000,\n",
    "    image_output_dir_path=output_path,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ca91cc2-7ad1-478a-9963-e505a6ce6cd9",
   "metadata": {},
   "source": [
    "## See what we have in the raw_pdf_elements variable\n",
    "* The classes with CompositeElements are text\n",
    "* The classes with Table are tables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "0d1e230b-9aba-4938-a952-7c43fe99b46f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<unstructured.documents.elements.CompositeElement at 0x31a909650>,\n",
       " <unstructured.documents.elements.Table at 0x31fa35250>,\n",
       " <unstructured.documents.elements.CompositeElement at 0x31fac7950>]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "raw_pdf_elements"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cda0795a-453c-4949-b7fe-a30eee550c56",
   "metadata": {},
   "source": [
    "## Now we want to extract the relevant information\n",
    "* We want to store the text, table and image elements in 3 lists.\n",
    "* We cannot send the images as they are, we need to convert them into binary format with base64.\n",
    "* For the text and table elements we will loop to add them in their list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "866aa4a0-01f3-427b-adc9-c248914ec6fc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "number of table elements in the pdf file:  1\n",
      "number of text elements in the pdf file:  2\n"
     ]
    }
   ],
   "source": [
    "import base64\n",
    "\n",
    "text_elements = []\n",
    "table_elements = []\n",
    "image_elements = []\n",
    "\n",
    "# Function to encode images\n",
    "def encode_image(image_path):\n",
    "    with open(image_path, \"rb\") as image_file:\n",
    "        return base64.b64encode(image_file.read()).decode('utf-8')\n",
    "\n",
    "# We create the text and table elements in 2 steps\n",
    "# Step 1: append the entire class in the list\n",
    "for element in raw_pdf_elements:\n",
    "    #Text elements have CompositeElement in the string of their type name\n",
    "    if 'CompositeElement' in str(type(element)):\n",
    "        text_elements.append(element)\n",
    "    #Table element have Table in the string of their type name\n",
    "    elif 'Table' in str(type(element)):\n",
    "        table_elements.append(element)\n",
    "\n",
    "# Step 2: extract just the text, we don't want to store the raw classes\n",
    "table_elements = [i.text for i in table_elements]\n",
    "text_elements = [i.text for i in text_elements]\n",
    "\n",
    "# Tables\n",
    "print(\"number of table elements in the pdf file: \", len(table_elements))\n",
    "\n",
    "# Text\n",
    "print(\"number of text elements in the pdf file: \", len(text_elements))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9814ec5b-55f9-4bb9-90ba-d609c38bbb17",
   "metadata": {},
   "source": [
    "#### Images\n",
    "* They are currently stored in the \"figures\" folder.\n",
    "* We will loop through that folder:\n",
    "    * check if the image file ends with png, jpg, jpeg\n",
    "    * then provide the full page to the encode_image function to encode it in a base64 format\n",
    "    * and then enter the encoding result in the image list\n",
    "* The following cell may take a few seconds to run. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "18c058ca-4521-4b9b-8329-27a7f343cda8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "number of image elements in the pdf file:  8\n"
     ]
    }
   ],
   "source": [
    "for image_file in os.listdir(output_path):\n",
    "    if image_file.endswith(('.png', '.jpg', '.jpeg')):\n",
    "        image_path = os.path.join(output_path, image_file)\n",
    "        encoded_image = encode_image(image_path)\n",
    "        image_elements.append(encoded_image)\n",
    "print(\"number of image elements in the pdf file: \",len(image_elements))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa509350-76b5-4ee1-83f0-56351eae3423",
   "metadata": {},
   "source": [
    "## Now we can create 3 functions to summarize the texts, table and images\n",
    "* for the text and table the functions are very similar\n",
    "* for the images we use GPT4V\n",
    "* pay attention on how we set the url"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "8b027331-d12f-4849-a62c-9a8b1781a857",
   "metadata": {},
   "outputs": [],
   "source": [
    "#!pip install langchain-openai"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0060d24-0493-419f-95db-17e79d8a1abb",
   "metadata": {},
   "source": [
    "Remember to load your OpenAI API key in the .env file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "6df5cc87-afd8-48bd-9353-8885f68a898e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from dotenv import load_dotenv, find_dotenv\n",
    "_ = load_dotenv(find_dotenv())\n",
    "openai_api_key = os.environ[\"OPENAI_API_KEY\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "8fe4290c-04ca-4f45-8382-5aa2e62a32e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_openai import ChatOpenAI\n",
    "from langchain.schema.messages import HumanMessage, AIMessage\n",
    "\n",
    "chain_gpt_35 = ChatOpenAI(model=\"gpt-3.5-turbo\", max_tokens=1024)\n",
    "chain_gpt_4_vision = ChatOpenAI(model=\"gpt-4-vision-preview\", max_tokens=1024)\n",
    "\n",
    "# Function for text summaries\n",
    "def summarize_text(text_element):\n",
    "    prompt = f\"Summarize the following text:\\n\\n{text_element}\\n\\nSummary:\"\n",
    "    response = chain_gpt_35.invoke([HumanMessage(content=prompt)])\n",
    "    return response.content\n",
    "\n",
    "# Function for table summaries\n",
    "def summarize_table(table_element):\n",
    "    prompt = f\"Summarize the following table:\\n\\n{table_element}\\n\\nSummary:\"\n",
    "    response = chain_gpt_35.invoke([HumanMessage(content=prompt)])\n",
    "    return response.content\n",
    "\n",
    "# Function for image summaries\n",
    "def summarize_image(encoded_image):\n",
    "    prompt = [\n",
    "        AIMessage(content=\"You are a bot that is good at analyzing images.\"),\n",
    "        HumanMessage(content=[\n",
    "            {\n",
    "                \"type\": \"text\", \n",
    "                \"text\": \"Describe the contents of this image.\"},\n",
    "            {\n",
    "                \"type\": \"image_url\",\n",
    "                \"image_url\": {\n",
    "                    \"url\": f\"data:image/jpeg;base64,{encoded_image}\"\n",
    "                },\n",
    "            },\n",
    "        ])\n",
    "    ]\n",
    "    response = chain_gpt_4_vision.invoke(prompt)\n",
    "    return response.content"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "418d6007-d3e4-484d-a682-616111fa004f",
   "metadata": {},
   "source": [
    "## Now we will create a summary for each text, table and image element\n",
    "* The following cells will take some time to run.\n",
    "* Careful: GPT4V is significantly more expensive than the regular GPT models.\n",
    "* Note: If you try to summarize all them, the Jupyter Kernel may crash occassionally. It that happens, you will have to run it again. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "3d0d9e75-91cb-4c0f-899d-612eb2c30f08",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1th element of texts processed.\n",
      "2th element of texts processed.\n"
     ]
    }
   ],
   "source": [
    "# Processing text elements, stopping at the 2nd\n",
    "text_summaries = []\n",
    "for i, te in enumerate(text_elements[0:2]):\n",
    "    summary = summarize_text(te)\n",
    "    text_summaries.append(summary)\n",
    "    print(f\"{i + 1}th element of texts processed.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "cf6a8880-2390-4217-8874-f27dea4c521b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1th element of tables processed.\n"
     ]
    }
   ],
   "source": [
    "# Processing table elements, stopping at the 1st\n",
    "table_summaries = []\n",
    "for i, te in enumerate(table_elements[0:1]):\n",
    "    summary = summarize_table(te)\n",
    "    table_summaries.append(summary)\n",
    "    print(f\"{i + 1}th element of tables processed.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "7fea3c3e-df33-4596-989a-8245b190507e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1th element of images processed.\n",
      "2th element of images processed.\n",
      "3th element of images processed.\n",
      "4th element of images processed.\n",
      "5th element of images processed.\n",
      "6th element of images processed.\n",
      "7th element of images processed.\n",
      "8th element of images processed.\n"
     ]
    }
   ],
   "source": [
    "# Processing image elements, stopping at the 8th\n",
    "image_summaries = []\n",
    "for i, ie in enumerate(image_elements[0:8]):\n",
    "    summary = summarize_image(ie)\n",
    "    image_summaries.append(summary)\n",
    "    print(f\"{i + 1}th element of images processed.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c56c27c-96f8-411f-b4af-73a111f90989",
   "metadata": {},
   "source": [
    "## After creating the summaries, we can now proceed with the RAG technique\n",
    "* We will use LangChain's [Multi-Vector Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector) to store:\n",
    "    * all our documents,\n",
    "    * the summaries,\n",
    "    * and also the embeddings\n",
    "    * in a vector database\n",
    "* We will use a Chroma vector database\n",
    "* We will use a docstore to store the raw documents (the original documents). For that we will use InMemoryStore from LangChain.\n",
    "* We will provide later the id_key, now it is just a string\n",
    "* Then we create the function to add documents to the retriever\n",
    "    * create some uuids for our documents using the uuid4() function.\n",
    "        * The uuid4 function in Python is part of the uuid module, which generates unique identifiers according to the UUID (Universally Unique Identifier) standard.\n",
    "        * The uuid4 function specifically generates a random UUID based on the version 4 specification. This means that each time you call uuid4, it generates a completely random UUID that is highly unlikely to be duplicated anywhere else, now or in the future.\n",
    "        * A UUID generated by uuid4 looks something like this: 12345678-1234-5678-1234-567812345678, where each digit is a hexadecimal character (0-9, a-f), representing a 128-bit value.\n",
    "        * The version 4 UUIDs are useful for situations where you need to ensure uniqueness across different systems without the need for a central coordinating mechanism.\n",
    "    * then we will create a list of documents using the Document class\n",
    "        * for each page_content, we include the summary of the element\n",
    "        * for each metadata, we enter the uuid\n",
    "    * Then we add the documents to the vector database\n",
    "    * We also store our raw documents in the docstore. Each raw document has the corresponding uuid.\n",
    "    * As you can see, the connection between the vector database and the docstore is in the uuids."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "fff2267a-47d5-4a14-adfb-6f550927d20e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import uuid\n",
    "\n",
    "from langchain_openai import OpenAIEmbeddings\n",
    "from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
    "from langchain.schema.document import Document\n",
    "from langchain.storage import InMemoryStore\n",
    "from langchain.vectorstores import Chroma\n",
    "\n",
    "# Initialize the Chroma vector database and docstore\n",
    "vectorstorev2 = Chroma(collection_name=\"summaries\", embedding_function=OpenAIEmbeddings())\n",
    "storev2 = InMemoryStore()\n",
    "id_key = \"doc_id\"\n",
    "\n",
    "# Initialize the multi-vector retriever\n",
    "retrieverv2 = MultiVectorRetriever(vectorstore=vectorstorev2, docstore=storev2, id_key=id_key)\n",
    "\n",
    "# Function to add documents to the multi-vector retriever\n",
    "def add_documents_to_retriever(summaries, original_contents):\n",
    "    doc_ids = [str(uuid.uuid4()) for _ in summaries]\n",
    "    summary_docs = [\n",
    "        Document(page_content=s, metadata={id_key: doc_ids[i]})\n",
    "        for i, s in enumerate(summaries)\n",
    "    ]\n",
    "    retrieverv2.vectorstore.add_documents(summary_docs)\n",
    "    retrieverv2.docstore.mset(list(zip(doc_ids, original_contents)))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3d09c50-b99a-478a-901a-17ae2f703296",
   "metadata": {},
   "source": [
    "## Now we can add everything to the Multivector Retriever\n",
    "* for text and tables:\n",
    "    * summaries are stored in the vector database.\n",
    "    * raw documents are stored in the docstore.\n",
    "* for the images:\n",
    "    * summaries are stored in the vector database.\n",
    "    * summaries (not the raw images) are also stored in the docstore.   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "cceadb36-b3c4-4125-900a-03709dcec5f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add text summaries\n",
    "add_documents_to_retriever(text_summaries, text_elements)\n",
    "\n",
    "# Add table summaries\n",
    "add_documents_to_retriever(table_summaries, table_elements)\n",
    "\n",
    "# Add image summaries\n",
    "add_documents_to_retriever(image_summaries, image_summaries)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37d032b9-c6d6-4170-98ae-dc91605ebb90",
   "metadata": {},
   "source": [
    "## After adding that, we can now retrieve the information"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "0a37dece-ec38-4359-874a-9bd848e71060",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['The image contains the word \"STATEMENT\" in capital letters, set against a solid blue background. The text is white, making it stand out prominently against the blue. The font is bold and sans-serif, which gives it a strong and clear appearance. There are no other graphics or text in the image.',\n",
       " \"The image shows a modern graphics card, which is a piece of computer hardware responsible for rendering images, videos, and animations for the display. This card has a dual-fan cooling solution, with two large fans on the top to dissipate heat. You can see the heatsink fins beneath the fans, which help to increase the surface area for better thermal performance.\\n\\nOn the side facing us, there's a bracket with various ports that indicate this card's output options, which typically include DisplayPort, HDMI, and sometimes DVI or USB-C. The presence of multiple outputs suggests it can support multi-monitor setups.\\n\\nThe card is mounted on a black printed circuit board (PCB) with a PCIe (Peripheral Component Interconnect Express) connector at the bottom, which is used to slot the card into the motherboard of a computer.\\n\\nThere are also some branding elements visible, but due to the guidelines, I will not identify the specific brand or model of the graphics card. The overall design suggests that it's intended for gaming or high-performance computing tasks that require substantial graphical processing power.\",\n",
       " 'The image displays the text \"FINANCIAL STATEMENT\" in large, bold, capital letters. The text is in a bright orange color set against a navy blue background. The font is sans-serif, which gives it a modern and clean appearance. The text is the focal point of this image and there are no other graphics or elements visible.',\n",
       " 'The image displays the text \"$22,000,000 SALES\" in bold, capitalized letters. The text is in a shade of blue or purple, set against a white background. The dollar sign and the numbers indicate that the text is referring to a financial figure, specifically twenty-two million dollars in sales.']"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "retrieverv2.get_relevant_documents(\n",
    "    \"What do you see in the images?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "040b58e0-b9b5-4e83-b28c-da07638a3ef1",
   "metadata": {},
   "source": [
    "#### Now, if we use the multi-vector retriever as the context:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "3d6c41db-3b39-4c01-8d01-99dc504923f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.schema.runnable import RunnablePassthrough\n",
    "from langchain.prompts import ChatPromptTemplate\n",
    "from langchain.schema.output_parser import StrOutputParser\n",
    "\n",
    "template = \"\"\"Answer the question based only on the following context, which can include text, images and tables:\n",
    "{context}\n",
    "Question: {question}\n",
    "\"\"\"\n",
    "prompt = ChatPromptTemplate.from_template(template)\n",
    "\n",
    "model = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo\")\n",
    "\n",
    "chain = (\n",
    "    {\"context\": retrieverv2, \"question\": RunnablePassthrough()}\n",
    "    | prompt\n",
    "    | model\n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9686d003-1f2c-4698-ba91-a305d7698913",
   "metadata": {},
   "source": [
    "#### Then we can make questions about text, images or tables in the document"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "3eba9409-4db0-4b95-95a5-92f9e4c9ef38",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Based on the context provided, the images in the database contain the following elements:\\n1. A statement displayed in bold white text against a blue background.\\n2. A financial figure of $22,000,000 in sales in bold blue or purple text on a white background.\\n3. The text \"FINANCIAL STATEMENT\" in large orange letters on a navy blue background.\\n4. A donut chart with two segments - one orange segment covering around 67% and one purple segment covering around 33%. In the center of the chart, there is text that reads \"33% ROI\" indicating the Return on Investment percentage.'"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke(\n",
    "     \"What do you see on the images in the database?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "6713b558-a029-4da1-b3ca-51e63c7c88e4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The name of the company is StartupAI.'"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke(\n",
    "     \"What is the name of the company?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "89332ccb-b326-47b2-a2f3-7277929025b4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The product displayed in the image is a modern graphics card.'"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke(\n",
    "     \"What is the product displayed in the image?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "04ae8143-6c7c-46fe-8449-0ae9fcf40caf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The total expenses of the company are $2,000,000.'"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke(\n",
    "     \"How much are the total expenses of the company?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "6df50141-a858-43c5-baeb-132b3faba90d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The ROI is 33%.'"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke(\n",
    "     \"What is the ROI?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "df0f1601-ca0c-4894-848c-1acf98257579",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The company sold $22 million in 2023.'"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke(\n",
    "     \"How much did the company sell in 2023?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37f169b7-d2d6-4677-88e1-9b48d63c0c61",
   "metadata": {},
   "source": [
    "* Note: see that the previous answer can be seen as a mistake if we look at the bar chart, but we have to admit that the pdf is a bit confusing about it since it highlights de 22M sales in 2 different places."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "087f5577-7400-4bff-828f-c8961afc7342",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'In 2022, the approximate value represented on the bar chart is 15 units.'"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke(\n",
    "     \"And in 2022?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f47596b-212f-47db-a42c-90be3ff0c659",
   "metadata": {},
   "source": [
    "* Note: see that now GPT4 is taking the right sales data from the bar chart. Impressive!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "acaf814c-9f2a-4b28-9d1b-9fdbcbfb4997",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}