{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "2ca5bbbf-142b-48ed-89f2-18c63eac1eb0",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from dotenv import load_dotenv, find_dotenv\n",
    "_ = load_dotenv(find_dotenv())\n",
    "openai_api_key = os.environ[\"OPENAI_API_KEY\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42f0647a-3384-4a8d-bc5a-446b58a9ec80",
   "metadata": {},
   "source": [
    "## Basic App: Question & Answering from a Document"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "196e2414-39ff-4643-9c31-b130697e3074",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_openai import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "b191eb7d-a9df-47a6-95ac-fdec0b0b1969",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = OpenAI()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "567ecd8a-34b9-4489-9b6d-7b257919737a",
   "metadata": {},
   "source": [
    "**Load the text file**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "e9801a4d-ecd7-460a-a4db-f0a1711163f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "e01cbe49-38ce-40d2-8e01-01052483fbe5",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = TextLoader(\"data/be-good-and-how-not-to-die.txt\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "afff80c1-1cd1-408e-ac28-2a8594af2de6",
   "metadata": {},
   "outputs": [],
   "source": [
    "document = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f6ff8b1-c63c-48b4-acaf-b959b98e770d",
   "metadata": {},
   "source": [
    "**The document is loaded as a Python list with metadata**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "684f2ccb-9ec4-4384-b875-3f89608292f8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'list'>\n"
     ]
    }
   ],
   "source": [
    "print(type(document))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "3f712992-beac-420c-b94e-1db729bd9508",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n"
     ]
    }
   ],
   "source": [
    "print(len(document))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "6cb6e6d1-765b-4762-8213-0c56e4829427",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'source': 'data/be-good-and-how-not-to-die.txt'}\n"
     ]
    }
   ],
   "source": [
    "print(document[0].metadata)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "eef34dd2-aace-45c9-a87b-c3967370f44f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "You have 1 document.\n"
     ]
    }
   ],
   "source": [
    "print(f\"You have {len(document)} document.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "dfb653b8-8035-40bf-b73c-542c90499486",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Your document has 27419 characters\n"
     ]
    }
   ],
   "source": [
    "print(f\"Your document has {len(document[0].page_content)} characters\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d8fbef1-4447-4ef5-ba21-9211b4486904",
   "metadata": {},
   "source": [
    "**Split the document in small chunks**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "18899367-8f8e-4d47-a61c-8041bafb4cd2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "65f31176-187b-4e0c-819a-dd35b7e61ee3",
   "metadata": {},
   "outputs": [],
   "source": [
    "text_splitter = RecursiveCharacterTextSplitter(\n",
    "    chunk_size=3000,\n",
    "    chunk_overlap=400\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "0147e8c9-70fb-4990-8bf6-c77ea283200a",
   "metadata": {},
   "outputs": [],
   "source": [
    "document_chunks = text_splitter.split_documents(document)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "86820a0a-e307-4bb9-9b69-24284d9f0754",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Now you have 12 chunks.\n"
     ]
    }
   ],
   "source": [
    "print(f\"Now you have {len(document_chunks)} chunks.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1756c471-5af1-4ba4-8f90-cfc70e897155",
   "metadata": {},
   "source": [
    "**Convert text chunks in numeric vectors (called \"embeddings\")**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "2e40b2df-53ad-4984-8972-cf7258b762fb",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "c1e7c4f6-2116-4ce1-8349-640fa6591970",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/juliocolomer/.pyenv/versions/3.11.4/envs/venv020124/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The class `langchain_community.embeddings.openai.OpenAIEmbeddings` was deprecated in langchain-community 0.1.0 and will be removed in 0.2.0. An updated version of the class exists in the langchain-openai package and should be used instead. To use it run `pip install -U langchain-openai` and import as `from langchain_openai import OpenAIEmbeddings`.\n",
      "  warn_deprecated(\n"
     ]
    }
   ],
   "source": [
    "embeddings = OpenAIEmbeddings()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67881eb4-7418-4423-b4b9-b750e4a579cf",
   "metadata": {},
   "source": [
    "**Load the embeddings to a vector database**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "75acf1fc-25a4-437c-bd1b-7310a83bae9e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.vectorstores import FAISS"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "014dc49b-498a-49b0-883c-f323769320b3",
   "metadata": {},
   "source": [
    "*Careful: the next operation is expensive in OpenAI*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "8a2e4dbc-af28-497f-877a-933347c747ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "stored_embeddings = FAISS.from_documents(document_chunks, embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40ae930c-5d99-45c5-8597-52457d22b6d1",
   "metadata": {},
   "source": [
    "**Create a Retrieval Question & Answering Chain**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "e30338bd-58ae-467d-a01e-8bef3ae435b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chains import RetrievalQA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "d9bed1ac-d148-4fb1-9382-bc58db111d73",
   "metadata": {},
   "outputs": [],
   "source": [
    "QA_chain = RetrievalQA.from_chain_type(\n",
    "    llm=llm,\n",
    "    chain_type=\"stuff\",\n",
    "    retriever=stored_embeddings.as_retriever()\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71a96c6d-a473-4ecc-885d-55f1618487d4",
   "metadata": {},
   "source": [
    "**Now we have a Question & Answering APP**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "c432c922-984b-49b9-b3c1-4a6c5d2ccd00",
   "metadata": {},
   "outputs": [],
   "source": [
    "question = \"\"\"\n",
    "What is this article about? \n",
    "Describe it in less than 100 words.\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "7d407274-69c0-4d7f-8def-5e2275027a41",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/juliocolomer/.pyenv/versions/3.11.4/envs/venv020124/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `run` was deprecated in LangChain 0.1.0 and will be removed in 0.2.0. Use invoke instead.\n",
      "  warn_deprecated(\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'\\nThis article is about the importance of perseverance and not giving up in startup companies. The author shares his experiences and observations from working with startups and emphasizes the need to keep pushing through difficult times in order to achieve success. He also discusses the role of motivation and the fear of failure in driving founders to succeed.'"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "QA_chain.run(question)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "825ce125-42c2-49bd-806b-1c3400778ba3",
   "metadata": {},
   "source": [
    "## New way: with LCEL"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "6adbc897-7d4f-4a81-a2bb-229489891849",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.prompts import PromptTemplate\n",
    "\n",
    "template = \"\"\"Answer the question based only on the following context:\n",
    "{context}\n",
    "\n",
    "Question: {question}\n",
    "\"\"\"\n",
    "prompt = PromptTemplate.from_template(template)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "f10dd0bf-2fab-483f-851e-324b5ede2045",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = stored_embeddings.as_retriever()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "b1bf635b-99c7-4799-80a2-c882b4836718",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.schema.output_parser import StrOutputParser\n",
    "from langchain.schema.runnable import RunnablePassthrough"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "e0561e55-6dac-49c1-8373-6e0d705c3182",
   "metadata": {},
   "outputs": [],
   "source": [
    "chain = (\n",
    "    {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
    "    | prompt\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "b87314dc-415e-4a24-938a-61b051ee9d9a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\nThis article discusses the importance of creating something that people want in order to have a successful startup. It also emphasizes the idea of not worrying too much about the business model at first and instead focusing on building a great product. The article also touches on the concept of failure and how it can be a motivating factor for founders to succeed. Additionally, it mentions the high likelihood of facing challenges and obstacles in a startup, but encourages readers to not give up and to keep pushing through. Overall, the article highlights the key elements and mindset needed for a startup to thrive.'"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke(\"What is this article about? Describe it in less than 100 words.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "77e6d414-8c1c-43bb-aa4d-05004868f608",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
