{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e4fc491a-0270-4f50-9f54-49c81d5d1fd7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from dotenv import load_dotenv, find_dotenv\n",
    "_ = load_dotenv(find_dotenv())\n",
    "openai_api_key = os.environ[\"OPENAI_API_KEY\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53569208-eefb-400b-92f9-f63a88352ba5",
   "metadata": {},
   "source": [
    "## Basic Use Case: Summarize a Text File"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "bd631867-f2c0-4005-b3fc-6b90794f77b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "#!pip install openai"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "e350f508-0fed-419c-97f4-e2701e06d61e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_openai import OpenAI\n",
    "llm = OpenAI(openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e98370b-dc76-4aa6-8e30-9514127e818e",
   "metadata": {},
   "source": [
    "**Load the text file**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "af73a1cf-a538-43d3-bde6-ba548648b42f",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('data/be-good-and-how-not-to-die.txt', 'r') as file:\n",
    "    article = file.read()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c9aa63f3-366c-46a0-85b6-839ae9b144cd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'str'>\n"
     ]
    }
   ],
   "source": [
    "print(type(article))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97602f83-729c-4919-95dc-8116baaa1727",
   "metadata": {},
   "source": [
    "**Print the first 285 characters of the article**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "8f4cfd7a-5c15-436b-897d-04b38562963a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Be good\n",
      "\n",
      "April 2008(This essay is derived from a talk at the 2008 Startup School.)About a month after we started Y Combinator we came up with the\n",
      "phrase that became our motto: Make something people want.  We've\n",
      "learned a lot since then, but if I were choosing now that's still\n",
      "the one \n"
     ]
    }
   ],
   "source": [
    "print(article[:285])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "051cd10b-cf73-4eb0-9125-1090a8d930ce",
   "metadata": {},
   "source": [
    "**Check how many tokens are in the article**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "e8dfe7bc-2f8c-44c8-b097-d0572aa7b384",
   "metadata": {},
   "outputs": [],
   "source": [
    "num_tokens = llm.get_num_tokens(article)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "1abaf389-0b97-4492-98e9-67f049b89fe0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are 6284 in the article.\n"
     ]
    }
   ],
   "source": [
    "print(f\"There are {num_tokens} in the article.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab9738ed-b07d-462e-824a-e1e0e3419e8c",
   "metadata": {},
   "source": [
    "**ChatGPT-3.5 has a context window of 4097 tokens**\n",
    "* This means that we cannot enter all this text in chatGPT to summarize it."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c2a972a-d3c1-4aac-b57d-6a814cc021cc",
   "metadata": {},
   "source": [
    "**Split the article in smaller chunks**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "cf6ef48d-f633-4ace-8477-6b55c33f4b57",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "74f79206-f831-443f-8253-6f81b440a475",
   "metadata": {},
   "outputs": [],
   "source": [
    "text_splitter = RecursiveCharacterTextSplitter(\n",
    "    separators=[\"\\n\\n\", \"\\n\"], \n",
    "    chunk_size=5000,\n",
    "    chunk_overlap=350\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "a8d7f87d-61e9-49bc-97ff-4793c0d28ca8",
   "metadata": {},
   "outputs": [],
   "source": [
    "article_chunks = text_splitter.create_documents([article])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "a1e3cfa5-db7b-4f90-a9ef-ec4dd6bafec6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "You have 8 chunks instead of 1 article.\n"
     ]
    }
   ],
   "source": [
    "print(f\"You have {len(article_chunks)} chunks instead of 1 article.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1128210e-420f-456e-800c-4abfd6584eb8",
   "metadata": {},
   "source": [
    "**Use a chain to help the LLM to summarize the 8 chunks**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "1c0b9634-1fd8-43c0-a720-53475a3570d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chains.summarize import load_summarize_chain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "8fd39d3a-6348-459c-a5b4-feaaa0796365",
   "metadata": {},
   "outputs": [],
   "source": [
    "chain = load_summarize_chain(\n",
    "    llm=llm,\n",
    "    chain_type=\"map_reduce\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "cc94614a-8efc-4cfa-b0ca-f9141d864bbd",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/juliocolomer/.pyenv/versions/3.11.4/envs/venv020124/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `run` was deprecated in LangChain 0.1.0 and will be removed in 0.2.0. Use invoke instead.\n",
      "  warn_deprecated(\n"
     ]
    }
   ],
   "source": [
    "article_summary = chain.run(article_chunks)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "d7af84c6-8a5c-4640-b123-6fdeb51dcb98",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " This passage discusses the importance of exhibiting positive behavior in the business world, particularly in the success of startups. The author emphasizes the benefits of benevolence in attracting investors, creating a positive work environment, and making a company harder to fail. They also discuss the challenges of maintaining benevolence as a company grows and the importance of perseverance and determination in achieving success.\n"
     ]
    }
   ],
   "source": [
    "print(article_summary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36fa1303-d1a5-43a9-8495-6bd43735d4b8",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
