{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1469814f-64f6-4a04-a006-94ff259226e7",
   "metadata": {},
   "source": [
    "## LangChain Update: 15 Feb 2024\n",
    "* [LangChain announces general availability of LangSmith and a $25M Series A led by Sequoia Capital](https://blog.langchain.dev/langsmith-ga/)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd51a314-fc1e-4c8b-b86a-c95afd665d9d",
   "metadata": {},
   "source": [
    "#### Highlights\n",
    "* LangSmith moves from beta to general availability.\n",
    "    * It has been **improved through user feedback** since July 2023.\n",
    "    * LangChain is investing further in its infrastructure.\n",
    "* The funding signals trust in the product and supports its future development and stability."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e7d23cc7-82c0-4351-acda-d5f26af85e4b",
   "metadata": {},
   "source": [
    "## LangChain describes the full lifecycle of a professional LLM application, and how to use LangSmith at each stage\n",
    "\n",
    "#### 1. Prototyping: develop the first prototype of the application.\n",
    "* Enable LangSmith tracing from day one of developing a new LLM application. When things go wrong, it’s extremely helpful to debug by looking through the application traces.\n",
    "* Use LangSmith Playground to iterate and experiment.\n",
    "* Use LangSmith Comparison View to compare the performance of alternative prompts, retrieval strategies and model choices.\n",
    "* Use LangSmith to create a Test Dataset. Test datasets are collections of inputs and reference outputs used to evaluate the performance of the LLM application.\n",
    "\n",
    "#### 2. Beta testing: test the prototype with feedback from real users.\n",
    "* Use LangSmith to filter traces with negative human feedback to understand the problems behind them.\n",
    "* Use LangSmith to inspect interesting traces and annotate them.\n",
    "* Use LangSmith to expand the Test Dataset by adding runs as examples.\n",
    "\n",
    "#### 3. Production: monitor the performance of your application and keep improving it.\n",
    "* Use LangSmith to monitor key metrics.\n",
    "* Use LangSmith to mark different versions for A/B testing of prompts, models, or retrieval strategies."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4449ee0b-1833-44bb-afcf-9c204e25c3a6",
   "metadata": {},
   "source": [
    "## 1. Prototyping\n",
    "* Involves quick experimentation with prompts, model types, retrieval strategies, and other parameters.\n",
    "* The ability to rapidly understand how the model is performing — and debug where it is failing — is incredibly important for this phase."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "edff6310-acd7-45e7-82c9-6e34be5fd4bc",
   "metadata": {},
   "source": [
    "## 1.1. Debugging"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43812939-0133-4b81-b73d-541e811759ac",
   "metadata": {},
   "source": [
    "#### Have LangSmith tracing enabled when developing a new LLM Application\n",
    "* When developing new LLM applications, we suggest having LangSmith tracing enabled by default.\n",
    "* It isn’t necessary to look at every single trace. However, **when things go wrong** (an unexpected end result, infinite agent loop, slower than expected execution, higher than expected token usage), **it’s extremely helpful to debug by looking through the application traces**.\n",
    "* LangSmith gives clear visibility and debugging information at each step of an LLM sequence, making it much easier to identify and root-cause issues.\n",
    "* LangSmith provides native rendering of chat messages, functions, and retrieved documents."
   ]
  },
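  {
   "cell_type": "markdown",
   "id": "sketch-tracing-md",
   "metadata": {},
   "source": [
    "The idea of a trace can be illustrated without LangSmith itself. The cell below is a minimal, library-free sketch, assuming a hypothetical `traced` decorator and an in-memory `TRACE` list; it is not LangSmith's implementation. It records each step's name, parent step, inputs, output, error, and latency, which is the kind of per-step information a trace view surfaces. In practice, LangSmith tracing is usually switched on through environment variables such as `LANGCHAIN_TRACING_V2=true` and `LANGCHAIN_API_KEY` rather than custom code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-tracing-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import functools\n",
    "import time\n",
    "import uuid\n",
    "\n",
    "# In-memory stand-in for a trace store (hypothetical; LangSmith stores runs on its platform).\n",
    "TRACE = []\n",
    "_STACK = []  # ids of the steps currently executing, used to record parent/child nesting\n",
    "\n",
    "\n",
    "def traced(func):\n",
    "    '''Record name, parent, inputs, output, error and latency of each call.'''\n",
    "    @functools.wraps(func)\n",
    "    def wrapper(*args, **kwargs):\n",
    "        run = {\n",
    "            'id': str(uuid.uuid4()),\n",
    "            'name': func.__name__,\n",
    "            'parent_id': _STACK[-1] if _STACK else None,\n",
    "            'inputs': {'args': args, 'kwargs': kwargs},\n",
    "            'output': None,\n",
    "            'error': None,\n",
    "        }\n",
    "        _STACK.append(run['id'])\n",
    "        start = time.perf_counter()\n",
    "        try:\n",
    "            run['output'] = func(*args, **kwargs)\n",
    "            return run['output']\n",
    "        except Exception as exc:\n",
    "            run['error'] = repr(exc)\n",
    "            raise\n",
    "        finally:\n",
    "            # Runs are appended when they finish, so children appear before parents.\n",
    "            run['latency_s'] = time.perf_counter() - start\n",
    "            _STACK.pop()\n",
    "            TRACE.append(run)\n",
    "    return wrapper\n",
    "\n",
    "\n",
    "@traced\n",
    "def retrieve(question):\n",
    "    # Placeholder retriever: returns a fixed document.\n",
    "    return ['doc about ' + question]\n",
    "\n",
    "\n",
    "@traced\n",
    "def answer(question):\n",
    "    docs = retrieve(question)\n",
    "    return 'Answer based on ' + docs[0]\n",
    "\n",
    "\n",
    "answer('tracing')\n",
    "for run in TRACE:\n",
    "    print(run['name'], 'has parent:', run['parent_id'] is not None)"
   ]
  },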
  {
   "cell_type": "markdown",
   "id": "28074912-0703-4688-a10c-a10e6c35ada9",
   "metadata": {},
   "source": [
    "#### Use LangSmith Playground to iterate and experiment\n",
    "* LangSmith provides a playground environment for rapid iteration and experimentation.\n",
    "* This allows you to **quickly test out different prompts and models**. You can open the playground from any prompt or model run in your trace.\n",
    "* Every playground run is logged in the system and can be used to create test cases or compare with other runs."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18f613f3-73ed-4bce-99ce-8bd06ae980fd",
   "metadata": {},
   "source": [
    "#### Use LangSmith Comparison View to compare the performance of alternative prompts, retrieval strategies and model choices \n",
    "* When prototyping different versions of your applications and making changes, it’s important to see whether or not you’ve regressed with respect to your initial test cases.\n",
    "* Oftentimes, **changes in the prompt, retrieval strategy, or model choice can have huge implications for the responses produced by your application**.\n",
    "* To get a sense of which variant performs better, it’s useful to view results for different configurations side-by-side on the same datapoints.\n",
    "* LangSmith has invested heavily in a user-friendly comparison view for test runs to track and diagnose regressions in test scores across multiple revisions of your application."
   ]
  },
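  {
   "cell_type": "markdown",
   "id": "sketch-comparison-md",
   "metadata": {},
   "source": [
    "To make the side-by-side idea concrete, the cell below is a minimal sketch (not LangSmith's API) that lines up per-example scores from a baseline and a candidate configuration on the same datapoints and flags regressions. The example ids, scores, and the `compare_variants` helper are hypothetical; in LangSmith the comparison itself is done in the Comparison View over test runs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-comparison-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical per-example scores (0.0 to 1.0) for two configurations\n",
    "# evaluated on the same datapoints.\n",
    "baseline = {'ex-1': 1.0, 'ex-2': 0.5, 'ex-3': 1.0}\n",
    "candidate = {'ex-1': 1.0, 'ex-2': 1.0, 'ex-3': 0.0}\n",
    "\n",
    "\n",
    "def compare_variants(base, cand, tolerance=0.0):\n",
    "    '''Return side-by-side rows and the ids where the candidate scores lower.'''\n",
    "    rows = []\n",
    "    regressions = []\n",
    "    # Only datapoints scored under both configurations are comparable.\n",
    "    for example_id in sorted(base.keys() & cand.keys()):\n",
    "        delta = cand[example_id] - base[example_id]\n",
    "        rows.append((example_id, base[example_id], cand[example_id], delta))\n",
    "        if delta < -tolerance:\n",
    "            regressions.append(example_id)\n",
    "    return rows, regressions\n",
    "\n",
    "\n",
    "rows, regressions = compare_variants(baseline, candidate)\n",
    "for example_id, b, c, delta in rows:\n",
    "    print(f'{example_id}: baseline={b:.2f} candidate={c:.2f} delta={delta:+.2f}')\n",
    "print('regressions:', regressions)"
   ]
  },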
  {
   "cell_type": "markdown",
   "id": "71cae6e8-2370-4109-a6a2-ca7311e9d756",
   "metadata": {},
   "source": [
    "#### Create a Test Dataset with LangSmith\n",
    "* While many developers still ship an initial version of their application based on “vibe checks”, we’ve seen an increasing number of engineering teams start to adopt a more test-driven approach.\n",
    "* LangSmith allows developers to create **test datasets, which are collections of inputs and reference outputs**, and use these to run tests on their LLM applications.\n",
    "* These test cases can be uploaded in bulk, created on the fly, or exported from application traces.\n",
    "* LangSmith also makes it easy to **run custom evaluations (both LLM-based and heuristic) to score test results**."
   ]
  },
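  {
   "cell_type": "markdown",
   "id": "sketch-dataset-md",
   "metadata": {},
   "source": [
    "A test dataset and heuristic evaluators can be sketched in plain Python. The cell below assumes a hypothetical `app` function standing in for an LLM application and two simple heuristic checks (exact match and reference containment); it illustrates the concept rather than LangSmith's evaluation API, where datasets live on the platform and LLM-based evaluators can be used alongside heuristic ones."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-dataset-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# A test dataset: inputs paired with reference outputs (hypothetical examples).\n",
    "dataset = [\n",
    "    {'inputs': {'question': 'What is 2 + 2?'}, 'reference': '4'},\n",
    "    {'inputs': {'question': 'Capital of France?'}, 'reference': 'Paris'},\n",
    "]\n",
    "\n",
    "\n",
    "def app(question):\n",
    "    # Hypothetical stand-in for an LLM application.\n",
    "    canned = {'What is 2 + 2?': '4', 'Capital of France?': 'The capital is Paris.'}\n",
    "    return canned.get(question, '')\n",
    "\n",
    "\n",
    "def exact_match(prediction, reference):\n",
    "    # Heuristic evaluator: 1.0 only if the strings match exactly (ignoring outer whitespace).\n",
    "    return float(prediction.strip() == reference.strip())\n",
    "\n",
    "\n",
    "def contains_reference(prediction, reference):\n",
    "    # Heuristic evaluator: 1.0 if the reference appears anywhere in the prediction.\n",
    "    return float(reference.lower() in prediction.lower())\n",
    "\n",
    "\n",
    "def run_tests(examples, evaluators):\n",
    "    '''Run the app on every example and score it with each evaluator.'''\n",
    "    results = []\n",
    "    for example in examples:\n",
    "        prediction = app(**example['inputs'])\n",
    "        scores = {ev.__name__: ev(prediction, example['reference']) for ev in evaluators}\n",
    "        results.append({'prediction': prediction, 'scores': scores})\n",
    "    return results\n",
    "\n",
    "\n",
    "results = run_tests(dataset, [exact_match, contains_reference])\n",
    "for r in results:\n",
    "    print(r['prediction'], r['scores'])"
   ]
  },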
  {
   "cell_type": "markdown",
   "id": "81ccf212-d6cb-4070-9b89-38d5c0b45ca9",
   "metadata": {},
   "source": [
    "## 2. Beta Testing\n",
    "* Beta testing allows developers to **collect more data on how their LLM applications are performing in real-world scenarios**.\n",
    "* In this phase, it’s important to **understand which types of inputs the app performs well or poorly on**, and exactly how it breaks down in those cases.\n",
    "* **Both feedback collection and run annotation are critical for this workflow**. Together, they help curate test cases that track regressions and improvements, and they inform the development of automatic evaluations."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72c3329c-4da8-4492-8f94-efab5b79ba3b",
   "metadata": {},
   "source": [
    "#### Use LangSmith to filter traces with negative human feedback to understand the problems behind them\n",
    "* When launching your application to an initial set of users, it’s important to **gather human feedback** on the responses it’s producing. This helps draw attention to the most interesting runs and highlight edge cases that are causing problematic responses.\n",
    "* **LangSmith allows you to attach feedback scores to logged traces** (oftentimes, this is hooked up to a feedback button in your app), **then filter on traces that have a specific feedback tag and score**.\n",
    "* A common workflow is to filter on traces that receive a poor user feedback score, then drill down into problematic points using the detailed trace view."
   ]
  },
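  {
   "cell_type": "markdown",
   "id": "sketch-feedback-filter-md",
   "metadata": {},
   "source": [
    "The filter-then-drill-down workflow can be sketched over a local list of trace records. The record layout, the `user_score` feedback key, and the `filter_by_feedback` helper are assumptions for illustration only; in LangSmith, feedback is attached to logged runs (for example with the Python SDK's `Client.create_feedback`) and filtering is done on the platform."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-feedback-filter-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical trace records with attached user feedback (1 = thumbs up, 0 = thumbs down).\n",
    "traces = [\n",
    "    {'id': 'run-1', 'input': 'Reset my password', 'feedback': {'user_score': 1}},\n",
    "    {'id': 'run-2', 'input': 'Cancel my order', 'feedback': {'user_score': 0}},\n",
    "    {'id': 'run-3', 'input': 'Track my parcel', 'feedback': {}},\n",
    "    {'id': 'run-4', 'input': 'Change my address', 'feedback': {'user_score': 0}},\n",
    "]\n",
    "\n",
    "\n",
    "def filter_by_feedback(records, key, max_score):\n",
    "    '''Keep only traces that have feedback `key` with a score <= max_score.'''\n",
    "    return [\n",
    "        r for r in records\n",
    "        if key in r['feedback'] and r['feedback'][key] <= max_score\n",
    "    ]\n",
    "\n",
    "\n",
    "# Traces without any feedback (run-3) are excluded rather than treated as negative.\n",
    "negative = filter_by_feedback(traces, 'user_score', max_score=0)\n",
    "print([r['id'] for r in negative])"
   ]
  },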
  {
   "cell_type": "markdown",
   "id": "b1b4e3a5-6ba6-47f4-b808-2d46a7da0168",
   "metadata": {},
   "source": [
    "#### Use LangSmith to inspect interesting traces and enter annotations about them\n",
    "* LangSmith supports sending runs to annotation queues, which **allow annotators to closely inspect interesting traces and annotate them** with respect to different criteria.\n",
    "* Annotators can be PMs, engineers, or even subject matter experts.\n",
    "* This allows users to catch regressions across important evaluation criteria."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65947073-c33d-47a6-ac52-4c04dd898c37",
   "metadata": {},
   "source": [
    "#### LangSmith allows you to expand the Test Dataset by adding runs as examples\n",
    "* As your application progresses through the beta testing phase, it's essential to continue collecting data to refine and improve its performance.\n",
    "* **LangSmith enables you to add runs as examples to datasets** (from both the project page and within an annotation queue), **expanding your test coverage on real-world scenarios**.\n",
    "* This is a key benefit in having your logging system and your evaluation/testing system in the same platform."
   ]
  },
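  {
   "cell_type": "markdown",
   "id": "sketch-runs-to-examples-md",
   "metadata": {},
   "source": [
    "Adding reviewed runs to a dataset can be sketched as below. It assumes hypothetical run records in which an annotator's `correction` becomes the reference output, and it skips runs without a correction or whose input is already covered; in LangSmith this is done from the project page or an annotation queue rather than with this code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-runs-to-examples-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Existing test dataset (hypothetical), one example per input.\n",
    "dataset = [{'input': 'Reset my password', 'reference': 'Use the reset link on the login page.'}]\n",
    "\n",
    "# Hypothetical runs reviewed by an annotator; `correction` holds the expected answer.\n",
    "runs = [\n",
    "    {'input': 'Cancel my order', 'output': 'I cannot help.', 'correction': 'Open Orders and click Cancel.'},\n",
    "    {'input': 'Reset my password', 'output': 'Contact support.', 'correction': 'Use the reset link on the login page.'},\n",
    "    {'input': 'Track my parcel', 'output': 'It is on its way.', 'correction': None},\n",
    "]\n",
    "\n",
    "\n",
    "def add_runs_as_examples(examples, candidate_runs):\n",
    "    '''Append annotated runs as new examples, skipping unannotated or duplicate inputs.'''\n",
    "    seen = {example['input'] for example in examples}\n",
    "    added = 0\n",
    "    for run in candidate_runs:\n",
    "        if run['correction'] is None or run['input'] in seen:\n",
    "            continue\n",
    "        examples.append({'input': run['input'], 'reference': run['correction']})\n",
    "        seen.add(run['input'])\n",
    "        added += 1\n",
    "    return added\n",
    "\n",
    "\n",
    "print(add_runs_as_examples(dataset, runs), 'example(s) added')\n",
    "print(len(dataset), 'examples in dataset')"
   ]
  },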
  {
   "cell_type": "markdown",
   "id": "3493ce92-5d3e-465b-a2f9-be22db623bcc",
   "metadata": {},
   "source": [
    "## 3. Production\n",
    "* The workflows from earlier stages (closely inspecting key data points, growing benchmark datasets, annotating traces, and drilling down into important data in the trace view) remain useful once your app hits production.\n",
    "* However, at the production stage especially, it’s crucial to **get a high-level overview of application performance** with respect to:\n",
    "    * **latency**,\n",
    "    * **cost**,\n",
    "    * **feedback scores**.\n",
    "* This helps ensure that the app delivers desirable results at scale."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ace4aeae-e07b-4107-b24d-3bfd9b8cefa8",
   "metadata": {},
   "source": [
    "#### Use LangSmith to monitor key metrics and to mark different versions for A/B testing of prompts, models, or retrieval strategies\n",
    "* LangSmith provides monitoring charts that allow you to **track key metrics** over time.\n",
    "* You can expand to view metrics for a given period and drill down into a specific data point to get a trace table for that time period — this is especially handy for debugging production issues.\n",
    "* The platform also allows for **tag and metadata grouping, which allows users to mark different versions of their applications** with different identifiers and view how they are performing side-by-side within each chart.\n",
    "* This is **helpful for A/B testing changes in prompt, model, or retrieval strategy**."
   ]
  },
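  {
   "cell_type": "markdown",
   "id": "sketch-ab-monitoring-md",
   "metadata": {},
   "source": [
    "Grouping production traces by a version identifier and summarizing latency, cost, and feedback per group can be sketched as follows. The record fields (`version`, `latency_s`, `cost_usd`, `feedback`), the values, and the `summarize_by_version` helper are hypothetical; LangSmith builds its monitoring charts from logged runs, with versions distinguished by tags or metadata attached to those runs. The sketch uses the median for latency because a few slow requests can skew a mean."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-ab-monitoring-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import defaultdict\n",
    "from statistics import mean, median\n",
    "\n",
    "# Hypothetical production traces tagged with the prompt version that served them.\n",
    "traces = [\n",
    "    {'version': 'prompt-v1', 'latency_s': 1.2, 'cost_usd': 0.004, 'feedback': 1},\n",
    "    {'version': 'prompt-v1', 'latency_s': 2.0, 'cost_usd': 0.006, 'feedback': 0},\n",
    "    {'version': 'prompt-v2', 'latency_s': 0.9, 'cost_usd': 0.003, 'feedback': 1},\n",
    "    {'version': 'prompt-v2', 'latency_s': 1.1, 'cost_usd': 0.003, 'feedback': 1},\n",
    "    {'version': 'prompt-v2', 'latency_s': 1.5, 'cost_usd': 0.005, 'feedback': None},\n",
    "]\n",
    "\n",
    "\n",
    "def summarize_by_version(records):\n",
    "    '''Median latency, mean cost and mean feedback score per version tag.'''\n",
    "    groups = defaultdict(list)\n",
    "    for r in records:\n",
    "        groups[r['version']].append(r)\n",
    "    summary = {}\n",
    "    for version, rs in groups.items():\n",
    "        # Traces without feedback are excluded from the feedback average only.\n",
    "        scores = [r['feedback'] for r in rs if r['feedback'] is not None]\n",
    "        summary[version] = {\n",
    "            'count': len(rs),\n",
    "            'median_latency_s': median([r['latency_s'] for r in rs]),\n",
    "            'mean_cost_usd': mean([r['cost_usd'] for r in rs]),\n",
    "            'mean_feedback': mean(scores) if scores else None,\n",
    "        }\n",
    "    return summary\n",
    "\n",
    "\n",
    "for version, stats in sorted(summarize_by_version(traces).items()):\n",
    "    print(version, stats)"
   ]
  },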
  {
   "cell_type": "markdown",
   "id": "f8a0e2bd-7ffe-45a8-a401-f3959941366a",
   "metadata": {},
   "source": [
    "## The road ahead\n",
    "Our future directions include:\n",
    "* Support for regression testing.\n",
    "* Ability to run online evaluators on a sample of production data.\n",
    "* Better filtering and conversation support.\n",
    "* **Easy deployment of applications with hosted LangServe**.\n",
    "* **Enterprise features to support the administration and security needs** for our largest customers."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
