WEBVTT

00:00.160 --> 00:05.960
So today we're going to talk about chat model objects and LLM objects

00:05.960 --> 00:06.840
in LangChain.

00:08.000 --> 00:15.560
The chat model object is often going to be our primary interface for interacting with large language

00:15.560 --> 00:24.760
models, and it is the standard way LangChain helps us talk to those LLMs, like GPT-4, Anthropic

00:24.760 --> 00:32.600
Claude, Google Gemini, and even open-source models like Llama by Meta via Ollama.

00:34.560 --> 00:43.240
So historically, many LLMs just took a single string of text as input and returned a single string

00:43.240 --> 00:44.120
of text.

00:44.360 --> 00:52.280
However, modern LLMs are designed for conversation and interaction with a user, so they work best

00:52.280 --> 00:59.200
when we provide them with a list of messages representing the dialogue and the history, and they return

00:59.200 --> 01:00.620
us a message back.

01:00.900 --> 01:03.900
And this is the core of the chat model interface.

01:04.220 --> 01:11.260
The input is going to be a list of structured messages like a system instruction, user questions or

01:11.500 --> 01:12.380
responses.

01:12.620 --> 01:18.780
And the output is going to be an AIMessage representing the response of the large language model.

01:19.180 --> 01:27.860
However, chat model objects are very powerful, and beyond generating human-like text, they have very

01:27.860 --> 01:29.220
cool capabilities.

01:29.540 --> 01:34.820
So first of all, they have tool calling, for LLMs that support tool calling.

01:34.980 --> 01:38.820
And this allows the LLM to interact with the outside world.

01:39.140 --> 01:45.020
So let's take the example of a math calculation. Without tool calling, the LLM could simply hallucinate

01:45.020 --> 01:47.220
the answer to the math equation.

01:47.340 --> 01:53.580
However, with tool calling, it can select and execute a math tool to provide the answer. And LangChain

01:53.580 --> 02:01.290
provides us a standard way to bind these tools to our LLMs, letting us build agents that can perform

02:01.290 --> 02:08.010
actions based on user requests, like sending emails, fetching data from a database, or integrating

02:08.010 --> 02:09.730
with any external API.

02:09.890 --> 02:15.290
And we dive very deep into tool calling, tools, and tool execution in this course.

02:15.970 --> 02:22.250
However, calling tools is not the only thing we can do with tool calling.

02:22.250 --> 02:28.610
Under the hood, it can also be used to extract structured information from unstructured text.

02:28.610 --> 02:31.970
And we'll be elaborating on this in the course.

02:32.610 --> 02:38.570
And this gives us another key capability, which is called structured output.

02:38.890 --> 02:43.210
So often we don't want free-form text as output.

02:43.810 --> 02:52.410
We might want a specific format like JSON or a Pydantic object that our application can easily process

02:52.490 --> 02:53.250
downstream.

02:53.810 --> 02:58.990
For example, if a user provides contact details in a sentence,

02:59.150 --> 03:06.950
we might want to extract the name, the email, and the phone number into a predictable JSON or Pydantic object.

03:07.230 --> 03:13.790
And modern chat models can be prompted or explicitly told to respond in this kind of format with the

03:13.830 --> 03:16.190
schema that we give them.

03:16.430 --> 03:22.990
LangChain simplifies this with methods like with_structured_output, which we are going to cover in this

03:22.990 --> 03:23.750
course as well.

03:24.430 --> 03:27.190
And now let's talk about multimodality.

03:27.870 --> 03:35.150
So while text is the base for everything, models these days are increasingly capable of processing

03:35.190 --> 03:39.030
other data types, particularly images and videos.

03:39.310 --> 03:45.350
And you could potentially send a picture along with your text prompt, for example, asking the model to

03:45.350 --> 03:50.870
describe the content of an image, or to analyze it, or do other tasks on that theme.

03:50.910 --> 03:57.620
And LangChain acts as a crucial layer of abstraction over these different models and their capabilities.

03:57.980 --> 04:04.180
So instead of writing specific code for OpenAI, then different code for Anthropic, and other code

04:04.180 --> 04:11.140
for Google, we can simply use LangChain's BaseChatModel interface, which will give us a consistent

04:11.140 --> 04:14.380
way to call the models regardless of the provider.

04:14.860 --> 04:22.420
So we'll find integrations for many popular providers like OpenAI, Anthropic, Google, Azure OpenAI,

04:22.420 --> 04:23.260
and many more.

04:23.420 --> 04:30.780
And they are neatly organized into packages like langchain-openai, langchain-ollama,

04:30.780 --> 04:38.740
langchain-google-vertexai, etc. So this doesn't only standardize the interface when it comes to talking to those models.

04:38.740 --> 04:44.020
We also get crucial features for when we are developing an LLM application.

04:44.260 --> 04:52.780
So the chat models support functionality like asynchronous operations, making multiple calls concurrently;

04:53.460 --> 05:01.560
batch processing, so sending many requests all at once; and a robust streaming API where we get the

05:01.560 --> 05:09.080
model's output token by token as it's generated in real time, providing us with a better user experience,

05:09.080 --> 05:16.920
for example, if we're using a chat. And all of it comes with seamless integration with LangSmith,

05:16.960 --> 05:22.680
their tracing and debugging platform to help us monitor our application.

05:22.880 --> 05:24.960
So in the end, we get

05:24.960 --> 05:26.000
everything together here.

05:26.560 --> 05:27.200
All right.

05:27.200 --> 05:31.240
So let's now discuss how we can interact with these models in LangChain.

05:31.800 --> 05:39.600
And the BaseChatModel interface, which all the chat models derive from, provides us with several key methods.

05:39.600 --> 05:42.560
And the most fundamental one is called invoke.

05:43.400 --> 05:50.080
And invoke is going to take a list of messages and return us a single response message.

05:50.320 --> 05:57.550
We also have the stream method, which is usually for real-time applications, where it will yield output

05:57.550 --> 05:59.390
chunks as they are generated.

05:59.630 --> 06:06.110
And this is very useful if we want to build chat interfaces where we want to get the response token

06:06.110 --> 06:13.990
by token, and if we have many prompts to process, the batch method lets us send all of them efficiently

06:14.150 --> 06:15.110
in groups.

06:15.990 --> 06:22.470
We also have the bind_tools method, which will allow us to attach external tools to the model,

06:22.470 --> 06:29.510
enabling tool calling. And with_structured_output is a convenient wrapper for getting a response in a

06:29.510 --> 06:31.350
structured format directly.

06:31.710 --> 06:37.510
All right, I want to elaborate a bit on the input and output of those LLMs in LangChain.

06:37.630 --> 06:40.750
And these are structured as messages.

06:40.910 --> 06:44.230
And these messages typically have a role.

06:44.230 --> 06:52.050
So we can have a system role for instructions, a human role for the user input or an assistant role

06:52.050 --> 06:59.410
for the AI's response, and the content can be simple text or a list of content blocks, which is where

06:59.410 --> 07:02.130
multimodal data like images can fit in.

07:03.050 --> 07:06.490
All right, let's talk about initializing a chat model.

07:06.690 --> 07:10.770
And when we do that we need to configure it using various parameters.

07:11.210 --> 07:16.250
LangChain standardizes many common ones across the model providers.

07:16.650 --> 07:24.410
You'll almost always specify the model name, like GPT-4 or Claude 3 Sonnet, and the temperature

07:24.410 --> 07:26.970
parameter controls the creativity.

07:27.010 --> 07:35.770
A lower value like 0.0 makes the output more deterministic and more focused, while a higher value like

07:35.770 --> 07:39.610
1.0 makes it more random and creative.

07:40.170 --> 07:46.250
We have other values like max_tokens, which limits the length of the LLM's response, which

07:46.250 --> 07:50.600
is useful for controlling costs and optimizing output size.

07:50.840 --> 07:56.920
We also have stop sequences, which tell the model when to stop generating text, which is helpful

07:57.040 --> 07:59.000
in specific formatting techniques.

07:59.320 --> 08:03.720
And we elaborate a lot on this stop argument in this course.

08:04.640 --> 08:11.640
We have timeout and max_retries, which are critical for robustness and handling network issues

08:11.640 --> 08:17.040
or temporary provider problems, which happen a lot when we scale.

08:17.360 --> 08:24.400
And naturally, we'll of course need to provide the API key, and potentially a base URL if you're using

08:24.400 --> 08:26.920
some kind of cloud-provided service.

08:27.280 --> 08:31.600
So keep in mind that while LangChain standardizes many parameters,

08:31.840 --> 08:35.920
some models have unique parameters specific to the provider.

08:36.160 --> 08:39.440
So always check the specific integration's documentation.

08:40.000 --> 08:45.200
And we are going to see an example of this when we're going to work with Google's Gemini.