WEBVTT

00:00.360 --> 00:03.224
-: Okay, let me walk you through LangSmith.

00:03.224 --> 00:06.480
LangSmith is really important I think for the workflow

00:06.480 --> 00:09.120
of an AI engineer or a prompt engineer

00:09.120 --> 00:12.090
just because it gives you a very easy,

00:12.090 --> 00:16.170
simple way to trace what's happening with not just prompts

00:16.170 --> 00:17.940
but also chains and then agents.

00:17.940 --> 00:19.680
It becomes increasingly important

00:19.680 --> 00:21.870
when there's lots of different calls going on

00:21.870 --> 00:24.390
and I think that's why LangChain focused on this

00:24.390 --> 00:26.550
as their first kind of major project.

00:26.550 --> 00:29.310
This tutorial is just a shortened version

00:29.310 --> 00:31.860
of the tutorial on their website, which I'll link to.

00:31.860 --> 00:35.070
But in particular the things to note are that

00:35.070 --> 00:37.080
in order to get it running you just need

00:37.080 --> 00:40.530
to use LangChain; you don't really need anything else.

00:40.530 --> 00:42.780
And there are a few different things you need to just add

00:42.780 --> 00:43.980
as environment variables.

00:43.980 --> 00:47.670
You need this LangChain endpoint so that,

00:47.670 --> 00:51.300
whenever the tracing happens within LangChain,

00:51.300 --> 00:54.210
it sends this to LangSmith.

00:54.210 --> 00:58.340
And then there's also a project name you can give it.

00:58.340 --> 01:00.720
In this case I'll just put in a unique_id

01:00.720 --> 01:03.300
and then also you just need to set TRACING_V2

01:03.300 --> 01:04.860
equal to true.

01:04.860 --> 01:06.900
That's all you need to get set up. After that,

01:06.900 --> 01:09.030
whenever you run anything in LangChain,

01:09.030 --> 01:12.300
it's gonna trace, which is really helpful.

01:12.300 --> 01:14.190
You also need to set up your API key,

01:14.190 --> 01:16.470
which you get from within LangSmith.
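
NOTE
Editor's sketch of the setup described above, not the speaker's exact notebook.
The variable names follow LangChain's LANGCHAIN_* convention; the project name
format and the placeholder key are assumptions for illustration.
```python
import os
from uuid import uuid4
# Short random suffix so each run lands in its own project (naming is illustrative).
unique_id = uuid4().hex[:8]
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_PROJECT"] = f"Tracing Walkthrough - {unique_id}"
# Placeholder only; setdefault keeps a key that is already exported in the shell.
os.environ.setdefault("LANGCHAIN_API_KEY", "<your-langsmith-api-key>")
```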

01:16.470 --> 01:20.075
And then for this tutorial, the agent uses OpenAI.

01:20.075 --> 01:22.740
It also uses SerpAPI to do searching.

01:22.740 --> 01:25.290
So I've set this up in an environment variable.

01:25.290 --> 01:27.690
I'm not gonna show you my AI API keys,

01:27.690 --> 01:31.590
but that's what the .env file would look like.

01:31.590 --> 01:35.160
And then I'm basically just loading dotenv

01:35.160 --> 01:36.900
with this package, but equally you could

01:36.900 --> 01:39.600
just type it in here if you want.
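
NOTE
Editor's sketch of what a .env file and its loading might look like. The speaker
uses a dotenv package; load_env_file below is a hypothetical standard-library
stand-in, and every value is a placeholder rather than a real key.
```python
import os
import tempfile
from pathlib import Path
def load_env_file(path):
    """Hypothetical stand-in for a dotenv loader: read KEY=VALUE lines from a file."""
    loaded = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        loaded[key.strip()] = value.strip().strip('"').strip("'")
    # Do not overwrite variables that are already set in the environment.
    for key, value in loaded.items():
        os.environ.setdefault(key, value)
    return loaded
# Example .env contents with placeholder values (not real keys).
example = 'OPENAI_API_KEY="sk-placeholder"\nSERPAPI_API_KEY=placeholder\n# LANGCHAIN_API_KEY goes here too\n'
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(example)
    loaded = load_env_file(env_path)
print(sorted(loaded))
```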

01:39.600 --> 01:41.370
So I'm just gonna run that.

01:41.370 --> 01:44.940
And then all you have to do is from langsmith import Client

01:44.940 --> 01:47.550
and instantiate the client, if you want to interact with it,

01:47.550 --> 01:51.330
but you don't actually need this for this demo specifically.

01:51.330 --> 01:52.650
You use the client

01:52.650 --> 01:54.000
if you wanna pull the results back

01:54.000 --> 01:55.410
or if you wanna set up a dataset,

01:55.410 --> 01:56.910
I'll show you what that means.

01:56.910 --> 01:59.190
Okay, everything else is just LangChain code, right?

01:59.190 --> 02:02.580
We're just setting up the ChatOpenAI API,

02:02.580 --> 02:03.510
we've got the tools

02:03.510 --> 02:04.920
and we're setting up an agent

02:04.920 --> 02:09.030
and we're giving it the llm-math and the serpapi tools.

02:09.030 --> 02:11.400
So it can use a calculator

02:11.400 --> 02:13.680
and it can search the web

02:13.680 --> 02:16.290
and this is a ZERO_SHOT_REACT agent.

02:16.290 --> 02:18.000
So I'm just gonna run that

02:18.000 --> 02:20.340
and if you guys haven't done this before,

02:20.340 --> 02:22.080
what's really interesting about this code,

02:22.080 --> 02:23.430
there's two different things.

02:23.430 --> 02:26.070
So one is that you have an agent out of the box

02:26.070 --> 02:28.710
that will do the reason-and-act (ReAct) framework.

02:28.710 --> 02:30.750
So it's gonna, when you give it a task like

02:30.750 --> 02:32.310
how many people live in Canada,

02:32.310 --> 02:34.770
it's first gonna check

02:34.770 --> 02:36.450
whether it needs to use one of these tools

02:36.450 --> 02:37.590
and then it's gonna use it

02:37.590 --> 02:39.090
and then using the output,

02:39.090 --> 02:40.950
then it's gonna check again what does it need to do.

02:40.950 --> 02:43.800
What we want this to do is to search

02:43.800 --> 02:47.130
for the population of Canada and then give you that result.
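
NOTE
Editor's toy sketch of the reason-and-act loop described above, assuming a
scripted stand-in for the model and stub tools. None of this is LangChain's
implementation; the names, the canned search result, and the age of 30 are made up.
```python
def calculator(expression):
    """Toy calculator tool: handles only 'base^exponent' expressions."""
    base, exponent = expression.split("^")
    return str(float(base) ** float(exponent))
def search(query):
    """Stub search tool with a canned result; a real agent would call a search API."""
    return "Canned search result for: " + query
TOOLS = {"Calculator": calculator, "Search": search}
def scripted_model(scratchpad):
    """Stand-in for the LLM: picks the next step from how many observations exist."""
    step = sum(1 for line in scratchpad if line.startswith("Observation:"))
    if step == 0:
        return {"thought": "I need to look this up.", "action": "Search", "input": "example person age"}
    if step == 1:
        # 30 is a hypothetical age, not taken from any real search result.
        return {"thought": "Now I need the power.", "action": "Calculator", "input": "30^0.43"}
    return {"thought": "I now know the final answer.", "final": scratchpad[-1].split(": ", 1)[1]}
def run_agent(question, max_steps=5):
    scratchpad = ["Question: " + question]
    for _ in range(max_steps):
        decision = scripted_model(scratchpad)
        scratchpad.append("Thought: " + decision["thought"])
        if "final" in decision:
            return decision["final"], scratchpad
        # Act: call the chosen tool, then feed the observation back in.
        observation = TOOLS[decision["action"]](decision["input"])
        scratchpad.append("Action: " + decision["action"] + "[" + decision["input"] + "]")
        scratchpad.append("Observation: " + observation)
    raise RuntimeError("agent did not finish within max_steps")
answer, trace = run_agent("What is an example age raised to the 0.43 power?")
print("\n".join(trace))
```
The printed scratchpad is roughly the Thought / Action / Observation sequence
that a trace lets you step through.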

02:47.130 --> 02:48.330
And then there's things like this like

02:48.330 --> 02:49.980
who is Dua Lipa's boyfriend,

02:49.980 --> 02:52.560
what is his age raised to the 0.43 power?

02:52.560 --> 02:54.720
So it would need to search the web

02:54.720 --> 02:56.310
to find Dua Lipa's boyfriend,

02:56.310 --> 02:58.385
but it needs to use the calculator

02:58.385 --> 03:00.090
to do the raising to the power

03:00.090 --> 03:02.670
because LLMs are famously bad at math

03:02.670 --> 03:03.810
and that's really useful.

03:03.810 --> 03:04.980
The other thing that's interesting

03:04.980 --> 03:07.410
about this code is it's using asyncio.

03:07.410 --> 03:09.510
So this is a good simple example

03:09.510 --> 03:12.360
of how you run multiple prompts all at once.

03:12.360 --> 03:14.130
And this is just a list of prompts

03:14.130 --> 03:15.540
that it's gonna run through

03:15.540 --> 03:17.370
and then gather the results at the end.

03:17.370 --> 03:20.010
This is gonna be so much faster actually

03:20.010 --> 03:21.780
than running them one by one.
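
NOTE
Editor's sketch of the asyncio pattern described above, using a fake call that
just sleeps instead of a real agent. fake_agent_call and the 0.05 second delay
are assumptions for illustration; real latencies will differ.
```python
import asyncio
import time
async def fake_agent_call(prompt, delay=0.05):
    """Stand-in for an async agent/LLM call; sleeps to simulate network latency."""
    await asyncio.sleep(delay)
    return "answer to: " + prompt
async def run_all(prompts):
    # Schedule every call at once and gather the results in input order.
    return await asyncio.gather(*(fake_agent_call(p) for p in prompts))
async def run_one_by_one(prompts):
    # Baseline: each call waits for the previous one to finish.
    return [await fake_agent_call(p) for p in prompts]
prompts = [
    "How many people live in Canada?",
    "Who is Dua Lipa's boyfriend? What is his age raised to the .43 power?",
    "Who is Kendall Jenner's boyfriend?",
    "What is 2 raised to the .43 power?",
]
start = time.perf_counter()
concurrent_results = asyncio.run(run_all(prompts))
concurrent_seconds = time.perf_counter() - start
start = time.perf_counter()
sequential_results = asyncio.run(run_one_by_one(prompts))
sequential_seconds = time.perf_counter() - start
print(f"concurrent: {concurrent_seconds:.2f}s, sequential: {sequential_seconds:.2f}s")
```
In a notebook, where an event loop is already running, you would await
run_all(prompts) directly instead of calling asyncio.run.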

03:21.780 --> 03:23.400
So if you're testing a bunch of stuff,

03:23.400 --> 03:25.860
I really recommend you look into async.

03:25.860 --> 03:27.780
So we're just gonna run this

03:27.780 --> 03:31.440
and you see it took 0.3 seconds, super fast,

03:31.440 --> 03:34.890
and that's with the basic GPT-3.5,

03:34.890 --> 03:37.680
but even for GPT-4, this would run pretty fast.

03:37.680 --> 03:40.860
And then you just wanna wait for all tracers,

03:40.860 --> 03:42.540
it's not really necessary for this,

03:42.540 --> 03:45.240
but this is just to make sure that everything was finished

03:45.240 --> 03:47.340
with this asyncio call.

03:47.340 --> 03:48.810
Alright, that's run.

03:48.810 --> 03:51.840
And now we get into LangSmith

03:51.840 --> 03:53.940
and we can take a look and here we go.

03:53.940 --> 03:56.520
This Tracing Walkthrough has been run again.

03:56.520 --> 03:58.050
And actually just before I get into this,

03:58.050 --> 04:00.180
these are the different projects that we have

04:00.180 --> 04:01.560
and you can see when it was run

04:01.560 --> 04:03.420
what the latency was, which in particular is

04:03.420 --> 04:04.950
very useful and interesting

04:04.950 --> 04:06.960
and then how many times it was run

04:06.960 --> 04:08.460
and how many tokens you used.

04:08.460 --> 04:10.470
So I've had some pretty big tests

04:10.470 --> 04:13.050
where I've used hundreds of thousands of tokens,

04:13.050 --> 04:15.540
which takes a long time and costs a lot of money.

04:15.540 --> 04:18.600
And the latency was super bad 'cause I was using GPT-4.

04:18.600 --> 04:20.910
It's really useful for me to be able to go back

04:20.910 --> 04:23.940
and pick this out and see what the difference is.

04:23.940 --> 04:25.620
When you click into the project,

04:25.620 --> 04:27.570
it has a really good interface actually

04:27.570 --> 04:29.520
for searching into different things.

04:29.520 --> 04:33.420
You could, for example, ask: are there any calls

04:33.420 --> 04:35.520
that had used more than

04:35.520 --> 04:37.320
a certain number of tokens or whatever.

04:37.320 --> 04:40.950
You can also search for any errors that you might have.
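
NOTE
Editor's sketch of the kind of filtering described above, done over plain Python
records rather than the LangSmith interface. RunRecord and every value in it are
hypothetical; real traced runs carry many more fields.
```python
from dataclasses import dataclass
from typing import Optional
@dataclass
class RunRecord:
    """Hypothetical shape for one traced run."""
    name: str
    total_tokens: int
    latency_seconds: float
    error: Optional[str] = None
# Made-up records for illustration only.
runs = [
    RunRecord("AgentExecutor", 1250, 4.2),
    RunRecord("AgentExecutor", 310, 1.1),
    RunRecord("AgentExecutor", 0, 0.3, error="AuthenticationError: invalid API key"),
]
# Calls that used more than a certain number of tokens.
heavy = [r for r in runs if r.total_tokens > 1000]
# Calls that ended in an error, and how often that happened.
failed = [r for r in runs if r.error is not None]
error_rate = len(failed) / len(runs)
print(len(heavy), len(failed), round(error_rate, 2))
```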

04:40.950 --> 04:43.770
And actually in this case, there's an error on everything

04:43.770 --> 04:47.610
and that's because this API key is no longer valid, right?

04:47.610 --> 04:51.140
You can identify issues like this.

04:51.140 --> 04:54.990
So I'm gonna go into OpenAI,

04:54.990 --> 04:56.730
I'm gonna put in a new API key

04:56.730 --> 04:58.080
and then we're gonna run again.

04:58.080 --> 04:59.100
And so you can see

04:59.100 --> 05:03.120
this is the type of framework that you're doing.

05:03.120 --> 05:05.310
I'm gonna delete this afterwards,

05:05.310 --> 05:10.310
but if I go up here and I change my OpenAI key to this,

05:12.690 --> 05:14.370
I'm gonna delete this one afterwards

05:14.370 --> 05:17.700
so you guys don't rinse me of tokens

05:17.700 --> 05:21.540
and then I'm gonna run this again and then run this again.

05:21.540 --> 05:24.620
Then we can go back and

05:24.620 --> 05:28.290
iteratively change anything that we find,

05:28.290 --> 05:30.180
any other issues we find with the prompt

05:30.180 --> 05:31.800
and that's what this is doing,

05:31.800 --> 05:33.300
which is super useful I think.

05:33.300 --> 05:35.460
So this is taking 15 seconds

05:35.460 --> 05:39.780
and if we go into LangSmith,

05:39.780 --> 05:41.310
we go back to the Projects.

05:41.310 --> 05:44.250
You see we've got this new run here and now it's all green.

05:44.250 --> 05:45.630
So that's really cool.

05:45.630 --> 05:47.460
We just identified a problem

05:47.460 --> 05:50.310
and we saw an error and we can drill into it.

05:50.310 --> 05:52.080
Here we had a problem

05:52.080 --> 05:53.970
and it's saying ValueError, unknown format.

05:53.970 --> 05:56.460
I'm sorry, I can't provide an answer to that question.

05:56.460 --> 05:58.140
So, this is really interesting

05:58.140 --> 06:00.690
because we saw that one in 10 times

06:00.690 --> 06:04.620
it's gonna say, "No, I don't know how to do this."

06:04.620 --> 06:05.670
So that's really interesting.

06:05.670 --> 06:07.237
You can flag that and say,

06:07.237 --> 06:08.370
"Okay, there's an issue here

06:08.370 --> 06:10.980
that this happens fairly often."

06:10.980 --> 06:13.050
Then for the ones where it did work,

06:13.050 --> 06:15.360
like you can see what the latency was,

06:15.360 --> 06:17.280
you can see how many tokens you used

06:17.280 --> 06:19.740
and you can also just see what it did, right?

06:19.740 --> 06:21.235
What did it do?

06:21.235 --> 06:22.980
So the input was, what is this?

06:22.980 --> 06:24.117
And then the output is this.

06:24.117 --> 06:26.173
And so how did it get there?

06:26.173 --> 06:28.417
So first the LLM chain said,

06:28.417 --> 06:30.450
"Here's the question that I have."

06:30.450 --> 06:31.297
And the output was,

06:31.297 --> 06:33.660
"I need to perform a division calculation

06:33.660 --> 06:35.730
and then I'm gonna take the action of calculator

06:35.730 --> 06:38.850
and then I'm going to use this for the calculator."

06:38.850 --> 06:40.020
Then it called the tool

06:40.020 --> 06:42.900
and the calculator gave it this answer

06:42.900 --> 06:44.760
and then it fed this back in

06:44.760 --> 06:46.770
and it said, "Here's my observation,

06:46.770 --> 06:49.170
here's what I had decided to do."

06:49.170 --> 06:51.600
And then that was the input and then that was the output.

06:51.600 --> 06:53.790
And then it says, "Now I know the final answer.

06:53.790 --> 06:55.470
The final answer is this."

06:55.470 --> 06:58.620
The actual output ended up being correct.

06:58.620 --> 07:00.360
So you can actually step through this code.

07:00.360 --> 07:02.220
And this is so important when you have agents

07:02.220 --> 07:05.190
because they go off the rails all the time

07:05.190 --> 07:07.800
and being able to see this is super useful.

07:07.800 --> 07:08.880
I'm gonna click into this one just

07:08.880 --> 07:11.100
so you see, "Who's Kendall Jenner's boyfriend?"

07:11.100 --> 07:14.070
And it looks like it's got the right answer.

07:14.070 --> 07:15.090
So we're gonna dig in.

07:15.090 --> 07:17.997
First it says, "I need to find out who her boyfriend is."

07:17.997 --> 07:21.210
And so I'm gonna search for Kendall Jenner's boyfriend.

07:21.210 --> 07:24.780
The result is that you got this response

07:24.780 --> 07:27.990
and then it doesn't give the actual current boyfriend,

07:27.990 --> 07:28.950
I need to search again.

07:28.950 --> 07:31.920
Now it's saying "current boyfriend"; it has adapted its search

07:31.920 --> 07:34.590
and then it's coming back and it's getting this answer

07:34.590 --> 07:37.710
that it was player Devin Booker

07:37.710 --> 07:39.780
and then now it knows the boyfriend.

07:39.780 --> 07:41.610
So it's gonna keep on going through

07:41.610 --> 07:44.310
and now I know the boyfriend, now I can use the calculator.

07:44.310 --> 07:46.260
So this is a really cool way to see

07:46.260 --> 07:47.580
what agents are actually doing

07:47.580 --> 07:49.230
and you can see how long it took

07:49.230 --> 07:52.080
and how many calls as well, how many tokens.

07:52.080 --> 07:54.720
So really useful to be able to do this.

07:54.720 --> 07:57.390
You can also dig into what were the individual calls,

07:57.390 --> 08:00.210
what specifically did the prompt say?

08:00.210 --> 08:03.600
And look, this is the actual data here

08:03.600 --> 08:05.760
that you got from this.

08:05.760 --> 08:07.800
And this is how it decided

08:07.800 --> 08:09.660
between what to do with that data.

08:09.660 --> 08:12.060
So you can really drill into anything you need

08:12.060 --> 08:14.610
and you can see how many tokens are used in each.

08:14.610 --> 08:17.580
That is, I mean, in itself incredibly useful.

08:17.580 --> 08:19.710
But you can also take it from here

08:19.710 --> 08:21.750
and set up things like datasets

08:21.750 --> 08:23.220
and you can do evaluations

08:23.220 --> 08:25.650
so here's like an evaluation framework.

08:25.650 --> 08:28.860
You're rating it based on correctness or helpfulness

08:28.860 --> 08:31.170
and then you can actually export this

08:31.170 --> 08:33.300
and start using it for fine-tuning as well.

08:33.300 --> 08:35.970
So it's a whole workflow and it's super useful.

08:35.970 --> 08:39.870
Highly recommend you try it.

08:39.870 --> 08:41.580
It is I think still in beta,

08:41.580 --> 08:42.810
so you might need to talk

08:42.810 --> 08:44.970
to someone at LangSmith or LangChain

08:44.970 --> 08:46.440
to gain access like I did,

08:46.440 --> 08:47.910
but I got on this early

08:47.910 --> 08:50.730
'cause I think evaluation is continually coming up

08:50.730 --> 08:52.320
as the number one problem

08:52.320 --> 08:54.780
that people have when they're using AI in production.

08:54.780 --> 08:56.673
Hopefully you guys can get in early.