1
00:00:03,310 --> 00:00:05,710
Okay, so how can

2
00:00:05,710 --> 00:00:06,910
LangChain, LangSmith,

3
00:00:06,910 --> 00:00:07,630
excuse me,

4
00:00:07,630 --> 00:00:10,300
help us during the production phase?

5
00:00:12,220 --> 00:00:20,110
The first thing we can do is to keep using LangSmith, as in the beta testing phase, to keep processing

6
00:00:20,110 --> 00:00:21,880
and analyzing user feedback.

7
00:00:21,880 --> 00:00:25,900
So this doesn't stop when we enter the production phase.

8
00:00:25,900 --> 00:00:32,409
We still need to keep processing and analyzing user feedback, so we will keep doing the same thing

9
00:00:32,409 --> 00:00:35,200
we started in the beta testing phase.

10
00:00:35,200 --> 00:00:35,830
Okay.

11
00:00:35,890 --> 00:00:42,370
Second thing: how to use LangSmith to monitor key metrics.

12
00:00:42,880 --> 00:00:49,780
So in order to measure the performance of the app, we are going to use LangSmith to monitor key metrics.

13
00:00:49,780 --> 00:00:51,760
Let's take a look at the platform.

14
00:00:53,770 --> 00:01:02,740
So, with the Monitor tab of the project: if we go to a project, we see that there is a tab called

15
00:01:02,740 --> 00:01:03,370
Monitor.

16
00:01:03,370 --> 00:01:08,170
So in this tab we can monitor the key metrics.

17
00:01:08,170 --> 00:01:17,590
So you see here: trace count, LLM call count, trace success rates, LLM call success rates, latency, LLM calls

18
00:01:17,590 --> 00:01:21,190
per trace, feedback, tokens,

19
00:01:21,190 --> 00:01:24,940
tokens per call, cost. Okay.

20
00:01:24,940 --> 00:01:28,810
So you have streaming.

21
00:01:29,630 --> 00:01:31,790
Trace time to first token, etc.

22
00:01:31,820 --> 00:01:32,450
Okay.

23
00:01:32,600 --> 00:01:37,580
This here, this P50, is the median.

24
00:01:37,580 --> 00:01:43,160
This P95 is the 95th percentile: 95% of the values fall at or below it.

25
00:01:43,730 --> 00:01:50,870
So we will learn more and more about the different ways of presenting these charts.

26
00:01:50,870 --> 00:01:58,040
In my opinion, they are going to improve this dashboard a lot in the coming versions.

27
00:01:58,040 --> 00:02:01,910
Right now it's okay, but I think it's going to be much better.

28
00:02:01,910 --> 00:02:04,610
So you have here different possibilities.

29
00:02:04,670 --> 00:02:11,720
But, in my opinion, they are going to improve this area with a little bit of work.

30
00:02:11,720 --> 00:02:16,670
It can be much, much, much, much better in the visual display.

31
00:02:16,850 --> 00:02:24,140
So key metrics are cost, number of tokens per trace, latency, etc.

32
00:02:24,620 --> 00:02:33,140
LangSmith allows for tag and metadata grouping, which allows users to mark different versions of their

33
00:02:33,140 --> 00:02:42,110
applications with different identifiers and view how they are performing side by side within each chart.

34
00:02:42,110 --> 00:02:49,070
This is helpful for A/B testing changes in prompt, model, or retrieval strategy.

35
00:02:49,070 --> 00:02:57,860
So when you are working in this... well, let's keep reading, because you are going to see

36
00:02:57,860 --> 00:03:00,170
that there is something very interesting here.

37
00:03:00,170 --> 00:03:10,130
So after opening the Monitor tab, click on the Tag or Metadata buttons that

38
00:03:10,130 --> 00:03:16,670
are located on the top and select the metadata or tag that you want to use to display the monitoring

39
00:03:16,670 --> 00:03:17,300
data.

40
00:03:17,300 --> 00:03:24,980
Apart from A/B testing, monitoring is a great way to see if your application in production is performing

41
00:03:24,980 --> 00:03:27,260
better or worse over time.

42
00:03:28,160 --> 00:03:29,060
So,

43
00:03:31,480 --> 00:03:32,650
one way,

44
00:03:32,650 --> 00:03:35,200
one very

45
00:03:36,110 --> 00:03:39,890
good way to improve your application

46
00:03:41,010 --> 00:03:51,030
is to prepare different versions with small improvements and then compare the performance of

47
00:03:51,030 --> 00:03:54,300
these new versions with the current one you have.

48
00:03:54,750 --> 00:03:57,750
You will see that we have different strategies to do that.

49
00:03:57,750 --> 00:04:00,180
We are going to see them later.

50
00:04:00,180 --> 00:04:08,130
But here you see that the visual display of the monitoring

51
00:04:09,760 --> 00:04:16,899
indicators is a very good way to compare.

52
00:04:17,350 --> 00:04:25,630
Visually, it is very easy and fast to see if one version is better than another.

53
00:04:25,630 --> 00:04:35,230
Okay, so we can use LangSmith to monitor key metrics, and we can also use LangSmith to mark different

54
00:04:35,230 --> 00:04:40,390
versions for A/B testing of prompts, models, or retrieval strategies.

55
00:04:40,390 --> 00:04:50,410
These are the three things that we are going to compare most frequently: different prompts,

56
00:04:50,440 --> 00:04:52,480
different LLM models.

57
00:04:52,480 --> 00:05:01,120
So using OpenAI as the LLM model, or using Anthropic, or using Mistral, or Llama 2,

58
00:05:02,170 --> 00:05:07,090
Llama 3, in preparation, or different retrieval strategies.

59
00:05:07,090 --> 00:05:16,270
So one of the most interesting uses of monitoring is to compare the performance of different versions

60
00:05:16,270 --> 00:05:18,250
of your LLM app.

61
00:05:19,090 --> 00:05:28,450
Clicking on the Metadata button and selecting one particular metadata parameter will show you

62
00:05:28,450 --> 00:05:32,110
how the different versions of your application behave.

63
00:05:32,320 --> 00:05:40,060
It allows you to A/B test different configurations, for example using different LLM models, prompts,

64
00:05:40,060 --> 00:05:47,800
or retrieval strategies of your application, and see how each of them impacts performance metrics:

65
00:05:47,800 --> 00:05:52,030
cost, number of tokens per trace, latency, etc.

66
00:05:52,060 --> 00:05:58,840
Okay, we are going to see some very interesting advanced tips associated with some of the things we

67
00:05:58,840 --> 00:06:00,220
have been talking about here.

68
00:06:00,220 --> 00:06:05,680
Okay, so be ready for the next lesson because you are going to find it very interesting.

69
00:06:05,680 --> 00:06:11,380
The advanced tips we are going to share with you about LangSmith.

