1
00:00:03,370 --> 00:00:03,700
Okay.

2
00:00:03,700 --> 00:00:08,440
So now let's talk about the beta testing phase.

3
00:00:08,860 --> 00:00:17,200
The main challenge we have and how LangChain is going to help us solve this main challenge.

4
00:00:17,440 --> 00:00:27,550
So remember that the beta testing phase allows developers to collect more data on how their LLM application

5
00:00:27,550 --> 00:00:30,940
is performing in real world scenarios.

6
00:00:31,570 --> 00:00:39,040
In this phase, it is important to develop an understanding for the types of inputs the app is performing

7
00:00:39,040 --> 00:00:46,930
well or poorly on, and how exactly it's breaking down in those cases.

8
00:00:47,770 --> 00:00:54,670
Both feedback collection and run annotation are critical for this workflow.

9
00:00:54,700 --> 00:00:58,090
This is what the LangChain team is telling us.

10
00:00:58,090 --> 00:01:04,480
Both feedback collection and run annotation are critical for this workflow.

11
00:01:04,599 --> 00:01:11,830
Don't worry, we are going to see how feedback collection operates in the practical part of the

12
00:01:11,830 --> 00:01:15,130
block and how to create annotations.

13
00:01:16,010 --> 00:01:23,870
These will help in the curation of test cases that can help track regressions or improvements, and in the development

14
00:01:23,870 --> 00:01:26,360
of automatic evaluation.

15
00:01:26,360 --> 00:01:39,500
So the main challenge that the LangChain team has identified that LLM app developer teams have

16
00:01:39,500 --> 00:01:47,180
in the beta testing phase is how to process and analyze the feedback of the initial users.

17
00:01:49,360 --> 00:01:59,170
And they have the solution for this challenge, which is to use LangSmith to filter traces with negative

18
00:01:59,170 --> 00:02:02,740
human feedback to understand the problems behind them.

19
00:02:03,250 --> 00:02:10,630
To use LangSmith to inspect interesting traces and enter annotations about them, and to use LangSmith

20
00:02:10,630 --> 00:02:14,920
to expand the test dataset by adding runs as examples.

21
00:02:15,220 --> 00:02:21,340
Let's see each of these solutions that the LangSmith team has found.

22
00:02:21,370 --> 00:02:29,680
First, use LangSmith to filter traces with negative human feedback to understand the problems behind

23
00:02:29,680 --> 00:02:30,100
them.

24
00:02:31,230 --> 00:02:38,700
When launching your application to an initial set of users, it is important to gather human feedback

25
00:02:38,700 --> 00:02:41,220
on the responses it is producing.

26
00:02:41,460 --> 00:02:49,740
This helps draw attention to the most interesting runs and highlight edge cases that are causing problematic

27
00:02:49,740 --> 00:02:50,400
responses.

28
00:02:50,430 --> 00:02:56,670
Okay, so these are again the conclusions from the LangChain team.

29
00:02:57,330 --> 00:03:04,530
They tell us that it is very important to gather human feedback on the responses our application is

30
00:03:04,530 --> 00:03:05,490
producing.

31
00:03:06,790 --> 00:03:14,830
So LangSmith, they continue, allows you to attach feedback scores to logged traces.

32
00:03:14,860 --> 00:03:19,810
Oftentimes this is hooked up to a feedback button in your app.

33
00:03:19,810 --> 00:03:24,970
Then you can filter on traces that have a specific feedback tag and score.

34
00:03:24,970 --> 00:03:33,610
So remember how ChatGPT is getting feedback from you?

35
00:03:33,610 --> 00:03:36,970
These thumbs-up and thumbs-down buttons.

36
00:03:36,970 --> 00:03:42,370
So this is what the LangChain team is talking about, right?

37
00:03:42,370 --> 00:03:51,400
This is the positive or negative feedback we usually have from users in an LLM application.

38
00:03:51,400 --> 00:03:55,600
You can be much more sophisticated than this, but these are the most common.

39
00:03:56,050 --> 00:04:05,620
So a common workflow is to filter on traces that received a poor user feedback score.

40
00:04:05,620 --> 00:04:11,410
And then drill down into problematic points using the detailed trace view.

41
00:04:11,440 --> 00:04:16,750
We will see this in action in the next practical lesson.

42
00:04:16,750 --> 00:04:17,290
Right.

43
00:04:17,620 --> 00:04:23,500
But I think the conceptual explanation is clear.

44
00:04:23,500 --> 00:04:31,720
So one of the things we can do in order to understand the feedback from

45
00:04:31,720 --> 00:04:37,210
our initial users is to filter traces with negative human feedback.

46
00:04:37,360 --> 00:04:38,080
Okay.

47
00:04:38,080 --> 00:04:43,390
So we will use LangSmith in order to do that next.

48
00:04:44,740 --> 00:04:52,870
In order to solve this big challenge, how to process and analyze the feedback of the initial users,

49
00:04:52,870 --> 00:05:00,730
we can also use LangSmith to inspect interesting traces and enter annotations about them.

50
00:05:02,460 --> 00:05:10,560
LangSmith supports sending runs to annotation queues, which allow annotators to closely inspect

51
00:05:10,560 --> 00:05:15,120
interesting traces and annotate them with respect to different criteria.

52
00:05:15,120 --> 00:05:19,680
So this is what the LangChain team is telling us, right?

53
00:05:19,680 --> 00:05:28,260
We will see how we are going to use annotations when we are working with real applications, but you

54
00:05:28,260 --> 00:05:28,950
get the point.

55
00:05:28,950 --> 00:05:40,710
So these annotations are our own feedback on what we observe in the LangSmith traces.

56
00:05:40,710 --> 00:05:41,280
Right.

57
00:05:41,280 --> 00:05:50,220
And remember that in this case, we are not just talking about the LLM app developer, we are also talking

58
00:05:50,220 --> 00:05:58,950
about other people involved in the development of this application, like product managers or even subject

59
00:05:58,950 --> 00:06:07,950
matter experts, who are probably not technical at all, but who understand the goal of our

60
00:06:07,950 --> 00:06:08,640
application.

61
00:06:08,640 --> 00:06:09,030
Right.

62
00:06:09,030 --> 00:06:17,550
That's why the LangSmith or LangChain team is telling us annotators can be product managers,

63
00:06:17,580 --> 00:06:22,620
PMs, engineers, or even subject matter experts.

64
00:06:23,280 --> 00:06:27,000
This allows users to catch... Users,

65
00:06:27,000 --> 00:06:29,280
here, are LangSmith users.

66
00:06:29,490 --> 00:06:36,330
So this allows the LangSmith users to catch regressions across important evaluation criteria.

67
00:06:36,780 --> 00:06:37,260
Okay.

68
00:06:39,630 --> 00:06:48,540
Finally, we can use LangSmith to expand the test dataset by adding runs as examples.

69
00:06:49,020 --> 00:06:53,250
Remember, we are going to talk about all this terminology in the next block.

70
00:06:53,250 --> 00:06:57,930
What is a run, what is a trace, what is an LLM call, etc.

71
00:06:58,170 --> 00:07:06,300
So as your application progresses through the beta testing phase, it is essential to continue collecting

72
00:07:06,300 --> 00:07:09,900
data to refine and improve its performance.

73
00:07:10,320 --> 00:07:19,170
LangSmith enables you to add runs as examples to datasets from both the project page and within an

74
00:07:19,170 --> 00:07:20,340
annotation queue,

75
00:07:20,940 --> 00:07:25,500
expanding your test coverage on real-world scenarios.

76
00:07:26,930 --> 00:07:28,040
So.

77
00:07:29,400 --> 00:07:38,340
In short, when we are talking about traces, LLM calls and runs, we are talking about different

78
00:07:39,160 --> 00:07:51,100
degrees of detail when we are inspecting or researching what is happening under the hood.

79
00:07:51,100 --> 00:08:01,300
So what we are seeing here is that in the test data set, we not only can add traces, we can also add

80
00:08:01,300 --> 00:08:02,110
runs.

81
00:08:02,110 --> 00:08:06,700
We will see how to do that when we go to the practical exercise.

82
00:08:07,210 --> 00:08:15,760
This is a key benefit of having your logging system and your evaluation/testing system in the

83
00:08:15,760 --> 00:08:17,260
same platform.

84
00:08:17,260 --> 00:08:22,000
So this is one of the main benefits of LangSmith.

85
00:08:23,210 --> 00:08:34,520
So LangSmith is presented as a full LLMOps platform, and even though it is probably the new kid on

86
00:08:34,520 --> 00:08:42,380
the block, where you can find other adults in the room, you know, so there are other applications that

87
00:08:42,380 --> 00:08:48,110
are, you know, very solid and have been around for more years than LangSmith.

88
00:08:48,110 --> 00:08:56,270
But the very interesting thing about the LangSmith platform is that this is developed by the LangChain

89
00:08:56,270 --> 00:08:56,810
team.

90
00:08:56,810 --> 00:09:08,570
So you probably won't find other people with better knowledge about LLM application development

91
00:09:08,570 --> 00:09:11,840
teams than the LangChain team.

92
00:09:11,840 --> 00:09:21,410
So it is very interesting that with this knowledge and expertise that they have from observing LLM application

93
00:09:21,410 --> 00:09:28,970
development teams, working with all this knowledge and expertise, they have created this new

94
00:09:28,970 --> 00:09:30,590
LLMOps platform.

95
00:09:30,590 --> 00:09:30,920
Okay.

96
00:09:30,920 --> 00:09:32,870
So this is very important for us.

97
00:09:33,350 --> 00:09:40,520
In the next lesson, we are going to talk about how LangSmith is helping us

98
00:09:41,330 --> 00:09:45,770
to solve the challenges in the production phase.