1
00:00:05,470 --> 00:00:11,470
So let's start by talking about what LLMOps is.

2
00:00:17,350 --> 00:00:27,040
So LLMOps refers to the practices and tools used to operate and maintain LLM applications in production

3
00:00:27,040 --> 00:00:28,060
environments.

4
00:00:29,170 --> 00:00:40,240
The term is analogous to DevOps or MLOps, but adapted specifically to the peculiarities of LLM applications.

5
00:00:40,240 --> 00:00:49,930
DevOps is the term we use in conventional software development, and MLOps is the term we use

6
00:00:49,930 --> 00:00:55,930
for traditional machine learning environments.

7
00:00:57,010 --> 00:01:08,170
LLMOps covers the entire life cycle of an LLM application, from development to deployment and monitoring,

8
00:01:08,470 --> 00:01:12,130
through to continuous updating and maintenance.

9
00:01:14,280 --> 00:01:24,420
Sometimes we get confused by terms like observability, monitoring, evaluation, guardrails, and

10
00:01:24,420 --> 00:01:27,360
others related to LLMOps.

11
00:01:27,990 --> 00:01:33,240
So let's talk a little bit about these four topics.

12
00:01:34,290 --> 00:01:43,800
LLMOps represents a comprehensive approach to managing LLM applications, encompassing everything from

13
00:01:43,800 --> 00:01:48,120
development to maintenance of these apps in production environments.

14
00:01:48,970 --> 00:01:57,310
Unlike more specific concepts such as observability, monitoring, evaluation, and guardrails, which

15
00:01:57,310 --> 00:02:06,670
are components or aspects of LLMOps, this term encompasses the complete operational management of

16
00:02:06,670 --> 00:02:07,750
LLM applications.

17
00:02:08,680 --> 00:02:17,050
These other components are critical to the success and sustainability of LLM apps in the real world, ensuring

18
00:02:17,050 --> 00:02:21,940
their performance, reliability, and ethical compliance.

19
00:02:22,330 --> 00:02:25,270
So let's talk a little bit about

20
00:02:26,650 --> 00:02:28,870
each of them. First,

21
00:02:28,870 --> 00:02:30,190
Observability.

22
00:02:31,300 --> 00:02:40,090
It refers to the ability to understand the internal state of an LLM application from its external outputs.

23
00:02:40,090 --> 00:02:46,480
When we say internal state and external outputs, we are talking about questions and answers.

24
00:02:46,480 --> 00:02:57,610
So observability focuses on the quality of an LLM application: getting good, accurate

25
00:02:57,610 --> 00:03:01,450
responses to questions.

26
00:03:02,570 --> 00:03:12,320
Observability in an LLM application implies having visibility into how the model processes

27
00:03:12,320 --> 00:03:20,360
and responds to inputs, which is crucial

28
00:03:20,360 --> 00:03:25,070
for diagnosing problems or understanding its behavior.

29
00:03:25,460 --> 00:03:30,170
So observability is one component of LLMOps.

30
00:03:30,200 --> 00:03:30,980
Okay.
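To make this concrete, here is a minimal sketch in Python of one common observability technique: recording a trace of every call (the question, the answer, and how long it took) so you can later inspect how the application behaved. The names TracedLLM and fake_llm are hypothetical; fake_llm stands in for a real model call, and real tracing tools typically record much more, such as prompt templates and token counts.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trace:
    """One recorded interaction: what went in, what came out, how long it took."""
    trace_id: str
    prompt: str
    response: str
    latency_s: float


@dataclass
class TracedLLM:
    """Hypothetical wrapper: calls any prompt -> text function and keeps a trace of each call."""
    model: Callable[[str], str]
    traces: list = field(default_factory=list)

    def __call__(self, prompt: str) -> str:
        start = time.perf_counter()
        response = self.model(prompt)
        elapsed = time.perf_counter() - start
        self.traces.append(Trace(str(uuid.uuid4()), prompt, response, elapsed))
        return response


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call.
    return "Josephine" if "Napoleon" in prompt else "I don't know."


llm = TracedLLM(fake_llm)
llm("Who was the wife of Napoleon?")
for t in llm.traces:
    print(t.prompt, "->", t.response, f"({t.latency_s:.6f}s)")
```

Because every question and answer is kept, you can go back and see why the application answered the way it did, which is exactly the "understand the internal state from external outputs" idea.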

31
00:03:32,320 --> 00:03:38,440
The second component we are going to look at more closely:

32
00:03:38,440 --> 00:03:39,610
Monitoring.

33
00:03:40,180 --> 00:03:48,250
Monitoring is another component of LLMOps, focused on the continuous surveillance of the performance

34
00:03:48,250 --> 00:03:52,690
and health of the LLM application in production.

35
00:03:53,570 --> 00:04:02,690
This includes tracking key metrics such as response latency, accuracy, and resource consumption.

36
00:04:03,470 --> 00:04:11,120
Monitoring is essential to ensure that the LLM application functions as intended and to quickly identify

37
00:04:11,120 --> 00:04:12,230
any problems.

38
00:04:12,230 --> 00:04:17,870
Okay, so observability focuses on questions and answers.

39
00:04:18,410 --> 00:04:25,730
Monitoring focuses on key metrics.
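Here is a minimal monitoring sketch, assuming you already measure the latency and the success or failure of each request: it keeps those numbers, computes a 95th-percentile latency and an error rate, and flags when either crosses a threshold. The Monitor class and both thresholds (2 seconds, 5%) are assumptions chosen for illustration; resource consumption such as token counts could be recorded the same way.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Monitor:
    """Hypothetical collector of per-request metrics for an LLM app."""
    latency_threshold_s: float = 2.0  # assumed alert threshold
    max_error_rate: float = 0.05      # assumed alert threshold
    latencies: list = field(default_factory=list)
    errors: int = 0

    def record(self, latency_s: float, ok: bool) -> None:
        self.latencies.append(latency_s)
        if not ok:
            self.errors += 1

    def p95_latency(self) -> float:
        # Nearest-rank 95th percentile of the observed latencies.
        ordered = sorted(self.latencies)
        rank = math.ceil(95 * len(ordered) / 100)
        return ordered[rank - 1]

    def error_rate(self) -> float:
        return self.errors / len(self.latencies) if self.latencies else 0.0

    def alerts(self) -> list:
        problems = []
        if self.latencies and self.p95_latency() > self.latency_threshold_s:
            problems.append("p95 latency above threshold")
        if self.error_rate() > self.max_error_rate:
            problems.append("error rate above threshold")
        return problems


monitor = Monitor()
for _ in range(18):
    monitor.record(0.5, ok=True)
monitor.record(3.0, ok=False)
monitor.record(3.0, ok=True)
print(monitor.p95_latency(), monitor.error_rate(), monitor.alerts())
```

In this run, 2 of 20 requests are slow, which pushes the 95th-percentile latency to 3.0 seconds and triggers the latency alert, while 1 failure in 20 sits exactly at the 5% limit and does not.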

40
00:04:26,180 --> 00:04:27,320
Evaluation.

41
00:04:28,400 --> 00:04:36,620
Evaluation involves the periodic assessment of the effectiveness and accuracy of the LLM application.

42
00:04:37,250 --> 00:04:44,900
Evaluation can include testing with new data, comparisons with benchmarks or standards, and analysis

43
00:04:44,900 --> 00:04:46,310
of user feedback.

44
00:04:47,340 --> 00:04:54,300
It is a crucial step to ensure that the LLM application remains relevant and useful over time.

45
00:04:54,300 --> 00:05:03,630
When you are in the last stages of developing your professional LLM application,

46
00:05:03,810 --> 00:05:14,250
evaluation means testing, and testing LLM applications is different from testing conventional software applications.

47
00:05:14,790 --> 00:05:21,090
Remember the expression "lack of reproducibility" that we used before.

48
00:05:21,570 --> 00:05:23,760
It applies here.

49
00:05:23,760 --> 00:05:33,600
In an LLM application, the same question is not always going to get the same answer.

50
00:05:33,720 --> 00:05:39,930
Even when the meaning is the same the wording may differ.

51
00:05:39,960 --> 00:05:40,710
Okay.

52
00:05:40,710 --> 00:05:47,430
A conventional software application, if it is correct, will always give you

53
00:05:47,430 --> 00:05:53,760
the same answer to a given question. In LLM applications, it will not.

54
00:05:53,760 --> 00:06:03,000
If I ask a chatbot, for example, who Napoleon's wife was, in some cases it is going to give me

55
00:06:03,000 --> 00:06:07,410
just her first name; in others, it will give me her first name and surname.

56
00:06:07,530 --> 00:06:15,300
In other cases, the same chatbot is going to tell me her aristocratic

57
00:06:15,300 --> 00:06:17,730
title, or something else.

58
00:06:17,730 --> 00:06:27,720
So the response is going to be accurate and correct, but we cannot measure this accuracy the same way

59
00:06:27,720 --> 00:06:35,370
we do with conventional software applications. With a conventional

60
00:06:35,370 --> 00:06:42,720
software application, we say: every time the user asks two plus two, the answer is going to

61
00:06:42,720 --> 00:06:43,470
be four.

62
00:06:43,470 --> 00:06:46,320
And that's it; it is nothing but four.

63
00:06:46,320 --> 00:06:48,750
If it is four, the test passes.

64
00:06:48,750 --> 00:06:50,550
If it is not four, the test fails.

65
00:06:50,550 --> 00:06:56,280
With LLM applications, as you can see, the scenario is different.

66
00:06:56,280 --> 00:06:59,040
So we need different approaches and different tools.

67
00:06:59,040 --> 00:07:01,740
We will talk more about that later.
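The difference between the two kinds of test can be sketched like this, as a simplified illustration rather than a complete evaluation method. exact_match is the conventional "two plus two must be four" check; contains_key_facts is a hypothetical looser check that normalizes the text and only asks whether the required fact is mentioned. The Napoleon answers are invented examples. Real LLM evaluation often goes further, for instance by comparing meaning rather than keywords.

```python
import re


def exact_match(answer: str, expected: str) -> bool:
    """Conventional test: the output must be identical every time."""
    return answer == expected


def normalize(text: str) -> str:
    # Lowercase and drop punctuation so surface wording matters less.
    return re.sub(r"[^a-z0-9 ]", "", text.lower())


def contains_key_facts(answer: str, required: list) -> bool:
    """Hypothetical looser test for LLM output: pass if every required fact is mentioned."""
    norm = normalize(answer)
    return all(normalize(fact) in norm for fact in required)


# Conventional software: 2 + 2 must always be "4".
print(exact_match(str(2 + 2), "4"))

# Three differently worded but correct answers (invented examples).
answers = [
    "Josephine.",
    "Josephine de Beauharnais",
    "It was Empress Josephine.",
]
for a in answers:
    print(repr(a), exact_match(a, "Josephine"), contains_key_facts(a, ["Josephine"]))
```

The exact-match test rejects all three answers even though each is correct, while the looser check accepts them; that is the gap the different approaches and tools for LLM testing try to close.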

68
00:07:02,480 --> 00:07:13,760
So evaluation is an LLMOps component in the final stages of development of a professional

69
00:07:14,240 --> 00:07:20,720
LLM application, and also throughout the life of the application.

70
00:07:22,700 --> 00:07:29,600
The last thing worth considering now: guardrails.

71
00:07:30,300 --> 00:07:38,760
So guardrails are control and safety measures implemented to ensure that the behavior of the LLM application

72
00:07:38,760 --> 00:07:42,330
remains within acceptable limits.

73
00:07:42,630 --> 00:07:50,070
They include restrictions on the application responses, filters for inappropriate content, and safeguards

74
00:07:50,070 --> 00:07:53,010
against bias or misuse.

75
00:07:53,520 --> 00:07:58,800
They are an essential part of risk management in LLMOps.
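Here is a minimal guardrail sketch, assuming a simple keyword rule: the wrapper checks both the incoming question and the outgoing answer against a blocked pattern and returns a refusal instead. BLOCKED_PATTERNS, guarded, and echo_model are hypothetical names; production guardrails usually rely on trained moderation classifiers and policy rules rather than a single regular expression.

```python
import re
from typing import Callable

REFUSAL = "Sorry, I can't help with that request."

# Hypothetical, deliberately tiny blocklist used only for illustration.
BLOCKED_PATTERNS = [
    re.compile(r"\b(make|build)\b.*\bbomb\b", re.IGNORECASE),
]


def violates_policy(text: str) -> bool:
    return any(p.search(text) for p in BLOCKED_PATTERNS)


def guarded(model: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a prompt -> text function with an input check and an output check."""
    def call(prompt: str) -> str:
        if violates_policy(prompt):    # input guardrail
            return REFUSAL
        response = model(prompt)
        if violates_policy(response):  # output guardrail
            return REFUSAL
        return response
    return call


def echo_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call.
    return f"Answer to: {prompt}"


chat = guarded(echo_model)
print(chat("How do I make a bomb?"))
print(chat("Who was the wife of Napoleon?"))
```

Checking the output as well as the input matters because a harmless-looking question can still produce a response that breaks the policy.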

76
00:07:59,430 --> 00:08:10,170
If you remember, immediately after the launch of ChatGPT, we started reading news about

77
00:08:10,170 --> 00:08:20,430
how easy it was to learn how to make a bomb, or how to carry out other criminal activities,

78
00:08:20,430 --> 00:08:22,590
using ChatGPT.

79
00:08:23,730 --> 00:08:24,690
So.

80
00:08:25,560 --> 00:08:35,580
This, of course, moved the company behind ChatGPT to start using guardrails in order to prevent

81
00:08:35,580 --> 00:08:41,850
this kind of improper information or content.

82
00:08:41,850 --> 00:08:42,270
Right.

83
00:08:42,270 --> 00:08:49,770
So right now, if you go to ChatGPT and ask about these kinds of things, 99% of the

84
00:08:49,770 --> 00:09:00,090
time the ChatGPT application is going to refuse with something like "I cannot answer

85
00:09:00,090 --> 00:09:06,720
this" or "I am not authorized", and so on. Whatever the exact wording,

86
00:09:06,720 --> 00:09:14,370
the answer reflects a guardrail policy that ChatGPT has in place.

87
00:09:14,370 --> 00:09:14,850
Right.

88
00:09:14,850 --> 00:09:24,090
So this is one thing you are going to see more and more in LLMOps, and one thing that

89
00:09:24,090 --> 00:09:27,960
was worth mentioning at this stage.

90
00:09:27,960 --> 00:09:33,690
So, as you know, we are not going to talk about LLMOps in detail.

91
00:09:33,690 --> 00:09:41,550
What we are going to do is briefly tell you the kinds of things you will want to study further

92
00:09:41,550 --> 00:09:49,560
once you are preparing the launch of a professional LLM application.

93
00:09:51,770 --> 00:10:01,910
So we are going to talk a little bit more about some of these things associated with the LLMOps world.

94
00:10:01,910 --> 00:10:07,580
In the next lesson, we are going to talk about what misaligned behavior is.