1
00:00:04,730 --> 00:00:05,090
Okay.

2
00:00:05,090 --> 00:00:10,130
So this is a very interesting level two application.

3
00:00:10,610 --> 00:00:12,170
We are going to build.

4
00:00:12,170 --> 00:00:20,150
You are going to learn how to build an application to evaluate a RAG application.

5
00:00:20,300 --> 00:00:27,650
If you remember, we have talked before about how LLM applications are

6
00:00:27,650 --> 00:00:32,540
not very easy to evaluate, not very easy to test.

7
00:00:32,540 --> 00:00:38,480
Test is the word we use as software engineers for traditional software applications.

8
00:00:38,480 --> 00:00:47,390
So when you build a traditional software application, testing is relatively easy because every

9
00:00:47,390 --> 00:00:55,520
time you ask one question to a traditional software application, you are going to have the same answer.

10
00:00:55,520 --> 00:01:04,430
So testing is easy, but in LLM applications we have a different situation.

11
00:01:04,430 --> 00:01:16,430
We can ask an LLM application the same question twice and get two different answers.

12
00:01:16,430 --> 00:01:25,520
So sometimes it is difficult to evaluate, or to test, the quality of an LLM application.

13
00:01:25,670 --> 00:01:31,850
So we have different methods, different ways to approach evaluation.

14
00:01:31,850 --> 00:01:34,040
And this is a very simple one.

15
00:01:34,040 --> 00:01:39,770
You can see what we are going to do here in the drop-down with the explanation.

16
00:01:39,770 --> 00:01:44,750
But what we are going to do here is this: first, we have created a RAG application.

17
00:01:44,750 --> 00:01:52,790
So we are going to load a txt file and we are going to ask questions about the content in the txt

18
00:01:52,790 --> 00:01:53,210
file.

19
00:01:53,210 --> 00:01:53,750
Okay.

20
00:01:53,750 --> 00:01:59,900
But in this application we have included this evaluation part.

21
00:01:59,900 --> 00:02:08,479
And in the evaluation part we are going to enter a question and an answer we already know.

22
00:02:08,720 --> 00:02:17,480
And then this application is going to put this security question, if you want to call it that, to the RAG application in order

23
00:02:17,480 --> 00:02:28,610
to see if the application is going to give us a similar or exactly the same response

24
00:02:28,610 --> 00:02:29,300
as the one we have.

25
00:02:29,300 --> 00:02:38,180
And this is the interesting part, because this application is going to be able to see

26
00:02:38,660 --> 00:02:48,290
if the answer provided by the application is not exactly the same as the answer we have provided, but

27
00:02:48,290 --> 00:02:54,290
it means the same or it is mostly the same, you see.

28
00:02:54,290 --> 00:03:02,030
So if instead of saying, I don't know, "the United States of America", the application says "United

29
00:03:02,030 --> 00:03:08,930
States of America" without the article, this application is going to be smart enough to see, okay,

30
00:03:08,930 --> 00:03:10,790
the answer is good, it is correct.

31
00:03:10,790 --> 00:03:12,770
So the evaluation is positive.

32
00:03:12,770 --> 00:03:13,220
Okay.

33
00:03:13,220 --> 00:03:22,370
So this is the very interesting thing about evaluating an application with an LLM application,

34
00:03:22,370 --> 00:03:23,030
okay.
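A minimal sketch of the idea described above: a second model acts as a grader and decides whether the application's answer means the same as the answer we already know. The prompt wording, the `grade_answer` function and the `fake_llm` stand-in are all assumptions made for illustration; they are not the code used in the lesson, and a real version would pass a function that calls an actual model.

```python
from typing import Callable

# Hypothetical grading prompt; the wording is an assumption, not the lesson's prompt.
GRADER_PROMPT = (
    "You are grading an answer.\n"
    "Question: {question}\n"
    "Real answer: {real_answer}\n"
    "Application answer: {app_answer}\n"
    "Reply with exactly CORRECT or INCORRECT. Small wording differences, "
    "such as a missing article, still count as CORRECT."
)


def grade_answer(
    question: str,
    real_answer: str,
    app_answer: str,
    llm: Callable[[str], str],
) -> bool:
    """Return True when the grader model judges the two answers equivalent."""
    prompt = GRADER_PROMPT.format(
        question=question, real_answer=real_answer, app_answer=app_answer
    )
    verdict = llm(prompt).strip().upper()
    # "INCORRECT" does not start with "CORRECT", so this check is unambiguous.
    return verdict.startswith("CORRECT")


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "CORRECT"


result = grade_answer(
    "What is the name of the company?",
    "AI accelerator",
    "The company is called AI accelerator.",
    fake_llm,
)
print(result)
```

Passing the model in as a plain function keeps the grading logic separate from any particular provider, so the same function can be exercised with a stub while experimenting.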

35
00:03:23,480 --> 00:03:24,950
So let's try this.

36
00:03:24,950 --> 00:03:29,600
So first we are going to load a txt file.

37
00:03:29,600 --> 00:03:34,550
This is a txt file from AI accelerator, our company.

38
00:03:34,550 --> 00:03:38,750
And it's just explaining the different services we provide.

39
00:03:38,750 --> 00:03:46,880
So artificial intelligence consulting services for enterprise-level companies, for small businesses, LLM

40
00:03:46,880 --> 00:03:48,560
app development, etc.

41
00:03:48,560 --> 00:03:48,830
Right.

42
00:03:48,830 --> 00:03:51,680
So we load this file.

43
00:03:51,680 --> 00:03:57,620
Now we are going to enter this security question, or testing question, or whatever you want to call it.

44
00:03:58,040 --> 00:04:00,380
We know the answer to this question.

45
00:04:00,380 --> 00:04:02,780
So that's the purpose of this security question okay.

46
00:04:02,780 --> 00:04:05,090
So I'm going to ask:

47
00:04:06,380 --> 00:04:10,940
What is the name of the company?

48
00:04:12,180 --> 00:04:13,350
Company.

49
00:04:13,920 --> 00:04:14,880
Company.

50
00:04:15,510 --> 00:04:15,960
When you.

51
00:04:15,960 --> 00:04:20,160
When you talk and write at the same time... What is the name of the company?

52
00:04:20,160 --> 00:04:20,790
Okay.

53
00:04:20,790 --> 00:04:25,620
We press enter and then we enter the real answer to the question.

54
00:04:25,620 --> 00:04:31,290
We say, okay, the name of our company is AI accelerator, okay?

55
00:04:31,290 --> 00:04:33,180
And then we press enter.

56
00:04:33,420 --> 00:04:38,130
Now we enter the OpenAI API key.

57
00:04:38,160 --> 00:04:40,890
We press enter to submit the form.

58
00:04:40,890 --> 00:04:47,160
And the application is now going to put this question to the RAG application, okay?

59
00:04:47,160 --> 00:04:50,370
So, question: what is the name of the company?

60
00:04:50,370 --> 00:04:55,770
Real answer: AI accelerator. Answer provided by the application:

61
00:04:55,920 --> 00:04:56,700
the same.

62
00:04:56,700 --> 00:05:00,750
Therefore the AI application's answer was correct.

63
00:05:00,750 --> 00:05:03,780
So the evaluation has been positive.

64
00:05:03,870 --> 00:05:12,600
Usually when we evaluate an LLM application we have a set of questions, like 10,

65
00:05:12,600 --> 00:05:15,330
15, or 20, depending on the case, right?

66
00:05:15,330 --> 00:05:22,980
But this is a very simple prototype in order to demo this kind of functionality okay.
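As a rough sketch of how the single-question prototype could extend to a set of questions, the loop below runs every question through the application, grades each answer, and reports the fraction judged correct. The names `run_evaluation`, `toy_app` and `toy_judge` are hypothetical, and the toy judge uses a plain case-insensitive comparison only so the example runs offline; in practice the judge would be an LLM-based grader.

```python
from typing import Callable, List, Tuple


def run_evaluation(
    cases: List[Tuple[str, str]],
    app: Callable[[str], str],
    judge: Callable[[str, str, str], bool],
) -> float:
    """Return the fraction of (question, real_answer) cases the app gets right."""
    if not cases:
        raise ValueError("at least one test case is required")
    passed = 0
    for question, real_answer in cases:
        app_answer = app(question)  # ask the application under test
        if judge(question, real_answer, app_answer):
            passed += 1
    return passed / len(cases)


def toy_app(question: str) -> str:
    """Stand-in for the RAG application, with canned answers."""
    canned = {"What is the name of the company?": "ai accelerator"}
    return canned.get(question, "I don't know")


def toy_judge(question: str, real_answer: str, app_answer: str) -> bool:
    """Stand-in grader: case-insensitive match instead of an LLM call."""
    return real_answer.strip().lower() == app_answer.strip().lower()


# Toy cases; the second one is deliberately outside what toy_app knows.
cases = [
    ("What is the name of the company?", "AI accelerator"),
    ("What is 2 + 2?", "4"),
]
score = run_evaluation(cases, toy_app, toy_judge)
print(f"{score:.0%} of answers judged correct")
```

Reporting a score over many questions gives a steadier picture than a single pass or fail, which matters when the same question can produce different answers on different runs.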

67
00:05:22,980 --> 00:05:25,650
So let's take a look at the code.

68
00:05:27,240 --> 00:05:30,360
As always, two main files: requirements,

69
00:05:30,840 --> 00:05:34,890
where we are familiar with the packages, and main.py.

70
00:05:35,400 --> 00:05:41,340
As you see, you are also familiar with this RAG functionality we have prepared here.

71
00:05:41,340 --> 00:05:46,380
Okay, you did this in the previous block with level one applications.

72
00:05:46,380 --> 00:05:53,970
Okay, so the only things you can see here are the typical Streamlit tricks, like encapsulating functionality

73
00:05:53,970 --> 00:05:58,860
in functions, using Streamlit elements, etc.

74
00:05:58,860 --> 00:06:02,520
Okay, so nothing new for you.

75
00:06:02,520 --> 00:06:10,800
Now, as always, download this code, run it on your computer, then rewrite the code yourself,

76
00:06:10,800 --> 00:06:17,130
line by line, and try to understand what you are doing. Whenever you have a question or a doubt, use ChatGPT,

77
00:06:17,130 --> 00:06:25,050
the Streamlit documentation, etc. And only once you are comfortable with this application do you

78
00:06:25,050 --> 00:06:26,880
go to the next lesson.

