1
00:00:03,000 --> 00:00:10,620
In this basic application, we are going to extract structured data from a conversation.

2
00:00:11,070 --> 00:00:20,130
So let's say that we have the text of a chat conversation in which a person talks about

3
00:00:20,130 --> 00:00:22,140
his favorite song.

4
00:00:22,710 --> 00:00:30,450
And we want our application, our LLM application, to extract the names of the song and the singer and

5
00:00:30,450 --> 00:00:35,040
to store them in a JSON dictionary.

6
00:00:35,520 --> 00:00:36,090
Right.

7
00:00:36,510 --> 00:00:43,530
So in order to do that, we are going to use a couple of new things in LangChain.

8
00:00:44,600 --> 00:00:50,210
One is going to be the response schema, and the other is going to be an output parser.

9
00:00:50,450 --> 00:00:53,060
So what are we going to do?

10
00:00:53,060 --> 00:00:58,670
As you will see here in the code, we are going to follow these five steps.

11
00:00:59,210 --> 00:01:05,269
In the first step we are going to determine which data we want to extract.

12
00:01:05,269 --> 00:01:07,100
We are going to tell LangChain:

13
00:01:07,100 --> 00:01:11,840
We want to extract the name of the singer and the name of the song.

14
00:01:12,080 --> 00:01:19,400
In the second step, we are going to store the extracted data in a JSON dictionary.

15
00:01:19,400 --> 00:01:28,940
Well, in fact, what we are going to tell LangChain is the kind of structured output we want

16
00:01:28,940 --> 00:01:29,690
to have.

17
00:01:29,690 --> 00:01:36,740
Then we are going to create the prompt template, in this case a chat prompt template.

18
00:01:36,740 --> 00:01:41,450
And we are going to ask the user for the input.

19
00:01:41,450 --> 00:01:41,810
Right.

20
00:01:41,810 --> 00:01:48,620
So the user is going to tell us, or we are going to have the text of a conversation where the user is talking

21
00:01:48,620 --> 00:01:51,770
about his favorite singer and so on.

22
00:01:51,770 --> 00:01:52,400
Right.

23
00:01:53,120 --> 00:01:54,920
Then we are going to apply

24
00:01:56,030 --> 00:02:04,130
all this configuration to our model, and we are going to extract the data and store

25
00:02:04,130 --> 00:02:05,570
it in JSON format.

26
00:02:05,570 --> 00:02:08,270
So these are the steps in code.

27
00:02:08,270 --> 00:02:16,640
The first step, as usual, is to get the OpenAI API key from our .env file.

28
00:02:16,670 --> 00:02:22,640
Once we have that solved, the first thing is to define our extraction goal.

29
00:02:22,940 --> 00:02:27,980
This is what LangChain calls defining the response schema.

30
00:02:28,730 --> 00:02:29,360
Right.

31
00:02:29,360 --> 00:02:36,380
And what we are going to do using the response schema component is just tell LangChain the kind of

32
00:02:36,380 --> 00:02:39,620
information we want to extract from our text.

33
00:02:39,620 --> 00:02:41,960
So we are telling LangChain, okay:

34
00:02:41,960 --> 00:02:47,360
We want to extract the name of the singer and we want to extract the name of the song.

35
00:02:48,470 --> 00:02:52,790
Once we have that, once LangChain

36
00:02:52,790 --> 00:02:53,600
understands

37
00:02:53,600 --> 00:02:55,700
what we are looking for,

38
00:02:55,700 --> 00:02:59,300
we are going to configure what is called the output parser.

39
00:02:59,300 --> 00:03:05,810
So the output parser is the way to tell LangChain

40
00:03:05,810 --> 00:03:10,820
what we want to do once we have this information.

41
00:03:10,820 --> 00:03:14,690
So once we have this data, how do we want to organize it?

42
00:03:14,690 --> 00:03:16,760
How do we want to store it?

43
00:03:16,760 --> 00:03:23,930
And in this case, we are telling LangChain that we want to store this information in a JSON dictionary.

44
00:03:23,930 --> 00:03:26,180
How do we say that to LangChain?

45
00:03:27,080 --> 00:03:32,390
We say this by importing the StructuredOutputParser component.

46
00:03:32,390 --> 00:03:36,410
This is a predefined output parser from LangChain.

47
00:03:37,130 --> 00:03:47,450
And if we check the format instructions associated with this output parser, you will see that we have

48
00:03:47,450 --> 00:03:50,690
this answer: the output should be, blah blah blah.

49
00:03:50,690 --> 00:03:55,160
In short, the output is going to be structured as a JSON dictionary.

50
00:03:55,160 --> 00:03:55,580
Right.

51
00:03:55,580 --> 00:04:00,080
So these initial steps are going to tell LangChain, okay:

52
00:04:00,080 --> 00:04:03,230
This is the data we are looking to extract.

53
00:04:03,230 --> 00:04:08,240
And this is the way we want to store this data after we have it.

54
00:04:08,360 --> 00:04:15,560
Once we have that, we are going to manage the conversation.

55
00:04:15,560 --> 00:04:20,930
In order to manage the conversation, we are going to create a chat prompt template.

56
00:04:20,930 --> 00:04:21,470
Right.

57
00:04:21,470 --> 00:04:28,880
So we are going to tell our chat model: given a command

58
00:04:28,880 --> 00:04:32,390
or a text or a conversation from the user,

59
00:04:32,420 --> 00:04:35,450
extract the artist and the song names.

60
00:04:35,630 --> 00:04:41,900
And we are associating here the

61
00:04:42,970 --> 00:04:44,380
output parser

62
00:04:44,470 --> 00:04:51,490
we are going to use, and also the user prompt that we are going to receive.

63
00:04:51,490 --> 00:05:02,020
So once we have defined the chat prompt template with this information, we can enter the user message.

64
00:05:02,020 --> 00:05:05,560
This is going to be the input from the user.

65
00:05:05,560 --> 00:05:08,590
And as you can see here, we are using a very simple one.

66
00:05:08,590 --> 00:05:15,010
This could be a super complex conversation or a super complex piece of conversation.

67
00:05:15,010 --> 00:05:21,670
Let's say, I don't know, a chat of 10,000 messages, people on a social network, whatever

68
00:05:21,670 --> 00:05:22,270
you want.

69
00:05:22,270 --> 00:05:29,050
And in this case, it's a very simple sentence that says, I like the song New York, New York by Frank

70
00:05:29,050 --> 00:05:29,710
Sinatra.

71
00:05:30,340 --> 00:05:39,610
So once we use this user message in our chat prompt template,

72
00:05:40,360 --> 00:05:44,860
we can start extracting the singer and the song.

73
00:05:45,370 --> 00:05:52,210
So the only thing we need to do is to send this

74
00:05:54,130 --> 00:06:03,310
chat prompt template we have created to the model and pass its response to the output parser we defined previously, and we are going

75
00:06:03,310 --> 00:06:08,650
to store the result of this extraction in the variable with the same name.

76
00:06:08,650 --> 00:06:11,800
If we print this variable, we will see that

77
00:06:11,800 --> 00:06:21,520
now we have a JSON dictionary that is storing two variables, singer and song, with the name of Frank Sinatra

78
00:06:21,520 --> 00:06:23,320
and the name of the song.

79
00:06:23,740 --> 00:06:33,610
So as you can see here in this very basic application, we can use LangChain to have a very

80
00:06:33,610 --> 00:06:45,100
sophisticated task performed for us, which is to extract from a document only the kind

81
00:06:45,100 --> 00:06:47,740
of information we are interested in.

82
00:06:47,740 --> 00:06:54,580
And not just that, once we have that information, store it in a particular way.

83
00:06:54,580 --> 00:06:59,590
In this case, it's going to be a structured way, in a JSON dictionary.
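
To make the idea concrete, here is a rough sketch of the steps we just walked through, written with the Python standard library only. It is not the LangChain code from the video: SCHEMA, build_format_instructions, and parse_structured_output are hypothetical names that only imitate what a response schema and an output parser do, and the model reply is simulated, so no API key is needed.

```python
import json
import re

# Hypothetical stand-ins for the response schema and the output parser.
# These are NOT LangChain's API, only an illustration of the idea.
FENCE = "`" * 3  # markdown code fence, built this way to keep the example readable

# Step 1: describe which data we want to extract.
SCHEMA = {
    "singer": "name of the singer",
    "song": "name of the song",
}


def build_format_instructions(schema):
    """Step 2: describe the JSON structure we want the model to answer with."""
    fields = "\n".join(f'\t"{name}": string  // {desc}' for name, desc in schema.items())
    return (
        "The output should be a markdown code snippet formatted as JSON "
        "with these keys:\n" + FENCE + "json\n{\n" + fields + "\n}\n" + FENCE
    )


def parse_structured_output(reply, schema):
    """Step 5: pull the JSON object out of the reply and check the expected keys."""
    pattern = FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE
    match = re.search(pattern, reply, re.DOTALL)
    payload = match.group(1) if match else reply
    data = json.loads(payload)
    missing = [key for key in schema if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Steps 3 and 4: build the prompt from the instructions and the user message.
user_message = "I like the song New York, New York by Frank Sinatra."
prompt = (
    "Given a text from the user, extract the singer and the song names.\n"
    + build_format_instructions(SCHEMA)
    + "\n"
    + user_message
)

# Simulated model reply; a real application would send `prompt` to a chat model.
simulated_reply = (
    "Here is the result:\n"
    + FENCE
    + "json\n"
    + '{"singer": "Frank Sinatra", "song": "New York, New York"}\n'
    + FENCE
)

result = parse_structured_output(simulated_reply, SCHEMA)
print(result)
```

The point of the sketch is the division of work: the schema says what to look for, the format instructions tell the model how to answer, and the parser turns the answer back into a dictionary.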