1
00:00:11,110 --> 00:00:14,770
So in this lecture, we will summarize everything we learned in this section.

2
00:00:15,400 --> 00:00:18,110
This section was all about how to summarize text.

3
00:00:18,640 --> 00:00:23,980
We learned about two different approaches to this problem, specifically extractive and abstractive.

4
00:00:25,030 --> 00:00:30,280
To review, extractive is when the summary is made up of pieces of the given document.

5
00:00:30,850 --> 00:00:36,220
On the other hand, abstractive is when the summary is made up of novel text, more like a paraphrasing

6
00:00:36,220 --> 00:00:37,060
of the input.

7
00:00:37,810 --> 00:00:43,510
Clearly, extractive summarization is easier because it only amounts to choosing which sentences are

8
00:00:43,510 --> 00:00:44,470
the most relevant.

9
00:00:45,040 --> 00:00:50,830
On the other hand, abstractive summarization is hard because it requires us to generate text, which

10
00:00:50,830 --> 00:00:52,960
even by itself is pretty difficult.

11
00:00:53,590 --> 00:00:58,630
Abstractive methods are more suited to deep learning models such as seq2seq and transformers.

12
00:01:03,380 --> 00:01:08,690
This section looked at two methods of summarizing text where the second builds upon the first.

13
00:01:09,350 --> 00:01:10,970
The first method was pretty simple.

14
00:01:11,420 --> 00:01:16,400
It only involved using the techniques we learned from the first part of this course, which was on vector

15
00:01:16,400 --> 00:01:19,160
based methods, in particular TF-IDF.

16
00:01:20,480 --> 00:01:26,180
The basic idea was that we would compute a TF-IDF matrix from a document split into sentences.

17
00:01:26,720 --> 00:01:31,490
We could then score each sentence using the average of the non-zero TF-IDF values.

18
00:01:32,720 --> 00:01:36,650
From there, we simply sort the sentences by their corresponding scores

19
00:01:36,890 --> 00:01:38,840
and return the top-scoring sentences.
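
To make those steps concrete, here is a minimal sketch of the scoring idea in plain Python. It is not the exact code from the course: the function name tfidf_summarize, the naive sentence split, and the smoothed IDF formula are all assumptions chosen for illustration, and there is no row normalization, so the numbers will differ from what a library vectorizer gives you.

```python
import math
import re
from collections import Counter

def tfidf_summarize(text, top_n=2):
    """Score sentences by their mean non-zero TF-IDF weight; return the top ones."""
    # Naive split on ., ! or ? followed by whitespace (an assumption, not a real tokenizer).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    n = len(sentences)
    # Document frequency: in how many sentences each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF, so a word found in every sentence still gets a positive weight.
        weights = [tf * (math.log((1 + n) / (1 + df[w])) + 1) for w, tf in counts.items()]
        # Only words present in the sentence are counted, so this is the mean of the non-zero entries.
        scores.append(sum(weights) / len(weights) if weights else 0.0)
    # Rank sentence indices by score, keep the best, then restore document order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]
```

Calling tfidf_summarize(text, top_n=3) on a document gives back its three highest-scoring sentences in their original order. Returning them in document order rather than score order is a design choice that tends to make the summary easier to read.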

20
00:01:39,990 --> 00:01:45,540
The second method we looked at was called TextRank, and it gave us a better way of scoring each sentence.

21
00:01:46,320 --> 00:01:49,260
Most of the steps remain the same, which was convenient.

22
00:01:50,310 --> 00:01:55,650
The TextRank method is based on Google's PageRank, which treats every web page as a state in a Markov

23
00:01:55,650 --> 00:01:56,100
chain.

24
00:01:56,790 --> 00:02:02,070
The PageRank score is then just the probability you would land on a specific web page after doing a

25
00:02:02,070 --> 00:02:04,480
random walk for an infinite number of steps.

26
00:02:06,240 --> 00:02:11,610
If you looked at the advanced lectures, you know the conditions under which such a score exists and

27
00:02:11,610 --> 00:02:14,520
a simple way for us to ensure that those conditions are true.

28
00:02:15,390 --> 00:02:19,140
Importantly, you also learned an efficient way to compute these scores.
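
As a rough illustration, one common way to get these scores is power iteration on a smoothed transition matrix. This is a sketch under assumptions, not necessarily the method from the advanced lectures: the function name stationary_scores, the smoothing factor of 0.15, and the stopping tolerance are all hypothetical choices, and the input is assumed to be a square matrix of non-negative sentence similarities.

```python
def stationary_scores(similarity, factor=0.15, tol=1e-8, max_iter=1000):
    """Turn a square sentence-similarity matrix into TextRank-style scores."""
    n = len(similarity)
    transition = []
    for row in similarity:
        total = sum(row)
        # Normalize each row to sum to 1; an all-zero row falls back to uniform.
        probs = [v / total for v in row] if total > 0 else [1.0 / n] * n
        # Mix with a uniform matrix so every state can reach every other state.
        transition.append([(1 - factor) * p + factor / n for p in probs])
    # Start from the uniform distribution and repeatedly apply the transition matrix.
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
        delta = sum(abs(a - b) for a, b in zip(new, dist))
        dist = new
        if delta < tol:
            break
    return dist
```

The returned list has one score per sentence and sums to roughly 1, so you can rank sentences by it exactly as in the TF-IDF method. Mixing in the uniform matrix is what makes every entry positive, which is the kind of simple fix that guarantees a unique limiting distribution.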

29
00:02:19,770 --> 00:02:24,270
For those who prefer a more beginner-friendly approach, we also looked at a few libraries that give you back

30
00:02:24,270 --> 00:02:26,730
a summary in just a few lines of code.

31
00:02:27,570 --> 00:02:30,840
Those libraries included TextRank, among other methods.

32
00:02:35,620 --> 00:02:40,900
Finally, recognize that both of the methods we learned about compute the summary based only on the

33
00:02:40,900 --> 00:02:42,790
document we wanted to summarize.

34
00:02:43,300 --> 00:02:48,250
In other words, unlike other machine learning methods, they don't require us to train on a whole text

35
00:02:48,250 --> 00:02:50,920
corpus, learn a language model and so forth.

36
00:02:51,850 --> 00:02:57,130
If we wanted to build an abstractive summarizer, that would be required since we need a language model

37
00:02:57,130 --> 00:03:03,280
in order to know how to generate text. Using these methods, we only need the document itself, which is

38
00:03:03,280 --> 00:03:04,630
simpler and more efficient.

39
00:03:05,830 --> 00:03:11,440
So hopefully you found this section useful both for learning more about NLP and for reducing your own

40
00:03:11,440 --> 00:03:12,120
reading time,

41
00:03:12,130 --> 00:03:14,290
if you're like me and you have a lot to read.

42
00:03:14,860 --> 00:03:17,020
Thanks for listening, and I'll see you in the next lecture.