1
00:00:11,060 --> 00:00:16,160
So in this lecture, we're going to discuss the next exercise for this course, which is to implement

2
00:00:16,160 --> 00:00:19,430
the process we just described for how to summarize text.

3
00:00:20,090 --> 00:00:22,820
This lecture will recap the steps you need to complete.

4
00:00:24,250 --> 00:00:29,830
For the dataset, please feel free to use any dataset you like. We'll be using the BBC News data once

5
00:00:29,830 --> 00:00:33,640
again, which you can get the URL for from the upcoming notebook.

6
00:00:34,930 --> 00:00:40,240
OK, so the steps are as follows. The first step, of course, is to pick an article you would like to

7
00:00:40,240 --> 00:00:40,990
summarize.

8
00:00:41,440 --> 00:00:46,210
I recommend trying this on multiple articles to see how well it works for different examples.

9
00:00:47,110 --> 00:00:51,770
Once you have your article, you want to split it into sentences. As you recall,

10
00:00:51,790 --> 00:00:55,150
this can be done using the sent_tokenize function from NLTK.
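
As a quick illustration of the sentence-splitting step, here is a minimal sketch. The helper name `split_sentences` is hypothetical, and the regex fallback is an assumption added so the snippet still runs when NLTK or its tokenizer data is not installed; it is much cruder than NLTK's tokenizer.

```python
import re

def split_sentences(text):
    """Split text into sentences, preferring NLTK when it is available."""
    try:
        from nltk.tokenize import sent_tokenize
        return sent_tokenize(text)
    except (ImportError, LookupError):
        # Fallback (an assumption, not part of the lecture): naive split
        # on whitespace that follows '.', '!' or '?'.
        parts = re.split(r"(?<=[.!?])\s+", text)
        return [p.strip() for p in parts if p.strip()]

sentences = split_sentences("The cat sat. The dog barked! Did it rain?")
```

On real news articles the regex fallback will mis-split abbreviations such as "Mr." or "U.S.", so the NLTK path is the one to rely on.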

11
00:00:56,770 --> 00:01:02,080
Once you have your sentences, you can then compute the TF-IDF matrix, passing in these sentences as

12
00:01:02,080 --> 00:01:02,680
input.

13
00:01:03,280 --> 00:01:08,080
This will give you a sparse matrix where the number of rows is the number of sentences and the number

14
00:01:08,080 --> 00:01:10,310
of columns is the number of terms.
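
To make the shape of that matrix concrete, here is a dependency-free sketch that hand-rolls a simplified TF-IDF. In practice a library vectorizer (for example scikit-learn's `TfidfVectorizer`) would typically produce this matrix, and it would be sparse; the function name `tfidf_matrix`, the simple tokenizer, the smoothed IDF formula, and the dense list-of-lists output are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tfidf_matrix(sentences):
    """Return (matrix, vocab): one row per sentence, one column per term."""
    # Very simple tokenizer: lowercase runs of letters and digits.
    docs = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # Document frequency: how many sentences contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF, one common variant (an assumption, not the lecture's formula).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        # Raw term count times IDF; terms absent from the sentence give 0.
        matrix.append([counts[t] * idf[t] for t in vocab])
    return matrix, vocab

matrix, vocab = tfidf_matrix(["the cat sat", "the dog barked"])
```

Here `matrix` has 2 rows (sentences) and 5 columns (terms), and most entries in a realistic article would be zero, which is why library implementations return a sparse matrix.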

15
00:01:11,740 --> 00:01:17,440
Once you have your TF-IDF matrix, you can then score each sentence by taking the average of the non-zero

16
00:01:17,440 --> 00:01:19,900
values from each row in this matrix.
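
The scoring rule can be sketched as follows, using a plain list of lists in place of a sparse matrix. The function name `score_sentences` and the toy values are made up for illustration, and giving an all-zero row a score of 0 is an assumption the lecture does not address.

```python
def score_sentences(matrix):
    """Average the non-zero entries of each row; all-zero rows score 0."""
    scores = []
    for row in matrix:
        nonzero = [v for v in row if v != 0]
        scores.append(sum(nonzero) / len(nonzero) if nonzero else 0.0)
    return scores

# Toy TF-IDF-like matrix (made-up values): 3 sentences x 4 terms.
toy = [
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
scores = score_sentences(toy)
```

Averaging only the non-zero values keeps long sentences from being penalized for the many terms they do not contain.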

17
00:01:21,140 --> 00:01:24,110
The next step is to sort these scores in descending order.

18
00:01:24,710 --> 00:01:29,240
You want to make note of this ordering, since that will be the ordering you use to build your summary.

19
00:01:29,870 --> 00:01:34,100
In particular, your summary will simply consist of the top-scoring sentences.

20
00:01:34,970 --> 00:01:40,040
As mentioned, you can feel free to use any method you like for choosing how many sentences to keep.
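
The sorting and selection steps might look like the sketch below. The function name `summarize` and the fixed `top_n` cutoff are assumptions; keeping a fixed number of sentences is only one of the possible ways to decide how many to keep.

```python
def summarize(sentences, scores, top_n=3):
    """Return the top_n highest-scoring sentences, best first."""
    # Sentence indices sorted by score in descending order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = order[:top_n]
    return [sentences[i] for i in keep]

example_sentences = ["a", "b", "c", "d"]
example_scores = [0.2, 0.9, 0.5, 0.1]
summary = summarize(example_sentences, example_scores, top_n=2)
```

If you would rather present the summary in the article's original order, sorting `keep` before building the list is a small change.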

21
00:01:40,880 --> 00:01:42,500
So that sums up the exercise.

22
00:01:42,860 --> 00:01:44,840
Good luck, and I'll see you in the next lecture.

