1
00:00:11,060 --> 00:00:16,580
OK, so in this lecture, we will be looking at how to apply n-gram models to the article spinning

2
00:00:16,580 --> 00:00:17,270
task.

3
00:00:17,840 --> 00:00:22,220
This will be similar to the Markov models we studied in the past, but with a twist.

4
00:00:22,880 --> 00:00:26,510
Let's begin by reviewing what we know about Markov models so far.

5
00:00:27,320 --> 00:00:31,100
Suppose that we'd like to build a first-order Markov model for language.

6
00:00:31,520 --> 00:00:35,690
That is, we'd like to be able to predict the next word from the current word.

7
00:00:36,980 --> 00:00:43,070
In this case, we can estimate a probability distribution of next words, given current words.

8
00:00:43,520 --> 00:00:46,280
As you recall, this is simply done by counting.

9
00:00:47,510 --> 00:00:54,650
If we want to know how often w t follows w t minus one, we count how many times that occurs and divide

10
00:00:54,650 --> 00:00:58,730
that by how many times w t minus one appeared as a unigram.
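
As a minimal sketch of that counting, here is one way it could look in Python. The toy corpus and the function name next_word_prob are made up for illustration and are not from the course notebook.

```python
from collections import Counter

# Toy corpus, assumed for illustration only (not the course dataset).
corpus = "the cat sat on the mat the cat ran".split()

# Count each word in its role as a "current word". The final token has no
# successor, so it is left out to keep each distribution summing to one.
unigram_counts = Counter(corpus[:-1])

# Count each (current word, next word) pair.
bigram_counts = Counter(zip(corpus[:-1], corpus[1:]))

def next_word_prob(prev, word):
    """Estimate p(w_t = word | w_{t-1} = prev) by counting."""
    if unigram_counts[prev] == 0:
        return 0.0  # unseen current word: no estimate available
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(next_word_prob("the", "cat"))
```

In this toy corpus "the" is followed by "cat" twice and by "mat" once, so the estimate for "cat" given "the" is two thirds.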

11
00:01:00,140 --> 00:01:02,240
But suppose we'd like to have a richer model.

12
00:01:03,530 --> 00:01:09,350
Suppose that instead of depending on just one past word, we would like to depend on two past words.

13
00:01:09,890 --> 00:01:13,520
In this case, we would build a second-order Markov model for language.

14
00:01:14,090 --> 00:01:21,620
We want to know the probability that w t appears given that w t minus one and w t minus two already

15
00:01:21,620 --> 00:01:22,190
appeared.

16
00:01:22,970 --> 00:01:30,830
Again, we estimate this by counting the number of trigrams for w t minus two, w t minus one, and w t in our

17
00:01:30,830 --> 00:01:31,700
text corpus.

18
00:01:32,240 --> 00:01:37,160
We then divide this by the bigram count for w t minus two and w t minus one.
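
Written out as a formula, the estimate just described would look like this, where "count" is shorthand (assumed here) for the number of times a word sequence occurs in the corpus:

```latex
p(w_t \mid w_{t-2}, w_{t-1}) = \frac{\mathrm{count}(w_{t-2}, w_{t-1}, w_t)}{\mathrm{count}(w_{t-2}, w_{t-1})}
```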

19
00:01:41,890 --> 00:01:48,430
So one way we saw that we could use the trigram model was to generate poetry; that is, given the previous

20
00:01:48,430 --> 00:01:53,320
two generated words, randomly select the next word, and so on and so forth.

21
00:01:54,040 --> 00:01:56,890
For article spinning, our problem is a bit different.

22
00:01:57,580 --> 00:02:00,580
It's not that we want to generate text from start to end.

23
00:02:01,210 --> 00:02:03,700
Instead, our job is to replace text.

24
00:02:04,120 --> 00:02:09,729
We want to replace text in such a way that it still makes sense in the context of the text that came

25
00:02:09,729 --> 00:02:12,520
before, as well as the text that comes after.

26
00:02:13,330 --> 00:02:14,980
So here is a simple idea.

27
00:02:15,760 --> 00:02:23,740
How about we create a distribution for w t given w t minus one, the previous word, and w t plus one, the

28
00:02:23,740 --> 00:02:24,490
next word?

29
00:02:25,060 --> 00:02:30,100
This is still a trigram model, except that the dependency structure is a bit different.

30
00:02:34,780 --> 00:02:40,060
Now, luckily, the way that we estimate this trigram probability is pretty much the same as what you

31
00:02:40,060 --> 00:02:40,810
would expect.

32
00:02:41,440 --> 00:02:49,360
It's simply the count of the trigram w t minus one, w t, and w t plus one, divided by the count for the context

33
00:02:49,360 --> 00:02:52,750
window w t minus one and w t plus one.

34
00:02:53,680 --> 00:02:59,350
OK, so hopefully you're convinced that this is pretty much exactly the same approach that we used before.

35
00:02:59,950 --> 00:03:02,980
As you recall, this is the maximum likelihood estimate.
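
To make the middle-word estimate concrete, here is a minimal sketch in Python. The toy token list, the helper names middle_word_probs and sample_middle_word, and the idea of sampling a replacement at random are all assumptions for illustration, not the notebook's actual code.

```python
from collections import Counter, defaultdict
import random

# Toy corpus, assumed for illustration only (not the course dataset).
tokens = (
    "production began to rise . production capacity to build grew . "
    "production began to slow"
).split()

# Map each (previous word, next word) context to a Counter of middle words.
middle_counts = defaultdict(Counter)
for prev, mid, nxt in zip(tokens, tokens[1:], tokens[2:]):
    middle_counts[(prev, nxt)][mid] += 1

def middle_word_probs(prev, nxt):
    """Estimate p(w_t | w_{t-1} = prev, w_{t+1} = nxt) by counting."""
    counts = middle_counts.get((prev, nxt), Counter())
    total = sum(counts.values())  # count of the context window itself
    if total == 0:
        return {}  # context never seen in the corpus
    return {word: c / total for word, c in counts.items()}

def sample_middle_word(prev, nxt, rng=random):
    """Draw a candidate replacement for the middle word, or None if unseen."""
    probs = middle_word_probs(prev, nxt)
    if not probs:
        return None
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

print(middle_word_probs("production", "to"))
```

Summing the middle-word counts for a context gives the count of the context window, so the denominator matches the one described above. In this toy corpus the context "production", "to" was seen with "began" twice and "capacity" once.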

36
00:03:07,610 --> 00:03:12,200
Now, it's probably a good idea to make sure that this approach even makes sense.

37
00:03:12,950 --> 00:03:19,310
Consider the context words "production" and "to". Here are some actual middle words from the dataset we'll be

38
00:03:19,310 --> 00:03:19,850
using.

39
00:03:20,690 --> 00:03:26,780
So some examples are began, capacity, closer, continued, and facilities.

40
00:03:27,500 --> 00:03:30,440
You can imagine how all of these could be used in a sentence.

41
00:03:30,980 --> 00:03:33,890
Production began to speed up in Q1.

42
00:03:34,490 --> 00:03:38,270
Apple's production capacity to build iPhones increased.

43
00:03:39,350 --> 00:03:44,900
The CEO brought production closer to the company's headquarters, and so on and so forth.

44
00:03:45,680 --> 00:03:48,560
Note that there is one potential issue that may arise.

45
00:03:49,070 --> 00:03:52,040
Not all of these middle words are the same part of speech.

46
00:03:52,430 --> 00:03:57,380
Sometimes it's a noun, other times it's a verb, and other times it's an adjective.

47
00:03:58,010 --> 00:04:04,160
So it's very possible you'll encounter a weird looking result if you replace a verb with a noun or some

48
00:04:04,160 --> 00:04:06,500
other part of speech that does not belong.

49
00:04:07,250 --> 00:04:11,720
In this case, you are changing the grammar of the sentence, which will not work.

50
00:04:12,740 --> 00:04:17,540
There are a couple of additions you could use to try and fix this issue, which are exercises at the

51
00:04:17,540 --> 00:04:18,980
end of the coming notebook.

