1
00:00:11,160 --> 00:00:14,910
So in this lecture, we will summarize everything we learned in this section.

2
00:00:15,600 --> 00:00:19,440
The section was about the Markov model and how it can be used for NLP.

3
00:00:20,010 --> 00:00:25,500
We started this section by learning about the Markov property. Although this assumption seems to be

4
00:00:25,500 --> 00:00:26,580
very restrictive,

5
00:00:26,880 --> 00:00:32,070
we learned that it has many applications in fields such as finance, reinforcement learning, and

6
00:00:32,070 --> 00:00:33,060
speech recognition.

7
00:00:33,750 --> 00:00:37,650
So clearly, although it is very restrictive, it is still very useful.

8
00:00:38,790 --> 00:00:44,070
The next step was to discuss how a Markov model is represented mathematically, which also helps us

9
00:00:44,070 --> 00:00:46,380
understand how to represent one in Python.

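As a rough illustration of that representation, here is a minimal Python sketch. It assumes a common convention that this summary does not spell out: a first-order Markov model over words is an initial-state distribution `pi` plus a transition table `A`, both estimated by counting. The toy corpus and the function names `train_markov` and `log_prob` are hypothetical.

```python
import math
from collections import defaultdict


def train_markov(sentences):
    """Estimate a first-order Markov model from tokenized sentences.

    Returns (pi, A) where pi[w] = P(first word = w)
    and A[prev][w] = P(w | prev), both as maximum-likelihood counts.
    """
    first_counts = defaultdict(int)
    trans_counts = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        first_counts[tokens[0]] += 1
        for prev, cur in zip(tokens, tokens[1:]):
            trans_counts[prev][cur] += 1
    n = len(sentences)
    pi = {w: c / n for w, c in first_counts.items()}
    A = {}
    for prev, row in trans_counts.items():
        total = sum(row.values())
        A[prev] = {w: c / total for w, c in row.items()}
    return pi, A


def log_prob(tokens, pi, A):
    """log P(x1..xT) = log pi(x1) + sum over t of log A(x_{t-1}, x_t).

    Returns -inf if any start word or transition was never seen (no smoothing).
    """
    p = pi.get(tokens[0], 0.0)
    if p == 0.0:
        return float("-inf")
    total = math.log(p)
    for prev, cur in zip(tokens, tokens[1:]):
        p = A.get(prev, {}).get(cur, 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


corpus = [s.split() for s in ["the cat sat", "the dog sat", "a cat ran"]]
pi, A = train_markov(corpus)
print(pi["the"])  # 2/3
print(A["cat"])   # {'sat': 0.5, 'ran': 0.5}
```

Working in log space avoids numerical underflow when multiplying many small probabilities over long sequences.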
10
00:00:47,490 --> 00:00:51,750
The next step was to look at how to apply the Markov model to applications in NLP.

11
00:00:56,580 --> 00:01:02,100
Now, you may not yet realize this, but what you learned in this section is just the start of a potentially

12
00:01:02,100 --> 00:01:02,850
long journey.

13
00:01:03,630 --> 00:01:06,270
In fact, what you have learned does not end here.

14
00:01:06,960 --> 00:01:12,900
We learned in this section that the basic idea is that we want to predict the future from the past.

15
00:01:13,350 --> 00:01:19,110
That is, we want to predict X of T given X of T minus one, X of T minus two, and so forth.

16
00:01:19,770 --> 00:01:23,790
With Markov models, we cut this off at X of T minus one.

17
00:01:24,510 --> 00:01:26,430
However, this need not be the case.

18
00:01:27,090 --> 00:01:32,010
One thing we can do is to simply add more terms. When we use one past value,

19
00:01:32,010 --> 00:01:33,840
it's called first-order Markov.

20
00:01:34,230 --> 00:01:38,130
When we use two past values, it's called second-order Markov, and so forth.

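To make the difference between first- and second-order concrete, here is a small count-based sketch in Python. It assumes a word-level model where a k-th order Markov model conditions on a tuple of the previous k words; the names `train_order_k` and `next_word_probs` and the toy sentences are hypothetical.

```python
from collections import Counter, defaultdict


def train_order_k(sentences, k):
    """Count next-word frequencies conditioned on the previous k words.

    k=1 gives a first-order (bigram) model, k=2 a second-order (trigram) model.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i in range(k, len(tokens)):
            context = tuple(tokens[i - k:i])
            counts[context][tokens[i]] += 1
    return counts


def next_word_probs(counts, context):
    """Normalize the counts for one context into P(next word | context)."""
    row = counts.get(tuple(context), Counter())
    total = sum(row.values())
    return {w: c / total for w, c in row.items()} if total else {}


corpus = [s.split() for s in ["i like green eggs", "i like ham", "you like green tea"]]
first = train_order_k(corpus, k=1)
second = train_order_k(corpus, k=2)
print(next_word_probs(first, ["like"]))        # green: 2/3, ham: 1/3
print(next_word_probs(second, ["i", "like"]))  # {'green': 0.5, 'ham': 0.5}
```

The only change between orders is the length of the context tuple; the trade-off is that longer contexts are seen less often, so their counts get sparse quickly.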
21
00:01:38,760 --> 00:01:44,250
Despite the simplicity of this idea, you will find that it appears again and again in machine learning,

22
00:01:44,550 --> 00:01:46,500
even at the most advanced levels.

23
00:01:47,160 --> 00:01:49,920
One example of this is time series analysis.

24
00:01:50,880 --> 00:01:56,340
A common way to build a forecasting model is to try and learn a function that maps previous values to

25
00:01:56,340 --> 00:01:57,120
the next value.

26
00:01:57,660 --> 00:02:01,950
This can be a linear function, a neural network, a random forest and so forth.

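Here is a minimal sketch of the "map previous values to the next value" idea using a linear function, assuming NumPy is available. It builds lagged windows of the last `p` values and fits a linear predictor with a bias term by least squares; the helper `make_lagged`, the window size, and the Fibonacci-style toy series are hypothetical choices for illustration.

```python
import numpy as np


def make_lagged(series, p):
    """Turn a 1-D series into (X, y): X[i] = series[i:i+p], y[i] = series[i+p]."""
    X = np.array([series[i:i + p] for i in range(len(series) - p)])
    y = np.array(series[p:])
    return X, y


# Toy series where each value is the sum of the previous two.
series = np.array([1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0])
p = 2  # number of past values used as input
X, y = make_lagged(series, p)

# Append a column of ones for the bias, then solve the least-squares problem.
X_b = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# Forecast the next value from the last p observed values.
next_input = np.append(series[-p:], 1.0)
forecast = next_input @ w
print(forecast)  # approximately 34.0
```

Swapping the least-squares fit for a neural network or a random forest keeps the same (X, y) windows; only the function that maps past values to the next value changes.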
27
00:02:02,940 --> 00:02:06,630
Another direction we can go is to use more complex architectures.

28
00:02:07,170 --> 00:02:12,540
We'll keep working with text, but we'll use more powerful models that can capture more complex dependencies

29
00:02:12,810 --> 00:02:14,820
between the past values and the next value.

30
00:02:15,810 --> 00:02:21,450
One example of this is recurrent neural networks. Although that sounds complicated, when using

31
00:02:21,490 --> 00:02:28,140
RNNs for NLP, you can do useful things just by training your model to predict the next value from

32
00:02:28,140 --> 00:02:29,280
previous values.

33
00:02:29,970 --> 00:02:32,220
In fact, the story still doesn't end here.

34
00:02:32,760 --> 00:02:38,820
Many of the most powerful NLP models we have today, Transformers such as GPT, are also trained to do this exact same

35
00:02:38,820 --> 00:02:39,240
thing.

36
00:02:39,930 --> 00:02:45,390
Transformers might sound intimidating, but behind the scenes, models like these are trained simply to predict

37
00:02:45,390 --> 00:02:48,000
the next word from previous words.

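The "predict the next word from previous words" loop can be sketched in a few lines of Python. This is only an illustration of the autoregressive interface: the predictor here is a count-based bigram table standing in for an RNN or Transformer, it looks only at the last word, and it picks the most frequent successor greedily. The names `train_bigram`, `predict_next`, and `generate` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def predict_next(counts, history):
    """Return the most frequent next word given the history.

    Only the last word is used (a first-order simplification); a neural
    model would condition on the whole history instead.
    """
    row = counts.get(history[-1])
    if not row:
        return None
    return row.most_common(1)[0][0]


def generate(counts, prompt, max_new_words=5):
    """Repeatedly predict the next word and append it to the sequence."""
    words = list(prompt)
    for _ in range(max_new_words):
        nxt = predict_next(counts, words)
        if nxt is None:
            break
        words.append(nxt)
    return words


corpus = [s.split() for s in ["the cat sat on the mat", "the cat ate the fish"]]
model = train_bigram(corpus)
print(" ".join(generate(model, ["the"])))  # the cat sat on the cat
```

The repetitive output shows the limitation of a one-word context; the generation loop itself stays the same when the predictor is replaced by a model that uses the full history.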
38
00:02:48,510 --> 00:02:54,150
So again, although this is a very simple idea, it's also very powerful and it's the foundation for

39
00:02:54,150 --> 00:02:55,470
the current state of the art.

