1
00:00:11,090 --> 00:00:16,760
So in this lecture, we will be introducing the next part of this course, which comprises several sections.

2
00:00:17,270 --> 00:00:22,370
We'll talk about how and why we are transitioning and how what we're about to study differs from what

3
00:00:22,370 --> 00:00:23,630
we have already studied.

4
00:00:24,350 --> 00:00:28,580
So the first major section of this course covered vector models and text preprocessing.

5
00:00:29,330 --> 00:00:34,220
This was all about how to represent text numerically so that machine learning models can be applied.

6
00:00:34,910 --> 00:00:40,460
Now, even though we haven't yet covered machine learning in this course, we still saw how vector representations

7
00:00:40,460 --> 00:00:41,970
could be useful.

8
00:00:41,990 --> 00:00:43,970
Document retrieval is one example.

9
00:00:44,600 --> 00:00:50,630
So vector models are useful in their own right, and yet there is a whole subfield of NLP, which does

10
00:00:50,630 --> 00:00:54,170
not make use of these numerical vector representations.

11
00:00:54,560 --> 00:00:56,720
And these are probabilistic models.

12
00:01:01,220 --> 00:01:05,900
So let's discuss the importance of probabilistic models in the context of NLP.

13
00:01:06,620 --> 00:01:09,560
This course will mainly focus on a model called the Markov model.

14
00:01:10,190 --> 00:01:15,200
In fact, the application of Markov models to NLP goes hand in hand with their creation.

15
00:01:16,040 --> 00:01:22,010
So in the early 1900s, Andrey Markov, whom the Markov model is named after, applied these models to

16
00:01:22,010 --> 00:01:24,410
analyze the patterns of vowels and consonants.

17
00:01:25,310 --> 00:01:31,880
In 1948, another pioneer of our field, Claude Shannon, used Markov models to generate text, which

18
00:01:31,880 --> 00:01:37,790
he introduced in his famous paper that laid the foundations of information theory and digital communications.

19
00:01:38,870 --> 00:01:44,270
In more recent times, the hidden Markov model has been applied in areas such as speech recognition

20
00:01:44,450 --> 00:01:46,520
and biological sequence analysis.

21
00:01:49,150 --> 00:01:53,900
It's a little-known fact that biological sequences are closely related to NLP.

22
00:01:54,490 --> 00:01:58,480
But if you think about it just a little bit, you'll see how it begins to make sense.

23
00:01:59,020 --> 00:02:03,680
DNA is made up of an alphabet of just four letters: A, T, C, and G.

24
00:02:04,300 --> 00:02:09,430
These letters can be combined to form words which are then combined to form instructions about what

25
00:02:09,430 --> 00:02:11,740
our body should build with its resources.

26
00:02:12,550 --> 00:02:15,100
In other words, DNA is just another language.

27
00:02:15,460 --> 00:02:18,970
It happens to be a language created by nature instead of by humans.

28
00:02:19,240 --> 00:02:21,100
But it is a language nonetheless.

29
00:02:21,880 --> 00:02:27,640
In fact, what you will find is that developments in NLP tend to be closely followed by developments

30
00:02:27,640 --> 00:02:28,390
in genomics.

31
00:02:28,720 --> 00:02:35,140
Since many of the same techniques, like CNNs, RNNs, and Transformers, can be directly applied to biological

32
00:02:35,140 --> 00:02:35,980
sequences.

33
00:02:37,090 --> 00:02:42,970
The latest example of this is the latest version of AlphaFold, released by DeepMind in 2020, with the

34
00:02:42,970 --> 00:02:45,550
paper published in Nature in 2021.

35
00:02:46,360 --> 00:02:52,360
Later on in your NLP studies, you'll see that even Transformers will implement many of the ideas we

36
00:02:52,360 --> 00:02:54,280
will learn about in the coming sections.

37
00:02:58,930 --> 00:03:04,930
So one final example is one I really like, which is Google's PageRank method. As you recall,

38
00:03:04,960 --> 00:03:08,740
Google is one of the largest and most powerful tech companies in the world.

39
00:03:09,250 --> 00:03:13,960
The reason Google became as large as it is is largely due to the PageRank method.

40
00:03:14,500 --> 00:03:17,650
This is back when search engines were not as good as they are today.

41
00:03:17,920 --> 00:03:22,270
And so Google's method was able to do far better than anything that existed.

42
00:03:22,900 --> 00:03:24,310
So what is PageRank?

43
00:03:24,940 --> 00:03:28,990
Well, suppose that you built a Markov model out of all the web pages on the internet.

44
00:03:29,470 --> 00:03:35,290
The PageRank score for each web page is the probability that you'd land on that page after randomly

45
00:03:35,290 --> 00:03:37,900
browsing the web for an infinite amount of time.
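As a rough sketch of that idea, here is how those long-run probabilities could be approximated for a tiny web of three pages. The `pagerank` function, the `toy_web` links, and the parameter values are hypothetical and chosen only for illustration. The `damping` term, the chance that the surfer follows a link rather than jumping to a random page, is part of the published PageRank formulation but is not covered in this lecture, so treat it as an assumption here.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate each page's long-run visit probability by power iteration.

    links maps every page to the list of pages it links to. This sketch
    assumes every link target also appears as a key in links.
    """
    pages = sorted(links)
    n = len(pages)
    # Start the random surfer on a uniformly random page.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links sends the surfer anywhere.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: a links to b and c, b links to c, c links to a.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

The scores sum to one because they form a probability distribution over pages, and page c ranks highest in this toy web because both other pages link to it.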

46
00:03:38,680 --> 00:03:44,380
So let this be a lesson to you: if you want to start one of the largest and most powerful billion-dollar

47
00:03:44,380 --> 00:03:45,640
businesses in the world,

48
00:03:46,060 --> 00:03:48,130
all you really need is a simple Markov model.

49
00:03:52,840 --> 00:03:58,180
So the next part of this course will be made up of multiple sections. In the first section, we'll study

50
00:03:58,180 --> 00:04:01,690
Markov models and what it means to build an n-gram language model.

51
00:04:02,380 --> 00:04:04,330
We'll also see a few applications.

52
00:04:04,900 --> 00:04:08,650
One application, which will get its own section, is the article spinner.

53
00:04:09,220 --> 00:04:14,080
This is the kind of tool that is used by black-hat SEO marketers who don't want to write legitimate

54
00:04:14,080 --> 00:04:17,440
content, but instead spin content written by others.

55
00:04:18,190 --> 00:04:23,710
In another section, we'll look at an application called cipher decryption, also known as code breaking.

56
00:04:24,280 --> 00:04:29,320
You'll see how language models can be applied to decrypt ciphers, which is an important application

57
00:04:29,320 --> 00:04:31,030
in espionage and warfare.

58
00:04:32,340 --> 00:04:36,780
After these sections, we'll then move on to machine learning and deep learning, where you'll get a chance

59
00:04:36,780 --> 00:04:41,160
to see how both vector models and probabilistic models can be applied further.

