1
00:00:11,090 --> 00:00:16,760
So in this lecture, we'll be starting the next section of this course, which is on latent semantic analysis,

2
00:00:17,150 --> 00:00:19,280
also known as latent semantic indexing.

3
00:00:20,060 --> 00:00:23,810
This is another algorithm with a key role in the field of NLP.

4
00:00:24,950 --> 00:00:31,070
Now, when people introduce this topic, they usually start with two related concepts: synonymy and

5
00:00:31,070 --> 00:00:31,400
polysemy.

6
00:00:32,030 --> 00:00:35,900
Synonymy is when you have multiple words which mean similar things.

7
00:00:36,260 --> 00:00:41,600
For instance, "run" and "sprint". Polysemy is when you have one word which means multiple things.

8
00:00:41,960 --> 00:00:43,760
For instance, consider the word bank.

9
00:00:44,420 --> 00:00:46,400
A bank is a financial institution.

10
00:00:46,880 --> 00:00:51,230
But when we say river bank, we do not mean a financial institution in a river.

11
00:00:52,190 --> 00:00:54,950
So this instance of bank means something else.

12
00:00:56,680 --> 00:01:03,610
Another example is the word pupil. A pupil is a part of the eye, but a pupil can also refer to a student.

13
00:01:04,030 --> 00:01:06,010
You are a pupil of this course.

14
00:01:10,740 --> 00:01:13,020
So how do these concepts enter into NLP?

15
00:01:13,800 --> 00:01:15,540
Well, they are seen as problematic.

16
00:01:16,290 --> 00:01:22,080
Suppose, for example, we build a recommender system using a document-term matrix, as we've seen.

17
00:01:22,980 --> 00:01:25,710
Now suppose that the user searches for the word run.

18
00:01:26,400 --> 00:01:28,710
We know that running is the same as sprinting.

19
00:01:29,070 --> 00:01:32,610
So documents about sprinting should also probably be relevant.

20
00:01:33,900 --> 00:01:39,030
However, since the user didn't enter the word sprint, they will only get back documents containing

21
00:01:39,030 --> 00:01:39,720
the word run.

22
00:01:40,680 --> 00:01:41,820
How about polysemy?

23
00:01:42,510 --> 00:01:48,300
Well, suppose that the user is an ophthalmologist doing research on the pupil of the eye, so they

24
00:01:48,300 --> 00:01:49,530
search for the word pupil.

25
00:01:49,830 --> 00:01:55,320
But then they get back documents about students and how taking handwritten notes is superior to downloading

26
00:01:55,320 --> 00:01:56,790
slides for students.

27
00:01:57,450 --> 00:02:00,240
Of course, those documents are not what the user was searching for,

28
00:02:00,300 --> 00:02:06,720
since the user was searching for documents about the eye. Latent semantic analysis seeks to solve both

29
00:02:06,720 --> 00:02:07,620
of these issues.

30
00:02:12,260 --> 00:02:14,420
Let's go through a quick outline for this section.

31
00:02:15,260 --> 00:02:20,600
We'll begin by explaining the intuition behind SVD, which is the machine learning technique

32
00:02:20,990 --> 00:02:23,240
underlying latent semantic analysis.

33
00:02:23,930 --> 00:02:29,840
Once we understand SVD, we can then discuss how it is applied to NLP and why it's useful in

34
00:02:29,840 --> 00:02:30,800
this context.

35
00:02:31,670 --> 00:02:36,320
Once we've done that, we'll look at some code which demonstrates how SVD is useful.

36
00:02:37,640 --> 00:02:40,760
Now, SVD is an extremely versatile technique.

37
00:02:41,300 --> 00:02:46,400
In fact, SVD can be used for many of the applications we've already discussed.

38
00:02:46,970 --> 00:02:50,450
As mentioned, it can help improve recommenders or search engines.

39
00:02:51,380 --> 00:02:54,410
In addition, it can also help to improve classifiers.

40
00:02:54,770 --> 00:03:01,130
So, for example, detecting spam or sentiment, as you've seen. It can also be used for text summarization

41
00:03:01,400 --> 00:03:02,420
and topic modeling.

42
00:03:03,140 --> 00:03:09,080
At the end of this section, applying SVD to these other applications will be mentioned as exercises

43
00:03:09,080 --> 00:03:10,100
for you to complete.

44
00:03:14,750 --> 00:03:20,960
So it's important to note that latent semantic analysis is yet another topic where it's actually just

45
00:03:20,960 --> 00:03:23,750
a machine learning model being applied to NLP.

46
00:03:24,500 --> 00:03:29,330
In other words, behind the scenes, all we are doing is making use of a well-known machine learning

47
00:03:29,330 --> 00:03:35,900
technique, specifically SVD, and applying that to a term-document matrix as we normally do.

48
00:03:36,740 --> 00:03:41,630
So it's actually nothing new if you've already learned about SVD or PCA in the past.

49
00:03:42,590 --> 00:03:45,860
This is just like Naive Bayes and just like logistic regression.

50
00:03:46,190 --> 00:03:51,470
And just like all of the other techniques we've learned. By studying these techniques, you're not only

51
00:03:51,470 --> 00:03:54,980
improving your skills in NLP, but in all of machine learning,

52
00:03:54,980 --> 00:04:01,220
since any of these techniques can be applied in other fields like computer vision or bioinformatics.