1
00:00:11,050 --> 00:00:16,390
So in this lecture, we will be introducing the next major part of this course, which is on machine

2
00:00:16,390 --> 00:00:21,190
learning. This course has looked at two very different techniques thus far.

3
00:00:21,370 --> 00:00:23,170
The first is vector-based models.

4
00:00:23,470 --> 00:00:26,080
And the second is probability-based models.

5
00:00:26,620 --> 00:00:31,360
In this part of the course, we will now look at new models, which are based on what we've already

6
00:00:31,360 --> 00:00:31,990
learned.

7
00:00:32,530 --> 00:00:37,960
In some cases, these new models will be vector based, and in others they will be probability based,

8
00:00:38,290 --> 00:00:40,060
and in some cases they will be both.

9
00:00:41,200 --> 00:00:46,060
Now, one pattern I hope you're starting to see is that many of these different techniques we're learning

10
00:00:46,060 --> 00:00:48,010
about are interchangeable.

11
00:00:48,610 --> 00:00:54,370
A simple example of this is with count vectors. As you know, any place where I use count vectors,

12
00:00:54,610 --> 00:00:56,490
I can also use TF-IDF.

13
00:00:57,100 --> 00:00:59,410
So these two techniques are interchangeable.

14
00:01:00,130 --> 00:01:05,040
What works best for your use case is really dependent on the specifics of your data set.

15
00:01:05,560 --> 00:01:11,160
So if you see me using TF-IDF in the coming lectures, this does not imply that you should always use

16
00:01:11,200 --> 00:01:16,930
TF-IDF. It's simply one possible option out of many. In your work,

17
00:01:16,960 --> 00:01:20,320
you should be testing all options to see which works best for you.

18
00:01:21,920 --> 00:01:24,530
Another example is with machine learning models.

19
00:01:25,010 --> 00:01:28,820
Some machine learning models we will learn about are meant for the same task.

20
00:01:29,120 --> 00:01:35,540
For example, logistic regression and naive Bayes can both be used for text classification.

21
00:01:36,140 --> 00:01:38,960
So anywhere you see one, you could use the other.

22
00:01:39,530 --> 00:01:42,530
And if you know of any other classifiers you'd like to try,

23
00:01:42,860 --> 00:01:44,390
feel free to use those as well.

24
00:01:46,020 --> 00:01:50,880
Note that in this course, when we talk about machine learning, this refers to techniques which are

25
00:01:50,880 --> 00:01:53,100
not based on deep learning and neural networks.

26
00:01:53,490 --> 00:01:58,900
This is only to disambiguate these topics within this course. Outside of this course,

27
00:01:58,920 --> 00:02:01,530
deep learning is simply a subset of machine learning.

28
00:02:01,800 --> 00:02:04,920
So just be aware of how we categorize each topic.

29
00:02:09,479 --> 00:02:13,140
OK, so let me give you a brief outline of the coming sections.

30
00:02:13,680 --> 00:02:19,170
Now, unlike the other sections of this course, these will be centered around the application rather

31
00:02:19,170 --> 00:02:20,100
than the technique.

32
00:02:20,730 --> 00:02:25,650
So for example, when we looked at vector models and Markov models, these were centered around the

33
00:02:25,650 --> 00:02:26,400
technique.

34
00:02:26,910 --> 00:02:33,450
Our goal was to learn the technique, and any applications we discussed were secondary. In the machine

35
00:02:33,450 --> 00:02:34,260
learning sections,

36
00:02:34,290 --> 00:02:38,780
this is sort of flipped around, which is more in the spirit of V1 of this course.

37
00:02:39,990 --> 00:02:45,210
So as an example, the first section will be based on an application known as spam detection.

38
00:02:45,810 --> 00:02:52,980
We will study a technique that happens to be useful for this, called naive Bayes. The second section

39
00:02:53,160 --> 00:02:56,880
will be based on an application known as sentiment analysis.

40
00:02:57,480 --> 00:03:01,440
In this section, we will study a technique known as logistic regression.

41
00:03:02,770 --> 00:03:08,620
In the third section, we will study an application called Latent Semantic Indexing, which is relevant

42
00:03:08,620 --> 00:03:11,950
for search engine optimization, also known as SEO.

43
00:03:12,820 --> 00:03:17,080
In this section, we will study techniques such as PCA and SVD.

44
00:03:18,640 --> 00:03:22,420
In the next section, we'll look at an application called topic modeling.

45
00:03:23,020 --> 00:03:28,150
This is a way for you to automatically categorize a set of documents without being told what they are

46
00:03:28,150 --> 00:03:28,660
about.

47
00:03:29,290 --> 00:03:31,320
It is an unsupervised technique.

48
00:03:31,990 --> 00:03:36,490
In this section, we'll study a technique known as latent Dirichlet allocation.

49
00:03:41,170 --> 00:03:47,080
So why are these sections based on the application instead of the technique? Primarily,

50
00:03:47,110 --> 00:03:53,680
my goal is to help you see that NLP is really applicable in the real world. By putting the application

51
00:03:53,680 --> 00:03:54,580
at the forefront,

52
00:03:54,880 --> 00:03:56,560
you will see this more immediately.

53
00:03:57,520 --> 00:04:00,280
In addition, some people simply learn better this way.

54
00:04:01,240 --> 00:04:04,630
Many people don't set out to learn naive Bayes for no reason.

55
00:04:05,680 --> 00:04:10,330
Typically, you are part of a business and you have in mind some business application.

56
00:04:11,050 --> 00:04:16,269
You will generally use whatever techniques exist that will help you improve your business capability

57
00:04:16,510 --> 00:04:18,160
at handling that application.

58
00:04:18,970 --> 00:04:23,740
And if naive Bayes happens to be one of those techniques, then that is what you will opt to learn.

59
00:04:24,430 --> 00:04:28,780
So this is simply a different way that one may encounter and learn about machine learning.

60
00:04:29,380 --> 00:04:34,630
But either way, you will still learn about both techniques and applications, regardless of their order

61
00:04:34,630 --> 00:04:35,650
or emphasis.

62
00:04:40,350 --> 00:04:46,080
So one thing I wanted to mention about this part of the course is that each section is essentially independent.

63
00:04:46,740 --> 00:04:50,310
What I mean by this is that it doesn't really matter which order you do them in.

64
00:04:50,850 --> 00:04:55,470
So if you want to do topic modeling before you do spam detection, that should work.

65
00:04:56,010 --> 00:04:58,080
This is unlike some of the other sections.

66
00:04:58,170 --> 00:05:03,690
For example, you can't do cipher decryption before you learn about basic Markov models because Markov

67
00:05:03,690 --> 00:05:05,070
models are a prerequisite.

68
00:05:05,730 --> 00:05:09,420
So that's one thing to keep in mind. For the machine learning sections,

69
00:05:09,420 --> 00:05:12,360
please feel free to do them in any order that works for you.

70
00:05:13,710 --> 00:05:18,900
Now, one point of caution is that from my experience, students are typically more comfortable with

71
00:05:18,900 --> 00:05:22,200
supervised methods compared with unsupervised methods.

72
00:05:23,160 --> 00:05:27,300
In this case, spam detection and sentiment analysis are supervised.

73
00:05:27,840 --> 00:05:32,280
On the other hand, topic modeling and latent semantic indexing are unsupervised.

74
00:05:33,360 --> 00:05:38,910
Interestingly, text summarization can be either, but for the purpose of this course, it's unsupervised

75
00:05:38,910 --> 00:05:39,480
as well.

76
00:05:40,140 --> 00:05:45,330
However, it's conceptually easier to understand compared with the other unsupervised topics.