1
00:00:11,110 --> 00:00:16,600
So in this lecture, we are going to briefly discuss a concept known as the bag of words representation.

2
00:00:17,350 --> 00:00:21,820
We'll begin by reiterating that text, and language more generally, is sequential.

3
00:00:22,420 --> 00:00:27,820
That is, text is made up of sequences of words, and those sequences give our words meaning.

4
00:00:28,540 --> 00:00:31,870
For example, if you take any sentence and then randomize

5
00:00:31,870 --> 00:00:36,880
the order of its words, you would completely change the meaning of that sentence or, more likely, just render

6
00:00:36,880 --> 00:00:38,470
the sentence incomprehensible.

7
00:00:39,280 --> 00:00:45,370
Now, despite this, many NLP approaches simply do not consider the ordering of words in a sentence.

8
00:00:46,540 --> 00:00:52,390
So suppose that we come up with a numerical representation of text that does not consider word ordering.

9
00:00:53,050 --> 00:00:56,260
When we do this, we call this a bag of words representation.
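
A minimal sketch of the idea, assuming simple whitespace tokenization and a small hand-picked vocabulary (`bag_of_words` and `vocab` are illustrative names, not taken from the course code). Each sentence becomes a vector of word counts, so where a word appeared in the sentence plays no role.

```python
from collections import Counter

def bag_of_words(text: str, vocabulary: list[str]) -> list[int]:
    # Assumption: lowercase and split on whitespace; a real tokenizer would also handle punctuation.
    counts = Counter(text.lower().split())
    # One count per vocabulary word; the positions of words in the sentence are discarded.
    return [counts[word] for word in vocabulary]

vocab = ["the", "cat", "sat", "on", "mat"]
print(bag_of_words("The cat sat on the mat", vocab))  # [2, 1, 1, 1, 1]
```

Shuffling the words of the sentence would produce exactly the same vector, which is what "does not consider word ordering" means in practice.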

10
00:01:00,950 --> 00:01:06,620
OK, so let's think of an example of where we would lose information by using a bag of words representation.

11
00:01:07,490 --> 00:01:09,440
Consider the phrase dog toy.

12
00:01:10,130 --> 00:01:13,700
The meaning of this is that we have a toy, which is for dogs.

13
00:01:14,360 --> 00:01:16,490
Now consider the phrase toy dog.

14
00:01:17,210 --> 00:01:20,720
The meaning of this is that we have a toy, which looks like a dog.

15
00:01:21,200 --> 00:01:23,300
So clearly these two things are different.

16
00:01:23,660 --> 00:01:29,450
And clearly, any representation that ignored the ordering of these words would not be able to differentiate

17
00:01:29,450 --> 00:01:30,890
between these two entities.

18
00:01:31,430 --> 00:01:34,250
That's why we can say that we are losing information.
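
A small check of the "dog toy" versus "toy dog" example, again assuming whitespace tokenization (the helper name is hypothetical). The two phrases produce identical word counts, while the plain token sequences still differ.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Count each whitespace-separated token; the order of tokens is thrown away.
    return Counter(text.lower().split())

dog_toy = bag_of_words("dog toy")
toy_dog = bag_of_words("toy dog")
print(dog_toy == toy_dog)  # True: the bag of words cannot tell them apart

# An order-preserving representation (the token sequence itself) keeps the difference.
print("dog toy".split() == "toy dog".split())  # False
```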

19
00:01:38,940 --> 00:01:44,370
Now, despite the fact that the bag of words approach seems to be quite limited, it is in fact widely

20
00:01:44,370 --> 00:01:46,870
used in this course.

21
00:01:46,890 --> 00:01:51,990
What you'll find is that for the most part, vector models and classical machine learning make use of

22
00:01:51,990 --> 00:01:53,310
the bag of words approach.

23
00:01:53,880 --> 00:01:59,370
On the other hand, probabilistic models and deep learning do not make use of the bag of words approach.

24
00:01:59,880 --> 00:02:03,180
But keep in mind that this is just a very rough categorization.

25
00:02:03,540 --> 00:02:08,970
We do have probabilistic models that use bag of words, and we do have deep learning models that also

26
00:02:08,970 --> 00:02:10,110
use bag of words.

27
00:02:10,560 --> 00:02:15,210
So it's not exact, but I hope that as you go through the course, you will look for these patterns

28
00:02:15,210 --> 00:02:19,200
and recognize when bag of words is being used and when it isn't.

29
00:02:19,950 --> 00:02:24,900
Another thing to mention is that although the bag of words approach does seem limited, it

30
00:02:24,900 --> 00:02:27,150
is quite accurate in many cases.

