1
00:00:00,000 --> 00:00:02,595
A full explanation of
how embeddings work

2
00:00:02,595 --> 00:00:04,260
is beyond the scope
of this course.

3
00:00:04,260 --> 00:00:06,105
But think of it like this.

4
00:00:06,105 --> 00:00:08,070
You have words in a sentence and

5
00:00:08,070 --> 00:00:09,150
often words that have

6
00:00:09,150 --> 00:00:11,400
similar meanings are
close to each other.

7
00:00:11,400 --> 00:00:12,870
So in a movie review,

8
00:00:12,870 --> 00:00:15,765
it might say that the movie
was dull and boring,

9
00:00:15,765 --> 00:00:18,675
or it might say that it
was fun and exciting.

10
00:00:18,675 --> 00:00:20,820
So what if you could
pick a vector in

11
00:00:20,820 --> 00:00:23,985
a higher-dimensional space,
say 16 dimensions,

12
00:00:23,985 --> 00:00:25,845
and words that are found together

13
00:00:25,845 --> 00:00:27,825
are given similar vectors.

14
00:00:27,825 --> 00:00:32,040
Then over time, words can
begin to cluster together.

15
00:00:32,040 --> 00:00:34,190
The meaning of the words can

16
00:00:34,190 --> 00:00:36,695
come from the labeling
of the dataset.

17
00:00:36,695 --> 00:00:38,270
So in this case, we say

18
00:00:38,270 --> 00:00:40,970
a negative review and
the words dull and

19
00:00:40,970 --> 00:00:42,710
boring show up a lot in

20
00:00:42,710 --> 00:00:46,100
the negative review, so
they have similar sentiments,

21
00:00:46,100 --> 00:00:48,725
and they are close to each
other in the sentence.

22
00:00:48,725 --> 00:00:51,215
Thus their vectors
will be similar.

23
00:00:51,215 --> 00:00:53,135
As the neural network trains,

24
00:00:53,135 --> 00:00:56,585
it can then learn
these vectors associating them

25
00:00:56,585 --> 00:00:58,520
with the labels to come up

26
00:00:58,520 --> 00:01:00,630
with what's called
an embedding, i.e.,

27
00:01:00,630 --> 00:01:02,360
the vectors for each word

28
00:01:02,360 --> 00:01:04,850
with their associated sentiment.

29
00:01:04,850 --> 00:01:07,310
The results of
the embedding will be

30
00:01:07,310 --> 00:01:09,230
a 2D array with the length of

31
00:01:09,230 --> 00:01:11,615
the sentence and
the embedding dimension

32
00:01:11,615 --> 00:01:14,255
(for example, 16) as its size.

33
00:01:14,255 --> 00:01:16,070
So we need to flatten it out in

34
00:01:16,070 --> 00:01:19,595
much the same way as we needed
to flatten out our images.

35
00:01:19,595 --> 00:01:21,470
We then feed that into

36
00:01:21,470 --> 00:01:24,490
a dense neural network to
do the classification.

37
00:01:24,490 --> 00:01:27,225
Often in natural
language processing,

38
00:01:27,225 --> 00:01:30,110
a different layer type
than a flatten is used,

39
00:01:30,110 --> 00:01:33,275
and this is a
GlobalAveragePooling1D layer.

40
00:01:33,275 --> 00:01:35,090
The reason for this
is the size of

41
00:01:35,090 --> 00:01:37,835
the output vector being
fed into the dense layer.

42
00:01:37,835 --> 00:01:40,310
So for example, if
I show the summary

43
00:01:40,310 --> 00:01:42,785
of the model with the
flatten that we just saw,

44
00:01:42,785 --> 00:01:44,455
it will look like this.

45
00:01:44,455 --> 00:01:46,520
Or alternatively, you can use

46
00:01:46,520 --> 00:01:49,140
a GlobalAveragePooling1D
layer like this,

47
00:01:49,140 --> 00:01:53,090
which averages across
the vector to flatten it out.

48
00:01:53,090 --> 00:01:55,790
Your model summary
should look like this,

49
00:01:55,790 --> 00:01:59,050
which is simpler and
should be a little faster.

50
00:01:59,050 --> 00:02:02,655
Try it for yourself in Colab
and check the results.

51
00:02:02,655 --> 00:02:05,845
Over 10 epochs with
global average pooling,

52
00:02:05,845 --> 00:02:08,840
I got an accuracy of 0.9664 on

53
00:02:08,840 --> 00:02:12,140
training and 0.8187 on test,

54
00:02:12,140 --> 00:02:14,990
taking about 6.2
seconds per epoch.

55
00:02:14,990 --> 00:02:19,250
With flatten, my accuracy
was 1.0 and my validation

56
00:02:19,250 --> 00:02:23,375
about 0.83, taking about
6.5 seconds per epoch.

57
00:02:23,375 --> 00:02:24,830
So it was a little slower,

58
00:02:24,830 --> 00:02:26,180
but a bit more accurate.

59
00:02:26,180 --> 00:02:27,435
Try them both out, and

60
00:02:27,435 --> 00:02:30,240
experiment with
the results for yourself.
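
The difference between flattening and global average pooling described above can be sketched in plain Python, without any deep learning library. This is only an illustration under stated assumptions: the vocabulary size (100), the padded sentence length (120), and the dense layer width (6) are hypothetical values not given in the transcript, and the embedding table is random rather than trained. Only the embedding dimension of 16 comes from the lecture. The sketch shows the 2D shape of the embedded sentence, the size of the vector each approach passes to the dense layer, and the resulting parameter counts.

```python
import random

random.seed(0)

VOCAB_SIZE = 100     # hypothetical vocabulary size (assumption)
EMBEDDING_DIM = 16   # the 16 dimensions mentioned in the lecture
SENTENCE_LEN = 120   # hypothetical padded sentence length (assumption)
DENSE_UNITS = 6      # hypothetical dense layer width (assumption)

# Embedding table: one vector per word index (random here, not trained).
embedding_table = [
    [random.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for _ in range(VOCAB_SIZE)
]


def embed(token_ids):
    """Look up each word index, giving a SENTENCE_LEN x EMBEDDING_DIM array."""
    return [embedding_table[t] for t in token_ids]


def flatten(matrix):
    """Concatenate all rows into one long vector."""
    return [value for row in matrix for value in row]


def global_average_pool_1d(matrix):
    """Average over the sentence axis, leaving one value per dimension."""
    n = len(matrix)
    return [sum(column) / n for column in zip(*matrix)]


def dense_param_count(input_size, units):
    """Weights plus biases of a fully connected layer."""
    return input_size * units + units


sentence = [random.randrange(VOCAB_SIZE) for _ in range(SENTENCE_LEN)]
embedded = embed(sentence)
flat = flatten(embedded)
pooled = global_average_pool_1d(embedded)

print(len(embedded), len(embedded[0]))               # 120 16
print(len(flat))                                     # 1920
print(len(pooled))                                   # 16
print(dense_param_count(len(flat), DENSE_UNITS))     # 11526
print(dense_param_count(len(pooled), DENSE_UNITS))   # 102
```

With these assumed sizes, the dense layer after flattening needs 11,526 parameters, while the one after pooling needs only 102, which is why the pooled model summary looks simpler. The trade-off is that averaging discards word order and position, which may explain the slightly lower accuracy reported above.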