A full explanation of
how embeddings work is beyond the scope
of this course, but think of it like this. You have words in a sentence, and often words that have similar meanings are
close to each other. So in a movie review, it might say that the movie
was dull and boring, or it might say that it
was fun and exciting. So what if you could
pick a vector in a higher-dimensional space,
say 16 dimensions, for each word, so that words that are found together are given similar vectors? Then over time, words can
begin to cluster together. The meaning of the words can come from the labeling
of the dataset. So in this case, say we have a negative review, and
the words dull and boring show up a lot in negative reviews, so
they have similar sentiments, and they are close to each
other in the sentence. Thus their vectors
will be similar. As the neural network trains, it can then learn
these vectors, associating them with the labels, to come up with what's called
an embedding, i.e., the vectors for each word with their associated sentiment. The result of
the embedding will be a 2D array whose size is the length of the sentence by
the embedding dimension, for example 16. So we need to flatten it out in much the same way as we needed
to flatten out our images. We then feed that into a dense neural network to
do the classification. Often in natural
language processing, a different layer type
than a Flatten is used, and this is a
GlobalAveragePooling1D. The reason for this
is the size of the output vector being
fed into the dense layer. So for example, if
I show the summary of the model with the
flatten that we just saw, it will look like this. Or alternatively, you can use a Global Average
Pooling 1D like this, which averages across
the vector to flatten it out. Your model summary
should look like this, which is simpler and
should be a little faster. Try it for yourself in colab
and check the results. Over 10 epochs with
global average pooling, I got an accuracy of 0.9664 on training and 0.8187 on test, taking about 6.2
seconds per epoch. With flatten, my training accuracy
was 1.0 and my validation accuracy about 0.83, taking about
6.5 seconds per epoch. So it was a little slower, but a bit more accurate. Try them both out, and experiment with
the results for yourself.
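
To make the size difference between the two approaches concrete, here is a minimal sketch in plain Python rather than Keras, so it is not the code from the course notebook. Only the embedding dimension of 16 comes from the discussion above. The sentence length of 120 and the 6-unit dense layer are assumed values picked for illustration, and the helper names are hypothetical.

```python
# Illustrative sizes. Only EMBED_DIM (16) is taken from the lecture;
# SEQ_LEN and DENSE_UNITS are assumptions made for this sketch.
SEQ_LEN = 120      # assumed padded sentence length
EMBED_DIM = 16     # embedding dimension mentioned in the lecture
DENSE_UNITS = 6    # assumed width of the first dense layer


def flatten(embedded):
    """Concatenate all word vectors into one long vector, like a Flatten layer."""
    return [value for word_vector in embedded for value in word_vector]


def global_average_pool_1d(embedded):
    """Average over the word axis, leaving one value per embedding dimension."""
    n_words = len(embedded)
    return [sum(column) / n_words for column in zip(*embedded)]


def dense_param_count(n_inputs, n_units):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_units + n_units


# A stand-in for one embedded sentence: SEQ_LEN word vectors of EMBED_DIM values.
embedded_sentence = [
    [float(i + j) for j in range(EMBED_DIM)] for i in range(SEQ_LEN)
]

flat = flatten(embedded_sentence)
pooled = global_average_pool_1d(embedded_sentence)

# Flatten keeps every value, so the dense layer sees 120 * 16 inputs.
print(len(flat), dense_param_count(len(flat), DENSE_UNITS))      # 1920 11526

# Pooling averages across the words, so the dense layer sees only 16 inputs.
print(len(pooled), dense_param_count(len(pooled), DENSE_UNITS))  # 16 102
```

Under these assumed sizes, the dense layer after pooling has far fewer weights to train, which is consistent with the model being simpler and a little faster. Flatten keeps the position of each word, which may be part of why it scored slightly higher here.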