Here's the comparison
of accuracies between the one-layer LSTM and the two-layer
one over 10 epochs. There's not much of
a difference except the nosedive in
the validation accuracy. But notice how the two-layer training
curve is smoother. I've found from
training networks that jaggedness can be an indication that your model
needs improvement, and the single LSTM that you can see here is not the smoothest. If you look at the loss over the first 10 epochs, you see similar results. But look what happens
when we increase training to 50 epochs. Our one-layer LSTM, while climbing in accuracy, is also prone to
some pretty sharp dips. The final result might be good, but those dips make me suspicious about the overall
accuracy of the model. Our two-layer one
looks much smoother, and as such makes me much more
confident in its results. Note also the
validation accuracy. It levels
out at about 80 percent, which is not bad given that
the training set and the test set
each contain 25,000 reviews, but we're using 8,000 subwords taken only
from the training set. So there will be many tokens in the test set that are
out of vocabulary. Yet despite that, we're still at about 80 percent accuracy. Our loss results are similar, with the two-layer model having
a much smoother curve. The validation loss is increasing
epoch by epoch, so that's worth
monitoring to see if it flattens out in later epochs,
as would be desired. I hope this was
a good introduction to how RNNs and LSTMs can help you
with text classification. Their inherent
handling of sequences also makes them great for predicting unseen text if
you want to generate some, and we'll see that next week. But first, I'd like to
explore some other RNN types, and you'll see those
in the next video.
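The smoothness comparisons in this video are made by eye from the plots. As a rough, hypothetical way to put numbers on them (this is not something shown in the course), here is a minimal Python sketch, assuming each curve is a plain list of per-epoch values. The helper names `jaggedness`, `sharp_dips` and `still_rising`, their thresholds, and the example numbers are all illustrative assumptions, not the results from the video.

```python
from typing import List, Sequence


def jaggedness(curve: Sequence[float]) -> float:
    """Mean absolute second difference of a per-epoch curve.

    Close to 0 for a curve that rises or falls steadily; larger when the
    curve zig-zags up and down from one epoch to the next.
    """
    if len(curve) < 3:
        return 0.0  # need at least three points for a second difference
    second_diffs = [
        curve[i + 1] - 2 * curve[i] + curve[i - 1] for i in range(1, len(curve) - 1)
    ]
    return sum(abs(d) for d in second_diffs) / len(second_diffs)


def sharp_dips(accuracy: Sequence[float], threshold: float = 0.02) -> List[int]:
    """Epoch indices (0-based) where accuracy fell by more than `threshold`
    relative to the previous epoch. The threshold is an arbitrary choice."""
    return [
        i for i in range(1, len(accuracy)) if accuracy[i - 1] - accuracy[i] > threshold
    ]


def still_rising(loss: Sequence[float], window: int = 5) -> bool:
    """True if the mean of the last `window` values exceeds the mean of the
    `window` values before them, i.e. the loss has not flattened out yet."""
    if len(loss) < 2 * window:
        return False  # not enough epochs to compare two windows
    recent = loss[-window:]
    earlier = loss[-2 * window:-window]
    return sum(recent) / window > sum(earlier) / window


# Made-up curves for illustration only; these are NOT the results from the video.
one_layer_acc = [0.70, 0.78, 0.74, 0.83, 0.80, 0.86]
two_layer_acc = [0.70, 0.76, 0.80, 0.83, 0.85, 0.86]
val_loss = [0.45, 0.42, 0.40, 0.41, 0.43, 0.46, 0.48, 0.50, 0.53, 0.55]

print(jaggedness(one_layer_acc) > jaggedness(two_layer_acc))  # True
print(sharp_dips(one_layer_acc))  # [2, 4]
print(sharp_dips(two_layer_acc))  # []
print(still_rising(val_loss))  # True
```

In practice you could pass lists such as `history.history['accuracy']` or `history.history['val_loss']` from the Keras `History` object returned by `model.fit` (the exact key names depend on how the model was compiled). The dip threshold and window size are arbitrary and would need tuning for your own runs.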