So let's start with this basic neural network. It has an embedding layer that takes my vocab size, embedding dimensions, and input length as usual. The output from the embedding is flattened and averaged, and then fed into a dense neural network. But we can experiment with the layers that bridge the embedding and the dense layers by removing the flatten and pooling from here and replacing them with an LSTM like this. When training on the sarcasm
dataset with these two models, using just the pooling and flattening, I quickly got close to 85 percent accuracy, and then it flattened out there. The validation accuracy was a little lower, but the curves were quite in sync. On the other hand, when using the LSTM, I reached 85 percent accuracy really quickly and continued climbing toward about 97.5 percent accuracy within 50 epochs. The validation accuracy dropped slowly, but it stayed close to the value from the non-LSTM version. Still, the drop indicates that there's some overfitting going on here, so a bit of tweaking to the LSTM should help fix that. Similarly, the loss values
from my non-LSTM model reached a healthy state quite quickly and then flattened out, whereas with the LSTM, the training loss dropped nicely but the validation loss increased as I continued training. Again, this shows some overfitting in the LSTM network: while the accuracy of the predictions increased, the confidence in them decreased. So be careful to adjust your training parameters when you use different network types; it's not just a straight drop-in like I did here.
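Accuracy can rise while confidence falls because of how the two numbers are computed. Accuracy only checks which side of the 0.5 threshold a prediction lands on. The loss also penalizes low confidence. This sketch assumes binary cross-entropy as the loss, which is common for a binary classifier like a sarcasm detector. The probabilities are made up for illustration and are not results from the sarcasm models.

```python
import math


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Cross-entropy for one example; p is the predicted probability of label 1."""
    p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def accuracy(labels, probs, threshold=0.5):
    """Fraction of predictions that land on the correct side of the threshold."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(labels, probs))
    return hits / len(labels)


def mean_loss(labels, probs):
    """Average cross-entropy over all examples."""
    return sum(binary_cross_entropy(y, p) for y, p in zip(labels, probs)) / len(labels)


# Hypothetical predictions on four sarcastic headlines (label 1) at two epochs.
labels = [1, 1, 1, 1]
early = [0.95, 0.95, 0.95, 0.45]  # very confident, but one miss
late = [0.55, 0.55, 0.55, 0.55]   # all correct, but barely

for name, probs in [("early", early), ("late", late)]:
    print(f"{name}: accuracy={accuracy(labels, probs):.2f}, loss={mean_loss(labels, probs):.3f}")
```

With these numbers, accuracy goes from 0.75 to 1.00. Meanwhile the mean loss goes from about 0.24 to about 0.60. That is the same kind of divergence between accuracy and loss described for the LSTM.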
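The overfitting signal described for the LSTM can be checked mechanically: training loss keeps falling while validation loss climbs. Here's a minimal sketch under a few assumptions:

* Per-epoch losses sit in a dict with `"loss"` and `"val_loss"` keys. That is the shape Keras uses for `History.history` when validation data is passed to `fit`.
* The helper names `best_epoch` and `looks_overfit` are hypothetical.
* The loss values are invented, not the numbers from this experiment.

```python
def best_epoch(val_loss):
    """Index (0-based) of the lowest validation loss."""
    return min(range(len(val_loss)), key=lambda i: val_loss[i])


def looks_overfit(history):
    """True if, after the validation-loss minimum, training loss kept falling
    while validation loss ended higher than that minimum."""
    loss, val_loss = history["loss"], history["val_loss"]
    best = best_epoch(val_loss)
    return loss[-1] < loss[best] and val_loss[-1] > val_loss[best]


# Hypothetical per-epoch losses, shaped like a Keras History.history dict.
lstm_like = {
    "loss": [0.60, 0.40, 0.30, 0.22, 0.15, 0.10],
    "val_loss": [0.55, 0.42, 0.38, 0.40, 0.45, 0.52],
}
plateau_like = {
    "loss": [0.60, 0.40, 0.35, 0.34, 0.34],
    "val_loss": [0.55, 0.42, 0.40, 0.39, 0.39],
}

for name, history in [("lstm_like", lstm_like), ("plateau_like", plateau_like)]:
    print(name, "best epoch:", best_epoch(history["val_loss"]),
          "overfit:", looks_overfit(history))
```

In Keras, `tf.keras.callbacks.EarlyStopping(monitor="val_loss", restore_best_weights=True)` automates stopping near that validation minimum. Whether it is the right tweak for this particular LSTM is a judgment call.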