So let's start with this basic neural network. It has an embedding taking my vocab size, embedding dimensions, and input length as usual. The output from the embedding is flattened, averaged, and then fed into a dense neural network. But we can experiment with the layers that bridge the embedding and the dense by removing the flatten and pooling from here, and replacing them with an LSTM like this.

If I train using the sarcasm dataset with these, when just using the pooling and flattening, I quickly got close to 85 percent accuracy and then it flattened out there. The validation set was a little less accurate, but the curves were quite in sync. On the other hand, when using the LSTM, I reached 85 percent accuracy really quickly and continued climbing towards about 97.5 percent accuracy within 50 epochs. The validation accuracy dropped slowly, but it was still close to the same value as the non-LSTM version. Still, the drop indicates that there's some overfitting going on here, so a bit of tweaking to the LSTM should help fix that.

Similarly, the loss values from my non-LSTM one got to a healthy state quite quickly and then flattened out. Whereas with the LSTM, the training loss dropped nicely, but the validation loss increased as I continued training. Again, this shows some overfitting in the LSTM network. While the accuracy of the prediction increased, the confidence in it decreased. So you should be careful to adjust your training parameters when you use different network types; it's not just a straight drop-in like I did here.
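To make the point about accuracy and confidence a bit more concrete, here is a minimal sketch in plain Python of how a loss value can climb while accuracy stays the same. The labels and predicted probabilities are made up purely for illustration (they are not results from the sarcasm model), and the helper names `binary_cross_entropy` and `accuracy` are hypothetical. The idea is that a model that becomes very sure of itself gets punished heavily by the loss for each prediction it still gets wrong.

```python
import math


def binary_cross_entropy(labels, probs):
    """Mean binary cross-entropy over a list of labels and predicted probabilities."""
    eps = 1e-7  # keep log() away from zero
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def accuracy(labels, probs, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    correct = sum(1 for y, p in zip(labels, probs) if (p >= threshold) == bool(y))
    return correct / len(labels)


# Illustrative values only: four validation examples, last one is misclassified
# in both cases.
labels = [1, 0, 1, 0]
early_probs = [0.70, 0.30, 0.60, 0.60]  # earlier in training: cautious predictions
late_probs = [0.99, 0.01, 0.95, 0.98]   # later in training: very sure, same mistake

early_acc = accuracy(labels, early_probs)
late_acc = accuracy(labels, late_probs)
early_loss = binary_cross_entropy(labels, early_probs)
late_loss = binary_cross_entropy(labels, late_probs)

print(f"early: accuracy={early_acc:.2f} loss={early_loss:.3f}")
print(f"late:  accuracy={late_acc:.2f} loss={late_loss:.3f}")
```

In this toy case both sets of predictions get three out of four right, but the later set has the higher loss because its one wrong answer is given with a probability of 0.98. That is one plausible reading of a validation loss that rises while validation accuracy barely moves.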