There can be a limitation when approaching text classification in this way. Consider the following. Here's a sentence: "Today has a beautiful blue..." What do you think would come next? Probably "sky", right? "Today has a beautiful blue sky." Why would you say that? Well, there's a big clue in the word "blue". In a context like this, it's quite likely that when we're talking about a beautiful blue something, we mean a beautiful blue sky. So the context word that helps us understand the next word is very close to the word that we're interested in.

But what about a sentence like this: "I lived in Ireland, so at school they made me learn how to speak something." How would you finish that sentence? Well, you might say "Irish", but you'd be much more accurate if you said, "I lived in Ireland, so at school they made me learn how to speak Gaelic." First, of course, is the syntactic issue: Irish describes the people, Gaelic describes the language. But more important in the ML context is the key word that gives us the details about the language. That's the word "Ireland", which appears much earlier in the sentence. So if we're only looking at a short sequence of neighboring words, we might lose that context.

With that in mind, an update to RNNs called LSTM, long short-term memory, has been created. In addition to the context being passed along as it is in RNNs, LSTMs have an additional pipeline of context called cell state. This can pass through the network to impact it. It helps keep context from earlier tokens relevant in later ones, so issues like the one that we just discussed can be avoided. Cell states can also be bidirectional, so later contexts can impact earlier ones, as we'll see when we look at the code. The detail about LSTMs is beyond the scope of this course, but you can learn more about them in this video from Andrew.
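
To make the cell state idea a little more concrete, here is a minimal sketch of a single LSTM cell in plain Python, with one scalar input and one scalar state. The weights in `WEIGHTS`, the helper names `lstm_step` and `run`, and the toy sequences are all made up for illustration; they are not trained values and not the course's code. A 1.0 at the start of the sequence stands in for an early clue word like "Ireland", followed by ten unrelated tokens.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights):
    """Run one time step of a scalar LSTM cell.

    weights maps each gate name to a tuple (w_x, w_h, bias).
    Returns the new hidden state h and the new cell state c.
    """
    def pre_activation(name):
        w_x, w_h, bias = weights[name]
        return w_x * x + w_h * h_prev + bias

    forget_gate = sigmoid(pre_activation("forget"))    # how much old cell state to keep
    input_gate = sigmoid(pre_activation("input"))      # how much new information to write
    candidate = math.tanh(pre_activation("candidate"))  # the new information itself
    output_gate = sigmoid(pre_activation("output"))    # how much cell state to expose

    c = forget_gate * c_prev + input_gate * candidate
    h = output_gate * math.tanh(c)
    return h, c


def run(sequence, weights):
    """Feed a whole sequence through the cell, starting from zero state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, weights)
    return h, c


# Hypothetical hand-picked weights (assumption, not learned):
WEIGHTS = {
    "forget": (0.0, 0.0, 6.0),     # forget gate stays close to 1: keep old context
    "input": (8.0, 0.0, -4.0),     # input gate is mostly open at x=1, mostly closed at x=0
    "candidate": (4.0, 0.0, 0.0),  # candidate value follows the input
    "output": (0.0, 0.0, 6.0),     # output gate stays close to 1
}

with_clue = [1.0] + [0.0] * 10   # early clue, then ten unrelated tokens
without_clue = [0.0] * 11        # same length, no clue at all

h_with, c_with = run(with_clue, WEIGHTS)
h_without, c_without = run(without_clue, WEIGHTS)
```

With these assumed weights, the cell state written at the first token is still largely intact eleven steps later, while the sequence without the clue ends with an empty cell state. In practice you would not write this by hand: in Keras the corresponding building blocks are `tf.keras.layers.LSTM` and the `tf.keras.layers.Bidirectional` wrapper, and the gate weights are learned during training rather than chosen.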