So what do we learn from this? First of all, we really need a lot of training data to get a broad vocabulary, or we could end up with sentences like "my dog my", as we just did. Secondly, in many cases it's a good idea, instead of just ignoring unseen words, to put in a special value whenever an unseen word is encountered. You can do this with a property on the tokenizer. Let's take a look. Here's the complete code showing both the original sentences and the test data. What I've changed is to add a property, oov_token, to the tokenizer constructor. You can see now that I've specified that I want the token OOV, for "out of vocabulary", to be used for words that aren't in the word index. You can use whatever you like here, but remember that it should be something unique and distinct that can't be confused with a real word. So now, if I run this code, I'll get my test sequences looking like this. I've pasted the word index underneath so you can look it up. The first sentence will be "i OOV love my dog". The second will be "my dog OOV my OOV". Still not syntactically great, but it is doing better. As the corpus grows and more words are in the index, hopefully previously unseen sentences will have better coverage.

Next up is padding. As we mentioned earlier, when we were building neural networks to handle pictures and fed them into the network for training, we needed them to be uniform in size. Often, we used generators to resize the images to fit, for example. With text you'll face a similar requirement: before you can train on it, you need some level of uniformity of size, so padding is your friend there.
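To make the out-of-vocabulary idea concrete, here is a minimal sketch in plain Python of what a tokenizer does with and without an OOV token. It is not the tokenizer's actual implementation, just an illustration of the behavior described above. The helper names (`tokenize`, `build_word_index`, `to_sequences`), the training and test sentences, and the literal string `<OOV>` are all assumptions made for the example.

```python
import re
from collections import Counter

OOV = "<OOV>"  # assumed placeholder; any string that cannot be a real word works


def tokenize(sentence):
    # Lowercase the text and keep runs of letters/apostrophes; punctuation is dropped.
    return re.findall(r"[a-z']+", sentence.lower())


def build_word_index(sentences, oov_token=None):
    # Count every word, then number words from most to least frequent.
    counts = Counter(word for s in sentences for word in tokenize(s))
    index = {}
    if oov_token is not None:
        index[oov_token] = 1  # reserve the first slot for unseen words
    for word, _ in counts.most_common():
        index[word] = len(index) + 1
    return index


def to_sequences(sentences, word_index, oov_token=None):
    oov_id = word_index.get(oov_token) if oov_token is not None else None
    sequences = []
    for s in sentences:
        ids = []
        for word in tokenize(s):
            if word in word_index:
                ids.append(word_index[word])
            elif oov_id is not None:
                ids.append(oov_id)  # unseen word becomes the OOV id
            # otherwise the unseen word is silently skipped
        sequences.append(ids)
    return sequences


# Hypothetical training and test sentences, chosen only for illustration.
train = [
    "I love my dog",
    "I love my cat",
    "You love my dog!",
    "Do you think my dog is amazing?",
]
test = ["i really love my dog", "my dog loves my manatee"]

plain_index = build_word_index(train)
without_oov = to_sequences(test, plain_index)

oov_index = build_word_index(train, oov_token=OOV)
with_oov = to_sequences(test, oov_index, oov_token=OOV)

print(without_oov)  # [[4, 2, 1, 3], [1, 3, 1]]
print(with_oov)     # [[5, 1, 3, 2, 4], [2, 4, 1, 2, 1]]
```

Without the OOV token, the second test sentence collapses to three ids, which read back as "my dog my". With it, both sequences keep the length of the original sentences, and every unseen word maps to the id 1.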