We have a pre-trained sub-words tokenizer now, so we can inspect its vocabulary by looking at its subwords property. If we want to see how it encodes or decodes strings, we can do so with this code. We encode simply by calling the encode method and passing it the string. Similarly, we decode by calling the decode method. We can see the results of the tokenization when we print out the encoded and decoded strings. If we want to see the tokens themselves, we can take each element and decode that, showing each value alongside its token. Note that this is case sensitive and punctuation is maintained, unlike the tokenizer we saw in the last video. You don't need to do anything with them yet, I just wanted to show you how sub-word tokenization works.
So now, let's take a look at classifying IMDB with it. What are the results going to be? Here's the model. Again, it should look very familiar at this point. One thing to take into account, though, is the shape of the vectors coming from the tokenizer through the embedding: it's not easily flattened. So we'll use Global Average Pooling 1D instead. Trying to flatten them will cause a TensorFlow crash. Here's the output of the model summary. You can compile and train the model like this, it's pretty standard code. Training is dealing with a lot of hyper-parameters and sub-words, so expect it to be slow. Running on a Colab with GPU took me about four-and-a-half minutes per epoch. So set it off and give it some time to train.
If your results don't look good, don't worry, that's part of the point. You can graph the results with this code, and your graphs will probably look something like this. In my case, the accuracy was barely above 50 percent, which you could get with a random guess. While the loss is decreasing, it's decreasing in a very small way. So why do you think that might be?
Well, the key is in the fact that we're using sub-words and not full words. Sub-word meanings are often nonsensical, and it's only when we put them together in sequences that they have meaningful semantics. Thus, some way of learning from sequences would be a great way forward, and that's exactly what you're going to do next week with recurrent neural networks.
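To make the point about sequences a little more concrete, here is a minimal sketch in plain Python of what averaging over the sequence does. It is not the model from the video: the embedding table, the token ids, and the helper names (embed, global_average_pool, flatten) are all made up for illustration. It only shows that averaging the embedding vectors gives the same result for two different orderings of the same sub-words, so the order information is lost at that step.

```python
# Toy embedding table: each sub-word id maps to a small vector.
# These ids and values are invented purely for illustration.
EMBEDDINGS = {
    1: [1.0, 0.0],
    2: [0.0, 2.0],
    3: [3.0, 1.0],
}


def embed(token_ids):
    """Look up one embedding vector per token id."""
    return [EMBEDDINGS[i] for i in token_ids]


def global_average_pool(vectors):
    """Average over the sequence axis, leaving one value per embedding dimension."""
    n = len(vectors)
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dims)]


def flatten(vectors):
    """Concatenate all vectors in sequence order into one long list."""
    return [x for v in vectors for x in v]


# The same three sub-words in two different orders.
seq_a = [1, 2, 3]
seq_b = [3, 1, 2]

pooled_a = global_average_pool(embed(seq_a))
pooled_b = global_average_pool(embed(seq_b))

print("pooled a:", pooled_a)
print("pooled b:", pooled_b)
print("same after pooling:", pooled_a == pooled_b)
print("same after flattening:", flatten(embed(seq_a)) == flatten(embed(seq_b)))
```

In this toy setup the pooled vectors are identical while the flattened ones differ, which is one way to see why a layer that learns from the order of the sub-words is a natural next step.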