So we'll start by looking at TensorFlow Datasets, which you can find at this URL. If you look at the IMDB reviews dataset, you'll see that there are a bunch of versions that you can use. These include "plain_text", which we used in the last video, "bytes", where the text is encoded at byte level, and subword encoding, which we'll look at in this video.

One thing to note is that you should use TensorFlow 2.0 for the code I'll be sharing here, as there are some inconsistencies with version 1.x. So if you're using Colab, you should first print out the TF version. If it is 1.x, you should install TensorFlow 2 like this. Note that over time the alpha0 in the version name will change to later versions, so I would recommend that you look up the latest install guide for TensorFlow 2.0 if you hit any issues. I'd recommend running the version check again to ensure that you are on version 2 before going any further, particularly if you're using a Colab or a Jupyter notebook.

Once you're on TensorFlow 2, you can start using the IMDB subwords dataset. We'll use the 8k version today. Getting access to your training and test data is then as easy as this. Next, if you want to access the subwords tokenizer, you can do it with this code. You can learn all about the subwords text encoder at this URL.
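For anyone following along without the slides, here is a rough sketch of what those steps might look like in Python. It is not the exact code from the video. The helper names `is_tf2` and `load_imdb_subwords8k` are made up for illustration, and the sketch assumes that the `tensorflow` and `tensorflow_datasets` packages are installed and that your `tensorflow_datasets` release still offers the `imdb_reviews/subwords8k` configuration, which may not be the case in newer releases.

```python
def is_tf2(version: str) -> bool:
    """Return True when a version string such as '2.0.0-alpha0' has a major version of 2 or higher."""
    major = version.split(".")[0]
    return major.isdigit() and int(major) >= 2


def load_imdb_subwords8k():
    """Hypothetical helper: return (train_data, test_data, tokenizer) for the 8k subwords IMDB data."""
    # Imported here so that is_tf2 above can be used without these packages installed.
    import tensorflow as tf
    import tensorflow_datasets as tfds

    # Print the version first. On 1.x, install TensorFlow 2 and check again.
    # At the time of the video that was along the lines of
    #   pip install tensorflow==2.0.0-alpha0
    # but the exact version changes, so check the current install guide.
    print(tf.__version__)
    if not is_tf2(tf.__version__):
        raise RuntimeError(f"TensorFlow 2.x expected, found {tf.__version__}")

    # Assumption: the "imdb_reviews/subwords8k" config exists in the installed release.
    imdb, info = tfds.load("imdb_reviews/subwords8k", with_info=True, as_supervised=True)

    # Training and test splits come back keyed by split name.
    train_data, test_data = imdb["train"], imdb["test"]

    # The subwords tokenizer is attached to the "text" feature of the dataset info.
    tokenizer = info.features["text"].encoder
    return train_data, test_data, tokenizer
```

Calling `load_imdb_subwords8k()` downloads the dataset on first use, so it is kept inside a function rather than run at import time. The version check on its own needs nothing installed, for example `is_tf2("1.15.0")` is False and `is_tf2("2.0.0-alpha0")` is True.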