Earlier this week, we looked at using TensorFlow Data Services, or TFDS to load the reviews from the IMDb dataset and perform classification on them. In that video, you loaded the raw text for the reviews, and tokenized them yourself. However, often with prepackaged datasets like these, some data scientists have done the work for you already, and the IMDb dataset is no exception. In this video, we'll take a look at a version of the IMDb dataset that has been pre-tokenized for you, but the tokenization is done on sub words. We'll use that to demonstrate how text classification can have some unique issues, namely that the sequence of words can be just as important as their existence.