This code is very similar to what you saw in the earlier videos, but let's look at it line by line. We've just created a sentences list from the headlines in the sarcasm data set. So by calling tokenizer.fit_on_texts, we'll initialize the tokenizer and generate the word index. We can see the word index as before by reading the word_index property. Note that this returns all of the words that the tokenizer saw when tokenizing the sentences. If you specified num_words to get the top 1000 or whatever, you may be confused by seeing something greater than that here. It's an easy mistake to make, but the key thing to remember is that the limit you specified is applied in the texts_to_sequences process, not when the word index is built.

Our word index is much larger than in the previous example, so we'll see a greater variety of words in it. Here are a few.

Now we'll create the sequences from the text, as well as padding them. Here's the code to do that. It's very similar to what you did earlier, and here's the output. First, I took the first headline in the data set and showed its output. We can see that each word in the sentence has been replaced by its value from the word index. This is the size of the padded matrix. We had 26,709 sentences, and they were encoded with padding to get them up to 40 words long, which was the length of the longest one. You could truncate this if you like, but I'll keep it at 40.

That's it for processing the sarcasm data set. Let's take a look at that in action in a screencast.
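To make the num_words point concrete, here is a minimal pure-Python sketch of the idea. It is not the Keras code from the video: the helper names (build_word_index, to_sequences, pad), the toy sentences, and the "<OOV>" token are all made up for illustration. It also assumes that only words whose index is below num_words are kept, and that padding zeros go at the end of each sequence; the real tokenizer and padding settings may differ.

```python
from collections import Counter

OOV = "<OOV>"  # hypothetical out-of-vocabulary token, reserved as index 1


def build_word_index(sentences):
    """Index every word seen, most frequent first (no num_words limit here)."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    word_index = {OOV: 1}
    for word, _ in counts.most_common():
        word_index[word] = len(word_index) + 1
    return word_index


def to_sequences(sentences, word_index, num_words=None):
    """Encode sentences; the num_words limit is applied only at this step."""
    sequences = []
    for s in sentences:
        seq = []
        for w in s.lower().split():
            idx = word_index.get(w, word_index[OOV])
            # Assumption: indices at or above num_words map to the OOV index.
            if num_words is not None and idx >= num_words:
                idx = word_index[OOV]
            seq.append(idx)
        sequences.append(seq)
    return sequences


def pad(sequences):
    """Pad with trailing zeros up to the length of the longest sequence."""
    maxlen = max(len(s) for s in sequences)
    return [s + [0] * (maxlen - len(s)) for s in sequences]


# Toy stand-ins for the headlines (not real data from the sarcasm set).
sentences = ["the cat sat on the mat", "the dog sat", "a cat and a dog"]

word_index = build_word_index(sentences)
sequences = to_sequences(sentences, word_index, num_words=5)
padded = pad(sequences)

print(len(word_index))               # full vocabulary, larger than num_words
print(padded[0])                     # first sentence, encoded and padded
print(len(padded), len(padded[0]))   # shape of the padded matrix
```

The word index holds every word that was seen, even though num_words is 5. The limit only shows up in the encoded sequences, where the less frequent words fall back to the OOV index.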