So that was an example using the IMDB dataset, where the data is provided to you by the tfds API, and I hope you found it helpful. Now I'd like to return to the sarcasm dataset from last week and look at building a classifier for it. We'll start by importing tensorflow and json, as well as the Tokenizer and pad_sequences from preprocessing. Now let's set up our hyperparameters: the vocabulary size, the embedding dimensions, the maximum length of sentences, and a few others such as the training size. This dataset has about 27,000 records, so let's train on 20,000 and validate on the rest. The sarcasm data is stored at this URL, so you can download it to /tmp/sarcasm.json with this code. Now that you have the dataset, you can open it and load it as an iterable with this code. You can create an array for sentences and another for labels, and then iterate through the datastore, loading each headline as a sentence and each is_sarcastic field as your label.
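
The loading and splitting steps described here can be sketched roughly as follows. This is a minimal sketch rather than the exact code shown on screen. It assumes the file is a JSON list of records that each carry a headline and an is_sarcastic field. The hyperparameter values other than the 20,000 training size are placeholder assumptions, and the helper names load_sarcasm and split_train_validation are hypothetical. The download step is left out because the URL isn't given in this passage, so the demo writes a tiny made-up sample file instead.

```python
import json
import os
import tempfile

# Hyperparameters. Only training_size comes from the narration;
# the other values are illustrative assumptions.
vocab_size = 10000
embedding_dim = 16
max_length = 100
training_size = 20000


def load_sarcasm(path):
    """Read the JSON file and return parallel lists of sentences and labels."""
    with open(path, "r") as f:
        datastore = json.load(f)  # a list of dicts, one per headline

    sentences = []
    labels = []
    for item in datastore:
        sentences.append(item["headline"])
        labels.append(item["is_sarcastic"])
    return sentences, labels


def split_train_validation(values, size):
    """Use the first `size` items for training and the rest for validation."""
    return values[:size], values[size:]


# Tiny made-up sample so the sketch runs without downloading anything.
sample_records = [
    {"headline": "first made-up headline", "is_sarcastic": 0},
    {"headline": "second made-up headline", "is_sarcastic": 1},
    {"headline": "third made-up headline", "is_sarcastic": 0},
]
sample_path = os.path.join(tempfile.mkdtemp(), "sarcasm_sample.json")
with open(sample_path, "w") as f:
    json.dump(sample_records, f)

sentences, labels = load_sarcasm(sample_path)

# With the real file you would pass training_size; 2 is used for the sample.
training_sentences, validation_sentences = split_train_validation(sentences, 2)
training_labels, validation_labels = split_train_validation(labels, 2)

print(len(training_sentences), "training,", len(validation_sentences), "validation")
```

With the real file, you would call load_sarcasm("/tmp/sarcasm.json") and split on training_size, which leaves the remaining records for validation.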