To split the corpus into training and validation sets, we'll use this code. To get the training set, you take array items from zero to the training size, and to get the testing set, you go from the training size to the end of the array. To get the training and testing labels, you'll use similar code to slice the labels array.

Now that we have training and test sets of sentences and labels, it's time to turn them into sequences and pad those sequences, which you'll do with this code. You start with a tokenizer, passing it the number of words you want to tokenize on and the desired out-of-vocabulary token. Then you fit that on the training set by calling fit_on_texts, passing it the training sentences array. Then you can use texts_to_sequences to create the training sequences, replacing the words with their tokens. Then you can pad the training sequences to the desired length, or truncate them if they're too long. Next, you'll do the same but with the test set.

Now, we can create our neural network in the usual way. We'll compile it with binary cross entropy, as we're classifying into two different classes. When we call the model's summary, we'll see that it looks like this, pretty much as we'd expect. It's pretty simple: an embedding feeds into an average pooling layer, which then feeds our DNN. To train for 30 epochs, you pass in the padded data and labels. If you want to validate, you'll give it the testing padded data and labels too.

After training for a little while, you can plot the results. Here's the code for a simple plot. We can see the accuracy increased nicely as we trained, and the validation accuracy was okay, but not great. What's interesting is the loss values on the right: the training loss fell, but the validation loss increased. Well, why might that be?
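
As a rough illustration of the split, tokenize, and pad steps described earlier, here is a dependency-free sketch in plain Python. It is a stand-in for what the tokenizer and padding utilities do, not the library's implementation. The toy sentences, the values of training_size, vocab_size, and max_length, and the helper names fit_word_index, to_sequences, and pad_post are all assumptions made up for this example. It also assumes padding and truncation happen at the end of each sequence.

```python
from collections import Counter

# Toy data; the sentences, labels, and sizes below are illustrative assumptions.
sentences = [
    "the cat sat on the mat",
    "the dog ate my homework",
    "a cat chased the dog",
    "birds sing at dawn",
]
labels = [0, 1, 0, 1]

training_size = 3   # hypothetical split point
vocab_size = 8      # hypothetical cap: token ids run from 0 to vocab_size - 1
max_length = 5      # hypothetical padded length
oov_token = "<OOV>"

# Split by slicing: items [0, training_size) for training, the rest for testing.
training_sentences = sentences[:training_size]
testing_sentences = sentences[training_size:]
training_labels = labels[:training_size]
testing_labels = labels[training_size:]


def fit_word_index(texts, num_words, oov):
    """Build a word -> id map from the given texts.

    Id 0 is reserved for padding and id 1 for the OOV token; the remaining
    ids go to the most frequent words.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    word_index = {oov: 1}
    for word, _ in counts.most_common(num_words - 2):
        word_index[word] = len(word_index) + 1
    return word_index


def to_sequences(texts, word_index, oov):
    """Replace each word with its id, falling back to the OOV id."""
    oov_id = word_index[oov]
    return [[word_index.get(w, oov_id) for w in t.lower().split()] for t in texts]


def pad_post(sequences, maxlen):
    """Truncate long sequences and zero-pad short ones, both at the end."""
    return [(seq[:maxlen] + [0] * maxlen)[:maxlen] for seq in sequences]


# Fit on the training sentences only, so the test set stays unseen.
word_index = fit_word_index(training_sentences, vocab_size, oov_token)

training_padded = pad_post(
    to_sequences(training_sentences, word_index, oov_token), max_length
)
testing_padded = pad_post(
    to_sequences(testing_sentences, word_index, oov_token), max_length
)

print(training_padded)
print(testing_padded)
```

Because the word index is built from the training sentences alone, every word in the test sentence here is new to it and maps to the out-of-vocabulary id. A test set full of words the tokenizer never saw is one thing to keep in mind when thinking about the validation loss question.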