So now, we have to split our sequences into
our x's and our y's. To do this, let's grab
the first n tokens, and make them our x's. We'll then get the last token
and make it our label. Before the label becomes a y, there's one more step, and
you'll see that shortly. Python makes this really easy to do with its list syntax. So to get my x's, I just get all of the input sequences sliced
to remove the last token. To get the labels, I get all of the input sequences sliced to keep the last token. Now, I should one-hot encode my labels, as this really is
a classification problem, where given a sequence of words, I can classify from the corpus what the next word
would likely be. So to one-hot encode, I can use the Keras utility to convert a list to a categorical. I simply give it
the list of labels and the number of classes which
is my number of words, and it will create a one-hot
encoding of the labels. So for example, if we consider this list of
tokens as a sentence, then the x is the list
up to the last value, and the label is the last value
which in this case is 70. The y is a one-hot
encoded array whose length is the size of
the corpus of words, and the value that is set
to one is the one at the index of the label, which in this case is index 70. Okay. You now have all of the data ready to train
a network for prediction. Hopefully, this was
useful for you. You'll see the neural network
in the next video. But first, let's see a screencast of processing the data, using the methods that
you saw in this lesson.
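
As a rough sketch of the slicing and one-hot steps described in this lesson: the token IDs, the name `input_sequences`, and the vocabulary size `total_words` below are made up for illustration, and the one-hot step uses a plain NumPy stand-in rather than the Keras utility (`tf.keras.utils.to_categorical(labels, num_classes=total_words)` should give an equivalent array).

```python
import numpy as np

# Hypothetical padded sequences; the token IDs are invented for illustration.
input_sequences = np.array([
    [0, 0, 0, 4, 2],
    [0, 0, 4, 2, 66],
    [0, 4, 2, 66, 8],
    [4, 2, 66, 8, 70],
])
total_words = 100  # assumed number of words (classes) in the corpus

# x's: every sequence with its last token sliced off.
xs = input_sequences[:, :-1]

# Labels: only the last token of every sequence.
labels = input_sequences[:, -1]

# One-hot encode the labels: row i is all zeros except a 1 at index labels[i].
# NumPy stand-in for the Keras utility mentioned in the lesson.
ys = np.eye(total_words, dtype=np.float32)[labels]

print(xs[-1])             # x for the last sequence
print(labels[-1])         # its label
print(ys[-1].argmax())    # index of the 1 in its one-hot y
```

For the last sequence here, the x is everything up to the final value, the label is 70, and the y is a length-100 array with a single 1 at index 70.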