Think about loss in this context as a measure of confidence in the prediction. So while the number of accurate predictions increased over time, what was interesting was that the confidence per prediction effectively decreased. You may find this happening a lot with text data, so it's very important to keep an eye on it. One way to do this is to explore the differences as you tweak the hyperparameters. So for example, if you consider these changes, a smaller vocabulary size and shorter sentences, which reduce the likelihood of padding, and then rerun, you may see results like this. Here, you can see that the loss has flattened out, which looks good, but of course, your accuracy is not as high. Another tweak that was tried was changing the number of dimensions used in the embedding. Here, we can see that it made very little difference. Putting the hyperparameters in separate variables like this is a useful programming practice, making it much easier for you to tweak them and explore their impact on training. Keep working on them and see if you can find any combinations that give a 90 percent plus training accuracy without the cost of the loss function increasing sharply. In the next video, we'll also look at splitting our words into sub-tokens and how that might impact your training.
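To make the idea of keeping hyperparameters in separate variables concrete, here is a minimal sketch in plain Python. It is not the code from the video: the variable names, the values, the sample sentences, and the helper functions `build_word_index`, `encode_and_pad`, and `padding_fraction` are all assumptions for illustration. It shows how the vocabulary size and the maximum sentence length affect how much padding you end up with, and how the vocabulary size and the embedding dimensions together set the size of the embedding table.

```python
from collections import Counter

# Hyperparameters kept as separate variables so each can be tweaked on its own.
# Names and values are illustrative assumptions, not the ones used in the video.
vocab_size = 8       # total indices: 0 = padding, 1 = out-of-vocabulary, rest = words
embedding_dim = 16   # number of dimensions per word vector
max_length = 5       # every sentence is truncated or padded to this many tokens
oov_token = "<OOV>"

# Hypothetical sample data.
sentences = [
    "I love my dog",
    "I love my cat",
    "Do you think my dog is amazing",
]


def build_word_index(texts, vocab_size, oov_token):
    """Map the most frequent words to indices 2..vocab_size-1; index 1 is OOV."""
    counts = Counter(word for text in texts for word in text.lower().split())
    word_index = {oov_token: 1}
    for word, _ in counts.most_common(vocab_size - 2):
        word_index[word] = len(word_index) + 1
    return word_index


def encode_and_pad(texts, word_index, max_length):
    """Turn each sentence into exactly max_length indices (truncate, then pad with 0)."""
    padded = []
    for text in texts:
        ids = [word_index.get(word, 1) for word in text.lower().split()]
        ids = ids[:max_length]
        ids = ids + [0] * (max_length - len(ids))
        padded.append(ids)
    return padded


def padding_fraction(padded):
    """Fraction of all positions that are padding (index 0)."""
    total = sum(len(row) for row in padded)
    zeros = sum(row.count(0) for row in padded)
    return zeros / total


word_index = build_word_index(sentences, vocab_size, oov_token)
padded = encode_and_pad(sentences, word_index, max_length)

# A standard embedding table holds one vector of embedding_dim values per index.
embedding_params = vocab_size * embedding_dim

print("padded sequences:", padded)
print("padding fraction:", padding_fraction(padded))
print("embedding parameters:", embedding_params)
```

Try changing `max_length` from 5 to 4 and rerunning. Every sample sentence has at least four words, so the padding should disappear, at the cost of cutting off the end of the longest sentence. That is the same kind of trade-off you are exploring when you tweak these values on the real dataset.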