Think of loss, in this context, as a measure of confidence in
the prediction: the higher the loss, the lower the confidence. So while the number of accurate predictions
increased over time, what was interesting was that the confidence per prediction
effectively decreased. You may find this happens
a lot with text data, so it's very important
to keep an eye on it. One way to do this is to explore the differences as you
tweak the hyperparameters. For example, if you
make these changes (a smaller vocabulary size and shorter sentences, which reduces the
likelihood of padding) and then rerun, you may
see results like this. Here, you can see
that the loss has flattened out, which looks good, but of course,
your accuracy is not as high. As another tweak, changing the number of dimensions used in the embedding
was also tried. Here, we can see that this
made very little difference. Putting the hyperparameters in separate variables like this is a useful programming exercise, making it much easier for you to tweak them and explore
their impact on training. Keep working on them
and see if you can find any combination that gives 90 percent plus
training accuracy without the cost of the loss
function increasing sharply. In the next video, we'll also look at the impact
of splitting our words into sub-tokens
on your training.
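
To make the earlier point about accuracy and loss concrete, here is a minimal sketch in plain Python. The labels and predicted probabilities are made-up numbers, not results from the model in this video, and the helper names `binary_cross_entropy` and `accuracy` are just for illustration. It shows how a later epoch can get more predictions right while its loss goes up, because each prediction is made with less confidence.

```python
import math


def binary_cross_entropy(labels, probs):
    """Mean binary cross-entropy over a batch of predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        # The penalty grows as less probability is placed on the true label.
        total += -math.log(p if y == 1 else 1.0 - p)
    return total / len(labels)


def accuracy(labels, probs, threshold=0.5):
    """Fraction of predictions that fall on the correct side of the threshold."""
    hits = sum(int((p >= threshold) == (y == 1)) for y, p in zip(labels, probs))
    return hits / len(labels)


labels = [1, 0, 1, 0]

# Hypothetical early epoch: two confident hits and two misses.
early = [0.9, 0.1, 0.4, 0.6]

# Hypothetical later epoch: every prediction is right, but only just.
later = [0.55, 0.45, 0.55, 0.45]

for name, probs in (("early", early), ("later", later)):
    print(name, "accuracy:", accuracy(labels, probs),
          "loss:", round(binary_cross_entropy(labels, probs), 3))
```

In this toy case accuracy rises while loss also rises, which is the same pattern worth watching for in your own training curves.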
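
The idea of keeping hyperparameters in separate variables can be sketched like this. The values below, the layer choices, and the names `embedding_parameter_count` and `build_model` are assumptions for illustration, not necessarily the exact code shown on screen. `build_model` assumes TensorFlow is installed and imports it only when called.

```python
# Hyperparameters gathered in one place (illustrative values, not from the video).
vocab_size = 1000     # number of distinct words the tokenizer keeps
embedding_dim = 16    # number of dimensions in each word vector
max_length = 16       # sentences are truncated or padded to this many tokens
num_epochs = 30       # number of passes over the training data


def embedding_parameter_count():
    """Weights in the embedding layer: one vector per vocabulary entry."""
    return vocab_size * embedding_dim


def build_model():
    """Build a small binary text classifier from the variables above."""
    import tensorflow as tf  # imported here so the settings load without TensorFlow

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(24, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model


print("embedding weights:", embedding_parameter_count())
```

With this layout, trying a new combination means changing one or two numbers at the top and rerunning, rather than hunting for values scattered through the code.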