1
00:00:00,000 --> 00:00:02,775
Think about loss in this context,

2
00:00:02,775 --> 00:00:05,100
as a confidence in
the prediction.

3
00:00:05,100 --> 00:00:06,405
So while the number of

4
00:00:06,405 --> 00:00:09,240
accurate predictions
increased over time,

5
00:00:09,240 --> 00:00:10,980
what was interesting was that

6
00:00:10,980 --> 00:00:14,505
the confidence per prediction
effectively decreased.

7
00:00:14,505 --> 00:00:17,415
You may find this happening
a lot with text data.

8
00:00:17,415 --> 00:00:19,770
So it's very important
to keep an eye on it.

9
00:00:19,770 --> 00:00:21,770
One way to do this is to explore

10
00:00:21,770 --> 00:00:25,190
the differences as you
tweak the hyperparameters.

11
00:00:25,190 --> 00:00:29,390
So for example, if you
consider these changes,

12
00:00:29,390 --> 00:00:31,790
a decrease in vocabulary size,

13
00:00:31,790 --> 00:00:33,770
and taking shorter sentences,

14
00:00:33,770 --> 00:00:35,765
reducing the
likelihood of padding,

15
00:00:35,765 --> 00:00:40,345
and then rerun, you may
see results like this.

16
00:00:40,345 --> 00:00:41,810
Here, you can see
that the loss has

17
00:00:41,810 --> 00:00:43,790
flattened out, which looks good,

18
00:00:43,790 --> 00:00:45,710
but of course,
your accuracy is not

19
00:00:45,710 --> 00:00:48,290
as high. Another tweak.

20
00:00:48,290 --> 00:00:49,880
Changing the number of dimensions

21
00:00:49,880 --> 00:00:51,680
used in the embedding
was also tried.

22
00:00:51,680 --> 00:00:55,475
Here, we can see that that
made very little difference.

23
00:00:55,475 --> 00:00:58,070
Putting the hyperparameters as

24
00:00:58,070 --> 00:00:59,840
separate variables like this is

25
00:00:59,840 --> 00:01:01,925
a useful programming exercise,

26
00:01:01,925 --> 00:01:03,560
making it much easier for you to

27
00:01:03,560 --> 00:01:06,065
tweak and explore
their impact on training.

28
00:01:06,065 --> 00:01:08,150
Keep working on them
and see if you can

29
00:01:08,150 --> 00:01:10,220
find any combinations that give

30
00:01:10,220 --> 00:01:12,454
a 90 percent plus
training accuracy

31
00:01:12,454 --> 00:01:15,935
without the cost of the loss
function increasing sharply.

32
00:01:15,935 --> 00:01:17,300
In the next video,

33
00:01:17,300 --> 00:01:19,970
we'll also look at the impact
of splitting our words into

34
00:01:19,970 --> 00:01:24,290
sub-tokens and how that
might impact your training.
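
Note: the idea of keeping the hyperparameters as separate variables might be sketched roughly as below. This is a minimal illustration, not the code shown in the video: the names `HyperParams`, `pad_or_truncate`, `padding_fraction`, and `candidate_configs` are hypothetical, and every numeric value is a placeholder. It only shows how a shorter maximum length reduces padding and how combinations to try can be listed; training the model itself is left out.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class HyperParams:
    # Hypothetical container; the three fields mirror the knobs named in the talk.
    vocab_size: int
    embedding_dim: int
    max_length: int


def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Clip a sequence to max_length, then right-pad it with pad_id."""
    clipped = token_ids[:max_length]
    return clipped + [pad_id] * (max_length - len(clipped))


def padding_fraction(sequences, max_length):
    """Fraction of all positions that would be padding at this max_length."""
    pads = sum(max_length - min(len(s), max_length) for s in sequences)
    return pads / (len(sequences) * max_length)


def candidate_configs(vocab_sizes, embedding_dims, max_lengths):
    """List every combination of the given hyperparameter values."""
    return [
        HyperParams(v, e, m)
        for v, e, m in product(vocab_sizes, embedding_dims, max_lengths)
    ]


if __name__ == "__main__":
    # Placeholder token sequences of different lengths.
    sequences = [[1, 2, 3], [4, 5, 6, 7, 8, 9, 10, 11]]
    for cfg in candidate_configs([1000, 10000], [16, 32], [4, 8]):
        print(cfg, padding_fraction(sequences, cfg.max_length))
```

Each configuration printed here would then be passed to whatever model-building and training code you are using, so the loss and accuracy curves can be compared side by side.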