1
00:00:00,470 --> 00:00:01,970
Before you go on to the next unit,

2
00:00:01,970 --> 00:00:05,370
I have created some Colabs
with each of these options.

3
00:00:05,370 --> 00:00:09,220
Try them out for yourself, check on
the time, check on the results, and

4
00:00:09,220 --> 00:00:13,440
see what techniques you can figure
out to avoid some of the overfitting.

5
00:00:13,440 --> 00:00:14,720
Remember that with text,

6
00:00:14,720 --> 00:00:18,770
you'll probably get a bit more overfitting
than you would have done with images.

7
00:00:18,770 --> 00:00:21,591
Not least because you'll
almost always have out

8
00:00:21,591 --> 00:00:24,420
of vocabulary words in
the validation dataset.

9
00:00:24,420 --> 00:00:28,017
That is, words in the validation dataset
that weren't present in the training set,

10
00:00:28,017 --> 00:00:30,100
naturally leading to overfitting.

11
00:00:30,100 --> 00:00:33,060
These words can't be classified and,
of course, you're going to have these

12
00:00:33,060 --> 00:00:35,480
overfitting issues, but
see what you can do to avoid them.