1
00:00:00,000 --> 00:00:03,540
We have a pre-trained
sub-words tokenizer now,

2
00:00:03,540 --> 00:00:04,680
so we can inspect

3
00:00:04,680 --> 00:00:08,490
its vocabulary by looking
at its subwords property.

4
00:00:08,490 --> 00:00:11,790
If we want to see how it
encodes or decodes strings,

5
00:00:11,790 --> 00:00:13,815
we can do so with this code.

6
00:00:13,815 --> 00:00:16,500
So we can encode
simply by calling

7
00:00:16,500 --> 00:00:19,290
the encode method,
passing it the string.

8
00:00:19,290 --> 00:00:23,520
Similarly, decode by
calling the decode method.

9
00:00:23,520 --> 00:00:26,550
We can see the results
of the tokenization when

10
00:00:26,550 --> 00:00:29,640
we print out the encoded
and decoded strings.

11
00:00:29,640 --> 00:00:32,665
If we want to see
the tokens themselves,

12
00:00:32,665 --> 00:00:35,240
we can take each element
and decode that,

13
00:00:35,240 --> 00:00:37,400
showing the value-to-token mapping.

14
00:00:37,400 --> 00:00:41,300
Note that this is
case-sensitive and punctuation

15
00:00:41,300 --> 00:00:43,760
is maintained, unlike
the tokenizer

16
00:00:43,760 --> 00:00:45,395
we saw in the last video.

17
00:00:45,395 --> 00:00:47,960
You don't need to do
anything with them yet,

18
00:00:47,960 --> 00:00:49,160
I just wanted to show you

19
00:00:49,160 --> 00:00:51,745
how sub-word tokenization works.

20
00:00:51,745 --> 00:00:55,325
So now, let's take a look at
classifying IMDB with it.

21
00:00:55,325 --> 00:00:57,780
What are the results going to be?

22
00:00:58,330 --> 00:01:01,160
Here's the model. Again, it

23
00:01:01,160 --> 00:01:03,565
should look very
familiar at this point.

24
00:01:03,565 --> 00:01:05,860
One thing to take
into account though,

25
00:01:05,860 --> 00:01:08,030
is the shape of
the vectors coming from

26
00:01:08,030 --> 00:01:10,450
the tokenizer through
the embedding,

27
00:01:10,450 --> 00:01:12,310
and it's not easily flattened.

28
00:01:12,310 --> 00:01:15,575
So we'll use Global Average
Pooling 1D instead.

29
00:01:15,575 --> 00:01:19,210
Trying to flatten them will
cause a TensorFlow crash.

30
00:01:19,610 --> 00:01:23,165
Here's the output of
the model summary.

31
00:01:23,165 --> 00:01:26,255
You can compile and train
the model like this,

32
00:01:26,255 --> 00:01:28,090
it's pretty standard code.

33
00:01:28,090 --> 00:01:30,350
Training is dealing with

34
00:01:30,350 --> 00:01:32,779
a lot of hyper-parameters
and sub-words,

35
00:01:32,779 --> 00:01:34,610
so expect it to be slow.

36
00:01:34,610 --> 00:01:37,250
Running on a Colab
with GPU took me

37
00:01:37,250 --> 00:01:39,830
about four-and-a-half
minutes per epoch.

38
00:01:39,830 --> 00:01:42,680
So set it off and give
it some time to train.

39
00:01:42,680 --> 00:01:44,145
If your results don't look good,

40
00:01:44,145 --> 00:01:46,750
don't worry, that's
part of the point.

41
00:01:46,850 --> 00:01:50,655
You can graph the results
with this code,

42
00:01:50,655 --> 00:01:54,505
and your graphs will probably
look something like this.

43
00:01:54,505 --> 00:01:58,025
In my case, the accuracy was
barely above 50 percent,

44
00:01:58,025 --> 00:02:00,185
which you could get
with a random guess.

45
00:02:00,185 --> 00:02:02,240
While the loss is decreasing, it's

46
00:02:02,240 --> 00:02:05,160
decreasing in a very small way.

47
00:02:05,390 --> 00:02:08,130
So why do you think
that might be?

48
00:02:08,130 --> 00:02:10,370
Well, the key is in
the fact that we're using

49
00:02:10,370 --> 00:02:12,625
sub-words and not full words,

50
00:02:12,625 --> 00:02:16,160
sub-word meanings are often
nonsensical and it's only

51
00:02:16,160 --> 00:02:17,400
when we put them together in

52
00:02:17,400 --> 00:02:20,125
sequences that they have
meaningful semantics.

53
00:02:20,125 --> 00:02:22,220
Thus, some way of learning from

54
00:02:22,220 --> 00:02:24,830
sequences would be
a great way forward,

55
00:02:24,830 --> 00:02:26,630
and that's exactly what
you're going to do

56
00:02:26,630 --> 00:02:29,910
next week with
recurrent neural networks.