1
00:00:00,000 --> 00:00:03,570
So now, let's take a look at
how to get a prediction for

2
00:00:03,570 --> 00:00:05,010
a word and how to generate

3
00:00:05,010 --> 00:00:07,350
new text based on
those predictions.

4
00:00:07,350 --> 00:00:09,720
So let's start with
a single sentence.

5
00:00:09,720 --> 00:00:12,240
For example, 'Lawrence
went to Dublin.'

6
00:00:12,240 --> 00:00:14,820
I'm calling this
sentence the seed.

7
00:00:14,820 --> 00:00:16,830
If I want to predict
the next 10 words

8
00:00:16,830 --> 00:00:18,765
in the sentence to follow this,

9
00:00:18,765 --> 00:00:21,330
then this code will
tokenize that for

10
00:00:21,330 --> 00:00:25,305
me using the texts_to_sequences
method on the tokenizer.

11
00:00:25,305 --> 00:00:28,340
As we don't have
an out-of-vocabulary word,

12
00:00:28,340 --> 00:00:30,470
it will ignore 'Lawrence,'
which isn't in

13
00:00:30,470 --> 00:00:33,815
the corpus, and we'll get
the following sequence.

14
00:00:33,815 --> 00:00:36,215
This code will then
pad the sequence

15
00:00:36,215 --> 00:00:38,945
so it matches the ones
in the training set.

16
00:00:38,945 --> 00:00:40,895
So we end up with something like

17
00:00:40,895 --> 00:00:43,220
this which we can

18
00:00:43,220 --> 00:00:45,920
pass to the model to
get a prediction back.

19
00:00:45,920 --> 00:00:48,125
This will give us
the token of the word

20
00:00:48,125 --> 00:00:51,430
most likely to be
the next one in the sequence.

21
00:00:51,430 --> 00:00:53,345
So now, we can do

22
00:00:53,345 --> 00:00:56,870
a reverse lookup on the word
index items to turn

23
00:00:56,870 --> 00:00:59,780
the token back into
a word and to add that

24
00:00:59,780 --> 00:01:02,855
to our seed text, and that's it.

25
00:01:02,855 --> 00:01:04,940
Here's the complete
code to do that

26
00:01:04,940 --> 00:01:07,640
10 times and you can
tweak it for more.

27
00:01:07,640 --> 00:01:10,130
But do note that
the more words you predict,

28
00:01:10,130 --> 00:01:12,845
the more likely you are
to get gibberish.

29
00:01:12,845 --> 00:01:15,220
Because each word is predicted,

30
00:01:15,220 --> 00:01:17,250
so it's not 100 per cent certain,

31
00:01:17,250 --> 00:01:19,230
and then the next one
is less certain,

32
00:01:19,230 --> 00:01:21,985
and the next one, etc.

33
00:01:21,985 --> 00:01:23,540
So for example, if you try

34
00:01:23,540 --> 00:01:25,940
the same seed and
predict 100 words,

35
00:01:25,940 --> 00:01:28,770
you'll end up with
something like this.

36
00:01:28,880 --> 00:01:31,755
Using a larger corpus will help,

37
00:01:31,755 --> 00:01:32,920
and in the next video,

38
00:01:32,920 --> 00:01:34,700
you'll see the impact of that,

39
00:01:34,700 --> 00:01:36,110
as well as some tweaks
to the neural

40
00:01:36,110 --> 00:01:38,970
network that will help
you create poetry.