1
00:00:00,000 --> 00:00:02,070
We've seen classification of

2
00:00:02,070 --> 00:00:04,020
text over the last few lessons.

3
00:00:04,020 --> 00:00:07,235
But what if we want
to generate new text?

4
00:00:07,235 --> 00:00:10,065
Now this might sound like
new unbroken ground,

5
00:00:10,065 --> 00:00:11,220
but when you think about it,

6
00:00:11,220 --> 00:00:12,930
you've actually covered
everything that

7
00:00:12,930 --> 00:00:15,135
you need to do this already.

8
00:00:15,135 --> 00:00:17,265
Instead of generating new text,

9
00:00:17,265 --> 00:00:20,565
how about thinking about it
as a prediction problem?

10
00:00:20,565 --> 00:00:22,740
Remember when, for example, you had

11
00:00:22,740 --> 00:00:24,765
a bunch of pixels for a picture,

12
00:00:24,765 --> 00:00:26,310
and you trained a
neural network to

13
00:00:26,310 --> 00:00:28,590
classify what those pixels were,

14
00:00:28,590 --> 00:00:31,005
and it would predict
the contents of the image,

15
00:00:31,005 --> 00:00:32,565
like maybe a fashion item,

16
00:00:32,565 --> 00:00:34,135
or a piece of handwriting.

17
00:00:34,135 --> 00:00:36,945
Well, text prediction
is very similar.

18
00:00:36,945 --> 00:00:38,820
We can get a body of text,

19
00:00:38,820 --> 00:00:41,040
extract the full
vocabulary from it,

20
00:00:41,040 --> 00:00:43,620
and then create
datasets from that,

21
00:00:43,620 --> 00:00:46,520
where we make a
phrase the Xs and

22
00:00:46,520 --> 00:00:49,705
the next word in
that phrase to be the Ys.

23
00:00:49,705 --> 00:00:52,320
For example, consider the phrase,

24
00:00:52,320 --> 00:00:54,100
Twinkle, Twinkle, Little Star.

25
00:00:54,100 --> 00:00:55,550
What if we were to create

26
00:00:55,550 --> 00:00:57,920
training data where
the Xs are Twinkle,

27
00:00:57,920 --> 00:01:01,345
Twinkle, Little,
and the Y is star?

28
00:01:01,345 --> 00:01:04,830
Then, whenever a neural network
sees the words Twinkle,

29
00:01:04,830 --> 00:01:08,965
Twinkle, Little, the predicted
next word would be star.

30
00:01:08,965 --> 00:01:12,140
Thus, given enough words
in a corpus with

31
00:01:12,140 --> 00:01:13,895
a neural network trained on

32
00:01:13,895 --> 00:01:16,145
each of the phrases
in that corpus,

33
00:01:16,145 --> 00:01:17,915
and the predicted next word,

34
00:01:17,915 --> 00:01:19,220
we can come up with some

35
00:01:19,220 --> 00:01:21,635
pretty sophisticated
text generation,

36
00:01:21,635 --> 00:01:24,360
and this week, you'll
look at coding that.
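
The idea described in this lesson, turning a phrase into Xs (the leading words) and a Y (the next word), can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code used later in the course: the helper names `tokenize`, `build_word_index`, and `make_training_pairs` are hypothetical, and generating a pair for every prefix of the phrase (not just the final one) is an assumption about how the dataset might be built.

```python
def tokenize(line):
    """Split a line into lowercase words, dropping surrounding punctuation."""
    words = (w.strip(".,!?;:").lower() for w in line.split())
    return [w for w in words if w]


def build_word_index(corpus):
    """Extract the vocabulary: map each distinct word to an integer id from 1."""
    word_index = {}
    for line in corpus:
        for word in tokenize(line):
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index


def make_training_pairs(corpus, word_index):
    """For every prefix of each line, emit (prefix ids as X, next word id as Y)."""
    pairs = []
    for line in corpus:
        ids = [word_index[w] for w in tokenize(line)]
        for i in range(1, len(ids)):
            pairs.append((ids[:i], ids[i]))
    return pairs


# Assumed toy corpus: the single phrase from the lesson.
corpus = ["Twinkle, Twinkle, Little Star"]
word_index = build_word_index(corpus)
pairs = make_training_pairs(corpus, word_index)

for xs, y in pairs:
    print(xs, "->", y)
```

With this toy corpus, the last pair has the ids for "twinkle twinkle little" as the Xs and the id for "star" as the Y, which matches the example in the lesson. A real pipeline would then pad the Xs to a common length and train a classifier over the vocabulary to predict the Y.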