1
00:00:00,000 --> 00:00:02,760
So here's the notebook
with the basic data

2
00:00:02,760 --> 00:00:05,640
for a single song in
traditional Irish music.

3
00:00:05,640 --> 00:00:08,175
Later we're going to do
a corpus with lots of songs.

4
00:00:08,175 --> 00:00:09,630
But let's take a look
at what happens with

5
00:00:09,630 --> 00:00:12,270
just a single song just
to keep it very simple.

6
00:00:12,270 --> 00:00:14,010
First of all, I'm
just gonna import

7
00:00:14,010 --> 00:00:15,450
everything that I need,

8
00:00:15,450 --> 00:00:18,600
and now I'm going to tokenize
the data from the song.

9
00:00:18,600 --> 00:00:20,880
We can see the data is
just this one long string

10
00:00:20,880 --> 00:00:22,215
with all of the lyrics.

11
00:00:22,215 --> 00:00:24,720
I split the lines
with just the \n.

12
00:00:24,720 --> 00:00:26,460
So when I'm creating the corpus,

13
00:00:26,460 --> 00:00:29,655
I am just taking that and
splitting it by a \n.

14
00:00:29,655 --> 00:00:32,570
tokenizer.fit_on_texts(corpus)
will then

15
00:00:32,570 --> 00:00:35,350
fit a tokenizer to all
the text that's in here,

16
00:00:35,350 --> 00:00:39,565
and we can see the actual
word index replied all etc.

17
00:00:39,565 --> 00:00:42,080
This number 263 is

18
00:00:42,080 --> 00:00:44,760
the total number of unique
words that are in this corpus.

19
00:00:44,760 --> 00:00:47,055
So it's not a lot
of words of course,

20
00:00:47,055 --> 00:00:49,130
as we start doing
predictions based off

21
00:00:49,130 --> 00:00:51,350
this we're going to have
a very limited dataset.

22
00:00:51,350 --> 00:00:52,700
So we'll see a lot of gibberish

23
00:00:52,700 --> 00:00:56,100
but the structure will
actually work quite well.

24
00:00:56,110 --> 00:01:00,710
So now, I am going to
create my input sequences.

25
00:01:00,710 --> 00:01:01,910
So my training data,

26
00:01:01,910 --> 00:01:03,500
that I'm going to use in

27
00:01:03,500 --> 00:01:06,035
the training of the
neural network itself.

28
00:01:06,035 --> 00:01:08,710
So what I want to do is
that for each sentence,

29
00:01:08,710 --> 00:01:11,250
in the song or in the corpus,

30
00:01:11,250 --> 00:01:13,310
I'm going to take
a look at each of

31
00:01:13,310 --> 00:01:14,840
the phrases within that and

32
00:01:14,840 --> 00:01:16,790
then the word that
actually follows.

33
00:01:16,790 --> 00:01:19,850
So for example, if you look
at the first sentence here in

34
00:01:19,850 --> 00:01:23,515
the song that's in the town
of Athy one Jeremy Lanigan.

35
00:01:23,515 --> 00:01:25,460
When I go down here to look at

36
00:01:25,460 --> 00:01:27,410
the tokenizer's word index for

37
00:01:27,410 --> 00:01:30,215
those, in the town of
Athy one Jeremy Lanigan,

38
00:01:30,215 --> 00:01:31,310
we see that those are four,

39
00:01:31,310 --> 00:01:33,380
two, 66, eight etc.

40
00:01:33,380 --> 00:01:35,030
The important one
to look at here is

41
00:01:35,030 --> 00:01:37,115
Lanigan which is number 70.

42
00:01:37,115 --> 00:01:40,985
So now if I start looking
at my Xs that I've created.

43
00:01:40,985 --> 00:01:44,150
So my training data from
my Xs that I've created.

44
00:01:44,150 --> 00:01:46,280
Here is one sentence
that's in there.

45
00:01:46,280 --> 00:01:49,280
It's 4, 2, 66, 8, 67, 68,

46
00:01:49,280 --> 00:01:53,580
69, which is in the town
of Athy one Jeremy.

47
00:01:53,580 --> 00:01:55,410
So in this case we say when

48
00:01:55,410 --> 00:01:57,970
our training data
looks like this,

49
00:01:57,970 --> 00:02:01,115
we want to label it with
the next word in the sequence.

50
00:02:01,115 --> 00:02:03,590
The next word in
the sequence is 70.

51
00:02:03,590 --> 00:02:06,755
But we've one-hot encoded
that as you can see here,

52
00:02:06,755 --> 00:02:10,025
with
tf.keras.utils.to_categorical.

53
00:02:10,025 --> 00:02:12,380
So by one-hot encoding that,

54
00:02:12,380 --> 00:02:13,700
when we look here we see there's

55
00:02:13,700 --> 00:02:16,435
a one hiding in here somewhere
and it's right there.

56
00:02:16,435 --> 00:02:19,685
That's actually
the 70th element in the list.

57
00:02:19,685 --> 00:02:21,620
So our label for that word is a

58
00:02:21,620 --> 00:02:23,990
one-hot encoding that
says number 70.

59
00:02:23,990 --> 00:02:26,830
So when we train
for a sequence like this,

60
00:02:26,830 --> 00:02:29,480
we're saying this is what
its label will look like.

61
00:02:29,480 --> 00:02:31,850
So again, if I just print these

62
00:02:31,850 --> 00:02:34,110
out I get four, two, 66, eight,

63
00:02:34,110 --> 00:02:36,480
67, 68 it should be followed

64
00:02:36,480 --> 00:02:39,960
by 70, but that's now
one-hot encoded to this.

65
00:02:39,960 --> 00:02:42,810
If I look at my word index again,

66
00:02:42,810 --> 00:02:44,630
if we wanted to look
through this and find

67
00:02:44,630 --> 00:02:47,115
those characters
four, two, 66 etc.

68
00:02:47,115 --> 00:02:48,680
We would see them in here.

69
00:02:48,680 --> 00:02:50,090
So in the town of Athy one

70
00:02:50,090 --> 00:02:52,715
Jeremy Lanigan and
that type of thing.

71
00:02:52,715 --> 00:02:55,715
So that's what my training
data looks like.

72
00:02:55,715 --> 00:02:57,260
So now, I'm going to build

73
00:02:57,260 --> 00:02:59,210
a model to actually
train it with that data.

74
00:02:59,210 --> 00:03:01,610
I'm just going to create
a very simple one.

75
00:03:01,610 --> 00:03:04,135
It's a Sequential. I'm putting
an embedding in there,

76
00:03:04,135 --> 00:03:05,330
I'm feeding the embedding

77
00:03:05,330 --> 00:03:06,830
the total number of words
and I'm just going to

78
00:03:06,830 --> 00:03:09,740
just plot them in 64 dimensions.

79
00:03:09,740 --> 00:03:12,380
I'm going to create
a very simple LSTM,

80
00:03:12,380 --> 00:03:15,755
bidirectional LSTM
with 20 LSTM units

81
00:03:15,755 --> 00:03:18,890
and then I'm going to add
a dense layer between

82
00:03:18,890 --> 00:03:22,430
that and add at the end

83
00:03:22,430 --> 00:03:26,090
for the total number of words,
activated by softmax.

84
00:03:26,090 --> 00:03:29,375
So, there are 263 total words

85
00:03:29,375 --> 00:03:32,840
in the corpus and we're
going to train for those.

86
00:03:32,840 --> 00:03:34,010
So my label, like I said,

87
00:03:34,010 --> 00:03:35,870
is one-hot encoded,
looking like this.

88
00:03:35,870 --> 00:03:37,430
So that will be my last layer.

89
00:03:37,430 --> 00:03:40,025
I'm going to compile
this. It's categorical.

90
00:03:40,025 --> 00:03:41,990
So I'm going to use
categorical cross entropy.

91
00:03:41,990 --> 00:03:44,995
I'm just going to use
the basic Adam as the optimizer.

92
00:03:44,995 --> 00:03:46,910
Because there's
not a lot of data,

93
00:03:46,910 --> 00:03:48,350
I'm going to have to
train it quite a lot

94
00:03:48,350 --> 00:03:49,550
of epochs and you'll see as I

95
00:03:49,550 --> 00:03:50,990
start training that my accuracy

96
00:03:50,990 --> 00:03:52,550
is very low to begin with,

97
00:03:52,550 --> 00:03:55,115
but it will improve over time.

98
00:03:55,115 --> 00:03:58,510
It's like epoch one, I'm only at 0.02,

99
00:03:58,510 --> 00:04:01,075
but I almost double to 0.05.

100
00:04:01,075 --> 00:04:03,070
It sticks at
0.05 for

101
00:04:03,070 --> 00:04:05,285
a while but then
continues to increase.

102
00:04:05,285 --> 00:04:06,820
There's not a lot of data here.

103
00:04:06,820 --> 00:04:09,340
It's only taking about one
second as you can see for

104
00:04:09,340 --> 00:04:14,090
each epoch and it's increasing
steadily epoch by epoch.

105
00:04:14,820 --> 00:04:18,130
It's 500 epochs so it's going
to take a little while.

106
00:04:18,130 --> 00:04:20,300
So I'm just going to pause there.

107
00:04:28,430 --> 00:04:31,390
We can see now that we're
reaching the end of it.

108
00:04:31,390 --> 00:04:34,420
We're around epoch 480 and
the accuracy is up into the

109
00:04:34,420 --> 00:04:38,035
94 to 95 percent range, so
it's looking pretty good.

110
00:04:38,035 --> 00:04:39,920
We actually hit that much earlier

111
00:04:39,920 --> 00:04:41,180
as we'll see when we chart it,

112
00:04:41,180 --> 00:04:44,305
but we end up with
94.7 percent accuracy.

113
00:04:44,305 --> 00:04:48,975
If I chart it out and plot that,

114
00:04:48,975 --> 00:04:51,960
we'll see we kind of hit
that at around 200 epochs.

115
00:04:51,960 --> 00:04:53,900
We probably didn't
need to go all 500

116
00:04:53,900 --> 00:04:56,535
but it's nice to see
it training like that.

117
00:04:56,535 --> 00:04:59,505
So now let's take a look
at predicting words

118
00:04:59,505 --> 00:05:02,975
using the model that
we trained on this.

119
00:05:02,975 --> 00:05:05,750
So if I seed it with this text,
Lawrence went to Dublin,

120
00:05:05,750 --> 00:05:08,000
I am going to ask it
for the next 100 words.

121
00:05:08,000 --> 00:05:09,860
What it's going to do, is

122
00:05:09,860 --> 00:05:12,290
for each of the next 100 words
it's going to create

123
00:05:12,290 --> 00:05:13,310
a token list using

124
00:05:13,310 --> 00:05:16,840
tokenizer.texts_to_sequences
on the seed text.

125
00:05:16,840 --> 00:05:18,860
Then that token list
is going to get

126
00:05:18,860 --> 00:05:21,080
padded to the actual
length that we want.

127
00:05:21,080 --> 00:05:23,270
Then that's going to be
passed into the model.

128
00:05:23,270 --> 00:05:25,325
So we're going to
predict the classes

129
00:05:25,325 --> 00:05:27,260
for the token list that was

130
00:05:27,260 --> 00:05:29,465
generated off of this seed text

131
00:05:29,465 --> 00:05:31,900
and then we'll get
an output word from that.

132
00:05:31,900 --> 00:05:33,950
Then that will be
used to feed into

133
00:05:33,950 --> 00:05:35,240
the next time round to predict

134
00:05:35,240 --> 00:05:36,920
again to get another model.

135
00:05:36,920 --> 00:05:39,050
So when we start with
Lawrence went to Dublin,

136
00:05:39,050 --> 00:05:41,570
it'll get us another word and
that phrase will generate

137
00:05:41,570 --> 00:05:42,830
another word and that phrase will

138
00:05:42,830 --> 00:05:44,900
generate another word etc.

139
00:05:44,900 --> 00:05:46,670
So if I print that out,

140
00:05:46,670 --> 00:05:48,455
we'll get something like this.

141
00:05:48,455 --> 00:05:51,770
Lawrence went to Dublin
a twist of a reel and a jig,

142
00:05:51,770 --> 00:05:55,610
jig gathered gathered them
long new weeks i spent up jig

143
00:05:55,610 --> 00:06:00,175
Dublin might ask ask ask
mother asks jig man again.

144
00:06:00,175 --> 00:06:02,620
We can see what's actually
happening here is that in

145
00:06:02,620 --> 00:06:04,840
the beginning it kind
of looks pretty good.

146
00:06:04,840 --> 00:06:06,715
It's beginning to make sense.

147
00:06:06,715 --> 00:06:09,730
But of course because
our body of text is pretty

148
00:06:09,730 --> 00:06:13,840
small and each prediction
is a probability.

149
00:06:13,840 --> 00:06:16,180
So after the words
Lawrence went to Dublin,

150
00:06:16,180 --> 00:06:18,910
the most probable word
that would come next is A,

151
00:06:18,910 --> 00:06:21,095
but of course it's not
100 percent certainty.

152
00:06:21,095 --> 00:06:23,380
It's a probability and
then the probability

153
00:06:23,380 --> 00:06:25,630
of the next probable word after

154
00:06:25,630 --> 00:06:27,400
Lawrence went to
Dublin A would be

155
00:06:27,400 --> 00:06:30,970
twist, and it keeps going: twist
of a reel and a jig.

156
00:06:30,970 --> 00:06:32,365
But as you can see then,

157
00:06:32,365 --> 00:06:34,570
as you get further and
further and further

158
00:06:34,570 --> 00:06:36,790
then the probabilities are

159
00:06:36,790 --> 00:06:39,850
decreasing and the quality

160
00:06:39,850 --> 00:06:41,830
of the prediction as
a result goes down.

161
00:06:41,830 --> 00:06:43,430
So you end up with for example

162
00:06:43,430 --> 00:06:46,265
repeated words like
jig jig, gathered gathered.

163
00:06:46,265 --> 00:06:48,635
It's kind of fun
if we take a look

164
00:06:48,635 --> 00:06:50,720
at some of the words in the song,

165
00:06:50,720 --> 00:06:55,610
so we could see how they would
deal with the prediction.

166
00:06:55,610 --> 00:06:58,070
So for example, when we say
Lawrence went to Dublin,

167
00:06:58,070 --> 00:06:59,870
a twist of a reel.

168
00:06:59,870 --> 00:07:02,120
So let's take a look
at a twist of a reel.

169
00:07:02,120 --> 00:07:05,270
If we go back to
the original song and

170
00:07:05,270 --> 00:07:06,650
the text of the original song

171
00:07:06,650 --> 00:07:09,570
see if those words
actually exist.

172
00:07:13,490 --> 00:07:16,280
For example, here
I can see within

173
00:07:16,280 --> 00:07:18,140
the song itself it
says, I tipped them

174
00:07:18,140 --> 00:07:19,340
the twist of a reel and

175
00:07:19,340 --> 00:07:22,760
the jig was one of
the lyrics of the song.

176
00:07:22,760 --> 00:07:25,385
If I go back to my prediction,

177
00:07:25,385 --> 00:07:28,550
it gave me, Lawrence went to

178
00:07:28,550 --> 00:07:31,280
Dublin a twist of
a reel and a jig.

179
00:07:31,280 --> 00:07:33,545
So when it saw the word twist,

180
00:07:33,545 --> 00:07:35,120
the next word was almost
always going to be

181
00:07:35,120 --> 00:07:37,310
of. When it sees the word of,

182
00:07:37,310 --> 00:07:39,170
the next word is always
going to be

183
00:07:39,170 --> 00:07:41,240
a, and then reel, and
then and a jig.

184
00:07:41,240 --> 00:07:43,680
So we see that
actually happening.

185
00:07:44,110 --> 00:07:47,435
Coming from these training words.

186
00:07:47,435 --> 00:07:50,285
So that's a very simple example.

187
00:07:50,285 --> 00:07:51,860
In the next lesson,

188
00:07:51,860 --> 00:07:53,810
you are going to be using
a much bigger corpus

189
00:07:53,810 --> 00:07:55,280
of text and hopefully,

190
00:07:55,280 --> 00:07:56,420
the predictions will make

191
00:07:56,420 --> 00:07:59,940
a little bit more sense and
be a little bit more poetic.