1
00:00:00,874 --> 00:00:04,583
Now instead of hard-coding the song
into a string called data,

2
00:00:04,583 --> 00:00:06,723
I can read it from the file like this.

3
00:00:06,723 --> 00:00:11,143
I've updated the model a little bit to
make it work better with a larger corpus

4
00:00:11,143 --> 00:00:15,433
of work, but please feel free to
experiment with these hyperparameters.

5
00:00:15,433 --> 00:00:17,233
There are three things that you can experiment with.

6
00:00:17,233 --> 00:00:21,592
First is the dimensionality of
the embedding. 100 is purely arbitrary, and

7
00:00:21,592 --> 00:00:25,579
I'd love to hear what type of results
you will get with different values.

8
00:00:26,860 --> 00:00:30,720
Similarly, I increased the number
of LSTM units to 150.

9
00:00:30,720 --> 00:00:33,700
Again, you can try different values or

10
00:00:33,700 --> 00:00:37,290
you can see how it behaves if
you remove the Bidirectional wrapper.

11
00:00:37,290 --> 00:00:41,080
Perhaps you want words only
to have forward meaning,

12
00:00:41,080 --> 00:00:45,720
where "big dog" makes sense but
"dog big" doesn't make as much sense.

13
00:00:45,720 --> 00:00:48,897
Perhaps the biggest impact
is on the optimizer.

14
00:00:48,897 --> 00:00:53,479
Instead of just hard-coding Adam as
my optimizer this time and getting

15
00:00:53,479 --> 00:00:59,255
the defaults, I've now created my own Adam
optimizer and set the learning rate on it.

16
00:00:59,255 --> 00:01:01,838
Try experimenting with
different values here and

17
00:01:01,838 --> 00:01:04,244
see the impact that they
have on convergence.

18
00:01:04,244 --> 00:01:09,969
In particular, see how different
convergences can create different poetry.

19
00:01:09,969 --> 00:01:14,780
And of course, training for a different
number of epochs will always have an impact, with

20
00:01:14,780 --> 00:01:20,345
more generally being better, but eventually
you'll hit the law of diminishing returns.
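
The setup described in this video could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the code shown on screen. Only the embedding size of 100, the 150 LSTM units, the bidirectional wrapper, and the custom Adam optimizer come from the narration. The names `load_corpus` and `build_model`, the lowercasing and line splitting, the loss function, and the default learning rate of 0.01 are illustrative assumptions. TensorFlow is imported inside `build_model` so that the file-reading part runs without it.

```python
def load_corpus(path):
    """Read the lyrics from a text file instead of a hard-coded string.

    Assumption: one lyric line per text line, lowercased, blank lines dropped.
    """
    with open(path, encoding="utf-8") as f:
        data = f.read()
    return [line.strip() for line in data.lower().split("\n") if line.strip()]


def build_model(total_words, embedding_dim=100, lstm_units=150,
                bidirectional=True, learning_rate=0.01):
    """Build a next-word prediction model with the three tunable pieces.

    embedding_dim and lstm_units default to the values named in the video.
    learning_rate=0.01 is an assumed placeholder, not a value from the video.
    """
    import tensorflow as tf  # imported lazily; only needed for the model

    lstm = tf.keras.layers.LSTM(lstm_units)
    if bidirectional:
        # Wrap the LSTM so the sequence is read forwards and backwards.
        lstm = tf.keras.layers.Bidirectional(lstm)

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(total_words, embedding_dim),
        lstm,
        tf.keras.layers.Dense(total_words, activation="softmax"),
    ])

    # Create the optimizer explicitly so the learning rate can be changed,
    # rather than passing the string "adam" and accepting the defaults.
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

    # Assumption: labels are one-hot encoded next words.
    model.compile(loss="categorical_crossentropy",
                  optimizer=optimizer,
                  metrics=["accuracy"])
    return model
```

To try the experiments suggested above, one could call `build_model` with a different `embedding_dim`, `lstm_units`, or `learning_rate`, or with `bidirectional=False`, and then compare how quickly accuracy converges and what text each variant generates.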