Now instead of hard-coding the song
into a string called data, I can read it from the file like this. I've updated the model a little bit to
make it work better with a larger corpus of work, but please feel free to
experiment with these hyperparameters. There are three things that you can experiment with. The first is the dimensionality of
the embedding. The value of 100 is purely arbitrary, and I'd love to hear what type of results
you get with different values. Similarly, I increased the number
of LSTM units to 150. Again, you can try different values, or you can see how it behaves if
you remove the bidirectional wrapper. Perhaps you want words only
to have forward meaning, where "big dog" makes sense but
"dog big" doesn't make so much sense. Perhaps the biggest impact
comes from the optimizer. Instead of just hard-coding Adam as
my optimizer and getting the defaults, this time I've created my own Adam
optimizer and set the learning rate on it. Try experimenting with
different values here and see the impact that they
have on convergence. In particular, see how different
convergence behavior can create different poetry. And of course, training for different
numbers of epochs will always have an impact, with more generally being better, but eventually
you'll hit the law of diminishing returns.
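
To make those knobs concrete, here is a minimal sketch of how the pieces described above might fit together in Keras. It is not the exact code from the lesson. The file name, the two-line placeholder corpus, the plain-Python tokenization, and the values of LEARNING_RATE and EPOCHS are all assumptions made so the example runs on its own. Only the embedding dimensionality of 100, the 150 LSTM units, the bidirectional wrapper, and the explicitly created Adam optimizer come from the discussion.

```python
import tempfile
from pathlib import Path

import numpy as np
import tensorflow as tf

# Hypothetical placeholder corpus written to a temp file so the sketch runs
# end to end; a real run would point lyrics_path at the actual lyrics file.
lyrics_path = Path(tempfile.gettempdir()) / "lyrics_example.txt"
lyrics_path.write_text("big dog runs\nthe big dog barks loudly\n", encoding="utf-8")

# Read the corpus from the file instead of hard-coding it in a string.
data = lyrics_path.read_text(encoding="utf-8")
corpus = [line.split() for line in data.lower().split("\n") if line.strip()]

# Plain-Python word index (an assumed stand-in for a tokenizer); 0 is padding.
vocab = sorted({word for line in corpus for word in line})
word_index = {word: i + 1 for i, word in enumerate(vocab)}
total_words = len(word_index) + 1

# Every prefix of each line (two words or longer) becomes a training sequence.
sequences = [[word_index[w] for w in line[:n]]
             for line in corpus for n in range(2, len(line) + 1)]
max_sequence_len = max(len(seq) for seq in sequences)

# Left-pad with zeros, then split into input words and the next-word label.
padded = np.array([[0] * (max_sequence_len - len(seq)) + seq for seq in sequences])
xs, labels = padded[:, :-1], padded[:, -1]
ys = tf.keras.utils.to_categorical(labels, num_classes=total_words)

# The knobs to experiment with. LEARNING_RATE and EPOCHS are assumed values.
EMBEDDING_DIM = 100
LSTM_UNITS = 150
USE_BIDIRECTIONAL = True
LEARNING_RATE = 0.01
EPOCHS = 5

lstm = tf.keras.layers.LSTM(LSTM_UNITS)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_sequence_len - 1,)),
    tf.keras.layers.Embedding(total_words, EMBEDDING_DIM),
    tf.keras.layers.Bidirectional(lstm) if USE_BIDIRECTIONAL else lstm,
    tf.keras.layers.Dense(total_words, activation="softmax"),
])

# Create the optimizer explicitly so the learning rate can be set.
adam = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)
model.compile(loss="categorical_crossentropy", optimizer=adam, metrics=["accuracy"])
history = model.fit(xs, ys, epochs=EPOCHS, verbose=0)
print(history.history["loss"])
```

Changing one constant at a time and comparing the printed loss values is one simple way to see how each setting affects convergence. Setting USE_BIDIRECTIONAL to False gives the forward-only variant.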