Now, instead of hard-coding the song into a string called data, I can read it from the file like this. I've updated the model a little bit to make it work better with a larger corpus of work, but please feel free to experiment with these hyper-parameters. There are three things that you can experiment with. The first is the dimensionality of the embedding. 100 is purely arbitrary, and I'd love to hear what type of results you get with different values. Similarly, I've increased the number of LSTM units to 150. Again, you can try different values, or you can see how it behaves if you remove the bidirectional wrapper. Perhaps you want words only to have forward meaning, where "big dog" makes sense but "dog big" doesn't make so much sense. Perhaps the biggest impact is on the optimizer. Instead of just hard-coding Adam as my optimizer and getting the defaults, this time I've created my own Adam optimizer and set the learning rate on it. Try experimenting with different values here and see the impact that they have on convergence. In particular, see how different convergences can create different poetry. And of course, training for a different number of epochs will always have an impact. More is generally better, but eventually you'll hit the law of diminishing returns.
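To make those knobs concrete, here is a minimal sketch of how the pieces could fit together in Keras. It is not the exact code shown on screen. The function names `load_corpus` and `build_model`, the step that drops blank lines, the softmax output layer, the loss, and the learning rate of 0.01 are all assumptions made for illustration. Only the embedding dimension of 100, the 150 LSTM units, the bidirectional wrapper, and the explicitly constructed Adam optimizer come from the description above.

```python
from pathlib import Path


def load_corpus(path):
    """Read a lyrics file and return its lines, lower-cased.

    Dropping blank lines is an assumption of this sketch.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.lower().split("\n") if line.strip()]


def build_model(total_words, max_sequence_len,
                embedding_dim=100,    # "purely arbitrary", try other values
                lstm_units=150,       # try other values here as well
                bidirectional=True,   # set to False for a forward-only LSTM
                learning_rate=0.01):  # placeholder value, an assumption
    # TensorFlow is imported here so load_corpus can be used without it.
    import tensorflow as tf

    lstm = tf.keras.layers.LSTM(lstm_units)
    if bidirectional:
        lstm = tf.keras.layers.Bidirectional(lstm)

    model = tf.keras.Sequential([
        # Each training example is a padded sequence minus its final word.
        tf.keras.Input(shape=(max_sequence_len - 1,)),
        tf.keras.layers.Embedding(total_words, embedding_dim),
        lstm,
        # One output per vocabulary word: predict the next word.
        tf.keras.layers.Dense(total_words, activation="softmax"),
    ])

    # Build the optimizer explicitly so the learning rate can be tuned,
    # rather than passing the string "adam" and accepting the defaults.
    model.compile(
        loss="categorical_crossentropy",
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        metrics=["accuracy"],
    )
    return model
```

With something like this in place, each experiment becomes a one-argument change, for example `build_model(total_words, max_sequence_len, bidirectional=False)` to see what a forward-only model produces, or a different `learning_rate` to see how convergence changes. The number of epochs is passed separately to `model.fit`.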