In the last videos you saw how
to prepare time series data for machine learning by creating
a window dataset where the previous n values could be seen as
the input features, or x, and the current value at any time
step is the output label, or y. It would then look a little bit like this, with a number of input values on x,
typically called a window on the data. You saw in the previous
videos how to use the tools in TensorFlow datasets
to create these windows. This week you'll adapt that code
to feed a neural network and then train it on the data. So let's start with this function,
which we'll call windowed_dataset. It will take in a data series
along with parameters for the size of the window that we want, the size of the batches
to use when training, and the size of the shuffle buffer, which
determines how the data will be shuffled. The first step will be to create a dataset
from the series using tf.data.Dataset, passing the series to
its from_tensor_slices method. We will then use the window
method of the dataset based on our window_size to slice
the data up into the appropriate windows, each one shifted by one time step. We'll keep them all the same size
by setting drop_remainder to True. We then flatten the data out to
make it easier to work with. It will be flattened into chunks
of size window_size + 1. Once it's flattened,
it's easy to shuffle it. You call shuffle and
you pass it the shuffle buffer size. Using a shuffle buffer
speeds things up a bit. So for example, if you have
100,000 items in your dataset but you set the buffer to a thousand, it will just fill the buffer with
the first thousand elements and pick one of them at random. It will then replace
that with the 1,001st element before randomly
picking again, and so on. This way, with very large datasets, the
random selection draws from a smaller pool, which
effectively speeds things up. The shuffled dataset is
then split into the xs, which are all of the elements except the
last, and the y, which is the last element. It's then batched into the selected
batch size and returned.
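
Putting those steps together, the function might look something like the sketch below. This is a minimal reconstruction from the description above, not necessarily the exact code shown on screen. It assumes the window is created with window_size + 1 elements, so that each window holds the window_size inputs plus the one label; the parameter names and the toy series in the usage lines are illustrative.

```python
import numpy as np
import tensorflow as tf


def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    # Create a dataset with one element per value in the series.
    dataset = tf.data.Dataset.from_tensor_slices(series)

    # Slice into windows shifted by one time step. Assumption: each window
    # holds window_size inputs plus one label, hence window_size + 1.
    # drop_remainder=True discards the shorter windows at the end.
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)

    # Each window is itself a small dataset; flatten it into a single
    # tensor of window_size + 1 values.
    dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))

    # Shuffle using a buffer of the given size.
    dataset = dataset.shuffle(shuffle_buffer)

    # Split each window into x (all but the last value) and y (the last value).
    dataset = dataset.map(lambda window: (window[:-1], window[-1]))

    # Group the (x, y) pairs into batches.
    return dataset.batch(batch_size)


# Usage on a toy series (illustrative values only).
series = np.arange(10, dtype=np.float32)
for x, y in windowed_dataset(series, window_size=4, batch_size=2, shuffle_buffer=10):
    print(x.numpy(), y.numpy())
```

Because the toy series simply counts upward, each label printed should be one more than the last value in its window, which is a quick way to check that the split into x and y is doing what you expect.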