So consider this RNN,
which has two recurrent layers, and the first has
return_sequences=True set. It will output a sequence, which
is fed to the next layer. The next layer does not have
return_sequences set to True, so it will only output the final step. But notice the input_shape,
it's set to None and 1. TensorFlow assumes that the first
dimension is the batch size, and that it can have any size at all,
so you don't need to define it. Then the next dimension is the number of
time steps, which we can set to None, which means that the RNN can
handle sequences of any length. The last dimension is just one because
we're using a univariate time series. If we set return_sequences to True on
all recurrent layers, then they will all output sequences and the dense layer
will get a sequence as its input. Keras handles this by using the same dense
layer independently at each time step. It might look like multiple ones here but it's the same one that's being
reused at each time step. This gives us what is called
a sequence-to-sequence RNN. It's fed a batch of sequences and it returns a batch of
sequences of the same length. The dimensionality may not always match. It depends on the number of
units in the memory cell. So let's now return to a two-layer
RNN where the second layer does not return sequences. This will give us a single output
that is fed to a single dense layer.
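
To make the shapes concrete, here is a minimal sketch of the two setups just described. The 20 units per layer, the batch of 4, and the sequence lengths of 30 and 50 are assumptions picked only for illustration, and tf.keras.Input(shape=(None, 1)) is used here to play the role of the input_shape mentioned above.

```python
import numpy as np
import tensorflow as tf

UNITS = 20  # assumed number of units per recurrent layer (illustrative only)

# Two recurrent layers where only the first returns sequences, so the
# dense layer receives just the output of the final time step.
seq_to_vec = tf.keras.Sequential([
    # Shape is (time steps, features); the batch dimension is implicit.
    # None means sequences of any length are accepted.
    tf.keras.Input(shape=(None, 1)),
    tf.keras.layers.SimpleRNN(UNITS, return_sequences=True),
    tf.keras.layers.SimpleRNN(UNITS),
    tf.keras.layers.Dense(1),
])

# Every recurrent layer returns sequences, so the same dense layer is
# applied at each time step and the model outputs a whole sequence.
seq_to_seq = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 1)),
    tf.keras.layers.SimpleRNN(UNITS, return_sequences=True),
    tf.keras.layers.SimpleRNN(UNITS, return_sequences=True),
    tf.keras.layers.Dense(1),
])

# A dummy batch: 4 univariate series, 30 time steps each.
batch = np.zeros((4, 30, 1), dtype="float32")

vec_out = seq_to_vec(batch)
seq_out = seq_to_seq(batch)
print(vec_out.shape)  # (4, 1): one value per series
print(seq_out.shape)  # (4, 30, 1): one value per time step per series

# Because the time dimension is None, a different length also works.
longer_out = seq_to_seq(np.zeros((2, 50, 1), dtype="float32"))
print(longer_out.shape)  # (2, 50, 1)
```

Notice that the output sequence has the same length as the input, while the size of the last dimension is decided by the final layer, here a dense layer with one unit.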