1
00:00:00,000 --> 00:00:02,965
Okay. We've mentioned
the shape of the data

2
00:00:02,965 --> 00:00:05,665
and the batches that
the data is split up into.

3
00:00:05,665 --> 00:00:07,915
It's important to
take a look at that,

4
00:00:07,915 --> 00:00:10,310
and let's dig into that next.

5
00:00:10,380 --> 00:00:13,410
The inputs are three dimensional.

6
00:00:13,410 --> 00:00:15,970
So for example, if we
have a window size of

7
00:00:15,970 --> 00:00:19,720
30 time steps and we're
batching them in sizes of four,

8
00:00:19,720 --> 00:00:23,060
the shape will be 4
times 30 times 1,

9
00:00:23,060 --> 00:00:26,590
and at each time step,
the memory cell input

10
00:00:26,590 --> 00:00:30,250
will be a four by
one matrix, like this.

11
00:00:30,250 --> 00:00:32,770
The cell will also
take the input of

12
00:00:32,770 --> 00:00:35,350
the state matrix from
the previous step.

13
00:00:35,350 --> 00:00:37,120
But of course in this case,

14
00:00:37,120 --> 00:00:39,460
in the first step,
this will be zero.

15
00:00:39,460 --> 00:00:41,320
For subsequent ones, it'll

16
00:00:41,320 --> 00:00:43,385
be the output from
the memory cell.

17
00:00:43,385 --> 00:00:45,530
But other than the state vector,

18
00:00:45,530 --> 00:00:48,730
the cell of course
will output a Y value,

19
00:00:48,730 --> 00:00:50,545
which we can see here.

20
00:00:50,545 --> 00:00:54,040
If the memory cell is
comprised of three neurons,

21
00:00:54,040 --> 00:00:57,250
then the output matrix will
be four by three because

22
00:00:57,250 --> 00:00:58,660
the batch size coming in was

23
00:00:58,660 --> 00:01:01,255
four and the number
of neurons is three.

24
00:01:01,255 --> 00:01:04,549
So the full output of the layer
is three dimensional,

25
00:01:04,549 --> 00:01:07,530
in this case, 4 by 30 by 3.

26
00:01:07,530 --> 00:01:10,185
With four being the batch size,

27
00:01:10,185 --> 00:01:12,485
three being the number of units,

28
00:01:12,485 --> 00:01:15,805
and 30 being the number
of overall steps.

29
00:01:15,805 --> 00:01:17,655
In a simple RNN,

30
00:01:17,655 --> 00:01:22,060
the state output H is just
a copy of the output matrix Y.

31
00:01:22,060 --> 00:01:25,645
So for example, H_0
is a copy of Y_0,

32
00:01:25,645 --> 00:01:28,670
H_1 is a copy of Y_1, and so on.

33
00:01:28,670 --> 00:01:30,280
So at each time step,

34
00:01:30,280 --> 00:01:31,370
the memory cell gets

35
00:01:31,370 --> 00:01:34,865
both the current input and
also the previous output.

36
00:01:34,865 --> 00:01:36,560
Now, in some cases,

37
00:01:36,560 --> 00:01:38,585
you might want to
input a sequence,

38
00:01:38,585 --> 00:01:41,300
but you don't want to
output one, and you just

39
00:01:41,300 --> 00:01:42,650
want to get a single vector

40
00:01:42,650 --> 00:01:44,525
for each instance in the batch.

41
00:01:44,525 --> 00:01:48,200
This is typically called
a sequence to vector RNN.

42
00:01:48,200 --> 00:01:49,940
But in reality, all you do is

43
00:01:49,940 --> 00:01:52,610
ignore all of the outputs,
except the last one.

44
00:01:52,610 --> 00:01:54,625
When using Keras in TensorFlow,

45
00:01:54,625 --> 00:01:56,330
this is the default behavior.

46
00:01:56,330 --> 00:01:59,420
So if you want the recurrent
layer to output a sequence,

47
00:01:59,420 --> 00:02:01,490
you have to specify
return_sequences

48
00:02:01,490 --> 00:02:03,770
equals True when
creating the layer.

49
00:02:03,770 --> 00:02:05,670
You'll need to do
this when you stack

50
00:02:05,670 --> 00:02:08,530
one RNN layer on top of another.
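
To make the shapes discussed here concrete, the following is a minimal pure-Python sketch of a simple RNN forward pass, not the Keras implementation. The function name simple_rnn, its seed parameter, the random weight range, and the tanh activation are assumptions made for illustration. The return_sequences flag only imitates the behavior described for the Keras layer (in Keras itself this would be along the lines of tf.keras.layers.SimpleRNN(3, return_sequences=True)).

```python
import math
import random


def simple_rnn(inputs, units, return_sequences=False, seed=0):
    """Toy simple-RNN forward pass over nested lists shaped [batch][steps][features].

    Hypothetical helper for illustration only; weights are random, not trained.
    """
    rng = random.Random(seed)
    batch = len(inputs)
    steps = len(inputs[0])
    features = len(inputs[0][0])

    # Randomly initialised weights: input-to-unit, state-to-unit, and a zero bias.
    w_x = [[rng.uniform(-0.5, 0.5) for _ in range(units)] for _ in range(features)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(units)] for _ in range(units)]
    b = [0.0] * units

    # The state is zero at the first step: one row per instance in the batch.
    h = [[0.0] * units for _ in range(batch)]
    outputs = [[] for _ in range(batch)]

    for t in range(steps):
        new_h = []
        for i in range(batch):
            x_t = inputs[i][t]  # current input for instance i
            y = [
                math.tanh(
                    sum(x_t[f] * w_x[f][u] for f in range(features))
                    + sum(h[i][k] * w_h[k][u] for k in range(units))
                    + b[u]
                )
                for u in range(units)
            ]
            new_h.append(y)       # in a simple RNN, the state H_t is a copy of Y_t
            outputs[i].append(y)  # keep every step so the full sequence is available
        h = new_h

    if return_sequences:
        return outputs                   # [batch][steps][units]
    return [seq[-1] for seq in outputs]  # [batch][units]: only the last output


# Batch of 4 windows, 30 time steps each, 1 feature per step (made-up values).
batch = [[[0.01 * (i + t)] for t in range(30)] for i in range(4)]

full = simple_rnn(batch, units=3, return_sequences=True)
last = simple_rnn(batch, units=3)

print(len(full), len(full[0]), len(full[0][0]))  # batch, steps, units
print(len(last), len(last[0]))                   # batch, units
```

With return_sequences left at False, the sketch keeps only the final output per instance, which is the sequence-to-vector case. Passing True returns every step, which is what a second recurrent layer stacked on top would need as its input.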