1
00:00:00,000 --> 00:00:02,025
In the last couple of weeks,

2
00:00:02,025 --> 00:00:03,840
you've looked at
creating neural networks

3
00:00:03,840 --> 00:00:05,640
to forecast time-series data.

4
00:00:05,640 --> 00:00:08,640
You started with some simple
analytical techniques,

5
00:00:08,640 --> 00:00:10,020
which you then extended to using

6
00:00:10,020 --> 00:00:12,525
machine learning to do
a simple regression.

7
00:00:12,525 --> 00:00:14,490
From there, you used
a DNN that you

8
00:00:14,490 --> 00:00:16,590
tweaked a bit to get
an even better model.

9
00:00:16,590 --> 00:00:18,090
This week, we're going to look at

10
00:00:18,090 --> 00:00:20,835
RNNs for the task of prediction.

11
00:00:20,835 --> 00:00:22,890
A Recurrent Neural Network,

12
00:00:22,890 --> 00:00:24,720
or RNN, is a neural network

13
00:00:24,720 --> 00:00:26,310
that contains recurrent layers.

14
00:00:26,310 --> 00:00:27,780
These are designed to

15
00:00:27,780 --> 00:00:30,690
sequentially process
a sequence of inputs.

16
00:00:30,690 --> 00:00:32,355
RNNs are pretty flexible,

17
00:00:32,355 --> 00:00:34,710
able to process
all kinds of sequences.

18
00:00:34,710 --> 00:00:36,220
As you saw in
the previous course,

19
00:00:36,220 --> 00:00:38,315
they can be used
for predicting text.

20
00:00:38,315 --> 00:00:41,495
Here we'll use them to
process the time series.

21
00:00:41,495 --> 00:00:44,420
In this example, we'll build
an RNN that contains

22
00:00:44,420 --> 00:00:47,210
two recurrent layers and
a final dense layer,

23
00:00:47,210 --> 00:00:48,665
which will serve as the output.

24
00:00:48,665 --> 00:00:51,905
With an RNN, you can feed
in batches of sequences,

25
00:00:51,905 --> 00:00:54,230
and it will output
a batch of forecasts,

26
00:00:54,230 --> 00:00:56,135
just like we did last week.

27
00:00:56,135 --> 00:00:58,010
One difference will be that

28
00:00:58,010 --> 00:00:59,570
the full input shape when

29
00:00:59,570 --> 00:01:02,045
using RNNs is three-dimensional.

30
00:01:02,045 --> 00:01:04,340
The first dimension
will be the batch size,

31
00:01:04,340 --> 00:01:06,215
the second will be
the time steps,

32
00:01:06,215 --> 00:01:08,285
and the third is
the dimensionality

33
00:01:08,285 --> 00:01:10,715
of the inputs at each time step.

34
00:01:10,715 --> 00:01:13,880
For example, if it's
a univariate time series,

35
00:01:13,880 --> 00:01:15,295
this value will be one;

36
00:01:15,295 --> 00:01:17,220
for multivariate it'll be more.
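
The three-dimensional shape described above can be sketched in plain Python. This is only an illustration: `make_windows` and `shape_of` are hypothetical helper names, the series is toy data, and a real pipeline would normally build these batches with a tensor library rather than nested lists.

```python
# Hypothetical helper: slice a univariate series into fixed-length windows
# and arrange them as [batch, time steps, features] for a recurrent layer.
def make_windows(series, window_size):
    windows = []
    for start in range(len(series) - window_size + 1):
        window = series[start:start + window_size]
        # Each time step holds a list of features; univariate means one feature.
        windows.append([[value] for value in window])
    return windows

def shape_of(nested):
    # Walk down the first element at each level to read off the dimensions.
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

series = [float(t) for t in range(33)]   # toy data, not from the course
batch = make_windows(series, window_size=30)
print(shape_of(batch))   # (4, 30, 1): 4 windows, 30 time steps, 1 feature

# The two-dimensional layout used by the earlier dense models, for comparison:
flat = [[step[0] for step in window] for window in batch]
print(shape_of(flat))    # (4, 30): batch size, then all the input features
```

For a multivariate series, each inner list would simply hold more than one value per time step.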

37
00:01:17,220 --> 00:01:18,800
The models you've been using to

38
00:01:18,800 --> 00:01:20,795
date had two-dimensional inputs:

39
00:01:20,795 --> 00:01:22,595
the batch dimension
was the first,

40
00:01:22,595 --> 00:01:24,770
and the second had
all the input features.

41
00:01:24,770 --> 00:01:26,390
But before going further,

42
00:01:26,390 --> 00:01:29,330
let's dig into the RNN layers
to see how they work.

43
00:01:29,330 --> 00:01:31,670
While it looks like
there are lots of cells,

44
00:01:31,670 --> 00:01:33,200
there's actually only one,

45
00:01:33,200 --> 00:01:36,065
and it's used repeatedly
to compute the outputs.

46
00:01:36,065 --> 00:01:38,330
In this diagram, it looks
like there's lots of them,

47
00:01:38,330 --> 00:01:39,710
but it's just
the same one being

48
00:01:39,710 --> 00:01:42,010
reused multiple
times by the layer.

49
00:01:42,010 --> 00:01:43,810
At each time step,

50
00:01:43,810 --> 00:01:47,045
the memory cell takes
the input value for that step.

51
00:01:47,045 --> 00:01:49,880
So for example, it is
X0 at time zero,

52
00:01:49,880 --> 00:01:51,625
along with a zero state input.

53
00:01:51,625 --> 00:01:54,230
It then calculates
the output for that step,

54
00:01:54,230 --> 00:01:56,420
in this case Y0, and a state

55
00:01:56,420 --> 00:01:59,555
vector H0 that's fed
into the next step.

56
00:01:59,555 --> 00:02:04,685
H0 is fed into the cell with
X1 to produce Y1 and H1,

57
00:02:04,685 --> 00:02:07,400
which is then fed into
the cell at the next step

58
00:02:07,400 --> 00:02:10,685
with X2 to produce Y2 and H2.

59
00:02:10,685 --> 00:02:13,550
These steps will continue until

60
00:02:13,550 --> 00:02:15,775
we reach the end of
our input sequence,

61
00:02:15,775 --> 00:02:18,780
which in this case has 30 values.
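
The step-by-step walk through the cell can be sketched roughly as follows. This is a minimal pure-Python illustration, not the course's implementation: the names `rnn_cell` and `run_layer`, the two-unit cell, the tanh activation, and the weight values are all assumptions, and it treats the output at each step as identical to the new state, which is how a simple RNN cell is commonly defined.

```python
import math

def rnn_cell(x_t, h_prev, w_x, w_h, b):
    # One memory cell: combine the current input with the previous state.
    # Assumption: the output at a step is the same as the new state.
    h_t = [
        math.tanh(w_x[i] * x_t
                  + sum(w_h[i][j] * h_prev[j] for j in range(len(h_prev)))
                  + b[i])
        for i in range(len(b))
    ]
    y_t = h_t
    return y_t, h_t

def run_layer(inputs, w_x, w_h, b):
    h = [0.0] * len(b)          # zero state fed in at time zero
    outputs = []
    for x_t in inputs:          # the same cell is reused at every step
        y_t, h = rnn_cell(x_t, h, w_x, w_h, b)
        outputs.append(y_t)
    return outputs, h

# Illustrative weights for a cell with 2 units; real weights are learned.
w_x = [0.5, -0.3]
w_h = [[0.1, 0.2], [-0.2, 0.1]]
b = [0.0, 0.0]

inputs = [0.1 * t for t in range(30)]   # X0 ... X29, a toy window of 30 values
outputs, final_state = run_layer(inputs, w_x, w_h, b)
print(len(outputs), len(outputs[0]))    # 30 2
```

The loop makes the reuse explicit: there is one set of weights, and only the state `h` changes from step to step.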

62
00:02:18,780 --> 00:02:22,340
Now, this is what gives
this type of architecture

63
00:02:22,340 --> 00:02:25,010
the name recurrent
neural network,

64
00:02:25,010 --> 00:02:28,399
because the values recur due
to the output of the cell

65
00:02:28,399 --> 00:02:29,870
at one step being fed back

66
00:02:29,870 --> 00:02:32,150
into itself at
the next time step.

67
00:02:32,150 --> 00:02:34,210
As we saw in the NLP course,

68
00:02:34,210 --> 00:02:36,950
this is really helpful
in determining states.

69
00:02:36,950 --> 00:02:38,390
The location of a word in

70
00:02:38,390 --> 00:02:41,295
a sentence can
determine its semantics.

71
00:02:41,295 --> 00:02:43,460
Similarly, for numeric series,

72
00:02:43,460 --> 00:02:46,370
closer numbers
in the series might have

73
00:02:46,370 --> 00:02:47,750
a greater impact than those

74
00:02:47,750 --> 00:02:50,700
further away from
our target value.
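
One way to see why closer values might carry more weight is a deliberately simplified linear recurrence, sketched below. This is an assumption-laden toy, not how a trained RNN necessarily behaves: it uses a single state value, no activation function, and a hand-picked recurrent weight `w_h` of 0.5; the function names are hypothetical.

```python
def final_state(xs, w_h):
    # Linear recurrence h_t = w_h * h_{t-1} + x_t, starting from a zero state.
    h = 0.0
    for x in xs:
        h = w_h * h + x
    return h

def input_weights(w_h, steps):
    # The input from k steps back reaches the final state scaled by w_h ** k,
    # so with |w_h| < 1 older inputs contribute less and less.
    return [w_h ** (steps - 1 - t) for t in range(steps)]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]          # toy inputs, oldest first
weights = input_weights(w_h=0.5, steps=len(xs))
print(weights)                           # [0.0625, 0.125, 0.25, 0.5, 1.0]

# The weighted sum of the inputs matches the state the recurrence produces.
print(final_state(xs, 0.5))                          # 8.0625
print(sum(w * x for w, x in zip(weights, xs)))       # 8.0625
```

In a real recurrent layer the weights are learned and the activation is nonlinear, so the fall-off need not be this clean.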