1
00:00:00,000 --> 00:00:02,220
In the previous video, you got

2
00:00:02,220 --> 00:00:03,840
a look at RNNs and how they can

3
00:00:03,840 --> 00:00:04,860
be used for sequence to

4
00:00:04,860 --> 00:00:07,260
vector and sequence to
sequence prediction.

5
00:00:07,260 --> 00:00:08,640
Let's now take a look at

6
00:00:08,640 --> 00:00:10,620
coding them for
the problem at hand

7
00:00:10,620 --> 00:00:11,820
and seeing if we can get

8
00:00:11,820 --> 00:00:14,640
good predictions in our
time series using them.

9
00:00:14,640 --> 00:00:17,070
One thing you'll see in
the rest of the lessons

10
00:00:17,070 --> 00:00:19,050
going forward is that
I'd like to write

11
00:00:19,050 --> 00:00:20,790
a little bit of code to optimize

12
00:00:20,790 --> 00:00:22,200
the neural network for

13
00:00:22,200 --> 00:00:24,060
the learning rate
of the optimizer.

14
00:00:24,060 --> 00:00:26,640
It can be pretty quick to
train, and from there we can

15
00:00:26,640 --> 00:00:30,480
there save a lot of time in
our hyper-parameter tuning.

16
00:00:30,480 --> 00:00:33,210
So here's the code
for training the RNN

17
00:00:33,210 --> 00:00:35,505
with two layers,
each with 40 cells.
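To make "two layers, each with 40 cells" concrete, here is a minimal pure-Python sketch of the forward pass. It assumes SimpleRNN-style cells (h_t = tanh(W x_t + U h_(t-1) + b)) with random, untrained weights, and the function names are hypothetical, not the lesson's code. The first layer returns its whole output sequence (sequence to sequence) so the second layer can consume it. The second returns only its final state (sequence to vector), which a single linear output unit turns into one forecast value.

```python
import math
import random


def simple_rnn_layer(inputs, units, rng, return_sequences):
    """One SimpleRNN-style layer: h_t = tanh(W x_t + U h_{t-1} + b).

    inputs: list of time steps, each a list of input features.
    Weights are small random values; a trained model would learn them.
    """
    in_dim = len(inputs[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(units)]
    U = [[rng.uniform(-0.1, 0.1) for _ in range(units)] for _ in range(units)]
    b = [0.0] * units
    h = [0.0] * units  # initial hidden state
    outputs = []
    for x in inputs:
        # The right-hand side is fully evaluated with the previous h
        # before h is rebound to the new state.
        h = [
            math.tanh(
                sum(W[i][j] * x[j] for j in range(in_dim))
                + sum(U[i][k] * h[k] for k in range(units))
                + b[i]
            )
            for i in range(units)
        ]
        outputs.append(h)
    # Sequence to sequence: every state; sequence to vector: only the last one.
    return outputs if return_sequences else h


def forecast(window, units=40, seed=0):
    """Two stacked 40-unit RNN layers followed by one linear output unit."""
    rng = random.Random(seed)
    seq = [[value] for value in window]  # shape: (time_steps, 1 feature)
    seq = simple_rnn_layer(seq, units, rng, return_sequences=True)
    last = simple_rnn_layer(seq, units, rng, return_sequences=False)
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(units)]
    return sum(w * h for w, h in zip(w_out, last))


window = [float(t % 7) for t in range(20)]  # made-up input window
print(forecast(window))
```

In Keras, a comparable stack would plausibly be `tf.keras.layers.SimpleRNN(40, return_sequences=True)` followed by `tf.keras.layers.SimpleRNN(40)` and `tf.keras.layers.Dense(1)`; any input-reshaping layers the lesson uses are not shown here.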

18
00:00:35,505 --> 00:00:36,810
To tune the learning rate,

19
00:00:36,810 --> 00:00:39,690
we'll set up a callback,
which you can see here.

20
00:00:39,690 --> 00:00:41,480
Every epoch, this just changes

21
00:00:41,480 --> 00:00:43,640
the learning rate a
little so that it steps

22
00:00:43,640 --> 00:00:45,410
all the way from 1 times 10 to

23
00:00:45,410 --> 00:00:48,140
the minus 8 to 1 times
10 to the minus 6.
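A callback like this only needs a function that maps the epoch number to a learning rate. The sketch below assumes a geometric (log-spaced) sweep between the two endpoints mentioned here, 1e-8 and 1e-6, over 100 epochs; the function name and the exact spacing are assumptions, not the lesson's code. Stepping geometrically means each epoch multiplies the rate by the same factor, so every decade gets the same number of epochs.

```python
def lr_sweep(epoch, start=1e-8, end=1e-6, num_epochs=100):
    """Learning rate for `epoch` (0-based) on a log-spaced sweep.

    Epoch 0 uses `start`; the last epoch (num_epochs - 1) uses `end`.
    """
    fraction = epoch / (num_epochs - 1)
    return start * (end / start) ** fraction


for epoch in (0, 49, 99):
    print(epoch, lr_sweep(epoch))
```

In Keras, wrap it as `tf.keras.callbacks.LearningRateScheduler(lambda epoch: lr_sweep(epoch))` and pass it to `model.fit(..., callbacks=[...])`. The lambda matters: the scheduler first tries calling the schedule with `(epoch, current_lr)`, which would otherwise silently bind the current rate to `start`.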

24
00:00:48,140 --> 00:00:51,155
You can see that set up
here while training.

25
00:00:51,155 --> 00:00:54,140
I've also introduced
a new loss function to

26
00:00:54,140 --> 00:00:56,810
use, called Huber, which
you can see here.

27
00:00:56,810 --> 00:00:58,130
The Huber function is

28
00:00:58,130 --> 00:01:00,140
a loss function that's
less sensitive to

29
00:01:00,140 --> 00:01:03,410
outliers, and as this data
can get a little bit noisy,

30
00:01:03,410 --> 00:01:05,365
it's worth giving it a shot.
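For reference, the Huber loss is quadratic for small errors and linear for large ones, which is why a few outliers pull on it less than on squared error. Below is a minimal pure-Python version, assuming the common definition with a threshold `delta` of 1.0 (the default in `tf.keras.losses.Huber`); the function names are illustrative.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss over paired targets and predictions."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        error = abs(t - p)
        if error <= delta:
            total += 0.5 * error ** 2  # quadratic region: behaves like MSE
        else:
            total += 0.5 * delta ** 2 + delta * (error - delta)  # linear region
    return total / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error, for comparison."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# One large outlier (an error of 10) dominates MSE far more than Huber.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 14.0]
print(huber_loss(y_true, y_pred), mse(y_true, y_pred))  # prints 2.40625 25.0625
```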

31
00:01:05,365 --> 00:01:07,790
If I run this for 100 epochs and

32
00:01:07,790 --> 00:01:09,860
measure the loss at each epoch,

33
00:01:09,860 --> 00:01:12,050
I will see that
my optimum learning rate for

34
00:01:12,050 --> 00:01:14,450
stochastic gradient
descent is between

35
00:01:14,450 --> 00:01:17,000
about 10 to the minus 5
and 10 to the minus 6.
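Reading the optimum off the sweep can also be done programmatically. This sketch assumes you have recorded the learning rate and loss for each epoch; it smooths the losses with a short centered moving average, since a single noisy epoch can mislead, and returns the rate where the smoothed loss is lowest. The helper name and the window size are hypothetical choices.

```python
def best_learning_rate(lrs, losses, window=5):
    """Return the learning rate whose smoothed loss is lowest.

    lrs and losses are per-epoch lists of equal length. A centered
    moving average damps single-epoch noise before taking the minimum.
    """
    if len(lrs) != len(losses) or not lrs:
        raise ValueError("need equal-length, non-empty lists")
    half = window // 2
    n = len(losses)
    best_i, best_val = 0, float("inf")
    for i in range(n):
        start, stop = max(0, i - half), min(n, i + half + 1)
        smoothed = sum(losses[start:stop]) / (stop - start)
        if smoothed < best_val:
            best_i, best_val = i, smoothed
    return lrs[best_i]
```

If the sweep was run with Keras, the per-epoch losses are in `history.history["loss"]` on the object returned by `model.fit`, and the rates can be recomputed from the schedule function.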

36
00:01:17,000 --> 00:01:20,500
So I'm going to set it to 5
times 10 to the minus 5.

37
00:01:20,500 --> 00:01:23,480
So now, I'll compile
my model with

38
00:01:23,480 --> 00:01:24,980
that learning rate and

39
00:01:24,980 --> 00:01:27,665
the stochastic gradient
descent optimizer.
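To see what the chosen learning rate controls, here is plain stochastic gradient descent on a parameter vector: each update moves every weight against its gradient, scaled by the learning rate. This is a bare-bones sketch with no momentum or other extras (whether the lesson's optimizer uses any is not stated here), and the gradients are made-up numbers.

```python
def sgd_step(params, grads, learning_rate=5e-5):
    """Vanilla SGD update: w <- w - learning_rate * dL/dw for every weight."""
    return [w - learning_rate * g for w, g in zip(params, grads)]


params = [1.0, -2.0]
grads = [10.0, -4.0]  # illustrative gradient values
print(sgd_step(params, grads))  # each weight moves by at most 5e-4
```

In Keras this corresponds to something like `tf.keras.optimizers.SGD(learning_rate=5e-5)` passed to `model.compile(loss=tf.keras.losses.Huber(), optimizer=..., metrics=["mae"])`.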

40
00:01:27,665 --> 00:01:30,410
After training for 500 epochs,

41
00:01:30,410 --> 00:01:31,825
I will get this chart,

42
00:01:31,825 --> 00:01:35,870
with an MAE on the validation
set of about 6.35.
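The validation MAE quoted here is the average absolute gap between the forecast and the actual values over the validation period, in the same units as the series. A minimal version, with made-up numbers:

```python
def mean_absolute_error(actual, forecast):
    """Average of |actual - forecast| over all time steps."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


print(mean_absolute_error([10.0, 12.0, 9.0], [11.0, 12.0, 7.0]))  # prints 1.0
```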

43
00:01:35,870 --> 00:01:38,870
It's not bad, but I wonder
if we can do better.

44
00:01:38,870 --> 00:01:42,730
So here's the loss and
the MAE during training

45
00:01:42,730 --> 00:01:44,750
with the chart on
the right zoomed

46
00:01:44,750 --> 00:01:46,985
into the last few epochs.

47
00:01:46,985 --> 00:01:49,370
As you can see,
the trend was generally

48
00:01:49,370 --> 00:01:52,490
downward until a little
after 400 epochs,

49
00:01:52,490 --> 00:01:54,475
when it started getting unstable.

50
00:01:54,475 --> 00:01:56,495
Given this, it's probably worth

51
00:01:56,495 --> 00:01:58,670
only training for
about 400 epochs.
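Instead of eyeballing the curve, the cut-off can be estimated from the recorded per-epoch metric. The sketch below, with a hypothetical helper name and an arbitrary default patience, returns the epoch at which the metric last reached a new best before going `patience` epochs without improving. This is the same idea behind Keras's `tf.keras.callbacks.EarlyStopping`.

```python
def stopping_epoch(values, patience=20, min_delta=0.0):
    """1-based epoch of the last improvement before `patience` epochs without one.

    values: per-epoch loss or MAE, where lower is better.
    """
    best_value, best_epoch, waited = float("inf"), 0, 0
    for epoch, value in enumerate(values, start=1):
        if value < best_value - min_delta:
            best_value, best_epoch, waited = value, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs in a row
    return best_epoch
```

During training, `tf.keras.callbacks.EarlyStopping(monitor="loss", patience=..., restore_best_weights=True)` applies a similar rule and can keep the weights from the best epoch.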

52
00:01:58,670 --> 00:02:00,810
When I do that,

53
00:02:00,810 --> 00:02:02,085
I get these results.

54
00:02:02,085 --> 00:02:03,680
That's pretty much the same, with

55
00:02:03,680 --> 00:02:06,275
the MAE only a tiny
little bit higher,

56
00:02:06,275 --> 00:02:07,880
but we've saved 100 epochs' worth

57
00:02:07,880 --> 00:02:10,070
of training to get
it. So it's worth it.

58
00:02:10,070 --> 00:02:12,350
A quick look at the training MAE

59
00:02:12,350 --> 00:02:14,690
and loss gives us these results.

60
00:02:14,690 --> 00:02:16,230
So we've done quite well,

61
00:02:16,230 --> 00:02:18,560
and that was just
using a simple RNN.

62
00:02:18,560 --> 00:02:20,480
Let's see how we can
improve this with

63
00:02:20,480 --> 00:02:24,000
LSTMs, which you'll see
in the next video.