1
00:00:00,720 --> 00:00:03,820
Okay, so this week you've been
looking at machine learning for

2
00:00:03,820 --> 00:00:06,500
predicting the next value in a sequence.

3
00:00:06,500 --> 00:00:10,050
You've learned how to break your data down
into window chunks that you could train

4
00:00:10,050 --> 00:00:14,450
on, and then you saw a simple
single layer neural network

5
00:00:14,450 --> 00:00:17,630
that gave you what was
effectively a linear regression.

6
00:00:17,630 --> 00:00:22,572
Now let's take that to the next step
with a DNN to see if we can improve

7
00:00:22,572 --> 00:00:24,145
our model accuracy.

8
00:00:24,145 --> 00:00:27,720
It's not that much different from the
linear regression model we saw earlier.

9
00:00:27,720 --> 00:00:31,670
And this is a relatively simple deep
neural network that has three layers.

10
00:00:31,670 --> 00:00:33,950
So let's unpack it line by line.

11
00:00:33,950 --> 00:00:38,950
First we'll have to get a dataset, which we'll
generate by passing in the x_train data,

12
00:00:38,950 --> 00:00:42,100
along with the desired window size,
batch size, and shuffle buffer size.

13
00:00:43,175 --> 00:00:44,640
We'll then define the model.

14
00:00:44,640 --> 00:00:49,120
Let's keep it simple with three
layers of 10, 10, and 1 neurons.

15
00:00:49,120 --> 00:00:51,580
The input shape is
the size of the window and

16
00:00:51,580 --> 00:00:53,490
we'll activate each layer using a relu.
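
To make the 10, 10, 1 architecture concrete, here is a minimal pure-Python forward pass through three dense layers, along with a parameter count. In Keras this would roughly correspond to a `Sequential` stack of three `Dense` layers. Two choices here are assumptions, not from the transcript. First, the window size of 20. Second, the output layer is left linear, because a ReLU on the final layer would clamp every prediction at zero or above.

```python
import random


def dense(inputs, weights, biases, relu=True):
    # weights[j] holds the incoming weights for output neuron j.
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in outputs] if relu else outputs


def init_layer(n_in, n_out, rng):
    # Small random weights, zero biases.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def count_params(input_size, layer_sizes):
    # Each dense layer has n_in * n_out weights plus n_out biases.
    total, n_in = 0, input_size
    for n_out in layer_sizes:
        total += n_in * n_out + n_out
        n_in = n_out
    return total


window_size = 20  # assumed for illustration; the transcript doesn't give it
rng = random.Random(42)
shapes = [(window_size, 10), (10, 10), (10, 1)]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in shapes]

window = [rng.random() for _ in range(window_size)]
activations = window
for i, (weights, biases) in enumerate(layers):
    # ReLU on the hidden layers; the output layer is left linear (an assumption).
    activations = dense(activations, weights, biases, relu=i < len(layers) - 1)
prediction = activations[0]

print(count_params(window_size, [10, 10, 1]))  # 331
```

With a window of 20, the network has 20·10+10 = 210, then 10·10+10 = 110, then 10·1+1 = 11 trainable values, which is 331 in total. That is still very small.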

17
00:00:54,740 --> 00:00:58,936
We'll then compile the model as before
with a mean squared error loss function

18
00:00:58,936 --> 00:01:01,429
and a stochastic gradient descent optimizer.
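
As a rough illustration of what the loss and optimizer settings mean, the sketch below computes mean squared error and takes one gradient descent step. To keep the gradient short, it uses a single linear unit (like the earlier linear-regression model) rather than the three-layer network. It also treats the tiny toy dataset as one batch, whereas an SGD optimizer in Keras applies the update once per mini-batch. The data values are made up.

```python
def mse(y_true, y_pred):
    # Mean squared error: the average of the squared differences.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def predict(w, b, X):
    # A single linear unit: w . x + b for each input row.
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X]


def sgd_step(w, b, X, y, lr):
    """One gradient descent step on MSE for a linear model."""
    n = len(X)
    errors = [p - t for p, t in zip(predict(w, b, X), y)]
    # dMSE/dw_j = (2/n) * sum(error_i * x_ij);  dMSE/db = (2/n) * sum(error_i)
    grad_w = [2 / n * sum(e * x[j] for e, x in zip(errors, X)) for j in range(len(w))]
    grad_b = 2 / n * sum(errors)
    return [wi - lr * g for wi, g in zip(w, grad_w)], b - lr * grad_b


X = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]  # toy windows of size 2
y = [3.0, 5.0, 7.0]                        # toy next values
w, b = [0.0, 0.0], 0.0
before = mse(y, predict(w, b, X))
w, b = sgd_step(w, b, X, y, lr=0.01)
after = mse(y, predict(w, b, X))
print(before, after)
```

Each step moves the weights against the gradient, scaled by the learning rate. That is why the choice of learning rate matters so much later in this video.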

19
00:01:02,700 --> 00:01:05,900
Finally, we'll fit the model
over 100 epochs, and

20
00:01:05,900 --> 00:01:10,131
after a few seconds of training,
we'll see results that look like this.

21
00:01:10,131 --> 00:01:11,376
It's pretty good still.

22
00:01:11,376 --> 00:01:15,506
And when we calculate the mean absolute
error, we're lower than we were earlier,

23
00:01:15,506 --> 00:01:18,420
so it's a step in the right direction.
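
Mean absolute error is simply the average size of the forecast errors, in the same units as the series. Here is a minimal sketch with made-up numbers, not the values from this video:

```python
def mean_absolute_error(y_true, y_pred):
    # Average of absolute differences, in the same units as the series.
    if len(y_true) != len(y_pred):
        raise ValueError("series must be the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


actual = [10.0, 12.0, 11.0, 15.0]    # made-up observed values
forecast = [11.0, 12.0, 9.0, 14.0]   # made-up predictions
print(mean_absolute_error(actual, forecast))  # 1.0
```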

24
00:01:18,420 --> 00:01:20,550
But it's also somewhat
of a stab in the dark,

25
00:01:20,550 --> 00:01:22,890
particularly with the optimizer function.

26
00:01:22,890 --> 00:01:26,300
Wouldn't it be nice if we could pick the
optimal learning rate instead of the one

27
00:01:26,300 --> 00:01:27,510
that we chose?

28
00:01:27,510 --> 00:01:30,390
We might learn more efficiently and
build a better model.

29
00:01:30,390 --> 00:01:31,690
Now let's look at a technique for

30
00:01:31,690 --> 00:01:36,360
that, one that uses callbacks, which you
used way back in the first course.

31
00:01:36,360 --> 00:01:38,770
So here's the code for
the previous neural network.

32
00:01:38,770 --> 00:01:41,730
But I've added a callback
to tweak the learning rate

33
00:01:41,730 --> 00:01:44,240
using a learning rate scheduler.

34
00:01:44,240 --> 00:01:45,990
You can see that code here.

35
00:01:45,990 --> 00:01:49,880
This will be called as a callback
at the start of each epoch.

36
00:01:49,880 --> 00:01:53,260
What it will do is change
the learning rate to a value

37
00:01:53,260 --> 00:01:55,260
based on the epoch number.

38
00:01:55,260 --> 00:02:01,330
So in epoch 1, it's 1 times 10 to the power of -8,
times 10 to the power of 1 over 20.

39
00:02:01,330 --> 00:02:03,955
And by the time we reach the 100th epoch,

40
00:02:03,955 --> 00:02:09,885
it'll be 1 times 10 to the power of -8 times 10 to
the power of 5, since 100 over 20 is 5.

41
00:02:09,885 --> 00:02:13,014
This will happen on each
callback because we set it

42
00:02:13,014 --> 00:02:16,000
in the callbacks parameter
of model.fit.
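
The schedule described above is 1e-8 × 10^(epoch / 20): the learning rate grows tenfold every 20 epochs. In Keras, a function like this would typically be wrapped in `tf.keras.callbacks.LearningRateScheduler` and passed through the `callbacks` argument of `model.fit`. That callback receives a zero-based epoch index. The sketch below is not Keras. It is a hypothetical toy loop (`ToyOptimizer` and `toy_fit` are invented names) that only imitates what the callback does: set a new learning rate before each epoch.

```python
def lr_schedule(epoch):
    # Start at 1e-8 and grow tenfold every 20 epochs.
    return 1e-8 * 10 ** (epoch / 20)


class ToyOptimizer:
    # Hypothetical stand-in for an optimizer whose learning rate can be changed.
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate


def toy_fit(optimizer, epochs, schedule):
    """Hypothetical training loop that applies `schedule` at the start of each epoch."""
    lr_history = []
    for epoch in range(epochs):  # epochs are numbered from 0 here
        optimizer.learning_rate = schedule(epoch)
        # ... one epoch of training with this learning rate would run here ...
        lr_history.append(optimizer.learning_rate)
    return lr_history


lrs = toy_fit(ToyOptimizer(1e-3), epochs=100, schedule=lr_schedule)
print(lrs[0], lrs[-1])
```

Over 100 epochs the rate sweeps five orders of magnitude, from 1e-8 up to about 1e-3. That range is what lets a single run probe many candidate learning rates.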

43
00:02:18,220 --> 00:02:22,786
After training with this, we can
then plot the loss per epoch against

44
00:02:22,786 --> 00:02:28,151
the learning rate per epoch by using this
code, and we'll see a chart like this.

45
00:02:28,151 --> 00:02:30,965
The y-axis shows us the loss for
that epoch and

46
00:02:30,965 --> 00:02:34,140
the x-axis shows us the learning rate.

47
00:02:34,140 --> 00:02:38,140
We can then try to pick the lowest point
of the curve where it's still relatively

48
00:02:38,140 --> 00:02:43,294
stable like this, and
that's right around 7 times 10 to the -6.
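
Reading the chart amounts to pairing each epoch's learning rate with that epoch's loss. Plotting the pairs on a logarithmic x-axis (for example, with matplotlib's `semilogx`) gives a chart of the kind described. The sketch below uses synthetic losses, not the ones from this run, and returns the learning rate at the very lowest loss. That is a simplification: the video picks a point that is low *and* still in a stable region, which in practice often sits a little below the raw minimum.

```python
def lr_at_min_loss(lrs, losses):
    # Pair each epoch's learning rate with that epoch's loss and return the
    # learning rate where the loss was lowest.
    if len(lrs) != len(losses) or not lrs:
        raise ValueError("need one loss per learning rate")
    best = min(range(len(losses)), key=losses.__getitem__)
    return lrs[best]


# Synthetic sweep for illustration only: loss falls, then blows up.
lrs = [1e-8 * 10 ** (e / 20) for e in range(100)]
losses = [abs(e - 60) + 1.0 for e in range(100)]
print(lr_at_min_loss(lrs, losses))
```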

49
00:02:44,310 --> 00:02:48,290
So let's set that to be our learning
rate and then we'll retrain.

50
00:02:48,290 --> 00:02:51,330
So here's the same neural network code,
and

51
00:02:51,330 --> 00:02:55,820
we've updated the learning rate, so
we'll also train it for a bit longer.

52
00:02:55,820 --> 00:02:59,461
Let's check the results after training for
500 epochs.

53
00:02:59,461 --> 00:03:04,045
Here's the code to plot out the loss that
was calculated during the training, and

54
00:03:04,045 --> 00:03:05,890
it will give us a chart like this.

55
00:03:07,270 --> 00:03:11,180
At first glance, it looks
like we're probably wasting our time

56
00:03:11,180 --> 00:03:14,010
training beyond maybe only 10 epochs, but

57
00:03:14,010 --> 00:03:18,320
it's somewhat skewed by the fact that
the earlier losses were so high.

58
00:03:18,320 --> 00:03:22,501
If we crop them off and plot the loss
for epochs after number 10 with code like

59
00:03:22,501 --> 00:03:25,286
this, then the chart will
tell us a different story.

60
00:03:26,829 --> 00:03:33,402
We can see that the loss was continuing
to decrease even after 500 epochs.

61
00:03:33,402 --> 00:03:37,296
And that shows that our network
is still learning.

62
00:03:37,296 --> 00:03:41,238
And the results of the predictions
overlaid against the originals look

63
00:03:41,238 --> 00:03:41,890
like this.

64
00:03:43,110 --> 00:03:45,810
And the mean absolute
error across the results

65
00:03:45,810 --> 00:03:47,779
is significantly lower than earlier.

66
00:03:49,050 --> 00:03:52,380
I'll take you through a screencast of
this code in action in the next video.

67
00:03:52,380 --> 00:03:56,710
Using a very simple DNN,
we've improved our results very nicely.

68
00:03:56,710 --> 00:04:01,219
But it's still just a DNN; there's no
sequencing taken into account, and

69
00:04:01,219 --> 00:04:05,588
in a time series like this, the values
that are immediately before a value

70
00:04:05,588 --> 00:04:09,133
are more likely to impact it
than those further in the past.

71
00:04:09,133 --> 00:04:13,163
And that's the perfect setup to use
RNNs, like we had in the natural language

72
00:04:13,163 --> 00:04:14,230
course.

73
00:04:14,230 --> 00:04:17,310
Now we'll look at that next week,
but first, let's dig into this code.