1
00:00:00,000 --> 00:00:02,100
One important note is that

2
00:00:02,100 --> 00:00:03,900
while we got rid of
the Lambda layer

3
00:00:03,900 --> 00:00:07,620
that reshaped the input for
us to work with the LSTMs.

4
00:00:07,620 --> 00:00:09,360
So we're actually specifying

5
00:00:09,360 --> 00:00:12,855
an input shape on
the Conv1D here.

6
00:00:12,855 --> 00:00:15,180
This requires us to update

7
00:00:15,180 --> 00:00:17,355
the windowed_dataset
helper function

8
00:00:17,355 --> 00:00:19,590
that we've been working
with all along.

9
00:00:19,590 --> 00:00:23,070
We'll simply use
tf.expand_dims in

10
00:00:23,070 --> 00:00:24,690
the helper function to expand

11
00:00:24,690 --> 00:00:27,960
the dimensions of the series
before we process it.

12
00:00:27,960 --> 00:00:30,525
Also similar to last week,

13
00:00:30,525 --> 00:00:32,240
the code will attempt lots of

14
00:00:32,240 --> 00:00:34,260
different learning rates
changing them

15
00:00:34,260 --> 00:00:36,700
epoch by epoch and
plotting the results.

16
00:00:36,700 --> 00:00:38,420
With this data and the

17
00:00:38,420 --> 00:00:41,264
convolutional and
LSTM-based network,

18
00:00:41,264 --> 00:00:43,040
we'll get a plot like this.

19
00:00:43,040 --> 00:00:45,530
It clearly bottoms out
around 10 to the minus

20
00:00:45,530 --> 00:00:48,065
five after which it
looks a bit unstable,

21
00:00:48,065 --> 00:00:51,410
so we'll take that to be
our desired learning rate.

22
00:00:51,410 --> 00:00:54,500
Thus when we define
the optimizer, we'll set

23
00:00:54,500 --> 00:00:58,415
the learning rate to
be 1e-5 as shown here.

24
00:00:58,415 --> 00:01:02,210
When we train for 500 epochs
we'll get this curve.

25
00:01:02,210 --> 00:01:04,760
It's a huge improvement
over earlier.

26
00:01:04,760 --> 00:01:06,590
The peak has lost its plateau

27
00:01:06,590 --> 00:01:08,420
but it's still not quite right,

28
00:01:08,420 --> 00:01:11,420
it's not getting high enough
relative to the data.

29
00:01:11,420 --> 00:01:13,880
Now of course noise is
a factor and we can see

30
00:01:13,880 --> 00:01:17,330
crazy fluctuations in
the peak caused by the noise,

31
00:01:17,330 --> 00:01:18,560
but I think our model could

32
00:01:18,560 --> 00:01:20,480
possibly do a bit
better than this.

33
00:01:20,480 --> 00:01:22,730
Our MAE is below five,

34
00:01:22,730 --> 00:01:24,290
but I would bet that outside of

35
00:01:24,290 --> 00:01:27,065
that first peak it's probably
a lot lower than that.

36
00:01:27,065 --> 00:01:30,305
One solution might be to
train a little bit longer.

37
00:01:30,305 --> 00:01:34,310
Even though our MAE loss curves
look flat at 500 epochs,

38
00:01:34,310 --> 00:01:35,390
we can see when we zoom

39
00:01:35,390 --> 00:01:37,640
in that they're
slowly diminishing.

40
00:01:37,640 --> 00:01:40,975
The network is still
learning albeit slowly.

41
00:01:40,975 --> 00:01:43,250
Now one method would be to make

42
00:01:43,250 --> 00:01:46,310
your LSTMs
bidirectional like this.

43
00:01:46,310 --> 00:01:49,880
When training, this
looks really good giving

44
00:01:49,880 --> 00:01:54,340
very low loss and MAE values,
sometimes even less than one.

45
00:01:54,340 --> 00:01:57,170
But unfortunately it's
overfitting. When we

46
00:01:57,170 --> 00:01:59,810
plot the predictions
against the validation set,

47
00:01:59,810 --> 00:02:01,580
we don't see much improvement

48
00:02:01,580 --> 00:02:03,755
and in fact our MAE
has gone down.

49
00:02:03,755 --> 00:02:06,410
So it's still a step in
the right direction and

50
00:02:06,410 --> 00:02:09,409
consider an architecture like
this one as you go forward,

51
00:02:09,409 --> 00:02:11,240
but perhaps you might
need to tweak some

52
00:02:11,240 --> 00:02:13,980
of the parameters to
avoid overfitting.

53
00:02:13,980 --> 00:02:16,460
Some of the problems are clearly

54
00:02:16,460 --> 00:02:19,615
visualized when we plot
the loss against the MAE,

55
00:02:19,615 --> 00:02:22,550
there's a lot of noise
and instability in there.

56
00:02:22,550 --> 00:02:25,520
One common cause for
small spikes like that is

57
00:02:25,520 --> 00:02:29,425
a small batch size introducing
further random noise.

58
00:02:29,425 --> 00:02:31,480
I won't go into the details here,

59
00:02:31,480 --> 00:02:33,530
but if you check out
Andrew's videos and

60
00:02:33,530 --> 00:02:35,980
his course on optimizing
for gradient descent,

61
00:02:35,980 --> 00:02:38,345
there's some really
great stuff in there.

62
00:02:38,345 --> 00:02:40,310
One hint was to explore

63
00:02:40,310 --> 00:02:41,840
the batch size and to make

64
00:02:41,840 --> 00:02:43,880
sure it's appropriate
for my data.

65
00:02:43,880 --> 00:02:45,500
So in this case it's worth

66
00:02:45,500 --> 00:02:48,245
experimenting with
different batch sizes.

67
00:02:48,245 --> 00:02:51,080
So for example, I experimented with

68
00:02:51,080 --> 00:02:52,310
different batch sizes both

69
00:02:52,310 --> 00:02:54,970
larger and smaller
than the original 32,

70
00:02:54,970 --> 00:02:57,260
and when I tried 16 you can see

71
00:02:57,260 --> 00:02:59,990
the impact here on
the validation set,

72
00:02:59,990 --> 00:03:04,170
and here on the training
loss and MAE data.

73
00:03:05,240 --> 00:03:07,920
So by combining CNNs and

74
00:03:07,920 --> 00:03:11,239
LSTMs we've been able to
build our best model yet,

75
00:03:11,239 --> 00:03:13,985
despite some rough edges
that could be refined.

76
00:03:13,985 --> 00:03:16,670
In the next video, we'll
step through a notebook that

77
00:03:16,670 --> 00:03:17,840
trains with this model so

78
00:03:17,840 --> 00:03:19,354
that you can see it for yourself,

79
00:03:19,354 --> 00:03:21,440
and also we'll maybe
learn how to do some

80
00:03:21,440 --> 00:03:24,335
fine-tuning to improve
the model even further.

81
00:03:24,335 --> 00:03:26,090
After that, we'll take the model

82
00:03:26,090 --> 00:03:27,500
architecture and apply it to

83
00:03:27,500 --> 00:03:29,240
some real-world data instead of

84
00:03:29,240 --> 00:03:32,190
the synthetic data that
you've been using all along.