1
00:00:00,000 --> 00:00:02,430
So we've looked at
the sunspot data using

2
00:00:02,430 --> 00:00:03,690
a standard DNN like

3
00:00:03,690 --> 00:00:05,655
way back at the beginning
of this course.

4
00:00:05,655 --> 00:00:08,130
We've also been working
a lot with RNNs and a

5
00:00:08,130 --> 00:00:10,485
little with convolutional
neural networks.

6
00:00:10,485 --> 00:00:12,360
So what would happen
if we put them all

7
00:00:12,360 --> 00:00:15,315
together to see if we can
predict the sunspot activity?

8
00:00:15,315 --> 00:00:17,520
This is a difficult dataset

9
00:00:17,520 --> 00:00:19,230
because like we've seen already,

10
00:00:19,230 --> 00:00:21,765
while it's seasonal,
the period is really long,

11
00:00:21,765 --> 00:00:23,805
around 11 years, and it's not

12
00:00:23,805 --> 00:00:26,625
perfectly seasonal
during that period.

13
00:00:26,625 --> 00:00:29,780
So let's take a look at using
all the tools we have to

14
00:00:29,780 --> 00:00:31,100
see if we can build a decent

15
00:00:31,100 --> 00:00:33,560
prediction using
machine learning.

16
00:00:33,560 --> 00:00:36,380
So here's the first piece
of code we can try.

17
00:00:36,380 --> 00:00:37,640
I've gone a little crazy here,

18
00:00:37,640 --> 00:00:39,635
so let's break it
down piece by piece.

19
00:00:39,635 --> 00:00:41,720
First of all I'm setting
the batch size to

20
00:00:41,720 --> 00:00:44,420
64 and the window size to 60.
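
As a rough illustration of what the window size and batch size control, here is a minimal pure-Python sketch. It assumes each window of 60 values is paired with the value that follows it as the label, which the narration does not spell out. The helper names `make_windows` and `make_batches` are hypothetical and are not taken from the course code.

```python
def make_windows(series, window_size):
    """Pair each run of `window_size` values with the value that follows it."""
    return [
        (series[i:i + window_size], series[i + window_size])
        for i in range(len(series) - window_size)
    ]


def make_batches(windows, batch_size):
    """Group windows into batches; the last batch may be smaller."""
    return [windows[i:i + batch_size] for i in range(0, len(windows), batch_size)]


# Toy stand-in for the sunspot series (assumption: any list of numbers works).
series = list(range(3000))
windows = make_windows(series, window_size=60)
batches = make_batches(windows, batch_size=64)
print(len(windows), len(batches), len(batches[-1]))
```

A real pipeline would more likely build this with `tf.data`, and would usually shuffle the windows before batching.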

21
00:00:44,420 --> 00:00:46,910
Then we'll start with
a 1D convolution

22
00:00:46,910 --> 00:00:48,935
that will learn 32 filters.

23
00:00:48,935 --> 00:00:52,880
This will output to a couple
of LSTMs with 32 cells

24
00:00:52,880 --> 00:00:55,220
each before feeding into

25
00:00:55,220 --> 00:00:57,635
a DNN similar to
what we saw earlier,

26
00:00:57,635 --> 00:00:59,945
30 neurons, then 10, and one.

27
00:00:59,945 --> 00:01:03,830
Finally, as our numbers
are in the 1-400 range,

28
00:01:03,830 --> 00:01:07,735
there is a Lambda layer that
multiplies out our X by 400.
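
The architecture described above could be sketched in Keras roughly as follows. The layer sizes (32 filters, two LSTMs of 32 cells, dense layers of 30, 10 and 1, and the multiplication by 400) come from the narration. The kernel size of 5, the causal padding, the ReLU activations and `return_sequences=True` on both LSTMs are assumptions, since the narration does not state them.

```python
import numpy as np
import tensorflow as tf


def build_model():
    """Conv1D -> two LSTMs -> small dense stack -> scale the output by 400."""
    return tf.keras.models.Sequential([
        # One feature per time step; the sequence length is left open.
        tf.keras.Input(shape=(None, 1)),
        # Assumed: kernel_size=5, causal padding, ReLU activation.
        tf.keras.layers.Conv1D(filters=32, kernel_size=5,
                               padding="causal", activation="relu"),
        # Assumed: both LSTMs return the full sequence.
        tf.keras.layers.LSTM(32, return_sequences=True),
        tf.keras.layers.LSTM(32, return_sequences=True),
        tf.keras.layers.Dense(30, activation="relu"),
        tf.keras.layers.Dense(10, activation="relu"),
        tf.keras.layers.Dense(1),
        # Scale the output up towards the 1-400 range of the data.
        tf.keras.layers.Lambda(lambda x: x * 400.0),
    ])


model = build_model()
# Push a dummy batch of two windows of 60 steps through the network.
out = model(np.zeros((2, 60, 1), dtype="float32"))
print(out.shape)
```

With these assumptions the model emits one value per time step. Setting `return_sequences=False` on the second LSTM would give a single prediction per window instead.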

29
00:01:07,735 --> 00:01:10,340
With the first test run
to establish

30
00:01:10,340 --> 00:01:12,905
the best learning rate,
we get this chart.

31
00:01:12,905 --> 00:01:14,870
This suggests
the best learning rate for

32
00:01:14,870 --> 00:01:18,110
this network will be
around 10 to the minus 5.
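
One common way to produce a chart like this is to raise the learning rate a little every epoch and plot the loss against it. The sketch below only computes such a sweep. The starting rate of 1e-8 and the step of one decade every 20 epochs are assumptions for illustration, not values stated in the narration.

```python
import math


def swept_learning_rate(epoch, start=1e-8, epochs_per_decade=20):
    """Learning rate for a given epoch when sweeping upwards exponentially."""
    return start * 10 ** (epoch / epochs_per_decade)


# Under these assumed settings, the sweep reaches 1e-5 at epoch 60.
for epoch in (0, 20, 40, 60, 80):
    print(epoch, swept_learning_rate(epoch))
```

In Keras, a function like this can be passed to `tf.keras.callbacks.LearningRateScheduler` so that the rate changes at the start of each epoch.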

33
00:01:18,110 --> 00:01:20,630
So when I trained for 500 epochs

34
00:01:20,630 --> 00:01:22,595
with this setup,
here are my results.

35
00:01:22,595 --> 00:01:25,730
It's pretty good
with a nice low MAE.
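
For reference, MAE (mean absolute error) is the average size of the gap between each prediction and the true value. A minimal sketch follows, using made-up toy numbers rather than any results from this model.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy values for illustration only (not sunspot data).
actual = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 5.0]
print(mean_absolute_error(actual, predicted))
```

Because the errors are not squared, a few large misses on the peaks raise MAE less than they would raise a squared-error metric.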

36
00:01:25,730 --> 00:01:28,430
But when I look at my loss
function during training,

37
00:01:28,430 --> 00:01:30,485
I can see that there's
a lot of noise

38
00:01:30,485 --> 00:01:33,775
which tells me that I can
certainly optimize it a bit,

39
00:01:33,775 --> 00:01:36,140
and as we saw from
earlier videos,

40
00:01:36,140 --> 00:01:37,910
one of the best
things to look at in

41
00:01:37,910 --> 00:01:40,415
these circumstances
is the batch size.

42
00:01:40,415 --> 00:01:44,135
So I'll increase it
to 256 and retrain.

43
00:01:44,135 --> 00:01:46,940
After 500 epochs,
my predictions have

44
00:01:46,940 --> 00:01:50,325
improved a little which is
a step in the right direction.

45
00:01:50,325 --> 00:01:52,320
But look at my training noise.

46
00:01:52,320 --> 00:01:55,070
Particularly towards the end
of the training, it's really

47
00:01:55,070 --> 00:01:58,610
noisy but it's
a very regular looking wave.

48
00:01:58,610 --> 00:02:01,610
This suggests that my larger
batch size was good,

49
00:02:01,610 --> 00:02:03,320
but maybe a little off.

50
00:02:03,320 --> 00:02:05,360
It's not catastrophic
because as you

51
00:02:05,360 --> 00:02:07,160
can see the
fluctuations are really

52
00:02:07,160 --> 00:02:09,350
small but it would be very nice

53
00:02:09,350 --> 00:02:12,500
if we could regularize
this loss a bit more,

54
00:02:12,500 --> 00:02:15,605
which then brings me to
another thing to try.

55
00:02:15,605 --> 00:02:18,980
My training data has
3,000 data points in it.

56
00:02:18,980 --> 00:02:21,080
So why are things
like my window size

57
00:02:21,080 --> 00:02:22,310
and batch size powers of

58
00:02:22,310 --> 00:02:27,005
two that aren't necessarily
evenly divisible into 3,000?
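
The divisibility point can be checked with simple arithmetic. This sketch computes what is left over when 3,000 points are split into groups of each size mentioned in this video. The helper name `leftover` is hypothetical. Note that the number of training windows is somewhat smaller than the raw point count, so this is only the rough check the narration describes.

```python
def leftover(total_points, size):
    """Points left over after splitting `total_points` into groups of `size`."""
    return total_points % size


TOTAL_POINTS = 3000  # training set size quoted in the narration

for size in (32, 64, 256, 60, 100):
    print(size, leftover(TOTAL_POINTS, size))
```

Sizes 60 and 100 leave nothing over, while 32, 64 and 256 all leave a remainder.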

59
00:02:27,005 --> 00:02:28,400
What would happen if I were to

60
00:02:28,400 --> 00:02:30,410
change my parameters to suit,

61
00:02:30,410 --> 00:02:32,210
and not just the window
and batch size,

62
00:02:32,210 --> 00:02:34,340
how about changing
the filters too?

63
00:02:34,340 --> 00:02:36,215
So what if I set that to 60,

64
00:02:36,215 --> 00:02:40,160
and the LSTMs to 60
instead of 32 or 64?

65
00:02:40,160 --> 00:02:43,370
My DNN layers already look good,
so I won't change them.

66
00:02:43,370 --> 00:02:46,385
So after training
this for 500 epochs,

67
00:02:46,385 --> 00:02:49,280
my scores improved
again albeit slightly.

68
00:02:49,280 --> 00:02:51,995
So it shows we're heading
in the right direction.

69
00:02:51,995 --> 00:02:54,410
What's interesting is
that the noise in

70
00:02:54,410 --> 00:02:57,160
the loss function actually
increased a bit,

71
00:02:57,160 --> 00:02:58,460
and that made me want to

72
00:02:58,460 --> 00:03:00,770
experiment with the
batch size again.

73
00:03:00,770 --> 00:03:05,080
So I reduced it to just 100
and I got these results.

74
00:03:05,080 --> 00:03:08,195
Now here my MAE has
actually gone up a little.

75
00:03:08,195 --> 00:03:09,440
The projections are doing

76
00:03:09,440 --> 00:03:11,075
much better in
the higher peaks than

77
00:03:11,075 --> 00:03:14,510
earlier but the overall
accuracy has gone down,

78
00:03:14,510 --> 00:03:16,430
and the loss has smoothed

79
00:03:16,430 --> 00:03:19,280
out except for a couple
of large blips.

80
00:03:19,280 --> 00:03:22,580
Experimenting with
hyperparameters like this is

81
00:03:22,580 --> 00:03:24,080
a great way to learn the ins

82
00:03:24,080 --> 00:03:25,640
and outs of machine learning,

83
00:03:25,640 --> 00:03:28,040
not just with sequences
but with anything.

84
00:03:28,040 --> 00:03:30,440
I thoroughly recommend
spending time on

85
00:03:30,440 --> 00:03:32,975
it and seeing if you can
improve on this model.

86
00:03:32,975 --> 00:03:35,180
In addition, you should accompany

87
00:03:35,180 --> 00:03:37,340
that work with looking
deeper into how

88
00:03:37,340 --> 00:03:39,440
all of these things in
machine learning work

89
00:03:39,440 --> 00:03:42,230
and Andrew's courses
are terrific for that.

90
00:03:42,230 --> 00:03:43,595
I strongly recommend them

91
00:03:43,595 --> 00:03:45,660
if you haven't done them already.