1
00:00:00,000 --> 00:00:02,040
Let's take a look at running

2
00:00:02,040 --> 00:00:03,890
some statistical forecasting on

3
00:00:03,890 --> 00:00:06,480
the synthetic dataset that
you've been working with.

4
00:00:06,480 --> 00:00:08,325
This should give us some form of

5
00:00:08,325 --> 00:00:09,780
baseline, and we'll see

6
00:00:09,780 --> 00:00:11,725
if we can beat it with
machine learning.

7
00:00:11,725 --> 00:00:14,220
You saw the details
in the previous video

8
00:00:14,220 --> 00:00:17,680
and you'll go through
this workbook in this video.

9
00:00:17,690 --> 00:00:20,520
Before you start, make
sure you are running

10
00:00:20,520 --> 00:00:21,900
Python 3 and you're

11
00:00:21,900 --> 00:00:24,525
using an environment
that provides a GPU.

12
00:00:24,525 --> 00:00:26,895
Some of the code will require

13
00:00:26,895 --> 00:00:28,635
TensorFlow 2.0 to be installed.

14
00:00:28,635 --> 00:00:30,180
So make sure that you have it.

15
00:00:30,180 --> 00:00:32,130
This code will print
out your version.

16
00:00:32,130 --> 00:00:34,245
If you have something
earlier than 2.0,

17
00:00:34,245 --> 00:00:36,240
you'll need to
install the latest.

18
00:00:36,240 --> 00:00:38,520
To install it, use this code.

19
00:00:38,520 --> 00:00:39,840
At the time of recording,

20
00:00:39,840 --> 00:00:42,015
TensorFlow 2.0 was at Beta 1.

21
00:00:42,015 --> 00:00:44,450
You can check out the latest
install instructions on

22
00:00:44,450 --> 00:00:46,760
TensorFlow.org for
the updated version

23
00:00:46,760 --> 00:00:48,730
so that you can install it.

24
00:00:48,730 --> 00:00:51,510
Once it's done,
you'll see a message

25
00:00:51,510 --> 00:00:54,445
to restart the runtime.
Make sure that you do that.

26
00:00:54,445 --> 00:00:56,330
Check that you still have

27
00:00:56,330 --> 00:00:59,405
a Python 3 GPU runtime
and run the script again.

28
00:00:59,405 --> 00:01:02,910
You should see that
2.0 is now installed.

29
00:01:03,640 --> 00:01:06,290
The next code block condenses

30
00:01:06,290 --> 00:01:08,765
a lot of what you saw in
the previous lessons.

31
00:01:08,765 --> 00:01:10,640
It will create a time series with

32
00:01:10,640 --> 00:01:13,355
trend, seasonality, and noise.

33
00:01:13,355 --> 00:01:16,590
You can see it in the graph here.

34
00:01:16,630 --> 00:01:21,140
Now, to create a training
and validation set split,

35
00:01:21,140 --> 00:01:22,820
we'll simply split the array

36
00:01:22,820 --> 00:01:25,240
containing the data
at index 1,000,

37
00:01:25,240 --> 00:01:26,815
and we will chart it.

38
00:01:26,815 --> 00:01:29,360
Again, we can see that
the seasonality is

39
00:01:29,360 --> 00:01:32,045
maintained and it's
still trending upwards.

40
00:01:32,045 --> 00:01:33,965
It also contains some noise.

41
00:01:33,965 --> 00:01:36,800
The validation set
is similar and while

42
00:01:36,800 --> 00:01:40,250
the charts may appear
different, check out the x-axis.

43
00:01:40,250 --> 00:01:42,620
You can see that we've
zoomed in quite a bit on it,

44
00:01:42,620 --> 00:01:44,540
but it is the same pattern.

45
00:01:44,540 --> 00:01:47,705
Now let's start doing some
naive prediction.

46
00:01:47,705 --> 00:01:50,090
The first super
simple prediction is

47
00:01:50,090 --> 00:01:52,865
to just predict that the value
at time period plus one

48
00:01:52,865 --> 00:01:55,925
is the same as the value
of the current time period.

49
00:01:55,925 --> 00:01:58,830
So we'll create data called
Naive Forecasting that

50
00:01:58,830 --> 00:02:02,145
simply copies the training data
at time t minus 1.

51
00:02:02,145 --> 00:02:04,220
When we plot it, we see

52
00:02:04,220 --> 00:02:05,750
the original series in blue

53
00:02:05,750 --> 00:02:08,240
and the predicted one in orange.

54
00:02:08,240 --> 00:02:10,670
It's hard to make it out.

55
00:02:10,670 --> 00:02:12,200
So let's zoom in a little.

56
00:02:12,200 --> 00:02:14,420
We're looking at
the start of the data,

57
00:02:14,420 --> 00:02:18,035
and that sharp climb that
you see. So when we zoom in,

58
00:02:18,035 --> 00:02:19,760
we can see that
the orange data is

59
00:02:19,760 --> 00:02:23,135
just one time-step
after the blue data.

60
00:02:23,135 --> 00:02:25,760
This code will then
print the mean

61
00:02:25,760 --> 00:02:27,950
squared and mean absolute errors.

62
00:02:27,950 --> 00:02:29,600
We'll see what they are. We get

63
00:02:29,600 --> 00:02:32,285
61.8 and 5.9 respectively.

64
00:02:32,285 --> 00:02:34,680
We'll call that our baseline.

65
00:02:35,270 --> 00:02:37,670
So now let's get a little smarter

66
00:02:37,670 --> 00:02:39,095
and try a moving average.

67
00:02:39,095 --> 00:02:41,570
In this case, the point
in time t will

68
00:02:41,570 --> 00:02:44,305
be the average of
the 30 points prior to it.

69
00:02:44,305 --> 00:02:47,600
This gives us
a nice smoothing effect.

70
00:02:48,500 --> 00:02:51,680
If we print out the error
values for this,

71
00:02:51,680 --> 00:02:53,060
we'll get values higher

72
00:02:53,060 --> 00:02:54,980
than those for
the naive prediction.

73
00:02:54,980 --> 00:02:57,635
Remember, for errors,
lower is better.

74
00:02:57,635 --> 00:02:59,180
So we can say that
this is actually

75
00:02:59,180 --> 00:03:02,885
worse than the naive prediction
that we made earlier on.

76
00:03:02,885 --> 00:03:05,765
So let's try a little trick
to improve this.

77
00:03:05,765 --> 00:03:08,180
Since the seasonality
of this data is

78
00:03:08,180 --> 00:03:10,775
one year or 365 days,

79
00:03:10,775 --> 00:03:12,350
let's take a look
at the difference

80
00:03:12,350 --> 00:03:13,490
between the data at time

81
00:03:13,490 --> 00:03:17,485
t and the data from
365 days before that.

82
00:03:17,485 --> 00:03:18,860
When we plot that,

83
00:03:18,860 --> 00:03:21,410
we can see that
the seasonality is gone and

84
00:03:21,410 --> 00:03:24,460
we're looking at
the raw data plus the noise.

85
00:03:24,460 --> 00:03:28,159
So now, if we calculate
a moving average on this data,

86
00:03:28,159 --> 00:03:30,725
we'll see a relatively
smooth moving average

87
00:03:30,725 --> 00:03:33,230
not impacted by seasonality.

88
00:03:33,230 --> 00:03:35,150
Then if we add back

89
00:03:35,150 --> 00:03:37,540
the past values to
this moving average,

90
00:03:37,540 --> 00:03:40,115
we'll start to see
a pretty good prediction.

91
00:03:40,115 --> 00:03:43,530
The orange line is quite
close to the blue one.

92
00:03:43,700 --> 00:03:46,510
If we calculate
the errors on this,

93
00:03:46,510 --> 00:03:49,355
you'll see that we have a
better value than the baseline.

94
00:03:49,355 --> 00:03:51,860
We're definitely heading
in the right direction.

95
00:03:51,860 --> 00:03:54,320
But all we did was just add in

96
00:03:54,320 --> 00:03:57,130
the raw historic values,
which are very noisy.

97
00:03:57,130 --> 00:03:59,510
What if, instead, we added in

98
00:03:59,510 --> 00:04:02,360
the moving average of
the historic values,

99
00:04:02,360 --> 00:04:05,585
so we're effectively using
two different moving averages?

100
00:04:05,585 --> 00:04:08,120
Now, our prediction curve
is a lot less

101
00:04:08,120 --> 00:04:10,850
noisy and the predictions
are looking pretty good.

102
00:04:10,850 --> 00:04:13,250
If we measure
their overall error,

103
00:04:13,250 --> 00:04:15,870
the numbers agree with
our visual inspection:

104
00:04:15,870 --> 00:04:19,080
the error rate has
improved further.

105
00:04:19,300 --> 00:04:22,280
That was a pretty simple
introduction to using

106
00:04:22,280 --> 00:04:23,780
some mathematical methods to

107
00:04:23,780 --> 00:04:26,435
analyze a series and
get a basic prediction.

108
00:04:26,435 --> 00:04:29,660
With a bit of fiddling, we
got a pretty decent one too.

109
00:04:29,660 --> 00:04:32,060
Next week, you'll look at
using what you've learned

110
00:04:32,060 --> 00:04:35,630
about machine learning to see
if you can improve on it.