1
00:00:00,000 --> 00:00:03,779
Welcome back to practical time series analysis.

2
00:00:03,779 --> 00:00:08,054
We've been looking at forecasting in these lectures.

3
00:00:08,054 --> 00:00:10,560
We know how to do simple exponential smoothing,

4
00:00:10,560 --> 00:00:13,884
and now we'll move on to Holt-Winters.

5
00:00:13,884 --> 00:00:18,120
This will allow us to deal with time series that exhibit some sort of trend,

6
00:00:18,120 --> 00:00:23,850
time series that are rising or falling or sometimes rising and sometimes falling.

7
00:00:23,850 --> 00:00:28,510
After this lecture, you'll be able to use

8
00:00:28,510 --> 00:00:31,329
the R command HoltWinters to

9
00:00:31,329 --> 00:00:35,685
produce a forecast for one of these time series that exhibit a trend.

10
00:00:35,685 --> 00:00:37,854
And we always like you to be able to explain

11
00:00:37,854 --> 00:00:44,695
these fundamental tools to a friend or colleague in very simple terms.

12
00:00:44,695 --> 00:00:49,479
Here's our formula for simple exponential smoothing.

13
00:00:49,479 --> 00:00:51,859
We put hats on things that we're estimating.

14
00:00:51,859 --> 00:00:53,674
And so x-hat with a subscript

15
00:00:53,674 --> 00:00:57,335
n+1 represents some sort of estimate.

16
00:00:57,335 --> 00:01:01,929
This is our forecast of what we think will be happening in the next time period.

17
00:01:01,929 --> 00:01:08,799
We build that as some sort of a weighted average of new information,

18
00:01:08,799 --> 00:01:11,230
the state of the system at time n,

19
00:01:11,230 --> 00:01:14,805
together with older historical information,

20
00:01:14,805 --> 00:01:20,260
and we think about these levels that we're creating as smoothed averages.

21
00:01:20,260 --> 00:01:25,004
There's a geometric series hiding inside of this formula here which motivates it.

22
00:01:25,004 --> 00:01:26,780
But if you don't want to worry about that,

23
00:01:26,780 --> 00:01:28,894
just think of it as a weighted average,

24
00:01:28,894 --> 00:01:32,719
some component which is new and some component which is,

25
00:01:32,719 --> 00:01:35,185
you could say, historical.

26
00:01:35,185 --> 00:01:37,640
So when we deal with these formulas,

27
00:01:37,640 --> 00:01:42,459
we have a few different notational things to keep in mind.

28
00:01:42,459 --> 00:01:46,519
Levels will always be associated in these lectures with the parameter alpha.

29
00:01:46,519 --> 00:01:48,709
Trend, rising and falling,

30
00:01:48,709 --> 00:01:51,715
will be associated with the parameter beta.

31
00:01:51,715 --> 00:01:56,875
And seasonal components will have a gamma to them.

32
00:01:56,875 --> 00:02:01,200
We'll look at, for a motivating example here,

33
00:02:01,200 --> 00:02:04,484
a data set on the volume of money.

34
00:02:04,484 --> 00:02:07,230
Money is actually more complicated than one might think,

35
00:02:07,230 --> 00:02:08,449
until you start studying it.

36
00:02:08,449 --> 00:02:12,939
But, let's just say how much money is circulating at a given time.

37
00:02:12,939 --> 00:02:16,560
The source here is the Australian Bureau of Statistics,

38
00:02:16,560 --> 00:02:20,735
and I obtained these data from the Time Series Data Library.

39
00:02:20,735 --> 00:02:23,319
There are a lot of really terrific data sets there,

40
00:02:23,319 --> 00:02:27,161
and you can go and explore them.

41
00:02:27,161 --> 00:02:32,479
This I'm sure looks like total gibberish if this is your first lecture,

42
00:02:32,479 --> 00:02:34,489
and you've never done R before.

43
00:02:34,489 --> 00:02:36,560
But at this point by week 5,

44
00:02:36,560 --> 00:02:40,520
most of us can look at this and it's almost transparent.

45
00:02:40,520 --> 00:02:45,740
We will say, get rid of all the old variables so we have a clean slate.

46
00:02:45,740 --> 00:02:49,639
We'll read in money.data with the R command read.table,

47
00:02:49,639 --> 00:02:51,680
we've encountered that before.

48
00:02:51,680 --> 00:02:55,669
I've downloaded the data set and I stored

49
00:02:55,669 --> 00:02:59,795
it in a file called Volume-of-money-abs-definition.m.txt,

50
00:02:59,795 --> 00:03:01,460
a little bit of an awkward name.

51
00:03:01,460 --> 00:03:05,790
I inherited that from the website.

52
00:03:05,790 --> 00:03:07,925
It was a text file so I just clicked on it,

53
00:03:07,925 --> 00:03:13,889
and went in and got rid of some of the notational information at the top of the file,

54
00:03:13,889 --> 00:03:17,740
so that we only had now data to deal with.

55
00:03:17,740 --> 00:03:23,690
So, we'll create a time series object with the ts command as we've done before.

56
00:03:23,690 --> 00:03:28,039
We will start it off in February 1960,

57
00:03:28,039 --> 00:03:32,150
so we concatenate 1960, the year, together with the second data point.

58
00:03:32,150 --> 00:03:34,215
Frequency is 12.

59
00:03:34,215 --> 00:03:36,185
The plotting commands,

60
00:03:36,185 --> 00:03:37,909
they should be instinctual

61
00:03:37,909 --> 00:03:42,520
at this point. We're going to plot our time series.

62
00:03:42,520 --> 00:03:45,189
It's always good to get some good intuition going.

63
00:03:45,189 --> 00:03:50,479
We'll run our autocorrelation function and our partial autocorrelation function,

64
00:03:50,479 --> 00:03:55,955
and we'll think through our usual idea that we can perhaps identify

65
00:03:55,955 --> 00:03:58,580
an auto-regressive time series

66
00:03:58,580 --> 00:04:02,284
or a moving average time series or something which is a mixture,

67
00:04:02,284 --> 00:04:08,990
depending upon how things are tapering off or abruptly cutting off.

68
00:04:08,990 --> 00:04:11,901
Here's our trend.

69
00:04:11,901 --> 00:04:14,659
It's increasing.

70
00:04:14,659 --> 00:04:20,949
It doesn't look to have any sort of very easy functional form here.

71
00:04:20,949 --> 00:04:23,379
I wouldn't necessarily just fit an exponential to this.

72
00:04:23,379 --> 00:04:25,634
I certainly would not fit a straight line.

73
00:04:25,634 --> 00:04:29,769
It seems like things increase at times and stall at other times.

74
00:04:29,769 --> 00:04:32,170
So this is a nice candidate for the kind of forecasting

75
00:04:32,170 --> 00:04:35,274
that we're going to do rather than let's say, straight modeling.

76
00:04:35,274 --> 00:04:39,310
The autocorrelation function is what you would probably

77
00:04:39,310 --> 00:04:43,629
guess given a time series that exhibits a trend like this.

78
00:04:43,629 --> 00:04:48,250
The autocorrelation is falling off rather gradually because of course,

79
00:04:48,250 --> 00:04:52,854
near neighbors have a good deal of information about where you are in the time series.
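
If you'd like to see that slow fall-off for yourself, here is a minimal Python sketch of a sample autocorrelation function. The series below is made up (a plain straight-line trend, not the money data), and the helper name sample_acf is just for illustration.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag (usual biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Made-up trending series standing in for the money data (not the real values).
trending = [float(t) for t in range(100)]
acf = sample_acf(trending, 5)

for lag, value in enumerate(acf):
    print(f"lag {lag}: {value:.3f}")
```

With a steady trend like this, the autocorrelations stay close to one for the first several lags and only drift down slowly, which is the pattern described for the plot.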

80
00:04:52,854 --> 00:04:56,485
The partial has that one lonely spike there.

81
00:04:56,485 --> 00:04:59,110
If you were modeling, you might immediately think that you

82
00:04:59,110 --> 00:05:02,685
could do a first order auto-regressive model.

83
00:05:02,685 --> 00:05:05,495
But again, that's putting a lot of,

84
00:05:05,495 --> 00:05:09,399
you're making that one auto-regressive coefficient do an awful lot of work.

85
00:05:09,399 --> 00:05:14,110
Instead of forming a formal mathematical model as we say,

86
00:05:14,110 --> 00:05:16,595
we'll do some forecasting instead.

87
00:05:16,595 --> 00:05:20,115
Quick reminder, we'll get a new level,

88
00:05:20,115 --> 00:05:27,045
a new smoothed average value as a weighted average of alpha times some new information,

89
00:05:27,045 --> 00:05:32,649
the new state of the system, plus one minus alpha times our old averaged value.

90
00:05:32,649 --> 00:05:35,819
And you can see that exhibited in this formula right here.

91
00:05:35,819 --> 00:05:40,949
That's when we were just doing simple exponential smoothing.
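
For reference, the simple exponential smoothing update described here can be written out as below. The notation (x-hat for the forecast, alpha between zero and one) follows the lecture's description; the second line is the geometric series mentioned earlier, with the start-up term ignored.

```latex
\begin{align*}
\hat{x}_{n+1} &= \alpha\, x_n + (1-\alpha)\,\hat{x}_n, \qquad 0 \le \alpha \le 1 \\
              &= \alpha\, x_n + \alpha(1-\alpha)\, x_{n-1} + \alpha(1-\alpha)^2\, x_{n-2} + \cdots
\end{align*}
```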

92
00:05:40,949 --> 00:05:42,240
To get to double,

93
00:05:42,240 --> 00:05:45,529
we'll increase the complexity just a little bit.

94
00:05:45,529 --> 00:05:48,779
We're going to build a forecast as the level,

95
00:05:48,779 --> 00:05:52,375
the smoothed average value plus some trend data.

96
00:05:52,375 --> 00:05:55,935
We'll take a look at how the system is rising or falling.

97
00:05:55,935 --> 00:06:01,574
Expressed notationally, we'll say x-hat at time n plus one looks like level at time n,

98
00:06:01,574 --> 00:06:06,389
plus trend at time n. Let's see how we unpack these two terms.

99
00:06:06,389 --> 00:06:09,540
Level at time n is going to be alpha times new information.

100
00:06:09,540 --> 00:06:12,700
No surprise there. Plus one minus alpha.

101
00:06:12,700 --> 00:06:15,319
And now we're going to include our old level,

102
00:06:15,319 --> 00:06:20,475
but we're going to try to bring in the trend in our smoothed average here.

103
00:06:20,475 --> 00:06:22,722
So if the series just rose or fell,

104
00:06:22,722 --> 00:06:24,524
we'll try to bring that in.

105
00:06:24,524 --> 00:06:27,389
So level at time n is alpha times x at time n,

106
00:06:27,389 --> 00:06:32,860
plus one minus alpha times the sum of the previous level and the previous trend.

107
00:06:32,860 --> 00:06:36,540
So we're going to build trends now and again,

108
00:06:36,540 --> 00:06:38,475
the geometric series is hiding behind this.

109
00:06:38,475 --> 00:06:43,589
We're going to build trends now based upon a smoothed average of past trends.

110
00:06:43,589 --> 00:06:47,339
We're trying to suppress the effect of too much noise at

111
00:06:47,339 --> 00:06:51,740
any one step in our forecast. So off we go.

112
00:06:51,740 --> 00:06:56,310
Trend at time n, we'll treat that as a weighted average.

113
00:06:56,310 --> 00:06:59,855
This probably looks pretty intuitive to you at this point.

114
00:06:59,855 --> 00:07:03,870
Beta, it's a new piece of notation, but there you go.

115
00:07:03,870 --> 00:07:09,060
So, beta times new trend and one minus beta times old trend.

116
00:07:09,060 --> 00:07:11,970
So the old trend once we have a little induction,

117
00:07:11,970 --> 00:07:14,264
we know our trends as we move along.

118
00:07:14,264 --> 00:07:17,125
This will be a smoothed average on the trends.

119
00:07:17,125 --> 00:07:19,319
How would you define your new trend?

120
00:07:19,319 --> 00:07:24,045
The new trend will be the change in levels.

121
00:07:24,045 --> 00:07:27,704
So trend at time n then is a weighted average

122
00:07:27,704 --> 00:07:33,824
of new information and historical information.
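
Collecting the pieces just described, the double exponential smoothing equations can be summarized as follows. The subscripted names level and trend are shorthand for the quantities in the lecture; the symbols themselves are a notational assumption.

```latex
\begin{align*}
\hat{x}_{n+1}  &= \text{level}_n + \text{trend}_n \\
\text{level}_n &= \alpha\, x_n + (1-\alpha)\,(\text{level}_{n-1} + \text{trend}_{n-1}) \\
\text{trend}_n &= \beta\,(\text{level}_n - \text{level}_{n-1}) + (1-\beta)\,\text{trend}_{n-1}
\end{align*}
```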

123
00:07:33,824 --> 00:07:36,834
The call is rather simple.

124
00:07:36,834 --> 00:07:38,415
HoltWinters is our command.

125
00:07:38,415 --> 00:07:41,129
We'll feed it our time series right here.

126
00:07:41,129 --> 00:07:47,600
For the alpha and beta parameters, we'll let HoltWinters figure out optimal values.

127
00:07:47,600 --> 00:07:49,904
We're turning off, we're suppressing seasonality,

128
00:07:49,904 --> 00:07:53,004
so we'll say gamma is equal to FALSE.

129
00:07:53,004 --> 00:07:55,314
We could write our own code to do this.

130
00:07:55,314 --> 00:07:58,064
We've seen how in past lectures,

131
00:07:58,064 --> 00:08:00,884
and I certainly would not stop you.

132
00:08:00,884 --> 00:08:03,209
But at a certain point, we'd like to get some work done,

133
00:08:03,209 --> 00:08:06,800
and we've got the call here and we'll just go ahead and use it.
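
For anyone who does want to write it by hand, here is a minimal Python sketch of the level and trend recursion described above. It is not the R HoltWinters call: alpha and beta are supplied by hand rather than optimized, the start-up values (level set to the second observation, trend set to the first difference) are one assumed choice, and the series is made up rather than the money data.

```python
def double_exponential_smoothing(x, alpha, beta):
    """Return (one-step-ahead forecasts, final level, final trend)."""
    if len(x) < 3:
        raise ValueError("need at least three observations")
    # Assumed start-up values: level = second observation,
    # trend = first difference. Other initialisations are possible.
    level = x[1]
    trend = x[1] - x[0]
    forecasts = []
    for obs in x[2:]:
        # Forecast for this time point, made before seeing obs.
        forecasts.append(level + trend)
        # New level: weighted average of the new observation and the
        # old level carried forward by the old trend.
        new_level = alpha * obs + (1 - alpha) * (level + trend)
        # New trend: weighted average of the change in level and the old trend.
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return forecasts, level, trend


# Made-up, perfectly linear series (not the money data): rises by 2 each step.
series = [10.0, 12.0, 14.0, 16.0, 18.0]
# Illustrative parameter values only; they are not fitted to anything.
forecasts, level, trend = double_exponential_smoothing(series, alpha=0.9, beta=0.1)
next_forecast = level + trend
print(forecasts, next_forecast)
```

On a perfectly linear series the forecasts track the data, since the level and trend never need correcting. The interesting behavior shows up when the slope changes, and a small beta makes the trend slow to catch on.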

134
00:08:06,800 --> 00:08:11,699
The optimal value for alpha and beta would be,

135
00:08:11,699 --> 00:08:14,000
take a look at that alpha value.

136
00:08:14,000 --> 00:08:18,899
You can see pretty quickly that the alpha value is rather close to one.

137
00:08:18,899 --> 00:08:23,060
So, where's that going to put the emphasis in the level piece?

138
00:08:23,060 --> 00:08:27,488
We've got alpha times new plus one minus alpha times historical.

139
00:08:27,488 --> 00:08:31,409
So this is going to put a fair amount of weight on the new data point.

140
00:08:31,409 --> 00:08:39,070
The beta, the parameter accommodating our trend, is really rather small.

141
00:08:39,070 --> 00:08:41,370
So if you remember how we constructed that,

142
00:08:41,370 --> 00:08:43,590
this is going to put more weight on

143
00:08:43,590 --> 00:08:47,424
the historical piece and less weight on the new piece.

144
00:08:47,424 --> 00:08:50,085
And then it's telling us that gamma is FALSE.

145
00:08:50,085 --> 00:08:54,105
Coefficients are where you go for your new forecast.

146
00:08:54,105 --> 00:08:59,235
We can read our new forecasted output,

147
00:08:59,235 --> 00:09:06,960
or system state, and our new forecasted trend.

148
00:09:06,960 --> 00:09:09,745
This picture shows how we're doing.

149
00:09:09,745 --> 00:09:12,929
I realize especially if you're viewing this on a cell phone that it's very

150
00:09:12,929 --> 00:09:17,039
hard to see in any detail what's going on in this picture.

151
00:09:17,039 --> 00:09:22,664
So, I've zoomed in for you on the years 1982 to 1985,

152
00:09:22,664 --> 00:09:27,565
and there's something sort of interesting if you take a look at what's going on here.

153
00:09:27,565 --> 00:09:31,559
Let's look at a piece where we see that we have a decent forecast going,

154
00:09:31,559 --> 00:09:34,634
and it looks like on a piece which is nearly linear,

155
00:09:34,634 --> 00:09:40,549
that the forecast is matching what the series actually did pretty well.

156
00:09:40,549 --> 00:09:42,980
But look what happens when the series,

157
00:09:42,980 --> 00:09:45,990
this time series of course is shown in black here.

158
00:09:45,990 --> 00:09:50,129
Look what happens when the series makes a more abrupt jump up.

159
00:09:50,129 --> 00:09:57,029
As the trend goes from around 45 degrees to, what would you call that, 60, 70 degrees?

160
00:09:57,029 --> 00:09:59,444
As the trend kicks up,

161
00:09:59,444 --> 00:10:04,544
it takes the forecasts just a little while to become aware of that so to speak,

162
00:10:04,544 --> 00:10:09,539
and you'll see the forecast moving in the same direction as previous.

163
00:10:09,539 --> 00:10:11,759
But then it does realize that

164
00:10:11,759 --> 00:10:16,065
this last little bit of trend is not just a little bit of noise,

165
00:10:16,065 --> 00:10:18,450
but rather something that's more persistent.

166
00:10:18,450 --> 00:10:22,379
And it accommodates that in how it calculates the forecasts.

167
00:10:22,379 --> 00:10:24,659
You can see something similar happening here,

168
00:10:24,659 --> 00:10:28,875
where we were moving along with a decently constant trend,

169
00:10:28,875 --> 00:10:31,679
the time series took a dive down,

170
00:10:31,679 --> 00:10:34,080
and it took the forecasts a couple of time periods to

171
00:10:34,080 --> 00:10:38,179
realize the new information, to incorporate that new information.

172
00:10:38,179 --> 00:10:42,880
But it did, and then we moved on.

173
00:10:42,880 --> 00:10:48,335
At this point, if you have a time series that exhibits some trend,

174
00:10:48,335 --> 00:10:53,720
you might think about using Holt Winters in order to do your forecasting.

175
00:10:53,720 --> 00:10:55,100
You know how to use HoltWinters now,

176
00:10:55,100 --> 00:10:57,139
it's really rather a simple call.

177
00:10:57,139 --> 00:11:02,495
And you should be able to explain to a friend or colleague that Holt Winters

178
00:11:02,495 --> 00:11:08,120
when we do double exponential smoothing is a way to accommodate new information,

179
00:11:08,120 --> 00:11:13,529
and also take advantage of historical information when forming a forecast.