1
00:00:11,050 --> 00:00:14,680
In this lecture, we are going to discuss Holt's linear trend model.

2
00:00:15,430 --> 00:00:21,190
Previously we looked at simple exponential smoothing and we noted that our forecasts were always a straight

3
00:00:21,190 --> 00:00:27,370
horizontal line. Holt's linear trend model extends the exponential smoothing model so that we can

4
00:00:27,370 --> 00:00:29,470
capture and forecast trends.

5
00:00:30,130 --> 00:00:33,320
Let's start by considering what a linear trend might look like.

6
00:00:33,670 --> 00:00:35,040
Well, linear means line.

7
00:00:35,470 --> 00:00:41,980
We know that the equation for a line in general is Y equals M X plus B. Let's suppose that instead

8
00:00:41,980 --> 00:00:46,240
we say Y sub T equals slope times T plus

9
00:00:46,240 --> 00:00:46,820
Y naught.

10
00:00:47,530 --> 00:00:48,730
This doesn't change anything.

11
00:00:48,740 --> 00:00:52,330
All we've done is replace X with T and B with Y naught.

12
00:00:53,350 --> 00:00:59,650
As you know, the coefficient in front of T is called the slope and the constant term by itself is called

13
00:00:59,650 --> 00:01:00,520
the intercept.

14
00:01:01,480 --> 00:01:07,520
Clearly it is the value of Y at time T equals zero, hence the symbol Y naught, or Y zero.

15
00:01:08,500 --> 00:01:13,540
Note that the slope is the amount that y changes when t increases by one.

16
00:01:14,350 --> 00:01:19,900
Understanding this kind of equation is a crucial step in understanding Holt's linear trend model.
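
Written out, the line described above looks like the following. The subscript notation is an assumption about how the slide writes it, since the lecture only gives the spoken form.

```latex
y_t = \text{slope} \cdot t + y_0,
\qquad
y_{t+1} - y_t = \text{slope}
```

Setting t = 0 gives y_0, the intercept, and each unit increase in t changes y by exactly the slope.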

17
00:01:25,020 --> 00:01:30,390
The best way to understand Holt's linear trend model, in my opinion, is to simply look at the model

18
00:01:30,390 --> 00:01:34,320
equations and to try and decipher each component one at a time.

19
00:01:35,070 --> 00:01:39,720
The first thing you'll notice is that it's very similar to the simple exponential smoothing equations

20
00:01:39,720 --> 00:01:40,890
in component form.

21
00:01:41,370 --> 00:01:44,650
That's why it was so important to rearrange them in that way.

22
00:01:45,780 --> 00:01:51,090
You'll notice that, whereas the simple exponential smoothing model had two equations, we now have

23
00:01:51,090 --> 00:01:52,270
three equations.

24
00:01:52,620 --> 00:01:57,580
So we've added one more equation, specifically one more equation for modeling the trend.

25
00:01:58,230 --> 00:02:04,080
So now we have three equations in total, one for the forecast, which we already had, one for the

26
00:02:04,080 --> 00:02:08,070
level which we already had, and one new equation for the trend.
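
For reference, the three equations can be written as below. This follows the standard textbook form of Holt's method and the spoken descriptions in this lecture. The hat-and-subscript notation for the forecast is an assumption about what appears on the slide.

```latex
\begin{aligned}
\text{Forecast:}\quad & \hat{y}_{t+h \mid t} = l_t + h\, b_t \\
\text{Level:}\quad    & l_t = \alpha\, y_t + (1 - \alpha)\,(l_{t-1} + b_{t-1}) \\
\text{Trend:}\quad    & b_t = \beta\,(l_t - l_{t-1}) + (1 - \beta)\, b_{t-1}
\end{aligned}
```

Here alpha and beta are the smoothing parameters for the level and the trend, and h is the number of steps ahead.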

27
00:02:12,970 --> 00:02:18,790
Let's start by studying the forecast equation. As you can see, this is nothing but the equation for

28
00:02:18,790 --> 00:02:19,340
a line.

29
00:02:19,900 --> 00:02:25,210
Of course, we now have different symbols, but I know you're not going to let that intimidate you. As

30
00:02:25,210 --> 00:02:25,830
before,

31
00:02:25,840 --> 00:02:29,350
we have the level, which represents some sort of average value.

32
00:02:29,890 --> 00:02:36,550
But now we also have a term that increases linearly with H, which is the number of steps in the forecast.

33
00:02:37,660 --> 00:02:44,020
Clearly, this means that B of T is the trend, or in other words, the slope. That is, L of T, the

34
00:02:44,020 --> 00:02:51,040
level, is the starting value of the forecast, and then the forecast increases by the amount B of T for

35
00:02:51,040 --> 00:02:56,500
each step into the future that we forecast. Of course, B of T can also be negative.

36
00:02:56,680 --> 00:03:00,490
So if you increase by a negative amount, then you're actually decreasing.

37
00:03:01,270 --> 00:03:07,420
The key point is, thanks to this new linear trend term, our forecast is no longer a horizontal line,

38
00:03:07,630 --> 00:03:09,370
but a line in any direction.

39
00:03:09,910 --> 00:03:13,150
This justifies the name Holt's linear trend model.
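
As a small illustration of the forecast equation, here is a minimal sketch in Python. The function name `holt_forecast` and the numbers are hypothetical, chosen only to show a downward-sloping forecast.

```python
def holt_forecast(level, trend, horizon):
    """Return forecasts for steps 1..horizon, each equal to level + h * trend."""
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical final level and trend; a negative trend gives a falling line.
forecasts = holt_forecast(level=100.0, trend=-2.5, horizon=3)
print(forecasts)  # [97.5, 95.0, 92.5]
```

With a trend of zero this reduces to the flat forecast of simple exponential smoothing.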

40
00:03:18,130 --> 00:03:23,900
Let's now study the level equation. Again, this is very similar to the level equation from before.

41
00:03:24,280 --> 00:03:29,100
The only difference is on the far right, which represents this moving average value.

42
00:03:29,830 --> 00:03:35,740
Remember that the prediction is now made up of two components, the level and the trend, which are

43
00:03:35,740 --> 00:03:37,380
represented by L and B.

44
00:03:38,020 --> 00:03:45,130
As you recall, in order to make a forecast, we find L plus H times B, but since this is just one

45
00:03:45,130 --> 00:03:47,080
step ahead, H equals one.

46
00:03:47,230 --> 00:03:54,160
And so the value of our prediction is just L plus B, or more specifically, L at T minus one plus B

47
00:03:54,160 --> 00:03:55,180
at T minus one.

48
00:03:55,870 --> 00:03:58,050
So that concludes how to update the level.

49
00:03:58,480 --> 00:04:05,080
It is still exponential smoothing on the input signal Y of T, using L of T minus one plus B of T minus

50
00:04:05,080 --> 00:04:07,840
one as the previous smoothed value.

51
00:04:12,780 --> 00:04:18,990
Finally, we have the trend equation. As you can see, the general form of this equation still follows

52
00:04:18,990 --> 00:04:20,490
that of exponential smoothing.

53
00:04:20,910 --> 00:04:24,670
We have beta times something, plus one minus beta times the old value.

54
00:04:25,230 --> 00:04:29,860
In other words, the trend B of T is also an exponentially smoothed estimate.

55
00:04:30,510 --> 00:04:33,310
But what is it an exponentially smoothed estimate of?

56
00:04:33,900 --> 00:04:36,200
Well, that's the thing that goes in front of the beta.

57
00:04:36,390 --> 00:04:37,420
So what goes there?

58
00:04:38,010 --> 00:04:41,460
In fact, it's just L of T minus L of T minus one.

59
00:04:42,090 --> 00:04:43,680
So why does this make sense?

60
00:04:44,400 --> 00:04:48,930
Remember that B of T is trying to estimate the slope or the trend of the signal.

61
00:04:49,470 --> 00:04:55,470
The slope of the signal is the value at one time, minus the value at some other time divided by the

62
00:04:55,470 --> 00:04:56,530
difference in time.

63
00:04:57,150 --> 00:05:02,310
In our case, the time difference is just one, because we're updating B of T on every timestep.

64
00:05:02,550 --> 00:05:05,970
So L of T minus L of T minus one makes sense.

65
00:05:07,080 --> 00:05:14,070
You might think, why L of T and not Y of T? Remember that one way of thinking of a time series signal is that

66
00:05:14,070 --> 00:05:15,000
it is noisy.

67
00:05:15,540 --> 00:05:20,450
The level L of T represents the smoothed version of that signal without the noise.

68
00:05:20,820 --> 00:05:26,910
And so taking the difference in L is a better estimate of the overall trend because it essentially

69
00:05:26,910 --> 00:05:29,490
removes that noise from the equation.
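
The level and trend updates described above can be sketched as a short loop. This is an illustration, not the library's implementation. The function name `holt_filter` is hypothetical, and the initialization (first level set to the first observation, first trend set to the first difference) is an assumption, since the lecture does not say how the recursion starts.

```python
def holt_filter(y, alpha, beta):
    """Run Holt's level and trend updates over the series y.

    Returns (levels, trends, preds), where preds[t] is the one-step-ahead
    prediction of y[t] made at time t - 1 (None for t = 0).
    """
    # Assumed initialization, not taken from the lecture.
    level = y[0]
    trend = y[1] - y[0]
    levels, trends, preds = [level], [trend], [None]

    for t in range(1, len(y)):
        # One-step-ahead prediction: previous level plus previous trend.
        pred = level + trend
        # Level: exponential smoothing of y[t] against that prediction.
        new_level = alpha * y[t] + (1 - alpha) * pred
        # Trend: exponential smoothing of the change in level.
        new_trend = beta * (new_level - level) + (1 - beta) * trend
        level, trend = new_level, new_trend
        levels.append(level)
        trends.append(trend)
        preds.append(pred)

    return levels, trends, preds


# A perfectly linear series: the level tracks y and the trend stays at 2.
y = [2.0, 4.0, 6.0, 8.0, 10.0]
levels, trends, preds = holt_filter(y, alpha=0.5, beta=0.5)
print(levels[-1], trends[-1])  # 10.0 2.0
```

Note that the trend update uses the difference of levels rather than the difference of raw observations, which is the noise-reduction point made above.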

70
00:05:34,500 --> 00:05:39,780
Recall that previously alpha was treated as sort of a hyperparameter that could be tuned depending

71
00:05:39,780 --> 00:05:41,040
on what you were trying to do.

72
00:05:41,520 --> 00:05:43,580
In fact, this is often the case.

73
00:05:44,040 --> 00:05:49,770
For example, suppose you're working as an audio engineer and you would like to remove some noise from

74
00:05:49,770 --> 00:05:50,590
a sound file.

75
00:05:51,270 --> 00:05:56,100
Well, if you have experience with sound editing, then you probably know that you often have to choose

76
00:05:56,100 --> 00:05:57,580
some parameters yourself.

77
00:05:58,260 --> 00:06:02,700
You tune these to your tastes depending on what the final output signal sounds like.

78
00:06:03,150 --> 00:06:08,520
You want it to sound subjectively good, which in general is not something that can be quantified.

79
00:06:13,340 --> 00:06:18,320
On the other hand, with Holt's linear trend model, now we're getting closer to the machine learning

80
00:06:18,320 --> 00:06:23,780
picture where what we care about is predictive accuracy. We want our forecasts to be good.

81
00:06:23,960 --> 00:06:28,930
And now that our model can exhibit a trend, it actually has a chance of making decent predictions.

82
00:06:29,420 --> 00:06:34,790
Therefore, the parameters that we learned about in this lecture, alpha and beta, will no longer be

83
00:06:34,790 --> 00:06:39,730
chosen arbitrarily, but rather we can fit them as usual.

84
00:06:39,740 --> 00:06:44,000
If you're familiar with machine learning, then you know this involves setting up a loss function

85
00:06:44,000 --> 00:06:49,400
like the mean squared error and then minimizing this error over the training set using methods such

86
00:06:49,400 --> 00:06:50,420
as gradient descent.

87
00:06:51,050 --> 00:06:55,970
If you are unfamiliar with this process, then you don't need to worry because statsmodels is going

88
00:06:55,970 --> 00:06:57,610
to do all this work for us.
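
To make the idea of fitting alpha and beta concrete, here is a rough sketch that minimizes the one-step-ahead mean squared error. It uses a plain grid search rather than gradient descent, and it is not how statsmodels does the optimization. The function names, the initialization, and the data are all hypothetical.

```python
def one_step_mse(y, alpha, beta):
    """Mean squared one-step-ahead error of Holt's method on y."""
    # Assumed initialization: first observation and first difference.
    level, trend = y[0], y[1] - y[0]
    total = 0.0
    for t in range(1, len(y)):
        pred = level + trend
        total += (y[t] - pred) ** 2
        new_level = alpha * y[t] + (1 - alpha) * pred
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return total / (len(y) - 1)


def fit_holt_grid(y, steps=20):
    """Return the (alpha, beta) pair on a regular grid with the lowest MSE."""
    grid = [i / steps for i in range(1, steps + 1)]  # 0.05, 0.10, ..., 1.0
    candidates = ((a, b) for a in grid for b in grid)
    return min(candidates, key=lambda ab: one_step_mse(y, ab[0], ab[1]))


# Hypothetical noisy upward-trending series.
y = [3.0, 5.1, 6.8, 9.2, 11.1, 12.7, 15.3, 17.0, 18.9, 21.2]
alpha, beta = fit_holt_grid(y)
best = one_step_mse(y, alpha, beta)
print(alpha, beta, best)
```

A grid search is easy to read but scales poorly. A numerical optimizer does the same job with far fewer evaluations.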

89
00:07:02,550 --> 00:07:09,090
Finally, let's take a quick look at how the code will look at a high level. As before, we start by instantiating

90
00:07:09,090 --> 00:07:15,690
the model. Unlike scikit-learn and most modern machine learning APIs, this is where we pass in the data,

91
00:07:15,960 --> 00:07:18,150
which is a univariate time series.

92
00:07:18,660 --> 00:07:24,690
Unlike scikit-learn, since the data is univariate, it's OK if this data array is one-dimensional.

93
00:07:25,710 --> 00:07:31,600
Next, we call the fit function, which in this case doesn't take in any parameters. In scikit-learn,

94
00:07:31,800 --> 00:07:33,630
this is where the data would usually go.

95
00:07:34,560 --> 00:07:41,060
This returns a result object. We can use the result object to obtain the predicted values for the training set

96
00:07:41,190 --> 00:07:43,650
by accessing the attribute fittedvalues.

97
00:07:43,890 --> 00:07:50,310
And if we want to forecast, we can call results.forecast, passing in the number of time steps to forecast.