1
00:00:11,070 --> 00:00:16,050
In this lecture, we are going to bridge the gap between exponential smoothing and the Holt-Winters

2
00:00:16,050 --> 00:00:16,560
model.

3
00:00:17,130 --> 00:00:19,230
This is going to be done in a series of steps.

4
00:00:19,230 --> 00:00:24,350
And the first step is to understand exponential smoothing from a different perspective.

5
00:00:24,750 --> 00:00:27,960
That is, from the perspective of time series forecasting.

6
00:00:28,890 --> 00:00:34,140
Previously, our perspective was simply what are different ways that we can take the average in a time

7
00:00:34,140 --> 00:00:34,720
series.

8
00:00:35,100 --> 00:00:41,160
Now it's more about how do we fit a model to the time series and then how do we use that model to forecast

9
00:00:41,160 --> 00:00:42,130
future values?

10
00:00:42,630 --> 00:00:46,880
Mathematically, this is the same as before, but philosophically it's different.

11
00:00:51,920 --> 00:00:55,910
To do this, we're going to have to introduce some new and more precise notation.

12
00:00:56,630 --> 00:01:02,110
Let's begin by reviewing the exponentially weighted moving average once again, but using our new notation.

13
00:01:02,630 --> 00:01:09,200
So now we have y hat at time t equal to alpha times y of t, plus one minus alpha times

14
00:01:09,200 --> 00:01:12,620
y hat at time t minus one. In this form,

15
00:01:12,620 --> 00:01:13,370
y hat at time

16
00:01:13,370 --> 00:01:21,500
t is our exponentially smoothed version of y at time t, and y of t is the value from our time series at time

17
00:01:21,500 --> 00:01:24,860
t. Alpha is again the smoothing parameter.
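
To make that recursion concrete, here is a minimal Python sketch. The function name ewma and the choice to seed the recursion with the first observation are assumptions for illustration; they are not taken from the lecture.

```python
def ewma(y, alpha):
    """Exponentially weighted moving average of the sequence y.

    Assumption: the recursion is seeded with the first observation.
    """
    smoothed = [y[0]]
    for value in y[1:]:
        # y_hat[t] = alpha * y[t] + (1 - alpha) * y_hat[t - 1]
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


y = [10.0, 12.0, 11.0, 13.0]
y_hat = ewma(y, alpha=0.5)
print(y_hat)  # [10.0, 11.0, 11.0, 12.0]
```

Note that the left-hand side and the newest observation share the same time index here: the smoothed value at time t already uses y at time t.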

18
00:01:29,900 --> 00:01:36,060
The next step is to now phrase this as a forecasting model. In this form, the equation looks as follows.

19
00:01:36,680 --> 00:01:41,720
You'll notice that I've introduced this vertical bar symbol, which, as you know, in probability means

20
00:01:41,720 --> 00:01:48,620
"given". On the left-hand side, we're saying y hat is the exponentially smoothed forecast for time

21
00:01:48,620 --> 00:01:56,570
t plus one, given the known values at time t. On the right-hand side, we have alpha times y of t, as usual,

22
00:01:56,750 --> 00:02:00,710
plus one minus alpha times the previous exponentially smoothed value.

23
00:02:02,350 --> 00:02:08,310
Notice something interesting about this: the time indices have changed. For the current equation,

24
00:02:08,380 --> 00:02:15,640
the y hat on the left is for the time index t plus one, and the y hat on the right has the time index t.

25
00:02:17,160 --> 00:02:22,290
If you look at the previous slide, the time index t is on the left and t minus one is on the right.

26
00:02:23,010 --> 00:02:25,650
This is probably extremely confusing at this point.

27
00:02:25,650 --> 00:02:27,480
And I wouldn't blame you for thinking so.

28
00:02:27,930 --> 00:02:32,780
It will make much more sense when we look at the code and you actually observe this behavior for yourself.

29
00:02:33,480 --> 00:02:35,510
Now, realize that I didn't invent this.

30
00:02:35,520 --> 00:02:37,250
This is all part of the Holt-Winters model.

31
00:02:37,440 --> 00:02:39,420
So don't blame me if you find it confusing.

32
00:02:40,110 --> 00:02:44,820
By the time we reach the full Holt-Winters model and you see it in action, you will see that this

33
00:02:44,820 --> 00:02:45,720
all makes sense.

34
00:02:46,290 --> 00:02:48,840
Again, recognize that this is now a forecast.

35
00:02:49,260 --> 00:02:53,120
Previously we were not forecasting, we were just calculating averages.

36
00:02:53,640 --> 00:03:00,360
The forecast for the next time step is equal to alpha times the currently observed value, plus one minus

37
00:03:00,360 --> 00:03:04,050
alpha times the current exponentially smoothed fitted value.
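
The shift in time indices can be seen in a small Python sketch. The function name and the initial_forecast argument are hypothetical, chosen only for illustration. Each stored forecast is labeled with the time step it targets, not the time step of the last observation it used.

```python
def one_step_forecasts(y, alpha, initial_forecast):
    """Return one-step-ahead forecasts for the sequence y.

    forecasts[i] is the forecast for y[i] made before y[i] is observed.
    The final entry targets the step just after the data ends.
    Assumption: the very first forecast is supplied by the caller.
    """
    forecasts = [initial_forecast]
    for value in y:
        # forecast for t + 1, given everything known at time t
        forecasts.append(alpha * value + (1 - alpha) * forecasts[-1])
    return forecasts


y = [10.0, 12.0, 11.0, 13.0]
forecasts = one_step_forecasts(y, alpha=0.5, initial_forecast=10.0)
print(forecasts)  # [10.0, 10.0, 11.0, 11.0, 12.0]
```

The numbers are the same exponentially weighted averages as before; only the time label attached to each one has moved forward by one step.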

38
00:03:09,080 --> 00:03:14,420
The next step is to express the simple exponential smoothing model in what is called component form.

39
00:03:15,050 --> 00:03:19,170
This may seem like overkill at this point because there is only a single component.

40
00:03:19,760 --> 00:03:24,870
However, the Holt-Winters model contains multiple components, which is where this becomes useful.

41
00:03:25,460 --> 00:03:28,490
So in this form, the forecast equation looks as follows.

42
00:03:29,540 --> 00:03:37,700
We say that y hat at time t plus h, given t, is equal to l of t. That is, the forecast at an arbitrary

43
00:03:37,700 --> 00:03:41,650
h steps ahead is equal to l of t, whatever that is.

44
00:03:42,290 --> 00:03:49,370
Then the smoothing equation says this l of t is just the exponentially smoothed average of y of t, the

45
00:03:49,370 --> 00:03:50,280
time series.

46
00:03:50,930 --> 00:03:56,870
Notice that l now becomes the exponentially smoothed average and y hat becomes the forecast.

47
00:03:58,960 --> 00:04:05,770
Also, note that l gets back its original time indices from the previous lecture. That is, we now

48
00:04:05,770 --> 00:04:08,200
have t on the left and t minus one on the right.

49
00:04:08,950 --> 00:04:15,770
So l of t is itself not a forecast, but the forecast is based on l of t. For this simple model,

50
00:04:16,000 --> 00:04:18,250
the forecast is simply assigned to be l of t.

51
00:04:19,000 --> 00:04:21,350
But this will get more complex in later lectures.

52
00:04:22,210 --> 00:04:24,130
So what does l of t represent?

53
00:04:24,820 --> 00:04:29,760
l of t is called the level, which will become clearer when we put it into context later on.

54
00:04:30,520 --> 00:04:34,890
Basically, the level can be thought of as the moving average, which kind of makes sense.

55
00:04:35,350 --> 00:04:41,470
The level is the average value of the signal in time, but the actual signal may fluctuate around that

56
00:04:41,470 --> 00:04:42,280
average level.

57
00:04:43,760 --> 00:04:48,170
One important thing to note about this forecasting model is that it's not very expressive.

58
00:04:48,620 --> 00:04:50,940
The forecast is simply a constant value.

59
00:04:50,960 --> 00:04:54,680
It's always l of t, no matter how many steps ahead you want to forecast.

60
00:04:55,310 --> 00:04:59,960
This must be the case because remember, the exponentially weighted moving average is nothing but an

61
00:04:59,960 --> 00:05:05,450
estimate of the mean. Since all we're estimating is the mean, that's the only thing we can predict.

62
00:05:05,990 --> 00:05:07,750
After we've stopped collecting data,

63
00:05:07,970 --> 00:05:12,150
we have no idea how this will change beyond the last known data point.

64
00:05:12,620 --> 00:05:18,020
And so our forecast simply consists of predicting the same mean value for each timestep.
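
Here is a rough Python sketch of the component form, with the smoothing equation and the forecast equation kept as separate functions. The names update_level and forecast, and the initial_level argument, are hypothetical and used only for illustration.

```python
def update_level(y, alpha, initial_level):
    """Smoothing equation: l[t] = alpha * y[t] + (1 - alpha) * l[t - 1].

    Assumption: the starting level is supplied by the caller.
    Returns the final level after the last observation.
    """
    level = initial_level
    for value in y:
        level = alpha * value + (1 - alpha) * level
    return level


def forecast(level, steps):
    """Forecast equation: y_hat[t + h | t] = l[t] for every horizon h."""
    return [level] * steps


y = [10.0, 12.0, 11.0, 13.0]
last_level = update_level(y, alpha=0.5, initial_level=10.0)
print(forecast(last_level, steps=3))  # [12.0, 12.0, 12.0]
```

The forecast is flat: the horizon h never enters the calculation, because the level is the only component this model has.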

65
00:05:22,920 --> 00:05:27,110
The last thing we are going to do in this lecture is talk about how this will work in code.

66
00:05:27,870 --> 00:05:32,420
Previously we looked at pandas as a way to calculate the exponentially weighted moving average.

67
00:05:32,730 --> 00:05:38,460
And we've noted that this lecture didn't introduce any new concepts other than some renaming of variables.

68
00:05:39,000 --> 00:05:44,820
However, in the next coding lecture, we're going to use statsmodels to perform exponential smoothing,

69
00:05:45,060 --> 00:05:48,870
which has a model that works more in line with what we discussed in this lecture.

70
00:05:49,710 --> 00:05:55,170
Furthermore, it's capable of making forecasts so that will allow us to use exponential smoothing as

71
00:05:55,170 --> 00:05:58,800
a predictive model rather than just some way of taking the average.

72
00:06:00,490 --> 00:06:06,220
So how does it work? Since this is the first model from statsmodels we'll be looking at in this course,

73
00:06:06,430 --> 00:06:10,110
how it works might be surprising if you come from a scikit-learn background.

74
00:06:10,810 --> 00:06:14,930
Basically, remember that this is a different library, so it has a different API.

75
00:06:15,790 --> 00:06:22,210
First, we start by importing the class SimpleExpSmoothing from the holtwinters module inside

76
00:06:22,210 --> 00:06:22,810
statsmodels.

77
00:06:23,620 --> 00:06:29,110
Next, we create a model by instantiating an object of type SimpleExpSmoothing.

78
00:06:30,130 --> 00:06:36,220
Note that when doing this, we pass in the data set, which should be a univariate time series. Unlike

79
00:06:36,220 --> 00:06:39,430
scikit-learn, which expects a two-dimensional array as input,

80
00:06:39,730 --> 00:06:45,000
this model expects a one-dimensional array because the only dimension is time. In

81
00:06:45,070 --> 00:06:48,340
scikit-learn, it's number of samples by number of features.

82
00:06:49,250 --> 00:06:55,040
Also, notice that, unlike scikit-learn, where the data is passed into the fit function, with

83
00:06:55,040 --> 00:06:57,650
statsmodels, the data is passed into the constructor.

84
00:06:58,370 --> 00:07:03,560
The statsmodels object does have a fit function, which is what we will call next. Its fit function accepts

85
00:07:03,570 --> 00:07:05,220
some parameters, such as alpha.

86
00:07:05,810 --> 00:07:10,340
So it's kind of like the reverse of scikit-learn. In scikit-learn, the constructor

87
00:07:10,340 --> 00:07:15,680
usually takes in the hyperparameters and the fit function takes in the data. In statsmodels,

88
00:07:15,680 --> 00:07:21,650
the constructor takes in the data and the fit function takes in the hyperparameters. For us,

89
00:07:21,650 --> 00:07:27,270
we're going to pass alpha into the fit function, in addition to setting the argument optimized equal to

90
00:07:27,320 --> 00:07:27,920
False.

91
00:07:28,700 --> 00:07:34,010
Note that for all of the models in the holtwinters module, it's possible to fit parameters to the

92
00:07:34,010 --> 00:07:34,460
data.

93
00:07:35,060 --> 00:07:38,780
This makes it more like machine learning than simply calculating a moving average.

94
00:07:38,990 --> 00:07:43,180
But this will make more sense as our model gets more and more complex in later lectures.

95
00:07:44,090 --> 00:07:49,010
In order to make what we're doing equivalent to what we did previously, we will not try to optimize

96
00:07:49,010 --> 00:07:49,970
the value of Alpha.

97
00:07:50,660 --> 00:07:55,040
Basically, Alpha would be optimized in order to minimize the error on the training set.

98
00:07:55,610 --> 00:08:00,350
If you're familiar with machine learning, then you will understand that this involves taking the mean

99
00:08:00,350 --> 00:08:05,270
squared error between the true time series and the predicted values and then minimizing that with respect

100
00:08:05,270 --> 00:08:05,920
to Alpha.

101
00:08:06,470 --> 00:08:09,830
If you're not comfortable with this idea, it's not necessary to know.

102
00:08:09,830 --> 00:08:12,620
So don't worry about it since we're not going to use it anyway.
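
For the curious, here is a small Python sketch of what minimizing the training error with respect to alpha means. It swaps in a plain grid search for illustration; this is not how statsmodels performs the optimization. The function name one_step_mse and the choice to seed the forecast with the first observation are assumptions.

```python
def one_step_mse(y, alpha):
    """Mean squared error of one-step-ahead forecasts on the training data.

    Assumption: the forecast for the second point is the first observation.
    """
    forecast = y[0]
    squared_errors = []
    for value in y[1:]:
        squared_errors.append((value - forecast) ** 2)
        # update the forecast so that it targets the next time step
        forecast = alpha * value + (1 - alpha) * forecast
    return sum(squared_errors) / len(squared_errors)


# A steadily increasing series: smoothing makes the forecast lag behind,
# so larger values of alpha should give a smaller error here.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
alphas = [i / 100 for i in range(1, 100)]
best_alpha = min(alphas, key=lambda a: one_step_mse(y, a))
print(best_alpha)  # 0.99
```

On a trending series like this one, the search pushes alpha toward its upper limit, which hints at why a level-only model is not enough once trend enters the picture.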

103
00:08:15,180 --> 00:08:20,490
All right, so the fit function returns a result object, which is again different from what you're

104
00:08:20,490 --> 00:08:26,190
probably familiar with from scikit-learn. In scikit-learn, you just get a reference to the model object

105
00:08:26,190 --> 00:08:26,820
itself.

106
00:08:27,690 --> 00:08:29,900
In statsmodels, we get a result object.

107
00:08:30,390 --> 00:08:36,000
Specifically, we get back a HoltWintersResults object, so you can look up the documentation if you're

108
00:08:36,000 --> 00:08:39,750
interested in seeing what functions can be called on this object.

109
00:08:41,460 --> 00:08:47,490
The first function we are interested in is the predict function. The way it works is this: you pass

110
00:08:47,490 --> 00:08:51,060
in a start date for the start argument and an end date for the end argument.

111
00:08:51,270 --> 00:08:57,380
And then this will return an array containing a prediction of time series values for those dates.

112
00:08:58,790 --> 00:09:03,800
Note that these dates can be in-sample or out-of-sample, meaning that they can be dates from the training

113
00:09:03,800 --> 00:09:09,550
set or after the training set. If the prediction dates are from inside the range of the train set,

114
00:09:09,740 --> 00:09:11,680
we call that an in-sample prediction.

115
00:09:12,230 --> 00:09:16,850
If the prediction dates are from outside the range of the train set, we call that an out-of-sample

116
00:09:16,850 --> 00:09:17,660
forecast.

117
00:09:22,370 --> 00:09:27,770
A simpler way to get predictions is this: if you want all of the train predictions, you simply access

118
00:09:27,770 --> 00:09:29,780
the attribute fittedvalues.

119
00:09:30,320 --> 00:09:35,170
This way you can avoid having to figure out the dates that you need for the start and end arguments.

120
00:09:35,780 --> 00:09:39,760
A simpler way to get a forecast is to simply call the forecast function.

121
00:09:40,430 --> 00:09:43,320
This accepts as input the number of forecasting steps.

122
00:09:43,340 --> 00:09:48,080
So, for example, if you want to forecast ten steps ahead, you just pass in 10.

123
00:09:48,470 --> 00:09:50,600
There's no need to figure out what the date is

124
00:09:50,600 --> 00:09:51,470
10 steps ahead.

125
00:09:51,650 --> 00:09:53,840
Passing in the integer 10 is enough.

126
00:09:54,980 --> 00:10:00,350
Note that this assumes that your forecast begins at the timestep after the end of the train set.

127
00:10:00,860 --> 00:10:06,380
So if your train set goes from one up to big T, then your first forecast prediction will be for the

128
00:10:06,380 --> 00:10:08,270
time index big T plus one.
