1
00:00:10,260 --> 00:00:16,850
OK, so in this lecture, we are going to implement simple exponential smoothing in code. As you recall,

2
00:00:16,860 --> 00:00:20,800
this is the same operation as the exponentially weighted moving average.

3
00:00:21,210 --> 00:00:23,270
The main difference is philosophical.

4
00:00:23,640 --> 00:00:29,120
We are now treating this as a forecasting model rather than just a method of calculating a moving average.

5
00:00:29,580 --> 00:00:33,960
In addition, instead of using pandas, we'll be using the statsmodels library.

6
00:00:35,560 --> 00:00:39,260
So the first thing we have to do in this video is update statsmodels.

7
00:00:39,730 --> 00:00:45,100
This is because the current version of statsmodels on Colab is not the same as the one you might install

8
00:00:45,100 --> 00:00:45,730
yourself.

9
00:00:46,240 --> 00:00:51,130
So in order to try and stay up to date with the latest API, we're going to make sure that we run this

10
00:00:51,130 --> 00:00:51,800
command.

11
00:00:52,600 --> 00:00:58,690
Now, of course, when (and notice I'm saying when, not if) this API changes, you'll have to visit the

12
00:00:58,690 --> 00:01:00,970
documentation to see what has changed.

13
00:01:01,480 --> 00:01:05,550
Remember that with libraries like these, this could happen next week or next month.

14
00:01:05,770 --> 00:01:06,970
So be prepared.

15
00:01:07,330 --> 00:01:09,640
This is all part of being a good data scientist.

16
00:01:12,780 --> 00:01:17,160
OK, so the next step is to import the simple exponential smoothing class.

17
00:01:22,870 --> 00:01:28,390
Next, we're going to create an instance of our new class, passing our passengers data into the

18
00:01:28,390 --> 00:01:31,540
constructor. We'll call this model ses.

19
00:01:38,960 --> 00:01:44,650
Notice that we get a warning which says no frequency information was provided, so inferred frequency

20
00:01:44,650 --> 00:01:45,910
MS will be used.

21
00:01:46,700 --> 00:01:52,190
The reason for this is that when we loaded in our data frame, pandas didn't automatically assign a frequency

22
00:01:52,400 --> 00:01:53,460
to the index.

23
00:01:54,500 --> 00:02:01,130
Also, notice this second warning, which says after 0.13, initialization must be handled at model

24
00:02:01,130 --> 00:02:01,820
creation.

25
00:02:02,900 --> 00:02:09,140
So basically this controls how you determine the first value of the moving average. In previous versions

26
00:02:09,140 --> 00:02:10,010
of statsmodels,

27
00:02:10,190 --> 00:02:11,660
this was chosen automatically.

28
00:02:15,650 --> 00:02:19,390
In the next block, I'm going to print out the index of our data frame.

29
00:02:22,700 --> 00:02:24,860
And we can see that the frequency is None.

30
00:02:29,360 --> 00:02:31,330
Luckily, we can fix this pretty easily.

31
00:02:31,400 --> 00:02:35,760
We can just assign df.index.freq to the string 'MS'.

32
00:02:36,260 --> 00:02:37,460
MS means month start.

33
00:02:37,640 --> 00:02:42,260
And if you want to know how I figured this out, you can check the link I have left above, which lists

34
00:02:42,260 --> 00:02:47,750
out all the different possible frequency strings that you can assign to date range indices.

35
00:02:52,900 --> 00:02:58,870
Next, I'm going to create another instance of our SimpleExpSmoothing class with our newly updated data

36
00:02:58,870 --> 00:02:59,380
frame.

37
00:02:59,860 --> 00:03:03,800
I'm also going to set initialization_method to 'legacy-heuristic'.

38
00:03:04,240 --> 00:03:08,110
This will make it so that it'll work just like it did in the old versions.

39
00:03:11,200 --> 00:03:14,230
As you can see, we no longer get the above warnings.

40
00:03:16,810 --> 00:03:22,570
The next step is to call the fit function on our model. As I mentioned in the theory lecture, we are

41
00:03:22,570 --> 00:03:26,890
going to use a fixed alpha, and we're going to set optimized equal to False.

42
00:03:27,550 --> 00:03:32,470
This is just so that it does the same calculation we did previously, which allows us to compare the

43
00:03:32,470 --> 00:03:33,210
outputs.

44
00:03:33,940 --> 00:03:36,790
So let's assign the return value to res.

45
00:03:40,030 --> 00:03:43,630
Next, I'm going to print out res so we know what we are working with.

46
00:03:46,450 --> 00:03:52,030
As you can see, it's a HoltWintersResultsWrapper, so if you want to look up the documentation

47
00:03:52,030 --> 00:03:53,680
for this class now, you can.

48
00:03:56,540 --> 00:04:02,510
The next step is to call the predict function on our results object. Since I want the predictions for

49
00:04:02,510 --> 00:04:03,930
our entire dataset,

50
00:04:04,160 --> 00:04:10,820
I'm going to pass in a start date equal to the first row index of our data frame and an end date equal

51
00:04:10,820 --> 00:04:13,250
to the last row index of our data frame.

52
00:04:16,100 --> 00:04:19,280
As you can see, this returns a pandas Series.

53
00:04:23,310 --> 00:04:28,530
In the next block, I'm going to call the same function again, but I'm going to assign it to a new

54
00:04:28,530 --> 00:04:31,050
column in our data frame called SES.

55
00:04:35,440 --> 00:04:41,770
Next, I want to demonstrate that the predict function, since we called it for only in-sample dates,

56
00:04:42,040 --> 00:04:45,740
actually returns the same thing as the fittedvalues attribute.

57
00:04:46,240 --> 00:04:51,490
So in the next block, I use the function np.allclose, passing in these two series.

58
00:04:53,840 --> 00:04:59,240
Since the result is True, that means the predict function returns the same values as the

59
00:04:59,240 --> 00:05:00,210
fittedvalues attribute.

60
00:05:02,550 --> 00:05:04,680
Next, I'm going to plot our data frame.

61
00:05:08,940 --> 00:05:13,410
Note that this is the entire data frame, including everything we put in there previously.

62
00:05:14,670 --> 00:05:17,270
Now, right away, you should notice something strange.

63
00:05:17,700 --> 00:05:21,150
The results from our SES model are different from pandas.

64
00:05:21,430 --> 00:05:23,280
They seem to be shifted up by one.

65
00:05:23,490 --> 00:05:24,360
Why is this?

66
00:05:27,350 --> 00:05:30,440
Let's do a df.head() to get an idea of what's happening.

67
00:05:36,480 --> 00:05:43,530
As you can see, for both EWMA and SES, the first value is 112, which is just the first value of our

68
00:05:43,530 --> 00:05:49,110
time series. For EWMA, the next value is 113.2.

69
00:05:49,410 --> 00:05:52,970
But for SES, the value 112 gets repeated again.

70
00:05:53,550 --> 00:05:56,310
After that, we have 113.2.

71
00:05:56,910 --> 00:05:58,230
So what's going on here?

72
00:06:01,890 --> 00:06:07,170
Let's try to shift the SES column back by one to check if the rest of the values line up.

73
00:06:10,730 --> 00:06:15,080
In the next block, we're going to plot EWMA against SES-1.

74
00:06:21,460 --> 00:06:25,270
As you can see, the values do indeed line up as suspected.

75
00:06:25,750 --> 00:06:31,570
However, you'll notice that I have this comment in here in which I tell you very loudly, no, do not

76
00:06:31,570 --> 00:06:32,260
do this.

77
00:06:32,500 --> 00:06:33,980
In fact, this is wrong.

78
00:06:34,390 --> 00:06:39,700
Now, I won't name any names, but I've seen a few other people who do this who happen to be very popular

79
00:06:39,700 --> 00:06:40,810
among beginners.

80
00:06:41,200 --> 00:06:42,610
So why is this wrong?

81
00:06:43,240 --> 00:06:48,440
Recall that the forecasting model is defined slightly differently from the traditional EWMA.

82
00:06:49,300 --> 00:06:55,420
As you recall from the theory lecture, the forecasting time index is actually moved up by one step.

83
00:06:55,900 --> 00:06:59,020
The EWMA is represented by the level.

84
00:06:59,260 --> 00:07:04,380
But the prediction y-hat is actually assigned the level at the previous time step.

85
00:07:04,870 --> 00:07:09,620
In other words, the SES model should be lagging behind by one time step.

86
00:07:09,820 --> 00:07:12,670
So what we had plotted originally was correct.

87
00:07:13,240 --> 00:07:15,580
You'll see in our later examples why this is true.

88
00:07:16,540 --> 00:07:20,170
Remember that if we shift this, then we have to shift everything.

89
00:07:20,560 --> 00:07:25,390
But you'll see that with Holt-Winters, the full model, there is no need for shifting because it's

90
00:07:25,390 --> 00:07:28,480
actually the correct model and it will match up very nicely.

91
00:07:28,930 --> 00:07:34,180
If we shift this, then we have to shift that to be consistent and you'll see that would be completely

92
00:07:34,180 --> 00:07:34,630
wrong.

93
00:07:39,380 --> 00:07:43,370
All right, so the next thing we're going to do in this lecture is to treat this more like a machine

94
00:07:43,370 --> 00:07:48,980
learning problem. That is, we're going to split up our dataset into train and test, and we're actually

95
00:07:48,980 --> 00:07:50,340
going to do a forecast.

96
00:07:50,870 --> 00:07:57,140
So I've set N_test to 12, and I've set the train set to be everything up to just before the last 12

97
00:07:57,140 --> 00:07:57,910
data points.

98
00:07:58,320 --> 00:08:02,410
I've therefore set the test set to be the last 12 data points.

99
00:08:07,900 --> 00:08:13,700
Next, I'm going to recreate the SimpleExpSmoothing object, but this time with only the train set.

100
00:08:14,230 --> 00:08:19,600
Again, I'm going to give this the variable name ses. Next, I call the fit function.

101
00:08:20,200 --> 00:08:24,990
Notice how this time I'm not going to pass in alpha or set optimized equal to False.

102
00:08:25,480 --> 00:08:31,480
What this function will now do is find the best alpha so that it minimizes the squared error over the

103
00:08:31,480 --> 00:08:32,230
train set.

104
00:08:37,020 --> 00:08:41,010
The next step is to create indices which we can use to index our data frame.

105
00:08:41,490 --> 00:08:46,650
We need this if we want to index only the train points or only the test points one at a time.

106
00:08:47,460 --> 00:08:51,020
So to do this, we're basically going to create a 1-D array of booleans.

107
00:08:51,390 --> 00:08:55,770
It's going to be True for the rows which correspond to the correct dataset, whether that's train

108
00:08:55,770 --> 00:08:56,440
or test.

109
00:08:57,390 --> 00:09:03,000
So for the train set, we do df.index less than or equal to train.index[-1].

110
00:09:03,540 --> 00:09:09,120
That is to say, any row where the index is less than or equal to the last training point belongs to

111
00:09:09,120 --> 00:09:09,450
the train set.

112
00:09:09,480 --> 00:09:14,700
Of course, this makes sense since the dataset has been split in chronological order.

113
00:09:15,450 --> 00:09:18,610
And using that, we know that the test index is just the opposite.

114
00:09:19,020 --> 00:09:21,870
So we use greater than instead of less than or equal to.

115
00:09:26,700 --> 00:09:33,270
Next, I'm going to assign the predictions from our model back to the original data frame, df. To do

116
00:09:33,270 --> 00:09:33,570
this,

117
00:09:33,600 --> 00:09:39,030
I'm going to add res.fittedvalues to the data frame, indexing the rows by train_idx.

118
00:09:39,480 --> 00:09:41,820
This will be assigned to the column SESfitted.

119
00:09:42,960 --> 00:09:46,860
Next, I'm going to call res.forecast for N_test time steps.

120
00:09:47,130 --> 00:09:52,170
I'm going to assign this to the data frame under the same column in the final N_test rows.

121
00:09:53,160 --> 00:09:57,420
Next, I'm going to plot the Passengers and SESfitted columns.

122
00:10:02,180 --> 00:10:07,230
As you can see, the forecast is indeed a horizontal straight line, as promised.

123
00:10:07,790 --> 00:10:09,620
Note that again, for the train set,

124
00:10:09,800 --> 00:10:13,440
the prediction looks like it's lagging behind by one step.

125
00:10:13,940 --> 00:10:18,220
You, again, may be tempted to shift this back so that they line up perfectly.

126
00:10:18,530 --> 00:10:21,190
But remember, this is not the correct thing to do.

127
00:10:22,590 --> 00:10:27,780
A simple way to remember why is that you know the model stores the dates, so we

128
00:10:27,780 --> 00:10:30,510
have to presume that the model lines up those dates correctly.

129
00:10:31,440 --> 00:10:35,500
Now, if you did shift it back, philosophically, think about what that means.

130
00:10:35,850 --> 00:10:41,700
That means this simple exponential smoothing model makes nearly perfect predictions, which means

131
00:10:41,700 --> 00:10:43,870
that there is no further work to be done.

132
00:10:44,340 --> 00:10:45,790
We don't have to model the trend.

133
00:10:45,850 --> 00:10:47,760
We don't have to model the seasonal component.

134
00:10:48,210 --> 00:10:50,240
And of course, that would not make any sense.

135
00:10:52,500 --> 00:10:58,170
Furthermore, looking at this plot, we can try to guess what the value of alpha might be. When our alpha

136
00:10:58,170 --> 00:11:04,320
was 0.2, the one we set manually, the EWMA looked very smooth and didn't track the original

137
00:11:04,320 --> 00:11:05,610
signal that fast.

138
00:11:06,210 --> 00:11:12,180
On the other hand, for this fitted alpha, we can see that the model is reacting very quickly to the

139
00:11:12,180 --> 00:11:17,910
original time series, which would indicate that it cares more about the most recent sample and less

140
00:11:17,910 --> 00:11:19,580
about the previous average.

141
00:11:20,070 --> 00:11:25,620
Therefore, we might guess that this alpha is very close to one rather than being closer to zero.

142
00:11:29,000 --> 00:11:34,340
In the next block of code, we check what the value of alpha is by printing res.params.

143
00:11:38,890 --> 00:11:44,260
As you can see, the result is, in fact, very nearly one. That means our fitted model is simply

144
00:11:44,260 --> 00:11:50,110
doing nearly the naive forecast; in other words, it simply copies the last known value in the series.