1
00:00:11,050 --> 00:00:15,970
In this lecture, we are going to apply the Holt-Winters exponential smoothing model in code.

2
00:00:16,510 --> 00:00:22,120
This is our final model for this group of models, the culmination of our study of exponential smoothing.

3
00:00:23,230 --> 00:00:28,420
This lecture is going to walk you through a prepared Colab notebook, although a very good exercise,

4
00:00:28,420 --> 00:00:34,090
which I always recommend is once you know how this is done, to try and recreate it yourself with as

5
00:00:34,090 --> 00:00:35,660
few references as possible.

6
00:00:36,190 --> 00:00:41,890
As always, you can check the lectures "How to Code by Yourself" and "How to Practice" for a more in-depth

7
00:00:41,890 --> 00:00:42,660
discussion.

8
00:00:43,120 --> 00:00:48,460
If there's anything in this lecture you didn't understand or you think I missed a step or didn't explain

9
00:00:48,460 --> 00:00:51,690
why we were doing something, please use the Q&A to inquire.

10
00:00:52,300 --> 00:00:57,310
As usual, you can look at the title of the notebook to determine what notebook we are currently looking

11
00:00:57,310 --> 00:00:57,640
at.

12
00:00:58,990 --> 00:01:01,930
Let's start by importing the exponential smoothing class.

13
00:01:05,460 --> 00:01:07,960
Next, we're going to create an instance of our model.

14
00:01:08,610 --> 00:01:13,290
This time we're not going to bother fitting a model to the entire data set, but rather we're going

15
00:01:13,290 --> 00:01:15,330
to skip right away to the train-test split.

16
00:01:16,910 --> 00:01:22,880
So the first variation we're going to try is the additive trend and additive seasonality. We set the

17
00:01:22,880 --> 00:01:28,580
seasonal periods to 12, since we know that the data cycles yearly and the frequency of the data

18
00:01:28,580 --> 00:01:29,390
is monthly.

19
00:01:30,290 --> 00:01:32,030
Next, we call the fit function.

20
00:01:33,530 --> 00:01:34,550
So let's run this.

21
00:01:38,570 --> 00:01:43,940
Next, we assign the fitted values to the Holt-Winters column of our original data frame for the train

22
00:01:43,940 --> 00:01:44,540
rows.

23
00:01:47,790 --> 00:01:53,250
Next, we call the forecast function for N_test steps and assign the results to our original data

24
00:01:53,250 --> 00:01:55,080
frame for the test rows.

25
00:01:58,800 --> 00:02:02,550
Next, we plot the Holt-Winters column along with the original data set.

26
00:02:08,660 --> 00:02:14,840
So this is very encouraging. Unlike the simple exponential smoothing model and the Holt linear trend

27
00:02:14,840 --> 00:02:15,240
model,

28
00:02:15,560 --> 00:02:22,100
this model fits very well for both train and test. Notice, importantly, that the predictions are

29
00:02:22,100 --> 00:02:24,320
not lagging behind the time series.

30
00:02:24,710 --> 00:02:29,810
So hopefully now you are convinced that it is not correct to shift it backwards by one step.

31
00:02:30,020 --> 00:02:34,750
And in fact, the lagging is only due to model misspecification.

32
00:02:35,300 --> 00:02:37,100
Remember that you have to be consistent.

33
00:02:37,400 --> 00:02:40,850
If you shift back for one model, you have to shift back for all of them.

34
00:02:41,330 --> 00:02:46,220
And for this model, that would be very silly because then the predictions would be ahead of the data,

35
00:02:46,340 --> 00:02:47,750
which doesn't make any sense.

36
00:02:52,150 --> 00:02:57,760
Now that we know our model fits decently well, it would be a good idea to calculate some metrics.

37
00:02:58,270 --> 00:03:03,450
Although there are many metrics which are used in time series analysis, the root mean squared error,

38
00:03:03,490 --> 00:03:04,870
will be just fine for us.

39
00:03:05,440 --> 00:03:10,570
This metric makes more sense, in my opinion, than something like the absolute error, because these

40
00:03:10,570 --> 00:03:13,370
models actually minimize the squared error directly.

41
00:03:13,900 --> 00:03:18,700
Therefore, it would make less sense to check the absolute error because we didn't optimize for that

42
00:03:18,700 --> 00:03:19,820
in the first place.

43
00:03:20,500 --> 00:03:24,340
If you're going to check the absolute error, then you may as well make the absolute error

44
00:03:24,340 --> 00:03:26,890
your loss function. The root mean squared

45
00:03:26,890 --> 00:03:32,230
error is just the square root of the squared error, which puts it on the same scale as the original

46
00:03:32,230 --> 00:03:32,690
data.

47
00:03:33,130 --> 00:03:38,690
So minimizing the mean squared error is equivalent to minimizing the root mean squared error.

48
00:03:39,370 --> 00:03:40,710
So here's how we calculate it.

49
00:03:41,230 --> 00:03:46,930
We take y, the predictions, and t, the targets, although it doesn't really matter which is which, and

50
00:03:46,930 --> 00:03:48,120
then we subtract them.

51
00:03:48,730 --> 00:03:50,790
This does an element-wise subtraction.

52
00:03:51,490 --> 00:03:53,050
Next we square the result.

53
00:03:53,530 --> 00:03:55,850
This is also an element-wise operation.

54
00:03:57,310 --> 00:03:58,580
Next we call the mean.

55
00:03:58,600 --> 00:04:00,760
So now we have the mean squared error.

56
00:04:01,510 --> 00:04:05,530
Finally, we take the square root, so we get the root mean squared error, or RMSE.

57
00:04:08,050 --> 00:04:12,540
Now, just out of curiosity, I would also like to calculate the mean absolute error.

58
00:04:12,970 --> 00:04:14,730
This calculation is a little simpler.

59
00:04:15,220 --> 00:04:20,880
We just take the difference between y and t, take the absolute value, and then take the mean, hence

60
00:04:20,890 --> 00:04:22,480
mean absolute error.
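
The two metric calculations just described might look like this in NumPy. The function names `rmse` and `mae` and the small example arrays are assumptions for illustration; they are not taken from the notebook.

```python
import numpy as np

def rmse(y, t):
    # Element-wise difference, element-wise square, mean, then square root.
    return np.sqrt(np.mean((y - t) ** 2))

def mae(y, t):
    # Element-wise difference, absolute value, then mean.
    return np.mean(np.abs(y - t))

# Tiny made-up example: y plays the role of predictions, t of targets.
y = np.array([1.0, 2.0, 3.0])
t = np.array([1.0, 4.0, 5.0])

print(rmse(y, t), rmse(t, y))  # swapping the arguments gives the same value
print(mae(y, t), mae(t, y))
```

Because the difference is squared or passed through an absolute value, the order of the two arguments does not affect either result.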

61
00:04:27,850 --> 00:04:34,060
Next, we check the RMSE on both the train and test set. As you recall, the train predictions are stored

62
00:04:34,060 --> 00:04:39,010
in the fitted values attribute and the test predictions are made using the forecast function.

63
00:04:40,060 --> 00:04:44,740
As you can see, I've already switched around which is the prediction and which is the target, since

64
00:04:44,740 --> 00:04:49,120
it doesn't really matter which is y and which is t. You will get the same answer either way.

65
00:04:53,790 --> 00:04:59,070
All right, so the train RMSE is about 11 something and the test RMSE is about 17.

66
00:05:04,430 --> 00:05:10,010
When we check the mean absolute error, we get about nine something for train and 13 something for test.

67
00:05:13,580 --> 00:05:17,960
Next, we're going to try the Holt-Winters model again, but with different parameters.

68
00:05:18,590 --> 00:05:24,220
Recall that it seems that the amplitude of the cycle increases with the level of the time series.

69
00:05:24,770 --> 00:05:28,310
This suggests that a multiplicative model may fit better.

70
00:05:28,910 --> 00:05:34,140
So for this model, we're going to have an additive trend and a multiplicative seasonality.

71
00:05:34,730 --> 00:05:40,040
Again, we call fit, assign the train predictions and the test predictions to the data frame, and then

72
00:05:40,040 --> 00:05:40,870
plot the result.

73
00:05:42,390 --> 00:05:47,540
So we're just doing it all in one step, since you already know how each of these steps works.

74
00:05:53,370 --> 00:05:59,200
So from my perspective, at least, this model appears to fit better than the purely additive model.

75
00:05:59,670 --> 00:06:01,740
But let's check our accuracy metrics.

76
00:06:08,750 --> 00:06:13,850
So when we check the RMSE, we get about nine something for the train set and about 15 something

77
00:06:13,850 --> 00:06:14,420
for the test set.

78
00:06:15,230 --> 00:06:20,290
This gives us a better train error and a better test error compared to the purely additive model.

79
00:06:20,960 --> 00:06:22,940
Therefore, we would prefer this model.

80
00:06:27,720 --> 00:06:33,780
If we check the mean absolute error, we see the same thing: both the train error and the test error have improved.

81
00:06:37,880 --> 00:06:43,910
Next, let's try Holt-Winters again, but with a multiplicative trend and multiplicative seasonality.

82
00:06:44,480 --> 00:06:49,080
This kind of makes sense because the trend in the time series is not exactly a straight line.

83
00:06:49,640 --> 00:06:52,560
The growth actually seems to be accelerating over time.

84
00:06:53,180 --> 00:06:57,550
Again, we call fit, assign the predictions to our data frame, and plot the result.

85
00:07:03,620 --> 00:07:09,080
It's difficult to tell whether this is a better or worse fit, so let's check our accuracy metrics.

86
00:07:14,010 --> 00:07:19,770
All right, so here we see an interesting pattern: the train RMSE is about nine something and the test

87
00:07:19,770 --> 00:07:22,300
RMSE is about twenty-five something.

88
00:07:22,770 --> 00:07:26,510
So the train error is better, but the test error is much worse.

89
00:07:30,590 --> 00:07:35,930
We see the same thing for the mean absolute error: the train error has gone down, but the test error

90
00:07:35,930 --> 00:07:38,150
has gone up to about 20 something.

91
00:07:38,750 --> 00:07:44,540
Therefore, while this time series may look like a multiplicative model is a better fit, it actually

92
00:07:44,540 --> 00:07:47,810
overfits to the training data and we get a worse result.
