1
00:00:11,030 --> 00:00:16,310
OK, so in this lecture, we are going to look at how to implement walk-forward validation in code.

2
00:00:17,060 --> 00:00:20,510
Let's start by downloading our data set, which will be airline passengers.

3
00:00:26,400 --> 00:00:29,940
The next step is to update statsmodels, so we have the latest API.

4
00:00:35,510 --> 00:00:38,460
The next step is to import the modules we need for this script.

5
00:00:38,930 --> 00:00:41,480
So basically the only new thing is itertools.

6
00:00:41,990 --> 00:00:46,580
This is useful for looping through all possible combinations of the options we want to try.

7
00:00:51,900 --> 00:00:55,950
The next step is to load in our CSV using pd.read_csv.

8
00:00:59,420 --> 00:01:03,320
The next step is to set the frequency of our data frame index to MS (month start).
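
A minimal pandas sketch of these two steps. The file name and column layout are assumptions (a CSV with `Month` and `Passengers` columns, months written as `1949-01`), so instead of the downloaded file the sketch reads a small inline sample holding the first six months of the airline passengers series.

```python
import io

import pandas as pd

# Assumed layout of the airline passengers CSV; only the first six rows are
# inlined here so the sketch runs without the downloaded file.
csv_text = """Month,Passengers
1949-01,112
1949-02,118
1949-03,132
1949-04,129
1949-05,121
1949-06,135
"""

# Use the Month column as the index and parse it into timestamps.
df = pd.read_csv(io.StringIO(csv_text), index_col="Month", parse_dates=True)

# Monthly data indexed at the start of each month, so the frequency is "MS".
df.index.freq = "MS"
print(df.index.freqstr)
```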

9
00:01:06,930 --> 00:01:12,120
The next step is to check the size of our data frame. This is useful to make sure we don't go too

10
00:01:12,120 --> 00:01:16,610
far back when we create our first train set, since we want our train set to have enough data.

11
00:01:21,010 --> 00:01:26,770
The next step is to set a few parameters for our test. We set the forecast horizon to be 12 and the

12
00:01:26,770 --> 00:01:32,800
number of walk-forward steps to be 10. The effective validation period, which I've called Ntest,

13
00:01:33,010 --> 00:01:37,960
is the length of the whole data frame, minus h, minus the number of steps, plus one.

14
00:01:38,560 --> 00:01:41,670
You might want to draw this out on paper to make sure it makes sense.
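
Drawing it out can also be done in a few lines of Python. This sketch assumes the data frame has 144 rows (the classic monthly airline series, 1949 to 1960) and that each round trains on rows `[0, end_of_train)` and tests on the next `h` rows; `end_of_train` and `windows` are hypothetical names.

```python
# Assumption: 144 monthly rows, as in the classic airline passengers data set.
N = 144
h = 12       # forecast horizon
steps = 10   # number of walk-forward rounds
Ntest = N - h - steps + 1   # 144 - 12 - 10 + 1 = 123

windows = []
for end_of_train in range(Ntest, N - h + 1):
    # train = rows [0, end_of_train), test = rows [end_of_train, end_of_train + h)
    windows.append((end_of_train, end_of_train + h))

for train_end, test_end in windows:
    print(f"train rows 0..{train_end - 1}, test rows {train_end}..{test_end - 1}")
```

The final window's test set ends exactly at the last row, and there are exactly `steps` windows, which are the two properties the debugging variables check.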

15
00:01:42,070 --> 00:01:45,490
There will be some debugging code in our walk forward function to check this.

16
00:01:50,220 --> 00:01:56,660
The next step is to set some configuration options to try, so please read these and review the documentation

17
00:01:56,790 --> 00:01:57,690
if you need to.

18
00:01:58,320 --> 00:02:02,540
The only thing that's mildly confusing here is the use_boxcox list of options.

19
00:02:03,090 --> 00:02:08,250
So according to the documentation, you should be allowed to use the string 'log' as an option.

20
00:02:08,700 --> 00:02:10,440
However, this actually fails.

21
00:02:10,830 --> 00:02:16,230
As you can see from the statsmodels code, it raises an error if you do not pass in True, False, or a float

22
00:02:16,230 --> 00:02:17,550
value for lambda.

23
00:02:18,090 --> 00:02:20,810
So in fact, passing in the string 'log' is incorrect.

24
00:02:21,240 --> 00:02:25,120
And just to be sure, this is the latest version of the statsmodels documentation.

25
00:02:25,830 --> 00:02:28,960
You might want to double check it yourself to confirm what I'm saying.

26
00:02:29,580 --> 00:02:35,040
In any case, the simple solution is to just pass in zero, since lambda equal to zero corresponds to the

27
00:02:35,040 --> 00:02:35,970
log transform.
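
Why zero works: the Box-Cox transform with parameter lambda is (y**lambda - 1) / lambda for lambda != 0 and log(y) for lambda == 0, and the first expression approaches log(y) as lambda shrinks toward zero. This sketch checks that numerically on one positive value (112, the first monthly count in the airline series); `boxcox` here is a hypothetical helper written for illustration, not the statsmodels or SciPy function.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if lam == 0:
        return math.log(y)          # the lambda = 0 case is defined as the log
    return (y ** lam - 1) / lam


y = 112.0
for lam in [0.5, 0.1, 0.01, 0.001]:
    # As lam gets smaller, the result moves toward log(y).
    print(lam, boxcox(y, lam))
print("lambda = 0:", boxcox(y, 0), "log(y):", math.log(y))
```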

28
00:02:44,620 --> 00:02:50,020
OK, so the next step is to implement our walk-forward function. It's going to accept as input an

29
00:02:50,020 --> 00:02:54,160
argument for each of the options and a final flag for whether we want to debug.

30
00:02:56,400 --> 00:03:00,510
Inside the function, we'll start by creating a list to store the errors from each round.

31
00:03:00,990 --> 00:03:03,420
We'll also have some useful variables for debugging.

32
00:03:04,020 --> 00:03:09,190
seen_last is a flag that will tell us whether or not we've looped up to the final row in the data frame.

33
00:03:09,690 --> 00:03:12,010
This will ensure the limits of our loop are correct.

34
00:03:12,570 --> 00:03:17,730
We also have a counter called steps_completed. We'll increment this by one on each round, and by

35
00:03:17,730 --> 00:03:21,230
the end of the loop it should equal the number of steps we set earlier.

36
00:03:24,540 --> 00:03:29,430
The next step is to enter a loop. Again, you can check the limits of this loop by drawing them out on

37
00:03:29,430 --> 00:03:33,750
paper, and we'll also print out some debugging variables to ensure they're correct.

38
00:03:34,650 --> 00:03:37,200
Inside the loop, we index our train and test sets.

39
00:03:37,680 --> 00:03:42,270
Note that, unlike what you might think when we describe this procedure in words, there's no actual

40
00:03:42,270 --> 00:03:45,130
need to add data to the train set on each round.

41
00:03:45,600 --> 00:03:49,420
Instead, we can just index the data frame using the correct indices.

42
00:03:49,890 --> 00:03:55,080
Furthermore, slicing the data frame usually just creates a view, without needing to copy the data into a new data frame,

43
00:03:55,110 --> 00:03:57,420
so it takes up almost no extra memory or computation.
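
A quick way to see the no-copy idea with a plain NumPy array, which is the kind of storage pandas keeps underneath a single numeric column. Basic slicing of a NumPy array always returns a view; whether a pandas slice is a view or a copy can depend on the pandas version and dtypes, so treat this as an illustration of the idea rather than a guarantee for DataFrames.

```python
import numpy as np

values = np.arange(144, dtype=float)   # stand-in for the passenger counts

train = values[:123]    # basic slicing: a view onto the same buffer
test = values[123:135]  # also a view

# Both slices share memory with the original array; nothing was copied.
print(np.shares_memory(train, values), np.shares_memory(test, values))
print(train.base is values)
```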

44
00:04:00,700 --> 00:04:06,220
The next step is to check if the final index of the test set is equal to the final index of the original

45
00:04:06,220 --> 00:04:06,890
data frame.

46
00:04:07,360 --> 00:04:09,790
If this is true, we set seen_last to True.

47
00:04:10,480 --> 00:04:14,680
Again, this is for debugging, to ensure that we've looped through to the end of the data.

48
00:04:16,580 --> 00:04:24,170
The next step is to increment steps_completed by one, again for debugging. The next step is to instantiate

49
00:04:24,170 --> 00:04:27,890
our model and call the fit function using the options we passed in.

50
00:04:32,250 --> 00:04:37,320
The next step is to grab our forecast and compute the mean squared error against the test set. We then

51
00:04:37,320 --> 00:04:39,300
append this error to our list of errors.

52
00:04:42,110 --> 00:04:47,630
The next step is to check the debug flag. If this is true, then we print out the values of seen_last

53
00:04:47,630 --> 00:04:52,370
and steps_completed. At the end of this function, we return the mean of the errors list.

54
00:05:00,330 --> 00:05:05,400
OK, so the next step is to just test our function to make sure that it works. We'll pass in debug

55
00:05:05,400 --> 00:05:08,120
equals True to make sure our loop limits are correct.

56
00:05:12,890 --> 00:05:18,230
OK, so luckily, seen_last is equal to True and the number of steps is equal to what we set earlier.

57
00:05:18,500 --> 00:05:20,560
This means our calculations were correct.

58
00:05:23,770 --> 00:05:29,190
OK, so the next step is to figure out how to loop through every combination of our many lists of options.

59
00:05:29,620 --> 00:05:33,850
So one option to do this would be to just create multiple nested for loops.

60
00:05:34,150 --> 00:05:39,190
Of course, this is very ugly and inflexible, since you might want to add new options in the future.

61
00:05:40,090 --> 00:05:44,760
If you have too many options, your loops are going to go off the page, which isn't very nice.

62
00:05:46,140 --> 00:05:51,270
So we're going to use itertools.product, which does exactly what we need: it loops through

63
00:05:51,270 --> 00:05:54,460
every possible combination of items from multiple lists.

64
00:05:55,110 --> 00:06:00,830
Now, the usual way to call this function would be to pass the lists directly into the product function.

65
00:06:01,560 --> 00:06:05,750
However, because we have so many lists, they would go off the page.

66
00:06:06,120 --> 00:06:11,550
And so an equivalent way of doing this is to create a tuple of the lists we want to pass in and

67
00:06:11,550 --> 00:06:12,560
then unpack it with the star operator.
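
A sketch of the star-unpacking trick. The exact option values are assumptions reconstructed from the lecture (add or mul trend and seasonality, damped trend on or off, three initialization methods including legacy-heuristic, and `0` instead of `'log'` for Box-Cox); the list names are hypothetical.

```python
import itertools

# Hypothetical option lists; the values mirror the options discussed in the lecture.
trend_type_list = ["add", "mul"]
seasonal_type_list = ["add", "mul"]
damped_trend_list = [True, False]
init_method_list = ["estimated", "heuristic", "legacy-heuristic"]
use_boxcox_list = [True, False, 0]   # 0 means lambda = 0, i.e. a log transform

tuple_of_option_lists = (
    trend_type_list,
    seasonal_type_list,
    damped_trend_list,
    init_method_list,
    use_boxcox_list,
)

# product(*t) is the same as product(t[0], t[1], t[2], t[3], t[4]).
combos = list(itertools.product(*tuple_of_option_lists))
print(len(combos))   # 2 * 2 * 2 * 3 * 3 = 72
print(combos[0])     # ('add', 'add', True, 'estimated', True)
```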

68
00:06:13,290 --> 00:06:18,450
OK, and so inside this loop we'll print out whatever product gives us to make sure it's what we expect.

69
00:06:23,070 --> 00:06:28,350
OK, so you can see that this is exactly what we expect, every item we get back from this function

70
00:06:28,500 --> 00:06:31,350
is a tuple of the options in different combinations.

71
00:06:33,800 --> 00:06:39,020
OK, so the next step is to finally put this all together and use our walk forward function to find

72
00:06:39,020 --> 00:06:40,250
the best set of options.

73
00:06:40,790 --> 00:06:45,020
We'll start by initializing the best score to infinity and the best options to none.

74
00:06:46,040 --> 00:06:52,460
Then we'll enter a loop that will iterate through all the possible combinations of options. Inside the

75
00:06:52,460 --> 00:06:52,720
loop,

76
00:06:52,730 --> 00:06:56,080
we'll call our walk-forward function with the current set of options.

77
00:06:56,630 --> 00:07:02,030
Again, remember that we don't actually have to expand the tuple of options in order to pass these into

78
00:07:02,030 --> 00:07:03,120
the walk forward function.

79
00:07:03,290 --> 00:07:04,710
We can just use the star.

80
00:07:05,900 --> 00:07:11,330
So from this we get back a score and if this is less than our current best score, we save the score

81
00:07:11,330 --> 00:07:12,710
and we save the options.
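
The search loop itself doesn't depend on the model, so it can be sketched with any scoring function. Here `grid_search` and `toy_score` are hypothetical names, and `toy_score` is a made-up stand-in for the walk-forward function so the example runs quickly.

```python
import itertools


def grid_search(score_fn, option_lists):
    """Return (best_score, best_options) over every combination of options.

    score_fn is called as score_fn(*options); lower scores are better.
    """
    best_score = float("inf")   # any finite score will beat infinity
    best_options = None
    for options in itertools.product(*option_lists):
        score = score_fn(*options)   # unpack the tuple with the star operator
        if score < best_score:
            best_score = score
            best_options = options
    return best_score, best_options


# Hypothetical toy score standing in for the walk-forward MSE.
def toy_score(a, b):
    return (a - 2) ** 2 + (b - 1) ** 2


best_score, best_options = grid_search(toy_score, ([0, 1, 2, 3], [0, 1, 2]))
print(best_score, best_options)   # 0 (2, 1)
```

With the lecture's pieces, `score_fn` would be the walk-forward function and `option_lists` a tuple of the five option lists. One side effect of the `<` comparison: if a badly behaved combination returns NaN, the comparison is False, so that combination can never become the best.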

82
00:07:32,330 --> 00:07:36,720
OK, so you can see that when we run this, we get some warnings about overflows.

83
00:07:37,130 --> 00:07:40,610
This is probably not an issue, since it just means that particular combination of options gives us a bad model.

84
00:07:43,790 --> 00:07:49,040
OK, so the last step in this notebook is to check what the best score and best set of options actually

85
00:07:49,040 --> 00:07:49,400
are.

86
00:07:54,680 --> 00:07:59,930
So we see that the best set of options is a multiplicative trend and multiplicative seasonality.

87
00:08:00,560 --> 00:08:05,520
This makes sense since the seasonal pattern seems to grow over time for airline passengers.

88
00:08:06,380 --> 00:08:11,120
We have True for damped_trend, which is interesting since it doesn't seem like the trend is dampening

89
00:08:11,120 --> 00:08:11,850
over time.

90
00:08:12,830 --> 00:08:16,750
We have legacy-heuristic for the initialization method, which I guess could go either way.

91
00:08:17,270 --> 00:08:22,760
And interestingly, we have False for use_boxcox, which means that we got the best results by not

92
00:08:22,760 --> 00:08:24,140
applying any transform.
