1
00:00:11,130 --> 00:00:17,190
All right, so the main reason I wanted to discuss the ACF and the PACF is that in time series

2
00:00:17,190 --> 00:00:23,880
analysis courses, instructors often talk about how to use the ACF and the PACF as if you're following

3
00:00:23,880 --> 00:00:25,560
a set of arbitrary rules.

4
00:00:26,130 --> 00:00:31,770
But they never go on to show you that these rules actually do what they say they do by comparing the

5
00:00:31,770 --> 00:00:38,950
ACF and the PACF to known data. By looking at the ACF and PACF for known data,

6
00:00:39,210 --> 00:00:41,940
we can confirm that those rules actually work.

7
00:00:42,450 --> 00:00:44,650
Following them thus becomes natural.

8
00:00:45,240 --> 00:00:50,700
It's also important to recognize that these are standard topics in time series analysis, so it wouldn't

9
00:00:50,700 --> 00:00:51,730
be right to leave them out.

10
00:00:52,650 --> 00:00:55,020
Even so, there are important caveats.

11
00:00:55,590 --> 00:01:00,580
The caveats are that even if you do follow these rules, they might not lead to the best model.

12
00:01:01,110 --> 00:01:05,140
Of course, we haven't yet defined what "best model" actually means, but we will.

13
00:01:05,910 --> 00:01:11,880
The real question is why do all that work when we can get a computer to do it for us, especially when

14
00:01:11,880 --> 00:01:14,280
a computer will do a better job than we can?

15
00:01:14,820 --> 00:01:16,720
That is what this lecture is all about.

16
00:01:21,770 --> 00:01:26,900
So it turns out that the computers of today are fast enough that you don't need to go through all that

17
00:01:26,900 --> 00:01:30,620
work, especially since you might end up with a suboptimal answer.

18
00:01:31,370 --> 00:01:38,120
Instead, popular time series packages usually contain a function called auto ARIMA that automatically

19
00:01:38,120 --> 00:01:40,150
finds the best model for you.

20
00:01:40,730 --> 00:01:46,070
That is to say, it tries a bunch of different settings and returns the best settings according to some

21
00:01:46,070 --> 00:01:51,110
criteria. What these criteria might be will be discussed in the next lecture.

22
00:01:51,800 --> 00:01:57,470
In the R language, for example, you can simply call a function called auto.arima and pass in your

23
00:01:57,470 --> 00:01:59,180
time series, which is very nice.

24
00:01:59,660 --> 00:02:02,450
In statsmodels, there exists no such function.

25
00:02:03,260 --> 00:02:09,560
However, there is a package called pmdarima, which uses statsmodels under the hood, that does

26
00:02:09,560 --> 00:02:11,530
implement the auto_arima function.

27
00:02:12,260 --> 00:02:15,920
So basically we have the power of auto ARIMA at our fingertips.

28
00:02:16,070 --> 00:02:21,140
We just need to install a different library, which luckily still uses statsmodels under the hood.

29
00:02:26,000 --> 00:02:31,430
Now, the pmdarima API is a little different from statsmodels, so that is worth mentioning

30
00:02:31,430 --> 00:02:38,400
briefly. Assuming that we have our data as a series, the first step is to call the auto_arima function.

31
00:02:39,050 --> 00:02:43,520
This allows us to pass in an argument called seasonal as well as the seasonal period, m.

32
00:02:43,530 --> 00:02:50,360
We will get to what this means later on, but it's basically a seasonal extension of the vanilla ARIMA

33
00:02:50,360 --> 00:02:51,890
we've been discussing so far.

34
00:02:52,820 --> 00:02:57,410
This will return to us a model object from which we can make predictions.

35
00:02:57,950 --> 00:03:02,090
Note that this is not a statsmodels model object, so the API is a bit different.

36
00:03:02,840 --> 00:03:06,380
When we want to forecast, we call the model.predict function.

37
00:03:07,070 --> 00:03:14,030
This takes in two arguments which are relevant to us: n_periods and return_conf_int.

38
00:03:14,030 --> 00:03:20,210
n_periods is the forecast horizon, or the number of time steps to forecast, and return_conf_int should be set to

39
00:03:20,210 --> 00:03:23,850
True if you want the confidence bounds along with your prediction.

40
00:03:24,530 --> 00:03:26,480
So that's for the out-of-sample data.

41
00:03:27,080 --> 00:03:31,700
For the in-sample data, you want to call the function model.predict_in_sample.

42
00:03:32,270 --> 00:03:37,450
This takes in the arguments start and end, along with some others, which we won't worry about.

43
00:03:38,270 --> 00:03:42,980
Technically, you can return confidence intervals for the in-sample predictions as well, although that

44
00:03:42,980 --> 00:03:44,830
is less common than for forecasts.

45
00:03:45,800 --> 00:03:51,530
Unlike in statsmodels, start and end should be specified as integers which correspond to the indices

46
00:03:51,530 --> 00:03:53,810
of the train set. In statsmodels,

47
00:03:53,810 --> 00:03:59,510
start and end can be integers, but they can also be datetime objects or strings corresponding

48
00:03:59,510 --> 00:04:02,630
to the index of the series you passed in for training.
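
To make those calls concrete, here is a rough sketch of the workflow just described. The helper name fit_and_forecast and its defaults are made up for illustration; only auto_arima, predict, and predict_in_sample, with the arguments named in the lecture, come from pmdarima. The start value of 2 is an assumption: in-sample prediction is not defined for a start below the differencing order d, and this guesses that d is at most 2.

```python
def fit_and_forecast(y, m=12, horizon=12):
    """Fit auto_arima on the series y, then forecast and get in-sample fits.

    Hypothetical helper for illustration; it is not part of pmdarima.
    """
    # Imported lazily so this sketch can be loaded without pmdarima installed.
    import pmdarima as pm

    # Try a range of candidate orders; seasonal=True with period m
    # lets the search include the seasonal extension.
    model = pm.auto_arima(y, seasonal=True, m=m)

    # Out-of-sample: n_periods is the forecast horizon, and
    # return_conf_int=True also returns the confidence bounds.
    forecast, conf_int = model.predict(n_periods=horizon, return_conf_int=True)

    # In-sample: start and end are integer positions in the training set.
    # start=2 assumes the chosen differencing order d is at most 2.
    fitted = model.predict_in_sample(start=2, end=len(y) - 1)

    return model, forecast, conf_int, fitted
```

You would call this with a pandas Series or a NumPy array of training data, for example fit_and_forecast(train, m=12, horizon=12) for monthly data.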

49
00:04:07,600 --> 00:04:13,120
In order to better understand what auto ARIMA is doing, we'll have to discuss seasonal ARIMA.

50
00:04:13,690 --> 00:04:18,300
This will be brief, since it's not as important to know compared to the vanilla ARIMA.

51
00:04:18,790 --> 00:04:22,010
Just understand the basic high-level points and you will be fine.

52
00:04:22,960 --> 00:04:28,870
Again, I want to mention that seasonality is not usually observed in stock prices, but stock prices

53
00:04:28,870 --> 00:04:32,650
do not comprise all of what can be considered financial data.

54
00:04:33,070 --> 00:04:37,450
And so these concepts do show up in financial literature as they should.

55
00:04:38,290 --> 00:04:43,210
One example of this is that the weather, or in other words, actual seasons like summer, winter, and

56
00:04:43,210 --> 00:04:47,040
so forth, affect when and where people purchase real estate.

57
00:04:47,770 --> 00:04:51,430
So that's one example of where seasonality affects finance.

58
00:04:52,150 --> 00:04:59,080
OK, so seasonal ARIMA, abbreviated as SARIMA, gives us three new hyperparameters called capital

59
00:04:59,080 --> 00:05:05,030
P, capital D, and capital Q, which are analogous to the lowercase versions from ARIMA.

60
00:05:05,710 --> 00:05:08,340
So now we have six hyperparameters in total.

61
00:05:08,980 --> 00:05:17,890
We usually write this as ARIMA(p, d, q) × (P, D, Q) subscript m, where m is

62
00:05:17,890 --> 00:05:19,020
the seasonal period.

63
00:05:20,170 --> 00:05:25,780
The reason why we add the cross is because it turns out when you write the model using operators, you

64
00:05:25,780 --> 00:05:30,310
end up multiplying the non-seasonal parts by the seasonal parts to get your full model.

65
00:05:30,940 --> 00:05:35,560
This is outside the scope of this course, since the notation is very messy and doesn't really add any

66
00:05:35,560 --> 00:05:38,020
benefit on top of what we are already doing.

67
00:05:38,770 --> 00:05:43,840
However, if you want some simple intuition, we can discuss what the seasonal part might look like

68
00:05:43,840 --> 00:05:44,980
in isolation.

69
00:05:49,970 --> 00:05:56,330
So let's suppose we want to consider only the seasonal part of the seasonal ARIMA. If we have capital

70
00:05:56,340 --> 00:06:01,310
D = 1, that means we difference y(t) once using the seasonal period.

71
00:06:02,730 --> 00:06:10,180
Our notation for this is Δ_m y(t), which is equal to y(t) - y(t - m).

72
00:06:10,980 --> 00:06:16,440
So this is just like regular differencing, except that regular differencing subtracts from one time step

73
00:06:16,440 --> 00:06:19,080
behind and that measures the slope of the trend.

74
00:06:20,650 --> 00:06:27,070
With the seasonal part of seasonal ARIMA, we subtract from one period ago. So if we have monthly data

75
00:06:27,070 --> 00:06:30,880
and T is equal to March, then we subtract from the previous March.
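
As a small illustration of the March-minus-previous-March idea, here is a sketch in plain Python. The function name seasonal_difference and the toy monthly series are invented for this example; the point is only that subtracting the value from m steps ago removes a pattern that repeats every m steps, and turns a linear trend into a constant.

```python
def seasonal_difference(y, m):
    """Return y[t] - y[t - m] for every t that has a value m steps earlier."""
    if m <= 0:
        raise ValueError("m must be a positive integer")
    return [y[t] - y[t - m] for t in range(m, len(y))]


# Toy monthly data: a linear trend (slope 2) plus a pattern repeating every 12 months.
pattern = [5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 4, 3]
y = [2 * t + pattern[t % 12] for t in range(36)]

# Seasonal differencing with m = 12: each month minus the same month last year.
seasonal = seasonal_difference(y, 12)

# Regular differencing is the same operation with m = 1.
regular = seasonal_difference(y, 1)
```

Here every seasonal difference equals 24, which is the slope times the period, because the repeating pattern cancels. The regular difference still carries the month-to-month pattern.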

76
00:06:31,750 --> 00:06:37,270
Note that when we discussed stationarity earlier, we discussed certain characteristics of a time series

77
00:06:37,450 --> 00:06:43,670
that indicate that it is not stationary. Examples of this were trend and changing variance.

78
00:06:44,080 --> 00:06:49,060
If either the level or the variance changes over time, then the series is not stationary.

79
00:06:49,990 --> 00:06:57,050
Seasonality is the other key component of this: a stationary time series should also not exhibit seasonality.

80
00:06:57,640 --> 00:07:02,710
So if you find a time series that does exhibit seasonality, then you would say it is not stationary.

81
00:07:07,710 --> 00:07:13,530
Interestingly, what we found in our previous code examples was that it is possible for a non-seasonal

82
00:07:13,530 --> 00:07:19,410
ARIMA to model the airline passengers data set quite well, even though it has no seasonal component.

83
00:07:20,130 --> 00:07:26,750
In fact, it's possible for a purely autoregressive AR(2) model to perfectly represent a sine wave.

84
00:07:27,300 --> 00:07:34,560
So it's not that ARIMA cannot model seasonality, but including seasonality explicitly by using SARIMA

85
00:07:34,740 --> 00:07:36,200
can lead to a better model.
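
The AR(2) claim about the sine wave can be checked with a few lines of plain Python. This sketch relies on the trigonometric identity sin(wt) = 2 cos(w) sin(w(t - 1)) - sin(w(t - 2)), so the AR(2) coefficients are 2 cos(w) and -1 with no noise term. The period of 12 and the variable names are arbitrary choices for illustration.

```python
import math

period = 12                      # illustrative period, e.g. monthly data
omega = 2 * math.pi / period     # angular frequency of the sine wave

# AR(2) coefficients implied by the identity above.
phi1 = 2 * math.cos(omega)
phi2 = -1.0

# The true sine wave.
sine = [math.sin(omega * t) for t in range(48)]

# Generate the same series with the AR(2) recursion, seeded with two true values.
ar2 = sine[:2]
for t in range(2, 48):
    ar2.append(phi1 * ar2[t - 1] + phi2 * ar2[t - 2])

# Largest gap between the recursion and the true sine wave.
max_error = max(abs(a - b) for a, b in zip(sine, ar2))
```

The recursion reproduces the sine wave up to floating-point rounding, with no seasonal terms involved.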

86
00:07:41,060 --> 00:07:45,740
So without loss of generality, let's suppose that we have not differenced, so that we can keep using

87
00:07:45,740 --> 00:07:51,740
the variable y(t). Or you can pretend y(t) now represents the new differenced time series. It doesn't

88
00:07:51,740 --> 00:07:52,150
matter.

89
00:07:52,790 --> 00:07:56,420
The seasonal autoregressive part is, again, what you might expect.

90
00:07:57,050 --> 00:08:05,240
y(t) is regressed on past data points in the time series at multiples of m away from t. So, for example,

91
00:08:05,390 --> 00:08:10,280
y(t) depends on y(t - m), y(t - 2m), and so on.

92
00:08:10,850 --> 00:08:13,760
For the moving average part, we again have the same pattern.

93
00:08:14,270 --> 00:08:17,700
y(t) depends on past errors at multiples of m.

94
00:08:18,080 --> 00:08:22,190
So that would be ε(t - m), ε(t - 2m), and so on.
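
Written out, the seasonal-only autoregressive and moving average parts just described might look like the following. The constant c and the coefficient symbols Phi and Theta are notation assumed for this sketch, and the non-seasonal part is left out entirely.

```latex
% Seasonal-only ARMA with period m; the non-seasonal part is omitted.
% c, \Phi_i, and \Theta_j are illustrative symbols, not taken from the lecture.
y_t = c
    + \sum_{i=1}^{P} \Phi_i \, y_{t - i m}
    + \varepsilon_t
    + \sum_{j=1}^{Q} \Theta_j \, \varepsilon_{t - j m}
```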

95
00:08:23,410 --> 00:08:29,170
Finally, once we have this seasonal part, we multiply it with the non-seasonal part, and that gives

96
00:08:29,170 --> 00:08:30,700
us the full SARIMA model.

97
00:08:35,670 --> 00:08:42,210
In code, there is actually one extra component, which gives us the SARIMAX model. The X part is actually

98
00:08:42,210 --> 00:08:47,310
pretty nice and I think it will make you feel like this model can actually be applied to so-called real

99
00:08:47,310 --> 00:08:48,000
world data.

100
00:08:48,780 --> 00:08:51,620
The X part refers to exogenous variables.

101
00:08:52,080 --> 00:08:58,290
So imagine that you have a time series of length capital T. At each point in the time series, you have some

102
00:08:58,290 --> 00:09:04,740
feature vector of exogenous data, such as maybe the sentiment of tweets by Elon Musk or the sentiment

103
00:09:04,740 --> 00:09:06,920
from various financial news sites.

104
00:09:07,410 --> 00:09:13,740
You can include this in your model by passing in an array of feature vectors of size T by D into the

105
00:09:13,740 --> 00:09:15,060
auto_arima function.

106
00:09:15,870 --> 00:09:20,610
One caveat to this is that when you want to make predictions, you have to know what these features

107
00:09:20,610 --> 00:09:21,020
are.

108
00:09:21,330 --> 00:09:24,920
So you want to consider whether or not this is actually practical for you.

109
00:09:25,470 --> 00:09:29,090
In any case, we won't be making use of this, but we will see in our output

110
00:09:29,250 --> 00:09:30,110
the name SARIMAX.

111
00:09:30,300 --> 00:09:32,900
So this is just in case you wanted to know what that means.

112
00:09:33,420 --> 00:09:36,770
Otherwise, consider this topic outside the scope of this course.
