1
00:00:11,060 --> 00:00:17,210
In this lecture, we are going to discuss more on the topic of stationarity. Previously, we looked at

2
00:00:17,210 --> 00:00:21,600
how different kinds of ARIMA models might fare on the airline passengers data set.

3
00:00:22,160 --> 00:00:23,720
Of course, we were just guessing.

4
00:00:24,290 --> 00:00:28,410
It's not easy to tell what the right orders are when fitting an ARIMA model.

5
00:00:29,210 --> 00:00:34,190
In fact, this is the case in general for machine learning and deep learning.

6
00:00:34,190 --> 00:00:39,350
For example, one question I often get from beginners is: how do I choose the hyperparameters?

7
00:00:39,500 --> 00:00:43,820
How do I choose the learning rate, the hidden layer size, the number of hidden units and so on?

8
00:00:44,390 --> 00:00:47,510
I think today people generally have a better understanding.

9
00:00:47,690 --> 00:00:52,340
But when I first started my courses, it would make people really angry that there wasn't some formula

10
00:00:52,340 --> 00:00:53,440
that they could use.

11
00:00:54,560 --> 00:01:01,250
Indeed, hyperparameter optimization isn't a topic for students who like simple, direct, and straightforward

12
00:01:01,250 --> 00:01:02,680
answers to their problems.

13
00:01:03,200 --> 00:01:05,810
Usually it amounts to nothing but trial and error.

14
00:01:06,260 --> 00:01:10,930
As I always say, machine learning is experimentation, not philosophy.

15
00:01:11,420 --> 00:01:15,830
If you want to know whether something is going to work or not, well, then you do an experiment.

16
00:01:16,490 --> 00:01:19,010
In any case, this idea will come into play later.

17
00:01:19,010 --> 00:01:21,470
But for now, what I want to say is this.

18
00:01:21,980 --> 00:01:28,220
For ARIMA, there is a way that you can scientifically and methodically choose your hyperparameters.

19
00:01:28,580 --> 00:01:31,750
In our case, these are the orders p, d, and q.

20
00:01:32,720 --> 00:01:37,610
Note that this process is not exact and does not necessarily lead to the best answer.

21
00:01:38,060 --> 00:01:40,400
However, it is statistically sound.

22
00:01:41,030 --> 00:01:44,380
The first scenario I would like to consider is stationarity.

23
00:01:44,900 --> 00:01:49,340
As you recall, this will help us choose the order d in our ARIMA model.

24
00:01:54,340 --> 00:01:59,740
So this lecture will be split up into two parts. The first part will be a more beginner and practical

25
00:01:59,740 --> 00:02:01,020
oriented discussion.

26
00:02:01,570 --> 00:02:06,860
We're going to look at how to determine whether or not a time series is stationary in code.

27
00:02:07,390 --> 00:02:10,940
This involves doing a statistical test and checking the p-value.

28
00:02:11,830 --> 00:02:17,060
The second part will be more advanced, and it will discuss stationarity in a more exact manner.

29
00:02:17,710 --> 00:02:22,720
The second part is optional, so feel free to skip it if you want to jump straight to the code or if

30
00:02:22,720 --> 00:02:23,950
you do not like math.

31
00:02:28,790 --> 00:02:36,020
So this is the first part, on the practical aspects of stationarity. Stationarity, loosely speaking, means

32
00:02:36,020 --> 00:02:39,180
that the distribution of the data does not change over time.

33
00:02:39,830 --> 00:02:45,110
That is, if you look at things like the mean or the variance at any point in the time series, they will

34
00:02:45,110 --> 00:02:46,160
always be the same.

35
00:02:47,090 --> 00:02:49,450
Looking at when this is not the case is helpful.

36
00:02:50,180 --> 00:02:55,250
For example, if you see a time series trending upwards or downwards, then you know that the mean is

37
00:02:55,250 --> 00:02:55,900
changing.

38
00:02:56,510 --> 00:03:01,670
Therefore, the existence of any trend means that the time series is not stationary.

39
00:03:02,540 --> 00:03:07,080
Furthermore, when the variance changes over time, that is also not stationary.

40
00:03:07,610 --> 00:03:12,800
So if at the beginning of a time series the value only wiggles around a little bit, but then starts

41
00:03:12,800 --> 00:03:16,070
to wiggle around more and more later, then that's not stationary.

42
00:03:16,700 --> 00:03:20,620
We've seen this behavior in stock returns, which have heteroskedasticity.

43
00:03:21,200 --> 00:03:27,450
So one might consider stock returns to be non-stationary, although it is often assumed that they are stationary.

44
00:03:27,890 --> 00:03:32,350
So don't be surprised if we treat stock returns as if they are stationary in the future.

45
00:03:37,260 --> 00:03:42,790
Let's now talk about a practical issue, how can we test whether or not a time series is stationary?

46
00:03:43,530 --> 00:03:47,530
Luckily, there is a well known statistical test that does exactly this.

47
00:03:47,880 --> 00:03:52,250
It's called the augmented Dickey-Fuller test, or the ADF test for short.

48
00:03:52,980 --> 00:03:58,010
As we discussed earlier, one way to think of statistical tests is like an API.

49
00:03:58,530 --> 00:04:01,650
We have a null hypothesis and an alternative hypothesis.

50
00:04:01,980 --> 00:04:05,290
We plug our data in and we get a p-value as output.

51
00:04:05,940 --> 00:04:09,840
We check whether the p-value is below our significance threshold.

52
00:04:10,230 --> 00:04:13,280
If it is, then we reject the null hypothesis.

53
00:04:13,740 --> 00:04:17,580
So really, we don't have to understand the augmented Dickey-Fuller test.

54
00:04:17,730 --> 00:04:19,620
We just have to know what it is for.

55
00:04:20,550 --> 00:04:24,570
We have to know what the null hypothesis is and what the alternative is.

56
00:04:24,990 --> 00:04:27,510
Once we know these things, we can use the test.

57
00:04:28,080 --> 00:04:30,000
So what is the null hypothesis?

58
00:04:30,480 --> 00:04:34,040
The null hypothesis is that the time series is non-stationary.

59
00:04:34,620 --> 00:04:38,400
The alternative hypothesis is that the time series is stationary.

60
00:04:38,970 --> 00:04:45,060
So if we find a p-value less than, say, five percent, then we will reject the null hypothesis and

61
00:04:45,060 --> 00:04:47,250
we will say that the time series is stationary.
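As a rough sketch of the "test as an API" idea: data goes in, a p-value comes out, and we compare it to a threshold. This assumes the third-party statsmodels package is installed; the helper names `adf_p_value` and `reject_null` are made up for this example, and the five percent threshold is just the value mentioned above.

```python
def adf_p_value(series):
    """Run the augmented Dickey-Fuller test and return only the p-value."""
    # Imported here so the rest of the example works without statsmodels.
    from statsmodels.tsa.stattools import adfuller

    # adfuller returns a tuple; the p-value is the second element.
    return adfuller(series)[1]


def reject_null(p_value, alpha=0.05):
    """True means: reject 'non-stationary' and treat the series as stationary."""
    return p_value < alpha
```

Usage would look like `reject_null(adf_p_value(my_series))`, where `my_series` stands for whatever one-dimensional data you have. Failing to reject the null does not prove the series is non-stationary; it only means the test found no evidence against it.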

62
00:04:52,120 --> 00:04:55,640
And just to bring this back to ARIMA, how would we use this?

63
00:04:56,320 --> 00:05:00,670
Well, recall that this all has to do with the I component of the ARIMA model.

64
00:05:01,270 --> 00:05:07,480
We want to difference our time series until it becomes stationary, so the process will go something like

65
00:05:07,480 --> 00:05:08,010
this.

66
00:05:08,560 --> 00:05:10,900
First, we just have our raw time series.

67
00:05:11,170 --> 00:05:13,410
Maybe it looks pretty stationary already.

68
00:05:13,870 --> 00:05:16,120
If so, we'll do an ADF test.

69
00:05:16,570 --> 00:05:22,390
If we get a p-value below the significance threshold, we'll say it's stationary, and so we'll then fit

70
00:05:22,390 --> 00:05:30,220
the rest of the ARMA model, or equivalently, we'll fit an ARIMA where we say that d is equal to zero. Otherwise,

71
00:05:30,220 --> 00:05:31,750
we'll difference the data set.

72
00:05:32,410 --> 00:05:34,330
Then we'll run our ADF test again.

73
00:05:34,990 --> 00:05:40,210
Again, if we get a p-value below the significance threshold, then we'll say it's stationary.

74
00:05:40,720 --> 00:05:43,890
Since we differenced once, we'll say that d is equal to one.

75
00:05:44,470 --> 00:05:48,610
If the ADF test still does not reject the null, we might difference again.
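The procedure just described can be sketched as a small loop. This is only an illustration under assumptions: `choose_d`, `difference`, and the `max_d` cap are hypothetical names and choices, and the p-value function is passed in so that any stationarity test can be plugged in.

```python
def difference(series):
    """First difference: each value minus the one before it."""
    return [b - a for a, b in zip(series, series[1:])]


def choose_d(series, p_value_fn, alpha=0.05, max_d=2):
    """Difference until the test rejects the null; return the number of differences.

    p_value_fn takes a list of numbers and returns a p-value whose null
    hypothesis is 'non-stationary' (as in the ADF test). Returns None if the
    null is never rejected within max_d differences.
    """
    current = list(series)
    for d in range(max_d + 1):
        if p_value_fn(current) < alpha:
            return d  # stationary after d rounds of differencing
        if d < max_d:
            current = difference(current)
    return None
```

With statsmodels installed, one possible `p_value_fn` is `lambda s: adfuller(s)[1]`, using `adfuller` from `statsmodels.tsa.stattools`. As noted earlier in the lecture, this process is not exact; the returned value is a starting point for d, not a guarantee.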

76
00:05:53,610 --> 00:05:58,770
All right, so let's move on to the second, optional part of this lecture, where we discuss stationarity

77
00:05:58,770 --> 00:05:59,710
more in depth.

78
00:06:00,300 --> 00:06:01,800
So what is stationarity?

79
00:06:02,520 --> 00:06:07,980
Previously, we've only discussed this concept informally, but now we are ready to be more exact.

80
00:06:08,520 --> 00:06:15,390
In fact, there are two kinds of stationarity: strong and weak. Strong stationarity means that the distribution

81
00:06:15,390 --> 00:06:20,030
of the random variables in your stochastic process does not change over time.

82
00:06:20,760 --> 00:06:22,330
As a rough example of this:

83
00:06:22,680 --> 00:06:29,570
Suppose I take an arbitrary window over some time series. Then suppose I move this window over by tau

84
00:06:29,580 --> 00:06:36,240
time steps. Strong stationarity would say that the distribution over these random variables is the same

85
00:06:36,390 --> 00:06:38,210
no matter what tau is.

86
00:06:38,610 --> 00:06:43,770
In other words, no matter where I look in the time series, I see the same distribution.

87
00:06:44,490 --> 00:06:49,950
There is a very formal definition for strong-sense stationarity, but this is definitely not necessary

88
00:06:49,950 --> 00:06:51,530
to understand for this course.

89
00:06:52,260 --> 00:06:58,110
In fact, in the practical application of time series analysis, strong-sense stationarity is not used

90
00:06:58,110 --> 00:06:58,920
very often.

91
00:07:03,770 --> 00:07:10,820
A more practical kind of stationarity is weak-sense stationarity. Weak stationarity looks at first and second

92
00:07:10,820 --> 00:07:13,930
order statistics rather than the full distribution.

93
00:07:14,570 --> 00:07:20,120
As you know, first-order statistics usually correspond to the mean. Second-order

94
00:07:20,120 --> 00:07:23,930
statistics correspond to things like variance and covariance.

95
00:07:25,010 --> 00:07:28,390
You already know the informal definition of weak stationarity.

96
00:07:28,580 --> 00:07:31,610
It's that the mean and the covariance don't change over time.

97
00:07:32,240 --> 00:07:32,660
All right.

98
00:07:32,660 --> 00:07:35,920
But now we're going to look at this in a more exact way.

99
00:07:36,620 --> 00:07:38,800
Luckily, I think these are pretty straightforward.

100
00:07:39,200 --> 00:07:43,880
If you don't find that to be the case, it's not absolutely necessary to understand what we're going

101
00:07:43,880 --> 00:07:44,660
to do next.

102
00:07:45,380 --> 00:07:52,810
So for the mean, it just says that the mean at time t is equal to the mean at time t plus tau, for all tau.

103
00:07:53,330 --> 00:07:54,310
That makes sense.

104
00:07:54,530 --> 00:07:58,100
It means that no matter where we look, the mean is always the same.

105
00:07:59,390 --> 00:08:02,230
For the second order statistics, it gets a little tricky.

106
00:08:02,810 --> 00:08:09,470
It says that the autocovariance for some random variable y at time t1 and some other random variable

107
00:08:09,470 --> 00:08:16,160
y at time t2 is only a function of the time difference between t1 and t2.
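Written out, the two conditions just stated look roughly like this. The symbol K for the autocovariance function is an assumed name, since the transcript does not show the slide notation.

```latex
\begin{align}
\mathbb{E}[y_t] &= \mathbb{E}[y_{t+\tau}] \quad \text{for all } \tau, \\
K(t_1, t_2) &= \operatorname{Cov}(y_{t_1}, y_{t_2}) = K(t_1 - t_2,\, 0).
\end{align}
```

The second line says that shifting both time points by the same amount leaves the autocovariance unchanged, so only the gap between t1 and t2 matters.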

108
00:08:21,120 --> 00:08:24,420
Now, that's probably confusing, so let's think about what that means.

109
00:08:25,080 --> 00:08:27,550
First of all, what is autocovariance?

110
00:08:28,050 --> 00:08:29,390
Well, auto means self.

111
00:08:29,550 --> 00:08:33,930
We've seen this plenty of times: autoregressive model, autoencoder, and so on.

112
00:08:34,950 --> 00:08:41,700
The autocovariance between y at t1 and y at t2 is really just the covariance between y at

113
00:08:41,820 --> 00:08:42,120
t1

114
00:08:42,120 --> 00:08:48,390
and y at t2. The auto part really just means that y at t1 and y at t2 come from the same

115
00:08:48,390 --> 00:08:49,220
time series.

116
00:08:49,660 --> 00:08:53,500
Again, this is just the covariance. And what is covariance?

117
00:08:53,970 --> 00:08:57,200
Well, we've learned that it is the unscaled correlation.

118
00:08:57,720 --> 00:09:01,440
Therefore, it tells us how related two random variables are.

119
00:09:01,980 --> 00:09:07,440
If they are completely unrelated, then the correlation and hence the covariance will be zero.

120
00:09:08,130 --> 00:09:13,920
If they are related, that is, they move together either in the same direction or the opposite direction,

121
00:09:14,100 --> 00:09:15,990
then this value will be non-zero.

122
00:09:16,440 --> 00:09:20,550
If they move in the same direction, then the covariance will be greater than zero.

123
00:09:20,850 --> 00:09:24,690
If they move in opposite directions, then this value will be less than zero.

124
00:09:29,730 --> 00:09:34,950
So why does stationarity mean that the covariance can be written as a function of just the time difference between

125
00:09:34,950 --> 00:09:36,150
t1 and t2?

126
00:09:36,840 --> 00:09:43,390
Well, the intuition is this: t1 minus t2 is just the distance between the two time points.

127
00:09:43,920 --> 00:09:49,980
That means if I pick any two time points in the series, as long as this time difference is the same,

128
00:09:50,220 --> 00:09:53,420
the covariance between these two random variables is the same.

129
00:09:54,060 --> 00:09:59,670
For example, the covariance between time one and time three is the same as the covariance between

130
00:09:59,670 --> 00:10:01,140
time three and time five.

131
00:10:01,380 --> 00:10:05,030
And that's the same as the covariance between time ten and time twelve.

132
00:10:05,340 --> 00:10:07,500
The distance between all of these is two.
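To make this concrete, here is a small simulation sketch using only the standard library. It assumes a stationary autoregressive process with coefficient 0.5 and unit-variance Gaussian noise (my choice for illustration), and estimates the lag-2 autocovariance separately on the first and second halves of the series. The function names are hypothetical.

```python
import random


def simulate_ar1(n, phi=0.5, seed=0):
    """Simulate y[t] = phi * y[t-1] + noise, starting from zero."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y


def sample_autocov(y, lag):
    """Sample covariance between y[t] and y[t + lag]."""
    n = len(y)
    mean = sum(y) / n
    pairs = ((y[t] - mean) * (y[t + lag] - mean) for t in range(n - lag))
    return sum(pairs) / (n - lag)


series = simulate_ar1(200_000)
# Same lag, two different stretches of time.
cov_first_half = sample_autocov(series[:100_000], lag=2)
cov_second_half = sample_autocov(series[100_000:], lag=2)
print(cov_first_half, cov_second_half)
```

For this particular process the theoretical lag-2 autocovariance is 0.25 / 0.75, about 0.33, so both estimates should land near that value regardless of which stretch of the series they come from.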

133
00:10:08,370 --> 00:10:14,550
In other words, the relationship between each value in the time series remains constant over time.

134
00:10:15,420 --> 00:10:19,370
This actually makes a lot of sense in terms of autoregressive models.

135
00:10:19,890 --> 00:10:24,910
If this relationship were to change over time, then we wouldn't be able to fit any such model.

136
00:10:25,530 --> 00:10:28,800
That's why we want stationarity when we fit these kinds of models.

137
00:10:29,340 --> 00:10:36,510
For example, suppose our autoregressive model is y of t equals 0.5 times y of t minus

138
00:10:36,510 --> 00:10:36,930
one.

139
00:10:37,590 --> 00:10:42,750
But imagine if this were only true for t equals two and not t equals three and so forth.

140
00:10:43,140 --> 00:10:44,760
Then this equation doesn't work.

141
00:10:45,210 --> 00:10:50,220
In order for this equation to work, this relationship has to hold for all times.

142
00:10:55,050 --> 00:10:58,690
Note that this also implies that the variance remains constant over time.

143
00:10:59,250 --> 00:11:04,470
If it's true that the autocovariance depends only on the time difference, then it doesn't matter what

144
00:11:04,470 --> 00:11:06,330
time we pick.

145
00:11:06,330 --> 00:11:08,580
K(1, 1) is just equal to K(0, 0).

146
00:11:08,940 --> 00:11:11,670
K(2, 2) is just equal to K(0, 0).

147
00:11:12,240 --> 00:11:18,300
So that's why if we see the variance change over time, as we do with volatility clustering, then we

148
00:11:18,300 --> 00:11:21,420
take that as evidence that the time series is non-stationary.

149
00:11:26,200 --> 00:11:29,170
OK, so why is the concept of stationarity useful?

150
00:11:30,130 --> 00:11:34,960
Well, you've already learned one important reason, which is that if your time series is not stationary,

151
00:11:35,170 --> 00:11:37,970
then we cannot even use a single model to forecast.

152
00:11:38,500 --> 00:11:43,180
This is because if our time series were not stationary, then we would need a different model at each

153
00:11:43,180 --> 00:11:45,620
point in time, which is clearly not useful.

154
00:11:46,260 --> 00:11:48,070
Another simpler reason is this.

155
00:11:49,300 --> 00:11:54,460
Recall that when we're given a time series, we would often like to compute statistics from the time

156
00:11:54,460 --> 00:11:55,050
series.

157
00:11:55,600 --> 00:11:58,240
For example, what is the mean and what is the variance?

158
00:11:58,780 --> 00:12:04,120
Well, if the time series is changing over time, then it makes no sense to refer to the mean or the

159
00:12:04,120 --> 00:12:04,840
variance.

160
00:12:05,410 --> 00:12:09,700
This is because each point in time has a different mean and a different variance.

161
00:12:10,180 --> 00:12:12,570
That is, for a non-stationary time series,

162
00:12:12,820 --> 00:12:15,440
the mean and variance could be functions of time.

163
00:12:16,180 --> 00:12:20,680
So as an example, imagine that you wanted to compute the mean daily stock return.

164
00:12:21,490 --> 00:12:26,130
In order to do that, you would have to take the daily stock return over some window of time.

165
00:12:26,740 --> 00:12:30,380
That only makes sense if you believe the stock return to be stationary.

166
00:12:31,030 --> 00:12:34,570
So it clearly wouldn't make sense to compute something like the mean price.

167
00:12:35,290 --> 00:12:39,850
Typically, when we want to compute estimates like the sample mean and the sample variance, we need

168
00:12:39,850 --> 00:12:40,540
samples.

169
00:12:41,290 --> 00:12:44,480
In other words, samples which all come from the same distribution.

170
00:12:45,370 --> 00:12:49,920
Otherwise, it wouldn't really make sense to combine the samples to compute some statistic.

171
00:12:50,260 --> 00:12:55,390
It only makes sense to take samples from different points in time if their properties do not change

172
00:12:55,390 --> 00:12:56,070
over time.
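A quick sketch of this point, using only the standard library. The series here is synthetic and purely an assumption for illustration: a steady upward trend plus noise stands in for a price, and its first difference stands in for a return.

```python
import random


def mean(values):
    return sum(values) / len(values)


def first_difference(values):
    """Each value minus the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


rng = random.Random(42)
# Hypothetical price-like series: the mean drifts upward over time.
levels = [100.0 + 0.1 * t + rng.gauss(0.0, 1.0) for t in range(1000)]
changes = first_difference(levels)

half = len(levels) // 2
# How far apart are the window means for the raw series and for its changes?
level_gap = abs(mean(levels[:half]) - mean(levels[half:]))
change_gap = abs(mean(changes[:half]) - mean(changes[half:]))
print(level_gap, change_gap)
```

The window means of the raw series disagree by a large amount, so "the mean" depends on where you look. The window means of the changes are nearly identical, so pooling those samples into one estimate is reasonable.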