1
00:00:00,000 --> 00:00:02,655
Hello everyone.

2
00:00:02,655 --> 00:00:04,690
This is Tural Sadigov, and today we're going to talk about SARIMA processes.

3
00:00:04,690 --> 00:00:10,035
The objectives are to describe seasonal ARIMA models,

4
00:00:10,035 --> 00:00:12,119
which are also called SARIMA models,

5
00:00:12,119 --> 00:00:18,324
and rewrite the seasonal ARIMA models using backshift and difference operators.

6
00:00:18,324 --> 00:00:22,792
So let's recall the ARIMA process X_t.

7
00:00:22,792 --> 00:00:27,210
If we let Y_t be Delta^d X_t – remember,

8
00:00:27,210 --> 00:00:30,267
Delta is the difference operator,

9
00:00:30,267 --> 00:00:33,249
1-B where B is the backshift operator,

10
00:00:33,249 --> 00:00:35,295
d is the order of differencing.

11
00:00:35,295 --> 00:00:39,677
So, we take the difference of X_t d times to obtain Y_t,

12
00:00:39,677 --> 00:00:42,979
and then Y_t becomes an ARMA model,

13
00:00:42,979 --> 00:00:44,802
mixed ARMA model. What does it mean?

14
00:00:44,802 --> 00:00:49,970
It has an autoregressive part right here, the p terms,

15
00:00:49,970 --> 00:00:52,890
and it has a moving average part,

16
00:00:52,890 --> 00:00:56,700
which is a linear combination of the noise terms.

17
00:00:56,700 --> 00:00:59,445
Now when Y_t is ARMA,

18
00:00:59,445 --> 00:01:04,614
then X_t becomes ARIMA, where d is the order of differencing.

19
00:01:04,614 --> 00:01:09,165
Now we can also rewrite this using polynomial operators.

20
00:01:09,165 --> 00:01:13,965
For example, phi(B) is the autoregressive polynomial and theta(B)

21
00:01:13,965 --> 00:01:20,219
is the moving average polynomial, and this becomes our ARMA model.

22
00:01:20,219 --> 00:01:27,680
Now, sometimes our data might contain some seasonality,

23
00:01:27,680 --> 00:01:29,765
so the way to think about this is the following.

24
00:01:29,765 --> 00:01:34,230
Let's say we are looking at the sales of refrigerators and if you look

25
00:01:34,230 --> 00:01:39,870
at the sales in August of this year and then August of last year,

26
00:01:39,870 --> 00:01:43,319
there might be some relationship between those two months.

27
00:01:43,319 --> 00:01:47,049
So there might be some seasonality going on and,

28
00:01:47,049 --> 00:01:51,180
in that case, the observations might repeat themselves after every,

29
00:01:51,180 --> 00:01:52,734
let's say, s observations.

30
00:01:52,734 --> 00:01:55,045
In this case, 12 observations.

31
00:01:55,045 --> 00:01:57,784
So, for a time series of monthly observations,

32
00:01:57,784 --> 00:01:59,939
X_t might depend on annual lags.

33
00:01:59,939 --> 00:02:03,329
For example, X_t might depend on X_{t-12},

34
00:02:03,329 --> 00:02:04,755
which is last August;

35
00:02:04,755 --> 00:02:09,175
X_{t-24}, which was August two years ago; and so forth.

36
00:02:09,175 --> 00:02:12,539
In that case, we say that we have

37
00:02:12,539 --> 00:02:18,780
seasonality and the span of the seasonality or the period is s=12.

38
00:02:18,780 --> 00:02:23,745
Now it is also possible that our data comes as quarterly earnings, for example.

39
00:02:23,745 --> 00:02:25,514
We have looked at such data.

40
00:02:25,514 --> 00:02:28,319
We're going to revisit Johnson and Johnson

41
00:02:28,319 --> 00:02:31,444
which was about quarterly earnings of a company.

42
00:02:31,444 --> 00:02:37,270
In that case, the span of the seasonality is actually just four.

43
00:02:37,270 --> 00:02:38,430
So, in that case,

44
00:02:38,430 --> 00:02:42,145
we would like to discuss seasonal ARIMA models,

45
00:02:42,145 --> 00:02:46,120
and Box and Jenkins basically developed these models.

46
00:02:46,120 --> 00:02:50,090
So, what is a pure seasonal ARMA process?

47
00:02:50,090 --> 00:02:56,669
Well, a seasonal ARMA process is basically an ARMA process, but instead of little p and q,

48
00:02:56,669 --> 00:03:00,180
we use capital P and Q for the order of the autoregressive terms,

49
00:03:00,180 --> 00:03:01,830
order of the moving average terms,

50
00:03:01,830 --> 00:03:04,469
and s is for span of the seasonality.

51
00:03:04,469 --> 00:03:07,020
And we have the following format.

52
00:03:07,020 --> 00:03:10,680
The only difference between this equation, or

53
00:03:10,680 --> 00:03:14,699
this process, and ARMA is that we now have B^s here.

54
00:03:14,699 --> 00:03:18,715
The autoregressive polynomial is the following:

55
00:03:18,715 --> 00:03:23,949
1 - Phi_1 B^s (not B) - Phi_2 B^{2s} (not B^2), and so forth.

56
00:03:23,949 --> 00:03:27,719
And the moving average polynomial is

57
00:03:27,719 --> 00:03:29,210
very similar, not exactly the same,

58
00:03:29,210 --> 00:03:31,020
it's very similar, but we have B^s,

59
00:03:31,020 --> 00:03:32,169
B^{2s}, and so forth.

60
00:03:32,169 --> 00:03:36,090
Now, just like in the mixed ARMA process,

61
00:03:36,090 --> 00:03:37,800
we want our process,

62
00:03:37,800 --> 00:03:42,870
the pure seasonal ARMA process, or pure SARMA process,

63
00:03:42,870 --> 00:03:45,349
to be stationary and invertible.

64
00:03:45,349 --> 00:03:47,573
And for that reason, just like before,

65
00:03:47,573 --> 00:03:51,814
we're going to require that the complex roots of these polynomials

66
00:03:51,814 --> 00:03:57,900
all lie outside the unit circle.
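The root condition just stated can be checked numerically. The following is only a sketch: the helper name and the values Phi = 0.5 and Phi = 1.2 are made up for illustration and do not come from the lecture.
```python
import numpy as np
# Hypothetical helper (not from the lecture): checks whether all roots of
# c_0 + c_1 z + c_2 z^2 + ... lie strictly outside the unit circle
def roots_outside_unit_circle(coeffs_low_to_high):
    # np.roots expects coefficients from the highest power down to the constant
    roots = np.roots(coeffs_low_to_high[::-1])
    return bool(np.all(np.abs(roots) > 1.0))
# Seasonal AR polynomial 1 - Phi * z^12 with an assumed value Phi = 0.5
s, Phi = 12, 0.5
seasonal_ar = np.zeros(s + 1)
seasonal_ar[0] = 1.0
seasonal_ar[s] = -Phi
is_stationary = roots_outside_unit_circle(seasonal_ar)
# With an assumed Phi = 1.2 every root has modulus (1/1.2)^(1/12), below 1
explosive_ar = seasonal_ar.copy()
explosive_ar[s] = -1.2
is_explosive_stationary = roots_outside_unit_circle(explosive_ar)
```
The same helper could be applied to a seasonal moving average polynomial to check invertibility, since the condition on the roots is the same.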

67
00:03:57,900 --> 00:03:59,425
So let me give you an example.

68
00:03:59,425 --> 00:04:02,854
For example, suppose I have a seasonal ARMA(1,0)_12.

69
00:04:02,854 --> 00:04:06,030
So, moving average. I have P=1.

70
00:04:06,030 --> 00:04:09,525
I'm sorry, autoregressive order P=1;

71
00:04:09,525 --> 00:04:11,435
moving average order is zero,

72
00:04:11,435 --> 00:04:12,849
so I don't have any moving average term;

73
00:04:12,849 --> 00:04:14,389
and seasonality is 12.

74
00:04:14,389 --> 00:04:21,000
We basically have only an autoregressive polynomial of degree one,

75
00:04:21,000 --> 00:04:24,640
but it's not really degree one, it's degree one times s, which is 12.

76
00:04:24,640 --> 00:04:26,310
And if I rewrite this,

77
00:04:26,310 --> 00:04:28,485
if I expand it and rewrite it,

78
00:04:28,485 --> 00:04:32,355
you can see that X_t here depends on annual lags.

79
00:04:32,355 --> 00:04:34,274
For example, if this is monthly data,

80
00:04:34,274 --> 00:04:38,904
it depends on X_{t-12} plus some noise.

81
00:04:38,904 --> 00:04:41,234
Let's look at ARMA(1,1)_12.

82
00:04:41,234 --> 00:04:45,164
In this case, we don't only have an autoregressive polynomial,

83
00:04:45,164 --> 00:04:47,370
we also have a moving average polynomial.

84
00:04:47,370 --> 00:04:50,504
Again, degree one times s, which is 12.

85
00:04:50,504 --> 00:04:52,000
And if I expand it,

86
00:04:52,000 --> 00:04:55,620
we obtain that X_t depends on X_{t-12},

87
00:04:55,620 --> 00:05:00,959
but it also depends on the noise from last year if this is monthly data.

88
00:05:00,959 --> 00:05:08,555
Now, in general, not just for a pure seasonal ARMA process,

89
00:05:08,555 --> 00:05:11,439
if you look at a seasonal ARIMA process,

90
00:05:11,439 --> 00:05:14,930
then we have seven parameters.

91
00:05:14,930 --> 00:05:16,220
We have p, d, q,

92
00:05:16,220 --> 00:05:19,079
capital P, capital D,

93
00:05:19,079 --> 00:05:24,380
capital Q, and s. And this is the polynomial form of that process.

94
00:05:24,380 --> 00:05:25,849
We have (1- B)^d;

95
00:05:25,849 --> 00:05:29,970
this is basically the difference operator d many times.

96
00:05:29,970 --> 00:05:33,490
This comes from the I in ARIMA.

97
00:05:33,490 --> 00:05:39,535
And I also have a seasonal differencing: (1-B^s)^D.

98
00:05:39,535 --> 00:05:40,639
So this is seasonal differencing,

99
00:05:40,639 --> 00:05:42,864
this is non-seasonal differencing.

100
00:05:42,864 --> 00:05:45,944
We have the usual autoregressive polynomial,

101
00:05:45,944 --> 00:05:47,355
but we also have

102
00:05:47,355 --> 00:05:53,230
a seasonal autoregressive process – I'm sorry – a seasonal autoregressive polynomial.

103
00:05:53,230 --> 00:05:55,365
If you look at the right-hand side,

104
00:05:55,365 --> 00:05:58,004
we have the moving average polynomial,

105
00:05:58,004 --> 00:05:59,460
we have the seasonal moving

106
00:05:59,460 --> 00:06:04,170
average polynomial as well and all of them are specified right here.
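Since the slide is not visible in the transcript, here is one way the polynomial form described above could be written out. This is a reconstruction under the sign convention used elsewhere in the lecture (minus signs in the autoregressive polynomials, plus signs in the moving average polynomials); the subscripted polynomial names are an assumed notation.
```latex
\[
\Phi_P(B^s)\,\phi_p(B)\,(1 - B^s)^D (1 - B)^d X_t = \Theta_Q(B^s)\,\theta_q(B)\, Z_t
\]
\begin{align*}
\phi_p(B) &= 1 - \phi_1 B - \dots - \phi_p B^p, & \theta_q(B) &= 1 + \theta_1 B + \dots + \theta_q B^q, \\
\Phi_P(B^s) &= 1 - \Phi_1 B^s - \dots - \Phi_P B^{Ps}, & \Theta_Q(B^s) &= 1 + \Theta_1 B^s + \dots + \Theta_Q B^{Qs}.
\end{align*}
```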

107
00:06:04,170 --> 00:06:07,584
In these SARIMA models,

108
00:06:07,584 --> 00:06:08,939
basically we have two parts.

109
00:06:08,939 --> 00:06:12,199
We have a non-seasonal part and we have a seasonal part.

110
00:06:12,199 --> 00:06:16,220
So p here is the order of non-seasonal AR terms,

111
00:06:16,220 --> 00:06:19,375
d is the order of non-seasonal differencing,

112
00:06:19,375 --> 00:06:22,564
q is the order of non-seasonal moving average terms,

113
00:06:22,564 --> 00:06:27,660
capital P is the order of seasonal autoregressive terms.

114
00:06:27,660 --> 00:06:32,529
In other words, sometimes we say SAR terms.

115
00:06:32,529 --> 00:06:36,195
D, capital D, is the order of seasonal differencing,

116
00:06:36,195 --> 00:06:38,329
in other words, the power on (1-B^s).

117
00:06:38,329 --> 00:06:41,269
And Q is going to be the order of seasonal MA terms and

118
00:06:41,269 --> 00:06:44,269
sometimes we're gonna write this as SMA terms.

119
00:06:44,269 --> 00:06:47,751
Now, as in ARIMA processes,

120
00:06:47,751 --> 00:06:50,849
differencing – we don't have much differencing usually,

121
00:06:50,849 --> 00:06:54,134
it's either one or two in practice.

122
00:06:54,134 --> 00:06:56,324
So if capital D=1,

123
00:06:56,324 --> 00:07:03,276
then the seasonal difference operator Delta_s applied to X_t is (1-B^s)X_t,

124
00:07:03,276 --> 00:07:07,850
and this is basically X_t-X_{t-s}.

125
00:07:07,850 --> 00:07:09,264
So you look at the differences.

126
00:07:09,264 --> 00:07:11,225
So if this is again monthly data,

127
00:07:11,225 --> 00:07:14,279
we are basically looking at the difference between

128
00:07:14,279 --> 00:07:17,930
the sales last August and this August.

129
00:07:17,930 --> 00:07:23,219
If D=2, then we are looking at differencing, double differencing, right?

130
00:07:23,219 --> 00:07:25,814
It's gonna be (1-B^s)^2.

131
00:07:25,814 --> 00:07:33,639
If I expand it and I open it up, it becomes X_t-2X_{t-s}+X_{t-2s}.
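The seasonal differencing just described can be tried on a toy series. This is a minimal sketch, assuming made-up quarterly-style numbers with s = 4; the helper name is hypothetical.
```python
import numpy as np
# Hypothetical helper: applies (1 - B^s) to a series, D times
def seasonal_difference(x, s, D=1):
    x = np.asarray(x, dtype=float)
    for _ in range(D):
        # Subtract the observation s steps earlier; the result is s points shorter
        x = x[s:] - x[:-s]
    return x
# Toy series (made-up numbers), span of the seasonality s = 4
x = np.array([3.0, 5.0, 8.0, 4.0, 4.0, 7.0, 11.0, 6.0, 6.0, 10.0, 15.0, 9.0])
s = 4
d1 = seasonal_difference(x, s, D=1)   # X_t - X_{t-s}
d2 = seasonal_difference(x, s, D=2)   # (1 - B^s)^2 X_t
# Expanded form of the double difference: X_t - 2 X_{t-s} + X_{t-2s}
expanded = x[2 * s:] - 2 * x[s:-s] + x[:-2 * s]
```
Here d2 and expanded hold the same values, which is the expansion of (1-B^s)^2 stated above.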

132
00:07:33,639 --> 00:07:37,480
So let me give you an example of a SARIMA process.

133
00:07:37,480 --> 00:07:42,935
So this is the SARIMA(1,0,0,1,0,1)_12 process.

134
00:07:42,935 --> 00:07:45,980
So this is little p, little d, little q.

135
00:07:45,980 --> 00:07:47,449
This is capital P,

136
00:07:47,449 --> 00:07:51,314
capital D and capital Q and seasonality is 12.

137
00:07:51,314 --> 00:07:53,314
I can see that there is no differencing,

138
00:07:53,314 --> 00:07:56,290
so I don't expect any differencing in my model.

139
00:07:56,290 --> 00:08:01,454
And there is no moving average term in my model,

140
00:08:01,454 --> 00:08:03,754
but there is a seasonal moving average term.

141
00:08:03,754 --> 00:08:07,074
There is a seasonal autoregressive term and there's

142
00:08:07,074 --> 00:08:10,920
the usual non-seasonal autoregressive term as well.

143
00:08:10,920 --> 00:08:15,852
So (1-phi_1 B) – that is basically coming from this one,

144
00:08:15,852 --> 00:08:18,879
the degree of that polynomial is one here.

145
00:08:18,879 --> 00:08:25,439
This is coming from this one which is the degree of seasonal autoregressive polynomial,

146
00:08:25,439 --> 00:08:27,605
which is one times 12,

147
00:08:27,605 --> 00:08:29,295
which is 12 here.

148
00:08:29,295 --> 00:08:35,529
And this part, this is the seasonal moving average polynomial, with degree one times 12.

149
00:08:35,529 --> 00:08:40,259
If I expand this just like a polynomial,

150
00:08:40,259 --> 00:08:44,100
I obtain that X_t actually depends on X_{t-1}.

151
00:08:44,100 --> 00:08:47,190
So, X_t depends on the previous lag,

152
00:08:47,190 --> 00:08:49,394
it depends on the previous year,

153
00:08:49,394 --> 00:08:53,945
and surprisingly, it also depends on X_{t-13}.

154
00:08:53,945 --> 00:08:57,504
Right? So this is one lag before the last year's data.

155
00:08:57,504 --> 00:09:03,179
And it also depends on noise from last year as well.
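The expansion described for SARIMA(1,0,0,1,0,1)_12 can be written step by step. This is a worked sketch using the lecture's sign convention; the symbols phi_1, Phi_1 and Theta_1 are assumed names for the three coefficients.
```latex
\begin{align*}
(1 - \phi_1 B)(1 - \Phi_1 B^{12}) X_t &= (1 + \Theta_1 B^{12}) Z_t \\
(1 - \phi_1 B - \Phi_1 B^{12} + \phi_1 \Phi_1 B^{13}) X_t &= Z_t + \Theta_1 Z_{t-12} \\
X_t &= \phi_1 X_{t-1} + \Phi_1 X_{t-12} - \phi_1 \Phi_1 X_{t-13} + Z_t + \Theta_1 Z_{t-12}
\end{align*}
```
The X_{t-13} term is the cross term produced by multiplying the non-seasonal and seasonal autoregressive polynomials.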

156
00:09:03,179 --> 00:09:04,830
Let me give you another example.

157
00:09:04,830 --> 00:09:08,316
This is SARIMA(0,1,1,0,0,1)_4.

158
00:09:08,316 --> 00:09:13,559
Here, we do not have autoregressive terms or seasonal autoregressive terms,

159
00:09:13,559 --> 00:09:16,259
and we do not have seasonal differencing,

160
00:09:16,259 --> 00:09:18,504
but we have moving average terms;

161
00:09:18,504 --> 00:09:20,930
we have seasonal moving average terms;

162
00:09:20,930 --> 00:09:24,544
we also have non-seasonal differencing.

163
00:09:24,544 --> 00:09:27,694
Now this four here is the span of the seasonality,

164
00:09:27,694 --> 00:09:30,950
so you can think of this as quarterly data.

165
00:09:30,950 --> 00:09:38,455
So (1-B) comes from non-seasonal differencing; and (1+theta_1 B),

166
00:09:38,455 --> 00:09:41,549
that's coming from non-seasonal moving average terms;

167
00:09:41,549 --> 00:09:45,419
and (1+Theta_1 B^4), that comes

168
00:09:45,419 --> 00:09:50,754
from the seasonal moving average term, whose degree is 1 times 4, which is 4.

169
00:09:50,754 --> 00:09:54,809
If I expand this and put everything to the right-hand side,

170
00:09:54,809 --> 00:09:57,620
we obtain that X_t depends on X_{t-1}.

171
00:09:57,620 --> 00:10:00,595
This is because of non-seasonal differencing.

172
00:10:00,595 --> 00:10:03,547
And then we have noise terms from previous lags.

173
00:10:03,547 --> 00:10:08,299
Z_t, Z_{t-1}, this is coming from the moving average part,

174
00:10:08,299 --> 00:10:14,919
and Z_{t-4}, Z_{t-5}, this is from the seasonal moving average part of the model.
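The noise lags in this example can be recovered by multiplying the two moving average polynomials. This is a small sketch with made-up coefficient values (theta1 = 0.4, Theta1 = 0.6), not values from the lecture.
```python
import numpy as np
# Coefficients are stored from lag 0 upward; parameter values are made up
theta1, Theta1, s = 0.4, 0.6, 4
nonseasonal_ma = np.array([1.0, theta1])      # 1 + theta_1 B
seasonal_ma = np.zeros(s + 1)                 # 1 + Theta_1 B^4
seasonal_ma[0], seasonal_ma[s] = 1.0, Theta1
# Multiplying polynomials in B is a convolution of their coefficient arrays
ma_product = np.convolve(nonseasonal_ma, seasonal_ma)
# Lags of Z that appear with a nonzero coefficient on the right-hand side
noise_lags = [int(k) for k in np.flatnonzero(ma_product)]
```
The nonzero lags are 0, 1, 4 and 5, matching Z_t, Z_{t-1}, Z_{t-4} and Z_{t-5}; the lag-5 coefficient is the product theta1 * Theta1.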

175
00:10:14,919 --> 00:10:16,125
So, what have you learned?

176
00:10:16,125 --> 00:10:17,934
You have learned how to describe

177
00:10:17,934 --> 00:10:21,164
seasonal autoregressive integrated moving average models;

178
00:10:21,164 --> 00:10:24,549
and you have learned how to rewrite seasonal autoregressive integrated moving

179
00:10:24,549 --> 00:10:28,000
average models using backshift and difference operators.