1
00:00:11,050 --> 00:00:16,030
In this lecture, we are going to combine what we learned about in the previous lectures and build up

2
00:00:16,030 --> 00:00:17,380
the full ARIMA model.

3
00:00:17,920 --> 00:00:23,410
Before we do that, however, let's first discuss what we get when we combine the AR and the MA models

4
00:00:23,410 --> 00:00:25,640
only, which gives us the ARMA model.

5
00:00:26,140 --> 00:00:31,360
After that, we'll discuss what the I component means and then combine that with the ARMA model to give

6
00:00:31,360 --> 00:00:32,590
us the ARIMA model.

7
00:00:37,460 --> 00:00:42,410
So I think the ARMA model is pretty straightforward once you understand the autoregressive and moving

8
00:00:42,410 --> 00:00:46,510
average parts separately. The ARMA model simply adds them together.

9
00:00:47,120 --> 00:00:53,300
Therefore, you would use this model if you believe that each point in the time series is linearly correlated

10
00:00:53,420 --> 00:00:57,620
with both past points in the time series, as well as past errors of the model.

11
00:00:58,310 --> 00:01:01,280
Note that the abbreviated form of this model is ARMA(p, q).

12
00:01:01,280 --> 00:01:08,150
This is short for autoregressive moving average model, where the autoregressive part has order p and

13
00:01:08,150 --> 00:01:10,160
the moving average part has order q.

14
00:01:15,080 --> 00:01:21,580
Now that we know what ARMA is, let's talk about ARIMA. The I part in ARIMA stands for integrated.

15
00:01:22,040 --> 00:01:27,300
Therefore, the full name for this model is autoregressive integrated moving average.

16
00:01:27,830 --> 00:01:32,440
What is interesting about this model is that the integrated part has no parameters.

17
00:01:32,690 --> 00:01:37,460
So the way it looks is very different from the autoregressive and moving average parts.

18
00:01:37,940 --> 00:01:39,800
So how does the integrated part work?

19
00:01:44,780 --> 00:01:50,300
To understand the integrated model, we have to understand differencing. Suppose that we have some

20
00:01:50,300 --> 00:01:56,810
time series y(t). To perform differencing on this time series, you would define a new time series,

21
00:01:57,020 --> 00:02:02,090
Δy(t), which is defined as y(t) - y(t - 1).

22
00:02:02,900 --> 00:02:07,340
That is, for each point in the time series, we subtract off the previous data point.

23
00:02:08,180 --> 00:02:11,700
Now a good question to ask is, why would you want to do such a thing?

24
00:02:12,380 --> 00:02:16,220
In fact, this is not the first time we have seen differencing in this course.

25
00:02:16,610 --> 00:02:18,310
We have seen it twice previously.

26
00:02:18,950 --> 00:02:20,030
One time we saw differencing

27
00:02:20,030 --> 00:02:21,950
was with log prices.

28
00:02:22,340 --> 00:02:26,150
When we perform differencing on log prices, we get the log return.

29
00:02:26,990 --> 00:02:30,610
As you know, this is a pretty important quantity in finance.

30
00:02:31,190 --> 00:02:37,160
Another time we saw differencing was with the Holt-Winters model. The first difference of the level is

31
00:02:37,160 --> 00:02:39,430
used to estimate the trend of the series.

32
00:02:39,950 --> 00:02:45,890
In other words, one way to think of differencing is detrending, or in other words, separating out the

33
00:02:45,890 --> 00:02:46,450
trend.

34
00:02:47,060 --> 00:02:50,030
In fact, we see this in stock price time series as well.

35
00:02:50,810 --> 00:02:56,540
After taking the first difference of the log prices, we get the log return, which is generally a noisy

36
00:02:56,540 --> 00:02:59,690
signal that fluctuates around a value close to zero.

37
00:03:04,600 --> 00:03:10,930
So why do we want to difference our time series? The reason is that when we use ARMA models, that

38
00:03:10,930 --> 00:03:13,150
is, autoregressive moving average models,

39
00:03:13,420 --> 00:03:14,620
we want the time series

40
00:03:14,620 --> 00:03:17,540
we use the model on to be close to stationary.

41
00:03:18,160 --> 00:03:19,600
Now, what does stationary mean

42
00:03:19,600 --> 00:03:21,530
again? At a high level,

43
00:03:21,580 --> 00:03:25,340
it means that the distribution of your signal does not change over time.

44
00:03:26,050 --> 00:03:30,880
There are more details we can discuss regarding stationarity, but we will leave this for a later lecture.

45
00:03:31,600 --> 00:03:37,780
Stationarity is a nice property because it means that various statistics, such as the mean, variance, and

46
00:03:37,780 --> 00:03:40,930
autocorrelation, will remain constant over time.

47
00:03:41,650 --> 00:03:46,510
This is a good thing when you're trying to fit an ARMA model, because remember that each window

48
00:03:46,510 --> 00:03:50,890
of the time series is like another training point when you are fitting the model.

49
00:03:55,820 --> 00:03:59,910
Now, this slide is optional, but I want to elaborate on the previous point a bit more.

50
00:04:00,560 --> 00:04:04,940
Recall that earlier I talked about training your own autoregressive model using scikit-learn

51
00:04:04,940 --> 00:04:08,860
by first building your data set from the time series.

52
00:04:09,350 --> 00:04:14,840
As you can see, this data set is built out of sliding windows from the original time series.

53
00:04:15,350 --> 00:04:18,140
First, the window covers y1, y2, and y3.

54
00:04:18,590 --> 00:04:20,470
Then we slide it over one step.

55
00:04:20,990 --> 00:04:26,750
Now the window covers y2, y3, and y4. Then we slide it over one step again, and so on.

56
00:04:31,750 --> 00:04:37,990
One important aspect of machine learning is that the data in your data set is assumed to all come from

57
00:04:37,990 --> 00:04:39,190
the same distribution.

58
00:04:39,610 --> 00:04:42,450
If it didn't, then you wouldn't be able to learn anything.

59
00:04:43,150 --> 00:04:48,590
Remember, machine learning is pattern recognition and pattern recognition is machine learning.

60
00:04:49,060 --> 00:04:52,730
If we mess up the pattern, then we mess up the machine learning model.

61
00:04:53,440 --> 00:04:56,680
Imagine, for example, that we had our table of salary data.

62
00:04:57,220 --> 00:05:02,020
Suppose that we work at a software engineering company and all of the coworkers who helped you fill

63
00:05:02,020 --> 00:05:04,630
in your spreadsheet were software engineers.

64
00:05:05,260 --> 00:05:10,060
Then all of a sudden, one of your buddies from the marketing department fills in their data on the

65
00:05:10,060 --> 00:05:10,690
spreadsheet.

66
00:05:11,440 --> 00:05:15,580
But unfortunately, marketers get paid differently than software engineers.

67
00:05:15,580 --> 00:05:21,280
And so it's likely that the distribution for marketers' salaries is different from the distribution for

68
00:05:21,280 --> 00:05:22,960
software engineers' salaries.

69
00:05:23,380 --> 00:05:28,130
Your model might see this data point as an outlier, which might bias the results.

70
00:05:28,750 --> 00:05:32,260
So that's why we want all of our data to come from the same distribution.

71
00:05:32,740 --> 00:05:38,590
Otherwise, what we are trying to learn won't make much sense. If we're trying to predict software engineer

72
00:05:38,590 --> 00:05:39,270
salaries,

73
00:05:39,430 --> 00:05:42,640
we don't want data from people who are not software engineers.

74
00:05:44,870 --> 00:05:50,870
For us, we have a time series, and each position on this time series makes up a new data point in our

75
00:05:50,870 --> 00:05:51,590
training set.

76
00:05:52,190 --> 00:05:57,260
If we would like all these data points to come from the same distribution, then the time series being

77
00:05:57,260 --> 00:06:02,330
stationary accomplishes that because that's exactly what stationary means.

78
00:06:06,970 --> 00:06:11,530
So let's get back to stationarity. Now we know why stationarity is good.

79
00:06:12,040 --> 00:06:17,830
Well, it turns out that differencing a time series often helps to make the time series stationary.

80
00:06:18,490 --> 00:06:23,610
Now, remember that when you're working with real data, nothing is exact and you're always approximating.

81
00:06:24,160 --> 00:06:26,500
Yes, we know about volatility clustering.

82
00:06:26,680 --> 00:06:30,640
So we know that the variance of the log return can fluctuate over time.

83
00:06:31,510 --> 00:06:33,910
Does that mean we throw out ARIMA completely?

84
00:06:34,060 --> 00:06:35,080
The answer is no.

85
00:06:36,560 --> 00:06:41,510
And sometimes we may even find that we have to difference twice in order to get a time series that looks

86
00:06:41,510 --> 00:06:42,590
somewhat stationary.

87
00:06:43,400 --> 00:06:46,280
Usually, however, we don't difference more than twice.

88
00:06:50,970 --> 00:06:53,880
OK, so what does differencing have to do with ARIMA?

89
00:06:54,510 --> 00:07:02,550
We say that a process is I(d) if it is stationary after being differenced d times. An I(d) process is a

90
00:07:02,550 --> 00:07:05,420
process that is integrated to order d.

91
00:07:06,120 --> 00:07:08,640
Once we have this, we can define the ARIMA model.

92
00:07:09,270 --> 00:07:16,310
The ARIMA model has three arguments: p, d, and q. As you recall, p refers to the autoregressive part.

93
00:07:16,710 --> 00:07:21,420
q refers to the moving average part, and now we know that d refers to the integrated part.

94
00:07:22,080 --> 00:07:23,510
Thus ARIMA(p, d, q)

95
00:07:23,520 --> 00:07:26,210
is just a model where we have differenced

96
00:07:26,290 --> 00:07:31,160
the original time series d times before applying the ARMA(p, q) model.

97
00:07:31,800 --> 00:07:34,950
So that's why the I part of ARIMA is strange.

98
00:07:35,460 --> 00:07:40,650
Unlike the autoregressive and moving average parts, which have some formula that you can use to make

99
00:07:40,650 --> 00:07:45,870
predictions, the integrated part describes an operation that you perform on the data.

100
00:07:50,720 --> 00:07:56,120
Now that we know what the full ARIMA model is, we can see that all of the previous models we've discussed

101
00:07:56,270 --> 00:07:58,790
are actually just special cases of ARIMA.

102
00:07:59,300 --> 00:08:03,830
For example, ARIMA(p, 0, 0) is an AR(p) process.

103
00:08:04,400 --> 00:08:07,200
This is also the same as an ARMA(p, 0).

104
00:08:07,970 --> 00:08:12,020
Similarly, ARIMA(0, 0, q) is an ARMA(0, q),

105
00:08:12,020 --> 00:08:18,260
which is also an MA(q). ARIMA(0, d, 0) is an I(d).

106
00:08:23,180 --> 00:08:30,380
One interesting special case is ARIMA(0, 1, 0), or equivalently, I(1). This is what is known

107
00:08:30,380 --> 00:08:33,160
as a random walk, which we discussed previously.

108
00:08:33,830 --> 00:08:35,340
This model looks as follows.

109
00:08:35,750 --> 00:08:41,310
It says that Δy(t) is equal to ε(t), the noise at time t.

110
00:08:42,110 --> 00:08:43,580
Why is this a special case?

111
00:08:44,120 --> 00:08:50,180
Well, it just means that there is no autoregressive part and no moving average part. By omitting these

112
00:08:50,180 --> 00:08:50,620
two,

113
00:08:50,810 --> 00:08:54,840
all that we are left with is noise. The left side, Δy(t),

114
00:08:54,910 --> 00:08:56,690
is just the differenced

115
00:08:56,690 --> 00:09:04,130
time series. Equivalently, it says that y(t) - y(t - 1) is equal to the noise at

116
00:09:04,130 --> 00:09:04,430
time t.

117
00:09:04,430 --> 00:09:11,690
If we move y(t - 1) to the other side, we get that y(t) is equal to y(t - 1)

118
00:09:11,840 --> 00:09:16,610
plus ε(t), which is exactly the random walk formula we discussed earlier.

119
00:09:17,750 --> 00:09:24,260
It says that whatever process in nature leads to our time series, it takes the existing value

120
00:09:24,260 --> 00:09:30,800
y(t - 1), generates some random noise ε(t), and adds them together to get the next

121
00:09:30,800 --> 00:09:31,040
data point

122
00:09:31,040 --> 00:09:31,490
y(t).

123
00:09:36,410 --> 00:09:42,380
Another way to view this, if we think about financial quantities, is this: suppose our data set is a

124
00:09:42,380 --> 00:09:47,300
time series of log prices, then the first difference is the log return.

125
00:09:47,840 --> 00:09:52,390
A random walk says the log return is just ε(t), the noise.

126
00:09:52,940 --> 00:09:58,100
What this tells us is that there is nothing about the log return that is predictable except for the

127
00:09:58,100 --> 00:09:59,830
expected value of the noise.

128
00:10:00,260 --> 00:10:05,380
It does not depend on any past values, which is what an autoregressive model would tell us.

129
00:10:05,870 --> 00:10:11,050
It does not depend on any past noise terms, which is what a moving average model would tell us.

130
00:10:12,020 --> 00:10:18,800
Instead, a random walk says that the log return is purely noise from a single point in time and hence

131
00:10:18,950 --> 00:10:20,520
completely unpredictable,

132
00:10:20,720 --> 00:10:27,260
aside from the mean of the noise. Therefore, if we later fit an ARIMA model to a stock price time

133
00:10:27,260 --> 00:10:34,340
series and we find that the best ARIMA model is ARIMA(0, 1, 0), this tells us that the data follows

134
00:10:34,340 --> 00:10:40,220
a random walk and the return can't be predicted using previous values in the time series.