1
00:00:11,030 --> 00:00:16,600
In this lecture, we are going to continue our discussion on how to choose ARIMA hyperparameters.

2
00:00:17,090 --> 00:00:20,830
This lecture will focus on the q parameter, which is the order of the moving average part.

3
00:00:21,320 --> 00:00:26,570
It's kind of interesting, although entirely inconsequential, that we are discussing these backwards

4
00:00:26,570 --> 00:00:30,540
in terms of the order in which we introduced each component of the ARIMA model.

5
00:00:31,430 --> 00:00:36,830
It turns out that in the ARIMA model, it makes the most sense to discuss the autoregressive part first,

6
00:00:37,010 --> 00:00:39,710
then the moving average and then the integrated part.

7
00:00:40,310 --> 00:00:46,460
However, in terms of hyperparameters, it's easiest to understand stationarity first, then the ACF,

8
00:00:46,460 --> 00:00:51,930
which is the topic of this lecture, and then the PACF, which will be the topic of the next lecture.

9
00:00:52,550 --> 00:00:54,140
In any case, let's continue.

10
00:00:59,060 --> 00:01:04,370
So this lecture is all about the ACF, which, it turns out, will help us determine the q parameter

11
00:01:04,520 --> 00:01:05,660
in the ARIMA model.

12
00:01:06,470 --> 00:01:08,000
So what is the ACF?

13
00:01:08,510 --> 00:01:11,550
ACF stands for autocorrelation function.

14
00:01:12,200 --> 00:01:14,730
Note that a plot of it is also known as the correlogram.

15
00:01:15,590 --> 00:01:21,650
But before we talk about the autocorrelation function, we have to first understand what the autocorrelation

16
00:01:21,650 --> 00:01:22,850
is to begin with.

17
00:01:23,830 --> 00:01:29,080
The autocorrelation is defined similarly to the covariance and the autocovariance.

18
00:01:29,590 --> 00:01:34,960
You will understand this if you didn't skip the second part of the stationarity lecture. In case you

19
00:01:34,960 --> 00:01:36,390
did, let's recap.

20
00:01:37,570 --> 00:01:43,840
As you know, the covariance between any two random variables is defined as this expected value.

21
00:01:44,350 --> 00:01:51,160
The autocovariance is simply the covariance between two specific random variables selected from two

22
00:01:51,160 --> 00:01:54,060
possibly different time points in a time series.

23
00:01:54,550 --> 00:02:00,130
That is, the "auto" part simply means that the two random variables come from the same time series.

24
00:02:00,610 --> 00:02:04,770
Otherwise, it is still just a covariance. "Auto" means self,

25
00:02:04,780 --> 00:02:08,260
so you can hopefully understand the semantics behind this terminology.
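For reference, these are the standard definitions the slide refers to, written in our own notation (the symbols μ_X, μ_t and γ are assumptions, not copied from the slide):

```latex
\begin{aligned}
\operatorname{Cov}(X, Y) &= E\big[(X - \mu_X)(Y - \mu_Y)\big] \\
\gamma(t_1, t_2) &= \operatorname{Cov}(y_{t_1}, y_{t_2}) = E\big[(y_{t_1} - \mu_{t_1})(y_{t_2} - \mu_{t_2})\big]
\end{aligned}
```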

26
00:02:09,070 --> 00:02:15,100
So, as mentioned, the autocorrelation is similarly defined. Recall that correlation is nice because it's

27
00:02:15,100 --> 00:02:17,260
a scaled version of the covariance.

28
00:02:17,800 --> 00:02:23,950
As such, since we can always expect the correlation to be between minus one and plus one, we can expect

29
00:02:23,950 --> 00:02:25,780
the same of the autocorrelation.
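A minimal sketch in plain Python (standard library only; `covariance` and `correlation` are our own helper names) of why the correlation is the convenient, scaled version: rescaling one variable changes the covariance by the same factor but leaves the correlation unchanged, and the correlation stays within [−1, 1]. The autocorrelation applies the same scaling to the autocovariance, ρ(t1, t2) = γ(t1, t2) / (σ_{t1} σ_{t2}).

```python
import math
import random

def covariance(xs, ys):
    """Sample covariance: average product of deviations from the means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Covariance scaled by both standard deviations, so it lies in [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(1000)]
ys = [3 * x + random.gauss(0, 1) for x in xs]  # strongly related to xs
big_ys = [100 * y for y in ys]                 # same relationship, rescaled by 100

print(covariance(xs, ys), covariance(xs, big_ys))    # covariance grows with the scale
print(correlation(xs, ys), correlation(xs, big_ys))  # correlation does not
```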

30
00:02:26,680 --> 00:02:32,560
Note that in some fields outside of time series analysis, such as engineering, the terms autocovariance

31
00:02:32,560 --> 00:02:38,530
and autocorrelation might be used interchangeably, and the scaled version would be specified as the

32
00:02:38,530 --> 00:02:40,420
autocorrelation coefficient.

33
00:02:45,250 --> 00:02:48,220
Now, these equations are nice, but how will they help us choose

34
00:02:48,220 --> 00:02:54,660
q? Now that we understand what the autocorrelation is, we can discuss the autocorrelation function.

35
00:02:55,640 --> 00:03:02,030
Suppose that we have some time series, y_1 up to y_T. If we take the correlation between each

36
00:03:02,030 --> 00:03:07,430
point in the time series and every point in the time series, we will get a big matrix of size

37
00:03:07,430 --> 00:03:08,630
T by T.

38
00:03:09,260 --> 00:03:14,720
In fact, you've already seen how we can do this. But actually, this won't even work, because we don't

39
00:03:14,720 --> 00:03:17,570
even have multiple samples of the time series.

40
00:03:17,570 --> 00:03:19,820
We are only given a single time series.

41
00:03:20,240 --> 00:03:25,220
That is, we only have one version of y_1, one version of y_2, and so on.

42
00:03:25,760 --> 00:03:31,430
In order to take the sample autocorrelation or autocovariance, we would need multiple samples of y_1 and multiple

43
00:03:31,430 --> 00:03:32,300
samples of y_2.

44
00:03:32,840 --> 00:03:37,610
So we can't use the typical formula for the sample autocorrelation that you saw before.

45
00:03:42,440 --> 00:03:49,700
In fact, it turns out that the autocorrelation function is defined as if the time series were stationary.

46
00:03:50,330 --> 00:03:56,810
As you recall, stationarity, whether weak-sense or strong-sense, implies that the

47
00:03:56,810 --> 00:03:59,380
autocorrelation remains constant over time.

48
00:04:00,110 --> 00:04:06,410
Since that is the case, the autocorrelation is only a function of the distance between any two data

49
00:04:06,410 --> 00:04:06,920
points,

50
00:04:06,920 --> 00:04:11,030
y_{t1} and y_{t2}. We will denote this distance τ (tau).

51
00:04:11,420 --> 00:04:14,420
So t2 − t1 = τ.

52
00:04:15,770 --> 00:04:19,060
Therefore, the autocorrelation is a function of τ alone.

53
00:04:19,250 --> 00:04:25,400
And since we assume that the autocorrelation remains constant over time, we can use all of the samples

54
00:04:25,400 --> 00:04:29,840
from a single time series to compute the autocorrelation for any τ.

55
00:04:30,440 --> 00:04:33,470
I understand that describing this in words can be confusing.

56
00:04:33,800 --> 00:04:36,920
However, the formula helps to provide lots of insight.
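The slide itself is not part of this transcript. Reconstructed from the narration that follows (divide by T − τ, divide by σ², sum the products (y_t − μ)(y_{t+τ} − μ)), with μ and σ² the mean and variance shared by every point of the stationary series, the sample autocorrelation at lag τ reads:

```latex
\hat{\rho}(\tau) = \frac{1}{(T - \tau)\,\sigma^2} \sum_{t=1}^{T-\tau} (y_t - \mu)(y_{t+\tau} - \mu)
```

In practice μ and σ² are themselves estimated from the same single series.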

57
00:04:38,480 --> 00:04:41,270
The components of this formula should all seem familiar.

58
00:04:42,020 --> 00:04:46,500
First, we divide by T − τ because that's how many products we are adding up.

59
00:04:47,090 --> 00:04:52,340
We divide by σ², since without this we would just have the autocovariance formula.

60
00:04:52,460 --> 00:04:59,280
And the autocorrelation is the autocovariance divided by the standard deviations (the σ's) of the two random variables under consideration.

61
00:04:59,840 --> 00:05:05,390
But again, since we are assuming stationarity, every point in the time series has the

62
00:05:05,390 --> 00:05:06,190
same σ.

63
00:05:06,380 --> 00:05:08,740
And so that's why we divide by σ twice, that is, by σ².

64
00:05:09,980 --> 00:05:17,480
Finally, inside the summation we have (y_t − μ) times (y_{t+τ} − μ), which is exactly

65
00:05:17,480 --> 00:05:21,020
what goes inside the expected value for the autocorrelation.

66
00:05:21,770 --> 00:05:27,050
In other words, after considering every component of this equation, you should be convinced that this

67
00:05:27,050 --> 00:05:29,990
does in fact compute the sample autocorrelation.
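A direct, unoptimized translation of that formula into plain Python, offered as a sketch: `sample_acf` is a hypothetical helper name, and μ and σ² are estimated from the whole series, which the stationarity assumption is what justifies. Library implementations can differ in details; for example, statsmodels' `acf` divides by T rather than T − τ unless its `adjusted` option is set.

```python
import random

def sample_acf(ys, max_lag):
    """Sample autocorrelation rho(tau) for tau = 0..max_lag.

    Follows the lecture's formula: average (y_t - mu)(y_{t+tau} - mu) over the
    T - tau available pairs, then divide by sigma^2. mu and sigma^2 are
    estimated once from the whole series (assumes stationarity).
    """
    T = len(ys)
    mu = sum(ys) / T
    var = sum((y - mu) ** 2 for y in ys) / T
    acf = []
    for tau in range(max_lag + 1):
        pairs = T - tau  # number of (y_t, y_{t+tau}) pairs we can form
        s = sum((ys[t] - mu) * (ys[t + tau] - mu) for t in range(pairs))
        acf.append(s / (pairs * var))
    return acf

random.seed(1)
noise = [random.gauss(0, 1) for _ in range(500)]
print([round(r, 3) for r in sample_acf(noise, 5)])  # lag 0 is 1; other lags near 0
```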

68
00:05:34,910 --> 00:05:39,620
So what is the consequence of having an autocorrelation function that is only a function of the

69
00:05:39,620 --> 00:05:46,400
lag τ? Note that we call this the lag because it's as if we are comparing two copies of the same time series,

70
00:05:46,520 --> 00:05:50,450
except that one of them is lagging behind by τ time steps.

71
00:05:51,800 --> 00:05:53,610
Well, what we get is a graph.

72
00:05:54,110 --> 00:06:01,070
This graph plots ρ, the autocorrelation, on the vertical axis and τ, the lag, on the horizontal axis.

73
00:06:01,700 --> 00:06:07,610
Such a plot is usually called the ACF plot. In some languages, such as R, this function is built right

74
00:06:07,610 --> 00:06:09,840
in. In Python,

75
00:06:09,890 --> 00:06:14,630
this function is included in statistical libraries such as SciPy and statsmodels.

76
00:06:15,410 --> 00:06:21,290
Generally, you will see a plot like the following, where the ACF at lag zero is equal to one, and

77
00:06:21,290 --> 00:06:23,610
all the rest of the values are smaller than one in magnitude.

78
00:06:24,740 --> 00:06:31,580
The reason why the ACF at lag zero is one is that this is just the variance divided by the variance,

79
00:06:31,610 --> 00:06:34,760
or σ² over σ², which must be one.

80
00:06:35,480 --> 00:06:40,340
If you want, you can check the equation for the sample autocorrelation to confirm this.
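Substituting τ = 0 into the sample formula shows this directly, assuming σ² is estimated as the average squared deviation from μ over all T points:

```latex
\hat{\rho}(0) = \frac{1}{T\,\sigma^2} \sum_{t=1}^{T} (y_t - \mu)^2 = \frac{\sigma^2}{\sigma^2} = 1
```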

81
00:06:45,090 --> 00:06:51,120
The interesting part of the ACF plot is that it also provides confidence intervals. These confidence intervals

82
00:06:51,120 --> 00:06:53,310
can be thought of as a kind of threshold.

83
00:06:53,820 --> 00:06:59,550
If we see any lagged autocorrelations that are greater in magnitude than the threshold, we would reject the hypothesis that they

84
00:06:59,550 --> 00:07:05,130
are equal to zero and hence say that they are non-zero. To state it more casually,

85
00:07:05,410 --> 00:07:11,190
any point we find in the ACF plot that goes outside of the confidence interval, we can consider

86
00:07:11,190 --> 00:07:12,240
to be non-zero.

87
00:07:13,050 --> 00:07:18,870
Now, it's important to also remember the frequentist interpretation of confidence intervals. Since

88
00:07:18,870 --> 00:07:21,230
this is a 95 percent confidence interval,

89
00:07:21,570 --> 00:07:27,420
that means that even when the true autocorrelation is zero, about five percent of the values in the ACF plot will pop out of the confidence

90
00:07:27,420 --> 00:07:28,410
interval purely by chance.

91
00:07:29,100 --> 00:07:34,680
Therefore, if you have, say, 40 lags, you might expect about two of them to be randomly outside

92
00:07:34,680 --> 00:07:35,690
the confidence interval.

93
00:07:36,360 --> 00:07:40,180
For example, you might see one at lag 25 and one at lag 39.

94
00:07:40,860 --> 00:07:46,080
Usually you can use your intuition to determine that this is happening by chance and is not actually

95
00:07:46,080 --> 00:07:46,770
significant.
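To see the "about two out of 40" arithmetic (0.05 × 40 = 2) play out, here is a small simulation on pure white noise, where every true autocorrelation beyond lag zero is zero. It assumes the common large-sample approximation for the 95% band, ±1.96/√T; plotting libraries may draw a different band (for example, one based on Bartlett's formula that widens with the lag), so treat this as a sketch rather than a reproduction of any particular plot. `sample_acf` and `white_noise_band` are hypothetical helper names.

```python
import math
import random

def sample_acf(ys, max_lag):
    """Sample ACF for lags 0..max_lag (same formula as in the lecture)."""
    T = len(ys)
    mu = sum(ys) / T
    var = sum((y - mu) ** 2 for y in ys) / T
    return [
        sum((ys[t] - mu) * (ys[t + tau] - mu) for t in range(T - tau)) / ((T - tau) * var)
        for tau in range(max_lag + 1)
    ]

def white_noise_band(T, z=1.96):
    """Approximate 95% band (+/- z / sqrt(T)) for the sample ACF of white noise."""
    return z / math.sqrt(T)

random.seed(2)
T, max_lag = 1000, 40
noise = [random.gauss(0, 1) for _ in range(T)]
band = white_noise_band(T)
acf = sample_acf(noise, max_lag)

# Lag 0 is always 1, so only lags 1..max_lag are compared against the band.
outside = [tau for tau in range(1, max_lag + 1) if abs(acf[tau]) > band]
print(f"band = +/-{band:.3f}, lags outside the band: {outside}")
print(f"expected number of chance exceedances: {0.05 * max_lag:.1f}")
```

Any lags reported here are exceedances by chance alone, which is exactly what intuition should learn to discount in a real ACF plot.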

96
00:07:51,810 --> 00:07:57,270
OK, so now that we understand what the ACF plot is and how to look at one, how does this help us

97
00:07:57,270 --> 00:08:00,660
determine the order q for the moving average process?

98
00:08:01,410 --> 00:08:06,800
Well, it turns out to have a very simple interpretation: a moving average process of order

99
00:08:06,810 --> 00:08:12,340
q can be shown to have non-zero autocorrelation up to lag q, and zero autocorrelation beyond it.

100
00:08:12,990 --> 00:08:19,650
That is, if I look at an ACF plot and I see that at lag two the ACF goes outside of the confidence

101
00:08:19,650 --> 00:08:24,140
interval, but after that it does not, then I would choose q = 2.

102
00:08:25,050 --> 00:08:30,410
Usually, the plot will also go outside the confidence interval for smaller lags.

103
00:08:31,170 --> 00:08:37,650
So, as another example, if I look at an ACF plot and I see that at lag five the plot goes outside

104
00:08:37,650 --> 00:08:43,140
of the confidence interval but stays inside for lags greater than five, then I would choose q equals

105
00:08:43,140 --> 00:08:48,750
five, and the same logic applies for q = 3, q = 4, or any other q.
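The rule can be written as a tiny helper. This is a sketch, not a standard library function: `choose_q` and its `max_q` cutoff are hypothetical, with `max_q` standing in for the "use your intuition" step so that an isolated chance spike at a large lag (like the lag-25 example above) is ignored. The ACF values in the usage lines are made up for illustration.

```python
def choose_q(acf, band, max_q):
    """Return the largest lag in 1..max_q whose |ACF| exceeds the band.

    acf[tau] is the autocorrelation at lag tau; acf[0] (always 1) is skipped.
    Returns 0 if no lag in that range is significant, i.e. no MA terms.
    """
    q = 0
    for tau in range(1, min(max_q, len(acf) - 1) + 1):
        if abs(acf[tau]) > band:
            q = tau
    return q

# Made-up ACF values: significant at lags 1 and 2, plus a lone spike at lag 25.
acf = [1.0, 0.62, 0.31, 0.04, -0.03, 0.02] + [0.01] * 19 + [0.12]
print(choose_q(acf, band=0.1, max_q=10))  # 2
```

Setting `max_q` high enough to include lag 25 would return 25, which is exactly the chance-exceedance trap described earlier.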

106
00:08:53,680 --> 00:08:58,360
Now, you might be wondering why does it work this way and how do we know that we can do this?

107
00:08:58,960 --> 00:09:04,090
So I'm not going to derive it right now, but I may in the future in some in-depth lectures.

108
00:09:04,810 --> 00:09:10,480
What you can do is actually calculate, given a moving average process, what the theoretical

109
00:09:10,480 --> 00:09:11,950
autocorrelation would be.

110
00:09:12,670 --> 00:09:15,520
So recall that an MA(1) process looks like this.

111
00:09:15,760 --> 00:09:23,770
y at time t is equal to a constant plus θ1, a coefficient, times the previous error term

112
00:09:23,770 --> 00:09:27,730
ε_{t−1}, plus the current error term ε_t.

113
00:09:28,510 --> 00:09:36,010
In this case, we can calculate ρ1 to be θ1 / (1 + θ1²). ρ2,

114
00:09:36,010 --> 00:09:37,870
ρ3, and so on are all zero.
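A quick numerical check of ρ1 = θ1 / (1 + θ1²), under assumed values θ1 = 0.8, c = 2 and standard normal errors (none of these come from the lecture): simulate a long MA(1) series and compare its sample ACF with the theory. `simulate_ma1` and `sample_acf` are hypothetical helpers.

```python
import random

def simulate_ma1(c, theta1, T, seed=0):
    """Simulate y_t = c + theta1 * eps_{t-1} + eps_t with N(0, 1) errors."""
    rng = random.Random(seed)
    eps = [rng.gauss(0, 1) for _ in range(T + 1)]  # eps[t] is eps_t for t = 0..T
    return [c + theta1 * eps[t - 1] + eps[t] for t in range(1, T + 1)]

def sample_acf(ys, max_lag):
    """Sample ACF for lags 0..max_lag, as in the lecture's formula."""
    T = len(ys)
    mu = sum(ys) / T
    var = sum((y - mu) ** 2 for y in ys) / T
    return [
        sum((ys[t] - mu) * (ys[t + tau] - mu) for t in range(T - tau)) / ((T - tau) * var)
        for tau in range(max_lag + 1)
    ]

theta1 = 0.8
rho1_theory = theta1 / (1 + theta1 ** 2)  # 0.8 / 1.64, about 0.488
r = sample_acf(simulate_ma1(c=2.0, theta1=theta1, T=5000), max_lag=4)
print(f"theory: rho1 = {rho1_theory:.3f}, rho2 = rho3 = rho4 = 0")
print("sample:", [round(x, 3) for x in r])
```

The sample values at lags 2 to 4 will not be exactly zero, only small; that sampling noise is why the ACF plot needs a confidence band in the first place.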

115
00:09:39,390 --> 00:09:44,940
If we have an MA(2) process, we can do the same thing: we find some expression for ρ1 and

116
00:09:44,940 --> 00:09:49,180
ρ2, and we find that ρ3, ρ4, and so on are all zero.

117
00:09:49,860 --> 00:09:52,550
And again, these are just statistics calculations.
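Those calculations generalize to any order. A sketch using the standard textbook result, stated here without derivation: writing θ0 = 1, an MA(q) process has ρk = (Σ_{j=0}^{q−k} θj θ_{j+k}) / (Σ_{j=0}^{q} θj²) for k ≤ q, and ρk = 0 for k > q. `ma_theoretical_acf` is a hypothetical helper, and the coefficient values below are illustrative.

```python
def ma_theoretical_acf(thetas, max_lag):
    """Theoretical ACF of y_t = c + eps_t + theta_1 eps_{t-1} + ... + theta_q eps_{t-q}.

    thetas = [theta_1, ..., theta_q]; theta_0 = 1 is prepended internally.
    Lags beyond q get exactly 0, which is what the ACF cutoff rule relies on.
    """
    coefs = [1.0] + list(thetas)
    q = len(thetas)
    denom = sum(c * c for c in coefs)  # gamma_0 / sigma_eps^2
    acf = []
    for k in range(max_lag + 1):
        if k > q:
            acf.append(0.0)
        else:
            acf.append(sum(coefs[j] * coefs[j + k] for j in range(q - k + 1)) / denom)
    return acf

print([round(x, 3) for x in ma_theoretical_acf([0.8], 3)])       # MA(1): nonzero only at lag 1
print([round(x, 3) for x in ma_theoretical_acf([0.5, 0.4], 4)])  # MA(2): nonzero only at lags 1, 2
```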

118
00:09:53,190 --> 00:09:59,550
Therefore, it is theoretically justified: for an MA(q) process, all of the lags up to q

119
00:09:59,760 --> 00:10:01,850
will have a non-zero autocorrelation, while the lags beyond q will not.

120
00:10:02,370 --> 00:10:05,820
And that is what we are looking for when we look at an ACF plot.