1
00:00:00,590 --> 00:00:04,140
In this lecture,
we'll talk about the autocovariance function.

2
00:00:04,140 --> 00:00:05,790
Objectives are the following.

3
00:00:05,790 --> 00:00:10,080
We'll recall random variables from
our introductory statistics and

4
00:00:10,080 --> 00:00:15,600
probability class, and we'll recall
the covariance of two random variables.

5
00:00:15,600 --> 00:00:17,980
We will give a new
definition to a time series.

6
00:00:17,980 --> 00:00:22,900
We'll characterize time series as
a realization of a stochastic process.

7
00:00:22,900 --> 00:00:26,940
We'll talk about stochastic processes
as well, and

8
00:00:26,940 --> 00:00:29,800
we'll define the autocovariance function.

9
00:00:29,800 --> 00:00:31,330
So what's a random variable?

10
00:00:31,330 --> 00:00:38,010
A random variable is a function that goes
from the sample space to the real numbers.

11
00:00:38,010 --> 00:00:43,040
The members of the sample space are all
possible outcomes of the experiment, and

12
00:00:43,040 --> 00:00:46,730
if we map each possible
outcome of the experiment

13
00:00:46,730 --> 00:00:49,170
to a number on the real line,
we get a random variable.

14
00:00:50,420 --> 00:00:53,360
For people who are familiar
with measure theory,

15
00:00:53,360 --> 00:00:55,570
a random variable is basically
a measurable function.

16
00:00:57,740 --> 00:01:01,140
But for us, we'll look at it
in a slightly different way.

17
00:01:01,140 --> 00:01:03,200
We're going to look at it as a machine.

18
00:01:03,200 --> 00:01:07,520
Basically, it's a machine that
produces these random numbers.

19
00:01:07,520 --> 00:01:12,970
Now once it produces a lot of numbers,
those numbers together are a dataset.

20
00:01:12,970 --> 00:01:18,230
If we start with data like this, we can
say they're all coming from this machine,

21
00:01:18,230 --> 00:01:22,847
this random variable X. If I know
the properties of the random variable,

22
00:01:22,847 --> 00:01:26,347
for example the distribution
of this random variable,

23
00:01:26,347 --> 00:01:29,488
I can say something
meaningful about my dataset.

24
00:01:30,705 --> 00:01:35,568
So here we have a random variable,
actually, on the right-hand

25
00:01:35,568 --> 00:01:41,670
side, but we have a dataset on the left-hand
side: 45, 36, 27. It's a dataset.

26
00:01:41,670 --> 00:01:44,969
But if we assume that it comes
from this random variable X,

27
00:01:44,969 --> 00:01:48,986
we model the dataset with X, and
mathematically we work on X, and

28
00:01:48,986 --> 00:01:54,044
then we infer something meaningful
about the dataset using its properties.

29
00:01:54,044 --> 00:01:56,455
From your probability and
statistics class,

30
00:01:56,455 --> 00:02:00,298
you already know that random variables
might be discrete or continuous.

31
00:02:00,298 --> 00:02:05,514
A discrete random variable
produces countably many

32
00:02:05,514 --> 00:02:09,850
possible points, numbers on the real line.

33
00:02:09,850 --> 00:02:13,760
For example, on the left-hand side, X is
a discrete random variable. Possible

34
00:02:13,760 --> 00:02:20,950
outcomes of X are 20, 30, 57 and so
forth, so basically they're countably many.

35
00:02:20,950 --> 00:02:25,551
But on the right hand side, we have
a continuous random variable y, and

36
00:02:25,551 --> 00:02:30,630
it might take any value, any
point between, let's say, 10 and 60.

37
00:02:30,630 --> 00:02:35,170
Now before we do the experiment,
everything is random, right?

38
00:02:35,170 --> 00:02:36,944
You flip a coin,
you have randomness.

39
00:02:36,944 --> 00:02:38,600
It can be heads or tails.

40
00:02:38,600 --> 00:02:43,720
But once we flip the coin, the result of the
experiment is known, and the randomness is gone.

41
00:02:43,720 --> 00:02:46,350
So the same thing happens here, right?

42
00:02:46,350 --> 00:02:50,290
Once we do the experiment,
let's say X becomes 20,

43
00:02:50,290 --> 00:02:56,300
the discrete random variable X becomes
20, which means the randomness is gone now.

44
00:02:56,300 --> 00:02:59,680
And we have an exact
value for it: it's 20.

45
00:02:59,680 --> 00:03:03,630
We call that 20 a realization
of the random variable X.

46
00:03:03,630 --> 00:03:04,830
Same thing for Y.

47
00:03:04,830 --> 00:03:06,023
Y is a continuous random variable.

48
00:03:06,023 --> 00:03:08,625
But say we do the experiment.

49
00:03:08,625 --> 00:03:11,100
Randomness is gone,
now we have a value for it.

50
00:03:11,100 --> 00:03:13,060
Let's say it's 30.29.

51
00:03:13,060 --> 00:03:17,766
And then we say 30.29 is a realization
of the random variable Y.

52
00:03:19,740 --> 00:03:24,652
If we have two random variables,
X and Y, we learned a notion

53
00:03:24,652 --> 00:03:29,919
called covariance in our probability
class, which measures

54
00:03:29,919 --> 00:03:35,150
the linear dependence between
two random variables, right?

55
00:03:35,150 --> 00:03:37,220
We are talking about this abstractly.

56
00:03:37,220 --> 00:03:40,410
If you have two datasets, covariance will

57
00:03:40,410 --> 00:03:44,900
tell us something about the linear
dependence between the pair of datasets.

58
00:03:44,900 --> 00:03:50,380
But right now, we model each of our
datasets with a random variable, X and Y.

59
00:03:50,380 --> 00:03:53,780
Abstractly, we define
the covariance of X and Y

60
00:03:53,780 --> 00:03:59,340
using the formula: X minus its
expectation, times Y minus its expectation,

61
00:03:59,340 --> 00:04:02,030
and then we take the expectation
of that product.

62
00:04:02,030 --> 00:04:03,990
And that defines the covariance.

63
00:04:03,990 --> 00:04:09,654
And let me just mention that the covariance
of X and Y is the covariance of Y and X,

64
00:04:09,654 --> 00:04:15,791
so it's symmetric.
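
Written out, the definition described in words above can be sketched as follows. The notation E[.] for expectation and Cov for covariance is assumed here; the transcript does not show the slide.

```latex
\operatorname{Cov}(X, Y) = E\big[(X - E[X])\,(Y - E[Y])\big] = \operatorname{Cov}(Y, X)
```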

65
00:04:17,600 --> 00:04:18,880
We talked about random variables, but

66
00:04:18,880 --> 00:04:23,090
now put a lot of random variables
together and arrange them in a sequence.

67
00:04:23,090 --> 00:04:25,230
For example,
there's the first random variable X1.

68
00:04:25,230 --> 00:04:30,160
So at time one it's X1, at
time two it's X2, at time three it's X3,

69
00:04:30,160 --> 00:04:32,940
and now you have a sequence
of random variables.

70
00:04:32,940 --> 00:04:35,440
We call it a stochastic process.

71
00:04:35,440 --> 00:04:39,270
Each one of these random variables
might have its own distribution,

72
00:04:39,270 --> 00:04:44,440
might have its own expectation,
might have its own variance.

73
00:04:44,440 --> 00:04:47,430
But the way to think about
a stochastic process is to

74
00:04:47,430 --> 00:04:50,260
contrast it with a deterministic process.

75
00:04:50,260 --> 00:04:52,080
In deterministic processes, for example,

76
00:04:52,080 --> 00:04:55,820
take the solution of an
ordinary differential equation.

77
00:04:55,820 --> 00:04:59,878
You start at some point and the solution
of the [INAUDIBLE] will tell you the exact

78
00:04:59,878 --> 00:05:03,500
trajectory, so you know exactly where
you're going to be at the next time,

79
00:05:03,500 --> 00:05:05,898
next time step,
next time step and so forth.

80
00:05:05,898 --> 00:05:10,110
A stochastic process is
basically the opposite of that.

81
00:05:10,110 --> 00:05:12,750
At every step you have some randomness.

82
00:05:12,750 --> 00:05:14,260
You don't know exactly
where you're going to be.

83
00:05:14,260 --> 00:05:17,180
But there is some distribution
of X at that time step.

84
00:05:18,220 --> 00:05:20,000
But we don't know exactly
where we're going to be.

85
00:05:20,000 --> 00:05:23,077
So we get some stochastic process.

86
00:05:25,303 --> 00:05:30,605
Now, we are ready to define a time
series in a slightly different way.

87
00:05:30,605 --> 00:05:32,988
Let me remind you of our first definition.

88
00:05:32,988 --> 00:05:35,600
What was a time series?

89
00:05:35,600 --> 00:05:41,180
A time series is a dataset
collected at different times.

90
00:05:41,180 --> 00:05:46,331
But now we say, wait a minute,
maybe there is some stochastic

91
00:05:46,331 --> 00:05:52,768
process going on in the background that
we are not aware of, which is X1, X2, X3,

92
00:05:52,768 --> 00:05:59,602
and so forth, and the realization of X1
is my first data point in the time series,

93
00:05:59,602 --> 00:06:04,685
the realization of X2 is my second
data point in my time series.

94
00:06:04,685 --> 00:06:10,457
So 30, 29, 57, ..., this time series
that I start with, that I am trying to

95
00:06:10,457 --> 00:06:15,930
analyze, is actually a realization of the
stochastic process going on in the background.

96
00:06:15,930 --> 00:06:18,160
So if I know the stochastic process,

97
00:06:18,160 --> 00:06:23,000
if I know X1, X2, X3, and how it changes,
then I can say something meaningful about

98
00:06:23,000 --> 00:06:27,990
my time series. But
realize that from X1, X2, X3, and

99
00:06:27,990 --> 00:06:33,180
so forth, the stochastic process might
come with an ensemble of realizations.

100
00:06:33,180 --> 00:06:35,480
I mean, it might generate its own
ensemble of time series.

101
00:06:35,480 --> 00:06:38,204
But I only have one time series.

102
00:06:38,204 --> 00:06:42,454
By having only one time series,
basically, one point at each time,

103
00:06:42,454 --> 00:06:47,094
we would like to say something
meaningful about the stochastic process.
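
The idea of an ensemble versus a single observed time series can be sketched in Python. This is only an illustration under assumptions not in the lecture: the process is taken to be independent standard Gaussian draws, and the names `simulate_realization`, `ensemble`, and `observed` are hypothetical.

```python
import random


def simulate_realization(n_steps, rng):
    """Draw one realization x_1, ..., x_n of a toy stochastic process.

    Assumption for illustration: each X_t is an independent Normal(0, 1)
    random variable. The lecture does not specify any particular process.
    """
    return [rng.gauss(0.0, 1.0) for _ in range(n_steps)]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# The process could generate many time series (an ensemble of realizations)...
ensemble = [simulate_realization(5, rng) for _ in range(3)]

# ...but in practice we observe only one of them: one point at each time.
observed = ensemble[0]

for i, series in enumerate(ensemble, start=1):
    print(f"realization {i}:", [round(x, 2) for x in series])
```

Every row printed is a different time series from the same process; the analyst usually has access to just one such row.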

104
00:06:49,851 --> 00:06:54,564
The autocovariance function is defined,
basically,

105
00:06:54,564 --> 00:07:00,346
by just taking the covariance of different
elements in our sequence,

106
00:07:00,346 --> 00:07:02,923
in our stochastic process.

107
00:07:02,923 --> 00:07:07,367
If you take Xt and Xs, where s and
t might be different time points, and

108
00:07:07,367 --> 00:07:12,140
we take the covariance of them,
we get gamma(s,t), and we call

109
00:07:12,140 --> 00:07:17,078
that the autocovariance. And if we take
Xt, the covariance of Xt with

110
00:07:17,078 --> 00:07:21,150
itself will of course give
the variance at that time step.
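
In symbols, the two statements above can be sketched like this. The subscript notation X_s, X_t and the operators Cov and Var are assumed; the transcript only gives the spoken form.

```latex
\gamma(s, t) = \operatorname{Cov}(X_s, X_t) = E\big[(X_s - E[X_s])\,(X_t - E[X_t])\big],
\qquad
\gamma(t, t) = \operatorname{Var}(X_t)
```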

111
00:07:22,810 --> 00:07:26,973
Now we are ready to actually define
our autocovariance function which we

112
00:07:26,973 --> 00:07:27,730
call gamma.

113
00:07:29,030 --> 00:07:32,180
Gamma of course will only depend on

114
00:07:33,180 --> 00:07:36,830
the time difference between
these random variables.

115
00:07:36,830 --> 00:07:40,550
In other words, if you look at,
for example, random variable Xt and

116
00:07:40,550 --> 00:07:43,060
random variable Xt+k,

117
00:07:43,060 --> 00:07:45,360
it doesn't matter what t is.

118
00:07:45,360 --> 00:07:51,384
The time difference is k and the time
difference actually decides the nature,

119
00:07:51,384 --> 00:07:54,585
decides the fate of our autocovariance.

120
00:07:54,585 --> 00:07:56,230
And the reason is the following.

121
00:07:56,230 --> 00:08:00,840
We assume we're working with
a stationary time series.

122
00:08:00,840 --> 00:08:05,278
Remember, for a stationary time series
we said that

123
00:08:05,278 --> 00:08:08,333
the properties of one
part of the time series

124
00:08:08,333 --> 00:08:12,220
are the same as the properties of
the other parts of the time series.

125
00:08:13,260 --> 00:08:19,630
So in this case, whether you go
from X1 to Xk+1 or from X10

126
00:08:19,630 --> 00:08:24,480
to X10+k, these are different
parts of the time series,

127
00:08:24,480 --> 00:08:28,530
but each has only
k steps in between,

128
00:08:28,530 --> 00:08:33,930
so the properties of these sections of
the time series must be the same.

129
00:08:33,930 --> 00:08:42,300
So the covariance of Xk+1 with
X1 is the same as that of X10+k with X10.

130
00:08:42,300 --> 00:08:44,020
And we call that gamma k.
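
As a sketch of the stationarity argument just given, with gamma(s,t) denoting the covariance of X_s and X_t as before (subscript notation assumed):

```latex
\gamma_k = \gamma(t, t + k) = \operatorname{Cov}(X_t, X_{t+k}) \quad \text{for every } t,
\qquad \text{e.g. } \operatorname{Cov}(X_1, X_{1+k}) = \operatorname{Cov}(X_{10}, X_{10+k})
```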

131
00:08:44,020 --> 00:08:46,420
So gamma is our autocovariance function.

132
00:08:46,420 --> 00:08:51,050
Gamma k is going to be called
the autocovariance coefficient. But

133
00:08:51,050 --> 00:08:54,510
we usually do not have
the stochastic process, right?

134
00:08:54,510 --> 00:08:58,000
We only have a time series, just
a realization of the stochastic process.

135
00:08:58,000 --> 00:09:01,596
So we're going to use that to
approximate gamma k with Ck,

136
00:09:01,596 --> 00:09:05,048
which we will call
the autocovariance coefficient.
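
The lecture does not state the formula for Ck at this point. A minimal sketch, assuming the commonly used estimator that averages lagged products of deviations from the sample mean and divides by the series length N, could look like this. The function name `sample_autocovariance` and the data values are hypothetical.

```python
def sample_autocovariance(series, k):
    """Estimate gamma_k from a single time series.

    Assumed estimator (not given in the lecture):
        c_k = (1 / N) * sum_{t=1}^{N-k} (x_t - mean) * (x_{t+k} - mean)
    """
    n = len(series)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(series)")
    mean = sum(series) / n
    # Sum over the N - k pairs that are exactly k time steps apart.
    lagged_products = (
        (series[t] - mean) * (series[t + k] - mean) for t in range(n - k)
    )
    return sum(lagged_products) / n


# Illustrative values only; any observed time series could be used here.
series = [45, 36, 27, 30, 29, 57]
for lag in range(3):
    print(f"c_{lag} =", sample_autocovariance(series, lag))
```

With this choice, c_0 is the variance of the observed series computed with divisor N, which mirrors the fact that gamma at lag zero is the variance.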

137
00:09:07,376 --> 00:09:08,683
So what have we learned in this lecture?

138
00:09:08,683 --> 00:09:11,793
We have learned the definition
of a stochastic process,

139
00:09:11,793 --> 00:09:14,920
which is a collection of random variables.

140
00:09:14,920 --> 00:09:18,320
And we learned how to characterize
a time series in a slightly different way,

141
00:09:18,320 --> 00:09:23,500
by realizing that it is actually
a realization of a stochastic process.

142
00:09:23,500 --> 00:09:27,230
And we learned how to define the
autocovariance function of a time series.