Welcome back to practical time series analysis. In this lecture, we discuss the important concept of stationarity. We'll give you intuition into why stationarity is so important when we try to infer the properties of a process from observed data, and also give a mathematical definition so we can move forward in a structured way. When you're done with this lecture, you should be able to explain to a friend or a colleague why stationarity is so important when we try to predict or infer the properties of a process from a time series that we've acquired. You should also be able to calculate the mean, the variance, and the covariance function in a few very simple cases. Now before looking at the formal definition of a stochastic process, let's just think about a very common, easy-to-understand situation. I could toss a coin four times, and write the outcomes down on a piece of paper. Perhaps I would get h, t, t, t: heads, tails, tails, tails. That's a time series. It's a set of observations. You could also try to model that situation mathematically by lining up four Bernoulli random variables. Remember, a Bernoulli random variable is a success or a failure with some probability. So we've got these four random variables. We could also talk about how they're related to each other. For the coin toss example we are discussing, I would imagine that they are all independent. So we're making a statement about the structure as well: how the random variables are related to each other, in addition to how they're individually characterized. A stochastic process will do the same thing. We'll look at a set of random variables; maybe there are four of them, maybe there are a million, maybe there's a countable infinity, maybe an uncountable infinity. It could really be quite complicated. But each one of the random variables along our process is indexed, let's say, with time.
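If you'd like to play with this yourself, here is a minimal sketch of the coin-toss model: four independent Bernoulli random variables, one per toss. The fair-coin probability p = 0.5 and the seed are my own illustrative choices, not anything from the lecture.

```python
import random

# Four independent Bernoulli trials modeling four coin tosses.
# p = 0.5 (a fair coin) and the seed are illustrative assumptions.
random.seed(42)
p = 0.5
tosses = ["H" if random.random() < p else "T" for _ in range(4)]
print(tosses)  # one realization: a time series of four observations
```

Running it again without reseeding gives a different realization, but the process itself, four independent Bernoulli variables, stays the same.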
For each of the random variables along our process, we understand the nature of the random variable, but we also understand how the random variables are related to one another. Now a discrete process could be used to model something like daily high temperatures in Australia. You could also have a continuous process. So we're looking at the index now, and discussing whether the index is discrete or continuous. A commonly encountered continuous process is the Wiener process, which people use to study Brownian motion. The graph on the left looks like a total mess. Don't worry about that one for the moment. The situation is easier in the graph on the right, so let me pull that back. We have random walks with Gaussian increments. We've discussed random walks before. The idea here is that we park ourselves at some initial position, and then we'll move to the left or the right randomly. And the random variable guiding our motion, in this case, is a Gaussian random variable. But I could just as easily have had a coin toss and moved to the right or the left by one step depending upon whether I got heads or tails. This is just a little bit more complicated in that our step size is drawn from a Gaussian distribution. I obtained these four different realizations. I'm thinking of these as four different time series. But those four time series all come from the same stochastic process. The reason the graph on the left looks like such a mess is that there's no real structure going on at all. I'm thinking about a simple random sample, the kind of situation you would have dealt with in an elementary statistics class, where the random variables are independent of one another. So let's say you are measuring temperatures. The temperature you get from the fifth person is totally independent of the temperature that you'd get from the 60th, or up to the 1,000th in this case.
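As a sketch of the picture on the right, here is one way to simulate four realizations of a random walk with Gaussian increments. The number of steps and the seed are arbitrary choices of mine for illustration.

```python
import random

# Each walk starts at 0; every step adds an N(0, 1) increment.
# Four runs give four different time series, all from the same
# stochastic process.
random.seed(0)

def random_walk(n_steps):
    position = 0.0
    path = [position]
    for _ in range(n_steps):
        position += random.gauss(0, 1)  # Gaussian step, not a +/-1 coin step
        path.append(position)
    return path

realizations = [random_walk(100) for _ in range(4)]
```

Plotting the four lists on the same axes reproduces the kind of graph the lecture describes: four distinct trajectories from one process.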
The individual variables are independent and identically distributed, and so there's no real structure. And so when I graph the four realizations, four time series, on the same axes, there's, just as we said, a total mess. Now a stochastic process really is a very complicated mathematical thing. In your elementary stats course you might have dealt with bivariate normal distributions, and you could, perhaps, do the integrations by hand in some very simple cases. But a stochastic process isn't necessarily two or three or 20 random variables. You might have an infinite number of random variables. To fully specify the structure there, you need the joint distribution of the full set of random variables. That could be very difficult to work with. We also usually just have a set of data, a time series, some data that we have gone out and acquired. And we'll try to understand the properties of the stochastic process off of this particular time series. How can we do that sort of inference? As we get started, and we'll introduce stationarity in just a moment, we should review the concepts of mean function and variance function. Now since your stochastic process is an indexed set of random variables, let's assume for the moment each one of those random variables has a mean and a variance. We can use that to create a mean function, so as I move along the stochastic process I observe what the average is for the random variable at any individual time. The mean function we'll write as mu of t; the variance function we'll write as sigma squared of t. In this little table I'm implicitly assuming I have a discrete stochastic process, and we're writing out the expected value and the variance as we move through each of our random variables. And you can do a graph or plot of mean as a function of index, or variance as a function of index. We can also talk about the relationships between the random variables. Let's look at a very simple case of white noise.
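To make the mean and variance functions concrete, here is a hedged sketch: if we did have many realizations in hand, we could estimate mu of t and sigma squared of t by averaging across realizations at each fixed time. The i.i.d. normal process with mean 5 and variance 4, the sample sizes, and the seed are all my illustrative assumptions.

```python
import random

# Estimate mu(t) and sigma^2(t) across 2000 realizations of a
# 10-step process whose variables are i.i.d. N(5, 2^2); both
# estimated functions should come out roughly constant
# (about 5 and about 4).
random.seed(1)
n_reals, n_times = 2000, 10
data = [[random.gauss(5, 2) for _ in range(n_times)] for _ in range(n_reals)]

mu = [sum(r[t] for r in data) / n_reals for t in range(n_times)]
var = [sum((r[t] - mu[t]) ** 2 for r in data) / n_reals for t in range(n_times)]
```

Plotting `mu` and `var` against the time index gives exactly the kind of mean-versus-index and variance-versus-index graphs described above.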
The mean function there, mu of t, I think you can see, would be a constant function. When we have white noise, we have independent, identically distributed random variables. If they're identically distributed, then the mean function will be constant as we move along. We're trying to summarize the random variables at different time locations with the same mean and variance. If you have independent, identically distributed random variables, the autocovariance function will look like a delta function. When the separation between the two random variables is zero, we're just looking at the variance, so we get a spike. But as soon as we separate the two random variables and look at two different times, since they're independent, the covariance will be zero. Now an important question comes up. You typically don't have many realizations in front of you; you just have one realization. Think about one of the trajectories with that random walk. Since you only have one realization in front of you, how are you going to infer the properties of the process? Each one of the random variables along your time series is only giving you an individual point. So if I have a population and I just have a sample size n equals one, I really can't say anything about the variance, and I can say very meager things about the mean. So the question is, if you have a time series, in other words a realization of a stochastic process, how can you infer properties of the process from that single realization? If we introduce some structure, we can get some traction. So let's talk about a process and say that it's strictly stationary if the joint distribution of a set of random variables, here I have k of them, will be the same no matter where you look along the time series, as long as each one of the new random variables is just a shifted copy of the old random variables.
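Here is a small sketch of that delta-function shape, using the sample autocovariance of simulated Gaussian white noise; the series length and seed are my own choices.

```python
import random

# Sample autocovariance of Gaussian white noise: near the
# variance (1 here) at lag 0, and near 0 at every other lag,
# i.e., a delta-function shape.
random.seed(2)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
mean = sum(x) / n

def sample_autocov(series, lag):
    return sum((series[t] - mean) * (series[t + lag] - mean)
               for t in range(len(series) - lag)) / len(series)

gamma = [sample_autocov(x, h) for h in range(4)]
```

The spike at lag zero is the variance; the other lags hover near zero because the variables are independent.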
So park yourself anywhere you'd like along the time series and look at the set of random variables. Preserve the spacing between them, but now look far to the left or far to the right along the stochastic process. You'll get the same joint distribution if your process is strictly stationary. It's a very restrictive thing to say, but it has some implications that work well for us. If you are strictly stationary, then let's look at just one of the random variables: let's let k equal one. The distribution of any random variable along our stochastic process is the same as the distribution shifted by whatever amount we like. What that means is the random variables are identically distributed. They might not be independent, and in fact if they're identically distributed and independent then we don't have a very interesting process at all. Since we get identically distributed random variables, the mean function will be a constant, and the variance function will be a constant. That has implications for estimation. We can use each one of the data points we have available to us in the time series to try to estimate the mean of the process. Same thing with the variance. What are the implications for the autocovariance? If we look at the joint distribution of two random variables, at times t1 and t2, that'll be the same as the joint distribution if I look up or down the stochastic process, shifting to the left or the right by a distance tau. What that's telling us is that the joint distribution of two random variables depends only on the lag spacing and not on where you are on the random process. So your autocovariance function isn't constant, but the autocovariance just depends upon the separation between the two random variables. No matter where you look, to the left or the right along the process, the autocovariance only depends upon the separation. Now strict stationarity does a lot of work for us, but it's a pretty restrictive concept.
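To see how stationarity buys us estimation from a single realization, here is a sketch using an AR(1) process, X_t = phi * X_{t-1} + e_t. The AR(1) model is not from this lecture; it's a standard example of a stationary process, and phi = 0.6, the series length, and the seed are my assumptions. Because the autocovariance depends only on the lag, we can estimate it by averaging products over time within one long realization.

```python
import random

# One long realization of a stationary AR(1) process. Because the
# autocovariance depends only on the lag, averaging over time within
# this single realization estimates gamma(h). For AR(1) with unit
# noise variance, theory gives gamma(h) = phi**h / (1 - phi**2).
random.seed(3)
phi, n = 0.6, 20000
x = [random.gauss(0, 1)]
for _ in range(n - 1):
    x.append(phi * x[-1] + random.gauss(0, 1))

mean = sum(x) / n

def gamma_hat(lag):
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n
```

Comparing `gamma_hat(0)` and `gamma_hat(1)` to the theoretical values 1/(1 - 0.36) = 1.5625 and 0.6 * 1.5625 = 0.9375 shows the single-realization estimates landing close to the truth, which is exactly the traction stationarity gives us.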
We can get the same sort of things done for us if we relax a little bit and move to weak stationarity. A process is weakly stationary if we keep all of the things that we really care about from a strictly stationary process. What I'm saying is, a process is weakly stationary if the mean function doesn't depend on where you look along the process but rather is constant. So we have a constant average up and down the process. We'll also require, for weak stationarity, that the autocovariance function just depends upon the lag spacing. So the implications from strict stationarity, we're using within the definition of weak stationarity, keeping what we want. Of course, if your autocovariance function just depends upon the lag spacing, then letting the lag equal zero, you immediately get a constant variance function as well. Much easier to think about, much easier to state, but still very useful for us. In this lecture, we have learned why stationarity is so crucial in forming a model from data. It helps us to infer properties of the process off an individual realization, an individual time series. We also learned the definitions of the mean, variance, and covariance functions. And you should now be able to calculate them in a few simple situations.