Welcome back to practical time series analysis. In this lecture, we discuss the important concept of stationarity. We'll give you intuition into why stationarity is so important when we try to infer the properties of a process from observed data, and also give a mathematical definition so we can move forward in a structured way. When you're done with this lecture, you should be able to explain to a friend or a colleague why stationarity is so important when we try to predict or infer the properties of a process from a time series that we've acquired. You should also be able to calculate the mean, the variance, and the covariance function in a few very simple cases. Now before looking at the formal definition of a stochastic process, let's just think about a very common, easy-to-understand situation. I could toss a coin four times, and write the outcomes down on a piece of paper. Perhaps I would get h, t, t, t: heads, tails, tails, tails. That's a time series. It's a set of observations. You could also try to model that situation mathematically by lining up four Bernoulli random variables. Remember, a Bernoulli random variable is a success or a failure with some probability. So we've got these four random variables. We could also talk about how they're related to each other. For the coin toss example we are discussing, I would imagine that they are all independent. So we're making a statement about the structure as well: how the random variables are related to each other, in addition to how they're individually characterized. A stochastic process will do the same thing. We'll look at a set of random variables; maybe there are four of them, maybe there are a million, maybe there's a countable infinity, maybe an uncountable infinity. It could really be quite complicated. But each one of the random variables along our process is indexed, let's say, with time.
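If you'd like to play with this yourself, here is a minimal sketch of the coin-toss model: four independent Bernoulli random variables, one per toss. The fair-coin probability p = 0.5 and the seed are my own illustrative choices, not anything from the lecture.

```python
import random

# Four independent Bernoulli trials modeling four coin tosses.
# p = 0.5 (a fair coin) and the seed are illustrative assumptions.
random.seed(42)
p = 0.5
tosses = ["H" if random.random() < p else "T" for _ in range(4)]
print(tosses)  # one realization: a time series of four observations
```

Running it again without reseeding gives a different realization, but the process itself, four independent Bernoulli variables, stays the same.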
For each of the random variables along our process, we understand the nature of the random variable, but we also understand how the random variables are related to one another. Now a discrete process could be used to model something like daily high temperatures in Australia. You could also have a continuous process. So we're looking at the index now, and discussing whether the index is discrete or continuous. A commonly encountered continuous process is the Wiener process, which people use to study Brownian motion. The graph on the left looks like a total mess. Don't worry about that one for the moment. The situation is easier in the graph on the right, so let me pull that back. We have random walks with Gaussian increments. We've discussed random walks before. The idea here is that we park ourselves at some initial position, and then we'll move to the left or the right randomly. And the random variable guiding our motion, in this case, is a Gaussian random variable. But I could just as easily have had a coin toss and moved to the right or the left by one step depending upon whether I got heads or tails. This is just a little bit more complicated in that our step size is drawn from a Gaussian distribution. I obtained these four different realizations. I'm thinking of these as four different time series. But those four time series all come from the same stochastic process. The reason the graph on the left looks like such a mess is that there's no real structure going on at all. I'm thinking about a simple random sample, the kind of situation you would have dealt with in an elementary statistics class, where the random variables are independent of one another. So let's say you are measuring temperatures. The temperature you get from the fifth person is totally independent of the temperature that you'd get from the 60th, or up to the 1,000th in this case.
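As a sketch of the picture on the right, here is one way to simulate four realizations of a random walk with Gaussian increments. The number of steps and the seed are arbitrary choices of mine for illustration.

```python
import random

# Each walk starts at 0; every step adds an N(0, 1) increment.
# Four runs give four different time series, all from the same
# stochastic process.
random.seed(0)

def random_walk(n_steps):
    position = 0.0
    path = [position]
    for _ in range(n_steps):
        position += random.gauss(0, 1)  # Gaussian step, not a +/-1 coin step
        path.append(position)
    return path

realizations = [random_walk(100) for _ in range(4)]
```

Plotting the four lists on the same axes reproduces the kind of graph the lecture describes: four distinct trajectories from one process.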
The individual variables are independent and identically distributed, and so there's no real structure. And so when I graph the four realizations, four time series, on the same axes, there's, just as we said, a total mess. Now a stochastic process really is a very complicated mathematical thing. In your elementary stats course you might have dealt with bivariate normal distributions, and you could, perhaps, do the integrations by hand in some very simple cases. But a stochastic process isn't necessarily two or three or 20 random variables. You might have an infinite number of random variables. To fully specify the structure there, you need the joint distribution of the full set of random variables. That could be very difficult to work with. We also usually just have a set of data, a time series, some data that we have gone out and acquired. And we'll try to understand the properties of the stochastic process off of this particular time series. How can we do that sort of inference? As we get started, and we'll introduce stationarity in just a moment, we should review the concepts of mean function and variance function. Now since your stochastic process is an indexed set of random variables, let's assume for the moment each one of those random variables has a mean and a variance. We can use that to create a mean function, so as I move along the stochastic process I observe what the average is for the random variable at any individual time. The mean function we'll write as mu of t; the variance function we'll write as sigma squared of t. In this little table I'm implicitly assuming I have a discrete stochastic process, and we're writing out the expected value and the variance as we move through each of our random variables. And you can do a graph or plot of mean as a function of index, or variance as a function of index. We can also talk about the relationships between the random variables. Let's look at a very simple case of white noise.
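To make the mean and variance functions concrete, here is a hedged sketch: if we did have many realizations in hand, we could estimate mu of t and sigma squared of t by averaging across realizations at each fixed time. The i.i.d. normal process with mean 5 and variance 4, the sample sizes, and the seed are all my illustrative assumptions.

```python
import random

# Estimate mu(t) and sigma^2(t) across 2000 realizations of a
# 10-step process whose variables are i.i.d. N(5, 2^2); both
# estimated functions should come out roughly constant
# (about 5 and about 4).
random.seed(1)
n_reals, n_times = 2000, 10
data = [[random.gauss(5, 2) for _ in range(n_times)] for _ in range(n_reals)]

mu = [sum(r[t] for r in data) / n_reals for t in range(n_times)]
var = [sum((r[t] - mu[t]) ** 2 for r in data) / n_reals for t in range(n_times)]
```

Plotting `mu` and `var` against the time index gives exactly the kind of mean-versus-index and variance-versus-index graphs described above.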
The mean function there, mu of t, I think you can see, would be a constant function. When we have white noise, we have independent, identically distributed random variables. If they're identically distributed, then the mean function will be constant as we move along. We're trying to summarize the random variables at different time locations with the same mean and variance. If you have independent, identically distributed random variables, the autocovariance function will look like a delta function. When the separation between the two random variables is zero, we're just looking at the variance, so we get a spike. But as soon as we separate the two random variables and look at two different times, since they're independent, the covariance will be zero. Now an important question comes up. You typically don't have many realizations in front of you; you just have one realization. Think about one of the trajectories with that random walk. Since you only have one realization in front of you, how are you going to infer the properties of the process? Each one of the random variables along your time series is only giving you an individual point. So if I have a population and I just have a sample size n equals one, I really can't say anything about the variance, and I can say very meager things about the mean. So the question is, if you have a time series, in other words a realization of a stochastic process, how can you infer properties of the process from that single realization? If we introduce some structure, we can get some traction. So let's talk about a process and say that it's strictly stationary if the joint distribution of a set of random variables, here I have k of them, will be the same no matter where you look along the time series, as long as each one of the new random variables is just a shifted copy of the old random variables.
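Here is a small sketch of that delta-function shape, using the sample autocovariance of simulated Gaussian white noise; the series length and seed are my own choices.

```python
import random

# Sample autocovariance of Gaussian white noise: near the
# variance (1 here) at lag 0, and near 0 at every other lag,
# i.e., a delta-function shape.
random.seed(2)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
mean = sum(x) / n

def sample_autocov(series, lag):
    return sum((series[t] - mean) * (series[t + lag] - mean)
               for t in range(len(series) - lag)) / len(series)

gamma = [sample_autocov(x, h) for h in range(4)]
```

The spike at lag zero is the variance; the other lags hover near zero because the variables are independent.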
So park yourself anywhere you'd like along the time series and look at the set of random variables. Preserve the spacing between them, but now look far to the left or far to the right along the stochastic process. You'll get the same joint distribution if your process is strictly stationary. It's a very restrictive thing to say, but it has some implications that work well for us. If you are strictly stationary, then let's look at just one of the random variables: let's let k equal one. The distribution of any random variable along our stochastic process is the same as the distribution shifted by whatever amount we like. What that means is the random variables are identically distributed. They might not be independent, and in fact if they're identically distributed and independent then we don't have a very interesting process at all. Since we get identically distributed random variables, the mean function will be a constant, and the variance function will be a constant. That has implications for estimation. We can use each one of the data points we have available to us in the time series to try to estimate the mean of the process. Same thing with the variance. What are the implications for the autocovariance? If we look at the joint distribution of two random variables, at times t1 and t2, that'll be the same as the joint distribution if I look up or down the stochastic process, shifting to the left or the right by a distance tau. What that's telling us is that the joint distribution of two random variables depends only on the lag spacing and not on where you are on the random process. So your autocovariance function isn't constant, but the autocovariance just depends upon the separation between the two random variables. No matter where you look, to the left or the right along the process, the autocovariance only depends upon the separation. Now strict stationarity does a lot of work for us, but it's a pretty restrictive concept.
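To see how stationarity buys us estimation from a single realization, here is a sketch using an AR(1) process, X_t = phi * X_{t-1} + e_t. The AR(1) model is not from this lecture; it's a standard example of a stationary process, and phi = 0.6, the series length, and the seed are my assumptions. Because the autocovariance depends only on the lag, we can estimate it by averaging products over time within one long realization.

```python
import random

# One long realization of a stationary AR(1) process. Because the
# autocovariance depends only on the lag, averaging over time within
# this single realization estimates gamma(h). For AR(1) with unit
# noise variance, theory gives gamma(h) = phi**h / (1 - phi**2).
random.seed(3)
phi, n = 0.6, 20000
x = [random.gauss(0, 1)]
for _ in range(n - 1):
    x.append(phi * x[-1] + random.gauss(0, 1))

mean = sum(x) / n

def gamma_hat(lag):
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n
```

Comparing `gamma_hat(0)` and `gamma_hat(1)` to the theoretical values 1/(1 - 0.36) = 1.5625 and 0.6 * 1.5625 = 0.9375 shows the single-realization estimates landing close to the truth, which is exactly the traction stationarity gives us.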
We can get the same sort of things done for us if we relax a little bit and move to weak stationarity. A process is weakly stationary if we keep all of the things that we really care about from a strictly stationary process. What I'm saying is, a process is weakly stationary if the mean function doesn't depend on where you look along the process but rather is constant. So we have a constant average up and down the process. We'll also require, for weak stationarity, that the autocovariance function just depends upon the lag spacing. So the implications from strict stationarity, we're using within the definition of weak stationarity, keeping what we want. Of course, if your autocovariance function just depends upon the lag spacing, then letting the lag equal zero, you immediately get a constant variance function as well. Much easier to think about, much easier to state, but still very useful for us. In this lecture, we have learned why stationarity is so crucial in forming a model from data. It helps us to infer properties of the process off an individual realization, an individual time series. We also learned the definitions of the mean, variance, and covariance functions. And you should now be able to calculate them in a few simple situations.