Welcome back to Practical Time Series Analysis. In this lecture we're looking at the partial autocorrelation function, the PACF. We'll look at some first examples, and we'll see how this function may help us in our modeling process. If you think about it, if you're presented with a time series that you've observed from some natural process, or maybe some data that you've recorded in an experiment in a laboratory, it's not really obvious how you might model it. We've certainly seen MA(q) processes, moving average processes of order q, and we've explored AR(p) processes of order p. Very soon we'll move on to, for instance, modelling with seasonal or autoregressive processes that are also integrated and have moving average components. It's a really complicated affair, modelling a particular time series as a stochastic process. We can use all the help we can get, and the PACF will help us with that. In this particular video lecture we'll be using the acf function in order to obtain a PACF plot. We'll use that plot to determine the likely order of an AR(p) process, and to help us estimate coefficients we'll automate the process with the ar function. Just to review: if you had a second order moving average process, as shown in this picture, then if we were to plot the ACF, you'd see, after the first typical spike of height 1 (that one is just there as a reference for us), two other significant spikes. So if you know you have a moving average process, determining the order isn't really all that complicated: we just look at the ACF. What are we going to do for an autoregressive process? We've developed formulas, but let's see how we're going to move forward. I'm going to do a quick simulation of a second order process. There are some commands here; for instance, I'm using the rm command. If you're generating data, it's nice as you come in to get rid of all the objects you might have stored.
So that you keep yourself nice and organised; it's like erasing the blackboard. We'll give ourselves a plot with three rows and one column. I'm going to generate data from an AR(2) process, and 0.6 and 0.2 as coefficients will give us a nice stationary process. Once we've generated our data with arima.sim, we'll look at three characteristic plots. It's nice to get into the habit of producing these plots all the time. So we'll produce a plot of the time series, we'll produce a plot of the ACF, and we'll also now introduce the partial autocorrelation function by giving the acf routine the flag type = "partial". Let's see what we get. We have a fine looking autoregressive process up on top, a nice stationary process. It's hard to imagine how you would look at this particular autocorrelation function and determine the order of the process. Now, we generated our data, so we know it's a second order process. Let's take a look at the PACF plot. There is no spike at lag zero, but at lags 1 and 2 we have statistically significant spikes. Now, this is just off of one time series; it might be a coincidence. So let's simulate an autoregressive process of order 3. Again, I'm choosing these coefficients basically at random; I'm just making sure that we have a stationary process, that's what's really important for us. We will produce our data, and at this point you should be able to produce your plots with no problem. And what are we going to see here? There's our process up on top, then the autocorrelation function. Again, I don't really know how I'd look at that autocorrelation function, off of a data set like this, and make a really accurate determination of the order of the process. But when I look down at the third plot, I can see that there are three statistically significant spikes. So an AR process of order 3 is producing a PACF with three significant spikes. Again, we haven't even defined the PACF yet.
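The simulations just described can be sketched as follows. The seed, sample size, and the AR(3) coefficients are my choices (the transcript only says the AR(3) coefficients were picked "basically at random" subject to stationarity); the AR(2) coefficients 0.6 and 0.2 are the ones from the lecture.

```r
# Clear the workspace -- "erasing the blackboard"
rm(list = ls())
set.seed(2017)  # seed is my choice, for reproducibility

# Three rows, one column of plots
par(mfrow = c(3, 1))

# Simulate a stationary AR(2) process with coefficients 0.6 and 0.2
x <- arima.sim(model = list(ar = c(0.6, 0.2)), n = 1000)
plot(x, main = "Simulated AR(2) process")
acf(x, main = "ACF")
acf(x, type = "partial", main = "PACF")  # expect 2 significant spikes

# Same idea for an AR(3) process; these coefficients are my choice,
# with sum of absolute values below 1 to guarantee stationarity
y <- arima.sim(model = list(ar = c(0.4, 0.2, 0.1)), n = 1000)
plot(y, main = "Simulated AR(3) process")
acf(y, main = "ACF")
acf(y, type = "partial", main = "PACF")  # expect 3 significant spikes
```

arima.sim will refuse to run if the ar coefficients do not give a stationary process, which is a handy built-in check when you experiment with your own coefficients.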
We're just trying to have a little fun and develop some nice visual interpretations; in the next lecture we'll be much more formal. It's good to produce your own data, and I sincerely hope that you're going to produce data sets, 10 or 20 of them, keep changing your coefficients, keep changing the order of the process, and see what you get in a PACF. It's good to own these things, to really feel very comfortable with the process. But it's also good to look at some classic data sets. The Beveridge wheat price data set is fairly famous. You can obtain it, as I did, from the Time Series Data Library; I've given you a link here and in the reading. Of course, when you download from the site, you're going to have a text file with a lot of information up on top. That's all very useful, but in order to bring the data into R, we're going to have to edit that file a little bit. The data were originally presented in a paper by Beveridge called Weather and Harvest Cycles. Subsequent authors have had some issues with how Beveridge analyzed this data, but that's not really the topic of this lecture. What we're going to do is present the Beveridge data, show you how he analyzed it, and then try to determine the order of the process. So, using the read.table command, we'll bring the Beveridge data into R. We'll create a time series by extracting the second column and telling R that we're starting at the year 1500. So this really is a fairly extensive data set. We'll do some plotting. And what Beveridge did is he created a moving average. We're going to do this with the filter command, and we're going to grab 31 data points: each point together with the 15 before and the 15 after it. So we'll surround each data point with a window of 31 points, and we'll put the result on the same graph, superimposing the moving average in red.
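The import and moving-average steps just described might look like this. The file name "beveridge.txt" is my assumption; it stands for the Time Series Data Library download after you've trimmed the descriptive header so only the data rows remain.

```r
# Assumed file name for the edited download from the Time Series Data Library
dat <- read.table("beveridge.txt", header = TRUE)

# The second column holds the price index; the series starts in the year 1500
beveridge <- ts(dat[, 2], start = 1500)

plot(beveridge, ylab = "price", main = "Beveridge wheat price data")

# 31-point moving average: each point plus the 15 on either side,
# weighted equally; sides = 2 centers the window on each point
ma <- filter(beveridge, rep(1/31, 31), sides = 2)
lines(ma, col = "red")
```

Because the centered window needs 15 points on each side, filter returns NA for the first and last 15 entries of the series.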
You can see that the moving average here does a very nice job of getting rid of some of these fluctuations, and it tracks the trend that seems apparent in the data. Beveridge's idea was to take the original data and, at each data point, do some scaling by the filtered data set. He was hoping to create a stationary process by doing this. Again, we're not here to critique Beveridge; we're here to move along with him, obtain the same data set that he was analyzing, and then look at the PACF associated with that data set. So what are we going to do? We're going to take the Beveridge time series and scale it at each point by the moving average. There's going to be some data at the very beginning and at the very end of the series where the moving average isn't defined, so those entries of Y will be NA; we're going to have to deal with that in just a moment. We will do a plot: we're going to plot the scaled price data, and we'll produce an ACF, but we have to tell R to get rid of the data points that aren't meaningful. We'll do that with the command na.omit(Y). Then we'll plot the ACF, and we'll also plot the PACF. What do you think? We've produced the process up here with a simple scaling; perhaps we've introduced some seasonality, but that's a different issue for us. It looks relatively stationary. When we look at the autocorrelation function, we see something interesting. But it's the PACF that we really want to draw your attention to. The PACF seems to have two significant spikes. Let's see if we can find other evidence that we might have a second order process on our hands. There's a command called ar which will estimate the coefficients of an autoregressive process. In other words, given a time series, it's going to use that series to estimate the coefficients of the corresponding stochastic process.
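A sketch of the scaling and plotting steps, assuming beveridge holds the wheat price series and ma holds the 31-point moving average produced with the filter command described above (those variable names are my assumption):

```r
# Scale the original series by its moving average, as Beveridge did
Y <- beveridge / ma

# The first and last 15 points have no moving average value, so drop the NAs
Y <- na.omit(Y)

par(mfrow = c(3, 1))
plot(Y, ylab = "scaled price", main = "Transformed Beveridge wheat price data")
acf(Y, main = "ACF")
acf(Y, type = "partial", main = "PACF")  # look for where the spikes cut off
```

The division is pointwise, so Y is the original price expressed relative to its local 31-year average, which is what removes the long-run trend.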
We'll let order.max be 5, so ar is free to consider autoregressive models of order up to 5. When we look at the printout, we'll see that ar has come up with a second-order model, consistent with the picture we saw just a moment ago. And the coefficients we obtained are really fairly close to the ones Beveridge obtained in his paper. The order that ar has selected, then, is 2, consistent with our elementary analysis of the PACF plot. Not to state the obvious, but an AR(p) process will have a PACF that cuts off after p lags. This can be a tremendous asset to us as we look to model time series data. In this lecture, we've used the acf function to obtain a PACF plot. We've used that plot to determine the likely order of an AR(p) process. And we've also learned that the ar() function can estimate coefficients for us automatically. In the next lecture, we'll start developing the PACF in a more fundamental and a more theoretical way.
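For the Beveridge data, the call described above would be ar(na.omit(Y), order.max = 5). To keep this sketch self-contained and runnable, I demonstrate it instead on a simulated AR(2) series with the same coefficients used earlier in the lecture; the seed and sample size are my choices.

```r
set.seed(1)  # seed is my choice, for reproducibility
x <- arima.sim(model = list(ar = c(0.6, 0.2)), n = 2000)

# ar() picks the order (up to order.max) by AIC and fits the coefficients
fit <- ar(x, order.max = 5)

fit$order  # the order selected by AIC
fit$ar     # the estimated autoregressive coefficients
```

With a reasonably long simulated AR(2) series, you should typically see fit$order come back as 2 and fit$ar land close to 0.6 and 0.2, mirroring what the lecture reports for the Beveridge data.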