Welcome back to Practical Time Series Analysis. In this lecture, we look at the Akaike Information Criterion as a way of assessing the quality of a model. There's an interesting issue that comes into play for us. At this point, you know that if you have an autoregressive model or a moving average model, we have techniques available to us to estimate the coefficients of those models. We'll, of course, usually do that with software rather than code it by hand, but if I told you that you had a third order autoregressive model, you could pretty easily tell me what good estimates are for those coefficients. A more fundamental question, perhaps, is: what is the order of the model? In general, you won't know. You don't usually have the kind of intimate process knowledge that will tell you the order of the model, and you're going to try to infer the order from the data set that you have in front of you. So in this lecture, we'll try to measure the quality of a model with the Akaike Information Criterion. I suppose I could keep apologizing for my pronunciation; I'm just doing the best I can. We'll try to measure the quality of a model with the AIC and the SSE, the sum of the squares of the errors. You should be able to do that for a time series that you find interesting after this lecture. You should also be able to explain to a friend or a colleague, in fairly simple terms, what it is that you're doing.
So here's the first example we'll look at. The Loblolly data set is one that we enjoy working with. If you look at these tree heights as a function of age, a couple of things, I think, are immediately apparent. One is that there is certainly a trend in this data set. As time goes on, the trees get taller and taller. That's probably not very surprising.
As I look through here, too, I do see that there is heteroscedasticity, in the sense that the variability is increasing as I move from left to right. We won't worry about that in this particular example. Right now, we're just going to do a very simple thing: given these data, we'll fit a straight line to the data. Now, the green line here, you can see, is doing an okay job of capturing the trend. But I think we could do better. If, instead of a first order polynomial, a straight line, you fit a second order polynomial (here we've got a concave down parabola), then we're doing a much better job of capturing the centers of each of these little clusters of the data set as we move from left to right. It's a little surprising, though, if you look at our numerical measures of quality. Multiple R-squared, the coefficient of determination, you'll remember, is described by many people as the amount of variability that we can explain through our model. It's up at around 98% for the straight line. It's true that as we move from a straight line to a parabola, multiple R-squared does increase, up to 99.3%. But still, that difference is probably not as profound as you might think, given the visual evidence that we have over here on the left. We talked previously about adjusted R-squared values as well. Those are analogous to the AIC, in the sense that adjusted R-squared makes you pay a penalty as you bring higher order terms into your model. As I move from a first to a second order model, I can definitely justify that. As I move from a second to a third order model, you should take the Loblolly data set and see if you believe that the enhancement in variability explained is worth the third order term.
Now, the data sets that we find in nature tend to be rather messy and complicated. What we're going to do in this lecture is a simulation, and so we will simulate a second order autoregressive process.
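Before moving on to the simulation, the Loblolly comparison from a moment ago can be tried directly. What follows is a minimal sketch in R, not the code behind the slides: it assumes the built-in Loblolly data set (columns height and age), fits polynomials of degree 1, 2 and 3 with lm, and collects multiple R-squared, adjusted R-squared and the AIC for each. The names fits and quality are introduced here just for illustration.

```r
# Loblolly ships with base R (package 'datasets'): columns height, age, Seed.
data(Loblolly)

# Fit polynomials of degree 1, 2 and 3 in age (illustrative choice of degrees).
fits <- lapply(1:3, function(d) lm(height ~ poly(age, d), data = Loblolly))

# Collect the quality measures for each degree in one small table.
quality <- data.frame(
  degree        = 1:3,
  r.squared     = sapply(fits, function(f) summary(f)$r.squared),
  adj.r.squared = sapply(fits, function(f) summary(f)$adj.r.squared),
  AIC           = sapply(fits, AIC)
)
print(quality)
```

Reading down the table, you can judge for yourself whether the step from second to third order buys enough to justify the extra term.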
We'll have 2,000 data points, so we have a fairly large number of data points and we should be able to do our estimation pretty well. We'll do the usual things: we'll take a look at our data set, and we'll also look at the ACF and the partial ACF. In our simulated data, the ACF seems to trail off, and the PACF seems to exhibit two significant terms. This seems rather consistent with the second order autoregressive process that we have. But the ACF and the PACF are probably not enough by themselves to really establish what sort of model you have. We're going to look for less subjective numerical measures of quality.
So just to review, if you want to estimate the coefficients of a model and you know the order, then you can make a call to the arima command. It's going to give us ar1 and ar2 coefficients estimated as 0.7 and a little bit, and negative 0.2, over here. We're doing a pretty good job at our estimation. But again, that's just because we knew, because we generated the data, that it was a second order process. In general, you won't know what the order of your process is, and so you'll have to make a guess as to your value of p. What we'll try to do with the AIC is make that an educated guess. Down here, you can see that arima, which is a pretty generic call, is going to give you some things that it thinks are just mission critical, things that you should know. One of those, right there, is the aic. This is a fairly standard measure that people use when they're comparing model quality. What I've done in this table is nothing complicated; you could reproduce this yourself. I've gone through and let p equal 1, a first order autoregressive model, then let p equal 2, 3, 4, and 5, all the way up through fifth order. With each of these increasingly complex models, we have estimated our coefficients. No huge surprises as we look at those, and we've also given in the table the AIC and the more prosaic SSE, the sum of squares of the errors.
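If you want to reproduce a table like this yourself, here is one way it might be done in R. This is a sketch under assumptions, not the exact code from the slides: the true coefficients 0.7 and -0.2, the seed, and the choice include.mean = FALSE are all mine, and your numbers will differ from the ones on the slide.

```r
set.seed(43)  # arbitrary seed, used only so the run is repeatable

# Simulate 2,000 points from an AR(2) process.
# The coefficients 0.7 and -0.2 are assumed for this illustration.
x <- arima.sim(model = list(ar = c(0.7, -0.2)), n = 2000)

# Fit AR(p) for p = 1..5 and record the AIC that arima() reports.
orders <- 1:5
aic <- sapply(orders, function(p) {
  m <- arima(x, order = c(p, 0, 0), include.mean = FALSE)
  m$aic
})

aic.table <- data.frame(p = orders, AIC = aic)
print(aic.table)
```

The thing to look for is a large drop in AIC between p = 1 and p = 2, followed by much smaller changes for the higher orders.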
It looks like, as I move from the first order to the second order model, I get a pretty good drop in these terms. As I move from a second to a third, or a third to a fourth, I'm not seeing as much difference. What I'll do on the next slide is give you a picture. So as I look here, the SSE, which is pretty well understood by us at this point, takes a good drop as we go from a first order to a second order model. And then it pretty much stays the same. There's a little bit of variability there, but the big payoff is as you move from a first to a second order model. The AIC picture is telling us pretty much the same story. As we move from a first to a second order model, we get a pretty good drop-off. As we move out to a fifth order model, it's true that the AIC here is somewhat less than for these others. But you have to ask yourself whether you think that that very small diminution in AIC is really worth the added complexity of the higher order model. Based upon these scree plots, I would definitely go with a second order model.
So what does the AIC try to do? It's rather common, as you're surely getting the feeling at this point, to give credit to models that reduce some sort of aggregate error. We'll use the error sum of squares as the handiest, most common example. But we'd also like to build in some kind of penalty when our models start bringing in more and more terms. The formal definition of the AIC has two terms in it. In a course in probability, you may have talked about maximum likelihood estimation. In this lecture, we're not going to dive too deep into this first term. But, as people say when you're flying at 30,000 feet, just keep the basic idea in your head: you want something which is going to give you credit for reducing some sort of aggregate error, but make you pay some kind of penalty for the number of parameters in the model.
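For reference, the two terms just mentioned appear in the standard textbook statement of the AIC as follows. The symbols are assumptions introduced here, since the slide itself is not part of this transcript: \hat{L} is the maximized value of the likelihood and k is the number of estimated parameters.

```latex
\mathrm{AIC} = -2 \ln\bigl(\hat{L}\bigr) + 2k
```

The first term rewards a better fit (a larger likelihood makes it smaller), and the second term is the penalty that grows with the number of parameters.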
You'll see different versions in different textbooks, and software has different implementations of the AIC. A very simple version is: let's take the log of the estimated variability. That'll be a term that, for a good model with low SSE, will be low. Then we'll add in another term here, where n is your sample size, which is unchanging, but as your number of parameters increases, you pay a penalty through this term. The SSE, just to review, is very similar, but it doesn't make you pay a penalty. Pick a potential value of your order, fit your model, and look at the aggregated error. So we've done that here. For our arima model, we've explored various values of p. We store the output here in m, and then we interrogate our model through the command resid. So we'll pull off the errors with resid(m), we'll square them, and then aggregate them by adding them.
At this point, you should be able to use the AIC to measure the quality of a model, especially when you have several competing models and you're trying to establish the order of your process. And you should be able to describe in fairly casual terms to a friend or colleague what it is that you're doing.
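To make the SSE recipe and the simple version of the AIC concrete, here is a small sketch in R. Several things in it are assumptions rather than the lecture's own code: the simulated series (coefficients 0.7 and -0.2, arbitrary seed), the use of SSE / n as the estimated variability, and the specific simple form log(variance) + (n + 2p) / n, which is one common textbook version and may not match the slide exactly.

```r
set.seed(43)  # arbitrary seed, used only so the run is repeatable

# Assumed AR(2) coefficients, for illustration only.
x <- arima.sim(model = list(ar = c(0.7, -0.2)), n = 2000)
n <- length(x)

results <- t(sapply(1:5, function(p) {
  m   <- arima(x, order = c(p, 0, 0), include.mean = FALSE)
  sse <- sum(resid(m)^2)  # pull off the errors, square them, add them up
  c(p          = p,
    SSE        = sse,
    # Assumed simple form: log of estimated variability plus a penalty in p.
    AIC.simple = log(sse / n) + (n + 2 * p) / n)
}))
print(results)
```

The SSE column can only reward a better fit, while the AIC.simple column adds 2p / n for every extra parameter, which is the penalty idea in its plainest form.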