Welcome back to Practical Time Series Analysis. In this set of lectures, we're reviewing some basic statistical concepts with a focus on linear regression. In the preceding lecture, we saw how to plot time series data, especially when it comes to us as a time series object, and we thought a little bit about ordinary least squares and how to fit a straight line to a set of data. We move forward in this video by assessing the normality of a data set. Recall some of the standard assumptions in regression, which become more important when we start discussing inferential techniques like hypothesis tests and confidence intervals: the errors are normally distributed with mean zero and constant variance, and they are independent.

When we fit a straight line, as we did with the carbon dioxide data available as co2 in R, then with a couple of quick calls we were able to put the abline on our time series plot, and some things are immediately obvious. The data themselves exhibit an oscillatory trend, some seasonality. But there is also a departure from the straight line on the left and on the right: there may be some curvature to this data, and perhaps the straight line isn't the best model.

When we look at a set of plots on the residuals, we'll be able to assess normality. With a simple plotting command that we should all know, we set some parameters for our plot, organizing it with one row and three columns of figures. We interrogate our model with the command resid. We created our linear model in the last lecture, and now we store the result in co2.residuals. So we've created an array of our residuals: the deviations of the actual measured data points, which we would call y sub i, from the fitted data points, which we would call y sub i hat. As we look at the histogram, we can see that it's roughly symmetric and mound shaped.
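The steps just described can be sketched in R as follows. The model form lm(co2 ~ time(co2)) is an assumption carried over from the previous lecture, which this transcript doesn't restate:

```r
# Sketch of the three-panel residual diagnostics described above.
# Assumption: the linear model from the last lecture was lm(co2 ~ time(co2)).
co2.linear.model <- lm(co2 ~ time(co2))   # straight-line fit to the built-in co2 series
co2.residuals <- resid(co2.linear.model)  # deviations y_i - y_i hat

par(mfrow = c(1, 3))                      # one row, three columns of figures
hist(co2.residuals, main = "Histogram of residuals")
qqnorm(co2.residuals); qqline(co2.residuals)
plot(time(co2), co2.residuals, type = "l", main = "Residuals on time")
```

Because the residuals come from a least-squares fit with an intercept, they sum to (essentially) zero by construction.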
But it seems to depart from a normal distribution, especially as I look at the tails, and we would like a somewhat less subjective way of looking at this. Also, if you have an abundance of data, hundreds of data points, a histogram is a valid approach for looking at the structure of your data. If you only have 10 or 15 data points, a histogram is not the best way to go; we could probably do better. In particular, if we're assessing normality, we can use what's called a normal probability plot, and that's shown in the center figure. The fastest way to think about a normal probability plot is that it is a plot prepared by software, R in our case, invoked with the command qqnorm called on our array; I've put a title on the plot here. If our residuals are normally distributed, we would expect most of our data to look essentially linear, like a straight line you could fit on this plot. Here I see some systematic departures in the lower and upper tails, and so it lets me question the normality assumption a bit.

Digging a little deeper, what a normal probability plot does is ask: if I have a data set with a certain number of points, and I standardize it (subtract off the mean and divide by the standard deviation), where would I expect to see the first residual? Where would I expect to see the second residual in a data set of that size, and so on through the last one? I can pair that with where my first residual actually is, and then the second: is each one where you would expect it to be if it were coming from a normal distribution, or are you systematically away? We would expect to see a little random scatter, but here the departure seems more systematic than random. So the residuals seem to be roughly, but not exactly, normally distributed.
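The pairing described above (expected position of each standardized residual versus its actual position) can be built by hand, which shows what qqnorm is doing internally. This is a sketch, not the lecture's code, and it again assumes the model lm(co2 ~ time(co2)):

```r
# Hand-rolled normal probability plot, mimicking qqnorm.
# Assumption: the linear model is lm(co2 ~ time(co2)) from the previous lecture.
co2.residuals <- resid(lm(co2 ~ time(co2)))
z <- as.numeric(scale(co2.residuals))  # standardize: subtract the mean, divide by the sd
expected <- qnorm(ppoints(length(z)))  # where normal order statistics are expected to fall
plot(expected, sort(z),
     xlab = "Expected normal quantile",
     ylab = "Observed standardized residual",
     main = "Normal probability plot (by hand)")
abline(0, 1)  # points should hug this line if the residuals are normal
```

Systematic bowing away from the reference line in the tails is exactly the departure the lecture points out.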
Some of our inferential techniques are robust to violations of the normality assumption, and so one might not worry too much in a large data set with a plot like this. When we look at the third plot, we've plotted our residuals against time. In linear regression, this is a very common, fundamental plot: people have a model, they look at departures from the model assumptions through the residuals, and they look at the residuals in time to see if any patterns emerge. Here there's a very obvious pattern: our residuals are higher than we'd expect on the left and on the right. In other words, our data points are systematically above the straight line on the left and on the right; that's the curvature we were talking about.

It's hard on such a small, tight plot to get a sense of the oscillatory nature of our data, so what I've done on the next plot is to zoom in on our residuals. Instead of looking across a few decades, now we're just looking across a few years. At this point, it would be very hard to convince anybody that these residuals were independent: there's an apparent time structure in them. From a linear regression standpoint that might not be desirable, but we're plotting time series data, and for us the structure of these residuals is actually quite interesting.

In this lecture, in addition to reviewing some basic concepts from linear regression, we learned how to assess the normality of residuals with a qq plot, or normal probability plot. As we move forward, we'll review some more concepts from linear regression and some additional concepts in statistical inference.
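As a closing sketch, the residuals-on-time plots discussed in this lecture (the full series and the zoomed-in view) might be reproduced like this; the particular zoom window, 1959 through 1963, is an illustrative assumption, not the lecture's exact choice:

```r
# Residuals plotted against time, full range and zoomed in.
# Assumption: the linear model is lm(co2 ~ time(co2)); the zoom window is illustrative.
co2.residuals <- resid(lm(co2 ~ time(co2)))
par(mfrow = c(1, 2))
plot(time(co2), co2.residuals, type = "l",
     xlab = "Time", ylab = "Residual", main = "Residuals across the decades")
plot(time(co2), co2.residuals, type = "l", xlim = c(1959, 1963),
     xlab = "Time", ylab = "Residual", main = "Zoomed: a few years")
```

The full-range panel shows the curvature (residuals high at both ends), while the zoomed panel reveals the seasonal oscillation hiding inside.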