Welcome to Practical Time Series Analysis. In these introductory lectures, we're reviewing some basic statistical concepts. In this particular lecture, we'll look at linear regression, or ordinary least squares (OLS) as people call it. We'll learn how to plot time series data and how to fit a linear model to a set of ordered pairs. If your background in statistics is strong, you can move very quickly through these lectures.

R comes with a variety of built-in datasets. If you'd like a short narrative summary of a dataset, you can use the help command. We've done that here: we ran help on the co2 dataset, and I've put some of the output right here for you. It describes atmospheric concentrations of carbon dioxide over a span of years.

When we plot the data, you can see that a linear model is not going to capture all of the interesting behavior. First, there is a rising trend, but a straight line doesn't look like the best description of that trend. Even worse, there is an oscillatory component that a straight line simply cannot capture.

The idea behind linear regression is that you have a response variable, here the carbon dioxide concentration, which you believe depends, at least partly, on an explanatory variable in a linear way. Our dataset shows the deficiencies of this approach, and in fact our time series course will let us move beyond simple linear regression to more sophisticated techniques. The response variable is modeled as a linear function of the explanatory variable plus some noise. This noise term does a lot of work for us. If you just want to fit a straight line to a set of data, that's fine; ordinary least squares will always do that for you. If you want to start drawing inferences, though, you need to invoke some distributional assumptions, and the error term is how we do that.
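Written out for n ordered pairs, the "linear model plus noise" described above looks like the equation below. The symbols are our own notation, not taken from the lecture slides: y_i is the response (here, the CO2 concentration), x_i is the explanatory variable (here, the time of the observation), β_0 and β_1 are the intercept and slope, and ε_i is the noise term.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
```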
The error term can arise in a variety of ways. You might have measurement error, or you might have left some important variables out of your model. Now, if you want to do some inference, there are what I would think of as some vanilla assumptions. In the simplest case, the errors are normally distributed with mean zero, and they all have the same variance. When you study regression in more depth than we will in this lecture, you learn how to critique these assumptions. In simple ordinary least squares we'll also assume that the errors are independent. In a time series course that makes the modeling a little boring, since dependence over time is exactly what we want to study. But again, we're just starting here.

The basic idea behind ordinary least squares is to take each observed data point and compare it to what your choice of slope and intercept would predict. I can plug in any numbers I like for the slope and intercept, and that gives me a prediction. We compare each observation to its prediction, square the differences, and add them up to get an aggregate error, the sum of squared errors. The idea behind ordinary least squares, of course, is to choose the slope and intercept that make this aggregate error as small as mathematically possible. It takes only a little calculus to do this, but rather than work through the calculations by hand, we'll let R do them for us.

The lm (linear model) command does the fitting. The co2 time series has a time component: you can think of it as the response together with the time of each observation. To extract the time part, we use the time command, so we regress co2 on time(co2), which in R's formula notation is lm(co2 ~ time(co2)). I've put parentheses around this line so that the output appears on the screen, and I've copied and pasted that output into this slide for you.
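R's lm does this calculation for us, but to see what is actually being minimized, here is a minimal pure-Python sketch of the same least squares fit for one explanatory variable. Setting the partial derivatives of the sum of squared errors with respect to the intercept and slope to zero gives slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄, which is what the sketch computes. The function names sse and ols_fit are hypothetical, and the data are made up for illustration; this is not the co2 data and not how R implements lm internally.

```python
"""Closed-form ordinary least squares for a straight line (illustrative sketch)."""


def sse(xs, ys, intercept, slope):
    """Sum of squared errors: observation minus the line's prediction, squared, summed."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def ols_fit(xs, ys):
    """Return (intercept, slope) minimizing sse(), using the closed-form solution."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Cross-deviations and squared x-deviations from the calculus solution.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return intercept, slope


if __name__ == "__main__":
    # Small made-up set of ordered pairs (not the co2 data).
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    b0, b1 = ols_fit(xs, ys)
    print(f"intercept = {b0:.3f}, slope = {b1:.3f}")  # slope near 2, intercept near 0
    # Nudging either coefficient away from the fit can only increase the SSE.
    best = sse(xs, ys, b0, b1)
    for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
        assert sse(xs, ys, b0 + d0, b1 + d1) > best
```

The perturbation loop at the end is a quick numerical check of the "as small as mathematically possible" claim: any small change to the fitted intercept or slope raises the aggregate error.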
You can see that the best intercept is something like negative 2000 and the best slope is about 1.3, that is, roughly 1.3 ppm of carbon dioxide per year. Take the intercept with a little caution. Because time here is measured in calendar years, the intercept is the model's prediction at year zero, and we're not saying that the carbon dioxide concentration at year zero was negative 2000. That would be a meaningless thing to say. Given our dataset, the best intercept for that cloud of points, that scatterplot, really is that large negative number; it just isn't interpretable on its own. We will not extrapolate back that far: the model's usefulness would break down long before then.

If you'd like to plot your line along with the data, I've reproduced the plot command here, followed by the abline command. It draws a line from an intercept a and a slope b, and we apply it to the model we just fit. If you do that, you'll see the original data increasing in time with an oscillatory part on top, and you'll see that a straight line is probably not the best model even for the trend. In the next lecture we'll look at the errors and try to say something meaningful about them. But for now, we've been able to fit the best straight line, best in the least squares sense, to this dataset.

At this point, given a set of x and y values, you should be able to plot your data and fit a linear model to the ordered pairs. We'll critique the modeling process in the next lecture.