In this video, we begin to work with packages in R. We can think of a package as data sets together with ways to operate on those data sets: functions. In this particular video, we'll try to understand what a package is and begin being productive with packages. We'd like to be able to download a package and start accessing its data via its functions.
I'm going to take a moment and pull up R. You can see that packages are integral to using R: it has a button for them right there. We're going to install a package. I'm selecting a mirror. I'm parked in the US, so I'll grab one of our mirrors. And in just a moment, a list of packages presents itself. As you can see, there are very many packages available to us, greatly extending the functionality and reach of R. I'm looking for faraway right now. I'll select it, say OK, and now R is taking a moment to download and install the faraway package.
For some very simple ways to access the data in this package, we'll look at a couple of commands. First, we'll look at the data command. I'll take a moment and clear the screen. If you type data() just by itself, and you spell it correctly, you can see that the base R installation has a number of data sets available. If we get specific to our package and type data(package='faraway'), with single quotes around the name of the package and, evidently, a closing parenthesis, you can see that faraway has made a number of data sets available to us.
The data set that we'll be dealing with in this lecture is coagulation, which records blood coagulation time, in seconds, for animals fed a variety of diets. If I type data(coagulation, package='faraway'), we make this data set available to us. The ls() command shows you that coagulation is right there. If we look at coagulation, just by typing the name of the data set, you can see that it's stored as 24 cases, where for each case, each animal here, we have the coagulation time together with the diet.
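The steps so far can also be typed at the console rather than clicked through the menus. What follows is a minimal sketch, not a record of exactly what was typed on screen. It assumes an internet connection, and the mirror URL is an assumption standing in for whichever mirror you would pick from the list.

```r
# Install faraway only if it is not already on this machine.
# The repos URL is an assumed CRAN mirror; any mirror you prefer will do.
if (!requireNamespace("faraway", quietly = TRUE)) {
  install.packages("faraway", repos = "https://cloud.r-project.org")
}

# List the data sets in the currently attached packages,
# then the data sets that faraway provides.
data()
data(package = "faraway")

# Load the coagulation data set into the workspace.
data(coagulation, package = "faraway")

# Confirm that it arrived, then print it.
ls()
coagulation
```

Calling data() with the package argument loads the data set without attaching the whole package, which is why library(faraway) does not appear in this sketch.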
If we would like to get a quick numerical summary of our data, we could type summary(coagulation). That gives the popular five-number summary, minimum through maximum with the quartiles in between, along with the mean. This is for all 24 data points, or 24 cases, taken together. You can see that there are 24 animals here. The times aren't disaggregated by diet, but the frequencies for each diet are available to us right here.
If I were to just naively plot coagulation, I would obtain a plot which I don't find very useful. I do see the coagulation times separated out by diet here, but diet isn't really a numerical variable. It's more of a qualitative variable. So instead of plotting like that, I'm going to plot coagulation time against diet. This is probably a more intuitive plot for us. It shows you box plots, not for all the data aggregated together, but rather spread out by diet. There are four diets in play, so we have four of these box plots. One would, naively, quickly guess that diets A and D are operating somehow similarly, with B and C perhaps increasing coagulation time. But that's just a quick intuition based on the picture. We would never draw any conclusions without doing a formal statistical test first.
In this video, we've learned that R has environments, called packages, that bundle data together with methods. And we've been able to download at least one of the packages. You'll develop some facility for downloading other packages through some of the quizzes and readings.
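The summary and the two plots described above can be reproduced along these lines. This is a sketch under assumptions: it relies on the two columns of coagulation being named coag and diet, as they are in the faraway package, and the formula call is one common way to get the per-diet box plots, not necessarily the exact command used in the video. The last line is an optional extra that was not shown in the video.

```r
# Load the data set (assumes the faraway package is already installed).
data(coagulation, package = "faraway")

# Numerical summary of the times, plus the count of animals on each diet.
summary(coagulation)

# The naive plot: a scatterplot of the two columns against each other,
# with diet placed on a numeric scale even though it is qualitative.
plot(coagulation)

# Coagulation time against diet: because diet is a factor,
# R draws one box plot per diet level.
plot(coag ~ diet, data = coagulation)

# Optional extra: the same numerical summary, computed separately per diet.
tapply(coagulation$coag, coagulation$diet, summary)
```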