LAB #1: DESCRIPTIVE STATISTICS WITH R

Similar documents
LAB #2: SAMPLING, SAMPLING DISTRIBUTIONS, AND THE CLT

LAB #6: DATA HANDING AND MANIPULATION

IN-CLASS EXERCISE: INTRODUCTION TO R

An Introduction to R- Programming

An Introductory Tutorial: Learning R for Quantitative Thinking in the Life Sciences. Scott C Merrill. September 5 th, 2012

Introduction to Minitab 1

Introduction to R Commander

Depending on the computer you find yourself in front of, here s what you ll need to do to open SPSS.

Lesson 3 Transcript: Part 1 of 2 - Tools & Scripting

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

R Commander Tutorial

Solution to Tumor growth in mice

Tableau Advanced Training. Student Guide April x. For Evaluation Only

Lastly, in case you don t already know this, and don t have Excel on your computers, you can get it for free through IT s website under software.

The first thing we ll need is some numbers. I m going to use the set of times and drug concentration levels in a patient s bloodstream given below.

BIOSTATISTICS LABORATORY PART 1: INTRODUCTION TO DATA ANALYIS WITH STATA: EXPLORING AND SUMMARIZING DATA

CHAPTER 1 COPYRIGHTED MATERIAL. Finding Your Way in the Inventor Interface

Applied Calculus. Lab 1: An Introduction to R

Computer lab 2 Course: Introduction to R for Biologists

CMSC/BIOL 361: Emergence Cellular Automata: Introduction to NetLogo

Microsoft Excel 2010 Training. Excel 2010 Basics

Practical 2: Using Minitab (not assessed, for practice only!)

Intellicus Enterprise Reporting and BI Platform

Chapter 2 The SAS Environment

What s New in Spotfire DXP 1.1. Spotfire Product Management January 2007

Lab 1: Introduction, Plotting, Data manipulation

Statistics Lecture 6. Looking at data one variable

Your Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression

BIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26

Introduction to R and R-Studio Toy Program #1 R Essentials. This illustration Assumes that You Have Installed R and R-Studio

Statistical Software Camp: Introduction to R

Statistics 528: Minitab Handout 1

GIS LAB 1. Basic GIS Operations with ArcGIS. Calculating Stream Lengths and Watershed Areas.

Exploring Data. This guide describes the facilities in SPM to gain initial insights about a dataset by viewing and generating descriptive statistics.

CHAPTER 6. The Normal Probability Distribution

Lab 1. Introduction to R & SAS. R is free, open-source software. Get it here:

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data

Matlab for FMRI Module 1: the basics Instructor: Luis Hernandez-Garcia

Once you define a new command, you can try it out by entering the command in IDLE:

Excel 2. Module 3 Advanced Charts

Practical 2: Plotting

Creating a data file and entering data

Interface. 2. Interface Adobe InDesign CS2 H O T

Table of Contents (As covered from textbook)

POL 345: Quantitative Analysis and Politics

An Introduction to R 1.3 Some important practical matters when working with R

Stat 290: Lab 2. Introduction to R/S-Plus

Lab 1 Introduction to R

Section 1. Introduction. Section 2. Getting Started

D&B Market Insight Release Notes. November, 2015

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

Getting Started With. A Step-by-Step Guide to Using WorldAPP Analytics to Analyze Survey Data, Create Charts, & Share Results Online

SHOW ME THE NUMBERS: DESIGNING YOUR OWN DATA VISUALIZATIONS PEPFAR Applied Learning Summit September 2017 A. Chafetz

Light Speed with Excel

Release notes for StatCrunch mid-march 2015 update

Command Line and Python Introduction. Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016

Sorted bar plot with 45 degree labels in R

Working with Charts Stratum.Viewer 6

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

Outlook Web Access. In the next step, enter your address and password to gain access to your Outlook Web Access account.

Opening a Data File in SPSS. Defining Variables in SPSS

Intro to Stata for Political Scientists

Week - 01 Lecture - 04 Downloading and installing Python

Getting Started with Python and the PyCharm IDE

Statistics 13, Lab 1. Getting Started. The Mac. Launching RStudio and loading data

You will learn: The structure of the Stata interface How to open files in Stata How to modify variable and value labels How to manipulate variables

Statistics for Biologists: Practicals

SISG/SISMID Module 3

Exercise 1: Introduction to MapInfo

AND NUMERICAL SUMMARIES. Chapter 2

Lecture Notes 3: Data summarization

SAS Visual Analytics 8.2: Getting Started with Reports

Week 7: The normal distribution and sample means

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.

A brief introduction to R

ECO375 Tutorial 1 Introduction to Stata

Visual Physics - Introductory Lab Lab 0

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Exsys RuleBook Selector Tutorial. Copyright 2004 EXSYS Inc. All right reserved. Printed in the United States of America.

SPSS 11.5 for Windows Assignment 2

Minitab Lab #1 Math 120 Nguyen 1 of 7

Numerical Descriptive Measures

Quick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018

MATH11400 Statistics Homepage

Data Explorer: User Guide 1. Data Explorer User Guide

Statistics: Interpreting Data and Making Predictions. Visual Displays of Data 1/31

Lab #1: Introduction to Basic SAS Operations

plot(seq(0,10,1), seq(0,10,1), main = "the Title", xlim=c(1,20), ylim=c(1,20), col="darkblue");

Chapter 5 Making Life Easier with Templates and Styles

1. Setup Everyone: Mount the /geobase/geo5215 drive and add a new Lab4 folder in you Labs directory.

Excel. Spreadsheet functions

Using Excel to produce graphs - a quick introduction:

Univariate Statistics Summary

Lab 1 (fall, 2017) Introduction to R and R Studio

R Workshop Module 3: Plotting Data Katherine Thompson Department of Statistics, University of Kentucky

Depending on the computer you find yourself in front of, here s what you ll need to do to open SPSS.

Install RStudio from - use the standard installation.

Introduction to CS databases and statistics in Excel Jacek Wiślicki, Laurent Babout,

Your Name: Section: 2. To develop an understanding of the standard deviation as a measure of spread.

Transcription:

NAVAL POSTGRADUATE SCHOOL LAB #1: DESCRIPTIVE STATISTICS WITH R Statistics (OA3102)

Lab #1: Descriptive Statistics with R Goal: Introduce students to various R commands for descriptive statistics. Lab type: Interactive lab demonstration followed by hands-on exercises for students. Time allotted: Lecture for ~50 minutes followed by ~50 minutes of group exercises. R libraries: Just the base package. Data: iraq.csv 1 R HINT OF THE WEEK 1. Before we begin today's lab, let's look at one approach for managing R sessions and working files. First note that you can save any R session, including all the working files and functions using the save.image() command from the command line. For example: save.image("/users/ron Fricker/Desktop/.RData") This will create a file at the location in the path you specify that contains everything. Then clicking on the icon for the file (a big blue R) will launch R and load all your files and functions into the working directory. Note: If you decide to name your file, it's still important to include the ".RData" extension. Also, if you do give your file a name more than ".RData" you'll have to explicitly save the file to that name every time. That is, if you just quit out of the session and say "Yes" to the save image question in the pop-up dialog box R will create another file with the name ".RData". What's great about this is that you can have as many.rdata files as desired. I usually have one for each project I'm working on and I keep them in the folders for the projects along with other project documentation. I also just use the generic ".RData" name for each one, where I know what project it belongs to by the folder it is in. Note: When you open the.rdata file in its location, do some work, and then close and save the session, R will create and save a second file with a.rhistory extension. This file contains the command history from the prior session(s). You can also save the session to an.rdata file using the GUI. In Windows, click on File > Save Workspace and then choose a location to save to. 1 Data from http://turtle.gis.umn.edu/pmwiki/pmwiki.php?n=statisticsanddatawithr.application TheShapeOfWarsToCome (file titled "iraq.dead.rda"). Revision: January 2012 2

DEMONSTRATION 2. Now, before we begin, let's talk a bit about R scripts. All the commands in all of the labs we will do in this class will be posted on Sakai in the form of R scripts. Why do you care? Because the script is an easy way to follow along in the lab, executing all the commands that I do, but without having to type them in! R scripts are also a great way to document what you have done in a particular analysis (so you can repeat it later, for example). Along with the R Editor, R scripts can also make it easier to work with the R command line. An R script is basically just a text file that ends with a ".R" extension. After opening it in R (PCs: File > Open script; Macs: File > Open Document), you can execute the commands right from the Editor window: a. PCs: Put the cursor on a line and hit Ctrl+R. Note that the cursor automatically moves down to the next line so, if you want to execute a sequence of commands, you can just keep hitting Ctrl+R. Alternatively, you can highlight part of a line or a bunch of lines and select Edit > Run line or selection, or if you want to run everything in the script, choose Edit > Run all. b. Macs: Put the cursor on a line and hit Command+Enter. Note that, unlike on the PC, the cursor does not automatically move down to the next line. Alternatively, you can highlight part of a line or a bunch of lines and hit Command+Enter. So, if you're going to follow along in the lab, download the R script called "Lab 1 Script.R" from Sakai and load it into your R Editor window. 3. So, on to the lab! For this lab, we're going to use a dataset from Statistics and Data with R by Cohen and Cohen (2008). It contains detailed casualty information from the Iraq war from 21 March 2003 to 10 October 2007. To begin, download the dataset from the course Sakai site and read it into R. a. Since the data are contained in a comma delimited file use read.csv() command. There are a couple of ways to do this. First, you can explicitly tell R where to find the file, as in (where you will need to modify the path to correspond to the file location on your computer): iraq <- read.csv("/users/ron Fricker/Desktop/iraq.csv", header=true) An easier way I recently discovered is to use the file.choose() command with the read.csv() function. This will allow you to simply browse for and select the file you want to read in. For example: iraq <- read.csv(file.choose()) Either of these commands reads in the CSV file and puts it into a data.frame object called iraq. 4. Now, take a quick look through the data to familiarize yourself with it. Look at each of the variables to get some idea what s in them, look for unusual data, etc. a. First, to get an idea of the size of the dataset, use the dim() command, as in Revision: January 2012 3

dim(iraq) This tells you the number of rows (observations) and the number of columns (variables) in the dataset. b. To dig a little deeper and get some summary statistics for each of the variables, use the summary() command, as in: summary(iraq) Note that the summary statistics that you get for each variable depend on whether the variable is continuous or discrete. c. To get just a list of the variable names, type: names(iraq) d. To look even deeper, browse some of the individual observations. For example, to look at the first five observations (i.e., rows of data) type iraq[1:5,] If you want to look at the complete results for one of the variables, say question Srv.Branch type iraq$srv.branch or, because it s the sixth variable in the data frame, iraq[,6] 5. Now, let's look at ways to calculate summary statistics. a. At the command line, instead of using the data frame name in the summary command, you can use the summary() function on individual variables, such as summary(iraq$ Major.Cause.of.Death) or on groups of variables, such as summary(iraq[,7:9]) You can also use the self-explanatory commands below to calculate specific numeric descriptive statistics. mean(iraq$age,na.rm=t) sd(iraq$age,na.rm=t) min(iraq$age,na.rm=t) median(iraq$age,na.rm=t) max(iraq$age,na.rm=t) quantile(iraq$age,na.rm=t) IQR(iraq$Age,na.rm=T) range(iraq$age,na.rm=t) fivenum(iraq$age,na.rm=t) Note that the option "na.rm=t" is necessary for this data. It tells R to ignore the missing data when calculating the specified statistic. Revision: January 2012 4

For "clean" data sets that are not missing any responses, you can leave that option off. But, note what happens when you leave it out on this data. For example, type mean(iraq$age) Of course, these functions only work on numeric variables. To determine the type of a variable (e.g., numeric, integer, character, factor, logical, etc), use the class() function, as in: class(iraq$age) b. How many observations are missing in variable Age? table(is.na(iraq$age)) Here the is.na() function generates a logical vector that is true if a value is missing (i.e., its set to NA in R) and false if it's not. Then the table() function creates a table with counts of the unique values in the is.na(iraq$age) vector. If this seems a bit confusing, first look at what the following command gives is.na(iraq$age) It's just a gigantic vector of "TRUE"s and "FALSE"s, where "TRUE" means Age was missing for that observation. Then all the table() function does is count up the number of "TRUE"s and "FALSE"s. This is one of the great strengths of R, once you get used to it. You can just wrap function around function around function R starts at the innermost function, generates its output, feeds that into the next innermost function, which then generates its output, feeding that into the next function With this type of functionality, you can write some very compact (and sometime quite inscruitable) code. c. Now, to figure out how observations are missing in the entire data set: table(is.na(iraq)) Note that you should be very careful about applying functions to whole data frames. Indeed, in general you should avoid doing this. Often the function will fail and other times it's not entirely clear what the function is doing exactly. However, in this case, it works quite nicely. d. Now, another strength of R is that you can easily condition on data values (i.e., subset) when doing calculations. For example, imagine if we wanted to compare the average age of hostile casualties to the age of non-hostile casualties. Here's an easy way to do that: mean(iraq$age[iraq$major.cause.of.death=="hostile"],na.rm=t) In words, the above syntax says, "Calculate the mean of the Age variable for those observations where Major.Cause.of.Death is "Hostile." In the next line, we do the same calculation for Major.Cause.of.Death equal to "Non-hostile." mean(iraq$age[iraq$major.cause.of.death=="non-hostile"],na.rm=t) Revision: January 2012 5

Note the use of the double equal sign. This is a "logical equals," which is different from a single equals sign. The former generates a TRUE or FALSE logical value (depending on whether the statement is true or false for a given observation) while the latter is an assignment operator (just like the "<-" operator. To see the difference, type the following: x = 3 x == 4 In the first line, x was assigned the value of 3, while the second returned a value of FALSE was generated since 3 is not equal to 4. Now, see what the following returns: iraq$major.cause.of.death=="hostile" e. Also, when calculating the statistics, you may want to only do the calculations by various categories of a factor. Useful syntax for this is, for example, by(iraq$age,iraq$major.cause.of.death,summary) The by() function is saying, "Calculate the summary statistics for variable Age separately for each level of the Major.Cause.of.Death variable." 6. When exploring data, sometimes it s useful to look at two-way (or higher) tabular summaries, particularly for discrete (i.e., categorical) variables a. To generate a simple table of counts, say for variables Rank and Major.Cause.of.Death, use table(iraq$rank,iraq$major.cause.of.death) b. You can generate higher-way tables by adding more variables separated by commas. For this dataset, even a three-way table is unwieldy, though. 7. Now, in addition to looking at numerical summaries, it's often very useful to look at graphical summaries. Here are some examples. a. Bar charts for categorical variables are easily generated with barplot(table(iraq$major.cause.of.death)) If you prefer the bars plotted horizontally, try barplot(table(iraq$major.cause.of.death),horiz=true) And note that you can improve the appearance of the plots by including various options, for example if we wanted to label the axes and the chart we could do: barplot(table(iraq$major.cause.of.death),horiz=true,xlab="count",y lab="major Cause of Death",main="Iraq Casualties, 3/21/03-10/10/07",xlim=c(0,3500)) To convert from counts to fractions: barplot(table(iraq$major.cause.of.death)*100/length(iraq$major.cau se.of.death),ylab="percent",xlab="major Cause of Death",main="Iraq Casualties, 3/21/03-10/10/07",ylim=c(0,100)) Revision: January 2012 6

b. Just like in Excel, you can do stacked and side-by-side bar charts in R. To keep them from being unwieldy with the full data set, let's create a smaller data set to illustrate: iraq.sub <- iraq[(iraq$country=="uk" iraq$country=="it" iraq$country=="pol"),] iraq.sub$country <- factor(iraq.sub$country) Now, let's do some bar charts on this data: barplot(table(iraq.sub$country,iraq.sub$major.cause.of.death), legend=true) barplot(table(iraq.sub$country,iraq.sub$major.cause.of.death), beside=true,legend=true) c. As we talked about in class, histograms are useful for gaining insight into the (underlying) distribution of a (continuous, numeric) variable. For example, we might want to know what the distribution of the Age variable looks like: hist(iraq$age) Clearly it's skewed to the right. As with all R plots, we can modify it in all sorts of ways. For example, here's a fancier plot with the axes labeled, a title added, and a line showing the mean age overlaid and annotated: hist(iraq$age,xlab="age (in years)",ylab="count",xlim=c(10,70),ylim=c(0,2000),main="distributi on of Age\nIraq Casualties, 3/21/03-10/10/07",col="light blue") lines(c(26.3,26.3),c(0,1950),col="red") text(mean(iraq$age,na.rm=t),2000,"average Age",col="red") d. Boxplots are another way to look at the distribution of continuous numeric variables. For example, boxplot(iraq$age,xlab="age (in years)",ylab="count", main="distribution of Age\nIraq Casualties, 3/21/03-10/10/07") Compare the histogram and boxplot for Age: par(mfrow=c(1,2)) #this allows for two plots (side-by-side) hist(iraq$age,xlab="age (in years)",ylab="count", main="distribution of Age\nIraq Casualties, 3/21/03-10/10/07") boxplot(iraq$age,xlab="age (in years)",horizontal=true,ylab="count", main="distribution of Age\nIraq Casualties, 3/21/03-10/10/07") Let's draw in the boxplot lines now to help compare: hist(iraq$age,xlab="age (in years)",ylab="count", main="distribution of Age\nIraq Casualties, 3/21/03-10/10/07") abline(v=median(iraq$age,na.rm=t),col="red",lwd=4) abline(v=fivenum(iraq$age)[2],col="red",lwd=2,lty=2) abline(v= fivenum(iraq$age)[4],col="red",lwd=2,lty=2) abline(v= fivenum(iraq$age)[4]+iqr(iraq$age,na.rm=t)*1.5, col="red",lwd=2,lty=3) abline(v=min(iraq$age,na.rm=t),col="red",lwd=2,lty=3) Revision: January 2012 7

boxplot(iraq$age,xlab="age (in years)",horizontal=true,ylab="count", main="distribution of Age\nIraq Casualties, 3/21/03-10/10/07") abline(v=median(iraq$age,na.rm=t),col="red",lwd=4) abline(v=fivenum(iraq$age)[2],col="red",lwd=2,lty=2) abline(v= fivenum(iraq$age)[4],col="red",lwd=2,lty=2) abline(v= fivenum(iraq$age)[4]+iqr(iraq$age,na.rm=t)*1.5, col="red",lwd=2,lty=3) abline(v=min(iraq$age,na.rm=t),col="red",lwd=2,lty=3) e. Side-by-side boxplots (aka as comparative boxplots) are useful for comparing the distributions of two (or more) sets of data. For example, we might want to compare the distribution of Age for US casualties versus casualties from all the other countries. To do that: par(mfrow=c(1,1)) #first, put it back to showing only one plot raq$us.indicator <- as.numeric(iraq$country=="us") boxplot(iraq$age~iraq$us.indicator,ylab="age", xlab="us Indicator (1=US, 0=Other country)",notch=true) f. Now, there are lots of other types of charts you can do in R. Too many to go into them all here. For example, a "dot chart" is kind of like a bar chart, except that dots are placed where the top of the bar would be. For example: dotchart(table(iraq$srv.branch),xlab="number of Casualties") g. You can also do pie charts: pie(table(iraq$major.cause.of.death)) Fancying it up: pie(table(iraq$major.cause.of.death), main="iraq Casualties, 3/21/03-10/10/07", col=c("red","blue")) If you don't like the look of the plot, look at the help page for the available options and modify as desired. h. And here's an example of something called a "strip chart": stripchart((iraq$julian-12131)~iraq$us.indicator,method="stack", ylab="us Indicator (1=US, 0=Other country)", xlab="date (1=3/21/03,1165=10/10/07)",pch=20) i. Finally, there's an important type of plot we will frequently use to assess whether data is normally distributed. It's sometimes called a normal probability plot and sometimes a quantile-quantile plot, or Q-Q plot for short. The latter name arises because it compares the quantiles from a normal distribution to the empirical quantiles from the data. Let's see what a normal probability plot looks like for data we know is normally distributed: par(mfrow=c(2,2)) normal.data.10 <- rnorm(10) normal.data.50 <- rnorm(50) Revision: January 2012 8

normal.data.100 <- rnorm(100) normal.data.1000 <- rnorm(1000) qqnorm(normal.data.10); qqline(normal.data.10) qqnorm(normal.data.50); qqline(normal.data.50) qqnorm(normal.data.100); qqline(normal.data.100) qqnorm(normal.data.1000); qqline(normal.data.1000) Compare to what the Q-Qplot looks like for data we know is not normally distributed: gamma.data.10 <- rgamma (10,1,2) gamma.data.50 <- rgamma (50,1,2) gamma.data.100 <- rgamma (100,1,2) gamma.data.1000 <- rgamma (1000,1,2) qqnorm(gamma.data.10); qqline(gamma.data.10) qqnorm(gamma.data.50); qqline(gamma.data.50) qqnorm(gamma.data.100); qqline(gamma.data.100) qqnorm(gamma.data.1000); qqline(gamma.data.1000) So, let's check to see whether we can reasonably assume the Age data is normally distributed: par(mfrow=c(1,1)) qqnorm(iraq$age); qqline(iraq$age) Finally, you can also compare two sets of data one against the other using the qqplot() function. Here's a fancy example: qqplot(normal.data.1000, gamma.data.1000, xlab=expression(paste("1000 observations from ",N(mu==0,sigma^2==1),sep="")), ylab= expression(paste("1000 observations from ",Gamma(alpha==1,beta==2),sep=""))) To explore all the expression options, see the help for plotmath:?plotmath Revision: January 2012 9

GROUP # EXERCISES Members:,,, 1. For the Age variable in the Iraq data, calculate the (a) Mean: (b) Median: (c) Standard deviation: (d) Maximum: (e) IQR: 2. For the Hometown variable: (a) Which hometown had the largest number of casualties in the data? How many? (Hint: The sort() function can be useful for ordering the values that come out of the table() function ) (b) For how many respondents do not have hometown information? Note that there are three types of missing data: (i) completely missing (i.e., the value for an observation is NA), (ii) those that are missing but that have a blank space in the field (i.e., the field contains a " "), and (iii) some are "Not reported yet." i) # NAs: ii) # " ": iii) # "Not reported yet": 3. Create and attach a frequency histogram and a boxplot of Age only for those with Rank of "Corporal." Appropriately label the x- and y-axes, and further modify the plots as desired to make them look professional. 4. Create a table of Major.Cause.of.Death versus the US.indicator created in the lab. Does there seem to be a difference in the fraction of casualties attributable to hostile causes between US and non-us personnel? Revision: January 2012 10

Name: _ INDIVIDUAL EXERCISES 1. Exploring the Iraq data. (a) How many records (observations) are in the data set? (b) How many variables are in the data set? (c) How is a missing observation indicated? i) How many observations are missing in the dataset? ii) How many are missing for the variable Country? 2. Calculating basic statistics and creating plots of the Iraq data. (a) How many casualties are US (i.e., Country is equal to "US")? i) How many of the US casualties were a result of non-hostile causes? ii) Among the hostile casualties, what was the cause that resulted in the largest number of casualties? (b) Now, subset the data to only US casualties and create a bar chart for Branch of Service and attach. That is, first run the following code: iraq.us.subset <- iraq[iraq$country=="us",] iraq.us.subset$srv.branch <- factor(iraq.us.subset$srv.branch) iraq.us.subset$state <- factor(iraq.us.subset$state) Note that the labels are also too long to all print out. To fix that, use the "las=2" option to turn the x-axis labels 90 degrees. And, to leave enough room for them to print (some are pretty long), before making the bar plot first run par(mar=c(12,3,1,1)) to adjust the margin area for printing. What do you notice from the bar chart? (c) Finally, do a side-by-side boxplot of Age by Svc.Branch, where again you will need to rotate the labels using the option "las=2". What do you notice from the plot? Revision: January 2012 11