Statistical Programming Camp: An Introduction to R

Handout 3: Data Manipulation and Summarizing Univariate Data
Fox Chapters 1-3, 7-8

In this handout, we cover the following new materials:

• Using logical operators: <, <=, >, >=, ==, !=, &, |, and is.na()
• Subsetting data with [ ] and subset() using logical expressions
• Using ifelse() for conditional statements
• More functions for summary statistics: var() (variance), sd() (standard deviation), weighted.mean(), quantile(X, P), and IQR() (interquartile range)
• Applying functions by indexes using tapply()
• Using function() to create user-defined functions
• Common arguments for graphs: main (main title), xlab and ylab (axis labels), xlim and ylim (axis limits), pch (point symbol), lty (line type), lwd (line width), col (color), and cex (sizing)
• Adding features to graphs with lines() and abline() (lines), points() (points), text() (text), and arrows() (arrows)
• Using identify() to identify points on graphs
• Using \n to break lines
• Using par(mfrow = c(X, Y)) at the beginning of graphical commands to produce an X by Y figure in one graphical window
• Using hist() to generate histograms
• Calculating a smooth density via density()
• Adding a legend to an existing graph with legend()
• Printing and saving graphs

We will cover the following Statistical Programming Camp Coding Rule:

• Curly Brackets

1 Logical Operators and Values

• Logical operators (<, <=, >, >=, ==, and !=) allow for data manipulation and subsetting by determining whether a specified condition is TRUE or FALSE. Both values must be written in uppercase and, like NA, are special values in R. The operators follow standard usage. For instance, <= evaluates whether a number is less than or equal to a specified value, and != means "not equal". The output of a logical statement is of the class logical.

> "Hello" == "hello"
[1] FALSE
> y <- 3 < 4
> y
[1] TRUE
> class(y)
[1] "logical"

• Logical operators may be applied to individual data entries or entire vectors (or even a data frame!). When applied to a vector, logical operators evaluate each element of the vector.

> x <- c(3, 2, 1, -2, -1)
> x != 1
[1]  TRUE  TRUE FALSE  TRUE  TRUE

• Combine logical statements and operations with & (and) and | (or).

> x > 2 | x <= -1
[1]  TRUE FALSE FALSE  TRUE  TRUE
> x > 0 & x <= 2
[1] FALSE  TRUE  TRUE FALSE FALSE

• Combining logical operators with other commands allows us to perform operations only on elements that meet the logical condition. For instance, we can count the number of TRUE statements using sum().

> sum(x > 0 & x <= 2) # Adds up the number of TRUE statements
[1] 2

• The function is.na() identifies missing data by returning TRUE for each missing element. Many functions, such as mean(), also accept the argument na.rm = TRUE, which removes missing data before computing.
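The two ways of handling missing data mentioned above can be compared in a short sketch. The vector v and its values are hypothetical and chosen only for illustration; is.na(), sum(), and mean() with na.rm = TRUE are base R.

```r
# Hypothetical vector for illustration; not part of the handout's data
v <- c(4, NA, 7, 1, NA)

n.missing <- sum(is.na(v))           # TRUE counts as 1, so this counts the NAs
mean.default <- mean(v)              # NA, because missing values propagate
mean.narm <- mean(v, na.rm = TRUE)   # drops NAs before averaging
mean.manual <- mean(v[!is.na(v)])    # same result via logical subsetting
```

Both approaches should give the same mean; na.rm = TRUE is shorter, while logical subsetting also works with functions that have no na.rm argument.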

> x <- c(x, NA)
> is.na(x) # identifies missing data by returning a logical vector
[1] FALSE FALSE FALSE FALSE FALSE  TRUE
> mean(x) # cannot compute the mean due to missing data
[1] NA
> mean(x[!is.na(x)]) # calculates the mean for only non-missing data
[1] 0.6

2 Subsetting with Logical Expressions

For the remainder of this handout, we will use the following data, a collection of county-level data used by D. Matthews and J. Prothro in Negroes and the New Southern Politics. We can answer some interesting questions using this data set. Does the state-wide mean value of black voter registration depend on the existence of a poll tax? What about a literacy requirement? Finally, what do you find when considering the four combinations of these two? The variables in the data are:

Variable      Description
state         state name
county        county code
polltax       the existence of polltax (1 = Yes, 0 = No)
litreq        the existence of literacy requirement (1 = Yes, 0 = No)
blackpop      1960 % black of state population (100s)
pblackreg     % black voting age population registered in 1964 (black registration rate)
fedex66       federal examiner present in county in 1966
pincreasereg  % increase in black registration rate from 1964 to 1968

• Previously, we learned that vectors and data frames can be subsetted by using brackets ([ ]). For example, a subset of a data frame can be obtained by specifying row numbers (or row names) and column numbers (or column names) in brackets. Logical expressions can also be used within brackets for subsetting.

> reg <- read.table("registration.txt", header = TRUE)
> ## black registration is lower where polltax is present
> mean(reg$pblackreg[reg$polltax == 1])
[1]
> mean(reg$pblackreg[reg$polltax == 0])
[1]
> ## black registration is lower where literacy requirement is imposed
> mean(reg$pblackreg[reg$litreq == 1])
[1]
> mean(reg$pblackreg[reg$litreq == 0])
[1]
> ## black registration is lowest where both requirements are present
> mean(reg$pblackreg[(reg$polltax == 1) & (reg$litreq == 1)])
[1]
> ## no observations returns NaN
> mean(reg$pblackreg[(reg$polltax == 1) & (reg$litreq == 0)])
[1] NaN
> mean(reg$pblackreg[(reg$polltax == 0) & (reg$litreq == 1)])
[1]
> mean(reg$pblackreg[(reg$polltax == 0) & (reg$litreq == 0)])
[1]

• In addition to [ ], subset() may be used to subset data. It takes a vector or data frame as the first argument. Users can then specify subset and/or select as arguments. The former should be a logical vector indicating the elements or rows to keep, while the latter should specify the variables to keep (either by a vector of variable names or by a numeric vector indicating column numbers).

> ## counties with a higher than average black population but lower than
> ## average registration rate
> lowreg <- subset(reg, subset = ((reg$blackpop >= mean(reg$blackpop))
+                  & (reg$pblackreg <= mean(reg$pblackreg))),
+                  select = c("blackpop", "pblackreg", "polltax", "litreq"))
> ## How many impose both polltax and literacy requirement
> nrow(lowreg[(lowreg$polltax == 1) & (lowreg$litreq == 1), ])
[1] 34
> ## Another way
> sum((lowreg$polltax == 1) & (lowreg$litreq == 1))
[1] 34
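For readers without registration.txt at hand, the two subsetting styles above can be tried on a small made-up data frame. This is only a sketch: the data frame toy and its values are hypothetical, and its column names merely mirror those of the handout.

```r
# Hypothetical toy data; column names mirror the handout, values are made up
toy <- data.frame(polltax   = c(1, 0, 1, 0),
                  litreq    = c(1, 1, 0, 0),
                  pblackreg = c(20, 35, 40, 55))

# Bracket subsetting: rows where both requirements are present, all columns
both.brackets <- toy[toy$polltax == 1 & toy$litreq == 1, ]

# subset(): the same rows, keeping only one column
both.subset <- subset(toy, subset = polltax == 1 & litreq == 1,
                      select = "pblackreg")

# A condition that matches no rows yields an empty vector, whose mean is NaN
none <- toy$pblackreg[toy$polltax == 1 & toy$litreq == 2]
mean.none <- mean(none)
```

Inside subset(), column names can be written without the toy$ prefix because the condition is evaluated within the data frame.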

3 Using Conditional Statements via ifelse()

Conditional statements evaluate a logical statement and then perform different actions depending on whether the statement is true or false. The function ifelse(X, Y, Z) returns the result of Y where the statement X is true and the result of Z where X is false.

> ## Creating a new variable indicating counties with higher than average
> ## black population and polltax
> reg$highpoptax <- ifelse((reg$blackpop >= mean(reg$blackpop) & reg$polltax == 1),
+                          "Yes", "No")
> ## a more complex example creating region variable
> reg$region <- ifelse(reg$state == "Alabama" | reg$state == "Georgia" |
+                      reg$state == "Louisiana" | reg$state == "Mississippi" |
+                      reg$state == "South Carolina", "Deep South", "Peripheral South")
> reg$region <- as.factor(reg$region)
> table(reg$region)

      Deep South Peripheral South

4 More Functions for Summarizing Data

In addition to the functions we learned last week (i.e., mean(), median(), min(), max(), and range()), we have the following new functions that are useful for summarizing data.

• var() (variance) and sd() (standard deviation) summarize numeric data.

> ## two ways of calculating standard deviation
> sd(reg$pblackreg)
[1]
> sqrt(var(reg$pblackreg))
[1]

• A weighted mean can be computed using weighted.mean(X, Y), where the output is the mean of X weighted by Y.

> ## overall registration rate should be weighted by county population
> weighted.mean(reg$pblackreg, reg$blackpop)
[1]

• The function quantile(X, P) provides the sample quantiles of a numeric vector X for each element of another numeric vector P.

> quantile(reg$pblackreg) # the default is quartiles plus min and max
  0%  25%  50%  75% 100%
> quantile(reg$pblackreg, seq(from = 0.2, to = 0.8, by = 0.2)) # quintiles
 20%  40%  60%  80%

• The function IQR() returns the interquartile range.

> IQR(reg$pblackreg)
[1]

5 Applying Functions by Indexes

In many situations, we want to apply the same function repeatedly to different parts of the data. For example, in the black registration data, we may want to compute the registration rate within each state. Doing this manually is a pain, especially if the number of states is large: you have to subset the data for one state, use mean() to compute the registration rate for that state, and repeat this for each state. The function tapply() (t is shorthand for table) enables you to do such a computation in one line. Specifically, tapply(X, INDEX, FUN) applies the function FUN to X for each of the groups defined by a vector INDEX. Replace FUN with mean, median, sd, etc. to generate the desired quantity.

> ## Calculate the mean of % black registration rates by state
> tapply(reg$pblackreg, reg$state, mean)
       Alabama        Florida        Georgia      Louisiana    Mississippi North Carolina South Carolina

6 Writing Functions

One of the greatest benefits of R is the flexibility it gives users to write their own functions. The syntax takes the form name <- function(bar1, bar2, ...){ }, where name is the function name, bar1, bar2, ... are the inputs, and the commands within the curly brackets { } define the function. We begin with a simple example, creating a function to compute the mean of a vector with missing data. Note that an opening curly brace should never go on its own line. A closing curly brace should always go on its own line. Additionally, code within brackets should be aligned according to the text editor's automatic alignment.

> x <- c(10:22, NA, 1:7, NA, 5)
> mean(x) # cannot compute mean due to missing data
[1] NA
> my.mean <- function(x){
+   x <- x[!is.na(x)] # removes missing data
+   sum <- sum(x)
+   length <- length(x)
+   mean <- sum/length
+   out <- c(sum, length, mean) # define the output
+   names(out) <- c("sum", "length", "mean")
+   out # end function by calling output
+ }
> my.mean(x)
      sum    length      mean
241.00000  21.00000  11.47619

Programming Camp Coding Rule: Curly Brackets

An opening curly brace should never go on its own line. A closing curly brace should always go on its own line. Code within brackets should be properly aligned.

GOOD Code:

name <- function(bar1, bar2, ...){
  command1 <- code1
  command2 <- code2
}

BAD Code:

name <- function(bar1, bar2, ...)
{command1 <- code1
command2 <- code2}

7 Graphs for Univariate Data: Histograms

Graphs are critical tools for summarizing data in a straightforward and easy to understand manner. Great graphics strengthen projects and reports by illustrating central features of the data without much additional explanation. Bad graphics are inefficient (leaving out critical information such as labels), potentially misleading, or too complicated.

• There are several common graphing arguments that specify basic features of the graph, including the number of figures included on a graph, titles, axis labels, data range, etc. The following table summarizes these arguments:

Argument      Description
main          Main title of the graph.
xlab, ylab    Labels for the x-axis and y-axis.
xlim, ylim    Specifies the x-limits and y-limits, as in xlim = c(0, 10) for the interval [0, 10].
col           Specifies the color to use, e.g., "blue" or "red".
cex           Specifies size of plotted text or symbols.
cex.axis      Specifies size of axis annotation.
cex.main      Specifies size of plot title.

• The second class of graphing commands adds features to an existing graph. These functions include points() for adding points, lines() for lines, and text() for text.

Function      Description
lines()       Adds a plot line to the figure, e.g., lines(x, y), where x and y define coordinates.
abline()      Adds a straight line, e.g., abline(h = x) to place a horizontal line at height x, abline(v = x) to place a vertical line at point x, or abline(a = x, b = y) to place a line with intercept x and slope y.
points()      Adds points, e.g., points(x, y) to place dots with x and y as the coordinates; points(x, y, type = "l") connects the coordinates with a line instead.
text()        Adds text, e.g., text(x, y, z) to display z as text centered at coordinates (x, y).
arrows()      Adds arrows, e.g., arrows(x0, y0, x1, y1, length, angle, code) to draw an arrow from (x0, y0) to (x1, y1), with arrowhead edges of the length specified, at the angle specified, of the arrow type specified by code, and of the color specified by col.

• The function identify() allows us to click on points in our graphs, and R will return information about those data points. When done, press Esc.

• The command \n will force a line break. This is convenient to use with long plot titles.

• The command par(mfrow = c(X, Y)) will produce an X by Y figure in one graphical window. mfrow means the graphs will be filled by row, whereas mfcol means they will be filled by column.

• The function hist() will produce a histogram to summarize the distribution of data. Setting freq = FALSE within hist() will plot densities, so that the total area of the bars equals one, rather than frequency counts. If you specify a single number as the argument breaks, you set the number of equally spaced bins (R treats this number as a suggestion and may adjust it to obtain round breakpoints). If you give a numeric vector instead, it will specify the breakpoints between histogram cells.

• The function density() will calculate the smooth density of a numeric object as an output, which in turn can be an input to the plot() function to draw a smooth histogram (use the lines() function to add it to an existing graph).

• To add a legend to an existing graph, use legend(). The syntax legend(x, y, z) adds a legend with text z at coordinates (x, y), which can also be replaced by a keyword such as "topleft" or "bottomright".

> ## begin by subsetting the data
> examiner <- reg[reg$fedex66 == 1, ]
> noexaminer <- reg[reg$fedex66 == 0, ]
> ## side by side histograms of registration rates
> par(mfrow = c(1, 2))
> hist(examiner$pblackreg, freq = FALSE, breaks = 10, xlim = c(0, 100),
+      main = "Federal Examiner Present",
+      xlab = "Registration Rates")
> hist(noexaminer$pblackreg, freq = FALSE, breaks = 10, xlim = c(0, 100),
+      main = "No Federal Examiner Present", cex.main = 0.995, ## smaller plot title
+      xlab = "Registration Rates")

[Figure: side-by-side density histograms titled "Federal Examiner Present" and "No Federal Examiner Present"; x-axis: Registration Rates; y-axis: Density]

> ## return to single graph
> par(mfrow = c(1, 1))
> ## histogram for counties with examiner
> hist(examiner$pblackreg, freq = FALSE, breaks = 10, xlim = c(0, 100),
+      main = "Registration Rates \n Federal Examiner Present", cex.main = 1.5,
+      xlab = "Registration Rates", cex.axis = 1.5)
> ## add counties with no-examiner as smooth density
> lines(density(noexaminer$pblackreg))
> ## add lines to compare median of counties with/without examiner
> abline(v = median(examiner$pblackreg), col = "red", lty = 2)
> abline(v = median(noexaminer$pblackreg), col = "blue", lty = 2)
> ## add legend
> legend("topright", c("Examiner Median", "No Examiner Median"),
+        lty = c(2, 2), col = c("red", "blue"))
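The same sequence of plotting steps can be tried without the registration data. The following is a minimal sketch on simulated values: the vectors rates.a and rates.b, the file name sketch.pdf, and all plot labels are hypothetical. The plot is written to a temporary PDF with pdf() and dev.off() so that the sketch also runs when no graphics window is open.

```r
set.seed(1)  # make the simulated values reproducible

# Simulated stand-ins for two groups of rates; not the handout's data
rates.a <- runif(200, min = 0, max = 100)
rates.b <- rnorm(200, mean = 40, sd = 10)

# Hypothetical output file in R's temporary directory
out.file <- file.path(tempdir(), "sketch.pdf")
pdf(file = out.file, height = 4, width = 6)

# Density histogram for group A
hist(rates.a, freq = FALSE, breaks = 10, xlim = c(0, 100), ylim = c(0, 0.05),
     main = "Simulated Rates \n Group A as Histogram", xlab = "Rate")
# Smooth density for group B added to the existing graph
lines(density(rates.b))
# Dashed vertical lines at each group's median
abline(v = median(rates.a), col = "red", lty = 2)
abline(v = median(rates.b), col = "blue", lty = 2)
legend("topright", c("Group A Median", "Group B Median"),
       lty = c(2, 2), col = c("red", "blue"))

dev.off()  # close the device so the file is written
```

The ylim argument is set by hand here so that the density curve of the second group is not cut off at the top of the histogram.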

[Figure: density histogram titled "Registration Rates, Federal Examiner Present" with the no-examiner density curve and dashed median lines; legend entries: Examiner Median, No Examiner Median; x-axis: Registration Rates; y-axis: Density]

8 Printing and Saving Graphs

There are a few ways to print and save the graphs you create in R.

• In the window of your graph (if you are a Mac user, make sure your graphic window rather than the R console is selected), you can click File: Save as: PDF... or File: Print...

• You can also right-click on a figure in R and copy the image (if you are a Mac user, you need to highlight the graph and type Apple+C to copy it). Then paste that image into Microsoft Word or any other document.

• You can also do it via a command by using pdf() before your plotting commands and then dev.off() afterwards.

> pdf(file = "myplot.pdf", height = 3, width = 5) # height and width are in inches
> ## plotting commands go here
> dev.off() ## This creates a pdf file in the working directory

9 Practice Questions

9.1 Supreme Court Justice Ideal Points

In a 2002 article, Andrew Martin and Kevin Quinn explored the extent to which the ideal points (i.e., policy preferences) of Supreme Court Justices change throughout their tenure on the Court. The data set contains the following:

• term: Supreme Court Term
• justice: Justice's Last Name
• idealpt: Justice's Estimated Ideal Point, where negative values indicate liberal leanings and positive values indicate conservative leanings
• pparty: President's Political Party

1. Using the tapply() function, create a variable for the median ideal point of court justices for each term of the court.

2. Generate a new variable in the justices data set to indicate whether each justice falls on the Conservative or Liberal end of the ideal point spectrum. Using ifelse(), generate a new variable that takes a value of Liberal if the justice's ideal point is less than 0 and a value of Conservative if the justice's ideal point is greater than 0. Using table(), determine how many justices in the data set were Conservative and how many were Liberal.

3. Create a histogram of justices' ideal points. Using tapply(), calculate each justice's median ideal point. Generate a histogram of the justices' ideal points. Be sure to add an informative title and labels. Create a red, vertical dashed line indicating the median. Additionally, add the density line to the plot. Save the graph you created as a pdf file using the file name xxx.pdf, where xxx is your netid. Submit it to Blackboard along with your R script file xxx.r (do not turn in your R console printout).

9.2 The Impact of Increases in the Minimum Wage

Many economists believe that increasing the minimum wage actually hurts the poor, the very part of the population such a policy is supposed to help. The reason is that if employers have to pay higher wages, they will simply hire fewer people. This means that those who are earning the minimum wage may lose their jobs as a result of an increase in the minimum wage. Two researchers, David Card and Alan Krueger, tested this argument using data from the fast food industry in New Jersey and Pennsylvania. We analyze their original data in this precept.

The njmin.txt data file, available at Blackboard, contains the following variables:

Variable      Description
chain         fast food chain
location      store location (southj, centralj, northj, shorej & PA)
wagebefore    starting wage measured before the increase
wageafter     starting wage measured after the increase
fullbefore    number of full-time employees before the increase
fullafter     number of full-time employees after the increase
partbefore    number of part-time employees before the increase
partafter     number of part-time employees after the increase

1. Load the data into R.

2. Create a factor variable called state, which takes two values, NJ and PA. How many stores in NJ and PA does the study sample contain, respectively? Which chain has the largest number of restaurants in NJ and PA, respectively, in this study sample?

3. Create four histograms in one graph using the starting wage data; starting from the upper left corner and moving clockwise: NJ before the increase, NJ after the increase, PA after the increase, and PA before the increase. Add informative labels to each graph. Are the starting wages similar between NJ and PA before the increase? What about after the increase? Within each state, does the histogram look similar before and after the increase?

4. Compute the average number of full-time employees in NJ separately before and after the increase. Do the same for PA. What do these numbers tell you about the impact of the increase in minimum wage? Are these average differences large compared to the standard deviation of full-time employees before the change in each state?

5. Calculate the difference in the number of full-time employees between before and after the increase within each state. Summarize the data using two smoothed histograms in one plot (red solid line for NJ and blue solid line for PA), with dashed lines representing the mean difference of each state. Finally, calculate the difference in differences between the two states. (If you are curious, go ahead and conduct the same calculation for part-time employment and see if similar results are obtained.)
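As a warm-up for the questions above, the main tools of this handout (a user-defined function, tapply(), and ifelse()) can be rehearsed together on made-up data. This is only a sketch and not a solution to any question: the data frame toy, its values, and the function name mean.nomiss are all hypothetical.

```r
# Hypothetical toy data, made up for illustration only
toy <- data.frame(group = c("a", "a", "b", "b", "b"),
                  score = c(2, NA, -1, 3, 7))

# User-defined function: mean after dropping missing values
mean.nomiss <- function(x){
  x <- x[!is.na(x)]
  sum(x) / length(x)
}

# Apply the function within each group defined by toy$group
group.means <- tapply(toy$score, toy$group, mean.nomiss)

# Label each observation by the sign of its score; a missing score stays NA
toy$sign <- ifelse(toy$score > 0, "Positive", "Non-positive")
sign.counts <- table(toy$sign)
```

Note that table() drops missing values by default, so the observation with a missing score is not counted in sign.counts.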

Statistical Programming Camp: An Introduction to R

Statistical Programming Camp: An Introduction to R Statistical Programming Camp: An Introduction to R Handout 5: Loops and Conditional Statements Fox Chapter 2, 8 In this handout, we cover the following new materials: Using loops for(i in X){ to repeat

More information

Statistical Software Camp: Introduction to R

Statistical Software Camp: Introduction to R Statistical Software Camp: Introduction to R Day 1 August 24, 2009 1 Introduction 1.1 Why Use R? ˆ Widely-used (ever-increasingly so in political science) ˆ Free ˆ Power and flexibility ˆ Graphical capabilities

More information

POL 345: Quantitative Analysis and Politics

POL 345: Quantitative Analysis and Politics POL 345: Quantitative Analysis and Politics Precept Handout 4 Week 5 (Verzani Chapter 6: 6.2) Remember to complete the entire handout and submit the precept questions to the Blackboard DropBox 24 hours

More information

Introduction to R for Epidemiologists

Introduction to R for Epidemiologists Introduction to R for Epidemiologists Jenna Krall, PhD Thursday, January 29, 2015 Final project Epidemiological analysis of real data Must include: Summary statistics T-tests or chi-squared tests Regression

More information

Univariate Data - 2. Numeric Summaries

Univariate Data - 2. Numeric Summaries Univariate Data - 2. Numeric Summaries Young W. Lim 2018-08-01 Mon Young W. Lim Univariate Data - 2. Numeric Summaries 2018-08-01 Mon 1 / 36 Outline 1 Univariate Data Based on Numerical Summaries R Numeric

More information

Univariate Data - 2. Numeric Summaries

Univariate Data - 2. Numeric Summaries Univariate Data - 2. Numeric Summaries Young W. Lim 2018-02-05 Mon Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 1 / 31 Outline 1 Univariate Data Based on Numerical Summaries Young

More information

AA BB CC DD EE. Introduction to Graphics in R

AA BB CC DD EE. Introduction to Graphics in R Introduction to Graphics in R Cori Mar 7/10/18 ### Reading in the data dat

More information

Statistics 251: Statistical Methods

Statistics 251: Statistical Methods Statistics 251: Statistical Methods Summaries and Graphs in R Module R1 2018 file:///u:/documents/classes/lectures/251301/renae/markdown/master%20versions/summary_graphs.html#1 1/14 Summary Statistics

More information

Lab 1 Introduction to R

Lab 1 Introduction to R Lab 1 Introduction to R Date: August 23, 2011 Assignment and Report Due Date: August 30, 2011 Goal: The purpose of this lab is to get R running on your machines and to get you familiar with the basics

More information

DSCI 325: Handout 18 Introduction to Graphics in R

DSCI 325: Handout 18 Introduction to Graphics in R DSCI 325: Handout 18 Introduction to Graphics in R Spring 2016 This handout will provide an introduction to creating graphics in R. One big advantage that R has over SAS (and over several other statistical

More information

Statistical Programming with R

Statistical Programming with R Statistical Programming with R Lecture 9: Basic graphics in R Part 2 Bisher M. Iqelan biqelan@iugaza.edu.ps Department of Mathematics, Faculty of Science, The Islamic University of Gaza 2017-2018, Semester

More information

POL 345: Quantitative Analysis and Politics

POL 345: Quantitative Analysis and Politics POL 345: Quantitative Analysis and Politics Precept Handout 1 Week 2 (Verzani Chapter 1: Sections 1.2.4 1.4.31) Remember to complete the entire handout and submit the precept questions to the Blackboard

More information

Practical 2: Plotting

Practical 2: Plotting Practical 2: Plotting Complete this sheet as you work through it. If you run into problems, then ask for help - don t skip sections! Open Rstudio and store any files you download or create in a directory

More information

2.1: Frequency Distributions and Their Graphs

2.1: Frequency Distributions and Their Graphs 2.1: Frequency Distributions and Their Graphs Frequency Distribution - way to display data that has many entries - table that shows classes or intervals of data entries and the number of entries in each

More information

Basics of Plotting Data

Basics of Plotting Data Basics of Plotting Data Luke Chang Last Revised July 16, 2010 One of the strengths of R over other statistical analysis packages is its ability to easily render high quality graphs. R uses vector based

More information

Chapter 6: DESCRIPTIVE STATISTICS

Chapter 6: DESCRIPTIVE STATISTICS Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling

More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /6/ /13

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /6/ /13 BIO5312 Biostatistics R Session 02: Graph Plots in R Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 9/6/2016 1 /13 Graphic Methods Graphic methods of displaying data give a quick

More information

Graphics in R STAT 133. Gaston Sanchez. Department of Statistics, UC Berkeley

Graphics in R STAT 133. Gaston Sanchez. Department of Statistics, UC Berkeley Graphics in R STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 Base Graphics 2 Graphics in R Traditional

More information

INTRODUCTION TO R. Basic Graphics

INTRODUCTION TO R. Basic Graphics INTRODUCTION TO R Basic Graphics Graphics in R Create plots with code Replication and modification easy Reproducibility! graphics package ggplot2, ggvis, lattice graphics package Many functions plot()

More information

Bar Graphs and Dot Plots

Bar Graphs and Dot Plots CONDENSED LESSON 1.1 Bar Graphs and Dot Plots In this lesson you will interpret and create a variety of graphs find some summary values for a data set draw conclusions about a data set based on graphs

More information

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. 1 CHAPTER 1 Introduction Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. Variable: Any characteristic of a person or thing that can be expressed

More information

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA This lab will assist you in learning how to summarize and display categorical and quantitative data in StatCrunch. In particular, you will learn how to

More information

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 1 2 Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 2. How to construct (in your head!) and interpret confidence intervals.

More information

Plotting: An Iterative Process

Plotting: An Iterative Process Plotting: An Iterative Process Plotting is an iterative process. First we find a way to represent the data that focusses on the important aspects of the data. What is considered an important aspect may

More information

WHOLE NUMBER AND DECIMAL OPERATIONS

WHOLE NUMBER AND DECIMAL OPERATIONS WHOLE NUMBER AND DECIMAL OPERATIONS Whole Number Place Value : 5,854,902 = Ten thousands thousands millions Hundred thousands Ten thousands Adding & Subtracting Decimals : Line up the decimals vertically.

More information

TMTH 3360 NOTES ON COMMON GRAPHS AND CHARTS

TMTH 3360 NOTES ON COMMON GRAPHS AND CHARTS To Describe Data, consider: Symmetry Skewness TMTH 3360 NOTES ON COMMON GRAPHS AND CHARTS Unimodal or bimodal or uniform Extreme values Range of Values and mid-range Most frequently occurring values In

More information

LAB #1: DESCRIPTIVE STATISTICS WITH R

LAB #1: DESCRIPTIVE STATISTICS WITH R NAVAL POSTGRADUATE SCHOOL LAB #1: DESCRIPTIVE STATISTICS WITH R Statistics (OA3102) Lab #1: Descriptive Statistics with R Goal: Introduce students to various R commands for descriptive statistics. Lab

More information

Data Visualization. Andrew Jaffe Instructor

Data Visualization. Andrew Jaffe Instructor Module 9 Data Visualization Andrew Jaffe Instructor Basic Plots We covered some basic plots previously, but we are going to expand the ability to customize these basic graphics first. 2/45 Read in Data

More information

CHAPTER 2: SAMPLING AND DATA

CHAPTER 2: SAMPLING AND DATA CHAPTER 2: SAMPLING AND DATA This presentation is based on material and graphs from Open Stax and is copyrighted by Open Stax and Georgia Highlands College. OUTLINE 2.1 Stem-and-Leaf Graphs (Stemplots),

More information

Advanced Econometric Methods EMET3011/8014

Advanced Econometric Methods EMET3011/8014 Advanced Econometric Methods EMET3011/8014 Lecture 2 John Stachurski Semester 1, 2011 Announcements Missed first lecture? See www.johnstachurski.net/emet Weekly download of course notes First computer

More information

Bar Charts and Frequency Distributions

Bar Charts and Frequency Distributions Bar Charts and Frequency Distributions Use to display the distribution of categorical (nominal or ordinal) variables. For the continuous (numeric) variables, see the page Histograms, Descriptive Stats

More information

MATH11400 Statistics Homepage

MATH11400 Statistics Homepage MATH11400 Statistics 1 2010 11 Homepage http://www.stats.bris.ac.uk/%7emapjg/teach/stats1/ 1.1 A Framework for Statistical Problems Many statistical problems can be described by a simple framework in which

More information

Plotting Complex Figures Using R. Simon Andrews v
