Statistical Programming Camp: An Introduction to R
- Ross Lloyd
- 5 years ago
Handout 3: Data Manipulation and Summarizing Univariate Data
Fox Chapters 1-3, 7-8

In this handout, we cover the following new materials:

- Using logical operators: <, <=, >, >=, ==, !=, &, |, and is.na()
- Subsetting data with [] and subset() using logical expressions
- Using ifelse() for conditional statements
- More functions for summary statistics: var() (variance), sd() (standard deviation), weighted.mean(), quantile(X, P), and IQR() (interquartile range)
- Applying functions by indexes using tapply()
- Using function() to create user-defined functions
- Common arguments for graphs: main (main title), xlab and ylab (axis labels), xlim and ylim (axis limits), pch (point symbol), lty (line type), lwd (line width), col (color), and cex (sizing)
- Adding features to graphs with lines() and abline() (lines), points() (points), text() (text), and arrows() (arrows)
- Using identify() to identify points on graphs
- Using \n to break lines
- Using par(mfrow = c(X, Y)) at the beginning of graphical commands to produce an X by Y array of figures in one graphical window
- Using hist() to generate histograms
- Calculating a smooth density via density()
- Adding a legend to an existing graph with legend()
- Printing and saving graphs

We will cover the following Statistical Programming Camp Coding Rule:

- Curly Brackets
1 Logical Operators and Values

- Logical operators (<, <=, >, >=, ==, and !=) allow for data manipulation and subsetting by determining whether a specified condition is TRUE or FALSE. Both values must be written in upper case and are special values in R, just like NA. The operators follow standard usage. For instance, <= evaluates whether a number is less than or equal to a specified value, and != means "not equal". The output of a logical statement is of the class logical.

    > "Hello" == "hello"
    [1] FALSE
    > y <- 3 < 4
    > y
    [1] TRUE
    > class(y)
    [1] "logical"

- Logical operators may be applied to individual data entries or to entire vectors (or even a data frame!). When applied to a vector, logical operators evaluate each element of the vector.

    > x <- c(3, 2, 1, -2, -1)
    > x != 1
    [1]  TRUE  TRUE FALSE  TRUE  TRUE

- Combine logical statements and operations with & (and) and | (or).

    > x > 2 | x <= -1
    [1]  TRUE FALSE FALSE  TRUE  TRUE
    > x > 0 & x <= 2
    [1] FALSE  TRUE  TRUE FALSE FALSE

- Combining logical operators with other commands allows us to perform operations only on elements that meet the logical condition. For instance, we can count the number of TRUE statements using sum().

    > sum(x > 0 & x <= 2) # Adds up the number of TRUE statements
    [1] 2

- The command is.na() identifies missing data by returning a logical vector. Many functions also accept the argument na.rm = TRUE to remove missing data before computing, as in mean(x, na.rm = TRUE).
    > x <- c(x, NA)
    > is.na(x) # identifies missing data by returning a logical vector
    [1] FALSE FALSE FALSE FALSE FALSE  TRUE
    > mean(x) # cannot compute the mean due to missing data
    [1] NA
    > mean(x[!is.na(x)]) # calculates the mean for only non-missing data
    [1] 0.6

2 Subsetting with Logical Expressions

For the remainder of this handout, we will use the following data, a collection of county-level data used by D. Matthews and J. Prothro in Negroes and the New Southern Politics. We can answer some interesting questions using this data set. Does the state-wide mean value of black voter registration depend on the existence of a poll tax? What about a literacy requirement? Finally, what do you find when considering the four combinations of these two? The variables of the data are:

    Variable      Description
    state         state name
    county        county code
    polltax       the existence of polltax (1 = Yes, 0 = No)
    litreq        the existence of literacy requirement (1 = Yes, 0 = No)
    blackpop      1960 % black of state population (100s)
    pblackreg     % black voting age population registered in 1964 (black registration rate)
    fedex66       federal examiner present in county in 1966
    pincreasereg  % increase in black registration rate from 1964 to 1968

- Previously, we learned that vectors and data frames can be subsetted by using brackets ([ ]). For example, a subset of a data frame can be obtained by specifying row numbers (or row names) and column numbers (or column names) in brackets. Logical expressions can also be used within brackets for subsetting.

    > reg <- read.table("registration.txt", header = TRUE)
    > ## black registration is lower where polltax is present
    > mean(reg$pblackreg[reg$polltax == 1])
    [1]
    > mean(reg$pblackreg[reg$polltax == 0])
    [1]
    > ## black registration is lower where literacy requirement is imposed
    > mean(reg$pblackreg[reg$litreq == 1])
    [1]
    > mean(reg$pblackreg[reg$litreq == 0])
    [1]
    > ## black registration is lowest where both requirements are present
    > mean(reg$pblackreg[(reg$polltax == 1) & (reg$litreq == 1)])
    [1]
    > ## no observations returns NaN
    > mean(reg$pblackreg[(reg$polltax == 1) & (reg$litreq == 0)])
    [1] NaN
    > mean(reg$pblackreg[(reg$polltax == 0) & (reg$litreq == 1)])
    [1]
    > mean(reg$pblackreg[(reg$polltax == 0) & (reg$litreq == 0)])
    [1]

- In addition to [ ], subset() may be used to subset data. It takes a vector or data frame as its first argument. Users can then specify subset and/or select as arguments. The former should be a logical vector indicating the elements or rows to keep, while the latter should specify the variables to keep (either by a vector of variable names or by a numeric vector indicating column numbers).

    > ## counties with a higher than average black population but lower than
    > ## average registration rate
    > lowreg <- subset(reg, subset = ((reg$blackpop >= mean(reg$blackpop))
    +                  & (reg$pblackreg <= mean(reg$pblackreg))),
    +                  select = c("blackpop", "pblackreg", "polltax", "litreq"))
    > ## How many impose both polltax and literacy requirement
    > nrow(lowreg[(lowreg$polltax == 1) & (lowreg$litreq == 1), ])
    [1] 34
    > ## Another way
    > sum((lowreg$polltax == 1) & (lowreg$litreq == 1))
    [1] 34
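The two subsetting styles above can be tried without registration.txt. The following is a minimal sketch on a small made-up data frame: the name toy and all of its values are hypothetical and chosen only for illustration, so the results say nothing about the registration data.

```r
## Hypothetical toy data; every value is invented for illustration only
toy <- data.frame(polltax   = c(1, 1, 0, 0, 1),
                  litreq    = c(1, 1, 1, 0, 1),
                  pblackreg = c(10, 20, 30, 40, NA))

## Bracket subsetting with a logical vector
both <- (toy$polltax == 1) & (toy$litreq == 1)
mean(toy$pblackreg[both])                # NA, because one selected value is missing
mean(toy$pblackreg[both], na.rm = TRUE)  # mean of 10 and 20

## A condition that no row satisfies gives an empty vector, whose mean is NaN
mean(toy$pblackreg[(toy$polltax == 1) & (toy$litreq == 0)])

## subset() looks up column names inside the data frame, so the toy$ prefix
## can be dropped; rows where the condition evaluates to NA are not kept
kept <- subset(toy, subset = polltax == 1 & pblackreg >= 15,
               select = c("polltax", "pblackreg"))
nrow(kept)
```

One difference worth noting: a logical index containing NA inside [ ] returns an NA row, whereas subset() silently drops such rows, so the two approaches can disagree when the condition involves missing values.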
3 Using Conditional Statements via ifelse()

Conditional statements evaluate a logical statement, then perform different actions depending on whether the statement is true or false. The function ifelse(X, Y, Z) performs the action Y and returns its result if the statement X is true, and performs Z and returns its result if X is false.

    > ## Creating a new variable indicating counties with higher than average
    > ## black population and polltax
    > reg$highpoptax <- ifelse((reg$blackpop >= mean(reg$blackpop) & reg$polltax == 1),
    +                          "Yes", "No")
    > ## a more complex example creating region variable
    > reg$region <- ifelse(reg$state == "Alabama" | reg$state == "Georgia" |
    +                      reg$state == "Louisiana" | reg$state == "Mississippi" |
    +                      reg$state == "South Carolina", "Deep South", "Peripheral South")
    > reg$region <- as.factor(reg$region)
    > table(reg$region)
          Deep South Peripheral South

4 More Functions for Summarizing Data

In addition to the functions we learned last week (i.e., mean(), median(), min(), max(), and range()), the following new functions are useful for summarizing data.

- var() (variance) and sd() (standard deviation) summarize numeric data.

    > ## two ways of calculating standard deviation
    > sd(reg$pblackreg)
    [1]
    > sqrt(var(reg$pblackreg))
    [1]

- A weighted mean can be computed using weighted.mean(X, Y), where the output is the mean of X weighted by Y.

    > ## overall registration rate should be weighted by county population
    > weighted.mean(reg$pblackreg, reg$blackpop)
    [1]

- The function quantile(X, P) provides the sample quantiles of a numeric vector X for each element of another numeric vector P.

    > quantile(reg$pblackreg) # the default is quartiles plus min and max
      0%  25%  50%  75% 100%
    > quantile(reg$pblackreg, seq(from = 0.2, to = 0.8, by = 0.2)) # quintiles
     20%  40%  60%  80%

- The function IQR() returns the interquartile range.

    > IQR(reg$pblackreg)
    [1]

5 Applying Functions by Indexes

In many situations, we want to apply the same function repeatedly to different parts of the data. For example, in the black registration data, we may want to compute the registration rate within each state. Doing this manually is a pain, especially if the number of states is large: you have to subset the data for one state, use mean() to compute the registration rate for that state, and repeat this for each state. The function tapply() (t is shorthand for table) lets you do this computation in one line. Specifically, tapply(X, INDEX, FUN) applies the function FUN to X for each of the groups defined by a vector INDEX. Replace FUN with mean, median, sd, etc. to generate the desired quantity.

    > ## Calculate the mean of % black registration rates by state
    > tapply(reg$pblackreg, reg$state, mean)
    Alabama Florida Georgia Louisiana Mississippi North Carolina South Carolina

6 Writing Functions

One of the greatest benefits of R is the flexibility it gives users to write their own functions. The syntax takes the form name <- function(bar1, bar2, ...){ }, where name is the function name, (bar1, bar2, ...) are the inputs, and the commands within the curly braces { } define the function. We begin with a simple example, creating a function to compute the mean of a vector with missing data. Note that an opening curly brace should never go on its own line, and a closing curly brace should always go on its own line. Additionally, code within the braces should be aligned according to the text editor's automatic alignment.

    > x <- c(10:22, NA, 1:7, NA, 5)
    > mean(x) # cannot compute mean due to missing data
    [1] NA
    > my.mean <- function(x){
    +     x <- x[!is.na(x)] # removes missing data
    +     sum <- sum(x)
    +     length <- length(x)
    +     mean <- sum/length
    +     out <- c(sum, length, mean) # define the output
    +     names(out) <- c("sum", "length", "mean")
    +     out # end function by calling output
    + }
    > my.mean(x)
          sum    length      mean
    241.00000  21.00000  11.47619

Programming Camp Coding Rule: Curly Brackets

An opening curly brace should never go on its own line. A closing curly brace should always go on its own line. Code within the braces should be properly aligned.

GOOD Code:

    name <- function(bar1, bar2, ...){
        command1 <- code1
        command2 <- code2
    }

BAD Code:

    name <- function(bar1, bar2, ...)
    {command1 <- code1
    command2 <- code2}

7 Graphs for Univariate Data: Histograms

Graphs are critical tools for summarizing data in a straightforward and easy-to-understand manner. Great graphics strengthen projects and reports by illustrating central features of the data without much additional explanation. Bad graphics are inefficient (leaving out critical information such as labels), potentially misleading, or too complicated.
- There are several common graphing arguments that specify basic features of the graph, including the number of figures included on a graph, titles, axis labels, data range, etc. The following table summarizes these arguments:

    main        Main title of the graph.
    xlab, ylab  Labels for the x-axis and y-axis.
    xlim, ylim  Limits of the x-axis and y-axis, as in xlim = c(0, 10) for the interval [0, 10].
    col         Color to use, e.g., "blue" or "red".
    cex         Size of plotted text or symbols.
    cex.axis    Size of axis annotation.
    cex.main    Size of plot title.

- The second class of graphing commands adds features to an existing graph. These functions include points() for adding points, lines() for lines, and text() for text.

    lines()   Adds a plot-line to the figure
              e.g. lines(x, y) where x and y define coordinates
    abline()  Adds a straight line
              e.g. abline(h = x) to place a horizontal line at height x
              e.g. abline(v = x) to place a vertical line at point x
              e.g. abline(a = x, b = y) to place a line with intercept x and slope y
    points()  Adds points
              e.g. points(x, y) to place dots with x and y as the coordinates
              e.g. points(x, y, type = "l") connects the dots as a line
    text()    Adds text
              e.g. text(x, y, z) to display z as text centered at coordinates (x, y)
    arrows()  Adds arrows
              e.g. arrows(x0, y0, x1, y1, length, angle, code) to draw an arrow from
              (x0, y0) to (x1, y1), with arrowhead edges of the specified length and
              angle, the arrow type specified by code, and the color specified by col

- The function identify() allows us to click on points in a graph; R labels the clicked points and returns their index numbers. When done, press Esc.
- The character sequence \n forces a line break in text. This is convenient for long plot titles.
- The command par(mfrow = c(X, Y)) will produce an X by Y array of figures in one graphical window.
  mfrow means the graphs will be filled by row, whereas mfcol means they will be filled by column.
- The function hist() will produce a histogram to summarize the distribution of data. Setting freq = FALSE within hist() plots densities (so that the total area of the bars equals one) rather than frequencies (counts). If you specify a single number as the argument breaks, it sets the approximate number of equally spaced bins. If you give a numeric vector instead, it specifies the breakpoints between histogram cells.
- The function density() calculates the smooth density of a numeric object. Its output can in turn be passed to plot() to draw the smooth histogram (use lines() to add it to an existing graph).
- To add a legend to an existing graph, use legend(). The syntax legend(x, y, z) adds a legend with text z at coordinates (x, y); the coordinates can also be replaced with a keyword such as "topleft", "bottomright", etc.

    > ## begin by subsetting the data
    > examiner <- reg[reg$fedex66 == 1, ]
    > noexaminer <- reg[reg$fedex66 == 0, ]
    > ## side by side histograms of registration rates
    > par(mfrow = c(1, 2))
    > hist(examiner$pblackreg, freq = FALSE, breaks = 10, xlim = c(0, 100),
    +      main = "Federal Examiner Present",
    +      xlab = "Registration Rates")
    > hist(noexaminer$pblackreg, freq = FALSE, breaks = 10, xlim = c(0, 100),
    +      main = "No Federal Examiner Present", cex.main = 0.995, ## smaller plot title
    +      xlab = "Registration Rates")

[Figure: side-by-side density histograms titled "Federal Examiner Present" and "No Federal Examiner Present"; x-axis: Registration Rates; y-axis: Density]

    > ## return to single graph
    > par(mfrow = c(1, 1))
    > ## histogram for counties with examiner
    > hist(examiner$pblackreg, freq = FALSE, breaks = 10, xlim = c(0, 100),
    +      main = "Registration Rates \n Federal Examiner Present", cex.main = 1.5,
    +      xlab = "Registration Rates", cex.axis = 1.5)
    > ## add counties with no-examiner as smooth density
    > lines(density(noexaminer$pblackreg))
    > ## add lines to compare median of counties with/without examiner
    > abline(v = median(examiner$pblackreg), col = "red", lty = 2)
    > abline(v = median(noexaminer$pblackreg), col = "blue", lty = 2)
    > ## add legend
    > legend("topright", c("Examiner Median", "No Examiner Median"),
    +        lty = c(2, 2), col = c("red", "blue"))
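The same overlay pattern (density histogram, smooth density line, dashed median lines, legend) can be tried without the registration data. The sketch below uses simulated values; the names group.a and group.b and the distribution parameters are assumptions made purely for illustration.

```r
## Simulated stand-in data (hypothetical; not the registration data)
set.seed(1)
group.a <- rnorm(200, mean = 40, sd = 10)
group.b <- rnorm(200, mean = 55, sd = 12)

## Density histogram of one group, with a two-line title via \n
h <- hist(group.a, freq = FALSE, breaks = 10, xlim = c(0, 100),
          main = "Simulated Rates \n Group A", xlab = "Rate")

## Smooth density of the other group, added to the existing plot
d <- density(group.b)
lines(d)

## Dashed vertical lines at each group's median
abline(v = median(group.a), col = "red", lty = 2)
abline(v = median(group.b), col = "blue", lty = 2)

## Legend entries are matched to lty and col by position
legend("topright", c("Group A Median", "Group B Median"),
       lty = c(2, 2), col = c("red", "blue"))

## With freq = FALSE the bar areas (height times width) sum to one
sum(h$density * diff(h$breaks))
```

Storing the result of hist() in h is optional; it is done here only so that the bar heights and breakpoints can be inspected after plotting.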
[Figure: density histogram titled "Registration Rates / Federal Examiner Present" from the registration-data code, with a smooth density line for counties without an examiner and a legend marking "Examiner Median" and "No Examiner Median"; x-axis: Registration Rates; y-axis: Density]

8 Printing and Saving Graphs

There are a few ways to print and save the graphs you create in R.

- In the window of your graph (if you are a Mac user, make sure your graphic window rather than the R console is selected), you can click File: Save as: PDF... or File: Print...
- You can also right-click on a figure in R and copy the image (if you are a Mac user, you need to highlight the graph and type Apple+C to copy it). Then paste that image into Microsoft Word or any other document.
- You can also do it via a command by using pdf() before your plotting commands and then dev.off() afterwards.

    > pdf(file = "myplot.pdf", height = 3, width = 5) # height and width are in inches
    > dev.off() ## This creates a pdf file in the working directory

9 Practice Questions

9.1 Supreme Court Justice Ideal Points

In a 2002 article, Andrew Martin and Kevin Quinn explored the extent to which the ideal points (i.e., policy preferences) of Supreme Court Justices change throughout their tenure on the Court. The data set contains the following:

- term: Supreme Court Term
- justice: Justice's Last Name
- idealpt: Justice's Estimated Ideal Point, where negative values indicate liberal leanings and positive values indicate conservative leanings
- pparty: President's Political Party

1. Using the tapply() function, create a variable for the median ideal point of court justices for each term of the court.
2. Generate a new variable in the justices data set to indicate whether each justice falls on the Conservative or Liberal end of the ideal point spectrum. Using ifelse(), generate a new variable that takes a value of Liberal if the justice's ideal point is less than 0 and a value of Conservative if the justice's ideal point is greater than 0. Using table(), determine how many justices in the data set were Conservative and how many were Liberal.
3. Create a histogram of justices' ideal points. Using tapply(), calculate each justice's median ideal point. Generate a histogram of the justices' ideal points. Be sure to add an informative title and labels. Create a red, vertical dashed line indicating the median. Additionally, add the density line to the plot. Save the graph you created as a pdf file using the file name xxx.pdf where xxx is your netid. Submit it to Blackboard along with your R script file xxx.r (do not turn in your R console printout).

9.2 The Impact of Increases in the Minimum Wage

Many economists believe that increasing the minimum wage actually hurts the poor, the very part of the population such a policy is supposed to help. The reasoning is that if employers have to pay higher wages, they will simply hire fewer people. This means that those who are earning the minimum wage may lose their jobs as a result of the increase. Two researchers, David Card and Alan Krueger, tested this argument using data from the fast food industry in New Jersey and Pennsylvania. We analyze their original data in this precept.
The njmin.txt data file, available at Blackboard, contains the following variables:

    Variable    Description
    chain       fast food chain
    location    store location (southj, centralj, northj, shorej & PA)
    wagebefore  Starting wage measured before the increase
    wageafter   Starting wage measured after the increase
    fullbefore  number of full-time employees before the increase
    fullafter   number of full-time employees after the increase
    partbefore  number of part-time employees before the increase
    partafter   number of part-time employees after the increase

1. Load the data into R.
2. Create a factor variable called state, which takes two values, NJ and PA. How many stores in NJ and PA does the study sample contain, respectively? Which chain has the largest number of restaurants in NJ and PA, respectively, in this study sample?
3. Create four histograms in one graph using the starting wage data; starting from the upper left corner in a clockwise manner, NJ before the increase, NJ after the increase, PA after the increase, and PA before the increase. Add informative labels to each graph. Are the starting wages similar between NJ and PA before the increase? What about after the increase? Within each state, does the histogram look similar before and after the increase?
4. Compute the average number of full-time employees in NJ separately before and after the increase. Do the same for PA. What do these numbers tell you about the impact of the increase in minimum wage? Are these average differences large compared to the standard deviation of full-time employees before the change in each state?
5. Calculate the difference in the number of full-time employees between before and after the increase within each state. Summarize the data using two smoothed histograms in one plot (red solid line for NJ, and blue solid line for PA), with dashed lines representing the mean difference of each state. Finally, calculate the difference in differences between the two states. (If you are curious, go ahead and conduct the same calculation for part-time employment and see if similar results are obtained.)
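Before starting the practice questions, it may help to see ifelse(), function(), and tapply() working together on a tiny example. The following is a minimal sketch with invented values; score, group, side, and my.median are hypothetical names, and the data are unrelated to either practice data set.

```r
## Hypothetical scores for three made-up groups (illustration only)
score <- c(-1.2, 0.5, 2.0, NA, -0.3, 1.1, 0.8)
group <- c("a", "a", "b", "b", "c", "c", "c")

## ifelse() labels each element; a missing score stays NA
side <- ifelse(score < 0, "Negative", "Positive")
table(side)  # table() leaves NA out of the counts by default

## A user-defined function written according to the curly bracket rule
my.median <- function(x){
    x <- x[!is.na(x)]  # drop missing values
    median(x)          # the last evaluated expression is returned
}

## tapply() applies the function within each group
tapply(score, group, my.median)
```

Because my.median() removes missing values itself, group "b" still gets a median even though one of its scores is NA; passing median directly would return NA for that group unless na.rm = TRUE were supplied.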
Part I, Chapters 4 & 5 Data Tables and Data Analysis Statistics and Figures Descriptive Statistics 1 Are data points clumped? (order variable / exp. variable) Concentrated around one value? Concentrated
More informationMATH NATION SECTION 9 H.M.H. RESOURCES
MATH NATION SECTION 9 H.M.H. RESOURCES SPECIAL NOTE: These resources were assembled to assist in student readiness for their upcoming Algebra 1 EOC. Although these resources have been compiled for your
More information6th Grade Vocabulary Mathematics Unit 2
6 th GRADE UNIT 2 6th Grade Vocabulary Mathematics Unit 2 VOCABULARY area triangle right triangle equilateral triangle isosceles triangle scalene triangle quadrilaterals polygons irregular polygons rectangles
More informationPackage EnQuireR. R topics documented: February 19, Type Package Title A package dedicated to questionnaires Version 0.
Type Package Title A package dedicated to questionnaires Version 0.10 Date 2009-06-10 Package EnQuireR February 19, 2015 Author Fournier Gwenaelle, Cadoret Marine, Fournier Olivier, Le Poder Francois,
More informationIntroduction to R. UCLA Statistical Consulting Center R Bootcamp. Irina Kukuyeva September 20, 2010
UCLA Statistical Consulting Center R Bootcamp Irina Kukuyeva ikukuyeva@stat.ucla.edu September 20, 2010 Outline 1 Introduction 2 Preliminaries 3 Working with Vectors and Matrices 4 Data Sets in R 5 Overview
More informationR syntax guide. Richard Gonzalez Psychology 613. August 27, 2015
R syntax guide Richard Gonzalez Psychology 613 August 27, 2015 This handout will help you get started with R syntax. There are obviously many details that I cannot cover in these short notes but these
More informationMatrix algebra. Basics
Matrix.1 Matrix algebra Matrix algebra is very prevalently used in Statistics because it provides representations of models and computations in a much simpler manner than without its use. The purpose of
More informationChapter 3 - Displaying and Summarizing Quantitative Data
Chapter 3 - Displaying and Summarizing Quantitative Data 3.1 Graphs for Quantitative Data (LABEL GRAPHS) August 25, 2014 Histogram (p. 44) - Graph that uses bars to represent different frequencies or relative
More informationWeek 4: Describing data and estimation
Week 4: Describing data and estimation Goals Investigate sampling error; see that larger samples have less sampling error. Visualize confidence intervals. Calculate basic summary statistics using R. Calculate
More informationTable of Contents (As covered from textbook)
Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression
More informationFathom Dynamic Data TM Version 2 Specifications
Data Sources Fathom Dynamic Data TM Version 2 Specifications Use data from one of the many sample documents that come with Fathom. Enter your own data by typing into a case table. Paste data from other
More information10.4 Measures of Central Tendency and Variation
10.4 Measures of Central Tendency and Variation Mode-->The number that occurs most frequently; there can be more than one mode ; if each number appears equally often, then there is no mode at all. (mode
More information10.4 Measures of Central Tendency and Variation
10.4 Measures of Central Tendency and Variation Mode-->The number that occurs most frequently; there can be more than one mode ; if each number appears equally often, then there is no mode at all. (mode
More informationOrganizing and Summarizing Data
1 Organizing and Summarizing Data Key Definitions Frequency Distribution: This lists each category of data and how often they occur. : The percent of observations within the one of the categories. This
More informationAz R adatelemzési nyelv
Az R adatelemzési nyelv alapjai II. Egészségügyi informatika és biostatisztika Gézsi András gezsi@mit.bme.hu Functions Functions Functions do things with data Input : function arguments (0,1,2, ) Output
More informationPractice for Learning R and Learning Latex
Practice for Learning R and Learning Latex Jennifer Pan August, 2011 Latex Environments A) Try to create the following equations: 1. 5+6 α = β2 2. P r( 1.96 Z 1.96) = 0.95 ( ) ( ) sy 1 r 2 3. ˆβx = r xy
More informationPyPlot. The plotting library must be imported, and we will assume in these examples an import statement similar to those for numpy and math as
Geog 271 Geographic Data Analysis Fall 2017 PyPlot Graphicscanbeproducedin Pythonviaavarietyofpackages. We willuseapythonplotting package that is part of MatPlotLib, for which documentation can be found
More information3. Data Analysis and Statistics
3. Data Analysis and Statistics 3.1 Visual Analysis of Data 3.2.1 Basic Statistics Examples 3.2.2 Basic Statistical Theory 3.3 Normal Distributions 3.4 Bivariate Data 3.1 Visual Analysis of Data Visual
More informationStat 290: Lab 2. Introduction to R/S-Plus
Stat 290: Lab 2 Introduction to R/S-Plus Lab Objectives 1. To introduce basic R/S commands 2. Exploratory Data Tools Assignment Work through the example on your own and fill in numerical answers and graphs.
More informationCommon Sta 101 Commands for R. 1 One quantitative variable. 2 One categorical variable. 3 Two categorical variables. Summary statistics
Common Sta 101 Commands for R 1 One quantitative variable summary(x) # most summary statitstics at once mean(x) median(x) sd(x) hist(x) boxplot(x) # horizontal = TRUE for horizontal plot qqnorm(x) qqline(x)
More informationRegression III: Advanced Methods
Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions
More informationThe Average and SD in R
The Average and SD in R The Basics: mean() and sd() Calculating an average and standard deviation in R is straightforward. The mean() function calculates the average and the sd() function calculates the
More informationIntroduction to R 21/11/2016
Introduction to R 21/11/2016 C3BI Vincent Guillemot & Anne Biton R: presentation and installation Where? https://cran.r-project.org/ How to install and use it? Follow the steps: you don t need advanced
More informationSPSS TRAINING SPSS VIEWS
SPSS TRAINING SPSS VIEWS Dataset Data file Data View o Full data set, structured same as excel (variable = column name, row = record) Variable View o Provides details for each variable (column in Data
More informationThe nor1mix Package. August 3, 2006
The nor1mix Package August 3, 2006 Title Normal (1-d) Mixture Models (S3 Classes and Methods) Version 1.0-6 Date 2006-08-02 Author: Martin Mächler Maintainer Martin Maechler
More informationShrinkage of logarithmic fold changes
Shrinkage of logarithmic fold changes Michael Love August 9, 2014 1 Comparing the posterior distribution for two genes First, we run a DE analysis on the Bottomly et al. dataset, once with shrunken LFCs
More informationNo. of blue jelly beans No. of bags
Math 167 Ch5 Review 1 (c) Janice Epstein CHAPTER 5 EXPLORING DATA DISTRIBUTIONS A sample of jelly bean bags is chosen and the number of blue jelly beans in each bag is counted. The results are shown in
More informationSTA 570 Spring Lecture 5 Tuesday, Feb 1
STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row
More informationPreservation of protein-protein interaction networks Simple simulated example
Preservation of protein-protein interaction networks Simple simulated example Peter Langfelder and Steve Horvath May, 0 Contents Overview.a Setting up the R session............................................
More information8: Statistics. Populations and Samples. Histograms and Frequency Polygons. Page 1 of 10
8: Statistics Statistics: Method of collecting, organizing, analyzing, and interpreting data, as well as drawing conclusions based on the data. Methodology is divided into two main areas. Descriptive Statistics:
More informationTips and Guidance for Analyzing Data. Executive Summary
Tips and Guidance for Analyzing Data Executive Summary This document has information and suggestions about three things: 1) how to quickly do a preliminary analysis of time-series data; 2) key things to
More informationProb and Stats, Sep 4
Prob and Stats, Sep 4 Variations on the Frequency Histogram Book Sections: N/A Essential Questions: What are the methods for displaying data, and how can I build them? What are variations of the frequency
More informationSTAT:5400 Computing in Statistics
STAT:5400 Computing in Statistics Introduction to SAS Lecture 18 Oct 12, 2015 Kate Cowles 374 SH, 335-0727 kate-cowles@uiowaedu SAS SAS is the statistical software package most commonly used in business,
More informationCAMBRIDGE TECHNOLOGY IN MATHS Year 11 TI-89 User guide
Year 11 TI-89 User guide Page 1 of 17 CAMBRIDGE TECHNOLOGY IN MATHS Year 11 TI-89 User guide CONTENTS Getting started 2 Linear equations and graphs 3 Statistics 5 Sequences 11 Business and related mathematics
More informationPyPlot. The plotting library must be imported, and we will assume in these examples an import statement similar to those for numpy and math as
Geog 271 Geographic Data Analysis Fall 2015 PyPlot Graphicscanbeproducedin Pythonviaavarietyofpackages. We willuseapythonplotting package that is part of MatPlotLib, for which documentation can be found
More informationPrepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.
Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good
More informationThe nor1mix Package. June 12, 2007
The nor1mix Package June 12, 2007 Title Normal (1-d) Mixture Models (S3 Classes and Methods) Version 1.0-7 Date 2007-03-15 Author Martin Mächler Maintainer Martin Maechler
More information1. Start WinBUGS by double clicking on the WinBUGS icon (or double click on the file WinBUGS14.exe in the WinBUGS14 directory in C:\Program Files).
Hints on using WinBUGS 1 Running a model in WinBUGS 1. Start WinBUGS by double clicking on the WinBUGS icon (or double click on the file WinBUGS14.exe in the WinBUGS14 directory in C:\Program Files). 2.
More informationName Geometry Intro to Stats. Find the mean, median, and mode of the data set. 1. 1,6,3,9,6,8,4,4,4. Mean = Median = Mode = 2.
Name Geometry Intro to Stats Statistics are numerical values used to summarize and compare sets of data. Two important types of statistics are measures of central tendency and measures of dispersion. A
More informationAn Introduction to R Graphics
An Introduction to R Graphics PnP Group Seminar 25 th April 2012 Why use R for graphics? Fast data exploration Easy automation and reproducibility Create publication quality figures Customisation of almost
More informationChapter 5snow year.notebook March 15, 2018
Chapter 5: Statistical Reasoning Section 5.1 Exploring Data Measures of central tendency (Mean, Median and Mode) attempt to describe a set of data by identifying the central position within a set of data
More informationLecture 3 Questions that we should be able to answer by the end of this lecture:
Lecture 3 Questions that we should be able to answer by the end of this lecture: Which is the better exam score? 67 on an exam with mean 50 and SD 10 or 62 on an exam with mean 40 and SD 12 Is it fair
More information1. Descriptive Statistics
1.1 Descriptive statistics 1. Descriptive Statistics A Data management Before starting any statistics analysis with a graphics calculator, you need to enter the data. We will illustrate the process by
More informationSTAT 503 Fall Introduction to SAS
Getting Started Introduction to SAS 1) Download all of the files, sas programs (.sas) and data files (.dat) into one of your directories. I would suggest using your H: drive if you are using a computer
More informationNumerical Descriptive Measures
Chapter 3 Numerical Descriptive Measures 1 Numerical Descriptive Measures Chapter 3 Measures of Central Tendency and Measures of Dispersion A sample of 40 students at a university was randomly selected,
More informationChapter 5: The standard deviation as a ruler and the normal model p131
Chapter 5: The standard deviation as a ruler and the normal model p131 Which is the better exam score? 67 on an exam with mean 50 and SD 10 62 on an exam with mean 40 and SD 12? Is it fair to say: 67 is
More informationLecture 3 Questions that we should be able to answer by the end of this lecture:
Lecture 3 Questions that we should be able to answer by the end of this lecture: Which is the better exam score? 67 on an exam with mean 50 and SD 10 or 62 on an exam with mean 40 and SD 12 Is it fair
More information