ICPSR Training Program McMaster University Summer, The R Statistical Computing Environment: The Basics and Beyond


John Fox ICPSR Training Program McMaster University Summer, 2012

The R Statistical Computing Environment: The Basics and Beyond

The R statistical programming language and computing environment has become the de facto standard for writing statistical software among statisticians and has made substantial inroads in the social sciences. R is a free, open-source implementation of the S language, and is available for Windows, Mac OS X, and Unix/Linux systems. There is also a commercial implementation of S called S-PLUS, but it has been eclipsed by R. The basic R system is developed and maintained by the R Core group, comprising 20 members, many of them eminent in the field of statistical computing. The R Project for Statistical Computing is a project of the R Foundation, whose membership includes the R Core group and several other individuals.

A statistical package, such as SPSS or SAS, is primarily oriented toward combining instructions with rectangular case-by-variable datasets to produce (often voluminous) printouts. Such packages make routine data analysis relatively easy, but they make it relatively difficult to do things that are innovative or non-standard, or to add to the built-in capabilities of the package. In contrast, a good statistical computing environment also makes routine data analysis easy, but it additionally supports convenient programming; this means that users can extend the already impressive facilities of R.

Statisticians and others have taken advantage of the extensibility of R to contribute nearly 4000 freely available packages of documented R programs and data to CRAN (the Comprehensive R Archive Network) and many others to the Bioconductor package archive. As well, R is especially capable in the area of statistical graphics, reflecting the origin of S at Bell Labs, a centre of graphical innovation.
The first day of this workshop is meant to provide a basic overview of and introduction to R, including statistical modeling in R (in effect, using R as a statistical package). The following three days pick up where the basic lectures leave off, and are intended to provide the background required to use R seriously for data analysis and presentation, including an introduction to R programming and to the design of custom statistical graphs, unlocking the power in the R statistical programming environment.

Participants should bring their laptops to the workshop and should install R and RStudio in advance (see the instructions on the workshop website).

An outline of the workshop follows (with chapter references to Fox and Weisberg, An R Companion to Applied Regression, Second Edition):

Day 1. Getting started with R (Ch. 1); statistical models in R (Ch. 4, 5, & appendices)
Day 2. Data in R (Ch. 2); the basics of R programming (Ch. 8, Sec )

Day 3. R programming, beyond the basics (Ch. 8, Sec )
Day 4. R graphics (Ch. 7); building R packages

Course Web Site

Materials for the course will be deposited on the course web site, which also has active links to many of the resources described in this syllabus.

Acquiring R

More detailed instructions are on the workshop website.

Windows Users

You can download the R Windows installer from CRAN, or better from a CRAN mirror site near you; then double-click on the installer to install R as you would any Windows software. You can subsequently download and install only those packages that you want over the Internet from CRAN, via the Packages > Install packages from CRAN menu in the RGui console.

Mac Users

A universal binary for Mac OS X 10.5 and higher is available from CRAN, or better from a CRAN mirror site near you. Double-click on the downloaded file to install R. You can then download and install packages over the Internet via the Packages & Data > Package Installer menu in the R.app or R64.app console.

Linux/Unix Users

Precompiled binaries for popular Linux systems are available from CRAN (or better from a CRAN mirror site near you), or users can compile R from source. See CRAN for details.

RStudio

RStudio is a free, open-source integrated development environment (IDE) for R that installs easily on Windows, Mac OS X, and Linux systems and works well out of the box. Though still under active development, RStudio in my opinion provides a better interface to R than the standard Windows and Mac OS X interfaces. Among the many services that it provides, RStudio includes a package manager that will allow you to install packages conveniently.

Installing the car Package

For this course, you'll want to install the car package associated with the R Companion to Applied Regression; use the command install.packages("car") or install via the menus in the Windows or Mac OS X versions of R or via the Packages tab in RStudio.

Selected Bibliography

Publishers of statistical texts have been producing a steady stream of books on R. Of particular note is Springer's Use R! series of brief paperbacks on various R-related topics, several titles of which I've listed below. Recently, Chapman and Hall, which has published a number of books on R, has also announced The R Series.

Basic Texts

The principal source for this workshop is J. Fox and S. Weisberg, An R Companion to Applied Regression, Second Edition, Sage (2011). Additional materials are available on the web site for the book, including several appendices (on structural-equation models, mixed models, survival analysis, etc.). The book is associated with the car package for R. I am a member of the R Foundation.

Alternatively (or additionally), more advanced students may wish to use W. N. Venables and B. D. Ripley, Modern Applied Statistics with S as a principal source. Bill Venables is a member of the R Foundation, and Brian Ripley is a member of the R Core group.

Manuals

R is distributed with a set of manuals, which are also available at the CRAN web site. A manual for S-PLUS Trellis Graphics (also useful for the lattice package in R) is also available on the web.

Programming in R

R. A. Becker, J. M. Chambers, and A. R. Wilks, The New S Language: A Programming Environment for Data Analysis and Statistics. Pacific Grove, CA: Wadsworth. Defines S Version 2, which forms the basis of S Versions 3 and 4, as well as R. (Sometimes called the "Blue Book.")

J. M. Chambers, Programming with Data: A Guide to the S Language. New York: Springer. Describes the then-new features in S Version 4, including the newer formal object-oriented programming system (also incorporated in R), by the principal designer of the S language and a member of the R Core group of developers. Not an easy read. (The "Green Book.")

J. M. Chambers, Software for Data Analysis: Programming with R. New York: Springer. Chambers's newest book ranges quite widely, and emphasizes a deep understanding of the R language, along with object-oriented programming, and links between R and other software. Some topics are unusual, such as processing text data in R.

J. M. Chambers and T. J. Hastie, eds., Statistical Models in S. Pacific Grove, CA: Wadsworth. An edited volume describing the statistical modeling capabilities in S, Versions 3 and 4, and R, and the object-oriented programming system used in S Version 3 and R (and available, for backwards compatibility, in S Version 4). In addition, the text covers S software for particular kinds of statistical models, including linear models, nonlinear models, generalized linear models, local-polynomial regression models, and generalized additive models. (The "White Book.")

R. Gentleman, R Programming for Bioinformatics. Boca Raton: Chapman and Hall. A thorough, though at points relatively difficult, treatment of programming in R, by one of the original co-developers of R and a founder of the related Bioconductor Project (which develops computing tools for the analysis of genomic data). Don't let the title fool you: Most of the book is of general interest to R programmers.

R. Ihaka and R. Gentleman, R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5. The original published description of the R project, now quite out of date but still worth looking at.

W. N. Venables and B. D. Ripley, S Programming. New York: Springer. A companion volume to Modern Applied Statistics with S, and at the time of its publication the definitive treatment of writing software in the various versions of S-PLUS and R; now somewhat dated, particularly with respect to R. Brian Ripley is a member of the R Core group of developers, and Bill Venables is a member of the R Foundation.

Statistical Computing in R

The following three books treat traditional topics in statistical computing, such as optimization, simulation, probability calculations, and computational linear algebra, using R (although the coverage of particular topics in the books differs). All offer introductions to R programming. Of these books, Braun and Murdoch is the briefest and most accessible.

W. J. Braun and D. J. Murdoch, A First Course in Statistical Programming with R. Cambridge: Cambridge University Press. Duncan Murdoch is a member of the R Core group of developers.

O. Jones, R. Maillardet, and A. Robinson, Introduction to Scientific Programming and Simulation Using R. Boca Raton: Chapman and Hall.

M. L. Rizzo, Statistical Computing with R. Boca Raton: Chapman and Hall.

Graphics in R

P. Murrell, R Graphics, Second Edition. New York: Chapman and Hall. A tour de force: the definitive reference on traditional R graphics and on the grid graphics system on which lattice graphics (the R implementation of William Cleveland's Trellis graphics) is built. R code to produce the figures in the book is on Murrell's web site. Paul Murrell is a member of the R Core group of developers.

P. Murrell and R. Ihaka, An approach to providing mathematical annotation in plots. Journal of Computational and Graphical Statistics, 9. One of the unusual and very useful features of R graphics is the ability to include mathematical notation. This article explains how. Paul Murrell and Ross Ihaka are both members of the R Core group.

D. Sarkar, Lattice: Multivariate Data Visualization with R. New York: Springer. Deepayan Sarkar is the developer of the powerful lattice package in R, which implements Trellis graphics. This book provides a fine introduction to and overview of lattice graphics. Figures from the book and the R code to produce them are available on the web. Deepayan Sarkar is a member of the R Core group of developers.

H. Wickham, ggplot2: Elegant Graphics for Data Analysis. New York: Springer, 2009. A guide to Hadley Wickham's ggplot2 package, which provides an alternative graphics system for R based on an extension of Wilkinson's The Grammar of Graphics (Second Edition, Springer, 2005), which, in turn, provides a systematic basis for constructing statistical graphs.
Data Management

P. Spector, Data Manipulation with R. New York: Springer. Data management is a dry subject, but the ability to carry it out is vital to the effective day-to-day use of R (or of any statistical software). Spector provides a reasonably broad and clear introduction to the subject.

(Highly) Selected Statistical Methods Programmed in R

Also see the package listing on CRAN and the various CRAN task views.

R. S. Bivand, E. J. Pebesma, and V. Gómez-Rubio, Applied Spatial Data Analysis with R. New York: Springer. There is a strong community of researchers in spatial statistics developing R software, much of which is described in this book, including the basic sp package, which provides R classes for spatial data. Roger Bivand is a member of the R Foundation.

A. W. Bowman and A. Azzalini, Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations. Oxford: Oxford University Press. A good introduction to nonparametric density estimation and nonparametric regression, associated with the sm package (for both S-PLUS and R).

A. C. Davison and D. V. Hinkley, Bootstrap Methods and their Application. Cambridge: Cambridge University Press. A comprehensive introduction to bootstrap resampling, associated with the boot package (written by A. J. Canty). Somewhat more difficult than Efron and Tibshirani (immediately below).

B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap. London: Chapman and Hall. Another extensive treatment of bootstrapping by its originator (Efron), also accompanied by an R package, bootstrap (but somewhat less usable than boot).

A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press. A wide-ranging yet deep treatment of hierarchical models and various related topics, predominantly but not exclusively from a Bayesian perspective, using both R and BUGS software.

F. E. Harrell, Jr., Regression Modeling Strategies, With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer. Describes an interesting approach to statistical modeling, with frequent references to Harrell's Hmisc and Design packages.

T. J. Hastie and R. J. Tibshirani, Generalized Additive Models. London: Chapman and Hall. An accessible treatment of generalized additive models, as implemented in the gam package, and of nonparametric regression analysis in general. [The gam function in the mgcv package in R takes a somewhat different approach; see Wood (2000), below.]

R. Koenker, Quantile Regression. Cambridge: Cambridge University Press. Describes a variety of methods for quantile regression by the leading figure in the area. The methods are implemented in Koenker's quantreg package for R.

C. Loader, Local Likelihood and Regression. New York: Springer. Another text on nonparametric regression and density estimation, using the locfit package. Although the text is less readable than Bowman and Azzalini, the locfit software is very capable.

T. Lumley, Complex Surveys: A Guide to Analysis Using R. Hoboken, NJ: Wiley. A lucid introduction to the analysis of data from complex survey samples and to Lumley's highly capable survey package. Thomas Lumley is a member of the R Core group of developers.

G. P. Nason, Wavelet Methods in Statistics with R. New York: Springer. Describes the wavethresh package for wavelet smoothing, by one of the key figures in the development of wavelet methods in statistics.

J. C. Pinheiro and D. M. Bates, Mixed-Effects Models in S and S-PLUS. New York: Springer. An extensive treatment of linear and nonlinear mixed-effects models in S, focused on the authors' nlme package. Mixed models are appropriate for various kinds of non-independent (clustered) data, including hierarchical and longitudinal data. Does not cover Bates's newer lme4 package. Doug Bates is a member of the R Core group of developers.

T. M. Therneau and P. M. Grambsch, Modeling Survival Data: Extending the Cox Model. New York: Springer. An overview of both basic and advanced methods of survival analysis (event-history analysis), with reference to S and SAS software, the former implemented in Therneau's state-of-the-art survival package.

S. van Buuren, Flexible Imputation of Missing Data. Boca Raton, FL: CRC Press. There are several packages in R for multiple imputation of missing data; this book largely describes the mice (multiple imputation by chained equations) package.

W. N. Venables and B. D. Ripley, Modern Applied Statistics with S, Fourth Edition. New York: Springer. An influential and wide-ranging treatment of data analysis using S. Many of the facilities described in the book are programmed in the associated (and indispensable) MASS, nnet, and spatial packages, which are included in the standard R distribution. This text is more advanced and has a broader focus than the R Companion. Brian Ripley is a member of the R Core group of developers.

S. N. Wood, Generalized Additive Models: An Introduction with R. New York: Chapman and Hall. Describes the mgcv package in R, which contains a gam function for fitting generalized additive models based on smoothing splines. The initials mgcv stand for multiple generalized cross validation, the method by which Wood selects GAM smoothing parameters.

Other Sources (Some Free)

See the publications list on the R web site. The R Journal, the journal of the R Project for Statistical Computing, and its predecessor, R News, are also good sources of information, as is the Journal of Statistical Software, an on-line American Statistical Association journal dominated by coverage of R packages.
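
A note on the installation step: the section "Installing the car Package" gives the command install.packages("car"). The following is a minimal sketch of one way to run that step so that it installs only what is missing. The helper names missing_packages and install_if_missing are hypothetical, introduced here for illustration; they are not part of R or of the car package. The sketch assumes an Internet connection and a reachable CRAN mirror.

```r
# Return the subset of `pkgs` that cannot be found in the current library paths.
# `missing_packages` is a hypothetical helper name used only in this sketch.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only the packages that are not already present.
# `install_if_missing` is likewise a hypothetical helper name.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    # Requires an Internet connection and a CRAN mirror
    install.packages(to_install)
  }
  invisible(to_install)
}

# Run the installation only in an interactive session, where R can
# prompt for a CRAN mirror if none has been set.
if (interactive()) {
  install_if_missing("car")
  library(car)  # attach the package for use in the current session
}
```

Calling install.packages("car") directly, as the syllabus suggests, works equally well; the check simply avoids re-downloading a package that is already installed.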

ICPSR Training Program McMaster University Summer, Introduction to the R Statistical Computing Environment

ICPSR Training Program McMaster University Summer, Introduction to the R Statistical Computing Environment John Fox ICPSR Training Program McMaster University Summer, 2016 Introduction to the R Statistical Computing Environment The R statistical programming language and computing environment has become the

More information

ICPSR Training Program McMaster University Summer, The R Statistical Computing Environment: The Basics and Beyond

ICPSR Training Program McMaster University Summer, The R Statistical Computing Environment: The Basics and Beyond John Fox ICPSR Training Program McMaster University Summer, 2016 The R Statistical Computing Environment: The Basics and Beyond The R statistical programming language and computing environment has become

More information

Overview of R. Biostatistics

Overview of R. Biostatistics Overview of R Biostatistics 140.776 Stroustrup s Law There are only two kinds of languages: the ones people complain about and the ones nobody uses. R is a dialect of S What is R? What is S? S is a language

More information

Goals of this course. Crash Course in R. Getting Started with R. What is R? What is R? Getting you setup to use R under Windows

Goals of this course. Crash Course in R. Getting Started with R. What is R? What is R? Getting you setup to use R under Windows Oxford Spring School, April 2013 Effective Presentation ti Monday morning lecture: Crash Course in R Robert Andersen Department of Sociology University of Toronto And Dave Armstrong Department of Political

More information

IASC-ERS Summer School CoDaCourse 2014

IASC-ERS Summer School CoDaCourse 2014 IASC-ERS Summer School CoDaCourse 2014 3. Software www.compositionaldata.com Dept. Informàtica, Matemàtica Aplicada i Estadística Universitat de Girona Campus Montilivi, EPS-4 E-17071 Girona (Spain) [Coord:

More information

8.1 Come analizzare i dati: R

8.1 Come analizzare i dati: R 8.1 Come analizzare i dati: R Insegnamento di Informatica Elisabetta Ronchieri Corso di Laurea di Economia, Universitá di Ferrara I semestre, anno 2014-2015 Elisabetta Ronchieri (Universitá) Insegnamento

More information

Statistics Statistical Computing Software

Statistics Statistical Computing Software Statistics 135 - Statistical Computing Software Mark E. Irwin Department of Statistics Harvard University Autumn Term Monday, September 19, 2005 - January 2006 Copyright c 2005 by Mark E. Irwin Personnel

More information

Introducing Oracle R Enterprise 1.4 -

Introducing Oracle R Enterprise 1.4 - Hello, and welcome to this online, self-paced lesson entitled Introducing Oracle R Enterprise. This session is part of an eight-lesson tutorial series on Oracle R Enterprise. My name is Brian Pottle. I

More information

Introduction to R: Part I

Introduction to R: Part I Introduction to R: Part I Jeffrey C. Miecznikowski March 26, 2015 R impact R is the 13th most popular language by IEEE Spectrum (2014) Google uses R for ROI calculations Ford uses R to improve vehicle

More information

CREATING POWERFUL AND EFFECTIVE GRAPHICAL DISPLAYS: AN INTRODUCTION TO LATTICE GRAPHICS IN R

CREATING POWERFUL AND EFFECTIVE GRAPHICAL DISPLAYS: AN INTRODUCTION TO LATTICE GRAPHICS IN R APSA Short Course, SC 13 Chicago, Illinois August 29, 2007 Michigan State University CREATING POWERFUL AND EFFECTIVE GRAPHICAL DISPLAYS: AN INTRODUCTION TO LATTICE GRAPHICS IN R I. Some Basic R Concepts

More information

Bootstrap and multiple imputation under missing data in AR(1) models

Bootstrap and multiple imputation under missing data in AR(1) models EUROPEAN ACADEMIC RESEARCH Vol. VI, Issue 7/ October 2018 ISSN 2286-4822 www.euacademic.org Impact Factor: 3.4546 (UIF) DRJI Value: 5.9 (B+) Bootstrap and multiple imputation under missing ELJONA MILO

More information

Nonparametric Regression

Nonparametric Regression Nonparametric Regression John Fox Department of Sociology McMaster University 1280 Main Street West Hamilton, Ontario Canada L8S 4M4 jfox@mcmaster.ca February 2004 Abstract Nonparametric regression analysis

More information

Multivariable Regression Modelling

Multivariable Regression Modelling Multivariable Regression Modelling A review of available spline packages in R. Aris Perperoglou for TG2 ISCB 2015 Aris Perperoglou for TG2 Multivariable Regression Modelling ISCB 2015 1 / 41 TG2 Members

More information

An Introduction to R. Subhajit Dutta Stat-Math Unit. Indian Statistical Institute, Kolkata October 17, 2012

An Introduction to R. Subhajit Dutta Stat-Math Unit. Indian Statistical Institute, Kolkata October 17, 2012 An Introduction to R Subhajit Dutta Stat-Math Unit Indian Statistical Institute, Kolkata October 17, 2012 Why R? It is FREE!! Basic as well as specialized data analysis technique at your fingertips. Highly

More information

The History and Use of R. Joseph Kambourakis

The History and Use of R. Joseph Kambourakis The History and Use of R Joseph Kambourakis Ground Rules Interrupt me These are all my opinions and not of EMC or Big Data Analytics, Discovery & Visualization Meetup Slides will be available Joseph

More information

A Survey of Statistical Modeling Tools

A Survey of Statistical Modeling Tools 1 of 6 A Survey of Statistical Modeling Tools Madhuri Kulkarni (A survey paper written under the guidance of Prof. Raj Jain) Abstract: A plethora of statistical modeling tools are available in the market

More information

Solving the Unsolvable Through Scientific Computing: Explorations in the Best Uses of Popular Mathematics Software

Solving the Unsolvable Through Scientific Computing: Explorations in the Best Uses of Popular Mathematics Software Solving the Unsolvable Through Scientific Computing: Explorations in the Best Uses of Popular Mathematics Software Talitha Washington, Howard University Edray Goins, Purdue University Luis Melara, Shippensburg

More information

The R statistical computing environment

The R statistical computing environment The R statistical computing environment Luke Tierney Department of Statistics & Actuarial Science University of Iowa June 17, 2011 Luke Tierney (U. of Iowa) R June 17, 2011 1 / 27 Introduction R is a language

More information

Part 1: Getting Started

Part 1: Getting Started Part 1: Getting Started 140.776 Statistical Computing Ingo Ruczinski Thanks to Thomas Lumley and Robert Gentleman of the R-core group (http://www.r-project.org/) for providing some tex files that appear

More information

A Method for Comparing Multiple Regression Models

A Method for Comparing Multiple Regression Models CSIS Discussion Paper No. 141 A Method for Comparing Multiple Regression Models Yuki Hiruta Yasushi Asami Department of Urban Engineering, the University of Tokyo e-mail: hiruta@ua.t.u-tokyo.ac.jp asami@csis.u-tokyo.ac.jp

More information

An Introduction to R 1.3 Some important practical matters when working with R

An Introduction to R 1.3 Some important practical matters when working with R An Introduction to R 1.3 Some important practical matters when working with R Dan Navarro (daniel.navarro@adelaide.edu.au) School of Psychology, University of Adelaide ua.edu.au/ccs/people/dan DSTO R Workshop,

More information

An introduction to ggplot: An implementation of the grammar of graphics in R

An introduction to ggplot: An implementation of the grammar of graphics in R An introduction to ggplot: An implementation of the grammar of graphics in R Hadley Wickham 00-0-7 1 Introduction Currently, R has two major systems for plotting data, base graphics and lattice graphics

More information

Fuzzy Rogers Research Computing Administrator Materials Research Laboratory (MRL) Center for Scientific Computing (CSC)

Fuzzy Rogers Research Computing Administrator Materials Research Laboratory (MRL) Center for Scientific Computing (CSC) Intro to R Fuzzy Rogers Research Computing Administrator Materials Research Laboratory (MRL) Center for Scientific Computing (CSC) fuz@mrl.ucsb.edu MRL 2066B Sharon Solis Paul Weakliem Research Computing

More information

The R Software Environment

The R Software Environment The R Software Environment a (very) short introduction L. Torgo ltorgo@dcc.fc.up.pt Departamento de Ciência de Computadores Faculdade de Ciências / Universidade do Porto Feb, 2017 What is R? The R Project

More information

On R for Statistics. Subhajit Dutta Stat-Math Unit. Indian Statistical Institute, Kolkata September 16, 2011

On R for Statistics. Subhajit Dutta Stat-Math Unit. Indian Statistical Institute, Kolkata September 16, 2011 On R for Statistics Subhajit Dutta Stat-Math Unit Indian Statistical Institute, Kolkata September 16, 2011 Why R? It is FREE!! Basic as well as specialized data analysis technique at your fingertips. Highly

More information

SQL Server 2017: Data Science with Python or R?

SQL Server 2017: Data Science with Python or R? SQL Server 2017: Data Science with Python or R? Dejan Sarka Sponsor Introduction Dejan Sarka (dsarka@solidq.com, dsarka@siol.net, @DejanSarka) 30 years of experience SQL Server MVP, MCT, 16 books 20+ courses,

More information

Introduction to machine learning, pattern recognition and statistical data modelling Coryn Bailer-Jones

Introduction to machine learning, pattern recognition and statistical data modelling Coryn Bailer-Jones Introduction to machine learning, pattern recognition and statistical data modelling Coryn Bailer-Jones What is machine learning? Data interpretation describing relationship between predictors and responses

More information

davidr Cornell University

davidr Cornell University 1 NONPARAMETRIC RANDOM EFFECTS MODELS AND LIKELIHOOD RATIO TESTS Oct 11, 2002 David Ruppert Cornell University www.orie.cornell.edu/ davidr (These transparencies and preprints available link to Recent

More information

Package r2d2. February 20, 2015

Package r2d2. February 20, 2015 Package r2d2 February 20, 2015 Version 1.0-0 Date 2014-03-31 Title Bivariate (Two-Dimensional) Confidence Region and Frequency Distribution Author Arni Magnusson [aut], Julian Burgos [aut, cre], Gregory

More information

Data Handling: Import, Cleaning and Visualisation

Data Handling: Import, Cleaning and Visualisation Data Handling: Import, Cleaning and Visualisation 1 Data Display Lecture 11: Visualisation and Dynamic Documents Prof. Dr. Ulrich Matter (University of St. Gallen) 13/12/18 In the last part of a data pipeline

More information

Introduction to R. base -> R win32.exe (this will change depending on the latest version)

Introduction to R. base -> R win32.exe (this will change depending on the latest version) Dr Raffaella Calabrese, Essex Business School 1. GETTING STARTED Introduction to R R is a powerful environment for statistical computing which runs on several platforms. R is available free of charge.

More information

Computational statistics Jamie Griffin. Semester B 2018 Lecture 1

Computational statistics Jamie Griffin. Semester B 2018 Lecture 1 Computational statistics Jamie Griffin Semester B 2018 Lecture 1 Course overview This course is not: Statistical computing Programming This course is: Computational statistics Statistical methods that

More information

Package slp. August 29, 2016

Package slp. August 29, 2016 Version 1.0-5 Package slp August 29, 2016 Author Wesley Burr, with contributions from Karim Rahim Copyright file COPYRIGHTS Maintainer Wesley Burr Title Discrete Prolate Spheroidal

More information

An Introduction To R. Erin Rachael Shellman Bioinformatics PhD Program Biostatistics Brownbag Seminar 09/26/2008

An Introduction To R. Erin Rachael Shellman Bioinformatics PhD Program   Biostatistics Brownbag Seminar 09/26/2008 An Introduction To R Erin Rachael Shellman Bioinformatics PhD Program www.umich.edu/~shellman/rtalk.html Biostatistics Brownbag Seminar 09/26/2008 1 Talking Points In this talk, my goal is to: Introduce

More information

Package blocksdesign

Package blocksdesign Type Package Package blocksdesign September 11, 2017 Title Nested and Crossed Block Designs for Factorial, Fractional Factorial and Unstructured Treatment Sets Version 2.7 Date 2017-09-11 Author R. N.

More information

IST Computational Tools for Statistics I. DEÜ, Department of Statistics

IST Computational Tools for Statistics I. DEÜ, Department of Statistics IST 1051 Computational Tools for Statistics I 1 DEÜ, Department of Statistics Course Objectives Computational Tools for Statistics-I course can increase the understanding of statistics and helps to learn

More information

STATISTICS (STAT) 200 Level Courses. 300 Level Courses. Statistics (STAT) 1

STATISTICS (STAT) 200 Level Courses. 300 Level Courses. Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) 200 Level Courses STAT 250: Introductory Statistics I. 3 credits. Elementary introduction to statistics. Topics include descriptive statistics, probability, and estimation

More information

Fuzzy Rogers Research Computing Administrator Materials Research Laboratory (MRL) Center for Scientific Computing (CSC)

Fuzzy Rogers Research Computing Administrator Materials Research Laboratory (MRL) Center for Scientific Computing (CSC) Intro to R Fuzzy Rogers Research Computing Administrator Materials Research Laboratory (MRL) Center for Scientific Computing (CSC) fuz@mrl.ucsb.edu MRL 2066B Sharon Solis Paul Weakliem Research Computing

More information
