Reproducible Research with R and RStudio

Similar documents
Mastering Linux. Paul S. Wang. CRC Press. Taylor & Francis Group. Taylor & Francis Croup an informa business. A CHAPMAN St HALL BOOK

BIOST 561: R Markdown Intro

the Simulation of Dynamics Using Simulink

Managing Your Biological Data with Python

Jim Jackson II Ian Gilman

A Web-Based Introduction

This course is designed for web developers that want to learn HTML5, CSS3, JavaScript and jquery.

Support Vector. Machines. Algorithms, and Extensions. Optimization Based Theory, Naiyang Deng YingjieTian. Chunhua Zhang.

Computer Network. The Practical User Guide for. Simulation. Adarshpal S. Hnatyshin. Vasil Y. CRC Press. Taylor Si Francis Croup

Introduction. Part I: jquery API 1. Chapter 1: Introduction to jquery 3

Data Clustering in C++

Programming Graphical

End User s Guide Release 5.0

"Charting the Course... SharePoint 2007 Hands-On Labs Course Summary

George Grätzer. Practical L A TEX

Mobile Device Security

8.3 Come analizzare i dati: introduzione a RStudio

Adding a corporate identity to reproducible research

Merge Conflicts p. 92 More GitHub Workflows: Forking and Pull Requests p. 97 Using Git to Make Life Easier: Working with Past Commits p.

SOME ASSEMBLY REQUIRED

Pro JavaScript. Development. Coding, Capabilities, and Tooling. Den Odell. Apress"

Reproducibility with git and rmarkdown

Privacy-Preserving. Introduction to. Data Publishing. Concepts and Techniques. Benjamin C. M. Fung, Ke Wang, Chapman & Hall/CRC. S.

SAP' ABAP. Handbook. Kogent Learning Solutions, Inc. Sudbury, Massachusetts JONES AND BARTLETT PUBLISHERS BOSTON TORONTO LONDON SINUAPORI:

AAM Guide for Authors

LATEX. Leslie Lamport. Digital Equipment Corporation. Illustrations by Duane Bibby. v ADDISON-WESLEY

A Guide to MATLAB Object-Oriented Programming

\XjP^J Taylor & Francis Group. Model-Based Control. Tensor Product Model Transformation in Polytopic. Yeung Yam. CRC Press.

PRACTICAL SPEECH USER INTERFACE DESIGN

Dynamic Documents. Using knitr. Benjamin Hofner

Management. Port Security. Second Edition KENNETH CHRISTOPHER. CRC Press. Taylor & Francis Group. Taylor & Francis Group,

COPYRIGHTED MATERIAL. Contents. Chapter 1: Creating Structured Documents 1

Mathematics Shape and Space: Polygon Angles

1.264 Lecture 12. HTML Introduction to FrontPage

Reproducible research and knitr

Pro Angular 6. Third Edition. Adam Freeman

Package knitrbootstrap

IN PRACTICE. Daniele Bochicchio Stefano Mostarda Marco De Sanctis. Includes 106 practical techniques MANNING

Create Web Charts. With jqplot. Apress. Fabio Nelli

Contents. Tutorials Section 1. About SAS Enterprise Guide ix About This Book xi Acknowledgments xiii

Installing and Administering a Satellite Environment

Practical Node.js. Building Real-World Scalable Web Apps. Apress* Azat Mardan

CROSS-REFERENCE TABLE ASME A Including A17.1a-1997 Through A17.1d 2000 vs. ASME A

Package BiocStyle. April 22, 2016

PTC Mathcad Prime 3.0

Beginning Microsoft Office 2010

LYX with Beamer and Sweave

Excel Programming with VBA (Macro Programming) 24 hours Getting Started

Package workflowr. July 6, 2018

Analyzing Economic Data using R

@Taylor. Usability. Evaluation. for In-Vehicle Systems. Harvey. Catherine. Neville A.Stanton. CRC Press. Francis Group

Scripts define HOW. The report defines WHAT & WHY. Mikhail Dozmorov. Fall Mikhail Dozmorov Scripts define HOW Fall / 27

Contents. Acknowledgments

Modelling and Quantitative Methods in Fisheries

Moving ROOT Documentation from Docbook to Markdown

Citation guide. Carleton College L A TEX workshop. You don t have to keep track of what sources you cite in your document.

documentation Editing Files and Folders

Python Scripting for Computational Science

Chapter 1 Getting Started with HTML 5 1. Chapter 2 Introduction to New Elements in HTML 5 21

Oracle Application Express

Applied Combinatorics

GuideAutomator: Automated User Manual Generation with Markdown

The Scope of This Book... xxii A Quick Note About Browsers and Platforms... xxii The Appendices and Further Resources...xxiii

"Charting the Course... Comprehensive Angular 5. Course Summary

Foundation XML and E4X for Flash and Flex

Version Monitoring Agent User s Guide SC

Graphics Shaders. Theory and Practice. Second Edition. Mike Bailey. Steve Cunningham. CRC Press. Taylor&FnincIs Croup tootutor London New York

Introduction to Windchill PDMLink 10.2 for the Implementation Team

Contents. Acknowledgments Parachutes: Coda. About the Author. Presentation Conventions. PART ONE Foundations 1

Oracle BI 11g R1: Build Repositories

OER Publishing with LaTeX and GitHub

Overview 14 Table Definitions and Style Definitions 16 Output Objects and Output Destinations 18 ODS References and Resources 20

Script for Interview about LATEX and Friends

The CIW Web Foundations courses prepare students to take the CIW Web Foundations Associate certification exam.

Contents in Detail. Foreword by Xavier Noria

TEACHER PAGES USER MANUAL CHAPTER 6 SHARPSCHOOL. For more information, please visit: Chapter 6 Teacher Pages

Getting MEAN. with Mongo, Express, Angular, and Node SIMON HOLMES MANNING SHELTER ISLAND

Inline Processing Engine User Guide. Release: August 2017 E

Andale Store Getting Started Manual

User Guide. Chapter 6. Teacher Pages

Latex Manually Set Font Size For Tables

About the Authors. Preface

3. Centralize your title. To do this, click the Center button on the tab s paragraph group.

CONTENTS. ... vii. ... xv The Old Standard xvi The New Standard xvi A Whole New Ball Game xvii e-rpg xviii INTRODUCTION

Fundamentals of the Java Programming Language

COPYRIGHTED MATERIAL. Contents. Introduction. Chapter 1: Structuring Documents for the Web 1

Package markdown. February 15, 2013

The 4D Web Companion. David Adams

Microsoft Office 365 Word Online

Beginning ASP.NET. 4.5 in C# Matthew MacDonald

Introduction to Reproducible Research in R and R Studio.

Table of Contents COPYRIGHTED MATERIAL. Introduction Book I: Excel Basics Chapter 1: The Excel 2013 User Experience...

Securing an IT. Governance, Risk. Management, and Audit

Introduction to typesetting with L A TEX

IBM Tivoli Federated Identity Manager Version Installation Guide GC

SEMINAR REPORT SEMINAR TITLE. Name of the student

2 Webpage Markup with HTML HTML5 Page Structure Creating a Webpage HTML5 Elements and Entities

Biosignal And Medical Image Processing Second Edition Signal Processing And Communications

A Guide to QuarkXPress 9.1

COPYRIGHTED MATERIAL. Acknowledgments...v Introduction... xxi

Transcription:

The R Series Reproducible Research with R and RStudio Christopher Gandrud C\ CRC Press cj* Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Group an informa business A CHAPMAN & HALL BOOK

Contents Preface Stylistic Conventions Required R Packages Additional Resources Chapter Examples Short Example Project List of Figures List of Tables vii xv xvii xix xix xix xxiv xxv I Getting Started 1 1 Introducing Reproducible Research 3 1.1 What is Reproducible Research? 3 1.2 Why Should Research Be Reproducible? 5 1.2.1 For science 5 1.2.2 For you 6 1.3 Who Should Read This Book? 7 1.3.1 Academic researchers 8 1.3.2 Students 8 1.3.3 Instructors 8 1.3.4 Editors 9 1.3.5 Private sector researchers 9 1.4 The Tools of Reproducible Research 10 1.5 Why Use R, knitr, and RStudio for Reproducible Research? 11 1.5.1 Installing the main software 13 1.6 Book Overview 14 1.6.1 How to read this book 15 1.6.2 Reproduce this book 16 1.6.3 Contents overview 16 IX

Getting Started with Reproducible Research 17 2.1 The Big Picture: A Workflow for Reproducible Research.. 17 2.1.1 Reproducible theory 18 2.2 Practical Tips for Reproducible Research 20 2.2.1 Document everything! 20 2.2.2 Everything is a (text) file 22 2.2.3 All files should be human readable 22 2.2.4 Explicitly tie your files together 24 2.2.5 Have a plan to organize, store, & make your files available 25 Getting Started with R, RStudio, and knitr 27 3.1 Using R: the Basics 27 3.1.1 Objects 28 3.1.2 Component selection 34 3.1.3 Subscripts 36 3.1.4 Functions and commands 37 3.1.5 Arguments 38 3.1.6 The workspace & history 39 3.1.7 Global R options 41 3.1.8 Installing new packages and loading commands... 42 3.2 Using RStudio 43 3.3 Using knitr: the Basics 44 3.3.1 What knitr does 45 3.3.2 File extensions 45 3.3.3 Code chunks 47 3.3.4 Global chunk options 49 3.3.5 knitr package options 51 3.3.6 Hooks 52 3.3.7 knitr & RStudio 52 3.3.8 knitr & R 55 Getting Started with File Management 59 4.1 File Paths & Naming Conventions 60 4.1.1 Root directories 60 4.1.2 Subdirectories & parent directories 60 4.1.3 Spaces in directory & file names 61 4.1.4 Working directories 61 4.2 Organizing Your Research Project 63 4.3 Setting Directories as RStudio Projects 64 4.4 R File Manipulation Commands 64 4.5 Unix-like Shell Commands for File Management 67 4.6 File Navigation in RStudio 72 Data Gathering and Storage 73

5 Storing, Collaborating, Accessing Files, &c Versioning 75 5.1 Saving Data in Reproducible Formats 76 5.2 Storing Your Files in the Cloud: Dropbox 77 5.2.1 Storage 78 5.2.2 Accessing data 78 5.2.3 Collaboration 80 5.2.4 Version control 80 5.3 Storing Your Files in the Cloud: GitHub 81 5.3.1 Setting up GitHub: basic 83 5.3.2 Version control with Git 84 5.3.3 Remote storage on GitHub 91 5.3.4 Accessing on GitHub 94 5.3.4.1 Collaboration with GitHub 96 5.3.5 Summing up the GitHub workflow 96 5.4 RStudio & GitHub 97 5.4.1 Setting up Git/GitHub with Projects 98 5.4.2 Using Git in RStudio Projects 99 6 Gathering Data with R 101 6.1 Organize Your Data Gathering: Makefiles 101 6.1.1 R Make-like files 102 6.1.2 GNU Make 103 6.1.2.1 Example makefile 104 6.1.2.2 Makefiles and RStudio Projects 108 6.1.2.3 Other information about makefiles 109 6.2 Importing Locally Stored Data Sets 109 6.3 Importing Data Sets from the Internet 110 6.3.1 Data from non-secure (http) URLs Ill 6.3.2 Data from secure (https) URLs Ill 6.3.3 Compressed data stored online 114 6.3.4 Data APIs & feeds 115 6.4 Advanced Automatic Data Gathering: Web Scraping 117 7 Preparing Data for Analysis 121 7.1 Cleaning Data for Merging 121 7.1.1 Get a handle on your data 121 7.1.2 Reshaping data 123 7.1.3 Renaming variables 126 7.1.4 Ordering data 126 7.1.5 Subsetting data 127 7.1.6 Recoding string/numeric variables 129 7.1.7 Creating new variables from old 131 7.1.8 Changing variable types 134 7.2 Merging Data Sets 135 7.2.1 Binding 135 xi

xii 7.2.2 The merge command 135 7.2.3 Duplicate values 138 7.2.4 Duplicate columns 139 III Analysis and Results 143 8 Statistical Modelling and knitr 145 8.1 Incorporating Analyses into the Markup 146 8.1.1 Full code chunks 146 8.1.2 Showing code & results inline 148 8.1.2.1 LaTeX 148 8.1.2.2 Markdown 150 8.1.3 Dynamically including non-r code in code chunks.. 150 8.2 Dynamically Including Modular Analysis Files 152 8.2.1 Source from a local file 152 8.2.2 Source from a non-secure URL (http) 153 8.2.3 Source from a secure URL (https) 154 8.3 Reproducibly Random: set. seed 155 8.4 Computationally Intensive Analyses 157 9 Showing Results with Tables 159 9.1 Basic knitr Syntax for Tables 160 9.2 Table Basics 160 9.2.1 Tables in LaTeX 161 9.2.2 Tables in Markdown/HTML 165 9.3 Creating Tables from R Objects 169 9.3.1 xtable & apsrtable basics with supported class objects 169 9.3.1.1 apsrtable for LaTeX 172 9.3.2 xtable with non-supported class objects 175 9.3.3 Creating variable description documents with xtable. 177 10 Showing Results with Figures 181 10.1 Including Non-knitted Graphics 181 10.1.1 Including graphics in LaTeX 182 10.1.2 Including graphics in Markdown/HTML 184 10.2 Basic knitr Figure Options 185 10.2.1 Chunk options 185 10.2.2 Global options 187 10.3 Knitting R's Default Graphics 187 10.4 Including ggplot2 Graphics 192 10.4.1 Showing regression results with caterpillar plots... 195 10.5 JavaScript Graphs with googlevis 198 IV Presentation Documents 203

xiii 11 Presenting with LaTeX 205 11.1 The Basics 205 11.1.1 Getting started with LaTeX editors 205 11.1.2 Basic LaTeX command syntax 206 11.1.3 The LaTeX preamble & body 207 11.1.4 Headings 210 11.1.5 Paragraphs & spacing 210 11.1.6 Horizontal lines 211 11.1.7 Text formatting 211 11.1.8 Math 212 11.1.9 Lists 213 11.1.10 Footnotes 214 11.1.11 Cross-references 214 11.2 Bibliographies with BibTeX 215 11.2.1 The.bib file 215 11.2.2 Including citations in LaTeX documents 216 11.2.3 Generating a BibTeX file of R package citations... 217 11.3 Presentations with LaTeX Beamer 220 11.3.1 Beamer basics 220 11.3.2 knitr with LaTeX slideshows 223 12 Large LaTeX Documents: Theses, Books, & Batch Reports 225 12.1 Planning Large Documents 225 12.2 Large Documents with Traditional LaTeX 226 12.2.1 Inputting/including children 227 12.2.2 Other common features of large documents 228 12.3 knitr and Large Documents 229 12.3.1 The parent document 229 12.3.2 Knitting child documents 230 12.4 Child Documents in a Different Markup Language 231 12.5 Creating Batch Reports 232 13 Presenting on the Web with Markdown 239 13.1 The Basics 239 13.1.1 Getting started with Markdown editors 240 13.1.2 Preamble and document structure 240 13.1.3 Headers 243 13.1.4 Horizontal lines 243 13.1.5 Paragraphs and new lines 243 13.1.6 Italics and bold 244 13.1.7 Links 244 13.1.8 Special characters and font customization 244 13.1.9 Lists 244 13.1.10 Escape characters 245 13.1.11 Math with MathJax 245

XIV 13.2 Markdown with Pandoc and Custom CSS 246 13.2.1 Pandoc 246 13.2.2 CSS style files and Markdown 250 13.2.3 knitr's pandoc command 251 13.3 Slideshows with Markdown, knitr, and HTML 254 13.3.1 Slideshows with Markdown, knitr, and RStudio's R Presentations 254 13.3.2 Slideshows with Markdown, knitr, and slidif y... 256 13.4 Publishing Markdown Documents 262 13.4.1 Stand alone HTML files 262 13.4.2 Hosting webpages with Dropbox 263 13.4.3 GitHub Pages 263 14 Conclusion 265 14.1 Citing Reproducible Research 265 14.2 Licensing Your Reproducible Research 267 14.3 Sharing Your Code in Packages 267 14.4 Project Development: Public or Private? 268 14.5 Is it Possible to Completely Future Proof Your Research?.. 269 Bibliography 271 Index 279