Exploratory Data Analysis September 8, 2010


Scatter Plots

- plot(x, y)
- plot(y ~ x) (note the use of a model formula)

Today: how to
- add lines/smoothed curves
- add/identify points
- add legends
- apply transformations

Old Faithful Eruptions

    > summary(geyser)
          Day            interval        duration
     Min.   : 1.00   Min.   :42.00   Min.   :[...]
     1st Qu.:[...]   1st Qu.:[...]   1st Qu.:2.300
     Median :16.00   Median :75.00   Median :4.000
     Mean   :12.30   Mean   :71.01   Mean   :[...]
     3rd Qu.:[...]   3rd Qu.:[...]   3rd Qu.:4.400
     Max.   :23.00   Max.   :95.00   Max.   :5.200

Variables:
- Day: day of eruption in August
- interval: waiting time until the next eruption (minutes)
- duration: duration of the current eruption (minutes)

Scatter Plot

[Figure: Old Faithful Eruptions. x-axis: Duration (mins); y-axis: Waiting Time (mins).]

Adding a Smooth Trend

The function lowess fits a one-dimensional smooth trend to (x, y) pairs.

To construct a scatter plot with a smooth trend:
1. plot(duration, interval)
2. lines(lowess(duration, interval))
or, in a single call,
3. scatter.smooth(duration, interval)
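A minimal sketch of the two routes above. The slides' geyser data frame is not available here, so this assumes R's built-in faithful data as a stand-in, with eruptions playing the role of duration and waiting the role of interval.

```r
# Stand-in data (assumption): built-in `faithful`, not the slides' `geyser`.
duration <- faithful$eruptions   # eruption length in minutes
interval <- faithful$waiting     # waiting time to the next eruption in minutes

# Route 1: draw the scatter plot, then overlay the lowess curve.
plot(duration, interval,
     xlab = "Duration (mins)", ylab = "Waiting Time (mins)",
     main = "Old Faithful Eruptions")
fit <- lowess(duration, interval)   # list with sorted x and smoothed y
lines(fit, lwd = 2)

# Route 2: one call that draws the points and a smooth curve
# (scatter.smooth smooths with loess rather than lowess).
scatter.smooth(duration, interval,
               xlab = "Duration (mins)", ylab = "Waiting Time (mins)")
```

Because lowess returns its x values in sorted order, the result can be passed straight to lines() without reordering.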

Default Lowess Smooth

[Figure: Old Faithful Eruptions with the default lowess smooth. x-axis: Duration (mins); y-axis: Waiting Time (mins).]

lowess

The algorithm behind lowess and its cousin loess is quite complex, but the idea is to use robust local estimates of the curve:
- A window is placed around a point x.
- Data points in the window are weighted so that points near x get the most weight.
- Using the points in the window and their weights, a robust weighted regression predicts the curve at x.
- The parameter f gives the proportion of points in the plot that influence the smooth at each value. Larger values give more smoothness. The default is 2/3.
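The window-and-weights idea can be sketched for a single target point. This is an illustration only, not the lowess implementation: it does one weighted linear fit with no robustness iterations, the tricube weight function is an assumption, and local_fit is a hypothetical helper name. The built-in faithful data again stands in for the slides' geyser data.

```r
# Hypothetical helper: one local weighted linear fit at x0 (no robustness step).
local_fit <- function(x, y, x0, f = 2/3) {
  n <- length(x)
  k <- max(2, ceiling(f * n))         # number of points allowed in the window
  d <- abs(x - x0)                    # distance of every point from x0
  h <- sort(d)[k]                     # window half-width: k-th smallest distance
  w <- pmax(0, 1 - (d / h)^3)^3       # tricube weights (assumed), 0 outside window
  fit <- lm(y ~ x, weights = w)       # weighted least squares inside the window
  unname(predict(fit, newdata = data.frame(x = x0)))
}

# Stand-in data (assumption): built-in `faithful`.
x <- faithful$eruptions
y <- faithful$waiting
pred_wide   <- local_fit(x, y, x0 = 3.5, f = 2/3)   # wide window, smoother
pred_narrow <- local_fit(x, y, x0 = 3.5, f = 0.2)   # narrow window, more local
```

Repeating the fit over a grid of x0 values traces out a curve. A smaller f makes each fit depend on fewer neighbours, which is why the curve becomes rougher.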

Choice of f in lowess

[Figure: Old Faithful Eruptions with four lowess curves, labeled f = 1, f = 2/3, f = .5, and f = [...]. x-axis: Duration (mins); y-axis: Waiting Time (mins).]

Legends

    legend(x=2, y=90,
           legend=c("f=1","f=2/3","f=1/2","f=1/10"),
           col=2:5, lty=c(2,1,3,4), lwd=rep(2,4))

Arguments:
- x, y: location for the legend
- legend: text for the legend
- col: colors used
- lty: line types
- lwd: line widths
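A sketch of how the curves and the legend fit together, reusing the colors, line types, and widths from the legend call above. The built-in faithful data is an assumed stand-in for the slides' geyser data, so the legend position (2, 90) is only approximately suitable.

```r
# Stand-in data (assumption): built-in `faithful`.
duration <- faithful$eruptions
interval <- faithful$waiting

fs   <- c(1, 2/3, 1/2, 1/10)   # smoother spans to compare
cols <- 2:5                    # one color per curve
ltys <- c(2, 1, 3, 4)          # one line type per curve

plot(duration, interval,
     xlab = "Duration (mins)", ylab = "Waiting Time (mins)",
     main = "Old Faithful Eruptions")

# Fit one lowess curve per value of f and draw it with its own style.
fits <- lapply(fs, function(f) lowess(duration, interval, f = f))
for (i in seq_along(fs)) {
  lines(fits[[i]], col = cols[i], lty = ltys[i], lwd = 2)
}

# The legend must repeat the same col/lty/lwd used when drawing the curves.
legend(x = 2, y = 90,
       legend = c("f=1", "f=2/3", "f=1/2", "f=1/10"),
       col = cols, lty = ltys, lwd = rep(2, 4))
```

Keeping the styles in vectors and indexing them in both the loop and the legend call avoids a mismatch between what is drawn and what is labeled.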

Transformations

For plots to be more effective, we may need to transform variables (this may also make normality assumptions more plausible).

Common transformations:
- log: positive continuous measures, concentrations, size
- sqrt: counts
- ^-1 (reciprocal)

Typically the largest observation needs to be at least 10 times the smallest for a transformation to be effective.

Log Transformations

    library(MASS)
    data(Animals)
    plot(brain ~ body, data=Animals)
    plot(brain ~ body, log="xy", data=Animals)

[Figure: two panels, "Original Units" and "Logarithmic Scale", each plotting Brain Weight (g) against Body Weight (kg). Labeled points in the original-units panel: African elephant, Asian elephant, Human, Brachiosaurus. Labeled points in the log-scale panel: Brachiosaurus, Triceratops, Dipliodocus.]
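A runnable version of the comparison, with a quick check of the max/min ratio that the transformation rule of thumb refers to. It assumes the MASS package is installed; the variable names ratio_body and ratio_brain are illustrative.

```r
library(MASS)   # provides the Animals data frame (body in kg, brain in g)

# How many times larger is the largest observation than the smallest?
ratio_body  <- max(Animals$body)  / min(Animals$body)
ratio_brain <- max(Animals$brain) / min(Animals$brain)

# Original units: a few huge animals squeeze everything else into a corner.
plot(brain ~ body, data = Animals,
     xlab = "Body Weight (kg)", ylab = "Brain Weight (g)",
     main = "Original Units")

# log = "xy" puts both axes on a log scale without changing the data.
plot(brain ~ body, data = Animals, log = "xy",
     xlab = "Body Weight (kg)", ylab = "Brain Weight (g)",
     main = "Logarithmic Scale")
```

Using log = "xy" rather than plotting log(brain) against log(body) keeps the axis labels in the original units.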

Identifying Points

Draw the plot, then use either
- identify(x, y, labels) to label points, or
- locator(n) to find the coordinates of n locations.

Left-click with the mouse at the locations (Mac: click). To stop, right-click the mouse in the plot (Mac: command-click).

    > identify(Animals$body, Animals$brain, labels=rownames(Animals))

Use text() to add other text or legends at locations.
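identify() and locator() need an interactive graphics device, so they cannot be used in a script. A non-interactive sketch with text() is shown here instead; choosing the three heaviest animals as the points to label is an arbitrary choice for illustration.

```r
library(MASS)   # provides the Animals data frame

plot(brain ~ body, data = Animals, log = "xy",
     xlab = "Body Weight (kg)", ylab = "Brain Weight (g)")

# Pick the rows to label: here, the three largest body weights.
big <- order(Animals$body, decreasing = TRUE)[1:3]

# text() takes coordinates in data units, even on log-scaled axes.
text(Animals$body[big], Animals$brain[big],
     labels = rownames(Animals)[big],
     pos = 2,      # place each label to the left of its point
     cex = 0.8)
```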

Multiple Plots per Page

Use the mfrow option to control the number of plots on a page. The command

    par(mfrow=c(1,2))
    plot(brain ~ body, data=Animals)
    plot(brain ~ body, log="xy", data=Animals)

- creates a figure with 1 row of plots and 2 columns
- stays in effect until another mfrow command is sent

See help(par) for more graphical control parameters.
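Since the mfrow setting persists, one common pattern is to save the old settings and restore them afterwards. A minimal sketch, assuming the MASS package for the Animals data:

```r
library(MASS)   # provides the Animals data frame

# par() returns the previous values of the parameters it changes.
op <- par(mfrow = c(1, 2))

plot(brain ~ body, data = Animals, main = "Original Units")
plot(brain ~ body, data = Animals, log = "xy", main = "Logarithmic Scale")

# Restore the saved settings so later plots use one panel per page again.
par(op)
```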

Power Transformations

Explore the Box-Cox family of power transformations of y and x:

\[ \mathrm{power}(y, p) = \begin{cases} \dfrac{y^{p} - 1}{p} & (p \neq 0) \\ \log(y) & (p = 0) \end{cases} \]

The ladder function of HH is built on the lattice package:

    > ladder(brain ~ body, data=Animals, main="powers of Brain and Body Weight")

See Adler, Chapter 4, for installing/loading packages on different systems.
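The Box-Cox family can be written directly as a small function. This is a sketch in base R; the name bc_power is hypothetical and is not part of the HH package.

```r
# Box-Cox power transformation for positive y and a single power p.
bc_power <- function(y, p) {
  stopifnot(all(y > 0), length(p) == 1)
  if (p == 0) {
    log(y)               # the p = 0 member of the family
  } else {
    (y^p - 1) / p        # shifted and scaled power
  }
}

# Subtracting 1 and dividing by p makes the family continuous in p:
# as p approaches 0 the power form approaches log(y).
near_zero <- bc_power(2, 1e-6)
at_zero   <- bc_power(2, 0)
```

Dividing by p also keeps the transformation increasing in y for negative powers, so the ordering of the data is preserved.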

Animals Example

[Figure: "Powers Transformations of Brain and Body Weight". A 6 x 6 grid of scatter plots with panels titled brain^p ~ body^q, where the rows run over p = 2, 1, 0.5, 0, -0.5, -1 and the columns over q = -1, -0.5, 0, 0.5, 1, 2. Axes: brain (vertical), body (horizontal).]
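A grid like the one in the figure can be approximated in base graphics. This is a rough sketch, not the HH ladder function: it assumes the MASS package for the Animals data, uses the Box-Cox form for each power (the helper bc is hypothetical), and drops the axes to save space.

```r
library(MASS)   # provides the Animals data frame

# Box-Cox power: log for p = 0, scaled power otherwise.
bc <- function(y, p) if (p == 0) log(y) else (y^p - 1) / p

powers <- c(-1, -0.5, 0, 0.5, 1, 2)
# q (body power) varies fastest, so each row of panels shares one brain power p.
grid <- expand.grid(q = powers, p = rev(powers))

op <- par(mfrow = c(6, 6), mar = c(1, 1, 2, 1))   # small margins for 36 panels
for (i in seq_len(nrow(grid))) {
  p <- grid$p[i]
  q <- grid$q[i]
  plot(bc(Animals$body, q), bc(Animals$brain, p),
       axes = FALSE, xlab = "", ylab = "",
       main = sprintf("brain^%g ~ body^%g", p, q), cex.main = 0.7)
  box()
}
par(op)   # restore the previous layout and margins
```

Scanning the grid for the panel that looks most like a straight-line scatter is the visual equivalent of choosing a pair of powers.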

Scatterplot Matrices (SPLOM)

- plot(usair) or pairs(usair)
- pairs(usair, panel=panel.smooth): add a smoother to each plot
- pairs(SO2 ~ ., panel=panel.smooth, data=usair): use a model formula
- pairs(SO2 ~ temp + mfgfirms + wind + raindays, panel=panel.smooth, data=usair)

Hartigan's original version of a scatterplot matrix had histograms on the diagonal. We first need to define a function panel.hist for the diagonal panels. See help(pairs).

Defining a function

    panel.hist = function(x, col="grey", ...) {
      usr <- par("usr")
      on.exit(par(usr = usr))          # restore the panel's coordinates on exit
      par(usr = c(usr[1:2], 0, 1.5))   # keep the x range; rescale y to 0..1.5
      h <- hist(x, plot = FALSE)
      breaks <- h$breaks
      nb <- length(breaks)
      y <- h$counts
      y <- y/max(y)                    # scale bar heights so the tallest is 1
      rect(breaks[-nb], 0, breaks[-1], y, col=col, ...)
    }

SPLOM with histogram

    > pairs(SO2 ~ temp + mfgfirms + popn + wind + precip + raindays,
            data=usair, panel=panel.smooth, diag.panel=panel.hist)
    > pairs(log(SO2) ~ log(temp) + log(mfgfirms) + log(popn) + log(wind) +
            log(precip) + log(raindays),
            data=usair, panel=panel.smooth, diag.panel=panel.hist)
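A self-contained sketch of the same calls. The usair data is not part of base R, so this assumes the built-in mtcars data as a stand-in, with mpg in the role of the response. The panel.hist definition follows the one on the slides, with the coordinate restore written as par(usr = usr).

```r
# Diagonal panel: a histogram rescaled to fit inside the panel.
panel.hist <- function(x, col = "grey", ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))              # put the panel coordinates back on exit
  par(usr = c(usr[1:2], 0, 1.5))       # same x range, y from 0 to 1.5
  h <- hist(x, plot = FALSE)
  breaks <- h$breaks
  nb <- length(breaks)
  y <- h$counts / max(h$counts)        # tallest bar has height 1
  rect(breaks[-nb], 0, breaks[-1], y, col = col, ...)
}

# Stand-in data (assumption): built-in `mtcars` instead of `usair`.
pairs(mpg ~ disp + hp + wt, data = mtcars,
      panel = panel.smooth, diag.panel = panel.hist)

# The same matrix after log-transforming every variable.
pairs(log(mpg) ~ log(disp) + log(hp) + log(wt), data = mtcars,
      panel = panel.smooth, diag.panel = panel.hist)
```

In a pairs formula the response is treated as just another variable, so it gets its own row and column in the matrix.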

Scatterplot Matrix

[Figure: scatterplot matrix of SO2, temp, mfgfirms, popn, wind, precip, raindays.]

Scatterplot Matrix with transformation

[Figure: scatterplot matrix with panels labeled log(SO2), log(temp), log(mfgfirms), log(popn), wind, log(precip), log(raindays).]

Ladder of Powers

    ladder(SO2 ~ temp, data=usair)
    ladder(SO2 ~ mfgfirms, data=usair)
    ladder(SO2 ~ popn, data=usair)
    ladder(SO2 ~ wind, data=usair)
    ladder(SO2 ~ precip, data=usair)
    ladder(SO2 ~ raindays, data=usair)

- Is there one transformation for SO2 that "linearizes" relationships with the other variables?
- The choice of transformation is subjective.

Exploratory Data Analysis September 6, 2005

Exploratory Data Analysis September 6, 2005 Exploratory Data Analysis September 6, 2005 Exploratory Data Analysis p. 1/16 Somethings to Look for with EDA skewness in distributions non-constant variability nonlinearity need for transformations outliers

More information

Exploratory Data Analysis - Part 2 September 8, 2005

Exploratory Data Analysis - Part 2 September 8, 2005 Exploratory Data Analysis - Part 2 September 8, 2005 Exploratory Data Analysis - Part 2 p. 1/20 Trellis Plots Trellis plots (S-Plus) and Lattice plots in R also create layouts for multiple plots. A trellis

More information

Exploratory Data Analysis September 3, 2008

Exploratory Data Analysis September 3, 2008 Exploratory Data Analysis September 3, 2008 Exploratory Data Analysis p.1/16 Installing Packages from CRAN R GUI (MAC/WINDOWS): Use the Package Menu (make sure that you select install dependencies ) Command-line

More information

Advanced Statistics 1. Lab 11 - Charts for three or more variables. Systems modelling and data analysis 2016/2017

Advanced Statistics 1. Lab 11 - Charts for three or more variables. Systems modelling and data analysis 2016/2017 Advanced Statistics 1 Lab 11 - Charts for three or more variables 1 Preparing the data 1. Run RStudio Systems modelling and data analysis 2016/2017 2. Set your Working Directory using the setwd() command.

More information

Table of Contents. Preface... ix

Table of Contents. Preface... ix See also the online version (not complete) http://www.cookbook-r.com/graphs/ UCSD download from http:// proquest.safaribooksonline.com/9781449363086 Table of Contents Preface.......................................................................

More information

SECTION 1-B. 1.2 Data Visualization with R Reading Data. Flea Beetles. Data Mining 2018

SECTION 1-B. 1.2 Data Visualization with R Reading Data. Flea Beetles. Data Mining 2018 Data Mining 18 SECTION 1-B 1.2 Data Visualization with R It is very important to gain a feel for the data that we are investigating. One way to do this is by visualization. We will do this by starting

More information

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures Part I, Chapters 4 & 5 Data Tables and Data Analysis Statistics and Figures Descriptive Statistics 1 Are data points clumped? (order variable / exp. variable) Concentrated around one value? Concentrated

More information

22s:152 Applied Linear Regression. Chapter 18: Nonparametric Regression (Lowess smoothing)

22s:152 Applied Linear Regression. Chapter 18: Nonparametric Regression (Lowess smoothing) 22s:152 Applied Linear Regression Chapter 18: Nonparametric Regression (Lowess smoothing) When the nature of the response function (or mean structure) is unknown and one wishes to investigate it, a nonparametric

More information

Exploratory Data Analysis EDA

Exploratory Data Analysis EDA Exploratory Data Analysis EDA Luc Anselin http://spatial.uchicago.edu 1 from EDA to ESDA dynamic graphics primer on multivariate EDA interpretation and limitations 2 From EDA to ESDA 3 Exploratory Data

More information

Introduction to R. Hao Helen Zhang. Fall Department of Mathematics University of Arizona

Introduction to R. Hao Helen Zhang. Fall Department of Mathematics University of Arizona Department of Mathematics University of Arizona hzhang@math.aricona.edu Fall 2019 What is R R is the most powerful and most widely used statistical software Video: A language and environment for statistical

More information

Types of Plotting Functions. Managing graphics devices. Further High-level Plotting Functions. The plot() Function

Types of Plotting Functions. Managing graphics devices. Further High-level Plotting Functions. The plot() Function 3 / 23 5 / 23 Outline The R Statistical Environment R Graphics Peter Dalgaard Department of Biostatistics University of Copenhagen January 16, 29 1 / 23 2 / 23 Overview Standard R Graphics The standard

More information

Minitab 17 commands Prepared by Jeffrey S. Simonoff

Minitab 17 commands Prepared by Jeffrey S. Simonoff Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save

More information

Matrix Plot. Sample StatFolio: matrixplot.sgp

Matrix Plot. Sample StatFolio: matrixplot.sgp Matrix Plot Summary The Matrix Plot procedure creates a matrix of plots for 3 or more numeric variables. The diagonal of the matrix contains box-and-whisker plots for each variable. The off-diagonal positions

More information

Install RStudio from - use the standard installation.

Install RStudio from   - use the standard installation. Session 1: Reading in Data Before you begin: Install RStudio from http://www.rstudio.com/ide/download/ - use the standard installation. Go to the course website; http://faculty.washington.edu/kenrice/rintro/

More information

R Programming: Worksheet 6

R Programming: Worksheet 6 R Programming: Worksheet 6 Today we ll study a few useful functions we haven t come across yet: all(), any(), `%in%`, match(), pmax(), pmin(), unique() We ll also apply our knowledge to the bootstrap.

More information

The Basics of Plotting in R

The Basics of Plotting in R The Basics of Plotting in R R has a built-in Datasets Package: iris mtcars precip faithful state.x77 USArrests presidents ToothGrowth USJudgeRatings You can call built-in functions like hist() or plot()

More information

Roger D. Peng, Associate Professor of Biostatistics Johns Hopkins Bloomberg School of Public Health

Roger D. Peng, Associate Professor of Biostatistics Johns Hopkins Bloomberg School of Public Health The Lattice Plotting System in R Roger D. Peng, Associate Professor of Biostatistics Johns Hopkins Bloomberg School of Public Health The Lattice Plotting System The lattice plotting system is implemented

More information

Statistics 251: Statistical Methods

Statistics 251: Statistical Methods Statistics 251: Statistical Methods Summaries and Graphs in R Module R1 2018 file:///u:/documents/classes/lectures/251301/renae/markdown/master%20versions/summary_graphs.html#1 1/14 Summary Statistics

More information

AA BB CC DD EE. Introduction to Graphics in R

AA BB CC DD EE. Introduction to Graphics in R Introduction to Graphics in R Cori Mar 7/10/18 ### Reading in the data dat

More information

BASIC LOESS, PBSPLINE & SPLINE

BASIC LOESS, PBSPLINE & SPLINE CURVES AND SPLINES DATA INTERPOLATION SGPLOT provides various methods for fitting smooth trends to scatterplot data LOESS An extension of LOWESS (Locally Weighted Scatterplot Smoothing), uses locally weighted

More information

MATH11400 Statistics Homepage

MATH11400 Statistics Homepage MATH11400 Statistics 1 2010 11 Homepage http://www.stats.bris.ac.uk/%7emapjg/teach/stats1/ 1.1 A Framework for Statistical Problems Many statistical problems can be described by a simple framework in which

More information

Your Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression

Your Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression Your Name: Section: 36-201 INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression Objectives: 1. To learn how to interpret scatterplots. Specifically you will investigate, using

More information

Stat 849: Plotting responses and covariates

Stat 849: Plotting responses and covariates Stat 849: Plotting responses and covariates Douglas Bates Department of Statistics University of Wisconsin, Madison 2010-09-03 Outline R Graphics Systems Brain weight Cathedrals Longshoots Domedata Summary

More information

Stat 849: Plotting responses and covariates

Stat 849: Plotting responses and covariates Stat 849: Plotting responses and covariates Douglas Bates 10-09-03 Outline Contents 1 R Graphics Systems Graphics systems in R ˆ R provides three dierent high-level graphics systems base graphics The system

More information

Multistat2 1

Multistat2 1 Multistat2 1 2 Multistat2 3 Multistat2 4 Multistat2 5 Multistat2 6 This set of data includes technologically relevant properties for lactic acid bacteria isolated from Pasta Filata cheeses 7 8 A simple

More information

DSCI 325: Handout 18 Introduction to Graphics in R

DSCI 325: Handout 18 Introduction to Graphics in R DSCI 325: Handout 18 Introduction to Graphics in R Spring 2016 This handout will provide an introduction to creating graphics in R. One big advantage that R has over SAS (and over several other statistical

More information

A guide to use the flowpeaks: a flow cytometry data clustering algorithm via K-means and density peak finding

A guide to use the flowpeaks: a flow cytometry data clustering algorithm via K-means and density peak finding A guide to use the flowpeaks: a flow cytometry data clustering algorithm via K-means and density peak finding Yongchao Ge October 30, 2018 yongchao.ge@gmail.com 1 Licensing Under the Artistic License,

More information

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures

More information

Visualizing Multivariate Data

Visualizing Multivariate Data Visualizing Multivariate Data We ve spent a lot of time so far looking at analysis of the relationship of two variables. When we compared groups, we had 1 continuous variable and 1 categorical variable.

More information

In Minitab interface has two windows named Session window and Worksheet window.

In Minitab interface has two windows named Session window and Worksheet window. Minitab Minitab is a statistics package. It was developed at the Pennsylvania State University by researchers Barbara F. Ryan, Thomas A. Ryan, Jr., and Brian L. Joiner in 1972. Minitab began as a light

More information

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables

More information

Making Science Graphs and Interpreting Data

Making Science Graphs and Interpreting Data Making Science Graphs and Interpreting Data Eye Opener: 5 mins What do you see? What do you think? Look up terms you don t know What do Graphs Tell You? A graph is a way of expressing a relationship between

More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /6/ /13

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /6/ /13 BIO5312 Biostatistics R Session 02: Graph Plots in R Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 9/6/2016 1 /13 Graphic Methods Graphic methods of displaying data give a quick

More information

Model Selection and Inference

Model Selection and Inference Model Selection and Inference Merlise Clyde January 29, 2017 Last Class Model for brain weight as a function of body weight In the model with both response and predictor log transformed, are dinosaurs

More information

An Introductory Guide to R

An Introductory Guide to R An Introductory Guide to R By Claudia Mahler 1 Contents Installing and Operating R 2 Basics 4 Importing Data 5 Types of Data 6 Basic Operations 8 Selecting and Specifying Data 9 Matrices 11 Simple Statistics

More information

Package OLScurve. August 29, 2016

Package OLScurve. August 29, 2016 Type Package Title OLS growth curve trajectories Version 0.2.0 Date 2014-02-20 Package OLScurve August 29, 2016 Maintainer Provides tools for more easily organizing and plotting individual ordinary least

More information

LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave.

LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave. LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave. http://en.wikipedia.org/wiki/local_regression Local regression

More information

A (very) brief introduction to R

A (very) brief introduction to R A (very) brief introduction to R You typically start R at the command line prompt in a command line interface (CLI) mode. It is not a graphical user interface (GUI) although there are some efforts to produce

More information

Lesson 18-1 Lesson Lesson 18-1 Lesson Lesson 18-2 Lesson 18-2

Lesson 18-1 Lesson Lesson 18-1 Lesson Lesson 18-2 Lesson 18-2 Topic 18 Set A Words survey data Topic 18 Set A Words Lesson 18-1 Lesson 18-1 sample line plot Lesson 18-1 Lesson 18-1 frequency table bar graph Lesson 18-2 Lesson 18-2 Instead of making 2-sided copies

More information

Box-Cox Transformation for Simple Linear Regression

Box-Cox Transformation for Simple Linear Regression Chapter 192 Box-Cox Transformation for Simple Linear Regression Introduction This procedure finds the appropriate Box-Cox power transformation (1964) for a dataset containing a pair of variables that are

More information

Statistics. Lowess. Now, compute the lowess model,, and plot. Finally, plot the data points by themselves, and display the two plots together.

Statistics. Lowess. Now, compute the lowess model,, and plot. Finally, plot the data points by themselves, and display the two plots together. Statistics The updates for Statistics in Maple 2015 include several new commands, as well as added support in the context menu for matrix data sets, and new and improved visualizations. Lowess Lowess (locally

More information

Course on Microarray Gene Expression Analysis

Course on Microarray Gene Expression Analysis Course on Microarray Gene Expression Analysis ::: Normalization methods and data preprocessing Madrid, April 27th, 2011. Gonzalo Gómez ggomez@cnio.es Bioinformatics Unit CNIO ::: Introduction. The probe-level

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

CIND123 Module 6.2 Screen Capture

CIND123 Module 6.2 Screen Capture CIND123 Module 6.2 Screen Capture Hello, everyone. In this segment, we will discuss the basic plottings in R. Mainly; we will see line charts, bar charts, histograms, pie charts, and dot charts. Here is

More information

Tips and Guidance for Analyzing Data. Executive Summary

Tips and Guidance for Analyzing Data. Executive Summary Tips and Guidance for Analyzing Data Executive Summary This document has information and suggestions about three things: 1) how to quickly do a preliminary analysis of time-series data; 2) key things to

More information

A Comparative Study of LOWESS and RBF Approximations for Visualization

A Comparative Study of LOWESS and RBF Approximations for Visualization A Comparative Study of LOWESS and RBF Approximations for Visualization Michal Smolik, Vaclav Skala and Ondrej Nedved Faculty of Applied Sciences, University of West Bohemia, Univerzitni 8, CZ 364 Plzen,

More information

Bar Charts and Frequency Distributions

Bar Charts and Frequency Distributions Bar Charts and Frequency Distributions Use to display the distribution of categorical (nominal or ordinal) variables. For the continuous (numeric) variables, see the page Histograms, Descriptive Stats

More information

Elementary Statistics. Chapter 2 Review: Summarizing & Graphing Data

Elementary Statistics. Chapter 2 Review: Summarizing & Graphing Data Name Elementary Statistics Date Period Chapter 2 Review: Summarizing & Graphing Data Quick Quiz p.74 #1-10 Use the following information to answer questions 1-3: When one is constructing a table representing

More information

Using Excel for Graphical Analysis of Data

Using Excel for Graphical Analysis of Data Using Excel for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters. Graphs are

More information

Box-Cox Transformation

Box-Cox Transformation Chapter 190 Box-Cox Transformation Introduction This procedure finds the appropriate Box-Cox power transformation (1964) for a single batch of data. It is used to modify the distributional shape of a set

More information

Az R adatelemzési nyelv

Az R adatelemzési nyelv Az R adatelemzési nyelv alapjai II. Egészségügyi informatika és biostatisztika Gézsi András gezsi@mit.bme.hu Functions Functions Functions do things with data Input : function arguments (0,1,2, ) Output

More information

NEURAL NETWORKS. Cement. Blast Furnace Slag. Fly Ash. Water. Superplasticizer. Coarse Aggregate. Fine Aggregate. Age

NEURAL NETWORKS. Cement. Blast Furnace Slag. Fly Ash. Water. Superplasticizer. Coarse Aggregate. Fine Aggregate. Age NEURAL NETWORKS As an introduction, we ll tackle a prediction task with a continuous variable. We ll reproduce research from the field of cement and concrete manufacturing that seeks to model the compressive

More information

Logical operators: R provides an extensive list of logical operators. These include

Logical operators: R provides an extensive list of logical operators. These include meat.r: Explanation of code Goals of code: Analyzing a subset of data Creating data frames with specified X values Calculating confidence and prediction intervals Lists and matrices Only printing a few

More information

Nonparametric Approaches to Regression

Nonparametric Approaches to Regression Nonparametric Approaches to Regression In traditional nonparametric regression, we assume very little about the functional form of the mean response function. In particular, we assume the model where m(xi)

More information

6. More Loops, Control Structures, and Bootstrapping

6. More Loops, Control Structures, and Bootstrapping 6. More Loops, Control Structures, and Bootstrapping Ken Rice Tim Thornton University of Washington Seattle, July 2018 In this session We will introduce additional looping procedures as well as control

More information

Data Management Project Using Software to Carry Out Data Analysis Tasks

Data Management Project Using Software to Carry Out Data Analysis Tasks Data Management Project Using Software to Carry Out Data Analysis Tasks This activity involves two parts: Part A deals with finding values for: Mean, Median, Mode, Range, Standard Deviation, Max and Min

More information

Data Visualization SURFACE WATER MODELING SYSTEM. 1 Introduction. 2 Data sets. 3 Open the Geometry and Solution Files

Data Visualization SURFACE WATER MODELING SYSTEM. 1 Introduction. 2 Data sets. 3 Open the Geometry and Solution Files SURFACE WATER MODELING SYSTEM Data Visualization 1 Introduction It is useful to view the geospatial data utilized as input and generated as solutions in the process of numerical analysis. It is also helpful

More information

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked Plotting Menu: QCExpert Plotting Module graphs offers various tools for visualization of uni- and multivariate data. Settings and options in different types of graphs allow for modifications and customizations

More information

R Workshop Module 3: Plotting Data Katherine Thompson Department of Statistics, University of Kentucky

R Workshop Module 3: Plotting Data Katherine Thompson Department of Statistics, University of Kentucky R Workshop Module 3: Plotting Data Katherine Thompson (katherine.thompson@uky.edu Department of Statistics, University of Kentucky October 15, 2013 Reading in Data Start by reading the dataset practicedata.txt

More information

Mathematics 9 Exploration Lab Scatter Plots and Lines of Best Fit. a line used to fit into data in order to make a prediction about the data.

Mathematics 9 Exploration Lab Scatter Plots and Lines of Best Fit. a line used to fit into data in order to make a prediction about the data. Mathematics 9 Exploration Lab Scatter Plots and Lines of Best Fit A. Definitions Line of Best Fit: a line used to fit into data in order to make a prediction about the data. Scatter Plot: a graph of unconnected

More information

After opening Stata for the first time: set scheme s1mono, permanently

After opening Stata for the first time: set scheme s1mono, permanently Stata 13 HELP Getting help Type help command (e.g., help regress). If you don't know the command name, type lookup topic (e.g., lookup regression). Email: tech-support@stata.com. Put your Stata serial

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SPSS SPSS (originally Statistical Package for the Social Sciences ) is a commercial statistical software package with an easy-to-use

More information

Advanced Econometric Methods EMET3011/8014

Advanced Econometric Methods EMET3011/8014 Advanced Econometric Methods EMET3011/8014 Lecture 2 John Stachurski Semester 1, 2011 Announcements Missed first lecture? See www.johnstachurski.net/emet Weekly download of course notes First computer

More information

Mixed models in R using the lme4 package Part 2: Lattice graphics

Mixed models in R using the lme4 package Part 2: Lattice graphics Mixed models in R using the lme4 package Part 2: Lattice graphics Douglas Bates University of Wisconsin - Madison and R Development Core Team University of Lausanne July 1,

More information

Hot springs that erupt intermittently in a column

Hot springs that erupt intermittently in a column L A B 1 MODELING OLD FAITHFUL S ERUPTIONS Modeling Data Hot springs that erupt intermittently in a column of steam and hot water are called geysers. Geysers may erupt in regular or irregular intervals

More information

testo Comfort Software Professional 4 Instruction manual

testo Comfort Software Professional 4 Instruction manual testo Comfort Software Professional 4 Instruction manual 2 1 Contents 1 Contents 1 Contents...3 2 About this document...5 3 Specifications...6 3.1. Use...6 3.2. System requirements...6 4 First steps...7

More information
