Chuck Cartledge, PhD. 23 September 2017

Similar documents
Chuck Cartledge, PhD. 13 October 2018

Introduction to Lattice Graphics. Richard Pugh 4 th December 2012

CREATING POWERFUL AND EFFECTIVE GRAPHICAL DISPLAYS: AN INTRODUCTION TO LATTICE GRAPHICS IN R

Chuck Cartledge, PhD. 23 September 2017

Roger D. Peng, Associate Professor of Biostatistics Johns Hopkins Bloomberg School of Public Health

Stata tip 81: A table of graphs

Chuck Cartledge, PhD. 20 January 2018

Chuck Cartledge, PhD. 12 October 2018

An introduction to ggplot: An implementation of the grammar of graphics in R

Chuck Cartledge, PhD. 23 September 2017

Introduction. Product List. Design and Functionality 1/10/2013. GIS Seminar Series 2012 Division of Spatial Information Science

Chuck Cartledge, PhD. 19 January 2018

Trellis Displays. Definition. Example. Trellising: Which plot is best? Historical Development. Technical Definition

Mixed models in R using the lme4 package Part 2: Lattice graphics

Introduction to Minitab 1

Stat 849: Plotting responses and covariates

Stat 849: Plotting responses and covariates

WOMEN'S INTERAGENCY HIV STUDY SECTION 40: GEOCODING PROTOCOL

AN INTRODUCTION TO LATTICE GRAPHICS IN R

Chuck Cartledge, PhD. 24 September 2017

Multinomial Scan Statistic for Identifying Unusual Population Age Structures

Quick introduction to descriptive statistics and graphs in. R Commander. Written by: Robin Beaumont

Section 2-2 Frequency Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc

SES 123 Global and Regional Energy Lab Worksheet

Introduction to R for Epidemiologists

Outline. Part 2: Lattice graphics. The formula/data method of specifying graphics. Exploring and presenting data. Presenting data.

4 Displaying Multiway Tables

THE DOT PLOT: A GRAPHICAL DISPLAY FOR LABELED QUANTITATIVE VALUES

8: Statistics. Populations and Samples. Histograms and Frequency Polygons. Page 1 of 10

Data Mining: Exploring Data. Lecture Notes for Chapter 3

The Stata Journal. Editor Nicholas J. Cox Department of Geography Durham University South Road Durham City DH1 3LE UK

R Commander Tutorial

Introduction to R and Statistical Data Analysis

STATISTICAL LABORATORY, April 30th, 2010 BIVARIATE PROBABILITY DISTRIBUTIONS

Advanced Statistics 1. Lab 11 - Charts for three or more variables. Systems modelling and data analysis 2016/2017

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Learning Objectives for Data Concept and Visualization

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining

R Graphics. SCS Short Course March 14, 2008

You submitted this quiz on Sat 17 May :19 AM CEST. You got a score of out of

Using EmisView to Quality Assure and Visualize Emission Modeling Data

Package qicharts. October 7, 2014

22s:152 Applied Linear Regression DeCook Fall 2011 Lab 3 Monday October 3

Data Mining: Exploring Data

wireframe: perspective plot of a surface evaluated on a regular grid cloud: perspective plot of a cloud of points (3D scatterplot)

A straight line is the graph of a linear equation. These equations come in several forms, for example: change in x = y 1 y 0

Salary 9 mo : 9 month salary for faculty member for 2004

Combo Charts. Chapter 145. Introduction. Data Structure. Procedure Options

IQR = number. summary: largest. = 2. Upper half: Q3 =

SES 123 Global and Regional Energy Lab Procedures

How to Make Graphs in EXCEL

Excel 2010 with XLSTAT

Overview and Practical Application of Machine Learning in Pricing

Author: Leonore Findsen, Qi Wang, Sarah H. Sellke, Jeremy Troisi

Chuck Cartledge, PhD. 24 February 2018

Visualizing Data: Customization with ggplot2

Penalized regression Statistical Learning, 2011

Introduction to Nesstar

Exploratory Data Analysis EDA

Chapter 5: The beast of bias

Practice for Learning R and Learning Latex

AN OVERVIEW AND EXPLORATION OF JMP A DATA DISCOVERY SYSTEM IN DAIRY SCIENCE

Tutorial for the R Statistical Package

CHAPTER 6. The Normal Probability Distribution

PASS Sample Size Software

Chapter 8 An Introduction to the Lattice Package

MicroStrategy Academic Program

BIOL 458 BIOMETRY Lab 10 - Multiple Regression

8. MINITAB COMMANDS WEEK-BY-WEEK

Intermediate SAS: Statistics

The linear mixed model: modeling hierarchical and longitudinal data

Package oaxaca. January 3, Index 11

Facet: Multiple View Methods

Package r2d2. February 20, 2015

Advanced Econometric Methods EMET3011/8014

Algebra 1 Interactive Chalkboard Copyright by The McGraw-Hill Companies, Inc. Send all inquiries to:

Getting Started with JMP at ISU

An Introduction to R 2.2 Statistical graphics

VIDEO SCREEN EXPLANATION

Organizing data in R. Fitting Mixed-Effects Models Using the lme4 Package in R. R packages. Accessing documentation. The Dyestuff data set

7/18/16. Review. Review of Homework. Lecture 3: Programming Statistics in R. Questions from last lecture? Problems with Stata? Problems with Excel?

ITSMR RESEARCH NOTE EFFECTS OF CELL PHONE USE AND OTHER DRIVER DISTRACTIONS ON HIGHWAY SAFETY: 2006 UPDATE. Introduction SUMMARY

2.1 Transforming Linear Functions

1. Setup Everyone: Mount the /geobase/geo5215 drive and add a new Lab4 folder in you Labs directory.

Select Cases. Select Cases GRAPHS. The Select Cases command excludes from further. selection criteria. Select Use filter variables

IBMSPSSSTATL1P: IBM SPSS Statistics Level 1

Chapter 7 Assignment due Wednesday, May 24

Overview: GeoViz Toolkit Interface:

Compensatory and Noncompensatory Multidimensional Randomized Item Response Analysis of the CAPS and EAQ Data

Package lattice. March 25, 2017

ITACS : Interactive Tool for Analysis of the Climate System

4. Descriptive Statistics: Measures of Variability and Central Tendency

An Introduction to Minitab Statistics 529

Exploratory Data Analysis September 3, 2008

Hypothesis Test Exercises from Class, Oct. 12, 2018

What's New in MATLAB for Engineering Data Analytics?

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Visualisation of Categorical Data

Data Science and Machine Learning Essentials

23.2 Normal Distributions

Transcription:

Introduction Lattice Cancer case study Hands-on Q&A Conclusion References Files Big Data: Data Analysis Boot Camp Visualizing with the Lattice Package Chuck Cartledge, PhD 23 September 2017 1/38

Table of contents (1 of 1) 1 Introduction 2 Lattice 3 Cancer case study 4 Hands-on 6 Conclusion 7 References 8 Files 5 Q & A 2/38

What are we going to cover? We re going to talk about: The lattice package functions and capabilities. Choroplethrmaps (what they are and how to construct them.) 3/38

Background A description A powerful and elegant high-level data visualization system inspired by Trellis graphics, with an emphasis on multivariate data. Lattice is sufficient for typical graphics needs, and is also flexible enough to handle most nonstandard requirements. D. Sarkar [2] 4/38

Background An explanation Trellis displays are plots which contain one or more panels, arranged in a regular grid-like structure (a trellis). Each panel graphs a subset of the data. All panels in a Trellis display contain the same type of graph but these graphs are general enough to encompass a wide variety of 2-D and 3-D displays: histogram, scatter plot, dot plot, contour plot, wireframe, 3-D point cloud and more. Becker, et al. [1] 5/38

Examples A lattice iris example library(lattice) xyplot(sepal.length ~Sepal.Width Species, data = iris) 6/38

Examples Same image. 7/38

Examples Traditional pairs plot of air quality Air quality data is part of the datasets package. pairs(airquality, panel = panel.smooth, main = airquality data ) 8/38

Examples Same image. 9/38

Examples A lattice histogram of air quality histogram(~temp factor(month), data = airquality, xlab = "Temperature", ylab = "Percent of total") 10/38

Examples Same image. 11/38

Examples A few words about R s formula Symbol Example Meaning + +X include this variable - -X delete this variable : X:Z include the interaction between these variables * X*Y include these variables and the interactions between them X Z conditioning: include x given z ˆ (X + Z + W )ˆ3 include these variables and all interactions up to three way I I(X*Z) as is: include a new variable consisting of these variables multiplied 1 X - 1 intercept: delete the intercept (regress through the origin) 12/38

Examples Stacked bar plots from text (Load attached file latticestackedbars.r into editor) (Attached file.) 13/38

Examples Same image. (Attached file.) 14/38

Examples Functions in package lattice 1 To list the functions in any package: 1 l s ( package : packagename ) 2 To see which packages are loaded: 1 s e a r c h ( ) 3 To list the functions in lattice 1 l s ( package : l a t t i c e ) There are 149 functions in the lattice package. 15/38

The latticeextra package Looking at the data The text devotes a fair number of pages to analyzing and plotting cancer related data from the latticeextra package. The purpose is to demonstrate plotting and analytical techniques. There is a problem with the data. 16/38

The latticeextra package Not all US states and territories are represented. Data is in the data frame: USCancerRates USCancerRates column names are: rate.male, LCL95.male, UCL95.male, rate.female, LCL95.female, UCL95.female, state, county Use unique(sort(unlist(as.character(uscancerrates$state)))) to find the unique state names Hawaii and the District of Columbia are not in the data frame. We ll explore cancer rates with better data. 17/38

The latticeextra package Which data and where to get it? Various sources of data: 1 Cancer rates for men and women (use 65 and under): https://statecancerprofiles.cancer.gov/ 2 Insurance rates: https://www.census.gov/did/www/sahie/data/ 20082013/index.html 3 Connect state and county names to Federal Information Processing Standards (FIPS) https://www2.census.gov/geo/docs/reference/codes/ files/national_county.txt 4 Get state and county shapefiles based FIPS http://www2.census.gov/geo/docs/maps-data/data/ gazetteer/gaz_counties_national.zip 18/38

The latticeextra package What do we do after we know where the data is? 1 Download the various data files (some will require bringing up a browser, and selecting a data file manually). 2 Build a data frame based on FIPS ID 1 Combine cancer rates 2 Expand with insurance rates 3 Expand with positional (latitude and longitude) data 3 Cleanse the data (numerical data wrangling) 4 Analyze the data All of these steps are in the attached file: cancerdata.r 19/38

The latticeextra package Parallel plot of selected counties and gender based cancer rates. set.seed(987) subsample <- data[sample(1:nrow(data), size=15), c( StateFIP, rate.male, rate.female )] parallelplot(subsample, horizontal.axis=false, groups=data$statefip) Important thing is that cancer rates differ based on gender. 20/38

The latticeextra package Same image. Important thing is that cancer rates differ based on gender. 21/38

The latticeextra package Male and female rates across all states and the DC. parallelplot(~data[,c( rate.female, rate.male )] data$state, horizontal.axis=false, data=data, varnames=c( f, m )) In general, female rate is higher than male. 22/38

The latticeextra package Same image. In general, female rate is higher than male. 23/38

The latticeextra package Fitted male and female rates across all states and the DC. xyplot(rate.male ~rate.female State, data=data, main = Death due to cancer, xlab= Rates in males, ylab= Rates in females, panel=function(x,y,...){ panel.xyplot(x,y) panel.abline(lm(y~x)) } ) Limited data points for DC. 24/38

The latticeextra package Same image. Limited data points for DC. 25/38

The latticeextra package Insurance based on latitude (north or south) xyplot(data$noinsur ~data$centlat, main = State latitude and health insurance coverage, xlab = Latitude, ylab = Percent of noninsured individuals ) 26/38

The latticeextra package Same image. 27/38

The latticeextra package Cancer rate by gender and northern or southern xyplot((rate.female + rate.male)/2 ~NoInsur ifelse(data$centlat>median(data$centlat), North, South ), data = data, index.cond=list(c(2,1)), panel = function(x, y,...) { panel.xyplot(x,y) panel.abline(lm(y~x)) panel.loess(x,y, col = Red ) }, xlab = Percent of noninsured individuals, ylab = Mortality due to cancer, main = Health insurance coverage and Mortality due to cancer by latitude ) 28/38

The latticeextra package Same image. 29/38

The latticeextra package Map of county level cancer rate for males. plotcolumn(data, rate.male ) (See attached file.) 30/38

The latticeextra package Same image. (See attached file.) 31/38

The latticeextra package Map of county level insurance rates. plotcolumn(data, NoInsur ) (See attached file.) 32/38

The latticeextra package Same image. (See attached file.) 33/38

Some simple exercises to get familiar with data visualization 1 Create a choroplethrmap showing the cancer rate for women 2 Create a plot of insurance rate based on longitude 3 Most states have a positive Death due to cancer of women to men. Ideas why? 34/38

Q & A time. Q: How does a hacker fix a function which doesn t work for all of the elements in its domain? A: He changes the domain. 35/38

What have we covered? Reviewed some of the functions in the lattice package Reviewed some of the data in the latticeextra package Created our own cancer rate data frame Analyzed the data using tools from the lattice package Analyzed the data using tools from the choroplethr and choroplethrmaps packages Next: LPAR Chapter 4, cluster analysis 36/38

References (1 of 1) [1] Richard A Becker, William S Cleveland, Ming-Jen Shyu, Stephen P Kaluzny, et al., A Tour of Trellis Graphics, Murray Hill, NJ: AT & T Bell Laboratories 44 (1996). [2] Deepayan Sarkar, Lattice: Multivariate Data Visualization with R, Springer, New York, 2008, ISBN 978-0-387-75968-5. 37/38

Files of interest 1 Updated cancer related 2 Stacked bar plots data 3 R library script file 38/38