STA490HY1Y. Initial Examination of Data
1 STA490Y1Y Initial Examination of Data Alison L. Department of Statistical Sciences University of Toronto
2 Course mantra: It's OK not to know. Expressing ignorance is encouraged. It's not OK to not have a willingness to learn.
3 Cleveland's Visualization of the Barley Data [Figure: barley yield (bushels/acre) for ten varieties (Trebi, Wisconsin No. 38, No. 457, Glabron, Peatland, Velvet, No. 475, Manchuria, No. 462, Svansota) at six sites (Crookston, Waseca, University Farm, Morris, Grand Rapids, Duluth).]
9 Initial Examination of Data
Purpose:
- Understand the structure of the data. Types of variables:
  - Quantitative: continuous or discrete
  - Categorical: nominal, ordinal (e.g., Likert items or binned quantitative), binary
- Check the quality of the data.
  - Find errors (data cleaning).
  - Check for credibility, consistency, completeness.
  - Identify potential outliers.
  - Are there missing observations?
- Clear up any problems.
- Get ideas for more sophisticated analyses.
- Check on whether or not assumptions of more sophisticated analyses seem reasonable.
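The data-quality checks listed above (missing observations, potential outliers) can be sketched in a few lines. This is only an illustration in standard-library Python, not the course's own code (the course uses R). The function name `quality_report`, the example scores, the coding of missing values as `None` or NaN, and the 1.5 × IQR fence rule for flagging potential outliers are all assumptions made for this sketch.

```python
import math
import statistics

def quality_report(values):
    """Count missing values and flag potential outliers in one numeric column.

    Assumption: missing observations are coded as None or NaN.
    Real files may use other codes (e.g., -99 or blanks).
    """
    present = [v for v in values if v is not None and not math.isnan(v)]
    n_missing = len(values) - len(present)

    # Quartiles of the non-missing values; quartile conventions differ
    # between packages, so the fences below are only a rough screen.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Values outside the fences are *potential* outliers, to be checked
    # against the source, not deleted automatically.
    outliers = [v for v in present if v < low or v > high]
    return {
        "n": len(values),
        "n_missing": n_missing,
        "fences": (low, high),
        "potential_outliers": outliers,
    }

# Hypothetical exam scores: one missing value, and 640 looks like a
# data-entry error for 64.
scores = [62, 71, 58, 640, None, 69, 75, 66, 80, 73]
report = quality_report(scores)
print(report)
```

A flagged value is a prompt to go back to the source of the data; whether it is an error or a genuine extreme observation requires judgment.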
13 IDA
- Should be motivated by original research questions.
- Avoid data dredging. (Look long enough and you'll find some meaningless pattern.)
- Trivial? Requires judgment and common sense.
- May be all that is needed.
16 Types of Missing Data
1. Missing Completely At Random (MCAR): The probability that a data value is missing does not depend on the missing value, nor on the values of any other variables.
2. Missing At Random (MAR): The probability that a data value is missing, conditional on the values of the other variables for the observation, is not related to the missing value.
3. Informative / Non-ignorable (NMAR, Not Missing At Random): The probability that a data value is missing depends on the missing value itself, even after conditioning on the other variables. Difficult to deal with.
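A small simulation can make the three mechanisms concrete. The sketch below is an illustration only: the variables `x` and `y`, the missingness probabilities, the seed, and the helper names are all invented for this example. It generates a fully observed covariate `x` and a related variable `y`, deletes values of `y` under each mechanism, and compares the mean of the remaining values with the mean of the complete data.

```python
import random
import statistics

random.seed(490)  # arbitrary seed so the sketch is reproducible

n = 50_000
x = [random.gauss(0, 1) for _ in range(n)]    # covariate, always observed
y = [xi + random.gauss(0, 1) for xi in x]     # variable that may go missing

def delete(prob_missing):
    """Copy y, replacing a value by None with probability prob_missing(x_i, y_i)."""
    return [None if random.random() < prob_missing(xi, yi) else yi
            for xi, yi in zip(x, y)]

def mean_observed(ys):
    """Mean of the non-missing values."""
    return statistics.fmean([v for v in ys if v is not None])

def mean_observed_x_positive(ys):
    """Mean of the non-missing y values among observations with x > 0."""
    return statistics.fmean(
        [v for xi, v in zip(x, ys) if xi > 0 and v is not None])

# MCAR: missingness ignores both x and y.
y_mcar = delete(lambda xi, yi: 0.3)
# MAR: missingness depends only on the observed covariate x.
y_mar = delete(lambda xi, yi: 0.8 if xi > 0 else 0.1)
# NMAR: missingness depends on the value of y itself.
y_nmar = delete(lambda xi, yi: 0.8 if yi > 0 else 0.1)

true_mean = statistics.fmean(y)
print("complete data mean:", round(true_mean, 3))
print("MCAR observed mean:", round(mean_observed(y_mcar), 3))
print("MAR  observed mean:", round(mean_observed(y_mar), 3))
print("NMAR observed mean:", round(mean_observed(y_nmar), 3))

# Under MAR, the observed y values are representative *within* levels of x.
print("x > 0, complete data:", round(mean_observed_x_positive(y), 3))
print("x > 0, MAR observed: ", round(mean_observed_x_positive(y_mar), 3))
```

In this setup the MCAR mean should sit close to the complete-data mean, while the MAR and NMAR means are pulled downward. The difference is that under MAR the bias can in principle be corrected using `x`, because within a level of `x` the observed values are representative; under NMAR no observed variable carries that information.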
23 Tools for IDA
- 5 number summary (for all data and for subsets).
- Other summary statistics, e.g., mean and s.d.
- Histograms (kernel density estimates), boxplots, dotplots.
- Frequency tables (1- and 2-way) for categorical variables.
- Scatterplots.
- Correlations.
- What else?
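The numerical tools in the list above can be sketched as follows. This is a standard-library Python illustration rather than the R workflow used in the course. The records are invented, the helper names `five_number_summary` and `pearson` are hypothetical, and the quartiles use the "inclusive" convention of `statistics.quantiles`, which may differ slightly from the quartiles other software reports.

```python
from collections import Counter
import statistics

# Hypothetical records: (time_of_day, hours_studied, score).
# Invented for illustration only.
records = [
    ("morning", 2.0, 61), ("morning", 3.5, 70),
    ("morning", 5.0, 78), ("morning", 1.0, 55),
    ("afternoon", 4.0, 74), ("afternoon", 6.0, 88),
    ("afternoon", 2.5, 66), ("afternoon", 4.5, 79),
]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

hours = [r[1] for r in records]
scores = [r[2] for r in records]

# 5 number summary for all data and for subsets.
summary_all = five_number_summary(scores)
summary_by_group = {
    g: five_number_summary([r[2] for r in records if r[0] == g])
    for g in ("morning", "afternoon")
}

# Other summary statistics.
mean_score = statistics.fmean(scores)
sd_score = statistics.stdev(scores)  # sample s.d. (n - 1 denominator)

# 1-way and 2-way frequency tables for categorical variables.
one_way = Counter(r[0] for r in records)
two_way = Counter((r[0], "70+" if r[2] >= 70 else "<70") for r in records)

# Correlation between two quantitative variables.
r_hours_score = pearson(hours, scores)

print("all scores:", summary_all)
print("by group:", summary_by_group)
print("mean, s.d.:", round(mean_score, 2), round(sd_score, 2))
print("one-way table:", dict(one_way))
print("two-way table:", dict(two_way))
print("correlation(hours, score):", round(r_hours_score, 3))
```

Numerical summaries like these complement the plots in the list; they do not replace them. A correlation, for example, says nothing about whether the relationship in the scatterplot is linear.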
24 More Sophisticated Tools for IDA
- Cluster Analysis
- Dimension reduction
- LOESS: Locally Weighted Scatterplot Smoothing
  - Idea: fit a simple polynomial using regression on small ranges of the independent variable, and smoothly join up the pieces.
  - Amount of smoothing controlled by a smoothing parameter.
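The LOESS idea described above can be sketched with a bare-bones local linear smoother. This is a simplified illustration, not the algorithm as implemented in any particular package: it fits a weighted straight line (a degree-1 polynomial) around each point using tricube weights and omits the robustness iterations that full implementations usually offer. The function name `loess`, the `span` parameter name, and the example data are assumptions made for this sketch.

```python
import math
import random

def loess(xs, ys, span=0.5):
    """Local linear smoother: return one fitted value per x in xs.

    span is the smoothing parameter: the fraction of the data used in
    each local fit. Larger span gives a smoother curve.
    """
    n = len(xs)
    k = max(2, math.ceil(span * n))  # number of neighbours per local fit
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest data point.
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        if h == 0:
            h = 1e-12  # guard against division by zero with tied x values
        # Tricube weights: near 1 close to x0, falling to 0 at distance h.
        w = [(1 - min(abs(x - x0) / h, 1) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Evaluate the local weighted least-squares line at x0.
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Hypothetical noisy data around a sine curve.
random.seed(24)
xs = [i / 5 for i in range(50)]
ys = [math.sin(x) + random.gauss(0, 0.3) for x in xs]

wiggly = loess(xs, ys, span=0.2)  # little smoothing
smooth = loess(xs, ys, span=0.8)  # heavy smoothing
print([round(v, 2) for v in smooth[:5]])
```

Because each fitted value comes from its own weighted regression, neighbouring fits share most of their data, which is what makes the pieces join up smoothly. Plotting `wiggly` and `smooth` against `xs` on top of the scatterplot shows the effect of the smoothing parameter.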
25 Course mantra: It's OK not to know. Expressing ignorance is encouraged. It's not OK to not have a willingness to learn.
26 For next week
1. Initial examination of student-performance-and-time-of-day data
   - input the data into RStudio
   - carry out IDA (no confidence intervals or P-values... yet...)
   - document what you did and observed using R Markdown
   - send me an HTML or PDF of your R Markdown document before class
2. Read: Lefevre et al. (2010) Beer consumption increases human attractiveness to malaria mosquitoes. PLoS ONE, vol. 5, issue 3. Make sure you understand the figures.