WELCOME! Lecture 3 Thommy Perlinger

1 Quantitative Methods II WELCOME! Lecture 3 Thommy Perlinger

2 Program Lecture 3 Cleaning and transforming data Graphical examination of the data Missing Values

3 Graphical examination of the data It is important to understand, evaluate, and interpret results from multivariate analyses, which might be complex. This requires a thorough understanding of the basic characteristics of the underlying data and relationships. Graphical techniques are used to complement the empirical measures, to provide a visual representation of the basic relationships in order to feel confident in the understanding of these relationships.

4 Recap: Quantitative variables Discrete variables: take only some values, often integer values. A clear indicator that a variable is discrete is that its name begins with "Number of ...", e.g. number of children. Continuous variables: take any value in an interval. A more precise measurement would always give more decimals, e.g. weight.

5 Recap: Describing distributions Bar charts display distributions of categorical variables and of discrete (quantitative) variables. Histograms display distributions of continuous variables. [Figure: histogram of Weight; axis labels Count and Weight]

6 Recap: Describing distributions To interpret a histogram, think about: the general shape; the center and spread; deviations from the general shape.

7 Recap: Graphs for continuous variables Relationship between two variables (bivariate relationships): Scatter plot

8 Examining relationships To interpret a scatterplot: The pattern of points represents the relationship. A strong organization of points along a straight line implies a linear relationship or correlation. A curved set of points may denote a nonlinear relationship. A seemingly random pattern of points may indicate that there is no relationship.

9 Scatterplot matrix in the book [Figure: scatterplot matrix; annotated panels: scatterplots = bivariate (pairwise) scatterplots; histograms = univariate graphs]

10 Recap: Correlation The Pearson correlation coefficient (r) measures the direction (positive/negative) and strength of the linear relationship between two variables X and Y. The correlation is always a number between -1 and 1: -1 ≤ r ≤ 1. The population correlation coefficient is denoted ρ (the Greek letter rho).

11 Recap: Correlation If there is a strong positive linear relationship between X and Y, the value of the correlation coefficient (r) is close to 1. If there is a strong negative linear relationship between X and Y, the value of the correlation coefficient (r) is close to -1. If there is no linear relationship at all between X and Y, the value of the correlation coefficient (r) is close to 0. [Figure: three scatter plots of Y against X, with r = 1, r = -1, and r = 0]
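
The three cases above can be checked numerically. The following is a minimal sketch in plain Python, not the SPSS procedure used in the course; the function name `pearson_r` and the small data sets are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                      # illustrative values only
print(pearson_r(x, [2, 4, 6, 8, 10]))    # points on an increasing straight line
print(pearson_r(x, [10, 8, 6, 4, 2]))    # points on a decreasing straight line
print(pearson_r(x, [1, 3, 2, 3, 1]))     # no linear trend
```

Note that r only measures the linear part of a relationship; a clearly curved pattern can still give a value near 0.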

12 Examples of scatter plots of data with various correlation coefficients [Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +1, r = +.3, and r = 0]

13 Scatterplot matrix in the book [Figure: scatterplot matrix; annotated panels: correlations = bivariate (pairwise) correlations; scatterplots = bivariate (pairwise) scatterplots; histograms = univariate graphs]

14 Scatterplot matrix in SPSS [Figure: SPSS scatterplot matrix; annotated panels: bivariate (pairwise) scatterplots, and the same scatterplots with X and Y reversed]

15 Examining group differences Groups can be formed from the categories of a nonmetric variable. Differences between these groups on one or more metric variables are often of interest. Assessing group differences is done through univariate analyses such as t-tests, or multivariate techniques such as MANOVA (multivariate analysis of variance). The graphical method used for this task is the boxplot.

16 Recap: Boxplot (for any data where the median is appropriate) [Figure: boxplot of Age (years); 25% of the observations lie below the box, 50% inside the box, and 25% above it]

17 Recap: Boxplot [Figure: boxplot of Age (years) with Min, Q1, Median, Q3, and Max marked; the box spans the IQR] If the median lies near one end of the box, skewness is indicated.

18 Recap: Boxplot Outliers Outlier: an observation more than 1.5 interquartile ranges away from Q1 or Q3. Extreme outlier (* in SPSS): an observation more than 3 interquartile ranges away from Q1 or Q3.
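
The outlier rules above translate directly into fences around the box. The sketch below is illustrative only: the ages are invented, and the quartiles come from Python's `statistics.quantiles` with the "inclusive" method, which is an assumption here and may differ slightly from the quartile definition SPSS uses for its boxplots.

```python
import statistics

def classify_outliers(values):
    """Split values into outliers (beyond 1.5 IQR) and extreme outliers (beyond 3 IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # outlier fences
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # extreme-outlier fences
    extreme = [v for v in values if v < outer_low or v > outer_high]
    outliers = [v for v in values
                if (v < inner_low or v > inner_high) and outer_low <= v <= outer_high]
    return {"q1": q1, "q3": q3, "iqr": iqr, "outliers": outliers, "extreme": extreme}

ages = [20, 22, 23, 24, 25, 26, 27, 28, 30, 40, 60]   # hypothetical ages in years
print(classify_outliers(ages))
```

With these made-up ages, 40 falls between the two upper fences and 60 lies beyond the outer one, so they are reported as an outlier and an extreme outlier respectively.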

19 Examining group differences Boxplots are used as a complement to the statistical tests to get descriptive information that adds to our understanding of the group differences

20 Program Lecture 3 Cleaning and transforming data Graphical examination of the data Missing Values

21 Missing data When values on one or more variable(s) are not available for analysis, we say that we have missing data. Missing data are a fact of life in multivariate analysis. Data entry errors or data collection problems, or the respondent refusing to answer (among other things) can lead to missing data.

22 Missing data If there is a lot of missing data, the results can be biased. The larger the rate of missing data, the larger the risk of making incorrect generalizations to the target population. There are different ways of dealing with missing data; for example, you can impute data (substitute the missing data with some values). It is important to identify any patterns and relationships underlying the missing data, in order to stay as close as possible to the original distribution of values when any remedy is applied.

23 The impact of missing data The missing data processes, especially those based on actions by the respondent (e.g. nonresponse to some questions), are rarely known beforehand. Questions to be investigated regarding missing data: 1) Are the missing data scattered randomly throughout the observations, or are distinct patterns identifiable? 2) How prevalent are the missing data (what is the extent of the missing data)?

24 The impact of missing data The practical impact is the reduction of the sample size available for analysis. Since several variables are included in the analysis, any individual with a missing value on any of the variables will not be a part of the analysis. It has been shown that if 10% of the data is randomly missing in a set of five variables, the sample is reduced to only 40% of the original size. In such situations, you must either gather additional observations, or find a remedy for the missing data.
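
How quickly complete cases disappear can be sketched with a one-line calculation. The assumption here, which is not stated on the slide, is that each variable is missing independently of the others with the same probability; the function name is illustrative.

```python
def expected_complete_fraction(p_missing, n_variables):
    """Expected share of cases with no missing value on any variable,
    assuming values go missing independently with probability p_missing."""
    return (1 - p_missing) ** n_variables

share_complete = expected_complete_fraction(0.10, 5)
print(f"complete cases: {share_complete:.1%}")
print(f"cases with at least one missing value: {1 - share_complete:.1%}")
```

Under this particular independence assumption roughly 59% of the cases stay complete and roughly 41% lose at least one value, so the exact reduction depends on how the 10% is defined and how the missing values are spread over cases. Either way, a modest missing rate per variable removes a large part of the sample.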

25 The impact of missing data From a substantive perspective, any statistical results based on data with a nonrandom missing data process could be biased. If, e.g., individuals that don't provide their household income tend to be those in the higher income brackets, the results will be erroneous. We still get results from the analysis even without the missing data, but it is important to consider the validity of the results.

26 A four-step process for identifying missing data and applying remedies Step 1: determine the type of missing data Is the missing data part of the research design and under the control of the researcher? Or are the causes and impacts of the missing data truly unknown? If the missing data are expected and part of the research design, they are termed ignorable. The missing data process is then operating at random (the observed values are a random sample of the total set of values) and no specific remedies are needed.

27 Example: ignorable missing data 1. Have you experienced any pain during the past 7 days? (If you answered no to question 1, proceed to question 3.) 2. How strong was your pain at its worst? [Response scale from "No pain" to "Worst pain imaginable"] Missing data on question 2 are part of the research design, and it would be inappropriate to attempt to remedy them.

28 A four-step process for identifying missing data and applying remedies Step 1: Is the missing data ignorable? Yes: apply specialized techniques for ignorable missing data.

29 Non-ignorable missing data In general, missing data that cannot be classified as ignorable fall into two classes based on their source: 1) Known processes. Missing data that can be identified due to procedural factors, such as errors in data entry, failure to complete the entire questionnaire, etc. 2) Unknown processes. Most often directly related to the respondent, e.g. refusal to respond to certain questions (common when questions are of a sensitive nature), or when the respondent has no opinion or not enough knowledge to answer.

30 Non-ignorable missing data When non-ignorable missing data occur in a random pattern, some remedies may be applicable to mitigate (ease) the effect of the missing data.

31 A four-step process for identifying missing data and applying remedies Step 1: Is the missing data ignorable? Yes: apply specialized techniques for ignorable missing data. No: go to Step 2. Step 2: Is the extent of missing data substantial enough to warrant action?

32 A four-step process for identifying missing data and applying remedies Step 2: determine the extent of missing data Determine the extent of missing data for individual variables, individual cases (subjects/objects), and even overall. Determine whether the amount of missing data is low enough to not affect the results, even if it is non-random. If the extent is sufficiently low, then any of the approaches for remedying missing data may be applied.

33 Assessing the extent of missing data To identify the extent of missing data, and any exceptionally high levels of missing data that occur for individual cases or observations, tabulate the following: 1) The percentage of variables with missing data for each case/individual/object 2) The number of cases with missing data for each variable separately This can be done using the Missing Values Analysis option in SPSS.
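
The two tabulations can also be sketched outside SPSS. The example below is a minimal illustration in plain Python: the four cases and their values are invented, missing values are represented by `None`, and the function names are not part of any library.

```python
def missing_percent_by_case(rows):
    """Percentage of variables with a missing value, for each case."""
    return [100 * sum(value is None for value in row.values()) / len(row)
            for row in rows]

def missing_count_by_variable(rows):
    """Number of cases with a missing value, for each variable."""
    return {var: sum(row[var] is None for row in rows) for var in rows[0]}

# Hypothetical data: four cases, four variables.
rows = [
    {"v1": 4.1,  "v2": 1.9,  "v3": 8.0,  "v4": 5.2},
    {"v1": None, "v2": 2.0,  "v3": None, "v4": 5.0},
    {"v1": 3.9,  "v2": None, "v3": 7.5,  "v4": 4.8},
    {"v1": None, "v2": None, "v3": None, "v4": 5.5},
]
print(missing_percent_by_case(rows))     # one percentage per case
print(missing_count_by_variable(rows))   # one count per variable
```

Cases or variables that stand out in these two listings are the ones to inspect further for nonrandom patterns.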

34 Assessing any patterns of missing data Using the tabulations for each case and each variable: Look for any nonrandom patterns in the data, such as concentration of missing data in a specific set of questions, or signs of individuals not completing the questionnaire, etc.

35 Imputation Imputation is the process of substituting the missing value with a valid value based on other variables and/or cases in the sample. The reason for imputation is that it is desirable to keep as much information as possible in your data set. If the extent of missing data is acceptably low, and no specific nonrandom patterns appear, an imputation technique can be used without biasing the results too much.

36 How much missing data is too much? Rules of thumb Missing data under 10% for an individual case or observation can generally be ignored, except when the missing data occur in a specific nonrandom fashion (e.g. concentration in a specific set of questions, missing answers at the end of the questionnaire implying non-completion, etc.). The number of cases with no missing data must be sufficient for the selected analysis technique if values will not be substituted (imputed) for the missing data (complete-case analysis).

37 Deleting individual cases and/or variables Consider the simplest approach to remedying missing data, i.e. deleting cases and/or variables with high levels of missing data. You may find that the missing data are concentrated in a small subset of cases and/or variables, and the exclusion of these might substantially reduce the extent of the missing data. For cases where a nonrandom pattern of missing data is present, this might be the most efficient solution.

38 Deletions based on missing data Rules of thumb Variables or cases with 50% or more missing data should always be deleted. Variables with as little as 15% missing data are candidates for deletion, but higher levels of missing data (20% to 30%) can often be remedied. Be sure that the overall decrease in missing data is large enough to justify deleting an individual variable or case.

39 Deletions based on missing data Rules of thumb, cont'd Cases with missing data for dependent/response variable(s) typically are deleted to avoid any artificial increase in relationships with independent variables. When deleting a variable, ensure that alternative variables, hopefully highly correlated, are available to represent the intent of the original variable. Always consider performing the analysis both with and without the deleted cases or variable(s) to identify any marked differences.

40 Example: HBAT missing data A pretest of a questionnaire used to collect the HBAT data, consisting of n=70 individuals and 14 variables.

41 Example: HBAT missing data Step 1: Is the missing data ignorable? No. All the missing data in this example are unknown, due to nonresponse by the respondents, and thus not ignorable. Step 2: Is the extent of missing data substantial enough to warrant action?

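42 Example: HBAT missing data SPSS: Analyze >> Missing Value Analysis

Univariate Statistics (n = 70; … marks digits or cells lost in the transcript; missing counts and percentages for v1-v7 and v9 follow from N and n)

Variable   N    Mean     Std. Deviation   Missing Count   Missing Percent   Extremes Low (a)   Extremes High (a)
v1         49   4.008    …                21              30.0              0                  0
v2         57   1.944    …                13              18.6              0                  0
v3         53   8.062    1.…              17              24.3              0                  0
v4         63   5.168    1.…              7               10.0              0                  0
v5         61   2.856    …                9               12.9              0                  0
v6         64   2.611    …                6               8.6               0                  0
v7         61   6.823    1.…              9               12.9              1                  0
v8         …    ….033    9.…              …               ….9               0                  0
v9         63   4.759    …                7               10.0              0                  0
v10        …                              …               ….9
v11        …                              …               ….9
v12        …                              …               ….9
v13        …                              …               ….4
v14        …                              …               ….9

a. Number of cases outside the range (Q1 - 1.5*IQR, Q3 + 1.5*IQR).

Slide annotations: "Categorical variables"; "V1, V2, and V3 are possible candidates for deletion".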

43 Example: HBAT missing data SPSS: Analyze >> Missing Value Analysis. Click Pattern, mark Cases with missing values

44 Example: HBAT missing data

45 Example: HBAT missing data 6 individuals with 50% missing data are candidates for deletion. All missing values for the categorical variables occur in these 6 cases.

46 Example: HBAT missing data SPSS: Analyze >> Missing Value Analysis. Click Pattern, mark Tabulated cases. 26 cases with complete data (no missing values). Only one more complete case if V3 is deleted. 6 more complete cases by deletion of V1 only (32-26=6). 11 more complete cases by deletion of V1 and V3 (37-26=11).
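
The same kind of comparison, counting complete cases before and after dropping candidate variables, can be sketched in a few lines. The data below are invented and are not the HBAT pretest data; `None` marks a missing value and the function name is illustrative.

```python
def count_complete_cases(rows, dropped=()):
    """Number of cases with no missing value on any variable that is kept."""
    return sum(
        all(value is not None for var, value in row.items() if var not in dropped)
        for row in rows
    )

# Hypothetical data: four cases, four variables.
rows = [
    {"v1": 4.1,  "v2": 1.9,  "v3": 8.0,  "v4": 5.2},
    {"v1": None, "v2": 2.0,  "v3": None, "v4": 5.0},
    {"v1": 3.9,  "v2": None, "v3": 7.5,  "v4": 4.8},
    {"v1": None, "v2": None, "v3": None, "v4": 5.5},
]
baseline = count_complete_cases(rows)
for dropped in [("v1",), ("v3",), ("v1", "v3")]:
    gain = count_complete_cases(rows, dropped) - baseline
    print(f"dropping {dropped}: {gain} more complete case(s)")
```

A variable is only worth deleting if the gain in complete cases is large enough to justify losing the information it carries.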

47 A four-step process for identifying missing data and applying remedies Step 1: Is the missing data ignorable? No: go to Step 2. Step 2: Is the extent of missing data substantial enough to warrant action? Yes: should cases and/or variables be deleted due to high levels of missing data? If yes, delete cases and/or variables with high missing data; if no, go to Step 3. Step 3: Are the missing data processes MAR (nonrandom) or MCAR (random)? Step 4: Do you want to replace the missing data with values?

48 Example: HBAT missing data Step 1: Is the missing data ignorable? No: go to Step 2. Step 2: Is the extent of missing data substantial enough to warrant action? Yes: should cases and/or variables be deleted due to high levels of missing data? Yes: delete cases and/or variables with high missing data. 2 variables (V1 and V3) and 6 cases are to be deleted. Step 3: Are the missing data processes MAR (nonrandom) or MCAR (random)? Step 4: Do you want to replace the missing data with values?

49 A four-step process for identifying missing data and applying remedies Step 3: diagnose the randomness of the missing data processes If the extent of missing data is substantial enough to warrant action, the degree of randomness in the missing data has to be ascertained. A nonrandom missing data process is present between the two variables X and Y when significant differences in the values of X occur between cases that have valid data for Y versus those cases with missing data on Y.

50 Levels of randomness of the missing data process Two levels of randomness of missing data: Missing At Random (MAR), which requires special methods to accommodate a nonrandom component. Missing Completely At Random (MCAR), which is sufficiently random to accommodate any type of missing data remedy. The distinction between these two levels is in the generalizability to the population.

51 Missing at random (MAR) If the missing values of Y depend on the variable X, but not on Y, the data are missing at random. The observed values of Y represent a random sample of the actual Y values for each observed value of X, but the observed data for Y do not necessarily represent a truly random sample of all Y values. The missing data process is random in the sample, but the observed values are not generalizable to the population.

52 Example: Missing at random (MAR) X= gender of the respondents (assumed to be known) Y = household income Missing data are random for both males and females, but occur much more frequently for males. The missing data is random within the gender variable, but the observed data is not generalizable to the population since it does not reflect the ultimate distribution of the household income values.

53 Missing completely at random (MCAR) Data are missing completely at random if the observed values of Y are truly a random sample of all Y values, with no underlying process that introduces bias to the observed data. There is no property of the cases that distinguishes those with missing data from cases with complete data.

54 Example: Missing completely at random (MCAR) X = gender of the respondents (assumed to be known) Y = household income Missing data are random for both males and females, and occur in equal proportions for both genders. In this missing data process, any remedy can be applied without having to consider the impact of any other variable or missing data process.
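
A small simulation can make the contrast between the MAR and MCAR examples concrete. This is a sketch under invented assumptions: the income levels (mean 60 for males, 40 for females, in arbitrary units), the missing rates, and the function name are all hypothetical. In both runs missingness depends on gender (X) only, never on income (Y) itself.

```python
import random

def simulate_income_means(p_missing_male, p_missing_female, n=20000, seed=0):
    """Return (mean of all incomes, mean of the observed incomes only)."""
    rng = random.Random(seed)
    all_incomes, observed = [], []
    for _ in range(n):
        is_male = rng.random() < 0.5
        # Hypothetical income levels: the two groups differ on average.
        income = rng.gauss(60.0 if is_male else 40.0, 5.0)
        all_incomes.append(income)
        # Missingness depends on gender only, not on the income value.
        p_missing = p_missing_male if is_male else p_missing_female
        if rng.random() >= p_missing:
            observed.append(income)
    return sum(all_incomes) / n, sum(observed) / len(observed)

full_mar, obs_mar = simulate_income_means(0.40, 0.10)     # MAR: males skip the question more often
full_mcar, obs_mcar = simulate_income_means(0.20, 0.20)   # MCAR: same missing rate for both genders
print("MAR  bias in observed mean:", round(obs_mar - full_mar, 2))
print("MCAR bias in observed mean:", round(obs_mcar - full_mcar, 2))
```

In the MAR run the observed mean is pulled toward the group that answers more often, even though missingness is random within each gender. In the MCAR run the observed mean stays close to the mean of all incomes.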

55 Diagnostic tests for levels of randomness There are two diagnostic tests that can be used to assess the level of randomness (MAR or MCAR): 1) Two groups of individuals are formed: one with missing values of Y, and another with valid values of Y. Then statistical tests (e.g. t-tests) are performed to see if differences exist between the two groups based on other variables of interest. Significant differences indicate the possibility of nonrandom missing data. A number of variables should be examined to find any consistent pattern. Either a large number of differences or a systematic pattern may indicate a nonrandom component (MAR).
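
The first diagnostic can be sketched as follows. The data and variable names are invented, and the separate-variance (Welch) form of the t statistic is an assumption made for this illustration; the sketch computes only the statistic and its degrees of freedom, not a p-value.

```python
import math
import statistics

def split_by_missingness(rows, y, x):
    """Values of x for cases with valid y, and for cases with missing y."""
    valid = [r[x] for r in rows if r[y] is not None and r[x] is not None]
    missing = [r[x] for r in rows if r[y] is None and r[x] is not None]
    return valid, missing

def welch_t(group_a, group_b):
    """Separate-variance t statistic and its approximate degrees of freedom."""
    se2_a = statistics.variance(group_a) / len(group_a)
    se2_b = statistics.variance(group_b) / len(group_b)
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df

# Hypothetical cases: v2 plays the role of Y (grouping), v4 is the variable compared.
rows = [
    {"v2": 1.9,  "v4": 5.1}, {"v2": 2.0,  "v4": 4.9}, {"v2": 2.1,  "v4": 5.3},
    {"v2": None, "v4": 6.2}, {"v2": None, "v4": 6.0}, {"v2": None, "v4": 6.4},
]
valid, missing = split_by_missingness(rows, y="v2", x="v4")
t, df = welch_t(valid, missing)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large absolute t value for several comparison variables would point toward a nonrandom component; a single isolated difference is weaker evidence.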

56 Diagnostic tests for levels of randomness 2) An overall test of randomness compares patterns of missing data on all variables with the pattern expected for random missing data. If no significant differences are found, the missing data can be classified as MCAR. If significant differences are found, the nonrandom missing data processes have to be investigated. As a result of these tests, the missing data process is classified as either MAR or MCAR.

57 Example: HBAT missing data 1) Two groups of individuals are formed: one with missing values of e.g. V2, and another with valid values of V2. Then, t-tests are performed to see if differences exist between the two groups based on all other numerical variables of interest.

58 [Figure: SPSS t-test output; annotations mark the variable that the groups are based on and the variables used to test for differences between the groups]

59 Example: HBAT missing data Three significant differences between groups based on V2. Only one significant difference among the rest of the tests. SPSS: Analyze >> Missing Value Analysis. Click Descriptives, mark t tests with groups formed by indicator variables

60 Example: HBAT missing data 2) An overall test of randomness. H0: The observed pattern of missing data does not differ from a random pattern. Ha: The observed pattern of missing data differs from a random pattern. P-value (two-sided): [value not preserved in the transcript] SPSS: Analyze >> Missing Value Analysis. To the right under Estimation, mark EM (for Little's MCAR test).

61 Example: HBAT missing data This result, together with the analysis showing minimal differences in a nonrandom pattern, allows us to conclude that the missing data process is MCAR. If the MCAR test had been significant, or a nonrandom pattern had been obvious in the previous analysis, the missing data process would have been concluded to be MAR.


More information

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking

More information

Excel 2010 with XLSTAT

Excel 2010 with XLSTAT Excel 2010 with XLSTAT J E N N I F E R LE W I S PR I E S T L E Y, PH.D. Introduction to Excel 2010 with XLSTAT The layout for Excel 2010 is slightly different from the layout for Excel 2007. However, with

More information

Chapter 2 Modeling Distributions of Data

Chapter 2 Modeling Distributions of Data Chapter 2 Modeling Distributions of Data Section 2.1 Describing Location in a Distribution Describing Location in a Distribution Learning Objectives After this section, you should be able to: FIND and

More information

AP Statistics Summer Assignment:

AP Statistics Summer Assignment: AP Statistics Summer Assignment: Read the following and use the information to help answer your summer assignment questions. You will be responsible for knowing all of the information contained in this

More information

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable. 5-number summary 68-95-99.7 Rule Area principle Bar chart Bimodal Boxplot Case Categorical data Categorical variable Center Changing center and spread Conditional distribution Context Contingency table

More information

Chapter 5: The beast of bias

Chapter 5: The beast of bias Chapter 5: The beast of bias Self-test answers SELF-TEST Compute the mean and sum of squared error for the new data set. First we need to compute the mean: + 3 + + 3 + 2 5 9 5 3. Then the sum of squared

More information

Week 2: Frequency distributions

Week 2: Frequency distributions Types of data Health Sciences M.Sc. Programme Applied Biostatistics Week 2: distributions Data can be summarised to help to reveal information they contain. We do this by calculating numbers from the data

More information

1. Descriptive Statistics

1. Descriptive Statistics 1.1 Descriptive statistics 1. Descriptive Statistics A Data management Before starting any statistics analysis with a graphics calculator, you need to enter the data. We will illustrate the process by

More information

3 Graphical Displays of Data

3 Graphical Displays of Data 3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked

More information

Key Stage 3 Curriculum

Key Stage 3 Curriculum Key Stage 3 Curriculum Learning Area: Maths Learning Area Coordinator: Ms S J Pankhurst What will I study? SUBJECT YEAR 7 Autumn 1 Autumn 2 Spring 1 Spring 2 Summer 1 Summer 2 Focus Counting and comparing

More information

CHAPTER-13. Mining Class Comparisons: Discrimination between DifferentClasses: 13.4 Class Description: Presentation of Both Characterization and

CHAPTER-13. Mining Class Comparisons: Discrimination between DifferentClasses: 13.4 Class Description: Presentation of Both Characterization and CHAPTER-13 Mining Class Comparisons: Discrimination between DifferentClasses: 13.1 Introduction 13.2 Class Comparison Methods and Implementation 13.3 Presentation of Class Comparison Descriptions 13.4

More information

Chapter 2: Looking at Multivariate Data

Chapter 2: Looking at Multivariate Data Chapter 2: Looking at Multivariate Data Multivariate data could be presented in tables, but graphical presentations are more effective at displaying patterns. We can see the patterns in one variable at

More information

8. MINITAB COMMANDS WEEK-BY-WEEK

8. MINITAB COMMANDS WEEK-BY-WEEK 8. MINITAB COMMANDS WEEK-BY-WEEK In this section of the Study Guide, we give brief information about the Minitab commands that are needed to apply the statistical methods in each week s study. They are

More information

Chapter 6 Normal Probability Distributions

Chapter 6 Normal Probability Distributions Chapter 6 Normal Probability Distributions 6-1 Review and Preview 6-2 The Standard Normal Distribution 6-3 Applications of Normal Distributions 6-4 Sampling Distributions and Estimators 6-5 The Central

More information

CHAPTER 2 DESCRIPTIVE STATISTICS

CHAPTER 2 DESCRIPTIVE STATISTICS CHAPTER 2 DESCRIPTIVE STATISTICS 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is how the data is spread or distributed over the range of the data values. This is one of

More information

Visual Analytics. Visualizing multivariate data:

Visual Analytics. Visualizing multivariate data: Visual Analytics 1 Visualizing multivariate data: High density time-series plots Scatterplot matrices Parallel coordinate plots Temporal and spectral correlation plots Box plots Wavelets Radar and /or

More information

Preparing for Data Analysis

Preparing for Data Analysis Preparing for Data Analysis Prof. Andrew Stokes March 27, 2018 Managing your data Entering the data into a database Reading the data into a statistical computing package Checking the data for errors and

More information

Chapter 2: The Normal Distribution

Chapter 2: The Normal Distribution Chapter 2: The Normal Distribution 2.1 Density Curves and the Normal Distributions 2.2 Standard Normal Calculations 1 2 Histogram for Strength of Yarn Bobbins 15.60 16.10 16.60 17.10 17.60 18.10 18.60

More information

At the end of the chapter, you will learn to: Present data in textual form. Construct different types of table and graphs

At the end of the chapter, you will learn to: Present data in textual form. Construct different types of table and graphs DATA PRESENTATION At the end of the chapter, you will learn to: Present data in textual form Construct different types of table and graphs Identify the characteristics of a good table and graph Identify

More information

WORKSHOP: Using the Health Survey for England, 2014

WORKSHOP: Using the Health Survey for England, 2014 WORKSHOP: Using the Health Survey for England, 2014 There are three sections to this workshop, each with a separate worksheet. The worksheets are designed to be accessible to those who have no prior experience

More information

Chapter 3: Describing, Exploring & Comparing Data

Chapter 3: Describing, Exploring & Comparing Data Chapter 3: Describing, Exploring & Comparing Data Section Title Notes Pages 1 Overview 1 2 Measures of Center 2 5 3 Measures of Variation 6 12 4 Measures of Relative Standing & Boxplots 13 16 3.1 Overview

More information

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016)

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016) CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1 Daphne Skipper, Augusta University (2016) 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is

More information

Name Date Types of Graphs and Creating Graphs Notes

Name Date Types of Graphs and Creating Graphs Notes Name Date Types of Graphs and Creating Graphs Notes Graphs are helpful visual representations of data. Different graphs display data in different ways. Some graphs show individual data, but many do not.

More information

Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242

Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242 Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242 Creation & Description of a Data Set * 4 Levels of Measurement * Nominal, ordinal, interval, ratio * Variable Types

More information

Stat 528 (Autumn 2008) Density Curves and the Normal Distribution. Measures of center and spread. Features of the normal distribution

Stat 528 (Autumn 2008) Density Curves and the Normal Distribution. Measures of center and spread. Features of the normal distribution Stat 528 (Autumn 2008) Density Curves and the Normal Distribution Reading: Section 1.3 Density curves An example: GRE scores Measures of center and spread The normal distribution Features of the normal

More information

Section 9: One Variable Statistics

Section 9: One Variable Statistics The following Mathematics Florida Standards will be covered in this section: MAFS.912.S-ID.1.1 MAFS.912.S-ID.1.2 MAFS.912.S-ID.1.3 Represent data with plots on the real number line (dot plots, histograms,

More information

number Understand the equivalence between recurring decimals and fractions

number Understand the equivalence between recurring decimals and fractions number Understand the equivalence between recurring decimals and fractions Using and Applying Algebra Calculating Shape, Space and Measure Handling Data Use fractions or percentages to solve problems involving

More information

Introduction. About this Document. What is SPSS. ohow to get SPSS. oopening Data

Introduction. About this Document. What is SPSS. ohow to get SPSS. oopening Data Introduction About this Document This manual was written by members of the Statistical Consulting Program as an introduction to SPSS 12.0. It is designed to assist new users in familiarizing themselves

More information

Section 6.3: Measures of Position

Section 6.3: Measures of Position Section 6.3: Measures of Position Measures of position are numbers showing the location of data values relative to the other values within a data set. They can be used to compare values from different

More information

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 1 2 Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 2. How to construct (in your head!) and interpret confidence intervals.

More information

Summarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester

Summarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester Summarising Data Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 09/10/2018 Summarising Data Today we will consider Different types of data Appropriate ways to summarise these

More information

appstats6.notebook September 27, 2016

appstats6.notebook September 27, 2016 Chapter 6 The Standard Deviation as a Ruler and the Normal Model Objectives: 1.Students will calculate and interpret z scores. 2.Students will compare/contrast values from different distributions using

More information

MHPE 494: Data Analysis. Welcome! The Analytic Process

MHPE 494: Data Analysis. Welcome! The Analytic Process MHPE 494: Data Analysis Alan Schwartz, PhD Department of Medical Education Memoona Hasnain,, MD, PhD, MHPE Department of Family Medicine College of Medicine University of Illinois at Chicago Welcome! Your

More information

Subject. Creating a diagram. Dataset. Importing the data file. Descriptive statistics with TANAGRA.

Subject. Creating a diagram. Dataset. Importing the data file. Descriptive statistics with TANAGRA. Subject Descriptive statistics with TANAGRA. The aim of descriptive statistics is to describe the main features of a collection of data in quantitative terms 1. The visualization of the whole data table

More information

8: Statistics. Populations and Samples. Histograms and Frequency Polygons. Page 1 of 10

8: Statistics. Populations and Samples. Histograms and Frequency Polygons. Page 1 of 10 8: Statistics Statistics: Method of collecting, organizing, analyzing, and interpreting data, as well as drawing conclusions based on the data. Methodology is divided into two main areas. Descriptive Statistics:

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup Random Variables: Y i =(Y i1,...,y ip ) 0 =(Y i,obs, Y i,miss ) 0 R i =(R i1,...,r ip ) 0 ( 1

More information

1 Overview of Statistics; Essential Vocabulary

1 Overview of Statistics; Essential Vocabulary 1 Overview of Statistics; Essential Vocabulary Statistics: the science of collecting, organizing, analyzing, and interpreting data in order to make decisions Population and sample Population: the entire

More information

LASER s Level 2 Maths Course - Summary

LASER s Level 2 Maths Course - Summary LASER s Level 2 Maths Course - Summary Unit Code Unit Title Credits Level Status SER945 Shape, Space and Measurement 3 2 Mandatory SER946 Collecting, Recording and Analysing Data 3 2 Mandatory SER947 Development

More information

BIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26

BIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26 Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26 INTRODUCTION Graphs are one of the most important aspects of data analysis and presentation of your of data. They are visual representations

More information

MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation

MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation Objectives: 1. Learn the meaning of descriptive versus inferential statistics 2. Identify bar graphs,

More information

Integrated Mathematics I Performance Level Descriptors

Integrated Mathematics I Performance Level Descriptors Limited A student performing at the Limited Level demonstrates a minimal command of Ohio s Learning Standards for Integrated Mathematics I. A student at this level has an emerging ability to demonstrate

More information

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- #

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- # Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series by Mario F. Triola Chapter 2 Summarizing and Graphing Data 2-1 Review and Preview 2-2 Frequency Distributions 2-3 Histograms

More information

CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12

CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12 Tool 1: Standards for Mathematical ent: Interpreting Functions CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12 Name of Reviewer School/District Date Name of Curriculum Materials:

More information

Lecture 1: Exploratory data analysis

Lecture 1: Exploratory data analysis Lecture 1: Exploratory data analysis Statistics 101 Mine Çetinkaya-Rundel January 17, 2012 Announcements Announcements Any questions about the syllabus? If you sent me your gmail address your RStudio account

More information

This chapter will show how to organize data and then construct appropriate graphs to represent the data in a concise, easy-to-understand form.

This chapter will show how to organize data and then construct appropriate graphs to represent the data in a concise, easy-to-understand form. CHAPTER 2 Frequency Distributions and Graphs Objectives Organize data using frequency distributions. Represent data in frequency distributions graphically using histograms, frequency polygons, and ogives.

More information

Middle School Math Course 2

Middle School Math Course 2 Middle School Math Course 2 Correlation of the ALEKS course Middle School Math Course 2 to the Indiana Academic Standards for Mathematics Grade 7 (2014) 1: NUMBER SENSE = ALEKS course topic that addresses

More information

Table of Contents (As covered from textbook)

Table of Contents (As covered from textbook) Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SPSS SPSS (originally Statistical Package for the Social Sciences ) is a commercial statistical software package with an easy-to-use

More information

AP Statistics Prerequisite Packet

AP Statistics Prerequisite Packet Types of Data Quantitative (or measurement) Data These are data that take on numerical values that actually represent a measurement such as size, weight, how many, how long, score on a test, etc. For these

More information

Page 1. Graphical and Numerical Statistics

Page 1. Graphical and Numerical Statistics TOPIC: Description Statistics In this tutorial, we show how to use MINITAB to produce descriptive statistics, both graphical and numerical, for an existing MINITAB dataset. The example data come from Exercise

More information

What are we working with? Data Abstractions. Week 4 Lecture A IAT 814 Lyn Bartram

What are we working with? Data Abstractions. Week 4 Lecture A IAT 814 Lyn Bartram What are we working with? Data Abstractions Week 4 Lecture A IAT 814 Lyn Bartram Munzner s What-Why-How What are we working with? DATA abstractions, statistical methods Why are we doing it? Task abstractions

More information

Name: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution

Name: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution Name: Date: Period: Chapter 2 Section 1: Describing Location in a Distribution Suppose you earned an 86 on a statistics quiz. The question is: should you be satisfied with this score? What if it is the

More information

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset. Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

More information

round decimals to the nearest decimal place and order negative numbers in context

round decimals to the nearest decimal place and order negative numbers in context 6 Numbers and the number system understand and use proportionality use the equivalence of fractions, decimals and percentages to compare proportions use understanding of place value to multiply and divide

More information

MATH& 146 Lesson 10. Section 1.6 Graphing Numerical Data

MATH& 146 Lesson 10. Section 1.6 Graphing Numerical Data MATH& 146 Lesson 10 Section 1.6 Graphing Numerical Data 1 Graphs of Numerical Data One major reason for constructing a graph of numerical data is to display its distribution, or the pattern of variability

More information

Slide Copyright 2005 Pearson Education, Inc. SEVENTH EDITION and EXPANDED SEVENTH EDITION. Chapter 13. Statistics Sampling Techniques

Slide Copyright 2005 Pearson Education, Inc. SEVENTH EDITION and EXPANDED SEVENTH EDITION. Chapter 13. Statistics Sampling Techniques SEVENTH EDITION and EXPANDED SEVENTH EDITION Slide - Chapter Statistics. Sampling Techniques Statistics Statistics is the art and science of gathering, analyzing, and making inferences from numerical information

More information

Basic Statistical Terms and Definitions

Basic Statistical Terms and Definitions I. Basics Basic Statistical Terms and Definitions Statistics is a collection of methods for planning experiments, and obtaining data. The data is then organized and summarized so that professionals can

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

Preparing for Data Analysis

Preparing for Data Analysis Preparing for Data Analysis Prof. Andrew Stokes March 21, 2017 Managing your data Entering the data into a database Reading the data into a statistical computing package Checking the data for errors and

More information

MAT 142 College Mathematics. Module ST. Statistics. Terri Miller revised July 14, 2015

MAT 142 College Mathematics. Module ST. Statistics. Terri Miller revised July 14, 2015 MAT 142 College Mathematics Statistics Module ST Terri Miller revised July 14, 2015 2 Statistics Data Organization and Visualization Basic Terms. A population is the set of all objects under study, a sample

More information