Chapter 2. Description of samples and populations. 2.1 Introduction.


Statistics = the science of analyzing data. The information collected (data) is recorded in terms of variables: characteristics of a subject that can be assigned a numerical value or a nonnumerical category. The data itself and its transformed forms are also called statistics.

Types of variables:
1. Categorical variable: records the category a subject belongs to, such as Blood Type (O, A, B, AB) or Gender (Female, Male). Usually the categories have no meaningful order. Some categorical data are ordinal, meaning a natural order exists, for example response to a treatment: none, partial, complete.
2. Quantitative (numeric) variable: records an amount or a count of something. It can be continuous, with values on a continuous scale (weight of a newborn, cholesterol content in a blood specimen), or discrete, where the values can be listed and are often integers (number of eggs in a nest, number of bacteria in a petri dish).
The distinction between discrete and continuous variables is not rigid, since we often round measurements to the nearest integer.

Sample = a collection of persons or things on which we measure one or more variables. Sometimes the same word is used in a different context (for example, a sample of blood taken from a subject). To avoid confusion we will say a specimen of blood in that case.

Some other vocabulary and notation:
Example. Twenty students reported their gender, blood type and weight to a researcher. The students are the observational units. The variables are Gender and Blood Type (both categorical) and Weight (numerical). The sample size is n = 20.
We will use capital letters such as X and Y for the names of variables and lower case letters (x or y) for particular observations. For example, we may use Y = weight of a student and $y_1 = 150$ lb as the weight of one such student (John).

Frequency distributions.
Once data are collected, it helps to summarize them in tables and/or graphs in order to make sense of them. We will use some example data sets to examine different ways data can be displayed.

Example 1: Sample of Blood Type for 21 people:
A O A AB O B AB A O A O AB O A O B A AB A O A

We can summarize it using a frequency and relative frequency table.
Frequency = count in a particular class.
Relative frequency = frequency / n.
% frequency = relative frequency × 100%.
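The definitions above can be tried directly on the blood type sample. The following is a minimal sketch, not part of the original notes; the function name `frequency_table` is an illustrative choice.

```python
from collections import Counter

# Blood types of the 21 people in Example 1, in the order listed in the notes.
blood_types = "A O A AB O B AB A O A O AB O A O B A AB A O A".split()


def frequency_table(values):
    """Return {class: (frequency, relative frequency)} for categorical data."""
    n = len(values)
    counts = Counter(values)
    # frequency = count in a class; relative frequency = frequency / n
    return {cls: (f, f / n) for cls, f in sorted(counts.items())}


table = frequency_table(blood_types)
for cls, (f, rel) in table.items():
    # % frequency = relative frequency * 100%
    print(f"{cls:<3} {f:>3} {rel:6.3f} {100 * rel:5.1f}%")
```

Sorting the classes alphabetically is only a convenience; for categorical data the classes may be listed in any order.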

Frequency table results for Blood Type (values computed from the sample in Example 1):

Blood Type   Frequency   Relative Frequency
A            8           0.381
AB           4           0.190
B            2           0.095
O            7           0.333

Notice that all frequencies add up to n = 21 and all relative frequencies add up to 1 (or 100%), apart from rounding. A graphical display of these data is a bar chart. Notice that the classes do not have to be placed in any particular order.

Example 2: US Solid Waste Weight (Pie Chart)
Table columns: Material, Weight (million tons), Percent of Total.
Materials: Food Scraps; Glass; Metals; Paper, Paperboard; Plastics; Rubber, Leather, Textiles; Wood; Yard Trimmings; Other; Totals.
[Table values not fully recovered. Surviving percentages, in order of appearance: 37.4%, 10.7%, 6.8%, 5.5%, 11.9%, 3.2%.]

Missing frequency = 7.6; the missing relative frequencies are 5.5% and 7.8%. To find the size of each slice (its central angle in degrees), multiply 360 by the relative frequency.

Example 3: 40 couples, number of children in each family. [Data values not recovered.]
These data can be grouped by single values, since there are relatively few distinct data values. Our classes, in order, will be 0, 1, 2, 3, 4, 5; the frequencies are computed exactly as in Example 1.
Frequency table results for Number of children (columns: Number of children, Frequency, Relative Frequency): [table values not recovered]
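For the pie chart of Example 2, the slice rule (multiply 360 by the relative frequency) can be sketched as below. This is an illustrative snippet rather than part of the notes; `slice_angle` is a hypothetical helper name, and the two inputs are the relative frequencies 5.5% and 7.8% quoted in the text.

```python
def slice_angle(relative_frequency):
    """Central angle, in degrees, of the pie slice for one class."""
    return 360 * relative_frequency


# The two relative frequencies quoted in the text: 5.5% and 7.8%.
for rel in (0.055, 0.078):
    print(f"{rel:.1%} -> {slice_angle(rel):.1f} degrees")
```

Because the relative frequencies add up to 1, the slice angles add up to 360 degrees.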

The graphical display of such data is called a histogram: a bar is raised over each class, with the class value at the middle of the bar.

Another way to display such data is a dotplot. You place a dot over each data value. If a value is repeated, you stack multiple equally spaced dots above it.

A grouped frequency distribution is appropriate for a data set with many different values, as in the following example.

Example 4: Age of onset of diabetes (35 people). [Data values not recovered.]
If we decide to start at 0 and use groups of width 10, we get the classes [0,10), [10,20), [20,30) and so on. Read these as interval notation: [0,10) includes 0 but excludes 10.
A histogram for these data can also be obtained, with a bar raised over each class. The vertical axis can represent either frequency or relative frequency.

We can also obtain a quick histogram, otherwise called a stem-and-leaf diagram (or a stemplot). Each data point is divided into a stem and a leaf; all possible stems are placed vertically and the leaves are added to them in order. In our stemplot the leaves are ordered, the stems are tens and the leaves are ones.
[Stemplot not recovered.]

How to make a stemplot:
1. Separate each observation into a stem, consisting of all but the final (rightmost) digit (a stem can have 1, 2, or more digits), and a leaf, the final digit (a leaf has only one digit).
2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line to the right of this column.
3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.

Example 5: Radish growth (mm in 3 days). A: in the dark. B: 12 hours of light / 12 hours of dark. [Data values not recovered.]
Side-by-side stemplots (with two lines per stem) let us compare both sets. In both, the stems are tens and the leaves are ones.
[Side-by-side stemplots for A and B not recovered.]
Stemplot with two lines per stem: the number of stems can be doubled by splitting each stem in two, one line with leaves 0 to 4 and the other with leaves 5 to 9.

Interpreting areas of the histogram: the area of each bar of the histogram is proportional to the corresponding frequency. In Example 4 the area between 10 and 30 (2 bars) equals 3/35 ≈ 8.6% of the total area of the histogram. We can also draw a histogram using a density scale, where the height of each bar is $\frac{f/n}{\text{class width}}$; then the total area of the histogram is 1 (or 100%).
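The three stemplot steps listed above can be sketched in code. This is a minimal illustration under stated assumptions: it handles non-negative integers only, uses one line per stem, and the data in `hypothetical_data` are made up for the demonstration (they are not from the notes).

```python
from collections import defaultdict


def stemplot(data):
    """Return the rows of a stemplot for non-negative integer data."""
    leaves = defaultdict(list)
    for value in sorted(data):          # sorting keeps the leaves in increasing order
        stem, leaf = divmod(value, 10)  # step 1: stem = all but the last digit
        leaves[stem].append(leaf)
    rows = []
    # Step 2: list every stem from smallest to largest, including empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        # Step 3: write the leaves to the right of the vertical line.
        row_leaves = "".join(str(leaf) for leaf in leaves[stem])
        rows.append(f"{stem:>2} | {row_leaves}")
    return rows


hypothetical_data = [12, 25, 31, 18, 25, 40, 33, 37]  # illustrative values only
for row in stemplot(hypothetical_data):
    print(row)
```

Keeping empty stems in the display preserves the shape of the distribution, which is what makes a stemplot read like a histogram turned on its side.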

Example 6: The amounts of iron intake, in milligrams, during a 24-hour period for a sample of 30 females under the age of [age and data values not recovered].
In this example we may select groups of width 2, namely [9,11), [11,13), [13,15) and so on. We get 6 classes, an appropriate number for a data set of 30 observations.

Example 7: Weight data (in pounds) in an Intro. Stats class:
100, 105, 111, 115, 118, 118, 119, 120, 125, 125, 128, 128, 129, 130, 133, 135, 135, 138, 138, 140, 140, 145, 146, 150, 155, 158, 160, 162, 164, 165, 167, 171, 175, 178, 180, 180, 182, 185, 185, 187, 189, 190, 190, 193, 194, 195, 200, 205, 210, 215, 230, 270
We can clearly observe two prominent peaks: the data are bimodal.
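A grouped frequency distribution for the weight data of Example 7 can be sketched as follows. The starting point (100) and class width (20) are assumptions made for this illustration, not choices stated in the notes, and `grouped_frequencies` is a hypothetical helper for integer data. The last column is the density, relative frequency divided by class width, so that the bar areas add up to 1.

```python
weights = [
    100, 105, 111, 115, 118, 118, 119, 120, 125, 125, 128, 128, 129,
    130, 133, 135, 135, 138, 138, 140, 140, 145, 146, 150, 155, 158,
    160, 162, 164, 165, 167, 171, 175, 178, 180, 180, 182, 185, 185,
    187, 189, 190, 190, 193, 194, 195, 200, 205, 210, 215, 230, 270,
]


def grouped_frequencies(data, start, width):
    """Tabulate integer data into classes [start, start + width), and so on."""
    n = len(data)
    n_classes = (max(data) - start) // width + 1
    counts = [0] * n_classes
    for y in data:
        counts[(y - start) // width] += 1  # index of the class containing y
    table = []
    for k, f in enumerate(counts):
        lo = start + k * width
        rel = f / n
        table.append((lo, lo + width, f, rel, rel / width))  # last entry: density
    return table


table = grouped_frequencies(weights, start=100, width=20)
for lo, hi, f, rel, density in table:
    print(f"[{lo},{hi}) {f:>2} {rel:.3f} {'*' * f}")
```

With these classes, [120,140) and [180,200) each hold 12 observations while the classes between them hold fewer, which is the two-peak pattern described in the text.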

Describing the distribution of sample data: Modality, Shapes, Symmetry, and Skewness.

Modality:
Unimodal - has one peak, e.g. bell-shaped, triangular, reverse J-shaped, J-shaped, right skewed, left skewed.
Bimodal - has two peaks (technically, all peaks should be the same height; this is not so in practice).
Multimodal - has 3 or more peaks.

Symmetry and Skewness:
Symmetry - the property of a distribution that it can be divided into 2 parts that are mirror images of each other. The symmetry does not have to be exact. E.g. bell-shaped, triangular, uniform.
Non-symmetric distribution - reverse J-shaped, J-shaped, right skewed, left skewed.

The distribution of population data is called the population distribution, or the distribution of the variable. The distribution of sample data is a sample distribution. The distribution of a random sample from a population approximates the population distribution, and larger samples give a better approximation.

[Figure: Shapes of distributions: right skewed, left skewed, symmetric.]

2.3 Descriptive Measures of Center

Let Y be a numerical variable.

Median = middle of the ordered data. The position (location) of the median is $(n+1)/2$, where n = sample size.
Example: weight gain in pounds for 6 young lambs [data values not recovered]. The position is 0.5(6+1) = 3.5 (the median lies between observations #3 and #4), so median = (10+11)/2 = 10.5 lb. If we add one more observation, 10 lb, the position becomes 0.5(7+1) = 4 (the median is observation #4), so median = 10 lb.
The median is a robust (resistant) measure of center: it is relatively unaffected by changes in a small portion of the data.

Mean (arithmetic mean): $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, where the $y_i$ are the observations in the sample. In our example $\bar{y} = 56/6 \approx 9.33$ lb.
The differences $(y_i - \bar{y})$ are called deviations from the mean, and their sum is zero for any data set: $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$. In our example the deviations, one of which is -7.33, sum to 0.
The mean can be visualized as the point of balance of a weightless seesaw with the data points (like children) sitting on it.
Unlike the median, the mean is not robust: it is influenced by any change in the data, and very strongly by extreme values. If the data have some extreme values, then the median is a better measure of center for that data.
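The position rule for the median, the mean, and the zero-sum property of deviations can be checked with a short sketch. The lamb data are not reproduced in the notes, so the list `gains` below is a hypothetical data set chosen only to agree with the quoted results (sum 56, median 10.5 lb, one deviation of about -7.33).

```python
def median(data):
    """Median located at position (n + 1) / 2 in the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) / 2              # 1-based location of the median
    if position.is_integer():
        return ordered[int(position) - 1]
    lower = ordered[int(position) - 1]  # observation just below the position
    upper = ordered[int(position)]      # observation just above the position
    return (lower + upper) / 2


def mean(data):
    """Arithmetic mean: sum of the observations divided by n."""
    return sum(data) / len(data)


gains = [11, 13, 19, 2, 10, 1]  # hypothetical values, consistent with the text
y_bar = mean(gains)
deviations = [y - y_bar for y in gains]

print("median:", median(gains))
print("mean:", round(y_bar, 2))
print("sum of deviations:", round(sum(deviations), 10))
print("median with one more 10 lb observation:", median(gains + [10]))

# Robustness: replacing the largest value by an extreme one moves the mean only.
extreme = [11, 13, 190, 2, 10, 1]
print("median:", median(extreme), "mean:", round(mean(extreme), 2))
```

The last two lines illustrate why the median is called resistant: the extreme value changes the mean substantially and leaves the median where it was.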

Mean vs. median:
Right skewed distribution: mean > median.
Left skewed distribution: mean < median.
Symmetric distribution: mean = median.

2.4 Boxplots.

Single-variable data may be summarized by 5 numbers: the minimum, the maximum, the median and the 2 quartiles, together referred to as the five-number summary. These values are also used to make a boxplot. The lower quartile, denoted by $Q_1$, is the median of the lower half of the data; the upper quartile, denoted by $Q_3$, is the median of the upper half of the data.

Example 1: The data represent systolic blood pressure (in mmHg) of 7 adult males. [Data values not recovered.] We order the data first: Min = 113, Max = 170, Median = 132, $Q_1$ = 124, $Q_3$ = 151. (The median is excluded when we compute the quartiles.)

A boxplot connects all 5 numbers; the box represents the middle half of the data. [Boxplot not recovered.]

Another measure we can compute is the interquartile range, IQR = $Q_3 - Q_1$. This measure gives the spread of the middle half of the data values. We can use it to find unusual data points (outliers). The procedure is as follows:

Compute lower fence = $Q_1 - 1.5 \times \text{IQR}$ and upper fence = $Q_3 + 1.5 \times \text{IQR}$. An outlier is a data point that falls outside of the fences.
In our example: IQR = 151 - 124 = 27, 1.5(IQR) = 1.5 × 27 = 40.5, lower fence = 124 - 40.5 = 83.5, upper fence = 151 + 40.5 = 191.5. All observations are within the fences, so there are no outliers in our data set.

Example 2: Radish growth (in mm) in the light. Min = 4, Max = 21, $Q_1$ = 7, Median = (9+10)/2 = 9.5, $Q_3$ = 10, IQR = 3, lower fence = 2.5, upper fence = 14.5, so 20 and 21 are outliers. A modified boxplot exposes the outliers by plotting them as separate points (*).

Relationship between variables.

This section discusses various ways to compare two or more variables. Some methods include:
a) Two-way frequency and relative frequency tables, to examine the relationship between two categorical variables. They are useful for determining whether the variables are associated.
b) Scatter plots for numerical variables, to decide whether a linear trend is present, so that we can fit a regression line to the data.
c) Side-by-side boxplots, dotplots and stemplots, which are useful for observing differences between two or more treatments.

2.6 Measures of dispersion (variability)

Range = Maximum - Minimum. It gives the overall spread of the data and is easy to calculate, but it is very sensitive to extreme data values.
The IQR, as stated before, gives the range of the middle half of the data and is a robust measure, not sensitive to extreme data values.
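The five-number summary and the fence rule from the boxplot section can be sketched as follows. The blood pressure readings are not reproduced in the notes, so `pressures` is a hypothetical data set chosen to match the quoted summary (Min = 113, $Q_1$ = 124, Median = 132, $Q_3$ = 151, Max = 170). The quartile convention is the one stated in the text: the median is left out of both halves when n is odd.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max)."""
    ordered = sorted(data)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # excludes the median when n is odd
    upper_half = ordered[(n + 1) // 2:]
    return (
        ordered[0],
        median_of_sorted(lower_half),
        median_of_sorted(ordered),
        median_of_sorted(upper_half),
        ordered[-1],
    )


def fences_and_outliers(data):
    """Return ((lower fence, upper fence), list of outliers)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [y for y in data if y < lower_fence or y > upper_fence]
    return (lower_fence, upper_fence), outliers


pressures = [151, 124, 132, 170, 146, 124, 113]  # hypothetical values
print(five_number_summary(pressures))
print(fences_and_outliers(pressures))
```

Other textbooks and software packages use slightly different quartile conventions, so results from a library routine may differ from this hand method on small samples.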

Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$. It averages the squared deviations from the mean (dividing by n-1). The square root is taken at the end, so the units of s are the same as the units of the data. $s \ge 0$, and s = 0 if all data points are the same. $s^2$ is the sample variance. We will abbreviate standard deviation as SD; s will be used in the formulas.

Example: In an experiment on chrysanthemums, a botanist measured stem elongation in 7 days (in mm): 76, 72, 65, 70, 82. Here n = 5 and $\bar{y}$ = 365/5 = 73. The deviations from the mean are 3, -1, -8, -3, 9 and the squared deviations are 9, 1, 64, 9, 81, so $s = \sqrt{(9+1+64+9+81)/4} = \sqrt{164/4} \approx 6.40$ mm, and the variance is $s^2 = 41$ mm$^2$.

s gives the typical distance of the observations from the mean; a larger s means more variability. Like the mean, s is influenced by extreme data values (it is not a robust measure).
n-1 = degrees of freedom of s. As an intuitive justification for using (n-1) rather than n, consider n = 1: the variability of 1 observation cannot be computed, because one data point gives no information about variability.

The coefficient of variation = s expressed as a percentage of the mean: coefficient of variation = $\frac{s}{\bar{y}} \times 100\%$. It has no units and can be used to compare data sets with different units, for example:
Example: Weight and height are measured for girls at age 2. Which of the two measures has greater variability? Weight: mean = 12.6 kg, SD = 1.4 kg. Height: mean = 86.6 cm, SD = 2.9 cm. The coefficient of variation is 11.1% for weight and 3.3% for height. We conclude that weight is more variable: its SD is a much larger percentage of the mean than it is for height.
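The chrysanthemum calculation and the coefficient of variation comparison can be reproduced with a short sketch. The function names are illustrative choices; the data and summary values are the ones given in the text.

```python
import math


def sample_sd(data):
    """Sample standard deviation with n - 1 in the denominator."""
    n = len(data)
    y_bar = sum(data) / n
    sum_sq_dev = sum((y - y_bar) ** 2 for y in data)  # sum of squared deviations
    return math.sqrt(sum_sq_dev / (n - 1))


def coefficient_of_variation(sd, mean):
    """SD expressed as a percentage of the mean (unit-free)."""
    return 100 * sd / mean


elongation = [76, 72, 65, 70, 82]  # stem elongation in mm, from the example
s = sample_sd(elongation)
print(f"s = {s:.2f} mm, variance = {s ** 2:.0f} mm^2")

# Girls at age 2: weight (kg) versus height (cm).
print(f"weight CV = {coefficient_of_variation(1.4, 12.6):.1f}%")
print(f"height CV = {coefficient_of_variation(2.9, 86.6):.1f}%")
```

The standard library function `statistics.stdev` uses the same n - 1 denominator and could be used in place of `sample_sd`.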

Typical Percentages: The Empirical Rule

For a nice distribution (fairly symmetric, unimodal, with no very long or very short tails) we expect to find:
about 68% of all data points within the interval $(\bar{y} - SD, \bar{y} + SD)$
about 95% of all data points within the interval $(\bar{y} - 2SD, \bar{y} + 2SD)$
more than 99% of all data points within the interval $(\bar{y} - 3SD, \bar{y} + 3SD)$

2.8 Effect of Transformation of Variables

Sometimes when we work with a data set it is convenient to transform our variable(s). For example, we may want to change units, or to turn very small numbers that appear in scientific notation into something easier to use by multiplying the original data by 10,000.

The linear transformation is the simplest one. Let Y be the original variable with mean $\bar{y}$ and SD s. Then $Y' = aY + b$ is its linear transformation; the mean and SD of Y' are $\bar{y}'$ and $s'$ respectively. This type of transformation does not change the essential shape of the distribution of Y: the histogram of the transformed variable can be made identical to the original histogram by suitable scaling of the horizontal axis.

How does a linear transformation affect the mean and SD? Only the mean (not s) is affected by an additive transformation (adding a positive or negative constant b to Y), but both the mean and the SD are affected by multiplying Y by a positive or negative constant a:
$\bar{y}' = a\bar{y} + b$ and $s' = |a| s$

Example: Suppose Y = summer temperature in some American city in 2013 in °F, with $\bar{y}$ = 79.6 °F and s = 12.7 °F. To change Y to °C, the transformation is $Y' = (Y - 32) \cdot \frac{5}{9} = \frac{5}{9} Y - \frac{160}{9}$, so the new mean is $\bar{y}' = \frac{5}{9}(79.6 - 32) = 26.44$ °C and the new SD is $s' = \frac{5}{9}(12.7) = 7.06$ °C.

Nonlinear transformations such as $Y' = \sqrt{Y}$, $Y' = \log Y$, $Y' = 1/Y$ and $Y' = Y^2$ can affect data in complex ways, and they do change the essential shape of the frequency distribution. If the distribution is right skewed, for example, and we wish to make it more symmetric, we can apply the square root transformation to pull in the right-hand tail and push out the left-hand tail. The logarithmic transformation produces an even more drastic change in that regard (see the histograms given at the end of this section).
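The linear-transformation rules of this section, $\bar{y}' = a\bar{y} + b$ and $s' = |a| s$, can be checked numerically. The sketch below first applies the rules to the summary values of the temperature example, then verifies them on raw data; the list `temps_f` is hypothetical and serves only to confirm that transforming the data and transforming the summaries agree.

```python
import statistics


def transform_summary(mean, sd, a, b):
    """Mean and SD of Y' = a*Y + b, given the mean and SD of Y."""
    return a * mean + b, abs(a) * sd


# Fahrenheit to Celsius: Y' = (Y - 32) * 5/9 = (5/9) * Y - 160/9
a, b = 5 / 9, -160 / 9
new_mean, new_sd = transform_summary(79.6, 12.7, a, b)
print(f"mean = {new_mean:.2f} C, SD = {new_sd:.2f} C")

# Check on raw data (hypothetical temperatures in degrees Fahrenheit).
temps_f = [62.0, 71.5, 79.0, 84.5, 90.0, 95.5]
temps_c = [a * y + b for y in temps_f]
expected = transform_summary(
    statistics.mean(temps_f), statistics.stdev(temps_f), a, b
)
print(statistics.mean(temps_c), statistics.stdev(temps_c))
print(expected)
```

The absolute value matters only when a is negative: the SD is never negative, so multiplying the data by -2, for instance, doubles the SD rather than making it negative.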

2.8. Statistical Inference

Statistical inference is the process of drawing conclusions about the population based on the observations in the sample. We can, for example, estimate the percentage of all people in England with blood type A as 44% (the sample proportion of people with that blood type). The sample must be considered a random sample from the entire population and must be representative of that population.

44% is a statistic (the sample proportion $\hat{p} = \frac{y}{n}$, "p hat") that estimates a parameter of the population (the population proportion p). There are also other statistics we can use to estimate a population proportion, namely $\tilde{p} = \frac{y+2}{n+4}$, "p tilde". In each case y = number of people in the sample that have blood type A and n = sample size. We will discuss these estimates in later chapters.

Other parameters of the population that we often estimate from samples are:
the population mean, μ, which is estimated by the sample mean, $\bar{y}$;
the population SD, σ, which is estimated by the sample SD, s.
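The two estimators of a population proportion can be compared with a small sketch. The notes give only the 44% figure, not the counts behind it, so the values y = 44 and n = 100 below are assumptions chosen to reproduce a sample proportion of 0.44.

```python
def p_hat(y, n):
    """Sample proportion: y successes out of n observations."""
    return y / n


def p_tilde(y, n):
    """Adjusted proportion: add 2 successes and 4 observations."""
    return (y + 2) / (n + 4)


y, n = 44, 100  # hypothetical counts consistent with the quoted 44%
print(f"p hat   = {p_hat(y, n):.4f}")
print(f"p tilde = {p_tilde(y, n):.4f}")
```

The adjustment pulls the estimate slightly toward 0.5, and the two estimates get closer to each other as n grows.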


More information

Measures of Central Tendency. A measure of central tendency is a value used to represent the typical or average value in a data set.

Measures of Central Tendency. A measure of central tendency is a value used to represent the typical or average value in a data set. Measures of Central Tendency A measure of central tendency is a value used to represent the typical or average value in a data set. The Mean the sum of all data values divided by the number of values in

More information

Frequency Distributions

Frequency Distributions Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data so that it is possible to get a general overview of the results. Remember,

More information

Raw Data is data before it has been arranged in a useful manner or analyzed using statistical techniques.

Raw Data is data before it has been arranged in a useful manner or analyzed using statistical techniques. Section 2.1 - Introduction Graphs are commonly used to organize, summarize, and analyze collections of data. Using a graph to visually present a data set makes it easy to comprehend and to describe the

More information

Measures of Central Tendency

Measures of Central Tendency Page of 6 Measures of Central Tendency A measure of central tendency is a value used to represent the typical or average value in a data set. The Mean The sum of all data values divided by the number of

More information

Measures of Dispersion

Measures of Dispersion Lesson 7.6 Objectives Find the variance of a set of data. Calculate standard deviation for a set of data. Read data from a normal curve. Estimate the area under a curve. Variance Measures of Dispersion

More information

15 Wyner Statistics Fall 2013

15 Wyner Statistics Fall 2013 15 Wyner Statistics Fall 2013 CHAPTER THREE: CENTRAL TENDENCY AND VARIATION Summary, Terms, and Objectives The two most important aspects of a numerical data set are its central tendencies and its variation.

More information

+ Statistical Methods in

+ Statistical Methods in 9/4/013 Statistical Methods in Practice STA/MTH 379 Dr. A. B. W. Manage Associate Professor of Mathematics & Statistics Department of Mathematics & Statistics Sam Houston State University Discovering Statistics

More information

1.2. Pictorial and Tabular Methods in Descriptive Statistics

1.2. Pictorial and Tabular Methods in Descriptive Statistics 1.2. Pictorial and Tabular Methods in Descriptive Statistics Section Objectives. 1. Stem-and-Leaf displays. 2. Dotplots. 3. Histogram. Types of histogram shapes. Common notation. Sample size n : the number

More information

Descriptive Statistics, Standard Deviation and Standard Error

Descriptive Statistics, Standard Deviation and Standard Error AP Biology Calculations: Descriptive Statistics, Standard Deviation and Standard Error SBI4UP The Scientific Method & Experimental Design Scientific method is used to explore observations and answer questions.

More information

Lecture 3 Questions that we should be able to answer by the end of this lecture:

Lecture 3 Questions that we should be able to answer by the end of this lecture: Lecture 3 Questions that we should be able to answer by the end of this lecture: Which is the better exam score? 67 on an exam with mean 50 and SD 10 or 62 on an exam with mean 40 and SD 12 Is it fair

More information

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables

More information

STA Module 4 The Normal Distribution

STA Module 4 The Normal Distribution STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally

More information

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally

More information

VCEasy VISUAL FURTHER MATHS. Overview

VCEasy VISUAL FURTHER MATHS. Overview VCEasy VISUAL FURTHER MATHS Overview This booklet is a visual overview of the knowledge required for the VCE Year 12 Further Maths examination.! This booklet does not replace any existing resources that

More information

CHAPTER 2 Modeling Distributions of Data

CHAPTER 2 Modeling Distributions of Data CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Density Curves

More information

Understanding and Comparing Distributions. Chapter 4

Understanding and Comparing Distributions. Chapter 4 Understanding and Comparing Distributions Chapter 4 Objectives: Boxplot Calculate Outliers Comparing Distributions Timeplot The Big Picture We can answer much more interesting questions about variables

More information

Lecture 3 Questions that we should be able to answer by the end of this lecture:

Lecture 3 Questions that we should be able to answer by the end of this lecture: Lecture 3 Questions that we should be able to answer by the end of this lecture: Which is the better exam score? 67 on an exam with mean 50 and SD 10 or 62 on an exam with mean 40 and SD 12 Is it fair

More information

Organizing and Summarizing Data

Organizing and Summarizing Data 1 Organizing and Summarizing Data Key Definitions Frequency Distribution: This lists each category of data and how often they occur. : The percent of observations within the one of the categories. This

More information

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures Part I, Chapters 4 & 5 Data Tables and Data Analysis Statistics and Figures Descriptive Statistics 1 Are data points clumped? (order variable / exp. variable) Concentrated around one value? Concentrated

More information

Chapter 2: The Normal Distributions

Chapter 2: The Normal Distributions Chapter 2: The Normal Distributions Measures of Relative Standing & Density Curves Z-scores (Measures of Relative Standing) Suppose there is one spot left in the University of Michigan class of 2014 and

More information

Exploratory Data Analysis

Exploratory Data Analysis Chapter 10 Exploratory Data Analysis Definition of Exploratory Data Analysis (page 410) Definition 12.1. Exploratory data analysis (EDA) is a subfield of applied statistics that is concerned with the investigation

More information

Descriptive Statistics

Descriptive Statistics Chapter 2 Descriptive Statistics 2.1 Descriptive Statistics 1 2.1.1 Student Learning Objectives By the end of this chapter, the student should be able to: Display data graphically and interpret graphs:

More information

Density Curve (p52) Density curve is a curve that - is always on or above the horizontal axis.

Density Curve (p52) Density curve is a curve that - is always on or above the horizontal axis. 1.3 Density curves p50 Some times the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve. It is easier to work with a smooth curve, because the histogram

More information

ECLT 5810 Data Preprocessing. Prof. Wai Lam

ECLT 5810 Data Preprocessing. Prof. Wai Lam ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate

More information

MATH& 146 Lesson 8. Section 1.6 Averages and Variation

MATH& 146 Lesson 8. Section 1.6 Averages and Variation MATH& 146 Lesson 8 Section 1.6 Averages and Variation 1 Summarizing Data The distribution of a variable is the overall pattern of how often the possible values occur. For numerical variables, three summary

More information

Statistics Lecture 6. Looking at data one variable

Statistics Lecture 6. Looking at data one variable Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)

More information

SLStats.notebook. January 12, Statistics:

SLStats.notebook. January 12, Statistics: Statistics: 1 2 3 Ways to display data: 4 generic arithmetic mean sample 14A: Opener, #3,4 (Vocabulary, histograms, frequency tables, stem and leaf) 14B.1: #3,5,8,9,11,12,14,15,16 (Mean, median, mode,

More information

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016)

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016) CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1 Daphne Skipper, Augusta University (2016) 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is

More information

Chapter 2: Descriptive Statistics

Chapter 2: Descriptive Statistics Chapter 2: Descriptive Statistics Student Learning Outcomes By the end of this chapter, you should be able to: Display data graphically and interpret graphs: stemplots, histograms and boxplots. Recognize,

More information

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 1 2 Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 2. How to construct (in your head!) and interpret confidence intervals.

More information

CHAPTER-13. Mining Class Comparisons: Discrimination between DifferentClasses: 13.4 Class Description: Presentation of Both Characterization and

CHAPTER-13. Mining Class Comparisons: Discrimination between DifferentClasses: 13.4 Class Description: Presentation of Both Characterization and CHAPTER-13 Mining Class Comparisons: Discrimination between DifferentClasses: 13.1 Introduction 13.2 Class Comparison Methods and Implementation 13.3 Presentation of Class Comparison Descriptions 13.4

More information

Chapter 6. THE NORMAL DISTRIBUTION

Chapter 6. THE NORMAL DISTRIBUTION Chapter 6. THE NORMAL DISTRIBUTION Introducing Normally Distributed Variables The distributions of some variables like thickness of the eggshell, serum cholesterol concentration in blood, white blood cells

More information

Measures of Dispersion

Measures of Dispersion Measures of Dispersion 6-3 I Will... Find measures of dispersion of sets of data. Find standard deviation and analyze normal distribution. Day 1: Dispersion Vocabulary Measures of Variation (Dispersion

More information

Chapter 6. THE NORMAL DISTRIBUTION

Chapter 6. THE NORMAL DISTRIBUTION Chapter 6. THE NORMAL DISTRIBUTION Introducing Normally Distributed Variables The distributions of some variables like thickness of the eggshell, serum cholesterol concentration in blood, white blood cells

More information

3 Graphical Displays of Data

3 Graphical Displays of Data 3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked

More information

Lecture Notes 3: Data summarization

Lecture Notes 3: Data summarization Lecture Notes 3: Data summarization Highlights: Average Median Quartiles 5-number summary (and relation to boxplots) Outliers Range & IQR Variance and standard deviation Determining shape using mean &

More information

Section 6.3: Measures of Position

Section 6.3: Measures of Position Section 6.3: Measures of Position Measures of position are numbers showing the location of data values relative to the other values within a data set. They can be used to compare values from different

More information

Sections 2.3 and 2.4

Sections 2.3 and 2.4 Sections 2.3 and 2.4 Shiwen Shen Department of Statistics University of South Carolina Elementary Statistics for the Biological and Life Sciences (STAT 205) 2 / 25 Descriptive statistics For continuous

More information

Chapter 6 Normal Probability Distributions

Chapter 6 Normal Probability Distributions Chapter 6 Normal Probability Distributions 6-1 Review and Preview 6-2 The Standard Normal Distribution 6-3 Applications of Normal Distributions 6-4 Sampling Distributions and Estimators 6-5 The Central

More information

Lecture 6: Chapter 6 Summary

Lecture 6: Chapter 6 Summary 1 Lecture 6: Chapter 6 Summary Z-score: Is the distance of each data value from the mean in standard deviation Standardizes data values Standardization changes the mean and the standard deviation: o Z

More information

Data organization. So what kind of data did we collect?

Data organization. So what kind of data did we collect? Data organization Suppose we go out and collect some data. What do we do with it? First we need to figure out what kind of data we have. To illustrate, let s do a simple experiment and collect the height

More information

Basic Statistical Terms and Definitions

Basic Statistical Terms and Definitions I. Basics Basic Statistical Terms and Definitions Statistics is a collection of methods for planning experiments, and obtaining data. The data is then organized and summarized so that professionals can

More information

STAT STATISTICAL METHODS. Statistics: The science of using data to make decisions and draw conclusions

STAT STATISTICAL METHODS. Statistics: The science of using data to make decisions and draw conclusions STAT 515 --- STATISTICAL METHODS Statistics: The science of using data to make decisions and draw conclusions Two branches: Descriptive Statistics: The collection and presentation (through graphical and

More information

Chapter 2: Modeling Distributions of Data

Chapter 2: Modeling Distributions of Data Chapter 2: Modeling Distributions of Data Section 2.2 The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 2 Modeling Distributions of Data 2.1 Describing Location in a Distribution

More information

Vocabulary: Data Distributions

Vocabulary: Data Distributions Vocabulary: Data Distributions Concept Two Types of Data. I. Categorical data: is data that has been collected and recorded about some non-numerical attribute. For example: color is an attribute or variable

More information

Section 2.2 Normal Distributions. Normal Distributions

Section 2.2 Normal Distributions. Normal Distributions Section 2.2 Normal Distributions Normal Distributions One particularly important class of density curves are the Normal curves, which describe Normal distributions. All Normal curves are symmetric, single-peaked,

More information

IAT 355 Visual Analytics. Data and Statistical Models. Lyn Bartram

IAT 355 Visual Analytics. Data and Statistical Models. Lyn Bartram IAT 355 Visual Analytics Data and Statistical Models Lyn Bartram Exploring data Example: US Census People # of people in group Year # 1850 2000 (every decade) Age # 0 90+ Sex (Gender) # Male, female Marital

More information

Chapter 5: The standard deviation as a ruler and the normal model p131

Chapter 5: The standard deviation as a ruler and the normal model p131 Chapter 5: The standard deviation as a ruler and the normal model p131 Which is the better exam score? 67 on an exam with mean 50 and SD 10 62 on an exam with mean 40 and SD 12? Is it fair to say: 67 is

More information

Chapter 3: Describing, Exploring & Comparing Data

Chapter 3: Describing, Exploring & Comparing Data Chapter 3: Describing, Exploring & Comparing Data Section Title Notes Pages 1 Overview 1 2 Measures of Center 2 5 3 Measures of Variation 6 12 4 Measures of Relative Standing & Boxplots 13 16 3.1 Overview

More information