Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

Similar documents
Table of Contents (As covered from textbook)

Chapter 3 - Displaying and Summarizing Quantitative Data

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures

STA Module 2B Organizing Data and Comparing Distributions (Part II)

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

AND NUMERICAL SUMMARIES. Chapter 2

Chapter 2 Describing, Exploring, and Comparing Data

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Chapter 5. Understanding and Comparing Distributions. Copyright 2012, 2008, 2005 Pearson Education, Inc.

MATH& 146 Lesson 10. Section 1.6 Graphing Numerical Data

UNIT 1A EXPLORING UNIVARIATE DATA

Chapter 1. Looking at Data-Distribution

Averages and Variation

Understanding and Comparing Distributions. Chapter 4

AP Statistics Summer Assignment:

Chapter 2 Modeling Distributions of Data

STA Module 4 The Normal Distribution

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves

Chapter 5. Understanding and Comparing Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

No. of blue jelly beans No. of bags

1.3 Graphical Summaries of Data

Chapter 6: DESCRIPTIVE STATISTICS

Name: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution

STA 570 Spring Lecture 5 Tuesday, Feb 1

appstats6.notebook September 27, 2016

Frequency Distributions

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

Measures of Central Tendency. A measure of central tendency is a value used to represent the typical or average value in a data set.

Learning Log Title: CHAPTER 7: PROPORTIONS AND PERCENTS. Date: Lesson: Chapter 7: Proportions and Percents

Univariate Statistics Summary

Chapter 6: Comparing Two Means Section 6.1: Comparing Two Groups Quantitative Response

TMTH 3360 NOTES ON COMMON GRAPHS AND CHARTS

Descriptive Statistics

Measures of Central Tendency

MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation

Unit 7 Statistics. AFM Mrs. Valentine. 7.1 Samples and Surveys

CHAPTER 2: SAMPLING AND DATA

Section 1.2. Displaying Quantitative Data with Graphs. Mrs. Daniel AP Stats 8/22/2013. Dotplots. How to Make a Dotplot. Mrs. Daniel AP Statistics

STP 226 ELEMENTARY STATISTICS NOTES PART 2 - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES

15 Wyner Statistics Fall 2013

Section 3.1 Shapes of Distributions MDM4U Jensen

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- #

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Middle Years Data Analysis Display Methods

3 Graphical Displays of Data

Lecture 6: Chapter 6 Summary

Center, Shape, & Spread Center, shape, and spread are all words that describe what a particular graph looks like.

DAY 52 BOX-AND-WHISKER

The main issue is that the mean and standard deviations are not accurate and should not be used in the analysis. Then what statistics should we use?

Measures of Central Tendency

Math 167 Pre-Statistics. Chapter 4 Summarizing Data Numerically Section 3 Boxplots

Name Date Types of Graphs and Creating Graphs Notes

a. divided by the. 1) Always round!! a) Even if class width comes out to a, go up one.

Chapter 2 - Graphical Summaries of Data

CHAPTER 2 DESCRIPTIVE STATISTICS

2.1: Frequency Distributions and Their Graphs

Chapter2 Description of samples and populations. 2.1 Introduction.

M7D1.a: Formulate questions and collect data from a census of at least 30 objects and from samples of varying sizes.

CHAPTER 3: Data Description

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student

3 Graphical Displays of Data

CHAPTER-13. Mining Class Comparisons: Discrimination between DifferentClasses: 13.4 Class Description: Presentation of Both Characterization and

Chapter 5: The standard deviation as a ruler and the normal model p131

Probability and Statistics. Copyright Cengage Learning. All rights reserved.

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

Bar Graphs and Dot Plots

Week 2: Frequency distributions

Section 2-2 Frequency Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc

Chpt 3. Data Description. 3-2 Measures of Central Tendency /40

Math 214 Introductory Statistics Summer Class Notes Sections 3.2, : 1-21 odd 3.3: 7-13, Measures of Central Tendency

Lecture 3 Questions that we should be able to answer by the end of this lecture:

CHAPTER 2 Modeling Distributions of Data

Chapter 3 Analyzing Normal Quantitative Data

AP Statistics Prerequisite Packet

Lecture 3 Questions that we should be able to answer by the end of this lecture:

Probability and Statistics. Copyright Cengage Learning. All rights reserved.

Day 4 Percentiles and Box and Whisker.notebook. April 20, 2018

Descriptive Statistics

Chapter 3: Describing, Exploring & Comparing Data

Section 9: One Variable Statistics

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Measures of Dispersion

Exploratory Data Analysis

The first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies.

Homework Packet Week #3

Descriptive Statistics, Standard Deviation and Standard Error

1.2. Pictorial and Tabular Methods in Descriptive Statistics

Lecture Notes 3: Data summarization

MAT 110 WORKSHOP. Updated Fall 2018

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016)

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.

Chapter 3: Data Description - Part 3. Homework: Exercises 1-21 odd, odd, odd, 107, 109, 118, 119, 120, odd

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

MATH NATION SECTION 9 H.M.H. RESOURCES

Stat Day 6 Graphs in Minitab

Chapter 2: Descriptive Statistics

10.4 Measures of Central Tendency and Variation

Transcription:

5-number summary 68-95-99.7 Rule Area principle Bar chart Bimodal Boxplot Case Categorical data Categorical variable Center Changing center and spread Conditional distribution Context Contingency table Data Data table A 5-number summary for a variable consists of the minimum, the lower quartile, the median, the upper quartile, and the maximum. In a normal model, about 68% of values fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations of the mean, and about 99.7% fall within 3 standard deviations of the mean. In a statistical display, each data value should be represented by the same amount of area. Bar charts show a bar representing the count of each category in a categorical variable. Distributions with two modes. A boxplot displays the 5-number summary as a central box with whiskers that extend to the non-outlying data values. Boxplots are particularly effective for comparing groups of different sizes. A case is an individual about whom or which we have data. Before making a bar or pie chart, be sure that the categorical data is in counts or percentages of individuals. A variable that names categories (whether with words or numerals) is called categorical. A value that attempts the impossible by summarizing the entire distribution with a single number, a "typical" value. We summarize the center of a distribution with the mean or the median. Changing the center and spread of a variable is equivalent to changing its units. The distribution of a variable restricting the Who to consider only a smaller group of individuals is called a al distribution. The context ideally tells Who was measured, What was measured, How the data were collected, Where the data were collected, and When and Why the study was performed. A contingency table displays counts and, sometimes, percentages of individuals falling into named categories on two or more variables. The table categorizes the individuals on all variables at once, to reveal possible patterns in one variable that may be contingent on the category of the other. Systematically recorded information, whether numbers, or labels, together with its context. An arrangement of data in which each row represents a case and each column represents a variable.

Distribution Dotplot Experimental units The distribution of a variable gives the possible values of the variable and the relative frequency of each value. A dotplot graphs a dot for each case against a single axis. Animals, plants, web sites, and other inanimate objects upon whom we experiment Frequency table Gaps Histogram Independence Interquartile range (IQR) Lower Quartile Marginal distribution Mean Median Midrange Modes Multimodal Nearly normal Normal model Normal percentiles Normal probability plot Normality assumption A frequency table lists the categories in a categorical variable and gives the count of observations for each category. Regions of a histogram that have no values for a given data set. A histogram uses adjacent bars to show the distribution of values in a quantitative variable. Each bar represents the frequency of values falling in an interval of values. Variables are said to be independent if the al distribution of one variable is the same for each category of the other. The difference between the first and third quartile (IQR = Q3 - Q1). The lower quartile (Q1) is the value with a quarter of the data below it (the median of the lower half of the data set). In a contingency table, the distribution of either variable alone is called the marginal distribution. The counts or percentages are the totals found in the margins (last row or column) of the table. The mean is found by summing all the data values and dividing by the count. The median is the middle value of an organized data set, with half of the data above and half below it. The mean of the minimum and maximum values of a set of data. A hump or local high point in the shape of the distribution of a variable is called a "mode." The apparent location of modes can change as the scale of a histogram is changed. Distributions with more than two modes. The shape of the distribution of a data set is unimodal and symmetric. A useful family of models for unimodal, symmetric distributions. The normal percentile corresponding to a z-score gives the percentage of values in a standard normal distribution found at that z-score or below. A display to help assess whether a distribution of data is approximately normal. If the plot is nearly straight, the data satisfy the nearly normal. When using a normal model, we make the assumption that the distribution of the data is normal.

Outliers Parameter Percentages Percentiles Pie chart Proportion Quantitative data Quantitative variable Quartiles Range Records Re-express / Transform bar chart histogram table Rescaling Respondents Segmented bar chart Outliers are extreme values that don't appear to belong with the rest of the data. They may be unusual values that deserve further investigation, or just mistakes; there's no obvious way to tell. Don't delete outliers automatically - you have to think about them. Outliers can affect many statistical analyses, so you should always be alert for them. A numerically valued attribute of a model. For example, the values of μ and σ in a N (μ, σ) model are parameters. Multiplying proportions by 100 to express as percentages, or ratios compared to a total of 100. The ith percentile is the number that falls above i% of the data. Pie charts show how a "whole" divides into categories by showing a wedge of a circle whose area corresponds to the proportion in each category. A comparison of numbers. For our use, dividing the counts by the total number of cases. The data are values of a quantitative variable whose units are known. A variable in which the numbers act as numerical values is called quantitative. Quantitative variables always have units. The median and the quartiles divide data into four equal parts. The difference between the lowest and highest values in a data set (maximum - minimum). The rows in a database, usually identifying cases Applying a simple function to a set of data to make a skewed distribution more symmetric. A bar chart that shows the proportion of people in each category rather than counts. A histogram uses adjacent bars to show the distribution of values in a quantitative variable. Each bar represents the relative frequency of values falling in an interval of values. A frequency table lists the categories in a categorical variable and gives the percentage of observations for each category. Multiplying each data value by a constant multiplies both the measures of position (mean, median, and quartiles) and the measures of spread (standard deviation and IQR) by that constant. Individuals who answer a survey A bar chart where each bar is treated as the "whole" and divides the bar proportionally into segments corresponding to the percentage in each group.

Shape Shifting Simpson's paradox Skewed Spread Standard deviation Standard normal model / distribution Standardized value Standardizing Statistic Stem-and-leaf display Unit 1 Data Distributions To describe the shapes of a distribution, look for single versus multiple modes and symmetry versus skewness. Adding a constant to each data value adds the same constant to the mean, the median, and the quartiles, but does not change the standard deviation or IQR. When averages are taken across different groups, they can appear to contradict the overall averages. This is known as "Simpson's paradox." A distribution is skewed if it's not symmetric and one tail stretches out farther than the other. Distributions are said to be skewed left when the longer tail stretches to the left, and skewed right when it goes to the right. A numerical summary of how tightly the values are clustered around the "center." We summarize the spread of a distribution with the standard deviation, interquartile range, and range. The standard deviation is the square root of the variance. A normal model, N (μ, σ), with mean μ = 0 and standard deviation σ = 1. A value found by subtracting the mean and dividing by the standard deviation. We standardize the eliminate units. Standardized values can be compared and combined even if the original variables had different units and magnitudes. A value calculated from data to summarize aspects of the data. A stem-and-leaf display shows quantitative data values in a way that sketches the distribution of the data. Subjects/Participants People on whom we experiment Symmetric Tails Timeplot Uniform Unimodal Units Upper Quartile A distribution is symmetric if the two halves on either side of the center look approximately like mirror images of each other. The tails of a distribution are the parts that typically trail off on either side. Distributions can be characterized as having long tails (if they straggle off for some distance) or short tails (if they don't). A timeplot displays data that change over time. Often, successive values are connected with lines to show trends more clearly. A distribution that's roughly flat. Having one mode. This a useful term for describing the shape of a histogram when it's generally mound-shaped. A quantity or amount adopted as a standard of measurement, such as dollars, hours, or grams. The upper quartile (Q3) is the value with the a quarter of the data above it (the median of the upper half of the data set).

Variable Variance z-score A variable holds information about the same characteristic for many cases. The variance is the sum of squared deviations from the mean, divided by the count minus one. A z-score tells how many standard deviations a value is from the mean; z-scores have a mean of zero and a standard deviation of one.