
What is Data?
Data is a collection of facts. Data can be in the form of numbers, words, measurements, observations, or even just descriptions of things. In most cases, data needs to be interpreted and analyzed to provide useful information. For example, the height of a mountain is a data point. Gathering more data on the landscape and temperatures on the mountain tells us what the mountain area might look like. One could then use the data and information to create a guide on the best way to climb the mountain.

Why is Data important?
Gathering information and data helps people make decisions about topics of interest. It can help identify needs and problems in a community, and it can be used to find solutions to those issues. It can also help you get to know the people around you.

Qualitative versus Quantitative
Data can be qualitative, where it describes something, or quantitative, where it is in number form. Discrete data is counted and continuous data is measured.

An example: What do we know about the elephant?
Qualitative: It is gray. It is large. It does not have fur.
Quantitative: It has four legs (discrete). It has one trunk (discrete). It weighs 7,543.2 kg (continuous). It can be up to 13.5 feet tall (continuous).

Collecting Data
Data can be collected in many different ways. The simplest way is by observing. For example, to find out how many children use the Hello World terminal every day, you would simply sit next to the terminal for the day and count how many children use it.

Survey
Surveys can help answer other questions that might be of interest. Surveys can also help us decide whether things are going well or not. There are four steps to a successful survey:
1) Create the questions
2) Ask the questions
3) Count and analyze the results
4) Present the results

What is Statistics?
Statistics is a way to get information from data: Data -> Statistics -> Information.
Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.
(Definitions: Oxford English Dictionary)
Statistics is a tool for creating new understanding from a set of numbers.
Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Key Statistical Concepts
Population: the group of all items of interest to a statistics practitioner. It is frequently very large, sometimes infinite. E.g. all 5 million Florida voters, per Example 12.5.
Sample: a set of data drawn from the population. It is potentially very large, but smaller than the population. E.g. a sample of 765 voters exit polled on election day.

Key Statistical Concepts
Parameter: a descriptive measure of a population.
Statistic: a descriptive measure of a sample.

Key Statistical Concepts
A sample is a subset of the population. Populations have parameters; samples have statistics.

Descriptive Statistics
Descriptive statistics are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include graphical techniques (Chapter 2) and numerical techniques (Chapter 4). The actual method used depends on what information we would like to extract. Are we interested in measures of central location, measures of variability (dispersion), or both? Descriptive statistics help to answer these questions.

Statistical Inference
Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample. The inference runs from the sample statistic to the population parameter: what can we infer about a population's parameters based on a sample's statistics?

Definitions
A variable is some characteristic of a population or sample, e.g. student grades. It is typically denoted with a capital letter: X, Y, Z.
The values of a variable are the range of possible values for that variable, e.g. student marks (0..100).
Data are the observed values of a variable, e.g. student marks: {67, 74, 71, 83, 93, 55, 48}.

Interval Data
Interval data are real numbers, e.g. heights, weights, prices, etc. They are also referred to as quantitative or numerical. Arithmetic operations can be performed on interval data, so it is meaningful to talk about 2*Height, or Price + $1, and so on.

Nominal Data
The values of nominal data are categories, e.g. responses to questions about marital status, coded as: Single = 1, Married = 2, Divorced = 3, Widowed = 4. Because the numbers are arbitrary, arithmetic operations don't make any sense (e.g. does Widowed ÷ 2 = Married?!). Nominal data are also called qualitative or categorical.

Ordinal Data
Ordinal data appear to be categorical in nature, but their values have an order, a ranking, to them. E.g. a college course rating system: poor = 1, fair = 2, good = 3, very good = 4, excellent = 5. While it's still not meaningful to do arithmetic on this data (e.g. does 2*fair = very good?!), we can say things like excellent > poor or fair < very good. That is, order is maintained no matter what numeric values are assigned to each category.

Graphical & Tabular Techniques for Nominal Data
The only allowable calculation on nominal data is to count the frequency of each value of the variable. We can summarize the data in a table, called a frequency distribution, that presents the categories and their counts. A relative frequency distribution lists the categories and the proportion with which each occurs. Refer to Example 2.1.
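The counting described above can be sketched in a few lines of Python. This is only an illustration: the list of responses is hypothetical (it is not the data of Example 2.1), and the function names are made up for this sketch. The category codes follow the marital-status coding given earlier.

```python
from collections import Counter

# Coding from the marital-status example in the slides.
LABELS = {1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed"}

# Hypothetical survey responses, invented for this sketch.
responses = [1, 2, 2, 3, 1, 2, 4, 2, 1, 2]


def frequency_distribution(values):
    """Count how often each category occurs, ordered by category code."""
    return dict(sorted(Counter(values).items()))


def relative_frequency_distribution(values):
    """Proportion of observations that fall in each category."""
    n = len(values)
    return {code: count / n for code, count in frequency_distribution(values).items()}


freq = frequency_distribution(responses)
rel = relative_frequency_distribution(responses)

# Print a small table: category, count, proportion.
for code in freq:
    print(f"{LABELS[code]:<9} {freq[code]:>3} {rel[code]:>6.1%}")
```

Counting is the only operation applied to the codes themselves, which matches the restriction on nominal data.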

Nominal Data (Tabular Summary)

Nominal Data (Frequency)
Bar charts are often used to display frequencies.

Nominal Data
It's all the same information (based on the same data), just a different presentation.

Graphical Techniques for Interval Data
There are several graphical methods that are used when the data are interval (i.e. numeric, non-categorical). The most important of these graphical methods is the histogram. The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.

Building a Histogram
1) Collect the data.
2) Create a frequency distribution for the data.
3) Draw the histogram.
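Steps 2 and 3 can be sketched as follows. The marks are the ones listed on the Definitions slide; the class width of 10 and the text rendering of the bars are assumptions made for this illustration, not choices stated in the slides.

```python
from collections import Counter

# Student marks from the Definitions slide.
marks = [67, 74, 71, 83, 93, 55, 48]

# Assumed class width for this sketch.
CLASS_WIDTH = 10


def class_frequencies(data, width):
    """Map each class lower limit to the count of values in [lower, lower + width)."""
    counts = Counter((x // width) * width for x in data)
    return dict(sorted(counts.items()))


freq = class_frequencies(marks, CLASS_WIDTH)

# Crude text histogram: one '#' per observation in the class.
for lower, count in freq.items():
    print(f"{lower:>3}-{lower + CLASS_WIDTH - 1:<3} {'#' * count}")
```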

Histogram and Stem & Leaf

Numerical Descriptive Measures
To describe the properties of central tendency, variation, and shape in numerical data
To construct and interpret a boxplot
To compute descriptive summary measures for a population

Summary Definitions
The central tendency is the extent to which all the data values group around a typical or central value.
The variation is the amount of dispersion or scattering of values.
The shape is the pattern of the distribution of values from the lowest value to the highest value.

Measures of Central Tendency: The Mean
The arithmetic mean (often just called the "mean") is the most common measure of central tendency. For a sample of size n, the sample mean $\bar{X}$ (pronounced "x-bar") is
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
where n is the sample size and $X_i$ is the i-th observed value.

Measures of Central Tendency: The Mean
The most common measure of central tendency.
Mean = sum of values divided by the number of values.
Affected by extreme values (outliers).
Data 11, 12, 13, 14, 15: Mean = (11 + 12 + 13 + 14 + 15)/5 = 65/5 = 13
Data 11, 12, 13, 14, 20: Mean = (11 + 12 + 13 + 14 + 20)/5 = 70/5 = 14

Measures of Central Tendency: The Median
In an ordered array, the median is the middle number (50% above, 50% below).
Not affected by extreme values: in both plotted data sets (axis from 11 to 20), Median = 13.

Measures of Central Tendency: Locating the Median
The location of the median when the values are in numerical order (smallest to largest):
$$\text{Median position} = \frac{n+1}{2} \text{ position in the ordered data}$$
If the number of values is odd, the median is the middle number.
If the number of values is even, the median is the average of the two middle numbers.
Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data.
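A minimal sketch of the position rule, applied to the two five-value data sets from the mean example (11, 12, 13, 14, 15 and 11, 12, 13, 14, 20). The function names are chosen for this illustration only. It also shows the contrast made in the slides: the outlier moves the mean but not the median.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Median located with the (n + 1) / 2 position rule (positions are 1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that ranked value.
        return ordered[int(position) - 1]
    # Even n: the position ends in .5, so average the two middle values.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


without_outlier = [11, 12, 13, 14, 15]
with_outlier = [11, 12, 13, 14, 20]

for data in (without_outlier, with_outlier):
    print(data, "mean =", mean(data), "median =", median(data))
```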

Measures of Central Tendency: The Mode
Value that occurs most often.
Not affected by extreme values.
Used for either numerical or categorical (nominal) data.
There may be no mode.
There may be several modes.
[Two dot plots: one on an axis from 0 to 14 with Mode = 9, one on an axis from 0 to 6 with No Mode]

Measures of Central Tendency: Review Example
House Prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000
Sum: $3,000,000
Mean: ($3,000,000/5) = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
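The review example can be checked with Python's standard `statistics` module. This is just one convenient way to reproduce the three figures, not a method prescribed by the slides; `multimode` (available from Python 3.8) is used because a data set may have several modes.

```python
import statistics

# House prices from the review example.
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(prices)      # pulled upward by the $2,000,000 house
median_price = statistics.median(prices)  # middle value of the ranked data
modes = statistics.multimode(prices)      # list of the most frequent value(s)

print(f"Mean:   ${mean_price:,.0f}")
print(f"Median: ${median_price:,.0f}")
print("Mode:  ", ", ".join(f"${m:,}" for m in modes))
```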

Measures of Central Tendency: Which Measure to Choose? The mean is generally used, unless extreme values (outliers) exist. The median is often used, since the median is not sensitive to extreme values. For example, median home prices may be reported for a region; it is less sensitive to outliers. In some situations it makes sense to report both the mean and the median.

Measures of Central Tendency: Summary
Arithmetic mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: middle value in the ordered array
Mode: most frequently observed value

Measures of Variation
Measures of variation: Range, Variance, Standard Deviation, Coefficient of Variation.
Measures of variation give information on the spread, variability, or dispersion of the data values.
[Figure: two distributions with the same center but different variation]

Measures of Variation: The Range
Simplest measure of variation.
Difference between the largest and the smallest values:
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
Example (dot plot on an axis from 0 to 14): Range = 13 - 1 = 12

Measures of Variation: Why The Range Can Be Misleading
Ignores the way in which data are distributed:
[Two dot plots on an axis from 7 to 12 with differently distributed values; in both, Range = 12 - 7 = 5]
Sensitive to outliers:
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 - 1 = 119
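The outlier sensitivity can be reproduced directly. The sketch below rebuilds the two 25-value data sets from the slide (eleven 1s, eight 2s, four 3s, a 4, and a final 5 or 120); the function name `value_range` is chosen for this illustration.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


# Eleven 1s, eight 2s, four 3s, one 4, one 5 (25 values, as on the slide).
base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]

# Same data with the last value replaced by the outlier 120.
with_outlier = base[:-1] + [120]

print("Range without outlier:", value_range(base))
print("Range with outlier:   ", value_range(with_outlier))
```

Changing a single value out of 25 moves the range from 4 to 119, even though the other 24 values are identical.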

Measures of Variation: The Sample Variance
Average (approximately) of squared deviations of values from the mean.
Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
where $\bar{X}$ = arithmetic mean, n = sample size, and $X_i$ = i-th value of the variable X.

Measures of Variation: The Sample Standard Deviation
Most commonly used measure of variation.
Shows variation about the mean.
Is the square root of the variance.
Has the same units as the original data.
Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$

Measures of Variation: Sample Standard Deviation Calculation Example
Sample data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16
$$S = \sqrt{\frac{(10-\bar{X})^2 + (12-\bar{X})^2 + (14-\bar{X})^2 + \cdots + (24-\bar{X})^2}{n-1}} = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + \cdots + (24-16)^2}{8-1}} = \sqrt{\frac{130}{7}} = 4.3095$$
A measure of the "average" scatter around the mean.
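The calculation above can be followed step by step in code. The sketch spells out the formula with the n - 1 divisor and then compares the result with `statistics.stdev` from the standard library, which also computes the sample standard deviation. The function name `sample_std` is chosen for this illustration.

```python
import math
import statistics

# Sample data from the calculation example.
data = [10, 12, 14, 15, 17, 18, 18, 24]


def sample_std(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


s = sample_std(data)
print("Sum of squared deviations / (n - 1):", s ** 2)
print("Sample standard deviation:", round(s, 4))
print("statistics.stdev agrees:", math.isclose(s, statistics.stdev(data)))
```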

Measures of Variation: Comparing Standard Deviations
[Three dot plots on an axis from 11 to 21]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.570

Measures of Variation: Comparing Standard Deviations
[Figure: two distributions, one with a smaller standard deviation and one with a larger standard deviation]

Measures of Variation: Summary Characteristics The more the data are spread out, the greater the range, variance, and standard deviation. The more the data are concentrated, the smaller the range, variance, and standard deviation. If the values are all the same (no variation), all these measures will be zero. None of these measures are ever negative.

Measures of Variation: The Coefficient of Variation
Measures relative variation.
Always in percentage (%).
Shows variation relative to the mean.
Can be used to compare the variability of two or more sets of data measured in different units.
$$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$$

Measures of Variation: Comparing Coefficients of Variation
Stock A: Average price last year = $50, Standard deviation = $5
$$CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$
Stock B: Average price last year = $100, Standard deviation = $5
$$CV_B = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$
Both stocks have the same standard deviation, but stock B is less variable relative to its price.

Measures of Variation: Comparing Coefficients of Variation
Stock A: Average price last year = $50, Standard deviation = $5
$$CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$
Stock C: Average price last year = $8, Standard deviation = $2
$$CV_C = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$2}{\$8} \cdot 100\% = 25\%$$
Stock C has a much smaller standard deviation but a much higher coefficient of variation.
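The three stock comparisons can be put side by side in a short sketch. The prices and standard deviations are the ones given for stocks A, B, and C; the function name and the dictionary layout are assumptions of this illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (S / mean) * 100, expressed as a percentage."""
    return std_dev / mean * 100


# (average price, standard deviation) for each stock, in dollars.
stocks = {"A": (50, 5), "B": (100, 5), "C": (8, 2)}

cv = {name: coefficient_of_variation(s, avg) for name, (avg, s) in stocks.items()}

for name, (avg, s) in stocks.items():
    print(f"Stock {name}: mean = ${avg}, S = ${s}, CV = {cv[name]:.0f}%")
```

Because the CV divides out the units, stock C's $2 standard deviation on an $8 price shows up as the largest relative variability of the three.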

Locating Extreme Outliers: Z-Score
To compute the Z-score of a data value, subtract the mean and divide by the standard deviation. The Z-score is the number of standard deviations a data value is from the mean. A data value is considered an extreme outlier if its Z-score is less than -3.0 or greater than +3.0. The larger the absolute value of the Z-score, the farther the data value is from the mean.

Locating Extreme Outliers: Z-Score
$$Z = \frac{X - \bar{X}}{S}$$
where X represents the data value, $\bar{X}$ is the sample mean, and S is the sample standard deviation.

Locating Extreme Outliers: Z-Score
Suppose the mean math SAT score is 490, with a standard deviation of 100. Compute the Z-score for a test score of 620.
$$Z = \frac{X - \bar{X}}{S} = \frac{620 - 490}{100} = \frac{130}{100} = 1.3$$
A score of 620 is 1.3 standard deviations above the mean and would not be considered an outlier.
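The SAT example translates into a short sketch. The threshold of 3.0 is the one stated in the slides for extreme outliers; the function names are chosen for this illustration.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


def is_extreme_outlier(z, threshold=3.0):
    """Rule from the slides: Z below -3.0 or above +3.0."""
    return abs(z) > threshold


z = z_score(620, mean=490, std_dev=100)
print("Z-score:", z)
print("Extreme outlier?", is_extreme_outlier(z))
```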

Quartile Measures
Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% each, separated by Q1, Q2, and Q3).
The first quartile, $Q_1$, is the value for which 25% of the observations are smaller and 75% are larger.
$Q_2$ is the same as the median (50% of the observations are smaller and 50% are larger).
Only 25% of the observations are greater than the third quartile, $Q_3$.

Quartile Measures: Locating Quartiles
Find a quartile by determining the value in the appropriate position in the ranked data, where
First quartile position: $Q_1$ = (n+1)/4 ranked value
Second quartile position: $Q_2$ = 2(n+1)/4 ranked value
Third quartile position: $Q_3$ = 3(n+1)/4 ranked value
and n is the number of observed values.

Quartile Measures: Calculation Rules
When calculating the ranked position, use the following rules:
If the result is a whole number, then it is the ranked position to use.
If the result is a fractional half (e.g. 2.5, 7.5, 8.5, etc.), then average the two corresponding data values.
If the result is not a whole number or a fractional half, then round the result to the nearest integer to find the ranked position.

Quartile Measures: Locating Quartiles
Sample data in ordered array: 11 12 13 16 16 17 18 21 22 (n = 9)
$Q_1$ is in the (9+1)/4 = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values: $Q_1$ = 12.5.
$Q_1$ and $Q_3$ are measures of non-central location; $Q_2$ = median is a measure of central tendency.

Quartile Measures: Calculating The Quartiles: Example
Sample data in ordered array: 11 12 13 16 16 17 18 21 22 (n = 9)
$Q_1$ is in the (9+1)/4 = 2.5 position of the ranked data, so $Q_1$ = (12+13)/2 = 12.5
$Q_2$ is in the 2(9+1)/4 = 5th position of the ranked data, so $Q_2$ = median = 16
$Q_3$ is in the 3(9+1)/4 = 7.5 position of the ranked data, so $Q_3$ = (18+21)/2 = 19.5
$Q_1$ and $Q_3$ are measures of non-central location; $Q_2$ = median is a measure of central tendency.
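The position formulas and the three calculation rules can be sketched as below. Note that this follows the slides' rule set specifically; statistical libraries typically use other interpolation conventions and may return different quartiles for the same data. The function names, and the requirement of at least two values, are assumptions of this sketch.

```python
def value_at_position(ordered, position):
    """Apply the slides' three rules to a 1-based ranked position."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        # Rule 1: whole number, use that ranked value.
        return ordered[whole - 1]
    if fraction == 0.5:
        # Rule 2: fractional half, average the two neighbouring values.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Rule 3: otherwise round to the nearest integer position.
    return ordered[round(position) - 1]


def quartiles(values):
    """Return (Q1, Q2, Q3) using positions k * (n + 1) / 4 for k = 1, 2, 3."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("this sketch assumes at least two values")
    return tuple(value_at_position(ordered, k * (n + 1) / 4) for k in (1, 2, 3))


sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = quartiles(sample)
print("Q1 =", q1, " Q2 =", q2, " Q3 =", q3)
```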

The Five-Number Summary
The five numbers that help describe the center, spread, and shape of data are:
$X_{\text{smallest}}$
First Quartile ($Q_1$)
Median ($Q_2$)
Third Quartile ($Q_3$)
$X_{\text{largest}}$

Five-Number Summary and The Boxplot
The boxplot is a graphical display of the data based on the five-number summary:
$X_{\text{smallest}}$ -- $Q_1$ -- Median -- $Q_3$ -- $X_{\text{largest}}$
Each of the four segments between consecutive summary values contains 25% of the data.

Five-Number Summary: Shape of Boxplots
If data are symmetric around the median, then the box and central line are centered between the endpoints ($X_{\text{smallest}}$ and $X_{\text{largest}}$).
A boxplot can be shown in either a vertical or horizontal orientation.

Boxplot Example
Below is a boxplot for the following data:
0 2 2 2 3 3 4 5 5 9 27
Five-number summary: $X_{\text{smallest}}$ = 0, $Q_1$ = 2, $Q_2$ = 3, $Q_3$ = 5, $X_{\text{largest}}$ = 27
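As a rough check, the five-number summary for this data can be recomputed with the (n+1)/4 position rules from the quartile slides. The helper names are made up for this sketch. The gap comparison at the end is only an informal way to see that the data are not symmetric around the median.

```python
def ranked_value(ordered, position):
    """Value at a 1-based ranked position, following the slides' quartile rules."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    if fraction == 0.5:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[round(position) - 1]


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest); assumes at least two values."""
    ordered = sorted(values)
    n = len(ordered)
    q1, q2, q3 = (ranked_value(ordered, k * (n + 1) / 4) for k in (1, 2, 3))
    return ordered[0], q1, q2, q3, ordered[-1]


data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 9, 27]
smallest, q1, med, q3, largest = five_number_summary(data)
print("Five-number summary:", (smallest, q1, med, q3, largest))

# Informal shape check: compare the distance from the median to each end.
print("Median to largest: ", largest - med)
print("Smallest to median:", med - smallest)
```

The upper end sits much farther from the median than the lower end, so the box and central line would not be centered between the endpoints.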