Descriptive Statistics


Library, Teaching & Learning 2014

Summary of Basic Data Analysis

DATA
- Qualitative
- Quantitative
  - Counted: Discrete
  - Measured: Continuous

3 Main Measures of Interest

Central Tendency
- Mean: use calculator in STAT (s.d.) mode
- Median: order the data and find the middle score
- Mode: find the most common value(s)

Dispersion (spread)
- Range: subtract the lowest value from the highest value
- Quartiles: see rules below
- Standard deviation: use calculator in STAT (s.d.) mode
- Variance: square the s.d.

Shape
- Symmetric
- Asymmetric

To find the median and quartiles
For the lower quartile, median and upper quartile respectively (i.e. the 25th, 50th and 75th percentiles), calculate 25%, 50% or 75% of n.
- If the answer is an integer, the quartile or median is the average of the score at that position in the ordered data and the next score.
- If the answer is not an integer, the quartile or median is the score at the next position in order.

Interquartile Range: the difference between the upper and lower quartiles (i.e. UQ - LQ)
Semi-interquartile Range: half the interquartile range
Outliers: any values that are less than L.Q. minus 1.5 times the IQR, or any values that are greater than U.Q. plus 1.5 times the IQR

General terms and symbols you should be conversant with:
Binomial, Continuous, Discrete, Coefficient, Variation, Contingency, Variable, Random, Normal, Central tendency, Mean, Measures of spread, Probability, Range, Standard Normal, Standard Score, Z-score, Population, Sample, Variance

These are words that have a quite specific and rigorous meaning in the world of mathematics and statistics. Each will need learning as it applies in the context in which you meet it.
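The median and quartile rule and the outlier rule in the summary above can be sketched in Python. This is only an illustration: the function names `percentile_by_rule` and `find_outliers` and the sample scores are made up for the example and do not come from the handout.

```python
def percentile_by_rule(ordered_scores, percent):
    """Apply the summary rule: work out percent% of n, then
    - integer answer: average that score and the next one,
    - non-integer answer: take the score at the next position up."""
    n = len(ordered_scores)
    whole, remainder = divmod(percent * n, 100)
    if remainder == 0:
        # Positions are counted from 1, Python lists from 0.
        return (ordered_scores[whole - 1] + ordered_scores[whole]) / 2
    return ordered_scores[whole]


def find_outliers(scores):
    """Return scores below LQ - 1.5*IQR or above UQ + 1.5*IQR."""
    ordered = sorted(scores)
    lq = percentile_by_rule(ordered, 25)
    uq = percentile_by_rule(ordered, 75)
    iqr = uq - lq
    low_fence = lq - 1.5 * iqr
    high_fence = uq + 1.5 * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]


# Hypothetical scores, chosen only to exercise both branches of the rule.
scores = [2, 4, 4, 5, 7, 9, 10, 30]
print(percentile_by_rule(sorted(scores), 50))  # median by the rule
print(find_outliers(scores))
```

Working in whole percentages (25, 50, 75) keeps the "is the answer an integer?" check exact, which a floating-point product such as 0.25 * n would not always guarantee.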

USEFUL METHODS OF DATA DISPLAY

In the following example, heights of trees in two orchards are measured (in metres). The results of the two samples (in order) are:

Orchard A: 10.0, 11.2, 11.4, 12.5, 12.7, 13.8, 13.8, 13.9, 14.0, 14.0, 14.3, 14.7, 14.7, 14.7, 15.1, 15.1, 15.5, 15.6, 15.8, 15.9
Orchard B: 9.8, 9.9, 10.1, 10.3, 11.0, 11.2, 11.4, 11.6, 12.3, 12.4, 12.5, 12.7, 12.8, 13.2, 13.3, 13.7, 14.0, 14.1, 14.6, 15.3

To construct a stem plot (stem and leaf display), each measurement is considered to be made up of two parts:
- the stem, which consists of the leading digit(s), and
- the leaf, which is usually the last digit.

To represent the values 15 and 115 in a stem and leaf plot, we classify the leading digit, 1 in 15, as the stem and 5 as the leaf, while for 115 the leading pair 11 would be the stem and 5 the leaf. In doing so, we get the following stem plot (a) for the trees of Orchard B and, in addition, (b), a back to back stem plot for the data from both orchards.

(a) stem plot, one distribution

    Orchard B
     9 | 8 9
    10 | 1 3
    11 | 0 2 4 6
    12 | 3 4 5 7 8
    13 | 2 3 7
    14 | 0 1 6
    15 | 3

(b) stem plot, two distributions

      Orchard A |    | Orchard B
                |  9 | 8 9
              0 | 10 | 1 3
            4 2 | 11 | 0 2 4 6
            7 5 | 12 | 3 4 5 7 8
          9 8 8 | 13 | 2 3 7
    7 7 7 3 0 0 | 14 | 0 1 6
    9 8 6 5 1 1 | 15 | 3

This enables us to get an instant feel for the difference between the results from the two orchards, i.e. we can compare the distributions of the two samples by looking at the general shape of each as they lie next to one another.

This display also allows us to very easily find the 5 summary statistics needed to draw a box and whisker graph, an even more informative graphical display and comparison:

[Figure: side-by-side box and whisker plots on a vertical axis from 9 to 17 metres. Orchard A: outlier (*) at 10.0, lower whisker 11.2, L.Q. 12.7, median 14.15, U.Q. 15.1, maximum 15.9. Orchard B: minimum 9.8, L.Q. 11.0, median 12.45, U.Q. 13.7, maximum 15.3.]

By comparing the ranges (note the outlier as shown in Orchard A), inter-quartile ranges and medians, we can see distinct differences in the two orchards.
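The stem plot construction described above can be sketched in Python. This is a minimal sketch, assuming every measurement has exactly one decimal place so that the last digit is the leaf; the function name `stem_plot` is made up for the illustration, and the Orchard B heights are typed in as we read them from the example.

```python
from collections import defaultdict


def stem_plot(values):
    """Return one text row per stem for values with one decimal place,
    e.g. 11.4 is split into stem 11 and leaf 4."""
    rows = defaultdict(list)
    for value in sorted(values):
        tenths = round(value * 10)        # 11.4 -> 114, avoids float noise
        stem, leaf = divmod(tenths, 10)   # 114 -> stem 11, leaf 4
        rows[stem].append(leaf)
    lines = []
    # Include every stem in the range so gaps in the data stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines


orchard_b = [9.8, 9.9, 10.1, 10.3, 11.0, 11.2, 11.4, 11.6, 12.3, 12.4,
             12.5, 12.7, 12.8, 13.2, 13.3, 13.7, 14.0, 14.1, 14.6, 15.3]
for line in stem_plot(orchard_b):
    print(line)
```

Sorting first puts the leaves for each stem in increasing order, which is what lets the median and quartiles be counted off directly from the display.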

To Draw the Box and Whisker:

MEDIAN: n = 20, so $\frac{n+1}{2} = \frac{21}{2} = 10.5$; the median is the average of the 10th and 11th scores, that is, for Orchard A $\frac{14.0 + 14.3}{2} = 14.15$ and for Orchard B $\frac{12.4 + 12.5}{2} = 12.45$.

L.Q.: n = 20, so $\frac{n+1}{4} = \frac{21}{4} = 5.25$, rounded down to 5, i.e. for Orchard A the 5th score = 12.7 and for Orchard B the 5th score = 11.0.

U.Q.: n = 20, so $\frac{3(n+1)}{4} = \frac{3 \times 21}{4} = 15.75$, rounded up to 16, i.e. for Orchard A the 16th score = 15.1 and for Orchard B the 16th score = 13.7.

Hence the box and whisker diagram is drawn as on the previous page. Also, since the mean for Orchard A = 13.935 < median (14.15), the distribution is negatively skewed, and since the mean for Orchard B = 12.31 is close to the median (12.45), the distribution is almost zero skewed.

FURTHER STATISTICS CALCULATED: Variance and Standard Error

Variance

The variance is related to the standard deviation. The following process may help to explain this. (Note: this is to SHOW you the process. YOU will let your calculator do it for you!)

To calculate the variance:

1. Calculate the mean of the distribution (add all the scores and divide by the number of scores).
   e.g. for x = 1, 2, 3, 4, 5: $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1 + 2 + 3 + 4 + 5}{5} = 3$
2. Find the difference between each score and the mean, $x_1 - \bar{x}$, $x_2 - \bar{x}$, etc. (subtract the mean from each individual score in the data set).
   e.g. 1 - 3 = -2, 2 - 3 = -1, 3 - 3 = 0, 4 - 3 = 1, 5 - 3 = 2
3. Square each of these results: $(x_1 - \bar{x})^2$, $(x_2 - \bar{x})^2$, etc.
   e.g. $(-2)^2 = 4$, $(-1)^2 = 1$, $0^2 = 0$, $1^2 = 1$, $2^2 = 4$
4. Sum these squared deviations (differences): $\sum (x_i - \bar{x})^2$
   e.g. 4 + 1 + 0 + 1 + 4 = 10
5. Calculate the average of the squared deviations (divide by n - 1):
   $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$, e.g. $s^2 = \frac{10}{4} = 2.5$
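The five steps above can be sketched in Python, one line per step. The function name `sample_variance` is made up for the illustration; the cross-check uses `statistics.variance` from the Python standard library, which also divides by n - 1.

```python
import statistics


def sample_variance(scores):
    """Follow the five steps of the variance calculation in order."""
    n = len(scores)
    mean = sum(scores) / n                          # step 1: the mean
    deviations = [x - mean for x in scores]         # step 2: score minus mean
    squared = [d ** 2 for d in deviations]          # step 3: square each one
    total = sum(squared)                            # step 4: sum the squares
    return total / (n - 1)                          # step 5: divide by n - 1


scores = [1, 2, 3, 4, 5]   # the worked example from the text
print(sample_variance(scores))
print(statistics.variance(scores))  # library cross-check
```

Both lines should print the same value as the hand calculation, which is a quick way to confirm that a calculator or library is using the n - 1 divisor rather than n.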

The result of the final step, $s^2$, is called the variance. In words, it is the average squared deviation from the mean.

One further step is required to get the standard deviation. Since the variance is the average squared deviation, simply take the square root of the variance and you have the standard deviation. That is:

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$, e.g. $s = \sqrt{2.5} = 1.5811$

Your calculator will take you straight to the standard deviation. If you need the variance, get the s.d. and then just square that answer.

Standard Error

This is the standard deviation of the sampling distribution. Roughly, it is the average difference between the sample statistic and the population parameter. Each sample statistic will have a standard error, and a formula to go with it. Following from the above, the sample statistic would be the mean, and the corresponding standard error would be the standard error of the mean.

Calculate the standard error of the mean as $s.e. = \frac{s}{\sqrt{n}}$. That is, get the standard deviation of the sample and divide it by the square root of the sample size. You will see this referred to as s.e., sem or s.e.m. Learn to recognise it in all its forms.

To calculate the standard error in the above worked example: $s.e. = \frac{1.5811}{\sqrt{5}} = 0.7071$

PRACTICE QUESTIONS FOR DESCRIPTIVE STATISTICS

1. An auditor for Abbott & Kasteler found billing errors in invoices. In some cases the customer was billed too much (+ values) and in other cases the customer was billed too little (- values).

   Customer            Size of error ($)
   A Simpson                32.10
   Lea & Kean              -45.00
   Machine Assoc.           66.00
   Fort Systems              2.53
   Tension Mills            -8.10
   Bridge Inc.             -51.25
   Overland Toy             12.10
   Valley Run               18.00
   Down Kingston            21.00
   Forty Nine South        -31.00
   P J Smith                 2.10

   For these data calculate:
   (a) Mean
   (b) Variance
   (c) Range
   (d) Median
   (e) Lower quartile
   (f) Upper quartile
   (g) Standard error of the mean

2. The glorious Canterbury Crusaders Super 12 rugby team won all 12 matches to reach the final (which they also won!). Here are the winning margins (i.e. the number of points they won each game by) for the 12 games leading to the final:

   2 7 19 7 1 34 20 3 7 28 77 11

   For these data points, calculate the following:
   (a) Mean
   (b) Variance
   (c) Mode
   (d) Range
   (e) Median
   (f) Lower quartile
   (g) Inter Quartile Range (IQR)
   (h) Standard error of the mean

3. The starting incomes ($000s) for 11 recent BCM graduates are shown:

   20.5 19.2 21.0 31.5 29.7 19.2 27.3 22.8 18.5 35.8 30.5

   For these data points, calculate the following:
   (a) Mean
   (b) Variance
   (c) Mode
   (d) Range
   (e) Median
   (f) Lower quartile
   (g) Inter Quartile Range (IQR)
   (h) Calculate whether the upper value is an outlier

Use the following information for the next two questions.
A study of air pollution in a New Zealand city yielded the following daily readings of sulphur dioxide concentrations. The stem and leaf display below details the concentrations for 20 consecutive days.

   n = 20, stem unit = 10.00
   0 | 3 3 4 5 6
   1 | 0 3 4 6 6 7 8 9
   2 | 2 3 6 8
   3 | 1 6
   4 | 4

4. Calculate the mean, median and upper quartile.

5. The scientist is concerned that the highest value could be an outlier. Calculate whether it is.

6. The universe or totality of items or things under consideration is called:
   A a sample
   B a population
   C a parameter
   D a statistic

7. A summary measure that is computed to describe a characteristic from only a sample of the population is called:
   A a census
   B the scientific method
   C a parameter
   D a statistic

8. Which of the following is a continuous quantitative variable?
   A the colour of a student's eyes
   B the number of employees of an insurance company
   C the amount of milk produced by a cow in one 24-hour period
   D the number of litres of milk sold at the local supermarket yesterday

9. To monitor campus security, the campus security guards are taking a survey of the number of students in a parking lot each 30 minutes of a 24-hour period. If X is the number of students in the lot each period of time, then X is an example of:
   A a categorical random variable
   B a discrete random variable
   C a continuous random variable
   D a statistic

10. The classification of student status (e.g. 1st year, 2nd year, 3rd year, postgraduate) is an example of:
   A a categorical random variable
   B a discrete random variable
   C a continuous random variable
   D a parameter

11. Which of the following statistics is not a measure of central tendency?
   A mean
   B median
   C mode
   D Q3 (upper quartile)

12. Health care issues are receiving much attention in both academic and political arenas. A sociologist recently conducted a survey of citizens over 60 years of age whose income is too high to qualify for a community services card. The ages of 25 senior citizens without a community services card were as follows:

   60 61 62 63 64 65 66 68 68 69 70 73 73 74 75 76 76 81 81 82 86 87 89 90 92

   Using the above data, find the median age of the senior citizens who do not qualify for a community services card.
   A 73.04
   B 73
   C 68
   D 13

13. Which of the following is most likely a population as opposed to a sample?
   A respondents to a newspaper survey
   B every third person to arrive at the bank
   C the first five students completing an assignment
   D registered voters in a county

14. Which of the following statements about the median is NOT true?
   A It is more affected by extreme values than the mean.
   B It is a measure of central tendency.
   C It is equal to Q2.
   D It is equal to the mode in bell-shaped "normal" distributions.

15. A survey was conducted to determine how people rated the quality of programming available on television. Respondents were asked to rate the overall quality from 0 (no quality at all) to 100 (extremely good quality). The Stem-and-Leaf display of the data is shown below.

   Stem | Leaves
      3 | 2 4
      4 | 0 3 4 7 8 9 9 9
      5 | 0 1 1 2 3 4 5
      6 | 1 2 5 6 6
      7 | 0 1
      8 |
      9 | 2

   Referring to the Stem-and-Leaf Plot above, what percentage of the respondents rated overall television quality with a rating of 80 or above?
   A 0%
   B 4%
   C 96%
   D 50%

16. The smaller the spread of data around the mean,
   A the smaller the interquartile range
   B the smaller the standard deviation
   C the smaller the coefficient of variation
   D all the above

17. Which of the following is NOT sensitive to extreme values?
   A The range
   B The mean
   C The interquartile range
   D The coefficient of variation

18. The vice-chancellor of a university was concerned about alcohol abuse on campus and wanted to find out the proportion of students at his university who visited campus bars every weekend. His advisor took a random sample of 50 students. The proportion of students in the sample who visited campus bars every weekend is an example of a:
   A discrete random variable
   B statistic
   C parameter
   D categorical random variable

19. In a right-skewed distribution the:
   A median equals the mean
   B mean is less than the median
   C mean is greater than the median
   D mode is less than the mean

20. For the distribution drawn here, identify the mean, median and mode.

   [Figure: distribution curve with three points labelled C, B and A.]

   A A = Mean, B = Median, C = Mode
   B A = Median, B = Mean, C = Mode
   C A = Mode, B = Median, C = Mean
   D A = Mean, B = Mode, C = Median

Use the following information to answer the NEXT TWO questions.
The Lincoln University Students Association wants to estimate the average amount of money spent by students on food each week. The weekly food expenditure from a sample of 7 students is shown:

   $50 $35 $45 $60 $20 $45 $30

21. What is the mean?
   A $85.00
   B $45.00
   C $40.71
   D $47.50

22. What is the variance?
   A $13.36
   B $178.58
   C $12.37
   D $85.00

Solutions to PRACTICE QUESTIONS FOR DESCRIPTIVE STATISTICS

1. Data in order: 66, 32.1, 21, 18, 12.1, 2.53, 2.1, -8.1, -31, -45, -51.25
   Answers: (a) 1.68 (b) 1195.6 (c) 117.25 (d) 2.53 (e) -31 (f) 21 (g) 10.425

2. Data in order: 77, 34, 28, 20, 19, 11, 7, 7, 7, 3, 2, 1
   Answers: (a) 18 (b) 456.7 (c) 7 (d) 76 (e) 9 (f) 3 (g) 25 (h) 6.17

3. Data in order: 35.8, 31.5, 30.5, 29.7, 27.3, 22.8, 21, 20.5, 19.2, 19.2, 18.5
   Answers: (a) 25.1 (b) 36.7 (c) 19.2 (d) 17.3 (e) 22.8 (f) 19.2 (g) 11.3 (h) 35.8 is not an outlier

4. Mean = 17.7; median = 16.5; upper quartile = 26

5. 44.4 is not an outlier

Solutions to questions 6-22
6. B
7. D
8. C
9. B
10. A
11. D
12. B
13. D
14. A
15. B
16. D
17. C
18. B
19. C
20. C
21. C
22. B
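The solutions to questions 2 and 3 can be checked with a short Python sketch. It assumes the quartile positions used in the box and whisker example, (n + 1)/4 rounded down for the lower quartile and 3(n + 1)/4 rounded up for the upper quartile; the function name `summarise` and the dictionary keys are made up for the illustration, and the data are typed in as we read them from the questions.

```python
import math
import statistics


def summarise(scores):
    """Summary statistics using the (n + 1)/4 quartile positions."""
    data = sorted(scores)
    n = len(data)
    lq = data[math.floor((n + 1) / 4) - 1]      # position rounded down
    uq = data[math.ceil(3 * (n + 1) / 4) - 1]   # position rounded up
    iqr = uq - lq
    return {
        "mean": statistics.mean(data),
        "variance": statistics.variance(data),   # divides by n - 1
        "range": data[-1] - data[0],
        "median": statistics.median(data),
        "lq": lq,
        "uq": uq,
        "iqr": iqr,
        "sem": statistics.stdev(data) / math.sqrt(n),
        "upper_fence": uq + 1.5 * iqr,           # outlier limit above U.Q.
    }


margins = [77, 34, 28, 20, 19, 11, 7, 7, 7, 3, 2, 1]               # question 2
incomes = [20.5, 19.2, 21.0, 31.5, 29.7, 19.2, 27.3, 22.8, 18.5,
           35.8, 30.5]                                             # question 3

q2 = summarise(margins)
q3 = summarise(incomes)
print(round(q2["variance"], 1), round(q2["sem"], 2))
print(round(q3["mean"], 1), max(incomes) > q3["upper_fence"])
```

The last printed value answers question 3(h): the highest income counts as an outlier only if it lies above the upper fence.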