Chapter 2 - Graphical Summaries of Data

Similar documents
Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Chapter 2: Graphical Summaries of Data 2.1 Graphical Summaries for Qualitative Data. Frequency: Frequency distribution:

Section 2-2 Frequency Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- #

Overview. Frequency Distributions. Chapter 2 Summarizing & Graphing Data. Descriptive Statistics. Inferential Statistics. Frequency Distribution

Chapter 2 Organizing and Graphing Data. 2.1 Organizing and Graphing Qualitative Data

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

2.1: Frequency Distributions

Name Date Types of Graphs and Creating Graphs Notes

2.3 Organizing Quantitative Data

Frequency Distributions

Organizing and Summarizing Data

Chapter 3 - Displaying and Summarizing Quantitative Data

STP 226 ELEMENTARY STATISTICS NOTES

This chapter will show how to organize data and then construct appropriate graphs to represent the data in a concise, easy-to-understand form.

+ Statistical Methods in

Chapter 2 Describing, Exploring, and Comparing Data

TMTH 3360 NOTES ON COMMON GRAPHS AND CHARTS

2.1: Frequency Distributions and Their Graphs

Elementary Statistics

Test Bank for Privitera, Statistics for the Behavioral Sciences

MATH1635, Statistics (2)

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data

Section 1.2. Displaying Quantitative Data with Graphs. Mrs. Daniel AP Stats 8/22/2013. Dotplots. How to Make a Dotplot. Mrs. Daniel AP Statistics

Ms Nurazrin Jupri. Frequency Distributions

Applied Statistics for the Behavioral Sciences

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

B. Graphing Representation of Data

Displaying Distributions - Quantitative Variables

AND NUMERICAL SUMMARIES. Chapter 2

Frequency Distributions and Graphs

At the end of the chapter, you will learn to: Present data in textual form. Construct different types of table and graphs

CHAPTER 2: SAMPLING AND DATA

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

Chapter 6: DESCRIPTIVE STATISTICS

Chapter 2: Understanding Data Distributions with Tables and Graphs

Prob and Stats, Sep 4

Chapter 2. Frequency Distributions and Graphs. Bluman, Chapter 2

Describing Data: Frequency Tables, Frequency Distributions, and Graphic Presentation

Raw Data is data before it has been arranged in a useful manner or analyzed using statistical techniques.

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student

UNIT 15 GRAPHICAL PRESENTATION OF DATA-I

1.3 Graphical Summaries of Data

MATH 117 Statistical Methods for Management I Chapter Two

a. divided by the. 1) Always round!! a) Even if class width comes out to a, go up one.

Courtesy :

Chapter 2. Frequency distribution. Summarizing and Graphing Data

Univariate Statistics Summary

Chapter 2: Descriptive Statistics

Graphical Presentation for Statistical Data (Relevant to AAT Examination Paper 4: Business Economics and Financial Mathematics) Introduction

MATH& 146 Lesson 10. Section 1.6 Graphing Numerical Data

Chapter 5snow year.notebook March 15, 2018

12. A(n) is the number of times an item or number occurs in a data set.

UNIT 1A EXPLORING UNIVARIATE DATA

- 1 - Class Intervals

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

Math Tech IIII, Sep 14

Chapter 2: Descriptive Statistics (Part 1)

Raw Data. Statistics 1/8/2016. Relative Frequency Distribution. Frequency Distributions for Qualitative Data

Lecture Series on Statistics -HSTC. Frequency Graphs " Dr. Bijaya Bhusan Nanda, Ph. D. (Stat.)

MATH NATION SECTION 9 H.M.H. RESOURCES

Making Science Graphs and Interpreting Data

Measures of Central Tendency

Lesson 18-1 Lesson Lesson 18-1 Lesson Lesson 18-2 Lesson 18-2

BUSINESS DECISION MAKING. Topic 1 Introduction to Statistical Thinking and Business Decision Making Process; Data Collection and Presentation

Spell out your full name (first, middle and last)

CHAPTER 2. Objectives. Frequency Distributions and Graphs. Basic Vocabulary. Introduction. Organise data using frequency distributions.

2. The histogram. class limits class boundaries frequency cumulative frequency

No. of blue jelly beans No. of bags

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

Chapter 1. Looking at Data-Distribution

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

How individual data points are positioned within a data set.

1.2. Pictorial and Tabular Methods in Descriptive Statistics

Table of Contents (As covered from textbook)

Visualizing Data: Freq. Tables, Histograms

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things.

STAT STATISTICAL METHODS. Statistics: The science of using data to make decisions and draw conclusions

Chapter 2 - Frequency Distributions and Graphs

CHAPTER 2 DESCRIPTIVE STATISTICS

Chapter Two: Descriptive Methods 1/50

Middle Years Data Analysis Display Methods

Chapter 2 - Descriptive Statistics

Chapter 2: Frequency Distributions

Downloaded from

Chapter 2 Descriptive Statistics I: Tabular and Graphical Presentations. Learning objectives

Slides Prepared by JOHN S. LOUCKS St. Edward s s University Thomson/South-Western. Slide

CHAPTER 3: Data Description

Parents Names Mom Cell/Work # Dad Cell/Work # Parent List the Math Courses you have taken and the grade you received 1 st 2 nd 3 rd 4th

Round each observation to the nearest tenth of a cent and draw a stem and leaf plot.

STA 570 Spring Lecture 5 Tuesday, Feb 1

JUST THE MATHS UNIT NUMBER STATISTICS 1 (The presentation of data) A.J.Hobson

Aston Hall s A-Z of mathematical terms

Chapters 1.5 and 2.5 Statistics: Collecting and Displaying Data

AP Statistics Summer Assignment:

Select Cases. Select Cases GRAPHS. The Select Cases command excludes from further. selection criteria. Select Use filter variables

STA Module 2B Organizing Data and Comparing Distributions (Part II)

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)

Slide Copyright 2005 Pearson Education, Inc. SEVENTH EDITION and EXPANDED SEVENTH EDITION. Chapter 13. Statistics Sampling Techniques

Chapter 2 Descriptive Statistics. Tabular and Graphical Presentations

Transcription:

Chapter 2 - Graphical Summaries of Data Data recorded in the sequence in which they are collected and before they are processed or ranked are called raw data. Raw data is often difficult to make sense out of (especially if you have very big data sets) so we prefer to organize and summarize our data. 2.1 - Graphical Summaries for Qualitative Data A frequency distribution is a table that lists all categories and the number of elements that belong to each of the categories. Reasons for Constructing Frequency Distributions: 1. Large data sets can be summarized. 2. We can gain some insight into the nature of data. 3. We have a basis for constructing important graphs. The relative frequency of a category is the proportion of items in the category. Frequency of that category Relative frequency of a category = P ercentage = ( R elative frequency) 100 Sum of all frequencies A relative frequency distribution is a table that represents the relative frequency of each category (often times we just add another column with this information in the same table as our frequency distribution). Practice: Suppose we ask 30 people if they ever suffer from insomnia. The responses are classified as Never (N), Sometimes (S), Often (O), Always (A), and are recorded below. S O S A N N S O A O N O O S S S S O N S S N A O S S O N N N 1. Construct a frequency distribution table for the data above. Compute the relative frequencies and percentages for all categories. Insomnia Occurrence Tally Frequency Relative Frequency Percentage Sum = Sum = Sum =

2. What proportion of the people in this sample suffer from insomnia Often or Always? A graph made of bars whose heights represent the frequencies or relative frequencies of respective categories is called a bar graph. The bars should be of equal width and not touch each other. Make sure to always label both the horizontal and vertical axis. 3. Draw a bar graph for the frequency distribution of the Insomnia Occurrence data. 4. How would a bar graph on the relative frequency distribution of the Insomnia Occurrence data differ from the one you just constructed? Bar graphs can be constructed vertically (more common) or horizontally. Sometimes we want to compare two or more bar graphs that have the same categories, so we can construct several bar graphs on the same axes, with corresponding bars next to each other. This is called a side-by-side bar graph. See example on the right. A pie chart is a circle divided into sectors, each one representing a different category and its size proportional to the proportion size (relative frequency) of that category. Each sector should be labeled with its category and its relative frequency expressed as a percentage. Angle = (Relative frequency) 360 4. Construct a pie chart for the distribution of the Insomnia Occurrence data.

2.2 - Organizing and Graphing Quantitative Data Since quantitative data doesn t have natural categories, we divide the data into classes. The classes are intervals of equal width that cover all the values in the data set. A Frequency Distribution for Quantitative data lists all the classes and the number of values that belong to each class. Data presented in the form of a frequency distribution are called grouped data. Class limits should be expressed with the same number of decimal places as the data. Make sure that classes do not overlap so that each of the original values must belong to exactly one class. Try to use the same width for all classes, although it is sometimes impossible to avoid open-ended intervals, such as 65 years or older. Make sure to include all classes, even those with a frequency of zero. Class width = Lower limit of a class Lower limit of next class Class midpoint = Lower limit of a class + Lower limit of next class 2 Practice: Find the class width and class midpoints for each category in the frequency distribution above.

Constructing a Frequency Distribution: 1. Decide on the number of classes (should be between 5 and 20). The bigger the data set, the more classes you should choose. Rather choose too many classes than too few, but you want to have some large frequencies in some of the classes. (One way to help you choose number of classes is Sturge s formula: # of classes = 1 + 3.3 log n, where n is the number of observations.) 2. Calculate approximate class width. Round to a convenient number. Largest value - Smallest value Class width = Number of classes that you want 3. Starting point: Begin by choosing a lower limit of the first class, which should be a convenient number less than or equal to your smallest value. 4. Using the lower limit of the first class and the class width, proceed to list all the classes. Lower limit of one class + Class width = Lower limit for the next class 5. Count the number of observations in each class (possibly by tallying) and construct a frequency distribution. ex. How many classes does the frequency distribution have? What is the class width? Is it clear where a value belongs? How big was the sample of pennies? Could we have omitted the classes with no frequencies, or do those classes tell us something important? Randomly Selected Pennies Weights of Pennies in grams Frequency 2.40-2.49 18 2.50-2.59 59 2.60-2.69 0 2.70-2.79 0 2.80-2.89 0 2.90-2.99 2 3.00-3.09 25 3.10-3.19 8

Histograms A histogram is a graph in which classes are marked on the horizontal axis and the frequencies, relative frequencies, or percentages are marked on the vertical axis. The frequencies, relative frequencies, or percentages are represented by the heights of the bars. In a histogram, the bars are drawn adjacent to each other (that is, there is no space in-between them as in a bar graph). Frequency Histograms vs. Relative Frequency Histograms or Percentage Histograms A polygon is a graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines. Note that we create one extra class at each end, both of which have zero frequency.

An ogive is a different kind of polygon. A frequency ogive plots cumulative frequencies, and a relative frequency ogive plots cumulative relative frequencies. The cumulative frequency of a class is the sum of the frequency for that class and all previous classes (with lower limits). Cumulative frequency Cumulative relative frequency = Sum of all frequencies Shapes of Histograms As the number of classes is increased, the polygon eventually becomes a smooth curve A high point of a histogram is referred to as a mode. A histogram is unimodal if it has only one mode, and bimodal if it has two distinct modes. Symmetric. Unimodal. Bell shaped. Symmetric. Bimodal. Skewed right. Skewed left. Uniform or Rectangular.

Practice: The following data give the numbers of computer keyboards assembled at the Twentieth Century Electronics Company for a sample of 25 days. 45 52 48 41 56 46 44 42 48 53 51 53 51 48 46 43 52 50 54 47 44 47 50 49 52 a. Make the frequency distribution table for these data. b. Calculate the relative frequencies for all classes. c. Construct a histogram for the relative frequency distribution. d. Construct a polygon for the relative frequency distribution. e. Construct a relative frequency ogive

2.3 More Graphs for Quantitative Data A stem-and-leaf plot allows for a nice overview of quantitative data without losing information on individual observations. Each value is divided into a stem and a leaf as in the following example, where the rightmost digit is the leaf, and the remaining digits form the stem.. Data: 43 58 41 65 49 52 58 60 49 Divide up stem and leaf: 4 3 5 8 4 1 6 5 4 9 5 2 5 8 6 0 4 9 Stem-and-leaf plot: Stem Leaves 4 1 3 9 9 5 2 8 8 6 0 5 ex. List the data represented in the following display. 7 5 9 8 0 2 6 7 7 9 1 7 8 10 2 6 ex. Prepare a stem-and-leaf plot for the following data. In your final display, you should arrange the leaves for each stem in increasing order. 94 103 79 99 114 89 81 86 81 93 100 96 75 90 88 107 132 95 ex. Prepare a stem-and-leaf display for the following data. 1020 1008 1021 1013 1001 1016 993 1033 1029

A split stem-and-leaf plot is when we use two or more rows for each stem. This can be an advantage if some stems have a big number of leaves. ex. When two data sets have values similar enough so that the same stems can be used, their shapes can be compared with a back-to-back stem-and-leaf plot. In a back-to-back stem-and-leaf plot, the stems go down the middle. The leaves for one of the data sets go off to the right, and the leaves for the other go off to the left. Consider the following course averages from an English class and a History class. The classes can be compared with a back-to-back stem-and-leaf plot. What can you conclude about the averages of these two classes? A time-series plot may be used when the data consist of values of a variable measured at different points in time. In a time-series plot, the horizontal axis represents time, and the vertical axis represents the value of the variable we are measuring. The values of the variable are plotted at each of the times, then the points are connected with straight lines.

A dotplot consists of a graph in which each data value is plotted as a point (or dot) along a scale of values. Dots that represent equal values are stacked. Dotplots are helpful in getting a good overview of the data, including finding clusters of data, as well as outliers. Clusters are areas where the values are more concentrated. Outliers are data values that are extremely large or small compared to the rest of the values in the data set. Dotplots are also useful for comparing two or more datasets, by creating a dotplot for each dataset, on the same scale, and then place these sets on top of each other. We call this stacked dotplots. Example borrowed from the Triola Stats book The example below is a dotplot display of the ages of actresses at the time they won Academy Award Oscars. Ex. Make a dotplot of ages of actors, and make a stacked dotplot to compare the two data sets.

2.4 - Graphs Can Be Misleading Statistical graphs, when properly used, are powerful forms of communication. Unfortunately, when graphs are improperly used, they can misrepresent the data, and lead people to draw incorrect conclusions. Below are two common forms of misrepresentation: Incorrect position of the vertical scale Incorrect sizing of graphical images Incorrect Position of the Vertical Scale With charts or plots that represent how much or how many of something, it may be misleading if the baseline is not at zero. The Area Principle When amounts are compared by constructing an image for each amount, the areas of the images must be proportional to the amounts. For example, if one amount is twice as much as another, its image should have twice as much area as the other image. When the Area Principle is violated, the images give a misleading impression of the data. Which of the images below best represent the difference of Jet Fuel cost? Year Cost of Jet Fuel per Gallon 2000 $0.90 2008 $3.16