Frequency Distributions

Displaying Data

Frequency Distributions
After collecting data, the first task for a researcher is to organize and summarize the data so that it is possible to get a general overview of the results. Remember, this is the goal of descriptive statistical techniques. One method for simplifying and organizing data is to construct a frequency distribution.
Frequency: the number of times, or how often, a category, score, or range of scores occurs.
Frequency distribution: a summary display for a distribution of data.

Frequency Distribution Tables
Displaying Data & Central Tendency
A simple frequency distribution table consists of two columns: one listing the categories on the scale of measurement (x) and another listing the frequency (f) of each. In the x column, values are listed in order from lowest to highest (or from highest to lowest). For the frequency column, a tally is made for each value (how often each x value occurs in the data set). These tallies are the frequencies for each x value. The sum of the frequencies should equal N, the total number of scores. Frequency distributions can be computed for grouped or ungrouped data.

Regular (ungrouped) Frequency Distribution
When a frequency distribution table lists all of the individual categories (x values), it is called a regular frequency distribution.
Example: x = number of naps toddlers take per day

x    f
0    8
1    8
2    15
3    8
4    1
N = 40
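As a rough sketch of how such a table can be tallied, the snippet below uses Python's `collections.Counter`. The raw scores are an assumption: the slide shows only the tallies, so the list `naps` is rebuilt from the table purely for illustration.

```python
from collections import Counter

# Assumed raw scores, rebuilt from the nap table (the slide lists only the tallies).
naps = [0] * 8 + [1] * 8 + [2] * 15 + [3] * 8 + [4] * 1

# Tally how often each x value occurs.
freq = Counter(naps)

# List the x values from lowest to highest with their frequencies.
print("x  f")
for x in sorted(freq):
    print(f"{x}  {freq[x]}")

# The frequencies should sum to N, the total number of scores.
N = sum(freq.values())
print("N =", N)
```

Checking that the frequencies sum to N is a quick way to catch a miscounted tally.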

Grouped Frequency Distribution
Sometimes, especially when dealing with continuous variables, a set of scores covers a wide range of values. In these situations, a list of all the x values would be too long to allow a simple presentation of the data. In such cases, a grouped frequency distribution table is used. In a grouped table, the x column lists groups of scores, called class intervals, rather than individual values.

Example: x = college course enrollment

Raw values:
34 12 33 11 16 31 17 72 14 17 18 13 17 6 17 24 56 9 10 18 7 10 67 17 83 77 5 9 16 18 28 35 16 30 70 18 15 10 13 12

Sorted values:
5 6 7 9 9 10 10 10 11 12 12 13 13 14 15 16 16 16 17 17 17 17 17 18 18 18 18 24 28 30 31 33 34 35 56 67 70 72 77 83

Grouped frequency distribution:

x        f
0-10     5
10-20    22
20-30    2
30-40    5
40-50    0
50-60    1
60-70    1
70-80    3
80-90    1
N = 40

Note: I prefer to use real limits when specifying intervals. Your book uses apparent limits. You can use either. In the table above, a score that falls on a boundary is counted in the higher interval (e.g., 10 is counted in 10-20).

Grouped Frequency Distributions: Guidelines
Sort your data first; it makes building the frequency distribution easier.
Decide on the interval width and the number of intervals. You should have about 5-20 intervals.
All intervals should have the same width.
Your interval width should be a relatively simple number. Examples: 10, 5, 2, 1, 0.5.
Your set of intervals should cover all observed values and should not overlap, i.e., no individual score should fall in more than one interval.
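The guidelines can be sketched in a few lines of Python using the course enrollment data. The function name `grouped_frequencies` and the constant `WIDTH` are illustrative choices, and the rule that a boundary score goes to the higher interval is an assumption read off the enrollment table.

```python
from collections import Counter

enrollment = [34, 12, 33, 11, 16, 31, 17, 72, 14, 17, 18, 13, 17, 6, 17, 24, 56, 9, 10, 18,
              7, 10, 67, 17, 83, 77, 5, 9, 16, 18, 28, 35, 16, 30, 70, 18, 15, 10, 13, 12]

WIDTH = 10  # one simple, equal width for every interval


def grouped_frequencies(scores, width):
    """Count scores per interval [low, low + width).

    Assumption: a score on a boundary is counted in the higher interval,
    so no score can fall in more than one interval.
    """
    counts = Counter((score // width) * width for score in scores)
    lowest = (min(scores) // width) * width
    highest = (max(scores) // width) * width
    # Keep empty intervals so the table covers the whole observed range.
    return [(low, low + width, counts.get(low, 0))
            for low in range(lowest, highest + width, width)]


table = grouped_frequencies(enrollment, WIDTH)
for low, high, f in table:
    print(f"{low}-{high}\t{f}")
print("N =", sum(f for _, _, f in table))
```

With a width of 10 this gives nine intervals, which sits inside the suggested range of about 5-20.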

Relative Frequencies & Percentages
Often, researchers are more interested in the relative frequency (or proportion) of individuals in each category than in the total number. Remember from the last lecture that we usually measure statistics on samples to infer parameters of populations. The relative frequency of a sample approximates the relative frequency of the population, whereas the raw frequency of a sample does not. The relative frequency distribution table lists the proportion (p) for each category: p = f/n, where n is the total number of scores. The sum of the p column should equal 1.00. Alternatively, the table could list the percentage of the distribution corresponding to each x value. The percentage is found by multiplying p by 100. The sum of the percentage column should equal 100%.

x        f     p (or f/n)   %
0-10     5     0.125        12.5
10-20    22    0.550        55.0
20-30    2     0.050        5.0
30-40    5     0.125        12.5
40-50    0     0.000        0.0
50-60    1     0.025        2.5
60-70    1     0.025        2.5
70-80    3     0.075        7.5
80-90    1     0.025        2.5
Total    40    1.000        100.0
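A minimal sketch of the p = f/n and percentage calculations, using the grouped enrollment frequencies. The dictionary layout and variable names are illustrative assumptions.

```python
# Grouped frequencies from the course enrollment table.
freqs = {"0-10": 5, "10-20": 22, "20-30": 2, "30-40": 5, "40-50": 0,
         "50-60": 1, "60-70": 1, "70-80": 3, "80-90": 1}

n = sum(freqs.values())  # total number of scores

# Proportion p = f / n and percentage = 100 * p for each interval.
rel = {interval: f / n for interval, f in freqs.items()}
pct = {interval: 100 * p for interval, p in rel.items()}

for interval in freqs:
    print(f"{interval}\t{freqs[interval]}\t{rel[interval]:.3f}\t{pct[interval]:.1f}")

# Sanity checks: the p column should sum to 1.00 and the % column to 100.
print(f"sum of p = {sum(rel.values()):.2f}, sum of % = {sum(pct.values()):.1f}")
```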

Cumulative Frequencies, Proportions, & Percentages
Cumulative frequencies, proportions, or percentages describe the sum of frequencies, proportions, or percentages across a series of intervals. This usually refers to a bottom-up sum, starting from the lowest interval. E.g., the number of college courses with at most k students.

Cumulative Frequencies & Percentages

x        f     Cumulative Freq.   %        Cumulative %
0-10     5     5                  12.5%    12.5%
10-20    22    27                 55.0%    67.5%
20-30    2     29                 5.0%     72.5%
30-40    5     34                 12.5%    85.0%
40-50    0     34                 0.0%     85.0%
50-60    1     35                 2.5%     87.5%
60-70    1     36                 2.5%     90.0%
70-80    3     39                 7.5%     97.5%
80-90    1     40                 2.5%     100.0%
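The bottom-up running sum can be sketched with `itertools.accumulate` from the Python standard library. The variable names are illustrative; the frequencies are those of the enrollment table.

```python
from itertools import accumulate

intervals = ["0-10", "10-20", "20-30", "30-40", "40-50",
             "50-60", "60-70", "70-80", "80-90"]
freqs = [5, 22, 2, 5, 0, 1, 1, 3, 1]

# Running (bottom-up) sum of the frequencies, starting at the lowest interval.
cum_freq = list(accumulate(freqs))

# The last cumulative frequency is the total number of scores.
n = cum_freq[-1]
cum_pct = [100 * c / n for c in cum_freq]

for interval, f, cf, cp in zip(intervals, freqs, cum_freq, cum_pct):
    print(f"{interval}\t{f}\t{cf}\t{cp:.1f}%")
```

An empty interval (40-50 here) leaves the cumulative values unchanged, which is why 34 and 85.0% appear twice.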

Frequency Distribution Graphs
In a frequency distribution graph, the score categories (X values) are listed on the X axis and the frequencies are listed on the Y axis. When the score categories consist of numerical scores from an interval or ratio scale, the graph should be either a histogram or a polygon.

Bar Plots & Histograms
Bar plots are plots showing the relationship between two variables. Usually, the height of a bar represents the value of a dependent variable when the independent variable consists of nominal or ordinal category labels.
Histograms are bar plots in which the rectangles are centered above each score (or class interval) and the heights of the bars correspond to the frequencies (or relative frequencies) of the scores. The widths of the bars should extend to the real limits of the class intervals, so that adjacent bars touch.
Note: Proper histograms actually represent frequencies in terms of the area rather than the height of the bars, but we won't worry about that distinction in this course.

Bar Plot Example: M&Ms Colors

x         f
brown     14
red       14
blue      10
orange    7
green     6
yellow    5
n = 56

Histogram Example: Course Enrollment

x        f
0-10     5
10-20    22
20-30    2
30-40    5
40-50    0
50-60    1
60-70    1
70-80    3
80-90    1
N = 40

Line Plots & Frequency Polygons
Line plots are plots in which each pair of scores is drawn as a dot (rather than a rectangle): the dot is centered above the first score of the pair on the X axis, its height is determined by the second score, and lines are drawn to connect the dots. These are generally used to show the relationship between two quantitative measurements. A frequency polygon is a type of line plot analogous to a histogram, where the heights of the dots correspond to the frequencies or relative frequencies of scores or intervals.

Frequency Polygons: Example

Scatter Plots
A scatter plot (or scattergram) displays discrete data points (x, y) to summarize the relationship between two variables.

Height   Weight
70       150
67       140
72       180
75       190
68       145
69       150
71.5     164
71       140
72       142
69       136
67       123
68       155
66       140
72       145
73.5     160
73       190
69       155
73       165
72       150
74       190

Theoretical Distributions, Probability Densities & Smooth Curves
If the scores in the population are continuous variables, then the theoretical distributions describing them will often be depicted as smooth curves. Examples include the normal distribution (i.e., the "bell curve") as well as most of the test statistic distributions that we will deal with in this course (e.g., the t distribution, the F distribution, the chi-square distribution). The smooth curves represent the expectation that in a large population, relative frequencies should change smoothly as a function of a continuous variable. These smooth curves actually represent probability densities, which are related to relative frequencies.

Frequency & Probability Distribution Graphs
Frequency and probability distribution graphs are useful because they show the entire set of scores. At a glance, you can determine the highest score, the lowest score, and where the scores are centered. The graph also shows whether the scores are clustered together or scattered over a wide range.

Distribution Shape
A graph shows the shape of the distribution. A distribution is symmetrical if the left side of the graph is (roughly) a mirror image of the right side. One example of a symmetrical distribution is the bell-shaped normal distribution. On the other hand, distributions are skewed when scores pile up on one side of the distribution, leaving a "tail" of a few extreme values on the other side.

In a positively skewed distribution, the scores tend to pile up on the left side of the distribution with the tail tapering off to the right. In a negatively skewed distribution, the scores tend to pile up on the right side and the tail points to the left. A unimodal distribution has one peak. A bimodal distribution has two peaks, and a multimodal distribution has multiple peaks.

Central Tendency

In general terms, central tendency is a statistical measure that identifies a single value that accurately describes the center of a distribution and represents the entire distribution of scores. The goal of central tendency is to identify the single value that best represents the entire set of data.

By identifying the "average score," central tendency allows researchers to summarize or condense a large set of data into a single value. Thus, central tendency serves as a descriptive statistic because it allows researchers to describe or present a set of data in a very simplified, concise form. In addition, it is possible to compare two (or more) sets of data by simply comparing the average score (central tendency) for one set versus the average score for another set.

The Mean, the Median, and the Mode
No single procedure always produces a good, representative value. Therefore, researchers have developed three commonly used techniques for measuring central tendency: the mean, the median, and the mode.

The Mean
The mean is the most commonly used measure of central tendency. The population mean is denoted by $\mu$. The sample mean is denoted by $M$ or $\bar{X}$. Computation of the mean requires scores that are numerical values measured on an interval or ratio scale. The mean is obtained by computing the sum, or total, for the entire set of scores, then dividing this sum by the number of scores: $\frac{1}{N}\sum x$.
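A minimal sketch of the calculation, applied to the course enrollment scores from the grouped frequency example. The helper name `mean_of` is an illustrative assumption.

```python
enrollment = [34, 12, 33, 11, 16, 31, 17, 72, 14, 17, 18, 13, 17, 6, 17, 24, 56, 9, 10, 18,
              7, 10, 67, 17, 83, 77, 5, 9, 16, 18, 28, 35, 16, 30, 70, 18, 15, 10, 13, 12]


def mean_of(scores):
    """Sum every score, then divide the total by the number of scores."""
    return sum(scores) / len(scores)


total = sum(enrollment)       # sum of all scores
count = len(enrollment)       # number of scores, N
mean_enrollment = mean_of(enrollment)
print(total, count, mean_enrollment)
```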

Changing the Mean
Because the calculation of the mean involves every score in the distribution, changing the value of any score will change the value of the mean. Modifying a distribution by discarding scores or by adding new scores will usually change the value of the mean. To determine how the mean will be affected in any specific situation you must consider: 1) how the number of scores is affected, and 2) how the sum of the scores is affected.

If a constant value is added to every score in a distribution, then the same constant value is added to the mean. Also, if every score is multiplied by a constant value, then the mean is also multiplied by the same constant value.
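Both properties are easy to check numerically. The scores and the constants 4 and 3 below are hypothetical values chosen only for illustration.

```python
def mean_of(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


scores = [6, 2, 6, 8, 3]                 # hypothetical scores
original = mean_of(scores)

# Adding a constant to every score adds the same constant to the mean.
shifted = mean_of([x + 4 for x in scores])

# Multiplying every score by a constant multiplies the mean by that constant.
scaled = mean_of([x * 3 for x in scores])

print(original, shifted, scaled)
```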

The Weighted Mean
When combining data from samples with different sizes, you can compute the combined mean from the sample means using the following formula:
$M_W = \frac{1}{N}\sum n M$, where $N = \sum n$
For example, consider the following samples:
Sample 1: x = {6,2,6,8,3}; M = 5.0; n = 5
Sample 2: x = {3,6,13,4}; M = 6.5; n = 4
Sample 3: x = {3,4,2}; M = 3.0; n = 3
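As a worked sketch of the weighted mean for these three samples, the snippet below weights each sample mean by its sample size and compares the result with the mean of the pooled scores. The variable names are illustrative.

```python
samples = [
    [6, 2, 6, 8, 3],   # Sample 1
    [3, 6, 13, 4],     # Sample 2
    [3, 4, 2],         # Sample 3
]

sizes = [len(s) for s in samples]            # n for each sample
means = [sum(s) / len(s) for s in samples]   # M for each sample

# Weighted mean: each sample mean is weighted by its sample size.
N = sum(sizes)
weighted_mean = sum(n * m for n, m in zip(sizes, means)) / N

# Pooling all raw scores gives the same value.
pooled = [x for s in samples for x in s]
pooled_mean = sum(pooled) / len(pooled)

# A plain average of the three means ignores the sample sizes.
unweighted_mean = sum(means) / len(means)

print(weighted_mean, pooled_mean, round(unweighted_mean, 3))
```

The weighted mean matches the pooled mean because n times M recovers each sample's sum of scores. The unweighted average of the means differs whenever the sample sizes differ.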

When the Mean Won't Work
Although the mean is the most commonly used measure of central tendency, there are situations where the mean does not provide a good, representative value, or where you cannot compute a mean at all. When a distribution contains a few extreme scores (or is very skewed), the mean will be pulled toward the extremes. In these cases, the mean will not provide a "central" value. With data from a nominal scale it is impossible to compute a mean, and when data are measured on an ordinal scale (ranks), it is usually inappropriate to compute a mean. Thus, the mean does not always work as a measure of central tendency, and it is necessary to have alternative procedures available.

The Median
If the scores in a distribution are listed in order from smallest to largest, the median is defined as the midpoint of the list. This means that computation of the median requires scores that can be placed in rank order (i.e., ordinal, interval, or ratio). The median divides the scores so that 50% of the scores in the distribution have values that are equal to or less than the median. Usually, the median can be found by a simple counting procedure:
1. With an odd number of scores, list the values in order, and the median is the middle score in the list.
2. With an even number of scores, list the values in order, and the median is halfway between the middle two scores.
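The counting procedure can be sketched as a short function. The function name and the two example lists are hypothetical, chosen to show the odd and even cases.

```python
def median_of(scores):
    """Median by counting: sort, then take the middle score (odd n)
    or the point halfway between the two middle scores (even n)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_scores = [10, 3, 8, 5, 11]       # hypothetical, n = 5
even_scores = [3, 8, 4, 3, 7, 5]     # hypothetical, n = 6

print(median_of(odd_scores))    # middle of 3 5 8 10 11
print(median_of(even_scores))   # halfway between 4 and 5 in 3 3 4 5 7 8
```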

One advantage of the median is that it is relatively unaffected by extreme scores. Thus, the median tends to stay in the "center" of the distribution even when there are a few extreme scores or when the distribution is very skewed. In these situations, the median serves as a good alternative to the mean.

The Mode
The mode is defined as the most frequently occurring category or score in the distribution. In a frequency distribution graph, the mode is the category or score corresponding to the peak or high point of the distribution. The mode can be determined for data measured on any scale of measurement: nominal, ordinal, interval, or ratio. The mode is the only measure of central tendency that can be used for data measured on a nominal scale.

Bimodal Distributions
It is possible for a distribution to have more than one mode. A distribution with two modes is called bimodal, and one with more than two is called multimodal. (Note that a distribution can have only one mean and only one median.) In addition, the term "mode" is often used to describe a peak in a distribution that is not really the highest point. Thus, a distribution may have a major mode at the highest peak and a minor mode at a secondary peak in a different location.
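As an illustration of finding modes in nominal data, the sketch below uses `statistics.multimode` (available in Python 3.8 and later) on the M&Ms color counts from the bar plot example. Expanding the counts into a list of individual candies is an assumption made so that the function has raw observations to tally.

```python
from statistics import multimode

# Color frequencies from the M&Ms bar plot example.
color_counts = {"brown": 14, "red": 14, "blue": 10,
                "orange": 7, "green": 6, "yellow": 5}

# Expand the table into one entry per candy (assumed raw observations).
candies = [color for color, f in color_counts.items() for _ in range(f)]

# multimode returns every category tied for the highest frequency.
modes = multimode(candies)
print(modes)
```

Brown and red tie at 14, so this sample has two modes. No mean or median could be computed for these categories.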

Central Tendency and the Shape of the Distribution
Because the mean, the median, and the mode are all measuring central tendency, the three measures are often systematically related to each other. In a symmetrical distribution, for example, the mean and median will always be equal.

If a symmetrical distribution has only one mode, the mode, mean, and median will all have the same value. In a skewed distribution, the mode will be located at the peak on one side and the mean usually will be displaced toward the tail on the other side. The median is usually located between the mean and the mode.
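The course enrollment data from the grouped frequency example are positively skewed, so they give a quick check of these relationships. The sketch below uses the standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median, mode

enrollment = [34, 12, 33, 11, 16, 31, 17, 72, 14, 17, 18, 13, 17, 6, 17, 24, 56, 9, 10, 18,
              7, 10, 67, 17, 83, 77, 5, 9, 16, 18, 28, 35, 16, 30, 70, 18, 15, 10, 13, 12]

m = mean(enrollment)      # pulled toward the long right tail
md = median(enrollment)   # midpoint of the sorted scores
mo = mode(enrollment)     # most frequent score, at the peak

print(f"mode = {mo}, median = {md}, mean = {m:.3f}")
```

In this sample the median happens to coincide with the mode, while the mean is displaced toward the few large enrollments in the right tail.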
