Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.
- Violet McDaniel
- 5 years ago
Transcription
1 Summary Statistics
2 Acquisition, Description, Exploration, Examination
- Acquisition: understanding what data is collected. Selecting; collection methods.
- Description: characterizing properties of data. Types; dimensionality; sparsity; resolution.
- Exploration: exploring the data distribution(s). Summary stats.; visualizations; multidimensional analysis.
- Examination: identifying data quality problems. Missing data; noise and artifacts; outliers; inconsistencies.
3 Basic Statistical Descriptions of Data
Motivation: to better understand the data.
[Table: Bedtime and Times Sick for Freshmen, Sophomores, Juniors, and Seniors; values not preserved in the transcript]
4 Basic Statistical Descriptions of Data
Motivation: to better understand the data.
[Same table with Mean and SD rows added; values not preserved in the transcript]
5 Basic Statistical Descriptions of Data: characteristics
- Central tendency: mean, median, mode
- Spread: variance, standard deviation, max, min, Z-score
6 Types of Attributes
- Qualitative: nominal, binary, ordinal
- Quantitative: discrete or continuous; interval, ratio
8 Frequency
Conditions                Pre (n=197)   Post (n=195)   Total (n=392)
Hypertension              94 (47.7)     96 (49.2)      190 (48.5)
Diabetes                  44 (22.3)     47 (24.1)      91 (23.2)
Coronary Artery Disease   59 (30.0)     35 (18.0)      94 (24.0)
Asthma                    22 (11.2)     17 (8.7)       39 (10.0)
None                      54 (27.4)     64 (32.8)      118 (30.1)
Entries are counts, with percentages in parentheses.
Valid for all attribute types.
9 Central Tendency: Mode
- Mode: the value that occurs most frequently in the data
- Multi-modal: bimodal, trimodal, etc.
- Also valid for all attribute types
Conditions                Pre (n=197)   Post (n=195)   Total (n=392)
Hypertension              94 (47.7)     96 (49.2)      190 (48.5)
Diabetes                  44 (22.3)     47 (24.1)      91 (23.2)
Coronary Artery Disease   59 (30.0)     35 (18.0)      94 (24.0)
Asthma                    22 (11.2)     17 (8.7)       39 (10.0)
None                      54 (27.4)     64 (32.8)      118 (30.1)
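A minimal sketch of finding the mode(s) of a nominal attribute, assuming the raw observations are available as a list. The `modes` helper and the sample list are hypothetical and are not taken from the slides.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, in first-seen order."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

# Hypothetical nominal observations (illustration only).
conditions = ["hypertension", "diabetes", "hypertension", "asthma", "diabetes"]
print(modes(conditions))  # two values tie, so this sample is bimodal
```

Returning a list rather than a single value keeps the bimodal and trimodal cases visible instead of silently picking one winner.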
10 Percentiles
Given an ordinal or continuous feature x and a number p between 0 and 100, the p-th percentile is a value $x_p$ of x such that p% of the observed values of x are less than $x_p$.
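One way to make the percentile definition concrete is the nearest-rank rule sketched below. This is an assumption: the slide does not say which convention is used, and nearest-rank counts values at or below the result rather than strictly below it. The function name is hypothetical.

```python
import math

def percentile_nearest_rank(values, p):
    """Smallest observed value with at least p% of the data at or below it."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted data
    return ordered[rank - 1]

data = [15, 20, 35, 40, 50]  # hypothetical sample
print(percentile_nearest_rank(data, 25))
print(percentile_nearest_rank(data, 50))
```

Because the result is always an observed value, this rule also works for ordinal data, where interpolating between two values would not be meaningful.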
11 Central Tendency: Mean and Median
Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
- Highly sensitive to outliers
- Sorting not needed
- Many variations
  - Trimmed mean: chopping off extreme values, often a percentage
  - Geometric and harmonic means
- Only defined for interval and ratio data, with the exception of binary data
Median
- Less sensitive to outliers
- If x is sorted, the median can be identified as follows:
  - Odd n: the value of x at position (n+1)/2
  - Even n: the average of the values of x at positions n/2 and n/2 + 1
- $\text{Median} = x_{[(n+1)/2]}$ (n odd); $\text{Median} = \frac{x_{[n/2]} + x_{[n/2+1]}}{2}$ (n even)
- The median is defined for ordinal data
  - Odd n: computed the same way
  - Even n: commonly the smaller value (position n/2) is selected
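The contrast between mean, median, and trimmed mean can be sketched with Python's standard `statistics` module. The `trimmed_mean` helper and the sample are hypothetical; the trimming rule used here (drop the same number of points from each tail) is one common choice, not necessarily the one the slides intend.

```python
import statistics

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the points from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # points cut from each tail
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)

data = [2, 3, 3, 4, 5, 6, 90]    # hypothetical sample with one outlier
print(statistics.mean(data))     # pulled upward by the outlier
print(statistics.median(data))   # middle value, unaffected by the outlier
print(trimmed_mean(data, 0.15))  # drops one point from each tail
print(statistics.median_low([1, 2, 3, 4]))  # smaller middle value for even n
```

`statistics.median_low` matches the ordinal convention on the slide: for an even number of values it returns the smaller of the two middle values instead of averaging them.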
12 Central Tendency: A Deeper Consideration of Means
Geometric mean: $\left( \prod_{i=1}^{n} x_i \right)^{1/n}$
- Used when the values are multiplicative rather than additive
- Used when data are skewed and transformed with a logarithm
- Pros: scale invariant
- Cons: careful about zeros! With multiple scales, units are lost
13 Central Tendency: A Deeper Consideration of Means
Scores: Patient 1: 4.5/5 and 68/100; Patient 2: 3/5 and 75/100
Arithmetic mean: Patient 1: (4.5 + 68) / 2 = 36.25 vs Patient 2: (3 + 75) / 2 = 39
14 Central Tendency: A Deeper Consideration of Means
Scores: Patient 1: 4.5/5 and 68/100; Patient 2: 3/5 and 75/100
Arithmetic mean: Patient 1: (4.5 + 68) / 2 = 36.25 vs Patient 2: (3 + 75) / 2 = 39
Arithmetic mean (scaled), with the five-point score rescaled to 100: 4.5 * 20 = 90 and 3 * 20 = 60
Patient 1: (90 + 68) / 2 = 79 vs Patient 2: (60 + 75) / 2 = 67.5
15 Central Tendency: A Deeper Consideration of Means
Scores: Patient 1: 4.5/5 and 68/100; Patient 2: 3/5 and 75/100
Geometric mean: Patient 1: sqrt(4.5 * 68) = 17.5 vs Patient 2: sqrt(3 * 75) = 15
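The patient example can be checked numerically. The sketch below uses the scores from the slides; the helper names are hypothetical. It shows that rescaling the five-point score flips the ranking under the arithmetic mean but not under the geometric mean.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """n-th root of the product, computed via logs; needs strictly positive values."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

patient1 = [4.5, 68]  # 4.5 out of 5, 68 out of 100
patient2 = [3.0, 75]  # 3 out of 5, 75 out of 100
patient1_scaled = [4.5 * 20, 68]  # first score rescaled to a 100-point scale
patient2_scaled = [3.0 * 20, 75]

# Arithmetic mean: patient 2 ranks higher on raw scores, patient 1 after rescaling.
print(arithmetic_mean(patient1), arithmetic_mean(patient2))
print(arithmetic_mean(patient1_scaled), arithmetic_mean(patient2_scaled))

# Geometric mean: patient 1 ranks higher in both cases.
print(geometric_mean(patient1), geometric_mean(patient2))
print(geometric_mean(patient1_scaled), geometric_mean(patient2_scaled))
```

Rescaling one component multiplies every patient's geometric mean by the same factor, which is why the ordering cannot change.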
16 Central Tendency: A Deeper Consideration of Means
Harmonic mean: $\left( \frac{1}{n} \sum_{i=1}^{n} \frac{1}{x_i} \right)^{-1}$
- Commonly associated with rates (of the same quantity/distance)
- Less commonly used
Relations between means (for positive values): arithmetic mean ≥ geometric mean ≥ harmonic mean, with equality only when all values are equal.
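17 Central Tendency: A Deeper Consideration of Means
Suppose an ambulance is 10 miles from a hospital. Driving to the hospital it travels at 60 mph; returning, it travels at 30 mph. What is the average trip speed?
- Arithmetic mean: (60 + 30) / 2 = 45 (no)
- Harmonic mean: $\left( \frac{1}{2} \left( \frac{1}{60} + \frac{1}{30} \right) \right)^{-1} = 40$
Other examples: what is the average cost per cell?
- $480 for cell type A, $30 per cell
- $480 for cell type B, $40 per cell
- $480 for cell type C, $32 per cell
- Harmonic mean: $\left( \frac{1}{3} \left( \frac{1}{30} + \frac{1}{40} + \frac{1}{32} \right) \right)^{-1} \approx 33.49$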
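Both harmonic-mean examples can be verified against a direct total-over-total calculation. This is a small sketch using the numbers from the slide; the function and variable names are hypothetical.

```python
def harmonic_mean(xs):
    """n divided by the sum of reciprocals; needs strictly positive values."""
    return len(xs) / sum(1 / x for x in xs)

# Ambulance: the same 10-mile distance at 60 mph out and 30 mph back.
trip_speed = harmonic_mean([60, 30])
total_hours = 10 / 60 + 10 / 30
direct_speed = 20 / total_hours  # total distance divided by total time

# Cells: $480 spent on each type at $30, $40 and $32 per cell.
cost_per_cell = harmonic_mean([30, 40, 32])
cells_bought = 480 / 30 + 480 / 40 + 480 / 32
direct_cost = (3 * 480) / cells_bought  # total spend divided by total cells

print(trip_speed, direct_speed)
print(cost_per_cell, direct_cost)
```

The harmonic mean agrees with the direct calculation because the quantity held fixed (distance, or dollars spent) sits in the numerator of each rate.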
18 Spread: Range / IQR
- Range: the difference between the maximum and minimum values.
- IQR: the difference between the 75th and 25th percentile values. Describes the central 50% of a distribution regardless of shape.
19 Spread: Variance / Standard Deviation
The variance or standard deviation is the most common measure of the spread of a set of points. The standard deviation s is the square root of the variance $s^2$:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$, where $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Like the mean, both are highly susceptible to outliers.
Alternatives include the Median Absolute Deviation (MAD): $\text{MAD} = \text{median}(|x_i - \text{median}(x)|)$
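The difference in outlier sensitivity can be seen in a short sketch. `statistics.variance` and `statistics.stdev` use the n - 1 denominator shown above; the MAD helper and both samples are hypothetical.

```python
import statistics

def median_absolute_deviation(xs):
    """Median of the absolute deviations from the median."""
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])

clean = [2, 4, 4, 4, 5, 5, 7, 9]           # hypothetical sample
with_outlier = [2, 4, 4, 4, 5, 5, 7, 900]  # same sample, one value corrupted

print(statistics.variance(clean), statistics.stdev(clean))
print(statistics.stdev(with_outlier))           # inflated by the single outlier
print(median_absolute_deviation(clean))
print(median_absolute_deviation(with_outlier))  # unchanged by the outlier
```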
20 Coefficient of Variation (CV)
- Measure of relative spread, in relation to the mean of the population
- As the CV uses the mean, it is not valid for ordinal or nominal data types, and typically not valid for interval data
- The coefficient has no units
$CV = \frac{SD}{\bar{X}} \times 100\%$
Example: comparing the variability in shock level to that of blood pressure can be difficult.
- Shock: μ = 0.69, σ = 0.20; CV = 0.20 / 0.69 × 100% = 29.0%
- BP: μ = 138, σ = 26; CV = 26 / 138 × 100% = 18.8%
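The CV example reduces to one division per variable. A minimal sketch with the means and standard deviations from the slide (the function name is hypothetical):

```python
def coefficient_of_variation(mean, sd):
    """Relative spread as a percentage of the mean; the mean must be non-zero."""
    return sd / mean * 100

cv_shock = coefficient_of_variation(0.69, 0.20)
cv_bp = coefficient_of_variation(138, 26)
print(round(cv_shock, 1), round(cv_bp, 1))
```

Although blood pressure has the far larger standard deviation in absolute terms, shock level varies more relative to its mean.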
21 Extra: Higher-Order Descriptive Statistics
Kurtosis
- Measures the tail-heaviness of the distribution.
- Very loosely concerns the likelihood of seeing a value farther from the mean.
- Extremely dependent on the variance.
Skewness
- Measure of the asymmetry of the distribution of a variable.
- The skew value of a normal distribution is zero.
- A positive skew value indicates a tail on the right side.
- A negative skew value indicates a tail on the left side.
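Skewness and kurtosis can be sketched from central moments. The slides give no formulas, so the population (biased) moment definitions below are an assumption; statistical packages often apply small-sample corrections. Kurtosis is reported as excess kurtosis, so a normal distribution would score about zero.

```python
def central_moment(xs, k):
    """k-th central moment using the population (1/n) convention."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]        # hypothetical samples
right_tail = [1, 1, 2, 2, 3, 10]
left_tail = [-x for x in right_tail]

print(skewness(symmetric))         # zero for a symmetric sample
print(skewness(right_tail))        # positive: tail on the right
print(skewness(left_tail))         # negative: tail on the left
print(excess_kurtosis(symmetric))  # negative: lighter tails than a normal
```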
22
23 Exploring through Visualization
24 Many Ways To Visualize
25 Scatter plot
- Provides a first look at data to see clusters of points and outliers.
- Each pair of values is treated as a pair of coordinates and plotted as a point in the plane.
- Additional attributes can often be displayed by using the size, shape, and color of the markers that represent the objects.
26 Scatterplot Matrix
[Figure from: Influence of Collection Site and Methods on Postmortem Morphine Concentrations in a Porcine Model]
27 Strip Plot
28 Jitter / Swarm Plot
- Horizontal spread is artificially added.
- All points are shown.
29 Measuring the Dispersion of Data: Quartiles & Boxplots
- Quartiles: Q1 (25th percentile), Q3 (75th percentile)
- Inter-quartile range: IQR = Q3 - Q1
- Five-number summary: min, Q1, median, Q3, max
- Boxplot: the data is represented with a box
  - Q1, Q3, IQR: the ends of the box are at the first and third quartiles, i.e., the height of the box is the IQR
  - The median (Q2) is marked by a line within the box
  - Whiskers: two lines outside the box extended to the minimum and maximum
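The five-number summary behind a boxplot can be sketched with `statistics.quantiles` (Python 3.8+). Quartile conventions differ between tools; the `"inclusive"` method used here is one choice and is an assumption, as the slides do not name one. The helper name and sample are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sample
low, q1, med, q3, high = five_number_summary(data)
iqr = q3 - q1                        # height of the box
print(low, q1, med, q3, high, iqr)
```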
30 Box Plots Address Things Like
- Is a feature significant?
- Does the location differ between subgroups?
- Does the variation differ between subgroups?
- Are there outliers in the data?
More on this next week.
31 Combining Figure Types
32 Bar Chart
- Graph display of tabulated frequencies, shown as bars
- Usually shows the percentage for nominal attributes
- Can be broken down to compare the same attributes across multiple classes
33 Bar Chart Variations
34 What Can Bar Charts Address?
- Which categories have the highest percentage?
- How are the category values spread out?
- What is the modality of the data?
- How do categories compare between groups?
35 Histograms
- Graph display of tabulated frequencies, shown as bars
- Usually shows the distribution of values of a single variable
- The height of each bar indicates the number of objects in each bin
- The shape depends on the number of bins
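The dependence of a histogram's shape on the number of bins can be seen without plotting. The sketch below uses equal-width bins over the data range, which is an assumption (other binning rules exist); the function name and sample are hypothetical.

```python
def histogram_counts(values, n_bins):
    """Counts per equal-width bin; needs at least two distinct values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # maximum goes in the last bin
        counts[idx] += 1
    return counts

data = [1, 2, 2, 3, 3, 3, 7, 8, 8, 9]  # hypothetical sample
print(histogram_counts(data, 2))  # two bins hide the gap in the middle
print(histogram_counts(data, 4))  # four bins reveal an empty bin
```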
36 What Can Histograms Address?
- What kind of population distribution do the data come from?
- Where are the data located?
- How spread out are the data?
- Are the data symmetric or skewed?
- Are there outliers in the data?
37 Histograms Often Tell More than Boxplots
- Two histograms may have the same boxplot, with the same values for min, Q1, median, Q3, and max
- But they can have rather different data distributions
38 Histogram vs Bar Chart
39 Histogram vs Bar Chart
- Histograms show the distribution of a variable, while bar charts compare variables
- Histograms plot binned quantitative data, while bar charts plot categorical data
- Bars can be reordered in bar charts but not in histograms
40 Odds Plot
- Odds ratio: the ratio of one class to another as a function of feature values.
- $\text{OddsRatio}(x_i, y)_j = \frac{p(x_{ij} \mid y = 1)}{p(x_{ij} \mid y = 0)}$
- Example: odds of a patient having diabetes given their plasma glucose concentration (1 = patient has diabetes, 0 = patient does not have diabetes)
[Figure: odds plot; x-axis: plasma glucose concentration]
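A rough sketch of the quantity behind an odds plot, assuming the feature is split into bins and the ratio is taken between the share of class-1 and class-0 observations falling in each bin. The binning, the function name, and the glucose readings are all hypothetical, and the slide's exact formula may differ.

```python
def odds_by_bin(xs, ys, edges):
    """For each bin [edges[j], edges[j+1]) return p(bin | y=1) / p(bin | y=0).

    Returns None for a bin that contains no class-0 observations.
    """
    n_bins = len(edges) - 1
    pos = [0] * n_bins
    neg = [0] * n_bins
    for x, y in zip(xs, ys):
        for j in range(n_bins):
            if edges[j] <= x < edges[j + 1]:
                (pos if y == 1 else neg)[j] += 1
                break
    n_pos, n_neg = sum(pos), sum(neg)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return [None if q == 0 else (p / n_pos) / (q / n_neg) for p, q in zip(pos, neg)]

# Hypothetical plasma glucose readings and diabetes labels (1 = diabetic).
glucose = [85, 90, 95, 110, 120, 130, 150, 160, 170, 180]
diabetic = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
ratios = odds_by_bin(glucose, diabetic, edges=[80, 120, 160, 200])
print(ratios)
```

Plotting these ratios against the bin positions gives the kind of curve where a threshold effect, if any, would show up as a sharp rise.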
41 What Can Odds Plots Address?
- How do feature values affect the probability of occurrence?
- Is there a threshold for the effect?
42 Quantile Plot
- Displays all of the data (allowing the user to assess both the overall behavior and unusual occurrences)
- Plots quantile information: for data $x_i$ sorted in increasing order, $f_i$ indicates that approximately $100 f_i$% of the data are below or equal to the value $x_i$
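The points of a quantile plot can be computed as below. The slide does not give a formula for $f_i$; the common choice $f_i = (i - 0.5)/n$ is assumed here, and the function name and sample are hypothetical.

```python
def quantile_plot_points(values):
    """Return (f_i, x_i) pairs with f_i = (i - 0.5) / n for the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]

points = quantile_plot_points([40, 10, 30, 20])  # hypothetical sample
for f, x in points:
    print(f, x)
```

Each pair is one marker on the plot, with f on the horizontal axis and the data value on the vertical axis, so every observation stays visible.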
43 Quantile-Quantile (Q-Q) Plot
- Graphs the quantiles of one univariate distribution against the corresponding quantiles of another
- Example shows the unit price of items sold at Branch 1 vs. Branch 2 for each quantile. Unit prices of items sold at Branch 1 tend to be lower than those at Branch 2.
44 Heatmap
Heatmaps visualize data through variations in color. They are useful for:
- Cross-examining multivariate data
- Showing variance across multiple variables
- Revealing patterns
- Displaying similar variables
Typically, all the rows are one category (labels displayed on the left or right side) and all the columns are another category (labels displayed on the top or bottom). Cells contain either color-coded categorical data or numerical data mapped to a color scale.
45 Heatmap
46 Bonus: KDE Estimates
A major problem with histograms is that the choice of binning can have a disproportionate effect on the resulting visualization, which may lead to different interpretations of the data.
47 KDE Estimates
- Instead, we can attempt to estimate the parameters of the underlying distribution with respect to a number of known distribution types.
- A kernel is a type of probability density function (PDF).
- Essentially, at every point (+), a kernel function is created with the point at its center.
- The PDF is then estimated by adding all of these kernel functions and dividing by the number of data points.
- Intuitively, a kernel density estimate is a sum of bumps. A bump is assigned to every point, and the size of the bump represents the probability assigned to the neighborhood of values around that point. Thus, if the data set contains 2 points at x = 1.5 and 1 point at x = 0.5, the bump at x = 1.5 is twice as big as the bump at x = 0.5.
- It is not only a visualization: we can sample from it as well to get new data.
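The sum-of-bumps idea can be sketched with a Gaussian kernel. The kernel choice and the bandwidth value are assumptions for illustration (the slides mention many kernel options), and the function name is hypothetical. The sample reuses the slide's example of two points at 1.5 and one at 0.5.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a density function that averages one Gaussian bump per data point."""
    n = len(points)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density

points = [1.5, 1.5, 0.5]                   # two points at 1.5, one at 0.5
pdf = gaussian_kde(points, bandwidth=0.1)  # narrow bumps that barely overlap

ratio = pdf(1.5) / pdf(0.5)                # bump at 1.5 is about twice as tall
# Riemann sum over [-1, 3] as a rough check that the estimate integrates to 1.
area = sum(pdf(-1 + i * 0.001) for i in range(4001)) * 0.001
print(ratio, area)
```

Sampling new data from such an estimate amounts to picking one of the original points at random and adding noise drawn from the kernel.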
48 Lots of Kernel Options
49 Cleaning
ECLT 5810 Data Preprocessing. Prof. Wai Lam
ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate
More informationData Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha
Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking
More informationData can be in the form of numbers, words, measurements, observations or even just descriptions of things.
+ What is Data? Data is a collection of facts. Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. In most cases, data needs to be interpreted and
More informationSTA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures
STA 2023 Module 3 Descriptive Measures Learning Objectives Upon completing this module, you should be able to: 1. Explain the purpose of a measure of center. 2. Obtain and interpret the mean, median, and
More informationSTA 570 Spring Lecture 5 Tuesday, Feb 1
STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row
More informationSummarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester
Summarising Data Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 09/10/2018 Summarising Data Today we will consider Different types of data Appropriate ways to summarise these
More informationSTA Module 2B Organizing Data and Comparing Distributions (Part II)
STA 2023 Module 2B Organizing Data and Comparing Distributions (Part II) Learning Objectives Upon completing this module, you should be able to 1 Explain the purpose of a measure of center 2 Obtain and
More informationSTA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)
STA 2023 Module 2B Organizing Data and Comparing Distributions (Part II) Learning Objectives Upon completing this module, you should be able to 1 Explain the purpose of a measure of center 2 Obtain and
More informationChapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data
Chapter 2 Descriptive Statistics: Organizing, Displaying and Summarizing Data Objectives Student should be able to Organize data Tabulate data into frequency/relative frequency tables Display data graphically
More informationChapter 1. Looking at Data-Distribution
Chapter 1. Looking at Data-Distribution Statistics is the scientific discipline that provides methods to draw right conclusions: 1)Collecting the data 2)Describing the data 3)Drawing the conclusions Raw
More informationAverages and Variation
Averages and Variation 3 Copyright Cengage Learning. All rights reserved. 3.1-1 Section 3.1 Measures of Central Tendency: Mode, Median, and Mean Copyright Cengage Learning. All rights reserved. 3.1-2 Focus
More informationVocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.
5-number summary 68-95-99.7 Rule Area principle Bar chart Bimodal Boxplot Case Categorical data Categorical variable Center Changing center and spread Conditional distribution Context Contingency table
More informationLESSON 3: CENTRAL TENDENCY
LESSON 3: CENTRAL TENDENCY Outline Arithmetic mean, median and mode Ungrouped data Grouped data Percentiles, fractiles, and quartiles Ungrouped data Grouped data 1 MEAN Mean is defined as follows: Sum
More informationCHAPTER-13. Mining Class Comparisons: Discrimination between DifferentClasses: 13.4 Class Description: Presentation of Both Characterization and
CHAPTER-13 Mining Class Comparisons: Discrimination between DifferentClasses: 13.1 Introduction 13.2 Class Comparison Methods and Implementation 13.3 Presentation of Class Comparison Descriptions 13.4
More informationSTP 226 ELEMENTARY STATISTICS NOTES PART 2 - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES
STP 6 ELEMENTARY STATISTICS NOTES PART - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES Chapter covered organizing data into tables, and summarizing data with graphical displays. We will now use
More informationStatistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.
Statistical Methods Instructor: Lingsong Zhang 1 Issues before Class Statistical Methods Lingsong Zhang Office: Math 544 Email: lingsong@purdue.edu Phone: 765-494-7913 Office Hour: Monday 1:00 pm - 2:00
More information15 Wyner Statistics Fall 2013
15 Wyner Statistics Fall 2013 CHAPTER THREE: CENTRAL TENDENCY AND VARIATION Summary, Terms, and Objectives The two most important aspects of a numerical data set are its central tendencies and its variation.
More informationMeasures of Dispersion
Measures of Dispersion 6-3 I Will... Find measures of dispersion of sets of data. Find standard deviation and analyze normal distribution. Day 1: Dispersion Vocabulary Measures of Variation (Dispersion
More informationMeasures of Central Tendency. A measure of central tendency is a value used to represent the typical or average value in a data set.
Measures of Central Tendency A measure of central tendency is a value used to represent the typical or average value in a data set. The Mean the sum of all data values divided by the number of values in
More informationChapter 6: DESCRIPTIVE STATISTICS
Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling
More informationChapter 2 Describing, Exploring, and Comparing Data
Slide 1 Chapter 2 Describing, Exploring, and Comparing Data Slide 2 2-1 Overview 2-2 Frequency Distributions 2-3 Visualizing Data 2-4 Measures of Center 2-5 Measures of Variation 2-6 Measures of Relative
More informationGetting to Know Your Data
Chapter 2 Getting to Know Your Data 2.1 Exercises 1. Give three additional commonly used statistical measures (i.e., not illustrated in this chapter) for the characterization of data dispersion, and discuss
More informationCHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.
1 CHAPTER 1 Introduction Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. Variable: Any characteristic of a person or thing that can be expressed
More informationChapter 3 - Displaying and Summarizing Quantitative Data
Chapter 3 - Displaying and Summarizing Quantitative Data 3.1 Graphs for Quantitative Data (LABEL GRAPHS) August 25, 2014 Histogram (p. 44) - Graph that uses bars to represent different frequencies or relative
More informationPrepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.
Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good
More informationChapter2 Description of samples and populations. 2.1 Introduction.
Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that
More informationCHAPTER 3: Data Description
CHAPTER 3: Data Description You ve tabulated and made pretty pictures. Now what numbers do you use to summarize your data? Ch3: Data Description Santorico Page 68 You ll find a link on our website to a
More informationMeasures of Central Tendency
Page of 6 Measures of Central Tendency A measure of central tendency is a value used to represent the typical or average value in a data set. The Mean The sum of all data values divided by the number of
More information2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES
EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 Objectives 2.1 What Are the Types of Data? www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative
More informationTable of Contents (As covered from textbook)
Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression
More informationCHAPTER 2 DESCRIPTIVE STATISTICS
CHAPTER 2 DESCRIPTIVE STATISTICS 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is how the data is spread or distributed over the range of the data values. This is one of
More informationThings you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.
1 2 Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 2. How to construct (in your head!) and interpret confidence intervals.
More informationAND NUMERICAL SUMMARIES. Chapter 2
EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 What Are the Types of Data? 2.1 Objectives www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative
More information10.4 Measures of Central Tendency and Variation
10.4 Measures of Central Tendency and Variation Mode-->The number that occurs most frequently; there can be more than one mode ; if each number appears equally often, then there is no mode at all. (mode
More information10.4 Measures of Central Tendency and Variation
10.4 Measures of Central Tendency and Variation Mode-->The number that occurs most frequently; there can be more than one mode ; if each number appears equally often, then there is no mode at all. (mode
More informationIAT 355 Visual Analytics. Data and Statistical Models. Lyn Bartram
IAT 355 Visual Analytics Data and Statistical Models Lyn Bartram Exploring data Example: US Census People # of people in group Year # 1850 2000 (every decade) Age # 0 90+ Sex (Gender) # Male, female Marital
More informationMath 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency
Math 1 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency lowest value + highest value midrange The word average: is very ambiguous and can actually refer to the mean,
More informationData Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures
More informationUnderstanding and Comparing Distributions. Chapter 4
Understanding and Comparing Distributions Chapter 4 Objectives: Boxplot Calculate Outliers Comparing Distributions Timeplot The Big Picture We can answer much more interesting questions about variables
More informationMATH& 146 Lesson 10. Section 1.6 Graphing Numerical Data
MATH& 146 Lesson 10 Section 1.6 Graphing Numerical Data 1 Graphs of Numerical Data One major reason for constructing a graph of numerical data is to display its distribution, or the pattern of variability
More informationLecture 1: Exploratory data analysis
Lecture 1: Exploratory data analysis Statistics 101 Mine Çetinkaya-Rundel January 17, 2012 Announcements Announcements Any questions about the syllabus? If you sent me your gmail address your RStudio account
More informationAP Statistics Summer Assignment:
AP Statistics Summer Assignment: Read the following and use the information to help answer your summer assignment questions. You will be responsible for knowing all of the information contained in this
More informationExploring and Understanding Data Using R.
Exploring and Understanding Data Using R. Loading the data into an R data frame: variable
More informationThe basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student
Organizing data Learning Outcome 1. make an array 2. divide the array into class intervals 3. describe the characteristics of a table 4. construct a frequency distribution table 5. constructing a composite
More informationQuantitative - One Population
Quantitative - One Population The Quantitative One Population VISA procedures allow the user to perform descriptive and inferential procedures for problems involving one population with quantitative (interval)
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES 2: Data Pre-Processing Instructor: Yizhou Sun yzsun@ccs.neu.edu September 10, 2013 2: Data Pre-Processing Getting to know your data Basic Statistical Descriptions of Data
More informationData Mining and Analytics. Introduction
Data Mining and Analytics Introduction Data Mining Data mining refers to extracting or mining knowledge from large amounts of data It is also termed as Knowledge Discovery from Data (KDD) Mostly, data
More informationWhat are we working with? Data Abstractions. Week 4 Lecture A IAT 814 Lyn Bartram
What are we working with? Data Abstractions Week 4 Lecture A IAT 814 Lyn Bartram Munzner s What-Why-How What are we working with? DATA abstractions, statistical methods Why are we doing it? Task abstractions
More informationPart I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures
Part I, Chapters 4 & 5 Data Tables and Data Analysis Statistics and Figures Descriptive Statistics 1 Are data points clumped? (order variable / exp. variable) Concentrated around one value? Concentrated
More informationCHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016)
CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1 Daphne Skipper, Augusta University (2016) 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is
More information1 Overview of Statistics; Essential Vocabulary
1 Overview of Statistics; Essential Vocabulary Statistics: the science of collecting, organizing, analyzing, and interpreting data in order to make decisions Population and sample Population: the entire
More informationSCHOOL OF BUSINESS, ECONOMICS AND MANAGEMENT BBA240 STATISTICS/ QUANTITATIVE METHODS FOR BUSINESS AND ECONOMICS
SCHOOL OF BUSINESS, ECONOMICS AND MANAGEMENT BBA240 STATISTICS/ QUANTITATIVE METHODS FOR BUSINESS AND ECONOMICS Unit Two Moses Mwale e-mail: moses.mwale@ictar.ac.zm ii Contents Contents UNIT 2: Numerical
More informationChapter 5. Understanding and Comparing Distributions. Copyright 2012, 2008, 2005 Pearson Education, Inc.
Chapter 5 Understanding and Comparing Distributions The Big Picture We can answer much more interesting questions about variables when we compare distributions for different groups. Below is a histogram
More informationLearner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display
CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &
More information2.1: Frequency Distributions and Their Graphs
2.1: Frequency Distributions and Their Graphs Frequency Distribution - way to display data that has many entries - table that shows classes or intervals of data entries and the number of entries in each
More informationFurther Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables
Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables
More informationCHAPTER 2: SAMPLING AND DATA
CHAPTER 2: SAMPLING AND DATA This presentation is based on material and graphs from Open Stax and is copyrighted by Open Stax and Georgia Highlands College. OUTLINE 2.1 Stem-and-Leaf Graphs (Stemplots),
More informationUNIT 1A EXPLORING UNIVARIATE DATA
A.P. STATISTICS E. Villarreal Lincoln HS Math Department UNIT 1A EXPLORING UNIVARIATE DATA LESSON 1: TYPES OF DATA Here is a list of important terms that we must understand as we begin our study of statistics
More informationWeek 2: Frequency distributions
Types of data Health Sciences M.Sc. Programme Applied Biostatistics Week 2: distributions Data can be summarised to help to reveal information they contain. We do this by calculating numbers from the data
More informationFrequency Distributions
Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data so that it is possible to get a general overview of the results. Remember,
More informationThe first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies.
Instructions: You are given the following data below these instructions. Your client (Courtney) wants you to statistically analyze the data to help her reach conclusions about how well she is teaching.
More informationBar Charts and Frequency Distributions
Bar Charts and Frequency Distributions Use to display the distribution of categorical (nominal or ordinal) variables. For the continuous (numeric) variables, see the page Histograms, Descriptive Stats
More informationRoad Map. Data types Measuring data Data cleaning Data integration Data transformation Data reduction Data discretization Summary
2. Data preprocessing Road Map Data types Measuring data Data cleaning Data integration Data transformation Data reduction Data discretization Summary 2 Data types Categorical vs. Numerical Scale types
More informationLecture Notes 3: Data summarization
Lecture Notes 3: Data summarization Highlights: Average Median Quartiles 5-number summary (and relation to boxplots) Outliers Range & IQR Variance and standard deviation Determining shape using mean &
More informationVisualizing and Exploring Data
Visualizing and Exploring Data Sargur University at Buffalo The State University of New York Visual Methods for finding structures in data Power of human eye/brain to detect structures Product of eons
More informationCS570 Introduction to Data Mining
CS570 Introduction to Data Mining Department of Mathematics and Computer Science Li Xiong Data Exploration and Data Preprocessing Data and attributes Data exploration Data pre-processing 2 10 What is Data?