The Uneasy Relationship Between Statistical Methods and Anthropological Research. Dr. Dwight Read Dept. of Anthropology and Dept. of Statistics UCLA


1 The Uneasy Relationship Between Statistical Methods and Anthropological Research Dr. Dwight Read Dept. of Anthropology and Dept. of Statistics UCLA

2 Introduction I will begin by characterizing statistics as finding patterning in aggregated data. The probability basis for inferential statistics, which relates sample statistics to population parameters, is introduced next, using William Feller's notion of a conceptual experiment. Although descriptive adequacy is obtained by defining an experiment as random sampling from a population, such an experiment does not characterize the processes of interest that structured the data brought forward for analysis. Instead, an experiment needs to be defined by referring to a theory for the structuring processes; I exemplify this approach with archaeological data. The example makes evident the problem that arises with data sets that are heterogeneous because they are the consequence of more than one data-structuring process. This leads to the double-bind problem: removing heterogeneity in data cases requires already having variables that measure the same process, and removing heterogeneity in variables requires already having data cases that are the consequence of the same process. A resolution of the double-bind problem is illustrated using an iterative procedure and leads to statistical analysis that gives rise to novel insights into the properties of the archaeological data set being analyzed.

3 Qualitative Patterning Observed on Individual Cases A type refers to a category of ceramics that shares a consistent, specific and unique combination of physical attributes (such as paste type, color of decoration, kind of glazing, etc.). (Florida Museum of Natural History) Stamnos is a lidded storage jar for liquids that was standardized during the red-figure period. It is glazed inside. It has a short, stout neck, a wide, flat rim, and a straight body that tapers to a base. Horizontal handles are attached to the widest part of the jar.

4 Quantitative Patterning Observed on Individual Cases During experiments I ran every test many times and noticed just a little deviation between results. I considered those deviation to be negligibly small (statistically insignificant) so final comparison made from just one execution of test case for every language without gathering results of multiple tests and comparing their average. (

5 Pattern in the Aggregate a correlation [r = 0.547] is not predictive for individual cases. It is strictly a statistical statement about how two variables are related in aggregate. (emphasis in original;

6 What is Statistics? we can say that Statistics is the science concerned with the summary of data, trying to find regularities in these data ( emphasis added) Statistics has to do with phenomena where patterning is found in the aggregate, as opposed to phenomena where patterning is found on individual cases.

7 Implications of Patterning in the Aggregate Patterning in the aggregate is specific to that aggregate, which implies that a change in the aggregate may change the patterning. Therefore, statistical analysis begins by specifying the aggregate over which patterning is to be found. The aggregate over which patterning is to be found is typically called a population. A population must be well-defined; that is, for any entity or observation, we must be able to determine whether or not it is a member of the population.

8 Example of Defining a Population Archaeological Topic of Interest: Patterning in stone projectile points (arrowheads). Data Set of Interest: Projectile points made by the inhabitants of a Paleo-Indian site, 4Ven39, in Ventura County (occupied around 1400 AD). Method for Data Recovery: Excavate a sample of 2 m x 2 m grid squares placed randomly over the occupation area. Data Set Brought Forward for Analysis: 64 projectile points recovered from the excavation. Population Definition 1: Define the population to be the 64 projectile points recovered from the excavation.

9 Projectile Points from 4Ven39

10 Measurements

11 Determine Patterning: Tip Angle (Sharpness) Projectile Points from 4Ven39: µ = 40.3, σ = 12.15

12 Population Definition 2 Though the population of recovered projectile points is well-defined, the archaeologist wants to know about all points, not just the ones that were recovered. Population 2: All projectile points in the site 4Ven39. Population 2 cannot be recovered in its entirety, so we use inferential statistics to infer values for population parameters from sample statistics. The sample consists of the recovered projectile points, and for this sample the sample statistics are: x̄ = 40.3, s = 12.21

13 Statistical Inference From the sample statistics, the sample size n = 64, the assumption that the sample is a random sample from Population 2 due to the way the data were recovered, and the apparent normality of the distribution of data values, we conclude, with 95% confidence, that the true value for µ is in the interval [x̄ − 1.96 × 1.53, x̄ + 1.96 × 1.53] = [37.3, 43.3], where 1.53 = s/√n is the standard error of the mean.
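The interval endpoints can be checked from the reported summary statistics (x̄ = 40.3, s = 12.21, n = 64). The following is a minimal Python sketch, assuming the usual normal-approximation interval x̄ ± z·s/√n; the function name `normal_confidence_interval` is hypothetical and not part of the original slides.

```python
from math import sqrt
from statistics import NormalDist


def normal_confidence_interval(mean, sd, n, level=0.95):
    """Return (low, high) for the interval mean +/- z * sd / sqrt(n).

    Assumption: a normal approximation is adequate, which the slide
    justifies by random sampling and apparent normality of the data.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * sd / sqrt(n)              # z times the standard error
    return mean - half_width, mean + half_width


# Summary statistics reported for the 64 recovered projectile points.
low, high = normal_confidence_interval(mean=40.3, sd=12.21, n=64)
print(f"[{low:.1f}, {high:.1f}]")
```

With n = 64 the standard error is 12.21/8, roughly 1.53, so the half-width is close to 3.0 degrees of tip angle.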

14 Statistical Inference and Probability Inferential statistics is given a probability foundation (from Feller, W. An Introduction to Probability Theory and Its Applications) by defining a sample space Ω for a conceptual experiment, E, with outcomes when E is performed, as follows: An experiment E determines a sample space Ω consisting of all possible, indecomposable outcomes for E. For a finite sample space, a value p_i, 0 ≤ p_i ≤ 1, is assigned to each outcome o_i in Ω, where Σ_i p_i = 1. The value p_i is interpreted as the probability of o_i occurring when the experiment E is performed. A random variable is a mapping from Ω to the real numbers.

15 Connection Between Probability Theory and a Population Experiment E: Select an object from the (finite) population P randomly. The space Ω is determined by the objects in P. Assign p = 1/N as the probability for each outcome in Ω, where N is the number of objects in P. A measurement X over the objects in P, such as the angle of a point, defines a mapping from Ω to R, hence is a random variable defined over the space Ω. Probabilities are assigned to each value x that the measurement X may take on by letting o = {o_i | X(o_i) = x}. Then the probability that X takes on the value x is given by Prob(o) = Σ Prob(o_i) = Σ p_i, where the sums run over the outcomes o_i in o.
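The construction on this slide can be sketched in a few lines of Python. The five-point population and the `tip_angle` field below are hypothetical illustration data, not measurements from 4Ven39; the function `distribution` is likewise an assumed name. Each object gets probability 1/N, and Prob(X = x) is the sum of those probabilities over the objects whose measurement equals x.

```python
from collections import defaultdict
from fractions import Fraction


def distribution(population, measure):
    """Map each value x of the random variable to Prob(X = x).

    Every object in the finite population is an outcome with
    probability 1/N; outcomes sharing a value x are pooled.
    """
    n = len(population)
    probs = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for obj in population:
        probs[measure(obj)] += Fraction(1, n)
    return dict(probs)


# Hypothetical population P of five points (illustration only).
points = [
    {"id": 1, "tip_angle": 35},
    {"id": 2, "tip_angle": 40},
    {"id": 3, "tip_angle": 40},
    {"id": 4, "tip_angle": 45},
    {"id": 5, "tip_angle": 40},
]

# The random variable X maps an outcome (a point) to its tip angle.
tip_angle_probs = distribution(points, lambda p: p["tip_angle"])
for angle, prob in sorted(tip_angle_probs.items()):
    print(angle, prob)
```

Exact fractions are used so that the probabilities visibly sum to 1, as the definition of the sample space requires.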

16 Connection Between Probability Theory and a Population (example) Population P: projectile points made by the inhabitants of 4Ven39. Sample: the excavated points. Experiment E: randomly select a point from P. Space Ω of outcomes: projectile points made by the inhabitants of 4Ven39. Probability of an outcome = 1/N, where N is the number of projectile points they made. Random variable X: angle of the tip of a projectile point. Use inferential statistics to connect sample statistics to population parameters.

17 Description versus Understanding The experiment E, of randomly selecting observations from population P, enables statistical description of any population with parameter values either computed directly when the entire population is accessible (e.g., P = {excavated projectile points}), or by inference from values obtained using a sample from the population (e.g., P = {all projectile points in the site 4Ven39}). Problem: E does not relate to the process(es) by which projectile points were made, hence parameter values do not characterize those processes.

18 Experiment Based on Process The experiment E of interest to the archaeologist is the process of an artisan at 4Ven39 making a projectile point. The outcomes of E are the projectile points that the artisans made. Archaeologists make the following assumptions: (1) the projectile points are part of a group's cultural repertoire, meaning that the artisans had shared normative values for some of the attributes of the projectile points, and (2) different types of projectile points correspond to different normative values. Statistical analysis should lead to estimation of the normative values for those attributes that are part of the cultural repertoire of the artisans who manufactured the projectile points.

19 Examples of Normative Values (1) Qualitative: The points either have a leaf shape or a triangular shape. Other shapes are possible, but were not produced at 4Ven39. [Figure labels: leaf shape; triangular shape.] (2) Quantitative: The distribution of the tip angle of the points is unimodal and approximately normal, suggesting that there was a single normative value for the tip angle.

20 Statistical Model For Quantitative Normative Values [Diagram labels: population of possible manufacture errors; sum of random sample of manufacture errors; quantitative dimension; actual value; normative value.] Via the Central Limit Theorem, the frequency distribution for a random variable corresponding to a dimension under normative control will have a unimodal, approximately normal distribution.
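The model on this slide can be illustrated with a small simulation. This is a sketch under stated assumptions, not the author's analysis: each manufacture error is taken to be uniform on [-1, 1], twelve errors are summed per point, and the normative value of 40.0 is an arbitrary illustrative choice. The actual value of a dimension is the normative value plus the summed errors, and the Central Limit Theorem predicts an approximately normal spread around the normative value.

```python
import random
from statistics import mean, stdev


def simulate_dimension(normative_value, n_points, n_errors=12,
                       error_size=1.0, seed=0):
    """Simulate actual values as normative value + sum of random errors.

    Assumptions (illustrative only): errors are independent and
    uniform on [-error_size, error_size]; n_errors are summed per point.
    """
    rng = random.Random(seed)
    return [
        normative_value
        + sum(rng.uniform(-error_size, error_size) for _ in range(n_errors))
        for _ in range(n_points)
    ]


values = simulate_dimension(normative_value=40.0, n_points=5000)
center, spread = mean(values), stdev(values)

# For a normal distribution about 68% of values lie within one
# standard deviation of the mean; compare the simulated share.
share_within_1sd = sum(abs(v - center) <= spread for v in values) / len(values)
print(round(center, 2), round(spread, 2), round(share_within_1sd, 3))
```

Under these assumptions the theoretical standard deviation is 2 (twelve errors, each with variance 1/3), so the sample mean estimates the normative value and the spread estimates the size of the accumulated manufacture error.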

21 Not all Dimensions are Under Normative Control Dimension: distance from base of point to maximum width. The bar chart does not match the pattern for a dimension under normative control. It appears to be made up of two patterns: (1) zero distance and (2) non-zero distances, with a unimodal distribution and normative value of 9 mm. x̄ = 4.2, s = 0.6

22 Distance to Maximum Width [Figure labels: leaf shape, non-zero distance; triangular shape, zero distance.] The dimension does not apply to triangular points.

23 Heterogeneous Data Sets The normative values for the leaf shape points will differ, for some dimensions, from the normative values for the triangular shape points. This implies that we need to divide the original data set into subsets for which all members of a subset share the same dimensions and normative values.

24 Heterogeneous Dimensions The distance to maximum width is not under cultural constraint for the triangular points. This implies that we need to determine the dimensions for which there is normative control.

25 Heterogeneous Data Sets: Cluster Analysis? Include the length dimension in the cluster analysis

26 Cluster Analysis: Problem

27 Heterogeneous Dimensions: Principal Component Analysis?

28 Two Binds Bind 1: To subdivide the data set into homogeneous subsets, we need to know the dimensions for which there is normative control. Bind 2: To subdivide the dimensions into those for which there is normative control, we need to know the homogeneous subsets.

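29 Double-Bind Problem Well-defined data set and well-defined variables: data set for statistical analysis. Well-defined data set and not well-defined variables: principal component analysis. Not well-defined data set and well-defined variables: cluster analysis. Not well-defined data set and not well-defined variables: data set brought forward for analysis.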

30 An Iterative Solution to Double-Bind Problem (1) Examine each variable for a multi-modal distribution. (2) Subdivide the data set by the modes of the multi-modal distribution. (3) Repeat on each subset until no further subdivisions are possible. (4) Examine pairs of variables for a multi-modal distribution. (5) Subdivide the data set by the modes of the multi-modal distribution. (6) Repeat on each subset until no further subdivisions are possible. And so on.
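The single-variable stage of this procedure can be sketched as a recursive subdivision. The slides do not say how a multi-modal distribution is detected, so the sketch swaps in a simple stand-in: a variable is treated as bimodal when its largest gap between sorted values exceeds three times the median gap. That rule, the function names `find_cut` and `subdivide`, the thresholds, and the twelve data cases are all hypothetical; the pairs-of-variables stage is not covered.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence

Case = Dict[str, float]


def find_cut(values: Sequence[float], gap_ratio: float = 3.0,
             min_size: int = 3) -> Optional[float]:
    """Return a cut point if one gap in the sorted values is unusually
    large (a crude stand-in for detecting two modes), else None."""
    xs = sorted(values)
    n = len(xs)
    if n < 2 * min_size:
        return None
    gaps = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Only consider cuts that leave at least min_size cases on each side.
    best = max(range(min_size - 1, n - min_size), key=lambda i: gaps[i])
    if gaps[best] > 0 and gaps[best] > gap_ratio * median(gaps):
        return (xs[best] + xs[best + 1]) / 2
    return None


def subdivide(cases: List[Case], variables: Sequence[str]) -> List[List[Case]]:
    """Split on the first variable that shows a cut, then repeat on each
    subset until no variable in any subset can be split further."""
    for var in variables:
        cut = find_cut([c[var] for c in cases])
        if cut is not None:
            low = [c for c in cases if c[var] <= cut]
            high = [c for c in cases if c[var] > cut]
            return subdivide(low, variables) + subdivide(high, variables)
    return [cases]


# Hypothetical cases: concave narrow, concave wide, and convex points.
heights = [-2.2, -2.0, -1.8, -2.1, -1.9, -2.3, -2.4, -1.7, 1.8, 2.0, 2.2, 2.4]
widths = [10.0, 10.5, 11.0, 11.5, 20.0, 20.5, 21.0, 21.5, 15.0, 15.5, 16.0, 16.5]
cases = [{"base_height": h, "max_width": w} for h, w in zip(heights, widths)]

subsets = subdivide(cases, ["base_height", "max_width"])
for subset in subsets:
    print(len(subset), sorted(c["max_width"] for c in subset))
```

The recursion mirrors the order described on the slide: the full set is first split on one variable, and a second variable only reveals further structure once the first split has removed the heterogeneity that masked it.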

31 Bimodal Distribution: Base Height Data Set S (64 projectile points) divides into Data Set S1 (concave points) and Data Set S2 (convex points).

32 Bimodal Distribution: Maximum Width Data Set S1 divides into Data Subset S11 (narrow points) and Data Subset S12 (wide points).

33 Scattergram Plot (Length versus Tip Angle: S11 and S12) Subsets: S111 (broken, retouched), S112 (intact), S121 (broken, retouched), S122 (intact).

34 Intact and Re-Sharpened Projectile Points [Figure labels: resharpened; wide; intact; resharpened; narrow.]

35 Characterization of Concave Projectile Points

36 Conclusion Insightful use of statistical analysis of data requires concordance between the conceptual basis of statistical methods and the structuring processes for the data brought forward for analysis. Heterogeneous data sets are typical, since research goals are aimed at elucidating the processes structuring the phenomena of interest; hence well-defined data sets and well-defined sets of variables are not known in advance. Methods for doing this kind of pre-analysis of data are at a preliminary stage, and the iterative method illustrated here is largely heuristic and does not yet have a well-developed formal foundation.

STAT STATISTICAL METHODS. Statistics: The science of using data to make decisions and draw conclusions

STAT STATISTICAL METHODS. Statistics: The science of using data to make decisions and draw conclusions STAT 515 --- STATISTICAL METHODS Statistics: The science of using data to make decisions and draw conclusions Two branches: Descriptive Statistics: The collection and presentation (through graphical and

More information

Data Statistics Population. Census Sample Correlation... Statistical & Practical Significance. Qualitative Data Discrete Data Continuous Data

Data Statistics Population. Census Sample Correlation... Statistical & Practical Significance. Qualitative Data Discrete Data Continuous Data Data Statistics Population Census Sample Correlation... Voluntary Response Sample Statistical & Practical Significance Quantitative Data Qualitative Data Discrete Data Continuous Data Fewer vs Less Ratio

More information

STP 226 ELEMENTARY STATISTICS NOTES

STP 226 ELEMENTARY STATISTICS NOTES ELEMENTARY STATISTICS NOTES PART 2 - DESCRIPTIVE STATISTICS CHAPTER 2 ORGANIZING DATA Descriptive Statistics - include methods for organizing and summarizing information clearly and effectively. - classify

More information

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable. 5-number summary 68-95-99.7 Rule Area principle Bar chart Bimodal Boxplot Case Categorical data Categorical variable Center Changing center and spread Conditional distribution Context Contingency table

More information

Overview. Frequency Distributions. Chapter 2 Summarizing & Graphing Data. Descriptive Statistics. Inferential Statistics. Frequency Distribution

Overview. Frequency Distributions. Chapter 2 Summarizing & Graphing Data. Descriptive Statistics. Inferential Statistics. Frequency Distribution Chapter 2 Summarizing & Graphing Data Slide 1 Overview Descriptive Statistics Slide 2 A) Overview B) Frequency Distributions C) Visualizing Data summarize or describe the important characteristics of a

More information

Chapter 2: Modeling Distributions of Data

Chapter 2: Modeling Distributions of Data Chapter 2: Modeling Distributions of Data Section 2.2 The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 2 Modeling Distributions of Data 2.1 Describing Location in a Distribution

More information

Slide Copyright 2005 Pearson Education, Inc. SEVENTH EDITION and EXPANDED SEVENTH EDITION. Chapter 13. Statistics Sampling Techniques

Slide Copyright 2005 Pearson Education, Inc. SEVENTH EDITION and EXPANDED SEVENTH EDITION. Chapter 13. Statistics Sampling Techniques SEVENTH EDITION and EXPANDED SEVENTH EDITION Slide - Chapter Statistics. Sampling Techniques Statistics Statistics is the art and science of gathering, analyzing, and making inferences from numerical information

More information

2.1: Frequency Distributions and Their Graphs

2.1: Frequency Distributions and Their Graphs 2.1: Frequency Distributions and Their Graphs Frequency Distribution - way to display data that has many entries - table that shows classes or intervals of data entries and the number of entries in each

More information

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents E-Companion: On Styles in Product Design: An Analysis of US Design Patents 1 PART A: FORMALIZING THE DEFINITION OF STYLES A.1 Styles as categories of designs of similar form Our task involves categorizing

More information

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked Plotting Menu: QCExpert Plotting Module graphs offers various tools for visualization of uni- and multivariate data. Settings and options in different types of graphs allow for modifications and customizations

More information

Chapter 2 - Graphical Summaries of Data

Chapter 2 - Graphical Summaries of Data Chapter 2 - Graphical Summaries of Data Data recorded in the sequence in which they are collected and before they are processed or ranked are called raw data. Raw data is often difficult to make sense

More information

Unit Maps: Grade 7 Math

Unit Maps: Grade 7 Math Rational Number Representations and Operations 7.4 Number and operations. The student adds, subtracts, multiplies, and divides rationale numbers while solving problems and justifying solutions. Solving

More information

Chapter 5snow year.notebook March 15, 2018

Chapter 5snow year.notebook March 15, 2018 Chapter 5: Statistical Reasoning Section 5.1 Exploring Data Measures of central tendency (Mean, Median and Mode) attempt to describe a set of data by identifying the central position within a set of data

More information

Middle School Math Course 3

Middle School Math Course 3 Middle School Math Course 3 Correlation of the ALEKS course Middle School Math Course 3 to the Texas Essential Knowledge and Skills (TEKS) for Mathematics Grade 8 (2012) (1) Mathematical process standards.

More information

Unit Maps: Grade 7 Math

Unit Maps: Grade 7 Math Rational Number Representations and Operations 7.4 Number and operations. The student adds, subtracts, multiplies, and divides rationale numbers while solving problems and justifying solutions. Solving

More information

Chapter 2 Describing, Exploring, and Comparing Data

Chapter 2 Describing, Exploring, and Comparing Data Slide 1 Chapter 2 Describing, Exploring, and Comparing Data Slide 2 2-1 Overview 2-2 Frequency Distributions 2-3 Visualizing Data 2-4 Measures of Center 2-5 Measures of Variation 2-6 Measures of Relative

More information

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order. Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good

More information

Unit 7 Statistics. AFM Mrs. Valentine. 7.1 Samples and Surveys

Unit 7 Statistics. AFM Mrs. Valentine. 7.1 Samples and Surveys Unit 7 Statistics AFM Mrs. Valentine 7.1 Samples and Surveys v Obj.: I will understand the different methods of sampling and studying data. I will be able to determine the type used in an example, and

More information

Frequency Distributions

Frequency Distributions Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data so that it is possible to get a general overview of the results. Remember,

More information

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things.

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. + What is Data? Data is a collection of facts. Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. In most cases, data needs to be interpreted and

More information

BUSINESS DECISION MAKING. Topic 1 Introduction to Statistical Thinking and Business Decision Making Process; Data Collection and Presentation

BUSINESS DECISION MAKING. Topic 1 Introduction to Statistical Thinking and Business Decision Making Process; Data Collection and Presentation BUSINESS DECISION MAKING Topic 1 Introduction to Statistical Thinking and Business Decision Making Process; Data Collection and Presentation (Chap 1 The Nature of Probability and Statistics) (Chap 2 Frequency

More information

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or  me, I will answer promptly. Statistical Methods Instructor: Lingsong Zhang 1 Issues before Class Statistical Methods Lingsong Zhang Office: Math 544 Email: lingsong@purdue.edu Phone: 765-494-7913 Office Hour: Monday 1:00 pm - 2:00

More information

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset. Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

More information

Sec 6.3. Bluman, Chapter 6 1

Sec 6.3. Bluman, Chapter 6 1 Sec 6.3 Bluman, Chapter 6 1 Bluman, Chapter 6 2 Review: Find the z values; the graph is symmetrical. z = ±1. 96 z 0 z the total area of the shaded regions=5% Bluman, Chapter 6 3 Review: Find the z values;

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Unit Maps: Grade 8 Math

Unit Maps: Grade 8 Math Real Number Relationships 8.3 Number and operations. The student represents and use real numbers in a variety of forms. Representation of Real Numbers 8.3A extend previous knowledge of sets and subsets

More information

Slides 11: Verification and Validation Models

Slides 11: Verification and Validation Models Slides 11: Verification and Validation Models Purpose and Overview The goal of the validation process is: To produce a model that represents true behaviour closely enough for decision making purposes.

More information

Mathematics Scope & Sequence Grade 8 Revised: June 2015

Mathematics Scope & Sequence Grade 8 Revised: June 2015 Mathematics Scope & Sequence 2015-16 Grade 8 Revised: June 2015 Readiness Standard(s) First Six Weeks (29 ) 8.2D Order a set of real numbers arising from mathematical and real-world contexts Convert between

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions

More information

Optimization Methods: Advanced Topics in Optimization - Multi-objective Optimization 1. Module 8 Lecture Notes 2. Multi-objective Optimization

Optimization Methods: Advanced Topics in Optimization - Multi-objective Optimization 1. Module 8 Lecture Notes 2. Multi-objective Optimization Optimization Methods: Advanced Topics in Optimization - Multi-objective Optimization 1 Module 8 Lecture Notes 2 Multi-objective Optimization Introduction In a real world problem it is very unlikely that

More information

cse 252c Fall 2004 Project Report: A Model of Perpendicular Texture for Determining Surface Geometry

cse 252c Fall 2004 Project Report: A Model of Perpendicular Texture for Determining Surface Geometry cse 252c Fall 2004 Project Report: A Model of Perpendicular Texture for Determining Surface Geometry Steven Scher December 2, 2004 Steven Scher SteveScher@alumni.princeton.edu Abstract Three-dimensional

More information

Question Bank. 4) It is the source of information later delivered to data marts.

Question Bank. 4) It is the source of information later delivered to data marts. Question Bank Year: 2016-2017 Subject Dept: CS Semester: First Subject Name: Data Mining. Q1) What is data warehouse? ANS. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile

More information

Scope and Sequence for the New Jersey Core Curriculum Content Standards

Scope and Sequence for the New Jersey Core Curriculum Content Standards Scope and Sequence for the New Jersey Core Curriculum Content Standards The following chart provides an overview of where within Prentice Hall Course 3 Mathematics each of the Cumulative Progress Indicators

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

Chapter 1. Looking at Data-Distribution

Chapter 1. Looking at Data-Distribution Chapter 1. Looking at Data-Distribution Statistics is the scientific discipline that provides methods to draw right conclusions: 1)Collecting the data 2)Describing the data 3)Drawing the conclusions Raw

More information

3 Graphical Displays of Data

3 Graphical Displays of Data 3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked

More information

Unit 5: Estimating with Confidence

Unit 5: Estimating with Confidence Unit 5: Estimating with Confidence Section 8.3 The Practice of Statistics, 4 th edition For AP* STARNES, YATES, MOORE Unit 5 Estimating with Confidence 8.1 8.2 8.3 Confidence Intervals: The Basics Estimating

More information

Chapter 6: DESCRIPTIVE STATISTICS

Chapter 6: DESCRIPTIVE STATISTICS Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling

More information

Unit Maps: Grade 8 Math

Unit Maps: Grade 8 Math Real Number Relationships 8.3 Number and operations. The student represents and use real numbers in a variety of forms. Representation of Real Numbers 8.3A extend previous knowledge of sets and subsets

More information

Table of Contents (As covered from textbook)

Table of Contents (As covered from textbook) Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression

More information

Chapter 3 - Displaying and Summarizing Quantitative Data

Chapter 3 - Displaying and Summarizing Quantitative Data Chapter 3 - Displaying and Summarizing Quantitative Data 3.1 Graphs for Quantitative Data (LABEL GRAPHS) August 25, 2014 Histogram (p. 44) - Graph that uses bars to represent different frequencies or relative

More information

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &

More information

Chapter 2 Modeling Distributions of Data

Chapter 2 Modeling Distributions of Data Chapter 2 Modeling Distributions of Data Section 2.1 Describing Location in a Distribution Describing Location in a Distribution Learning Objectives After this section, you should be able to: FIND and

More information

CLUSTERING. JELENA JOVANOVIĆ Web:

CLUSTERING. JELENA JOVANOVIĆ   Web: CLUSTERING JELENA JOVANOVIĆ Email: jeljov@gmail.com Web: http://jelenajovanovic.net OUTLINE What is clustering? Application domains K-Means clustering Understanding it through an example The K-Means algorithm

More information

Middle School Math Course 2

Middle School Math Course 2 Middle School Math Course 2 Correlation of the ALEKS course Middle School Math Course 2 to the Indiana Academic Standards for Mathematics Grade 7 (2014) 1: NUMBER SENSE = ALEKS course topic that addresses

More information

One Factor Experiments

One Factor Experiments One Factor Experiments 20-1 Overview Computation of Effects Estimating Experimental Errors Allocation of Variation ANOVA Table and F-Test Visual Diagnostic Tests Confidence Intervals For Effects Unequal

More information

THREE-DIMENSIONA L ELECTRON MICROSCOP Y OF MACROMOLECULAR ASSEMBLIE S. Visualization of Biological Molecules in Their Native Stat e.

THREE-DIMENSIONA L ELECTRON MICROSCOP Y OF MACROMOLECULAR ASSEMBLIE S. Visualization of Biological Molecules in Their Native Stat e. THREE-DIMENSIONA L ELECTRON MICROSCOP Y OF MACROMOLECULAR ASSEMBLIE S Visualization of Biological Molecules in Their Native Stat e Joachim Frank CHAPTER 1 Introduction 1 1 The Electron Microscope and

More information

3 Graphical Displays of Data

3 Graphical Displays of Data 3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SPSS SPSS (originally Statistical Package for the Social Sciences ) is a commercial statistical software package with an easy-to-use

More information

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures
