Data Classification

The idea of classification is to group together items that are alike. The objective is to group data so that the observations within a class are similar to one another and the classes themselves are dissimilar.

Potential Classification of Autos in a Parking Lot

Classification
Data classification involves three steps: (1) selecting the number of classes, (2) choosing the classification procedure, and (3) analyzing the accuracy of the classification.

Classifying Data
Three decisions must be made prior to classification: How many classes? What method should be used to place the values into classes? What kind of symbology?

Selection of the Number of Classes
The more classes used, the more complex and often confusing the classification. Too few classes oversimplify the data and can hide detail. The cartographer often selects four or five classes in which to group the data.

Selection of the Number of Classes
Sturges (1926) provides a basic formula that gives a starting point for the number of classes, based on the number of observations.
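Sturges' rule is commonly written as k = 1 + log2(n), which is roughly 1 + 3.3 log10(n), where n is the number of observations and k the suggested number of classes. The sketch below is one way to compute it; the function name is hypothetical and rounding up to a whole class is an assumption, since the slide does not state a rounding rule.

```python
import math


def sturges_classes(n_observations: int) -> int:
    """Suggested number of classes from Sturges' rule, k = 1 + log2(n).

    Rounding up with ceil is an assumption made for this sketch.
    """
    if n_observations < 1:
        raise ValueError("need at least one observation")
    return 1 + math.ceil(math.log2(n_observations))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "observations ->", sturges_classes(n), "classes")
```

The result is only a starting point; the cartographer may still reduce it to the four or five classes that map readers handle comfortably.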

Spatial Patterns Created by Varying the Number of Data Classes Used

Data Classification Schemes
The selection of the appropriate data classification scheme is determined by the characteristics of the data and the desired level of generalization.

Data Classification Schemes
Jenks and Coulson (1963) suggest that class intervals should meet the following five requirements:
1. Encompass the full range of the data.
2. Have neither overlapping values nor vacant classes.
3. Be great enough in number to avoid sacrificing the accuracy of the data, but not so numerous as to imply a greater degree of accuracy than is warranted by the nature of the collected observations.
4. Divide the data into reasonably equal groups of observations.
5. Have a logical mathematical relationship, if practical.

The number of classes became somewhat standardized when it was learned that map readers could not easily distinguish between more than 11 area symbol gray tones.

Common Techniques Used to Classify Data
There are nine common techniques used to classify data:
1. Natural breaks
2. Optimization
3. Nested means
4. Mean and standard deviation
5. Equal interval
6. Equal frequency
7. Arithmetic
8. Geometric
9. User defined

Data Used in Examples
We'll use Georgia's General Fertility Rate (GFR) by county for 2000. The data range from a minimum of 41.14 live births (to women of any age) per 1,000 females ages 15-44 to a maximum of 101.45.

Natural Breaks
When data are ranked, gaps appear between consecutive values, some small and some large. The larger gaps suggest where to place the class breaks.
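One simple way to make the idea of gaps concrete is to rank the values and place breaks in the widest gaps. The sketch below does only that; it is an illustration with a hypothetical function name and made-up sample values, not the procedure used by any particular GIS package, and placing each break at the midpoint of a gap is an assumption.

```python
def largest_gap_breaks(values, n_classes):
    """Return n_classes - 1 break values placed in the widest gaps."""
    ordered = sorted(values)
    # Pair every gap width with the index of the value on its lower side.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    widest = sorted(gaps, reverse=True)[: n_classes - 1]
    # Assumption: each break sits at the midpoint of its gap.
    return sorted((ordered[i] + ordered[i + 1]) / 2 for _, i in widest)


if __name__ == "__main__":
    sample = [1, 2, 3, 10, 11, 12, 30, 31]  # made-up values
    print(largest_gap_breaks(sample, 3))
```

In practice natural breaks are often picked by eye from a histogram or ranked plot, which is why an algorithmic alternative is usually preferred for repeatable results.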

Classification Methods
Natural breaks, equal intervals, quantile, manual.

How to Decide, Part II

Histogram: Distribution of Georgia's General Fertility Rates, 2000

Data Breaks Used for Classification

Optimization
An algorithm for determining an optimal selection of natural breaks was developed by Walter Fisher (1958) and implemented by George Jenks in 1977. It is often called the Jenks Optimization Method, or simply the optimization method. Mathematically, it is based on deviations about the class medians. This classification is said to do the best job of reflecting how the data are distributed along the number line of interval data.
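The optimization idea can be sketched as a small dynamic program that tries every way of cutting the ranked data into contiguous classes and keeps the cut with the lowest total deviation. The version below is a minimal illustration, not Fisher's or Jenks's published code: the function names are hypothetical, the sample values are made up, and the cost is assumed to be the sum of absolute deviations about each class median. Many implementations use squared deviations about class means instead.

```python
from statistics import median


def class_cost(members):
    """Sum of absolute deviations about the class median (assumed cost)."""
    centre = median(members)
    return sum(abs(v - centre) for v in members)


def optimal_classes(values, n_classes):
    """Split ranked values into n_classes contiguous groups of minimum cost."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    inf = float("inf")
    # best[k][i]: lowest cost of grouping the first i values into k classes.
    best = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    # cut[k][i]: start index of the last class in that best grouping.
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    best[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                cost = best[k - 1][j] + class_cost(data[j:i])
                if cost < best[k][i]:
                    best[k][i] = cost
                    cut[k][i] = j
    # Walk the stored cuts backwards to recover the classes.
    classes, end = [], n
    for k in range(n_classes, 0, -1):
        start = cut[k][end]
        classes.append(data[start:end])
        end = start
    classes.reverse()
    return classes, best[n_classes][n]


if __name__ == "__main__":
    groups, total = optimal_classes([1, 2, 3, 10, 11, 12, 30, 31], 3)
    print(groups, total)
```

This brute-force form recomputes each class cost from scratch, so it suits teaching-sized data sets rather than large ones.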

Nested Means
A classification technique that uses the mean of the data to split it into two classes. The means of those two groups are then used to split each of them again, giving four classes, and the process can be repeated a third time to give eight.
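The repeated splitting can be sketched recursively. The function name and the `levels` parameter are hypothetical, and the sample values are made up; values equal to the mean are assigned to the upper group here, which is an assumption.

```python
from statistics import mean


def nested_means_breaks(values, levels):
    """Return sorted breaks; levels=1 gives 2 classes, 2 gives 4, 3 gives 8."""
    if levels == 0 or len(values) < 2:
        return []
    centre = mean(values)
    lower = [v for v in values if v < centre]
    upper = [v for v in values if v >= centre]  # ties go up (assumption)
    return (
        nested_means_breaks(lower, levels - 1)
        + [centre]
        + nested_means_breaks(upper, levels - 1)
    )


if __name__ == "__main__":
    print(nested_means_breaks([1, 2, 3, 4, 5, 6, 7, 8], 2))
```

Because every break is a mean, the number of classes is always a power of two, and skewed data produce classes of unequal width and unequal size.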

Mean and Standard Deviation
If the data set displays a normal frequency distribution, class boundaries can be established using its standard deviation.
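A minimal sketch of this approach places a break at the mean and at whole standard deviations on either side of it. The function name, the default of two deviations each way, and the use of the population standard deviation are assumptions made for illustration; the sample values are made up.

```python
from statistics import mean, pstdev


def std_dev_breaks(values, max_steps=2):
    """Breaks at mean - max_steps*sd, ..., mean, ..., mean + max_steps*sd."""
    centre = mean(values)
    spread = pstdev(values)  # population standard deviation (assumption)
    return [centre + k * spread for k in range(-max_steps, max_steps + 1)]


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values: mean 5, sd 2
    print(std_dev_breaks(sample))
```

Values beyond the outermost breaks fall into open-ended classes at either end of the legend.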

Equal Interval
Holds the data range of each class constant. Sometimes referred to as an equal step classification.
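Equal interval breaks follow from simple arithmetic: divide the data range by the number of classes and step up from the minimum. The sketch below uses the GFR minimum and maximum quoted earlier; the function name and the choice of five classes are assumptions for illustration.

```python
def equal_interval_breaks(minimum, maximum, n_classes):
    """Return n_classes + 1 class limits spaced a constant width apart."""
    if n_classes < 1 or maximum <= minimum:
        raise ValueError("need n_classes >= 1 and maximum > minimum")
    width = (maximum - minimum) / n_classes
    limits = [minimum + i * width for i in range(n_classes)]
    limits.append(maximum)  # pin the top limit to avoid rounding drift
    return limits


if __name__ == "__main__":
    # GFR range from the example data; five classes is an assumed choice.
    for limit in equal_interval_breaks(41.14, 101.45, 5):
        print(round(limit, 2))
```

With these inputs each class spans about 12.06 births per 1,000 women, regardless of how many counties fall into it.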

Equal Frequency
This classification distributes the observations equally among the classes. Frequently the cartographer divides the data into quartiles (four divisions) or quintiles (five divisions).
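A minimal sketch of equal frequency classing ranks the values and cuts the ranked list into groups of as nearly equal size as possible. The function name is hypothetical and the sample values are made up; tied values may land in different classes in this simple version.

```python
def equal_frequency_classes(values, n_classes):
    """Split ranked values into n_classes groups of near-equal size."""
    ordered = sorted(values)
    n = len(ordered)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    classes = []
    for k in range(n_classes):
        start = k * n // n_classes
        end = (k + 1) * n // n_classes
        classes.append(ordered[start:end])
    return classes


if __name__ == "__main__":
    sample = list(range(1, 13))  # twelve made-up values
    for group in equal_frequency_classes(sample, 4):  # quartiles
        print(group)
```

When the number of observations is not a multiple of the number of classes, group sizes differ by at most one.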

Arithmetic and Geometric Intervals
Used when classifying data with very large ranges, for example global population by country, from Tuvalu (11,468) to China (1.4 billion).
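Both schemes let class widths grow toward the high end of the range. In the sketch below, arithmetic class widths grow by a constant amount (1, 2, 3, ... units) and geometric class limits grow by a constant ratio. These are common textbook forms, assumed here because the slide does not define them; the function names and the choice of five classes are likewise hypothetical.

```python
def arithmetic_breaks(minimum, maximum, n_classes):
    """Class widths of 1, 2, 3, ... units that together span the range."""
    unit = (maximum - minimum) / (n_classes * (n_classes + 1) / 2)
    limits = [minimum]
    for i in range(1, n_classes):
        limits.append(limits[-1] + i * unit)
    limits.append(maximum)  # pin the top limit to avoid rounding drift
    return limits


def geometric_breaks(minimum, maximum, n_classes):
    """Each class limit is a constant ratio times the previous one."""
    if minimum <= 0:
        raise ValueError("geometric intervals need a positive minimum")
    ratio = (maximum / minimum) ** (1 / n_classes)
    limits = [minimum * ratio**i for i in range(n_classes)]
    limits.append(maximum)
    return limits


if __name__ == "__main__":
    # Population range quoted on the slide; five classes is an assumption.
    for limit in geometric_breaks(11_468, 1_400_000_000, 5):
        print(f"{limit:,.0f}")
```

With a range this wide, equal intervals would put nearly every country in the lowest class, which is why a growing class width is preferred.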

User Defined
Permits the cartographer to determine the class breaks. Not used very often.

Comparison of Classification Schemes

Classification Methods, a Comparison
Percent forest cover by county in Lower Silesia, Poland. Sorted set of research data.

Classification Methods, a Comparison
The number of classes is based on this graph of the previous table of data.

Comparison of Choropleth Maps from Different Software

Representing Quantities