Data Classification
Data Classification
The idea of classification is to group together items that are alike.
The objective of classification is to group data in such a manner that not only are the observations within a class similar, but the classes themselves are also dissimilar.
Potential Classification of Autos in a Parking Lot
Classification
The three steps taken in data classification include:
- The selection of the number of classes
- The classification procedure utilized
- An analysis of classification accuracy
Classifying Data
Three decisions to make prior to classification:
- How many classes?
- What method to use for placing the values into classes?
- What kind of symbology?
Selection of the Number of Classes
The more classes utilized, the more complex and often confusing the classification.
Too few classes oversimplify the data and can hide detail.
The cartographer often selects four or five classes in which to group the data.
Selection of the Number of Classes
Sturges (1926) provides a basic formula that gives a starting point for the number of classes, based on the number of observations.
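The slide does not reproduce the formula itself. A minimal sketch, assuming the commonly cited form of Sturges' rule, k = 1 + 3.322 log10(n), where n is the number of observations; the function name is illustrative only:

```python
import math


def sturges_class_count(n_observations: int) -> float:
    """Suggested number of classes from Sturges' rule: 1 + 3.322 * log10(n)."""
    if n_observations < 1:
        raise ValueError("need at least one observation")
    return 1 + 3.322 * math.log10(n_observations)


# Georgia has 159 counties, so a county-level map has 159 observations.
suggested = sturges_class_count(159)
print(round(suggested, 2))
```

The result is only a starting point. The cartographer would still round it and weigh it against map readability.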
Spatial Patterns Created by Varying the Number of Data Classes Used
Data Classification Schemes
The selection of the appropriate data classification scheme is determined by the characteristics of the data and the desired level of generalization.
Data Classification Schemes
Jenks and Coulson (1963) suggest that the following five requirements should be met in the selection of class intervals:
- Encompass the full range of the data
- Have neither overlapping values nor vacant classes
- Be great enough in number to avoid sacrificing the accuracy of the data, but not be so numerous as to imply a greater degree of accuracy than is warranted by the nature of the collected observations
- Divide the data into reasonably equal groups of observations
- Have a logical mathematical relationship if practical
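The first two requirements can be checked mechanically. A rough sketch, assuming class limits are given as an ascending list where each class includes its lower limit and the last class also includes its upper limit; the function name and that convention are assumptions, not part of Jenks and Coulson's text:

```python
def check_class_breaks(values, limits):
    """Return a list of problems found; an empty list means the checks pass.

    limits: ascending class limits [l0, l1, ..., lk].
    Class i covers l_i <= v < l_(i+1); the last class also includes its upper limit.
    """
    problems = []
    # Limits that do not strictly increase would produce overlapping or empty ranges.
    if any(hi <= lo for lo, hi in zip(limits, limits[1:])):
        problems.append("limits are not strictly increasing (classes overlap)")
    # The classes must span from the smallest to the largest observation.
    if min(values) < limits[0] or max(values) > limits[-1]:
        problems.append("classes do not encompass the full data range")
    # Every class must contain at least one observation.
    for i, (lo, hi) in enumerate(zip(limits, limits[1:])):
        is_last = i == len(limits) - 2
        count = sum(1 for v in values if lo <= v < hi or (is_last and v == hi))
        if count == 0:
            problems.append(f"class {i + 1} is vacant")
    return problems


sample = [1, 2, 3, 10, 11, 12]  # made-up values for illustration
print(check_class_breaks(sample, [1, 4, 8, 12]))
```

The remaining three requirements involve judgment (how many classes, how balanced, how "logical") and are not covered by this sketch.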
The number of classes became somewhat standardized when it was learned that map readers could not easily distinguish between more than 11 area symbol gray tones.
Common Techniques Used to Classify Data
There are nine common techniques used to classify data:
- Natural breaks
- Optimization
- Nested means
- Mean and standard deviation
- Equal interval
- Equal frequency
- Arithmetic
- Geometric
- User defined
Data Used in Examples
We'll use Georgia's General Fertility Rate (GFR) by county for 2000.
The data range from a minimum of 41.14 live births (to women of any age) per 1,000 females ages 15-44 to a maximum of 101.45.
Natural Breaks
When the data are ranked, gaps occur between adjacent values, some small and some large.
The larger gaps suggest natural places to break the data into classes.
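One simple way to act on those gaps is to rank the values and place breaks in the widest gaps. The sketch below is a simplified heuristic under that assumption, not the procedure of any particular software package; the function name and sample rates are made up for illustration:

```python
def largest_gap_breaks(values, n_classes):
    """Place n_classes - 1 breaks at the midpoints of the widest gaps in the ranked data."""
    xs = sorted(values)
    # Each entry is (gap size, midpoint between the two neighbouring ranked values).
    gaps = [(b - a, (a + b) / 2) for a, b in zip(xs, xs[1:])]
    widest = sorted(gaps, reverse=True)[: n_classes - 1]
    return sorted(mid for _, mid in widest)


rates = [41, 43, 44, 58, 60, 61, 90, 95]  # hypothetical values, not the Georgia data
print(largest_gap_breaks(rates, 3))  # [51.0, 75.5]
```

In practice the cartographer usually inspects a histogram or ranked plot before accepting such breaks.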
Classification Methods
- Natural Breaks
- Equal Intervals
- Quantile
- Manual
How to Decide, Part II
Histogram Distribution of Georgia's General Fertility Rates, 2000
Data Breaks Used For Classification
Optimization
An algorithm for determining an optimal selection of natural breaks was developed by Walter Fisher (1958) and implemented by George Jenks in 1977.
It is often called the Jenks Optimization Method, or simply the optimization method.
It is mathematically based on deviations about the median.
It has been said that this classification does the best job of evaluating how data are distributed along the number line of interval data.
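The idea can be sketched as a small dynamic program. This version assumes the quantity being minimized is the total absolute deviation of each value from its own class median, following the slide's wording; other formulations use squared deviations about class means. It is an illustrative sketch for small data sets, not Fisher's or Jenks's published code:

```python
from functools import lru_cache
from statistics import median


def optimal_classes(values, n_classes):
    """Split sorted values into n_classes contiguous groups that minimize
    the summed absolute deviations about each group's median."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    @lru_cache(maxsize=None)
    def cost(i, j):
        # Absolute deviations about the median for the group xs[i:j].
        group = xs[i:j]
        m = median(group)
        return sum(abs(v - m) for v in group)

    inf = float("inf")
    # best[c][j]: lowest total cost of splitting the first j values into c classes.
    best = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    best[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = best[c - 1][i] + cost(i, j)
                if candidate < best[c][j]:
                    best[c][j] = candidate
                    split[c][j] = i

    # Walk back through the recorded split points to recover the classes.
    classes = []
    j = n
    for c in range(n_classes, 0, -1):
        i = split[c][j]
        classes.append(xs[i:j])
        j = i
    return classes[::-1]


print(optimal_classes([30, 1, 11, 2, 32, 10, 3, 12, 31], 3))
```

The triple loop makes this cubic-like in cost for large inputs, so it is meant to show the logic rather than to classify thousands of observations.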
Nested Means
A classification technique that uses the mean of the data to divide the data into two classes.
The means of those two groups are then used to divide each group again, giving four classes, and the process can be repeated a third time, giving eight.
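The repeated splitting lends itself to a short recursive sketch. The function name is illustrative, and sending values equal to the mean to the upper group is an assumption made here:

```python
from statistics import mean


def nested_means_breaks(values, levels):
    """Return class breaks from `levels` rounds of splitting at the mean.

    One level gives 1 break (2 classes), two levels give 3 breaks (4 classes),
    three levels give 7 breaks (8 classes).
    """
    if levels == 0 or len(values) < 2 or min(values) == max(values):
        return []
    m = mean(values)
    lower = [v for v in values if v < m]
    upper = [v for v in values if v >= m]  # assumption: ties go to the upper group
    return (
        nested_means_breaks(lower, levels - 1)
        + [m]
        + nested_means_breaks(upper, levels - 1)
    )


print(nested_means_breaks([1, 2, 3, 4, 5, 6, 7, 8], 2))  # [2.5, 4.5, 6.5]
```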
Mean and Standard Deviation
If the data set displays a normal frequency distribution, class boundaries can be established using its standard deviation.
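A minimal sketch of this approach, assuming boundaries are placed at whole multiples of the standard deviation on either side of the mean. The choice of multiples (-1, 0, 1) and the use of the population standard deviation are assumptions; the slide does not specify either:

```python
from statistics import mean, pstdev


def mean_sd_breaks(values, multiples=(-1, 0, 1)):
    """Class boundaries at mean + k * standard deviation for each k in `multiples`."""
    m = mean(values)
    sd = pstdev(values)  # population standard deviation
    return [m + k * sd for k in multiples]


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5 and standard deviation 2
print(mean_sd_breaks(sample))
```

With the default multiples this yields four classes: below one standard deviation under the mean, two classes on either side of the mean, and above one standard deviation over it.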
Equal Interval
Assumes a desire for the data range of each class to be held constant.
Sometimes referred to as an equal step classification.
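The arithmetic is simply the data range divided by the number of classes. As a sketch, using the minimum and maximum of the Georgia GFR data quoted earlier (41.14 and 101.45) and assuming five classes:

```python
def equal_interval_limits(minimum, maximum, n_classes):
    """Return n_classes + 1 class limits, each class spanning the same data range."""
    width = (maximum - minimum) / n_classes
    return [minimum + i * width for i in range(n_classes + 1)]


# Range is 101.45 - 41.14 = 60.31, so each of five classes spans 60.31 / 5 = 12.062.
limits = equal_interval_limits(41.14, 101.45, 5)
print([round(x, 3) for x in limits])
```

Nothing here looks at how many counties fall in each class, so some classes may hold many observations and others very few.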
Equal Frequency
This classification distributes the observations equally among the classes.
Frequently the cartographer divides the data into quartiles (four divisions) or quintiles (five divisions).
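A small sketch of equal-frequency grouping by rank. The function name is illustrative, and the way leftover observations are spread when the count does not divide evenly is an assumption:

```python
def equal_frequency_classes(values, n_classes):
    """Split the ranked values into n_classes groups of (nearly) equal size."""
    xs = sorted(values)
    n = len(xs)
    # Integer cut positions keep class sizes within one observation of each other.
    cuts = [(i * n) // n_classes for i in range(n_classes + 1)]
    return [xs[cuts[i]:cuts[i + 1]] for i in range(n_classes)]


quartiles = equal_frequency_classes(range(1, 13), 4)
print(quartiles)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
```

Because the split is by rank, identical values can end up in different classes; handling such ties is left out of this sketch.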
Arithmetic and Geometric Intervals
Used when classifying data with very large ranges.
For example, when looking at global population by country, from Tuvalu (11,468) to China (1.4 billion).
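The two schemes differ in how class widths grow. The sketch below assumes one common reading of each: geometric limits are multiplied by a constant ratio, while arithmetic class widths grow by a constant increment (1x, 2x, 3x, ... a base step). The function names and the choice of five classes are illustrative:

```python
def geometric_limits(minimum, maximum, n_classes):
    """Class limits where each limit is a constant ratio times the previous one."""
    ratio = (maximum / minimum) ** (1 / n_classes)  # requires minimum > 0
    return [minimum * ratio ** i for i in range(n_classes + 1)]


def arithmetic_limits(minimum, maximum, n_classes):
    """Class limits where class widths are 1x, 2x, ..., n_classes x a base step."""
    # The widths sum to step * n(n + 1) / 2, which must equal the data range.
    step = (maximum - minimum) / (n_classes * (n_classes + 1) / 2)
    limits = [minimum]
    for i in range(1, n_classes + 1):
        limits.append(limits[-1] + i * step)
    return limits


# Population range from the slide: Tuvalu (11,468) to China (1.4 billion).
population_limits = geometric_limits(11_468, 1.4e9, 5)
print([round(x) for x in population_limits])
print(arithmetic_limits(0, 60, 3))  # [0, 10.0, 30.0, 60.0]
```

With a range this wide, equal intervals would put nearly every country in the lowest class, which is why widening intervals are attractive here.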
User Defined
Permits the cartographer to determine the class breaks.
Not used very often.
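Once the cartographer has chosen breaks, placing a value into its class is a lookup. A brief sketch using Python's `bisect` module; the break values below are hypothetical, chosen only to fall inside the GFR range quoted earlier, and a value equal to a break is assigned to the higher class by assumption:

```python
from bisect import bisect_right


def classify(value, breaks):
    """Return the 0-based class index of `value` given ascending class breaks."""
    return bisect_right(breaks, value)


manual_breaks = [55, 65, 75, 90]  # hypothetical user-defined breaks, giving five classes

print(classify(41.14, manual_breaks))   # lowest class
print(classify(101.45, manual_breaks))  # highest class
```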
Comparison of Classification Schemes
Classification Methods, a Comparison
Percent Forest Cover by County in Lower Silesia, Poland
Sorted set of research data
Classification Methods, a Comparison
The number of classes is based on this graph of the previous table of data.
Comparison of Different Software Choropleth Maps
Representing Quantities