Introduction to digital image classification

Dr. Norman Kerle, Wan Bakx MSc, and others
International Institute for Geo-Information Science and Earth Observation

Purpose of lecture
Main lecture topics:
- Review of basic concepts of pixel-based classification
- Review of principal terms (image space vs. feature space)
- Decision boundaries in feature space
- Unsupervised vs. supervised classification
- Training of the classifier
- Available classification algorithms
- Validation of results
- Problems and limitations

The Remote Sensing Process
[Figure: energy source, target, sensor, satellite communication (SatCom), processing station, analysis and application]

Multispectral Classification
What is it?
- grouping of similar features
- separation of dissimilar ones
- assigning a class label to each pixel
- resulting in a manageable number of classes

Generalised workflow
- Primary data acquisition
- Pre-processing: image restoration, radiometric corrections, geometric corrections
- Image enhancement: contrast, noise, sharpness
- Image fusion: multi-temporal, multi-resolution, mosaicking
- Feature extraction (quantitative): spectral (NDVI), spatial (lines, edges), statistical (PCA)
- Information extraction (qualitative): classification (supervised, unsupervised), segmentation (spatial objects), visual interpretation

Multispectral Classification
What are the advantages of using image classification?
- We are not interested in brightness values, but in thematic characteristics
- To translate the continuous variability of image data into map patterns that provide meaning to the user
- To obtain insight into the data with respect to ground cover and surface characteristics
- To find anomalous patterns in the image data set

Multispectral Classification
Why use it? (cont.)
- Cost-efficient in the analysis of large data sets
- Results can be reproduced
- More objective than visual interpretation
- Effective analysis of complex multi-band (spectral) interrelationships
- Classification achieves data size reduction
- Together with manual digitising and photogrammetric processing (for map making), classification is the most commonly used image processing technique

Classification Methods
- Manual (Chapter 11): visual interpretation; combination of spectral and spatial information (all interpretation elements)
- Computer assisted: mainly spectral information
- Stratified: using GIS functionality to incorporate knowledge from other sources of information

Supervised Classification
Objective: converting image data into thematic data

Image Space
[Figure: multi-band image]

One-dimensional feature space
[Figure: a single input layer and its histogram, with no distinction between slices/classes; the segmented image and its histogram, with distinction between slices/classes (unsupervised classification)]

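Multi-dimensional Feature Space
Each pixel is represented by a feature vector of its DN values in all bands, e.g. (34, 25, 117) and (34, 24, 119). Grouping such vectors into classes is a form of statistical pattern recognition.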

Feature space (scatterplot)
A feature space is a two- or three-dimensional graph (scatter diagram) in which points representing DN values in two or three spectral bands form clusters. Each cluster of points corresponds (theoretically) to a certain cover type on the ground.
[Figure: scatterplot with point density shaded from low to high frequency, next to a 1D histogram]

Distances and clusters in feature space
[Figure: feature space with axes band x and band y (units of 5 DN) and origin (0,0); a cluster bounded by Min x, Max x, Min y and Max y; the Euclidean distance between points]
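The Euclidean distance used in feature space is the straight-line distance between two feature vectors. The minimal sketch below computes it for the two 3-band vectors quoted on the multi-dimensional feature space slide. The function name `euclidean_distance` is illustrative only.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two feature vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("feature vectors must have the same number of bands")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# The two 3-band feature vectors from the multi-dimensional feature space slide
d = euclidean_distance((34, 25, 117), (34, 24, 119))
print(round(d, 3))  # 2.236
```

The two pixels differ by only 1 and 2 DN in bands 2 and 3, so they lie close together in feature space and would probably fall in the same cluster.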

Supervised vs. unsupervised classification
Unsupervised approach:
- based on spectral groupings
- considers only spectral distance measures
- minimum user interaction
- requires interpretation after classification
Supervised approach:
- based on spectral groupings
- incorporates prior knowledge/samples
- more extensive user interaction

Unsupervised Slicing
[Figure: a single input layer, its histogram, and the segmented image with distinction between slices (unsupervised classification)]

Unsupervised classification (clustering)
Clustering algorithm:
1. The user defines the cluster parameters.
2. The algorithm sets the class mean vectors arbitrarily (iteration 0).
3. Feature vectors are allocated to classes.
4. New class mean vectors are computed.
5. Feature vectors are allocated again (iteration 2) and the class mean vectors re-computed.
6. Iterations continue until the convergence threshold has been reached.
7. Final class allocation.
8. Cluster statistics reporting.
[Figure: feature spaces at successive iterations]
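The slide does not name a specific clustering algorithm. The minimal sketch below uses k-means, one common algorithm that follows the same loop: arbitrary initial means, allocation to the nearest mean, re-computed means, and a stop once the means move less than a convergence threshold. ISODATA additionally splits and merges clusters. The parameter names `max_iter`, `tol` and `seed` are assumptions for illustration.

```python
import numpy as np

def kmeans(features, k, max_iter=100, tol=1e-3, seed=0):
    """Cluster an (n_pixels, n_bands) array into k classes; returns (labels, means)."""
    x = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Iteration 0: class mean vectors set arbitrarily (k distinct pixels picked at random)
    means = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(max_iter):
        # Class allocation: each feature vector goes to the nearest class mean
        dists = np.linalg.norm(x[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-compute class mean vectors (an empty class keeps its previous mean)
        new_means = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
                              for j in range(k)])
        shift = np.linalg.norm(new_means - means, axis=1).max()
        means = new_means
        if shift < tol:  # convergence threshold reached
            break
    # Final class allocation with the final mean vectors
    labels = np.linalg.norm(x[:, None, :] - means[None, :, :], axis=2).argmin(axis=1)
    return labels, means

# Hypothetical 2-band DNs forming two well-separated groups
pixels = [[10, 10], [12, 11], [11, 9], [200, 210], [205, 200], [198, 205]]
labels, means = kmeans(pixels, k=2)
print(labels, means.round(1))
```

The cluster numbers that come out carry no meaning. As the slide notes, an analyst still has to interpret each cluster after classification.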

Supervised Classification
Principle:
1. Collect samples for training the classifier.
2. Define clusters (decision boundaries) in the feature space.
3. Assign a class label to each pixel based on its feature vector and the predefined clusters in the feature space, e.g. (160, 170) = Grass, (60, 40) = House.

Supervised classification procedure
1. Define/describe the different classes (text).
2. Collect ground truth (analogue/digital data).
3. Create a sample set (digital samples and their statistics).
4. Choose a classifier / decision rule / algorithm.
5. Classify the image data.
6. Assess the quality of the classification (accuracy matrix).
If the result is not satisfactory, return to step 1, 2, 3 or 4.

Training sample statistics
E.g. minimum, maximum, mean, standard deviation, variance, covariance
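The statistics listed above can be computed per class from the training pixels. The minimal sketch below assumes the samples are held in a dictionary that maps a class name to an (n_pixels, n_bands) array. The function name and the "grass" sample values are hypothetical. Standard deviation, variance and covariance use the sample (n - 1) form.

```python
import numpy as np

def training_statistics(samples):
    """Per-class statistics of training pixels.
    samples: dict mapping class name -> (n_pixels, n_bands) array of DNs."""
    stats = {}
    for name, pixels in samples.items():
        x = np.asarray(pixels, dtype=float)
        stats[name] = {
            "min": x.min(axis=0),
            "max": x.max(axis=0),
            "mean": x.mean(axis=0),
            "std": x.std(axis=0, ddof=1),      # sample standard deviation per band
            "var": x.var(axis=0, ddof=1),      # sample variance per band
            "cov": np.cov(x, rowvar=False),    # band-to-band variance-covariance matrix
        }
    return stats

# Hypothetical 2-band training pixels for one class
stats = training_statistics({"grass": [[160, 170], [158, 172], [162, 168], [160, 170]]})
print(stats["grass"]["mean"])  # [160. 170.]
```

Min/max feed the box classifier, the mean feeds the minimum distance classifier, and the mean with the covariance matrix feeds the maximum likelihood classifier.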

Training samples in potential feature spaces
The points a, b and c are the centres of clusters A, B and C, and line ab is the distance between the centres of A and B. Clusters A and B lie too close to each other, so they overlap and their classes are difficult to separate.

Sample set - 1 band
[Figure: ground truth and the histogram of the training/sample set in one band (frequency 0-300, DN 0-255), with the sample sets of the classes marked as class slices]

1 band/dimension - Slicing
[Figure: histogram of the training set (frequency 0-300, DN 0-255) divided into class intervals]
Decision rule: priority to the smallest slice length/spread
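The slicing decision rule can be sketched in a few lines. In this minimal sketch each class slice is the [min, max] range of its training DNs. When slices overlap, the narrowest slice wins, and a value outside every slice stays unclassified. The class names and slice limits are hypothetical. The tie-break for equally wide slices (alphabetical by class name) is an assumption.

```python
def slice_classify(value, slices):
    """Assign a DN to a class using [min, max] slices taken from the training set.
    If several slices contain the value, the narrowest slice wins (equal widths:
    alphabetical by class name, an assumption); if none does, return None."""
    candidates = [(hi - lo, name) for name, (lo, hi) in slices.items() if lo <= value <= hi]
    if not candidates:
        return None  # "unknown"
    return min(candidates)[1]

# Hypothetical single-band slices derived from training samples
slices = {"water": (10, 40), "forest": (35, 90), "grass": (80, 160)}
print(slice_classify(38, slices))   # water  (water and forest overlap; water is narrower)
print(slice_classify(85, slices))   # forest (forest and grass overlap; forest is narrower)
print(slice_classify(200, slices))  # None
```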

Two bands - Box classification
[Figure: means and standard deviations of the training sets in the band 1 / band 2 feature space (DN 0-255), and the resulting partitioned feature space]
Feature space partitioning with the box classifier: per band, each class box spans [min, max] or [mean - x·sd, mean + x·sd].

Box classification
Characteristics:
- considers only the lower and upper limits of each cluster
- computation is simple and fast
Disadvantages:
- overlapping boxes
- poorly adapted to cluster shape
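The box classifier can be sketched with the [mean - x·sd, mean + x·sd] limits. The factor x is the `factor` argument below. The class statistics in the example are hypothetical. The slides do not say how pixels in overlapping boxes are labelled, so this sketch simply takes the first matching class. That rule is an assumption.

```python
import numpy as np

def box_classify(pixels, class_means, class_stds, factor):
    """Box (parallelepiped) classifier with boxes [mean - factor*sd, mean + factor*sd].
    pixels: (n, bands); class_means, class_stds: (k, bands).
    Returns a class index per pixel, or -1 if no box contains the pixel.
    Overlap rule (an assumption): the first class whose box contains the pixel wins."""
    x = np.asarray(pixels, dtype=float)[:, None, :]               # (n, 1, bands)
    means = np.asarray(class_means, dtype=float)
    stds = np.asarray(class_stds, dtype=float)
    lo, hi = means - factor * stds, means + factor * stds         # (k, bands)
    inside = np.all((x >= lo) & (x <= hi), axis=2)                # (n, k)
    return np.where(inside.any(axis=1), inside.argmax(axis=1), -1)

means = [[60, 40], [160, 170]]      # hypothetical class 0 (House) and class 1 (Grass)
stds = [[5, 5], [10, 10]]
pixels = [[62, 45], [150, 175], [100, 100]]
print(box_classify(pixels, means, stds, factor=2))    # [ 0  1 -1]
print(box_classify(pixels, means, stds, factor=10))   # [0 1 1]
```

A larger factor leaves fewer pixels unclassified but makes overlapping boxes more likely.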

1 Dimension - Minimum Distance
[Figure: histogram of the training set (frequency 0-300, DN 0-255) divided into class intervals]
Decision rule: priority to the shortest distance to the class mean

N dimensions - Minimum distance to mean
[Figure: class mean vectors in the band 1 / band 2 feature space (DN 0-255) and the resulting partitioning by the minimum distance to mean classifier; with a threshold distance, pixels far from every mean are labelled "Unknown"]

Minimum distance to mean classifier
Characteristics:
- emphasis on the location of the cluster centre
- class labelling by considering the minimum distance to the cluster centres
Disadvantages:
- disregards the variability within a class
- does not consider the shape and size of the clusters
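The minimum distance to mean rule, including the optional threshold that produces "Unknown" pixels, is sketched below. It uses Euclidean distance and works for any number of bands. For a single band, pass pixels and means with shape (n, 1). The class means reuse the Grass and House example vectors. The test pixels are hypothetical.

```python
import numpy as np

def min_distance_classify(pixels, class_means, threshold=None):
    """Minimum distance to mean classifier (Euclidean distance in feature space).
    Returns the index of the nearest class mean per pixel; with a threshold,
    pixels farther than the threshold from every mean get -1 ("Unknown")."""
    x = np.asarray(pixels, dtype=float)
    m = np.asarray(class_means, dtype=float)
    dists = np.linalg.norm(x[:, None, :] - m[None, :, :], axis=2)   # (n, k)
    labels = dists.argmin(axis=1)
    if threshold is not None:
        labels = np.where(dists.min(axis=1) <= threshold, labels, -1)
    return labels

means = [[60, 40], [160, 170]]              # House, Grass
pixels = [[70, 50], [150, 160], [250, 20]]  # hypothetical pixels
print(min_distance_classify(pixels, means))                # [0 1 1]
print(min_distance_classify(pixels, means, threshold=50))  # [ 0  1 -1]
```

Without a threshold, the outlying pixel (250, 20) is forced into the Grass class even though it is about 175 DN away from its mean. This shows why the classifier ignores within-class variability.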

1 band - Maximum Likelihood
[Figure: histogram of the training set with the probability density functions of the classes (frequency 0-300, DN 0-255, class intervals)]
The probability that a pixel value x belongs to a class is calculated assuming a normal (Gaussian) distribution with class mean μ and standard deviation σ:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Decision rule: priority to the highest probability (based upon σ and μ)

Maximum likelihood classifier
[Figure: mean vectors and variance-covariance matrices of the classes in the band 1 / band 2 feature space (DN 0-255), and the resulting partitioning by the maximum likelihood classifier, with an "Unknown" region]

Maximum likelihood classification
Characteristics:
- considers the variability within a cluster
- considers the shape, size and orientation of clusters (equiprobability contours)
Disadvantages:
- takes more computing time
- assumes that the class values follow a normal probability density function
[Figure: probability density functions (Lillesand and Kiefer, 1987)]
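In N dimensions the Gaussian density generalises to a multivariate normal with a class mean vector and a variance-covariance matrix. The minimal sketch below assumes equal prior probabilities for all classes. It picks the class with the highest log-density and drops the constant term that is the same for every class. The thresholded "Unknown" class shown in the figures is not implemented here. The class statistics in the example are hypothetical.

```python
import numpy as np

def ml_classify(pixels, class_means, class_covs):
    """Maximum likelihood classifier assuming a multivariate normal distribution per
    class and equal priors (an assumption). Returns the most likely class per pixel."""
    x = np.asarray(pixels, dtype=float)
    scores = []
    for mu, cov in zip(class_means, class_covs):
        diff = x - np.asarray(mu, dtype=float)
        cov = np.asarray(cov, dtype=float)
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        mahal = np.einsum("ij,jk,ik->i", diff, inv, diff)   # squared Mahalanobis distance
        # Log normal density without the class-independent -0.5*bands*log(2*pi) term
        scores.append(-0.5 * (logdet + mahal))
    return np.argmax(np.stack(scores, axis=1), axis=1)

means = [[100, 100], [140, 100]]               # class 0: broad, class 1: compact
covs = [[[400, 0], [0, 400]], [[4, 0], [0, 4]]]
print(ml_classify([[125, 100], [140, 101]], means, covs))   # [0 1]
```

The pixel (125, 100) is closer to the mean of class 1 (15 DN vs. 25 DN), yet it is assigned to class 0 because class 1 is very compact. This shows how the classifier accounts for variability within a cluster.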

Validation samples: sampling design
The number of samples is related to:
- the number of samples that must be taken in order to reject a data set as being inaccurate
- the number of samples required to determine the true accuracy within some error bounds
[Figure: nine validation samples (n = 9) placed over classes A, B and C by systematic sampling, simple random sampling and stratified random sampling]
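The three sampling designs can be sketched as functions that return pixel positions (row, col). Systematic sampling uses a regular grid. Simple random sampling draws positions from the whole image. Stratified random sampling draws a fixed number of positions within each class of a classified map. All function names, the grid offset of `step // 2` and the example map are hypothetical.

```python
import random

def systematic_sample(rows, cols, step):
    """Regular grid of positions every `step` pixels, offset by step // 2 (an assumption)."""
    return [(r, c) for r in range(step // 2, rows, step) for c in range(step // 2, cols, step)]

def simple_random_sample(rows, cols, n, rng):
    """n distinct positions drawn at random from the whole image."""
    return [divmod(i, cols) for i in rng.sample(range(rows * cols), n)]

def stratified_random_sample(class_map, n_per_class, rng):
    """n_per_class random positions inside each class (stratum) of a classified map."""
    strata = {}
    for r, row in enumerate(class_map):
        for c, label in enumerate(row):
            strata.setdefault(label, []).append((r, c))
    return {label: rng.sample(cells, n_per_class) for label, cells in strata.items()}

rng = random.Random(42)                                    # fixed seed for reproducibility
class_map = [["A"] * 10 + ["B"] * 10 + ["C"] * 10 for _ in range(30)]   # hypothetical map
print(len(systematic_sample(30, 30, 10)))                  # 9
print(simple_random_sample(30, 30, 9, rng))
print(stratified_random_sample(class_map, 3, rng))         # 3 positions in each of A, B, C
```

Stratified sampling guarantees that small classes also receive validation samples. Simple random sampling may miss them entirely.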

Accuracy assessment
Basic data for 4 land cover classes, 163 ground truth samples. Rows: classification result; columns: reference (ground truth) sample set.

                    Reference class
Classification    A     B     C     D    Total
A                35    14    11     1      61
B                 4    11     3     0      18
C                12     9    38     4      63
D                 2     5    12     2      21
Total            53    39    64     7     163

Measures of thematic accuracy
Error of commission and user accuracy (per row); error of omission and producer accuracy (per column).

Error (confusion) matrix (rows: classification result; columns: reference class):

                A     B     C     D    Total   Error of commission   User accuracy
A              35    14    11     1      61           43%                57%
B               4    11     3     0      18           39%                61%
C              12     9    38     4      63           40%                60%
D               2     5    12     2      21           90%                10%
Total          53    39    64     7     163
Omission      34%   72%   41%   71%
Producer      66%   28%   59%   29%

Overall accuracy = sum of diagonal / total = (35 + 11 + 38 + 2) / 163 = 86/163 = 53%
Example for class A: error of omission = (4 + 12 + 2)/53 = 34%; producer accuracy = 35/53 = 66%.
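All per-class measures of the error matrix come from its diagonal and its row and column totals. The minimal sketch below recomputes them for the matrix above with NumPy. The variable names are illustrative.

```python
import numpy as np

# Error matrix from the slide: rows = classification result, columns = reference
classes = ["A", "B", "C", "D"]
m = np.array([[35, 14, 11, 1],
              [ 4, 11,  3, 0],
              [12,  9, 38, 4],
              [ 2,  5, 12, 2]])

diag = np.diag(m)
user_acc = diag / m.sum(axis=1)        # correct / row total
producer_acc = diag / m.sum(axis=0)    # correct / column total
commission = 1 - user_acc              # off-diagonal share of each row
omission = 1 - producer_acc            # off-diagonal share of each column
overall = diag.sum() / m.sum()         # proportion correctly classified

for name, ua, pa in zip(classes, user_acc, producer_acc):
    print(f"{name}: user {ua:.0%}, producer {pa:.0%}")
print(f"overall {overall:.0%}")
```

The printed percentages match the table: for example, class D has a user accuracy of only 10% but a producer accuracy of 29%.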

Validation
Rows (classification result): error of commission = sum of off-diagonal row elements / row total; user accuracy (reliability) = diagonal element / row total.
Columns (reference): error of omission = sum of off-diagonal column elements / column total; producer accuracy (accuracy per class) = diagonal element / column total.

Validation terminology
- User accuracy: probability that a pixel labelled as a certain class in the classification actually represents that class on the ground (57% of what has been classified as A is A).
- Producer accuracy: probability that a reference pixel of a certain class has been labelled as that class. It indicates how well the reference pixels of that class have been classified (66% of the reference pixels of A were classified as A).
- Kappa statistic: takes into account that even assigning labels at random gives a certain degree of accuracy. Comparing the kappa values of two classifications allows testing whether their accuracies differ significantly.
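The slides do not give a formula for kappa. The commonly used Cohen's kappa is κ = (p_o - p_e) / (1 - p_e). Here p_o is the overall accuracy, and p_e is the agreement expected by chance, computed from the row and column totals. The minimal sketch below applies it to the 4-class error matrix from this lecture.

```python
import numpy as np

def kappa(matrix):
    """Cohen's kappa for an error matrix (rows = classification, columns = reference)."""
    m = np.asarray(matrix, dtype=float)
    n = m.sum()
    p_observed = np.trace(m) / n                                 # overall accuracy
    p_chance = (m.sum(axis=1) * m.sum(axis=0)).sum() / n ** 2    # chance agreement
    return (p_observed - p_chance) / (1 - p_chance)

k = kappa([[35, 14, 11, 1], [4, 11, 3, 0], [12, 9, 38, 4], [2, 5, 12, 2]])
print(round(k, 2))  # 0.32
```

Kappa (0.32) is much lower than the overall accuracy (53%), because about 31% agreement would be expected by chance alone given the row and column totals.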

Error matrix
The error matrix provides information on the overall accuracy, i.e. the proportion correctly classified (PCC). PCC tells us about the amount of error, not where the errors are located.
PCC = sum of the diagonal elements / total number of pixels sampled for accuracy assessment

Improvements
- Create more than one feature class for one land cover/use class
- Filter salt-and-pepper noise (majority filter on the result)
- Use masks to identify areas where other rules apply (hybrid)
- Use multi-temporal expertise to identify classes (expert knowledge)
- Use other additional data (expert knowledge)
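A majority filter on the classification result removes isolated "salt-and-pepper" pixels. It does this by replacing each label with the most frequent label in its neighbourhood. This minimal sketch makes several assumptions: a 3 x 3 window, edge handling by repeating border pixels, and keeping the current label whenever it ties with the most frequent one. The example map is hypothetical.

```python
import numpy as np
from collections import Counter

def majority_filter(class_map, size=3):
    """Replace each label by the most frequent label in a size x size window.
    Assumptions: borders are padded by repeating edge pixels, and the current
    label is kept whenever it ties with the most frequent one."""
    labels = np.asarray(class_map)
    pad = size // 2
    padded = np.pad(labels, pad, mode="edge")
    out = labels.copy()
    for r in range(labels.shape[0]):
        for c in range(labels.shape[1]):
            counts = Counter(padded[r:r + size, c:c + size].ravel().tolist())
            label, best = counts.most_common(1)[0]
            if counts[labels[r, c].item()] < best:
                out[r, c] = label
    return out

# Hypothetical 5 x 5 class map with one isolated pixel of class 2
m = np.ones((5, 5), dtype=int)
m[2, 2] = 2
print(majority_filter(m))   # all ones: the isolated pixel is absorbed
```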

Classification preparation
Application-dependent aspects:
- class definition
- spatio-temporal characteristics
Sensor characteristics:
- bands
- spatial resolution
- acquisition date(s)
Band selection constraints:
- non-correlated set
- software limitations
- sensor(s)

Class definition problems
- no use of other characteristics (location, orientation, pattern, texture...)
- spectral overlap
- heterogeneous classes
- mixed pixels
- class definition: land use vs. land cover

Problems: land cover/land use
Constraints of pixel-based image classification:
- it results in spectral classes
- each pixel is assigned to one class only
[Figure: spectral bands -> spectral classes (from training samples) -> land cover (e.g. grass) -> land use (e.g. sport field, meadow)]

Problems: land cover/land use

Spectral class   Land cover class   Land use class
water            water              shrimp cultivation
grass1           grass              nature reserve
grass2           grass              nature reserve
grass3           grass              nature reserve
bare soil        bare soil          nature reserve
trees1           forest             nature reserve
trees2           forest             production forest
trees3           forest             city park

1-n and n-1 relationships can exist between land cover and land use classes. A DEM or other additional data can help improve a classification.

Problems: mixed pixels
- objects smaller than a pixel
- mixtures
- boundaries between objects
- transitions

Problems: spatial resolution
Resolution dependency:
- each pixel contains approximately the same mixture: distinct reflectance measurement
- regular repetition of a spatial pattern: large cluster in the feature space, with spectral overlap with other classes

Alternative procedures
- Hybrid (stratified) classification
- Unsupervised classification/clustering
- (Hyper)spectral classifications
- Subpixel classification
- Object-based classification
- Expert/knowledge-based classification
- Neural networks

Example - Feature space

[Figures: classification results for the same example with box classification (factor 1.7, 4 and 10), minimum distance (threshold 50 and 100) and maximum likelihood (threshold 100)]

Object Based Classification (adv.)
[Figure: two routes from the image to an assessment: image -> segmentation -> classified segments -> assessment; image -> pixel-based classification -> majority-based object classification -> assessment]

Objects
Obtain objects by:
- edge detection
- post-classification
- segmentation
- vector reference

Classes
Obtain the class label from:
- frequency/majority
- object mean...
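The frequency/majority option labels each object with the most frequent pixel-based class inside it. The minimal sketch below assumes the segmentation and the pixel-based classification are integer arrays of the same shape. When two classes are equally frequent, the smaller class number wins. That tie rule is an assumption. The example arrays are hypothetical.

```python
import numpy as np

def object_labels_by_majority(segments, pixel_classes):
    """Most frequent pixel-based class label inside each segment.
    Ties go to the smallest class number (an assumption)."""
    seg = np.asarray(segments).ravel()
    cls = np.asarray(pixel_classes).ravel()
    result = {}
    for s in np.unique(seg):
        values, counts = np.unique(cls[seg == s], return_counts=True)
        result[int(s)] = int(values[counts.argmax()])
    return result

segments = [[1, 1, 2], [1, 1, 2], [3, 3, 2]]        # hypothetical segment ids
pixel_classes = [[5, 5, 7], [5, 6, 7], [6, 6, 8]]   # hypothetical pixel-based labels
print(object_labels_by_majority(segments, pixel_classes))   # {1: 5, 2: 7, 3: 6}
```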

OBC by object means
[Figure: workflow: image -> segmentation; every pixel of segment i takes the value μ(segment i); class signatures are retrieved from training samples; the segments are classified; assessment]
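Object-based classification by object means replaces every pixel of segment i by the segment mean μ(segment i) and then classifies the segments rather than the pixels. The slide does not say which decision rule is applied to the segment means. The minimal sketch below uses minimum distance to mean purely as an assumed choice. The function names and example data are hypothetical.

```python
import numpy as np

def segment_means(image, segments):
    """Mean feature vector mu(segment i) of every segment.
    image: (rows, cols, bands) array; segments: (rows, cols) array of segment ids."""
    img = np.asarray(image, dtype=float)
    x = img.reshape(-1, img.shape[-1])
    seg = np.asarray(segments).ravel()
    ids = np.unique(seg)
    return ids, np.array([x[seg == s].mean(axis=0) for s in ids])

def classify_segments(image, segments, class_means):
    """Label each segment by the class mean nearest to its mean vector
    (minimum distance to mean: an assumed choice of decision rule)."""
    ids, mu = segment_means(image, segments)
    m = np.asarray(class_means, dtype=float)
    d = np.linalg.norm(mu[:, None, :] - m[None, :, :], axis=2)
    return dict(zip(ids.tolist(), d.argmin(axis=1).tolist()))

image = [[[60, 40], [62, 42]],
         [[158, 168], [162, 172]]]     # hypothetical 2 x 2 image with 2 bands
segments = [[1, 1], [2, 2]]            # two segments
print(classify_segments(image, segments, [[60, 40], [160, 170]]))   # {1: 0, 2: 1}
```

Class signatures (here the two class mean vectors) would come from training samples, as in the workflow above.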