Classification. Vladimir Curic. Centre for Image Analysis, Swedish University of Agricultural Sciences and Uppsala University

Outline: An overview of classification; basics of classification; how to choose appropriate features (descriptors); how to perform classification; classifiers.

Where are we right now? Figure: Classification in image processing.

What is classification? Classification is a process in which individual items (objects/patterns/image regions/pixels) are grouped based on the similarity between each item and the description of the group. Figure: Classification process.

Classification example 1 Figure: Classification of leaves.

Classification example 2 Figure: Histological bone implant images.

Terminology Object = pattern = point = sample = vector Feature = descriptor = attribute = measurement Classifier = decision function (boundary) Class Cluster

Dataset - example By measuring features of many objects we can construct a dataset:

Object    Perimeter  Area  Label
Apple 1   25         80    1
Apple 2   30         78    1
Apple 3   29         81    1
Pear 1    52         65    2
Pear 2    48         66    2
Pear 3    51         63    2
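As an illustration (not part of the lecture material), the small apple/pear dataset above can be written as a feature matrix and a label vector; the names X and y are just conventional placeholders.

```python
import numpy as np

# Feature matrix: one row per object, columns = (perimeter, area)
X = np.array([
    [25, 80],  # Apple 1
    [30, 78],  # Apple 2
    [29, 81],  # Apple 3
    [52, 65],  # Pear 1
    [48, 66],  # Pear 2
    [51, 63],  # Pear 3
], dtype=float)

# Class labels: 1 = apple, 2 = pear
y = np.array([1, 1, 1, 2, 2, 2])
```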

Data set and a representation in the feature space

Feature set - Example (Lab 2) Colour? Area? Perimeter?...

Features (Descriptors) Features are individual measurable properties of a phenomenon. The goal is to choose those features that allow pattern vectors belonging to different classes to occupy compact and disjoint regions. Desirable properties: discriminating (effective) features, independent features.

What are good features? You might try many, many features until you find the right ones. Often, people compute hundreds of features, put them all into a classifier, and expect the classifier to figure out which ones are good. This is wrong! Which features are good is application dependent.

Peaking phenomenon (Curse of dimensionality) Additional features may actually degrade the performance of a classifier. This paradox is called the peaking phenomenon (curse of dimensionality). Figure: Curse of dimensionality.

Dimensionality reduction Keep the number of features as small as possible: measurement cost, accuracy, simpler pattern representation. The resulting classifier will be faster and will use less memory. Note: a reduction in the number of features may lead to a loss in discriminatory power and thereby lower the accuracy.

Feature selection Feature selection methods choose features from the original set based on some criterion. They reduce dimensionality by selecting a subset of the original features, which therefore keep a clear physical meaning.

Feature extraction Feature extraction is the process of generating features to be used in the selection and classification tasks. The generated features might not have a clear physical meaning.
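To make the distinction concrete, here is a minimal sketch (not from the lecture) contrasting selection of original features with extraction of new ones, using scikit-learn; the dataset and the parameter choices are arbitrary assumptions for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)          # 150 samples, 4 original features

# Feature selection: keep 2 of the original features (they retain their meaning)
selected = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: build 2 new features as linear combinations (PCA components)
extracted = PCA(n_components=2).fit_transform(X)

print(selected.shape, extracted.shape)     # (150, 2) (150, 2)
```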

Supervised vs. Unsupervised classification Supervised: first apply knowledge, then classify. Unsupervised: first classify, then apply knowledge. Figure: Supervised and unsupervised classification.

Supervised classification Train and classify. Training: find rules and discriminant functions that separate patterns of different classes, using known examples. Classification: take a new, unknown example and assign it to the correct class using the discriminant function.

Training data Training data can be obtained from available training samples. The classifier is then used to classify patterns that were not used during the training stage. The performance of the classifier depends on the number of available training samples as well as on the specific values of the samples. The number of training samples should not be too small!

Training data Conditional densities are unknown and have to be learned from the available training data. Sometimes the form of the density is known (for example, a Gaussian distribution); a common strategy is then to replace the unknown parameters by their estimated values. The less information is available, the more difficult the classification problem becomes.

How to choose an appropriate training set?

How to choose an appropriate training set? Figure: Original image and training image.

How to choose an appropriate training set? Figure: Original image and training image.

Object-wise and pixel-wise classification Object-wise classification uses shape information to describe patterns: size, mean intensity, mean color, etc. Pixel-wise classification uses information from individual pixels: intensity, color, texture, spectral information.

Object-wise classification Lab 2

Object-wise classification Segment the image into regions and label them. Extract features for each pattern. Train a classifier on examples with known class to find a discriminant function in the feature space. For new examples, decide their class using the discriminant function.
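A minimal sketch of this pipeline (an illustration, not the lab code) using scikit-image and scikit-learn; the binary segmentation, the choice of features (area, perimeter) and the nearest-neighbour classifier are assumptions made for the example.

```python
import numpy as np
from skimage import measure
from sklearn.neighbors import KNeighborsClassifier

def object_features(binary_image):
    """Segment a binary image into connected regions and return one
    feature vector (area, perimeter) per region."""
    labels = measure.label(binary_image)          # region labelling
    regions = measure.regionprops(labels)
    return np.array([[r.area, r.perimeter] for r in regions])

# X_train / y_train: features and class labels of objects with known class
# (e.g. extracted from a manually annotated training image)
clf = KNeighborsClassifier(n_neighbors=1)
# clf.fit(X_train, y_train)
# y_new = clf.predict(object_features(new_binary_image))
```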

Pixel-wise classification 256 × 256 patterns (pixels), 3 features (red, green and blue color), four classes (stamen, leaf, stalk and background).

Pixel-wise classification The pattern is a pixel in a non-segmented image. Extract (calculate) features for each pixel, e.g., color or a gray-level representation of texture. Train a classifier. New samples (pixels) are then classified with the trained classifier.
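A sketch of pixel-wise classification under the assumptions of the flower example above (RGB features, class labels per pixel); the Gaussian naive Bayes classifier and the synthetic data are choices made only for this illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def classify_pixels(image, train_pixels, train_labels):
    """image: (H, W, 3) RGB array. train_pixels: (N, 3) feature vectors of
    pixels with known class. Returns an (H, W) label image."""
    h, w, c = image.shape
    clf = GaussianNB().fit(train_pixels, train_labels)
    X = image.reshape(-1, c)               # one pattern (feature vector) per pixel
    return clf.predict(X).reshape(h, w)

# Tiny synthetic example: class 1 = red-ish pixels, class 2 = green-ish pixels
train_pixels = np.array([[220, 30, 30], [200, 50, 40],
                         [30, 200, 40], [50, 220, 60]], dtype=float)
train_labels = np.array([1, 1, 2, 2])
image = np.random.randint(0, 256, size=(4, 4, 3)).astype(float)
print(classify_pixels(image, train_pixels, train_labels))
```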

Classifiers Once feature selection has found a proper representation, a classifier can be designed using a number of possible approaches. The performance of a classifier depends on the interrelationship between sample size, number of features, and classifier complexity. The choice of a classifier is a difficult problem! It is often based on which classifier(s) happen to be available or are best known to the user.

Discriminant functions A discriminant function for a class is a function that yields larger values than the functions of the other classes if the pattern belongs to that class: $d_i(x) > d_j(x)$, $j = 1, 2, \dots, N$, $j \neq i$. For $N$ pattern classes, we have $N$ discriminant functions. The decision boundary between class $i$ and class $j$ is $d_i(x) - d_j(x) = 0$.

Decision boundary

Decision boundary Figure: Decision boundary.

Linear and quadratic classifier Figure: Linear (left) and quadratic (right) classifier.

Thresholding - simple classification Classify the image into foreground and background. Figure: Original image and thresholded image.

Bayes classifiers Based on a priori knowledge of class probabilities and the cost of errors. Special cases: minimum distance classifier, maximum likelihood classifier.

Minimum distance classifier Training is done using the objects (pixels) of known class. Each class is represented by its mean vector $m_j$, the mean of the feature vectors of the objects within the class. New objects are classified by finding the closest mean vector, with closeness measured as $D_j(x) = \|x - m_j\|$, $j = 1, 2, \dots, N$.

Minimum distance classifier

Limitations of the Minimum distance classifier When should this classifier be used? It gives optimal performance when the distributions of the classes are spherical.

Minimum distance classifier - equivalence Use $\|x\| = (x^T x)^{1/2}$ to show the equivalence. Minimize $D_j(x) = \|x - m_j\|$, $j = 1, 2, \dots, N$, or equivalently maximize $d_j(x) = x^T m_j - \frac{1}{2} m_j^T m_j$, $j = 1, 2, \dots, N$.
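A small numpy sketch (not from the lecture) of the minimum distance classifier, also checking that the distance form and the linear form above pick the same class; the class means and the test pattern are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Class means m_j (in practice estimated from training data)
means = np.array([[25.0, 80.0],   # class 1 (e.g. apples)
                  [50.0, 65.0]])  # class 2 (e.g. pears)

x = rng.normal([48, 66], 2.0)     # a new pattern to classify

# Form 1: minimize the Euclidean distance D_j(x) = ||x - m_j||
D = np.linalg.norm(x - means, axis=1)
# Form 2: maximize the linear discriminant d_j(x) = x^T m_j - 0.5 m_j^T m_j
d = means @ x - 0.5 * np.sum(means**2, axis=1)

assert np.argmin(D) == np.argmax(d)       # both forms agree
print("assigned class:", np.argmin(D) + 1)
```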

Bayes classifiers Bayes formula: $p(A \mid B) = \dfrac{p(A)\, p(B \mid A)}{p(B)}$. Classify according to the largest probability (taking variance and covariance into consideration). If not otherwise specified, assume that the distribution within each class is Gaussian; the distribution in each class can then be described by a mean vector and a covariance matrix.

Loss (cost) function The optimal rule for minimizing the risk can be stated as follows: assign input pattern $x$ to the class $\omega_j$ for which the conditional risk $r_j(x) = \frac{1}{p(x)} \sum_{i=1}^{N} L(\omega_i, \omega_j)\, p(x \mid \omega_i)\, P(\omega_i)$ is smallest. Here $L(\omega_i, \omega_j) = L_{ij}$ is the loss incurred when $x$ is assigned to class $\omega_j$ while it actually belongs to class $\omega_i$, $p(x \mid \omega_i)$ is the probability density of patterns from class $\omega_i$, and $P(\omega_i)$ is the prior probability of class $\omega_i$ occurring.

Simplification 1: 0-1 Loss function In medicine one distinguishes false negatives and false positives; false negatives are more costly. Often, however, we assume that all errors have equal cost: $L_{ij} = 1 - \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta. The discriminant function (decision boundary) then becomes $d_j(x) = p(x \mid \omega_j)\, P(\omega_j)$.

Simplification 2: Assume Gaussian Probability Density Function $p(x \mid \omega_j)$ is Gaussian for all classes, with covariance matrix $C_j = E\big[(x - m_j)(x - m_j)^T\big]$. Discriminant function: $d_j(x) = \ln P(\omega_j) - \frac{1}{2} \ln |C_j| - \frac{1}{2} (x - m_j)^T C_j^{-1} (x - m_j)$.

Simplification 3: Assume Equal Prior Probabilities $P(\omega_j)$ is equal for all classes. Discriminant function: $d_j(x) = -\frac{1}{2} \ln |C_j| - \frac{1}{2} (x - m_j)^T C_j^{-1} (x - m_j)$. This is the Maximum Likelihood classifier: 0-1 loss function, Gaussian pattern classes, and all classes equally likely to occur.

Simplification 4: Assume Same Covariance All classes have the same covariance matrix $C$. Discriminant function: $d_j(x) = (x - m_j)^T C^{-1} (x - m_j)$, where $x$ is assigned to the class with the smallest value. This is a linear discriminant function: the decision boundaries are hyperplanes. The quantity $d_j(x)$ is the (squared) Mahalanobis distance.

Simplification 5: Assume Independent Features and Same Variance All classes have the same variance and no covariance: $C$ is the identity matrix (up to a scale factor). Discriminant function: $d_j(x) = (x - m_j)^T (x - m_j)$, again minimized over the classes. This is the Minimum distance classifier.
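To tie the simplifications together, here is a numpy sketch (illustration only) of the full Gaussian discriminant from Simplification 2, with means, covariances and priors estimated from training data; the synthetic two-class data are assumptions of the example.

```python
import numpy as np

def gaussian_discriminants(x, means, covs, priors):
    """Evaluate d_j(x) = ln P(w_j) - 0.5 ln|C_j| - 0.5 (x - m_j)^T C_j^{-1} (x - m_j)
    for every class j and return the scores."""
    scores = []
    for m, C, P in zip(means, covs, priors):
        diff = x - m
        scores.append(np.log(P) - 0.5 * np.log(np.linalg.det(C))
                      - 0.5 * diff @ np.linalg.inv(C) @ diff)
    return np.array(scores)

# Estimate parameters per class from synthetic training samples
rng = np.random.default_rng(1)
X1 = rng.normal([25, 80], [3, 2], size=(50, 2))   # class 1 training samples
X2 = rng.normal([50, 65], [3, 2], size=(50, 2))   # class 2 training samples
means = [X1.mean(axis=0), X2.mean(axis=0)]
covs = [np.cov(X1, rowvar=False), np.cov(X2, rowvar=False)]
priors = [0.5, 0.5]

x_new = np.array([27.0, 79.0])
print("assigned class:", np.argmax(gaussian_discriminants(x_new, means, covs, priors)) + 1)
# With equal priors and C_j = I, maximizing d_j(x) reduces to the minimum distance classifier.
```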

Unsupervised classification It can be difficult, expensive, or even impossible to reliably label a training sample with its true category (ground truth may not be obtainable). Patterns within a cluster are more similar to each other than are patterns belonging to different clusters.

Unsupervised classification is difficult: how do we determine the number of clusters K? The number of clusters often depends on the resolution of interest (fine vs. coarse).

K-means Step 1. Select an initial partition with K clusters (K initial cluster centres). Step 2. Generate a new partition by assigning each pattern to its closest cluster centre. Step 3. Compute new cluster centres as the centroids of the clusters. Step 4. Repeat steps 2 and 3 until the criterion function reaches an optimum and the cluster membership stabilizes.
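A compact numpy sketch of these four steps (illustrative only; real applications typically use a library implementation such as scikit-learn's KMeans):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means: X is an (N, d) array of patterns, k the number of clusters.
    Returns (centres, labels)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]        # step 1: initial centres
    for _ in range(n_iter):
        # step 2: assign each pattern to its closest centre
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # step 3: recompute centres as cluster centroids (keep old centre if a cluster is empty)
        new_centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centres[j] for j in range(k)])
        if np.allclose(new_centres, centres):                     # step 4: stop when stable
            break
        centres = new_centres
    return centres, labels
```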

K means

K means

K-means - disadvantages Different initializations can result in different final clusters. The fixed number of clusters makes it difficult to predict what K should be. It is helpful to rerun the clustering using the same as well as different K values and compare the results. Ten different initializations may be enough for 2D data; for N-dimensional data, ten initializations are often not enough!
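In practice, repeated initializations and different K values are easy to try, for example with scikit-learn (a usage sketch with arbitrary data and parameters, not lecture code):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 5))   # some N-dimensional data

for k in (2, 3, 4):
    km = KMeans(n_clusters=k, n_init=25, random_state=0).fit(X)   # 25 restarts per K
    print(k, km.inertia_)   # within-cluster sum of squares, for comparing runs
```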

Problem The following eight points A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9) should be classified into three clusters using K-means clustering. The initial cluster centres are A1(2, 10), A4(5, 8) and A7(1, 2). Find the three cluster centres after the first iteration. The distance between two points $A = (x_a, y_a)$ and $B = (x_b, y_b)$ is defined as $d(A, B) = |x_a - x_b| + |y_a - y_b|$.
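For checking a hand calculation, here is a short numpy sketch (not part of the assignment) that performs exactly one K-means iteration with the city-block (Manhattan) distance defined above:

```python
import numpy as np

points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)   # A1..A8
centres = points[[0, 3, 6]]                                        # A1, A4, A7

# Assign each point to the centre with the smallest Manhattan distance
dists = np.abs(points[:, None, :] - centres[None, :, :]).sum(axis=2)
labels = np.argmin(dists, axis=1)

# New centres = centroids of the three clusters after this single iteration
new_centres = np.array([points[labels == j].mean(axis=0) for j in range(3)])
print(new_centres)    # rows: updated centres of clusters 1, 2 and 3
```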

Summary and conclusion Classification is needed in image processing, and it is highly application dependent. Reading material: Chapters 12-12.2.3. Problems: 12.1, 12.2, 12.7, and one problem given during the lecture.