Computer Vision. Colorado School of Mines. Professor William Hoff, Dept of Electrical Engineering & Computer Science.


Professor William Hoff, Dept of Electrical Engineering & Computer Science. http://inside.mines.edu/~whoff/

Pattern Recognition

Pattern Recognition
The process by which patterns in data are found, recognized, and discovered. It usually aims to classify data (patterns) based either on a priori knowledge or on statistical information extracted from the patterns. The patterns to be classified are observations, defining points in a multidimensional space. Classification is usually based on a set of patterns that have already been classified (e.g., by a person); this set of patterns is termed the training set, and the learning strategy is called supervised. Learning can also be unsupervised; in this case there is no training set, and the system establishes the classes itself based on the statistical regularities of the patterns.
Resources: the Statistics Toolbox in Matlab. Journals include Pattern Recognition and IEEE Trans. Pattern Analysis & Machine Intelligence. A good book: Pattern Recognition and Machine Learning by Bishop.

Approaches
Statistical Pattern Recognition: we assume that the patterns are generated by a probabilistic system. The data is reduced to vectors of numbers and statistical techniques are used for classification.
Structural Pattern Recognition: the process is based on the structural interrelationships of features. The data is converted to a discrete structure (such as a grammar or a graph) and classification techniques such as parsing and graph matching are used.
Neural: the model simulates the behavior of biological neural networks.

Unsupervised Pattern Recognition
The system must learn the classifier from unlabeled data. It's related to the problem of trying to estimate the underlying probability density function of the data. Approaches to unsupervised learning include clustering (e.g., k-means, mixture models, hierarchical clustering) and techniques for dimensionality reduction (e.g., principal component analysis, independent component analysis, non-negative matrix factorization, singular value decomposition).

k-means Clustering
Given a set of n-dimensional vectors and a specified k (the number of desired clusters), the algorithm partitions the vectors into k clusters such that it minimizes the sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances.

k-means Algorithm
1. Given a set of vectors {x_i}.
2. Randomly choose a set of k means {m_i} as the center of each cluster.
3. For each vector x_i, compute the distance to each m_i; assign x_i to the closest cluster.
4. Update the means to get a new set of cluster centers.
5. Repeat steps 3 and 4 until there is no more change in cluster centers.
k-means is guaranteed to terminate, but may not find the global optimum in the least squares sense.
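
A minimal from-scratch sketch of these steps in Matlab (this is not the lecture's code; the function name, the use of pdist2 from the Statistics Toolbox, and the convergence test are my own choices):

function [idx, mu] = simple_kmeans(X, k)
% X is an N-by-d matrix of N d-dimensional vectors; k is the number of clusters.
N = size(X,1);
mu = X(randperm(N,k), :);            % step 2: pick k random points as initial means
for iter = 1:100                     % safety limit on the number of iterations
    % step 3: assign each point to the nearest mean
    D = pdist2(X, mu);               % N-by-k matrix of point-to-center distances
    [~, idx] = min(D, [], 2);
    % step 4: recompute each mean from its assigned points
    muOld = mu;
    for j = 1:k
        if any(idx == j)             % keep the old center if a cluster is empty
            mu(j,:) = mean(X(idx == j, :), 1);
        end
    end
    % step 5: stop when the cluster centers no longer move
    if max(abs(mu(:) - muOld(:))) < 1e-10
        break;
    end
end
end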

Example: Indexed Storage of Color Images
If an image uses 8 bits for each of R, G, B, there are 2^24 possible colors. Most images don't use the entire color space of possible values, so we can get by with fewer. We'll use k-means clustering to find the reduced set of colors. (Figures: the image using the full color space, and the image using only 3 discrete colors.)

Indexed Storage of Color Images
For each image, we find a set of colors that are a good approximation of the entire set of pixels in the image, and put those into a colormap. Then for each pixel, we just store the index into the colormap. (Figures: the image of indices (0..63), and the color image using 64 colors.)

Indexed Storage of Color Images
Use a colormap. The image f(x,y) stores indices into a lookup table (the colormap); the colormap specifies RGB for each index, and the display system will display these values of RGB.

[img,cmap] = imread('kids.tif');
imshow(img,cmap);

Also see rgb2ind, ind2rgb. (Table: the first rows of cmap, each an RGB triple in the range 0..1; an example pixel index selects one row of the colormap.)
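
As an aside, Matlab's built-in rgb2ind can produce such an indexed image directly (this is not the lecture's approach, which builds the colormap with kmeans below; the 64-color count here is just an example):

RGB = imread('peppers.png');         % a demo image that ships with Matlab
[idxImg, cmap] = rgb2ind(RGB, 64);   % quantize to at most 64 colors
imshow(idxImg, cmap);                % display the indices through the colormap
RGB2 = ind2rgb(idxImg, cmap);        % convert back to a truecolor image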

clear all
close all

% Read image
RGB = im2double(imread('peppers.png')); RGB = imresize(RGB, 0.5);
%RGB = im2double(imread('pears.png')); RGB = imresize(RGB, 0.5);
%RGB = im2double(imread('tissue.png')); RGB = imresize(RGB, 0.5);
figure, imshow(RGB);

% Convert 3-dimensional (M,N,3) array to 2D (MxN,3)
X = reshape(RGB, [], 3);

k = 6;    % Number of clusters to find

% Call kmeans. It returns:
%  IDX: for each point in X, which cluster (1..k) it was assigned to
%  C: the k cluster centers
[IDX,C] = kmeans(X, k, ...
    'EmptyAction', 'drop');   % if a cluster becomes empty, drop it

% Reshape the index array back to a 2-dimensional image
I = reshape(IDX, size(RGB,1), size(RGB,2));

% Show the reduced color image
figure, imshow(I, C);

% Plot pixels in color space
figure
hold on
for i=1:10:size(X,1)
    plot3(X(i,1), X(i,2), X(i,3), ...
        '.', 'Color', C(IDX(i),:));
end
% Also plot cluster centers
for i=1:k
    plot3(C(i,1), C(i,2), C(i,3), 'ro', 'MarkerFaceColor', 'r');
end
xlabel('red'), ylabel('green'), zlabel('blue');
axis equal
axis vis3d

(Figure: the image pixels plotted in RGB color space, colored by their assigned cluster centers.)

Supervised Statistical Methods
A class is a set of objects having some important properties in common. We might have a known description for each class, or a set of samples for each. A feature extractor is a program that inputs the data (image) and extracts features that can be used in classification; these values are put into a feature vector. A classifier is a program that inputs the feature vector and assigns it to one of a set of designated classes or to the reject class.
We will look at these classifiers: decision tree and nearest class mean. Another powerful classifier is the support vector machine.

Feature Vector Representation
A feature vector is a vector x = [x1, x2, ..., xn], where each xj is a real number. The elements xj may be object measurements; for example, xj may be a count of object parts or properties. Example: an object region can be represented by [#holes, #strokes, moments, ...].
from Shapiro & Stockman

Possible features for character recognition (from Shapiro & Stockman)

Discriminant functions
Functions f(x, K) perform some computation on feature vector x. Knowledge K from training or programming is used. The final stage determines the class.
from Shapiro & Stockman
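
As an illustrative sketch (not from the lecture), a set of linear discriminant functions f_i(x) = w_i'x + b_i can be evaluated and the class with the largest value chosen; the weights and biases below are made-up numbers:

x = [2; 1];                        % feature vector
W = [ 1  0;  0  1; -1 -1 ];        % one row of weights w_i per class (made up)
b = [ 0; -0.5; 1 ];                % one bias b_i per class (made up)
f = W*x + b;                       % f(i) = discriminant value for class i
[~, class] = max(f);               % final stage: pick the class with the largest f
fprintf('Assigned to class %d\n', class);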

Decision Trees
Strength: easy to understand. Weakness: overtraining.
from Shapiro & Stockman

Class   #holes   #strokes   Best axis   Moment of inertia
  A        1         3          90          Med
  B        2         1          90          Large
  8        2         0          90          Med
  0        1         0          90          Large
  1        0         1          90          Low
  W        0         4          90          Large
  X        0         2           ?          Large
  *        0         0           ?          Large
  -        0         1           0          Low
  /        0         1          60          Low

Entropy Based Automatic Decision Tree Construction
Training set S: x1 = (f11, f12, ..., f1m), x2 = (f21, f22, ..., f2m), ..., xn = (fn1, fn2, ..., fnm).
At each node: what feature should be used? What values? Choose the feature which results in the most information gain, as measured by the decrease in entropy.
from Shapiro & Stockman

Entropy
Given a set of training vectors S, if there are c classes,
Entropy(S) = - sum_{i=1..c} p_i log2(p_i)
where p_i is the proportion of category i examples in S. If all examples belong to the same category, the entropy is 0. If the examples are equally mixed (1/c examples of each class), the entropy is a maximum at 1.0.
e.g., for c = 2:  -0.5 log2(0.5) - 0.5 log2(0.5) = -0.5(-1) - 0.5(-1) = 1
from Shapiro & Stockman
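
A small Matlab sketch of this formula (the function name and the use of a plain label vector y are my own choices; labels are assumed to be integers):

function H = label_entropy(y)
% y is a vector of class labels; returns the entropy of the label distribution.
classes = unique(y);
H = 0;
for i = 1:numel(classes)
    p = sum(y == classes(i)) / numel(y);   % proportion of category i examples
    H = H - p * log2(p);
end
end

For example, label_entropy([1 1 2 2]) returns 1, matching the c = 2 case above.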

Decision Tree Classifier
Uses subsets of features in sequence. Feature extraction may be interleaved with classification decisions. Can be easy to design and efficient in execution.
from Shapiro & Stockman

Matlab demo
See "Decision Trees" in the Matlab help for the Statistics and Machine Learning Toolbox. Fisher's iris data consists of measurements of the sepal length, sepal width, petal length, and petal width of 150 iris specimens; there are 50 specimens from each of three species.

clear all
close all

% Loads:
%  meas(150,4) - each row is a pattern (a 4-dimensional vector)
%  species{150} - each element is the name of a flower
load fisheriris

% Create a vector of class numbers. We know that the input data is grouped
% so that 1..50 is the 1st class, 51..100 is the 2nd class, 101..150 is the
% 3rd class.
y(1:50,1) = 1;     % class 'setosa'
y(51:100,1) = 2;   % class 'versicolor'
y(101:150,1) = 3;  % class 'virginica'

X = meas(:, 1:2);  % just use the first 2 features (easier to visualize)

% We will just use the first 2 features, since it is easier to visualize.
% However, when we do that there is a chance that some points will be
% duplicated (since we are ignoring the other features). If so, just keep
% the first point.
indicesToKeep = true(size(X,1),1);
for i=2:size(X,1)
    % See if we already have the ith point.
    if any((X(i,1)==X(1:i-1,1)) & (X(i,2)==X(1:i-1,2)))
        indicesToKeep(i) = false;  % Skip this point
    end
end
X = X(indicesToKeep, :);
y = y(indicesToKeep);

% Plot the feature vectors.
figure
hold on
plot(X(y==1,1), X(y==1,2), '*r');
plot(X(y==2,1), X(y==2,2), '*g');
plot(X(y==3,1), X(y==3,2), '*b');
xlabel('Sepal length'), ylabel('Sepal width');

Choose a test point to classify

% Specify a test vector to classify.
xtest = [5.6, 3.2];
plot(xtest(1), xtest(2), 'ok');  % the black circle (k) is the test point
hold off

Construct a decision (or classification) tree

% A "classification" tree produces classification decisions that are
% "nominal" (i.e., names). A "regression" tree produces classification
% decisions that are numeric.
ctree = ClassificationTree.fit(X, y, ...
    'MinParent', 10);  % default is 10
view(ctree);                    % Prints a text description
view(ctree, 'mode', 'graph');   % Draws a graphic description of the tree

Use the decision tree to classify a vector
In our example, x1 = 5.6 and x2 = 3.2.

% Classify a test vector, using the decision tree.
class = predict(ctree, xtest);
fprintf('Test vector is classified as %d\n', class);

View the whole feature space

% Visualize the entire feature space, and what class each vector belongs to.
xmin = min(X(:,1)); xmax = max(X(:,1));
ymin = min(X(:,2)); ymax = max(X(:,2));
hold on;
dx = (xmax-xmin)/40;
dy = (ymax-ymin)/40;
for x=xmin:dx:xmax
    for y=ymin:dy:ymax
        class = predict(ctree, [x y]);
        if class==1
            plot(x,y,'.r');
        elseif class==2
            plot(x,y,'.g');
        else
            plot(x,y,'.b');
        end
    end
end
hold off;

Note that some training vectors are incorrectly classified.

The input parameter MinParent has default value 10. Setting MinParent = 1 will cause the decision tree to split (make a new node) if there are any instances that are still not correctly labeled. (Figure: the decision regions obtained with MinParent = 1.)

Generalization
Making the tree accurately classify every single training point leads to overfitting. If you have lots of data, some training points may be noisy, and we don't want the tree to learn the noise. We want the tree to generalize from the training data: it should learn the general, underlying rules, and it's ok to misclassify a few training points.

(Figure: decision regions for MinParent = 1, plotted over sepal length vs. sepal width.)

(Figure: decision regions for MinParent = 10.)

(Figure: decision regions for MinParent = 40.)

Classification using nearest class mean
Compute the Euclidean distance between feature vector x and the mean of each class:
||x - x_c||^2 = sum_{i=1..d} (x[i] - x_c[i])^2
Choose the closest class, if close enough (reject otherwise). Low error rate (intersection).
from Shapiro & Stockman
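
A minimal sketch of a nearest-class-mean classifier in Matlab (my own variable names; the class means are stored as rows of a matrix M):

% X is N-by-d training data, y its labels; xtest is a 1-by-d test vector.
classes = unique(y);
M = zeros(numel(classes), size(X,2));
for c = 1:numel(classes)
    M(c,:) = mean(X(y == classes(c), :), 1);        % mean of each class
end
d2 = sum(bsxfun(@minus, M, xtest).^2, 2);           % squared distance to each mean
[~, cBest] = min(d2);
predictedClass = classes(cBest);                    % closest class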

Scaling Distance Using Standard Deviations
Scale the distance to the mean of class c according to the measured standard deviation sigma_i in each direction i:
||x - x_c||_scaled^2 = sum_i ( (x[i] - x_c[i]) / sigma_i )^2
Otherwise, a point near the top of class 3 will be closer to the mean of another class.
from Shapiro & Stockman

If ellipses are not aligned with the axes
Instead of using the standard deviation along each separate axis, use the covariance matrix C.
The variance (of a single variable x) is defined as
sigma_xx = (1/N) sum_{i=1..N} (x_i - mu_x)^2
The covariance (of two variables, x and y) is
sigma_xy = (1/N) sum_{i=1..N} (x_i - mu_x)(y_i - mu_y)
where mu_x = (1/N) sum_i x_i and mu_y = (1/N) sum_i y_i. The covariance matrix is
C = [ sigma_xx  sigma_xy ; sigma_yx  sigma_yy ]
(Figure: scatter plots of data whose spread ellipse is not aligned with the axes.)
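
A small sketch (mine, not the lecture's) checking these formulas against Matlab's cov function; note that cov normalizes by N-1 by default, while passing 1 as the last argument uses the 1/N convention shown above:

x = randn(100,1);
y = x + 0.5*randn(100,1);            % correlated with x
N = numel(x);
mux = sum(x)/N;  muy = sum(y)/N;
sxx = sum((x-mux).^2)/N;
sxy = sum((x-mux).*(y-muy))/N;
syy = sum((y-muy).^2)/N;
Cmanual = [sxx sxy; sxy syy];
Cmatlab = cov(x, y, 1);              % 1 => normalize by N instead of N-1
disp(Cmanual - Cmatlab);             % should be (numerically) all zeros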

Examples
(Figures: two scatter plots with their covariance matrices. In the first, the variables are nearly independent and the off-diagonal entries of C are small; in the second, the variables are correlated and the off-diagonal entries are large.)
Notes: off-diagonal values are small if the variables are independent; off-diagonal values are large if the variables are correlated (they vary together). See the Matlab cov function.

Probability Density
Let's assume that the errors are Gaussian. The probability density for a 2-dimensional error vector x is
p(x) = 1/(2*pi*|C|^(1/2)) * exp( -(1/2) x' C^-1 x )
(Figure: a surface plot of p(x) over the x1 and x2 axes.)

Probability Density
Look at where the probability is a constant. This is where the exponent is a constant:
x' C^-1 x = z^2
This is the equation of an ellipse. For example, with uncorrelated errors this reduces to
x^2/sigma_xx + y^2/sigma_yy = z^2
We can choose z to get a desired probability. For z = 3, the cumulative probability is about 97%.

Plotting Contours of constant probability
(Figures: the scattered data points and the corresponding contour plot of constant probability, over the x1 and x2 axes.)

Matlab code

% Show covariance of two variables
clear all
close all
randn('state',0);
yp = randn(40,1);
xp = 0.5 * randn(40,1);
% xp = randn(40,1);
% yp = xp + 0.5*randn(40,1);
plot(xp,yp, '+'), axis equal;
axis([-3.0 3.0 -3.0 3.0]);
C = cov(xp,yp)
Cinv = inv(C);
detCsqrt = sqrt(det(C));

% Plot the probability density,
%  p(x,y) = (1/(2*pi*det(C)^0.5)) * exp(-x'*Cinv*x/2)
L = 3.0;
delta = 0.1;
[x1,x2] = meshgrid(-L:delta:L, -L:delta:L);
for i=1:size(x1,1)
    for j=1:size(x1,2)
        x = [x1(i,j); x2(i,j)];
        fx(i,j) = (1/(2*pi*detCsqrt)) * exp( -0.5*x'*Cinv*x );
    end
end
hold on
% meshc(x1,x2,fx);   % this does a surface plot
contour(x1,x2,fx);   % this does a contour plot
xlabel('x1 - axis');
ylabel('x2 - axis');

Example: flower data from Matlab

% This loads in the measurements:
%  meas(n,4) are the feature values
%  species{n} are the species names
load fisheriris;

% There are three classes
X1 = meas(strmatch('setosa', species), 3:4);   % use features 3,4
X2 = meas(strmatch('virginica', species), 3:4);
X3 = meas(strmatch('versicolor', species), 3:4);

hold on
plot( X1(:,1), X1(:,2), '.r' );
plot( X2(:,1), X2(:,2), '.g' );
plot( X3(:,1), X3(:,2), '.b' );

m1 = sum(X1)/length(X1);
m2 = sum(X2)/length(X2);
m3 = sum(X3)/length(X3);
plot( m1(1), m1(2), '*r' );
plot( m2(1), m2(2), '*g' );
plot( m3(1), m3(2), '*b' );

(Figure: the three classes plotted in the petal length / petal width feature space, with the class means marked.)

Overlaying probability contours

% Plot the contours of equal probability
[f1,f2] = meshgrid( min(meas(:,3)):0.1:max(meas(:,3)), ...
                    min(meas(:,4)):0.1:max(meas(:,4)) );

C = cov(X1); Cinv = inv(C); detCsqrt = sqrt(det(C));
for i=1:size(f1,1)
    for j=1:size(f1,2)
        x = [f1(i,j) f2(i,j)];
        fx(i,j) = (1/(2*pi*detCsqrt)) * exp( -0.5*(x-m1)*Cinv*(x-m1)' );
    end
end
contour(f1,f2,fx);

C = cov(X2); Cinv = inv(C); detCsqrt = sqrt(det(C));
for i=1:size(f1,1)
    for j=1:size(f1,2)
        x = [f1(i,j) f2(i,j)];
        fx(i,j) = (1/(2*pi*detCsqrt)) * exp( -0.5*(x-m2)*Cinv*(x-m2)' );
    end
end
contour(f1,f2,fx);

C = cov(X3); Cinv = inv(C); detCsqrt = sqrt(det(C));
for i=1:size(f1,1)
    for j=1:size(f1,2)
        x = [f1(i,j) f2(i,j)];
        fx(i,j) = (1/(2*pi*detCsqrt)) * exp( -0.5*(x-m3)*Cinv*(x-m3)' );
    end
end
contour(f1,f2,fx);

Mahalanobis distance
Given an unknown feature vector x, which class is it closest to? Assume you know the class centers (centroids) z_i and their covariances C_i. We find the class whose center has the smallest distance to the point in feature space. The distance is weighted by the covariance; this is called the Mahalanobis distance. For example, the Mahalanobis distance of feature vector x to the i-th class is
d_i^2 = (x - z_i)' C_i^-1 (x - z_i)
where C_i is the covariance matrix of the feature vectors in the i-th class.
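
A sketch of this rule in Matlab, reusing the class data X1, X2, X3 and means m1, m2, m3 from the flower example above (the cell-array packaging and the test vector value are my own, hypothetical choices):

Xc = {X1, X2, X3};        % feature vectors of each class
mc = {m1, m2, m3};        % class means (centroids)
xtest = [4.5, 1.5];       % hypothetical test vector (petal length, petal width)
d2 = zeros(1,3);
for i = 1:3
    Ci = cov(Xc{i});                  % covariance of the i-th class
    v = xtest - mc{i};
    d2(i) = v * inv(Ci) * v';         % squared Mahalanobis distance to class i
end
[~, bestClass] = min(d2);             % choose the class with the smallest distance
fprintf('Closest class (Mahalanobis): %d\n', bestClass);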

Summary / Questions
In pattern recognition, we classify patterns (usually in the form of vectors) into classes. Training of the classifier can be supervised (i.e., we have to provide labeled training data) or unsupervised; k-means clustering is an example of unsupervised learning. Approaches to classification include statistical, structural, and neural.
Name some statistical pattern recognition methods.