Learning in Medical Image Databases. Cristian Sminchisescu. Department of Computer Science. Rutgers University, NJ
December 1998

Abstract

In this paper we present several results obtained by experimenting with Bayesian minimum Mahalanobis distance and k-nearest neighbor classification methods in a medical domain. The training set consists of four classes of different pathologies. Individual pathologies are represented in terms of their contour description, by a high-dimensional vector (the modal vector) abstracting their shape. We employ a divergence criterion to identify features with high discriminative power. The performance measurements suggest that the Bayes classifier outperforms the weighted k-nearest neighbor classifier, a result which is not surprising considering the particularly noisy structure of the training data set.

Keywords: Machine learning, Bayesian learning, Nearest neighbor classification, Feature selection.

1 Introduction

The problem we investigate in this work concerns classifying dental pathologies based upon the shape of the pathology. Specifically, we employ a clinical radiograph image database containing 64 dental pathologies, grouped into 4 different classes of dental disease, each class consisting of 16 elements. The 4 classes represent the progressive evolution of the disease: class C1 represents incipient forms of the disease, while class C4 consists of the most advanced stage of dental disease (figure 6). Each pathology contour has been identified (in the dental radiograph) and the pathology has been labeled into one of the 4 classes of the disease by an expert physician. However, dealing simply with the pathology contour (consisting of a set of coordinates) does not give us any powerful description or abstraction of the shape of the pathology.
In order to abstract the shape of a pathology, starting from the raw contour points, we use modal analysis, a computer vision technique for obtaining descriptions of objects in terms of a vector of deformation modes [5]. In particular, we employ a prototype shape (in this case, an ellipse) and compute the modal displacement vector associated with the process of deforming (or aligning) the prototype shape with the underlying lesion shape (figure 7). The individual components of such a vector represent the deformation modes (for instance, tapering, bending, pitching, and so on) of a shape at finer and finer levels of detail (the number of modes can be quite large, of order 10^3). The modal description of a shape resembles a Fourier description of a signal, where the low-order frequencies convey the signal's coarse characteristics while the high frequencies convey very fine signal information. Consequently, we expect the higher-order modes in the shape description to be more sensitive to noise and less useful in terms of discriminative power (or classification ability). This domain knowledge allows us to cut the dimensionality of the modal vector (and the subsequent analysis) down to the first 30 modes.

2 Bayesian Learning

The problem we are studying can be formulated in a Bayesian framework as follows: given a set of classes c_i, i = 1..c (c = 4 in our case), we want to design a classifier which, given a feature vector x, maximizes the probability of correctly classifying it.

2.1 Bayes Classifier

We use Bayes' rule:

    P(c_i | x) = p(x | c_i) P(c_i) / p(x)    (1)
where

    p(x) = \sum_{j=1}^{c} p(x | c_j) P(c_j)    (2)

In the above equations, P(c_j) are the a priori probabilities for class c_j, p(x | c_j) are the class-conditional probability densities (essentially the probability of observing x, given the class c_j), and P(c_j | x) are the a posteriori probabilities (all j's within the range 1..c). Essentially, Bayes' rule gives a quantitative account of how observing the values of x changes the a priori probability P(c_j) into the a posteriori probability P(c_j | x).

The pattern classification problem can be formulated in terms of a set of discriminant functions g_i(x), i = 1..c, each associated with a different class. Given a feature vector x, the classifier computes the c discriminant functions and selects the class corresponding to the largest one. Consequently, the vector x is assigned to class c_i if:

    g_i(x) > g_j(x), for all j != i    (3)

For the particular case of a Bayesian classifier, we can take g_i(x) = P(c_i | x), so the maximum discriminant function naturally corresponds to the maximum a posteriori probability. However, the choice of a particular discriminant function is not unique: it could, for instance, be multiplied by a positive constant or biased by an additive constant. Moreover, replacing every g_i(x) by f(g_i(x)), where f is a monotonically increasing function, preserves the results of the classification [2]. This observation leads to both analytical and computational simplifications:

    g_i(x) = P(c_i | x) = p(x | c_i) P(c_i) / \sum_{j=1}^{c} p(x | c_j) P(c_j)    (4)

gives the same classification results as:

    g_i(x) = \log p(x | c_i) + \log P(c_i)    (5)

(note that the term \sum_{j=1}^{c} p(x | c_j) P(c_j) is a sum having the same value for all classes, so one can treat it as a constant). For the experiments, we assume equally likely a priori class probabilities, that is, P(c_i) = P(c_j) for all i, j in 1..c.
Consequently, equation 5 becomes:

    g_i(x) = \log p(x | c_i)    (6)

and the central problem transfers to estimating the conditional densities p(x | c_j), j = 1..c.

2.2 Multivariate Normal Density

The assumption used with many Bayesian classifiers, which we follow in the experiments as well, is that the conditional density is multivariate normal (defined precisely below). This is an appropriate model for the case when the feature vectors x corresponding to a class c_i are continuous-valued, moderately corrupted versions of a prototypical vector \mu_i. The multivariate normal density can be written as:

    p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp[-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)]    (7)

where x is a d-component column vector, \mu is the d-component mean vector, and \Sigma is the d-by-d covariance matrix. This density is commonly abbreviated as p(x) ~ N(\mu, \Sigma). By using equations 6 and 7, one can devise the discriminant functions corresponding to minimum-error-rate classification (leaving aside the constant and scale factors):

    g_i(x) = -(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \log |\Sigma_i|    (8)

Under the particular assumption that the covariance matrices for all classes are equal, that is, \Sigma_i = \Sigma for all i in 1..c, the discriminant functions become:

    g_i(x) = -(x - \mu_i)^T \Sigma^{-1} (x - \mu_i)    (9)

In the above equation, the quantity:

    r^2 = (x - \mu)^T \Sigma^{-1} (x - \mu)    (10)

is the squared Mahalanobis distance. To classify a feature vector x, one measures the squared Mahalanobis distance from x to each of the c mean vectors and assigns x to the class corresponding to the nearest mean. In the experiments we use a minimum Mahalanobis distance classifier, based on the discriminant function presented in equation 9.

2.3 Parameter Estimation: The Mean and Covariance Matrix

Under the assumption of a multivariate normal distribution, the Maximum Likelihood (ML) estimate for the mean vector is:

    \hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_k    (11)
and the covariance matrix is estimated as:

    \hat{\Sigma} = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \hat{\mu})(x_k - \hat{\mu})^T    (12)

where the x_k are the sample data. One can notice that the ML estimate for the unknown population mean is the sample mean, or the centroid of the sample cloud. The estimate for the covariance matrix is the sample covariance which, with the 1/(n-1) normalization above, can be shown to be unbiased. For the experiments we use the above formulas to obtain estimates of the mean vectors and covariance matrices corresponding to each of the classes c_i, i in {1..4}, involved in the learning process.

3 Nearest-Neighbor Classification

Considering feature vectors of the form x = (a_1(x), a_2(x), ..., a_n(x)), the distance between two instances x_i and x_j is given by:

    d(x_i, x_j) = \sqrt{ \sum_{k=1}^{n} (a_k(x_i) - a_k(x_j))^2 }    (13)

Given a new instance x_q to be classified, the k-nearest neighbor algorithm performs a vote among the k instances nearest to x_q. More precisely, given the set of possible classification decisions (classes) C = {c_1, c_2, c_3, c_4}, the algorithm decides according to:

    \hat{f}(x_q) = \arg\max_{c \in C} \sum_{i=1}^{k} \delta(c, f(x_i))    (14)

where \delta(a, b) = 1 if a = b and 0 otherwise. In the experiments, we employ a distance-weighted k-nearest neighbor method:

    \hat{f}(x_q) = \arg\max_{c \in C} \sum_{i=1}^{k} w_i \delta(c, f(x_i))    (15)

where w_i = 1 / d(x_q, x_i)^2.

4 Discriminant Feature Selection

In this section, we are interested in identifying those features (vector components) which are highly relevant in terms of classification performance (providing good separation between the groups to be classified). A brute-force approach to such a problem is to consider the power set of the feature set, run the classification process for each particular combination of features, and select the feature set leading to the best classification performance. However, this solution quickly proves intractable: for the 30-feature vectors we use, we would need to generate the power set of their features, that is, 2^30 possible subsets.
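As a concrete illustration, the two classifiers described above (the minimum Mahalanobis distance classifier of Section 2 and the distance-weighted k-nearest neighbor rule of Section 3) can be sketched in a few lines of numpy. This is a minimal sketch on synthetic data, not the paper's implementation; the class layout (4 classes of 16 samples, 30-dimensional feature vectors) mirrors the setup described in the Introduction, but the data itself is randomly generated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_per_class, d = 4, 16, 30

# Synthetic training set: one well-separated Gaussian cloud per class.
means = [rng.normal(loc=3 * c, scale=1.0, size=d) for c in range(n_classes)]
X = np.vstack([rng.normal(m, 1.0, size=(n_per_class, d)) for m in means])
y = np.repeat(np.arange(n_classes), n_per_class)

# --- Minimum Mahalanobis distance classifier (equations 9-12) ---
# Per-class mean and sample covariance (np.cov uses the 1/(n-1) form of eq. 12).
mu = np.array([X[y == c].mean(axis=0) for c in range(n_classes)])
cov = [np.cov(X[y == c], rowvar=False) for c in range(n_classes)]
# With 16 samples in 30 dimensions the covariance is singular, so we use the
# pseudo-inverse rather than a plain inverse.
cov_inv = [np.linalg.pinv(S) for S in cov]

def classify_mahalanobis(x):
    # Assign x to the class whose mean is nearest in squared Mahalanobis distance.
    r2 = [(x - mu[c]) @ cov_inv[c] @ (x - mu[c]) for c in range(n_classes)]
    return int(np.argmin(r2))

# --- Distance-weighted k-nearest neighbor (equations 13-15) ---
def classify_knn(x, k=5, eps=1e-12):
    dist = np.linalg.norm(X - x, axis=1)        # Euclidean distance, eq. 13
    nearest = np.argsort(dist)[:k]
    votes = np.zeros(n_classes)
    for i in nearest:
        votes[y[i]] += 1.0 / (dist[i] ** 2 + eps)  # w_i = 1 / d(x_q, x_i)^2
    return int(np.argmax(votes))

x_query = X[17]  # a training sample drawn from class 1
print(classify_mahalanobis(x_query), classify_knn(x_query))
```

With the clouds this well separated, both rules recover the query's true class; on the real modal vectors the classes overlap, which is exactly what the divergence-based feature selection below tries to mitigate.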
An alternative approach is to use a probabilistic measure to check the impact of each vector component on the separability between different groups. We employ divergence as a quantitative measure for this purpose. Intuitively, the divergence is a measure of the separability of two probability distributions, and measures how well the features used can represent the statistics conveyed in the raw data. For testing a binary hypothesis, H_1 vs. H_2, in which the probability distribution of the features is P_1 under H_1 and P_2 under H_2, we define P_e as the average error probability, P_{12} as the probability of choosing H_1 when class H_2 is actually true, and vice versa for P_{21}:

    P_e = P_{21} P(H_1) + P_{12} P(H_2)

Furthermore, the divergence is related to the error exponent of P_e as follows: as the divergence increases, the error represented by P_e decreases. A closed-form expression for the divergence of multivariate Gaussian data can be derived as in [4]:

    D = \frac{1}{2} (m_1 - m_2)^T (K_1^{-1} + K_2^{-1}) (m_1 - m_2) + \frac{1}{2} tr(K_1^{-1} K_2 + K_2^{-1} K_1 - 2I)    (16)

where m_1 and m_2 are the mean vectors and K_1 and K_2 the covariance matrices corresponding to the feature vectors in classes c_1 and c_2, and tr is the trace: a function operating on a matrix argument and computing the sum of its diagonal elements.

Now we need to link the inter-group divergence, which gives a measure of the similarity between two groups, with our particular application, where several groups are present. Furthermore, we want to construct a criterion function, based on divergence, with which we can evaluate the separability contribution of each feature with respect to the groups involved in the classification process. More precisely, for any feature f \in G, G = {1..30}, corresponding to a modal vector, we compute a divergence-based global criterion value given by:

    c_f = \sum_{i=1}^{3} \sum_{j=i+1}^{4} D_{ij}^{G \setminus \{f\}}    (17)
where D_{ij}^{S} represents the inter-group divergence between groups i and j, computed over the features in the set S (S a subset of G). Selecting good discriminative features according to this criterion reduces to sorting the features by the value obtained using formula 17: the better the feature, the lower the value of its corresponding criterion (if the feature provides good discriminative ability, removing it results in a decrease in the separation ability as quantified by the divergence measure). The results of applying the criterion to each feature are plotted in figure 3.

The criterion 17 imposes a partial ordering on the feature set. When testing the performance of the different classifiers, we incrementally construct the classifiers corresponding to all the ordered feature sets (that is, the sets consisting of features {1}, {1,2}, ..., {1,2,...,30}; note that, for instance, "1" now means the first feature according to the feature ordering criterion, and not the first component of the vector). In this way, the complexity of experimenting with each particular classifier becomes linear in the number of features (in our case 30).

To provide further insight into the relevance of this selection criterion, we also perform a standard statistical analysis, computing the mean and standard deviation of each feature in each individual group (as well as over the global data set). The results are depicted in figures 1 and 2. Intuitively, divergence balances the mean inter-group separability of individual vector components against the corresponding standard deviation, and "assembles" these into a formula characterizing the separability of two groups involved in the classification.

The criterion presented in 17 performs a greedy selection (by summing over the divergences corresponding to a set of features, for all pairs of groups involved in the classification process).
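The Gaussian divergence of equation 16 can be computed directly from the class means and covariances. A minimal numpy sketch follows; the 2-D inputs here are toy values for illustration, not the paper's 30-dimensional modal data.

```python
import numpy as np

def divergence(m1, K1, m2, K2):
    """Divergence between N(m1, K1) and N(m2, K2), as in equation 16."""
    K1i, K2i = np.linalg.inv(K1), np.linalg.inv(K2)
    dm = m1 - m2
    d = len(m1)
    # 1/2 (m1-m2)^T (K1^-1 + K2^-1)(m1-m2) + 1/2 tr(K1^-1 K2 + K2^-1 K1 - 2I)
    return 0.5 * dm @ (K1i + K2i) @ dm \
         + 0.5 * np.trace(K1i @ K2 + K2i @ K1 - 2 * np.eye(d))

# When the two covariances are identical, the trace term vanishes and D
# reduces to the squared Mahalanobis distance between the two means.
m1, K1 = np.zeros(2), np.eye(2)
m2, K2 = np.array([3.0, 0.0]), np.eye(2)
print(divergence(m1, K1, m2, K2))  # -> 9.0
```

Summing this quantity over all pairs of groups, with one feature left out at a time, yields the criterion value c_f of equation 17.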
It is greedy in the sense that it might assign a good merit value to features which do not necessarily provide good separability between all the groups, but sometimes only between pairs of groups (some groups might be very well separated while others are not, and the criterion may not be able to "sense" this, as it computes only an overall sum).

Figure 1: Medians (feature means for groups A, B, C, D and globally)

Figure 2: Standard Deviations (feature standard deviations for groups A, B, C, D and globally)

5 Experiments and Results

We experiment with the classification methods presented above by running a cross-validated procedure for obtaining more accurate results. We employ a leave-one-out method for maximizing the utilization
of the data. For the case of the nearest-neighbor classification we use a leave-4-out method (keeping out one member of each class), in order to avoid bias in the experiments.

The confusion matrix corresponding to the minimum Mahalanobis distance classifier is given in table 1. The minimum classification error estimate is obtained when selecting the first 5 best features (according to the divergence criterion described), and its value is 22.25%. The error estimates corresponding to selecting the first k criterion features (k = 1..5) are plotted in figure 5.

The confusion matrix corresponding to the weighted k-nearest neighbor classifier is given in table 2. The minimum classification error estimate is obtained for 5-nearest neighbors with the first 7 best features, and its value is 32.8%. The error estimates corresponding to the different k-nearest neighbor classifiers (k = 1..10) are shown in figure 4.

The plot presented in figure 4 represents a compressed version of the runs actually performed. We build separate k-nearest neighbor classifiers for each k in {1..10} and, for each classifier, we consider different feature sets in the order of their criterion selection, i.e. we first evaluate the classifier corresponding only to the first best feature, then the classifier corresponding to the first and second best features, and so on, up to the classifier containing all 30 features in the set. Note that the sets are not created by randomly choosing features; they are ordered in the sense that we gradually add features according to the ordering criterion 17 on the set of features. Subsequently, for each k-nearest neighbor classifier, we pick (and plot) the value corresponding to the minimum error among all ordered feature sets (that is, {1}, {1,2}, {1,2,3}, ..., {1,2,3,...,30}). Consequently, the error values we plot are not necessarily obtained for the same ordered feature set, but might correspond to different ordered feature sets. This may make plot 4 nonuniform, but we felt that the important thing to analyze is the real minimum error estimate, and not the error estimate resulting from the rigid imposition of a particular subset of features (which might not provide a minimum error estimate for a classifier corresponding to a particular value of k).

Figure 3: Feature Criterion Values

            gr. G1    gr. G2    gr. G3    gr. G4
  G1        86.50%     9.50%     4.00%     0.00%
  G2         2.75%    75.25%    16.50%     5.50%
  G3         7.50%     5.00%    78.50%     9.00%
  G4         4.50%     9.50%    14.75%    71.25%

Table 1: Minimum Mahalanobis Distance Classifier Confusion Matrix

            gr. G1    gr. G2    gr. G3    gr. G4
  G1        68.75%    31.25%     0.00%     0.00%
  G2        25.00%    62.50%    12.50%     0.00%
  G3         0.00%     0.00%   100.00%     0.00%
  G4        37.50%     0.00%    25.00%    37.50%

Table 2: Nearest Neighbor Classifier Confusion Matrix

6 Discussion

We observe that the results obtained using the Bayes classifier are better in terms of classification error (22.25% versus 32.8%). The Bayes classifier also uses a smaller feature set to obtain its smallest classification error: the first 5 criterion features, versus the first 7 for the 5-nearest neighbor classifier. The ordered set of the first 7 criterion features is {2, 26, 6, 30, 6, 29, 3}. The fact that the Bayes classifier gives better results can be considered a reasonable, practically validated outcome, as it is known that the
k-nearest neighbor classifiers provide expected suboptimal classification performance [2]. Furthermore, it is only when the number of samples becomes very large (approaching infinity, according to the theory) that k-nearest neighbor classification starts exhibiting nearly optimal behavior. The voting process among the k nearest neighbors can be understood as a trade-off between reliability (choosing many neighbors in order to obtain a reliable estimate) and accuracy (choosing only those neighbors that are really very close to the point to be classified), forcing a compromise value for k, namely a small percentage of the number of examples. From the results, we can verify that this is indeed the case.

Figure 4: k-nearest neighbor error

There is a particular issue related to the very noisy state of the training examples (the descriptions of the lesions' shapes) at the present time, which negatively impacts the k-nearest neighbor classification; it is known that these classification methods are particularly sensitive to noise [3]. We also expect the minimum Mahalanobis distance classifier to be more robust to noise, as the distance it is based upon is normalized by covariance, that is, it accounts for the deviations within the groups involved in the classification process.

In terms of the confusion matrices associated with the two classifiers, we generally notice a higher probability of misclassification between "adjacent" groups, which is again expected behavior, as the classes C1 and C2 are more likely to be similar than C1 and C4. However, this is not always the case.
We notice that, for group 4, the lowest classification performance is obtained with both classifiers, and the probability of misclassification into group 1 is significant, although one may have expected these groups to be the most separated (since they represent the disease in its initial and final stages of evolution). This might be due to the particularly noisy descriptions in this group: by analyzing figure 2 one can notice that this is indeed the case, as the standard deviation for group 4 dominates those of the other groups almost everywhere in the feature domain.

Figure 5: Mahalanobis minimum distance error

Conclusions and Further Work

In this work, we have implemented and analyzed the performance of two classifiers on samples derived from four classes of pathologies encountered in a medical image database. In order to obtain better classification performance, we performed a discriminant feature analysis based on a divergence criterion. The features were ordered according to their
discriminative ability and we subsequently tested the classifiers on incrementally constructed sets (that is, sets in which features are incrementally added according to the order generated by the divergence criterion). This methodology attempts to deal with the intractable problem of generating all 2^30 possible subsets of the feature set, running the classifiers, and obtaining error estimates for every such feature set.

The classification results we obtained are quite promising, but further extensions are possible in several directions. First, operating on a larger database, translating into a larger training set, could certainly provide more accurate error estimates as well as further insight into this classification problem. Second, a more realistic feature selection method based on divergence might be used. At present, the selection is based on the order generated by summing all inter-group divergence values corresponding to a set of features, leaving one feature out each time (basically, computing a form of divergence gain for each feature). While we are certainly looking for features with high contributions to the overall divergence, this does not necessarily mean that they provide good separation between every two groups, and this might lead to poor classification performance. Using criteria that take the relative inter-group divergences into account (not only their sum) might result in better discriminant feature selection. Ultimately, devising better, less noisy descriptions or extractions of the shape vectors using computer vision techniques should certainly improve classification performance.

Acknowledgments: I would like to thank Sachin Lodha and Wen Li, who kindly accepted to review the paper.

Figure 6: A pathology in its four progressive evolution stages

Figure 7: Deformation of an ellipse into a pathology contour

References

[1] A. Blum. Empirical Support for Winnow and Weighted-Majority Algorithms: Results on a Calendar Scheduling Domain. CMU-TR, 1998.

[2] R. Duda and P. Hart. Pattern Classification and Scene Analysis. Wiley Interscience, 1973.

[3] T. Mitchell. Machine Learning. McGraw-Hill, 1997.

[4] W. Therrien. Decision Estimation and Classification. John Wiley and Sons, 1989.

[5] W. Zhang, S. Dickinson, S. Sclaroff, J. Feldman, S. Dunn. Shape Indexing in a Medical Image Database. Workshop on Biomedical Image Analysis, June 26-27, 1998, Santa Barbara, California.
More informationMachine Learning in the Wild. Dealing with Messy Data. Rajmonda S. Caceres. SDS 293 Smith College October 30, 2017
Machine Learning in the Wild Dealing with Messy Data Rajmonda S. Caceres SDS 293 Smith College October 30, 2017 Analytical Chain: From Data to Actions Data Collection Data Cleaning/ Preparation Analysis
More informationAPPLICATION OF THE FUZZY MIN-MAX NEURAL NETWORK CLASSIFIER TO PROBLEMS WITH CONTINUOUS AND DISCRETE ATTRIBUTES
APPLICATION OF THE FUZZY MIN-MAX NEURAL NETWORK CLASSIFIER TO PROBLEMS WITH CONTINUOUS AND DISCRETE ATTRIBUTES A. Likas, K. Blekas and A. Stafylopatis National Technical University of Athens Department
More informationEstimation of Item Response Models
Estimation of Item Response Models Lecture #5 ICPSR Item Response Theory Workshop Lecture #5: 1of 39 The Big Picture of Estimation ESTIMATOR = Maximum Likelihood; Mplus Any questions? answers Lecture #5:
More informationMarkov Random Fields and Gibbs Sampling for Image Denoising
Markov Random Fields and Gibbs Sampling for Image Denoising Chang Yue Electrical Engineering Stanford University changyue@stanfoed.edu Abstract This project applies Gibbs Sampling based on different Markov
More informationLocal qualitative shape from stereo. without detailed correspondence. Extended Abstract. Shimon Edelman. Internet:
Local qualitative shape from stereo without detailed correspondence Extended Abstract Shimon Edelman Center for Biological Information Processing MIT E25-201, Cambridge MA 02139 Internet: edelman@ai.mit.edu
More informationCS 664 Segmentation. Daniel Huttenlocher
CS 664 Segmentation Daniel Huttenlocher Grouping Perceptual Organization Structural relationships between tokens Parallelism, symmetry, alignment Similarity of token properties Often strong psychophysical
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More information10701 Machine Learning. Clustering
171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among
More informationInternational Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERA
International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERATIONS KAM_IL SARAC, OMER E GEC_IO GLU, AMR EL ABBADI
More informationCSE 446 Bias-Variance & Naïve Bayes
CSE 446 Bias-Variance & Naïve Bayes Administrative Homework 1 due next week on Friday Good to finish early Homework 2 is out on Monday Check the course calendar Start early (midterm is right before Homework
More informationMachine Learning / Jan 27, 2010
Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,
More informationStatistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1
Week 8 Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Part I Clustering 2 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group
More informationSupport Vector Machines
Support Vector Machines Chapter 9 Chapter 9 1 / 50 1 91 Maximal margin classifier 2 92 Support vector classifiers 3 93 Support vector machines 4 94 SVMs with more than two classes 5 95 Relationshiop to
More informationABSTRACT 1. INTRODUCTION 2. METHODS
Finding Seeds for Segmentation Using Statistical Fusion Fangxu Xing *a, Andrew J. Asman b, Jerry L. Prince a,c, Bennett A. Landman b,c,d a Department of Electrical and Computer Engineering, Johns Hopkins
More informationA Hierarchical Statistical Framework for the Segmentation of Deformable Objects in Image Sequences Charles Kervrann and Fabrice Heitz IRISA / INRIA -
A hierarchical statistical framework for the segmentation of deformable objects in image sequences Charles Kervrann and Fabrice Heitz IRISA/INRIA, Campus Universitaire de Beaulieu, 35042 Rennes Cedex,
More informationGaussian Processes for Robotics. McGill COMP 765 Oct 24 th, 2017
Gaussian Processes for Robotics McGill COMP 765 Oct 24 th, 2017 A robot must learn Modeling the environment is sometimes an end goal: Space exploration Disaster recovery Environmental monitoring Other
More informationStatistical image models
Chapter 4 Statistical image models 4. Introduction 4.. Visual worlds Figure 4. shows images that belong to different visual worlds. The first world (fig. 4..a) is the world of white noise. It is the world
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationCluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008
Cluster Analysis Jia Li Department of Statistics Penn State University Summer School in Statistics for Astronomers IV June 9-1, 8 1 Clustering A basic tool in data mining/pattern recognition: Divide a
More informationLearning to Learn: additional notes
MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.034 Artificial Intelligence, Fall 2008 Recitation October 23 Learning to Learn: additional notes Bob Berwick
More informationMetaheuristic Development Methodology. Fall 2009 Instructor: Dr. Masoud Yaghini
Metaheuristic Development Methodology Fall 2009 Instructor: Dr. Masoud Yaghini Phases and Steps Phases and Steps Phase 1: Understanding Problem Step 1: State the Problem Step 2: Review of Existing Solution
More informationAll images are degraded
Lecture 7 Image Relaxation: Restoration and Feature Extraction ch. 6 of Machine Vision by Wesley E. Snyder & Hairong Qi Spring 2018 16-725 (CMU RI) : BioE 2630 (Pitt) Dr. John Galeotti The content of these
More informationNonparametric Classification. Prof. Richard Zanibbi
Nonparametric Classification Prof. Richard Zanibbi What to do when feature distributions (likelihoods) are not normal Don t Panic! While they may be suboptimal, LDC and QDC may still be applied, even though
More informationColor-Based Classification of Natural Rock Images Using Classifier Combinations
Color-Based Classification of Natural Rock Images Using Classifier Combinations Leena Lepistö, Iivari Kunttu, and Ari Visa Tampere University of Technology, Institute of Signal Processing, P.O. Box 553,
More informationWhat is Learning? CS 343: Artificial Intelligence Machine Learning. Raymond J. Mooney. Problem Solving / Planning / Control.
What is Learning? CS 343: Artificial Intelligence Machine Learning Herbert Simon: Learning is any process by which a system improves performance from experience. What is the task? Classification Problem
More informationInstance-Based Learning: A Survey
Chapter 6 Instance-Based Learning: A Survey Charu C. Aggarwal IBM T. J. Watson Research Center Yorktown Heights, NY charu@us.ibm.com 6.1 Introduction... 157 6.2 Instance-Based Learning Framework... 159
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More information3.1. Solution for white Gaussian noise
Low complexity M-hypotheses detection: M vectors case Mohammed Nae and Ahmed H. Tewk Dept. of Electrical Engineering University of Minnesota, Minneapolis, MN 55455 mnae,tewk@ece.umn.edu Abstract Low complexity
More informationsize, runs an existing induction algorithm on the rst subset to obtain a rst set of rules, and then processes each of the remaining data subsets at a
Multi-Layer Incremental Induction Xindong Wu and William H.W. Lo School of Computer Science and Software Ebgineering Monash University 900 Dandenong Road Melbourne, VIC 3145, Australia Email: xindong@computer.org
More informationRandom projection for non-gaussian mixture models
Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,
More information2. On classification and related tasks
2. On classification and related tasks In this part of the course we take a concise bird s-eye view of different central tasks and concepts involved in machine learning and classification particularly.
More informationSolution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013
Your Name: Your student id: Solution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013 Problem 1 [5+?]: Hypothesis Classes Problem 2 [8]: Losses and Risks Problem 3 [11]: Model Generation
More informationDigital Image Processing Laboratory: MAP Image Restoration
Purdue University: Digital Image Processing Laboratories 1 Digital Image Processing Laboratory: MAP Image Restoration October, 015 1 Introduction This laboratory explores the use of maximum a posteriori
More informationCLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS
CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2015 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows K-Nearest
More informationPrototype Selection for Handwritten Connected Digits Classification
2009 0th International Conference on Document Analysis and Recognition Prototype Selection for Handwritten Connected Digits Classification Cristiano de Santana Pereira and George D. C. Cavalcanti 2 Federal
More informationConditional Random Fields and beyond D A N I E L K H A S H A B I C S U I U C,
Conditional Random Fields and beyond D A N I E L K H A S H A B I C S 5 4 6 U I U C, 2 0 1 3 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative
More informationLecture on Modeling Tools for Clustering & Regression
Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into
More informationMachine Learning Techniques for Data Mining
Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already
More informationMTTS1 Dimensionality Reduction and Visualization Spring 2014 Jaakko Peltonen
MTTS1 Dimensionality Reduction and Visualization Spring 2014 Jaakko Peltonen Lecture 2: Feature selection Feature Selection feature selection (also called variable selection): choosing k < d important
More information3. Cluster analysis Overview
Université Laval Multivariate analysis - February 2006 1 3.1. Overview 3. Cluster analysis Clustering requires the recognition of discontinuous subsets in an environment that is sometimes discrete (as
More informationImage Processing. Filtering. Slide 1
Image Processing Filtering Slide 1 Preliminary Image generation Original Noise Image restoration Result Slide 2 Preliminary Classic application: denoising However: Denoising is much more than a simple
More information9.1. K-means Clustering
424 9. MIXTURE MODELS AND EM Section 9.2 Section 9.3 Section 9.4 view of mixture distributions in which the discrete latent variables can be interpreted as defining assignments of data points to specific
More informationMachine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016
Machine Learning 10-701, Fall 2016 Nonparametric methods for Classification Eric Xing Lecture 2, September 12, 2016 Reading: 1 Classification Representing data: Hypothesis (classifier) 2 Clustering 3 Supervised
More informationDocument Image Restoration Using Binary Morphological Filters. Jisheng Liang, Robert M. Haralick. Seattle, Washington Ihsin T.
Document Image Restoration Using Binary Morphological Filters Jisheng Liang, Robert M. Haralick University of Washington, Department of Electrical Engineering Seattle, Washington 98195 Ihsin T. Phillips
More informationUsing Local Trajectory Optimizers To Speed Up Global. Christopher G. Atkeson. Department of Brain and Cognitive Sciences and
Using Local Trajectory Optimizers To Speed Up Global Optimization In Dynamic Programming Christopher G. Atkeson Department of Brain and Cognitive Sciences and the Articial Intelligence Laboratory Massachusetts
More informationTHE preceding chapters were all devoted to the analysis of images and signals which
Chapter 5 Segmentation of Color, Texture, and Orientation Images THE preceding chapters were all devoted to the analysis of images and signals which take values in IR. It is often necessary, however, to
More informationDensity estimation. In density estimation problems, we are given a random from an unknown density. Our objective is to estimate
Density estimation In density estimation problems, we are given a random sample from an unknown density Our objective is to estimate? Applications Classification If we estimate the density for each class,
More informationThe Comparative Study of Machine Learning Algorithms in Text Data Classification*
The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification
More informationSparse & Redundant Representations and Their Applications in Signal and Image Processing
Sparse & Redundant Representations and Their Applications in Signal and Image Processing Sparseland: An Estimation Point of View Michael Elad The Computer Science Department The Technion Israel Institute
More informationA Topography-Preserving Latent Variable Model with Learning Metrics
A Topography-Preserving Latent Variable Model with Learning Metrics Samuel Kaski and Janne Sinkkonen Helsinki University of Technology Neural Networks Research Centre P.O. Box 5400, FIN-02015 HUT, Finland
More informationAn Evaluation of Information Retrieval Accuracy. with Simulated OCR Output. K. Taghva z, and J. Borsack z. University of Massachusetts, Amherst
An Evaluation of Information Retrieval Accuracy with Simulated OCR Output W.B. Croft y, S.M. Harding y, K. Taghva z, and J. Borsack z y Computer Science Department University of Massachusetts, Amherst
More informationREPORTED DECISION INTEGRATION MODULE UNIFIED DECISION REFINEMENT BAYESIAN BAYESIAN BAYESIAN BAYESIAN BAYESIAN BAYESIAN
Statistical Decision Integration Using Fisher Criterion S. Shah J. K. Aggarwal Laboratory for Visual Computing Computer & Vision Res. Ctr. Wayne State University The University of Texas at Austin Dept.
More informationCHAPTER 4: CLUSTER ANALYSIS
CHAPTER 4: CLUSTER ANALYSIS WHAT IS CLUSTER ANALYSIS? A cluster is a collection of data-objects similar to one another within the same group & dissimilar to the objects in other groups. Cluster analysis
More information6.867 Machine Learning
6.867 Machine Learning Problem set - solutions Thursday, October What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove. Do not
More informationDecision Making. final results. Input. Update Utility
Active Handwritten Word Recognition Jaehwa Park and Venu Govindaraju Center of Excellence for Document Analysis and Recognition Department of Computer Science and Engineering State University of New York
More informationBayesian Spherical Wavelet Shrinkage: Applications to Shape Analysis
Bayesian Spherical Wavelet Shrinkage: Applications to Shape Analysis Xavier Le Faucheur a, Brani Vidakovic b and Allen Tannenbaum a a School of Electrical and Computer Engineering, b Department of Biomedical
More information