
Learning in Medical Image Databases

Cristian Sminchisescu
Department of Computer Science
Rutgers University, NJ

December 1998

Abstract

In this paper we present several results obtained by experimenting with Bayesian minimum Mahalanobis distance and k-nearest neighbor classification methods in a medical domain. The training set contains four classes of different pathologies. Individual pathologies are represented in terms of their contour description, by a high-dimensional vector (the modal vector) abstracting their shape. We employ a divergence criterion to identify features with high discriminative power. The performance measurements suggest that the Bayes classifier outperforms the weighted k-nearest neighbor classifier, a result which is not surprising considering the particularly noisy structure of the training data set.

Keywords: Machine learning, Bayesian learning, Nearest neighbor classification, Feature selection.

1 Introduction

The problem we are investigating in this work concerns classifying dental pathologies based upon the shape of the pathology. Specifically, we employ a clinical radiograph image database containing 64 dental pathologies; the database consists of 4 different classes of dental disease, each class consisting of 16 elements. The 4 classes represent the progressive evolution of the disease, that is, class C1 represents incipient forms of the disease, while class C4 consists of the most advanced stage of dental disease (figure 6). Each pathology contour has been identified (in the dental radiograph) and the pathology has been labeled into one of the 4 classes of the disease we are interested in by an expert physician. However, dealing simply with the pathology contour (consisting of a set of coordinates) does not give us any powerful description or abstraction of the shape of the pathology. In order to abstract the shape of a pathology, starting from the raw contour points, we use modal analysis, a computer vision technique for obtaining descriptions of objects in terms of a vector of deformation modes [5]. In particular, we employ a prototype shape (in this case, an ellipse) and we compute the modal displacement vector associated with the process of deforming (or aligning) the prototype shape with the underlying lesion shape (figure 7). The individual components of such a vector represent the deformation modes (like, for instance, tapering, bending, pitching, and so on) of a shape at finer and finer levels of detail (the number of modes can be quite large, of order 10^3). The modal description of a shape resembles a Fourier description of a signal, where the low-order frequencies convey the signal's coarse characteristics while the high frequencies convey very fine signal information. Consequently, we expect that higher-order modes in the shape description will be more sensitive to noise and less useful in terms of discriminative power (or classification ability). This domain knowledge allows us to cut the dimensionality of the modal vector (and of the subsequent analysis) down to the first 30 modes.

2 Bayesian Learning

The problem we are studying can be formulated in a Bayesian framework as follows: given a set of classes c_i, i = 1..c (c = 4 in our case), we want to design a classifier such that, given a feature vector x, it maximizes the probability of correctly classifying it.

2.1 Bayes Classifier

We use Bayes' rule:

    P(c_i | x) = \frac{p(x | c_i) P(c_i)}{p(x)}    (1)

where

    p(x) = \sum_{j=1}^{c} p(x | c_j) P(c_j)    (2)

In the above equations, P(c_j) represents the a priori probability of class c_j, p(x | c_j) represents the class-conditional probability density (essentially the probability of observing x given the class c_j), and P(c_j | x) represents the a posteriori probability (all j's are within the range 1..c). Essentially, Bayes' rule gives a quantitative account of how observing the values of x changes the a priori probability P(c_j) into the a posteriori probability P(c_j | x).
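To make equations (1) and (2) concrete, the following is a minimal NumPy sketch (not part of the original paper) of how the posteriors could be computed from class-conditional density values and priors; the numbers in the example are hypothetical placeholders.

```python
import numpy as np

def posteriors(class_conditionals, priors):
    """Bayes' rule (equations 1-2): P(c_i|x) from p(x|c_i) and P(c_i)."""
    class_conditionals = np.asarray(class_conditionals, dtype=float)
    priors = np.asarray(priors, dtype=float)
    joint = class_conditionals * priors      # p(x|c_i) P(c_i)
    evidence = joint.sum()                   # p(x) = sum_j p(x|c_j) P(c_j)
    return joint / evidence                  # P(c_i|x)

# Hypothetical class-conditional values p(x|c_i) for one x, with the
# equal priors P(c_i) = 1/4 assumed in the experiments.
print(posteriors([0.02, 0.10, 0.05, 0.01], [0.25, 0.25, 0.25, 0.25]))
```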

The pattern classification problem can be formulated in terms of a set of discriminant functions g_i(x), i = 1..c, each one associated with a different class. Given a feature vector x, the classifier computes the c discriminant functions and selects the class corresponding to the largest one. Consequently, the vector x is assigned to class c_i if:

    g_i(x) > g_j(x), \quad \forall j \neq i    (3)

For the particular case of a Bayesian classifier, we can take g_i(x) = P(c_i | x), so the maximum discriminant function naturally corresponds to the maximum a posteriori probability. However, the choice of a particular discriminant function is not unique, as it could, for instance, be multiplied by a positive constant or biased by an additive constant. Moreover, replacing every g_i(x) by f(g_i(x)), where f is a monotonically increasing function, preserves the results of the classification [2]. This observation leads to both analytical and computational simplifications, as:

    g_i(x) = P(c_i | x) = \frac{p(x | c_i) P(c_i)}{\sum_{j=1}^{c} p(x | c_j) P(c_j)}    (4)

gives the same classification results as:

    g_i(x) = \log p(x | c_i) + \log P(c_i)    (5)

(note that the term \sum_{j=1}^{c} p(x | c_j) P(c_j) is a sum having the same value for all classes, so one can treat it as a constant). For the experiments, we assume equally likely a priori class probabilities, that is, P(c_i) = P(c_j) for all i, j in 1..c. Consequently, equation (5) becomes:

    g_i(x) = \log p(x | c_i)    (6)

and the central problem is transferred to estimating the conditional densities p(x | c_j), j = 1..c.

2.2 Multivariate Normal Density

The assumption used with many Bayesian classifiers, which we shall follow in the experiments as well, is that the conditional density is multivariate normal (defined precisely below). This is an appropriate model for the case when the feature vectors x corresponding to a class c_i are continuous-valued, moderately corrupted versions of a prototypical vector \mu_i. The multivariate normal density can be written as:

    p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^t \Sigma^{-1} (x - \mu) \right]    (7)

where x is a d-component column vector, \mu is the d-component mean vector, and \Sigma is the d-by-d covariance matrix. This equation is usually abbreviated as p(x) \sim N(\mu, \Sigma). By using equations (6) and (7), one can derive the discriminant functions corresponding to minimum-error-rate classification (leaving aside the constant and scale factors):

    g_i(x) = -(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) - \log |\Sigma_i|    (8)

Under the particular assumption that the covariance matrices of all classes are equal, that is, \Sigma_i = \Sigma for all i in 1..c, the discriminant functions become:

    g_i(x) = -(x - \mu_i)^t \Sigma^{-1} (x - \mu_i)    (9)

In the above equation, the right-hand-side term:

    r^2 = (x - \mu)^t \Sigma^{-1} (x - \mu)    (10)

is the squared Mahalanobis distance. To classify a feature vector x, one measures the squared Mahalanobis distance from x to each of the c mean vectors and assigns x to the class corresponding to the nearest mean. In the experiments we use a minimum Mahalanobis distance classifier, based on the discriminant function presented in equation (9).
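As an illustration (a minimal NumPy sketch under the stated assumptions, not the paper's original implementation), the minimum Mahalanobis distance rule of equations (9)-(10) can be written as follows; class_means and shared_cov are assumed to have already been estimated, as described in the next subsection.

```python
import numpy as np

def classify_mahalanobis(x, class_means, shared_cov):
    """Assign x to the class whose mean is nearest in squared Mahalanobis
    distance (equations 9-10), assuming a covariance matrix shared by all
    classes."""
    cov_inv = np.linalg.inv(shared_cov)
    dists = []
    for mu in class_means:                  # one mean vector per class
        diff = x - mu
        dists.append(diff @ cov_inv @ diff) # r^2 = (x-mu)^t Sigma^-1 (x-mu)
    return int(np.argmin(dists))            # index of the nearest class mean
```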
2.3 Parameter Estimation: The Mean and Covariance Matrix

Under the assumption of a multivariate normal distribution, the estimates used for the mean vector and the covariance matrix are:

    \hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_k    (11)

    \hat{\Sigma} = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \hat{\mu})(x_k - \hat{\mu})^t    (12)

where the x_k are the sample data. One can notice that the estimate for the unknown population mean is the Maximum Likelihood (ML) estimate, the sample mean, or the centroid of the sample cloud. The estimate for the covariance matrix is the sample covariance; with the 1/(n-1) normalization used here it can be shown to be unbiased (the ML estimate proper uses 1/n in place of 1/(n-1)). For the experiments we use the above formulas to obtain estimates of the mean vectors and covariance matrices corresponding to each of the classes c_i, i in {1..4}, involved in the learning process.
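A short sketch of how the per-class estimates of equations (11) and (12) could be obtained with NumPy (an illustration, not the author's code); X_class is assumed to hold one training sample per row.

```python
import numpy as np

def estimate_class_parameters(X_class):
    """Sample mean (eq. 11) and sample covariance with the 1/(n-1)
    normalization (eq. 12) for one class; X_class has shape (n, d)."""
    mu_hat = X_class.mean(axis=0)
    diffs = X_class - mu_hat
    sigma_hat = diffs.T @ diffs / (X_class.shape[0] - 1)
    return mu_hat, sigma_hat
```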

3 Nearest-Neighbor Classification

Considering feature vectors of the form x = (a_1(x), a_2(x), ..., a_n(x)), the distance between two instances x_i and x_j is given by:

    d(x_i, x_j) = \sqrt{ \sum_{k=1}^{n} (a_k(x_i) - a_k(x_j))^2 }    (13)

Given a new instance x_q to be classified, the k-nearest neighbor algorithm performs a vote among the k instances nearest to x_q. More precisely, given the set of possible classification decisions (classes) C = {c_1, c_2, c_3, c_4}, the algorithm decides according to:

    \hat{f}(x_q) = \arg\max_{c \in C} \sum_{i=1}^{k} \delta(c, f(x_i))    (14)

where \delta(a, b) = 1 if a = b and 0 otherwise. In the experiments, we employ a distance-weighted k-nearest neighbor method:

    \hat{f}(x_q) = \arg\max_{c \in C} \sum_{i=1}^{k} w_i \, \delta(c, f(x_i))    (15)

where w_i = \frac{1}{d(x_q, x_i)^2}.
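The distance-weighted vote of equations (13)-(15) could be implemented as in the following sketch (an illustration, not the original code); the training samples are assumed to be the rows of X_train, with integer class labels in y_train.

```python
import numpy as np

def weighted_knn_predict(x_q, X_train, y_train, k=5):
    """Distance-weighted k-nearest neighbor vote (equations 13-15): each of
    the k nearest training instances votes for its class with weight
    1 / d(x_q, x_i)^2."""
    d2 = np.sum((X_train - x_q) ** 2, axis=1)   # squared Euclidean distances
    nearest = np.argsort(d2)[:k]                # indices of the k closest samples
    votes = {}
    for i in nearest:
        if d2[i] == 0.0:                        # exact match: return its label
            return int(y_train[i])
        label = int(y_train[i])
        votes[label] = votes.get(label, 0.0) + 1.0 / d2[i]
    return max(votes, key=votes.get)            # class with the largest weighted vote
```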
4 Discriminant Feature Selection

In this section, we are interested in identifying those features (vector components) which are highly relevant in terms of classification performance (providing good separation between the groups to be classified). A brute-force approach to this problem is to consider the power set of the feature set, run the classification process for each particular combination of features, and select the feature set leading to the best classification performance. However, this solution quickly proves intractable: for the 30 features we use, we would need to generate the power set of the features, that is, 2^30 possible subsets. An alternative approach is to use a probabilistic measure to check the impact of each vector component on the separability between the different groups. We employ the divergence as a quantitative measure for this purpose. Intuitively, the divergence is a measure of the separability of two probability distributions, and it measures how well the features used can represent the statistics conveyed in the raw data. For testing a binary hypothesis, H_1 vs. H_2, in which the probability distribution of the features is P_1 under H_1 and P_2 under H_2, we define P_e as the average error probability and P_{12} as the probability of choosing H_1 when class H_2 is actually true (and vice versa for P_{21}):

    P_e = P_{21} P(H_1) + P_{12} P(H_2)

Furthermore, the divergence is related to the error exponent in P_e as follows: as the divergence increases, the error represented by P_e decreases. A closed-form expression for the divergence of multivariate Gaussian data can be derived as in [4]:

    D = \frac{1}{2} (m_1 - m_2)^T (K_1^{-1} + K_2^{-1}) (m_1 - m_2) + \frac{1}{2} \operatorname{tr}(K_1^{-1} K_2 + K_2^{-1} K_1 - 2I)    (16)

where m_1 and m_2 are the mean vectors and K_1 and K_2 the covariance matrices corresponding to the feature vectors in classes c_1 and c_2, and tr denotes the trace, i.e. the sum of the diagonal elements of its matrix argument.

Now, we need to link the inter-group divergence, which gives a measure of the separability of two groups, with our particular application, where several groups are present. Furthermore, we want to construct a criterion function, based on divergence, such that we can evaluate the separability contribution of each feature with respect to the groups involved in the classification process. More precisely, for any feature f in G, where G = {1..30} indexes the components of the modal vector, we compute a divergence-based global criterion value given by:

    c_f = \sum_{i=1}^{3} \sum_{j=i+1}^{4} D_{ij}^{G \setminus \{f\}}    (17)

where D_{ij}^{S} represents the inter-group divergence between groups i and j, computed using the features in the set S (S \subseteq G).
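The following sketch (an illustration under the Gaussian assumptions above, not the paper's code) computes the pairwise divergence of equation (16) and the leave-one-feature-out criterion c_f of equation (17); groups is assumed to be a list of (n_i, d) arrays, one per class, and a pseudo-inverse is used for robustness to near-singular sample covariances, which is an implementation choice rather than something stated in the paper.

```python
import numpy as np

def gaussian_divergence(X1, X2):
    """Symmetric divergence between two Gaussian groups (equation 16)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    K1, K2 = np.cov(X1.T), np.cov(X2.T)
    # pinv instead of inv: guards against poorly conditioned sample covariances.
    K1i, K2i = np.linalg.pinv(K1), np.linalg.pinv(K2)
    dm = m1 - m2
    d = X1.shape[1]
    return (0.5 * dm @ (K1i + K2i) @ dm
            + 0.5 * np.trace(K1i @ K2 + K2i @ K1 - 2 * np.eye(d)))

def criterion_values(groups, n_features):
    """c_f of equation (17): total pairwise divergence computed with feature f
    left out; low values flag the most discriminative features."""
    c = np.zeros(n_features)
    for f in range(n_features):
        keep = [k for k in range(n_features) if k != f]   # the set G \ {f}
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                c[f] += gaussian_divergence(groups[i][:, keep],
                                            groups[j][:, keep])
    return c
```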

Selecting good discriminative features according to this criterion reduces to sorting the features in increasing order of the value obtained with formula (17): the better the feature, the lower the value of its corresponding criterion (if the feature provides good discriminative ability, removing it results in a decrease of the separation ability as quantified by the divergence measure). The results of applying the criterion to each feature are plotted in figure 3.

The criterion (17) imposes a partial ordering on the feature set. When testing the performance of the different classifiers, we incrementally construct the classifiers corresponding to all the ordered feature sets, that is, the sets consisting of features {1}, {1, 2}, ..., {1, 2, ..., 30} (note that, for instance, "1" now means the first feature according to the feature ordering criterion, and not the first component of the vector); in this way, the complexity of experimenting with each particular classifier becomes linear in the number of features (in our case 30). To provide further insight into the relevance of this selection criterion, we also perform a standard statistical analysis to compute the mean and standard deviation of each feature in each individual group (as well as over the global data set). The results are depicted in figures 1 and 2. Intuitively, the divergence balances the mean inter-group separability of the individual vector components against the corresponding standard deviation and "assembles" these into a formula characterizing the separability of the two groups involved in the classification.

The criterion presented in (17) performs a greedy selection (by summing the divergences corresponding to a set of features over all pairs of groups involved in the classification process). It is greedy in the sense that it might assign a good merit value to features which do not necessarily provide good separability between all the groups, but sometimes only between pairs of groups (some groups might be very well separated, some others might not, but the criterion may not be able to "sense" this, as it just computes an overall sum).

[Figure 1: Medians. Plot titled "Feature Means": per-feature mean values for groups "A", "B", "C", "D" and globally.]

[Figure 2: Standard Deviations. Plot titled "Feature Standard Deviations": per-feature standard deviations for groups "A", "B", "C", "D" and globally.]

[Figure 3: Feature Criterion Values. The divergence-based criterion value for each feature.]

5 Experiments and Results

We experiment with the classification methods presented above by running a cross-validated procedure in order to obtain more accurate results. We employ a leave-one-out method for maximizing the utilization of the data. For the case of the nearest-neighbor classification we use a leave-4-out method (keeping out one member of each class), in order to avoid bias in the experiments.
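A sketch of the cross-validated error estimation described above (an illustration of the protocol, not the original experimental code): each fold lists the indices of the held-out samples, a single index per fold for leave-one-out and one member of each class per fold for the leave-4-out variant. predict_fn is assumed to be a function such as weighted_knn_predict from the earlier sketch, or a wrapper around classify_mahalanobis that re-estimates the class parameters from the given training split.

```python
import numpy as np

def leave_k_out_error(X, y, predict_fn, fold_indices):
    """Cross-validated error estimate: for each fold (a list of held-out
    sample indices), classify the held-out samples using the remaining data."""
    errors, total = 0, 0
    for held_out in fold_indices:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[held_out] = False
        for i in held_out:
            pred = predict_fn(X[i], X[train_mask], y[train_mask])
            errors += int(pred != y[i])
            total += 1
    return errors / total

# Leave-one-out: each fold holds out a single sample.
# loo_folds = [[i] for i in range(len(y))]
# Leave-4-out: each fold holds out one member of each of the 4 classes
# (the fold construction depends on how the samples are indexed).
```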

The confusion matrix corresponding to the minimum Mahalanobis distance classifier is given in table 1. The minimum classification error estimate has been obtained when selecting the first 5 best features (according to the divergence criterion described), and its value is 22.25%. The error estimates obtained when selecting the first k criterion features are plotted in figure 5.

The confusion matrix corresponding to the weighted k-nearest neighbor classifier is given in table 2. The minimum classification error estimate has been obtained for the 5-nearest neighbor classifier with the first 7 best features, and its value is 32.8%. The error estimates corresponding to the different k-nearest neighbor classifiers (k = 1..10) are shown in figure 4. The plot presented in figure 4 represents a compressed version of the runs actually performed. We build separate k-nearest neighbor classifiers for each k in {1..10} and, for each classifier, we consider different feature sets in the order of their criterion selection, i.e. we first evaluate the classifier corresponding only to the first best feature, then the classifier corresponding to the first and second best features, and so on, up to the classifier containing all the features in the set (30). Note that the sets are not created by randomly choosing features; they are ordered in the sense that we gradually add features according to the ordering criterion (17) on the set of features. Subsequently, for each k-nearest neighbor classifier, we pick (and plot) the value corresponding to the minimum error among all ordered feature sets (that is, {1}, {1, 2}, {1, 2, 3}, ..., {1, 2, 3, ..., 30}). Consequently, the error values we plot are not necessarily obtained for the same ordered feature set, but might correspond to different ordered feature sets. This might make the plot in figure 4 nonuniform, but we felt that the important thing to analyze is the real minimum error estimate, and not the error estimate resulting from the rigid imposition of a particular subset of features (which might not provide a minimum error estimate for a classifier corresponding to a particular value of k).

        gr. G1    gr. G2    gr. G3    gr. G4
G1      86.50%     9.50%     4.00%     0.00%
G2      12.75%    75.25%     6.50%     5.50%
G3       7.50%     5.00%    78.50%     9.00%
G4      14.50%     9.50%     4.75%    71.25%

Table 1: Minimum Mahalanobis Distance Classifier Confusion Matrix

        gr. G1    gr. G2    gr. G3    gr. G4
G1      68.75%    31.25%     0.00%     0.00%
G2      25.00%    62.50%    12.50%     0.00%
G3       0.00%     0.00%   100.00%     0.00%
G4      37.50%     0.00%    25.00%    37.50%

Table 2: Nearest Neighbor Classifier Confusion Matrix

[Figure 4: k-nearest neighbor error. Plot titled "Error Using KNN and Best Features"; x-axis: K - Number of Nearest Neighbors; y-axis: Error.]

[Figure 5: Mahalanobis minimum distance error. Plot titled "Error Using Minimum Mahalanobis Distance Classifier"; x-axis: K - Number of Best Features (First K); y-axis: Error.]

6 Discussion

We observe that the results obtained by using the Bayes classifier are better in terms of classification error (22.25% versus 32.8%). Also, the Bayes classifier uses a smaller feature set to obtain its smallest classification error: the first 5 criterion features, versus the first 7 for the 5-nearest neighbor classifier. The ordered set of the first 7 criterion features is {2, 26, 6, 30, 6, 29, 3}. The fact that the Bayes classifier gives better results can be considered a reasonable, practically validated outcome, as it is known that k-nearest neighbor classifiers are expected to provide suboptimal classification performance [2].

Furthermore, it is only when the number of samples becomes very large (approaching infinity, according to the theory) that k-nearest neighbor classification starts exhibiting nearly optimal behavior. The voting process among the k nearest neighbors can be understood as a trade-off between reliability (choosing many neighbors in order to obtain a reliable estimate) and accuracy (choosing only those neighbors that are really very close to the point to be classified), forcing a compromise value for k, that is, a small percentage of the number of examples. From the results, we can verify that this is, indeed, the case.

There is a particular issue related to the very noisy state of the training examples (the descriptions of the lesions' shapes) at the present time, which negatively impacts the k-nearest neighbor classification. It is known that these classification methods are particularly sensitive to noise [3]. We also expect the minimum Mahalanobis distance classifier to be more robust to noise, as the distance it is based upon is normalized by the covariance, that is, it accounts for the deviations within the groups involved in the classification process.

In terms of the confusion matrices associated with the two classifiers, we generally notice a higher probability of misclassification within "adjacent" groups, which is, again, expected behavior, as the classes C1 and C2 are more likely to be similar than C1 and C4. However, this is not always the case. We notice, for instance, that for group 4 the lowest classification performance is obtained with both classifiers, and that the probability of misclassification into group 1 is substantial, although one may expect these two groups to be the most separated (since they represent the disease in its initial and final evolution stages). This might be due to the particularly noisy descriptions in this group (by analyzing figure 2 one can notice that this is indeed true, as the standard deviation of group 4 dominates the ones of the other groups almost everywhere in the feature domain).

7 Conclusions and Further Work

In this work, we have implemented and analyzed the performance of two classifiers on samples derived from four classes of pathologies encountered in a medical image database. In order to obtain better classification performance, we performed a discriminant feature analysis based on a divergence criterion. The features were subsequently ordered according to their discriminative ability, and we tested the classifiers on incrementally constructed sets (that is, sets in which features are incrementally added according to the order generated by the divergence criterion).

This methodology attempts to deal with the intractable problem of generating all possible subsets of the feature set (2^30), running the classifiers, and obtaining error estimates for all such possible feature sets.

The classification results we obtained are quite promising, but further extensions are possible in several directions. First, operating on a larger database, translating into a larger training set, could certainly provide more accurate error estimates as well as further insight into this classification problem. Second, a more realistic feature selection method based on divergence might be used. At the present time, the selection is based on the order generated by summing all inter-group divergence values corresponding to a set of features, leaving one feature out each time (basically, computing a form of divergence gain for each feature). While we are certainly looking for features with higher contributions to the overall divergence, this does not necessarily mean they provide good separation between any two groups, and this might lead to poor classification performance. Using criteria that take the relative inter-group divergences into account (not only their sum) might result in better discriminant feature selection. Ultimately, devising better, less noisy descriptions or extraction of the shape vectors using computer vision techniques should certainly improve classification performance.

[Figure 6: A pathology in its four progressive evolution stages.]

[Figure 7: Deformation of an ellipse into a pathology contour.]

Acknowledgments: I would like to thank Sachin Lodha and Wen Li, who kindly accepted to review the paper.

References

[1] A. Blum. Empirical Support for Winnow and Weighted-Majority Algorithms: Results on a Calendar Scheduling Domain. CMU-TR, 1998.

[2] R. Duda and P. Hart. Pattern Classification and Scene Analysis. Wiley Interscience, 1973.

[3] T. Mitchell. Machine Learning. McGraw-Hill, 1997.

[4] W. Therrien. Decision Estimation and Classification. John Wiley and Sons, 1989.

[5] W. Zhang, S. Dickinson, S. Sclaroff, J. Feldman, S. Dunn. Shape Indexing in a Medical Image Database. Workshop on Biomedical Image Analysis, June 26-27, 1998, Santa Barbara, California.
