Some questions of consensus building using co-association
Vitaliy Tayanov
Polish-Japanese High School of Computer Technics
Aleja Legionow, 4190, Bytom, POLAND

Abstract: In this paper the co-association matrix is applied to divide the whole set of objects into several functional groups. The composition of each group depends on how difficult it is to classify a given object. This is done to address two main problems in pattern recognition and machine learning: reducing the recognition error and reducing the amount of overtraining.

Key Words: Consensus, Co-association matrix, Hamming distance, Dissimilarity

1 Introduction

In the general case, classifier building, like any recognition algorithm design, aims at insensitivity to the irregularity of the data set or sample. If learning is used to build such algorithms, the irregularity of the sample leads to errors during testing even if there were no errors during the learning period. Support Vector Machines (SVM) can be mentioned as an example. With this algorithm we can construct a linear hyperplane in some feature space that makes it possible to separate the classes in this space with zero error. But this holds only for the learning set; on another set obtained from the same source but under slightly different conditions, the algorithm will exhibit some error. In this case the hyperplane is fixed once and for all and cannot take into account the probabilistic character of the new objects that belong to some class. So the value of overtraining shows us the quality of the learning process, which is why it is so important to have a good estimate of this value. Vapnik-Chervonenkis (VC) theory tells us that to achieve a small enough difference between the error probabilities during learning and testing, there should be tens or hundreds of thousands of objects, which is often impossible to have. Such a difference is what is called overtraining. These estimates are heavily overrated and are built for worst cases of the classification problem that are almost never encountered. That is why, during the last ten years, this theory has been developed in the direction of determining the factors that cause the overrating of these estimates [2]. Thanks to this research, the classical VC estimates have been improved many times over. On the other hand, it is very interesting to investigate how to build algorithms on which the influence of the sample irregularity is minimal. This can be done by dividing the general set into functional groups depending on the data complexity and the results of the classifiers' work. The mathematical mechanism to realize this division is based on the co-association matrices and belongs to the consensus approach to classification and clustering.

2 Co-association matrices with respect to the classification algorithms

The idea of the proposed approach consists in grouping (combining) the classification results that are identical across a group (ensemble) of classifiers or decision algorithms. The proposed approach concerns the construction of hierarchical classifiers or clustering algorithms. Here one considers classification into C classes. Let I be the number of objects in the set and P the number of classification results. Every classification p (p = 1, ..., P) associates each object k of the sample with one and only one class.
The elementary co-association matrix $A^k$ contains the information about which algorithms $u$ and $v$ have a consensus with respect to some class for object $k$:

$$
A^k_{u,v} =
\begin{cases}
1 & \text{for } u \sim v, \\
0 & \text{otherwise},
\end{cases}
\qquad (1)
$$

where $\sim$ denotes consensus between algorithms $u$ and $v$.
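The paper gives no reference implementation, so the following is a minimal Python sketch of Eq. (1); the function name and the example labels are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def elementary_coassociation(labels_k):
    """Elementary co-association matrix A^k for a single object k.

    labels_k[u] is the class that algorithm u assigns to object k.
    A^k[u, v] = 1 iff algorithms u and v have a consensus on object k,
    so the result is a symmetric binary n x n matrix."""
    labels_k = np.asarray(labels_k)
    return (labels_k[:, None] == labels_k[None, :]).astype(int)

# Example: four algorithms vote on one object; algorithms 0 and 2 agree.
A_k = elementary_coassociation([1, 2, 1, 3])
```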
Because $u \sim v$ is the same as $v \sim u$, $A^k_{u,v}$ is a symmetric binary matrix. If $n$ is the number of algorithms used for consensus building, then the size of the matrix $A^k$ is $n \times n$. The number of different compositions (pairs) of algorithms that can be created equals $P = n(n-1)/2$. Let there be a limitation of the algorithm space to a finite number of compositions $P$ that can be made from these algorithms ($p = 1, ..., P$). From this set of consensus algorithms one needs to take the two that are maximally dissimilar. Formally, the dissimilarity of a pair of algorithms can be defined as the Hamming distance between the classification results for every object $k$, presented in the form of binary sequences of zeros and ones. The number of zeros and ones in such a sequence equals the number of objects $I$: if the algorithm votes that an object belongs to class $c$ ($c = 1, ..., C$), one puts a 1 in the sequence at the position corresponding to object $k$, otherwise one puts a 0. So the only task left is to find the pair of algorithms with the maximal Hamming distance. The appropriate indices can be determined in the following way:

$$
\{i, j\} = \arg\min_{u,v} \sum_{k=1}^{I} A^k_{u,v}. \qquad (2)
$$

After this pair of algorithms has been determined on the basis of some learning set, one estimates the frequency with which an object belongs to the group of objects on which there is no consensus. Because the pair of the most dissimilar algorithms is estimated on the basis of some learning set, the solution is approximate. In general, for the task of classification into an arbitrary number of classes, these consensus algorithms divide our set into three functional sets: a group of objects on which the consensus of the two algorithms is reached and is correct, a group of objects on which consensus is not reached, and a group of objects on which consensus is reached but is incorrect. The number of objects in the third group cannot be reduced at all. So the minimal probability of classification error in this case is conditioned by the third group of objects and cannot be less than the probability that an object from the set I belongs to this group. The most interesting group from the research point of view is the second one, on which there is no consensus of the most dissimilar algorithms. Reclassifying the objects from this set makes it possible to move some of them into the first and the third group. The more objects fall into the first group, the better the specialized algorithm that performs the reclassification. If one denotes the probabilities that an object belongs to each of the three groups as $P_1$, $P_2$ and $P_3$, then reclassification by the fifty-fifty principle gives a general classification error equal to

$$
P_e = P_3 + \frac{P_2}{2}. \qquad (3)
$$

This probability has the sense of an upper bound for the classification error, so the probability of classification error lies in the interval $[P_3;\, P_3 + P_2/2]$. This is explained by the fact that error probabilities above 0.5 are not considered, because the worst acceptable classification algorithm, by the fifty-fifty principle, has an error probability approaching 0.5. If two algorithms are characterised by approximately equal probabilities $P_3$, then the better algorithm is the one with the lower probability $P_2$ under the constraint $P_2 > P_3$. This is determined by the risk of obtaining correct classifications while reclassifying the objects from the second group.
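Continuing the sketch above (again with illustrative names, not the paper's), the following shows how the maximally dissimilar pair of Eq. (2) and the error bound of Eq. (3) might be computed from the per-object labels of an ensemble:

```python
import numpy as np

def most_dissimilar_pair(labels):
    """labels: (n_algorithms, I) array of class labels per object.
    Returns the pair (i, j) with minimal summed co-association over the
    objects (Eq. (2)), i.e. with maximal Hamming distance."""
    d = (labels[:, None, :] != labels[None, :, :]).mean(axis=2)
    np.fill_diagonal(d, -1.0)               # exclude pairs u == v
    return np.unravel_index(np.argmax(d), d.shape)

def error_bound(labels_i, labels_j, truth):
    """Estimate P1 (correct consensus), P2 (no consensus), P3 (incorrect
    consensus) and the fifty-fifty upper error bound of Eq. (3)."""
    consensus = labels_i == labels_j
    P1 = np.mean(consensus & (labels_i == truth))
    P3 = np.mean(consensus & (labels_i != truth))
    P2 = np.mean(~consensus)
    return P1, P2, P3, P3 + P2 / 2          # P_e = P3 + P2 / 2
```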
Thus one can obtain a fast approximate estimate of the reliability of the algorithms.

3 Some properties of the obtained groups

It is also important to investigate the peculiarities of the test objects. The objects of the first and the third groups are not as interesting as the objects of the second one. For the first and third groups, as well as for the second one, one can study the structure of these groups. This can be done using Gaussian Mixture Models (GMM), where the number of mixture components characterizes the data complexity of a given group of objects. But these two groups cannot be reclassified. For research purposes it is interesting to consider the second set of objects. It is interesting, first of all, because some objects are separated from the others into a distinct group, and reclassifying these objects makes it possible to reduce the general classification error. The principal task in studying the peculiarities of the second group is to analyze its objects in order to build specialized classifiers that can correctly classify as many objects from this set as possible. First of all, let us study the symmetry of the objects of the second group. The symmetry test is done with respect to the algorithm that lies exactly in the middle between the two most dissimilar algorithms. This means that the Hamming distance from this algorithm to the two others is the same.
If we have three algorithms $X$, $Y$ and $Z$, then $d_h(X, Y) = d_h(Y, Z)$ and $d_h(X, Z) = 2\,d_h(Y, Z) = 2\,d_h(X, Y)$. All estimates of the Hamming distance are made on the basis of the learning set. The third algorithm $t$ can be found on the basis of the matrix $A^k_{u,v}$ in the following way. Let us introduce the Hamming distance matrix $D^H_{u,v} = d_h(u, v)$. Using the matrix $A^k_{u,v}$ and normalizing the Hamming distance, one obtains

$$
d_h(u, v) = 1 - \frac{1}{I} \sum_{k=1}^{I} A^k_{u,v}. \qquad (4)
$$

First we have to find the element of the Hamming distance matrix $D^H_{u,v}$ whose value is as close as possible to $\max\left(D^H_{u,v}\right)/2$. Then we have to find some algorithm $t$ having the property that

$$
d_h(u, t) = d_h(t, v) = \frac{1}{2}\max\left(D^H_{u,v}\right). \qquad (5)
$$

Then one should search for the minimal element in column $j$ (see Eq. (2)) of the new distance matrix $\left|D^H_{u,v} - \frac{1}{2}\max\left(D^H_{u,v}\right)\right|$. The number of the row where this element is located determines the index of the searched algorithm $t$.
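A minimal sketch of this middle-algorithm search; instead of scanning a single column of the shifted distance matrix, it minimizes the deviation from the half-distance to both endpoints of the most dissimilar pair, which is a slight simplification of Eqs. (4)-(5), and all names are illustrative:

```python
import numpy as np

def middle_algorithm(labels, i, j):
    """Find the algorithm t in the middle between the most dissimilar
    pair (i, j): d_h(i, t) = d_h(t, j) = d_h(i, j) / 2.
    labels: (n_algorithms, I) array of class labels per object."""
    d = (labels[:, None, :] != labels[None, :, :]).mean(axis=2)  # Eq. (4)
    half = d[i, j] / 2.0
    # Total deviation of each candidate from the ideal half-distance.
    deviation = np.abs(d[:, i] - half) + np.abs(d[:, j] - half)
    deviation[[i, j]] = np.inf              # exclude the pair itself
    return int(np.argmin(deviation))
```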
4 Experimental results

Figures 1-6 show graphic dependencies of the consensus results for problems taken from the UCI repository. This repository was created at the University of California. The data structure of the test tasks from this repository is as follows. Each task is written as a text file in which the columns are the attributes of the objects and each row contains the attribute values of one object. Thus the number of rows corresponds to the number of objects and the number of columns to the number of attributes of each object. A separate column contains the labels of the classes, which mark each of the objects. A lot of the data within this repository is related to biology and medicine.

Table 1 gives the probabilities of errors obtained on the test data for different classifiers or classifier compositions (committees of algorithms). All these algorithms were verified on two tasks that are difficult enough from the classification point of view. For the proposed algorithm the minimal and maximal errors obtainable on the given test data are reported.

Figure 1: Task pima from the UCI repository: nonparametric function of the correct consensus of two algorithms.

Table 1: Error of classification for different algorithms.

    method / task                            bupa       pima
    Monotone (SVM)                           ...        ...
    Monotone (Parzen)                        ...        ...
    AdaBoost (SVM)                           ...        ...
    AdaBoost (Parzen)                        ...        ...
    SVM                                      ...        ...
    Parzen                                   ...        ...
    RVM                                      ...        ...
    Proposed algorithm (min/max, Q = 200)    0.040/...  .../0.03

In Table 1 the value of the minimal error is equal to the consensus error of the proposed algorithm. The value of the maximal error has been calculated as the sum of the minimal error and half of the relative amount of objects on which there is no consensus (the fifty-fifty principle). As seen from the table, the value of the maximal error is much less than the smallest error of all the other algorithms for the two tasks from the UCI repository, and the minimal error of the proposed algorithm is approximately 10 times less than the error of some other algorithms in the table. The proposed algorithms are also characterized by much greater stability of the classification error in comparison with other algorithms, as can be seen by comparing the corresponding errors for the two tasks from the UCI repository.

In Tables 2-3 the estimates of the probability that every object from the UCI repository tasks belongs to each of the three functional groups of objects are given. The objects on which consensus of the most dissimilar algorithms exists ($P_c$) belong to the class of so-called easy objects. The objects on which both algorithms that are in consensus make errors ($P_e$) belong to the class of objects that cause an uncorrectable error; this error cannot be reduced at all. The last class consists of objects on which there is no consensus of the most dissimilar algorithms ($P_{nc}$); this group also belongs to the class of border objects. The tables also give the variances of the corresponding probabilities. The minimal size of the blocks on which the estimates are built using cross-validation algorithms varies from 30 to 200.

Figure 2: Task pima from the UCI repository: nonparametric function of the incorrect consensus of two algorithms.

Figure 3: Task pima from the UCI repository: nonparametric function of no consensus between two algorithms.

Figure 4: Task bupa from the UCI repository: nonparametric function of the correct consensus of two algorithms.

Table 2: Task pima from the UCI repository.

              Q = 200        Q = 30
              µ       σ      µ       σ
    P_c       ...     ...    ...     ...
    P_e       ...     ...    ...     ...
    P_nc      ...     ...    ...     ...

Also, the distribution of the objects in every group can be described as a mixture of Gaussians, i.e. a GMM. For this it is useful to use the Expectation-Maximization (EM) algorithm to determine the moments of the Gaussians. This means that the objects have no homogeneous structure but form compact structures, i.e. clusters, which may overlap. The objects of the second group, which are the most interesting for us, also form clusters. Such a structure of the second group allows one to build algorithms that can classify part of the objects of the clusters forming the mixture of Gaussians. For this it is necessary that the algorithm can pick out the objects within some boundary around an averaged object formed mostly by objects of one class. So one should develop a specialized algorithm that picks out the objects of every compact cluster and then reclassifies the majority of objects of every cluster correctly. Then the total classification error will approach the minimum as the algorithms improve. If the compact clusters (the individual mixture components) consist of objects of different classes, then it is impossible to use static models. In this case one should use dynamic models (e.g., graphical models such as Bayesian networks, Markov Models (MM) or Hidden Markov Models (HMM)), and detection is made on the basis of object behavior or the regularity of object behavior in some state space.
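As a sketch of this modelling step, the no-consensus group can be fitted with an EM-trained Gaussian mixture; scikit-learn is an assumed choice here, since the paper names GMM and EM but no library:

```python
from sklearn.mixture import GaussianMixture

def cluster_no_consensus_group(features, n_components=3):
    """features: (m, d) feature vectors of the objects on which the two
    most dissimilar algorithms reached no consensus. Fits a GMM with EM;
    the number of components is one rough measure of the data complexity
    of the group. Returns the fitted model and a cluster label per object."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full",
                          random_state=0).fit(features)
    return gmm, gmm.predict(features)
```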
Figure 5: Task bupa from the UCI repository: nonparametric function of the incorrect consensus of two algorithms.

Figure 6: Task bupa from the UCI repository: nonparametric function of no consensus between two algorithms.

Table 3: Task bupa from the UCI repository.

              Q = 200        Q = 30
              µ       σ      µ       σ
    P_c       ...     ...    ...     ...
    P_e       ...     ...    ...     ...
    P_nc      ...     ...    ...     ...

Figure 7: Task pima from the UCI repository: nonparametric function of the consensus symmetry problem.

Figure 8: Task bupa from the UCI repository: nonparametric function of the consensus symmetry problem.

So the study of the peculiarities of the objects of the second group is interesting because it makes it possible to understand the rules according to which the specialized algorithms should be built. These algorithms are developed only for reclassifying the objects of the second group. As we can see from Figs. 7 and 8, there is no symmetry in the algorithms; this was verified on two tasks. Both the mean and the variance are different. This means that the Hamming distance is in general not linear, which is conditioned by the data. Such nonlinearity makes the task and the algorithms very data-dependent. On the other hand, it is difficult to predict the results of classification. All this confirms the usefulness of the consensus approach that uses the most dissimilar algorithms, which depends only on statistical changes of the classification results. Such an approach makes the selection of algorithms an easy, unambiguous and non-empirical task.

5 Conclusion

In this paper the probability that every object belongs to each of three groups of objects has been estimated: a group of easy objects, on which the correct consensus of two algorithms is reached; a group of objects, on which the two most dissimilar algorithms have an incorrect consensus; and a group of objects, on which consensus is not achieved. The analysis shows that there are probability distributions of the data that can be presented as multicomponent models, including GMM. All this makes it possible to analyze the proposed algorithms by means of mathematical statistics and probability theory. From the figures and tables one can see that the probability estimates obtained by cross-validation methods with averaged blocks of a minimum of 30 and
200 elements [3] differ little among themselves, which makes it possible to conclude that this method of consensus building, where the consensus is built on the most dissimilar algorithms, is quite regular and does not have the sensitivity to the samples that other algorithms using training have. As seen from the corresponding tables, the minimal classification error is almost an order of magnitude less than the error of the best existing algorithms. The maximal error is 1.5 to 2 times less in comparison with other algorithms. Also, the corresponding errors are much more stable, both with respect to the task on which the algorithm is tested and across the series of given algorithms, where the error value has a significantly large variance. Moreover, since the minimal value of the error is quite small and stable, it guarantees the stability of obtaining correct classification results on the objects on which consensus is reached by the most dissimilar algorithms. With other algorithms such confidence cannot be achieved; indeed, an error value of 30-40% (as compared to 4%) gives no confidence in the results of classification. Estimates of the probabilities based on average values and on the corresponding maxima of the probability distributions (for a maximum likelihood estimation (MLE)) differ little, which gives an additional guarantee for the corresponding probability estimates. The significance of the obtained consensus estimates of the probabilities of correct consensus, of incorrect consensus, and of consensus not being achieved provides an estimate of the classification complexity. Problems and algorithms for estimating the complexity of a classification task are discussed in [4]. The mathematical analysis of building committees of algorithms is considered in detail in [5].

On the other hand, the proposed algorithm makes it possible to evaluate and analyse other algorithms. For example, it could be applied to the analysis of SVM and RVM (Relevance Vector Machines). For SVM, if we consider it as a symmetric problem in consensus, then the third algorithm t corresponds to the separating hyperplane, and the other two hyperplanes, which represent the support vectors, correspond to the initial dissimilar algorithms. Because SVM and RVM are trained, the position of a hyperplane is only approximate. Then a change of the direction of the hyperplane (due to overtraining) leads to results that depend heavily on the changed direction (caused by a different learning set) because of the symmetry problem. This makes SVM unstable with respect to the learning set, which has been confirmed in a lot of research.

References:

[1] V. Vapnik, The Nature of Statistical Learning Theory, 2nd ed., Springer-Verlag, New York, 2000.
[2] K. Vorontsov, On the influence of similarity of classifiers on the probability of overfitting, in Proc. of the Ninth International Conference on Pattern Recognition and Image Analysis: New Information Technologies (PRIA-9), Nizhni Novgorod, Russian Federation, vol. 2, 2008.
[3] S. Gurov, The Reliability Estimation of Classification Algorithms, Publishing Department of the Computational Mathematics and Cybernetics Faculty of Moscow State University, Moscow, 2003 (in Russian).
[4] M. Basu, T. Ho, Data Complexity in Pattern Recognition, Springer, London, 2006.
[5] Yu. Zhuravlev, About the algebraic approach to recognition or classification tasks solution, Problems of Cybernetics, vol. 33, 1978, pp. 5-68 (in Russian).