Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions
Thomas Giraud, Simon Chabot
October 12, 2013
Contents

1 Discriminant analysis
    Main idea
    The Bayes rule
    Maximum likelihood estimation
    Example
    Hypothesis
    Linear vs quadratic discriminant analysis
2 Clustering methods
    Goal
    Kmeans
        Algorithm
        Example
        Iris classification
    Minibatch-kmeans
        Description
        Algorithm
    Hierarchical clustering
    Other methods
        K-nearest neighbors
Introduction

This report introduces some concepts of data classification. Classification algorithms fall mainly into two families: supervised classification and unsupervised classification. Both have the same goal: to classify data into different classes according to their features. To achieve this, each family uses a different approach.

The supervised family needs to be trained: an algorithm of this family has to be trained on the same kind of data it will later have to classify. For instance, if such an algorithm has to classify crabs into two classes, males and females, we have to give it many male and female crabs so that it becomes able to perform the classification by itself, and for each crab we have to tell the algorithm whether the crab it is looking at is male or female. Once the algorithm is trained, it is able to work on its own.

The unsupervised family, on the other hand, performs the classification directly, with no training at all. The algorithm is given a dataset and tries to recognize which subject belongs to which class by gathering subjects that share similar properties. Usually, the only information the algorithm is given is the number of classes it has to find.

This report introduces some well-known methods of each family.
Chapter 1 Discriminant analysis

1.1 Main idea

In this chapter, we introduce the basics of decision theory and object classification through the Bayes rule. The goal of discriminant analysis is to classify a given population into different known classes. To achieve this goal, the classifier needs to be trained; let L be the train set. The population of L is made of n subjects, where each subject is described by p features and by the class it belongs to. So, the whole population can be seen as the following matrix:

$$
\begin{pmatrix}
x_{11} & \cdots & x_{1p} & y_1 \\
x_{21} & \cdots & x_{2p} & y_2 \\
\vdots &        & \vdots & \vdots \\
x_{n1} & \cdots & x_{np} & y_n
\end{pmatrix}
\tag{1.1}
$$

Mathematically, we have to find, from L, a function g such that $x \mapsto g(x) = y$. Now, let T be the test set. T looks like L, but the classes (i.e. the last column) are unknown and have to be determined. The assumption is that if L is representative of T, then applying g to T should associate each subject with its class. Let's see how such a function can be defined.

1.2 The Bayes rule

Considering L, we have n subjects divided up into c classes. We assume each class k is distributed according to a normal distribution $\mathcal{N}(\mu_k, \Sigma_k)$ and has an a priori probability $\pi_k$. Thus, we have:
$$
f_k(x) = \frac{1}{(2\pi)^{p/2} \sqrt{\det \Sigma_k}} \exp\left( -\frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right)
\tag{1.2}
$$

where p is the dimension of the vector x. The function g(x) we are going to build has to maximize the a posteriori probability that x belongs to class k. That is to say, we are looking for:

$$
k^* = \arg\max_{k} \; \pi_k f_k(x)
\tag{1.3}
$$

Thus, we define g(x) such that

$$
g : X \to Y = \{w_1, \dots, w_c\}, \qquad x \mapsto w_{k^*} \quad \text{where } k^* \text{ is defined by (1.3)}
\tag{1.4}
$$

Equation (1.4) is known as the Bayes rule.

1.3 Maximum likelihood estimation

Now, given c normal distributions, we are able to classify a given subject x into the class maximizing the a posteriori probability. The problem is that we do not know those c distributions. We assume they are normal, but we do not know exactly what the mean, the covariance and the a priori probability are. So, we have to estimate those parameters using maximum likelihood estimation (footnote 1). Using this estimator and the train set L, we can get close to the true theoretical values. We have:

$$
\hat{\pi}_k = \frac{n_k}{n}
\tag{1.5a}
$$

$$
\hat{\mu}_k = \frac{1}{n_k} \sum_{i=1}^{n} t_{ik} \, x_i
\tag{1.5b}
$$

$$
\hat{\Sigma}_k = \frac{1}{n_k} \sum_{i=1}^{n} t_{ik} \, (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^\top
\tag{1.5c}
$$

where n is the number of subjects in the whole population, $n_k$ is the number of subjects belonging to the class $w_k$, and $t_{ik}$ is 1 if $y_i = w_k$ and 0 otherwise, for $k \in \{1, \dots, c\}$.

1. Sometimes referred to as MLE.
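To make the estimators (1.5a) to (1.5c) concrete, here is a minimal NumPy sketch. It is not the authors' code: the function name `estimate_gaussian_mle` and the toy data are assumptions introduced only for illustration. Selecting the rows with `y == k` plays the role of the indicator $t_{ik}$.

```python
import numpy as np

def estimate_gaussian_mle(X, y):
    """Hypothetical helper: return {class: (pi_hat, mu_hat, sigma_hat)}
    following equations (1.5a)-(1.5c)."""
    n = X.shape[0]
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]                           # rows for which t_ik = 1
        n_k = Xk.shape[0]
        pi_hat = n_k / n                         # (1.5a)
        mu_hat = Xk.mean(axis=0)                 # (1.5b)
        centered = Xk - mu_hat
        sigma_hat = centered.T @ centered / n_k  # (1.5c), divides by n_k (not n_k - 1)
        params[k.item()] = (pi_hat, mu_hat, sigma_hat)
    return params

# Toy data (assumed): 4 subjects of class 0 and 2 subjects of class 1, p = 2 features
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0],
              [5.0, 5.0], [7.0, 5.0]])
y = np.array([0, 0, 0, 0, 1, 1])

params = estimate_gaussian_mle(X, y)
for k, (pi_hat, mu_hat, sigma_hat) in params.items():
    print(k, pi_hat, mu_hat, sigma_hat, sep="\n")
```

Note that (1.5c) divides by $n_k$, which is the maximum likelihood estimator; `np.cov` divides by $n_k - 1$ by default and would give slightly different values.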
1.4 Example

Before continuing with discriminant analysis, let's look at a small example. We want to write a programme, using discriminant analysis, to determine the origin of different wines according to their chemical components. We found in a database a record of 178 different wines, described by 13 features and divided up into three classes. The idea is to divide this data into two sets:

- one train set L, used to train the classifier;
- one test set T, used to check how good (or bad) our classifier is.

For each subject of T, we ask our classifier to pick the best class to assign and compare it to the original value. By doing this for the whole set, we are able to determine a percentage of success. For each subject, we have the following features: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, Proline, and the class. We will show, step by step, how the classifier is trained and then used to classify.

First of all, we need to create the train and the test sets.

    >>> from data import load   # the authors' helper module, used to load the data
    >>> import pylab as p       # used to perform some mathematical stuff
    >>> w = load('wine')        # load the raw wine data
    >>> print(w)                # here, we find our population matrix
    [[ ... ]
     [ ... ]
     ...,
     [ ... ]]
    >>> p.shuffle(w)            # we shuffle the rows
    >>> half = len(w) // 2      # integer division, so that it is a valid index
    >>> train = w[:half, :]     # the first half is used as train set
    >>> # the second half is used as test set; moreover, the last column is removed
    >>> test = w[half:, :-1]
    >>> target = w[half:, -1]   # keep the real classes in memory

So, at this stage, we have train, a matrix representing L; test, representing T; and target, the real classes of the subjects of T, used to evaluate our classifier. We wrote a function, estimates_parameters, which estimates, for each class, the parameters of the normal distribution using the MLE on the train set.
(Unfortunately, the output cannot be shown in the report, because it consists of three covariance matrices and three 1 × 13 mean vectors.) The estimated parameters are stored in a variable params.

Now, let's use our classifier on the test set. We wrote a function called postprobalitiy: given one distribution (i.e. $\hat{\mu}_k$, $\hat{\Sigma}_k$ and $\hat{\pi}_k$), this function gives (footnote 3) the a posteriori probability that x belongs to class k. Let's classify one subject using this function:

    >>> params = estimates_parameters(train)   # we train our classifier
    >>> subject = test[0, :]                   # pick the first wine to classify
    >>> postprobalitiy(subject, params[0])     # belonging to class 0
    >>> postprobalitiy(subject, params[1])     # belonging to class 1
    >>> postprobalitiy(subject, params[2])     # belonging to class 2

3. Actually, it gives the logarithm of the probability, to handle small values in practice. This is equivalent.

Using the Bayes rule (1.4), the subject should be assigned to class 1. Let's check whether this is right using the target array (which contains the actual classes).

    >>> target[0]   # the real class of subject
    1

Yes, it is! To perform this operation on the whole test set, we wrote a function called predict, which returns an array listing all the assigned classes.

    >>> predictions = predict(test, params)
    >>> print(predictions)   # what has been predicted?
    array([1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 0, 2, 2,
           0, 0, 1, 0, 0, 0, 1, 2, 2, 0, 2, 2, 2, 0, 0,
           0, 1, 1, 1, 0, 1, 0, 2, 1, 0, 2, 1, 1, 2, 1,
           2, 0, 1, 1, 2, 2, 2, 1, 1, 1, 1, 0, 1, 1, 1,
           1, 0, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1,
           0, 0, 2, 0, 0, 1, 1, 1, 1, 0, 2, 0, 1, 2])
    >>> print(target)        # what is wanted?
    array([1, 0, 0, 1, 2, 1, 2, 1, 1, 1, 2, 2, 0, 2, 2,
           0, 0, 1, 0, 0, 0, 1, 2, 2, 0, 2, 2, 2, 0, 0,
           0, 1, 1, 2, 0, 1, 0, 2, 1, 0, 2, 1, 1, 2, 1,
           2, 0, 1, 1, 2, 2, 2, 1, 2, 1, 1, 0, 1, 1, 0,
           1, 0, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 0, 0, 1,
           0, 0, 2, 0, 0, 1, 1, 1, 1, 0, 2, 0, 1, 2])
    >>> print(predictions == target)   # whether each prediction is correct or not
    array([ True, False,  True,  True, False,  True, False,
            True,  True,  True, False,  True,  True,  True,
            True,  True,  True,  True,  True,  True,  True,
            True,  True,  True,  True,  True,  True,  True,
            True,  True,  True,  True,  True, False,  True,
            True,  True,  True,  True,  True,  True,  True,
            True,  True,  True,  True,  True,  True,  True,
            True,  True,  True,  True, False,  True,  True,
            True,  True,  True, False,  True,  True,  True,
            True,  True,  True,  True,  True,  True,  True,
            True,  True, False, False,  True,  True,  True,
            True,  True,  True,  True,  True,  True,  True,
            True,  True,  True,  True,  True], dtype=bool)

Using the target and predictions arrays, we can rate our classifier by counting the number of correct predictions. In this particular case, the classifier is 89.89% correct (80 correct out of 89).

1.5 Hypothesis

There exist several common variants of discriminant analysis. The one we have introduced so far is called quadratic discriminant analysis. Indeed, it can be shown quite easily that the decision boundaries are quadric surfaces, such as hyperspheres, hyperboloids, etc. Depending on the hypotheses we make about the data, discriminant analysis comes in different variants.

Linear discriminant analysis. In this variant, we suppose that all the classes have the same covariance matrix, i.e. $\forall k, \ \Sigma_k = \Sigma$. It can be shown that the decision boundaries of this variant are linear manifolds. When the a priori probabilities are equal, this variant assigns a subject x to the nearest class (represented by its mean $\mu_k$) according to the Mahalanobis distance (footnote 4).

4. The Mahalanobis distance is defined as $d(x, y) = \sqrt{(x - y)^\top \Sigma^{-1} (x - y)}$ and, unlike the Euclidean one, it takes into account the dispersion of the distribution.

Naive discriminant analysis. If one supposes that the p features describing the dataset are statistically independent, then one is performing a naive discriminant analysis. In practice, it means the covariance matrices are diagonal. Using the MLE, it means that only the diagonal of $\hat{\Sigma}_k$ is kept.

Euclidean discriminant analysis. This is the simplest variant. One supposes that all the covariance matrices are equal to the same scalar matrix and that the c classes have the same a priori probabilities. That is:

$$
\Sigma_k = \sigma^2 I_p \quad \text{and} \quad \pi_k = \frac{1}{c}, \qquad \forall k \in \{1, \dots, c\}
$$
In this particular case, the decision boundaries are hyperplanes.

Linear vs quadratic discriminant analysis

Let's finish this chapter with a small example showing the difference between linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). We have a dataset of 200 crabs, 100 males and 100 females, and we want to discriminate based on the sex of the animal. Each crab is described by five features (footnote 5). Because we want to visualize the discrimination on a plane, we perform a principal component analysis to reduce the dimension while keeping as much information as possible. Once this is done, we perform an LDA and a QDA; the results are shown in Figure 1.1. Each color represents a sex. The big dots are the crabs assigned to the right class, and the small ones are those assigned to the wrong one. The big black dots are the means, the ellipses are the confidence regions and the black lines are the decision boundaries. We can see why they are called linear and quadratic discriminant analysis.

Figure 1.1: Linear vs. Quadratic discriminant analysis

5. The frontal lobe size, the rear width, the carapace length, the carapace width and the body depth.
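The classification step of this chapter can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' postprobalitiy or predict functions (whose code is not shown in the report): `log_posterior` and `classify` are hypothetical names, the parameters are chosen by hand, and the score is the logarithm of $\pi_k f_k(x)$ from (1.2) and (1.3). Using per-class covariance matrices gives the quadratic variant; forcing a shared covariance matrix gives the linear one.

```python
import numpy as np

def log_posterior(x, pi_k, mu_k, sigma_k):
    """Hypothetical helper: log(pi_k * f_k(x)), with f_k the normal density (1.2)."""
    p = x.shape[0]
    diff = x - mu_k
    _, logdet = np.linalg.slogdet(sigma_k)          # log(det(Sigma_k)), numerically stable
    maha2 = diff @ np.linalg.solve(sigma_k, diff)   # squared Mahalanobis distance
    return np.log(pi_k) - 0.5 * (p * np.log(2 * np.pi) + logdet + maha2)

def classify(X, params):
    """Bayes rule (1.4): pick, for each row of X, the class with the largest score."""
    classes = sorted(params)
    scores = np.array([[log_posterior(x, *params[k]) for k in classes] for x in X])
    return np.array(classes)[scores.argmax(axis=1)]

# Hand-picked parameters (assumed): class -> (pi_k, mu_k, Sigma_k)
mu0, mu1 = np.array([0.0, 0.0]), np.array([4.0, 0.0])
qda_params = {0: (0.5, mu0, np.eye(2)), 1: (0.5, mu1, 4.0 * np.eye(2))}  # one covariance per class
lda_params = {0: (0.5, mu0, np.eye(2)), 1: (0.5, mu1, np.eye(2))}        # shared covariance

X_test = np.array([[0.0, 0.0], [4.0, 0.0], [-6.0, 0.0]])
target = np.array([0, 1, 0])

pred_qda = classify(X_test, qda_params)
pred_lda = classify(X_test, lda_params)
print("QDA:", pred_qda, "accuracy:", np.mean(pred_qda == target))
print("LDA:", pred_lda, "accuracy:", np.mean(pred_lda == target))
```

With these toy parameters the third point, far to the left of both means, is assigned to class 1 by the per-class covariance model (the wider class wins far from the means, so the boundary is curved) and to class 0 by the shared covariance model (the boundary is the hyperplane halfway between the means).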
Chapter 2 Clustering methods

2.1 Goal

A cluster is a set of similar objects. The notion of similarity may vary according to the data. The goal is to partition a given data set into a certain number of clusters.

2.2 Kmeans

Algorithm

K-means clustering is a simple unsupervised algorithm that solves the clustering problem. Its main characteristic is that it is told in advance how many distinct clusters to generate. The main idea is to determine the extent of the clusters from the structure of the data. The algorithm begins with k distinct centroids, where each centroid stands for a cluster. Every point of the set is assigned to the nearest centroid, according to the Euclidean distance. Then, each centroid is moved to the average location of all the points assigned to it. A second round then begins: the distances between the points and the updated centroids are recalculated, and a point is reassigned only if its nearest centroid is no longer the one it currently belongs to. When such a switch occurs, the centroids have to be recalculated. This procedure is repeated until the assignments stop changing, in other words until the clusters do not move any more. The procedure always terminates, because k-means always converges (possibly to a local minimum of its objective function). The number of iterations needed to converge is highly dependent on the initialization of the centroids.
Moreover, the main problem with this algorithm is its complexity. Suppose we have a large dataset of records and want to divide them into 300 clusters. The complexity of the k-means algorithm is $O(n \times k \times i \times f)$, where n is the number of data points, k is the number of clusters, i is the number of iterations and f is the number of features in a record. Clearly, clustering such a dataset takes a long time.

Finally, the goal of the k-means algorithm is to minimize an objective function, which is a squared error function:

$$
J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2
\tag{2.1}
$$

where $\left\| x_i^{(j)} - c_j \right\|^2$ is the squared Euclidean distance between the data point $x_i^{(j)}$ and the centroid $c_j$.

The following steps describe the algorithm:

1. Place k different centroids randomly.
2. Assign each point to the nearest centroid.
3. Move each centroid to the average location of the points that were assigned to it.
4. Repeat steps 2 and 3 until the centroids no longer move.

Example

As a simple illustration of the k-means algorithm, suppose that we have a data set of 200 points and we know that they can be grouped into 3 clusters. Figures 2.1 to 2.3 show the k-means process in action for this example.

Iris classification

In this section, we explain our programme, which uses the k-means algorithm to determine the clusters. It only works with two-dimensional data. In this example, we use Fisher's Iris dataset (footnote 1). We have 150 irises divided into 3 species: Iris setosa, Iris virginica and Iris versicolor. Each flower is described by 4 features, so we need to reduce the dataset to 2 dimensions using a principal component analysis.

1. Fisher's Iris data set is a multivariate data set introduced by Sir Ronald Fisher (1936) as an example of discriminant analysis.
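The reduction to two dimensions is not shown in the report. One possible way to do it, sketched here with NumPy's singular value decomposition, is given below; the function name `pca_project` and the random stand-in data (used instead of the real iris measurements) are assumptions made for illustration only.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Hypothetical helper: project the rows of X onto their first principal components."""
    Xc = X - X.mean(axis=0)                            # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are the principal axes
    return Xc @ Vt[:n_components].T                    # coordinates in the reduced space

# Stand-in data (assumed): 150 subjects described by 4 features, like the iris data
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4)) * np.array([5.0, 2.0, 0.5, 0.1])

Z = pca_project(X)
points = [tuple(row) for row in Z]   # list of (x, y) tuples, one per subject
print(Z.shape)
```

The singular values are returned in decreasing order, so the first two rows of `Vt` are the directions that keep the most variance, which is the sense in which the projection keeps the maximum of information.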
Figure 2.1: Steps 1 to 2

Figure 2.2: Steps 3 to 4

Figure 2.3: Steps 5 to 6
The purpose is to retrieve these three groups of irises. Therefore, we choose three different centroids at random among the points of the iris data set. This way, we do not need to determine the extent of the plane beforehand, and we are sure that the centroids start close to the data.

    from math import sqrt       # used to perform the square root
    from random import sample   # used to perform random sampling without replacement

    def dist_points(a, b):
        # Euclidean distance between two (x, y) points
        # (this helper is not shown in the original listing)
        return sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

    # points: list of (x, y) tuples obtained from the PCA of the iris data
    centroids = sample(points, 3)   # we choose 3 centroids randomly among points

    for i in range(6):              # we repeat the algorithm 6 times
        dict_assign = dict()        # dictionary of assignments (centroid -> points)
        for s in centroids:
            dict_assign[s] = set()

        for pt in points:
            # assign each point to its closest centroid
            closest_centroid = min((dist_points(pt, c), c) for c in centroids)[1]
            dict_assign[closest_centroid].add(pt)

        centroids = set()
        for pts in dict_assign.values():
            if not pts:
                continue
            xm = sum(x for (x, y) in pts) / len(pts)   # average abscissa of the assigned points
            ym = sum(y for (x, y) in pts) / len(pts)   # average ordinate of the assigned points
            centroids.add((xm, ym))                    # move the centroid to the new average

The results are presented in Figure 2.4. We can see that the k-means classification corresponds closely to the real classification, except for some points where the algorithm has difficulty telling its clusters apart. The reason is that the actual classes overlap in our PCA representation.
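The listing above repeats the update a fixed number of times and works on two-dimensional tuples. As a hedged variant, the sketch below vectorises steps 2 to 4 with NumPy for any number of features, stops as soon as the assignments no longer change, and returns the objective (2.1). The function name `kmeans`, its signature and the toy points are assumptions for illustration; the initial centroids are fixed here (one data point per group) only to make the run reproducible, whereas the report samples them at random.

```python
import numpy as np

def kmeans(X, init_centroids, max_iter=100):
    """Hypothetical helper: k-means iterations starting from the given centroids."""
    centroids = np.array(init_centroids, dtype=float)   # step 1 is done by the caller
    labels = None
    for _ in range(max_iter):
        # step 2: squared Euclidean distance of every point to every centroid
        sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                       # step 4: assignments are stable
        labels = new_labels
        # step 3: move each centroid to the mean of its assigned points
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:                        # an empty cluster keeps its centroid
                centroids[j] = members.mean(axis=0)
    objective = ((X - centroids[labels]) ** 2).sum()    # J of equation (2.1)
    return centroids, labels, objective

# Toy data (assumed): three small, well-separated groups of three points each
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [0, 10], [0, 11], [1, 10]], dtype=float)

centroids, labels, J = kmeans(X, X[[0, 3, 6]])
print(labels)
print(centroids)
print(J)
```

With a random initialization, for example `X[np.random.default_rng().choice(len(X), size=3, replace=False)]`, the result can differ from one run to the next, which illustrates the dependence on initialization mentioned in the algorithm description.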
2.3 Minibatch-kmeans

Principle

The idea of this algorithm, derived from k-means, is to use small subsets of the data (mini-batches). For instance, given a large dataset, we only train on a fraction of its records. Thus, it takes less time than the original algorithm. Even though it uses mini-batches, the algorithm aims for clusters that remain a good representation of the whole dataset, and the results are only slightly worse than those of the previous algorithm. All algorithms tend to have the issue of parameter selection. In mini-batch k-means, besides the number of clusters, we have to choose how many iterations to perform and the mini-batch size.

Algorithm

In the first step, the algorithm takes S samples, randomly chosen from the dataset, which form a mini-batch. Then, each sample is assigned to the nearest centroid. The algorithm then updates each centroid by taking the average of the new samples and all the samples previously assigned to it. This gradient-descent-like update decreases the rate of change of a centroid over time, and each update is significantly faster than a full k-means update. These steps are repeated until convergence or until the maximum number of iterations is reached.

2.4 Hierarchical clustering

An alternative method of clustering is hierarchical clustering. This type of algorithm gives a tree as a result: the principle is to build a hierarchy of clusters. There are two different approaches:

- Agglomerative hierarchical clustering is a bottom-up approach that starts with every single object in its own cluster. At each successive iteration, it merges the closest pair of clusters (according to some chosen criterion) until all of the data is in one cluster.
- Divisive clustering is a top-down approach. This variant of hierarchical clustering starts at the top with all documents in one cluster, and this cluster is split using a flat clustering algorithm (footnote 2). These splits are performed until each document is in its own cluster.
The similarity between every pair of clusters must be recalculated at each iteration. That is why this algorithm runs slowly on large datasets.

2. Documents within a cluster should be as similar as possible, and documents in one cluster should be as dissimilar as possible from documents in other clusters.
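The bottom-up approach can be illustrated with a deliberately naive sketch. The report does not say which merging criterion is used; the code below assumes single linkage (the distance between two clusters is the distance between their two closest members), and the function name `agglomerative` and the toy data are hypothetical. It recomputes all pairwise cluster distances at every iteration, which is exactly why this kind of algorithm is slow on large datasets.

```python
import numpy as np

def agglomerative(X, n_clusters=1):
    """Hypothetical helper: naive single-linkage agglomerative clustering.
    Returns the final clusters (lists of row indices) and the merge history."""
    clusters = [[i] for i in range(len(X))]   # every object starts in its own cluster
    merges = []
    while len(clusters) > n_clusters:
        best = None
        # look for the closest pair of clusters (recomputed at each iteration)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))   # one node of the resulting tree
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

# Toy data (assumed): five points on a line
X = np.array([[0.0], [1.0], [5.0], [6.5], [20.0]])

clusters, merges = agglomerative(X, n_clusters=2)
print(clusters)
for left, right, d in merges:
    print(left, "+", right, "at distance", d)
```

Stopping at `n_clusters=1` would record the full tree; stopping earlier, as here, cuts the tree at the requested number of clusters.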
Figure 2.4: Iris classification

Conclusion
More informationEPL451: Data Mining on the Web Lab 5
EPL451: Data Mining on the Web Lab 5 Παύλος Αντωνίου Γραφείο: B109, ΘΕΕ01 University of Cyprus Department of Computer Science Predictive modeling techniques IBM reported in June 2012 that 90% of data available
More informationINF4820. Clustering. Erik Velldal. Nov. 17, University of Oslo. Erik Velldal INF / 22
INF4820 Clustering Erik Velldal University of Oslo Nov. 17, 2009 Erik Velldal INF4820 1 / 22 Topics for Today More on unsupervised machine learning for data-driven categorization: clustering. The task
More informationHsiaochun Hsu Date: 12/12/15. Support Vector Machine With Data Reduction
Support Vector Machine With Data Reduction 1 Table of Contents Summary... 3 1. Introduction of Support Vector Machines... 3 1.1 Brief Introduction of Support Vector Machines... 3 1.2 SVM Simple Experiment...
More informationUnsupervised Learning: Clustering
Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning
More informationINF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering
INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,
More informationClustering and Dimensionality Reduction. Stony Brook University CSE545, Fall 2017
Clustering and Dimensionality Reduction Stony Brook University CSE545, Fall 2017 Goal: Generalize to new data Model New Data? Original Data Does the model accurately reflect new data? Supervised vs. Unsupervised
More information1 Case study of SVM (Rob)
DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how
More informationBased on Raymond J. Mooney s slides
Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit
More informationMini-project 2 CMPSCI 689 Spring 2015 Due: Tuesday, April 07, in class
Mini-project 2 CMPSCI 689 Spring 2015 Due: Tuesday, April 07, in class Guidelines Submission. Submit a hardcopy of the report containing all the figures and printouts of code in class. For readability
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering May 25, 2011 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig Homework
More informationStatistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte
Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,
More informationINF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering
INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Murhaf Fares & Stephan Oepen Language Technology Group (LTG) September 27, 2017 Today 2 Recap Evaluation of classifiers Unsupervised
More informationK-Means and Gaussian Mixture Models
K-Means and Gaussian Mixture Models David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 43 K-Means Clustering Example: Old Faithful Geyser
More informationKernels and Clustering
Kernels and Clustering Robert Platt Northeastern University All slides in this file are adapted from CS188 UC Berkeley Case-Based Learning Non-Separable Data Case-Based Reasoning Classification from similarity
More informationData: a collection of numbers or facts that require further processing before they are meaningful
Digital Image Classification Data vs. Information Data: a collection of numbers or facts that require further processing before they are meaningful Information: Derived knowledge from raw data. Something
More informationClassification: Feature Vectors
Classification: Feature Vectors Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just # free YOUR_NAME MISSPELLED FROM_FRIEND... : : : : 2 0 2 0 PIXEL 7,12
More informationINF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering
INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Erik Velldal & Stephan Oepen Language Technology Group (LTG) September 23, 2015 Agenda Last week Supervised vs unsupervised learning.
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More informationIBL and clustering. Relationship of IBL with CBR
IBL and clustering Distance based methods IBL and knn Clustering Distance based and hierarchical Probability-based Expectation Maximization (EM) Relationship of IBL with CBR + uses previously processed
More informationUnsupervised learning in Vision
Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual
More informationCluster Analysis. Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX April 2008 April 2010
Cluster Analysis Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX 7575 April 008 April 010 Cluster Analysis, sometimes called data segmentation or customer segmentation,
More informationMachine learning - HT Clustering
Machine learning - HT 2016 10. Clustering Varun Kanade University of Oxford March 4, 2016 Announcements Practical Next Week - No submission Final Exam: Pick up on Monday Material covered next week is not
More informationCluster Analysis using Spherical SOM
Cluster Analysis using Spherical SOM H. Tokutaka 1, P.K. Kihato 2, K. Fujimura 2 and M. Ohkita 2 1) SOM Japan Co-LTD, 2) Electrical and Electronic Department, Tottori University Email: {tokutaka@somj.com,
More informationNatural Language Processing
Natural Language Processing Machine Learning Potsdam, 26 April 2012 Saeedeh Momtazi Information Systems Group Introduction 2 Machine Learning Field of study that gives computers the ability to learn without
More informationWhat to come. There will be a few more topics we will cover on supervised learning
Summary so far Supervised learning learn to predict Continuous target regression; Categorical target classification Linear Regression Classification Discriminative models Perceptron (linear) Logistic regression
More informationStats 170A: Project in Data Science Exploratory Data Analysis: Clustering Algorithms
Stats 170A: Project in Data Science Exploratory Data Analysis: Clustering Algorithms Padhraic Smyth Department of Computer Science Bren School of Information and Computer Sciences University of California,
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More informationMSA220 - Statistical Learning for Big Data
MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups
More informationClustering. Supervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More information3. Cluster analysis Overview
Université Laval Multivariate analysis - February 2006 1 3.1. Overview 3. Cluster analysis Clustering requires the recognition of discontinuous subsets in an environment that is sometimes discrete (as
More information10701 Machine Learning. Clustering
171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among
More informationCSE 158. Web Mining and Recommender Systems. Midterm recap
CSE 158 Web Mining and Recommender Systems Midterm recap Midterm on Wednesday! 5:10 pm 6:10 pm Closed book but I ll provide a similar level of basic info as in the last page of previous midterms CSE 158
More informationMachine Learning and Data Mining. Clustering (1): Basics. Kalev Kask
Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationClustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York
Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity
More informationCluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008
Cluster Analysis Jia Li Department of Statistics Penn State University Summer School in Statistics for Astronomers IV June 9-1, 8 1 Clustering A basic tool in data mining/pattern recognition: Divide a
More informationDS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Time: 6:00pm 8:50pm Thu Location: AK 232 Fall 2016 High Dimensional Data v Given a cloud of data points we want to understand
More informationMATH5745 Multivariate Methods Lecture 13
MATH5745 Multivariate Methods Lecture 13 April 24, 2018 MATH5745 Multivariate Methods Lecture 13 April 24, 2018 1 / 33 Cluster analysis. Example: Fisher iris data Fisher (1936) 1 iris data consists of
More informationClustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin
Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014
More informationMore on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization
More on Learning Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization Neural Net Learning Motivated by studies of the brain. A network of artificial
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Kernels and Clustering Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.
More informationPATTERN CLASSIFICATION AND SCENE ANALYSIS
PATTERN CLASSIFICATION AND SCENE ANALYSIS RICHARD O. DUDA PETER E. HART Stanford Research Institute, Menlo Park, California A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS New York Chichester Brisbane
More informationMixture Models and the EM Algorithm
Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is
More informationMachine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim,
More informationCluster Analysis. Summer School on Geocomputation. 27 June July 2011 Vysoké Pole
Cluster Analysis Summer School on Geocomputation 27 June 2011 2 July 2011 Vysoké Pole Lecture delivered by: doc. Mgr. Radoslav Harman, PhD. Faculty of Mathematics, Physics and Informatics Comenius University,
More informationUnsupervised Learning. Clustering and the EM Algorithm. Unsupervised Learning is Model Learning
Unsupervised Learning Clustering and the EM Algorithm Susanna Ricco Supervised Learning Given data in the form < x, y >, y is the target to learn. Good news: Easy to tell if our algorithm is giving the
More informationExpectation Maximization: Inferring model parameters and class labels
Expectation Maximization: Inferring model parameters and class labels Emily Fox University of Washington February 27, 2017 Mixture of Gaussian recap 1 2/26/17 Jumble of unlabeled images HISTOGRAM blue
More informationClustering Lecture 5: Mixture Model
Clustering Lecture 5: Mixture Model Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics
More information