Unsupervised Learning

Pattern Recognition, 2014

Contents: Clustering (Single Linkage, K-Means, Soft Clustering, DBSCAN); PCA; Networks for Unsupervised Learning (Kohonen Maps, Learning Vector Quantization).

Problems/Approaches in Machine Learning: Supervised Learning, Unsupervised Learning, Reinforcement Learning.

In supervised learning, every input is provided with an output. It is a function-approximation problem, and there is feedback on what the response should be (the target output).

In unsupervised learning there are no outputs, only input data. The problem is to describe the nature of the data and/or infer its internal structure.

A semi-supervised approach also exists: it addresses how to learn from both labelled and unlabelled data.

In reinforcement learning there is no supervisor. Learning is based on interaction with the environment, through rewards and penalties. It is about learning states, actions and their effects.

Unsupervised learning refers to any machine learning process that seeks to learn structure in the absence of either an identified output (supervised learning) or feedback (reinforcement learning). Three typical examples of unsupervised learning are clustering, association rules and self-organizing maps. (Encyclopedia of Machine Learning, Sammut & Webb, 2010)

Unsupervised Learning in the Literature. Machine learning includes many topics, such as pattern recognition, classification, signal processing, and probability and statistics, and it has many applications, so authors present their own approach in books and courses. Here are some examples of how unsupervised learning topics are presented in the literature:

Bishop, Pattern Recognition and Machine Learning: Ch. 9 Mixture Models and EM (K-means, MoG, EM); Ch. 12 Continuous Latent Variables (PCA, probabilistic and kernel PCA, NLCA, ICA). Duda, Pattern Classification: Ch. 3 Maximum-Likelihood and Bayesian Parameter Estimation (EM); Ch. 10 Unsupervised Learning and Clustering (mixtures, K-means, hierarchical clustering, component analysis: PCA, NLCA, ICA, SOMs).

Theodoridis, Pattern Recognition: Ch. 6 Feature Selection (KLT/PCA, NLPCA, ICA); Ch. 11 Clustering: Basic Concepts; Ch. 12 Clustering Algorithms I: Sequential Algorithms; Ch. 13 Clustering Algorithms II: Hierarchical Algorithms; Ch. 14 Clustering Algorithms III: Schemes Based on Function Optimization; Ch. 15 Clustering Algorithms IV; Ch. 16 Cluster Validity.

Marques, Reconhecimento de Padrões (Pattern Recognition): Ch. 3 EM; Ch. 4 Unsupervised Classification (K-means, VQ, hierarchical clustering, SLC); Ch. 5 Neural Networks (Kohonen maps). Melin, Hybrid Intelligent Systems for Pattern Recognition Using Soft Computing: Ch. 5 Unsupervised Learning Neural Networks (Kohonen, LVQ, Hopfield); Ch. 8.

Webb, Statistical Pattern Recognition: Ch. 2 Density Estimation - Parametric (EM); Ch. 9 Feature Extraction and Selection; Ch. 10 Clustering. Rencher, Methods of Multivariate Analysis: Ch. 12 Principal Component Analysis; Ch. 14 Cluster Analysis.

Unsupervised learning topics: Clustering (Single Linkage, K-means, Soft Clustering, DBSCAN); PCA (NLCA, ICA, SVD); Networks for Unsupervised Learning (Kohonen, LVQ, Hopfield).

Clustering: take a set of objects and put them into groups in such a way that objects in the same group or cluster are more similar to each other than to those in other groups or clusters.

Our first clustering task (example figures).

OCD clustering (example figures).

Clustering: Basic Problem. Given: a set of objects X and an inter-object distance matrix D, with D(x, y) = D(y, x) for all x, y ∈ X. Output: a partition assignment P_D with P_D(x) = P_D(y) if x and y are in the same cluster. Trivial clusterings: P_D(x) = 1 for all x ∈ X (one big cluster), or P_D(x) = x for all x ∈ X (each object in its own cluster).

Single Linkage: Start with each object as its own cluster (n clusters). Define the inter-cluster distance as the distance between the closest two points in the two clusters. Merge the two closest clusters. Repeat n - k times to obtain k clusters.
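A minimal sketch of this single-linkage procedure in Python with NumPy; the toy dataset and the choice k = 2 are illustrative assumptions, not part of the original slides.

import numpy as np

def single_linkage(X, k):
    """Agglomerative clustering with single (nearest-point) linkage."""
    n = len(X)
    clusters = [[i] for i in range(n)]                         # start: each object is a cluster
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)       # inter-object distance matrix
    while len(clusters) > k:                                    # repeat n - k times
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest pair of points
                d = D[np.ix_(clusters[a], clusters[b])].min()
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] += clusters[b]                              # merge the two closest clusters
        del clusters[b]
    return clusters

# Toy usage with a hypothetical 2-D dataset
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]], dtype=float)
print(single_linkage(X, k=2))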

K-means: Pick k centers at random. Each center claims its closest points. Recompute each center by averaging its clustered points. Repeat until convergence.

K-means notation: P_t(x) is the partition/cluster of object x at step t; C_i^t is the set of points in cluster i, i.e. {x | P_t(x) = i}; Center_i^t = (1/|C_i^t|) Σ_{y ∈ C_i^t} y, the mean of the points in cluster i.

K-means updates: initialise Center_i^0; assignment step: P_t(x) = argmin_i ||x - Center_i^{t-1}||^2; update step: Center_i^t = (1/|C_i^t|) Σ_{y ∈ C_i^t} y.
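The assignment and update steps above can be sketched directly in NumPy; the random initialisation, the convergence test, and the handling of empty clusters are illustrative assumptions.

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]        # pick k centers at random
    for _ in range(max_iter):
        # Assignment step: P(x) = argmin_i ||x - center_i||^2
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # Update step: center_i = mean of the points assigned to cluster i
        new_centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                else centers[i] for i in range(k)])
        if np.allclose(new_centers, centers):                      # repeat until convergence
            break
        centers = new_centers
    return centers, labels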

Soft Clustering: allows a point to be shared by two clusters, assuming that the data was generated by: select one of K Gaussians uniformly at random (fixed, known variance); sample x_i from that Gaussian; repeat n times.

Soft Clustering task: find a hypothesis h = <μ_1, μ_2, ..., μ_k> that maximises the probability of the data (maximum likelihood).

Expectation Maximization. Expectation step: E[z_ij] = P(x = x_i | μ = μ_j) / Σ_{l=1}^{k} P(x = x_i | μ = μ_l). Maximization step: μ_j = Σ_i E[z_ij] x_i / Σ_i E[z_ij], where P(x = x_i | μ = μ_j) = e^{-(1/(2σ^2))(x_i - μ_j)^2}.
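A sketch of these E and M steps for a 1-D mixture of K Gaussians with fixed, known variance (soft K-means). The data, K, σ, and the number of iterations are illustrative assumptions.

import numpy as np

def soft_kmeans_em(x, K, sigma=1.0, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=K, replace=False)              # initial hypothesis <mu_1, ..., mu_K>
    for _ in range(n_iter):
        # E-step: E[z_ij] proportional to exp(-(x_i - mu_j)^2 / (2 sigma^2)), normalised over j
        p = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * sigma ** 2))
        z = p / p.sum(axis=1, keepdims=True)
        # M-step: mu_j = sum_i E[z_ij] x_i / sum_i E[z_ij]
        mu = (z * x[:, None]).sum(axis=0) / z.sum(axis=0)
    return mu

# Toy usage with hypothetical 1-D data drawn from two Gaussians
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(6, 1, 100)])
print(soft_kmeans_em(x, K=2))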

DBSCAN: Density-Based Spatial Clustering of Applications with Noise. Proposed by Ester et al. in 1996 at the KDD conference [Ester96].

DBSCAN - Explanation: The algorithm needs two parameters, Eps (ε) and minPts. Two types of points are defined: core points and border points. A point p is a core point if its Eps-neighbourhood contains at least minPts points. (Figure: types of points [Ester96])

DBSCAN - Algorithm: DBSCAN starts at an arbitrary point p and checks whether p's Eps-neighbourhood contains at least minPts points. If it does, p is a core point and belongs to a cluster: assign the current clusterId to p, to its neighbours, to the neighbours of its neighbours, and so on, then increase clusterId. If it does not, p is labelled as noise. Continue with another unlabelled point until all points in the dataset are labelled.
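A compact sketch of this procedure using brute-force neighbourhood queries; the label encoding and parameter values are illustrative assumptions.

import numpy as np

NOISE, UNLABELLED = -1, 0

def dbscan(X, eps, min_pts):
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)   # all pairwise distances
    labels = np.full(n, UNLABELLED)
    cluster_id = 0
    for p in range(n):
        if labels[p] != UNLABELLED:
            continue
        neighbours = np.flatnonzero(D[p] <= eps)
        if len(neighbours) < min_pts:            # not (yet) reachable from a core point: noise
            labels[p] = NOISE
            continue
        cluster_id += 1                           # p is a core point: start a new cluster
        labels[p] = cluster_id
        seeds = list(neighbours)
        while seeds:                              # expand to neighbours of neighbours, and so on
            q = seeds.pop()
            if labels[q] == NOISE:
                labels[q] = cluster_id            # former noise point becomes a border point
            if labels[q] != UNLABELLED:
                continue
            labels[q] = cluster_id
            q_neigh = np.flatnonzero(D[q] <= eps)
            if len(q_neigh) >= min_pts:           # q is also a core point: keep expanding
                seeds.extend(q_neigh)
    return labels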

PCA: Also referred to as the Karhunen-Loeve transform, PCA is a statistical procedure that converts a set of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

PCA Procedure: Let φ_k be the eigenvector corresponding to the k-th eigenvalue λ_k of the covariance matrix Σ_x, i.e. Σ_x φ_k = λ_k φ_k (k = 1, ..., N). As the covariance matrix Σ_x is Hermitian, the eigenvectors φ_i are orthogonal.

PCA Procedure: We can define the matrix Φ = [φ_1, ..., φ_N], satisfying Φ^T Φ = I, i.e. Φ^{-1} = Φ^T. The N eigen-equations above can be combined and expressed as Σ_x Φ = Φ Λ, where Λ = diag(λ_1, ..., λ_N).

PCA Procedure: Now, given a signal vector x, the Karhunen-Loeve transform is defined as y = Φ^T x, where the i-th component y_i of the transformed vector is the projection of x onto φ_i: y_i = <φ_i, x> = φ_i^T x.
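The whole procedure fits in a few lines of NumPy; centering the data, sorting by decreasing eigenvalue, and the toy correlated dataset are illustrative assumptions.

import numpy as np

def klt(X):
    """PCA / Karhunen-Loeve transform of a data matrix X (rows = samples)."""
    Xc = X - X.mean(axis=0)                      # center the data
    cov = np.cov(Xc, rowvar=False)               # covariance matrix Sigma_x
    eigvals, Phi = np.linalg.eigh(cov)           # solves Sigma_x Phi = Phi Lambda (symmetric case)
    order = np.argsort(eigvals)[::-1]            # sort by decreasing eigenvalue
    eigvals, Phi = eigvals[order], Phi[:, order]
    Y = Xc @ Phi                                 # y = Phi^T x for every sample
    return Y, Phi, eigvals

# Toy usage with hypothetical correlated 2-D data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.2, 0.3]])
Y, Phi, lam = klt(X)
print(lam)                                       # variances along the principal components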

Networks for Unsupervised Learning: Unsupervised learning neural networks attempt to learn to respond to different input patterns with different parts of the network. The network is often trained to strengthen its firing in response to frequently occurring patterns. In this manner, the network develops certain internal representations for encoding input patterns.

Competitive Networks: With no information available regarding the desired outputs, unsupervised learning networks update their weights only on the basis of the input patterns. The competitive learning network is a popular scheme to achieve this type of unsupervised data clustering or classification.

(Figure: competitive learning network [MallinXX])

Input vector: X = [x_1, x_2, x_3]^T. Weight vector for output unit j: W_j = [w_1j, w_2j, w_3j]^T. Activation value for output unit j: a_j = Σ_{i=1}^{3} x_i w_ij = X^T W_j = W_j^T X.

Competitive Networks: The output unit with the highest activation is selected for further processing. Assuming that output unit k has the maximal activation, the weights leading to this unit are updated according to the competitive, or so-called winner-take-all, learning rule: w_k(t+1) = (w_k(t) + η(x(t) - w_k(t))) / ||w_k(t) + η(x(t) - w_k(t))||.

Competitive Networks: Using the Euclidean distance as a dissimilarity measure gives a more general scheme of competitive learning, in which the activation of output unit j is a_j = (Σ_{i=1}^{3} (x_i - w_ij)^2)^{1/2} = ||x - w_j||. The weights of the output unit with the smallest activation are updated according to: w_k(t+1) = w_k(t) + η(x(t) - w_k(t)).
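A sketch of on-line competitive learning with the Euclidean-distance rule above; the number of units, the linearly decaying learning rate, and the data-based initialisation are illustrative assumptions (see the discussion of learning-rate schedules below).

import numpy as np

def competitive_learning(X, n_units, epochs=20, eta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=n_units, replace=False)].copy()    # initial weight vectors
    for epoch in range(epochs):
        lr = eta * (1.0 - epoch / epochs)                             # progressively smaller eta
        for x in rng.permutation(X):
            k = np.argmin(np.linalg.norm(x - W, axis=1))              # winner: smallest activation
            W[k] += lr * (x - W[k])                                   # winner-take-all update
    return W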

Here, two metrics are introduced: the similarity measure of the inner product and the dissimilarity measure of the Euclidean distance. Obviously, other metrics can be used instead, and different selections lead to different clustering results.

When the Euclidean distance is adopted, it can be proved that the update formula is actually an on-line version of gradient descent minimizing the following objective function: E = Σ_p ||w_{f(x_p)} - x_p||^2, where f(x_p) is the winning neuron when input x_p is presented, and w_{f(x_p)} is the center of the class to which x_p belongs.

Dynamically changing the learning rate η in the weight update formula is generally desirable. A large initial value of η explores the data space widely; later on, a progressively smaller value refines the weights. If the output units of a competitive learning network are arranged in a geometric manner (such as in a one-dimensional vector or a two-dimensional array), we can update the weights of the winner as well as those of its neighbouring losers.

Kohonen Maps: Kohonen self-organizing networks, also known as Kohonen feature maps or topology-preserving maps, are another competition-based network paradigm for data clustering (Kohonen, 1982, 1984). Networks of this type impose a neighborhood constraint on the output units, such that a certain topological property of the input data is reflected in the output units' weights.

(Figure: Kohonen map with 2 input and 49 output units.) The learning procedure of Kohonen feature maps is similar to that of competitive learning networks: a similarity (dissimilarity) measure is selected, and the winning unit is considered to be the one with the largest (smallest) activation.

Training: Select the winning output unit c by comparing all weight vectors w_i with the input vector x: ||x - w_c|| = min_i ||x - w_i||.

Training: Let NB_c denote a set of indices corresponding to a neighborhood around the winner c. The weights are updated by: Δw_i = η(x - w_i), i ∈ NB_c, where η is a small positive learning rate. It is also possible not to define a neighborhood set and to use a Gaussian function instead: Ω_c(i) = exp(-||p_i - p_c||^2 / (2σ^2)), Δw_i = η Ω_c(i)(x - w_i), where p_i and p_c are the positions of units i and c on the map.
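A minimal sketch of this training loop for a one-dimensional map, using the Gaussian neighbourhood Ω_c(i); the map size, the learning-rate and σ decay schedules, and the random weight initialisation are illustrative assumptions.

import numpy as np

def train_som_1d(X, n_units, epochs=30, eta=0.5, sigma=2.0, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_units, X.shape[1]))        # one weight vector per map unit
    pos = np.arange(n_units, dtype=float)             # unit positions p_i on the 1-D map
    for epoch in range(epochs):
        lr = eta * (1.0 - epoch / epochs)              # shrinking learning rate
        sig = sigma * (1.0 - epoch / epochs) + 0.1     # shrinking neighbourhood width
        for x in rng.permutation(X):
            c = np.argmin(np.linalg.norm(x - W, axis=1))              # winning unit
            omega = np.exp(-(pos - pos[c]) ** 2 / (2 * sig ** 2))     # Gaussian neighbourhood
            W += lr * omega[:, None] * (x - W)                        # delta w_i = eta Omega_c(i)(x - w_i)
    return W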

LVQ: Learning Vector Quantization (LVQ) is an adaptive data classification method based on training data with desired class information (Kohonen, 1989). Although a supervised training method, LVQ employs unsupervised data-clustering techniques (e.g., competitive learning) to preprocess the data set and obtain cluster centers. LVQ's network architecture closely resembles that of a competitive learning network, except that each output unit is associated with a class.

(Figure: LVQ network representation.) Figure 5.11 presents an example where the input dimension is 2 and the input space is divided into four clusters. Two of the clusters belong to class 1, and the other two belong to class 2.

Training: The LVQ learning algorithm involves two steps. In the first step, an unsupervised data clustering method is used to locate several cluster centers without using the class information. In the second step, the class information is used to fine-tune the cluster centers to minimize the number of misclassified cases.

Training: Initialise the cluster centers with a clustering method. Label each cluster by voting. Randomly select a training input vector x and find k such that ||x - w_k|| is minimal. If x and w_k belong to the same class, update w_k by Δw_k = η(x - w_k); otherwise, update w_k by Δw_k = -η(x - w_k). The learning rate η is a small positive constant and should decrease in value with each iteration.
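A sketch of this fine-tuning loop (LVQ1), taking already-initialised prototypes and their voted class labels as inputs; the linearly decaying learning rate and the epoch count are illustrative assumptions.

import numpy as np

def train_lvq1(X, y, prototypes, proto_labels, epochs=30, eta=0.3, seed=0):
    """prototypes: initial cluster centers; proto_labels: class assigned to each center by voting."""
    rng = np.random.default_rng(seed)
    W = prototypes.copy()
    for epoch in range(epochs):
        lr = eta * (1.0 - epoch / epochs)                    # learning rate decreases each iteration
        for i in rng.permutation(len(X)):
            x, cls = X[i], y[i]
            k = np.argmin(np.linalg.norm(x - W, axis=1))     # nearest prototype w_k
            if proto_labels[k] == cls:
                W[k] += lr * (x - W[k])                      # same class: move prototype towards x
            else:
                W[k] -= lr * (x - W[k])                      # different class: move it away from x
    return W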

LVQ Example. (Figures: initial approximation of LVQ; final classification achieved.)

Code available at: http://github.com/gustavovelascoh/unsupervised-learning-class