
Chapter 7: Competitive learning, clustering, and self-organizing maps
António R. C. Paiva, EEL 6814, Spring 2008

Outline Competitive learning Clustering Self-Organizing Maps

What is competition in neural networks? Competition means that, given the input, the PEs in a neural network compete for the resources, such as the output. For every input the PEs produce an output, but only the most suitable output is utilized and only the winner PE is updated. As an analogy, consider bidding in the stock market: the stocks are the input, each broker competes by bidding with a value, and the most suitable output is the highest bid!

Why is competition necessary? Competition creates specialization in the network. Specialization means that, through competition, the PEs are tuned for different areas of the input space. In many situations resources are limited, so competition recreates these natural constraints in the environment. Competition is the core concept underlying clustering and self-organizing maps (SOMs).

Characteristics of competitive learning Competitive learning is typically applied to a single-layer topology. Formulations using multi-layer topologies exist but typically employ independent competition on each layer. Competitive learning is unsupervised learning. Competition is by itself a non-linear process and thus difficult to treat mathematically.

Criteria for competitive learning I Error minimization Select the PE whose output yields the minimum error, $y^\ast = \arg\min_{y_i} \| x - y_i \|$. (Notice that the error can be defined with different metrics and depends on the application.) Utilized in the formulation of clustering methods and SOMs.
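As a concrete illustration, here is a minimal NumPy sketch of winner selection under a Euclidean error criterion; the names `x`, `weights`, and `winner` are illustrative and not from the slides:

```python
import numpy as np

def winner(x, weights):
    """Return the index of the PE whose weight vector is closest to x.

    x:       input vector, shape (d,)
    weights: PE weight vectors, one per row, shape (n_pes, d)
    """
    errors = np.linalg.norm(weights - x, axis=1)  # Euclidean error of each PE
    return int(np.argmin(errors))                 # minimum-error PE wins
```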

Criteria for competitive learning II Entropy maximization Select the PE such that, on average, all PEs are equally likely to be the winner. Put differently, a histogram of how many times each PE has won is approximately uniform. Important for density estimation. Depending on the formulation, entropy maximization can be achieved through error minimization.

Outline Competitive learning Clustering Self-Organizing Maps

Clustering Clustering is a particular example of competitive learning, and therefore unsupervised learning. Clustering aims at representing the input space of the data with a small number of reference points. The reference points are called centroids, and each centroid defines a cluster. The difference from PCA is that a cluster is a hard neighborhood. That is, any point in the neighborhood of the reference point is represented by that point.

K-means K-means is perhaps the simplest and most widely used clustering method. K-means minimizes the reconstruction MSE. Cost function: $J = \sum_{i=1}^{K} \sum_{x_j \in C_i} \| y_i - x_j \|^2$, where K is the number of clusters (or centroids), $y_i$ is the i-th centroid, and $C_i$ is the set of input data points within the corresponding cluster.
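For concreteness, the cost J can be evaluated with a short NumPy helper; this is a sketch, with `X`, `centroids`, and `labels` as assumed names:

```python
import numpy as np

def kmeans_cost(X, centroids, labels):
    """J = sum over clusters i and points x_j in C_i of ||y_i - x_j||^2.

    X:         data points, shape (N, d)
    centroids: centroid vectors y_i, shape (K, d)
    labels:    cluster index of each point, shape (N,)
    """
    return float(np.sum((X - centroids[labels]) ** 2))
```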

Optimization of K-means Take the gradient of the cost function with respect to centroid $y_k$: $\frac{\partial J}{\partial y_k} = \sum_{i=1}^{K} \sum_{x_j \in C_i} 2 (y_i - x_j) \frac{\partial (y_i - x_j)}{\partial y_k} = \sum_{x_j \in C_k} 2 (y_k - x_j)$. Setting the gradient to zero gives the fixed-point update rule: $\sum_{x_j \in C_k} x_j = \sum_{x_j \in C_k} y_k = N_k y_k \;\Rightarrow\; y_k = \frac{1}{N_k} \sum_{x_j \in C_k} x_j$.

Algorithm 1. Initialization: select K random data points as the centroids. 2. While the change in J is large: 2.1 Assign each data point to the nearest centroid (i.e., smallest error). 2.2 Compute the new location of the centroids. Note: for finite data, this algorithm is known to converge to a local minimum in a finite number of steps.
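A minimal NumPy sketch of this algorithm follows; it is an illustration rather than the original course code, and the stopping tolerance `tol` and iteration cap `max_iter` are assumed parameters:

```python
import numpy as np

def kmeans(X, K, tol=1e-6, max_iter=100, seed=None):
    """Plain K-means: random data points as initial centroids, then
    alternate assignment and centroid updates until J barely changes."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    prev_cost = np.inf
    for _ in range(max_iter):
        # 2.1 assign each point to the nearest centroid (smallest error)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # 2.2 move each centroid to the mean of its assigned points
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
        cost = np.sum((X - centroids[labels]) ** 2)
        if prev_cost - cost < tol:   # change in J is small -> stop
            break
        prev_cost = cost
    return centroids, labels
```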

Demo From: http://www.neuroinformatik.ruhr-uni-bochum.de/ini/vdm/research/gsn/demogng/lbg_2.html

Clustering and vector quantization I Vector quantization is an important application of clustering in engineering. K-means is equivalent to the Linde-Buzo-Gray (LBG) algorithm well known in vector quantization.

Clustering and vector quantization II Vector quantization is a form of clustering aimed at lossy data compression. Idea: the centroids are known both by the encoder and the decoder. Instead of transmitting the data, send the index of the nearest centroid. At the decoder end, use the centroid itself as an approximation to the original data point.
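A hypothetical encode/decode sketch of this idea, assuming a `codebook` of centroids shared by encoder and decoder (e.g., obtained with K-means/LBG):

```python
import numpy as np

def vq_encode(X, codebook):
    """Encoder: replace each data vector by the index of its nearest centroid.
    Only these indices need to be transmitted."""
    dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def vq_decode(indices, codebook):
    """Decoder: approximate each original vector by the centroid it maps to."""
    return codebook[indices]
```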

Outline Competitive learning Clustering Self-Organizing Maps

Self-Organizing Maps Self-organizing maps (SOMs; also known as Kohonen maps) are another example of competitive learning. The goal of a SOM is to transform the input space into a 1-D or 2-D discrete map in a topologically ordered fashion.

Distinct feature The SOM builds a topology-preserving map. Topology preservation means that data points close in the input space are represented by nearby points on the SOM. Self-organizing means that the competitive learning process finds this topology directly from the data.

Implementation Each step of training involves three processes: 1. Competition: The network computes the winner PE based on some criterion. 2. Cooperation: The winner PE defines a neighborhood of PEs that are updated. 3. Adaptation: All PEs in the neighborhood of the winner PE are adapted to optimize the criterion, weighted by the topological distance to the winner PE. Training of the SOM can be divided into two phases: ordering and convergence.

Cooperation process I Cooperation between neighboring PEs implements lateral interaction between neurons in biological systems. Cooperation is obtained by soft-competition. Although there is still only one winner, a neighborhood of the winner is updated.

Cooperation process II The neighborhood function $h_{i,j}$ must be: symmetric around the origin, which implies that the function is shift-invariant; monotonically decreasing in amplitude to zero; and adjustable in width. A typical choice for the neighborhood function is the Gaussian, $h_{i,j}(n) = \exp\left(-\frac{d_{i,j}^2}{2\sigma^2(n)}\right)$, where $d_{i,j}$ is the topological distance, i.e., the distance in the map.
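In code, the Gaussian neighborhood is a one-liner; the function name and arguments below are illustrative:

```python
import numpy as np

def neighborhood(d, sigma):
    """Gaussian neighborhood h = exp(-d^2 / (2 * sigma^2)),
    where d is the topological (map) distance to the winner PE."""
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))
```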

Cooperation process III Cooperation forces neighboring PEs to tune for neighboring areas of the input space. Cooperation is the process responsible for self-organization (and topology preservation) in SOMs.

Adaptation process I At each step the PEs are adapted according to $w_i(n+1) = w_i(n) + \eta(n)\, h_{i,j}(n)\, (x - w_i(n))$, where $j$ is the index of the winner PE.

Adaptation process II At each epoch, the stepsize η and the width σ of the neighborhood function are reduced according to some rule, for example $\eta(n) = \frac{\eta_0}{1 + \alpha\, n / n_{\max}}$ and $\sigma(n) = \frac{\sigma_0}{1 + \beta\, n / n_{\max}}$. Decreasing η acts as a form of simulated annealing. Decreasing σ means that cooperation exists at first but, towards the end, the PEs fine-tune to their specific areas of the input space.
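Putting competition, cooperation, and adaptation together, a minimal sketch of training a 1-D SOM with these decay schedules might look as follows; the default constants (eta0, sigma0, alpha, beta) are assumed values, not taken from the slides:

```python
import numpy as np

def train_som_1d(X, n_pes, n_epochs, eta0=0.5, sigma0=None,
                 alpha=10.0, beta=10.0, seed=None):
    """Minimal 1-D SOM: competition, Gaussian cooperation, and adaptation
    with a decaying stepsize and neighborhood width."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    w = rng.standard_normal((n_pes, d)) * 0.1        # initial weight vectors
    sigma0 = n_pes / 2.0 if sigma0 is None else sigma0
    positions = np.arange(n_pes)                     # topological coordinates
    for n in range(n_epochs):
        eta = eta0 / (1.0 + alpha * n / n_epochs)      # stepsize schedule
        sigma = sigma0 / (1.0 + beta * n / n_epochs)   # neighborhood width schedule
        for x in X[rng.permutation(len(X))]:
            j = np.argmin(np.linalg.norm(w - x, axis=1))              # competition
            h = np.exp(-((positions - j) ** 2) / (2.0 * sigma ** 2))  # cooperation
            w += eta * h[:, None] * (x - w)                           # adaptation
    return w
```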

Demo From: http://www.neuroinformatik.ruhr-uni-bochum.de/ini/vdm/research/gsn/demogng/som.html

Applications Visualization of high-dimensional data or processes. Density estimation.