Artificial Neural Networks Unsupervised learning: SOM

Artificial Neural Networks Unsupervised learning: SOM. Jan Drchal drchajan@fel.cvut.cz Computational Intelligence Group Department of Computer Science and Engineering Faculty of Electrical Engineering Czech Technical University in Prague

Outline Competitive learning. Self-organization, Vector Quantization, Cluster Analysis. SOM architecture and learning. SOM visualizations. SOM evaluation.

Competitive Learning Nature-inspired. No arbiter is needed (unsupervised learning). Individuals (units, neurons) learn from examples. The system self-organizes. Now we are going to apply this to cluster analysis.

SOM SOM = Self-Organizing Maps. Prof. Teuvo Kohonen, Finland, TU Helsinki, 1981; several thousand scientific publications since...

SOM Kohonen's Application Original application: the phonetic typewriter for the Finnish language.

SOM Inspiration The brain represents the world in a topological way. Exterior spatial relations are mapped to similar spatial relations in the brain: e.g., signals from the hand and the arm are processed nearby.

SOM Inspiration II Bear, Connors & Paradiso (2001). Neuroscience: Exploring The Brain. Pg. 474.

SOM Overview Single layer, feed-forward. Unsupervised, self-organizing. No output layer; instead, winner-takes-all. Used for cluster analysis. Performs vector quantization. Not a classifier! But it can easily be transformed into one by adding another layer.

What is Self-Organization? Self-organization of a system is a process which improves the quality of its inner configuration without using any information from outside. Self-organization clears up relationships between parts of a system.

What is Cluster Analysis Assignment of a set of observations into subsets (clusters). A measure of similarity is defined: observations in the same cluster are similar, observations in different clusters are dissimilar. Classic cluster analysis works with observations in an R^n input space. See: http://en.wikipedia.org/wiki/cluster_analysis

What is Vector Quantization The goal of Vector Quantization is to approximate the probability density p(x) of real input vectors x ∈ R^n using a finite number of representatives w_i ∈ R^n. The representative vectors tend to drift to where the data is dense, while there tend to be only a few of them where the data is sparsely located. In this manner, the net tends to approximate the probability density of the input data. Hollmen '96
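A minimal sketch of this idea, not the SOM procedure: here the representatives are fitted with simple k-means-style batch updates, so they end up concentrated in dense regions. All names, the number of representatives, and the iteration count are illustrative assumptions.

```python
import numpy as np

def vector_quantize(X, k=4, iters=50, seed=0):
    """Fit k representatives to the data X (one row per input vector)."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=k, replace=False)]      # initial representatives
    for _ in range(iters):
        # assign every input vector to its nearest representative
        d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
        nearest = d.argmin(axis=1)
        # move each representative to the mean of the vectors it represents
        for j in range(k):
            if np.any(nearest == j):
                W[j] = X[nearest == j].mean(axis=0)
    return W

X = np.random.default_rng(1).normal(size=(200, 2))        # toy 2-D data
print(vector_quantize(X, k=4))                             # 4 representatives
```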

Vector Quantization Example Blue points are the input vectors. Red points are the representatives.

VQ by SOM 1D SOM of 4 neurons

Why the Different Result? SOM works with a neighbourhood. The representatives influence each other. They form an elastic structure: a chain for a 1D SOM, a mesh for higher dimensions.

SOM Architecture 1/3 Typically: a 2D mesh of representatives (neurons).

SOM Architecture 2/3 Arrangements: 1D linear quite often, 2D mesh most frequently, 3D (and higher dimensions) only exceptionally, as visualization becomes problematic. The arrangement defines the neighbourhood of a neuron. Kohonen suggests: rectangular SOM!

SOM Architecture 3/3 The input vector x has dimension N. Each neuron has a weight vector w of the same dimension N. The weight vectors of all neurons are compared to x. The most similar one is chosen as the BMU (Best Matching Unit); the BMU becomes the representative of vector x. (Figure: Kohonen layer, neuron i with weight vector w_i, winning neuron = BMU.)

SOM Neuron 1/2 Evaluates the similarity of the input vector x and its weight vector w_i. Similarity measure: e.g. Euclidean distance. The neuron most similar to the input vector is chosen as the BMU: j* = argmin_i { ‖x − w_i‖ }. A SOM neuron is a representative of a cluster.

SOM Neuron 2/2 Note, we don't have to use the Euclidean distance. We can use directional similarity expressed by the dot product: j* = argmax_i { x(t)^T w_i(t) }. Why max here? Note: A·B = ‖A‖ ‖B‖ cos θ. http://en.wikipedia.org/wiki/dot_product
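A small sketch of BMU selection for a single input vector, showing both similarity measures from these slides; the weight matrix W (one row per neuron) and all names are illustrative.

```python
import numpy as np

def bmu_euclidean(x, W):
    # j* = argmin_i ||x - w_i||
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))

def bmu_dot_product(x, W):
    # j* = argmax_i x^T w_i  (hence argmax: a larger dot product means more similar)
    return int(np.argmax(W @ x))

W = np.array([[0.27, 0.81], [0.42, 0.70], [0.43, 0.21]])
x = np.array([0.52, 0.12])
print(bmu_euclidean(x, W))   # -> 2 (the third neuron, as in the example slide below)
```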

Learning SOM Initialization (random weights). Apply an input pattern x = (x_1, x_2, ..., x_n). Compute the distances. Select the BMU neuron j. Adjust the weights of all neurons i: w_i(t+1) = w_i(t) + η_ij(t) [x(t) − w_i(t)], where η_ij(t) is the neighbourhood function. Continue with the next pattern.
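A compact training-loop sketch following these steps, assuming a 2D rectangular grid and the Gaussian neighbourhood defined a few slides below. The grid size, decay schedules, and iteration counts are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def train_som(X, rows=10, cols=10, epochs=20, alpha0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    n_neurons, dim = rows * cols, X.shape[1]
    W = rng.random((n_neurons, dim))                        # random initial weights
    # grid coordinates r_i of every neuron (used in the neighbourhood term)
    r = np.array([(i // cols, i % cols) for i in range(n_neurons)], dtype=float)
    t, t_max = 0, epochs * len(X)
    for _ in range(epochs):
        for x in rng.permutation(X):
            frac = t / t_max
            alpha = alpha0 * (1.0 - frac)                   # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5             # shrinking neighbourhood
            j = np.argmin(np.linalg.norm(W - x, axis=1))    # BMU
            grid_dist2 = np.sum((r - r[j]) ** 2, axis=1)
            eta = alpha * np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            W += eta[:, None] * (x - W)                     # w_i(t+1) = w_i(t) + eta_ij(t)[x - w_i(t)]
            t += 1
    return W, r

X = np.random.default_rng(1).random((500, 2))               # toy data in the unit square
W, r = train_som(X)
```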

Example X = (0.52, 0.12), W_1 = (0.27, 0.81), W_2 = (0.42, 0.70), W_3 = (0.43, 0.21). The third vector is the winner (BMU).

Example contd. Let's move the neuron closer to the input pattern: w_ij(t+1) = w_ij(t) + η(t) [x_i(t) − w_ij(t)]. With η(t) = 0.1: Δw_13 = η(t)(x_1 − w_13) = 0.1(0.52 − 0.43) ≈ 0.01, Δw_23 = η(t)(x_2 − w_23) = 0.1(0.12 − 0.21) ≈ −0.01, so W_3(t+1) = W_3(t) + ΔW_3(t) = (0.43 + 0.01, 0.21 − 0.01) = (0.44, 0.20). We adjusted only the BMU weights. Here, the winner takes all.
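A quick numeric check of this worked example (winner-takes-all update of the BMU only); note the slide values are rounded to two decimals.

```python
import numpy as np

x   = np.array([0.52, 0.12])
W3  = np.array([0.43, 0.21])
eta = 0.1

delta  = eta * (x - W3)        # [0.009, -0.009], shown rounded as +-0.01 on the slide
W3_new = W3 + delta
print(np.round(W3_new, 2))     # [0.44, 0.20]
```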

What About Updating Also Neurons in the Neighbourhood?

Neighbourhood for SOM Timo Honkela (Description of Kohonen's Self-Organizing Map)

Common Neighbourhoods: square and hexagonal. (T. Kohonen: Self-Organizing Maps)

Learning SOM II The neighbourhood plays an important role when learning a SOM: topological arrangement, neighbour distances. The neighbourhood changes in time: its diameter decreases (towards zero). The change is realised by the neighbourhood function η_ij(t).

Gaussian Neighbourhood Neighbourhood function for neuron i: η_ij*(t) = α(t) · exp( −‖r_j* − r_i‖² / (2σ²(t)) ), where j* is the BMU, r_i is the position of neuron i in the map, and α(t) is the learning rate. The exp expression represents the neighbourhood shape.

Gaussian Neighbourhood Distance-related learning.

Neighbourhood Related Functions Hollmen '96, MSc.

Learning Process During learning, the BMU (and its neighbours) is adapted to get closer to the input pattern which has caused its activation. Neurons move towards the input pattern. What influences the magnitude of this shift?

SOM Applications To visualize data. To cover the input space by representatives. 1D 2D

Or... (Slide by Johan Everts)

Example: Learning Dot-Product SOM

More Examples Covering a triangle by 1D SOM. T. Kohonen: Self Organizing Maps

More Examples contd. Covering a square by 2D SOM. T. Kohonen: Self Organizing Maps

Possible Problem: Knots This problem is not likely to be corrected by further learning if the plasticity is low: Rojas: Neural Networks - A Systematic Introduction

What is the Cause? There are many: Random initialization of weights: we are unable to change a bad initial orientation of the vectors. Choice of the neighbourhood function. Scheduling of the neighbourhood modification in time. The input data, of course...

What Can Help? Initializing all neurons with the same weights: each neuron then has the same chance to represent a pattern. Adding random noise to the input patterns at the start. Lateral inhibition...

Lateral Inhibition When choosing the BMU we do not pick an isolated winner. The choice depends not only on the activation of a single neuron but also on the activity of its neighbours...

Lateral Inhibition II I_j = I_j^l + I_j^f (response of the j-th neuron), I_j^l = d_j (local response: distance from the input vector), I_j^f = Σ_k g_jk I_k (neighbourhood response: lateral-inhibition interaction g_jk with neighbouring neurons k).

Lateral Inhibition Functions (figure: g(x) for the biological and the simplified variant).

SOM Visualization How to visualize the representatives? Weight dimension = input vector dimension. How to show them in 2D? U-matrix, P-matrix, PCA (linear projection), Sammon's projection (non-linear).

U-matrix (UMAT) Visualizes distances between neurons: dark colouring between neurons means a large distance, light means the neurons are close in the input space. Dark gaps separate clusters. The colour of a neuron reflects the distance of its weight vector to all other weight vectors; again, dark = large distance, light = close distance.
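A simple U-matrix sketch under the assumption of a rectangular grid: each neuron's value is the average distance between its weight vector and the weight vectors of its grid neighbours. W and the grid shape are assumed to come from a trained SOM such as the training-loop sketch earlier.

```python
import numpy as np

def u_matrix(W, rows, cols):
    U = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            w = W[i * cols + j]
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # 4-neighbourhood
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(w - W[ni * cols + nj]))
            U[i, j] = np.mean(dists)
    # large values (dark) mark cluster borders, small values (light) cluster interiors
    return U
```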

U-matrix Example (figure: data, neurons, distances between adjacent neurons).

P-matrix (Pareto Density Estimation) Shows the number of input-space vectors which fall into a sphere centered at the neuron's weight vector. Visualizes data density. Neurons with a high value belong to dense areas of the input space; neurons with a low value are lonesome. Valleys separate the clusters (which appear as plateaus).
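A sketch of computing one P-matrix value per neuron: count the input vectors inside a sphere around its weight vector. The fixed radius is an illustrative assumption; the Pareto radius estimation itself is not reproduced here.

```python
import numpy as np

def p_matrix(W, X, radius=0.2):
    # distance of every neuron's weight vector to every input vector: neurons x inputs
    d = np.linalg.norm(X[None, :, :] - W[:, None, :], axis=2)
    # high count = dense region of the input space, low count = lonesome neuron
    return (d <= radius).sum(axis=1)
```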

Feature Plots A feature plot shows the value of a single component (feature) of the weight vectors. It can be used to check whether two components correlate. http://en.wikipedia.org/wiki/self-organizing_map

Drawbacks of UMAT, PMAT Only distances between neighbours. New learning on the same data may give different results (e.g. a 90-degree rotation). Not intuitive. How can we show high-dimensional data in 2D (3D) while keeping a notion of the original distances?

PCA Principal Component Analysis. A linear transformation to a new coordinate system such that the 1st coordinate (principal component) captures the greatest variance of any projection of the data, the 2nd coordinate the 2nd greatest variance, etc. Dimension reduction: use only the first N coordinates and throw the rest away...
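A minimal PCA sketch via SVD: center the data and project it onto the first few principal directions (the greatest-variance directions). Names are illustrative.

```python
import numpy as np

def pca_project(X, n_components=2):
    Xc = X - X.mean(axis=0)                       # centre the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # coordinates in the new system
```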

Principal Components Example (figure: 1st PC, 2nd PC, original vs. new coordinate system). http://en.wikipedia.org/wiki/principal_component_analysis

Sammon's Projection Non-linear reduction of a higher-dimensional space to a lower-dimensional space. Tries to preserve distances: the energy function is low when the distances are similar in both spaces. E = (1 / Σ_{i<j} d*(i,j)) · Σ_{i<j} (d*(i,j) − d(i,j))² / d*(i,j), where d*(i,j) is the distance in the high-dimensional space and d(i,j) the distance in the low-dimensional space. The energy function is subject to minimization (originally using gradient descent). (Figure: projection from 3D to 2D.)
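A sketch of evaluating Sammon's stress (the energy function above) for a given low-dimensional embedding Y of high-dimensional data X; the actual projection would minimize this value, e.g. by gradient descent, which is not shown here. Names and the epsilon guard are assumptions.

```python
import numpy as np

def sammon_stress(X, Y, eps=1e-12):
    n = len(X)
    iu = np.triu_indices(n, k=1)                                          # all pairs i < j
    d_star = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)[iu]    # high-dim distances
    d      = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)[iu]    # low-dim distances
    return np.sum((d_star - d) ** 2 / (d_star + eps)) / np.sum(d_star)
```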

Standard SOM Visualizations UMAT and Sammon's projection (neuron weights projected to 2D, neighbours connected).

SOM Applications Detection of similar images. http://www.generation5.org/content/2007/kohonenimage.asp

ReefSOM SOM visualization for non-experts. UMAT + glyphs. http://www.brains-minds-media.org/archive/305

SOM Evaluation VQ (vector quantization): several input vectors are mapped onto a single neuron, measured by the quantization error or distortion. Compression of the input space dimension. Preservation of data topology: neighbouring vectors (in the input space) are mapped to neighbouring neurons (in the mesh), measured by the topographic error.

SOM Quantization Error & Distortion Quantization error: the average distance between an input vector and its BMU (computed over all input vectors); the precision of the mapping. Distortion also counts the neighbours: E = Σ_{i∈N} Σ_{j∈I} η_{i,bmu(j)} ‖w_i − x_j‖², where N is the set of neurons and I is the set of input vectors. An energy function again!
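A sketch of both measures, assuming a Gaussian neighbourhood η over the grid position array r (as in the earlier training-loop sketch); the fixed sigma is an illustrative choice.

```python
import numpy as np

def quantization_error(W, X):
    # mean distance between each input vector and its BMU
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)   # inputs x neurons
    return d.min(axis=1).mean()

def distortion(W, X, r, sigma=1.0):
    d2 = np.sum((X[:, None, :] - W[None, :, :]) ** 2, axis=2)   # ||w_i - x_j||^2
    bmu = d2.argmin(axis=1)
    grid_d2 = np.sum((r[None, :, :] - r[bmu][:, None, :]) ** 2, axis=2)
    eta = np.exp(-grid_d2 / (2.0 * sigma ** 2))                 # eta_{i, bmu(j)}
    return np.sum(eta * d2)
```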

Topographic Error of SOM The number of input vectors for which the winner (BMU) and the second-best neuron are not adjacent in the mesh.
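A sketch of the topographic error, here reported as a fraction of the input vectors, assuming a rectangular grid described by the position array r and 4-adjacency (both are illustrative assumptions).

```python
import numpy as np

def topographic_error(W, X, r):
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)   # inputs x neurons
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    grid_dist = np.abs(r[best] - r[second]).sum(axis=1)         # Manhattan distance on the grid
    return np.mean(grid_dist > 1)                                # BMU and runner-up not adjacent
```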

Next Lecture Universal approximation. Kolmogorov's theorem. RBF networks. GMDH networks.