Self-Organizing Maps (SOM)



Overview
- SOM Basics
- Sequential Training (On-Line Learning)
- Batch SOM
- Visualizing the SOM
  - SOM Grid
  - Music Description Map (MDM)
  - Bar Plots and Chernoff's Faces
  - U-Matrix and Distance Matrix
  - Smoothed Data Histogram (SDH)
  - Component Planes
- Growing Hierarchical SOM
- Aligned SOM

Univ.-Ass. Dr. Markus Schedl, Department of Computational Perception, Johannes Kepler University Linz, markus.schedl@jku.at
2010 Markus Schedl, partly based on material by Gerhard Widmer and Peter Knees

Self-Organizing Map (SOM): Basics

- SOM: a "neural network" [Kohonen, 1982], [Kohonen, 2001]
- SOM ~ k-means clustering + topology preservation (preservation of non-linear relationships between data items)

Basic Architecture:
- Map: 2-dimensional array of interconnected units ("neurons")
- connections define a fixed topology ("neighborhood")
- units represent cluster centers (prototypes; also called "model vectors", "weight vectors", or "reference vectors")

Interpretation:
- clustering with topology constraints (similar data items should be placed close to each other on the map)
- mapping from data/feature/input space to a low-dimensional visualization space

Different Topologies / Grid Structures

Hexagonal grid:
(+) tighter relationship between clusters
(+) more connections
(+) grid structure fits the Gaussian structure in the neighborhood kernel calculation (centroids of neighboring map units are equidistant)

Rectangular grid:
(+) easier to implement
(-) diagonally neighboring map units do not perfectly fit the Gaussian neighborhood function
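The equidistance argument for the two grid structures can be checked numerically. The following is a minimal sketch, assuming unit spacing and a hexagonal layout in which odd rows are shifted by half a unit; the function names and offset lists are illustrative and not taken from the slides.

```python
import math

def rect_coords(row, col):
    # Rectangular grid: units sit on integer coordinates.
    return (float(col), float(row))

def hex_coords(row, col):
    # Hexagonal grid (assumed layout): odd rows are shifted right by half a
    # unit and rows are sqrt(3)/2 apart, so adjacent centroids are 1 apart.
    return (col + 0.5 * (row % 2), row * math.sqrt(3) / 2)

def neighbor_distances(coords, row, col, offsets):
    x0, y0 = coords(row, col)
    result = []
    for dr, dc in offsets:
        x1, y1 = coords(row + dr, col + dc)
        result.append(math.hypot(x1 - x0, y1 - y0))
    return result

# The 8 surrounding units on a rectangular grid (including diagonals).
RECT_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
# The 6 surrounding units of a unit in an even row of the hexagonal grid.
HEX_OFFSETS_EVEN = [(0, -1), (0, 1), (-1, -1), (-1, 0), (1, -1), (1, 0)]

rect = neighbor_distances(rect_coords, 2, 2, RECT_OFFSETS)
hexa = neighbor_distances(hex_coords, 2, 2, HEX_OFFSETS_EVEN)
print(sorted(set(round(d, 6) for d in rect)))  # two different distances
print(sorted(set(round(d, 6) for d in hexa)))  # a single distance
```

On the rectangular grid the diagonal neighbors are farther away than the direct ones, which is why a Gaussian kernel treats them unequally; on the hexagonal grid all six neighbors are equally far.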


An Application of the Self-Organizing Map: The neptune Interface

On-line Learning: The Online Training Algorithm

Input:
- map of units $u_i$ with model vectors $m_i$ ("codebook")
- training instances $X = \{x_i\}$
- a similarity measure $\mathrm{sim}(\cdot,\cdot)$ between data items (e.g., based on Euclidean distance)
- parameters: learning rate $\alpha(t) \in [0, 1]$ and a neighborhood kernel function with parameter $r(t)$ ("neighborhood radius"), e.g., the pseudo-Gaussian $u_{ij}(t) = \exp\left(-d_{ij}^2 / r(t)^2\right)$, where $d_{ij}$ is the map distance between $u_i$ and $u_j$

Online SOM Training Algorithm (one possible variant):

Initialize each unit (model vector) $m_i$ to represent a randomly selected data item.
Loop over time steps t, until convergence:
1. Randomly select an example $x$.
2. Find the winning unit (best matching unit, BMU) $u_c$ with $c = \arg\max_i \mathrm{sim}(m_i, x)$; with Euclidean distance, this is the unit whose model vector is closest to $x$.
3. Adapt the model vectors of all units as $m_i(t+1) = m_i(t) + \alpha(t)\, u_{ic}(t)\, [x - m_i(t)]$.
4. Update (decrease) the training parameters $\alpha(t)$, $r(t)$.

Off-line Learning: The Batch SOM Algorithm

Input:
- map of units $u_i$ with model vectors $m_i$
- training instances $X = \{x_i\}$
- a similarity measure between examples (e.g., based on Euclidean distance)
- a neighborhood kernel function with parameter $r(t)$ ("neighborhood radius"), e.g., the pseudo-Gaussian $u_{ij}(t) = \exp\left(-d_{ij}^2 / r(t)^2\right)$, where $d_{ij}$ is the map distance between $u_i$ and $u_j$

Batch SOM Training Algorithm (one possible variant):

Initialize each unit (model vector) $m_i$ to represent a randomly selected data item.
Loop over time steps t, until convergence:
1. Determine the best matching unit $u_{c(i)}$ for each data item $x_i$ (i.e., assign each instance to its most similar model vector). The items assigned to a unit form its Voronoi set.
2. Update each model vector $m_i$ to better fit the data items assigned to it and the data in its neighborhood:
   $m_i(t+1) = \dfrac{\sum_k u_{i\,c(k)}(t)\, x_k}{\sum_k u_{i\,c(k)}(t)}$
3. Update (decrease) the neighborhood radius $r(t)$.

SOM: Illustration
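The two training variants above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes a rectangular grid, squared Euclidean distance as the (dis)similarity, and a linear decay schedule for $\alpha(t)$ and $r(t)$; the schedule, the default parameter values, and all function names are assumptions.

```python
import math
import random

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def bmu(codebook, x):
    # Best matching unit: index of the model vector closest to x.
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], x))

def kernel(pos, i, j, r):
    # Pseudo-Gaussian u_ij(t) = exp(-d_ij^2 / r(t)^2) on map coordinates.
    return math.exp(-sq_dist(pos[i], pos[j]) / r ** 2)

def train_online(data, rows, cols, steps=2000, a0=0.5, r0=2.0, seed=0):
    rng = random.Random(seed)
    pos = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialize every model vector with a randomly selected data item.
    codebook = [list(rng.choice(data)) for _ in pos]
    for t in range(steps):
        frac = 1.0 - t / steps                 # assumed linear decay
        alpha, radius = a0 * frac, max(r0 * frac, 0.1)
        x = rng.choice(data)                   # step 1
        c = bmu(codebook, x)                   # step 2
        for i, m in enumerate(codebook):       # step 3
            h = alpha * kernel(pos, i, c, radius)
            codebook[i] = [mi + h * (xi - mi) for mi, xi in zip(m, x)]
    return pos, codebook

def batch_step(data, pos, codebook, radius):
    # One batch iteration: assign items to BMUs, then take weighted means.
    winners = [bmu(codebook, x) for x in data]
    new_codebook = []
    for i in range(len(codebook)):
        w = [kernel(pos, i, c, radius) for c in winners]
        total = sum(w)
        if total == 0.0:                       # kernel underflow: keep old vector
            new_codebook.append(list(codebook[i]))
            continue
        new_codebook.append([sum(wk * x[d] for wk, x in zip(w, data)) / total
                             for d in range(len(data[0]))])
    return new_codebook

data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
pos, codebook = train_online(data, rows=2, cols=2)
codebook = batch_step(data, pos, codebook, radius=1.0)
```

Both updates are convex combinations of data items, so the model vectors never leave the range spanned by the training data.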


SOM: Illustration

- each data point (example) x uniquely belongs to a unit (the BMU of x)
- relationship between units: neighboring units cover similar data items
- non-uniform distances between model vectors, uniform distances in the visualization
- "interpolation units" (units with no data associated) are possible

Initialization of the Model Vectors

Random Initialization:
- random values in the same range as X (between min and max of each dimension)
- randomly select data items from X and assign them to the model vectors $m_i$
(+) fast
(-) mapping not consistent for different runs

Linear Initialization:
- perform an Eigendecomposition of the autocorrelation matrix of X (PCA)
- the top 2 Eigenvectors (those with the largest Eigenvalues) span a 2-dimensional subspace
- initialize the model vectors along these Eigenvectors, which gives a predefined linear mapping to start with
(+) mapping consistent for different runs (up to rotation / mirroring)
(-) computationally more complex

Example: WebSOM Project
Support in Browsing (Potentially Huge) Data Sets [Kaski et al., 1998]

Example: Browsing Music Collections
ViSMuC by Schedl
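Linear initialization can be sketched without external libraries by extracting the two leading Eigenvectors with power iteration. The sketch below makes several assumptions that are not in the slides: it decomposes the covariance matrix of the mean-centered data, scales each axis by the square root of its Eigenvalue, and spreads the units over the range [-1, 1] along both axes. All function names are hypothetical.

```python
import math

def mean_vec(data):
    n = len(data)
    return [sum(x[d] for x in data) / n for d in range(len(data[0]))]

def covariance(data, mu):
    n, dim = len(data), len(mu)
    return [[sum((x[a] - mu[a]) * (x[b] - mu[b]) for x in data) / n
             for b in range(dim)] for a in range(dim)]

def top_eigen(mat, iters=500):
    # Power iteration; assumes the start vector is not orthogonal to the
    # leading Eigenvector.
    dim = len(mat)
    v = [1.0 / (d + 1) for d in range(dim)]
    for _ in range(iters):
        w = [sum(mat[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            break
        v = [wi / norm for wi in w]
    lam = sum(v[a] * sum(mat[a][b] * v[b] for b in range(dim)) for a in range(dim))
    return lam, v

def linear_init(data, rows, cols):
    mu = mean_vec(data)
    cov = covariance(data, mu)
    dim = len(mu)
    l1, e1 = top_eigen(cov)
    # Deflation removes the first component so the second one can be found.
    deflated = [[cov[a][b] - l1 * e1[a] * e1[b] for b in range(dim)]
                for a in range(dim)]
    l2, e2 = top_eigen(deflated)
    s1, s2 = math.sqrt(max(l1, 0.0)), math.sqrt(max(l2, 0.0))
    codebook = {}
    for r in range(rows):
        for c in range(cols):
            a = 2.0 * c / (cols - 1) - 1.0 if cols > 1 else 0.0
            b = 2.0 * r / (rows - 1) - 1.0 if rows > 1 else 0.0
            codebook[(r, c)] = [mu[d] + a * s1 * e1[d] + b * s2 * e2[d]
                                for d in range(dim)]
    return codebook

data = [(-2.0, -1.0, 0.0), (-1.0, 1.0, 0.0), (0.0, 0.0, 0.0),
        (1.0, -1.0, 0.0), (2.0, 1.0, 0.0)]
cb = linear_init(data, 3, 3)
```

Because nothing here is random, two calls with the same data return the same codebook, which matches the consistency advantage listed above.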


Example: Browsing Music Collections (II)
PlaySOM, TU Wien

Visualizing the SOM

Visualizing attribute distributions on top of a learned SOM:
- Component Planes: visualize feature values of model vectors associated with the map units (or averaged feature values over all instances covered by a unit)
- Bar Charts or Chernoff's Faces: visualize all dimensions of model vectors for each map unit in one plot [Vesanto, 1999], [Vesanto, 2002]

Visualizing the data distribution on top of a learned SOM:
- SOM-Grid: each data item is displayed within its BMU
- Music Description Map (MDM): aggregates similar map units and adds descriptive labels [Knees et al., 2006]
- U-Matrix: visualizes distances between units (via color)
- Distance Matrix: visualizes aggregated distances of model vectors to all neighboring units [Vesanto, 1999], [Vesanto, 2002]
- Smoothed Data Histogram (SDH): visualizes (smoothed) density of data items in an area [Pampalk et al., 2002]

Visualizing Attribute Distributions: Component Planes

[Figure: learned map and component planes for an animals data set. Items: Horse, Zebra, Cow, Tiger, Lion, Fox, Dog, Wolf, Cat, Duck, Goose, Dove, Chicken, Owl, Hawk, Eagle. Attributes: Small, Medium, Big, 2-Legs, 4-Legs, Hair, Hooves, Mane, Feathers, Hunt, Run, Fly, Swim.]

- explain mapping (labeling)
- make correlations between attributes visible


Visualizing Attribute Distributions: Bar Plots

- each attribute value (dimension in data space) is displayed via a bar in a d-dimensional bar chart visualization for each map unit

Visualizing Attribute Distributions: Chernoff's Faces

- psychologically motivated visualization method (people can quickly grasp a face's expression)
- each attribute value (dimension in data space) is mapped to a specific property of the Chernoff face (e.g., mouth's contour, height/width of face, ear's slope, ...)

SOM Grid for data set C103a: co-occurrences of artist names


SOM Grid for data set C103a: co-occurrences of artist names (II)

SOM Grid for larger data set
2572 songs, 7 genres, features: MFCCs

SOM Grid for larger data set: Aggregate data items using metadata
- if metadata is available, summarize items w.r.t. properties (e.g., genre)

Music Description Map (MDM) [Knees et al., 2006]
- extension of the simple SOM grid
- describes regions of the map by metadata
- aggregates "similar" neighboring map units via a region growing algorithm
- loss of information: Dance?


MDM (II): Labeling Map Units

Determining the goodness $G^2_{t,u}$ of a term t for map unit u according to [Lagus, Kaski, 1999]:

$G^2_{t,u} = \dfrac{\left(\sum_{k \in A_0^u} F_{t,k}\right)^2}{\sum_{i \notin A_1^u} F_{t,i}}$

$F_{t,u} = \dfrac{\sum_a f_{a,u}\, tf_{t,a}}{\sum_v \sum_a f_{a,u}\, tf_{v,a}}$

where
- $k \in A_0^u$ if the Manhattan distance between units u and k is below threshold $r_0$
- $i \in A_1^u$ if the Manhattan distance between units u and i satisfies $r_0 < d(u,i) < r_1$
- $f_{a,u}$ is the number of tracks of artist a on unit u
- $tf_{t,a}$ is the term frequency of term t for artist a

Filter all terms t with $G^2_{t,u} < 0.01$; cut-off of 30 keywords per map unit [Lagus, Kaski, 1999].

MDM (II): Connecting Similar Map Units

1. Sort all units w.r.t. the $G^2$-values of their contained terms, giving a ranked list U.
2. Remove the highest ranked unit u from U and find similarly labeled units among u's neighbors: if the cosine similarity between the label vectors of map unit u and a neighbor i exceeds threshold θ, aggregate u and i.
3. Go to 2.

Visualizing Data Distributions: U-Matrix and Distance Matrix

- U-Matrix: visualizes distances between units (model vectors)
- Distance Matrix: visualizes the difference of a unit's model vector to all neighboring units' model vectors

Visualizing Data Distributions: U-Matrix and SDH

Two methods for visualizing data on top of a learned SOM:
- U-Matrix: visualizes distances between units (via color)
- Smoothed Data Histogram (SDH): visualizes (smoothed) density of examples in an area of the map
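As an illustration of the Distance Matrix idea, the sketch below averages the Euclidean distances between each unit's model vector and those of its direct grid neighbors. The choice of a rectangular grid with a 4-neighborhood and of the mean as the aggregate are assumptions; the slides do not fix either, and the function names are hypothetical.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def distance_matrix(codebook, rows, cols):
    """codebook[r][c] is the model vector of the unit at grid position (r, c)."""
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(euclid(codebook[r][c], codebook[rr][cc]))
            # Mean distance to the existing neighbors (0.0 on a 1x1 map).
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result

# Toy codebook: columns 0 and 1 are close in data space, column 2 is far away.
codebook = [[(0.0, 0.0), (0.0, 1.0), (10.0, 1.0)],
            [(0.0, 0.0), (0.0, 1.0), (10.0, 1.0)]]
dm = distance_matrix(codebook, rows=2, cols=3)
print(dm[0])
```

Large values mark units that sit next to a cluster boundary in data space, which is what the color coding of such a visualization exposes.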


Smoothed Data Histograms (SDH) [Pampalk et al., 2002]

- display the smoothed density of data items associated with areas of the map
- reveal clusters in the data: many pieces associated with a unit indicate a cluster center

Idea for smoothing / density estimation:
- voting matrix whose size equals the size of the SOM
- data items vote for a number N of best-matching units
- the best-matching unit gets N points, the 2nd best gets N-1 points, ..., the N-th best gets 1 point, all others get 0 points (N is a parameter, the "spread")
- the distribution of votes is visualized over the entire map, e.g., via a color map (interpolated voting matrix for smoothing)

SOM and SDH: An Example

[Figure: data space and visualization space; SDH for N=1, N=2, N=5, N=7, N=10, N=20]

Smoothed Data Histograms

Be aware of the influence of the color scale on perception!

SOM and SDH: A Sample Application (neptune)

Input: music collection (digital audio files)
- calculate audio features for each track, e.g.
  - rhythmic [Pampalk, Islands of Music: Analysis, Organization, and Visualization of Music Archives, Diploma Thesis 2001]
  - timbral [Mandel & Ellis, Song-Level Features and Support Vector Machines for Music Classification, ISMIR 2005]
- train a SOM on the audio features
- calculate an SDH on the SOM
- visualize the SDH in 3D, using the smoothed voting matrix of the SDH as height values
- build a game-like user interface to explore the user's (or someone else's) music collection

Matlab implementations of SOMs and SDHs (Toolboxes): (Google: "SOM Toolbox") (Google: "SDH Toolbox")
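The voting scheme described above can be written down directly. This is a minimal sketch that keeps the voting matrix as a flat list with one entry per unit and uses squared Euclidean distance to rank the units; the interpolation step for display is left out, and the function names are illustrative.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def sdh_votes(data, codebook, spread):
    """Each data item gives spread, spread-1, ..., 1 points to its closest units."""
    votes = [0] * len(codebook)
    for x in data:
        ranked = sorted(range(len(codebook)),
                        key=lambda i: sq_dist(codebook[i], x))
        for rank, unit in enumerate(ranked[:spread]):
            votes[unit] += spread - rank
    return votes

# Four units on a line, three data items.
codebook = [(0.0,), (1.0,), (2.0,), (3.0,)]
data = [(0.1,), (0.2,), (2.9,)]
print(sdh_votes(data, codebook, spread=1))  # [2, 0, 0, 1]
print(sdh_votes(data, codebook, spread=2))  # [4, 2, 1, 2]
```

With spread 1 the result is a plain histogram of BMU hits; larger values of N spread each item's vote over neighboring units in data space, which smooths the picture.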


[Figure slides: neptune (2), neptune (3), neptune (4), neptune (5)]


Hierarchical Structuring: The Growing Hierarchical Self-Organizing Map (GHSOM) [Dittenbach et al., 2002]

[Figure: flat SOM vs. hierarchical SOM]

The GHSOM Algorithm

Start with 1 unit to expand (= mean of data), level 0.
Loop until there are no more units to expand:
1. For each unit to expand, create a new 2x2 SOM (initialize orientation).
2. Train the SOM on the data assigned to the parent unit.
3. Decision 1: Insert a new row or column? If yes: insert the new row/column and go to 2.
4. Decision 2: Hierarchically expand units of the map? If yes: add the units to the expand list.

Decision 1: Insert a new row or column if the mean quantization error > threshold (i.e., the map does not represent the data well); insert the new row or column between the unit with the highest quantization error and the adjacent unit with the largest distance.

Decision 2: Expand a unit if the quantization error of the unit > threshold (i.e., the unit does not represent its associated data items well).

Parameters: same as SOM (except no. of units) + 2 thresholds $\tau_1$, $\tau_2$

The GHSOM Algorithm: Decisions 1 and 2

Mean quantization error of unit $u_i$:

$MQE_i = \dfrac{1}{|V_i|} \sum_{k \in V_i} \lVert x_k - m_i \rVert$, where $V_i = \{k \mid u_{c(k)} = u_i\}$

- Voronoi set $V_i$ of unit $u_i$: all data items whose BMU is $u_i$
- quantifies how well a unit i approximates its data items

Mean quantization error of a SOM:

$MMQE = \sum_i \dfrac{|V_i|}{n}\, MQE_i$

- quantifies how well a SOM approximates the data items

The GHSOM Algorithm: Decisions 1 (enlarge map) and 2 (insert new map)

Decision 1: Insert a new row or column if $MMQE > \tau_1 \cdot MQE_0$, where $MQE_0$ is the MQE of a virtual unit $m_0$ representing the mean of all instances covered by the parent unit: $m_0 = \dfrac{1}{n} \sum_i x_i$

Decision 2: Expand a unit if $MQE_i > \tau_2 \cdot MQE_0^*$, where $MQE_0^*$ is the mean quantization error of the whole dataset with respect to the virtual unit located in the center of the whole dataset (in contrast to $MQE_0$, which is the mean quantization error of the data items in the respective sub-branch of the GHSOM).

Generally: $\tau_1$, $\tau_2$ are chosen such that $1 > \tau_1 \gg \tau_2 > 0$
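The quantization errors and the two decisions can be evaluated on a toy example. The following sketch assumes Euclidean distance and follows the formulas above; the helper names and the threshold values used in the example are hypothetical.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def unit_mqes(data, codebook):
    """Return (MQE_i per unit, Voronoi set size |V_i| per unit)."""
    sums = [0.0] * len(codebook)
    counts = [0] * len(codebook)
    for x in data:
        c = min(range(len(codebook)), key=lambda i: euclid(codebook[i], x))
        sums[c] += euclid(codebook[c], x)
        counts[c] += 1
    mqe = [s / n if n else 0.0 for s, n in zip(sums, counts)]
    return mqe, counts

def map_mqe(mqe, counts):
    # MMQE: unit errors weighted by the relative size of their Voronoi sets.
    n = sum(counts)
    return sum(v / n * q for q, v in zip(mqe, counts))

def mqe_of_mean(data):
    # MQE of a virtual unit placed at the mean of the given data items.
    n = len(data)
    mean = [sum(x[d] for x in data) / n for d in range(len(data[0]))]
    return sum(euclid(x, mean) for x in data) / n

def should_grow(mmqe, mqe_parent, tau1):
    return mmqe > tau1 * mqe_parent             # Decision 1

def units_to_expand(mqe, mqe_root, tau2):
    return [i for i, q in enumerate(mqe) if q > tau2 * mqe_root]  # Decision 2

data = [(0.0,), (2.0,), (10.0,), (10.0,)]
codebook = [(1.0,), (10.0,)]
mqe, counts = unit_mqes(data, codebook)   # mqe == [1.0, 0.0], counts == [2, 2]
mmqe = map_mqe(mqe, counts)               # 0.5
mqe0 = mqe_of_mean(data)                  # 4.5 (top-level map: MQE_0 == MQE_0*)
print(should_grow(mmqe, mqe0, tau1=0.5), units_to_expand(mqe, mqe0, tau2=0.1))
```

With these numbers the map is not enlarged for $\tau_1 = 0.5$, while unit 0 is marked for hierarchical expansion for $\tau_2 = 0.1$.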


The GHSOM Algorithm: Preservation of Orientation

Problem:
- maps of descendants of a unit $u_i$ could have arbitrary orientation
- no visible relationship between different sub-branches (other than the common parent map)

Solution: enforce/encourage a specific orientation of the sub-layer SOMs via initialization
- initialize the model vectors of the 2x2 SOMs such that they correspond to the orientation of the parent map
- for example: initialize the 4 model vectors with the means of the parent vector and each of its 4 immediate neighbors
- for border units: extrapolate "virtual" units. Example: if $u_i$ is located on the left border and the unit to its right is $u_r$, create a virtual left neighbor $u_l$ with $m_l = m_i + (m_i - m_r)$

Exercise: What could the initialization function for the codebook of a new sublevel SOM, expressed as weighted parent unit(s' neighbors), look like?

[Figures: Hierarchical Map (GHSOM on Animals); Hierarchical Component Planes; GHSOM + SDH: deeptune; GHSOM + SDH: deeptune (II), Different Hierarchy Levels]
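One way to compute the means of the parent vector with each of its four neighbors, including the extrapolated virtual units at the border, is sketched below. It covers only that part of the initialization; how the four means are assigned to the positions of the 2x2 child map is left open here, as it is in the slides. The dictionary-based grid and all names are assumptions.

```python
def neighbor_means(grid, r, c):
    """grid maps (row, col) to a model vector; returns one mean per direction.

    A missing neighbor is replaced by a virtual unit m_i + (m_i - m_opposite),
    where m_opposite is the neighbor on the other side of the parent unit.
    """
    m_i = grid[(r, c)]
    directions = {"left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0)}
    means = {}
    for name, (dr, dc) in directions.items():
        neighbor = grid.get((r + dr, c + dc))
        if neighbor is None:
            opposite = grid.get((r - dr, c - dc))
            if opposite is None:
                neighbor = m_i  # map is a single row/column: no extrapolation possible
            else:
                neighbor = [2.0 * a - b for a, b in zip(m_i, opposite)]
        means[name] = [(a + b) / 2.0 for a, b in zip(m_i, neighbor)]
    return means

# 2x2 parent map; the unit at (0, 0) lies on the left and on the upper border.
parent = {(0, 0): [0.0, 0.0], (0, 1): [2.0, 0.0],
          (1, 0): [0.0, 2.0], (1, 1): [2.0, 2.0]}
means = neighbor_means(parent, 0, 0)
print(means)
```

For the corner unit in this example, the left and upper means come from virtual units and therefore mirror the right and lower means around the parent vector.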

Visualizing Effects of Changes in Data Definition: Aligned SOMs [Pampalk et al., 2003]

Goal: understand the relationship between different ways of representing the same data

Basic concepts:
- layers of mutually constrained SOMs (i.e., a stack of SOMs)
- each layer is trained on a slightly different data space / view of the data (i.e., different dimensions or distance definitions), but on the same data items
- trained so that all layers have the same orientation
- constraints between layers enforce smooth transitions between views

Use: exploratory analysis of alternative data representations
- visualize changes in the inherent structure of the data in response to changes in features, relative feature weights, different ways of normalizing features, different similarity functions, ...
- navigation through alternative data spaces

Aligned SOMs: The Basic Architecture

[Figure: stack of SOMs; parameter values from $p_{min}$ to $p_{max}$ define different views of the data]

Distance between layers (relative to the distance between units in the same layer), e.g.:
- intra-SOM distance between neighboring units = 1
- inter-SOM distance "between" the same map unit = 1/…

Aligned SOM: Training (Online version, simplified)

Initialize all layers.
Loop:
- Randomly select a training instance x and a layer l.
- Find the best matching unit for x in l.
- Adapt the neighborhood of the best matching unit (intra- and inter-layer neighborhood).

Neighborhood: within layer and between layers.

Aligned SOM: On-line Learning

Input:
- map of units $u_{li}$ with model vectors $m_{li}$ ("codebook"), where l denotes the layer
- training instances $X = \{x_i\}$
- a similarity measure $\mathrm{sim}(\cdot,\cdot)$ between data items (e.g., based on Euclidean distance)
- parameters: learning rate $\alpha(t) \in [0, 1]$ and a neighborhood kernel function with parameter $r(t)$ ("neighborhood radius"), e.g., the pseudo-Gaussian $u_{ij}(t) = \exp\left(-d_{ij}^2 / r(t)^2\right)$, where $d_{ij}$ is the map distance between $u_{li}$ and $u_{kj}$

Online SOM Training Algorithm:

Initialize each unit (model vector) $m_{li}$ to represent a randomly selected data item (features weighted according to layer-specific weights, e.g., from 1:0 to 0:1).
Loop over time steps t, until convergence:
1. Randomly select an example $x$ and a layer $l$; apply the weighting according to the view/data space of $l$ to obtain $x_l$.
2. Find the winning unit (best matching unit) $u_c$ with $c = \arg\max_i \mathrm{sim}(m_{li}, x_l)$.
3. Adapt the model vectors of all units in all layers as $m_{li}(t+1) = m_{li}(t) + \alpha(t)\, u_{ic}(t)\, [x_l - m_{li}(t)]$.
4. Update (decrease) the training parameters $\alpha(t)$, $r(t)$.
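The layer-specific weighting "from 1:0 to 0:1" can be illustrated for the case of two feature groups. The sketch below assumes that the weights change linearly across the layers and that a view is formed by scaling the two groups and concatenating them; both choices, and the function names, are assumptions made for illustration.

```python
def layer_weights(n_layers):
    """Weight pairs (w_a, w_b) from 1:0 in the first layer to 0:1 in the last."""
    if n_layers < 2:
        raise ValueError("at least two layers are needed to span 1:0 to 0:1")
    return [(1.0 - l / (n_layers - 1), l / (n_layers - 1))
            for l in range(n_layers)]

def apply_view(x_a, x_b, weights):
    # x_l: feature group a scaled by w_a, followed by group b scaled by w_b.
    w_a, w_b = weights
    return [w_a * v for v in x_a] + [w_b * v for v in x_b]

weights = layer_weights(5)
x_a, x_b = [2.0, 4.0], [10.0]          # two feature groups of one data item
views = [apply_view(x_a, x_b, w) for w in weights]
print(weights[0], weights[2], weights[4])
print(views[2])
```

The first layer sees only feature group a, the last layer only group b, and the middle layer an equal mix, so neighboring layers are trained on only slightly different views of the same items.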

[Figure: Aligned SOM on Animals]

Aligned SOM Demos: http://www.ofai.at/~elias.

Literature

SOM:
- [Kohonen, 1982]: Kohonen, T. Self-Organizing Formation of Topologically Correct Feature Maps. Biological Cybernetics, 43.
- [Kohonen, 2001]: Kohonen, T. Self-Organizing Maps, volume 30 of Springer Series in Information Sciences. Springer, Berlin, Germany, 3rd edition.
- [Vesanto, 1999]: Vesanto, J. SOM-Based Data Visualization Methods. Intelligent Data Analysis 3(2).
- [Vesanto, 2002]: Vesanto, J. Data Exploration Process Based on the Self-Organizing Map. PhD thesis, Helsinki University of Technology, Espoo, Finland.
- [Pampalk et al., 2002]: Pampalk, E., Rauber, A., and Merkl, D. Using Smoothed Data Histograms for Cluster Visualization in Self-Organizing Maps. In Proceedings of the International Conference on Artificial Neural Networks (ICANN 2002), Madrid, Spain. Springer.
- [Knees et al., 2006]: Knees, P., Pohle, T., Schedl, M., and Widmer, G. Automatically Describing Music on a Map. In Proceedings of the 2nd Workshop on Learning the Semantics of Audio Signals (LSAS 2008), Paris, France, June.
- [Kaski et al., 1998]: WEBSOM: Self-Organizing Maps of Document Collections. Neurocomputing 21.
- [Lagus, Kaski, 1999]: Keyword Selection Method for Characterising Text Document Maps. In Proceedings of the International Conference on Artificial Neural Networks (ICANN 1999), London, UK.

GHSOM:
- [Dittenbach et al., 2002]: Dittenbach, M., Rauber, A., and Merkl, D. Uncovering Hierarchical Structure in Data Using the Growing Hierarchical Self-Organizing Map. Neurocomputing, 48(1-4).

Aligned SOM:
- [Pampalk et al., 2003]: Pampalk, E., Goebl, W., and Widmer, G. Visualizing Changes in the Structure of Data for Exploratory Feature Selection. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2003).
