A Study on Clustering Method by Self-Organizing Map and Information Criteria

Satoru Kato, Tadashi Horiuchi, and Yoshio Itoh
Matsue College of Technology, Nishi-ikuma, Matsue, Shimane, Japan, kato@matsue-ct.ac.jp
Tottori University, Koyama-cho minami, Tottori, Japan

Abstract. In this paper, we propose a clustering method based on the Self-Organizing Map (SOM) and information criteria. In this method, initial cluster candidates are derived by the SOM, and these candidates are then merged appropriately based on an information criterion such as the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC). Through clustering experiments on artificial datasets and on datasets from the UCI Machine Learning Repository, we confirm that the proposed method extracts clusters more accurately and stably than the SOM-only method.

1 Introduction

Clustering by the Self-Organizing Map (SOM) [1] can extract clusters of arbitrary distribution shapes based on the distances between the code vectors (representative points of the input data) [2]. Recently, several improved methods that alter the basic SOM algorithm have been proposed [3][4]. SOM clustering is therefore one of the distance-based clustering approaches. On the other hand, there are distribution-based clustering approaches that consider the distribution of the input data when extracting clusters. For example, the x-means method [5] incorporates the Bayesian Information Criterion (BIC) into the k-means method. Information criteria can also be introduced easily into SOM-based clustering.

In this paper, we propose a clustering method that combines the SOM with information criteria. In the proposed method, initial cluster candidates are derived by the SOM, and these candidates are then merged appropriately based on an information criterion such as BIC or AIC. Through clustering experiments on artificial datasets and on datasets from the UCI Machine Learning Repository [6], we confirm that the proposed method extracts clusters more accurately and stably than the SOM-only method. Furthermore, we show that AIC is more suitable for the proposed method than BIC.

2 Clustering by SOM

2.1 Basic SOM algorithm

The basic SOM, proposed by Kohonen, is configured as shown in Fig. 1. In the basic learning algorithm [1], the code vectors are updated using the following equations:

w_i(t+1) = w_i(t) + α(t) Φ(p_i) (x − w_i(t))    (1)

Φ(p_i) = exp( −p_i² / σ²(t) )    (2)

Here α(t) is the learning coefficient after t learning steps. The coefficient starts from its initial value α_ini and decreases monotonically as t increases, reaching its minimum at the pre-set maximum number of learning steps T. Φ(p_i) is a neighborhood function centered at the winner cell c, and p_i is the distance from cell i to the winner cell c in the competitive layer. In Eq. (2), σ(t) is a time-varying parameter that defines the neighborhood size in the competitive layer. Like α(t) in Eq. (1), this parameter decreases monotonically from its initial value σ_ini as learning proceeds.

As a result of learning, the similarity between learning data is expressed by closeness on the grid of the competitive layer, and the data density in the input space is reflected in the distribution of the code vectors after learning.

Fig. 1. Basic structure of a one-dimensional SOM: an input layer holding the input vector x is fully connected to the neuron cells of the competitive layer, each of which holds a code vector w_i.
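As a concrete illustration of Eqs. (1) and (2), the following minimal sketch trains a one-dimensional SOM with NumPy. The initialization, the linear decay schedules for α(t) and σ(t), and all variable names are illustrative assumptions rather than the exact settings used in this paper.

import numpy as np

def train_som_1d(data, m=30, T=None, alpha_ini=0.5, sigma_ini=None, seed=0):
    """Minimal sketch of one-dimensional SOM learning (Eqs. (1)-(2))."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    T = T or 100 * n                          # 100 x (number of inputs), cf. Sect. 4.1
    sigma_ini = sigma_ini or m / 2.0          # illustrative initial neighborhood size
    w = data[rng.integers(0, n, size=m)].astype(float)   # code vectors w_i
    positions = np.arange(m)                  # cell positions on the 1-D grid

    for t in range(T):
        x = data[rng.integers(0, n)]                       # pick one learning sample
        c = np.argmin(np.linalg.norm(w - x, axis=1))       # winner cell
        alpha = alpha_ini * (1.0 - t / T)                  # monotonically decreasing coefficient
        sigma = sigma_ini * (1.0 - t / T) + 1e-3           # monotonically decreasing neighborhood size
        p = np.abs(positions - c)                          # grid distance to the winner cell
        phi = np.exp(-(p ** 2) / (sigma ** 2))             # neighborhood function, Eq. (2)
        w += alpha * phi[:, None] * (x - w)                # update rule, Eq. (1)
    return w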

2.2 Cluster extraction from SOM

In maps built by SOM learning, the code vectors of adjacent cells in the grid of the competitive layer are similar, and the data density in the input space is reflected in the distribution of the code vectors. Using these features, as pointed out by Terashima et al. [2], clustering can be performed by detecting cluster boundaries as the places where the code vectors of adjacent cells differ substantially. The specific clustering procedure is given below; a code sketch of the map-analysis step follows this list. A one-dimensional SOM is used for simplicity of analysis, with the m cells of the competitive layer arranged in a one-dimensional array.

1. Map building
   The input data are subjected to SOM learning to obtain a set of code vectors.

2. Map analysis
   (a) For every cell i (i = 1, 2, ..., m−1), the code-vector density dw_i is computed as the Euclidean distance between the code vectors of cells i and i+1:

       dw_i = ‖ w_i − w_{i+1} ‖    (3)

   (b) The code-vector density dw_i of every cell i (i = 1, 2, ..., m−1) is normalized by its maximum and minimum into the range [0, 1], giving the normalized density dw̄_i:

       dw̄_i = (dw_i − dw_min) / (dw_max − dw_min)    (4)

   (c) The histogram of dw̄_i is derived. A cluster boundary is recognized between a cell i corresponding to a histogram peak and its neighboring cell i+1.

3. Labeling
   The competitive layer is divided according to the dw̄_i histogram, and every group of cells is labeled appropriately.
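The map-analysis step can be sketched as follows, starting from a learned set of code vectors (e.g. the output of the previous sketch). Computing dw_i (Eq. (3)) and its normalization (Eq. (4)) follow the text directly; the rule used here for detecting histogram peaks (a local maximum above a threshold) is an assumption made only for illustration.

import numpy as np

def extract_cluster_labels(w, threshold=0.5):
    """Label cells of a 1-D SOM by cutting at peaks of the code-vector density."""
    dw = np.linalg.norm(w[:-1] - w[1:], axis=1)                     # Eq. (3)
    dw = (dw - dw.min()) / (dw.max() - dw.min() + 1e-12)            # Eq. (4)

    # A boundary lies between cells i and i+1 where dw[i] is a local peak (illustrative rule).
    boundaries = [i for i in range(len(dw))
                  if dw[i] >= threshold
                  and (i == 0 or dw[i] >= dw[i - 1])
                  and (i == len(dw) - 1 or dw[i] >= dw[i + 1])]

    labels = np.zeros(len(w), dtype=int)                            # cluster index per cell
    for b in boundaries:
        labels[b + 1:] += 1                                         # start a new cluster after each boundary
    return labels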

3 Proposed method

3.1 Basic idea

There are many upward peaks in the density histogram of the code vectors. Each of them may or may not indicate a cluster boundary, so many cluster-candidates can be extracted from the density histogram. The basic idea of the proposed method is to merge these cluster-candidates appropriately by using information criteria, following the procedure below.

A. Make the code-vector density (dw_i) histogram after the learning process of a one-dimensional SOM.

B. Extract cluster-candidates from the density histogram and assign a consecutive number to each candidate. These numbers are ordered according to the neuron cells in the competitive layer of the SOM.

C. Decide which cluster-candidates should be merged among the pairs of candidates whose numbers are adjacent to each other.

Fig. 2 shows a practical sequence of the proposed method. Procedure A yields Fig. 2(a) and (b), and Fig. 2(c) is obtained after procedure B. Cluster-candidates are then merged gradually by applying procedure C repeatedly until the number of clusters agrees with the intended one (see Fig. 2(d)-(f)).

Fig. 2. Clustering process of the proposed method: (a) distribution of code vectors, (b) density histogram of code vectors, (c) cluster-candidates (initial state), (d) first merge, (e) second merge, (f) third merge.

3.2 BIC and AIC

When a distribution of data x is observed, a family of alternative models that could generate the distribution can be considered. An information criterion is a useful guideline for determining which model is the most suitable. The Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) are typical ones, calculated by the following equations, respectively:

BIC = −2 log L(θ̂; x) + q log n    (5)

AIC = −2 log L(θ̂; x) + 2q    (6)

where q is the dimension of the parameter vector θ̂ and n is the number of samples of the empirical distribution. The likelihood L(·) is built from f(·), the p-dimensional Gaussian distribution:

f(θ̂; x) = (2π)^{−p/2} |V|^{−1/2} exp{ −(1/2) (x − μ)^T V^{−1} (x − μ) }    (7)
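The following sketch evaluates Eqs. (5)-(7) for a set of points under a single p-dimensional Gaussian fitted by maximum likelihood. Counting the free parameters as q = p + p(p+1)/2 (mean plus symmetric covariance) and the small regularization of V are assumptions made for illustration.

import numpy as np

def gaussian_log_likelihood(x):
    """Log-likelihood of points x under a single Gaussian fitted by ML (Eq. (7))."""
    n, p = x.shape
    mu = x.mean(axis=0)
    V = np.cov(x, rowvar=False, bias=True) + 1e-6 * np.eye(p)   # ML covariance, slightly regularized
    diff = x - mu
    mahal = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(V), diff)
    log_det = np.linalg.slogdet(V)[1]
    return -0.5 * (n * p * np.log(2 * np.pi) + n * log_det + mahal.sum())

def bic(x):
    """Eq. (5): BIC = -2 log L + q log n."""
    n, p = x.shape
    q = p + p * (p + 1) // 2          # mean plus symmetric covariance (assumed parameter count)
    return -2 * gaussian_log_likelihood(x) + q * np.log(n)

def aic(x):
    """Eq. (6): AIC = -2 log L + 2 q."""
    n, p = x.shape
    q = p + p * (p + 1) // 2
    return -2 * gaussian_log_likelihood(x) + 2 * q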

In Eqs. (5) and (6), the first term is the logarithmic likelihood obtained when a model described by the parameter θ̂ is fitted to the empirical distribution of x, and the second term, called the penalty term, indicates the complexity of the model.

3.3 Cluster mergence by using information criterion

The procedure for selective cluster mergence (procedure C in Sect. 3.1) is divided into the following steps in practice. It is assumed that procedures A and B in Sect. 3.1 have already been completed. A code sketch of this merge loop is given after the list.

C1. Merge a pair of adjoining cluster-candidates temporarily and calculate the two values IC_single and IC_twin using either Eq. (5) or Eq. (6). Here IC_single and IC_twin denote the value of BIC or AIC when the distribution model applied to the unified clusters is a single distribution or a twin distribution, respectively.

C2. Calculate ΔIC, the difference between IC_single and IC_twin:

    ΔIC = IC_single − IC_twin    (8)

C3. After calculating ΔIC for all pairs of adjoining cluster-candidates, find the pair with the minimum ΔIC and merge the two cluster-candidates of that pair conclusively. The consecutive numbers of the cluster-candidates, including the new cluster, are then refreshed.

C4. Repeat procedures C1 to C3 until the number of clusters reaches a specified value.

IC_single < IC_twin holds when fitting the single distribution to the unified clusters is more suitable than fitting the twin distribution. Therefore, ΔIC measures how appropriate it is to merge two adjoining clusters.
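A sketch of steps C1-C4 is given below; it assumes the bic/aic helpers from the previous sketch are passed in as the criterion. IC_twin is approximated here as the sum of the criterion values of the two candidates taken separately, which is one natural reading of the description above; the original implementation may formulate the twin-distribution model differently.

import numpy as np

def merge_candidates(clusters, k_target, criterion):
    """Greedy mergence of adjoining cluster-candidates (steps C1-C4).

    clusters : list of (n_i, d) arrays, ordered along the competitive layer.
    k_target : desired final number of clusters.
    criterion: a function such as bic or aic from the previous sketch.
    """
    clusters = list(clusters)
    while len(clusters) > k_target:
        deltas = []
        for i in range(len(clusters) - 1):                       # C1: adjoining pairs only
            merged = np.vstack([clusters[i], clusters[i + 1]])
            ic_single = criterion(merged)                        # single-distribution model
            ic_twin = criterion(clusters[i]) + criterion(clusters[i + 1])   # assumed twin model
            deltas.append(ic_single - ic_twin)                   # C2: Eq. (8)
        j = int(np.argmin(deltas))                               # C3: most appropriate merge
        clusters[j] = np.vstack([clusters[j], clusters[j + 1]])
        del clusters[j + 1]                                      # C4: repeat until k_target clusters remain
    return clusters

For example, merge_candidates(candidate_point_sets, 3, criterion=aic) would correspond, under these assumptions, to the SOM+AIC variant described below.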

4 Clustering experiments

4.1 Experimental method

We use four kinds of data distribution as experimental datasets. Two datasets are generated artificially and consist of two or three clusters whose densities or distribution shapes differ. The other two are the Iris and BCW datasets from the UCI Machine Learning Repository [6], used as examples of practical data. Fig. 3 shows the four datasets.

Performance is evaluated by the classification error, which is calculated by comparing the class indices of the original dataset with the indices obtained from the clustering result.

When applying the SOM learning algorithm in the proposed method, the number of learning iterations is set to 100 times the number of input data, and the number of cells in the competitive layer is set to one of five values. We make 100 trials for each setting of SOM learning and apply the cluster mergence procedure with either BIC or AIC to each learning result, so that we obtain clustering results for 100 trials times five competitive-layer sizes for each dataset.

Fig. 3. Artificial and practical datasets for the clustering experiments: (a) artificial dataset 1 (different densities, 3 clusters), (b) artificial dataset 2 (distorted distributions, 2 clusters), (c) UCI Iris data, (d) UCI BCW data, the latter two plotted on their first two principal components.
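The classification error above compares cluster indices with the true class indices; since cluster numbering is arbitrary, the two index sets must first be matched. The sketch below matches them with the best one-to-one assignment via SciPy's linear_sum_assignment, which is one reasonable choice and not necessarily the matching used in the paper.

import numpy as np
from scipy.optimize import linear_sum_assignment

def classification_error(true_labels, cluster_labels):
    """Fraction of points misassigned under the best cluster-to-class matching."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    classes = np.unique(true_labels)
    clusters = np.unique(cluster_labels)
    # Contingency table: rows = clusters, columns = true classes.
    table = np.zeros((len(clusters), len(classes)), dtype=int)
    for i, c in enumerate(clusters):
        for j, k in enumerate(classes):
            table[i, j] = np.sum((cluster_labels == c) & (true_labels == k))
    row, col = linear_sum_assignment(-table)          # maximize the matched counts
    correct = table[row, col].sum()
    return 1.0 - correct / len(true_labels)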

4.2 Experimental result

Fig. 4 shows the results of the clustering performance evaluation for each dataset. For each of the five competitive-layer sizes we calculate the average classification error over the 100 trials; in the legend of each figure, Worst, Average, and Best indicate the maximum, average, and minimum of these average classification errors among the five settings. SOM+BIC and SOM+AIC correspond to the proposed method, while SOM-only is the conventional method that extracts clusters from the histogram of code-vector density, as in Fig. 2(b), with an appropriate threshold setting. Results for the k-means method are also shown for comparison.

Fig. 4. Comparison of clustering performance (classification error of k-means, SOM+BIC, SOM+AIC, and SOM-only; Worst/Average/Best over the five competitive-layer sizes): (a) artificial dataset 1, (b) artificial dataset 2, (c) UCI Iris data, (d) UCI BCW data.

In the case of the artificial dataset with clusters of different densities and the UCI BCW dataset, the SOM-only method shows a very high classification error. These datasets contain clusters whose densities differ considerably from each other, and it is hard to estimate the cluster boundaries correctly using only the code-vector density histogram. On the other hand, the proposed method extracts each cluster more accurately than the other methods, except for the k-means method in the case of the BCW dataset. The BCW dataset contains comparatively high-dimensional data, so distribution-based approaches such as SOM+BIC and SOM+AIC may not be able to estimate the parameters μ and V of the distribution model in Eq. (7) correctly.

Looking at the classification errors of SOM+BIC and SOM+AIC, both methods show almost the same clustering performance except in the case of the UCI Iris dataset. In Eq. (5), the penalty term includes the number of samples n, and ΔIC becomes small when n is large. Hence, in the case of the SOM+BIC method, a cluster-candidate that has a large number of samples tends to absorb adjoining candidates one after another.

5 Conclusion

In this paper, we combined the SOM clustering methodology with an appropriate cluster mergence approach based on information criteria such as BIC and AIC. Since the method pays attention to how natural each data distribution is as a cluster, it can extract clusters more correctly than conventional methods, especially when the dataset consists of clusters whose densities differ from each other. In the clustering experiments on several artificial and practical datasets, the proposed method shows a lower classification error than conventional methods such as the k-means method and simple SOM-based clustering. Furthermore, we confirmed that AIC is more suitable for the proposed method than BIC. As future work, it is necessary to examine the effectiveness of the proposed method on a larger variety of practical datasets.

References

1. T. Kohonen: Self-Organizing Maps, 3rd ed., Springer-Verlag, Berlin (2001).
2. M. Terashima, F. Shiratani, K. Yamamoto: Unsupervised Cluster Segmentation Method Using Data Density Histogram on Self-Organizing Feature Map, IEICE Trans., Vol. J79-D-II, No. 7, pp. 80-90 (1996) (in Japanese).
3. S. Kato, K. Koike, T. Horiuchi, and Y. Itoh: A Study on Two-Stage Self-Organizing Map Suitable for Clustering Problems, Proceedings of the International Symposium on Intelligent Signal Processing and Communication Systems, pp. 77-80.
4. H. Matsushita and Y. Nishio: Reunifying Self-Organizing Map and Disconnecting Self-Organizing Map, RISP Journal of Signal Processing, 2007.
5. D. Pelleg and A. Moore: X-means: Extending K-means with Efficient Estimation of the Number of Clusters, Proc. of the 17th International Conference on Machine Learning, pp. 727-734, 2000.
6. UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html