Preprocessing DWML, 2007 1/33

Preprocessing Before you can start on the actual data mining, the data may require some preprocessing:
- Attributes may be redundant.
- Values may be missing.
- The data contains outliers.
- The data is not in a suitable format.
- The values appear inconsistent.
Garbage in, garbage out. DWML, 2007 2/33

Preprocessing Data Cleaning Consider the following customer records:

  ID    Zip     Gender  Income    Age  Marital status  Transaction amount
  1001  10048   M       75000     C    M               5000
  1002  J2S7K7  F       -40000    40   W               4000
  1003  90210           10000000  45   S               7000
  1004  6269    M       50000     0    S               1000
  1005  55101   F       99999     30   D               3000

Issues to notice:
- Correct zip code? J2S7K7 and 6269 do not look like valid five-digit zip codes.
- Missing value! The gender of record 1003 is missing.
- Error/outlier! The incomes -40000 and 10000000 are clearly wrong or extreme.
- Unexpected precision. The income 99999 may be a code rather than a real amount.
- Categorical value? The age of record 1001 is C, a categorical value in a numerical field.
- Error/missing value? The age of record 1004 is 0, probably an error or a code for a missing value.
Other issues: What are the semantics of the marital status? What is the unit of measure for the transaction field? (A small sketch of such sanity checks follows below.) DWML, 2007 3/33
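A minimal Python sketch of sanity checks corresponding to the issues listed above. The records are copied from the table; the field names, the check logic, and the thresholds are illustrative assumptions, not part of the course material.

  # Flag suspicious values in the customer records from the slide.
  records = [
      {"id": 1001, "zip": "10048",  "gender": "M",  "income": 75000,    "age": "C"},
      {"id": 1002, "zip": "J2S7K7", "gender": "F",  "income": -40000,   "age": "40"},
      {"id": 1003, "zip": "90210",  "gender": None, "income": 10000000, "age": "45"},
      {"id": 1004, "zip": "6269",   "gender": "M",  "income": 50000,    "age": "0"},
      {"id": 1005, "zip": "55101",  "gender": "F",  "income": 99999,    "age": "30"},
  ]

  for r in records:
      problems = []
      if not (r["zip"].isdigit() and len(r["zip"]) == 5):
          problems.append("suspicious zip code")
      if r["gender"] is None:
          problems.append("missing gender")
      if r["income"] < 0 or r["income"] > 1_000_000 or r["income"] == 99999:
          problems.append("suspicious income")
      if not r["age"].isdigit() or int(r["age"]) == 0:
          problems.append("suspicious age")
      if problems:
          print(r["id"], problems)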

Preprocessing Missing Values In many real-world databases you will be faced with the problem of missing data:

  Id.  Savings  Assets  Income ($1000s)  Credit Risk
  1    Medium   High    75               Good
  2    Low      Low     50               Bad
  3                     25               Bad
  4    Medium   Medium                   Good
  5    Low      Medium  100              Good
  6    High     High    25               Good
  7    Low              25               Bad
  8    Medium   Medium  75               Good

By simply discarding the records with missing data we might unintentionally bias the data. DWML, 2007 4/33

Preprocessing Missing Values Possible strategies for handling missing data:
- Use a predefined constant.
- Use the mean (for numerical variables) or the mode (for categorical variables).
- Use a value drawn randomly from the observed distribution.

Applied to the table above (a Python sketch follows below):
- The missing Savings value of record 3 is replaced by a mode; both Low and Medium are modes for Savings, and Low is picked.
- The missing Assets values of records 3 and 7 are replaced by High and Medium, drawn randomly from the observed distribution for Assets.
- The missing Income of record 4 is replaced by the mean of the observed incomes: (75 + 50 + 25 + 100 + 25 + 25 + 75) / 7 ≈ 54. DWML, 2007 5/33
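A minimal Python sketch of the three imputation strategies, applied to the Savings, Assets, and Income columns of the table. The helper names are illustrative, not from the course; ties between modes are broken arbitrarily, so the mode imputation may pick Medium where the slide picks Low.

  import random
  from collections import Counter
  from statistics import mean

  savings = ["Medium", "Low", None, "Medium", "Low", "High", "Low", "Medium"]
  assets  = ["High", "Low", None, "Medium", "Medium", "High", None, "Medium"]
  income  = [75, 50, 25, None, 100, 25, 25, 75]            # in $1000s

  def impute_mode(values):
      observed = [v for v in values if v is not None]
      mode, _ = Counter(observed).most_common(1)[0]         # ties broken arbitrarily
      return [mode if v is None else v for v in values]

  def impute_mean(values):
      observed = [v for v in values if v is not None]
      m = mean(observed)
      return [m if v is None else v for v in values]

  def impute_random(values, seed=0):
      observed = [v for v in values if v is not None]
      rng = random.Random(seed)
      return [rng.choice(observed) if v is None else v for v in values]

  print(impute_mode(savings))     # categorical: a mode (Low and Medium are both modes)
  print(impute_random(assets))    # categorical: random draws from the observed distribution
  print(impute_mean(income))      # numerical: mean = 375/7, approximately 54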

Preprocessing Discretization Some data mining algorithms can only handle discrete attributes. Possible solution: divide the continuous range into intervals.
Example: (Income, Risk) = (25, B), (25, B), (50, G), (51, B), (54, G), (75, G), (75, G), (100, G), (100, G)

Unsupervised discretization (a binning sketch follows below):
Equal width binning (width 25):
  Bin 1: 25, 25            [25, 50)
  Bin 2: 50, 51, 54        [50, 75)
  Bin 3: 75, 75, 100, 100  [75, 100]
Equal frequency binning (bin density 3):
  Bin 1: 25, 25, 50        [25, 50.5)
  Bin 2: 51, 54, 75, 75    [50.5, 87.5)
  Bin 3: 100, 100          [87.5, 100] DWML, 2007 6/33
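A minimal Python sketch of the two binning schemes, reproducing the bins shown on the slide; the function names are illustrative, not from the course.

  incomes = [25, 25, 50, 51, 54, 75, 75, 100, 100]

  def equal_width_bins(values, width):
      lo, hi = min(values), max(values)
      n_bins = -(-(hi - lo) // width)              # ceiling division
      # the last bin is closed on the right, so the maximum falls into bin n_bins - 1
      return [min((v - lo) // width, n_bins - 1) for v in values]

  def equal_frequency_bins(values, density):
      ordered = sorted(values)
      cut_points = []
      i = density
      while i < len(ordered):
          # move the cut forward so that identical values stay in the same bin
          while i < len(ordered) and ordered[i] == ordered[i - 1]:
              i += 1
          if i < len(ordered):
              cut_points.append((ordered[i - 1] + ordered[i]) / 2)
          i += density
      return [sum(v >= c for c in cut_points) for v in values]

  print(equal_width_bins(incomes, width=25))       # [0, 0, 1, 1, 1, 2, 2, 2, 2]
  print(equal_frequency_bins(incomes, density=3))  # [0, 0, 0, 1, 1, 1, 1, 2, 2]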

Preprocessing Supervised discretization Take the class distribution into account when selecting the intervals. For example, recursively bisect the interval by selecting the split point v giving the highest information gain

  Gain(S, v) = Ent(S) - ( |S<=v| Ent(S<=v) + |S>v| Ent(S>v) ) / |S|

until some stopping criterion is met.
(Income, Risk) = (25, B), (25, B), (50, G), (51, B), (54, G), (75, G), (75, G), (100, G), (100, G)

  Ent(S) = -( (3/9) log2(3/9) + (6/9) log2(6/9) ) = 0.9183

  Split  E-Ent (expected entropy after the split)  Interval
  25     0.4602                                    (-inf, 25], (25, inf)
  50     0.7395                                    (-inf, 50], (50, inf)
  51     0.3606                                    (-inf, 51], (51, inf)
  54     0.5394                                    (-inf, 54], (54, inf)
  75     0.7663                                    (-inf, 75], (75, inf)

The split at 51 has the lowest expected entropy and therefore the highest gain. (A sketch of the computation follows below.) DWML, 2007 7/33
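A minimal Python sketch of the split evaluation, recomputing the E-Ent column for the (Income, Risk) data above; the function names are illustrative, not from the course.

  from math import log2

  data = [(25, "B"), (25, "B"), (50, "G"), (51, "B"), (54, "G"),
          (75, "G"), (75, "G"), (100, "G"), (100, "G")]

  def entropy(labels):
      n = len(labels)
      if n == 0:
          return 0.0
      return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                  for c in set(labels))

  def expected_entropy(data, v):
      left  = [c for x, c in data if x <= v]
      right = [c for x, c in data if x > v]
      n = len(data)
      return len(left) / n * entropy(left) + len(right) / n * entropy(right)

  print(round(entropy([c for _, c in data]), 4))        # 0.9183
  for v in (25, 50, 51, 54, 75):
      print(v, round(expected_entropy(data, v), 4))     # 51 gives the lowest value,
                                                        # i.e. the highest gain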

Preprocessing Data Transformation Some data mining tools tend to give variables with a large range a higher significance than variables with a smaller range. For example, Age versus Income. The typical approach is to standardize the scales (a sketch follows below):

Min-max normalization:
  X' = (X - min(X)) / (max(X) - min(X))
  (figure: attributes A1 and A2 mapped from their original values to normalized values in [0, 1])

Z-score standardization:
  X' = (X - mean(X)) / SD(X)
  (figure: attributes A1 and A2 mapped from their original values to standardized values centered on 0) DWML, 2007 8/33
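A minimal Python sketch of both rescalings; the function names and the small Age and Income example values are illustrative, not from the course.

  from statistics import mean, stdev

  def min_max_normalize(values):
      lo, hi = min(values), max(values)
      return [(v - lo) / (hi - lo) for v in values]

  def z_score_standardize(values):
      m, s = mean(values), stdev(values)
      return [(v - m) / s for v in values]

  ages    = [23, 35, 47, 59, 71]
  incomes = [20000, 35000, 50000, 75000, 120000]

  # After rescaling, Age and Income live on comparable scales, so neither
  # dominates a distance computation simply because of its larger range.
  print(min_max_normalize(ages))
  print(z_score_standardize(incomes))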

Preprocessing Outliers Data: 1, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 20.
(figure: histogram of the data, with the value 20 far to the right of the rest)

Summary statistics:
- First quartile (1Q): 25% of the data lies at or below 4.
- Second quartile (2Q): 50% of the data lies at or below 6.
- Third quartile (3Q): 75% of the data lies at or below 7.
- Interquartile range IQR = 3Q - 1Q = 3.

A data point may be an outlier if:
- It is lower than 1Q - 1.5 IQR = 4 - 1.5 * 3 = -0.5.
- It is higher than 3Q + 1.5 IQR = 7 + 1.5 * 3 = 11.5.
So the value 20 is flagged as an outlier. (A sketch follows below.) DWML, 2007 9/33
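A minimal Python sketch of the 1.5 IQR rule on the data above. The quartile computation uses simple linear interpolation, which may differ slightly from the convention used in the course, but it reproduces 1Q = 4 and 3Q = 7 here.

  def quartile(sorted_values, q):
      # linear interpolation between the closest ranks
      pos = q * (len(sorted_values) - 1)
      lo, frac = int(pos), pos - int(pos)
      if frac == 0:
          return sorted_values[lo]
      return sorted_values[lo] * (1 - frac) + sorted_values[lo + 1] * frac

  data = sorted([1, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 20])
  q1, q3 = quartile(data, 0.25), quartile(data, 0.75)
  iqr = q3 - q1
  lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
  print([x for x in data if x < lower or x > upper])   # [20]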

Clustering DWML, 2007 10/33

Clustering Unlabeled Data The Iris data with class labels removed:

  Attributes: SL   SW   PL   PW
              5.1  3.5  1.4  0.2
              4.9  3.0  1.4  0.2
              6.3  2.9  6.0  2.1
              6.3  2.5  4.9  1.5
              ...  ...  ...  ...

Unlabeled data in general: (discrete or continuous) attributes, no class variable. DWML, 2007 11/33

Clustering Clustering A clustering of the data S = s_1, ..., s_N consists of a set C = {c_1, ..., c_k} of cluster labels, and a cluster assignment ca : S -> C. Clustering Iris with C = {blue, red}:
Note: a clustering partitions the data points, not necessarily the instance space. When cluster labels have no particular significance, we can also identify a clustering with the partition S = S_1 ∪ ... ∪ S_k, where S_i = ca^-1(c_i). DWML, 2007 12/33

Clustering Clustering goal (figure: a candidate clustering, indicated by colors, of data cases in instance space; arrows indicate selected between-cluster and within-cluster distances)
General goal: find a clustering with large between-cluster variation (sum of between-cluster distances) and small within-cluster variation (sum of within-cluster distances). The concrete goal varies according to the exact distance definition. DWML, 2007 13/33

Clustering Examples Group plants/animals into families or related species, based on - morphological features - molecular features Identify types of customers based on attributes in a database (can then be targeted by special advertising campaigns) Web mining: group web-pages according to content DWML, 2007 14/33

Clustering Clustering vs. Classification The cluster label can be interpreted as a hidden class variable
- that is never observed
- whose number of states is unknown
- on which the distribution of attribute values depends
Clustering is often called unsupervised learning, vs. the supervised learning of classifiers: in supervised learning, correct class labels for the training data are provided to the learning algorithm by a supervisor, or teacher. One key problem in clustering is determining the right number of clusters. Two different approaches:
- Partition-based clustering
- Hierarchical clustering
All clustering methods require a distance measure on the instance space! DWML, 2007 15/33

Clustering Partition-based Clustering Number k of clusters fixed (user defined). Partition the data into k clusters.
k-means clustering Assume that
- there is a distance function d(s, s') defined between data items
- we can compute the mean value of a collection {s_1, ..., s_l} of data items

  Initialize: randomly pick initial cluster centers c = c_1, ..., c_k from S
  repeat
    for i = 1, ..., k
      S_i := {s in S | c_i = arg min_{c' in c} d(c', s)}
      c_old,i := c_i
      c_i := mean(S_i)
      ca(s) := c_i  for all s in S_i
  until c = c_old
(A Python sketch of this loop follows below.) DWML, 2007 16/33
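A minimal Python sketch of this loop for points in the plane with Euclidean distance; the function name and the example points are illustrative, not from the course.

  import math, random

  def kmeans(points, k, seed=0):
      rng = random.Random(seed)
      centers = rng.sample(points, k)              # random initial cluster centers
      while True:
          # assign every point to its nearest center
          clusters = [[] for _ in range(k)]
          for p in points:
              i = min(range(k), key=lambda j: math.dist(p, centers[j]))
              clusters[i].append(p)
          # recompute each center as the mean of its cluster
          new_centers = [
              tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
              for i, cl in enumerate(clusters)
          ]
          if new_centers == centers:               # until c = c_old
              return centers, clusters
          centers = new_centers

  points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.1), (5.2, 4.9), (9.0, 1.0), (8.8, 1.2)]
  centers, clusters = kmeans(points, k=3)
  print(centers)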

Clustering Example k = 3: (figure: successive iterations of the algorithm, showing the cluster centers c_1, c_2, c_3 and the clusters S_1, S_2, S_3 being updated until convergence) DWML, 2007 17/33

Clustering Example (cont.) Result for clustering the same data with k = 2: (figure: cluster centers c_1, c_2 and clusters S_1, S_2) The result can depend on the choice of initial cluster centers! DWML, 2007 18/33

Clustering Outliers The result of partitional clustering can be skewed by outliers. Example with k = 2: A useful preprocessing step is outlier detection and elimination. DWML, 2007 19/33

Hierarchical Clustering Hierarchical clustering The right number of clusters may not only be unknown, it may also be quite ambiguous: (figures: several plausible clusterings of the same data) Provide an explicit representation of nested clusterings of different granularity. DWML, 2007 20/33

Hierarchical Clustering Agglomerative hierarchical clustering Extend the distance function d(s, s') to a distance function D(S, S') between sets of data items. Two out of many possibilities:

  D_average(S, S') := (1 / (|S| |S'|)) * sum of d(s, s') over all s in S, s' in S'
  D_min(S, S') := min of d(s, s') over all s in S, s' in S'

  for i = 1, ..., N: S_i := {s_i}
  while the current partition S_1, ..., S_k of S contains more than one element
    (i, j) := arg min over i, j in {1, ..., k}, i != j, of D(S_i, S_j)
    form a new partition by merging S_i and S_j

When D_average is used, this is also called average link clustering; for D_min: single link clustering. (A Python sketch follows below.) DWML, 2007 21/33
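A minimal Python sketch of agglomerative clustering with the two linkage functions above, again for points in the plane with Euclidean distance; the names and example points are illustrative, not from the course.

  import math
  from itertools import combinations

  def set_distance(A, B, link="average"):
      dists = [math.dist(a, b) for a in A for b in B]
      return min(dists) if link == "single" else sum(dists) / len(dists)

  def agglomerative(points, link="average"):
      partition = [[p] for p in points]            # start with singleton clusters
      merges = []                                  # record of merge steps (for a dendrogram)
      while len(partition) > 1:
          i, j = min(combinations(range(len(partition)), 2),
                     key=lambda ij: set_distance(partition[ij[0]], partition[ij[1]], link))
          d = set_distance(partition[i], partition[j], link)
          merged = partition[i] + partition[j]
          partition = [c for k, c in enumerate(partition) if k not in (i, j)] + [merged]
          merges.append((d, merged))
      return merges

  points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
  for d, cluster in agglomerative(points, link="single"):
      print(round(d, 2), cluster)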

Hierarchical Clustering (figure: successive merge steps of agglomerative clustering on an example data set) DWML, 2007 22/33

Hierarchical Clustering Dendrogram Representation of Hierarchical Clustering (figure: dendrogram with the vertical axis showing the distance of merged components; horizontal cuts yield a 3-clustering and a 5-clustering)
The length of the distance interval corresponding to a specific clustering can be interpreted as a measure of the significance of this particular clustering. DWML, 2007 23/33

Hierarchical Clustering Single link vs. Average link (figures: the 4-clustering for single link and average link, the single link 2-clustering, and the average link 2-clustering of the same data)
Generally: single link will produce rather elongated, linear clusters; average link produces more convex clusters. DWML, 2007 24/33

Hierarchical Clustering Another Example (figures: the single link 2-clustering and the average link 2-clustering, or similar, of another data set) DWML, 2007 25/33

Self Organizing Maps DWML, 2007 26/33

Self Organizing Maps SOMs as Special Neural Networks (figure: an input layer connected to an output layer)
- Neural network structure without hidden layers
- Output neurons structured as a two-dimensional array
- The connection from the i-th input to the j-th output has weight w_i,j
- No activation function for the output nodes DWML, 2007 27/33

Self Organizing Maps Kohonen Learning Given:
- Unlabeled data a_1, ..., a_N in R^n
- Distance measure d_n(., .) on R^n
- Distance measure d_out(., .) on the output neurons
- Update function eta(t, d) : N x R -> R, decreasing in t and d

1. Initialize weight vectors w_j^(0) for the output nodes o_j
2. t := 0
3. repeat
4.   t := t + 1
5.   for i = 1, ..., N
6.     let o_j be the output neuron minimizing d_n(w_j, a_i)
7.     for all output nodes o_h:
8.       w_h^(t) := w_h^(t-1) + eta(t, d_out(o_h, o_j)) * (a_i - w_h^(t-1))
9. until termination condition applies DWML, 2007 28/33

Self Organizing Maps Distances etc. Possible choices:
- d_n: Euclidean
- d_out(o_j, o_h): e.g. 1 if o_j, o_h are neighbors (rectangular or hexagonal layout), or the Euclidean distance on the grid indices
- eta(t, d): e.g. alpha(t) * exp(-d^2 / (2 sigma^2(t))) with alpha(t), sigma(t) decreasing in t
(A Python sketch of the update loop follows below.) DWML, 2007 29/33
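A minimal Python sketch of the Kohonen learning loop from the previous slide, using the choices above: Euclidean d_n, Euclidean distance on grid indices for d_out, and an alpha(t) * exp(-d^2 / (2 sigma^2(t))) update function. The 3 x 3 grid, the schedule constants, and the data points are illustrative assumptions, not from the course.

  import math, random

  GRID = [(r, c) for r in range(3) for c in range(3)]     # 3 x 3 output grid

  def d_out(a, b):                                         # Euclidean distance on grid indices
      return math.dist(a, b)

  def eta(t, d, alpha0=0.5, sigma0=1.5):                   # decreasing in t and d
      alpha = alpha0 / (1 + 0.05 * t)
      sigma = sigma0 / (1 + 0.05 * t)
      return alpha * math.exp(-d ** 2 / (2 * sigma ** 2))

  def train_som(data, epochs=20, seed=0):
      rng = random.Random(seed)
      weights = {o: [rng.random(), rng.random()] for o in GRID}   # w_j^(0)
      t = 0
      for _ in range(epochs):
          t += 1
          for a in data:
              # winning output neuron: the one whose weight vector is closest to a
              o_j = min(GRID, key=lambda o: math.dist(weights[o], a))
              for o_h in GRID:                                     # update all neurons
                  h = eta(t, d_out(o_h, o_j))
                  weights[o_h] = [wi + h * (ai - wi) for wi, ai in zip(weights[o_h], a)]
      return weights

  data = [(0.1, 0.2), (0.15, 0.25), (0.8, 0.9), (0.85, 0.8), (0.5, 0.1)]
  weights = train_som(data)
  print(weights[(0, 0)], weights[(2, 2)])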

Self Organizing Maps Intuition SOM learning can be understood as fitting a 2-dimensional surface to the data: (figure: data points associated with output neurons o_0,0, o_0,1, o_1,0, o_1,1) Colors indicate association with different output neurons, not data attributes. Some output neurons may not have any associated data cases. DWML, 2007 30/33

Self Organizing Maps Example (from Tan et al.) Data: Word occurrence data (?) from 3204 articles from the Los Angeles Times with (hidden) section labels Entertainment, Financial, Foreign, Metro, National, Sports. Result of SOM clustering on a 4 x 4 hexagonal grid:

  Sports         Sports   Metro      Metro
  Sports         Sports   Metro      Foreign
  Entertainment  Metro    Metro      National
  Entertainment  Metro    Financial  Financial

Output nodes are labelled with the majority label of their associated cases and colored according to the number of cases associated with them (density legend from low to high; the coloring is fictional). DWML, 2007 31/33

Self Organizing Maps SOMs and k-means In spite of their roots in neural networks, SOMs are more closely related to k-means clustering:
- Weight vectors w_j are cluster centers
- Kohonen updating associates data cases with cluster centers, and repositions cluster centers to fit the associated data cases
Differences:
- 2-dimensional spatial relationship among cluster centers
- Data cases are associated with more than one cluster center
- On-line updating (one case at a time) DWML, 2007 32/33

Self Organizing Maps Pros and Cons
+ Provides more insight than a basic clustering (i.e. a partitioning of the data)
+ Can produce intuitive representations of clustering results
- No well-defined objective function that is optimized DWML, 2007 33/33