Spherical Topology Self-Organizing Map Neuron Network for Visualization of Complex Data


Spherical Topology Self-Organizing Map Neuron Network for Visualization of Complex Data

Huajie Wu

4 November 2011

A report submitted for the degree of Master of Computing of the Australian National University

Supervisor: Prof. Tom Gedeon

Acknowledgements

Firstly, thanks to my supervisor Tom Gedeon, and to Dingyun Zhu, for their recommendations and support on this project. Moreover, thanks to Uwe R. Zimmer for his suggestions on the technique of report writing. Finally, thanks to my family and my friends for their encouragement.

Abstract

The spherical SOM (SSOM) has been proposed in order to remove the border effect in conventional Self-Organizing Maps (SOM). However, the SSOM still has limitations in representing a sequence of events. The concentric spherical Self-Organizing Map (CSSOM) is proposed in this report because it can use an arbitrary number of spheres, and that topology can be applied to the analysis of sequential and time series data. I present a new method to extend the SSOM and to reconstruct the neighbors in order to implement concentric spherical Self-Organizing Maps. Moreover, for ease of evaluation, I present the display schemas and several measurements of the quality of SOMs, and report the experimental results. The results indicate that the quality of the SOM is improved by using a suitably specified CSSOM, depending on the characteristics of the dataset. However, the results for sequence training as currently proposed need improvement. Finally, the quality of clustering becomes worse as the number of spheres increases and the number of units in each sphere decreases.

Key words: Neural Networks, concentric spherical Self-Organizing Maps, time series data, clustering, sequence training

List of Abbreviations

NN - Neural Network
SOM - Self-Organizing Map
SSOM - Spherical Self-Organizing Map
CSSOM - Concentric Spherical Self-Organizing Map

Contents

Acknowledgements
Abstract
List of Abbreviations
List of Figures
List of Tables
1. Introduction
   1.1 Motivation
   1.2 Objectives
   1.3 Contribution
   1.4 Preview
2. Background
   2.1 Neural Networks and Unsupervised Learning
   2.2 Kohonen's SOM
   2.3 Spherical SOMs
3. S-SOM
   3.1 The Algorithm in the Training Process
   3.2 Deformation of S-SOM
      3.2.1 The Arrangement & Neighborhood Structure
      3.2.2 The Representations of Distortions and Colors
4. Details of Concentric Spherical Self-Organizing Map Neuron Network
   4.1 Description
   4.2 Architecture of CSSOM
   4.3 Neighborhoods Structure
   4.4 Display Schema
   4.5 Sequence Training
   4.6 Test Suite
      4.6.1 Purity of Clustering
      4.6.2 Quantization Error and Topological Error
5. Experiments and Results
   5.1 Experiment 1: The quality of SOMs
      5.1.1 Description of the experiment
      5.1.2 Experiment Process and Discussion of Results
   5.2 Experiment 2: Time sequence training
      5.2.1 Description of the experiment
      5.2.2 Experiment Process and Discussion of Results
   5.3 Experiment 3: The purity of clustering using CSSOM with different numbers of spheres
      5.3.1 Description of the experiment
      5.3.2 Experiment Process and Discussion of Results
6. Conclusion and Future Work
   6.1 Conclusion
   6.2 Future Work
References
Appendix A

List of Figures

Figure 1: Example of a basic NN
Figure 2: Conventional 1D and 2D arrangements
Figure 3: The process of updating weights for cluster units
Figure 4: 3D graphical object representing the Wisconsin cancer data
Figure 5: The main interface of CSSOM
Figure 6: Drop-down menu of File and pop-up window
Figure 7: The changes after selecting loaded files
Figure 8: The general flow of CSSOM
Figure 9: The sub-steps in Initialization
Figure 10: Options in Training
Figure 11: Options in Display Schema
Figure 12: Options in Evaluation
Figure 13: 2D output grid units map of 3 layers of CSSOM
Figure 14: Showing sphere 3 as center, 5 spheres of Chain Glyph display schema
Figure 15: Start from sphere 1, end with sphere 4, 5 spheres of Equal Glyph display schema
Figure 16: The distribution of items in clusters
Figure 17: The average number of neighborhoods per unit in SSOM and CSSOM
Figure 18: The number of units with different neighborhoods (the details are in Appendix A)
Figure 19: Visual view of SSOM from different angles
Figure 20: Visual view of CSSOM(15S) from sphere 1 to sphere 15
Figure 21: Visual view of CSSOM in sequence training from sphere 1 to sphere 15
Figure 22: Visual view of CSSOM in sequence training from sphere 5 to sphere 10
Figure 23: Visual view of CSSOM in parallel training in time order from sphere 1 to sphere 15

List of Tables

Table 1: Visual table for Parallel Training
Table 2: Visual table for Sequence Training
Table 3: Summary of datasets
Table 4: QE and TE of SOMs using the IRIS dataset
Table 5: QE and TE of SSOM and CSSOM using the ECSH dataset
Table 6: QE and TE of Parallel Training and Sequence Training using the ECSH dataset
Table 7: QE and TE of Parallel Training in random order and in time order using the ECSH dataset
Table 8: The purity of SSOMs with different sphere sizes
Table 9: The purity of CSSOMs with different numbers of spheres

1. Introduction

In 1982, the Self-Organizing Map (SOM) was proposed by Kohonen, and it was primarily used for clustering, classification, sampling and visualizing high-dimensional data [3]. Since then, the technique has been widely applied in many areas, such as clustering high-frequency financial data. The conventional neighborhood arrangement is the planar SOM, made of a two-dimensional rectangular or hexagonal lattice. However, the planar SOM has a disadvantage, which is the border effect [15]. During training, the neurons compete with each other, and the weights of the winning neuron and its neighbors are updated. Ideally, all the units have the same chance of being updated. However, in the planar map, the units at the border of the map have fewer neighbors than the inside units. At the end of training, the map may not form the expected similar regions of the data space, since there are many units with unequal chances of being modified during training [5].

Therefore, many spherical SOMs have been proposed in order to solve that problem. In this report, the one proposed by Sangole & Leontitsis in 2005 is described and extended as a base. The disadvantage of most of these methods is that the number of neurons in the map cannot be arbitrary [2]. The reason is that the ICOSA_N arrangement can only accommodate 2 + 10*4^N neurons [2], where ICOSA_N is the arrangement used by that method and N is the number of recursive subdivisions. Concentric Spherical Self-Organizing Maps, based on the SOM, implement multiple layers of SSOM. Compared with a single-layer SSOM, the number of neurons can be varied more freely. More importantly, the standard SOM also has no way to represent sequential data, which can be done with the multiple layers of a CSSOM.

1.1 Motivation

The motivation of this project is to provide a method in which the user can use an arbitrary number of spheres, and to observe the results of clustering data as well as the quality of SOMs on multiple spheres (CSSOM). Moreover, the other motivation is to evaluate the differences between time sequence training and parallel training in the concentric spherical Self-Organizing Map.

1.2 Objectives

The aim of this project is to implement the concentric spherical Self-Organizing Map based on Sangole & Leontitsis's SSOM code, to allow a user to visualize data using an arbitrary number of spheres, and to investigate a possible way to train a CSSOM for time series data.

1.3 Contribution

The contribution of this project involves the three following areas. Firstly, the SSOM code written by Sangole & Leontitsis is simplified and modified in order to implement the concentric SSOM and display it in different ways. Secondly, evaluation code was written to evaluate the accuracy of clustering and the quality of SOMs. Finally, the sequence training code is designed for CSSOM, and is used to evaluate its difference from parallel training.

1.4 Preview

Chapter 2 gives an overview of the relevant techniques and basic concepts, including Neural Networks, unsupervised training, Self-Organizing Maps and spherical Self-Organizing Maps, which are preparation for understanding CSSOM and its evaluation. Chapter 3 gives a further interpretation and explanation of Sangole & Leontitsis's SSOM. Chapter 4 introduces CSSOM in detail, including the basic description, the architecture of CSSOM, the neighborhood structure and the display schemas. Chapter 5 evaluates the results of SSOM and CSSOM for clustering the data and the quality of the SOMs, and observes the differences in effects and results between sequence training and parallel training for specific datasets. Finally, Chapter 6 concludes the report and indicates the weaknesses as well as suggestions for future work.

2. Background

2.1 Neural Networks and Unsupervised Learning

A Neural Network (NN) is a mathematical or computational model which consists of an interconnected group of artificial neurons, and simulates some small amount of human brain structure and function. Moreover, it is an adaptive system which changes its structure based on external and internal information [17]. In general, it consists of three types of layers, which contain input neurons, hidden neurons and output neurons respectively. The following figure shows the basic principle of NNs.

Figure 1: Example of a basic NN

In Figure 1, X represents the inputs, which might come from other neurons; Y represents the outputs, which might be inputs to other neurons or the final outputs; and W represents the strength of the connections between the inputs and the neuron. Finally, the neuron collects all the weighted inputs and uses the activation function to generate the output. The activation function could be a non-linear function or a Gaussian function. The activation function formula is shown below:

y = f( Σ_{i=0}^{n} w_i x_i )

where y is the output of the neuron, n is the number of inputs, w_i is the weight of the i-th input, x_i is the i-th input, and f is the activation function, e.g. the sigmoid f(x) = 1 / (1 + e^(-x)).

There are three main learning methods: supervised learning, unsupervised learning and reinforcement learning. In this report, unsupervised learning is used in the SOM and it is the one mainly described. With unsupervised learning, SOM neural networks can be used to find patterns in data without known categories or labels. In other words, the SOM only uses the training dataset's input set to group the data, in which case the training data is not organized as input-output patterns. If we do have output patterns, then we can compare them to the clusters of the SOM and estimate the similarity between the SOM clusters and the output labels, which is similar to the accuracy of a supervised classification method.
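To make the notation above concrete, the following is a minimal sketch in Python/NumPy (an illustration only, not part of the report's MATLAB toolbox) of a single neuron computing y = f(Σ w_i x_i) with a sigmoid activation; the input and weight values are hypothetical.

import numpy as np

def sigmoid(z):
    # f(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w):
    # Weighted sum of the inputs followed by the activation function.
    return sigmoid(np.dot(w, x))

# Hypothetical three-input example.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, -0.4])
print(neuron_output(x, w))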

2.2 Kohonen's SOM

The Self-Organizing Map algorithm was introduced by Teuvo Kohonen in 1982 [9]. Generally, Kohonen's later publications in 1995 [10] and 2001 [11] are regarded as the major references on the SOM. Kohonen describes it as a tool for the visualization and analysis of high-dimensional data. Additionally, it is useful for clustering, classification and data mining in different areas. The SOM is an unsupervised learning method, the key feature of which is that there are no explicit target outputs or environmental evaluations associated with each input [6]. During the training process, there is no evaluation of the correctness of the output and no supervision.

First, it is different from other neural networks in that it only has two layers: an input layer and an output layer (also called the competition layer). Every input in the input space connects to all the output neurons in the map. The output arrangements are mostly two-dimensional. Figure 2 below shows conventional 1D and 2D arrangements.

Figure 2: Conventional 1D and 2D arrangements — (a) 1D line layout, (b) 2D planar layout (rectangular)

In Figure 2, x_n represents the input neurons in the input space and y_n represents the output neurons in the output space. Figure 2(a) shows a one-dimensional arrangement in the form of a line layout. Figure 2(b) shows a two-dimensional arrangement in the form of a rectangular layout. Figure 2 shows that, compared to a general NN, the SOM has no hidden neurons, and the inputs map to the output space in a regular, discrete arrangement. Besides the rectangular layout, a 2D SOM can also have a hexagonal arrangement.

Next, the main process of the Self-Organizing Map (SOM) is introduced. The process is made up of three main phases, which are competition, cooperation and adaptation [17].

Competition: each neuron in the self-organizing map computes the (Euclidean) distance between its weight vector and the input vector. The competition among the neurons is then based on the outputs they produce, where i(x) indicates the neuron best matching the input vector x. The formula can be represented as:

i(x) = arg min_j ||x - w_j||,  j = 1, 2, ..., l   (2.1)

In formula 2.1, x is the input vector and w_j is the j-th neuron's weight vector. This is a nearest neighbor search, also interpreted as proximity search, similarity search or closest point search, which consists in finding the closest points in a metric space [7]. The neuron j which satisfies the above condition is called the winning neuron.
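As an illustration of the competition phase in Eq. 2.1, here is a short hedged sketch in Python/NumPy; the names are hypothetical and this is not the implementation used in the report.

import numpy as np

def find_bmu(x, weights):
    # weights has shape (l, d): one d-dimensional weight vector per output neuron.
    # Returns i(x) = arg min_j ||x - w_j||, the index of the winning neuron.
    distances = np.linalg.norm(weights - x, axis=1)
    return int(np.argmin(distances))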

Cooperation: the winning neuron is located at the center of a neighborhood of topologically cooperating neurons. The winning neuron tends to activate a set of neurons at lateral distances computed by a special function. The distance function must satisfy two requirements: 1) it is symmetric; 2) it decreases monotonically as the distance increases [8]. A distance function h(j, i) which satisfies the above requirements is the Gaussian:

h(j, i) = exp( -d_{j,i}^2 / (2σ^2) )   (2.2)

In formula 2.2, h(j, i) is the topological area centered around the winning neuron i, d_{j,i} is the lateral distance between the winning neuron i and the cooperating neuron j, and σ is the radius of influence.

Adaptation: it is in this phase that the synaptic weights adaptively change. Since these neural networks are self-adaptive, neuron j's synaptic weight w_j is updated toward the input vector x. All neurons in the neighborhood of the winner are updated as well, in order to make sure that adjacent neurons have similar weight vectors. The following formula states how the weights of each neuron in the neighborhood of the winner are updated:

w_j = w_j + η h(j, i) (x - w_j)   (2.3)

In formula 2.3, η is the learning rate, i is the index of the winning neuron, and w_j is the weight of neuron j. The h(j, i) function has been given in equation 2.2. These three phases are repeated during training, until the changes become less than a predefined threshold.

2.3 Spherical SOMs

Compared to the normal 2D SOM, spherical SOMs eliminate the border effect. Furthermore, spherical SOMs provide more effective visualization, because all neurons receive equal geometrical treatment and people may prefer to read the maps from the spheres. A number of spherical SOMs have been implemented and applied to various datasets. The main spherical SOM topologies are the following: GeoSOM, S-SOM, 3D-SOM and H-SOM. GeoSOM was proposed by Wu & Takatsuka [5], and uses a 2D rectangular grid data structure to store the icosahedron-based geodesic dome. In Sangole & Leontitsis's S-SOM work [4], every grid unit stores the list of its immediate neighbors; the next chapter has detailed descriptions of S-SOM. Boudjemai [16] applied it to 3D modeling as 3D-SOM, whereas Hirokazu [2] developed H-SOM to arrange the neurons along a helix, which is divided into equal parts. Hirokazu's method allows arbitrary numbers of neurons, but calculating neighbors is quite difficult. The advantages and disadvantages of these spherical SOMs are discussed in detail by Wu & Takatsuka [5]. Our project is mainly based on Sangole & Leontitsis's S-SOM work, so the following chapter gives much more detail on S-SOM.
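Before moving on to S-SOM, the three phases of Section 2.2 can be put together in one training step. The following is an illustrative Python/NumPy sketch with hypothetical names, not the toolbox used in this report; grid_dist is assumed to return the lateral distance between two neurons on whatever lattice is used (planar or spherical).

import numpy as np

def train_step(x, weights, grid_dist, eta=0.1, sigma=1.0):
    # Competition: find the best matching unit (Eq. 2.1).
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # Cooperation: Gaussian neighborhood around the winner (Eq. 2.2).
    d = np.array([grid_dist(j, bmu) for j in range(len(weights))])
    h = np.exp(-(d ** 2) / (2.0 * sigma ** 2))
    # Adaptation: pull every weight vector towards x, scaled by h (Eq. 2.3).
    weights += eta * h[:, None] * (x - weights)
    return bmu

Repeating this step over all input vectors for a number of epochs, while shrinking eta and sigma, gives the usual SOM training loop.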

3. S-SOM

The spherical self-organizing feature map proposed by Sangole & Leontitsis is a tool that maps randomly organized N-dimensional data onto the lower-dimensional (essentially 2D) surface of a sphere, and is used for visual pattern analysis.

3.1 The Algorithm in the Training Process

Before training, the program requires the user to load the SSOM data structure and the data. Furthermore, parameters such as the epoch (the number of cycles), the learning rate and the neighborhood parameter should be set first. Then the weights of the cluster units are updated during training. The following flow chart shows the training process.

Figure 3: The process of updating weights for cluster units

In Figure 3, in step 2, D^p_{i,j,k} is the difference between the current input vector and the weight vectors of all cluster units. The formula used is:

D^p_{i,j,k} = (Φ(u_{i,j,k}) + 1) * Σ_{n=1}^{N} (x^p_n - w_{n,i,j,k})^2   (3.1)

where x^p_n is the n-th component of the p-th input vector, w_{n,i,j,k} is the n-th component of the weight vector of the (i,j,k)-th node, and N is the number of input dimensions. Φ(u_{i,j,k}) is a count-dependent non-decreasing function used to prevent cluster under-utilization [13].

In step 3, the winning neuron is the unit at the node (i,j,k) with the minimum distance.
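As a rough illustration of Eq. 3.1 and the winner selection in step 3, the following Python/NumPy sketch flattens the (i,j,k) grid index into a single unit index; it is a hedged reading of the formula, not the actual MATLAB code of Sangole & Leontitsis.

import numpy as np

def ssom_winner(x, weights, phi):
    # weights: (num_units, N) array of weight vectors; phi: (num_units,) count-dependent terms.
    # Eq. 3.1: D_k = (phi_k + 1) * sum_n (x_n - w_{n,k})^2; the winner is argmin_k D_k.
    sq_dist = np.sum((x - weights) ** 2, axis=1)
    d = (phi + 1.0) * sq_dist
    return int(np.argmin(d))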

In step 4, the weights of the winning neuron and its neighborhood units are updated. The formulas are shown below:

w_{i,j,k}(new) = w_{i,j,k}(old) + a * [x^p - w_{i,j,k}(old)]   (3.2a)
Φ(new) = Φ(old) + h(r)   (3.2b)
h(r) = exp(-r^2 / (2R))   (3.2c)

In 3.2a, a is the learning rate. In 3.2b, Φ is the count-dependent parameter explained for equation 3.1. In 3.2b and 3.2c, h(r) is the neighborhood function. In 3.2c, R is a radius that spans over half of the spherical SOFM space, and r is the current radius [4].

Finally, if the stop condition is satisfied, which is reaching the epoch parameter (the number of cycles) set by the user, the training process terminates.

3.2 Deformation of S-SOM

3.2.1 The Arrangement & Neighborhood Structure

The output space is made up of predefined grid units. Compared to the other Platonic polyhedra, the icosahedron is the most similar to a sphere, and the variance in its edge lengths is the smallest. Most of the vertices have 6 immediate neighbors (adjacent points), except the 12 original vertices of the icosahedron, which only have 5 immediate neighbors. After tessellation, the number of vertices can be calculated as:

NN = 2 + 10 * 4^n   (3.3)

In equation 3.3, NN is the number of vertices in the output space and n represents the number of recursive subdivisions; for example, n = 2 gives 162 vertices. The program also has a data structure to store all vertices' neighbors. In other words, every vertex has a neighborhood list to record its neighbors.

3.2.2 The Representations of Distortions and Colors

As described by Sangole [12], distortions and colors reflect the magnitude of the similarity measure. The following diagram shows the representations of distortions and colors in the visual view.

Dataset: Wisconsin breast cancer data, 683 fine needle aspirate tissue samples, pre-classified into two categories: malignant and benign tissue.
Parameters: 642 grid units on a tessellated sphere are selected. The network is trained for approximately 200 epochs.

Figure 4: 3D graphical object representing the Wisconsin cancer data

In Figure 4, it is not hard to see that there are two significantly different colors, red and yellow. Red indicates malignant tissue, whereas yellow represents benign tissue. Furthermore, there are two clumps between the boundaries of the colors. Either shape or color in the graphical form thereby indicates the presence of distinct attributes as compared to the rest of the input vectors in the data set [12].

4. Details of Concentric Spherical Self-Organizing Map Neuron Network

4.1 Description

This section mainly describes the user interface, and generally introduces the function of every component in the GUI.

Figure 5: The main interface of CSSOM

In Figure 5, the Loading Data Structure section displays which input data file and SSOM structure data file the user has selected. The instructions on how to load these two files are shown later. Before training, the Training Parameters should be set, or each parameter will keep its default value (shown in Figure 5). Epochs represents the number of cycles of parallel training, and must be set when parallel training is used. Size is the neighborhood parameter used in the h function (please refer to equation 3.2c). Spheres represents the number of sphere layers. Times is the number of repetitions for sequence training (details in Chapter 4.5) and is not used in parallel training. Before displaying the glyphs, the Display Parameters should also be set, or these parameters will remain at their default values (shown in Figure 5). CenterSph should be set before plotting the Chain glyph (details in Chapter 4.4). StartSph and EndSph should be set before plotting the Equal glyph (details in Chapter 4.4). All of these have default values, so the system will work, but may not produce the best results.

After loading the input data file and the SSOM structure data file, the buttons in the Training Buttons section become visible. The Train button is used for parallel training after setting the Epochs parameter. The Sequence Train button is used for sequence training. After training, the buttons in the Display Glyph Buttons section become visible. The Plot Chain Glyph button is used to generate the glyph in the form of a chain (details in Chapter 4.4). The Plot Equal Glyph button is used to generate the glyph consisting of several equal-size spheres (details in Chapter 4.4). The following figure shows the sub-menu of File.

Figure 6: Drop-down menu of File and pop-up window

In Figure 6, in the drop-down menu of File, selecting Load data or Load S-SOFM pops up a window showing the parent directory of the main.m file. After the user selects the files to load, the Loading Data Structure section in the main interface shows the names of the selected files. The following figure shows the changes:

Figure 7: The changes after selecting loaded files

In Figure 7, the Training Buttons are selectable after loading the two files. In the drop-down menu of Help, there is an option About, which pops up a window with information about this software.

4.2 Architecture of CSSOM

The Concentric Spherical Self-Organizing Map can be interpreted as multiple layers of Sangole & Leontitsis's S-SOM, and as a tool for a more effective visual representation of sequence data. The CSSOM is composed of four modules, which are the initialization module, the training module, the display schema module and the test suite module. The following diagrams show the flow and sequence of the modules and provide an overview of CSSOM's operating principles.

Figure 8: The general flow of CSSOM

Initialization:

Figure 9: The sub-steps in Initialization

In Figure 9, in step 1.1, the files include the input data and the SSOM structure, and the parameters include those shown in Figure 5. In step 1.2, all data are saved as variables in the workspace, such as X for the input vectors and C for the neighbor lists. In step 1.3, based on X, C, P (the Cartesian coordinates) and the number of spheres, P is resized and the neighborhood lists are reorganized (details in Chapter 4.3).

Training:

Figure 10: Options in Training

Figure 10 shows that there are two types of training, Parallel Training and Sequence Training. Parallel Training has been described in detail in the previous chapter (please refer to Chapter 3.1), whereas Sequence Training will be introduced in Chapter 4.5.

Display Schema:

Figure 11: Options in Display Schema

As Figure 11 shows, in order to analyze the data from various views, there are two different types of Display Schema. Chapter 4.4 has more details.

Figure 12: Options in Evaluation

In Figure 12's option 2, QE represents Quantization Error, while TE is the Topological Error. For more details refer to Chapter 4.6.2.

4.3 Neighborhoods Structure

Before the training, based on P, C and spheres (representing the Cartesian coordinates, the neighborhood lists and the number of layers respectively), P and C need to be resized and modified. This section focuses on the modifications of the neighborhood structure. Let us demonstrate it with the following diagram.

Figure 13: 2D output grid units map of 3 layers of CSSOM

In Figure 13, in order to distinguish the cluster units in different spheres, we use different colors, which are also used in the following explanation: blue represents Sphere 1, yellow is for Sphere 2, and red is for Sphere 3. In Sangole & Leontitsis's SSOM code there is only one sphere, which we assume is Sphere 1. If a in Sphere 1 is the winning neuron, then in a's neighborhood list, when r = 1 (r for radius) the immediate neighbors are b to g; when r = 2, the neighbors exactly 2 away are h to s; when r = 3, the neighbors exactly 3 away are t to k2. Normally, the initial value of r depends on the number of units, and the neighborhood spreads over half of the sphere [4].

If this is implemented in CSSOM and the number of layers is 3, the neighbors of a must be extended to include units in Sphere 2 and Sphere 3, because the units in different spheres have connections to each other. Therefore, for r = 1, besides b to g in Sphere 1, the neighbors of a include the corresponding a in Sphere 2 and the corresponding a in Sphere 3 (Sphere 3 is also adjacent to Sphere 1, because all the spheres are considered to be in a loop). When r = 2, besides h to s in Sphere 1, the neighbors also include the corresponding b to g in Sphere 2 and the corresponding b to g in Sphere 3, and so on. The pseudo code for updating the neurons' neighbors in the data structure is below:

Algorithm 1: updating the neurons' neighbors in the data structure

initialize the neighborhood data structure;
assign n spheres of the original neighborhood data structure C to a new one, Cnew;
// rsize is the radius size, spheres is the number of spheres,
// nsize is the number of units, rsdfl is the default radius size of C
for i = 1 to rsize do
    for each sphere j do
        for each neuron k do
            update the correct index of the neighbors in the same sphere;
            // initialization only expands the size of Cnew, so the neighbor indices are not yet correct
        end for
    end for
end for
if rsize >= spheres then
    set rsize to spheres - 1;
end if
if spheres is divisible by 2 then
    set rsize to spheres / 2 - 1;
else
    set rsize to (spheres + 1) / 2 - 1;
end if
if rsize > rsdfl then
    for i = 1 to (rsize - rsdfl) do
        for each sphere j do

            for k = 1 to nsize do
                set Cnew{k + nsize * (j - 1), i + size(C, 2)} to an empty value;
                // prevents the subscript from exceeding the array bounds when values are assigned in the next loop
            end for
        end for
    end for
end if
assign Cnew to a temporary variable CC;
for each sphere l do
    for each neuron i do
        for j = 1 to rsize do
            for k = 1 to j do
                if j - k equals 0 then
                    add the corresponding index in the most adjacent spheres to the current Cnew;
                else
                    add the r = j - k neighbors (taken from CC) of the corresponding neuron in the k adjacent spheres to the current Cnew;
                end if
            end for
        end for
    end for
end for

4.4 Display Schema

This program provides users with two types of display schema for analyzing data from different perspectives. The Plot Chain Glyph schema focuses on displaying the sphere that the user sets as the center. At the same time, as the adjacency to the center sphere decreases, the size of the other spheres decreases. This schema is suitable for analyzing a specified sphere and its adjacent spheres with emphasis. Moreover, it is good for analyzing the spheres as a whole. Figure 14 shows the display schema.

Figure 14: Showing sphere 3 as center, 5 spheres of Chain Glyph display schema

In Figure 14, there are 5 spheres, and sphere 3 is set as the center. Sphere 3 is the largest one; sphere 2 and sphere 4 become smaller; sphere 1 and sphere 5 are the smallest because they are the farthest from sphere 3. However, as the number of spheres increases, the size of all the spheres becomes smaller. That problem can be solved by using the Plot Equal Glyph display schema. The Equal Glyph can show an arbitrary contiguous range of equal-size spheres. Furthermore, it provides users with a good visual effect. The following figure shows the effect.

Figure 15: Start from sphere 1, end with sphere 4, 5 spheres of Equal Glyph display schema

In Figure 15, the glyph shows spheres 1 to 4, out of 5 spheres, at the same size. Once users have identified the appropriate spheres, the Equal Glyph display schema provides a good visual effect for focusing in on a sphere and its adjacent spheres. Whichever display schema the user chooses, the spheres can be rotated simultaneously according to the user's preference when analyzing complex data.
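For reference, the neighborhood merging described in Chapter 4.3 can be sketched roughly as below. This is an illustrative Python reading of that description, not the report's MATLAB data structure: neighbours_within(k, r) is assumed to return the within-sphere neighbors of unit k at exactly radius r, and the spheres are treated as a ring, so the first and last spheres are adjacent.

def concentric_neighbours(sphere, k, r, n_spheres, neighbours_within):
    # Neighbors at exactly radius r of unit k on the given sphere (0-based indices),
    # returned as (sphere, unit) pairs.
    result = []
    for s in range(n_spheres):
        # Distance between spheres on the ring (sphere 0 and sphere n_spheres-1 are adjacent).
        d = abs(s - sphere)
        ring_dist = min(d, n_spheres - d)
        if ring_dist > r:
            continue
        remaining = r - ring_dist
        if remaining == 0:
            # A sphere exactly r steps away contributes only the same-index unit
            # (e.g. the corresponding a in Sphere 2 and Sphere 3 for r = 1).
            result.append((s, k))
        else:
            # A closer sphere contributes its within-sphere neighbors at the leftover radius
            # (e.g. the corresponding b to g in Sphere 2 and Sphere 3 for r = 2).
            result.extend((s, u) for u in neighbours_within(k, remaining))
    return result

With 3 spheres this reproduces the example in Chapter 4.3: for r = 1 the list contains b to g on the unit's own sphere plus the same-index unit on the two adjacent spheres; for r = 2 it contains h to s on the unit's own sphere plus b to g on each adjacent sphere.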

4.5 Sequence Training

There are two training algorithms, Parallel Training and Sequence Training, as mentioned above. Parallel Training is the traditional algorithm, the basic procedure of which is described in detail in Chapter 3.1. It is called parallel training here as all the spheres see the data at the same time. That is, a winning unit is chosen from any of these spheres for P1 in the first epoch, as shown in the first line of Table 1. Generally the patterns are actually presented in a random order. This section focuses on describing Sequence Training, which is proposed here for data in sequential or chronological order. The following tables visually demonstrate how the parallel training works, followed by the new sequence training. Assume that there are 5 patterns, which are involved in 3 layers of CSSOM.

Epochs   S1   S2   S3
1        P1   P1   P1
1        P2   P2   P2
1        P3   P3   P3
1        P4   P4   P4
1        P5   P5   P5
Table 1: Visual table for Parallel Training

Times    S1   S2   S3
1        P1   P2   P3
2        P2   P3   P4
3        P3   P4   P5
4        P4   P5   P1
5        P5   P1   P2
Table 2: Visual table for Sequence Training

In Tables 1 and 2, P_n represents the n-th pattern (the input vector) and S_n represents the n-th sphere. A pattern in Table 1 is trained on all spheres S_1 to S_n at once, whereas a pattern in Table 2 is trained on one sphere S_n at a time, in order. The Times parameter, which is shown in the main interface (Figure 5 in Chapter 4.1), is set by the user before the Sequence Training. It is not difficult to see that after 5 Times, every pattern has been involved in all the spheres. Therefore, if the number of Times is equal to the number of patterns, every pattern will be involved in all the spheres once, which is equivalent to parallel training with 1 epoch (Epochs). So, the five Times in Table 2 amount to the same training as one epoch in Table 1.
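The assignment of patterns to spheres in Table 2 can be sketched as follows. This is a hedged Python illustration of the schedule only (the actual weight updates follow the pseudocode below), with patterns and spheres indexed from 0:

def sequence_schedule(n_patterns, n_spheres, times):
    # Yields (time_step, sphere, pattern) triples reproducing Table 2:
    # at time step t, sphere s trains on pattern (t + s) mod n_patterns.
    for t in range(times):
        for s in range(n_spheres):
            yield t, s, (t + s) % n_patterns

# With 5 patterns, 3 spheres and 5 repetitions this reproduces Table 2
# (printed with 1-based labels to match the table).
for t, s, p in sequence_schedule(5, 3, 5):
    print(f"Times {t + 1}: sphere S{s + 1} trains on P{p + 1}")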

Next, the pseudocode shows the Sequence Training algorithm in detail.

Algorithm 2: sequence training

initialize the weights;
initialize freqnew;
get the number of neurons in a single sphere as nsize;   // nsize is used later
for each repetition (Times) idx do
    get the remainder after idx is divided by Psize (the number of patterns), saved as start_idx;
    // idx is the index of the repetition
    // start_idx can be considered the index of the current training pattern
    if start_idx is 0 then
        set start_idx to Psize;
    end if
    for j = 1 to the number of spheres do
        subtract all weights from the start_idx-th pattern;
        calculate the normalization of the distance matrix;
        obtain the minimum distance in the current sphere j;
        calculate the new weight for the winning neuron;
        if a neighborhood matrix is given then
            for k = 1 to r do   // r is the neighborhood radius
                calculate the new weights of the neighbors;
                update the count-dependent parameter for the neighbors;   // including the neighbors in other spheres
            end for
        end if
        update the count-dependent parameter;
        update the training pattern's index (plus 1);
    end for
end for

4.6 Test Suite

There are three evaluation criteria for CSSOM: (1) distortion and color evaluation, (2) purity of clustering, and (3) quantization error and topological error. Distortions and colors are used to analyze the data by visualization; they are adopted in most implementations of spherical SOMs and are described in detail earlier (refer to Chapter 3.2.2). This section concentrates on illustrating the latter two evaluations.

4.6.1 Purity of Clustering

Purity is a measure which evaluates how good the quality of a clustering is. Assume there are k clusters (the k in k-means), the total number of items in cluster C_j is denoted |C_j|, and |C_j^{class=i}| denotes the number of items of class i assigned to cluster j. Then the purity of a cluster can be expressed as:

purity(C_j) = max_i( |C_j^{class=i}| ) / |C_j|   (4.1a)

The overall purity of a clustering solution can be expressed as:

purity = (1/N) Σ_{j=1}^{k} max_i( |C_j^{class=i}| )   (4.1b)

In 4.1b, N is the total number of items. The example below illustrates how to calculate the purity in detail.

Figure 16: The distribution of items in clusters

In Figure 16, the sizes of the majority classes for the 3 clusters are 5 in cluster 1, 3 in cluster 2 and 4 in cluster 3; therefore, the purity is 1/21 * (5 + 3 + 4) ≈ 0.57.

4.6.2 Quantization Error and Topological Error

Quantization error and topological error are two widely used measurements which evaluate the quality of a Self-Organizing Map. First, the quantization error evaluates how well the neural network map fits the input patterns. This error is the average distance between all data vectors and their best matching units (winning neurons). The formula can be expressed as:

QE = (1/n) Σ_{j=1}^{n} ||x_j - m_{x_j}||   (4.2)

In formula 4.2, n is the number of input vectors, x_j is the j-th input vector, and m_{x_j} is the best matching unit of the j-th input vector. However, the quantization error cannot describe the topological order of the map; in other words, it cannot measure the topology preservation. Topology preservation is a measurement of how continuous the mapping from the input space to the map grid is. The topological error is used to evaluate the complexity of the output space. This error measures the proportion of all data vectors for which the first and second best-matching units (BMU) are not adjacent [14]. The formula is expressed as:

TE = (1/n) Σ_{j=1}^{n} u(x_j)   (4.3a)

u(x_j) = 0 if the first and second best matching units of data vector x_j are adjacent; 1 if they are not adjacent   (4.3b)

In 4.3a and 4.3b, x_j is the j-th input vector and n is the number of input vectors. Therefore, a lower topological error means better topology preservation. In contrast, a high topological error indicates that the output space is complex and it is hard to preserve the topology, in which case reducing the size of the network is recommended [3]. That is because as the size of the network is reduced, the possibility that the first-best and second-best matching units are adjacent in the output space increases, so the TE becomes lower.

5. Experiments and Results

5.1 Experiment 1: The quality of SOMs

5.1.1 Description of the experiment

In this experiment, the main task is to analyze and evaluate the quality of different topologies. There are two tables below used to observe the SOMs' quality, each with a different emphasis. The first one compares the quantization errors and topological errors of Kohonen's SOM, Sangole's S-SOM and the concentric SSOM using the IRIS dataset. The second table focuses on comparing S-SOM and CSSOM using the more complex ECSH dataset.

Dataset Description:

IRIS
The dataset consists of 150 patterns, of which 50 are Iris Setosa, 50 are Iris Versicolour and 50 are Iris Virginica. There are four attributes for the input data, which are sepal length, sepal width, petal length and petal width respectively. More detailed descriptions and the data files can be found on the UCI (University of California, Irvine) machine learning data set repository website.

ECSH
The data set is for a participant in a larger study who read paragraphs in Easy, Calm, Stressful, Hard order. There are two sheets, where one sheet has data for a Calm paragraph and the other for a Stressful one; in this experiment, the data is for a Calm paragraph. It reflects 60 Hz recordings and spans one minute. There are 3,641 input vectors, each of which has 7 attributes as listed below:

xgaze - x gaze point on a 1680 (width) x 1050 (height) monitor
ygaze - y gaze point on a 1680 x 1050 monitor
pupilldiam - pupil diameter of the left eye
pupilrdiam - pupil diameter of the right eye
ecg - ECG
gsr - GSR
bp - blood pressure

The table below shows the summary of those datasets.

Dataset | Number of input dimensions | Number of observations | Missing values | Data characteristics
IRIS    | 4                          | 150                    | no             | Multivariate
ECSH    | 7                          | 3,641                  | yes            | Time sequence
Table 3: Summary of datasets

5.1.2 Experiment Process and Discussion of Results

First, we use the IRIS dataset to compare the quantization errors and topological errors of SOM, SSOM and CSSOM. The parameter epoch is set to 44. The parameter neighborhood size (s) is set to 0.5. Those parameters are set according to the nature of the dataset. Unfortunately, inferences in this regard cannot be made based on existing methods for estimating the SOFM parameters [1]; therefore, the values selected are the optimal ones based on repeated experiments. For ease of comparison between these topology methods, a similar number of units in each is selected. The appropriate number is 162; in the CSSOM, 14 spheres of a 12-unit grid are used (14 * 12 = 168, the closest to 162). Every value in Table 4 is the average over 20 repeated runs. The results are shown below:

ERRORS | SOM | SSOM | CSSOM(14S)
QE
TE
Table 4: QE and TE of SOMs using the IRIS dataset

In Table 4, we can see that both the QE and TE of SSOM and CSSOM (14 spheres of a 12-unit grid) are lower than those of the conventional SOM. That is because SSOM and CSSOM have solved the border effect problem, and every unit has much the same chance to update its weights. As a result, the errors are decreased. However, the QE of CSSOM is slightly higher than that of SSOM. This is probably because the complexity of the neighborhood structure is increased when the CSSOM is used. There are not any differences in TE.

For ease of comparison between SSOM and CSSOM, we use more complex data to obtain some more results. The ECSH dataset is used, which has 3,641 patterns. Therefore, the appropriate number of units is 2,562 in SSOM, because it is closest to the number of input patterns. For CSSOM, we select numbers of units which are closest to 2,562. The parameter epoch is set to 20. The parameter neighborhood size (s) is set to 0.5. The results are shown below:

ERROR | SSOM | CSSOM(4S) | CSSOM(15S) | CSSOM(61S) | CSSOM(214S)
QE
TE
Table 5: QE and TE of SSOM and CSSOM using the ECSH dataset

In Table 5, SSOM is a single SOM with a 2,562-unit grid map; CSSOM(4S) represents 4 spheres of a 642-unit grid map (4 * 642 = 2,568); CSSOM(15S) represents 15 spheres of a 162-unit grid map (15 * 162 = 2,430); CSSOM(61S) represents 61 spheres of a 42-unit grid map (61 * 42 = 2,562); CSSOM(214S) represents 214 spheres of a 12-unit grid map (214 * 12 = 2,568).

It is interesting to find that TE in Table 5 is directly related to the average number of neighbors per unit shown in Figure 17. Obviously, CSSOM(61S) has the smallest topological error (TE), since the average number of neighbors per unit in CSSOM(61S) is the largest. In other words, as the average number of neighbors per unit increases, TE decreases. That is because TE measures the proportion of all data vectors for which the first and second best-matching units (BMU) are not adjacent, and the probability of them being adjacent increases as the number of neighbors per unit increases (given that the networks have the same or a similar size). At the same time, Table 5 shows that the QE of CSSOM(15S) is lower than that of SSOM, whereas the rest of the CSSOMs are higher. Figure 18 shows that the number of units with different numbers of neighbors decreases as the number of units in a single layer decreases. Obviously, all 1,267 units in CSSOM(214S) have the same number of neighbors, whereas the units in SSOM have 30 variants with different neighborhoods (see Appendix A). When the number of units with different numbers of neighbors is larger, the uniformity is worse. Therefore, CSSOM(214S) is the most uniform, and the others are less so. The uniformity of the arrangement is one of the factors impacting the QE. As the number of units with different numbers of neighbors becomes larger, more and more units have unequal chances of being updated. Therefore, this influences the formation of the expected similar regions of the data space. At the same time, other factors like the complexity of the units-grid map and the connections between the units have an impact on QE.

Therefore, that might be the reason why CSSOM(15S) has the smallest QE.

Figure 17: The average number of neighborhoods per unit in SSOM and CSSOM

Figure 18: The number of units with different neighborhoods (the details are in Appendix A)

Next, we analyze the quality of SSOM and CSSOM(15S) in the form of visual representations. As mentioned in Chapter 3.2.2, the representation of distortions and colors reflects the magnitude of the similarity measure. The representation in Figure 19 is from the SSOM in Table 5 above, one of the experiments, with the corresponding QE and TE.

Figure 19: Visual view of SSOM from different angles

Compared to SSOM, besides reflecting the connections and similarity within the same sphere, the visual view of CSSOM gives a more visible representation of the connections between the spheres. Figure 20 shows the view of the CSSOM(15S) in Table 5 above, one of the experiments, with the corresponding QE and TE.

Figure 20: Visual view of CSSOM(15S) from sphere 1 to sphere 15

Figure 20 clearly shows that there are more similarities between spheres 4 to

5.2 Experiment 2: Time sequence training

5.2.1 Description of the experiment

Because the standard SOM has no way to represent sequential data, which could be done by multiple layers, we propose sequence training for CSSOM in order to observe the effects and results.

These two points are our expectations: (1) a better TE or QE; (2) normal distributions of colors and distortions in the spheres. Here, a normal distribution means that there are obvious areas with distinct colors and distortions in one sphere, or that the glyph shows consecutive spheres which seem to be similar in the distribution of the colors and distortions. Therefore, the experiment is divided into two sections to evaluate the results and the effects. The dataset selected is ECSH, which is the same as in the previous experiment; Chapter 5.1.1 has the detailed description.

5.2.2 Experiment Process and Discussion of Results

Using the results from the traditional training (called parallel training here), we use one of the experiments above, the CSSOM(15S) with the best results (Table 5) and effects (Figure 20). For ease of comparison with that parallel training, the parameter Epochs is set to 20, Times (repetitions) is set to 3,641, and Size (the neighborhood size parameter) and Spheres are set to 0.5 and 15 respectively. The units-grid map selected is ICOSA_2 (162 units). The table below shows the average TE and QE after equivalent parallel training and sequence training.

ERRORS | Parallel Training | Sequence Training
QE
TE
Table 6: QE and TE of Parallel Training and Sequence Training using the ECSH dataset

In Table 6, it is disappointing to find that neither QE nor TE shows any improvement with the sequence training. Furthermore, the QE in sequence training is about 8 times higher than the QE in parallel training, whereas the TE in sequence training is approximately 3 times higher than the TE in parallel training. Figure 20 above shows one of the CSSOM experiments in parallel training, with its QE and TE, whereas, under the same conditions, Figure 21 shows one of the sequence training experiments, with its QE and TE. In Figure 20, we can find obviously different areas in the same sphere and corresponding similarities among the spheres. However, in Figure 21, there are no obvious similarities among the spheres. Figure 22 clearly shows that both colors and distortions are irregular and inconsistent from sphere 5 to sphere 10.

Figure 21: Visual view of CSSOM in sequence training from sphere 1 to sphere 15

Figure 22: Visual view of CSSOM in sequence training from sphere 5 to sphere 10

We cannot obtain the desired results and effects with sequence training, possibly because of the number of epochs, the size of the units-grid map, or even the sequence training algorithm itself. Next, an experiment is described below to validate the assumption that it is not sequential training that is the problem. In the default parallel training above, the patterns are randomly selected in every epoch. To observe the differences, we modify the sequence of patterns to be in (time) order. All the parameters remain at the same values as those of the parallel training in Table 7. The table of QE/TE averaged over 20 repeated runs is below:

ERRORS | Parallel Training (random order) | Parallel Training (time order)
QE
TE
Table 7: QE and TE of Parallel Training in random order and in time order using the ECSH dataset

The QE of the parallel training in time order (data order) is much higher than that of the parallel training in random order, and there are only slight differences in TE between them. Those results imply that randomly selecting the patterns during training is good for the quality of the SOM. Figure 23 shows an example of parallel training in time order, with its QE and TE.

Figure 23: Visual view of CSSOM in parallel training in time order from sphere 1 to sphere 15

Compared to the training in random order, the training in time order (the dataset's order) has fewer spheres with similarities. Despite this, there are still obviously distinguishable color areas in each sphere. In short, those results suggest that the sequence training as proposed is not suitable for CSSOM. Since the QE for the parallel training in time order is still lower than that of sequence training, this suggests that the sequence training can be improved.

5.3 Experiment 3: The purity of clustering using CSSOM with different numbers of spheres

5.3.1 Description of the experiment

Purity is an evaluation measure of the quality of clustering; Chapter 4.6.1 has the detailed description. In this section, we conduct an experiment to observe the trend in the quality of clustering as the number of spheres increases. The IRIS dataset is used in this experiment: 105 of the 150 patterns are used as the training set (35 are Iris Setosa, 35 are Iris Versicolour and 35 are Iris Virginica), whereas 45 patterns are used for testing. The patterns in the testing set have labels for the output (0 for Setosa, 1 for Versicolour and 2 for Virginica). All the following experiments' k-value is set as

5.3.2 Experiment Process and Discussion of Results

First, we use the default SSOM structures with different sphere sizes (ICOSA_0, ICOSA_1, ICOSA_2, ICOSA_3 and ICOSA_4) to test the purity of the SOMs for the IRIS dataset.

SSOM   | ICOSA_0 | ICOSA_1 | ICOSA_2 | ICOSA_3 | ICOSA_4
Purity | 89.67%  | 94.11%  | 94.35%  | 91.89%  | 92.00%
Table 8: The purity of SSOMs with different sphere sizes

From Table 8, we can see that ICOSA_2 (162 units) is the most appropriate for clustering the IRIS dataset, rather than ICOSA_4. In other words, as the number of units increases, the purity might not continue to increase. Therefore, the following experiment selects CSSOMs with the same or a similar total number of units, close to 162, but with different numbers of spheres. It focuses on comparing the purity of 1-sphere ICOSA_2 (162 units), 4-sphere ICOSA_1 (42 units each, 168 in total), and 14-sphere ICOSA_0 (12 units each, 168 in total). The parameter epoch is set to 20. The parameter neighborhood size (s) is set to 0.5. Table 9 shows the corresponding purity of clustering averaged over 20 runs.

CSSOM  | ICOSA_2 (1S) | ICOSA_1 (4S) | ICOSA_0 (14S)
Purity | 94.35%       | 92.52%       | 92.12%
Table 9: The purity of CSSOMs with different numbers of spheres

In Table 9, it is easy to see that the quality of clustering becomes worse as the number of spheres in the CSSOM increases. The reason might be that as the number of spheres in the CSSOM increases, the distribution of the clusters becomes more and more complicated.

6. Conclusion and Future Work

6.1 Conclusion

In this report, we propose a new approach to the arrangement of neurons in spherical SOMs, namely the CSSOM with an arbitrary number of spheres. Users can select an arbitrary number of spheres according to the objectives of various experiments. First, the CSSOM has better quality than the conventional SOM, since it has lower QE and TE. Second, the QE and TE of SSOM and of CSSOMs with different numbers of spheres (but with approximately the same total number of units) are different. If better topology preservation and greater uniformity of the units' neighborhoods are desired, a CSSOM with fewer units per sphere is recommended. However, the input patterns might then not fit the neural map well (high QE). Therefore, users should balance these considerations according to their demands. Finally, the results of the experiments show that the sequence training we propose at the present stage for CSSOM needs improvement.

6.2 Future Work

In this project, we focused on implementation rather than on experiments, so the datasets used are not diverse enough. More experiments with diverse datasets should be conducted in the future. Moreover, the corresponding points X (the Cartesian coordinates of the points on the sphere) in every sphere are the same in the initialization phase. This is not reasonable, because it might have some influence on QE. Regarding the initialization of X, a small-scale adjustment according to the length of R (the neighborhood radius) is required. It is also future work to make r (the neighborhood radius parameter) and the number of epochs automatically adjustable depending on the dataset's characteristics, in order to achieve optimum performance. Finally, although the experiments show that the sequence training we proposed for time series data in CSSOM does not work well, improvements are possible. For example, for each epoch, the first pattern could be selected randomly for training, but the patterns' order (the time series) should still be kept sequential. Future work should also involve experiments with many datasets to provide a rule of thumb for how to choose the sphere size and number of units. For example, it may be better to find an appropriately good sphere size first, and then try multiple spheres.


More information

A Population Based Convergence Criterion for Self-Organizing Maps

A Population Based Convergence Criterion for Self-Organizing Maps A Population Based Convergence Criterion for Self-Organizing Maps Lutz Hamel and Benjamin Ott Department of Computer Science and Statistics, University of Rhode Island, Kingston, RI 02881, USA. Email:

More information

SOM+EOF for Finding Missing Values

SOM+EOF for Finding Missing Values SOM+EOF for Finding Missing Values Antti Sorjamaa 1, Paul Merlin 2, Bertrand Maillet 2 and Amaury Lendasse 1 1- Helsinki University of Technology - CIS P.O. Box 5400, 02015 HUT - Finland 2- Variances and

More information

11/14/2010 Intelligent Systems and Soft Computing 1

11/14/2010 Intelligent Systems and Soft Computing 1 Lecture 8 Artificial neural networks: Unsupervised learning Introduction Hebbian learning Generalised Hebbian learning algorithm Competitive learning Self-organising computational map: Kohonen network

More information

Simulation of WSN in NetSim Clustering using Self-Organizing Map Neural Network

Simulation of WSN in NetSim Clustering using Self-Organizing Map Neural Network Simulation of WSN in NetSim Clustering using Self-Organizing Map Neural Network Software Recommended: NetSim Standard v11.0, Visual Studio 2015/2017, MATLAB 2016a Project Download Link: https://github.com/netsim-tetcos/wsn_som_optimization_v11.0/archive/master.zip

More information

Automatic Group-Outlier Detection

Automatic Group-Outlier Detection Automatic Group-Outlier Detection Amine Chaibi and Mustapha Lebbah and Hanane Azzag LIPN-UMR 7030 Université Paris 13 - CNRS 99, av. J-B Clément - F-93430 Villetaneuse {firstname.secondname}@lipn.univ-paris13.fr

More information

Simulation of WSN in NetSim Clustering using Self-Organizing Map Neural Network

Simulation of WSN in NetSim Clustering using Self-Organizing Map Neural Network Simulation of WSN in NetSim Clustering using Self-Organizing Map Neural Network Software Recommended: NetSim Standard v11.1 (32/64bit), Visual Studio 2015/2017, MATLAB (32/64 bit) Project Download Link:

More information

Cluster analysis of 3D seismic data for oil and gas exploration

Cluster analysis of 3D seismic data for oil and gas exploration Data Mining VII: Data, Text and Web Mining and their Business Applications 63 Cluster analysis of 3D seismic data for oil and gas exploration D. R. S. Moraes, R. P. Espíndola, A. G. Evsukoff & N. F. F.

More information

Supervised vs.unsupervised Learning

Supervised vs.unsupervised Learning Supervised vs.unsupervised Learning In supervised learning we train algorithms with predefined concepts and functions based on labeled data D = { ( x, y ) x X, y {yes,no}. In unsupervised learning we are

More information

Selective Space Structures Manual

Selective Space Structures Manual Selective Space Structures Manual February 2017 CONTENTS 1 Contents 1 Overview and Concept 4 1.1 General Concept........................... 4 1.2 Modules................................ 6 2 The 3S Generator

More information

A SOM-view of oilfield data: A novel vector field visualization for Self-Organizing Maps and its applications in the petroleum industry

A SOM-view of oilfield data: A novel vector field visualization for Self-Organizing Maps and its applications in the petroleum industry A SOM-view of oilfield data: A novel vector field visualization for Self-Organizing Maps and its applications in the petroleum industry Georg Pölzlbauer, Andreas Rauber (Department of Software Technology

More information

A vector field visualization technique for Self-Organizing Maps

A vector field visualization technique for Self-Organizing Maps A vector field visualization technique for Self-Organizing Maps Georg Pölzlbauer 1, Andreas Rauber 1, and Michael Dittenbach 2 1 Department of Software Technology Vienna University of Technology Favoritenstr.

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting

More information

A Study on Clustering Method by Self-Organizing Map and Information Criteria

A Study on Clustering Method by Self-Organizing Map and Information Criteria A Study on Clustering Method by Self-Organizing Map and Information Criteria Satoru Kato, Tadashi Horiuchi,andYoshioItoh Matsue College of Technology, 4-4 Nishi-ikuma, Matsue, Shimane 90-88, JAPAN, kato@matsue-ct.ac.jp

More information

Visual object classification by sparse convolutional neural networks

Visual object classification by sparse convolutional neural networks Visual object classification by sparse convolutional neural networks Alexander Gepperth 1 1- Ruhr-Universität Bochum - Institute for Neural Dynamics Universitätsstraße 150, 44801 Bochum - Germany Abstract.

More information

Neuro-Fuzzy Comp. Ch. 8 May 12, 2005

Neuro-Fuzzy Comp. Ch. 8 May 12, 2005 Neuro-Fuzzy Comp. Ch. 8 May, 8 Self-Organizing Feature Maps Self-Organizing Feature Maps (SOFM or SOM) also known as Kohonen maps or topographic maps were first introduced by von der Malsburg (97) and

More information

A Comparative Study of Conventional and Neural Network Classification of Multispectral Data

A Comparative Study of Conventional and Neural Network Classification of Multispectral Data A Comparative Study of Conventional and Neural Network Classification of Multispectral Data B.Solaiman & M.C.Mouchot Ecole Nationale Supérieure des Télécommunications de Bretagne B.P. 832, 29285 BREST

More information

Further Applications of a Particle Visualization Framework

Further Applications of a Particle Visualization Framework Further Applications of a Particle Visualization Framework Ke Yin, Ian Davidson Department of Computer Science SUNY-Albany 1400 Washington Ave. Albany, NY, USA, 12222. Abstract. Our previous work introduced

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,

More information

Fuzzy Signature Neural Network

Fuzzy Signature Neural Network Fuzzy Signature Neural Network Kun He 1 st June 2012 A report submitted for the degree of Master of Computing of Australian National University Supervisor: Prof. Tom Gedeon Acknowledgements

More information

Multi-Clustering Centers Approach to Enhancing the Performance of SOM Clustering Ability

Multi-Clustering Centers Approach to Enhancing the Performance of SOM Clustering Ability JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 25, 1087-1102 (2009) Multi-Clustering Centers Approach to Enhancing the Performance of SOM Clustering Ability CHING-HWANG WANG AND CHIH-HAN KAO * Department

More information

COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS

COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS Toomas Kirt Supervisor: Leo Võhandu Tallinn Technical University Toomas.Kirt@mail.ee Abstract: Key words: For the visualisation

More information

Trajectory Analysis on Spherical Self-Organizing Maps with Application to Gesture Recognition

Trajectory Analysis on Spherical Self-Organizing Maps with Application to Gesture Recognition Trajectory Analysis on Spherical Self-Organizing Maps with Application to Gesture Recognition Artur Oliva Gonsales and Matthew Kyan Ryerson University 350 Victoria Street, Toronto, Ontario, Canada, L5J

More information

Associative Cellular Learning Automata and its Applications

Associative Cellular Learning Automata and its Applications Associative Cellular Learning Automata and its Applications Meysam Ahangaran and Nasrin Taghizadeh and Hamid Beigy Department of Computer Engineering, Sharif University of Technology, Tehran, Iran ahangaran@iust.ac.ir,

More information

Cluster Analysis for Microarray Data

Cluster Analysis for Microarray Data Cluster Analysis for Microarray Data Seventh International Long Oligonucleotide Microarray Workshop Tucson, Arizona January 7-12, 2007 Dan Nettleton IOWA STATE UNIVERSITY 1 Clustering Group objects that

More information

Hsiaochun Hsu Date: 12/12/15. Support Vector Machine With Data Reduction

Hsiaochun Hsu Date: 12/12/15. Support Vector Machine With Data Reduction Support Vector Machine With Data Reduction 1 Table of Contents Summary... 3 1. Introduction of Support Vector Machines... 3 1.1 Brief Introduction of Support Vector Machines... 3 1.2 SVM Simple Experiment...

More information

Clustering with Reinforcement Learning

Clustering with Reinforcement Learning Clustering with Reinforcement Learning Wesam Barbakh and Colin Fyfe, The University of Paisley, Scotland. email:wesam.barbakh,colin.fyfe@paisley.ac.uk Abstract We show how a previously derived method of

More information

Using Decision Boundary to Analyze Classifiers

Using Decision Boundary to Analyze Classifiers Using Decision Boundary to Analyze Classifiers Zhiyong Yan Congfu Xu College of Computer Science, Zhejiang University, Hangzhou, China yanzhiyong@zju.edu.cn Abstract In this paper we propose to use decision

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information

Machine Learning : Clustering, Self-Organizing Maps

Machine Learning : Clustering, Self-Organizing Maps Machine Learning Clustering, Self-Organizing Maps 12/12/2013 Machine Learning : Clustering, Self-Organizing Maps Clustering The task: partition a set of objects into meaningful subsets (clusters). The

More information

K-Means Clustering Using Localized Histogram Analysis

K-Means Clustering Using Localized Histogram Analysis K-Means Clustering Using Localized Histogram Analysis Michael Bryson University of South Carolina, Department of Computer Science Columbia, SC brysonm@cse.sc.edu Abstract. The first step required for many

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

University of Florida CISE department Gator Engineering. Clustering Part 5

University of Florida CISE department Gator Engineering. Clustering Part 5 Clustering Part 5 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville SNN Approach to Clustering Ordinary distance measures have problems Euclidean

More information

Chapter 4. Clustering Core Atoms by Location

Chapter 4. Clustering Core Atoms by Location Chapter 4. Clustering Core Atoms by Location In this chapter, a process for sampling core atoms in space is developed, so that the analytic techniques in section 3C can be applied to local collections

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Applying the Possibilistic C-Means Algorithm in Kernel-Induced Spaces

Applying the Possibilistic C-Means Algorithm in Kernel-Induced Spaces 1 Applying the Possibilistic C-Means Algorithm in Kernel-Induced Spaces Maurizio Filippone, Francesco Masulli, and Stefano Rovetta M. Filippone is with the Department of Computer Science of the University

More information

COMPUTATIONAL INTELLIGENCE

COMPUTATIONAL INTELLIGENCE COMPUTATIONAL INTELLIGENCE Radial Basis Function Networks Adrian Horzyk Preface Radial Basis Function Networks (RBFN) are a kind of artificial neural networks that use radial basis functions (RBF) as activation

More information

Cluster Analysis and Visualization. Workshop on Statistics and Machine Learning 2004/2/6

Cluster Analysis and Visualization. Workshop on Statistics and Machine Learning 2004/2/6 Cluster Analysis and Visualization Workshop on Statistics and Machine Learning 2004/2/6 Outlines Introduction Stages in Clustering Clustering Analysis and Visualization One/two-dimensional Data Histogram,

More information

Topology Correction for Brain Atlas Segmentation using a Multiscale Algorithm

Topology Correction for Brain Atlas Segmentation using a Multiscale Algorithm Topology Correction for Brain Atlas Segmentation using a Multiscale Algorithm Lin Chen and Gudrun Wagenknecht Central Institute for Electronics, Research Center Jülich, Jülich, Germany Email: l.chen@fz-juelich.de

More information

A NEW ALGORITHM FOR OPTIMIZING THE SELF- ORGANIZING MAP

A NEW ALGORITHM FOR OPTIMIZING THE SELF- ORGANIZING MAP A NEW ALGORITHM FOR OPTIMIZING THE SELF- ORGANIZING MAP BEN-HDECH Adil, GHANOU Youssef, EL QADI Abderrahim Team TIM, High School of Technology, Moulay Ismail University, Meknes, Morocco E-mail: adilbenhdech@gmail.com,

More information

Rough Set Approach to Unsupervised Neural Network based Pattern Classifier

Rough Set Approach to Unsupervised Neural Network based Pattern Classifier Rough Set Approach to Unsupervised Neural based Pattern Classifier Ashwin Kothari, Member IAENG, Avinash Keskar, Shreesha Srinath, and Rakesh Chalsani Abstract Early Convergence, input feature space with

More information

A Simple Automated Void Defect Detection for Poor Contrast X-ray Images of BGA

A Simple Automated Void Defect Detection for Poor Contrast X-ray Images of BGA Proceedings of the 3rd International Conference on Industrial Application Engineering 2015 A Simple Automated Void Defect Detection for Poor Contrast X-ray Images of BGA Somchai Nuanprasert a,*, Sueki

More information

Applying Kohonen Network in Organising Unstructured Data for Talus Bone

Applying Kohonen Network in Organising Unstructured Data for Talus Bone 212 Third International Conference on Theoretical and Mathematical Foundations of Computer Science Lecture Notes in Information Technology, Vol.38 Applying Kohonen Network in Organising Unstructured Data

More information

Clustering analysis of gene expression data

Clustering analysis of gene expression data Clustering analysis of gene expression data Chapter 11 in Jonathan Pevsner, Bioinformatics and Functional Genomics, 3 rd edition (Chapter 9 in 2 nd edition) Human T cell expression data The matrix contains

More information

Flexible Lag Definition for Experimental Variogram Calculation

Flexible Lag Definition for Experimental Variogram Calculation Flexible Lag Definition for Experimental Variogram Calculation Yupeng Li and Miguel Cuba The inference of the experimental variogram in geostatistics commonly relies on the method of moments approach.

More information

CS6716 Pattern Recognition

CS6716 Pattern Recognition CS6716 Pattern Recognition Prototype Methods Aaron Bobick School of Interactive Computing Administrivia Problem 2b was extended to March 25. Done? PS3 will be out this real soon (tonight) due April 10.

More information

Lorentzian Distance Classifier for Multiple Features

Lorentzian Distance Classifier for Multiple Features Yerzhan Kerimbekov 1 and Hasan Şakir Bilge 2 1 Department of Computer Engineering, Ahmet Yesevi University, Ankara, Turkey 2 Department of Electrical-Electronics Engineering, Gazi University, Ankara, Turkey

More information

Relation Organization of SOM Initial Map by Improved Node Exchange

Relation Organization of SOM Initial Map by Improved Node Exchange JOURNAL OF COMPUTERS, VOL. 3, NO. 9, SEPTEMBER 2008 77 Relation Organization of SOM Initial Map by Improved Node Echange MIYOSHI Tsutomu Department of Information and Electronics, Tottori University, Tottori,

More information

CHAPTER 3 RESEARCH METHODOLOGY

CHAPTER 3 RESEARCH METHODOLOGY CHAPTER 3 RESEARCH METHODOLOGY 3.1 Introduction This chapter discusses the methodology that is used in this study. The first section describes the steps involve, follows by dataset representation. The

More information

Texture Mapping using Surface Flattening via Multi-Dimensional Scaling

Texture Mapping using Surface Flattening via Multi-Dimensional Scaling Texture Mapping using Surface Flattening via Multi-Dimensional Scaling Gil Zigelman Ron Kimmel Department of Computer Science, Technion, Haifa 32000, Israel and Nahum Kiryati Department of Electrical Engineering

More information

An Experiment in Visual Clustering Using Star Glyph Displays

An Experiment in Visual Clustering Using Star Glyph Displays An Experiment in Visual Clustering Using Star Glyph Displays by Hanna Kazhamiaka A Research Paper presented to the University of Waterloo in partial fulfillment of the requirements for the degree of Master

More information

Chapter 2 Basic Structure of High-Dimensional Spaces

Chapter 2 Basic Structure of High-Dimensional Spaces Chapter 2 Basic Structure of High-Dimensional Spaces Data is naturally represented geometrically by associating each record with a point in the space spanned by the attributes. This idea, although simple,

More information

Cluster Analysis. Angela Montanari and Laura Anderlucci

Cluster Analysis. Angela Montanari and Laura Anderlucci Cluster Analysis Angela Montanari and Laura Anderlucci 1 Introduction Clustering a set of n objects into k groups is usually moved by the aim of identifying internally homogenous groups according to a

More information

Influence of Neighbor Size for Initial Node Exchange of SOM Learning

Influence of Neighbor Size for Initial Node Exchange of SOM Learning FR-E3-3 SCIS&ISIS2006 @ Tokyo, Japan (September 20-24, 2006) Influence of Neighbor Size for Initial Node Exchange of SOM Learning MIYOSHI Tsutomu Department of Information and Knowledge Engineering, Tottori

More information

Self-Organizing Map. presentation by Andreas Töscher. 19. May 2008

Self-Organizing Map. presentation by Andreas Töscher. 19. May 2008 19. May 2008 1 Introduction 2 3 4 5 6 (SOM) aka Kohonen Network introduced by Teuvo Kohonen implements a discrete nonlinear mapping unsupervised learning Structure of a SOM Learning Rule Introduction

More information

Centroid Neural Network based clustering technique using competetive learning

Centroid Neural Network based clustering technique using competetive learning Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 Centroid Neural Network based clustering technique using competetive learning

More information

Topological Correlation

Topological Correlation Topological Correlation K.A.J. Doherty, R.G. Adams and and N. Davey University of Hertfordshire, Department of Computer Science College Lane, Hatfield, Hertfordshire, UK Abstract. Quantifying the success

More information

Lecture 10: Semantic Segmentation and Clustering

Lecture 10: Semantic Segmentation and Clustering Lecture 10: Semantic Segmentation and Clustering Vineet Kosaraju, Davy Ragland, Adrien Truong, Effie Nehoran, Maneekwan Toyungyernsub Department of Computer Science Stanford University Stanford, CA 94305

More information

Using Genetic Algorithms to Solve the Box Stacking Problem

Using Genetic Algorithms to Solve the Box Stacking Problem Using Genetic Algorithms to Solve the Box Stacking Problem Jenniffer Estrada, Kris Lee, Ryan Edgar October 7th, 2010 Abstract The box stacking or strip stacking problem is exceedingly difficult to solve

More information

CS 584 Data Mining. Classification 1

CS 584 Data Mining. Classification 1 CS 584 Data Mining Classification 1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes, one of the attributes is the class. Find a model for

More information

Enhanced Data Topology Preservation with Multilevel Interior Growing Self-Organizing Maps

Enhanced Data Topology Preservation with Multilevel Interior Growing Self-Organizing Maps : Enhanced Data Topology Preservation with Multilevel Interior Growing elf-organizing Maps Thouraya Ayadi, Member, IEEE, Tarek M. Hamdani, Member, IEEE and Adel M. Alimi, enior Member, IEEE Abstract This

More information

FUZZY KERNEL K-MEDOIDS ALGORITHM FOR MULTICLASS MULTIDIMENSIONAL DATA CLASSIFICATION

FUZZY KERNEL K-MEDOIDS ALGORITHM FOR MULTICLASS MULTIDIMENSIONAL DATA CLASSIFICATION FUZZY KERNEL K-MEDOIDS ALGORITHM FOR MULTICLASS MULTIDIMENSIONAL DATA CLASSIFICATION 1 ZUHERMAN RUSTAM, 2 AINI SURI TALITA 1 Senior Lecturer, Department of Mathematics, Faculty of Mathematics and Natural

More information

CS 664 Slides #11 Image Segmentation. Prof. Dan Huttenlocher Fall 2003

CS 664 Slides #11 Image Segmentation. Prof. Dan Huttenlocher Fall 2003 CS 664 Slides #11 Image Segmentation Prof. Dan Huttenlocher Fall 2003 Image Segmentation Find regions of image that are coherent Dual of edge detection Regions vs. boundaries Related to clustering problems

More information

CHAPTER 4 VORONOI DIAGRAM BASED CLUSTERING ALGORITHMS

CHAPTER 4 VORONOI DIAGRAM BASED CLUSTERING ALGORITHMS CHAPTER 4 VORONOI DIAGRAM BASED CLUSTERING ALGORITHMS 4.1 Introduction Although MST-based clustering methods are effective for complex data, they require quadratic computational time which is high for

More information

What is a receptive field? Why a sensory neuron has such particular RF How a RF was developed?

What is a receptive field? Why a sensory neuron has such particular RF How a RF was developed? What is a receptive field? Why a sensory neuron has such particular RF How a RF was developed? x 1 x 2 x 3 y f w 1 w 2 w 3 T x y = f (wx i i T ) i y x 1 x 2 x 3 = = E (y y) (y f( wx T)) 2 2 o o i i i

More information

Machine Learning. Unsupervised Learning. Manfred Huber

Machine Learning. Unsupervised Learning. Manfred Huber Machine Learning Unsupervised Learning Manfred Huber 2015 1 Unsupervised Learning In supervised learning the training data provides desired target output for learning In unsupervised learning the training

More information

DS504/CS586: Big Data Analytics Big Data Clustering II

DS504/CS586: Big Data Analytics Big Data Clustering II Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: AK 232 Fall 2016 More Discussions, Limitations v Center based clustering K-means BFR algorithm

More information

Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions

Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions Thomas Giraud Simon Chabot October 12, 2013 Contents 1 Discriminant analysis 3 1.1 Main idea................................

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

Fuzzy-Kernel Learning Vector Quantization

Fuzzy-Kernel Learning Vector Quantization Fuzzy-Kernel Learning Vector Quantization Daoqiang Zhang 1, Songcan Chen 1 and Zhi-Hua Zhou 2 1 Department of Computer Science and Engineering Nanjing University of Aeronautics and Astronautics Nanjing

More information

Radial Basis Function (RBF) Neural Networks Based on the Triple Modular Redundancy Technology (TMR)

Radial Basis Function (RBF) Neural Networks Based on the Triple Modular Redundancy Technology (TMR) Radial Basis Function (RBF) Neural Networks Based on the Triple Modular Redundancy Technology (TMR) Yaobin Qin qinxx143@umn.edu Supervisor: Pro.lilja Department of Electrical and Computer Engineering Abstract

More information