COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS

Toomas Kirt
Supervisor: Leo Võhandu
Tallinn Technical University
Toomas.Kirt@mail.ee

Abstract: For the visualisation of multidimensional financial data sets we have used the Self-Organising Maps (SOM) of T. Kohonen [6,8]. SOM is a very useful neural computing method for analysing and visualising multidimensional data. To achieve better computational speed, the dimensionality of the original data can be reduced during the pre-processing stage. One of the best-known methods for this is Principal Component Analysis (PCA). In our research project we have used an alternative, effective method for dimensionality reduction, the Peeling method (Võhandu and Krusberg, 1977). The main difference between the two methods is that PCA produces principal components that are optimised linear combinations of all the original variables (so no variables are actually discarded), whereas the Peeling method selects the most important original variables, those that describe the correlation structure of the system in the best possible way using only some of the variables. To compare the dimensionality reduction methods, we have used financial data of Estonian banks. The Peeling method allows us to discard almost half of the original variables and still obtain practically the same SOM results as with the full data set.

Key words: Self-Organising Maps, Neural Networks, Dimensionality Reduction, Data Mining, Peeling Method
1. INTRODUCTION

The Self-Organising Map (SOM) is a very useful neural computing method for analysing and visualising multidimensional data. SOM can translate multidimensional financial data into simple two-dimensional maps. SOM groups similar input data vectors, which are near each other in the input space, onto nearby map units, so the SOM can be used as a clustering tool. To achieve better computational speed, it is possible to reduce the dimensionality of the original data during pre-processing. One well-known method for dimensionality reduction is Principal Component Analysis (PCA). In this paper we also use another dimensionality reduction technique, the Peeling method. Our goal is to compare the two dimensionality reduction methods. To do so, we apply the SOM to the financial data of Estonian banks and compare the maps created from the original data with those created from the data with reduced dimensionality.

This paper is divided into four parts. In the first part we give a short overview of the SOM algorithm and its visualisation methods. In the second part we give an overview of the main properties of PCA. In the third part we introduce the Peeling method, and in the fourth part we apply the described methods to the financial data of Estonian banks and compare the results.

2. SELF-ORGANISING MAPS (SOM)

A self-organising map is a feedforward neural network that uses an unsupervised training algorithm; through a process called self-organisation, it configures the output units into a topological representation of the original data [2,5]. The SOM belongs to a general class of neural network methods: non-linear regression techniques that can be trained to learn relationships between inputs and outputs, or to organise data so as to disclose previously unknown patterns or structures [1]. The algorithm is based on unsupervised, competitive learning.
The algorithm provides a topology-preserving mapping from a high-dimensional space to the map units. The map units, or neurons, usually form a two-dimensional grid, so the mapping is a mapping from a high-dimensional space onto a plane. Topology preservation means that the SOM groups similar input data vectors onto neurons: points that are near each other in the input space are mapped to nearby map units in the SOM. The SOM can thus serve as a clustering tool as well as a tool for visualising high-dimensional data.
The process of creating a self-organising map requires two layers of processing units. The first is an input layer containing a processing unit for each element of the input vector; the second is an output layer, a grid of processing units fully connected with those of the input layer. The size of the output layer is defined by the user, depending on how the map will be used.

The learning process goes as follows. First the output grid is initialised, for instance with random values drawn from the input space. A sample is then taken from the input data and presented to the output grid of the map. All the neurons in the output layer compete to become the winner: the winner is the output node that is closest to the sample vector in Euclidean distance. The weights of the winning neuron are moved in the direction of the input sample, and the weights of the neurons in the neighbourhood of the winner are changed as well. During learning, the learning rate decreases and the neighbourhood around the winning neuron shrinks; at the end of training only the winning unit is adjusted. As a result of the self-organising process, similar input data vectors are mapped to nearby map units in the SOM.

The Unified distance matrix (U-matrix) method is usually used to visualise the structure of the input space of a self-organising feature map. The U-matrix gives an impression of otherwise invisible structures in a multidimensional data space and allows classifying data sets into groups of similar data points (self-organised classification, or clustering). One of the simplest U-matrix methods is to sum up, for each unit, the distances between its weight vector and the weight vectors of the adjacent neurons on the feature map.
A U-matrix gives a picture of the topology of the unit layer, and therefore also of the topology of the input space: altitude in the U-matrix encodes dissimilarity in the input space, and valleys in the U-matrix (i.e. low altitudes) correspond to input vectors that are similar [10]. The clusters in a multidimensional data set can thus be identified by grouping together all the points that fall into the same valley of a U-matrix. Furthermore, the height of the walls or hills on a U-matrix gives a hint of how much the classes differ from each other. Finally, the properties of Self-Organising Maps ensure that similar groups are situated nearby in a U-matrix.
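The training loop and the simple U-matrix computation described in this section can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the grid size, epoch count and linear decay schedules are our own illustrative choices.

```python
import numpy as np

def train_som(data, grid_h=6, grid_w=6, n_epochs=50, lr0=0.5, sigma0=3.0):
    """Minimal SOM training loop (illustrative parameters)."""
    rng = np.random.default_rng(0)
    n_units = grid_h * grid_w
    # Initialise the output grid with random samples from the input space
    weights = data[rng.integers(0, len(data), n_units)].astype(float).copy()
    # Grid coordinates of every unit, used by the neighbourhood function
    coords = np.array([(i, j) for i in range(grid_h) for j in range(grid_w)],
                      dtype=float)
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = lr0 * (1.0 - frac)               # learning rate decays over time
        sigma = sigma0 * (1.0 - frac) + 0.5   # neighbourhood radius shrinks too
        for x in data[rng.permutation(len(data))]:
            # Winner: the unit closest to the sample in Euclidean distance
            winner = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighbourhood centred on the winner's grid position
            d2 = ((coords - coords[winner]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            # Move every unit towards the sample, scaled by the neighbourhood
            weights += lr * h[:, None] * (x - weights)
    return weights

def u_matrix(weights, grid_h, grid_w):
    """Simplest U-matrix: for each unit, sum the distances between its
    weight vector and those of its adjacent grid neighbours."""
    w = weights.reshape(grid_h, grid_w, -1)
    u = np.zeros((grid_h, grid_w))
    for i in range(grid_h):
        for j in range(grid_w):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < grid_h and 0 <= nj < grid_w:
                    u[i, j] += np.linalg.norm(w[i, j] - w[ni, nj])
    return u
```

High values in the returned matrix correspond to the "walls" between clusters, low values to the "valleys" of similar input vectors.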
3. PRINCIPAL COMPONENT ANALYSIS

Principal Component Analysis (PCA) is a technique commonly used for data reduction in statistical pattern recognition and signal processing. It is also known as the Karhunen-Loève Transform [3,9]. In PCA each component of the projected vector is a linear combination of the components of the original data item. The projection is formed by multiplying each component by a certain fixed scalar coefficient and adding the results together. Mathematical methods exist for finding the optimal coefficients, such that the variance of the data after the projection stays closest to the variance of the original data.

Let X ∈ R^N be a random N-dimensional vector representing the environment of interest. Our goal is to generate features that are mutually uncorrelated, that is, E[y(i)y(j)] = 0 for i ≠ j. Let Y = A^T X. From the definition of the correlation matrix we have

  R_Y = E[Y Y^T] = E[A^T X X^T A] = A^T R_X A.    (3.1)

However, R_X is a symmetric matrix, and hence its eigenvectors are mutually orthogonal. Thus, if the matrix A is chosen so that its columns are the orthonormal eigenvectors a_1, a_2, ..., a_N of R_X, then R_Y is diagonal:

  R_Y = A^T R_X A = Λ,    (3.2)

where Λ is the diagonal matrix having on its diagonal the respective eigenvalues λ_1, λ_2, ..., λ_N of R_X. Let the eigenvalues be arranged in decreasing order, λ_1 > λ_2 > ... > λ_N, so that λ_1 = λ_max.

An important property of PCA is mean square error approximation. If in

  x̂ = Σ_{i=1}^{m} y(i) a_i    (3.3)

we choose the eigenvectors corresponding to the m (m < N) largest eigenvalues of the correlation matrix, then the MSE is minimised, being the sum of the N−m smallest eigenvalues.

Another property of PCA is the total variance property. Let E[X] be zero and Y be the PCA-transformed vector of X. Then the eigenvalues of the input correlation matrix are equal to the variances of the transformed features:

  σ_Y^2(i) = E[y(i)^2] = λ_i.    (3.4)

Thus, selecting the features y(i) = a_i^T x that correspond to the m largest eigenvalues makes their total variance Σ λ_i maximal. These properties allow choosing m principal components that retain most of the total variance associated with the original random variables [7].
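The two properties above suggest a straightforward implementation: diagonalise the correlation matrix and keep the eigenvectors with the largest eigenvalues. A minimal sketch follows; the function name is our own, and we use the covariance matrix of the centred data, which coincides with R_X under the paper's assumption E[X] = 0.

```python
import numpy as np

def pca_reduce(X, m):
    """Project the data onto the m eigenvectors of the covariance matrix
    of the centred data that have the largest eigenvalues."""
    Xc = X - X.mean(axis=0)                  # enforce E[X] = 0
    R = np.cov(Xc, rowvar=False)             # correlation matrix R_X
    eigvals, eigvecs = np.linalg.eigh(R)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort in decreasing order
    A = eigvecs[:, order[:m]]                # columns a_1, ..., a_m
    # Fraction of the total variance retained by the m components
    explained = eigvals[order[:m]].sum() / eigvals.sum()
    return Xc @ A, explained                 # Y = A^T X for every sample
```

With m equal to the full dimension the explained fraction is 1; in the case study below, nine components sufficed to retain over 95% of the variation.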
4. PEELING METHOD

The Peeling method [11] selects the most important variables, those that describe the correlation structure of the system in the best possible way using only some of the variables. The algorithm works as follows:

1. For a correlation matrix

        | 1     r12   ...  r1m |
    R = | r21   1     ...  r2m |    (4.1)
        | ...   ...   ...  ... |
        | rm1   rm2   ...  1   |

   calculate for every column the score

    S_j = Σ_{i=1}^{m} r_ij^2 / r_jj.    (4.2)

2. Find

    S^(k) = max_j S_j.    (4.3)

   The maximising column identifies the variable that is, on average, the most important in the system; the superscript shows the number of the iteration (k = 1, ..., r).

3. Divide the correlation coefficients of the maximal column by the square root of the corresponding diagonal element r_jj of the matrix R. The transformed column vector b_1 is the first vector of the new factor matrix B.

4. Find the residual matrix

    R^(1) = R − b_1 b_1'.    (4.4)

5. Repeat the process r times, where r ≤ m is the rank of R.

According to the elimination order, we take the first r variables and use them in the subsequent analysis.

5. CASE STUDY: SOM OF ESTONIAN BANKS

To compare the two methods we have used financial reports of Estonian banks: 92 reports from the period 1997-1998, each consisting of 16 variables. This is not a very large amount of data, but our goal is simply to compare the visual results achieved by the different methods. Every node on the map represents one or more bank reports. First we created a U-matrix of the original data; the result is given in Figure 1.
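The iteration of Section 4 operates directly on the correlation matrix. A minimal sketch follows; the function name and the use of a fixed count r as the stopping rule are our own choices.

```python
import numpy as np

def peeling_order(R, r):
    """Return the indices of the first r variables chosen by the Peeling
    method from an m-by-m correlation matrix R."""
    R = np.array(R, dtype=float)
    selected = []
    for _ in range(r):
        diag = np.diag(R)
        valid = diag > 1e-10                 # skip already-peeled columns
        # Column scores S_j = sum_i r_ij^2 / r_jj  (eq. 4.2)
        scores = np.where(valid,
                          (R ** 2).sum(axis=0) / np.where(valid, diag, 1.0),
                          -np.inf)
        j = int(np.argmax(scores))           # eq. 4.3: most important variable
        selected.append(j)
        b = R[:, j] / np.sqrt(R[j, j])       # transformed column vector b_k
        R = R - np.outer(b, b)               # residual matrix, eq. 4.4
    return selected
```

Applied to the correlation matrix of the bank reports, the returned indices are the original variables kept for the SOM; the rest are discarded.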
Figure 1. SOM of Estonian banks, 16 variables

Secondly, we applied the PCA method to the original data and selected nine linear combinations that describe more than 95% of the variation. After creating a SOM from the reduced data we got the result shown in Figure 2.

Figure 2. SOM of Estonian banks, PCA, 9 variables

In the third attempt we used the Peeling method. We eliminated six variables and kept ten original variables. The result can be seen in Figure 3.
Figure 3. SOM of Estonian banks, Peeling method, 10 variables

As we can see, the structures of the three maps are similar. This means that we can get practically the same results without using all the data. Calculation of the SOM took 17, 10 and 11 seconds respectively, i.e. the process was approximately 41% and 35% faster when we used 9 and 10 variables instead of 16.

6. CONCLUSION

We have introduced and compared two possible methods for dimensionality reduction. As the maps show, there are only small differences between the maps made from the data with reduced dimensionality and those made from the original data. Despite these results, we should take into account that calculating a correlation matrix and its eigenvalues is a computationally expensive activity. Therefore, in further research we would like to turn our attention to the random mapping method suggested by Sami Kaski [4].

REFERENCES

[1] Deboeck G, Kohonen T, Visual Explorations in Finance with Self-Organizing Maps, Springer, Berlin, 1998
[2] Haykin S, Neural Networks, Prentice Hall, New Jersey, 1999
[3] Jobson J.D, Applied Multivariate Data Analysis, Volume II, Springer, New York, 1992
[4] Kaski S, Dimensionality Reduction by Random Mapping: Fast Similarity Computation for Clustering, IEEE International Joint Conference on Neural Networks, Anchorage, Alaska, May 4-9, 1998
[5] Kohonen T, Kaski S, Lagus K, Salojärvi J, Honkela J, Paatero V, Saarela A, Self Organization of a Massive Document Collection, IEEE Transactions on Neural Networks, vol. 11, no. 3, May 2000
[6] Kohonen T, Self-Organizing Maps, Third edition, Springer, Berlin, 2000
[7] Liu C, Wechsler H, Face Recognition Using Shape and Texture, CVPR '99, Fort Collins, Colorado, June 23-25, 1999
[8] Oja E, Kaski S, Kohonen Maps, Elsevier, Amsterdam, 1999
[9] Theodoridis S, Koutroumbas K, Pattern Recognition, Academic Press, San Diego, 1998
[10] Ultsch A, Unified Matrix (U-matrix) Methods, http://www.mathematik.uni-marburg.de/~ultsch/umatrix/umatrix.html, 1999
[11] Võhandu L, Krusberg H, A Direct Factor Analysis Method, The Proceedings of TTU, no. 426, 1977, pp. 11-21