Time Series Prediction Using RSOM and Local Models


Peter ANGELOVIČ
Slovak University of Technology
Faculty of Informatics and Information Technologies
Ilkovičova 3, 842 16 Bratislava, Slovakia
angelovic@fiit.stuba.sk

Abstract. The article gives a short survey of the area of time series prediction, its definition and certain solutions. The main emphasis of this article is put on prediction with the use of local models and artificial neural networks. A detailed analysis of SOM and RSOM neural networks and their usage in temporal quantization is given. The article is also concerned with verification of the prediction skills of a system with RSOM and MLP neural networks on real time series.

1 Introduction

The main motivation for the analysis and research of time series is the desire to predict the future and to understand the fundamental features and processes of systems that are used in every sector of human life. Time series prediction concentrates on building models that describe a process using the available knowledge and information. These models can then be used to simulate future events in the process.

A time series is formed from measurements or observations of natural, technical or economical processes, made sequentially in time. In general, consecutive samples of a time series have an important feature: the value of a sample depends on the sequence of consecutive samples located just before it. Because of this dependency, it is possible to estimate or predict future values of the time series. This fact can be expressed by the equation:

    x(t + 1) = f(x(t), x(t - 1), ..., x(t - N + 1))    (1)

Supervisor: prof. Ing. Vladimír Vojtek, PhD., Institute of Applied Informatics, Faculty of Informatics and Information Technologies STU in Bratislava

M. Bieliková (Ed.), IIT.SRC 2005, April 27, 2005, pp. 27-34.

where x(t) is the sample or state of the system at time step t, f is the dependency between the sample at time step t+1 and the previous samples, and N is the number of previous samples relevant for computing the new state of the system.

There is no reason to use a large number of previous samples in real applications. Usually a certain interval of consecutive samples is used. This interval is called the time window, and the computation of the sample x(t+1) is called prediction.

In science, technology and economy there exist many processes and phenomena whose prediction is important due to its usefulness and financial profit. These include various industrial processes that can be modeled, predicted and controlled based on some sensory data. Many phenomena of nature, such as daily rainfall or the probability of an earthquake, would also be useful to predict. Medical applications include, for example, modeling biological signals such as EEG or ECG to better understand the patient's state. In economy, stock market prices are an example of a possibly very profitable prediction task.

2 Prediction models

Nowadays, various approaches are used in time series prediction. The oldest and most frequently used methods are statistical linear regression models [3], because the theory of these models is well known and many algorithms for model building are available. They include linear autoregressive models (AR), moving average models (MA), mixed models (ARMA) and mixed integrated models (ARIMA). In practice almost all measured processes are nonlinear to some extent, and hence linear modeling methods turn out to be inadequate in some cases. On the other side, nonlinear methods are time-consuming, which is why they were unusable in real-time applications a few years ago. With the growth of computer processing power and data storage, the usage of various nonlinear methods has expanded. Artificial neural networks became very popular in this sector [4]. They are motivated by biological knowledge about the structure and functionality of the human brain.
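As a minimal illustration of Eq. (1) and the time window, a series can be turned into (window, next value) training pairs; the function name and window size below are illustrative, not from the paper:

```python
def make_windows(series, n):
    """Build (input window, next value) pairs from a 1-D series,
    following x(t+1) = f(x(t), ..., x(t-N+1)) with window size n."""
    pairs = []
    for t in range(n - 1, len(series) - 1):
        window = series[t - n + 1 : t + 1]   # x(t-N+1), ..., x(t)
        target = series[t + 1]               # x(t+1)
        pairs.append((window, target))
    return pairs

series = [0.1, 0.3, 0.2, 0.5, 0.4, 0.6]
pairs = make_windows(series, n=3)
# first pair: ([0.1, 0.3, 0.2], 0.5)
```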
Neural networks consist of a huge number of simple computational units, neurons, which are connected in some manner. The way they are connected and the computation they perform depend on the type of the task. Learning, or adaptation, is the typical feature of neural networks. Optimization of all the model parameters is carried out at the same time by the learning algorithm. Many types of neural networks for prediction have been developed. The first neural network used in prediction tasks was the multilayer perceptron (MLP), which became very popular during the 1980s. Temporal and contextual information in the internal structure of the network is implemented in the Time Delay Neural Network. The input of this network is created from the samples of the time window, which is presented to the network at the same time. This is the first implementation of memory in a neural network. Another network that is often used in prediction tasks and includes memory in its internal structure is the FIR MLP (Finite Impulse Response MLP), where the memory is implemented in the structure of the connections between neurons. Recurrent neural networks such as the Jordan or Elman networks are also used for temporal processing. The newest networks used in time series prediction are Echo State Networks, which give promising results.

2.1 Local prediction models

All prediction models can be divided into global and local prediction models. The models mentioned above were global models. In this approach only one model is used to characterize the measured process. Global models give the best results with stationary time series. But when the series is nonstationary, identifying a proper global model becomes more difficult. Local models often overcome some of the problems of global models, such as computing complexity and accuracy of prediction (see Fig. 1). They are based on dividing the data set into smaller sets of data, each being modeled with a simple local model.

Fig. 1. On the left, approximation of a time series with one global model GM is shown. The same time series is modeled with three local models LM1, LM2 and LM3 on the right.

The basic method that uses local models for prediction is called k nearest neighbors [6]. Local models require a data set with many examples of both the observed variables and the process output. The model can then be constructed and used to estimate the process output for a new input that is not part of the data set. The local model output is calculated in three consecutive stages. The first stage finds the k closest vectors, or nearest neighbors, of the observed variables in the data set. The second stage constructs a simple model using only the k chosen samples. The third stage evaluates the model to estimate the process output.

Another method, known from machine learning, is the algorithm M5, which builds a model tree during the learning phase [7]. Each internal node of the tree denotes a test on an attribute or value of the input vector, each branch represents an outcome of the test, and leaf nodes represent linear regression models. Branching in the internal nodes divides the input space of attributes into disjunctive regions. The dependency between inputs and target values in these regions can be expressed by a linear model. Division of the input space of attributes can also be performed by many clustering methods or self-organizing maps (SOM).
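The three stages of k-nearest-neighbor local prediction described above can be sketched as follows; this is a minimal version that uses the average of the neighbors' outputs as the simple local model, and all names are illustrative:

```python
def knn_local_predict(data, query, k):
    """Predict the output for `query` in three stages:
    1) find the k nearest inputs, 2) build a simple local model
    (here: the mean of the neighbors' outputs), 3) evaluate it."""
    # Stage 1: k nearest neighbors by squared Euclidean distance
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    neighbors = sorted(data, key=lambda xy: dist(xy[0], query))[:k]
    # Stages 2 and 3: local model = average of the neighbors' outputs
    return sum(y for _, y in neighbors) / k

data = [((0.0, 0.0), 1.0), ((0.1, 0.1), 1.2), ((5.0, 5.0), 9.0)]
print(knn_local_predict(data, (0.05, 0.05), k=2))  # averages the two close points
```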
SOM is able to divide the input set of samples into subsets with similar properties. But the loss of contextual information during the division process is a big disadvantage, because context is an important property in the process of

time series prediction. For example, if there are two samples with the same value but a different context, the SOM assigns them to the same cluster, and the local model then predicts the same results. This problem can be overcome by injecting a memory structure into the clustering process and performing what we call temporal quantization. A detailed survey of SOM models with memory mechanisms is given in [2]. The next section is concerned with time series prediction using SOM with a contextual mechanism.

3 Modeling temporal context with SOM

Temporal context in the series can be stored in the model by using some kind of memory structure. In [5] an extension to the SOM is presented, the Recurrent Self-Organizing Map (RSOM), that allows storing certain information from the past input vectors. The information is stored in the form of difference vectors in the map units. The mapping that is formed during training has the topology preservation characteristic of the SOM, as explained in the next section. RSOM can be used to determine the time context of each vector in a series of consecutive input vectors by performing the temporal quantization mentioned earlier. It is then possible to use the context together with the current input of the model to select a different local model in cases where the current input vector looks the same but the context is different. In prediction, the closest reference vector of the RSOM, called the best matching unit or the winner, is searched for each input vector. The local model associated with the best matching unit is then selected to be used for the prediction task at that time.

3.1 Self-organizing map

SOM is a vector quantization method with topology preservation, when the dimension of the map matches the true dimension of the input space. Topology preservation means that input patterns which are near in the input space are mapped to neighboring units in the SOM lattice, where the units are organized into a regular N-dimensional grid. Topology preservation is achieved with the introduction of a topological neighborhood that connects the units of the SOM with a neighborhood function.
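One common choice of such a neighborhood function is a Gaussian over the grid distance between a unit and the best matching unit; the sketch below is an illustrative assumption, not a formula taken from the paper:

```python
import math

def gaussian_neighborhood(unit_pos, bmu_pos, sigma):
    """Excitation of a unit given the best matching unit (BMU):
    falls off with squared grid distance, controlled by radius sigma."""
    d2 = sum((u - b) ** 2 for u, b in zip(unit_pos, bmu_pos))
    return math.exp(-d2 / (2.0 * sigma ** 2))

print(gaussian_neighborhood((2, 2), (2, 2), sigma=1.0))  # BMU itself -> 1.0
# units farther away on the grid receive excitation between 0 and 1
```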
The training algorithm of the SOM is based on unsupervised learning, where one sample, the input vector x(n) from the input space V_I, is selected randomly and compared against the weight vectors w_i of the units of the map space V_M. The best matching unit b for a given input pattern x(n) is selected using a metric-based criterion:

    ||x(n) - w_b(n)|| = min_{i ∈ V_M} { ||x(n) - w_i(n)|| }    (2)

where the parallel vertical bars denote the Euclidean vector norm. Initially all weight vectors are set to randomly selected samples from the training set. During the

learning phase the weights of the map are updated towards the given input pattern x(n) according to:

    w_i(n + 1) = w_i(n) + γ(n) h_ib(n) (x(n) - w_i(n))    (3)

where i ∈ V_M and γ(n) is a scalar-valued adaptation gain. The neighborhood function h_ib(n) gives the excitation of unit i when the best matching unit is b. If the map is trained properly, the weight vectors specify centers of clusters satisfying the vector quantization criterion, where we seek to minimize the sum of squared distances between the input patterns and the weight vectors of their best matching units.

3.2 Recurrent SOM

The Recurrent SOM differs from the SOM only in its outputs. The outputs of the normal SOM are reset to zero after presenting each input pattern and selecting the best matching unit with the typical winner-takes-all strategy. Hence the map is sensitive only to the last input pattern. In the RSOM the sharp outputs are replaced with leaky integrator outputs, which, once activated, gradually lose their activity. The modeling of the outputs in RSOM is close to the behavior of natural neurons, which retain an electrical potential on their membranes with decay. In the RSOM this decay is modeled with the difference equation:

    y_i(n) = (1 - α) y_i(n - 1) + α (x(n) - w_i(n))    (4)

where 0 < α ≤ 1 is the leaking coefficient, y_i(n) is the leaked difference vector, w_i(n) is the reference or weight vector of unit i and x(n) is the input pattern. Fig. 2 shows the design of the output node in the RSOM network.

Fig. 2. Structure of an RSOM node with feedback.

A high value of α corresponds to a short memory, while small values of α correspond to a long memory and slow decay of activation. In the extremes of α, RSOM behaves

like a normal SOM (α = 1), while in the other extreme all units tend to the mean of the input data. Since the feedback in RSOM contains a vector, it also captures the direction of the error, which can be exploited in the weight update when the map is trained. The best matching unit b at step n is found by the equation:

    y_b = min_i { ||y_i(n)|| }    (5)

Then the map is trained with the modified Hebbian training rule given in Eq. (3), where the difference vector (x(n) - w_i(n)) is replaced with y_i.

3.3 Temporal quantization

The learning algorithm of RSOM is implemented as follows [5]:

1. An episode of consecutive input vectors starting from the data is presented to the map. We use a certain number of iterations, because the effect of the feedback in RSOM decays quickly toward zero. The number of vectors belonging to the episode depends on the leaking coefficient α.
2. The best matching unit is selected at the end of every episode using Eq. (5).
3. The updating of the vector and its neighbors is carried out as in Eq. (3).
4. After the updating, all difference vectors are set to zero, and a new random starting point from the series is selected.

The above scenario is repeated until the mapping is formed. The whole procedure of building the models and evaluating their prediction abilities on testing data follows these steps: The time series is first divided into training and testing data. RSOM is used to divide the data into local data sets according to the best matching unit on the map. Local models, which have the same number of parameters as the vectors of the RSOM units, are then estimated from these local data sets. In the testing phase a new sample from the test data set is presented to the RSOM and the best model is selected. This model is then used to predict the next value of the time series.

4 Experiments

The prediction skills of the system based on RSOM and local models were validated on a time series of energy consumption. The results were compared to other prediction systems based on global models such as MLP and FIR MLP. The time series includes 1179 training samples (days) and 7 test samples.
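The episode-based RSOM training loop of Section 3.3 can be sketched as follows. This is a deliberately simplified version: only the BMU is updated (the neighborhood of Eq. (3) is omitted), and the map size, episode length and learning rate are illustrative assumptions:

```python
import random

def train_rsom(series, n_units, dim, alpha, gamma, episode_len, n_episodes, seed=0):
    """Sketch of RSOM training: leaky difference vectors (Eq. 4),
    BMU chosen at the end of each episode (Eq. 5), weight update
    toward the leaked difference (Eq. 3 with (x - w) replaced by y)."""
    rng = random.Random(seed)
    # initialize weights with randomly selected training samples
    weights = [list(rng.choice(series)) for _ in range(n_units)]
    for _ in range(n_episodes):
        y = [[0.0] * dim for _ in range(n_units)]         # step 4: reset differences
        start = rng.randrange(len(series) - episode_len)  # random starting point
        for x in series[start : start + episode_len]:     # step 1: one episode
            for i in range(n_units):                      # Eq. (4): leaky integration
                y[i] = [(1 - alpha) * yi + alpha * (xj - wj)
                        for yi, xj, wj in zip(y[i], x, weights[i])]
        # step 2, Eq. (5): BMU = unit with minimal norm of the leaked difference
        b = min(range(n_units), key=lambda i: sum(v * v for v in y[i]))
        # step 3: update the BMU toward its leaked difference vector
        weights[b] = [w + gamma * v for w, v in zip(weights[b], y[b])]
    return weights

series = [(0.0,), (0.1,), (0.9,), (1.0,)] * 30
w = train_rsom(series, n_units=2, dim=1, alpha=0.8, gamma=0.5,
               episode_len=3, n_episodes=200)
```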
The input vector was formed from 28 values as follows: 24 hourly values of energy consumption one day before, 3 values representing the code of the predicted day, and the last value

was the effective temperature¹. The output vector included 24 values of hourly energy consumption of the predicted day.

The prediction results were measured and compared using the following error measures: Mean Square Error (MSE), Maximal Absolute Deviation (MAD) and Mean Absolute Percentage Error (MAPE). The influence of the leaking coefficient (LC) and the number of local models (NLM) was observed during the experiments. The best results are shown in Table 1. The whole composition and sequence of the experiments is given in [1].

Table 1. Best prediction results for different numbers of local models.

    NLM   LC     MSE      MAD     MAPE [%]
    2     0.78   0.00169  0.1945  2.28
    3     0.95   0.00183  0.2113  2.304
    4     0.95   0.00147  0.1642  2.08

The table shows that the best result was reached using 4 local models with a leaking coefficient of 0.95. The final prediction of the testing data is shown in Fig. 3.

Fig. 3. Comparison of predicted and real values of the tested samples. The grey line represents the real values and the black line the predicted values. All values are transformed into the <0, 1> interval.

Experiments with other prediction methods were carried out at the same time as the experiments with RSOM and local models. The main motivation for comparing these models was to show that local models are more accurate for modeling such nonlinear time series. The results of this comparison are shown in Table 2.

¹ The concept of forming the code of the day and the computation of the effective temperature is shown in [1] and [8].
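The three error measures used in the experiments can be computed in a straightforward way; the function names below are illustrative:

```python
def mse(real, pred):
    """Mean Square Error."""
    return sum((r - p) ** 2 for r, p in zip(real, pred)) / len(real)

def mad(real, pred):
    """Maximal Absolute Deviation."""
    return max(abs(r - p) for r, p in zip(real, pred))

def mape(real, pred):
    """Mean Absolute Percentage Error (real values must be nonzero)."""
    return 100.0 * sum(abs((r - p) / r) for r, p in zip(real, pred)) / len(real)

real = [1.0, 2.0, 4.0]
pred = [1.1, 1.8, 4.0]
print(mse(real, pred))   # ~ 0.0167
print(mad(real, pred))   # ~ 0.2
print(mape(real, pred))  # ~ 6.67
```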

Table 2. Comparison of prediction results with other methods.

    Method      MSE      MAD     MAPE [%]
    MLP         0.00617  0.4214  3.52
    FIR MLP     0.00138  0.1637  2.14
    RSOM + LM   0.00147  0.1642  2.08

5 Conclusions

A prediction system based on RSOM and local models was used for the prediction of energy consumption. The best results were achieved with 4 local models. This result reflects the fact mentioned in [8] that the samples (days) of the energy consumption time series can be divided into 4 different groups according to the type of the day (working day, holiday, day before or after a holiday). Overall, the results of the system are better than or comparable with other prediction methods.

The usage of a prediction system based on RSOM has several advantages. First of all, its computing complexity is small compared to global models. The next property is the unsupervised learning of the context from the data, which allows building models from the data with only a little a priori knowledge. An important property of RSOM is its visualization ability: it is possible to visualize the local models and give comprehensible information to the user of the system, helping to understand the time series.

References

1. Angelovič, P.: Využitie neurónových sietí s nekontrolovaným učením pri predikcii časových radov (Usage of neural networks with unsupervised learning in time series prediction). Diploma thesis, Košice: FEI TU, 2003.
2. Barreto, G., Araujo, A.: Time in self-organizing maps: an overview of models. International Journal of Computer Research, Vol. 10, No. 2 (2001), 139-179.
3. Box, G., Jenkins, G., Reinsel, G.: Time Series Analysis: Forecasting and Control. Prentice Hall, Englewood Cliffs, New Jersey, 1994.
4. Kvasnička, V., Beňušková, Ľ., Pospíchal, J., Farkaš, I., Tiňo, P., Kráľ, A.: Úvod do teórie neurónových sietí (Introduction to the theory of neural networks). Bratislava: Iris, 1997. ISBN 80-88778-30-1.
5. Koskela, T., Varsta, M., Heikkonen, J., Kaski, K.: Temporal sequence processing using recurrent SOM. Helsinki University of Technology, 1998.
6. McNames, J.: Innovations in local modelling for time series prediction. Dissertation, Stanford University, 1999.
7. Paralič, J.: Objavovanie znalostí v databázach (Knowledge discovery in databases). Košice: Elfa, 2002. ISBN 80-89066-60-7.
8. Szathmáry, P., Kolcun, M.: Predikcia denných diagramov zaťaženia ES s využitím umelých neurónových sietí (Prediction of daily load diagrams of the power system using artificial neural networks). Košice: Elfa, 2001. ISBN 80-88964-80-6.