Using Cumulative Weighted Slopes for Clustering Time Series Data

GESTS Int'l Trans. Computer Science and Engr., Vol. 20, No. 1, p. 29

Durga Toshniwal¹ and R. C. Joshi¹
¹ Department of Electronics & Computer Engineering, Indian Institute of Technology Roorkee, Roorkee 247 667, Uttaranchal, India
{durgadec, joshifcc}@iitr.ernet.in

Abstract. Clustering time series data is an important mining activity in various domains. In this paper we propose a novel approach for clustering time series data based on cumulative weighted slopes. The technique rests on the observation that similar time sequences have similar slopes at their corresponding points. The weighted sum of these slopes, called the cumulative weighted slope, is computed for each time sequence. Clusters are then formed on the basis of this weighted sum of slopes to identify similar patterns over periods of time.

1 Introduction

Mining time series data for clusters has been an area of active research for the last few decades. It has applications in many domains, including finance and banking, retail sales, weather forecasting and agriculture. The problem of clustering is interdisciplinary in nature and has been addressed in different contexts by researchers working in areas such as data mining, statistics, and information systems. Much work is being done on clustering time series data [1], [2], [3], [4], [5]. Clustering can be used as a stand-alone tool for analyzing data, and may also be used as a pre-processing step in other data mining algorithms [6], [7]. Clustering methods can be classified in a number of ways [8], [9]. Clustering of time series can also be broadly categorized as whole sequence clustering and subsequence clustering [10]. Whole sequence clustering groups similar time series into the same cluster.
Subsequence clustering, in contrast, uses a sliding window to extract subsequences from a given time series and then clusters those subsequences. Owing to the growing interest in streaming time series data, most of the work on clustering time series data is based on subsequence clustering [1], [2], [3], [4], [5], [6], [7], [10], [11]. Keogh et al., however, claim in [10] that clustering of time series subsequences is meaningless.

In this paper, we suggest a novel approach for clustering time series data which is based on whole sequence clustering. In our method, feature extraction from the time series data is done using cumulative weighted slopes. The cumulative weighted slope is defined as the sum of the weighted slopes of a given time sequence, computed on a point-to-point basis. The parameters representing the cumulative weighted slopes of the various time sequences are then grouped into clusters using the k-means clustering method to identify similar patterns. We assume that a time series consists of a sequence of real numbers which represent the values of a measured parameter at equal, finite intervals of time.

We first demonstrate the effectiveness of our approach by applying it to synthetic time series data consisting of a variety of similar and reverse-shaped curves. Next, we apply it to real-life case data on retail sales, collected on a monthly basis over a period of eleven years from retail chain stores in the USA. Retail sales data has been chosen as the case data because of the growing importance of time series data mining for the retail industry. Clustering the retail sales data reveals similarities in the buying patterns of some common retail items. Retailers can use such information to design effective marketing strategies, optimize inventory and use shelf space efficiently.

The rest of the paper is organized as follows. Section 2 briefly gives background and related work. In Section 3, we describe the proposed approach. Section 4 gives some experimental results, and in Section 5 we discuss the case study on real-life retail time series data. Finally, the conclusions and future work are covered in Section 6.

2 Background and Related Work

For clustering time series data, we need to perform feature extraction from the time series data and then apply a clustering technique to the resulting feature vector.
In this section, we briefly discuss some key approaches for performing feature extraction from time series data and for clustering.

2.1 Feature Extraction from Time Series Data

There has been an explosion of interest in feature extraction from time series data [12], [13], [14]. A variety of approaches have been suggested for deriving a feature vector from time series data. Most of these techniques rely on dimension reduction, mapping the high-dimensional time series data to a lower-dimensional space. The transformed data is then used for efficient indexing and retrieval. Agrawal et al. [12] used the Discrete Fourier Transform (DFT) to derive the feature vector, mapping the time sequences to the frequency domain. Chan et al. [13] proposed using the Discrete Wavelet Transform (DWT) in place of the DFT for feature extraction. Unlike the DFT, which loses the time localization of sequences, the DWT provides time and frequency localization concurrently. A data-dependent scheme for feature extraction, known as the Singular Value Decomposition (SVD) method, was proposed in [14].

In this paper, we introduce a new technique for feature extraction from time series data using cumulative weighted slopes. By cumulative slopes we mean the summation of the slopes of a time sequence at corresponding points. These slopes are further assigned weights depending on their locations along the time axis, which helps to exaggerate the similarity (or dissimilarity) of trends. The proposed approach works well in the presence of variable-length time sequences in the database. It can handle time as well as amplitude scaling and different baselines for the time sequences in the given database.

2.2 Clustering

Clustering is the grouping of unlabeled data such that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. Much work has been done on different approaches to clustering [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. One of the most widely used clustering methods is hierarchical clustering [4]. However, its application is limited to relatively small datasets [9] because its time complexity is $O(n^2 \log n)$, where $n$ is the number of tuples or objects in the given database.

The k-means clustering method is more suitable for large datasets [9]. It was first introduced by MacQueen in [15] and belongs to the partitioning category of clustering approaches. The k-means algorithm partitions a set of $n$ data objects into $k$ clusters so that the resulting intracluster similarity is high whereas the intercluster similarity is low. The number of clusters $k$ has to be fixed a priori. Initially, the k-means algorithm randomly selects $k$ of the data objects as the cluster means or centers. Each object is then assigned to the cluster whose mean is nearest, and the mean of each cluster is recomputed. The process iterates until the criterion function converges.
The criterion function is defined as:

$$E = \sum_{i=1}^{k} \sum_{p \in C_i} \left| p - m_i \right|^{2} \qquad (1)$$

Here $E$ is the sum of the squared error over all objects in the database, $p$ is the point in space representing a given object, and $m_i$ is the mean of the cluster $C_i$.

We have chosen the k-means clustering algorithm for analyzing the time series data because of its almost linear time complexity, even for large datasets. The complexity of the k-means algorithm for $N$ objects is $O(kNrD)$ [10], where $k$ is the number of clusters specified by the user, $r$ is the number of iterations until convergence, and $D$ is the dimensionality of the time series. To reduce the complexity, we can reduce $N$ or $D$. It may not always be possible to reduce $N$, so the most practical way of reducing the complexity is to reduce $D$ by using an efficient representation of the time series data. In our approach, each time sequence is represented by a single number, obtained as the weighted sum of the slopes of the time sequence at certain points. The dimensionality of the time series data is thus reduced to the constant 1, and the complexity of the k-means algorithm becomes $O(kNr)$, a reduction by a factor of $D$.
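As an illustration of the criterion function in (1) for one-dimensional features, the following minimal Python sketch assigns each feature to its nearest cluster mean and evaluates the sum of squared errors. The function names, the feature values and the two cluster means are hypothetical and are not taken from the experiments in this paper.

```python
from typing import List, Sequence


def assign_to_nearest(values: Sequence[float], centers: Sequence[float]) -> List[int]:
    """Return, for each value, the index of the nearest cluster mean."""
    return [
        min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        for v in values
    ]


def criterion(values: Sequence[float], labels: Sequence[int],
              centers: Sequence[float]) -> float:
    """Sum of squared errors E of Eq. (1) when every object is a scalar (D = 1)."""
    return sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))


# Hypothetical scalar features and cluster means m_1, m_2 (illustration only).
features = [1.0, 1.2, 0.8, 5.0, 5.4]
centers = [1.0, 5.2]

labels = assign_to_nearest(features, centers)
E = criterion(features, labels, centers)
print(labels, round(E, 4))
```

Each pass over the data costs one distance evaluation per object and per cluster, which is where the $O(kNr)$ figure for scalar features comes from.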

3 Proposed Approach

In this paper, we suggest a simple and novel approach for clustering time sequences which is based on whole sequence clustering. Cumulative weighted slopes are used for feature extraction from the given time sequences. First, the slopes are calculated at corresponding points of each time sequence under observation. These slopes are then assigned weights depending on their location along the time axis. The weighted slopes of each time sequence are summed to obtain the cumulative weighted slope of that sequence. In this way, the cumulative weighted slope is computed for all the time sequences being studied. These cumulative weighted slopes are then grouped into clusters using the k-means clustering method to identify similar patterns.

3.1 Feature Extraction Using Cumulative Weighted Slopes

In this section, we introduce the parameters for the cumulative weighted slope. For its computation, all the time sequences are divided into the same number of small strips of equal width along the time axis. The weight given to the slope at a point is the ratio of the strip number (for which the slope is being computed) to the total number of strips into which the time sequence has been divided.

Cumulative Weighted Slope Computation. The approach requires some data pre-processing steps to be performed prior to the slope computations. It is assumed that the time series database consists of $p$ time sequences designated by $X_1, X_2, \ldots, X_p$. Each time sequence $X_i$ can in turn be represented as $\langle (t_{i1}, y_{i1}), (t_{i2}, y_{i2}), \ldots, (t_{in}, y_{in}) \rangle$. The first pre-processing step scales each time sequence $X_i$ along the time axis, so that all time axes are equalized to a desired value $t_d$. The value of $t_d$ is selected by the user and may depend on the application domain. Scaling along the time axis makes it possible to compare variable-length time sequences. For example, a 5-year growth pattern of a Company A can be compared to a 10-year growth pattern of a Company B. To avoid any distortion that may arise from this scaling along the time axis, the values along the y-axis of each $X_i$ are scaled proportionately. Each transformed sequence, denoted by $X'_i$, may be represented as $\langle (t'_{i1}, y'_{i1}), (t'_{i2}, y'_{i2}), \ldots, (t'_{in}, y'_{in}) \rangle$ where:

$$t'_{ik} = t_{ik} \cdot \frac{t_d}{t_{in}} \quad \text{and} \quad y'_{ik} = y_{ik} \cdot \frac{t_d}{t_{in}} \qquad (2)$$

Each time sequence in the database is then divided into the same number, say $m$, of small equi-width strips along the time axis, as shown in Fig. 1. The strips have different heights but the same width along the time axis.
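The scaling step in (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `scale_sequence` and the two growth patterns are hypothetical, chosen so that the second sequence is a copy of the first stretched by a factor of two in both time and amplitude.

```python
def scale_sequence(t, y, t_d):
    """Scale times and values of one sequence by t_d / t_n, as in Eq. (2)."""
    factor = t_d / t[-1]          # t[-1] is the last time stamp t_in
    t_scaled = [tk * factor for tk in t]
    y_scaled = [yk * factor for yk in y]
    return t_scaled, y_scaled


# Hypothetical 5-year pattern (Company A) and 10-year pattern (Company B).
t_a, y_a = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
t_b, y_b = [2, 4, 6, 8, 10], [4, 8, 12, 16, 20]

scaled_a = scale_sequence(t_a, y_a, t_d=5.0)
scaled_b = scale_sequence(t_b, y_b, t_d=5.0)
print(scaled_a == scaled_b)
```

Because the same factor is applied to both axes, the slope between any two consecutive points is unchanged by the transformation, so two sequences that differ only by such a joint stretch end up with identical pre-processed forms.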

The first parameter introduced in this section to represent the cumulative weighted slope is denoted by $WS_{sq}(X_i)$ and is given as:

$$WS_{sq}(X_i) = \sum_{k=1}^{m} S_{ik}^{2} \left( \frac{k}{m} \right)^{2} \qquad (3)$$

Here $S_{ik}$ is the slope of the $k$-th strip of the time sequence $X_i$, and $m$ is the total number of strips into which the time sequence has been divided. The slope $S_{ik}$ is given as:

$$S_{ik} = \frac{y'_{i(k+1)} - y'_{ik}}{\Delta t} \qquad (4)$$

We assume in (4) that the starting and ending coordinates of the $k$-th strip of the time sequence being analyzed are $(t'_{ik}, y'_{ik})$ and $(t'_{i(k+1)}, y'_{i(k+1)})$. $\Delta t$ is the width of each strip and is a constant. The choice of $\Delta t$ may be user specified or domain specific. Its value should be neither too small, which leads to excessive computation, nor too large, which loses detail. The weight associated with the slope $S_{ik}$ of (4) is $k/m$, where $k$ is the strip number for which the slope is being computed and $m$ is the total number of strips into which each time sequence is divided; (3) sums the squares of the weighted slopes $S_{ik} \cdot (k/m)$.

The next parameter introduced to represent the cumulative weighted slope is denoted by $WS_{cube}(X_i)$ and is given as:

$$WS_{cube}(X_i) = \sum_{k=1}^{m} S_{ik}^{3} \left( \frac{k}{m} \right)^{3} \qquad (5)$$

Here $S_{ik}$ is again the slope of the $k$-th strip of the time sequence $X_i$ as given in (4), and $m$ is the total number of strips. The weight assigned to the cubed slope in (5) is $(k/m)^3$. The cube is used in (5) because, unlike the square, it preserves the positive or negative sign of each weighted slope. We feel that including the sign plays a significant role when computing the cumulative weighted slope of a time sequence.
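A minimal Python sketch of the two parameters follows. It reads (3) and (5) as sums of the squared and cubed weighted slopes, which is an interpretation of the printed formulas, and it assumes that the pre-processed values are already available at the $m + 1$ strip boundaries. The function names and the two toy sequences are hypothetical.

```python
def strip_slopes(y, delta_t):
    """Slopes S_ik of Eq. (4) from the values at the m + 1 strip boundaries."""
    return [(y[k + 1] - y[k]) / delta_t for k in range(len(y) - 1)]


def ws_sq(slopes):
    """Cumulative weighted slope of Eq. (3): squared slopes, squared weights."""
    m = len(slopes)
    return sum((s ** 2) * (k / m) ** 2 for k, s in enumerate(slopes, start=1))


def ws_cube(slopes):
    """Cumulative weighted slope of Eq. (5): cubed slopes, cubed weights."""
    m = len(slopes)
    return sum((s ** 3) * (k / m) ** 3 for k, s in enumerate(slopes, start=1))


# Hypothetical rising and falling sequences sampled at 5 boundaries (m = 4).
rising = strip_slopes([0.0, 1.0, 2.0, 3.0, 4.0], delta_t=1.0)
falling = strip_slopes([4.0, 3.0, 2.0, 1.0, 0.0], delta_t=1.0)

print(ws_sq(rising), ws_sq(falling))      # the square hides the direction
print(ws_cube(rising), ws_cube(falling))  # the cube keeps the sign
```

On these toy sequences the squared parameter is the same for the rising and the falling curve, while the cubed parameter differs in sign, which illustrates the role of the sign discussed above.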
Moreover, the cube of $(k/m)$ is used in conjunction with the cube of $S_{ik}$ in the parameter $WS_{cube}(X_i)$. This exaggerates the role of the location of the strip (i.e. $k$) for which the slope $S_{ik}$ is being computed. In other words, the weights in (5) emphasize that a certain slope occurs at a certain point of the time sequence. Summing the weighted slopes over all strips of the given time sequence yields the parameter in (5).

Clustering Time Series Data. We have used the k-means clustering algorithm to group the cumulative weighted slope parameters. The algorithm is outlined in Table 1. The iterations stop when the criterion function in (1) converges.

The overall strategy of the proposed method is summarized as follows:

Data pre-processing
Step 1: Scaling the data along the time axis and correspondingly scaling the y-ordinate values to avoid data distortion.
Step 2: Dividing each time sequence into the same number of equi-width strips.

Feature extraction using cumulative weighted slopes
Step 3: Computing the parameter $WS_{sq}(X_i)$ or the parameter $WS_{cube}(X_i)$ to obtain the cumulative weighted slopes of the time sequences being analyzed.

Clustering
Step 4: Clustering the parameters obtained in Step 3 using the k-means clustering algorithm.

Fig. 1. Division of the normalized time series into n equi-width strips, each having width Δt (horizontal axis: time t; vertical axis: y)

4 Experimental Results

To demonstrate the effectiveness of our approach, we have conducted experiments with synthetic time series datasets. The synthetic datasets used in this section have been specifically designed to illustrate the feature extraction method suggested in our approach. A variety of shapes and reverse shapes have been used in our experiments, but for lack of space only small subsets of these are shown here. The application of k-means clustering to the real-life case data is dealt with in the next section.

The first sample dataset, A, is shown in Fig. 2. It comprises A1, A2, A3 and A4. The data are pre-processed as discussed in Section 3. This involves scaling along the x-axis and correspondingly along the y-axis, taking $t_d = 5$ (this value can be user defined or domain specific). Each member of dataset A has been divided into 10 strips (this number can also be user defined or domain specific). The pre-processed dataset A is denoted by AS and is shown in Fig. 3. The parameters $WS_{sq}(X)$ and $WS_{cube}(X)$ computed for the pre-processed dataset are given in Table 2. Clustering has been done using the k-means algorithm as explained in Section 2 with k = 2. The resulting clusters on the basis of the parameter $WS_{sq}(X)$ are shown in Table 3. The clusters obtained by applying k-means (k = 2) to the parameter $WS_{cube}(X)$ are the same as those in Table 3.
In terms of dataset A, the first cluster consists of A1, A2 and A3, and the second comprises A4.

The next sample dataset, B, is shown in Fig. 4. The data has been pre-processed as discussed in Section 3, taking $t_d = 5$. Each member of dataset B has been divided into 10 strips. The pre-processed dataset B is denoted by BS and comprises B1S, B2S, B3S and B4S. The parameters $WS_{sq}(X)$ and $WS_{cube}(X)$ computed for the pre-processed dataset BS are given in Table 4. Clustering is done using the k-means algorithm as explained in Section 3 with k = 2.

Table 1. K-Means algorithm
S. No.  Steps
1.      Choose the value of k
2.      Randomly select k objects as the cluster centers
3.      Assign each object to the cluster to which it is most similar (nearest), based on the mean value of that cluster
4.      Re-calculate the k cluster centers
5.      Repeat steps 3 and 4 until the cluster centers stop moving

Fig. 2. Time series dataset A

Fig. 3. Pre-processed time series dataset A, denoted by AS

Table 2. Cumulative weighted slope computations for dataset AS
Pre-processed sequence   $WS_{sq}(X)$   $WS_{cube}(X)$
A1S                      1.320          -0.467
A2S                      1.288          -0.497
A3S                      1.195          -0.745
A4S                      3.548          -1.320

Table 3. Results of k-means clustering applied to Table 2 (k = 2)
Cluster No.   Description
1.            A1S, A2S, A3S
2.            A4S

Fig. 4. Time series dataset B

Table 4. Cumulative weighted slope computations for dataset BS
Pre-processed sequence   $WS_{sq}(X)$   $WS_{cube}(X)$
B1S                      1.919          -1.120
B2S                      1.843          -0.942
B3S                      1.827          -1.037
B4S                      3.526          -2.085

Table 5. Results of k-means clustering applied to Table 4 (k = 2)
Cluster No.   Description
1.            B1S, B2S, B3S
2.            B4S

The resulting clusters on the basis of the parameter $WS_{sq}(X)$ are shown in Table 5. The clusters obtained by applying k-means (k = 2) to the parameter $WS_{cube}(X)$ are the same as those in Table 5. In terms of dataset B, the first cluster consists of B1, B2 and B3, and the second comprises B4.

5 Case Study

The case study undertaken in this paper is a similarity analysis of retail sales data (in millions of dollars) collected on a monthly basis over a period of 11 years (from 01/1992 to 12/2002) for chain retail stores in the USA [16]. Each time sequence in the retail sales time series database consists of 132 data points (one sequence per category of item sold). We considered sales data for the types of retail businesses listed in Table 6. The time series data from the retail industry has been studied to analyze the sales patterns of different categories of products. Clustering the cumulative weighted slope parameters that represent the retail sales time sequences can help identify products which show similar sales patterns. This information can serve as an important tool for retailers in leveraging their sales, designing effective marketing strategies, using their shelf space efficiently, forecasting inventory requirements and so on.

Table 6. Businesses considered in the case study
S. No.   Description
1.       Health and Personal Care Stores
2.       Pharmacies and Drug Stores
3.       Furniture Stores
4.       Jewelry Stores
5.       Sporting Goods, Hobby and Music Stores
6.       Household Appliances Stores
7.       Men's Clothing Stores
8.       Women's Clothing Stores
9.       Shoe Stores
10.      New Car Dealers
11.      Used Car Dealers

Table 7. Cumulative weighted slope computations for the case study
S. No.   Description                               $WS_{sq}(X)$
1.       Health and Personal Care Stores           8729.19
2.       Pharmacies and Drug Stores                7517.66
3.       Sporting Goods, Hobby and Music Stores    17739.05
4.       Furniture Stores                          5523.15
5.       Used Car Dealers                          2926.08
6.       Jewelry Stores                            12322.13
7.       Women's Clothing Stores                   5664.91
8.       Shoe Stores                               3323.99
9.       Household Appliances                      895.42
10.      Men's Clothing Stores                     2229.54
11.      New Car Dealers                           37761.62

Table 8. Results of k-means clustering with k = 4
Cluster No.   Description
1.            Men's Clothing Stores, Shoe Stores, Used Car Dealers, and Household Appliances Stores
2.            Furniture Stores, Women's Clothing Stores, Health and Personal Care Stores, and Pharmacies and Drug Stores
3.            Jewelry Stores, and Sporting Goods, Hobby and Music Stores
4.            New Car Dealers

The first step is data pre-processing, carried out using the steps outlined in Section 3. Thereafter, the cumulative weighted slope given by the parameter $WS_{sq}(X)$ has been computed for each retail sales time sequence (each having 132 data points) as per the procedure described in Section 3. The results are summarized in Table 7. After the cumulative weighted slope computations, the k-means clustering algorithm has been employed to find groups of products having similar customer buying patterns. The results of clustering with k = 4 are listed in Table 8. Products whose cumulative weighted slope parameters lie in the same cluster exhibit similar sales patterns. The parameter $WS_{cube}(X)$ can be computed and clustered in the same way.

It can be concluded from Table 8 that, for the 11 years from January 1992 to December 2002, retail sales at men's clothing stores, shoe stores, household appliance stores and used car dealers show similar sales patterns. This suggests, for example, that stores selling men's clothes might also carry a shoe section, or that stores selling both might place the shoes on shelves near the clothes section. Such arrangements offer convenience to the customer and may at the same time help boost sales of these items. Similarly, the information obtained by applying our approach to sales time series data from retail stores may be applied to various other items of business to derive business strategies and rules.
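The grouping in Table 8 can be checked against the values in Table 7 with a short Python sketch of k-means on scalar features. Two points are assumptions made only for this illustration: the function name `kmeans_1d` is hypothetical, and the initial centers are taken as evenly spaced order statistics of the data rather than chosen at random as in step 2 of Table 1, so that the run is repeatable.

```python
from collections import defaultdict


def kmeans_1d(values, k, max_iter=100):
    """k-means on scalars (k >= 2) with a deterministic start (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: seed the centers with k evenly spaced order statistics
    # instead of the random selection described in Table 1.
    centers = [ordered[round(i * (n - 1) / (k - 1))] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign every value to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centers[c])) for v in values
        ]
        if new_labels == labels:      # assignments stopped changing
            break
        labels = new_labels
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


# WS_sq values copied from Table 7.
ws_sq_table7 = {
    "Health and Personal Care Stores": 8729.19,
    "Pharmacies and Drug Stores": 7517.66,
    "Sporting Goods, Hobby and Music Stores": 17739.05,
    "Furniture Stores": 5523.15,
    "Used Car Dealers": 2926.08,
    "Jewelry Stores": 12322.13,
    "Women's Clothing Stores": 5664.91,
    "Shoe Stores": 3323.99,
    "Household Appliances": 895.42,
    "Men's Clothing Stores": 2229.54,
    "New Car Dealers": 37761.62,
}

names = list(ws_sq_table7)
labels, centers = kmeans_1d([ws_sq_table7[n] for n in names], k=4)

clusters = defaultdict(set)
for name, label in zip(names, labels):
    clusters[label].add(name)

for label in sorted(clusters):
    print(label + 1, sorted(clusters[label]))
```

With this particular start the iteration settles on the same four groups as Table 8. A different initialization could in principle converge to a different partition, since k-means only finds a local minimum of the criterion in (1).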
6 Conclusions and Future Work

We have proposed a new and efficient technique for clustering time series data, based on whole sequence clustering. The proposed approach computes parameters representing cumulative weighted slopes for the time sequences under observation. These parameters are then clustered using the k-means clustering algorithm. We assume that a time series consists of a sequence of real numbers which represent the values of a measured parameter at equal intervals of time. The proposed approach works irrespective of global scaling or shrinking of the time sequences. It is also capable of handling different baselines.

The case data considered in this study is 11 years of sales data collected on a monthly basis from retail chain stores in the USA, from January 1992 to December 2002. Applying the proposed approach to sales time series data from the retail industry can reveal similarities in the sales patterns of different items. This information may help to boost sales and design effective marketing strategies in the traditional retail industry as well as in e-business.

In further work we intend to obtain association rules from retail time series data by searching for groups of clusters that frequently occur together. Alternative clustering algorithms may also be employed for grouping the retail time series data. We also intend to use an enlarged dataset that includes sales data for many more types of businesses and places.

References

[1] G. Das, K. Lin, H. Mannila, G. Renganathan, and P. Smyth, "Rule Discovery from Time Series," Proc. of the 4th Int'l Conference on Knowledge Discovery and Data Mining, pp. 16-22, New York, NY, Aug 27-31, 1998.
[2] P. Cotofrei and K. Stoffel, "Classification Rules + Time = Temporal Rules," Proc. of the 2002 Int'l Conference on Computational Science, pp. 572-581, Amsterdam, Netherlands, Apr 21-24, 2002.
[3] X. Jin, L. Wang, Y. Lu, and C. Shi, "Indexing and Mining of the Local Patterns in Sequence Databases," Proc. of the 3rd Int'l Conference on Intelligent Data Engineering and Automated Learning, pp. 68-73, Manchester, UK, Aug 12-14, 2002.
[4] E. Keogh and S. Kasetty, "On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration," Proc. of the 8th ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, pp. 102-111, Edmonton, Alberta, Canada, July 23-26, 2002.
[5] N. Radhakrishnan, J. D. Wilson, and P. C. Loizou, "An Alternate Partitioning Technique to Quantify the Regularity of Complex Time Series," Int'l Journal of Bifurcation and Chaos, Vol. 10, No. 7, pp. 1773-1779, World Scientific Publishing, 2000.
[6] S. K. Harms, J. Deogun, and T. Tadesse, "Discovering Sequential Association Rules with Constraints and Time Lags in Multiple Sequences," Proc. of the 13th Int'l Symposium on Methodologies for Intelligent Systems, pp. 432-441, Lyon, France, June 27-29, 2002.
[7] C. Li, P. S. Yu, and V. Castelli, "MALM: A Framework for Mining Sequence Database at Multiple Abstraction Levels," Proc. of the 7th ACM CIKM Int'l Conference on Information and Knowledge Management, pp. 267-272, Bethesda, MD, Nov 3-7, 1998.
[8] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, CA, 2002.
[9] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data Clustering: A Review," ACM Comput. Surv., Vol. 31, pp. 264-323, 1999.
[10] E. Keogh, J. Lin, and W. Truppel, "Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research," Proc. of the Int'l Conference on Data Mining, pp. 115, 2003.
[11] P. Cotofrei, "Statistical Temporal Rules," Proc. of the 15th Conference on Computational Statistics, Berlin, Germany, Aug 24-28, 2002.
[12] R. Agrawal, C. Faloutsos, and A. Swami, "Efficient Similarity Search in Sequence Databases," Proc. 4th Int'l Conf. Foundations of Data Organization and Algorithms, pp. 69-84, Chicago, Illinois, USA, 1993.
[13] D. Rafiei, "On Similarity Based Queries for Time Series Data," Proc. 15th IEEE Int'l Conf. Data Engineering, pp. 410-417, Sydney, Australia, March 1999.
[14] F. Korn, H. Jagadish, and C. Faloutsos, "Efficiently Supporting Ad hoc Queries in Large Datasets of Time Sequences," Proc. ACM SIGMOD Int'l Conf. on Management of Data, pp. 289-300, Tucson, AZ, May 1997.
[15] J. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations," Proc. of the 5th Berkeley Symposium on Math. Statist. and Prob., pp. 281-297, 1967.
[16] Economic Time Series Page, http://www.economagic.com

Biography

Name: Durga Toshniwal
Address: Department of Electronics & Computer Engineering, Indian Institute of Technology, Roorkee, India 247 667
Education & Work experience: The author received a Bachelor of Engineering from JMI, India, and a Master of Technology from NIT Kurukshetra, India. Presently, she is a Research Scholar at the Indian Institute of Technology Roorkee, India. Her areas of research interest are Time Series Data Mining and KDD.
Tel: +91-1332-271575
E-mail: durgadec@iitr.ernet.in

Name: R. C. Joshi
Address: Department of Electronics & Computer Engineering, Indian Institute of Technology, Roorkee, India 247 667
Education & Work experience: The author received a B.E. from Allahabad University, India, and an M.E. and Ph.D. from IIT Roorkee, India. Presently, he is a Professor at the Indian Institute of Technology Roorkee, India. His area of interest is Databases.
Tel: +91-1332-285650
E-mail: joshifcc@iitr.ernet.in