Using Cumulative Weighted Slopes for Clustering Time Series Data

GESTS Int'l Trans. Computer Science and Engr., Vol.20, No.1, p. 29

Durga Toshniwal and R. C. Joshi
Department of Electronics & Computer Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttaranchal, India
{durgadec, joshifcc}@iitr.ernet.in

Abstract. Clustering time series data is an important mining activity in many domains. In this paper we propose a novel approach for clustering time series data based on cumulative weighted slopes. The technique rests on the observation that similar time sequences have similar slopes at corresponding points. The weighted sum of these slopes, called the cumulative weighted slope, is computed for each time sequence, and clusters are formed on the basis of this value to identify similar patterns over periods of time.

1 Introduction

Mining time series data for clusters has been an area of active research for the last few decades, with applications in domains such as finance and banking, retail sales, weather forecasting and agriculture. The problem of clustering is interdisciplinary and has been addressed by researchers in areas such as data mining, statistics and information systems. Much work has been done on clustering time series data [1], [2], [3], [4], [5]. Clustering can be used as a stand-alone tool for analyzing data, and also as a pre-processing step in other data mining algorithms [6], [7].

Clustering methods can be classified in a number of ways [8], [9]. Broadly, they can be divided into whole sequence clustering and subsequence clustering [10]. Whole sequence clustering groups similar time series into the same cluster.
Subsequence clustering, in contrast, uses a sliding window to extract subsequences from a given time series and then clusters them. Owing to increased interest in streaming time series data, most of the work on clustering time series data is based on subsequence clustering [1], [2], [3], [4], [5], [6], [7], [10], [11]. However, Keogh et al. [10] argue that clustering time series subsequences extracted with a sliding window is meaningless. In this paper, we suggest a novel approach for clustering time series data that is based on whole sequence clustering. In our method, features are extracted from the time series data using cumulative weighted slopes. The cumulative weighted slope of a time sequence is the sum of its weighted slopes, computed on a point-to-point basis. The cumulative weighted slope parameters of the various time sequences are then grouped into clusters using the k-means clustering method to identify similar patterns. In this paper, we assume that a time series is a sequence of real numbers representing the values of a measured parameter at equal, finite intervals of time.

We first demonstrate the effectiveness of our approach on synthetic time series data consisting of a variety of similar and reverse-shaped curves. Next, we apply it to real-life retail sales data collected monthly over a period of eleven years from retail chain stores in the USA. Retail sales data has been chosen as the case data because of the growing importance of time series data mining for the retail industry. Clustering the retail sales data reveals similarity in the buying patterns of some common retail items. Retailers can use such information to enhance sales by designing effective marketing strategies, optimizing inventory and using shelf space efficiently.

The rest of the paper is organized as follows. Section 2 briefly gives background and related work. Section 3 describes the proposed approach. Section 4 gives experimental results, and Section 5 discusses a case study on real-life retail time series data. Finally, conclusions and future work are covered in Section 6.

2 Background and Related Work

To cluster time series data, we need to extract features from the time series and then apply a clustering technique to the resulting feature vectors.
In this section, we briefly discuss some key approaches for feature extraction from, and clustering of, time series data.

2.1 Feature Extraction from Time Series Data

There has been an explosion of interest in feature extraction from time series data [12], [13], [14], and a variety of approaches have been suggested for deriving feature vectors from time series. Most of these techniques rely on dimensionality reduction, mapping the high-dimensional time series data to a lower-dimensional space; the transformed data is then used for efficient indexing and retrieval. Agrawal et al. [12] used the Discrete Fourier Transform (DFT) to derive the feature vector, mapping the time sequences to the frequency domain. Chan et al. [13] proposed the Discrete Wavelet Transform (DWT) in place of the DFT. Unlike the DFT, which loses the time localization of sequences, the DWT provides time and frequency localization simultaneously. A data-dependent scheme for feature extraction, known as the Singular Value Decomposition (SVD) method, was proposed in [14].
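To make the dimension-reduction idea concrete, the sketch below keeps only the first few DFT coefficients of a sequence, in the spirit of Agrawal et al. [12]. It is a hypothetical, dependency-free illustration: the names `dft_features` and `euclidean` are ours, the naive O(n·f) evaluation stands in for an FFT, and the orthonormal 1/sqrt(n) scaling is an assumption chosen so that, by Parseval's theorem, the distance between truncated feature vectors never exceeds the distance between the raw sequences.

```python
import cmath
import math


def dft_features(x, f):
    """First f coefficients of the orthonormal DFT of the real sequence x.

    Naive O(n * f) evaluation; a production system would use an FFT.
    """
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / math.sqrt(n)
        for k in range(f)
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length real or complex vectors."""
    return math.sqrt(sum(abs(u - v) ** 2 for u, v in zip(a, b)))


# Two hypothetical sequences of length 8, reduced to 3 coefficients each.
q = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0, 0.0]
s = [1.0, 2.5, 2.5, 2.0, 0.5, 0.0, -1.0, 0.5]
d_raw = euclidean(q, s)
d_feat = euclidean(dft_features(q, 3), dft_features(s, 3))
print(f"raw distance: {d_raw:.3f}, 3-coefficient distance: {d_feat:.3f}")
```

Keeping all n coefficients reproduces the raw distance exactly; keeping fewer can only shrink it, which is why such truncated features can be used for indexing without false dismissals.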

In this paper, we introduce a new technique for feature extraction from time series data using cumulative weighted slopes. By cumulative slope we mean the sum of the slopes of a time sequence at corresponding points. These slopes are further assigned weights depending on their location along the time axis, which accentuates similarities (and dissimilarities) of trends. The proposed approach handles variable-length time sequences in the database, as well as time and amplitude scaling and different baselines.

2.2 Clustering

Clustering is the grouping of unlabeled data such that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. Much work has been done on different approaches to clustering [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. One of the most widely used clustering methods is hierarchical clustering [4]. However, its application is limited to relatively small datasets [9] because its time complexity is O(n^2 log n), where n is the number of tuples or objects in the database. The k-means clustering method is more suitable for large datasets [9]. It was first introduced by MacQueen [15] and belongs to the partitioning family of clustering methods. The k-means algorithm partitions a set of n data objects into k clusters so that intracluster similarity is high and intercluster similarity is low. The number of clusters k has to be fixed a priori. Initially, the algorithm randomly selects k of the data objects as cluster means (centers). Each object is then assigned to the cluster with the nearest mean, and the mean of each cluster is recomputed. The process iterates until the criterion function converges.
The criterion function is defined as:

$$E = \sum_{i=1}^{k} \sum_{p \in C_i} \lvert p - m_i \rvert^{2} \qquad (1)$$

where E is the sum of squared errors over all objects in the database, p is the point in space representing a given object, and m_i is the mean of cluster C_i.

We have chosen the k-means algorithm for analyzing time series data because its time complexity remains nearly linear even for large datasets. The complexity of k-means for N objects is O(kNrD) [10], where k is the number of clusters specified by the user, r is the number of iterations until convergence, and D is the dimensionality of the time series. The complexity can therefore be lowered by reducing N or D. Since it may not always be possible to reduce N, the most practical option is to reduce D through an efficient representation of the time series data. In our approach, each time sequence is represented by a single number, the weighted sum of the slopes of the time sequence at certain points, so the dimensionality is reduced to D = 1. The complexity of the k-means step therefore becomes O(kNr), a substantial improvement. This excludes the one-time cost of computing the features, which, once a sequence has been divided into m strips, is a sum over m terms.
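Because each sequence is reduced to one number (D = 1), the k-means step operates on plain scalars. The following is a minimal sketch using only Python's standard library; `kmeans_1d` and `sse` are hypothetical names, initial centers are drawn at random from the data objects as described in Section 2.2, and an empty cluster simply keeps its previous center (a choice the text does not specify). Each iteration costs O(kN), consistent with the O(kNr) bound above.

```python
import random


def sse(points, labels, centers):
    """Criterion E of (1) for scalar features: squared distance to the assigned mean."""
    return sum((p - centers[c]) ** 2 for p, c in zip(points, labels))


def kmeans_1d(points, k, max_iter=100, seed=0):
    """Plain k-means on one-dimensional features; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)  # k random objects as initial centers
    labels = [0] * len(points)
    for _ in range(max_iter):  # max_iter is only a safety cap
        # Assign every point to its nearest center: O(k) per point, O(kN) overall.
        labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:  # centers stopped moving, so E no longer changes
            break
        centers = new_centers
    return labels, centers


features = [0.9, 1.0, 1.1, -2.0, -2.1]  # hypothetical cumulative weighted slopes
labels, centers = kmeans_1d(features, k=2)
print(labels, [round(c, 3) for c in centers], round(sse(features, labels, centers), 4))
```

With one feature per sequence, the nearest-center search is a single subtraction per center, which is where the saving over O(kNrD) comes from.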

3 Proposed Approach

In this paper, we suggest a simple and novel approach for clustering time sequences based on whole sequence clustering, using cumulative weighted slopes for feature extraction. First, slopes are calculated at corresponding points of each time sequence under observation. The slopes are then assigned weights depending on their location along the time axis. The weighted slopes of each time sequence are summed to obtain its cumulative weighted slope. Once the cumulative weighted slope has been computed for all time sequences being studied, these values are grouped into clusters with the k-means method to identify similar patterns.

3.1 Feature Extraction Using Cumulative Weighted Slopes

In this section, we introduce the parameters for the cumulative weighted slope. To compute them, every time sequence is divided into the same number of small strips of equal width along the time axis. The weight given to the slope of a strip is the ratio of the strip number k (for which the slope is being computed) to the total number m of strips into which the time sequence has been divided, i.e. k/m.

Cumulative Weighted Slope Computation. Some data pre-processing is required before the slopes are computed. The time series database is assumed to consist of p time sequences X_1, X_2, ..., X_p. Each time sequence X_i can be represented as <(t_i1, y_i1), (t_i2, y_i2), ..., (t_in, y_in)>. The first pre-processing step scales each time sequence X_i in the database along the time axis.
This equalizes their time axes to some desired length t_d, which is chosen by the user and may depend on the application domain. Scaling along the time axis allows variable-length time sequences to be compared; for example, a 5-year growth pattern of Company A can be compared with a 10-year growth pattern of Company B. To avoid distortions arising from this scaling, the values along the y-axis of each X_i are scaled proportionately. Each transformed sequence, denoted X'_i, may be represented as <(t'_i1, y'_i1), (t'_i2, y'_i2), ..., (t'_in, y'_in)>, where:

$$t'_{ik} = t_{ik} \cdot \frac{t_d}{t_{in}}, \qquad y'_{ik} = y_{ik} \cdot \frac{t_d}{t_{in}} \qquad (2)$$

Because t and y are multiplied by the same factor, the slope between any two points of a sequence is unchanged by this scaling. Each time sequence in the database is then divided into the same number m of small, equal-width strips along the time axis, as shown in Fig. 1. The strips have different heights but the same width along the time axis.
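Steps 1 and 2 can be sketched as follows, assuming each sequence is a list of (t, y) pairs sorted by time with t_in > 0. The text does not say how values at strip boundaries are obtained; this sketch assumes linear interpolation between neighbouring samples and strips spanning [t'_i1, t'_in]. The names `scale_time_axis`, `interpolate` and `strip_boundaries` are hypothetical. Since (2) multiplies t and y by the same factor, slopes are unchanged, and the last line of the example checks this.

```python
def scale_time_axis(seq, t_d):
    """Equation (2): multiply t and y by t_d / t_in, where t_in is the last time stamp."""
    factor = t_d / seq[-1][0]
    return [(t * factor, y * factor) for t, y in seq]


def interpolate(seq, t):
    """Linearly interpolate the y-value of a time-sorted (t, y) list at time t."""
    for (t_a, y_a), (t_b, y_b) in zip(seq, seq[1:]):
        if t_a <= t <= t_b:
            return y_a if t_b == t_a else y_a + (y_b - y_a) * (t - t_a) / (t_b - t_a)
    raise ValueError(f"time {t} lies outside the sequence")


def strip_boundaries(seq, m):
    """Divide the sequence into m equal-width strips.

    Returns (dt, values), where values holds the m + 1 interpolated
    y-values at the strip boundaries.
    """
    t_start, t_end = seq[0][0], seq[-1][0]
    dt = (t_end - t_start) / m
    # Pin the last boundary to t_end so rounding cannot push it out of range.
    times = [t_start + k * dt for k in range(m)] + [t_end]
    return dt, [interpolate(seq, t) for t in times]


# Hypothetical 4-point sequence scaled to t_d = 5 and divided into m = 3 strips.
raw = [(1, 2.0), (2, 4.0), (3, 6.0), (4, 8.0)]
scaled = scale_time_axis(raw, t_d=5)
dt, values = strip_boundaries(scaled, m=3)
print(dt, values)
# Equal scaling of t and y preserves slopes: both values printed are 2.0.
print((raw[1][1] - raw[0][1]) / (raw[1][0] - raw[0][0]),
      (scaled[1][1] - scaled[0][1]) / (scaled[1][0] - scaled[0][0]))
```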

The first parameter introduced to represent the cumulative weighted slope is denoted WS_sq(X_i) and is given as:

$$WS_{sq}(X_i) = \sum_{k=1}^{m} \left( S_{ik} \cdot \frac{k}{m} \right)^{2} \qquad (3)$$

Here S_ik is the slope of the kth strip of the time sequence X_i, and m is the total number of strips into which the time sequence has been divided. The slope S_ik is given as:

$$S_{ik} = \frac{y'_{i(k+1)} - y'_{ik}}{\Delta t} \qquad (4)$$

In (4) we assume that the starting and ending coordinates of the kth strip of the time sequence are (t'_ik, y'_ik) and (t'_i(k+1), y'_i(k+1)), and that Δt, the width of each strip, is constant. Δt may be user-specified or domain-specific; it should be neither too small (which leads to excessive computation) nor too large (which loses detail). The weight associated with the slope S_ik in (3) is k/m, where k is the number of the strip for which S_ik is computed and m is the total number of strips into which each time sequence is divided.

The next parameter introduced to represent the cumulative weighted slope is denoted WS_cube(X_i) and is given as:

$$WS_{cube}(X_i) = \sum_{k=1}^{m} S_{ik}^{3} \cdot \left( \frac{k}{m} \right)^{3} \qquad (5)$$

Here S_ik and m are as in (3), and S_ik is given by (4). The weight assigned to S_ik in (5) is (k/m)^3. The cube of the slopes has been chosen in (5) specifically to preserve the positive or negative sign of the weighted slopes, which squaring in (3) discards. We believe that the inclusion of the sign plays a significant role in computing the cumulative weighted slope of a time sequence.
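Reading (3) and (5) as sums over the m strips of (S_ik · k/m)^2 and (S_ik · k/m)^3, the parameters can be sketched in a few lines. This is a hypothetical illustration (the names `strip_slopes`, `ws_sq` and `ws_cube` are ours) that takes the m + 1 strip-boundary values of an already pre-processed sequence and the strip width Δt. The example contrasts a rising and a falling ramp: WS_sq cannot tell them apart, while WS_cube differs in sign; adding a constant baseline leaves both parameters unchanged, because slopes ignore constant offsets.

```python
def strip_slopes(values, dt):
    """Equation (4): slope S_k of each strip from consecutive boundary values."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def ws_sq(slopes):
    """Equation (3): sum over strips k = 1..m of (S_k * k/m)^2."""
    m = len(slopes)
    return sum((s * k / m) ** 2 for k, s in enumerate(slopes, start=1))


def ws_cube(slopes):
    """Equation (5): sum over strips of S_k^3 * (k/m)^3; the odd power keeps the sign."""
    m = len(slopes)
    return sum((s * k / m) ** 3 for k, s in enumerate(slopes, start=1))


rising = [0.0, 1.0, 2.0, 3.0, 4.0]    # boundary values of m = 4 strips
falling = list(reversed(rising))       # reverse shape
shifted = [v + 10.0 for v in rising]   # same shape, different baseline

for name, vals in [("rising", rising), ("falling", falling), ("shifted", shifted)]:
    s = strip_slopes(vals, dt=1.0)
    print(name, ws_sq(s), ws_cube(s))
# rising 1.875 1.5625
# falling 1.875 -1.5625
# shifted 1.875 1.5625
```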
In WS_cube(X_i), the cube of (k/m) is used together with the cube of S_ik to exaggerate the role of the location k of the strip for which S_ik is computed. In other words, the weights in (5) emphasize that a certain slope occurs at a certain point in the time sequence. Summing the weighted slopes over all strips of the time sequence gives the parameter in (5).

Clustering Time Series Data. We use the k-means clustering algorithm, outlined in Table 1, to group the cumulative weighted slope parameters. The iterations stop when the criterion function (1) converges.

The overall strategy of the proposed method is summarized as follows:

Data pre-processing
Step 1: Scale the data along the time axis, and scale the y-values correspondingly to avoid data distortion.
Step 2: Divide each time sequence into the same number of equal-width strips.

Feature extraction using cumulative weighted slopes
Step 3: Compute the parameter WS_sq(X_i) or WS_cube(X_i) to obtain the cumulative weighted slope of each time sequence being analyzed.
Clustering
Step 4: Cluster the parameters obtained in Step 3 using the k-means clustering algorithm.

Fig. 1. Division of the normalized time series into n equi-width strips, each of width Δt (time t on the horizontal axis; values y1, y2, ..., yn at times t1, t2, ..., tn)

4 Experimental Results

To demonstrate the effectiveness of our approach, we have conducted experiments with synthetic time series datasets designed specifically to illustrate the proposed feature extraction method. A variety of shapes and reverse shapes were used in the experiments; due to lack of space, only small subsets are shown here. The application of k-means clustering to real-life case data is dealt with in the next section.

The first sample dataset, A, is shown in Fig. 2 and comprises A1, A2, A3 and A4. The data are pre-processed as discussed in Section 3, scaling along the x-axis and correspondingly along the y-axis with t_d = 5 (which can be user-defined or domain-specific). Each member of dataset A is divided into 10 strips (again user-defined or domain-specific). The final pre-processed dataset is denoted AS and is shown in Fig. 3. The parameters WS_sq(X) and WS_cube(X) computed for the pre-processed dataset are given in Table 2. Clustering was done using the k-means algorithm explained in Section 2 with k = 2. The clusters obtained from parameter WS_sq(X) are shown in Table 3; applying k-means (k = 2) to parameter WS_cube(X) yields the same clusters.
In terms of dataset A, the first cluster consists of A1, A2 and A3, and the second consists of A4.

The next sample dataset, B, is shown in Fig. 4. It was pre-processed as discussed in Section 3 with t_d = 5, and each member was divided into 10 strips. The final pre-processed dataset is denoted BS and comprises B1S, B2S, B3S and B4S. The parameters WS_sq(X) and WS_cube(X) computed for BS are given in Table 4. Clustering is done using the k-means algorithm explained in Section 3 with k = 2.

Fig. 2. Time series dataset A
Fig. 3. Final pre-processed time series dataset A, denoted AS

Table 1. K-Means Algorithm
1. Choose the value of k.
2. Randomly select k objects as the cluster centers.
3. Assign each object to the cluster to which it is most similar (nearest), based on the mean value of that cluster.
4. Recalculate the k cluster centers.
5. Repeat steps 3 and 4 until the cluster centers stop moving.

Table 2. Cumulative weighted slope computations for dataset AS
Columns: Pre-processed sequence | Parameter WS_sq(X) | Parameter WS_cube(X)
Rows: A1S, A2S, A3S, A4S

Table 3. Results of k-means clustering applied to Table 2 (k = 2)
Cluster 1: A1S, A2S, A3S
Cluster 2: A4S

Fig. 4. Time series dataset B

Table 4. Cumulative weighted slope computations for dataset BS
Columns: Pre-processed sequence | Parameter WS_sq(X) | Parameter WS_cube(X)
Rows: B1S, B2S, B3S, B4S

Table 5. Results of k-means clustering applied to Table 4 (k = 2)
Cluster 1: B1S, B2S, B3S
Cluster 2: B4S

The clusters obtained from parameter WS_sq(X) are shown in Table 5; applying k-means (k = 2) to parameter WS_cube(X) yields the same clusters. In terms of dataset B, the first cluster consists of B1, B2 and B3, and the second consists of B4.

5 Case Study

The case study undertaken in this paper is a similarity analysis of retail sales data (in millions of dollars) collected monthly over a period of 11 years (01/1992 to 12/2002) from chain retail stores in the USA [16]. Each time sequence in the retail sales database consists of 132 data points (one per month for each type of business). We considered sales data for the types of retail business listed in Table 6 and studied them to analyze the sales patterns of different categories of products. Clustering the cumulative weighted slope parameters of the retail sales time sequences can help identify products that show similar sales patterns. Retailers can use this information to leverage their sales, design effective marketing strategies, use shelf space efficiently, forecast inventory requirements and so on.

Table 6. Businesses considered in the case study
1. Health and Personal Care Stores
2. Pharmacies and Drug Stores
3. Furniture Stores
4. Jewelry Stores
5. Sporting Goods, Hobby and Music Stores
6. Household Appliances Stores
7. Men's Clothing Stores
8. Women's Clothing Stores
9. Shoe Stores
10. New Car Dealers
11. Used Car Dealers

Table 7. Cumulative weighted slope computations for the case study
Columns: S. No. | Description | Parameter WS_sq(X)
Rows: 1. Health and Personal Care Stores; 2. Pharmacies and Drug Stores; 3. Sporting Goods, Hobby and Music Stores; 4. Furniture Stores; 5. Used Car Dealers; 6. Jewelry Stores; 7. Women's Clothing Stores; 8. Shoe Stores; 9. Household Appliances; 10. Men's Clothing Stores; 11. New Car Dealers

Table 8. Results of k-means clustering with k = 4
Cluster 1: Men's Clothing Stores, Shoe Stores, Used Car Dealers and Household Appliances Stores
Cluster 2: Furniture Stores, Women's Clothing Stores, Health and Personal Care Stores, and Pharmacies and Drug Stores
Cluster 3: Jewelry Stores, and Sporting Goods, Hobby and Music Stores
Cluster 4: New Car Dealers

The first step is data pre-processing, performed as outlined in Section 3. Thereafter, the cumulative weighted slope parameter WS_sq(X) was computed for each retail sales time sequence (132 data points each) as described in Section 3; the results are summarized in Table 7. The k-means clustering algorithm was then applied to find groups of products with similar customer buying patterns. The results of clustering with k = 4 are listed in Table 8. Products whose cumulative weighted slope parameters lie in the same cluster exhibit similar sales patterns. The parameter WS_cube(X) can be computed in the same way.

Table 8 shows that, over the 11 years from January 1992 to December 2002, sales at men's clothing stores, shoe stores, household appliances stores and used car dealers followed similar patterns. This suggests that stores selling men's clothing might also carry a shoe section or, if they already sell both, place shoes on shelves near the clothing section. Such arrangements offer customers convenience and may at the same time help boost sales of these items. Similarly, the information obtained by applying our approach to sales time series from retail stores can be applied to other lines of business to derive useful business strategies and rules.
6 Conclusions and Future Work

We have proposed a new and efficient technique for clustering time series data based on whole sequence clustering. The approach computes parameters representing the cumulative weighted slopes of the time sequences under observation and then clusters these parameters with the k-means algorithm. We assume that a time series is a sequence of real numbers representing the values of a measured parameter at equal intervals of time. The proposed approach works irrespective of global scaling or shrinking of the time sequences, and it can also handle different baselines. The case data considered in this study are 11 years of monthly sales data collected from retail chain stores in the USA from January 1992 to December 2002. Applying the proposed approach to sales time series from the retail industry can reveal similarities in the sales patterns of different items. This information may help boost sales and design effective marketing strategies, both in the traditional retail industry and in e-business.

In future work, we intend to derive association rules from retail time series data by searching for groups of clusters that frequently occur together. Alternative clustering algorithms may also be employed for grouping the retail time series data. We also intend to use a larger dataset that includes sales data for many more types of businesses and places.

References

[1] G. Das, K. Lin, H. Mannila, G. Renganathan, and P. Smyth, "Rule Discovery from Time Series," Proc. of the 4th Int'l Conference on Knowledge Discovery and Data Mining, New York, NY, Aug 27-31.
[2] P. Cotofrei and K. Stoffel, "Classification Rules + Time = Temporal Rules," Proc. of the 2002 Int'l Conference on Computational Science, Amsterdam, Netherlands, Apr 21-24.
[3] X. Jin, L. Wang, Y. Lu, and C. Shi, "Indexing and Mining of the Local Patterns in Sequence Databases," Proc. of the 3rd Int'l Conference on Intelligent Data Engineering and Automated Learning, Manchester, UK, Aug 12-14.
[4] E. Keogh and S. Kasetty, "On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration," Proc. of the 8th ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, July 23-26.
[5] N. Radhakrishnan, J. D. Wilson, and P. C. Loizou, "An Alternate Partitioning Technique to Quantify the Regularity of Complex Time Series," Int'l Journal of Bifurcation and Chaos, Vol. 10, No. 7, World Scientific Publishing.
[6] S. K. Harms, J. Deogun, and T. Tadesse, "Discovering Sequential Association Rules with Constraints and Time Lags in Multiple Sequences," Proc. of the 13th Int'l Symposium on Methodologies for Intelligent Systems, Lyon, France, June 27-29.
[7] C. Li, P. S. Yu, and V. Castelli, "MALM: A Framework for Mining Sequence Database at Multiple Abstraction Levels," Proc. of the 7th ACM CIKM Int'l Conference on Information and Knowledge Management, Bethesda, MD, Nov 3-7.
[8] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, CA.
[9] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data Clustering: A Survey," ACM Comput. Surv., Vol. 31.
[10] E. Keogh, J. Lin, and W. Truppel, "Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research," Proc. of the Int'l Conference on Data Mining, pp. 115, 2003.

[11] P. Cotofrei, "Statistical Temporal Rules," Proc. of the 15th Conference on Computational Statistics, Berlin, Germany, Aug 24-28.
[12] R. Agrawal, C. Faloutsos, and A. Swami, "Efficient Similarity Search in Sequence Databases," Proc. 4th Int'l Conf. Foundations of Data Organization and Algorithms, Chicago, Illinois, USA.
[13] D. Rafiei, "On Similarity-Based Queries for Time Series Data," Proc. 15th IEEE Int'l Conf. Data Engineering, Sydney, Australia, March.
[14] F. Korn, H. Jagadish, and C. Faloutsos, "Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences," Proc. ACM SIGMOD Int'l Conf. on Management of Data, Tucson, AZ, May.
[15] J. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations," Proc. of the 5th Berkeley Symposium on Mathematical Statistics and Probability.
[16] Economic Time Series Page.

Biography

Name: Durga Toshniwal
Address: Department of Electronics & Computer Engineering, Indian Institute of Technology Roorkee, India
Education & Work experience: The author received a Bachelor of Engineering degree from JMI, India, and a Master of Technology degree from NIT Kurukshetra, India. She is presently a Research Scholar at the Indian Institute of Technology Roorkee, India. Her areas of research interest are time series data mining and KDD.
E-mail: durgadec@iitr.ernet.in

Name: R. C. Joshi
Address: Department of Electronics & Computer Engineering, Indian Institute of Technology Roorkee, India
Education & Work experience: The author received a B.E. from Allahabad University, India, and an M.E. and Ph.D. from IIT Roorkee, India. He is presently a Professor at the Indian Institute of Technology Roorkee, India. His area of interest is databases.
E-mail: joshifcc@iitr.ernet.in


More information

The Fuzzy Search for Association Rules with Interestingness Measure

The Fuzzy Search for Association Rules with Interestingness Measure The Fuzzy Search for Association Rules with Interestingness Measure Phaichayon Kongchai, Nittaya Kerdprasop, and Kittisak Kerdprasop Abstract Association rule are important to retailers as a source of

More information

A Novel method for Frequent Pattern Mining

A Novel method for Frequent Pattern Mining A Novel method for Frequent Pattern Mining K.Rajeswari #1, Dr.V.Vaithiyanathan *2 # Associate Professor, PCCOE & Ph.D Research Scholar SASTRA University, Tanjore, India 1 raji.pccoe@gmail.com * Associate

More information

The Effect of Word Sampling on Document Clustering

The Effect of Word Sampling on Document Clustering The Effect of Word Sampling on Document Clustering OMAR H. KARAM AHMED M. HAMAD SHERIN M. MOUSSA Department of Information Systems Faculty of Computer and Information Sciences University of Ain Shams,

More information

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3

More information

H-D and Subspace Clustering of Paradoxical High Dimensional Clinical Datasets with Dimension Reduction Techniques A Model

H-D and Subspace Clustering of Paradoxical High Dimensional Clinical Datasets with Dimension Reduction Techniques A Model Indian Journal of Science and Technology, Vol 9(38), DOI: 10.17485/ijst/2016/v9i38/101792, October 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 H-D and Subspace Clustering of Paradoxical High

More information

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set To Enhance Scalability of Item Transactions by Parallel and Partition using Dynamic Data Set Priyanka Soni, Research Scholar (CSE), MTRI, Bhopal, priyanka.soni379@gmail.com Dhirendra Kumar Jha, MTRI, Bhopal,

More information

Data Mining Course Overview

Data Mining Course Overview Data Mining Course Overview 1 Data Mining Overview Understanding Data Classification: Decision Trees and Bayesian classifiers, ANN, SVM Association Rules Mining: APriori, FP-growth Clustering: Hierarchical

More information

Keywords: clustering algorithms, unsupervised learning, cluster validity

Keywords: clustering algorithms, unsupervised learning, cluster validity Volume 6, Issue 1, January 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Clustering Based

More information

TEMPORAL data mining is a research field of growing

TEMPORAL data mining is a research field of growing An Optimal Temporal and Feature Space Allocation in Supervised Data Mining S. Tom Au, Guangqin Ma, and Rensheng Wang, Abstract This paper presents an expository study of temporal data mining for prediction

More information

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University

More information

Enhancing K-means Clustering Algorithm with Improved Initial Center

Enhancing K-means Clustering Algorithm with Improved Initial Center Enhancing K-means Clustering Algorithm with Improved Initial Center Madhu Yedla #1, Srinivasa Rao Pathakota #2, T M Srinivasa #3 # Department of Computer Science and Engineering, National Institute of

More information

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach ABSTRACT G.Ravi Kumar 1 Dr.G.A. Ramachandra 2 G.Sunitha 3 1. Research Scholar, Department of Computer Science &Technology,

More information

A Novel Algorithm for Associative Classification

A Novel Algorithm for Associative Classification A Novel Algorithm for Associative Classification Gourab Kundu 1, Sirajum Munir 1, Md. Faizul Bari 1, Md. Monirul Islam 1, and K. Murase 2 1 Department of Computer Science and Engineering Bangladesh University

More information

Analyzing Outlier Detection Techniques with Hybrid Method

Analyzing Outlier Detection Techniques with Hybrid Method Analyzing Outlier Detection Techniques with Hybrid Method Shruti Aggarwal Assistant Professor Department of Computer Science and Engineering Sri Guru Granth Sahib World University. (SGGSWU) Fatehgarh Sahib,

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) Research on Applications of Data Mining in Electronic Commerce Xiuping YANG 1, a 1 Computer Science Department,

More information

Mine Blood Donors Information through Improved K- Means Clustering Bondu Venkateswarlu 1 and Prof G.S.V.Prasad Raju 2

Mine Blood Donors Information through Improved K- Means Clustering Bondu Venkateswarlu 1 and Prof G.S.V.Prasad Raju 2 Mine Blood Donors Information through Improved K- Means Clustering Bondu Venkateswarlu 1 and Prof G.S.V.Prasad Raju 2 1 Department of Computer Science and Systems Engineering, Andhra University, Visakhapatnam-

More information

An Overview of various methodologies used in Data set Preparation for Data mining Analysis

An Overview of various methodologies used in Data set Preparation for Data mining Analysis An Overview of various methodologies used in Data set Preparation for Data mining Analysis Arun P Kuttappan 1, P Saranya 2 1 M. E Student, Dept. of Computer Science and Engineering, Gnanamani College of

More information

Hierarchical Document Clustering

Hierarchical Document Clustering Hierarchical Document Clustering Benjamin C. M. Fung, Ke Wang, and Martin Ester, Simon Fraser University, Canada INTRODUCTION Document clustering is an automatic grouping of text documents into clusters

More information

Correlation Based Feature Selection with Irrelevant Feature Removal

Correlation Based Feature Selection with Irrelevant Feature Removal Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

Mining Association Rules in Temporal Document Collections

Mining Association Rules in Temporal Document Collections Mining Association Rules in Temporal Document Collections Kjetil Nørvåg, Trond Øivind Eriksen, and Kjell-Inge Skogstad Dept. of Computer and Information Science, NTNU 7491 Trondheim, Norway Abstract. In

More information

NORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM

NORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM NORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM Saroj 1, Ms. Kavita2 1 Student of Masters of Technology, 2 Assistant Professor Department of Computer Science and Engineering JCDM college

More information

K-Mean Clustering Algorithm Implemented To E-Banking

K-Mean Clustering Algorithm Implemented To E-Banking K-Mean Clustering Algorithm Implemented To E-Banking Kanika Bansal Banasthali University Anjali Bohra Banasthali University Abstract As the nations are connected to each other, so is the banking sector.

More information

EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES

EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES D.Kerana Hanirex Research Scholar Bharath University Dr.M.A.Dorai Rangaswamy Professor,Dept of IT, Easwari Engg.College Abstract

More information

An Enhanced K-Medoid Clustering Algorithm

An Enhanced K-Medoid Clustering Algorithm An Enhanced Clustering Algorithm Archna Kumari Science &Engineering kumara.archana14@gmail.com Pramod S. Nair Science &Engineering, pramodsnair@yahoo.com Sheetal Kumrawat Science &Engineering, sheetal2692@gmail.com

More information

Combining Distributed Memory and Shared Memory Parallelization for Data Mining Algorithms

Combining Distributed Memory and Shared Memory Parallelization for Data Mining Algorithms Combining Distributed Memory and Shared Memory Parallelization for Data Mining Algorithms Ruoming Jin Department of Computer and Information Sciences Ohio State University, Columbus OH 4321 jinr@cis.ohio-state.edu

More information

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH International Journal of Information Technology and Knowledge Management January-June 2011, Volume 4, No. 1, pp. 27-32 DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY)

More information

K-Means Clustering With Initial Centroids Based On Difference Operator

K-Means Clustering With Initial Centroids Based On Difference Operator K-Means Clustering With Initial Centroids Based On Difference Operator Satish Chaurasiya 1, Dr.Ratish Agrawal 2 M.Tech Student, School of Information and Technology, R.G.P.V, Bhopal, India Assistant Professor,

More information

Materialized Data Mining Views *

Materialized Data Mining Views * Materialized Data Mining Views * Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo 3a, 60-965 Poznan, Poland tel. +48 61

More information

Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Data mining - detailed outline. Problem

Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Data mining - detailed outline. Problem Faloutsos & Pavlo 15415/615 Carnegie Mellon Univ. Dept. of Computer Science 15415/615 DB Applications Lecture # 24: Data Warehousing / Data Mining (R&G, ch 25 and 26) Data mining detailed outline Problem

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

Efficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points

Efficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points Efficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points Dr. T. VELMURUGAN Associate professor, PG and Research Department of Computer Science, D.G.Vaishnav College, Chennai-600106,

More information

Hierarchical Online Mining for Associative Rules

Hierarchical Online Mining for Associative Rules Hierarchical Online Mining for Associative Rules Naresh Jotwani Dhirubhai Ambani Institute of Information & Communication Technology Gandhinagar 382009 INDIA naresh_jotwani@da-iict.org Abstract Mining

More information

Data mining - detailed outline. Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Problem.

Data mining - detailed outline. Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Problem. Faloutsos & Pavlo 15415/615 Carnegie Mellon Univ. Dept. of Computer Science 15415/615 DB Applications Data Warehousing / Data Mining (R&G, ch 25 and 26) C. Faloutsos and A. Pavlo Data mining detailed outline

More information

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining P.Subhashini 1, Dr.G.Gunasekaran 2 Research Scholar, Dept. of Information Technology, St.Peter s University,

More information

Conceptual Review of clustering techniques in data mining field

Conceptual Review of clustering techniques in data mining field Conceptual Review of clustering techniques in data mining field Divya Shree ABSTRACT The marvelous amount of data produced nowadays in various application domains such as molecular biology or geography

More information

An Efficient Algorithm for finding high utility itemsets from online sell

An Efficient Algorithm for finding high utility itemsets from online sell An Efficient Algorithm for finding high utility itemsets from online sell Sarode Nutan S, Kothavle Suhas R 1 Department of Computer Engineering, ICOER, Maharashtra, India 2 Department of Computer Engineering,

More information

International Journal of Advance Engineering and Research Development. A Survey on Data Mining Methods and its Applications

International Journal of Advance Engineering and Research Development. A Survey on Data Mining Methods and its Applications Scientific Journal of Impact Factor (SJIF): 4.72 International Journal of Advance Engineering and Research Development Volume 5, Issue 01, January -2018 e-issn (O): 2348-4470 p-issn (P): 2348-6406 A Survey

More information

CS490D: Introduction to Data Mining Prof. Chris Clifton

CS490D: Introduction to Data Mining Prof. Chris Clifton CS490D: Introduction to Data Mining Prof. Chris Clifton April 5, 2004 Mining of Time Series Data Time-series database Mining Time-Series and Sequence Data Consists of sequences of values or events changing

More information

A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set

A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set Renu Vashist School of Computer Science and Engineering Shri Mata Vaishno Devi University, Katra,

More information

Mining of Web Server Logs using Extended Apriori Algorithm

Mining of Web Server Logs using Extended Apriori Algorithm International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

KEYWORDS: Clustering, RFPCM Algorithm, Ranking Method, Query Redirection Method.

KEYWORDS: Clustering, RFPCM Algorithm, Ranking Method, Query Redirection Method. IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY IMPROVED ROUGH FUZZY POSSIBILISTIC C-MEANS (RFPCM) CLUSTERING ALGORITHM FOR MARKET DATA T.Buvana*, Dr.P.krishnakumari *Research

More information

Discovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials *

Discovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials * Discovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials * Galina Bogdanova, Tsvetanka Georgieva Abstract: Association rules mining is one kind of data mining techniques

More information

PTclose: A novel algorithm for generation of closed frequent itemsets from dense and sparse datasets

PTclose: A novel algorithm for generation of closed frequent itemsets from dense and sparse datasets : A novel algorithm for generation of closed frequent itemsets from dense and sparse datasets J. Tahmores Nezhad ℵ, M.H.Sadreddini Abstract In recent years, various algorithms for mining closed frequent

More information

International Journal of Advance Research in Computer Science and Management Studies

International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 11, November 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 4, April 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Discovering Knowledge

More information

Mining Frequent Itemsets for data streams over Weighted Sliding Windows

Mining Frequent Itemsets for data streams over Weighted Sliding Windows Mining Frequent Itemsets for data streams over Weighted Sliding Windows Pauray S.M. Tsai Yao-Ming Chen Department of Computer Science and Information Engineering Minghsin University of Science and Technology

More information

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at Performance Evaluation of Ensemble Method Based Outlier Detection Algorithm Priya. M 1, M. Karthikeyan 2 Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu,

More information

Preparation of Data Set for Data Mining Analysis using Horizontal Aggregation in SQL

Preparation of Data Set for Data Mining Analysis using Horizontal Aggregation in SQL Preparation of Data Set for Data Mining Analysis using Horizontal Aggregation in SQL Vidya Bodhe P.G. Student /Department of CE KKWIEER Nasik, University of Pune, India vidya.jambhulkar@gmail.com Abstract

More information

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE Saravanan.Suba Assistant Professor of Computer Science Kamarajar Government Art & Science College Surandai, TN, India-627859 Email:saravanansuba@rediffmail.com

More information

Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images

Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images Sensors & Transducers 04 by IFSA Publishing, S. L. http://www.sensorsportal.com Cluster Validity ification Approaches Based on Geometric Probability and Application in the ification of Remotely Sensed

More information

An Efficient Clustering for Crime Analysis

An Efficient Clustering for Crime Analysis An Efficient Clustering for Crime Analysis Malarvizhi S 1, Siddique Ibrahim 2 1 UG Scholar, Department of Computer Science and Engineering, Kumaraguru College Of Technology, Coimbatore, Tamilnadu, India

More information

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

International Journal of Modern Trends in Engineering and Research   e-issn No.: , Date: 2-4 July, 2015 International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Privacy Preservation Data Mining Using GSlicing Approach Mr. Ghanshyam P. Dhomse

More information

Detection and Deletion of Outliers from Large Datasets

Detection and Deletion of Outliers from Large Datasets Detection and Deletion of Outliers from Large Datasets Nithya.Jayaprakash 1, Ms. Caroline Mary 2 M. tech Student, Dept of Computer Science, Mohandas College of Engineering and Technology, India 1 Assistant

More information

A Hierarchical Document Clustering Approach with Frequent Itemsets

A Hierarchical Document Clustering Approach with Frequent Itemsets A Hierarchical Document Clustering Approach with Frequent Itemsets Cheng-Jhe Lee, Chiun-Chieh Hsu, and Da-Ren Chen Abstract In order to effectively retrieve required information from the large amount of

More information

A Review on Cluster Based Approach in Data Mining

A Review on Cluster Based Approach in Data Mining A Review on Cluster Based Approach in Data Mining M. Vijaya Maheswari PhD Research Scholar, Department of Computer Science Karpagam University Coimbatore, Tamilnadu,India Dr T. Christopher Assistant professor,

More information

Classifying Documents by Distributed P2P Clustering

Classifying Documents by Distributed P2P Clustering Classifying Documents by Distributed P2P Clustering Martin Eisenhardt Wolfgang Müller Andreas Henrich Chair of Applied Computer Science I University of Bayreuth, Germany {eisenhardt mueller2 henrich}@uni-bayreuth.de

More information

Results and Discussions on Transaction Splitting Technique for Mining Differential Private Frequent Itemsets

Results and Discussions on Transaction Splitting Technique for Mining Differential Private Frequent Itemsets Results and Discussions on Transaction Splitting Technique for Mining Differential Private Frequent Itemsets Sheetal K. Labade Computer Engineering Dept., JSCOE, Hadapsar Pune, India Srinivasa Narasimha

More information

The Comparison of CBA Algorithm and CBS Algorithm for Meteorological Data Classification Mohammad Iqbal, Imam Mukhlash, Hanim Maria Astuti

The Comparison of CBA Algorithm and CBS Algorithm for Meteorological Data Classification Mohammad Iqbal, Imam Mukhlash, Hanim Maria Astuti Information Systems International Conference (ISICO), 2 4 December 2013 The Comparison of CBA Algorithm and CBS Algorithm for Meteorological Data Classification Mohammad Iqbal, Imam Mukhlash, Hanim Maria

More information

SK International Journal of Multidisciplinary Research Hub Research Article / Survey Paper / Case Study Published By: SK Publisher

SK International Journal of Multidisciplinary Research Hub Research Article / Survey Paper / Case Study Published By: SK Publisher ISSN: 2394 3122 (Online) Volume 2, Issue 1, January 2015 Research Article / Survey Paper / Case Study Published By: SK Publisher P. Elamathi 1 M.Phil. Full Time Research Scholar Vivekanandha College of

More information

Appropriate Item Partition for Improving the Mining Performance

Appropriate Item Partition for Improving the Mining Performance Appropriate Item Partition for Improving the Mining Performance Tzung-Pei Hong 1,2, Jheng-Nan Huang 1, Kawuu W. Lin 3 and Wen-Yang Lin 1 1 Department of Computer Science and Information Engineering National

More information

Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati

Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati Analytical Representation on Secure Mining in Horizontally Distributed Database Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering

More information

A Graph-Based Approach for Mining Closed Large Itemsets

A Graph-Based Approach for Mining Closed Large Itemsets A Graph-Based Approach for Mining Closed Large Itemsets Lee-Wen Huang Dept. of Computer Science and Engineering National Sun Yat-Sen University huanglw@gmail.com Ye-In Chang Dept. of Computer Science and

More information

ETP-Mine: An Efficient Method for Mining Transitional Patterns

ETP-Mine: An Efficient Method for Mining Transitional Patterns ETP-Mine: An Efficient Method for Mining Transitional Patterns B. Kiran Kumar 1 and A. Bhaskar 2 1 Department of M.C.A., Kakatiya Institute of Technology & Science, A.P. INDIA. kirankumar.bejjanki@gmail.com

More information

XML Clustering by Bit Vector

XML Clustering by Bit Vector XML Clustering by Bit Vector WOOSAENG KIM Department of Computer Science Kwangwoon University 26 Kwangwoon St. Nowongu, Seoul KOREA kwsrain@kw.ac.kr Abstract: - XML is increasingly important in data exchange

More information

COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES

COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES USING DIFFERENT DATASETS V. Vaithiyanathan 1, K. Rajeswari 2, Kapil Tajane 3, Rahul Pitale 3 1 Associate Dean Research, CTS Chair Professor, SASTRA University,

More information

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Improving the Efficiency of Fast Using Semantic Similarity Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year

More information

A FUZZY BASED APPROACH FOR PRIVACY PRESERVING CLUSTERING

A FUZZY BASED APPROACH FOR PRIVACY PRESERVING CLUSTERING A FUZZY BASED APPROACH FOR PRIVACY PRESERVING CLUSTERING 1 B.KARTHIKEYAN, 2 G.MANIKANDAN, 3 V.VAITHIYANATHAN 1 Assistant Professor, School of Computing, SASTRA University, TamilNadu, India. 2 Assistant

More information

Product presentations can be more intelligently planned

Product presentations can be more intelligently planned Association Rules Lecture /DMBI/IKI8303T/MTI/UI Yudho Giri Sucahyo, Ph.D, CISA (yudho@cs.ui.ac.id) Faculty of Computer Science, Objectives Introduction What is Association Mining? Mining Association Rules

More information

Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm

Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm Expert Systems: Final (Research Paper) Project Daniel Josiah-Akintonde December

More information

AN EFFICIENT GRADUAL PRUNING TECHNIQUE FOR UTILITY MINING. Received April 2011; revised October 2011

AN EFFICIENT GRADUAL PRUNING TECHNIQUE FOR UTILITY MINING. Received April 2011; revised October 2011 International Journal of Innovative Computing, Information and Control ICIC International c 2012 ISSN 1349-4198 Volume 8, Number 7(B), July 2012 pp. 5165 5178 AN EFFICIENT GRADUAL PRUNING TECHNIQUE FOR

More information

Combined Intra-Inter transaction based approach for mining Association among the Sectors in Indian Stock Market

Combined Intra-Inter transaction based approach for mining Association among the Sectors in Indian Stock Market Ranjeetsingh BParihar et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol 3 (3), 01,3895-3899 Combined Intra-Inter transaction based approach for mining Association

More information

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset M.Hamsathvani 1, D.Rajeswari 2 M.E, R.Kalaiselvi 3 1 PG Scholar(M.E), Angel College of Engineering and Technology, Tiruppur,

More information

Collaborative Rough Clustering

Collaborative Rough Clustering Collaborative Rough Clustering Sushmita Mitra, Haider Banka, and Witold Pedrycz Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India {sushmita, hbanka r}@isical.ac.in Dept. of Electrical

More information

CERIAS Tech Report Multiple and Partial Periodicity Mining in Time Series Databases by Mikhail J. Atallah Center for Education and Research

CERIAS Tech Report Multiple and Partial Periodicity Mining in Time Series Databases by Mikhail J. Atallah Center for Education and Research CERIAS Tech Report - Multiple and Partial Periodicity Mining in Time Series Databases by Mikhail J. Atallah Center for Education and Research Information Assurance and Security Purdue University, West

More information

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms.

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms. Volume 3, Issue 5, May 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Survey of Clustering

More information

AN IMPROVED K-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION

AN IMPROVED K-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION AN IMPROVED K-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION WILLIAM ROBSON SCHWARTZ University of Maryland, Department of Computer Science College Park, MD, USA, 20742-327, schwartz@cs.umd.edu RICARDO

More information

IMPLEMENTATION OF CLASSIFICATION ALGORITHMS USING WEKA NAÏVE BAYES CLASSIFIER

IMPLEMENTATION OF CLASSIFICATION ALGORITHMS USING WEKA NAÏVE BAYES CLASSIFIER IMPLEMENTATION OF CLASSIFICATION ALGORITHMS USING WEKA NAÏVE BAYES CLASSIFIER N. Suresh Kumar, Dr. M. Thangamani 1 Assistant Professor, Sri Ramakrishna Engineering College, Coimbatore, India 2 Assistant

More information

IMPROVING APRIORI ALGORITHM USING PAFI AND TDFI

IMPROVING APRIORI ALGORITHM USING PAFI AND TDFI IMPROVING APRIORI ALGORITHM USING PAFI AND TDFI Manali Patekar 1, Chirag Pujari 2, Juee Save 3 1,2,3 Computer Engineering, St. John College of Engineering And Technology, Palghar Mumbai, (India) ABSTRACT

More information

Striped Grid Files: An Alternative for Highdimensional

Striped Grid Files: An Alternative for Highdimensional Striped Grid Files: An Alternative for Highdimensional Indexing Thanet Praneenararat 1, Vorapong Suppakitpaisarn 2, Sunchai Pitakchonlasap 1, and Jaruloj Chongstitvatana 1 Department of Mathematics 1,

More information

Horizontal Aggregations in SQL to Prepare Data Sets Using PIVOT Operator

Horizontal Aggregations in SQL to Prepare Data Sets Using PIVOT Operator Horizontal Aggregations in SQL to Prepare Data Sets Using PIVOT Operator R.Saravanan 1, J.Sivapriya 2, M.Shahidha 3 1 Assisstant Professor, Department of IT,SMVEC, Puducherry, India 2,3 UG student, Department

More information

Datasets Size: Effect on Clustering Results

Datasets Size: Effect on Clustering Results 1 Datasets Size: Effect on Clustering Results Adeleke Ajiboye 1, Ruzaini Abdullah Arshah 2, Hongwu Qin 3 Faculty of Computer Systems and Software Engineering Universiti Malaysia Pahang 1 {ajibraheem@live.com}

More information

FUFM-High Utility Itemsets in Transactional Database

FUFM-High Utility Itemsets in Transactional Database Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 3, March 2014,

More information