Advances in data stream mining


Mohamed Medhat Gaber

Mining data streams has been a focal point of research interest over the past decade. Hardware and software advances have contributed to the significance of this area of research by enabling faster-than-ever data generation. This rapidly generated data has been termed data streams. Credit card transactions, Google searches, phone calls in a city, and many others are typical data streams. In many important applications, it is essential to analyze this streaming data in real time. Traditional data mining techniques have fallen short in addressing the needs of data stream mining. Randomization, approximation, and adaptation have been used extensively in developing new techniques or adapting existing ones to enable them to operate in a streaming environment. This paper reviews key milestones and the state of the art in the data stream mining area. Future insights are also presented. © 2011 Wiley Periodicals, Inc.

How to cite this article: WIREs Data Mining Knowl Discov 2012, 2: doi: /widm.52

INTRODUCTION

A data stream is defined as a sequence of data instances generated at such high speed that it challenges our computational systems to store, process, and reason about it [1,2]. However, streaming data, if analyzed, is an important source of knowledge that enables us to take extremely important decisions in real time. Over the last decade, the area has attracted the attention of the data mining community to develop new techniques, or adapt existing ones, aiming to realize the many important applications of data stream mining. Business, scientific, and security applications have been discussed extensively in the literature [3,4]. The last decade has witnessed active research in data stream mining. Hundreds of techniques have been proposed to address the research issues of analyzing rapidly arriving data streams in real time.
Out of this large body of literature, we can identify four categories that have contributed to shaping this area of research. Other categories of techniques can be identified; for example, a large body of one-pass techniques exists in the data stream mining literature [1]. However, the impact of the following four categories has been widely recognized. The first three categories represent approaches to building learning algorithms. The last category, on the other hand, contributes a generic controller that could be used on top of any stream mining algorithm:

1. Two-phase techniques
2. Hoeffding bound-based techniques
3. Symbolic approximation-based techniques
4. Granularity-based techniques

This paper discusses the main principle behind each of the above categories and how this principle has been applied to different techniques. This discussion is followed by a presentation of new directions in the area. Finally, future insights are given.

Correspondence to: mohamed.gaber@port.ac.uk
School of Computing, University of Portsmouth, Portsmouth, Hampshire, UK
DOI: /widm.52
Volume 2, January/February

NOTABLE TECHNIQUES IN DATA STREAM MINING

This section discusses the four categories of data stream mining techniques listed in the introductory section.

Two-Phase Techniques

The two-phase techniques were introduced by Aggarwal et al. [5]. The general idea of this category is to maintain an online summary of the data using what have been termed microclusters. Microclustering extends the data structure that Zhang et al. [6] proposed when developing balanced iterative reducing and clustering using hierarchies (BIRCH).

wires.wiley.com/widm

The maintenance of the online microclusters is followed by a second phase that is carried out offline. This second phase differs from one technique to another according to whether the ultimate objective is to run a supervised or an unsupervised technique.

On the basis of the two-phase strategy, a framework for clustering data streams termed CluStream has been proposed [5]. The technique divides the clustering process into two components: online and offline. The online component stores summary statistics about the data streams, and the offline component clusters the summarized data according to user preferences such as the time frame and the number of clusters.

In an important milestone for the two-phase techniques, Aggarwal et al. [7] proposed an extension to CluStream termed HPStream, a projected clustering technique for high-dimensional data streams. HPStream has outperformed CluStream in a number of case studies. The main motivation behind HPStream is that CluStream did not perform effectively on high-dimensional streaming data.

Aggarwal et al. [8] adopted the idea of microclusters introduced in CluStream [5] for on-demand classification. As described earlier, CluStream divides the clustering process into online and offline components. On-demand classification [8,9] uses clustering results to classify data using statistics of the class distribution. The main motivation behind the technique is that the classification model should be built over a time period chosen according to the application. The technique maintains microclusters for each class in the data stream. This initialization is followed by nearest neighbor classification of the unlabeled data. The key to the technique is the subtractive property of microclusters, which enables the extraction of the needed microclusters over the required time period.
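The subtractive property mentioned above can be illustrated with a minimal sketch. It assumes a microcluster is reduced to an additive summary (count, per-dimension linear sum, per-dimension squared sum) in the spirit of the BIRCH data structure; the timestamp statistics that a full microcluster would also carry are omitted, and the class and method names are hypothetical rather than taken from the cited papers.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Additive summary of a set of points (hypothetical, simplified form)."""
    n: int
    linear_sum: List[float]
    squared_sum: List[float]

    @classmethod
    def empty(cls, dim: int) -> "MicroCluster":
        return cls(0, [0.0] * dim, [0.0] * dim)

    def insert(self, point: Sequence[float]) -> None:
        # Online phase: absorb one record in O(dim) time and memory.
        self.n += 1
        for k, x in enumerate(point):
            self.linear_sum[k] += x
            self.squared_sum[k] += x * x

    def snapshot(self) -> "MicroCluster":
        # Copy of the summary as it stands now, to be stored for later use.
        return MicroCluster(self.n, list(self.linear_sum), list(self.squared_sum))

    def subtract(self, older: "MicroCluster") -> "MicroCluster":
        # Subtractive property: current summary minus an older snapshot
        # summarizes exactly the records that arrived in between.
        return MicroCluster(
            self.n - older.n,
            [a - b for a, b in zip(self.linear_sum, older.linear_sum)],
            [a - b for a, b in zip(self.squared_sum, older.squared_sum)],
        )

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def radius(self) -> float:
        # Root of the summed per-dimension variances, from the sums alone.
        var = sum(
            max(0.0, ss / self.n - (ls / self.n) ** 2)
            for ls, ss in zip(self.linear_sum, self.squared_sum)
        )
        return math.sqrt(var)


mc = MicroCluster.empty(dim=2)
for p in [(1.0, 1.0), (3.0, 3.0)]:
    mc.insert(p)
earlier = mc.snapshot()            # summary at the start of the time period
for p in [(10.0, 10.0), (20.0, 20.0)]:
    mc.insert(p)
recent = mc.subtract(earlier)      # summary of the most recent period only
```

Because every field is a plain sum, the summary for any time horizon is obtained by subtracting two stored snapshots, without revisiting the stream.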
Hoeffding Bound-Based Techniques

Domingos and Hulten [10,11] proposed a generic strategy for scaling up machine learning algorithms termed very fast machine learning (VFML). This strategy depends on determining an upper bound on the learner's accuracy loss as a function of the number of examples (data records) used in each step of the algorithm. The Hoeffding bound [12] has been the key to the development of the VFML techniques; hence, we refer to this group as Hoeffding bound-based techniques. The bound states that, with probability $1 - \delta$, the true mean $r$ is at least $\bar{r} - \epsilon$, where $\bar{r}$ is the estimated mean and $\epsilon$ is expressed as:

$$\epsilon = \sqrt{\frac{R^{2} \ln(1/\delta)}{2n}},$$

where $R$ is the range of the estimated quantity and $n$ is the number of points.

This generic method has been applied to an extension of the traditional K-means clustering algorithm, VFKM, and to decision tree classification, very fast decision trees (VFDT). Unlike K-median, which has been used extensively in data stream clustering, the K-means algorithm computes each cluster center as the mean of the data records assigned to that cluster. VFKM [13] uses the Hoeffding bound to determine the number of examples needed in each step of the K-means algorithm. VFKM runs as a sequence of K-means executions, with each run using more data records than the previous one, until the calculated Hoeffding bound is satisfied.

Domingos and Hulten [10,11] developed VFDT, a decision tree learning system based on Hoeffding trees. It splits a node on the current best attribute once the number of examples seen satisfies the Hoeffding bound. VFDT is an extended version of the Hoeffding tree algorithm that addresses the following research issues of data streams:

- Ties of attributes: these occur when two or more attributes have close values of the splitting criterion, such as information gain.
- High-speed nature of data streams: this is an inherent feature of data streams.
- Bounded memory: the tree can grow until the algorithm runs out of memory.
- Accuracy of the output: this is an issue in all data stream mining algorithms.

VFDT extends Hoeffding trees using the following techniques:

- Ties of attributes are resolved using a user-specified threshold of acceptable error for the output. This reduces the running time of the algorithm and removes the risk of it running indefinitely.
- The high-speed nature of the stream is addressed using batch processing. The splitting criteria are computed per batch rather than per record, which significantly reduces the time spent recalculating the criteria for all the attributes with each incoming record of the stream.
- Bounded memory is addressed by deactivating the least promising leaves and ignoring poor attributes. Poor attributes are identified through the difference between the splitting criteria of the highest- and lowest-ranked attributes. If the difference is greater than a prespecified value, the attribute with the lowest splitting measure is removed from memory, saving the memory of the data stream computing environment.
- The accuracy of the output is improved by using multiple scans over the data stream when the data rate is low, and by initializing the tree with a decision tree built by a different, more accurate technique.

All of the above improvements have been tested using synthetic data sets, and the experiments have demonstrated their efficiency.

VFDT was extended by Hulten et al. [11] to address the problem of concept drift in evolving data streams. The new framework is termed CVFDT. It essentially runs VFDT over fixed sliding windows in order to keep the classifier up to date. A change is detected when the splitting criteria change significantly across the input attributes. It is also worth pointing out the work by Masud et al. [14], which tackles concept drift with emerging classes.

WIREs Data Mining and Knowledge Discovery

SAX-Based Techniques

Symbolic Aggregate approXimation (SAX) is a time series representation introduced by Keogh and his colleagues [15]. SAX has proved to be a state-of-the-art technique in time series representation. Time series data is a typical streaming source with a temporal dimension. In addition to being used in traditional data mining tasks such as clustering, classification, and indexing, SAX has achieved important breakthroughs in finding the most different subsequence in a time series, termed a discord [16], and the most frequent subsequence in a time series, termed a motif.
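As a brief aside on the Hoeffding bound from the preceding section, the sketch below computes $\epsilon$ for given $R$, $\delta$, and $n$, and applies it to a split decision of the kind VFDT makes. It is an illustration under stated assumptions, not the VFDT implementation: the function names and the `tie_threshold` parameter are hypothetical, and the tie rule (split anyway once $\epsilon$ falls below the threshold) is one reading of the user-specified threshold described above.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Decide whether n examples are enough to split on the best attribute.

    Hypothetical helper: split when the observed gap between the two best
    attributes exceeds epsilon, or when epsilon is already so small that
    the attributes are treated as tied.
    """
    eps = hoeffding_bound(value_range, delta, n)
    if best_gain - second_gain > eps:
        return True
    return eps < tie_threshold


# The bound shrinks as more records are seen, so a gap that is
# inconclusive after 1,000 records may become conclusive later.
eps_small_sample = hoeffding_bound(1.0, 1e-7, 1_000)
eps_large_sample = hoeffding_bound(1.0, 1e-7, 100_000)
```

Note that $\epsilon$ depends only on $R$, $\delta$, and $n$, not on the data distribution, which is what allows the decision to be made after a single pass over a prefix of the stream.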
Motif discovery is described in Ref 17. Numerous applications have used the SAX representation with notable success; some examples follow. It has been reported [16] that a premature ventricular contraction could be accurately identified using discord detection in an electrocardiogram (ECG) time series. Li and Nallela [18] used motif discovery with the SAX representation to find patterns in water level data.

SAX follows three major steps in converting a time series from its numerical form to its symbolic form. The first step is Piecewise Aggregate Approximation (PAA), which reduces a time series $C$ of length $n$ to an arbitrary length $w$ using the following equation:

$$\bar{C}_i = \frac{w}{n} \sum_{j=\frac{n}{w}(i-1)+1}^{\frac{n}{w} i} C_j,$$

where $\bar{C}_i$ is the $i$th point in the approximated time series. The second step is symbolic discretization. Breakpoints are set so that they delimit equal areas under the Gaussian distribution curve. Each breakpoint marks the step from one letter to the next when the values produced by PAA are replaced with their symbols. The final step uses a distance measure between each pair of characters, stored in a lookup table, to compute the accumulated distance between any two subsequences of the time series.

Granularity-Based Techniques

The granularity-based approach was introduced by Gaber et al. Having noted that stream mining techniques may fall short when running on resource-constrained devices such as smart phones and sensor nodes, the granularity-based approach adapts the mining techniques so that their resource consumption patterns change over time according to the availability of resources. Resource consumption patterns represent the change in resource consumption over a period of time, which is termed a time frame.
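Returning briefly to SAX, the three steps described in the SAX section can be sketched as follows. This is a minimal illustration under several assumptions: the series is z-normalized before PAA (a step not stated in the text above), $w$ divides $n$ exactly, and the symbol distance is computed from the Gaussian breakpoints rather than read from a precomputed lookup table. All function names and the default alphabet are hypothetical.

```python
import bisect
import math
from statistics import NormalDist, mean, pstdev
from typing import List, Sequence


def znormalize(series: Sequence[float]) -> List[float]:
    # Assumed preprocessing: zero mean, unit variance.
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series: Sequence[float], w: int) -> List[float]:
    # Step 1: mean of each of the w equal-length segments.
    n = len(series)
    if w <= 0 or n % w != 0:
        raise ValueError("this sketch assumes w divides the series length")
    seg = n // w
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(w)]


def breakpoints(alphabet_size: int) -> List[float]:
    # Step 2a: cut points giving equal areas under the standard Gaussian.
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax_word(series: Sequence[float], w: int, alphabet: str = "abcd") -> str:
    # Step 2b: map each PAA value to the letter of the interval it falls in.
    cuts = breakpoints(len(alphabet))
    reduced = paa(znormalize(series), w)
    return "".join(alphabet[bisect.bisect_left(cuts, v)] for v in reduced)


def symbol_distance(a: str, b: str, alphabet: str = "abcd") -> float:
    # Step 3a: adjacent or equal letters are at distance zero; otherwise
    # the distance is the gap between the breakpoints separating them.
    cuts = breakpoints(len(alphabet))
    i, j = sorted((alphabet.index(a), alphabet.index(b)))
    return 0.0 if j - i <= 1 else cuts[j - 1] - cuts[i]


def mindist(word_a: str, word_b: str, n: int, alphabet: str = "abcd") -> float:
    # Step 3b: accumulated distance between two words of equal length w
    # that each represent a subsequence of original length n.
    w = len(word_a)
    total = sum(symbol_distance(a, b, alphabet) ** 2
                for a, b in zip(word_a, word_b))
    return math.sqrt(n / w) * math.sqrt(total)


word = sax_word([1, 1, 1, 1, 9, 9, 9, 9], w=2, alphabet="ab")
```

In a real deployment the pairwise symbol distances would typically be computed once per alphabet and cached, which is the role of the lookup table mentioned in the text.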
The algorithm granularity settings are the input, output, and processing settings of a mining algorithm that can vary over time to cope with the availability of resources and the current data stream arrival rate. The settings are defined as follows:

- Algorithm input granularity (AIG): AIG is the process of changing the data stream arrival rate that feeds the algorithm. Techniques that could be used include sampling, load shedding, and creating data synopses. Sampling has been the technique used in developing the granularity-based data mining techniques.
- Algorithm output granularity (AOG): AOG is the process of changing the output size of the algorithm in order to preserve the limited memory space. We measure this output as the number of knowledge structures, for example, the number of clusters or rules.
- Algorithm processing granularity (APG): APG is the process of changing the algorithm parameters in order to consume less processing power. Randomization and approximation techniques are the strategies of APG.

It should be noted that there is a collective interaction among the above three settings. AIG mainly affects the data rate and is associated with bandwidth and battery consumption. AOG, on the other hand, is associated with memory, and APG with processing power. However, a change in any of them affects the other resources. The process of enabling resource awareness should be very lightweight in order to be feasible in a streaming environment characterized by scarce resources. Accordingly, the algorithm granularity settings only consider direct interactions.

Algorithm granularity requires continuous monitoring of the computational resources. This is done over fixed time intervals (frames) that we denote as TF. According to this periodic resource monitoring, the mining algorithm changes its parameters to cope with the current resource consumption patterns. These parameters are the AIG, APG, and AOG settings discussed above. Setting the value of TF is critical to the success of the running technique: the higher TF is, the lower the adaptation overhead will be, but at the risk of high resource consumption during the long time frame.

The use of algorithm granularity as a general approach for mining data streams requires some formal definitions and notation.
The following definitions are used in our discussion:

- $R$: set of computational resources, $R = \{r_1, r_2, \ldots, r_n\}$;
- $TF$: time interval for resource monitoring and adaptation;
- $ALT$: application lifetime;
- $ALT'$: time left until the end of the application lifetime;
- $NoF(r_i)$: number of time frames needed to consume the resource $r_i$, assuming that the consumption pattern of $r_i$ follows that of the last time frame;
- $AGP(r_i)$: algorithm granularity parameter that affects the resource $r_i$.

According to the above, the main rule of the algorithm granularity approach is as follows:

IF $\frac{ALT'}{TF} > NoF(r_i)$ THEN SET $AGP(r_i)^{-}$ ELSE SET $AGP(r_i)^{+}$

where $AGP(r_i)^{+}$ achieves higher accuracy at the expense of higher consumption of the resource $r_i$, and $AGP(r_i)^{-}$ achieves lower accuracy in exchange for lower consumption of $r_i$.

This simplified rule could take different forms according to the monitored resource and the algorithm granularity parameter applied to control its consumption. Interested readers are referred to Ref 19 for an application of the above rule in controlling a data stream clustering algorithm termed RA-Cluster.

Practitioners can use the following procedure to enable resource awareness and adaptation for their data stream mining algorithms:

1. Identify the set of resources to which the mining algorithm will adapt ($R$);
2. Set the application lifetime ($ALT$) and the time interval/frame ($TF$);
3. Define $AGP(r_i)^{+}$ and $AGP(r_i)^{-}$ for every $r_i \in R$;
4. Run the algorithm for $TF$;
5. Monitor the resource consumption for every $r_i \in R$;
6. Apply $AGP(r_i)^{+}$ or $AGP(r_i)^{-}$ to every $r_i \in R$ according to the ratio $\frac{ALT'}{TF} : NoF(r_i)$ and the rule given;
7. Repeat the last three steps.

Applying the above procedure is all that is needed to enable resource awareness and adaptation, using the algorithm granularity approach, in stream mining algorithms.
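The rule and steps 4 to 6 of the procedure can be sketched as a single adaptation decision per time frame. This is a minimal illustration, not the RA-Cluster controller: the `ResourceState` type, the function names, and the string labels for the two settings are hypothetical, and $NoF(r_i)$ is estimated as remaining capacity divided by the consumption observed in the last frame, following the definition above.

```python
import math
from dataclasses import dataclass


@dataclass
class ResourceState:
    """Hypothetical snapshot of one resource r_i at the end of a time frame."""
    remaining: float        # units still available (e.g. battery, memory)
    used_last_frame: float  # units consumed during the last time frame


def frames_to_exhaust(state: ResourceState) -> float:
    # NoF(r_i): frames until r_i is consumed if the last frame's
    # consumption pattern repeats.
    if state.used_last_frame <= 0:
        return math.inf
    return state.remaining / state.used_last_frame


def choose_setting(alt_left: float, tf: float, state: ResourceState) -> str:
    # Main rule: IF ALT'/TF > NoF(r_i) THEN AGP(r_i)^- ELSE AGP(r_i)^+.
    frames_left = alt_left / tf
    if frames_left > frames_to_exhaust(state):
        return "AGP-"   # resource would run out early: trade accuracy for savings
    return "AGP+"       # resource outlasts the application: allow higher accuracy


# Ten one-minute frames remain in the application lifetime.
scarce = choose_setting(600.0, 60.0, ResourceState(remaining=40.0, used_last_frame=10.0))
plentiful = choose_setting(600.0, 60.0, ResourceState(remaining=400.0, used_last_frame=10.0))
```

What `AGP-` and `AGP+` actually do depends on the algorithm being controlled, for example lowering or raising a sampling rate for AIG or a cluster count for AOG.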
On the basis of the granularity-based approach, a number of data stream mining algorithms have been developed. For a complete list of techniques, the reader is advised to review the recent tutorial by Gama et al. (Ref 22).

NEW DIRECTION IN DATA STREAM MINING

Data stream mining has evolved as a new form of online data analysis that has also challenged the computational capabilities of our state-of-the-art data processing facilities. However, advances in the computational power of small devices, including personal digital assistants (PDAs), smart phones, and sensor nodes, have created an unprecedented opportunity to perform ubiquitous data stream mining. We can broadly divide this area into mining sensor data streams and mobile data mining. Recent achievements in these areas are discussed in the following subsections.

Mining Sensor Data Streams

Many important applications, coupled with the increase in the computational power of wirelessly connected sensor nodes, have given birth to this new research direction in the data stream mining area. Mining data streams originating from sensor nodes has witnessed notable success in the last few years. Research issues associated with this area have been detailed in Ref 3. The differences between data stream mining in sensor networks and on other platforms, as detailed in Ref 3, are as follows:

- Duplication of data in dense deployments of wireless sensor networks introduces a new challenge.
- Multilevel data mining is important in wireless sensor networks, given that individual sensors can generate local models that need to be integrated.
- Real-time data cleansing is needed, given that sensory streaming data is likely to be noisy.
- Adaptation to the availability of resources is essential, given the limited resources of each sensor node.

It is worth mentioning the success of the granularity-based approach in developing stream mining techniques that are able to operate in wireless sensor networks [20,23]. The field is concerned with benefiting from the large-scale deployment of small computational devices that are able to communicate wirelessly and have increasing sensing capabilities.
This rich source of streaming data is key to the success of many important security, scientific, and industrial applications. Examples of these applications can be found in Refs 3,4,24.

Mobile Data Stream Mining

The number of mobile users is continuously increasing, and mobile data mining users are no exception. Academic prototypes such as Open Mobile Miner (OMM) [25] and commercial products such as MineFleet [26] have already found their way to users. The start of the area of mobile data mining can be dated back to the MobiMine system developed by Kargupta et al. [27]. Although the system targets mobile brokers in the stock exchange area, the data mining process is performed on a server, conserving the scarce resources of the mobile device, a PDA in this case. A few years later, Kargupta et al. [28] developed the VEDAS system for distributed data stream mining on a fleet of vehicles, analyzing both the driver's behavior and the vehicle's health. The system uses mobile devices running different data stream mining techniques, which has been made possible by advances in the computational capabilities of mobile devices.

Mobility of the user, connectivity problems, and availability of computational resources are the major research issues in this promising area of research. The granularity-based approach has proved to be a successful solution when running stream mining techniques on mobile devices with limited resources [21]. The OMM tool [25] has adopted the granularity-based approach.
Future Insights

We can state the following future directions and insights in this growing area of research:

- Online medical, scientific, and biological data stream mining using data generated by medical and biological instruments and by the various tools employed in scientific laboratories;
- Hardware solutions for small devices emitting or receiving data streams, in order to enable high-performance computation on such devices;
- Software architectures that serve data streaming applications;
- Situation-aware data stream mining that recalls models built in similar situations rather than building a new model;
- Online text mining for opinion discovery, given the notable use of Web 2.0 technologies.

Conclusion

This review paper has highlighted the major strategies and techniques used in data stream mining. We have identified four categories of techniques: (1) two-phase techniques, (2) Hoeffding bound-based techniques, (3) symbolic approximation-based techniques, and (4) granularity-based techniques. Details of each category have been discussed. New directions and future insights in this growing area of research have also been presented. Two research directions have been discussed: the first concerns mining data originating from sensor networks, and the second is mobile data stream mining. Finally, future insights by the author have been enumerated, giving the reader some potential directions for research.

REFERENCES

1. Gaber MM, Zaslavsky A, Krishnaswamy S. Mining data streams: a review. ACM SIGMOD Rec 2005, 34.
2. Babcock B, Babu S, Datar M, Motwani R, Widom J. Models and issues in data stream systems. In: Proceedings of PODS.
3. Gama J, Gaber MM, eds. Learning from Data Streams: Processing Techniques in Sensor Networks. Springer Verlag.
4. Ganguly A, Gama J, Omitaomu O, Gaber MM, Vatsavai RR, eds. Knowledge Discovery from Sensor Data. Berlin, Germany: CRC Press.
5. Aggarwal CC, Han J, Wang J, Yu PS. A framework for clustering evolving data streams. In: Proceedings of the 29th VLDB Conference. Berlin; 2003.
6. Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. SIGMOD Rec. New York: ACM Press; 1996, 25.
7. Aggarwal CC, Han J, Wang J, Yu P. A framework for high dimensional projected clustering of data streams. In: Proceedings of the VLDB Conference.
8. Aggarwal CC, Han J, Wang J, Yu P. On demand classification of data streams. In: Proceedings of the ACM KDD Conference. Seattle, WA; 2004.
9. Gaber MM, Zaslavsky A, Krishnaswamy S. A survey of classification methods in data streams. In: Aggarwal C, ed. Data Streams: Models and Algorithms. Springer Verlag; 2007.
10. Domingos P, Hulten G. Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press; 2000.
11. Hulten G, Spencer L, Domingos P. Mining time-changing data streams. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press; 2001.
12. Hoeffding W. Probability inequalities for sums of bounded random variables. J Am Stat Assoc 1963, 58.
13. Domingos P, Hulten G. A general method for scaling up machine learning algorithms and its application to clustering. In: Proceedings of the 18th International Conference on Machine Learning. Williams College, Williamstown, MA, USA; 2001.
14. Masud MM, Gao J, Khan L, Han J, Thuraisingham BM. Integrating novel class detection with classification for concept-drifting data streams. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Bled, Slovenia; 2009.
15. Lin J, Keogh E, Lonardi S, Chiu B. A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. San Diego, CA; 2003.
16. Keogh E, Lin J, Fu A. HOT SAX: efficiently finding the most unusual time series subsequence. In: Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005). Houston, TX; 2005.
17. Chiu B, Keogh E, Lonardi S. Probabilistic discovery of time series motifs. In: Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington D.C.; 2003.
18. Li L, Nallela S. Probabilistic discovery of motifs in water level. In: IEEE International Conference on Information Reuse and Integration. Las Vegas, NV; 2009.
19. Gaber MM, Yu PS. A holistic approach for resource-aware adaptive data stream mining. J New Gen Comput 2006, 25.
20. Phung ND, Gaber MM, Röhm U. Resource-aware online data mining in wireless sensor networks. In: Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining. IEEE Symposium Series on Computational Intelligence. Honolulu, HI; 2007.
21. Gaber MM. Data stream mining using granularity-based approach. In: Abraham A, Hassanien A, Carvalho A, Snase V, eds. Foundations of Computational Intelligence. Vol. 6. Berlin/Heidelberg: Springer; 2009.
22. Gama J, Gaber MM, Krishnaswamy S. Data stream mining: from theory to applications and from stationary to mobile. In: Twenty-Fifth Symposium On Applied Computing. Sierre, Switzerland. Available at: SAC10-DS-Tutorial/Tutorial-SAC10-Final.pdf (Accessed October 20, 2011.)
23. Gaber MM, Shiddiqi AM. Distributed data stream classification for wireless sensor networks. In: Proceedings of the 2010 ACM Symposium on Applied Computing (SAC). Sierre, Switzerland: ACM Press; 2010.
24. Gaber MM, Vatsavai R, Omitaomu O, Gama J, Chawla N, Ganguly A, eds. Knowledge Discovery from Sensor Data. Lecture Notes in Computer Science. Las Vegas, NV; Berlin, Germany: Springer.
25. Krishnaswamy S, Gaber MM, Harbach M, Hugues C, Sinha A, Gillick B, Haghighi PD, Zaslavsky A. Open Mobile Miner: a toolkit for mobile data stream mining. ACM Knowl Discov Databases.
26. Agnik. MineFleet description. Available at: (Accessed October 17, 2011.)
27. Kargupta H, Park B, Pittie S, Liu L, Kushraj D, Sarkar K. MobiMine: monitoring the stock market from a PDA. ACM SIGKDD Explor 2002, 3.
28. Kargupta H, Bhargava R, Liu K, Powers M, Blair P, Bushra S, Dull J, Sarkar K, Klein M, Vasa M, Handy D. VEDAS: a mobile and distributed data stream mining system for real-time vehicle monitoring. In: Proceedings of the SIAM International Data Mining Conference. Orlando, FL; 2004.


More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

Data Clustering With Leaders and Subleaders Algorithm

Data Clustering With Leaders and Subleaders Algorithm IOSR Journal of Engineering (IOSRJEN) e-issn: 2250-3021, p-issn: 2278-8719, Volume 2, Issue 11 (November2012), PP 01-07 Data Clustering With Leaders and Subleaders Algorithm Srinivasulu M 1,Kotilingswara

More information

An Empirical Comparison of Stream Clustering Algorithms

An Empirical Comparison of Stream Clustering Algorithms MÜNSTER An Empirical Comparison of Stream Clustering Algorithms Matthias Carnein Dennis Assenmacher Heike Trautmann CF 17 BigDAW Workshop Siena Italy May 15 18 217 Clustering MÜNSTER An Empirical Comparison

More information

DATA STREAMS: MODELS AND ALGORITHMS

DATA STREAMS: MODELS AND ALGORITHMS DATA STREAMS: MODELS AND ALGORITHMS DATA STREAMS: MODELS AND ALGORITHMS Edited by CHARU C. AGGARWAL IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 Kluwer Academic Publishers Boston/Dordrecht/London

More information

Clustering Large Dynamic Datasets Using Exemplar Points

Clustering Large Dynamic Datasets Using Exemplar Points Clustering Large Dynamic Datasets Using Exemplar Points William Sia, Mihai M. Lazarescu Department of Computer Science, Curtin University, GPO Box U1987, Perth 61, W.A. Email: {siaw, lazaresc}@cs.curtin.edu.au

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

Mining Data Streams. From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records.

Mining Data Streams. From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records. DATA STREAMS MINING Mining Data Streams From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records. Hammad Haleem Xavier Plantaz APPLICATIONS Sensors

More information

Mining Data Streams Data Mining and Text Mining (UIC Politecnico di Milano) Daniele Loiacono

Mining Data Streams Data Mining and Text Mining (UIC Politecnico di Milano) Daniele Loiacono Mining Data Streams Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann Series in Data

More information

Feature Based Data Stream Classification (FBDC) and Novel Class Detection

Feature Based Data Stream Classification (FBDC) and Novel Class Detection RESEARCH ARTICLE OPEN ACCESS Feature Based Data Stream Classification (FBDC) and Novel Class Detection Sminu N.R, Jemimah Simon 1 Currently pursuing M.E (Software Engineering) in Vins christian college

More information

Data Stream Clustering Using Micro Clusters

Data Stream Clustering Using Micro Clusters Data Stream Clustering Using Micro Clusters Ms. Jyoti.S.Pawar 1, Prof. N. M.Shahane. 2 1 PG student, Department of Computer Engineering K. K. W. I. E. E. R., Nashik Maharashtra, India 2 Assistant Professor

More information
