SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases


Jinlong Wang, Congfu Xu, Hongwei Dan, and Yunhe Pan
Institute of Artificial Intelligence, Zhejiang University, Hangzhou, China

Abstract. The issue of maintaining privacy in frequent itemset mining has attracted considerable attention. In most of this work only distorted data are available, which complicates the data mining process. In particular, in a dynamically updated distorted database, it is nontrivial to mine frequent itemsets incrementally because of the high counting overhead of recomputing support counts for itemsets. This paper investigates this problem and develops an efficient algorithm, SA-IFIM, for incrementally mining frequent itemsets in update distorted databases. In this algorithm, some additional information is stored during the earlier mining process to support efficient incremental computation. Moreover, by introducing the supporting aggregate and representing it as a bit vector, the transaction database is transformed into a machine-oriented model that allows fast support computation. The performance studies show the efficiency of our algorithm.

1 Introduction

Recently, privacy has become one of the prime concerns in data mining. To avoid compromising privacy, most works apply distortion or randomization techniques to the original data set, and only the disguised data are shared for data mining [1-3]. Mining frequent itemset models from distorted databases with reconstruction methods incurs expensive overheads compared with directly mining the original data sets [2]. In [3, 4], basic formulas from set theory are used to eliminate these counting overheads. In reality, however, for many applications a database is dynamic, in the sense that transactions are deleted and added over time. The changes to the data set may invalidate some existing frequent itemsets and introduce new ones, so incremental algorithms [5, 6] were proposed to address the problem.
However, it is not efficient to apply these incremental algorithms directly to an update distorted database, because of the high counting overhead of recomputing support for itemsets. Although [7] has proposed an algorithm for incremental updating, its efficiency still cannot satisfy practical requirements.

This paper investigates the problem of incremental frequent itemset mining in update distorted databases. We first develop an efficient incremental updating computation method that quickly reconstructs an itemset's support by using the additional information stored during the earlier mining process. Then, a new concept, the supporting aggregate (SA), is introduced and represented as a bit vector. In this way, the transaction database is transformed into a machine-oriented model that allows fast support computation. Finally, an efficient algorithm, SA-IFIM (Supporting Aggregate based Incremental Frequent Itemset Mining in update distorted databases), is presented to describe the process. The performance studies show the efficiency of our algorithm.

The remainder of this paper is organized as follows. Section 2 presents the SA-IFIM algorithm step by step. The performance studies are reported in Section 3. Finally, Section 4 concludes this paper.

Supported by the Natural Science Foundation of China, Zhejiang Provincial Natural Science Foundation of China (Y105250) and the Science-Technology Program of Zhejiang Province of China (No. 2004C31098). Congfu Xu is the corresponding author.

2 The SA-IFIM Algorithm

In this section, the SA-IFIM algorithm is introduced step by step. Before mining, the data sets are each distorted using the method of EMASK [3]. In the following, we first describe the preliminaries of incremental frequent itemset mining. We then investigate the essence of the updating technique and use additional information recorded during the earlier mining, together with set theory, for quick updating computation. Next, we introduce the supporting aggregate and represent it as a bit vector to transform the database into a machine-oriented model that speeds up computation. Finally, the SA-IFIM algorithm is summarized.

2.1 Preliminaries

In this subsection, some preliminaries on incremental frequent itemset mining are presented, summarizing the formal description in [5, 6].
Let $D$ be a set of transactions and $I = \{i_1, i_2, \ldots, i_m\}$ a set of distinct literals (items). For a dynamic database, old transactions $\Delta^-$ are deleted from the database $D$ and new transactions $\Delta^+$ are added. Naturally, $\Delta^- \subseteq D$. Denote the updated database by $D'$, so that $D' = (D - \Delta^-) \cup \Delta^+$, and the unchanged transactions by $D^- = D - \Delta^-$. Let $Fp$ denote the frequent itemsets in the original database $D$, and $Fp_k$ the frequent $k$-itemsets. The problem of incremental mining is to find the frequent itemsets (denoted by $Fp'$) in $D'$, given $\Delta^-$, $D^-$, $\Delta^+$, and the mining result $Fp$, with respect to the same user-specified minimum support $s$. Furthermore, the incremental approach needs to take advantage of previously obtained information to avoid rerunning the mining algorithm on the whole database when the database is updated.

For clarity, we present $s$ as a relative support value, but $\delta_c^+$, $\delta_c^-$, $\sigma_c$, and $\sigma'_c$ as absolute support counts of itemset $c$ in $\Delta^+$, $\Delta^-$, $D$, and $D'$, respectively. Let $\delta_c$ be the change in the support count of itemset $c$. Then

$$\delta_c = \delta_c^+ - \delta_c^-, \qquad \sigma'_c = \sigma_c + \delta_c^+ - \delta_c^-.$$
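The update rule for support counts can be illustrated with a minimal Python sketch. It works on plain (undistorted) transactions for readability, so it only demonstrates the bookkeeping and not the reconstruction step that distorted data require. The function names `count_support` and `updated_support` and the toy data are assumptions introduced for this illustration.

```python
def count_support(itemset, transactions):
    """Absolute support: number of transactions containing every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def updated_support(sigma_c, itemset, delta_plus, delta_minus):
    """Return (delta_c, sigma'_c) using only the added and deleted transactions."""
    d_plus = count_support(itemset, delta_plus)    # delta_c^+
    d_minus = count_support(itemset, delta_minus)  # delta_c^-
    delta_c = d_plus - d_minus
    return delta_c, sigma_c + delta_c


# Toy original database D (hypothetical data).
D = [frozenset("ab"), frozenset("abc"), frozenset("bc"), frozenset("a")]
delta_minus = [D[3]]                              # delete the transaction {a}
delta_plus = [frozenset("ab"), frozenset("ac")]   # add two transactions

sigma_ab = count_support("ab", D)
delta_ab, sigma_ab_new = updated_support(sigma_ab, "ab", delta_plus, delta_minus)

# Cross-check against a full recount on the updated database D'.
D_updated = [t for t in D if t not in delta_minus] + delta_plus
recount = count_support("ab", D_updated)
```

The point of the sketch is that `sigma_ab_new` is obtained without rescanning the unchanged part of the database, which is the saving an incremental approach aims for.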

2.2 Efficient incremental computation

Generally, in a dynamically updated environment, the important aspects of mining are how to deal with the itemsets that are frequent in $D$ (recorded in $Fp$), and how to add the itemsets that are non-frequent in $D$ (not in $Fp$) but frequent in $D'$. In the following, for simplicity, $|X|$ denotes the number of tuples in a transaction database $X$.

1. For the frequent itemsets in $Fp$, find those that become non-frequent and those that remain frequent in the updated database $D'$.

Lemma 1. If $c \in Fp$ (that is, $\sigma_c \ge |D| \times s$) and $\delta_c \ge (|\Delta^+| - |\Delta^-|) \times s$, then $c \in Fp'$.

Proof. $\sigma'_c = \sigma_c + \delta_c^+ - \delta_c^- \ge |D| \times s + |\Delta^+| \times s - |\Delta^-| \times s = (|D| + |\Delta^+| - |\Delta^-|) \times s = |D'| \times s$.

Property 1. When $c \in Fp$ and $\delta_c < (|\Delta^+| - |\Delta^-|) \times s$, then $c \in Fp'$ if and only if $\sigma'_c \ge |D'| \times s$.

2. For itemsets that are non-frequent in $D$, mine the frequent itemsets in the changed database $\Delta^+$ and recompute their support counts by scanning $D^-$.

Lemma 2. If $c \notin Fp$ and $\delta_c < (|\Delta^+| - |\Delta^-|) \times s$, then $c \notin Fp'$.

Proof. Analogous to Lemma 1.

Property 2. When $c \notin Fp$ and $\delta_c \ge (|\Delta^+| - |\Delta^-|) \times s$, then $c \in Fp'$ if and only if $\sigma'_c \ge |D'| \times s$.

Under the symbol-specific distortion framework of [3], a 1 and a 0 in the original database are flipped with probabilities $(1 - p)$ and $(1 - q)$, respectively. In incremental frequent itemset mining, the goal is to mine frequent itemsets from the distorted databases with the information obtained during the earlier process. To test the condition of Property 2 for an itemset not in $Fp$, we need to reconstruct the itemset's support in the unchanged database $D^-$ by scanning $D^-$. Not only the distorted support of the itemset itself but also other counts related to it need to be tracked. This makes the support count computation in Property 2 both difficult and of paramount importance in incremental mining, and it is nontrivial to apply traditional incremental algorithms directly.
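The case analysis of Lemmas 1 and 2 and Properties 1 and 2 can be sketched in Python as follows. This is only an illustration of the decision logic: the function names and the string labels are assumptions made here, and the sketch takes the support counts as given rather than reconstructing them from distorted data.

```python
def classify(in_fp, delta_c, n_plus, n_minus, s):
    """Decide what must be done for itemset c after an update.

    in_fp   -- True if c was frequent in the original database D
    delta_c -- change of the support count of c (delta_c^+ - delta_c^-)
    n_plus  -- number of added transactions
    n_minus -- number of deleted transactions
    s       -- relative minimum support
    """
    threshold = (n_plus - n_minus) * s
    if in_fp:
        # Lemma 1: still frequent. Property 1: needs an explicit check.
        return "frequent" if delta_c >= threshold else "check_support"
    # Property 2: needs an explicit check. Lemma 2: cannot be frequent.
    return "check_support" if delta_c >= threshold else "not_frequent"


def passes_check(sigma_new, n_updated, s):
    """Explicit test used by Properties 1 and 2: sigma'_c >= |D'| * s."""
    return sigma_new >= n_updated * s


# Hypothetical numbers: 2 transactions added, 1 deleted, s = 50%.
case_lemma1 = classify(True, 1, 2, 1, 0.5)
case_prop1 = classify(True, 0, 2, 1, 0.5)
case_prop2 = classify(False, 1, 2, 1, 0.5)
case_lemma2 = classify(False, 0, 2, 1, 0.5)
```

Only the two `"check_support"` cases require the updated support count; for itemsets not in $Fp$ this is the case that forces a scan of the unchanged database.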
To address the problem, an efficient incremental updating operation is first developed that computes with the supports in the distorted database; a second method that improves the efficiency of support computation is presented in Section 2.3.

In distorted databases, the support computations for frequent itemsets are tedious. Motivated by [3], a similar support computation method is used in incremental mining. With this method, computing an itemset's support requires the support counts of all its subsets in the distorted database. However, saving the support counts of all itemsets would be impractical: it would greatly increase cost and degrade indexing efficiency. Thus in incremental mining, when the frequent itemsets and their support counts are recorded, the corresponding counts in each distorted database are registered at the same time. In this way, for a $k$-itemset not in $Fp$, since all its subsets are frequent in the database, we can use the existing support counts in each distorted database to compute and reconstruct its support in the updated database quickly. Thus, the efficiency is improved.

2.3 Supporting aggregate and database transformation

In order to improve efficiency, we introduce the concept of the supporting aggregate and use a bit vector to represent it. By means of elementary supporting aggregates based on bit vectors, the database is transformed into a machine-oriented data model, which improves the efficiency of itemset support computation.

In the following, for a transaction database $D$, let $U$ denote a set of objects (the universe) that serve as unique identifiers for the transactions. For simplicity, we refer to the elements of $U$ as the transactions themselves. For an itemset $A \subseteq I$, a transaction $u \in U$ is said to contain $A$ if $A \subseteq u$.

Definition 1. Supporting aggregate (SA). For an attribute itemset $A \subseteq I$, denote $S(A) = \{u \in U \mid A \subseteq u\}$ as its supporting aggregate, that is, the aggregate composed of the transactions that include the attribute itemset $A$. Generally, $S(A) \subseteq U$.

The supporting aggregate of a single attribute item is called an elementary supporting aggregate (ESA). Using the ESAs, the original transaction database is vertically inverted and transformed into an attribute-transaction list. From the ESAs, the SA of an itemset can be obtained quickly by set intersection, and the itemset's support can be computed efficiently.

In order to further improve processing speed, each SA (ESA) is represented as a BV-SA (BV-ESA), a binary vector of $|U|$ dimensions ($|U|$ is the number of transactions in $U$).
If an itemset's SA contains the $i$th transaction, the $i$th dimension of its binary vector is set to 1; otherwise, the corresponding position is set to 0. With this representation, the support count of each attribute item can be computed efficiently. With the vertical database representation, where each row is an attribute's BV-ESA, attribute items can be removed successively thanks to the downward closure property [8], which efficiently reduces the size of the data set.

On the other hand, the whole set of BV-ESAs sometimes cannot be loaded into memory at once because of memory constraints. Our approach addresses this scalability problem by horizontally partitioning the transaction data set into subsets, each composed of part of the objects (transactions), and then loading them partition by partition. The partitions are pairwise disjoint, which makes the method suitable for parallel and distributed processing. Furthermore, in practice, an optimized memory swap strategy can be adopted to reduce the I/O cost.
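The bit-vector representation and the partition-by-partition counting described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it uses Python integers as bit vectors, works on plain transactions, and the names `build_bv_esa`, `bv_sa`, `split` and `support_count_bv` are assumptions introduced here.

```python
def build_bv_esa(transactions):
    """Map each item to an int whose i-th bit is 1 iff transaction i contains it."""
    bv_esa = {}
    for i, transaction in enumerate(transactions):
        for item in transaction:
            bv_esa[item] = bv_esa.get(item, 0) | (1 << i)
    return bv_esa


def bv_sa(itemset, bv_esa):
    """Bit vector of S(A): bitwise AND of the BV-ESAs of the items in A."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must not be empty")
    result = bv_esa.get(items[0], 0)
    for item in items[1:]:
        result &= bv_esa.get(item, 0)
    return result


def split(transactions, size):
    """Horizontally partition the transactions into disjoint chunks."""
    return [transactions[i:i + size] for i in range(0, len(transactions), size)]


def support_count_bv(itemset, partitions):
    """Sum the number of set bits of the BV-SA over all partitions."""
    total = 0
    for part in partitions:
        total += bin(bv_sa(itemset, build_bv_esa(part))).count("1")
    return total


# Hypothetical toy data: five transactions, two per partition.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a"}, {"a", "b"}]
whole = build_bv_esa(transactions)
support_ab = support_count_bv({"a", "b"}, split(transactions, 2))
```

Because the partitions are disjoint, the per-partition counts simply add up, which is also what makes the scheme easy to parallelize. A production version would build each partition's bit vectors once and store them, instead of rebuilding them for every query as this sketch does.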

2.4 The process of SA-IFIM algorithm

In this subsection, the algorithm SA-IFIM is summarized as Algorithm 1. When the distorted data sets $D^-$, $\Delta^-$ and $\Delta^+$ are first scanned, they are transformed partition by partition into the corresponding vertical bit vector representations $BV(D^-)$, $BV(\Delta^-)$ and $BV(\Delta^+)$ and saved to hard disk. From these representations, the frequent $k$-itemsets $Fp_k$ can be obtained level by level. Based on the candidate generation-and-test approach, the candidate frequent $k$-itemsets ($C_k$) are generated from the frequent $(k-1)$-itemsets ($Fp_{k-1}$).

Algorithm 1: Algorithm SA-IFIM
Input: $D^-$, $\Delta^+$, $\Delta^-$, $Fp$ (frequent itemsets and their support counts in $D$), the frequent itemsets of $Fp$ with their corresponding support counts in the distorted database, minimum support $s$, and distortion parameters $p$, $q$ as in EMASK [3].
Output: $Fp'$ (frequent itemsets and their support counts in $D'$).
Method: As shown in Fig. 1. In the algorithm, we use some temporary files to store the support counts in the distorted database for efficiency.

Fig. 1. SA-IFIM algorithm diagram.
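The candidate generation step mentioned above follows the generation-and-test idea of Apriori [8]. The diagram of Fig. 1 is not reproduced in this text, so the sketch below is not the SA-IFIM procedure itself. It only shows, under that assumption, one common way to build $C_k$ from $Fp_{k-1}$ by joining and then pruning with the downward closure property. The function name `generate_candidates` is hypothetical.

```python
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets.

    frequent_prev -- iterable of frozensets, each of size k-1
    Returns a set of frozensets of size k whose (k-1)-subsets are all frequent.
    """
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue  # join step: the two itemsets must differ in exactly one item
            # Prune step (downward closure): every (k-1)-subset must be frequent.
            if all(frozenset(sub) in prev for sub in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets.
fp_2 = {frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("bd")}
c_3 = generate_candidates(fp_2, 3)
```

In this toy input only {a, b, c} survives, because {a, b, d} and {b, c, d} each have a 2-subset that is not frequent. Each surviving candidate would then have its support tested, for example by intersecting bit vectors.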

3 Performance Evaluation

In this section, comprehensive experiments are performed to compare SA-IFIM with EMASK, whose implementation was provided by the authors in [9]. For a better performance evaluation, we also implemented the algorithm IFIM (similar to IPPFIM [7]). All programs were coded in C++ using Cygwin with gcc. The experiments were run on a 3 GHz Pentium 4 processor with 1 GB of memory. SA-IFIM and IFIM yield the same itemsets as EMASK for the same data set and the same minimum support parameters.

Our experiments were performed on synthetic data sets produced by the IBM synthetic market-basket data generator [8]. In the following, we use the notation D (number of transactions), T (average size of the transactions), I (average size of the maximal potentially large itemsets), and N (number of items), and set N = 1000. In our method, the sizes of $\Delta^+$ and $\Delta^-$ are not required to be the same. Without loss of generality, let $|d| = |\Delta^+| = |\Delta^-|$ for simplicity. For the sake of clarity, TxIyDmdn denotes an original database together with an update database that share the parameters T = x and I = y, with $|D| = m$ transactions in the original database and $|d| = n$ transactions in the update database.

In the following, we used the distorted benchmark data sets as the input databases to the algorithms. The distortion parameters are the same as in EMASK [3], with p = 0.5 and q = 0.97. In the experiments, for a fair comparison of the algorithms and to reflect scalability requirements, SA-IFIM is run with only 5K transactions loaded into main memory at a time.

3.1 Different support analysis

In Fig. 2, the relative performance of SA-IFIM, IFIM and EMASK is compared on two different data sets, T25I4D100Kd10K (sparse) and T40I10D100Kd10K (dense), with respect to various minimum supports. As shown in Fig. 2, SA-IFIM leads to a prominent performance improvement.
Specifically, on the sparse data set (T25I4D100Kd10K), IFIM is close to EMASK, and SA-IFIM is orders of magnitude faster than both; on the dense data set (T40I10D100Kd10K), IFIM is faster than EMASK, but SA-IFIM also outperforms IFIM, and the margin grows as the minimum support decreases.

3.2 Effect of the update size

Experiments were run on two data sets, T25I4D100Kdm and T40I10D100Kdm, and the results are shown in Fig. 3. As expected, when the same number of transactions are deleted and added, the time of rerunning EMASK remains constant, but that of IFIM increases sharply and quickly surpasses EMASK. In Fig. 3, the execution time of SA-IFIM is much less than that of EMASK. SA-IFIM still significantly outperforms EMASK even when the update size is very large.

Fig. 2. Extensive analysis for different support: (a) T25I4D100Kd10K, (b) T40I10D100Kd10K.

Fig. 3. Different updating tuples analysis: (a) T25I4D100Kdm (s = 0.6%), (b) T40I10D100Kdm (s = 1.25%).

3.3 Scale up performance

Finally, to assess the scalability of the algorithm SA-IFIM, two experiments, T25I4Dmd(m/10) at s = 0.6% and T40I10Dmd(m/10) at s = 1.25%, were conducted to examine the scale-up performance as the size of the mined data set grows. The scale-up results for the two data sets are shown in Fig. 4, which illustrates the impact of $|D|$ and $|d|$ on the algorithms SA-IFIM and EMASK. In the experiments, the size of the update database is 10% of the original database, and the size m of the transaction database was increased from 100K to 1000K. As shown in Fig. 4, EMASK is very sensitive to the number of updated tuples but SA-IFIM is not, and the execution time of SA-IFIM increases linearly with the database size. This shows that the algorithm can be applied to very large databases and demonstrates its good scalability.

Fig. 4. Scale up performance analysis: (a) T25I4Dmd(m/10) (s = 0.6%), (b) T40I10Dmd(m/10) (s = 1.25%).

4 Conclusions

In this paper, we explore the issue of frequent itemset mining in dynamically updated distorted databases. We first develop an efficient incremental updating computation method to quickly reconstruct an itemset's support. Through the introduction of the supporting aggregate represented as a bit vector, the databases are transformed into representations that are more accessible to and more easily processed by a computer, so that support counts can be computed efficiently. The experiments show that SA-IFIM significantly outperforms rerunning EMASK on the whole updated database, and also has an advantage over incremental algorithms based only on EMASK.

References

1. Agrawal, R., and Srikant, R.: Privacy-preserving data mining. In: Proceedings of SIGMOD. (2000)
2. Rizvi, S., and Haritsa, J.: Maintaining data privacy in association rule mining. In: Proceedings of VLDB. (2002)
3. Agrawal, S., Krishnan, V., and Haritsa, J.: On addressing efficiency concerns in privacy-preserving mining. In: Proceedings of DASFAA. (2004)
4. Xu, C., Wang, J., Dan, H., and Pan, Y.: An improved EMASK algorithm for privacy-preserving frequent pattern mining. In: Proceedings of CIS. (2005)
5. Cheung, D., Han, J., Ng, V., and Wong, C.: Maintenance of discovered association rules in large databases: An incremental updating technique. In: Proceedings of ICDE. (1996)
6. Cheung, D., Lee, S., and Kao, B.: A general incremental technique for updating discovered association rules. In: Proceedings of DASFAA. (1997)
7. Wang, J., Xu, C., and Pan, Y.: An Incremental Algorithm for Mining Privacy-Preserving Frequent Itemsets. In: Proceedings of ICMLC. (2006)
8. Agrawal, R., and Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of VLDB. (1994)

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery : Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Hong Cheng Philip S. Yu Jiawei Han University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center {hcheng3, hanj}@cs.uiuc.edu,

More information

Mining Frequent Itemsets for data streams over Weighted Sliding Windows

Mining Frequent Itemsets for data streams over Weighted Sliding Windows Mining Frequent Itemsets for data streams over Weighted Sliding Windows Pauray S.M. Tsai Yao-Ming Chen Department of Computer Science and Information Engineering Minghsin University of Science and Technology

More information

Integration of Candidate Hash Trees in Concurrent Processing of Frequent Itemset Queries Using Apriori

Integration of Candidate Hash Trees in Concurrent Processing of Frequent Itemset Queries Using Apriori Integration of Candidate Hash Trees in Concurrent Processing of Frequent Itemset Queries Using Apriori Przemyslaw Grudzinski 1, Marek Wojciechowski 2 1 Adam Mickiewicz University Faculty of Mathematics

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases *

A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases * A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases * Shichao Zhang 1, Xindong Wu 2, Jilian Zhang 3, and Chengqi Zhang 1 1 Faculty of Information Technology, University of Technology

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department

More information

On Multiple Query Optimization in Data Mining

On Multiple Query Optimization in Data Mining On Multiple Query Optimization in Data Mining Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo 3a, 60-965 Poznan, Poland {marek,mzakrz}@cs.put.poznan.pl

More information

Efficient Remining of Generalized Multi-supported Association Rules under Support Update

Efficient Remining of Generalized Multi-supported Association Rules under Support Update Efficient Remining of Generalized Multi-supported Association Rules under Support Update WEN-YANG LIN 1 and MING-CHENG TSENG 1 Dept. of Information Management, Institute of Information Engineering I-Shou

More information

Parallel Mining of Maximal Frequent Itemsets in PC Clusters

Parallel Mining of Maximal Frequent Itemsets in PC Clusters Proceedings of the International MultiConference of Engineers and Computer Scientists 28 Vol I IMECS 28, 19-21 March, 28, Hong Kong Parallel Mining of Maximal Frequent Itemsets in PC Clusters Vong Chan

More information

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm Marek Wojciechowski, Krzysztof Galecki, Krzysztof Gawronek Poznan University of Technology Institute of Computing Science ul.

More information

Efficient GSP Implementation based on XML Databases

Efficient GSP Implementation based on XML Databases 212 International Conference on Information and Knowledge Management (ICIKM 212) IPCSIT vol.45 (212) (212) IACSIT Press, Singapore Efficient GSP Implementation based on Databases Porjet Sansai and Juggapong

More information

Closed Pattern Mining from n-ary Relations

Closed Pattern Mining from n-ary Relations Closed Pattern Mining from n-ary Relations R V Nataraj Department of Information Technology PSG College of Technology Coimbatore, India S Selvan Department of Computer Science Francis Xavier Engineering

More information

Incrementally mining high utility patterns based on pre-large concept

Incrementally mining high utility patterns based on pre-large concept Appl Intell (2014) 40:343 357 DOI 10.1007/s10489-013-0467-z Incrementally mining high utility patterns based on pre-large concept Chun-Wei Lin Tzung-Pei Hong Guo-Cheng Lan Jia-Wei Wong Wen-Yang Lin Published

More information

Web page recommendation using a stochastic process model

Web page recommendation using a stochastic process model Data Mining VII: Data, Text and Web Mining and their Business Applications 233 Web page recommendation using a stochastic process model B. J. Park 1, W. Choi 1 & S. H. Noh 2 1 Computer Science Department,

More information

A New Fast Vertical Method for Mining Frequent Patterns

A New Fast Vertical Method for Mining Frequent Patterns International Journal of Computational Intelligence Systems, Vol.3, No. 6 (December, 2010), 733-744 A New Fast Vertical Method for Mining Frequent Patterns Zhihong Deng Key Laboratory of Machine Perception

More information

Pincer-Search: An Efficient Algorithm. for Discovering the Maximum Frequent Set

Pincer-Search: An Efficient Algorithm. for Discovering the Maximum Frequent Set Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set Dao-I Lin Telcordia Technologies, Inc. Zvi M. Kedem New York University July 15, 1999 Abstract Discovering frequent itemsets

More information

A mining method for tracking changes in temporal association rules from an encoded database

A mining method for tracking changes in temporal association rules from an encoded database A mining method for tracking changes in temporal association rules from an encoded database Chelliah Balasubramanian *, Karuppaswamy Duraiswamy ** K.S.Rangasamy College of Technology, Tiruchengode, Tamil

More information

Monotone Constraints in Frequent Tree Mining

Monotone Constraints in Frequent Tree Mining Monotone Constraints in Frequent Tree Mining Jeroen De Knijf Ad Feelders Abstract Recent studies show that using constraints that can be pushed into the mining process, substantially improves the performance

More information

Closed Non-Derivable Itemsets

Closed Non-Derivable Itemsets Closed Non-Derivable Itemsets Juho Muhonen and Hannu Toivonen Helsinki Institute for Information Technology Basic Research Unit Department of Computer Science University of Helsinki Finland Abstract. Itemset

More information

Towards Incremental Grounding in Tuffy

Towards Incremental Grounding in Tuffy Towards Incremental Grounding in Tuffy Wentao Wu, Junming Sui, Ye Liu University of Wisconsin-Madison ABSTRACT Markov Logic Networks (MLN) have become a powerful framework in logical and statistical modeling.

More information

A Graph-Based Approach for Mining Closed Large Itemsets

A Graph-Based Approach for Mining Closed Large Itemsets A Graph-Based Approach for Mining Closed Large Itemsets Lee-Wen Huang Dept. of Computer Science and Engineering National Sun Yat-Sen University huanglw@gmail.com Ye-In Chang Dept. of Computer Science and

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK EFFICIENT ALGORITHMS FOR MINING HIGH UTILITY ITEMSETS FROM TRANSACTIONAL DATABASES

More information

An Approach for Privacy Preserving in Association Rule Mining Using Data Restriction

An Approach for Privacy Preserving in Association Rule Mining Using Data Restriction International Journal of Engineering Science Invention Volume 2 Issue 1 January. 2013 An Approach for Privacy Preserving in Association Rule Mining Using Data Restriction Janakiramaiah Bonam 1, Dr.RamaMohan

More information

Mining N-most Interesting Itemsets. Ada Wai-chee Fu Renfrew Wang-wai Kwong Jian Tang. fadafu,

Mining N-most Interesting Itemsets. Ada Wai-chee Fu Renfrew Wang-wai Kwong Jian Tang. fadafu, Mining N-most Interesting Itemsets Ada Wai-chee Fu Renfrew Wang-wai Kwong Jian Tang Department of Computer Science and Engineering The Chinese University of Hong Kong, Hong Kong fadafu, wwkwongg@cse.cuhk.edu.hk

More information

Data Mining Query Scheduling for Apriori Common Counting

Data Mining Query Scheduling for Apriori Common Counting Data Mining Query Scheduling for Apriori Common Counting Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo 3a, 60-965 Poznan, Poland {marek,

More information

Maintenance of fast updated frequent pattern trees for record deletion

Maintenance of fast updated frequent pattern trees for record deletion Maintenance of fast updated frequent pattern trees for record deletion Tzung-Pei Hong a,b,, Chun-Wei Lin c, Yu-Lung Wu d a Department of Computer Science and Information Engineering, National University

More information

On Privacy-Preservation of Text and Sparse Binary Data with Sketches

On Privacy-Preservation of Text and Sparse Binary Data with Sketches On Privacy-Preservation of Text and Sparse Binary Data with Sketches Charu C. Aggarwal Philip S. Yu Abstract In recent years, privacy preserving data mining has become very important because of the proliferation

More information

Discovery of Frequent Itemset and Promising Frequent Itemset Using Incremental Association Rule Mining Over Stream Data Mining

Discovery of Frequent Itemset and Promising Frequent Itemset Using Incremental Association Rule Mining Over Stream Data Mining Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 5, May 2014, pg.923

More information

Mining Temporal Indirect Associations

Mining Temporal Indirect Associations Mining Temporal Indirect Associations Ling Chen 1,2, Sourav S. Bhowmick 1, Jinyan Li 2 1 School of Computer Engineering, Nanyang Technological University, Singapore, 639798 2 Institute for Infocomm Research,

More information

SEQUENTIAL PATTERN MINING FROM WEB LOG DATA

SEQUENTIAL PATTERN MINING FROM WEB LOG DATA SEQUENTIAL PATTERN MINING FROM WEB LOG DATA Rajashree Shettar 1 1 Associate Professor, Department of Computer Science, R. V College of Engineering, Karnataka, India, rajashreeshettar@rvce.edu.in Abstract

More information

CLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets

CLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets CLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets Jianyong Wang, Jiawei Han, Jian Pei Presentation by: Nasimeh Asgarian Department of Computing Science University of Alberta

More information

620 HUANG Liusheng, CHEN Huaping et al. Vol.15 this itemset. Itemsets that have minimum support (minsup) are called large itemsets, and all the others

620 HUANG Liusheng, CHEN Huaping et al. Vol.15 this itemset. Itemsets that have minimum support (minsup) are called large itemsets, and all the others Vol.15 No.6 J. Comput. Sci. & Technol. Nov. 2000 A Fast Algorithm for Mining Association Rules HUANG Liusheng (ΛΠ ), CHEN Huaping ( ±), WANG Xun (Φ Ψ) and CHEN Guoliang ( Ξ) National High Performance Computing

More information
