Mining Generalized Sequential Patterns using Genetic Programming
Sandra de Amo
Universidade Federal de Uberlândia, Faculdade de Computação, Uberlândia, MG, Brazil
deamo@ufu.br

Ary dos Santos Rocha Jr.
Universidade Federal de Uberlândia, Faculdade de Engenharia Elétrica, Uberlândia, MG, Brazil
ary@cripta.com.br

Abstract
We propose a new kind of sequential pattern, which we call a Generalized Sequential Pattern, and we introduce the problem of mining generalized sequential patterns over temporal databases. A classical sequential pattern consists of a sequence of itemsets. This kind of pattern can be discovered in a database of customer transactions, where each transaction consists of a transaction id, a transaction time and the items bought in the transaction. Our generalized sequential pattern, on the other hand, consists of a sequence of SQL expressions and can be discovered in a large temporal database. We present the genetic algorithm SEG-GEN to solve the problem of mining generalized sequential patterns. We show that SEG-GEN performs better than the classical algorithm AprioriAll for mining simple sequential patterns when the minimum support threshold is low.

Keywords: Data Mining, Temporal Mining, Sequential Pattern, Genetic Programming

1 Introduction
The problem of discovering sequential patterns in temporal data has been extensively studied in several recent papers [3, 4, 5, 6, 9], and its importance is justified by the large number of application domains where mining sequential patterns is a crucial issue, such as financial markets (evolution of stock quotations), retailing (evolution of clients' purchases), medicine (variations of patients' symptoms), local weather forecasting and telecommunications (frequent sequences of alarms output by network switches). Different kinds of sequential patterns have been proposed, as well as general formalisms and algorithms for expressing and mining them [7].
Roughly speaking, the problem of mining sequential patterns over a large amount of temporal data can be viewed as follows: (a) we are given a table of transactions Trans(IdCl, Time, Itemsets), where IdCl stands for the client identifier, Time is the time associated with the transaction and Itemsets is the set of items bought by client IdCl at time Time; (b) we are interested in discovering which sequences of itemsets are frequently purchased by the clients. For instance, we could discover that 70% of clients buy a TV and a CD player, followed by a VCR and VCR tapes, followed by a DVD. The dataset may store other kinds of data; for instance, instead of clients and sets of items we could have patients and sets of symptoms. The problem of mining sequential patterns has already been treated in the past, and a number of efficient algorithms have been proposed to solve it [3, 4, 6, 9].

In this paper, we introduce a new type of sequential pattern, which we call a generalized sequential pattern. Roughly speaking, a generalized sequential pattern differs from a classical sequential pattern in that it can capture temporal regularities where different types of information evolve in time, whereas a classical sequential pattern is designed to capture regularities where
only one type of information (the one captured by the Itemsets attribute) evolves in time. A typical example of a generalized sequential pattern is: clients having a low income buy a Fiat and afterwards, having a high income, buy a Mercedes. Note that here the two attributes Income and Itemsets both carry information that evolves in time. Another important point is that the dataset in which generalized sequential patterns are discovered can contain several tables, and not only one table as in the classical case.

Related Work. In [3] the problem of mining sequential patterns was introduced and three different algorithms for mining them were presented. One of these algorithms, AprioriAll, finds all frequent sequential patterns, and its performance is better than or comparable to that of the other two. In [4], the GSP algorithm was introduced for mining sequential patterns; in some cases its performance is far better than AprioriAll's. GSP also allows the user to interact with the mining process by imposing constraints on the patterns to be discovered. These algorithms are based on the so-called Apriori property, or anti-monotonicity property, which states that if a sequence is frequent then all its subsequences must be frequent as well. Other algorithms [6, 9], based on different principles, have been proposed for mining sequential patterns; most of them perform better than the Apriori family of algorithms (AprioriAll, GSP, etc.). None of these algorithms, however, has been designed to mine generalized sequential patterns over multiple tables. In [5], a more general kind of sequential pattern was introduced, the multidimensional sequential pattern, which involves a table with more than one non-key attribute, in contrast with the classical case, where only the non-key attribute Itemsets is allowed.
The main difference between our generalized sequential pattern and the multidimensional sequential pattern is that, in the former, all non-key attributes can depend on the time attribute, i.e., their values may differ from one instant to another, whereas in the latter only one non-key attribute, Itemsets, can depend on time; the other ones are fixed with respect to time and depend only on the non-temporal key attribute (IdCl). The algorithm SEG-GEN that we propose for mining generalized sequential patterns uses genetic programming tools. The use of this technique is justified by the fact that all existing methods for mining sequential patterns are designed to produce all frequent patterns, and so they involve a combinatorial process whose inherent cost is enormous regardless of the implementation techniques. A genetic programming technique allows us to solve the more complex problem of mining generalized sequential patterns in an approximate but satisfactory way. Note that genetic programming has already been used to solve other temporal mining problems, e.g., Temporal Constraint Mining [2] and the discovery of Temporal Patterns of Time Series Events [8].

Our Contribution. In this paper we introduce a new type of temporal pattern which generalizes the classical sequential patterns of [3, 4] as well as the multidimensional sequential patterns of [5]. We also propose the genetic algorithm SEG-GEN to solve the problem of mining generalized sequential patterns. The paper is organized as follows: in Section 2, we give a formal description of the problem of mining generalized sequential patterns; in Section 3, we describe the algorithm SEG-GEN, which uses genetic programming tools for discovering such patterns.
We conclude the paper by presenting some experimental results and comparing the performance of SEG-GEN to the classical AprioriAll algorithm [3] in the particular case where the input database reduces to a single table of transactions, and consequently the sequential patterns reduce to the classical ones.
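To make the classical setting concrete, the following Python sketch computes the support of a classical sequential pattern over a Trans(IdCl, Time, Itemsets) table, taking support as the fraction of clients whose time-ordered transactions contain the pattern, and shows the candidate pruning that the Apriori property permits. It is a minimal illustration, not the AprioriAll or GSP implementation of [3, 4]; the function names are hypothetical, and the pruning step assumes, as AprioriAll does, that each itemset has first been mapped to an atomic identifier, so that a (k-1)-subsequence is obtained by deleting one element.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Row = Tuple[Hashable, int, Iterable[str]]  # one Trans row: (IdCl, Time, Itemsets)


def customer_sequences(trans: Iterable[Row]) -> Dict[Hashable, List[FrozenSet[str]]]:
    """Group Trans(IdCl, Time, Itemsets) rows into one time-ordered sequence per client."""
    rows_by_client = defaultdict(list)
    for id_cl, t, items in trans:
        rows_by_client[id_cl].append((t, frozenset(items)))
    return {
        client: [items for _, items in sorted(rows, key=lambda r: r[0])]
        for client, rows in rows_by_client.items()
    }


def contains(seq: List[FrozenSet[str]], pattern: List[FrozenSet[str]]) -> bool:
    """Each pattern itemset must be a subset of a strictly later transaction than the previous one."""
    k = 0
    for items in seq:  # greedy earliest match is enough to decide containment
        if k < len(pattern) and pattern[k] <= items:
            k += 1
    return k == len(pattern)


def classical_support(trans: Iterable[Row], pattern: List[Set[str]]) -> float:
    """Fraction of clients whose transaction sequence contains the pattern."""
    seqs = customer_sequences(trans)
    pat = [frozenset(p) for p in pattern]
    return sum(contains(s, pat) for s in seqs.values()) / len(seqs)


def apriori_prune(candidates: Set[Tuple], frequent_prev: Set[Tuple]) -> Set[Tuple]:
    """Discard a k-sequence if any (k-1)-subsequence obtained by deleting one element is not frequent."""
    return {
        c for c in candidates
        if all(c[:i] + c[i + 1:] in frequent_prev for i in range(len(c)))
    }


trans = [(1, 1, {"TV", "CD Player"}), (1, 2, {"VCR", "VCR tapes"}), (1, 3, {"DVD"}),
         (2, 1, {"TV"}), (2, 2, {"DVD"})]
print(classical_support(trans, [{"TV", "CD Player"}, {"VCR", "VCR tapes"}, {"DVD"}]))  # 0.5
print(apriori_prune({("TV", "VCR", "DVD")}, {("TV", "VCR"), ("VCR", "DVD")}))  # set()
```

In the last line the candidate ("TV", "VCR", "DVD") is discarded because its subsequence ("TV", "DVD") is not among the frequent 2-sequences, so it never has to be counted against the database.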
2 Problem Statement
We assume the reader is familiar with classical database terminology [1]. Let $R = \{R_1, \ldots, R_m\}$ be a database schema. If A is an attribute, we denote by $type(A)$ the set of values that A can take (for instance, $type(A)$ can be the set of integers, the set of all strings, etc.). A PL (Pattern Language) expression E is a SELECT FROM WHERE expression over R (where the condition in the WHERE clause is a boolean combination of simple conditions like $A = B$, $A = a$, $A \neq a$, etc.), or a union or intersection of SELECT FROM WHERE expressions. For more details on the specification language PL, see [2]. The schema of a PL expression E, denoted by $schema(E)$, is the set of attributes appearing in its SELECT clause. If I is a database instance and E is a PL expression, we denote by $E(I)$ the set of answers of E when applied to I. Two PL expressions $E_1$ and $E_2$ are said to be compatible if $schema(E_1) = \{A_1, \ldots, A_k\}$, $schema(E_2) = \{B_1, \ldots, B_k\}$ and $type(A_i) = type(B_i)$ for each $i = 1, \ldots, k$. A generalized sequential pattern over the database schema R is a sequence $\langle E_1, E_2, \ldots, E_n \rangle$ of compatible PL expressions.

Example 2.1 Let R = {Client(CliCod, CliName, Income), Buy(CliCod, CarCod), Car(CarCod, Model, Year)} be a database schema. Then $\langle E_1, E_2 \rangle$, given below, is a generalized sequential pattern over R:

$E_1$: SELECT Client.CliCod FROM Client, Buy, Car WHERE (Client.CliCod = Buy.CliCod) AND (Car.CarCod = Buy.CarCod) AND (Car.Model = 'Fiat') AND (Client.Income = 'low')
$E_2$: SELECT Client.CliCod FROM Client, Buy, Car WHERE (Client.CliCod = Buy.CliCod) AND (Car.CarCod = Buy.CarCod) AND (Car.Model = 'Mercedes') AND (Client.Income = 'high')

Following the snapshot approach, a temporal database D is a sequence of database instances $D = (D_1, \ldots, D_n)$ over a database schema $R = \{R_1, \ldots, R_m\}$ (note that we have m different tables at each instant $i = 1, \ldots, n$). We denote by $dom(D)$ (the domain of D) the set of all elements appearing in the tables of $D_i$, for each $i = 1, \ldots, n$.
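The answer set E(I) of a PL expression on a single snapshot can be tried out with Python's built-in sqlite3 module. In this sketch the toy data, the helper names and the choice of SQLite are assumptions made for illustration. The query is the pair of expressions of Example 2.1, written with Client listed in the FROM clause and joined to Buy on CliCod so that the condition on Client.Income is well-formed SQL. Compatibility is checked on the attribute types of the SELECT clause, position by position.

```python
import sqlite3
from typing import List, Set, Tuple

# E_1 and E_2 of Example 2.1 share one shape; only the constants differ.
PL_QUERY = (
    "SELECT Client.CliCod FROM Client, Buy, Car "
    "WHERE Client.CliCod = Buy.CliCod AND Car.CarCod = Buy.CarCod "
    "AND Car.Model = ? AND Client.Income = ?"
)
E1 = (PL_QUERY, ("Fiat", "low"))
E2 = (PL_QUERY, ("Mercedes", "high"))


def make_instance(clients, buys, cars) -> sqlite3.Connection:
    """Build one snapshot instance D_i of the Example 2.1 schema (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Client(CliCod INTEGER, CliName TEXT, Income TEXT);
        CREATE TABLE Buy(CliCod INTEGER, CarCod INTEGER);
        CREATE TABLE Car(CarCod INTEGER, Model TEXT, Year INTEGER);
        """
    )
    conn.executemany("INSERT INTO Client VALUES (?, ?, ?)", clients)
    conn.executemany("INSERT INTO Buy VALUES (?, ?)", buys)
    conn.executemany("INSERT INTO Car VALUES (?, ?, ?)", cars)
    return conn


def evaluate(expr: Tuple[str, tuple], conn: sqlite3.Connection) -> Set[tuple]:
    """E(I): the set of answer tuples of expression E on instance I."""
    sql, params = expr
    return set(conn.execute(sql, params).fetchall())


def compatible(types1: List[str], types2: List[str]) -> bool:
    """Same number of SELECT attributes and type(A_i) = type(B_i) for every i."""
    return len(types1) == len(types2) and all(a == b for a, b in zip(types1, types2))


d1 = make_instance(
    clients=[(1, "Ana", "low"), (2, "Bia", "high")],
    buys=[(1, 10), (2, 11)],
    cars=[(10, "Fiat", 1999), (11, "Mercedes", 2001)],
)
print(evaluate(E1, d1), evaluate(E2, d1))  # {(1,)} {(2,)}
print(compatible(["INTEGER"], ["INTEGER"]))  # True
```

Evaluating the same expressions on every snapshot of a temporal database yields the sets $E_i(D_j)$ on which the support definition below is built.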
Let us suppose we are given a temporal database D and a generalized sequential pattern $\sigma = \langle E_1, \ldots, E_m \rangle$, with $m \leq n$. Let u be a tuple over $schema(E_1)$; since the expressions $E_i$ have compatible schemas, u is also a tuple over $schema(E_j)$ for $j = 2, \ldots, m$. We say that u supports σ w.r.t. the dataset D if there exist $j_1, \ldots, j_m$ such that

$$u \in E_1(D_{j_1}) \cap E_2(D_{j_2}) \cap \cdots \cap E_m(D_{j_m}).$$

Let N be the number of tuples over $schema(E_1)$ taking values in $dom(D)$. We define the support of σ w.r.t. D, denoted by $sup(\sigma, D)$, as

$$sup(\sigma, D) = \frac{|\{u \mid u \text{ supports } \sigma\}|}{N}.$$

We say that σ is frequent w.r.t. the dataset D if $sup(\sigma, D) \geq \alpha$, where α is a given threshold, $0 \leq \alpha \leq 1$. The problem of mining generalized sequential patterns can be stated as follows: given a temporal database D over a database schema R and a threshold α such that $0 \leq \alpha \leq 1$, find the generalized sequential patterns that are frequent with respect to D and α.

3 The Algorithm SEG-GEN
Before presenting the algorithm SEG-GEN for mining generalized sequential patterns over a temporal dataset D, we first define the usual genetic programming concepts used in the algorithm. A chromosome is a generalized sequential pattern as defined in Section 2. A population is a set of chromosomes. The mutation operation is performed over a chromosome σ as follows: an arbitrary element (an attribute, a constant, a relation) of an expression E appearing in σ
is chosen and then replaced by a different element of the same kind. The reproduction operation is defined as usual. The crossing operation is performed over two chromosomes as follows: arbitrary positions i and j are chosen in the first and second chromosomes, respectively. The portion of the first chromosome on the left side of i is exchanged with the portion of the second chromosome on the right side of j, and vice versa. The fitness of a chromosome σ is measured according to its support.

We are now ready to give an informal description of the algorithm SEG-GEN. For lack of space, the implementation details of the different procedures used in the algorithm are omitted.

Procedure SEG-GEN
Input: maxgen (the maximal number of generations), perfit (optimal percentage of fitting chromosomes), minsup (support threshold).
    GENPOP (% generates the initial population);
    SUPCAL (% calculates the support of each chromosome);
    ORDERPOP (% orders the population by increasing support);
    N := number of chromosomes;
    U := number of unfitting chromosomes;
    G := 1;
    p := U / N;
    while (G ≤ maxgen) and (p < perfit) do
        choose one of the following operations: CROSSING or MUTATION or REPRODUCTION;
        ORDERPOP;
        G := G + 1;
        U := number of unfitting chromosomes;
        p := U / N;
    BACKTRACKING;

At each generation, unfitting chromosomes are stored in a table and later used by the BACKTRACKING procedure, which gives these chromosomes the opportunity to produce other frequent patterns.

4 Experimental Results
In this section, we present some experimental results of SEG-GEN on synthetic datasets. For this purpose, we created 24 different databases. The synthetic data were produced by an algorithm based on the ideas of [3]. The tests were performed on an Intel Pentium III 650 MHz workstation with 128 MB of main memory, running Microsoft Windows 2000 Professional.
Data were stored on a 20 GB Quantum AT Fireball LCT hard disk and accessed via ODBC. Two groups of tests were executed: the first group concerns the execution of SEG-GEN over multiple tables; the second concerns the execution of SEG-GEN over a single table, where the mined sequences are simple sequential patterns as in [3]. The objective of this second group of tests is to study the relative performance of SEG-GEN and AprioriAll. For lack of space, we present here only the results for some selected databases.

First Group: Figure 1 shows that the execution time of SEG-GEN increases as the support decreases. However, for support values between 0.33 and 0.5 the increase in execution time is small. The parameters used are shown in Table 1. Figure 2 shows the results of SEG-GEN executed over 4 different datasets with distinct total numbers of records in their tables. The parameters used are shown in Table 2. We can verify that the execution time of SEG-GEN scales approximately linearly.

Second Group: Figure 3 shows the executions of SEG-GEN and AprioriAll over the same dataset, where the minimum support is decreased from 0.75 to …. The parameters used are shown in Table 3. We notice that for minimum support values smaller than 0.07, SEG-GEN is faster than AprioriAll. Figure 4 shows the executions of the two algorithms over several datasets. The parameters used are shown in Table 4. These results show that AprioriAll is faster than SEG-GEN, but the difference between them decreases as the number of records in the dataset increases.
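The support of Section 2 and the procedure SEG-GEN of Section 3 can be combined into a small runnable Python sketch, with support serving as the fitness. The paper omits its implementation details, so everything here is an assumption for illustration rather than the authors' code: PL expressions are modelled as Python functions from an instance to a set of tuples, chromosomes use a hypothetical (kind, value) token encoding, the matching instants j_1 < ... < j_m are assumed strictly increasing, crossover is the standard one-point crossover (one reading of the exchange described in Section 3), offspring replace the weakest chromosomes, and p is read as the fraction of fitting chromosomes so that the loop stops once perfit is reached. SUPCAL is folded into the fitness calls and BACKTRACKING is reduced to an archive of unfitting chromosomes.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

Instance = Dict[str, Set[tuple]]            # one snapshot D_j: relation name -> rows
Expr = Callable[[Instance], Set[tuple]]     # stands in for a PL expression: I -> E(I)
Token = Tuple[str, str]                     # hypothetical gene encoding: (kind, value)
Chromosome = Tuple[Tuple[Token, ...], ...]  # <E_1, ..., E_n>, one token tuple per E_i


def supports(u: tuple, exprs: List[Expr], db: List[Instance]) -> bool:
    """u in E_k(D_{j_k}) for k = 1..m, with j_1 < ... < j_m (ordering assumed)."""
    k = 0
    for inst in db:  # greedy earliest match over the snapshots D_1, ..., D_n
        if k < len(exprs) and u in exprs[k](inst):
            k += 1
    return k == len(exprs)


def support(exprs: List[Expr], db: List[Instance], arity: int) -> float:
    """sup(sigma, D) = |{u : u supports sigma}| / N, with N = |dom(D)| ** arity."""
    dom = {v for inst in db for rows in inst.values() for row in rows for v in row}
    candidates = set().union(*(exprs[0](inst) for inst in db))  # only these can support
    return sum(supports(u, exprs, db) for u in candidates) / len(dom) ** arity


def mutate(c: Chromosome, vocab: Dict[str, List[str]], rng: random.Random) -> Chromosome:
    """Replace one element of one expression by a different element of the same kind."""
    e = rng.randrange(len(c))
    t = rng.randrange(len(c[e]))
    kind, value = c[e][t]
    options = [v for v in vocab[kind] if v != value]
    if not options:
        return c
    expr = list(c[e])
    expr[t] = (kind, rng.choice(options))
    return c[:e] + (tuple(expr),) + c[e + 1:]


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Standard one-point crossover: cut a at i and b at j, then swap the tails."""
    i, j = rng.randint(0, len(a)), rng.randint(0, len(b))
    c1, c2 = a[:i] + b[j:], b[:j] + a[i:]
    return (c1, c2) if c1 and c2 else (a, b)  # an empty pattern is not a chromosome


def seg_gen(population, fitness, minsup, maxgen, perfit, vocab, seed=0):
    """Evolve until maxgen generations or until the fraction of fitting chromosomes
    (fitness >= minsup) reaches perfit. Returns (population, archive of unfit ones)."""
    rng = random.Random(seed)
    n = len(population)
    pop = sorted(population, key=fitness)              # ORDERPOP: increasing support
    archive = [c for c in pop if fitness(c) < minsup]  # would feed BACKTRACKING
    g = 1
    while g <= maxgen and sum(fitness(c) >= minsup for c in pop) / n < perfit:
        op = rng.choice(("crossing", "mutation", "reproduction"))
        if op == "crossing":
            offspring = list(crossover(rng.choice(pop), rng.choice(pop), rng))
        elif op == "mutation":
            offspring = [mutate(rng.choice(pop), vocab, rng)]
        else:
            offspring = [rng.choice(pop)]
        archive += [c for c in offspring if fitness(c) < minsup]
        pop = sorted(pop + offspring, key=fitness)[-n:]  # ORDERPOP, keep the n fittest
        g += 1
    return pop, archive


# Toy usage: three snapshots of a single Client(CliCod, Income) table.
db = [{"Client": {(1, "low"), (2, "low")}},
      {"Client": {(1, "high"), (2, "low")}},
      {"Client": {(1, "high"), (2, "high")}}]


def decode(tokens):
    """Hypothetical decoder: (("constant", inc),) -> clients whose Income is inc."""
    income = tokens[0][1]
    return lambda inst: {(c,) for c, inc in inst["Client"] if inc == income}


def fit(c):
    return support([decode(e) for e in c], db, arity=1)


init = [((("constant", "high"),), (("constant", "low"),)),
        ((("constant", "high"),), (("constant", "high"),))]
best, unfit = seg_gen(init, fit, minsup=0.5, maxgen=30, perfit=1.0,
                      vocab={"constant": ["low", "high"]}, seed=1)
```

Because N counts every tuple over dom(D), including the string constants "low" and "high", the pattern "low income, then high income", which both clients support, reaches a support of only 2/4 = 0.5 on this toy database.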
[Figure 1]
[Figure 2]
[Figure 3]
[Figure 4]

Table 1: Parameters used in Figure 1
    Number of tables       2
    Number of records      1000
    Number of patterns     100
    Number of instances    12

Table 2: Parameters used in Figure 2
    Number of tables       2
    Number of patterns     100
    Number of instances    12
    Support                33%

Table 3: Parameters used in Figure 3
    Number of items        1000
    Number of itemsets     2500
    Number of patterns     3000
    Number of records      2500

Table 4: Parameters used in Figure 4
    Number of items        1000
    Number of itemsets     2500
    Number of patterns     3000
    Support                33%

References
[1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley.
[2] S. de Amo, M. Fernandes, F. Silva, and J. Nunes. Mining temporal constraints in databases using genetic programing. XV Simpósio Brasileiro de Banco de Dados (SBBD 2000), João Pessoa, Brazil, October 2000.
[3] R. Agrawal and R. Srikant. Mining sequential patterns. Research Report RJ 9910, IBM Almaden Research Center, San Jose, California, October.
[4] R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT), Avignon, France, March.
[5] H. Pinto, J. Han, J. Pei, K. Wang, Q. Chen, and U. Dayal. Multi-dimensional sequential pattern mining. In Proc. Int. Conf. on Information and Knowledge Management (CIKM'01), Atlanta, November.
[6] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu. FreeSpan: Frequent pattern-projected sequential pattern mining. In Proc. Int. Conf. on Knowledge Discovery and Data Mining (KDD'00), Boston, MA, August.
[7] M. V. Joshi, G. Karypis, and V. Kumar. A Universal Formulation of Sequential Patterns. Technical Report, Department of Computer Science, University of Minnesota.
[8] R. J. Povinelli. Using Genetic Algorithms to Find Temporal Patterns Indicative of Time Series Events. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2000) Workshop Program, Las Vegas, Nevada, 2000.
[9] M. J. Zaki. SPADE: An efficient algorithm for mining frequent sequences. Machine Learning Journal, special issue on Unsupervised Learning (Doug Fisher, ed.), Vol. 42, Nos. 1/2.
Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm Expert Systems: Final (Research Paper) Project Daniel Josiah-Akintonde December
More informationDiscovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree
Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree Virendra Kumar Shrivastava 1, Parveen Kumar 2, K. R. Pardasani 3 1 Department of Computer Science & Engineering, Singhania
More informationWeb Service Usage Mining: Mining For Executable Sequences
7th WSEAS International Conference on APPLIED COMPUTER SCIENCE, Venice, Italy, November 21-23, 2007 266 Web Service Usage Mining: Mining For Executable Sequences MOHSEN JAFARI ASBAGH, HASSAN ABOLHASSANI
More informationSequences Modeling and Analysis Based on Complex Network
Sequences Modeling and Analysis Based on Complex Network Li Wan 1, Kai Shu 1, and Yu Guo 2 1 Chongqing University, China 2 Institute of Chemical Defence People Libration Army {wanli,shukai}@cqu.edu.cn
More informationOptimization using Ant Colony Algorithm
Optimization using Ant Colony Algorithm Er. Priya Batta 1, Er. Geetika Sharmai 2, Er. Deepshikha 3 1Faculty, Department of Computer Science, Chandigarh University,Gharaun,Mohali,Punjab 2Faculty, Department
More informationParallelizing Frequent Itemset Mining with FP-Trees
Parallelizing Frequent Itemset Mining with FP-Trees Peiyi Tang Markus P. Turkia Department of Computer Science Department of Computer Science University of Arkansas at Little Rock University of Arkansas
More informationPartSpan: Parallel Sequence Mining of Trajectory Patterns
Fifth International Conference on Fuzzy Systems and Knowledge Discovery PartSpan: Parallel Sequence Mining of Trajectory Patterns Shaojie Qiao,, Changjie Tang, Shucheng Dai, Mingfang Zhu Jing Peng, Hongjun
More informationMaintenance of the Prelarge Trees for Record Deletion
12th WSEAS Int. Conf. on APPLIED MATHEMATICS, Cairo, Egypt, December 29-31, 2007 105 Maintenance of the Prelarge Trees for Record Deletion Chun-Wei Lin, Tzung-Pei Hong, and Wen-Hsiang Lu Department of
More informationResearch Article Apriori Association Rule Algorithms using VMware Environment
Research Journal of Applied Sciences, Engineering and Technology 8(2): 16-166, 214 DOI:1.1926/rjaset.8.955 ISSN: 24-7459; e-issn: 24-7467 214 Maxwell Scientific Publication Corp. Submitted: January 2,
More informationAn Improved Apriori Algorithm for Association Rules
Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan
More informationFIT: A Fast Algorithm for Discovering Frequent Itemsets in Large Databases Sanguthevar Rajasekaran
FIT: A Fast Algorithm for Discovering Frequent Itemsets in Large Databases Jun Luo Sanguthevar Rajasekaran Dept. of Computer Science Ohio Northern University Ada, OH 4581 Email: j-luo@onu.edu Dept. of
More informationA new algorithm for gap constrained sequence mining
24 ACM Symposium on Applied Computing A new algorithm for gap constrained sequence mining Salvatore Orlando Dipartimento di Informatica Università Ca Foscari Via Torino, 155 - Venezia, Italy orlando@dsi.unive.it
More informationParallel Mining of Maximal Frequent Itemsets in PC Clusters
Proceedings of the International MultiConference of Engineers and Computer Scientists 28 Vol I IMECS 28, 19-21 March, 28, Hong Kong Parallel Mining of Maximal Frequent Itemsets in PC Clusters Vong Chan
More informationlevel 0 level 1 level 2 level 3 (4) (5) (1) (2) (3) (1 2 4) . - Active node. - Inactive node
Parallel Tree Projection Algorithm for Sequence Mining Valerie Guralnik, Nivea Garg, George Karypis fguralnik, garg, karypisg@cs.umn.edu Department of Computer Science and Engineering/Army HPCC Research
More informationMining N-most Interesting Itemsets. Ada Wai-chee Fu Renfrew Wang-wai Kwong Jian Tang. fadafu,
Mining N-most Interesting Itemsets Ada Wai-chee Fu Renfrew Wang-wai Kwong Jian Tang Department of Computer Science and Engineering The Chinese University of Hong Kong, Hong Kong fadafu, wwkwongg@cse.cuhk.edu.hk
More informationEffective Mining Sequential Pattern by Last Position Induction
Effective Mining Sequential Pattern by Last Position Induction Zhenglu Yang and Masaru Kitsuregawa The University of Tokyo Institute of Industrial Science 4-6-1 Komaba, Meguro-Ku Tokyo 153-8305, Japan
More informationFUFM-High Utility Itemsets in Transactional Database
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 3, March 2014,
More informationInterestingness Measurements
Interestingness Measurements Objective measures Two popular measurements: support and confidence Subjective measures [Silberschatz & Tuzhilin, KDD95] A rule (pattern) is interesting if it is unexpected
More informationData Mining Concepts
Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential
More informationData Access Paths in Processing of Sets of Frequent Itemset Queries
Data Access Paths in Processing of Sets of Frequent Itemset Queries Piotr Jedrzejczak, Marek Wojciechowski Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland Marek.Wojciechowski@cs.put.poznan.pl
More informationPrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth
PrefixSpan: ining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth Jian Pei Jiawei Han Behzad ortazavi-asl Helen Pinto Intelligent Database Systems Research Lab. School of Computing Science
More informationEfficient GSP Implementation based on XML Databases
212 International Conference on Information and Knowledge Management (ICIKM 212) IPCSIT vol.45 (212) (212) IACSIT Press, Singapore Efficient GSP Implementation based on Databases Porjet Sansai and Juggapong
More informationChallenges and Interesting Research Directions in Associative Classification
Challenges and Interesting Research Directions in Associative Classification Fadi Thabtah Department of Management Information Systems Philadelphia University Amman, Jordan Email: FFayez@philadelphia.edu.jo
More informationDistinctive Frequent Itemset Mining from Time Segmented Databases Using ZDD-Based Symbolic Processing. Shin-ichi Minato and Takeaki Uno
TCS Technical Report TCS -TR-A-09-37 Distinctive Frequent Itemset Mining from Time Segmented Databases Using ZDD-Based Symbolic Processing by Shin-ichi Minato and Takeaki Uno Division of Computer Science
More informationMining Frequent Itemsets for data streams over Weighted Sliding Windows
Mining Frequent Itemsets for data streams over Weighted Sliding Windows Pauray S.M. Tsai Yao-Ming Chen Department of Computer Science and Information Engineering Minghsin University of Science and Technology
More informationRaunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati
Analytical Representation on Secure Mining in Horizontally Distributed Database Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering
More informationParallel and Distributed Frequent Itemset Mining on Dynamic Datasets
Parallel and Distributed Frequent Itemset Mining on Dynamic Datasets Adriano Veloso, Matthew Erick Otey Srinivasan Parthasarathy, and Wagner Meira Jr. Computer Science Department, Universidade Federal
More informationTutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining. Gyozo Gidofalvi Uppsala Database Laboratory
Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining Gyozo Gidofalvi Uppsala Database Laboratory Announcements Updated material for assignment 3 on the lab course home
More informationFrequent Pattern Mining
Frequent Pattern Mining...3 Frequent Pattern Mining Frequent Patterns The Apriori Algorithm The FP-growth Algorithm Sequential Pattern Mining Summary 44 / 193 Netflix Prize Frequent Pattern Mining Frequent
More informationAn Improved Algorithm for Mining Association Rules Using Multiple Support Values
An Improved Algorithm for Mining Association Rules Using Multiple Support Values Ioannis N. Kouris, Christos H. Makris, Athanasios K. Tsakalidis University of Patras, School of Engineering Department of
More informationAssociation mining rules
Association mining rules Given a data set, find the items in data that are associated with each other. Association is measured as frequency of occurrence in the same context. Purchasing one product when
More informationEfficient Updating of Discovered Patterns for Text Mining: A Survey
Efficient Updating of Discovered Patterns for Text Mining: A Survey Anisha Radhakrishnan Post Graduate Student Karunya university Coimbatore, India Mathew Kurian Assistant Professor Karunya University
More informationAssociation Rule Mining from XML Data
144 Conference on Data Mining DMIN'06 Association Rule Mining from XML Data Qin Ding and Gnanasekaran Sundarraj Computer Science Program The Pennsylvania State University at Harrisburg Middletown, PA 17057,
More informationAn Efficient Algorithm for finding high utility itemsets from online sell
An Efficient Algorithm for finding high utility itemsets from online sell Sarode Nutan S, Kothavle Suhas R 1 Department of Computer Engineering, ICOER, Maharashtra, India 2 Department of Computer Engineering,
More informationPattern Mining in Frequent Dynamic Subgraphs
Pattern Mining in Frequent Dynamic Subgraphs Karsten M. Borgwardt, Hans-Peter Kriegel, Peter Wackersreuther Institute of Computer Science Ludwig-Maximilians-Universität Munich, Germany kb kriegel wackersr@dbs.ifi.lmu.de
More informationComparing the Performance of Frequent Itemsets Mining Algorithms
Comparing the Performance of Frequent Itemsets Mining Algorithms Kalash Dave 1, Mayur Rathod 2, Parth Sheth 3, Avani Sakhapara 4 UG Student, Dept. of I.T., K.J.Somaiya College of Engineering, Mumbai, India
More informationAnalysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data
Analysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data D.Radha Rani 1, A.Vini Bharati 2, P.Lakshmi Durga Madhuri 3, M.Phaneendra Babu 4, A.Sravani 5 Department
More informationClaSP: An Efficient Algorithm for Mining Frequent Closed Sequences
ClaSP: An Efficient Algorithm for Mining Frequent Closed Sequences Antonio Gomariz 1,, Manuel Campos 2,RoqueMarin 1, and Bart Goethals 3 1 Information and Communication Engineering Dept., University of
More informationGenerating Cross level Rules: An automated approach
Generating Cross level Rules: An automated approach Ashok 1, Sonika Dhingra 1 1HOD, Dept of Software Engg.,Bhiwani Institute of Technology, Bhiwani, India 1M.Tech Student, Dept of Software Engg.,Bhiwani
More informationDynamic Load Balancing of Unstructured Computations in Decision Tree Classifiers
Dynamic Load Balancing of Unstructured Computations in Decision Tree Classifiers A. Srivastava E. Han V. Kumar V. Singh Information Technology Lab Dept. of Computer Science Information Technology Lab Hitachi
More information