Efficient Distributed Data Mining using Intelligent Agents


Cristian Aflori and Florin Leon

Abstract: Data mining is the process of extracting interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases. Agents are defined as software or hardware entities that perform some set of tasks on behalf of users with some degree of autonomy. In order to work for somebody as an assistant, an agent has to include a certain amount of intelligence, which is the ability to choose among various courses of action, plan, communicate, adapt to changes in the environment, and learn from experience. In general, an intelligent agent can be described as consisting of a sensing element that can receive events, a recognizer or classifier that determines which event occurred, a set of logic ranging from hard-coded programs to rule-based inference, and a mechanism for taking action. The intelligent agent paradigm can be used to automate individual tasks in several steps of knowledge discovery, including data preparation, mining model selection and application, and output analysis. In the experimental setup, we discover association rules in a distributed database using intelligent agents. We apply an original approach for effectively mining association rules in a distributed setting: loosely coupled incremental methods. We compare the results with similar work in the field.

Index Terms: Association rules, distributed data mining, intelligent agents, loosely coupled incremental methods.

I. INTELLIGENT AGENTS

In recent years, distributed artificial intelligence has developed and diversified, as it is a research field that merges concepts and results from many disciplines, such as psychology, sociology and economics.
Its interdisciplinary nature makes it difficult to establish a unanimously accepted definition, but generally distributed artificial intelligence refers to "the study, construction, and application of multiagent systems, that is, systems in which several interacting, intelligent agents pursue some set of goals or perform some set of tasks" [1]. This development was facilitated by progress in computer science. Multitasking operating systems, communicating processes, distributed computing, and object-oriented programming languages supported the design, implementation and deployment of agent-based systems.

This work was supported in part by the National University Research Council under Grant AT no. 66/2004, "The prototype of GIS web system for data mining using intelligent agents". Cristian Aflori is a lecturer at the Gh. Asachi Technical University, Department of Automatic Control and Computer Science (caflori@cs.tuiasi.ro). Florin Leon is a PhD student at the Gh. Asachi Technical University, Department of Automatic Control and Computer Science (fleon@cs.tuiasi.ro).

Most classical artificial intelligence systems are static, with a predefined architecture, while agent-based systems change dynamically over time. Distributed artificial intelligence studies the issues related to the design of distributed, interactive systems. An autonomous agent can be considered a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to affect what it senses in the future [2]. The use of agents is mainly justified by the fact that they are a solution for managing complex systems. Because of their autonomy, they can act on behalf of the user rather than serve as a simple interface. Agents are defined as software or hardware entities that perform some set of tasks on behalf of users with some degree of autonomy [3].
In order to work for somebody as an assistant, an agent has to include a certain amount of intelligence, which is the ability to choose among various courses of action, plan, communicate, adapt to changes in the environment, and learn from experience. In general, an intelligent agent can be described as consisting of a sensing element that can receive events, a recognizer or classifier that determines which event occurred, a set of logic ranging from hard-coded programs to rule-based inference, and a mechanism for taking action [4], [5]. Other attributes that are important for the agent paradigm include mobility and learning. An agent is mobile if it can navigate through a network and perform tasks on remote machines. A learning agent adapts to the requirements of its user and automatically changes its behavior in response to environmental changes.

II. DATA MINING AGENTS

Data mining (DM), or knowledge discovery in databases (KDD), is the search for valuable information in large volumes of data: the exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules [6]. Data mining agents seek data and information based on the profile of the user and the instructions she gives. A group of flexible data mining agents can cooperate to discover knowledge from distributed sources. They are responsible for accessing data and extracting higher-level useful information from it. A data mining agent specializes in performing some activity in the domain of interest. Agents can work in parallel and share the information they have gathered so far.

The intelligent agent paradigm can be used to automate individual tasks in several steps of knowledge discovery, including data preparation, mining model selection and application, and output analysis. In data preparation, agents are especially useful for dealing with sensitivity to learning parameters, applying triggers for database updates, and handling missing or invalid data. In the mining model step, agent-based studies have been implemented for classification, clustering, summarization and generalization, which are learning tasks, and for rule generation, since current learning methods are able to find regularities in large data sets [8], [9]. An intelligent agent can use domain knowledge in the form of simple embedded rules and, by learning from the training data, reduce the need for domain experts. In the interpretation of what is learned, a scanning agent can go through the generated rules and facts and identify items that may contain valuable information [10]. Searching for patterns of interest by using learning and intelligence in classification, clustering, summarization and generalization can also be accomplished by intelligent agents. An agent can learn from a profile or from examples, and feedback from the user can be used to refine confidence in the agent's predictions. Data mining using neural networks and the possible use of intelligent agents in the data mining process are discussed in [4]. In understanding what is learned, an agent can act only as a fixed agent, or simply as a visualization program. The major advantage of intelligent agents in automating data mining is their potential support for online transaction data mining.
When new data is added to the database, an alarm or triggering agent can send events to the main mining application and to the learning task in it, so that the new data can be evaluated together with the already mined data. This automated decision support using triggers in data mining is called active data mining [11]. Since the main mining functions can be performed by learning methods, implementing and applying these methods with intelligent agents provides a flexible, modular and delegated solution. Additionally, because the paradigm is well suited to distributed environments, it can be used to parallelize data mining algorithms.

III. DISTRIBUTED DATA MINING

Nowadays, organizations have branches located in various geographical places, and each branch owns a local database that stores information about its own business. If the top-level management needs to mine novel information for decision making, there are two options. The first is to transfer all the data to a single database and mine it there. The second is to mine the databases independently and still generate information about the combination of the data in the multiple databases. The architecture of a data mining system using intelligent agents is presented in Fig. 1.

The development of distributed rule mining is a challenging and critical task, since it requires knowledge of all the data stored at different locations and the ability to combine partial results from individual databases into a single result [12]. The individual databases have to be analyzed to generate rules for local decisions. It would be easier for the organization to make decisions based on the rules generated by the individual branches, rather than using the raw data.
If the raw data from each of the individual databases were sent to a single database to generate the rules, certain useful rules, which would aid in making decisions about local branches, would be lost. If the raw data from all the databases were transferred to a single database, the individual branches would not generate rules with respect to their own data. In such a case the organization may miss certain rules that are prominent in some branches but not found in the others. Generating such rules would aid in making decisions about specific branches.

The patterns in multi-databases are divided into the following classes [13]:
- Local patterns: branches need to consider the original raw data in their datasets so they can identify local patterns for local decisions.
- High-vote patterns: these patterns are supported by most of the branches and are used for making global decisions.
- Exceptional patterns: these patterns are strongly supported by only a few branches and are used to create policies for specific branches.

Fig. 1. Mining rules in a distributed environment using intelligent agents. (Diagram labels: Top Level Branch (Central DB), Branches 1 to 4, Agents Communication.)

IV. INCREMENTAL ASSOCIATION

In this paper, we focus on an important data mining operation, association, which we perform by means of agents. Association rule induction is a powerful method that aims at finding regularities in the trends of the data. With the induction of association rules one tries to find sets of data instances that frequently appear together. Such information is usually expressed in the form of rules. An association rule expresses an association between (sets of) items. However, not every association rule is useful, only those that are expressive and reliable. Therefore, the standard measures for assessing association rules are the support and the confidence of a rule, both of which are computed from the support of certain item sets.

Our procedure is based on the Apriori algorithm [14] for extracting the rules. For the implementation, we used the variant described in the data mining book by Witten and Frank [15]. The algorithm is founded on the observation that if a given set of attributes S is not adequately supported, no superset of S will be adequately supported either, and consequently any effort to calculate the support for such supersets is wasted. For example, if we know that {A, B} is not supported, it follows that {A, B, C}, {A, B, D}, etc. will not be supported either.

The algorithm first determines the support for all single attributes (sets of cardinality 1) in the data set and deletes all the single attributes that are not adequately supported. Then, for all supported single attributes, it constructs pairs of attributes (sets of cardinality 2). If there are no pairs, it finishes; otherwise it determines the support for the constructed pairs. From all supported pairs of attributes, candidate sets of cardinality 3 (triples) are built. Again, if there are no triples, it ends; otherwise it determines the support for the constructed triples. It continues likewise until no more candidate sets can be produced.

In a distributed incremental approach, individual agents have access only to a limited number of transactions. Therefore, by employing the Apriori algorithm, they only have a partial view of the association rules.
However, they can memorize the rules with a lower support and gradually update them as they access more databases or communicate with other agents. In related work, the incremental mining algorithm [16], [17], [18] is used to find new frequent itemsets with minimal recomputation when transactions are added to or deleted from the transaction database. The algorithm relies on the concept of the negative border. The negative border [19] consists of all itemsets that were candidates but did not reach the minimum support. During each pass of the Apriori algorithm, the set of candidate itemsets $C_k$ is computed from the frequent itemsets $F_{k-1}$ in the join and prune steps of the algorithm. The negative border is the set of all those itemsets that were candidates in the k-th pass but did not satisfy the user-specified support, that is, $NBd(F_k) = C_k \setminus F_k$. The algorithm performs a full scan of the whole database only if the negative border of the frequent itemsets expands.

V. CASE STUDY

In order to demonstrate our approach, let us consider a database with four attributes, each taking nine possible values. The structure of the database is given in ARFF (Attribute-Relation File Format), a common format used to describe data for machine learning. The attributes are first, second, third and fourth, and the data rows begin as follows:

A1,B2,C4,D1
A3,B2,C1,D7
A1,B2,C2,D3
A2,B1,C2,D2
A2,B2,C2,D7
A1,B3,C4,D2
A1,B4,C2,D3
A2,B2,C4,D4
A1,B2,C3,D7
A3,B4,C4,D5

We split the main database into two databases, each containing an equal number of transactions:

A1,B2,C4,D1
A3,B2,C1,D7
A1,B2,C2,D3
A2,B1,C2,D2
A2,B2,C2,D7
A1,B3,C4,D2
A1,B4,C2,D3
A2,B2,C4,D4
A1,B2,C3,D7
A3,B4,C4,D5
A4,B4,C3,D5
A1,B4,C4,D7
A6,B2,C4,D6
A1,B5,C5,D7
A5,B2,C6,D6
A1,B6,C4,D5
A3,B2,C6,D7
A3,B5,C5,D5
A3,B3,C5,D4

We applied the standard Apriori algorithm to the two databases and retained the rules with a support higher than 1, because the databases are small.
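To make the procedure of Section IV concrete, the following is a minimal Python sketch (not the authors' J# implementation) of a level-wise Apriori pass that also collects the negative border. It runs only on the ten sample rows listed above, with the threshold "support higher than 1" read as a minimum count of 2. The names `ROWS`, `support_count` and `apriori_with_negative_border` are hypothetical and introduced only for illustration.

```python
from itertools import combinations

# The ten sample rows listed above (attribute order: first, second, third, fourth).
ROWS = [
    "A1,B2,C4,D1", "A3,B2,C1,D7", "A1,B2,C2,D3", "A2,B1,C2,D2", "A2,B2,C2,D7",
    "A1,B3,C4,D2", "A1,B4,C2,D3", "A2,B2,C4,D4", "A1,B2,C3,D7", "A3,B4,C4,D5",
]
TRANSACTIONS = [frozenset(row.split(",")) for row in ROWS]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori_with_negative_border(transactions, min_count):
    """Level-wise Apriori. Returns (frequent, border): two dicts that map
    each itemset to its support count."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]  # C_1
    frequent, border = {}, {}
    k = 1
    while candidates:
        counts = {c: support_count(c, transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}  # F_k
        frequent.update(level)
        # Negative border of this pass: candidates that missed the threshold.
        border.update({c: n for c, n in counts.items() if n < min_count})
        k += 1
        # Join step: unions of two frequent sets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent, border


frequent, border = apriori_with_negative_border(TRANSACTIONS, min_count=2)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset])
```

On these ten rows the only frequent 3-itemset is {A1, C2, D3} with count 2, which is consistent with the a1/c2/d3 rules of support 2 reported for the first database. The sketch does not exclude candidates that combine two values of the same attribute; they simply receive a count of 0 and land in the negative border. An incremental miner in the spirit of [16] would keep `border` between updates and rescan the full database only when an itemset from it becomes frequent.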
The first database gives the following rules:

1) fourth=d7 6 ==> second=b2 6
2) first=a1 fourth=d7 4 ==> second=b2 4
3) third=c4 fourth=d7 3 ==> first=a1 second=b2 3
4) first=a1 third=c4 fourth=d7 3 ==> second=b2 3
5) second=b2 third=c4 fourth=d7 3 ==> first=a1 3
6) third=c4 fourth=d7 3 ==> second=b2 3
7) third=c4 fourth=d7 3 ==> first=a1 3
8) fourth=d3 2 ==> first=a1 third=c2 2
9) first=a1 third=c2 2 ==> fourth=d3 2
10) first=a1 fourth=d3 2 ==> third=c2 2
11) third=c2 fourth=d3 2 ==> first=a1 2
12) fourth=d3 2 ==> third=c2 2
13) fourth=d3 2 ==> first=a1 2

The second database gives the following rules:

1) third=c4 fourth=d7 5 ==> first=a1 5
2) first=a1 second=b2 4 ==> third=c4 fourth=d7 4
3) first=a1 second=b2 third=c4 4 ==> fourth=d7 4
4) first=a1 second=b2 fourth=d7 4 ==> third=c4 4
5) second=b2 third=c4 fourth=d7 4 ==> first=a1 4
6) first=a1 second=b2 4 ==> fourth=d7 4
7) first=a1 second=b2 4 ==> third=c4 4
8) second=b5 2 ==> third=c5 2
9) fourth=d6 2 ==> second=b2 2
10) third=c6 2 ==> second=b2 2

These results can be combined in order to produce a common set of rules by adding the supports of the premises and, respectively, the supports of the conclusions:

$$\mathrm{Supp}_p = \sum_i \mathrm{Supp}_{p_i} \qquad (1)$$

$$\mathrm{Supp}_c = \sum_i \mathrm{Supp}_{c_i} \qquad (2)$$

We also computed the main database rules. The following are the first ten rules obtained:

1) third=c4 fourth=d7 8 ==> first=a1 8
2) second=b2 third=c4 fourth=d7 7 ==> first=a1 7
3) third=c4 fourth=d7 8 ==> first=a1 second=b2
4) first=a1 second=b2 third=c4 8 ==> fourth=d7 7
5) first=a1 second=b2 fourth=d7 8 ==> third=c4 7
6) first=a1 third=c4 fourth=d7 8 ==> second=b2 7
7) third=c4 fourth=d7 8 ==> second=b2 7
8) fourth=d7 13 ==> second=b2 11 conf:(0.85)
9) first=a1 fourth=d7 10 ==> third=c4 8 conf:(0.8)
10) first=a1 second=b2 10 ==> fourth=d7 8 conf:(0.8)

One can observe that these rules can also be obtained by combining partial rules from the two databases (Table I). It is important to note that some global rules cannot appear in the partial database rules, because there are not enough transactions to form them. In one case (the third line of the table), we considered the premise and conclusion as interchangeable, because their confidence was 1. By merging all the selected partial rules one cannot obtain exactly the main rules. Had we memorized all the partial rules, we could have combined them into more precise rules, closer to those of the main database. The confidence factor can be recomputed by dividing the summed supports:

$$\mathrm{Conf} = \frac{\mathrm{Supp}_c}{\mathrm{Supp}_p} = \frac{\sum_i \mathrm{Supp}_{c_i}}{\sum_i \mathrm{Supp}_{p_i}} \qquad (3)$$

TABLE I. GENERATED ASSOCIATION RULES

Main database rule | First partial database rule | Second partial database rule
1. third=c4 fourth=d7 8 ==> first=a1 8 | 7. third=c4 fourth=d7 3 ==> first=a1 3 | 1. third=c4 fourth=d7 5 ==> first=a1 5
2. second=b2 third=c4 fourth=d7 7 ==> first=a1 7 | 5. second=b2 third=c4 fourth=d7 3 ==> first=a1 3 | 5. second=b2 third=c4 fourth=d7 4 ==> first=a1 4
3. third=c4 fourth=d7 8 ==> first=a1 second=b2 | 3. third=c4 fourth=d7 3 ==> first=a1 second=b2 3 | 2. first=a1 second=b2 4 ==> third=c4 fourth=d7 4
4. first=a1 second=b2 third=c4 8 ==> fourth=d7 7 | - | 3. first=a1 second=b2 third=c4 4 ==> fourth=d7 4
5. first=a1 second=b2 fourth=d7 8 ==> third=c4 7 | - | 4. first=a1 second=b2 fourth=d7 4 ==> third=c4 4
6. first=a1 third=c4 fourth=d7 8 ==> second=b2 7 | 4. first=a1 third=c4 fourth=d7 3 ==> second=b2 3 | -
7. third=c4 fourth=d7 8 ==> second=b2 7 | 6. third=c4 fourth=d7 3 ==> second=b2 3 | -
8. fourth=d7 13 ==> second=b2 11 conf:(0.85) | 1. fourth=d7 6 ==> second=b2 6 | -
9. first=a1 fourth=d7 10 ==> third=c4 8 conf:(0.8) | - | -
10. first=a1 second=b2 10 ==> fourth=d7 8 conf:(0.8) | - | 6. first=a1 second=b2 4 ==> fourth=d7 4

VI. PERFORMANCE EVALUATION

In an organization with several branches, it is crucial that top-level management have a complete, up-to-date picture of its activities all over the world. To achieve this goal, it is important to mine novel information efficiently in a distributed environment. For these reasons we need to measure the performance of the incremental association algorithm against the standard Apriori algorithm. The classic Apriori algorithm recomputes all the data each time a database increment arrives from a local database.

To measure the performance of the algorithms, an experiment was set up with the following parameters: a synthetic dataset in ARFF format and a server (P4 2.4 GHz, 1 GB RAM) running the Windows XP operating system. Both the classical Apriori and the incremental version are implemented in J# (the Microsoft version of Java for the .NET framework). The datasets are named according to the pattern TxxIyyDzzzK, where "xx" denotes the average number of items per transaction, "yy" denotes the average support of each item in the dataset and "zzzK" denotes the total number of transactions in thousands.

A percentage of the transactions of the database is considered as the original database, and the remaining transactions are added incrementally in percentages. The experiments are performed for 3 increments to the original database, because in most cases recomputation may turn out to be better than incremental mining during the initial iterations, until the size of the dataset grows considerably. Each increment is 10% of the total size. The initial size of the database is 700k transactions. The database is updated with 3 increments of 100k transactions each, and the resulting rules have support factors of 30%, 25% and 20%.

The efficiency of the incremental algorithm relative to the classical Apriori is presented in Figure 2, which shows a performance improvement for the incremental association algorithm of 48% for 30% support, decreasing to about 43% for 25% support and to 40% for 20% support. This performance is similar to that of related work on incremental association algorithms using the negative border concept [20]. The experiments show that incremental mining performs better than recomputation for larger datasets.

Fig. 2. Performance of the incremental association algorithm compared with the classical Apriori algorithm.

VII. CONCLUSION AND FUTURE WORK

This paper presented an original approach for efficiently mining association rules in a distributed environment using intelligent agents.
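As a compact illustration of the merging step at the core of this approach, equations (1) to (3) of Section V can be sketched in Python as follows. This is a minimal sketch rather than the authors' J# code: the tuple layout and the names `FIRST_DB`, `SECOND_DB` and `merge_partial_rules` are assumptions made for the example, and only three rule pairs from the case study are included.

```python
from collections import defaultdict

# A few partial rules copied from the case study in Section V, written as
# (premise, conclusion, premise_support, conclusion_support).
FIRST_DB = [
    ({"third=c4", "fourth=d7"}, {"first=a1"}, 3, 3),               # rule 7
    ({"second=b2", "third=c4", "fourth=d7"}, {"first=a1"}, 3, 3),  # rule 5
    ({"fourth=d7"}, {"second=b2"}, 6, 6),                          # rule 1
]
SECOND_DB = [
    ({"third=c4", "fourth=d7"}, {"first=a1"}, 5, 5),               # rule 1
    ({"second=b2", "third=c4", "fourth=d7"}, {"first=a1"}, 4, 4),  # rule 5
]


def merge_partial_rules(*rule_sets):
    """Merge rules with identical premise and conclusion across databases.

    Returns a dict mapping (premise, conclusion) to a tuple
    (summed premise support, summed conclusion support, confidence).
    """
    totals = defaultdict(lambda: [0, 0])
    for rules in rule_sets:
        for premise, conclusion, supp_p, supp_c in rules:
            key = (frozenset(premise), frozenset(conclusion))
            totals[key][0] += supp_p  # equation (1)
            totals[key][1] += supp_c  # equation (2)
    # Equation (3): confidence is the ratio of the summed supports.
    return {key: (p, c, c / p) for key, (p, c) in totals.items()}


merged = merge_partial_rules(FIRST_DB, SECOND_DB)
for (premise, conclusion), (supp_p, supp_c, conf) in merged.items():
    print(sorted(premise), supp_p, "==>", sorted(conclusion), supp_c, f"conf:({conf:.2f})")
```

For the first two rules the merged supports (8 and 7) coincide with main database rules 1 and 2. The rule fourth=d7 ==> second=b2 has no retained counterpart in the second database, so the merge yields supports 6 and 6 instead of the 13 and 11 found in the main database, which is the loss of precision discussed in Section V.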
The case study showed that the incremental algorithm produced almost the same rules as the classical recomputation algorithm. The performance of the incremental algorithm varies with the database size, the increment size and the support factor, but for large datasets the improvement is about 40% to 48% compared to the classical algorithm. As a future direction of research, we will consider forming sets of higher cardinality from the partial rules transported by the agents, instead of simply merging the existing rules into similar ones with higher support. We will also explore other incremental versions of association algorithms, implemented in the database itself (database-tight), and we will evaluate the algorithms in a real-world case. Finally, it would be very interesting to extend the data mining multiagent framework with the visualization features of a Geographic Information System and with capabilities for mining spatial data.

REFERENCES

[1] G. Weiß, S. Sen (eds.), Adaptation and Learning in Multiagent Systems, Berlin: Springer Verlag, 1996.
[2] S. Franklin, A. Graesser, Is it an Agent, or just a Program?: A Taxonomy for Autonomous Agents, Institute for Intelligent Systems, University of Memphis.
[3] S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, Prentice-Hall, 1995.
[4] J. P. Bigus, Data Mining with Neural Networks: Solving Business Problems from Application Development to Decision Support, McGraw-Hill.
[5] T. Dean, J. Allen, Y. Aloimonos, Artificial Intelligence: Theory and Practice, The Benjamin/Cummings Publishing Co. Inc.
[6] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From Data Mining to Knowledge Discovery: An Overview, in: Advances in Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, 1996.
[7] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
[8] T. R. Payne, P. Edwards, C. L. Green, Experience with Rule Induction and k-nearest Neighbor Methods for Interface Agents that Learn, IEEE Transactions on Knowledge and Data Engineering, vol. 9, no. 2, Mar/Apr 1997.
[9] J. Yang, P. Pai, V. Honavar, L. Miller, Mobile Intelligent Agents for Document Classification and Retrieval: A Machine Learning Approach, Proceedings of the European Symposium on Cybernetics and System Research, Vienna, Austria.
[10] H. S. Nwana, M. Wooldridge, Software Agent Technologies, in: Software Agents and Soft Computing: Towards Enhanced Machine Intelligence, Lecture Notes in Artificial Intelligence 1198, pp. 59-77.
[11] R. Agrawal, G. Psaila, Active Data Mining.
[12] S. Zhang, X. Wu, C. Zhang, Multi-Database Mining, IEEE Computational Intelligence Bulletin, vol. 2, no. 1, June 2003.
[13] X. Wu, S. Zhang, Synthesizing High-Frequency Rules from Different Data Sources, IEEE Transactions on Knowledge and Data Engineering.
[14] R. Agrawal, R. Srikant, Fast Algorithms for Mining Association Rules, Proc. 20th Int. Conf. Very Large Data Bases, VLDB, 1994.
[15] I. H. Witten, E. Frank, Data Mining: Practical machine learning tools with Java implementations, Morgan Kaufmann, San Francisco.
[16] S. Thomas, et al., An Efficient Algorithm for the Incremental Updation of Association Rules in Large Databases, in: Knowledge Discovery and Data Mining.
[17] B. Thuraisingham, A Primer for Understanding and Applying Data Mining, IEEE, vol. 2, no. 1.
[18] R. Agrawal, T. Imielinski, A. Swami, Mining Association Rules between sets of items in large databases, in: ACM SIGMOD International Conference on the Management of Data, Washington, D.C.
[19] H. Toivonen, Sampling Large Databases for Association Rules, in: Proc. Int. Conf. Very Large Data Bases, 1996, Morgan Kaufmann.
[20] Hima Valli Kona, Association Rule Mining Over Multiple Databases: Partitioned and Incremental Approaches, The University of Texas at Arlington.


More information

COMPARISON OF K-MEAN ALGORITHM & APRIORI ALGORITHM AN ANALYSIS

COMPARISON OF K-MEAN ALGORITHM & APRIORI ALGORITHM AN ANALYSIS ABSTRACT International Journal On Engineering Technology and Sciences IJETS COMPARISON OF K-MEAN ALGORITHM & APRIORI ALGORITHM AN ANALYSIS Dr.C.Kumar Charliepaul 1 G.Immanual Gnanadurai 2 Principal Assistant

More information

A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases *

A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases * A Decremental Algorithm for Maintaining Frequent Itemsets in Dynamic Databases * Shichao Zhang 1, Xindong Wu 2, Jilian Zhang 3, and Chengqi Zhang 1 1 Faculty of Information Technology, University of Technology

More information

Clustering Algorithms In Data Mining

Clustering Algorithms In Data Mining 2017 5th International Conference on Computer, Automation and Power Electronics (CAPE 2017) Clustering Algorithms In Data Mining Xiaosong Chen 1, a 1 Deparment of Computer Science, University of Vermont,

More information

Tadeusz Morzy, Maciej Zakrzewicz

Tadeusz Morzy, Maciej Zakrzewicz From: KDD-98 Proceedings. Copyright 998, AAAI (www.aaai.org). All rights reserved. Group Bitmap Index: A Structure for Association Rules Retrieval Tadeusz Morzy, Maciej Zakrzewicz Institute of Computing

More information

EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES

EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES D.Kerana Hanirex Research Scholar Bharath University Dr.M.A.Dorai Rangaswamy Professor,Dept of IT, Easwari Engg.College Abstract

More information

C-NBC: Neighborhood-Based Clustering with Constraints

C-NBC: Neighborhood-Based Clustering with Constraints C-NBC: Neighborhood-Based Clustering with Constraints Piotr Lasek Chair of Computer Science, University of Rzeszów ul. Prof. St. Pigonia 1, 35-310 Rzeszów, Poland lasek@ur.edu.pl Abstract. Clustering is

More information

An Efficient Reduced Pattern Count Tree Method for Discovering Most Accurate Set of Frequent itemsets

An Efficient Reduced Pattern Count Tree Method for Discovering Most Accurate Set of Frequent itemsets IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.8, August 2008 121 An Efficient Reduced Pattern Count Tree Method for Discovering Most Accurate Set of Frequent itemsets

More information

Appropriate Item Partition for Improving the Mining Performance

Appropriate Item Partition for Improving the Mining Performance Appropriate Item Partition for Improving the Mining Performance Tzung-Pei Hong 1,2, Jheng-Nan Huang 1, Kawuu W. Lin 3 and Wen-Yang Lin 1 1 Department of Computer Science and Information Engineering National

More information

Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules

Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules Manju Department of Computer Engg. CDL Govt. Polytechnic Education Society Nathusari Chopta, Sirsa Abstract The discovery

More information

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management Kranti Patil 1, Jayashree Fegade 2, Diksha Chiramade 3, Srujan Patil 4, Pradnya A. Vikhar 5 1,2,3,4,5 KCES

More information

Materialized Data Mining Views *

Materialized Data Mining Views * Materialized Data Mining Views * Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo 3a, 60-965 Poznan, Poland tel. +48 61

More information

CSE 626: Data mining. Instructor: Sargur N. Srihari. Phone: , ext. 113

CSE 626: Data mining. Instructor: Sargur N. Srihari.   Phone: , ext. 113 CSE 626: Data mining Instructor: Sargur N. Srihari E-mail: srihari@cedar.buffalo.edu Phone: 645-6164, ext. 113 1 What is Data Mining? Different perspectives: CSE, Business, IT As a field of research in

More information

Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm

Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm Expert Systems: Final (Research Paper) Project Daniel Josiah-Akintonde December

More information

Association Rule Learning

Association Rule Learning Association Rule Learning 16s1: COMP9417 Machine Learning and Data Mining School of Computer Science and Engineering, University of New South Wales March 15, 2016 COMP9417 ML & DM (CSE, UNSW) Association

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE Saravanan.Suba Assistant Professor of Computer Science Kamarajar Government Art & Science College Surandai, TN, India-627859 Email:saravanansuba@rediffmail.com

More information

A Comparative Study of Association Rules Mining Algorithms

A Comparative Study of Association Rules Mining Algorithms A Comparative Study of Association Rules Mining Algorithms Cornelia Győrödi *, Robert Győrödi *, prof. dr. ing. Stefan Holban ** * Department of Computer Science, University of Oradea, Str. Armatei Romane

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3

More information

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique Anotai Siltepavet 1, Sukree Sinthupinyo 2 and Prabhas Chongstitvatana 3 1 Computer Engineering, Chulalongkorn University,

More information

Structure of Association Rule Classifiers: a Review

Structure of Association Rule Classifiers: a Review Structure of Association Rule Classifiers: a Review Koen Vanhoof Benoît Depaire Transportation Research Institute (IMOB), University Hasselt 3590 Diepenbeek, Belgium koen.vanhoof@uhasselt.be benoit.depaire@uhasselt.be

More information

Elena Marchiori Free University Amsterdam, Faculty of Science, Department of Mathematics and Computer Science, Amsterdam, The Netherlands

Elena Marchiori Free University Amsterdam, Faculty of Science, Department of Mathematics and Computer Science, Amsterdam, The Netherlands DATA MINING Elena Marchiori Free University Amsterdam, Faculty of Science, Department of Mathematics and Computer Science, Amsterdam, The Netherlands Keywords: Data mining, knowledge discovery in databases,

More information

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique www.ijcsi.org 29 Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique Anotai Siltepavet 1, Sukree Sinthupinyo 2 and Prabhas Chongstitvatana 3 1 Computer Engineering, Chulalongkorn

More information

Data Mining Part 3. Associations Rules

Data Mining Part 3. Associations Rules Data Mining Part 3. Associations Rules 3.2 Efficient Frequent Itemset Mining Methods Fall 2009 Instructor: Dr. Masoud Yaghini Outline Apriori Algorithm Generating Association Rules from Frequent Itemsets

More information

SD-Map A Fast Algorithm for Exhaustive Subgroup Discovery

SD-Map A Fast Algorithm for Exhaustive Subgroup Discovery SD-Map A Fast Algorithm for Exhaustive Subgroup Discovery Martin Atzmueller and Frank Puppe University of Würzburg, 97074 Würzburg, Germany Department of Computer Science Phone: +49 931 888-6739, Fax:

More information

A Novel method for Frequent Pattern Mining

A Novel method for Frequent Pattern Mining A Novel method for Frequent Pattern Mining K.Rajeswari #1, Dr.V.Vaithiyanathan *2 # Associate Professor, PCCOE & Ph.D Research Scholar SASTRA University, Tanjore, India 1 raji.pccoe@gmail.com * Associate

More information

Mining of Web Server Logs using Extended Apriori Algorithm

Mining of Web Server Logs using Extended Apriori Algorithm International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

Induction of Association Rules: Apriori Implementation

Induction of Association Rules: Apriori Implementation 1 Induction of Association Rules: Apriori Implementation Christian Borgelt and Rudolf Kruse Department of Knowledge Processing and Language Engineering School of Computer Science Otto-von-Guericke-University

More information

Parallel Implementation of Apriori Algorithm Based on MapReduce

Parallel Implementation of Apriori Algorithm Based on MapReduce International Journal of Networked and Distributed Computing, Vol. 1, No. 2 (April 2013), 89-96 Parallel Implementation of Apriori Algorithm Based on MapReduce Ning Li * The Key Laboratory of Intelligent

More information

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining Miss. Rituja M. Zagade Computer Engineering Department,JSPM,NTC RSSOER,Savitribai Phule Pune University Pune,India

More information

MySQL Data Mining: Extending MySQL to support data mining primitives (demo)

MySQL Data Mining: Extending MySQL to support data mining primitives (demo) MySQL Data Mining: Extending MySQL to support data mining primitives (demo) Alfredo Ferro, Rosalba Giugno, Piera Laura Puglisi, and Alfredo Pulvirenti Dept. of Mathematics and Computer Sciences, University

More information

Analysis of a Population of Diabetic Patients Databases in Weka Tool P.Yasodha, M. Kannan

Analysis of a Population of Diabetic Patients Databases in Weka Tool P.Yasodha, M. Kannan International Journal of Scientific & Engineering Research Volume 2, Issue 5, May-2011 1 Analysis of a Population of Diabetic Patients Databases in Weka Tool P.Yasodha, M. Kannan Abstract - Data mining

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 4, April 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Discovering Knowledge

More information

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer Data Mining George Karypis Department of Computer Science Digital Technology Center University of Minnesota, Minneapolis, USA. http://www.cs.umn.edu/~karypis karypis@cs.umn.edu Overview Data-mining What

More information

PRIVACY PRESERVING IN DISTRIBUTED DATABASE USING DATA ENCRYPTION STANDARD (DES)

PRIVACY PRESERVING IN DISTRIBUTED DATABASE USING DATA ENCRYPTION STANDARD (DES) PRIVACY PRESERVING IN DISTRIBUTED DATABASE USING DATA ENCRYPTION STANDARD (DES) Jyotirmayee Rautaray 1, Raghvendra Kumar 2 School of Computer Engineering, KIIT University, Odisha, India 1 School of Computer

More information

Fuzzy Cognitive Maps application for Webmining

Fuzzy Cognitive Maps application for Webmining Fuzzy Cognitive Maps application for Webmining Andreas Kakolyris Dept. Computer Science, University of Ioannina Greece, csst9942@otenet.gr George Stylios Dept. of Communications, Informatics and Management,

More information

Performance Based Study of Association Rule Algorithms On Voter DB

Performance Based Study of Association Rule Algorithms On Voter DB Performance Based Study of Association Rule Algorithms On Voter DB K.Padmavathi 1, R.Aruna Kirithika 2 1 Department of BCA, St.Joseph s College, Thiruvalluvar University, Cuddalore, Tamil Nadu, India,

More information

The Fuzzy Search for Association Rules with Interestingness Measure

The Fuzzy Search for Association Rules with Interestingness Measure The Fuzzy Search for Association Rules with Interestingness Measure Phaichayon Kongchai, Nittaya Kerdprasop, and Kittisak Kerdprasop Abstract Association rule are important to retailers as a source of

More information

Data Access Paths for Frequent Itemsets Discovery

Data Access Paths for Frequent Itemsets Discovery Data Access Paths for Frequent Itemsets Discovery Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science {marekw, mzakrz}@cs.put.poznan.pl Abstract. A number

More information

Comparative Study of Subspace Clustering Algorithms

Comparative Study of Subspace Clustering Algorithms Comparative Study of Subspace Clustering Algorithms S.Chitra Nayagam, Asst Prof., Dept of Computer Applications, Don Bosco College, Panjim, Goa. Abstract-A cluster is a collection of data objects that

More information

Association Rule Mining. Entscheidungsunterstützungssysteme

Association Rule Mining. Entscheidungsunterstützungssysteme Association Rule Mining Entscheidungsunterstützungssysteme Frequent Pattern Analysis Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set

More information

Study on Mining Weighted Infrequent Itemsets Using FP Growth

Study on Mining Weighted Infrequent Itemsets Using FP Growth www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 4 Issue 6 June 2015, Page No. 12719-12723 Study on Mining Weighted Infrequent Itemsets Using FP Growth K.Hemanthakumar

More information

Dynamic Load Balancing of Unstructured Computations in Decision Tree Classifiers

Dynamic Load Balancing of Unstructured Computations in Decision Tree Classifiers Dynamic Load Balancing of Unstructured Computations in Decision Tree Classifiers A. Srivastava E. Han V. Kumar V. Singh Information Technology Lab Dept. of Computer Science Information Technology Lab Hitachi

More information

Optimization using Ant Colony Algorithm

Optimization using Ant Colony Algorithm Optimization using Ant Colony Algorithm Er. Priya Batta 1, Er. Geetika Sharmai 2, Er. Deepshikha 3 1Faculty, Department of Computer Science, Chandigarh University,Gharaun,Mohali,Punjab 2Faculty, Department

More information

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm Marek Wojciechowski, Krzysztof Galecki, Krzysztof Gawronek Poznan University of Technology Institute of Computing Science ul.

More information

Machine Learning: Symbolische Ansätze

Machine Learning: Symbolische Ansätze Machine Learning: Symbolische Ansätze Unsupervised Learning Clustering Association Rules V2.0 WS 10/11 J. Fürnkranz Different Learning Scenarios Supervised Learning A teacher provides the value for the

More information

Hiding Sensitive Predictive Frequent Itemsets

Hiding Sensitive Predictive Frequent Itemsets Hiding Sensitive Predictive Frequent Itemsets Barış Yıldız and Belgin Ergenç Abstract In this work, we propose an itemset hiding algorithm with four versions that use different heuristics in selecting

More information

Mining Temporal Association Rules in Network Traffic Data

Mining Temporal Association Rules in Network Traffic Data Mining Temporal Association Rules in Network Traffic Data Guojun Mao Abstract Mining association rules is one of the most important and popular task in data mining. Current researches focus on discovering

More information

A Further Study in the Data Partitioning Approach for Frequent Itemsets Mining

A Further Study in the Data Partitioning Approach for Frequent Itemsets Mining A Further Study in the Data Partitioning Approach for Frequent Itemsets Mining Son N. Nguyen, Maria E. Orlowska School of Information Technology and Electrical Engineering The University of Queensland,

More information

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,

More information

Mining Generalised Emerging Patterns

Mining Generalised Emerging Patterns Mining Generalised Emerging Patterns Xiaoyuan Qian, James Bailey, Christopher Leckie Department of Computer Science and Software Engineering University of Melbourne, Australia {jbailey, caleckie}@csse.unimelb.edu.au

More information

On Multiple Query Optimization in Data Mining

On Multiple Query Optimization in Data Mining On Multiple Query Optimization in Data Mining Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo 3a, 60-965 Poznan, Poland {marek,mzakrz}@cs.put.poznan.pl

More information

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10) CMPUT 391 Database Management Systems Data Mining Textbook: Chapter 17.7-17.11 (without 17.10) University of Alberta 1 Overview Motivation KDD and Data Mining Association Rules Clustering Classification

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data

Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data Shilpa Department of Computer Science & Engineering Haryana College of Technology & Management, Kaithal, Haryana, India

More information

A Quantified Approach for large Dataset Compression in Association Mining

A Quantified Approach for large Dataset Compression in Association Mining IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 15, Issue 3 (Nov. - Dec. 2013), PP 79-84 A Quantified Approach for large Dataset Compression in Association Mining

More information

Data Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei

Data Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei Data Mining Chapter 1: Introduction Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei 1 Any Question? Just Ask 3 Chapter 1. Introduction Why Data Mining? What Is Data Mining? A Multi-Dimensional

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics

More information

USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS

USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS INFORMATION SYSTEMS IN MANAGEMENT Information Systems in Management (2017) Vol. 6 (3) 213 222 USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS PIOTR OŻDŻYŃSKI, DANUTA ZAKRZEWSKA Institute of Information

More information

Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati

Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati Analytical Representation on Secure Mining in Horizontally Distributed Database Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering

More information

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset M.Hamsathvani 1, D.Rajeswari 2 M.E, R.Kalaiselvi 3 1 PG Scholar(M.E), Angel College of Engineering and Technology, Tiruppur,

More information

Gurpreet Kaur 1, Naveen Aggarwal 2 1,2

Gurpreet Kaur 1, Naveen Aggarwal 2 1,2 Association Rule Mining in XML databases: Performance Evaluation and Analysis Gurpreet Kaur 1, Naveen Aggarwal 2 1,2 Department of Computer Science & Engineering, UIET Panjab University Chandigarh. E-mail

More information

Study on Apriori Algorithm and its Application in Grocery Store

Study on Apriori Algorithm and its Application in Grocery Store Study on Apriori Algorithm and its Application in Grocery Store Pragya Agarwal Department of CSE ASET,Amity University Sector-125,Noida,UP,India Madan Lal Yadav Department of CSE ASET,Amity University

More information

Association Rules Mining using BOINC based Enterprise Desktop Grid

Association Rules Mining using BOINC based Enterprise Desktop Grid Association Rules Mining using BOINC based Enterprise Desktop Grid Evgeny Ivashko and Alexander Golovin Institute of Applied Mathematical Research, Karelian Research Centre of Russian Academy of Sciences,

More information

SEQUENTIAL PATTERN MINING FROM WEB LOG DATA

SEQUENTIAL PATTERN MINING FROM WEB LOG DATA SEQUENTIAL PATTERN MINING FROM WEB LOG DATA Rajashree Shettar 1 1 Associate Professor, Department of Computer Science, R. V College of Engineering, Karnataka, India, rajashreeshettar@rvce.edu.in Abstract

More information

Mining Association Rules with Item Constraints. Ramakrishnan Srikant and Quoc Vu and Rakesh Agrawal. IBM Almaden Research Center

Mining Association Rules with Item Constraints. Ramakrishnan Srikant and Quoc Vu and Rakesh Agrawal. IBM Almaden Research Center Mining Association Rules with Item Constraints Ramakrishnan Srikant and Quoc Vu and Rakesh Agrawal IBM Almaden Research Center 650 Harry Road, San Jose, CA 95120, U.S.A. fsrikant,qvu,ragrawalg@almaden.ibm.com

More information

A Novel Texture Classification Procedure by using Association Rules

A Novel Texture Classification Procedure by using Association Rules ITB J. ICT Vol. 2, No. 2, 2008, 03-4 03 A Novel Texture Classification Procedure by using Association Rules L. Jaba Sheela & V.Shanthi 2 Panimalar Engineering College, Chennai. 2 St.Joseph s Engineering

More information

Probabilistic Abstraction Lattices: A Computationally Efficient Model for Conditional Probability Estimation

Probabilistic Abstraction Lattices: A Computationally Efficient Model for Conditional Probability Estimation Probabilistic Abstraction Lattices: A Computationally Efficient Model for Conditional Probability Estimation Daniel Lowd January 14, 2004 1 Introduction Probabilistic models have shown increasing popularity

More information

Building a Concept Hierarchy from a Distance Matrix

Building a Concept Hierarchy from a Distance Matrix Building a Concept Hierarchy from a Distance Matrix Huang-Cheng Kuo 1 and Jen-Peng Huang 2 1 Department of Computer Science and Information Engineering National Chiayi University, Taiwan 600 hckuo@mail.ncyu.edu.tw

More information

2002 Journal of Software.. (stacking).

2002 Journal of Software.. (stacking). 1000-9825/2002/13(02)0245-05 2002 Journal of Software Vol13, No2,,, (,200433) E-mail: {wyji,ayzhou,zhangl}@fudaneducn http://wwwcsfudaneducn : (GA) (stacking), 2,,, : ; ; ; ; : TP18 :A, [1],,, :,, :,,,,

More information

Product presentations can be more intelligently planned

Product presentations can be more intelligently planned Association Rules Lecture /DMBI/IKI8303T/MTI/UI Yudho Giri Sucahyo, Ph.D, CISA (yudho@cs.ui.ac.id) Faculty of Computer Science, Objectives Introduction What is Association Mining? Mining Association Rules

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab M. Duwairi* and Khaleifah Al.jada'** * Department of Computer Information Systems, Jordan University of Science and Technology, Irbid 22110,

More information

STUDY ON FREQUENT PATTEREN GROWTH ALGORITHM WITHOUT CANDIDATE KEY GENERATION IN DATABASES

STUDY ON FREQUENT PATTEREN GROWTH ALGORITHM WITHOUT CANDIDATE KEY GENERATION IN DATABASES STUDY ON FREQUENT PATTEREN GROWTH ALGORITHM WITHOUT CANDIDATE KEY GENERATION IN DATABASES Prof. Ambarish S. Durani 1 and Mrs. Rashmi B. Sune 2 1 Assistant Professor, Datta Meghe Institute of Engineering,

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN: IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information