CHAPTER-5 A HASH BASED APPROACH FOR FREQUENT PATTERN MINING TO IDENTIFY SUSPICIOUS FINANCIAL PATTERNS
- Lynne Cummings
5.1 INTRODUCTION

Money laundering is a criminal activity used to disguise black money as white money. Technology is advancing rapidly, and this rapid change brings both merits and demerits. The advent of e-commerce has globalized the world, and a single button click can now carry out transactions of enormous value. Detecting financial fraud is very important because it poses a threat not only to financial institutions but also to the nation. Traditional investigative techniques aimed at uncovering patterns consume numerous man-hours. Data mining techniques are well suited to identifying trends and patterns in large datasets, which often comprise hundreds or even thousands of complex hidden relationships.

Despite the guidelines issued by governing bodies such as the Reserve Bank of India and the Securities and Exchange Board of India, these guidelines are often violated. In India, all banks must submit the list of transactions that are not in line with the Reserve Bank of India guidelines to the Financial Intelligence Unit for further scrutiny. The transactions pertaining to a bank may be either intrabank or interbank, and banks cannot request any investigation unless a foolproof system exists for identifying money laundering activity.

Given these facts, money laundering is a serious threat to financial institutions and to the nation, because it lets criminals carry out illegal activities while hiding their identity. Many anti-money-laundering techniques have been proposed, but they have failed to act efficiently. At present, all adopted anti-money-laundering solutions are rule based and consume numerous man-hours.

In the Indian scenario, when an individual is considered, banks use the guidelines given by the Reserve Bank of India to determine the few transactions that seem suspicious and send them to the Financial Intelligence Unit (FIU).
The FIU verifies whether each transaction is actually suspicious. This process is very time consuming and is not suited to tackling dirty proceeds immediately. Hence it is very important to construct an efficient anti-money-laundering tool that helps banks report suspicious transactions. This module aims to improve the efficiency of the existing anti-money-laundering techniques by identifying the suspicious accounts in the layering stage of the money laundering process, generating frequent transactional datasets using hash based mining. The generated frequent datasets are then used in a graph theoretic approach to identify the traversal path of the suspicious transactions.

The major idea of the system is to generate frequent 2-item sets on the transactional database using a hash based technique. After the hash based technique is applied, a graph theoretic approach identifies the sequential traversal path among the suspicious accounts found in the frequent transactional datasets. The graph theoretic approach is also applied to identify the agent and the integrator in the layering stage of money laundering.

The main purposes of this system are:
i) To prevent criminal elements from using the banking system for money laundering activities.
ii) To enable the bank to know and understand its customers and their financial dealings better, which in turn helps the bank manage risks prudently.
iii) To put in place appropriate controls for detection and reporting of suspicious activities in accordance with applicable laws and laid-down procedures.

5.2 METHODOLOGY

The proposed system uses a hashing technique to generate frequent accounts. We work on transactional data from multiple banks. The data of each individual bank, stored in db1, db2, and so on, is combined to form a single large database. The data of this large database is then pre-processed to obtain data free from null, missing, or incorrect values. A hash based technique is applied to the transactional dataset to obtain a candidate set of reduced size. From this reduced candidate set we obtain the frequent 2-item sets. These frequent 2-item sets form the edges of the graph. Applying the longest-path algorithm for a directed acyclic graph gives the path along which a large amount has been transferred. On the basis of the in-degree and out-degree of each node, we determine the agent and the integrator.

Applying hash-based technique over apriori algorithm

A. Apriori algorithm

Association analysis is used to find relationships among data elements and to determine association rules. Two important association rule mining algorithms are Apriori and the hash based approach. They find associations using the minimum support and minimum confidence. Association analysis is divided into two subproblems: the first is to find the accounts whose occurrences meet the minimum support threshold, and the second is to generate association rules over large databases under the minimum confidence constraint. The Apriori algorithm works well only if the database is small and contains few transactions. Join indexing helps identify the links that exist among the suspicious transactions but cannot establish the associations among them. The Apriori algorithm relies on the Apriori property that every subset of a frequent account set must itself be frequent [108]. Using this principle, a frequent item set is generated. The Apriori algorithm works as follows.
Apriori algorithm for discovering frequent accounts

procedure apriori(T, minsupport)
{
    // T is the database of suspicious transactions occurring between accounts (STOA);
    // minsupport is the support threshold s
    L[1] = {frequent STs of size 1};
    for (k = 2; L[k-1] != Ø; k++)
    {
        // C[k] is the Cartesian product L[k-1] x L[k-1], after eliminating
        // any candidate that contains a (k-1)-subset that is not frequent
        C[k] = candidates generated from L[k-1];
        calculate the support values of the candidates in C[k];
        L[k] = {c in C[k] : support(c) >= s};
    }
    return U_k L[k];    // L[k]: frequent accounts of size k
}

Finally, all of the candidates satisfying the minimum support form the set of frequent accounts, L.
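The pseudocode above can be turned into a small runnable sketch. The version below is a generic textbook-style Apriori in Python, not the chapter's own implementation; the function name `apriori`, the sample transactions, and the threshold of 3 are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every element of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # L1: frequent itemsets of size 1.
    current = {}
    for item in {i for t in transactions for i in t}:
        count = support(frozenset([item]))
        if count >= min_support:
            current[frozenset([item])] = count

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            count = support(c)
            if count >= min_support:
                current[c] = count
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical sets of account IDs, used only to exercise the function.
sample = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 3}, {1, 3}]
result = apriori(sample, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this sample every single account and every pair reaches the threshold, while the triple {1, 2, 3} occurs only twice and is dropped, so the loop stops after k = 3.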
The Apriori algorithm is applied to the result set derived from join indexing. The procedure for generating the frequent transaction set is illustrated with a small financial dataset. The list of transactions found after applying join indexing is shown in Table 5.1.

Table-5.1: List of transactions from join indexing

UID   Account IDs    UID   Account IDs    UID   Account IDs    UID   Account IDs
T1    39, 12         T7    43, 16         T13   43, 16         T19   39, 12
T2    15, 12         T8    12, 16         T14   39, 12         T20   15, 12
T3    43, 16         T9    16, 19         T15   39, 12         T21   22, 39
T4    16, 22         T10   39, 12         T16   15, 12         T22   12, 16
T5    39, 12         T11   15, 12         T17   22, 39         T23   43, 16
T6    15, 12         T12   22, 39         T18   12, 16         T24   43, 16

The procedure for generating the frequent transaction set is described below.

Step 1: Scan all of the transactions and count the number of occurrences of each account ID.

Table 5.2: Number of occurrences of each account ID

List of STOAs    Support count
[entries not recoverable from the transcription]
Step 2: With a minimum support count of 3, the frequent STOAs are listed in Table 5.3.

Table 5.3: List of STOAs

List of account IDs    Support count
[rows partly lost in the transcription; the legible fragments are a pair beginning with 12 and a pair ending in 43 with support count 5]

Step 3: From the derived 2-item sets, the modified Apriori algorithm derives the 3-item sets.

Table 5.4: Generated 3-item sets after applying the Apriori algorithm

List of account IDs
12, 15, 39
12, 16, 43

Further association rules cannot be generated because no more information is available. The financial database contains only 2-item set associations, and by applying the Apriori algorithm we can generate only 3-item sets. The Apriori algorithm works well if there is a chain of associations in the transactional account set, but the situation is different for financial transactions: any financial transaction takes place between two players, not among many. The Apriori algorithm also has drawbacks in reducing the number of candidate k-item sets, in particular the candidate 2-item sets. Because reducing the 2-item sets is the key to improving performance, we use the hash based technique.
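Steps 1 and 2 above amount to counting how often each account pair occurs and keeping the pairs that reach the minimum support count. The sketch below recomputes those counts directly from the 24 transactions of Table 5.1. It assumes that each pair is unordered (so 39, 12 and 12, 39 would count as the same item set), and the helper names are hypothetical. The counts are derived from Table 5.1 as transcribed and are not copied from the chapter's own support tables.

```python
from collections import Counter

# Table 5.1, transcribed as account pairs in the order T1 .. T24.
TABLE_5_1 = [
    (39, 12), (15, 12), (43, 16), (16, 22), (39, 12), (15, 12),
    (43, 16), (12, 16), (16, 19), (39, 12), (15, 12), (22, 39),
    (43, 16), (39, 12), (39, 12), (15, 12), (22, 39), (12, 16),
    (39, 12), (15, 12), (22, 39), (12, 16), (43, 16), (43, 16),
]


def pair_support(transactions):
    """Step 1: count the occurrences of each unordered account pair."""
    return Counter(tuple(sorted(pair)) for pair in transactions)


def frequent_pairs(transactions, min_support):
    """Step 2: keep only the pairs whose count reaches min_support."""
    counts = pair_support(transactions)
    return {pair: n for pair, n in counts.items() if n >= min_support}


frequent = frequent_pairs(TABLE_5_1, min_support=3)
for pair, n in sorted(frequent.items()):
    print(pair, n)
```

Under these assumptions the pairs (16, 22) and (16, 19), which each occur once, fall below the threshold of 3 and are dropped.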
B. Hash based technique

This technique is used to reduce the size of the candidate k-item sets, Ck, for k > 1. The hash function used to create the hash table is

h(x, y) = ((order of x) * 10 + (order of y)) mod 7

For example, while scanning each transaction in the database to generate the frequent 1-item sets, L1, from the candidate 1-item sets in C1, we can generate all of the 2-item sets for each transaction, map them into the buckets of a hash table structure, and increase the corresponding bucket counts. The process continues in this way for the remaining transactions.

Identifying suspicious transactions path using graph theoretic approach

To resolve this situation in the hash based approach and to further investigate the flow of money, a graph theoretic approach is proposed. A graph is an ordered pair G = (V, E) comprising a set V of vertices or nodes together with a set E of edges or lines [17]. There are different types of graphs: a simple graph, in which two distinct vertices are connected by at most one edge; a multigraph, which allows multiple edges between two vertices; and a pseudograph, which also allows edges that connect a vertex to itself. Graphs are further divided into directed and undirected graphs. A directed graph is a graph in which every edge linking two vertices has a direction, whereas in an undirected graph the edges have no direction. In the proposed system, a directed graph G = (V, E) is used in which each node in V is an account and E comprises the associations between accounts.
5.2.3 Algorithm for the construction of graph for identifying the path

1. Read the transaction details derived from the hash based algorithm.
2. Add the account numbers as vertices of the graph.
3. Join two vertices if there is a transaction between the accounts.
4. Find the in-degree and out-degree of all vertices.
5. A vertex with in-degree zero is a source vertex and represents the agent in the placement phase of money laundering; a vertex with out-degree zero is a destination vertex and represents the integrator in the integration phase of money laundering.
6. All possible paths between the agent and the integrator give the layering information.

All the transactions are linked sequentially, and a graph is generated by treating each account in the frequent item sets as a node. Each link between transactions is assigned a weight that reflects the multiplicity of its occurrence and hence the strength of the path. The in-degree and out-degree of each node are then computed to determine the agent and the integrator.
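The six steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge list, the account labels A to D, and all function names are hypothetical, and the graph is assumed to be acyclic with edge weights standing for occurrence counts.

```python
def build_graph(weighted_edges):
    """Steps 1-3: accounts become vertices, transactions become directed edges."""
    graph = {}
    for src, dst, weight in weighted_edges:
        graph.setdefault(src, {})[dst] = weight
        graph.setdefault(dst, {})  # make sure sink accounts are vertices too
    return graph


def find_agents_and_integrators(graph):
    """Steps 4-5: in-degree 0 marks an agent, out-degree 0 an integrator."""
    in_degree = {v: 0 for v in graph}
    for neighbours in graph.values():
        for dst in neighbours:
            in_degree[dst] += 1
    agents = sorted(v for v in graph if in_degree[v] == 0)
    integrators = sorted(v for v in graph if not graph[v])
    return agents, integrators


def all_paths(graph, src, dst, path=()):
    """Step 6: enumerate every simple path from src to dst (depth-first)."""
    path = path + (src,)
    if src == dst:
        yield list(path)
        return
    for nxt in graph[src]:
        if nxt not in path:
            yield from all_paths(graph, nxt, dst, path)


def path_weight(graph, path):
    """Sum of the edge weights along a path."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


# Hypothetical weighted edges (source account, destination account, weight).
EDGES = [("A", "B", 3), ("A", "C", 1), ("B", "C", 2), ("B", "D", 4), ("C", "D", 5)]
graph = build_graph(EDGES)
agents, integrators = find_agents_and_integrators(graph)
paths = list(all_paths(graph, agents[0], integrators[0]))
for p in paths:
    print(" -> ".join(p), "weight", path_weight(graph, p))
```

Enumerating all simple paths is exponential in the worst case, so this form is only reasonable for small graphs such as the examples in this chapter.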
5.3 IMPLEMENTATION

Hash based technique over apriori algorithm:

A hash based technique can be used to reduce the size of the candidate k-item sets, Ck, for k > 1, because a hash function is applied to each item set of a transaction:

h(x, y) = ((order of x) * 10 + (order of y)) mod 7

Suppose we have the item set {A1, A4}. Then x = 1 and y = 4, so h(1, 4) = ((1 * 10) + 4) mod 7 = 14 mod 7 = 0, and {A1, A4} is placed in bucket address 0. The rest of the hash table is filled in the same way, and the bucket counts are recorded. If a bucket has a count less than the minimum support count, the whole bucket (that is, its entire contents) is discarded. The contents of all the undeleted buckets form the elements of the candidate set. The resulting candidate item set is smaller, so the database needs to be scanned fewer times to find the frequent item sets, which improves the efficiency of the Apriori algorithm.

Candidate 2-item set generation:

The contents of the undeleted hash table buckets are copied, and the duplicate transactions are eliminated. This gives the candidate 2-item set.

Transitivity relation

Because only two accounts are involved in a transaction at a time, we use the mathematical transitivity relation to find the chaining of accounts: if A->B and B->C, then A->B->C.

Frequent 3 item sets
From the transitivity relation we obtain the 3-item sets. Each of these item sets has an amount associated with it.

Generating a sequential traversal path:

From the frequent accounts we create the edges of the graph; the weight of each edge is equal to the amount transferred between the two accounts.

Longest path in a directed acyclic graph

There are many paths in the graph. To find the most suspicious path, we apply this algorithm and obtain the path together with its total amount.

Generating frequent accounts using hashing

To understand the approach, consider a small dataset of 22 transactions.

Table No-5.5: Dataset contents

Transaction_ID   From-to transaction   2-item set
1                A1->A2                {1,2}
2                A2->A3                {2,3}
3                A3->A4                {3,4}
4                A1->A4                {1,4}
5                A4->A6                {4,6}
6                A5->A6                {5,6}
7                A3->A5                {3,5}
8                A3->A6                {3,6}
9                A4->A5                {4,5}
10               A1->A2                {1,2}
11               A5->A6                {5,6}
12               A3->A5                {3,5}
13               A3->A6                {3,6}
14               A1->A2                {1,2}
15               A3->A5                {3,5}
16               A3->A6                {3,6}
17               A4->A5                {4,5}
18               A1->A2                {1,2}
19               A3->A5                {3,5}
20               A4->A5                {4,5}
21               A3->A4                {3,4}
22               A2->A3                {2,3}

The hash formula is applied to this set of 22 transactions:

H(x, y) = ((order of x) * 10 + (order of y)) mod 7

Here x = from_acc_id and y = to_acc_id. All 22 transactions are grouped into the different indexes of the hash table, and the bucket count is calculated for each bucket.

Table No 5.6: Bucket table with bucket counts

Bucket address    0     1     2     3     4     5     6
Bucket contents   1,4   3,6   2,3   4,5   4,6   1,2   3,4
                  5,6   3,6   2,3   4,5         1,2   3,4
                  5,6   3,6         4,5         1,2
                  3,5                           1,2
                  3,5
                  3,5
                  3,5
Bucket count      7     3     2     3     1     4     2
Enter the minimum bucket count. The buckets whose total count is less than the minimum bucket count are deleted with all their contents.

Minimum bucket count = 2

Here bucket 4 is deleted.

Table No-5.7: Bucket count for item sets and minimum support count

Item set   Bucket count
1,4        7
5,6        7
3,5        7
3,6        3
2,3        2
4,5        3
4,6        1 (*discarded)
1,2        4
3,4        2

The transactions left over in the buckets are then taken, and their actual counts in the database are recorded.
Table No-5.8: The bucket count and actual count are recorded

Item set   Bucket count   Actual count
1,4        7              1 (*discarded)
5,6        7              2
3,5        7              4
3,6        3              3
2,3        2              2
4,5        3              3
1,2        4              4
3,4        2              2

Enter a support count for the number of times a transaction occurs (say 2).

Minimum support count = 2

All the transactions that have occurred 2 or more times are taken into the frequent 2-item sets.

Table No-5.9: Frequent 2 accounts with their support counts

Frequent 2-item set   Support count
5,6                   2
3,5                   4
3,6                   3
2,3                   2
4,5                   3
1,2                   4
3,4                   2
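The hash-based steps of this worked example (Tables 5.5 to 5.9) can be reproduced with a short Python sketch. It assumes that each transaction is an ordered (from, to) pair of account order numbers, that the hash table has 7 buckets as in the formula, and that both thresholds are 2 as in the example; the function and variable names are hypothetical.

```python
from collections import Counter

# Table 5.5 as (from_account, to_account) order numbers, transactions 1 .. 22.
TRANSACTIONS = [
    (1, 2), (2, 3), (3, 4), (1, 4), (4, 6), (5, 6), (3, 5), (3, 6),
    (4, 5), (1, 2), (5, 6), (3, 5), (3, 6), (1, 2), (3, 5), (3, 6),
    (4, 5), (1, 2), (3, 5), (4, 5), (3, 4), (2, 3),
]


def h(x, y, buckets=7):
    """Hash function of the chapter: ((order of x) * 10 + (order of y)) mod 7."""
    return (x * 10 + y) % buckets


def hash_based_frequent_pairs(transactions, min_bucket_count, min_support):
    # Pass 1: hash every pair into its bucket and count the hits per bucket.
    bucket_count = Counter(h(x, y) for x, y in transactions)
    # A pair stays a candidate only if its whole bucket reaches the minimum count.
    candidates = {p for p in transactions if bucket_count[h(*p)] >= min_bucket_count}
    # Pass 2: count the actual occurrences of the surviving candidates only.
    actual = Counter(p for p in transactions if p in candidates)
    frequent = {p: n for p, n in actual.items() if n >= min_support}
    return bucket_count, candidates, frequent


bucket_count, candidates, frequent = hash_based_frequent_pairs(
    TRANSACTIONS, min_bucket_count=2, min_support=2
)
print("bucket counts:", dict(sorted(bucket_count.items())))
print("frequent pairs:", dict(sorted(frequent.items())))
```

The pair (4, 6) is removed in the first pass because bucket 4 holds a single entry, while (1, 4) survives the bucket test only because it shares bucket 0 with (5, 6) and (3, 5); it is removed in the second pass once its actual count of 1 is known.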
The item sets listed in Table 5.9 are the frequent 2-transactions.

Finding the traversal path:

Various paths are identified by connecting all the frequent accounts as nodes.

[Figure: directed graph on the nodes A1 to A6. A1 has in-degree 0 (agent) and A6 has out-degree 0 (integrator). Edge weights: W12 = 4, W23 = 2, W34 = 2, W35 = 4, W36 = 3, W45 = 3, W56 = 2.]

Fig No-5.1: The graph of suspicious accounts

Some of the packages used are:

java.io.*: Java IO is an API that comes with Java and is targeted at reading and writing data (input and output).
java.util.Iterator: A Java iterator is used to generate successive elements from a series.
java.util.Vector: The Vector class implements a growable array of objects. Like an array, it contains components that can be accessed using an integer index. However, the size of a Vector can grow or shrink as needed to accommodate adding and removing items after the Vector has been created.
java.sql.*: Provides the API for accessing and processing data stored in a data source (usually a relational database) using the Java programming language. This API includes a framework whereby different drivers can be installed dynamically to access different data sources.
java.util.Scanner: The java.util.Scanner class is a simple text scanner that can parse primitive types and strings using regular expressions.

Database

We have maintained the databases in SQL Server Management Studio, creating the tables using SQL queries.

Dataset

We have four datasets:
1) TwentyTwo, having twenty-two transactions.
2) FiveThousand, having five thousand transactions.
3) TenThousand, having ten thousand transactions.
4) SeventeenThousand, having seventeen thousand transactions.

These four datasets are created as four Transactions tables with the same attributes but different numbers of records.

Tables: 1) Bank 2) Customer 3) Accounts 4) Transactions

All the data inserted into these tables is synthetic and free from null and missing values. Four transaction tables are created to store the datasets of varied size.
Each table has a primary key: the Bank table has bank_id, the Customer table has customer_id, the Account table has account_id, and the Transaction table has trans_id.

Table insertion: example queries

insert into bank values('sbi','mvp','visakapatnam','a.p')
insert into customer values('bharath kumar chowhan','it.employee','6','aaxpd7874l',' ','ranga reddy',' ','male')
insert into account values(' ','6','1',' ',' ')
insert into transactions values(103,51,'12/1/2013 9:00:00 AM',20173,'initiated',null,null)

5.4 SUMMARY

By considering synthetic transaction datasets of different sizes, we could address the issue of detecting suspicious accounts with the existing anti-money-laundering techniques. We succeeded in identifying the suspicious accounts in the layering stage of the money laundering process by generating frequent transactional datasets using hash based mining. We were also able to identify the traversal path of the suspicious transactions using the longest path in a directed acyclic graph. The degree of each node, examined through graph theory, is the basis on which we identify the agent and the integrator.
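As a closing illustration of the traversal-path step summarized above, the following Python sketch finds the longest (heaviest) path in a directed acyclic graph by relaxing edges in topological order. It is a standard textbook method and not necessarily the chapter's implementation. The edges are the frequent pairs of Table 5.9, directed as in Table 5.5, and the weights are assumed to be the support counts; the chapter also describes weights as amounts transferred, which are not given. The names `EDGES` and `longest_path` are hypothetical.

```python
from collections import defaultdict, deque

# Frequent pairs of Table 5.9 as directed edges (from, to) -> support count.
EDGES = {(1, 2): 4, (2, 3): 2, (3, 4): 2, (3, 5): 4, (3, 6): 3, (4, 5): 3, (5, 6): 2}


def longest_path(edges):
    """Return (path, total weight) of the heaviest path in a DAG."""
    successors = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for (u, v), w in edges.items():
        successors[u].append((v, w))
        in_degree[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm: repeatedly remove vertices whose in-degree is zero.
    queue = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in successors[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("the graph contains a cycle")

    # Relax the edges in topological order, remembering each best predecessor.
    best = {n: 0 for n in nodes}
    parent = {n: None for n in nodes}
    for u in order:
        for v, w in successors[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                parent[v] = u

    # Walk back from the vertex with the largest accumulated weight.
    end = max(nodes, key=lambda n: best[n])
    total = best[end]
    path = []
    while end is not None:
        path.append(end)
        end = parent[end]
    return path[::-1], total


path, total = longest_path(EDGES)
print(" -> ".join(f"A{n}" for n in path), "total weight", total)
```

Under these assumptions the heaviest path runs from the agent A1 to the integrator A6 through every intermediate account. Unlike enumerating all paths, this approach visits each edge once, which matters for the larger datasets.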
1 Data Mining for Knowledge Management Association Rules Themis Palpanas University of Trento http://disi.unitn.eu/~themis 1 Thanks for slides to: Jiawei Han George Kollios Zhenyu Lu Osmar R. Zaïane Mohammad
More informationPart 1: Written Questions (60 marks):
COMP 352: Data Structure and Algorithms Fall 2016 Department of Computer Science and Software Engineering Concordia University Combined Assignment #3 and #4 Due date and time: Sunday November 27 th 11:59:59
More informationBINARY DECISION TREE FOR ASSOCIATION RULES MINING IN INCREMENTAL DATABASES
BINARY DECISION TREE FOR ASSOCIATION RULES MINING IN INCREMENTAL DATABASES Amaranatha Reddy P, Pradeep G and Sravani M Department of Computer Science & Engineering, SoET, SPMVV, Tirupati ABSTRACT This
More informationCHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM. Please purchase PDF Split-Merge on to remove this watermark.
119 CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM 120 CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM 5.1. INTRODUCTION Association rule mining, one of the most important and well researched
More informationA Framework for Securing Databases from Intrusion Threats
A Framework for Securing Databases from Intrusion Threats R. Prince Jeyaseelan James Department of Computer Applications, Valliammai Engineering College Affiliated to Anna University, Chennai, India Email:
More informationPerformance Based Study of Association Rule Algorithms On Voter DB
Performance Based Study of Association Rule Algorithms On Voter DB K.Padmavathi 1, R.Aruna Kirithika 2 1 Department of BCA, St.Joseph s College, Thiruvalluvar University, Cuddalore, Tamil Nadu, India,
More informationAn Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining
An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining P.Subhashini 1, Dr.G.Gunasekaran 2 Research Scholar, Dept. of Information Technology, St.Peter s University,
More informationASSESSMENT LAYERED SECURITY