Identification of Data Cohesive Subsystems Using Data Mining Techniques


Carlos Montes de Oca and Doris L. Carver
Department of Computer Science, Louisiana State University
Baton Rouge, Louisiana, USA (moca,

Abstract

The activity of reengineering and maintaining large legacy systems involves the use of design recovery techniques to produce abstractions that facilitate understanding of the system. In this paper, we present an approach to design recovery based on data mining. This approach derives from the observation that data mining can discover unsuspected, non-trivial relationships among elements in large databases. This observation suggests that data mining can be used to elicit new knowledge about the design of a subject system and that it can be applied to large legacy systems. We describe the ISA methodology, which uses data mining to identify data cohesive subsystems. Using this approach, we were able to decompose COBOL systems into subsystems. Our experience shows that data mining can identify data cohesive subsystems without any previous knowledge of the subject system. Furthermore, data mining can produce meaningful results regardless of system size, making this approach especially appropriate for the analysis of large undocumented systems.

1. Introduction

Many software systems have supported the activities of large organizations for years, and these systems have grown old. Known as legacy systems, they show signs of deterioration. Generally speaking, the term legacy system refers to an old software system whose maintenance has become very expensive but which still performs a critical task for the enterprise. The problem grows daily because legacy systems need continual maintenance to respond to the demands of a constantly changing business environment. This situation promotes fast, unplanned modifications that inevitably make the problem worse.
Consequently, the maintenance of legacy systems continues to grow more complex and expensive.

One approach to coping with the legacy systems problem is reengineering, which addresses the problem in two stages. The first stage, reverse engineering, focuses on understanding the legacy system. The second stage, forward engineering, uses the information produced in the reverse engineering stage, adds new specifications, and rebuilds the legacy system using modern technologies.

A relevant subarea of reverse engineering is design recovery, which focuses on producing meaningful high level abstractions from the subject system [1]. To this end, design recovery may use any available source of information, such as source code, documentation, domain knowledge, and personal experience. Similarly, the identified abstractions may take different forms, such as module breakdowns, structure charts, entity-relationship diagrams, and formal specifications. Design recovery also plays an important role in maintenance and reuse by providing information that simplifies understanding of the system and the localization of reusable parts.

As suggested above, there are many approaches to design recovery. They differ in many respects, such as the sources of information they use, the extraction procedures and tools they employ, and the outcome they produce. In this paper, we present an approach to design recovery that is based on data mining. Data mining is the process of extracting patterns or models from data by applying specific algorithms [17]. We are interested in data mining techniques because they have features that are potentially helpful for reverse engineering and maintenance tasks. For instance, the data mining process finds non-trivial and previously unknown patterns in large databases; that is, data mining techniques are capable of revealing unknown classifications, associations, and sequences among data. Such information may be used to identify objects, to detect reusable components, or to find novel ways to relate a system's components. This feature is especially helpful when dealing with large undocumented systems.

In general, our research aims to use data mining techniques as a tool in the reverse engineering and maintenance domains. In particular, we explore the use of data mining techniques to recover the design of imperative legacy code. We describe a general method to apply data mining to design recovery tasks. The method consists of three steps: (1) design a database view of the subject system that is used as the input to the mining process, (2) select a data mining algorithm to mine the required knowledge, and (3) design algorithms that consolidate the results of the data mining process into a meaningful high level abstraction. To test the feasibility of this method, we have developed the ISA methodology. ISA identifies data cohesive subsystems based on mined association rules. We have been able to apply ISA to decompose COBOL systems into sets of subsystems.

Using data mining for design recovery offers two advantages. First, data mining is capable of producing a logical decomposition of a system without supporting information such as documentation or a priori knowledge about the functionality of the system. Second, data mining is designed to deal with large amounts of information; therefore, it can perform well regardless of the size of the subject system. These advantages are important, given the lack of documentation and the size of most legacy systems.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 explains the underlying ideas behind data mining and the motivation for pursuing this approach. Section 4 describes the ISA methodology, which is followed by a case study in Section 5. Section 6 contains the results of the case study, and Section 7 presents the conclusions.

2. Related work

Research in design recovery uses different approaches and produces diverse results. For example, although the preferred source of information for recovering the design is the source code, some research uses non-code sources such as data flow diagrams [2] and structured specifications [3].

Works that use the source code as the primary source of information take different approaches. Formal approaches use rigorous mathematical procedures and notations to extract a formal description of the subject system. The advantage of such a description is that it is precise, verifiable, and amenable to automation. Works in this category are described in [4] and [5]. Knowledge-based works add some type of knowledge representation to the design recovery process [6]. Other research addresses legacy systems by identifying objects; that is, the goal is to extract an object-oriented representation of the subject system [7], [8]. Moreover, there has been increased interest in object recovery, the identification of objects within procedural programs [9], [10]. Other related works focus on the production of a hierarchical design description [11], the identification of clichés [12], and the generation of an SSADM (Structured Systems Analysis and Design Method) representation [13].

The idea of using a database representation of the subject system is not new. For example, Chen [14] generates a relational view of C code to support software activities such as graphical views, subsystem extraction, binding analysis, dead code elimination, and program layering. Narat [15] uses a database to support maintenance of source code by producing cross-reference documentation. Grass [16] uses CIA++ (C++ Information Abstractor) to extract design information from C++ programs. CIA++ constructs a relational database that contains information obtained from C++ programs. Her aim is to perform object recovery by querying the relational database created by CIA++.
Although these works use a database representation of the subject system, they do not use data mining techniques to extract design information.

3. Data mining and design recovery

The underlying idea behind data mining is the extraction of useful information from large databases. For years, organizations have been collecting data as part of their normal operations. Consequently, they have created large databases that contain information meaningful to the organization, such as classifications, trends, and patterns. This information is a powerful basis for decision making, forecasting, and planning. Unfortunately, traditional query systems are not capable of extracting this "hidden" information; data mining technology emerged to extract it.

The overall process of discovering useful knowledge from large volumes of data is known as knowledge discovery in databases (KDD) [17]. This process encompasses several steps, ranging from understanding the problem domain and preparing the data to interpreting the mined patterns and consolidating the discovered knowledge. Data mining is one of these steps; it consists of applying appropriate data mining algorithms to extract the desired patterns from the databases. In this context, patterns are expressions that represent facts about the data contained in the database [18]. An example of such a pattern is "s% of the customers who purchase item A in one visit purchase items B and C in the following visit."
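The percentages in patterns of this kind are measured with two quantities, support and confidence, which Section 4.2 defines formally for association rules following [19]. The minimal sketch below computes both for a rule X ⇒ Y over a handful of hypothetical shopping baskets; the basket contents and the helper names `support` and `confidence` are illustrative assumptions, not part of the paper.

```python
from typing import FrozenSet, List

Basket = FrozenSet[str]

def support(baskets: List[Basket], itemset: FrozenSet[str]) -> float:
    """Fraction of all baskets that contain every item in `itemset`."""
    hits = sum(1 for b in baskets if itemset <= b)
    return hits / len(baskets)

def confidence(baskets: List[Basket], x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Rule X => Y: among baskets containing X, the fraction that also contain Y."""
    with_x = [b for b in baskets if x <= b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y <= b) / len(with_x)

# Hypothetical baskets, one per customer visit.
baskets = [
    frozenset({"A", "B", "D"}),
    frozenset({"A", "B", "D", "E"}),
    frozenset({"A", "C"}),
    frozenset({"A", "B", "D"}),
    frozenset({"B", "C"}),
]
rule_x, rule_y = frozenset({"A"}), frozenset({"B", "D"})
print(f"support={support(baskets, rule_x | rule_y):.0%}")       # support=60%
print(f"confidence={confidence(baskets, rule_x, rule_y):.0%}")  # confidence=75%
```

Here 60% of all baskets contain A, B, and D together, and 75% of the baskets that contain A also contain B and D; the latter is the c% in a rule such as "c% of the customers that buy product A also buy products B and D."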

The objectives of data mining algorithms include:

Classification. The objective is to classify a data item into one of several predefined classes. For example, data mining classification algorithms can be used to identify particular objects in huge image databases.

Clustering. The idea is to group data items into classes, or clusters, according to some similarity function. In this case, the data mining algorithm defines the classes, as opposed to classification, where the classes are predefined. For instance, data mining clustering algorithms can be used to identify groups of homogeneous people to help develop a marketing plan.

Associations. The objective is to find association rules of the form "c% of the customers that buy product A also buy products B and D." This kind of information can be used to design the floor plan of a store, to plan a marketing strategy, or even to forecast inventory levels.

Sequences. The idea is to find sequences of events. For example, "if event A occurs, then c% of the time events B and D occur within the next t units of time." This information can be used to forecast equipment failures and stock booms.

Accordingly, there are different data mining algorithms, each aimed at mining a specific kind of pattern (knowledge).

We are particularly interested in data mining because of the following observations. First, data mining can discover unsuspected, non-trivial relationships among data elements in large databases. Second, data mining techniques are capable of mining relevant information regardless of previous knowledge of the object of study. Finally, data mining is designed to work with large amounts of information. These features suggest that data mining has potential as a valuable tool for reengineering and maintenance tasks.
For instance, the first observation suggests that if a software system were represented as a database, it might be possible to apply data mining to unveil relevant relationships among system components. In other words, data mining could elicit new knowledge about a subject system. This information could support diverse reverse engineering and maintenance tasks, such as design recovery, object extraction, identification of reusable parts, and detection of repeated code. The second observation suggests that data mining techniques have the potential to produce novel ways to relate a system's components even without previous knowledge of the system's functionality and implementation details. Finally, the last observation implies that data mining can analyze large software systems at no detriment to performance. Indeed, the larger the system, the better the chances that data mining will produce significant information. Therefore, this approach seems well suited for reengineering and maintaining legacy systems.

4. The ISA methodology

This section describes our approach to design recovery. We first describe a general method to apply data mining to design recovery. Then, we present an instance of the method, called the ISA methodology (Identification of Subsystems based on Associations). The general method consists of the following three steps:

1. Define a database view of the system.
A database view of the system is a representation of the system, or a subset of it, using a database. The data to be loaded into the database may come from any source of information, but primarily from the source code (e.g., variables, programs, modules, files). The choice of database view determines the type of information that the data mining algorithms can mine. Consequently, this choice is critical to the success of the mining analysis, and it is made with a particular data mining algorithm in mind.

2. Perform data mining.
This step involves selecting and applying data mining algorithms to mine the database view of the system. The choice of algorithms depends on the specific information requirements of the design recovery process, such as associations, sequences, classifications, or clusters.

3. Consolidate and interpret results.
The outcome of the mining process is combined into meaningful knowledge from which the design of the system is constructed.

This method can be applied at different granularity levels; for instance, it could be used to analyze a program, a module, or a whole system. In addition, it can produce diverse high level abstractions depending on the specific instantiation of the database view, the mining algorithms, and the consolidation procedure.

To apply the method, we developed the ISA methodology. ISA is a system level methodology whose objective is to decompose a system into data cohesive subsystems. In this context, a subsystem is a subset of programs and files. A data cohesive subsystem is formed by programs that use the same persistent data repositories (i.e., data files). ISA instantiates the three-step method as follows. In the first step, ISA represents the database view of the system as a set of tuples. In the second step, ISA mines association rules. In the last step, ISA builds a table based on the associations found, and the table is then used to identify subsystems.

Before describing each step, we introduce some definitions. Let $\Sigma$ be a software system composed of a set of programs $\Pi$ and a set of data files $\Phi$. For example, a payroll system written in COBOL may consist of several programs and several files. The system would likely include programs to print the payroll, to print checks, to perform the calculations, and to report tax withheld. In addition, the system would contain the employee file, the salaries file, and the scheduling file. For simplicity, this definition of a system does not include script files and JCL (Job Control Language) scripts.

A program $p$ uses a file $f$ if $p$ reads or writes information on $f$. Let $U(p, \Phi)$ denote the number of files $f \in \Phi$ that the program $p$ uses. For example, if $\Phi = \{A, B, D, H, G\}$, $\Pi = \{m, t, w, x\}$, and program $x$ uses files $A$, $D$, and $G$, then $U(x, \Phi) = 3$. Similarly, let $Q(f, \Pi)$ denote the number of programs $p \in \Pi$ that use file $f$. That is, if only programs $t$ and $w$ use file $B$, then $Q(B, \Pi) = 2$.

We describe each of the three steps of the ISA methodology in Sections 4.1, 4.2, and 4.3.

4.1. Define a database view of the system

The ISA methodology preprocesses the data before defining the database view of the system. The objective is to generate a clean data set for the mining process. The resulting data set is known as the alpha set. The alpha set is the set $A = \{P, F\}$ such that $P \subseteq \Pi$, $F \subseteq \Phi$, $P = \{p \mid U(p, F) > \gamma\}$, and $F = \{f \mid Q(f, P) > \beta\}$, where $\gamma$ and $\beta$ are integers with $\gamma > 0$ and $\beta > 0$. The alpha set contains the programs that use more than $\gamma$ files and the files that are used by more than $\beta$ programs. These parameters are necessary to avoid introducing noise into the analysis. For example, if $\gamma = 0$ were allowed, the alpha set would include programs that use just one file. Such programs provide no useful information for forming associations. The alpha set is not produced simply by removing from $\Sigma$ the files and programs that do not satisfy the $\beta$ and $\gamma$ constraints.
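The definitions of $U$ and $Q$ and the iterative construction of the alpha set (the algorithm in Figure 1) can be sketched in Python as follows. This is a minimal sketch that assumes the program-file usage relation has already been extracted from the source code into a dictionary; that representation and the helper names are assumptions. The first data set is the example above, with entries marked "hypothetical" filled in for illustration; the second is a made-up system in which a single filtering pass is not enough.

```python
from typing import Dict, Set, Tuple

# Assumed representation: program -> set of data files it reads or writes.
Usage = Dict[str, Set[str]]

def U(p: str, files: Set[str], uses: Usage) -> int:
    """Number of files in `files` that program p uses."""
    return len(uses.get(p, set()) & files)

def Q(f: str, programs: Set[str], uses: Usage) -> int:
    """Number of programs in `programs` that use file f."""
    return sum(1 for p in programs if f in uses.get(p, set()))

def alpha_set(programs: Set[str], files: Set[str], uses: Usage,
              gamma: int, beta: int) -> Tuple[Set[str], Set[str]]:
    """Repeat the two filters of Figure 1 until neither P nor F changes."""
    P, F = set(programs), set(files)
    while True:
        F_new = {f for f in F if Q(f, P, uses) > beta}        # line (3)
        P_new = {p for p in P if U(p, F_new, uses) > gamma}   # line (4)
        if F_new == F and P_new == P:                         # lines (5)-(6)
            return P, F
        P, F = P_new, F_new                                   # lines (8)-(9)

# The example from the text; entries marked "hypothetical" are made up.
PHI = {"A", "B", "D", "H", "G"}
PI = {"m", "t", "w", "x"}
example_uses: Usage = {
    "m": {"H"},          # hypothetical
    "t": {"B", "H"},     # B from the text, H hypothetical
    "w": {"B"},
    "x": {"A", "D", "G"},
}
print(U("x", PHI, example_uses))  # 3
print(Q("B", PI, example_uses))   # 2

# Hypothetical system: p5 drops first, which leaves f4 with a single user,
# which in turn drops p4 on the next pass.
cascade_uses: Usage = {
    "p1": {"f1", "f2", "f3"},
    "p2": {"f1", "f2"},
    "p3": {"f2", "f3"},
    "p4": {"f3", "f4"},
    "p5": {"f4"},
}
P, F = alpha_set(set(cascade_uses), {"f1", "f2", "f3", "f4"}, cascade_uses,
                 gamma=1, beta=1)
print(sorted(P), sorted(F))  # ['p1', 'p2', 'p3'] ['f1', 'f2', 'f3']
```

With $\gamma = \beta = 1$, as in the case study, a single pass would still keep `p4` and `f4`; only the repeated filtering removes them.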
Removing a program or file from $\Sigma$ may cause another program or file to no longer satisfy the $\beta$ and $\gamma$ constraints. Thus, the generation of the alpha set is an iterative process. The algorithm in Figure 1 produces the alpha set.

(1) done = false, $F = \Phi$, $P = \Pi$, $F' = \emptyset$, $P' = \emptyset$
(2) do
(3)     $F' = \{f \in F \mid Q(f, P) > \beta\}$
(4)     $P' = \{p \in P \mid U(p, F') > \gamma\}$
(5)     if ($F' = F$) and ($P' = P$) then
(6)         done = true
(7)     else
(8)         $F = F'$
(9)         $P = P'$
(10) until (done)
(11) $A = \{P, F\}$
Figure 1. Algorithm to obtain the alpha set

Having obtained the alpha set, we define the database view of the system as the set of tuples $T = \{t_1, t_2, \ldots, t_{|F|}\}$, where $t_i = \{p \in P \mid p \text{ uses } f_i\}$. There is one tuple $t_i$ for each file $f_i \in F$; it contains all the programs that use file $f_i$.

4.2. Perform data mining

The data mining task used in this step is the search for association rules on the alpha set. Agrawal, Imielinski, and Swami [19] introduced the problem of mining association rules from large databases of transactions. Their original idea was to find associations among the items a customer buys. For instance, given a large database of transactions, each containing all the products purchased by a customer in a particular visit, the goal is to produce rules of the form "90% of the times a customer buys milk, she also buys bananas."

Formally, the problem of mining association rules is defined as follows [19]. Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items, and let $D$ be a set of transactions, where each transaction $R$ satisfies $R \subseteq I$. We say that $R$ contains $X$ if $X \subseteq R$. An association rule is an implication $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$, and $X \cap Y = \emptyset$. The rule $X \Rightarrow Y$ holds in $D$ with confidence $c$ if $c\%$ of the transactions in $D$ that contain $X$ also contain $Y$. The rule $X \Rightarrow Y$ has support $s$ if $s\%$ of the transactions in $D$ contain $X \cup Y$. The problem of finding association rules in a set of transactions then consists of finding all association rules with $s > \mathit{minsup}$ and $c > \mathit{minconf}$.
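In ISA the transactions are the tuples $t_i$ (one per file) and the items are programs, so the support of a set of programs is the number of files that all of them use. The sketch below builds the tuple view $T$ from a usage dictionary and counts support that way; the dictionary representation, the helper names, and the three programs and files are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Set

# Assumed representation: program -> set of files it uses.
Usage = Dict[str, Set[str]]

def tuple_view(P: Set[str], F: Set[str], uses: Usage) -> Dict[str, List[str]]:
    """One tuple per file f_i in F: the sorted programs of P that use f_i."""
    return {f: sorted(p for p in P if f in uses[p]) for f in sorted(F)}

def support_count(T: Dict[str, List[str]], programs: Iterable[str]) -> int:
    """Support s of s[p1 ... pn]: number of tuples containing every p_i."""
    wanted = set(programs)
    return sum(1 for progs in T.values() if wanted <= set(progs))

# Hypothetical alpha set.
uses: Usage = {
    "p1": {"f1", "f2", "f3"},
    "p2": {"f1", "f2"},
    "p3": {"f2", "f3"},
}
T = tuple_view(set(uses), {"f1", "f2", "f3"}, uses)
print(T)  # {'f1': ['p1', 'p2'], 'f2': ['p1', 'p2', 'p3'], 'f3': ['p1', 'p3']}
print(support_count(T, ["p1", "p2"]))        # 2: files f1 and f2
print(support_count(T, ["p1", "p2", "p3"]))  # 1: file f2 only
```

For two programs, this support equals the number of files they have in common, which is the quantity $M(p_x, p_y)$ used by the grouping-table algorithm.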
Here, $\mathit{minsup}$ and $\mathit{minconf}$ are user-supplied parameters representing the minimum required support and confidence, respectively.

In the ISA notation, $P$ plays the role of the set of items $I$, $T$ is the set of transactions $D$, and a tuple $t$ is a transaction $R$. Mining the alpha set for association rules produces associations of the form $s[p_1, p_2, \ldots, p_n]$, where $s$ is the support of the association, $n \le |P|$, and $p_i \in P$. In other words, $s$ is the number of tuples in $T$ that contain $p_1, p_2, \ldots, p_n$. Such an association is interpreted to mean that programs $p_1, p_2, \ldots, p_n$ use the same $s$ files. The association does not mean that the programs in the association use only those $s$ files; rather, it means that those $s$ files are common to all the programs in the association. Thus, an association provides the rationale for grouping programs based on the common data repositories they use.

4.3. Consolidate and interpret results

Once the associations are mined, they are used to form groups of programs and files (i.e., subsystems). To this end, a grouping table is built. The grouping table organizes programs in rows and files in columns: there is one row for each program in the alpha set and one column for each file in the alpha set. The intersection of a row and a column is marked if the program represented by the row uses the file represented by the column.

The construction of the grouping table is a bottom-up iterative process. In each iteration, new rows are added to the grouping table. Then, the rows are reorganized to keep programs that use similar files in adjacent rows. Likewise, the columns are rearranged to keep files that are used together in adjacent columns. This process continues until all the programs are represented in the grouping table. The algorithm in Figure 2 builds the grouping table.

Some heuristics are required in lines 3, 7, and 14 of the algorithm. For example, let $a_i = \{p_1, p_2\}$, where $a_i \in \mathcal{A}$. Assume that the algorithm is in iteration $i$, that $p_1$ and $p_3$ are already in the grouping table, with $p_1$ in row 10 and $p_3$ in row 11. Assume also that $s_i = 14$, so that $M(p_1, p_2) = 14$, and that $M(p_1, p_3) = 16$. Line 3 of the algorithm requires putting all $p \in a_i$ in adjacent rows; that is, $p_2$ has to go next to $p_1$. However, $M(p_1, p_3) > M(p_1, p_2)$, so inserting $p_2$ between $p_1$ and $p_3$ would separate the pair of programs that share more files. In this case, we add a new row before row 10, put $p_2$ in this new row, and renumber the rows.

Once the table has been built, it contains a grouping of programs and files. Programs that use a similar set of files are in adjacent rows, and files that are likely to be used together are in adjacent columns, thereby identifying groups of programs that use the same set of files. In this sense, ISA identifies data cohesive subsystems. The final grouping might contain files that are used by more than one group of programs (subsystem). This situation is normal because the subsystems are interrelated.
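The grouping-table construction can be sketched as below. The paper leaves the heuristics in lines 3, 7, and 14 of Figure 2 open, so the ones used here are assumptions: a new program goes next to an association member already in the table, before it rather than after it when the member's current neighbor shares more files with it (mirroring the row-10 example above); the files common to an association are pulled into one contiguous block of columns; and ties in line 13 go to the first such row. The usage relation and the associations (those with support of at least 2 in that data) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Usage = Dict[str, Set[str]]          # program -> files it uses
Association = Tuple[int, List[str]]  # (support s, programs p1 ... pn)

def build_grouping_table(assocs: List[Association], P: Set[str], F: Set[str],
                         uses: Usage) -> Tuple[List[str], List[str]]:
    """Return (row order of programs, column order of files)."""

    def M(a: str, b: str) -> int:
        # Number of files in F that programs a and b have in common.
        return len(uses[a] & uses[b] & F)

    def place_adjacent(rows: List[str], new: str, anchor: str) -> None:
        # Assumed heuristic: if the anchor's successor shares more files with
        # the anchor than `new` does, insert `new` before the anchor instead.
        i = rows.index(anchor)
        if i + 1 < len(rows) and M(anchor, rows[i + 1]) > M(anchor, new):
            rows.insert(i, new)
        else:
            rows.insert(i + 1, new)

    rows: List[str] = []
    cols: List[str] = sorted(F)
    # Line (1): process associations in order of decreasing support.
    for _, progs in sorted(assocs, key=lambda a: a[0], reverse=True):
        # Line (3): put the association's programs in adjacent rows.
        for p in progs:
            if p not in rows:
                anchors = [q for q in progs if q in rows]
                if anchors:
                    place_adjacent(rows, p, anchors[0])
                else:
                    rows.append(p)
        # Line (7): make the files common to all programs in the association
        # contiguous, starting at the leftmost such column.
        common = set.intersection(*(uses[p] & F for p in progs))
        if common:
            first = min(cols.index(f) for f in common)
            rest = [f for f in cols if f not in common]
            block = [f for f in cols if f in common]
            cols = rest[:first] + block + rest[first:]
        # Lines (8)-(9): stop once every program is in the table.
        if set(rows) >= P:
            break
    # Lines (12)-(15): place leftover programs next to their most similar row.
    for p in sorted(P - set(rows)):
        if rows:
            place_adjacent(rows, p, max(rows, key=lambda q: M(p, q)))
        else:
            rows.append(p)
    return rows, cols

uses: Usage = {
    "p1": {"f1", "f3", "f5"}, "p2": {"f2", "f4"}, "p3": {"f1", "f3"},
    "p4": {"f2", "f4", "f6"}, "p5": {"f3", "f5"}, "p6": {"f4", "f6"},
    "p7": {"f5", "f6"},
}
files = {"f1", "f2", "f3", "f4", "f5", "f6"}
assocs = [(2, ["p1", "p3"]), (2, ["p1", "p5"]), (2, ["p2", "p4"]), (2, ["p4", "p6"])]
rows, cols = build_grouping_table(assocs, set(uses), files, uses)
print("   " + " ".join(cols))
for p in rows:
    print(p + " " + " ".join(("1" if f in uses[p] else ".").rjust(2) for f in cols))
```

Apart from `p7`, which belongs to no association and is appended next to the first row it shares a file with, the printed table falls into two blocks: `p1`, `p5`, and `p3` over columns `f1`, `f3`, `f5`, and `p2`, `p4`, and `p6` over `f2`, `f4`, `f6`.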
Files used by more than one subsystem can be seen as the interfaces among subsystems. Consequently, the ISA methodology decomposes a software system $\Sigma$ into $k$ subsystems $Z_i = \{G_i, H_i\}$, where $G_i \subseteq \Pi$ and $H_i \subseteq \Phi$ for $i = 1, 2, \ldots, k$, and $G_i \cap G_j = \emptyset$ and $H_i \cap H_j = \emptyset$ for $i, j = 1, 2, \ldots, k$ with $i \neq j$. This decomposition does not include all the programs and files in $\Sigma$, because some programs cannot be classified into any subsystem and some files are used by several subsystems. Therefore, $G_1 \cup G_2 \cup \cdots \cup G_k$ may not be equal to $\Pi$, and $H_1 \cup H_2 \cup \cdots \cup H_k$ may not be equal to $\Phi$. Although this decomposition produces disjoint sets of programs and files, it does not imply that a program in subsystem $Z_i$ cannot use a file in subsystem $Z_j$.

Let $M(p_x, p_y)$ denote the number of common files between $p_x$ and $p_y$.
(1) Let $\mathcal{A} = \{a_1, a_2, a_3, \ldots, a_k\}$ be the set of associations sorted by $s$ (i.e., $s_1 \ge s_2 \ge \cdots \ge s_k$)
(2) for i = 1 to k
(3)     Put in adjacent rows the programs $p \in a_i$
(4)     for each $p \in a_i$
(5)         mark the columns that represent a file used in $p$
(6)     endfor
(7)     Put in adjacent columns the files that are common to all the programs in $a_i$
(8)     if all the programs have been included in the table then
(9)         STOP
(10)    endif
(11) endfor
(12) for each $p$ not included in the grouping table
(13)     Find a program $q$ in the grouping table such that $M(p, q)$ is maximum
(14)     Put $p$ in an adjacent row to program $q$
(15) endfor
Figure 2. Algorithm to build the grouping table

5. Case study

We have applied the methodology to COBOL systems. For simplicity, we discuss the details of applying ISA to a small system, known as the TRS system, consisting of approximately 25 KLOC distributed in 28 COBOL programs. TRS also uses 36 data files.

We started by creating the alpha set. First, a number was assigned to each file and to each program (e.g., programs p1 to p28 and files f1 to f36). Then, we used the algorithm defined in Section 4.1 to create the alpha set. We ran the algorithm with $\gamma = 1$ and $\beta = 1$. The resulting alpha set consisted of 22 programs and 24 files.

Next, we created the database view in an ASCII file. Each line in the ASCII file contained a tuple as defined in Section 4.1. A tuple consisted of the file number followed by the numbers of the programs that use that file. For example, the tuple [29 16 18 19 20] means that file 29 is used by programs 16, 18, 19, and 20. There was one record in the ASCII file for each file in the alpha set. Note that the program numbers within a tuple were sorted, as required by the data mining algorithm we used.

We applied the Apriori algorithm [20] to mine association rules. Apriori mines association rules in two steps. First, Apriori finds all the itemsets with transaction support greater than the threshold value minsup; that is, it finds all the sets of items contained in more than minsup transactions. These itemsets are called large itemsets. Second, Apriori generates the association rules based on the large itemsets found in the previous step. Apriori makes several passes over the data set. In pass $k$, Apriori finds large itemsets of size $k$, called large $k$-itemsets. Specifically, in each pass $k$, Apriori generates candidate sets from the large $(k-1)$-itemsets and then traverses the tuples to calculate the support of each candidate set. The candidate sets with support greater than minsup form the large $k$-itemsets. The process continues until Apriori cannot generate more candidate sets. For a detailed description of the Apriori algorithm, refer to [20].

Our version of Apriori uses the minsup parameter as a number of tuples rather than as a percentage. Moreover, our version of Apriori only finds large itemsets. We ran Apriori using minsup = 3 to mine for associations with support equal to or greater than 3, and we obtained 538 associations. Then, we sorted the associations by support. The top 10 associations are shown in Table 1.
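A compact sketch of the Apriori variant described above follows: minsup is an absolute number of tuples, only large itemsets are produced, and candidates are generated level by level from the large $(k-1)$-itemsets and pruned when one of their $(k-1)$-subsets is not large. It is an illustrative sketch, not the implementation from [20] or the authors' version: the join simply unions pairs of large $(k-1)$-itemsets instead of using the prefix-based join, the support test is "at least minsup" to match the run with minsup = 3 described above, and the leading number of each tuple line is assumed to be a file identifier that is not itself mined. The tuples are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]

def parse_tuples(text: str) -> List[Itemset]:
    """Each line: a file number followed by the programs that use that file.
    The leading file number is assumed to be an identifier and is dropped."""
    return [frozenset(line.split()[1:]) for line in text.strip().splitlines()]

def large_itemsets(transactions: List[Itemset], minsup: int) -> Dict[Itemset, int]:
    """All itemsets contained in at least `minsup` transactions, with counts."""

    def keep_large(candidates: Set[Itemset]) -> Dict[Itemset, int]:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= minsup}

    large = keep_large({frozenset([i]) for t in transactions for i in t})  # pass 1
    result = dict(large)
    k = 2
    while large:
        prev = set(large)
        # Join: unions of two large (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates with a (k-1)-subset that is not large.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        large = keep_large(candidates)
        result.update(large)
        k += 1
    return result

TUPLES = """
1 1 3
2 2 4
3 1 3 5
4 2 4 6
5 1 5 7
6 4 6 7
"""
found = large_itemsets(parse_tuples(TUPLES), minsup=2)
# Itemsets with two or more programs, as s[p1 ... pn], by decreasing support.
associations = sorted(
    ((n, sorted(c, key=int)) for c, n in found.items() if len(c) >= 2),
    key=lambda a: (-a[0], [int(p) for p in a[1]]),
)
for s, progs in associations:
    print(f"{s}[{' '.join(progs)}]")
```

On these hypothetical tuples the sketch prints four associations, 2[1 3], 2[1 5], 2[2 4], and 2[4 6]; the candidate triples {1, 3, 5} and {2, 4, 6} are pruned because {3, 5} and {2, 6} each occur in only one tuple.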
The first association, 16[16 18], means that programs 16 and 18 use the same 16 files. Since program 16 uses 17 files and program 18 uses 18 files, the first association implies that these programs share more than 85% of the files they use ($16/17 \approx 94\%$ and $16/18 \approx 89\%$). The third association, 14[16 18 19], means that programs 16, 18, and 19 use the same 14 files (program 19 uses 16 files). This information could indicate that these programs perform similar or complementary functions. Thus, this set of programs and their 14 common files can be grouped into one subsystem.

Next, we built the grouping table. We started with the first association in Table 1 and created the first two rows of the table: row one represented program 16 and row two program 18. Each of the 24 columns represented one file. We put a 1 in the columns that represent files used by programs 16 and 18 (Figure 3). Then, we considered the second association and drew a third row representing program 19. The third association was trivial since we had only three rows; however, we used this information to arrange the columns so that the 14 common files were in adjacent columns. Similarly, the fourth association was used to arrange columns. The fifth association produced rows 4 and 5 (Figure 4). We continued in this way until we had used all the associations. Finally, we appended to the table the programs that were not included in the associations but were in the alpha set. The complete table is shown in Figure 5.

Table 1. Associations with largest support (columns: No., s, Programs)

Figure 3. First two rows of the grouping table

Figure 4. Grouping table with five rows

Figure 5. Grouping table

6. Results

The grouping table in Figure 5 was used to identify subsystems. The table contains five groups of programs. The first group contains programs 17, 16, 18, and 19. Programs 16, 18, and 19 use 11 files that are not used by any other program in the system. Moreover, files 30, 25, 17, 9, and 18 are used only by programs in this group. The second group contains programs 23, 28, and 26. Programs 23 and 28 use the same set of files, and program 26 uses a subset of the files that programs 23 and 28 use. In addition, file 28 is used only within this group. A similar analysis applies to the rest of the groups, with the exception of the last group. The fifth group contains programs that could not be classified into any of the other groups (i.e., programs 20 and 21). We also found this type of result when analyzing other software systems. Generally, the number of unclassified programs is small compared with the total number of programs in the system. Therefore, we consider these programs exceptions and do not consider group 5 a subsystem.

As previously indicated, some files are used across groups. For example, file 26 is used in all subsystems, file 23 is used in three subsystems, and files 3 and 5 are used in two subsystems. These files can be seen as communication buffers or links among subsystems.

7. Conclusions

We have presented our initial work on using data mining techniques to recover the designs of software systems. We proposed a general three-step method that can be used as a framework to apply data mining at different granularity levels and to produce different high level abstractions. In addition, we described an instantiation of this method, the ISA methodology, which decomposes a software system into data cohesive subsystems by mining association rules.

Our experience shows that data mining can be used to produce a logical decomposition of a software system. Data mining offers the advantage that it can identify data cohesive subsystems without any prior knowledge of the subject system; the only required source of information is the source code. Moreover, data mining is capable of producing meaningful results regardless of the size of the system. These properties make this approach especially appropriate for the analysis of large undocumented software systems. Furthermore, the approach has great potential for automation, as shown by the ISA methodology. Therefore, data mining has potential as a valuable tool in the reverse engineering and maintenance domains.

The next steps are to modify the methodology to facilitate its automation and scalability, and to define a graphical model to represent the information generated by the methodology.

References

[1] E.J. Chikofsky, J.H. Cross II, Reverse Engineering and Design Recovery: A Taxonomy, IEEE Software, Vol. 7, No. 1, Jan. 1990.
[2] G. Butler, P. Grogono, R. Shinghal, I. Tjandra, Retrieving Information from Data Flow Diagrams, in Proc. Second Working Conference on Reverse Engineering, IEEE Computer Society Press, July 1995.
[3] J.C.S.P. Leite, P.M. Cerqueira, Recovering Business Rules from Structured Analysis Specifications, in Proc. Second Working Conference on Reverse Engineering, IEEE Computer Society Press, July 1995.
[4] K. Lano, P.T. Breuer, H. Haughton, Reverse-engineering COBOL via Formal Methods, Journal of Software Maintenance: Research and Practice, Vol. 5, 1993.
[5] A. Cimitile, A. De Lucia, M. Munro, Qualifying Reusable Functions Using Symbolic Execution, in Proc. Second Working Conference on Reverse Engineering, IEEE Computer Society Press, July 1995.
[6] D.R. Harris, H.B. Reubenstein, A.S. Yeh, Reverse Engineering to the Architectural Level, in Proc. 17th Int'l Conf. on Software Engineering, IEEE Computer Society Press, 1995.
[7] I. Jacobson, F. Lindström, Re-engineering of old systems to an object-oriented architecture, in Proc. OOPSLA, 1991.
[8] P. Newcomb, G. Kotik, Reengineering Procedural Into Object-Oriented Systems, in Proc. Second Working Conference on Reverse Engineering, IEEE Computer Society Press, July 1995.
[9] H. Gall, R. Klösch, Finding Objects in Procedural Programs: An Alternative Approach, in Proc. Second Working Conference on Reverse Engineering, IEEE Computer Society Press, July 1995.
[10] H.M. Sneed, E. Nyáry, Extracting Object-Oriented Specification from Procedurally Oriented Programs, in Proc. Second Working Conference on Reverse Engineering, IEEE Computer Society Press, July 1995.
[11] S.C. Choi, W. Scacchi, Extracting and Restructuring the Design of Large Systems, IEEE Software, Vol. 7, No. 1, Jan. 1990.
[12] C. Rich, L.M. Wills, Recognizing a Program's Design: A Graph-Parsing Approach, IEEE Software, Vol. 7, No. 1, Jan. 1990.
[13] H.M. Edwards, M. Munro, RECAST: Reverse Engineering from COBOL to SSADM Specification, in Proc. Working Conference on Reverse Engineering, IEEE Computer Society Press, 1993.
[14] Y.-F. Chen, M.Y. Nishimoto, C.V. Ramamoorthy, The C Information Abstraction System, IEEE Transactions on Software Engineering, Vol. 16, No. 3, Mar. 1990.
[15] V. Narat, Using a relational database for software maintenance: a case study, in Proc. IEEE Conference on Software Maintenance (CSM-93), IEEE Computer Society Press, 1993.
[16] J.E. Grass, Object-Oriented Design Archaeology with CIA++, Computing Systems: The Journal of the USENIX Association, Vol. 5, No. 1, Winter 1992.
[17] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, The KDD Process for Extracting Useful Knowledge from Volumes of Data, Communications of the ACM, Vol. 39, No. 11, Nov. 1996.
[18] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From Data Mining to Knowledge Discovery: An Overview, in U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy, eds., Advances in Knowledge Discovery and Data Mining, chapter 1, AAAI/MIT Press.
[19] R. Agrawal, T. Imielinski, A. Swami, Mining Association Rules between Sets of Items in Large Databases, in Proc. ACM SIGMOD Int'l Conf. on Management of Data, May 1993.
[20] R. Agrawal, R. Srikant, Fast Algorithms for Mining Association Rules in Large Databases, in Proc. 20th Int'l Conf. on Very Large Data Bases (VLDB '94), Sep. 1994.


More information

Sathyamangalam, 2 ( PG Scholar,Department of Computer Science and Engineering,Bannari Amman Institute of Technology, Sathyamangalam,

Sathyamangalam, 2 ( PG Scholar,Department of Computer Science and Engineering,Bannari Amman Institute of Technology, Sathyamangalam, IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 8, Issue 5 (Jan. - Feb. 2013), PP 70-74 Performance Analysis Of Web Page Prediction With Markov Model, Association

More information

Pamba Pravallika 1, K. Narendra 2

Pamba Pravallika 1, K. Narendra 2 2018 IJSRSET Volume 4 Issue 1 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section : Engineering and Technology Analysis on Medical Data sets using Apriori Algorithm Based on Association Rules

More information

Algorithm for Efficient Multilevel Association Rule Mining

Algorithm for Efficient Multilevel Association Rule Mining Algorithm for Efficient Multilevel Association Rule Mining Pratima Gautam Department of computer Applications MANIT, Bhopal Abstract over the years, a variety of algorithms for finding frequent item sets

More information

9. Conclusions. 9.1 Definition KDD

9. Conclusions. 9.1 Definition KDD 9. Conclusions Contents of this Chapter 9.1 Course review 9.2 State-of-the-art in KDD 9.3 KDD challenges SFU, CMPT 740, 03-3, Martin Ester 419 9.1 Definition KDD [Fayyad, Piatetsky-Shapiro & Smyth 96]

More information

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH International Journal of Information Technology and Knowledge Management January-June 2011, Volume 4, No. 1, pp. 27-32 DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY)

More information

Mining Association Rules with Item Constraints. Ramakrishnan Srikant and Quoc Vu and Rakesh Agrawal. IBM Almaden Research Center

Mining Association Rules with Item Constraints. Ramakrishnan Srikant and Quoc Vu and Rakesh Agrawal. IBM Almaden Research Center Mining Association Rules with Item Constraints Ramakrishnan Srikant and Quoc Vu and Rakesh Agrawal IBM Almaden Research Center 650 Harry Road, San Jose, CA 95120, U.S.A. fsrikant,qvu,ragrawalg@almaden.ibm.com

More information

Mining Temporal Association Rules in Network Traffic Data

Mining Temporal Association Rules in Network Traffic Data Mining Temporal Association Rules in Network Traffic Data Guojun Mao Abstract Mining association rules is one of the most important and popular task in data mining. Current researches focus on discovering

More information

ISSN: (Online) Volume 2, Issue 7, July 2014 International Journal of Advance Research in Computer Science and Management Studies

ISSN: (Online) Volume 2, Issue 7, July 2014 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 2, Issue 7, July 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Integration new Apriori algorithm MDNC and Six Sigma to Improve Array yield in the TFT-LCD Industry

Integration new Apriori algorithm MDNC and Six Sigma to Improve Array yield in the TFT-LCD Industry Integration new Apriori algorithm MDNC and Six Sigma to Improve Array yield in the TFT-LCD Industry Chiung-Fen Huang *, Ruey-Shun Chen** * Institute of Information Management, Chiao Tung University Management

More information