International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015)

Similar documents
Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

Data Mining Part 3. Associations Rules

Comparing the Performance of Frequent Itemsets Mining Algorithms

An Improved Frequent Pattern-growth Algorithm Based on Decomposition of the Transaction Database

Data Structures. Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali Association Rules: Basic Concepts and Application

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM

Chapter 4: Association analysis:

The Key Technology and Algorithm Design for the Development of Intelligent Examination System

Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm

Maintenance of the Prelarge Trees for Record Deletion

Appropriate Item Partition for Improving the Mining Performance

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey

Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India

Mining Temporal Association Rules in Network Traffic Data

Salah Alghyaline, Jun-Wei Hsieh, and Jim Z. C. Lai

Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree

A NEW ASSOCIATION RULE MINING BASED ON FREQUENT ITEM SET

Understanding Rule Behavior through Apriori Algorithm over Social Network Data

Improved Balanced Parallel FP-Growth with MapReduce Qing YANG 1,a, Fei-Yang DU 2,b, Xi ZHU 1,c, Cheng-Gong JIANG *

Performance Based Study of Association Rule Algorithms On Voter DB

Improved Frequent Pattern Mining Algorithm with Indexing

Apriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke

ALGORITHM FOR MINING TIME VARYING FREQUENT ITEMSETS

An Improved Apriori Algorithm for Association Rules

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

An Efficient Algorithm for finding high utility itemsets from online sell

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

Mining Frequent Patterns without Candidate Generation

A Comparative Study of Association Rules Mining Algorithms

Mining of Web Server Logs using Extended Apriori Algorithm

Fuzzy Cognitive Maps application for Webmining

Research of Improved FP-Growth (IFP) Algorithm in Association Rules Mining

Efficient Tree Based Structure for Mining Frequent Pattern from Transactional Databases

Research and Improvement of Apriori Algorithm Based on Hadoop

What Is Data Mining? CMPT 354: Database I -- Data Mining 2

A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Comparison of FP tree and Apriori Algorithm

Maintenance of fast updated frequent pattern trees for record deletion

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm

Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach

Tutorial on Association Rule Mining

Association rule mining

DATA MINING II - 1DL460

Association Rules Extraction with MINE RULE Operator

Approaches for Mining Frequent Itemsets and Minimal Association Rules

Memory issues in frequent itemset mining

A mining method for tracking changes in temporal association rules from an encoded database

Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati

ISSN Vol.03,Issue.09 May-2014, Pages:

Pamba Pravallika 1, K. Narendra 2

Certified CSS Designer VS-1028

Association Rules Mining Algorithm based on Matrix Lei MA 1, a

Improved Aprori Algorithm Based on bottom up approach using Probability and Matrix

A BETTER APPROACH TO MINE FREQUENT ITEMSETS USING APRIORI AND FP-TREE APPROACH

I. INTRODUCTION. Keywords : Spatial Data Mining, Association Mining, FP-Growth Algorithm, Frequent Data Sets

This paper proposes: Mining Frequent Patterns without Candidate Generation

Association Rule with Frequent Pattern Growth. Algorithm for Frequent Item Sets Mining

Mining Association Rules From Time Series Data Using Hybrid Approaches

Optimization using Ant Colony Algorithm

Data Mining for Knowledge Management. Association Rules

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: [35] [Rana, 3(12): December, 2014] ISSN:

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Research and application on the Data mining technology in medical information systems. Jinhai Zhang

AN EFFECTIVE WAY OF MINING HIGH UTILITY ITEMSETS FROM LARGE TRANSACTIONAL DATABASES

Available online at ScienceDirect. Procedia Computer Science 45 (2015 )

Frequent Pattern Mining. Based on: Introduction to Data Mining by Tan, Steinbach, Kumar

Chapter 7: Frequent Itemsets and Association Rules

Chapter 4 Data Mining A Short Introduction

Novel Techniques to Reduce Search Space in Multiple Minimum Supports-Based Frequent Pattern Mining Algorithms

DATA MINING II - 1DL460

Design and Realization of Data Mining System based on Web HE Defu1, a

Medical Data Mining Based on Association Rules

Temporal Weighted Association Rule Mining for Classification

APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW

Efficient Algorithm for Frequent Itemset Generation in Big Data

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

Chapter 4: Mining Frequent Patterns, Associations and Correlations

Association Pattern Mining. Lijun Zhang

Frequent Itemset Mining of Market Basket Data using K-Apriori Algorithm

Induction of Association Rules: Apriori Implementation

An Effectual Approach to Swelling the Selling Methodology in Market Basket Analysis using FP Growth

Association Rule Mining from XML Data

Parallel Trie-based Frequent Itemset Mining on Graphics Processors

Combined Intra-Inter transaction based approach for mining Association among the Sectors in Indian Stock Market

MEIT: Memory Efficient Itemset Tree for Targeted Association Rule Mining

Proxy Server Systems Improvement Using Frequent Itemset Pattern-Based Techniques

Implementation of Data Mining for Vehicle Theft Detection using Android Application

An Algorithm of Association Rule Based on Cloud Computing

ANALYZING CHARACTERISTICS OF PC CLUSTER CONSOLIDATED WITH IP-SAN USING DATA-INTENSIVE APPLICATIONS

Discovery of Frequent Itemset and Promising Frequent Itemset Using Incremental Association Rule Mining Over Stream Data Mining

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm

Association Rule Mining

A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining

The Establishment of Large Data Mining Platform Based on Cloud Computing. Wei CAI

Transcription:

International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015) The Improved Apriori Algorithm was Applied in the System of Elective Courses in Colleges and Universities and Analysis of Carbon Emissions She Wei 1,a,Tan Yuyin 2,b, Xie Huijuan 3,c 1 Hainan University, 570228, Haikou, China 2 Hainan University, 570228, Haikou, China 3 Hainan College of Economics and Business, 571127, Haikou, China a 694394619@qq.com, b 26473428@qq.com, c 360088369@qq.com Keywords: Data Mining, Apriori Algorithm, Professional Elective System, low-carbon Abstract. Apriori algorithm is one of classical algorithms in data mining, can mine frequent item sets what the association rules needed. A classic application is shopping analysis in the supermarket, such as "beer" and "diaper". Aiming at the system of professional elective courses in the colleges and universities, this paper proposes a modified Apriori algorithm that can analyze what the combination of each professional elective course that influenced the employment can provide basis for decision making for the college professional elective course system and provide some suggestions for the employment of students. At the same time, the association rules can also provide the analysis of carbon emissions in the environmental protection, and provide decisions to the low-carbon for China's government. Introduction Data mining was also translated into two different meanings. It is one of steps in the Knowledge-Discovery in Databases (KDD). Data mining generally refers to searching for hidden information by the algorithm from a great deal data. Data mining usually achieves the above objectives through many methods, such as the Statistics, Online Analysis Processing Information Retrieval, Machine Learning, Expert System (depending on the old rules of thumb) and Pattern Recognition. In the data mining model, association rules model is a wide application. The concept of association rule (a simple and useful rule) was proposed by Agrawal, Imielinski, Swami. The association rule reflects the unknown association relationship among each data item in the database, discovering frequent item sets is the core technology of the association rules in data mining. Apriori algorithm was not only an widely used algorithm in the data mining but also was inefficient because of scanning the database. This paper proposes an improved Apriori algorithm that could analyze for the combination of college professional elective course and employment, so as to find out the professional elective association between course combination and employment, the results can provide some suggestion for professional elective course system and employment in the colleges and universities. Classic Apriori Algorithm Apriori algorithm that was a breadth first algorithm based on repeatedly scanning the database to find all the frequent item sets, each scan only considered the same length of all items [1]. Apriori algorithm generated frequent item sets with searching layer by layer and iterative method [2]. First, scanning the transaction database D, 1- frequent item sets L1 would be found in database D by the user minimum support degree, then 2- candidate item sets C2 would be generated by L1 connecting operations, 2- frequent item sets L2 would be found from C2 by rescanning the transaction database D, and so on, it would not be end until no more k- frequent item sets or the candidate sets were empty. 2015. The authors - Published by Atlantis Press 744

Improved Apriori Algorithms The common and improved Apriori algorithm as follows: width first algorithm, depth first algorithm, the data set partitioning algorithm, sampling algorithm, incremental updating algorithm and parallel algorithm for mining [3]. the FP-growth algorithm that was one of improved depth first algorithms in the Apriori algorithm according to professional selective courses system in university and the data of employment could analyze and process the transaction database and find the frequent item sets. The implementation of FP-growth algorithm: Step 1: input transaction database D, and set to the minimum support value min_sup. Step 2: 1- frequent item sets G was found according to the first scanning the transaction database D, the frequent item table L1 including item-name domain and pointer domain that was the first node pointing to FP-tree and had the same item-name. Pointer could not point to the child nodes of FP-tree root node, because it didn t have a prefix when it was a suffix in the inverse the traversal of FP-tree. In other words, 1-frequent item sets mostly had no significance. Step 3: FP-tree was generated by the second scanning the transaction database D. Create FP-tree roots and set null, item-names of each transaction T in the transaction database D were added to FP-tree in sequence. Each of FP-tree nodes beside root node included three domains: item-name, count and link. The specific operation as follows: First, scanning the first transaction T1, the items in the transaction would be arranged in the order by L1, the first frequent item i1 as the child node of root node was inserted into FP-tree, the item-name and the count was separately set i1 and 1; the second frequent item i2 as the child node of the node i1 was inserted into FP-tree, the item-name and the count was separately set i2 and 2; the third frequent item i3 as the child node of node i2 was inserted into FP-tree, the item-name and the count was separately set i3 and 1, the pointer of i3 in the L1 pointed to the node, in this transaction other frequent items were inserted into FP-tree in turn. Second, the items in the transaction would be arranged in the order by L1, the first frequent item j1 was inserted into FP-tree. First, if the children N1 and j1 had the same name among the root nodes in the FP-tree, the value of N1 was added 1; second, a new node was created as the child node of root node, and the item-name and the count was separately set j1 and 1. The second frequent item j1 was inserted into FP-tree, In the first case, if the children N2 and j2 in j1 had the same name in the FP-tree, the value of N2 was added 1; when the second case happened in j1, a new node would be created as the child node of j1, and the item-name and the count were separately set j2 and 1. If the pointer in the corresponding J2 in L1 was null, the pointer was pointed to this node, or the value of the pointer was saved in the link of this node, and then the pointer would point to the node. Other frequent items in the transaction were inserted into the FP-tree as the node j2 did. Last, scanning the other affairs T K, referencing to the method of scanning the second transaction T2. FP-tree would be structured after scanning the whole transaction. Step 4: mining the FP-tree by the bottom-up way. The branch structural mode base would be found out by count domain and link domain in the table L1, FP-tree would be constructed and the frequent patterns were generated [4]. The FP-growth algorithm did not generate candidate item sets, scanned the transaction database only twice, greatly reduced frequency of classical Apriori algorithm that scanning the transaction database, so avoided to produce a mass of candidate item sets, effectively improve the operation efficiency. At the same time, because the number of similar professional elective professional courses system was considerably less than the number of item that Apriori algorithm was applied in other fields, and the number of courses that allowed students to take was limited, so the long-branch in the FP-tree number was effectively avoided. 745

The Results of Testing Algorithms In theory, the efficiency of the FP-growth algorithm was more effective than classic Apriori algorithm. The two algorithms were compared with java language in the experiment. The experimental environment as follows: CPU was the processor (R) Pentium (R)CPU G2010 @ 2.80GHz, memory was 4GB (3.41GB was enough), the operating system was Windows 7ultimate, 32 bit. They were tested using the Eclipse software. This paper provided 7228 affairs, including 112 attributes, the support degree of them respectively was 0.1, 0.2, 0.3, 0.4, and 0.5, and the test results showed the efficiency of the FP-growth algorithm was more effective than classic Apriori algorithm, especially under the condition of small supporting degree. The results were shown in fig 1. The Process of Application Fig 1 Experiments and Results The paper mainly used the association rules, the hidden and potential regular pattern and the relevant item sets could be found by the mining data of employment and professional elective courses before the graduation, could be good for the system of the professional elective courses and employment. In this paper the improved Apriori algorithm can also be applied in the prediction and analysis of carbon emissions, the association rules those were found can predict how to increase or decrease the carbon in environmental protection, and provide valid decision-makings for the nation development. Data Collection, Data arrangement and Collation and Establishing a Transaction Database Data Collection.At present, the university has managed the employment by using the information management graduates, some colleges and universities have input the employment information into the database, but there are a large number of universities without the database electronic information system. Max Institute is good for collecting the information of tracking evaluation of employed, students and employer [5]. In this paper, the employment data for the students who had graduated for half or half a year was received by questionnaire that asked students to write whether they had been employed, whether they were professional counterparts, how they got jobs, whether they were satisfied with jobs and the prospects of their occupation Data Arrangement.The electronic questionnaire received should be rearranged, including excluding the unfit data and merging the courses with the different names but the same essence into a specialty elective course. 746

The Establishment of the Transaction Database.The transaction database with employment and professional selective courses table would be generated by connecting the table inputted information of the electronic questionnaire and graduate list in the dean s office. The Main Association Rules The valid data, that was from related accounting graduates between half a year and 5 years, was tested. The condition was as follows: they were employed, generally interested in the specialty, employed professionally, were candidates for the posts, employment satisfaction and career prospects were ignored. The supporting degree was set 0.15, the length of discovered rule was 3, and the discovered rule is 6, as shown in Table1. Table1 Association Rule and Support Serial Number Association Rule Support 1 Financial Applied Writing- Public Relation and Etiquette - Accounting Information Application 29% 2 International Trade-International Accounting - Financial Engineering 23% 3 Public Relation and Etiquette - Marketing Public Management 18% 4 Financial Applied Writing- International Trade- Capital Management Practice 16% 5 Financial Applied Writing- Public Relation and Etiquette - Tax Planning 15% 6 Finance- Management Consulting- Public Management 15% The courses on practical skills were good for employment, which could be seen from the results of the association rules, and the employment would not be less affected by the strong theoretical course. The association rules can promote the investment on the related course in college and university, and recommended students electing related courses. Summary This paper introduced the Apriori algorithm and FP-growth algorithm in the data mining, and combined the association rules application to the elective course system and employment. Frequent item sets could provide the decision basis for the professional elective course system, and provide some suggestions for the employment. There is important to employment that is becoming increasingly difficult. The improved Apriori algorithm can also be applied in the prediction and analysis of carbon emissions, the association rules can also provide available decisions for decreasing 40-50% carbon the nation development. The Project: The Statistical Accounting System Research And the Ability to Construction on Climate in Hainan Province,2013112 References [1] Luo,K.&He,C.W. The Extraction Algorithm of Improved Association Rules Based On Apriori [J]. Computer and Digital Engineering,2006,(2): 48-49. [2]Yang,J.F.&Liu,F. A new Improved Apriori Algorithm [J].Microcomputer and Application,2010, (1): 55-57. [3] Yang,Q. Improvement of of Algorithm of [J].Computer Knowledge and Technology,2013,(9): 2037-2039. 747

[4] Wang,A.P.&Wang,Z.F.Use of Association Rules for Collapsing Algorithm in the Data Mining [J]. Technology and Development of the Computer,2010,(4): 105-108. [5]Jiang,S.Y,&Li,X.Theory and Practice of the Data Mining[M].Beijing:Electronic Industry Press,2011:30-50. 748