SQL Model in Language Encapsulation and Compression Technique for Association Rules Mining

Somboon Anekritmongkol and Kulthon Kasemsan
Faculty of Information Technology, Rangsit University, Pathumtani 12000, Thailand
somboon_a@hotmail.com, kasemsan@rangsit.rsu.ac.th

Abstract

The discovery of association rules is an important part of data mining. One of the most famous association rule learning algorithms is the Apriori rule. A drawback of the Apriori algorithm is that the database must be read once for every generation of candidates, and many research papers have tried to reduce the amount of time spent reading data from the database. In this paper we propose a new algorithm, SQL Model in Language Encapsulation and Compression Technique for Association Rules Mining (SMILE-ARM), that works rapidly and without a frequency tree or temporary candidate itemsets in RAM or on disk. The algorithm generates, on the fly by SQL, only those candidates whose support is at least the minimum support. It is based on two major ideas: first, compress the data; second, generate candidate itemsets on the fly with SQL. Based on the experimental results, an increase in the number of transactions or in the number of items did not affect the speed at which candidates were generated, and SMILE-ARM achieved more than twenty times higher mining efficiency in execution time than the Apriori rule.

Keywords: Data Mining, Association Rule, Apriori Rule, Frequent Itemset

1. Introduction

Data mining is the process of extracting patterns from data, and is increasingly seen by modern businesses as a tool for transforming data into an informational advantage.
Association rule mining searches for recurring relationships in a database; one of the most popular techniques is the Apriori rule [1][2]. Association rule mining is usually applied to huge amounts of information. Association rules exhaustively look for hidden patterns, making them suitable for discovering predictive rules involving subsets of the data set's attributes. Association rule learners are used to discover elements that co-occur frequently within a data set consisting of multiple independent selections of elements (such as purchasing transactions), and to derive rules from them. In our view, three factors dominate the cost: first, most transactions in a data set share the same pattern; second, the time needed to read the whole database; third, the pruning of candidates at each step of the process. This paper proposes an algorithm that discovers association rules from large amounts of information faster than the Apriori rule, using the SQL Model in Language Encapsulation and Compression Technique for Association Rules Mining (SMILE-ARM). The improvement focuses on compressing the data and improving performance without generating temporary frequent itemsets, while reducing the number of times data is read from the database.

2. Basics of Association Rules

Let D = {T1, T2, ..., Tn} [2] be a set of n transactions and let I be a set of items, I = {I1, I2, ..., Im}. Each transaction is a set of items, i.e. Ti ⊆ I. An association rule is an implication of the form X → Y, where X, Y ⊆ I and X ∩ Y = ∅; X is called the antecedent and Y the consequent of the rule. In general, a set of items such as X or Y is called an itemset. In this work, a transaction record is transformed into a binary format where only positive binary values are included as items; this is done for efficiency, because transactions represent sparse binary vectors.
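Since the rule metrics formalized next drive everything that follows, a minimal Python sketch may help; the helper names and the toy basket data are ours, not the paper's:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(x, y, transactions):
    """confidence(X -> Y) = support(X union Y) / support(X)."""
    return support(x | y, transactions) / support(x, transactions)

def lift(x, y, transactions):
    """lift(X -> Y) = confidence(X -> Y) / support(Y)."""
    return confidence(x, y, transactions) / support(y, transactions)

# Toy data: four market-basket transactions
D = [frozenset(t) for t in ({"bread", "milk"}, {"bread"}, {"milk"}, {"bread", "milk"})]
x, y = frozenset({"bread"}), frozenset({"milk"})
print(support(x, D))        # 0.75
print(confidence(x, y, D))  # 0.6666666666666666 (= 2/3)
```

Here lift(x → y) = (2/3)/0.75 < 1, so in this toy data buying bread does not make milk more likely; the paper is interested only in rules with lift greater than 1.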
Let P(X) be the probability of appearance of itemset X in D and let P(Y|X) be the conditional probability of appearance of itemset Y given that itemset X appears. For an itemset X ⊆ I, support(X) is defined as the fraction of transactions Ti ∈ D such that X ⊆ Ti; that is, P(X) = support(X). The support of a rule X → Y is defined as support(X → Y) = P(X ∪ Y). An association rule X → Y has a measure of reliability called confidence, defined as confidence(X → Y) = P(Y|X) = P(X ∪ Y)/P(X) = support(X → Y)/support(X). The standard problem of mining association rules [1] is to find all rules whose metrics are equal to or greater than some specified minimum support and minimum confidence thresholds. A k-itemset with support above the minimum threshold is called frequent. We use a third significance metric for association rules called lift: lift(X → Y) = P(Y|X)/P(Y) = confidence(X → Y)/support(Y). Lift quantifies the predictive power of X → Y; we are interested in rules such that lift(X → Y) > 1.

(Published in the International Journal of Information Processing and Management (IJIPM), Volume 4, Number 1, January 2013; doi:10.4156/ijipm.vol4.issue1.9.)

3. Apriori Rule

Apriori is an algorithm proposed by R. Agrawal and R. Srikant in 1994. It employs an iterative approach known as a level-wise search, where frequent k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count of each item and collecting those items that satisfy minimum support; the resulting set is denoted L1. Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found. Finding each Lk requires one full scan of the database.

Algorithm: Apriori rule. Find frequent itemsets using an iterative level-wise approach based on candidate generation.
Input: D, a database of transactions; min_sup, the minimum support count threshold.
Output: L, the frequent itemsets in D.
Method:
  L1 = find_frequent_1-itemsets(D);
  for (k = 2; Lk-1 ≠ ∅; k++) {
    Ck = apriori_gen(Lk-1);
    for each transaction t ∈ D {          // scan D for counts
      Ct = subset(Ck, t);                 // get the subsets of t that are candidates
      for each candidate c ∈ Ct
        c.count++;
    }
    Lk = {c ∈ Ck | c.count ≥ min_sup}
  }
  return L = ∪k Lk;

Procedure apriori_gen(Lk-1: frequent (k-1)-itemsets)
  for each itemset l1 ∈ Lk-1
    for each itemset l2 ∈ Lk-1
      if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ ... ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]) then {
        c = l1 ⋈ l2;                      // join step: generate candidates
        if has_infrequent_subset(c, Lk-1) then
          delete c;                       // prune step: remove unfruitful candidates
        else add c to Ck;
      }
  return Ck;

Procedure has_infrequent_subset(c: candidate k-itemset; Lk-1: frequent (k-1)-itemsets)  // use prior knowledge
  for each (k-1)-subset s of c
    if s ∉ Lk-1 then
      return True;
  return False;
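The level-wise loop above can be written compactly in Python. This is a didactic sketch of Apriori (our own naming, with a simplified join step), not the paper's implementation:

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise Apriori: L1 from one scan, then join/prune/count per level."""
    transactions = [frozenset(t) for t in transactions]
    # L1: scan once, keep items meeting the minimum support count
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s: c for s, c in counts.items() if c >= min_sup}
    result = dict(L)
    k = 2
    while L:
        # join step: union frequent (k-1)-itemsets that differ in one item
        prev = list(L)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                c = prev[i] | prev[j]
                # prune step: keep c only if every (k-1)-subset is frequent
                if len(c) == k and all(frozenset(s) in L for s in combinations(c, k - 1)):
                    candidates.add(c)
        # count the candidates with one full scan of the database
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        L = {c: n for c, n in counts.items() if n >= min_sup}
        result.update(L)
        k += 1
    return result
```

With transactions {a,b}, {a,c}, {a,b,c}, {b} and min_sup = 2, this returns the frequent itemsets {a}, {b}, {c}, {a,b} and {a,c} with their counts; note that each level costs one full scan, which is exactly the overhead SMILE-ARM sets out to avoid.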

4. Boolean Algebra

Boolean algebra was developed by George Boole in his 1854 book An Investigation of the Laws of Thought. Some operations of ordinary algebra, in particular multiplication xy, addition x+y, and negation −x, have their counterparts in Boolean algebra: the Boolean operations AND, OR, and NOT, also called conjunction x∧y, disjunction x∨y, and negation or complement ¬x (sometimes written !x). Some authors instead reuse the arithmetic notation of ordinary algebra, reinterpreted for Boolean algebra, treating xy as synonymous with x∧y and x+y with x∨y.

Figure 1. Logic Gate

5. Structured Query Language (SQL)

SQL was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. It was designed to manipulate and retrieve data stored in IBM's original quasi-relational database management system, System R, which a group at the IBM San Jose Research Laboratory had developed during the 1970s. SQL is a database computer language designed for managing data in relational database management systems (RDBMS), and it defines the methods used to create and manipulate relational databases on all major platforms.

Table 1. Basic SQL Commands
Command        Description
SELECT         Retrieves data from a table.
INSERT         Adds new data to a table.
UPDATE         Modifies existing data in a table.
DELETE         Removes existing data from a table.
CREATE object  Creates a new database object.
ALTER object   Modifies the structure of an object.
DROP object    Removes an existing database object.

SQL query syntax:
  Select [Distinct] <column-name(s), arithmetic expression>
  From <table-name(s)>
  [Where <condition>]
  [Group by <column-name(s)>]
  [Having <condition>]
  [Order by <column-name(s)> [ASC/DESC]]

SQL insert syntax:
  Insert into <table-name> [(column-name-1, column-name-2, ...)]
  Values (<value-1, value-2, ...>)

6. SQL Model in Language Encapsulation and Compression Technique for Association Rules Mining (SMILE-ARM)

The SMILE-ARM algorithm proceeds in three steps:
1. Find the frequent 1-itemsets that satisfy minimum support.
2. Compress the data by creating a new structure that groups transactions with the same pattern.
3. Generate candidates by SQL, from candidate 2-itemsets up to k-itemsets, based on the actual data in the pattern table.

SMILE-ARM can generate any candidate itemsets without the previous level of candidates; for example, candidate 4-itemsets can be created without first creating candidate 3-itemsets or candidate 2-itemsets. SMILE-ARM can also be applied to classification.

Algorithm: Pseudo-code of SMILE-ARM.
  Tid_cl, a table of transactions
  Tid_Pattern, a table containing the pattern data
  Final_Candidate, the resulting candidates
  Tid_Item, an itemset
  Tid, a transaction ID
  Min_sup, the minimum support count over all transactions

Procedure Find L1
  for each itemset, count Tk in Tid_cl;
  delete Tk < Min_sup;
  add to Final_Candidate;

Procedure Compress Structure
  for each Tid, Tidk ∈ L1 {
    if Tid <> Tid_Patternk then
      create Tid_Patternk;        // create a new pattern
    else
      Tid_Patternk++;             // existing pattern: increment its count
  }

Procedure SQL_Command(k)
  Insert into Final_Candidate (Candidate, Count_Items, xdim)
  Select Tk.Tid_Item + Tk-1.Tid_Item + ... + T1.Tid_Item, sum(T1.Feq), k
  From Tid_Pattern as T1, Tid_Pattern as T2, ..., Tid_Pattern as Tk
  Where (T1.Tid_Item < T2.Tid_Item and T2.Tid_Item < T3.Tid_Item and ... and Tk-1.Tid_Item < Tk.Tid_Item)
    and (T1.Tid = T2.Tid and T2.Tid = T3.Tid and ... and Tk-1.Tid = Tk.Tid)
  Group by Tk.Tid_Item, Tk-1.Tid_Item, ..., T1.Tid_Item
  Having sum(T1.Feq) >= Min_sup

Procedure Generate Associate Data
  Find L1;
  Compress Structure;
  for (k = 2; k <= Max_Item; k++) {
    SQL_Command(k);
  }

Example: minimum support of 10 percent over the 15 transactions = 1.5.

Table 2. Transaction Data: a three-column listing of transactions T001-T015 and their items over I1-I5 (most item entries of this table were lost in extraction).
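The Compress Structure procedure above can be sketched in Python (our own naming, not the paper's code): each transaction is restricted to the frequent items of L1, and identical transactions collapse into a single pattern with a frequency count:

```python
def compress(transactions, frequent_items):
    """Map each distinct transaction pattern (restricted to frequent items)
    to its frequency, mirroring the Tid_Pattern table."""
    patterns = {}
    for t in transactions:
        key = frozenset(i for i in t if i in frequent_items)  # drop infrequent items
        if key:                                    # skip transactions left empty
            patterns[key] = patterns.get(key, 0) + 1  # existing pattern: bump count
    return patterns

# Illustrative data, not the paper's: I5 is infrequent, so the first two
# transactions collapse into one pattern of frequency 2.
D = [{"I1", "I2", "I5"}, {"I1", "I2"}, {"I2"}]
p = compress(D, {"I1", "I2"})
print(p[frozenset({"I1", "I2"})], p[frozenset({"I2"})])  # 2 1
```

The later SQL joins then aggregate sum(Feq) over these pattern rows instead of rescanning every raw transaction, which is where the compression pays off.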

Step 1: Find L1. The algorithm scans all transactions to find the frequency of each item and removes the items whose frequency is less than the minimum support; {I5} is eliminated because its support is below the minimum.

Table 3. Support Count
Itemset   Support count
{I1}      9
{I2}      14
{I3}      11
{I4}      4
{I5}      1

Table 4. Result of candidate 1-itemsets
Itemset   Support count
{I1}      9
{I2}      14
{I3}      11
{I4}      4

Step 2: Compress Data. Count the transactions that contain the same items; for example, T003, T006, T007, T012 and T015 all contain the same 2-itemset, so that pattern has frequency 5. Each distinct structure becomes a pattern: T001 creates Pattern-1, T002 Pattern-2, {T003, T006, T007, T012, T015} Pattern-3, T004 Pattern-4, {T010, T013} Pattern-5, T005 Pattern-6, and {T008, T009, T011, T014} Pattern-7.

Table 5. Data Compression (an X marks each of the pattern's items over I1-I4; the column alignment was lost in extraction)
Pattern    Items    Count
Pattern-1  X X      1
Pattern-2  X X      1
Pattern-3  X X      5
Pattern-4  X X X X  1
Pattern-5  X X X    2
Pattern-6  X X      1
Pattern-7  X X X    4

Step 3: Find candidate 2-itemsets to k-itemsets. The frequent 2-itemsets through k-itemsets are discovered by the SQL model, generating candidates from the pattern table. SQL command for candidate 2-itemsets:

  Insert into Final_Candidate (Candidate, Count_Items, xdim)
  Select T2.Tid_Item + T1.Tid_Item, sum(T1.Feq), 2
  From Tid_Pattern as T1, Tid_Pattern as T2
  Where (T1.Tid_Item < T2.Tid_Item) and (T1.Tid = T2.Tid)
  Group by T2.Tid_Item, T1.Tid_Item
  Having sum(T1.Feq) >= Min_Support
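For a runnable illustration of the candidate 2-itemset query, here is a sketch using SQLite through Python's sqlite3 module. The schema follows our reading of the pseudocode (one row per item of each compressed pattern, with Feq holding the pattern's frequency); the sample rows are invented for the demo, and SQLite concatenates strings with || rather than the + of the dialect shown above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Tid_Pattern (Tid INTEGER, Tid_Item TEXT, Feq INTEGER)")
# Two compressed patterns: pattern 1 = {I1,I2} seen 5 times, pattern 2 = {I1,I2,I3} seen 2 times
rows = [(1, "I1", 5), (1, "I2", 5),
        (2, "I1", 2), (2, "I2", 2), (2, "I3", 2)]
cur.executemany("INSERT INTO Tid_Pattern VALUES (?, ?, ?)", rows)

min_sup = 3
# Self-join the pattern table on Tid, pair the items, and keep only
# candidates whose summed pattern frequency reaches the minimum support.
cur.execute("""
    SELECT T1.Tid_Item || ',' || T2.Tid_Item AS Candidate,
           SUM(T1.Feq) AS Count_Items,
           2 AS xdim
    FROM Tid_Pattern AS T1, Tid_Pattern AS T2
    WHERE T1.Tid_Item < T2.Tid_Item AND T1.Tid = T2.Tid
    GROUP BY T1.Tid_Item, T2.Tid_Item
    HAVING SUM(T1.Feq) >= ?""", (min_sup,))
result = cur.fetchall()
print(result)  # [('I1,I2', 7, 2)] -- {I1,I2} is supported by both patterns: 5 + 2
```

The pairs {I1,I3} and {I2,I3} occur only in the second pattern (support 2 < 3), so the HAVING clause discards them inside the database engine, with no intermediate candidate table, which is the point of the technique.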

Figure 2. Generate Candidate K-Itemsets

Table 6. Result of candidate 2-itemsets (left: the compressed patterns; right: the resulting candidates)
#  Pattern Data    Count  |  Candidate  Count  xdim
1  {,}             1      |  {I1,I2}    8      2
2  {I2,I4}         1      |  {I1,I3}    6      2
3  {I2,I3}         5      |  {I1,I4}    3      2
4  {I1,I2,I3,I4}   1      |  {I2,I3}    10     2
5  {I1,I2,I4}      2      |  {I2,I4}    4      2
6  {,}             1      |
7  {I1,I2,I3}      4      |

SQL command for candidate 3-itemsets:

  Insert into Final_Candidate (Candidate, Count_Items, xdim)
  Select T3.Tid_Item + T2.Tid_Item + T1.Tid_Item, sum(T1.Feq), 3
  From Tid_Pattern as T1, Tid_Pattern as T2, Tid_Pattern as T3
  Where (T1.Tid_Item < T2.Tid_Item and T2.Tid_Item < T3.Tid_Item) and (T1.Tid = T2.Tid and T2.Tid = T3.Tid)
  Group by T3.Tid_Item, T2.Tid_Item, T1.Tid_Item
  Having sum(T1.Feq) >= Min_Support

The <xdim> attribute controls the generation of candidates at each step.

Table 7. Result of candidate 3-itemsets (left: the compressed patterns; right: the resulting candidates)
#  Pattern Data    Count  |  Candidate     Count  xdim
1  {,}             1      |  {I1,I2,I3}    5      3
2  {I2,I4}         1      |  {I1,I2,I4}    3      3
3  {I2,I3}         5      |
4  {I1,I2,I3,I4}   1      |
5  {I1,I2,I4}      2      |
6  {,}             1      |
7  {I1,I2,I3}      4      |

Final result: the candidate itemsets with support ≥ Min_Support.

Table 8. Final Result
Itemset      Support count
{I1}         9
{I2}         14
{I3}         11
{I4}         4
{I1,I2}      8
{I1,I3}      6
{I1,I4}      3
{I2,I3}      10
{I2,I4}      4
{I1,I2,I3}   5
{I1,I2,I4}   3

7. Experimental Results

In this section we perform a set of experiments to evaluate the effectiveness of SMILE-ARM. The experimental data consists of two kinds: first, real data from Phranakorn Yontrakarn Co., Ltd., a company that sells cars and offers car services, used to discover association data; second, generated sample data. The experiments cover four criteria: first, increasing the number of records from 10,000 to 50,000 with the number of items fixed at 10 (Figures 3 and 4); second, increasing the number of items with the number of records fixed at 50,000 (Figures 5 and 6); third, increasing the number of records from 10,000 to 190,000 (Figure 7); fourth, decreasing the minimum support with the number of items and records fixed (Figure 8).

Experiment 1: Increase the number of records in steps of 10,000 with the number of items fixed at 10, comparing the Apriori rule with SMILE-ARM. With the Apriori rule, a growing number of records leads to a longer running time. SMILE-ARM showed 35 times higher mining efficiency in execution time than the Apriori rule (10 itemsets, 50,000 records).

Figure 3. Data from Phranakorn Yontrakarn Co., Ltd., itemsets = 10, increasing number of records

Figure 4. Sampling data, itemsets = 10, increasing number of records

Experiment 2: Increase the number of items with the number of records fixed at 50,000, comparing the Apriori rule with SMILE-ARM. With the Apriori rule, a growing number of itemsets leads to a longer running time. SMILE-ARM showed 30 times higher mining efficiency in execution time than the Apriori rule (30 itemsets, 50,000 records). In other words, the number of records affects SMILE-ARM only slightly, while the Apriori rule's running time grows substantially.

Figure 5. Data from Phranakorn Yontrakarn, increasing itemsets, number of records fixed at 50,000

Figure 6. Sampling data, number of records = 50,000, increasing itemsets

Experiment 3: Increase the number of records from 10,000 to 190,000, comparing the Apriori rule with SMILE-ARM. The real-life data from Phranakorn Yontrakarn Co., Ltd. shows the relative performance of the two algorithms: at 190,000 records, the Apriori rule takes 238 seconds of processing time while SMILE-ARM takes only 6 seconds. SMILE-ARM showed 39 times higher mining efficiency in execution time than the Apriori rule (10 itemsets, 190,000 records).

Figure 7. Data from Phranakorn Yontrakarn Co., Ltd., increasing number of records and increasing itemsets

Experiment 4: Decrease the minimum support from 60% to 10%. At the lowest minimum support, the Apriori rule takes 464 seconds to process while SMILE-ARM takes 3 seconds; SMILE-ARM showed 154 times higher mining efficiency in execution time than the Apriori rule (10 itemsets, 20,000 records, density 80%). Lowering the minimum support affects SMILE-ARM only slightly, because SMILE-ARM compresses the data and the SQL generates candidates directly against the database without temporary candidates.

Figure 8. Performance of Apriori Rule and SMILE-ARM

Table 9. Number of candidates
Minimum Support (%)   Number of Candidates
10                    1023
20                    1012
30                    847
40                    509
50                    175
60                    56

The experiments show that SMILE-ARM discovers associations faster than the

Apriori rule. When the number of records increases, the Apriori rule takes more time because it must read the whole data set; when the number of items increases, the Apriori rule creates more candidates, their number depending on the number of items. SMILE-ARM takes far less time because it compresses the data and the SQL generates candidates on the fly, directly against the database, without temporary candidates.

8. Conclusion

This paper proposes a new theoretical model for association rule mining and designs a new algorithm based on it. SQL Model in Language Encapsulation and Compression Technique for Association Rules Mining (SMILE-ARM) compresses the data and processes only the actual data through SQL commands, and SQL is very effective in terms of performance. SMILE-ARM is able to discover association rules more than twenty times faster than the Apriori rule. From the experiments, we found that the number of records and the number of items hardly affect the performance of SMILE-ARM. The advantage is especially pronounced in experiment 4, where we used data with 10,000 records, 10 items and 80% density. SMILE-ARM has proven to have the best performance, and we believe that SMILE-ARM is an important way to discover association rules in data mining and a strong basis for classification.

9. References

[1] R. Agrawal, T. Imielinski and A. Swami, "Mining association rules between sets of items in large databases", In Proceedings of the ACM SIGMOD International Conference on Management of Data, Washington D.C., USA, pp. 207-216, 1993.
[2] R. Srikant and R. Agrawal, "Mining Generalized Association Rules", In Proceedings of the 21st VLDB Conference, Zurich, Switzerland, pp. 407-419, 1995.
[3] T. Fukuda, Y. Morimoto, S. Morishita and T. Tokuyama, "Data Mining using Two-Dimensional Optimized Association Rules: Scheme, Algorithms, and Visualization", In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 13-23, 1996.
[4] M. Zaki, S. Parthasarathy, M. Ogihara and W. Li, "New Algorithms for Fast Discovery of Association Rules", In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD '97), Newport Beach, CA, pp. 283-296, 1997.
[5] C. Borgelt, "Simple Algorithms for Frequent Item Set Mining", Springer-Verlag, Berlin, Germany, pp. 351-369, 2010.
[6] Z. Chen and G. Chen, "Building an Association Classifier Based on Fuzzy Association Rules", International Journal of Computational Intelligence Systems, Vol. 1, No. 3, pp. 262-272, 2008.
[7] L. Huang, H. Chen, X. Wang and G. Chen, "A Fast Algorithm for Mining Association Rules", Journal of Computer Science and Technology, Vol. 15, pp. 619-624, 2000.
[8] X. Du, S. Tang and A. Makinouchi, "Maintaining Discovered Frequent Itemsets: Cases for Changeable Database and Support", Journal of Computer Science and Technology, Vol. 18, pp. 648-658, 2003.
[9] S. Prakash and R. M. S. Parvathi, "An Enhanced Scaling Apriori for Association Rule Mining Efficiency", European Journal of Scientific Research, Vol. 39, pp. 257-264, 2010.
[10] Q. Li, H. Wang, Z. Yan and S. Ma, "Efficient Mining of Association Rules by Reducing the Number of Passes over the Database", Journal of Computer Science and Technology, Vol. 16, pp. 182-188, 2001.
[11] S. Anekritmongkol and M. L. K. Kasemsan, "The Comparative of Boolean Algebra Compress and Apriori Rule Techniques for New Theoretic Association Rule Mining Model", International Journal of Advancements in Computing Technology, Vol. 3, No. 1, pp. 58-67, 2011.

[12] S. Wang and L. Wang, "An Implementation of FP-Growth Algorithm Based on High Level Data Structures of Weka-JUNG Framework", Journal of JCIT, Vol. 5, No. 9, pp. 287-294, 2010.
[13] S. Roy and D. K. Bhattacharyya, "OPAM: An Efficient One Pass Association Mining Technique without Candidate Generation", Journal of JCIT, Vol. 3, No. 3, pp. 32-38, 2008.