The Comparison of CBA Algorithm and CBS Algorithm for Meteorological Data Classification Mohammad Iqbal, Imam Mukhlash, Hanim Maria Astuti

Similar documents
A neural-networks associative classification method for association rule mining

CSE 634/590 Data mining Extra Credit: Classification by Association rules: Example Problem. Muhammad Asiful Islam, SBID:

An Improved Apriori Algorithm for Association Rules

Analysis of a Population of Diabetic Patients Databases in Weka Tool P.Yasodha, M. Kannan

Performance Based Study of Association Rule Algorithms On Voter DB

Data Mining Course Overview

A Hierarchical Document Clustering Approach with Frequent Itemsets

The Fuzzy Search for Association Rules with Interestingness Measure

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 4, Jul Aug 2017

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

What Is Data Mining? CMPT 354: Database I -- Data Mining 2

Adapting Association Rules Mining to task of Classification

Efficiently Mining Positive Correlation Rules

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)

AN ADAPTIVE PATTERN GENERATION IN SEQUENTIAL CLASSIFICATION USING FIREFLY ALGORITHM

Temporal Weighted Association Rule Mining for Classification

IMPLEMENTATION AND COMPARATIVE STUDY OF IMPROVED APRIORI ALGORITHM FOR ASSOCIATION PATTERN MINING

Enhanced Associative classification based on incremental mining Algorithm (E-ACIM)

2015 International Conference on Computer, Control, Informatics and Its Applications

Hybrid Feature Selection for Modeling Intrusion Detection Systems

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore

Chapter 1, Introduction

Dynamic Clustering of Data with Modified K-Means Algorithm

Mining Association Rules in Temporal Document Collections

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset

Improved Frequent Pattern Mining Algorithm with Indexing

Categorization of Sequential Data using Associative Classifiers

Optimization using Ant Colony Algorithm

Datasets Size: Effect on Clustering Results

ETP-Mine: An Efficient Method for Mining Transitional Patterns

Conformance Checking Evaluation of Process Discovery Using Modified Alpha++ Miner Algorithm

SENSOR SELECTION SCHEME IN TEMPERATURE WIRELESS SENSOR NETWORK

Integrating Classification and Association Rule Mining

Using Association Rules for Better Treatment of Missing Values

Challenges and Interesting Research Directions in Associative Classification

PERFORMANCE OF FACE RECOGNITION WITH PRE- PROCESSING TECHNIQUES ON ROBUST REGRESSION METHOD

Fuzzy Cognitive Maps application for Webmining

A Conflict-Based Confidence Measure for Associative Classification

A Novel Algorithm for Associative Classification

SCHEME OF COURSE WORK. Data Warehousing and Data mining

Knowledge Discovery and Data Mining

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Mining of Web Server Logs using Extended Apriori Algorithm

A Novel Method for Activity Place Sensing Based on Behavior Pattern Mining Using Crowdsourcing Trajectory Data

QUALITY CONTROL FOR UNMANNED METEOROLOGICAL STATIONS IN MALAYSIAN METEOROLOGICAL DEPARTMENT

COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

Ontology Alignment using Combined Similarity Method and Matching Method

Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports

Analysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data

Mining Frequent Itemsets Along with Rare Itemsets Based on Categorical Multiple Minimum Support

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest

Association Rule Mining: FP-Growth

DISCOVERING SEQUENTIAL DISEASE PATTERNS IN MEDICAL DATABASES USING FREESPAN MINING AND PREFIKSPAN MINING APPROACH

Bit Mask Search Algorithm for Trajectory Database Mining

PREDICTION OF POPULAR SMARTPHONE COMPANIES IN THE SOCIETY

Automatic Metadata Generation By Clustering Extracted Representative Keywords From Heterogeneous Sources

A mining method for tracking changes in temporal association rules from an encoded database

Pruning Techniques in Associative Classification: Survey and Comparison

Data Mining Concepts

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

The University of Jordan. Accreditation & Quality Assurance Center. Curriculum for Doctorate Degree

Gurpreet Kaur 1, Naveen Aggarwal 2 1,2

Data Preprocessing Method of Web Usage Mining for Data Cleaning and Identifying User navigational Pattern

Iteration Reduction K Means Clustering Algorithm

AN OPTIMIZATION GENETIC ALGORITHM FOR IMAGE DATABASES IN AGRICULTURE

Reality Mining Via Process Mining

Understanding Rule Behavior through Apriori Algorithm over Social Network Data

A Spatial Point Pattern Analysis to Recognize Fail Bit Patterns in Semiconductor Manufacturing

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM

Building a Concept Hierarchy from a Distance Matrix

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

CLASSIFICATION OF WEB LOG DATA TO IDENTIFY INTERESTED USERS USING DECISION TREES

A New Technique to Optimize User s Browsing Session using Data Mining

Elena Marchiori Free University Amsterdam, Faculty of Science, Department of Mathematics and Computer Science, Amsterdam, The Netherlands

CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM. Please purchase PDF Split-Merge on to remove this watermark.

An Efficient Algorithm for finding high utility itemsets from online sell

Climate Precipitation Prediction by Neural Network

Privacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix

Research Article Apriori Association Rule Algorithms using VMware Environment

Using EasyMiner API for Financial Data Analysis in the OpenBudgets.eu Project

Rule Pruning in Associative Classification Mining

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH

WIP: mining Weighted Interesting Patterns with a strong weight and/or support affinity

Performance Evaluation of Sequential and Parallel Mining of Association Rules using Apriori Algorithms

Measuring and Evaluating Dissimilarity in Data and Pattern Spaces

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining

Efficient Mining of Top-K Sequential Rules

Structure of Association Rule Classifiers: a Review

An Improved Document Clustering Approach Using Weighted K-Means Algorithm

Chapter 7: Frequent Itemsets and Association Rules

Author(s) Yoshida, Yoshihiro, Yabuki, Nobuy.

Log Information Mining Using Association Rules Technique: A Case Study Of Utusan Education Portal

[Shingarwade, 6(2): February 2019] ISSN DOI /zenodo Impact Factor

A Parallel Evolutionary Algorithm for Discovery of Decision Rules

DALA Project: Digital Archive System for Long Term Access

Silvia Rostianingsih, Gregorius Satia Budhi and Leonita Kumalasari Theresia Petra Christian University,

Transcription:

Information Systems International Conference (ISICO), 2 4 December 2013 The Comparison of CBA Algorithm and CBS Algorithm for Meteorological Data Classification Mohammad Iqbal, Imam Mukhlash, Hanim Maria Astuti Mohammad Iqbal*, Imam Mukhlash*, Hanim Maria Astuti** * Department of Mathematics, Faculty of Mathematics and Science, Institut Teknologi Sepuluh Nopember ** Department of Information Systems, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember Keywords: Meteorological Data Classification Association Sequential pattern Data mining ABSTRACT An increase in the growth of data as a result of the widely use of applications as well as information systems has made data mining an important task in knowledge discovery field of research. Several methods in data mining such as classification has been proposed based on renowned learning methods such as decision tree or neural network. Few studies explored the topic of classification combined with another task in data mining such as association and sequential pattern. Several algorithms that combine classification with other data mining tasks are Classification Based on Associations (CBA) that combines classification with association rules and Classify by Sequence (CBS) algorithm which combines classification with sequential patterns. However, none of studies analyzes the comparison between these two algorithms. In this paper, we explore and compare the performance of CBA algorithm and CBS algorithm in term of accuracy and running time. To do so, we use meteorological data for rain or dry season classification with average temperature, wind speed, relative humidity and air pressure set as our parameters. Based on our experimental evaluation, the CBS algorithm results in high accuracy than of the CBA algorithm. In term of runtime, the CBS algorithm is more efficient than the CBA algorithm. Copyright 2013 Information Systems International Conference. All rights reserved. Corresponding Author: Iqbal, Mohammad Departement of Mathematics, Faculty of Mathematics and Science, Institut Teknologi Sepuluh Nopember, Jalan Raya Kampus ITS, Gedung U, Sukolilo, Surabaya, Indonesia. Email: iqbalmohammad.math@gmail.com 1. INTRODUCTION Data mining is a method to find useful information from large databases. In recent years, various data mining techniques were developed, such as association rules, clustering, classification, sequential pattern, and others. From many data mining problems, classification and prediction are considered as important tasks due to its large applications. Classification is a data mining task intended to learn function or model that maps an item of data into a class from known classes [4]. It is then used to predict a new item data. Several common techniques for classification are decision tree, neural network and others. Other data mining techniques are association rule and sequence pattern. Association rule is a technique used to find intra transactional patterns in databases only for an event while sequence pattern is a data mining technique intended to find series pattern of event at time. All this time, the development of a single data mining technique has increased due to the widely used of data mining techniques for business, research and other purposes. However, along with the development of this single technique, the development of multiple techniques that integrates several data mining techniques is also rapidly increasing. Some of algorithms based on multiple techniques are classification based association rule (CBA) [1] and featured based sequence classifier [3]. Furthermore, there is also Classify by Sequence (CBS) algorithm that integrates classification and sequence pattern for temporal data [6]. These combination methods can effectively combine the advantages of single mining methods to improve the performance in complex data mining [6]. A study about data mining for weather prediction has been conducted using a single technique such as classification or association rule [1]. However, none of studies explores weather prediction using multiple

424 Artificial Intelligence and Enterprise Systems Track data mining techniques. CBA and CBS are included as algorithms based on multiple techniques. These algorithms have similarities, those are: 1) both of them is constructed based on apriori algorithm, and 2) both of the algorithms can process temporal data [6][7][8]. Based on the two aforementioned reasons, a mining process uses either CBA or CBS can be possibly conducted. In addition to that, comparing both algorithms can also be done. In this paper, we use meteorological data to build a classifier using association and sequence pattern for predicting weather. Among two types of meteorological data a station and various stations we use station meteorological data. Based on this meteorological data, in this paper, we analyze the performance of CBA algorithm in compared to CBS algorithm in term of accuracy and running time consumption. For this classification, we use two classes on meteorological data: rain season class and dry season class. 2. LITERATURE REVIEW a. Classification Based on Association (CBA) Algorithm The basic idea of Classification Based on Association Rule (CBA) technique is generated from association rule that integrated with classification rule to produce subset from effective rules [2]. In recent years, many researchers show that CBA has a high potential to build classification system with more predictive and accurate result than of traditional methods such as decision tree [1]. The concept of CBA algorithm consists of two phases [1]. i. CBA-RG (CBA-Rule Generator) CBA-RG is called a rule generator that built based on Apriori algorithm to find association rule. CBA-RG is used to find all ruleitems that at least similar to the minimum support or minsup. is a ruleitem where condset is a set of items and is a class label. condsupcount is number of cases in D dataset that consisted of condset. rulesupcount is a numerous case in D that consisted of condset and label of classes y. Each ruleitems represents a rule. (1) and (2) which D is the dataset size. Ruleitem that has support values more than min_sup is called frequent ruleitem whereas frequent ruleitems used to generate set possibility frequent rule items is called candidate ruleitems. The last step in this procedure is to find the candidate ruleitem frequent which has support more than min_sup. ii. CBA-CB (CBA-Classifier Builder) CBA-RB is a classifier builder procedure. To build classifier, set rules evaluate all possible subset training data and choose the correct subset rule sequence with the smallest error. Given two rules and where, if confidence confidence but support support so that built first than. And classifier :,, R is set of generated rules if. If there is no ruleitems in class, it will generate the default class. b. Classify by Sequence (CBS) Classify by sequence is integrating sequential pattern mining based on Apriori algorithm and probabilistic induction for classification on temporal data. The advantages are simplicity implementation and the availability of result classification due to the pattern based architecture of CBS. Considering a large database D that stores numerous instances, where each instance is comprised of a sequence of temporal data, the following rules are set. Let represents all temporal data instances belonging to class i,. For each class dataset, temporal data instances are represented by, where represents the value of the kth time point of instance. The CBS algorithm consists two phases [6]:

Artificial Intelligence and Enterprise Systems Track 425 a. CSP Miner Algorithm In this phase, the algorithm discovers all classifiable sequential pattern (CSP) for temporal classification. To do so, first, we discover all frequent items of the whole dataset as length 1-frequent sequences. We build a root containing these items as leaves. Using this architecture, CBS feature mining can discover all possible sequences. As the Apriori-like algorithm, generate all candidate sequences by tree extension. It adds all possible elements to the leaves in order to compose the tree by all frequent sequences of length k, and subsequently followed by other candidate sequences of length k+1. After candidate tree is built, count the tree by the whole dataset for pruning the non-frequent parts by tree tracing. After counting, remove all sub trees that do not contain any leaves with value more than min_sup. b. Classifier Builder Algorithm In this phase, build a classifier by removing CSP. Before building a classifier, remove the CSP that has no classification information. After pruning, the remaining CSPs are the features of the class they belong to. Then, the CSP can be classified. Because each CSP has its own class, it shows that the instances which contain this CSP have some possibilities of being classified as a class of CSP. Most of the CSP contains with one temporal instance belonged to one class, so that this instance should be classified as the class. 3. RESEARCH METHOD In this paper, we compare the performance between CBA Algorithm and CBS Algorithm in Meteorological data for weather classification. The procedures our research are shown in Fig 1. START Studying literature on classification algorithm Preparing the meteorological data i.e. temperatur, relative humidity, speed of wind, air pressure Preprocesing data including transform numerical values into categorial values and cleaning data Mining using CBA Algorithm Mining using CBS Algorithm Evaluating the performance of CBA Algorithm Evaluating the performance of CBS Algorithm Comparing the performance of both algorithms Conclusion and recommendation enhancement algorithm END Figure 1. Flow chart of research method 4. RESULTS AND ANALYSIS In experimental, we use meteorological data with a single station. It consists of 13.515 datasets of climate data at Perak Surabaya, East Java-Indonesia started from 1973 until 2009, obtained from the Bureau

426 Artificial Intelligence and Enterprise Systems Track of Meteorology, Climatology and Geophysics of Indonesia (BMKG). All datasets are in numerical values so we need to transform them into categorical values. Since there were some missing values in the attributes, we deleted records containing the missing values and filled with average values taken from other attributes. After this preprocessing stage, the number of datasets decreases into 9.885 datasets. The attributes of the datasets consist of temperature (average, maximum, and minimum), relative humidity, air pressure and speed of wind (average, maximum, and minimum), dew temperature, rainfall, etc. In this paper, we used 4 attributes as class attributes, i.e. average temperature, relative humidity, average speed of wind and air pressure, and rainfall. We built classes into two classes, rain season and dry season. Like with other general data mining experimentations, we divided datasets into a training part and a testing part. The training datasets are used to build the classifier and the testing data that is intended for accuracy evaluation. To do so, we randomly selected 80% instances of datasets for training data, and the remaining 20% for testing data. The next stage, the meteorological data is processed using CBA and CBS algorithm. The results obtained from both algorithms are then compared. Both of the algorithms have similarities, as both is constructed based on apriori algorithm. However, the meteorological data used in CBA algorithm is perceived as a sequence to obtain classifier by paying attention to each class label [5]. As for CBS algorithm, daily data obtained from meteorological data is perceived as a sequence or event where rainy or sunny is set as the class label. The main metric in evaluating CBA and CBS is accuracy, of which is defined as [6]: (3) Table 1 shows the accuracy of both methods by varying minimum support. The accuracy results of CBS are stable when the minimum support is set from 0.1 to 0.4, and slightly decreasing when the minimum support is set from 0.5 to 0.7. As for CBA, its accuracy results increases with minimum support started from 0.3 to 0.7. However, if we compare the results of both algorithms, the accuracy of CBS is higher than CBA when minimum support is set from 0.1 to 0.6. Table 1. Comparison accuracy vs minimum support between CBA algorithm and CBS algorithm Minimum support Accuracy (%) CBA CBS 0.1 36.97 76.7 0.2 48.36 76.67 0.3 35.6 76.67 0.4 44.26 76.67 0.5 66.67 76.6 0.6 66.67 76.6 0.7 79.5 76.5 The graphic visualization of the above table is presented in the Fig.2 below. The performance of CBA algorithm is slightly increasing while the performance of CBS algorithm is quite stable since the first time the minimum support is set. However, the accuracy of CBS algorithm is better than CBA algorithm Based on this result, therefore, for the case of classification for meteorological datasets, CBS algorithm is more accurate than CBA algorithm. Figure 2. Comparison accuracy vs minimum support between CBA algorithm and CBS algorithm

Artificial Intelligence and Enterprise Systems Track 427 The second comparison performance parameter is time consumption or runtime program between CBA and CBS. Table 2 below displays the run time results of both methods by varying minimum support settings. It can be seen that the runtime results of both methods decrease when the minimum support value is started from 0.2 to 0.5. Table 2. Comparison runtime vs minimum support between CBA algorithm and CBS algorithm Minimum Runtime (in seconds) Support CBA CBS 0.2 1739 11.6 0.3 1620 6 0.4 553 4.2 0.5 175 3.1 0.6 372 2.3 0.7 233 1.4 Figure 3 is the graphical representation of Table 2. The figure shows that the runtime results of both methods decrease when the minimum support value is started from 0.2 to 0.5. The result tells us that in overall varying minimum support, the CBS runtime is more efficient than CBA. Thus, we conclude that the performance CBS in running time is better than of the CBA. Figure 3. Comparison runtime vs minimum support between CBA algorithm and CBS algorithm 5. CONCLUSION The research is aimed to compare the performance of multiple algorithms in classification technique, those are, CBA and CBS algorithms, in term of accuracy and running time. Through experimental evaluation, the results show that the running time of CBS is faster than of the CBA. This means that CBS algorithm is more efficient than the CBA. The results also show that CBS achieves higher and more stable classification results than CBA in term of accuracy. In addition to that, the empirical results show that for meteorological datasets, CBS algorithm can discover more classifiable features from training data than CBA algorithm. When minimum support is set to more than 0.7, no classifier is successfully built. For future work we will further explore enhancement the algorithm to make pruning part more effectively for produced more classifiable rules, with aim to execution efficiency in the process. In addition to that, our research shows that the performance of CBS algorithm is better than of CBA. In our future research, we plan to improve the performance of CBS algorithm because this algorithm indeed has some weaknesses as stated in [6]. One of the rooms for improvement is in the more effective classifiable rules in the pruning process so that the execution time will be much more efficient. ACKNOWLEDGEMENTS This paper is part of Penelitian Pendukung Unggulan, a research grant funded by Institute of Research and Public Services, Institut Teknologi Sepuluh Nopember. REFERENCES [1] B. Liu, W. Hsu and Y. Ma. Integrating Classification and Association Rule Mining, In Proceeding of the 4 th International conference on knowledge discovery and data mining (KDD 1998), New York, USA, 1998., 1998.

428 Artificial Intelligence and Enterprise Systems Track [2] M. Nofal, S. Bani-Ahmad. Classification Based On Association-Rule Mining Techniques: A General Survey and Empirical Comparative Evaluation. Department of Information Technology, Al-Balqa Applied University, Jordan. [3] N. Lesh, M. J. Zaki, M. Ogiharra. Mining features for sequence classification. In Proceeding of the 5 th ACM SIGKDD International conference on knowledge discovery and data miningsan Diego, California, USA, 1999, pp. 31-36. [4] N.P. Tan, M. Steinbach, V. Kumar. Introduction to Data Mining, Pearson Addison Weasly., New York, 2006. [5] S.Nandagopal, S.Karthik, V.P.Arunachalam. Mining of Meteorological Data Using Modified Apriori Algorithm. European Journal of Scientific Research., 2010. [6] V.S. Tseng, C. H. Lee. Effective Temporal Data Classification by Integrating Sequential Pattern Mining and Probabilistic Induction. Journal of Expert System with Application, 2009, pp 9524-9532. [7] Hong Shu, Xinyan Zhu, Shangping Dai. Mining Association Rules in Geographical Spatio-Temporal Data. International Society for Photogrammetry and Remote Sensing. 2012. [8] Mennis, J. and Liu, J.W., Mining association rules in spatio-temporal data: an analysis of urban socioeconomic and land cover change. Transactions in GIS, 9(1): 5-17. 2005. BIBLIOGRAPHY OF AUTHORS Mohammad Iqbal earned B.Sc and a Master Degree in Mathematics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia. He is a lecturer in Mathematics Department, Institut Teknologi Sepuluh Nopember and a member of Computational Mathematics Laboratory in this department. His research interest includes data mining. Imam Mukhlash is a Lecture in Mathematics Department, Institut Teknologi Sepuluh Nopember. In this department, he is an active member of Computational Mathematics Laboratory. He received his B.Sc in Mathematics from Institut Teknologi Sepuluh Nopember; a Master Degree and a Doctoral Degree both in Informatics and Electrical Engineering from Institut Teknologi Bandung, Indonesia. His research interests include data mining and intelligent systems. In addition to teaching, he has conducted some research projects funded by Institute of Research and Public Services, Institut Teknologi Sepuluh Nopember and National Research Program, the Ministry of Higher Education of Indonesia. He has also published papers both in international journal and international seminars. Hanim Maria Astuti is currently an active member of the Planning and Development of Information Systems Laboratory in the Department of Information Systems, Institut Teknologi Sepuluh Nopember, Indonesia. She earned B.S. in Information Systems from Institut Teknologi Sepuluh Nopember, Indonesia; and M.Sc in Management of Technology from Delft University of Technology, the Netherlands. She has conducted some research projects granted by National Research Program, The Ministry of Higher Education, Indonesia, and has published conference papers. Her research interests include Mobile Service Innovation and Business Process Management.