Efficient Rule Set Generation using K-Map & Rough Set Theory (RST)

International Journal of Engineering & Technology Innovations, Vol. 2, Issue 3, May 2015

Durgesh Srivastava, Shalini Batra, Sweta Bhalothia
BRCM CET, Bahal, Bhiwani, Haryana, INDIA

Abstract

With the enormous growth of data, especially large data sets, mixed types of data, uncertainty, incompleteness, data change, and the use of background knowledge, the inconsistency of databases is growing manyfold. An information system may also contain a certain amount of redundancy that does not assist knowledge discovery and may in fact mislead the process. One method for dealing with these issues is Rough Set Theory (RST), a mathematical tool for handling imperfect knowledge and discovering patterns hidden in data. RST deals with uncertainty and vagueness without requiring any additional information about the data, and allows sets of decision rules to be generated from it. Redundant attributes can be eliminated in order to generate the reduct set or to construct the core of the attribute set. This paper proposes a new approach: a Karnaugh map is used for attribute reduction, and rough set theory is then used to generate rules.

Keywords: Rough Set Theory (RST), K-Map, Feature Selection, Decision Rules.

1. Introduction

In machine learning, classification is an instance of supervised learning: learning from a training set of correctly identified observations. Data classification is the categorization of data for its most effective and efficient use. To classify means to identify to which category, or set of categories, a new observation belongs, based on a training set of observations whose category membership is known.

Rough Set Theory, proposed in 1982 by Zdzislaw Pawlak, is considered one of the first non-statistical approaches to data analysis. It is chiefly concerned with classification and with the analysis of imprecise, uncertain, or incomplete information and knowledge. Very large data sets, mixed types of data (continuous-valued and symbolic), uncertainty (noisy data), incompleteness (missing or incomplete data), data change, and the use of background knowledge are real-world issues where rough set theory can be useful. Rough sets can handle uncertainty and vagueness and discover patterns in inconsistent data. Attribute reduction is very important in data classification: irrelevant and redundant attributes are removed from the decision table without any loss of classification information [1], [7], [8]. This paper proposes a new approach for attribute reduction. The Karnaugh map, also known as the K-map, was introduced by Maurice Karnaugh in 1953 to simplify Boolean algebra expressions. The K-map reduces the need for extensive calculation by taking advantage of the human capacity for pattern recognition [4], [9], [10].

This paper is organized as follows. Section 2 describes basic concepts of Rough Set Theory. Section 3 highlights the concepts of the K-map. Section 4 introduces the proposed method. Experimental results are described in Section 5, and Section 6 gives the conclusion, followed by references.

2. Rough Set Theory (RST)

Rough set theory, proposed by Z. Pawlak in 1982, is a mathematical approach to imperfect knowledge with close connections to many other theories. Despite these connections, rough set theory may be considered an independent discipline in its own right.
Recently, rough sets have been used extensively in many applications in artificial intelligence and the cognitive sciences, especially in machine learning, knowledge discovery, data mining, expert systems, approximate reasoning, and pattern recognition [1], [7].

Basic concepts of Rough Set Theory

An information system is a pair S = (U, P), where U is a non-empty finite set of objects called the universe and P is a non-empty finite set of attributes such that a : U → V_a for every a ∈ P. The set V_a is called the value set of a. A decision system is any information system of the form S = (U, P ∪ {d}), where d ∉ P is the decision attribute. The elements of P are called conditional attributes, or simply conditions.

Let Q ⊆ P. The Q-indiscernibility relation, denoted IND(Q), is defined as:

IND(Q) = {(x, x′) ∈ U² : a(x) = a(x′) for every a ∈ Q}

If (x, x′) ∈ IND(Q), then the objects x and x′ are indiscernible from each other by the attributes in Q. The equivalence classes of the Q-indiscernibility relation are denoted [x]_Q.

The discernibility matrix of S is a symmetric n × n matrix with entries

c_ij = {a ∈ Q : a(x_i) ≠ a(x_j)}, for i, j = 1, ..., n

Each entry thus consists of the set of attributes upon which objects x_i and x_j differ. Since the discernibility matrix is symmetric and c_ii = ∅ (the empty set) for i = 1, ..., n, the matrix can be represented using only the elements of its lower triangular part, i.e. for 1 ≤ j < i ≤ n.

A discernibility function f_S for an information system S is a Boolean function of m Boolean variables a₁*, ..., a_m* (corresponding to the attributes a₁, ..., a_m), defined as:

f_S(a₁*, ..., a_m*) = ∧ { ∨ c_ij* : 1 ≤ j < i ≤ n, c_ij ≠ ∅ }

where c_ij* = {a* : a ∈ c_ij}. The set of all prime implicants of f_S determines the set of all reducts of Q. The discernibility function is a Boolean function in POS (product-of-sums) form.
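These definitions translate directly into code. The following is a minimal Python sketch (ours, not the authors'; all names are illustrative) that computes the indiscernibility classes, the lower-triangular discernibility matrix, and the discernibility function of a decision table represented as a dictionary of attribute-value rows. Later sections reuse these helpers on the paper's Flu data.

```python
from itertools import combinations

def indiscernibility_classes(table, attrs):
    """Group objects that agree on every attribute in attrs (the classes of IND(Q))."""
    classes = {}
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())

def discernibility_matrix(table, attrs):
    """Lower-triangular entries c_ij: the attributes on which objects i and j differ."""
    return {(xi, xj): {a for a in attrs if table[xi][a] != table[xj][a]}
            for xi, xj in combinations(table, 2)}

def discernibility_function(matrix):
    """The POS discernibility function, kept as a set of clauses (sums of attributes)."""
    return {frozenset(c) for c in matrix.values() if c}
```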

Approximations of a set

Let X be a subset of U, i.e. X ⊆ U.

Lower Approximation: for a given concept X, the lower approximation is the set of observations that can be classified into the concept with certainty:

Q_*(X) = {x : [x]_Q ⊆ X}

Upper Approximation: for a given concept X, the upper approximation is the set of observations that can possibly be classified into the concept:

Q*(X) = {x : [x]_Q ∩ X ≠ ∅}

Once the reducts have been computed, the rules are easily constructed by overlaying the reducts over the originating decision table and reading off the values.

3. Karnaugh Map (K-Map)

The Karnaugh map, also known as the K-map, is a method for simplifying Boolean algebra expressions that reduces the need for extensive calculation. In Boolean algebra, any Boolean function can be expressed in canonical form using the dual concepts of min-terms and max-terms. A Product of Sums (POS) expression is a conjunction (AND) of max-terms. The canonical forms allow deeper analysis during the simplification of Boolean functions. Here the K-map is used for attribute reduction by simplifying the discernibility function, which is a Boolean function in POS form. Since the discernibility function is a monotone Boolean function (a Boolean function without negation), negated terms in the K-map are not considered [9], [10].

Advantages of the K-map: it takes less space and it takes less time.

Example: Consider the following map (figure omitted). The function plotted is:

Z = f(A, B) = A + AB

The logic values of the input variables A and B (1 denoting the true form and 0 the false form) head the rows and columns respectively. This map is a one-dimensional type that can be used to simplify an expression in two variables; a two-dimensional map serves up to four variables, and a three-dimensional map up to six.

Using algebraic simplification:

Z = A + AB = A(1 + B) = A

Variable B becomes redundant by the Boolean absorption theorem. Referring to the map, the two adjacent 1s are grouped together. By inspection, variable B takes both its true and false forms within the group; this eliminates variable B, leaving only variable A, which takes only its true form. The minimized answer is therefore Z = A.
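The grouping that the K-map performs visually corresponds, for a monotone function, to the absorption law (A + AB = A, and dually T·(H + T) = T). Below is a minimal illustrative sketch (our code, not the authors') that applies absorption to the clause representation built earlier. In general the irredundant POS form must still be expanded into its prime implicants to enumerate all reducts; for the examples in this paper, absorption alone already exposes the answer.

```python
def absorb(clauses):
    """Absorption law: a clause that strictly contains another clause is redundant."""
    return {c for c in clauses if not any(d < c for d in clauses)}

# The worked example Z = A + A*B, with each product term as a set of literals:
terms = {frozenset({"A"}), frozenset({"A", "B"})}
print(absorb(terms))  # {frozenset({'A'})}, i.e. Z = A
```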
4. Proposed Method

Real-world data varies in size and complexity, which makes it difficult to analyze and hard to manage from a computational viewpoint. The major objectives of rough set analysis are to reduce data size and to handle inconsistency in data. The work presented here deals with uncertainty and extracts useful information from the database. The basic methodology of the proposed work consists of the following steps:

1. Representation of the Information System
2. Discernibility Matrix
3. Discernibility Function
4. Reduction of Attributes using the K-Map
5. New Reduct Table
6. Set Approximation
7. Rule Generation

[Fig. 1: Flow Diagram of the Proposed Method]

5. Experimental Result

The proposed work uses the Flu data set, in which the data is discretized using RST and the K-map and split into a training set and a test set. The data of six patients is taken as training data. The attributes are Headache, Muscle pain, Temperature, and Flu, where Headache, Muscle pain, and Temperature are conditional attributes and Flu is the decision attribute. Samples of the decision rules generated from the reduct for the Flu data set are derived in the steps below.

Step 1: The Flu data set

A data set of several patients with possible flu symptoms is taken (Table 1). Using the K-map and rough set approach, the data is analyzed, redundant data is eliminated, the attributes are reduced, and a set of rules is developed. Table 1 shows the patients and their respective symptoms.

Table 1. Information system of the Flu disease

Patient | Headache | Muscle pain | Temperature | Flu
p1      | No       | Yes         | High        | Yes
p2      | Yes      | No          | High        | Yes
p3      | Yes      | Yes         | Very High   | Yes
p4      | No       | Yes         | Normal      | No
p5      | Yes      | No          | High        | No
p6      | No       | Yes         | Very High   | Yes
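For use with the sketches in this paper, Table 1 can be encoded directly as a dictionary (the abbreviations H, M, T follow the paper's own shorthand; the encoding itself is illustrative):

```python
flu = {
    "p1": {"H": "No",  "M": "Yes", "T": "High",      "Flu": "Yes"},
    "p2": {"H": "Yes", "M": "No",  "T": "High",      "Flu": "Yes"},
    "p3": {"H": "Yes", "M": "Yes", "T": "Very High", "Flu": "Yes"},
    "p4": {"H": "No",  "M": "Yes", "T": "Normal",    "Flu": "No"},
    "p5": {"H": "Yes", "M": "No",  "T": "High",      "Flu": "No"},
    "p6": {"H": "No",  "M": "Yes", "T": "Very High", "Flu": "Yes"},
}
conditions = ["H", "M", "T"]  # conditional attributes; "Flu" is the decision
```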

Step 2: Discernibility matrix

As discussed above, the discernibility matrix of an information system is a symmetric n × n matrix with entries c_ij, where each entry consists of the set of attributes upon which objects x_i and x_j differ. Since the matrix is symmetric and c_ii = ∅ for i = 1, ..., n, it can be represented using only its lower triangular part, i.e. for 1 ≤ j < i ≤ n. The information system in Table 1 has 6 objects, so its discernibility matrix is a 6 × 6 matrix. The complete matrix is shown in Table 2, where H, M, and T denote Headache, Muscle pain, and Temperature respectively [11].

Table 2. Discernibility matrix for the given information system

   | p1    | p2    | p3    | p4    | p5    | p6
p1 |       |       |       |       |       |
p2 | H,M   |       |       |       |       |
p3 | H,T   | M,T   |       |       |       |
p4 | T     | H,M,T | H,T   |       |       |
p5 | H,M   |       | M,T   | H,M,T |       |
p6 | T     | H,M,T | H     | T     | H,M,T |

Step 3: Discernibility function

A discernibility function for an information system is a Boolean function of Boolean variables, and one can be associated with every discernibility matrix. For the discernibility matrix above, the discernibility function is:

f(H, M, T) = (H+M) · (H+T) · T · (H+M) · T
           · (M+T) · (H+M+T) · (H+M+T)
           · (H+T) · (M+T) · H
           · (H+M+T) · T
           · (H+M+T)

Each row of the expression corresponds to one column of the discernibility matrix. Each parenthesized term is a sum in the Boolean expression, and the one-letter Boolean variables correspond to attribute names [11].

Step 4: K-map for this function

Since there are 3 variables (H, M, T), the K-map for this function has 2³ = 8 squares. In a K-map for a POS function, each square represents a max-term, and adjacent squares always differ by just one literal. The K-map for a function is specified by putting a 0 in the square corresponding to each max-term of the POS form. For example, the first sum in the discernibility function above is H+M: here the values of H and M are 0, and the absence of T indicates that the value of T is 1, so a 0 is entered in the square whose value is 001. The values for the rest of the function are entered in the same way [9].

The K-map is minimized by grouping terms to form single terms. When forming groups of squares, the following points must be considered:

- Every square containing a 0 must be covered at least once.
- A square containing a 0 can be included in as many groups as desired.
- Each group must be as large as possible.
- If a square containing a 0 cannot be placed in any group, it is left out and included in the final expression on its own.
- The number of squares in a group must be a power of two: 2, 4, 8, ...

[Fig. 2: K-map of the Flu disease data]

After grouping the terms, one checks which variables change value within each group and which remain the same; a variable that changes its value is dropped, so the output of a group consists of the terms that do not change value. In the K-map of Fig. 2, two groups are formed, one shown in red and the other in green. In the red group, M and T change their values but H keeps the same value throughout, so the output of this group is H; likewise, the output of the green group is T. So:

Reduct = {H, T}
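Chaining the earlier sketches reproduces this reduction (the absorption step standing in for the manual K-map grouping; the names remain our illustrative ones):

```python
m = discernibility_matrix(flu, conditions)
f = discernibility_function(m)   # the POS clauses of Table 2
for clause in absorb(f):         # absorption plays the role of the K-map grouping
    print(set(clause))           # prints {'H'} and {'T'}: f = H.T, reduct = {H, T}
```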
Step 5: Reduct table

After simplification of the K-map, the reduct set {H, T} is obtained. Since it is the only reduct of the data table, it is also the core. The attribute Muscle pain can therefore be eliminated from the table. The new reduct table is:

Table 3. Reduct information system of the Flu disease

Patient | Headache | Temperature | Flu
p1      | No       | High        | Yes
p2      | Yes      | High        | Yes
p3      | Yes      | Very High   | Yes
p4      | No       | Normal      | No
p5      | Yes      | High        | No
p6      | No       | Very High   | Yes

Step 6: Set approximation

The lower and upper approximations of a set are interior and closure operations in the topology generated by the indiscernibility relation. The indiscernibility relation relates two or more objects whose values are identical with respect to a considered subset of attributes. For the reduced attribute set

Q = {Headache, Temperature}

the indiscernibility classes of the given information system are [6], [11]:

IND(Q) = {{p1}, {p2, p5}, {p3}, {p4}, {p6}}

Let X be the concept "patient has flu":

X = {x : Flu(x) = Yes} = {p1, p2, p3, p6}

The lower approximation, the set of patients who definitely have flu, is:

Q_*(X) = {p1, p3, p6}

The upper approximation, the set of patients who possibly have flu, is:

Q*(X) = {p1, p2, p3, p5, p6}

The boundary region is:

BN_Q(X) = Q*(X) − Q_*(X) = {p2, p5}

The boundary region consists of the elements p2 and p5, which cannot be classified: they possess the same conditional characteristics but differ in the decision attribute.

Step 7: Rule generation

Once the reducts have been computed, the rules are easily constructed by overlaying the reducts over the originating decision table and reading off the values. Given the reduct {Headache, Temperature} and the new reduct table, the rule read off the first object is: "if Headache is No and Temperature is High, then Flu is Yes" [4]. The full rule set is shown in Table 4.

Table 4. Set of rules for the Flu disease

Sl. No. | Rule
1       | (Headache, No) and (Temperature, High) → (Flu, Yes)
2       | (Headache, Yes) and (Temperature, Very High) → (Flu, Yes)
3       | (Headache, No) and (Temperature, Very High) → (Flu, Yes)
4       | (Headache, No) and (Temperature, Normal) → (Flu, No)
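Steps 6 and 7 can be sketched in the same style (our code; `indiscernibility_classes` and `flu` come from the earlier sketches):

```python
Q = ["H", "T"]                                  # the reduct from Step 4
classes = indiscernibility_classes(flu, Q)      # [{p1}, {p2, p5}, {p3}, {p4}, {p6}]
X = {p for p, row in flu.items() if row["Flu"] == "Yes"}

lower = set().union(*(c for c in classes if c <= X))   # {p1, p3, p6}
upper = set().union(*(c for c in classes if c & X))    # {p1, p2, p3, p5, p6}
boundary = upper - lower                               # {p2, p5}

# One certain rule per object in the lower approximation (duplicates collapse);
# rules concluding Flu = No come the same way from the complement of X.
rules = {tuple((a, flu[p][a]) for a in Q) + (("Flu", "Yes"),) for p in lower}
```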

6. Conclusion

An efficient rule-set generation approach for classifying data using the K-map and Rough Set Theory has been presented, along with the individual steps used to generate the rules. The features are reduced using the first three steps: discernibility matrix, discernibility function, and K-map. After reducing the attributes and calculating the set approximations, the extraction of classification rules becomes straightforward, which makes it easy to automate the task of classifying data. The boundary region represents uncertainty: its elements cannot be classified, since they possess the same characteristics but lead to differing conclusions. The experimental results indicate that the given approach is very satisfactory.

References

1. A. Butalia, M. L. Dhore, G. Tewani: Applications of Rough Sets in the Field of Data Mining. First International Conference on Emerging Trends in Engineering and Technology, IEEE, 2008.
2. P. Prabhu, Duraiswamy: Feature Selection for HIV Database Using Rough System. Second International Conference on Computing, Communication and Networking Technologies, IEEE, 2010.
3. D. Kim: Data Classification Based on Tolerant Rough Set. Pattern Recognition, Pattern Recognition Society, Elsevier Science Ltd, 2001.
4. P. Gogoi, R. Das, B. Borah, D. K. Bhattacharyya: Efficient Rule Set Generation Using Rough Set Theory for Classification of High Dimensional Data. International Journal of Smart Sensors and Ad Hoc Networks (IJSSAN), Vol. 1, Issue 2, 2011.
5. J. Zhang, T. Li, D. Ruan, Z. Gao, C. Zhao: A Parallel Method for Computing Rough Set Approximations. Information Sciences 194 (2012) 209-223, Elsevier, 2012.
6. J. F. Peters, A. Skowron: A Rough Set Approach to Reasoning About Data. International Journal of Intelligent Systems, Vol. 15, 2001.
7. Z. Pawlak: Rough Sets. International Journal of Computer and Information Sciences, Vol. 11, 341-356, 1982.
8. J. J. Alpigini, J. F. Peters, A. Skowron, N. Zhong (eds.): Rough Sets and Current Trends in Computing. Third International Conference, USA, 2002.
9. "Karnaugh Maps: Rules of Simplification". Retrieved 2009-05-30.
10. "Simplifying Logic Circuits with Karnaugh Maps". The University of Texas at Dallas. Retrieved 7 October 2012.
11. Z. Suraj: An Introduction to Rough Set Theory and Its Applications. ICENCO 2004, December 27-30, 2004, Cairo, Egypt.