
ORG - Oblique Rules Generator

Marcin Michalak(1), Marek Sikora(1,2), and Patryk Ziarnik(1)

(1) Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland
{Marcin.Michalak,Marek.Sikora,Patryk.Ziarnik}@polsl.pl
(2) Institute of Innovative Technologies EMAG, ul. Leopolda 31, 40-189 Katowice, Poland

Abstract. In this paper a new approach to generating oblique decision rules is presented. On the basis of limitations on every parameter of an oblique decision rule, a grid of parameter values is created; for every node of this grid an oblique condition is generated and its quality is calculated. The best oblique conditions build the oblique decision rule. Conditions are added as long as there are non-covered objects and the limit on the length of the rule is not exceeded. All rules are generated with the idea of sequential covering.

Keywords: machine learning, decision rules, oblique decision rules, rule induction.

1 Introduction

Example-based rule induction is, apart from decision tree induction, one of the most popular techniques of knowledge discovery in databases. So-called decision rules are a special kind of rules. Sets of decision rules built by induction algorithms are usually designed for two basic aims. One is developing a classification system that exploits the determined rules. The other aim is describing patterns in an analyzed dataset. Beyond the number of algorithms that generate hyper-cuboidal decision rules, it is worth raising the question: aren't oblique decision rules more flexible in describing the nature of the data? On the one hand, every simple condition like "parameter less/greater than value" may be interpreted in an intuitive way; on the other hand, a linear combination of the parameters, "a_1 * parameter_1 ± a_2 * parameter_2 ± ... less/greater than a_0", may substitute several non-oblique decision rules at the cost of being a little less interpretable.
In this article we describe a method of generating oblique decision rules (Oblique Rules Generator, ORG), which is a kind of exhaustive search for oblique conditions in the space of oblique decision rule parameters. As oblique decision rules may be treated as a generalization of standard decision rules, the next part of the paper presents some achievements in the area of rule generalization. Then some basic notions that deal with oblique decision rules are presented. Afterwards the algorithm that generates oblique decision rules (ORG) is defined. The paper ends with a comparison of results obtained on several synthetic datasets of ours and some well-known datasets.

L. Rutkowski et al. (Eds.): ICAISC 2012, Part II, LNCS 7268, pp. 152-159, 2012. (c) Springer-Verlag Berlin Heidelberg 2012
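As a toy illustration of the point made in the introduction (this sketch is ours, not from the paper): for points labelled by the diagonal rule x + y >= 1, a single oblique condition reproduces the class exactly, while any single axis-parallel condition of the form x >= t cannot.

```python
import itertools

# Points on a 5x5 grid in [0,1]^2, labelled by the oblique rule x + y >= 1.
pts = [(i / 4, j / 4) for i, j in itertools.product(range(5), repeat=2)]
labels = [x + y >= 1 for x, y in pts]

# One oblique condition reproduces the labels exactly.
oblique = [x + y >= 1 for x, y in pts]

def axis_errors(t):
    """Errors of a single axis-parallel condition x >= t (y >= t behaves alike)."""
    return sum((x >= t) != lab for (x, y), lab in zip(pts, labels))

# Even the best threshold leaves some points misclassified.
best_axis = min(axis_errors(t) for t in [0.0, 0.25, 0.5, 0.75, 1.0])
```

Approximating the diagonal boundary with axis-parallel conditions would require several rules; the oblique condition does it in one.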

2 Related Works

The simplest method of generalization, used by all induction algorithms, is rule shortening, which consists in removing elementary conditions. Heuristic strategies (for example hill climbing) or exhaustive searching are applied here. Rules are shortened until the quality (e.g. precision) of the shortened rule drops below some fixed threshold. Such a solution was applied, inter alia, in the RSES system [2], where rules are shortened as long as the rule precision does not decrease. In the case of unbalanced data, introducing various threshold values for the quality of shortened rules leads to keeping better sensitivity and specificity of the obtained classifier. The other approach to rule generalization is concerned with decision rule joining algorithms, which consist in merging two or more similar rules [11,16]. In [16] an iterative joining algorithm relying on merging ranges occurring in corresponding elementary conditions of input rules is presented. The merging ends when a new rule covers all positive examples covered by the joined rules. Rule quality measures [1] are used for assessing the quality of the output rules. Paper [11] presents a similar approach, where rules are grouped before joining [10] or the similarity between rules is calculated, and rules belonging to the same group or sufficiently similar are joined. A special case of a rule joining algorithm is the algorithm proposed in [13], in which the authors introduce complex elementary conditions in rule premises. The complex conditions are linear combinations of the attributes occurring in simple elementary conditions of rule premises. The algorithm applies only to the special kind of rules obtained in the so-called dominance-based rough set model [8], and is not fit for aggregation of classic decision rules, in which ranges of elementary conditions can be bounded above and below simultaneously.
Finally, algorithms that make it possible to generate oblique elementary conditions during model construction are also worth mentioning. One deals here with algorithms of oblique decision tree induction [5,9,12]. A special case of getting a tree with oblique elementary conditions is the application of a linear SVM in the construction of the tree nodes [3]. For decision rules, an algorithm that enables oblique elementary conditions to appear during rule induction is ADReD [14]. Considering the obtained rules in terms of their descriptive power, we can say that, even though the number of elementary conditions in rule premises is usually smaller than in rules allowing no oblique conditions, an unquestionable disadvantage of these algorithms is the very complicated form of the elementary conditions, in which all conditional attributes are frequently used. Another approach introducing oblique elementary conditions in rule premises consists in applying constructive induction (especially data-driven constructive induction): new attributes depending on linear combinations of existing features are introduced, and rules are then determined by a standard induction algorithm [4,17] based on the attribute set extended this way.

3 Oblique Decision Rules

Fundamentals. Decision rules with oblique conditions assume a more complex form of descriptors than standard decision rules. The oblique condition is a

condition in which the plane separating decision classes is a linear combination of the conditional attributes a_i in A (elementary conditions), on the assumption that all of them are of numerical type: sum_{i=1..|A|} c_i a_i + c_0, where a_i in A and c_i, c_0 in R. The oblique condition can then be defined as:

sum_{i=1..|A|} c_i a_i + c_0 >= 0   or   sum_{i=1..|A|} c_i a_i + c_0 < 0

The oblique condition describes a hyperplane in the conditional attribute space. The condition of the rule determines which elements from the decision class are covered by the given rule. Each oblique decision rule is defined by an intersection of oblique conditions.

Parameters of the Descriptor and Their Ranges - the Analysis. Let us define the space of all hyperplanes which are single oblique conditions. An n-dimensional hyperplane can be described by a linear equation of the following general form:

A_1 x_1 + A_2 x_2 + ... + A_n x_n + C = 0

where A_i, C in R and at least one A_i is non-zero. In the proposed solution, instead of the general form, we can use the normal form of the hyperplane equation:

alpha_1 x_1 + alpha_2 x_2 + ... + alpha_n x_n - rho = 0

where the alpha_i are the direction cosines (alpha_1^2 + alpha_2^2 + ... + alpha_n^2 = 1) and rho is the distance of the hyperplane from the origin of the coordinate system. This notation makes it possible to limit the range of every parameter. To explain how to find the real value ranges of the descriptor parameters, we can consider a straight line in the plane defined by the following normal form:

x cos(theta) + y sin(theta) - rho = 0

where theta is the angle of depression to the x axis and rho is the distance between the line and the origin, as illustrated in Fig. 1. Every line in the plane corresponds to a point in the parameter space. Determination of a straight line in (theta, rho)-space can be realized by searching a chosen subset of that space with a grid method. The angle theta is naturally bounded, so it can be defined as theta in [0, 2*pi); it is enough to determine a grid step for this variable. It is also possible to bound the values of the parameter rho.

Fig. 1. The normal parameters for a line
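The normal-form parametrisation and the grid over (theta, rho) can be sketched in a few lines. The sketch below is ours, not the authors' implementation: the function names and the use of NumPy are assumptions, and rho_upper_bound anticipates the bound for rho derived below in the text.

```python
import numpy as np

def condition_mask(points, theta, rho):
    """Points satisfying the normal-form condition x*cos(theta) + y*sin(theta) - rho >= 0."""
    x, y = points[:, 0], points[:, 1]
    return x * np.cos(theta) + y * np.sin(theta) - rho >= 0

def rho_upper_bound(x_max, y_max):
    """Largest distance from the origin to a line through the extreme point
    (x_max, y_max); geometrically this equals sqrt(x_max^2 + y_max^2)."""
    theta_opt = np.arctan2(y_max, x_max)
    return x_max * np.cos(theta_opt) + y_max * np.sin(theta_opt)

def parameter_grid(theta_step, rho_step, rho_max):
    """Nodes of the (theta, rho) search grid: theta in [0, 2*pi), rho in [0, rho_max)."""
    return [(t, r)
            for t in np.arange(0.0, 2 * np.pi, theta_step)
            for r in np.arange(0.0, rho_max, rho_step)]
```

Each grid node corresponds to one candidate line; evaluating condition_mask at a node tells which training objects the candidate condition covers.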

The lower bound of rho is 0 and the upper bound can be calculated as follows. The set of points is finite, so we can determine the maximal value of each coordinate. If some values of the variables are negative, the data can be translated into a coordinate system in which all coordinates are positive.

Fig. 2. The idea of the maximal value of the parameter rho

The idea is to find a straight line which passes through the extreme point and whose distance from the origin is the longest (Fig. 2). This problem can be solved by searching for the global maximum of the function giving the distance between the line and the origin, depending on the value of the angle theta:

rho_max(theta_opt) = x_max cos(arctan(y_max / x_max)) + y_max sin(arctan(y_max / x_max))

Having set boundary values for all parameters of the condition, we only have to determine the resolution of the search of the parameter space, i.e. a step for each parameter of the grid method: theta in [0, 2*pi), rho in [0, rho_max). The solution can be used for any hyperplane, using the dependency on the sum of the squares of the direction cosines, for example for planes in 3-dimensional and in any n-dimensional space.

Correct Side of the Condition. Each oblique condition requires its correct side to be defined. To determine it we can use a normal vector to the hyperplane containing the considered condition: in an n-dimensional space, each hyperplane can be described by its normal vector n = [A_1, A_2, ..., A_n]. We have to calculate one more vector to find the correct side of the considered condition for a given point T. The initial point P of such a vector can be any point lying on the hyperplane, and its final point should be the point T.
According to this, the second vector v is defined as follows. For P = (x_P1, x_P2, ..., x_Pn) and T = (x_T1, x_T2, ..., x_Tn):

v = PT = (x_T1 - x_P1, x_T2 - x_P2, ..., x_Tn - x_Pn)

The next step is to calculate the dot product of the two vectors n and v:

n . v = |n| |v| cos(alpha)

To decide whether the point T lies on the correct side of the condition, we should consider the value of the dot product in the following way:

1. If the value is greater than 0, the point T is considered to be on the correct side of the condition.
2. If the value is equal to 0, the point T is assumed to be on the correct side of the condition.
3. If the value is less than 0, the point T is not on the correct side of the condition.

At this moment we can limit the bound for the angle theta to theta in [0, pi) and, for each theta, also consider the second case, in which the correct side is the opposite one.

4 Description of the Algorithm

The purpose of the algorithm is to find the best oblique decision rules for each decision class of the input data, taking into account several defined constraints. In general, there are two basic steps of the algorithm:

1. Create a parameter grid using a determined step for each parameter.
2. Grow new rules by checking all conditions defined by the grid nodes.

It is possible to constrain the number of rules by defining the maximal number of rules which describe each class. Successive rules should be generated as long as there are still training objects which do not support any rule and the constraint is not yet reached. For each decision rule, successive oblique conditions are obtained using a hill climbing method. Below, the procedure of generating a single oblique decision rule is shown:

1. For each node of the parameter grid, create a condition and calculate its quality for the given training set using one of the possible quality measures.
2. Save only the first best condition (the one with the highest quality).
3. Reduce the training set (just for the time of generating the next condition) by rejecting all training objects which are not covered by the previously found conditions.
4. Find a successive condition with the first highest quality using the reduced training set.
5. A new condition should be added to the rule only if the extended rule is better than the rule generated in the previous iteration and the constraint (maximal number of descriptors for each rule) is not exceeded. Otherwise, the new condition must be rejected and the search for further conditions for this rule is stopped.
6. Continue searching for successive conditions after reducing the training set by rejecting all objects which are not covered by the current rule.

The addition of conditions is stopped when the rule consists of the determined maximal number of conditions or the quality of the oblique decision rule with the added condition does not improve (such a found condition is excluded). After the rule is generated, we remove all covered positive objects from the training set and, if the maximal number of rules per decision class is not reached, we start to generate a new rule.
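The two levels of the procedure above (greedy growth of a single rule over the grid, and sequential covering over rules) might be sketched as follows. This is a hedged reconstruction for the two-dimensional case, not the authors' code: grow_rule, org_rules, the (theta, rho, flip) condition encoding and the pluggable quality function are our own assumptions.

```python
import numpy as np

def grow_rule(X, y, target, grid, max_conditions, quality):
    """Grow one oblique rule: greedily add the first best grid condition
    (theta, rho, flip) while quality improves and the length limit holds."""
    conditions, covered = [], np.ones(len(X), dtype=bool)
    best_q = -np.inf
    while len(conditions) < max_conditions:
        best = None
        for theta, rho, flip in grid:
            mask = X[:, 0] * np.cos(theta) + X[:, 1] * np.sin(theta) - rho >= 0
            if flip:                # theta in [0, pi) with the opposite correct side
                mask = ~mask
            q = quality(y[covered], mask[covered], target)
            if best is None or q > best[0]:
                best = (q, (theta, rho, flip), mask)
        if best is None or best[0] <= best_q:
            break                   # no improvement: reject the condition and stop
        best_q, cond, mask = best
        conditions.append(cond)
        covered &= mask             # reduce the training set for the next condition
    return conditions, covered

def org_rules(X, y, target, grid, max_rules, max_conditions, quality):
    """Sequential covering: generate rules until all positive objects are
    covered or the per-class rule limit is reached."""
    rules, remaining = [], np.ones(len(X), dtype=bool)
    while len(rules) < max_rules and np.any(remaining & (y == target)):
        conds, covered = grow_rule(X[remaining], y[remaining], target,
                                   grid, max_conditions, quality)
        if not conds:
            break
        rules.append(conds)
        # remove covered positive objects from the training set
        keep = np.ones(len(X), dtype=bool)
        idx = np.flatnonzero(remaining)
        keep[idx[covered & (y[remaining] == target)]] = False
        remaining &= keep
    return rules
```

The quality function is left as a parameter so that any rule quality measure can be plugged in.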

5 Experiments and Results

First experiments were done on three synthetic datasets, prepared exactly for the task of searching for oblique decision rules: two two-dimensional (2D and double2D) and one three-dimensional (3D). A simple visualisation of these datasets is shown in Fig. 3. Each dataset contains 1000 objects that belong to two classes. The two-dimensional datasets are almost balanced (562:438 and 534:466) but the third dataset has the class size proportion 835:165. The first two-dimensional dataset looks like a square divided into two classes by its diagonal. The second two-dimensional dataset may be described as follows: one class occupies two opposite corners and the second class is the rest. The three-dimensional dataset is unbalanced because only one corner belongs to the smaller class. For these datasets, the limits on the maximal number of rules per decision class and the maximal number of conditions per decision rule are given in the table with the results. As the quality measure, the average of the rule precision and coverage was used.

Fig. 3. Visualisation of the synthetic datasets: 2D (left); double2D (center); 3D (right)

For the further experiments several datasets from the UCI repository were taken into consideration: iris, balance scale, ecoli, breast wisconsin [6]. Also Ripley's synth.tr data were used [15]. For every experiment, the limits on the number of rules per decision class and the number of conditions per single rule for the ORG algorithm were the same: at most two rules built from at most two conditions. The quality measure remained the same as for the previous datasets. Results of ORG are compared with the PART algorithm [7] as implemented in the WEKA software. The WEKA implementation of the PART algorithm does not give information about the standard deviation of the error in the 10-CV model, so it cannot be compared with the ORG results.
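The quality measure used in the experiments, the average of rule precision and coverage, is simple enough to state in code; the function name and argument conventions below are ours.

```python
def rule_quality(covered, positive):
    """Average of precision (covered positives / covered objects) and
    coverage (covered positives / all positive objects) for one rule."""
    covered_pos = sum(1 for c, p in zip(covered, positive) if c and p)
    n_cov, n_pos = sum(covered), sum(positive)
    precision = covered_pos / n_cov if n_cov else 0.0
    coverage = covered_pos / n_pos if n_pos else 0.0
    return (precision + coverage) / 2.0
```

A rule covering half of one class and nothing else scores 0.75; a rule covering everything scores close to the class frequency plus one half.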
6 Conclusions and Further Works

In this short article an intuitive and somewhat exhaustive way of generating oblique decision rules was presented. The algorithm, called ORG, is based on limiting the parameters of the oblique condition. In this approach it is possible to constrain the number of obtained rules (per single decision class) and also the shape of the rules (with the definition of the maximal number of oblique conditions).

Table 1. Results on synthetic datasets
Columns: dataset; avg. accuracy (PART, ORG); std. dev. (PART, ORG); avg. rules number (PART, ORG); avg. elem. cond. number (PART, ORG); ORG params per class (max number of rules, conditions).
2D          95.5 96. .5 2 8 3 2 2
double 2D   93.8 84.3 3. 4 3 23 6 2 2
3D          94.8 98.2 .2 3 2 22 2

Table 2. Results on popular benchmark datasets
Columns: dataset; avg. accuracy (PART, ORG); std. dev. (PART, ORG); avg. rules number (PART, ORG); avg. elem. cond. number (PART, ORG).
iris              94 94 4.6 2 3. 3 5.2
balance scale     84 92 2.4 46 6 26 2
Ripley            85 8 8.4 4 2 6 4
breast wisconsin  94 97.7 3 8 6.
ecoli             84 76 8. 2 33 9

On the basis of the results for the synthetic datasets we may see that ORG can be successfully applied to datasets that contain various oblique dependencies. In comparison with the PART results, this may be observed in the decrease (on average: five times) of the average number of decision rules for every decision class. In the case of the popular benchmark datasets the decrease of the number of rules per decision class may also be observed. On the basis of these observations, our further works will focus on finding the best conditions in a strategy that also takes into consideration the length of the condition. It is also worth examining whether the limits on the oblique condition parameters should be recalculated more often than only at the beginning of the dataset analysis.

Acknowledgements. This work was supported by the European Community from the European Social Fund. The research and the participation of the second author is supported by the National Science Centre (decision DEC-2//D/ST6/77).

References

1. An, A., Cercone, N.: Rule quality measures for rule induction systems - description and evaluation. Computational Intelligence 17, 409-424 (2001)
2. Bazan, J., Szczuka, M., Wróblewski, J.: A New Version of Rough Set Exploration System. In: Alpigini, J.J., Peters, J.F., Skowron, A., Zhong, N. (eds.) RSCTC 2002. LNCS (LNAI), vol. 2475, pp. 397-404. Springer, Heidelberg (2002)
3. Bennett, K.P., Blue, J.A.: A support vector machine approach to decision trees. In: Proceedings of the IJCNN 1998, pp. 2396-2401 (1998)
4. Bloedorn, E., Michalski, R.S.: Data-Driven Constructive Induction. IEEE Intell. Syst. 13(2), 30-37 (1998)

5. Cantu-Paz, E., Kamath, C.: Using evolutionary algorithms to induce oblique decision trees. In: Proc. of Genet. and Evol. Comput. Conf., pp. 1053-1060 (2000)
6. Frank, A., Asuncion, A.: UCI Machine Learning Repository (2010), http://archive.ics.uci.edu/ml
7. Frank, E., Witten, I.H.: Generating Accurate Rule Sets Without Global Optimization. In: Proc. of the 15th Int. Conf. on Mach. Learn., pp. 144-151 (1998)
8. Greco, S., Matarazzo, B., Słowiński, R.: Rough sets theory for multi-criteria decision analysis. Eur. J. of Oper. Res. 129(1), 1-47 (2001)
9. Kim, H., Loh, W.-Y.: Classification trees with bivariate linear discriminant node models. J. of Comput. and Graph. Stat. 12, 512-530 (2003)
10. Latkowski, R., Mikołajczyk, M.: Data decomposition and decision rule joining for classification of data with missing values. In: Peters, J.F., Skowron, A., Grzymała-Busse, J.W., Kostek, B., Świniarski, R.W., Szczuka, M.S. (eds.) Transactions on Rough Sets I. LNCS, vol. 3100, pp. 299-320. Springer, Heidelberg (2004)
11. Mikołajczyk, M.: Reducing Number of Decision Rules by Joining. In: Alpigini, J.J., Peters, J.F., Skowron, A., Zhong, N. (eds.) RSCTC 2002. LNCS (LNAI), vol. 2475, pp. 425-432. Springer, Heidelberg (2002)
12. Murthy, S.K., Kasif, S., Salzberg, S.: A system for induction of oblique decision trees. J. of Artif. Intell. Res. 2, 1-32 (1994)
13. Pindur, R., Susmaga, R., Stefanowski, J.: Hyperplane Aggregation of Dominance Decision Rules. Fundam. Inform. 61(2), 117-137 (2004)
14. Raś, Z.W., Daradzińska, A., Liu, X.: System ADReD for discovering rules based on hyperplanes. Eng. App. of Artif. Intell. 17(4), 401-406 (2004)
15. Ripley, B.D.: Pattern Recognition and Neural Networks. Cambridge University Press (1996)
16. Sikora, M.: An algorithm for generalization of decision rules by joining. Found. on Comp. and Decis. Sci. 30(3), 227-239 (2005)
17. Ślęzak, D., Wróblewski, J.: Classification Algorithms Based on Linear Combinations of Features. In: Żytkow, J.M., Rauch, J. (eds.) PKDD 1999. LNCS (LNAI), vol. 1704, pp. 548-553. Springer, Heidelberg (1999)