Feature Minimization within Decision Trees. Erin J. Bredensteiner. Kristin P. Bennett. R.P.I. Math Report No. 218, November 1995.


Abstract

Decision trees for classification can be constructed using mathematical programming. Within decision tree algorithms, the feature minimization problem is to construct accurate decisions using as few features or attributes within each decision as possible. Feature minimization is an important aspect of data mining since it helps identify which attributes are important and helps produce accurate and interpretable decision trees. In feature minimization with bounded accuracy, we minimize the number of features subject to a given misclassification error tolerance. This problem can be formulated as a parametric bilinear program and is shown to be NP-complete. A parametric Frank-Wolfe method is used to solve the bilinear subproblems. The resulting minimization algorithm produces more compact, accurate, and interpretable trees. Computational results compare favorably with a popular greedy feature elimination method as well as with a linear programming method of tree construction.

Key Words: Data mining, machine learning, feature minimization, decision trees, bilinear programming.

1 Introduction

We consider the fundamental problem in machine learning of discriminating between elements of two sets A and B in the n-dimensional real space R^n. Each dimension of the space represents a feature or attribute of the elements of the sets. Commonly, the method of discrimination involves determining a linear function consisting of a linear combination of the attributes of the two given sets. In general it is not possible for a single linear function to completely separate these sets of points.

(Knowledge Discovery and Data Mining Group, Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180; bredee@rpi.edu, bennek@rpi.edu. Telephone (518) , FAX (518) . This material is based on research supported by National Science Foundation Grant.)

Thus, some error criterion is minimized to determine the linear discriminant. To obtain a more accurate discrimination, many linear separators can be used as the decisions within a decision tree. In a decision tree, several linear discriminants are applied recursively to form a nonlinear separation of the space R^n into disjoint regions, each corresponding to set A or set B. The goal is to obtain a decision tree, with one or more decisions, which generalizes well, i.e., correctly classifies future points.

Feature minimization is an important aspect of multivariate decision tree construction. The goal of feature minimization is to construct good decisions using as few features as possible. By minimizing the number of features used at each decision, the understandability of the resulting tree is increased and the number of data evaluations is decreased [6]. Feature minimization is not necessary in univariate decision tree algorithms, in which each decision in the tree is based on a single feature or attribute. Note that in this paper we use the terms feature and attribute interchangeably. For example, in a credit card approval application a univariate decision may be: "Is income > $50,000?" A multivariate decision uses a linear combination of features, for example: "Is 3*debt > income?" A tree with multivariate decisions can represent more complex relationships using fewer decisions than univariate trees. However, multivariate decisions with too many attributes can be difficult to interpret. Our goal is both to make a small number of decisions and to utilize only the necessary attributes in each decision. Feature minimization is especially important in data mining applications where the resulting decision tree is used not only to classify future points, but also to understand the underlying characteristics of the sets being studied. An added benefit is increased generalization at each node, which may assist in better decision tree construction by avoiding overfitting.

There is a trade-off between the complexity of each decision and the number of decisions required in the tree. Multivariate decision trees typically have many fewer decisions than univariate decision trees constructed using one attribute per decision. Univariate decision trees have the advantages that single-attribute decisions help avoid over-parameterization and that the resulting trees are more readily interpretable, provided the number of decisions is not excessive. Reducing the number of features at each decision retains the benefits of multivariate decisions while maintaining much of the simplicity of univariate decisions.

The goal of this paper is to obtain a compact, accurate decision tree with decisions that include as few features as possible while maintaining a specified level of accuracy. We will be using mathematical programming methods to construct the decisions. In contrast, other common approaches to feature minimization are based on heuristics. Sequential Backward Elimination (SBE) and Sequential Forward Elimination (SFE) [6] involve searching the feature space for features that do not contribute (SBE) or contribute (SFE) to the quality of the decision. In SBE, an initial discriminant function is constructed using all of the features and then features are removed sequentially from the problem until some stopping criterion is satisfied. In SFE, a discriminant is constructed using a single feature and then features are added one at a time.
At each iteration of both methods, the best feature to add or remove is determined by finding the best discriminant for each possible attribute. We will compare our method to a simplified version of SBE called Greedy Sequential Backward Elimination (GSBE) [6], which was found to compare very favorably with the more costly SBE and SFE methods. Using the mathematical programming approach we can evaluate the effectiveness of this heuristic.

In Section 2, we discuss the background and formulation of our feature minimization method using parametric bilinear programming. We then prove in Section 3 that our feature minimization problem is NP-complete. In Section 4, we propose an algorithm based on the Frank-Wolfe method discussed in [4] for solving the parametric bilinear programming problem. Section 5 contains a computational comparison of our feature minimization method to GSBE and to a linear programming approach to decision tree construction. Results on a number of practical problems are given.

The following notation is used. Let A and B be two sets of points in the n-dimensional real space R^n with cardinality m and k respectively. Let A be an m x n matrix whose rows are the points in A. Let B be a k x n matrix whose rows are the points in B. The i-th point in A and the i-th row of A are both denoted A_i. Likewise, B_j is the j-th point in B and the j-th row of B. For two vectors in R^n, xy denotes the dot product. Let e denote a vector of ones of the appropriate dimension. The set of minimizers of f(x) on the set S is denoted by arg min_{x in S} f(x). For a vector x in R^n, x_+ will denote the vector in R^n with components (x_+)_i := max{x_i, 0}, i = 1,...,n. The step function x_* will denote the vector in [0,1]^n with components (x_*)_i := 0 if x_i ≤ 0 and (x_*)_i := 1 if x_i > 0, i = 1,...,n.

2 Feature Minimization

At each decision we are interested in finding a linear function that separates the two sets. Mathematically, this corresponds to finding the plane

    wx = γ                                                          (1)

such that

    Aw > eγ,   eγ > Bw,                                             (2)

where w in R^n is the normal to the separating plane and γ determines the distance of the plane from the origin. Upon normalization, this becomes

    Aw - eγ - e ≥ 0,   -Bw + eγ - e ≥ 0.                            (3)

Condition (3) is feasible if and only if the two sets are linearly separable. In the event that the sets are not linearly separable, we must choose a plane that minimizes some error function. Our formulations of the feature minimization problem can be applied to many different error functions. In this paper, we use an error function based on the average magnitude of the misclassification errors within each class. The underlying problem without feature minimization is a linear program. This robust linear program (RLP) [3] has been used for decision tree construction [1]. RLP combined with the greedy sequential backward elimination method for feature minimization forms the basis of a breast cancer diagnosis system [17, 16]. Our feature minimization method could also be applied to algorithms that minimize the number of points misclassified, such as [2, 11], or to other successful linear programming approaches [10, 15], but we leave these extensions for future work.
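As a small illustration (not from the paper), the normalized separability condition (3) can be checked directly for a candidate plane (w, γ). The sketch below assumes NumPy and uses made-up toy matrices A and B.

```python
import numpy as np

# Hypothetical toy data: rows of A and B are points in R^n (n = 2 here).
A = np.array([[2.0, 3.0], [3.0, 4.0]])   # points of class A
B = np.array([[0.0, 0.0], [1.0, 0.5]])   # points of class B
w = np.array([1.0, 1.0])                  # candidate normal vector
gamma = 2.5                               # candidate threshold

# Condition (3): A w - e*gamma - e >= 0 and -B w + e*gamma - e >= 0.
sep_A = A @ w - gamma - 1.0 >= 0          # each A point on the positive side
sep_B = -(B @ w) + gamma - 1.0 >= 0       # each B point on the negative side
print("linearly separated by (w, gamma):", sep_A.all() and sep_B.all())
```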

The following robust linear programming problem, RLP [3], minimizes a weighted average of the sum of the distances from the misclassified points to the separating plane:

    min_{w,γ,u,v}   (1/m) eu + (1/k) ev
    subject to      u + Aw - eγ - e ≥ 0
                    v - Bw + eγ - e ≥ 0
                    u ≥ 0,  v ≥ 0                                                  (4)

We are interested in minimizing the number of features at each decision in an effort to balance the amount of separation achieved against the number of features used. The step function x_* will be used to count the number of nonzero elements of the vector w. We replace w with (w_+) - (w_-) where w_+, w_- ≥ 0. At optimality, w_+ = (w)_+ and w_- = (-w)_+. Thus the number of nonzero elements of the vector w is e(w_+ + w_-)_*. Adding this term, weighted by a parameter α > 0, to the objective function yields the following multiobjective optimization problem:

    min_{w_+,w_-,γ,u,v}   (1/m) eu + (1/k) ev + α e(w_+ + w_-)_*
    subject to            u + A(w_+ - w_-) - eγ - e ≥ 0
                          v - B(w_+ - w_-) + eγ - e ≥ 0
                          u ≥ 0,  v ≥ 0,  w_+ ≥ 0,  w_- ≥ 0                        (5)

The first issue we confront in the above problem is the elimination of the step function. The step function is removed from problem (5) using properties found in [11] and [12]; the details are contained in the appendix. The resulting linear program (6) with equilibrium constraints is equivalent to the original problem (5):

    min_{w_+,w_-,γ,u,v,r}   (1/m) eu + (1/k) ev + α er
    subject to              u + A(w_+ - w_-) - eγ - e ≥ 0
                            v - B(w_+ - w_-) + eγ - e ≥ 0
                            (w_+ + w_-)(e - r) = 0
                            0 ≤ r ≤ e
                            u ≥ 0,  v ≥ 0,  w_+ ≥ 0,  w_- ≥ 0                      (6)

Note that at optimality r = (w_+ + w_-)_*, thus er counts the number of features used.
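The base problem (4) is an ordinary linear program, so any LP solver can produce the RLP separator. The sketch below is an illustrative implementation using SciPy's linprog rather than the CPLEX setup used in the paper; the toy matrices A and B at the bottom are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def rlp_separator(A, B):
    """Solve RLP (4): min (1/m) e.u + (1/k) e.v subject to the constraints of (4)."""
    m, n = A.shape
    k = B.shape[0]
    # Variable vector z = [w (n), gamma (1), u (m), v (k)].
    c = np.concatenate([np.zeros(n + 1), np.full(m, 1.0 / m), np.full(k, 1.0 / k)])
    # u + A w - e*gamma - e >= 0   ->   -A w + e*gamma - u <= -e
    G_A = np.hstack([-A, np.ones((m, 1)), -np.eye(m), np.zeros((m, k))])
    # v - B w + e*gamma - e >= 0   ->    B w - e*gamma - v <= -e
    G_B = np.hstack([B, -np.ones((k, 1)), np.zeros((k, m)), -np.eye(k)])
    A_ub = np.vstack([G_A, G_B])
    b_ub = -np.ones(m + k)
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + k)  # w, gamma free; u, v >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w, gamma = res.x[:n], res.x[n]
    return w, gamma, res.fun   # res.fun is the weighted average misclassification magnitude

# Hypothetical toy data.
A = np.array([[2.0, 3.0], [3.0, 4.0], [2.5, 2.0]])
B = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.5]])
w, gamma, err = rlp_separator(A, B)
print("w =", w, "gamma =", gamma, "error =", err)
```

The returned error value is the quantity that the bounded-accuracy formulation below constrains by the tolerance ε.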

The second issue to confront is how to choose the parameter α. The solution of problem (6) yields optimal decisions dependent on the value of α, and the choice of α is not intuitively obvious. We propose two variants of the problem that eliminate this parameter, move the complementarity constraints to the objective function, and allow the problem to be solved using bilinear programming.

One possible approach is to minimize the number of features while satisfying a specific misclassification error bound. In our effort to achieve this goal, we propose removing α from the problem by bounding the error function in a constraint. Problem (7) removes features while maintaining accuracy within some tolerance ε. A similar concept was used by [6] and [16] in their feature elimination methods. In [6], feature elimination was allowed to continue as long as a specific error tolerance was maintained. Street [16] computed planes for all feature counts and then used a tuning set to determine the best plane. We call this problem feature minimization with bounded accuracy:

    Find the positive integer λ̄ such that   λ̄ = min { λ > 0 : f(λ) = 0 },         (7)

where

    f(λ) =  min_{w_+,w_-,γ,u,v,r}   (w_+ + w_-)(e - r)
            subject to              (1/m) eu + (1/k) ev ≤ ε
                                    u + A(w_+ - w_-) - eγ - e ≥ 0
                                    v - B(w_+ - w_-) + eγ - e ≥ 0
                                    0 ≤ r ≤ e,   er ≤ λ,   λ ∈ [1, n]
                                    u ≥ 0,  v ≥ 0,  w_+ ≥ 0,  w_- ≥ 0               (8)

For each fixed value of λ, problem (8) finds a linear separator within a specific error rate. If for any given λ we have f(λ) ≠ 0, then no linear discriminant exists within the error tolerance ε that uses at most λ features. Theorem 3.2 proves this problem is NP-complete for the error function that counts the number of points misclassified.

An alternate approach is to rephrase the problem as follows: what is the best decision that can be made using at most λ variables? This limited feature minimization problem then becomes:

    min_{w_+,w_-,γ,u,v,r}   (1/m) eu + (1/k) ev + α (w_+ + w_-)(e - r)
    subject to              u + A(w_+ - w_-) - eγ - e ≥ 0
                            v - B(w_+ - w_-) + eγ - e ≥ 0
                            0 ≤ r ≤ e,   er ≤ λ,   λ ∈ [1, n]
                            u ≥ 0,  v ≥ 0,  w_+ ≥ 0,  w_- ≥ 0                       (9)

Here α must be chosen sufficiently large in order to force the complementarity constraint of (6), (w_+ + w_-)(e - r) = 0, to be satisfied at optimality. Limiting the maximum number of features is appealing in practice because, if the number of features is small, the interpretability of the tree may be greatly enhanced. For example, if the number of features per decision is limited to three, then each decision may be viewed graphically as a three-dimensional plot. The extreme case of univariate trees, those limited to a single feature, has been demonstrated to work very well on numerous applications. However, such trees may require an excessive number of decisions.

3 Computational Complexity

In this paper we concentrate on the feature minimization with bounded accuracy problem. In this section, we show that this problem is NP-complete. We begin by giving a formal definition of a problem titled "bounded accuracy with limited features" and prove it to be NP-complete. The feature minimization with bounded accuracy problem is then defined, and the bounded accuracy with limited features problem is used to prove that it too is NP-complete.

Definition 3.1 (Bounded Accuracy with Limited Features) Let X be a finite subset of vectors in R^{n+1}, where each vector x ∈ X has integer-valued entries. Is there a vector y ∈ R^{n+1} such that at most λ (0 < λ ≤ n) of the entries y_i, i = 1,...,n, are nonzero and such that xy > 0 for at least K vectors x? Specifically, X contains vectors of the form [A_i, -1] and [-B_j, 1]. Also, y_i = (w_+ - w_-)_i for i = 1,...,n and y_{n+1} = γ.

Theorem 3.1 The bounded accuracy with limited features problem is NP-complete.

Proof. It is easy to show that this problem is in NP. We need only choose a vector y ∈ R^{n+1} and check in polynomial time whether xy > 0 for at least K vectors x ∈ X and whether at most λ of the elements y_i, i = 1,...,n, are nonzero. To show that the above problem is NP-complete, the Open Hemisphere problem of [9] can be easily transformed into a single instance of the bounded accuracy with limited features problem. The Open Hemisphere problem is the problem of determining whether there is a vector y such that xy > 0 for at least K vectors x ∈ X. Thus, solving the Open Hemisphere problem is exactly solving the instance of our problem with λ = n.

The feature minimization with bounded accuracy problem is precisely the bounded accuracy with limited features problem with the added condition that the number of features be minimized. The formal definition of this problem is as follows.

Definition 3.2 (Feature Minimization with Bounded Accuracy) Let X be a finite subset of vectors in R^{n+1}, where each vector x ∈ X has integer-valued entries. Find a vector y ∈ R^{n+1} such that the number of nonzero elements y_i, i = 1,...,n, is minimized and such that xy > 0 for at least K vectors x.

Theorem 3.2 The feature minimization with bounded accuracy problem is NP-complete.

Proof. We can show that this problem is in NP by relating it to at most two instances of the bounded accuracy with limited features problem, which is in NP by Theorem 3.1. There exists a solution for the feature minimization with bounded accuracy problem with exactly p nonzero elements y_i, i = 1,...,n, if and only if there exists a solution for the bounded accuracy with limited features problem for λ = p but no solution exists for λ = p - 1. Thus, given a vector y with p nonzero elements y_i, i = 1,...,n, we can check whether or not y is a solution to the feature minimization with bounded accuracy problem by solving the bounded accuracy with limited features problem for λ = p - 1 and verifying that y satisfies xy > 0 for at least K vectors x ∈ X. Therefore, the feature minimization with bounded accuracy problem is in NP. We now show that this problem is NP-hard by reducing the bounded accuracy with limited features problem to the feature minimization with bounded accuracy problem: solve the feature minimization with bounded accuracy problem and obtain a solution y containing exactly p nonzero elements y_i, i = 1,...,n. A solution exists to the bounded accuracy with limited features problem if and only if p ≤ λ.

In the next two sections, we describe a practical algorithm for solving the feature minimization with bounded accuracy problem and provide computational results.

4 Feature Minimization Algorithm

In this section we describe the algorithm used to solve our feature minimization problem (7). The first step in solving problem (7) is to determine values for the parameters ε and λ. To determine ε, we solve the linear program (4) and allow a 10% error on the value of its objective function. This value of ε stays constant for the remainder of the solution procedure. The parameter λ is allowed to change in the process of solving this parametric bilinear program; Subsection 4.2 contains a complete description of how λ is chosen.

For fixed values of ε and λ, several approaches are available to find a solution of program (8). Some possibilities are branch-and-bound techniques, cutting plane methods, or the Frank-Wolfe method. The approach implemented in this paper uses a Frank-Wolfe type algorithm used successfully to solve bilinear programs in [4, 2]. This algorithm reduces the original bilinear program to two linear programs, one of which has a closed form solution as shown in [2]. A complete description of our algorithm is given in the following two subsections.

4.1 Bilinear Subproblems

The parametric bilinear programming formulation (8) is an uncoupled bilinear program. It has been shown that a Frank-Wolfe algorithm [8] applied to an uncoupled bilinear program will converge to a global solution or a stationary point [4]. Applying this Frank-Wolfe algorithm to problem (8) we obtain the following algorithm:

Algorithm 4.1 (Frank-Wolfe algorithm for uncoupled bilinear programs) For fixed λ,

Step 1:  (w_+^{i+1}, w_-^{i+1}, γ^{i+1}, u^{i+1}, v^{i+1}) ∈
         arg min_{w_+,w_-,γ,u,v}   (w_+ + w_-)(e - r^i)
         subject to                (1/m) eu + (1/k) ev ≤ ε
                                   u + A(w_+ - w_-) - eγ - e ≥ 0
                                   v - B(w_+ - w_-) + eγ - e ≥ 0
                                   u ≥ 0,  v ≥ 0,  w_+ ≥ 0,  w_- ≥ 0

Step 2:  r^{i+1} ∈ arg min_r   (w_+^{i+1} + w_-^{i+1})(e - r)
         subject to             0 ≤ r ≤ e,   er ≤ λ

Step 3:  Repeat until there is no improvement in the objective.

It can easily be shown that the subproblem in Step 2 has a closed form integer solution, namely r_j = 1 for the λ largest components of |w^{i+1}| = (w_+^{i+1} + w_-^{i+1}), and r_j = 0 otherwise.
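Algorithm 4.1 alternates between an LP in (w_+, w_-, γ, u, v) with r fixed and the closed-form choice of r. Below is a minimal sketch of this alternation, again using SciPy's linprog as a stand-in for the paper's CPLEX subproblem solver; infeasibility handling (when ε is too tight) and other refinements are omitted, and the stopping tolerance is an assumption.

```python
import numpy as np
from scipy.optimize import linprog

def fw_feature_min(A, B, eps, lam, max_iter=50):
    """Frank-Wolfe sketch for the bounded-accuracy subproblem (8) at a fixed integer lam."""
    m, n = A.shape
    k = B.shape[0]
    r = np.zeros(n)                       # start with no features "paid for"
    obj_prev = np.inf
    for _ in range(max_iter):
        # Step 1: LP in z = [w+ (n), w- (n), gamma, u (m), v (k)] with r fixed.
        c = np.concatenate([1.0 - r, 1.0 - r, [0.0], np.zeros(m + k)])
        err_row = np.concatenate([np.zeros(2 * n + 1), np.full(m, 1.0 / m), np.full(k, 1.0 / k)])
        G_A = np.hstack([-A, A, np.ones((m, 1)), -np.eye(m), np.zeros((m, k))])
        G_B = np.hstack([B, -B, -np.ones((k, 1)), np.zeros((k, m)), -np.eye(k)])
        A_ub = np.vstack([err_row, G_A, G_B])
        b_ub = np.concatenate([[eps], -np.ones(m + k)])
        bounds = [(0, None)] * (2 * n) + [(None, None)] + [(0, None)] * (m + k)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        wp, wm = res.x[:n], res.x[n:2 * n]
        # Step 2 (closed form): r_j = 1 for the lam largest components of |w| = w+ + w-.
        absw = wp + wm
        r = np.zeros(n)
        r[np.argsort(absw)[-lam:]] = 1.0
        obj = absw @ (1.0 - r)
        if obj >= obj_prev - 1e-9:        # Step 3: stop when no improvement
            break
        obj_prev = obj
    return wp - wm, res.x[2 * n], obj      # (w, gamma, value of f(lam))
```

The third returned value is the objective of (8); it is zero exactly when a plane meeting the error tolerance ε uses at most lam features.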

4.2 The Feature Minimization Bilinear Program

The parametric bilinear program (7) searches for the minimum number of features such that a specific error criterion is met. Various methods are available for choosing which values of λ should be explicitly solved. For each λ a series of linear programs must be solved, so it is computationally valuable to solve for as few values of λ as possible. We have used a modification of the secant method, similar to that used in [2], in the following algorithm:

Algorithm 4.2 (Feature Minimization with Bounded Accuracy) Let λ_max denote the smallest number of features found so far for which the error tolerance is satisfied. Let λ_min denote the largest number of features attempted so far in Algorithm 4.1 for which the error tolerance is violated. All calculations for λ and λ_p are rounded to the nearest integer.

Step 0: Solve the robust LP (4) to find the best linear discriminant using all of the features. Let lp_error = (1/m) eu + (1/k) ev. Let ε = 1.1 lp_error. Let λ_max = n and λ_min = 1.

Step 1: Solve bilinear subproblem (8) using Algorithm 4.1 for λ = 1. If f(λ) = 0 then return λ̄ = 1; else let f(λ_min) = f(λ) and λ = n/2.

Step 2: Solve bilinear subproblem (8) using Algorithm 4.1.

Step 3: If f(λ) = 0 then let λ_max = λ and λ = (λ_min + λ_max)/2. Otherwise calculate the secant method update

        λ_p = λ - f(λ) (λ - λ_min) / (f(λ) - f(λ_min)),

    let λ_min = λ and f(λ_min) = f(λ); if λ_p ∈ (λ_min, λ_max) then let λ = λ_p, else let λ = (λ_min + λ_max)/2.

Step 4: If λ_max > λ_min + 1, go to Step 2; else return λ̄ = λ_max.

In the remaining two sections, we refer to our implementation of the feature minimization with bounded accuracy program as Feature Minimization.
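The outer search of Algorithm 4.2 can be written as a short driver around any routine that evaluates f(λ), for example the fw_feature_min sketch above. The sketch below follows the secant/bisection updates of Steps 1-4; degenerate cases (such as equal f values, which would make the secant update divide by zero) are not handled.

```python
def feature_min_bounded_accuracy(f, n):
    """Search for the smallest lambda with f(lambda) == 0 (Algorithm 4.2 sketch).

    f: callable returning the optimal value of subproblem (8) for a given integer lambda.
    n: total number of features.
    """
    lam_max, lam_min = n, 1                    # current bracket on the answer
    lam = 1
    f_lam = f(lam)                             # Step 1
    if f_lam == 0:
        return 1
    f_min = f_lam
    lam = round(n / 2)
    while True:                                # Steps 2-4
        f_lam = f(lam)
        if f_lam == 0:
            lam_max = lam
            lam = round((lam_min + lam_max) / 2)
        else:
            lam_p = round(lam - f_lam * (lam - lam_min) / (f_lam - f_min))
            f_min, lam_min = f_lam, lam
            lam = lam_p if lam_min < lam_p < lam_max else round((lam_min + lam_max) / 2)
        if lam_max <= lam_min + 1:
            return lam_max
```

For instance, f could be `lambda lam: fw_feature_min(A, B, eps, lam)[2]` for a given training split.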

5 Computational Method

To evaluate the effectiveness of our mathematical programming method, Feature Minimization, we have implemented a heuristic feature reduction method, GSBE, and the LP problem (4) for comparison. Several experiments on real-world data sets are reported. Section 5.1 describes our experimental method, the GSBE algorithm, and the data sets used. Computational results on single linear separators and decision trees are contained in Section 5.2.

5.1 Experimental Method

Feature Minimization results are compared to the LP (4) and to our implementation of GSBE as described below. Each method utilizes the CPLEX 3.0 [7] solver to optimize the linear subproblems. To estimate generalization, i.e. accuracy on future data, 10-fold cross validation was used to evaluate the testing set accuracies. The original data set is split into ten equal parts. Nine of these are used for training and the remaining one is saved for testing. This process is repeated ten times, allowing each part to be the testing set. Paired t-tests were performed to determine any significant differences between the three methods. The Feature Minimization and GSBE algorithms both require that the training set be normalized. Thus, for all three methods, at each decision we normalize the training data and use the normalization information to transform (w, γ) for testing.

Two experiments were performed. In the first, we compared the three methods when constructing a single linear separator. In the second, we compared the use of the three algorithms in a greedy decision tree procedure. This procedure finds the best decision using a given algorithm. This splits the data into two regions, and the process is repeated recursively for the points falling in each of the two regions. This process can be used to construct a decision tree that recognizes any two disjoint sets with 100% accuracy. However, the resulting tree frequently overfits the data, causing poor generalization, so we must simplify or prune the tree. The pruning method chosen is Error Complexity pruning [5], in which 10% of the training set is reserved specifically for pruning. Decision tree growth is terminated when each leaf contains either a single class, or both classes but with one class having less than 1% of the total points represented. Also, to avoid overfitting, at a given decision involving 2r points, if r or more features are chosen then this decision becomes a leaf and growth is terminated.

The Greedy Sequential Backward Elimination algorithm works as follows:

Algorithm 5.1 (Greedy Sequential Backward Elimination - GSBE)

Step 1: Find a linear separator, (w, γ)_n, using all n features.
Step 2: Let (w, γ)_best = (w, γ)_n and i = n.
Step 3: Eliminate the feature with the smallest magnitude and let i = i - 1.
Step 4: Solve for (w, γ)_i.
Step 5: If a stopping criterion is met, then stop and return (w, γ)_best.
Step 6: Otherwise, let (w, γ)_best = (w, γ)_i and go to Step 3.

The Robust LP was used to determine the linear separators in Steps 1 and 4. We let GSBE continue as long as the magnitude of the misclassification error ((1/m) eu + (1/k) ev) obtained for i variables is less than 110% of the error for n variables. This is consistent with the choice of ε in problem (8).
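A minimal sketch of Algorithm 5.1 is given below. It assumes a helper rlp(A_sub, B_sub) that returns (w, gamma, error) for a given column subset, for example a wrapper around the rlp_separator sketch above; the 110% stopping rule is the one described in the text.

```python
import numpy as np

def gsbe(A, B, rlp, tol_factor=1.10):
    """Greedy Sequential Backward Elimination (Algorithm 5.1 sketch).

    rlp(A_sub, B_sub) must return (w, gamma, error) for the given column subset.
    Returns (w, gamma, indices of the retained features).
    """
    n = A.shape[1]
    active = list(range(n))                      # indices of features still in use
    w, gamma, err_n = rlp(A, B)                  # Step 1: separator with all n features
    best = (w, gamma, active[:])                 # Step 2
    while len(active) > 1:
        j = int(np.argmin(np.abs(w)))            # Step 3: drop feature with smallest |w_j|
        active.pop(j)
        w, gamma, err = rlp(A[:, active], B[:, active])   # Step 4
        if err > tol_factor * err_n:             # Step 5: stop once error exceeds 110% of err_n
            return best
        best = (w, gamma, active[:])             # Step 6
    return best
```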

The data sets used in the computational experiments are listed below. All of these data sets except the Star/Galaxy Database and the Plastics data sets are available via anonymous file transfer protocol (ftp) from the University of California, Irvine UCI Repository of Machine Learning Databases [13].

Cleveland Heart Disease Database: The Cleveland Heart Disease Database has 297 patients listed with 13 numeric attributes. Each patient is classified as to whether there is presence or absence of heart disease. There are 137 patients with a presence of heart disease.

Wisconsin Breast Cancer Database: This data set is used to classify 682 patients with breast cancer. Each patient is represented by nine integral attributes ranging in value from 1 to 10. The two classes represented are benign and malignant: 442 of the patients are benign while 240 are malignant.

Star/Galaxy Database: The Star/Galaxy Database consists of two data sets: dim and bright. The dim data set has 4192 examples and the bright data set has 2462 examples. Each example represents a star or a galaxy and is described by 14 numeric attributes. The bright data set is nearly linearly separable. These two data sets are generated from a large set of star and galaxy images collected by Odewahn [14] at the University of Minnesota.

BUPA Liver Disorders: The BUPA data set contains 345 single male patients with 6 numeric attributes. Five of these attributes are blood tests which are thought to be relevant to liver disorders. The sixth attribute corresponds to the number of alcoholic beverages drunk per day.

Johns Hopkins University Ionosphere Database: The Ionosphere data set is used to distinguish between good and bad radar returns. A good return is one indicating evidence of some type of structure in the ionosphere. A bad return simply passes through the ionosphere. There are 351 instances with 34 continuous attributes.

Pima Indians Diabetes Database: The Pima Diabetes data set consists of 768 female patients who are at least 21 years of age and are of Pima Indian heritage. The 8 numeric attributes describe physical features of each patient.

United States Congressional Voting Records Database: This data set includes the votes of each of the 435 U.S. House of Representatives Congressmen. There are 267 Democrats and 168 Republicans. The chosen attributes represent 16 key votes. Possible values for the attributes are y, n, and ?, where ? indicates that the person did not make a position known. Our program requires numeric-valued attributes, thus we let y, n, and ? be 2, -2, and 0 respectively (a small encoding sketch follows this list).

Plastic Part Injection Molding: This proprietary plastic part manufacturing data set consists of two data sets: Plastics I has 935 examples and Plastics II has 1118 examples. Each example consists of 47 numeric measurements taken during manufacturing of a plastic part and the resulting quality of the part. (Many thanks to Dr. Ozden Gur Ali and GE Corporate Research and Development for providing this data.)
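The y/n/? encoding used for the House Votes data amounts to a simple value mapping; the snippet below is an illustrative sketch with made-up records, not the paper's preprocessing code.

```python
import numpy as np

VOTE_CODES = {"y": 2.0, "n": -2.0, "?": 0.0}   # encoding described above

def encode_votes(records):
    """Map a list of vote records (strings 'y'/'n'/'?') to a numeric matrix."""
    return np.array([[VOTE_CODES[v] for v in rec] for rec in records])

# Hypothetical example: two congressmen, three of the 16 votes shown.
print(encode_votes([["y", "?", "n"], ["n", "n", "y"]]))
```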

5.2 Computational Results

Tables 1 and 2 give the results of constructing a single linear separator on each training set. Tables 3 and 4 contain results obtained from full decision tree construction and Error Complexity pruning. T-tests were computed, for both the decision trees and the single planes, on each pair of methods. We use a p-value of less than 10% to indicate a significant difference between two methods. In every case except the two that are noted, a 15% level of significance yields the same results. First, we discuss the results for a single plane; then the decision tree results are investigated.

Table 1: Single Plane Testing Set Errors (%). Columns: Feature Minimization, GSBE, RLP. Rows: Heart, Cancer, Bupa, Ionosphere, Pima, House Votes, Bright, Dim, Plastics I, Plastics II.

Table 1 shows that Feature Minimization had a smaller testing set error than GSBE on 6 of the 10 problems. On the three problems where Feature Minimization performed worse than GSBE, the differences were not statistically significant. On the two data sets Plastics I and House Votes, Feature Minimization tested significantly better than GSBE, with p-values of 13% and 14% respectively. In general, RLP performed better in terms of testing set accuracy than both GSBE and Feature Minimization. However, the t-tests indicate that RLP is significantly better than both Feature Minimization and GSBE on the Pima, Dim, and Plastics II data sets. T-tests also indicate that Feature Minimization was significantly better than RLP on the Ionosphere data set.

As shown in Table 2, RLP uses many more features than both of the other methods. The average numbers of features used by Feature Minimization and GSBE are very close. Note that GSBE actually used slightly fewer features on the Heart, Ionosphere, Dim, and Plastics II data sets. This indicates that the Frank-Wolfe method (a local optimization method) is failing to find the global minimum in some of the cases, so it is possible the results could be improved by using a global optimization technique. Our results indicate that the GSBE heuristic does perform well, especially when computational times are considered. On average, Feature Minimization takes 15.7 times as long as RLP in training time, and GSBE is 3.2 times slower than RLP. Our results do show that if a small number of features is favored, both GSBE and Feature Minimization provide good results.

For the decision tree problem a similar pattern was observed. Table 3 shows that Feature Minimization performed better than GSBE on 8 of the 10 data sets. The only significant differences reported by the t-test correspond to the House Votes and Dim results: Feature Minimization performed better on House Votes, but GSBE has a smaller error on the Dim data set. RLP performed better than Feature Minimization on only five data sets, with a statistically significant improvement only on the Pima and Dim data sets. RLP performed better than GSBE on 7 of the 10 data sets, with a statistically significant improvement only on the Pima data set. Table 4 lists the average number of features used per decision along with the average number of leaves in each decision tree. Comparing RLP and Feature Minimization in Table 4, Feature Minimization utilized significantly fewer features than RLP while maintaining a similarly small tree size. Thus, Feature Minimization provides a substantial improvement over the RLP program with regard to constructing much simpler but still very accurate decision trees.

Table 2: Average Number of Features in Each Linear Separator. Columns: Feature Minimization, GSBE, RLP. Rows: Heart, Cancer, Bupa, Ionosphere, Pima, House Votes, Bright, Dim, Plastics I, Plastics II.

Table 3: Decision Tree Testing Set Errors (%). Columns: Feature Minimization, GSBE, RLP. Rows: Heart, Cancer, Bupa, Ionosphere, Pima, House Votes, Bright, Dim, Plastics I, Plastics II.

However, as expected, Feature Minimization is computationally more expensive than RLP; on average it takes 10 times as long to solve as RLP. GSBE also uses significantly fewer features per decision tree than RLP, and on average takes 3 times as long as RLP to solve. Feature Minimization used fewer features on average than GSBE on 7 of the 10 data sets.

If accuracy is the only priority when choosing a method, then RLP seems preferable: RLP achieved excellent results with the least computational effort. If the number of features is an important criterion, then both Feature Minimization and GSBE obtained excellent results that varied with the data sets. When computational time is also considered, the heuristic GSBE has proven to be a very effective method. For high-dimensional problems such as the plastics manufacturing problems, Feature Minimization performed the best in terms of testing set errors and minimum number of features.

Table 4: Average Number of Features (F) and Leaves (L) in the Decision Tree. Columns: Feature Minimization (F, L), GSBE (F, L), RLP (F, L). Rows: Heart, Cancer, Bupa, Ionosphere, Pima, House Votes, Bright, Dim, Plastics I, Plastics II.

6 Conclusions

We have proposed two parametric bilinear programming methods for feature minimization. The first method, feature minimization with bounded accuracy, is the problem of finding a linear separator within a specific accuracy using as few features as possible. The second method, limited feature minimization, finds the best linear discriminant using at most λ features. The feature minimization with bounded accuracy problem was shown to be NP-complete. These feature minimization approaches can be used to produce accurate decision trees using a minimum number of features. Feature minimization is an important aspect of data mining because we are interested in both the accuracy and the interpretability of the trees. A Frank-Wolfe algorithm was used to transform the bilinear program into a series of linear programs, half of which have closed form solutions.

Computational results indicate that the Feature Minimization method performed as accurately as the robust linear programming method, RLP, and the heuristic GSBE. Feature Minimization and GSBE find planes with substantially fewer features than RLP. Therefore, Feature Minimization provides an improvement over the RLP method at an additional computational expense. The computationally efficient GSBE heuristic performed surprisingly well and thus proves to be a valuable method. As expected, the results are data set dependent and no single method always performs best. However, Feature Minimization did perform best on the high-dimensional plastics data sets. Overall, Feature Minimization is a very promising approach. Further work is needed to explore the application of Feature Minimization to other types of discriminant functions and misclassification error metrics.

A Removal of the Step Function

The following equivalence relation will be used to remove the step function from program (5):

Proposition A.1 (Characterization of the Step Function) [11], [12] For a vector a,

    r = (a)_*,  u = (a)_+   if and only if   (r, u) ∈ arg min_{r,u} { er :  r = (r - u + a)_+,  u = (r + u - e)_+ }.

In addition, the elementary relationship in Proposition A.2 will be useful in the removal of the plus function:

Proposition A.2 [11]

    c = (d)_+   if and only if   c ≥ 0,   c - d ≥ 0,   c(c - d) = 0.

In order to apply these properties to problem (5) we let r and u be as follows:

    r = (w_+ + w_-)_*                                    (10)
    u = (w_+ + w_-)_+ = (w_+ + w_-) = a                  (11)

Thus from Proposition A.1:

    r = (r - u + a)_+ = (r)_+                            (12)
    u = (r + u - e)_+                                    (13)

Equation (13) is reduced to the following three conditions using Proposition A.2:

    u ≥ 0                                                (14)
    e - r ≥ 0                                            (15)
    u(e - r) = 0                                         (16)

Resulting from equations (11), (12), (15), and (16) are the new constraints:

    r ≥ 0                                                (17)
    e - r ≥ 0                                            (18)
    (w_+ + w_-)(e - r) = 0                               (19)
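As a quick numerical sanity check (not part of the paper), one can verify that r = a_* and u = a_+ satisfy the fixed-point conditions of Proposition A.1, and that Proposition A.2 with c = u and d = r + u - e yields conditions (14)-(16). The snippet below assumes NumPy and an arbitrary test vector a.

```python
import numpy as np

def plus(x):                     # (x)_+ : componentwise max(x, 0)
    return np.maximum(x, 0.0)

def step(x):                     # (x)_* : 1 where x > 0, else 0
    return (x > 0).astype(float)

a = np.array([-1.5, 0.0, 0.3, 2.0])
r, u = step(a), plus(a)

# Fixed-point conditions of Proposition A.1.
print(np.allclose(r, plus(r - u + a)), np.allclose(u, plus(r + u - 1.0)))

# Proposition A.2 with c = u, d = r + u - e, giving conditions (14)-(16).
c, d = u, r + u - 1.0
print((c >= 0).all(), (c - d >= 0).all(), np.isclose(c @ (c - d), 0.0))
```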

References

[1] K. P. Bennett. Decision tree construction via linear programming. In M. Evans, editor, Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society Conference, pages 97-101, Utica, Illinois, 1992.

[2] K. P. Bennett and E. J. Bredensteiner. A parametric optimization method for machine learning. Math Report 217, Rensselaer Polytechnic Institute, Troy, New York. Submitted to ORSA Journal on Computing.

[3] K. P. Bennett and O. L. Mangasarian. Neural network training via linear programming. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56-67, North Holland, Amsterdam, 1992.

[4] K. P. Bennett and O. L. Mangasarian. Bilinear separation of two sets in n-space. Computational Optimization and Applications, 2:207-227, 1993.

[5] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International, California, 1984.

[6] C. E. Brodley and P. E. Utgoff. Multivariate decision trees. Machine Learning, 19(1):45-77, 1995.

[7] CPLEX Optimization Incorporated, Incline Village, Nevada. Using the CPLEX Callable Library.

[8] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3:95-110, 1956.

[9] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, San Francisco, 1979.

[10] F. Glover. Improved linear programming models for discriminant analysis. Decision Sciences, 21:771-785, 1990.

[11] O. L. Mangasarian. Misclassification minimization. Journal of Global Optimization, 5:309-332, 1994.

[12] O. L. Mangasarian. Mathematical programming in machine learning. Technical Report 95-06, University of Wisconsin, Madison, Wisconsin, 1995. Submitted to Proceedings of the Nonlinear Optimization and Applications Workshop, June 1995, Plenum Press.

[13] P. M. Murphy and D. W. Aha. UCI repository of machine learning databases. Department of Information and Computer Science, University of California, Irvine, California.

[14] S. Odewahn, E. Stockwell, R. Pennington, R. Humphreys, and W. Zumach. Automated star/galaxy discrimination with neural networks. Astronomical Journal, 103(1):318-331, 1992.

[15] A. Roy, L. S. Kim, and S. Mukhopadhyay. A polynomial time algorithm for the construction and training of a class of multilayer perceptrons. Neural Networks, 6:535-545, 1993.

[16] W. N. Street. Cancer diagnosis and prognosis via linear-programming-based machine learning. Technical Report 94-14 (Ph.D. thesis), Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, August 1994.

[17] W. H. Wolberg, W. N. Street, and O. L. Mangasarian. Image analysis and machine learning applied to breast cancer diagnosis and prognosis. Quantitative Cytology and Histology, 17(2):77-87, 1995.


More information

HEURISTIC ALGORITHMS FOR THE GENERALIZED MINIMUM SPANNING TREE PROBLEM

HEURISTIC ALGORITHMS FOR THE GENERALIZED MINIMUM SPANNING TREE PROBLEM Proceedings of the International Conference on Theory and Applications of Mathematics and Informatics - ICTAMI 24, Thessaloniki, Greece HEURISTIC ALGORITHMS FOR THE GENERALIZED MINIMUM SPANNING TREE PROBLEM

More information

Approximability Results for the p-center Problem

Approximability Results for the p-center Problem Approximability Results for the p-center Problem Stefan Buettcher Course Project Algorithm Design and Analysis Prof. Timothy Chan University of Waterloo, Spring 2004 The p-center

More information

FERDINAND KAISER Robust Support Vector Machines For Implicit Outlier Removal. Master of Science Thesis

FERDINAND KAISER Robust Support Vector Machines For Implicit Outlier Removal. Master of Science Thesis FERDINAND KAISER Robust Support Vector Machines For Implicit Outlier Removal Master of Science Thesis Examiners: Dr. Tech. Ari Visa M.Sc. Mikko Parviainen Examiners and topic approved in the Department

More information

An ATM Network Planning Model. A. Farago, V.T. Hai, T. Cinkler, Z. Fekete, A. Arato. Dept. of Telecommunications and Telematics

An ATM Network Planning Model. A. Farago, V.T. Hai, T. Cinkler, Z. Fekete, A. Arato. Dept. of Telecommunications and Telematics An ATM Network Planning Model A. Farago, V.T. Hai, T. Cinkler, Z. Fekete, A. Arato Dept. of Telecommunications and Telematics Technical University of Budapest XI. Stoczek u. 2, Budapest, Hungary H-1111

More information

the application rule M : x:a: B N : A M N : (x:a: B) N and the reduction rule (x: A: B) N! Bfx := Ng. Their algorithm is not fully satisfactory in the

the application rule M : x:a: B N : A M N : (x:a: B) N and the reduction rule (x: A: B) N! Bfx := Ng. Their algorithm is not fully satisfactory in the The Semi-Full Closure of Pure Type Systems? Gilles Barthe Institutionen for Datavetenskap, Chalmers Tekniska Hogskola, Goteborg, Sweden Departamento de Informatica, Universidade do Minho, Braga, Portugal

More information

Flexible-Hybrid Sequential Floating Search in Statistical Feature Selection

Flexible-Hybrid Sequential Floating Search in Statistical Feature Selection Flexible-Hybrid Sequential Floating Search in Statistical Feature Selection Petr Somol 1,2, Jana Novovičová 1,2, and Pavel Pudil 2,1 1 Dept. of Pattern Recognition, Institute of Information Theory and

More information

Complementary Graph Coloring

Complementary Graph Coloring International Journal of Computer (IJC) ISSN 2307-4523 (Print & Online) Global Society of Scientific Research and Researchers http://ijcjournal.org/ Complementary Graph Coloring Mohamed Al-Ibrahim a*,

More information

Salman Ahmed.G* et al. /International Journal of Pharmacy & Technology

Salman Ahmed.G* et al. /International Journal of Pharmacy & Technology ISSN: 0975-766X CODEN: IJPTFI Available Online through Research Article www.ijptonline.com A FRAMEWORK FOR CLASSIFICATION OF MEDICAL DATA USING BIJECTIVE SOFT SET Salman Ahmed.G* Research Scholar M. Tech

More information

Data mining with sparse grids

Data mining with sparse grids Data mining with sparse grids Jochen Garcke and Michael Griebel Institut für Angewandte Mathematik Universität Bonn Data mining with sparse grids p.1/40 Overview What is Data mining? Regularization networks

More information

where is a constant, 0 < <. In other words, the ratio between the shortest and longest paths from a node to a leaf is at least. An BB-tree allows ecie

where is a constant, 0 < <. In other words, the ratio between the shortest and longest paths from a node to a leaf is at least. An BB-tree allows ecie Maintaining -balanced Trees by Partial Rebuilding Arne Andersson Department of Computer Science Lund University Box 8 S-22 00 Lund Sweden Abstract The balance criterion dening the class of -balanced trees

More information

Algorithms for Integer Programming

Algorithms for Integer Programming Algorithms for Integer Programming Laura Galli November 9, 2016 Unlike linear programming problems, integer programming problems are very difficult to solve. In fact, no efficient general algorithm is

More information

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines B. B. Zhou, R. P. Brent and A. Tridgell Computer Sciences Laboratory The Australian National University Canberra,

More information

336 THE STATISTICAL SOFTWARE NEWSLETTER where z is one (randomly taken) pole of the simplex S, g the centroid of the remaining d poles of the simplex

336 THE STATISTICAL SOFTWARE NEWSLETTER where z is one (randomly taken) pole of the simplex S, g the centroid of the remaining d poles of the simplex THE STATISTICAL SOFTWARE NEWSLETTER 335 Simple Evolutionary Heuristics for Global Optimization Josef Tvrdk and Ivan Krivy University of Ostrava, Brafova 7, 701 03 Ostrava, Czech Republic Phone: +420.69.6160

More information

Transmissions in a Network with Capacities and Delays. Christos D. Zaroliagis 4. September 9, 1996

Transmissions in a Network with Capacities and Delays. Christos D. Zaroliagis 4. September 9, 1996 Transmissions in a Network with Capacities and Delays Dimitrios Kagaris 1 Grammati E. Pantziou 2 Spyros Tragoudas 3 Christos D. Zaroliagis 4 September 9, 1996 (1) Electrical Engineering Department, Southern

More information

Model combination. Resampling techniques p.1/34

Model combination. Resampling techniques p.1/34 Model combination The winner-takes-all approach is intuitively the approach which should work the best. However recent results in machine learning show that the performance of the final model can be improved

More information

Fuzzy Partitioning with FID3.1

Fuzzy Partitioning with FID3.1 Fuzzy Partitioning with FID3.1 Cezary Z. Janikow Dept. of Mathematics and Computer Science University of Missouri St. Louis St. Louis, Missouri 63121 janikow@umsl.edu Maciej Fajfer Institute of Computing

More information

PARALLEL COMPUTATION OF THE SINGULAR VALUE DECOMPOSITION ON TREE ARCHITECTURES

PARALLEL COMPUTATION OF THE SINGULAR VALUE DECOMPOSITION ON TREE ARCHITECTURES PARALLEL COMPUTATION OF THE SINGULAR VALUE DECOMPOSITION ON TREE ARCHITECTURES Zhou B. B. and Brent R. P. Computer Sciences Laboratory Australian National University Canberra, ACT 000 Abstract We describe

More information

The task of inductive learning from examples is to nd an approximate definition

The task of inductive learning from examples is to nd an approximate definition 1 Initializing Neural Networks using Decision Trees Arunava Banerjee 1.1 Introduction The task of inductive learning from examples is to nd an approximate definition for an unknown function f(x), given

More information

Combined Weak Classifiers

Combined Weak Classifiers Combined Weak Classifiers Chuanyi Ji and Sheng Ma Department of Electrical, Computer and System Engineering Rensselaer Polytechnic Institute, Troy, NY 12180 chuanyi@ecse.rpi.edu, shengm@ecse.rpi.edu Abstract

More information

A Lazy Approach for Machine Learning Algorithms

A Lazy Approach for Machine Learning Algorithms A Lazy Approach for Machine Learning Algorithms Inés M. Galván, José M. Valls, Nicolas Lecomte and Pedro Isasi Abstract Most machine learning algorithms are eager methods in the sense that a model is generated

More information

Rowena Cole and Luigi Barone. Department of Computer Science, The University of Western Australia, Western Australia, 6907

Rowena Cole and Luigi Barone. Department of Computer Science, The University of Western Australia, Western Australia, 6907 The Game of Clustering Rowena Cole and Luigi Barone Department of Computer Science, The University of Western Australia, Western Australia, 697 frowena, luigig@cs.uwa.edu.au Abstract Clustering is a technique

More information

Optimal Sequential Multi-Way Number Partitioning

Optimal Sequential Multi-Way Number Partitioning Optimal Sequential Multi-Way Number Partitioning Richard E. Korf, Ethan L. Schreiber, and Michael D. Moffitt Computer Science Department University of California, Los Angeles Los Angeles, CA 90095 IBM

More information

Hyperplane Ranking in. Simple Genetic Algorithms. D. Whitley, K. Mathias, and L. Pyeatt. Department of Computer Science. Colorado State University

Hyperplane Ranking in. Simple Genetic Algorithms. D. Whitley, K. Mathias, and L. Pyeatt. Department of Computer Science. Colorado State University Hyperplane Ranking in Simple Genetic Algorithms D. Whitley, K. Mathias, and L. yeatt Department of Computer Science Colorado State University Fort Collins, Colorado 8523 USA whitley,mathiask,pyeatt@cs.colostate.edu

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms Given an NP-hard problem, what should be done? Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one of three desired features. Solve problem to optimality.

More information

AM205: lecture 2. 1 These have been shifted to MD 323 for the rest of the semester.

AM205: lecture 2. 1 These have been shifted to MD 323 for the rest of the semester. AM205: lecture 2 Luna and Gary will hold a Python tutorial on Wednesday in 60 Oxford Street, Room 330 Assignment 1 will be posted this week Chris will hold office hours on Thursday (1:30pm 3:30pm, Pierce

More information

A Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York

A Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York A Systematic Overview of Data Mining Algorithms Sargur Srihari University at Buffalo The State University of New York 1 Topics Data Mining Algorithm Definition Example of CART Classification Iris, Wine

More information

3 INTEGER LINEAR PROGRAMMING

3 INTEGER LINEAR PROGRAMMING 3 INTEGER LINEAR PROGRAMMING PROBLEM DEFINITION Integer linear programming problem (ILP) of the decision variables x 1,..,x n : (ILP) subject to minimize c x j j n j= 1 a ij x j x j 0 x j integer n j=

More information

Dual-Based Approximation Algorithms for Cut-Based Network Connectivity Problems

Dual-Based Approximation Algorithms for Cut-Based Network Connectivity Problems Dual-Based Approximation Algorithms for Cut-Based Network Connectivity Problems Benjamin Grimmer bdg79@cornell.edu arxiv:1508.05567v2 [cs.ds] 20 Jul 2017 Abstract We consider a variety of NP-Complete network

More information

SUPERVISED CLASSIFIER PERFORMANCE ON THE UCI DATABASE. Anh Hoang B.Sc. (Computer Science) Department of Computer Science. University of Adelaide

SUPERVISED CLASSIFIER PERFORMANCE ON THE UCI DATABASE. Anh Hoang B.Sc. (Computer Science) Department of Computer Science. University of Adelaide SUPERVISED CLASSIFIER PERFORMANCE ON THE UCI DATABASE Anh Hoang B.Sc. (Computer Science) Department of Computer Science University of Adelaide submitted in partial fulfilment of the requirements for the

More information

Performance analysis of a MLP weight initialization algorithm

Performance analysis of a MLP weight initialization algorithm Performance analysis of a MLP weight initialization algorithm Mohamed Karouia (1,2), Régis Lengellé (1) and Thierry Denœux (1) (1) Université de Compiègne U.R.A. CNRS 817 Heudiasyc BP 49 - F-2 Compiègne

More information

Penalizied Logistic Regression for Classification

Penalizied Logistic Regression for Classification Penalizied Logistic Regression for Classification Gennady G. Pekhimenko Department of Computer Science University of Toronto Toronto, ON M5S3L1 pgen@cs.toronto.edu Abstract Investigation for using different

More information

USING REGRESSION TREES IN PREDICTIVE MODELLING

USING REGRESSION TREES IN PREDICTIVE MODELLING Production Systems and Information Engineering Volume 4 (2006), pp. 115-124 115 USING REGRESSION TREES IN PREDICTIVE MODELLING TAMÁS FEHÉR University of Miskolc, Hungary Department of Information Engineering

More information

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs Advanced Operations Research Techniques IE316 Quiz 1 Review Dr. Ted Ralphs IE316 Quiz 1 Review 1 Reading for The Quiz Material covered in detail in lecture. 1.1, 1.4, 2.1-2.6, 3.1-3.3, 3.5 Background material

More information

Supervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples.

Supervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples. Supervised Learning with Neural Networks We now look at how an agent might learn to solve a general problem by seeing examples. Aims: to present an outline of supervised learning as part of AI; to introduce

More information

A tabu search based memetic algorithm for the max-mean dispersion problem

A tabu search based memetic algorithm for the max-mean dispersion problem A tabu search based memetic algorithm for the max-mean dispersion problem Xiangjing Lai a and Jin-Kao Hao a,b, a LERIA, Université d'angers, 2 Bd Lavoisier, 49045 Angers, France b Institut Universitaire

More information

Induction of Multivariate Decision Trees by Using Dipolar Criteria

Induction of Multivariate Decision Trees by Using Dipolar Criteria Induction of Multivariate Decision Trees by Using Dipolar Criteria Leon Bobrowski 1,2 and Marek Krȩtowski 1 1 Institute of Computer Science, Technical University of Bia lystok, Poland 2 Institute of Biocybernetics

More information

A Recursive Coalescing Method for Bisecting Graphs

A Recursive Coalescing Method for Bisecting Graphs A Recursive Coalescing Method for Bisecting Graphs The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters. Citation Accessed Citable

More information

Journal of Global Optimization, 10, 1{40 (1997) A Discrete Lagrangian-Based Global-Search. Method for Solving Satisability Problems *

Journal of Global Optimization, 10, 1{40 (1997) A Discrete Lagrangian-Based Global-Search. Method for Solving Satisability Problems * Journal of Global Optimization, 10, 1{40 (1997) c 1997 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands. A Discrete Lagrangian-Based Global-Search Method for Solving Satisability Problems

More information

Progress in Image Analysis and Processing III, pp , World Scientic, Singapore, AUTOMATIC INTERPRETATION OF FLOOR PLANS USING

Progress in Image Analysis and Processing III, pp , World Scientic, Singapore, AUTOMATIC INTERPRETATION OF FLOOR PLANS USING Progress in Image Analysis and Processing III, pp. 233-240, World Scientic, Singapore, 1994. 1 AUTOMATIC INTERPRETATION OF FLOOR PLANS USING SPATIAL INDEXING HANAN SAMET AYA SOFFER Computer Science Department

More information

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs Integer Programming ISE 418 Lecture 7 Dr. Ted Ralphs ISE 418 Lecture 7 1 Reading for This Lecture Nemhauser and Wolsey Sections II.3.1, II.3.6, II.4.1, II.4.2, II.5.4 Wolsey Chapter 7 CCZ Chapter 1 Constraint

More information

control polytope. These points are manipulated by a descent method to compute a candidate global minimizer. The second method is described in Section

control polytope. These points are manipulated by a descent method to compute a candidate global minimizer. The second method is described in Section Some Heuristics and Test Problems for Nonconvex Quadratic Programming over a Simplex Ivo Nowak September 3, 1998 Keywords:global optimization, nonconvex quadratic programming, heuristics, Bezier methods,

More information