Linearly and Quadratically Separable Classifiers Using Adaptive Approach


Soliman MAMA, Abo-Bakr RM. Linearly and quadratically separable classifiers using adaptive approach. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 26(5), Sept. 2011.

Linearly and Quadratically Separable Classifiers Using Adaptive Approach

Mohamed Abdel-Kawy Mohamed Ali Soliman 1 and Rasha M. Abo-Bakr 2

1 Department of Computer and Systems Engineering, Faculty of Engineering, Zagazig University, Zagazig, Egypt
2 Department of Mathematics, Faculty of Science, Zagazig University, Zagazig, Egypt

mamas2000@hotmail.com; rasha abobakr@hotmail.com

Received October 3, 2009; revised May 14.

Abstract. This paper presents a fast adaptive iterative algorithm for solving linearly separable classification problems in R^n. In each iteration, a subset of the sampling data (n points, where n is the number of features) is adaptively chosen and a hyperplane is constructed such that it separates the chosen n points at a margin ε and best classifies the remaining points. The classification problem is formulated and the details of the algorithm are presented. Further, the algorithm is extended to solving quadratically separable classification problems. The basic idea is to map the physical space to a larger one in which the problem becomes linearly separable. Numerical illustrations show that few iteration steps are sufficient for convergence when the classes are linearly separable. For nonlinearly separable data, given a specified maximum number of iteration steps, the algorithm returns the best hyperplane that minimizes the number of misclassified points encountered through these steps. Comparisons with other machine learning algorithms on practical and benchmark datasets are also presented, showing the performance of the proposed algorithm.

Keywords: linear classification, quadratic classification, iterative approach, adaptive technique

1 Introduction

Pattern recognition [1-2] is the scientific discipline whose goal is the classification of objects into a number of categories or classes. Depending on the application, these objects can be images, signal waveforms, or any type of measurements that need to be classified. Linear separability is an important topic in the domains of artificial intelligence and machine learning. There are many real-life problems in which there is a linear separation. A linear model is also robust against noise, whereas a nonlinear model may fit the noisy samples in the training data, perform extra calculations to do so, and still be less accurate on test data. Multilayer nonlinear (NL) neural networks, such as those trained by the back-propagation algorithm, work well for nonlinear classification problems. However, using back-propagation for a linearly separable problem is overkill: thousands of iterations may be needed to reach a solution that a linear separation method can reach quickly. Linear separability methods are also used for training Support Vector Machines (SVMs) [3-4] for pattern recognition. Support Vector Machines are linear learning machines on linearly or nonlinearly separable data. They are trained by finding a hyperplane that linearly separates the data. In the case of nonlinearly separable data, the data are mapped into some other Euclidean space; thus, SVM still performs a linear separation, but in a different space. In this paper, a novel and efficient method of finding a hyperplane which separates two linearly separable (LS) sets in R^n is proposed. It is an adaptive iterative linear classifier (AILC) approach.
The main idea of our approach is to detect the boundary region between the two classes, where points of different classes are close to each other. From this region, n points belonging to the two different classes are chosen and a hyperplane is constructed such that each of the n points lies at a prescribed distance ε from it, with the points of each class on opposite sides. There exist precisely two such hyperplanes, from which we choose the one that correctly classifies more points. If the chosen hyperplane successfully classifies all the points, calculations are terminated. Otherwise, another n points are chosen to start the next iteration. These n points are chosen adaptively from the misclassified ones as those furthest from the hyperplane constructed in the current iteration, because such points most probably lie in the critical region between the two classes. Compared with other iterative linear classifiers, this approach is adaptive, and numerical results show that very few iteration steps are sufficient for convergence even for large sampling data.

The concept of a hyperplane is extended to perform quadratic, not just linear, classification. Analogous to the separating hyperplane that is represented by a linear (first-degree) equation, in quadratic classification a second-degree hypersurface is constructed to separate the two classes.

This paper is divided into seven sections. In Section 2, a brief survey of methods that classify LS classes is given, showing the theoretical basis of those most related to the proposed classifier. In Section 3, the main idea, geometric interpretation, and mathematical formulation of the proposed AILC are presented. Illustrative examples are given in Section 4. The quadratically separable classifier is discussed and demonstrated by some examples in Section 5. Comparisons with other known algorithms on linearly and nonlinearly separable benchmark datasets are presented in Section 6. Finally, in Section 7, conclusions and future work are discussed.

2 Comparison with Existing Algorithms

Numerous techniques exist in the literature for solving the linear separability classification problem. These include methods based on solving linear constraints (the Fourier-Kuhn elimination algorithm [5] or linear programming [6]), methods based on the perceptron algorithm [7], and methods based on computational geometry (convex hull) techniques [8]. In addition, statistical approaches are characterized by an explicit underlying probability model, which provides the probability that an instance belongs to a specific class, rather than simply a classification. The algorithms most related to the one proposed in this work are the perceptron and SVM algorithms.

The perceptron algorithm was proposed by Rosenblatt [5] for computing a hyperplane that linearly separates two finite and disjoint sets of points. Starting with an arbitrary hyperplane, the dataset is tested sequentially, point after point, to check whether each point is correctly classified. If a point is misclassified, the current hyperplane is updated to correctly classify this point. This process is repeated until a hyperplane is found that classifies the full dataset. If the two classes are linearly separable, the perceptron algorithm provides, in a finite number of steps, a hyperplane that linearly separates them. However, it is not known ahead of time how many iteration steps are needed for the algorithm to converge (a minimal sketch of this update rule is given at the end of this section). SVM [3], as a linear learning method, is trained by finding an optimal hyperplane that separates the dataset (with the largest possible margin) by solving a constrained convex quadratic programming optimization problem, which is time consuming.

In the proposed AILC, starting with an arbitrary hyperplane, the full dataset is tested and the information about the relative locations of the misclassified points with respect to the hyperplane is used to predict the critical region between the two classes where a better hyperplane can exist. This adaptive nature of the iteration speeds up convergence to a hyperplane that successfully separates the two classes. In Section 3, the classification problem is reformulated to produce the required information at low cost. In addition, the theoretical basis and implementation of AILC are provided.
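For concreteness, here is a minimal sketch of the classical perceptron update just described. It is not part of the original paper; it is written in Python/NumPy and the function name perceptron_train is ours.

```python
import numpy as np

def perceptron_train(X, d, lr=1.0, max_epochs=1000):
    """Classical perceptron: X is (N, n) data, d holds class labels +1/-1.
    Returns a weight vector w and bias t such that sign(x.w + t) predicts d."""
    N, n = X.shape
    w, t = np.zeros(n), 0.0
    for _ in range(max_epochs):
        errors = 0
        for x_i, d_i in zip(X, d):          # test the points sequentially
            if d_i * (x_i @ w + t) <= 0:    # misclassified (or on the plane)
                w += lr * d_i * x_i         # move the hyperplane toward the point
                t += lr * d_i
                errors += 1
        if errors == 0:                     # whole dataset classified: stop
            break
    return w, t
```

If the classes are linearly separable, the loop eventually terminates with zero errors; otherwise it stops after max_epochs passes over the data.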
3 Adaptive Iterative Linear Classifier (AILC)

In this section we present the adaptive iterative linear classifier (AILC). The main idea is to imitate how one would predict a line in R^2 that separates points belonging to two linearly separable classes. First, one detects the boundary region between the two classes, where points of different classes are close to each other. From this region of interest, one can choose two points (one from each class) that seem to be the most difficult (nearest) and predict a line that not only separates these two points but also, as far as possible, correctly separates the two classes; that is, the line should have one of the points, together with the remaining points of its class, on one side, and the second point, with the rest of its class, on the other side. If such a line exists, the task is done. Otherwise, another two points are chosen to start the next iteration. These new points are chosen adaptively as those which, according to the line constructed in the current iteration, are expected to lie in the border region between the two classes. Construction of a separating line in our approach is characterized by the requirement that the two points lie at a prescribed distance ε from it (but on opposite sides). In fact, there exist precisely two such lines, from which we choose the one that correctly separates more points.

The generalization to R^n is straightforward. Starting with n points in R^n belonging to the two different classes, we construct a hyperplane such that each of the n points lies at a prescribed distance ε from it (with the points of each class on opposite sides). Again, there exist precisely two such hyperplanes, from which we choose the one that correctly classifies more points. If the chosen hyperplane successfully classifies all the points, we terminate the calculations. Otherwise, a new iteration is started by choosing another n points from the misclassified ones (see Subsection 3.1 for more details).

This approach is more efficient than other related methods proposed in the literature. For example, the CLS [9-11] examines each possible hyperplane passing through every set of n points to check whether it can successfully classify the remaining points. When such a hyperplane is reached, the required hyperplane is constructed such that, in addition, it properly separates the n points according to their classes.

3.1 Geometric Interpretation and Theoretical Basis for AILC

The classification problem considered in this work consists of finding a hyperplane P that linearly separates N points in R^n. Each of these points belongs to one of two disjoint classes A or B, which lie in the positive or negative half space of P, respectively. If the training data are linearly separable, then a hyperplane

P(w; t): x^T w + t = 0   (1)

exists such that

x_i^T w + t > 0, for all x_i in A,
x_i^T w + t < 0, for all x_i in B,   (2)

where x_i in R^n is the feature vector (the coordinates) of point i, w in R^n is termed the weight vector, and t in R the bias (or threshold) of the hyperplane. Defining the class identifier variables

d_i = { 1, if x_i in class A; -1, if x_i in class B },   (3)

(2) reduces to the single form

d_i (x_i^T w + t) > 0, i = 1, 2, ..., N.   (4)

Dividing (4) by |t| yields

d_i (x_i^T W + c) = e_i, e_i > 0, i = 1, 2, ..., N,   (5)

where W = w/|t| is a weight vector having the same direction as w (normal to the hyperplane P(W; c): x^T W + c = 0) and pointing into its positive half space, and c = 1 or -1 according to whether the sign of t is positive or negative, respectively.

In (5), we have introduced the variables e_i, i = 1, 2, ..., N, for the first time. These variables will be the source of information in our approach. According to (5), a hyperplane P(W; c) correctly separates the two classes if e_i > 0 for i = 1, 2, ..., N. However, for a trial hyperplane P(W', c'), if substituting W' and c' in (5) produces a negative value of e_i, then point i is misclassified by P. Another important property of these variables is that each e_i is a measure of the distance between point x_i and P. This can easily be proven as follows. Recall that the distance between any point x_i in R^n and the hyperplane P(W; c) is given by

δ(x_i, P) = |x_i^T W + c| / ||W|| = e_i / ||W|| > 0,   (6)

where ||W|| is the L2 norm (length) of W; then

e_i = ||W|| δ(x_i, P).   (7)

In our approach, since W = (w_1, w_2, ..., w_n)^T consists of n unknown components, we choose n points and assume that they all lie at a constant distance from a trial hyperplane P such that each point lies in the proper half space according to its class. Substituting x_i^T, d_i and e_i = ε > 0, i = 1, 2, ..., n, into (5), and noting that c = 1 or -1, produces two linear systems of equations in the n unknowns w_1, w_2, ..., w_n. Solving these systems (assuming linear independence of the equations) produces two hyperplanes: P_1 = P(W_1; 1) and P_2 = P(W_2; -1). The first adaptive feature of the proposed algorithm is to select from P_1 and P_2 the one that is more efficient in classifying the remaining N - n points.

Fig.1. Choice of the better hyperplane. The arrow of each hyperplane points into its positive half space.

In Fig.1, an illustration in R^2 is presented with N = 16 (8 points of each class), where a black circle denotes the class with identifier d = 1 and a triangle the other class with d = -1. The starting 2 points are enclosed in squares.
Both P_1 and P_2 successfully separate the chosen points into the two classes. However, it is not guaranteed that both P_1 and P_2 correctly classify the full set of N points. P_2 succeeded in classifying 12 points (5 circles in its positive half space and 7 triangles on the other side) but failed on the remaining 4 points, whereas P_1 succeeded in classifying only 6 points (4 circles in its positive half space and 2 triangles on the other side) and failed on the remaining 10 points. Thus the algorithm chooses P_2.
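The signed quantities e_i defined in (5) are cheap to compute in bulk. The following minimal NumPy sketch (ours, not from the paper; the function name signed_margins is hypothetical) shows how the sign of e_i flags misclassified points and how its magnitude ranks them by distance from the hyperplane, which is exactly the information used later by the adaptive reordering.

```python
import numpy as np

def signed_margins(X, d, W, c):
    """e_i = d_i * (x_i . W + c); e_i < 0 marks a point misclassified by P(W; c),
    and |e_i| is proportional to its distance from the hyperplane (eq. (7))."""
    return d * (X @ W + c)

# toy usage: 2 features, 4 points, labels +1/-1, and some trial hyperplane (W, c)
X = np.array([[2.0, 1.0], [0.2, 3.0], [-1.0, -2.0], [-2.0, 0.5]])
d = np.array([1, 1, -1, -1])
W, c = np.array([1.0, 0.0]), -0.5
e = signed_margins(X, d, W, c)              # [ 1.5, -0.3, 1.5, 2.5]
misclassified = np.where(e < 0)[0]          # indices with negative e_i -> [1]
worst_first = np.argsort(e)                 # most negative (furthest wrong) first
distances = np.abs(e) / np.linalg.norm(W)   # actual distances, from eq. (6)
```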

3.2 Mathematical Formulation

Let x_i^T = [x_i1, x_i2, ..., x_in] be the row representation of the components of an input data point x_i in R^n that has n features, and let N be the number of data points belonging to the two disjoint classes A and B. Then, applying (5) to all N points yields the system

D(X^T W + C) = E,   (8)

where

D = diag(d_1, d_2, ..., d_N)   (N x N),
X^T = [x_ij]   (N x n), with row i equal to x_i^T,
W = (w_1, w_2, ..., w_n)^T   (n x 1),
C = (c, c, ..., c)^T = c J_N   (N x 1),
J_N = (1, 1, ..., 1)^T   (N x 1),
E = (e_1, e_2, ..., e_N)^T   (N x 1).   (9)

Thus, the classification problem is formulated as

D(X^T W + c J_N) = E.   (10)

Note that the matrices X^T and D represent the input data: for each point i, X^T contains the feature vector x_i^T in row i, and D is a diagonal matrix whose diagonal elements are the elements of the vector d = [d_1 d_2 ... d_N]^T. Thus, interchanging the rows of both X^T and D corresponds to reordering the N points. In (10), J_N is an N-vector whose entries are all unity and c = ±1. Also, referring to (5), for a separating hyperplane all the entries of the vector E must be positive. Hence the classification problem reads: find a hyperplane, or equivalently find W and c, such that

E > 0.   (11)

The proposed solution partitions the N-equation system (10) into two subsystems: the first consists of the first n equations, and the second consists of the remaining N - n equations. Let X^T be partitioned as

X^T = [ a ; b ],

where a is n x n and b is (N - n) x n; then (10) is rewritten as

[ D_1  0 ; 0  D_2 ] ( [ a ; b ] W + c [ J_1 ; J_2 ] ) = [ E_1 ; E_2 ],   (12)

where a is a nonsingular square matrix of dimension n, b is in general a rectangular matrix of dimension (N - n) x n, J_1 and J_2 are vectors of ones with n and N - n components, respectively, and D_1 and D_2 are diagonal square matrices of dimensions n and N - n, respectively. (12) can then be written as

D_1(a W + c J_1) = E_1,   (13)
D_2(b W + c J_2) = E_2,   (14)

and the classification problem becomes: find W and c such that E_1 > 0 and E_2 > 0.

3.3 Adaptive Iterative Linear Classifier (AILC)

To simplify the solution of (13) and (14), choose a small positive number ε and assume

e_1 = e_2 = ... = e_n = ε > 0;   (15)

then E_1 = ε J_1 > 0 and hence, upon substitution into (13), using D_1^{-1} = D_1 and solving for W as a function of c, (13) reduces to

W = a^{-1} Q.   (16)

Here Q = ε D_1 J_1 - c J_1 is a vector of length n and is computed easily because its i-th entry is ε d_i - c, 1 ≤ i ≤ n. Substituting (16) into (14),

E_2 = D_2 b W + c D_2 J_2.   (17)

To compute E_2, note that its i-th entry is

e_i = d_i (b_i^T W + c), n + 1 ≤ i ≤ N.   (18)

Clearly, since the vector Q depends on the value of c, so do both W and E_2.

3.4 Adaptive Procedure

In the proposed AILC, we try to speed up the convergence rate by making full use of all available information within and after each iteration. Two adaptive choices are performed as follows. First, within iteration r, the algorithm chooses the value of c as +1 or -1 such that the constructed hyperplane correctly classifies more points, as described in Subsection 3.1.
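As an informal illustration of (15)-(18) and of this adaptive choice of c (formalized in Algorithm 1 below), a minimal NumPy sketch of one iteration might look as follows. This is our own reading, not the authors' code, and the function name ailc_iteration is hypothetical.

```python
import numpy as np

def ailc_iteration(a, b, d1, d2, eps):
    """One AILC iteration. a: (n, n) chosen points, b: (N-n, n) remaining points,
    d1, d2: the corresponding +1/-1 class identifiers, eps: prescribed margin.
    Returns (c, W, E2, m), where m is the number of misclassified remaining points."""
    a_inv = np.linalg.inv(a)                 # assumes the chosen rows are independent
    best = None
    for c in (1.0, -1.0):
        Q = eps * d1 - c                     # i-th entry: eps*d_i - c, as in eq. (16)
        W = a_inv @ Q
        E2 = d2 * (b @ W + c)                # eqs. (17)-(18)
        m = int(np.sum(E2 < 0))              # negative entries = misclassified points
        if best is None or m < best[3]:
            best = (c, W, E2, m)
        if m == 0:                           # perfect separation with this c
            break
    return best
```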

The implementation of this adaptive choice is presented in Algorithm 1.

Algorithm 1. Iteration r (a^{-1}, b, D_1, D_2; c, W, E_2, m)
1. Set c = 1.
2. Compute the vectors W(c) and E_2(c) using (16), (17).
3. Compute m(c) as the number of negative entries of E_2(c).
4. if m(c) = 0, then E_2(c) > 0; go to step 8.
5. else if c = 1, set c = -1 and repeat steps 2-4.
6. if m(1) < m(-1), then c = 1 produces the accepted hyperplane P_r; set c = 1 and go to step 9.
7. else c = -1 produces the accepted hyperplane P_r; set c = -1 and go to step 9.
8. The separating hyperplane P is defined by c, W(c). end iteration r.
9. The best hyperplane P_r is defined by c, W = W(c); return also E_2(c), m(c). end iteration r.

Second, after an iteration r, the vector E_r = [E_1; E_2]_r is computed. E_r is constructed as the augmentation of E_1, all of whose n entries equal ε, and E_2, whose entries are computed by (18). E_r contains important information about the fitness of the constructed hyperplane P_r as a separator. First, recall that a negative sign of an entry e_i of E_r means that point i is misclassified by the hyperplane. Second, by (7), the absolute value of e_i provides a measure of the distance of point i from the hyperplane. Thus, if the entries of E_r are all positive, then P_r is an acceptable classifier; otherwise, the entries with the lowest values in E_r correspond to the misclassified points furthest from P_r, and such points most probably lie in the critical region between the two classes where an objective classifier P has to be constructed. Accordingly, we choose n of these points (which, in addition, must be linearly independent and belong to both classes) to determine the hyperplane in the next iteration. So, the matrix a in (12) is chosen by adaptively reordering the input matrix X^T after each iteration such that the first n rows of X^T and D correspond to the data of the chosen n points. An illustration in R^2 is shown in Fig.2, where black circles and triangles denote the classes that must lie in the positive and negative half space, respectively. The misclassified points lie in the shaded regions, and the 2 points chosen for the next iteration are shown in rectangles.

Fig.2. Illustration of the adaptive choice of the next iteration of the classifier (AILC) in R^2.

3.5 Implementation of AILC

Algorithm 1 describes a typical iteration r that returns either a separating hyperplane P or a hyperplane P_r that, although it does not successfully classify all the points, minimizes the number m of misclassified points through the adaptive choice of c. The Adaptive Reordering algorithm (Algorithm 2) rearranges X^T and d such that the first n points in X^T (forming a in the next iteration) satisfy the conditions: 1) they correspond to rows that have the lowest values in E, 2) a is nonsingular, and 3) they belong to the two classes. The details are presented in Algorithm 2, and a sketch of this reordering step follows the listing. The complete AILC algorithm is presented in Algorithm 3.

Algorithm 2. Adaptive Reordering (n, N, ε, X^T, d, E_2)
1. Form the vector E as the augmentation of E_1 (all its n entries equal ε) and E_2.
2. Form the vector F such that its entries are the row numbers of E when it is sorted in ascending order.
3. Set a(n, n) = zero matrix, da(n) = zero vector, flag(N) = zero vector.
4. Set i = 1, j = 1.
5. while i < n
6.   while j ≤ N
       I. k = F(j), a_i^T = x_k^T.
       II. if rank(first i rows of a) = i, then set da(i) = d(k); flag(k) = i; break; end.
       III. j = j + 1; go to step 6.
7.   i = i + 1; go to step 5.
8. i = n.
9. while j ≤ N
     I. k = F(j), a_i^T = x_k^T.
     II. if d(k) ≠ da(n - 1) and rank(first i rows of a) = i, then set da(i) = d(k); flag(k) = i; break; end.
     III. j = j + 1; go to step 9.
10. for each 1 ≤ k ≤ N, if flag(k) = i ≠ 0, set x_k^T = x_i^T, d(k) = d(i).
11. for i = 1 to n, set x_i^T = a_i^T, d(i) = da(i).
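The following NumPy sketch mirrors the intent of Algorithm 2 in a simplified form (our reading, not the authors' code): rank all points by their e_i values and pick the n lowest-ranked ones that are linearly independent and cover both classes.

```python
import numpy as np

def adaptive_reorder(X, d, E, n):
    """Return indices of n points with the smallest e_i whose rows are linearly
    independent and which include both classes (simplified view of Algorithm 2)."""
    order = np.argsort(E)                     # lowest (most misclassified) first
    chosen = []
    for k in order:
        trial = chosen + [int(k)]
        if np.linalg.matrix_rank(X[trial]) < len(trial):
            continue                          # keep the matrix a nonsingular
        if len(trial) == n and len(set(d[trial])) < 2:
            continue                          # last point must bring in the other class
        chosen = trial
        if len(chosen) == n:
            break
    return np.array(chosen)
```

The selected rows then form the matrix a (and the corresponding entries of d form D_1) for the next call to the iteration step.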
4 Numerical Illustration

In this section, the use of the AILC algorithm is demonstrated by three linearly separable (LS) examples.

The first is a 2D-classification problem for which successive iterations are visualized to illustrate the adaptive feature and convergence behavior of the algorithm; the influence of the value of ε and of the ordering of the input data on convergence is also discussed numerically. The second example is a 3D-classification problem in R^3, while the third is a 4D-classification problem in R^4 in which the standard benchmark classification dataset IRIS [12] is arranged as two LS classes.

Algorithm 3. AILC (N, n, X^T, d, ε, rmax; c, W, r, m)
Input: N, n, the N x n array X^T, the N x 1 class identifier array d, the maximum number of iterations rmax, and the parameter ε.
Output: a hyperplane (c, W), the iteration count r, and the number of misclassified points m.
1. Arrange X^T, d such that the first n rows of X^T form a nonsingular n x n matrix a.
2. Set m_0 = N, r = 1.
3. while r ≤ rmax
   a) Form the partitioned matrices a, b, D_1, D_2, then compute a^{-1} (see (12)).
   b) Call Iteration r (a^{-1}, b, D_1, D_2; c, W, E_2, m).
   c) if m = 0 (successful separation), return c, W, r, m; break; end.
   d) else if m < m_0: set m_0 = m, c_opt = c, W_opt = W, r_opt = r.
   e) Call Adaptive Reordering (n, N, ε, X^T, d, E_2).
   f) r = r + 1.
   g) go to step 3.
4. return the data of the hyperplane with the minimum number of misclassified points: c = c_opt, W = W_opt; also return m = m_0, r = r_opt.
end.

Example 1. A 2D-classification problem consisting of two classes A (black circles) and B (triangles):
A = {(4, 3), (0, 4), (2, 1.6), (7, 3), (3, 4), (4, 3), (3, 2)},
B = {(4, 4), ( 3, 0), ( 6, 1), (1, 0), (1, 0.5), (0, 7), (6, 2)}.
The points of the two classes A and B are plotted in Fig.3(a), showing the difficulty of classifying these data. Circles around the starting two points ( 4, 3), (4, 4) are also shown. Figs. 3(b)-3(d) show the application of our algorithm to this problem with ε = . The weight vector and threshold computed after each iteration are shown in Table 1.

Fig.3. 2D plot of the two-class classification problem (class A: black circles, class B: triangles). Squares indicate the worst points after iterations 1 and 2. (a) Original dataset. (b) After iteration 1. (c) After iteration 2. (d) After iteration 3.

Table 1. Weight vectors and threshold values obtained by executing the algorithm
  i (iteration)   W (weight vector)    c (threshold)
  1               (1.6375, 2)          1
  2               ( , )                1
  3               ( 0.3, 1.175)        1

To discuss the dependence of the proposed algorithm on the starting n points and on the parameter ε, we solved the previous example again, starting with another two points ( 3, 0), (4, 4) and selecting ε = 0.4. The number of iterations changes: two iterations were required to classify these difficult data, although the starting points belong to the same class (B). The results are presented in Table 2 and Fig.4. It is worth mentioning that no more than 4 iterations were needed to solve this classification problem, irrespective of the starting points, for 0 < ε < 0.5.

Fig.4. Classification of the same classes (black circles = class A, triangles = class B) represented in Fig.3(a) when starting with ( 3, 0), (4, 4) and ε = 0.4. The worst points after the first iteration are enclosed in squares.

Table 2. Weight vectors and threshold values obtained by executing the algorithm
  i (iteration)   W (weight vector)    c (threshold)
  1               (0.4667, )           1
  2               ( , )                1
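Before moving to the 3D and 4D examples, here is a compact sketch of how the overall loop of Algorithm 3 might combine the per-iteration solve and the adaptive reordering. It is our own illustration, not the authors' code, and it reuses the hypothetical helpers ailc_iteration and adaptive_reorder sketched earlier.

```python
import numpy as np

def ailc(X, d, eps=0.3, rmax=100):
    """Adaptive iterative linear classifier (sketch of Algorithm 3).
    X: (N, n) data, d: +1/-1 labels. Returns (c, W, iterations, misclassified).
    Assumes the first n rows of the current ordering are linearly independent."""
    N, n = X.shape
    idx = np.arange(N)                        # current ordering of the points
    best = (None, None, 0, N)                 # (c, W, r, m) with fewest errors so far
    for r in range(1, rmax + 1):
        a, b = X[idx[:n]], X[idx[n:]]
        d1, d2 = d[idx[:n]], d[idx[n:]]
        c, W, E2, m = ailc_iteration(a, b, d1, d2, eps)
        if m < best[3]:
            best = (c, W, r, m)
        if m == 0:                            # separating hyperplane found
            break
        E = np.concatenate([np.full(n, eps), E2])
        chosen = adaptive_reorder(X[idx], d[idx], E, n)
        idx = np.concatenate([idx[chosen], np.delete(idx, chosen)])
    return best
```

Exact iteration counts of such a sketch depend on the initial ordering and on the chosen ε, as the examples in this section illustrate.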

Example 2. The algorithm presented in Algorithm 3 was tested by applying it to an LS 3D-classification problem consisting of two classes A and B:
A = {(1, 4.5, 1), (2, 4, 3), (6, 5, 4), (4, 6, 5), (4, 5, 6), (1, 3, 1)},
B = {(0, 4, 0), (2, 4, 3), ( 4, 4, 2), ( 3, 4, 4), ( 2, 3, 3), ( 4, 4, 1)}.
Starting with the points (0, 4, 0), (1, 4.5, 1), (2, 4, 3) and choosing ε = 0.3, the algorithm was applied to classify these data. Two iterations were sufficient to solve this classification problem, as shown in Fig.5. The situations after the first and second iterations are shown in Figs. 5(a) and 5(b), respectively; in each case, the graph was rotated so that the view is perpendicular to the separating plane. After the first iteration, the points (2, 4, 3), (1, 3, 1), (0, 4, 0) were found to be the worst. The results of the iterations are summarized in Table 3.

Fig.5. Original dataset and the constructed hyperplanes for the 3D problem of Example 2. (a) After the first iteration. (b) After the second iteration.

Table 3. Weight vectors and threshold values obtained by executing the algorithm
  i (iteration)   W (weight vector)         c (threshold)
  1               (0.9375, 0.175, 0.425)    1
  2               (0.465, 0.175, 0.31)      1

Example 3. The IRIS dataset [12] classifies a plant as Iris Setosa, Iris Versicolour, or Iris Virginica. The dataset describes every iris plant by four input parameters (sepal length, sepal width, petal length, and petal width) and contains a total of 150 samples, 50 for each of the three classes. Publications that used only the samples belonging to the Iris Versicolour and Iris Virginica classes include Fisher [13] (1936), Dasarathy (1980), Elizondo (1997), and Gates (1972). Although the IRIS dataset is nonlinearly separable, it is known that all the samples of the Iris Setosa class are linearly separable from the rest of the samples (Iris Versicolour and Iris Virginica). Therefore, in this example, a linearly separable dataset was constructed from the IRIS dataset such that the samples belonging to the Iris Versicolour and Iris Virginica classes were grouped into one class and the Iris Setosa samples were taken as the other class. Thus, a linearly separable 4D-classification problem with 100 points in class A and 50 points in class B was considered. Using the proposed algorithm with ε = 0.5, the data were completely classified after two iterations; the results are collected in Table 4.

Table 4. Weight vectors and threshold values for the IRIS classification problem
  i (iteration)   W (weight vector)     c (threshold)
  1               (0, 0, 0, 2.5)        1
  2               (0.3763, , , )        1

5 Classification of Quadratically Separable Sets

Two classes A and B are said to be quadratically separable if there exists a quadratic polynomial P_2(y) = 0, y in R^m, such that P_2(y) > 0 if y in A and P_2(y) < 0 if y in B. In R^2, a general quadratic polynomial can be put in the form

w_1 y_1^2 + w_2 y_2^2 + w_3 y_1 y_2 + w_4 y_1 + w_5 y_2 + c = 0,   (18)

which represents a conic section (parabola, ellipse, or hyperbola, depending on the values of the coefficients w_i). Now, consider a mapping φ: R^2 -> R^5 such that a point y = (y_1, y_2) in R^2 is mapped into a point x in R^5 with components

x_1 = y_1^2, x_2 = y_2^2, x_3 = y_1 y_2, x_4 = y_1, x_5 = y_2.   (19)

Using this mapping, P_2(y) = 0 is transformed into a hyperplane x^T w + c = 0 in R^5. The transformed linear classification problem can be solved by the AILC algorithm to obtain w and c, and hence the quadratic polynomial P_2(y) = 0 is determined. Generally, a quadratic polynomial in R^m can be transformed into a hyperplane in R^n with n = m + m(m + 1)/2.
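As an illustration of the mapping φ in (19), the short sketch below (ours, not from the paper; the helper name quadratic_features is hypothetical) lifts 2D points into R^5 so that any linear classifier, such as the AILC sketched above, can separate quadratically separable classes.

```python
import numpy as np

def quadratic_features(Y):
    """Map 2D points y = (y1, y2) to (y1^2, y2^2, y1*y2, y1, y2) as in eq. (19)."""
    y1, y2 = Y[:, 0], Y[:, 1]
    return np.column_stack([y1**2, y2**2, y1 * y2, y1, y2])

def classify_conic(Y, W, c):
    """A hyperplane (W, c) found in the mapped R^5 space defines the conic
    W[0]*y1^2 + W[1]*y2^2 + W[2]*y1*y2 + W[3]*y1 + W[4]*y2 + c = 0 in the plane;
    points are classified by the sign of that expression."""
    return np.sign(quadratic_features(Y) @ W + c)
```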

Quadratic polynomials in R^3 represent surfaces such as ellipsoids, paraboloids, hyperboloids, and cones. Although the algorithm is applicable in higher dimensions, we present an example in R^2 for convenience of visualization. A set of points belonging to two classes (black and red +) is presented in Fig.6. The mapping φ defined by (19) is used to generate the R^5 coordinates corresponding to the input data points, and the AILC algorithm is used to solve the transformed linearly separable problem with two different values, ε = 0.4 and ε = 0.5. For each of these values, the resulting quadratic curve is plotted in blue. Although the algorithm successfully classified the points in both cases, it shows sensitivity to the value of ε: for ε = 0.4, five iterations were required to converge to a parabola (see Fig.6(a)), while eleven iterations were needed for ε = 0.5 to converge to the hyperbola shown in Fig.6(b). Moreover, the algorithm may diverge for other ranges of values, in contrast to the linearly separable classification problems, where very few iterations (1-3) were sufficient for convergence for 0 < ε < 0.5.

Fig.6. Classification by a conic section using different values of ε. (a) ε = 0.4. (b) ε = 0.5.

For the difficult dataset presented in Fig.7, the application of the algorithm produces the separating ellipse shown there.

Fig.7. Application of the algorithm produces an ellipse for the quadratically separable data.

6 Numerical Results

In this section we discuss the performance of the AILC algorithm compared with other learning algorithms on linearly and nonlinearly separable practical and benchmark datasets.

6.1 Classification of Linearly Separable Datasets

For the evaluation of the AILC algorithm, the following linearly separable datasets were chosen, including the benchmark dataset IRIS [12] and some randomly generated datasets.
1) IRIS: a full description of the IRIS dataset is given in Section 4 (Example 3). Here, we consider two classes: Iris Setosa (50 samples) versus non-Setosa (the remaining 100 samples belonging to Iris Versicolour and Iris Virginica).
2) G ; 3) G ; 4) G ; 5) G ; 6) G : randomly generated datasets. The following procedure describes the automatic generation of these data (see the sketch after Table 5). Generate a random array consisting of N rows and n columns as the input matrix X^T. To define the class identifier d, first generate a random vector of length n + 1 for the weight W and c; then, for 1 ≤ i ≤ N, compute b_i = x_i^T W + c and define d_i as +1 or -1 according to whether b_i > δ or b_i < -δ, where δ is a small positive number that preserves a margin between the two generated sets. The generated data consist of X^T and d in the form of an N x (n + 1) array. Table 5 gives a summary of the datasets used.

Table 5. Description of the benchmark and randomly generated linearly separable datasets
          Samples N    Features n
  IRIS
  G
  G
  G
  G
  G
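A minimal sketch of the data-generation procedure just described is given below. It is our own illustration (the function name generate_ls_dataset is hypothetical), under the assumption that candidate points falling inside the margin band |b_i| ≤ δ are simply redrawn.

```python
import numpy as np

def generate_ls_dataset(N, n, delta=0.1, rng=None):
    """Generate a random linearly separable dataset: N points with n features,
    labeled by a random hyperplane (W, c) with a margin of at least delta."""
    rng = np.random.default_rng(rng)
    W, c = rng.uniform(-1, 1, n), rng.uniform(-1, 1)
    X, d = [], []
    while len(X) < N:
        x = rng.uniform(-1, 1, n)
        b = x @ W + c
        if b > delta:
            X.append(x); d.append(1)
        elif b < -delta:
            X.append(x); d.append(-1)       # points inside the margin are redrawn
    return np.array(X), np.array(d)
```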

In the next experiment, these linearly separable datasets are used to evaluate the performance of our proposed algorithm AILC and of other machine learning algorithms, including a decision tree, a support vector machine, and a radial basis function network. A summary of these algorithms is given in Table 6. We compared our results with the implementations in WEKA [14-15].

Table 6. Summary of machine learning algorithms used to produce the results of Tables 7 and 9
  J48       Decision tree learner
  RBF       Radial basis function network
  MLP (L)   Multilayer perceptron (back-propagation neural network) with L hidden layers
  SMO (d)   Sequential minimal optimization algorithm for support vector classification with a polynomial kernel of degree d
  AILC (d)  Proposed adaptive iterative linear (d = 1) and quadratic (d = 2) classifier

For each dataset, the full data were used to train the different algorithms and obtain the best separating hyperplane. The number of misclassified samples, if any, is reported in Table 7. In addition, for AILC, the number of iterations required to obtain the separating hyperplane is given in parentheses.

Table 7. Results for the empirical comparison showing the number of misclassified instances (for AILC, the number of iterations is given in parentheses)
          J48    SMO (1)    RBF    AILC (1)
  IRIS                              0 (2)
  G                                 0 (3)
  G                                 0 (4)
  G                                 0 (2)
  G                                 0 (42)
  G                                 0 (255)

One can easily conclude (from Table 7 and many other experiments not reported here) that, although the number of required iterations increases significantly with the number of features n, it is nearly independent of the number of samples N. Being independent of N shows the strength of the adaptive technique, while the significant dependence on n is a weakness of the proposed technique; it results from the assumption that the chosen n points must lie at an equal, prescribed distance from the hyperplane. However, the proposed algorithm succeeded in separating all these datasets, whereas the other algorithms did not.

6.2 Behavior of Algorithm AILC on Nonlinearly Separable Datasets

In this subsection, we discuss the behavior of the proposed adaptive iterative linear classifier when the dataset is nonlinearly separable, and we present a comparison among this algorithm, a decision tree, a back-propagation neural network, and support vector machines.

6.2.1 Datasets Used for Empirical Evaluation

For an empirical evaluation of the AILC algorithm on nonlinearly separable data, we chose five datasets from the UCI machine learning repository [12] for binary classification tasks.
1) Breast-Cancer (BC). We used the original Wisconsin breast cancer dataset, which consists of 699 samples of breast-cancer medical data in two classes. Sixteen examples containing missing values were removed. 65.5% of the samples come from the majority class.
2) Pima Indian Diabetes (DI). This dataset contains 768 samples with eight attributes (features) each, plus a binary class label.
3) Ionosphere (IO). This database contains 351 samples of radar return signals from the ionosphere. Each sample consists of 34 real-valued attributes plus binary class information.
4) IRIS. A full description of the IRIS dataset is given in Section 4 (Example 3). Here, only the 100 samples belonging to the Iris Versicolour and Iris Virginica classes are considered.
5) Sonar (SN). The sonar database is a high-dimensional dataset describing sonar signals with 60 real-valued attributes. The dataset contains 208 samples.

Table 8 gives an overview of the datasets used.
The numbers in brackets show the original size of a dataset before the examples containing missing values were removed.

Table 8. Numerical description of the benchmark datasets used for empirical evaluation
          Samples (Instances)   Majority Class (%)   Features (Attributes)
  BC      683 (699)             65.5                 9
  DI      768                                        8
  IO      351                                        34
  IRIS    100                                        4
  SN      208                                        60

There exist many different techniques for evaluating the performance of learning techniques on data with a limited number of samples. The stratified ten-fold cross-validation technique is gaining ascendancy and is probably the evaluation method of choice in most practical limited-data situations. In this technique, the data are divided randomly into ten parts, in each of which the classes are represented in approximately the same proportions as in the full dataset. Each part is held out in turn and the learning scheme is trained on the remaining nine-tenths; its error rate is then calculated on the holdout set. Thus the learning procedure is executed a total of ten times on different training sets (which have a lot in common), and the ten error estimates are finally averaged to yield an overall error estimate.
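The evaluation protocol just described can be expressed compactly. The sketch below is ours (not tied to the WEKA implementations used in the paper); it uses scikit-learn's StratifiedKFold, with placeholder train/predict callables standing in for any of the compared classifiers.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validated_error(X, d, train, predict, folds=10, seed=0):
    """Average error rate over stratified k-fold cross validation.
    train(X, d) -> model; predict(model, X) -> array of +1/-1 labels."""
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    errors = []
    for train_idx, test_idx in skf.split(X, d):
        model = train(X[train_idx], d[train_idx])     # fit on nine-tenths
        pred = predict(model, X[test_idx])            # test on the held-out tenth
        errors.append(np.mean(pred != d[test_idx]))
    return float(np.mean(errors))                     # overall error estimate
```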

In this study, the cross-validation technique was applied to the benchmark datasets of Table 8 to compare the performance of our proposed algorithm AILC with that of other machine learning algorithms, including a decision tree, a back-propagation neural network, and support vector machines (see Table 6). We compared our results with the implementations in WEKA [14-15]. The results of the comparison are summarized in Table 9, where the number of misclassified instances and, in parentheses, the classification accuracy are given.

Table 9. Results for the empirical comparison showing the number of misclassified instances and accuracy on the test set using 10-fold cross validation
            BC            DI             IO            IRIS       SN
  J48       32 (95.31%)   196 (74.48%)   34 (90.31%)   6 (94%)    60 (71.15%)
  MLP (3)   36 (94.73%)   181 (76.43%)   31 (91.17%)   7 (93%)    41 (80.28%)
  SMO (1)   21 (96.93%)   179 (76.69%)   44 (87.46%)   6 (94%)    50 (75.96%)
  SMO (2)   24 (96.49%)   171 (77.73%)   33 (90.60%)   7 (93%)    37 (82.21%)
  AILC (1)  37 (94.58%)   199 (74.09%)   69 (80.34%)   6 (94%)    71 (65.87%)
  AILC (2)                                             4 (96%)

Although AILC is a linear classifier, it produces reasonable results even on nonlinearly separable datasets. Again, as in the linearly separable case (Subsection 6.1), one can conclude that the performance of AILC is independent of the number of samples N but degrades as the feature dimension n increases. Note that for the IRIS dataset, where n = 4, AILC is as accurate as the SVM with a polynomial kernel of degree 1, and AILC (2) outperforms the SVM with a polynomial kernel of degree 2. For the datasets BC (n = 9) and DI (n = 8), comparable results are obtained even though N is large (see Table 8). On the other hand, less satisfactory results are obtained for IO (n = 34) and SN (n = 60).

7 Conclusions

A fast adaptive iterative algorithm, AILC, for classifying linearly separable data was presented. In a binary classification problem containing N samples with n features, the main idea of the algorithm is to adaptively choose a subset of n samples and construct a hyperplane that separates the n samples at a margin ε and best classifies the remaining points. This process is repeated until a separating hyperplane is obtained. If such a hyperplane is not obtained within the prescribed number of iterations, the algorithm returns the hyperplane that misclassifies the fewest samples. Further, a quadratically separable classification problem can be mapped from its physical space to a larger space in which the problem becomes linearly separable.
From the various numerical illustrations and the comparisons with other classification algorithms on benchmark datasets, one can conclude that: 1) the algorithm is fast due to its adaptive feature; 2) the complexity of the algorithm is C_1 N + C_2 n^2, where C_1 and C_2 are independent of N, which ensures excellent performance especially when n is small; 3) the assumption that the n samples must lie at a prescribed margin from the hyperplane is restrictive and makes the convergence rate dependent on n; moreover, the user must provide the prescribed parameter ε, which is problem dependent; 4) the convergence rate of AILC is measured either by the number of iterations required to obtain the separating hyperplane or by the number of misclassified samples after the prescribed number of iterations; theoretical and numerical results show that convergence rates are nearly independent of N but degrade as n increases, and usually very few iterations are sufficient for convergence when n is small.

Although reasonable results were obtained, convergence depends strongly on the value of n and on the prescribed parameter ε. Other algorithms are under development to predict the value of ε that ensures a maximum margin for the n points. Moreover, the classification problem as formulated in Section 3 may be developed into a linear programming algorithm that determines ε as an n-valued vector, rather than a scalar, and produces the hyperplane with maximum margin.

References
[1] Duda R O, Hart P E, Stork D G. Pattern Classification. New York: Wiley-Interscience.
[2] Theodoridis S, Koutroumbas K. Pattern Recognition. Academic Press, an imprint of Elsevier.
[3] Cristianini N, Shawe-Taylor J. An Introduction to Support Vector Machines. Cambridge University Press.
[4] Atiya A. Learning with kernels: Support vector machines, regularization, optimization, and beyond. IEEE Transactions on Neural Networks, 2005, 16(3): 781.

[5] Rosenblatt F. Principles of Neurodynamics. Spartan Books.
[6] Taha H A. Operations Research: An Introduction. Macmillan Publishing Co., Inc.
[7] Zurada J M. Introduction to Artificial Neural Systems. Boston: PWS Publishing Co., USA.
[8] Barber C B, Dobkin D P, Huhdanpaa H. The quickhull algorithm for convex hulls. ACM Transactions on Mathematical Software, 1996, 22(4).
[9] Tajine M, Elizondo D. New methods for testing linear separability. Neurocomputing, 2002, 47(1-4).
[10] Elizondo D. Searching for linearly separable subsets using the class of linear separability method. In Proc. IEEE-IJCNN, Budapest, Hungary, Jul. 2004.
[11] Elizondo D. The linear separability problem: Some testing methods. IEEE Transactions on Neural Networks, 2006, 17(2).
[12] UCI Machine Learning Repository. Accessed Mar. 31.
[13] Fisher R A. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 1936, 7.
[14] WEKA, http://www.cs.waikato.ac.nz/ml/weka/. Accessed May 1.
[15] Witten I H, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. Elsevier.

Rasha M. Abo-Bakr was born in 1976 in Egypt. She received her Bachelor's degree from the Mathematics (Computer Science) Department, Faculty of Science, Zagazig University, Egypt, and her Master's degree in computer science in 2003, with a thesis titled "Computer Algorithms for System Identification". Since 2003 she has been an assistant lecturer at the Mathematics (Computer Science) Department, Faculty of Science, Zagazig University. She received her Ph.D. degree in mathematics and computer science from Zagazig University in 2011, with a dissertation titled "Symbolic Modeling of Dynamical Systems Using Soft Computing Techniques". Her research interests are artificial intelligence, soft computing technologies, and astronomy.

Mohamed Abdel-Kawy Mohamed Ali Soliman received the B.S. degree in electrical and electronic engineering from the Military Technical College (M.T.C.), Cairo, Egypt, with grade Excellent, in 1974, the M.S. degree in electronic and communications engineering from the Faculty of Engineering, Cairo University, Egypt, in 1985, with research on observers in modern control systems theory, and the Ph.D. degree in aeronautical engineering, with the thesis "Intelligent Management for Aircraft and Spacecraft Sensors Systems". He is currently head of the Department of Computer and Systems Engineering, Faculty of Engineering, Zagazig University. His research interests lie at the intersection of the general fields of computer science and engineering, brain science, and cognitive science.


KBSVM: KMeans-based SVM for Business Intelligence Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2004 Proceedings Americas Conference on Information Systems (AMCIS) December 2004 KBSVM: KMeans-based SVM for Business Intelligence

More information

Week 3: Perceptron and Multi-layer Perceptron

Week 3: Perceptron and Multi-layer Perceptron Week 3: Perceptron and Multi-layer Perceptron Phong Le, Willem Zuidema November 12, 2013 Last week we studied two famous biological neuron models, Fitzhugh-Nagumo model and Izhikevich model. This week,

More information

Efficient Pairwise Classification

Efficient Pairwise Classification Efficient Pairwise Classification Sang-Hyeun Park and Johannes Fürnkranz TU Darmstadt, Knowledge Engineering Group, D-64289 Darmstadt, Germany Abstract. Pairwise classification is a class binarization

More information

A Lazy Approach for Machine Learning Algorithms

A Lazy Approach for Machine Learning Algorithms A Lazy Approach for Machine Learning Algorithms Inés M. Galván, José M. Valls, Nicolas Lecomte and Pedro Isasi Abstract Most machine learning algorithms are eager methods in the sense that a model is generated

More information

Argha Roy* Dept. of CSE Netaji Subhash Engg. College West Bengal, India.

Argha Roy* Dept. of CSE Netaji Subhash Engg. College West Bengal, India. Volume 3, Issue 3, March 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Training Artificial

More information

Support Vector Machines and their Applications

Support Vector Machines and their Applications Purushottam Kar Department of Computer Science and Engineering, Indian Institute of Technology Kanpur. Summer School on Expert Systems And Their Applications, Indian Institute of Information Technology

More information

Machine Learning with MATLAB --classification

Machine Learning with MATLAB --classification Machine Learning with MATLAB --classification Stanley Liang, PhD York University Classification the definition In machine learning and statistics, classification is the problem of identifying to which

More information

Lab 2: Support vector machines

Lab 2: Support vector machines Artificial neural networks, advanced course, 2D1433 Lab 2: Support vector machines Martin Rehn For the course given in 2006 All files referenced below may be found in the following directory: /info/annfk06/labs/lab2

More information

Lecture #11: The Perceptron

Lecture #11: The Perceptron Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be

More information

1 Case study of SVM (Rob)

1 Case study of SVM (Rob) DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how

More information

Machine Learning: Algorithms and Applications Mockup Examination

Machine Learning: Algorithms and Applications Mockup Examination Machine Learning: Algorithms and Applications Mockup Examination 14 May 2012 FIRST NAME STUDENT NUMBER LAST NAME SIGNATURE Instructions for students Write First Name, Last Name, Student Number and Signature

More information

The Generalisation of the Recursive Deterministic Perceptron

The Generalisation of the Recursive Deterministic Perceptron 006 International Joint Conference on Neural Networks Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada July 16-1, 006 The Generalisation of the Recursive Deterministic Perceptron David Elizondo,

More information

.. Spring 2017 CSC 566 Advanced Data Mining Alexander Dekhtyar..

.. Spring 2017 CSC 566 Advanced Data Mining Alexander Dekhtyar.. .. Spring 2017 CSC 566 Advanced Data Mining Alexander Dekhtyar.. Machine Learning: Support Vector Machines: Linear Kernel Support Vector Machines Extending Perceptron Classifiers. There are two ways to

More information

Basis Functions. Volker Tresp Summer 2016

Basis Functions. Volker Tresp Summer 2016 Basis Functions Volker Tresp Summer 2016 1 I am an AI optimist. We ve got a lot of work in machine learning, which is sort of the polite term for AI nowadays because it got so broad that it s not that

More information

Lecture 7: Support Vector Machine

Lecture 7: Support Vector Machine Lecture 7: Support Vector Machine Hien Van Nguyen University of Houston 9/28/2017 Separating hyperplane Red and green dots can be separated by a separating hyperplane Two classes are separable, i.e., each

More information

Machine Learning and Pervasive Computing

Machine Learning and Pervasive Computing Stephan Sigg Georg-August-University Goettingen, Computer Networks 17.12.2014 Overview and Structure 22.10.2014 Organisation 22.10.3014 Introduction (Def.: Machine learning, Supervised/Unsupervised, Examples)

More information

Nelder-Mead Enhanced Extreme Learning Machine

Nelder-Mead Enhanced Extreme Learning Machine Philip Reiner, Bogdan M. Wilamowski, "Nelder-Mead Enhanced Extreme Learning Machine", 7-th IEEE Intelligent Engineering Systems Conference, INES 23, Costa Rica, June 9-2., 29, pp. 225-23 Nelder-Mead Enhanced

More information

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation.

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation. Equation to LaTeX Abhinav Rastogi, Sevy Harris {arastogi,sharris5}@stanford.edu I. Introduction Copying equations from a pdf file to a LaTeX document can be time consuming because there is no easy way

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting

More information

Robust 1-Norm Soft Margin Smooth Support Vector Machine

Robust 1-Norm Soft Margin Smooth Support Vector Machine Robust -Norm Soft Margin Smooth Support Vector Machine Li-Jen Chien, Yuh-Jye Lee, Zhi-Peng Kao, and Chih-Cheng Chang Department of Computer Science and Information Engineering National Taiwan University

More information

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis CHAPTER 3 BEST FIRST AND GREEDY SEARCH BASED CFS AND NAÏVE BAYES ALGORITHMS FOR HEPATITIS DIAGNOSIS 3.1 Introduction

More information

Support Vector Machines

Support Vector Machines Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining

More information

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007,

More information

Practice EXAM: SPRING 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE

Practice EXAM: SPRING 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE Practice EXAM: SPRING 0 CS 6375 INSTRUCTOR: VIBHAV GOGATE The exam is closed book. You are allowed four pages of double sided cheat sheets. Answer the questions in the spaces provided on the question sheets.

More information

DM6 Support Vector Machines

DM6 Support Vector Machines DM6 Support Vector Machines Outline Large margin linear classifier Linear separable Nonlinear separable Creating nonlinear classifiers: kernel trick Discussion on SVM Conclusion SVM: LARGE MARGIN LINEAR

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

A Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York

A Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York A Systematic Overview of Data Mining Algorithms Sargur Srihari University at Buffalo The State University of New York 1 Topics Data Mining Algorithm Definition Example of CART Classification Iris, Wine

More information

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University of Economics

More information

5. GENERALIZED INVERSE SOLUTIONS

5. GENERALIZED INVERSE SOLUTIONS 5. GENERALIZED INVERSE SOLUTIONS The Geometry of Generalized Inverse Solutions The generalized inverse solution to the control allocation problem involves constructing a matrix which satisfies the equation

More information

Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers

Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers A. Salhi, B. Minaoui, M. Fakir, H. Chakib, H. Grimech Faculty of science and Technology Sultan Moulay Slimane

More information

Classification using Weka (Brain, Computation, and Neural Learning)

Classification using Weka (Brain, Computation, and Neural Learning) LOGO Classification using Weka (Brain, Computation, and Neural Learning) Jung-Woo Ha Agenda Classification General Concept Terminology Introduction to Weka Classification practice with Weka Problems: Pima

More information

Support Vector Machines

Support Vector Machines Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining

More information

Wrapper Feature Selection using Discrete Cuckoo Optimization Algorithm Abstract S.J. Mousavirad and H. Ebrahimpour-Komleh* 1 Department of Computer and Electrical Engineering, University of Kashan, Kashan,

More information

Support Vector Machines (a brief introduction) Adrian Bevan.

Support Vector Machines (a brief introduction) Adrian Bevan. Support Vector Machines (a brief introduction) Adrian Bevan email: a.j.bevan@qmul.ac.uk Outline! Overview:! Introduce the problem and review the various aspects that underpin the SVM concept.! Hard margin

More information

Well Analysis: Program psvm_welllogs

Well Analysis: Program psvm_welllogs Proximal Support Vector Machine Classification on Well Logs Overview Support vector machine (SVM) is a recent supervised machine learning technique that is widely used in text detection, image recognition

More information

12 Classification using Support Vector Machines

12 Classification using Support Vector Machines 160 Bioinformatics I, WS 14/15, D. Huson, January 28, 2015 12 Classification using Support Vector Machines This lecture is based on the following sources, which are all recommended reading: F. Markowetz.

More information

Naïve Bayes for text classification

Naïve Bayes for text classification Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support

More information

Classification and Regression using Linear Networks, Multilayer Perceptrons and Radial Basis Functions

Classification and Regression using Linear Networks, Multilayer Perceptrons and Radial Basis Functions ENEE 739Q SPRING 2002 COURSE ASSIGNMENT 2 REPORT 1 Classification and Regression using Linear Networks, Multilayer Perceptrons and Radial Basis Functions Vikas Chandrakant Raykar Abstract The aim of the

More information

Classification Lecture Notes cse352. Neural Networks. Professor Anita Wasilewska

Classification Lecture Notes cse352. Neural Networks. Professor Anita Wasilewska Classification Lecture Notes cse352 Neural Networks Professor Anita Wasilewska Neural Networks Classification Introduction INPUT: classification data, i.e. it contains an classification (class) attribute

More information

Version Space Support Vector Machines: An Extended Paper

Version Space Support Vector Machines: An Extended Paper Version Space Support Vector Machines: An Extended Paper E.N. Smirnov, I.G. Sprinkhuizen-Kuyper, G.I. Nalbantov 2, and S. Vanderlooy Abstract. We argue to use version spaces as an approach to reliable

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Maximum Margin Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574

More information

Rule Based Learning Systems from SVM and RBFNN

Rule Based Learning Systems from SVM and RBFNN Rule Based Learning Systems from SVM and RBFNN Haydemar Núñez 1, Cecilio Angulo 2 and Andreu Català 2 1 Laboratorio de Inteligencia Artificial, Universidad Central de Venezuela. Caracas, Venezuela hnunez@strix.ciens.ucv.ve

More information

A Comparative Study of SVM Kernel Functions Based on Polynomial Coefficients and V-Transform Coefficients

A Comparative Study of SVM Kernel Functions Based on Polynomial Coefficients and V-Transform Coefficients www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 6 Issue 3 March 2017, Page No. 20765-20769 Index Copernicus value (2015): 58.10 DOI: 18535/ijecs/v6i3.65 A Comparative

More information

A Support Vector Method for Hierarchical Clustering

A Support Vector Method for Hierarchical Clustering A Support Vector Method for Hierarchical Clustering Asa Ben-Hur Faculty of IE and Management Technion, Haifa 32, Israel David Horn School of Physics and Astronomy Tel Aviv University, Tel Aviv 69978, Israel

More information

Using Decision Boundary to Analyze Classifiers

Using Decision Boundary to Analyze Classifiers Using Decision Boundary to Analyze Classifiers Zhiyong Yan Congfu Xu College of Computer Science, Zhejiang University, Hangzhou, China yanzhiyong@zju.edu.cn Abstract In this paper we propose to use decision

More information

Morphological Image Processing

Morphological Image Processing Morphological Image Processing Binary image processing In binary images, we conventionally take background as black (0) and foreground objects as white (1 or 255) Morphology Figure 4.1 objects on a conveyor

More information

IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 6, NOVEMBER Inverting Feedforward Neural Networks Using Linear and Nonlinear Programming

IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 6, NOVEMBER Inverting Feedforward Neural Networks Using Linear and Nonlinear Programming IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 6, NOVEMBER 1999 1271 Inverting Feedforward Neural Networks Using Linear and Nonlinear Programming Bao-Liang Lu, Member, IEEE, Hajime Kita, and Yoshikazu

More information

THE discrete multi-valued neuron was presented by N.

THE discrete multi-valued neuron was presented by N. Proceedings of International Joint Conference on Neural Networks, Dallas, Texas, USA, August 4-9, 2013 Multi-Valued Neuron with New Learning Schemes Shin-Fu Wu and Shie-Jue Lee Department of Electrical

More information

Reihe Informatik 10/2001. Efficient Feature Subset Selection for Support Vector Machines. Matthias Heiler, Daniel Cremers, Christoph Schnörr

Reihe Informatik 10/2001. Efficient Feature Subset Selection for Support Vector Machines. Matthias Heiler, Daniel Cremers, Christoph Schnörr Computer Vision, Graphics, and Pattern Recognition Group Department of Mathematics and Computer Science University of Mannheim D-68131 Mannheim, Germany Reihe Informatik 10/2001 Efficient Feature Subset

More information

Comparative Study of Instance Based Learning and Back Propagation for Classification Problems

Comparative Study of Instance Based Learning and Back Propagation for Classification Problems Comparative Study of Instance Based Learning and Back Propagation for Classification Problems 1 Nadia Kanwal, 2 Erkan Bostanci 1 Department of Computer Science, Lahore College for Women University, Lahore,

More information

Efficient Pruning Method for Ensemble Self-Generating Neural Networks

Efficient Pruning Method for Ensemble Self-Generating Neural Networks Efficient Pruning Method for Ensemble Self-Generating Neural Networks Hirotaka INOUE Department of Electrical Engineering & Information Science, Kure National College of Technology -- Agaminami, Kure-shi,

More information