Use of Multi-category Proximal SVM for Data Set Reduction


S.V.N. Vishwanathan and M. Narasimha Murty
Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
vishy@csa.iisc.ernet.in, mnm@csa.iisc.ernet.in (corresponding author)

Abstract. In this paper we describe a method for data set reduction by effective use of the Multi-category Proximal Support Vector Machine (MPSVM). By using the linear MPSVM formulation in an iterative manner we identify the outliers in the data set and eliminate them. A k-nearest Neighbor (k-nn) classifier is able to classify points using this reduced data set without significant loss of accuracy. We present experiments on a well known large OCR data set to validate our claims.

1 Introduction

Support Vector Machines (SVM) have recently gained prominence in the field of machine learning and pattern classification[8]. Classification is achieved by realizing a linear or non linear separation surface in the input space. Training the SVM involves solving a quadratic optimization problem, which requires optimization routines from numerical libraries. This step is computationally intensive, can be subject to stability problems and is nontrivial to implement[5]. Recently many iterative algorithms have been proposed to overcome these problems. Notable among them are Sequential Minimal Optimization (SMO), the Nearest Point Algorithm (NPA) and the Multi Category Proximal Support Vector Machine (MPSVM)[5,4,2]. These algorithms offer mathematical elegance as well as ease of implementation.

The k-nearest Neighbor (k-nn) classifier is one of the most robust and widely used classifiers in the field of Optical Character Recognition[3]. The classification of a test point using the k-nn classifier takes O(n) time, where n is the number of points in the training set. One popular method of speeding up the k-nn classifier is to reduce the number of points in the training set by appropriate data selection and outlier elimination. It is also well known that the presence of outliers tends to decrease the classification accuracy of the k-nn classifier[7].

In this paper we propose a hybrid classification system, wherein we first use the MPSVM iteratively to perform data set reduction and outlier elimination. The pre-processed data is then used as the training set for a k-nn classifier. The resultant k-nn classifier is more robust and takes less time to classify test points.

This paper is organized as follows. In section 2 we briefly discuss the MPSVM formulation. We present our algorithm in section 3. In section 4 we discuss the experiments carried out on a public OCR data set. Section 5 concludes with a summary of the present work and gives pointers for future research.

2 Multi Category Proximal Support Vector Machines (MPSVM)

Let $X = \{x_1, x_2, \ldots, x_m\}$ be a set of $m$ points in the $n$-dimensional real space $R^n$, represented by an $m \times n$ matrix $A$. We consider the problem of classifying these points according to the membership of each point in the class A+ or A-, as specified by a given $m \times m$ diagonal matrix $D$ which contains +1 or -1 along the diagonal. For this problem, the Proximal Support Vector Machine (PSVM) formulation for a linear kernel with $\nu > 0$ is given by[1]

$$\min_{(w,\gamma,y) \in R^{n+1+m}} \; \frac{\nu}{2}\|Ny\|^2 + \frac{1}{2}\left\|\begin{bmatrix} w \\ \gamma \end{bmatrix}\right\|^2 \quad \text{s.t.} \quad D(Aw - e\gamma) + y = e \tag{1}$$

Here $e$ is a vector of ones and $N$ is a diagonal normalization matrix used to account for the difference in the number of points in the two classes. If there are $p$ samples in class A+ and $m - p$ samples in class A-, then $N$ contains $1/p$ on rows corresponding to entries of class A+ and $1/(m - p)$ on rows corresponding to entries of class A-. $y$ is an error variable and corresponds to the deviation of a point from the proximal plane. Applying the KKT conditions and solving the above equations yields

$$w = \nu A^T D N \left[ I - H \left( \frac{I}{\nu} + H^T N H \right)^{-1} H^T N \right] e \tag{2}$$

and

$$\gamma = -\nu e^T D N \left[ I - H \left( \frac{I}{\nu} + H^T N H \right)^{-1} H^T N \right] e \tag{3}$$

where $H = D[A \;\; -e]$ as in [1].

Mangasarian et al. have proposed a non linear extension to the PSVM formulation using the kernel trick[1]. We do not use the non linear formulation in our experiments.

The MPSVM is a straightforward extension of PSVM in which multiple classes are handled using the one-against-the-rest scheme, i.e. samples of one class are considered to constitute the class A+ and the rest of the samples are considered to belong to class A-. This is repeated for every class in the data set[2].
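The closed-form solution in (2) and (3) can be illustrated with a short NumPy sketch. This is not the authors' code: it assumes H = D[A -e] as in the PSVM paper [1], assumes the bracketed matrix in (2) and (3) is applied to the vector of ones e, and assumes that a point is assigned to the class with the largest value of x^T w - γ. The function names `psvm_fit`, `mpsvm_fit` and `mpsvm_predict` are hypothetical.

```python
import numpy as np

def psvm_fit(A, d, nu):
    """Linear PSVM with class-balancing weights; d holds +1/-1 labels (diagonal of D)."""
    m = A.shape[0]
    p = np.sum(d > 0)
    n_diag = np.where(d > 0, 1.0 / p, 1.0 / (m - p))    # diagonal of N
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])   # assumed H = D [A  -e]
    e = np.ones(m)
    HtN = H.T * n_diag                                   # H^T N
    inner = np.eye(H.shape[1]) / nu + HtN @ H            # I/nu + H^T N H
    # u = nu N [I - H (I/nu + H^T N H)^{-1} H^T N] e
    u = nu * n_diag * (e - H @ np.linalg.solve(inner, HtN @ e))
    w = A.T @ (d * u)                                    # equation (2)
    gamma = -e @ (d * u)                                 # equation (3)
    return w, gamma

def mpsvm_fit(A, labels, nu):
    """One-against-the-rest: one (w, gamma) pair per class."""
    classes = np.unique(labels)
    fits = [psvm_fit(A, np.where(labels == c, 1.0, -1.0), nu) for c in classes]
    W = np.column_stack([w for w, _ in fits])
    gammas = np.array([g for _, g in fits])
    return classes, W, gammas

def mpsvm_predict(A, classes, W, gammas):
    """Assumed decision rule: pick the class with the largest x^T w - gamma."""
    return classes[np.argmax(A @ W - gammas, axis=1)]
```

Only an (n+1) x (n+1) linear system is solved per class, which is what makes the linear formulation cheap when n is much smaller than m.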

3 Our Algorithm

Our algorithm is conceptually simple. We perform data set reduction in two ways:

- Boundary patterns, which are most likely to cause confusion while classifying a point, are pruned away from the training set.
- Very typical patterns of a class, which are far removed from the boundary, are often not useful for classification and can be safely ignored[6].

For each of the classes we use the MPSVM formulation to obtain separating planes. For the next iteration we retain only those points which are enclosed by these separating planes. These are points which satisfy

$$x^T w - \gamma \in (-1, 1) \tag{4}$$

This reduced set is used iteratively as input for the MPSVM formulation. Once the number of data points enclosed by these separating planes becomes less than a preset threshold, we treat them as boundary points and prune them from the training set. We state this in Algorithm 1.

Algorithm 1 EliminateBoundary(A, D, ν, T, M)
    Reduced := A
    for classlabel = 1 : noOfClasses do
        ToDelete := Reduced
        for i = 1 : M do
            [W, γ] = MPSVM(ToDelete, D, ν)
            ToDelete := {x ∈ Reduced : x^T W - γ ∈ (-1, 1)}
            if size(ToDelete) < T then
                break
            end if
        end for
        Reduced := Reduced \ ToDelete
    end for
    return Reduced

During the first iteration, all points of a given class which lie beyond the proximal plane of that class, on the side away from the separating plane, are classified as far removed from the boundary and hence are removed. Assuming that samples of a class belong to A+ and the rest of the samples are classified as A-, these are points which satisfy

$$x^T w - \gamma > 1 \tag{5}$$

We state this procedure in Algorithm 2. In both algorithms, A is the full training set, D is a diagonal matrix indicating the class label of the training samples, T is the threshold on the number of samples to discard, and M is the maximum number of iterations to perform per class.
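The two pruning rules, criterion (4) applied iteratively and criterion (5) applied once per class, can be sketched in NumPy as follows. This is an illustrative reading of the listings rather than the authors' implementation. The trainer `fit(X, d)` is a hypothetical callable returning `(w, gamma)` for +1/-1 labels, the function names are hypothetical, and the guards for a class side becoming empty are assumptions added so that the sketch does not fail on degenerate subsets.

```python
import numpy as np

def eliminate_boundary(X, labels, fit, threshold, max_iter):
    """Return a boolean mask over the rows of X kept after boundary pruning."""
    labels = np.asarray(labels)
    keep = np.ones(len(X), dtype=bool)             # Reduced := A
    for c in np.unique(labels):
        d = np.where(labels == c, 1.0, -1.0)       # one class against the rest
        if np.unique(d[keep]).size < 2:            # assumption: skip if one side is empty
            continue
        band = keep.copy()                         # ToDelete := Reduced
        for _ in range(max_iter):
            w, gamma = fit(X[band], d[band])
            band = keep & (np.abs(X @ w - gamma) < 1.0)    # criterion (4)
            # assumption: also stop once the band no longer holds both sides
            if band.sum() < threshold or np.unique(d[band]).size < 2:
                break
        keep &= ~band                              # Reduced := Reduced \ ToDelete
    return keep

def eliminate_typical(X, labels, fit):
    """Return a boolean mask over the rows of X kept after one pass per class."""
    labels = np.asarray(labels)
    keep = np.ones(len(X), dtype=bool)
    for c in np.unique(labels):
        d = np.where(labels == c, 1.0, -1.0)
        if np.unique(d[keep]).size < 2:            # assumption: skip if one side is empty
            continue
        w, gamma = fit(X[keep], d[keep])
        keep &= ~(X @ w - gamma > 1.0)             # criterion (5)
    return keep
```

Returning a mask instead of a copied matrix keeps the original row indices available, so the surviving rows and their labels can be selected together with `X[keep]` and `labels[keep]`.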

Algorithm 2 EliminateTypical(A, D, ν)
    Reduced := A
    for classlabel = 1 : noOfClasses do
        ToDelete := Reduced
        [W, γ] = MPSVM(ToDelete, D, ν)
        ToDelete := {x ∈ Reduced : x^T W - γ > 1}
        Reduced := Reduced \ ToDelete
    end for
    return Reduced

4 Experimental Results

All experiments were performed using a well known public data set used widely by us and other researchers[10,6]. The training set consists of 6670 preprocessed handwritten digits (0-9) with roughly 667 samples per class. The test set consists of 3333 samples with roughly 333 samples per class. All the samples had a dimensionality of 192 after initial preprocessing and there were no missing values. All experiments were performed on a 450 MHz Pentium III machine with 64 MB RAM running RedHat Linux 6.1 and MATLAB.

4.1 k-nearest Neighbor Classifier Using Full Data Set

We performed k-nearest Neighbor classification of the test set using all the samples in the training set. We experimented with values of k between 1 and 10, and the best classification accuracy of 92.5% was obtained with a k value of 5.

4.2 MPSVM Classifier

Here we used the linear MPSVM for classification as described in [2] and obtained an accuracy of 87.40%. It must be noted that a non linear MPSVM may perform better in our context.

4.3 k-nearest Neighbor Classifier Using Reduced Data Set

We used the linear MPSVM to reduce the training set size as described in Section 3. A few representative samples shown in Figure 1 give an idea of the kind of outliers eliminated by our algorithm. Various values of the threshold and ν were used and the best results are reproduced in Table 1. Again, the best classification accuracy was obtained for a k value of 5.
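For reference, the k-nearest Neighbor step used in Sections 4.1 and 4.3 can be sketched as a brute-force NumPy routine. This is only an illustration, not the MATLAB code used in the experiments: the function name `knn_predict` is hypothetical, Euclidean distance is assumed, and labels are assumed to be non-negative integers such as the digits 0-9. The cost per test point grows linearly with the number of training rows, which is why a smaller training set speeds up classification.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k):
    """Brute-force k-nn with majority vote; train_y holds non-negative integer labels."""
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b,
    # giving an array of shape (n_test, n_train)
    d2 = ((test_X ** 2).sum(axis=1)[:, None]
          + (train_X ** 2).sum(axis=1)[None, :]
          - 2.0 * test_X @ train_X.T)
    # Indices of the k closest training points for each test point
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    votes = train_y[nearest]
    # Majority vote per test point; ties go to the smallest label
    return np.array([np.bincount(row).argmax() for row in votes])
```

With a boolean mask `keep` marking the retained training rows, the reduced classifier would be called as `knn_predict(train_X[keep], train_y[keep], test_X, 5)`.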


Table 1. Accuracy of k-nearest Neighbor using a reduced training set (columns: ν, Threshold, No. Eliminated, % Accuracy).

Fig. 1. Examples of outliers detected by our algorithm. Actual class labels (L to R) are 0, 0, 4, 9 (Row 1); 2, 2, 3, 3 (Row 2); 8, 8, 7, 7 (Row 3).

4.4 Discussion of the Results

As can be seen, the best accuracy is obtained for the k-nearest Neighbor classifier which uses the full training set for classification. But even after elimination of around 44% of the training samples, the classification accuracy has dropped by only around 2%. Furthermore, elimination of around 58% of the samples has caused the classification accuracy to drop by around 4.5%. Saradhi et al. report an accuracy of 86.32% after elimination of 2390 atypical patterns and 75.61% after elimination of 3972 atypical patterns on the same data set[6]. Thus our method is effective in reducing the size of the data set without losing the patterns that are important for classification.

We also note that the number of samples eliminated is more sensitive to the threshold value than to the value of ν. We do not know at this point whether this is peculiar to the data set that we have used. Experiments on other data sets are needed before commenting on this aspect.

Because of the nature of the data set, we expect that a non linear MPSVM classifier will perform better on it. Results for the linear MPSVM classifier are reported here only for completeness.

5 Conclusion

We have proposed a conceptually simple algorithm for data set reduction and outlier detection. In some sense the algorithm is close to methods like bootstrapping, which work by increasing the separation between the classes. Our algorithm increases the separation between classes by eliminating the noisy patterns at the class boundaries and thus leads to better generalization. It also identifies the typical patterns in the data set and prunes them away, leading to a smaller data set.

One immediately apparent way to extend our algorithm is to use the non linear MPSVM formulation, which may lead to better separation in a higher dimensional kernel space. At the time of writing, results of this approach are not available. The main difficulty in implementing this approach seems to be that even a rectangular kernel of modest size requires a large amount of memory because of the large dimensionality of the problem. Use of a dimensionality reduction algorithm like the one proposed by us may be explored to overcome this limitation[9]. Saradhi et al. report good results by applying data set reduction techniques after bootstrapping the data points[6]. This is an area of further investigation.

References

1. G. Fung and O. L. Mangasarian. Proximal support vector machine classifiers. In D. Lee, F. Provost, and R. Srikant, editors, Proceedings KDD2001: Knowledge Discovery and Data Mining, pages 64-70, New York.
2. G. Fung and O. L. Mangasarian. Multicategory proximal support vector classifiers. Technical Report 01-06, Data Mining Institute, July.
3. L. Holmstrom, P. Koistinen, J. Laaksonen, and E. Oja. Neural and statistical classifiers - taxonomy and two case studies. IEEE Transactions on Neural Networks, 8(1):5-17, January.
4. S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy. A fast iterative nearest point algorithm for support vector machine classifier design. IEEE Transactions on Neural Networks, 11(1):124.
5. J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge, December.
6. V. Vijaya Saradhi. Pattern representation and prototype selection for handwritten digit recognition. Master's thesis, Indian Institute of Science, Bangalore, India, June.
7. D. Stork, R. O. Duda, and P. E. Hart. Pattern Classification and Scene Analysis. Wiley-Interscience, 2nd edition.
8. V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 2nd edition, 2000.

9. S.V.N. Vishwanathan and M. Narasimha Murty. Use of Kohonen map for dimensionality reduction. Technical Report IISC-CSA, Indian Institute of Science, Bangalore, India, December.
10. S.V.N. Vishwanathan and M. Narasimha Murty. Kohonen's SOM with cache. Pattern Recognition, 33, 2000.
