Using Neural Networks and Support Vector Machines in Data Mining

RICHARD A. WASNIOWSKI
Computer Science Department
California State University Dominguez Hills
Carson, CA 90747 USA

Abstract: - Multivariate data analysis techniques have the potential to improve data analysis. Support Vector Machines (SVMs) are a recent addition to the family of multivariate data analysis techniques. A brief introduction to the SVM technique is followed by an outline of its practical application.

Key-Words: - Support vector machines, data analysis

1 Introduction
A common problem in various areas of science is that of classification. One approach is to develop an algorithm which finds complex patterns in input example data (labeled training data) to learn the solution to the problem. This is called supervised learning. Such an algorithm can map each training example onto two categories, more than two categories, or a continuous, real-valued output. A potential problem with this approach is noise in the training data: there may be no correct underlying classification function, and two similar training examples may belong to different categories. Another problem is that the resulting algorithm may misclassify unseen data because it has overfit the training data. A better goal is to optimize generalization: the ability to correctly classify unseen data. The next section shows how the Support Vector Machine learning methodology addresses these problems. The description follows that of several references in the literature [4,5,21].

2 Problem Formulation
The SVM method was developed to construct separating hyperplanes for pattern recognition problems [4,5]. In the 1990s it was generalized to constructing nonlinear separating functions and to estimating real-valued functions. Applications of SVMs include text categorization, character recognition, bioinformatics, and face detection. Support Vector Machines are learning machines that can perform classification and real-valued function approximation. An SVM creates functions from a set of labeled training data and operates by finding a hypersurface in the space of possible inputs. This hypersurface attempts to split the positive examples from the negative examples, and the split is chosen to maximize the distance from the hypersurface to the nearest of the positive and negative examples. Intuitively, this makes the classification correct for test data that is near, but not identical, to the training data. In detail, during the training phase the SVM takes a data matrix as input and labels each sample as either belonging to a given class (positive) or not (negative). The SVM treats each sample in the matrix as a point in a high-dimensional feature space, where the number of attributes determines the dimensionality of the space. The SVM learning algorithm then identifies a hyperplane in this space that best separates the positive and negative training samples. The trained SVM can then be used to make predictions about a test sample's membership in the class. In brief, an SVM non-linearly maps its n-dimensional input space into a high-dimensional feature space, and in this high-dimensional feature space a linear classifier is constructed.
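The paper does not give an implementation, but the workflow just described (a labeled data matrix in, a separating hyperplane learned, class membership predicted for a test sample) can be sketched in a few lines. The following is a minimal illustration using scikit-learn; the toy data is an assumption made purely for demonstration.

```python
# Minimal sketch of the training/prediction workflow described above,
# using scikit-learn's SVC. Illustrative only; not the paper's own code.
import numpy as np
from sklearn.svm import SVC

# Toy data matrix: each row is a sample, each column an attribute;
# labels mark class membership (+1 positive, -1 negative).
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([-1, -1, +1, +1])

clf = SVC(kernel="linear")   # linear classifier in input space
clf.fit(X, y)                # finds the separating hyperplane

# Predict class membership of an unseen test sample.
print(clf.predict([[0.8, 0.9]]))   # -> [1]
```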
3 SVM based algorithm
The main idea of the SVM approach is to map the training data into a high-dimensional feature space in which a decision boundary is determined by constructing the optimal separating hyperplane. Computations in the feature space are avoided by using a kernel function. The formal goal is to estimate a function f: R^n → {±1} using input-output training data (x_1, y_1), ..., (x_N, y_N) ∈ R^n × {±1} such that f will correctly classify examples (x, y), i.e. f(x) = y, where N is the number of training examples. For generalization we restrict the class of functions from which f is chosen, since simply minimizing the training error does not necessarily result in good generalization. SVM classifiers are based on the class of hyperplanes (w·x) + b = 0 with w ∈ R^n, b ∈ R, corresponding to the decision function f(x) = sgn((w·x) + b). Here w is called the weight vector and b the threshold; w and b are the parameters controlling the function and must be learned from the data. The unique hyperplane with maximal margin of separation between the two classes is called the optimal hyperplane, and the optimization problem is to find this optimal hyperplane. Both the optimization problem and the final decision function depend only on dot products between input vectors. This is crucial for the successful generalization to the nonlinear case.
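The decision function f(x) = sgn((w·x) + b) can be checked directly against a fitted linear SVM. The sketch below continues the toy example from Section 2 and reads w and b from scikit-learn's fitted attributes; again, the data is an illustrative assumption.

```python
# Verifying f(x) = sgn((w.x) + b) against a fitted linear SVM.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([-1, -1, +1, +1])
clf = SVC(kernel="linear").fit(X, y)

w = clf.coef_[0]        # weight vector w
b = clf.intercept_[0]   # threshold b
x = np.array([0.8, 0.9])

f_x = np.sign(w @ x + b)            # f(x) = sgn((w.x) + b)
assert f_x == clf.predict([x])[0]   # agrees with the library's prediction
print(w, b, f_x)
```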

If f(x) is a nonlinear function of x, one possible approach is to use a neural network, which consists of a network of simple linear classifiers. Problems with this approach include the many parameters and the existence of local minima. The SVM approach is to map the input data into a high, possibly infinite-dimensional feature space F via a nonlinear map Φ: R^n → F. The optimal hyperplane algorithm can then be used in F. This high dimensionality may lead to a practical computational problem in feature space. Since the input vectors appear in the problem only inside dot products, however, we only need dot products in feature space. If we can find a kernel function K such that K(x_1, x_2) = Φ(x_1)·Φ(x_2), then we do not need to know Φ explicitly. Mercer's theorem tells us when a function K(x, y) is a kernel, i.e. when there exists a mapping Φ such that K(x_1, x_2) = Φ(x_1)·Φ(x_2). We can choose from well-known kernel functions: the polynomial of degree d, the Gaussian radial basis function, or the sigmoid. We propose a new SVM-based method (BSVM), which uses a dynamic programming algorithm as its kernel function. A detailed description of the experiments can be found in [19]. The results of our computational experiments show that the BSVM method outperforms the existing algorithms we tested.

4 Neural Networks based algorithm
Over the past few years Neural Networks, one of the branches of Artificial Intelligence technology, have gained popularity in the hydrological and hydraulic engineering community, and some encouraging results have been achieved. Recently, a new tool from the Artificial Intelligence field called the Support Vector Machine (SVM) has gained popularity in the Machine Learning community. It has been applied successfully to classification tasks such as pattern recognition and OCR, and more recently also to regression and time series. Mathematically, SVMs are a range of classification and regression algorithms that have been formulated from the principles of statistical learning theory. So far, SVMs have been benchmarked against artificial neural networks (ANNs) and have outperformed ANNs in many application areas. It has been hypothesized that this is because there are fewer model parameters to optimize in the SVM approach, reducing the possibility of overfitting the training data and thus increasing the actual performance. Compared with traditional artificial neural networks, training in SVMs is very robust due to their quadratic objective functions. It is useful to explore this new technology in the river flow modeling area, in the hope that it can overcome some of the problems of ANNs and may perform much better than traditional linear models. Both SVMs and ANNs can be represented as two-layer networks (where the weights are nonlinear in the first layer and linear in the second layer). However, while ANNs generally adapt all the parameters (using gradient or clustering-based approaches), SVMs choose the parameters of the first layer to be the training input vectors, because this minimizes the VC-dimension, as indicated in Figure 1.

[Figure 1: SVM structure. The input vector x = (x_1, ..., x_n) feeds kernel units K(x_1, x), ..., K(x_N, x) built on the support vectors x_1, ..., x_N; their outputs are combined with weights α_1, ..., α_N to give the decision rule y = Σ_{i=1}^{N} α_i K(x_i, x) + b.]

Mathematically, a basic function for the statistical learning process is

f(x) = Σ_{i=1}^{M} α_i φ_i(x) = w·φ(x)    (1)

where the output is a linearly weighted sum of M nonlinear basis functions, and the nonlinear transformation is carried by φ(·). The range of models represented by Equation 1 is extremely broad. The SVM is a special form of them, and its decision function is represented as

f(x) = Σ_{i=1}^{N} α_i K(x, x_i) + b    (2)

where K is the kernel function, α_i and b are parameters, N is the number of training data, x_i are the vectors used in the training process, and x is the independent vector. The parameters α_i and b are derived by maximizing the objective function.
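The kernel trick described above can be made concrete: for the homogeneous degree-2 polynomial kernel, K(u, v) = (u·v)^2 equals Φ(u)·Φ(v) for an explicit map Φ, yet training never needs Φ. The sketch below verifies this identity and trains on a precomputed Gram matrix of kernel values; the toy data is an illustrative assumption.

```python
# Kernel trick sketch: K(u, v) = (u.v)^2 is the dot product Phi(u).Phi(v)
# with Phi(u) = (u1^2, u2^2, sqrt(2)*u1*u2), so Phi is never computed.
import numpy as np
from sklearn.svm import SVC

def phi(u):                       # explicit feature map (for checking only)
    return np.array([u[0]**2, u[1]**2, np.sqrt(2) * u[0] * u[1]])

u, v = np.array([1.0, 2.0]), np.array([3.0, 0.5])
assert np.isclose((u @ v)**2, phi(u) @ phi(v))   # Mercer identity holds

# Training directly from the Gram matrix of kernel values:
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [2.5, 1.5]])
y = np.array([-1, -1, +1, +1])
G = (X @ X.T)**2                               # K(x_i, x_j) for all pairs
clf = SVC(kernel="precomputed").fit(G, y)

X_test = np.array([[2.0, 1.0]])
print(clf.predict((X_test @ X.T)**2))          # kernel values vs. training set
```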
In an SVM, all input data are organized as vectors (i.e., one-dimensional arrays), and some of these vectors are used directly in the modeling process (as demonstrated in Eq. 2). This is quite different from models like ANNs and linear TF models, which are global models: their parameters are derived from the training data set, and then only the derived parameters are used in future simulations, so the training data play no part in the prediction process. The SVM is quite different. It uses the training data for model calibration to estimate the model parameters, but it also keeps the most important part of the input vectors in its model. These vectors are called support vectors, and only a small number of the training vectors are chosen. The unique structure of the kernel functions used for the nonlinear transformation of input vectors enables the SVM to discard most training vectors, so that the resulting model is much smaller. The reduced set of support vectors also improves the model's generalization ability and decreases the computational load. Since SVM theory originally came from the machine learning community, this type of model is called a Support Vector Machine.

The SVM has a strong nonlinear ability, analogous to the nonlinear treatment of traditional linear models. As we know, it is possible to transform the input variables with certain nonlinear functions so that linear models can be used to model nonlinear processes (the generalized linear system framework). For example, an input vector x = (x_1, x_2) can be transformed into a higher-dimensional input vector z = (x_1, x_2, x_1^2, x_2^2, x_1 x_2), which can then be treated as a linear system. In a similar fashion, the SVM uses specific kernel functions which transform the input vector into an inner product of nonlinear functions in the model. The selection of a suitable kernel function for a specific problem is at present a very complicated process, and much more research is still needed.

A major problem in any model training is deciding on the complexity of the model's structure. More complicated models tend to do well in training but badly in prediction. For example, a common problem in ANN applications is overfitting; sometimes there are even fewer training data points than model weights.

[Figure 2: The influence of model complexity. As complexity grows from low to high, training-set error keeps falling while test-set error eventually rises; the low-complexity end has high bias and low variance, the high-complexity end low bias and high variance, and training should stop at the minimum of the test error.]

As indicated by Figure 2, choosing a suitable model structure that achieves the best test result is very important. In this respect the SVM has an advantage over ANNs in that it can automatically minimize the number of support vectors, improving its generalization ability.

In the modeling process, the rainfall data series (x_t, x_{t-1}, ...) and the flow data series (y_t, y_{t-1}, ...) are used to construct vectors for training and testing. At each time step t, y_{t+1} is the target value, and fixed moving windows over the rainfall and flow data are selected as input vectors. An input vector can contain a mixture of variables (e.g., rain, flow, temperature, date). At each computation step, we sequentially add the newly acquired data and remove the earliest, to predict the flow in the future. Before training, several key parameters have to be selected manually:
a) three parameters that control the SVM training: the cost of error C, the slackness tube width ε, and the kernel function;
b) window sizes for the rainfall and flow data;
c) scale factors for the rainfall and flow data.
In the process above, C is useful for controlling the smoothness of the function. Large C values penalize the errors, hence the resulting SVMs have a small number of support vectors. The slackness tube of width ε is a new concept (in the traditional least-squares method ε is always zero), and input data which fall within the tube are not penalized. Three popular kernel functions are tested: the d-th degree polynomial (only degree 2 is used in this project), the radial basis function, and the sigmoid. Various window sizes were tested (3 rain, 3 flow; 1 rain, 5 flow; 0 rain, 10 flow; 10 rain, 0 flow; 20 rain, 0 flow). Scale factors are used to transform the rainfall and flow data into a similar range; otherwise the data with high values (i.e., small units) would dominate the training process. Most of the work in this part is manual, and hence a tedious process, due to the huge number of combinations. A sketch of this windowed setup is given below.
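The following is a hedged sketch of that windowed regression setup: at each time step t the target is y_{t+1} and the inputs are fixed moving windows over the rainfall and flow series. The data, window sizes, scale factors, and parameter values here are illustrative assumptions, not the paper's actual settings.

```python
# Windowed epsilon-SVR sketch for the rainfall/flow setup described above.
# All series and parameter values are hypothetical stand-ins.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
rain = rng.random(200)                            # hypothetical rainfall series
flow = np.convolve(rain, [0.5, 0.3, 0.2])[:200]   # hypothetical flow series

W_RAIN, W_FLOW = 3, 3                    # window sizes (choice b)
SCALE_RAIN, SCALE_FLOW = 1.0, 1.0        # scale factors (choice c)

X, targets = [], []
for t in range(max(W_RAIN, W_FLOW), len(rain) - 1):
    X.append(np.concatenate([rain[t - W_RAIN + 1:t + 1] * SCALE_RAIN,
                             flow[t - W_FLOW + 1:t + 1] * SCALE_FLOW]))
    targets.append(flow[t + 1])          # target y_{t+1}

# The three training controls from (a): cost of error C, tube width epsilon,
# and the kernel (degree-2 polynomial, as in the calibration runs).
model = SVR(kernel="poly", degree=2, C=10.0, epsilon=0.1)
model.fit(np.array(X), np.array(targets))
print(model.predict(np.array(X[:3])))    # one-step-ahead flow predictions
```

In the model calibration stage, we found that the SVM can perform very well in many cases. With the training data used (Birdcreek), the polynomial kernel function performed much better than the radial basis and sigmoid kernel functions (see Figure 3). Comparison with the linear TF model clearly demonstrated the nonlinear effect of SVMs: the TF model's overestimation of small peaks was removed by the polynomial kernel functions.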
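For illustration, the sketch below generates Swiss roll data and fits an SVM on it. The BSVM kernel of [19] is not publicly specified, so a standard RBF kernel stands in purely to show the pipeline, and the binary labels are a synthetic assumption; the 3.9% figure reported next comes from the paper, not from this sketch.

```python
# Illustrative Swiss-roll experiment: generate the data and fit an SVM.
# RBF kernel and synthetic labels are stand-ins for the unpublished BSVM setup.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)
y = (t > np.median(t)).astype(int)       # synthetic binary labels along the roll

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("test misclassification: %.1f%%" % (100 * (1 - clf.score(X_te, y_te))))
```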

The distance between two samples is the geodesic distance along the surface between them. The BSVM method is applied and the average test misclassification error equals 3.9%, which shows that the BSVM method performs well in this case.

6 Conclusion
BSVM provides nonlinear function approximation by mapping input vectors into a high-dimensional feature space where a hyperplane is constructed to separate the classes in the data. Computationally intensive calculations in the feature space are avoided through the use of kernel functions. BSVM corresponds to a linear method in feature space, which makes it theoretically easier to analyze.

References:
[1] Altschul, S. F. et al., "Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs," Nucleic Acids Research, vol. 25, pp. 3389-3402, 1997.
[2] Ben-Hur, A., D. Horn, H. T. Siegelmann, and V. Vapnik, "Support Vector Clustering," Journal of Machine Learning Research, vol. 2, pp. 125-137, 2001.
[3] Boser, B. E., I. M. Guyon, and V. Vapnik, "A Training Algorithm for Optimal Margin Classifiers," Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, ACM Press, pp. 144-152, 1992.
[4] Joachims, T., "Making Large-Scale SVM Learning Practical," in Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, 1998.
[5] Jaakkola, T., M. Diekhans, and D. Haussler, "A Discriminative Framework for Detecting Remote Protein Homologies," Journal of Computational Biology, vol. 7, pp. 95-114, 2000.
[6] Burges, C., "Simplified Support Vector Decision Rules," International Conference on Machine Learning, 1996.
[7] Romdhani, S., P. Torr, B. Schölkopf, and A. Blake, "Computationally Efficient Face Detection," International Conference on Computer Vision, 2001.
[8] Liao, L. and W. S. Noble, "Combining Pairwise Sequence Similarity and Support Vector Machines for Remote Protein Homology Detection," Proc. 6th Annual International Conference on Computational Molecular Biology (RECOMB 2002), ACM, New York, pp. 225-232, 2002.
[9] Eddy, S. R., "Multiple Alignment Using Hidden Markov Models," Proc. 3rd International Conference on Intelligent Systems for Molecular Biology (ISMB 95), AAAI Press, pp. 114-120, 1995.
[10] Karplus, K., C. Barrett, and R. Hughey, "Hidden Markov Models for Detecting Remote Protein Homologies," Bioinformatics, vol. 14, pp. 846-856, 1998.
[11] Murzin, A. G. et al., "SCOP: A Structural Classification of Proteins Database for the Investigation of Sequences and Structures," Journal of Molecular Biology, vol. 247, pp. 536-540, 1995.
[12] Saigo, H., J.-P. Vert, T. Akutsu, and N. Ueda, "Protein Homology Detection Using String Alignment Kernels," manuscript, 2002.
[13] DeCoste, D. and D. Mazzoni, "Fast Query-Optimized Kernel Machine Classification via Incremental Approximate Nearest Support Vectors," International Conference on Machine Learning, 2003.
[14] Marchand, M. and J. Shawe-Taylor, "The Set Covering Machine," Journal of Machine Learning Research, vol. 3, pp. 723-746, 2002.
[15] Müller, K.-R., S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, "An Introduction to Kernel-Based Learning Algorithms," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 181-201, 2001.
[16] LeCun, Y., B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. J. Jackel, "Handwritten Digit Recognition with a Back-Propagation Network," Advances in Neural Information Processing Systems, 1990.
[17] Platt, J., "Fast Training of Support Vector Machines Using Sequential Minimal Optimization," in Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, 1999.
[18] Smith, T. F. and M. S. Waterman, "Identification of Common Molecular Subsequences," Journal of Molecular Biology, vol. 147, pp. 195-197, 1981.
[19] Wasniowski, R., "Improving the Performance of Support Vector Machines," RAW-98-104, June 1999.
[20] Wasniowski, R., "The Use of Support Vector Machines in Data Mining," RAW-99-99A, June 1998.
[21] Vapnik, V., The Nature of Statistical Learning Theory, 2nd ed., Springer-Verlag, New York, 1999.
[22] Zhang, L. and B. Zhang, "A Geometrical Representation of McCulloch-Pitts Neural Model and Its Applications," IEEE Transactions on Neural Networks, vol. 10, no. 4, pp. 291-295, 1999.