The Pseudo Gradient Search and a Penalty Technique Used in Classifications.
Janyl Jumadinova
Advisor: Zhenyuan Wang
Department of Mathematics, University of Nebraska at Omaha, Omaha, NE 68182, USA

Abstract

The aim of this work is to use the pseudo gradient search to solve classification problems. In most classifiers the goal is to reduce the misclassification rate, which is a discrete quantity. Since the pseudo gradient search is a local search, the objective function must be real valued for the search to apply to the classification problem; a penalty technique is used for this purpose.

1. Introduction

Classification is an optimization problem with the objective of minimizing the number of misclassified data. Given a set of training data of size l with n predictive attributes, each new record needs to be classified. Various classification methods have been proposed, and good results have been achieved using nonlinear integrals, such as the Choquet integral, as the aggregation tool [8]. The use of weighted Choquet integrals with respect to fuzzy measures in classification was first proposed by Xu et al. [8]. The Choquet integral is used to project the data onto an optimal line, which makes the classification one-dimensional. Since the projection is generally nonlinear, the classification is also nonlinear.
Yan et al. proposed nonlinear classification methods using linear programming and signed fuzzy measures [10] to account for linearly inseparable data. Other classification methods based on statistics and machine learning have also been proposed, such as decision tree [2], [3] and support vector machine [9] algorithms.

Classification plays an important part in fields such as medicine and manufacturing. For example, a disease must be correctly diagnosed before proper treatment can be given. In addition to making classification faster and cheaper, automating classification jobs can help eliminate human error.

The gradient search algorithm can be used to solve nonlinear optimization problems, such as classification. It uses the partial derivatives of the objective function to pick the best direction for the search. When the objective function is not differentiable, the pseudo gradient search can be applied instead, with differences used in place of derivatives to obtain the best direction.

The paper is organized as follows. Section 2 gives background information on the classification problem. Section 3 discusses the pseudo gradient search and the penalty technique. Section 4 presents the pseudo gradient search algorithm for the classification problem. Section 5 describes some testing examples.

2. Classification

The goal of classification is to build a model of the classifying attribute based on the predictive attributes; we can then use this model to determine which class an observation belongs to. This paper uses the idea of the classification method based on nonlinear integrals discussed in [8]: project the points of the feature space onto the real axis through a nonlinear integral, and then classify these points optimally according to a certain criterion. Each point of the feature space becomes a value of the virtual variable ŷ_j, j = 1, ..., l, so each classification boundary is just a point on the real axis.

Next, a few mathematical concepts will be discussed. Let x_1, x_2, ..., x_n be the predictive attributes; then X = {x_1, x_2, ..., x_n} is the feature space. Let P(X) be the power set of X, let (X, P(X)) be a measurable space, and let µ : P(X) → [0, ∞) be a fuzzy measure satisfying the following
conditions:

1) µ(∅) = 0 (vanishing at the empty set);
2) µ(A) ≤ µ(B) if A ⊆ B, A, B ∈ P(X) (monotonicity).

In general µ is nonadditive, and it is called regular if µ(X) = 1. Its nonadditivity represents the interaction among the predictive attributes towards a certain objective attribute. An observation of the predictive attributes can be defined as a function f : X → (−∞, ∞); the j-th observation of attribute x_i is then f_ji = f_j(x_i), i = 1, 2, ..., n and j = 1, 2, ..., l.

The Choquet integral of a nonnegative function f is defined as

    (c)∫ f dµ = ∫_0^∞ µ(F_α) dα,

where F_α = {x | f(x) ≥ α}, for α ∈ [0, ∞), is a level set of f.

To calculate the Choquet integral, the following procedure will be used:

    (c)∫ f dµ = Σ_{j=1}^{2^n − 1} z_j µ_j,

where, for j = 1, 2, ..., 2^n − 1,

    z_j = min_{i : frc(j/2^i) ∈ [1/2, 1)} f(x_i) − max_{i : frc(j/2^i) ∈ [0, 1/2)} f(x_i)   if this difference is > 0 or j = 2^n − 1,
    z_j = 0   otherwise,

where frc(j/2^i) is the fractional part of j/2^i, with the convention that the maximum over the empty set is zero. If we express j in binary form as j_n j_{n−1} ... j_1, then {i | frc(j/2^i) ∈ [1/2, 1)} = {i | j_i = 1} and {i | frc(j/2^i) ∈ [0, 1/2)} = {i | j_i = 0}.
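To make the binary indexing concrete, here is a minimal Python sketch of the formula above (an illustration only, not the paper's code, which is written in Java). The measure µ is stored as a dictionary mapping the binary code j of each subset A = {x_i | j_i = 1} to µ(A).

```python
def choquet_integral(f, mu):
    """Discrete Choquet integral of f = (f(x_1), ..., f(x_n)) with respect
    to a fuzzy measure mu, where mu[j] = mu(A) for the subset A coded by
    the binary digits of j (bit i-1 of j is 1 iff x_i is in A)."""
    n = len(f)
    total = 0.0
    for j in range(1, 2 ** n):
        ones = [f[i] for i in range(n) if (j >> i) & 1]       # {i : j_i = 1}
        zeros = [f[i] for i in range(n) if not (j >> i) & 1]  # {i : j_i = 0}
        # by convention, the maximum over the empty set is zero
        z = min(ones) - (max(zeros) if zeros else 0.0)
        if z > 0 or j == 2 ** n - 1:
            total += z * mu[j]
    return total

# Example with n = 2 and a nonadditive measure:
# mu({x1}) = 0.5, mu({x2}) = 0.4, mu({x1, x2}) = 1.0
mu = {0b01: 0.5, 0b10: 0.4, 0b11: 1.0}
print(choquet_integral([3.0, 1.0], mu))  # prints 2.0
```

In this example z_1 = 3 − 1 = 2 contributes 2µ({x_1}) and z_3 = min(3, 1) = 1 contributes 1µ(X), giving 2.0, in agreement with the level-set definition of the integral.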
3. Pseudo Gradient Search and Penalty Technique

One way of solving the classification problem is to use a gradient search: start at a point and move in the direction that gives the largest increase in the value of the objective function f, that is, the direction in which the directional derivative is largest. The gradient of the objective function is the vector of its first partial derivatives, and its norm is the magnitude of the gradient vector. When the objective function is not differentiable, the traditional gradient search fails. In such a case, we can replace the gradient with a pseudo gradient to determine the best search direction. The advantages of the pseudo gradient search are its fast convergence and the fact that the objective function does not have to be differentiable. Its disadvantage, as with any local search, is that it may get trapped in a local minimum or maximum and fail to find the global one.

The goal of the pseudo gradient search in this paper is to reduce the number of misclassified observations, or ideally to obtain none. The objective function in classification problems is usually the misclassification rate, which is discrete; to use the pseudo gradient search, however, the objective function has to be real valued. A penalty technique can be applied to the classification problem to make the objective function real valued. Penalty techniques are generally used to turn a constrained problem into an unconstrained one by penalizing infeasible solutions. There is no general guideline on how to design penalty functions; the design is usually problem-dependent. For the classification problem, it is convenient to express the penalty function in terms of the sum of the distances of the misclassified points from the boundary.

The pseudo gradient search works as follows. From the initial value we take a small step in the positive direction and a step of the same length in the negative direction, and we calculate the value of the penalized objective function for each step. The step with the smaller value of the penalized objective function determines the search direction. Once the best direction is determined, the step length is iteratively doubled in that direction. When the value of the penalized objective function starts increasing between steps, the direction is reversed and the step length is iteratively halved until the value of the penalized objective function increases again between iterations.
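As an illustration, the sketch below (our own minimal Python paraphrase, not the paper's Java implementation; the names penalty, obj, y_hat, labels, and boundary are ours) shows the two ingredients in one dimension: a penalty of the form used in step 8 of the algorithm in Section 4, and the probe/double/halve search on a single parameter.

```python
import numpy as np

def penalty(y_hat, labels, boundary):
    """Penalized objective: misclassification rate times the summed
    distance of the misclassified projections from the boundary."""
    wrong = (y_hat > boundary).astype(int) != labels
    m, l = wrong.sum(), len(labels)
    return (m / l) * np.abs(boundary - y_hat[wrong]).sum()

def pseudo_gradient_step(obj, x, delta=1e-6):
    """One pseudo gradient move on a single parameter x: probe +/- delta,
    double the step while the penalized objective decreases, then reverse
    and halve after overshooting."""
    p0 = obj(x)
    p_plus, p_minus = obj(x + delta), obj(x - delta)
    if min(p_plus, p_minus) >= p0:
        return x                                # no improving direction
    step = delta if p_plus < p_minus else -delta
    best_x, best_p = x + step, min(p_plus, p_minus)
    while True:                                 # doubling phase
        step *= 2
        p = obj(best_x + step)
        if p >= best_p:
            break                               # doubling went too far
        best_x, best_p = best_x + step, p
    step = -step                                # reverse the direction
    while abs(step) > delta:                    # halving phase
        step /= 2
        p = obj(best_x + step)
        if p < best_p:
            best_x, best_p = best_x + step, p
    return best_x
```

In the full algorithm of Section 4 this probe is applied to each coordinate of the vectors a and b, and the per-coordinate changes are accumulated before the doubling phase.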
4. Pseudo Gradient Search Algorithm for the Classification Problem

Summary of the variables:

n: the number of attributes
l: the number of observations
m: the number of misclassified observations
δ: a small number, 10^−6 in our case, used as the step length when testing for the best direction
ŷ_j: the virtual variable, used to project points onto the real line
q: the vector of the objective attribute
a and b: vectors used in the multiregression
b*: the best boundary, reducing misclassifications
p: the value of the penalty function
µ_k: denotes µ(A), where A = ∪_{i: k_i = 1} {x_i} and k has the binary expression k = k_n k_{n−1} ... k_1
t: the running time in seconds

Algorithm:

1. Input: the number of attributes, n; the number of observations, l; and the data.

2. Initialize the vector q, where q_j = y_j, j = 1, ..., l, and y_j is the value of the objective attribute. Initialize the vectors a and b by picking 2n standard uniform random numbers: the first n numbers are for vector a and the second n numbers are for vector b. These numbers form the vector g.

3. Calculate a_i and b_i as follows:

    a_i = (g_i − min_{1≤k≤n} g_k) / ((1 − g_i)(1 − min_{1≤k≤n} g_k)),

    b_i = (2g_{n+i} − 1) / max_{1≤k≤n} |2g_{n+k} − 1|,

for i = 1, ..., n, where a = (a_1, a_2, ..., a_n) and b = (b_1, b_2, ..., b_n) are n-dimensional vectors that are used to balance the various phases and
scales of the predictive attributes; they should satisfy a_i ≥ 0 for i = 1, 2, ..., n with min_{1≤i≤n} a_i = 0, and −1 ≤ b_i ≤ 1 for i = 1, 2, ..., n with max_{1≤i≤n} |b_i| = 1.

4. Construct the matrix Z, of dimensions l × 2^n, as follows: z_{j0} = 1 and

    z_{jk} = min_{i : k_i = 1} (a_i + b_i f_{ji}) − max_{i : k_i = 0} (a_i + b_i f_{ji})   if this difference is > 0 or k = 2^n − 1,
    z_{jk} = 0   otherwise,

where j = 1, ..., l and k = 0, 1, ..., 2^n − 1, with k_n k_{n−1} ... k_1 the binary expression of k.

5. Apply the QR decomposition theorem to find the least squares solution of the system of linear equations Zv = q, where the unknown variables c, µ_1, µ_2, ..., µ_{2^n−1} are the elements of v.

6. Calculate ŷ_j = c + (c)∫(a + b f_j) dµ = c + Σ_{k=1}^{2^n−1} z_{jk} µ_k, for j = 1, ..., l, where ŷ_j is the current estimate of the virtual variable of the objective attribute for the j-th observation.

7. Find the best boundary, b*. Once the estimated ŷ values have been computed, we need to classify them. This is done by searching for the boundary that minimizes the number of misclassified points. We simplify this case by allowing only two values for the classifying attribute. For local search algorithms it is important to pick a good starting point; therefore, the initial boundary here is obtained from the ratio of the observations in class 1 and class 2. Then, for each candidate boundary, we classify the computed ŷ_j, and out of all candidate boundaries we pick the one that minimizes the number of misclassified points.

8. Calculate the initial penalty as follows:

    p_0 = (m/l) Σ_{i=1}^{m} |b* − ŷ_i|,

where the sum runs over the m misclassified points and b* is the best boundary.

9. a. Take a step in the positive direction by adding δ to a_i, for i = 1, ..., n, where δ = 10^−6.
   b. Repeat steps 3-7.
   c. Take a step in the negative direction by subtracting 2δ from a_i.
   d. Repeat steps 3-7.
   e. Compare the penalties obtained by stepping in the positive direction, p_δ+, and in the negative direction, p_δ−, with the initial penalty. If both are greater than or equal to p_0, then no step should be taken. If p_δ+ < p_δ−, then (p_0 − p_δ+) → Δa_i; otherwise −(p_0 − p_δ−) → Δa_i, where Δa_i monitors the change in each dimension of the vector a.

10. Repeat the previous step for the vector b; Δb will keep track of the changes in each dimension of vector b.

11. Reset a and b to their original values.

12. Start doubling:
    a. Double the change vectors: 2Δa_i → Δa_i for all i = 1, ..., n, and 2Δb_i → Δb_i for all i = n + 1, ..., 2n.
    b. If a_i < 0, set a_i = 0; the values of the vector a must be nonnegative.
    c. If b_i > 1, set b_i = 1; if b_i < −1, set b_i = −1, since the values of b must be between −1 and 1.
    d. Let p_l be the latest penalty; calculate p_l as in step 8.
    e. Repeat this step until p_l > p_0.

13. Reverse the directions of the change vectors (the doubling went too far) by making them negative.

14. Start halving:
    a. (1/2)Δa_i → Δa_i for all i = 1, ..., n, and (1/2)Δb_i → Δb_i for all i = n + 1, ..., 2n.
    b. If a_i < 0, set a_i = 0; the values of the vector a must be nonnegative.
    c. If b_i > 1, set b_i = 1; if b_i < −1, set b_i = −1, since the values of b must be between −1 and 1.
    d. Calculate the latest penalty p_l as in step 8.
    e. Repeat this step until p_l > p_0.
15. Reverse the directions of the change vectors (the halving went too far) by making them negative.

16. Repeat steps 3-7 to obtain new values for c, µ, ŷ_j, and p.

17. Let Max(Δa, Δb) = max_{1≤i≤n} {|Δa_i|, |Δb_i|}. We need Max(Δa, Δb) to check whether the largest change in any dimension of the change vectors is greater than δ.

18. If p_0 > 0, p_l > 0, and Max(Δa, Δb) ≥ δ, then go on to the next step; otherwise go to the last step.

19. If p_l > p_0, go on to the next step; otherwise skip the next step.

20. The change from a to a + â and from b to b + b̂ has to be iteratively reduced until the penalty is smaller than the initial penalty.
    a. a_i − â_i → a_i and b_i − b̂_i → b_i for all i = 1, ..., n.
    b. If |â_i| > δ, then â_i/2 → â_i and a_i + â_i → a_i for all i = 1, ..., n.
    c. If |b̂_i| > δ, then b̂_i/2 → b̂_i and b_i + b̂_i → b_i for all i = 1, ..., n.
    d. Repeat steps 5, 6, 7, and 9.
    e. Let M = max_{1≤i≤n} {|â_i|, |b̂_i|}. If p > p_0 and M > δ, go to step 20b; otherwise continue with the next step.

21. Output p_l, m, and the running time t.
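Before turning to the experiments, the sketch below condenses steps 4 through 8 into Python (a paraphrase with hypothetical names F, labels, a, b, and boundary; labels plays the role of q, and the paper's actual implementation is in Java). It builds the matrix Z from the current a and b, solves Zv = q by least squares, projects the observations onto the virtual variable, and evaluates the penalty against a given boundary.

```python
import numpy as np

def fit_and_penalize(F, labels, a, b, boundary):
    """Steps 4-8 in outline. F is the l x n matrix of observations f_ji;
    labels holds the two-class objective attribute (0 or 1)."""
    l, n = F.shape
    G = a + b * F                              # a_i + b_i * f_ji, shape l x n
    Z = np.ones((l, 2 ** n))                   # column 0 gives z_j0 = 1
    for k in range(1, 2 ** n):
        ones = [i for i in range(n) if (k >> i) & 1]       # {i : k_i = 1}
        zeros = [i for i in range(n) if not (k >> i) & 1]  # {i : k_i = 0}
        z = G[:, ones].min(axis=1)
        if zeros:
            z = z - G[:, zeros].max(axis=1)
        Z[:, k] = np.where((z > 0) | (k == 2 ** n - 1), z, 0.0)
    # Step 5: least squares solution of Zv = q (the paper uses the QR
    # decomposition; numpy's lstsq solves the same problem via SVD)
    v, *_ = np.linalg.lstsq(Z, labels.astype(float), rcond=None)
    y_hat = Z @ v                              # step 6: c + sum_k z_jk * mu_k
    # Steps 7-8: classify against the boundary and compute the penalty
    wrong = (y_hat > boundary).astype(int) != labels
    m = wrong.sum()
    return y_hat, (m / l) * np.abs(boundary - y_hat[wrong]).sum()
```

Steps 9 through 20 then perturb a and b coordinate by coordinate, re-running this routine through the probing, doubling, and halving phases until the stopping test of step 18 sends control to the output step.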
5. Simulation Results

The algorithm has been coded in Java and run on a Pentium M 1.73 GHz computer. We ran the algorithm on the whole data set for each database (reclassification). The data sets used in our simulations are listed in Table 1 and described in more detail below.

Table 1: Databases used in simulations. For each data set (data from [8], Leptograpsus crabs, synthetic data, Pima, Heart, and Credit Card) the table lists the number of predictive attributes and the size of the data set.

Classification results are summarized in Table 2. The best results are recorded over 10 runs on each data set, unless perfect classification was obtained earlier.

Table 2: Classification results. For each data set the table lists the latest penalty, the classification accuracy (%), and the running time in seconds.

Data from [8]: First, the artificial data presented in Tables IV and V of [8] were used for comparison purposes. We obtain nearly perfect classification.

Leptograpsus crabs data: This is data on the morphology of rock crabs of the genus Leptograpsus [6], [12]. There are 100 specimens of each sex, evenly distributed between the two color forms (the two classes): the blue form and the orange form. The attributes are:

1. FL - frontal lip of carapace (mm)
2. RW - rear width of carapace (mm)
3. CL - length along the midline of carapace (mm)
4. CW - maximum width of carapace (mm)
5. BD - body depth (mm)
After 5 runs, we are able to obtain perfect classification.

Synthetic data: The synthetic data are from Ripley [6] and are available at [12]. Each record has two real-valued attributes, the x and y coordinates, and a class label, 0 or 1. The data set can be plotted in two-dimensional space, with attribute 1 and attribute 2 on the corresponding axes. The 1250 data points are split evenly between the two classes. Red points (crosses) represent class 1 and blue points (circles) represent class 2.

Figure 1: The synthetic data set (attribute 1 vs. attribute 2).

Figure 2 shows the results of classification. For this data set, the best classification accuracy we get is around 93%; that is, 88 out of the 1250 points are misclassified.
Figure 2: Synthetic data classification (attribute 1 vs. attribute 2).

The bold lines in Figure 2 approximately show the classification obtained by the algorithm. Figure 3 shows the progress made by the algorithm over 10 runs.
Figure 3: Classification accuracy (%) over 10 runs.

Pima Indians diabetes data: The subjects were females at least 21 years old of Pima Indian heritage living near Phoenix, Arizona, tested for diabetes according to World Health Organization criteria. The data were collected by the US National Institute of Diabetes and Digestive and Kidney Diseases (Smith et al., 1988). Class 1 means the patient tested positive for diabetes, and class 0 negative. The attributes are:

1. npreg - number of pregnancies
2. glu - plasma glucose concentration in an oral glucose tolerance test
3. bp - diastolic blood pressure (mm Hg)
4. skin - triceps skin fold thickness (mm)
5. ins - serum insulin (micro U/ml)
6. bmi - body mass index (weight in kg/(height in m)^2)
7. ped - diabetes pedigree function
8. age - age in years
More information about this data set can be found in [11]. Our algorithm compares favorably with the algorithms in [5] and [2].

Heart data: This data set is from the well-known StatLog project [5] and is available at [11]. The database contains 13 attributes:

1. age
2. sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholesterol in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values 0, 1, 2)
8. maximum heart rate achieved
9. exercise induced angina
10. oldpeak = ST depression induced by exercise relative to rest
11. the slope of the peak exercise ST segment
12. number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect

The classes are absence or presence of heart disease. Our algorithm performs better than some other proposed algorithms, such as genetic programming and decision tree algorithms [1], [2], [3].

German credit card data: This is another data set taken from the StatLog project [5] and available at [11]. The classes are good and bad credit. We used the data set with numerical attributes; it has been edited, and several indicator variables added, by Strathclyde University to make it suitable for algorithms that take in numerical values. The original attributes are:

1. Status of the existing checking account
2. Duration in months
3. Credit history
4. Purpose
5. Credit amount
6. Savings account/bonds
7. Length of present employment
8. Installment rate as a percentage of disposable income
9. Personal status and sex
10. Other debtors/guarantors
11. Number of years living in present residence
12. Property
13. Age
14. Other installment plans
15. Housing
16. Number of existing credits at this bank
17. Job
18. Number of people liable to provide maintenance for
19. Telephone
20. Foreign worker

The same data set was used in [5]. Our algorithm does slightly better than some of the existing algorithms [1], [5] on this database.

6. Conclusion

Our algorithm provides a fast way to classify data using a local, pseudo gradient search, after converting the objective function into a real-valued function with the help of a penalty. We notice premature convergence on some runs; that is, the algorithm converges before the misclassification rate is minimized. Overall, the algorithm produces good results: the misclassification rate is low and the convergence is fast.

7. References

[1] J. Eggermont, J. Kok, and W. Kosters, Genetic programming for data classification: partitioning the search space, Proceedings of the 2004 ACM Symposium on Applied Computing, 2004.
[2] J. R. Quinlan, Induction of decision trees, Machine Learning, Vol. 1, pp. 81-106, 1986.
[3] M. Last and O. Maimon, A compact and accurate model for classification, IEEE Transactions on Knowledge and Data Engineering, Vol. 16, 2004.
[4] M. Liu and Z. Wang, Classification using generalized Choquet integral projections, Proc. IFSA.
[5] D. Michie, D. Spiegelhalter, and C. Taylor, Machine Learning, Neural and Statistical Classification, Ellis Horwood, 1994.
[6] B. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, 1996.
[7] M. Spilde and Z. Wang, Solving nonlinear optimization problems based on Choquet integrals by using a soft computing technique, Proc. IFSA.
[8] K. Xu, Z. Wang, P. A. Heng, and K. S. Leung, Classification by nonlinear integral projections, IEEE Transactions on Fuzzy Systems, Vol. 11, No. 2, pp. 187-201, 2003.
[9] V. Vapnik, Statistical Learning Theory, Wiley, 1998.
[10] N. Yan, Z. Wang, Y. Shi, and Z. Chen, Nonlinear classification by linear programming with signed fuzzy measures, Proc. FUZZ-IEEE.
[11] mlearn/mlsummary.html.
[12]