Homework 3: Solutions


Statistics 413, Fall 2017

Data Analysis

Note: All data analysis results are provided by Yixin Chen.

Data Analysis in R

1 Pipeline

(a) Since the purpose is to find the best multi-class classifier, I use the same analysis pipeline as in Assignment 2. I randomly split the dataset into three parts, which are used separately in the three phases of the pipeline: 60% of the data is used in the train/validation phase, where I perform K-fold cross-validation to choose the optimal tuning parameters for all tunable models; 20% of the data goes to the query phase, in which I use each model with its chosen tuning parameters (where applicable) to classify and then conclude which model gives the most accurate classification; the remaining 20% of the data is treated as test data, on which I make the final predictions to assess the chosen classifier.

1.5 Data Preprocessing

(a) Before going into multi-class classification, I need to convert all labels into numbers/factors. Therefore, I used 0-6 as representations of brickface, sky, foliage, cement, window, path and grass. (The reason for starting from 0 is simply that R's packages start classification from 0 as a norm.) Besides the random splitting of the data for the analysis pipeline, I also manipulated one feature column: because one feature has the same value for all observations, I deleted that feature before scaling and centering the data matrix. Below is my code for data preprocessing.

## import the data
dat <- read.csv("data.csv", header = F)
dat2 <- read.csv("test.csv", header = F)
data0 <- as.matrix(rbind(dat, dat2))
data00 <- data0[, -4]            # drop the constant feature column
data00[, 2:19] = scale(data00[, 2:19], center = TRUE, scale = TRUE)

## split data
set.seed(100)
idx <- sample(1:nrow(data00))
idx1 <- idx[1:floor(0.6 * nrow(data00))]
idx2 <- idx[(floor(0.6 * nrow(data00)) + 1):floor(0.8 * nrow(data00))]
idx3 <- idx[(floor(0.8 * nrow(data00)) + 1):nrow(data00)]

train <- data00[idx1, ]          # 60% training data
xtrain <- train[, 2:19]
ytrain <- train[, 1]

query <- data00[idx2, ]          # 20% query data
xquery <- query[, 2:19]
yquery <- query[, 1]

test <- data00[idx3, ]           # 20% test data
xtest <- test[, 2:19]
ytest <- test[, 1]
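All of the classifiers below report a misclassification error through a function ce(actual, predicted). The writeup does not show where this helper comes from; it behaves like ce() from the Metrics package (the mean misclassification rate). A minimal plain-R equivalent, as a sketch in case that package is not available:

# fraction of observations whose predicted label differs from the true label
ce <- function(actual, predicted) {
  mean(as.character(actual) != as.character(predicted))
}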

2 Compare and Contrast

(a) Naive Bayes

## Naive Bayes
# define training control
library(caret)        # train(), trainControl()
library(naivebayes)
train_control <- trainControl(method = "cv", number = 5)
# set up the tuning grid
grid <- expand.grid(.laplace = seq(0, 10, 0.1), .usekernel = c(FALSE, TRUE), .adjust = seq(0, 20, 1))
model_nb = train(as.factor(V1) ~ ., data = as.data.frame(train), trControl = train_control,
                 method = "naive_bayes", tuneGrid = grid)
prob_nb = predict(model_nb$finalModel, newdata = as.data.frame(xquery), type = "prob")
pred_nb = max.col(prob_nb, "first") - 1
error_nb = ce(as.factor(yquery), pred_nb)

In order to tune the parameters of the Naive Bayes classifier, I used the built-in R function "train", which performs 5-fold cross-validation for NBC over three tuning parameters (Laplace smoothing, distribution type, and bandwidth adjustment) and picks the model with the best tuning parameters. For my training dataset, the best tuning parameters were [Laplace smoothing = 0, distribution = use kernel, bandwidth adjustment = 0]. I then used this tuned model to classify the query data. The misclassification error is [value missing].

(b) LDA

library(MASS)         # lda()
model_lda = lda(as.factor(V1) ~ ., data = as.data.frame(train))
result_lda = predict(model_lda, newdata = as.data.frame(xquery))
err_lda = ce(yquery, result_lda$class)

Since LDA has essentially no tuning parameters, I simply fit the built-in LDA function in R on the training data and applied it to the query data. I used the same query data as before for consistency of comparison. Its misclassification error is [value missing].

(c) Non-regularized Multinomial Regression

## Multinomial Regression
# Non-regularized MR
library(nnet)
model_mr0 = multinom(as.factor(V1) ~ ., data = as.data.frame(train), MaxNWts = 3000, maxit = 400)
prob_mr0 = predict(model_mr0, newdata = as.data.frame(xquery), type = "probs")
pred_mr0 = max.col(prob_mr0, "first") - 1
error_mr0 = ce(as.factor(yquery), pred_mr0)

As implied by its name, non-regularized multinomial regression has no tuning parameters. Thus, I used the function "multinom" to fit the model to the training data and then used the fitted model to classify the query data. Its misclassification error is [value missing].
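Beyond the single misclassification-error number, it can be useful to see which classes each model confuses; this becomes relevant in the Reflection section below. A short sketch (not part of the original writeup), using the Naive Bayes query predictions from above:

# cross-tabulate true query labels against Naive Bayes predictions
conf_nb <- table(true = yquery, predicted = pred_nb)
print(conf_nb)
mean(yquery != pred_nb)   # overall query misclassification error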

(d) Regularized MR

# Regularized MR
library(glmnet)
set.seed(100)
error_mr = matrix(0, 11, 1)
for (a in seq(0, 10, 1)) {
  model_mr = cv.glmnet(x = xtrain, y = ytrain, type.measure = "class", alpha = a/10,
                       nfolds = 5, family = "multinomial")
  pred = predict(model_mr, newx = xquery, type = "class")
  error_mr[a + 1] = ce(yquery, pred)
}
idx_opt = which.min(error_mr)
alpha_opt = (idx_opt - 1)/10
model_mr_opt = cv.glmnet(x = xtrain, y = ytrain, type.measure = "class", alpha = alpha_opt,
                         nfolds = 5, family = "multinomial")
pred_mr = predict(model_mr_opt, newx = xquery, type = "class")
error_mr_opt = ce(yquery, pred_mr)

Since there is no guarantee which kind of penalty gives better classification performance, I decided to try 11 alpha values from 0 to 1. For each trial alpha, I found the model with the optimal lambda via the "cv.glmnet" function and then used that model as the classifier for the query data. In the end, I had 11 query errors corresponding to the 11 alpha values.

Table 1: Regularized MR query misclassification errors
Alpha: 0, 0.1, 0.2, ..., 1.0
Error: [values missing]

We can see that when alpha = 1, the model has the best prediction performance. Therefore, I use the Lasso (L1) penalty for regularized multinomial regression. I then retrained the L1-penalized MR model with alpha = 1 and got a misclassification error of [value missing] on the query data.

(e) Linear SVM

As for SVMs in general, I used the one-against-all method as the extension to the multi-class classification problem, since one-against-all requires less computation time and is more robustly implemented across packages.

# Linear SVM
library(LiblineaR)
# one-against-all + CV contained
train_control <- trainControl(method = "cv", number = 5)
# set up the tuning grid
grid_lsvm_ovr <- expand.grid(.cost = seq(1, 30, 0.1), .Loss = "L2")
model_lsvm_ovr2 = train(as.factor(V1) ~ ., data = as.data.frame(train), trControl = train_control,
                        method = "svmLinear3", tuneGrid = grid_lsvm_ovr)
pred_lsvm_ovr = predict(model_lsvm_ovr2$finalModel, newx = xquery, type = "raw")
error_lsvm_ovr = ce(as.factor(yquery), pred_lsvm_ovr$predictions)

Again I used the "train" function in R to tune the cost parameter of the linear SVM from 1 to 30 with a step size of 0.1. The resulting best cost is [value missing]. I then used this final model to classify the query dataset and got a misclassification error of [value missing].
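The per-alpha query errors summarized in Table 1 can be collected directly from the regularized MR loop above; a small sketch (assuming the objects defined in that code) is:

# pair each alpha value with its query misclassification error (the contents of Table 1)
alpha_grid <- seq(0, 1, by = 0.1)
tab1 <- data.frame(alpha = alpha_grid, query_error = as.vector(error_mr))
print(tab1)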

(f) Polynomial SVM

# Polynomial SVM
library(kernlab)
# one-against-all + CV contained
train_control <- trainControl(method = "cv", number = 5)
# set up the tuning grid
grid_psvm_ovr <- expand.grid(.degree = seq(2, 10, 1), .scale = seq(1, 10, 1), .C = seq(1, 20, 1))
model_psvm_ovr = train(as.factor(V1) ~ ., data = as.data.frame(train), trControl = train_control,
                       method = "svmPoly", tuneGrid = grid_psvm_ovr)
pred_psvm_ovr = predict(model_psvm_ovr$finalModel, newdata = xquery, type = "response")
error_psvm_ovr = ce(as.factor(yquery), pred_psvm_ovr)

Similarly, I switched the "train" method to the one for the polynomial SVM and constructed appropriate tuning grids for the polynomial degree, scale and cost. As a result of the 5-fold CV, the best tuned final model has degree = 2, scale = 2 and cost = 1. Using this model to predict the labels of the query data, I got a misclassification error of [value missing].

(g) Radial SVM

# Radial SVM
library(kernlab)
# one-against-all + CV contained
train_control <- trainControl(method = "cv", number = 5)
# set up the tuning grid
grid_rsvm_ovr <- expand.grid(.sigma = seq(1, 10, 1), .C = seq(1, 20, 1))
model_rsvm_ovr = train(as.factor(V1) ~ ., data = as.data.frame(train), trControl = train_control,
                       method = "svmRadial", tuneGrid = grid_rsvm_ovr)
pred_rsvm_ovr = predict(model_rsvm_ovr$finalModel, newdata = xquery, type = "response")
error_rsvm_ovr = ce(as.factor(yquery), pred_rsvm_ovr)

Again, with tuning parameters sigma and cost, I used the "train" function and found that the best tuning parameters are sigma = 10 and cost = 4. The resulting query misclassification error is [value missing].

(h) Testing

As we can see, the polynomial SVM with the best tuning parameters yields the best performance in classifying the query data.

Table 2: Query misclassification errors for the best tuned models above
Model: NB, LDA, Non-regularized MR, L1 MR, Linear SVM, Poly SVM, Radial SVM
Error: [values missing]

Therefore, I continue with this model to classify the test dataset.

## Test
grid_test = model_psvm_ovr$bestTune
old_data = rbind(train, query)
model_test = train(as.factor(V1) ~ ., data = as.data.frame(old_data), trControl = train_control,
                   method = "svmPoly", tuneGrid = grid_test)
pred_test = predict(model_test$finalModel, newdata = xtest, type = "response")
error_test = ce(as.factor(ytest), pred_test)

After combining the training and query data, the retrained polynomial SVM did an even better job of classifying the test data.
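The contents of Table 2 and the selected tuning parameters can be pulled together programmatically; a short sketch (assuming the error objects and fitted models defined above):

# collect the query errors of all tuned models (the contents of Table 2)
tab2 <- data.frame(
  model = c("NB", "LDA", "Non-regularized MR", "L1 MR", "Linear SVM", "Poly SVM", "Radial SVM"),
  query_error = c(error_nb, err_lda, error_mr0, error_mr_opt, error_lsvm_ovr,
                  error_psvm_ovr, error_rsvm_ovr)
)
print(tab2)
print(model_psvm_ovr$bestTune)   # chosen degree / scale / C for the polynomial SVM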

As a final model assessment, I would say that the polynomial SVM with degree = 2, scale = 2 and cost = 1 is our best classifier for this multi-class classification task, with a very low misclassification error of [value missing] on the test data.

3 Visualization

(a) PCA

I used PCA and plotted PC2 vs. PC1 as shown below. The first graph is for the complete dataset; the second is for the training dataset only, and the last one is for the query dataset. In the training and query datasets, we can see that the data are quite separable except for some slight mixing between classes 2, 3 and 4 (which correspond to foliage, cement and window). The query dataset is sparser and also shows a different orientation and shape for each class. Therefore, it is quite reasonable that my polynomial SVM with degree 2 works best (only a small amount of curvature is needed). The detailed interpretation is given in the following section, Reflection.

[Figures: PC2 vs. PC1 scatter plots for the full, training and query datasets; not reproduced in this transcription]
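The code used to produce these PCA plots is not included in the transcription. A minimal sketch of how such plots can be generated (it reuses the objects defined in the preprocessing section; the plotting details are assumptions, not the original code):

# PCA on the (already scaled) feature columns of the full dataset
pca_all <- prcomp(data00[, 2:19])
cols <- data00[, 1] + 1                          # one color per class label 0-6
plot(pca_all$x[, 1], pca_all$x[, 2], col = cols, pch = 19, cex = 0.5,
     xlab = "PC1", ylab = "PC2", main = "PCA: full dataset")
legend("topright", legend = 0:6, col = 1:7, pch = 19, cex = 0.7)

# project the training and query subsets onto the same principal components
for (nm in c("train", "query")) {
  sub <- get(nm)
  pcs <- predict(pca_all, newdata = sub[, 2:19])
  plot(pcs[, 1], pcs[, 2], col = sub[, 1] + 1, pch = 19, cex = 0.5,
       xlab = "PC1", ylab = "PC2", main = paste("PCA:", nm, "data"))
}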

4 Reflection

(a) As discussed above, the degree-2 polynomial SVM makes the best classifier in this case. The reason can be seen in the plots above: data with labels 1, 5 and 6 are easily separated from the rest of the data (recall that we are using the one-vs-rest method) by a quadratic decision boundary, and although the data with labels 2, 3 and 4 are mixed to some extent, they cannot be separated by linear hyperplanes but can still be separated by a soft-margin polynomial SVM. As for the radial (non-linear) SVM, each class of data, especially 2, 3 and 4, does not have a nicely rounded shape; therefore, although a radial SVM can separate the data, it tends to overfit the training data and consequently does not classify the query data as well. As a result, the degree-2 polynomial SVM was the best classifier. As predicted by the PCA plots above, all labels other than 2, 3 and 4 have zero misclassification error.
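The claim about zero per-class error outside classes 2, 3 and 4 can be checked directly from the test predictions; a small sketch (assuming pred_test and ytest from the testing step above):

# misclassification rate within each true class on the test set
pred_num <- as.numeric(as.character(pred_test))   # caret/kernlab return a factor of labels
per_class_err <- tapply(pred_num != ytest, ytest, mean)
round(per_class_err, 3)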

Theoretical Problems

1. Show that the following two optimization problems are equivalent:

$$\min_{\beta_0,\beta} \;\; \frac{1}{2}\|\beta\|_2^2 + \gamma\sum_{i=1}^n \xi_i \quad \text{subject to} \quad y_i(x_i^T\beta + \beta_0) \geq 1 - \xi_i, \; i = 1,\dots,n, \qquad \xi_i \geq 0, \; i = 1,\dots,n \tag{1}$$

$$\min_{\beta_0,\beta} \;\; \sum_{i=1}^n \left(1 - y_i(x_i^T\beta + \beta_0)\right)_+ + \frac{\lambda}{2}\|\beta\|_2^2 \tag{2}$$

Note that the first constraint in Problem (1) can be re-written as

$$\xi_i \geq 1 - y_i(x_i^T\beta + \beta_0), \quad \forall i.$$

Combining this with the second constraint, we see

$$\xi_i \geq \left[1 - y_i(x_i^T\beta + \beta_0)\right]_+, \quad \forall i.$$

Substituting this into the objective function of Problem (1) as an equality (the substitution is justified below), the objective becomes

$$\min_{\beta_0,\beta} \;\; \frac{1}{2}\|\beta\|_2^2 + \gamma\sum_{i=1}^n \left[1 - y_i(x_i^T\beta + \beta_0)\right]_+.$$

Multiplying through by $\lambda = \frac{1}{\gamma}$, we obtain

$$\min_{\beta_0,\beta} \;\; \frac{\lambda}{2}\|\beta\|_2^2 + \sum_{i=1}^n \left[1 - y_i(x_i^T\beta + \beta_0)\right]_+,$$

which is Problem (2), as desired.

For completeness, we now need to justify the substitution (the constraints only give an inequality, not an equality):

$$\xi_i = \left[1 - y_i(x_i^T\beta + \beta_0)\right]_+, \quad \forall i.$$

Suppose, for the sake of argument, that this were not true: that is, that $(\beta_0, \beta, \xi)$ forms a solution to Problem (1) with $\xi_i = \delta + \left[1 - y_i(x_i^T\beta + \beta_0)\right]_+$ for some fixed $i$ and some $\delta > 0$. (We may assume $\delta > 0$ from the constraints, since $\xi_i$ cannot be smaller than the positive part.) Then $(\beta_0, \beta, \xi' = \xi - \delta e_i)$ is also a feasible point for Problem (1) with a strictly lower value of the objective function. To see this, note that the only value to have changed is $\xi_i' = \xi_i - \delta$, and that the constraint involving this quantity,

$$\xi_i' \geq \left[1 - y_i(x_i^T\beta + \beta_0)\right]_+,$$

remains true by construction. Next note that the objective function is lower by

$$f(\beta_0,\beta,\xi) - f(\beta_0,\beta,\xi') = \left(\frac{1}{2}\|\beta\|_2^2 + \gamma\sum_{j=1}^n \xi_j\right) - \left(\frac{1}{2}\|\beta\|_2^2 + \gamma\sum_{j=1}^n \xi_j'\right) = \gamma\big(\xi_i - (\xi_i - \delta)\big) = \gamma\delta > 0, \quad \text{assuming } \gamma > 0.$$

This contradicts our assumption that $(\beta_0, \beta, \xi)$ is a solution to Problem (1), however, and hence we must have $\delta = 0$, giving

$$\xi_i = \left[1 - y_i(x_i^T\beta + \beta_0)\right]_+, \quad \forall i,$$

and our substitution is valid.

It is also possible to prove the validity of the substitution using the KKT conditions. We can write Problem (1) in the standard form for convex optimization problems:

$$\min_{\beta_0,\beta,\xi} \;\; \frac{1}{2}\|\beta\|_2^2 + \gamma\sum_{i=1}^n \xi_i \quad \text{subject to} \quad 1 - \xi_i - y_i(x_i^T\beta + \beta_0) \leq 0, \quad -\xi_i \leq 0, \quad i \in \{1,\dots,n\}.$$

The corresponding Lagrangian is

$$
\begin{aligned}
L &= \frac{1}{2}\|\beta\|_2^2 + \gamma\sum_{i=1}^n \xi_i + \sum_{i=1}^n \lambda_i(-\xi_i) + \sum_{i=1}^n \mu_i\big(1 - \xi_i - y_i(x_i^T\beta + \beta_0)\big) \\
  &= \frac{1}{2}\|\beta\|_2^2 + \sum_{i=1}^n \xi_i(\gamma - \lambda_i - \mu_i) + \sum_{i=1}^n \mu_i\big(1 - y_i(x_i^T\beta + \beta_0)\big) \\
  &= \frac{1}{2}\|\beta\|_2^2 + \sum_{i=1}^n \xi_i(\gamma - \lambda_i - \mu_i) + \sum_{i=1}^n \mu_i - \Big(\sum_{i=1}^n \mu_i y_i x_i^T\Big)\beta - \beta_0\sum_{i=1}^n \mu_i y_i.
\end{aligned}
$$

From the KKT conditions, we get

$$
\begin{aligned}
1 - \xi_i - y_i(x_i^T\beta + \beta_0) &\leq 0, \quad i \in \{1,\dots,n\} && \text{(Primal feasibility)}\\
\xi_i &\geq 0, \quad i \in \{1,\dots,n\} && \\
\lambda_i &\geq 0, \quad i \in \{1,\dots,n\} && \text{(Dual feasibility)}\\
\mu_i &\geq 0, \quad i \in \{1,\dots,n\} && \\
\mu_i\big(1 - \xi_i - y_i(x_i^T\beta + \beta_0)\big) &= 0, \quad i \in \{1,\dots,n\} && \text{(Complementary slackness)}\\
\lambda_i\,\xi_i &= 0, \quad i \in \{1,\dots,n\} && \\
\beta^T - \sum_{i=1}^n \mu_i y_i x_i^T &= 0 && \text{(Gradient condition: } \beta)\\
\sum_{i=1}^n \mu_i y_i &= 0 && \text{(Gradient condition: } \beta_0)\\
\gamma - \lambda_i - \mu_i &= 0, \quad i \in \{1,\dots,n\} && \text{(Gradient condition: } \xi_i)
\end{aligned}
$$

Complementary slackness gives us

$$\mu_i > 0 \implies \xi_i = 1 - y_i(x_i^T\beta + \beta_0), \qquad \lambda_i > 0 \implies \xi_i = 0,$$

so it suffices to show that at least one of $\lambda_i, \mu_i$ is positive for each $i$. The final KKT condition gives us $\gamma = \lambda_i + \mu_i$ for all $i$, and we have $\lambda_i, \mu_i \geq 0$ from dual feasibility, so at least one of $\lambda_i, \mu_i$ must be positive whenever $\gamma > 0$, giving

$$\xi_i = \left[1 - y_i(x_i^T\beta + \beta_0)\right]_+ \quad \text{for all } i,$$

as desired.
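As an aside (not part of the original solution), the equivalence can be sanity-checked numerically: for any fixed (beta0, beta), the optimal slacks in Problem (1) are xi_i = [1 - y_i(x_i'beta + beta0)]_+, and the profiled objective then equals 1/gamma times the objective of Problem (2) with lambda = 1/gamma. A small R sketch:

# numerical check of the hinge-loss reformulation on random data
set.seed(1)
n <- 5; p <- 3
X <- matrix(rnorm(n * p), n, p)
y <- sample(c(-1, 1), n, replace = TRUE)
beta <- rnorm(p); beta0 <- 0.2
gamma <- 2; lambda <- 1 / gamma

margins <- y * (X %*% beta + beta0)
xi_opt  <- pmax(0, 1 - margins)                  # optimal slacks in Problem (1) for fixed (beta0, beta)

obj1 <- 0.5 * sum(beta^2) + gamma * sum(xi_opt)                    # Problem (1) objective at optimal xi
obj2 <- sum(pmax(0, 1 - margins)) + lambda / 2 * sum(beta^2)       # Problem (2) objective
all.equal(lambda * obj1, obj2)                   # TRUE: the objectives agree up to the factor lambda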
