Understanding multivariate pattern analysis for neuroimaging applications


1 Understanding multivariate pattern analysis for neuroimaging applications Dimitri Van De Ville Medical Image Processing Lab, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne Department of Radiology & Medical Informatics, Université de Genève April 23, 2015

2 Overview
- Brain decoding: motivation and applications
- Limitations of regular SPM (GLM)
- Hall of fame of brain decoding
- Pattern recognition principles for brain decoding
  - Basics: linear classifier; discriminative versus generative models
  - Cross-validation: training phase, evaluation phase, testing phase
  - Support vector machine (SVM): primal and dual form; kernel trick

3 [Figure: GLM pipeline: a whole-brain scan (2x2x2 mm³, slices every 2-4 s) yields an intracranial-voxels x timepoints data matrix; voxel-wise GLM statistics produce a t-value map.]

4 Limitations of the GLM approach
- Massively univariate approach: hypothesis testing is applied voxel-wise; all voxels are treated independently, but they are obviously not independent
- Textbook multivariate approaches are not practical; e.g., a spatial covariance matrix (#voxels x #voxels) estimated from far fewer timepoints will be rank-deficient
- fMRI data are complicated; data-driven: let the data speak
- Group statistics are great, but the ultimate goal is often to know how well the results generalize (prediction): learn from training data, apply to the (unseen) test data
[Figure: examples of condition 1 and condition 2 plotted in a voxel 1 / voxel 2 feature space.]

5 Forward versus backward inference
- Encoding (forward): from stimuli, conditions, disease, disorder, ... to data; a generative model (caricature: phrenology)
- Decoding (backward): from data back to stimuli, conditions, disease, disorder, ...; a discriminative model (caricature: brain reading)

6 Exploiting subtle spatial patterns [Figure: example in 2D; extends to a 9D feature space] [Cox et al., 2003; Haynes and Rees, 2006]

7 Emotions in voice-sensitive cortices
How is emotional information treated in auditory cortex?
- 22 healthy right-handed subjects
- 10 actors pronounced "ne kalibam sout molem" (anger, sadness, neutrality, relief, joy); normalized for mean sound intensity; rated by 24 (other) subjects
- Classifier approach on auditory-cortex localizer: human voices > animal sounds
- 20 stimuli for each of the 5 categories; 5 classifiers (one-versus-others); p<0.05 (FWE)
[Ethofer, VDV, Scherer, Vuilleumier, Current Biology, 2009; collaboration with LABNIC/CISA, UniGE]

8 [Figure: decoding maps for anger, sadness, neutral, relief, and joy at axial slices z=0, z=+3, z=+6; stimulus: "Ne kalibam sout molem".] [Ethofer, VDV, Scherer, Vuilleumier, Current Biology, 2009]

9 Decoding from the emotional brain
Results for bilateral voice-sensitive cortices (solid: smoothed, 10 mm FWHM; dashed: non-smoothed; dotted: spatial average); chance level: 20%; classifier: linear SVM
- Evidence that auditory cues for emotional processing are spatially encoded; bilateral representation
- Comprehension of emotional prosody is crucial for social functioning; relevant to psychiatric disorders (schizophrenia, bipolar affective disorder, ...)
[Ethofer, VDV, Scherer, Vuilleumier, Current Biology, 2009]

10 Brain decoding of visual processing
- distributed patterns [Haxby et al., 2001]
- subtle patterns in V1 [Kamitani and Tong, 2005, 2006; Haynes and Rees, 2006]
- advanced models for receptive fields [Miyawaki et al., 2008]

11 Decoding video sequences: reverse statistical inference (decode the stimulus from the data) [Nishimoto et al., 2011]

12 Basics of pattern recognition
[Diagram: a feature vector x_i (D features) enters the model (function / machine / classifier) and receives a label y_i (e.g., -1/+1, control/patient).]
Machine learning aims at getting the model right based on a series of examples (supervised learning):
- learning/training phase: from examples {(x_1, y_1), ..., (x_N, y_N)}, learn a model f such that f(x_i) = y_i
- testing phase: for unseen data x_s, predict the label f(x_s)
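
A minimal sketch of this training/testing loop, assuming Python with scikit-learn; the features and labels below are synthetic stand-ins for voxel patterns and condition labels:

```python
# Supervised learning in two phases: fit a model on labeled examples,
# then predict labels for unseen data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
N, D = 40, 10                                   # N instances, D features (e.g., voxels)
X = rng.normal(size=(N, D))                     # feature vectors x_i
y = np.where(X[:, 0] > 0, 1, -1)                # labels y_i in {-1, +1}

clf = LogisticRegression().fit(X[:30], y[:30])  # training phase: learn f from {(x_i, y_i)}
print(clf.predict(X[30:]))                      # testing phase: predict f(x_s) for unseen x_s
```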

13 Basics of pattern recognition
[Diagram: feature matrix X (D features x N instances), labels y, model.]
Classification as a regression problem:
- Linear model: y = w^T X + b + noise, where the vector (w, b) characterizes the full model
- Learning the model by least-squares fitting: (w, b) = argmin_{(w,b)} ||y - w^T X - b||^2
- Underdetermined when D > N; regularize: (w, b) = argmin_{(w,b)} ||y - w^T X - b||^2 + λ ||w||^2; solution: w = pinv(X^T) y^T
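
A hedged NumPy sketch of this regression view, with made-up dimensions so that D > N; the pseudoinverse gives the minimum-norm solution, and the ridge solution is computed through the N x N dual system:

```python
import numpy as np

rng = np.random.default_rng(1)
D, N = 100, 20                          # more features than instances: underdetermined
X = rng.normal(size=(D, N))             # feature matrix, D features x N instances
y = np.where(rng.normal(size=N) > 0, 1.0, -1.0)   # labels in {-1, +1}

w = np.linalg.pinv(X.T) @ y             # minimum-norm solution w = pinv(X^T) y (bias omitted)

lam = 1e-2                              # ridge: w = X (X^T X + lam I)^{-1} y (push-through identity)
w_ridge = X @ np.linalg.solve(X.T @ X + lam * np.eye(N), y)

print(np.allclose(X.T @ w, y))          # the pinv solution fits the training labels exactly
```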

14 Basics of pattern recognition
[Diagram: feature matrix X (D features x N instances), labels y, model.]
- Evaluate the classifier on unseen data
- Avoid overfitting: increasingly complex models will succeed better at fitting (any) training data!
- Classify a new data point x′ by comparing w^T x′ + b against 0?

15 Interpretation of the weight vector
[Figure: examples of class 1 and class 2 in a voxel 1 / voxel 2 feature space; training yields the weight vector w.]
- Training: w_1 = +5, w_2 = -3
- The weight vector is a spatial representation of the decision function; it is a multivariate pattern: no local inference!
- Testing: new example x_1 = 0.5, x_2 = 0.8; predicted label: (w_1 x_1 + w_2 x_2) + b = (2.5 - 2.4) + 0 = 0.1; positive value -> class 1
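
The worked example above as a few lines of Python, with the same hypothetical weights and test point:

```python
import numpy as np

w, b = np.array([5.0, -3.0]), 0.0              # trained weight vector (w_1, w_2) and bias
x = np.array([0.5, 0.8])                       # new example
score = w @ x + b                              # (5 * 0.5) + (-3 * 0.8) + 0 = 2.5 - 2.4 = 0.1
print("class 1" if score > 0 else "class 2")   # positive value -> class 1
```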

16 Basics of pattern recognition
[Figure: separating hyperplane.]
- The linear model projects an instance onto the line parallel to w
- The decision boundary is a hyperplane in the feature space
- If feature dimensions correspond to voxels, then w corresponds to a map
- The bias b shifts the decision boundary along w
[adapted from Jaakkola]

17 Basics of pattern recognition
[Diagram: feature matrix X (D features x N instances), labels y; class-conditional density p(x|y) and posterior P(y|x).]
Classification and decision theory:
- Assume the class-conditional densities p(x|y) and class probabilities P(y) are known
- Minimizing the overall probability of error corresponds to taking the maximum posterior:
  y′ = argmax_y P(y|x′) = argmax_y p(x′|y) P(y) / p(x′)  (Bayes' rule)  = argmax_y p(x′|y) P(y), where p(x′|y) is the likelihood

18 Basics of pattern recognition
Wonderful link between the discriminative model (posterior) and the generative model (likelihood):
P(y|x) ∝ p(x|y) P(y)
- Discriminative model: e.g., linear classifier, SVM
- Generative model: e.g., naïve Bayes, p(x|y) = ∏_{i=1}^{D} p(x_i|y), where the features are modeled as independent random variables
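
A sketch of the generative route, assuming scikit-learn's Gaussian naïve Bayes on synthetic two-class data: each feature is modeled independently per class, and classification goes through the posterior via Bayes' rule.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, size=(30, 5)),   # class -1 examples
               rng.normal(+1, 1, size=(30, 5))])  # class +1 examples
y = np.r_[-np.ones(30), np.ones(30)]

gnb = GaussianNB().fit(X, y)             # fits p(x_i | y) per feature and P(y)
print(gnb.predict_proba(X[:2]))          # posteriors P(y | x), via Bayes' rule
```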

19 Basics of pattern recognition
[Diagram: the full dataset is split into training (TR), evaluation (EV), and test (TE) sets; train on TR, evaluate on EV, tune until OK, then report the final performance measure on TE.]
Confusion matrix:
truth \ estimated | class 1 | class 2
class 1           |    A    |    B
class 2           |    C    |    D
Leave-one-out cross-validation: repeatedly train on all instances except one, evaluate on the left-out instance
Accuracy statistics: class 1 = A/(A+B); class 2 = D/(C+D); average = (A+D)/(A+B+C+D)
Sensitivity = D/(C+D) = TP/(FN+TP); specificity = A/(A+B) = TN/(FP+TN)
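
Leave-one-out cross-validation sketched with scikit-learn's LeaveOneOut splitter and a linear SVM; the setup and data below are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 8))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=20) > 0, 1, -1)

hits = []
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])   # train on all but one
    hits.append(clf.predict(X[test_idx])[0] == y[test_idx][0])   # evaluate on the left-out one

print("LOO accuracy:", np.mean(hits))    # corresponds to (A + D) / (A + B + C + D)
```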

20 Testing classifiers: the ROC curve
- Receiver Operating Characteristic (ROC) curve: allows one to see the compromise over the operating range of a classifier
- Can be used for classifier comparison; e.g., compute the Area Under the Curve (AUC) as a summary measure of performance
[Figure: ROC curves of two classifiers plotted as sensitivity against specificity; classifier 1: AUC 0.79.]
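
An ROC/AUC sketch, assuming scikit-learn and synthetic data: the curve is computed from continuous decision scores, here the signed distances w^T x + b of a linear SVM.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + rng.normal(size=100) > 0).astype(int)

scores = SVC(kernel="linear").fit(X[:60], y[:60]).decision_function(X[60:])
fpr, tpr, _ = roc_curve(y[60:], scores)   # tpr = sensitivity, fpr = 1 - specificity
print("AUC:", roc_auc_score(y[60:], scores))
```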

21 Learning in high dimensions
The dimensionality of the feature vector (e.g., #voxels) often exceeds the number of instances: the model is underdetermined and requires additional assumptions (e.g., regularization). Other solutions:
- feature selection & extraction: gray-matter mask; region-of-interest (ROI); atlas regions; only the most-active voxels (selected inside the training phase!); see the pipeline sketch after this list
- searchlight approach [Kriegeskorte et al., 2006]: an ROI sliding around the brain; multivariate capabilities (long-range) are limited
- dimensionality reduction: PCA, ICA, ...
- kernel methods
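
The "inside the training phase" caveat can be enforced mechanically. A sketch with a scikit-learn Pipeline (assumed setup, synthetic data): voxel selection is refit on every training fold, so the test fold never influences which features are kept.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 500))            # D >> N, as in fMRI
y = np.r_[np.zeros(20), np.ones(20)]
X[y == 1, :10] += 1.0                     # a few informative "voxels"

# Selection of the 10 most-active features happens inside each training fold only.
pipe = make_pipeline(SelectKBest(f_classif, k=10), SVC(kernel="linear"))
print(cross_val_score(pipe, X, y, cv=5).mean())
```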

22 Kernel matrix
- Kernel function K: a similarity measure (scalar) between two feature vectors
- Simplest example: the dot product (linear kernel); e.g., for (4, 1) and (-2, 3): (4 · -2) + (1 · 3) = -5
- For N brain scans (feature vectors), the pairwise kernel values form an N x N matrix over instances
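
The linear-kernel matrix in a few lines of NumPy, using the two instances from the example above:

```python
import numpy as np

X = np.array([[4.0, 1.0],     # instance 1
              [-2.0, 3.0]])   # instance 2
K = X @ X.T                   # N x N kernel (Gram) matrix of pairwise dot products
print(K[0, 1])                # (4 * -2) + (1 * 3) = -5
```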

23 Support vector machine
Maximum-margin classifier [Vapnik, 1963, 1992]
- Assume the data are linearly separable: ∃w ∈ R^D, ∃b ∈ R : y_i (w^T x_i + b) > 0, i = 1, ..., N
- w and b can be rescaled such that the points closest to the hyperplane satisfy |w^T x_i + b| = 1
- The distance of a point x to the hyperplane is d(x) = |w^T x + b| / ||w||
- Margin of the hyperplane = 1/||w||
[Figure: hard margin; hyperplanes w^T x + b = 1, w^T x + b = 0, w^T x + b = -1.]

24 Support vector machine (primal form)
Quadratic programming optimization problem:
argmin_{(w,b)} (1/2) w^T w subject to y_i (w^T x_i + b) ≥ 1, i = 1, ..., N
The constrained problem can be formulated as an unconstrained one using Lagrange multipliers α:
argmin_{(w,b)} max_{α ≥ 0} L, with L = (1/2) ||w||^2 - Σ_{i=1}^N α_i [y_i (w^T x_i + b) - 1]
- Points for which y_i (w^T x_i + b) > 1 do not matter and should have α_i = 0
- Points on the margin, with y_i (w^T x_i + b) = 1, are called support vectors; these have α_i > 0, and y_i (w^T x_i + b) - 1 = 0 gives b = y_i - w^T x_i
- Convex criterion, so no local minima

25 Support vector machine (primal form)
Lagrangian functional:
argmin_{(w,b)} max_{α ≥ 0} L = (1/2) ||w||^2 - Σ_{i=1}^N α_i [y_i (w^T x_i + b) - 1]
Solving L for α_i, b, w leads to:
- ∂L/∂α_i = 0: y_i (w^T x_i + b) - 1 = 0, so b = y_i - w^T x_i (for a support vector)
- ∂L/∂b = 0: Σ_{i=1}^N y_i α_i = 0
- ∂L/∂w = 0: w = Σ_{i=1}^N α_i y_i x_i

26 Support vector machine (dual form)
The Lagrangian functional can be rewritten as
L = (1/2) Σ_{i=1}^N Σ_{j=1}^N α_i y_i (x_i^T x_j) y_j α_j - Σ_{i=1}^N α_i, s.t. Σ_{i=1}^N y_i α_i = 0,
where x_i^T x_j = K(x_i, x_j).
- Minimization of this dual form requires finding α_i ≥ 0 in an N-dimensional space instead of a D-dimensional one!
- The weight vector w can be found as a linear combination of the instances: w = Σ_{i'=1}^{N_SV} α_{i'} y_{i'} x_{i'}
- Efficient: a sparse solution where only the N_SV support vectors (with α_{i'} > 0) contribute; the rest of the data can be ignored
- Prediction for an unseen data point x: f(x) = sign( Σ_{i'=1}^{N_SV} y_{i'} α_{i'} x_{i'}^T x + b )
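
The dual form in practice: scikit-learn's SVC stores the support vectors and the products α_i y_i (as dual_coef_), from which the primal weight vector of a linear kernel can be reassembled. A sketch on synthetic data (assumed setup):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(-2, 1, size=(20, 2)),
               rng.normal(+2, 1, size=(20, 2))])
y = np.r_[-np.ones(20), np.ones(20)]

svm = SVC(kernel="linear", C=1.0).fit(X, y)
print("support vectors:", len(svm.support_))    # sparse: only a few alpha_i > 0

w = svm.dual_coef_ @ svm.support_vectors_       # w = sum over SVs of alpha_i y_i x_i
print(np.allclose(w, svm.coef_))                # matches the primal weight vector
```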

27 Support vector machine
Soft-margin classifier [Shawe-Taylor and Cristianini, 2004]
- Introduce slack variables ξ_i ≥ 0 to relax separation: ξ_i > 0 means x_i has margin less than 1
- New optimization problem (primal): argmin_{(w,b),ξ} (1/2) w^T w + C Σ_{i=1}^N ξ_i subject to y_i (w^T x_i + b) ≥ 1 - ξ_i
- Which leads to the dual form: argmin_α (1/2) Σ_{i=1}^N Σ_{j=1}^N α_i y_i (x_i^T x_j) y_j α_j - Σ_{i=1}^N α_i subject to Σ_{i=1}^N y_i α_i = 0 and 0 ≤ α_i ≤ C
- Box constraint: allows for capacity control; C = 0: very large safety margin; C = ∞: close to the hard-margin constraint
[Figure: soft margin; hyperplanes w^T x + b = 1, 0, -1, with a slack ξ for a point inside the margin.]
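
A small sketch of the box constraint in action (scikit-learn assumed, synthetic overlapping classes): small C tolerates margin violations and keeps a wide margin with many support vectors; large C approaches the hard margin.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(-1, 1, size=(30, 2)),
               rng.normal(+1, 1, size=(30, 2))])
y = np.r_[-np.ones(30), np.ones(30)]

for C in (0.01, 1.0, 100.0):
    svm = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: {len(svm.support_)} support vectors, "
          f"margin 1/||w|| = {1.0 / np.linalg.norm(svm.coef_):.2f}")
```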

28 Kernel trick
Linear classification (by a hyperplane) can be limiting. Project into an (even) higher-dimensional feature space V, where linear separation corresponds to non-linear separation in the original space.

29 Kernel trick
Linear classification (by a hyperplane) can be limiting. Project into an (even) higher-dimensional feature space V, where linear separation corresponds to non-linear separation in the original space.
However, this new feature space V is implicit, because in practice we use generalized kernel functions:
K(x_1, x_2) = ⟨φ(x_1), φ(x_2)⟩_V,
where ⟨·,·⟩_V should be a proper inner product; that is, K(·,·) should satisfy Mercer's condition:
Σ_{i=1}^N Σ_{j=1}^N K(x_i, x_j) c_i c_j ≥ 0
for all finite sequences of points (x_1, ..., x_N) and real-valued coefficients (c_1, ..., c_N). This also means that the kernel matrix [K(x_i, x_j)]_{i,j} should be positive semi-definite (i.e., all eigenvalues ≥ 0).

30 Kernel trick
Commonly used kernel functions K(x_1, x_2) = ⟨φ(x_1), φ(x_2)⟩:
- Polynomial (homogeneous): K(x_1, x_2) = (x_1^T x_2)^p
- Polynomial (inhomogeneous): K(x_1, x_2) = (x_1^T x_2 + 1)^p
- Gaussian radial basis kernel: K(x_1, x_2) = exp(-γ ||x_1 - x_2||^2), γ > 0
Example: polynomial kernel K(x_1, x_2) = (x_1^T x_2)^2. For D = 2, developing the kernel function leads to (binomial theorem):
K(x_1, x_2) = (x_{1,1} x_{2,1} + x_{1,2} x_{2,2})^2
            = (x_{1,1} x_{2,1})^2 + 2 x_{1,1} x_{2,1} x_{1,2} x_{2,2} + (x_{1,2} x_{2,2})^2
            = ⟨[x_{1,1}^2, √2 x_{1,1} x_{1,2}, x_{1,2}^2], [x_{2,1}^2, √2 x_{2,1} x_{2,2}, x_{2,2}^2]⟩
From which we can identify φ : (x_1, x_2) ↦ (x_1^2, √2 x_1 x_2, x_2^2).
For general D, this kernel maps to C(D+1, 2) = D(D+1)/2 dimensions (multinomial theorem).
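
A quick numerical check of the identity derived above for D = 2, p = 2 (a NumPy sketch; the test points are arbitrary): the kernel value equals the dot product after the explicit map φ.

```python
import numpy as np

def phi(x):
    # phi : (x_1, x_2) -> (x_1^2, sqrt(2) x_1 x_2, x_2^2)
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x1, x2 = np.array([0.3, -1.2]), np.array([2.0, 0.7])
print((x1 @ x2)**2, phi(x1) @ phi(x2))   # identical values (0.0576)
```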

31 Kernel trick
The transform φ is implicit, but the weight vector is defined as w = Σ_i α_i y_i φ(x_i).
Dot products with w can be computed using the kernel trick: w^T φ(x) = Σ_i α_i y_i K(x_i, x).
However, in general there exists no w′ such that w^T φ(x) = K(w′, x), which means there is no equivalent weight map.

32 Linear and non-linear kernels
Neuroimaging applications often work best with linear kernels:
- Very high-dimensional data with small sample sizes (D >> N); the increased complexity of non-linear kernels (more parameters) cannot be turned into an advantage
- Linear models have less risk of overfitting
- The weight vector of a linear model has a direct interpretation as an image, i.e., a discrimination map

33 What's the best classifier?
Model complexity (no free lunch): simple models give high reproducibility but low predictability; complex models give low reproducibility but high predictability [Hastie et al., 2009]
Ensemble learning: combine many (weak) classifiers to get better results (see the sketch after this list)
- bagging: classifiers trained on random training subsets, equal vote
- boosting: adding classifiers that deal with previously misclassified data
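
A bagging sketch, assuming scikit-learn and synthetic data: many shallow trees are trained on bootstrap samples of the training set and vote with equal weight.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(8)
X = rng.normal(size=(100, 10))
y = (X[:, :2].sum(axis=1) > 0).astype(int)

bag = BaggingClassifier(DecisionTreeClassifier(max_depth=2), n_estimators=50)
print(cross_val_score(bag, X, y, cv=5).mean())   # ensemble accuracy across folds
```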

34 Recommended reading
- Pereira et al., "Machine learning classifiers and fMRI: A tutorial overview," NeuroImage 45, 2009
- Richiardi et al., "Machine learning with brain graphs," IEEE Signal Processing Magazine 30, 2013
- Haufe et al., "On the interpretation of weight vectors of linear models in multivariate neuroimaging," NeuroImage, in press

35 Classifier selection
Is classifier A better than classifier B?
- Hypothesis testing framework: do the two sets of decisions of classifiers A and B represent two different populations?
- Careful: the samples (sets of decisions) are dependent (same evaluation/testing dataset), and the measurement type is categorical binary (for a two-class problem)
- Use the McNemar test; H0: there is no difference between the accuracies of the two classifiers

36 Selection with the McNemar test
Compute the relationships between classifier decisions:
            B correct   B wrong
A correct      N11        N10
A wrong        N01        N00
If H0 is true, we should have N01 = N10 = 0.5 (N01 + N10).
We can compute the following test statistic based on the observed counts:
x^2 = (|N01 - N10| - 1)^2 / (N01 + N10)
Then compare x^2 to the critical value of the χ^2 distribution (1 DOF) at a level of significance α (requires N01 + N10 > 25).
After [Dietterich, 1998]
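
The test statistic above in a few lines of Python (SciPy assumed); the disagreement counts n01 and n10 are hypothetical:

```python
from scipy.stats import chi2

n01, n10 = 12, 25                            # hypothetical disagreement counts
x2 = (abs(n01 - n10) - 1)**2 / (n01 + n10)   # continuity-corrected McNemar statistic
p = chi2.sf(x2, df=1)                        # compare against chi^2 with 1 DOF
print(f"x^2 = {x2:.2f}, p = {p:.3f}")        # reject H0 if p < alpha (valid for n01 + n10 > 25)
```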
