Designing and combining kernels: some lessons learned from bioinformatics


1 Designing and combining kernels: some lessons learned from bioinformatics Jean-Philippe Vert Mines ParisTech & Institut Curie NIPS MKL workshop, Dec 12, 2009 Jean-Philippe Vert (ParisTech) Designing and combining kernels NIPS MKL workshop 09 1 / 26

2 Kernels are very popular in bioinformatics Why? Many problems can be approached by kernel methods (classification, regression, feature construction, ...) Many data with particular structures: kernel design Need to integrate heterogeneous data: kernel combination

3 Outline 1 Kernel design 2 Kernel combination 3 Conclusion


5 What is a GOOD kernel? Leads to good performance Mathematically valid Fast to compute Interpretable model (?) [Figure: strings in X (maskat..., msises..., marssl..., malhtv..., mappsv..., mahtlg...) mapped by φ to a feature space F]

6 How to MAKE a good kernel? 3 main ideas 1 Define good features: K(x, x') = Φ(x)ᵀΦ(x') 2 Define a good metric: d(x, x') = ( K(x, x) + K(x', x') − 2K(x, x') )^{1/2} 3 Define a good functional penalty: min_{f ∈ H} { R(f) + λ ||f||²_H }

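As a quick numerical check of idea 2, the sketch below computes the Hilbert metric induced by a kernel. The Gaussian RBF kernel and the sample points are illustrative choices, not from the talk:

```python
import numpy as np

def rbf(x, y, gamma=0.5):
    """Gaussian RBF kernel on real vectors (illustrative choice)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_distance(k, x, y):
    """Hilbert metric induced by a kernel:
    d(x, x')^2 = K(x, x) + K(x', x') - 2 K(x, x')."""
    d2 = k(x, x) + k(y, y) - 2.0 * k(x, y)
    return np.sqrt(max(d2, 0.0))  # guard against tiny negative round-off

x = np.array([0.0, 1.0])
y = np.array([2.0, -1.0])
d = kernel_distance(rbf, x, y)
# The RBF feature map has unit norm, so d^2 can never exceed 2
print(d)
```

Any valid kernel can be plugged in for `rbf`; the same identity is what makes distance-substitution constructions tempting, and what fails when a "good" distance is not Hilbertian.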

9 Idea 1: define good features Motivation Estimate a function f(x) = wᵀΦ(x) A good feature is more important than a good algorithm! Examples Explicit feature computation: substring or subgraph indexation, Fisher kernel Φ(x) = ∇_θ log P_θ(x) Implicit feature construction + kernel trick: walk-based graph kernels, mutual information kernels K(x, x') = ∫ P_θ(x) P_θ(x') dθ Caveats One good feature among too many irrelevant ones may not be enough with L2 regularization


12 Example: string kernel with substring indexation Index the feature space by fixed-length strings, i.e., Φ(x) = (Φ_u(x))_{u ∈ A^k}, where Φ_u(x) can be: the number of occurrences of u in x (without gaps): spectrum kernel (Leslie et al., 2002) the number of occurrences of u in x up to m mismatches (without gaps): mismatch kernel (Leslie et al., 2004) the number of occurrences of u in x allowing gaps, with a weight decaying exponentially with the number of gaps: substring kernel (Lodhi et al., 2002)
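The spectrum kernel is simple enough to sketch in a few lines; this minimal version counts exact k-mers and takes the inner product of the count vectors (the test sequences are the two proteins used later in the local-alignment example):

```python
from collections import Counter

def spectrum_features(x, k):
    """Count occurrences of every length-k substring of x (no gaps)."""
    return Counter(x[i:i + k] for i in range(len(x) - k + 1))

def spectrum_kernel(x, y, k=3):
    """Inner product of k-mer count vectors (spectrum kernel, Leslie et al., 2002)."""
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    return sum(c * fy[u] for u, c in fx.items())

print(spectrum_kernel("CGGSLIAMMWFGV", "CLIVMMNRLMWFGV", k=3))  # shared 3-mers: MWF, WFG, FGV -> 3
```

The mismatch and gapped substring kernels replace the exact-match `Counter` with more permissive occurrence counts; efficient implementations use tries or suffix automata rather than this explicit dictionary.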

13 Idea 2: define a good metric Motivation A kernel defines a Hilbert metric d(x, x') = ( K(x, x) + K(x', x') − 2K(x, x') )^{1/2} The functions we can learn are smooth w.r.t. this metric: |f(x) − f(x')| ≤ ||f||_H d(x, x') Examples Edit distances for strings or graphs, local alignment of biological sequences, graph matching distances MAMMOTH distance between protein 3D structures Caveats Most "good" distances are not Hilbertian

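The smoothness bound |f(x) − f(x')| ≤ ||f||_H d(x, x') follows from Cauchy-Schwarz in the RKHS, and can be checked numerically. The sketch below (my own toy setup: a random RKHS function built from an RBF kernel, not an example from the talk) verifies it on random point pairs:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))

# A random RKHS function f = sum_i alpha_i K(x_i, .)
anchors = rng.normal(size=(5, 2))
alpha = rng.normal(size=5)
gram = np.array([[rbf(a, b) for b in anchors] for a in anchors])
f_norm = np.sqrt(alpha @ gram @ alpha)  # ||f||_H = sqrt(alpha^T K alpha)

def f(x):
    return sum(a * rbf(xi, x) for a, xi in zip(alpha, anchors))

def d(x, y):
    """Kernel-induced Hilbert metric."""
    return np.sqrt(max(rbf(x, x) + rbf(y, y) - 2 * rbf(x, y), 0.0))

# Check |f(x) - f(x')| <= ||f||_H d(x, x') on random pairs
for _ in range(100):
    x, y = rng.normal(size=2), rng.normal(size=2)
    assert abs(f(x) - f(y)) <= f_norm * d(x, y) + 1e-9
print("smoothness bound holds")
```

This is why the choice of metric matters: the kernel decides which pairs of inputs the learned function is forced to treat similarly.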

16 Example: local alignment kernel How to compare 2 protein sequences? x1 = CGGSLIAMMWFGV x2 = CLIVMMNRLMWFGV Find a good alignment π: CGGSLIAMM----WFGV C---LIVMMNRLMWFGV Two non-Hilbertian metrics: SW(x, y) := max_{π ∈ Π(x,y)} s(x, y, π) and (1/β) log K_LA^{(β)}(x, y) = (1/β) log Σ_{π ∈ Π(x,y)} exp( β s(x, y, π) )

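Rather than implementing the full alignment dynamic program, a toy list of alignment scores is enough to see how the log-sum-exp in K_LA relates to the Smith-Waterman score: it is a soft maximum that converges to max_π s(x, y, π) as β grows. The scores below are hypothetical, standing in for s(x, y, π) over a small alignment set Π(x, y):

```python
import math

def soft_max(scores, beta):
    """(1/beta) * log sum_pi exp(beta * s(pi)): a smooth surrogate for
    max(scores) that converges to it as beta -> infinity."""
    m = max(scores)  # shift by the max to stabilise the log-sum-exp
    return m + math.log(sum(math.exp(beta * (s - m)) for s in scores)) / beta

# Hypothetical alignment scores s(x, y, pi) for a handful of alignments pi
scores = [3.0, 7.5, 7.4, 1.2]
for beta in (0.5, 2.0, 50.0):
    print(beta, soft_max(scores, beta))
# As beta grows, the value approaches the Smith-Waterman score max(scores) = 7.5
```

The soft version sums over all alignments, which is what makes K_LA positive definite for suitable β while the hard SW score is not.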

18 Idea 3: define a good penalty function Motivation The kernel constrains the set of functions over which we optimize (balls in the RKHS). We may first define a penalty we like, then find the associated kernel. Examples graph Laplacian over gene networks cluster kernel for protein remote homology detection Caveats Some penalties may not be RKHS norms (e.g., total variation to estimate piecewise constant functions)


21 Example: kernel on a graph Laplacian-based kernel The set H = { f ∈ R^m : Σ_{i=1}^m f_i = 0 }, endowed with the norm Ω(f) = Σ_{i∼j} ( f(x_i) − f(x_j) )², is an RKHS whose reproducing kernel is the pseudo-inverse of the graph Laplacian.

22 The choice of kernel makes a difference [Figure: number of families with given ROC50 performance for SVM-LA, SVM-pairwise, SVM-Mismatch and SVM-Fisher] Performance on the SCOP superfamily recognition benchmark.

23 Outline 1 Kernel design 2 Kernel combination 3 Conclusion

24 Motivation We can imagine plenty of kernels for a given application: different kernels for the same data (e.g., different string kernels) kernels for different types of data (e.g., integrating strings and 3D structures for protein classification) Which one to use? Perhaps we can combine them to do better than each one individually?

25 Sum kernels Consider p kernels K_1, ..., K_p Form the sum (e.g., Pavlidis et al., 2002): K = Σ_{i=1}^p K_i Equivalently, concatenate the features of the different kernels Equivalently, work in the RKHS H = H_1 ⊕ ... ⊕ H_p with ||f||²_H = inf_{f = f_1 + ... + f_p} Σ_{i=1}^p ||f_i||²_{H_i}
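The equivalence between summing kernels and concatenating features is a one-line identity; the sketch below checks it on two made-up feature maps (random matrices, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two feature maps evaluated on the same 4 inputs (hypothetical data)
Phi1 = rng.normal(size=(4, 3))   # features of kernel K1
Phi2 = rng.normal(size=(4, 5))   # features of kernel K2

K1 = Phi1 @ Phi1.T
K2 = Phi2 @ Phi2.T

# Concatenating the feature vectors yields exactly the Gram matrix of K1 + K2
Phi = np.hstack([Phi1, Phi2])
assert np.allclose(Phi @ Phi.T, K1 + K2)
print("sum kernel = concatenated features")
```

In practice one simply adds the p precomputed Gram matrices and trains a single SVM on the result, which is why the sum kernel is such a cheap data-integration baseline.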

26 Some early work

27 Huge improvements can be observed [Figure: two ROC curves (ratio of true positives vs. ratio of false positives) for the exp, loc, phy, y2h and int kernels]

28 Multiple kernel learning (MKL) Form the convex combination K = Σ_{i=1}^p η_i K_i, where the weights are chosen to minimize the following convex function under the constraint tr(K) = 1 (Lanckriet et al., 2003): h(K) = inf_{f ∈ H_K} { R(f) + λ ||f||_{H_K} } Equivalently, work in the space H = H_1 ⊕ ... ⊕ H_p with the non-Hilbertian group L1 norm (Bach et al., 2004): ||f||_H = inf_{f = f_1 + ... + f_p} Σ_{i=1}^p ||f_i||_{H_i}
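To make h(K) concrete, here is a deliberately tiny stand-in for MKL (not Lanckriet et al.'s SDP or the group-lasso solver of Bach et al.): for kernel ridge regression h(K) has the closed form (λ/2) yᵀ(K + λI)⁻¹y, so on two toy trace-normalized kernels we can just grid-search the mixing weight η. The data and kernels are contrived so that K1 is informative and K2 is not:

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 0.1

# Toy target and two rank-one, unit-trace kernels: K1 aligned with y, K2 not
y = rng.normal(size=20)
u = rng.normal(size=20)
u -= (u @ y) / (y @ y) * y            # make the noise direction orthogonal to y
K1 = np.outer(y, y) / (y @ y)         # informative kernel
K2 = np.outer(u, u) / (u @ u)         # irrelevant kernel

def h(eta):
    """Ridge version of h(K) at K = eta*K1 + (1-eta)*K2:
    inf_f (1/2)||y - f||^2 + (lam/2)||f||^2_HK = (lam/2) y^T (K + lam I)^{-1} y."""
    K = eta * K1 + (1 - eta) * K2
    return 0.5 * lam * y @ np.linalg.solve(K + lam * np.eye(len(y)), y)

grid = np.linspace(0.0, 1.0, 101)
eta_best = grid[np.argmin([h(e) for e in grid])]
print(eta_best)  # all the weight goes to the informative kernel
```

Real MKL solvers optimize η and the classifier jointly and scale to many kernels; this grid search only illustrates what the objective h rewards, namely kernels whose leading eigenvectors align with the labels.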

29 Example: Lanckriet et al. (2004)

30 MKL or sum kernel for protein network inference? [Figure: precision-recall curves for K1 (expression), K2 (localization), K3 (phylogenetic), their sum, and MKL]

33 Why MKL does not estimate a good kernel combination


37 Sometimes MKL works Subcellular protein classification from 69 kernels

38 MKL or sum kernel? Sum is simpler and works better to combine well-engineered kernels (e.g., for data integration). In spite of its misleading name, MKL is better suited for kernel selection than for weight optimization (l2 vs l1 regularization). Useful to select among large sets of kernels. We would love to be able to select the "optimal" linear combination of a few kernels

39 Outline 1 Kernel design 2 Kernel combination 3 Conclusion

40 Conclusion Are kernels popular and useful in bioinformatics? YES Is kernel design useful? YES, and we have many tricks for that Is kernel combination useful for performance? YES, and the sum kernel does a good job Is MKL useful? Hardly yet, but it offers the promising possibility to work with MANY kernels and emphasize INTERPRETABILITY Do we want to learn good linear combinations? YES


More information

Basics of Multiple Sequence Alignment

Basics of Multiple Sequence Alignment Basics of Multiple Sequence Alignment Tandy Warnow February 10, 2018 Basics of Multiple Sequence Alignment Tandy Warnow Basic issues What is a multiple sequence alignment? Evolutionary processes operating

More information

Support Vector Machines

Support Vector Machines Support Vector Machines About the Name... A Support Vector A training sample used to define classification boundaries in SVMs located near class boundaries Support Vector Machines Binary classifiers whose

More information

Protein Function Prediction by Integrating Multiple Kernels

Protein Function Prediction by Integrating Multiple Kernels Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Protein Function Prediction by Integrating Multiple Kernels Guoxian Yu,4, Huzefa Rangwala 2, Carlotta Domeniconi

More information

Thorsten Joachims Then: Universität Dortmund, Germany Now: Cornell University, USA

Thorsten Joachims Then: Universität Dortmund, Germany Now: Cornell University, USA Retrospective ICML99 Transductive Inference for Text Classification using Support Vector Machines Thorsten Joachims Then: Universität Dortmund, Germany Now: Cornell University, USA Outline The paper in

More information

Generalized Similarity Kernels for Efficient Sequence Classification

Generalized Similarity Kernels for Efficient Sequence Classification Generalized Similarity Kernels for Efficient Sequence Classification Pavel P. Kuksa NEC Laboratories America, Inc. Princeton, NJ 08540 pkuksa@nec-labs.com Imdadullah Khan Department of Computer Science

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu [Kumar et al. 99] 2/13/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu

More information

Learning Hierarchies at Two-class Complexity

Learning Hierarchies at Two-class Complexity Learning Hierarchies at Two-class Complexity Sandor Szedmak ss03v@ecs.soton.ac.uk Craig Saunders cjs@ecs.soton.ac.uk John Shawe-Taylor jst@ecs.soton.ac.uk ISIS Group, Electronics and Computer Science University

More information

Application of Support Vector Machine In Bioinformatics

Application of Support Vector Machine In Bioinformatics Application of Support Vector Machine In Bioinformatics V. K. Jayaraman Scientific and Engineering Computing Group CDAC, Pune jayaramanv@cdac.in Arun Gupta Computational Biology Group AbhyudayaTech, Indore

More information

Learning Algorithms for Medical Image Analysis. Matteo Santoro slipguru

Learning Algorithms for Medical Image Analysis. Matteo Santoro slipguru Learning Algorithms for Medical Image Analysis Matteo Santoro slipguru santoro@disi.unige.it June 8, 2010 Outline 1. learning-based strategies for quantitative image analysis 2. automatic annotation of

More information

Bipartite Edge Prediction via Transductive Learning over Product Graphs

Bipartite Edge Prediction via Transductive Learning over Product Graphs Bipartite Edge Prediction via Transductive Learning over Product Graphs Hanxiao Liu, Yiming Yang School of Computer Science, Carnegie Mellon University July 8, 2015 ICML 2015 Bipartite Edge Prediction

More information

Kernel Methods & Support Vector Machines

Kernel Methods & Support Vector Machines & Support Vector Machines & Support Vector Machines Arvind Visvanathan CSCE 970 Pattern Recognition 1 & Support Vector Machines Question? Draw a single line to separate two classes? 2 & Support Vector

More information

Machine Learning: Think Big and Parallel

Machine Learning: Think Big and Parallel Day 1 Inderjit S. Dhillon Dept of Computer Science UT Austin CS395T: Topics in Multicore Programming Oct 1, 2013 Outline Scikit-learn: Machine Learning in Python Supervised Learning day1 Regression: Least

More information

Choosing the kernel parameters for SVMs by the inter-cluster distance in the feature space Authors: Kuo-Ping Wu, Sheng-De Wang Published 2008

Choosing the kernel parameters for SVMs by the inter-cluster distance in the feature space Authors: Kuo-Ping Wu, Sheng-De Wang Published 2008 Choosing the kernel parameters for SVMs by the inter-cluster distance in the feature space Authors: Kuo-Ping Wu, Sheng-De Wang Published 2008 Presented by: Nandini Deka UH Mathematics Spring 2014 Workshop

More information

1 Case study of SVM (Rob)

1 Case study of SVM (Rob) DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how

More information

Manifolds Pt. 1. PHYS Southern Illinois University. August 24, 2016

Manifolds Pt. 1. PHYS Southern Illinois University. August 24, 2016 Manifolds Pt. 1 PHYS 500 - Southern Illinois University August 24, 2016 PHYS 500 - Southern Illinois University Manifolds Pt. 1 August 24, 2016 1 / 20 Motivating Example: Pendulum Example Consider a pendulum

More information

A popular method for moving beyond linearity. 2. Basis expansion and regularization 1. Examples of transformations. Piecewise-polynomials and splines

A popular method for moving beyond linearity. 2. Basis expansion and regularization 1. Examples of transformations. Piecewise-polynomials and splines A popular method for moving beyond linearity 2. Basis expansion and regularization 1 Idea: Augment the vector inputs x with additional variables which are transformation of x use linear models in this

More information

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes

More information

Stephen Scott.

Stephen Scott. 1 / 33 sscott@cse.unl.edu 2 / 33 Start with a set of sequences In each column, residues are homolgous Residues occupy similar positions in 3D structure Residues diverge from a common ancestral residue

More information

Spectral Clustering of Biological Sequence Data

Spectral Clustering of Biological Sequence Data Spectral Clustering of Biological Sequence Data William Pentney Department of Computer Science and Engineering University of Washington bill@cs.washington.edu Marina Meila Department of Statistics University

More information

BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha

BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio. 1990. CS 466 Saurabh Sinha Motivation Sequence homology to a known protein suggest function of newly sequenced protein Bioinformatics

More information

12 Classification using Support Vector Machines

12 Classification using Support Vector Machines 160 Bioinformatics I, WS 14/15, D. Huson, January 28, 2015 12 Classification using Support Vector Machines This lecture is based on the following sources, which are all recommended reading: F. Markowetz.

More information

Expectation Maximization!

Expectation Maximization! Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University and http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt Steps in Clustering Select Features

More information

Support Vector Machines (SVM)

Support Vector Machines (SVM) Support Vector Machines a new classifier Attractive because (SVM) Has sound mathematical foundations Performs very well in diverse and difficult applications See paper placed on the class website Review

More information

All lecture slides will be available at CSC2515_Winter15.html

All lecture slides will be available at  CSC2515_Winter15.html CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 9: Support Vector Machines All lecture slides will be available at http://www.cs.toronto.edu/~urtasun/courses/csc2515/ CSC2515_Winter15.html Many

More information

Machine Learning Classifiers and Boosting

Machine Learning Classifiers and Boosting Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve

More information

SpicyMKL Efficient multiple kernel learning method using dual augmented Lagrangian

SpicyMKL Efficient multiple kernel learning method using dual augmented Lagrangian SpicyMKL Efficient multiple kernel learning method using dual augmented Lagrangian Taiji Suzuki Ryota Tomioka The University of Tokyo Graduate School of Information Science and Technology Department of

More information

Energy Minimization for Segmentation in Computer Vision

Energy Minimization for Segmentation in Computer Vision S * = arg S min E(S) Energy Minimization for Segmentation in Computer Vision Meng Tang, Dmitrii Marin, Ismail Ben Ayed, Yuri Boykov Outline Clustering/segmentation methods K-means, GrabCut, Normalized

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 20: 10/12/2015 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology 9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example

More information

Ranking on Graph Data

Ranking on Graph Data Shivani Agarwal SHIVANI@CSAIL.MIT.EDU Computer Science and AI Laboratory, Massachusetts Institute of Technology, Cambridge, MA 0139, USA Abstract In ranking, one is given examples of order relationships

More information

Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others

Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Introduction to object recognition Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Overview Basic recognition tasks A statistical learning approach Traditional or shallow recognition

More information

Content-based image and video analysis. Machine learning

Content-based image and video analysis. Machine learning Content-based image and video analysis Machine learning for multimedia retrieval 04.05.2009 What is machine learning? Some problems are very hard to solve by writing a computer program by hand Almost all

More information

k-means Clustering Todd W. Neller Gettysburg College Laura E. Brown Michigan Technological University

k-means Clustering Todd W. Neller Gettysburg College Laura E. Brown Michigan Technological University k-means Clustering Todd W. Neller Gettysburg College Laura E. Brown Michigan Technological University Outline Unsupervised versus Supervised Learning Clustering Problem k-means Clustering Algorithm Visual

More information

Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure

Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure BIOINFORMATICS ORIGINAL PAPER Vol. 22 no. 22 2006, pages 2753 2760 doi:10.1093/bioinformatics/btl475 Structural bioinformatics Support vector machine learning from heterogeneous data: an empirical analysis

More information

Iteratively Re-weighted Least Squares for Sums of Convex Functions

Iteratively Re-weighted Least Squares for Sums of Convex Functions Iteratively Re-weighted Least Squares for Sums of Convex Functions James Burke University of Washington Jiashan Wang LinkedIn Frank Curtis Lehigh University Hao Wang Shanghai Tech University Daiwei He

More information

Feature Selection in Learning Using Privileged Information

Feature Selection in Learning Using Privileged Information November 18, 2017 ICDM 2017 New Orleans Feature Selection in Learning Using Privileged Information Rauf Izmailov, Blerta Lindqvist, Peter Lin rizmailov@vencorelabs.com Phone: 908-748-2891 Agenda Learning

More information

APPROXIMATING PDE s IN L 1

APPROXIMATING PDE s IN L 1 APPROXIMATING PDE s IN L 1 Veselin Dobrev Jean-Luc Guermond Bojan Popov Department of Mathematics Texas A&M University NONLINEAR APPROXIMATION TECHNIQUES USING L 1 Texas A&M May 16-18, 2008 Outline 1 Outline

More information

Warped Mixture Models

Warped Mixture Models Warped Mixture Models Tomoharu Iwata, David Duvenaud, Zoubin Ghahramani Cambridge University Computational and Biological Learning Lab March 11, 2013 OUTLINE Motivation Gaussian Process Latent Variable

More information