Designing and combining kernels: some lessons learned from bioinformatics
1 Designing and combining kernels: some lessons learned from bioinformatics. Jean-Philippe Vert, Mines ParisTech & Institut Curie. NIPS MKL workshop, Dec 12, 2009. Jean-Philippe Vert (ParisTech) Designing and combining kernels NIPS MKL workshop 09 1 / 26
2 Kernels are very popular in bioinformatics. Why? Many problems can be approached by kernel methods (classification, regression, feature construction, ...). Many data with particular structures: kernel design. Need to integrate heterogeneous data: kernel combination.
3 Outline: 1 Kernel design, 2 Kernel combination, 3 Conclusion.
5 What is a GOOD kernel? Leads to good performance. Mathematically valid. Fast to compute. Interpretable model (?). [Figure: strings x in X mapped by φ into a feature space F.]
6 How to MAKE a good kernel? 3 main ideas. 1 Define good features: K(x, x') = Φ(x)^T Φ(x'). 2 Define a good metric: d(x, x')^2 = K(x, x) + K(x', x') - 2 K(x, x'). 3 Define a good functional penalty: min_{f ∈ H} { R(f) + λ ||f||_H^2 }.
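The metric in idea 2 can be computed directly from any valid kernel. A minimal sketch in Python, using a Gaussian RBF kernel purely as an illustrative choice (the kernel and the test points are not from the slides):

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Gaussian RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)
    x, y = np.asarray(x), np.asarray(y)
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def kernel_distance(k, x, y):
    # Hilbert metric induced by a kernel:
    # d(x, y)^2 = K(x, x) + K(y, y) - 2 K(x, y)
    d2 = k(x, x) + k(y, y) - 2.0 * k(x, y)
    return float(np.sqrt(max(d2, 0.0)))  # clamp tiny negative rounding errors

d = kernel_distance(rbf_kernel, [0.0, 0.0], [1.0, 0.0])
```

Since K(x, x) = 1 for every x with the RBF kernel, the induced distance is always below sqrt(2).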
9 Idea 1: define good features. Motivation: estimate a function f(x) = w^T Φ(x); a good feature is more important than a good algorithm! Examples: explicit feature computation (substring or subgraph indexation; Fisher kernel Φ(x) = ∇_θ log P_θ(x)); implicit feature construction + kernel trick (walk-based graph kernels; mutual information kernels K(x, x') = ∫ P_θ(x) P_θ(x') dθ). Caveat: one good feature among too many irrelevant ones may not be enough with L2 regularization.
12 Example: string kernel with substring indexation. Index the feature space by fixed-length strings, i.e., Φ(x) = (Φ_u(x))_{u ∈ A^k}, where Φ_u(x) can be: the number of occurrences of u in x (without gaps): spectrum kernel (Leslie et al., 2002); the number of occurrences of u in x up to m mismatches (without gaps): mismatch kernel (Leslie et al., 2004); the number of occurrences of u in x allowing gaps, with a weight decaying exponentially with the number of gaps: substring kernel (Lodhi et al., 2002).
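The spectrum kernel has a direct implementation: count every contiguous k-mer in each string and take the inner product of the two count vectors. A minimal sketch (the value of k and the example strings are arbitrary choices):

```python
from collections import Counter

def spectrum_features(x, k=3):
    # Phi_u(x): number of occurrences (no gaps) of each k-mer u in x.
    return Counter(x[i:i + k] for i in range(len(x) - k + 1))

def spectrum_kernel(x, y, k=3):
    # K(x, y) = <Phi(x), Phi(y)> over the feature space indexed by A^k.
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    return sum(count * fy[u] for u, count in fx.items())

K = spectrum_kernel("CGGSLIAMM", "CLIVMMNRLMWFGV", k=2)  # shares LI and MM
```

Only k-mers actually present in the strings are stored, so the exponentially large feature space A^k is never materialized.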
13 Idea 2: define a good metric. Motivation: a kernel defines a Hilbert metric d(x, x')^2 = K(x, x) + K(x', x') - 2 K(x, x'), and the functions we can learn are smooth w.r.t. this metric: |f(x) - f(x')| ≤ ||f||_H d(x, x'). Examples: edit distances for strings or graphs, local alignment of biological sequences, graph matching distances, MAMMOTH distance between protein 3D structures. Caveat: most "good" distances are not Hilbertian.
16 Example: local alignment kernel. How to compare two protein sequences x1 = CGGSLIAMMWFGV and x2 = CLIVMMNRLMWFGV? Find a good alignment π:
CGGSLIAMM----WFGV
C---LIVMMNRLMWFGV
Two non-Hilbertian metrics: SW(x, y) := max_{π ∈ Π(x, y)} s(x, y, π), and LA^(β)(x, y) := log Σ_{π ∈ Π(x, y)} exp(β s(x, y, π)).
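The Smith-Waterman score SW(x, y) is computed by a small dynamic program over all local alignments. The sketch below is a toy version: it uses a flat match/mismatch score and a linear gap penalty instead of the BLOSUM substitution matrix and affine gaps used for real protein sequences; the local alignment kernel replaces the max over alignments with the log-sum-exp soft maximum above:

```python
def smith_waterman(x, y, match=2.0, mismatch=-1.0, gap=-1.0):
    # SW(x, y): score of the best-scoring local alignment between x and y,
    # with toy match/mismatch scores and a linear gap penalty.
    n, m = len(x), len(y)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(0.0,                  # start a new local alignment
                          H[i - 1][j - 1] + s,  # align x[i-1] with y[j-1]
                          H[i - 1][j] + gap,    # gap in y
                          H[i][j - 1] + gap)    # gap in x
            best = max(best, H[i][j])
    return best

sw = smith_waterman("CGGSLIAMMWFGV", "CLIVMMNRLMWFGV")
```

The max in the recursion is exactly what makes SW non-Hilbertian; summing exp(β s) over all alignments instead yields a valid kernel.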
18 Idea 3: define a good penalty function. Motivation: the kernel constrains the set of functions over which we optimize (balls in the RKHS); we may first define a penalty we like, then find the associated kernel. Examples: graph Laplacian over gene networks; cluster kernel for protein remote homology detection. Caveat: some penalties may not be RKHS norms (e.g., total variation to estimate piecewise constant functions).
21 Example: kernel on a graph. Laplacian-based kernel: the set H = { f ∈ R^m : Σ_{i=1}^m f_i = 0 }, endowed with the norm Ω(f) = Σ_{i∼j} (f(x_i) - f(x_j))^2, is an RKHS whose reproducing kernel is the pseudo-inverse of the graph Laplacian.
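A minimal sketch of this construction: build the Laplacian L = D - A from an adjacency matrix and take its Moore-Penrose pseudo-inverse (the 4-node path graph is an arbitrary toy example, not from the slides):

```python
import numpy as np

def laplacian_kernel(adjacency):
    # Reproducing kernel on the graph nodes: pseudo-inverse of the
    # graph Laplacian L = D - A (D = diagonal degree matrix).
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.pinv(L)

# Path graph on 4 nodes: 0 - 1 - 2 - 3
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
K = laplacian_kernel(A)
```

Functions with small penalty Ω(f) = f^T L f vary slowly along edges, so neighboring nodes (e.g., interacting genes) receive similar predicted values.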
22 The choice of kernel makes a difference. [Figure: number of families with given performance vs. ROC50 score for SVM-LA, SVM-pairwise, SVM-Mismatch and SVM-Fisher on the SCOP superfamily recognition benchmark.]
23 Outline: 1 Kernel design, 2 Kernel combination, 3 Conclusion.
24 Motivation. We can imagine plenty of kernels for a given application: different kernels for the same data (e.g., different string kernels); kernels for different types of data (e.g., integrating strings and 3D structures for protein classification). Which one to use? Perhaps we can combine them to do better than each one individually?
25 Sum kernels. Consider p kernels K_1, ..., K_p and form the sum (e.g., Pavlidis et al., 2002): K = Σ_{i=1}^p K_i. Equivalently, concatenate the features of the different kernels. Equivalently, work in the RKHS H = H_1 ⊕ ... ⊕ H_p with ||f||_H^2 = inf_{f = f_1 + ... + f_p} Σ_{i=1}^p ||f_i||_{H_i}^2.
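Summing kernels amounts to summing Gram matrices, which is the same as concatenating the feature maps of the base kernels. A minimal numpy sketch with two arbitrary base kernels (linear and RBF; the data points are illustrative):

```python
import numpy as np

def gram(X, kernel):
    # Gram matrix K[i, j] = kernel(X[i], X[j]) for a list of points.
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

def k_lin(x, y):
    return float(np.dot(x, y))

def k_rbf(x, y):
    return float(np.exp(-np.sum((x - y) ** 2)))

X = [np.array([0.0, 1.0]), np.array([1.0, 0.0]), np.array([1.0, 1.0])]
# Sum kernel: add the Gram matrices (= concatenate the two feature maps)
K_sum = gram(X, k_lin) + gram(X, k_rbf)
```

The sum of two positive semidefinite matrices is positive semidefinite, so K_sum is again a valid kernel.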
26 Some early work.
27 Huge improvements can be observed. [Figure: two ROC curves (ratio of true positives vs. ratio of false positives) comparing individual kernels (exp, loc, phy, y2h, int) with their integration.]
28 Multiple kernel learning (MKL). Form the convex combination K = Σ_{i=1}^p η_i K_i, where the weights η_i ≥ 0 are chosen to minimize the following convex function under the constraint tr(K) = 1 (Lanckriet et al., 2003): h(K) = inf_{f ∈ H_K} { R(f) + λ ||f||_{H_K} }. Equivalently, work in the space H = H_1 ⊕ ... ⊕ H_p with the non-Hilbertian group L1 norm (Bach et al., 2004): ||f||_H = inf_{f = f_1 + ... + f_p} Σ_{i=1}^p ||f_i||_{H_i}.
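Full MKL solves a convex program jointly over f and the weights η. As a hedged illustration only, the two-kernel case can be approximated by a grid search over η, scoring each candidate combination with the closed-form value of the kernel ridge regression objective, λ y^T (K + nλI)^{-1} y (the data, λ, and the squared loss are illustrative choices, not the SVM loss of the original papers):

```python
import numpy as np

def ridge_objective(K, y, lam):
    # Value of min_f (1/n) * sum_i (y_i - f(x_i))^2 + lam * ||f||_H^2,
    # which for kernel ridge regression equals lam * y^T (K + n*lam*I)^{-1} y.
    n = len(y)
    return float(lam * y @ np.linalg.solve(K + n * lam * np.eye(n), y))

def best_convex_combination(K1, K2, y, lam=0.1, steps=101):
    # Grid search over eta in [0, 1] for K = eta*K1 + (1-eta)*K2,
    # keeping the combination with the smallest regularized risk.
    etas = np.linspace(0.0, 1.0, steps)
    vals = [ridge_objective(e * K1 + (1 - e) * K2, y, lam) for e in etas]
    return float(etas[int(np.argmin(vals))])

y = np.array([1.0, -1.0, 1.0, -1.0])
K1 = np.outer(y, y)   # informative kernel, aligned with the labels
K2 = np.eye(4)        # uninformative kernel
eta = best_convex_combination(K1, K2, y)  # puts all weight on K1
```

On this toy problem the objective decreases monotonically in η, so the grid search pushes all the weight onto the informative kernel, mimicking the sparsity that the L1 norm induces in MKL.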
29 Example: Lanckriet et al. (2004).
30 MKL or sum kernel for protein network inference? [Figure: precision vs. recall curves for the individual kernels (K1 expression, K2 localization, K3 phylogenetic), their sum, and MKL.]
33 Why MKL does not estimate a good kernel combination.
37 Sometimes MKL works: subcellular protein classification from 69 kernels.
38 MKL or sum kernel? Sum is simpler and works better to combine well-engineered kernels (e.g., for data integration). In spite of its misleading name, MKL is better suited for kernel selection than for weight optimization (l2 vs. l1); it is useful to select among large sets of kernels. We would love to be able to select the "optimal" linear combination of a few kernels.
39 Outline: 1 Kernel design, 2 Kernel combination, 3 Conclusion.
40 Conclusion. Are kernels popular and useful in bioinformatics? YES. Is kernel design useful? YES, and we have many tricks for that. Is kernel combination useful for performance? YES, and the sum kernel does a good job. Is MKL useful? Hardly yet, but it offers the promising possibility to work with MANY kernels and emphasize INTERPRETABILITY. Do we want to learn good linear combinations? YES.
More informationManifolds Pt. 1. PHYS Southern Illinois University. August 24, 2016
Manifolds Pt. 1 PHYS 500 - Southern Illinois University August 24, 2016 PHYS 500 - Southern Illinois University Manifolds Pt. 1 August 24, 2016 1 / 20 Motivating Example: Pendulum Example Consider a pendulum
More informationA popular method for moving beyond linearity. 2. Basis expansion and regularization 1. Examples of transformations. Piecewise-polynomials and splines
A popular method for moving beyond linearity 2. Basis expansion and regularization 1 Idea: Augment the vector inputs x with additional variables which are transformation of x use linear models in this
More informationCS6375: Machine Learning Gautam Kunapuli. Mid-Term Review
Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes
More informationStephen Scott.
1 / 33 sscott@cse.unl.edu 2 / 33 Start with a set of sequences In each column, residues are homolgous Residues occupy similar positions in 3D structure Residues diverge from a common ancestral residue
More informationSpectral Clustering of Biological Sequence Data
Spectral Clustering of Biological Sequence Data William Pentney Department of Computer Science and Engineering University of Washington bill@cs.washington.edu Marina Meila Department of Statistics University
More informationBLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio. 1990. CS 466 Saurabh Sinha Motivation Sequence homology to a known protein suggest function of newly sequenced protein Bioinformatics
More information12 Classification using Support Vector Machines
160 Bioinformatics I, WS 14/15, D. Huson, January 28, 2015 12 Classification using Support Vector Machines This lecture is based on the following sources, which are all recommended reading: F. Markowetz.
More informationExpectation Maximization!
Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University and http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt Steps in Clustering Select Features
More informationSupport Vector Machines (SVM)
Support Vector Machines a new classifier Attractive because (SVM) Has sound mathematical foundations Performs very well in diverse and difficult applications See paper placed on the class website Review
More informationAll lecture slides will be available at CSC2515_Winter15.html
CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 9: Support Vector Machines All lecture slides will be available at http://www.cs.toronto.edu/~urtasun/courses/csc2515/ CSC2515_Winter15.html Many
More informationMachine Learning Classifiers and Boosting
Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve
More informationSpicyMKL Efficient multiple kernel learning method using dual augmented Lagrangian
SpicyMKL Efficient multiple kernel learning method using dual augmented Lagrangian Taiji Suzuki Ryota Tomioka The University of Tokyo Graduate School of Information Science and Technology Department of
More informationEnergy Minimization for Segmentation in Computer Vision
S * = arg S min E(S) Energy Minimization for Segmentation in Computer Vision Meng Tang, Dmitrii Marin, Ismail Ben Ayed, Yuri Boykov Outline Clustering/segmentation methods K-means, GrabCut, Normalized
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 20: 10/12/2015 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More information9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology
9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example
More informationRanking on Graph Data
Shivani Agarwal SHIVANI@CSAIL.MIT.EDU Computer Science and AI Laboratory, Massachusetts Institute of Technology, Cambridge, MA 0139, USA Abstract In ranking, one is given examples of order relationships
More informationIntroduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others
Introduction to object recognition Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Overview Basic recognition tasks A statistical learning approach Traditional or shallow recognition
More informationContent-based image and video analysis. Machine learning
Content-based image and video analysis Machine learning for multimedia retrieval 04.05.2009 What is machine learning? Some problems are very hard to solve by writing a computer program by hand Almost all
More informationk-means Clustering Todd W. Neller Gettysburg College Laura E. Brown Michigan Technological University
k-means Clustering Todd W. Neller Gettysburg College Laura E. Brown Michigan Technological University Outline Unsupervised versus Supervised Learning Clustering Problem k-means Clustering Algorithm Visual
More informationSupport vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure
BIOINFORMATICS ORIGINAL PAPER Vol. 22 no. 22 2006, pages 2753 2760 doi:10.1093/bioinformatics/btl475 Structural bioinformatics Support vector machine learning from heterogeneous data: an empirical analysis
More informationIteratively Re-weighted Least Squares for Sums of Convex Functions
Iteratively Re-weighted Least Squares for Sums of Convex Functions James Burke University of Washington Jiashan Wang LinkedIn Frank Curtis Lehigh University Hao Wang Shanghai Tech University Daiwei He
More informationFeature Selection in Learning Using Privileged Information
November 18, 2017 ICDM 2017 New Orleans Feature Selection in Learning Using Privileged Information Rauf Izmailov, Blerta Lindqvist, Peter Lin rizmailov@vencorelabs.com Phone: 908-748-2891 Agenda Learning
More informationAPPROXIMATING PDE s IN L 1
APPROXIMATING PDE s IN L 1 Veselin Dobrev Jean-Luc Guermond Bojan Popov Department of Mathematics Texas A&M University NONLINEAR APPROXIMATION TECHNIQUES USING L 1 Texas A&M May 16-18, 2008 Outline 1 Outline
More informationWarped Mixture Models
Warped Mixture Models Tomoharu Iwata, David Duvenaud, Zoubin Ghahramani Cambridge University Computational and Biological Learning Lab March 11, 2013 OUTLINE Motivation Gaussian Process Latent Variable
More information