Support vector machine prediction of signal peptide cleavage site using a new class of kernels for strings


1 Support vector machine prediction of signal peptide cleavage site using a new class of kernels for strings Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan

2 Outline 1. SVM and kernel methods 2. New kernels for bioinformatics 3. Example: signal peptide cleavage site prediction

3 Part 1 SVM and kernel methods

4 Support vector machines Objects to be classified x are mapped by Φ to a feature space; the SVM finds the largest-margin separating hyperplane in that feature space

5 The kernel trick Implicit definition of the map x → Φ(x) through the kernel: K(x, y) := <Φ(x), Φ(y)> Simple kernels can represent complex Φ For a given kernel, not only SVM but also clustering, PCA, ICA... are possible in the feature space = kernel methods
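To make the trick concrete, here is a minimal sketch (a toy example of my own, not from the talk) checking numerically that the homogeneous quadratic kernel K(x, y) = <x, y>^2 on R^2 equals an ordinary dot product in an explicit 3-dimensional feature space:

```python
import math

def K(x, y):
    # homogeneous quadratic kernel on R^2: K(x, y) = <x, y>^2
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    # explicit feature map realizing K: Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, 0.5)
print(K(x, y))              # 16.0
print(dot(phi(x), phi(y)))  # 16.0 (up to float rounding)
```

The SVM only ever evaluates K, so Φ never needs to be computed explicitly; the same mechanism is what lets the string kernels below define feature spaces of enormous dimension.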

6 Kernel examples Classical kernels: polynomial, Gaussian, sigmoid... but the objects x must be vectors Exotic kernels for strings: Fisher kernel (Jaakkola and Haussler 98) Convolution kernels (Haussler 99, Watkins 99) Kernel for translation initiation site (Zien et al. 00) String kernel (Lodhi et al. 00) Spectrum kernel (Leslie et al., PSB 2002)
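As a concrete instance of a string kernel, here is a minimal sketch of the spectrum kernel (Leslie et al., PSB 2002): the inner product of k-mer count vectors. The example strings are arbitrary:

```python
from collections import Counter

def spectrum_kernel(s, t, k=3):
    # spectrum kernel: <k-mer count vector of s, k-mer count vector of t>
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(c * ct[kmer] for kmer, c in cs.items())  # absent k-mers count 0

print(spectrum_kernel("MKANAKT", "MKQSTIA", k=2))  # 1 (only "MK" is shared)
```

The implicit feature space has one coordinate per possible k-mer (20^k for proteins), yet the kernel is computed in time linear in the string lengths.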

7 Kernel engineering Use prior knowledge to build the geometry of the feature space through K(.,.)

8 Part 2 New kernels for bioinformatics

9 The problem X a set of (structured) objects p(x) a probability distribution on X How to build K(x, y) from p(x)?

10 Product kernel K_prod(x, y) = p(x) p(y) SVM = Bayesian classifier
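One way to see the "SVM = Bayesian classifier" remark: the feature map of K_prod is one-dimensional (Φ(x) = p(x)), so the Gram matrix over any sample has rank 1 and the SVM decision function reduces to a threshold on p(x). A tiny self-check (toy probabilities of my choosing) verifying the rank-1 structure via vanishing 2x2 minors:

```python
def k_prod(px, py):
    # product kernel evaluated on precomputed probabilities: K_prod(x, y) = p(x) p(y)
    return px * py

# Gram matrix over a toy sample: G[i][j] = p(x_i) p(x_j), the outer product
# of [p(x_1), ..., p(x_m)] with itself
ps = [0.1, 0.4, 0.25]
G = [[k_prod(a, b) for b in ps] for a in ps]

# rank 1 <=> every 2x2 minor vanishes
for i in range(3):
    for j in range(3):
        for k in range(3):
            for l in range(3):
                assert abs(G[i][j] * G[k][l] - G[i][l] * G[k][j]) < 1e-12
```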

11 Diagonal kernel K_diag(x, y) = p(x) δ(x, y) No learning

12 Interpolated kernel If objects are composite, x = (x_1, x_2): K(x, y) = K_diag(x_1, y_1) K_prod(x_2, y_2) = p(x_1) δ(x_1, y_1) p(x_2|x_1) p(y_2|y_1)

13 General interpolated kernel Composite objects x = (x_1, ..., x_n) A list of index subsets: V = {I_1, ..., I_v} where I_i ⊆ {1, ..., n} Interpolated kernel: K_V(x, y) = (1/|V|) Σ_{I ∈ V} K_diag(x_I, y_I) K_prod(x_{I^c}, y_{I^c})
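A brute-force evaluation of this definition, restricted (as an assumed simplification) to a product distribution over positions; the toy alphabet and marginals are mine, not from the talk:

```python
def k_interpolated(x, y, pj, V):
    """Brute-force interpolated kernel
        K_V(x, y) = (1/|V|) sum_{I in V} K_diag(x_I, y_I) K_prod(x_Ic, y_Ic)
    with K_diag(u, v) = p(u) delta(u, v), K_prod(u, v) = p(u) p(v),
    and p a product distribution p(x) = prod_j pj[j][x_j]."""
    n = len(x)
    total = 0.0
    for I in V:
        if any(x[j] != y[j] for j in I):
            continue  # delta(x_I, y_I) = 0 kills the term
        term = 1.0
        for j in I:                 # K_diag over I: p(x_I)
            term *= pj[j][x[j]]
        for j in range(n):
            if j not in I:          # K_prod over the complement: p(x_Ic) p(y_Ic)
                term *= pj[j][x[j]] * pj[j][y[j]]
        total += term
    return total / len(V)

# toy example: binary alphabet, two positions, uniform marginals
pj = [{"A": 0.5, "B": 0.5}] * 2
V = [(), (0,), (1,), (0, 1)]        # all index subsets of {0, 1}
print(k_interpolated(("A", "A"), ("A", "B"), pj, V))  # 0.046875
```

This direct sum is exponential in n when V contains all subsets, which motivates the factorization on slide 15.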

14 Rare common subparts For a given p, we have: K_V(x, y) = K_prod(x, y) · (1/|V|) Σ_{I ∈ V} δ(x_I, y_I) / p(x_I) x and y get closer in the feature space when they share rare common subparts

15 Implementation Factorization for particular choices of p(.) and V Example: V = P({1, ..., n}), the set of all subsets (|V| = 2^n), with a product distribution p(x) = Π_{j=1}^{n} p_j(x_j): implementation in O(n)
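A sketch of this O(n) case (function names and toy data are mine): with V = all 2^n index subsets and a product distribution, Σ_I δ(x_I, y_I)/p(x_I) factorizes as Π_j (1 + δ(x_j, y_j)/p_j(x_j)), so the sum over 2^n subsets collapses to a product over the n positions:

```python
def k_interpolated_fast(x, y, pj):
    """O(n) interpolated kernel for V = all 2^n index subsets and a product
    distribution p(x) = prod_j pj[j][x_j], via the factorization
        K_V(x, y) = p(x) p(y) * (1/2^n) * prod_j (1 + delta(x_j, y_j) / pj[j][x_j])."""
    n = len(x)
    p_x = p_y = factor = 1.0
    for j in range(n):
        p_x *= pj[j][x[j]]
        p_y *= pj[j][y[j]]
        # each position contributes (1 + delta/p_j) to the subset sum
        factor *= 1.0 + (1.0 / pj[j][x[j]] if x[j] == y[j] else 0.0)
    return p_x * p_y * factor / (2 ** n)

pj = [{"A": 0.5, "B": 0.5}] * 2
print(k_interpolated_fast(("A", "A"), ("A", "B"), pj))  # 0.046875
```

On the same toy input this agrees with the exponential-time sum over all subsets, while touching each position only once.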

16 Part 3 Application: SVM prediction of signal peptide cleavage site

17 Secretory pathway [Figure: the secretory pathway, mRNA → nascent protein with signal peptide → ER → Golgi → cell surface (secreted); other compartments shown: lysosome, plasma membrane, nucleus, chloroplast, mitochondrion, peroxisome, cytosol]

18 Signal peptides (positions -1 / +1 mark the cleavage site) (1) MKANAKTIIAGMIALAISHTAMA | EE... (2) MKQSTIALALLPLLFTPVTKA | RT... (3) MKATKLVLGAVILGSTLLAG | CS... (1): Leucine-binding protein, (2): Pre-alkaline phosphatase, (3): Pre-lipoprotein 6-12 hydrophobic residues (highlighted on the slide), (-3, -1): small uncharged residues

19 Experiment Challenge: classification of amino acid windows, positive if cleavage occurs between positions -1 and +1: [x_{-8}, x_{-7}, ..., x_{-1}, x_{+1}, x_{+2}] 1,418 positive examples, 65,216 negative examples Computation of a weight matrix: SVM + K_prod (naive Bayes) vs SVM + K_interpolated
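An illustrative sketch of the window extraction (not the paper's exact pipeline; the helper and the cleavage index convention are assumptions): each candidate boundary yields a window of 8 residues before and 2 after, labeled positive only at the annotated site. The example sequence is the pre-lipoprotein from slide 18, cleaved between ...LLAG and CS...:

```python
def windows(seq, cleavage_pos, left=8, right=2):
    """Slide a (left + right)-residue window over seq; the window around
    boundary `pos` covers seq[pos-left:pos+right], i.e. [x_-8, ..., x_-1, x_+1, x_+2],
    and is labeled 1 iff cleavage occurs just before index `pos`."""
    examples = []
    for pos in range(left, len(seq) - right + 1):
        window = seq[pos - left:pos + right]
        examples.append((window, 1 if pos == cleavage_pos else 0))
    return examples

# pre-lipoprotein from slide 18, cleaved between ...LLAG and CS...
seq = "MKATKLVLGAVILGSTLLAG" + "CS"
examples = windows(seq, cleavage_pos=20)
print(sum(label for _, label in examples))  # 1 positive window: "LGSTLLAGCS"
```

Scanning every boundary of every sequence this way explains the heavy class imbalance reported above (1,418 positives against 65,216 negatives).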

20 Result: ROC curves [Figure: ROC curves, false positive (%) on the x-axis (0-24) vs false negative (%) on the y-axis, comparing the interpolated kernel against the product kernel (Bayes)]

21 Conclusion

22 Conclusion Another way to derive a kernel from a probability distribution Useful when objects can be compared by comparing subparts Encouraging results on a real-world application: how to improve a weight-matrix-based classifier Future work: more application-specific kernels

23 Acknowledgements Minoru Kanehisa, and Applied Biosystems for the travel grant