1 Support vector machine prediction of signal peptide cleavage site using a new class of kernels for strings Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan
2 Outline 1. SVM and kernel methods 2. New kernels for bioinformatics 3. Example: signal peptide cleavage site prediction
3 Part 1 SVM and kernel methods
4 Support vector machines
Objects x to be classified are mapped by φ to a feature space.
The SVM finds the largest-margin separating hyperplane in that feature space.
5 The kernel trick
Implicit definition of Φ through the kernel: K(x, y) := ⟨Φ(x), Φ(y)⟩
Simple kernels can represent complex maps Φ
For a given kernel, not only SVM but also clustering, PCA, ICA... become possible in the feature space = kernel methods
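An illustration of the trick (not from the slides): for the homogeneous polynomial kernel K(x, y) = ⟨x, y⟩², the feature map Φ onto degree-2 monomials is explicit, and the kernel computes the feature-space inner product without ever building Φ(x). A minimal numpy check:

```python
import numpy as np

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: K(x, y) = <x, y>^2."""
    return np.dot(x, y) ** 2

def phi(x):
    """Explicit feature map for poly2_kernel on R^2:
    (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

# The two computations agree: the kernel evaluates the inner product
# in the feature space without constructing Phi explicitly.
assert np.isclose(poly2_kernel(x, y), np.dot(phi(x), phi(y)))
```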
6 Kernel examples
Classical kernels: polynomial, Gaussian, sigmoid... but the objects x must be vectors
Exotic kernels for strings:
- Fisher kernel (Jaakkola and Haussler 98)
- Convolution kernels (Haussler 99, Watkins 99)
- Kernel for translation initiation site (Zien et al. 00)
- String kernel (Lodhi et al. 00)
- Spectrum kernel (Leslie et al., PSB 2002)
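As an aside (not from the slides): the simplest of these string kernels, the k-spectrum kernel of Leslie et al., compares two strings through the k-mers they share. A minimal sketch, ignoring normalization and mismatch tolerance:

```python
from collections import Counter

def spectrum_kernel(s, t, k=3):
    """k-spectrum kernel: inner product of k-mer count vectors.
    Phi(s) indexes every length-k substring of s by its count."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    # Sum over k-mers present in both strings.
    return sum(cs[kmer] * ct[kmer] for kmer in cs if kmer in ct)

print(spectrum_kernel("MKANAKTIIAGM", "MKQSTIALALLP", k=2))
```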
7 Kernel engineering Use prior knowledge to build the geometry of the feature space through K(.,.)
8 Part 2 New kernels for bioinformatics
9 The problem
X: a set of (structured) objects
p(x): a probability distribution on X
How to build K(x, y) from p(x)?
10 Product kernel
K_prod(x, y) = p(x) p(y)
(Figure: a one-dimensional feature space; each object x is mapped to the point Φ(x) = p(x), so x and y sit at p(x) and p(y) on a single axis.)
SVM with K_prod = Bayesian classifier
11 Diagonal kernel
K_diag(x, y) = p(x) δ(x, y)
(Figure: distinct objects x, y, z map to mutually orthogonal directions of squared lengths p(x), p(y), p(z).)
No learning: every object is orthogonal to every other, so nothing transfers between examples
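Both base kernels in code, for a toy distribution stored as a dict (the objects and probabilities here are hypothetical; the slides define the kernels abstractly):

```python
def k_prod(x, y, p):
    """Product kernel: K(x, y) = p(x) * p(y).
    One-dimensional feature space, Phi(x) = p(x)."""
    return p[x] * p[y]

def k_diag(x, y, p):
    """Diagonal kernel: K(x, y) = p(x) if x == y else 0.
    Every object sits on its own orthogonal axis: no generalization."""
    return p[x] if x == y else 0.0

p = {"AA": 0.4, "AB": 0.3, "BA": 0.2, "BB": 0.1}  # toy distribution
print(k_prod("AA", "AB", p))   # 0.12
print(k_diag("AA", "AA", p))   # 0.4
print(k_diag("AA", "AB", p))   # 0.0
```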
12 Interpolated kernel
If objects are composite, x = (x_1, x_2):
K(x, y) = K_diag(x_1, y_1) K_prod(x_2, y_2) = p(x_1) δ(x_1, y_1) p(x_2 | x_1) p(y_2 | y_1)
(Figure: the two-letter strings AA, AB, BA, BB in the feature space, grouped into two orthogonal clusters A* and B* by their first letter.)
13 General interpolated kernel
Composite objects x = (x_1, ..., x_n)
A list of index subsets: V = {I_1, ..., I_v}, where each I_i ⊆ {1, ..., n}
Interpolated kernel:
K_V(x, y) = (1/|V|) Σ_{I ∈ V} K_diag(x_I, y_I) K_prod(x_{I^c}, y_{I^c})
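A direct, exponential-time transcription of this definition, assuming (as slide 12 suggests) that K_prod on the complement uses the distribution conditioned on the indexed subpart, which reduces to a plain product for the product distributions used here. A sketch, not the paper's code:

```python
from itertools import chain, combinations

def all_subsets(n):
    """V = P({1,...,n}): every subset of the n index positions."""
    return list(chain.from_iterable(combinations(range(n), r)
                                    for r in range(n + 1)))

def k_interpolated_naive(x, y, p, V):
    """Direct sum over V: K_V(x,y) = (1/|V|) * sum_{I in V}
    K_diag(x_I, y_I) * K_prod(x_{I^c}, y_{I^c}).
    p is a product distribution: p[j][a] = p_j(a)."""
    total = 0.0
    for I in V:
        term = 1.0
        for j in range(len(x)):
            if j in I:
                # diagonal part on I: p_j(x_j) if letters match, else 0
                term *= p[j][x[j]] if x[j] == y[j] else 0.0
            else:
                # product part on the complement I^c
                term *= p[j][x[j]] * p[j][y[j]]
        total += term
    return total / len(V)

p = [{"A": 0.6, "B": 0.4}] * 3          # toy product distribution
print(k_interpolated_naive("AAB", "AAA", p, all_subsets(3)))
```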
14 Rare common subparts
For a given p(x) and p(y), we have:
K_V(x, y) = K_prod(x, y) · (1/|V|) Σ_{I ∈ V} δ(x_I, y_I) / p(x_I)
x and y get closer in the feature space when they share rare common subparts
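A one-term check of this identity, under the same conditional convention as slide 12: for a single I ∈ V,

```latex
K_{\mathrm{diag}}(x_I, y_I)\, K_{\mathrm{prod}}(x_{I^c}, y_{I^c})
  = p(x_I)\,\delta(x_I, y_I)\; p(x_{I^c} \mid x_I)\, p(y_{I^c} \mid y_I)
  = \delta(x_I, y_I)\, \frac{p(x)\, p(y)}{p(x_I)}
  = K_{\mathrm{prod}}(x, y)\, \frac{\delta(x_I, y_I)}{p(x_I)},
```

since the δ term forces y_I = x_I; averaging over I ∈ V gives the displayed formula.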
15 Implementation
Factorization for particular choices of p(·) and V
Example: V = P({1, ..., n}), the set of all subsets, so |V| = 2^n, with a product distribution p(x) = Π_{j=1}^n p_j(x_j)
Then the 2^n-term sum factorizes, giving an implementation in O(n):
K_V(x, y) = Π_{j=1}^n (1/2) [ p_j(x_j) p_j(y_j) + p_j(x_j) δ(x_j, y_j) ]
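In code, a sketch of this factorization (same product-distribution assumption as the naive version above); each per-position factor merges the "j ∈ I" (diagonal) and "j ∉ I" (product) cases:

```python
def k_interpolated_fast(x, y, p):
    """O(n) form of K_V for V = all subsets and product distribution p:
    K_V(x,y) = prod_j (1/2) * [ p_j(x_j)*p_j(y_j) + p_j(x_j)*delta(x_j,y_j) ]."""
    k = 1.0
    for j in range(len(x)):
        match = p[j][x[j]] if x[j] == y[j] else 0.0
        k *= 0.5 * (p[j][x[j]] * p[j][y[j]] + match)
    return k

p = [{"A": 0.6, "B": 0.4}] * 3
print(k_interpolated_fast("AAB", "AAA", p))
```

On this toy input the value (0.027648) matches the 2^n-term sum from the earlier sketch.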
16 Part 3 Application: SVM prediction of signal peptide cleavage site
17 Secretory pathway
(Figure: schematic of the secretory pathway; from mRNA, the nascent protein's signal peptide directs it into the ER, then through the Golgi to destinations including the cell surface (secreted), lysosome, and plasma membrane; other labeled compartments: nucleus, chloroplast, mitochondrion, peroxisome, cytosol.)
18 Signal peptides
Three example proteins, with the cleavage site (between positions -1 and +1) marked by the space:
(1) MKANAKTIIAGMIALAISHTAMA EE...
(2) MKQSTIALALLPLLFTPVTKA RT...
(3) MKATKLVLGAVILGSTLLAG CS...
(1): Leucine-binding protein, (2): Pre-alkaline phosphatase, (3): Pre-lipoprotein
Typical features: 6-12 hydrophobic residues (highlighted in yellow on the original slide); small uncharged residues at positions (-3, -1)
19 Experiment
Challenge: classification of amino-acid windows, positive if cleavage occurs between positions -1 and +1: [x_{-8}, x_{-7}, ..., x_{-1}, x_{+1}, x_{+2}]
1,418 positive examples, 65,216 negative examples
Computation of a weight matrix: SVM + K_prod (naive Bayes) vs. SVM + interpolated kernel
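A hypothetical end-to-end sketch with scikit-learn (the slides do not name any software; the loader and details below are assumptions): estimate per-position residue frequencies p_j from training windows — the weight matrix — precompute the Gram matrix with the O(n) interpolated kernel above, and train an SVM with a precomputed kernel:

```python
import numpy as np
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def fit_position_probs(windows, pseudo=1.0):
    """Per-position residue frequencies p_j (a position weight matrix),
    with pseudocounts so no probability is exactly zero."""
    n = len(windows[0])
    probs = []
    for j in range(n):
        counts = {a: pseudo for a in AMINO_ACIDS}
        for w in windows:
            counts[w[j]] += 1.0
        total = sum(counts.values())
        probs.append({a: c / total for a, c in counts.items()})
    return probs

def gram_matrix(A, B, p):
    """Gram matrix K[i, j] = K_V(A[i], B[j]), reusing the O(n)
    k_interpolated_fast from the sketch above."""
    return np.array([[k_interpolated_fast(a, b, p) for b in B] for a in A])

# Hypothetical usage -- load_cleavage_windows() is not a real function:
# windows, labels = load_cleavage_windows()  # 10-mers [x_-8..x_-1, x_+1, x_+2], labels +/-1
# p = fit_position_probs(windows)
# clf = SVC(kernel="precomputed").fit(gram_matrix(windows, windows, p), labels)
# scores = clf.decision_function(gram_matrix(test_windows, windows, p))
```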
20 Result: ROC curves
(Figure: ROC curves plotting false negatives (%) against false positives (%, 0-24) for SVM + interpolated kernel and SVM + product kernel (Bayes); the interpolated kernel achieves the better trade-off.)
21 Conclusion
22 Conclusion
Another way to derive a kernel from a probability distribution
Useful when objects can be compared by comparing their subparts
Encouraging results on a real-world application: how to improve a weight-matrix-based classifier
Future work: more application-specific kernels
23 Acknowledgement Minoru Kanehisa Applied Biosystems for the travel grant