Mismatch String Kernels for SVM Protein Classification
1 Mismatch String Kernels for SVM Protein Classification, by C. Leslie, E. Eskin, J. Weston, W.S. Noble. Athina Spiliopoulou, Morfoula Fragopoulou, Ioannis Konstas
2 Outline Definitions & Background Proteins Remote Homology Detection SVMs Insides of the algorithm Feature mapping Mismatch tree data structure Mismatch tree traversal Computational Efficiency Experiments Discussion
3 Proteins. Primary structure: amino acid sequence. Secondary structure. 3D structure. The same amino-acid sequence almost always folds into the same 3D structure.
4 Homologues, Remote Homologues. Amino acid sequences are subject to mutation. Structures serving an important biological function are highly conserved. Homologues: share the same ancestor and have sequence similarity > 30%. Remote homologues: share the same ancestor and have sequence similarity < 30%.
5 Protein Classification. [Figure labels: Superfamily, Family, Homologues, Remote Homologues, Non-homologues.] Homology detection: classify sequences into families. Remote homology detection: classify sequences into superfamilies.
6 Remote Homology Detection. Data available: amino-acid sequences. Remote homology detection is a great challenge due to low sequence similarity. Previous methods (generative models): pairwise sequence alignment; profiles for protein families; consensus patterns using motifs; profile Hidden Markov Models. SVM-Fisher: breakthrough for remote homology detection.
7 SVMs in Remote Homology Detection. Discriminative classifiers that learn linear decision boundaries. Explicitly model the difference between positive and negative examples. Behave and generalise well with sparse data. Input data can be mapped to a feature space. Kernel trick: explicit calculation of feature vectors can be avoided.
8 Outline Definitions & Background Proteins Remote Homology Detection SVMs Insides of the algorithm Feature mapping Mismatch tree data structure Mismatch tree traversal Computational Efficiency Experiments Discussion
9 Feature Mapping. Amino acid alphabet $A$ with $|A| = l = 20$ symbols. k-mer: a length-$k$ contiguous subsequence of a protein sequence. Feature space: the $l^k$-dimensional vector space indexed by the set of all possible k-mers from $A$.
10 Feature Mapping (cont.). Alphabet A = {A, V, L}, k = 3. Sequence: AALAAV. Its 3-mers: AAL, ALA, LAA, AAV. Feature-space coordinates shown: AAA, AAL, AAV, ALA, AVA, LAA, VAA.
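The k-mer feature mapping on this slide can be illustrated with a minimal sketch. The function names `kmers` and `spectrum_features` are hypothetical and chosen only for this example; the sketch covers the exact-match case (no mismatches) and stores counts sparsely instead of building the full $l^k$-dimensional vector.

```python
from collections import Counter


def kmers(seq, k):
    """Return all contiguous k-mers of seq, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def spectrum_features(seq, k):
    """Sparse feature vector: k-mer -> number of occurrences in seq."""
    return Counter(kmers(seq, k))


if __name__ == "__main__":
    # The slide's example: alphabet {A, V, L}, k = 3, sequence AALAAV.
    print(kmers("AALAAV", 3))
    print(spectrum_features("AALAAV", 3))
```

Coordinates for k-mers that never occur (for example AAA here) are simply absent from the `Counter`, which reads as a zero count.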
11 Mismatch String Kernel. Allows for mutations. Example: k = 3, m = 1, sequence AALAAV. Mismatch neighbourhood $N_{(3,1)}(\alpha)$ of the 3-mer $\alpha$ = AAL: AAV, AAA, LAL, VAL, AVL, ALL (and AAL itself). The feature mapping of a k-mer $\alpha$ is given by $\Phi_{(k,m)}(\alpha) = (\phi_\beta(\alpha))_{\beta \in A^k}$, where $\phi_\beta(\alpha) = 1$ if $\beta$ belongs to the neighbourhood $N_{(k,m)}(\alpha)$ and 0 otherwise.
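As a rough illustration of the mismatch neighbourhood, the sketch below enumerates every k-mer that differs from a given k-mer in at most m positions. The function name `mismatch_neighbourhood` is an assumption made for this example, not something taken from the slides.

```python
from itertools import combinations, product


def mismatch_neighbourhood(alpha, m, alphabet):
    """Return the set of k-mers within Hamming distance m of alpha."""
    k = len(alpha)
    neighbours = {alpha}
    for d in range(1, m + 1):
        # Choose which d positions may change, then try every letter there.
        for positions in combinations(range(k), d):
            for letters in product(alphabet, repeat=d):
                beta = list(alpha)
                for pos, letter in zip(positions, letters):
                    beta[pos] = letter
                neighbours.add("".join(beta))
    return neighbours


if __name__ == "__main__":
    # The slide's example: alpha = AAL, m = 1, alphabet {A, V, L}.
    print(sorted(mismatch_neighbourhood("AAL", 1, "AVL")))
```

For the slide's example this yields the six listed neighbours plus AAL itself.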
12 Mismatch String Kernel (cont.). The feature mapping of a sequence $x$ is given by $\Phi_{(k,m)}(x) = \sum_{k\text{-mers } \alpha \text{ in } x} \Phi_{(k,m)}(\alpha)$. The (k,m)-mismatch kernel is given by $K_{(k,m)}(x, y) = \langle \Phi_{(k,m)}(x), \Phi_{(k,m)}(y) \rangle$.
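The definitions of the sequence feature map and the kernel can be checked on toy inputs with a brute-force sketch. It builds the feature vector explicitly, which is exactly what the mismatch tree is meant to avoid, so it is only practical for tiny alphabets and small k. The names `hamming`, `mismatch_features` and `mismatch_kernel` are hypothetical.

```python
from collections import Counter
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_features(seq, k, m, alphabet):
    """Phi_(k,m)(seq): for each k-mer beta, how many k-mers of seq lie within m mismatches of beta."""
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    phi = Counter()
    for i in range(len(seq) - k + 1):
        alpha = seq[i:i + k]
        for beta in all_kmers:
            if hamming(alpha, beta) <= m:
                phi[beta] += 1
    return phi


def mismatch_kernel(x, y, k, m, alphabet):
    """K_(k,m)(x, y) as the inner product of the two explicit feature vectors."""
    phi_x = mismatch_features(x, k, m, alphabet)
    phi_y = mismatch_features(y, k, m, alphabet)
    return sum(count * phi_y[beta] for beta, count in phi_x.items())


if __name__ == "__main__":
    print(mismatch_kernel("AAL", "AAV", 3, 1, "AVL"))
```

With m = 1 the 3-mers AAL and AAV share the neighbours AAA, AAL and AAV, so the kernel value is nonzero even though the two 3-mers are not identical; with m = 0 it would be zero.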
13 Mismatch Tree: An Efficient Data Structure. Representation of the feature space as a tree. Depth of tree: k. Number of branches of each internal node: $|A| = l$. Label of each branch: a symbol from A.
14 Mismatch Tree: An Efficient Data Structure (cont.). Alphabet A = {A, V, L}, k = 3. [Figure: tree whose root and internal nodes each branch on A, V, L. Internal nodes: prefix of a k-mer (e.g. AA, AV, AL). Leaf nodes: fixed k-mers (e.g. AAA, AAV, AAL).]
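15 Mismatch Tree Traversal (DFS). Sequence: AALA, k = 3, m = [value not legible]. [Figure: depth-first traversal of the mismatch tree, with each k-mer instance of the sequence annotated by its mismatch count (0 or 1) along the current path.] Kernel update at a leaf: $K(x, y) \leftarrow K(x, y) + \mathrm{count}(x) \cdot \mathrm{count}(y)$.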
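The traversal can be sketched as a recursive depth-first search in which the tree is never stored: each call carries only the k-mer instances that are still within m mismatches of the current prefix. This is a simplified illustration under that assumption, not the authors' implementation, and the name `mismatch_kernel_matrix` is hypothetical.

```python
def mismatch_kernel_matrix(seqs, k, m, alphabet):
    """Compute the unnormalized (k, m)-mismatch kernel matrix by DFS over prefixes."""
    n = len(seqs)
    K = [[0] * n for _ in range(n)]
    # One entry per k-mer instance: (sequence index, start offset, mismatches so far).
    instances = [(s, i, 0)
                 for s, seq in enumerate(seqs)
                 for i in range(len(seq) - k + 1)]

    def dfs(depth, live):
        if not live:
            return  # no instance survives: prune this subtree
        if depth == k:
            # Leaf: count surviving instances per sequence, then update K.
            counts = [0] * n
            for s, _, _ in live:
                counts[s] += 1
            for a in range(n):
                if counts[a]:
                    for b in range(n):
                        K[a][b] += counts[a] * counts[b]
            return
        for letter in alphabet:
            survivors = []
            for s, i, mism in live:
                new_mism = mism + (seqs[s][i + depth] != letter)
                if new_mism <= m:
                    survivors.append((s, i, new_mism))
            dfs(depth + 1, survivors)

    dfs(0, instances)
    return K


if __name__ == "__main__":
    print(mismatch_kernel_matrix(["AAL", "AAV"], 3, 1, "AVL"))
```

Pruning empty branches is what keeps the traversal far smaller than the full tree when m is small.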
16 Outline Definitions & Background Proteins Remote Homology Detection SVMs Insides of the algorithm Feature mapping Mismatch tree data structure Mismatch tree traversal Computational Efficiency Experiments Discussion
17 Efficiency: Space Complexity. No need to store the entire tree. For k = [value missing]: [value missing] billion nodes! No need to store all feature vectors: kernel trick!
18 Efficiency: Time Complexity. A fixed k-mer $\alpha$ has $O(k^m l^m)$ k-mers in its neighbourhood. $N = Mn$, where M is the number of sequences, n is the length of each sequence, and N is the total length of the dataset. Whole dataset: $O(N k^m l^m)$ k-mers. Worst case: perform $O(M^2)$ updates to the kernel matrix. Overall running complexity: $O(M^2 n k^m l^m)$.
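To give a feel for the $O(k^m l^m)$ bound, the exact neighbourhood size can be computed by counting, for each number of mismatches i up to m, the ways to choose i positions and replace each with one of the other $l - 1$ letters. The function name `neighbourhood_size` and the parameter values in the usage lines are illustrative assumptions.

```python
from math import comb


def neighbourhood_size(k, m, l):
    """Exact number of k-mers within m mismatches of a fixed k-mer over an alphabet of size l."""
    return sum(comb(k, i) * (l - 1) ** i for i in range(m + 1))


if __name__ == "__main__":
    # Toy alphabet from the slides: k = 3, m = 1, l = 3.
    print(neighbourhood_size(3, 1, 3))
    # Amino acid alphabet (l = 20) with illustrative k = 5, m = 1,
    # shown next to the bound k**m * l**m for comparison.
    print(neighbourhood_size(5, 1, 20), 5 ** 1 * 20 ** 1)
```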
19 System Pipeline. Training phase: compute the kernel matrix for all the training sequences; normalize (divide by the length of the vectors); train the SVM classifier; compute and store the k-mer scores of the support vectors. Testing phase: compute the feature vector for each test datum and predict its class in linear time: $f(x) = \sum_{i=1}^{r} y_i a_i \langle \Phi_{(k,m)}(x_i), \Phi_{(k,m)}(x) \rangle + b$.
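Two steps of the pipeline lend themselves to a short sketch: normalizing the kernel matrix so that every feature vector has unit length, and evaluating the decision function from kernel values. The function names, and the labels, weights and bias in the usage lines, are made up for illustration; in practice the weights $a_i$ and the bias $b$ come from SVM training.

```python
import math


def normalize_kernel(K):
    """Return K'[i][j] = K[i][j] / sqrt(K[i][i] * K[j][j])."""
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)]
            for i in range(n)]


def decision_value(kernel_values, labels, alphas, b):
    """f(x) = sum_i y_i * a_i * K(x_i, x) + b over the support vectors."""
    return sum(y * a * kv
               for y, a, kv in zip(labels, alphas, kernel_values)) + b


if __name__ == "__main__":
    K = [[7, 3], [3, 7]]  # toy unnormalized kernel matrix
    Kn = normalize_kernel(K)
    print(Kn)
    # Hypothetical support-vector labels, weights and bias.
    print(decision_value(Kn[0], labels=[1, -1], alphas=[0.5, 0.5], b=0.0))
```

A positive decision value assigns the test sequence to the positive class.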
20 Experiments. Benchmark dataset designed by Jaakkola et al. from the SCOP database. 33 families. [Table with columns Superfamily, Family, Pos. Train, Pos. Test, Negative Train; rows not preserved.]
21 Experiments (cont.). Comparison to other methods: PSI-BLAST (mainly used for homology detection); SAM-T98; Fisher-SVM (the state of the art).
22 ROC Curve and ROC Scores. [Figure: two example ROC curves, TP against FP, both axes from 0 to 1, labelled 0.8 and 0.7.] The ROC score is the area under the curve.
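The area under the ROC curve equals the fraction of (positive, negative) pairs that the classifier ranks correctly, with ties counted as half. A minimal sketch under that formulation follows; the function name `roc_score` and the scores in the usage lines are invented for illustration.

```python
def roc_score(scores, labels):
    """Area under the ROC curve via pairwise ranking; labels use 1 for the positive class."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    # A pair counts 1 if the positive outranks the negative, 0.5 on a tie.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier outputs for two positives and two negatives.
    print(roc_score([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

A score of 1 means every positive is ranked above every negative; 0.5 corresponds to random ranking.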
23 Comparison of all methods
24 Family-by-family Comparison
25 Discussion. Mismatch-SVM performs as well as the Fisher-SVM method. Mismatch-SVM is much more efficient. Efficiency is an important issue for large real-world datasets and for multi-class prediction. Accuracy could be increased by incorporating biological knowledge.
26 Questions?