Weighted Finite-State Transducers in Computational Biology
|
|
- Marcus Wiggins
- 6 years ago
- Views:
Transcription
1 Weighted Finite-State Transducers in Computational Biology Mehryar Mohri Courant Institute of Mathematical Sciences Joint work with Corinna Cortes (Google Research). 1
2 This Tutorial Weighted finite-state transducers and software library Pairwise alignments for bioinformatics Learning weighted transducers Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 2 2
3 Biological Motivation General idea: use sequence similarity and known properties of some proteins to predict functional and structural information for other proteins. Applications: Reconstruction of long DNA sequences, Searching and mining DNA databases, Finding frequent nucleotide patterns, Determining informative sequences. Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 3 3
4 Why Weighted Finite-State Transducers? (Mohri, Pereira, and Riley, 2000) Generality Unified representation framework Single general algorithm No need to design and implement a novel algorithm for each new sequence alignment problem Just define a new alignment transducer Alignments between sets of strings, finite automata, or weighted automata (Mohri, 2003). Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 4 4
5 Why Weighted Finite-State Transducers? Efficiency Compact representation of alignment transducer Benefit of general transducer optimization algorithms General quadratic-time alignment algorithm Exploits graph representation and sparseness Most pruning ideas used in dynamic programming can be used with WFSTs as well. Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 5 5
6 Why Weighted Finite-State Transducers? Flexibility Can use any weighted finite-state transducer Can create more complex alignment transducers using general transducer algorithms, e.g., rational operations Easy to modify, e.g., tailor alignment transducer to a specific problem Graphical representation helps design and understanding Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 6 6
7 Terminology Biology sequence subsequence Computer science: Computer Science string factor - substring Factor: contiguous symbols. Substring or subsequence: nonnecessarily contiguous symbols. Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 7 7
8 Local Edits Insertion: ε a Deletion: a ε Substitution: a b (a b) Example: 2 insertions, 3 deletions, 1 substitution cttgɛɛac ɛ taɛgtɛc This is called an alignment. Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 8 8
9 Edit-Distance - Global Alignment Let c(a, b) be the cost associated to the alignment of a with b, where a and b are in Σ {ɛ}. Problem: Find the best alignment (minimal cost) of two sequences x and y. Solution: general algorithm; Compute best path of Construct edit transducer T e using c(a, b) costs. Represent x and y by automata X and Y. X T e Y. Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 9 9
10 Global Alignment - Example Example: c(a, G) = 1, c(a, T) = c(g, C) =.5, no cost for matching symbols. Representation: A G C T A C C T G C:G/0.5 G:C/0.5 T:A/0.5 A:T/0.5 G:A/1.0 A:G/1.0 T:T/0.0 G:G/0.0 C:C/0.0 A:A/0.0 0 echo A G C T farcompilestrings >X.fsm Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 10 10
11 Global Alignment - Example Program: fsmcompose X.fsm Te.fsm Y.fsm fsmbestpath -n 1 >A.fsm Graphical representation: A:A/0 0 1 G:C/0.5 2 C:C/0 3 T:T/0 4 e:g/2 5/0 e:e/0 1 A:A/0 2 G:C/0.5 3 C:C/0 T:T/0 e:g/ /0 0 e:e/0 7 A:A/0 8 G:e/2 9 e:c/2 10 C:C/0 11 T:T/0 12 e:g/2 13/0 Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 11 11
12 Local Alignment Problem: Find the best local alignment (alignment between factors) of two sequences x and y. Motivation: ignore non-coding regions (introns). Solution: general algorithm; Construct edit transducer T e Construct factor transducer T f. Represent x and y by automata X and Y. Compute X =Project 2 (X T f ) and Y =Project 1 (T 1 f Y ). Compute best path of X T e Y. using c(a, b) costs. Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 12 12
13 Factor Transducer Definition: transducer mapping a string to the set of factors of that string. T:e G:e C:e A:e T:T G:G C:C A:A T:e G:e C:e A:e 0 e:e 1 e:e 2 Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 13 13
14 Factor Automaton of a String Program: fsmcompose X.fsm Tf.fsm fsmproject -2 fsmrmepsilon fsmdeterminize fsmminimize >Xp.fsm Representation: size linear in that of X. A A G G T C T G A G 5 T T C 4 C G Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 14 14
15 Local Alignment - Example Program: fsmcompose Xp.fsm Te.fsm Yp.fsm fsmbestpath -n1 >B.fsm Graphical representation: C:C/ T:e/1 2 G:G/ A:A/-1.5 4/0 X = T G T C T G A G Y = A C A C G A C T G A C ε G A matching cost: -1.5 insertion/deletion/substitution costs: +1 Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 15 15
16 Alignment with Gaps Problem: Find the best local or global alignment with gaps (consecutive sequence of deletions/ insertions) two sequences x and y. Motivation: ignore long gaps due to introns. Example: Gap penalties: A T G A ε ε G A C A ε ε G A T G ε C Constant: constant penalty Affine: w 0 + w 1 gap Convex: w 0 log gap. w 0 per gap. Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 16 16
17 Gap Alignment Transducer Representation: affine gap penalties. a:b/1 a:a/0 0/0 a:e/(w0 + w1) e:a/(w0 + w1) a:a/0 a:b/1 e:a/w1 a:e/w1 1/0 Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 17 17
18 More Complex Alignments Complex alignment transducers Simpler alignment transducers can be combined using rational operations to create more complex ones. Any weighted transducer can help define an alignment. Alignment transducers can be learned from examples. Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 18 18
19 Learning Alignment Transducers Problem: given a fix topology for the transducer examples., learn the transition weights from Example: (Ristad and Yianilos, 1997) learn edit costs of one-state edit-distance transducer; data: large corpus of pairs of related sequences; algorithm: use the Expectation-Maximization (EM) algorithm. T e Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 19 19
20 Alignment of Sets of Sequences Problem: Find the best alignment (minimal cost) of two sets of sequences X and Y. Note: X and Y may be finite sets of biological sequences or infinite sets, e.g., represented or modeled by HMMs. Solution: T e Construct edit transducer. A X Represent X and Y by automata and. Compute best path of A X T e A Y. A Y Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 20 20
21 Longest Common Subsequence of Automata Problem: Find the longest common subsequence of two (finite) sets of sequences X and Y each given by a finite automaton. lcs(x, Y ) = max { lcs(x, y) : x X, y Y }. Solution: Construct LCS transducer. Represent X and Y by automata and. Compute best path of T lcs A X A X T lcs A Y. A Y Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 21 21
22 Longest Common Subsequence Transducer Representation: example. e:b/0 e:a/0 b:e/0 a:e/0 b:b/-1 a:a/-1 0/0 Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 22 22
23 Summary An alignment scoring function can be defined by a WFST. A general algorithm can be used to efficiently compute the best alignments between sequences, or sets of sequences represented by finite automata. Complex alignments can be learned from examples. The FSM library can be used for the implementation and computation of alignments. Corinna Cortes and Mehryar Mohri - WFSTs in Computational Biology - Part II page 23 23
Weighted Finite-State Transducers in Computational Biology
Weighted Finite-State Transducers in Computational Biology Mehryar Mohri Courant Institute of Mathematical Sciences mohri@cims.nyu.edu Joint work with Corinna Cortes (Google Research). This Tutorial Weighted
More informationSpeech Recognition Lecture 12: Lattice Algorithms. Cyril Allauzen Google, NYU Courant Institute Slide Credit: Mehryar Mohri
Speech Recognition Lecture 12: Lattice Algorithms. Cyril Allauzen Google, NYU Courant Institute allauzen@cs.nyu.edu Slide Credit: Mehryar Mohri This Lecture Speech recognition evaluation N-best strings
More informationLecture 5: Markov models
Master s course Bioinformatics Data Analysis and Tools Lecture 5: Markov models Centre for Integrative Bioinformatics Problem in biology Data and patterns are often not clear cut When we want to make a
More informationBMI/CS 576 Fall 2015 Midterm Exam
BMI/CS 576 Fall 2015 Midterm Exam Prof. Colin Dewey Tuesday, October 27th, 2015 11:00am-12:15pm Name: KEY Write your answers on these pages and show your work. You may use the back sides of pages as necessary.
More informationReport for each of the weighted automata obtained ˆ the number of states; ˆ the number of ɛ-transitions;
Mehryar Mohri Speech Recognition Courant Institute of Mathematical Sciences Homework assignment 3 (Solution) Part 2, 3 written by David Alvarez 1. For this question, it is recommended that you use the
More informationSequence alignment is an essential concept for bioinformatics, as most of our data analysis and interpretation techniques make use of it.
Sequence Alignments Overview Sequence alignment is an essential concept for bioinformatics, as most of our data analysis and interpretation techniques make use of it. Sequence alignment means arranging
More informationComputational Molecular Biology
Computational Molecular Biology Erwin M. Bakker Lecture 2 Materials used from R. Shamir [2] and H.J. Hoogeboom [4]. 1 Molecular Biology Sequences DNA A, T, C, G RNA A, U, C, G Protein A, R, D, N, C E,
More informationComputational Genomics and Molecular Biology, Fall
Computational Genomics and Molecular Biology, Fall 2015 1 Sequence Alignment Dannie Durand Pairwise Sequence Alignment The goal of pairwise sequence alignment is to establish a correspondence between the
More informationDynamic Programming User Manual v1.0 Anton E. Weisstein, Truman State University Aug. 19, 2014
Dynamic Programming User Manual v1.0 Anton E. Weisstein, Truman State University Aug. 19, 2014 Dynamic programming is a group of mathematical methods used to sequentially split a complicated problem into
More informationA General Weighted Grammar Library
A General Weighted Grammar Library Cyril Allauzen, Mehryar Mohri, and Brian Roark AT&T Labs Research, Shannon Laboratory 80 Park Avenue, Florham Park, NJ 0792-097 {allauzen, mohri, roark}@research.att.com
More informationA General Weighted Grammar Library
A General Weighted Grammar Library Cyril Allauzen, Mehryar Mohri 2, and Brian Roark 3 AT&T Labs Research 80 Park Avenue, Florham Park, NJ 07932-097 allauzen@research.att.com 2 Department of Computer Science
More informationGlobal Alignment Scoring Matrices Local Alignment Alignment with Affine Gap Penalties
Global Alignment Scoring Matrices Local Alignment Alignment with Affine Gap Penalties From LCS to Alignment: Change the Scoring The Longest Common Subsequence (LCS) problem the simplest form of sequence
More informationYou will likely want to log into your assigned x-node from yesterday. Part 1: Shake-and-bake language generation
FSM Tutorial I assume that you re using bash as your shell; if not, then type bash before you start (you can use csh-derivatives if you want, but your mileage may vary). You will likely want to log into
More informationSAPLING: Suffix Array Piecewise Linear INdex for Genomics Michael Kirsche
SAPLING: Suffix Array Piecewise Linear INdex for Genomics Michael Kirsche mkirsche@jhu.edu StringBio 2018 Outline Substring Search Problem Caching and Learned Data Structures Methods Results Ongoing work
More informationRegular Expression Constrained Sequence Alignment
Regular Expression Constrained Sequence Alignment By Abdullah N. Arslan Department of Computer science University of Vermont Presented by Tomer Heber & Raz Nissim otivation When comparing two proteins,
More informationLecture 3: February Local Alignment: The Smith-Waterman Algorithm
CSCI1820: Sequence Alignment Spring 2017 Lecture 3: February 7 Lecturer: Sorin Istrail Scribe: Pranavan Chanthrakumar Note: LaTeX template courtesy of UC Berkeley EECS dept. Notes are also adapted from
More informationCIS-331 Spring 2016 Exam 1 Name: Total of 109 Points Version 1
Version 1 Instructions Write your name on the exam paper. Write your name and version number on the top of the yellow paper. Answer Question 1 on the exam paper. Answer Questions 2-4 on the yellow paper.
More informationA MACHINE LEARNING FRAMEWORK FOR SPOKEN-DIALOG CLASSIFICATION. Patrick Haffner Park Avenue Florham Park, NJ 07932
Springer Handbook on Speech Processing and Speech Communication A MACHINE LEARNING FRAMEWORK FOR SPOKEN-DIALOG CLASSIFICATION Corinna Cortes Google Research 76 Ninth Avenue New York, NY corinna@google.com
More informationCIS-331 Exam 2 Fall 2015 Total of 105 Points Version 1
Version 1 1. (20 Points) Given the class A network address 117.0.0.0 will be divided into multiple subnets. a. (5 Points) How many bits will be necessary to address 4,000 subnets? b. (5 Points) What is
More informationMachine Learning. Computational biology: Sequence alignment and profile HMMs
10-601 Machine Learning Computational biology: Sequence alignment and profile HMMs Central dogma DNA CCTGAGCCAACTATTGATGAA transcription mrna CCUGAGCCAACUAUUGAUGAA translation Protein PEPTIDE 2 Growth
More informationA multiple alignment tool in 3D
Outline Department of Computer Science, Bioinformatics Group University of Leipzig TBI Winterseminar Bled, Slovenia February 2005 Outline Outline 1 Multiple Alignments Problems Goal Outline Outline 1 Multiple
More informationBLAST - Basic Local Alignment Search Tool
Lecture for ic Bioinformatics (DD2450) April 11, 2013 Searching 1. Input: Query Sequence 2. Database of sequences 3. Subject Sequence(s) 4. Output: High Segment Pairs (HSPs) Sequence Similarity Measures:
More informationReconstructing long sequences from overlapping sequence fragment. Searching databases for related sequences and subsequences
SEQUENCE ALIGNMENT ALGORITHMS 1 Why compare sequences? Reconstructing long sequences from overlapping sequence fragment Searching databases for related sequences and subsequences Storing, retrieving and
More informationCIS-331 Fall 2014 Exam 1 Name: Total of 109 Points Version 1
Version 1 1. (24 Points) Show the routing tables for routers A, B, C, and D. Make sure you account for traffic to the Internet. Router A Router B Router C Router D Network Next Hop Next Hop Next Hop Next
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 04: Variations of sequence alignments http://www.pitt.edu/~mcs2/teaching/biocomp/tutorials/global.html Slides adapted from Dr. Shaojie Zhang (University
More informationFast Kernels for Inexact String Matching
Fast Kernels for Inexact String Matching Christina Leslie and Rui Kuang Columbia University, New York NY 10027, USA {rkuang,cleslie}@cs.columbia.edu Abstract. We introduce several new families of string
More informationStone Soup Translation
Stone Soup Translation DJ Hovermale and Jeremy Morris and Andrew Watts December 3, 2005 1 Introduction 2 Overview of Stone Soup Translation 2.1 Finite State Automata The Stone Soup Translation model is
More information4. Specifications and Additional Information
4. Specifications and Additional Information AGX52004-1.0 8B/10B Code This section provides information about the data and control codes for Arria GX devices. Code Notation The 8B/10B data and control
More informationBiostrings. Martin Morgan Bioconductor / Fred Hutchinson Cancer Research Center Seattle, WA, USA June 2009
Biostrings Martin Morgan Bioconductor / Fred Hutchinson Cancer Research Center Seattle, WA, USA 15-19 June 2009 Biostrings Representation DNA, RNA, amino acid, and general biological strings Manipulation
More informationToday s Lecture. Multiple sequence alignment. Improved scoring of pairwise alignments. Affine gap penalties Profiles
Today s Lecture Multiple sequence alignment Improved scoring of pairwise alignments Affine gap penalties Profiles 1 The Edit Graph for a Pair of Sequences G A C G T T G A A T G A C C C A C A T G A C G
More information1 Finite Representations of Languages
1 Finite Representations of Languages Languages may be infinite sets of strings. We need a finite notation for them. There are at least four ways to do this: 1. Language generators. The language can be
More informationMultiple Sequence Alignment Based on Profile Alignment of Intermediate Sequences
Multiple Sequence Alignment Based on Profile Alignment of Intermediate Sequences Yue Lu and Sing-Hoi Sze RECOMB 2007 Presented by: Wanxing Xu March 6, 2008 Content Biology Motivation Computation Problem
More informationSuggesting Edits to Explain Failing Traces
Suggesting Edits to Explain Failing Traces Giles Reger University of Manchester, UK Abstract. Runtime verification involves checking whether an execution trace produced by a running system satisfies a
More informationFinding homologous sequences in databases
Finding homologous sequences in databases There are multiple algorithms to search sequences databases BLAST (EMBL, NCBI, DDBJ, local) FASTA (EMBL, local) For protein only databases scan via Smith-Waterman
More informationSequence Alignment 1
Sequence Alignment 1 Nucleotide and Base Pairs Purine: A and G Pyrimidine: T and C 2 DNA 3 For this course DNA is double-helical, with two complementary strands. Complementary bases: Adenine (A) - Thymine
More informationThe Chvátal-Gomory Closure of a Strictly Convex Body is a Rational Polyhedron
The Chvátal-Gomory Closure of a Strictly Convex Body is a Rational Polyhedron Juan Pablo Vielma Joint work with Daniel Dadush and Santanu S. Dey July, Atlanta, GA Outline Introduction Proof: Step Step
More information15-780: Graduate Artificial Intelligence. Computational biology: Sequence alignment and profile HMMs
5-78: Graduate rtificial Intelligence omputational biology: Sequence alignment and profile HMMs entral dogma DN GGGG transcription mrn UGGUUUGUG translation Protein PEPIDE 2 omparison of Different Organisms
More informationSequence alignment theory and applications Session 3: BLAST algorithm
Sequence alignment theory and applications Session 3: BLAST algorithm Introduction to Bioinformatics online course : IBT Sonal Henson Learning Objectives Understand the principles of the BLAST algorithm
More informationBrief review from last class
Sequence Alignment Brief review from last class DNA is has direction, we will use only one (5 -> 3 ) and generate the opposite strand as needed. DNA is a 3D object (see lecture 1) but we will model it
More informationChapter 13, Sequence Data Mining
CSI 4352, Introduction to Data Mining Chapter 13, Sequence Data Mining Young-Rae Cho Associate Professor Department of Computer Science Baylor University Topics Single Sequence Mining Frequent sequence
More informationLexicographic Semirings for Exact Automata Encoding of Sequence Models
Lexicographic Semirings for Exact Automata Encoding of Sequence Models Brian Roark, Richard Sproat, and Izhak Shafran {roark,rws,zak}@cslu.ogi.edu Abstract In this paper we introduce a novel use of the
More informationOverview. Search and Decoding. HMM Speech Recognition. The Search Problem in ASR (1) Today s lecture. Steve Renals
Overview Search and Decoding Steve Renals Automatic Speech Recognition ASR Lecture 10 January - March 2012 Today s lecture Search in (large vocabulary) speech recognition Viterbi decoding Approximate search
More informationFinite-State Transducers in Language and Speech Processing
Finite-State Transducers in Language and Speech Processing Mehryar Mohri AT&T Labs-Research Finite-state machines have been used in various domains of natural language processing. We consider here the
More informationBLAST & Genome assembly
BLAST & Genome assembly Solon P. Pissis Tomáš Flouri Heidelberg Institute for Theoretical Studies May 15, 2014 1 BLAST What is BLAST? The algorithm 2 Genome assembly De novo assembly Mapping assembly 3
More informationAs of August 15, 2008, GenBank contained bases from reported sequences. The search procedure should be
48 Bioinformatics I, WS 09-10, S. Henz (script by D. Huson) November 26, 2009 4 BLAST and BLAT Outline of the chapter: 1. Heuristics for the pairwise local alignment of two sequences 2. BLAST: search and
More informationSequencing Alignment I
Sequencing Alignment I Lectures 16 Nov 21, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall (JHN) 022 1 Outline: Sequence
More informationLecture 12: Divide and Conquer Algorithms
Lecture 12: Divide and Conquer Algorithms Study Chapter 7.1 7.4 1 Divide and Conquer Algorithms Divide problem into sub-problems Conquer by solving sub-problems recursively. If the sub-problems are small
More informationLearning with Weighted Transducers
Learning with Weighted Transducers Corinna CORTES a and Mehryar MOHRI b,1 a Google Research, 76 Ninth Avenue, New York, NY 10011 b Courant Institute of Mathematical Sciences and Google Research, 251 Mercer
More informationCIS-331 Final Exam Spring 2015 Total of 115 Points. Version 1
Version 1 1. (25 Points) Given that a frame is formatted as follows: And given that a datagram is formatted as follows: And given that a TCP segment is formatted as follows: Assuming no options are present
More informationSVM cont d. Applications face detection [IEEE INTELLIGENT SYSTEMS]
SVM cont d A method of choice when examples are represented by vectors or matrices Input space cannot be readily used as attribute-vector (e.g. too many attrs) Kernel methods: map data from input space
More informationSparse matrices, graphs, and tree elimination
Logistics Week 6: Friday, Oct 2 1. I will be out of town next Tuesday, October 6, and so will not have office hours on that day. I will be around on Monday, except during the SCAN seminar (1:25-2:15);
More informationWFSTDM Builder Network-based Spoken Dialogue System Builder for Easy Prototyping
WFSTDM Builder Network-based Spoken Dialogue System Builder for Easy Prototyping Etsuo Mizukami and Chiori Hori Abstract This paper introduces a network-based spoken dialog system development tool kit:
More informationNew String Kernels for Biosequence Data
Workshop on Kernel Methods in Bioinformatics New String Kernels for Biosequence Data Christina Leslie Department of Computer Science Columbia University Biological Sequence Classification Problems Protein
More informationImproving the Performance of Text Categorization using N-gram Kernels
Improving the Performance of Text Categorization using N-gram Kernels Varsha K. V *., Santhosh Kumar C., Reghu Raj P. C. * * Department of Computer Science and Engineering Govt. Engineering College, Palakkad,
More information3.4 Multiple sequence alignment
3.4 Multiple sequence alignment Why produce a multiple sequence alignment? Using more than two sequences results in a more convincing alignment by revealing conserved regions in ALL of the sequences Aligned
More informationDarwin: A Genomic Co-processor gives up to 15,000X speedup on long read assembly (To appear in ASPLOS 2018)
Darwin: A Genomic Co-processor gives up to 15,000X speedup on long read assembly (To appear in ASPLOS 2018) Yatish Turakhia EE PhD candidate Stanford University Prof. Bill Dally (Electrical Engineering
More informationCIS-331 Exam 2 Fall 2014 Total of 105 Points. Version 1
Version 1 1. (20 Points) Given the class A network address 119.0.0.0 will be divided into a maximum of 15,900 subnets. a. (5 Points) How many bits will be necessary to address the 15,900 subnets? b. (5
More informationLASH: Large-Scale Sequence Mining with Hierarchies
LASH: Large-Scale Sequence Mining with Hierarchies Kaustubh Beedkar and Rainer Gemulla Data and Web Science Group University of Mannheim June 2 nd, 2015 SIGMOD 2015 Kaustubh Beedkar and Rainer Gemulla
More informationCIS-331 Fall 2013 Exam 1 Name: Total of 120 Points Version 1
Version 1 1. (24 Points) Show the routing tables for routers A, B, C, and D. Make sure you account for traffic to the Internet. NOTE: Router E should only be used for Internet traffic. Router A Router
More informationResearch Article An Improved Scoring Matrix for Multiple Sequence Alignment
Hindawi Publishing Corporation Mathematical Problems in Engineering Volume 2012, Article ID 490649, 9 pages doi:10.1155/2012/490649 Research Article An Improved Scoring Matrix for Multiple Sequence Alignment
More informationPrinciples of Bioinformatics. BIO540/STA569/CSI660 Fall 2010
Principles of Bioinformatics BIO540/STA569/CSI660 Fall 2010 Lecture 11 Multiple Sequence Alignment I Administrivia Administrivia The midterm examination will be Monday, October 18 th, in class. Closed
More informationBiology 644: Bioinformatics
A statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states in the training data. First used in speech and handwriting recognition In
More informationLearning to Align Sequences: A Maximum-Margin Approach
Learning to Align Sequences: A Maximum-Margin Approach Thorsten Joachims Department of Computer Science Cornell University Ithaca, NY 14853 tj@cs.cornell.edu August 28, 2003 Abstract We propose a discriminative
More informationNotes on Dynamic-Programming Sequence Alignment
Notes on Dynamic-Programming Sequence Alignment Introduction. Following its introduction by Needleman and Wunsch (1970), dynamic programming has become the method of choice for rigorous alignment of DNA
More informationPattern Mining in Frequent Dynamic Subgraphs
Pattern Mining in Frequent Dynamic Subgraphs Karsten M. Borgwardt, Hans-Peter Kriegel, Peter Wackersreuther Institute of Computer Science Ludwig-Maximilians-Universität Munich, Germany kb kriegel wackersr@dbs.ifi.lmu.de
More informationSpecial course in Computer Science: Advanced Text Algorithms
Special course in Computer Science: Advanced Text Algorithms Lecture 6: Alignments Elena Czeizler and Ion Petre Department of IT, Abo Akademi Computational Biomodelling Laboratory http://www.users.abo.fi/ipetre/textalg
More informationCreating a custom mappings similarity matrix
BioNumerics Tutorial: Creating a custom mappings similarity matrix 1 Aim In BioNumerics, character values can be mapped to categorical names according to predefined criteria (see tutorial Importing non-numerical
More informationECHO Process Instrumentation, Inc. Modbus RS485 Module. Operating Instructions. Version 1.0 June 2010
ECHO Process Instrumentation, Inc. Modbus RS485 Module Operating Instructions Version 1.0 June 2010 ECHO Process Instrumentation, Inc. PO Box 800 Shalimar, FL 32579 PH: 850-609-1300 FX: 850-651-4777 EM:
More informationDynamic Programming & Smith-Waterman algorithm
m m Seminar: Classical Papers in Bioinformatics May 3rd, 2010 m m 1 2 3 m m Introduction m Definition is a method of solving problems by breaking them down into simpler steps problem need to contain overlapping
More informationGribskov Profile. Hidden Markov Models. Building a Hidden Markov Model #$ %&
Gribskov Profile #$ %& Hidden Markov Models Building a Hidden Markov Model "! Proteins, DNA and other genomic features can be classified into families of related sequences and structures How to detect
More informationarxiv:cs/ v1 [cs.cc] 20 Jul 2005
arxiv:cs/0507052v1 [cs.cc] 20 Jul 2005 Finite automata for testing uniqueness of Eulerian trails Qiang Li T-Life Research Center, Fudan University, Shanghai, China Hui-Min Xie Department of Mathematics,
More informationWFST: Weighted Finite State Transducer. September 12, 2017 Prof. Marie Meteer
+ WFST: Weighted Finite State Transducer September 12, 217 Prof. Marie Meteer + FSAs: A recurring structure in speech 2 Phonetic HMM Viterbi trellis Language model Pronunciation modeling + The language
More informationSupport Vector Machines (SVM)
Support Vector Machines a new classifier Attractive because (SVM) Has sound mathematical foundations Performs very well in diverse and difficult applications See paper placed on the class website Review
More information2-Type Fire Retardant Closures
2-Type Fire Retardant Closures 2-Type Closures that can really take the heat. The 2-Type Fire Retardant Closure is completely self-contained and capable of withstanding a 15 minute horizontal or vertical
More informationSPAREPARTSCATALOG: CONNECTORS SPARE CONNECTORS KTM ART.-NR.: 3CM EN
SPAREPARTSCATALOG: CONNECTORS ART.-NR.: 3CM3208201EN CONTENT SPARE CONNECTORS AA-AN SPARE CONNECTORS AO-BC SPARE CONNECTORS BD-BQ SPARE CONNECTORS BR-CD 3 4 5 6 SPARE CONNECTORS CE-CR SPARE CONNECTORS
More informationWeighted Finite State Transducers in Automatic Speech Recognition
Weighted Finite State Transducers in Automatic Speech Recognition ZRE lecture 10.04.2013 Mirko Hannemann Slides provided with permission, Daniel Povey some slides from T. Schultz, M. Mohri and M. Riley
More informationJuly Registration of a Cyrillic Character Set. Status of this Memo
Network Working Group Request for Comments: 1489 A. Chernov RELCOM Development Team July 1993 Status of this Memo Registration of a Cyrillic Character Set This memo provides information for the Internet
More informationFASTA. Besides that, FASTA package provides SSEARCH, an implementation of the optimal Smith- Waterman algorithm.
FASTA INTRODUCTION Definition (by David J. Lipman and William R. Pearson in 1985) - Compares a sequence of protein to another sequence or database of a protein, or a sequence of DNA to another sequence
More informationPROTEIN MULTIPLE ALIGNMENT MOTIVATION: BACKGROUND: Marina Sirota
Marina Sirota MOTIVATION: PROTEIN MULTIPLE ALIGNMENT To study evolution on the genetic level across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein
More informationTCCAGGTG-GAT TGCAAGTGCG-T. Local Sequence Alignment & Heuristic Local Aligners. Review: Probabilistic Interpretation. Chance or true homology?
Local Sequence Alignment & Heuristic Local Aligners Lectures 18 Nov 28, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall
More informationPairwise Sequence Alignment: Dynamic Programming Algorithms. COMP Spring 2015 Luay Nakhleh, Rice University
Pairwise Sequence Alignment: Dynamic Programming Algorithms COMP 571 - Spring 2015 Luay Nakhleh, Rice University DP Algorithms for Pairwise Alignment The number of all possible pairwise alignments (if
More informationTowards Declarative and Efficient Querying on Protein Structures
Towards Declarative and Efficient Querying on Protein Structures Jignesh M. Patel University of Michigan Biology Data Types Sequences: AGCGGTA. Structure: Interaction Maps: Micro-arrays: Gene A Gene B
More informationA Rational Design for a Weighted Finite-State Transducer Library
A Rational Design for a Weighted Finite-State Transducer Library Mehryar Mohri, Fernando Pereira, and Michael Riley AT&T Labs -- Research 180 Park Avenue, Florham Park, NJ 07932-0971 1 Overview We describe
More informationThe Design Principles of a Weighted Finite-State Transducer Library
The Design Principles of a Weighted Finite-State Transducer Library Mehryar Mohri 1, Fernando Pereira and Michael Riley AT&T Labs Research, 180 Park Avenue, Florham Park, NJ 0793-0971 Abstract We describe
More informationLectures by Volker Heun, Daniel Huson and Knut Reinert, in particular last years lectures
4 FastA and the chaining problem We will discuss: Heuristics used by the FastA program for sequence alignment Chaining problem 4.1 Sources for this lecture Lectures by Volker Heun, Daniel Huson and Knut
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 06: Multiple Sequence Alignment https://upload.wikimedia.org/wikipedia/commons/thumb/7/79/rplp0_90_clustalw_aln.gif/575px-rplp0_90_clustalw_aln.gif Slides
More informationComputational Molecular Biology
Computational Molecular Biology Erwin M. Bakker Lecture 3, mainly from material by R. Shamir [2] and H.J. Hoogeboom [4]. 1 Pairwise Sequence Alignment Biological Motivation Algorithmic Aspect Recursive
More informationSPARE CONNECTORS KTM 2014
SPAREPARTSCATALOG: // ENGINE ART.-NR.: 3208201EN CONTENT CONNECTORS FOR WIRING HARNESS AA-AN CONNECTORS FOR WIRING HARNESS AO-BC CONNECTORS FOR WIRING HARNESS BD-BQ CONNECTORS FOR WIRING HARNESS BR-CD
More informationDynamic Programming: Sequence alignment. CS 466 Saurabh Sinha
Dynamic Programming: Sequence alignment CS 466 Saurabh Sinha DNA Sequence Comparison: First Success Story Finding sequence similarities with genes of known function is a common approach to infer a newly
More informationPairwise Sequence Alignment: Dynamic Programming Algorithms COMP 571 Luay Nakhleh, Rice University
1 Pairwise Sequence Alignment: Dynamic Programming Algorithms COMP 571 Luay Nakhleh, Rice University DP Algorithms for Pairwise Alignment 2 The number of all possible pairwise alignments (if gaps are allowed)
More informationCS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution
CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution Exam Facts Format Wednesday, July 20 th from 11:00 a.m. 1:00 p.m. in Gates B01 The exam is designed to take roughly 90
More informationJNTUWORLD. Code No: R
Code No: R09220504 R09 SET-1 B.Tech II Year - II Semester Examinations, April-May, 2012 FORMAL LANGUAGES AND AUTOMATA THEORY (Computer Science and Engineering) Time: 3 hours Max. Marks: 75 Answer any five
More informationBLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio. 1990. CS 466 Saurabh Sinha Motivation Sequence homology to a known protein suggest function of newly sequenced protein Bioinformatics
More informationSpeech Recognition Lecture 8: Acoustic Models. Eugene Weinstein Google, NYU Courant Institute Slide Credit: Mehryar Mohri
Speech Recognition Lecture 8: Acoustic Models. Eugene Weinstein Google, NYU Courant Institute eugenew@cs.nyu.edu Slide Credit: Mehryar Mohri Speech Recognition Components Acoustic and pronunciation model:
More informationIn this section we describe how to extend the match refinement to the multiple case and then use T-Coffee to heuristically compute a multiple trace.
5 Multiple Match Refinement and T-Coffee In this section we describe how to extend the match refinement to the multiple case and then use T-Coffee to heuristically compute a multiple trace. This exposition
More informationBioinformatics. Sequence alignment BLAST Significance. Next time Protein Structure
Bioinformatics Sequence alignment BLAST Significance Next time Protein Structure 1 Experimental origins of sequence data The Sanger dideoxynucleotide method F Each color is one lane of an electrophoresis
More informationCompares a sequence of protein to another sequence or database of a protein, or a sequence of DNA to another sequence or library of DNA.
Compares a sequence of protein to another sequence or database of a protein, or a sequence of DNA to another sequence or library of DNA. Fasta is used to compare a protein or DNA sequence to all of the
More informationRegular Expressions. Chapter 6
Regular Expressions Chapter 6 Regular Languages Generates Regular Language Regular Expression Recognizes or Accepts Finite State Machine Stephen Cole Kleene 1909 1994, mathematical logician One of many
More informationDivide and Conquer. Bioinformatics: Issues and Algorithms. CSE Fall 2007 Lecture 12
Divide and Conquer Bioinformatics: Issues and Algorithms CSE 308-408 Fall 007 Lecture 1 Lopresti Fall 007 Lecture 1-1 - Outline MergeSort Finding mid-point in alignment matrix in linear space Linear space
More informationAlgorithms for Computational Biology. String Distance Measures
Algorithms for Computational Biology Zsuzsanna Lipták Masters in Molecular and Medical Biotechnology a.a. 2015/16, fall term String Distance Measures Similarity vs. distance Two ways of measuring the same
More information