Sequence Comparison: Dynamic Programming. Genome 373 Genomic Informatics Elhanan Borenstein

Similar documents
Outline. Sequence Alignment. Types of Sequence Alignment. Genomics & Computational Biology. Section 2. How Computers Store Information

Pairwise Sequence Alignment: Dynamic Programming Algorithms. COMP Spring 2015 Luay Nakhleh, Rice University

Pairwise Sequence Alignment: Dynamic Programming Algorithms COMP 571 Luay Nakhleh, Rice University

Sequence analysis Pairwise sequence alignment

Sequence comparison: Local alignment

Today s Lecture. Edit graph & alignment algorithms. Local vs global Computational complexity of pairwise alignment Multiple sequence alignment

Sequence Alignment. part 2

Biology 644: Bioinformatics

CMSC423: Bioinformatic Algorithms, Databases and Tools Lecture 8. Note

Algorithmic Approaches for Biological Data, Lecture #20

Dynamic Programming User Manual v1.0 Anton E. Weisstein, Truman State University Aug. 19, 2014

Sequence Alignment (chapter 6) p The biological problem p Global alignment p Local alignment p Multiple alignment

Multiple Sequence Alignment

Lecture 10. Sequence alignments

Notes on Dynamic-Programming Sequence Alignment

Sequence alignment is an essential concept for bioinformatics, as most of our data analysis and interpretation techniques make use of it.

Programming assignment for the course Sequence Analysis (2006)

Computational Biology Lecture 4: Overlap detection, Local Alignment, Space Efficient Needleman-Wunsch Saad Mneimneh

Mouse, Human, Chimpanzee

Alignment of Long Sequences

Software Implementation of Smith-Waterman Algorithm in FPGA

Brief review from last class

A Bit-Parallel, General Integer-Scoring Sequence Alignment Algorithm

Lecture Overview. Sequence search & alignment. Searching sequence databases. Sequence Alignment & Search. Goals: Motivations:

Sequence Alignment & Search

Accelerating Smith Waterman (SW) Algorithm on Altera Cyclone II Field Programmable Gate Array

An Analysis of Pairwise Sequence Alignment Algorithm Complexities: Needleman-Wunsch, Smith-Waterman, FASTA, BLAST and Gapped BLAST

15-780: Graduate Artificial Intelligence. Computational biology: Sequence alignment and profile HMMs

LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA

Bioinformatics explained: Smith-Waterman

Principles of Bioinformatics. BIO540/STA569/CSI660 Fall 2010

Acceleration of the Smith-Waterman algorithm for DNA sequence alignment using an FPGA platform

Research on Pairwise Sequence Alignment Needleman-Wunsch Algorithm

OPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT

Multiple sequence alignment. November 20, 2018

The implementation of bit-parallelism for DNA sequence alignment

EECS730: Introduction to Bioinformatics

Acceleration of Algorithm of Smith-Waterman Using Recursive Variable Expansion.

Computational Genomics and Molecular Biology, Fall

Profiles and Multiple Alignments. COMP 571 Luay Nakhleh, Rice University

CISC 636 Computational Biology & Bioinformatics (Fall 2016) Phylogenetic Trees (I)

BMI/CS 576 Fall 2015 Midterm Exam

Local Alignment & Gap Penalties CMSC 423

Reminder: sequence alignment in sub-quadratic time

An FPGA-Based Web Server for High Performance Biological Sequence Alignment

Dynamic Programming Part I: Examples. Bioinfo I (Institut Pasteur de Montevideo) Dynamic Programming -class4- July 25th, / 77

Solving Systems Using Row Operations 1 Name

Distributed Protein Sequence Alignment

Pairwise Sequence alignment Basic Algorithms

Algorithmic Paradigms. Chapter 6 Dynamic Programming. Steps in Dynamic Programming. Dynamic Programming. Dynamic Programming Applications

Homework Python-3. Sup Biotech 3 Python. Pierre Parutto

Chapter 9: Elementary Graph Algorithms Basic Graph Concepts

BIOL591: Introduction to Bioinformatics Alignment of pairs of sequences

Sequence alignment algorithms

Basics of Multiple Sequence Alignment

Biochemistry 324 Bioinformatics. Multiple Sequence Alignment (MSA)

Keywords -Bioinformatics, sequence alignment, Smith- waterman (SW) algorithm, GPU, CUDA

On the Efficacy of Haskell for High Performance Computational Biology

Lecture 2 Pairwise sequence alignment. Principles Computational Biology Teresa Przytycka, PhD

Scalable Accelerator Architecture for Local Alignment of DNA Sequences

Dynamic Programming & Smith-Waterman algorithm

Lectures 12 and 13 Dynamic programming: weighted interval scheduling

Bioinformatics 1: lecture 4. Followup of lecture 3? Molecular evolution Global, semi-global and local Affine gap penalty

Today s Lecture. Multiple sequence alignment. Improved scoring of pairwise alignments. Affine gap penalties Profiles

DNA Alignment With Affine Gap Penalties

Multiple Sequence Alignment (MSA)

Darwin-WGA. A Co-processor Provides Increased Sensitivity in Whole Genome Alignments with High Speedup

34 Bioinformatics I, WS 12/13, D. Huson, November 11, 2012

FASTA. Besides that, FASTA package provides SSEARCH, an implementation of the optimal Smith- Waterman algorithm.

Lecture 3: February Local Alignment: The Smith-Waterman Algorithm

Multiple Sequence Alignment Theory and Applications

1. R. Durbin, S. Eddy, A. Krogh und G. Mitchison: Biological sequence analysis, Cambridge, 1998

Pairwise alignment II

BGGN 213 Foundations of Bioinformatics Barry Grant

BLAST MCDB 187. Friday, February 8, 13

Compares a sequence of protein to another sequence or database of a protein, or a sequence of DNA to another sequence or library of DNA.

Sequence Alignment 1

Computational Molecular Biology

Computational Geometry

Rochester Institute of Technology. Making personalized education scalable using Sequence Alignment Algorithm

Reconfigurable Supercomputing with Scalable Systolic Arrays and In-Stream Control for Wavefront Genomics Processing

We extend SVM s in order to support multi-class classification problems. Consider the training dataset

In this section we describe how to extend the match refinement to the multiple case and then use T-Coffee to heuristically compute a multiple trace.

EECS730: Introduction to Bioinformatics

1. R. Durbin, S. Eddy, A. Krogh und G. Mitchison: Biological sequence analysis, Cambridge, 1998

TCCAGGTG-GAT TGCAAGTGCG-T. Local Sequence Alignment & Heuristic Local Aligners. Review: Probabilistic Interpretation. Chance or true homology?

Multiple Sequence Alignment. Mark Whitsitt - NCSA

Salvador Capella-Gutiérrez, Jose M. Silla-Martínez and Toni Gabaldón

Nonlinear pairwise alignment of seismic traces

EECS 4425: Introductory Computational Bioinformatics Fall Suprakash Datta

Lecture 5: Multiple sequence alignment

Robert Collins CSE486, Penn State. Lecture 09: Stereo Algorithms

BLAST. NCBI BLAST Basic Local Alignment Search Tool

PROTEIN MULTIPLE ALIGNMENT MOTIVATION: BACKGROUND: Marina Sirota

Bioinformatics for Biologists

bl,...,b, where i runs from 0 to m and j runs from 0 to n. Then with initial values

Opinionated. Lessons. #52 Dynamic Programming. in Statistics. by Bill Press

CHAPTER-6 WEB USAGE MINING USING CLUSTERING

BLAST & Genome assembly

A simple noise model. Algorithm sketch. A simple noise model. Estimating the probabilities

Transcription:

Sequence omparison: Dynamic Programming Genome 373 Genomic Informatics Elhanan Borenstein

quick review: hallenges Find the best global alignment of two sequences Find the best global alignment of multiple sequences Find the best local (partial) alignment of two sequences Find the best match (alignment) of a given sequence in a longer dataset of sequences

quick review: Global lignment Global lignment Mission: Find the best global alignment between two sequences. e.g., Find the best alignment of GT and T: GT T GT- -T -GT- --T GT- -T GT- -T GT- -T G-T T- GT- -T orrect alignment vs. best alignment

quick review: Global lignment Global lignment Mission: Find the best global alignment between two sequences. n algorithm for finding the alignment with the best score method for scoring alignments

quick review: Scoring aligned bases Substitution matrix: Gap penalty: G T Linear gap penalty 10-5 0-5 ffine gap penalty -5 10-5 0 G 0-5 10-5 T -5 0-5 10 GT- -T d=-4-5 + 10 + -4 + 10 + -4 + 10 = 17

quick review: Global lignment Global lignment Mission: Find the best global alignment between two sequences. n algorithm for finding the alignment with the best score method for scoring alignments?

Exhaustive search lign the two sequences: GT and T GT T GT- -T -GT- --T GT- -T GT- -T GT- -T G-T T- GT- -T Simple (exhaustive search) algorithm 1) onstruct all possible alignments 2) Use the substitution matrix and gap penalty to score each alignment 3) Pick the alignment with the best score

Exhaustive search lign the two sequences: GT and T GT T GT- -T -GT- --T GT- -T GT- -T GT- -T G-T T- GT- -T Simple (exhaustive search) algorithm 1) onstruct all possible alignments 2) Use the substitution matrix and gap penalty to score each alignment 3) Pick the alignment with the best score

omputational omplexity & the Big O Notation

Mission: Find the best global alignment of two sequences. search algorithm for finding the alignment with the best score method for scoring alignments? 1. More efficient search 2. recipe

Mission: Find the best global alignment of two sequences. search algorithm for finding the alignment with the best score method for scoring alignments? The Needleman Wunsch lgorithm

The Needleman Wunsch lgorithm n algorithm for global alignment on two sequences Dynamic Programming (DP) approach Yes, it s a weird name. DP is closely related to recursion and to mathematical induction We can prove that the resulting score is optimal.

DP matrix j 0 1 2 3 etc. i 0 1 2 3 4 5 T

DP matrix j 0 1 2 3 etc. i 0 1 2 3 4 5 T initial row and column

Best alignment of G to DP matrix j 0 1 2 3 etc. i 0 1 2 5 Which value are we interested in? 3 T 4 5 The value at (i,j) is the score of the best alignment of the first i characters of one sequence versus the first j characters of the other sequence.

DP matrix j 0 1 2 3 etc. i 0 1 2 3 4 T 5 The score of the best alignment of the two sequences. 5

Moving in the DP matrix 5 T

DP matrix 5 T

Best alignment of G and - DP matrix 5 T Moving horizontally in the matrix introduces a gap in the sequence along the left edge.

Best alignment of G and - DP matrix 5 1 T Moving horizontally in the matrix introduces a gap in the sequence along the left edge.

Best alignment of G and - T DP matrix 5 Moving vertically in the matrix introduces a gap in the sequence along the top edge. T 1

DP matrix 5 T

Best alignment of G and T DP matrix 5 Moving diagonally in the matrix aligns two residues T 0

So. an I now fill this matrix? T 7

So. an I now fill this matrix? T 7 3 3 17

Initialization T

Initialization 0 T

G - Introducing a gap 0-4 T

- Introducing a gap 0-4 -4 T

----- T omplete first row and column 0-4 -8-12 -16-20 -4-8 T -12-16 -20

What about i=1, j=1 i 0 1 2 3 4 5 j 0 1 2 3 etc. 0-4 -8-12 -16-20 -4? -8 T -12-16 -20

i 0 1 2 3 4 5 G- - Three ways to get to i=1, j=1 j 0 1 2 3 etc. 0-4 -8-12 -16-20 -4-8 -8 T -12-16 -20

i 0 1 2 3 4 5 -G - Three ways to get to i=1, j=1 j 0 1 2 3 etc. 0-4 -8-12 -16-20 -4-8 -8 T -12-16 -20

i 0 1 2 3 4 5 G 0-4 -8-12 -16-20 -4-5 -8 T -12-16 -20 Three ways to get to i=1, j=1 j 0 1 2 3 etc.

ccept the highest scoring of the three 0-4 -8-12 -16-20 -4-5 -8 T -12-16 Then simply repeat the same rule progressively across the matrix -20

DP matrix 0-4 -8-12 -16-20 -4-5 -8? T -12-16 -20

G- -G --G - -4+0=-4-5+-4=-9-8+-4=-12 DP matrix 0-4 -8-12 -16-20 -4-5 -4-8? T -12-16 -20 0-4

G- -5+-4=-9 -G -4+0=-4 --G - -8+-4=-12 DP matrix 0-4 -8-12 -16-20 -4-5 -4-8 -4 T -12-16 -20 0-4

DP matrix 0-4 -8-12 -16-20 -4-5 -8-4 T -12? -16? -20?

DP matrix 0-4 -8-12 -16-20 -4-5 -8-4 T -12-8 -16-12 -20-16

DP matrix 0-4 -8-12 -16-20 -4-5? -8-4? T -12-8? -16-12? -20-16?

Traceback 0-4 -8-12 -16-20 -4-5 -9-8 -4 5 What is the alignment associated with this entry? T -12-8 1-16 -12 2-20 -16-2

Traceback 0-4 -8-12 -16-20 -4-5 -9-8 -4 5 T -12-8 1-16 -12 2-20 -16-2 What is the alignment associated with this entry? Just follow the arrows back - this is called the traceback -G- T

Full lignment 0-4 -8-12 -16-20 -4-5 -9-8 -4 5 T -12-8 1-16 -12 2 ontinue and find the optimal global alignment, and its score. -20-16 -2?

Full lignment 0-4 -8-12 -16-20 -4-5 -9-13 -12-6 -8-4 5 1-3 -7 T -12-8 1 0 11 7-16 -12 2 11 7 6-20 -16-2 7 11 17

Full lignment 0-4 -8-12 -16-20 -4 Best -5 alignment -9 starts -13 at -12-6 -8 bottom -4 right 5 and follows 1-3 traceback arrows to top left -7 T -12-8 1 0 11 7-16 -12 2 11 7 6-20 -16-2 7 11 17

G-T T- One best traceback 0-4 -8-12 -16-20 -4-5 -9-13 -12-6 -8-4 5 1-3 -7 T -12-8 1 0 11 7-16 -12 2 11 7 6-20 -16-2 7 11 17

GT- -T nother best traceback 0-4 -8-12 -16-20 -4-5 -9-13 -12-6 -8-4 5 1-3 -7 T -12-8 1 0 11 7-16 -12 2 11 7 6-20 -16-2 7 11 17

GT- -T G-T T- 0-4 -8-12 -16-20 -4-5 -9-13 -12-6 -8-4 5 1-3 -7 T -12-8 1 0 11 7-16 -12 2 11 7 6-20 -16-2 7 11 17

Multiple solutions G-T T- GT- -T GT- -T When a program returns a single sequence alignment, it may not be the only best alignment but it is guaranteed to be one of them. In our example, all of the alignments at the left have equal scores. GT- -T

What s the complexity of this algorithm?

Practice problem: Find a best pairwise alignment of GT and TT 0 T T

DP in equation form lign sequence x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty. d j i F d j i F y x s j i F j i F F j i 1, 1,, 1 1, max, 0 0,0

DP equation graphically F i 1, j 1 F i, j 1 s x, y i j d F i 1, j d F i, j take the max of these three