Alignment of Pairs of Sequences

Unit 03a: Alignment of Pairs of Sequences Bi03a_1

Partners for alignment Bi03a_2
Protein 1 and Protein 2 = amino-acid sequences (20-letter alphabet + gap)

Global alignment:
LGPSSKQTGKGS-SRIWDN
LN-ITKSAGKGAIMRLGDA

Local alignment:
-------TGKG--------
-------AGKG--------

global alignment: entire sequence
local alignment: parts of sequence
...searching for characters or character patterns that are the same

Partners for alignment Bi03a_3
DNA 1 and DNA 2 = nucleotide sequences (4-letter alphabet + gap)
...actggaagtc...
...actgaacgta...

Partners for alignment Bi03a_4
DNA and Protein: translate the DNA via the genetic code, then compare protein with protein
DNA:     GCC TCC GAC AAG CTC ATG
Protein: A   S   D   K   L   M
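The translation step on slide Bi03a_4 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `translate` and the use of 'X' for unknown codons are assumptions, and the codon table holds only the six codons from the slide example (a full genetic code has 64 entries).

```python
# Partial codon table: only the six codons used on slide Bi03a_4.
# A complete standard genetic code would list all 64 codons.
CODON_TABLE = {
    "GCC": "A",  # alanine
    "TCC": "S",  # serine
    "GAC": "D",  # aspartate
    "AAG": "K",  # lysine
    "CTC": "L",  # leucine
    "ATG": "M",  # methionine
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon; codons missing from the table become 'X'."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate("GCC TCC GAC AAG CTC ATG"))  # ASDKLM
```

Once both partners are protein sequences, any of the protein alignment methods in this unit can be applied to them.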

Partners for alignment Bi03a_5
Protein and Structures
...amlllmbsak..alm........
Structure labels along the sequence: coil, α-helix, coil, β-sheet

Partners for alignment Bi03a_6
Protein and Environment
...AMLLLMBSAK..ALM........
Environment labels along the sequence: hydrophilic, hydrophobic, polar

Aligning Sequences: What For? Bi03a_7
Similarity: many equal aa-residues in an alignment
Homology: 2 sequences have a common ancestor (father)
similarity ? homology!

Types of Homology Bi03a_8
[Figure, from Altman (1999): gene A gives rise to B and C by speciation; a duplication gives rise to C and C']
A is the parent gene
Speciation leads to B and C
Gene duplication leads to C'
B and C are ORTHOLOGS
C and C' are PARALOGS
orthologs: genes with the same function in various species, have arisen by speciation
paralogs: have arisen by gene duplication events (members of multigene families)

Aligning Sequences: What For?, ctd. Bi03a_9
Given: a new sequence with unknown 3D structure & biological function
1) Align it to sequences in a DB and find similar sequences with known structure & function
   Suggestion: if the sequence is similar, then structure & function might also be similar
2) Get an idea (learn something) about structure & function of the new sequence

Aligning Sequences: Flow Chart Bi03a_10
[Flow chart, from Mount (2001)]
Choose two sequences
Are the sequences protein sequences?
   Yes: perform local alignment
   No: do the sequences encode protein (e.g., cDNA)?
      Yes: translate sequences
      No: does the sequence encode proteins and have introns?
         Yes: predict gene structure
Is the alignment of high quality?
   Yes: perform statistical test of alignment score
   No: alter parameters, e.g., scoring matrix, gap penalties, and repeat alignment
      Did the alignment improve?
      No: examine sequences for presence of repeats or low-complexity sequences
Is the alignment score significant?
   Yes: sequences are significantly similar
   No: sequences are not detectably similar

Methods of Sequence Alignment Bi03a_11
Modes of Analysis
   Dot matrix: visual analysis
   Dynamic Programming (DP) algorithm
   k-tuple methods (FASTA, BLAST)

Dot Matrix Alignment Bi03a_12
Modes of Analysis
   direct
   sliding window (filtered)
   weighted via match matrices

Dot Matrix Alignment Bi03a_13
MODE: direct
exact match: score = 1
no match: score = 0
[Figure DOTMATRIX.pdf, "DOT MATRIX: FUN EXAMPLE": sequence 1 (horizontal) is DOTMATRICESAREGREAT, sequence 2 (vertical) is MATRICESMAYGREATLYDOFUN; every cell where the two letters are identical holds a 1.]

Dot Matrix Alignment Bi03a_14
MODE: direct
exact match: score = 1
no match: score = 0
[Figure DOTMATRIX_gap.pdf: horizontal sequence THEFIRSTPARTOFTHESEQUENCESSSSSS, vertical sequence THISSSSSSTHEFIRSTPARTOFTHESEQUENCE; every cell where the two letters are identical holds a 1.]
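The direct mode is simple enough to sketch in Python. The following is only an illustration under assumptions: the function names `dot_matrix` and `render` are hypothetical, and the vertical sequence is shortened to MATRICES so that the output stays small.

```python
def dot_matrix(seq1: str, seq2: str) -> list[list[int]]:
    """Direct mode: score 1 for an exact match, 0 otherwise.

    Rows follow seq2 (vertical), columns follow seq1 (horizontal).
    """
    return [[1 if a == b else 0 for a in seq1] for b in seq2]


def render(seq1: str, seq2: str, matrix: list[list[int]]) -> str:
    """Draw the matrix as text, with '1' for a dot and '.' for an empty cell."""
    lines = ["  " + " ".join(seq1)]
    for letter, row in zip(seq2, matrix):
        lines.append(letter + " " + " ".join("1" if v else "." for v in row))
    return "\n".join(lines)


seq1 = "DOTMATRICESAREGREAT"  # horizontal sequence from slide Bi03a_13
seq2 = "MATRICES"             # shortened vertical sequence (assumption)
matrix = dot_matrix(seq1, seq2)
print(render(seq1, seq2, matrix))
```

In the printed matrix the shared word MATRICES shows up as an unbroken diagonal of 1s, while the scattered 1s elsewhere are chance matches of single letters. The sliding-window mode is meant to suppress exactly this background.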

Dot Matrix Alignment Bi03a_15
MODE: direct
exact match: score = 1
no match: score = 0
[Figure, from Mount (2001)] Dot matrix analysis of the amino acid sequences of the phage λ cI (horizontal sequence) and phage P22 c2 (vertical sequence) repressors. The window size and stringency were both 1.

Dot Matrix Alignment Bi03a_16
MODE: sliding window (window length w / stringency s)
if at least s matches in window of length w: score = 1
otherwise: score = 0
[Figure, from Mount (2001)] Dot matrix analysis of DNA sequences encoding phage λ cI (vertical sequence) and phage P22 c2 (horizontal sequence) repressors. This analysis was performed using the dot matrix display of the Macintosh DNA sequence analysis program DNA Strider, vers. 1.3. The window size was 11 and the stringency 7, meaning that a dot is printed at a matrix position only if 7 out of the next 11 positions in the sequences are identical.
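The window/stringency rule can be sketched as follows. This is a minimal Python illustration, not the DNA Strider implementation: the function name `window_dot_matrix` is hypothetical, the window is taken to start at the current position ("the next w positions"), and positions too close to the sequence ends to hold a full window are simply left out.

```python
def window_dot_matrix(seq1: str, seq2: str, w: int, s: int) -> list[list[int]]:
    """Sliding-window mode: dot at (row j, column i) if at least s of the
    w positions starting at seq1[i] and seq2[j] are identical.

    Rows follow seq2 (vertical), columns follow seq1 (horizontal).
    """
    matrix = []
    for j in range(len(seq2) - w + 1):
        row = []
        for i in range(len(seq1) - w + 1):
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(w))
            row.append(1 if matches >= s else 0)
        matrix.append(row)
    return matrix


# The two DNA fragments from slide Bi03a_3, with a small window for display.
dna1 = "ACTGGAAGTC"
dna2 = "ACTGAACGTA"
for row in window_dot_matrix(dna1, dna2, w=4, s=3):
    print("".join("1" if v else "." for v in row))
```

With w = 1 and s = 1 the function reduces to the direct mode; larger windows trade sensitivity to short matches for a cleaner plot.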

Dot Matrix Alignment Bi03a_17
MODE: sliding window (window length w / stringency s)
if at least s matches in window of length w: score = 1
otherwise: score = 0
[Figures dir1-1rep.gif (w = 1, s = 1) and repeat.gif (w = 23, s = 7)] Example after Mount 2001: Dot matrix analysis of the human LDL receptor against itself

Dot Matrix Alignment Bi03a_18
MODE: using match matrices
if similarity value > limit: score = 1
otherwise: score = 0
MODE: match matrices + sliding window
if sum of similarity values in window > limit: score = 1
otherwise: score = 0
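A possible sketch of the combined mode (match matrix + sliding window) is shown below. The function names and the toy sequences ILVK and VLIK are assumptions made for illustration; the ten similarity values are copied from the PAM250 table in this unit, and a real program would load the complete matrix.

```python
# A few PAM250 scores (symmetric, so each pair is stored once).
SCORES = {
    ("I", "I"): 5, ("L", "L"): 6, ("V", "V"): 4, ("K", "K"): 5,
    ("I", "L"): 2, ("I", "V"): 4, ("L", "V"): 2,
    ("I", "K"): -2, ("L", "K"): -3, ("V", "K"): -2,
}


def similarity(a: str, b: str) -> int:
    """Look up the similarity value of a residue pair in either order."""
    return SCORES[(a, b)] if (a, b) in SCORES else SCORES[(b, a)]


def scored_dot_matrix(seq1: str, seq2: str, w: int, limit: int) -> list[list[int]]:
    """Dot at (row j, column i) if the similarity values summed over a
    window of length w exceed the limit."""
    matrix = []
    for j in range(len(seq2) - w + 1):
        row = []
        for i in range(len(seq1) - w + 1):
            total = sum(similarity(seq1[i + k], seq2[j + k]) for k in range(w))
            row.append(1 if total > limit else 0)
        matrix.append(row)
    return matrix


for row in scored_dot_matrix("ILVK", "VLIK", w=2, limit=5):
    print(row)
```

The two toy sequences differ at two of four positions, yet the plot shows a full diagonal because I and V are scored as similar residues. In direct mode the same pair would produce dots only at the L and K positions.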

Excursion: Match Matrices Bi03a_19
Idea: in a protein, replace one side chain (aa_1) by another (aa_2)...
...little Babylon around match matrices... The same kind of matrix goes by several names:
   match matrices
   substitution matrices
   similarity matrices
   scoring matrices

aa-replacement Bi03a_20
small effect: replacement occurs often
large effect: replacement occurs rarely (seldom)
[Figure: replacements among aa_i, aa_j, and aa_k, leading to matrices]

aa-replacement Bi03a_21
small effect: replacement occurs often
large effect: replacement occurs rarely (seldom)
[Figure: replacements among aa_i, aa_j, and aa_k, leading to matrices]

Details:
1.) Altschul SF: Amino acid substitution matrices from an information theoretic perspective. J. Mol. Biol. 1991; 219:555-565
2.) http://www.sbc.su.se/~arne/kurser/swell/substmatrix.html

p_i = ab initio / background probability for aa_i to occur (in all proteins considered...)
p_j = likewise for aa_j
q_ij = target frequency = probability to find aa_i and aa_j vis-à-vis in an alignment of 2 proteins out of the family of proteins considered. Alignment based on the gold standard (3D structure) whenever possible.
s_ij = substitution matrix score:

$$ s_{ij} = \frac{1}{\lambda} \ln \frac{q_{ij}}{p_i \, p_j} $$

where q_ij / (p_i p_j) is the odds ratio and λ is the normalization factor.

Match Matrices, Motivation Bi03a_22
Is the effect (on structure and function) large or small?
The effect depends on the similarity / non-similarity of aa_1 and aa_2, aa_3
Similarity regarding: size, polarity, charge...
Surveys: http://www.ncbi.nlm.nih.gov/education/blastinfo/scoring2.html
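The log-odds score on slide Bi03a_21 can be evaluated directly. The sketch below is illustrative only: the function name `log_odds_score` is hypothetical and the frequencies are made-up numbers, not values from any real protein family. The choice λ = ln(2)/2 is an assumption that expresses the score in half-bit units.

```python
import math


def log_odds_score(q_ij: float, p_i: float, p_j: float, lam: float) -> float:
    """s_ij = ln(q_ij / (p_i * p_j)) / lambda."""
    return math.log(q_ij / (p_i * p_j)) / lam


# Hypothetical background and target frequencies (illustration only).
p_i, p_j = 0.10, 0.05
lam = math.log(2) / 2  # assumed scale: scores come out in half-bit units

# Pair seen twice as often as expected by chance: positive score.
print(round(log_odds_score(0.010, p_i, p_j, lam), 3))   # 2.0
# Pair seen half as often as expected by chance: negative score.
print(round(log_odds_score(0.0025, p_i, p_j, lam), 3))  # -2.0
```

A score of zero corresponds to q_ij = p_i p_j, that is, a pair that is aligned exactly as often as chance alone would predict.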

20 amino acids Bi03a_23
[Figure: the 20 amino acids, from Brown (1999)]

Match Matrices, Where From? Bi03a_24
1) Generated by searching & evaluating databases for aa-substitutions between known related proteins.
2) Different types of DB, search & statistical evaluation give rise to different matrices:
   PAM-matrices (Percent Accepted Mutation)
   BLOSUM-matrices (BLOcks SUbstitution Matrix)
...what is meant by related?...

BLOSUM versus PAM matrices Bi03a_25
[Figure: BLOSUMandPAM.gif]

Match Matrices, Meaning of Values Bi03a_26
Match matrix PAM250 (PAM250.pdf), lower triangle:

     A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
A    2
R   -2   6
N    0   0   2
D    0  -1   2   4
C   -2  -4  -4  -5  12
Q    0   1   1   2  -5   4
E    0  -1   1   3  -5   2   4
G    1  -3   0   1  -3  -1   0   5
H   -1   2   2   1  -3   3   1  -2   6
I   -1  -2  -2  -2  -2  -2  -2  -3  -2   5
L   -2  -3  -3  -4  -6  -2  -3  -4  -2   2   6
K   -1   3   1   0  -5   1   0  -2   0  -2  -3   5
M   -1   0  -2  -3  -5  -1  -2  -3  -2   2   4   0   6
F   -4  -4  -4  -6  -4  -5  -5  -5  -2   1   2  -5   0   9
P    1   0  -1  -1  -3   0  -1  -1   0  -2  -3  -1  -2  -5   6
S    1   0   1   0   0  -1   0   1  -1  -1  -3   0  -2  -3   1   3
T    1  -1   0   0  -2  -1   0   0  -1   0  -2   0  -1  -2   0   1   3
W   -6   2  -4  -7  -8  -5  -7  -7  -3  -5  -2  -3  -4   0  -6  -2  -5  17
Y   -3  -4  -2  -4   0  -4  -4  -5   0  -1  -1  -4  -2   7  -5  -3  -3   0  10
V    0  -2  -2  -2  -2  -2  -2  -1  -2   4   2  -2   2  -1  -1  -1   0  -6  -2   4

high value: favourable replacement
low value: unfavourable replacement
value on the diagonal: log odds for retaining the aa
PAM250: 250 accepted point mutations per 100 residues
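To see how the table values are used, the following Python sketch scores the ungapped local alignment TGKG / AGKG from slide Bi03a_2. The function names are hypothetical, and only the three PAM250 entries needed for this example are included.

```python
# The three PAM250 entries needed here, read off the table above.
PAM250_SUBSET = {
    ("A", "T"): 1,
    ("G", "G"): 5,
    ("K", "K"): 5,
}


def pair_score(a: str, b: str, table: dict) -> int:
    """Look up a residue pair in either order (the matrix is symmetric)."""
    return table[(a, b)] if (a, b) in table else table[(b, a)]


def ungapped_score(s1: str, s2: str, table: dict) -> int:
    """Sum the match-matrix values over the columns of an ungapped alignment."""
    if len(s1) != len(s2):
        raise ValueError("ungapped alignment needs sequences of equal length")
    return sum(pair_score(a, b, table) for a, b in zip(s1, s2))


print(ungapped_score("TGKG", "AGKG", PAM250_SUBSET))  # 16
```

The T/A column contributes 1 and each of the three identical columns contributes 5, so even the mismatched column adds to the score because the replacement is a favourable one.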

Dot Matrices: Real life examples & Software Bi03a_27 to Bi03a_46

Yeast Chromosome similarity viewer
   Bi03a_27 Online: http://genome-www.stanford.edu/saccharomyces/ssv/viewer_start.html Offline: viewer-start.html

Tour to Human Haptoglobin Repeat Domains
   Bi03a_28 Online: http://www.ncbi.nlm.nih.gov/entrez/ Offline: Entrez-A.gif
   Bi03a_29 Offline: Entrez-B.gif
   Bi03a_30 Offline: Entrez-C.gif
   Bi03a_31 Offline: HPT2_HUMAN_FASTA.txt
   Bi03a_32 Online: http://www.isrec.isb-sib.ch/java/dotlet/dotlet.html Offline: Dotlet-A.gif
   Bi03a_33 Offline: Dotlet-B.gif

Tour to Human versus Rat Apolipoprotein
   Bi03a_34 Online: http://www.ncbi.nlm.nih.gov/entrez/ Offline: Entrez_A.gif
   Bi03a_35 Offline: Entrez_B.gif
   Bi03a_36 Offline: human_apo_i_fasta.txt
   Bi03a_37 Online: http://www.ncbi.nlm.nih.gov/entrez/ Offline: Entrez_D.gif
   Bi03a_38 Offline: rat_apo_i_fasta.txt
   Bi03a_39 Online: http://www.isrec.isb-sib.ch/java/dotlet/dotlet.html Offline: human_vs_rat_dotlet.gif

Program Dotlet (+Tutorial)
   Bi03a_40 Online: http://us.expasy.org/java/dotlet/dotlet_examples.html Offline: Bild0.gif
   Bi03a_41 Conserved protein domains. Online: http://us.expasy.org/java/dotlet/consdomain.html Offline: Bild1.gif
   Bi03a_42 Exons and introns. Online: http://us.expasy.org/java/dotlet/exonintron.html Offline: Bild2.gif
   Bi03a_43 Terminators and other stem-loop structures. Online: http://us.expasy.org/java/dotlet/terminator.html Offline: Bild3.gif
   Bi03a_44 Frameshifts. Online: http://us.expasy.org/java/dotlet/frameshift Offline: Bild4.gif
   Bi03a_45 Low-complexity regions. Online: http://us.expasy.org/java/dotlet/lowcom.html Offline: Bild5.gif
   Bi03a_46 Repeated protein domains. Online: http://us.expasy.org/java/dotlet/repeats.html Offline: Bild6.gif