Bi03a_1
Unit 03a: Alignment of Pairs of Sequences

Partners for alignment  Bi03a_2
Protein 1, Protein 2 = amino-acid sequences (20-letter alphabet + gap)

Global alignment:
LGPSSKQTGKGS-SRIWDN
LN-ITKSAGKGAIMRLGDA

Local alignment:
-------TGKG--------
-------AGKG--------

global alignment: entire sequence
local alignment: parts of sequence
...searching for characters or character patterns that are the same
Partners for alignment  Bi03a_3
DNA 1, DNA 2 = nucleotide sequences (4-letter alphabet + gap)
...actggaagtc...
...actgaacgta...

Partners for alignment  Bi03a_4
DNA, Protein
Translate the DNA via the genetic code, then align protein to protein:
GCC TCC GAC AAG CTC ATG  ->  ASDKLM
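
The translation step on slide Bi03a_4 can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are made up for this example and do not come from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in the order T, C, A, G
# for the first, second and third base ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in reading frame 1; trailing partial codons are ignored."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate("GCC TCC GAC AAG CTC ATG"))  # ASDKLM
```

Characters other than A, C, G and T raise a `KeyError` in this sketch; a real tool would also have to handle ambiguity codes and the other reading frames.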
Partners for alignment  Bi03a_5
Protein, Structures
...amlllmbsak..alm........
Structure states along the sequence: coil, α-helix, coil, β-sheet

Partners for alignment  Bi03a_6
Protein, Environment
...AMLLLMBSAK..ALM........
Environment classes along the sequence: hydrophilic, hydrophobic, polar
Aligning Sequences: What For?  Bi03a_7
Similarity: many identical aa residues in an alignment
Homology: 2 sequences have a common ancestor
similarity? -> homology!

Types of Homology  Bi03a_8
A is the parent gene
Speciation leads to B and C
Gene duplication leads to C'
B and C are ORTHOLOGS
C and C' are PARALOGS
from Altman (1999)
orthologs: genes with the same function in different species; they have arisen by speciation
paralogs: genes that have arisen by gene duplication events (members of multigene families)
Aligning Sequences: What For?, ctd.  Bi03a_9
Given: a new sequence with unknown 3D structure & biological function
1) Align it to sequences in the DB and find similar sequences with known structure & function.
   Suggestion: if the sequence is similar, then structure & function might also be similar.
2) Get an idea (learn something) about the structure & function of the new sequence.

Aligning Sequences: Flow Chart  Bi03a_10
Choose two sequences
Are the sequences protein sequences?
  Yes: perform local alignment
  No: Do sequences encode protein (e.g., cDNA)?
    Yes: translate sequences
    No: Does sequence encode proteins and have introns?
      Yes: predict gene structure
Is alignment of high quality?
  Yes: perform statistical test of alignment score
  No: alter parameters, e.g., scoring matrix, gap penalties, and repeat alignment
Did alignment improve? (Yes / No)
Examine sequences for presence of repeats or low-complexity sequences
Is the alignment score significant?
  Yes: sequences are significantly similar
  No: sequences are not detectably similar
from Mount (2001)
Methods of Sequence Alignment  Bi03a_11
Modes of analysis:
  Dot matrix: visual analysis
  Dynamic programming (DP) algorithm
  k-tuple methods (FASTA, BLAST)

Dot Matrix Alignment  Bi03a_12
Modes of analysis:
  direct
  sliding window (filtered)
  weighted via match matrices
Dot Matrix Alignment  Bi03a_13
MODE: direct
exact match: score = 1
no match: score = 0
DOT MATRIX: FUN EXAMPLE
Horizontal sequence: DOTMATRICESAREGREAT
Vertical sequence: MATRICESMAYGREATLYDOFUN
[Figure: dot matrix with a 1 at every position where the two letters match; DOTMATRIX.pdf]

Dot Matrix Alignment  Bi03a_14
MODE: direct
exact match: score = 1
no match: score = 0
Horizontal sequence: THEFIRSTPARTOFTHESEQUENCESSSSSS
Vertical sequence: THISSSSSSTHEFIRSTPARTOFTHESEQUENCE
[Figure: dot matrix with a 1 at every position where the two letters match; DOTMATRIX_gap.pdf]
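
The direct mode can be written down in a few lines. The following Python sketch is only an illustration of the rule "exact match: score = 1, no match: score = 0"; the function names `dot_matrix` and `render` are chosen for this example and are not taken from any dot-plot program.

```python
def dot_matrix(seq_h, seq_v):
    """Direct mode: cell [i][j] is 1 if seq_v[i] == seq_h[j], otherwise 0."""
    return [[1 if a == b else 0 for b in seq_h] for a in seq_v]


def render(seq_h, seq_v):
    """Return the matrix as text, with '1' for a match and '.' for no match."""
    lines = ["  " + " ".join(seq_h)]
    for a, row in zip(seq_v, dot_matrix(seq_h, seq_v)):
        lines.append(a + " " + " ".join("1" if x else "." for x in row))
    return "\n".join(lines)


# The two sequences of the fun example on slide Bi03a_13.
print(render("DOTMATRICESAREGREAT", "MATRICESMAYGREATLYDOFUN"))
```

Shared words such as MATRICES and GREAT show up as diagonal runs of 1s; isolated 1s off the diagonals are chance matches of single letters.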
Dot Matrix Alignment  Bi03a_15
MODE: direct
exact match: score = 1
no match: score = 0
[Figure] Dot matrix analysis of the amino acid sequences of the phage λ cI (horizontal sequence) and phage P22 c2 (vertical sequence) repressors. The window size and stringency were both 1.
from Mount (2001)

Dot Matrix Alignment  Bi03a_16
MODE: sliding window (window length w / stringency s)
if at least s matches in window of length w: score = 1
otherwise: score = 0
[Figure] Dot matrix analysis of DNA sequences encoding phage λ cI (vertical sequence) and phage P22 c2 (horizontal sequence) repressors. This analysis was performed using the dot matrix display of the Macintosh DNA sequence analysis program DNA Strider, vers. 1.3. The window size was 11 and the stringency 7, meaning that a dot is printed at a matrix position only if 7 out of the next 11 positions in the sequences are identical.
from Mount (2001)
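
The sliding-window rule can be sketched as follows. This is a minimal Python illustration, assuming that the window starts at the current matrix position and runs forward along the diagonal (the "next w positions" reading of the figure legend); some programs centre the window instead. The name `window_dot_matrix` is made up for this example.

```python
def window_dot_matrix(seq_h, seq_v, w, s):
    """Sliding-window mode: cell [i][j] is 1 if at least s of the w residue
    pairs (seq_v[i + k], seq_h[j + k]), k = 0..w-1, are identical."""
    matrix = []
    for i in range(len(seq_v) - w + 1):
        row = []
        for j in range(len(seq_h) - w + 1):
            matches = sum(seq_v[i + k] == seq_h[j + k] for k in range(w))
            row.append(1 if matches >= s else 0)
        matrix.append(row)
    return matrix


# Window 4, stringency 3: ACGT against ACGA matches in 3 of 4 positions.
print(window_dot_matrix("ACGTACGT", "ACGAACGT", w=4, s=3)[0][0])  # 1
```

With w = 1 and s = 1 the function reduces to the direct mode. Larger windows suppress isolated chance matches, which is why they are preferred for DNA with its 4-letter alphabet.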
Dot Matrix Alignment  Bi03a_17
MODE: sliding window (window length w / stringency s)
if at least s matches in window of length w: score = 1
otherwise: score = 0
Example after Mount (2001): dot matrix analysis of the human LDL receptor against itself
[Figure: w = 1, s = 1; dir1-1rep.gif]
[Figure: w = 23, s = 7; repeat.gif]

Dot Matrix Alignment  Bi03a_18
MODE: using match matrices
if similarity value > limit: score = 1
otherwise: score = 0
MODE: match matrices + sliding window
if sum of similarity values in window > limit: score = 1
otherwise: score = 0
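
The combined mode (match matrix + sliding window) can be sketched in the same style. In this Python illustration the dictionary `PAM250_SUBSET` holds only the handful of PAM250 values for D, E, I, L and V as they appear in the PAM250 table of this unit; the names `similarity` and `matrix_window_dots`, and the choice of a forward-running window, are assumptions made for the example.

```python
# A few PAM250 entries (symmetric, so each pair is stored only once).
PAM250_SUBSET = {
    ("D", "D"): 4, ("D", "E"): 3, ("E", "E"): 4,
    ("I", "I"): 5, ("I", "L"): 2, ("L", "L"): 6,
    ("I", "V"): 4, ("L", "V"): 2, ("V", "V"): 4,
    ("D", "I"): -2, ("D", "L"): -4, ("D", "V"): -2,
    ("E", "I"): -2, ("E", "L"): -3, ("E", "V"): -2,
}


def similarity(a, b, matrix=PAM250_SUBSET):
    """Look up the similarity value of residues a and b in either order."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def matrix_window_dots(seq_h, seq_v, w, limit):
    """Cell [i][j] is 1 if the similarity values summed over a window of
    length w along the diagonal starting at (i, j) exceed limit."""
    dots = []
    for i in range(len(seq_v) - w + 1):
        row = []
        for j in range(len(seq_h) - w + 1):
            total = sum(similarity(seq_v[i + k], seq_h[j + k]) for k in range(w))
            row.append(1 if total > limit else 0)
        dots.append(row)
    return dots


# VIL against ILV: no identical pairs, but the summed similarity is 4 + 2 + 2 = 8.
print(matrix_window_dots("ILV", "VIL", w=3, limit=5))  # [[1]]
```

With w = 1 this is the plain match-matrix mode. The example shows the point of weighting: two hydrophobic stretches without a single identical position still produce a dot.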
Excursion: Match Matrices  Bi03a_19
Idea: in a protein, replace one side chain (aa 1) by another (aa 2): ...
A little Babel of names around match matrices; the following terms are used for the same thing:
  match matrices
  substitution matrices
  similarity matrices
  scoring matrices

aa-replacement  Bi03a_20
small effect: replacement occurs often
large effect: replacement occurs rarely
[Diagram: aa i, aa j, aa k -> matrices]
aa-replacement  Bi03a_21
small effect: replacement occurs often
large effect: replacement occurs rarely
[Diagram: aa i, aa j, aa k -> matrices]
p_i = ab initio / background probability for aa i to occur (in all proteins considered)
p_j = likewise for aa j
q_ij = target frequency = probability to find aa i and aa j opposite each other in an alignment of 2 proteins out of the family of proteins considered. Alignment based on the gold standard (3D structure) whenever possible.
s_ij = substitution matrix score:

$$ s_{ij} = \frac{1}{\lambda} \ln \frac{q_{ij}}{p_i \, p_j} $$

q_ij / (p_i p_j) is the odds ratio, λ is a normalization factor.
Details:
1.) Altschul SF: Amino acid substitution matrices from an information theoretic perspective. J. Mol. Biol. 1991; 219:555-565
2.) http://www.sbc.su.se/~arne/kurser/swell/substmatrix.html

Match Matrices, Motivation  Bi03a_22
Is the effect (on structure and function) large or small?
The effect depends on the similarity / dissimilarity of aa 1 and aa 2, aa 3
Similarity regarding: size, polarity, charge...
Surveys: http://www.ncbi.nlm.nih.gov/education/blastinfo/scoring2.html
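
The score definition on slide Bi03a_21 can be tried out numerically. The Python sketch below only illustrates the formula; the function name `log_odds_score` and all frequencies used in the example calls are hypothetical numbers chosen for illustration, not values from a real matrix.

```python
import math


def log_odds_score(q_ij, p_i, p_j, lam=1.0):
    """s_ij = ln(q_ij / (p_i * p_j)) / lam, with q_ij the target frequency,
    p_i and p_j the background probabilities, lam the normalization factor."""
    return math.log(q_ij / (p_i * p_j)) / lam


# Hypothetical frequencies: both residues have background probability 0.1,
# so a pair is expected by chance with probability 0.1 * 0.1 = 0.01.
print(log_odds_score(0.02, 0.1, 0.1))   # pair seen twice as often as chance: positive
print(log_odds_score(0.01, 0.1, 0.1))   # pair seen as often as chance: about zero
print(log_odds_score(0.005, 0.1, 0.1))  # pair seen half as often as chance: negative
```

A positive score therefore means the replacement occurs more often in related proteins than expected by chance, a negative score that it occurs less often. The factor λ only rescales the values, for example so that they can be rounded to integers.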
Bi03a_23
[Figure: the 20 amino acids, from Brown (1999)]

Match Matrices, Where From?  Bi03a_24
Generated by searching and evaluating databases for aa substitutions between known related proteins.
Different types of DB, search and statistical evaluation give rise to different matrices:
  PAM matrices (Percent Accepted Mutation)
  BLOSUM matrices (BLOcks SUbstitution Matrix)
...what is meant by related?...
BLOSUM versus PAM matrices  Bi03a_25
[Figure: BLOSUMandPAM.gif]

Match Matrices, Meaning of Values  Bi03a_26
Match matrix PAM250 (PAM250.pdf)

     A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
A    2
R   -2   6
N    0   0   2
D    0  -1   2   4
C   -2  -4  -4  -5   4
Q    0   1   1   2  -5   4
E    0  -1   1   3  -5   2   4
G    1  -3   0   1  -3  -1   0   5
H   -1   2   2   1  -3   3   1  -2   6
I   -1  -2  -2  -2  -2  -2  -2  -3  -2   5
L   -2  -3  -3  -4  -6  -2  -3  -4  -2   2   6
K   -1   3   1   0  -5   1   0  -2   0  -2  -3   5
M   -1   0  -2  -3  -5  -1  -2  -3  -2   2   4   0   6
F   -4  -4  -4  -6  -4  -5  -5  -5  -2   1   2  -5   0   9
P    1   0  -1  -1  -3   0  -1  -1   0  -2  -3  -1  -2  -5   6
S    1   0   1   0   0  -1   0   1  -1  -1  -3   0  -2  -3   1   3
T    1  -1   0   0  -2  -1   0   0  -1   0  -2   0  -1  -2   0   1   3
W   -6   2  -4  -7  -8  -5  -7  -7  -3  -5  -2  -3  -4   0  -6  -2  -5  17
Y   -3  -4  -2  -4   0  -4  -4  -5   0  -1  -1  -4  -2   7  -5  -3  -3   0  10
V    0  -2  -2  -2  -2  -2  -2  -1  -2   4   2  -2   2  -1  -1  -1   0  -6  -2   4

high value: favourable replacement
low value: unfavourable replacement
value on the diagonal: log odds for retaining the aa
PAM250: 250 accepted mutations per 100 residues
Dot Matrices: Real life examples & Software  Bi03a_27
Yeast Chromosome similarity viewer
Online: http://genome-www.stanford.edu/saccharomyces/ssv/viewer_start.html
Offline: viewer-start.html

Dot Matrices: Real life examples & Software  Bi03a_28
Tour to Human Haptoglobin Repeat Domains
Online: http://www.ncbi.nlm.nih.gov/entrez/
Offline: Entrez-A.gif
Dot Matrices: Real life examples & Software  Bi03a_29
Tour to Human Haptoglobin Repeat Domains
Offline: Entrez-B.gif

Dot Matrices: Real life examples & Software  Bi03a_30
Tour to Human Haptoglobin Repeat Domains
Offline: Entrez-C.gif
Dot Matrices: Real life examples & Software  Bi03a_31
Tour to Human Haptoglobin Repeat Domains
Offline: HPT2_HUMAN_FASTA.txt

Dot Matrices: Real life examples & Software  Bi03a_32
Tour to Human Haptoglobin Repeat Domains
Online: http://www.isrec.isb-sib.ch/java/dotlet/dotlet.html
Offline: Dotlet-A.gif
Dot Matrices: Real life examples & Software  Bi03a_33
Tour to Human Haptoglobin Repeat Domains
Offline: Dotlet-B.gif

Dot Matrices: Real life examples & Software  Bi03a_34
Tour to Human versus Rat Apolipoprotein
Online: http://www.ncbi.nlm.nih.gov/entrez/
Offline: Entrez_A.gif
Dot Matrices: Real life examples & Software  Bi03a_35
Tour to Human versus Rat Apolipoprotein
Offline: Entrez_B.gif

Dot Matrices: Real life examples & Software  Bi03a_36
Tour to Human versus Rat Apolipoprotein
Offline: human_apo_i_fasta.txt
Dot Matrices: Real life examples & Software  Bi03a_37
Tour to Human versus Rat Apolipoprotein
Online: http://www.ncbi.nlm.nih.gov/entrez/
Offline: Entrez_D.gif

Dot Matrices: Real life examples & Software  Bi03a_38
Tour to Human versus Rat Apolipoprotein
Offline: rat_apo_i_fasta.txt
Dot Matrices: Real life examples & Software  Bi03a_39
Tour to Human versus Rat Apolipoprotein
Online: http://www.isrec.isb-sib.ch/java/dotlet/dotlet.html
Offline: human_vs_rat_dotlet.gif

Dot Matrices: Real life examples & Software  Bi03a_40
Program Dotlet (+ Tutorial)
Online: http://us.expasy.org/java/dotlet/dotlet_examples.html
Offline: Bild0.gif
Dot Matrices: Real life examples & Software  Bi03a_41
Conserved protein domains
Online: http://us.expasy.org/java/dotlet/consdomain.html
Offline: Bild1.gif

Dot Matrices: Real life examples & Software  Bi03a_42
Exons and introns
Online: http://us.expasy.org/java/dotlet/exonintron.html
Offline: Bild2.gif
Dot Matrices: Real life examples & Software  Bi03a_43
Terminators and other stem-loop structures
Online: http://us.expasy.org/java/dotlet/terminator.html
Offline: Bild3.gif

Dot Matrices: Real life examples & Software  Bi03a_44
Frameshifts
Online: http://us.expasy.org/java/dotlet/frameshift
Offline: Bild4.gif
Dot Matrices: Real life examples & Software  Bi03a_45
Low-complexity regions
Online: http://us.expasy.org/java/dotlet/lowcom.html
Offline: Bild5.gif

Dot Matrices: Real life examples & Software  Bi03a_46
Repeated protein domains
Online: http://us.expasy.org/java/dotlet/repeats.html
Offline: Bild6.gif