Introduction to Bioinformatics for Medical Research
Gideon Greenspan, gdg@cs.technion.ac.il
Lecture 5: Advanced BLAST
BLAST Recap
  Sequence alignment
  Complexity and indexing
  BLASTN and BLASTP
  Basic parameters
  PAM and BLOSUM matrices
  Affine gap model
  E values
Advanced BLAST
  Databases
  BLAST options
  More on BLAST output
  Taxonomic BLAST
  Translated BLAST
  PSI-BLAST
  BLink
BLASTN Databases
  nr      GenBank, EMBL, DDBJ, PDB and NCBI reference sequences (RefSeq)
  htgs    High-throughput genomic sequences (draft)
  pat     Patented nucleotide sequences
  mito    Mitochondrial sequences
  vector  Vector subset of GenBank
  month   GenBank, EMBL, DDBJ, PDB entries from the last 30 days
  chrom   Contigs and chromosomes from RefSeq
BLASTP Databases
  nr         GenBank CDS translations, RefSeq, PDB, SWISS-PROT, PIR, PRF
  swissprot  SWISS-PROT
  pat        Patented protein sequences
  pdb        Protein Data Bank
  month      GenBank CDS translations, PDB, SWISS-PROT, PIR, PRF entries from the last 30 days
BLASTN/P Options (1)
  Search only part of the database, using NCBI Entrez query format
  Search a specific organism
  Remove regions of low information content, e.g. short repeats or regions rich in only 2 nucleotides
  Remove known human repeats (LINEs, SINEs)
BLASTN/P Options (2)
  Threshold for significance of results
  Costs to open and extend a gap; score for a nucleotide match or mismatch
  Use an index based on words of 7, 11 or 15 nucleotides
  Allowed gap scores: 10/1, 10/2, 11/1, 8/2, 9/2
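The word-based index can be pictured with a small sketch. It assumes a toy in-memory database string, and the function names `build_word_index` and `find_seeds` are illustrative only, not part of BLAST. Every word of the chosen size in the database is recorded with its start position, and each word of the query is then looked up to find exact seed matches:

```python
from collections import defaultdict

def build_word_index(sequence, word_size=11):
    """Map every word of length word_size to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(sequence) - word_size + 1):
        index[sequence[pos:pos + word_size]].append(pos)
    return index

def find_seeds(query, index, word_size=11):
    """Return (query_pos, db_pos) pairs where a query word occurs in the database."""
    seeds = []
    for qpos in range(len(query) - word_size + 1):
        word = query[qpos:qpos + word_size]
        for dpos in index.get(word, []):
            seeds.append((qpos, dpos))
    return seeds

# Toy data (hypothetical sequences, chosen only for illustration).
database = "ACGTACGTTAGCCGATAGGCTTAACG"
query = "TTAGCCGATAGG"

index = build_word_index(database, word_size=7)
seeds = find_seeds(query, index, word_size=7)
print(seeds)
```

A smaller word size finds more seeds, so it is more sensitive but slower; a larger word size does the opposite. Seeds that lie on the same diagonal (constant `db_pos - query_pos`) are the natural starting points for extension into a full alignment.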
BLASTP Options
  Scoring matrix: PAM, etc.
  Costs to open and extend a gap
  Search for a motif (PSI-BLAST)
BLASTN/P Formatting (1)
  Show colored bar chart
  Other (less important) options on what to show
  Number of sequences listed
  Number of alignments shown
BLASTN/P Formatting (2)
  How to display alignments
  Only show results which match an Entrez search or are from a specific organism
  Only show results with E values in this range
BLAST Output Header
  Request ID for later retrieval
  Query sequence details
  Database details
  Tax BLAST
BLAST Alignments (1)
  Normalized score of alignment
  Expected number of such hits by chance (2e-11 = 2 × 10^-11)
  Number of insertions / deletions
  Number of exact matches
  Number of matches with positive score
  Several alignments possible for one sequence match
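To make the counts reported for each alignment concrete, here is a minimal sketch that derives identities and gaps from a pairwise alignment given as two equal-length rows with `-` for gaps. The function name `alignment_stats` and the example rows are hypothetical. Counting matches with positive score is left out because it needs a substitution matrix such as BLOSUM.

```python
def alignment_stats(query_row, subject_row):
    """Count exact matches and gap columns in a gapped pairwise alignment."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    columns = list(zip(query_row, subject_row))
    # An identity is a column where both rows carry the same residue.
    identities = sum(1 for q, s in columns if q == s and q != "-")
    # A gap column has an insertion/deletion in either row.
    gaps = sum(1 for q, s in columns if q == "-" or s == "-")
    return {
        "length": len(columns),
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / len(columns),
    }

# Hypothetical aligned rows, for illustration only.
stats = alignment_stats("ACGT-ACGTTA", "ACGTTACG-TA")
print(stats)
```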
BLAST Alignments (2)
  Insertion / deletion
  Exact match
  Query sequence
  Mismatch with positive score
  Position within sequence
  Matched sequence
  Masked low complexity region
Expectation Values
  Increases linearly with length of query sequence
  Increases linearly with length of database
  Decreases exponentially with score of alignment
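These three dependencies are exactly what the Karlin-Altschul formula E = K * m * n * exp(-lambda * S) expresses, where m is the query length, n the database length and S the raw alignment score. The sketch below is illustrative: the values of `lam` and `k` are assumed placeholders (the real ones depend on the scoring matrix and gap costs), and the edge-effect corrections BLAST applies to the lengths are ignored.

```python
import math

def expect_value(raw_score, query_len, db_len, lam, k):
    """E = K * m * n * exp(-lambda * S): expected number of chance hits scoring >= S."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

def bit_score(raw_score, lam, k):
    """Normalized score in bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# Assumed statistical parameters, for illustration only.
LAM, K = 0.267, 0.041

e_short = expect_value(raw_score=100, query_len=250, db_len=1_000_000, lam=LAM, k=K)
e_long = expect_value(raw_score=100, query_len=500, db_len=1_000_000, lam=LAM, k=K)
e_high = expect_value(raw_score=120, query_len=250, db_len=1_000_000, lam=LAM, k=K)

print(e_short, e_long, e_high)
```

With the normalized bit score S', the same quantity can be written as E = m * n * 2^(-S'), which is why bit scores are comparable across scoring systems.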
Tax BLAST
  Lineage of organism with strongest hit
  Score of organism's strongest hit
  Shared ancestry in taxonomic tree
  Number of organism hits
Translated BLAST
  Type of search: blastx, tblastn, tblastx
  Database options reflect chosen search
  Codon to amino acid mapping table
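The translated searches differ in which side is translated: blastx translates a nucleotide query and searches a protein database, tblastn compares a protein query against a translated nucleotide database, and tblastx translates both. The following sketch shows six-frame translation using the standard genetic code; the helper names are illustrative and other codon tables (e.g. mitochondrial) would need a different `AMINO_ACIDS` string.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna, frame=0):
    """Translate complete codons starting at offset frame; unknown codons become X."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frame_translation(dna):
    """Translate all three forward and all three reverse-strand reading frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames

frames = six_frame_translation("ATGGCCTAA")
print(frames)
```

Stop codons are shown as `*`. A translated search then aligns each of the six protein strings as if it were an ordinary protein sequence.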
PSI-BLAST
  Position-Specific Iterative BLAST
  Extension to BLASTP
  Finds more distantly related sequences
  Distant sequences with insignificant E values
  Even in distantly related sequences, important domains can be highly conserved
  PSI-BLAST gives more weight to those conserved domains
A PSI-BLAST Iteration
  Perform ordinary BLASTP
    Use PAM/BLOSUM matrices
  Identify matched sequences
    Apply E value threshold with tweaking
  Perform multiple alignment on matches
  Extract position-specific profile from alignment
  Use profile as new scoring matrix
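The step of extracting a position-specific profile can be sketched as follows. This is a simplified stand-in, not the actual PSI-BLAST procedure: it uses plain column counts with a fixed pseudocount and a uniform background, whereas PSI-BLAST also weights sequences and uses more elaborate pseudocounts. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

def build_pssm(aligned_seqs, background, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    alphabet = sorted(background)
    pssm = []
    for col in range(len(aligned_seqs[0])):
        # Count residues in this column; gap characters are skipped.
        counts = Counter(seq[col] for seq in aligned_seqs if seq[col] in background)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        column_scores = {}
        for res in alphabet:
            freq = (counts[res] + pseudocount) / total
            # Positive when the residue is more common here than in the background.
            column_scores[res] = math.log2(freq / background[res])
        pssm.append(column_scores)
    return pssm

def score_with_pssm(pssm, segment):
    """Score an ungapped segment by summing its column-specific scores."""
    return sum(column[res] for column, res in zip(pssm, segment))

# Toy multiple alignment and uniform background (illustrative assumptions).
alignment = ["MKVLA", "MKILA", "MRVLS", "MKVLA"]
background = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}

pssm = build_pssm(alignment, background)
print(round(pssm[0]["M"], 2), round(pssm[0]["W"], 2))
print(score_with_pssm(pssm, "MKVLA"), score_with_pssm(pssm, "WWWWW"))
```

Unlike a PAM or BLOSUM matrix, the score for a residue now depends on the column it is aligned to, so fully conserved positions (such as the first column here) contribute much more than variable ones.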
Position-Specific Scoring Matrix
Using PSI-BLAST (1)
  Available from main BLAST page
  Or switch on in BLASTP
  E value threshold for initial inclusion in multiple alignment for profile
Using PSI-BLAST (2)
  Align selected sequences, generate profile, search again
  Number of results to show next iteration
  New result
  Select whether to include in next iteration
BLink
  BLAST Link
  BLAST is efficient but still slow
  BLink contains pre-computed results
    For proteins in Entrez protein database
    Up to 200 BLAST hits for sequence
  Taxonomic grouping
  Alignment scores only
    No E values since database changes
BLAST Resources
  Statistics of similarity scores: http://www.ncbi.nlm.nih.gov/blast/tutorial/altschul-1.html
  Tutorial: http://www.ncbi.nlm.nih.gov/education/blastinfo/information3.html
  Frequently Asked Questions: http://www.ncbi.nlm.nih.gov/blast/blast_faqs.html
  BLAST Network client: ftp://ftp.ncbi.nih.gov/blast/blastcl3
  WU-BLAST: http://blast.wustl.edu/