Sequence Alignment. GBIO0002 Archana Bhardwaj University of Liege

Similar documents
Wilson Leung 01/03/2018 An Introduction to NCBI BLAST. Prerequisites: Detecting and Interpreting Genetic Homology: Lecture Notes on Alignment

INTRODUCTION TO BIOINFORMATICS

INTRODUCTION TO BIOINFORMATICS

Wilson Leung 05/27/2008 A Simple Introduction to NCBI BLAST

2) NCBI BLAST tutorial This is a users guide written by the education department at NCBI.

Database Searching Using BLAST

Tutorial 4 BLAST Searching the CHO Genome

Basic Local Alignment Search Tool (BLAST)

Genome Browsers - The UCSC Genome Browser

Lecture 5 Advanced BLAST

Sequence Alignment: BLAST

As of August 15, 2008, GenBank contained bases from reported sequences. The search procedure should be

Bioinformatics explained: BLAST. March 8, 2007

BLAST MCDB 187. Friday, February 8, 13

BLAST Exercise 2: Using mrna and EST Evidence in Annotation Adapted by W. Leung and SCR Elgin from Annotation Using mrna and ESTs by Dr. J.

How to Run NCBI BLAST on zcluster at GACRC

BLAST, Profile, and PSI-BLAST

MetaPhyler Usage Manual

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

Alignments BLAST, BLAT

Assessing Transcriptome Assembly

Pairwise Sequence Alignment. Zhongming Zhao, PhD

Similarity Searches on Sequence Databases

B L A S T! BLAST: Basic local alignment search tool. Copyright notice. February 6, Pairwise alignment: key points. Outline of tonight s lecture

Sequence alignment theory and applications Session 3: BLAST algorithm

Introduction to Computational Molecular Biology

24 Grundlagen der Bioinformatik, SS 10, D. Huson, April 26, This lecture is based on the following papers, which are all recommended reading:

Tutorial 1: Exploring the UCSC Genome Browser

Exercise 2: Browser-Based Annotation and RNA-Seq Data

2. Take a few minutes to look around the site. The goal is to familiarize yourself with a few key components of the NCBI.

4.1. Access the internet and log on to the UCSC Genome Bioinformatics Web Page (Figure 1-

Introduc)on to annota)on with Artemis. Download presenta.on and data

Practical Course in Genome Bioinformatics

How to use KAIKObase Version 3.1.0

Genome Browser. Background and Strategy

Similarity searches in biological sequence databases

Genome Browsers Guide

BLAST & Genome assembly

BLAST & Genome assembly

Introduction to Phylogenetics Week 2. Databases and Sequence Formats

Advanced UCSC Browser Functions

From Smith-Waterman to BLAST

Bioinformatics. Sequence alignment BLAST Significance. Next time Protein Structure

Browser Exercises - I. Alignments and Comparative genomics

BLAST. NCBI BLAST Basic Local Alignment Search Tool

BioExtract Server User Manual

The UCSC Genome Browser

Compares a sequence of protein to another sequence or database of a protein, or a sequence of DNA to another sequence or library of DNA.

Finding data. HMMER Answer key

Viewing Molecular Structures

BGGN 213 Foundations of Bioinformatics Barry Grant

Two Examples of Datanomic. David Du Digital Technology Center Intelligent Storage Consortium University of Minnesota

Install and run external command line softwares. Yanbin Yin

CS 284A: Algorithms for Computational Biology Notes on Lecture: BLAST. The statistics of alignment scores.

Introduction to Genome Browsers

BIOL591: Introduction to Bioinformatics Alignment of pairs of sequences

Public Repositories Tutorial: Bulk Downloads

CLC Server. End User USER MANUAL

Heuristic methods for pairwise alignment:

Bioinformatics for Biologists

Bioinformatics Database Worksheet

Introduction to Biopython

Using the UCSC genome browser

FASTA. Besides that, FASTA package provides SSEARCH, an implementation of the optimal Smith- Waterman algorithm.

CAP BLAST. BIOINFORMATICS Su-Shing Chen CISE. 8/20/2005 Su-Shing Chen, CISE 1

visualize and recover Grapegen Affymetrix Genechip Probeset Initial page: Optimized for Mozilla Firefox 3 (recommended browser)

Submitting allele sequences to the GenBank NGSengine allele submission Sequin

NCBI BLAST: a better web interface

The BLASTER suite Documentation

SlopMap: a software application tool for quick and flexible identification of similar sequences using exact k-mer matching

BovineMine Documentation

Lecture 4: January 1, Biological Databases and Retrieval Systems

Section A. 1. Objective types (Answer all the questions) [1 x 10 =10]

Introduction to BLAST with Protein Sequences. Utah State University Spring 2014 STAT 5570: Statistical Bioinformatics Notes 6.2

WSSP-10 Chapter 7 BLASTN: DNA vs DNA searches

Topics of the talk. Biodatabases. Data types. Some sequence terminology...

This tutorial will show you how to conduct a BLAST search. With BLAST you may:

Fast-track to Gene Annotation and Genome Analysis

Tutorial: chloroplast genomes

BIOINFORMATICS A PRACTICAL GUIDE TO THE ANALYSIS OF GENES AND PROTEINS

Mapping RNA sequence data (Part 1: using pathogen portal s RNAseq pipeline) Exercise 6

CS313 Exercise 4 Cover Page Fall 2017

Daniel H. Huson and Stephan C. Schuster with contributions from Alexander F. Auch, Daniel C. Richter, Suparna Mitra and Qi Ji.

Dr. Gabriela Salinas Dr. Orr Shomroni Kaamini Rhaithata

Data Mining Technologies for Bioinformatics Sequences

Using many concepts related to bioinformatics, an application was created to

HymenopteraMine Documentation

Dynamic Programming User Manual v1.0 Anton E. Weisstein, Truman State University Aug. 19, 2014

Optimizing Bioinformatics Workflow Execution Through Pipelining Techniques

Computational Theory MAT542 (Computational Methods in Genomics) - Part 2 & 3 -

Combinatorial Pattern Matching

Sequence Alignment: Mo1va1on and Algorithms. Lecture 2: August 23, 2012

ChIP-seq (NGS) Data Formats

Sequence Alignment: Mo1va1on and Algorithms

Finding and Exporting Data. BioMart

Genomic Analysis with Genome Browsers.

Semantic Web Analysis Service

ChIP-seq hands-on practical using Galaxy

biokepler: A Comprehensive Bioinforma2cs Scien2fic Workflow Module for Distributed Analysis of Large- Scale Biological Data

Having a BLAST Data Mining in Oracle 10g:

Transcription:

Sequence Alignment GBIO0002 Archana Bhardwaj University of Liege 1

What is Sequence Alignment? A sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity. Comparable? 2

Sequence Alignment :Uses (1) Sequence Assembly : Genome sequence are assembled by using the sequence alignment methods to find the overlap between many short pieces of DNA. 3

Sequence Alignment :Uses (2) Gene Finding : Sequence similarity could help us to find the gene prediction just by doing comparison against the other set of sequences. 4

Sequence Alignment :Uses (3) Function prediction : Function of any unknown sequence could be predicted by comparing with other known sequence. 5

Sequence Alignment :Uses (4 ) Sequence Divergence : Amount of sequence similarity (10%, 20%,30%...sometimes 90 %) between sequences tell us how closely they are related 6

Types of Alignments Global : This attempt to align every residue in every sequence. Local: It is more useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context. 7

8

Types of Alignments: Based on number of sequences Pair wise Sequence Alignment : This alignments can only be used between two sequences at a time. Multiple Sequence Alignment : This alignments can only be used between more than two sequences at a time. 9

Tools for Sequence Alignments There are many tools for sequence Alignment. In this session, we will discuss about BLAST BLAT CLUSTALW 10

Sequence Alignment : BLAST BLAST stands for Basic Local Alignment Search Tool 11

Blast was developed by Stephan Altschul and colleagues at NCBI in 1990. BLAST is an algorithm for comparing primary biological sequence information, such as the aminoacid sequences of proteins or the nucleotides of DNA sequences. Blast is most used bioinformatics program (cited >60000 times). 12

A BLAST search enables a researcher to compare a query sequence with a library or databases of sequences, and identify library sequences that resemble the query sequence above a certain threshold. QUERY DATABASE 13

Types of BLAST (1) BLASTN : search nucleotide databases using a nucleotide query (A)Query : ATGCATCGATC (B) Database : ATCGATGATCGACATCGATCAGCTACG BLASTP : search protein databases using a protein query (A)Query : VIVALASVEGAS (B) DATABASE : TARDEFGGAVIVADAVISASTILHGGQWLC BLASTX : search protein databases using a translated nucleotide query (A)Query : ATGCATCGATC (B)DATABASE : TARDEFGGAVIVADAVISASTILHGGQWLC 14

Types of BLAST (2) TBLASTN : search translated nucleotide databases using a protein query (A)Query : TARDEFGGAVI (B)DATABASE : ATCGATGATCGACATCGATCAGCTACG TBLASTX : search translated nucleotide databases using a translated nucleotide query (A)Query : CGATGATCG (B)DATABASE : ATCGATGATCGACATCGATCAGCTACG 15

Types of BLAST : ALL 16

How does BLAST Works? Construct a dictionary of all words in the query Initiate a local alignment for each word match between query and DB 17

BLAST: Global Alignment It compares the whole sequence with another sequence. So, output of Global is one to one comparison of two sequences. This method is useful if you have small group of sequences. 18

BLAST: Local Alignment Local method uses the subset of sequence and attempts to align against the subset of another sequence. So, output of local alignment gives the subset of regions which are highly similar. Example : Compare two sequence A and B (A) GCATTACTAATATATTAGTAAATCAGAGTAGTA (B) AAGCGAATAATATATTTATACTCAGATTATTGCGCG 19

BLAST: Input Format Many program for sequence alignment expect sequences to be in FASTA format Example 1 : >L37107.1 Canis familiaris p53 mrna, partial cds GTTCCGTTTGGGGTTCCTGCATTCCGGGACAGCCAAGTCTGTTACTTGGACGTACTCCCCTCTCCTCAAC AAGTTGTTTTGCCAGCTGGCGAAGACCTGCCCCGTGCAGCTGTGGGTCAGCTCCCCACCCCCACCCAATA CCTGCGTCCGCGCTATGGCCATCTATAAGAAGTCGGAGTTCGTGACCGAGGTTGTGCGGCGCTGCCCCCA CCATGAACGCTGCTCTGACAGTAGTGACGGTCTTGCCCCTCCTCAGCATCTCATCCGAGTGGAAGGAAAT TTGCGGGCCAAGTACCTGGACGACAGAAACACTTTTCGACACAGTGTGGTGGTGCCTTATGAGCCACCCG AGGTTGGCTCTGACTATACCACCATCCACTACAACTACATGTGTAACAGTTCCTGCATGGGAGGCATGAA CCGGCGGCCCATCCTCACTATCATCACCCTGGAAGACTCCAGTGGAAACGTGCTGGGACGCAACAGCTTT GAGGTACGCGTTTGTGCCTGTCCCGGGAGAGACCGCCGGACTGAGGAGGAGAATTTCCACAAGAAGGGGG AGCCTTGTCCTGAGCCACCCCCCGGGAGTACCAAGCGAGCACTGCCTCCCAGCACCAGCTCCTCTCCCCC GCAAAAGAAGAAGCCACTAGATGGAGAATATTTCACCCTTCAGATCCGTGGGCGTGAACGCTATGAGATG TTCAGGAATCTGAATGAAGCCTTGGAGCTGAAGGATGCCCAGAGTGGAAAGGAGCCAGGGGGAAGCAGGG CTCACTCCAGCCACCTGAAGGCAAAGAAGGGGCAATCTACCTCTCGCCATAAAAAACTGATGTTCAAGAGAGAA Example 2 : >NM_033360.3 Homo sapiens KRAS proto-oncogene, GTPase (KRAS), transcript variant a, mrna TCCTAGGCGGCGGCCGCGGCGGCGGAGGCAGCAGCGGCGGCGGCAGTGGCGGCGGCGAAGGTGGCGGCGG CTCGGCCAGTACTCCCGGCCCCCGCCATTTCGGACTGGGAGCGAGCGCGGCGCAGGCACTGAAGGCGGCG GCGGGGCCAGAGGCTCAGCGGCTCCCAGGTGCGGGAGAGAGGCCTGCTGAAAATGACTGAATATAAACTT GTGGTAGTTGGAGCTGGTGGCGTAGGCAAGAGTGCCTTGACGATACAGCTAATTCAGAATCATTTTGTGG ACGAATATGATCCAACAATAGAGGATTCCTACAGGAAGCAAGTAGTAATTGATGGAGAAACCTGTCTCTT GGATATTCTCGACACAGCAGGTCAAGAGGAGTACAGTGCAATGAGGGACCAGTACATGAGGACTGGGGAG GGCTTTCTTTGTGTATTTGCCATAAATAATACTAAATCATTTGAAGATATTCACCATTATAGAGAACAAA TTAAAAGAGTTAAGGACTCTGAAGATGTACCTATGGTCCTAGTAGGAAATAAATGTGATTTGCCTTCTAG 20

NCBI BLAST SERVER Open the website : https://blast.ncbi.nlm.nih.gov/blast.cgi 21

Window of BLASTN 22

Let us work on BLASTN Select following sequence and give input into NCBI BLASTN query section >Seq1 ACCAAGGCCAGTCCTGAGCAGGCCCAACTCCAGTGCAGCTGCCCACCCTGCCGCCATGTCTCTGACCAAG ACTGAGAGGACCATCATTGTGTCCATGTGGGCCAAGATCTCCACGCAGGCCGACACCATCGGCACCGAGA CTCTGGAGAGGCTCTTCCTCAGCCACCCGCAGACCAAGACCTACTTCCCGCACTTCGACCTGCACCCGGG GTCCGCGCAGTTGCGCGCGCACGGCTCCAAGGTGGTGGCCGCCGTGGGCGACGCGGTGAAGAGCATCGAC GACATCGGCGGCGCCCTGTCCAAGCTGAGCGAGCTGCACGCCTACATCCTGCGCGTGGACCCGGTCAACT TCAAGCTCCTGTCCCACTGCCTGCTGGTCACCCTGGCCGCGCGCTTCCCCGCCGACTTCACGGCCGAGGC CCACGCCGCCTGGGACAAGTTCCTATCGGTCGTATCCTCTGTCCTGACCGAGAAGTACCGCTGAGCGCCG CCTCCGGGACCCCCAGGACAGGCTGCGGCCCCTCCCCCGTCCTGGAGGTTCCCCAGCCCCACTTACCGCG TAATGCGCCAATAAACCAATGAACGAAGC You will get list of Hits 23

You will see statistic of alignments ( Identity, E value) Click here 24

How well alignment is? : Bad, Good, Very Good? 25

RESULT INTERPRETATION 1. How many sequences crossed the threshold E value??? 2. How many sequences show > 50 % identity with database?? 3. How many sequences show > 90 % identity with database?? 4. Prepare tabular output for BLASTP and BLASTN results. 26

QUESTIONS 27

Blastx : Let us run 1. Perform the blastx 2. How many sequences shows 90% identity against the database 3. What is their e-value?? 28

QUESTIONS 29

Is it possible to localise its position on human genome? How to analysis its gene structure? For this, Open the UCSC Browser available at https://genome.ucsc.edu/ 30

Click BLAT 31

Difference Between BLAST and BLAT BLAT is an alignment tool like BLAST, but it is structured differently. BLAT works by keeping an index of an entire genome in memory. Thus, the target database of BLAT is not a set of GenBank sequences, but instead an index derived from the assembly of the entire genome. 32

Advantages of BLAT over BLAST Its Speed is very high (no queues, response in seconds). The ability to submit a long list of simultaneous queries in fasta format. A direct link into the UCSC browser. Alignment block details in natural genomic order. An option to launch the alignment later as part of a custom track. 33

Paste following sequence into Query search Box and click Submit >Seq1 ACCAAGGCCAGTCCTGAGCAGGCCCAACTCCAGTGCAGCTGCCCACCCTGCCGCCATGTCT CTGACCAAGACTGAGAGGACCATCATTGTGTCCATGTGGGCCAAGATCTCCACGCAGGCCG ACACCATCGGCACCGAGACTCTGGAGAGGCTCTTCCTCAGCCACCCGCAGACCAAGACCTA CTTCCCGCACTTCGACCTGCACCCGGGGTCCGCGCAGTTGCGCGCGCACGGCTCCAAGGTG GTGGCCGCCGTGGGCGACGCGGTGAAGAGCATCGACGACATCGGCGGCGCCCTGTCCAAGC TGAGCGAGCTGCACGCCTACATCCTGCGCGTGGACCCGGTCAACTTCAAGCTCCTGTCCCA CTGCCTGCTGGTCACCCTGGCCGCGCGCTTCCCCGCCGACTTCACGGCCGAGGCCCACGCC GCCTGGGACAAGTTCCTATCGGTCGTATCCTCTGTCCTGACCGAGAAGTACCGCTGAGCGC CGCCTCCGGGACCCCCAGGACAGGCTGCGGCCCCTCCCCCGTCCTGGAGGTTCCCCAGCCC CACTTACCGCGTAATGCGCCAATAAACCAATGAACGAAGC Which output did you see?? Can you have a look at your sequence? How? How many exons are present in your sequence? 34

35

QUESTIONS 36