Ensembl RNASeq Practical. Overview

The aim of this practical session is to use BWA to align two lanes of zebrafish paired-end Illumina RNASeq reads to chromosome 12 of the zebrafish Zv9 assembly. We have restricted the analysis to chromosome 12 in order to speed up the alignment process. Once the reads have been aligned, you can process the alignments using Samtools and display them in the Ensembl browser alongside some of our own annotation.

Command-line actions are coloured green. Please note that most commands are wrapped across more than one line. URLs are coloured blue.

You should have a folder entitled EnsemblRNASeqPractical that contains the following files:

    Bam/
    Doc/
        EnsemblRNASeqPractical.txt
    Fastq/
        2cell_chr12_R1.fastq
        2cell_chr12_R2.fastq
        6hpf_chr12_R1.fastq
        6hpf_chr12_R2.fastq
    Genome/
        chr12.fasta

Step 1: Index the genome file

First we need to index the genome file so that BWA can use it. We do this using the BWA index command. The following is one command, wrapped over two lines.

BWA command to index the genome fasta file:

    /opt/bwa-0.6.1/bwa index -a bwtsw

Step 2: Align the reads to the genome

Once that has finished we can start the alignment. We are using two lanes of 76 bp paired-end reads, which gives us four files: two for each lane, containing the first and second reads respectively. We align each of the lanes independently using BWA aln, one run per read file, creating .sai files.

Command to align 2cell data, 1st reads:

    home/training/desktop/ensemblrnaseqpractical/bam/2cell_chr12_r1.sai /home/training/desktop/ensemblrnaseqpractical/fastq/2cell_chr12_r1.fastq

Command to align 2cell data, 2nd reads:

    home/training/desktop/ensemblrnaseqpractical/bam/2cell_chr12_r2.sai /home/training/desktop/ensemblrnaseqpractical/fastq/2cell_chr12_r2.fastq

Command to align 6hpf data, 1st reads:

    /home/training/desktop/ensemblrnaseqpractical/bam/6hpf_chr12_r1.sai /home/training/desktop/ensemblrnaseqpractical/fastq/6hpf_chr12_r1.fastq

Command to align 6hpf data, 2nd reads:

    /home/training/desktop/ensemblrnaseqpractical/bam/6hpf_chr12_r2.sai /home/training/desktop/ensemblrnaseqpractical/fastq/6hpf_chr12_r2.fastq

The -n 37 parameter allows BWA to accept up to 37 differences (mismatches or gaps) in the alignment, i.e. roughly half the read length. The -i 76 parameter disallows indels within 76 bp of the read ends, which for 76 bp reads means that we do not want any insertions or deletions anywhere in the alignment.

Step 3: Create SAM files

Once the alignments have run we need to create the SAM file from the pairs. We process the .sai files along with the genome (chromosome) and fastq files using BWA sampe. This will produce a single SAM file for each sample.

Command to make SAM files for 2cell data:

    /opt/bwa-0.6.1/bwa sampe -A -a 200000 -f home/training/desktop/ensemblrnaseqpractical/bam/2cell_chr12.sam /home/training/desktop/ensemblrnaseqpractical/bam/2cell_chr12_r1.sai /home/training/desktop/ensemblrnaseqpractical/bam/2cell_chr12_r2.sai /home/training/desktop/ensemblrnaseqpractical/fastq/2cell_chr12_r1.fastq /home/training/desktop/ensemblrnaseqpractical/fastq/2cell_chr12_r2.fastq

Command to make SAM files for 6hpf data:

    /opt/bwa-0.6.1/bwa sampe -A -a 200000 -f /home/training/desktop/ensemblrnaseqpractical/bam/6hpf_chr12.sam /home/training/desktop/ensemblrnaseqpractical/bam/6hpf_chr12_r1.sai /home/training/desktop/ensemblrnaseqpractical/bam/6hpf_chr12_r2.sai /home/training/desktop/ensemblrnaseqpractical/fastq/6hpf_chr12_r1.fastq /home/training/desktop/ensemblrnaseqpractical/fastq/6hpf_chr12_r2.fastq

The -A parameter tells BWA to skip the estimation of insert size. This is because BWA is expecting genomic reads rather than transcriptome reads. Transcriptome read pairs can span introns, which causes problems when estimating insert size.

The -a 200000 parameter tells BWA to use a maximum allowed insert size (the distance between the two reads of a pair) of 200 kb. This effectively acts as a maximum intron length for the alignment.

In case samtools does not work, run this command:

    export LD_LIBRARY_PATH=/opt/zlib-1.2.6
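As a rough sketch of how Steps 1 to 3 fit together, the script below prints (but does not run) one possible complete command for each stage. The variable names, the location of the genome file and the argument order are assumptions, based on the usual bwa 0.6 usage of "bwa aln [options] prefix in.fq" and "bwa sampe [options] prefix in1.sai in2.sai in1.fq in2.fq". They are not taken from the handout, so check them against your own installation and folder names before use.

```shell
#!/bin/sh
# Dry-run sketch of Steps 1-3: every command is printed, none is executed.
# ASSUMPTIONS (not from the handout): the variable names, the GENOME path
# and the argument order, which follows the standard bwa 0.6 usage lines.
BWA="/opt/bwa-0.6.1/bwa"
BASE="/home/training/desktop/ensemblrnaseqpractical"   # adjust to your folder
GENOME="$BASE/genome/chr12.fasta"                      # assumed location

# Print the alignment and pairing commands for one sample (2cell or 6hpf).
sample_commands() {
    sample="$1"
    for mate in r1 r2; do
        # Step 2: one bwa aln run per read file, each writing a .sai file
        echo "$BWA aln -n 37 -i 76 -f $BASE/bam/${sample}_chr12_${mate}.sai $GENOME $BASE/fastq/${sample}_chr12_${mate}.fastq"
    done
    # Step 3: combine both .sai files and both fastq files into one SAM file
    echo "$BWA sampe -A -a 200000 -f $BASE/bam/${sample}_chr12.sam $GENOME" \
         "$BASE/bam/${sample}_chr12_r1.sai $BASE/bam/${sample}_chr12_r2.sai" \
         "$BASE/fastq/${sample}_chr12_r1.fastq $BASE/fastq/${sample}_chr12_r2.fastq"
}

# Step 1: index the genome once, then list the commands for both samples
echo "$BWA index -a bwtsw $GENOME"
for s in 2cell 6hpf; do
    sample_commands "$s"
done
```

Printing the commands first makes it easy to compare them with the handout before anything is run. Once they look right, each printed line can be pasted into the terminal in order.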

Step 4: Create BAM files, sort and index them

Once the pairs have been processed into SAM files, we use samtools to convert them into BAM files.

Command to make BAM files for 2cell data:

    /opt/samtools/samtools view -S -b /home/training/desktop/ensemblrnaseqpractical/bam/2cell_chr12.sam -o

Command to make BAM files for 6hpf data:

    /opt/samtools/samtools view -S -b /home/training/desktop/ensemblrnaseqpractical/bam/6hpf_chr12.sam -o

Here the -S parameter specifies that the input is in SAM format, and -b specifies that the output should be in BAM format.

In order to use the files on the website they must be sorted and indexed. This can be done as follows.

Command to sort the 2cell file:

    /opt/samtools/samtools sort /home/training/desktop/ensemblrnaseqpractical/bam/2cell_chr12_sorted

Command to sort the 6hpf file:

    /opt/samtools/samtools sort /home/training/desktop/ensemblrnaseqpractical/bam/6hpf_chr12_sorted

Note that the .bam extension is appended to the file name when using samtools sort.

Command to index the 2cell BAM file:

    /opt/samtools/samtools index /home/training/desktop/ensemblrnaseqpractical/bam/2cell_chr12_sorted.bam

Command to index the 6hpf BAM file:

    /opt/samtools/samtools index /home/training/desktop/ensemblrnaseqpractical/bam/6hpf_chr12_sorted.bam

Samtools flagstat will give you some basic statistics about the alignments.

Flagstat command for 2cell data:

    /opt/samtools/samtools flagstat

Flagstat command for 6hpf data:

    /opt/samtools/samtools flagstat
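The Step 4 sequence (convert, sort, index, summarise) can be sketched as a dry run in the same way. The script below only prints commands. The name of the intermediate unsorted BAM file and the choice of the sorted BAM as the flagstat input are assumptions, and the sort line uses the older "samtools sort in.bam out_prefix" form that matches the note about the .bam extension being appended. Recent samtools 1.x releases expect "samtools sort -o out.bam in.bam" instead.

```shell
#!/bin/sh
# Dry-run sketch of Step 4: prints the samtools commands for one sample.
# ASSUMPTIONS (not from the handout): the unsorted BAM name (<sample>_chr12.bam),
# flagstat being run on the sorted BAM, and the helper name bam_commands.
SAMTOOLS="/opt/samtools/samtools"
BAM_DIR="/home/training/desktop/ensemblrnaseqpractical/bam"   # adjust to your folder

bam_commands() {
    prefix="$BAM_DIR/${1}_chr12"
    # SAM -> BAM (-S: input is SAM, -b: write BAM, -o: output file)
    echo "$SAMTOOLS view -S -b -o ${prefix}.bam ${prefix}.sam"
    # Legacy sort syntax: the second argument is an output prefix, .bam is appended
    echo "$SAMTOOLS sort ${prefix}.bam ${prefix}_sorted"
    # The index is written next to the sorted BAM as <name>.bam.bai
    echo "$SAMTOOLS index ${prefix}_sorted.bam"
    # Basic alignment statistics
    echo "$SAMTOOLS flagstat ${prefix}_sorted.bam"
}

for s in 2cell 6hpf; do
    bam_commands "$s"
done
```

The order matters: indexing only works on a coordinate-sorted BAM file, so the index command has to come after the sort.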

Step 5: View results

BAM files are often quite large and are unsuitable for uploading to a website, so in order to view the alignments in the Ensembl browser you need to host the sorted and indexed BAM files on either a web server or an ftp site. Both the sorted BAM file and the index file ending in .bai are needed to view the alignments on the website. The website requires the BAM file URL to be entered. It then looks for a .bai file with the same name in the same directory.

For convenience we have already set up an ftp site that contains the files you just created. Enter the following URL into your web browser:

    ftp://ftp.sanger.ac.uk/pub/users/sw4/danio/practical/bam

You will see two directories: Exons and Introns. Exons contains the alignments to chr12 that you just made. Introns contains spliced alignments that we created using the RNASeq pipeline for several tissues, including the 2cell and 6hpf lanes we have used here.

We have chosen ENSDARG00000055381 as a good example of a chromosome 12 gene with differential expression highlighted by the 6hpf and 2cell lanes, though the alignments cover the whole of chr12 if you want to look for other interesting examples.

1. To load the alignments, first open a web browser and go to www.ensembl.org
2. Enter ENSDARG00000055381 into the search box to take you to the gene view page. It is a gene called "bambia".
3. Click on the location tab at the top of the page to take you to the view of the region on the chromosome.
4. Now we can load our BAM files. Click on the "Configure this page" button on the left panel. This opens a configuration panel. We want the "custom data" tab at the top right.
5. To load the BAM files click on "Attach Remote File" on the left hand panel.
6. Here enter the URL of the files:
    ftp://ftp.sanger.ac.uk/pub/users/sw4/danio/bam/exons/2cells.bam
7. Select the data format as BAM and name the track. If you have an Ensembl account you can store the track in your account to use another time.
8. Then do the same for the 6hpf file and any of the Intron files you might like to view:
    ftp://ftp.sanger.ac.uk/pub/users/sw4/danio/bam/exons/6hpf.bam
    ftp://ftp.sanger.ac.uk/pub/users/sw4/danio/bam/introns/6hpf.bam
    ftp://ftp.sanger.ac.uk/pub/users/sw4/danio/bam/introns/2cells.bam
9. Once you have attached the remote files you should be able to see them in the region view browser. If they do not show up you may need to turn them on by going to "Configure this page" -> "Your data".
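Because the browser looks for the index next to the BAM file, it can help to check that both files are present before hosting your own data. The sketch below assumes the index is named as samtools index writes it (the BAM name with .bai appended). The helper names bai_for and ready_to_host are hypothetical and not part of samtools or Ensembl, and the index names printed for the ftp URLs are derived by that rule rather than checked against the server.

```shell
#!/bin/sh
# Sketch: pair each BAM file (or URL) with the index expected beside it.
# ASSUMPTION: the index is <name>.bam.bai, the name samtools index produces.

# Print the index name that belongs to a BAM path or URL.
bai_for() {
    echo "$1.bai"
}

# Succeed only if a local BAM file and its index both exist.
ready_to_host() {
    [ -f "$1" ] && [ -f "$(bai_for "$1")" ]
}

# Show the BAM/index pairs for the two Exons tracks used in this practical.
for url in \
    ftp://ftp.sanger.ac.uk/pub/users/sw4/danio/bam/exons/2cells.bam \
    ftp://ftp.sanger.ac.uk/pub/users/sw4/danio/bam/exons/6hpf.bam
do
    echo "BAM:   $url"
    echo "index: $(bai_for "$url")"
done
```

For a local file, a call such as ready_to_host 2cell_chr12_sorted.bam returns success only when the matching .bam.bai file sits in the same directory, which is the condition the website relies on.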