
Cufflinks RNA-Seq analysis tools - Getting Started 1 of 6 14.07.2011 09:42

Cufflinks
Transcript assembly, differential expression, and differential regulation for RNA-Seq

News and updates
New releases and related tools will be announced through the mailing list.

Getting Help
Questions about Cufflinks should be sent to tophat.cufflinks@gmail.com. Please do not email technical questions to Cufflinks contributors directly.

Releases
version 0.3 6/1/2011: Source code, Linux x86_64 binary, Mac OS X x86_64 binary

Related Tools
TopHat: Alignment of short RNA-Seq reads
Bowtie: Ultrafast short read alignment

Publications
Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology. doi:10.1038/nbt.1621
Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology. doi:10.1186/gb-2011-12-3-r22

Contributors
Cole Trapnell, Adam Roberts, Geo Pertea, Brian Williams, Ali Mortazavi, Gordon Kwan, Jeltje van Baren, Steven Salzberg, Barbara Wold, Lior Pachter

Links
Berkeley LMCB, UMD CBCB, Wold Lab

Getting started

Setting up Cufflinks
    Install quick-start
    Test the installation
Common uses of the Cufflinks package
    Discovering novel genes and transcripts
    Identifying differentially expressed and regulated genes

Install quick-start

Installing a pre-compiled binary release

To make Cufflinks easy to install, we provide a few binary packages that save users from the occasionally frustrating process of building Cufflinks, which requires that you install the Boost libraries. To use the binary packages, simply download the appropriate one for your machine, untar it, and make sure the cufflinks, cuffdiff and cuffcompare binaries are in a directory in your PATH environment variable.

Building Cufflinks from source

In order to build Cufflinks, you must have the Boost C++ libraries (version 1.38 or higher) installed on your system. See below for instructions on installing Boost.

Installing Boost

1. Download Boost and the bjam build engine.
2. Unpack bjam and add it to your PATH.
3. Unpack the Boost tarball and cd to the Boost source directory. This directory is called the BOOST_ROOT in some Boost installation instructions.
4. Build Boost. Note that you can specify where to put Boost with the --prefix option. The default Boost installation directory is /usr/local. Take note of the Boost installation directory, because you will need to tell the Cufflinks installer where to find Boost later on.

   If you are on Mac OS X, type (all on one line):

       bjam --prefix=<your_boost_install_directory> --toolset=darwin architecture=x86 address_model=32_64 link=static runtime-link=static --layout=versioned stage install

   If you are on a 32-bit Linux system, type (all on one line):

       bjam --prefix=<your_boost_install_directory> --toolset=gcc architecture=x86 address_model=32 link=static runtime-link=static stage install

   If you are on a 64-bit Linux system, type (all on one line):

       bjam --prefix=<your_boost_install_directory> --toolset=gcc architecture=x86 address_model=64 link=static runtime-link=static stage install

Installing the SAM tools

1. Download the SAM tools.
2. Unpack the SAM tools tarball and cd to the SAM tools source directory.
3. Build the SAM tools by typing make at the command line.
4. Choose a directory into which you wish to copy the SAM tools binary, the included library libbam.a, and the library headers. A common choice is /usr/local/.
5. Copy libbam.a to the lib/ directory in the folder you've chosen above (e.g. /usr/local/lib/).
6. Create a directory called "bam" in the include/ directory (e.g. /usr/local/include/bam).
7. Copy the headers (files ending in .h) to the include/bam directory you've created above (e.g. /usr/local/include/bam).
8. Copy the samtools binary to some directory in your PATH.

Building Cufflinks

1. Unpack the Cufflinks source tarball:

       tar zxvf cufflinks-0.7.0.tar.gz

2. Change to the Cufflinks directory:

       cd cufflinks-0.7.0

3. Configure Cufflinks. If Boost is installed somewhere other than /usr/local, you will need to tell the installer where to find it using the --with-boost option. Specify where to install Cufflinks using the --prefix option.

       ./configure --prefix=/path/to/cufflinks/install --with-boost=/path/to/boost

   If you see any errors during configuration, verify that you are using Boost version 1.38 or higher, and that the directory you specified via --with-boost contains the Boost header files and libraries. See the Boost Getting started page for more details. If you copied the SAM tools binaries to someplace other than /usr/local/, you may need to supply the --with-bam configuration option.

4. Finally, make and install Cufflinks.

       make
       make install

Testing the installation

1. Download the test data.
2. In the directory where you placed the test file, type:

       cufflinks ./test_data.sam

   You should see the following output:

       [bam_header_read] EOF marker is absent. The input is probably truncated.
       [bam_header_read] invalid BAM binary header (this is not a BAM file).
       File ./test_data.sam doesn't appear to be a valid BAM file, trying SAM...
       [13:23:15] Inspecting reads and determining fragment length distribution.
       > Processed 1 loci.                    [*************************] 100%
       > Map Properties:
       >       Total Map Mass: 104.00
       >       Read Type: 75bp paired-end
       >       Fragment Length Distribution: Gaussian (default)
       >               Estimated Mean: 209.55
       >               Estimated Std Dev: 70.54
       [13:23:15] Assembling transcripts and estimating abundances.
       > Processed 1 loci.                    [*************************] 100%

3. Verify that the file transcripts.gtf is in the current directory and looks like this (your file will have GTF attributes, omitted here for clarity):

       test_chromosome  Cufflinks  exon  53   250  1000  +  .
       test_chromosome  Cufflinks  exon  351  400  1000  +  .
       test_chromosome  Cufflinks  exon  501  550  1000  +  .
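The two manual checks described above (the binaries are on your PATH, and transcripts.gtf contains the expected exon records) can be scripted. The following is a minimal sketch, not part of the Cufflinks distribution: the helper names check_tools and count_exons are hypothetical, and the exon count assumes a tab-delimited GTF file as the format requires.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Cufflinks.

# Print each named tool that is not found on PATH; return 1 if any is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Count records whose third (feature) column is "exon" in a tab-delimited GTF file.
count_exons() {
    awk -F'\t' '$3 == "exon" { n++ } END { print n + 0 }' "$1"
}

# Report missing Cufflinks binaries, then count exons if the test output exists.
check_tools cufflinks cuffdiff cuffcompare || echo "add the Cufflinks directory to PATH"
if [ -f transcripts.gtf ]; then
    echo "exon records: $(count_exons transcripts.gtf)"
fi
```

For the test data shown above, the exon count should match the three exon lines in the example, provided your transcripts.gtf has the same structure.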

Common uses of the Cufflinks package

Discovering novel genes and transcripts

RNA-Seq is a powerful technology for gene and splice variant discovery. You can use Cufflinks to help annotate a new genome or find new genes and splice isoforms of known genes in even well-annotated genomes. Annotating genomes is a complex and difficult process, but we outline a basic workflow that should get you started here. The workflow also includes examples of the commands you'd run to implement each step. Suppose we have RNA-Seq reads from human liver, brain, and heart.

1. Map the reads for each tissue to the reference genome

   We recommend that you use TopHat to map your reads to the reference genome. For this example, we'll assume you have paired-end RNA-Seq data. You can map reads as follows:

       tophat -r 50 -o tophat_brain /seqdata/indexes/hg19 brain_1.fq brain_2.fq
       tophat -r 50 -o tophat_liver /seqdata/indexes/hg19 liver_1.fq liver_2.fq
       tophat -r 50 -o tophat_heart /seqdata/indexes/hg19 heart_1.fq heart_2.fq

   The commands above are just examples of how to map reads with TopHat. Please see the TopHat manual for more details on RNA-Seq read mapping.

2. Run Cufflinks on each mapping file

   The next step is to assemble each tissue sample independently using Cufflinks. Assemble each tissue like so:

       cufflinks -o cufflinks_brain tophat_brain/accepted_hits.bam
       cufflinks -o cufflinks_liver tophat_liver/accepted_hits.bam
       cufflinks -o cufflinks_heart tophat_heart/accepted_hits.bam

3. Merge the resulting assemblies

   assemblies.txt (one assembly GTF per line):

       cufflinks_brain/transcripts.gtf
       cufflinks_liver/transcripts.gtf
       cufflinks_heart/transcripts.gtf

   Now run the merge script:

       cuffmerge -s /seqdata/fastafiles/hg19/hg19.fa assemblies.txt

   The final, merged annotation will be in the file merged_asm/merged.gtf. At this point, you can use your favorite browser to explore the structure of your genes, or feed this file into downstream informatic analyses, such as a search for orthologs in other organisms. You can also explore your samples with Cuffdiff and identify genes that are significantly differentially expressed between the three conditions. See the workflows below for more details on how to do this.

4. (optional) Compare the merged assembly with known or annotated genes

   If you want to discover new genes in a genome that has been annotated, you can use cuffcompare to sort out what is new in your assembly from what is already known. Run cuffcompare like this:

       cuffcompare -s /seqdata/fastafiles/hg19/hg19.fa -r known_annotation.gtf merged_asm/merged.gtf

   Cuffcompare will produce a number of output files that you can parse to select novel genes and isoforms.

Identifying differentially expressed and regulated genes
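Steps 1 to 3 of the discovery workflow above repeat the same commands for every tissue, so they lend themselves to a loop. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the mate files are assumed to be named <sample>_1.fq and <sample>_2.fq, and the commands are only printed (a dry run) rather than executed.

```shell
#!/usr/bin/env bash
# Sketch only: prints the per-tissue commands instead of running them.
# Pipe the output to "sh" once the paths match your own data.

INDEX=/seqdata/indexes/hg19              # Bowtie index prefix used in the text
GENOME=/seqdata/fastafiles/hg19/hg19.fa  # reference FASTA used in the text

# Print the TopHat and Cufflinks commands for one sample name.
# Assumption: paired-end mates are named <sample>_1.fq and <sample>_2.fq.
sample_commands() {
    local s="$1"
    echo "tophat -r 50 -o tophat_${s} ${INDEX} ${s}_1.fq ${s}_2.fq"
    echo "cufflinks -o cufflinks_${s} tophat_${s}/accepted_hits.bam"
}

# Write the list of per-sample assemblies (one GTF path per line) for cuffmerge.
write_assemblies() {
    local out="$1"
    local s
    shift
    : > "$out"
    for s in "$@"; do
        echo "cufflinks_${s}/transcripts.gtf" >> "$out"
    done
}

for tissue in brain liver heart; do
    sample_commands "$tissue"
done
write_assemblies assemblies.txt brain liver heart
echo "cuffmerge -s ${GENOME} assemblies.txt"
```

Deriving every directory and file name from the sample name keeps the mapping and assembly steps consistent with each other, which is easy to get wrong when the commands are copied by hand.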

There are two workflows you can choose from when looking for differentially expressed and regulated genes using the Cufflinks package. The first workflow is simpler and is a good choice when you aren't looking for novel genes and transcripts. This workflow requires that you have not only a reference genome, but also a reference gene annotation in GFF format (GFF3 or GTF2 formats are accepted). The second workflow, which includes steps to discover new genes and new splice variants of known genes, is more complex and requires more computing power. The second workflow can use and augment a reference gene annotation GFF if one is available.

Differential analysis without gene and transcript discovery

1. Map the reads for each condition to the reference genome

   We recommend that you use TopHat to map your reads to the reference genome. For this example, we'll assume you have paired-end RNA-Seq data. Suppose you have RNA-Seq from a knockdown experiment where you have two biological replicates of a mock condition as a control and two replicates of your knockdown.

   Note: Cuffdiff will work much better if you map your replicates independently, rather than pooling the replicates from one condition into a single set of reads.

   Note: While a GTF of known transcripts is not strictly required at this stage, providing one will improve alignment sensitivity and, ultimately, the accuracy of Cuffdiff's analysis.

   You can map reads as follows:

       tophat -r 50 -G annotation.gtf -o tophat_mock_rep1 /seqdata/indexes/hg19 \
           mock_rep1_1.fq mock_rep1_2.fq
       tophat -r 50 -G annotation.gtf -o tophat_mock_rep2 /seqdata/indexes/hg19 \
           mock_rep2_1.fq mock_rep2_2.fq
       tophat -r 50 -G annotation.gtf -o tophat_knockdown_rep1 /seqdata/indexes/hg19 \
           knockdown_rep1_1.fq knockdown_rep1_2.fq
       tophat -r 50 -G annotation.gtf -o tophat_knockdown_rep2 /seqdata/indexes/hg19 \
           knockdown_rep2_1.fq knockdown_rep2_2.fq

2. Run Cuffdiff

   Take the annotated transcripts for your genome (as GFF or GTF) and provide them to cuffdiff along with the BAM files from TopHat for each replicate:

       cuffdiff annotation.gtf mock_rep1.bam,mock_rep2.bam \
           knockdown_rep1.bam,knockdown_rep2.bam

Differential analysis with gene and transcript discovery

1. Complete steps 1-3 in "Discovering novel genes and transcripts", above

   Follow the protocol for gene and transcript discovery listed above. Be sure to provide TopHat and the assembly merging script with a reference annotation if one is available for your organism, to ensure the highest possible quality of differential expression analysis.

2. Run Cuffdiff

   Take the merged assembly produced in step 3 of the discovery protocol and provide it to cuffdiff along with the BAM files from TopHat:

       cuffdiff merged_asm/merged.gtf liver1.bam,liver2.bam brain1.bam,brain2.bam

   As shown above, replicate BAM files for each condition must be given as a comma-separated list. If you put spaces between replicate files instead of commas, cuffdiff will treat them as independent conditions.

This research was supported in part by NIH grants R01-LM06845 and R01-GM083873, NSF grant CCF-0347992 and the Miller Institute for Basic Research in Science at UC Berkeley.

Administrator: Cole Trapnell. Design by David Herreman
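The comma-versus-space rule in the "Run Cuffdiff" steps above is easy to get wrong when typing long command lines by hand. The following is a minimal sketch that assembles the argument list for you: the helper name cuffdiff_command and the "--" separator between conditions are hypothetical conventions of this sketch, not cuffdiff options, and the BAM file names are placeholders. The command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch only: builds (and prints) a cuffdiff command line.
# Usage: cuffdiff_command GTF rep [rep ...] -- rep [rep ...] [-- ...]
# The "--" separator is a convention of this helper, not a cuffdiff flag.

cuffdiff_command() {
    local gtf="$1"
    shift
    local cmd="cuffdiff ${gtf}"
    local group=""
    local arg
    # A trailing "--" sentinel flushes the last condition.
    for arg in "$@" --; do
        if [ "$arg" = "--" ]; then
            # End of one condition: append its comma-joined replicates.
            if [ -n "$group" ]; then
                cmd="${cmd} ${group}"
            fi
            group=""
        else
            # Replicates of the same condition are joined with commas.
            group="${group:+${group},}${arg}"
        fi
    done
    echo "$cmd"
}

# Two conditions with two replicates each (placeholder file names).
cuffdiff_command annotation.gtf mock_rep1.bam mock_rep2.bam -- \
    knockdown_rep1.bam knockdown_rep2.bam
```

Replicates of one condition end up joined by commas and conditions stay separated by spaces, which is the grouping cuffdiff expects.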