The software and data for the RNA-Seq exercise are already available on the USB system
- Brooke McGee
- 6 years ago
BIT815 Notes on R analysis of RNA-seq data

The software and data for the RNA-Seq exercise are already available on the USB system. The notes below regarding installation of R packages and other software are provided for future reference.

Installation of R packages in a Linux environment (for future reference only)

If the packages installed in R are to be available to all users on a Linux system, then R must be run with root privileges so the packages can be installed in the system library. To run R with root privileges, start it using the command:

    sudo R

and then install the desired packages, either from the R repository using (for example):

    install.packages("RSQLite", dependencies=TRUE, lib="/usr/local/lib/R/site-library")

or from the Bioconductor repository using (for example):

    source("
    biocLite("GenomicFeatures", lib="/usr/local/lib/R/site-library")

If R is started without root privileges (i.e., no sudo command before R), then packages will be installed in the user's home directory (where the user has write permission), and those packages are not available to other users on the system. We will use the database package SQLite and the R packages GenomicFeatures and RSQLite to save a database of transcript annotation; these packages are installed on the USB system.

Overview: Analysis of RNA-Seq data using R

Align to reference sequence -> Annotate aligned reads -> Count reads per feature -> Analyze counts for differences

1. The analysis of RNA-seq data begins with aligning the sequence reads to some reference sequence, either a reference genome sequence or a reference transcriptome previously assembled from sequence reads using the Trinity software package or another appropriate assembly method.
2. The next step is to merge the read alignment results with annotation of the reference sequence to identify which regions of the reference correspond to specific features: genes, transcripts, or exons.
3. The third step is to count the number of reads aligned to each feature in the reference sequence. The choice of which type of feature to use in counting reads is an important part of the experimental design, and depends on the biological question of interest.
   a. Counting reads that map to annotated genes means alternative splicing events will not be counted; all the reads that map to a specific region of genomic DNA will be assigned to the gene annotated at that site.
   b. Counting reads that map to annotated transcripts allows detection of previously known and annotated alternative splicing events, but does not allow detection of novel events.
   c. Counting reads that map to annotated exons allows detection of alternative splicing events, regardless of previous annotation, provided they involve the presence or absence of annotated exons.
   d. De novo assembly of RNA-Seq reads can give insight into novel transcripts that have not been previously described or annotated.
4. The final step is to use the read counts as input to a statistical analysis that can model the variation in read counts as a function of both technical and biological variation, and provide insight into which genes, transcripts, or exons are likely to be differentially represented in the read count data.

We will try to use the GenomicFeatures package of Bioconductor to create a database containing information on Arabidopsis transcripts (a "TxDb"), for use in analysis of the reads sampled from the experiment comparing bacteria-inoculated versus mock-inoculated Arabidopsis plants. See for a description of the experiment, and the link to "Gene-counter: a computational pipeline for the analysis of RNA-seq data for gene expression differences" for details on the experiment. More information on the GenomicFeatures package is available at and documents linked from that webpage.

Exercises

1. Open a terminal window, create a directory called rnaseq in the home directory, and copy five sequence read files and the chromosome 5 reference sequence from /media/lubuntu/data/data to it. The ts3.fastq.gz file on the USB system is corrupt in some cases, so a new copy of that file can be downloaded from a link on the course webpage.

    mkdir rnaseq
    cd /media/lubuntu/data/data
    cp Atchromo5.fasta.gz cn1.fastq.gz cn2.fastq.gz cn3.fastq.gz ts1.fastq.gz ts2.fastq.gz ~/rnaseq

   Change to the rnaseq directory, and download the file t3.fq.gz from the course website (Week 5, transcriptome analysis page):

    cd ~/rnaseq
    wget

2. Start RStudio from the Applications > Programming menu (lower left corner of the screen), and load the packages RSQLite, biomaRt, ShortRead, GenomicFeatures, and DESeq2 for use.

    library(RSQLite); library(biomaRt); library(ShortRead); library(GenomicFeatures); library(DESeq2)

3. Use the listMarts() command to see a list of databases available through Biomart.

    listMarts()

   Read the help file on the makeTxDbFromBiomart() command to learn which Biomart databases can be used to build a TxDb; this information is at the very bottom of the help file.
    ?makeTxDbFromBiomart

4. Use makeTxDbFromBiomart() to create a transcript database from the Arabidopsis TAIR10 gene annotation dataset athaliana_eg_gene in the plants_mart_25 database.

    transcriptdb <- makeTxDbFromBiomart(biomart="plants_mart_25", dataset="athaliana_eg_gene")

   In my experience, this command fails, with a cryptic error message about 0 or more than 1 file found on the FTP server at the Ensembl database. A web search using this error message as a query should identify a post on the Bioconductor support page asking why the error occurs, and the answers to that question are illuminating: it seems to be due to a bug introduced in a recent upgrade of the software. See the DEanalysis.R script file for a description of an alternative approach to creating the transcript database, and download the file At.TAIR10.sqlite using the following command:

    wget -O At.TAIR10.sqlite

   The resulting file can be loaded into your R session with the loadDb() command.

    transcriptdb <- loadDb("At.TAIR10.sqlite")

5. Look at the properties of the transcript database you created: how many transcripts are represented? How many exons?

    metadata(transcriptdb)

6. Read the help file on the transcripts() command.
7. Use the transcripts() command to extract the set of all transcripts from chromosome 5 (as a GRanges object) out of the database.

    chr5txpts <- transcripts(transcriptdb, vals=list(tx_chrom="5"))

8. Set the working directory to ~/rnaseq, then save the database as an SQLite file for future use.

    setwd("~/rnaseq")
    saveDb(transcriptdb, file="atdb.sqlite")

   To re-load the database from the saved file (in a new R session), use the command:

    mysaveddb <- loadDb("atdb.sqlite")

9. Look at the GRanges object chr5txpts to see how the chromosome name is represented: what name is shown in the seqnames column?

    head(chr5txpts)

10. This part of the exercise will use the terminal window rather than the RStudio window, because you will use other programs to align the six fastq.gz files to the reference chromosome 5 sequence to produce SAM/BAM files. In the Linux terminal, use the zcat and head commands to look at the first line of the Atchromo5.fasta.gz file in the rnaseq directory: what is the chromosome name in that file?

    zcat Atchromo5.fasta.gz | head -1
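For a reference file that holds more than one sequence, the check in step 10 extends naturally to every header line. The following is a minimal sketch, not part of the original exercise: fasta_names is a hypothetical helper name, gzip -dc is used as an equivalent of zcat, and only the text up to the first space of each header is printed, because that is the part aligners such as bwa use as the reference name in SAM/BAM output.

```shell
# Hypothetical helper (not from the course notes): print the name of every
# sequence in a gzipped FASTA file, one per line.
fasta_names() {
    # gzip -dc decompresses to standard output, like zcat.
    # grep keeps only header lines; the first sed expression drops the ">"
    # and the second drops everything after the first whitespace character.
    gzip -dc "$1" | grep '^>' | sed -e 's/^>//' -e 's/[[:space:]].*$//'
}

# Run on the course reference only if it is present in the current directory.
if [ -f Atchromo5.fasta.gz ]; then
    fasta_names Atchromo5.fasta.gz
fi
```

Comparing this list with the seqnames column of the GRanges object shows at a glance whether any names need to be changed before alignment.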
11. KEY POINT: The reference sequence names in the BAM alignment files MUST MATCH the names in the R transcript database object to be used for analysis of those alignment results. The reason is obvious if you imagine aligning reads to a transcriptome assembly with 50,000 different sequences: each sequence must have the same unique name in both the alignment file and the transcript database so R can make the connection between the two datasets.
12. Use sed to change the name of the reference sequence in the Atchromo5.fasta.gz file to 5, to match the name in the chr5txpts GRanges object in R. Then use bwa index to create an index of the modified file, followed by bwa mem to align the six files of RNA-seq reads to the reference sequence, and pipe the SAM output to samtools to filter out any unmapped reads, convert the alignments to BAM format, and sort them. The following code carries out these steps; the sed pattern must match the sequence name found in step 10.

    zcat Atchromo5.fasta.gz | sed -e 's/gi ref NC_ /5/' | gzip > Atchr5.fa.gz
    bwa index -p Atchr5 Atchr5.fa.gz
    bwa mem -t 3 Atchr5 cn1ln7.fastq.gz | samtools view -SbuF4 - | samtools sort - ctrl1

   The last series of commands (bwa mem | samtools view | samtools sort) should be repeated for the other five fastq.gz files: cn2ln8, cn3ln1, ts1ln4, ts2ln6, and ts3ln2. The output BAM files can be named ctrl2.bam, ctrl3.bam, test1.bam, test2.bam and test3.bam. The -F4 option filters reads that are not aligned to the reference sequence from the output to reduce the file sizes. The samtools version we have (0.1.19) will give an error message that states "[bam_header_read] EOF marker is absent. The input is probably truncated." This is a bug in the samtools code, not a real problem.

   Using 3 processors (as specified by the -t 3 option), my laptop completes each alignment job in one to three minutes. To see how many processors are available, execute the command cat /proc/cpuinfo | grep processor at a terminal prompt.
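Repeating the alignment pipeline by hand for six files invites typing errors, so a loop can be used instead. The sketch below is one possible way to do it, under stated assumptions: the read-file prefixes and output names are the ones listed in step 12, the samtools sort syntax is the 0.1.19 form used in these notes, and align_all, DRY_RUN and THREADS are hypothetical names introduced here. With DRY_RUN left at 1 the commands are only printed, so they can be checked before anything is run.

```shell
# Hypothetical settings (not from the course notes).
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands, 0 = run them
THREADS=${THREADS:-3}    # number of processors passed to bwa mem -t

align_all() {
    # Each entry pairs a read-file prefix with its output BAM prefix.
    for pair in cn1ln7:ctrl1 cn2ln8:ctrl2 cn3ln1:ctrl3 \
                ts1ln4:test1 ts2ln6:test2 ts3ln2:test3; do
        reads=${pair%%:*}    # text before the colon
        out=${pair##*:}      # text after the colon
        if [ "$DRY_RUN" = 1 ]; then
            echo "bwa mem -t $THREADS Atchr5 $reads.fastq.gz | samtools view -SbuF4 - | samtools sort - $out"
        else
            # Same pipeline as step 12: align, drop unmapped reads, sort to BAM.
            bwa mem -t "$THREADS" Atchr5 "$reads.fastq.gz" \
                | samtools view -SbuF4 - \
                | samtools sort - "$out"
        fi
    done
}

align_all
```

Setting DRY_RUN=0 before calling align_all runs the six alignments one after another; THREADS can be raised if more processors are available.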
   Note that processors are numbered using a 0-indexed system, so if the highest number is 3, there are a total of 4 processors.

   NOTE: BWA is not designed to map RNA-seq reads that contain splice junctions between exons to genomic DNA, so reads that overlap such junctions will be lost in this analysis.

13. The modified files should now be ready for importing into R and analysis. Return to the RStudio window and look up the readAligned command.

    ?readAligned

   Change the working directory to /home/lubuntu/rnaseq, then load the BAM files into R using readAligned().

    setwd("/home/lubuntu/rnaseq")
    c1 <- readAligned(".", "ctrl1.bam", type="BAM")
    c2 <- readAligned(".", "ctrl2.bam", type="BAM")
    c3 <- readAligned(".", "ctrl3.bam", type="BAM")
    t1 <- readAligned(".", "test1.bam", type="BAM")
    t2 <- readAligned(".", "test2.bam", type="BAM")
    t3 <- readAligned(".", "test3.bam", type="BAM")

14. Convert the AlignedRead objects produced by the readAligned() import process into GRanges objects using as(x, "GRanges"), and set the strand variable for each aligned read to *, because the library preparation method used for these reads was not strand-specific, so there is no useful biological information in that variable.
    c1gr <- as(c1, "GRanges"); strand(c1gr) <- "*"
    c2gr <- as(c2, "GRanges"); strand(c2gr) <- "*"
    c3gr <- as(c3, "GRanges"); strand(c3gr) <- "*"
    t1gr <- as(t1, "GRanges"); strand(t1gr) <- "*"
    t2gr <- as(t2, "GRanges"); strand(t2gr) <- "*"
    t3gr <- as(t3, "GRanges"); strand(t3gr) <- "*"

15. Look up the countOverlaps() function in R to see what it does.

    ?countOverlaps

16. Run countOverlaps() on each of the GRanges objects of aligned reads, using the chr5txpts object as a reference.

    c1.counts <- countOverlaps(chr5txpts, c1gr)
    c2.counts <- countOverlaps(chr5txpts, c2gr)
    c3.counts <- countOverlaps(chr5txpts, c3gr)
    t1.counts <- countOverlaps(chr5txpts, t1gr)
    t2.counts <- countOverlaps(chr5txpts, t2gr)
    t3.counts <- countOverlaps(chr5txpts, t3gr)

17. Combine the six vectors of read counts into a dataframe. Extract the transcript IDs for the 9288 transcripts on chromosome 5 from the transcriptdb database and use those to name the 9288 rows of the all.counts dataframe.

    all.counts <- data.frame(c1=c1.counts, c2=c2.counts, c3=c3.counts, t1=t1.counts, t2=t2.counts, t3=t3.counts)
    GRList <- transcriptsBy(transcriptdb, by="gene")
    tx_ids <- names(GRList)
    txpt.names <- select(transcriptdb, keys=tx_ids, keytype="GENEID", columns=c("TXID", "TXNAME", "GENEID"))
    chr5.rows <- which(substr(txpt.names$GENEID, 1, 3) == "AT5")
    rownames(all.counts) <- txpt.names[chr5.rows, 3]

18. Create a vector of factors that define the experimental treatment of each column in the dataframe.

    trtmnts <- data.frame(condition=c(rep("ctrl", 3), rep("test", 3)))

19. Look up the DESeqDataSetFromMatrix() command from the DESeq2 package.

    ?DESeqDataSetFromMatrix

20. Use DESeqDataSetFromMatrix() to convert the dataframe and vector of factors into a DESeqDataSet.

    Data <- DESeqDataSetFromMatrix(all.counts, trtmnts, formula(~condition))

21. Use estimateSizeFactors() to estimate the size of each of the six samples of reads.

    Data <- estimateSizeFactors(Data)
22. Use estimateDispersions() to estimate the variance among replicate samples.

    Data <- estimateDispersions(Data)

23. Use nbinomWaldTest() to test the significance of differential expression for all transcripts.

    Data <- nbinomWaldTest(Data)

24. Find the lines in the table of results with adjusted p-values (after correction for multiple testing) less than 0.05. Recover the gene IDs (the first 9 characters of the transcript IDs in the rownames of the signif.genes table) as a vector.

    result <- results(Data, contrast=c("condition", "test", "ctrl"))
    signif.genes <- result[which(result$padj < 0.05),]
    gene.ids <- substr(rownames(signif.genes), 1, 9)

25. Retrieve functional annotation for the differentially expressed genes from Biomart.

    atdb <- useMart("plants_mart_21", dataset="athaliana_eg_gene")
    filters <- listFilters(atdb)
    attributes <- listAttributes(atdb)
    descriptions <- getBM(attributes=c("tair_locus", "description"), filters="tair_locus", values=gene.ids, mart=atdb)

   Look in the Environment pane of the RStudio window for a summary of the descriptions and signif.genes objects; you will note that they differ by several rows in size. This is because there are several cases of two or more differentially expressed transcripts from single genes. You can find which gene IDs are duplicated in the vector gene.ids using the command which(duplicated(gene.ids)==TRUE); there are a total of 18. The same gene description from the descriptions object can be applied to all rows of the signif.genes object that share the same TAIR locus name. The following lines create a table of differentially expressed genes with the adjusted p-value and annotation, then write that table to a text file called DEgenes.txt in the current working directory.

    names.padj <- data.frame(names=gene.ids, padj=signif.genes$padj)
    merge.out <- merge(names.padj, descriptions, by.x=1, by.y=1, all.x=TRUE)
    write.table(merge.out, "DEgenes.txt", row.names=F, col.names=T, quote=F, sep="\t")
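Once the table has been written, the duplicated gene IDs can also be checked from the terminal. This is a minimal sketch rather than part of the original exercise: dup_ids is a hypothetical helper name, and it assumes a tab-delimited file with one header line and the gene ID in the first column, which is the layout the write.table() call is expected to produce.

```shell
# Hypothetical helper (not from the course notes): list each gene ID that
# appears on more than one row of a tab-delimited table with a header line.
dup_ids() {
    # tail skips the header, cut keeps the first column,
    # sort groups identical IDs, and uniq -d prints one copy of each repeat.
    tail -n +2 "$1" | cut -f1 | sort | uniq -d
}

# Run on the exercise output only if it exists in the current directory.
if [ -f DEgenes.txt ]; then
    dup_ids DEgenes.txt
fi
```

Each ID printed corresponds to a gene with two or more differentially expressed transcripts in the table.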
More informationIntroduc)on to annota)on with Artemis. Download presenta.on and data
Introduc)on to annota)on with Artemis Download presenta.on and data Annota)on Assign an informa)on to genomic sequences???? Genome annota)on 1. Iden.fying genomic elements by: Predic)on (structural annota.on
More informationGene Expression Data Analysis. Qin Ma, Ph.D. December 10, 2017
1 Gene Expression Data Analysis Qin Ma, Ph.D. December 10, 2017 2 Bioinformatics Systems biology This interdisciplinary science is about providing computational support to studies on linking the behavior
More informationQIAseq Targeted RNAscan Panel Analysis Plugin USER MANUAL
QIAseq Targeted RNAscan Panel Analysis Plugin USER MANUAL User manual for QIAseq Targeted RNAscan Panel Analysis 0.5.2 beta 1 Windows, Mac OS X and Linux February 5, 2018 This software is for research
More informationPackage RNASeqR. January 8, 2019
Type Package Package RNASeqR January 8, 2019 Title RNASeqR: RNA-Seq workflow for case-control study Version 1.1.3 Date 2018-8-7 Author Maintainer biocviews Genetics, Infrastructure,
More informationCirc-Seq User Guide. A comprehensive bioinformatics workflow for circular RNA detection from transcriptome sequencing data
Circ-Seq User Guide A comprehensive bioinformatics workflow for circular RNA detection from transcriptome sequencing data 02/03/2016 Table of Contents Introduction... 2 Local Installation to your system...
More informationTutorial: RNA-Seq analysis part I: Getting started
: RNA-Seq analysis part I: Getting started August 9, 2012 CLC bio Finlandsgade 10-12 8200 Aarhus N Denmark Telephone: +45 70 22 55 09 Fax: +45 70 22 55 19 www.clcbio.com support@clcbio.com : RNA-Seq analysis
More informationA quick introduction to GRanges and GRangesList objects
A quick introduction to GRanges and GRangesList objects Hervé Pagès hpages@fredhutch.org Michael Lawrence lawrence.michael@gene.com July 2015 GRanges objects The GRanges() constructor GRanges accessors
More informationHIPPIE User Manual. (v0.0.2-beta, 2015/4/26, Yih-Chii Hwang, yihhwang [at] mail.med.upenn.edu)
HIPPIE User Manual (v0.0.2-beta, 2015/4/26, Yih-Chii Hwang, yihhwang [at] mail.med.upenn.edu) OVERVIEW OF HIPPIE o Flowchart of HIPPIE o Requirements PREPARE DIRECTORY STRUCTURE FOR HIPPIE EXECUTION o
More informationPractical Linux examples: Exercises
Practical Linux examples: Exercises 1. Login (ssh) to the machine that you are assigned for this workshop (assigned machines: https://cbsu.tc.cornell.edu/ww/machines.aspx?i=87 ). Prepare working directory,
More informationBuilding and Using Ensembl Based Annotation Packages with ensembldb
Building and Using Ensembl Based Annotation Packages with ensembldb Johannes Rainer 1 June 25, 2016 1 johannes.rainer@eurac.edu Introduction TxDb objects from GenomicFeatures provide gene model annotations:
More informationIntroduction to Biocondcutor tools for second-generation sequencing analysis
Introduction to Biocondcutor tools for second-generation sequencing analysis Héctor Corrada Bravo based on slides developed by James Bullard, Kasper Hansen and Margaret Taub PASI, Guanajuato, México May
More informationToday's outline. Resources. Genome browser components. Genome browsers: Discovering biology through genomics. Genome browser tutorial materials
Today's outline Genome browsers: Discovering biology through genomics BaRC Hot Topics April 2013 George Bell, Ph.D. http://jura.wi.mit.edu/bio/education/hot_topics/ Genome browser introduction Popular
More informationIdentify differential APA usage from RNA-seq alignments
Identify differential APA usage from RNA-seq alignments Elena Grassi Department of Molecular Biotechnologies and Health Sciences, MBC, University of Turin, Italy roar version 1.16.0 (Last revision 2014-09-24)
More informationMaking and Utilizing TxDb Objects
Marc Carlson, Patrick Aboyoun, HervÃľ PagÃĺs, Seth Falcon, Martin Morgan October 30, 2017 1 Introduction The GenomicFeatures package retrieves and manages transcript-related features from the UCSC Genome
More informationHT Expression Data Analysis
HT Expression Data Analysis 台大農藝系劉力瑜 lyliu@ntu.edu.tw 08/03/2018 1 HT Transcriptomic Data Microarray RNA-seq HT Transcriptomic Data Microarray RNA-seq Workflow Data import Preprocessing* Visualization
More informationGenomic Files. University of Massachusetts Medical School. October, 2014
.. Genomic Files University of Massachusetts Medical School October, 2014 2 / 39. A Typical Deep-Sequencing Workflow Samples Fastq Files Fastq Files Sam / Bam Files Various files Deep Sequencing Further
More informationPackage Rsubread. July 21, 2013
Package Rsubread July 21, 2013 Type Package Title Rsubread: an R package for the alignment, summarization and analyses of next-generation sequencing data Version 1.10.5 Author Wei Shi and Yang Liao with
More informationRNA-Seq analysis with Astrocyte Differential expression and transcriptome assembly
RNA-Seq analysis with Astrocyte Differential expression and transcriptome assembly Beibei Chen Ph.D BICF 9/28/2016 Agenda Launch Workflows using Astrocyte BICF Workflows BICF RNA-seq Workflow Experimental
More informationGenomic Files. University of Massachusetts Medical School. October, 2015
.. Genomic Files University of Massachusetts Medical School October, 2015 2 / 55. A Typical Deep-Sequencing Workflow Samples Fastq Files Fastq Files Sam / Bam Files Various files Deep Sequencing Further
More informationRNA Alternative Splicing and Structures
RNA Alternative Splicing and Structures Tools and Applications Fang Zhaoyuan Wang Zefeng Lab, PICB Outline Alternative splicing analyses from RNA seq data MISO rmats RNA secondary structure analyses RNAfold
More informationJunctionSeq Package User Manual
JunctionSeq Package User Manual Stephen Hartley National Human Genome Research Institute National Institutes of Health March 30, 2017 JunctionSeq v1.5.4 Contents 1 Overview 2 2 Requirements 3 2.1 Alignment.........................................
More informationJunctionSeq Package User Manual
JunctionSeq Package User Manual Stephen Hartley National Human Genome Research Institute National Institutes of Health v0.6.10 November 20, 2015 Contents 1 Overview 2 2 Requirements 3 2.1 Alignment.........................................
More informationRNASeq2017 Course Salerno, September 27-29, 2017
RNASeq2017 Course Salerno, September 27-29, 2017 RNA- seq Hands on Exercise Fabrizio Ferrè, University of Bologna Alma Mater (fabrizio.ferre@unibo.it) Hands- on tutorial based on the EBI teaching materials
More informationDNA Sequencing analysis on Artemis
DNA Sequencing analysis on Artemis Mapping and Variant Calling Tracy Chew Senior Research Bioinformatics Technical Officer Rosemarie Sadsad Informatics Services Lead Hayim Dar Informatics Technical Officer
More informationGenerating and using Ensembl based annotation packages
Generating and using Ensembl based annotation packages Johannes Rainer Modified: 9 October, 2015. Compiled: January 19, 2016 Contents 1 Introduction 1 2 Using ensembldb annotation packages to retrieve
More informationExome sequencing. Jong Kyoung Kim
Exome sequencing Jong Kyoung Kim Genome Analysis Toolkit The GATK is the industry standard for identifying SNPs and indels in germline DNA and RNAseq data. Its scope is now expanding to include somatic
More informationDavid Crossman, Ph.D. UAB Heflin Center for Genomic Science. GCC2012 Wednesday, July 25, 2012
David Crossman, Ph.D. UAB Heflin Center for Genomic Science GCC2012 Wednesday, July 25, 2012 Galaxy Splash Page Colors Random Galaxy icons/colors Queued Running Completed Download/Save Failed Icons Display
More informationChIP-seq (NGS) Data Formats
ChIP-seq (NGS) Data Formats Biological samples Sequence reads SRA/SRF, FASTQ Quality control SAM/BAM/Pileup?? Mapping Assembly... DE Analysis Variant Detection Peak Calling...? Counts, RPKM VCF BED/narrowPeak/
More informationPart 1: How to use IGV to visualize variants
Using IGV to identify true somatic variants from the false variants http://www.broadinstitute.org/igv A FAQ, sample files and a user guide are available on IGV website If you use IGV in your publication:
More informationAligners. J Fass 21 June 2017
Aligners J Fass 21 June 2017 Definitions Assembly: I ve found the shredded remains of an important document; put it back together! UC Davis Genome Center Bioinformatics Core J Fass Aligners 2017-06-21
More informationBioinformatics in next generation sequencing projects
Bioinformatics in next generation sequencing projects Rickard Sandberg Assistant Professor Department of Cell and Molecular Biology Karolinska Institutet March 2011 Once sequenced the problem becomes computational
More informationITMO Ecole de Bioinformatique Hands-on session: smallrna-seq N. Servant 21 rd November 2013
ITMO Ecole de Bioinformatique Hands-on session: smallrna-seq N. Servant 21 rd November 2013 1. Data and objectives We will use the data from GEO (GSE35368, Toedling, Servant et al. 2011). Two samples were
More information