NGS Analysis Using Galaxy


Sequences and Alignment Format
Galaxy overview and Interface
Getting Data in Galaxy
Analyzing Data in Galaxy
Quality Control
Mapping Data
History and workflow
Galaxy Exercises

UCR Galaxy homepage (https://galaxy.bioinfo.ucr.edu)

SNP-seq Analysis dataset
Data source: SRR038850 sample from the experiment published by Kaufman et al. (2012, GSE20176)
Understand the target molecular mechanisms underlying AP1 function
Fastq file: http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/snpseq/srr038850.fastq
TAIR10 genome: http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/snpseq/tair10chr.fasta
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse20176

SNP-seq pipeline
Upload data and QC: sequence (fastq file), reference genome (fasta file), filter reads based on quality scores
Alignment with BWA: align with BWA (Burrows-Wheeler Aligner), SAM to BAM conversion
Variant calling: Mpileup, Bcftools view

Upload data
Go to "Get Data" and click "Upload File from your computer". Then enter the following URLs in the URL/Text box:
http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/snpseq/srr038850.fastq
http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/snpseq/tair10chr.fasta

Convert Fastq file to Sanger format
Select "NGS: QC and manipulation", then "Fastq groomer".
The FASTQ Groomer tool verifies FASTQ files and converts between the known FASTQ variants. After grooming, the user has a valid FASTQ file that is accepted by all downstream analysis tools.
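
The core of the quality conversion can be sketched in a few lines of Python. This is only an illustration, not the Groomer's actual code: it assumes the input uses the Illumina 1.3+ encoding (Phred+64), and the function name `phred64_to_sanger` is made up for this example. The real tool also validates records and handles other variants such as Solexa.

```python
def phred64_to_sanger(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Sanger (Phred+33).

    Assumption: the input really is Phred+64; a character below '@'
    would give a negative score, so it is rejected.
    """
    scores = [ord(ch) - 64 for ch in quality]
    if any(q < 0 for q in scores):
        raise ValueError("quality string is not Phred+64 encoded")
    return "".join(chr(q + 33) for q in scores)


# 'h' is Q40 in Phred+64; the same score is written 'I' in Sanger format.
print(phred64_to_sanger("hhhh"))
```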

Features available in history panel
View, edit or delete a file
Size of the file
Save the file
Repeat the analysis

FASTQ Summary Statistics
To understand the quality properties of the reads, one can run the FASTQ Summary Statistics tool from "NGS: QC and manipulation".
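
As a rough idea of what such a summary contains, the sketch below computes the mean quality score at each read position. It is not the Galaxy tool's implementation; it assumes Sanger-encoded (Phred+33) qualities and unwrapped four-line FASTQ records, and the helper names and example reads are invented for illustration.

```python
from statistics import mean


def read_fastq(text):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def per_position_mean_quality(records, offset=33):
    """Return the mean Phred score observed at each read position."""
    columns = {}
    for _name, _seq, qual in records:
        for pos, ch in enumerate(qual):
            columns.setdefault(pos, []).append(ord(ch) - offset)
    return [mean(columns[pos]) for pos in sorted(columns)]


# Two invented reads: one with Q40 at every base ('I'), one with Q20 ('5').
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n5555\n"
print(per_position_mean_quality(read_fastq(EXAMPLE)))
```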

FASTQ Quality control
To understand the quality properties of the reads, one can also run the "FASTQC: Read QC reports" tool from "NGS: QC and manipulation".

Quality control output

Quality control reports
Per base sequence quality
Per base sequence content
Per sequence GC content
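
Of these reports, per sequence GC content is the simplest to reproduce by hand. A minimal sketch follows, assuming plain A/C/G/T read strings; the function name is hypothetical and the real report plots the distribution of this value over all reads.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of bases in a read that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0  # avoid dividing by zero on an empty read
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


for read in ("ACGT", "GGCC", "ATAT"):
    print(read, gc_content(read))
```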

Quality filter
This tool filters reads based on quality scores.
NGS: QC and manipulation -> Generic FASTQ manipulation -> Filter FASTQ reads by quality score and length
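
The filtering rule can be illustrated as follows. This is a simplified sketch, not the Galaxy tool itself: it assumes Sanger (Phred+33) qualities and keeps a read only if it is long enough and every base meets the quality threshold. The parameter names and default values are chosen for the example.

```python
def passes_filter(seq, qual, min_quality=20, min_length=20, offset=33):
    """Return True if a read is long enough and every base is good enough."""
    if len(seq) < min_length:
        return False
    return all(ord(ch) - offset >= min_quality for ch in qual)


# Invented reads: (sequence, quality string).
reads = [("ACGT", "IIII"), ("ACGT", "II#I"), ("AC", "II")]
kept = [seq for seq, qual in reads if passes_filter(seq, qual, min_length=4)]
print(kept)
```

Only the first read survives: the second contains a low-quality base ('#' is Q2) and the third is too short.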

Alignment with BWA
BWA is a fast and accurate short-read aligner that allows mismatches and indels.
Go to "NGS: Mapping" and click "Map with BWA". (Settings)

SAM to BAM format conversion
Produce an indexed BAM file based on a sorted input SAM file.
Go to "NGS: SAM Tools", then click "SAM-to-BAM".

Variant calling with Mpileup
The SNP and INDEL caller generates BCF (binary variant call format) for one or multiple BAM files.
Go to "NGS: SAM Tools", then click "MPileup SNP and indel caller".

Bcftools view
Converts BCF format to VCF format.
Go to "NGS: SAM Tools", then click "bcftools view".
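
Once the calls are in VCF, they are plain tab-separated text and easy to inspect. The sketch below reads the eight fixed VCF columns; the two data lines are invented for illustration and are not taken from the workshop output. Classifying a call as an indel by comparing REF and ALT lengths is a simplification made for this example.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Turn one VCF data line into a dict of its fixed columns."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")
    # Simplification: an allele whose length differs from REF is an indel.
    record["is_indel"] = any(len(a) != len(record["REF"]) for a in record["ALT"])
    return record


def iter_variants(text):
    """Yield parsed records, skipping '#' header lines and blank lines."""
    for line in text.splitlines():
        if line and not line.startswith("#"):
            yield parse_vcf_line(line)


# Invented example content, for illustration only.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "Chr5\t57100\t.\tA\tG\t50\t.\tDP=12\n"
    "Chr5\t57120\t.\tAT\tA\t30\t.\tINDEL;DP=9\n"
)
for variant in iter_variants(EXAMPLE_VCF):
    kind = "indel" if variant["is_indel"] else "SNP"
    print(variant["CHROM"], variant["POS"], kind)
```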

Rename history

Extract workflow from history

Exercise 2: RNA-seq Analysis
Data source: RNA-seq experiment SRA023501
Four samples:

Samples   Factors   Fastq
AP3_f14   AP3       http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/rnaseq/srr064154.fastq
AP3_f14   AP3       http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/rnaseq/srr064155.fastq
T1_f14    TRL       http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/rnaseq/srr064166.fastq
T1_f14    TRL       http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/rnaseq/srr064167.fastq

RNA-seq Analysis workflow
Input data: upload the fastq files, reference genome and GTF file; use Fastq Groomer to groom the fastq sequences
Alignment: TopHat for alignment; results in insertions, deletions, splice junctions and accepted hits
Differential expression testing: use Cuffdiff for differential expression testing; finds significant changes in gene expression, splicing and promoter use

Upload Data
Upload the four fastq files with URL
Upload tair10chr.fa with URL (http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/rnaseq/tair10chr.fasta)
Upload TAIR10.GTF with URL and specify the format "gtf" (http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/rnaseq/tair10.gtf)

Fastq Groomer
Select "NGS: QC and manipulation", then "Fastq groomer".
Run Fastq Groomer on all four fastq files.

RNA-seq Analysis

Alignment with TopHat
TopHat is a fast splice junction mapper for RNA-Seq reads; it can identify splice junctions between exons.
Go to "NGS: RNA analysis" and click "Tophat for illumina".
Repeat this process for all four groomed fastq files.

Find Significant Changes
Cuffdiff finds significant changes in transcript expression.
Go to "NGS: RNA analysis" and click "Cuffdiff".

Cuffdiff Output
TSS... files report on transcription start sites
splicing... files report on splicing
CDS... files track coding region expression
transcript... files track transcripts
gene... files roll up the transcripts into their genes
Gene/transcript FPKM tracking: gives information about the gene or transcript (length, nearest ref id, TSS, etc.) and the confidence intervals for FPKM for each condition.
Gene/transcript differential expression testing: gives the expression change between groups and a status indicating whether there was enough data for that value to be accurate (OK is good; FAIL and NOTEST are bad; LOWDATA is somewhere in between). Finally, it gives a p-value.
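
A differential expression table like this is usually filtered on the status and significance columns. The sketch below shows one way to do that in Python. The column names follow the layout commonly seen in Cuffdiff's gene differential expression file and should be treated as an assumption to check against your own output; the three rows are invented.

```python
import csv
import io

# Invented rows in the tab-separated layout assumed for the Cuffdiff
# gene differential expression output; check the header of your own file.
EXAMPLE = (
    "test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\t"
    "value_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n"
    "G1\tG1\tgeneA\tChr1:100-900\tAP3\tTRL\tOK\t10.0\t40.0\t2.0\t3.1\t0.0001\t0.002\tyes\n"
    "G2\tG2\tgeneB\tChr1:2000-2900\tAP3\tTRL\tNOTEST\t0.1\t0.2\t1.0\t0.0\t1.0\t1.0\tno\n"
    "G3\tG3\tgeneC\tChr2:50-700\tAP3\tTRL\tOK\t5.0\t5.5\t0.1375\t0.2\t0.7\t0.9\tno\n"
)


def significant_genes(handle):
    """Return (gene, log2 fold change) for tests that ran and were significant."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["gene"], float(row["log2(fold_change)"]))
        for row in reader
        if row["status"] == "OK" and row["significant"] == "yes"
    ]


print(significant_genes(io.StringIO(EXAMPLE)))
```

Rows with a status other than OK are dropped first, since their fold changes were not reliably tested.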

Find Significant Changes
Cuffdiff finds significant changes in transcript expression.

Output Files from Galaxy
SNP-seq
Save the BAM file (generated by BWA)
Save the .bai file, the index of the BAM (generated by BWA)
Save the VCF file (generated by Samtools mpileup)
Already saved at: http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/snpseq/
RNA-seq
Save the four BAM files and four .bai files
Already saved at: http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/rnaseq/

Downloading bam index file (.bai)

Outline
What is Galaxy
Galaxy for Bioinformaticians
Galaxy for Experimental Biologists
Using Galaxy for NGS Analysis
NGS Data Visualization and Exploration Using IGV

Why IGV
IGV is an integrated visualization tool for large data types
View large datasets easily
Faster navigation when browsing
Runs locally on your computer
Easy-to-use interface
http://www.broadinstitute.org/igv/

IGV download

IGV interface
Specify genomic region
Zoom in
Ruler
Tracks

Load data
Select genome: click the genome drop-down list in the toolbar and select the genome
Select chromosome

Load data files
Load from URL, file or server

Toolbar
Genome drop-down box: loads a genome.
Chromosome drop-down box: zooms to a chromosome.
Search box: displays the chromosome location being shown. To scroll to a different location, enter the gene name, locus or track name and click Go.
Whole genome view: zooms to whole genome view.
Define a region: defines a region of interest on the chromosome.
Zoom slider: zooms in and out on a chromosome.

Change Display Options
IGV offers several display options for tracks:
Zoom in and zoom out
Modify track height
Sort the tracks
Filter the tracks
Group the tracks
Sort tracks based on region of interest

Variant Visualization in IGV
Load the TAIR10 genome into IGV
Load BAM file SRR038850.bam into IGV with Load from URL: http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/snpseq/sam-to-BAM_BAM.bam
Load VCF file var.raw.vcf into IGV with Load from URL: http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/snpseq/var.raw.vcf

Zoom in to: Chr5:57,073-57,142
Zoom-in screen
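
Locus strings of this form follow a simple chromosome:start-end pattern with optional thousands separators. If you need to work with such coordinates in a script, a parser might look like the sketch below. The function name and the regular expression are written for this example and are not part of IGV.

```python
import re

# chrom:start-end, where start and end may contain commas.
LOCUS_PATTERN = re.compile(
    r"^\s*(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)\s*$"
)


def parse_locus(locus):
    """Split a locus string into (chromosome, start, end) with integer coordinates."""
    match = LOCUS_PATTERN.match(locus)
    if match is None:
        raise ValueError(f"not a chrom:start-end locus: {locus!r}")
    start = int(match.group("start").replace(",", ""))
    end = int(match.group("end").replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return match.group("chrom"), start, end


chrom, start, end = parse_locus("Chr5:57,073-57,142")
print(chrom, start, end, "window of", end - start + 1, "bp")
```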

Variant Visualization in IGV
Load the TAIR10 genome into IGV
Load BAM file SRR064154.bam into IGV with Load from URL: http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/snpseq/srr064154.fastq.bam
Load VCF file var.raw.vcf into IGV with Load from URL: http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/snpseq/gatk.vcf

Zoom in position (chr5:6,435-6,475)

Right click on track for more options

Show all bases in IGV

RNA-seq Results Visualization
Load the four BAM files into IGV:
http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/rnaseq/srr064154.bam
http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/rnaseq/srr064155.bam
http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/rnaseq/srr064166.bam
http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/rnaseq/srr064167.bam
Load the gene differential expression GFF3 file expression_diff.gff3 into IGV:
http://biocluster.ucr.edu/~rkaundal/galaxy_workshop/rnaseq/expression_diff.gff3

Exercise 2: RNA-seq Result Visualization Zoom in to Chr1:41,351-51,208

Exercise 2: RNA-seq Result Visualization Zoom in to Chr1:49,457-51,457

UCR Galaxy homepage (https://galaxy.bioinfo.ucr.edu)

Thank You