ChIP-seq hands-on practical using Galaxy

Similar documents
ChIP-seq hands-on practical using Galaxy

ChIP-Seq Tutorial on Galaxy

Analyzing ChIP- Seq Data in Galaxy

Protocol: peak-calling for ChIP-seq data / segmentation analysis for histone modification data

NGS Data Visualization and Exploration Using IGV

Genome 373: Mapping Short Sequence Reads III. Doug Fowler

BGGN-213: FOUNDATIONS OF BIOINFORMATICS (Lecture 14)

Colorado State University Bioinformatics Algorithms Assignment 6: Analysis of High- Throughput Biological Data Hamidreza Chitsaz, Ali Sharifi- Zarchi

ChIP-seq practical: peak detection and peak annotation. Mali Salmon-Divon Remco Loos Myrto Kostadima

Galaxy Platform For NGS Data Analyses

Using the Galaxy Local Bioinformatics Cloud at CARC

ChIP-seq Analysis. BaRC Hot Topics - March 21 st 2017 Bioinformatics and Research Computing Whitehead Institute.

de.nbi and its Galaxy interface for RNA-Seq

David Crossman, Ph.D. UAB Heflin Center for Genomic Science. GCC2012 Wednesday, July 25, 2012

ITMO Ecole de Bioinformatique Hands-on session: smallrna-seq N. Servant 21 rd November 2013

NGS Analysis Using Galaxy

Easy visualization of the read coverage using the CoverageView package

Analysis of ChIP-seq data

From genomic regions to biology

Copy Number Variations Detection - TD. Using Sequenza under Galaxy

TP RNA-seq : Differential expression analysis

A short Introduction to UCSC Genome Browser

RNA-Seq in Galaxy: Tuxedo protocol. Igor Makunin, UQ RCC, QCIF

Getting Started. April Strand Life Sciences, Inc All rights reserved.

ChIP-seq Analysis. BaRC Hot Topics - Feb 23 th 2016 Bioinformatics and Research Computing Whitehead Institute.

Analyzing Variant Call results using EuPathDB Galaxy, Part II

Galaxy workshop at the Winter School Igor Makunin

NGS FASTQ file format

Copyright 2014 Regents of the University of Minnesota

Part 1: How to use IGV to visualize variants

ChIP-seq Analysis Practical

How To: Run the ENCODE histone ChIP- seq analysis pipeline on DNAnexus

Importing your Exeter NGS data into Galaxy:

Advanced UCSC Browser Functions

m6aviewer Version Documentation

Introduction to Galaxy

Copyright 2014 Regents of the University of Minnesota

Dr. Gabriela Salinas Dr. Orr Shomroni Kaamini Rhaithata

NGS : reads quality control

Cyverse tutorial 1 Logging in to Cyverse and data management. Open an Internet browser window and navigate to the Cyverse discovery environment:

Introduction to Galaxy

Agilent Genomic Workbench Lite Edition 6.5

ChIP- seq Analysis. BaRC Hot Topics - Feb 24 th 2015 BioinformaBcs and Research CompuBng Whitehead InsBtute. hgp://barc.wi.mit.

Using Galaxy to Perform Large-Scale Interactive Data Analyses

Mapping RNA sequence data (Part 1: using pathogen portal s RNAseq pipeline) Exercise 6

Genomic Analysis with Genome Browsers.

Using Galaxy for NGS Analyses Luce Skrabanek

Demo 1: Free text search of ENCODE data

replace my_user_id in the commands with your actual user ID

Exercise 1. RNA-seq alignment and quantification. Part 1. Prepare the working directory. Part 2. Examine qualities of the RNA-seq data files

INF-BIO5121/ Oct 7, Analyzing mirna data using Lifeportal PRACTICALS

!"#$%&$'()#$*)+,-./).01"0#,23+3,303456"6,&((46,7$+-./&((468,

Accessible, Transparent and Reproducible Analysis with Galaxy

High-throughout sequencing and using short-read aligners. Simon Anders

Supplementary Figure 1. Fast read-mapping algorithm of BrowserGenome.

mrna-seq Basic processing Read mapping (shown here, but optional. May due if time allows) Gene expression estimation

Practical 4: ChIP-seq Peak calling

ChIP-seq (NGS) Data Formats

RNA-seq. Manpreet S. Katari

Welcome to MAPHiTS (Mapping Analysis Pipeline for High-Throughput Sequences) tutorial page.

RNA-Seq Analysis With the Tuxedo Suite

Helpful Galaxy screencasts are available at:

Resequencing Analysis. (Pseudomonas aeruginosa MAPO1 ) Sample to Insight

How to use earray to create custom content for the SureSelect Target Enrichment platform. Page 1

The UCSC Gene Sorter, Table Browser & Custom Tracks

Maximizing Public Data Sources for Sequencing and GWAS

Useful software utilities for computational genomics. Shamith Samarajiwa CRUK Autumn School in Bioinformatics September 2017

Maize genome sequence in FASTA format. Gene annotation file in gff format

You will be re-directed to the following result page.

SMRT-Portal Exercises. J Fass UCD Genome Center Bioinformatics Core Thursday April 16, 2015

Using Galaxy to provide a NGS Analysis Platform GTC s NGS & Bioinformatics Summit Europe October 7-8, 2013 in Berlin, Germany.

Handling sam and vcf data, quality control

CLC Server. End User USER MANUAL

Today's outline. Resources. Genome browser components. Genome browsers: Discovering biology through genomics. Genome browser tutorial materials

Demo 1: Free text search of ENCODE data

UCSC Genome Browser ASHG 2014 Workshop

Our typical RNA quantification pipeline

Genome Browsers - The UCSC Genome Browser

Single/paired-end RNAseq analysis with Galaxy

Galaxy. Daniel Blankenberg The Galaxy Team

Working with ChIP-Seq Data in R/Bioconductor

Using Galaxy to provide a NGS Analysis Platform

UCSC Genome Browser Pittsburgh Workshop -- Practical Exercises

Practical Linux Examples

Short Read Sequencing Analysis Workshop

Charter/PenguinData TQA Manual

Bioinformatics in next generation sequencing projects

HIPPIE User Manual. (v0.0.2-beta, 2015/4/26, Yih-Chii Hwang, yihhwang [at] mail.med.upenn.edu)

Helsinki 19 Jan Practical course in genome bioinformatics DAY 0

Reference guided RNA-seq data analysis using BioHPC Lab computers

Sequence Analysis Pipeline

QIAseq DNA V3 Panel Analysis Plugin USER MANUAL

Chromatin signature discovery via histone modification profile alignments Jianrong Wang, Victoria V. Lunyak and I. King Jordan

Supplementary Material. Cell type-specific termination of transcription by transposable element sequences

Ensembl RNASeq Practical. Overview

ChIP-Seq data analysis workshop

DNA Sequencing analysis on Artemis

Integrative Genomics Viewer. Prat Thiru

epigenomegateway.wustl.edu

Year 8. Revision Exercise April Computer Science CARDINAL NEWMAN CATHOLIC SCHOOL. Student Name : Subject Teacher : Tutor Group:

Transcription:

ChIP-seq hands-on practical using Galaxy In this exercise we will cover some of the basic NGS analysis steps for ChIP-seq using the Galaxy framework: Quality control Mapping of reads using Bowtie2 Peak-calling using MACS Assign peaks to genes We will work with two cell lines, each with two replicated sample-control experiments. Startup Galaxy is already running in server mode inside your VM, so you just have to start a web browser to connect to it. In Bio Linux 7 you can either: Find a Galaxy icon to click in the Ubuntu Dash home - top left Find a Galaxy icon to double-click in the Bioinformatics Applications selection on your Desktop. Open Firefox manually and navigate to localhost:8080 Select User Login from the Galaxy front-page and login as ngscourse@uib.no with password manager. Quality Control First we ll have a look at the fastq raw data files for the ChIP-seq experiments. We will use the FastQC to get some statistics on the reads. 1. First upload one of the course ChIP-seq fastq files from the H1hsec cell line into Galaxy through Get Data Upload File in the left menu. Specify fastqsanger as file format and chr19 as database Tip: Close the Get Data sub menu again by clicking it s name 2. After upload is completed, rename your fastq file element in your history to a shorter meaningfull name revealing cell-line, TF/Ctrl name and replicate no (Tip: Click the pencil icon to edit the item in the history). 1

3. Rename your history to MACS on H1hsec by clicking on the history name to edit it 4. Open the NGS: QC and manipulation sub menu in the left menu and select the FastQC: Read QC tool 5. Check that your fastq file is suggested correctly as the input and start the tool 6. This should produce a new element in your history, inspect it and review the results of the tool in the centre panel. 7. Note the read length of the data (needed later) 8. Try the same procedure for a second fastq file, the one matching to your previous file in this sample-control pair. Mapping Due to time constraints we will only do live mapping of 1 fastq file and upload pre-aligned bam files for the others. 1. In the left menu, open the NGS: Mapping sub-menu and select the Bowtie2 tool. 2. Select your H1hsec Egr1 sample fastq file, select chr19 as your reference genome, and use defualt settings for the rest. 3. Execute your mapping job, this can take some time (15-20 min depending on your computer), there will come a new data item in your history that should turn from grey via yellow to green when the job is done. 4. Rename your new bam file entry in the history to something meaningful containing cell-line, TF/Ctrl name and rep no. 5. For visual inspection of the results, either try to visualize your bam file using Trackster (horizontal bar chart icon in the detailed view of the bam file entry in the history) or download the bam file and inspect it in IGV as you did for other bam files yesterday ChIP-seq doen not have the same clear expectations to where reads should ditribute across the genome as RNA-seq data have. But for curiosity we might ask how many of the reads binds within 10kb upstream of the genes?. 1. The Get data UCSC Main tool let us query and download interval data in bed and gtf format easily. 2

Navigate to the hg19 genome, select Genes and Genes Predicions, only for chr19:1-59128983, tick Send output to Galaxy, specify BED format and click the get output button. In the next page, tick and fill in Upstream by 10000 bases and click Send query to Galaxy button. 2. You will then get a new BED file item in your history. 3. Use the NGS:Something Slice bam file tool to cut out the reads from your bam file that overlap the BED file. 4. The number of reads in the resulting bam file now tells you how many out of the total mapped reads that lie within 10kb upstream of the genes on chr19. How big percentage is this? Now we have to speed up a bit, and thus we: 1. Upload the bam file that match the sample you ve already mapped. Same cell-line, matching Ctrl or TF, same replicate no. (Using Get data Upload File as before). 2. Rename to a shorter and more meaningful name as before. Peak-calling Now we have two bam files, one for our sample enriched for TF bound DNA and one for the non-enriched control. The next task is to call peaks of reads that are significantly enriched in the sample over the control. For this we use the MACS tool. 1. Open the MACS tool under NGS: Peak-calling MACS. 2. Name the experiment MACS on H1hsec rep 1 (or rep 2 if you started on that one) 3. Select your TF sample as the ChIP-Seq Tag File 4. Select your Control as the ChIP-Seq Control File 5. Enter the effective genome size as the length of chr19: 59128983 6. Tag size is the read length that you found under the FastQC step above :) 7. Since we have downsampled the original data, we also lower the minimum MFOLD high-confidence ratio to 10. 8. Leave other parameters at default. 3

9. Execute Let s then examine our results in either Trackster inside Galaxy (Visualize icon on the bed file result item) or in IGV by downloading the bed file first. 1. In addition to loading the bed file result itself (on chr19 background) 2. Load/Add your bam files for Sample and Control as additional tracks 3. Load the Ensemble gene information from the course material as well (GTF file). 4. The called peaks are named MACS no, so by using the position/search field, try to inspect the first peaks. 5. Check how the peaks are sorted in the Galaxy history preview/full view of the peak bed file. Assign peaks to genes The next step will be to assign peaks to the closest gene. To do this we need to combine our bed interval file for the peaks, with one file representing genes. Galaxy provides by default many tools that works with combinations of genome interval files. We will use two of them that: Find genes with Trancription Start Site (TSS) that overlaps with our MACS peak calling results. Find the closest non-overlapping TSS for all peaks. Then we first need a bed file encoding the TSS in an interval file like bed foramt. This we have prepared for you for chr19, and is available at the course download webpage. We will use the tools under the Operate on Genomic Intervals sub menu of the tools menu to the left. So, let s get started. 1) Get hold of the TSS bed-file 2) Upload it into your Galaxy history ( Get data in left menu) 3) First use the Intersect tool to see how many of our called peaks overlap with a TSS. 4) Then we ll use the Fetch closest non-overlapping feature tool to find the nearest TSS not overlapping with our peaks. 4

Next replicate Since we have two replicated experiments for our TF ChIP-seq data, start from the two bam files from the second replicate do the peak-calling and assign genes to the peaks afterwards. 1. How many peaks were called in each of the replicate expetiements? 2. How many of these peaks are overlapping between the replicates? Next cell line We also have a second cell line Mcf7 with two replicated ChHIP-seq experiments for the same TF. 1) Create a new history and call it Mcf7 MACS 2) Start from bam files and do peak calling and assign to genes 3) How good is the overlap between the replicates in this cell line? Between Cell line comparison This next exercise you can either do in Galaxy (get your bed files holding peaks called in both replicates of a cell-line into a new history) or using bed-tools from the commandline (tip: intersectbed commandline tool) 1) How many of the peaks called by both replicates in a cell-line is present in both cell lines? 2) And how many peaks are unique for each cell-line Enjoy :) 5