Workshop 6: DNA Methylation Analysis using Bisulfite Sequencing. Fides D Lay UCLA QCB Fellow

Similar documents
metilene - a tool for fast and sensitive detection of differential DNA methylation

Galaxy Platform For NGS Data Analyses

arxiv: v2 [q-bio.gn] 13 May 2014

WM2 Bioinformatics. ExomeSeq data analysis part 1. Dietmar Rieder

EpiGnome Methyl Seq Bioinformatics User Guide Rev. 0.1

Variant calling using SAMtools

NGS Data Visualization and Exploration Using IGV

epigenomegateway.wustl.edu

panda Documentation Release 1.0 Daniel Vera

HIPPIE User Manual. (v0.0.2-beta, 2015/4/26, Yih-Chii Hwang, yihhwang [at] mail.med.upenn.edu)

Sequence Analysis Pipeline

Integrative Genomics Viewer. Prat Thiru

ChIP-seq (NGS) Data Formats

Galaxy. Daniel Blankenberg The Galaxy Team

HMPL User Manual. Shuying Sun or Texas State University

Analyzing ChIP- Seq Data in Galaxy

Sep. Guide. Edico Genome Corp North Torrey Pines Court, Plaza Level, La Jolla, CA 92037

Our typical RNA quantification pipeline

Supplementary Information. Detecting and annotating genetic variations using the HugeSeq pipeline

Data: ftp://ftp.broad.mit.edu/pub/users/bhaas/rnaseq_workshop/rnaseq_workshop_dat a.tgz. Software:

11/8/2017 Trinity De novo Transcriptome Assembly Workshop trinityrnaseq/rnaseq_trinity_tuxedo_workshop Wiki GitHub

Calling variants in diploid or multiploid genomes

Easy visualization of the read coverage using the CoverageView package

Copy Number Variations Detection - TD. Using Sequenza under Galaxy

Package DMCHMM. January 19, 2019

Practical Linux examples: Exercises

Tutorial on gene-c ancestry es-ma-on: How to use LASER. Chaolong Wang Sequence Analysis Workshop June University of Michigan

Preparation of alignments for variant calling with GATK: exercise instructions for BioHPC Lab computers

Unix Essentials. BaRC Hot Topics Bioinformatics and Research Computing Whitehead Institute October 12 th

Maize genome sequence in FASTA format. Gene annotation file in gff format

SAM / BAM Tutorial. EMBL Heidelberg. Course Materials. Tobias Rausch September 2012

Handling sam and vcf data, quality control

Part 1: How to use IGV to visualize variants

Advanced UCSC Browser Functions

RNAseq analysis: SNP calling. BTI bioinformatics course, spring 2013

Genomic Analysis with Genome Browsers.

SNP Calling. Tuesday 4/21/15

Exercise 1. RNA-seq alignment and quantification. Part 1. Prepare the working directory. Part 2. Examine qualities of the RNA-seq data files

GenomeStudio Software Release Notes

SAM : Sequence Alignment/Map format. A TAB-delimited text format storing the alignment information. A header section is optional.

DNA Sequencing analysis on Artemis

Sentieon Documentation

Useful software utilities for computational genomics. Shamith Samarajiwa CRUK Autumn School in Bioinformatics September 2017

NGS Analysis Using Galaxy

User's guide to ChIP-Seq applications: command-line usage and option summary

Protocol: peak-calling for ChIP-seq data / segmentation analysis for histone modification data

Merge Conflicts p. 92 More GitHub Workflows: Forking and Pull Requests p. 97 Using Git to Make Life Easier: Working with Past Commits p.

Release Note. Agilent Genomic Workbench Standard Edition

Package dmrseq. September 14, 2018

Falcon Accelerated Genomics Data Analysis Solutions. User Guide

USING BRAT-BW Table 1. Feature comparison of BRAT-bw, BRAT-large, Bismark and BS Seeker (as of on March, 2012)

Analyzing Genomic Data with NOJAH

Practical Linux Examples

Variation among genomes

A short Introduction to UCSC Genome Browser

Chromatin signature discovery via histone modification profile alignments Jianrong Wang, Victoria V. Lunyak and I. King Jordan

The Smithlab DNA Methylation Data Analysis Pipeline (MethPipe)

Introduction to NGS analysis on a Raspberry Pi. Beta version 1.1 (04 June 2013)

Bioinformatics in next generation sequencing projects

Helpful Galaxy screencasts are available at:

Getting Started. April Strand Life Sciences, Inc All rights reserved.

From genomic regions to biology

Genomic Island Hunter (GIHunter)

replace my_user_id in the commands with your actual user ID

Mapping NGS reads for genomics studies

Read mapping with BWA and BOWTIE

BGGN-213: FOUNDATIONS OF BIOINFORMATICS (Lecture 14)

Introduction to Read Alignment. UCD Genome Center Bioinformatics Core Tuesday 15 September 2015

Using Galaxy to provide a NGS Analysis Platform

Next Generation Sequence Alignment on the BRC Cluster. Steve Newhouse 22 July 2010

Integra(ve Genomics Viewer IGV. Tom Carroll MRC Clinical Sciences Centre

RNA-Seq in Galaxy: Tuxedo protocol. Igor Makunin, UQ RCC, QCIF

MPG NGS workshop I: Quality assessment of SNP calls

Introduction to Galaxy

MACAU User Manual. Xiang Zhou. March 15, 2017

- 1 - Web page:

Mar. Guide. Edico Genome Inc North Torrey Pines Court, Plaza Level, La Jolla, CA 92037

ChIP-Seq Tutorial on Galaxy

methylmnm Tutorial Yan Zhou, Bo Zhang, Nan Lin, BaoXue Zhang and Ting Wang January 14, 2013

Package ENmix. September 17, 2015

WASHU EPIGENOME BROWSER 2018 epigenomegateway.wustl.edu

Using Linux as a Virtual Machine

Analysis of ChIP-seq data

Today's outline. Resources. Genome browser components. Genome browsers: Discovering biology through genomics. Genome browser tutorial materials

Genomic Data Analysis Services Available for PL-Grid Users

SolexaLIMS: A Laboratory Information Management System for the Solexa Sequencing Platform

Lecture 3. Essential skills for bioinformatics: Unix/Linux

Genomics. Nolan C. Kane

Exome sequencing. Jong Kyoung Kim

NGS Data Analysis. Roberto Preste

Handling important NGS data formats in UNIX Prac8cal training course NGS Workshop in Nove Hrady 2014

ChIP-seq Analysis Practical

Practical 4: ChIP-seq Peak calling

Goldmine: Integrating information to place sets of genomic ranges into biological contexts Jeff Bhasin

iloci software is used to calculate the gene-gene interactions from GWAS data. This software was implemented by the OpenCL framework.

m6aviewer Version Documentation

New High Performance Computing Cluster For Large Scale Multi-omics Data Analysis. 28 February 2018 (Wed) 2:30pm 3:30pm Seminar Room 1A, G/F

Welcome to GenomeView 101!

Recap From Last Time:

Package MIRA. October 16, 2018

Transcription:

Workshop 6: DNA Methylation Analysis using Bisulfite Sequencing Fides D Lay UCLA QCB Fellow lay.fides@gmail.com

Workshop 6 Outline Day 1: Introduction to DNA methylation & WGBS Quick review of linux, Hoffman2 and high-throughput sequencing glossary. Aligning WGBS reads using bwa-meth Day 2: DNA methylation calling using Bis-SNP Analysis of differentially methylated regions (DMRs) using metilene Day 3: Visualization of DNA methylation data WGBS analysis using BS-Seeker2

Day 2

Methylation Calling using Bis-SNP

Methylation Calling using Bis-SNP Liu et al 2012

Methylation Calling using Bis-SNP Advantages: Simultaneous variant calling to determine DNA methylation and genomic variants Calling of cytosines present in the context of GCH as used in NOMe-seq assay Disadvantages: Computationally intensive & slower Currently only compatible with Java6 No alignment tool by itself

Methylation Calling using Bis-SNP

Methylation Calling using Bis-SNP

Methylation Calling using Bis-SNP #bwa-meth v0.10 contains a wrapper for Bis-SNP

Methylation Calling using Bis-SNP #bwa-meth contains a wrapper for Bis-SNP for easier use #load dependencies module module load java/1.6.0_23 module load gatk module load python/2.7.3 module load bwa module load samtools/1.2 python /u/home/galaxy/collaboratory/apps/bwa-meth-0.10/ bwameth.py tabulate --reference /u/scratch/f/flay/workshop6/ genome/hg19_rcrschrm.fa --threads 5 --prefix N25_bissnp -- dbsnp /u/scratch/f/flay/workshop6/genome/dbsnp/ dbsnp_135.hg19.sort.vcf --bissnp /u/scratch/f/flay/workshop6/ tools/bis-snp/bissnp-0.87.jar --context CG N25.bam

Methylation Calling using Bis-SNP

Methylation Calling using Bis-SNP: Output Files

Assessing Library Quality: Bisulfite Conversion Rate #open the summary file view N25_bissnp.meth.vcf.MethySummarizeList.txt

Quick Recap: Features and Variation of DNA Methylation Rakyan et al 2011

Identifying Differentially Methylated Regions (DMRs) Using metilene

metilene Workflow Juhling et al, 2015

Identifying Differentially Methylated Regions (DMRs) Using metilene Installation ü Installing software ü Adding to environment ($PATH) Preparing bwa-meth output files for metilene ü Convert bed to bedgraph ü Generate metilene input files DMR analysis ü Running metilene ü Filtering

Identifying Differentially Methylated Regions (DMRs) Using metilene Installation ü Installing software ü Adding to environment ($PATH) Preparing bwa-meth output files for metilene ü Convert bed to bedgraph ü Generate metilene input files DMR analysis ü Running metilene ü Filtering

Install metilene #change directory to where metilene will be installed mkdir $HOME/software cd $HOME/software #download the latest version of metilene from http://www.bioinf.uni-leipzig.de/software/metilene/ wget http://www.bioinf.uni-leipzig.de/software/metilene/ metilene_v02-5.tar.gz

Install metilene #unpack file tar xvzf metilene_v02-5.tar.gz #enter the directory cd metilene_v0.2-5 #always read the notes and instructions before doing anything view README #to install make

Install metilene #add the directory containing metilene into your $PATH by editing your ~/.bash_profile view ~/.bash_profile #to insert, type i and copy and paste the directory i #to save changes :wq! source ~/.bash_profile

Install metilene

Identifying Differentially Methylated Regions (DMRs) Using metilene Installation ü Installing software ü Adding to environment ($PATH) Preparing bwa-meth output files for metilene ü Convert bed to bedgraph ü Generate metilene input files DMR analysis ü Running metilene ü Filtering

Formatting Files To Input into metilene #Convert.bed file to.bedgraph sed '1d' filename.bed awk '{print $1 "\t" $2 "\t" $3 "\t" $4} > filename.bedgraph Remove first line from file Print the first four columns

Formatting Files To Input into metilene #load dependency module module load bedtools/2.23.0 #sort all.bedgraph files sortbed -i filename.bedgraph > filename_sort.bedgraph #Generate input files perl /u/home/f/flay/software/metilene/metilene_input.pl -in1 file1.bedgraph file2.bedgraph -in2 name1.bedgraph name2.bedgraph -h1 file -h2 name -o metilene_file_name.input

Formatting Files To Input into metilene #Today s input files are already in /u/scratch/f/flay/workshop6/data/metilene/metilene_bl_fl.input #To see what the input file look like: head metilene_bl_fl.input chr pos file_1 file_2 file_... name_1 name_2 name_... Group 1 with n samples Group 2 with n samples

Identifying Differentially Methylated Regions (DMRs) Using metilene Installation ü Installing software ü Adding to environment ($PATH) Preparing bwa-meth output files for metilene ü Convert bed to bedgraph ü Generate metilene input files DMR analysis ü Running metilene ü Filtering

metilene Options

Running metilene: Identify DMRs #Run metilene on default mode: metilene -a BL -b FL metilene_bl_fl.input sort -V -k1,1 -k2,2n > metilene_bl_fl.output chr start stop q- value Mean meth. diff. #CpG p(mwu) p(2d KS) Mean methylati on group 1 (-a) Mean methyl ation group 2 (-b) #How many DMRs were identified at this stage? wc -l metilene_bl_fl.output 1101

Running metilene: Filter DMRs #Let s look at our DMRs more closely: module load R perl /u/home/f/flay/software/metilene_v0.2-5/metilene_output.pl -q metilene_bl_fl.output -o metilene_bl_fl.filter -a BL -b FL

Running metilene: Output

DMR Statistics

DMR Statistics

DMR Statistics

Visualization of DNA Methylation: IGV #download IGV Browser to your desktop #IGV already contains built-in genomes for some common ones

IGV Browser

IGV Browser

IGV Browser

Visualization File Formats #For visualization on IGV, use bedgraph or bigwig files. #convert.bed file output of bwa-meth/bis-snp to.bedgraph #We did this already when preparing files for metilene sed '1d' filename.bed awk '{print $1 "\t" $2 "\t" $3 "\t" $4} > filename.bedgraph

Visualization File Formats #Use UCSC Tools to convert between bedgraph to wig and bigwig formats. http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/ #Tools are also available through Galaxy

UCSC Tools #To install these tools, get the download link of the tool of interest: #For example, in your installation directory: wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedcoverage #Make a specific file executable: chmod +x bedcoverage #or make all files in the directory executable: chmod a+x #Add installation directory to $PATH for easy access or specify full path/directory when calling the executable script #For the tools we need today, they are already in: /u/scratch/f/flay/workshop6/tools/ucsc/

UCSC Tools:bedGraphToBigWig # Convert bedgraph to bigwig file #Input bedgraph must be sorted bedgraphtobigwig filename_sort.bedgraph /u/scratch/f/ flay/workshop6/tools/ucsc/hg19.chrom.sizes filename.bw

IGV Browser

IGV Browser

Make Heatmap #Use R to make heatmap R is covered in Workshop 3 module load R #call R by typing R in command line #load R packages when R is booted: library( RColorBrewer ) library( gplots ) #to read the methylation data into R met = read.delim("metilene_bl_fl.input", sep="\t") #make a data matrix file met.ma = as.matrix(met[,3:20])

Make Heatmap #calculate variance across all samples in each CG site. met.ma.var = apply(met.ma,1,var) #extract the top 1% most variable CG sites in the dataset and write as a new matrix met.var = met.ma[met.ma.var>quantile(met.ma.var,0.99),] #perform clustering of data matrix met.hclust=hclust(dist(met.var)) #Generate a heatmap pdf( heatmap.pdf ) heatmap.2(met.var,col=colorramppalette(c("blue","yellow")), Colv = "FALSE", dendrogram= "row", Rowv=as.dendrogram(met.hclust)) dev.off()

Make Heatmap #download pdf on to your desktop scp -r flay@hoffman2.idre.ucla.edu:/u/scratch/f/flay/workshop6/data/raw/heatmap.pdf ~/Desktop

Make Heatmap #To alter plotting parameters, check the various arguments for the function by typing:?heatmap.2

Smooth Scatter Density Plot #Use a built-in function smoothscatter pdf("smoothscatter.pdf") smoothscatter(met.ma[,1], met.ma[,11], colramp=colorramppalette(brewer.pal(8,"ylgnbu"))) dev.off()

Colors in R Plot #brewer.pal is a function of RColorBrewer package #For more brewer.pal color options: http://www.personal.psu.edu/cab38/colorbrewer/ ColorBrewer.html #Other color options: https://www.nceas.ucsb.edu/~frazier/rspatialguides/ colorpalettecheatsheet.pdf

Other Commonly Used WGBS Tools Aligners and Methylation Callers: BSMAP v2.9 https://code.google.com/archive/p/bsmap/ BS-Seeker2 http://pellegrini.mcdb.ucla.edu/bs_seeker2/ Bis-mark http://www.bioinformatics.babraham.ac.uk/projects/bismark/ DMR Callers: bsseq http://bioconductor.org/packages/release/bioc/html/bsseq.html MOABS https://code.google.com/archive/p/moabs/