Chromatin signature discovery via histone modification profile alignments Jianrong Wang, Victoria V. Lunyak and I. King Jordan

Similar documents
Supplementary Material. Cell type-specific termination of transcription by transposable element sequences

Easy visualization of the read coverage using the CoverageView package

You will be re-directed to the following result page.

ChromHMM: automating chromatin-state discovery and characterization

Protocol: peak-calling for ChIP-seq data / segmentation analysis for histone modification data

Analyzing ChIP- Seq Data in Galaxy

W ASHU E PI G ENOME B ROWSER

User's guide to ChIP-Seq applications: command-line usage and option summary

From genomic regions to biology

panda Documentation Release 1.0 Daniel Vera

PICS: Probabilistic Inference for ChIP-Seq

ChIP-seq practical: peak detection and peak annotation. Mali Salmon-Divon Remco Loos Myrto Kostadima

ChIP-seq hands-on practical using Galaxy

SICER 1.1. If you use SICER to analyze your data in a published work, please cite the above paper in the main text of your publication.

The list of data and results files that have been used for the analysis can be found at:

m6aviewer Version Documentation

How To: Run the ENCODE histone ChIP- seq analysis pipeline on DNAnexus

Package ChIC. R topics documented: May 4, Title Quality Control Pipeline for ChIP-Seq Data Version Author Carmen Maria Livi

Genomic Analysis with Genome Browsers.

Goldmine: Integrating information to place sets of genomic ranges into biological contexts Jeff Bhasin

Table of contents Genomatix AG 1

epigenomegateway.wustl.edu

W ASHU E PI G ENOME B ROWSER

The ChIP-seq quality Control package ChIC: A short introduction

A short Introduction to UCSC Genome Browser

HIPPIE User Manual. (v0.0.2-beta, 2015/4/26, Yih-Chii Hwang, yihhwang [at] mail.med.upenn.edu)

Discovering Geometric Patterns in Genomic Data

Package VSE. March 21, 2016

Tutorial: Jump Start on the Human Epigenome Browser at Washington University

Advanced UCSC Browser Functions

ChromHMM Version 1.14 User Manual Jason Ernst and Manolis Kellis

ChIP-seq hands-on practical using Galaxy

CLC Server. End User USER MANUAL

BIOINFORMATICS ORIGINAL PAPER

Chip-seq data analysis: from quality check to motif discovery and more Lausanne, 4-8 April 2016

ChIP-Seq Tutorial on Galaxy

Analysis of ChIP-seq data

Hidden Markov Model II

Package SVM2CRM. R topics documented: May 1, Type Package

Triform: peak finding in ChIP-Seq enrichment profiles for transcription factors

ChIP-seq Analysis. BaRC Hot Topics - March 21 st 2017 Bioinformatics and Research Computing Whitehead Institute.

Welcome to GenomeView 101!

Package BAC. R topics documented: November 30, 2018

Background and Strategy. Smitha, Adrian, Devin, Jeff, Ali, Sanjeev, Karthikeyan

Differential Expression Analysis at PATRIC

Analysis of ChIP-seq Data with mosaics Package: MOSAiCS and MOSAiCS-HMM

User Manual of ChIP-BIT 2.0

ChIP-seq (NGS) Data Formats

Nature Biotechnology: doi: /nbt Supplementary Figure 1

Package FunciSNP. November 16, 2018

Finding and Exporting Data. BioMart

WASHU EPIGENOME BROWSER 2018 epigenomegateway.wustl.edu

Useful software utilities for computational genomics. Shamith Samarajiwa CRUK Autumn School in Bioinformatics September 2017

Agilent Genomic Workbench 7.0

Tutorial. RNA-Seq Analysis of Breast Cancer Data. Sample to Insight. November 21, 2017

Agilent Genomic Workbench Lite Edition 6.5

The UCSC Gene Sorter, Table Browser & Custom Tracks

Package polyphemus. February 15, 2013

ChIP-seq Analysis Practical

Expander 7.2 Online Documentation

Browser Exercises - I. Alignments and Comparative genomics

Agilent Genomic Workbench 6.5

Tiling Assembly for Annotation-independent Novel Gene Discovery

Exploring long-range genome interaction data using the WashU Epigenome Browser

Package mosaics. April 23, 2016

Sequence Analysis Pipeline

Visualisation, transformations and arithmetic operations for grouped genomic intervals

Analysis of ChIP-seq Data with mosaics Package

Release Note. Agilent Genomic Workbench Standard Edition

User Guide for Tn-seq analysis software (TSAS) by

Package ChIPseeker. July 12, 2018

EPITENSOR MANUAL. Version 0.9. Yun Zhu

BCRANK: predicting binding site consensus from ranked DNA sequences

HymenopteraMine Documentation

Practical 4: ChIP-seq Peak calling

Rsubread package: high-performance read alignment, quantification and mutation discovery

Sequence Alignment. GBIO0002 Archana Bhardwaj University of Liege

srna Detection Results

Part 1: How to use IGV to visualize variants

Order Preserving Triclustering Algorithm. (Version1.0)

Workshop 6: DNA Methylation Analysis using Bisulfite Sequencing. Fides D Lay UCLA QCB Fellow

Analyzing Genomic Data with NOJAH

Rsubread package: high-performance read alignment, quantification and mutation discovery

ROTS: Reproducibility Optimized Test Statistic

Demo 1: Free text search of ENCODE data

Agilent Genomic Workbench 7.0

CisGenome User s Manual

Working with ChIP-Seq Data in R/Bioconductor

Click on "+" button Select your VCF data files (see #Input Formats->1 above) Remove file from files list:

PROMO 2017a - Tutorial

Gegenees genome format...7. Gegenees comparisons...8 Creating a fragmented all-all comparison...9 The alignment The analysis...

Demo 1: Free text search of ENCODE data

nanoraw Documentation

ChIP-seq Analysis. BaRC Hot Topics - Feb 23 th 2016 Bioinformatics and Research Computing Whitehead Institute.

Package GSCA. October 12, 2016

Short Read Alignment Algorithms

Package GSCA. April 12, 2018

QIAseq Targeted RNAscan Panel Analysis Plugin USER MANUAL

Intervene Documentation

SEEK User Manual. Introduction

Transcription:

Supplementary Information for: Chromatin signature discovery via histone modification profile alignments Jianrong Wang, Victoria V. Lunyak and I. King Jordan Contents Instructions for installing and running the ChAT software 2-4 Supplementary Table S1 5 Supplementary Figure S1 6 Supplementary Table S2 7 Supplementary Figure S2 8 Supplementary Figure S3 9 Supplementary Figure S4 10 1

Instructions for installing and running the ChAT software 1. Preparation: In order to run ChAT, you need to: 1) Download the compressed folder ChAT_package.tar.gz from http://jordan.biology.gatech.edu/page/software/chat; 2) Decompress the folder. There are three files within the created folder: A) ChAT, B) clustering.r and C) cluster_figure.r. 3) Make sure these files are always kept in the same folder. 4) Make sure R program is already installed on your computer. 5) Check the shebang line of the ChAT file and correct it by the path of env of your computer. 6) Add the directory of the folder ChAT_package into the PATH. 2. Command-line of ChAT: The command line for ChAT is: $ ChAT -i [the directory where the set of input files located] -o [the output directory where both the final and intermediate files located] -m [the file of the list of histone modifications] -d [the file of the list of critical histone modifications used for initial grouping] -c [the file of the list of chromosomes] -p [p-value threshold to cut the hierarchical tree] -b [bin size] The detailed explanations of the parameters can be found in Section 4. One example of running ChAT is: $ ChAT -i /home/cd4_sample_data -o sample_pattern -m mark_name.txt -d critical_mark.txt -c chromosome.txt -p 0.05 -b 200 This command takes the Wiggle format histone modification files (must be named as *.wig) located in /home/cd4_sample_data folder as the inputs and create the directory sample_pattern to store all the final and intermediate results. mark_name.txt contains the list of histone modifications (each row has a histone modification name) that are consistent with the file names in the input directory. critical_mark.txt contains a subset of histone modifications used for initial grouping. chromosome.txt contains a list of chromosomes (each row has a chromosome name that are consistent with the chromosome names in the wiggle format input files) under consideration. The p-value threshold used to cut the hierarchical tree is set as 0.05. The bin size is set as 200bp using -b. 2

3. File Format: (A) Input files Corresponding to each individual histone modification, there is a Wiggle format file of the ChIP-seq data. All of the files need to be named as histone_mark_name.wig. For example,.wig for. All the files must be stored in the same directory. The name of the directory is the most important parameter for ChAT. A file of the list of all the histone modifications under consideration need to be provided. Each row has the name of a histone modification. A file of the list of critical histone modifications for initial grouping need to be provided. Those modifications are important marks based on a priori biological knowledge. This list must be a subset of the modifications under consideration. Each row has the name of a critical histone modification. A file of the list of all the chromosomes under consideration need to be provided. Each row has the name of a chromosome. The names need to be consistent with the chromosome names in the input wiggle format files. (B) Output files All the output files are stored in the created folder specified by -o. The most important final results are saved in 2 folders. The BED format tracks of genomic locations sharing specific combinatorial chromatin signatures are stored in BED_tracks. The average histone modification profiles of each signature and the corresponding enrichment curves in PDF files are stored in Signature_info. 4. Parameters: -i: The directory where all the wiggle format input files (one file for each histone modification) are located. The wiggle format files must be names as *.wig. -o: The output directory where all the final and intermediate results are stored. -m: The file with the list of histone modifications under consideration. Each row has the name of one histone modification. They need to be consistent with the name of the wiggle format input files. -d: The file with the list of critical histone modifications used for initial grouping. Each row has the name of one histone modification. 3

-c: The file with the list of chromosomes under consideration. Each row has the name of one chromosome. They need to be consistent with the chromosome names in the wiggle format input files. -p: The p-value threshold used to cut the hierarchical tree, default value: 0.05. -b: The size of bin, default value 200 (bp). -h,-help: Display brief explanations of parameters. 5. Computational performance: ChAT is tested on a Ubuntu Linux server (with memory 8 Gb) to identify combinatorial signatures based on the ChIP-seq datasets of 14 histone methylations on human chromosome 2, and it takes 25.5 minutes to produce the combinatorial chromatin signatures. 4

Supplementary Table S1: Overview of the algorithmic features of ChAT and related software. Free of sizerestriction No use of motif seeds Multi-modal signatures Deal with flexible histone modification distributions Classify distinct shapes Intrinsic statistical criterion ChromaSig CoSBI ChromHMM Segway ChAT x x x x x x x x x x x x x x x x 5

Supplementary Figure S1: Algorithm performance comparison. A set of histone methylation ChIP-seq datasets on human chromosome 2 are used to test four specific algorithmic features of ChAT, ChromaSig and CoSBI. The three softwares identify similar chromatin signatures for a standard mono-modal pattern (A). For a bi-modal pattern identified by ChAT, ChromaSig and CoSBI only found mono-modal signatures (B). For a set of genomic locations enriched with the same set of histone modifications, ChAT discriminate two patterns with distinct enrichment shapes (C). ChAT identified a complex large-sized signature (~82 kb) for a set of genomic locations, while ChromaSig found a number of small-sized signatures as parts of the large signature (D). A mono-modal small signature B 0.0 0.2 0.4 0.6 0.8 H3K4me2 ChAT ChromaSig CoSBI 0 2400 0.00 0.05 0.10 0.15 H3K4me2 0 5 10 15 20 25 30 0 5 10 15 20 25 H3K4me2 bi-modal small signature 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0 4600 0.00 0.05 0.10 0.15 0 1 2 3 0.5 1.5 2.5 3.5 C small signatures with distinct shapes D complex large signature 0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.8 0 4400 0 4800 H3K79me2 0 81600 0.00 0.10 0.20 0.30 0.0 0.2 0.4 0.6 H3K79me2 0.0 0.1 0.2 0.3 0.0 0.1 0.2 0.3 H3K79me2 0.1 0.3 0.5 0.00 0.05 0.10 0.15 H3K79me2 H3K79me2 0 2 4 6 8 0 5 10 15 6

Supplementary Table S2: Enrichments of small-sized combinatorial histone modification patterns with functional genomic features. The numbers and fractions of combinatorial histone modification patterns enriched with specific genomic features are summarized. Genomic Features No. patterns enriched with FE a >3 No. patterns enriched with FE>5 No. patterns enriched with FE>8 TSS b 36 (25.0%) 21 (14.6%) 8 (5.6%) TTS c 9 (6.3%) 0 0 p300 d 18 (12.5%) 16 (11.1%) 12 (8.3%) DNase I e 60 (41.7%) 51 (35.4%) 40 (27.8%) CNE f 144 (100.0%) 142 (98.6%) 137 (95.1%) a FE: ratios of the fractions of patterns overlapping with the specific features over the genomic fractions of the corresponding features. b TSS: transcription start site. c TTS: transcription termination site. d p300: binding sites of p300. e DNase I: DNase I hypersensitive sites. f CNE: Conserved non-coding elements predicted based on sequence alignments of 28 vertebrate species (data downloaded from UCSC genome browser). 7

Supplementary Figure S2: Histone modification profiles for -H3K9me3 bivalent pattern. Genomic locations with this specific bivalent pattern are aligned and levels of and H3K9me3 are shown as heatmaps on the left (yellow - higher levels, blue - lower levels). The average profiles of histone modifications of this pattern are shown on the right. 0.40 0.55 0.70 4.4 kb H3K9me3 0.30 0.40 0.55 H3K9me3 4.4 kb 4.4 kb 8

Supplementary Figure S3: Average histone modification profiles for the large pattern example A. Each curve shows the average profile of a specific histone modification of genomic locations with the same pattern. H2AZ H3K27ac H3K36ac 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.1 0.2 0.3 0.4 0.5 0.1 0.2 0.3 0.4 0.5 0.1 0.2 0.3 0.4 0.3 0.4 0.5 0.6 0.2 0.3 0.4 0.5 0.6 0.3 0.4 0.5 0.6 0.20 0.25 0.30 0.35 0.40 0.45 H3K79me1 H3K79me3 H4K20me1 0.15 0.20 0.25 0.30 0.35 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.2 0.3 0.4 0.5 0.6 0.7 0.3 0.4 0.5 0.6 0.7 9

Supplementary Figure S4: Average histone modification profiles for the large pattern example B. Each curve shows the average profile of a specific histone modification of genomic locations with the same pattern. H2AZ H3K27ac H3K36ac 0.00 0.05 0.10 0.15 0.10 0.15 0.20 0.25 0.30 0.10 0.15 0.20 0.25 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.3 0.4 0.5 0.6 0.4 0.5 0.6 0.7 0.25 0.30 0.35 0.40 0.45 0.50 H3K79me1 H3K79me3 H4K20me1 0.15 0.20 0.25 0.20 0.25 0.30 0.35 0.40 0.45 0.3 0.4 0.5 0.6 0.3 0.4 0.5 0.6 0.7 10