Nature Biotechnology: doi: /nbt Supplementary Figure 1

Similar documents
TECH NOTE Improving the Sensitivity of Ultra Low Input mrna Seq

Easy visualization of the read coverage using the CoverageView package

ChIP-seq hands-on practical using Galaxy

Supplementary Material. Cell type-specific termination of transcription by transposable element sequences

You will be re-directed to the following result page.

Useful software utilities for computational genomics. Shamith Samarajiwa CRUK Autumn School in Bioinformatics September 2017

ChIP-seq Analysis. BaRC Hot Topics - Feb 23 th 2016 Bioinformatics and Research Computing Whitehead Institute.

m6aviewer Version Documentation

ChIP-seq Analysis. BaRC Hot Topics - March 21 st 2017 Bioinformatics and Research Computing Whitehead Institute.

HIPPIE User Manual. (v0.0.2-beta, 2015/4/26, Yih-Chii Hwang, yihhwang [at] mail.med.upenn.edu)

Tutorial 1: Exploring the UCSC Genome Browser

Analyzing ChIP- Seq Data in Galaxy

Table of contents Genomatix AG 1

Analysis of ChIP-seq data

Browser Exercises - I. Alignments and Comparative genomics

SPAR outputs and report page

Supplementary Figure 1. Fast read-mapping algorithm of BrowserGenome.

Functional Genomics Research Stream. Computational Meeting: March 29, 2012 RNA-seq Analysis Pipeline

epigenomegateway.wustl.edu

NGS NEXT GENERATION SEQUENCING

ChIP-seq hands-on practical using Galaxy

QIAseq DNA V3 Panel Analysis Plugin USER MANUAL

The ChIP-seq quality Control package ChIC: A short introduction

From genomic regions to biology

W ASHU E PI G ENOME B ROWSER

QIAseq Targeted RNAscan Panel Analysis Plugin USER MANUAL

Tn-seq Explorer 1.2. User guide

Goldmine: Integrating information to place sets of genomic ranges into biological contexts Jeff Bhasin

ChIP-Seq Tutorial on Galaxy

W ASHU E PI G ENOME B ROWSER

High-throughout sequencing and using short-read aligners. Simon Anders

Genome Environment Browser (GEB) user guide

User's guide to ChIP-Seq applications: command-line usage and option summary

Tutorial for Windows and Macintosh. Trimming Sequence Gene Codes Corporation

Topics of the talk. Biodatabases. Data types. Some sequence terminology...

Protocol: peak-calling for ChIP-seq data / segmentation analysis for histone modification data

Advanced UCSC Browser Functions

Genome Browsers - The UCSC Genome Browser

Peter Schweitzer, Director, DNA Sequencing and Genotyping Lab

High-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg

Practical 4: ChIP-seq Peak calling

MSCBIO 2070/02-710: Computational Genomics, Spring A4: spline, HMM, clustering, time-series data analysis, RNA-folding

Chromatin signature discovery via histone modification profile alignments Jianrong Wang, Victoria V. Lunyak and I. King Jordan

Genomic Analysis with Genome Browsers.

ChromHMM: automating chromatin-state discovery and characterization

Tutorial: RNA-Seq analysis part I: Getting started

Getting Started. April Strand Life Sciences, Inc All rights reserved.

ChIP-seq practical: peak detection and peak annotation. Mali Salmon-Divon Remco Loos Myrto Kostadima

Nature Publishing Group

ChIP- seq Analysis. BaRC Hot Topics - Feb 24 th 2015 BioinformaBcs and Research CompuBng Whitehead InsBtute. hgp://barc.wi.mit.

BCRANK: predicting binding site consensus from ranked DNA sequences

Supplementary information: Detection of differentially expressed segments in tiling array data

WASHU EPIGENOME BROWSER 2018 epigenomegateway.wustl.edu

User Manual of ChIP-BIT 2.0

DNA Inspired Bi-directional Lempel-Ziv-like Compression Algorithms

Exploring long-range genome interaction data using the WashU Epigenome Browser

CLC Server. End User USER MANUAL

How To: Run the ENCODE histone ChIP- seq analysis pipeline on DNAnexus

Genome Assembly and De Novo RNAseq

Mapping RNA sequence data (Part 1: using pathogen portal s RNAseq pipeline) Exercise 6

Identify differential APA usage from RNA-seq alignments

Tutorial: RNA-Seq Analysis Part II (Tracks): Non-Specific Matches, Mapping Modes and Expression measures

LIMS User Guide 1. Content

/ Computational Genomics. Normalization

Genomics - Problem Set 2 Part 1 due Friday, 1/25/2019 by 9:00am Part 2 due Friday, 2/1/2019 by 9:00am

UCSC Genome Browser Pittsburgh Workshop -- Practical Exercises

High-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg

ChIP-seq Analysis Practical

Documentation of S MART

RNA-seq. Manpreet S. Katari

Exercise 2: Browser-Based Annotation and RNA-Seq Data

Exercises. Biological Data Analysis Using InterMine workshop exercises with answers

Using Manhattan distance and standard deviation for expressed sequence tag clustering. Dane Kennedy Supervisor: Scott Hazelhurst

Download and Register SnapGene 7. Generate an with a Download Link 10. Unregister the Computer You Are Using 12

Design and Annotation Files

ABSTRACT. WIMBERLEY, CHARLES ERIC. PeakPass: A Machine Learning Approach for ChIP-Seq Blacklisting. (Under the direction of Dr. Steffen Heber.

Eval: A Gene Set Comparison System

Tn-seq Explorer 1.3. User guide

Genomics - Problem Set 2 Part 1 due Friday, 1/26/2018 by 9:00am Part 2 due Friday, 2/2/2018 by 9:00am

Discovery Net : A UK e-science Pilot Project for Grid-based Knowledge Discovery Services. Patrick Wendel Imperial College, London

Package ChIC. R topics documented: May 4, Title Quality Control Pipeline for ChIP-Seq Data Version Author Carmen Maria Livi

Introduction to the TSSi package: Identification of Transcription Start Sites

PICS: Probabilistic Inference for ChIP-Seq

Colorado State University Bioinformatics Algorithms Assignment 6: Analysis of High- Throughput Biological Data Hamidreza Chitsaz, Ali Sharifi- Zarchi

Circ-Seq User Guide. A comprehensive bioinformatics workflow for circular RNA detection from transcriptome sequencing data

Triform: peak finding in ChIP-Seq enrichment profiles for transcription factors

GeneSpring Tutorial version 3.5

CAP BIOINFORMATICS Su-Shing Chen CISE. 8/19/2005 Su-Shing Chen, CISE 1

Agilent Genomic Workbench Lite Edition 6.5

SUPPLEMENTARY INFORMATION

BIOINFORMATICS ORIGINAL PAPER

Genome Browsers Guide

Services Performed. The following checklist confirms the steps of the RNA-Seq Service that were performed on your samples.

Identiyfing splice junctions from RNA-Seq data

Data Processing and Analysis in Systems Medicine. Milena Kraus Data Management for Digital Health Summer 2017

4.1. Access the internet and log on to the UCSC Genome Bioinformatics Web Page (Figure 1-

Tutorial: Jump Start on the Human Epigenome Browser at Washington University

The Kodon quickguide

Sequence Preprocessing: A perspective

When we search a nucleic acid databases, there is no need for you to carry out your own six frame translation. Mascot always performs a 6 frame

Transcription:

Supplementary Figure 1 Detailed schematic representation of SuRE methodology. See Methods for detailed description. a. Size-selected and A-tailed random fragments ( queries ) of the human genome are inserted in bulk into barcoded T-overhang plasmids by ligation. BC, barcode; ORF, open reading frame; PAS, polyadenylation signal. b. The library is digested by endonuclease I-CeuI so that the barcode with the query sequence is released. This is then self-ligated and again digested with a frequent cutter restriction enzyme to reduce the insert size. After another self-ligation the circle is linearized, PCR amplified and subjected to high-throughput sequencing. c. Per biological replicate ~100 million cells are transfected. Those plasmids that contain promoter activity in the direction of the barcode will transcribe the barcode into RNA. Cells are harvested after 24 hours, RNA is extracted, polya purified, reverse transcribed, PCR amplified and subjected to high-throughput sequencing. By normalization to estimated barcode frequencies in the SuRE plasmid library a genome-wide SuRE expression profile is generated.

Supplementary Figure 2 SuRE genome coverage, reproducibility and peaks. a. Coverage of the human genome by unique elements in the SuRE library. b. Distribution (fold enrichment) of SuRE peaks among the 25 types of chromatin 1. c. Correlation of SuRE enrichment between biological replicates at TSSs. d. Correlation between CAGE 1 and SuRE at the TSSs. e. Same as Fig. 1e but with Histone genes indicated in red. Correlation between relative promoter autonomy (log 10 (SuRE/GRO-cap)) and tissue specificity (number of cell types and tissues in which each TSS is active, out of 889 tested 2 ). Grey line shows linear fit. f. Correlation between relative promoter autonomy and the total number of promoters (ENCODE chromatin type Tss ) that are found in a fixed window of 5-50 kb from the TSS. g. Size distribution of genomic fragments in the SuRE library. h. Number of reads (per individual replicate) of barcodes in cdna. Only barcodes linked to a unique genomic fragment were counted. i. Venn diagram representing the overlap between the summits of SuRE peaks as called by the MACS algorithm 3 and ENCODE-annotated promoters ( Tss ) and enhancers ( Enh and EnhW combined) 1. Because >1 peak summit can overlap a ENCODE annotation, overlaps are given for each direction of the comparison in the color of the annotation. j. Relative SuRE expression (SuRE/GRO-cap) of SuRE fragments for which the 3 ends either in an intron (black) or an exon (red). Expression is normalized to GRO-cap to avoid systematic biases resulting from possible correlations between gene structure and expression level. A LOESS curve was separately fit to the logratios for all exon- and intron-terminal fragments using the distance each fragment ended downstream of the corresponding TSS, then predicted ratios were normalized to a

maximum of 1. 1. Encode Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012). 2. FANTOM Consortium. A promoter-level mammalian expression atlas. Nature 507, 462-470 (2014). 3. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137 (2008).

Supplementary Figure 3 Focused BAC library. a. Correlation between biological replicates for the focused SuRE library. Data is shown for all TSSs within in the BAC library. b. Correlation between SuRE enrichment obtained with the genome-wide library (x-axis) and the focused library (y-axis) for all peaks overlapping the BAC library. c. Same as (b) but for all TSSs in the BAC library. d. Correlation between SuRE enrichment obtained with the genome-wide library (x-axis) and a conventional reporter assay (y-axis) for 23 promoters. Grey line shows linear fit. e. Correlation between pre-transfection read-counts and post-transfection read-counts for all TSSs in the BAC library.

Supplementary Figure 4 Run-on transcription around LTR12C elements, antisense. Average PRO-seq run-on transcription activity 4 around LTR12C elements as in Fig. 5e, but in antisense orientation. 4. Core, L.J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat Genet 46, 1311-1320 (2014).

Supplementary Figure 5 Chromatin marks associated to unannotated SuRE peaks. a. Mean enrichment for 4 chromatin marks centered on the summit of unannotated SuRE peaks, i.e. peaks that did not overlap ENCODE annotated promoters or enhancers ( Tss or Enh chromatin state) or repetitive elements of the ERV1 or ERVL-MaLR family. b. Same as (a) but for SuRE peaks that overlapped encode annotated promoters. c. Mean SuRE enrichment for all peaks overlapping ENCODE annotated promoters (green) and unannotated SuRE peaks. d. Same as (c) but for mean GRO-cap signal.

Supplementary Figure 6 Envisioned SuRE methodology for enhancer detection. a. Current SuRE reporter construct for promoter detection. b. Envisioned reporter construct for enhancer detection. Query: genomic fragment, BC: barcode, ORF: open reading frame, PAS: polyadenylation signal, mpr: minimal promoter.

Supplementary Table 1. hg19 coordinates and references for Bacterial Artificial Chromosomes chr start end BAC reference chr16 146,255 277,801 CTD-2100B21 chr4 74,192,088 74,388,711 CTD-2610J6 chr5 139,961,667 140,166,117 CTD-3252A18 chr1 109,571,381 109,687,521 CTD- 3156P24 chr6 26,115,655 26,242,415 CTD-2153L18 chr1 155,087,497 155,235,687 CTD-3075C4 chr2 11,628,199 11,749,760 CTD-2079O10 chr17 46,595,129 46,735,928 CTD-2508F13 chr2 176,924,832 177,067,161 CTD-2521G12

Supplementary Table 2. Oligo nucleotide sequences name description synthesis type sequence 256JvA Forward primer for usage on barcoding template standard 254JvA barcoding template ultramer TGTGATGGTTGGCCAACCTTGGAATTCCGGAAGGGATCTGGTTAACCTTGG AACC AAGGGATCTGGTTAACCTTGGAACCTTGGCCAACGTACGACTGGAGATCG GAAGAGCACACGTCTGAACTCCAGTCACTAGGGATAACAGGGTAATACACT CTTTCCCTACACGACGCTCTTCCGATCT 264JvA Reverse primer for usage on barcoding template (containing barcode) standard TTGGTTCCTAGG(N)(N)(N)(N)(N)(N)(N)(N)(N)(N)(N)(N)(N)(N)(N)(N)(N)(N)(N)(N )AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT 247JvA First strand primer of reverse transcription standard CCTCTCCGCCGCCCACCAGCTCGAACTCCAC 211JvA Reverse primer for cdna or plasmiddna; containing S2, index and P7 standard CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTG TGCTCTTCCGATCTGGTGATGCGGCACTCGATCTTCATGGC 117JvA Reverse primer for ipcr; containing S2 and P7 standard CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTTCAGACGTG TGCTCTTCCGATC AR151 Forward primer for cdna, plasmiddna or ipcr; containing S1 and P5 standard AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTT CCGATCT 383JvA TBP FW standard ACTTCGTGCCCGAAACGC 384JvA TBP FW standard ATCCTCATGATTACCGCAGCAAAC 419JvA Alpha globin 1 FW standard CTCGGTGGCCATGCTTCTT 420JvA Alpha globin 1 RV standard GCCGCCCACTCAGACTTTAT

421JvA Alpha globin 2 FW standard TCAAGCTCCTAAGCCACTGC 422JvA Alpha globin 2 RV standard CAGGAGGAACGGCTACCGAG 431JvA Theta1 globin FW standard CCTGAGCCACGTTATCTCGG 432JvA Theta1 globin RV standard GGCTTTACTCAAACACGGGG 433JvA Zeta globin FW standard CTGAGCAGGCCCAACTCC 434JvA Zeta globin RV standard GATCTTGGCCCACATGGACA 446JvA BetaActin FW standard ACAGAGCCTCGCCTTTGCC 447JvA BetaActin RV standard GAGGATGCCTCTCTTGCTCTG AR68FW GFP FW standard AGGACAGCGTGATCTTCACC AR68RV GFP RV standard CTTGAAGTGCATGTGGCTGT 444JvA YFP FW standard GATCTGCACCACCGGCAAG 445JvA YFP RV standard GCTGCTTCATGTGGTCGGG PAIP2_FW confirmation set supplementary figure 3d standard TGTCTGAGTGCGGATGTTTGC

PAIP2_RV confirmation set supplementary figure 3d standard GGCAGCCAACGAATCCTGTC PARK7_FW confirmation set supplementary figure 3d standard TGTGCGCAGCACTGCTCTAGT PARK7_RV confirmation set supplementary figure 3d standard GCCCGTTGGGTACCACTCAC TIMM10B_FW confirmation set supplementary figure 3d standard GGTCTCCCCTCCTCCGTCTC TIMM10B_RV confirmation set supplementary figure 3d standard CGTCTGGCCTCGAAACGACT HIST1H2BD_FW confirmation set supplementary figure 3d standard CGAAGGGCTACATTTCAAGTGC HIST1H2BD_RV confirmation set supplementary figure 3d standard TCTTTGGGGCAGGAGCAGAC PKLR_FW confirmation set supplementary figure 3d standard TTTCCCTGGGGGTAGGAGTC PKLR_RV confirmation set supplementary figure 3d standard ACCGAAGCTGCAGGGATGAT ALG3_FW confirmation set supplementary figure 3d standard CCCCCAACGCTCAAACTCTG ALG3_RV confirmation set supplementary figure 3d standard TGGCAGTACAGCCGGAGGAT CNOT4_FW confirmation set supplementary figure 3d standard AATGCGCAAGGACAGGGAAA CNOT4_RV confirmation set supplementary figure 3d standard GCTTCAGCGAGTCCGACCTT

GLMN_FW confirmation set supplementary figure 3d standard TCTGGGGGAAGAGGGGAGTC GLMN_RV confirmation set supplementary figure 3d standard TCCACTTACCGGCCAGAACC ZNF669_FW confirmation set supplementary figure 3d standard CATCCCCAACCTTGGCAAAA ZNF669_RV confirmation set supplementary figure 3d standard CTCCGGCGAAGGAGAGACAA C9orf156_FW confirmation set supplementary figure 3d standard TTTCCCACCACCCAGGGATA C9orf156_RV confirmation set supplementary figure 3d standard CGCATGGCTACTGGTTGCTG POLR2J_FW confirmation set supplementary figure 3d standard TTGTCCCTCCCGGCTAACAA POLR2J_RV confirmation set supplementary figure 3d standard GCCCTCGAAGAGCAAGAACG RPL37_FW confirmation set supplementary figure 3d standard AAAGTCAGCGTCGGCCAAAA RPL37_RV confirmation set supplementary figure 3d standard CCCCAAGCACAGCAAACAGA TOMM7_FW confirmation set supplementary figure 3d standard TGTGCAGCCAGGGTTGAGAA TOMM7_RV confirmation set supplementary figure 3d standard CGGGAATCCGAAAGGGAAAG NHSL1_FW confirmation set supplementary figure 3d standard TGCTTTGGAACACACAATGCTG

NHSL1_RV confirmation set supplementary figure 3d standard TTCCCCCGGTCTCATATCCTT IL1R1_FW confirmation set supplementary figure 3d standard CGCCCCTGGTGTGTCAGGTA IL1R1_RV confirmation set supplementary figure 3d standard TGGGTGTCACCTCCCATTTTT PRKAG2_FW confirmation set supplementary figure 3d standard CTTGCTGGGAGGTGGGATTG PRKAG2_RV confirmation set supplementary figure 3d standard GGCAGCAGGTTCCAGATGTGT GANC_FW confirmation set supplementary figure 3d standard GCCTGGCCTGAGTCTTTTCTG GANC_RV confirmation set supplementary figure 3d standard TCAGGCCCAAACTAGCGTTTC DMKN_FW confirmation set supplementary figure 3d standard AGAATGGGGGCAGGACTGTG DMKN_RV confirmation set supplementary figure 3d standard TCCCCCTCTTTAGCCTGTTGG OSBPL6_FW confirmation set supplementary figure 3d standard AGCGCTGGAGCCGTTCTG OSBPL6_RV confirmation set supplementary figure 3d standard CCCAAGCAATCCCTTTGCAG TLR4_FW confirmation set supplementary figure 3d standard TGGTGGGCCCTAATCCAACA TLR4_RV confirmation set supplementary figure 3d standard GCGAGGCAGACATCATCCTG

SLC13A4_FW confirmation set supplementary figure 3d standard CTTTGCCAGGGAGGCAGCTA SLC13A4_RV confirmation set supplementary figure 3d standard GGGCCTGCAAAGCAGAAAAG CEP85L_FW confirmation set supplementary figure 3d standard CCACCCCAAATCCCACTGAA CEP85L_RV confirmation set supplementary figure 3d standard TGCTCCACAATTGGAGAAACAA MGST3_FW confirmation set supplementary figure 3d standard GCCAGCTCTCGGCAAAACTAA MGST3_RV confirmation set supplementary figure 3d standard CCTTCGAACAGCTGGAGCAGA