Nature Publishing Group


[Figure S, one panel per chromosome (I to XVI). Y axes: ratio ssDNA (S/G) for WT and for rad cells, three timed samples each; X axis: chromosome coordinate (kb).]

Figure S. Overlay of smoothed ssDNA profiles for S. cerevisiae. The time courses for both WT and rad strains are shown in series of increasing color intensity. WT cell profiles at the three time points (hours post release) are shown as light purple, magenta and dark purple curves, respectively; rad cell profiles at the three time points (hours post release) are shown as yellow, orange and red curves, respectively. WT cell profiles are plotted on the left Y axes and rad cell profiles are plotted on the right Y axes. Positions of Pro-ARSs are shown as green diamonds. Positions of clustered origins (those that appear in at least two of the timed samples) from ssDNA profiles of rad cells are shown as filled blue circles. Positions of singleton origins (those that appear in only one of the three timed samples) are shown as filled red circles.

[Figure S, one panel per chromosome (I to XVI). Left Y axis: ratio ssDNA (S/G), WT; right Y axis: copy number; X axis: chromosome coordinate (kb).]

Figure S. Comparison between S. cerevisiae Rad-unchecked origins and early origins identified by copy number change detection at 9 minutes in HU. The smoothed data of ratios of ssDNA (S/G) for WT cells at hour post release (green curves) are overlaid with changes in copy number of genomic DNA in the presence of HU (orange shaded curves). The ratios of ssDNA are plotted on the left Y axes and the copy number change is plotted on the right Y axes. The positions of all the clustered ssDNA peaks in rad cells representing all origins are indicated by filled blue circles; the peaks that occur in a single timed sample are indicated by filled red circles. The ssDNA peaks that meet the statistical criteria for local maxima in the WT profile are indicated by open diamonds: those that match the clustered ssDNA peaks in rad cells are shown in blue and those that match the singleton ssDNA peaks in rad cells are shown in red. The four ssDNA peaks that appear only in WT cells and are not identified as clustered or singleton ssDNA peaks in rad cells are shown as solid black diamonds.

[Figure S, panels for chromosomes I, II and III. Y axis: relative ratio ssDNA (S/G); X axis: chromosome coordinate (kb). Labels above the graphs mark origins (ORI, ARS, pARS) and AT-rich islands (AT).]

Figure S. Overlay of ssDNA profiles for S. pombe WT (purple) and cds (orange) cells. WT cell profiles are plotted on the left Y axes and cds cell profiles are plotted on the right Y axes. Positions of significant ssDNA peaks in WT and cds cells are indicated by purple and orange filled circles, respectively. Positions of AT-rich islands reported by Segurado et al. 9 are shown as green filled circles. Positions of previously mapped origins from Segurado et al. 9 and references therein are shown as red filled circles. Those previously mapped origins that have also been identified as significant ssDNA peaks in cds cells are labeled above the graphs. The two origins that eluded our analysis are boxed.

SUPPLEMENTARY INFORMATION

The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) and are accessible through GEO Series accession number GSE99.

Normalization

Our experiments utilize Agilent microarrays that contain,87 oligonucleotide probes, covering 6,6 S. cerevisiae ORFs with extensive replicates and Eurogentec microarrays that contain, PCR amplified probes covering,976 S. pombe ORFs with replicates. In our microarray experiments, we sought a quantitative measurement of the ratio of single-stranded DNA (ssDNA) between a time point N (denoted $\mathrm{TP}_N$) and a reference time point (denoted $\mathrm{TP}_0$, for time point zero). After collection of DNA for $\mathrm{TP}_N$ and $\mathrm{TP}_0$ and isolation of genomic DNA for time points N and 0, we differentially labeled DNA complementary to the single-stranded DNA, using Cy- and Cy- dyes, respectively. The labeled samples were hybridized to the microarray. Using GenePix. software (Axon), we converted the Cy- and Cy- fluorescence intensities from TIFF files into numerical intensity data for each labeled sample and extracted the background-subtracted median feature pixel intensity for each spot on the array. Averaging over duplicated features, we obtained $\mathrm{TP}_N$ Cy- signals (denoted $a_i^*$) and $\mathrm{TP}_0$ Cy- signals (denoted $b_i^*$), where i indexes the set of all yeast ORFs. We arrived at Equation (1):

observed ratio of ssDNA at ORF i = $a_i^*/b_i^*$, i indexing the ORFs.
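As an illustration of the averaging and ratio step just described, the following Python sketch groups spot intensities by ORF, averages the duplicated features, and forms the observed ratio for each ORF. The function name `observed_ratios`, the tuple layout of the input, and the choice to skip ORFs whose averaged reference signal is not positive are assumptions made for this example; they are not taken from the original analysis.

```python
from collections import defaultdict
from statistics import mean


def observed_ratios(features):
    """Average duplicated features per ORF and return a_i*/b_i* for each ORF.

    `features` is an iterable of (orf_id, tpn_signal, tp0_signal) tuples, one
    per spot, holding background-subtracted intensities (hypothetical layout).
    """
    tpn_by_orf = defaultdict(list)
    tp0_by_orf = defaultdict(list)
    for orf_id, tpn_signal, tp0_signal in features:
        tpn_by_orf[orf_id].append(tpn_signal)
        tp0_by_orf[orf_id].append(tp0_signal)

    ratios = {}
    for orf_id in tpn_by_orf:
        a_star = mean(tpn_by_orf[orf_id])  # averaged time-point-N signal
        b_star = mean(tp0_by_orf[orf_id])  # averaged time-point-zero signal
        # Assumption: ORFs with a non-positive reference signal are skipped,
        # because the ratio is undefined or meaningless for them.
        if b_star > 0:
            ratios[orf_id] = a_star / b_star
    return ratios


if __name__ == "__main__":
    spots = [
        ("ORF_A", 100.0, 50.0),
        ("ORF_A", 300.0, 150.0),  # duplicate feature for ORF_A
        ("ORF_B", 30.0, 60.0),
    ]
    print(observed_ratios(spots))
```

Keeping the two channels in separate per-ORF lists means the averaging happens before the division, which matches the order of operations in the text (average duplicated features first, then take the ratio).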

This calculation led to the chromosomal observed single-stranded ratio profile by plotting the data points $(c_j, a_j^*/b_j^*)$, where j runs over the ORFs for a given chromosome. Although informative, this profile is not quantitative: the TIFF files produced by core facilities arbitrarily adjust the gain in both Cy channels to achieve a roughly equal balance of total signal. To get around this problem, the data were normalized and made quantitative by applying the scheme laid out in Collingwood et al. (manuscript in preparation). The normalization uses an external measurement of ssDNA content in S vs. G (slot blot) to correct one of the Cy channels, so as to restore to the microarray data the S/G ratio seen in the external calibration. The key result asserts the existence of a computable constant g, such that Equation (2):

actual ratio of ssDNA at ORF i = $g\,a_i^*/b_i^*$.

The constant $g = n/m$, where n = (total $\mathrm{TP}_N$ ssDNA)/(total $\mathrm{TP}_0$ ssDNA) and m = (total Cy- signal)/(total Cy- signal). The constant m is directly computed from the array data. Because we isolated equal amounts of $\mathrm{TP}_N$ and $\mathrm{TP}_0$ DNA,

$$
\begin{aligned}
n &= \frac{\text{total } \mathrm{TP}_N \text{ ssDNA}}{\text{total } \mathrm{TP}_0 \text{ ssDNA}}
   = \frac{(\text{total } \mathrm{TP}_N \text{ ssDNA})/(\text{total } \mathrm{TP}_N \text{ DNA})}{(\text{total } \mathrm{TP}_0 \text{ ssDNA})/(\text{total } \mathrm{TP}_N \text{ DNA})} \\
  &= \frac{(\text{total } \mathrm{TP}_N \text{ ssDNA})/(\text{total } \mathrm{TP}_N \text{ DNA})}{(\text{total } \mathrm{TP}_0 \text{ ssDNA})/(\text{total } \mathrm{TP}_0 \text{ DNA})}
   = \frac{\%\ \mathrm{TP}_N \text{ ssDNA}}{\%\ \mathrm{TP}_0 \text{ ssDNA}}.
\end{aligned}
$$

We were able to measure this ratio of percentages experimentally, thus allowing us to compute the normalization constant g. We then obtained the actual single-stranded ratio profile by plotting the data points $(c_j, g\,a_j^*/b_j^*)$, where j runs over the ORFs for a given chromosome.

Smoothing

We transformed the raw ssDNA ratio data by using the previously introduced Fourier convolution smoothing technique to obtain a smoothed profile. The smoothed profiles offered the advantage of prominently identifying local extrema in the data. In our application of smoothing, a window of kb was specified and a moving average with this window size was constructed. We used this moving average as a target and selected the closest Fourier smoothing among a large family of smoothings. See the supporting online text to Raghuraman et al. for full details of this procedure.

Extrema detection

Given any discrete dataset of points $(x_i, y_i)$, we detected local extrema as follows. First, for each data point, we calculated the numbers

$$
S_i^L = \frac{y_i - y_{i-1}}{x_i - x_{i-1}} \quad \text{and} \quad S_i^R = \frac{y_{i+1} - y_i}{x_{i+1} - x_i}.
$$

If $S_i^L > 0$ and $S_i^R < 0$, then we flagged the point $(x_i, y_i)$ as a local maximum, whereas if $S_i^L < 0$ and $S_i^R > 0$, then we flagged the point $(x_i, y_i)$ as a local minimum.

Identification of significant ssDNA peaks

In order to calculate the standard deviation in the background level of ssDNA labeling in all timed samples, we first identified and flagged those data points with values above the median value in each data set, thus removing prominent peaks from our estimation of background variation. We then removed those data points that were flagged in any timed sample from all timed sample data in all further calculations. Averaging the three median values (Own Median), we obtained the Average Median and normalized the remaining data points by multiplying each value by the constant M (M = Average Median/Own Median). Standard deviation was calculated as the square root of variance (variance = Σ_i {/[(X_i, - X_i,) + (X_i, - X_i,) + (X_i, - X_i,)] / total number of data points}, where i indexes the data points and the subscripts,, and index time points,, and hr). We present a sample calculation (numbers have been rounded here to two decimal places for clarity):

Calculate median for rad at, and hour
Median_rad_hr =.9
Median_rad_hr =.8
Median_rad_hr =.

Calculate average of the Medians above
Average of Median =.

Calculate the normalization factor (constant M) for rad at, and hr
M_rad_hr =.9/. =.8
M_rad_hr =.8/. =.
M_rad_hr =./. =.

Normalize each data set by multiplying the smoothed ratio of ssDNA (e.g., rad_xhr in the spreadsheet below) by the normalization factor constant M to arrive at a normalized value (e.g., rad_xhr_nm in the spreadsheet below). Shown below is a spreadsheet of calculations for a portion of chromosome.

chr coord rad_hr rad_hr_nm rad_hr rad_hr_nm rad_hr rad_hr_nm
8.8.6..98.7. 9..79.76.7..77..69.66.6..78.8.7.6.6.7.9.7.8.6.6.6..66.9.68.6..
8.7..7.7..8 9.6.9.6.6.8. 6.7.8.9.6..89 6..8.6.7..79 6.6.8.67.6.97.7 6.6.9.79.76..8 6.77.8.9.89.7.9 6.88...96.7. 8.8..98.9.9.

Calculate variance and standard deviation
Variance = Σ_i {/[(X_i, - X_i,) + (X_i, - X_i,) + (X_i, - X_i,)]} / total number of data points, where i indexes the data points and the subscripts,, and index time points,, and hr.
Standard deviation = square root of variance

For example, at the first coordinate (8 kb on chr ), calculate /[(.6-.977) +(.6-.6) +(.977-.6) ] =.7. Repeat this calculation for all the coordinates in the genome and average the results to arrive at Variance =.8; Standard deviation =.6.

We next identified the local maxima and minima in each timed sample and calculated the difference between every local maximum and its two flanking minima (note that for some of the telomeric points, not every local maximum is flanked by two local minima). Those local maxima with values that are above standard deviations from both of their flanking local minima were considered significant ssDNA peaks. Those telomeric local maxima that have only a single flanking local minimum but are above standard deviation of the said local minimum are also considered significant ssDNA peaks.
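The extrema detection and peak-significance steps described above can be sketched in Python as follows. This is a minimal illustration, not the code used for the published analysis. The function names are hypothetical, the standard deviation `sd` is assumed to have been computed beforehand, and the required multiple of the standard deviation is exposed as a parameter `k` because its value is not given here. The sketch also assumes the same threshold applies to telomeric maxima that have only one flanking minimum.

```python
def local_extrema(xs, ys):
    """Return (maxima, minima) as lists of indices into xs/ys.

    A point is a local maximum if its left slope is positive and its right
    slope is negative, and a local minimum in the opposite case. End points
    have only one neighbour and are therefore never classified.
    """
    maxima, minima = [], []
    for i in range(1, len(xs) - 1):
        s_left = (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])
        s_right = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        if s_left > 0 and s_right < 0:
            maxima.append(i)
        elif s_left < 0 and s_right > 0:
            minima.append(i)
    return maxima, minima


def significant_peaks(xs, ys, sd, k):
    """Return x positions of maxima exceeding every flanking minimum by k*sd.

    `k` is an assumed parameter (multiple of the standard deviation). Maxima
    with a single flanking minimum are tested against that minimum only.
    """
    maxima, minima = local_extrema(xs, ys)
    peaks = []
    for m in maxima:
        left = [j for j in minima if j < m]
        right = [j for j in minima if j > m]
        flanks = []
        if left:
            flanks.append(left[-1])   # nearest minimum on the left
        if right:
            flanks.append(right[0])   # nearest minimum on the right
        if flanks and all(ys[m] - ys[j] > k * sd for j in flanks):
            peaks.append(xs[m])
    return peaks


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4, 5, 6]
    ys = [1.0, 3.0, 1.0, 1.2, 1.0, 2.5, 2.0]
    print(significant_peaks(xs, ys, sd=0.5, k=2.0))
```

With the toy profile above, the small bump at x = 3 is rejected because it rises only 0.2 above its flanking minima, while the maxima at x = 1 and x = 5 each have a single flanking minimum and pass the threshold.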

Table S. Complete list of clusters of origins identified from the ssDNA profiles of rad cells in HU. Column 1 contains the chromosome number. Columns 2 to 4 list locations of all elements of each clustered origin from ssDNA profiles of rad cells at three time points as described in the text. For those origin clusters that contain fewer than three elements, blank entries indicate that no significant peak was identified at the given location in that sample. Peak locations that were identified only once among all repetitions of the experiment (true singletons) are shown with asterisks. The positions of ssDNA peaks that appear in the WT hour sample are listed in column 5. The 7 ssDNA peaks in WT that do not match clustered ssDNA peaks in rad cells are italicized. Note that of these 7 peaks in WT do match a singleton ssDNA peak in rad cells. The name and position of the corresponding Pro-ARS are adapted from Wyrick et al., and are listed in columns 6 through 9. FALSE indicates predicted Pro-ARSs found not to show ARS activity.

Chromosome  Position_rad_hr(kb)  Position_rad_hr(kb)  Position_rad_hr(kb)  Position_WT_hr(kb)  Pro-ARS/ARS_name  Start(kb)  End(kb)  Mid(kb)
..69.9 7.7 8.8 8.6.7.7.97.88.8.99 7 7 76 6 69. 7.66 7.97 7.99.88.9 8 7 8 8 7. 7.96 7.6 6 6 6 9 9.79 6.6 9.9 66 7 8 97 96 97 6.97.67.9 6. 7. 6.68 9*

6 8 6 6 6 6 6.9 6.866 6.9 9 9 9 9.89 9.8 9.6 9.877 96.9 96.88.6.9.97 6.87.98.87 7 69 69 7 69. 7.8 69.98 98 * 7 8 7. 8.6 7.79 9.66.7.76 8.9 9. 9. 6.6 6.79 6.8 79.8 79.87 79. 89 88 89 89.7 9.9 89.7 7 8 7 7 7. 8. 7.6 8* 6.86 7.98 7.9 87 87 87 88 6 86.8 87. 86.986 7..6. 7 7 8 9 9 9 8 9.69 9.9 9.8 87 9 89.79 9.96 9. 9 9 9 6 6 6 6 6.8 6.7 6. 6.897 6.9 6. 7 7 7 7. 7.6 7.7 7 7 7 79. 7.8 79.76 7 7 7 7.7 7.9 7.86 77 77 78 77.6 77.76 77.8 6 76.8 76. 76. 77 77 77 7 77.88 77.878 77.7 79 79 8 79.769 79.8 79.86 9 8.88 8.8 8. 89*..6.8...9.8.7.9.8.8.97.7 /.97.787.67 FALSE 8. 8.799 8.67 8 8.777 9.76 9.7 FALSE..68.8 7 7 7 7 8 9 8 8 8 7 9..6 9.688 9.89.88.87 67 67 66 66. 67.8 66.67 9 96 9 9.7 9. 9.8 97.6 99.9 98. 6 Nature Publishing Group

6 6 7 FALSE.6..786.6.9.877 7 7 6 7.86 7.6 7.97 9 9 8 9. 9.7 9.8 7 9.7 9. 9. 8* 7 7.8 8.68 8.8.76.67.8 6 6.78 6..9 7 7 8.99 6.7 6.9 8 8 8 8.986 86.8 86.97 6.9.866.78 7 7. 8.9 7.96 8 9 8 7.9 8.6 7.98 9.9..7.7.99.869 6.9 6.6 6.8 7 8 6.87 6.97 6.678 9 9 9.9..7 8 8 8 7 7.7 8.9 7.869.78.98.8 6 6 6 6 6.6 6.7 6.9 8 8 8 8 7 8.9 8. 8. 6 6 8.9.766.79 68 9 67.79 68.69 68.89 6 68.7 69.9 69. 6 6 6 69.86 6.7 6. 7 7 7 7.796 7.7 7. 7 7 7 7 7 7 7.7 7.7 7.9 76.9 76.6 76.6 86 87 87 8.67 86.69 86. 6 8.99 8.89 8.8 87 7 899. 899.87 899. 9 9 9 9 8 9.8 9. 9.9 99 9 9.68 9.96 9.8 6 7.78 7.7 6.87 8 8 8 7.66 8. 7.9 8. 9.6 9.9 67 67 67 67.7.8.99 76 76 77 76.6 76.688 76.67.7.8.677 6.76 6. 6. 6 Nature Publishing Group

7.6.79.9 8.87 6.99.8 8 8 9 79.6 8.89 79.89.7.79. 7 6 6 7. 7.878 7.6..9.8 6 6 6 6.877 6.6 6. 68.6 69.8 69. 7.66 7.8 7.7 87 87 87 6 87.86 87.9 87.8 7.8.6.78 8.8 6.6 6.6 9.6.67.6 *.97.8. 7 8 6.6 7. 6.87 8.786.9 9...8.868 6 7.8 8.77 8.8 9 6 6 9 7 9.8 9.67 9.9 9 9 9 9 8 9.9 9.6 9.7 9 8.977 9.89 9. 6..96.69 7 7 7 7 7.7 7.9 7.8.87.86. 6 7 7 77 77 76 86.9 87.7 87.7 7 7 7 7 6.7..9 8 8 8 7 6.8 7.9 6.88 9 8 8 8 8.77 9.6 9.9 9..7.7 99 98.7 99. 98.88 9 9 9. 9.7 9.66 69 69. 7.8 69. 6 6.... 6 6* 6..9.99.9 6 6. 8.68 9.76 9. 6 6. 9.76.87. 6 6/6.96.7.89 6 FALSE 8.67 8.8 8.78 6 69 68 68 6 68.697 69. 68.9 6 99 99 99 6 9 9 9 8 6. 8.78 9. 8.9 6 6 6 6 6.69 6.9 6.9 6 67 67 68 68 66 67.7 67.88 67.698 6 67 99. 99.86 99. 6 Nature Publishing Group

6 FALSE..7.7 6 7 6 6 6 6 8 69 6. 7.7 6.99 6 6 69.9 69. 69.76 7 8 8 8 7 7 7 7 6 6 6 7 6. 6. 6.8 7 7 69. 69.67 69. 7 7.7.77. 7 7.7.9.8 7 76 7.6 7.86 7.79 7 6 6 6 77 6.76 6. 6.87 7 78 66. 67.6 66.79 7 88 88 87 79 87.68 87.8 87.6 7 7.87.986. 7 7.88.6. 7 7 69.79 7.8 69.779 7 8 8 8 86 7 8.98 87. 86.69 7 7 88. 88. 88.8 7 76.96.6.676 7 9 9 9 9 77 88.7 88.966 88.88 7 78..8.86 7 8 8 87 8 79 8.77 8.96 8. 7 7 7 7 8 7 8.8 9. 8.699 7 7 67.7 68.7 68. 7 77 77 77 76 7 7.9 7.9 7.9 7 7 7.78 76. 7.99 7 7 8. 8.7 8.88 7 7 67.77 67. 67. 7 76 6.97 6.78 6.8 7 69 69 67 77 69.76 66.7 66.8 7 7 7 77 76 78 7.976 7.879 7.7 7 778 778 778 777 79 777. 778.778 778. 7 7 79.678 79.6 79.67 7 8 8 8 8 7 8.76 8.68 8.79 7 889 889 889 89 7 98 99 99 7 96.69 97. 96.86 7 978 978 979 7 977.77 978. 977.9 7 7 999..8 999.8 7 7..8.76 7 6 6 6 7 76 8.79 8.86 8.6 7 88 89 8 8.7 6..9 8 7 7 8 8. 8.98 8.6 8 8..66.68 8 6 6 6 8.688.97.87 8 6 6 6 6 8 6. 6.8 6.6 6 Nature Publishing Group

8 6 86 6.7 6. 6.9 8 87.8..6 8 69 68 68 89 68. 68.88 68.76 8 8.7.8.9 8 8.78.8.9 8 8 8.8 8.998 8.9 8 6 6 6 8.7 6.9.88 8 8 86.77 87. 86.9 8 97 97 96 99 8 96.9 97.87 96.98 8 6 9 86 9.7 9.7 9.9 8 87 8.6 8.7 8.898 8 9 9 9 88 9.697 9.69 9.68 8 89 9.7 9.6 9.99 8 7 7 6 8 6.9 7. 7.7 8 7 7 7 8 7. 7.6 7.8 8 8.8.8.76 8 8 9.7.8 9.89 8 8 6.69 7. 6.8 9 9 7.7 8.79 8. 9 9.9.9. 9 7 9 6. 6.78 6.6 9 9.66.78. 9 96.6.79. 9 7 76 7 97 7. 7.8 7.88 9 98 8.9 8. 8.9 9 7 6 6 6 99.676 6.7.89 9 9.7.86. 9 6 6 6 9 6. 6.6 6.78 9 7 7 7 9 7.67 76.6 76. 9 9 9 9 7 7 9 9.7.988.8 9 6 6 6 9 9 7.9 7.899 7.7 9 9 8.9 8.87 8.6 9 96 8.99 8.7 8.86 9 9 9 97..6.79 9 98.89..96 9 99...979 9 6 6 8 9 6.89 7. 7. 9 9 6.9 6.8 6.69 9 9.8..9 9 7 *..7.98 7. 8.776 8. 6 6. 6.767 6..79.. FALSE 6.77 6.887 6.89 67 66 66 67.68 67.89 67.78 6 99.6 99.697 99.78 6 Nature Publishing Group

7.7.7.7 6 6 6 8...7 8 8 8 7 9 8.97 8.7 8.9 98.7 98.8 98.7 7 7 7 6.7 7.967 7. 76 76 77 76 7.7 7. 7.6 7.7 76.8 7.9 8 8 8 9 6.7 7. 6.98..96.8 8 6 FALSE 8.87 9.6 9.99 8 8.9. 9.7 6 6 6 6 9 6.6 6.877 6.79 FALSE 69.7 6. 69.9 FALSE 6.8 6.79 6.88 6 6 6 6.97 6.6 6.76 68 68.97 68.6 68.6 686 686 68 FALSE 69. 69.9 69.6 FALSE 7.8 7.76 7.8 7 7 7 7.67 7.9 7.7 79 79 79.8 7. 79.7 7 7 76.86 77.697 77. 7.688 7.6 7..799.8. 7.9 8. 7.99 6.6.9.768 88.79 89.89 89. 98 97 98..7.886 6.7.78.89 97 96 7 8 7 7 7.88 8.8 7.88 8.69.7. 9 8 9 9.88 9.8 9.9.6.7.8 8.7 8. 8.96 88 88 88 88.6 89. 88.78 7 8 8 6. 7.9 6.8 7 6 7.8 8.6 7.7 6.7 6.8 6.88 6 6. 7.9 6.8 7..86. 8 8 8 8 8. 8.9 8.8 6 Nature Publishing Group

9 8.8 8.9 8.87 6 6 6 6.9 6. 6. 67.7 67.6 67. 6* 6.97 6. 6.66 6 6 6 6.6 6. 6.778 68.9 69. 69.6 9.77.8.899.9.8.796.86.89.7 7.7 7.89 7.8 77 77 78 76.76 77. 76.98 9 9 9 9 6 9.8 9.9 9.698 9 9 7 8.8 9.6 8.67 7 6 7 9 6. 6.8 6.9..686.9..7.77 89 89 9 89. 9. 89.7 6 7 7 7 7 7.98 7 7.6 8. 8.76 8.6..8.88 9 6 9.9. 9.76 7.6.786. 6 6 6 6 8 6.6 6.7 6.9 6 6 9 6.98 6.86 6.9 69 69 69 66 69.79 66.76 66. 688 7.78 7.8 7. 7 7 7 7. 7.8 7.6 7 7 7 77 7.8 7.6 7. 78.87 79. 78.9 769.8 77.6 77.7 79 79 79 6 79.988 79.8 79.6 8 8 8 7 8.9 8.86 8.6 888 888 889 8 99.799 9.99 9.96 9 97.6 98.79 98.77 98 98 9 9.888 9.6 9.97 97 97 96 97.97 98.6 98.68 7 7 6 6.989 7. 7...99.9.98.8.8 8. 9.7 9. 67 67 6 6.76 6.6 6.9 7 7. 7. 7.866 6 6.96 7. 6.6 6 Nature Publishing Group

9.9.99 9.7.68..9 9.8.87. 9 9 9 9.6 9. 9.87 6...8 7 6 7 7 7 7.8 7. 7. 9 9 9 8 8 79 8 8 8.968 8.7 8.69 6 6 6 6 6 9 6.68 6.8 6.8 87 89 89 88 86.6 87.97 86.76 89.7 9.8 89.87 69 69 69 7.6 7.8 7.998 7.7 7.696 7.66.66.8.7...789 67 67 67 6 68.6 68. 68. 7 77.6 78. 77.9 8 9. 9.6 9.9 9.88.77.79 6 6 7.69 6.6.887.88.977.98...87 7 7 6 6 6 6. 6.79 6.96 6 66 6 6.7 6.688 6.79 69 68 68 68 69.79 6. 69.67 689 689 69 6 688.86 689.8 688.97 79 76 76 7 77.86 78. 77.8 77 77 8 77.7 77.9 77.7 9 8.9 8.6 8.8 8 8 8 87 8. 8.6 8.8 87.76 87.98 87.86 878 878 878 898 897 896 899 897.6 898. 898. 98 9 98 97.89 98.6 98. 9. 9.9 9. 8 9 7.9 8. 7.87.6..97.6.8.97 8 7 7. 8.8 7.8 * 6 6 6 6 6.9 6.86 6.68 89 7 89.7 9. 89.99 8 9.799 9. 9. 98* 9 96. 96.6 96. 7 7 7 6.97 6.8 6.7 7 7 69 69.7 7.7 69.6 6 Nature Publishing Group

96 96 96 96. 96. 96.6 9 9..9.6 79 8 8 79.89 8.8 8.6.8.8.788 6..6.7 7..68.67 8 6.8 6.99 6.69 9 9.7 9.867 9.6 99 99 99 98.88 99.68 98.98 6.97 6. 6.6 6 6 6 9 6.76 6.6 6.8 68. 68. 68.67 77 77 6 69 69 6 69.8 6. 69.78 66.7 67. 66.7 66 66 6 67 6 6.96 6.9 6.69 69 69 7 69.8 69. 69.78 7 7 7 8 7.6 7.6 7.88 79 79 7 76 76 76 9 76.98 76.7 76.676 78* 78.77 78.8 78.78.78.89.86 8.76.8.6...7 6 8.6 9.9 9. 7 7 7 7 7 7.7 7. 7.87 8 8 8 8 8.6 8. 8.9 6 6 9.87.7. 66 66 67 67 66.9 67.88 67.9 8 6 7 8 78 79 8 77 77.6 78.6 77.8 9.9 9.9 9.6 8 9 6 7. 7.68 7. 8 8 8 6 6 6 6 9* 98 97 98 9.68 9. 9.796 67 67 66 6 66.89 66.87 66. 6 6 6 7 6.7 6.8 6.6 68 68 8 66.669 67.6 67.9 67 67 67 9 66.68 67. 66.89 68 68 68 68. 68. 68.99 7 7 7 79.6 7.6 79.8 767 767 766 766.6 766.867 766.7 766.9 767. 767.69 78 78 78 78.96 78.666 78.8 6 Nature Publishing Group

87 87 87 8.69 8. 8.87 87 87 6 87.66 87.78 87. 7 9.87 9.76 9.9 98 98 98 98 8 97.6 98. 97.99 96* 98 98 98 9 98.8 98. 98.979 8* 77.9 78.7 78. 88 86 6 6 6.97 7.9 7. 6 6.6.8.9 6 6 8.7 9.69 8.79 6 6.869.8.76 6 7 7 7 7 6 7.6 7.6 7.8 6 9 9 9 66 9.6 9.6 9.69 6 7 7 8 67 6.87 7.67 6.777 6 6 6 6 68 6. 6.6 6.8 6 9 9 9 6 69.77.76.7 6 6.986 6..98 6 6.87.66.6 6 6 6 6 6 6.9 6.76 6.8 6 6 66. 66.79 66. 6 9 89 89 9 6 89. 89.668 89. 6 6..86.99 6 7 7 6 66 6.87 7. 7.8 6 67.698.99.898 6 8 8 86 68 8. 8.768 8.8 6 8 8 9 69 8. 8.9 8.8 6 6.78.88.8 6 7 7 7 6 77 77 6 6.66.9.98 6 6 6 6 6 6 6.76 6. 6.88 6 8 8 8 6 6 6 6 6 6 6.76 6.8 6.97 6 6 6 68 68 6 68.8 68. 68.9 6 696 696 6 69.8 69.7 69.8 6 79 78 7 66 78.997 79. 79. 6 779 779 779 778 6 8 8 8 67 89.7 89. 89.97 6 8 8 8 8 68 8.66 8.7 8.96 6 69 8.7 8.6 8.6 6 88 88 88 6 88.7 88.8 88. 6 9 9 9 6 9.99 9.89 9.66 6 9 9 6 9.79 9.7 9. 6 Nature Publishing Group

Table S. List of ABOs that do not match the clustered ssDNA peaks from the rad ssDNA profiles within a kb distance. The chromosomal coordinate positions for Pro-ARSs and origins from previous studies are imported from MacAlpine and Bell, available at http://bell-lab-server.mit.edu/aborimaps/stable.html. Column 1 is the chromosome number. Column 2 is the calculated average of the following positions reported in MacAlpine and Bell: the start of Pro-ARSs from Wyrick et al. (column 3), the origin positions from Yabuki et al. (column 4) and the origin positions from Raghuraman et al. (column 5). The comparison between ABOs and our clustered ssDNA peaks uses column 2 as the average position of an ABO.

chr Ave_coord(kb) Pro-ARS_start(kb) CN_start(kb) HL_start(kb)
.9..79 9.8 69.76 68.7 69.79 6.9.66.9.79.6 9.7. 8.9 8. 7.9.7.6.8 7.67 7.6.6.8 7 6.8 6.76 6.6 6.8 7 87.66 88. 86.6 87.87 7 8.7. 6.6 8.9 7 7.78 7.9 7.6 78. 7.9..6 999.68 7.9 7.7 7.9 7.6 6.8 67.7 6. 6.6 6. 8..8.7 86.89 86. 8.7 89.9 7.96 7. 7.7 69. 88.9 89. 9.7 86.77 6.7 66. 6.7 6.7 6..78 9.6 7.96