MuG - Process TSV Pipelines Documentation


Release 0.1
Mark McDowall
Oct 31, 2018

Table of Contents

1 Requirements and Installation
  1.1 Requirements
  1.2 Installation
2 Pipelines
  2.1 BED File Indexing
  2.2 WIG File Indexing
  2.3 GFF3 File Indexing
  2.4 3D JSON Indexing
3 Tools to index genomic files
  3.1 BED Indexer
  3.2 WIG Indexer
  3.3 GFF3 Indexer
  3.4 3D JSON Indexer
4 License
5 Indices and tables
Python Module Index


CHAPTER 1
Requirements and Installation

1.1 Requirements

1.1.1 Software

  MongoDB 3.2
  Python 2.7.10+
  SamTools
  bedToBigBed - http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
  wigToBigWig - http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
  HDF5

1.1.2 Python Modules

  numpy
  h5py
  pyBigWig
  pysam

1.2 Installation

Directly from GitHub:

  git clone https://github.com/multiscale-genomics/mg-process-files.git

Using pip:

  pip install git+https://github.com/multiscale-genomics/mg-process-files.git
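To confirm that the Python modules listed in section 1.1.2 are importable in the target environment, a quick check along the following lines can be run. This is only a convenience sketch; note that the pybigwig package is imported as pyBigWig.

  # Check that the Python module requirements are importable.
  import importlib

  for module in ("numpy", "h5py", "pyBigWig", "pysam"):
      try:
          importlib.import_module(module)
          print(module, "OK")
      except ImportError as error:
          print(module, "MISSING:", error)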

CHAPTER 2
Pipelines

2.1 BED File Indexing

This pipeline can process BED files into BigBed and HDF5 index files for web use.

2.1.1 Running from the command line

Parameters:
  assembly [str] Genome assembly ID (e.g. GCA_000001405.22)
  chrom [str] Location of the chrom.size file
  bed_file [str] Location of the input BED file
  h5_file [str] Location of the HDF5 output file

Returns:
  BigBed [file] BigBed file
  HDF5 [file] HDF5 index file

Example

When using a local version of the COMPSs virtual machine
(http://www.bsc.es/computer-sciences/grid-computing/comp-superscalar/downloads-and-documentation):

chrom.size file:

  1 123000000
  2 50000000
  3 25000000
  4 10000000
  5 5000000
  X 75000000
  Y 12000000

  runcompss --lang=python /home/compss/mg-process-files/process_bed.py --assembly GCA_000001405.22 --chrom chrom.size --bed_file <data_dir>/expt.bed --h5_file <data_dir>/expt.hdf5

2.1.2 Methods

class process_bed.process_bed(configuration=None)
  Workflow to index BED formatted files within the Multiscale Genomics (MuG) Virtual Research Environment (VRE).

  run(input_files, metadata, output_files)
    Main run function to index the BED files ready for use in the RESTful API. BED files are indexed in two different ways to allow for optimal data retrieval. The first is a BigBed file, which allows the data to be easily extracted as BED documents and served to the user. The second is an HDF5 file that is used to identify which BED files have information at a given location. This helps the REST clients make only the required calls to the relevant BED files rather than needing to poll all potential BED files.

    Parameters:
      input_files (list) - List of file locations
      metadata (list)

    Returns:
      output_files - List of locations for the output BED and HDF5 files
    Return type: list

2.2 WIG File Indexing

This pipeline can process WIG files into BigWig and HDF5 index files for web use.

2.2.1 Running from the command line

Parameters:
  assembly [str] Genome assembly ID (e.g. GCA_000001405.22)
  chrom [str] Location of the chrom.size file
  wig_file [str] Location of the input WIG file
  h5_file [str] Location of the HDF5 output file

Returns:
  BigWig [file] BigWig file
  HDF5 [file] HDF5 index file

Example

When using a local version of the COMPSs virtual machine
(http://www.bsc.es/computer-sciences/grid-computing/comp-superscalar/downloads-and-documentation):

chrom.size file:

  1 123000000
  2 50000000
  3 25000000
  4 10000000
  5 5000000
  X 75000000
  Y 12000000

  runcompss --lang=python /home/compss/mg-process-files/process_wig.py --assembly GCA_000001405.22 --chrom chrom.size --wig_file <data_dir>/expt.wig --h5_file <data_dir>/expt.hdf5

2.2.2 Methods

class process_wig.process_wig(configuration=None)
  Workflow to index WIG formatted files within the Multiscale Genomics (MuG) Virtual Research Environment (VRE).

  run(input_files, metadata, output_files)
    Main run function to index the WIG files ready for use in the RESTful API. WIG files are indexed in two different ways to allow for optimal data retrieval. The first is a BigWig file, which allows the data to be easily extracted as WIG documents and served to the user. The second is an HDF5 file that is used to identify which WIG files have information at a given location. This helps the REST clients make only the required calls to the relevant WIG files rather than needing to poll all potential WIG files.

    Parameters:
      input_files (list) - List of file locations
      metadata (list)

    Returns:
      output_files - List of locations for the output WIG and HDF5 files
    Return type: list

2.3 GFF3 File Indexing

This pipeline can process GFF3 files into Tabix and HDF5 index files for web use.
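The Tabix index produced by this pipeline is what later allows features to be fetched by genomic coordinate. The snippet below is a minimal sketch of that idea using pysam (listed in the requirements); the file paths and the queried region are illustrative, and the pipeline itself may drive the bgzip/tabix steps differently (see section 3.3).

  import pysam

  # Assumed input: a coordinate-sorted GFF3 file (path is illustrative).
  sorted_gff3 = "/data/expt.sorted.gff3"

  # bgzip-compresses the file and writes a .tbi index alongside it.
  compressed = pysam.tabix_index(sorted_gff3, preset="gff", force=True)

  # Fetch the features overlapping an example region to confirm the index works.
  tabix = pysam.TabixFile(compressed)
  for row in tabix.fetch("1", 1000000, 1001000):
      print(row)
  tabix.close()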

2.3.1 Running from the command line

Parameters:
  assembly [str] Genome assembly ID (e.g. GCA_000001405.22)
  gff3_file [str] Location of the source GFF3 file
  h5_file [str] Location of the HDF5 index file

Returns:
  Tabix [file] Tabix index file
  HDF5 [file] HDF5 index file

Example

When using a local version of the COMPSs virtual machine
(http://www.bsc.es/computer-sciences/grid-computing/comp-superscalar/downloads-and-documentation):

  runcompss --lang=python /home/compss/mg-process-files/process_gff3.py --assembly GCA_000001405.22 --gff3_file <data_dir>/expt.gff3 --h5_file <data_dir>/expt.hdf5

2.3.2 Methods

class process_gff3.process_gff3(configuration=None)
  Workflow to index GFF3 formatted files within the Multiscale Genomics (MuG) Virtual Research Environment (VRE).

  run(input_files, metadata, output_files)
    Main run function to index the GFF3 files ready for use in the RESTful API. GFF3 files are indexed in two different ways to allow for optimal data retrieval. The first is a Tabix index over the bgzip-compressed GFF3 file, which allows the data to be easily extracted and served to the user. The second is an HDF5 file that is used to identify which GFF3 files have information at a given location. This helps the REST clients make only the required calls to the relevant GFF3 files rather than needing to poll all potential GFF3 files.

    Parameters:
      input_files (list) - List of file locations
      metadata (list)

    Returns:
      output_files - List of locations for the output index and HDF5 files
    Return type: list

2.4 3D JSON Indexing

This pipeline processes the 3D JSON models that have been generated via TADbit into a single HDF5 file that can be used as part of a RESTful API for efficient querying and retrieval of the models.
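As an illustration of what collecting the models into "a single HDF5 file" involves, the sketch below stores the x, y and z coordinates of a set of models for one region using h5py and numpy (both in the requirements). The JSON field names and the HDF5 group layout are assumptions made for this example only; the layout used by the real pipeline is internal to the tool.

  import json

  import h5py
  import numpy as np

  # Hypothetical model file taken from the extracted TADbit tar.gz archive.
  json_file = "/data/region_1_models.json"
  with open(json_file) as handle:
      models = json.load(handle)["models"]   # assumed field name

  # Assumed: each model carries equal-length x, y and z coordinate lists.
  coords = np.array([[m["x"], m["y"], m["z"]] for m in models], dtype=np.float32)

  with h5py.File("/data/expt.hdf5", "a") as hdf5:
      region = hdf5.require_group("region_1")
      region.create_dataset("coords", data=coords, compression="gzip")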

2.4.1 Running from the command line

Parameters:
  gz_file [str] Location of the input tar.gz file containing all of the output models and data from the TADbit modelling stage

Returns:
  HDF5 [file] HDF5 index file

Example

When using a local version of the COMPSs virtual machine
(http://www.bsc.es/computer-sciences/grid-computing/comp-superscalar/downloads-and-documentation):

  runcompss --lang=python /home/compss/mg-process-files/process_json_3d.py --gz_file <data_dir>/expt.tar.gz

2.4.2 Methods

class process_json_3d.process_json_3d(configuration=None)
  Workflow to index JSON formatted files within the Multiscale Genomics (MuG) Virtual Research Environment (VRE) that have been generated as part of the Hi-C analysis pipeline to model the 3D structure of the genome within the nucleus of the cell.

  run(input_files, metadata, output_files)
    Main run function to index the 3D JSON files, generated as part of the Hi-C analysis pipeline to model the 3D structure of the genome within the nucleus of the cell, ready for use in the RESTful API.

    Parameters:
      input_files (list)
        file [str] Location of the tar.gz file of JSON files representing the 3D models of the nucleus
      metadata (list)

    Returns:
      output_files - List with the location of the HDF5 index file for the given dataset
    Return type: list
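The runcompss invocations above are the documented entry points; driving a pipeline class directly from Python should follow the run(input_files, metadata, output_files) signature shown in the Methods sections. The call below is a sketch only: the metadata argument is left empty and the file paths are placeholders, so the exact objects required by the VRE may differ.

  from process_json_3d import process_json_3d

  pipeline = process_json_3d()

  # input_files, metadata, output_files as documented above; metadata is left
  # empty here purely for illustration.
  results = pipeline.run(
      ["<data_dir>/expt.tar.gz"],     # tar.gz archive from the TADbit modelling stage
      [],
      ["<data_dir>/expt.hdf5"],       # where the HDF5 index should be written
  )
  print(results)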


CHAPTER 3
Tools to index genomic files

3.1 BED Indexer

class mg_process_files.tool.bed_indexer.bedindexertool(configuration=None)
  Tool for running indexers over a BED file for use in the RESTful API.

  bed2bigbed(**kwargs)
    BED to BigBed converter

    This uses the bedToBigBed program binary provided at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ to perform the conversion from BED to BigBed.

    Parameters:
      file_sorted_bed (str) - Location of the sorted BED file
      file_chrom (str) - Location of the chrom.size file
      file_bb (str) - Location of the BigBed file

    Example

      if not self.bed2bigbed(bed_file, chrom_file, bb_file):
          output_metadata.set_exception(
              Exception(
                  "bed2bigbed: Could not process files {}, {}.".format(*input_files)))

  bed2hdf5(**kwargs)
    BED to HDF5 converter

    Loads the BED file into the HDF5 index file that gets used by the REST API to determine if there are files that have data in a given region. Overlapping regions are condensed into a single feature block rather than maintaining all of the detail of the original BED file.

    Parameters:
      file_id (str) - The file_id as stored by the DM-API so that it can be used for file retrieval later
      assembly (str) - Assembly of the genome that is getting indexed so that the chromosomes match
      feature_length (int) - Defines the level of resolution that the features should be recorded at. The two options are 1 or 1000. 1 records features at every single base, whereas 1000 groups features into 1000 bp chunks. The single base pair option should really only be used when features are less than 10 bp.
      file_sorted_bed (str) - Location of the sorted BED file
      file_hdf5 (str) - Location of the HDF5 index file

    Example

      if not self.bed2hdf5(file_id, assembly, bed_file, hdf5_file):
          output_metadata.set_exception(
              Exception(
                  "bed2hdf5: Could not process files {}, {}.".format(*input_files)))

  bed_feature_length(file_bed)
    BED Feature Length

    Function to calculate the average length of a feature in a BED file.

    Parameters:
      file_bed (str) - Location of the BED file

    Returns:
      average_feature_length - The average length of the features in a BED file
    Return type: int

  run(input_files, input_metadata, output_files)
    Function to run the BED file sorter and indexer so that the files can get searched as part of the REST API.

    Parameters:
      input_files (list)
        bed_file [str] Location of the sorted BED file
        chrom_size [str] Location of the chrom.size file
        hdf5_file [str] Location of the HDF5 index file
      metadata (list)
        file_id [str] file_id used to identify the original BED file
        assembly [str] Genome assembly accession

    Returns:
      bed_file [str] Location of the sorted BED file
      bb_file [str] Location of the BigBed file
      hdf5_file [str] Location of the HDF5 index file
    Return type: list

    Example

      import tool

      # Bed Indexer
      b = tool.bedindexertool(self.configuration)
      bi, bm = b.run(
          [bed_file_id, chrom_file_id, hdf5_file_id], [], {'assembly': assembly}
      )

3.2 WIG Indexer

class mg_process_files.tool.wig_indexer.wigindexertool(configuration=None)
  Tool for running indexers over a WIG file for use in the RESTful API.

  run(input_files, input_metadata, output_files)
    Function to run the WIG file sorter and indexer so that the files can get searched as part of the REST API.

    Parameters:
      input_files (dict)
        wig_file [str] Location of the WIG file
        chrom_size [str] Location of the chrom.size file
        hdf5_file [str] Location of the HDF5 index file
      meta_data (dict)

    Returns:
      bw_file [str] Location of the BigWig file
      hdf5_file [str] Location of the HDF5 index file
    Return type: list

  wig2bigwig(**kwargs)
    WIG to BigWig converter

    This uses the wigToBigWig program binary provided at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ to perform the conversion from WIG to BigWig.

    Parameters:
      file_wig (str) - Location of the WIG file
      file_chrom (str) - Location of the chrom.size file
      file_bw (str) - Location of the BigWig file

    Example

      if not self.wig2bigwig(wig_file, chrom_file, bw_file):
          output_metadata.set_exception(
              Exception(
                  "wig2bigwig: Could not process files {}, {}.".format(*input_files)))
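For reference, the conversion wrapped by wig2bigwig() amounts to calling the UCSC wigToBigWig binary from section 1.1.1 with the WIG file, the chrom.size file and the output path. A minimal stand-alone sketch follows (the tool may construct and run the command differently internally; the paths are illustrative):

  import subprocess

  wig_file = "/data/expt.wig"
  chrom_file = "/data/chrom.size"
  bw_file = "/data/expt.bw"

  # Usage of the UCSC binary: wigToBigWig in.wig chrom.sizes out.bw
  exit_code = subprocess.call(["wigToBigWig", wig_file, chrom_file, bw_file])
  if exit_code != 0:
      raise RuntimeError("wigToBigWig failed with exit code {0}".format(exit_code))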

  wig2hdf5(**kwargs)
    WIG to HDF5 converter

    Loads the WIG file into the HDF5 index file that gets used by the REST API to determine if there are files that have data in a given region. Overlapping regions are condensed into a single feature block rather than maintaining all of the detail of the original WIG file.

    Parameters:
      file_id (str) - The file_id as stored by the DMP so that it can be used for file retrieval later
      assembly (str) - Assembly of the genome that is getting indexed so that the chromosomes match
      file_wig (str) - Location of the WIG file
      file_hdf5 (str) - Location of the HDF5 index file

    Example

      if not self.wig2hdf5(file_id, assembly, wig_file, hdf5_file):
          output_metadata.set_exception(
              Exception(
                  "wig2hdf5: Could not process files {}, {}.".format(*input_files)))

3.3 GFF3 Indexer

class mg_process_files.tool.gff3_indexer.gff3indexertool(configuration=None)
  Tool for running indexers over a GFF3 file for use in the RESTful API.

  gff32hdf5(**kwargs)
    GFF3 to HDF5 converter

    Loads the GFF3 file into the HDF5 index file that gets used by the REST API to determine if there are files that have data in a given region. Overlapping regions are condensed into a single feature block rather than maintaining all of the detail of the original GFF3 file.

    Parameters:
      file_id (str) - The file_id as stored by the DM-API so that it can be used for file retrieval later
      assembly (str) - Assembly of the genome that is getting indexed so that the chromosomes match
      file_sorted_gff3 (str) - Location of the sorted GFF3 file
      file_hdf5 (str) - Location of the HDF5 index file

    Example

      if not self.gff32hdf5(file_id, assembly, gff3_file, hdf5_file):
          output_metadata.set_exception(
              Exception(
                  "gff32hdf5: Could not process files {}, {}.".format(*input_files)))
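The bed2hdf5, wig2hdf5 and gff32hdf5 converters all build the same kind of presence index: for each indexed file and chromosome, a record of which positions (at 1 bp or 1000 bp resolution) carry a feature, so that a REST client can skip files with nothing in a given region. The sketch below shows that idea with numpy and h5py; the group and dataset layout is an assumption for illustration, not the layout written by the tools themselves.

  import numpy as np
  import h5py

  feature_length = 1000            # resolution: group features into 1000 bp bins
  chrom_size = 123000000           # chromosome 1 in the chrom.size example above

  bins = np.zeros(chrom_size // feature_length + 1, dtype=np.bool_)

  # Mark the bins covered by each (start, end) feature; the coordinates here
  # are illustrative and would normally come from the sorted BED/WIG/GFF3 file.
  for start, end in [(10500, 12800), (2000000, 2003000)]:
      bins[start // feature_length:(end // feature_length) + 1] = True

  with h5py.File("/data/expt.hdf5", "a") as hdf5:
      group = hdf5.require_group("GCA_000001405.22/1")   # assumed: assembly/chromosome
      group.create_dataset("my_file_id", data=bins, compression="gzip")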

  gff32tabix(**kwargs)
    GFF3 to Tabix

    Compresses the sorted GFF3 file and then uses Tabix to generate an index of the GFF3 file.

    Parameters:
      file_sorted_gff3 (str) - Location of a sorted GFF3 file
      file_sorted_gz_gff3 (str) - Location of the bgzip compressed GFF3 file
      file_gff3_tbi (str) - Location of the Tabix index file

    Example

      if not self.gff32tabix(file_sorted_gff3, gz_file, tbi_file):
          output_metadata.set_exception(
              Exception(
                  "gff32tabix: Could not process files {}, {}.".format(*input_files)))

  run(input_files, input_metadata, output_files)
    Function to run the GFF3 file sorter and indexer so that the files can get searched as part of the REST API.

    Parameters:
      input_files (list)
        gff3_file [str] Location of the GFF3 file
        hdf5_file [str] Location of the HDF5 index file
      meta_data (list)
        file_id [str] file_id used to identify the original GFF3 file
        assembly [str] Genome assembly accession

    Returns:
      gz_file [str] Location of the sorted gzipped GFF3 file
      tbi_file [str] Location of the Tabix index file
      hdf5_file [str] Location of the HDF5 index file
    Return type: list

3.4 3D JSON Indexer

class mg_process_files.tool.json_3d_indexer.json3dindexertool(configuration=None)
  Tool for running indexers over 3D JSON files for use in the RESTful API.

  json2hdf5(**kwargs)
    Genome Model Indexing

    Load the JSON files generated by TADbit into a specified HDF5 file. The file includes the x, y and z coordinates of all the models for each region, along with the matching stats, clusters, TADs and adjacency values used during the modelling.

    Parameters:
      json_files (list) - Locations of all the JSON 3D model files generated by TADbit for a given dataset
      file_hdf5 (str) - Location of the HDF5 index file for this dataset

    Example

      if not self.json2hdf5(json_files, hdf5_file):
          output_metadata.set_exception(
              Exception(
                  "json2hdf5: Could not process files {}, {}.".format(*input_files)))

  run(input_files, input_metadata, output_files)
    Function to index models of the genome structure generated by TADbit on a per dataset basis so that they can be easily distributed as part of the RESTful API.

    Parameters:
      input_files (list)
        gz_file [str] Location of the archived JSON model files
        hdf5_file [str] Location of the HDF5 index file
      meta_data (list)
        file_id [str] file_id used to identify the original tar.gz file
        assembly [str] Genome assembly accession

    Returns:
      hdf5_file [str] Location of the HDF5 index file
    Return type: list

    Example

      import tool

      # 3D JSON Indexer
      j3d = tool.json3dindexertool(self.configuration)
      j3di = j3d.run((gz_file, hdf5_file_id), ())

  unzipjson(file_targz)
    Unzips the zipped folder containing all the models for regions of the genome, based on the information within the adjacency matrices generated by TADbit.

    Parameters:
      archive_location (str) - Location of the archived JSON files

    Returns:
      json_file_locations - List of the locations of the files within the extracted archive
    Return type: list

    Example

      gz_file = '/home/<user>/test.tar.gz'
      json_files = unzipjson(gz_file)
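The extraction step itself can be reproduced with the standard library, which may help when inspecting an archive before indexing it. A minimal sketch with tarfile and os follows (paths are illustrative; unzipjson() may organise its output differently):

  import os
  import tarfile

  gz_file = "/home/user/test.tar.gz"
  extract_dir = "/home/user/test_models"

  with tarfile.open(gz_file, "r:gz") as archive:
      archive.extractall(extract_dir)

  # Collect the locations of the extracted JSON model files.
  json_files = [
      os.path.join(root, name)
      for root, _dirs, files in os.walk(extract_dir)
      for name in files
      if name.endswith(".json")
  ]
  print(len(json_files), "JSON model files found")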


CHAPTER 4
License

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

4. Redistribution.
You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:

(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and

(d) If the Work includes a NOTICE text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets {} replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format.
We also recommend that a file or class name and description of purpose be included on the same printed page as the copyright notice for easier identification within third-party archives.

Copyright 2016 EMBL-European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

CHAPTER 5
Indices and tables

  genindex
  modindex
  search


Python Module Index

m
  mg_process_files.tool
p
  process_bed
  process_gff3
  process_json_3d
  process_wig


Index

B
  bed2bigbed() (mg_process_files.tool.bed_indexer.bedindexertool method)
  bed2hdf5() (mg_process_files.tool.bed_indexer.bedindexertool method)
  bed_feature_length() (mg_process_files.tool.bed_indexer.bedindexertool method)
  bedindexertool (class in mg_process_files.tool.bed_indexer)

G
  gff32hdf5() (mg_process_files.tool.gff3_indexer.gff3indexertool method)
  gff32tabix() (mg_process_files.tool.gff3_indexer.gff3indexertool method)
  gff3indexertool (class in mg_process_files.tool.gff3_indexer)

J
  json2hdf5() (mg_process_files.tool.json_3d_indexer.json3dindexertool method)
  json3dindexertool (class in mg_process_files.tool.json_3d_indexer)

M
  mg_process_files.tool (module)

P
  process_bed (class in process_bed)
  process_bed (module)
  process_gff3 (module)
  process_json_3d (class in process_json_3d)
  process_json_3d (module)
  process_wig (class in process_wig)
  process_wig (module)

R
  run() (mg_process_files.tool.bed_indexer.bedindexertool method)
  run() (mg_process_files.tool.gff3_indexer.gff3indexertool method)
  run() (mg_process_files.tool.json_3d_indexer.json3dindexertool method)
  run() (mg_process_files.tool.wig_indexer.wigindexertool method)
  run() (process_bed.process_bed method)
  run() (process_json_3d.process_json_3d method)
  run() (process_wig.process_wig method)

U
  unzipjson() (mg_process_files.tool.json_3d_indexer.json3dindexertool method)

W
  wig2bigwig() (mg_process_files.tool.wig_indexer.wigindexertool method)
  wig2hdf5() (mg_process_files.tool.wig_indexer.wigindexertool method)
  wigindexertool (class in mg_process_files.tool.wig_indexer)