

The ExactNumCI Package Document
A Technical Document
written by DEQIANG SUN
Baylor College of Medicine
June 12, 2013
For ExactNumCI v1.2.1
Contact: deqiangs@bcm.edu

Abstract

ExactNumCI: Exact Numerical Confidence Interval for Binomial Proportions

Keywords: Bioinformatics, Biostatistics, Methylation, Hydroxymethylation, Binomial Proportion, Exact Confidence Interval

The package ExactNumCI, available in C++ and R, provides exact numerical modeling for inference on binomial proportions. It calculates the confidence interval (CI) for a single binomial proportion, the CI for the difference of two binomial proportions, and the CI for the difference of differences. "Exact" here means that the result is limited only by the numerical precision tolerance, not by any assumption or approximation. The method is applied directly in the digital analysis of DNA methylation and hydroxymethylation, for example in the package MOABS.

TABLE OF CONTENTS

CHAPTER 1 OVERVIEW
  1. Introduction
  2. Summary of available tools
  3. Implementation and algorithmic approach
  4. License and Availability
  5. Cite ExactNumCI
  6. Contact
CHAPTER 2 INSTALLATION
  1. Installation of C++ version
  2. Installation of R version
CHAPTER 3 MANUAL
  1. Usage In R
  2. Usage of C++ binaries
CHAPTER 4 MISC
  1. News
REFERENCES

CHAPTER 1 OVERVIEW

1. Introduction

The development was motivated by the need for an exact calculation of the confidence interval for the difference between two methylation ratios in bioinformatics analysis of high-throughput whole-genome bisulfite sequencing data. For more details of the methods please refer to Chapter ??.

The package ExactNumCI, available in C++ and R, provides exact numerical modeling for inference on binomial proportions. It calculates the confidence interval (CI) for a single binomial proportion, the CI for the difference of two binomial proportions, and the CI for the difference of differences. "Exact" here means that the result is limited only by the numerical precision tolerance, not by any assumption or approximation. The method is applied directly in the digital analysis of DNA methylation and hydroxymethylation, for example in the package MOABS.

2. Summary of available tools

There are 4 major functions available in the R version.

singleci(k, n) gives the CI for a binomial proportion given (k, n).

pdiff(k1, n1, k2, n2, d) gives the probability that the difference of p1 from p2 is greater than d.

pdiffci(k1, n1, k2, n2) gives the CI for the difference of two binomial proportions given (k1, n1, k2, n2).

dodci(k1, n1, k2, n2, K1, N1, K2, N2) gives the CI for the difference of differences (p1 - p2) - (P1 - P2).

By default, a uniform prior distribution of p is assumed. You may also specify the prior distribution parameters α0, β0 by assuming that p follows a nonuniform beta distribution Beta(α0, β0).

3. Implementation and algorithmic approach

ExactNumCI was first implemented in C++ and makes extensive use of data structures and fundamental algorithms from the BOOST and Numerical Recipes (NR) libraries. It was then ported to R through Dirk Eddelbuettel's excellent Rcpp library.

4. License and Availability

ExactNumCI (both C++ and R versions) is freely available under the GNU GPL v2 at the Google Code site
https://code.google.com/p/exactnumci

If you want to use the C++ sources in your project, or want to calculate the CI in the Linux/Windows terminal, you may download the C++ sources at
http://dldcc-web.brc.bcm.edu/lilab/deqiangs/exactnumci/exactnumci-v1.2.1.tar.gz

We also find it convenient to have the same capability in the R terminal. We published the R package ExactNumCI through CRAN at
http://cran.r-project.org/web/packages/exactnumci/

5. Cite ExactNumCI

To be updated.

6. Contact

The C++ package is developed by Deqiang Sun. The R package is developed by Deqiang Sun and Hyun Jung Park. Please post any questions, suggestions or problems to the exactnumci Google group, or send email to Deqiang Sun at exactnumci@googlegroups.com. You are welcome to subscribe to the exactnumci Google group for updates.

CHAPTER 2 INSTALLATION

The C++ version and the R version are independent of each other, though most of the sources are shared. You may install only the C++ version, only the R version, or both.

1. Installation of C++ version

Make sure your system (Linux/Windows/MacOS) has the environment variable BOOST_ROOT correctly set. For example, if your BOOST include files are in /share/boost-1.46.1/include/, then you need to execute the command

    export BOOST_ROOT=/share/boost-1.46.1

Since ExactNumCI only uses the header files from BOOST, you do not have to build BOOST to install it.

Commands for installation under Linux/Windows terminal:

    wget http://dldcc-web.brc.bcm.edu/lilab/deqiangs/exactnumci/exactnumci-v1.2.1.tar.gz
    tar -zxvf ExactNumCI-v1.2.1.tar.gz
    cd ExactNumCI-v1.2.1/
    make

2. Installation of R version

Since the R package simply calls the C++ sources through the R package Rcpp, your R installation needs to have Rcpp installed. You may download the binaries for Windows or Mac from CRAN. You may also compile the sources.

Commands for installation under Linux/Windows terminal:

    wget http://cran.r-project.org/src/contrib/exactnumci_1.0.0.tar.gz
    R CMD INSTALL ExactNumCI_1.0.0.tar.gz

Or you can install through the R terminal:

    install.packages("ExactNumCI", dependencies=TRUE)

CHAPTER 3 MANUAL

The reference manual for the R package is available at http://cran.r-project.org/web/packages/exactnumci/. For an example of including the sources in other projects, please refer to the MOABS project.

1. Usage In R

The function singleci(k, n, α, method) returns the confidence interval of the binomial proportion given the observed number of successes k, the number of trials n, the significance level α, and the specified method of boundary condition. Under a uniform prior, the binomial proportion p follows a Beta distribution Be(p; k + 1, n - k + 1). The confidence interval of p is calculated by requiring the area under the curve to be 1 - α and applying method 1 for the boundary condition (minimal length of CI). Method 1 is the only method currently implemented. Setting α = 0.05 returns the commonly used 95% CI.

> library(exactnumci)
> singleci(5, 10, 0.05, 1)
$a
[1] 0.2337936

$b
[1] 0.7662064
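Outside R, the minimal-length interval described above can be approximated directly from the Beta(k + 1, n - k + 1) posterior. The following C++ sketch is illustrative only. It is not the package's source, and the names beta_pdf and shortest_interval are hypothetical. It assumes a uniform prior, a plain grid with trapezoid-rule integration in place of the BOOST and Numerical Recipes routines, and counts small enough that the density does not overflow. Its results may differ from those of singleci, particularly for asymmetric posteriors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Density of Beta(a, b) at x in [0, 1], for a, b >= 1.
// std::pow(0, 0) is 1, so the endpoints need no special case.
double beta_pdf(double x, double a, double b) {
    const double log_norm = std::lgamma(a + b) - std::lgamma(a) - std::lgamma(b);
    return std::exp(log_norm) * std::pow(x, a - 1.0) * std::pow(1.0 - x, b - 1.0);
}

// Hypothetical helper (not part of ExactNumCI): shortest interval [lo, hi]
// holding posterior mass 1 - alpha for p ~ Beta(k + 1, n - k + 1).
std::pair<double, double> shortest_interval(int k, int n, double alpha,
                                            std::size_t steps = 20000) {
    assert(0 <= k && k <= n && alpha > 0.0 && alpha < 1.0 && steps > 1);
    const double a = k + 1.0, b = n - k + 1.0;
    const double n_steps = static_cast<double>(steps);
    const double h = 1.0 / n_steps;

    // Cumulative trapezoid rule: cdf[i] approximates P(p <= i / steps).
    std::vector<double> cdf(steps + 1, 0.0);
    double prev = beta_pdf(0.0, a, b);
    for (std::size_t i = 1; i <= steps; ++i) {
        const double cur = beta_pdf(i / n_steps, a, b);
        cdf[i] = cdf[i - 1] + 0.5 * h * (prev + cur);
        prev = cur;
    }
    const double total = cdf[steps];
    for (double& c : cdf) c /= total;  // remove residual quadrature error

    // Two-pointer scan: for each left end i, find the first grid cell whose
    // right edge captures enough mass, then interpolate inside that cell.
    const double target = 1.0 - alpha;
    double best_lo = 0.0, best_hi = 1.0;
    std::size_t j = 1;
    for (std::size_t i = 0; i < steps; ++i) {
        while (j < steps && cdf[j] - cdf[i] < target) ++j;
        if (cdf[j] - cdf[i] < target) break;  // too little mass left to the right
        const double need = target - (cdf[j - 1] - cdf[i]);
        const double cell = cdf[j] - cdf[j - 1];
        const double hi = (static_cast<double>(j - 1) + need / cell) * h;
        const double lo = i / n_steps;
        if (hi - lo < best_hi - best_lo) { best_lo = lo; best_hi = hi; }
    }
    return {best_lo, best_hi};
}
```

Calling shortest_interval(5, 10, 0.05) corresponds to the first singleci example above. For that symmetric posterior the shortest interval coincides with the equal-tailed one, so the two bounds should be mirror images around 0.5.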

> singleci(2, 10, 0.05, 1)
$a
[1] 0

$b
[1] 0.4700868

The function pdiff(k1, n1, k2, n2, d, tolerance) returns the probability that the difference of two independent binomial proportions is greater than d. Here ki is the number of successes and ni is the total number of trials from sampling the binomial proportion pi; p1 and p2 are independent. The numerical precision tolerance is set to 1e-16; this argument will be dropped in later versions.

> pdiff(5, 10, 2, 10, 0.75, 1e-16)
[1] 0.0008290105
> pdiff(5, 10, 2, 10, 0.25, 1e-16)
[1] 0.5107698

The function pdiffci(k1, n1, k2, n2, α, method) returns the confidence interval of the difference between the two binomial proportions given the observed number of successes and number of trials for each, the significance level α, and the specified boundary condition. The distribution of p1 - p2 follows from the joint probability of p1 and p2 and is determined numerically. The CI is calculated in the same way as in the function singleci(k, n, α, method).

> pdiffci(5, 10, 2, 10, 0.05, 1)

$a
[1] -0.06092909

$b
[1] 0.543384

2. Usage of C++ binaries
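For C++ users, the quantity that pdiff reports in Section 1 can be checked independently of the package. Assuming that "difference" means p1 - p2 and that both proportions have uniform priors, p1 and p2 have independent Beta(k1 + 1, n1 - k1 + 1) and Beta(k2 + 1, n2 - k2 + 1) posteriors, so P(p1 - p2 > d) is the integral over x of f1(x) F2(x - d), where f1 is the density of p1 and F2 is the distribution function of p2. The sketch below evaluates that integral on a grid. It is neither ExactNumCI's source nor the interface of its binaries: prob_diff_greater and beta_pdf are hypothetical names, and the fixed grid stands in for the package's tolerance-controlled numerics.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Density of Beta(a, b) at x in [0, 1], for a, b >= 1.
double beta_pdf(double x, double a, double b) {
    const double log_norm = std::lgamma(a + b) - std::lgamma(a) - std::lgamma(b);
    return std::exp(log_norm) * std::pow(x, a - 1.0) * std::pow(1.0 - x, b - 1.0);
}

// Hypothetical helper (not part of ExactNumCI): P(p1 - p2 > d) for independent
// p1 ~ Beta(k1 + 1, n1 - k1 + 1) and p2 ~ Beta(k2 + 1, n2 - k2 + 1).
double prob_diff_greater(int k1, int n1, int k2, int n2, double d,
                         std::size_t steps = 20000) {
    assert(0 <= k1 && k1 <= n1 && 0 <= k2 && k2 <= n2 && steps > 1);
    const double a1 = k1 + 1.0, b1 = n1 - k1 + 1.0;
    const double a2 = k2 + 1.0, b2 = n2 - k2 + 1.0;
    const double n_steps = static_cast<double>(steps);
    const double h = 1.0 / n_steps;

    // cdf2[i] approximates P(p2 <= i / steps) by the cumulative trapezoid rule.
    std::vector<double> cdf2(steps + 1, 0.0);
    double prev = beta_pdf(0.0, a2, b2);
    for (std::size_t i = 1; i <= steps; ++i) {
        const double cur = beta_pdf(i / n_steps, a2, b2);
        cdf2[i] = cdf2[i - 1] + 0.5 * h * (prev + cur);
        prev = cur;
    }
    const double total = cdf2[steps];
    for (double& c : cdf2) c /= total;

    // P(p2 < y): clamped outside [0, 1], linearly interpolated inside.
    auto cdf2_at = [&](double y) {
        if (y <= 0.0) return 0.0;
        if (y >= 1.0) return 1.0;
        const double pos = y * n_steps;
        const std::size_t i = static_cast<std::size_t>(pos);
        if (i >= steps) return 1.0;
        return cdf2[i] + (pos - static_cast<double>(i)) * (cdf2[i + 1] - cdf2[i]);
    };

    // Trapezoid rule for the integral of f1(x) * P(p2 < x - d) over [0, 1].
    double sum = 0.0;
    for (std::size_t i = 0; i <= steps; ++i) {
        const double x = i / n_steps;
        const double weight = (i == 0 || i == steps) ? 0.5 : 1.0;
        sum += weight * beta_pdf(x, a1, b1) * cdf2_at(x - d);
    }
    return sum * h;
}
```

The call prob_diff_greater(5, 10, 2, 10, 0.25) corresponds to the second pdiff example in Section 1 and can be compared with the value reported there. A quick sanity check is that two identical samples with d = 0 should give a probability of one half.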

CHAPTER 4 MISC

1. News

2013.06.15 The documentation is updated for ExactNumCI-v1.2.1. I have used two good references for writing LaTeX and BibTeX code: http://groups.mrl.uiuc.edu/chiang/czoschke/latex.html and http://schneider.ncifcrf.gov/latex.html. The R command output is generated by Sweave.

REFERENCES