The ExactNumCI Package Document

A Technical Document written by DEQIANG SUN
Baylor College of Medicine
June 12, 2013
For ExactNumCI v1.2.1
Contact: deqiangs@bcm.edu

Abstract

ExactNumCI: Exact Numerical Confidence Interval for Binomial Proportions

Keywords: Bioinformatics, Biostatistics, Methylation, Hydroxymethylation, Binomial Proportion, Exact Confidence Interval

The package ExactNumCI, available in C++ and R, provides EXACT numerical modeling for inference on binomial proportions. It calculates the confidence interval (CI) for a single binomial proportion, the CI for the difference of two binomial proportions, and the CI for a difference of differences. EXACT here means that the result is limited only by the numerical precision tolerance, not by any assumption or approximation. The method is applied directly in the digital analysis of DNA methylation and hydroxymethylation, for example in the package MOABS.

TABLE OF CONTENTS

CHAPTER 1 OVERVIEW
    1. Introduction
    2. Summary of available tools
    3. Implementation and algorithmic approach
    4. License and Availability
    5. Cite ExactNumCI
    6. Contact
CHAPTER 2 INSTALLATION
    1. Installation of C++ version
    2. Installation of R version
CHAPTER 3 MANUAL
    1. Usage In R
    2. Usage of C++ binaries
CHAPTER 4 MISC
    1. News
REFERENCES

CHAPTER 1 OVERVIEW

1. Introduction

The development was motivated by the need for an exact calculation of the confidence interval for the difference between two methylation ratios in the bioinformatics analysis of high-throughput whole-genome bisulfite sequencing data. For more details of the methods, please refer to Chapter ??.

The package ExactNumCI, available in C++ and R, provides EXACT numerical modeling for inference on binomial proportions. It calculates the confidence interval (CI) for a single binomial proportion, the CI for the difference of two binomial proportions, and the CI for a difference of differences. EXACT here means that the result is limited only by the numerical precision tolerance, not by any assumption or approximation. The method is applied directly in the digital analysis of DNA methylation and hydroxymethylation, for example in the package MOABS.

2. Summary of available tools

There are 4 major functions available in the R version.

    singleci(k, n)
gives the CI for a binomial proportion given (k, n).

    pdiff(k1, n1, k2, n2, d)
gives the probability that the difference p1 - p2 is greater than d.

    pdiffci(k1, n1, k2, n2)
gives the CI for the difference of two binomial proportions given (k1, n1, k2, n2).

    dodci(k1, n1, k2, n2, K1, N1, K2, N2)
gives the CI for the difference of differences (p1 - p2) - (P1 - P2).

By default, a uniform prior distribution of p is assumed. You may also specify the prior parameters α0 and β0, in which case p is assumed to follow a non-uniform beta prior Beta(α0, β0).

3. Implementation and algorithmic approach

ExactNumCI was first implemented in C++ and makes extensive use of data structures and fundamental algorithms from the BOOST and Numerical Recipes (NR) libraries. It was then ported to R through Dirk Eddelbuettel's excellent Rcpp library.

4. License and Availability

ExactNumCI (both C++ and R versions) is freely available under the GNU GPL v2 at the Google Code site
https://code.google.com/p/exactnumci

If you want to use the C++ sources in your project, or want to calculate the CI in a Linux/Windows terminal, you may download the C++ sources at
http://dldcc-web.brc.bcm.edu/lilab/deqiangs/exactnumci/exactnumci-v1.2.1.tar.gz

We also find it convenient to have the same capability in the R terminal. We published the R package ExactNumCI through CRAN at
http://cran.r-project.org/web/packages/exactnumci/

5. Cite ExactNumCI

To be updated.

6. Contact

The C++ package is developed by Deqiang Sun. The R package is developed by Deqiang Sun and Hyun Jung Park. Please post any questions, suggestions or problems to the exactnumci Google group, or send email to Deqiang Sun at exactnumci@googlegroups.com. You are welcome to subscribe to the exactnumci Google group for updates.

CHAPTER 2 INSTALLATION

The C++ version and the R version are independent of each other, though most of the sources are shared. You may install only the C++ version, only the R version, or both.

1. Installation of C++ version

Make sure your system (Linux/Windows/MacOS) has the environment variable BOOST_ROOT set correctly. For example, if your BOOST include files are in /share/boost-1.46.1/include/, then you need to execute the command

    export BOOST_ROOT=/share/boost-1.46.1

Since ExactNumCI only uses the header files from BOOST, you do not have to build BOOST to install it.

Commands for installation under Linux/Windows terminal:

    wget http://dldcc-web.brc.bcm.edu/lilab/deqiangs/exactnumci/exactnumci-v1.2.1.tar.gz
    tar -zxvf exactnumci-v1.2.1.tar.gz
    cd ExactNumCI-v1.2.1/
    make

2. Installation of R version

Since the R package simply calls the C++ sources through the R package Rcpp, your R installation needs to have Rcpp installed. You may download the binaries for Windows or Mac from CRAN. You may also compile the sources.

Commands for installation under Linux/Windows terminal:

    wget http://cran.r-project.org/src/contrib/exactnumci_1.0.0.tar.gz
    R CMD INSTALL exactnumci_1.0.0.tar.gz

Or you can install through the R terminal:

    install.packages("ExactNumCI", dependencies=TRUE)

CHAPTER 3 MANUAL

The reference manual for the R package is available at http://cran.r-project.org/web/packages/exactnumci/. For an example of including the sources in another project, please refer to the MOABS project.

1. Usage In R

The function singleci(k, n, α, method) returns the confidence interval of the binomial proportion given the observed number of successes k, the number of trials n, the significance level α, and the method used for the boundary condition. Under a uniform prior, the binomial proportion p follows the Beta distribution Be(p; k + 1, n - k + 1). The confidence interval of p is calculated by requiring the area under the curve to be 1 - α and applying method 1 for the boundary condition (minimal length of CI). Method 1 is currently the only implemented method. Setting α = 0.05 returns the commonly used 95% CI.

    > library(exactnumci)
    > singleci(5, 10, 0.05, 1)
    $a
    [1] 0.2337936
    $b
    [1] 0.7662064
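
The Beta posterior stated above can be cross-checked with base R alone. The following is a minimal sketch, not the package's implementation: the function name shortest_beta_interval is hypothetical, and it assumes that "minimal length of CI" means the shortest interval holding probability 1 - α under Be(p; k + 1, n - k + 1). It relies only on qbeta and optimize from base R.

```r
# Hypothetical helper (not part of ExactNumCI): shortest interval that
# holds probability 1 - alpha under the posterior Beta(k + 1, n - k + 1),
# i.e. the posterior of p under a uniform prior.
shortest_beta_interval <- function(k, n, alpha = 0.05) {
  stopifnot(k >= 0, k <= n, alpha > 0, alpha < 1)
  shape1 <- k + 1
  shape2 <- n - k + 1

  # Every interval [qbeta(t), qbeta(t + 1 - alpha)] with 0 < t < alpha
  # holds probability 1 - alpha; its width depends on the lower tail t.
  width <- function(t) {
    qbeta(t + 1 - alpha, shape1, shape2) - qbeta(t, shape1, shape2)
  }

  # One-dimensional search for the lower tail mass that minimises the width.
  best <- optimize(width, interval = c(0, alpha), tol = 1e-9)
  t <- best$minimum

  c(lower = qbeta(t, shape1, shape2),
    upper = qbeta(t + 1 - alpha, shape1, shape2))
}

ci <- shortest_beta_interval(5, 10, 0.05)
print(ci)
```

For k = 5 and n = 10 the posterior is symmetric, so the shortest interval coincides with the equal-tailed one, roughly (0.234, 0.766), in line with the singleci output above. Boundary handling in singleci may differ from this sketch when k is close to 0 or n.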

    > singleci(2, 10, 0.05, 1)
    $a
    [1] 0
    $b
    [1] 0.4700868

The function pdiff(k1, n1, k2, n2, d, tolerance) returns the probability that the difference of two independent binomial proportions, p1 - p2, is greater than d. Here ki is the number of successes and ni is the total number of trials observed when sampling the binomial proportion pi; p1 and p2 are independent. The numerical precision tolerance is set at 1e-16, and this argument will be dropped in later versions.

    > pdiff(5, 10, 2, 10, 0.75, 1e-16)
    [1] 0.0008290105
    > pdiff(5, 10, 2, 10, 0.25, 1e-16)
    [1] 0.5107698

The function pdiffci(k1, n1, k2, n2, α, method) returns the confidence interval of the difference between the two binomial proportions, given the observed number of successes and number of trials for each, the significance level α, and the specified boundary condition. The distribution of p1 - p2 follows from the joint probability of p1 and p2 and is determined numerically. The CI is calculated in the same way as in the function singleci(k, n, α, method).

    > pdiffci(5, 10, 2, 10, 0.05, 1)
    $a
    [1] -0.06092909
    $b
    [1] 0.543384

2. Usage of C++ binaries

CHAPTER 4 MISC

1. News

2013.06.15
The documentation is updated for ExactNumCI-v1.2.1. I have used two good references for writing LaTeX and BibTeX code: http://groups.mrl.uiuc.edu/chiang/czoschke/latex.html and http://schneider.ncifcrf.gov/latex.html. The R command output is generated by Sweave.

REFERENCES