Package tatest. July 18, 2018

Similar documents
Package ETC. February 19, 2015

Package starank. August 3, 2013

What R is. STAT:5400 (22S:166) Computing in Statistics

Package condir. R topics documented: February 15, 2017

Package pwrab. R topics documented: June 6, Type Package Title Power Analysis for AB Testing Version 0.1.0

Package clusterpower

Package RobustRankAggreg

ROTS: Reproducibility Optimized Test Statistic

Package GSRI. March 31, 2019

Package starank. May 11, 2018

Package binmto. February 19, 2015

Package infer. July 11, Type Package Title Tidy Statistical Inference Version 0.3.0

The ctest Package. January 3, 2000

Package pcr. November 20, 2017

Package merror. November 3, 2015

Automated Bioinformatics Analysis System on Chip ABASOC. version 1.1

Package Tnseq. April 13, 2017

Package irrna. April 11, 2018

Package PIPS. February 19, 2015

Package enrich. September 3, 2013

Package mgc. April 13, 2018

Package orqa. R topics documented: February 20, Type Package

Package texteffect. November 27, 2017

Package gee. June 29, 2015

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Package tpr. R topics documented: February 20, Type Package. Title Temporal Process Regression. Version

Package ArrayBin. February 19, 2015

Package gpdtest. February 19, 2015

Package frt. R topics documented: February 19, 2015

Package intccr. September 12, 2017

Package polyphemus. February 15, 2013

Package INCATome. October 5, 2017

Package rococo. October 12, 2018

Package DPBBM. September 29, 2016

Package LncPath. May 16, 2016

Package cycle. March 30, 2019

Package BoolFilter. January 9, 2017

Package IntNMF. R topics documented: July 19, 2018

Package roar. August 31, 2018

Package CONOR. August 29, 2013

Package Linnorm. October 12, 2016

Package TANOVA. R topics documented: February 19, Version Date

Package mmeta. R topics documented: March 28, 2017

The binmto Package. August 25, 2007

Package clusterrepro

Package distdichor. R topics documented: September 24, Type Package

Package PTE. October 10, 2017

Package svapls. February 20, 2015

Package kirby21.base

Package OrderedList. December 31, 2017

Package oaxaca. January 3, Index 11

Package addhaz. September 26, 2018

Package endogenous. October 29, 2016

Package twilight. August 3, 2013

Package NBLDA. July 1, 2018

Package GFD. January 4, 2018

Package corclass. R topics documented: January 20, 2016

The Power and Sample Size Application

Package FHtest. November 8, 2017

Package treedater. R topics documented: May 4, Type Package

Package rococo. August 29, 2013

Package ClusterBootstrap

Package apastyle. March 29, 2017

Package bisect. April 16, 2018

Package anidom. July 25, 2017

Package genelistpie. February 19, 2015

Package swcrtdesign. February 12, 2018

Package nlnet. April 8, 2018

Package simr. April 30, 2018

Package ECctmc. May 1, 2018

Package st. July 8, 2015

Package EFS. R topics documented:

Package arulesnbminer

Package nplr. August 1, 2015

Package GCAI.bias. February 19, 2015

Package StructFDR. April 13, 2017

STATS PAD USER MANUAL

Package cgh. R topics documented: February 19, 2015

Package ssizerna. January 9, 2017

Package Modeler. May 18, 2018

Fathom Dynamic Data TM Version 2 Specifications

Package fbroc. August 29, 2016

Package TANDEM. R topics documented: June 15, Type Package

Package rmi. R topics documented: August 2, Title Mutual Information Estimators Version Author Isaac Michaud [cre, aut]

Package CINID. February 19, 2015

Package Canopy. April 8, 2017

Package bdots. March 12, 2018

Package RNAseqNet. May 16, 2017

Introduction to R (BaRC Hot Topics)

Package G1DBN. R topics documented: February 19, Version Date

Package binequality. July 9, 2017

How to use the DEGseq Package

Package EDDA. April 14, 2019

Package sigqc. June 13, 2018

Package EMDomics. November 11, 2018

Package tidystats. May 6, 2018

Package ThreeArmedTrials

Application of Hierarchical Clustering to Find Expression Modules in Cancer

Package InformativeCensoring

Package simctest. March 23, 2017

Transcription:

Type Package Title Two-Group Ta-Test Version 1.0 Date 2018-07-10 Author Yuan-De Tan Package tatest July 18, 2018 Maintainer Yuan-De Tan <tanyuande@gmail.com> The ta-test is a modified two-sample or two-group t-test of Gosset (1908). In small samples with less than 15 replicates,the ta-test significantly reduces type I error rate but has almost the same power with the t-test and hence can greatly enhance reliability or reproducibility of discoveries in biology and medicine. The ta-test can test single null hypothesis or multiple null hypotheses without needing to correct p-values. License GPL-3 Depends R (>= 3.3.0), stats, base,utils LazyLoad yes NeedsCompilation no Repository CRAN Date/Publication 2018-07-18 11:30:05 UTC R topics documented: tatest-package........................................ 2 Ecadherin.......................................... 2 momega........................................... 3 omega............................................ 4 rhov............................................. 6 ta.test............................................ 7 ttest............................................. 9 Index 12 1

2 Ecadherin tatest-package Two-Group T _α-test The t α -test is a modified two-sample or two-group t-test of Gosset(1908). The package tatest consists of three t-tests: ttest, tatest and mtatest. ttest, rhov,and omega construct tatest, ttest and rhov and momega construct mtatest. t α -test is used to test for a single null hypothesis and mt α -test is used to identify differential expressed genes in microarray data and RPPA data or proteomic data. Author(s) Yuande Tan <tanyuande@gmail.com> Yuande Tan Maintainer: <tanyuande@gmail.com> See Also t.test X<-c(112,122,108,127) Y<-c(302, 314,322,328) dat<-cbind(x,y) ta.test(x=dat,na=4,nb=4, eqv=true, paired=false, alpha=0.05, alternative = "two.sided",log="null") t.test(x=x,y=y) ttest(x,y,alternative = "two.sided", LG = "NULL") Ecadherin A microarray data Analysis of the effects of loss of E-cadherin and cell adhesion on human mammary epithelial cells Usage data("ecadherin")

momega 3 Format A data frame with 5000 observations on the following 7 variables. geneid a factor with levels shcntrl_rep1 a numeric vector shcntrl_rep2 a numeric vector shcntrl_rep3 a numeric vector shecad_rep1 a numeric vector shecad_rep2 a numeric vector shecad_rep3 a numeric vector Loss of the epithelial adhesion molecule E-cadherin is thought to enable metastasis by disrupting intercellular contacts-an early step in metastatic dissemination. To further investigate the molecular basis of this notion, two methods were used to inhibit E-cadherin function that distinguish between E-cadherin s cell-cell adhesion and intracellular signaling functions. While the disruption of cell-cell contacts alone does not enable metastasis, the loss of E-cadherin protein does, through induction of an epithelial-to-mesenchymal transition, invasiveness and anoikis-resistance. In addition, gene expression analysis shows that E-cadherin loss results in the induction of multiple transcription factors, at least one of which, Twist, is necessary for E-cadherin loss-induced metastasis. These findings indicate that E-cadherin loss in tumors contributes to metastatic dissemination by inducing wide-ranging transcriptional and functional changes. This dataset consists 5000 of 22277 gene libraries, two conditions shcntrl and shecad, each having 3 replicates. With access number GSE9691, you can obtain a whole data of 22277 gene libraries in GEO. data(ecadherin) momega Null rho for testing multiple null hypotheses Momega is a function that is used to estimate omega for testing multiple null hypotheses. Omege is a null ρ that is used as a threshold for ρ from real data. Since this omega is majorly applied to micoarray, we named this function as momega. Usage momega(dat, na, nb, distr = "norm")

4 omega Arguments dat Value a matrix dataset containing string columns and two-condition numeric columns na numeric value, size of sample 1 or number of replicates in condition A. nb numeric value, size of sample 1 or number of replicates in condition A. distr string, data distribution. This function is to use null data to calculate omega value with rho = 1. return a numeric value Author(s) Yuan-De Tan <tanyuande@gmail.com> References Yuan-De Tan Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson (2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis. Plos One. 2015 DOI: 10.1371/journal.pone.0123658. See Also omega,rhov data(ecadherin) Ecad<-as.data.frame(Ecadherin) Ecad1<-Ecad[,-1] w<-momega(dat=ecad1[1:2000,], na=3, nb=3, distr = "norm") omega Omega calcularion Usage Omega (ω) function is a function that is used to estimate omega using simulate null data from negative bionomial distribution. Omege is a null rho that is used as a threshold for real rho. Simulation is dependent on the original data. omega(xa, XB, na, nb, m, alpha = 0.05, distr = "norm")

omega 5 Arguments XA XB a numeric vector values of condition A a numeric vector values of condition b na a numeric value, sample size in A. nb m a numeric value, sample size in B simulation number specified. alpha a numeric value, statistical cutoff. Default is 0.05. distr data distribution specified for data. For current version, we consider three distributions: normal, negative binomial (NB) and uniform. Default distribution is "norm". If user believe that the data follow negative binomial distribution, then set distr="nb" or "negative binomial" or if the data follow uniform distribution, then set distr="unif"or "uniform". This function is to use simulated null data to calculate omega value with rho = 1. Value return a numeric value Author(s) Yuan-De Tan <tanyuande@gmail.com> References Yuan-De Tan Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson(2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis. Plos One. 2015 DOI: 10.1371/journal.pone.0123658. See Also momega, rhov X<-c(112,122,108,127) Y<-c(302, 314,322,328) omega(xa=x, XB=Y, na=4, nb=4, m=2000, alpha = 0.05) #[1] 0.9055152 omega(xa=x, XB=Y, na=4, nb=4, m=2000, alpha = 0.05,distr="NB") #[1] 0.8995424 omega(xa=x, XB=Y, na=4, nb=4, m=2000, alpha = 0.05,distr="uniform") #[1] 0.97194

6 rhov rhov Calculation of rho (ρ) value Usage The rho is used to define a gap and variation between two datasets, similar to fc(fold change). rhov(xa, XB) Arguments Value XA a numeric vector in condition A. XB a numeric vector in condition B. The rho is defined as square root of products of and, that is ψ ζ ρ = ψζ where ψ is used to define a overlap between two datasets and ζ is used to define effect variation. ρ > 1 indicates that there is gap between two datasets and noise is smaller than treatment effect, otherwise, there is overall lap and big noise in two datasets. return a numeric value Author(s) Yuan-De Tan <tanyuande@gmail.com> References Yuan-De Tan Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson(2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis. Plos One. 2015 DOI: 10.1371/journal.pone.0123658. See Also omega

ta.test 7 X<-c(112,122, 108,127) Y<-c(302, 314, 322, 328) rhov(xa=x,xb=y) #[1] 3.062173 ta.test t_α-test for testing difference between two conditions with small samples Usage t α -test is a modified t-test of Gosset(1908). ta.test(x, nci=null, na, nb, alpha=0.05, paired=false, eqv=true, LOG=c("NULL","LOG2","LOG","LOG10"), alternative = c("two.sided", "less", "greater"), distr="norm") Arguments X nci na nb alpha paired eqv LOG alternative a numeric vector data or a numeric matrix (two columns) dataset with two conditions or groups A and B having sample size na and nb, respectively. X is also a datasheet of multiple variables (for example, genes ) with two groups A and B and information columns(see an example Ecadherin) a numeric int value, column numbers for strings or information of data. If X is a numeric vector data or a numeric matrix (two columns) dataset, then nci=null, otherwise, nci is non-zeror int, meaning that at least one column is reserved as information of data numeric constant, sample size in condition A numeric constant, sample size in condition B a numeric constant, statistical cutoff for significance level. If dataset X is multiple variables (rows) (variable number is over 1000), then alpha is taken default values of 0.05 and 0.01. a logical indicating whether you want a paired t-test. a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used. logarithm: if data are not taken logarithm convertion, then LOG=NULL, if data are converted into logarithm with base 10, 2 or natural logarithm, then correspondingly, LOG="LOG10", "LOG2" or "LOG". a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". User can specify just the initial letter.

8 ta.test distr data distribution for estimating omega (ω). Default is normal distribution. In current version, we just consider three types of distributions: "negative distribution"(or "NB"), "norm"(or "normal"), and "unif"("uniform"). User can choose one of them according to user s data distribution. The formula interface is only applicable for the 2-sample tests. alternative = "greater" is the alternative that x has a larger mean than y. If paired is TRUE then both x and y must be specified and they must be the same length. Missing values are silently removed (in pairs if paired is TRUE). If var.equal is TRUE then the pooled estimate of the variance is used. By default, if var.equal is FALSE, then the variance is estimated separately for both groups and the Welch modification to the degrees of freedom is used. If the input data are effectively constant (compared to the larger of the two means) an error is generated. When pooled variance is larger, ttest is equvalent to t.test; When pooled variance is small, t-tavalue of ttest is smaller than than that of t.test; When the pooled variance is zeror, t-value of t.test is not defined but tvalue of ttest is still defined. Omega is dependent on alpha value. ta.test can be used to test for differentially expressed genes in microarray data, proteomic data and RPPA data but cannot be used to count data of RNA-seq reads. Value If X is single-hypothesis test data, then a list similar to t.test containing the following components: statistic the value of the t-statistic. parameter the degrees of freedom for the t-statistic. p.value the p-alue for the test. conf.int a confidence interval for the mean appropriate to the specified alternative hypothesis. estimate the estimated mean or difference in means depending on whether it was a one-sample test or a twosample test. null.value the specified hypothesized value of the mean or mean difference depending on whether it was a one-sample test or a two-sample test. alternative a character string describing the alternative hypothesis. method a character string indicating what type of t-test was performed. data.name a character string giving the name(s) of the data. omega a threshold for rho in tatest. If X is microarray, RPPA data, then a list is datasheet including t01,pv01, t05 and p05 columns

ttest 9 Note LOG choice must be the same with log transformation of data. Author(s) Yuande Tan <tanyuande@gmail.com> References See Also Gosset, W.S, "Probable error of a correlation coefficient". Biometrika. 1908,6 (2/3): 302-310. t.test,ttest X<-c(112,122,108,127) Y<-c(302, 314,322,328) dat<-cbind(x,y) ta.test(x=dat,nci=null, na=4,nb=4, eqv=true, alpha=0.05, LOG="NULL", alternative = "two.sided") data(ecadherin) res<-ta.test(x=ecadherin[1:2000,],nci=1, na=3,nb=3, eqv=true, LOG="LOG2", alternative = "two.sided") ttest A modified t-test Usage ttest is a modified t-test of Gosset(1098). ttest(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95, LG = c("null", "LOG2", "LOG", "LOG10"),...) Arguments x y alternative a (non-empty) numeric vector of data values. an optional (non-empty) numeric vector of data values. For t α -test, y is required a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.

10 ttest mu paired var.equal conf.level LG a number indicating the true value of the mean (or difference in means if you are performing a two sample test). a logical indicating whether you want a paired t-test. a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used. confidence level of the interval. logarithm: if x and y are not taken logarithm convertion, the LG=NULL, if x and y are converted into logarithm with base 10, 2 or natural logarithm, then correspondingly, LG="LOG10",... further arguments to be passed to or from methods. The formula interface is only applicable for the 2-sample tests. alternative = "greater" is the alternative that x has a larger mean than y. If paired is TRUE then both x and y must be specified and they must be the same length. Missing values are silently removed (in pairs if paired is TRUE). If var.equal is TRUE then the pooled estimate of the variance is used. By default, if var.equal is FALSE, then the variance is estimated separately for both groups and the Welch modification to the degrees of freedom is used. If the input data are effectively constant (compared to the larger of the two means) an error is generated. When pooled variance is larger, ttest is equvalent to t.test; When pooled variance is small, t-tavalue of ttest is smaller than than that of t.test; When the pooled variance is zeror, t-value of t.test is not defined but tvalue of ttest is still defined. Value A list with class "htest" containing the following components: statistic the value of the t-statistic. parameter the degrees of freedom for the t-statistic. p.value the p-value for the test. conf.int a confidence interval for the mean appropriate to the specified alternative hypothesis. estimate the estimated mean or difference in means depending on whether it was a one-sample test or a twosample test. null.value the specified hypothesized value of the mean or mean difference depending on whether it was a one-sample test or a two-sample test. alternative a character string describing the alternative hypothesis. method

ttest 11 a character string indicating what type of t-test was performed. data.name a character string giving the name(s) of the data. Author(s) Yuande Tan <tanyuande@gmail.com> References See Also Gosset, W.S, "Probable error of a correlation coefficient". Biometrika. 1908,6(2/3): 302-310. stats{t.test}, ta.test # The following example is log10 transformed data YA<-c(4.1455383,4.7186418,4.6483316) YB<-c(6.9184465,5.609667,6.5920944) t.test(x=ya, y=yb) ttest(ya,yb,alternative = "two.sided", LG = "NULL") ttest(ya,yb,alternative = "two.sided", LG = "LOG2") ttest(ya,yb,alternative = "two.sided", LG = "LOG10")

Index Topic dataset Ecadherin, 2 Topic gap rhov, 6 Topic multiple tests momega, 3 Topic noise rhov, 6 Topic null rho momega, 3 Topic rhov ta.test, 7 Topic rho omega, 4 Topic t.test ttest, 9 Topic tatest tatest-package, 2 Topic thresholds omega, 4 Topic ttest ta.test, 7 ttest, 9 Ecadherin, 2 momega, 3, 5 omega, 4, 4, 6 rhov, 4, 5, 6 stats, 11 t.test, 2, 9 ta.test, 7, 11 tatest-package, 2 ttest, 9, 9 12