The ctest Package. January 3, 2000



R objects documented: bartlett.test, binom.test, cor.test, fisher.test, friedman.test, kruskal.test, ks.test, mantelhaen.test, mcnemar.test, var.test, wilcox.test, mood.test, ansari.test, shapiro.test

bartlett.test    Bartlett Test for Homogeneity of Variances

Description

bartlett.test performs Bartlett's test of the null hypothesis that the variances in each of the groups (samples) are the same.

Usage

bartlett.test(x, g)

Arguments

x    a numeric vector of data values, or a list of numeric data vectors.
g    a vector or factor object giving the group for the corresponding elements of x. Ignored if x is a list.

Details

If x is a list, its elements are taken as the samples to be compared for homogeneity of variances, and hence have to be numeric data vectors. In this case, g is ignored, and one can simply use bartlett.test(x) to perform the test. If the samples are not yet contained in a list, use bartlett.test(list(x, ...)).

Otherwise, x must be a numeric data vector, and g must be a vector or factor object of the same length as x giving the group for the corresponding elements of x.

Value

A list of class "htest" containing the following components:

statistic    Bartlett's K-square test statistic.
parameter    the degrees of freedom of the approximate chi-square distribution of the test statistic.
method       the string "Bartlett test for homogeneity of variances".

binom.test    Exact Binomial Test

Description

binom.test performs an exact test of the null hypothesis that the probability of success in a Bernoulli experiment of length n is p, based on the number of successes x observed.

Usage

binom.test(x, n, p = 0.5, alternative = "two.sided")

Arguments

x            number of successes.
n            number of trials.
p            probability of success.
alternative  indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter.

Value

statistic    the number of successes, x.
parameter    the number of trials, n.
null.value   the probability of success under the null hypothesis, p.
alternative  a character string describing the alternative hypothesis.
method       the string "Exact binomial test".
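The binom.test interface described above can be sketched as follows. The counts are hypothetical, and the sketch assumes a current R installation, where binom.test is provided by the stats package rather than by a separate ctest package.

```r
# Hypothetical data: 7 successes observed in 20 Bernoulli trials.
successes <- 7
trials <- 20

# Two-sided exact test of the null hypothesis p = 0.5.
res <- binom.test(successes, trials, p = 0.5, alternative = "two.sided")

# One-sided variant; only the initial letter of the alternative is needed.
res_less <- binom.test(successes, trials, p = 0.5, alternative = "l")

res$statistic    # number of successes
res$parameter    # number of trials
res$null.value   # probability of success under the null hypothesis
res$p.value
```

The returned object has class "htest", so printing res gives a formatted summary of the same components.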

cor.test    Test for Zero Correlation

Description

cor.test tests the null hypothesis that x and y are uncorrelated (independent).

Usage

cor.test(x, y, alternative = "two.sided", method = "pearson", exact = NULL)

Arguments

x, y         numeric vectors of data values. x and y must have the same length.
alternative  indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter.
method       a string indicating which correlation coefficient is used for the test. Must be one of "pearson", "kendall", or "spearman". Only the first character is necessary.
exact        a logical indicating whether an exact p-value should be computed.

Details

If method is "pearson", the test statistic is based on Pearson's product moment correlation coefficient cor(x, y) and follows a t distribution with length(x)-2 degrees of freedom.

If method is "kendall" or "spearman", Kendall's tau or Spearman's rho, respectively, are used to estimate the correlation. These tests should be used if the data do not necessarily come from a bivariate normal distribution.

For Kendall's test, by default (if exact is not specified), an exact p-value is computed if both samples contain fewer than 50 finite values and there are no ties. Otherwise, the standardized estimate is used as the test statistic, and is approximately normally distributed.

For Spearman's test, p-values are computed using algorithm AS 89.

Value

statistic    the value of the test statistic.
parameter    the degrees of freedom of the test statistic in the case that it follows a t distribution.
estimate     the estimated correlation coefficient, with names attribute "cor", "tau", or "rho", corresponding to the method employed.
null.value   the value of the correlation coefficient under the null hypothesis, hence 0.
alternative  a character string describing the alternative hypothesis.
method       a string indicating how the correlation was estimated.
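A minimal sketch of the three methods described above, using simulated data (the seed, sample size and slope are arbitrary choices for illustration) and assuming a current R installation where cor.test is provided by the stats package:

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)         # y is constructed to correlate with x

# Pearson: t statistic with length(x) - 2 degrees of freedom.
pearson <- cor.test(x, y, alternative = "two.sided", method = "pearson")

# Rank-based tests, which do not rely on bivariate normality.
kendall  <- cor.test(x, y, method = "kendall")
spearman <- cor.test(x, y, method = "spearman")

pearson$parameter    # degrees of freedom
pearson$estimate     # named "cor"
kendall$estimate     # named "tau"
spearman$estimate    # named "rho"
```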

References

D. J. Best & D. E. Roberts (1975), Algorithm AS 89: The Upper Tail Probabilities of Spearman's ρ. Applied Statistics, 24.

fisher.test    Fisher's Exact Test for Count Data

Description

fisher.test performs Fisher's exact test for testing the null hypothesis of independence of rows and columns in a contingency table with fixed marginals.

Usage

fisher.test(x, y, alternative = "two.sided", workspace = , hybrid = FALSE)

Arguments

x            either a two-dimensional contingency table in matrix form, or a factor object.
y            a factor object; ignored if x is a matrix.
alternative  indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter. Only used in the 2 by 2 case.
workspace    an integer specifying the size of the workspace used in the network algorithm.
hybrid       a logical indicating whether the exact probabilities (default) or a hybrid approximation thereof should be computed. In the hybrid case, asymptotic chi-square probabilities are only used provided that the Cochran conditions are satisfied.

Details

If x is a matrix, it is taken as a two-dimensional contingency table, and hence its entries should be nonnegative integers. Otherwise, both x and y must be vectors of the same length. Incomplete cases are removed, the vectors are coerced into factor objects, and the contingency table is computed from these.

In the one-sided 2 by 2 cases, p-values are obtained directly using the hypergeometric distribution. Otherwise, computations are based on a C version of the FORTRAN subroutine FEXACT, which implements the network algorithm developed by Mehta and Patel (1986) and improved by Clarkson, Fan & Joe (1993). The FORTRAN code can be obtained from [...].

Value

alternative  a character string describing the alternative hypothesis.
method       the string "Fisher's Exact Test for Count Data".
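The one-sided 2 by 2 case described above can be sketched as follows. The table is hypothetical, the workspace argument is left at its default, and the sketch assumes a current R installation where fisher.test is provided by the stats package. The second call checks the p-value against the hypergeometric distribution directly.

```r
# Hypothetical 2 by 2 contingency table in matrix form.
tab <- matrix(c(3, 1,
                1, 3),
              nrow = 2,
              dimnames = list(guess = c("A", "B"), truth = c("A", "B")))

# One-sided exact test (only used in the 2 by 2 case).
res <- fisher.test(tab, alternative = "greater")

# Upper-tail hypergeometric probability of observing 3 or more in cell [1, 1],
# given column totals 4 and 4 and a first-row total of 4.
p_direct <- phyper(tab[1, 1] - 1, 4, 4, 4, lower.tail = FALSE)

res$p.value
p_direct
```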

References

Cyrus R. Mehta & Nitin R. Patel (1986). Algorithm 643. FEXACT: A Fortran subroutine for Fisher's exact test on unordered r*c contingency tables. ACM Transactions on Mathematical Software, 12.

Douglas B. Clarkson, Yuan-an Fan & Harry Joe (1993). A Remark on Algorithm 643: FEXACT: An Algorithm for Performing Fisher's Exact Test in r x c Contingency Tables. ACM Transactions on Mathematical Software, 19.

friedman.test    Friedman Rank Sum Test

Description

Performs a Friedman rank sum test with unreplicated blocked data.

Usage

friedman.test(y, groups, blocks)

Arguments

y       either a numeric vector of data values, or a data matrix.
groups  a vector giving the group for the corresponding elements of y if this is a vector; ignored if y is a matrix. If not a factor object, it is coerced to one.
blocks  a vector giving the block for the corresponding elements of y if this is a vector; ignored if y is a matrix. If not a factor object, it is coerced to one.

Details

friedman.test can be used for analyzing unreplicated complete block designs (i.e., there is exactly one observation in y for each combination of levels of groups and blocks) where the normality assumption may be violated.

The null hypothesis is that apart from an effect of blocks, the location parameter of y is the same in each of the groups.

If y is a matrix, groups and blocks are obtained from the column and row indices, respectively. NAs are not allowed in groups or blocks; if y contains NAs, corresponding blocks are removed.

Value

statistic  the value of Friedman's chi-square statistic.
parameter  the degrees of freedom of the approximate chi-square distribution of the test statistic.
method     the string "Friedman rank sum test".
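The two calling conventions described above (a data matrix, or a vector with explicit groups and blocks) can be sketched as follows. The measurements are hypothetical, and the sketch assumes a current R installation where friedman.test is provided by the stats package.

```r
# Hypothetical unreplicated complete block design:
# 5 blocks (rows) by 3 groups (columns), one observation per cell.
y <- matrix(c(5.4, 6.1, 7.2,
              4.8, 5.9, 6.5,
              6.0, 5.7, 7.8,
              5.1, 6.3, 6.9,
              4.5, 5.2, 6.6),
            nrow = 5, byrow = TRUE)

# Matrix form: groups and blocks come from the column and row indices.
res_matrix <- friedman.test(y)

# Vector form: the same data with explicit group and block labels.
y_vec  <- as.vector(y)
groups <- as.vector(col(y))
blocks <- as.vector(row(y))
res_vector <- friedman.test(y_vec, groups, blocks)

res_matrix$statistic
res_matrix$parameter   # number of groups minus 1
```

Both calls describe the same design, so they should report the same statistic.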

kruskal.test    Kruskal-Wallis Rank Sum Test

Description

Performs a Kruskal-Wallis rank sum test.

Usage

kruskal.test(x, g)

Arguments

x    a numeric vector of data values, or a list of numeric data vectors.
g    a vector or factor object giving the group for the corresponding elements of x. Ignored if x is a list.

Details

kruskal.test performs a Kruskal-Wallis rank sum test of the null hypothesis that the location parameters of the distribution of x are the same in each group (sample). The alternative is that they differ in at least one.

If x is a list, its elements are taken as the samples to be compared, and hence have to be numeric data vectors. In this case, g is ignored, and one can simply use kruskal.test(x) to perform the test. If the samples are not yet contained in a list, use kruskal.test(list(x, ...)).

Otherwise, x must be a numeric data vector, and g must be a vector or factor object of the same length as x giving the group for the corresponding elements of x.

Value

statistic  the Kruskal-Wallis rank sum statistic.
parameter  the degrees of freedom of the approximate chi-square distribution of the test statistic.
method     the string "Kruskal-Wallis rank sum test".
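A short sketch of the list form and the vector-plus-grouping form described above. The three samples are hypothetical, and the sketch assumes a current R installation where kruskal.test is provided by the stats package.

```r
# Three hypothetical samples of unequal size.
a  <- c(27, 31, 24, 29, 35)
b  <- c(38, 41, 33, 36)
c3 <- c(22, 25, 30, 21, 26)

# List form: g is not needed.
res_list <- kruskal.test(list(a, b, c3))

# Vector form: one data vector plus a grouping factor of the same length.
x <- c(a, b, c3)
g <- factor(rep(1:3, times = c(length(a), length(b), length(c3))))
res_vec <- kruskal.test(x, g)

res_list$statistic
res_list$parameter   # number of groups minus 1
```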

ks.test    Kolmogorov-Smirnov Tests

Description

Performs one or two sample Kolmogorov-Smirnov tests.

Usage

ks.test(x, y, ..., alternative = "two.sided")

Arguments

x            a numeric vector of data values.
y            either a numeric vector of data values, or a character string naming a distribution function.
...          parameters of the distribution specified by y.
alternative  indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter.

Details

If y is numeric, a two sample test of the null hypothesis that x and y were drawn from the same distribution is performed.

Alternatively, y can be a character string naming a distribution function. In this case, a one sample test of the null hypothesis that the distribution function underlying x is y with parameters specified by ... is carried out.

Value

statistic    the value of the test statistic.
alternative  a character string describing the alternative hypothesis.
method       a character string indicating what type of test was performed.
data.name    a character string giving the name(s) of the data.

Examples

    x <- rnorm(50)
    y <- runif(30)
    # Do x and y come from the same distribution?
    ks.test(x, y)
    # Does x come from a gamma distribution with shape 3 and scale 2?
    # The parameters are named so that they do not depend on pgamma's argument order.
    ks.test(x, "pgamma", shape = 3, scale = 2)

mantelhaen.test    Mantel-Haenszel Chi-Square Test for Count Data

Description

mantelhaen.test performs a Mantel-Haenszel chi-square test of the null hypothesis that x and y are conditionally independent in each stratum.

Usage

mantelhaen.test(x, y = NULL, z = NULL, correct = TRUE)

Arguments

x        either an array of dimension 2 by 2 by s, where s is the number of strata, or a dichotomous factor object.
y        a dichotomous factor object; ignored if x is an array.
z        a factor object identifying to which stratum the corresponding elements in x and y belong; ignored if x is an array.
correct  a logical indicating whether to apply continuity correction when computing the test statistic.

Details

If x is an array, it must be of dimension 2 by 2 by s, and the entries should be nonnegative integers. NAs are not allowed. Otherwise, x, y and z must have the same length. Triples containing NAs are removed. Both x and y must be dichotomous (take exactly 2 values).

Value

statistic  the Mantel-Haenszel chi-square statistic.
parameter  always 1, the degrees of freedom of the approximate chi-square distribution of the test statistic.
method     a string indicating the method employed, and whether or not continuity correction was used.
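The array form described above can be sketched as follows. The counts are hypothetical (three strata, each a 2 by 2 table), and the sketch assumes a current R installation where mantelhaen.test is provided by the stats package and accepts the arguments shown in the usage above.

```r
# Hypothetical 2 by 2 by 3 array of counts: one 2 by 2 table per stratum.
tab <- array(c(10, 5, 4, 12,    # stratum 1
                8, 6, 3,  9,    # stratum 2
               12, 4, 6, 11),   # stratum 3
             dim = c(2, 2, 3))

# With and without continuity correction.
res_corrected   <- mantelhaen.test(tab, correct = TRUE)
res_uncorrected <- mantelhaen.test(tab, correct = FALSE)

res_corrected$statistic
res_corrected$parameter   # degrees of freedom
res_corrected$method      # states whether continuity correction was used
```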

mcnemar.test    McNemar's Chi-square Test for Count Data

Description

Performs McNemar's chi-square test for symmetry of rows and columns in a two-dimensional contingency table.

Usage

mcnemar.test(x, y = NULL, correct = TRUE)

Arguments

x        either a two-dimensional contingency table in matrix form, or a factor object.
y        a factor object; ignored if x is a matrix.
correct  a logical indicating whether to apply continuity correction when computing the test statistic.

Details

The null hypothesis is that the probabilities of being classified into cells [i,j] and [j,i] are the same.

If x is a matrix, it is taken as a two-dimensional contingency table, and hence its entries should be nonnegative integers. Otherwise, both x and y must be vectors of the same length. Incomplete cases are removed, the vectors are coerced into factor objects, and the contingency table is computed from these.

Continuity correction is only used in the 2-by-2 case if correct is TRUE.

Value

statistic  the value of McNemar's statistic.
parameter  the degrees of freedom of the approximate chi-square distribution of the test statistic.
method     a character string indicating the type of test performed, and whether continuity correction was used.
data.name  a character string giving the name(s) of the data.
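A minimal sketch of the 2-by-2 case described above, with hypothetical paired classifications and assuming a current R installation where mcnemar.test is provided by the stats package. Only the off-diagonal cells [1,2] and [2,1] enter the statistic, which the last lines recompute by hand under the usual continuity-corrected formula.

```r
# Hypothetical paired classifications (rows: first rating, columns: second rating).
tab <- matrix(c(40, 12,
                 5, 43),
              nrow = 2,
              dimnames = list(first = c("yes", "no"), second = c("yes", "no")))

res <- mcnemar.test(tab, correct = TRUE)

# Hand computation from the off-diagonal cells, with continuity correction.
b <- tab[1, 2]
c_ <- tab[2, 1]
stat_by_hand <- (abs(b - c_) - 1)^2 / (b + c_)

res$statistic
stat_by_hand
res$parameter
```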

var.test    F Test to Compare Two Variances

Description

Performs an F test to compare the variances of two samples from normal populations.

Usage

var.test(x, y, ratio = 1, alternative = "two.sided", conf.level = 0.95)

Arguments

x, y         numeric vectors of data values.
ratio        the hypothesized ratio of the population variances of x and y.
alternative  indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter.
conf.level   confidence level for the returned confidence interval.

Details

The null hypothesis is that the ratio of the variances of the populations from which x and y were drawn is equal to ratio.

Value

statistic    the value of the F test statistic.
parameter    the degrees of freedom of the F distribution of the test statistic.
conf.int     a confidence interval for the ratio of the population variances.
estimate     the ratio of the sample variances of x and y.
null.value   the ratio of population variances under the null hypothesis.
alternative  a character string describing the alternative hypothesis.
method       the string "F test to compare two variances".
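The components listed above can be inspected with a sketch like the following. The data are simulated (seed, sample sizes and standard deviations are arbitrary illustration choices), and the sketch assumes a current R installation where var.test is provided by the stats package.

```r
set.seed(42)                 # arbitrary seed, for reproducibility only
x <- rnorm(25, sd = 2)
y <- rnorm(30, sd = 1)

res <- var.test(x, y, ratio = 1, alternative = "two.sided", conf.level = 0.95)

res$estimate    # ratio of the sample variances, var(x) / var(y)
res$parameter   # numerator and denominator degrees of freedom
res$conf.int    # confidence interval for the ratio of population variances
res$p.value
```

With ratio = 1, the F statistic coincides with the estimated variance ratio.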

wilcox.test    Wilcoxon Rank Sum and Signed Rank Tests

Description

Performs one and two sample Wilcoxon tests on vectors of data.

Usage

wilcox.test(x, y = NULL, alternative = "two.sided", mu = 0, paired = FALSE, exact = NULL, correct = TRUE)

Arguments

x            numeric vector of data values.
y            an optional numeric vector of data values.
alternative  indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter.
mu           a number specifying an optional location parameter.
paired       a logical indicating whether you want a paired test.
exact        a logical indicating whether an exact p-value should be computed.
correct      a logical indicating whether to apply continuity correction in the normal approximation for the p-value.

Details

If only x is given, or if both x and y are given and paired is TRUE, a Wilcoxon signed rank test of the null hypothesis that the median of x (in the one sample case) or of x-y (in the paired two sample case) equals mu is performed.

Otherwise, if both x and y are given and paired is FALSE, a Wilcoxon rank sum test (equivalent to the Mann-Whitney test) is carried out. In this case, the null hypothesis is that the locations of the distributions of x and y differ by mu.

By default (if exact is not specified), an exact p-value is computed if the samples contain fewer than 50 finite values and there are no ties. Otherwise, a normal approximation is used.

Value

statistic    the value of the test statistic with a name describing it.
parameter    the parameter(s) for the exact distribution of the test statistic. Currently, only normal approximations are used.
p.value      the p-value for the test.
null.value   the location parameter mu.
alternative  a character string describing the alternative hypothesis.
method       the type of test applied.
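The three cases described above (one sample, paired, and unpaired two sample) can be sketched as follows. The measurements are hypothetical and chosen to contain no ties, and the sketch assumes a current R installation where wilcox.test is provided by the stats package.

```r
# Hypothetical paired measurements without ties or zero differences.
before <- c(12.1, 14.3, 11.8, 15.2, 13.4, 12.9, 14.8, 13.1)
after  <- c(11.4, 13.3, 12.0, 14.1, 12.8, 12.6, 13.5, 12.95)

# Paired two sample case: signed rank test on before - after.
paired <- wilcox.test(before, after, paired = TRUE, mu = 0)

# One sample case applied to the differences: the same test.
one_sample <- wilcox.test(before - after, mu = 0)

# Unpaired two sample case: rank sum (Mann-Whitney) test.
rank_sum <- wilcox.test(before, after, paired = FALSE)

paired$statistic     # signed rank statistic
rank_sum$statistic   # rank sum statistic
paired$p.value
```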

mood.test    Mood Two-Sample Test of Scale

Description

Performs Mood's two-sample test of scale.

Usage

mood.test(x, y, alternative = "two.sided")

Arguments

x, y         numeric vectors of data values.
alternative  indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter.

Details

The underlying model is that the two samples are drawn from f(x - l) and f((x - l)/s)/s, respectively, where l is a common location parameter and s is a scale parameter.

The null hypothesis is s = 1.

There are more useful tests for this problem.

Value

statistic    the value of the test statistic.
alternative  a character string describing the alternative hypothesis.
method       the string "Mood two-sample test of scale".

ansari.test    Ansari-Bradley Test

Description

Performs the Ansari-Bradley test for a difference in scale parameters.

Usage

ansari.test(x, y, alternative = "two.sided", exact = NULL)

Arguments

x            numeric vector of data values.
y            numeric vector of data values.
alternative  indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter.
exact        a logical indicating whether an exact p-value should be computed.

Details

Suppose that x and y are independent samples from distributions with densities f((t - m)/s)/s and f(t - m), respectively, where m is an unknown nuisance parameter and s is the parameter of interest. The Ansari-Bradley test is used for testing the null hypothesis that s equals 1, the two-sided alternative being that s != 1 (the distributions differ only in variance), and the one-sided alternatives being s > 1 (the distribution underlying x has a larger variance, "greater") or s < 1 ("less").

By default (if exact is not specified), an exact p-value is computed if both samples contain fewer than 50 finite values and there are no ties. Otherwise, a normal approximation is used.

Value

statistic    the value of the Ansari-Bradley test statistic.
alternative  a character string describing the alternative hypothesis.
method       the string "Ansari-Bradley test".

References

Myles Hollander & Douglas A. Wolfe (1973), Nonparametric statistical inference. New York: John Wiley & Sons.

Examples

    ## Hollander & Wolfe (1973, p. 86f):
    ## Serum iron determination using Hyland control sera
    ramsay <- c(111, 107, 100, 99, 102, 106, 109, 108, 104, 99,
                101, 96, 97, 102, 107, 113, 116, 113, 110, 98)
    jung.parekh <- c(107, 108, 106, 98, 105, 103, 110, 105, 104, 100,
                     96, 108, 103, 104, 114, 114, 113, 108, 106, 99)
    ansari.test(ramsay, jung.parekh)

shapiro.test    Shapiro-Wilk Normality Test

Description

Performs the Shapiro-Wilk test for normality.

Usage

shapiro.test(x)

Arguments

x    a numeric vector of data values, the number of which must be between 3 and [...]. Missing values are allowed.

Value

statistic  the value of the Shapiro-Wilk statistic.
p.value    the p-value for the test.
method     the string "Shapiro-Wilk normality test".
data.name  a character string giving the name(s) of the data.

References

Patrick Royston (1982), An Extension of Shapiro and Wilk's W Test for Normality to Large Samples. Applied Statistics, 31.

Patrick Royston (1982), Algorithm AS 181: The W Test for Normality. Applied Statistics, 31.

Patrick Royston (1995), A Remark on Algorithm AS 181: The W Test for Normality. Applied Statistics, 44.

See Also

qqnorm for producing a normal quantile-quantile plot.

Examples

    shapiro.test(rnorm(100, mean = 5, sd = 3))
    shapiro.test(runif(100, min = 2, max = 4))

