Detecting Treatment Effects with Small Samples: The Power of Some Tests Under the Randomization Model

Published online by Cambridge University Press: 01 January 2025

Bryan Keller*
Affiliation:
University of Wisconsin–Madison
* Requests for reprints should be sent to Bryan Keller, University of Wisconsin–Madison, Madison, WI, USA. E-mail: bskeller@wisc.edu

Abstract

Randomization tests are often recommended when parametric assumptions may be violated because they require no distributional or random sampling assumptions to be valid. In addition to being exact, a randomization test may also be more powerful than its parametric counterpart. This was demonstrated in a simulation study that examined the conditional power of three nondirectional tests: the randomization t test, the Wilcoxon–Mann–Whitney (WMW) test, and the parametric t test. When the treatment effect was skewed, with the degree of skewness correlated with the size of the effect, the randomization t test was systematically more powerful than the parametric t test. The relative power of the WMW test under the skewed treatment effect condition depended on the sample size ratio.
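
For readers unfamiliar with the procedure, the following is a minimal sketch of an exact, nondirectional two-sample randomization t test. It is not the code used in the study, and the function names `pooled_t` and `randomization_t_test` are hypothetical names chosen for illustration. The sketch assumes a completely randomized two-group design, enumerates every possible assignment of the observed scores to the two groups, and reports the proportion of assignments whose absolute t statistic is at least as large as the observed one. It also assumes the scores are not all identical within both groups, so that the pooled variance is never zero.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def pooled_t(group1, group2):
    """Student's two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    # Assumption: pooled_var > 0 (the groups are not both constant).
    return (mean(group1) - mean(group2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def randomization_t_test(treated, control):
    """Exact nondirectional randomization p value based on |t|.

    Every way of assigning len(treated) of the pooled scores to the first
    group is treated as equally likely under the null hypothesis.
    """
    scores = list(treated) + list(control)
    n1 = len(treated)
    indices = range(len(scores))
    observed = abs(pooled_t(treated, control))
    extreme = 0
    total = 0
    for chosen in combinations(indices, n1):
        chosen_set = set(chosen)
        group1 = [scores[i] for i in chosen]
        group2 = [scores[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance so assignments tied with the observed statistic
        # are not lost to floating-point rounding.
        if abs(pooled_t(group1, group2)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Illustrative data (made up for this sketch, not taken from the paper).
treated = [12, 15, 19, 23]
control = [8, 9, 11, 14]
print(randomization_t_test(treated, control))  # 4 of 70 assignments, about 0.057
```

Full enumeration is feasible only for small samples, since the number of assignments grows combinatorially with the group sizes; for larger samples a random subset of assignments is commonly drawn instead.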

Type
Original Paper
Copyright
Copyright © 2012 The Psychometric Society
