
Power(ful) guidelines for experimental economists

Published online by Cambridge University Press:  01 January 2025

Kathryn N. Vasilaky*
Affiliation:
Department of Economics, California Polytechnic State University, 1 Grand Ave, San Luis Obispo, CA 93407, USA International Research Institute for Climate and Society, Columbia University, New York, NY, USA
J. Michelle Brock*
Affiliation:
European Bank for Reconstruction and Development and CEPR, One Exchange Square, London EC2A 2JN, UK

Abstract

Statistical power is an important consideration in the design phase of any experiment. This paper serves as a reference on power calculations for experimental economists. We synthesize the questions and issues that researchers frequently raise about power calculations and review the surrounding literature. We provide practical coded examples and point to available tools for calculating power, and we suggest when and how to report power calculations in published studies.
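As a minimal illustration of the kind of calculation the paper covers (this sketch is not the authors' code): under the normal approximation, the power of a two-sided, two-sample test of means depends only on the per-arm sample size and the standardized effect size. The critical value 1.96 assumes a 5% significance level.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_sample(n_per_arm, effect_size, z_alpha=1.959964):
    """Approximate power of a two-sided two-sample test of means.

    Uses the normal approximation; effect_size is the difference in
    means measured in standard-deviation units (Cohen's d).
    """
    noncentrality = effect_size * sqrt(n_per_arm / 2.0)
    return normal_cdf(noncentrality - z_alpha)

# 64 subjects per arm, "medium" effect of 0.5 SD: power is roughly 0.8
print(round(power_two_sample(64, 0.5), 3))
```

Inverting the same formula for the required sample size, or replacing the normal approximation with simulation, follows the same logic; the references below treat both in detail.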

Type
Original Paper
Copyright
Copyright © Economic Science Association 2020


Footnotes

Thanks to contributions from the ESA discussion forum. We gratefully acknowledge the editors and Eduardo Zambrano for very helpful comments.
