Invasive Plant Researchers Should Calculate Effect Sizes, Not P-Values

Published online by Cambridge University Press: 20 January 2017

Matthew J. Rinella*
United States Department of Agriculture, Agricultural Research Service, 243 Fort Keogh Road, Miles City, MT 59301

Jeremy J. James
United States Department of Agriculture, Agricultural Research Service, Eastern Oregon Agricultural Research Center, 67826-A Hwy 205, Burns, OR 97720

*Corresponding author's e-mail: matt.rinella@ars.usda.gov

Abstract

Null hypothesis significance testing (NHST) forms the backbone of statistical inference in invasive plant science. Over 95% of research articles in Invasive Plant Science and Management report NHST results such as P-values, or statistics closely related to P-values such as least significant differences. Unfortunately, NHST results are less informative than their ubiquity implies. P-values are hard to interpret and are regularly misinterpreted. Moreover, P-values do not estimate the magnitudes and uncertainties of studied effects, and these effect size estimates are what invasive plant scientists care about most. In this paper, we reanalyze four datasets (two of our own and two from colleagues; studies put forth as examples in this paper are used with permission of their authors) to illustrate the limitations of NHST. The reanalyses are used to build a case for confidence intervals as preferable alternatives to P-values. Confidence intervals indicate effect sizes, and compared with P-values, they provide more complete and intuitively appealing information on what data do and do not indicate.
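The abstract's central contrast, that a P-value carries no information about effect magnitude while a confidence interval does, can be illustrated with a minimal sketch. It assumes a large-sample normal (z) approximation; a t-based interval would be more appropriate for small field trials. The function name `summarize_effect` and all numbers are hypothetical, invented for illustration, and are not taken from the four reanalyzed datasets.

```python
from statistics import NormalDist


def summarize_effect(estimate, std_error, level=0.95):
    """Return a two-sided P-value (null hypothesis: effect = 0) and a
    confidence interval for an effect estimate.

    Assumes the estimate is approximately normally distributed with the
    given standard error (large-sample z approximation).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = estimate / std_error
    # Two-sided P-value; using the lower tail avoids 1 - cdf rounding loss.
    p_value = 2 * std_normal.cdf(-abs(z))
    # Critical value, e.g. about 1.96 for a 95% interval.
    z_crit = std_normal.inv_cdf(0.5 + level / 2)
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    return {"estimate": estimate, "p_value": p_value, "ci": (lower, upper)}


# Hypothetical treatment effects (e.g., reduction in invasive plant
# biomass, g/m^2). Both have z = 2, so their P-values are identical.
small_precise = summarize_effect(estimate=2.0, std_error=1.0)
large_imprecise = summarize_effect(estimate=40.0, std_error=20.0)

for name, s in [("small, precise", small_precise),
                ("large, imprecise", large_imprecise)]:
    lo, hi = s["ci"]
    print(f"{name}: estimate={s['estimate']:.1f}, "
          f"P={s['p_value']:.3f}, 95% CI=({lo:.1f}, {hi:.1f})")
```

Both hypothetical effects yield P of about 0.046, so NHST treats them as equivalent. The intervals, roughly (0.0, 4.0) versus (0.8, 79.2), show that the first effect is small and well estimated, while the second could be anywhere from negligible to very large.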

Type: Review
Copyright © Weed Science Society of America
