
Sample Size Determinations for the Two Rater Kappa Statistic

Published online by Cambridge University Press:  01 January 2025

V. F. Flack*
Affiliation:
Division of Biostatistics, UCLA School of Public Health
A. A. Afifi
Affiliation:
Division of Biostatistics, UCLA School of Public Health
P. A. Lachenbruch
Affiliation:
Division of Biostatistics, UCLA School of Public Health
H. J. A. Schouten
Affiliation:
Department of Medical Informatics and Statistics, University of Limburg
* Reprint requests should be sent to Virginia F. Flack, UCLA School of Public Health, Division of Biostatistics, Los Angeles, CA 90024.

Abstract

This paper gives a method for determining a sample size that achieves a prespecified bound on the confidence interval width for the interrater agreement measure κ. The same results can be used when a prespecified power is desired for testing hypotheses about the value of κ. An example from the literature illustrates the proposed methods.
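The full method is not reproduced on this page, but the underlying idea — choose n so that the large-sample confidence interval for the estimated κ meets a prespecified width bound — can be sketched. The sketch below uses the simple large-sample variance approximation var(κ̂) ≈ p_o(1 − p_o) / [n(1 − p_e)²], where p_o is the observed and p_e the chance agreement proportion; this is a crude stand-in, not necessarily the exact variance the authors use, and the function name and inputs are illustrative.

```python
from math import ceil
from statistics import NormalDist

def kappa_sample_size(p_o: float, p_e: float, width: float, conf: float = 0.95) -> int:
    """Smallest n for which the approximate conf-level confidence interval
    for Cohen's kappa has total width at most `width`.

    Uses the simple approximation var(kappa_hat) ~ p_o(1 - p_o) / (n (1 - p_e)^2);
    the full asymptotic variance also depends on the individual cell
    probabilities of the two-rater table.
    """
    # Two-sided standard normal quantile for the desired confidence level.
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # CI width = 2 * z * sqrt(var); set it equal to `width` and solve for n.
    return ceil((2 * z) ** 2 * p_o * (1 - p_o) / ((1 - p_e) ** 2 * width ** 2))
```

For example, with p_o = 0.85 and p_e = 0.50 (so κ̂ = 0.70), bounding a 95% interval's total width at 0.20 requires n = 196 subjects under this approximation.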

Type
Original Paper
Copyright
Copyright © 1988 The Psychometric Society

