
A New Interpretation of the Weighted Kappa Coefficients

Published online by Cambridge University Press: 01 January 2025

Sophie Vanbelle*
Affiliation: Maastricht University
*Correspondence should be made to Sophie Vanbelle, Department of Methodology & Statistics, CAPHRI School for Public Health and Primary Care, Maastricht University, P.O. Box 616, 6200 MD Maastricht, The Netherlands. Email: sophie.vanbelle@maastrichtuniversity.nl

Abstract

Reliability and agreement studies are of paramount importance. They contribute to the quality of studies by providing information about the amount of error inherent in any diagnosis, score, or measurement. Guidelines for reporting reliability and agreement studies were recently provided. While the use of the kappa-like family of coefficients is advised for categorical and ordinal scales, no further guidance on the choice of a weighting scheme is given. In the present paper, a new, simple, and practical interpretation of the linear- and quadratic-weighted kappa coefficients is given. This will help researchers motivate their choice of a weighting scheme.
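For readers unfamiliar with the coefficients the abstract discusses, both weighted kappas can be computed directly from a k-by-k agreement table. Below is a minimal Python sketch of Cohen's weighted kappa under linear and quadratic weighting; the function name "weighted_kappa" and the 3-by-3 example counts are illustrative assumptions, not material from the paper.

import numpy as np

def weighted_kappa(table, weighting="linear"):
    """Cohen's weighted kappa for a k x k agreement table.

    table[i, j] holds the number of subjects placed in category i by
    the first rater and category j by the second rater.
    """
    table = np.asarray(table, dtype=float)
    k = table.shape[0]
    p = table / table.sum()                  # joint proportions
    row, col = p.sum(axis=1), p.sum(axis=0)  # marginal proportions

    # Disagreement distances: |i - j| (linear) or (i - j)^2 (quadratic),
    # rescaled so the largest possible disagreement equals 1.
    i, j = np.indices((k, k))
    d = np.abs(i - j) if weighting == "linear" else (i - j) ** 2
    w = 1.0 - d / d.max()                    # agreement weights in [0, 1]

    po = (w * p).sum()                       # observed weighted agreement
    pe = (w * np.outer(row, col)).sum()      # chance-expected agreement
    return (po - pe) / (1.0 - pe)

# Hypothetical 3 x 3 agreement table (counts)
t = [[20, 5, 1],
     [4, 30, 6],
     [1, 7, 26]]
print(weighted_kappa(t, "linear"), weighted_kappa(t, "quadratic"))

The two schemes differ in how they price partial disagreement: with linear weights, a disagreement of two categories costs twice as much as a disagreement of one, whereas with quadratic weights it costs four times as much, so the two coefficients can take appreciably different values on the same table.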

Type: Article
Copyright: © 2014 The Psychometric Society

