
Some Paradoxical Results for the Quadratically Weighted Kappa

Published online by Cambridge University Press: 01 January 2025

Matthijs J. Warrens*
Affiliation: Leiden University
* Requests for reprints should be sent to Matthijs J. Warrens, Institute of Psychology, Unit Methodology and Statistics, Leiden University, P.O. Box 9555, 2300 RB Leiden, The Netherlands. E-mail: warrens@fsw.leidenuniv.nl

Abstract

The quadratically weighted kappa is the most commonly used weighted kappa statistic for summarizing interrater agreement on an ordinal scale. This paper presents several properties of the quadratically weighted kappa that are paradoxical. For agreement tables with an odd number of categories n, it is shown that if one of the raters uses the same base rates for categories 1 and n, categories 2 and n−1, and so on, then the value of the quadratically weighted kappa does not depend on the value of the center cell of the agreement table. Since the center cell reflects the exact agreement of the two raters on the middle category, this result calls into question the applicability of the quadratically weighted kappa to agreement studies. If a single index of agreement is to be reported for an ordinal scale, it is recommended that the linearly weighted kappa be used instead of the quadratically weighted kappa.
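The property described in the abstract is straightforward to check numerically. The sketch below is not taken from the paper; the 3×3 table, its counts, and the weighted_kappa helper are illustrative assumptions. It builds a table in which the row rater's base rates for categories 1 and 3 are equal, varies the center cell, and shows that the quadratically weighted kappa stays fixed while the linearly weighted kappa does not.

```python
# Minimal sketch (assumed example, not the paper's code) of the paradox:
# with symmetric base rates for one rater, the quadratically weighted kappa
# ignores the center cell of the agreement table.

import numpy as np


def weighted_kappa(table, power):
    """Weighted kappa for a square agreement table of counts.

    power=1 uses linear disagreement weights |i-j|,
    power=2 uses quadratic disagreement weights (i-j)^2.
    """
    table = np.asarray(table, dtype=float)
    n = table.shape[0]
    i, j = np.indices((n, n))
    w = np.abs(i - j) ** power               # disagreement weights
    total = table.sum()
    observed = (w * table).sum() / total      # observed weighted disagreement
    rows = table.sum(axis=1)                  # row rater's marginals
    cols = table.sum(axis=0)                  # column rater's marginals
    expected = (w * np.outer(rows, cols)).sum() / total ** 2
    return 1.0 - observed / expected


if __name__ == "__main__":
    for center in (0, 5, 20):
        # Row marginals are (7, 3 + center, 7): the row rater's base rates for
        # categories 1 and 3 are equal, so the symmetry condition of the paper holds.
        table = [[4, 2, 1],
                 [1, center, 2],
                 [3, 2, 2]]
        print(f"center cell = {center:2d}: "
              f"quadratic kappa = {weighted_kappa(table, 2):.4f}, "
              f"linear kappa = {weighted_kappa(table, 1):.4f}")
```

With these assumed counts, the quadratic kappa stays at roughly 0.148 for every center-cell value, while the linear kappa ranges from about 0.08 to about 0.32, reflecting the change in exact agreement on the middle category.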

Type
Original Paper
Copyright
Copyright © 2012 The Psychometric Society

