
A New Interpretation of the Weighted Kappa Coefficients

Published online by Cambridge University Press: 01 January 2025

Sophie Vanbelle*
Affiliation: Maastricht University
*Correspondence should be made to Sophie Vanbelle, Department of Methodology & Statistics, CAPHRI School for Public Health and Primary Care, Maastricht University, P.O. Box 616, 6200 MD Maastricht, The Netherlands. Email: sophie.vanbelle@maastrichtuniversity.nl

Abstract

Reliability and agreement studies are of paramount importance. They contribute to the quality of studies by providing information about the amount of error inherent in any diagnosis, score, or measurement. Guidelines for reporting reliability and agreement studies were recently provided. While the use of the kappa-like family of coefficients is advised for categorical and ordinal scales, no further guidance on the choice of a weighting scheme is given. In the present paper, a new, simple, and practical interpretation of the linear- and quadratic-weighted kappa coefficients is given. This will help researchers motivate their choice of a weighting scheme.
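For readers unfamiliar with the coefficients the abstract discusses, both weighted kappas can be computed directly from a k-by-k agreement table. Below is a minimal Python sketch of Cohen's weighted kappa under linear and quadratic weighting; the function name "weighted_kappa" and the 3-by-3 example counts are illustrative assumptions, not material from the paper.

import numpy as np

def weighted_kappa(table, weighting="linear"):
    """Cohen's weighted kappa for a k x k agreement table.

    table[i, j] holds the number of subjects placed in category i by
    the first rater and category j by the second rater.
    """
    table = np.asarray(table, dtype=float)
    k = table.shape[0]
    p = table / table.sum()                  # joint proportions
    row, col = p.sum(axis=1), p.sum(axis=0)  # marginal proportions

    # Disagreement distances: |i - j| (linear) or (i - j)^2 (quadratic),
    # rescaled so the largest possible disagreement equals 1.
    i, j = np.indices((k, k))
    d = np.abs(i - j) if weighting == "linear" else (i - j) ** 2
    w = 1.0 - d / d.max()                    # agreement weights in [0, 1]

    po = (w * p).sum()                       # observed weighted agreement
    pe = (w * np.outer(row, col)).sum()      # chance-expected agreement
    return (po - pe) / (1.0 - pe)

# Hypothetical 3 x 3 agreement table (counts)
t = [[20, 5, 1],
     [4, 30, 6],
     [1, 7, 26]]
print(weighted_kappa(t, "linear"), weighted_kappa(t, "quadratic"))

The two schemes differ in how they price partial disagreement: with linear weights, a disagreement of two categories costs twice as much as a disagreement of one, whereas with quadratic weights it costs four times as much, so the two coefficients can take appreciably different values on the same table.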

Type: Article
Copyright: © 2014 The Psychometric Society

