
Inequalities Between Kappa and Kappa-Like Statistics for k × k Tables

Published online by Cambridge University Press:  01 January 2025

Matthijs J. Warrens*
Affiliation: Leiden University
*Requests for reprints should be sent to Matthijs J. Warrens, Institute of Psychology, Unit Methodology and Statistics, Leiden University, P.O. Box 9555, 2300 RB Leiden, The Netherlands. E-mail: warrens@fsw.leidenuniv.nl

Abstract


The paper presents inequalities between four descriptive statistics that can be expressed in the form [P − E(P)]/[1 − E(P)], where P is the observed proportion of agreement of a k × k table with identical categories, and E(P) is a function of the marginal probabilities. Scott’s π is an upper bound of Goodman and Kruskal’s λ and a lower bound of both Bennett et al.’s S and Cohen’s κ. We introduce a concept for the marginal probabilities of the k × k table called weak marginal symmetry. Using the rearrangement inequality, it is shown that Bennett et al.’s S is an upper bound of Cohen’s κ if the k × k table is weakly marginal symmetric.
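
As an illustration of the common form, the sketch below computes the four statistics from a square contingency table. It assumes the usual chance-agreement terms: for Cohen’s κ, E(P) is the sum over categories of the products of the row and column marginals; for Scott’s π, the sum of the squared averaged marginals; for Bennett et al.’s S, 1/k; and for the λ-type statistic, the largest averaged marginal (an assumption of this sketch). The function name and the example table are illustrative only and are not taken from the paper.

```python
import numpy as np

def kappa_like_statistics(table):
    """Four agreement statistics of the form (P - E(P)) / (1 - E(P))
    for a k x k contingency table with identical row and column categories."""
    t = np.asarray(table, dtype=float)
    k = t.shape[0]
    p = t / t.sum()                            # cell proportions
    row, col = p.sum(axis=1), p.sum(axis=0)    # marginal probabilities
    P = np.trace(p)                            # observed proportion of agreement
    q = (row + col) / 2                        # averaged marginals

    # Chance-agreement terms E(P); the lambda form is an assumption of this sketch.
    expectations = {
        "lambda": q.max(),            # Goodman-Kruskal lambda-type statistic
        "pi": np.sum(q ** 2),         # Scott's pi
        "kappa": np.sum(row * col),   # Cohen's kappa
        "S": 1.0 / k,                 # Bennett, Alpert, and Goldstein's S
    }
    return {name: (P - e) / (1 - e) for name, e in expectations.items()}

# Hypothetical 3 x 3 table: two raters assigning 100 objects to 3 categories.
example = [[30, 5, 2],
           [4, 25, 6],
           [1, 7, 20]]
print(kappa_like_statistics(example))
# Per the abstract: lambda <= pi, and pi is a lower bound of both kappa and S;
# for this (nearly marginally symmetric) table, S >= kappa also holds.
```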

Type
Theory and Methods
Creative Commons
This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Copyright
Copyright © 2009 The Psychometric Society

References

Agresti, A. (1990). Categorical data analysis. New York: Wiley.
Agresti, A., & Winner, L. (1997). Evaluating agreement and disagreement among movie reviewers. Chance, 10, 10–14.
Bennett, E.M., Alpert, R., & Goldstein, A.C. (1954). Communications through limited response questioning. Public Opinion Quarterly, 18, 303–308.
Blackman, N.J.-M., & Koval, J.J. (1993). Estimating rater agreement in 2×2 tables: Correction for chance and intraclass correlation. Applied Psychological Measurement, 17, 211–223.
Brennan, R.L., & Prediger, D.J. (1981). Coefficient kappa: Some uses, misuses, and alternatives. Educational and Psychological Measurement, 41, 687–699.
Byrt, T., Bishop, J., & Carlin, J.B. (1993). Bias, prevalence and kappa. Journal of Clinical Epidemiology, 46, 423–429.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Conger, A.J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin, 88, 322–328.
De Mast, J. (2007). Agreement and kappa-type indices. The American Statistician, 61, 148–153.
Dou, W., Ren, Y., Wu, Q., Ruan, S., Chen, Y., Bloyet, D., & Constans, J.-M. (2007). Fuzzy kappa for the agreement measure of fuzzy classifications. Neurocomputing, 70, 726–734.
Feinstein, A.R., & Cicchetti, D.V. (1990). High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43, 543–548.
Fleiss, J.L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378–382.
Fleiss, J.L. (1975). Measuring agreement between two judges on the presence or absence of a trait. Biometrics, 31, 651–659.
Goodman, L.A., & Kruskal, W.H. (1954). Measures of association for cross classifications. Journal of the American Statistical Association, 49, 732–764.
Hardy, G.H., Littlewood, J.E., & Pólya, G. (1988). Inequalities (2nd ed.). Cambridge: Cambridge University Press.
Holley, J.W., & Guilford, J.P. (1964). A note on the G index of agreement. Educational and Psychological Measurement, 24, 749–753.
Hubert, L.J., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.
Janson, S., & Vegelius, J. (1979). On generalizations of the G index and the Phi coefficient to nominal scales. Multivariate Behavioral Research, 14, 255–269.
Krippendorff, K. (1987). Association, agreement, and equity. Quality and Quantity, 21, 109–123.
Krippendorff, K. (2004). Reliability in content analysis: Some common misconceptions and recommendations. Human Communication Research, 30, 411–433.
Maxwell, A.E. (1977). Coefficients of agreement between observers and their interpretation. British Journal of Psychiatry, 130, 79–83.
Mitrinović, D.S. (1964). Elementary inequalities. Groningen: Noordhoff.
Scott, W.A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19, 321–325.
Steinley, D. (2004). Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods, 9, 386–396.
Visser, H., & De Nijs, T. (2006). The map comparison kit. Environmental Modelling & Software, 21, 346–358.
Warrens, M.J. (2008). On similarity coefficients for 2×2 tables and correction for chance. Psychometrika, 73, 487–502.
Warrens, M.J. (2008). Bounds of resemblance measures for binary (presence/absence) variables. Journal of Classification, 25, 195–208.
Warrens, M.J. (2008). On association coefficients for 2×2 tables and properties that do not depend on the marginal distributions. Psychometrika, 73, 777–789.
Warrens, M.J. (2008). On the indeterminacy of resemblance measures for (presence/absence) data. Journal of Classification, 25, 125–136.
Warrens, M.J. (2008). On the equivalence of Cohen’s kappa and the Hubert-Arabie adjusted Rand index. Journal of Classification, 25, 177–183.
Zwick, R. (1988). Another look at interrater agreement. Psychological Bulletin, 103, 374–378.