Structural Analysis of Subjective Categorical Data

Published online by Cambridge University Press: 01 January 2025

Karl Christoph Klauer*
Affiliation:
University of Bonn
William H. Batchelder
Affiliation:
University of California, Irvine
*
Requests for reprints should be sent to Karl Christoph Klauer, Psychologisches Institut, Universität Bonn, Römerstr. 168, 53118 Bonn, FR GERMANY.

Abstract

A general approach to the analysis of subjective categorical data is considered, in which agreement matrices of two or more raters are directly expressed in terms of error and agreement parameters. The method provides focused analyses of ratings from several raters whose measurement error distributions may bias the evaluation of substantive questions of interest. Each rater's judgment process is modeled as a mixture of two components: an error variable that is unique to the rater in question and an agreement variable that operationalizes the "true" values of the units of observation. The statistical problems of identification, estimation, and testing of such measurement models are discussed.
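As a sketch of this mixture in our own notation (the paper's exact parameterization may differ): let $T$ denote the latent "true" category of a unit, let $a_r$ be rater $r$'s agreement probability, and let $\varepsilon_r$ be a rater-specific error distribution over the $K$ categories. The conditional classification probabilities then take the mixture form

$$
P(X_r = j \mid T = i) \;=\; a_r\,\delta_{ij} \;+\; (1 - a_r)\,\varepsilon_r(j),
$$

so two raters can agree either because both recover the true value or because their error variables happen to coincide.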

The general model is applied in several special cases. The simplest situation is that underlying Cohen's Kappa, in which two raters place units into unordered categories. The model provides a generalization and systematization of the Kappa idea of correcting for chance agreement. In applications with typical research designs, including a between-subjects design and a mixed within-subjects, between-subjects design, the model is shown to disentangle structural and measurement components of the observations, thereby controlling for possible confounding effects of systematic rater bias. Situations considered include the case of more than two raters as well as the case of ordered categories. The different analyses are illustrated by means of real data sets.
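The decomposition sketched above can be made concrete with a few lines of code. The following Python sketch is an illustration under the simplified parameterization given earlier, not the paper's estimation method; the function names and the uniform error distributions are assumptions made for this example. It builds the expected joint agreement matrix of two raters by marginalizing over the latent true category, then applies Cohen's (1960) chance correction to that matrix.

```python
import numpy as np

def expected_agreement_matrix(a1, a2, e1, e2, pi):
    """Expected joint classification probabilities for two raters.

    pi     : latent "true" category probabilities, shape (K,)
    a1, a2 : agreement parameters of raters 1 and 2
    e1, e2 : rater-specific error distributions over the K categories
    (Illustrative parameterization; not the paper's exact model.)
    """
    K = len(pi)
    # P(rater reports j | true category i) = a * 1{j == i} + (1 - a) * e[j]
    P1 = a1 * np.eye(K) + (1 - a1) * np.tile(e1, (K, 1))
    P2 = a2 * np.eye(K) + (1 - a2) * np.tile(e2, (K, 1))
    # Marginalize over the latent true category:
    # P(j, k) = sum_i pi[i] * P1[i, j] * P2[i, k]
    return np.einsum('i,ij,ik->jk', pi, P1, P2)

def cohens_kappa(table):
    """Cohen's (1960) kappa for a K x K agreement table."""
    p = table / table.sum()
    po = np.trace(p)                    # observed agreement
    pe = p.sum(axis=1) @ p.sum(axis=0)  # chance agreement from the margins
    return (po - pe) / (1 - pe)

# Demo: two raters with accuracies 0.8 and 0.6, uniform error distributions.
pi = np.array([0.5, 0.3, 0.2])
e = np.full(3, 1 / 3)
joint = expected_agreement_matrix(0.8, 0.6, e, e, pi)
print(round(float(np.trace(joint)), 3))      # raw agreement, about 0.653
print(round(float(cohens_kappa(joint)), 3))  # kappa, about 0.462
```

In this example the raw agreement rate is about 0.65, while kappa is about 0.46: the chance correction discounts agreement produced when both raters' error variables happen to land on the same category.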

Type
Original Paper
Copyright
Copyright © 1996 The Psychometric Society

Footnotes

The authors wish to thank Lawrence Hubert and Ivo Molenaar for helpful and detailed comments on a previous draft of this paper. Thanks are also due to Jens Möller and Bernd Strauß for the data from the 1992 Olympic Games. We thank the editor and three anonymous reviewers for valuable comments on an earlier draft.

References

Agresti, A. (1990). Categorical data analysis. New York: Wiley.
Bales, R. F. (1950). Interaction process analysis. Cambridge, MA: Addison-Wesley.
Batchelder, W. H., & Romney, A. K. (1986). The statistical analysis of a general Condorcet model for dichotomous choice situations. In B. Grofman & G. Owen (Eds.), Information pooling and group decision making (pp. 103–112). Greenwich, CT: JAI Press.
Batchelder, W. H., & Romney, A. K. (1988). Test theory without an answer key. Psychometrika, 53, 71–92.
Batchelder, W. H., & Romney, A. K. (1989). New results in test theory without an answer key. In E. Roskam (Ed.), Advances in mathematical psychology, Vol. II (pp. 229–248). Heidelberg: Springer.
Brennan, R. L., & Light, R. J. (1974). Measuring agreement when two observers classify people into categories not defined in advance. British Journal of Mathematical and Statistical Psychology, 27, 154–163.
Brewer, D. D., Romney, A. K., & Batchelder, W. H. (1991). Consistency and consensus: A replication. Journal of Quantitative Anthropology, 3, 195–205.
Carey, G., & Gottesman, I. I. (1978). Reliability and validity in binary ratings: Areas of common misunderstanding in diagnosis and symptom ratings. Archives of General Psychiatry, 35, 1454–1459.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Cooil, B., & Rust, R. T. (1994). Reliability and expected loss: A unifying principle. Psychometrika, 59, 203–216.
Cooper, W. H. (1981). Ubiquitous halo. Psychological Bulletin, 90, 218–244.
Cressie, N., & Holland, P. W. (1983). Characterizing the manifest probabilities of a latent trait model. Psychometrika, 48, 129–141.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.
Dillon, W. R., Madden, T. J., & Kumar, A. (1983). Analyzing sequential categorical data on dyadic interaction: A latent structure approach. Psychological Bulletin, 94, 564–583.
Dillon, W. R., & Mulani, N. (1984). A probabilistic latent class model for assessing inter-judge reliability. Multivariate Behavioral Research, 19, 438–458.
Efron, B. (1982). The jackknife, the bootstrap and other resampling plans. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Ekman, P. (1988). Gesichtsausdruck und Gefühl [Facial expression and emotion]. Paderborn: Jungfermann.
Erdfelder, E., & Bredenkamp, J. (1993). Recognition of script-typical versus script-atypical information: Effects of cognitive elaboration. Manuscript submitted for publication.
Faul, F., & Erdfelder, E. (1992). GPOWER: A priori, post-hoc, and compromise power analyses for MS-DOS [Computer program]. Bonn, FRG: Bonn University, Department of Psychology.
Feger, H. (1983). Planung und Bewertung von wissenschaftlichen Beobachtungen [Design and evaluation of scientific observations]. In H. Feger & J. Bredenkamp (Eds.), Datenerhebung (pp. 1–75). Göttingen: Hogrefe.
Gavanski, I., & Wells, G. L. (1989). Counterfactual processing of normal and exceptional events. Journal of Experimental Social Psychology, 25, 314–325.
Grove, W. M., Andreasen, N. C., McDonald-Scott, P., Keller, B., & Shapiro, R. W. (1981). Reliability studies of psychiatric diagnosis. Archives of General Psychiatry, 38, 408–411.
Haberman, S. J. (1977). Log-linear models and frequency tables with small expected cell counts. The Annals of Statistics, 5, 815–841.
Hu, X., & Batchelder, W. H. (1994). The statistical analysis of general processing tree models with the EM algorithm. Psychometrika, 59, 21–47.
Hubert, L. (1977). Kappa revisited. Psychological Bulletin, 84, 289–297.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.
Janes, C. L. (1979). An extension of the random error coefficient of agreement to N × N tables. British Journal of Psychiatry, 134, 617–619.
Jöreskog, K. G. (1978). Structural analysis of covariance and correlation matrices. Psychometrika, 43, 443–477.
Kelderman, H. (1984). Loglinear Rasch model tests. Psychometrika, 49, 223–245.
Klauer, K. C., & Migulla, G. (1995). Spontanes kontrafaktisches Denken [Spontaneous counterfactual thinking]. Zeitschrift für Sozialpsychologie, 26, 34–45.
Klauer, K. C., & Stern, E. (1992). How attitudes guide memory-based judgements: A two-process model. Journal of Experimental Social Psychology, 28, 186–206.
Koch, G. G., Landis, J. R., Freeman, J. L., Freeman, D. H. Jr., & Lehnen, R. G. (1977). A general methodology for the analysis of experiments with repeated measurement of categorical data. Biometrics, 33, 133–158.
Koch, G. G., & Reinfurt, D. W. (1971). The analysis of categorical data from mixed models. Biometrics, 27, 157–173.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
Langeheine, R., & Rost, J. (1988). Latent trait and latent class models. New York: Plenum.
Lehmann, E. L. (1970). Testing statistical hypotheses. New York: Wiley.
Liou, M., & Yu, L. (1991). Assessing statistical accuracy in ability estimation: A bootstrap approach. Psychometrika, 56, 55–67.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Maher, K. M. (1987). A multiple choice model for aggregating group knowledge and estimating individual competence. Irvine: University of California.
Maxwell, A. E. (1977). Coefficients of agreement between observers and their interpretation. British Journal of Psychiatry, 130, 79–83.
Möller, J. (1993). Zur Ausdifferenzierung des Paradigmas "Spontane Attributionen": Eine empirische Analyse zeitlich unmittelbarer Ursachenzuschreibungen [Towards a differentiation of the paradigm "spontaneous attributions": An empirical analysis of immediate causal attributions]. Zeitschrift für Sozialpsychologie, 24, 129–136.
Möller, J., & Strauß, B. (1994). Agreement matrix for ratings of causal location and stability of events of the 1992 Olympic Games. Kiel, FRG: University of Kiel.
Nisbett, R. E., & Wilson, T. D. (1977). The halo effect: Evidence for unconscious alteration of judgements. Journal of Personality and Social Psychology, 35, 250–256.
Perreault, W. D. Jr., & Leigh, L. E. (1989). Reliability of nominal data based on qualitative judgements. Journal of Marketing Research, 26, 135–148.
Rao, C. R. (1973). Linear statistical inference and its applications. New York: Wiley.
Schutz, W. C. (1952). Reliability, ambiguity and content analysis. Psychological Review, 59, 119–129.
Shweder, R. A., & D'Andrade, R. G. (1980). The systematic distortion hypothesis. In R. A. Shweder & D. W. Fiske (Eds.), New directions for methodology of behavioral science: Fallible judgements in behavioral research. San Francisco: Jossey-Bass.
Simon, A., & Boyer, E. G. (1974). Mirrors of behavior III: An anthology of observation instruments. Wyncote, PA: Communication Materials Center.
Spitznagel, E. L., & Helzer, J. E. (1985). A proposed solution to the base rate problem in the kappa statistic. Archives of General Psychiatry, 42, 725–728.
Sprott, D. A., & Vogel-Sprott, M. D. (1987). The use of the log-odds ratio to assess the reliability of dichotomous questionnaire data. Applied Psychological Measurement, 11, 307–316.
Uebersax, J. S. (1987). Diversity of decision-making models and the measurement of interrater agreement. Psychological Bulletin, 101, 140–146.
Uebersax, J. S. (1988). Validity inferences from interobserver agreement. Psychological Bulletin, 104, 405–416.
Weller, S. C. (1984). Consistency and consensus among informants: Disease concepts in a rural Mexican town. American Anthropologist, 88, 313–338.
Zwick, R. (1988). Another look at interrater agreement. Psychological Bulletin, 103, 374–378.