Geometric Representation of Association between Categories

Published online by Cambridge University Press:  01 January 2025

Willem J. Heiser*
Affiliation:
Leiden University
* Correspondence should be addressed to: Willem J. Heiser, Department of Psychology, Leiden University, PO Box 9555, 2300 RB Leiden, The Netherlands (e-mail: Heiser@fsw.leidenuniv.nl).

Abstract

Categories can be counted, rated, or ranked, but they cannot be measured. Likewise, persons or individuals can be counted, rated, or ranked, but they cannot be measured either. Nevertheless, psychology has realized early on that it can take an indirect road to measurement: What can be measured is the strength of association between categories in samples or populations, and what can be quantitatively compared are counts, ratings, or rankings made under different circumstances, or originating from different persons. The strong demand for quantitative analysis of categorical data has thus created a variety of statistical methods, with substantial contributions from psychometrics and sociometrics. What is the common basis of these methods dealing with categories? The basic element they share is that the sample space has a special geometry, in which categories (or persons) are point masses forming a simplex, while distributions of counts or profiles of ratings are centers of gravity, which are also point masses. Rankings form a discrete subset in the interior of the simplex, known as the permutation polytope, and paired comparisons form another subset on the edges of the simplex. Distances between point masses form the basic tool of analysis. The paper gives some history of major concepts, which naturally leads to a new concept: the shadow point. It is then shown how loglinear models, Luce and Rasch models, unfolding models, correspondence analysis and homogeneity analysis, forced classification and classification trees, as well as other models and methods, fit into this particular geometrical framework.
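
The geometry described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the function names, the example counts, the rescaling of rank vectors to sum to one, and the use of plain Euclidean distance are all assumptions made for illustration. Categories sit at the vertices of a simplex, a distribution of counts is the center of gravity of those vertices, and rankings become a finite set of points inside the simplex.

```python
from itertools import permutations
from math import dist


def simplex_vertices(k):
    """One unit vector per category: the vertices of the simplex (hypothetical helper)."""
    return [tuple(1.0 if i == j else 0.0 for j in range(k)) for i in range(k)]


def center_of_gravity(counts):
    """Weighted mean of the vertices, with the counts as masses.

    With unit-vector vertices this is simply the vector of proportions.
    """
    vertices = simplex_vertices(len(counts))
    total = sum(counts)
    return tuple(
        sum(c * v[j] for c, v in zip(counts, vertices)) / total
        for j in range(len(counts))
    )


def ranking_points(k):
    """All rankings of k categories, rescaled to sum to 1 (an assumption of this
    sketch) so that each ranking is a point in the interior of the simplex."""
    total = k * (k + 1) / 2
    return [tuple(r / total for r in perm) for perm in permutations(range(1, k + 1))]


# Two illustrative (made-up) count distributions over three categories.
counts_a = [30, 50, 20]
counts_b = [10, 40, 50]

profile_a = center_of_gravity(counts_a)
profile_b = center_of_gravity(counts_b)

# Euclidean distance between the two centers of gravity (assumed metric).
distance_ab = dist(profile_a, profile_b)

# The 3! = 6 rankings of three categories as points in the simplex.
rankings = ranking_points(3)

print(profile_a, profile_b, distance_ab)
print(rankings)
```

Euclidean distance is used here only because it is the simplest choice; a weighted distance could be substituted in `dist` without changing the rest of the sketch.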

Type
2004 Presidential Address
Copyright
Copyright © 2004 The Psychometric Society

Footnotes

This paper is based on my Presidential Address delivered at the 69th Annual Meeting of the Psychometric Society, Pacific Grove, California, June 14–17, 2004. It was completed during a stay as Fellow of the Netherlands Institute for Advanced Study in the Humanities and Social Sciences (NIAS) in Wassenaar, The Netherlands.

I would like to thank Marike Polak, Frank Busing, Elise Dusseldorp, and Angela Jansen for their help in the data analyses and the preparation of the figures, and Laurence Frank for her assistance during the oral presentation. I am also very lucky to have a career-long personal coach, Jacqueline J. Meulman, with whom I share so many interests and perspectives.
