The Knowledge Content of Statistical Data

Published online by Cambridge University Press:  01 January 2025

Lucien Preuss
Affiliation:
Feldeggstrasse 74, CH 8008, Zürich, Switzerland
Helmut Vorkauf*
Affiliation:
University of Fribourg and Swiss Federal Office of Public Health
* Requests for reprints should be sent to Helmut Vorkauf, Swiss Federal Office of Public Health, Hessstrasse 27, CH 3097 Liebefeld/Bern, Switzerland.

Abstract

An information-theoretic framework is used to analyze the knowledge content of multivariate cross-classified data. Several related measures based directly on the information concept are proposed: the knowledge content (S) of a cross classification, its terseness (Zeta), and the separability (GammaX) of one variable given all others. Example applications illustrate the solutions obtained where classical analysis is unsatisfactory, such as optimal grouping, the analysis of highly skewed tables, and the interpretation of well-known paradoxes. Further, the separability suggests a solution to the classic problem of inductive inference that is independent of sample size.

Type
Original Paper
Copyright
Copyright © 1997 The Psychometric Society

Footnotes

Lucien Preuss gratefully acknowledges the support of the Swiss National Science Foundation (grant 21-25'757.88) which provided the initial impetus for this work. Also, we owe more than the usual thanks to the Editor, Shizuhiko Nishisato, whose relentless demands for clarity resulted in quite a few improvements.
