Reliability of Test Scores in Nonparametric Item Response Theory

Published online by Cambridge University Press:  01 January 2025

Klaas Sijtsma*
Affiliation:
Free University of Amsterdam, The Netherlands
Ivo W. Molenaar
Affiliation:
University of Groningen, The Netherlands
* Requests for reprints should be sent to Klaas Sijtsma, Vakgroep Arbeids- en Organisatiepsychologie, Free University, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands.

Abstract

Three methods for estimating reliability are studied within the context of nonparametric item response theory. Two were proposed originally by Mokken (1971) and a third is developed in this paper. Using a Monte Carlo strategy, these three estimation methods are compared with four “classical” lower bounds to reliability. Finally, recommendations are given concerning the use of these estimation methods.

Type
Original Paper
Copyright
Copyright © 1987 The Psychometric Society

Footnotes

The authors are grateful for constructive comments from the reviewers and from Charles Lewis.

References

Birnbaum, A. (1968). In Lord, F. M., Novick, M. R. (Eds.), Statistical theories of mental test scores, Reading: Addison-Wesley.
Boomsma, A. (1983). On the robustness of LISREL (maximum likelihood estimation) against small sample size and non-normality. Unpublished doctoral dissertation, University of Groningen.
Cliff, N. (1983). Evaluating Guttman scales: Some old and new thoughts. In Wainer, H., Messick, S. (Eds.), Principals of modern psychological measurement, Hillsdale, NJ: Lawrence Erlbaum.
Cliff, N. (1984). An improved internal consistency reliability estimate. Journal of Educational Statistics, 9, 151–161.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Feldt, L. S. (1965). The approximate sampling distribution of Kuder-Richardson reliability coefficient twenty. Psychometrika, 30, 357–370.
Fischer, G. H. (1974). Einführung in die theorie psychologischer tests [Introduction to psychological test theory], Bern: Huber.
Gustafsson, J. E. (1977). The Rasch model for dichotomous items: Theory, applications and a computer program (Internal Rep. No. 63). Institute of Education, University of Goteborg.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282.
Guttman, L. (1950). The basis for scalogram analysis. In Stouffer, S. A., Guttman, L., Suchman, E. A., Lazarsfeld, P. F., Star, S. A., Clausen, J. A. (Eds.), Measurement and prediction, Princeton: Princeton University Press.
Henning, H. J. (1976). Die Technik der Mokken-Skalenanalyse [The technique of Mokken scale analysis]. Psychologische Beiträge, 18, 410–430.
Horn, J. (1971). Integration of concepts of reliability and standard error of measurement. Educational and Psychological Measurement, 31, 57–74.
Horst, P. (1953). Correcting the Kuder-Richardson reliability for dispersion of item difficulties. Psychological Bulletin, 50, 371–374.
Jackson, P. H., Agunwamba, C. C. (1977). Lower bounds for the reliability of the total score on a test composed of non-homogeneous items: I: Algebraic lower bounds. Psychometrika, 42, 567–578.
Jansen, P. G. W. (1982). Homogenitätsmessung mit Hilfe des Koeffizienten H von Loevinger: Eine kritische Diskussion [Measuring homogeneity by means of Loevinger's coefficient H: A critical discussion]. Psychologische Beiträge, 24, 96–105.
Jansen, P. G. W. (1982). De onbruikbaarheid van Mokkenschaalanalyse [On the uselessness of Mokken scale analysis]. Tijdschrift voor Onderwijsresearch, 7, 11–24.
Jansen, P. G. W. (1983). Rasch analysis of attitudinal data, Den Haag: Rijks Psychologische Dienst.
Jansen, P. G. W., Roskam, E.E.Ch.I., van den Wollenberg, A. L. (1982). De Mokkenschaal gewogen [Weighing the Mokken scale]. Tijdschrift voor Onderwijsresearch, 7, 31–42.
Kristof, W. (1963). The statistical theory of stepped-up reliability coefficients when a test has been divided into several equivalent parts. Psychometrika, 28, 221–238.
Lewis, C. (1983). Bayesian inference for latent abilities. In Anderson, S. B., Helmick, J. S. (Eds.), On educational testing, San Francisco: Jossey-Bass.
Loevinger, J. (1948). The technique of homogeneous tests compared with some aspects of “scale analysis” and factor analysis. Psychological Bulletin, 45, 507–530.
Lord, F. M. (1980). Applications of item response theory to practical testing problems, Hillsdale, NJ: Lawrence Erlbaum.
Lord, F. M. (1983). Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability. Psychometrika, 48, 233–245.
Lord, F. M., Novick, M. R. (1968). Statistical theories of mental test scores, Reading: Addison-Wesley.
Lumsden, J. (1976). Test theory. Annual Review of Psychology, 27, 251–280.
Mokken, R. J. (1971). A theory and procedure of scale analysis, The Hague: Mouton.
Mokken, R. J., Lewis, C. (1982). A nonparametric approach to the analysis of dichotomous item responses. Applied Psychological Measurement, 6, 417–430.
Molenaar, I. W. (1982). Mokken scaling revisited. Kwantitatieve Methoden, 8, 145–164.
Molenaar, I. W. (1982). Een tweede weging van de Mokkenschaal [A second weighing of the Mokken scale]. Tijdschrift voor Onderwijsresearch, 7, 172–181.
Molenaar, I. W. (1982). De beperkte bruikbaarheid van Jansen's kritiek [On the limited usefulness of Jansen's criticisms]. Tijdschrift voor Onderwijsresearch, 7, 25–30.
Molenaar, I. W., Sijtsma, K. (1984). Internal consistency and reliability in Mokken's nonparametric item response model. Tijdschrift voor Onderwijsresearch, 9, 257–268.
Oosterloo, S. (1984). Confidence intervals for test information and relative efficiency. Statistica Neerlandica, 38, 37–53.
Samejima, F. (1977). Weakly parallel tests in latent trait theory with some criticisms of classical test theory. Psychometrika, 42, 193–198.
Schulman, R. S., Haden, R. L. (1975). A test theory model for ordinal measurements. Psychometrika, 40, 455–472.
Sedere, M. U., Feldt, L. S. (1977). The sampling distributions of the Kristof reliability coefficient, the Feldt coefficient, and Guttman's lambda-2. Journal of Educational Measurement, 14, 53–62.
Sijtsma, K. (1984). Useful nonparametric scaling: A reply to Jansen. Psychologische Beiträge, 26, 423–437.
Sijtsma, K., Prins, P. M. (1986). Itemselectie in het Mokken model [Item selection in the Mokken model]. Tijdschrift voor Onderwijsresearch, 11, 121–129.
Stokman, F. N., van Schuur, W. H. (1980). Basic scaling. Quality and Quantity, 14, 5–30.
ten Berge, J. M. F., Zegers, F. E. (1978). A series of lower bounds to the reliability of a test. Psychometrika, 43, 575–579.
ten Berge, J. M. F., Snijders, T. A. B., Zegers, F. E. (1981). Computational aspects of the greatest lower bound to the reliability and constrained minimum trace factor analysis. Psychometrika, 46, 201–213.
Weiss, D. J., Davison, M. L. (1981). Test theory and methods. Annual Review of Psychology, 32, 629–658.
van den Wollenberg, A. L. (1982). Two new test statistics for the Rasch model. Psychometrika, 47, 123–140.
Wood, R. (1978). Fitting the Rasch model—A heady tale. British Journal of Mathematical and Statistical Psychology, 31, 27–32.