Evaluating the Equal-Interval Hypothesis with Test Score Scales

Published online by Cambridge University Press: 01 January 2025

Ben Domingue*
Affiliation:
Institute of Behavioral Science, University of Colorado Boulder
* Requests for reprints should be sent to Ben Domingue, Institute of Behavioral Science, University of Colorado Boulder, Boulder, CO, USA. E-mail: ben.domingue@gmail.com

Abstract

The axioms of additive conjoint measurement provide a means of testing the hypothesis that test data can be placed onto a scale with equal-interval properties. However, the axioms are difficult to verify because item responses may be subject to measurement error. A Bayesian method exists for imposing order restrictions from additive conjoint measurement while estimating the probability of a correct response. In this study, an improved version of that methodology is evaluated via simulation. The approach is then applied to data from a reading assessment intentionally designed to support an equal-interval scaling.
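
To make the axioms concrete, the following Python sketch checks the single and double cancellation conditions on a matrix of correct-response proportions. It is an illustration under stated assumptions, not the Bayesian method evaluated in the paper: the function names are hypothetical, rows are assumed to be ordered by increasing ability and columns by increasing item easiness, and the proportions are treated as error-free, which is exactly the simplification that makes the axioms hard to verify with real item responses.

```python
from itertools import combinations
from math import exp


def single_cancellation(p):
    """True if every row and every column of p is non-decreasing.

    Assumes rows are sorted by ability and columns by item easiness.
    """
    rows_ok = all(row[j] <= row[j + 1] for row in p for j in range(len(row) - 1))
    cols_ok = all(
        p[i][j] <= p[i + 1][j]
        for i in range(len(p) - 1)
        for j in range(len(p[0]))
    )
    return rows_ok and cols_ok


def double_cancellation_3x3(m):
    """Check double cancellation on one 3x3 matrix m.

    If m[1][0] >= m[0][1] and m[2][1] >= m[1][2], then m[2][0] >= m[0][2];
    the same implication must hold with every inequality reversed.
    """
    ok = True
    if m[1][0] >= m[0][1] and m[2][1] >= m[1][2]:
        ok = ok and m[2][0] >= m[0][2]
    if m[1][0] <= m[0][1] and m[2][1] <= m[1][2]:
        ok = ok and m[2][0] <= m[0][2]
    return ok


def double_cancellation(p):
    """True if every 3x3 submatrix of p satisfies double cancellation."""
    for rows in combinations(range(len(p)), 3):
        for cols in combinations(range(len(p[0])), 3):
            sub = [[p[i][j] for j in cols] for i in rows]
            if not double_cancellation_3x3(sub):
                return False
    return True


# Illustrative data (assumed values): an additive structure on the logit scale.
abilities = [-1.0, 0.0, 1.5]
easiness = [-0.5, 0.2, 1.0]
additive = [[1 / (1 + exp(-(a + e))) for e in easiness] for a in abilities]

# Monotone in rows and columns, yet not additive.
non_additive = [
    [0.1, 0.2, 0.6],
    [0.3, 0.5, 0.7],
    [0.4, 0.8, 0.9],
]

print(single_cancellation(additive), double_cancellation(additive))
print(single_cancellation(non_additive), double_cancellation(non_additive))
```

The second matrix shows why monotonicity alone is not enough: its rows and columns are ordered consistently, but the double cancellation check fails, so no equal-interval additive representation exists for it. With observed proportions, apparent violations may reflect sampling error rather than a true failure of the axioms.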

Type
Original Paper
Copyright
Copyright © 2013 The Psychometric Society
