
Observed-Score Equating as a Test Assembly Problem

Published online by Cambridge University Press:  01 January 2025

Wim J. van der Linden* (University of Twente)
Richard M. Luecht (National Board of Medical Examiners)

* Requests for reprints should be sent to W. J. van der Linden, Department of Educational Measurement and Data Analysis, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands. E-mail: vanderlinden@edte.utwente.nl

Abstract

A set of linear conditions on item response functions is derived that guarantees identical observed-score distributions on two test forms. The conditions can be added as constraints to a linear programming model for test assembly, so that a new test form is assembled whose observed-score distribution is optimally equated to the distribution on an old form. For a well-designed item pool and items fitting the IRT model, use of the model results in observed-score pre-equating and obviates the need for post hoc equating by a conventional observed-score equating method. An empirical example illustrates the use of the model for an item pool from the Law School Admission Test.
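
In practice, the linear conditions described in the abstract can be imposed by matching sums of powers of the item response probabilities, \(\sum_{i} P_i(\theta)^r\), between the old and new forms at a grid of \(\theta\) values; evaluated at fixed \(\theta\) points, these sums are linear in the 0-1 item-selection variables and so fit directly into a linear programming model. The sketch below illustrates this idea as a minimax mixed-integer program. It is an illustration only, not the authors' implementation: the 3PL item parameters are simulated, and the \(\theta\) grid, the number of power sums matched, and the use of the open-source PuLP/CBC solver are choices made here for the example.

```python
# Minimal sketch of observed-score pre-equating as test assembly:
# select a new form whose power sums of response probabilities match
# those of an old form at a grid of theta points, so that (per the
# paper's linear conditions) the observed-score distributions agree.
import numpy as np
import pulp

def p3pl(theta, a, b, c):
    """3PL item response probability."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item pool (simulated parameters, not LSAT data).
rng = np.random.default_rng(0)
pool_size, form_len = 60, 20
a = rng.uniform(0.8, 2.0, pool_size)
b = rng.normal(0.0, 1.0, pool_size)
c = rng.uniform(0.1, 0.25, pool_size)

thetas = np.linspace(-2.0, 2.0, 5)      # ability grid (a modeling choice)
powers = (1, 2)                          # match the first two power sums
old_form = rng.choice(pool_size, form_len, replace=False)

P = p3pl(thetas[:, None], a, b, c)       # shape: (n_thetas, pool_size)

model = pulp.LpProblem("pre_equating", pulp.LpMinimize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(pool_size)]
y = pulp.LpVariable("y", lowBound=0)     # common tolerance, minimized

model += y                               # objective: tightest match
model += pulp.lpSum(x) == form_len       # fixed test length
for i in old_form:
    model += x[i] == 0                   # assemble from unused items only
for k in range(len(thetas)):
    for r in powers:
        target = float((P[k, old_form] ** r).sum())
        expr = pulp.lpSum(float(P[k, i] ** r) * x[i]
                          for i in range(pool_size))
        model += expr - target <= y      # linear matching constraints
        model += target - expr <= y

model.solve(pulp.PULP_CBC_CMD(msg=False))
new_form = [i for i in range(pool_size) if x[i].value() > 0.5]
print("selected items:", new_form, "tolerance:", y.value())
```

In this sketch the single tolerance variable y turns the matching conditions into a minimax objective; content constraints (item types, word counts, enemy sets) could be added as further linear constraints in the same model.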

Type
Original Paper
Copyright
Copyright © 1998 The Psychometric Society


Footnotes

The authors are most indebted to Norman D. Verhelst for suggesting Proposition 4 and its proof, to the Law School Admission Council (LSAC) for making available the data set, and to Wim M. M. Tielen for his computational assistance.
