
The Missing Data Assumptions of the NEAT Design and Their Implications for Test Equating

Published online by Cambridge University Press: 01 January 2025

Sandip Sinharay* (ETS, Princeton)
Paul W. Holland (ETS, Princeton)
* Requests for reprints should be sent to Sandip Sinharay, ETS, Princeton, NJ, USA. E-mail: ssinharay@ets.org

Abstract

The Non-Equivalent groups with Anchor Test (NEAT) design involves data that are missing by design. Three nonlinear observed-score equating methods used with the NEAT design are frequency estimation equipercentile equating (FEEE), chain equipercentile equating (CEE), and item-response-theory observed-score equating (IRT OSE). Each of these methods makes different assumptions about the missing data in the NEAT design. The FEEE method assumes that the conditional distribution of the test score given the anchor test score is the same in the two examinee groups. The CEE method assumes that the equipercentile functions equating the test score to the anchor test score are the same in the two examinee groups. The IRT OSE method assumes that the IRT model employed fits the data adequately and that the items in the tests and the anchor test do not exhibit differential item functioning across the two examinee groups. This paper first describes the missing data assumptions of the three equating methods. It then describes how the missing data in the NEAT design can be filled in a manner coherent with the assumptions made by each equating method. Implications for equating are also discussed.
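To make the contrast concrete, the three assumptions can be written in the notation standard in the NEAT literature; this is a sketch, and the symbols are not defined in the abstract itself: X and Y denote scores on the two test forms, A the anchor test score, and P and Q the two examinee populations.

\begin{align*}
\text{FEEE:}\quad & f_P(x \mid a) = f_Q(x \mid a)
  \quad\text{and}\quad g_P(y \mid a) = g_Q(y \mid a),\\
\text{CEE:}\quad  & e_{X \to A;\,P}(x) = e_{X \to A;\,Q}(x)
  \quad\text{and}\quad e_{Y \to A;\,P}(y) = e_{Y \to A;\,Q}(y),\\
\text{IRT OSE:}\quad & \text{the item response functions } P_i(\theta)
  \text{ are identical in } P \text{ and } Q
  \ \text{(adequate model fit, no DIF).}
\end{align*}

Under each assumption, a score distribution that is unobserved by design (for example, the distribution of X in Q) can be reconstructed from the observed data, which is what makes the corresponding equating function estimable.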

Type: Original Paper
Copyright: © 2010 The Psychometric Society

