
Item Cloning Variation and the Impact on the Parameters of Response Models

Published online by Cambridge University Press:  01 January 2025

Quinn N. Lathrop*
Affiliation:
Pearson
Ying Cheng
Affiliation:
University of Notre Dame
* Correspondence should be made to Quinn N. Lathrop, Pearson, Portland, USA. Email: quinn.lathrop@gmail.com

Abstract

Item cloning is increasingly used to generate slight differences in tasks for use in psychological experiments and educational assessments. This paper investigates the psychometric issues that arise when item cloning introduces variation into the difficulty parameters of the item clones. Four models are proposed and evaluated in simulation studies with conditions representing possible types of variation due to item cloning. Depending on the model specified, unaccounted variance in the item clone difficulties propagates to other parameters in the model, causing specific and predictable patterns of bias. Person parameters are largely unaffected by the choice of model, but for inferences related to the item parameters, the choice is critical and can even be leveraged to identify problematic item cloning.
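
As an illustration of the kind of model family at issue (the notation below is illustrative rather than the paper's own specification), clone-level difficulty variation can be written in a Rasch-type form in which each clone c of parent item i deviates randomly from the parent difficulty:

P(X_{pic} = 1 \mid \theta_p) = \operatorname{logit}^{-1}\bigl(\theta_p - (b_i + \epsilon_{ic})\bigr), \qquad \epsilon_{ic} \sim N(0, \sigma^2),

where \theta_p is the ability of person p, b_i is the difficulty of the parent item, and \epsilon_{ic} is the clone-specific deviation. If a fitted model omits \epsilon_{ic}, its variance must be absorbed by the remaining parameters, which is the propagation of bias the abstract describes.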

Type: Original Paper
Copyright © 2016 The Psychometric Society

Footnotes

The views and opinions expressed in this article are those of the authors and do not necessarily reflect those of their institutions.
