
A Hierarchical Model for Accuracy and Choice on Standardized Tests

Published online by Cambridge University Press: 01 January 2025

Steven Andrew Culpepper*
Affiliation: University of Illinois at Urbana-Champaign

James Joseph Balamuta
Affiliation: University of Illinois at Urbana-Champaign

*Correspondence should be made to Steven Andrew Culpepper, Department of Statistics, University of Illinois at Urbana-Champaign, 725 South Wright Street, Champaign, IL 61820, USA. Email: sculpepp@illinois.edu, balamut2@illinois.edu

Abstract

This paper assesses the psychometric value of allowing test-takers choice in standardized testing. New theoretical results examine the conditions under which allowing choice improves score precision. A hierarchical framework is presented for jointly modeling the accuracy of cognitive responses and item choices. The statistical methodology is disseminated in the ‘cIRT’ R package. An ‘answer two, choose one’ (A2C1) test administration design is introduced to avoid challenges associated with nonignorable missing data. Experimental results suggest that the A2C1 design and payout structure encouraged subjects to choose items consistent with their cognitive trait levels. Substantively, the experimental data suggest that item choices yielded information and discrimination ability comparable to that of the cognitive items. Given that there are no clear guidelines for writing more or less discriminating items, one practical implication is that choice can serve as a mechanism for improving score precision.
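To make the joint structure described above concrete, the R sketch below simulates data in the spirit of the hierarchical framework: a bivariate-normal pair of person traits links the two sides of the model, accuracy follows a two-parameter normal-ogive IRT model (cf. Albert, 1992), and each item choice is a Thurstonian paired comparison (cf. Thurstone, 1927). The trait correlation, parameter ranges, and single item pair are illustrative assumptions for this sketch, not the paper's estimated model; fitting data of this kind is the job of the authors' cIRT package.

```r
# Minimal simulation sketch of a joint accuracy/choice structure.
# All parameter values, and the trait correlation rho, are assumptions
# made for illustration only.
set.seed(1)

N   <- 500                               # test-takers
J   <- 20                                # cognitive items
rho <- 0.5                               # assumed correlation between traits

# Correlated person traits: theta drives accuracy, eta drives choices
Sigma  <- matrix(c(1, rho, rho, 1), 2, 2)
traits <- matrix(rnorm(N * 2), N, 2) %*% chol(Sigma)
theta  <- traits[, 1]
eta    <- traits[, 2]

# Accuracy: two-parameter normal-ogive IRT model (cf. Albert, 1992)
a <- runif(J, 0.5, 2)                    # item discriminations
b <- rnorm(J)                            # item difficulties
P <- pnorm(sweep(outer(theta, b, "-"), 2, a, "*"))
Y <- matrix(rbinom(N * J, 1, P), N, J)   # 0/1 response-accuracy matrix

# Choice: Thurstonian paired comparison (cf. Thurstone, 1927) for an
# example pair of items j and k, with eta shifting each person's
# latent preference
gamma <- rnorm(J)                        # assumed item choice utilities
j <- 1; k <- 2
z <- (gamma[j] - gamma[k]) + eta + rnorm(N)
choose_j <- as.integer(z > 0)            # 1 = item j chosen over item k
```

Correlating the simulated choices with theta mirrors the paper's substantive question of whether item choices carry information about the cognitive trait.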

Type: Original paper
Copyright: © 2015 The Psychometric Society

References

Albert, J. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational and Behavioral Statistics, 17(3), 251–269. doi:10.3102/10769986017003251
Allen, N., Holland, P., & Thayer, D. (2005). Measuring the benefits of examinee-selected questions. Journal of Educational Measurement, 42, 27–51. doi:10.1111/j.0022-0655.2005.00003.x
Azzalini, A., & Dalla Valle, A. (1996). The multivariate skew-normal distribution. Biometrika, 83(4), 715–726. doi:10.1093/biomet/83.4.715
Béguin, A. A., & Glas, C. A. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541–561. doi:10.1007/BF02296195
Böckenholt, U. (2001). Hierarchical modeling of paired comparison data. Psychological Methods, 6(1), 49. doi:10.1037/1082-989X.6.1.49
Böckenholt, U. (2004). Comparative judgments as an alternative to ratings: Identifying the scale origin. Psychological Methods, 9(4), 453. doi:10.1037/1082-989X.9.4.453
Böckenholt, U. (2006). Thurstonian-based analyses: Past, present, and future utilities. Psychometrika, 71(4), 615–629. doi:10.1007/s11336-006-1598-5
Bradlow, E., & Thomas, N. (1998). Item response theory models applied to data allowing examinee choice. Journal of Educational and Behavioral Statistics, 23, 236–243. doi:10.3102/10769986023003236
Bridgeman, B., Morgan, R., & Wang, M.-M. (1997). Choice among essay topics: Impact on performance and validity. Journal of Educational Measurement, 34(3), 273–286. doi:10.1111/j.1745-3984.1997.tb00519.x
Brooks, S. P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4), 434–455.
Brown, A., & Maydeu-Olivares, A. (2011). Item response modeling of forced-choice questionnaires. Educational and Psychological Measurement, 71(3), 460–502. doi:10.1177/0013164410375112
Carmona, R. (2009). Indifference pricing: Theory and applications. Princeton, NJ: Princeton University Press.
Cattelan, M. (2012). Models for paired comparison data: A review with emphasis on dependent data. Statistical Science, 27(3), 412–433. doi:10.1214/12-STS396
Coombs, C. H., Milholland, J. E., & Womer, F. B. (1956). The assessment of partial knowledge. Educational and Psychological Measurement, 16(1), 13–37. doi:10.1177/001316445601600102
Croson, R. (2005). The method of experimental economics. International Negotiation, 10, 131–148. doi:10.1163/1571806054741100
Culpepper, S. A. (2015). Revisiting the 4-parameter item response model: Bayesian estimation and application. Psychometrika.
Eddelbuettel, D. (2013). Seamless R and C++ integration with Rcpp. New York: Springer. doi:10.1007/978-1-4614-6868-4
Fox, J.-P. (2010). Bayesian item response modeling. New York: Springer. doi:10.1007/978-1-4419-0742-4
Guay, R. (1976). Purdue spatial visualization test. West Lafayette, IN: Purdue University.
Hakstian, A. R., & Kansup, W. (1975). A comparison of several methods of assessing partial knowledge in multiple choice tests: II. Testing procedures. Journal of Educational Measurement, 12(4), 231–239. doi:10.1111/j.1745-3984.1975.tb01024.x
Hontangas, P., Ponsoda, V., Olea, J., & Wise, S. (2000). The choice of item difficulty in self-adapted testing. European Journal of Psychological Assessment, 16, 3–12. doi:10.1027//1015-5759.16.1.3
Kahneman, D. (2003). Maps of bounded rationality: Psychology for behavioral economics. American Economic Review, 93, 1449–1475. doi:10.1257/000282803322655392
Kahneman, D., Knetsch, J. L., & Thaler, R. H. (1990). Experimental tests of the endowment effect and the Coase theorem. Journal of Political Economy, 98, 1325–1348. doi:10.1086/261737
Kahneman, D., Knetsch, J. L., & Thaler, R. H. (1991). Anomalies: The endowment effect, loss aversion, and status quo bias. The Journal of Economic Perspectives, 5, 193–206. doi:10.1257/jep.5.1.193
Lukhele, R., Thissen, D., & Wainer, H. (1994). On the relative value of multiple-choice, constructed response, and examinee-selected items on two achievement tests. Journal of Educational Measurement, 31, 234–250. doi:10.1111/j.1745-3984.1994.tb00445.x
Maeda, Y., & Yoon, S. (2013). A meta-analysis on gender differences in mental rotation ability measured by the Purdue spatial visualization tests: Visualization of rotations (PSVT:R). Educational Psychology Review, 25, 69–94. doi:10.1007/s10648-012-9215-x
Maeda, Y., Yoon, S. Y., Kim-Kang, G., & Imbrie, P. (2013). Psychometric properties of the revised PSVT:R for measuring first year engineering students’ spatial ability. International Journal of Engineering Education, 29(3), 763–776.
Maydeu-Olivares, A., & Böckenholt, U. (2005). Structural equation modeling of paired-comparison and ranking data. Psychological Methods, 10(3), 285. doi:10.1037/1082-989X.10.3.285
McFadden, D. (2001). Economic choices. American Economic Review, 91, 351–378. doi:10.1257/aer.91.3.351
Patz, R. J., & Junker, B. W. (1999). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24(4), 342–366. doi:10.3102/10769986024004342
Pitkin, A., & Vispoel, W. (2001). Differences between self-adapted and computerized adaptive tests: A meta-analysis. Journal of Educational Measurement, 38, 235–247. doi:10.1111/j.1745-3984.2001.tb01125.x
Powers, D., & Bennett, R. (2000). Effects of allowing examinees to select questions on a test of divergent thinking. Applied Measurement in Education, 12, 257–279. doi:10.1207/S15324818AME1203_3
Revuelta, J. (2004). Estimating ability and item-selection strategy in self-adapted testing: A latent class approach. Journal of Educational and Behavioral Statistics, 29, 379–396. doi:10.3102/10769986029004379
Rocklin, T. (1994). Self-adapted testing. Applied Measurement in Education, 7, 3–14. doi:10.1207/s15324818ame0701_2
Rocklin, T., & O’Donnell, A. (1987). Self-adapted testing: A performance-improving variant of computerized adaptive testing. Journal of Educational Psychology, 79, 315–319. doi:10.1037/0022-0663.79.3.315
Rocklin, T., O’Donnell, A., & Holst, P. (1995). Effects and underlying mechanisms of self-adapted testing. Journal of Educational Psychology, 87, 103–116. doi:10.1037/0022-0663.87.1.103
Ross, S. (2011). An elementary introduction to mathematical finance (3rd ed.). New York: Cambridge University Press. doi:10.1017/CBO9780511921483
Rubin, D. (1976). Inference and missing data. Biometrika, 63, 581–592. doi:10.1093/biomet/63.3.581
Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55(1), 68. doi:10.1037/0003-066X.55.1.68
Schraw, G., Flowerday, T., & Reisetter, M. (1998). The role of choice in reader engagement. Journal of Educational Psychology, 90, 705–714. doi:10.1037/0022-0663.90.4.705
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30(4), 298–321. doi:10.1177/0146621605285517
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34(4), 273. doi:10.1037/h0070288
Tsai, R.-C. (2000). Remarks on the identifiability of Thurstonian ranking models: Case V, Case III, or neither? Psychometrika, 65(2), 233–240. doi:10.1007/BF02294376
Tsai, R.-C. (2003). Remarks on the identifiability of Thurstonian paired comparison models under multiple judgment. Psychometrika, 68(3), 361–372. doi:10.1007/BF02294732
Tsai, R.-C., & Böckenholt, U. (2002). Two-level linear paired comparison models: Estimation and identifiability issues. Mathematical Social Sciences, 43(3), 429–449. doi:10.1016/S0165-4896(02)00019-7
Tsai, R.-C., & Böckenholt, U. (2006). Modelling intransitive preferences: A random-effects approach. Journal of Mathematical Psychology, 50(1), 1–14. doi:10.1016/j.jmp.2005.11.004
Tsai, R.-C., & Böckenholt, U. (2008). On the importance of distinguishing between within- and between-subject effects in intransitive intertemporal choice. Journal of Mathematical Psychology, 52(1), 10–20. doi:10.1016/j.jmp.2007.09.004
Tversky, A., & Kahneman, D. (1991). Loss aversion in riskless choice: A reference-dependent model. The Quarterly Journal of Economics, 106, 1039–1061. doi:10.2307/2937956
van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72(3), 287–308. doi:10.1007/s11336-006-1478-z
Vispoel, W., & Coffman, D. (1994). Computerized-adaptive and self-adaptive music-listening tests: Psychometric features and motivational benefits. Applied Measurement in Education, 7, 25–51. doi:10.1207/s15324818ame0701_4
Wainer, H. (2011). Uneducated guesses: Using evidence to uncover misguided education policies. Princeton, NJ: Princeton University Press. doi:10.1515/9781400839575
Wainer, H., & Thissen, D. (1994). On examinee choice in educational testing. Review of Educational Research, 64, 159–195. doi:10.3102/00346543064001159
Wainer, H., Wang, X. B., & Thissen, D. (1994). How well can we compare scores on test forms that are constructed by examinees’ choice? Journal of Educational Measurement, 31, 183–199. doi:10.1111/j.1745-3984.1994.tb00442.x
Wang, W., Jin, K., Qiu, X., & Wang, L. (2012). Item response models for examinee-selected items. Journal of Educational Measurement, 49, 419–445. doi:10.1111/j.1745-3984.2012.00184.x
Wang, X. B. (1992). Achieving equity in self-selected subsets of test items (Unpublished doctoral dissertation). University of Hawaii.
Wang, X. B., Wainer, H., & Thissen, D. (1995). On the viability of some untestable assumptions in equating exams that allow examinee choice. Applied Measurement in Education, 8, 211–225. doi:10.1207/s15324818ame0803_2
Wise, S. (1994). Understanding self-adaptive testing: The perceived control hypothesis. Applied Measurement in Education, 7, 15–24. doi:10.1207/s15324818ame0701_3
Wise, S., Plake, B., Johnson, P., & Roos, L. (1992). A comparison of self-adapted and computerized adaptive tests. Journal of Educational Measurement, 29, 329–339. doi:10.1111/j.1745-3984.1992.tb00381.x
Yoon, S. Y. (2011). Psychometric properties of the Revised Purdue Spatial Visualization Tests: Visualization of Rotations (the Revised PSVT:R) (Unpublished doctoral dissertation). Purdue University.