Reporting of Subscores Using Multidimensional Item Response Theory

Published online by Cambridge University Press:  01 January 2025

Shelby J. Haberman
Affiliation: ETS
Sandip Sinharay*
Affiliation: ETS
* Requests for reprints should be sent to Sandip Sinharay, ETS, Princeton, NJ, USA. E-mail: ssinharay@ets.org

Abstract

Recently, there has been increasing interest in reporting subscores. This paper examines reporting of subscores using multidimensional item response theory (MIRT) models (e.g., Reckase in Appl. Psychol. Meas. 21:25–36, 1997; Reckase in C.R. Rao and S. Sinharay (Eds.), Handbook of Statistics, vol. 26, pp. 607–642, North-Holland, Amsterdam, 2007; Beguin & Glas in Psychometrika, 66:471–488, 2001). A MIRT model is fitted using a stabilized Newton–Raphson algorithm (Haberman in The Analysis of Frequency Data, University of Chicago Press, Chicago, 1974; Sociol. Methodol. 18:193–211, 1988) with adaptive Gauss–Hermite quadrature (Haberman, von Davier, & Lee in ETS Research Rep. No. RR-08-45, ETS, Princeton, 2008). A new statistical approach is proposed to assess when subscores using the MIRT model have any added value over (i) the total score or (ii) subscores based on classical test theory (Haberman in J. Educ. Behav. Stat. 33:204–229, 2008; Haberman, Sinharay, & Puhan in Br. J. Math. Stat. Psychol. 62:79–95, 2008). The MIRT-based methods are applied to several operational data sets. The results show that the subscores based on MIRT are slightly more accurate than subscore estimates derived by classical test theory.

Type
Original Paper
Copyright
Copyright © 2010 The Psychometric Society

Footnotes

Note: Any opinions expressed in this publication are those of the authors and not necessarily of Educational Testing Service.

References

Ackerman, T., & Shu, Z. (2009). Using confirmatory MIRT modeling to provide diagnostic information in large scale assessment. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA, April 2009.
Adams, R.J., Wilson, M.R., & Wang, W.C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.
Beguin, A.A., & Glas, C.A.W. (2001). MCMC estimation and some fit analysis of multidimensional IRT models. Psychometrika, 66, 471–488.
Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika, 46, 443–459.
de la Torre, J., & Patz, R.J. (2005). Making the most of what we have: a practical application of multidimensional IRT in test scoring. Journal of Educational and Behavioral Statistics, 30, 295–311.
Dwyer, A., Boughton, K.A., Yao, L., Steffen, M., & Lewis, D. (2006). A comparison of subscale score augmentation methods using empirical data. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA, April 2006.
Haberman, S.J. (1974). The analysis of frequency data, Chicago: University of Chicago Press.
Haberman, S.J. (1988). A stabilized Newton-Raphson algorithm for log-linear models for frequency tables derived by indirect observation. Sociological Methodology, 18, 193–211.
Haberman, S.J. (2007). The information a test provides on an ability parameter (ETS Research Rep. No. RR-07-18). Princeton, NJ: ETS.
Haberman, S.J. (2008). When can subscores have value? Journal of Educational and Behavioral Statistics, 33, 204–229.
Haberman, S.J., & Sinharay, S. (2010, in press). Subscores based on multidimensional item response theory (ETS Research Rep.). Princeton, NJ: ETS.
Haberman, S.J., Sinharay, S., & Puhan, G. (2008). Reporting subscores for institutions. British Journal of Mathematical and Statistical Psychology, 62, 79–95.
Haberman, S.J., von Davier, M., & Lee, Y. (2008). Comparison of multidimensional item response models: multivariate normal ability distributions versus multivariate polytomous distributions (ETS Research Rep. No. RR-08-45). Princeton, NJ: ETS.
Haladyna, S.J., & Kramer, G.A. (2004). The validity of subscores for a credentialing test. Evaluation and the Health Professions, 24(7), 349–368.
Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response theory, Newbury Park: Sage.
Luecht, R.M., Gierl, M.J., Tan, X., & Huff, K. (2006). Scalability and the development of useful diagnostic scales. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA, April 2006.
Puhan, G., Sinharay, S., Haberman, S.J., & Larkin, K. (2010, in press). The utility of augmented subscores in a licensure exam: an evaluation of methods using empirical data. Applied Measurement in Education.
Reckase, M.D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25–36.
Reckase, M.D. (2007). Multidimensional item response theory. In Rao, C.R., & Sinharay, S. (Eds.), Handbook of statistics (pp. 607–642). Amsterdam: North-Holland.
Schilling, S., & Bock, R.D. (2005). High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika, 70, 533–555.
Sinharay, S. (2010, in press). How often do subscores have added value? Results from operational and simulated data. Journal of Educational Measurement.
Sinharay, S., Haberman, S.J., & Puhan, G. (2007). Subscores based on classical test theory: to report or not to report. Educational Measurement: Issues and Practice, 21–28.
Thissen, D., Nelson, L., & Swygert, K.A. (2001). Item response theory applied to combinations of multiple-choice and constructed-response items—approximation methods for scale scores. In Thissen, D., & Wainer, H. (Eds.), Test scoring (pp. 293–341). Hillsdale: Lawrence Erlbaum.
Wainer, H., Vevea, J.L., Camacho, F., Reeve, B.B., Rosa, K., & Nelson, L. (2001). Augmented scores—“borrowing strength” to compute scores based on small numbers of items. In Thissen, D., & Wainer, H. (Eds.), Test scoring (pp. 343–387). Hillsdale: Lawrence Erlbaum.
Yao, L.H., & Boughton, K.A. (2007). A multidimensional item response modeling approach for improving subscale proficiency estimation and classification. Applied Psychological Measurement, 31(2), 83–105.
Yen, W.M. (1987). A Bayesian/IRT measure of objective performance. Paper presented at the annual meeting of the Psychometric Society, Montreal, Quebec, April 1987.