Asymptotically Correct Standardization of Person-Fit Statistics Beyond Dichotomous Items

Published online by Cambridge University Press:  01 January 2025

Sandip Sinharay*
Affiliation:
McGraw-Hill Education CTB
* Correspondence should be made to Sandip Sinharay, McGraw-Hill Education CTB, Monterey, USA. Email: ssinharay@pacificmetrics.com

Abstract

The $l_z$ statistic (Drasgow et al. in Br J Math Stat Psychol 38:67–86, 1985) is one of the most popular person-fit statistics (Armstrong et al. in Pract Assess Res Eval 12(16):1–10, 2007). Snijders (Psychometrika 66:331–342, 2001) derived the asymptotic null distribution of $l_z$ when the examinee ability parameter is estimated. He also suggested the $l^*_z$ statistic, which is the asymptotically correct standardized version of $l_z$. However, Snijders (Psychometrika 66:331–342, 2001) only considered tests with dichotomous items. In this paper, the asymptotic null distribution of $l_z$ is derived for mixed-format tests (those that include both dichotomous and polytomous items). The asymptotically correct standardized version of $l_z$, which can be considered as the extension of $l^*_z$ to such tests, is suggested. The Type I error rate and power of the suggested statistic are examined from several simulated datasets. The suggested statistic is computed using a real dataset. The suggested statistic appears to be a satisfactory tool for assessing person fit for mixed-format tests.
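
For orientation, the original $l_z$ statistic for dichotomous items can be sketched as follows. This is a minimal illustration of the widely used standardized log-likelihood form, not the extension proposed in the paper: it assumes a two-parameter logistic model, treats the ability value as known rather than estimated, and ignores polytomous items. The function names `p_correct` and `lz_statistic` and the item parameters are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lz_statistic(responses, a_params, b_params, theta):
    """Standardized log-likelihood person-fit statistic for dichotomous items.

    l_z = (l_0 - E[l_0]) / sqrt(Var[l_0]), where l_0 is the log-likelihood
    of the observed response pattern evaluated at theta.
    """
    l0 = 0.0        # observed log-likelihood
    expected = 0.0  # expectation of l_0 under the model
    variance = 0.0  # variance of l_0 under the model
    for u, a, b in zip(responses, a_params, b_params):
        p = p_correct(theta, a, b)
        q = 1.0 - p
        l0 += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (l0 - expected) / math.sqrt(variance)


# Hypothetical five-item test, examinee at theta = 0.
a_params = [1.0, 1.0, 1.0, 1.0, 1.0]
b_params = [-2.0, -1.0, 0.0, 1.0, 2.0]

consistent = [1, 1, 1, 0, 0]  # easy items right, hard items wrong
aberrant = [0, 0, 1, 1, 1]    # easy items wrong, hard items right

print(lz_statistic(consistent, a_params, b_params, 0.0))
print(lz_statistic(aberrant, a_params, b_params, 0.0))
```

Large negative values point to a response pattern that is unlikely under the model. When an estimate replaces the true ability, the standard normal reference distribution no longer holds for this quantity, which is the problem the $l^*_z$ correction and its mixed-format extension address.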

Type
Article
Copyright
Copyright © 2015 The Psychometric Society

Footnotes

The research reported in this paper was performed when the author was an employee of McGraw-Hill Education CTB. The author is currently an employee of Pacific Metrics Corporation.

References

Armstrong, R., Stoumbos, Z., Kung, M., Shi, M. (2007). On the performance of $l_z$ person-fit statistic. Practical Assessment, Research, and Evaluation, 12 (16), 1–10.
Chon, K. H., Lee, W., Ansley, T. N. (2013). An empirical investigation of methods for assessing item fit for mixed format tests. Applied Measurement in Education, 26, 1–15.
Chon, K. H., Lee, W., Dunbar, S. B. (2010). A comparison of item fit statistics for mixed IRT models. Journal of Educational Measurement, 47, 318–338.
Costa, P. T., McCrae, R. R. (1992). Normal personality assessment in clinical practice: The NEO personality inventory. Psychological Assessment, 4, 5–13.
Drasgow, F., Levine, M. V., McLaughlin, M. E. (1987). Detecting inappropriate test scores with optimal and practical appropriateness indices. Applied Psychological Measurement, 11, 59–79.
Drasgow, F., Levine, M. V., Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38, 67–86.
Emons, W. H. M. (2008). Nonparametric person-fit analysis of polytomous item scores. Applied Psychological Measurement, 32, 224–247.
Finkelman, M., Kim, W. (2007). Using person fit in a body of work standard setting. Paper presented at the Annual meeting of the American Educational Research Association, Chicago, IL.
Glas, C. A. W., Dagohoy, A. V. T. (2007). A person fit test for IRT models for polytomous items. Psychometrika, 72, 159–180.
Glas, C. A. W., Meijer, R. R. (2003). A Bayesian approach to person fit analysis in item response theory models. Applied Psychological Measurement, 27, 217–233.
Hanson, B., Harris, D. J. A comparison of several statistical methods for examining allegations of copying (ACT research report series no. 87–15), Iowa City, IA: American College Testing.
Hoadley, B. (1971). Asymptotic properties of maximum likelihood estimators for the independent not identically distributed case. The Annals of Mathematical Statistics, 42, 1977–1991.
Klauer, K. C. (1991). An exact and optimal standardized person test for assessing consistency with the Rasch model. Psychometrika, 56, 213–228.
Kolen, M. J., Lee, W. (2011). Psychometric properties of scores on mixed-format tests. Educational Measurement: Issues and Practice, 30 (2), 15–24.
Levine, M. V., Rubin, D. B. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4, 269–290.
Li, M. F., Olejnik, S. (1997). The power of Rasch person-fit statistics in detecting unusual response patterns. Applied Psychological Measurement, 21, 215–231.
Magis, D. (2015). A note on weighted likelihood and Jeffreys modal estimation of proficiency levels in polytomous item response models. Psychometrika, 80, 200–204.
Magis, D., Beland, S., Raiche, G. (2014). Snijders’s correction of infit and outfit indexes with estimated ability level: An analysis with the Rasch model. Journal of Applied Measurement, 15, 82–93.
Magis, D., Raiche, G., Beland, S. (2012). A didactic presentation of Snijders’s $l^*_z$ index of person fit with emphasis on response model selection and ability estimation. Journal of Educational and Behavioral Statistics, 37, 57–81.
Magis, D., Verhelst, N. (2014). On the finiteness and uniqueness of the weighted likelihood estimator of ability in polytomous IRT models. Research Center for Examination and Certification Workshop on IRT and Educational Measurement, University of Twente, The Netherlands.
Meijer, R. R., Egberink, I. J., Emons, W. H., Sijtsma, K. (2008). Detection and validation of unscalable item score patterns using item response theory: An illustration with Harter’s Self-Perception Profile for Children. Journal of Personality Assessment, 90, 227–238.
Meijer, R. R., Nering, M. L. (1997). Trait level estimation for nonfitting response vectors. Applied Psychological Measurement, 21, 321–336.
Meijer, R. R., Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25, 107–135.
Meijer, R. R., Tendeiro, J. N. (2012). The use of the $l_z$ and $l^*_z$ person-fit statistics and problems derived from model misspecification. Journal of Educational and Behavioral Statistics, 37, 758–766.
Molenaar, I. W., Hoijtink, H. (1990). The many null distributions of person fit indices. Psychometrika, 55, 75–106.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.
R Core Team. R: A language and environment for statistical computing, Vienna, Austria: R Foundation for Statistical Computing.
Rohatgi, V. K., Saleh, A. K. M. E. An introduction to probability and statistics, New York, NY: Wiley.
Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics, 12, 1151–1172.
Samejima, F. (1973). Estimation of latent ability using a pattern of graded scores. Psychometrika, 38, 203–219.
Sijtsma, K., Meijer, R. R. (2001). The person response function as a tool in person-fit research. Psychometrika, 66, 191–207.
Sinharay, S. (2015). A note on the asymptotic distribution of estimates of the ability parameter: Beyond dichotomous items and unidimensional IRT models. (under review).
Sinharay, S. Assessment of person fit for mixed-format tests. Journal of Educational and Behavioral Statistics. (in press).
Sinharay, S., Wan, P., Whitaker, M., Kim, D., Zhang, L., Choi, S. W. (2014). Determining the overall impact of interruptions during online testing. Journal of Educational Measurement, 51, 419–440.
Smith, R. M. (1986). Person fit in the Rasch model. Educational and Psychological Measurement, 46, 359–372.
Snijders, T. (2001). Asymptotic distribution of person-fit statistics with estimated person parameter. Psychometrika, 66, 331–342.
Tao, J., Shi, N., Chang, H. (2012). Item-weighted likelihood method for ability estimation in tests composed of both dichotomous and polytomous items. Journal of Educational and Behavioral Statistics, 37, 298–315.
Tatsuoka, K. K. (1984). Caution indices based on item response theory. Psychometrika, 49, 95–110.
Tendeiro, J. N., Meijer, R. R. (2014). Detection of invalid test scores: The usefulness of simple nonparametric statistics. Journal of Educational Measurement, 51, 239–259.
van Krimpen-Stoop, E. M. L. A., Meijer, R. R. (1999). The null distribution of person-fit statistics for conventional and adaptive tests. Applied Psychological Measurement, 23, 327–345.
van Krimpen-Stoop, E. M. L. A., Meijer, R. R. (2002). Detection of person misfit in computerized adaptive tests with polytomous items. Applied Psychological Measurement, 26, 164–180.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427–450.
Wright, B. D., Masters, G. N. Rating scale analysis [Computer Software], Chicago, IL: Mesa Press.
Wright, B. D., Stone, M. H. Best test design, Chicago, IL: Mesa Press.