Asymptotically Correct Standardization of Person-Fit Statistics Beyond Dichotomous Items

Sandip Sinharay

doi:10.1007/s11336-015-9465-x

Published online by Cambridge University Press: 01 January 2025

Sandip Sinharay*
Affiliation: McGraw-Hill Education CTB
* Correspondence should be made to Sandip Sinharay, McGraw-Hill Education CTB, Monterey, USA. Email: ssinharay@pacificmetrics.com

Abstract

The $l_z$ statistic (Drasgow et al. in Br J Math Stat Psychol 38:67–86, 1985) is one of the most popular person-fit statistics (Armstrong et al. in Pract Assess Res Eval 12(16):1–10, 2007). Snijders (Psychometrika 66:331–342, 2001) derived the asymptotic null distribution of $l_z$ when the examinee ability parameter is estimated. He also suggested the $l_z^*$ statistic, which is the asymptotically correct standardized version of $l_z$. However, Snijders (Psychometrika 66:331–342, 2001) only considered tests with dichotomous items. In this paper, the asymptotic null distribution of $l_z$ is derived for mixed-format tests (those that include both dichotomous and polytomous items). The asymptotically correct standardized version of $l_z$, which can be considered as the extension of $l_z^*$ to such tests, is suggested. The Type I error rate and power of the suggested statistic are examined from several simulated datasets. The suggested statistic is computed using a real dataset. The suggested statistic appears to be a satisfactory tool for assessing person fit for mixed-format tests.
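
As an illustration of the statistic the abstract starts from, the following is a minimal sketch of the uncorrected $l_z$ of Drasgow et al. (1985): the log-likelihood of a response pattern, centered and scaled by its model-implied mean and variance. It assumes the category probabilities of every item have already been evaluated at a given ability value, so the ability is treated as known. It does not reproduce the correction for estimated ability that Snijders (2001) and this paper derive. The function name `lz_statistic` and the example probabilities are hypothetical and are not taken from the paper.

```python
import math


def lz_statistic(probs, responses):
    """Uncorrected standardized log-likelihood person-fit statistic l_z.

    probs[i][k]  : model probability of score category k on item i, evaluated
                   at a given ability value (assumed supplied by the caller and
                   strictly positive, since logarithms are taken).
    responses[i] : observed score category on item i (0-based).
    Dichotomous items have two categories; polytomous items have more.
    """
    loglik = 0.0  # observed log-likelihood of the response pattern
    mean = 0.0    # model-implied expectation of the log-likelihood
    var = 0.0     # model-implied variance (items independent given ability)
    for p_item, x in zip(probs, responses):
        logs = [math.log(p) for p in p_item]
        loglik += logs[x]
        item_mean = sum(p * lg for p, lg in zip(p_item, logs))
        mean += item_mean
        var += sum(p * lg * lg for p, lg in zip(p_item, logs)) - item_mean ** 2
    return (loglik - mean) / math.sqrt(var)


# Illustrative mixed-format test: two dichotomous items and one
# three-category item. The probabilities are made up for this example.
example_probs = [[0.3, 0.7], [0.6, 0.4], [0.2, 0.5, 0.3]]
example_responses = [1, 0, 1]
example_lz = lz_statistic(example_probs, example_responses)
```

Large negative values indicate a response pattern that is less likely than the model expects. The sketch fails with a division by zero if every item has equally likely categories, because the variance term is then zero.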

Keywords

generalized partial credit model; $l_z$; $l_z^*$; polytomous items

Type: Article
Information: Psychometrika, Volume 81, Issue 4, December 2016, pp. 992–1013

DOI: https://doi.org/10.1007/s11336-015-9465-x
Copyright: © 2015 The Psychometric Society

Footnotes

The research reported in this paper was performed when the author was an employee of McGraw-Hill Education CTB. The author is currently an employee of Pacific Metrics Corporation.

References

Armstrong, R., Stoumbos, Z., Kung, M., Shi, M. (2007). On the performance of $l_z$ person-fit statistic. Practical Assessment, Research, and Evaluation, 12(16), 1–10.

Chon, K. H., Lee, W., Ansley, T. N. (2013). An empirical investigation of methods for assessing item fit for mixed format tests. Applied Measurement in Education, 26, 1–15.

Chon, K. H., Lee, W., Dunbar, S. B. (2010). A comparison of item fit statistics for mixed IRT models. Journal of Educational Measurement, 47, 318–338.

Costa, P. T., McCrae, R. R. (1992). Normal personality assessment in clinical practice: The NEO personality inventory. Psychological Assessment, 4, 5–13.

Drasgow, F., Levine, M. V., McLaughlin, M. E. (1987). Detecting inappropriate test scores with optimal and practical appropriateness indices. Applied Psychological Measurement, 11, 59–79.

Drasgow, F., Levine, M. V., Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38, 67–86.

Emons, W. H. M. (2008). Nonparametric person-fit analysis of polytomous item scores. Applied Psychological Measurement, 32, 224–247.

Finkelman, M., Kim, W. (2007). Using person fit in a body of work standard setting. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, IL.

Glas, C. A. W., Dagohoy, A. V. T. (2007). A person fit test for IRT models for polytomous items. Psychometrika, 72, 159–180.

Glas, C. A. W., Meijer, R. R. (2003). A Bayesian approach to person fit analysis in item response theory models. Applied Psychological Measurement, 27, 217–233.

Hanson, B., Harris, D. J. A comparison of several statistical methods for examining allegations of copying (ACT research report series no. 87–15). Iowa City, IA: American College Testing.

Hoadley, B. (1971). Asymptotic properties of maximum likelihood estimators for the independent not identically distributed case. The Annals of Mathematical Statistics, 42, 1977–1991.

Klauer, K. C. (1991). An exact and optimal standardized person test for assessing consistency with the Rasch model. Psychometrika, 56, 213–228.

Kolen, M. J., Lee, W. (2011). Psychometric properties of scores on mixed-format tests. Educational Measurement: Issues and Practice, 30(2), 15–24.

Levine, M. V., Rubin, D. B. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4, 269–290.

Li, M. F., Olejnik, S. (1997). The power of Rasch person-fit statistics in detecting unusual response patterns. Applied Psychological Measurement, 21, 215–231.

Magis, D. (2015). A note on weighted likelihood and Jeffreys modal estimation of proficiency levels in polytomous item response models. Psychometrika, 80, 200–204.

Magis, D., Beland, S., Raiche, G. (2014). Snijders’s correction of infit and outfit indexes with estimated ability level: An analysis with the Rasch model. Journal of Applied Measurement, 15, 82–93.

Magis, D., Raiche, G., Beland, S. (2012). A didactic presentation of Snijders’s $l_z^*$ index of person fit with emphasis on response model selection and ability estimation. Journal of Educational and Behavioral Statistics, 37, 57–81.

Magis, D., Verhelst, N. (2014). On the finiteness and uniqueness of the weighted likelihood estimator of ability in polytomous IRT models. Research Center for Examination and Certification Workshop on IRT and Educational Measurement, University of Twente, The Netherlands.

Meijer, R. R., Egberink, I. J., Emons, W. H., Sijtsma, K. (2008). Detection and validation of unscalable item score patterns using item response theory: An illustration with Harter's self-perception profile for children. Journal of Personality Assessment, 90, 227–238.

Meijer, R. R., Nering, M. L. (1997). Trait level estimation for nonfitting response vectors. Applied Psychological Measurement, 21, 321–336.

Meijer, R. R., Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25, 107–135.

Meijer, R. R., Tendeiro, J. N. (2012). The use of the $l_z$ and $l_z^*$ person-fit statistics and problems derived from model misspecification. Journal of Educational and Behavioral Statistics, 37, 758–766.

Molenaar, I. W., Hoijtink, H. (1990). The many null distributions of person fit indices. Psychometrika, 55, 75–106.

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.

R Core Team. R: A language and environment for statistical computing. Vienna, Austria.

Rohatgi, V. K., Saleh, A. K. M. E. An introduction to probability and statistics. New York, NY: Wiley.

Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics, 12, 1151–1172.

Samejima, F. (1973). Estimation of latent ability using a pattern of graded scores. Psychometrika, 38, 203–219.

Sijtsma, K., Meijer, R. R. (2001). The person response function as a tool in person-fit research. Psychometrika, 66, 191–207.

Sinharay, S. (2015). A note on the asymptotic distribution of estimates of the ability parameter: Beyond dichotomous items and unidimensional IRT models. (under review).

Sinharay, S. Assessment of person fit for mixed-format tests. Journal of Educational and Behavioral Statistics. (in press).

Sinharay, S., Wan, P., Whitaker, M., Kim, D., Zhang, L., Choi, S. W. (2014). Determining the overall impact of interruptions during online testing. Journal of Educational Measurement, 51, 419–440.

Smith, R. M. (1986). Person fit in the Rasch model. Educational and Psychological Measurement, 46, 359–372.

Snijders, T. (2001). Asymptotic distribution of person-fit statistics with estimated person parameter. Psychometrika, 66, 331–342.

Tao, J., Shi, N., Chang, H. (2012). Item-weighted likelihood method for ability estimation in tests composed of both dichotomous and polytomous items. Journal of Educational and Behavioral Statistics, 37, 298–315.

Tatsuoka, K. K. (1984). Caution indices based on item response theory. Psychometrika, 49, 95–110.

Tendeiro, J. N., Meijer, R. R. (2014). Detection of invalid test scores: The usefulness of simple nonparametric statistics. Journal of Educational Measurement, 51, 239–259.

van Krimpen-Stoop, E. M. L. A., Meijer, R. R. (1999). The null distribution of person-fit statistics for conventional and adaptive tests. Applied Psychological Measurement, 23, 327–345.

van Krimpen-Stoop, E. M. L. A., Meijer, R. R. (2002). Detection of person misfit in computerized adaptive tests with polytomous items. Applied Psychological Measurement, 26, 164–180.

Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427–450.

Wright, B. D., Masters, G. N. Rating scale analysis [Computer Software]. Chicago, IL: Mesa Press.

Wright, B. D., Stone, M. H. Best test design. Chicago, IL: Mesa Press.
