Higher-Order Asymptotics and Its Application to Testing the Equality of the Examinee Ability Over Two Sets of Items

Published online by Cambridge University Press:  01 January 2025

Sandip Sinharay*
Affiliation:
Educational Testing Service
Jens Ledet Jensen
Affiliation:
Aarhus University
* Correspondence should be made to Sandip Sinharay, Educational Testing Service, Princeton, USA. Email: ssinharay@ets.org

Abstract

In educational and psychological measurement, researchers and/or practitioners are often interested in examining whether the ability of an examinee is the same over two sets of items. Such problems can arise in measurement of change, detection of cheating on unproctored tests, erasure analysis, detection of item preknowledge, etc. Traditional frequentist approaches that are used in such problems include the Wald test, the likelihood ratio test, and the score test (e.g., Fischer, Appl Psychol Meas 27:3–26, 2003; Finkelman, Weiss, & Kim-Kang, Appl Psychol Meas 34:238–254, 2010; Glas & Dagohoy, Psychometrika 72:159–180, 2007; Guo & Drasgow, Int J Sel Assess 18:351–364, 2010; Klauer & Rettig, Br J Math Stat Psychol 43:193–206, 1990; Sinharay, J Educ Behav Stat 42:46–68, 2017). This paper shows that approaches based on higher-order asymptotics (e.g., Barndorff-Nielsen & Cox, Inference and asymptotics. Springer, London, 1994; Ghosh, Higher order asymptotics. Institute of Mathematical Statistics, Hayward, 1994) can also be used to test for the equality of the examinee ability over two sets of items. The modified signed likelihood ratio test (e.g., Barndorff-Nielsen, Biometrika 73:307–322, 1986) and the Lugannani–Rice approximation (Lugannani & Rice, Adv Appl Prob 12:475–490, 1980), both of which are based on higher-order asymptotics, are shown to provide some improvement over the traditional frequentist approaches in three simulations. Two real data examples are also provided.
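As a point of reference for the traditional approaches named above, the following is a minimal sketch of a likelihood ratio test of equal ability over two item sets. It is not the procedure of the paper: it assumes the Rasch model with known item difficulties, relies on the first-order chi-square approximation with one degree of freedom, and does not implement the modified signed likelihood ratio test or the Lugannani–Rice approximation. All function names and the example data are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_lik(theta, responses, difficulties):
    """Log-likelihood of 0/1 responses at ability theta, item difficulties known."""
    total = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        total += x * math.log(p) + (1 - x) * math.log(1.0 - p)
    return total


def mle_theta(responses, difficulties, tol=1e-10, max_iter=100):
    """Maximum likelihood estimate of ability by Newton-Raphson iteration."""
    score = sum(responses)
    if score == 0 or score == len(responses):
        # The MLE is infinite for all-incorrect or all-correct patterns.
        raise ValueError("MLE is not finite for an all-0 or all-1 pattern")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = score - sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        # Clamp the step to keep early iterations stable.
        step = max(-1.0, min(1.0, gradient / information))
        theta += step
        if abs(step) < tol:
            break
    return theta


def lr_test_equal_ability(resp1, diff1, resp2, diff2):
    """Likelihood ratio statistic and first-order p-value for H0: theta1 == theta2."""
    resp_all = list(resp1) + list(resp2)
    diff_all = list(diff1) + list(diff2)
    theta1 = mle_theta(resp1, diff1)
    theta2 = mle_theta(resp2, diff2)
    theta0 = mle_theta(resp_all, diff_all)
    stat = 2.0 * (
        log_lik(theta1, resp1, diff1)
        + log_lik(theta2, resp2, diff2)
        - log_lik(theta0, resp_all, diff_all)
    )
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical example: the same five difficulties in both item sets.
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
stat, p_value = lr_test_equal_ability(
    [1, 1, 1, 1, 0], difficulties, [0, 0, 0, 0, 1], difficulties
)
print(f"LR statistic = {stat:.3f}, p-value = {p_value:.3f}")
```

With only a handful of items per set, the chi-square reference distribution used in this sketch can be inaccurate, which is the kind of situation in which the abstract reports that approaches based on higher-order asymptotics provide some improvement.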

Type
Original Paper
Copyright
Copyright © 2018 The Psychometric Society

Footnotes

Note: The research reported in this article was supported by the Institute of Education Sciences (IES), U.S. Department of Education, through grant R305D170026. Any opinions expressed in this publication are those of the author and not necessarily of IES or Educational Testing Service.

References

Barndorff-Nielsen, O. E. (1986). Inference on full or partial parameters based on the standardized signed log likelihood ratio. Biometrika, 73, 307–322.
Barndorff-Nielsen, O. E. (1991). Modified signed log likelihood ratio. Biometrika, 78, 557–563.
Barndorff-Nielsen, O. E., & Cox, D. R. (1994). Inference and asymptotics. London: Springer.
Bedrick, E. J. (1997). Approximating the conditional distribution of person fit indexes for checking the Rasch model. Psychometrika, 62, 191–199.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57, 289–300.
Biehler, M., Holling, H., & Doebler, P. (2015). Saddlepoint approximations of the distribution of the person parameter in the two parameter logistic model. Psychometrika, 80, 665–688.
Brazzale, A. R., Davison, A. C., & Reid, N. (2007). Applied asymptotics. Cambridge: Cambridge University Press.
Cizek, G. J., & Wollack, J. A. (2017). Handbook of quantitative methods for detecting cheating on tests. Washington, DC: Routledge.
Costa, P. T., & McCrae, R. R. (1992). Normal personality assessment in clinical practice: The NEO personality inventory. Psychological Assessment, 4, 5–13.
Cox, D. R., & Hinkley, D. V. (1974). Theoretical statistics. London: Chapman and Hall.
Donoghue, J. R. (1994). An empirical examination of the IRT information of polytomously scored reading items under the generalized partial credit model. Journal of Educational Measurement, 31, 295–311.
Drasgow, F., Levine, M. V., & Zickar, M. J. (1996). Optimal identification of mismeasured individuals. Applied Measurement in Education, 9, 47–64.
Ferrara, S. (2017). A framework for policies and practices to improve test security programs: Prevention, detection, investigation, and resolution (PDIR). Educational Measurement: Issues and Practice, 36(3), 5–24.
Finkelman, M., Weiss, D. J., & Kim-Kang, G. (2010). Item selection and hypothesis testing for the adaptive measurement of change. Applied Psychological Measurement, 34, 238–254.
Fischer, G. H. (2003). The precision of gain scores under an item response theory perspective: A comparison of asymptotic and exact conditional inference about change. Applied Psychological Measurement, 27, 3–26.
Ghosh, J. K. (1994). Higher order asymptotics. Hayward, CA: Institute of Mathematical Statistics.
Glas, C. A. W., & Dagohoy, A. V. T. (2007). A person fit test for IRT models for polytomous items. Psychometrika, 72, 159–180.
Guo, J., & Drasgow, F. (2010). Identifying cheating on unproctored internet tests: The z-test and the likelihood ratio test. International Journal of Selection and Assessment, 18, 351–364.
Haberman, S. J. (2006). An elementary test of the normal 2PL model against the normal 3PL alternative. ETS Research Report RR-06-14, ETS, Princeton, NJ.
Haberman, S. J., & Lee, Y.-H. (2017). A statistical procedure for testing unusually frequent exactly matching responses and nearly matching responses. ETS Research Report RR-17-23, ETS, Princeton, NJ.
Jensen, J. L. (1992). The modified signed likelihood statistic and saddlepoint approximations. Biometrika, 79, 693–703.
Jensen, J. L. (1995). Saddlepoint approximations. Oxford: Clarendon Press.
Jensen, J. L. (1997). A simple derivation of r* for curved exponential families. Scandinavian Journal of Statistics, 24, 33–46.
Klauer, K. C. (1991). An exact and optimal standardized person test for assessing consistency with the Rasch model. Psychometrika, 56, 213–228.
Klauer, K. C., & Rettig, K. (1990). An approximately standardized person test for assessing consistency with a latent trait model. British Journal of Mathematical and Statistical Psychology, 43, 193–206.
Lewis, C., & Thayer, D. T. (1998). The power of the K-index (or PMIR) to detect copying. ETS Research Report 98-49, Educational Testing Service, Princeton, NJ.
Lugannani, R., & Rice, S. (1980). Saddle point approximation for the distribution of the sum of independent random variables. Advances in Applied Probability, 12, 475–490.
Maris, G., & Bechger, T. (2009). On interpreting the model parameters for the three parameter logistic model. Measurement: Interdisciplinary Research and Perspective, 7(2), 75–88.
Martín, E. S., González, J., & Tuerlinckx, F. (2015). On the unidentifiability of the fixed-effects 3PL model. Psychometrika, 80, 450–467.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.
Pierce, D. A., & Peters, D. (1992). Practical use of higher-order asymptotics for multiparameter exponential families. Journal of the Royal Statistical Society, Series B, 54, 701–738.
R Core Team (2017). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Rao, C. R. (1973). Linear statistical inference and its applications (2nd ed.). New York, NY: Wiley.
Reid, N. (2003). Asymptotics and the theory of inference. The Annals of Statistics, 31, 1695–1731.
Sinharay, S. (2017). Detection of item preknowledge using likelihood ratio test and score test. Journal of Educational and Behavioral Statistics, 42, 46–68.
Sinharay, S., Duong, M. Q., & Wood, S. W. (2017). A new statistic for detection of aberrant answer changes. Journal of Educational Measurement, 54, 200–217.
Skorupski, W. P., & Wainer, H. (2017). The case for Bayesian methods when investigating test fraud. In Cizek, G. J., & Wollack, J. A. (Eds.), Handbook of quantitative methods for detecting cheating on tests (pp. 214–231). Washington, DC: Routledge.
Skovgaard, I. M. (1990). On the density of minimum contrast estimators. The Annals of Statistics, 18, 779–789.
von Davier, M., & Molenaar, I. W. (2003). A person-fit index for polytomous Rasch models, latent class models, and their mixture generalizations. Psychometrika, 68, 213–228.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427–450.
Wollack, J. A., Cohen, A. S., & Eckerly, C. A. (2015). Detecting test tampering using item response theory. Educational and Psychological Measurement, 75, 931–953.
Wollack, J. A., & Eckerly, C. (2017). Detecting test tampering at the group level. In Cizek, G. J., & Wollack, J. A. (Eds.), Handbook of quantitative methods for detecting cheating on tests (pp. 214–231). Washington, DC: Routledge.
Wollack, J. A., & Schoenig, R. W. (2018). Cheating. In Frey, B. B. (Ed.), The SAGE encyclopedia of educational research, measurement, and evaluation (pp. 260–265). Thousand Oaks, CA: Sage.