A Mixed Stochastic Approximation EM (MSAEM) Algorithm for the Estimation of the Four-Parameter Normal Ogive Model

Published online by Cambridge University Press:  01 January 2025

Xiangbin Meng*
Affiliation:
Northeast Normal University
Gongjun Xu
Affiliation:
University of Michigan
* Correspondence should be made to Xiangbin Meng, School of Mathematics and Statistics, KLAS, Northeast Normal University, 5268 Renmin Street, Changchun, Jilin Province, China. Email: mengxb600@nenu.edu.cn

Abstract

In recent years, the four-parameter model (4PM) has received increasing attention in item response theory. The purpose of this article is to provide more efficient and more reliable computational tools for fitting the 4PM. In particular, this article focuses on the four-parameter normal ogive (4PNO) model and develops efficient stochastic approximation expectation maximization (SAEM) algorithms to compute the marginalized maximum a posteriori estimator. First, a data augmentation scheme is used for the 4PNO model, which makes the complete-data model an exponential family, and a basic SAEM algorithm is then developed for the 4PNO model. Second, to overcome the drawback of the SAEM algorithm, we develop an improved SAEM algorithm for the 4PNO model, called the mixed SAEM (MSAEM). Results from simulation studies demonstrate that (1) the MSAEM provides estimates that are more accurate than or comparable to those of the other estimation methods while being computationally more efficient; and (2) the MSAEM is more robust to the choice of initial values and of priors for the item parameters, which is a valuable property in practical use. Finally, a real data set is analyzed to show the good performance of the proposed methods.
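
For orientation only, the two ingredients named in the abstract can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the slope-intercept parameterization `a * theta - b`, the function names, and the burn-in gain schedule are all hypothetical choices made here. The abstract itself does not specify them. The first function evaluates a 4PNO response probability with lower asymptote `c` and upper asymptote `d`; the second applies a generic Robbins-Monro stochastic approximation step of the kind SAEM-type algorithms use to smooth simulated complete-data statistics.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_correct_4pno(theta: float, a: float, b: float, c: float, d: float) -> float:
    """4PNO probability of a correct response.

    Assumed parameterization (one common convention, not taken from the abstract):
    P(Y = 1 | theta) = c + (d - c) * Phi(a * theta - b),
    where c is the lower asymptote and d is the upper asymptote.
    """
    return c + (d - c) * norm_cdf(a * theta - b)


def gain_sequence(k: int, burn_in: int = 50) -> float:
    """Hypothetical step-size schedule: constant 1 during burn-in, then 1/(k - burn_in)."""
    return 1.0 if k <= burn_in else 1.0 / (k - burn_in)


def sa_update(s_old: float, s_sim: float, gain: float) -> float:
    """Robbins-Monro step: move the running statistic toward the newly simulated one."""
    return s_old + gain * (s_sim - s_old)


if __name__ == "__main__":
    # An examinee of average ability on an item with asymptotes 0.2 and 0.9.
    print(p_correct_4pno(theta=0.0, a=1.0, b=0.0, c=0.2, d=0.9))
    # One smoothing step halfway between the old and the simulated statistic.
    print(sa_update(s_old=2.0, s_sim=4.0, gain=0.5))
```

With a gain of 1 the update simply replaces the old statistic by the simulated one, and with decreasing gains it averages over iterations. That averaging is the general idea behind stochastic approximation, whatever schedule a particular algorithm adopts.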

Type
Theory and Methods
Copyright
Copyright © 2022 The Author(s) under exclusive licence to The Psychometric Society
