Automated Item Generation with Recurrent Neural Networks

Published online by Cambridge University Press:  01 January 2025

Matthias von Davier*
Affiliation: National Board of Medical Examiners
* Correspondence should be made to Matthias von Davier, National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA 19104-3102, USA. Email: mvondavier@nbme.org

Abstract

Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource, and testing agencies incur high costs in continuously renewing item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types, such as those found in language-free intelligence tests (e.g., Raven’s Progressive Matrices), or on an extensive analysis of task components and the derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.

Type
Original Paper
Copyright
Copyright © 2018 The Psychometric Society

