ProcData: An R Package for Process Data Analysis

Published online by Cambridge University Press:  01 January 2025

Xueying Tang (University of Arizona)
Susu Zhang (University of Illinois at Urbana-Champaign)
Zhi Wang (Columbia University)
Jingchen Liu* (Columbia University)
Zhiliang Ying (Columbia University)

* Correspondence should be made to Jingchen Liu, Columbia University, New York, NY, USA. Email: jcliu@stat.columbia.edu

Abstract

Process data refer to data recorded in log files of computer-based items. These data, represented as timestamped action sequences, keep track of respondents’ problem-solving behaviors. Process data analysis aims to enhance the accuracy of educational assessment and to serve other assessment purposes by utilizing the rich information contained in response processes. The R package ProcData presented in this article provides tools for inspecting, processing, and analyzing process data. We define an S3 class ‘proc’ for organizing process data and extend the generic methods summary and print to ‘proc’ objects. Feature extraction methods implemented in the package compress the information in irregular response processes into regular numeric vectors. ProcData also provides functions for fitting neural-network-based sequence models and making predictions from them. In addition, the package includes a real dataset of response processes from the climate control item in the 2012 Programme for International Student Assessment.

Type
Application Reviews and Case Studies
Copyright
Copyright © 2021 The Psychometric Society
