ProcData: An R Package for Process Data Analysis

Xueying Tang; Susu Zhang; Zhi Wang; Jingchen Liu; Zhiliang Ying

doi:10.1007/s11336-021-09798-7


Published online by Cambridge University Press: 01 January 2025


Xueying Tang, University of Arizona
Susu Zhang, University of Illinois at Urbana-Champaign
Zhi Wang, Columbia University
Jingchen Liu*, Columbia University
Zhiliang Ying, Columbia University
*: Correspondence should be made to Jingchen Liu, Columbia University, New York, NY, USA. Email: jcliu@stat.columbia.edu


Abstract

Process data refer to data recorded in the log files of computer-based items. These data, represented as timestamped action sequences, keep track of respondents’ problem-solving behaviors. Process data analysis aims to enhance the accuracy of educational assessment and to serve other assessment purposes by utilizing the rich information contained in response processes. The R package ProcData presented in this article provides tools for inspecting, processing, and analyzing process data. We define an S3 class ‘proc’ for organizing process data and extend the generic methods summary and print to ‘proc’ objects. The package implements feature extraction methods that compress the information in irregular response processes into regular numeric vectors. ProcData also provides functions for making predictions from neural-network-based sequence models. In addition, the package includes a real dataset of response processes from the climate control item in the 2012 Programme for International Student Assessment (PISA).
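To make "compressing irregular response processes into regular numeric vectors" concrete, here is a minimal Python sketch of MDS-style feature extraction. It assumes a simple stand-in dissimilarity (edit distance between action sequences, normalized by the longer sequence length), which is not the dissimilarity ProcData uses, and it embeds the sequences with classical (Torgerson) MDS. The function names `levenshtein`, `dissimilarity_matrix`, and `classical_mds` and the toy action labels are hypothetical illustrations, not the package's API.

```python
import numpy as np

def levenshtein(a, b):
    """Edit distance between two action sequences (lists of action labels)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete action x
                            curr[j - 1] + 1,           # insert action y
                            prev[j - 1] + (x != y)))   # substitute x -> y
        prev = curr
    return prev[-1]

def dissimilarity_matrix(seqs):
    """Stand-in dissimilarity (assumption): edit distance / longer length, in [0, 1]."""
    n = len(seqs)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            denom = max(len(seqs[i]), len(seqs[j]), 1)
            d[i, j] = d[j, i] = levenshtein(seqs[i], seqs[j]) / denom
    return d

def classical_mds(d, k):
    """Embed an n x n dissimilarity matrix into k dimensions (Torgerson scaling)."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]         # keep the k largest
    vals = np.clip(vals[idx], 0, None)       # drop negative (non-Euclidean) parts
    return vecs[:, idx] * np.sqrt(vals)

# Toy action sequences of different lengths (labels made up for illustration).
seqs = [
    ["start", "apply", "reset", "apply", "end"],
    ["start", "apply", "end"],
    ["start", "reset", "reset", "apply", "apply", "end"],
    ["start", "end"],
]
features = classical_mds(dissimilarity_matrix(seqs), k=2)
print(features.shape)  # (4, 2): one 2-dimensional feature vector per sequence
```

Each sequence, whatever its length, ends up as a fixed-length row of `features` that can feed standard regression or clustering. Classical MDS is used here only for brevity; a package implementation may instead fit the embedding by iteratively minimizing a stress criterion.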

Keywords

process data analysis; multidimensional scaling; autoencoder; sequence model

Type: Application Reviews and Case Studies
Information: Psychometrika, Volume 86, Issue 4: Special Section: Revisiting Cronbach’s Alpha, December 2021, pp. 1058–1083

DOI: https://doi.org/10.1007/s11336-021-09798-7
Copyright: Copyright © 2021 The Psychometric Society


