
Polytomous Effectiveness Indicators in Complex Problem-Solving Tasks and Their Applications in Developing Measurement Model

Published online by Cambridge University Press:  01 January 2025

Pujue Wang
Affiliation:
Beijing Normal University Beijing Key Laboratory of Applied Experimental Psychology, National Demonstration Center for Experimental Psychology Education (Beijing Normal University)
Hongyun Liu*
Affiliation:
Beijing Normal University Beijing Key Laboratory of Applied Experimental Psychology, National Demonstration Center for Experimental Psychology Education (Beijing Normal University)
*
Correspondence should be made to Hongyun Liu, Beijing Key Laboratory of Applied Experimental Psychology, National Demonstration Center for Experimental Psychology Education (Beijing Normal University), Faculty of Psychology, Beijing Normal University, No. 19, Xin Jie Kou Wai St., Hai Dian District, Beijing 100875, People’s Republic of China. Email: hyliu@bnu.edu.cn

Abstract

Recent years have witnessed the emergence of measurement models for analyzing action sequences in computer-based interactive problem-solving tasks. The cutting-edge psychometric process models require pre-specification of the effectiveness of state transitions, often simplifying them into dichotomous indicators. However, dichotomous effectiveness becomes impractical when dealing with complex tasks that involve multiple optimal paths and numerous state transitions. Building on the concept of problem-solving, we introduce polytomous indicators to assess the effectiveness of problem states $d_{s}$ and state-to-state transitions $\Delta d_{s\rightarrow s'}$. A three-step evaluation method for these two types of indicators is proposed and illustrated across two real problem-solving tasks. We further present a novel psychometric process model, the sequential response model with polytomous effectiveness indicators (SRM-PEI), which is tailored to encompass a broader range of problem-solving tasks. Monte Carlo simulations indicated that SRM-PEI performed well in the estimation of latent ability and transition tendency parameters across different conditions. Empirical studies conducted on two real tasks supported the better fit of SRM-PEI over previous models such as SRM and SRMM, providing rational and interpretable estimates of latent abilities and transition tendencies through effectiveness indicators. The paper concludes by outlining potential avenues for the further application and enhancement of polytomous effectiveness indicators and SRM-PEI.

Type
Theory & Methods
Copyright
Copyright © 2024 The Author(s), under exclusive licence to The Psychometric Society

Problem-solving ability is often considered one of the most difficult aspects of human cognition (Newell and Simon, 1972) and a crucial skill for the 21st century (Griffin and Care, 2014; OECD, 2018). Computer-based interactive assessments are increasingly favored in large-scale international survey programs. For example, the Organization for Economic Cooperation and Development’s (OECD) Programme for International Student Assessment (PISA) introduced tests for computer-based problem-solving and human-computer interactive collaborative problem-solving in 2012 and 2015, respectively (OECD, 2014, 2016). The Assessment and Teaching of 21st Century Skills (ATC21S) initiative pioneered the interpersonal interaction testing task, which can also assess cooperative problem-solving skills (Griffin and Care, 2014). Computer-based interactive tests, grounded in realistic problem-solving scenarios, require respondents to engage with the scenarios and make multistep decisions towards solutions. Every action taken in addressing a problem is recorded as process data by the computer platform. These action sequences provide valuable insights into the cognitive and response mechanisms of respondents, extending beyond mere outcomes and grades (Bergner and von Davier, 2019). They can be analyzed to extract sequence-based features for interpreting the problem-solving process (e.g., He and von Davier, 2015, 2016; Tang et al., 2020) and are key in developing measurement models for estimating latent problem-solving abilities (e.g., Chen, 2020; Han et al., 2022; LaMar, 2018; Shu et al., 2017; Xiao and Liu, 2023).
Measurement models estimating problem-solving abilities encompass both traditional psychometric models (Liu et al., 2018; Yuan et al., 2019; Han and Wilson, 2022) and stochastic process modeling (Arieli-Attali et al., 2019; Xiao et al., 2021). Merging the strengths of these two approaches, psychometric models incorporating stochastic process properties have also emerged (Shu et al., 2017; LaMar, 2018; Chen, 2020; Han et al., 2022; Xiao and Liu, 2023; Fu et al., 2023; Tang, 2023). These models, considering the sequential dependency of actions, view action sequences as stochastic processes with first-order Markov properties and model the conditional probabilities of respondents’ choices under each problem state (Shu et al., 2017).

These process models are specifically designed for well-defined tasks, often utilizing the Finite State Automata (FSA) framework, a prevalent structure for interactive problem-solving tasks. In FSA tasks, the system is characterized by a finite number of states, a defined set of allowable actions, and a transition function that dictates the next state based on the action taken in the current state (Buchner and Funke, 1993). Respondents are tasked with moving from an initial state to a target state to resolve the problem (Anderson et al., 2007). Effective performance is achieved by determining and following the optimal path from the initial to the target state. Additionally, the occurrence of unnecessary steps is indicative of inefficiency during the knowledge application stage of the problem-solving process (Buchner and Funke, 1993; Funke, 2001). To assess problem-solving ability, it is crucial to assess the effectiveness of each action that leads to a transition, considering the nature of FSA tasks. Then the effectiveness is integrated into the model as pre-defined parameters, subsequently facilitating the estimation of latent problem-solving abilities.

The concept and assessment of effectiveness originate in reinforcement learning, but due to the complexity of algorithmic evaluation, they have since evolved into manually assessed dichotomous indicators. LaMar (2018) used the action-value function from the reinforcement learning paradigm to evaluate the effectiveness of actions and established a measurement model using Markov Decision Processes. The action-value function calculates the expected weighted sum of future rewards for each action in a given problem state and is solved using dynamic programming algorithms. However, LaMar’s approach to assessing action value is intricate, thus limiting its practicality in psychometrics. Chen (2020) determined action effectiveness based on whether an action, leading to a state-to-state transition, aligns with the optimal path. This assessment, essentially evaluating the effectiveness of the transition, utilizes a dichotomous indicator with values of 0 and 1. For the PISA 2012 Ticket task (OECD, 2014), which features a single optimal path from the initial to the target state, Chen manually evaluated the effectiveness of each transition. Transitions that are either on the optimal path or lead back to the optimal path from an incorrect one are classified as correct transitions, with an effectiveness value of 1. Conversely, transitions that are not on the optimal path, indicating they are on an incorrect path, are classified as incorrect transitions, with an effectiveness value of 0. Essentially, effectiveness is reduced to dichotomous correctness. Utilizing this dichotomous effectiveness, Chen applied the Nominal Response Model (NRM; Bock, 1972) in conjunction with a task difficulty parameter to fit action sequences, specifically employing the Continuous-Time Dynamic Choice Model (CTDC).
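For orientation, the action-value function described above is commonly written in the standard reinforcement-learning form below. The symbols $\pi$ (policy), $\gamma$ (discount factor), $r$ (reward), and $P$ (transition probability) follow textbook convention and are assumptions here; they are not taken from LaMar (2018), whose exact formulation may differ.

```latex
% Expected discounted sum of future rewards when taking action a in state s
% and following policy \pi afterwards (textbook notation, assumed here).
Q^{\pi}(s,a)=\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\,r_{t+1}\;\middle|\;s_{0}=s,\;a_{0}=a\right]

% Bellman optimality recursion, the fixed point that dynamic programming solves.
Q^{*}(s,a)=\sum_{s'}P(s'\mid s,a)\left[r(s,a,s')+\gamma\max_{a'}Q^{*}(s',a')\right]
```

Evaluating this recursion requires specifying rewards, a discount factor, and transition probabilities for every state-action pair, which gives a sense of why the approach is described as intricate.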

Newer models modify the values of dichotomous effectiveness and the difficulty parameter under the NRM framework. Han et al. (2022) proposed the Sequential Response Model (SRM) and employed values of 1 and $-1$ for correct and incorrect transitions, respectively. In SRM, effectiveness is only multiplied by the latent ability parameter, and a set of transition tendency parameters replaces the single task difficulty parameter. This expansion allows for a more nuanced estimation of transitions. Extending the SRM, Fu et al. (2023) incorporated a log-normal action time model to simultaneously accommodate action times. Xiao and Liu (2023) retained the dichotomous effectiveness indicators of 0 and 1 while altering the task difficulty parameter in CTDC to transition tendency parameters in SRM, forming the State Response Measurement Model (SRMM). Comparative analysis in both simulation studies and empirical research on the Ticket task shows that SRMM outperforms CTDC.
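The verbal description of SRM suggests a multinomial-logit form in which each transition available from the current state receives a score equal to its effectiveness times the latent ability, plus a transition tendency. The sketch below illustrates that reading only; the function name, the parameter names `theta` and `tendency`, and all numeric values are hypothetical, and the exact parameterization in Han et al. (2022) may differ.

```python
import math

def transition_probabilities(theta, effectiveness, tendency):
    """Softmax over the transitions available from one state.

    theta         -- latent ability of the respondent (assumed scalar)
    effectiveness -- dict: next state -> effectiveness code (+1 or -1 in SRM)
    tendency      -- dict: next state -> transition tendency (hypothetical values)
    """
    scores = {s: effectiveness[s] * theta + tendency[s] for s in effectiveness}
    peak = max(scores.values())  # subtract the maximum for numerical stability
    weights = {s: math.exp(v - peak) for s, v in scores.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

# Hypothetical state with one correct and one incorrect transition.
effectiveness = {"forward": 1, "backward": -1}
tendency = {"forward": 0.0, "backward": 0.0}
low = transition_probabilities(-1.0, effectiveness, tendency)
high = transition_probabilities(1.0, effectiveness, tendency)
```

Under this assumed form, a respondent with higher ability places more probability on the correct transition, while the tendency parameters shift the baseline attractiveness of each transition independently of ability.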

However, the use of dichotomous effectiveness indicators often results in an oversimplification of diverse situations and limits the application of measurement models to complex tasks, primarily for three reasons: First, in scenarios with a single path to the target and multiple backward transitions, dichotomous effectiveness fails to distinguish between different types of ineffective transitions. For example, in a state transition diagram (Fig. 1a) with only one path from starting state A to target state D via intermediates B and C, the transition from C back to A (C $\rightarrow$ A) is worse than from C back to B (C $\rightarrow$ B) as it moves further from the target. However, dichotomous effectiveness does not capture the severity difference between these backward transitions. Second, when multiple paths lead to a target or there are multiple targets, calibrating dichotomous effectiveness becomes challenging since a transition can move closer to one target while simultaneously moving away from the other. For instance, in a transition diagram with two target states C and E (Fig. 1b), the optimal path is A $\rightarrow$ B $\rightarrow$ C. According to the standard of dichotomous effectiveness, the transition D $\rightarrow$ B is considered correct because it leads back to the shortest path from a non-optimal path. However, the transition D $\rightarrow$ E is more effective as it allows the task to be completed in fewer steps, illustrating a limitation in the dichotomous approach where it fails to account for the effectiveness of completing the task. Third, with a multitude of states, transitions, and optimal paths, the complexities mentioned above may coexist, complicating the effectiveness assessment of various transitions. Furthermore, the manual evaluation process becomes exceedingly labor-intensive and time-consuming, making it impractical for complex tasks. Consequently, measurement models dependent on dichotomous effectiveness indicators face significant challenges in addressing complex scenarios, highlighting the necessity for more sophisticated evaluation approaches.
This situation calls for an urgent shift towards polytomous effectiveness indicators and automated evaluation methods. Such advancements are crucial for accurately differentiating and adapting to a wide range of transitions and for broadening the scope of measurement models, making them suitable for more complex FSA tasks.

Figure 1 Diagram of two scenarios with multiple backward transitions and target states.

In this study, we introduce a novel method for assessing the effectiveness of various state transitions, as well as a new measurement model that incorporates polytomous effectiveness indicators and is tailored for complex tasks with multiple optimal paths. Specifically, in Sect. 1, we propose a universal method for gauging the effectiveness of states and state-to-state transitions that is capable of handling complex FSA tasks. We then exemplify the derivation of polytomous effectiveness evaluation outcomes through two real FSA tasks. In Sect. 2, we introduce a new measurement model, termed the Sequential Response Model with Polytomous Effectiveness Indicators (SRM-PEI), detailing its specification and parameter estimation methodology. In Sect. 3, we conduct a simulation study to examine the accuracy of SRM-PEI estimates under various conditions within simulated tasks. In Sect. 5, we demonstrate the applicability of SRM-PEI and compare it with SRM and SRMM in two problem-solving tasks. The article concludes with a discussion in Sect. 6.

1. Effectiveness Indicators of the States and Transitions in FSA Tasks

1.1. New Definitions of Effectiveness of States and State-to-State Transitions

Theoretically, problem-solving is a process of navigating towards the target through a series of state-to-state transitions (Newell and Simon, 1972; Mayer and Wittrock, 2006). The criterion for dichotomous effectiveness is determined by whether the state after the transition is closer to the goal than the state before the transition. When there is a single target and an optimal path, the criterion reduces to whether it is consistent with the optimal path. This criterion has two limitations: first, it does not account for scenarios with multiple targets and multiple shortest paths; second, it merely assesses whether the distance to the target state is reduced without considering the extent of the change in distance. Our proposed concept of transition effectiveness quantifies the change in distance to the target before and after each transition in situations with multiple targets and paths. A transition is considered effective if it reduces the distance and inefficient if it increases the distance, aligning with the principles of evaluating problem-solving abilities in FSA tasks (Buchner and Funke, 1993; Funke, 2001).
To address these complexities, we propose two types of effectiveness indicators suitable for complex FSA tasks: (1) the effectiveness indicators $d_{s}$ of the state s, (2) the effectiveness indicators $\Delta d_{s\rightarrow s'}$ of the transition $s\rightarrow s'$.

First, we define the distance between any problem state s and the target state $s_{target}$, denoted as $d_{s}$. The distance $d_{s}$ is the minimum number of transitions needed to reach the target state $s_{target}$ from the state s. A smaller $d_{s}$ means that state s is closer to the target state, so reaching a state with a smaller $d_{s}$ is more effective for solving the problem. In complex FSA tasks, there can be k target states that can be reached from state s, denoted as $s_{target}^{(1)}$, $s_{target}^{(2)}$, through $s_{target}^{(k)}$. The distances to these target states are correspondingly $d_{s}^{(1)}$, $d_{s}^{(2)}$, through $d_{s}^{(k)}$. Given the understanding of state effectiveness as the theoretical minimum distance from a state to any target state, the effectiveness of the state s is calculated as $d_{s}=\min\left(d_{s}^{(1)},d_{s}^{(2)},\ldots,d_{s}^{(k)}\right)$.

In Fig. 1a, the effectiveness of states A, B, C, and D is determined by their respective shortest distances to the target state D, which are 3, 2, 1, and 0, respectively. In Fig. 1b, where the target states are C and E, both have an effectiveness of 0. The shortest distances from B to targets C and E are 1 and 2, respectively. Therefore, the effectiveness of state B is $d_{B}=\min\left(d_{B}^{C},d_{B}^{E}\right)=\min\left(1,2\right)=1$. Similarly, the effectiveness of state D is $d_{D}=\min\left(d_{D}^{C},d_{D}^{E}\right)=\min\left(2,1\right)=1$, and the effectiveness of state A is $d_{A}=\min\left(d_{A}^{C},d_{A}^{E}\right)=\min\left(2,3\right)=2$.
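The state effectiveness values above can be reproduced with a breadth-first search that mirrors the definition: compute the distance to each target separately, then take the minimum. The following is a minimal sketch, not the authors' implementation. The function names are hypothetical, and the transition lists for Fig. 1a and Fig. 1b are inferred from the transitions and distances mentioned in the text, since the figure itself is not reproduced here; in particular, the target states of Fig. 1b are assumed to have no outgoing transitions.

```python
from collections import deque

def distance_to(graph, start, target):
    """Fewest transitions from start to target, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, steps = queue.popleft()
        if state == target:
            return steps
        for nxt in graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None

def state_effectiveness(graph, targets):
    """d_s = minimum over reachable targets of the per-target distance."""
    states = set(graph) | {n for outs in graph.values() for n in outs}
    d = {}
    for s in states:
        per_target = [distance_to(graph, s, t) for t in targets]
        reachable = [x for x in per_target if x is not None]
        d[s] = min(reachable) if reachable else None
    return d

# Assumed transition lists (state -> directly reachable states).
fig_1a = {"A": ["B"], "B": ["C"], "C": ["D", "B", "A"], "D": []}
fig_1b = {"A": ["B"], "B": ["A", "C", "D"], "C": [], "D": ["B", "E"], "E": []}

d_1a = state_effectiveness(fig_1a, ["D"])
d_1b = state_effectiveness(fig_1b, ["C", "E"])
```

With these assumed diagrams, `d_1a` and `d_1b` match the values stated in the text (3, 2, 1, 0 for Fig. 1a; 2, 1, 0, 1, 0 for states A through E in Fig. 1b).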

Second, to get the effectiveness of a transition $s\rightarrow s'$, we calculate two distances: from state $s$ to $s_{target}$ (i.e., $d_{s}$) and from $s'$ to $s_{target}$ (i.e., $d_{s'}$). Then, we compute the difference between $d_{s}$ and $d_{s'}$, denoted as $\Delta d_{s\rightarrow s'}=d_{s}-d_{s'}$. The difference $\Delta d_{s\rightarrow s'}$ is defined as the effectiveness indicator of the transition $s\rightarrow s'$; its value equals the reduction in the shortest distance to the target produced by the transition. A value of $\Delta d_{s\rightarrow s'}$ greater than zero indicates that the respondent is closer to the target after the transition $s\rightarrow s'$, while a value less than zero implies that the respondent is further from the target. The absolute value of $\Delta d_{s\rightarrow s'}$ indicates the number of steps closer to or farther away from the target. It is noteworthy that the two types of effectiveness indicators are oriented in opposite directions. A state with a higher value of $d_{s}$ is less effective, while a transition with a higher value of $\Delta d_{s\rightarrow s'}$ is more effective.

In Fig. 1a, there are three transitions with an effectiveness of 1 that move closer to the target ($\Delta d_{A\rightarrow B}=\Delta d_{B\rightarrow C}=\Delta d_{C\rightarrow D}=1$). Additionally, there are two transitions that move away from the target with unequal effectiveness ($\Delta d_{C\rightarrow B}=d_{C}-d_{B}=-1$, $\Delta d_{C\rightarrow A}=d_{C}-d_{A}=-2$). The transition C $\rightarrow$ A, which moves further back, has lower effectiveness. Polytomous effectiveness differentiates between various types of backward movement. In Fig. 1b, the effectiveness of the three transitions that move closer to the target also equals 1 ($\Delta d_{A\rightarrow B}=\Delta d_{B\rightarrow C}=\Delta d_{D\rightarrow E}=1$). Notably, $\Delta d_{B\rightarrow D}=\Delta d_{D\rightarrow B}=0$, indicating that the distance to the target remains unchanged after these transitions. This suggests that D $\rightarrow$ B, compared to the more effective D $\rightarrow$ E, is a less efficient transition and not always the optimal choice for returning to the shortest path. Since $\Delta d_{B\rightarrow A}=-1$, the transition B $\rightarrow$ A is a worse option than B $\rightarrow$ D. The polytomous effectiveness provides a more nuanced evaluation than the dichotomous effectiveness based solely on the shortest path, better aligning with the task design.

This section presents a general framework that can automatically evaluate the polytomous effectiveness indicators of states and transitions in FSA tasks. The process of evaluating the effectiveness can be summarized in three steps: (1) Define the state space. This involves finding all the target states. During this step, states may be categorized and simplified. Simultaneously, all state transitions are defined. (2) Calculate the effectiveness $d_{s}$ for all states. The shortest path can be identified using transition diagrams or search algorithms. (3) Calculate the effectiveness $\Delta d_{s\rightarrow s'}$ for all transitions. This process requires that each state within the task can reach a target state through a series of transitions. If this requirement is not met for some states or transitions, additional values may need to be assigned to their effectiveness. In the two subsequent sections, we demonstrate the process of evaluating effectiveness for two real tasks, one with a single optimal path and the other with multiple optimal paths.
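The three steps can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the transition list is reconstructed from the description of Fig. 1a given above, a backward breadth-first search stands in for the "search algorithm" of step (2), and the function names are hypothetical.

```python
from collections import deque

def state_effectiveness(transitions, targets):
    """Step 2: shortest distance d_s from every state to the nearest target."""
    # Reverse the graph so one multi-source BFS can start from the targets.
    reverse = {}
    for s, s_next in transitions:
        reverse.setdefault(s_next, []).append(s)
    d = {t: 0 for t in targets}
    queue = deque(targets)
    while queue:
        current = queue.popleft()
        for prev in reverse.get(current, []):
            if prev not in d:
                d[prev] = d[current] + 1
                queue.append(prev)
    return d

def transition_effectiveness(transitions, d):
    """Step 3: Delta d_{s -> s'} = d_s - d_{s'} for every transition."""
    return {(s, s_next): d[s] - d[s_next] for s, s_next in transitions}

# Step 1: state space and transitions (assumed reading of Fig. 1a).
transitions = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "B"), ("C", "A")]
targets = ["D"]

d = state_effectiveness(transitions, targets)
delta = transition_effectiveness(transitions, d)
print(d)      # d == {'D': 0, 'C': 1, 'B': 2, 'A': 3}
print(delta)  # forward moves score 1, C->B scores -1, C->A scores -2
```

The search runs on the reversed graph because $d_{s}$ is a distance to the target, not from it; in tasks where every action is reversible the two coincide.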

1.2. Example of Evaluating the Effectiveness Indicators for a FSA Task with a Single Optimal Path

In this section, we demonstrate the calculation of effectiveness metrics using the Ticket task from PISA 2012, which is the most commonly used problem-solving task in existing models (Chen, 2020; Han et al., 2022; Xiao and Liu, 2023; Fu et al., 2023). Taking sub-task CP038Q02 as an example, this task requires students to purchase a full-fare ticket for country trains, valid for two trips. Students have to sequentially select the correct options on the simulated ticketing interface (“COUNTRY TRAINS” $\rightarrow$ “FULL FARE” $\rightarrow$ “INDIVIDUAL” $\rightarrow$ “2 Trips” $\rightarrow$ “BUY”). Before selecting “BUY”, the student has the option to hit “CANCEL” to restart the task from the beginning. This task was scored in a binary fashion, depending on whether the student successfully purchased the correct ticket.

When evaluating dichotomous effectiveness in the Ticket task, previous studies have already completed much of the fundamental work for evaluating the polytomous effectiveness we propose. For the first step, Chen (2020) defined all states and transitions. Building on this, Han et al. (2022) merged states with similar error types, effectively reducing the number and complexity of states and transitions. More importantly, they illustrated all paths leading to the target state through a transition diagram, which is crucial for identifying the shortest path from each state to the target in the second step. However, some adjustments to the state categorization and the transition diagram are still required for assessing the polytomous effectiveness. It is important to note that the Ticket task has only one target state, but also includes a non-target end state. Once the incorrect end state is reached, the problem-solving process terminates prematurely, making it impossible to reach the target state. In this case, we cannot get the shortest distance between the incorrect end state and the target state. From the task design, the incorrect end state is further from the target state than all other states, so we can set its effectiveness to be lower than any other state. Eventually, we distinguish between the correct target state and the incorrect end state to depict a new transition diagram (see Fig. 2) and then calculate the effectiveness of all transitions (see Table 1).

Figure 2 A new transition diagram for the CP038Q02 subtask of the Ticket task in the PISA 2012. Note: The solid arrows represent transitions that move closer to the target state, while the dotted arrows represent transitions that do not move closer to the target state.

Table 1 The effectiveness of all states and transitions in the CP038Q02 task.

The effectiveness of the states and transitions is in parentheses. The bold transitions are considered correct, while the non-bolded transitions are considered incorrect, according to Han et al. (2022).

To proceed, we complete the final two steps by sequentially calculating the two types of effectiveness indicators. Step (2): The state effectiveness $d_{s}$ distinctly and meaningfully differentiates between states A, B, C, D, E, and K on the optimal path. States G, H, I, and J on an incorrect path indicate a shortest distance of 6 transitions from the target. The other incorrect state F, as a branch on the optimal path, is only 2 steps away from the target. Since the maximum effectiveness value among states A to J is 6, we set the effectiveness of the incorrect end state L to 7. Step (3): Unlike the dichotomous effectiveness, which can only indicate correct or incorrect, our effectiveness $\Delta d_{s\rightarrow s'}$ can be polytomously scored in a task with a single optimal path and clearly shows the change in the shortest distance to the target state after each transition. Compared with the evaluation of Han et al. (2022), the effectiveness of correct transitions remains at a value of 1, while incorrect transitions have effectiveness values ranging from 0 to $-5$, with lower values indicating that the target is further away after the transition. These two types of effectiveness can be aggregated into descriptive indicators, not only providing a deeper description and evaluation of the problem-solving process but also serving as validation for estimating latent abilities in measurement models. The application of aggregated indicators will be discussed in detail in Sect. 5.
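The treatment of the incorrect end state can be illustrated with a small sketch. The graph below is a hypothetical toy task, not the Ticket task itself (its transition diagram is given in Fig. 2); it only shows the rule of assigning a state that can never reach the target an effectiveness one step worse than the largest finite distance, in the same way that L receives 7 when the maximum among A to J is 6.

```python
from collections import deque

def effectiveness_with_dead_ends(states, transitions, targets):
    """d_s by backward BFS; states that cannot reach a target get max(d) + 1."""
    reverse = {s: [] for s in states}
    for s, s_next in transitions:
        reverse[s_next].append(s)
    d = {t: 0 for t in targets}
    queue = deque(targets)
    while queue:
        current = queue.popleft()
        for prev in reverse[current]:
            if prev not in d:
                d[prev] = d[current] + 1
                queue.append(prev)
    penalty = max(d.values()) + 1      # one step worse than the worst reachable state
    for s in states:
        d.setdefault(s, penalty)
    return d

# Hypothetical toy task: S0 -> S1 -> S2 -> T is the optimal path.
states = ["S0", "S1", "S2", "T", "X"]
transitions = [("S0", "S1"), ("S1", "S2"), ("S2", "T"),
               ("S2", "S0"),   # restart, comparable to pressing CANCEL
               ("S1", "X")]    # premature, incorrect end state
d = effectiveness_with_dead_ends(states, transitions, targets=["T"])
delta = {(s, t): d[s] - d[t] for s, t in transitions}
print(d["X"], delta[("S1", "X")], delta[("S2", "S0")])  # 4 -2 -2
```

Any value larger than the maximum finite distance would preserve the ordering of states; the choice of "maximum plus one" simply keeps the scale compact.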

1.3. Example of Evaluating the Effectiveness Indicators for a FSA Task with Multiple Optimal Paths

In this section, we demonstrate the step-by-step evaluation of effectiveness indicators using a complex FSA task, the Balance Beam task, a collaborative problem-solving task from the Assessment and Teaching of 21st Century Skills (ATC21S) project (Griffin and Care, 2014). In the Chinese version of the Balance Beam task developed by Yuan et al. (2019), two students are required to balance a beam that has four notches on each side for placing four weights (50 g, 100 g, 300 g, and 500 g). Only student A possesses all four weights at the beginning, while student B has none (see Fig. 5 in the Appendix). The testing system permits weight transfers between the students. The Balance Beam task exemplifies a complex FSA task with a multitude of intermediate states, intricate transition connections, and notably, multiple target states. There are multiple optimal paths to a target, because the order of hanging the same set of weights does not affect the final balance. The existence of multiple targets further expands the number of optimal paths. These paths are interconnected, meaning that a change in the target state during problem-solving can render a previously optimal transition suboptimal from a broader perspective. This complexity in the task structure necessitates a nuanced approach to evaluating the effectiveness of transitions and states within the problem-solving process. The procedure for automatically evaluating the effectiveness of all states and transitions in the Balance Beam task is as follows:

Step (1): Define all problem states and state transitions. Given the collaborative nature of the task, we view both sides of the balance beam as a whole, defining the state space based on the positions of the four weights. Each weight can occupy one of ten possible positions: eight on the beam and two off-beam, i.e., held by a student (see Appendix A for details). Given four weights, there are potentially $10^{4}=10{,}000$ distinct states. Consistent with the principle used when defining the state space, the target state is also defined at the group level, which means that the states in which the beam is balanced are the target states shared by both individuals. Whether utilizing two, three, or four weights, there exists a wide array of combinations for hanging weights to attain balance. Depth-first search (Cormen et al., 2022) locates all the target states in the Balance Beam task within one second. When using two, three, and four weights, there are 24, 68, and 40 target states, respectively.
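The size of the state space and the number of target states can be checked with a short enumeration. The sketch below rests on assumptions that are not stated in this section: notches are taken to be equally spaced at distances 1 to 4 from the pivot, several weights may share a notch, and a state counts as a target when at least two weights hang on the beam with zero net torque. The positional encoding (signed integers for notches, "A" and "B" for the students' hands) is hypothetical, and exhaustive enumeration replaces the depth-first search mentioned above.

```python
from itertools import product

WEIGHTS = (50, 100, 300, 500)              # grams, as in the task
NOTCHES = (-4, -3, -2, -1, 1, 2, 3, 4)     # assumed: negative = left arm, positive = right arm
POSITIONS = NOTCHES + ("A", "B")           # two off-beam positions: held by student A or B

def is_target(state):
    """Balanced beam: zero net torque with at least two weights hanging."""
    on_beam = [(w, p) for w, p in zip(WEIGHTS, state) if p in NOTCHES]
    torque = sum(w * p for w, p in on_beam)
    return len(on_beam) >= 2 and torque == 0

states = list(product(POSITIONS, repeat=len(WEIGHTS)))
targets = [s for s in states if is_target(s)]

# Group the target states by the number of weights hanging on the beam.
counts = {}
for s in targets:
    k = sum(p in NOTCHES for p in s)
    counts[k] = counts.get(k, 0) + 1

print(len(states), sorted(counts.items()))  # 10000 [(2, 24), (3, 68), (4, 40)]
```

Under these assumptions the enumeration reproduces the reported 24, 68, and 40 target states, which suggests that the off-beam holder of each unused weight is part of the state definition.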

A transition between states occurs when the position of any of the four weights changes. The testing platform accommodates four kinds of actions capable of inducing a position change and a state-to-state transition: hanging weights, removing weights, transferring weights, and shifting notches on the same side. Fig. 6 illustrates the four types of transitions that can occur when a single weight is moved among the ten possible positions. Note that Fig. 6 is not the state transition diagram showing all possible paths in this task. Under these conditions, the task permits 168,000 possible state-to-state transitions. For any intermediate state, all target states are accessible, and the two students have the flexibility to change targets at any time. Thus, it is impractical to represent all states and transitions through a diagram or a table, let alone find all the optimal paths from a given state to target states.
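The count of 168,000 transitions can be reproduced under one plausible reading of the four action types. The move rules below are an assumption (Fig. 6 is not reproduced here): each student owns one arm of the beam, a held weight can be hung on any of the four notches of its holder's arm or transferred to the other student, and a hanging weight can be shifted to another notch on the same arm or removed by that arm's owner.

```python
from itertools import product

# Assumed ownership of the two arms; "A" and "B" also label the off-beam positions.
SIDE = {"A": (-4, -3, -2, -1), "B": (1, 2, 3, 4)}
POSITIONS = SIDE["A"] + SIDE["B"] + ("A", "B")

def moves(position):
    """Positions a single weight can reach with one action (assumed rules)."""
    if position in SIDE:                                  # weight is held by a student
        other = "B" if position == "A" else "A"
        return list(SIDE[position]) + [other]             # hang (4 options) or transfer
    owner = "A" if position < 0 else "B"
    shifts = [n for n in SIDE[owner] if n != position]    # shift on the same side
    return shifts + [owner]                               # or remove from the beam

def neighbours(state):
    """All states reachable from `state` by moving exactly one weight."""
    for i, pos in enumerate(state):
        for new_pos in moves(pos):
            yield state[:i] + (new_pos,) + state[i + 1:]

states = list(product(POSITIONS, repeat=4))
n_transitions = sum(1 for s in states for _ in neighbours(s))
print(n_transitions)  # 168000
```

The arithmetic behind the total is that a held weight has 5 possible moves and a hanging weight has 4, so each weight contributes $2\times 5+8\times 4=42$ directed moves, and $4\times 42\times 10^{3}=168{,}000$.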

Step (2): Evaluate the effectiveness of all states. Within the context of the Balance Beam task, the effectiveness indicator $d_{s}$ of the state $s$ can be interpreted as the theoretical shortest distance between the current state and any group-level target state. The state effectiveness defined under the group-level target does not distinguish to which student the remaining transitions belong. Obviously, the effectiveness of all target states is set as $d_{target}=0$. Subsequently, the effectiveness of all non-target states is evaluated. Since any intermediate state can reach all target states before exiting the task, we developed a rule-based algorithm that incorporates positional encoding and edit distance. This approach can be programmed to compute the shortest distance between any intermediate and target states swiftly and precisely, avoiding the labor-intensive and potentially error-prone process of manual computation. Further details of the algorithm can be found in Appendix A.

Step (3): Calculate the effectiveness of all transitions. With the effectiveness of all states determined, the effectiveness of all transitions can be obtained simply by $\Delta d_{s\rightarrow s'}=d_{s}-d_{s'}$. This transition effectiveness defined at the group level represents the impact of an individual’s action on accomplishing the common goal shared by the two people.
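Steps (2) and (3) can also be sketched with a generic graph search. The code below is not the rule-based algorithm of Appendix A: it substitutes a multi-source breadth-first search, and it relies on an assumed positional encoding and assumed move rules (each student owns one arm; a weight can be hung, removed, transferred, or shifted along the same arm; a target needs at least two hanging weights and zero net torque). Whether the largest distance it finds matches the values reported for the task depends on how closely these assumptions mirror the testing platform.

```python
from collections import deque
from itertools import product

WEIGHTS = (50, 100, 300, 500)
SIDE = {"A": (-4, -3, -2, -1), "B": (1, 2, 3, 4)}   # assumed arm ownership
POSITIONS = SIDE["A"] + SIDE["B"] + ("A", "B")

def is_target(state):
    beam = [(w, p) for w, p in zip(WEIGHTS, state) if p not in SIDE]
    return len(beam) >= 2 and sum(w * p for w, p in beam) == 0

def neighbours(state):
    for i, pos in enumerate(state):
        if pos in SIDE:                               # held: hang or transfer
            options = SIDE[pos] + (("B",) if pos == "A" else ("A",))
        else:                                         # hanging: shift or remove
            owner = "A" if pos < 0 else "B"
            options = tuple(n for n in SIDE[owner] if n != pos) + (owner,)
        for new_pos in options:
            yield state[:i] + (new_pos,) + state[i + 1:]

# Step 2: d_s for every state, starting the search from all targets at once.
states = list(product(POSITIONS, repeat=4))
d = {s: 0 for s in states if is_target(s)}
queue = deque(d)
while queue:
    current = queue.popleft()
    for nxt in neighbours(current):   # every assumed move is reversible
        if nxt not in d:
            d[nxt] = d[current] + 1
            queue.append(nxt)

# Step 3: Delta d_{s -> s'} = d_s - d_{s'} for every transition.
delta = {(s, t): d[s] - d[t] for s in states for t in neighbours(s)}

print(d[("A", "A", "A", "A")])        # 3: hang one weight, transfer another, hang it
print(sorted(set(delta.values())))    # [-1, 0, 1]
```

Because every assumed action can be undone in one step, neighbouring states can differ in $d_{s}$ by at most 1, which is why only three values of $\Delta d_{s\rightarrow s'}$ occur.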

In the Balance Beam task, the number of original states and transitions is too large, and the role of the original states (such as all four weights being with student A) and actions (for instance, student A passing the 50 g weight to student B) is ambiguous in terms of problem-solving. The two types of effectiveness, $d_{s}$ and $\Delta d_{s\rightarrow s'}$, facilitate a rapid and meaningful categorization of these numerous states and transitions. The polytomous effectiveness indicators not only reduce the number of categories for states and transitions but also enhance the interpretability of further analysis for this task. In the Balance Beam task, all states can be divided into 6 types based on the value of $d_{s}$, from 0 to 5.
Each type of state can be reached from a state of the same type (e.g., 5 $\rightarrow$ 5) or from a state whose type is one step away (e.g., 5 $\rightarrow$ 4), resulting in 16 types of transitions according to the effectiveness of the states before and after the transition $s\rightarrow s'$.
Furthermore, based on the values of transition effectiveness $\Delta d_{s\rightarrow s'}$, all transitions can be classified into 3 categories: advancing towards the target ($\Delta d_{s\rightarrow s'}=1$), staying in the same place ($\Delta d_{s\rightarrow s'}=0$), and moving away from the target ($\Delta d_{s\rightarrow s'}=-1$).

2. Sequential Response Model with Polytomous Effectiveness Indicators (SRM-PEI)

The newly proposed indicators in Section 1 not only allow the description of response characteristics, but also enable the development of measurement models for complex problem-solving tasks. Specifically, we maintain the framework of the measurement model combined with the random process, replacing the dichotomous effectiveness indicator for the transition with the polytomous effectiveness indicator. As an example, we take the Sequential Response Model (SRM; Han et al., 2022), a model for state transitions with a dichotomous effectiveness indicator, to illustrate how to extend a model designed for a single optimal path task to one applicable to a complex task using the new effectiveness indicator. We call it the Sequential Response Model with Polytomous Effectiveness Indicators (SRM-PEI).

2.1. Model Specification

Drawing inspiration from SRM (Han et al., 2022), we focus on the state transitions prompted by a respondent’s actions, viewing these as external manifestations of latent ability. Each state is treated as an item, while each transition originating from this state is considered a choice pertaining to that item. This structure helps us conceptualize and analyze the transitions in process data within an IRT model. The effectiveness indicators of transitions ($\Delta d_{s\rightarrow s'}$) in SRM-PEI provide a more nuanced assessment of how good or bad each transition is.

Assuming that the next state $s'$ only depends on the current state $s$ and the respondent’s latent ability during the problem-solving process, we can treat the response sequence as a discrete-time stochastic process with a conditional Markov property. Given that a state can have multiple transitions in a complex problem-solving task, we employ the effectiveness indicators to ascertain the relative superiority among all transitions. The SRM-PEI can thus be built within the framework of the NRM. The SRM-PEI specifies the conditional probability of respondent $i$ choosing to reach state $s'$ when in problem state $s$ as follows:

$$P(S_{i,l+1}=s' \mid S_{i,l}=s,\theta_{i},\boldsymbol{\lambda},\mathbf{D})=\frac{\exp(\Delta d_{s\rightarrow s'}\cdot\theta_{i}+\lambda_{s\rightarrow s'})}{\sum\nolimits_{x\in M_{s}}\exp(\Delta d_{s\rightarrow x}\cdot\theta_{i}+\lambda_{s\rightarrow x})} \tag{1}$$

where $\theta_{i}$ represents the latent ability of respondent $i$, while $\lambda_{s\rightarrow s'}$ is the tendency parameter for the transition from state $s$ to $s'$ and reflects the easiness of the transition. A larger value of $\lambda_{s\rightarrow s'}$ indicates a higher likelihood of making that transition. $\boldsymbol{\lambda}$ is a vector of tendency parameters for all transitions within the task.
$M_{s}$ represents the set of states reachable in the next step from the current state $s$, and $\Delta d_{s\rightarrow s'}$ is the effectiveness indicator calculated in the previous section, forming a vector $\mathbf{D}$ of effectiveness indicators for all transitions.
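To make Eq. (1) concrete, the sketch below evaluates the transition probabilities for state C of Fig. 1a, whose outgoing transitions have effectiveness 1 (to D), $-1$ (to B), and $-2$ (to A). The tendency values are hypothetical and set to zero so that the effect of $\theta_{i}$ is easy to see; the function name is illustrative.

```python
import math

def transition_probabilities(theta, delta_d, lam):
    """Eq. (1): softmax over the states reachable from the current state.

    delta_d and lam map each reachable next state x in M_s to
    Delta d_{s->x} and lambda_{s->x}, respectively.
    """
    logits = {x: delta_d[x] * theta + lam[x] for x in delta_d}
    top = max(logits.values())                     # subtract the max for numerical stability
    weights = {x: math.exp(v - top) for x, v in logits.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

# State C of Fig. 1a; tendencies are hypothetical and sum to zero.
delta_d = {"D": 1, "B": -1, "A": -2}
lam = {"D": 0.0, "B": 0.0, "A": 0.0}

for theta in (-1.0, 0.0, 1.0):
    probs = transition_probabilities(theta, delta_d, lam)
    print(theta, {x: round(p, 3) for x, p in probs.items()})
```

With zero tendencies, a respondent with $\theta_{i}=0$ chooses among the three transitions uniformly, while higher ability shifts probability towards the transition with the largest effectiveness and away from the one that moves furthest back.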

2.2. Model Estimation

For flexibility and convenience in implementation, Bayesian estimation is adopted to estimate the parameters of latent ability and transition tendency. Let $\boldsymbol{\theta}$ and $\mathbf{S}$ denote the collection of latent variables for $n$ respondents and their response sequences. The joint posterior distribution of $\boldsymbol{\theta}$ and $\boldsymbol{\lambda}$ given $\mathbf{S}$ and $\mathbf{D}$ can be expressed as follows:

$$p(\boldsymbol{\theta},\boldsymbol{\lambda}\mid\mathbf{S},\mathbf{D})\propto p(\mathbf{S}\mid\boldsymbol{\theta},\boldsymbol{\lambda},\mathbf{D})\,p(\boldsymbol{\theta},\boldsymbol{\lambda})=\prod\limits_{i=1}^{n}\prod\limits_{l=1}^{L_{i}-1}p(S_{i,l+1}=s' \mid S_{i,l}=s,\theta_{i},\boldsymbol{\lambda},\mathbf{D})\,p(\theta_{i})\,p(\boldsymbol{\lambda}) \tag{2}$$

where $p(\theta_{i})$ and $p(\boldsymbol{\lambda})$ are the prior distributions of the latent ability and the vector of transition tendency parameters, respectively, and are assumed to be independent of each other. To ensure model identification, the sum of all tendency parameters of transitions from the same state is constrained to be zero. To simplify Bayesian estimation, the prior distributions for the latent abilities are assumed to be standard normal. The Markov chain Monte Carlo (MCMC) estimation is implemented using the Metropolis-Hastings-within-Gibbs sampling approach to empirically approximate the joint posterior distribution (Patz and Junker, 1999a, 1999b). The detailed sampling procedures can be found in the Appendix of Han et al. (2022).
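The quantity that a Metropolis-Hastings step would evaluate can be sketched as an unnormalised log posterior. The code below is an illustration under stated simplifications, not the sampler of Han et al. (2022): it uses the Fig. 1a graph, applies each respondent's ability prior once, places independent normal priors on all tendencies, and imposes the sum-to-zero constraint by centring. The function names, sequences, and parameter values are hypothetical.

```python
import math

def log_choice_prob(theta, s, s_next, delta, lam):
    """Log of Eq. (1) for one observed transition s -> s_next."""
    logits = {x: delta[(s0, x)] * theta + lam[(s0, x)] for (s0, x) in delta if s0 == s}
    top = max(logits.values())
    log_norm = top + math.log(sum(math.exp(v - top) for v in logits.values()))
    return logits[s_next] - log_norm

def centre(lam):
    """Identification constraint: tendencies leaving each state sum to zero."""
    by_state = {}
    for (s, _), v in lam.items():
        by_state.setdefault(s, []).append(v)
    return {(s, x): v - sum(by_state[s]) / len(by_state[s]) for (s, x), v in lam.items()}

def log_posterior(thetas, lam, sequences, delta, lam_sd=1.0):
    """Unnormalised log of Eq. (2) with N(0, 1) ability priors (simplified priors on lam)."""
    total = sum(-0.5 * (v / lam_sd) ** 2 for v in lam.values())
    for theta, seq in zip(thetas, sequences):
        total += -0.5 * theta ** 2                    # log N(0, 1) up to a constant
        for s, s_next in zip(seq, seq[1:]):
            total += log_choice_prob(theta, s, s_next, delta, lam)
    return total

# Effectiveness of the Fig. 1a transitions and hypothetical raw tendencies.
delta = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1, ("C", "B"): -1, ("C", "A"): -2}
raw = {("A", "B"): 0.3, ("B", "C"): -0.2, ("C", "D"): 0.5, ("C", "B"): 0.1, ("C", "A"): -0.9}
lam = centre(raw)

sequences = [["A", "B", "C", "D"], ["A", "B", "C", "B", "C", "D"]]
thetas = [0.8, -0.4]
print(log_posterior(thetas, lam, sequences, delta))
```

Centring leaves every transition probability unchanged, because adding a constant to all tendencies leaving a state cancels in Eq. (1); this is why the constraint is needed for identification.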

It is crucial to assess the convergence and model fit in Bayesian estimation. We used two methods to monitor MCMC convergence: (1) the potential scale reduction factor (PSRF; Gelman and Rubin, 1992), where PSRF values approximating 1 suggest convergence; (2) the Monte Carlo error (MCE; Koehler et al., 2009), which measures the standard deviation of the sample means across chains. A smaller MCE indicates less variability between different chains, hence a higher likelihood of convergence. Following the assurance of convergence, we employ posterior predictive checking (PPC) using the test statistics approach to evaluate the model-data fit (Gelman et al., 1996, 2014; Guttman, 1967; Rubin, 1984). Specifically, we visually compare the observed frequencies of transitions to those obtained from the posterior predictive data. Additionally, we compute the posterior predictive p-value (ppp) based on the chi-square test of the two distributions. A ppp value close to 0.5 signifies a good model fit (Gelman et al., 2014).
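The two convergence diagnostics can be sketched for a single parameter as follows. The PSRF follows the standard Gelman and Rubin construction for chains of equal length, and the MCE function follows the verbal description above (the standard deviation of the chain means); the simulated draws and function names are illustrative assumptions, not output from the SRM-PEI sampler.

```python
import random
from statistics import fmean, stdev, variance

def psrf(chains):
    """Potential scale reduction factor for equally long chains of one parameter."""
    n = len(chains[0])
    within = fmean([variance(c) for c in chains])          # mean within-chain variance W
    between_over_n = variance([fmean(c) for c in chains])  # variance of chain means, B / n
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5

def mce_across_chains(chains):
    """Standard deviation of the chain means."""
    return stdev([fmean(c) for c in chains])

rng = random.Random(0)
# Three well-mixed chains drawn from the same distribution (illustrative only).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(3)]
# Three chains stuck around different values.
stuck = [[rng.gauss(mu, 1.0) for _ in range(5000)] for mu in (0.0, 3.0, 6.0)]

print(psrf(mixed), mce_across_chains(mixed))   # PSRF close to 1, small MCE
print(psrf(stuck), mce_across_chains(stuck))   # PSRF well above 1.1, large MCE
```

A PSRF near 1 indicates that pooling the chains adds little variance beyond what each chain shows on its own, which is the sense in which values between 1 and 1.1 are read as evidence of convergence.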

3. Simulation Study

In this section, we performed a Monte Carlo simulation study to assess the precision of the SRM-PEI in estimating latent abilities and transition tendencies in a simulated problem-solving task with multiple optimal paths. This study is designed to explore the impacts of differing prior distributions, sample sizes, and lengths of response sequences on the parameter recovery performance of the SRM-PEI.

4. Simulation Design

In this simulation study, three factors were examined for their potential impact on the performance of SRM-PEI: sample sizes, sequence lengths, and prior distributions for transition tendency parameters. This resulted in a total of 16 conditions: 4 (sample sizes: 200, 500, 1000, and 2000) × 2 (sequence lengths: short, long) × 2 (prior distributions: informative, non-informative). Each condition was replicated 100 times. The description of the simulated problem-solving task and the effectiveness of the states and transitions are given in Supplementary Material S1.

The true ability parameters were generated from a standard normal distribution for each replication. The transition tendency parameters can influence the lengths of the observed sequences (Han et al., 2022). Based on the values of polytomous transition effectiveness and the magnitude of transition tendency parameters in the original SRM (Han et al., 2022), we designed two sets of tendency parameters for SRM-PEI that could consistently generate differences in sequence length (see Table S2 in the Supplementary Material). To ensure model identification in SRM-PEI, the constraint $\sum\nolimits_{x\in M_{s}}\lambda_{s\rightarrow x}=0$ was imposed. The priors for transition tendency parameters were only set for transitions with effectiveness $\Delta d_{s\rightarrow s'}<1$, i.e., transitions not lying on the optimal paths.
Tendency parameters of transitions with $\Delta d_{s\rightarrow s'}=1$ were set to the negative of the sum of the tendency parameters for transitions with $\Delta d_{s\rightarrow s'}<1$ starting from the same state $s$. The informative prior was a standard multivariate normal distribution, $\boldsymbol{\lambda}_{\Delta d_{s\rightarrow s'}<1}\sim \mathrm{MVN}(\mathbf{0},\mathbf{I})$, where $\mathbf{I}$ refers to the identity matrix. The non-informative prior only changed the standard deviation to 10, so $\boldsymbol{\lambda}_{\Delta d_{s\rightarrow s'}<1}\sim \mathrm{MVN}(\mathbf{0},\mathbf{I}\cdot 100)$. The prior for the ability parameters was set as a standard normal distribution.
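The data-generating step can be sketched as repeated sampling from Eq. (1) until the target is reached. The example below does not use the simulated task of Supplementary Material S1; it reuses the small graph of Fig. 1a, with hypothetical tendency values that sum to zero within each state and an arbitrary cap on sequence length.

```python
import math
import random

# Effectiveness of the transitions in Fig. 1a, keyed by current state.
DELTA = {"A": {"B": 1}, "B": {"C": 1}, "C": {"D": 1, "B": -1, "A": -2}}
# Hypothetical tendencies; the values leaving each state sum to zero.
LAM = {"A": {"B": 0.0}, "B": {"C": 0.0}, "C": {"D": 0.4, "B": 0.1, "A": -0.5}}

def simulate_sequence(theta, rng, start="A", target="D", max_len=200):
    """Draw one response sequence from the SRM-PEI transition probabilities."""
    seq = [start]
    while seq[-1] != target and len(seq) < max_len:
        s = seq[-1]
        options = list(DELTA[s])
        # Unnormalised weights of Eq. (1); rng.choices normalises them.
        weights = [math.exp(DELTA[s][x] * theta + LAM[s][x]) for x in options]
        seq.append(rng.choices(options, weights=weights)[0])
    return seq

rng = random.Random(2025)
thetas = [rng.gauss(0.0, 1.0) for _ in range(500)]   # true abilities ~ N(0, 1)
sequences = [simulate_sequence(t, rng) for t in thetas]
mean_length = sum(len(seq) for seq in sequences) / len(sequences)
print(mean_length)
```

Raising the tendencies of backward transitions in `LAM` lengthens the simulated sequences, which is the mechanism by which two sets of tendency parameters can produce short and long sequence conditions.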

In MCMC, three chains were run with different initial values. From each chain, 10,000 samples were obtained, and the first 5,000 were discarded as burn-in. The remaining 15,000 samples in total were taken as the posterior distribution for each parameter. The last 500 samples from each of the three chains were used to conduct posterior predictive checks (PPC).

4.1. Results of the Simulation Study

Under all conditions, the MCMC estimates of SRM-PEI converged normally. The PSRF values for all parameters in both task settings were between 1 and 1.1, providing evidence of convergence (Brooks and Gelman, 1998; Gelman and Rubin, 1992). Furthermore, the MCE was 0.024 for ability parameters and 0.004 for tendency parameters, which suggests negligible differences between the means of the sampling chains for each parameter and supports the assertion of convergence. In terms of model-data fit, the ppp value of 0.598 was close to 0.5, indicating a good fit between the model and the data. The empirical values of each transition in the observed data were consistent with the median of the posterior predictive distributions (see Figure S1 in the Supplementary Material).
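As an illustration of the convergence check, the following sketch discards the burn-in and computes the basic Gelman-Rubin potential scale reduction factor for a single parameter. The simulated draws are placeholders, and the function `psrf` is a textbook formulation rather than the software used in the study.

```python
import numpy as np

def psrf(chains):
    """Potential scale reduction factor for one parameter.

    chains: array of shape (n_chains, n_draws), burn-in already removed.
    """
    chains = np.asarray(chains, dtype=float)
    n_chains, n_draws = chains.shape
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()       # W: mean within-chain variance
    between = n_draws * chain_means.var(ddof=1)      # B: between-chain variance
    var_hat = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(var_hat / within))

# Placeholder draws: 3 chains x 10,000 iterations for one parameter.
rng = np.random.default_rng(2025)
draws = rng.normal(size=(3, 10_000))

kept = draws[:, 5_000:]          # drop the first 5,000 iterations as burn-in
print(kept.size)                 # total retained draws across the three chains
print(psrf(kept))                # values below 1.1 are read as convergence
ppc_draws = kept[:, -500:]       # last 500 draws per chain, reserved for PPC
```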

The average sequence lengths of the short and long sequence conditions in this task were 17 and 45, respectively. To evaluate the accuracy of parameter estimation, four metrics were calculated under each condition: BIAS, MAE (mean absolute error), RMSE (root mean squared error) and correlation between the estimated and true values of ability and tendency parameters. The estimation accuracy of ability and tendency parameters is shown in Table 2.
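The four recovery metrics can be written compactly as below. This is a generic sketch for one replication; the function name `recovery_metrics` is hypothetical, and how the study averaged results over replications is not shown here.

```python
import numpy as np

def recovery_metrics(estimated, true):
    """BIAS, MAE, RMSE and Pearson correlation between estimates and true values."""
    estimated = np.asarray(estimated, dtype=float)
    true = np.asarray(true, dtype=float)
    err = estimated - true
    return {
        "bias": float(err.mean()),                  # signed average error
        "mae": float(np.abs(err).mean()),           # mean absolute error
        "rmse": float(np.sqrt((err ** 2).mean())),  # root mean squared error
        "corr": float(np.corrcoef(estimated, true)[0, 1]),
    }

# Toy values, not taken from Table 2.
print(recovery_metrics([1.0, 2.0, 3.0, 5.0], [1.0, 2.0, 3.0, 4.0]))
```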

Table 2 The estimation accuracy of ability and tendency parameters in SRM-PEI in the simulated problem-solving tasks with multiple optimal paths.

$^{\mathrm{a}}$ The constraint leads to zero average bias over all tendency parameters.

In most conditions, the BIAS of latent ability parameters remains relatively small and decreases with larger sample size and longer sequence length, particularly under informative priors (standard normal distribution), where the absolute value of BIAS does not exceed 0.005. Moreover, the correlations between the estimated and true values were relatively high and seldom influenced by the prior, sequence length, and sample size. MAE and RMSE displayed more obvious differences across various factors. Generally speaking, the errors of tendency parameters were lower than those of latent abilities. The precision of the estimation was found to increase with sample size and sequence length, both of which emerged as key factors influencing MAE and RMSE. Specifically, a larger sample size was associated with a more accurate estimation of tendency parameters, as reflected by the reduced MAE and RMSE. Likewise, longer sequences yielded lower MAE and RMSE for both latent abilities and tendency parameters. For complex multi-optimal-path tasks, informative priors were found to reduce estimation errors for both types of parameters, especially under conditions of a small sample size and a short sequence.

The estimation accuracy of each transition parameter was examined (see Table S3 in the Supplementary Material). Most of the tendency parameters exhibited an RMSE lower than average, and the tendency parameters with high accuracy were the important transitions in the problem-solving process. Setting informative priors can reduce the estimation errors of transition tendency parameters with low actual occurrence frequencies, especially in cases with small samples and short sequences. Some transition parameters with low occurrence frequencies might still face estimation errors due to a mismatch between the prior and the data, resulting in parameter estimates pulled towards the overall mean.

5. Empirical Study

In this section, we demonstrate the applicability of SRM-PEI by analyzing empirical data from two tasks, one with a single optimal path (i.e., the Ticket task) and the other with multiple optimal paths (i.e., the Balance Beam task). Specifically, we examine whether SRM-PEI can distinguish response patterns of different abilities and provide rational estimates of transition tendencies. In addition, we conduct model comparisons between SRM-PEI and other process models on both tasks.

5.1. Ticket Task

5.1.1. Data Description and Analysis Process for the Ticket Task

To demonstrate the application of the SRM-PEI in conjunction with effectiveness indicators, we utilized it to analyze log file data from the sub-task CP038Q02 of the TICKET unit in PISA 2012. After excluding data that did not align with the transition diagram, we analyzed sequences from 31,906 students. The lengths of these sequences ranged from 2 to 110, with an average of 6.983 and a median of 6. For comparative purposes, we also implemented the original SRM (Han et al., 2022) and the SRMM (Xiao & Liu, 2023). We estimated transition tendency parameters and latent abilities for all three models. While SRM and SRMM relied on dichotomous effectiveness and Bayesian estimation as outlined in their respective studies, SRM-PEI used the polytomous transition effectiveness derived in Sect. 1. Given that effectiveness in SRM-PEI is akin to discrimination parameters in the NRM, extreme negative values (e.g., $\Delta d_{D \rightarrow L} = -5$) were considered impractical. A pre-experiment showed that directly using the minimum effectiveness value of $-5$ harmed the model-data fit of SRM-PEI and led to unreasonable transition characteristic curves. We therefore scaled the effectiveness indicators to a range of $-1$ to 1 before integrating them into SRM-PEI. For the Bayesian estimation process, we employed three chains with 10,000 sampling iterations each, discarding the initial 5,000 as burn-in. The priors for latent ability and transition tendency parameters were set to standard normal and standard multivariate normal distributions, respectively. The approach for assessing convergence and model-data fit mirrored that used in the simulation study. To compare the models, we used two indices: the Deviance Information Criterion (DIC; Spiegelhalter et al., 1998) and the Pseudo-Bayes Factor (PsBF; Geisser & Eddy, 1979; Gelfand & Dey, 1994). For DIC, lower values suggest a model that provides a better fit without unnecessary complexity. According to Levy & Mislevy (2016, p. 246), a PsBF value greater than 3 is considered to provide positive, or even stronger, evidence in favor of Model 1 over Model 2.
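The two comparison indices can be sketched from MCMC output as follows. This uses the standard definitions (DIC as mean deviance plus effective number of parameters, PsBF as a ratio of products of conditional predictive ordinates); the function names and input layout are assumptions, and the study's own computation may differ in detail.

```python
import numpy as np

def dic(deviance_draws, deviance_at_posterior_mean):
    """DIC = mean deviance + p_D, with p_D = mean deviance - deviance at posterior mean."""
    d_bar = float(np.mean(deviance_draws))
    p_d = d_bar - float(deviance_at_posterior_mean)
    return d_bar + p_d

def log_cpo(loglik):
    """Log conditional predictive ordinate per respondent.

    loglik: array (n_draws, n_persons) of pointwise log-likelihoods.
    CPO_i is the harmonic mean of the likelihood of respondent i over draws,
    computed on the log scale for numerical stability.
    """
    loglik = np.asarray(loglik, dtype=float)
    n_draws = loglik.shape[0]
    return -(np.logaddexp.reduce(-loglik, axis=0) - np.log(n_draws))

def log_psbf(loglik_model1, loglik_model2):
    """Log pseudo-Bayes factor of Model 1 over Model 2."""
    return float(log_cpo(loglik_model1).sum() - log_cpo(loglik_model2).sum())

# Toy inputs: 4 draws, 3 respondents, constant log-likelihoods.
ll_m1 = np.full((4, 3), -1.0)
ll_m2 = np.full((4, 3), -2.0)
print(np.exp(log_psbf(ll_m1, ll_m2)) > 3)   # compare against the threshold of 3
print(dic([10.0, 12.0], 9.0))
```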

The latent abilities estimated by the other two models, SRM and SRMM, were utilized to validate the SRM-PEI. Another essential part of our analysis involved determining whether latent abilities from SRM-PEI could account for the overall problem-solving performance in PISA 2012. To this end, we selected ten items (CP018Q04T, CP018Q05, CP025Q01, CP025Q02, CP036Q01, CP036Q02, CP036Q03, CP038Q01, CP038Q02, and CP038Q03) and used Rasch models to estimate overall problem-solving performance. In addition, we conducted an extensive calculation of effectiveness indicators for all states and transitions present in the sequences. This analysis led to the derivation of four aggregated indicators, which were employed to validate the latent abilities estimated by SRM-PEI. Three of these indicators were based on the effectiveness of transitions: the proportion of transitions that approach the target state (i.e., $\Delta d_{s \rightarrow s'} = 1$), the proportion that maintain the same distance from the target state ($\Delta d_{s \rightarrow s'} = 0$), and the proportion that move away from the target state ($\Delta d_{s \rightarrow s'} < 0$). The fourth indicator was state-based, reflecting the average shortest distance to the target state across all states in a given sequence. These indicators provided a comprehensive view of the students’ problem-solving processes, further substantiating the validity of the latent abilities estimated by SRM-PEI.
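The four aggregated indicators can be computed from a sequence of state distances as in the sketch below. It assumes that each visited state is already coded by its shortest distance $d_{s}$ to the target and that transition effectiveness is the drop in distance, $\Delta d_{s \rightarrow s'} = d_{s} - d_{s'}$; the function name and the example sequence are hypothetical.

```python
def aggregate_indicators(distances):
    """Summarise one response sequence coded as distances to the target.

    distances: list of d_s values for the visited states, in visiting order.
    Assumption: effectiveness of a transition is d_s - d_s'.
    """
    if len(distances) < 2:
        raise ValueError("a sequence needs at least one transition")
    deltas = [a - b for a, b in zip(distances, distances[1:])]
    n = len(deltas)
    return {
        "prop_toward": sum(d == 1 for d in deltas) / n,   # effectiveness = 1
        "prop_same": sum(d == 0 for d in deltas) / n,     # effectiveness = 0
        "prop_away": sum(d < 0 for d in deltas) / n,      # effectiveness < 0
        "mean_distance": sum(distances) / len(distances), # state-based indicator
    }

# Hypothetical sequence: approach, stay, move away by two steps, approach.
print(aggregate_indicators([3, 2, 2, 4, 3]))
```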

5.1.2. Results of the Empirical Study on the Ticket Task

The Bayesian estimation metrics in Table 3 supported the robustness of the MCMC estimates for the three models, echoing the findings of previous studies (Han et al., 2022; Xiao and Liu, 2023; Fu et al., 2023). The Potential Scale Reduction Factor (PSRF) for all parameters was below 1.1, and the Monte Carlo Error (MCE) for ability and tendency parameters was small. These indicators confirm that the MCMC for the three models has successfully converged. Furthermore, the posterior predictive p-value (ppp) for all three models was close to 0.5, suggesting an excellent fit between the models and the observed data. Most notably, two evaluation indices for model comparisons (DIC and PsBF) strongly supported the superiority of the SRM-PEI over the SRM and SRMM in modeling the Ticket task. This indicates that process models embedded with polytomous effectiveness parameters are a better fit than those with dichotomous effectiveness parameters.

Table 3 Model comparison of three models in the two empirical studies.

The posterior estimates for the transition tendency parameters from SRM-PEI are shown in Table 4. The transition parameters obtained for SRM and SRMM aligned closely with those reported in the original studies by Han et al. (2022) and Xiao and Liu (2023). When grouping by transitions under the same state, the ranking order of the transition tendency parameters estimated by SRM-PEI was consistent with those derived from SRM and SRMM. Figure 3 displays characteristic curves for each group of transitions fitted by SRM-PEI, revealing that the transition tendency parameters assisted by polytomous effectiveness accurately portrayed the probabilities in a manner that reflects the inherent dynamics of the Ticket task. These curves demonstrated that students with higher abilities were more likely to engage in actions that brought them closer to the target state or returned to the initial state from the incorrect path. In contrast, students with lower abilities tended to engage in actions that enter or stay on the incorrect path. This distinction is crucial for understanding the variations in problem-solving abilities among students, as reflected in their choices during the task. It underscores the effectiveness of SRM-PEI in accurately capturing these subtle differences.
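To give a feel for how such characteristic curves arise, the sketch below evaluates transition probabilities over a grid of abilities. The model equation is not restated in this section, so the sketch assumes a nominal-response-style form in which effectiveness plays the role of a discrimination parameter: the probability of each transition out of a state is proportional to $\exp(\Delta d_{s \rightarrow x}\,\theta + \lambda_{s \rightarrow x})$. The parameter values are illustrative only and are not the estimates in Table 4.

```python
import numpy as np

def transition_probabilities(theta, effectiveness, tendency):
    """Probabilities of all transitions leaving one state for ability theta.

    Assumed form (nominal-response style softmax):
        P(s -> x | theta) ∝ exp(effectiveness[x] * theta + tendency[x])
    """
    eff = np.asarray(effectiveness, dtype=float)
    lam = np.asarray(tendency, dtype=float)
    logits = eff * theta + lam
    logits = logits - logits.max()      # stabilise the exponentials
    weights = np.exp(logits)
    return weights / weights.sum()

# Illustrative state with three outgoing transitions: toward, same, away.
effectiveness = [1.0, 0.0, -1.0]
tendency = [0.5, 0.0, -0.5]             # sums to zero, as the constraint requires
for theta in (-2.0, 0.0, 2.0):
    print(theta, transition_probabilities(theta, effectiveness, tendency))
```

Under this assumed form, raising the ability shifts probability toward transitions with higher effectiveness, which is the qualitative pattern described for Figure 3.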

Table 4 Marginal posterior distributions for the transition tendency parameters of SRM-PEI for the Ticket task.

95%HPDL: Lower bound of 95% highest posterior density interval. 95%HPDU: Upper bound of 95% highest posterior density interval.

Figure 3 State transition characteristic curves for all transitions under each state of the Ticket task as estimated by SRM-PEI.

Table 5 summarizes the marginal posterior distributions for latent abilities alongside corresponding response sequences, focusing on the five patterns that received the highest and lowest abilities from SRM-PEI. Compared with the states represented by letters, the implementation of state effectiveness encoding markedly enhanced the ability to discern students’ proximity to the target. Notably, behavioral patterns closely aligning with the optimal path were associated with the highest problem-solving abilities. In contrast, those students who initially pursued incorrect paths and faced difficulties in redirecting towards the correct path were assigned the lowest abilities.

The latent ability estimated by SRM-PEI demonstrated a very high correlation with the ability estimates from both SRM ($r = 0.987^{***}$) and SRMM ($r = 0.975^{***}$). Furthermore, the correlation of the problem-solving abilities as assessed in PISA 2012 with latent abilities from SRM-PEI was marginally higher ($r = 0.608^{***}$) compared to those obtained from SRM ($r = 0.601^{***}$) and SRMM ($r = 0.607^{***}$). This finding suggests that SRM-PEI may provide a slightly more accurate representation of students’ problem-solving abilities. In addition, the latent abilities of SRM-PEI revealed significant correlations with aggregated variables that describe the problem-solving process. These correlations are indicative of the model’s nuanced understanding of students’ problem-solving strategies. Specifically, students with higher abilities were more likely to make progress towards the target (indicated by $\Delta d_{s \rightarrow s'} = 1$, $r = 0.976^{***}$), and less inclined to maintain a constant distance ($\Delta d_{s \rightarrow s'} = 0$, $r = -0.916^{***}$) or move backwards ($\Delta d_{s \rightarrow s'} < 0$, $r = -0.843^{***}$). They generally exhibited a shorter average distance to the target throughout their sequence of actions ($r = -0.984^{***}$). These findings demonstrate at the behavioral level that students with high abilities assessed by SRM-PEI tend to engage in more actions that are conducive to problem-solving and fewer actions that are detrimental to it. Moreover, they exhibit a preference for paths from which the targets are easier to reach throughout the problem-solving process. This interpretation underscores the importance of considering both the directionality of actions (towards or away from the target) and the overall strategic approach in assessing problem-solving abilities. The SRM-PEI’s ability to capture these aspects highlights its utility in providing a comprehensive evaluation of problem-solving skills in educational assessments.

Table 5 Marginal posterior distributions for the top five and bottom five abilities estimated by SRM-PEI and corresponding response patterns for the Ticket task.

95%HPDI: 95% highest posterior density interval.

$^{\mathrm{a}}$ A sequence ending in 0 means that the correct ticket was bought, a sequence ending in 7 means that the wrong ticket was bought, and a sequence ending in any other number means that the task was quit midway.

5.2. Balance Beam Task

5.2.1. Data Description and Analysis Process for the Balance Beam Task

Students from eighth and ninth grades across six schools in three regions of China participated in the Chinese version of the Balance Beam task developed by Yuan et al. (2019). In this study, we only used records from the sub-task requiring two weights to balance the beam. The states and transitions in the records inconsistent with the system settings (as detailed in Sect. 1.3) were excluded from the analysis. After data cleaning, there were a total of 422 groups, with 167 successfully completing the task. On average, each group executed 33 transitions.

Based on the classification of effectiveness values in Sect. 1.3, there are 6 types of states and 16 types of state transitions. Additionally, Yuan’s testing system allows students to exit the test either midway through or after task completion. As a result, the extra termination states (marked as #) were added. Unlike the incorrect end state in the Ticket task, we defined the effectiveness of the transitions leading to the termination states in two cases. Exiting the system after reaching any target state (the transition is denoted as $0 \rightarrow \#$) was considered correct with an effectiveness of 1 (i.e., $\Delta d_{0 \rightarrow \#} = 1$). Since the task could not be continued after the termination, the transitions from the 5 types of non-target states ($d_{s} = 5, 4, \ldots, 1$) to the termination state were the incorrect early termination. A lower value of effectiveness than any other transition was assigned to these 5 transitions of early termination ($\Delta d_{s \rightarrow \#} = -2$ for any state $s$ if $d_{s} > 0$), indicating them as the least preferable among the 22 (i.e., 16 + 1 + 5) transition types. In this study, we classified transitions based on the values of state effectiveness before and after the transition (22 categories), rather than using the original state representations from the task interface (168,000 categories) or the values of transition effectiveness (4 categories, i.e., $\Delta d_{s \rightarrow s'} = -2, -1, 0, 1$). By doing so, we ensured a manageable number of transition tendency parameters, with grouped transitions more likely to conform to the premise of equal difficulty, thereby sharing the same tendency parameters in the SRM-PEI.
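The effectiveness rules for the Balance Beam task, including the two termination cases, can be sketched as a small lookup. The sketch assumes that ordinary transitions take effectiveness $d_{s} - d_{s'}$; the constant `TERMINATION` and the function name are hypothetical, and the 16 ordinary transition types themselves are not enumerated because they are defined in Sect. 1.3 rather than here.

```python
TERMINATION = "#"

def transition_effectiveness(d_from, d_to):
    """Effectiveness of a transition between states coded by distance to target.

    d_from : distance of the current state (0 means a target state)
    d_to   : distance of the next state, or TERMINATION for exiting the system
    """
    if d_to == TERMINATION:
        # Exiting after reaching a target is correct; any earlier exit is
        # the least preferable transition.
        return 1 if d_from == 0 else -2
    # Assumption: ordinary transitions are scored by the drop in distance.
    return d_from - d_to

# Termination transitions from the six state types (d_s = 0, 1, ..., 5):
exits = {d: transition_effectiveness(d, TERMINATION) for d in range(6)}
print(exits)
print(16 + sum(v == 1 for v in exits.values()) + sum(v == -2 for v in exits.values()))
```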

Since Sect. 1.3 has defined the target states and the two types of effectiveness at the group level for the Balance Beam task, the measurement model combined with these effectiveness indicators was designed to estimate the collective problem-solving ability of groups. To facilitate a comparison between SRM-PEI and the models utilizing dichotomous effectiveness, we adapted the polytomous effectiveness into dichotomous form. Specifically, transitions that progress closer to the targets (i.e., $\Delta d_{s \rightarrow s'} = 1$) retained their effectiveness value of 1, denoting correct transitions. In contrast, transitions that lead away from the targets (i.e., $\Delta d_{s \rightarrow s'} < 0$) were considered incorrect. For SRM and SRMM, these transitions were reassigned dichotomous effectiveness values of $-1$ and 0, respectively. Transitions that keep a constant distance from the target (i.e., $\Delta d_{s \rightarrow s'} = 0$) were bifurcated as either correct or incorrect, leading to two distinct versions of both SRM (termed SRM-v1 and SRM-v2) and SRMM (termed SRMM-v1 and SRMM-v2). The latent abilities estimated by these four versions offered a basis for validating the SRM-PEI. Mirroring the approach used for the Ticket task, we computed the proportion of each of the four types of transitions, categorized by their effectiveness values, within every sequence. This computation also included an assessment of the average distance from the target throughout the problem-solving process. For the Balance Beam task with multiple targets, we defined the nearest targets as the target states closest to the current state. Moreover, we quantified the average number of nearest targets, and the proportion of transitions that either augmented or reduced this number. To evaluate the efficacy of the SRM-PEI, we scrutinized the correlations between these seven aggregated indicators, based on both state and transition effectiveness, and the latent abilities estimated by the SRM-PEI. These correlations were integral to understanding the validity and interpretability of the SRM-PEI in measuring group problem-solving abilities.
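The recoding of polytomous effectiveness into the dichotomous form used by the comparison models can be sketched as follows. The flag `zero_is_correct` stands for the choice that separates the two versions of each model; which setting corresponds to v1 and which to v2 is not specified in this passage, so the sketch deliberately leaves that mapping open. The function name is hypothetical.

```python
def dichotomize(delta_d, model, zero_is_correct):
    """Recode polytomous transition effectiveness for SRM or SRMM.

    delta_d         : polytomous effectiveness (1, 0, or a negative value)
    model           : "SRM" (incorrect coded -1) or "SRMM" (incorrect coded 0)
    zero_is_correct : how to treat transitions that keep the same distance
    """
    incorrect_value = {"SRM": -1, "SRMM": 0}[model]
    if delta_d == 1:
        return 1                      # approaching the target stays correct
    if delta_d == 0:
        return 1 if zero_is_correct else incorrect_value
    return incorrect_value            # moving away or terminating early

for model in ("SRM", "SRMM"):
    for flag in (True, False):
        print(model, flag, [dichotomize(d, model, flag) for d in (1, 0, -1, -2)])
```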

5.2.2. Results of the Empirical Study on the Balance Beam Task

All models met the criteria of convergence and good model-data fit as shown in Table 3. The trace plots for the ability and transition tendency parameters specific to SRM-PEI are displayed in Figures S2 and S3 in the Supplementary Material. The model comparison metrics indicated a preference for SRM-PEI over the two versions of SRM and SRMM. These preliminary findings suggest that, considering the complexity of the model, SRM-PEI is more apt at predicting data in complex problem-solving tasks that encompass multiple optimal paths.

Upon evaluating the transition tendency parameters estimated by SRM-PEI (see Table 6) and by two versions of SRM and SRMM (see Table S4 in the Supplementary Material), the assignment schemes of the effectiveness for transitions maintaining the same position (i.e., $\Delta d_{s \rightarrow s'} = 0$) played an important role. When these transitions were considered incorrect, their tendency parameter rankings generally decreased. A comparison of the rankings for the tendency parameters across each group of transitions revealed that SRM-v1 aligned most closely, albeit not identically, with the estimations of SRM-PEI. Another crucial observation was that assigning a small effectiveness value of $-2$ to the five types of early termination transitions did not compromise the rationality and order of the tendency parameters in SRM-PEI. These transitions uniformly received the lowest rankings in terms of transition tendency parameters, signifying a strong preference to avoid prematurely ending the task. This inference was supported by the low percentages of groups that opted for early termination (0.8%, 1%, 4%, 0.6%, and 0.5%).

Figure 4 displays the characteristic curves for each group of transitions fitted by SRM-PEI. These curves demonstrated the nuanced capabilities of SRM-PEI, especially with its incorporation of polytomous transition effectiveness, in precisely fitting probability curves aligned with the task design. For states that were not the target, groups with lower abilities tended to exhibit a higher likelihood of either terminating the task prematurely or moving away from the target. In contrast, groups with medium abilities generally engaged in actions that maintained the current distance from the target. Most notably, groups with the highest abilities demonstrated a pronounced propensity to make transitions that brought them closer to the target. Upon reaching the target state, the pattern of responses shifted. Groups with the highest abilities were most likely to correctly conclude the task and exit the system, a behavior indicative of successful task completion. However, groups with lower abilities might persist in actions like passing weights (which did not affect the balance) or hanging or removing weights (which could disrupt the balance). These observed behaviors and the corresponding probability curves underscore the effectiveness of SRM-PEI in capturing the likelihood of various transitions accurately.

Table 6 Marginal posterior distributions for the transition tendency parameters of SRM-PEI for the Balance Beam task.

95%HPDL: Lower bound of 95% highest posterior density interval.

95%HPDU: Upper bound of 95% highest posterior density interval.

Figure 4 State transition characteristic curves for all transitions in each state of the Balance Beam task as estimated by SRM-PEI.

Table 7 shows the sequences associated with the highest and lowest abilities estimated by SRM-PEI. With the help of state effectiveness, we could clearly observe the problem-solving process in a complex task whose original states were too numerous to represent by letters. High-ability groups efficiently located and followed optimal paths, demonstrating proficient problem-solving processes. Conversely, low-ability groups wandered in states further from the targets than the initial state and finally terminated the test process. States with an effectiveness value of 4, which were one step further from the targets than the initial state with an effectiveness value of 3, typically resulted from an incorrect action such as the improper transfer or suspension of one weight.

The latent problem-solving abilities of groups estimated by SRM-PEI exhibited a very high correlation with those derived from SRM ($r = 0.922^{***}$ for SRM-v1 and $r = 0.931^{***}$ for SRM-v2) and SRMM ($r = 0.888^{***}$ for SRMM-v1 and $r = 0.924^{***}$ for SRMM-v2), indicating strong consistency across these models.
In terms of aggregated indicators, groups with higher abilities demonstrated a greater likelihood of advancing towards the target ($\Delta d_{s\rightarrow s'}=1$, $r = 0.845^{***}$). They were less prone to maintaining the same distance ($\Delta d_{s\rightarrow s'}=0$, $r = -0.502^{***}$), retreating one step ($\Delta d_{s\rightarrow s'}=-1$, $r = -0.574^{***}$), or terminating the task prematurely ($\Delta d_{s\rightarrow s'}=-2$, $r = -0.381^{***}$). These groups also showed a shorter average distance to targets throughout the sequence ($r = -0.928^{***}$), suggesting efficient progression towards task completion.
Furthermore, they tended to focus on fewer targets ($r = -0.422^{***}$) and to take more actions that reduced ($r = 0.735^{***}$) rather than increased ($r = -0.471^{***}$) the number of nearest targets. These findings underscore the strong alignment between the latent ability as estimated by SRM-PEI and the actual performance in the problem-solving process. The results affirm that SRM-PEI effectively characterizes the procedural aspects of evaluation, highlighting its utility in assessing complex problem-solving skills.
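To make the sequence-based aggregated indicators concrete, the following sketch computes two of them for a single response sequence: the proportion of transitions at each effectiveness value and the average distance to the target. It is a minimal illustration, not the authors' code. The function name `aggregate_indicators`, the use of proportions rather than counts, and the sign convention $\Delta d_{s\rightarrow s'} = d_{s} - d_{s'}$ (inferred from the values reported above, where $+1$ means one step closer) are assumptions.

```python
from collections import Counter

def aggregate_indicators(distances, terminated_early=False, termination_value=-2):
    """Summarise one response sequence from its state effectiveness values.

    distances: d_s of each visited state, in order (termination state excluded,
               since its state effectiveness is not defined).
    terminated_early: True if the sequence ended by quitting before the target.
    termination_value: effectiveness assigned to the early-termination transition.
    """
    # Transition effectiveness: positive when the next state is closer to the target.
    deltas = [a - b for a, b in zip(distances, distances[1:])]
    if terminated_early:
        deltas.append(termination_value)
    counts = Counter(deltas)
    n = len(deltas)
    proportions = {k: counts[k] / n for k in sorted(counts)} if n else {}
    mean_distance = sum(distances) / len(distances)
    return proportions, mean_distance

# Hypothetical sequence: starts at distance 3, wanders once, then reaches the target.
props, mean_d = aggregate_indicators([3, 2, 2, 3, 2, 1, 0])
print(props, round(mean_d, 3))
```

Correlating such per-sequence summaries with the estimated abilities would give coefficients of the kind reported above; the exact indicator definitions used in the study may differ from this sketch.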

Table 7 Marginal posterior distributions for the top five and bottom five abilities estimated by SRM-PEI and corresponding response patterns for the Balance Beam task.

95%HPDI: 95% highest posterior density interval.

$^{\mathrm{a}}$ The state effectiveness for the termination state (#) is not defined.

6. Discussion

In this study, we developed a method for evaluating the effectiveness indicators of problem states $d_{s}$ and transitions $\Delta d_{s\rightarrow s'}$ in problem-solving tasks. Moreover, we proposed a measurement model named the sequential response model with polytomous effectiveness indicators (SRM-PEI). Through simulation and empirical studies, we demonstrated that the effectiveness indicators and SRM-PEI are capable of estimating latent problem-solving ability in various types of tasks.

Following the concepts of problem-solving and the characteristics of interactive tasks, we defined the effectiveness of a state as the theoretical shortest distance from the target state and the effectiveness of a transition between two states as the change in the theoretical shortest distance to the target. To facilitate the application, we proposed a general algorithm for computing the effectiveness indicators and illustrated the calculation process and results using two real tasks. By applying these indicators to measurement models and sequence-based aggregated features, we established an accessible methodology that promotes standardization and accuracy in interactive problem-solving tests. Our proposed effectiveness indicators offer several primary benefits. First, the automated nature of the evaluation method allows for rapid computation of the effectiveness of all states and transitions through straightforward programming. Second, our evaluation approach does not rely on response data collection and can be performed once the task simulation system is designed or the task is planned out. Third, in the context of complex tasks with multiple states and transitions, the effectiveness indicators assist in simplifying and categorizing states and transitions. The indicator values provide a clear semantic understanding of different categories, as illustrated in the Balance Beam task. Fourth, in a simple task with a single optimal path, such as Ticket, the polytomous effectiveness indicators offer a more detailed classification and richer information compared to dichotomous effectiveness. Fifth, the reinforcement learning framework that first proposed action effectiveness (LaMar, 2018) also includes a state-value function, closely related to the action-value function, which computes the expected weighted future rewards for a given state, but its calculation is complex and difficult to use. 
Our proposed state effectiveness has a concise meaning and low computational complexity to evaluate the value of states, which enriches the applicable scope and improves the usability of effectiveness indicators.
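The two definitions above (state effectiveness as the shortest distance to a target, transition effectiveness as the change in that distance) can be sketched with a breadth-first search run backwards from the target states. This is a generic illustration on a made-up four-state task, not the authors' general algorithm. The function names, the toy graph, and the sign convention $\Delta d_{s\rightarrow s'} = d_{s} - d_{s'}$ are assumptions.

```python
from collections import deque

def state_effectiveness(transitions, targets):
    """Shortest number of transitions from each state to the nearest target.

    transitions: dict mapping a state to the states reachable in one step.
    targets: iterable of target states (several optimal end states are allowed).
    """
    # Build the reversed graph so one multi-source BFS from the targets suffices.
    reverse = {}
    for state, next_states in transitions.items():
        for nxt in next_states:
            reverse.setdefault(nxt, set()).add(state)
    dist = {t: 0 for t in targets}
    queue = deque(dist)
    while queue:
        current = queue.popleft()
        for prev in reverse.get(current, ()):
            if prev not in dist:
                dist[prev] = dist[current] + 1
                queue.append(prev)
    return dist

def transition_effectiveness(transitions, dist):
    """Change in shortest distance for every transition (positive = closer)."""
    return {
        (s, nxt): dist[s] - dist[nxt]
        for s, next_states in transitions.items()
        for nxt in next_states
        if s in dist and nxt in dist
    }

# Hypothetical task: T is the only target; B has a self-loop (a no-effect action).
TOY = {"A": ["B", "C"], "B": ["A", "B", "T"], "C": ["A"], "T": []}
d = state_effectiveness(TOY, ["T"])
delta = transition_effectiveness(TOY, d)
print(d)
print(delta)
```

Because the search needs only the transition structure of the task, it can be run before any response data are collected, which is the second benefit listed above.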

The SRM-PEI model leverages the full potential of the effectiveness indicator $\Delta d_{s\rightarrow s'}$ for evaluating and differentiating various types of state transitions, especially filling the gaps in the modeling and analysis of process data in complex problem-solving tasks. Furthermore, the SRM-PEI model introduces a new tool for data analysis that can facilitate the development of more intricate and realistic problem-solving interactive tests, as well as the evaluation of higher-level cognitive abilities. Both simulation and empirical studies demonstrated that the SRM-PEI model provides a comprehensive characterization of the easiness and probability of occurrence for a vast number of state transitions. The transition characteristic curves not only serve to further evaluate whether the model accurately fits the data, but also provide a detailed and intuitive description of the difficulty of each transition within the task. These curves can be utilized to study the adaptability of SRM-PEI to various tasks, inform the design of scoring based on transition tendencies, and validate the scores. In this study, in order to test the validity of latent abilities, we also innovatively created many aggregation indicators based on two effectiveness indicators, and all of them have a very high correlation with the latent ability estimated by SRM-PEI.

Unlike the values of dichotomous effectiveness indicators in CTDC, SRM, and SRMM, which merely categorize responses as correct or incorrect, the effectiveness indicators $\Delta d_{s\rightarrow s'}$ provide more practical meaning, as their value signifies the distance toward the target that a transition affords. Furthermore, the SRM-PEI broadens the applicability of these types of models to encompass more complex problem-solving tasks. The aforementioned models (CTDC, SRM, and SRMM), while insightful, are best suited to problem-solving tasks with a single optimal path. In contrast, the introduction of effectiveness indicators allows SRM-PEI to handle inherently complex tasks with multiple optimal paths, as demonstrated by the Balance Beam task. Therefore, the development of polytomous effectiveness and SRM-PEI signals a meaningful progression in the analysis of process data in complex problem-solving tasks. We also simplified the polytomous effectiveness of transitions to a dichotomous version to make SRM and SRMM applicable to the Balance Beam task. From a different perspective, the SRM can be considered a special case of the SRM-PEI with restricted effectiveness indicators. If the ability and tendency parameters in SRM-PEI are reparametrized, then both the CTDC and SRMM can also be viewed as special cases of the SRM-PEI with restricted effectiveness indicators. 
Additionally, two issues need to be considered when estimating models with the polytomous effectiveness $\Delta d_{s\rightarrow s'}$: First, this study categorizes transitions in the Balance Beam task based on the values of effectiveness, which implies the assumption that all transitions with the same value of $\Delta d_{s\rightarrow s'}$ have equal difficulty and can be estimated with the same transition tendency parameter. This assumption needs to be evaluated for its applicability to different tasks. Second, when some values of $\Delta d_{s\rightarrow s'}$ are excessively small, as demonstrated in the Ticket task, it is feasible to scale the original values to a range appropriate for the NRM framework to achieve a better model-data fit of SRM-PEI.

This study serves as an initial exploration, and several areas warrant further research in the future. (1) Effectiveness indicators can be leveraged across different levels of process data analysis and integrated with a wider range of models and analytic methodologies. In the framework of computational psychometrics put forward by von Davier (2017) for unstructured data in computer-based interactive assessments, effectiveness indicators could be used not only in measurement models, but also in sequence-based analysis approaches. (2) In addition to the two approaches demonstrated on the Ticket task and the Balance Beam task, there are various ways to assign effectiveness values to incorrect end or termination states and transitions leading to these states. The impact of different effectiveness values on SRM-PEI or other methods is also worth exploring. (3) There are numerous possibilities to improve the SRM-PEI. Although both empirical studies were conducted on a single sub-task, SRM-PEI is capable of handling multiple tasks like other psychometrics process models. After classifying states and transitions with the same evaluation procedure, SRM can be utilized to analyze the total ability across multiple sub-tasks. From a methodological perspective, SRM-PEI could also be extended to a multidimensional form to estimate the abilities of two individuals, as well as two distinct types of abilities in collaborative problem-solving tasks (Yuan et al., 2019; Li et al., 2023).

Funding

No funding was received to assist with the preparation of this manuscript. The authors have no competing interests to declare that are relevant to the content of this article.

Data Availability

The data analyzed in the empirical example of this study are available on this project’s Open Science Framework (OSF) page: https://osf.io/fw82q/.

Code Availability

The codes are available on this project’s Open Science Framework (OSF) page: https://osf.io/fw82q/.

Appendix A. Algorithm for Automatically Calculating State Effectiveness in the Balance Beam Task

Figure 5 The interface of the initial state in the Chinese version of the Balance Beam task.

Figure 6 The diagram for the four types of transitions that can occur when a weight moves among ten possible positions in the Balance Beam task.

In the Balance Beam task, the ten potential positions for each weight are categorized into four groups: (1) Positions 1–4: Positioned on side A of the beam; (2) Position 5: Not suspended on side A; (3) Position 6: Not suspended on side B; (4) Positions 7–10: Positioned on side B of the beam. Figure 6 illustrates the transition of each weight among ten positions through four types of operations: (1) removing a weight from the beam; (2) hanging an unhung weight; (3) passing a weight to the other student; and (4) shifting the position of a weight on the same side. Each arrow represents an operation that can lead to a transition. Through this figure, we can easily find the minimum number of transitions between any two positions for one weight. Since an operation can only alter the position of one weight once, the shortest distance between states $s$ and $s'$ equals the sum of the minimum number of operations required for each of the four weights to change its position from state $s$ to $s'$. 
Then, we can quickly and accurately calculate the shortest distance $d_{s}^{(k)}$ between a state $s$ and the target state $s_{target}^{(k)}$ using the state code and rules to change the position according to Fig. 6. Finally, we select the minimum distance $d_{s}=\min \left( d_{s}^{(1)}, d_{s}^{(2)}, \ldots , d_{s}^{(k)}\right)$ as the effectiveness indicator $d_{s}$ of the state $s$.

During the process of programming the calculations mentioned above, the position of each weight can be assigned a unique number from one to ten. Therefore, any given state in the Balance Beam task can be encoded by a sequence of four numbers, a representation we refer to as the state code. For one weight, calculating the shortest distance between any two positions can be simplified by several rules. The R code for evaluating the effectiveness of states for the Balance Beam task that requires the use of two weights to achieve balance is available at https://osf.io/fw82q/.

In the example of the code, the four positions for hanging weights on the balance beam on student A’s side are coded as 1 to 4, and the four positions on student B’s side are coded as $-1$ to $-4$. The unhung weights are coded as 0.5 when in student A’s hand and $-0.5$ when in student B’s hand. In the initial state, all four weights are in the hand of A, and the state code is (0.5, 0.5, 0.5, 0.5). The effectiveness of the initial state is equal to 3, which means that the balance state using two weights can be achieved after a minimum of three transitions. Another example is that Student B holds the 50 g and 100 g weights and Student A has hung the 300 g weight at position 1 and the 500 g weight at position 2. This state is at a minimum distance of 2 from the balance state.
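The calculation described in this appendix can be sketched in a few lines. The following Python code is an illustrative reconstruction and not the authors' R code on OSF. It assumes that the state code lists the weights in the order (50 g, 100 g, 300 g, 500 g), that each of the four operations costs one transition and can reach any position on the relevant side, that torque is weight times position index, and that a two-weight target has exactly one hung weight per side with the other two weights unhung in either hand. All function names are hypothetical.

```python
from itertools import permutations, product

WEIGHTS = (50, 100, 300, 500)      # grams; assumed order of entries in the state code
HAND_A, HAND_B = 0.5, -0.5         # unhung weight held by student A / student B

def on_beam(pos):
    """True for hung positions (1..4 on side A, -1..-4 on side B)."""
    return abs(pos) >= 1

def same_side(a, b):
    return (a > 0) == (b > 0)

def weight_distance(src, dst):
    """Minimum operations to move one weight from src to dst (assumed rules)."""
    if src == dst:
        return 0
    if same_side(src, dst):
        return 1                   # hang, remove, or shift on the same side
    # Crossing sides: remove (if hung) + pass + hang (if the destination is hung).
    return int(on_beam(src)) + 1 + int(on_beam(dst))

def two_weight_targets():
    """All assumed balance states that use exactly two hung weights."""
    targets = set()
    for i, j in permutations(range(4), 2):         # weight i on side A, weight j on side B
        for p, q in product(range(1, 5), repeat=2):
            if WEIGHTS[i] * p != WEIGHTS[j] * q:   # torques must be equal
                continue
            rest = [k for k in range(4) if k not in (i, j)]
            for hands in product((HAND_A, HAND_B), repeat=2):
                state = [0.0] * 4
                state[i], state[j] = p, -q
                for k, hand in zip(rest, hands):
                    state[k] = hand
                targets.add(tuple(state))
    return targets

TARGETS = two_weight_targets()

def state_effectiveness(state):
    """d_s: minimum over targets of the summed per-weight distances."""
    return min(
        sum(weight_distance(a, b) for a, b in zip(state, target))
        for target in TARGETS
    )

print(state_effectiveness((0.5, 0.5, 0.5, 0.5)))   # initial state -> 3
print(state_effectiveness((-0.5, -0.5, 1, 2)))     # second example in the text -> 2
```

Under these assumptions the sketch reproduces the two values quoted above, 3 for the initial state and 2 for the second example.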

Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11336-024-09963-8.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

References

Anderson, J. R., Funke, J., & Plata, G. (Eds.). (2007). Cognitive psychologic (6 Aufl.). Spektrum Akademischer Verlag. http://www.gbv.de/dms/bs/toc/529836963.pdf
Arieli-Attali, M., Ou, L., Simmering, V. R. (2019). Understanding test takers’ choices in a self-adapted test: A hidden Markov modeling of process data. Frontiers in Psychology, 10, 83.
Bergner, Y., von Davier, A. A. (2019). Process data in NAEP: Past, present, and future. Journal of Educational and Behavioral Statistics, 44(6), 706–732.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1), 29–51.
Brooks, S. P., Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7, 434–455.
Buchner, A., Funke, J. (1993). Finite-state automata: Dynamic task environments in problem-solving research. The Quarterly Journal of Experimental Psychology, 46(1), 83–118.
Chen, Y. (2020). A continuous-time dynamic choice measurement model for problem-solving process data. Psychometrika, 85(4), 1052–1075.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., Stein, C. (2022). Introduction to algorithms (4th ed., pp. 563–572). Cambridge: MIT Press.
Fu, Y., Zhan, P., Chen, Q., Jiao, H. (2023). Joint modeling of action sequences and action time in computer-based interactive tasks. Behav Res Methods.
Funke, J. (2001). Dynamic systems as tools for analysing human judgement. Think Reason, 7, 69–89.
Geisser, S., Eddy, W. F. (1979). A predictive approach to model selection. J Am Stat Assoc, 74, 153–160.
Gelfand, A. E., Dey, D. K. (1994). Bayesian model choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society Series B, 56, 501–514.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Boca Raton: Chapman & Hall/CRC Press.
Gelman, A., Meng, X.-L., Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733–760.
Gelman, A., Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.
Griffin, P., Care, E. (2014). Assessment and teaching of 21st century skills: Methods and approach. New York, NY: Springer.
Guttman, I. (1967). The use of the concept of a future observation in goodness-of-fit problems. Journal of the Royal Statistical Society: Series B (Methodological), 29(1), 83–100.
Han, Y., Liu, H., Ji, F. (2022). A sequential response model for analyzing process data on technology-based problem-solving tasks. Multivariate Behavioral Research, 57(6), 960–977.
Han, Y., Wilson, M. (2022). Analyzing student response processes to evaluate success on a technology-based problem-solving task. Applied Measurement in Education, 35(1), 33–45.
He, Q., von Davier, M. (2015). Identifying feature sequences from process data in problem-solving items with n-grams. In van der Ark, L. A., Bolt, D. M., Wang, W.-C., Douglas, J. A., Chow, S.-M. (Eds), Quantitative psychology research (pp. 173–190). Berlin: Springer.
He, Q., von Davier, M. (2016). Analyzing process data from problem-solving items with N-Grams: Insights from a computer-based large-scale assessment. In Rosen, Y., Ferrara, S., Mosharraf, M. (Eds), Handbook of research on technology tools for real-world skill development (pp. 750–777). Pennsylvania: IGI Global.
Koehler, E., Brown, E., Haneuse, J. P. A. (2009). On the assessment of Monte Carlo error in simulation-based statistical analyses. The American Statistician, 63(2), 155–162.
LaMar, M. M. (2018). Markov decision process measurement model. Psychometrika, 83(1), 67–88.
Levy, R., Mislevy, R. J. (2016). Bayesian psychometric modeling. Cambridge: CRC Press.
Li, M., Liu, H., Cai, M., Yuan, J. (2023). Estimation of individuals’ collaborative problem solving ability in computer-based assessment. Education and Information Technologies.
Liu, H., Liu, Y., Li, M. (2018). Analysis of process data of PISA 2012 computer-based problem solving: Application of the modified multilevel mixture IRT model. Frontiers in Psychology, 9, 1372.
Mayer, R. E., Wittrock, M. C. (2006). Problem solving. In Alexander, P. A., Winne, P. H. (Eds), Handbook of educational psychology (2nd ed., pp. 287–304). Mahwah: Erlbaum.
Newell, A., Simon, H. A. (1972). Human problem solving. Englewood Cliffs: Prentice-Hall.
OECD (2014). PISA 2012 results: Creative problem solving: Students’ skills in tackling real-life problems (Vol. V). OECD.
OECD. (2016). PISA 2015 Assessment and analytical framework: Science, reading, mathematic and financial literacy. PISA. OECD Publishing. https://doi.org/10.1787/9789264255425-en
OECD (2018). The future of education and skills: Education 2030. Paris: OECD Publishing.
Patz, R. J., Junker, B. W. (1999). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24(4), 342–366.
Patz, R. J., Junker, B. W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2), 146–178.
Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12, 1151–1172.
Shu, Z., Bergner, Y., Zhu, M., Hao, J., von Davier, A. A. (2017). An item response theory analysis of problem-solving processes in scenario-based tasks. Psychological Test and Assessment Modeling, 59(1), 109–131.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van der Linde, A. (1998). Bayesian deviance, the effective number of parameters, and the comparison of arbitrarily complex models. MRC Biostatistics Unit: Technical report.
Tang, X. (2023). A latent hidden Markov model for process data. Psychometrika.
Tang, X., Wang, Z., He, Q., Liu, J., Ying, Z. (2020). Latent feature extraction for process data via multidimensional scaling. Psychometrika, 85(2), 378–397.
von Davier, A. A. (2017). Computational psychometrics in support of collaborative educational assessments. Journal of Educational Measurement, 54(1), 3–11.
Xiao, Y., He, Q., Veldkamp, B., Liu, H. (2021). Exploring latent states of problem-solving competence using hidden Markov model on process data. Journal of Computer Assisted Learning, 37(5), 1232–1247.
Xiao, Y., & Liu, H. (2023). A state response measurement model for problem-solving process data. Behavior Research Methods, 1–20. https://doi.org/10.3758/s13428-022-02042-9
Yuan, J., Xiao, Y., Liu, H. (2019). Assessment of collaborative problem solving based on process stream data: A new paradigm for extracting indicators and modeling dyad data. Frontiers in Psychology, 10, 369.
Figure 1 Diagram of two scenarios with multiple backward transitions and target states.

Figure 2 A new transition diagram for the CP038Q02 subtask of the Ticket task in PISA 2012. Note: The solid arrows represent transitions that move closer to the target state, while the dotted arrows represent transitions that do not.

Table 1 The effectiveness of all states and transitions in the CP038Q02 task.

Table 2 The estimation accuracy of ability and tendency parameters in SRM-PEI in the simulated problem-solving tasks with multiple optimal paths.

Table 3 Comparison of the three models in the two empirical studies.

Table 4 Marginal posterior distributions for the transition tendency parameters of SRM-PEI for the Ticket task.

Figure 3 State transition characteristic curves for all transitions under each state of the Ticket task as estimated by SRM-PEI.

Table 5 Marginal posterior distributions for the top five and bottom five abilities estimated by SRM-PEI and corresponding response patterns for the Ticket task.

Table 6 Marginal posterior distributions for the transition tendency parameters of SRM-PEI for the Balance Beam task.

Figure 4 State transition characteristic curves for all transitions in each state of the Balance Beam task as estimated by SRM-PEI.

Table 7 Marginal posterior distributions for the top five and bottom five abilities estimated by SRM-PEI and corresponding response patterns for the Balance Beam task.

Figure 5 The interface of the initial state in the Chinese version of the Balance Beam task.

Figure 6 The diagram for the four types of transitions that can occur when a weight moves among ten possible positions in the Balance Beam task.

Supplementary material: File

Wang and Liu Supplementary material