
A Diagnostic Facet Status Model (DFSM) for Extracting Instructionally Useful Information from Diagnostic Assessment

Published online by Cambridge University Press:  01 January 2025

Chun Wang*
Affiliation:
University of Washington
*
Correspondence should be made to Chun Wang, 312E Miller Hall Measurement and Statistics, College of Education, University of Washington, 2012 Skagit Ln, Seattle, WA 98105, USA. Email: wang4066@uw.edu

Abstract

Modern assessment demands, resulting from educational reform efforts, call for strengthening diagnostic testing capabilities to identify not only the understanding of expected learning goals but also related intermediate understandings that are steppingstones on pathways to learning goals. An accurate and nuanced way of interpreting assessment results will allow subsequent instructional actions to be targeted. An appropriate psychometric model is indispensable in this regard. In this study, we developed a new psychometric model, namely, the diagnostic facet status model (DFSM), which belongs to the general class of cognitive diagnostic models (CDM), but with two notable features: (1) it simultaneously models students’ target understanding (i.e., goal facet) and intermediate understanding (i.e., intermediate facet); and (2) it models every response option, rather than merely right or wrong responses, so that each incorrect response uniquely contributes to discovering students’ facet status. Given that some combination of goal and intermediate facets may be impossible due to facet hierarchical relationships, a regularized expectation–maximization algorithm (REM) was developed for model estimation. A log-penalty was imposed on the mixing proportions to encourage sparsity. As a result, those impermissible latent classes had estimated mixing proportions equal to 0. A heuristic algorithm was proposed to infer a facet map from the estimated permissible classes. A simulation study was conducted to evaluate the performance of REM to recover facet model parameters and to identify permissible latent classes. A real data analysis was provided to show the feasibility of the model.

Type
Theory & Methods
Copyright
Copyright © 2024 The Author(s), under exclusive licence to The Psychometric Society, corrected publication

1. Introduction

The “Every Student Succeeds Act” emphasizes that “High-quality assessments are essential to effectively educating students, measuring progress, and promoting equity.” For such assessments to scaffold learning, they should provide actionable diagnostic information (Kluger and DeNisi, 1996; Pellegrino et al., 2001), such as learners’ mastery of learning goals and related intermediate understandings. According to theories of conceptual change and constructivism, accurate and reliable information about both aspects of student understanding (hereafter called “facets,” which represent psychological constructs such as knowledge states and conceptual understandings) is key to targeted instruction. Unfortunately, current assessment approaches have largely remained rooted in a summative paradigm, with scoring rubrics that predominantly gradate performance (i.e., right/wrong or partial credit) (Dadey, 2017; Lanting, 2000; Underwood et al., 2018).

A plethora of online learning platforms have emerged aiming for better-differentiated and more responsive instruction, both in the classroom and at a distance. But the quality of a responsive, adaptive system hinges on an embedded assessment architecture that provides feedback at a sufficiently fine grain size. Assessments constructed under a formative paradigm and grounded in research on learner cognition would much better serve the needs of various stakeholders (e.g., teachers, students, parents). However, questions remain as to how to strengthen the diagnostic capabilities of such assessments to generate a more nuanced, yet robust, profile of student learning that produces actionable information. An appropriate psychometric model is indispensable in this regard. In the past two decades, cognitive diagnostic modeling (CDM) has emerged as a flexible tool for extracting diagnostic information from assessments, and its application has been on the rise in various domains (Bradshaw and Templin, 2014; Morphew et al., 2018; Tjoe and de la Torre, 2014). CDMs can provide a profile for every learner, pinpointing their mastery status on multiple attributes. However, existing CDMs have limited capability for modeling both goal and intermediate understandings, handling ultra-large numbers of attributes, and identifying intricate attribute relationships. This paper aims to develop (1) a psychometric model to identify learners’ mastery of goal attributes (i.e., accurate understanding, practice, etc.) and their intermediate thinking (i.e., intuitive, alternative, or partial understanding) and (2) a statistical learning method to identify relations among attributes. Together these will enhance the diagnostic power of assessment and support targeted instruction (DiBello et al., 2015).

1.1. A Facets Approach to Instruction and Learning

In research on learner cognition and the learning sciences, the knowledge in pieces (KiP) theory presumes that when faced with a new task, learners draw on relevant pieces of knowledge and reasoning to construct their understanding (diSessa, 2017). Taking this “fragmented” stance (diSessa, 1993), the theory assumes that students’ ideas consist of many quasi-independent elements. Conceptual change (i.e., learning) therefore involves learners picking and choosing the most productive ideas and refining them to create normative concepts (diSessa, 2014a), rather than abandoning entire ideas as a coherent whole. In this view, students’ naïve ideas are resources for developing scientific understanding rather than roadblocks to conceptual change (Minstrell, 1989, 1991). The KiP perspective holds strong implications for instruction. First, KiP has a small enough grain size of analysis to allow the tracking of individual learning, so instructional design can benefit from formative feedback (diSessa, 2014b; Kapon and diSessa, 2012). Second, KiP acknowledges complex contextuality, advocating that students be exposed to multiple contexts so that their knowledge can develop in each. The knowledge a learner draws on may be productive in one context but problematic in another. One example is the notion of “property,” which physics learners commonly apply to both force and energy (e.g., “One cart colliding with a second stopped cart gives [suggesting transfer of property] some force of motion to the second cart.”). Whereas regarding force as a property of matter is problematic for fully understanding forces and motion (FM), it is productive in learning about energy.

Facets initially emerged from classroom research on learners’ conceptual understanding in physics. What started as a list of interesting alternative ideas and reasoning related to core physics concepts was eventually organized, roughly ranked (in terms of degrees of instructional friction), and formalized into facet clusters. Most facets are paraphrases or slight abstractions of students’ expressions of their conceptual understanding, as recorded in the classroom and through others’ research on conceptual learning in science. In brief, a “facet” is an individual piece, or a construction of several pieces, of knowledge or reasoning strategies that the learner uses to explain a phenomenon or solve a problem. Some pieces reflect accurate understanding (i.e., goal facets), while others (i.e., intermediate facets) may be alternative intuitive ideas, partial conceptions, or even useful ideas misapplied in a context. In some literature, intermediate facets are referred to as “misconceptions,” which originate from students’ prior learning and lived experience with the world or in the classroom (Smolleck and Hershberger, 2011; Thompson and Logue, 2006). For instance, in Newtonian mechanics, students’ intermediate understanding of force and motion is formed by their everyday experiences in the physical world (Clement, 1987). In this paper, we use “intermediate facet” and “misconception” interchangeably, as they are treated the same in psychometric models, although the former carries asset-based connotations. A facets approach to assessment and instruction is a good starting point for the development of robust diagnostic assessment (Minstrell, 1991, 1992, 2000; Minstrell et al., 2015).

In contrast, the “attribute” in CDM is a generic term that represents psychological constructs including skills, knowledge states, cognitive processes, and rules. One subtle difference is that different condensation rules might be more appropriate for attributes vs. facets. For instance, the DINA model, the most frequently studied among CDMs, assumes a conjunctive rule, i.e., students need to master all the required attributes to answer an item correctly. This is most appropriate when considering attributes as skills and rules. For example, to respond correctly to the problem “4 5/7 − 1 4/7”, students need to master two skills: basic fraction subtraction and separating whole-number subtraction from fraction subtraction. On the other hand, facets represent small identifiable units of knowledge/reasoning that are building blocks of core knowledge within a content domain. As a result, an “additive” rather than conjunctive rule may be more appropriate, as it represents how different pieces/blocks of knowledge combine to build complete understanding of a particular concept.
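The contrast between the two condensation rules can be made concrete with a small sketch. The helper functions and weights below are our own illustration, not part of any CDM software: the conjunctive (DINA-style) rule gives credit only when every required attribute is mastered, while an additive (facet-style) rule lets each mastered required piece contribute on its own.

```python
import numpy as np

def conjunctive_ideal(alpha, q):
    """DINA-style rule: credit only if ALL q-required attributes are mastered."""
    return int(np.all(alpha[q == 1] == 1))

def additive_score(alpha, q, weights):
    """Additive rule: each mastered required facet contributes independently."""
    return float(np.sum(weights * q * alpha))

alpha = np.array([1, 0, 1])   # mastery profile over 3 attributes/facets
q = np.array([1, 1, 0])       # the item requires the first two
w = np.array([0.5, 0.7, 0.9]) # illustrative per-facet contributions

conjunctive_ideal(alpha, q)   # 0: the second required attribute is missing
additive_score(alpha, q, w)   # 0.5: the mastered first facet still contributes
```

Under the conjunctive rule, one missing attribute wipes out all credit; under the additive rule, partial knowledge still moves the score, which mirrors how pieces of understanding combine.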

Although mathematically we do not differentiate facets from attributes, they have different philosophical orientations and hence differ in their formative value for instruction. Attributes, primarily rooted in a summative paradigm of assessment, are useful for describing mastery or non-mastery of the skills that constitute the highest stage of learning. However, they reveal little to nothing about the pathways of conceptual development leading to that mastery. Therefore, while attributes can reveal students’ static state of skill mastery and inform the general focus of subsequent instruction, they are less useful for pinpointing specific conceptual needs or for promoting conceptual development. In contrast, facets stem from a formative paradigm, aimed at revealing the dynamics of conceptual learning and places of instructional opportunity. Moreover, recent studies have shown that goal and intermediate understandings may coexist (Stavy et al., 2006), even after a conceptual change has occurred or when a student responds correctly to an assessment item (Foisy et al., 2015; Potvin et al., 2015; Vosniadou and Verschaffel, 2004). Being aware of the pieces of understanding that students draw together to explain a phenomenon or solve a problem is essential for providing instruction and feedback responsive to learners’ needs.

1.2. Psychometric Models for Scoring

Currently, facet-based assessments are scored by computing subscores, that is, by tallying the number of times a student chooses an option that measures each facet. However, this approach not only ignores the inherently probabilistic nature of students’ responses but is also unreliable when a facet is measured by a small number of items (Cizek et al., 2004; Haberman et al., 2009; Tate, 2004). Since measurement errors are rarely reported for subscores, decisions made from them may be invalid if the errors are large. Alternatively, ordered multiple-choice (OMC) scoring (Briggs et al., 2006; Hadenfeldt et al., 2013) is a psychometrically better-justified, flexible tool that overcomes these limitations. With OMC, each response option is linked to discrete developmental levels of student understanding. Using the attribute hierarchy method (AHM), one can map out the facet mastery profile of each student with goal facets arranged along a linear hierarchy defined by the learning progression. However, the linear hierarchy assumption of OMC scoring is overly restrictive, as it precludes the possibility that students hold multiple intermediate facets simultaneously. In addition, certain intermediate facets may sit at the same developmental level, so strictly ordering them along a linear progression is too simplistic. In fact, research has shown that certain learning progressions, such as “forces and motion” in physics, are hard to map within the AHM framework (Alonzo and Steedle, 2009). Instead, CDM is a more versatile psychometric tool than AHM, allowing facets to connect more freely to reflect complex, and potentially nonlinear, learning pathways.

Table 1 compares CDM with two other popular modeling frameworks for diagnostic assessment: knowledge space theory and the rule space model, within which AHM is situated. Compared with these, CDM not only handles the probabilistic nature of human behavior, such as slipping and guessing, but is also immensely flexible in accommodating various cognitive processes (e.g., compensatory vs. conjunctive) and outcome spaces (e.g., dichotomous, polytomous, and nominal).

Table 1 Three different modeling frameworks for modeling learning.

Within the CDM framework, we propose a new model, namely, the diagnostic facet status model (DFSM), which handles a nominal outcome space to preserve the information encapsulated in each response option while also providing a fuller cognitive profile covering both goal and intermediate facets. Existing CDMs (see Table 2) can accommodate only one of these two flexibilities. Specifically, the widely used generalized deterministic input noisy “and” gate model (GDINA; de la Torre, 2011) and the log-linear CDM (LCDM; Henson et al., 2009) handle only goal facets and dichotomously (i.e., right or wrong) or polytomously (i.e., partial credit) scored responses; the scaling individuals and classifying misconceptions (SICM; Bradshaw and Templin, 2014) model handles option-level responses but only intermediate facets; the multiple-choice DINA likewise models option-level responses but only goal facets; and the simultaneously identifying skills and misconceptions (SISM; Kuo et al., 2018) model covers both goal and intermediate facets but at the item level rather than the option level, so it is suitable only for dichotomously scored items. One notable exception is the generalized diagnostic classification model for multiple-choice option-based scoring (GDCM-MC; DiBello et al., 2015), which models goal and intermediate facets at the item option level. However, GDCM-MC requires a three-value coding scheme per response option, placing higher demands on experts and hence increasing the chance of human-coding errors. Furthermore, GDCM-MC is estimated by a Bayesian Markov chain Monte Carlo (MCMC) algorithm, which does not scale well to large numbers of facets. DFSM requires a less complicated Q-matrix coding and has a more compact parameterization, which affords use of the faster expectation–maximization (EM) algorithm. As detailed in the next section, the EM algorithm with a proper penalty term scales well to high-dimensional facet spaces.
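To give a feel for how a sparsity-inducing penalty on the mixing proportions can eliminate impermissible latent classes, the sketch below shows one common way a log-penalty plays out in the M-step: each class's expected membership count is reduced by a constant and truncated at zero before renormalizing. This is a generic illustration under our own assumptions (the function name, the constant `gamma`, and the exact update are hypothetical); the paper's REM update may differ in form.

```python
import numpy as np

def penalized_mixing_update(expected_counts, gamma):
    """
    Sketch of an M-step update for mixing proportions under a
    sparsity-inducing log-penalty (illustrative, not the paper's exact rule).
    Classes whose expected membership falls below gamma get exactly zero mass.
    """
    trimmed = np.maximum(expected_counts - gamma, 0.0)
    total = trimmed.sum()
    if total == 0.0:
        raise ValueError("gamma too large: every class was eliminated")
    return trimmed / total

# Hypothetical E-step expected class sizes for four latent classes.
counts = np.array([120.0, 60.0, 1.5, 0.2])
pi = penalized_mixing_update(counts, gamma=2.0)
# The two tiny classes receive exactly zero mixing proportion, flagging
# them as candidates for impermissible facet combinations.
```

The key point is that, unlike a plain EM update (which keeps every proportion strictly positive), the truncation can return exact zeros, which is what allows permissible classes to be read off from the estimated proportions.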

Table 2 Unique features of DFSM compared to other CDMs.

A. model goal facets; B. model intermediate facets; C. scale well to large numbers of facets using an efficient algorithm; D. model option-level responses (i.e., nominal responses); and E. explore facet relationships.

Figure 1 An example item from “Forces and Motion” Unit.

2. Method

Before introducing DFSM, we first present an example item that motivated its development. Figure 1 provides an example item from the topic “Explaining constant speed” in the “Forces and Motion (FM)” unit in Diagnoser (Thissen-Roe et al., 2004), a set of online assessment and instruction tools designed to elicit and develop students’ thinking toward deeper conceptual understanding. Figure 2a presents the definitions of facets from this FM topic, and Fig. 2b shows the mapping of response options from the example item to the defined goal and intermediate facets. As shown, option C is the correct answer for this item; it measures both goal facets and none of the intermediate facets. In contrast, every distractor measures one or more intermediate facets and some (or no) goal facets.

Figure 2 Illustration of facet definition and option-to-facet mapping.

2.1. Diagnostic Facet Status Model (DFSM)

DFSM models the log-odds of student $i$ with facet profile $\left( \boldsymbol{\alpha}_{i},\boldsymbol{\beta}_{i} \right)$ choosing a distractor of item $j$ over the correct answer, as follows:

(1) $$\log \left( \frac{P\left( x_{ij}=k \right)}{P\left( x_{ij}=K \right)} \right) = \lambda_{j,0} + \boldsymbol{\lambda}_{j,1}\left[ \mathbf{h}\left( \boldsymbol{\alpha}_{i},\mathbf{q}_{j,k}^{g} \right) - \mathbf{h}\left( \boldsymbol{\alpha}_{i},\mathbf{q}_{j,K}^{g} \right) \right]^{T} + \boldsymbol{\lambda}_{j,2}\,\mathbf{h}\left( \boldsymbol{\beta}_{i},\mathbf{q}_{j,k}^{p} \right)^{T},$$

where $k=1,\ldots,K$ indexes the response options and, without loss of generality, $K$ is the correct response. $\lambda_{j,0}$ is the intercept, representing the logit of selecting a distractor over the correct response for a student with none of the facets. If $\lambda_{j,0}=0$, the probability of selecting the key equals that of selecting any distractor, i.e., every response option is equally likely. If $\lambda_{j,0}<0$, a student with none of the facets is more likely to select the key than any distractor, and a smaller $\lambda_{j,0}$ yields a higher probability of selecting the key, implying an easier item. In contrast, if $\lambda_{j,0}>0$, the key is less likely to be selected than any distractor. $\boldsymbol{\lambda}_{j,1}$ and $\boldsymbol{\lambda}_{j,2}$ are row vectors of slopes quantifying the effect of possessing additional goal and intermediate facets on the logit; they are constrained to be positive. Larger slopes indicate better discrimination, i.e., endorsement probabilities differ more between students with and without the relevant facets. $\mathbf{q}_{j,k}^{g}$ and $\mathbf{q}_{j,k}^{p}$ are binary vectors denoting the mapping of response option $k$ onto the goal and intermediate facets, respectively. Both $\boldsymbol{\lambda}_{j,1}^{T}\mathbf{h}\left( \boldsymbol{\alpha}_{i},\mathbf{q}_{j,k}^{g} \right)$ and $\boldsymbol{\lambda}_{j,2}^{T}\mathbf{h}\left( \boldsymbol{\beta}_{i},\mathbf{q}_{j,k}^{p} \right)$ are linear combinations of facet main effects and interactions.

In Eq. 1, if only the main effects are considered, the term $\boldsymbol{\lambda}_{j,1}^{T}\mathbf{h}\left( \boldsymbol{\alpha}_{i},\mathbf{q}_{j,k}^{g} \right)$ can be written out as

(2) $$\sum_{m=1}^{G} \lambda_{j,1,m}\, q_{j,k,m}^{g}\, \alpha_{i,m},$$

where $G$ is the total number of goal facets and $\lambda_{j,1,m}$ is the main effect due to $\alpha_{m}$. A similar form applies to $\boldsymbol{\lambda}_{j,2}^{T}\mathbf{h}\left( \boldsymbol{\beta}_{i},\mathbf{q}_{j,k}^{p} \right)$. We constrain all coefficients $\lambda$ to be non-negative, which means that a student possessing more goal facets not matched by a response option will be less likely to choose that distractor over the correct option, whereas a student possessing more matched problematic facets will be more likely to choose the distractor. Comparing Eq. 2 to the LCDM or the GDINA model, note that we do not include an intercept term, because the intercept is automatically absorbed into $\lambda_{j,0}$ in Eq. 1. As in GDINA and LCDM, both $\boldsymbol{\lambda}_{j,1}^{T}\mathbf{h}\left( \boldsymbol{\alpha}_{i},\mathbf{q}_{j,k}^{g} \right)$ and $\boldsymbol{\lambda}_{j,2}^{T}\mathbf{h}\left( \boldsymbol{\beta}_{i},\mathbf{q}_{j,k}^{p} \right)$ can also include interaction terms among facets as needed.

Given the logit form defined in Eq. 1, the conditional probability that student $i$ with facet profile $\left( \boldsymbol{\alpha}_{i},\boldsymbol{\beta}_{i} \right)$ chooses distractor $k$ of item $j$ is

(3) $$P(X_{ij}=k \mid \boldsymbol{\alpha}_{i},\boldsymbol{\beta}_{i},\boldsymbol{\lambda}_{j}) = \frac{\exp\left( \lambda_{j,0} + \boldsymbol{\lambda}_{j,1}\left[ \mathbf{h}\left( \boldsymbol{\alpha}_{i},\mathbf{q}_{j,k}^{g} \right) - \mathbf{h}\left( \boldsymbol{\alpha}_{i},\mathbf{q}_{j,K}^{g} \right) \right]^{T} + \boldsymbol{\lambda}_{j,2}\,\mathbf{h}\left( \boldsymbol{\beta}_{i},\mathbf{q}_{j,k}^{p} \right)^{T} \right)}{1 + \sum_{k'=1}^{K-1} \exp\left( \lambda_{j,0} + \boldsymbol{\lambda}_{j,1}\left[ \mathbf{h}\left( \boldsymbol{\alpha}_{i},\mathbf{q}_{j,k'}^{g} \right) - \mathbf{h}\left( \boldsymbol{\alpha}_{i},\mathbf{q}_{j,K}^{g} \right) \right]^{T} + \boldsymbol{\lambda}_{j,2}\,\mathbf{h}\left( \boldsymbol{\beta}_{i},\mathbf{q}_{j,k'}^{p} \right)^{T} \right)},$$

whereas the conditional probability that student $i$ chooses the correct response option for item $j$ is

(4) $$P(X_{ij}=K \mid \boldsymbol{\alpha}_{i},\boldsymbol{\beta}_{i},\boldsymbol{\lambda}_{j}) = \frac{1}{1 + \sum_{k'=1}^{K-1} \exp\left( \lambda_{j,0} + \boldsymbol{\lambda}_{j,1}\left[ \mathbf{h}\left( \boldsymbol{\alpha}_{i},\mathbf{q}_{j,k'}^{g} \right) - \mathbf{h}\left( \boldsymbol{\alpha}_{i},\mathbf{q}_{j,K}^{g} \right) \right]^{T} + \boldsymbol{\lambda}_{j,2}\,\mathbf{h}\left( \boldsymbol{\beta}_{i},\mathbf{q}_{j,k'}^{p} \right)^{T} \right)}.$$

Remark

To provide an intuition about DFSM, let us consider a simple item example with 2 goal facets and 2 intermediate facets. For an item with 4 response options, the Q-matrix takes the form of [1, 1, 0, 0; 0, 0, 1, 0; 0, 0, 0, 1; 0, 0, 1, 1]. Each row refers to one response option and the first option is the correct response. The four columns refer to α 1 , α 2 , β 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha _{1},\alpha _{2}\textrm{,}\beta _{1}$$\end{document} , and β 2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _{2}$$\end{document} , respectively. Set λ j , 1 = \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{\lambda }_{j,1}=$$\end{document} (1.5, 2) and λ j , 2 = \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{\lambda }_{j,2}=$$\end{document} (0.75, 1.25), then the probability of responding to each option given a facet profile is presented in Table 14. Several conclusions can be drawn from Table 14. First, when the facet profile is (1, 1, 0, 0), implying that a student masters all required goal facets and none of the intermediate facets, the chance of selecting the key is the highest (i.e.,.909), whereas the chance of selecting any distractor is the same (i.e.,.030). 
The chances of selecting distractors are not always the same; they coincide in this case because none of the distractors measures any goal facet. Second, when a student masters all required goal facets and one (or several) intermediate facets, the chance of selecting the key understandably drops. More generally, consider a student with facet profile $(\boldsymbol{\alpha},\boldsymbol{\beta})$ and an item whose q-vector (i.e., row vector) for option $k$ is denoted $\mathbf{q}_{k}$. If $(\boldsymbol{\alpha},\boldsymbol{\beta})-\mathbf{q}_{k_{1}}=(\boldsymbol{\alpha},\boldsymbol{\beta})-\mathbf{q}_{k_{2}}$, then the chances of selecting options $k_{1}$ and $k_{2}$ are the same. Third, if the slope on a facet is high, then possessing that facet leads to a higher chance of selecting an option that measures that facet.
For instance, in Table 14, because the slope on $\alpha_{2}$ is higher than the slope on $\alpha_{1}$, the third facet profile yields a higher chance of selecting the key (i.e., .69) than the second facet profile. The same conclusion applies to the intermediate facets as well.
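As an illustrative sketch (not the released estimation code), the probabilities discussed above can be reproduced under the additive form of $\mathbf{h}$. Since Table 14 is not shown here, the intercept $\lambda_{j,0}=0.1$ below is an assumed value, chosen so that the key probability for profile (1, 1, 0, 0) lands near .909.

```python
import numpy as np

def option_probs(alpha, beta, Q, lam0, lam1, lam2):
    """Option-response probabilities for one DFSM item (additive h).

    Q: options-by-facets Q-matrix; row 0 is the key (reference option).
    alpha, beta: 0/1 goal and intermediate facet profiles.
    lam1, lam2: slope vectors on the goal and intermediate facets.
    """
    M_g = len(alpha)
    Qg, Qp = Q[:, :M_g], Q[:, M_g:]
    h_goal = Qg @ (lam1 * alpha)   # lambda_{j,1} h(alpha, q^g)^T, per option
    h_int = Qp @ (lam2 * beta)     # lambda_{j,2} h(beta, q^p)^T, per option
    # distractor logits relative to the key (option 0 is the reference)
    z = lam0 + (h_goal[1:] - h_goal[0]) + h_int[1:]
    num = np.concatenate(([1.0], np.exp(z)))
    return num / num.sum()

# the example item: columns alpha1, alpha2, beta1, beta2; row 0 is the key
Q = np.array([[1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 1]])
p = option_probs(np.array([1, 1]), np.array([0, 0]), Q,
                 lam0=0.1, lam1=np.array([1.5, 2.0]),
                 lam2=np.array([0.75, 1.25]))
# with the assumed intercept, p is approximately [.909, .030, .030, .030]
```

With the same assumed intercept, profile (0, 1, 0, 0) yields a key probability near .69, matching the comparison of the second and third facet profiles above.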

Note that in Eqs. 1–3, the $\lambda$'s are defined at the item level instead of the response-option level; hence there is no subscript $k$. This choice was made for two reasons: (1) the model is more parsimonious in general; and (2) if, for every item, each facet is measured by only one response option (that is, if one computes the column sums of an item's Q-matrix, the column-sum vector contains only 0's and 1's), then adding the subscript $k$ or not leads to the same model. We fitted both versions of the DFSM, one with item-level intercepts and slopes and the other with item-option-level intercepts and slopes, to the real data set and noted that the latter model did not converge. However, we provide estimation code for both models at https://github.com/wang4066/HARLI1, and readers may consider the more flexible version if they have enough data to support its estimation.

2.2. Regularized EM Algorithm

For model estimation, the regularized EM algorithm proceeds as follows, assuming $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ reside in the same space. Let $\boldsymbol{\lambda}$ denote the set of item parameters, suppose the form of $\mathbf{h}$ is specified in advance (e.g., the additive form), and let $\boldsymbol{\pi}$ denote the vector of the $\pi_{\boldsymbol{\alpha},\boldsymbol{\beta}}$'s representing the latent class mixing proportions. Then the marginal log-likelihood of $(\boldsymbol{\lambda},\boldsymbol{\pi})$ given a response matrix $\mathbf{Y}$ is

(5)
$$\log L_{N}(\boldsymbol{\lambda},\boldsymbol{\pi}\mid\mathbf{Y})=\sum_{i=1}^{N}\log\left[\sum_{(\boldsymbol{\alpha},\boldsymbol{\beta})\in\{0,1\}^{M}}\pi_{\boldsymbol{\alpha},\boldsymbol{\beta}}\prod_{j=1}^{J}\prod_{k=1}^{K}\theta_{j,k}^{I(y_{ij}=k)}\right],$$

where $N$ is the sample size, and $\theta_{j,k}$ is the conditional probability defined in Eq. 3 (or 4), which is a function of $\boldsymbol{\lambda}$. A regularized EM (REM) algorithm with a log-penalty on $\pi_{l}$ is proposed so that the mixing proportions associated with impermissible latent classes shrink strictly to 0. In the REM algorithm, the objective function to be maximized takes the following form:

(6)
$$\mathop{\textrm{arg max}}_{\boldsymbol{\theta},\boldsymbol{\pi}}\left[\log L_{N}(\boldsymbol{\lambda},\boldsymbol{\pi}\mid\mathbf{X})+\gamma\sum_{l=1}^{2^{M}}\log_{\rho_{N}}(\pi_{l})\right],\quad\text{subject to the constraint }\sum_{l=1}^{2^{M}}\pi_{l}=1.$$

In Eq. 6, $M$ is the total number of goal and intermediate facets, and $\log_{\rho_{N}}(\pi_{l})=\log(\pi_{l})\times I(\pi_{l}>\rho_{N})+\log(\rho_{N})\times I(\pi_{l}\le\rho_{N})$, where $\rho_{N}\approx\frac{1}{N}$ is a small threshold parameter that suppresses the singularity of the log function (Gu & Xu, 2019; Wang, 2021a). $\gamma\in(-\infty,0)$ is a tuning parameter, and a smaller $\gamma$ yields more sparsity in $\boldsymbol{\pi}$.
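As a minimal sketch of the penalized objective in Eqs. 5–6 (illustrative only, not the authors' implementation), the conditional probabilities are assumed here to be precomputed into a hypothetical array `theta[l, j, k]`, one per latent class, item, and option:

```python
import numpy as np

def log_rho(pi, rho):
    """Thresholded log of Eq. 6: log(pi_l) if pi_l > rho_N, else log(rho_N)."""
    pi = np.asarray(pi, dtype=float)
    return np.where(pi > rho, np.log(np.maximum(pi, rho)), np.log(rho))

def penalized_loglik(pi, theta, Y, gamma):
    """Marginal log-likelihood (Eq. 5) plus the log-penalty of Eq. 6.

    pi: (L,) mixing proportions over latent classes.
    theta: (L, J, K) conditional option probabilities per class and item.
    Y: (N, J) observed option indices in {0, ..., K-1}.
    gamma < 0 tunes the sparsity of pi; rho_N is set to 1/N.
    """
    N, J = Y.shape
    rho = 1.0 / N
    # per-class log-likelihood of each person's response pattern
    cls_ll = np.stack([np.log(theta[:, np.arange(J), Y[i]]).sum(axis=1)
                       for i in range(N)])          # (N, L)
    m = cls_ll.max(axis=1, keepdims=True)           # log-sum-exp for stability
    marg = (m.squeeze(1) + np.log(np.exp(cls_ll - m) @ pi)).sum()
    return marg + gamma * log_rho(pi, rho).sum()
```

The `np.maximum` guard inside `log_rho` only prevents `log(0)` warnings on the branch that is discarded by `np.where`; it does not change the returned values.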
At the convergence of the REM algorithm, $\hat{\pi}_{l}$ is compared to $\rho_{N}$, and only classes with $\hat{\pi}_{l}>\rho_{N}$ are retained as permissible classes. Details of the REM algorithm for the proposed DFSM are given below.

2.3. E-step

In the E-step, one computes the conditional expectation of the complete-data log-likelihood, $l_{N}(\boldsymbol{\lambda},\boldsymbol{\pi}\mid\mathbf{X},\boldsymbol{\alpha},\boldsymbol{\beta})$, with respect to the posterior distributions of $(\boldsymbol{\alpha}_{i},\boldsymbol{\beta}_{i})$, $i=1,\dots,N$. First, write the complete-data log-likelihood as follows:

(7)
$$\log L_{N}(\boldsymbol{\lambda},\boldsymbol{\pi}\mid\mathbf{X},\boldsymbol{\alpha},\boldsymbol{\beta})=\sum_{i=1}^{N}\left\{\sum_{j=1}^{J}\sum_{k=1}^{K_{j}}I\left(x_{ij}=k\right)\log\theta_{jk,\boldsymbol{\alpha}_{i},\boldsymbol{\beta}_{i}}+\log f(\boldsymbol{\alpha}_{i},\boldsymbol{\beta}_{i}\mid\boldsymbol{\pi})\right\},$$

where the $\theta_{jk,\boldsymbol{\alpha}_{i},\boldsymbol{\beta}_{i}}$'s are the conditional probabilities defined in Eq. 3 or 4. Then its expectation is

(8)
$$\begin{aligned} E_{\boldsymbol{\alpha},\boldsymbol{\beta}}\left[\log L_{N}(\boldsymbol{\lambda},\boldsymbol{\pi}\mid\mathbf{X},\boldsymbol{\alpha},\boldsymbol{\beta})\right]&=\sum_{i=1}^{N}\left\{\sum_{(\boldsymbol{\alpha},\boldsymbol{\beta})\in\{0,1\}^{M}}P_{i(\boldsymbol{\alpha},\boldsymbol{\beta})}^{r}\left[\sum_{j=1}^{J}\sum_{k=1}^{K_{j}}I\left(x_{ij}=k\right)\log\theta_{jk,\boldsymbol{\alpha}_{i},\boldsymbol{\beta}_{i}}+\log f(\boldsymbol{\alpha}_{i},\boldsymbol{\beta}_{i}\mid\boldsymbol{\pi})\right]\right\}\\ &=\sum_{j=1}^{J}\left[\sum_{i=1}^{N}\sum_{(\boldsymbol{\alpha},\boldsymbol{\beta})\in\{0,1\}^{M}}P_{i(\boldsymbol{\alpha},\boldsymbol{\beta})}^{r}\sum_{k=1}^{K_{j}}I\left(x_{ij}=k\right)\log\theta_{jk,\boldsymbol{\alpha}_{i},\boldsymbol{\beta}_{i}}\right]+\sum_{i=1}^{N}\left[\sum_{(\boldsymbol{\alpha},\boldsymbol{\beta})\in\{0,1\}^{M}}P_{i(\boldsymbol{\alpha},\boldsymbol{\beta})}^{r}\log\pi_{l}\right], \end{aligned}$$

where $P_{i(\boldsymbol{\alpha},\boldsymbol{\beta})}^{r}\equiv P_{i(\boldsymbol{\alpha},\boldsymbol{\beta})\in l}^{r}=P(\boldsymbol{\alpha}_{i}=\boldsymbol{\alpha}_{l},\boldsymbol{\beta}_{i}=\boldsymbol{\beta}_{l}\mid\boldsymbol{\lambda}^{r},\boldsymbol{\pi}^{r})$ is the posterior probability of $(\boldsymbol{\alpha}_{l},\boldsymbol{\beta}_{l})$ for person $i$ given the parameter estimates from the $r$th iteration.

Let $H_{j}^{r}\equiv\sum_{i=1}^{N}\sum_{(\boldsymbol{\alpha},\boldsymbol{\beta})\in\{0,1\}^{M}}P_{i(\boldsymbol{\alpha},\boldsymbol{\beta})}^{r}\sum_{k=1}^{K_{j}}I\left(x_{ij}=k\right)\log\theta_{jk,\boldsymbol{\alpha}_{i},\boldsymbol{\beta}_{i}}$ for notational simplicity hereafter; hence, Eq. 8 simplifies to

(9)
$$E_{\boldsymbol{\alpha},\boldsymbol{\beta}}\left[\log L_{N}(\boldsymbol{\lambda},\boldsymbol{\pi}\mid\mathbf{X},\boldsymbol{\alpha},\boldsymbol{\beta})\right]=\sum_{j=1}^{J}H_{j}^{r}+\sum_{i=1}^{N}\left[\sum_{(\boldsymbol{\alpha},\boldsymbol{\beta})\in\{0,1\}^{M}}P_{i(\boldsymbol{\alpha},\boldsymbol{\beta})}^{r}\log\pi_{l}\right].$$

Given that the last term in Eq. 9 is unrelated to $\boldsymbol{\lambda}$, it will be considered separately.
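The posterior weights $P_{i(\boldsymbol{\alpha},\boldsymbol{\beta})}^{r}$ that drive the E-step can be sketched as follows (an illustrative helper; as before, the conditional probabilities are assumed to be stored in a hypothetical array `theta[l, j, k]`):

```python
import numpy as np

def e_step(pi, theta, Y):
    """E-step posterior: P[i, l] proportional to pi_l * prod_j theta[l, j, Y[i, j]].

    pi: (L,) current mixing proportions; theta: (L, J, K) conditional
    probabilities; Y: (N, J) observed option indices.
    Returns an (N, L) matrix whose rows sum to 1.
    """
    N, J = Y.shape
    cls_ll = np.stack([np.log(theta[:, np.arange(J), Y[i]]).sum(axis=1)
                       for i in range(N)])       # (N, L) log-likelihoods
    w = cls_ll + np.log(pi)                      # add the log prior
    w -= w.max(axis=1, keepdims=True)            # stabilize before exponentiating
    P = np.exp(w)
    return P / P.sum(axis=1, keepdims=True)      # Bayes' rule normalization
```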

2.4. M-step

In the M-step, we maximize $E_{\boldsymbol{\alpha},\boldsymbol{\beta}}\left[l_{N}(\boldsymbol{\lambda},\boldsymbol{\pi}\mid\mathbf{X},\boldsymbol{\alpha},\boldsymbol{\beta})\right]+\gamma\sum_{l=1}^{2^{M}}\log_{\rho_{N}}(\pi_{l})$ with respect to $\boldsymbol{\theta}$ and $\boldsymbol{\pi}$, respectively. Specifically, letting $P_{i(\boldsymbol{\alpha},\boldsymbol{\beta})}^{r}\equiv P_{il}^{r}$ for notational simplicity, the objective function of $\pi_{l}$ to be maximized becomes

(10)
$$\log L_{N}(\boldsymbol{\pi})=\sum_{i=1}^{N}\left[\sum_{l=1}^{2^{M}}P_{il}^{r}\log\pi_{l}\right]+\gamma\sum_{l=1}^{2^{M}}\log_{\rho_{N}}(\pi_{l})+s\left(1-\sum_{l=1}^{2^{M}}\pi_{l}\right),$$

where $s$ is a Lagrange multiplier, and the last term in Eq. 10 reflects the constraint that $\sum_{l=1}^{2^{M}}\pi_{l}=1$. Then

$$\frac{\partial\log L_{N}(\boldsymbol{\pi})}{\partial\pi_{l}}=\frac{\gamma+\sum_{i=1}^{N}P_{il}^{r}}{\pi_{l}}-s=0,$$

such that $\pi_{l}=\frac{\gamma+\sum_{i=1}^{N}P_{il}^{r}}{s}$. The key is to derive the term $s$. Since $\sum_{l=1}^{2^{M}}\pi_{l}=1$, we have $\sum_{l=1}^{2^{M}}\frac{\gamma+\sum_{i=1}^{N}P_{il}^{r}}{s}=1$, and hence $s=\sum_{l=1}^{2^{M}}\left(\gamma+\sum_{i=1}^{N}P_{il}^{r}\right)$. Based on this derivation, $\pi_{l}$ is updated using the following two steps.

  1. Let $\nu_{l}^{(r)}=\max\left\{c,\gamma+\sum_{i=1}^{N}P_{il}^{r}\right\}$, where $c>0$ is a pre-specified value and $P_{il}^{r}\equiv P_{i(\boldsymbol{\alpha},\boldsymbol{\beta})}^{r}$, where $(\boldsymbol{\alpha},\boldsymbol{\beta})$ belongs to the $l$th group.

  2. $\pi_{l}^{(r)}=\frac{\nu_{l}^{(r)}}{\sum_{l'=1}^{L}\nu_{l'}^{(r)}}$, where $L$ denotes the total number of latent classes.

Here, $c$ was chosen to be a very small value, in our case $c=.00001$. As in Gu and Xu (2019) and Wang (2021b), $c$ was prespecified and included to avoid negative counts. Note that $\gamma<0$; depending on its value, it is possible that $\gamma+\sum_{i=1}^{N}P_{il}^{r}<0$. This is impermissible because this quantity plays the role of the penalized expected number of people in latent class $l$, which cannot be negative. Adding $c$ is thus a numerical trick that stabilizes the algorithm; it does not affect the results because $c$ is set to be very small.
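The two-step update above can be sketched as follows (a hypothetical helper, with `P_post` denoting the $(N, L)$ posterior matrix from the E-step):

```python
import numpy as np

def update_pi(P_post, gamma, c=1e-5):
    """Penalized M-step update of the mixing proportions (steps 1 and 2).

    P_post: (N, L) posterior class-membership probabilities.
    gamma < 0 is the sparsity tuning parameter; the floor c > 0 keeps the
    penalized expected class counts from going negative.
    """
    nu = np.maximum(c, gamma + P_post.sum(axis=0))   # step 1: floored counts
    return nu / nu.sum()                             # step 2: renormalize
```

With $\gamma$ sufficiently negative, any class whose expected count falls below $|\gamma|$ is floored at $c$, so its estimated proportion ends up far below $\rho_{N}\approx 1/N$ and the class is pruned after convergence.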
Each item's parameters are updated by maximizing $H_{j}^{r}$ separately. The full REM algorithm is presented in Table 3.

Table 3 The REM algorithm for DFSM.

2.5. Model Identifiability

The DFSM is identified when its model parameters can be uniquely estimated from the observed responses. The identifiability of the DFSM can be studied using the conclusions established by Liu and Culpepper (2023), who derived strict and generic identifiability conditions for restricted latent class models (RLCMs) for nominal response data. Because the DFSM is a special CDM for nominal responses, their conclusions generalize to our context. Specifically, we first construct a $\Delta$-matrix for an item from its Q-matrix. For example, consider an item $j$ with 3 response options, i.e., $K_{j}=3$; assume for simplicity that the total number of facets is $M=3$ (e.g., 1 goal facet and 2 intermediate facets); and presume the Q-matrix takes the form (note that the first option is the key) $\mathbf{Q}_{j}=\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}$.
Different from Liu and Culpepper (2023), the DFSM contains the term $\left[\mathbf{h}\left(\boldsymbol{\alpha}_{i},\mathbf{q}_{j,k}^{g}\right)-\mathbf{h}\left(\boldsymbol{\alpha}_{i},\mathbf{q}_{j,K}^{g}\right)\right]$; hence, we need to derive a $\mathbf{Q}_{j}^{*}$ by subtracting row 1 from rows 2 and 3, respectively, before obtaining the $\Delta$-matrix. That is,

$$\mathbf{Q}_{j}^{*}=\begin{bmatrix}0&0&0\\-1&1&0\\-1&0&1\end{bmatrix}\rightarrow\boldsymbol{\Delta}_{j}=\begin{bmatrix}0&0&0&0&0&0&0&0\\1&-1&1&0&0&0&0&0\\1&-1&0&1&0&0&0&0\end{bmatrix}\rightarrow\boldsymbol{\lambda}_{j}=\begin{bmatrix}0&0&0&0&0&0&0&0\\\lambda_{j,0}&-\lambda_{j,1,1}&\lambda_{j,2,1}&0&0&0&0&0\\\lambda_{j,0}&-\lambda_{j,1,1}&0&\lambda_{j,2,2}&0&0&0&0\end{bmatrix}.$$

Here, $\boldsymbol{\Delta}_{j}$ is a $K_{j}$-by-$2^{M}$ matrix, and as shown, the first row refers to the first option, which is the "reference" category in the nominal-response DFSM model. The last two rows refer to the two distractors. Each column refers to one possible facet profile, and the first column refers to the intercept. Consistent with previous notation, the three subscripts in $\lambda_{j,1,1}$ refer to item $j$, the goal facet, and $\alpha_{1}$, respectively.
Similarly, the subscripts in $\lambda_{j,2,2}$ refer to item $j$, the intermediate facet, and $\beta_{2}$, respectively. Since we only consider "main" effects of facets on item responses, the interaction columns 5 to 8 are all 0's in both $\boldsymbol{\Delta}_{j}$ and $\boldsymbol{\lambda}_{j}$.
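The mapping above can be sketched in a few lines of Python. This is an illustrative sketch, not the estimation code used in this study; the function name `delta_from_q` and the array layout are our own, and only main effects are filled in (interaction columns stay zero).

```python
import numpy as np

# Illustrative sketch: build Q_j* by subtracting the reference (key) row of an
# item's option-level Q-matrix from every row, then place the intercept and
# main-effect entries into the first 1 + M columns of the K_j-by-2^M Delta_j
# matrix. Interaction columns remain zero because only main effects are modeled.
def delta_from_q(Q_j):
    K_j, M = Q_j.shape
    Q_star = Q_j - Q_j[0]                # subtract row 1 (the key) from each row
    Delta_j = np.zeros((K_j, 2 ** M), dtype=int)
    for k in range(1, K_j):              # distractor rows; the key row stays 0
        Delta_j[k, 0] = 1                # intercept column
        Delta_j[k, 1:M + 1] = Q_star[k]  # one main-effect column per facet
    return Delta_j

# The worked example: the key measures facet 1; the two distractors measure
# facets 2 and 3, respectively.
Q_j = np.array([[1, 0, 0],
                [0, 1, 0],
                [0, 0, 1]])
print(delta_from_q(Q_j))
```

Running this on the example item reproduces the $\boldsymbol{\Delta}_{j}$ shown above, with the two distractor rows carrying $(1,-1,1,0,\ldots)$ and $(1,-1,0,1,\ldots)$.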

According to Liu and Culpepper (2023), the nominal RLCM is generically identifiable if the sparse three-dimensional array $\left( \boldsymbol{\Delta} \right)_{J\times \sum K_{j}\times 2^{M}}=\begin{bmatrix} \boldsymbol{\Delta}^{1}\\ \boldsymbol{\Delta}^{2}\\ \boldsymbol{\Delta}^{'} \end{bmatrix}$ satisfies the following two conditions:

(a) For $j=1,\ldots,M$, $\boldsymbol{\Delta}_{j}^{1}$ and $\boldsymbol{\Delta}_{j}^{2}$ meet the structure of

$$
\boldsymbol{\Delta}_{j}=\begin{bmatrix} 0 &{} \cdots &{} 0 &{} 0 &{} 0 &{} \cdots &{} 0\\ * &{} \cdots &{} * &{} \delta_{j,j,1} &{} * &{} \cdots &{} *\\ \vdots &{} &{} \vdots &{} \vdots &{} \vdots &{} &{} \vdots\\ * &{} \cdots &{} * &{} \delta_{j,j,K_{j}-1} &{} * &{} \cdots &{} * \end{bmatrix}_{K_{j}\times 2^{M}}
$$

where the first row refers to the reference option, which, in the case of DFSM, refers to the answer key; $*$ can be either 0 or 1; and $\sum\nolimits_{k=1}^{K_{j}-1} \delta_{j,j,k} \ge 1$. The latter constraint implies that for item $j$, at least one of the response options must measure facet $j$. In other words, this condition implies that, in the test, each facet $m$ is measured by at least 2 items, each of which has at least one response option measuring $m$. Hence, the test length is at least $2\times M$.

(b) For every $m=1,\ldots,M$, $\boldsymbol{\Delta}^{'}$ satisfies that there exists an item $j>2M$ such that $\sum\nolimits_{k=1}^{K_{j}-1} \delta_{j,m,k} \ge 1$. This implies that at least one item among the last $J-2M$ items has a response option that measures facet $m$.

Note that the generic identification condition described above is directly adopted from Theorem 2 of Liu and Culpepper (2023). Although in the current DFSM setup the loadings and slopes are at the item (instead of item-option) level (i.e., see the $\boldsymbol{\lambda}_{j}$ matrix), because every response option of an item measures different facets, the rank of the $\boldsymbol{\lambda}_{j}$ matrix is the same regardless of whether the parameters are defined at the item level or the item-option level. Furthermore, the proof of Theorem 2 only requires $\mathbf{p}_{j0}\ne \mathbf{p}_{j1}$ for $j=1,\ldots,M$ (see Equation A15 in their paper), where $\mathbf{p}_{j0}$ is the two-dimensional vector containing the probability of selecting the key for a given facet profile and one minus that probability (i.e., the two elements of $\mathbf{p}_{j0}$ sum to 1), and $\mathbf{p}_{j1}$ is the two-dimensional vector containing the probability of selecting a distractor for a given facet profile and one minus that probability. The inequality $\mathbf{p}_{j0}\ne \mathbf{p}_{j1}$ easily holds when condition (a) above is satisfied.
In fact, using the example item presented in the earlier remark, as shown in Table 14, the probability of selecting the key is always different from the probability of selecting the distractor for all possible facet profiles.
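Conditions (a) and (b) can be verified mechanically for a candidate design. The following is a hedged sketch (our own code, not part of the paper): it assumes a binary array `delta[j, k, m]` equal to 1 when option `k` of item `j` measures facet `m`, with `k = 0` the answer key (reference option), and it assumes the items are already ordered so that the first $2M$ items form the two required blocks.

```python
import numpy as np

# Sketch of a checker for the two generic-identifiability conditions.
def generically_identifiable(delta, M):
    J = delta.shape[0]
    if J < 2 * M:
        return False                        # need at least 2M items in total
    for block in (0, M):                    # condition (a): the two item blocks
        for j in range(M):
            # item (block + j) needs a non-key option measuring facet j
            if delta[block + j, 1:, j].sum() < 1:
                return False
    for m in range(M):                      # condition (b): remaining J - 2M items
        if delta[2 * M:, 1:, m].sum() < 1:
            return False
    return True
```

For instance, with $M=2$ facets and $J=5$ three-option items, the checker returns `True` only if each of the first four items has a non-key option loading on its designated facet and the fifth item covers both facets.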

2.6. Facet Map

Another instructionally formative piece of information generated by DFSM is a comprehensive facet map, which includes both directional and non-directional links between facets. A directional link implies a hierarchical structure (Dahlgren et al., 2006; Simon & Tzur, 2004) reflecting theories about students' learning, in which a piece of knowledge learned earlier serves as a prerequisite for more advanced knowledge, forming a so-called learning trajectory (LT; Corcoran et al., 2009; Daro et al., 2011; Templin & Bradshaw, 2014). An LT allows for optimizing instructional design by offering theory-driven sequences of cognitive progression. In contrast, facets joined by a non-directional link form a "conjoined facet," which represents crosscutting thematic ideas. Identifying conjoined facets could provide teachers with insight into the persistent ideas and reasoning that a student applies across multiple learning units (with variable success), and which may require specific attention. For example, a dominant physics learner conception is that "motion implies an on-going force of motion" (see Option A in the item in Fig. 1), which means students are treating force as a property of the object. Perhaps if net force on an object were taught as a mechanism conjoined with a resulting change in the property of momentum or kinetic energy of the object, student learning would be more naturally and productively facilitated.

Both facet hierarchies and conjoined facets can be inferred from the permissible facet profiles. Specifically, a hierarchical relationship exists between a pair of facets when mastery of a facet $m$ is required prior to mastery of another facet $m^{'}$. That is, for person $i$, $\alpha_{im}=1$ is a necessary but not sufficient condition for $\alpha_{im^{'}}=1$. As a result, the combination $(\alpha_{im},\alpha_{im^{'}})$ has only 3 possible values instead of 4, i.e., (0, 0), (1, 0), and (1, 1). For conjoined facets (i.e., connected topics that should be treated as a unit), only 2 are possible, i.e., (0, 0) and (1, 1). That is, a student will either master both facets or master neither, resulting in a non-directional link between them. The non-directional link implies that both facets in the conjoined set are at the same level; neither is a prerequisite of the other.

We propose a deductive algorithm to infer the links among facets and eventually construct a facet map. First, based on the significant latent classes, we check pairs of facets separately to deduce the feasible profiles based on them and place a directional arrow (if any) between them, resulting in a directed acyclic graph (DAG). The DAG is denoted as $G(V, E)$, where $V$ is a set of nodes (i.e., facets) and $E$ is a set of directed edges that represent the hierarchical relationships among facets. Then we perform a transitive reduction to obtain a simple structure. A transitive reduction is a subgraph with the fewest edges that maintains the same reachability (i.e., directed connections) as $G$ (Chen & Wang, 2023).

Figure 3. Illustration of the deductive algorithm to construct the DAG and the added strength of the relationships.

When we put both goal and intermediate facets in the same map, we first need to "reverse" the profiles of intermediate facets before using the same algorithm to determine $G$; that is, recoding 0 as 1 and 1 as 0, respectively. This is because the possession of an intermediate facet implies a lack of the relevant knowledge. Using $\alpha_{1}$ and $\beta_{1}\sim\beta_{3}$ in Fig. 2 as an example, Fig. 3 illustrates the proposed deductive algorithm for constructing the DAG. It follows three steps:

  (1) Step 1: For the final $N$-by-$K$ (total number of goal and intermediate facets) estimated facet profile matrix, reverse-code the columns related to the $\boldsymbol{\beta}$'s.

  (2) Step 2: For each pair of facets, check the number of permissible patterns. If this number is smaller than 4, there is a potential link between the pair. When the number is 2, it implies the existence of a conjoined facet, i.e., a non-directional arrow. When the number is 3, it implies a hierarchy. The direction of the arrow depends on the specific permissible patterns. In the example in Fig. 3, the permissible patterns for $(\beta_{1}, \alpha_{1})$ are [0, 0], [1, 0], and [1, 1], resulting in an arrow from $\beta_{1}$ to $\alpha_{1}$, denoting that $\beta_{1}$ is a prerequisite of $\alpha_{1}$.

  (3) Step 3: Transitive reduction. We only want to include direct links. If a facet has two or more arrows pointing toward it, there may exist redundancy. For instance, in Fig. 3, $\beta_{1}$ connects to $\beta_{2}$ through two routes: $\beta_{1}\rightarrow\beta_{2}$ and $\beta_{1}\rightarrow\alpha_{1}\rightarrow\beta_{2}$. As a result, the direct connection between $\beta_{1}$ and $\beta_{2}$ can be eliminated to make the DAG more parsimonious without changing its meaning. With a small number of facets, this step can be accomplished by inspecting the full DAG. When the total number of facets is large, we can construct an adjacency matrix of the DAG, $\mathbf{A}$, based on the pairwise relationships among facets. The $\mathbf{A}$-matrix for the example in Fig. 3 takes the following form, with facets ordered $(\beta_{1},\alpha_{1},\beta_{2},\beta_{3})$:

$$
\mathbf{A}=\begin{bmatrix} 0 &{} 1 &{} 1 &{} 1\\ 0 &{} 0 &{} 1 &{} 1\\ 0 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0 \end{bmatrix}
\xrightarrow{\text{transitive reduction}}
\tilde{\mathbf{A}}=\begin{bmatrix} 0 &{} 1 &{} 0 &{} 0\\ 0 &{} 0 &{} 1 &{} 1\\ 0 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0 \end{bmatrix}.
$$

That is, for any pair of facets denoted by $(m_{1},m_{2})$, if $(A^{2})_{m_{1},m_{2}}>0$, then we set $A_{m_{1},m_{2}}=0$, resulting in the second matrix above. The full, parsimonious DAG can be constructed based on $\tilde{\mathbf{A}}$.
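Steps 2 and 3 can be sketched as follows. This is an illustrative implementation in our own notation, not the paper's code; it deduces directed links from the permissible profiles (with intermediate facets assumed already reverse-coded) and then applies the $A^{2}$ pruning rule. Conjoined facets (only 2 permissible patterns) would receive a bidirectional link and are omitted here for brevity.

```python
import numpy as np

# Step 2 sketch: a directed edge a -> b exists when the pair (a, b) shows
# exactly 3 permissible patterns and the missing pattern is (0, 1), i.e.,
# facet a is a prerequisite of facet b.
def adjacency_from_profiles(profiles):
    P = np.array(profiles)
    K = P.shape[1]
    A = np.zeros((K, K), dtype=int)
    for a in range(K):
        for b in range(K):
            if a == b:
                continue
            patterns = {tuple(row) for row in P[:, [a, b]]}
            if len(patterns) == 3 and (0, 1) not in patterns:
                A[a, b] = 1
    return A

# Step 3 sketch: drop edge (m1, m2) whenever (A^2)_{m1,m2} > 0.
def transitive_reduction(A):
    return A * ((A @ A) == 0)

# Permissible profiles consistent with Fig. 3, facet order (b1, a1, b2, b3):
profiles = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0),
            (1, 1, 1, 0), (1, 1, 0, 1), (1, 1, 1, 1)]
A = adjacency_from_profiles(profiles)
print(transitive_reduction(A))
```

On these hypothetical profiles, the function reproduces the $\mathbf{A}$ and $\tilde{\mathbf{A}}$ matrices shown above.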

Once the DAG is constructed, we propose to further infer the strength of each arrow based on the estimated mixing proportions, as illustrated by the probabilities marked in red in Fig. 3. The strength value is a probability between 0 and 1, with larger values indicating stronger relationships. As shown, such values are simply the proportion of students who possess both facets (again with intermediate facets reverse-coded). This is intuitive because if a high proportion of students simultaneously possess both facets rather than merely the prerequisite, there is a strong pathway between them; say, from $\beta_{1}$ to $\alpha_{1}$: a student who has matured from the intermediate facet $\beta_{1}$ is highly likely to master $\alpha_{1}$ as well (Footnote 2). A conjoined facet is formed by adding a bidirectional arrow between two facets, which happens when 2 out of 4 latent profiles are permissible, implying the co-occurrence of the two facets.
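Computing an edge strength is then a one-line aggregation over the estimated permissible classes. The sketch below is ours (the names `classes`, `pi`, and `edge_strength` are not from the paper): `pi` holds the estimated mixing proportions and `classes` the corresponding 0/1 profiles, with intermediate facets already reverse-coded.

```python
import numpy as np

# Edge strength m1 -> m2: the estimated proportion of students possessing
# both facets, summed over the permissible latent classes.
def edge_strength(classes, pi, m1, m2):
    classes = np.asarray(classes)
    both = (classes[:, m1] == 1) & (classes[:, m2] == 1)
    return float(np.sum(np.asarray(pi)[both]))

# Toy example: three permissible classes over facets (m1, m2).
strength = edge_strength([[0, 0], [1, 0], [1, 1]], [0.2, 0.3, 0.5], 0, 1)
print(strength)  # 0.5: only the (1, 1) class possesses both facets
```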

3. Simulation Study

A simulation study was conducted to evaluate the REM algorithm for DFSM parameter recovery. The sample size was fixed at 4,000; we intentionally selected a large sample size due to model complexity. Test length was fixed at 27, and each item had 4 response options. The test was assumed to measure 3 goal facets and 5 intermediate facets, to be consistent with our real data example. The Q-matrix was constructed as follows:

  • Items 1–9: the key requires 1 goal facet; none of the distractors requires any goal facet. Each distractor measures only 1 intermediate facet.

  • Items 10–18: the key requires 2 goal facets; 1 distractor requires 1 goal facet and 1 intermediate facet. The remaining distractors measure only 1 intermediate facet and no goal facet.

  • Items 19–27: the key requires 2 goal facets; 1 distractor requires 1 goal facet and 1 intermediate facet; 1 distractor requires 2 intermediate facets and no goal facet; and 1 distractor requires only 1 intermediate facet.

As a result, the numbers of items measuring each attribute were 21, 21, 21, 18, 18, 18, 18, and 18, respectively, for the goal facets (the first 3) and the intermediate facets (the last 5). $\lambda_{j,0}$ was generated from Uniform($-1$, 1), and both $\lambda_{j,1}$ and $\lambda_{j,2}$ were generated from Uniform(1.75, 2.25), resulting in well-informative items (de la Torre, 2011). We only considered main effects in both $\mathbf{h}\left( \boldsymbol{\alpha}_{i},\mathbf{q}_{j,k}^{g} \right)$ and $\mathbf{h}\left( \boldsymbol{\beta}_{i},\mathbf{q}_{j,k}^{p} \right)$. Twenty-five replications were conducted per condition, with a different set of item parameters simulated per replication.
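The generating distributions just described can be sketched as follows. This is our own illustrative code (the array names and the per-item/per-facet indexing are simplified relative to the paper's notation): intercepts from Uniform($-1$, 1), and main-effect slopes for the 3 goal and 5 intermediate facets from Uniform(1.75, 2.25).

```python
import numpy as np

# Sketch of the item-parameter generation for one replication.
rng = np.random.default_rng(seed=1)       # seed is arbitrary, for reproducibility
J = 27                                    # test length
lam0 = rng.uniform(-1.0, 1.0, size=J)     # intercepts lambda_{j,0}, one per item
lam_goal = rng.uniform(1.75, 2.25, size=(J, 3))  # goal-facet slopes lambda_{j,1}
lam_int = rng.uniform(1.75, 2.25, size=(J, 5))   # intermediate-facet slopes lambda_{j,2}
```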

We manipulated the facet relationships as follows:

  • Moderate correlation among facets: the attributes were generated from a higher-order DINA model (de la Torre & Douglas, 2004), i.e., $P\left( \alpha_{k}\vert \theta \right) =\frac{1}{1+e^{-1.7\gamma_{1k}(\theta -\gamma_{0k})}}$, where $\theta \sim N(0,1)$, $\gamma_{0k}\sim U(-1,0.5)$, and $\gamma_{1k}=$ [1.35, 1.5, 1.65, $-1.35$, $-1.425$, $-1.5$, $-1.575$, $-1.65$]. Note that the $\gamma_{1k}$'s are positive for the goal facets but negative for the intermediate facets. This reflects the intuition that a higher overall ability $\theta$ is associated with a higher chance of mastering the goal facets and a lower chance of possessing the intermediate facets. The parameters were chosen to induce 30% sparsity, which means that out of $2^{8}=256$ possible patterns, about 180 were permissible. The absolute correlations among attributes were in the range of 0.4–0.5.

  • Independent facets: We simulated all profiles from a uniform distribution, i.e., every facet profile among the 256 possible profiles was equally likely to occur. We then imposed a constraint to ensure that students who possess more goal facets are less likely to possess many intermediate facets: if $\sum \nolimits _{k=1}^3 \alpha _{k} \ge 2$, then $\sum \nolimits _{k=1}^5 \beta _{k} \le 2$. This manipulation results in 25% sparsity and almost zero correlation among facets.
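Under the stated constraint, the 25% sparsity figure can be verified directly by enumeration (variable names here are illustrative):

```python
import numpy as np
from itertools import product

# Enumerate all 2^8 = 256 profiles: 3 goal facets (alpha) followed by
# 5 intermediate facets (beta).
all_profiles = np.array(list(product([0, 1], repeat=8)))
alpha, beta = all_profiles[:, :3], all_profiles[:, 3:]

# Constraint: if sum(alpha) >= 2 then sum(beta) <= 2; profiles violating
# this are impermissible.
permissible = all_profiles[(alpha.sum(axis=1) < 2) | (beta.sum(axis=1) <= 2)]
sparsity = 1 - len(permissible) / len(all_profiles)   # fraction of patterns removed
print(len(permissible), sparsity)                     # 192 permissible, 25% sparsity
```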

  • Hierarchical facets: We simulated profiles based on a hypothetical theory-driven hierarchical structure, as shown in Fig. 4a. This structure was chosen because it contains divergent (i.e., $\alpha _{1}$ is the prerequisite of both $\beta _{2}$ and $\beta _{3}$), convergent (i.e., $\beta _{2}$ requires both $\alpha _{1}$ and $\beta _{3}$), and independent (i.e., $\alpha _{3}$ is independent of the rest) structures (Liu et al., 2017; Wang, 2021b; Wang and Lu, 2021) in one map. It therefore covers, to some extent, the attribute relationships commonly encountered in practice. As a result, only 36 patterns were permissible (listed in Table 15 in the Appendix), and the correlations among attributes are shown in Fig. 4b.

Figure 4 Hierarchical facet structure in the simulation study.

Evaluation criteria include the mean bias, mean relative bias, and mean absolute bias for all item parameters, averaged across 25 replications. For each person's facet profile, we computed the goal facet recovery rate, i.e., $\sum \nolimits _{i=1}^N \sum \nolimits _{k=1}^K \frac{\textrm{I}\left( \alpha _{ik}=\hat{\alpha }_{ik} \right) }{N\times K}$. Similarly, we computed the pattern recovery for the intermediate facets as well as for the combined profile. For the population facet relationships, we compared the estimated sparsity structure with the true sparsity structure and computed (1) the true positive rate (TPR), i.e., the proportion of true permissible patterns correctly recovered; (2) 1 − FDR (false discovery rate), i.e., the proportion of detected permissible patterns that are true patterns; and (3) the true negative rate (TNR), i.e., the proportion of true impermissible patterns correctly recovered. We aimed for all three rates to be as close to 1 as possible.
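The recovery criteria can be computed as follows (a sketch with hypothetical helper names):

```python
import numpy as np

def facet_recovery_rate(true_profiles, est_profiles):
    """Element-wise facet recovery: mean of I(alpha_ik == alpha_hat_ik)
    over all N persons and K facets."""
    return float(np.mean(np.asarray(true_profiles) == np.asarray(est_profiles)))

def sparsity_recovery(true_perm, est_perm, all_patterns):
    """TPR, 1-FDR, and TNR for recovery of the permissible-pattern set.
    Each argument is a set of profile tuples."""
    tp = len(true_perm & est_perm)
    tpr = tp / len(true_perm)
    one_minus_fdr = tp / len(est_perm) if est_perm else 1.0
    true_imperm = all_patterns - true_perm
    tnr = len(true_imperm - est_perm) / len(true_imperm) if true_imperm else 1.0
    return tpr, one_minus_fdr, tnr
```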

Table 4 presents the recovery of item parameters. Values in parentheses are standard deviations across items and across replications. As shown, all item parameters are recovered well, and there is no appreciable difference among the three facet relational conditions or among the different types of item parameters. Of note, the relative bias of $\lambda _{j,0}$ is high (as is its standard deviation across replications) merely because some of the true $\lambda _{j,0}$'s are close to 0; when they appear in the denominator of the relative bias, the relative bias inflates.

Table 5 presents the recovery of facets and facet profiles. Unsurprisingly, recovery at the facet level is much higher than recovery at the profile level. Moreover, recovery of the overall profile is the hardest, followed by recovery of the five-dimensional intermediate facet profile, whereas recovery of the three-dimensional goal facet profile is the most accurate. In addition, facet and facet profile recovery is best under the hierarchical facet condition and worst under the independent facet condition. This is expected because when the facets display a hierarchical structure, only 36 out of 256 facet profiles are permissible, which greatly reduces the search space for the optimal facet profile for each student. When the facets are independent, little information is shared among facets, making recovery of the entire profile hardest.

Table 4 Recovery of item parameters (values in parentheses are standard deviations across items and across replications).

Table 5 Recovery of facets and facet profiles.

Table 6 presents the recovery of the facet sparsity structure. First, under both the hierarchical and independent conditions, TPR, TNR, and 1 − FDR are all close to 1, implying that the sparsity structure is recovered accurately under these two conditions. The hierarchical condition slightly outperforms the independent condition simply because it is sparser, making it easier to locate the true permissible profiles. Second, TNR and 1 − FDR are close to 1 under the moderately correlated facets condition, whereas the average TPR is only .526. This means that the algorithm tends to generate a sparser solution, thereby missing about 48% of the true permissible patterns. This is not too surprising: it is well known in Lasso regression that when predictors are correlated, correct identification of significant predictors is harder, especially when the true density is high (i.e., less sparsity).

Table 6 Recovery of facet sparsity structure.

4. Real Data Example

The feasibility of DFSM was demonstrated using data from the Diagnoser assessment. The assessment contains multiple-choice items with diagnostically rich response options, each mapped onto well-defined facets (Minstrell et al., 2015). The Diagnoser items have gone through various psychometric analyses to validate interpretations of learning progression level diagnosis (Chattergoon, 2020; Steedle and Shavelson, 2009), although our analysis is the first to use DFSM to construct a facet map. Nine items from the unit of "Identification of Forces" were used, and the sample size was $N=2,729$. Every item has 4 response options except item #3, which has 5. The Q-matrix for these 9 items is presented in the Appendix. Overall, the numbers of response options measuring the facets (i.e., the first 3 are goal facets and the remaining 5 are intermediate facets) are 19, 16, 6, 6, 5, 4, 4, and 8, respectively. As shown, the Q-matrix is sparse, and only the first two goal facets are measured by more than half of the response options. The definitions of the facets are presented in Table 7. Table 8 presents the proportion of endorsement of each response option per item, with the key per item bolded. As shown in Table 8, for 7 out of 9 items (except items 4 and 5), the proportion of students selecting the key is the highest, indicating that the items are relatively easy for this student sample.
We also used the reliability index proposed in Templin and Bradshaw (2014) and found that the reliabilities of the three goal facets are 0.89, 0.87, and 0.92, respectively, whereas the reliabilities of the five intermediate facets are 0.84, 0.82, 0.74, 0.43, and 0.68, respectively. It is worth noting that, unsurprisingly, a higher magnitude of slopes on the corresponding facets (i.e., $\boldsymbol{\lambda }_{j,1}$ and $\boldsymbol{\lambda }_{j,2}$) leads to higher reliability, whereas the number of response options measuring each facet does not seem to affect facet reliability much.

Table 7 Definition of goal and intermediate facets for the real data example using items from the unit of “Identification of Forces”.

Note. Learners frequently speak of "force" as though it were a property of an object rather than an action on an object (i.e., "the object has force"). While this idea is less productive for understanding forces, it is useful for thinking about momentum or kinetic energy. Therefore, an $\alpha$ attribute in one unit may be modeled as a $\beta$ attribute in another unit

Table 8 Proportion of students selecting each response option per item.

Note. The answer key per item is bolded

To cross-check the results, we randomly split the data into a training set (sample size = 2,000) and a test set (sample size = 729) 10 times. For each random split, we fit the DFSM using the REM algorithm on the training set and used the estimated model parameters (including both the item parameters and the population class mixing proportions) on the test set to estimate students' facet profiles. The estimated item parameters and their standard errors are presented in Table 9. The point estimates were obtained by averaging over the 10 analyses. The standard errors were computed as the standard deviations of the point estimates across the 10 random training sets. Several points emerge from Table 9. First, the majority of items have negative $\lambda _{j,0}$'s, which is consistent with the observation from Table 8 that most items are relatively easy.
Second, some of the cells for $\boldsymbol{\lambda }_{j,1}$ and $\boldsymbol{\lambda }_{j,2}$ are blank simply because none of the response options of those items measure the corresponding facets. Among all the estimated slopes, six turn out to be nearly 0 (see the values highlighted in italics), which implies that those items are not discriminative with respect to the related facet. Take item 9 as an example: the estimated slopes on both $\alpha _{1}$ and $\beta _{2}$ are 0. As a result, students with true facet profiles [1,1,1,0,0,0,0,0], [0,1,1,0,0,0,0,0], [1,1,1,0,1,0,0,0], and [0,1,1,0,1,0,0,0] all have exactly the same predicted response probabilities of [.904, .033, .033, .030] for the four response options, respectively.
That is, the item can hardly differentiate students with or without either or both of $\alpha _{1}$ and $\beta _{2}$. In fact, inspecting Table 8, it is not surprising that the frequency of choosing option 1 (i.e., the key) is the highest, whereas the frequencies of choosing any of the distractors are all low. Third, a few of the standard errors are large, which indicates that the corresponding parameters were not stably estimated. The large standard errors mostly occur on slopes for $\alpha _{2}$, and as we show below, $\alpha _{2}$ turns out to be the most difficult facet.
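The repeated random train/test splitting described above can be sketched as follows (the DFSM fitting itself is omitted; names are illustrative):

```python
import numpy as np

def random_splits(n, n_train, n_splits=10, seed=0):
    """Yield (train, test) index arrays for repeated random splits of n cases."""
    rng = np.random.default_rng(seed)
    for _ in range(n_splits):
        perm = rng.permutation(n)
        yield perm[:n_train], perm[n_train:]

# Mirror the design: N = 2,729 split into 2,000 training / 729 test cases,
# repeated 10 times; point estimates would then be averaged over the 10 fits.
splits = list(random_splits(2729, 2000))
```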

Figure 5 presents an estimated facet map. It is worth noting that not every round of analysis produced the same facet map; instead, we placed a solid link between two facets if that link appeared in 5 or more of the 10 rounds of analysis, and a dashed link if it appeared in fewer than 5 rounds. Table 10 presents the details of the 11 links identified between pairs of facets. All of these links were directional, as reflected by 3 rather than 4 permissible classes defined by the two facets, indicating that no conjoined facets were detected. Table 10 also includes the frequency of occurrence of each possible link, as well as the estimated probabilities of each permissible latent class, averaged across the 10 random splits. Figure 5 presents the results in Table 10 graphically for easier visualization. Several facet hierarchies showed up. For instance, $\alpha _{3}$, $\beta _{1}$, and $\beta _{4}$ are all prerequisites of $\alpha _{2}$, making $\alpha _{2}$ arguably the most difficult goal facet. There is also an interesting linear hierarchy among $\beta _{1}$, $\beta _{4}$, and $\beta _{3}$, as well as another linear hierarchy among $\beta _{1}$, $\beta _{4}$, and $\beta _{5}$.
The strengths of the solid links are also displayed in Fig. 5.

Table 11 presents the probability of possession of each facet in both the training and test sets, averaged across the 10 rounds of analysis. As can be seen, the probabilities are quite consistent between the training and test sets. Further, the mastery probability of $\alpha _{2}$ is the lowest, which is consistent with the conjectured facet map in which $\alpha _{2}$ tends to be the most difficult. In addition, $\beta _{1}$ and $\beta _{4}$ tend to be at the lower end of the spotted linear hierarchy; hence, it is unsurprising to see low probabilities on these two facets, implying that students may not hold many misconceptions regarding these most basic understandings.

Lastly, we evaluated item fit using a chi-squared statistic. Two versions of the statistic were considered. For version I, for an item j, among all the permissible facet profiles (denoted by $G$, with $G<2^{M}$), we compute the following chi-squared statistic,

(11) $$\chi _{j}^{2}=\sum \nolimits _{g=1}^G {\sum \nolimits _{k=1}^{K_j}} {n_{g}\frac{\left( O_{jgk}-E_{jgk} \right) ^{2}}{E_{jgk}\left( n_{g}-E_{jgk} \right) }}.$$

Here $O_{jgk}$ and $E_{jgk}$ are the observed and expected counts of students who are in latent class $g$ and select option $k$ of item $j$. We also considered two versions of the degrees of freedom: $G\times \left( K_{j}-1 \right) -d_{j}$, where $d_{j}$ is the total number of item parameters for item $j$, and $G-d_{j}$. The former seems to fit better with Eq. 11, whereas the latter is used in polytomous IRT models (e.g., Naumenko, 2014; Su et al., 2021).
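Equation 11 can be computed directly from class-by-option count tables; the sketch below (with illustrative names) assumes the observed and expected counts are already available:

```python
import numpy as np

def item_fit_chisq(O, E, n_g, d_j):
    """Chi-squared item fit of Eq. 11.

    O, E : (G, K_j) observed and expected counts per latent class and option.
    n_g  : length-G latent class sizes.
    d_j  : number of item parameters, used for the first df variant.
    """
    O, E = np.asarray(O, float), np.asarray(E, float)
    n = np.asarray(n_g, float)[:, None]
    stat = float(np.sum(n * (O - E) ** 2 / (E * (n - E))))
    G, Kj = O.shape
    return stat, G * (Kj - 1) - d_j   # statistic and df = G(K_j - 1) - d_j
```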

For version II, for an item j, we consider all permissible latent groups $G_{j}$ (i.e., $G_{j}\le G$ and $G_{j}\le 2^{K_{j}}$). The latent groups are created based on the q-vector of item j, and each latent group may contain several latent classes with identical attribute vectors with respect to the required attributes (e.g., if $q=(1,1,0,0)$, the latent classes (1,1,0,0), (1,1,1,0), (1,1,0,1), and (1,1,1,1) are placed in the same latent group). For DFSM, the element in the q-vector of item j is 1 if at least one option of item j measures the corresponding attribute. Then the chi-squared statistic is

(12) $$\chi _{j}^{2}={\sum \nolimits _{g=1}^{G_j}} {\sum \nolimits _{k=1}^{K_j}} {n_{g}\frac{\left( O_{jgk}-E_{jgk} \right) ^{2}}{E_{jgk}\left( n_{g}-E_{jgk} \right) }}.$$

Here $O_{jgk}$ and $E_{jgk}$ are the observed and expected counts of students who are in latent group $g$ and select option $k$ of item $j$. The two versions of the degrees of freedom are $G_{j}\times \left( K_{j}-1 \right) -d_{j}$, where $d_{j}$ is the total number of item parameters for item $j$, and $G_{j}-d_{j}$.
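The grouping of latent classes into latent groups by an item's q-vector (version II) can be sketched as follows; the function name is our own:

```python
from itertools import product

def latent_groups(q, permissible):
    """Group profiles by their values on the attributes the item requires
    (the 1-entries of its q-vector); classes identical on those attributes
    fall into the same latent group."""
    required = [i for i, v in enumerate(q) if v == 1]
    groups = {}
    for prof in permissible:
        key = tuple(prof[i] for i in required)
        groups.setdefault(key, []).append(prof)
    return groups

# With q = (1,1,0,0) and all 16 four-attribute classes permissible, the
# classes (1,1,0,0), (1,1,1,0), (1,1,0,1), (1,1,1,1) share group (1,1).
groups = latent_groups((1, 1, 0, 0), list(product([0, 1], repeat=4)))
```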

Before applying the item fit indices to the real data, we tested them on the simulated data from two conditions: moderately correlated facets and hierarchical facets. Since we did not intentionally simulate any misfit, we focused only on the Type I error of each index, as shown in Table 12. In this table, we also considered whether to correct for the family-wise error rate using a Bonferroni correction. Most of the Type I error rates were below .05; however, if the Type I error is extremely small (such as 0), the test may not be powerful. We therefore looked for indices with Type I error closest to .05; they are highlighted in red in Table 12. Unsurprisingly, implementing the Bonferroni correction yields lower Type I error. Further, Type I error is considerably lower under the moderately correlated facet condition because this condition yields many more permissible facet profiles (see Table 6). Because the real data mimic the "hierarchical" structure, we applied the two methods marked in Table 12 to the real data. Both methods flag the same items as having potential misfit: items #1, 3, 7, and 9, which suggests that either the q-matrix of these items needs to be revised, or the additive structure assumed in $\textbf{h}\left( \boldsymbol{\alpha }_{i},\textbf{q}_{j,k}^{g} \right)$ and $\textbf{h}\left( \boldsymbol{\beta }_{i},\textbf{q}_{j,k}^{p} \right)$ needs to be revisited.
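The Bonferroni-corrected flagging used here amounts to comparing each item's fit p-value against $\alpha/J$ for $J$ items; a minimal sketch with illustrative p-values:

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Flag items whose fit p-value falls below the Bonferroni-corrected
    threshold alpha / (number of items), controlling family-wise error."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# With 9 items the per-item threshold is 0.05 / 9, roughly 0.0056.
flags = bonferroni_flags(
    [0.001, 0.02, 0.5, 0.004, 0.3, 0.6, 0.002, 0.7, 0.0001])
```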

Figure 5 Estimated facet map.

Table 9 Estimated item parameters and their standard errors.

Values in parentheses are standard deviations computed across the 10 random splits.

Table 10 Estimated links between pairs of facets.

The displayed permissible classes for each pair of facets were after reversing the intermediate facets.

Table 11 Average probability of possession of each facet in training and test sets.

Table 12 Type I error of various item fit indices.

5. Discussion

This study provides a valuable psychometric tool to greatly enhance diagnostic assessment. Current scoring rubrics in most assessments still primarily grade performance (i.e., right/wrong, or partial credit), and such assessment results only provide an overview of student learning progress. Ideally, feedback should identify both correct and intermediate states of understanding, and this is exactly the type of information that the DFSM extracts from assessment data. With targeted and individualized feedback, every student can receive appropriate support for their greatest needs, a condition essential to achieving educational equity. Our simulation study and real data example show great promise for the DFSM and the regularized EM algorithm.

Note that another available model that could be used for facet-based items is the generalized diagnostic classification model for multiple-choice option-based scoring (GDCM-MC), which accounts for both goal and intermediate facets (DiBello et al., 2015). However, first, the GDCM-MC has a complex formulation that makes a Bayesian MCMC algorithm preferable to the more efficient EM algorithm; hence, the regularization technique proposed here can hardly be applied to the GDCM-MC in its current form. Second, as pointed out in a recent paper, the possible coexistence of skills and preconceptions is not adequately handled in the GDCM-MC (Kuo et al., 2018). Third, the GDCM-MC requires a three-valued coding scheme to specify which facets are strongly related to each option, which places a higher demand on the expert-provided coding matrix. In contrast, the DFSM naturally handles guessing and requires less complicated coding of the option-facet mapping matrix. Compared to the GDCM-MC, the DFSM also has a more compact parameterization that permits use of the EM algorithm. If the option-to-facet mapping is sparse (i.e., not enough options measuring each facet), the parameterization of the DFSM can be further reduced by making the $\lambda$'s attribute-level parameters instead of item-by-attribute-level parameters. Very recently, a model similar to the DFSM was published (Levy, 2019); it was used to analyze game-based assessment data in which each step a student takes is cognitively coded. However, the author used Bayesian estimation and did not consider exploring attribute relationships.

One innovation worth highlighting is that our REM algorithm, coupled with the deductive algorithm, permits deriving a facet map. Despite the instructional benefits of a detailed facet map, identifying the multitude of links among facet pairs poses a methodological challenge; for example, just 9 facets could generate as many as 72 potential directional links. Templin and Bradshaw (2014) proposed a likelihood ratio (LR) test formulated within a hierarchical diagnostic classification model to statistically test one hypothesized link at a time. However, it is cumbersome to perform their LR testing at scale because the reference distribution needs to be simulated each time to derive the empirical p-value. Instead, we propose to leverage a fast ML method to directly identify massive potential facet hierarchies and create a facet map with data-driven directional and non-directional links. Such empirical evidence will not only complement existing theory-driven LTs but may also reveal new insights to refine or expand theoretical trajectories.
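The deductive idea can be illustrated with a small sketch. The set of permissible profiles and the facet count below are hypothetical, not the paper's actual estimates: a directional link is read off whenever no permissible profile possesses the dependent facet without its prerequisite.

```python
from itertools import permutations

# Hypothetical set of permissible facet profiles (1 = facet possessed).
# Positions correspond to facets F1, F2, F3.
permissible = {
    (0, 0, 0),
    (1, 0, 0),
    (1, 1, 0),
    (1, 1, 1),
}

n_facets = 3

def implies(a, b, profiles):
    """Facet a is a prerequisite of facet b if no profile possesses b without a."""
    return all(not (p[b] == 1 and p[a] == 0) for p in profiles)

# Scan all ordered facet pairs; transitive links (here F1 -> F3) appear as well
# and could be pruned to obtain a minimal hierarchy.
links = [(a, b) for a, b in permutations(range(n_facets), 2)
         if implies(a, b, permissible)]
print(links)  # [(0, 1), (0, 2), (1, 2)] -> a linear hierarchy F1 -> F2 -> F3
```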

This research can be expanded in several directions. First and foremost, as mentioned in the real data analysis, the model may not be identified with the given Q-matrix, because there does not appear to be an exact one-to-one relationship between the permissible latent classes and the facet map. From our discussion of generic identifiability conditions, the Q-matrix in the real data example does not satisfy them: one simple requirement is that there be more than 2M items, and since the real data example involves 8 facets in total, at least 16 items would be needed. However, the conditions we adopt from Liu and Culpepper (2023) are sufficient conditions and may not be necessary; moreover, the conditions may change when facet hierarchies or conjoined facets exist. Further studies need to be conducted to establish necessary conditions for the DFSM to be generically identified, with and without facet relationships. Due to potential under-identification, the revealed facet maps differed in certain respects across the 10 cross-validation analyses; in addition, when we derived the facet map via the deductive algorithm from the estimated permissible classes and then used the derived facet map to map out the permissible classes, the result did not exactly match the estimated permissible classes. This discrepancy points to potential model identification issues, which did not arise in the simulation study. Because the Q-matrix is much sparser and the test length much shorter in the real data than in the simulated data, future study is needed to delineate the necessary (and sufficient) conditions on the Q-matrix for the DFSM to be identified. This information is essential not only for real data analysis but also for designing future facet-based diagnostic assessments.
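The round-trip consistency check described above can be sketched as follows. The links and estimated classes are hypothetical; under exact identification the two sets would coincide, and any leftover classes signal a mismatch of the kind observed in the real data:

```python
from itertools import product

# Hypothetical prerequisite links derived by the deductive algorithm
# over three facets: F1 -> F2 and F2 -> F3.
links = [(0, 1), (1, 2)]
n_facets = 3

def classes_from_map(links, n_facets):
    """All profiles consistent with every prerequisite link a -> b."""
    return {p for p in product((0, 1), repeat=n_facets)
            if all(not (p[b] == 1 and p[a] == 0) for a, b in links)}

implied = classes_from_map(links, n_facets)

# Hypothetical permissible classes estimated by the REM algorithm,
# containing one class that the derived map cannot produce.
estimated = {(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)}

# Classes estimated as permissible but inconsistent with the derived map.
print(estimated - implied)  # {(1, 0, 1)}
```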

Along this line of inquiry, a second research direction would be to explore the types of items that provide high information. In classical item response theory, it is widely known that polytomously scored items tend to carry more information than dichotomously scored items, and that items with higher discrimination are more informative. With the DFSM, although it is intuitive to claim that higher $\lambda_{j,1}$ and $\lambda_{j,2}$ yield higher item information, it is still unknown what kind of Q-matrix structure makes an item more informative. For instance, should each distractor measure only one intermediate facet? Or should each distractor measure multiple intermediate facets to maximize the number of options per facet? A future study should take a deeper dive into item information for the DFSM and, to the extent possible, quantify the information gains from modeling nominal responses compared to binary scoring.
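One way such a study could quantify this gain is via the Kullback–Leibler divergence between two latent classes' option-response distributions, comparing the full nominal response with its right/wrong collapse. The option probabilities below are hypothetical, and this is only one of several possible information measures:

```python
import math

# Hypothetical option-selection probabilities for one 4-option item
# under two latent classes (each row sums to 1). Option 0 is the key.
p_class1 = [0.70, 0.10, 0.10, 0.10]  # class possessing the goal facet
p_class2 = [0.25, 0.55, 0.10, 0.10]  # class drawn to a distractor instead

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Information in the full nominal response.
info_nominal = kl(p_class1, p_class2)

# Information after collapsing to binary right/wrong scoring.
b1 = [p_class1[0], 1 - p_class1[0]]
b2 = [p_class2[0], 1 - p_class2[0]]
info_binary = kl(b1, b2)

# Collapsing options can only discard class-separating information,
# so info_nominal >= info_binary for any such pair of distributions.
print(round(info_nominal, 3), round(info_binary, 3))
```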

Further, our simulation study is limited to the additive structure of $\mathbf{h}\left( \boldsymbol{\alpha }_{i},\mathbf{q}_{j,k}^{g} \right)$ and $\mathbf{h}\left( \boldsymbol{\beta }_{i},\mathbf{q}_{j,k}^{p} \right)$, as well as to one level of test length and sample size. Although increasing sample size and/or test length will certainly improve the estimation of item and person parameters, respectively, determining the minimum sample size needed for adequate recovery of DFSM item parameters under different manipulated conditions is still necessary, because such information is essential for planning and interpreting real data analyses. In addition, we provide a heuristic descriptive statistic to infer the strength of links between pairs of facets, which is essentially the estimated probability of possessing both the prerequisite and the current facet. Such a statistic is only a proxy for the actual strength of linkage; future research may consider network models that can directly estimate and make inferences about link strengths. Furthermore, our real data analysis revealed that four items may exhibit misfit. Not only is a separate study exploring the performance of various item fit indices in depth needed, but developing more flexible versions of the DFSM will be critical as well. For instance, allowing interaction terms in $\mathbf{h}\left( \boldsymbol{\alpha }_{i},\mathbf{q}_{j,k}^{g} \right)$ and $\mathbf{h}\left( \boldsymbol{\beta }_{i},\mathbf{q}_{j,k}^{p} \right)$, or allowing $\boldsymbol{\lambda}_{j}$ to be an item-option-level parameter instead of an item-level parameter, would make the model more flexible, albeit less parsimonious. Lastly, the derived facet map should be thoroughly validated through a well-planned validation study that will likely involve student cognitive interviews and teacher focus groups.
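The heuristic link-strength statistic mentioned above, the estimated probability of possessing both the prerequisite and the current facet, can be computed directly from the estimated mixing proportions. The proportions below are hypothetical, and, as noted, the statistic is confounded with facet difficulty:

```python
# Hypothetical estimated mixing proportions over permissible profiles
# (profile over facets F1, F2 -> estimated class proportion).
mixing = {
    (0, 0): 0.30,
    (1, 0): 0.25,
    (1, 1): 0.45,
    # (0, 1) has proportion 0 under the hierarchy F1 -> F2.
}

def link_strength(pre, cur, mixing):
    """Proxy strength of the link pre -> cur: P(possessing both facets)."""
    return sum(w for prof, w in mixing.items()
               if prof[pre] == 1 and prof[cur] == 1)

print(link_strength(0, 1, mixing))  # 0.45
```

When both facets are easy, P(1, 1) is large regardless of any true prerequisite relation, which is why this proxy should eventually be replaced by a model-based estimate of link strength.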

Funding

The study is funded by IES R305D200015 and NSF EDU-CORE #2300382.

Data availability

The data that support the findings of this study are available on request from the corresponding author. The data are not yet publicly available because the study is still under IRB review at the University of Washington. Upon IRB approval and completion of the NSF project, the data will be made publicly available.

Declarations

Conflict of interest

The author declares no conflict of interest.

6. Appendix

See Tables 13, 14, and 15.

Table 13 Q-matrix of the sample Diagnoser items in the real data analysis.

Table 14 The probability of selecting each response option of an example item.

Table 15 The 36 permissible patterns under the hierarchical facet simulation condition.

Footnotes



2 One may argue that the strength of the link can be quantified as the probability of simultaneously possessing both facets plus the probability of simultaneously possessing neither. We acknowledge that our selected statistic is somewhat confounded by the difficulty of the facets: when the two facets are easy and basic, the probability of the class (1, 1) will be high, leading to a seemingly strong link between the two facets. As we mention in the discussion, better ways to measure the strength of links between facets are needed in the future.

References

Alonzo, A. C., & Steedle, J. T. (2009). Developing and assessing a force and motion learning progression. Science Education, 93(3), 389–421.
Bradshaw, L., & Templin, J. (2014). Combining item response theory and diagnostic classification models: A psychometric model for scaling ability and diagnosing misconceptions. Psychometrika, 79(3), 403–425.
Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11(1), 33–63.
Chattergoon, R. (2020). Using polytomous item response theory models to validate learning progressions. Doctoral dissertation, University of Colorado at Boulder.
Chen, Y., & Wang, S. (2023). Bayesian estimation of attribute hierarchy for cognitive diagnosis models. Journal of Educational and Behavioral Statistics.
Cizek, G. J., Bunch, M. B., & Koons, H. (2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31–31.
Clement, J. (1987). The use of analogies and anchoring intuitions to remediate misconceptions in mechanics. Paper presented at the Annual Meeting of the American Educational Research Association, Washington, DC.
Corcoran, T. B., Mosher, F. A., & Rogat, A. (2009). Learning progressions in science: An evidence-based approach to reform. Report of the Center on Continuous Instructional Improvement, Teachers College, Columbia University, New York.
Dadey, N. (2017). Reporting scores from NGSS assessments: Exploring scores & subscores. Portsmouth, NH: Reidy Interactive Lecture Series.
Dahlgren, M. A., Hult, H., Dahlgren, L. O., af Segerstad, H. H., & Johansson, K. (2006). From senior student to novice worker: Learning trajectories in political science, psychology and mechanical engineering. Studies in Higher Education, 31(5), 569–586.
Daro, P., Mosher, F. A., & Corcoran, T. (2011). Learning trajectories in mathematics: A foundation for standards, curriculum, assessment and instruction. CPRE Research Report #RR-68. New York: Columbia University.
de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76(2), 179–199.
de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333–353.
DiBello, L. V., Henson, R. A., & Stout, W. F. (2015). A family of generalized diagnostic classification models for multiple choice option-based scoring. Applied Psychological Measurement, 39(1), 62–79.
diSessa, A. A. (1993). Toward an epistemology of physics. Cognition and Instruction, 10(2–3), 165–255.
diSessa, A. A. (2014a). A history of conceptual change research: Threads and fault lines. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (2nd ed., pp. 88–108). Cambridge: Cambridge University Press.
diSessa, A. A. (2014b). The construction of causal schemes: Learning mechanisms at the knowledge level. Cognitive Science, 38(5), 795–850.
diSessa, A. A. (2017). Knowledge in pieces: An evolving framework for understanding knowing and learning. In T. G. Amin & O. Levrini (Eds.), Converging perspectives on conceptual change: Mapping an emerging paradigm in the learning sciences (pp. 7–16). London: Routledge.
Doignon, J. P., & Falmagne, J. C. (1999). Knowledge spaces. Berlin: Springer.
Foisy, L.-M. B., Potvin, P., Riopel, M., & Masson, S. (2015). Is inhibition involved in overcoming a common physics misconception in mechanics? Trends in Neuroscience and Education, 4(1–2), 26–36.
Gu, Y., & Xu, G. (2019). Learning attribute patterns in high-dimensional structured latent attribute models. Journal of Machine Learning Research, 20, 1–58.
Haberman, S., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical and Statistical Psychology, 62(1), 79–95.
Hadenfeldt, J. C., Bernholt, S., Liu, X., Neumann, K., & Parchmann, I. (2013). Using ordered multiple-choice items to assess students' understanding of the structure and composition of matter. Journal of Chemical Education, 90(12), 1602–1608.
Henson, R. A., Templin, J. L., & Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191–210.
Kapon, S., & diSessa, A. A. (2012). Reasoning through instructional analogies. Cognition and Instruction, 30(3), 261–310.
Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254–284.
Kuo, B. C., Chen, C. H., & de la Torre, J. (2018). A cognitive diagnosis model for identifying coexisting skills and misconceptions. Applied Psychological Measurement, 42(3), 179–191.
Lanting, A. S. Y. (2000). An empirical study of a district-wide K–2 performance assessment program: Teacher practices, information gained, and use of assessment results. University of Illinois at Urbana-Champaign.
Levy, R. (2019). Dynamic Bayesian network modeling of game-based diagnostic assessments. Multivariate Behavioral Research, 54, 771–794.
Liu, Y., & Culpepper, S. A. (2023). Restricted latent class models for nominal response data: Identifiability and estimation. Psychometrika.
Liu, R., Huggins-Manley, A. C., & Bradshaw, L. (2017). The impact of Q-matrix designs on diagnostic classification accuracy in the presence of attribute hierarchies. Educational and Psychological Measurement, 77(2), 220–240.
Minstrell, J. (1989). Teaching science for understanding. In L. B. Resnick & L. E. Klopfer (Eds.), Toward the thinking curriculum: Current cognitive research (pp. 129–149). Alexandria, VA: Association for Supervision and Curriculum Development.
Minstrell, J. (1991). Facets of students' knowledge and relevant instruction. In R. Duit, F. Goldberg, & H. Niedderer (Eds.), Research in physics learning: Theoretical issues and empirical studies. Proceedings of an International Workshop, Bremen, Germany, March 4–8, 1991 (pp. 110–128). Kiel: IPN.
Minstrell, J. (1992). Facets of students' knowledge and relevant instruction. In R. Duit, F. Goldberg, & H. Niedderer (Eds.), Research in physics learning: Theoretical issues and empirical studies (pp. 110–128). Kiel: IPN.
Minstrell, J. (2000). Student thinking and related assessment: Creating a facet assessment-based learning environment. In J. Pellegrino, L. Jones, & K. Mitchell (Eds.), Grading the nation's report card: Research from the evaluation of NAEP. Washington, DC: National Academy Press.
Minstrell, J., Anderson, R., & Li, M. (2015). Diagnostic instruction: Toward an integrated system for classroom assessment. In R. A. Duschl & A. S. Bismark (Eds.), Reconceptualizing STEM education: The central role of practices. New York: Routledge.
Morphew, J. W., Mestre, J. P., Kang, H.-A., Chang, H.-H., & Fabry, G. (2018). Using computer adaptive testing to assess physics proficiency and improve exam performance in an introductory physics course. Physical Review Physics Education Research, 14(2).
Naumenko, O. (2014). Comparison of various polytomous item response theory modeling approaches for task-based simulation CPA exam data. AICPA 2014 summer internship project.
Pellegrino, J. W., Chudowsky, N., & Glaser, R. (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.
Potvin, P., Masson, S., Lafortune, S., & Cyr, G. (2015). Persistence of the intuitive conception that heavier objects sink more: A reaction time study with different levels of interference. International Journal of Science and Mathematics Education, 13(1), 21–43.
Simon, M. A., & Tzur, R. (2004). Explicating the role of mathematical tasks in conceptual learning: An elaboration of the hypothetical learning trajectory. Mathematical Thinking and Learning, 6(2), 91–104.
Smolleck, L., & Hershberger, V. (2011). Playing with science: An investigation of young children's science conceptions and misconceptions. Current Issues in Education, 14. Retrieved from http://cie.asu.edu/ojs/index.php/cieatasu/article/view.
Stavy, R., Babai, R., Tsamir, P., Tirosh, D., Lin, F.-L., & McRobbie, C. (2006). Are intuitive rules universal? International Journal of Science and Mathematics Education, 4, 417–436.
Steedle, J. T., & Shavelson, R. J. (2009). Supporting valid interpretations of learning progression level diagnoses. Journal of Research in Science Teaching, 46(6), 699–715.
Su, S., Wang, C., & Weiss, D. (2021). Performance of the S-$\chi^2$ statistic for the multidimensional graded response model. Educational and Psychological Measurement, 81(3), 491–522.
Tate, R. L. (2004). Implications of multidimensionality for total score and subscore performance. Applied Measurement in Education, 17(2), 89–112.
Tatsuoka, K. K. (1983). Rule space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345–354.
Tatsuoka, K. K. (2009). Cognitive assessment: An introduction to the rule space method. New York: Routledge.
Templin, J., & Bradshaw, L. (2014). Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika, 79(2), 317–339.
Thissen-Roe, A., Hunt, E., & Minstrell, J. (2004). The Diagnoser project: Combining assessment and learning. Behavior Research Methods, Instruments, & Computers, 36(2), 234–240.
Thompson, F., & Logue, S. (2006). An exploration of common student misconceptions in science. International Education Journal, 7(4), 553–559.
Tjoe, H., & de la Torre, J. (2014). On recognizing proportionality: Does the ability to solve missing value proportional problems presuppose the conception of proportional reasoning? The Journal of Mathematical Behavior, 33, 1–7.
Underwood, S., Posey, L., Herrington, D., Carmel, J., & Cooper, M. (2018). Adapting assessment tasks to support three-dimensional learning. Journal of Chemical Education, 95, 207–217.
Vosniadou, S., & Verschaffel, L. (2004). Extending the conceptual change approach to mathematics learning and teaching. Learning and Instruction, 14, 445–451.
Wang, C. (2021a). Using penalized EM algorithm to infer learning trajectories in latent transition CDM. Psychometrika, 86(1), 167–189.
Wang, C. (2021b). On interim cognitive diagnostic computerized adaptive testing in learning context. Applied Psychological Measurement, 45, 235–252.
Wang, C., & Lu, J. (2021). Learning attribute hierarchies from data: Two exploratory approaches. Journal of Educational and Behavioral Statistics, 46, 58–84.