
New Paradigm of Identifiable General-response Cognitive Diagnostic Models: Beyond Categorical Data

Published online by Cambridge University Press:  01 January 2025

Seunghyun Lee
Affiliation:
Columbia University
Yuqi Gu*
Affiliation:
Columbia University
Correspondence should be made to Yuqi Gu, Department of Statistics, Columbia University, Room 928 SSW, 1255 Amsterdam Avenue, New York, NY 10027, USA. Email: yuqi.gu@columbia.edu

Abstract

Cognitive diagnostic models (CDMs) are a popular family of discrete latent variable models that model students’ mastery or deficiency of multiple fine-grained skills. CDMs have been most widely used to model categorical item response data such as binary or polytomous responses. With advances in technology and the emergence of varying test formats in modern educational assessments, new response types, including continuous responses such as response times, and count-valued responses from tests with repetitive tasks or eye-tracking sensors, have also become available. Variants of CDMs have been proposed recently for modeling such responses. However, whether these extended CDMs are identifiable and estimable is entirely unknown. We propose a very general cognitive diagnostic modeling framework for arbitrary types of multivariate responses with minimal assumptions, and establish identifiability in this general setting. Surprisingly, we prove that our general-response CDMs are identifiable under $\mathbf{Q}$-matrix-based conditions similar to those for traditional categorical-response CDMs. Our conclusions set up a new paradigm of identifiable general-response CDMs. We propose an EM algorithm to efficiently estimate a broad class of exponential family-based general-response CDMs. We conduct simulation studies under various response types. The simulation results not only corroborate our identifiability theory, but also demonstrate the superior empirical performance of our estimation algorithms. We illustrate our methodology by applying it to a TIMSS 2019 response time dataset.

Type
Theory and Methods
Copyright
© 2024 The Author(s), under exclusive licence to The Psychometric Society

1. Introduction

Cognitive diagnostic models (CDMs, also called diagnostic classification models; see Rupp et al., 2010) are a popular family of discrete latent variable models in educational and psychological measurement. CDMs employ multiple discrete latent variables to model and diagnose students’ mastery or deficiency of a set of fine-grained skills. Popular examples in the literature include the Deterministic Input Noisy output “And” gate model (DINA model; Junker and Sijtsma, 2001), the log-linear CDM (LCDM; Henson et al., 2009), the additive CDM (ACDM; de la Torre, 2011), and general diagnostic models (GDM; von Davier, 2008).

Originally, CDMs were developed mostly to model binary responses and were later generalized to polytomous responses, both of which are categorical. Recently, with advances in technology and the emergence of varying test formats in modern educational assessments, new response types have become available. In particular, multivariate continuous and count responses are especially common. Continuous responses arise in the following scenarios: (a) responses that place a mark on a line segment (such as the visual analog scale), (b) assessments that record the probability of each option being correct (probability testing), and (c) computer-based tests that record the response time (Minchen et al., 2017). In particular, modeling response times has long received great interest, and many different models have been proposed to this end; see De Boeck and Jeon (2019) for an overview. Another common response type is count responses. They arise in the following scenarios: (a) assessments with repetitive tasks where the number of correct responses is recorded, (b) assessments where examinees read aloud a passage and the number of errors is recorded, (c) modern exams with eye-tracking sensors that record students’ visual fixation counts, and (d) computer-based tests that record the visit count per item (Man and Harring, 2019; Liu et al., 2022). Rasch (1993) first proposed a Poisson-based item response theory (IRT) model for count data, and many other models have since been proposed (Magnus and Thissen, 2017; Man and Harring, 2019, 2022).

Many existing latent variable models for general responses are based on IRT, which uses continuous latent traits to model the unobserved constructs (Thissen, 1983; van der Linden, 2007; Wang and Xu, 2015). On the other hand, using discrete latent variables as in CDMs can provide students with valuable personalized diagnoses of their mastery/deficiency profiles on the latent skills. Currently, only a few CDMs have been proposed for modeling non-categorical responses. For instance, Minchen et al. (2017) proposed a DINA model with a lognormal link to model response time data, whereas Liu et al. (2022, 2023) proposed GDMs (which include the DINA model as a submodel) with a Poisson link and a negative binomial link, respectively, to model visual fixation count data. It is therefore desirable to develop a general CDM framework for flexible response types, together with associated estimation methods.

When proposing new statistical models, identifiability is a crucial consideration, because it is a fundamental prerequisite for valid statistical estimation and inference. A model is identifiable if its parameters can be uniquely recovered from the observed data distribution. In recent years, many identifiability results have been established for categorical-response CDMs, covering both the binary and polytomous cases (e.g., Chen et al., 2015; Xu and Zhang, 2016; Fang et al., 2019; Gu and Xu, 2019, 2020; Culpepper, 2019, 2023). These studies typically show that CDMs for multivariate categorical data are identifiable under structural constraints on the $\mathbf{Q}$-matrix. The $\mathbf{Q}$-matrix is a key component of a CDM that specifies how the observed responses depend on the latent attributes (see its formal definition in Sect. 2). However, it is entirely unknown whether those extended CDMs for continuous or count data (such as those in Minchen et al. (2017) and Liu et al. (2022)) are identifiable, let alone the identifiability of models for more general response types.

This manuscript makes the following contributions. First, we propose a very general new framework of $\mathbf{Q}$-matrix-based CDMs for modeling rich types of responses. In particular, this framework includes a sub-family of exponential family-based CDMs (ExpCDMs), a wide class of parametric CDMs that use exponential families to model general responses. Our general modeling framework covers existing CDMs for continuous and count responses as special cases (Minchen et al., 2017; Minchen and de la Torre, 2018; Liu et al., 2022, 2023). Second, we provide crucial identifiability guarantees for the proposed new models. Somewhat surprisingly, we prove that our general model is identifiable under structural conditions on the $\mathbf{Q}$-matrix similar to those for traditional categorical-response CDMs. This is the first identifiability result for CDMs with non-categorical responses. Our conclusions set up a new paradigm of identifiable general-response CDMs and significantly advance the psychometric theory of diagnostic modeling. Third, we propose an EM algorithm to efficiently estimate the model parameters. As concrete demonstrations, we consider the DINA and main-effect-based CDMs for general responses and derive explicit updates in the EM algorithms. Our simulation results corroborate the identifiability theory and demonstrate the superior empirical performance of the proposed estimation algorithms.

The remainder of this manuscript is organized as follows. Section 2 formally introduces our general-response CDM framework and gives many examples of its parametric submodels. Section 3 provides conditions for model identifiability. Section 4 proposes a general-purpose EM algorithm to estimate the model parameters. Section 5 presents simulation studies under various response types. Section 6 illustrates our methodology via application to a real-world response time dataset from the Trends in International Mathematics and Science Study (TIMSS) in 2019. Section 7 concludes and discusses future research directions. The Supplementary Material contains the proofs of the theorems and more details of the simulation studies and real data analysis.

2. General-Response Cognitive Diagnostic Models

2.1. General Model Setup

Consider an educational assessment with J items, and denote an observed response vector by $\mathbf{Y} = (Y_1, \ldots, Y_J) \in \times_{j=1}^J \mathcal{Y}_j$, where each $\mathcal{Y}_j$ is the sample space of the random variable $Y_j$. In particular, our main examples are (a) continuous responses with $\mathcal{Y}_j = \mathbb{R}$, and (b) count responses with $\mathcal{Y}_j = \{0, 1, 2, \ldots\}$, the set of all nonnegative integers. Beyond continuous and count responses, our general modeling framework covers a much wider class of responses with a general sample space $\mathcal{Y}_j$. This includes vector-valued responses that arise in modern assessment data, such as a joint vector of response accuracy, response time, and visual fixation counts (De Boeck and Jeon, 2019; Man and Harring, 2022). Another example is continuous features learned from process data via dimension reduction methods (He and von Davier, 2016; Tang et al., 2020). For a more rigorous presentation with minimal assumptions on $\mathcal{Y}_j$, measure-theoretic definitions are required, and we provide those details in Supplementary Material S.1. Note that we also allow the response type $\mathcal{Y}_j$ to differ across $j = 1, \ldots, J$, that is, mixed-type responses (e.g., see Moustaki and Knott, 2000).

To define the general-response CDMs, we start by specifying the latent part. We model each student’s latent skill profile $\mathbf{A}$ as a binary K-dimensional vector $\mathbf{A} = (A_1, \ldots, A_K) \in \{0,1\}^K$ for diagnostic modeling purposes. Here $A_k = 1$ or 0 represents the presence or absence of the kth latent skill. Following most existing studies on CDMs (Chen et al., 2015, 2018), we adopt the saturated model for the latent attributes; that is, for each skill profile $\boldsymbol{\alpha} \in \{0,1\}^K$, define its population proportion parameter as $p_{\boldsymbol{\alpha}} = \mathbb{P}(\mathbf{A} = \boldsymbol{\alpha})$. These proportion parameters satisfy

(1) $$\sum_{\boldsymbol{\alpha} \in \{0,1\}^K} p_{\boldsymbol{\alpha}} = 1, \quad p_{\boldsymbol{\alpha}} > 0.$$

We collect all proportion parameters in the vector $\boldsymbol{p} = (p_{\boldsymbol{\alpha}}: \boldsymbol{\alpha} \in \{0,1\}^K)$.
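As a concrete (purely illustrative) example of the saturated attribute model, the short Python sketch below enumerates all $2^K$ skill profiles and draws students’ profiles from a proportion vector $\boldsymbol{p}$; the values of K, N, and $\boldsymbol{p}$ are hypothetical and chosen only for demonstration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

K = 3  # number of latent attributes (hypothetical)
# All 2^K binary skill profiles alpha in {0,1}^K.
all_profiles = np.array(list(itertools.product([0, 1], repeat=K)))

# Saturated model: one free proportion p_alpha per profile.
# Here p is drawn at random; it sums to 1 with all entries > 0, as required in (1).
p = rng.dirichlet(np.ones(2 ** K))

# Draw N students' latent profiles A from the categorical distribution over the 2^K patterns.
N = 5
idx = rng.choice(2 ** K, size=N, p=p)
A = all_profiles[idx]   # N x K binary matrix of skill profiles
print(A)
```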

Next, we define the measurement model, which specifies the conditional distribution of the observed responses $\mathbf{Y}$ given the latent skills $\mathbf{A}$. The main ingredients in this definition are (a) the local independence assumption, and (b) constraints induced by the $\mathbf{Q}$-matrix (Tatsuoka, 1983). Let $\mathbb{P}_{j,\boldsymbol{\alpha}}(Y_j \mid \mathbf{A} = \boldsymbol{\alpha})$ (or $\mathbb{P}_{j,\boldsymbol{\alpha}}$ for short) denote the conditional distribution of $Y_j$ given $\mathbf{A} = \boldsymbol{\alpha}$. For any positive integer M, let $[M] := \{1, 2, \ldots, M\}$ denote the set of all positive integers no greater than M. First, under local independence, the observed $Y_1, \ldots, Y_J$ are conditionally independent given the latent $A_1, \ldots, A_K$, which implies:

(2) $$\mathbb{P}\big(\mathbf{Y} \in \times_{j=1}^J S_j \mid \mathbf{A} = \boldsymbol{\alpha}\big) = \prod_{j=1}^J \mathbb{P}_{j,\boldsymbol{\alpha}}(Y_j \in S_j \mid \mathbf{A} = \boldsymbol{\alpha}), \quad \forall S_j \in \mathcal{F}_j,\ j \in [J].$$

Here, the notation $\mathbf{Y} \in \times_{j=1}^J S_j$ means that $Y_j \in S_j$ for all j, and $\mathcal{F}_j$ is a collection of measurable subsets of the sample space $\mathcal{Y}_j$ that determines a probability distribution on $\mathcal{Y}_j$. For continuous responses, $\mathcal{F}_j$ can be the collection of all open intervals in $\mathbb{R}$; for count responses, $\mathcal{F}_j$ can be the collection of all subsets of $\mathcal{Y}_j$.

Figure 1. Graphical model of the general-response CDM with $Y_j \in \mathcal{Y}_j$. White nodes are latent attributes, and gray nodes are observed responses. The directed arrows from the latent to the observed capture the conditional dependence of $\mathbf{Y}$ given $\mathbf{A}$, which is exactly encoded in the $\mathbf{Q}$-matrix $\mathbf{Q}_{J \times K}$. There is a directed arrow from $A_k$ to $Y_j$ if and only if $q_{j,k} = 1$.

Next, we describe the constraints on the conditional distributions $\mathbb{P}_{j,\boldsymbol{\alpha}}$ imposed by the $\mathbf{Q}$-matrix. The $\mathbf{Q}$-matrix was initially proposed by Tatsuoka (1983) for modeling binary responses under the cognitive diagnostic assumption. The $\mathbf{Q}$-matrix $\mathbf{Q} = (q_{j,k})$ is a $J \times K$ matrix with binary entries, where the (j, k)-th entry $q_{j,k}$ equals 1 if item j requires or measures the latent attribute k, and 0 otherwise. Statistically, the entries in the $\mathbf{Q}$-matrix describe how the J observed item responses depend on the latent attributes. For convenience, we also use the notation $\text{pa}(j) = \{k \in [K]: q_{j,k} = 1\}$ to represent the set of attributes required/measured by item j. Here “pa” is short for “parent”, motivated by the graphical model perspective of the proposed model; see Remark 1 below for details. We write $\boldsymbol{\alpha}_{\text{pa}(j)}$ for the sub-vector of $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_K)$ that contains the entries indexed by $k \in \text{pa}(j)$. Then, by the definition of the $\mathbf{Q}$-matrix, the distribution $\mathbb{P}_{j,\boldsymbol{\alpha}}(Y_j \in S_j \mid \mathbf{A} = \boldsymbol{\alpha},\ \mathbf{Q})$ must satisfy

(3) $$\mathbb{P}_{j,\boldsymbol{\alpha}}(Y_j \in S_j \mid \mathbf{A} = \boldsymbol{\alpha},\ \mathbf{Q}) = \mathbb{P}_{j,\boldsymbol{\alpha}}(Y_j \in S_j \mid \mathbf{A}_{\text{pa}(j)} = \boldsymbol{\alpha}_{\text{pa}(j)}), \quad \forall S_j \in \mathcal{F}_j;$$

that is, the conditional distribution of $Y_j$ given all the latent attributes is the same as the conditional distribution of $Y_j$ given the required attributes of item j as indicated by the $\mathbf{Q}$-matrix. To make the $\mathbf{Q}$-matrix play a meaningful role in describing the conditional distribution of the observed variables given the latent variables, we make the mild assumption that for each $j \in [J]$, there exist latent patterns $\boldsymbol{\alpha}$ and $\boldsymbol{\alpha}'$ with $\boldsymbol{\alpha}_{\text{pa}(j)} \ne \boldsymbol{\alpha}'_{\text{pa}(j)}$ such that $\mathbb{P}_{j,\boldsymbol{\alpha}} \ne \mathbb{P}_{j,\boldsymbol{\alpha}'}$. In the special case of binary-response CDMs, this assumption is readily satisfied by many popular existing models. In this work, we consider the confirmatory modeling setting where the $\mathbf{Q}$-matrix is known.
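To make the notation concrete, the following Python sketch (with a small hypothetical $\mathbf{Q}$-matrix of our own choosing) computes $\text{pa}(j)$ for each item and extracts the sub-vector $\boldsymbol{\alpha}_{\text{pa}(j)}$ that, by constraint (3), is all that item j’s conditional distribution may depend on.

```python
import numpy as np

# A hypothetical 4-item, 3-attribute Q-matrix (rows: items, columns: attributes).
Q = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
J, K = Q.shape

# pa(j) = {k : q_{j,k} = 1}, the attributes required/measured by item j.
pa = [np.flatnonzero(Q[j]).tolist() for j in range(J)]

# Constraint (3): item j's conditional distribution may depend on alpha
# only through the sub-vector alpha_pa(j).
alpha = np.array([1, 0, 1])
for j in range(J):
    print(f"item {j + 1}: pa = {pa[j]}, alpha_pa = {alpha[pa[j]].tolist()}")
```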

Remark 1

Viewed from a probabilistic graphical modeling perspective, the $\mathbf{Q}$-matrix can also be represented by a bipartite graph with directed arrows pointing from the latent variables to the observed ones; see Fig. 1 for a graphical model illustration. Then, $\text{pa}(j)$ exactly denotes the parent attributes that have a directed arrow toward $Y_j$, and Assumption (3) can be interpreted as the graphical model being faithful (e.g., see Definition 3.8 in Koller and Friedman, 2009). For example, in the setting of Fig. 1, $A_1$ is the only parent attribute of $Y_2$, and thus $\text{pa}(2) = \{1\}$.

Now, we formally define the general-response CDM based on the above assumptions.

Definition 1

(General-response CDM) A general-response CDM with K binary latent attributes $\mathbf{A} = (A_1, \ldots, A_K)$, J observed responses $\mathbf{Y} = (Y_1, \ldots, Y_J)$, and model components $(\boldsymbol{p}, \{\mathbb{P}_{j,\boldsymbol{\alpha}}\})$ is a statistical model for $\mathbf{Y}$ that satisfies (1), (2), and (3).

Under Definition 1, the marginal distribution of $\mathbf{Y}$ in a general-response CDM is:

(4) $$\mathbb{P}\big(\mathbf{Y} \in \times_{j=1}^J S_j \mid \mathbf{Q}\big) = \sum_{\boldsymbol{\alpha} \in \{0,1\}^K} p_{\boldsymbol{\alpha}} \prod_{j=1}^J \mathbb{P}_{j,\boldsymbol{\alpha}}(Y_j \in S_j \mid \mathbf{A}_{\text{pa}(j)} = \boldsymbol{\alpha}_{\text{pa}(j)}), \quad \forall S_j \in \mathcal{F}_j.$$
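As an illustration of the finite mixture structure in (4), the Python sketch below evaluates the marginal probability of a count-valued response vector by summing over the $2^K$ latent profiles. Everything in it is hypothetical: a small $\mathbf{Q}$-matrix, uniform proportions, and conjunctive (DINA-style) Poisson rates chosen by us, so that each item’s distribution depends on $\boldsymbol{\alpha}$ only through $\boldsymbol{\alpha}_{\text{pa}(j)}$ and constraint (3) holds by construction.

```python
import itertools
import numpy as np
from scipy.stats import poisson

# Hypothetical example: J = 3 count-valued items, K = 2 attributes, Poisson item models.
Q = np.array([[1, 0], [0, 1], [1, 1]])
J, K = Q.shape
profiles = np.array(list(itertools.product([0, 1], repeat=K)))
p = np.full(2 ** K, 1.0 / 2 ** K)      # saturated proportions (uniform, for illustration)

# lam[j, a]: Poisson rate of item j under profile a; it depends on alpha only
# through alpha_pa(j) (here via a conjunctive rule), so constraint (3) holds.
lam = np.array([[2.0 + 3.0 * alpha[Q[j] == 1].prod() for alpha in profiles]
                for j in range(J)])

def marginal_pmf(y):
    """Marginal probability of response vector y under (4):
    sum over alpha of p_alpha * prod_j P_{j,alpha}(Y_j = y_j)."""
    per_item = poisson.pmf(np.asarray(y)[:, None], lam)   # J x 2^K matrix
    return float(p @ per_item.prod(axis=0))

print(marginal_pmf([2, 5, 4]))
```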

The proposed general-response CDMs form a very general and flexible framework of cognitive diagnostic models. First, this framework includes existing popular CDMs such as the DINA model, Reduced-RUM, LCDM, and GDM for categorical responses as submodels. This is because one can let the conditional distribution $\mathbb{P}_{j,\boldsymbol{\alpha}}(Y_j \mid \mathbf{A} = \boldsymbol{\alpha})$ be a specific parametric distribution, such as the Bernoulli with a certain link function for modeling binary responses (elaborated in the next paragraph). Second, our framework also includes the recently proposed CDMs for continuous or count responses (e.g., Minchen and de la Torre, 2018; Liu et al., 2022) as special cases, and is further able to model other response types.

As a concrete illustration, we next present the special case of our model (4) when the responses are all binary, which is the most widely considered scenario for CDMs. In this case, $\mathcal{Y}_j = \{0, 1\}$ for all $j \in [J]$ and $\mathbb{P}_{j,\boldsymbol{\alpha}}$ is simply the Bernoulli distribution. We write the Bernoulli parameter of $\mathbb{P}_{j,\boldsymbol{\alpha}}$ as $\theta_{j,\boldsymbol{\alpha}}$. Note that in the classical binary-response CDM literature, $\theta_{j,\boldsymbol{\alpha}}$ is often called the positive response probability. The conditional independence condition (3) boils down to the following equality constraints on the parameters $\theta_{j,\boldsymbol{\alpha}}$:

(5) $$\theta_{j,\boldsymbol{\alpha}} = \theta_{j,\boldsymbol{\alpha}'}, \quad \forall \boldsymbol{\alpha}, \boldsymbol{\alpha}' \in \{0,1\}^K \text{ such that } \boldsymbol{\alpha}_{\text{pa}(j)} = \boldsymbol{\alpha}'_{\text{pa}(j)}.$$

The above constraints are satisfied in all existing CDMs for binary responses, and they have also appeared as assumptions in identifiability studies of the related restricted latent class models (Xu, 2017; Gu and Xu, 2020). As a more concrete example, $\theta_{j,\boldsymbol{\alpha}}$ under the DINA model with slipping parameter $s_j$ and guessing parameter $g_j$ can be written as

$$\theta_{j,\boldsymbol{\alpha}} = (1-s_j) \prod_{k=1}^K \alpha_k^{q_{j,k}} + g_j \Big(1 - \prod_{k=1}^K \alpha_k^{q_{j,k}}\Big) = (1-s_j) \prod_{k \in \text{pa}(j)} \alpha_k + g_j \Big(1 - \prod_{k \in \text{pa}(j)} \alpha_k\Big),$$

where $1 - s_j > g_j$ is typically assumed. This parametrization clearly satisfies (5).
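The DINA parametrization above takes only a few lines of code. The sketch below (with a hypothetical q-vector and made-up slipping/guessing values) returns $\theta_{j,\boldsymbol{\alpha}} = 1 - s_j$ when all required attributes are mastered and $g_j$ otherwise.

```python
import numpy as np

def dina_theta(alpha, q_row, s, g):
    """DINA positive response probability theta_{j,alpha} for one item:
    (1 - s_j) if alpha masters every attribute in pa(j), and g_j otherwise."""
    xi = np.all(alpha[q_row == 1] == 1)   # ideal response: prod_{k in pa(j)} alpha_k
    return (1 - s) * xi + g * (1 - xi)

q_row = np.array([1, 1, 0])               # hypothetical item requiring attributes 1 and 2
print(dina_theta(np.array([1, 1, 0]), q_row, s=0.1, g=0.2))   # 0.9 (full mastery)
print(dina_theta(np.array([1, 0, 1]), q_row, s=0.1, g=0.2))   # 0.2 (missing attribute 2)
```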

2.2. Parametric and Exponential Family-Based CDMs (ExpCDMs)

We next define parametric general-response CDMs and illustrate how our general framework can be specified according to various modeling assumptions and response types. In this section, for notational simplicity, we additionally assume that the response types are the same across items (i.e., $(\mathcal{Y}_j, \mathcal{F}_j)$ is the same for all $j \in [J]$) and that all responses follow the same parametric family $\mathcal{P} = \{g(\cdot;\boldsymbol{\eta})\}$. Under this assumption, we can further write $\mathbb{P}_{j,\boldsymbol{\alpha}}$ as

(6) $$\mathbb{P}_{j,\boldsymbol{\alpha}}(Y_j \in S_j \mid \mathbf{A} = \boldsymbol{\alpha}) = \begin{cases} \int_{S_j} g(y; \boldsymbol{\eta}_{j,\boldsymbol{\alpha}})\, \textrm{d}y, & \text{if } \mathcal{Y}_j = (a, b) \text{ or } \mathbb{R}, \\ \sum_{y \in S_j} g(y; \boldsymbol{\eta}_{j,\boldsymbol{\alpha}}), & \text{if } \mathcal{Y}_j = \{0, 1, 2, \ldots\}. \end{cases}$$

Here, $g(\cdot;\boldsymbol{\eta})$ is the probability density/mass function of a specific identifiable parametric family $\mathcal{P}$ (that is, $\boldsymbol{\eta} \mapsto g(\cdot;\boldsymbol{\eta})$ is a one-to-one mapping), and the $\boldsymbol{\eta}_{j,\boldsymbol{\alpha}}$’s are parameters that can be scalars or vectors. We next formally define a parametric CDM.

Definition 2

(Parametric general-response CDM) Given an identifiable parametric family $\mathcal{P} = \{g(\cdot;\boldsymbol{\eta})\}$, the parametric general-response CDM with parameters $(\{\boldsymbol{\eta}_{j,\boldsymbol{\alpha}}\}, \boldsymbol{p})$ is a general-response CDM whose $\mathbb{P}_{j,\boldsymbol{\alpha}}$ satisfies (6).

For parametric general-response CDMs, the $\mathbf{Q}$-matrix constraints (3) reduce to the following constraints on the parameters $\boldsymbol{\eta}$, analogous to (5) in the binary-response case:

(7) $$\boldsymbol{\eta}_{j,\boldsymbol{\alpha}} = \boldsymbol{\eta}_{j,\boldsymbol{\alpha}'}, \quad \forall \boldsymbol{\alpha}, \boldsymbol{\alpha}' \in \{0,1\}^K \text{ such that } \boldsymbol{\alpha}_{\text{pa}(j)} = \boldsymbol{\alpha}'_{\text{pa}(j)}.$$
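One simple way to enforce constraint (7) in practice is to index each item’s parameter by the sub-pattern $\boldsymbol{\alpha}_{\text{pa}(j)}$ rather than by the full profile $\boldsymbol{\alpha}$. The sketch below does exactly this with a hypothetical $\mathbf{Q}$-matrix and made-up parameter values; any two profiles that agree on $\text{pa}(j)$ automatically receive the same $\boldsymbol{\eta}_{j,\boldsymbol{\alpha}}$.

```python
import itertools
import numpy as np

K = 3
Q = np.array([[1, 1, 0],    # item 1 measures attributes 1 and 2
              [0, 0, 1]])   # item 2 measures attribute 3
profiles = list(itertools.product([0, 1], repeat=K))

# eta_by_subpattern[j] maps tuple(alpha_pa(j)) -> natural parameter (hypothetical values),
# so constraint (7) holds by construction.
eta_by_subpattern = [
    {(0, 0): -1.0, (0, 1): 0.0, (1, 0): 0.2, (1, 1): 1.5},   # item 1
    {(0,): -0.5, (1,): 1.0},                                 # item 2
]

def eta(j, alpha):
    pa_j = np.flatnonzero(Q[j])
    return eta_by_subpattern[j][tuple(alpha[k] for k in pa_j)]

# Profiles sharing the same alpha_pa(j) print identical eta values for item j.
for alpha in profiles:
    print(alpha, [eta(j, np.array(alpha)) for j in range(Q.shape[0])])
```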

In the context of modeling response times in psychometrics, the Gamma (Maris, 1993), Weibull (Loeys et al., 2011), and inverse Gaussian (Lo and Andrews, 2015) distributions have been commonly used in addition to the aforementioned lognormal distribution. In the context of modeling visual fixation counts or correct answer counts, the negative binomial distribution is a popular choice (Man and Harring, 2019; Liu et al., 2023) in addition to the aforementioned Poisson distribution (Liu et al., 2022). In the later sections on estimation methodology and simulation studies, we will consider a variety of distributions for $\mathcal{P}$: Normal, transformed-Normal (lognormal, logistic-Normal), Poisson, and negative binomial.

Except for the negative binomial distribution, all parametric distributions mentioned in the previous paragraph are exponential family distributions (e.g., see Section 3.4 in Casella and Berger, 2021), for which the probability density/mass function can be written as:

(8) $$g(y_j; \boldsymbol{\eta}) = h(y_j) \exp\{\boldsymbol{\eta}^{\top} \mathbf{T}(y_j) - A(\boldsymbol{\eta})\}.$$

Following the convention for exponential family distributions, $\boldsymbol{\eta}$ collects the natural parameters, $\mathbf{T}(Y_j)$ collects the sufficient statistics, and $A(\boldsymbol{\eta})$ is the log-partition function. Both $\boldsymbol{\eta}$ and $\mathbf{T}(Y_j)$ are typically multidimensional in our general-response CDM context. Exponential families are a natural and flexible choice for modeling rich types of responses and are widely used in statistics and psychometrics. For example, the Bernoulli distribution with success parameter $\theta$, commonly used to model binary responses, is an exponential family with natural parameter $\eta = \log\big(\theta/(1-\theta)\big)$ and sufficient statistic $T(Y) = Y$. In Table 1, we also present the natural parameters and sufficient statistics of the exponential families that we will later use to model general responses.

Table 1 Examples of exponential families, and their natural parameters and sufficient statistics.
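As a concrete illustration (our own sketch, not part of the original text), the following Python snippet writes the Bernoulli and Poisson distributions in the exponential family form (8) and checks them numerically against their standard parametrizations; the function names are hypothetical.

```python
import numpy as np
from math import lgamma, factorial

# Bernoulli(theta) in the form (8):
#   h(y) = 1, T(y) = y, eta = log(theta / (1 - theta)), A(eta) = log(1 + exp(eta))
def bernoulli_expfam_pmf(y, theta):
    eta = np.log(theta / (1 - theta))
    A = np.log1p(np.exp(eta))
    return np.exp(eta * y - A)

# Poisson(lam) in the form (8):
#   h(y) = 1 / y!, T(y) = y, eta = log(lam), A(eta) = exp(eta)
def poisson_expfam_pmf(y, lam):
    eta = np.log(lam)
    A = np.exp(eta)
    log_h = -lgamma(y + 1)          # log(1 / y!)
    return np.exp(log_h + eta * y - A)

# Numerical checks against the usual parametrizations
theta, lam = 0.3, 2.5
assert np.isclose(bernoulli_expfam_pmf(1, theta), theta)
assert np.isclose(bernoulli_expfam_pmf(0, theta), 1 - theta)
assert np.isclose(poisson_expfam_pmf(3, lam),
                  np.exp(-lam) * lam**3 / factorial(3))
```

The same template extends to the other families in Table 1 by swapping in the appropriate $h(y)$, ${\textbf{T}}(y)$, and $A(\varvec{\eta})$.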

We next define parametric general-response CDMs based on the exponential families; we call these models ExpCDMs.

Definition 3

(ExpCDM) We define an ExpCDM as a parametric CDM whose parametric family $\mathcal{P} = \{g(\cdot; \varvec{\eta})\}_{\varvec{\eta}}$ is an exponential family.

The general framework of ExpCDMs encompasses models for various response types, including categorical, count, and bounded and unbounded continuous data. At the end of this section, we will also briefly mention how to deal with even more general distributions outside of the exponential family, such as the negative binomial.

Remark 2

Using exponential family distributions to model general-response multivariate data with latent variables has a long history. Moustaki and Knott (2000) extended the IRT model to general responses by modeling the natural parameter of the exponential family distribution as a linear combination of continuous latent traits. This approach extends the generalized linear model (GLM) of Nelder and Wedderburn (1972) to the latent variable setting. In a similar spirit, Dunson (2000) used continuous latent variables and exponential family distributions to model multiple clustered outcomes.

In the context of factor analysis, Skrondal and Rabe-Hesketh (2004) extended linear factor analysis to general responses by using exponential families and proposed the generalized linear factor model. We emphasize that all of these previous models use continuous latent variables; to our knowledge, no existing work adopts a framework as flexible as the exponential family together with multidimensional discrete latent variables that serve diagnostic purposes.

To define a general-response CDM, two parts need to be specified: (a) the type of parametric family that models the response distribution, i.e., the forms of $\mathcal{P}$ and $g(\cdot; \varvec{\eta})$; and (b) the type of interactions between the latent attributes and the observed responses, i.e., in what way the $\varvec{\eta}_{j,\varvec{\alpha}}$ are subject to the ${\textbf{Q}}$-matrix constraints, or equivalently, whether the parameters for $Y_j$ are impacted by the main effects, the interaction effects, or both, of the required attributes. We have already shown that the ExpCDM framework provides a flexible way of specifying part (a). As for part (b), all the existing binary-response CDMs, including the conjunctive DINA model, the disjunctive DINO model, additive models (ACDM, reduced RUM), and more flexible all-effect models such as the GDINA model and the GDM, can be easily extended and incorporated into our ExpCDM framework.

We next show how all the above types of attribute-item interactions have their ExpCDM counterparts for modeling rich types of data. In the remainder of this section, we describe the exponential family-based DINA (ExpDINA) model and the exponential family-based additive CDM (ExpACDM). We define these models by further specifying how the natural parameters $\varvec{\eta}_{j,\varvec{\alpha}}$ satisfy the ${\textbf{Q}}$-matrix constraints in (7). Specifically, for each model, we present the lognormal distribution and the Poisson distribution as example exponential families to illustrate our general framework. These two distributions are suitable for modeling positive continuous data and nonnegative count data, respectively.

2.2.1. ExpDINA for General Responses

The DINA model was proposed by Junker and Sijtsma (2001) for modeling binary responses using a conjunctive assumption. Define the ideal response to item j given attribute pattern ${\textbf{A}} = \varvec{\alpha}$ to be

(9) $$\Gamma_{j,\varvec{\alpha}} = \prod_{k=1}^K \alpha_k^{q_{j,k}} = \prod_{k \in \text{pa}(j)} \alpha_k.$$

In other words, $\Gamma_{j,\varvec{\alpha}}$ is the binary indicator of whether a student with skill pattern $\varvec{\alpha}$ masters all of the skills required by item j and hence is “capable” of item j.
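As a minimal computational sketch (ours, not the paper's code; the function name `ideal_responses` is hypothetical), the ideal responses in (9) can be tabulated for all items and all $2^K$ attribute patterns directly from the ${\textbf{Q}}$-matrix:

```python
import itertools
import numpy as np

def ideal_responses(Q):
    """Tabulate the DINA ideal responses Gamma_{j,alpha} of (9).

    Q : (J, K) binary array with Q[j, k] = 1 if item j requires attribute k.
    Returns (gamma, patterns), where gamma is a (J, 2^K) binary array and
    patterns lists all alpha in {0,1}^K in lexicographic order.
    """
    J, K = Q.shape
    patterns = np.array(list(itertools.product([0, 1], repeat=K)))  # (2^K, K)
    # Conjunctive rule: Gamma = 1 iff alpha_k >= q_{j,k} for every attribute k
    gamma = (patterns[None, :, :] >= Q[:, None, :]).all(axis=2).astype(int)
    return gamma, patterns

# A toy 3-item, 3-attribute Q-matrix (arbitrary example)
Q = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1]])
gamma, patterns = ideal_responses(Q)   # gamma has shape (3, 8)
```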

Under the binary-response DINA model, the positive response probability $\theta_{j,\varvec{\alpha}}$ can be written as

(10) $$\theta_{j,\varvec{\alpha}} = \begin{cases} 1 - s_j, & \text{if } \Gamma_{j,\varvec{\alpha}} = 1, \\ g_j, & \text{if } \Gamma_{j,\varvec{\alpha}} = 0. \end{cases}$$

Following the conventional notation in the literature, $s_j$ is the slipping parameter, which gives the probability that a subject capable of item j provides an incorrect answer, and $g_j$ is the guessing parameter, which gives the probability that an incapable subject provides a correct answer. Continuing from (10), we can also write the Bernoulli natural parameters in terms of the item parameters $s_j$ and $g_j$ by using the following transformation:

$$\eta_{j,(h)} := \begin{cases} \log\left(\dfrac{1-s_j}{s_j}\right), & \text{if } h = 1, \\ \log\left(\dfrac{g_j}{1-g_j}\right), & \text{if } h = 0. \end{cases}$$

Here, h equals the value of the DINA model ideal response $\Gamma_{j,\varvec{\alpha}}$, and $\eta_{j,(h)}$ denotes the value of the natural parameter $\varvec{\eta}_{j,\varvec{\alpha}}$ given $\Gamma_{j,\varvec{\alpha}} = h$. A small numerical check of this transformation is sketched below, after which we define the general-response ExpDINA model.
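The following check (our own illustration with hypothetical names and arbitrary parameter values) confirms that inverting the logit transformation above recovers the success probabilities in (10):

```python
import numpy as np

def dina_natural_params(s_j, g_j):
    """Map slipping/guessing parameters to the Bernoulli natural parameters."""
    return {1: np.log((1 - s_j) / s_j), 0: np.log(g_j / (1 - g_j))}

eta = dina_natural_params(s_j=0.1, g_j=0.2)
sigmoid = lambda x: 1 / (1 + np.exp(-x))
assert np.isclose(sigmoid(eta[1]), 0.9)   # 1 - s_j for capable subjects
assert np.isclose(sigmoid(eta[0]), 0.2)   # g_j for incapable subjects
```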

Definition 4

(ExpDINA) The ExpDINA model is a submodel of the ExpCDM for which the natural parameters can be written as $\varvec{\eta}_{j,\varvec{\alpha}} = \varvec{\eta}_{j,(\Gamma_{j,\varvec{\alpha}})}$ for $j \in [J]$ and $\varvec{\alpha} \in \{0,1\}^K$. The parameters in an ExpDINA include the item parameters $\{\varvec{\eta}_{j,(0)}, \varvec{\eta}_{j,(1)}: j \in [J]\}$ and the proportion parameters ${\varvec{p}}$.

Remark 3

We remark that for certain distributions, there may exist a more conventional parametrization than directly parametrizing the natural parameters to depend on the latent attributes. For instance, in the ExpDINA with a normal distribution for $Y_j \mid {\textbf{A}} = \varvec{\alpha}$ (see Example 1), it is more intuitive to let the mean $\mu$ and the variance $\sigma^2$ depend on $\varvec{\alpha}$ than to assume so for the natural parameters $\varvec{\eta} = \left(\mu/\sigma^2, \, -1/(2\sigma^2)\right)^{\top}$. Such equivalent re-parametrizations still fit well into our definition of ExpCDMs.

Under an ExpDINA model, each natural parameter for each item j takes exactly two possible values, for capable ($\Gamma_{j,\varvec{\alpha}} = 1$) and incapable ($\Gamma_{j,\varvec{\alpha}} = 0$) subjects, respectively. ExpDINA still adopts the conjunctive assumption on the required attributes and can be used to model rich types of response data. Under Definition 4, the conditional distribution $Y_j \mid {\textbf{A}} = \varvec{\alpha}$ in (6) is:

(11) $$Y_j \mid {\textbf{A}} = \varvec{\alpha} \sim \begin{cases} g\left(Y_j; \varvec{\eta}_{j,(1)}\right), & \text{if } \Gamma_{j,\varvec{\alpha}} = 1; \\ g\left(Y_j; \varvec{\eta}_{j,(0)}\right), & \text{if } \Gamma_{j,\varvec{\alpha}} = 0. \end{cases}$$

Therefore, ExpDINA essentially models each response $Y_j$ as a local mixture of two distributions, similar in spirit to the binary-response DINA model. Indeed, ExpDINA covers the binary-response DINA and polytomous-response DINA as special cases. From the definition of $\Gamma_{j,\varvec{\alpha}}$, it is clear that the ${\textbf{Q}}$-matrix constraints in (7) are satisfied.

Example 1

(Lognormal-DINA, Logistic-Normal-DINA, and transformed-Normal-DINA) As an illustrative example, consider using the lognormal distribution to model positive continuous responses. The lognormal distribution is very commonly used to model response times (van der Linden, 2006, 2007; Minchen et al., 2017). The lognormal density with mean parameter $\mu$ and variance parameter $\sigma^2$ takes the form

$$g^{\text{lognormal}}(y; \mu, \sigma^2) = \frac{1}{y\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(\log y - \mu)^2}{2\sigma^2}\right], \quad y \in (0, \infty).$$

As mentioned in Remark 3, we directly parameterize the mean $\mu_{j,h}$ and variance $\sigma_{j,h}^2$ for the capable or incapable students based on the DINA model ideal response value $h = 1$ or 0, respectively, instead of using the less interpretable natural parameters $\varvec{\eta} = \left(\mu/\sigma^2, \, -1/(2\sigma^2)\right)^{\top}$. Then, our lognormal-DINA model becomes the so-called C-DINA model proposed by Minchen et al. (2017):

(12) $$Y_j \mid {\textbf{A}} = \varvec{\alpha} \sim \begin{cases} \text{lognormal}(\mu_{j,1}, \; \sigma_{j,1}^2), & \text{if } \Gamma_{j,\varvec{\alpha}} = 1, \\ \text{lognormal}(\mu_{j,0}, \; \sigma_{j,0}^2), & \text{if } \Gamma_{j,\varvec{\alpha}} = 0. \end{cases}$$

Note that under the same parameter values $(\mu, \sigma^2)$, the lognormal distribution is just an exponential transformation of the Normal distribution. Hence, the lognormal-DINA and Normal-DINA models are equivalent up to an invertible transformation of Y. Importantly, ExpCDMs based on any monotone-increasing transformation of the Normal are equivalent to Normal-CDMs, that is, models in which $Y_j \mid {\textbf{A}} = \varvec{\alpha}$ follows a Normal distribution. This observation can be used to build ExpCDMs for continuous responses that take values in a restricted range. For example, in addition to the lognormal-CDM, we can define the logistic-Normal-CDM by applying the inverse logit (i.e., logistic) transformation to a Normal random variable, which yields bounded responses in the interval (0, 1). Here, the density function of the logistic-Normal$(\mu, \sigma^2)$ distribution is

$$g^{\text{logistic-Normal}}(y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}\, y(1-y)} \exp\left[-\frac{\left(\log\left(y/(1-y)\right) - \mu\right)^2}{2\sigma^2}\right], \quad y \in (0, 1).$$

The logistic-Normal-DINA model can be defined as:

$$Y_j \mid {\textbf{A}} = \varvec{\alpha} \sim \text{logistic-Normal}(\mu_{j,h}, \, \sigma_{j,h}^2), \quad h = \Gamma_{j,\varvec{\alpha}}.$$

This newly proposed logistic-Normal-DINA model may be useful for analyzing continuous bounded responses in psychological or educational assessments.
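To make Example 1 concrete, here is a brief simulation sketch (our own illustration; the function name and parameter values are arbitrary) that draws item responses from the lognormal-DINA model (12) and from the logistic-Normal-DINA variant, given a subject's ideal response on an item:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dina_item(gamma_j, mu, sigma2, kind="lognormal"):
    """Draw one response for an item under ExpDINA, given the ideal
    response gamma_j in {0, 1}.

    mu, sigma2 : dicts mapping h in {0, 1} to (mu_{j,h}, sigma^2_{j,h}) for
                 the capable (h = 1) and incapable (h = 0) groups.
    """
    z = rng.normal(mu[gamma_j], np.sqrt(sigma2[gamma_j]))  # underlying Normal
    if kind == "lognormal":          # positive responses, e.g., response times
        return np.exp(z)
    if kind == "logistic-normal":    # bounded responses in (0, 1)
        return 1 / (1 + np.exp(-z))
    raise ValueError(kind)

mu = {1: 3.0, 0: 4.0}        # capable subjects have a smaller mean log-time
sigma2 = {1: 0.25, 0: 0.25}
times = [simulate_dina_item(1, mu, sigma2) for _ in range(5)]
probs = [simulate_dina_item(0, mu, sigma2, kind="logistic-normal") for _ in range(5)]
```

Both cases transform the same underlying Normal draw, reflecting the equivalence up to a monotone transformation discussed above.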

Example 2

(Poisson-DINA) As another example, we consider using the Poisson distribution to model count responses. The Poisson distribution is a canonical choice for modeling count responses and was recently used in Liu et al. (2022) to model the number of correct answers in repetitive tasks under a diagnostic model. The Poisson distribution with mean parameter $\lambda$ has the probability mass function

$$g(y; \lambda) = \frac{e^{-\lambda} \lambda^y}{y!}, \quad y \in \mathbb{N},$$

where $\mathbb{N}$ denotes the collection of all nonnegative integers. By defining the mean for $h = 0, 1$ as $\lambda_{j,h}$ (or, equivalently, setting the natural parameters as $\varvec{\eta}_{j,(h)} = \log\lambda_{j,h}$), the distribution $Y_j \mid {\textbf{A}}$ can be written as

(13) $$Y_j \mid {\textbf{A}} = \varvec{\alpha} \sim \begin{cases} \text{Poisson}(\lambda_{j,1}), & \text{if } \Gamma_{j,\varvec{\alpha}} = 1, \\ \text{Poisson}(\lambda_{j,0}), & \text{if } \Gamma_{j,\varvec{\alpha}} = 0. \end{cases}$$

This can be viewed as a reparameterization of the PDCM-DINA model in Liu et al. (2022).
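A corresponding simulation sketch for the Poisson-DINA model (13); this is our own illustration with arbitrary rates, not code from the cited work:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_poisson_dina(gamma, lam1, lam0):
    """Simulate count responses under the Poisson-DINA model (13).

    gamma : (N, J) array of ideal responses Gamma_{j,alpha_i} for each
            subject i and item j.
    lam1, lam0 : length-J arrays of rates for capable / incapable subjects.
    """
    rates = np.where(gamma == 1, lam1, lam0)   # pick lambda_{j,1} or lambda_{j,0}
    return rng.poisson(rates)

gamma = np.array([[1, 0, 1],
                  [0, 0, 1]])
lam1 = np.array([8.0, 6.0, 10.0])   # capable subjects produce higher counts
lam0 = np.array([3.0, 2.0, 4.0])
Y = simulate_poisson_dina(gamma, lam1, lam0)   # (2, 3) array of counts
```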

2.2.2. ExpACDM for General Responses

We next define the exponential family-based additive CDM (ExpACDM) for general responses. The word additive refers to the assumption that the required skills of item j enter the conditional distribution of $Y_j$ through an additive linear combination of the latent attributes $A_1, \ldots, A_K$. Variants of such additive diagnostic models are very popular in the binary-response CDM literature. Examples include the reduced reparameterized unified model (R-RUM; DiBello et al., 1995), the additive cognitive diagnostic model (ACDM; de la Torre, 2011), and the logistic linear model (LLM; Maris, 1999).

We continue to work in the general framework of general-response CDMs in (6). Under the additive assumption, we write the parameters $\varvec{\eta}_{j,\varvec{\alpha}}$ as

(14) $$\varvec{\eta}_{j,\varvec{\alpha}} = {\varvec{h}}\left(\beta_{j,0} + \sum_{k=1}^K \beta_{j,k} q_{j,k} \alpha_k, \; \gamma_j\right) = {\varvec{h}}\left(\beta_{j,0} + \sum_{k \in \text{pa}(j)} \beta_{j,k} \alpha_k, \; \gamma_j\right).$$

We explain the notation in the above expression in turn. First, similar to binary-response additive CDMs, the coefficient $\beta_{j,k}$ is not needed to define the model when $q_{j,k} = 0$. So, without loss of generality, we assume $\beta_{j,k} = 0$ whenever $q_{j,k} = 0$. If $q_{j,k} = 1$, then $\beta_{j,k}$ is the main-effect coefficient for the kth latent attribute. One can see that the linear combination $\beta_{j,0} + \sum_{k=1}^K \beta_{j,k} q_{j,k} \alpha_k$ depends only on the required attributes of item j, so $\varvec{\eta}_{j,\varvec{\alpha}}$ defined by (14) satisfies the ${\textbf{Q}}$-matrix constraints in (7).

The $\gamma_j$ in (14) denotes potential additional model parameters that do not depend on the latent attributes (e.g., $\gamma_j$ may be the dispersion parameter of the exponential family distribution that is used), and ${\varvec{h}}$ in (14) is a link function that plays a role very similar to that of the link function in generalized linear models (e.g., Nelder and Wedderburn, 1972). Here, ${\varvec{h}}$ is introduced to map the linear combination of the required attributes to the natural parameters $\varvec{\eta}$, as the natural parameter space may differ from the space of the linear combinations. Note that $\varvec{\eta}$ may be multidimensional, and hence ${\varvec{h}}$ can be a mapping between multidimensional spaces.
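To fix ideas, here is a small sketch (ours; the names and the log-link choice are illustrative assumptions, anticipating the Poisson case in Example 4) of how the additive linear predictor in (14) is passed through a link function ${\varvec{h}}$ to produce a natural parameter:

```python
import numpy as np

def acdm_linear_predictor(beta0, beta, q_j, alpha):
    """Additive linear predictor in (14) for one item j:
       beta0 + sum_k beta_k * q_{j,k} * alpha_k."""
    return beta0 + np.sum(beta * q_j * alpha)

# Hypothetical item with q_j = (1, 1, 0): only the first two attributes matter
q_j = np.array([1, 1, 0])
beta_j0, beta_j = 0.5, np.array([0.8, 0.4, 0.0])

alpha = np.array([1, 0, 1])                 # this subject masters skills 1 and 3
lin = acdm_linear_predictor(beta_j0, beta_j, q_j, alpha)

# Example link functions h mapping the linear predictor to a natural parameter:
eta_log_link = np.log(lin)   # e.g., when the predictor is a Poisson rate lambda
eta_identity = lin           # identity link, when the range already permits it
```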

With these notations at hand, we next formally define ExpACDMs.

Definition 5

(ExpACDM) An ExpACDM is a submodel of the ExpCDM for which the natural parameters $\varvec{\eta}_{j,\varvec{\alpha}}$ satisfy the additive form in (14).

The parameters in an ExpACDM with a link function ${\varvec{h}}$ include $\{\beta_{j,0}: j \in [J]\}$, $\{\beta_{j,k}: q_{j,k} = 1\}$, $\{\gamma_j: j \in [J]\}$, and the proportion parameters ${\varvec{p}}$.

Note that for the ExpDINA model defined earlier in Sect. 2.2.1, we parameterized the item parameters as $(\varvec{\eta}_{j,(0)}, \varvec{\eta}_{j,(1)})$, because there are only two possible values of $\{\varvec{\eta}_{j,\varvec{\alpha}}: \varvec{\alpha} \in \{0,1\}^K\}$ due to the conjunctive modeling assumption. In contrast, the ExpACDM allows more possible values of $\{\varvec{\eta}_{j,\varvec{\alpha}}\}$ as the attribute profile $\varvec{\alpha}$ ranges over $\{0,1\}^K$. So in Definition 5, we have parameterized ExpACDMs using the $\beta$- and $\gamma$-coefficients.

Example 3

(lognormal-ACDM) Continuing from Example 1, we consider the lognormal distribution for modeling the responses, but now with an additive structure in the attributes.

For the lognormal parameters $(\mu_{j,\varvec{\alpha}}, \sigma_{j,\varvec{\alpha}}^2)$, one can choose to model $\mu_{j,\varvec{\alpha}}$ as additive in the $\alpha_k$'s and $\sigma_{j,\varvec{\alpha}}^2$ as not depending on $\varvec{\alpha}$:

$$\mu_{j,\varvec{\alpha}} = \beta_{j,0} + \sum_{k=1}^K \beta_{j,k} q_{j,k} \alpha_k, \quad \sigma_{j,\varvec{\alpha}}^2 = \gamma_j.$$

Here, we can view the link function ${\varvec{h}}$ as ${\varvec{h}}(\mu, \sigma^2) = \left(\mu/\sigma^2, \, -1/(2\sigma^2)\right)$, which maps the mean and variance to the natural parameters under the lognormal distribution.

By plugging this parametrization into (6), $Y_j \mid {\textbf{A}}$ can be written as

(15) $$Y_j \mid {\textbf{A}} = \varvec{\alpha} \sim \text{lognormal}\left(\beta_{j,0} + \sum_{k=1}^K \beta_{j,k} q_{j,k} \alpha_k, \; \gamma_j\right).$$

Note that there are multiple modeling choices regarding how to specify the dependence of $\mu_{j,\varvec{\alpha}}$ and $\sigma_{j,\varvec{\alpha}}^2$ on the linear (i.e., additive) combinations of the $\alpha_k$'s.

One could model an additive mean and a constant variance after the log transformation, as done in (15) above. Alternatively, one could also model the variance as additive in the $\alpha_k$'s.
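A brief simulation sketch of the lognormal-ACDM in (15), using an additive mean on the log scale and a constant variance per item; this is our own illustration, and the parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_lognormal_acdm(alpha, Q, beta0, beta, gamma):
    """Simulate responses under the lognormal-ACDM (15).

    alpha : (N, K) attribute profiles; Q : (J, K) Q-matrix.
    beta0 : (J,) intercepts; beta : (J, K) main effects; gamma : (J,) variances.
    """
    mu = beta0 + alpha @ (beta * Q).T              # (N, J) additive log-scale means
    return np.exp(rng.normal(mu, np.sqrt(gamma)))  # lognormal draws

alpha = np.array([[1, 0], [1, 1], [0, 0]])
Q = np.array([[1, 0], [1, 1]])
beta0 = np.array([3.5, 3.0])
beta = np.array([[-0.6, 0.0], [-0.4, -0.5]])   # mastering a skill lowers the mean log-time
gamma = np.array([0.2, 0.3])
Y = simulate_lognormal_acdm(alpha, Q, beta0, beta, gamma)   # (3, 2) positive responses
```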

Example 4

(Poisson-ACDM) As a counterpart of the Poisson-DINA model in Example 2, we define an ExpACDM that uses the Poisson distribution to model count data.

Since the Poisson distribution has only one rate parameter, there is no need to introduce $\gamma_j$, and we only need to express the rate parameter as a linear combination of the required attributes:

$$\lambda_{j,\varvec{\alpha}} = \beta_{j,0} + \sum_{k=1}^K \beta_{j,k} q_{j,k} \alpha_k.$$

Recall that the natural parameter under the Poisson distribution is $\eta = \log\lambda$, so we define the link function ${\varvec{h}}$ as ${\varvec{h}}(\lambda) = \log\lambda$. We additionally assume $\beta_{j,k} \ge 0$ to ensure that the rate parameters $\lambda_{j,\varvec{\alpha}}$ are nonnegative. By plugging the above expression into (6), $Y_j \mid {\textbf{A}}$ can be written as

$$Y_j \mid {\textbf{A}} = \varvec{\alpha} \sim \text{Poisson}\left(\beta_{j,0} + \sum_{k=1}^K \beta_{j,k} q_{j,k} \alpha_k\right).$$

Alternatively, one could let ${\varvec{h}}$ be the identity link, ${\varvec{h}}(x) = x$, so that the natural parameter $\log\lambda_{j,\varvec{\alpha}}$ itself is additive in the attributes; in that case, the assumption $\beta_{j,k} \ge 0$ is not needed.
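A matching sketch for the Poisson-ACDM with the additive rate parametrization and nonnegative main effects above; again our own illustration with arbitrary values:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_poisson_acdm(alpha, Q, beta0, beta):
    """Simulate count responses under the Poisson-ACDM with additive rates
       lambda_{j,alpha} = beta_{j,0} + sum_k beta_{j,k} q_{j,k} alpha_k."""
    rates = beta0 + alpha @ (beta * Q).T   # (N, J), nonnegative by assumption
    return rng.poisson(rates)

alpha = np.array([[1, 0], [0, 1], [1, 1]])
Q = np.array([[1, 0], [1, 1]])
beta0 = np.array([2.0, 1.5])               # baseline rates for incapable subjects
beta = np.array([[3.0, 0.0], [2.0, 2.5]])  # beta_{j,k} >= 0 keeps the rates valid
Y = simulate_poisson_acdm(alpha, Q, beta0, beta)   # (3, 2) array of counts
```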

In Supplementary Material S.1, we define the exponential family-based general diagnostic model (ExpGDM) for general responses, and we also discuss how to define the general-response DINA and ACDM for distributions outside of the exponential family, such as the negative binomial distribution. In this section, we have focused on cases where the parametric families $\mathcal{P}$ do not change across different items $j \in [J]$, but it is straightforward to use our framework to model different distribution families for different items.

3. Identifiability of General-response CDMs

3.1. Strict Identifiability of General-Response CDMs

Model identifiability is a fundamental prerequisite for valid statistical estimation and inference. We will prove that our general-response CDMs are identifiable under transparent conditions on the ${\textbf{Q}}$-matrix. In this subsection, we first define strict identifiability for general-response CDMs and then propose conditions for strict identifiability.

Recall that $({\varvec{p}}, \{\mathbb{P}_{j,\varvec{\alpha}}: j \in [J], \varvec{\alpha} \in \{0,1\}^K\})$ can be viewed as the model parameters of a general-response CDM. We will say that $({\varvec{p}}, \{\mathbb{P}_{j,\varvec{\alpha}}\})$ is equal to $(\overline{{\varvec{p}}}, \{\overline{\mathbb{P}}_{j,\varvec{\alpha}}\})$ up to a sign flip for each coordinate of $\varvec{\alpha}$ if there exist permutation maps $\tau_k: \{0,1\} \rightarrow \{0,1\}$ such that $p_{\varvec{\alpha}} = \overline{p}_{(\tau_1(\alpha_1), \ldots, \tau_K(\alpha_K))}$ and $\mathbb{P}_{j,\varvec{\alpha}} = \overline{\mathbb{P}}_{j,(\tau_1(\alpha_1), \ldots, \tau_K(\alpha_K))}$ for any $j \in [J]$ and $\varvec{\alpha} \in \{0,1\}^K$. The sign-flipping ambiguity of the binary latent variables in Definition 6 is inevitable for general-response CDMs under the minimal assumptions that we make on the response types. Nevertheless, sign flipping is a trivial ambiguity and can be easily fixed in parametric submodels, including most ExpCDMs. For example, in traditional CDMs for binary responses, the sign of the latent variables is fixed by making the monotonicity assumption that students who possess all required skills of an item are more likely to answer it correctly than those who lack some required skills. Similarly, to resolve the sign-flipping issue in the lognormal-DINA model for continuous responses, one can assume that students who possess all skills required by an item have a larger mean parameter in the lognormal distribution than those who do not: $\mu_{j,1} > \mu_{j,0}$. More generally, for ExpACDMs, assuming $\beta_{j,k} > 0$ whenever $q_{j,k} = 1$ resolves the sign-flipping issue. This claim is formally presented later in Proposition 1. However, it is less convincing to assume monotonicity for every response type, as not all responses increase with more skill mastery, and moreover, an ordering may not even be defined on the sample space of the response, $\mathcal{Y}_j$, if it is not a subset of the real line $\mathbb{R}$. We next formally define strict identifiability up to sign flipping.

Definition 6

(Strict identifiability) Consider a general-response CDM with a known $\textbf{Q}$-matrix. The model is strictly identifiable if, for any parameters $(\varvec{p}, \{\mathbb{P}_{j,\varvec{\alpha}}\})$ and $(\overline{\varvec{p}}, \{\overline{\mathbb{P}}_{j,\varvec{\alpha}}\})$ satisfying (3), the following equations hold if and only if $(\varvec{p}, \{\mathbb{P}_{j,\varvec{\alpha}}\})$ is equal to $(\overline{\varvec{p}}, \{\overline{\mathbb{P}}_{j,\varvec{\alpha}}\})$ up to a sign flip for each coordinate of $\varvec{\alpha}$:

(16) $$\mathbb{P}\left({\textbf{Y}}\in \times_{j=1}^J S_j \,\middle|\, \overline{\varvec{p}}, \overline{\mathbb{P}}, {\textbf{Q}}\right) = \mathbb{P}\left({\textbf{Y}}\in \times_{j=1}^J S_j \,\middle|\, {\varvec{p}}, \mathbb{P}, {\textbf{Q}}\right), \quad \forall S_j \in \mathcal{F}_j.$$

Now we are ready to state the main theorem on strict identifiability. We emphasize that no further parametric assumptions are imposed beyond the three assumptions for the general-response CDM given in Definition 1. Hence, the result applies to general-response CDMs with an arbitrary response space $\mathcal{Y}_j$.

Theorem 1

Under the general-response CDM, the model components $(\varvec{p}, \{\mathbb{P}_{j,\varvec{\alpha}}\})$ are strictly identifiable if the following conditions hold.

A. The true $\textbf{Q}$-matrix contains two identity submatrices $\textbf{I}_K$ after row swapping, i.e., $\textbf{Q}$ can be written as
$$\textbf{Q} = [\textbf{I}_K, \textbf{I}_K, \textbf{Q}^{*\top}]^\top.$$
B. Suppose that the $\textbf{Q}$-matrix is written as in A. For any $\varvec{\alpha} \ne \varvec{\alpha}'$, there exists $j > 2K$ such that $\mathbb{P}_{j,\varvec{\alpha}} \ne \mathbb{P}_{j,\varvec{\alpha}'}$.
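
Condition A is purely a property of the $\textbf{Q}$-matrix and is straightforward to verify in practice. The following minimal Python sketch (our illustration, not part of the estimation procedures developed later) checks whether a binary Q-matrix contains the required number of identity submatrices after row swapping; setting `copies=2` corresponds to condition A, while `copies=3` checks the stronger three-copies condition of Proposition 2 below.

```python
import numpy as np

def contains_identity_copies(Q, copies=2):
    """Check whether the binary Q-matrix contains `copies` identity submatrices
    I_K after row swapping, i.e., at least `copies` rows equal to each standard
    basis vector e_k."""
    Q = np.asarray(Q)
    K = Q.shape[1]
    for k in range(K):
        e_k = np.zeros(K, dtype=Q.dtype)
        e_k[k] = 1
        # count the rows of Q that equal e_k
        if np.sum(np.all(Q == e_k, axis=1)) < copies:
            return False
    return True

# Example: a toy Q-matrix with K = 2 that satisfies condition A
Q = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 1]]
print(contains_identity_copies(Q, copies=2))   # True
print(contains_identity_copies(Q, copies=3))   # False
```

Condition B, in contrast, involves the conditional distributions $\mathbb{P}_{j,\varvec{\alpha}}$ and cannot be verified from the $\textbf{Q}$-matrix alone.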

The conditions in Theorem 1 are similar in spirit to existing identifiability conditions for CDMs with binary or polytomous responses, such as those in Xu (2017) and Culpepper (2019). When the responses $Y_j$ are unidimensional, we can additionally make a monotonicity assumption to avoid the sign-flipping issue. The following proposition establishes the strongest possible notion of identifiability under the monotonicity assumption (17). Here, $\textbf{0}_K$ and $\textbf{1}_K$ denote the all-zero and all-one vectors of length $K$, respectively.

Proposition 1

Suppose that a general-response CDM satisfies conditions A and B. Without loss of generality, suppose that the first $K$ rows of $\textbf{Q}$ form an identity matrix $\textbf{I}_K$.

Additionally, suppose that $\mathcal{Y}_j \subseteq \mathbb{R}$ and that the conditional distributions $\{\mathbb{P}_{j,\varvec{\alpha}}\}$ satisfy the following monotonicity assumption:

(17) $$\mathbb{E}(Y_j \mid \textbf{A} = \textbf{0}_K) < \mathbb{E}(Y_j \mid \textbf{A} = \textbf{1}_K),$$

for $1 \le j \le K$.

Then, the model components $(\varvec{p}, \{\mathbb{P}_{j,\varvec{\alpha}}\})$ are strictly identifiable with no sign-flipping issues. In other words, (16) implies $(\varvec{p}, \{\mathbb{P}_{j,\varvec{\alpha}}\}) = (\overline{\varvec{p}}, \{\overline{\mathbb{P}}_{j,\varvec{\alpha}}\})$.
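
As a practical illustration of how the sign-flipping ambiguity can be removed after fitting an ExpACDM, the following Python sketch relabels attributes post hoc so that the fitted main effects satisfy the positivity convention $\beta_{j,k} > 0$ whenever $q_{j,k} = 1$. The array shapes, the lexicographic pattern ordering, and the flipping heuristic (flip attribute $k$ when the average fitted main effect of items measuring $k$ is negative) are our own assumptions for this sketch, and the relabeled model is equivalent to the original fit only when all fitted main effects of a given attribute share the same sign.

```python
import numpy as np
from itertools import product

def relabel_attributes(p, beta0, beta, Q):
    """Post-hoc attribute relabeling for an ExpACDM fit (sketch).
    p     : (2^K,) pattern proportions, indexed by binary patterns in
            lexicographic order, e.g., (0,0), (0,1), (1,0), (1,1) for K = 2
    beta0 : (J,) intercepts; beta : (J, K) main effects; Q : (J, K) binary.
    Assumes every attribute is measured by at least one item."""
    Q = np.asarray(Q)
    beta = np.asarray(beta, dtype=float)
    K = Q.shape[1]
    # heuristic: flip attribute k if its fitted main effects are mostly negative
    flip = np.array([beta[Q[:, k] == 1, k].mean() < 0 for k in range(K)])

    # relabeling alpha_k -> 1 - alpha_k turns beta_{j,0} + sum_k beta_{j,k} alpha_k
    # into (beta_{j,0} + sum_{k flipped} beta_{j,k}) + sum_k (+/- beta_{j,k}) alpha_k
    beta0_new = np.asarray(beta0, dtype=float) + beta[:, flip].sum(axis=1)
    beta_new = beta.copy()
    beta_new[:, flip] = -beta[:, flip]

    # permute the pattern proportions accordingly
    patterns = list(product([0, 1], repeat=K))
    index = {pat: i for i, pat in enumerate(patterns)}
    p_new = np.empty_like(np.asarray(p, dtype=float))
    for pat, i in index.items():
        flipped = tuple(1 - a if flip[k] else a for k, a in enumerate(pat))
        p_new[index[flipped]] = p[i]
    return p_new, beta0_new, beta_new
```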

The proofs of all theoretical results are deferred to the Supplementary Material. We next provide the high-level intuition behind our proof argument. The main idea is to discretize all responses by first constructing a partition of each sample space $\mathcal{Y}_j$ into a finite number of categories and then defining surrogate categorical variables based on the partitions. Our model in (4) then implies a tensor decomposition for the probability mass function of the surrogate categorical variables. We leverage Kruskal's theorem (Kruskal, 1977) to establish uniqueness of this tensor decomposition under the $\textbf{Q}$-matrix conditions. Finally, we link this uniqueness result back to identifiability of the parameters based on the original distribution of the general responses $Y_j \in \mathcal{Y}_j$. Our proof strategy is motivated by Theorem 8 in Allman et al. (2009), which established identifiability of mixtures of products of nonparametric probability densities. However, that result cannot be directly applied to our setup, because we consider more general response types and our conditional distributions are subject to $\textbf{Q}$-matrix constraints.
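
To make the discretization step concrete, the following Python sketch (a toy illustration under our own simplifying assumptions, not the actual proof construction) bins continuous responses into quantile-based categories and forms the empirical joint probability tensor of three surrogate categorical variables, the kind of three-way array to which Kruskal's theorem applies; in the proof, the three groups of variables are determined by the $\textbf{Q}$-matrix structure rather than by simply taking the first three items.

```python
import numpy as np

def surrogate_tensor(Y, n_bins=3):
    """Discretize each response column into n_bins quantile-based categories and
    return the empirical pmf tensor of the first three surrogate variables."""
    Y = np.asarray(Y, dtype=float)
    N, J = Y.shape
    cats = np.empty((N, J), dtype=int)
    for j in range(J):
        # interior quantile cut points partition the sample space of Y_j
        edges = np.quantile(Y[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        cats[:, j] = np.digitize(Y[:, j], edges)   # values in {0, ..., n_bins - 1}
    T = np.zeros((n_bins,) * 3)
    for c1, c2, c3 in cats[:, :3]:
        T[c1, c2, c3] += 1.0 / N
    return T

# Example with simulated lognormal "response times" for J = 3 items
rng = np.random.default_rng(0)
Y = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 3))
print(surrogate_tensor(Y).shape)   # (3, 3, 3)
```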

As pointed out by a reviewer, our results extend identifiability conditions for finite mixtures of product distributions for continuous/count responses to the more intricate setting of general-response CDMs with $\textbf{Q}$-matrix constraints. While those traditional mixture models also satisfy our assumptions (1) and (2), they involve no $\textbf{Q}$-matrix constraints (3) that restrict the space of conditional distributions $\mathbb{P}_{j,\varvec{\alpha}}$'s. Hence, existing identifiability conditions for general-response finite mixtures of products (Teicher, 1967; Yakowitz and Spragins, 1968; Allman et al., 2009) are not directly applicable to general-response CDMs. In particular, these results commonly assume linear independence of the conditional distributions $\{\mathbb{P}_{j,\varvec{\alpha}}: \varvec{\alpha}\in \{0,1\}^K\}$ for each $j\in [J]$. However, for CDMs, the $\{\mathbb{P}_{j,\varvec{\alpha}}\}_{\varvec{\alpha}\in\{0,1\}^K}$'s are linearly dependent unless we consider the special case of the GDINA model with $\varvec{q}_j = \textbf{1}_K$ for all $j$, corresponding to the case where the $\textbf{Q}$-matrix imposes no restrictions at all. We remark that for certain parametric distributions, such as the exponential and normal distributions, it is known that mixture models with independent conditional distributions are identifiable (see Propositions 1 and 2 in Yakowitz and Spragins, 1968). However, such results heavily depend on the specific parametric family under consideration, whereas our results apply under minimal nonparametric assumptions.

Theorem 1 and Proposition 1 hold without assuming a specific measurement model such as the DINA model or the ACDM. For binary responses, there is an extensive literature on model identifiability under various assumptions (e.g., Chen et al., 2015; Xu, 2017; Xu and Shang, 2018; Chen et al., 2020; Gu and Xu, 2021). It is worth noting that, when no specific measurement model is assumed, our results in Theorem 1 and Proposition 1 nearly match the weakest existing identifiability conditions for categorical-response CDMs. For example, Theorem 1 in Xu (2017) stated that the parameters $(\varvec{p}, \varvec{\Theta})$ in a restricted latent class model are strictly identifiable under condition A and the following condition B1:

B1. For any $k \in [K]$, there exists one $j > 2K$ such that $\theta_{j,\textbf{e}_k} \ne \theta_{j,\textbf{0}_K}$.

Here, $\varvec{\Theta} = (\theta_{j,\varvec{\alpha}})_{j\in[J],\, \varvec{\alpha}\in\{0,1\}^K}$ is the positive response probability matrix with $\theta_{j,\varvec{\alpha}} = \mathbb{P}_{j,\varvec{\alpha}}(Y_j = 1 \mid \textbf{A} = \varvec{\alpha})$, and this determines the $\mathbb{P}_{j,\varvec{\alpha}}$'s, as mentioned in Sect. 2. Also, $\textbf{e}_k$ is the standard basis vector whose $k$th entry is one and whose other entries are zero. For exploratory CDMs with polytomous responses, Theorem 2 in Culpepper (2019) proved that the model parameters are identifiable under conditions A, B, and an additional condition C:

C. Each item $j \in [J]$ has distinct item response functions for at least two latent classes.

For general-response CDMs, the above additional assumption is equivalent to: "for all $j \in [J]$, there exist skill patterns $\varvec{\alpha} \ne \varvec{\alpha}'$ such that $\mathbb{P}_{j,\varvec{\alpha}} \ne \mathbb{P}_{j,\varvec{\alpha}'}$." Our Theorem 1 does not impose such an assumption on all items, but only implicitly imposes it for $j = 1, \ldots, 2K$ (because the first $2K$ rows of $\textbf{Q}$ form two copies of the identity matrix $\textbf{I}_K$). For $j > 2K$, we do not require each individual item to have patterns $\varvec{\alpha} \ne \varvec{\alpha}'$ with $\mathbb{P}_{j,\varvec{\alpha}} \ne \mathbb{P}_{j,\varvec{\alpha}'}$; condition B only requires that every pair of distinct patterns be separated by some item $j > 2K$.
In fact, Theorem 1 allows some $Y_j$ with $j > 2K$ to not depend on any attributes; equivalently, the corresponding row vector $\varvec{q}_j$ of the $\textbf{Q}$-matrix may be an all-zero row vector. Such all-zero rows can be absorbed into the submatrix $\textbf{Q}^*$ in condition A of Theorem 1.

Finally, in the following propositions, we present variations of Theorem 1 that replace condition B with more easily checkable conditions. In Proposition 2, we replace condition B in Theorem 1 with a stronger but more intuitive condition, which has previously been used to establish identifiability of exploratory diagnostic models for categorical responses (Fang et al., 2019; Gu and Dunson, 2023).

Proposition 2

Condition B in Theorem 1 holds when $\textbf{Q}^*$ contains an identity submatrix $\textbf{I}_K$. Hence, the general-response CDM is strictly identifiable when its $\textbf{Q}$-matrix vertically stacks three identity submatrices $\textbf{I}_K$ after some row swapping.

In the next proposition, we consider the ExpACDM with parameters $(\varvec{p}, \varvec{\beta}, \varvec{\gamma}, \textbf{Q})$, instead of the general-response CDM, and show how condition B can be simplified.

Proposition 3

For the ExpACDM, condition B in Theorem 1 reduces to:

B2. For any $\varvec{\alpha} \ne \varvec{\alpha}' \in \{0,1\}^K$, there exists $j > 2K$ such that $\sum_{k=1}^K q_{j,k}\beta_{j,k}(\alpha_k - \alpha_k') \ne 0$.

Hence, the ExpACDM is strictly identifiable when the $\textbf{Q}$-matrix satisfies condition A and the main-effect coefficients $\varvec{\beta}$ satisfy condition B2.
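
Condition B2 can also be checked directly once candidate main-effect values are available. The sketch below (our illustration; the array shapes are assumptions) verifies that the vectors of item-level main-effect scores $\bigl(\sum_k q_{j,k}\beta_{j,k}\alpha_k\bigr)_{j>2K}$ are distinct across all $2^K$ attribute patterns, which is equivalent to condition B2. The check is exponential in $K$ and uses rounding to guard against floating-point ties, which suffices for the small $K$ typical of diagnostic assessments.

```python
import numpy as np
from itertools import product

def satisfies_B2(Q, beta, K):
    """Check condition B2 for an ExpACDM sketch: every pair of distinct attribute
    patterns is separated by the main effects of some item j > 2K.
    Q: (J, K) binary; beta: (J, K) main effects with beta[j, k] = 0 if Q[j, k] = 0."""
    Q, beta = np.asarray(Q), np.asarray(beta, dtype=float)
    seen = set()
    for alpha in product([0, 1], repeat=K):
        a = np.array(alpha, dtype=float)
        # main-effect scores of the items beyond the first 2K rows
        scores = tuple(np.round((Q[2 * K:] * beta[2 * K:]) @ a, decimals=10))
        if scores in seen:      # two patterns are indistinguishable for all j > 2K
            return False
        seen.add(scores)
    return True
```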

3.2. Generic Identifiability of ExpACDMs

Although strict identifiability is the strongest possible notion of identifiability, in practice it may impose overly stringent conditions on the $\textbf{Q}$-matrix. For instance, the $\textbf{Q}$-matrix for the TIMSS 2019 response time dataset analyzed in Sect. 6 requires one content skill and one cognitive skill for each item (see Table 2 and the related discussion there). This $\textbf{Q}$-matrix does not contain any $K$-dimensional standard basis vectors as rows, and hence does not contain an identity submatrix $\textbf{I}_K$. In this subsection, we study generic identifiability, a slightly weaker notion than strict identifiability proposed by Allman et al. (2009). Generic identifiability requires that the model parameters be identifiable almost everywhere in the parameter space, in the sense that they are identifiable except on a Lebesgue measure-zero subset of the parameter space. Existing studies such as Gu and Xu (2020) and Chen et al. (2020) proposed practical generic identifiability conditions for binary-response CDMs that contain main effects of the latent attributes; those conditions are much weaker than the strict identifiability conditions.

To study generic identifiability, we next focus on parametric forms of $\mathbb{P}_{j,\varvec{\alpha}}$ and especially consider ExpACDMs that model the main effects of the latent attributes. For notational simplicity, we let $\varvec{\beta}_{\text{main}} = \{\beta_{j,k}\}_{j\in[J],\, k\in[K]}$ denote all main-effect coefficients, and $\varvec{\beta} = \varvec{\beta}_{\text{main}} \cup \{\beta_{j,0}\}_{j\in[J]}$ denote the collection of all intercepts and main-effect coefficients. Before defining the concept of generic identifiability, we define the true parameter space $\Omega(\varvec{\beta}_{\text{main}}; \textbf{Q})$ for $\varvec{\beta}_{\text{main}}$ as follows:

(18) $$\Omega(\varvec{\beta}_{\text{main}}; \textbf{Q}) = \{\varvec{\beta}_{\text{main}}: \beta_{j,k} = 0 \ \text{ if } \ q_{j,k} = 0\}.$$

Here, we are using the previous assumption that $\beta_{j,k} = 0$ if $q_{j,k} = 0$ (see the text following (14)). We also define the parameter space of $\varvec{\gamma} = (\gamma_1, \ldots, \gamma_J)^\top$ as the set $\Omega(\varvec{\gamma})$. Recall that in an ExpACDM, $\varvec{\gamma}$ commonly denotes the dispersion parameters of the exponential family, so we define $\Omega(\varvec{\gamma}) = \{\varvec{\gamma}: \gamma_j > 0 \text{ for all } j \in [J]\}$. Also recall that $\varvec{p}$ denotes the proportion parameters of the latent attribute patterns. We define the joint parameter space of all parameters $\varvec{p}$, $\varvec{\beta}$, and $\varvec{\gamma}$ by

(19) $$\Omega(\varvec{p}, \varvec{\beta}, \varvec{\gamma}; \textbf{Q}) = \Big\{(\varvec{p}, \varvec{\beta}, \varvec{\gamma}): \sum_{\varvec{\alpha}\in\{0,1\}^K} p_{\varvec{\alpha}} = 1, ~ p_{\varvec{\alpha}} \ge 0, ~ \varvec{\beta}_{\text{main}} \in \Omega(\varvec{\beta}_{\text{main}}; \textbf{Q}), ~ \varvec{\gamma} \in \Omega(\varvec{\gamma})\Big\}.$$
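
As a concrete reading of (18) and (19), the following small Python sketch (with array shapes that are our own assumptions) checks whether a candidate parameter triple lies in $\Omega(\varvec{p}, \varvec{\beta}, \varvec{\gamma}; \textbf{Q})$.

```python
import numpy as np

def in_parameter_space(p, beta_main, gamma, Q, tol=1e-8):
    """Membership check for (19): p lies on the probability simplex, the main
    effects vanish off the support of Q as in (18), and all dispersions are positive.
    p: (2^K,); beta_main, Q: (J, K); gamma: (J,)."""
    p, beta_main, gamma = map(np.asarray, (p, beta_main, gamma))
    Q = np.asarray(Q, dtype=bool)
    ok_p = abs(p.sum() - 1.0) < tol and np.all(p >= -tol)
    ok_beta = np.all(beta_main[~Q] == 0)   # beta_{j,k} = 0 whenever q_{j,k} = 0
    ok_gamma = np.all(gamma > 0)
    return bool(ok_p and ok_beta and ok_gamma)
```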

In order to define generic identifiability without the sign-flipping issue, we next adopt the monotonicity assumption in (17). In particular, we assume that $\beta_{j,k} > 0$ whenever $q_{j,k} = 1$, which means that the main effects of the required attributes are all positive. The actual sign of $\beta_{j,k}$ is not important here: one may instead assume $\beta_{j,k} < 0$ for all $j$ and $k$ to achieve identifiability. Now we are ready to define generic identifiability for ExpACDMs.

Definition 7

An ExpACDM with parameters $(\varvec{p}, \varvec{\beta}, \varvec{\gamma})$ is generically identifiable if the following set has measure zero with respect to the Lebesgue measure on $\Omega(\varvec{p}, \varvec{\beta}, \varvec{\gamma}; \textbf{Q})$:

$$\bigl\{(\varvec{p}, \varvec{\beta}, \varvec{\gamma}) \in \Omega(\varvec{p}, \varvec{\beta}, \varvec{\gamma}; \textbf{Q}): \text{there exist alternative parameters } (\overline{\varvec{p}}, \overline{\varvec{\beta}}, \overline{\varvec{\gamma}}) \ne (\varvec{p}, \varvec{\beta}, \varvec{\gamma}) \text{ in } \Omega(\varvec{p}, \varvec{\beta}, \varvec{\gamma}; \textbf{Q}) \text{ such that } \mathbb{P}(\textbf{Y} \mid \varvec{p}, \varvec{\beta}, \varvec{\gamma}) = \mathbb{P}(\textbf{Y} \mid \overline{\varvec{p}}, \overline{\varvec{\beta}}, \overline{\varvec{\gamma}})\bigr\}.$$

We next propose generic identifiability conditions for ExpACDMs that are substantially weaker than the strict identifiability conditions in Sect. 3.1.

Theorem 2

Consider an ExpACDM. Assume that $\varvec{h}$ is an analytic function and that the true parameters $(\varvec{p}, \varvec{\beta}, \varvec{\gamma})$ lie in $\Omega(\varvec{p}, \varvec{\beta}, \varvec{\gamma}; \textbf{Q})$. Then, the model parameters $(\varvec{p}, \varvec{\beta}, \varvec{\gamma})$ are generically identifiable if the following conditions on the $\textbf{Q}$-matrix hold.

A$^\star$. The true $\textbf{Q}$-matrix can be written as $\textbf{Q} = [\textbf{Q}_1^\top, \textbf{Q}_2^\top, \textbf{Q}^{*\top}]^\top$ after some row permutation, where $\textbf{Q}_1$ and $\textbf{Q}_2$ are $K \times K$ matrices whose diagonal entries all equal one. In other words, we can write
$$\textbf{Q}_i = \begin{pmatrix} 1 & * & \cdots & * \\ * & 1 & \cdots & * \\ \vdots & \vdots & \ddots & \vdots \\ * & * & \cdots & 1 \end{pmatrix}$$
for $i = 1, 2$, where $*$ indicates an arbitrary value in $\{0,1\}$.

B$^\star$. Suppose that the $\textbf{Q}$-matrix is written as in A$^\star$, and that $\textbf{Q}^*$ does not have any all-zero columns. In other words, for any $k$, there exists $j > 2K$ such that $q_{j,k} = 1$.
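
Unlike condition A, conditions A$^\star$ and B$^\star$ involve a row permutation that is not unique, so checking them requires a small search. The following brute-force Python sketch (our illustration, exponential in $J$ and $K$ and intended only for small $\textbf{Q}$-matrices) looks for $2K$ rows that can be arranged into two blocks with all-one diagonals while the remaining rows still measure every attribute.

```python
import numpy as np
from itertools import combinations, permutations

def satisfies_generic_conditions(Q):
    """Brute-force check of conditions A* and B* for a small binary Q-matrix."""
    Q = np.asarray(Q)
    J, K = Q.shape

    def unit_diagonal_arrangement(rows):
        # can the selected K rows be ordered so that the k-th row has a 1 in column k?
        return any(all(Q[r, k] == 1 for k, r in enumerate(perm))
                   for perm in permutations(rows))

    for block in combinations(range(J), 2 * K):
        rest = [j for j in range(J) if j not in block]
        if not np.all(Q[rest].sum(axis=0) >= 1):    # condition B*: no all-zero column in Q*
            continue
        for first in combinations(block, K):
            second = [j for j in block if j not in first]
            if unit_diagonal_arrangement(first) and unit_diagonal_arrangement(second):
                return True                          # condition A* holds
    return False

# Hypothetical toy example with K = 4 in which every item measures two attributes,
# so the Q-matrix contains no identity submatrix, yet A* and B* are satisfied
Q = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1], [0, 1, 0, 1],
     [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0],
     [1, 0, 1, 0], [0, 1, 0, 1]]
print(satisfies_generic_conditions(Q))   # True
```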

The conditions in Theorem 2 depend only on the $\textbf{Q}$-matrix and not on other model components. Theorem 2 is motivated by Gu and Xu (2020) and Chen et al. (2020), which are the first studies to consider generic identifiability for CDMs. Theorem 4.3 in Gu and Xu (2020) and Theorem 1 in Chen et al. (2020) showed that binary-response CDMs are generically identifiable under conditions A$^\star$ and B$^\star$.

Note that Theorem 2 holds without assuming (1) that $p_{\varvec{\alpha}} > 0$ for all $\varvec{\alpha} \in \{0,1\}^K$. This is because $\{\varvec{p}: \exists\, \varvec{\alpha} \text{ such that } p_{\varvec{\alpha}} = 0\}$ is a measure-zero subset of $\{\varvec{p}: \sum_{\varvec{\alpha}} p_{\varvec{\alpha}} = 1,\ p_{\varvec{\alpha}} \ge 0\}$, and hence relaxing the condition from $p_{\varvec{\alpha}} > 0$ to $p_{\varvec{\alpha}} \ge 0$ does not violate generic identifiability. In addition, we point out that the exponential family assumption for $\mathcal{P}$ is not crucial for generic identifiability to hold. In fact, $\mathcal{P}$ can be any parametric family as long as the model parameters $\varvec{\eta}_{j,\varvec{\alpha}}$ satisfy the additive assumption in (14). Hence, the conditions in Theorem 2 also guarantee the generic identifiability of ACDMs with non-exponential-family distributions; see more discussion in Remark S.1 in the Supplementary Material S.2.

Theorem 2 may be extended to the ExpGDM and other all-effect CDMs for general responses, because these models include the main effects of the latent attributes, which are the key components underlying generic identifiability. We leave a rigorous statement of the generic identifiability of ExpGDMs for future work. On the other hand, Theorem 2 does not apply to the DINA model, because the DINA model is conjunctive and does not include the main effects of the latent attributes. For the DINA model with general responses, we conjecture that an identifiability notion similar to the $\varvec{p}$-partial identifiability in Gu and Xu (2020) for conventional categorical-response CDMs can be studied.

4. Universal EM Algorithms for Estimating ExpCDMs

Our identifiability result in Theorem 1 has the nice consequence of statistical consistency of the maximum likelihood estimator (MLE), which we next formally establish. Consider a sample of $N$ independent and identically distributed response vectors $\textbf{Y}_{1:N} = \{\textbf{Y}_1, \ldots, \textbf{Y}_N\}$ from an ExpCDM in (6) with true parameters $(\varvec{\eta}_0, \varvec{p}_0)$. Given the $\textbf{Q}$-matrix, the marginal log-likelihood can be written as

(20) $$\ell(\varvec{\eta}, \varvec{p} \mid \textbf{Y}_{1:N}, \textbf{Q}) = \sum_{i=1}^N \log \mathbb{P}(\textbf{Y} = \textbf{Y}_i \mid \varvec{\eta}, \varvec{p}) = \sum_{i=1}^N \log\left(\sum_{\varvec{\alpha}\in\{0,1\}^K} p_{\varvec{\alpha}} \prod_{j=1}^J g\bigl(Y_{i,j}; \varvec{\eta}_{j,\varvec{\alpha}}\bigr)\right),$$

where $\varvec{\eta}$ is the collection of all $\varvec{\eta}_{j,\varvec{\alpha}}$'s, which are subject to the $\textbf{Q}$-matrix constraints under the considered ExpCDM. Define the MLE of the model parameters as

$$(\widehat{\varvec{\eta}}, \widehat{\varvec{p}}) = \mathop{\textrm{argmax}}_{\varvec{\eta}, \varvec{p}}\ \ell(\varvec{\eta}, \varvec{p} \mid \textbf{Y}_{1:N}).$$
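
For concreteness, the following Python sketch evaluates the marginal log-likelihood (20) for a hypothetical Poisson ExpCDM, in which $g(y; \varvec{\eta}_{j,\varvec{\alpha}})$ is the Poisson probability mass function with rate $\lambda_{j,\varvec{\alpha}}$; the array layout is our own assumption, and a log-sum-exp step is used for numerical stability. The same template applies to other exponential-family responses by replacing the Poisson log-pmf with the corresponding log-density $g$.

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import logsumexp

def marginal_loglik_poisson(Y, p, lam):
    """Marginal log-likelihood (20) for a hypothetical Poisson ExpCDM.
    Y: (N, J) count responses; p: (L,) pattern proportions (all positive), L = 2^K;
    lam: (J, L) Poisson rates, one column per attribute pattern alpha."""
    Y, p, lam = np.asarray(Y), np.asarray(p, dtype=float), np.asarray(lam, dtype=float)
    log_g = poisson.logpmf(Y[:, :, None], lam[None, :, :])   # log g(Y_ij; eta_{j,alpha}), shape (N, J, L)
    class_loglik = log_g.sum(axis=1) + np.log(p)[None, :]    # log p_alpha + sum_j log g, shape (N, L)
    return float(logsumexp(class_loglik, axis=1).sum())      # sum_i log sum_alpha exp(...)
```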

Following from the identifiability conclusions, we have the following proposition on parameter estimation consistency.

Proposition 4

(Parameter Estimation Consistency) Consider an ExpCDM of the form (6) whose true parameters $(\boldsymbol{\eta}_0, \boldsymbol{p}_0)$ lie in the interior of the parameter space. Suppose the strict identifiability Conditions A and B in Theorem 1 hold. Then $(\widehat{\boldsymbol{\eta}}, \widehat{\boldsymbol{p}})$ converges to $(\boldsymbol{\eta}_0, \boldsymbol{p}_0)$ almost surely as $N \rightarrow \infty$.

Next, consider an ExpACDM with true parameters $(\boldsymbol{p}_0, \boldsymbol{\beta}_0, \boldsymbol{\gamma}_0)$ for which the generic identifiability Conditions A$^\star$ and B$^\star$ in Theorem 2 hold. Then the MLE $(\widehat{\boldsymbol{p}}, \widehat{\boldsymbol{\beta}}, \widehat{\boldsymbol{\gamma}})$ converges to $(\boldsymbol{p}_0, \boldsymbol{\beta}_0, \boldsymbol{\gamma}_0)$ almost surely for any $(\boldsymbol{p}_0, \boldsymbol{\beta}_0, \boldsymbol{\gamma}_0) \in \Omega(\boldsymbol{p}, \boldsymbol{\beta}, \boldsymbol{\gamma}; \mathbf{Q}) \setminus \mathcal{N}$, where $\mathcal{N}$ is a negligible measure-zero subset of $\Omega(\boldsymbol{p}, \boldsymbol{\beta}, \boldsymbol{\gamma}; \mathbf{Q})$.

Next, we propose an expectation–maximization (EM; Dempster et al., 1977) algorithm for parameter estimation. To develop the EM algorithm, consider the log-likelihood of the complete data $(\mathbf{Y}_{1:N}, \mathbf{A}_{1:N}) = (\mathbf{Y}_i, \mathbf{A}_i)_{i=1,\ldots,N}$:

$$
\begin{aligned}
\ell_c(\boldsymbol{\eta}, \boldsymbol{p} \mid \mathbf{Y}_{1:N}, \mathbf{A}_{1:N}, \mathbf{Q}) &= \log \Big[ \prod_{i=1}^N \mathbb{P}(\mathbf{A}_i)\, \mathbb{P}(\mathbf{Y}_i \mid \mathbf{A}_i, \mathbf{Q}) \Big] \\
&= \sum_{\boldsymbol{\alpha} \in \{0,1\}^K} \sum_{i=1}^N \mathbb{1}(\mathbf{A}_i = \boldsymbol{\alpha}) \left( \log p_{\boldsymbol{\alpha}} + \sum_{j=1}^J \log g(Y_{i,j};\, \boldsymbol{\eta}_{j,\boldsymbol{\alpha}}) \right),
\end{aligned} \tag{21}
$$

where $\mathbf{A}_i \in \{0,1\}^K$ denotes the latent attribute profile of the $i$th subject in the sample. In Sects. 4.1 and 4.2, we present details of the EM algorithms for the general ExpDINA and ExpACDM models.

4.1. EM Algorithm for the ExpDINA

Consider the ExpDINA model defined in Sect. 2.2.1, which is parametrized by the item parameters $\{\boldsymbol{\eta}_{j,(h)}\}_{j \in [J],\, h=0,1}$ and the proportion parameters $\boldsymbol{p}$. In this subsection, for notational simplicity, we write $\boldsymbol{\eta} = \{\boldsymbol{\eta}_{j,(h)}\}_{j \in [J],\, h=0,1}$ to denote all item parameters. For the ExpDINA model, the complete data log-likelihood (21) can be rewritten as:

$$
\begin{aligned}
\ell_c^{\text{ExpDINA}}(\boldsymbol{\eta}, \boldsymbol{p} \mid \mathbf{Y}_{1:N}, \mathbf{A}_{1:N}) =\; & \sum_{\boldsymbol{\alpha} \in \{0,1\}^K} \sum_{i=1}^N \mathbb{1}(\mathbf{A}_i = \boldsymbol{\alpha}) \log p_{\boldsymbol{\alpha}} \\
& + \sum_{\boldsymbol{\alpha} \in \{0,1\}^K} \sum_{i=1}^N \mathbb{1}(\mathbf{A}_i = \boldsymbol{\alpha}) \sum_{j=1}^J \Big[ \left( \boldsymbol{\eta}_{j,(0)}^{\top} \mathbf{T}(Y_{i,j}) - A(\boldsymbol{\eta}_{j,(0)}) + \log h(Y_{i,j}) \right) (1 - \Gamma_{j,\boldsymbol{\alpha}}) \\
& \qquad\qquad + \left( \boldsymbol{\eta}_{j,(1)}^{\top} \mathbf{T}(Y_{i,j}) - A(\boldsymbol{\eta}_{j,(1)}) + \log h(Y_{i,j}) \right) \Gamma_{j,\boldsymbol{\alpha}} \Big].
\end{aligned} \tag{22}
$$

Recall that $\Gamma_{j,\boldsymbol{\alpha}}$ is the binary ideal response, which equals one if and only if the latent skill pattern $\boldsymbol{\alpha}$ possesses all skills required by item $j$. We next present details of the E step and M step of the EM algorithm.

E step for ExpDINA. In the E step of the $t$th EM iteration, we calculate the conditional expectation of $\ell_c^{\text{ExpDINA}}$ in (22) given the current parameter values $(\boldsymbol{\eta}^{(t)}, \boldsymbol{p}^{(t)})$. Since $\mathbf{A}_i$ is discrete and takes values in $\{0,1\}^K$, it suffices to evaluate $\varphi_{i,\boldsymbol{\alpha}}^{(t+1)} := \mathbb{P}(\mathbf{A}_i = \boldsymbol{\alpha} \mid \mathbf{Y}_i, \boldsymbol{\eta}^{(t)}, \boldsymbol{p}^{(t)})$ for all $\boldsymbol{\alpha} \in \{0,1\}^K$. This can be calculated by noting that

$$
\varphi_{i,\boldsymbol{\alpha}}^{(t+1)} = p_{\boldsymbol{\alpha}}^{(t)} \prod_{j=1}^J \mathbb{P}(Y_{i,j} \mid \boldsymbol{\eta}^{(t)}, \mathbf{A}_i = \boldsymbol{\alpha}) \Big/ \Big( \sum_{\boldsymbol{\alpha}' \in \{0,1\}^K} p_{\boldsymbol{\alpha}'}^{(t)} \prod_{j=1}^J \mathbb{P}(Y_{i,j} \mid \boldsymbol{\eta}^{(t)}, \mathbf{A}_i = \boldsymbol{\alpha}') \Big).
$$

The exact form of $\varphi_{i,\boldsymbol{\alpha}}^{(t+1)}$ is given in Algorithm 1. Using $\varphi_{i,\boldsymbol{\alpha}}^{(t+1)}$, we can calculate the conditional expectation of $\ell_c^{\text{ExpDINA}}$, denoted by $Q(\boldsymbol{\eta}, \boldsymbol{p} \mid \boldsymbol{\eta}^{(t)}, \boldsymbol{p}^{(t)})$, as follows:

$$
\begin{aligned}
Q(\boldsymbol{\eta}, \boldsymbol{p} \mid \boldsymbol{\eta}^{(t)}, \boldsymbol{p}^{(t)}) &= \mathbb{E}\Big[ \ell_c(\boldsymbol{\eta}, \boldsymbol{p} \mid \mathbf{Y}, \mathbf{A}) \;\Big|\; \boldsymbol{\eta}^{(t)}, \boldsymbol{p}^{(t)} \Big] \\
&= \sum_{\boldsymbol{\alpha} \in \{0,1\}^K} \sum_{i=1}^N \varphi_{i,\boldsymbol{\alpha}}^{(t+1)} \log p_{\boldsymbol{\alpha}} \\
&\quad + \sum_{\boldsymbol{\alpha} \in \{0,1\}^K} \sum_{i=1}^N \varphi_{i,\boldsymbol{\alpha}}^{(t+1)} \sum_{j=1}^J \Big[ \left( \boldsymbol{\eta}_{j,(0)}^{\top} \mathbf{T}(Y_{i,j}) - A(\boldsymbol{\eta}_{j,(0)}) + \log h(Y_{i,j}) \right) (1 - \Gamma_{j,\boldsymbol{\alpha}}) \\
&\qquad\qquad + \left( \boldsymbol{\eta}_{j,(1)}^{\top} \mathbf{T}(Y_{i,j}) - A(\boldsymbol{\eta}_{j,(1)}) + \log h(Y_{i,j}) \right) \Gamma_{j,\boldsymbol{\alpha}} \Big].
\end{aligned}
$$
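Before turning to the M step, a minimal numerical sketch of this E step may be helpful. The snippet below computes the responsibilities $\varphi_{i,\boldsymbol{\alpha}}^{(t+1)}$ in log space with a log-sum-exp normalization for numerical stability; the function names are our own, and the Poisson-DINA emission is used purely as an illustrative example of the per-item density $g$. The expected complete-data log-likelihood $Q(\cdot)$ above is then a $\varphi$-weighted sum of the complete-data terms.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import poisson

def e_step_responsibilities(Y, log_p, item_logpdf):
    """Posterior responsibilities phi[i, a] = P(A_i = alpha_a | Y_i, current parameters).

    Y           : (N, J) response matrix.
    log_p       : (L,) current log proportions, one entry per attribute pattern (L = 2**K).
    item_logpdf : function (Y, a) -> (N, J) matrix of log g(Y_ij; eta_{j, alpha_a}).
    """
    N, J = Y.shape
    L = log_p.shape[0]
    log_joint = np.empty((N, L))
    for a in range(L):
        log_joint[:, a] = log_p[a] + item_logpdf(Y, a).sum(axis=1)
    # Normalize over attribute patterns in log space to avoid numerical underflow.
    return np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))

def make_poisson_dina_logpdf(Gamma, lam0, lam1):
    """Illustrative per-item log-density under a Poisson-DINA emission.
    Gamma: (L, J) ideal responses; lam0, lam1: (J,) Poisson rates for Gamma = 0 / 1."""
    def item_logpdf(Y, a):
        rates = np.where(Gamma[a] == 1, lam1, lam0)   # (J,) rates for pattern a
        return poisson.logpmf(Y, rates)               # broadcasts to (N, J)
    return item_logpdf
```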

M step for ExpDINA. Next, in the M step of the $t$th EM iteration, we maximize $Q(\boldsymbol{\eta}, \boldsymbol{p} \mid \boldsymbol{\eta}^{(t)}, \boldsymbol{p}^{(t)})$ with respect to $(\boldsymbol{\eta}, \boldsymbol{p})$ and update the model parameters as follows:

$$
(\boldsymbol{\eta}^{(t+1)}, \boldsymbol{p}^{(t+1)}) = \mathop{\mathrm{argmax}}_{\boldsymbol{\eta},\, \boldsymbol{p}} \; Q(\boldsymbol{\eta}, \boldsymbol{p} \mid \boldsymbol{\eta}^{(t)}, \boldsymbol{p}^{(t)}).
$$

Since $\boldsymbol{\eta}_{j,(0)}$, $\boldsymbol{\eta}_{j,(1)}$, and $\boldsymbol{p}$ are continuous parameters, we can set their partial gradients to zero and update the parameters as the solutions to the gradient equations. Specifically, we solve

$$
\mathop{\mathrm{argmax}}_{\boldsymbol{p}} \; \sum_{\boldsymbol{\alpha} \in \{0,1\}^K} \sum_{i=1}^N \varphi_{i,\boldsymbol{\alpha}}^{(t+1)} \log p_{\boldsymbol{\alpha}}, \quad \text{subject to } \sum_{\boldsymbol{\alpha} \in \{0,1\}^K} p_{\boldsymbol{\alpha}} = 1, \tag{23}
$$
$$
\mathop{\mathrm{argmax}}_{\boldsymbol{\eta}_{j,(0)}} \; \sum_{\boldsymbol{\alpha} \in \{0,1\}^K} \sum_{i=1}^N \varphi_{i,\boldsymbol{\alpha}}^{(t+1)} \left( \boldsymbol{\eta}_{j,(0)}^{\top} \mathbf{T}(Y_{i,j}) - A(\boldsymbol{\eta}_{j,(0)}) \right) (1 - \Gamma_{j,\boldsymbol{\alpha}}), \quad \forall j \in [J], \tag{24}
$$
$$
\mathop{\mathrm{argmax}}_{\boldsymbol{\eta}_{j,(1)}} \; \sum_{\boldsymbol{\alpha} \in \{0,1\}^K} \sum_{i=1}^N \varphi_{i,\boldsymbol{\alpha}}^{(t+1)} \left( \boldsymbol{\eta}_{j,(1)}^{\top} \mathbf{T}(Y_{i,j}) - A(\boldsymbol{\eta}_{j,(1)}) \right) \Gamma_{j,\boldsymbol{\alpha}}, \quad \forall j \in [J]. \tag{25}
$$

The first optimization problem (23) has a closed-form solution for $\boldsymbol{p}$:

$$
p_{\boldsymbol{\alpha}} = \frac{\sum_{i=1}^N \varphi_{i,\boldsymbol{\alpha}}^{(t+1)}}{\sum_{\boldsymbol{\alpha}' \in \{0,1\}^K} \sum_{i=1}^N \varphi_{i,\boldsymbol{\alpha}'}^{(t+1)}}, \quad \forall \boldsymbol{\alpha} \in \{0,1\}^K.
$$
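In code (a sketch continuing the notation of the E-step snippet above), this update is a single normalized column sum of the responsibility matrix:

```python
def m_step_proportions(phi):
    """Closed-form update of p from (23): p_alpha is the average responsibility."""
    counts = phi.sum(axis=0)        # expected number of subjects with each pattern
    return counts / counts.sum()    # equals counts / N, since each row of phi sums to 1
```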

For the optimization problems (24) and (25) for the item parameters, closed-form updates also often exist for many distributions. Consider $\boldsymbol{\eta}_{j,(1)}$ in (25) for some item $j \in [J]$. Setting the partial gradient of (25) with respect to $\boldsymbol{\eta}_{j,(1)}$ to zero, we obtain

$$
\sum_{\boldsymbol{\alpha}} \sum_{i} \varphi_{i,\boldsymbol{\alpha}}^{(t+1)} \Gamma_{j,\boldsymbol{\alpha}} \big( \mathbf{T}(Y_{i,j}) - \nabla A(\boldsymbol{\eta}_{j,(1)}) \big) = \mathbf{0}
\;\;\Longrightarrow\;\;
\nabla A(\boldsymbol{\eta}_{j,(1)}) = \frac{\sum_{i,\boldsymbol{\alpha}} \mathbf{T}(Y_{i,j})\, \varphi_{i,\boldsymbol{\alpha}}^{(t+1)} \Gamma_{j,\boldsymbol{\alpha}}}{\sum_{i,\boldsymbol{\alpha}} \varphi_{i,\boldsymbol{\alpha}}^{(t+1)} \Gamma_{j,\boldsymbol{\alpha}}}. \tag{26}
$$

Now note that an exponential family distribution has the nice property that $A(\boldsymbol{\eta})$ is always a convex function of $\boldsymbol{\eta}$ (Casella and Berger, 2021). This is because the Hessian matrix of $A(\boldsymbol{\eta})$ equals the covariance matrix of the sufficient statistics $\mathbf{T}$ and hence is always positive definite. This property implies that the map $\boldsymbol{\eta} \mapsto \nabla A(\boldsymbol{\eta})$ is invertible, so the gradient equation (26) has a unique solution. In fact, for many widely used exponential family distributions, (26) has an explicit solution, and hence the corresponding ExpDINA has explicit M step updates.
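For instance (a worked special case using the standard natural parametrizations, included here only for illustration): for a Poisson response, $T(y) = y$, $\eta = \log\lambda$, and $A(\eta) = e^{\eta}$; for a Bernoulli response, $T(y) = y$ and $A(\eta) = \log(1 + e^{\eta})$. Writing $\bar{T}_{j,1}$ for the weighted average on the right-hand side of (26), the gradient equation inverts explicitly as
$$
e^{\eta_{j,(1)}} = \bar{T}_{j,1} \;\Rightarrow\; \eta_{j,(1)} = \log \bar{T}_{j,1},
\qquad
\frac{e^{\eta_{j,(1)}}}{1 + e^{\eta_{j,(1)}}} = \bar{T}_{j,1} \;\Rightarrow\; \eta_{j,(1)} = \log \frac{\bar{T}_{j,1}}{1 - \bar{T}_{j,1}},
$$
respectively. The Poisson case corresponds exactly to the rate update $\lambda_{j,1}^{(t+1)} = \bar{T}_{j,1}$ displayed below, since $\lambda_{j,1} = e^{\eta_{j,(1)}}$.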

For example, under the Lognormal-DINA model, maximizing (24)–(25) gives the following updates of the mean and variance parameters:

$$
\begin{aligned}
\mu_{j,1}^{(t+1)} &= \frac{\sum_{i,\boldsymbol{\alpha}} \varphi_{i,\boldsymbol{\alpha}} \Gamma_{j,\boldsymbol{\alpha}} \log Y_{i,j}}{\sum_{i,\boldsymbol{\alpha}} \varphi_{i,\boldsymbol{\alpha}} \Gamma_{j,\boldsymbol{\alpha}}}, &\quad
(\sigma_{j,1}^2)^{(t+1)} &= \frac{\sum_{i,\boldsymbol{\alpha}} \varphi_{i,\boldsymbol{\alpha}} \Gamma_{j,\boldsymbol{\alpha}} (\log Y_{i,j} - \mu_{j,1}^{(t+1)})^2}{\sum_{i,\boldsymbol{\alpha}} \varphi_{i,\boldsymbol{\alpha}} \Gamma_{j,\boldsymbol{\alpha}}}; \\
\mu_{j,0}^{(t+1)} &= \frac{\sum_{i,\boldsymbol{\alpha}} \varphi_{i,\boldsymbol{\alpha}} (1-\Gamma_{j,\boldsymbol{\alpha}}) \log Y_{i,j}}{\sum_{i,\boldsymbol{\alpha}} \varphi_{i,\boldsymbol{\alpha}} (1-\Gamma_{j,\boldsymbol{\alpha}})}, &\quad
(\sigma_{j,0}^2)^{(t+1)} &= \frac{\sum_{i,\boldsymbol{\alpha}} \varphi_{i,\boldsymbol{\alpha}} (1-\Gamma_{j,\boldsymbol{\alpha}}) (\log Y_{i,j} - \mu_{j,0}^{(t+1)})^2}{\sum_{i,\boldsymbol{\alpha}} \varphi_{i,\boldsymbol{\alpha}} (1-\Gamma_{j,\boldsymbol{\alpha}})}.
\end{aligned}
$$

Updates for the ExpDINA under other transformed-Normal distributions (e.g., logistic-Normal) can be obtained by simply replacing the $\log Y_{i,j}$ terms in the above display with the corresponding transform. As another example, under the Poisson-DINA model, maximizing (24)–(25) gives the following updates for the Poisson rate parameters:

$$
\lambda_{j,1}^{(t+1)} = \frac{\sum_{i,\boldsymbol{\alpha}} \varphi_{i,\boldsymbol{\alpha}} \Gamma_{j,\boldsymbol{\alpha}} Y_{i,j}}{\sum_{i,\boldsymbol{\alpha}} \varphi_{i,\boldsymbol{\alpha}} \Gamma_{j,\boldsymbol{\alpha}}}, \qquad
\lambda_{j,0}^{(t+1)} = \frac{\sum_{i,\boldsymbol{\alpha}} \varphi_{i,\boldsymbol{\alpha}} (1-\Gamma_{j,\boldsymbol{\alpha}}) Y_{i,j}}{\sum_{i,\boldsymbol{\alpha}} \varphi_{i,\boldsymbol{\alpha}} (1-\Gamma_{j,\boldsymbol{\alpha}})}.
$$
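As a sketch, both displays are weighted averages with weights $w_{i,j} = \sum_{\boldsymbol{\alpha}} \varphi_{i,\boldsymbol{\alpha}} \Gamma_{j,\boldsymbol{\alpha}}$ and can be vectorized as follows (array and function names are our own, continuing the earlier snippets):

```python
import numpy as np

def m_step_dina_items(Y, phi, Gamma):
    """Closed-form M-step updates (24)-(25) for the lognormal- and Poisson-DINA.

    Y     : (N, J) responses (positive reals for lognormal, counts for Poisson).
    phi   : (N, L) posterior responsibilities from the E step.
    Gamma : (L, J) binary ideal responses Gamma_{j, alpha}.
    """
    w1 = phi @ Gamma                  # (N, J): weight on the "mastery" component
    w0 = phi @ (1 - Gamma)            # (N, J): weight on the "non-mastery" component

    def weighted_mean(X, W):
        return (W * X).sum(axis=0) / W.sum(axis=0)

    # Poisson-DINA: rate updates are weighted means of the raw counts.
    lam1, lam0 = weighted_mean(Y, w1), weighted_mean(Y, w0)

    # Lognormal-DINA: mean/variance updates are weighted means/variances of log Y.
    logY = np.log(Y)
    mu1, mu0 = weighted_mean(logY, w1), weighted_mean(logY, w0)
    sig2_1 = weighted_mean((logY - mu1) ** 2, w1)
    sig2_0 = weighted_mean((logY - mu0) ** 2, w0)

    return {"lambda1": lam1, "lambda0": lam0,
            "mu1": mu1, "mu0": mu0, "sigma2_1": sig2_1, "sigma2_0": sig2_0}
```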

Algorithm 1 EM Algorithm for the General ExpDINA Model

4.2. EM Algorithm for the ExpACDM

In this subsection, we propose an EM algorithm to estimate the model parameters under the ExpACDM defined in Sect. 2.2.2. Recall that the ExpACDM is parametrized by $\boldsymbol{\eta}_{j,\boldsymbol{\alpha}} = \boldsymbol{h}\big( \beta_{j,0} + \sum_{k=1}^K \beta_{j,k} q_{j,k} \alpha_k,\; \boldsymbol{\gamma}_j \big)$, with all parameters collected in $(\boldsymbol{\beta}, \boldsymbol{\gamma}, \boldsymbol{p})$. The complete data log-likelihood (21) under an ExpACDM can be written as:

$$
\begin{aligned}
\ell_c^{\text{ExpACDM}}(\boldsymbol{\beta}, \boldsymbol{\gamma}, \boldsymbol{p} \mid \mathbf{Y}, \mathbf{A}) =\; & \sum_{\boldsymbol{\alpha} \in \{0,1\}^K} \sum_{i=1}^N \mathbb{1}(\mathbf{A}_i = \boldsymbol{\alpha}) \log p_{\boldsymbol{\alpha}} \\
& + \sum_{\boldsymbol{\alpha} \in \{0,1\}^K} \sum_{i=1}^N \mathbb{1}(\mathbf{A}_i = \boldsymbol{\alpha}) \sum_{j=1}^J \Big\{ \boldsymbol{h}\Big( \beta_{j,0} + \sum_{k=1}^K \beta_{j,k} q_{j,k} \alpha_k,\, \boldsymbol{\gamma}_j \Big)^{\top} \mathbf{T}(Y_{i,j}) \\
& \qquad\qquad - A\Big( \boldsymbol{h}\Big( \beta_{j,0} + \sum_{k=1}^K \beta_{j,k} q_{j,k} \alpha_k,\, \boldsymbol{\gamma}_j \Big) \Big) + \log h(Y_{i,j}) \Big\}.
\end{aligned} \tag{27}
$$

Based on the above expression, we can derive an EM algorithm to estimate parameters in an ExpACDM. The detailed steps are summarized in Algorithm 2.

Algorithm 2 EM Algorithm for the General ExpACDM

An interesting fact is that in Algorithm 2, maximizing over $(\boldsymbol{\beta}_j, \boldsymbol{\gamma}_j)$ in each M step is similar to obtaining the MLE of the regression coefficients in a generalized linear model, but with the observed covariates replaced by the latent attributes evaluated in the E step. In particular, in the special case of the transformed-Normal distributions, this maximization is similar to linear regression and has a closed-form solution. For example, when $\mathcal{P}$ is the Normal distribution, the maximization over $(\boldsymbol{\beta}_j, \gamma_j)$ in the M step can be written as:

$$
(\boldsymbol{\beta}_j, \gamma_j)^{(t+1)} = \mathop{\mathrm{argmax}}_{\boldsymbol{\beta}_j,\, \gamma_j} \; \sum_{i,\boldsymbol{\alpha}} \left( -\frac{\big( Y_{i,j} - \beta_{j,0} - \sum_{k=1}^K \beta_{j,k} q_{j,k} \alpha_k \big)^2}{2 \gamma_j^2} - \frac{1}{2} \log \gamma_j^2 \right) \varphi_{i,\boldsymbol{\alpha}}^{(t+1)}
$$

for each $j \in [J]$. Then, for the attributes with $q_{j,k} = 1$, we obtain the updates

$$
\widehat{\beta}_{j,k}^{(t+1)} = \frac{\sum_{i,\boldsymbol{\alpha}} Y_{i,j}\, \alpha_k\, \varphi_{i,\boldsymbol{\alpha}}^{(t+1)}}{\sum_{i,\boldsymbol{\alpha}} \alpha_k\, \varphi_{i,\boldsymbol{\alpha}}^{(t+1)}}, \qquad
\widehat{\gamma}_{j}^{(t+1)} = \sqrt{\frac{\sum_{i,\boldsymbol{\alpha}} \big( Y_{i,j} - \beta_{j,0} - \sum_{k=1}^K \beta_{j,k} q_{j,k} \alpha_k \big)^2 \varphi_{i,\boldsymbol{\alpha}}^{(t+1)}}{N}}.
$$
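One way to carry out this M-step maximization exactly is to solve the corresponding weighted least-squares problem, treating each attribute pattern as a pseudo-observation weighted by its responsibility. The sketch below is our own illustration of this idea (helper and array names are assumptions, not the authors' implementation); for the lognormal-ACDM one would pass $\log Y_{i,j}$ in place of $Y_{i,j}$.

```python
import numpy as np

def m_step_normal_acdm_item(Yj, phi, patterns, qj):
    """Weighted least-squares M step for item j under the Normal-ACDM.

    Yj       : (N,) responses to item j (use log Y for the lognormal-ACDM).
    phi      : (N, L) posterior responsibilities from the E step.
    patterns : (L, K) all binary attribute patterns alpha.
    qj       : (K,) the j-th row of the Q-matrix.
    Returns (beta, gamma): the intercept plus slopes for the attributes with
    q_{jk} = 1 (slopes with q_{jk} = 0 are fixed at zero by the Q-matrix), and
    the residual standard deviation gamma_j.
    """
    N, L = phi.shape
    active = np.flatnonzero(qj)                                # attributes measured by item j
    X_pat = np.hstack([np.ones((L, 1)), patterns[:, active]])  # (L, 1 + |active|)

    # Each subject contributes every pattern alpha as a pseudo-observation
    # with weight phi[i, alpha]; this maximizes the weighted Normal objective.
    X = np.tile(X_pat, (N, 1))                                 # (N*L, 1 + |active|)
    y = np.repeat(Yj, L)
    w = phi.reshape(N * L)

    XtW = X.T * w                                              # (1 + |active|, N*L)
    beta = np.linalg.solve(XtW @ X, XtW @ y)                   # weighted least squares
    gamma = np.sqrt((w * (y - X @ beta) ** 2).sum() / N)       # weights sum to N
    return beta, gamma
```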

The updates under other transformed-Normal distributions can be obtained similarly by applying the corresponding transform to $Y_{i,j}$. For other members of the exponential family beyond the transformed-Normal, maximizing over $(\boldsymbol{\beta}_j, \boldsymbol{\gamma}_j)$ may not have a closed form, and one may use standard optimization software to find $(\widehat{\boldsymbol{\beta}}_j, \widehat{\boldsymbol{\gamma}}_j)$ in the M step.

Although we have focused on describing EM algorithms for ExpDINA and ExpACDM in this section, we remark that an EM algorithm with closed-form M step can be similarly developed for the ExpGDM (ExpGDINA). Furthermore, the exponential family assumption in Algorithms 1 and 2 is not essential to our EM procedures. In Supplementary Material S.4, we demonstrate how our algorithms can be modified to estimate the negative-binomial-based DINA and negative-binomial-based ACDM.

5. Simulation Studies

We conduct simulation studies under various models in the proposed family, with two goals: (a) to empirically verify the theoretical results on identifiability and consistency, and (b) to assess the computational performance of the proposed EM algorithms. Under the ExpDINA and the ExpACDM, we consider the Normal and transformed-Normal distributions (i.e., lognormal and logistic-Normal) for continuous responses in Sect. 5.1, and the Poisson and negative binomial distributions for count responses in Sect. 5.2. We remark that beyond the distributions considered in this section, our framework and estimation procedures readily accommodate other exponential family distributions, such as the Gamma and Beta distributions for positive continuous and bounded continuous data, respectively.

5.1. Simulations Under the Normal- and Transformed Normal-CDMs

We first describe the true parameter settings used in the simulations. In all simulations, we set the $\mathbf{Q}$-matrix and the proportion parameters $\boldsymbol{p}$ as follows. Consider $K = 5$ latent attributes and $J = 20$ items. The $\mathbf{Q}$-matrix takes the form

$$
\mathbf{Q} = \begin{pmatrix} \mathbf{I}_K \\ \mathbf{I}_K \\ \mathbf{I}_K \\ \mathbf{Q}_1 \end{pmatrix}, \quad \text{where} \quad
\mathbf{Q}_1 = \begin{pmatrix}
1 & 1 & & 0 \\
1 & \ddots & \ddots & \\
& \ddots & \ddots & 1 \\
0 & & 1 & 1
\end{pmatrix}_{K \times K}. \tag{28}
$$
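As a sketch, this simulation $\mathbf{Q}$-matrix can be constructed as follows; the reading of $\mathbf{Q}_1$ as a tridiagonal band of ones is our own interpretation of the display above, and the function name is illustrative.

```python
import numpy as np

def simulation_Q(K=5):
    """Build the simulation Q-matrix of (28): three identity blocks stacked on a
    K x K banded block Q_1 (read here as ones on the main, sub-, and super-diagonals)."""
    I_K = np.eye(K, dtype=int)
    Q1 = (np.eye(K, dtype=int)
          + np.eye(K, k=1, dtype=int)
          + np.eye(K, k=-1, dtype=int))   # entries are already 0/1
    return np.vstack([I_K, I_K, I_K, Q1])

Q = simulation_Q()
assert Q.shape == (20, 5)   # J = 20 items, K = 5 attributes
```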

The above $\mathbf{Q}$-matrix satisfies our strict identifiability conditions in Theorem 1 and Proposition 2. The proportion parameters are set to be uniform, with $p_{\boldsymbol{\alpha}} = 1/2^K$ for all $\boldsymbol{\alpha} \in \{0,1\}^K$. We consider varying sample sizes $N = 100, 500, 1000, 1500, 2000$. In each setting, 100 independent simulation replicates are performed. Under the Normal-DINA model, we set the true item parameters as

$$
\mu_{j,0} = -1, \quad \mu_{j,1} = 2, \quad \sigma_{j,0} = 1, \quad \sigma_{j,1} = 1.
$$

Under the Normal-ACDM, we set the coefficients $\beta_{j,k}$ as

$$
\beta_{j,0} = -1; \qquad \beta_{j,k} = \frac{3}{\sum_{k'=1}^K q_{j,k'}}\, \mathbb{1}(q_{j,k} = 1), \quad \forall j \in [J],\; k \in [K].
$$

The variance parameter $\gamma_j = \sigma_j^2$ is fixed at 1 for all $j$.

Recall from Example 1 that general-response CDMs based on any transformed-Normal distribution (such as the lognormal and logistic-Normal) are equivalent to the Normal-based CDM. This fact implies that the estimation procedures for these models are identical up to an invertible transformation in the M step. In preliminary simulations, we estimated the Normal-based and lognormal-based CDMs independently, and the estimation accuracies for the two models were exactly the same. Hence, we only report the estimation accuracy for the Normal-based CDM in this section.

In each of the $C = 100$ independent replicates, we generate data using the above parameter settings and fit our EM Algorithm 1 or 2 with a random initialization. We calculate the root average mean squared error (RMSE) of the proportion parameters and the item parameters across the simulation replicates. Figure 2 displays the average RMSEs for the Normal-DINA and Normal-ACDM. In each simulation setting, the RMSE is defined as

$$
\sqrt{\frac{1}{C} \sum_{c=1}^C \frac{\| \widehat{\boldsymbol{\theta}}^{(c)} - \boldsymbol{\theta}_0 \|_2^2}{\dim(\boldsymbol{\theta}_0)}},
$$

where $\boldsymbol{\theta}_0$ is the true parameter vector, $\widehat{\boldsymbol{\theta}}^{(c)}$ is the estimate from the $c$th simulation replicate, and $\dim(\boldsymbol{\theta}_0)$ denotes the dimension of $\boldsymbol{\theta}_0$. The exact RMSE values are included in Tables S.1, S.2, and S.3 in the Supplementary Material.
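As a sketch, this RMSE can be computed as follows (array names are our own):

```python
import numpy as np

def rmse(theta_hats, theta0):
    """Root average mean squared error over C replicates.

    theta_hats : (C, d) array, one estimated parameter vector per replicate.
    theta0     : (d,) true parameter vector.
    """
    sq_err = ((theta_hats - theta0) ** 2).sum(axis=1)   # squared L2 error per replicate
    return np.sqrt(sq_err.mean() / theta0.size)         # average over C, divide by dim(theta0)
```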

Figure 2 RMSEs of parameters under the Normal-DINA (red) and Normal-ACDM (blue).

Figure 3 RMSEs of $\boldsymbol{p}$ with respect to $1/\sqrt{N}$ under the Normal-DINA model.

Figure 2 clearly shows that the RMSE decreases as the sample size $N$ increases. Furthermore, in Fig. 3 we plot the average RMSE of $\boldsymbol{p}$ under the Normal-DINA model against $1/\sqrt{N}$, and the RMSE is clearly linear in $1/\sqrt{N}$. This observation empirically validates that our estimation procedure is statistically consistent and converges at the usual parametric rate of $1/\sqrt{N}$. In addition, these simulation results show that our EM algorithms have good computational performance and can efficiently find the MLE.

In Supplementary Material S.5.2, we conduct additional simulation studies with alternative $\mathbf{Q}$-matrices that do not satisfy the strict identifiability conditions in Theorem 1 but do satisfy the generic identifiability conditions in Theorem 2. The simulation results empirically show that in such settings, the model parameters can still be consistently estimated by our proposed method.

5.2. Simulations Under the Poisson- and Negative Binomial-CDMs

We also conduct simulation studies for general-response CDMs for multivariate count data. Here, we consider the Poisson and negative binomial distributions to model the count responses. The $\mathbf{Q}$-matrix and the proportion parameters $\boldsymbol{p}$ are set to be the same as described in the previous subsection. The distribution-specific item parameters are set as follows. For the Poisson-DINA model, we set the item parameters as $\lambda_{j,0} = 1$ and $\lambda_{j,1} = 3$. For the Poisson-ACDM, we set the true model parameters as

$$
\beta_{j,0} = 1, \qquad \beta_{j,k} = \frac{2}{\sum_{k'=1}^K q_{j,k'}}\, \mathbb{1}(q_{j,k} = 1), \quad \forall j \in [J],\; k \in [K].
$$

For the negative binomial-DINA model (NegBin-DINA, defined in Supplementary Material S.1), we set the item parameters as

$$
r_{j,0} = 1, \quad r_{j,1} = 3, \quad \pi_{j,0} = 0.5, \quad \pi_{j,1} = 0.5, \quad \forall j \in [J],
$$

where $r_{j,0}, r_{j,1}$ are the numbers of successes and $\pi_{j,0}, \pi_{j,1}$ are the success probabilities of the negative binomial distribution. For the negative binomial-ACDM (NegBin-ACDM, defined in Supplementary Material S.1), we set the true model parameters as

$$
\beta_{j,0} = 1, \quad \beta_{j,k} = \frac{2}{\sum_{k'=1}^K q_{j,k'}}\, \mathbb{1}(q_{j,k} = 1), \quad \gamma_j = \pi_j = 0.5, \quad \forall j \in [J],\; k \in [K].
$$
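As a sketch of the data-generating process under the Poisson-DINA setting above (uniform attribute proportions, ideal responses computed from $\mathbf{Q}$; function and variable names are our own):

```python
import numpy as np

def simulate_poisson_dina(N, Q, lam0=1.0, lam1=3.0, seed=0):
    """Generate count responses from a Poisson-DINA model with uniform p_alpha.

    Q : (J, K) binary Q-matrix. Returns (Y, A) with Y (N, J) counts and A (N, K) attributes.
    """
    rng = np.random.default_rng(seed)
    J, K = Q.shape
    A = rng.integers(0, 2, size=(N, K))               # attribute profiles, uniform over {0,1}^K
    # Ideal response: Gamma_{ij} = 1 iff subject i masters every skill item j requires.
    Gamma = (A @ Q.T == Q.sum(axis=1)).astype(int)    # (N, J)
    rates = np.where(Gamma == 1, lam1, lam0)
    Y = rng.poisson(rates)
    return Y, A

Y, A = simulate_poisson_dina(N=2000, Q=simulation_Q())  # reuses the Q-matrix sketch shown earlier
```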

Figures 4 and 5 report the RMSEs of the estimated parameters obtained from the replicated simulations. As with the Normal-based CDMs, the RMSEs decrease as $N$ increases, at the typical $1/\sqrt{N}$ rate.

Figure 4 RMSEs of parameters under the Poisson-DINA (red) and Poisson-ACDM (blue).

Figure 5 RMSEs of parameters under the NegBin-DINA (red) and NegBin-ACDM (blue).

Finally, we remark that our estimation methods are computationally efficient. In the simulation settings considered in this section, the computation time of our EM algorithm is less than one minute on average, even for a sample size as large as $N = 2000$. We report the average number of EM iterations and the computation times in Supplementary Material S.5.1.

6. Application to the TIMSS 2019 Response Time Data

We demonstrate the proposed general-response CDM by applying it to a response time dataset extracted from the TIMSS 2019 assessment (Fishbein et al., 2021). This dataset comes from the mathematics assessment of eighth-grade students in the United States and records each student's time spent on each item screen (in seconds). If a question has sub-questions that share the same screen, the combined sub-questions are regarded as a single question and the overall response time is recorded. We focus on the students who received booklet number 14. After data preprocessing (see details in Supplementary Material S.5), the dataset consists of $N = 620$ students' response times on $J = 29$ items.

The TIMSS 2019 mathematics assessment aims to measure four content skills (Number, Algebra, Geometry, and Data and Probability) and three cognitive skills (Knowing, Applying, and Reasoning). The TIMSS 2019 database specifies how each of the J = 29 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$J=29$$\end{document} items measures one content skill and one cognitive skill. Based on this information, we construct a 29 × 7 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$29\times 7$$\end{document} Q \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\textbf{Q}}$$\end{document} -matrix with K = 7 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K=7$$\end{document} skill attributes: A 1 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A_1$$\end{document} : Number, A 2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A_2$$\end{document} : Algebra, A 3 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A_3$$\end{document} : Geometry, A 4 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A_4$$\end{document} : Data and Probability, A 5 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A_5$$\end{document} : Knowing, A 6 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A_6$$\end{document} : Applying, A 7 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A_7$$\end{document} : Reasoning. 
Each row of this $\textbf{Q}$-matrix has exactly two nonzero entries, one in a content skill column and one in a cognitive skill column. We provide the details of this $\textbf{Q}$-matrix in Table 2. It is not hard to verify that this $\textbf{Q}$-matrix satisfies our generic identifiability conditions in Theorem 2 under any general-response ExpACDM, and we choose to use the lognormal-ACDM to analyze this dataset. The TIMSS database also provides additional item information, including a brief description of the item type (whether it is a multiple-choice item or a constructed-response item) and the correct response percentage among the U.S. students. We present this information in Table 3.
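For concreteness, one way to encode a row of such a $\textbf{Q}$-matrix is sketched below in Python; the helper q_row and the attribute list SKILLS are illustrative names used only for this sketch and are not part of our released Matlab code.

import numpy as np

# Attribute order used in our Q-matrix: A1 Number, A2 Algebra, A3 Geometry,
# A4 Data and Probability, A5 Knowing, A6 Applying, A7 Reasoning.
SKILLS = ["Number", "Algebra", "Geometry", "Data and Probability",
          "Knowing", "Applying", "Reasoning"]

def q_row(content_skill, cognitive_skill):
    # Build one Q-matrix row with exactly two nonzero entries.
    row = np.zeros(len(SKILLS), dtype=int)
    row[SKILLS.index(content_skill)] = 1
    row[SKILLS.index(cognitive_skill)] = 1
    return row

# An item measuring "Number" and "Knowing" gets the row (1, 0, 0, 0, 1, 0, 0).
print(q_row("Number", "Knowing"))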

Table 2 $\textbf{Q}$-matrix for the TIMSS 2019 math assessment booklet 14.

Table 3 Additional item information for items in TIMSS 2019 math assessment booklet 14. Starred ($^*$) items are composed of sub-questions; for these items, we display the smallest correct response percentage among all sub-questions.

Given the $\textbf{Q}$-matrix in Table 2, we fit the lognormal-ACDM using Algorithm 2. Beyond the identifiability consideration, our rationale for adopting the additive model assumption is that, in order to solve each problem, students need to perform the operations (i.e., attributes) specified by the $\textbf{Q}$-matrix, and these operations contribute additively to the log response time. We assume that each operation is carried out separately, so the total log-time is the sum of the log-times of the individual operations. For instance, the question "Value of $X$ in $10/15 = X/18$" (the 15th question in our dataset; see Tables 2 and 3 for more details) measures the content skill "Number" and the cognitive skill "Knowing", and we assume that its log-time is the sum of the following components:

$\beta_{j,0}$: how long it takes to read the problem and click/type the answer;

$\beta_{j,K} A_{K}$: how long it takes to re-formulate the problem as "$10 \times 18 \div 15 = X$", depending on whether the student "knows" the necessary concepts, i.e., whether the student possesses the cognitive skill "Knowing";

$\beta_{j,N} A_{N}$: how long it takes to compute and find the answer $X = 12$, depending on whether the student possesses the content skill "Number".

Summing up the above components and adding a normal error gives the following response time distribution, which corresponds to the lognormal-ACDM defined in Example 3:

$$\log Y_{i,j} \mid \textbf{A} = \underset{\text{intercept}}{\beta_{j,0}} + \underset{\text{cognitive skill}}{\beta_{j,K} A_{K}} + \underset{\text{content skill}}{\beta_{j,N} A_{N}} + \epsilon_{i,j}, \quad \text{where } \epsilon_{i,j} \overset{\text{i.i.d.}}{\sim} N(0, \gamma_j).$$
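To make this data-generating mechanism concrete, the following minimal simulation sketch draws response times for a single item under the lognormal-ACDM; it uses hypothetical parameter values and is written in Python purely for illustration (it is not the Matlab implementation released with this paper).

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for one item j measuring "Number" and "Knowing".
beta_j0 = 2.5        # intercept: reading the question and entering the answer
beta_jK = 0.4        # main effect of the cognitive skill "Knowing"
beta_jN = 0.6        # main effect of the content skill "Number"
gamma_j = 0.3        # variance of the normal error on the log scale

# Latent attributes (0/1) for three students: (A_Number, A_Knowing).
A_Number  = np.array([1, 0, 1])
A_Knowing = np.array([1, 1, 0])

# log Y_ij | A = beta_j0 + beta_jK * A_Knowing + beta_jN * A_Number + eps_ij,
# with eps_ij ~ N(0, gamma_j), so Y_ij is lognormal given A.
mean_log_time = beta_j0 + beta_jK * A_Knowing + beta_jN * A_Number
log_time = mean_log_time + rng.normal(0.0, np.sqrt(gamma_j), size=3)
response_time_seconds = np.exp(log_time)
print(response_time_seconds)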

Decomposing the response time into several components has a long history. Sternberg (1969) and Sternberg (1980) considered a linear regression model based on a sequence of hypothetical processes that students go through to solve a problem. However, these studies do not introduce individual-level latent variables. Maris (1993) considered a latent variable model with a Gamma distribution for the response time, with a similar additive combination of the latent variables. Such models are also referred to as exploratory response time models in the literature (e.g., De Boeck and Jeon, 2019). The recent papers Minchen et al. (2017) and Minchen and de la Torre (2018) incorporate this line of thinking into a CDM framework with the lognormal distribution for modeling the response times. The empirical results in these papers show that students possessing more required attributes take a longer time to respond, so we anticipate that our main-effect coefficients $\beta_{j,k}$ for the attributes are nonnegative.

We run our EM Algorithm 2 with 20 random parameter initializations satisfying $\beta_{j,k} > 0$. We do not impose any strict constraint on the sign of $\beta_{j,k}$ and allow the algorithm to update the parameters in the unconstrained space. Among the resulting 20 parameter estimates, we select the one with the largest log-likelihood. Figure 6 presents the estimated $\beta$-parameters (intercepts and main-effect coefficients). The figure shows that students spend the most time reading the question and clicking/typing the answer (represented by the intercepts in the first column), compared to the time it takes to re-formulate and solve the problem (represented by the main-effect coefficients in the second through last columns). The intercept values vary considerably across questions, ranging from 1.0 to 3.1, which indicates that the length and abstractness of the questions vary substantially across items. One interesting observation is the relation between the magnitude of the estimated intercept $\beta_{j,0}$ and the type of each item. Recall that Table 3 records whether each item is a multiple-choice item or a constructed-response item. Figure 6 reveals that, among the first ten items, the three items that ask students to construct the response (items 2, 4, and 7) have the largest intercept values; this is clearly visible from the darker-colored entries (with the numbers shown in white) in the first column of Fig. 6. This result can be interpreted as students spending more time typing and checking their answers when solving constructed-response items. Note that the item type information is not used in our estimation procedure, yet our method automatically distinguishes the multiple-choice items from the constructed-response items via the estimated parameters.
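The multi-start strategy itself can be sketched as follows; the argument run_em stands in for one EM run of Algorithm 2 returning a (parameters, log-likelihood) pair, and the toy usage at the end is for illustration only, not our actual fitting routine.

import numpy as np

def multistart_fit(run_em, n_starts=20, seed=0):
    # Run `run_em(rng)` from several random initializations and keep the run
    # with the largest final log-likelihood.
    rng = np.random.default_rng(seed)
    best_params, best_loglik = None, -np.inf
    for _ in range(n_starts):
        params, loglik = run_em(rng)
        if loglik > best_loglik:
            best_params, best_loglik = params, loglik
    return best_params, best_loglik

# Toy usage with a stand-in for the EM routine (illustration only):
toy_run_em = lambda rng: ({"beta": rng.uniform(0.0, 1.0, size=3)}, rng.normal())
params_hat, loglik_hat = multistart_fit(toy_run_em, n_starts=20)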

Figure 6 Heatmap of the estimated $\beta$-parameters under the lognormal-ACDM for the TIMSS 2019 response time dataset.

As for the main-effect coefficients $\beta_{j,k}$ with $k \ge 1$, the computation time (represented by the content skills' coefficients) and the formulation time (represented by the cognitive skills' coefficients) also differ considerably across items; these coefficients range from 0.03 to 2.3. Even though we do not constrain $\beta_{j,k}$ to be positive, all estimated main-effect coefficients turn out to be positive, which is consistent with the empirical findings in previous studies of response time modeling. This observation also indicates that our additive model assumption is indeed plausible and yields interpretable parameter estimates.

Figure 7 Correlation plot of the latent attributes under the estimated lognormal-ACDM for the TIMSS 2019 response time dataset.

We also provide the correlation plot of the estimated latent attributes in Fig. 7. The EM algorithm does not estimate the individual latent attributes directly, but instead computes the conditional expectations $\widehat{\varphi}_{i,\varvec{\alpha}} = \mathbb{P}(\textbf{A}_i = \varvec{\alpha} \mid \textbf{Y}, \widehat{\varvec{\eta}}, \widehat{\varvec{p}})$ for all $i \in [N]$ and $\varvec{\alpha} \in \{0,1\}^K$. Therefore, we estimate the latent attribute profiles $\textbf{A}_i$ by

$$\widehat{\textbf{A}}_i = \mathop{\textrm{argmax}}\limits_{\varvec{\alpha} \in \{0,1\}^K} \widehat{\varphi}_{i,\varvec{\alpha}},$$

and use these estimated $\widehat{\textbf{A}}_1, \ldots, \widehat{\textbf{A}}_N$ to compute the sample correlations between the attributes. Figure 7 reveals interesting patterns of intrinsic dependence among the attributes: the correlations among the four content skills, and those among the three cognitive skills, are higher than the correlations between a content skill and a cognitive skill. This phenomenon not only supports our model as a reasonable choice, since it yields an interpretable correlation structure among the latent attributes, but also suggests that it may be plausible to model higher-order latent traits behind the seven fine-grained attributes (de la Torre and Douglas, 2004; Zhan et al., 2018a). Future studies are warranted to explore the identifiability and interpretability of such higher-order extensions of our general-response CDMs.
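This post-processing step can be sketched as follows. The posterior matrix phi_hat is filled with random placeholder values here purely for illustration; in our analysis it is the E-step output of Algorithm 2.

import numpy as np
from itertools import product

rng = np.random.default_rng(0)
N, K = 620, 7
profiles = np.array(list(product([0, 1], repeat=K)))   # all 2^K binary profiles

# phi_hat[i, a] plays the role of P(A_i = alpha_a | Y, eta_hat, p_hat);
# placeholder values are used here in place of the E-step output.
phi_hat = rng.dirichlet(np.ones(len(profiles)), size=N)

# MAP estimate of each attribute profile: argmax over the 2^K candidates.
A_hat = profiles[np.argmax(phi_hat, axis=1)]            # shape (N, K)

# Sample correlation matrix of the seven estimated attributes (K x K).
attribute_corr = np.corrcoef(A_hat, rowvar=False)
print(np.round(attribute_corr, 2))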

To assess the model fit, we use the popular Bayesian information criterion (BIC) and compare the lognormal-ACDM to an alternative lognormal-DINA model in terms of their BIC values. In order to satisfy the identifiability conditions in Theorem 1 under the lognormal-DINA model, and also motivated by the high correlations among the cognitive skills in Fig. 7, we consider a more parsimonious lognormal-DINA model with the $K = 4$ content skills only. The corresponding $\textbf{Q}$-matrix for the lognormal-DINA model consists of the first four columns of the $\textbf{Q}$-matrix for the lognormal-ACDM in Table 2. The BIC value for the lognormal-ACDM is 17,005, whereas the BIC for the lognormal-DINA model (fitted via Algorithm 1) is 17,067. Based on this result, we conclude that the lognormal-ACDM with seven attributes is more suitable and fits this dataset better, despite its higher model complexity.
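For reference, the BIC of a fitted model equals $-2$ times the maximized log-likelihood plus the number of free parameters times $\log N$; the short sketch below uses placeholder values for the log-likelihoods and parameter counts rather than the fitted quantities.

import numpy as np

def bic(loglik, n_params, n_obs):
    # Bayesian information criterion: smaller values indicate a better fit.
    return -2.0 * loglik + n_params * np.log(n_obs)

# Placeholder inputs only; the actual maximized log-likelihoods and parameter
# counts come from the fitted lognormal-ACDM and lognormal-DINA models.
bic_acdm = bic(loglik=-8200.0, n_params=185, n_obs=620)
bic_dina = bic(loglik=-8300.0, n_params=100, n_obs=620)
print(bic_acdm, bic_dina)   # prefer the model with the smaller BIC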

Furthermore, in Supplementary Material S.6.2, we also provide an analysis of the binary response accuracy data. We find an interesting relationship between the item parameters estimated from the response time data and those from the response accuracy data, as well as a similar connection between the intercept (guessing parameter) of an item and the type of the item; see the Supplementary Material for details.

7. Discussion

In this paper, we have proposed a flexible new framework of cognitive diagnostic models for multivariate general responses beyond the traditional binary or polytomous responses. Our modeling framework incorporates the $\textbf{Q}$-matrix constraints in a unified way and covers popular existing CDMs as submodels. An important contribution of this work is to provide the crucial identifiability theory for all of these general-response CDMs. Interestingly and somewhat surprisingly, we have shown that the general-response CDMs are identifiable under conditions on the $\textbf{Q}$-matrix similar to those for binary-response CDMs. Our identifiability theory has the nice implication of consistent parameter estimation via maximum likelihood. For computation, we have proposed an efficient EM algorithm for parameter estimation under various response types. Extensive simulation studies not only corroborate the identifiability conclusions, but also demonstrate the favorable computational performance of our algorithms. We have analyzed a response time dataset from the TIMSS 2019 assessment using the proposed lognormal-ACDM and obtained interpretable results.

The proposed new paradigm of identifiable general-response cognitive diagnostic models opens up a number of interesting possibilities for future research. First, our current identifiability results and estimation procedure assume that the $\textbf{Q}$-matrix is known. In practice, many modern assessment datasets may not come with a readily available $\textbf{Q}$-matrix. In such exploratory settings, it would be interesting to directly identify and estimate the $\textbf{Q}$-matrix along with the other model parameters for general-response CDMs. In terms of identifiability, we conjecture that our proof technique could be generalized to also handle the identifiability of the $\textbf{Q}$-matrix. In terms of estimation, it may be possible to extend existing methods for estimating the $\textbf{Q}$-matrix, such as the penalized likelihood methods in Chen et al. (2015) and Ma et al. (2023) or the Bayesian approaches in Chen et al. (2018) and Liu et al. (2020), from binary-response CDMs to general-response CDMs. We plan to pursue these directions of exploratory general-response CDMs in the future.

Second, it would be interesting to strengthen our identifiability results by developing weaker identifiability conditions for special submodels of general-response CDMs. Our main theorems provide transparent sufficient conditions for identifiability in a very general setting, but for binary-response CDMs under certain special modeling assumptions, existing studies show that weaker conditions can guarantee identifiability. For example, for the DINA model with binary responses, Gu and Xu (2019) proposed weaker conditions that are necessary and sufficient for strict identifiability. Establishing necessary identifiability conditions in the most general setting with arbitrary response types is a non-trivial but interesting future direction.

Third, many educational assessments naturally contain multiple types of responses for each item. For instance, it is common to record the response accuracy and visual fixation counts/visit counts in addition to the response time for each item (Zhan et al., 2022; Fishbein et al., 2021). Jointly modeling multiple types of responses has received great attention in the measurement literature (van der Linden, 2007; Molenaar et al., 2015; Zhan et al., 2018b; Wang et al., 2018; Man and Harring, 2022; Kang et al., 2023). Among these modeling approaches, CDM-based methods such as Zhan et al. (2018b) and Zhan et al. (2022) use binary latent variables to model the response accuracy only (i.e., binary responses of correct or wrong answers to items), and use an IRT-based model with continuous latent variables to model the response time and fixation counts. In fact, as suggested by our analysis of the TIMSS response time data, using binary latent skills to model the response time can also yield interpretable results. Therefore, this work offers useful insights for future research on identifiable general-response CDMs that jointly model multiple types of responses, such as one's response accuracy, response time, and visit counts to an item. We believe that our ExpCDM framework is flexible enough to allow for such extensions. Indeed, one naive model would be to assume conditional independence between the response accuracy ($Y_{ij}$) and the response time ($T_{ij}$) given one's latent attributes ($\textbf{A}_i$), so that the joint item-level likelihood factors into an accuracy part and a response time part, and we conjecture that identifiability conditions similar to those proposed in this work still suffice in this setting. More investigations into the modeling methodology and identifiability theory along this line are left for future research.
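Concretely, such a conditional-independence factorization would read as follows (an illustrative display of the conjectured future model, not a model fit in this paper), where the first factor could follow a binary-response CDM and the second a general-response CDM such as the lognormal-ACDM:
$$\mathbb{P}(Y_{ij} = y, \, T_{ij} \le t \mid \textbf{A}_i = \varvec{\alpha}) = \mathbb{P}(Y_{ij} = y \mid \textbf{A}_i = \varvec{\alpha}) \cdot \mathbb{P}(T_{ij} \le t \mid \textbf{A}_i = \varvec{\alpha}).$$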

Acknowledgements

This research is partially supported by NSF grant DMS-2210796. The authors thank the editor Prof. Sandip Sinharay and three reviewers for their helpful and constructive comments.

Code availability

The Matlab codes implementing the proposed methods are available at https://github.com/seunghyunstats/General-Response-CDM/tree/main.

Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11336-024-09983-4.

1 We prove Theorem 1 under the minimal assumption (in Supplementary Material S.1) that $\mathcal{Y}_j$ is a separable metric space.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

References

Allman, E. S., Matias, C., Rhodes, J. A. (2009). Identifiability of parameters in latent structure models with many observed variables. The Annals of Statistics, 37(6A), 3099–3132.
Casella, G., Berger, R. L. (2021). Statistical inference. Cengage Learning.
Chen, Y., Culpepper, S., Liang, F. (2020). A sparse latent class model for cognitive diagnosis. Psychometrika, 85(1), 121–153.
Chen, Y., Culpepper, S. A., Chen, Y., Douglas, J. (2018). Bayesian estimation of the DINA Q matrix. Psychometrika, 83(1), 89–108.
Chen, Y., Liu, J., Xu, G., Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110(510), 850–866.
Culpepper, S. A. (2019). An exploratory diagnostic model for ordinal responses with binary attributes: Identifiability and estimation. Psychometrika, 84(4), 921–940.
Culpepper, S. A. (2023). A note on weaker conditions for identifying restricted latent class models for binary responses. Psychometrika, 88(1), 158–174.
De Boeck, P., Jeon, M. (2019). An overview of models for response times and processes in cognitive tests. Frontiers in Psychology, 10, 102.
de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76(2), 179–199.
de la Torre, J., Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333–353.
Dempster, A. P., Laird, N. M., Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B: Statistical Methodology, 39(1), 1–22.
DiBello, L. V., Stout, W. F., Roussos, L. A. (1995). Unified cognitive/psychometric diagnostic assessment likelihood-based classification techniques. In Cognitively diagnostic assessment (pp. 361–389).
Dunson, D. B. (2000). Bayesian latent variable models for clustered mixed outcomes. Journal of the Royal Statistical Society Series B: Statistical Methodology, 62(2), 355–366.
Fang, G., Liu, J., Ying, Z. (2019). On the identifiability of diagnostic classification models. Psychometrika, 84(1), 19–40.
Fishbein, B., Foy, P., Yin, L. (2021). TIMSS 2019 user guide for the international database. Retrieved from https://timssandpirls.bc.edu/timss2019/international-database.
Gu, Y., Dunson, D. B. (2023). Bayesian pyramids: Identifiable multilayer discrete latent structure models for discrete data. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(2), 399–426.
Gu, Y., Xu, G. (2019). The sufficient and necessary condition for the identifiability and estimability of the DINA model. Psychometrika, 84(2), 468–483.
Gu, Y., Xu, G. (2020). Partial identifiability of restricted latent class models. Annals of Statistics, 48(4), 2082–2107.
Gu, Y., Xu, G. (2021). Sufficient and necessary conditions for the identifiability of the Q-matrix. Statistica Sinica, 31, 449–472.
He, Q., Von Davier, M. (2016). Analyzing process data from problem-solving items with n-grams: Insights from a computer-based large-scale assessment. In Handbook of research on technology tools for real-world skill development (pp. 750–777). IGI Global.
Henson, R. A., Templin, J. L., Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74, 191–210.
Junker, B. W., Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258–272.
Kang, I., Jeon, M., Partchev, I. (2023). A latent space diffusion item response theory model to explore conditional dependence between responses and response times. Psychometrika, 1–35.
Koller, D., Friedman, N. (2009). Probabilistic graphical models: Principles and techniques. MIT Press.
Kruskal, J. B. (1977). Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra and its Applications, 18(2), 95–138.
Liu, C.-W., Andersson, B., Skrondal, A. (2020). A constrained Metropolis-Hastings Robbins–Monro algorithm for Q matrix estimation in DINA models. Psychometrika, 85(2), 322–357.
Liu, R., Heo, I., Liu, H., Shi, D., Jiang, Z. (2023). Applying negative binomial distribution in diagnostic classification models for analyzing count data. Applied Psychological Measurement, 47(1), 64–75.
Liu, R., Liu, H., Shi, D., Jiang, Z. (2022). Poisson diagnostic classification models: A framework and an exploratory example. Educational and Psychological Measurement, 82(3), 506–516.
Lo, S., Andrews, S. (2015). To transform or not to transform: Using generalized linear mixed models to analyse reaction time data. Frontiers in Psychology, 6, 1171.
Loeys, T., Rosseel, Y., Baten, K. (2011). A joint modeling approach for reaction time and accuracy in psycholinguistic experiments. Psychometrika, 76, 487–503.
Ma, C., Ouyang, J., Xu, G. (2023). Learning latent and hierarchical structures in cognitive diagnosis models. Psychometrika, 88(1), 175–207.
Magnus, B. E., Thissen, D. (2017). Item response modeling of multivariate count data with zero inflation, maximum inflation, and heaping. Journal of Educational and Behavioral Statistics, 42(5), 531–558.
Man, K., Harring, J. R. (2019). Negative binomial models for visual fixation counts on test items. Educational and Psychological Measurement, 79(4), 617–635.
Man, K., Harring, J. R. (2022). Detecting preknowledge cheating via innovative measures: A mixture hierarchical model for jointly modeling item responses, response times, and visual fixation counts. Educational and Psychological Measurement.
Maris, E. (1993). Additive and multiplicative models for gamma distributed random variables, and their application as psychometric models for response times. Psychometrika, 58, 445–469.
Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64(2), 187–212.
Minchen, N., de la Torre, J. (2018). A general cognitive diagnosis model for continuous-response data. Measurement: Interdisciplinary Research and Perspectives, 16(1), 30–44.
Minchen, N., de la Torre, J., Liu, Y. (2017). A cognitive diagnosis model for continuous response. Journal of Educational and Behavioral Statistics, 42(6), 651–677.
Molenaar, D., Tuerlinckx, F., van der Maas, H. L. (2015). A bivariate generalized linear item response theory modeling framework to the analysis of responses and response times. Multivariate Behavioral Research, 50(1), 56–74.
Moustaki, I., Knott, M. (2000). Generalized latent trait models. Psychometrika, 65, 391–411.
Nelder, J. A., Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society: Series A (General), 135(3), 370–384.
Rasch, G. (1993). Probabilistic models for some intelligence and attainment tests. ERIC.
Rupp, A. A., Templin, J., Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. Guilford Press.
Skrondal, A., Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. CRC Press.
Sternberg, R. J. (1980). Representation and process in linear syllogistic reasoning. Journal of Experimental Psychology: General, 109(2), 119.
Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. Acta Psychologica, 30, 276–315.
Tang, X., Wang, Z., He, Q., Liu, J., Ying, Z. (2020). Latent feature extraction for process data via multidimensional scaling. Psychometrika, 85(2), 378–397.
Tatsuoka, K. K. (1983). Rule space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345–354.
Teicher, H. (1967). Identifiability of mixtures of product measures. The Annals of Mathematical Statistics, 38(4), 1300–1302.
Thissen, D. (1983). Timed testing: An approach using item response theory. In New horizons in testing (pp. 179–203). Elsevier.
van der Linden, W. J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31(2), 181–204.
van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72(3), 287–308.
von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61, 287–307.
Wang, C., Xu, G. (2015). A mixture hierarchical model for response times and response accuracy. British Journal of Mathematical and Statistical Psychology, 68(3), 456–477.
Wang, S., Zhang, S., Douglas, J., Culpepper, S. (2018). Using response times to assess learning progress: A joint model for responses and response times. Measurement: Interdisciplinary Research and Perspectives, 16(1), 45–58.
Xu, G. (2017). Identifiability of restricted latent class models with binary responses. Annals of Statistics, 45, 675–707.
Xu, G., Shang, Z. (2018). Identifying latent structures in restricted latent class models. Journal of the American Statistical Association, 113(523), 1284–1295.
Xu, G., Zhang, S. (2016). Identifiability of diagnostic classification models. Psychometrika, 81(3), 625–649.
Yakowitz, S. J., Spragins, J. D. (1968). On the identifiability of finite mixtures. The Annals of Mathematical Statistics, 39(1), 209–214.
Zhan, P., Jiao, H., Liao, D. (2018a). Cognitive diagnosis modelling incorporating item response times. British Journal of Mathematical and Statistical Psychology, 71(2), 262–286.
Zhan, P., Liao, M., Bian, Y. (2018b). Joint testlet cognitive diagnosis modeling for paired local item dependence in response times and response accuracy. Frontiers in Psychology, 9, 607.
Zhan, P., Man, K., Wind, S. A., Malone, J. (2022). Cognitive diagnosis modeling incorporating response times and fixation counts: Providing comprehensive feedback and accurate diagnosis. Journal of Educational and Behavioral Statistics.
Figure 1 Graphical model of the general-response CDM with $Y_j \in \mathcal{Y}_j$. White nodes are latent attributes, and gray nodes are observed responses. The directed arrows from the latent to the observed capture the conditional dependence of $\textbf{Y}$ given $\textbf{A}$, which is exactly encoded in the $\textbf{Q}$-matrix $\textbf{Q}_{J \times K}$. There is a directed arrow from $A_k$ to $Y_j$ if and only if $q_{j,k} = 1$.

Table 1 Examples of exponential families, and their natural parameters and sufficient statistics.

Algorithm 1 EM Algorithm for the General ExpDINA Model

Algorithm 2 EM Algorithm for the General ExpACDM

Figure 2 RMSEs of parameters under the Normal-DINA (red) and Normal-ACDM (blue).

Figure 3 RMSEs of $\varvec{p}$ with respect to $\frac{1}{\sqrt{N}}$ under the Normal-DINA model.

Figure 4 RMSEs of parameters under the Poisson-DINA (red) and Poisson-ACDM (blue).

Figure 5 RMSEs of parameters under the NegBin-DINA (red) and NegBin-ACDM (blue).

