
Optimizing Large-Scale Educational Assessment with a “Divide-and-Conquer” Strategy: Fast and Efficient Distributed Bayesian Inference in IRT Models

Published online by Cambridge University Press:  01 January 2025

Sainan Xu
Affiliation:
Northeast Normal University
Jing Lu*
Affiliation:
Northeast Normal University
Jiwei Zhang*
Affiliation:
Northeast Normal University
Chun Wang
Affiliation:
University of Washington
Gongjun Xu
Affiliation:
University of Michigan
Correspondence should be made to Jing Lu, Key Laboratory of Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun, Jilin, China. Email: luj282@nenu.edu.cn
Correspondence should be made to Jiwei Zhang, Faculty of Education, Key Laboratory of Applied Statistics of MOE, Northeast Normal University, Changchun, Jilin, China. Email: zhangjw713@nenu.edu.cn

Abstract

With the growing attention on large-scale educational testing and assessment, the ability to process substantial volumes of response data has become crucial. Current estimation methods within item response theory (IRT), despite their high precision, often impose considerable computational burdens on large-scale data, leading to reduced computational speed. This study introduces a novel “divide-and-conquer” parallel algorithm built on the Wasserstein posterior approximation concept, aiming to enhance computational speed while maintaining accurate parameter estimation. The algorithm draws parameters from segmented data subsets in parallel, followed by an amalgamation of these parameters via Wasserstein posterior approximation. Theoretical support for the algorithm is established through asymptotic optimality under certain regularity assumptions. Practical validation is demonstrated using real-world data from the Programme for International Student Assessment. Ultimately, this research proposes a transformative approach to managing educational big data, offering a scalable, efficient, and precise alternative that promises to redefine traditional practices in educational assessments.

Type
Theory and Methods
Copyright
© 2024 The Author(s), under exclusive licence to The Psychometric Society

Large-scale educational testing and assessment have gained global attention, engaging educators, researchers, and policymakers worldwide. This interest arises from significant changes in organized assessments over the last fifty years, which have produced rich and extensive educational big datasets. One significant example is the Programme for International Student Assessment (PISA), conducted by the OECD. PISA collects data every three years from 15-year-olds worldwide to measure students’ abilities in applying their knowledge and skills in reading, mathematics, and science to solve real-world problems. PISA 2018 collected data from 606,627 students who answered 82 mathematics cognitive items (OECD, 2021). While not every student responds to each question, after eliminating the responses of students who did not answer any of the items, the sample size still remains as high as 267,889. Furthermore, large-scale educational assessments conducted by individuals or schools are also notable. For example, to study the “critical period” in second language acquisition, Hartshorne et al. (2018) administered a grammar test through social media to native and non-native English speakers (Wu et al., 2020). The test consists of 95 binary items and was completed by 669,498 participants without missing data. Other large-scale assessments, such as the National Assessment of Educational Progress (NAEP; van Rijn et al., 2016) and the Trends in International Mathematics and Science Study (TIMSS; Martin & Kelly, 1996), belong to this landscape. Clearly, these extensive educational response datasets share a common characteristic: big data and large sample sizes. Note that “big data” and “large sample size” in this paper refer to a large number of individuals rather than a large number of items. Therefore, conducting a sound analysis of these large response datasets holds significant value for educational evaluation and item quality assessment.

Leveraging these big datasets presents both a unique opportunity and a challenge. It enables researchers and educators to gain insights into student performance on an unprecedented scale, thereby allowing more informed decision-making regarding educational policies and teaching strategies. Moreover, it facilitates a deeper grasp of student learning processes, capturing not just final answers but the methods used to reach them. However, analyzing such datasets is complex. It requires robust statistical methodologies and the computational capacity to process educational big data. Thus, ongoing investment in innovative methods and computational tools is crucial to harness the full potential of these large-scale educational datasets.

Item response theory (IRT; van der Linden & Hambleton, 1997) is a powerful tool for test design, analysis, and scoring. It employs a probabilistic model to explain the relationship between an individual’s latent ability and the probability of correctly answering a particular item on a test. This theory is particularly relevant in the context of large-scale educational assessments. For example, IRT models are applied to analyze the data collected for the PISA test (Embretson & Reise, 2000). The analysis offers insights into the mathematical abilities, reading comprehension, and scientific literacy of students from various countries.

Accurate parameter estimation is a fundamental requirement for applying IRT models. Commonly employed methods for estimating parameters in IRT include marginal maximum likelihood estimation via the expectation-maximization algorithm (MMLE-EM; Bock & Aitkin, 1981; Baker & Kim, 2004; Schilling & Bock, 2005) and Bayesian Markov chain Monte Carlo (MCMC) methods, such as the Metropolis–Hastings (MH; Metropolis et al., 1953; Hastings, 1970) and Gibbs algorithms (Béguin & Glas, 2001; Fox, 2010; Culpepper, 2016). The MMLE-EM algorithm, despite being frequently utilized in various IRT models, possesses significant limitations. One primary concern is that it requires high-dimensional numerical integration for parameter estimation, an inherently complex and computationally intensive process. The issue is further compounded because the number of quadrature points needed for the integration grows exponentially as the number of latent variables increases linearly (Jiang & Templin, 2019); for instance, with 21 quadrature points per dimension, a five-dimensional model already requires $21^5 \approx 4.1$ million grid evaluations. To circumvent these limitations, MCMC algorithms, based on posterior sampling, have been widely applied to estimate parameters in various complex IRT models. This approach significantly reduces the complexity of parameter estimation and improves computational efficiency.

However, parameter estimation in IRT models using MCMC methods is not without limitations either. MCMC methods often struggle with big data for two reasons. First, there are challenges related to data storage: when the dataset is too large, it becomes impractical for a single computer to store and process it. Second, computational time is a concern: applying MCMC methods in big data settings can be highly time-consuming, as it typically involves numerous iterations, each of which requires several passes over the entire dataset (Xue & Liang, 2019; Srivastava & Xu, 2021; Shyamalkumar & Srivastava, 2022).

Researchers have been actively addressing the challenges of using MCMC methods with big data. One approach involves approximating posterior analyses, such as variational Bayes (Hoffman et al., 2013; Tan & Nott, 2014; Lee & Wand, 2016), Laplace/Gaussian approximations (Rue et al., 2009), and expectation propagation techniques (Vehtari et al., 2020). Although these methods can generate useful posterior mean estimates for big data, most of them underestimate the posterior variance and lack theoretical guarantees for quantifying posterior uncertainty (Giordano et al., 2018). A second approach focuses on approximating MCMC transition kernels with subsampling methods or easier-to-sample alternatives, without needing to pass through the full dataset (Korattikara et al., 2014; Alquier et al., 2016; Quiroz et al., 2019). This approach, though promising, requires careful parameter tuning; only with correctly tuned parameters can the algorithm produce reliable posterior uncertainty estimates.

The third approach involves the divide-and-conquer (D&C) strategy, which comprises three stages: first, the full dataset is partitioned into multiple subsets; second, the sampling algorithm is run in parallel on all these subsets to derive posterior samples; and third, the posterior samples collected from each subset are combined in a specific manner to conduct accurate posterior inference on the full dataset. Several techniques exist for the third stage, differing mainly in how the posterior draws from different subsets are merged (Minsker et al., 2014; Neiswanger et al., 2014; Scott et al., 2016). Currently, the most advanced merging method is the Wasserstein posterior (WASP) method. Srivastava et al. (2015, 2018) first introduced the WASP method, which amalgamates the subset posterior distributions by leveraging the Wasserstein barycenter, a concept representing the geometric center of a collection of probability measures (Agueh & Carlier, 2011). Although the method offers extensive applicability and theoretical guarantees, its implementation, which requires computing the Wasserstein barycenter, can be highly complex and resource-intensive. In response, Li et al. (2017) proposed a computationally more straightforward posterior interval estimation (PIE) algorithm that estimates the WASP quantiles by averaging the quantiles derived from each subset’s draws. However, PIE has been criticized for being effective only with one-dimensional parameters.

More recently, the “double-parallel” Monte Carlo (DPMC) method was suggested by Xue and Liang (2019). This approach relies mainly on the mixture distribution derived from the common center of the subset draws to approximate the full data posterior. However, the assumption of asymptotic normality in DPMC can be challenging to validate in practical scenarios, as it implies that the covariance matrices of the subset posterior distributions are identical. Álvarez-Esteban et al. (2016) argued that the computation of the Wasserstein barycenter can be simplified to solving a fixed-point equation on the space of positive definite matrices if the subset posterior distributions belong to the same location-scatter (LS) family. Building on this idea, Srivastava and Xu (2021) proposed the LS-WASP algorithm for distributed Bayesian inference in linear mixed-effects models. The LS-WASP algorithm not only preserves the computational simplicity of the PIE and DPMC algorithms, but also delivers asymptotic Monte Carlo and statistical guarantees. Subsequently, Shyamalkumar and Srivastava (2022) and Wang and Srivastava (2023) extended this algorithm to generalized linear models and hidden Markov models, respectively.

Despite extensive research into the challenges of applying MCMC methods to moderately sized data, estimating the parameters of IRT models with MCMC remains difficult in the presence of big data. This study presents the location-scatter Wasserstein posterior (LS-WASP) algorithm, a divide-and-conquer-based strategy, to tackle the big data challenge. This divide-and-conquer algorithm, anchored in the LS-WASP concept, transforms parameters drawn from subset posteriors into draws from the WASP via a straightforward recentering and rescaling operation. Not only is this algorithm at the cutting edge of scalable Bayesian inference, but it also offers simple computation with asymptotic theoretical guarantees. This study applies the LS-WASP algorithm to the two-parameter logistic (2PL; Birnbaum, 1957) model and the multidimensional two-parameter logistic (M2PL; Reckase, 1972, 2009) model, confirming its utility and feasibility within the educational and psychometric field. When dealing with large-scale response data, the LS-WASP algorithm not only accurately estimates each parameter of the IRT models but also substantially accelerates computation, supported by asymptotic Monte Carlo and statistical guarantees.

The organization of this paper is as follows. Section 1 provides a brief introduction to the 2PL and M2PL models. Section 2 describes the preliminary preparations and the specific implementation of the LS-WASP algorithm within the IRT framework. Section 3 presents the theoretical assumptions together with the asymptotic statistical properties and Monte Carlo guarantees under these assumptions. Section 4 presents simulation studies of the LS-WASP algorithm in the 2PL and M2PL models. In Section 5, two examples are provided to show the application of the LS-WASP algorithm in IRT models. Finally, Section 6 concludes the paper with a brief summary and discussion.

1. Models

1.1. Unidimensional Two-Parameter Logistic Model

The unidimensional IRT model establishes a basic structure for illustrating the interaction between individuals and items by postulating a single latent trait dimension. Presume a test is composed of $J$ binary response items, each assessing a unidimensional latent trait $\theta$. Let $\mathbf{y}=[y_{ij}]_{n\times J}$ be an $n\times J$ matrix representing the responses of $n$ examinees to the $J$ items, where $y_{ij}=1$ ($y_{ij}=0$) if the $i$th examinee answers the $j$th item correctly (incorrectly), for $i=1,\ldots,n$ and $j=1,\ldots,J$. The correct response probability of the two-parameter logistic (2PL) model is expressed as follows:

(1) $$P\left(y_{ij}=1 \mid \theta_i, a_j, b_j\right)=\frac{\exp\left\{a_j(\theta_i-b_j)\right\}}{1+\exp\left\{a_j(\theta_i-b_j)\right\}},$$

where $\theta_i$ is the ability parameter of examinee $i$, for $i=1,\ldots,n$; $a_j$ and $b_j$ are the discrimination and difficulty parameters of item $j$, for $j=1,\ldots,J$.
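As a concrete illustration, the following Python sketch evaluates the 2PL probability in Eq. (1) for a vector of examinees; the function name `prob_2pl` and the example parameter values are our own illustrative choices, not part of the original specification.

```python
import numpy as np
from scipy.special import expit  # numerically stable logistic function

def prob_2pl(theta, a, b):
    """Correct-response probability of the 2PL model, Eq. (1).

    theta : array of ability parameters, shape (n,)
    a, b  : discrimination and difficulty of one item (scalars)
    """
    return expit(a * (theta - b))

# Example: three examinees answering an item with a = 1.2, b = 0.5;
# the probability increases with ability.
theta = np.array([-1.0, 0.0, 1.5])
print(prob_2pl(theta, a=1.2, b=0.5))
```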

1.2. Multidimensional Two-Parameter Logistic Model

Multidimensional item response theory (MIRT) models (Ackerman, 1996; Béguin & Glas, 2001), as an extension of unidimensional IRT models, are extensively applied to illustrate the relationships between test items and multiple latent traits in psychological and educational assessments (Reckase, 2009). We now present the multidimensional two-parameter logistic (M2PL) model, a prevalent model within the MIRT domain. Consider a test comprising $J$ items, designed to assess $Q$ latent traits among $n$ examinees. Denote the observed responses of all $n$ examinees to the $J$ items as $\mathbf{y}=[y_{ij}]_{n\times J}$. Here, $y_{ij}=1$ indicates that examinee $i$ answers item $j$ correctly, while $y_{ij}=0$ indicates a wrong response. For each examinee $i$ ($i=1,\ldots,n$), let $\boldsymbol{\theta}_i=(\theta_{i1},\ldots,\theta_{iQ})^{T}$ denote the $Q$-dimensional vector of latent traits being measured. The correct response probability of the M2PL model is expressed as follows:

(2) $$P\left(y_{ij}=1 \mid \boldsymbol{\theta}_i, \mathbf{a}_j, b_j\right)=\frac{\exp\left(\mathbf{a}_j^{T}\boldsymbol{\theta}_i-b_j\right)}{1+\exp\left(\mathbf{a}_j^{T}\boldsymbol{\theta}_i-b_j\right)},$$

where $\mathbf{a}_j=(a_{j1},\ldots,a_{jQ})^{T}$ and $b_j$ are the discrimination and difficulty parameters, respectively; $a_{jq}\ne 0$ indicates that item $j$ is associated with latent trait $q$.

Due to the overparametrization that allows rotation of the discrimination parameters $(a_{j1},\ldots,a_{jQ})$, model identification must be addressed in advance. Usually, two approaches are employed. The first constrains all ability parameters to follow a multivariate normal distribution with zero mean vector and identity covariance matrix, and introduces the constraints $a_{jq}=0$, for $j=1,\ldots,Q-1$ and $q=j+1,\ldots,Q$. The second constrains the item parameters by setting $Q$ difficulty parameters $b_j$ to zero and, for each $j=1,\ldots,Q$ and $q=1,\ldots,Q$, fixing $a_{jq}=1$ if $j=q$ and $a_{jq}=0$ if $j\ne q$. Meanwhile, the ability parameters are freely estimated.
Both approaches effectively guarantee a unique solution for parameter estimation, as emphasized by Béguin and Glas (2001). Similarly, factor analysis models often constrain factor loadings to achieve model identifiability; see Chapter 5 of Skrondal and Rabe-Hesketh (2004) for details. In this paper, we adopt the second approach, constraining the item parameters, as sketched below.
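To make the second identification scheme concrete, the following sketch builds a constrained discrimination matrix and evaluates the M2PL probability in Eq. (2); the helper names (`prob_m2pl`, `constrained_a`) and the example dimensions are illustrative assumptions, not the authors' code.

```python
import numpy as np
from scipy.special import expit

def prob_m2pl(theta, a, b):
    """M2PL correct-response probabilities, Eq. (2).

    theta : (n, Q) latent traits
    a     : (J, Q) discrimination parameters
    b     : (J,)   difficulty parameters
    Returns an (n, J) matrix of probabilities.
    """
    return expit(theta @ a.T - b)

def constrained_a(a_free, Q):
    """Second identification scheme: for the first Q items,
    a_jq = 1 if j == q and a_jq = 0 otherwise; remaining rows are free."""
    return np.vstack([np.eye(Q), a_free])

# Example: Q = 2 traits, J = 5 items; the first Q items anchor the scale
Q = 2
rng = np.random.default_rng(0)
a = constrained_a(rng.uniform(0.5, 2.0, size=(3, Q)), Q)
b = np.concatenate([np.zeros(Q), rng.normal(size=3)])  # first Q difficulties fixed at 0
theta = rng.normal(size=(4, Q))
print(prob_m2pl(theta, a, b).shape)  # (4, 5)
```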

2. The Location-Scatter Wasserstein Posterior (LS-WASP) Algorithm with a Divide-and-Conquer-Based Strategy

2.1. Preliminaries

This section introduces the fundamental concepts, definitions, and theorems used in the LS-WASP algorithm, which serve as the foundation for the theoretical properties of the LS-WASP algorithm in Section 3. We first introduce several concepts and definitions related to the Wasserstein space. The Wasserstein space of order 2 on $\mathbb{R}^p$ is defined as:

$$\mathcal{P}_2(\mathbb{R}^p)=\left\{\nu\in\mathcal{P}(\mathbb{R}^p):\int \|x\|_2^2\,\nu(dx)<\infty\right\},$$

where $\mathcal{P}(\mathbb{R}^p)$ is the set of Borel probability measures on $\mathbb{R}^p$ and $\|\cdot\|_2$ denotes the Euclidean norm. The $L_2$-Wasserstein distance between $\mu,\nu\in\mathcal{P}_2(\mathbb{R}^p)$ is defined as:

$$W_2(\mu,\nu)=\left(\inf\left\{\int \|x-y\|_2^2\,d\pi(x,y):\ \pi\in\mathcal{P}_2(\mathbb{R}^p\times\mathbb{R}^p)\ \text{with marginals}\ \mu,\nu\right\}\right)^{1/2}.$$
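For two members of a Gaussian family, $W_2$ admits the well-known closed form $W_2^2=\|m_1-m_2\|_2^2+\operatorname{tr}\{S_1+S_2-2(S_1^{1/2}S_2S_1^{1/2})^{1/2}\}$, which the sketch below evaluates; the function name `w2_gaussian` is our own.

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussian(m1, S1, m2, S2):
    """L2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    cross = np.real(sqrtm(sqrtm(S1) @ S2 @ sqrtm(S1)))
    w2_sq = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross)
    return np.sqrt(max(w2_sq, 0.0))  # guard against tiny negative round-off

print(w2_gaussian(np.zeros(2), np.eye(2), np.ones(2), 2 * np.eye(2)))
```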

Definition 1

(Wasserstein barycenter) Given $\nu_1,\ldots,\nu_K$ in $\mathcal{P}_2(\mathbb{R}^p)$ and denoting by $w_k$ the weight tied to $\nu_k$, their Wasserstein barycenter can be expressed as:

(3) $$\widetilde{\nu}=\arg\min_{\nu\in\mathcal{P}_2(\mathbb{R}^p)}\sum_{k=1}^{K}\frac{w_k}{2}W_2^2(\nu,\nu_k).$$

Note that $\sum_{k=1}^{K}w_k=1$ and $w_1,\ldots,w_K>0$.

The Wasserstein barycenter exhibits important properties that render it a useful tool across different domains. First, the barycenter provides an effective measure of central tendency, representing a compromise between the multiple probability measures. Second, according to Agueh and Carlier (2011), the Wasserstein barycenter exists, is unique, and is optimal in the sense that it minimizes the sum of the squared Wasserstein distances to the individual measures; it thus provides the most accurate approximation under the Wasserstein metric. In practice, these properties allow the Wasserstein barycenter to aggregate complex, multi-modal distributions, thereby playing a critical role in applications such as optimal transport, machine learning, and computer vision, to name a few. When applying this idea to IRT model estimation, $\nu_k$ is the $k$th subset posterior, and $\widetilde{\nu}$ is the Wasserstein posterior (WASP), which replaces the full data posterior for inference. Nonetheless, the computation of the Wasserstein barycenter poses a significant challenge, prompting ongoing research into effective algorithms and computational strategies. Álvarez-Esteban et al. (2016) argued that when $\nu_1,\ldots,\nu_K$ belong to the same location-scatter family, the computation of the Wasserstein barycenter can be simplified to solving a fixed-point equation in the space of positive definite matrices. The definition of the location-scatter family is as follows.

Definition 2

(Location-scatter family) Consider a random vector $\mathbf{U}$ that conforms to the probability law $G\in\mathcal{P}_2(\mathbb{R}^p)$, such that its expectation $E(\mathbf{U})$ is zero and its variance $\operatorname{Var}(\mathbf{U})$ equals the $p\times p$ identity matrix. Let $\mathcal{L}(W)$ denote the probability law of a random variable $W$, and let $\mathcal{M}_+^{p\times p}$ represent the collection of $p\times p$ positive definite matrices. We define the family $\mathcal{F}(G)=\{\mathcal{L}(\boldsymbol{\Sigma}^{1/2}\mathbf{U}+\boldsymbol{\mu}):\boldsymbol{\Sigma}\in\mathcal{M}_+^{p\times p},\ \boldsymbol{\mu}\in\mathbb{R}^p\}$, composed of probability laws formed by positive definite affine transformations of $G$, as a location-scatter family.
Here, $\boldsymbol{\Sigma}^{1/2}$ denotes the symmetric square root of $\boldsymbol{\Sigma}$.

The location-scatter family offers a flexible and efficient way to generate a plethora of probability measures from a base measure. By adjusting the location parameter $\boldsymbol{\mu}$ and the positive definite matrix $\boldsymbol{\Sigma}$, we can effectively shift and scale the original measure, generating a comprehensive family of measures. This flexibility finds widespread use in statistical modeling and analysis. For instance, in multivariate analysis, the location-scatter family allows for the modeling of diverse and complex data distributions. Moreover, in machine learning, location-scatter families can help construct flexible probabilistic models that adapt to the specific characteristics of the data. For additional details, please refer to Álvarez-Esteban et al. (2016). Next, Theorem 1 provides a simplified computation of the Wasserstein barycenter under the location-scatter family assumption.

Theorem 1

Suppose $\nu_1,\ldots,\nu_K\in\mathcal{F}(G)$ for a certain $G$, where $\boldsymbol{\mu}_k$ and $\mathbf{B}_k$ denote the mean vector and covariance matrix of $\nu_k$ ($k=1,\ldots,K$), respectively. Under general conditions, the Wasserstein barycenter of $\nu_1,\ldots,\nu_K$ with weights $w_1,\ldots,w_K$, denoted $\widetilde{\nu}$, is also an element of $\mathcal{F}(G)$. Its mean vector $\widetilde{\boldsymbol{\mu}}$ is the weighted average of the $\boldsymbol{\mu}_k$ values, i.e., $\widetilde{\boldsymbol{\mu}}=\sum_{k=1}^{K}w_k\boldsymbol{\mu}_k$, and its covariance matrix $\widetilde{\mathbf{B}}$ corresponds to the fixed point of the sequence $\{\boldsymbol{\Delta}_t\}_{t=0}^{\infty}$, which can be expressed as follows:

(4) $$\boldsymbol{\Delta}_{t+1}=\boldsymbol{\Delta}_t^{-\frac{1}{2}}\left\{\sum_{k=1}^{K}w_k\left(\boldsymbol{\Delta}_t^{\frac{1}{2}}\mathbf{B}_k\boldsymbol{\Delta}_t^{\frac{1}{2}}\right)^{\frac{1}{2}}\right\}^{2}\boldsymbol{\Delta}_t^{-\frac{1}{2}},\quad t=0,1,2,\ldots,$$

where $\boldsymbol{\Delta}_0$ is set to the identity matrix.

Theorem 1 reveals a fundamental and appealing property of the location-scatter family. When the probability measures $\nu_1,\ldots,\nu_K$ belong to the same location-scatter family $\mathcal{F}(G)$, their Wasserstein barycenter $\widetilde{\nu}$ also belongs to that family under general assumptions. This powerful result provides a direct and efficient computational route to the Wasserstein barycenter of a collection of measures within a location-scatter family, bypassing the need to solve the potentially complex general Wasserstein barycenter problem. The theorem also provides explicit analytical forms for the barycenter: the mean vector $\widetilde{\boldsymbol{\mu}}$ is the weighted average of the mean vectors $\boldsymbol{\mu}_k$ of the individual measures, while the covariance matrix $\widetilde{\mathbf{B}}$ is given by the fixed point of the iteratively defined sequence $\{\boldsymbol{\Delta}_t\}_{t=0}^{\infty}$; a small numerical sketch of this iteration follows. Please refer to Álvarez-Esteban et al. (2016) for the proof of Theorem 1.
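Assuming Gaussian-like subset posteriors summarized by their covariance matrices, the following sketch iterates Eq. (4) to a fixed point; the function name `barycenter_covariance`, the tolerance, and the example matrices are our own choices.

```python
import numpy as np
from scipy.linalg import sqrtm

def barycenter_covariance(B_list, w, tol=1e-10, max_iter=500):
    """Fixed-point iteration of Eq. (4) for the barycenter covariance."""
    p = B_list[0].shape[0]
    Delta = np.eye(p)  # Delta_0 is the identity matrix
    for _ in range(max_iter):
        root = np.real(sqrtm(Delta))       # Delta^{1/2}
        root_inv = np.linalg.inv(root)     # Delta^{-1/2}
        S = sum(wk * np.real(sqrtm(root @ Bk @ root))
                for wk, Bk in zip(w, B_list))
        Delta_new = root_inv @ S @ S @ root_inv
        if np.linalg.norm(Delta_new - Delta, ord="fro") < tol:
            return Delta_new
        Delta = Delta_new
    return Delta

# Example: barycenter covariance of two 2 x 2 subset covariances
B1 = np.array([[1.0, 0.2], [0.2, 1.0]])
B2 = np.array([[2.0, 0.0], [0.0, 0.5]])
print(barycenter_covariance([B1, B2], w=[0.5, 0.5]))
```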

2.2. Implementation of the LS-WASP Algorithm within the IRT Framework

Consider an item response dataset from a large-scale test comprising $n$ examinees and $J$ items. The detailed stages of the LS-WASP algorithm for approximating the posterior of the 2PL model are as follows:

Stage 1: Partitioning of the Full Data

Divide the full dataset $\mathbf{y}$ into $K$ disjoint subsets $\mathbf{y}^{(1)},\ldots,\mathbf{y}^{(K)}$ through random subsampling. Specifically, let $s_k$ be the number of examinees in subset $k$, ensuring that $\sum_{k=1}^{K}s_k=n$, where $n$ is the total number of examinees. The 2PL model for subset $k$ is given by:

(5) $$p_{ij}^{(k)}=P\left(y_{ij}^{(k)}=1 \mid \theta_i^{(k)}, a_j^{(k)}, b_j^{(k)}\right)=\frac{\exp\left\{a_j^{(k)}\left(\theta_i^{(k)}-b_j^{(k)}\right)\right\}}{1+\exp\left\{a_j^{(k)}\left(\theta_i^{(k)}-b_j^{(k)}\right)\right\}}.$$

Here, $y_{ij}^{(k)}$ represents the response data for subset $k$; $p_{ij}^{(k)}$ is the probability that the $i$th examinee in subset $k$ answers the $j$th item correctly; $\theta_i^{(k)}$ is the ability parameter of the $i$th examinee in subset $k$; and $a_j^{(k)}$ and $b_j^{(k)}$ are the discrimination and difficulty parameters of the $j$th item in subset $k$, respectively ($i=1,\ldots,s_k$; $j=1,\ldots,J$; $k=1,\ldots,K$).

Note that we partition the examinees into different subsets, but the items remain the same across all subsets: each subset contains distinct individuals, yet they respond to the same items. In this case, our divide-and-conquer approach focuses primarily on the item parameters, discrimination and difficulty. The identifiability constraints in each subset are set to be consistent with those of the full dataset, so that the estimated item parameters from different subsets are placed on the same scale. A minimal partitioning sketch is given below.
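The following sketch performs the random row partition of Stage 1; the function name `partition_examinees` and the example sizes are illustrative assumptions.

```python
import numpy as np

def partition_examinees(y, K, seed=None):
    """Randomly split the n x J response matrix y into K disjoint
    subsets of examinees; every subset keeps all J items."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(y.shape[0])
    return [y[idx] for idx in np.array_split(perm, K)]

# Example: n = 10 examinees, J = 4 items, K = 3 subsets
y = np.random.binomial(1, 0.5, size=(10, 4))
subsets = partition_examinees(y, K=3, seed=0)
print([s.shape[0] for s in subsets])  # subset sizes s_k sum to n
```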

Stage 2: Parallel Sampling of Each Subset Using the Modified Likelihood Method

Denote the parameters of item $j$ as $\boldsymbol{\eta}_j=(a_j,b_j)$, let $p(\boldsymbol{\eta}_j)$ be the prior distribution of $\boldsymbol{\eta}_j$, and let $\ell_k(\boldsymbol{\eta}_j)$ be the likelihood of $\boldsymbol{\eta}_j$ on subset $k$. Consequently, the conditional posterior of $\boldsymbol{\eta}_j$ on subset $k$ is given by:

(6) $$\pi(\varvec{\eta}_{j}\,|\,{\varvec{y}}^{(k)})=\frac{\left\{\ell_{k}(\varvec{\eta}_{j})\right\}^{n/s_{k}}p(\varvec{\eta}_{j})}{\int\left\{\ell_{k}(\varvec{\eta}_{j})\right\}^{n/s_{k}}p(\varvec{\eta}_{j})\,d\varvec{\eta}_{j}},\quad k=1,\cdots,K,\ j=1,\cdots,J.$$

Here, the likelihood has been raised to the power of $n/s_k$. This adjustment, known as the stochastic approximation strategy, was proposed by Minsker et al. (2014, 2017) to ensure that the variance of the subset posterior roughly matches the variance of the full data posterior.
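As a small illustration of this adjustment, the sketch below evaluates the modified log-posterior of Eq. (6) for a single item under the 2PL logistic model implied by Eqs. (7)-(8); the argument names (`y_j`, `theta`, `log_prior`) are illustrative assumptions.

```python
# A minimal sketch of the tempered subset posterior in Eq. (6) for one item:
# the subset log-likelihood is multiplied by n/s_k before adding the log-prior.
import numpy as np

def modified_log_posterior(a_j, b_j, y_j, theta, n, s_k, log_prior):
    p = 1.0 / (1.0 + np.exp(-a_j * (theta - b_j)))   # 2PL success probability
    loglik = np.sum(y_j * np.log(p) + (1 - y_j) * np.log1p(-p))
    return (n / s_k) * loglik + log_prior(a_j, b_j)  # likelihood raised to n/s_k
```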

While the LS-WASP algorithm can be applied to any MCMC sampling technique, this paper chooses the Gibbs sampling algorithm based on the Pólya-Gamma distribution (PG-Gibbs) (Jiang and Templin, 2019; Balamuta and Culpepper, 2022; Jimenez et al., 2023) for subset posterior sampling. The reasons for this choice are as follows. First, the Metropolis-Hastings (M-H) algorithm, though a common choice for Markov chain generation, is potentially inefficient because the acceptance rate of each draw is typically only between 20% and 40%. Second, applying the traditional Gibbs sampling algorithm to logistic regression also presents considerable challenges: the logistic link function, which forms the basis of the model, leads to non-standard and highly intricate posterior distributions that often lack straightforward analytical forms. The PG-Gibbs sampler not only addresses this challenge by providing full conditional posterior distributions that allow comprehensive analytical manipulation, but also enjoys essential theoretical support such as ergodicity (Choi and Hobert, 2013; Polson et al., 2013). Finally, Balamuta and Culpepper (2022) showed that applying the Pólya-Gamma formulation to the logit link outperforms the traditional probit link in computational speed, and they highlighted its exceptional proficiency with large-scale test data, making it a compelling choice for research involving substantial datasets. These combined factors motivate our decision to use the PG-Gibbs algorithm.

Therefore, when the discrimination and difficulty parameters are independent, the modified posterior densities on subset $k$ ($k=1,\cdots,K$) under the PG-Gibbs algorithm for the discrimination parameter $a_j$ and the difficulty parameter $b_j$ are given by:

(7) $$f\big(a_{j}^{(k)}\,|\,\varvec{\theta}^{(k)},{\varvec{b}}^{(k)},\varvec{\omega}^{(k)},{\varvec{y}}^{(k)}\big)\propto f\big(a_{j}^{(k)}\big)\exp\left\{-\frac{n}{s_{k}}\cdot\frac{1}{2}\left[{\varvec{z}}_{a}^{(k)}-\big(\varvec{\theta}^{(k)}-{\textbf{1}}b_{j}^{(k)}\big)a_{j}^{(k)}\right]^{T}\varvec{\Omega}_{ab}^{(k)}\left[{\varvec{z}}_{a}^{(k)}-\big(\varvec{\theta}^{(k)}-{\textbf{1}}b_{j}^{(k)}\big)a_{j}^{(k)}\right]\right\},$$
(8) $$f\big(b_{j}^{(k)}\,|\,\varvec{\theta}^{(k)},{\varvec{a}}^{(k)},\varvec{\omega}^{(k)},{\varvec{y}}^{(k)}\big)\propto f\big(b_{j}^{(k)}\big)\exp\left\{-\frac{n}{s_{k}}\cdot\frac{1}{2}\left({\varvec{z}}_{b}^{(k)}+{\textbf{1}}a_{j}^{(k)}b_{j}^{(k)}\right)^{T}\varvec{\Omega}_{ab}^{(k)}\left({\varvec{z}}_{b}^{(k)}+{\textbf{1}}a_{j}^{(k)}b_{j}^{(k)}\right)\right\}.$$

The prior distributions of $\theta_i^{(k)}$, $a_j^{(k)}$, and $b_j^{(k)}$ are assumed to follow $N(\mu_{1}^{(k)},\sigma_{1}^{2(k)})$, $TN_{(0,+\infty)}(\mu_{2}^{(k)},\sigma_{2}^{2(k)})$, and $N(\mu_{3}^{(k)},\sigma_{3}^{2(k)})$, respectively. Subsequently, the basic steps for sampling a subset are as follows:

(1) Given $\varvec{\theta}^{(k)}$, ${\varvec{a}}^{(k)}$, and ${\varvec{b}}^{(k)}$, draw the augmented variable $\omega_{ij}$ from the $PG\big(1, a_j^{(k)}|\theta_i^{(k)}-b_j^{(k)}|\big)$ distribution;

(2) Given $\varvec{\omega}^{(k)}$, ${\varvec{a}}^{(k)}$, and ${\varvec{b}}^{(k)}$, draw $\theta_i^{(k)}$ from a normal distribution $N(m_{\theta_i}^{(k)}, V_{\theta_i}^{(k)})$, where $m_{\theta_i}^{(k)}=V_{\theta_i}^{(k)}\left(({\varvec{a}}^{(k)})^{T}\varvec{\Omega}_{\theta}{\varvec{z}}_{\theta}+\frac{\mu_{1}^{(k)}}{\sigma_{1}^{2(k)}}\right)$, $V_{\theta_i}^{(k)}=\left(({\varvec{a}}^{(k)})^{T}\varvec{\Omega}_{\theta}{\varvec{a}}^{(k)}+\frac{1}{\sigma_{1}^{2(k)}}\right)^{-1}$, ${\varvec{z}}_{\theta}=\left(\frac{a_{1}^{(k)}b_{1}^{(k)}\omega_{i1}+\kappa_{i1}}{\omega_{i1}},\cdots,\frac{a_{J}^{(k)}b_{J}^{(k)}\omega_{iJ}+\kappa_{iJ}}{\omega_{iJ}}\right)^{T}$, $\varvec{\Omega}_{\theta}=\text{diag}(\omega_{i1},\cdots,\omega_{iJ})$, and $\kappa_{ij}=y_{ij}^{(k)}-\frac{1}{2}$;

(3) Given $\varvec{\omega}^{(k)}$, $\varvec{\theta}^{(k)}$, and ${\varvec{b}}^{(k)}$, draw $a_j^{(k)}$ from a truncated normal distribution $TN_{(0,+\infty)}\big(m_{a_j}^{(k)}, V_{a_j}^{(k)}\big)$, where $m_{a_j}^{(k)}=V_{a_j}^{(k)}\left(\frac{n}{s_k}\cdot(\varvec{\theta}^{(k)}-{\textbf{1}}b_j^{(k)})^T\varvec{\Omega}_{ab}^{(k)}{\varvec{z}}_{a}^{(k)}+\frac{\mu_2^{(k)}}{\sigma_2^{2(k)}}\right)$, $V_{a_j}^{(k)}=\left(\frac{n}{s_k}\cdot(\varvec{\theta}^{(k)}-{\textbf{1}}b_j^{(k)})^T\varvec{\Omega}_{ab}^{(k)}(\varvec{\theta}^{(k)}-{\textbf{1}}b_j^{(k)})+\frac{1}{\sigma_2^{2(k)}}\right)^{-1}$, ${\varvec{z}}_{a}^{(k)}=\left(\frac{\kappa_{1j}}{\omega_{1j}},\cdots,\frac{\kappa_{s_kj}}{\omega_{s_kj}}\right)^{T}$, ${\textbf{1}}=(1,\cdots,1)_{s_k\times 1}^{T}$, and $\varvec{\Omega}_{ab}^{(k)}=\text{diag}(\omega_{1j},\cdots,\omega_{s_kj})$;

(4) Given $\varvec{\omega}^{(k)}$, $\varvec{\theta}^{(k)}$, and ${\varvec{a}}^{(k)}$, draw $b_j^{(k)}$ from a normal distribution $N\big(m_{b_j}^{(k)}, V_{b_j}^{(k)}\big)$, where $m_{b_j}^{(k)}=V_{b_j}^{(k)}\left(-\frac{n}{s_k}\cdot({\textbf{1}}a_j^{(k)})^T\varvec{\Omega}_{ab}^{(k)}{\varvec{z}}_{b}^{(k)}+\frac{\mu_3^{(k)}}{\sigma_3^{2(k)}}\right)$, $V_{b_j}^{(k)}=\left(\frac{n}{s_k}\cdot({\textbf{1}}a_j^{(k)})^T\varvec{\Omega}_{ab}^{(k)}{\textbf{1}}a_j^{(k)}+\frac{1}{\sigma_3^{2(k)}}\right)^{-1}$, ${\varvec{z}}_{b}^{(k)}=\left(\frac{\kappa_{1j}-a_j^{(k)}\theta_1^{(k)}\omega_{1j}}{\omega_{1j}},\cdots,\frac{\kappa_{s_kj}-a_j^{(k)}\theta_{s_k}^{(k)}\omega_{s_kj}}{\omega_{s_kj}}\right)^{T}$, and $\varvec{\Omega}_{ab}^{(k)}=\text{diag}(\omega_{1j},\cdots,\omega_{s_kj})$.

For more details of the PG-Gibbs algorithm, please see Jiang and Templin (2019). The online supplementary material S1 provides details of the PG-Gibbs algorithm for the full data.
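For concreteness, the following is a minimal, non-optimized sketch of one Gibbs sweep implementing steps (1)-(4) on a single subset. It is a sketch rather than the authors' implementation: it assumes a Pólya-Gamma sampler with the interface `random_polyagamma(z=..., random_state=...)` (as provided, e.g., by the open-source `polyagamma` package), and all hyperparameters and the tempering factor $n/s_k$ are passed in explicitly.

```python
# A minimal sketch of one PG-Gibbs sweep on subset k, assuming the 2PL
# logistic model and a PG(1, z) sampler `random_polyagamma` (assumed API).
import numpy as np
from scipy.stats import truncnorm
from polyagamma import random_polyagamma  # assumed Polya-Gamma sampler

def pg_gibbs_step(Y, theta, a, b, power, mu1, s1sq, mu2, s2sq, mu3, s3sq, rng):
    """One sweep of steps (1)-(4); Y is s_k x J binary, power = n / s_k."""
    s_k, J = Y.shape
    kappa = Y - 0.5                               # kappa_ij = y_ij - 1/2
    # (1) omega_ij ~ PG(1, a_j * |theta_i - b_j|)
    z = np.abs(theta[:, None] - b) * a            # s_k x J tilt parameters
    omega = random_polyagamma(z=z, random_state=rng)
    # (2) abilities theta_i | rest ~ N(m_theta, V_theta); no tempering here
    for i in range(s_k):
        w = omega[i]
        z_theta = (a * b * w + kappa[i]) / w
        V = 1.0 / (np.sum(a**2 * w) + 1.0 / s1sq)
        m = V * (np.sum(a * w * z_theta) + mu1 / s1sq)
        theta[i] = rng.normal(m, np.sqrt(V))
    # (3)-(4) item parameters, with the tempered (power = n/s_k) likelihood
    for j in range(J):
        w = omega[:, j]
        x = theta - b[j]                          # theta^(k) - 1 b_j
        z_a = kappa[:, j] / w
        V_a = 1.0 / (power * np.sum(w * x**2) + 1.0 / s2sq)
        m_a = V_a * (power * np.sum(w * x * z_a) + mu2 / s2sq)
        lo = (0.0 - m_a) / np.sqrt(V_a)           # truncate a_j to (0, +inf)
        a[j] = truncnorm.rvs(lo, np.inf, loc=m_a,
                             scale=np.sqrt(V_a), random_state=rng)
        z_b = (kappa[:, j] - a[j] * theta * w) / w
        V_b = 1.0 / (power * a[j]**2 * np.sum(w) + 1.0 / s3sq)
        m_b = V_b * (-power * a[j] * np.sum(w * z_b) + mu3 / s3sq)
        b[j] = rng.normal(m_b, np.sqrt(V_b))
    return theta, a, b
```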

Stage 3: Assembling and Integrating Sampled Parameters from Each Subset

Let the item parameters of the $j$th item be $\varvec{\eta}_j=(a_j,b_j)$. Assume that $\pi_{(1)},\cdots,\pi_{(K)}$ represent the $K$ subset posterior distributions of $\varvec{\eta}_j$. Let $\varvec{\eta}_j^{(k,1)},\cdots,\varvec{\eta}_j^{(k,M)}$ denote the samples from the $k$th subset posterior of $\varvec{\eta}_j^{(k)}$, where $M$ is the total number of post-burn-in iterations. In distributed Bayesian applications for IRT models, $\nu_k$ in Definition 1 is the $k$th subset posterior distribution $\pi_{(k)}$. Therefore, the Wasserstein barycenter of $\pi_{(1)},\cdots,\pi_{(K)}$ is $\widetilde{\pi}$, which is also known as the Wasserstein posterior (WASP). We use the WASP instead of the full data posterior for parameter inference.
Let $\varvec{\mu}_{\eta_j}^{(k)}$ and $\varvec{\Sigma}_{\eta_j}^{(k)}$ represent the mean vector and covariance matrix of the $k$th subset posterior, respectively. From Stage 2, we can obtain the Monte Carlo estimates of $\varvec{\mu}_{\eta_j}^{(k)}$ and $\varvec{\Sigma}_{\eta_j}^{(k)}$ as follows:

(9) $$\widehat{\varvec{\mu}}_{\eta_j}^{(k)}=\frac{1}{M}\sum_{m=1}^{M}\varvec{\eta}_{j}^{(k,m)},$$
(10) $$\widehat{\varvec{\Sigma}}_{\eta_j}^{(k)}=\frac{1}{M}\sum_{m=1}^{M}\left(\varvec{\eta}_{j}^{(k,m)}-\widehat{\varvec{\mu}}_{\eta_j}^{(k)}\right)\left(\varvec{\eta}_{j}^{(k,m)}-\widehat{\varvec{\mu}}_{\eta_j}^{(k)}\right)'.$$

Therefore, assuming $\pi_{(1)},\cdots,\pi_{(K)}$ belong to the same location-scatter family, the estimates of the mean vector $\widetilde{\varvec{\mu}}_{\eta_j}$ and covariance matrix $\widetilde{\varvec{\Sigma}}_{\eta_j}$ of the WASP are obtained from Theorem 1 as follows:

(11) $$\widehat{\widetilde{\varvec{\mu}}}_{\eta_j}=\sum_{k=1}^{K}w_k\widehat{\varvec{\mu}}_{\eta_j}^{(k)},$$
(12) $$\widehat{\widetilde{\varvec{\Sigma}}}_{\eta_j}=\widehat{\varvec{\Delta}}_{\infty},$$

where

(13) $$\widehat{\varvec{\Delta}}_{t+1}=\widehat{\varvec{\Delta}}_t^{-\frac{1}{2}}\left\{\sum_{k=1}^{K}w_k\big(\widehat{\varvec{\Delta}}_t\widehat{\varvec{\Sigma}}_{\eta_j}^{(k)}\big)^{\frac{1}{2}}\right\}\left\{\sum_{k=1}^{K}w_k\big(\widehat{\varvec{\Delta}}_t\widehat{\varvec{\Sigma}}_{\eta_j}^{(k)}\big)^{\frac{1}{2}}\right\}'\widehat{\varvec{\Delta}}_t^{-\frac{1}{2}},\quad t=0,1,2,\ldots,$$

and we set $\widehat{\varvec{\Delta}}_0={\varvec{I}}$, where ${\varvec{I}}$ represents the identity matrix. Note that Eq. (13) is the numerically stabilized version proposed by Srivastava and Xu (2021) to solve the rank deficiency problem of $\widehat{\varvec{\Delta}}_t$ in Eq. (4). Thus, we use the fixed-point iterations in Eq. (13) instead of Eq. (4) for computing the covariance matrix $\widehat{\widetilde{\varvec{\Sigma}}}_{\eta_j}$, with the convergence criterion $|\text{tr}(\widehat{\varvec{\Delta}}_{t+1}-\widehat{\varvec{\Delta}}_{t})|<10^{-6}$. In this study, given the similarities in the sizes of the $K$ subsets $s_1,\cdots,s_K$, we set the weights of $\pi_{(k)}$ to $w_k=\frac{1}{K}$. However, when the sizes $s_1,\cdots,s_K$ differ substantially, it would be more appropriate to set the weights as $w_k=s_k/(s_1+\cdots+s_K)$.
This approach assigns greater importance to subsets that contain more samples, thereby reflecting their greater contribution to the overall dataset. Furthermore, adjusting the weights in this manner ensures a more accurate representation of the data and can improve the robustness of the subsequent analysis.
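A minimal sketch of the fixed-point iteration in Eq. (13) follows, assuming the subset covariance estimates $\widehat{\varvec{\Sigma}}_{\eta_j}^{(1)},\cdots,\widehat{\varvec{\Sigma}}_{\eta_j}^{(K)}$ are available as numeric matrices; matrix square roots are taken with `scipy.linalg.sqrtm`, and the argument names are illustrative.

```python
# A minimal sketch of Eq. (13): fixed-point iteration for the Wasserstein
# barycenter covariance, started at Delta_0 = I, with the stopping rule
# |tr(Delta_{t+1} - Delta_t)| < 1e-6 described in the text.
import numpy as np
from scipy.linalg import sqrtm

def barycenter_cov(Sigmas, w, tol=1e-6, max_iter=100):
    Delta = np.eye(Sigmas[0].shape[0])                    # Delta_0 = I
    for _ in range(max_iter):
        S = sum(wk * sqrtm(Delta @ Sk) for wk, Sk in zip(w, Sigmas))
        D_inv_half = np.linalg.inv(sqrtm(Delta))          # Delta_t^{-1/2}
        Delta_new = np.real(D_inv_half @ S @ S.T @ D_inv_half)
        if abs(np.trace(Delta_new - Delta)) < tol:
            return Delta_new
        Delta = Delta_new
    return Delta
```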

According to the definition of the location-scatter family, we can obtain the WASP draws through the following steps for $k=1,\cdots,K$, $m=1,\cdots,M$, $j=1,\cdots,J$:

1. Centralize and standardize the $k$th subset posterior draws for item $j$ to obtain $\widehat{{\varvec{u}}}_j^{(k,m)}=(\widehat{\varvec{\Sigma}}_{\eta_j}^{(k)})^{-\frac{1}{2}}(\varvec{\eta}_{j}^{(k,m)}-\widehat{\varvec{\mu}}_{\eta_j}^{(k)})$. Consequently, $\widehat{{\varvec{u}}}_j^{(k,m)}$ has a mean vector of zero and an identity covariance matrix.

2. Recentralize and restandardize $\widehat{{\varvec{u}}}_j^{(k,m)}$ to derive the WASP draws as $\widehat{\widetilde{\varvec{\eta}}}_{j}^{(k,m)}=\widehat{\widetilde{\varvec{\mu}}}_{\eta_j}+(\widehat{\widetilde{\varvec{\Sigma}}}_{\eta_j})^{\frac{1}{2}}\widehat{{\varvec{u}}}_j^{(k,m)}$.
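These two steps amount to a linear transformation of each subset draw, as in the following sketch; the stacked draw matrix `draws` (M rows, one per post-burn-in iteration) and the estimate arguments are illustrative assumptions.

```python
# A minimal sketch of steps 1-2 above for one item and one subset, assuming
# `draws` is M x d, with subset estimates (mu_k, Sigma_k) from Eqs. (9)-(10)
# and WASP estimates (mu_w, Sigma_w) from Eqs. (11)-(12).
import numpy as np
from scipy.linalg import sqrtm

def wasp_draws(draws, mu_k, Sigma_k, mu_w, Sigma_w):
    u = (draws - mu_k) @ np.real(np.linalg.inv(sqrtm(Sigma_k)))  # standardize
    return mu_w + u @ np.real(sqrtm(Sigma_w))                    # restandardize
```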

In this study, we assume that the prior distributions of the discrimination and difficulty parameters are independent. Assuming independence simplifies model estimation, making the estimation and inference of parameters more direct and computationally manageable; this is particularly important for large parameter spaces or data volumes, where it ensures the feasibility and efficiency of model estimation. The independence assumption is also supported by the literature (e.g., Wang et al., 2013, 2018). Furthermore, in practical applications there is often a lack of prior knowledge about the relationship between discrimination and difficulty parameters, with no theoretical or empirical basis supporting a substantial correlation between them (van der Linden, 2007; see Table 1 in that paper), making independent priors a prudent choice. Therefore, in this case, we can compute the mean and variance of each parameter separately. Take the discrimination parameter $a_j$ as an example. The Monte Carlo estimates of $\mu_{a_j}^{(k)}$ and $\sigma_{a_j}^{(k)}$, and the estimates of the mean and variance of the WASP, are shown below:

(14) $$\widehat{\mu}_{a_{j}}^{(k)}=\frac{1}{M}\sum_{m=1}^{M}a_{j}^{(k,m)},\quad \widehat{\sigma}_{a_j}^{(k)}=\frac{1}{M}\sum_{m=1}^{M}\big(a_{j}^{(k,m)}-\widehat{\mu}_{a_{j}}^{(k)}\big)^2.$$
(15) $$\widehat{\widetilde{\mu}}_{a_{j}}=\frac{1}{K}\sum_{k=1}^{K}\widehat{\mu}_{a_{j}}^{(k)},\quad \widehat{\widetilde{\sigma}}_{a_j}=\widehat{\Delta}_{\infty}=\left(\frac{1}{K}\sum_{k=1}^{K}\big(\widehat{\sigma}_{a_j}^{(k)}\big)^{\frac{1}{2}}\right)^2.$$

Note that since the variance $\widehat{\widetilde{\sigma}}_{a_j}$ is unidimensional (as is the case for the difficulty parameter), the sequence $\widehat{\Delta}_t$ actually reaches the fixed point at $t=1$, which simplifies the estimation process and improves computational efficiency. The process of deriving WASP draws for the difficulty parameter ${\varvec{b}}$ follows the same approach as for the discrimination parameter ${\varvec{a}}$. The details of Stage 3 of the LS-WASP algorithm for independent item parameters are given in Algorithm 1.
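In this unidimensional case, Eqs. (14)-(15) reduce to a few lines of code; the sketch below assumes the post-burn-in subset draws of $a_j$ are stacked in a $K\times M$ array.

```python
# A minimal sketch of Eqs. (14)-(15) for a scalar parameter a_j, assuming
# `draws` is a K x M array of post-burn-in draws (one row per subset).
import numpy as np

def wasp_scalar(draws):
    mu_k = draws.mean(axis=1)       # Eq. (14): subset posterior means
    sd_k = draws.std(axis=1)        # square roots of subset variances (1/M form)
    mu_w = mu_k.mean()              # Eq. (15): WASP mean (equal weights 1/K)
    var_w = sd_k.mean() ** 2        # Eq. (15): WASP variance = (mean of SDs)^2
    return mu_w, var_w
```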

Remark 1

Using the WASP draws $\widehat{\widetilde{a}}_{j}^{(k,m)}$ of parameter $a_j$, we can derive the WASP estimator of $a_j$ as follows:

(16) $$\widehat{\widetilde{a}}_{j}=\frac{1}{KM}\sum_{k=1}^{K}\sum_{m=1}^{M}\widehat{\widetilde{a}}_{j}^{(k,m)}=\frac{1}{KM}\sum_{k=1}^{K}\sum_{m=1}^{M}\big(\widehat{\widetilde{\mu}}_{a_{j}}+(\widehat{\widetilde{\sigma}}_{a_j})^{\frac{1}{2}}\widehat{u}_{j}^{(k,m)}\big)=\widehat{\widetilde{\mu}}_{a_{j}}+(\widehat{\widetilde{\sigma}}_{a_j})^{\frac{1}{2}}\cdot\frac{1}{KM}\sum_{k=1}^{K}\sum_{m=1}^{M}(\widehat{\sigma}_{a_j}^{(k)})^{-\frac{1}{2}}(a_{j}^{(k,m)}-\widehat{\mu}_{a_{j}}^{(k)})=\widehat{\widetilde{\mu}}_{a_{j}}.$$

According to Eq. (16), the mean of the WASP is essentially the average of the posterior means across all subsets. Our algorithm also offers theoretical guarantees (see Sect. 3 for details). In fact, after partitioning the data, we can only make posterior inferences on the parameters within each subset, rather than on the full dataset's posterior distribution. However, the proposed LS-WASP algorithm enables us to combine the posterior distributions from all subsets into a new distribution, the WASP distribution. The WASP effectively approximates the full data posterior distribution, making it a suitable substitute for the full data posterior in diverse analyses and posterior evaluations.

Remark 2

Note that our algorithm exhibits high computational efficiency. It adopts a "divide-and-conquer" strategy to split large datasets into smaller, more manageable subsets, thereby effectively reducing the computational and storage burden. Specifically, when the dataset is divided into $K$ subsets, the effective sample size for each MCMC run becomes $\frac{1}{K}$ of the total sample size; thus, the computation time for each subset is approximately $\frac{1}{K}$ of the full-data runtime. Leveraging parallel processing, together with the simple and efficient merging operation in the third stage of the algorithm, the overall runtime is essentially the computation time of a single subset, which is significantly less than that of a traditional MCMC method processing the entire dataset at once. Intuitively, the runtime of our algorithm is a fraction of the full-data MCMC runtime: the more subsets the data are divided into, the shorter the runtime. This point is validated in the subsequent simulation studies.
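As an illustration of this parallel structure, a sketch of the Stage 2 driver might look as follows; `sample_subset` is a hypothetical module-level function wrapping the PG-Gibbs sampler for one subset (it is not defined in the original text), and each subset is processed in its own worker process.

```python
# A minimal sketch of running the K subset samplers in parallel, assuming a
# module-level function sample_subset(Y_k, power) that returns the subset's
# post-burn-in draws; ProcessPoolExecutor runs the subsets concurrently.
from concurrent.futures import ProcessPoolExecutor

def parallel_stage2(subsets, n):
    powers = [n / len(Y_k) for Y_k in subsets]   # tempering factors n / s_k
    with ProcessPoolExecutor() as pool:
        return list(pool.map(sample_subset, subsets, powers))
```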

Algorithm 1. Details of Stage 3 of the LS-WASP algorithm.

3. Theoretical Properties

Bayesian inference using the LS-WASP algorithm involves two types of error. The first is the statistical error, which originates from using the WASP approximation in place of the full data posterior; this deviation from the full data posterior introduces discrepancies. The second is the Monte Carlo error, which arises from using Monte Carlo estimates: the inherent randomness of Monte Carlo simulation introduces variation, contributing to the overall error of the LS-WASP algorithm.

In this section, we discuss the theoretical properties of the LS-WASP algorithm. First, we present five essential assumptions forming the foundation of our theoretical framework; each assumption is outlined and justified for its significance to the overall methodology and its role in quantifying the errors mentioned above. Second, we further explore the two sources of error, gaining insight into their origins and their impact on the LS-WASP algorithm's results. This analysis also opens up opportunities for potential improvements that reduce these errors and enhance the algorithm's precision and robustness.

3.1. Assumptions

For notational simplicity in the following text, we define $\varvec{\eta}$ as either the $\varvec{a}$ parameter or the $\varvec{b}$ parameter, even though $\varvec{\eta}$ can also represent the full set of item parameters, i.e., $\varvec{\eta}=\{\varvec{a},\varvec{b}\}$. Below are the key assumptions for establishing the theoretical foundation of the LS-WASP algorithm.

Assumption 1

(Independent and identically distributed data): We assume that the observations ${\varvec{y}}_{1},\cdots,{\varvec{y}}_{n}$ are independent and identically distributed (i.i.d.), where ${\varvec{y}}_{i}=(y_{i1},\cdots,y_{iJ})^T$ $(i=1,\cdots,n)$.

Assumption 2

(Location-scatter family): There exist probability distributions $G$ specifying location-scatter families ${\mathcal{F}}(G)$. Both the full data posterior distribution $\pi$ of $\varvec{\eta}$ and the subset posterior distributions $\pi_{(k)}$ $(k=1,\cdots,K)$ of $\varvec{\eta}$ belong to ${\mathcal{F}}(G)$ with $P_{\varvec{\eta}^*}^n$-probability 1, where $P_{\varvec{\eta}^*}^n$ is the joint probability distribution of the full data, $\varvec{\eta}^*\in\Theta$ represents the true value of the parameters $\varvec{\eta}$, and $\Theta\subset{\mathbb{R}}^J$ is the parameter space.

Assumption 3

(Regularity conditions for Laplace approximation): We denote an open ball of radius $\delta$ centered at $\varvec{\eta}$ as $B_\delta(\varvec{\eta})$. Suppose $h_n(\varvec{\eta})=-\frac{1}{n}\sum_{i=1}^{n}\log f({\varvec{y}}_i|\varvec{\eta})$ is a six times continuously differentiable real function on $\Theta$ (Kass et al., 1990). Let $\widehat{\varvec{\eta}}_n$ be the maximum likelihood estimate (MLE) of $\varvec{\eta}$, and $D^2h_n(\varvec{\eta})$ be the Hessian matrix of $h_n(\varvec{\eta})$ at $\varvec{\eta}$. There exist positive numbers $\epsilon$, $N$, and $\xi$ and an integer $n_0$ such that for all $n\ge n_0$:

(a) For every $\varvec{\eta}\in B_{\epsilon}(\widehat{\varvec{\eta}}_n)$ and all $1\le j_1,\cdots,j_d\le J$ with $1\le d\le 6$, the absolute value of the $d$th partial derivative of $h_n$ satisfies $|\partial_{j_1,\cdots,j_d}h_n(\varvec{\eta})|<N$;

  (b) The determinant of $D^2h_n(\widehat{\boldsymbol{\eta}}_n)$ is greater than $\xi$, that is, $|D^2h_n(\widehat{\boldsymbol{\eta}}_n)|>\xi$;

  (c) For every $\delta$ satisfying $0<\delta<\epsilon$, $B_{\delta}(\widehat{\boldsymbol{\eta}}_n)\subseteq\Theta$ and

    $$\limsup_{n\rightarrow\infty}\ \sup_{\boldsymbol{\eta}\in\Theta - B_\delta(\widehat{\boldsymbol{\eta}}_n)}\left\{h_n(\widehat{\boldsymbol{\eta}}_n)-h_n(\boldsymbol{\eta})\right\}<0.$$

Assumption 4

(Disjoint subsets of equal size): The subsets are disjoint, and the number of subsets $K$ and the subset sample size $s$ satisfy the conditions $K=o(n^{1/2})$ and $Ks=n$, where $s=s_1=\cdots=s_K$.
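For intuition, with $n=250{,}000$ respondents and $K=10$ subsets, each subset holds $s=25{,}000$ respondents and $K=10\ll n^{1/2}=500$, comfortably within Assumption 4. The following minimal Python sketch (our own illustration with hypothetical names, not the authors' implementation) shows one way to form such disjoint, equal-sized subsets of respondents:

```python
import numpy as np

def partition_rows(n, K, seed=0):
    """Randomly split n respondents into K disjoint subsets of size s = n // K.

    A finite-sample proxy for Assumption 4: K divides n (Ks = n) and K <= sqrt(n).
    Returns a list of K index arrays, one per subset.
    """
    assert n % K == 0 and K <= int(n ** 0.5)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)          # shuffle respondents before splitting
    return np.array_split(perm, K)     # K disjoint blocks of s indices each

subsets = partition_rows(n=250_000, K=10)   # each subset indexes s = 25,000 respondents
```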

Assumption 5

(Number of iterations): Let $M$ be the number of iterations. $M$ satisfies the conditions $n=o(\sqrt{M})$, $\widehat{\boldsymbol{\mu}}_{(k)}-\boldsymbol{\mu}_{(k)}=O_p(M^{-1/2})$, and $\widehat{\boldsymbol{\Sigma}}_{(k)}-\boldsymbol{\Sigma}_{(k)}=O_p(M^{-1/2})$ in $P_k$-probability $(k=1,\cdots,K)$, where $\widehat{\boldsymbol{\mu}}_{(k)}=(\widehat{\mu}_1^{(k)},\cdots,\widehat{\mu}_J^{(k)})^T$ with $\widehat{\mu}_j^{(k)}=\frac{1}{M}\sum_{m=1}^M\eta_{j}^{(k,m)}$ representing the sample mean of the sequence of $\boldsymbol{\eta}_{(k)}$ draws, and $\widehat{\boldsymbol{\Sigma}}_{(k)}=\text{diag}(\widehat{\sigma}_1^{(k)},\cdots,\widehat{\sigma}_J^{(k)})$ with $\widehat{\sigma}_j^{(k)}=\frac{1}{M}\sum_{m=1}^M(\eta_{j}^{(k,m)}-\widehat{\mu}_j^{(k)})^2$ denoting the sample variance of the same sequence. $P_k$ is the probability measure of the posterior draws $\{\boldsymbol{\eta}_{(k)}^{(m)},m=1,\cdots,M\}$ on subset $k$.

Assumptions 1–4 are standard in the divide-and-conquer literature on Bayesian inference, which typically treats the exponential family (Xue and Liang, 2019; Shyamalkumar and Srivastava, 2022). In other words, if $P_{\eta^*}$ belongs to the exponential family, Assumptions 1–4 hold. Assumption 2 can be used to derive the mean and covariance matrix of the WASP posterior of the item parameters. In fact, in IRT models, given the data $\boldsymbol{y}$, the full data and subset posterior distributions of the item parameters do not belong to the same location-scatter family. However, in many divide-and-conquer studies it is common to approximate the subset posterior distributions by members of a common location-scatter family (Srivastava and Xu, 2021), making Assumption 2 reasonable. By combining the LS-WASP algorithm with subset parameter sampling, we can obtain a true WASP approximation under Assumption 2, using the location-scatter-family-based subset posterior distributions. Moreover, when the sample size of each subset is large, the Bernstein–von Mises theorem holds, so substituting a true WASP approximation for the full data posterior distribution is justifiable. Assumption 3 is a standard requirement for the Laplace-method posterior expansion used to quantify the approximation (Kass et al., 1990). Although Assumption 3 is commonly invoked for various statistical models (e.g., Xue and Liang, 2019; Shyamalkumar and Srivastava, 2022), our study focuses exclusively on IRT models. Therefore, we introduce a proposition verifying that Assumption 3(a) and (b) hold for the 2PL and M2PL models. This proposition requires that the discrimination parameters $\boldsymbol{a}$, the difficulty parameters $\boldsymbol{b}$, and the ability parameters $\boldsymbol{\theta}$ are all bounded; the proposition and its detailed proof are provided in S5.1 of the online supplement.
Assumption 3(c) is a reasonable condition ensuring the uniqueness of the MLE $\widehat{\boldsymbol{\eta}}_n$ and has been studied in the IRT literature (e.g., San Martín, 2016). Assumption 4 requires a uniform subset sample size to simplify the analysis, although the LS-WASP algorithm can still be applied when subset sizes vary. In this study, to simplify the calculations, we assume $K=o(n^{1/2})$, following Xue and Liang (2019). Assumption 5 is valid when the subset sampling scheme exhibits geometric ergodicity. With geometric ergodicity, the Markov chain underlying each subset mixes quickly, reducing autocorrelation and, consequently, the Monte Carlo error. This in turn allows an accurate approximation of the expectation, contributing to the overall effectiveness and precision of the LS-WASP algorithm. In this study, we adopt the PG-Gibbs algorithm as the subset sampling scheme; Choi and Hobert (2013) have demonstrated that the Pólya-Gamma data augmentation strategy exhibits uniform ergodicity.
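To make the subset sampling scheme concrete, the sketch below illustrates the Pólya-Gamma augmented Gibbs update of the ability parameters in a logistic 2PL model, assuming the slope–intercept form $\psi_{ij}=a_j\theta_i-b_j$; it relies on the third-party `polyagamma` package for the PG draws and is a simplified illustration of one Gibbs step, not the authors' full PG-Gibbs implementation.

```python
import numpy as np
from polyagamma import random_polyagamma  # third-party PG sampler (assumed available)

def update_theta(y, a, b, theta, rng):
    """One Polya-Gamma Gibbs update of abilities in a logistic 2PL model.

    y: (n, J) binary responses; a, b: (J,) item parameters;
    theta: (n,) current abilities, with prior N(0, 1).
    """
    psi = np.outer(theta, a) - b              # current linear predictor, (n, J)
    omega = random_polyagamma(z=psi)          # omega_ij ~ PG(1, psi_ij), elementwise
    kappa = y - 0.5                           # working response of the PG scheme
    prec = 1.0 + omega @ a**2                 # posterior precision per examinee
    mean = ((kappa + omega * b) @ a) / prec   # posterior mean per examinee
    return mean + rng.standard_normal(len(theta)) / np.sqrt(prec)
```

The conditional Gaussian follows because, given $\omega_{ij}$, each likelihood contribution is proportional to $\exp(\kappa_{ij}\psi_{ij}-\omega_{ij}\psi_{ij}^2/2)$, which is quadratic in $\theta_i$.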

3.2. Statistical Error

Note that the LS-WASP algorithm introduces a source of error for posterior inference on the parameter $\boldsymbol{\eta}$: it uses $\widetilde{\pi}$, rather than the true posterior $\pi$, to infer $\boldsymbol{\eta}$. We call this the statistical error because it involves no Monte Carlo approximation. A second source of error, the Monte Carlo error, is described in detail in subsection 3.3.

To quantify the statistical error precisely, we need the following corollary, which provides the mathematical framework for its computation. The magnitude and impact of the statistical error may vary with the complexity of the model, the nature of the data, and the specifics of the algorithmic implementation; proper management of this error is therefore crucial for the performance and reliability of the LS-WASP algorithm.

Corollary 1

Consider two probability measures $A$ and $B$ in $\mathcal{P}_2(\mathbb{R}^p)$. Let $\boldsymbol{m}_A$ and $\boldsymbol{m}_B$ represent the means of $A$ and $B$, and let $\boldsymbol{\Sigma}_A$ and $\boldsymbol{\Sigma}_B$ denote the covariance matrices of $A$ and $B$, respectively. Under the assumption that $\boldsymbol{\Sigma}_A$ is nonsingular, the following inequality holds

(17) $$W_2^2(A,B)\ge \|\boldsymbol{m}_A-\boldsymbol{m}_B\|_2^2+\mathrm{tr}\big(\boldsymbol{\Sigma}_A+\boldsymbol{\Sigma}_B-2(\boldsymbol{\Sigma}_A^{1/2}\boldsymbol{\Sigma}_B\boldsymbol{\Sigma}_A^{1/2})^{1/2}\big),$$

where $\|\cdot\|_2$ denotes the Euclidean distance and $\mathrm{tr}(\cdot)$ represents the trace of a matrix. Equality holds when the map $T(x)=(\boldsymbol{m}_B-\boldsymbol{m}_A)+\boldsymbol{D}x$ transports $A$ to $B$, where $\boldsymbol{D}=\boldsymbol{\Sigma}_A^{-1/2}(\boldsymbol{\Sigma}_A^{1/2}\boldsymbol{\Sigma}_B\boldsymbol{\Sigma}_A^{1/2})^{1/2}\boldsymbol{\Sigma}_A^{-1/2}$. Here, $\boldsymbol{D}$ is a positive semi-definite matrix.

For the detailed proof of this corollary, please refer to the proof of Theorem 2.3 in Álvarez-Esteban et al. (2016). Next, we introduce the notational conventions used to state the theoretical results. We denote the full data posterior, the $k$th subset posterior, and the WASP approximation of the parameter of interest $\boldsymbol{\eta}$ by $\pi$, $\pi_{(k)}$, and $\widetilde{\pi}$, for $k=1,\cdots,K$, respectively.
Let $\boldsymbol{\mu}$, $\boldsymbol{\mu}_{(k)}$, $\widetilde{\boldsymbol{\mu}}$ and $\boldsymbol{\Sigma}$, $\boldsymbol{\Sigma}_{(k)}$, $\widetilde{\boldsymbol{\Sigma}}$ represent the means and covariance matrices of $\pi$, $\pi_{(k)}$, and $\widetilde{\pi}$, respectively. We therefore define the statistical error as $W_2^2(\pi,\widetilde{\pi})$. Under Corollary 1, we have

(18) $$W_2^2(\pi,\widetilde{\pi})=\|\boldsymbol{\mu}-\widetilde{\boldsymbol{\mu}}\|_2^2+\mathrm{tr}\big(\boldsymbol{\Sigma}+\widetilde{\boldsymbol{\Sigma}}-2(\widetilde{\boldsymbol{\Sigma}}^{1/2}\boldsymbol{\Sigma}\widetilde{\boldsymbol{\Sigma}}^{1/2})^{1/2}\big).$$
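The closed form in Eq. (18) is straightforward to evaluate numerically; the following sketch (assuming NumPy/SciPy, with a function name of our own choosing) computes $W_2^2$ between two location-scatter distributions from their means and covariance matrices.

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_squared(m_a, S_a, m_b, S_b):
    """Squared 2-Wasserstein distance between location-scatter laws with
    means m_a, m_b and covariance matrices S_a, S_b, per Eq. (18)."""
    root_a = sqrtm(S_a)                               # S_a^{1/2}
    cross = sqrtm(root_a @ S_b @ root_a)              # (S_a^{1/2} S_b S_a^{1/2})^{1/2}
    mean_part = np.sum((np.asarray(m_a) - m_b) ** 2)
    cov_part = np.trace(S_a + S_b - 2.0 * np.real(cross))  # discard tiny imaginary parts
    return mean_part + cov_part
```

As a check, `w2_squared(np.zeros(2), np.eye(2), np.ones(2), 4 * np.eye(2))` returns 4: per coordinate, $(0-1)^2+(1-2)^2=2$, summed over two coordinates.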

Subsequently, we establish asymptotic statistical guarantees for the LS-WASP algorithm, provided that Assumptions 1–4 are satisfied, as detailed below.

Theorem 2

If Assumptions 1–4 hold, then as $n,s\rightarrow\infty$,

(19) $$W_2^2(\pi,\widetilde{\pi})=O_p(s^{-2})+O_p(n^{-3/2}),$$

where $\pi$ denotes the full data posterior, $\widetilde{\pi}$ denotes the WASP approximate posterior, and $n$ and $s$ represent the sample sizes of the full data and each subset, respectively.

The detailed proof of Theorem 2 is provided in online supplement S5. According to Theorem 2, as the sample size $n$ of the full data increases, the sample size $s$ of each subset also increases for a fixed number of partitions, driving the statistical error toward zero. Therefore, to minimize the statistical error, it is essential to maximize the sample size whenever possible. When the sample size of the full data is limited, one can effectively reduce the statistical error by controlling the sample size of each subset through appropriate partitioning.

3.3. Monte Carlo Error

The Monte Carlo error arises from approximating the Wasserstein barycenter of the subset posterior distributions, $\widetilde{\pi}$, by an empirical measure $\widehat{\widetilde{\pi}}$. This empirical measure, while a necessary approximation for computational tractability, is not the true Wasserstein barycenter of the subset posterior distributions; the distance between these two measures is therefore the 'Monte Carlo error'. However, both the true Wasserstein barycenter and its empirical approximation are random measures, which complicates the analysis of this type of error. To overcome this, we consider coupled versions of these measures, which are generated from the same underlying randomness and are therefore dependent. The Wasserstein distance between these coupled measures provides a controlled setting in which to investigate the magnitude of the Monte Carlo error. Next, we describe this coupling procedure in detail and outline the key steps in establishing the asymptotic order of the Monte Carlo error.

Let $M$ denote the number of iterations after burn-in. We assume $\boldsymbol{\eta}$, $\boldsymbol{\eta}_{(k)}$, and $\widetilde{\boldsymbol{\eta}}$ follow the distributions $\pi$, $\pi_{(k)}$, and $\widetilde{\pi}$, respectively. Provided that Assumption 2 is satisfied, the $m$th iterative draw of the parameter $\boldsymbol{\eta}_{(k)}$ in subset $k$ can be written as $\boldsymbol{\eta}_{(k)}^{(m)}=\boldsymbol{\mu}_{(k)}+\boldsymbol{\Sigma}_{(k)}^{1/2}\boldsymbol{\xi}_{(k)}^{(m)}$ for $m=1,\cdots,M$, as stated in Definition 2. Under this definition, the $\boldsymbol{\xi}_{(k)}^{(m)}$ $(k=1,\cdots,K,\ m=1,\cdots,M)$ are $KM$ independent draws from a distribution $G$ with zero mean and identity covariance matrix.
Moreover, Definition 2 allows us to derive the posterior draws of the WASP approximate posterior $\widetilde{\pi}$ as follows:

(20) $$\widetilde{\boldsymbol{\eta}}=\widetilde{\boldsymbol{\mu}}+\widetilde{\boldsymbol{\Sigma}}^{1/2}\boldsymbol{\xi}_{(k)}^{(m)}=\widetilde{\boldsymbol{\mu}}+\widetilde{\boldsymbol{\Sigma}}^{1/2}\boldsymbol{\Sigma}_{(k)}^{-1/2}\big(\boldsymbol{\eta}_{(k)}^{(m)}-\boldsymbol{\mu}_{(k)}\big),\quad k=1,\cdots,K,\ m=1,\cdots,M,$$

where $\widetilde{\boldsymbol{\mu}}=\frac{1}{K}\sum_{k=1}^K\boldsymbol{\mu}_{(k)}$ and $\widetilde{\boldsymbol{\Sigma}}=\boldsymbol{\Delta}_{\infty}$. As $\widetilde{\boldsymbol{\mu}}$, $\boldsymbol{\mu}_{(k)}$, $\widetilde{\boldsymbol{\Sigma}}$, and $\boldsymbol{\Sigma}_{(k)}$ are unknown, they are replaced by their Monte Carlo estimates $\widehat{\widetilde{\boldsymbol{\mu}}}$, $\widehat{\boldsymbol{\mu}}_{(k)}$, $\widehat{\widetilde{\boldsymbol{\Sigma}}}$, and $\widehat{\boldsymbol{\Sigma}}_{(k)}$ in the LS-WASP algorithm, yielding $\widehat{\widetilde{\boldsymbol{\eta}}}$, the Monte Carlo estimate of $\widetilde{\boldsymbol{\eta}}$. Let $\widehat{\widetilde{\pi}}$ represent the empirical measure of the WASP approximate posterior obtained from Monte Carlo estimation in the LS-WASP algorithm. Consequently, $\widehat{\widetilde{\boldsymbol{\eta}}}$ follows the distribution $\widehat{\widetilde{\pi}}$, and the empirical measure $\widehat{\widetilde{\pi}}$ is comprised of the observations

(21) $$\widehat{\widetilde{\boldsymbol{\eta}}}=\widehat{\widetilde{\boldsymbol{\mu}}}+\widehat{\widetilde{\boldsymbol{\Sigma}}}^{1/2}\widehat{\boldsymbol{\Sigma}}_{(k)}^{-1/2}\big(\boldsymbol{\mu}_{(k)}-\widehat{\boldsymbol{\mu}}_{(k)}\big)+\widehat{\widetilde{\boldsymbol{\Sigma}}}^{1/2}\widehat{\boldsymbol{\Sigma}}_{(k)}^{-1/2}\boldsymbol{\Sigma}_{(k)}^{1/2}\boldsymbol{\xi}_{(k)}^{(m)},\quad k=1,\cdots,K,\ m=1,\cdots,M,$$

where $\widehat{\widetilde{\boldsymbol{\mu}}}=\frac{1}{K}\sum_{k=1}^K\widehat{\boldsymbol{\mu}}_{(k)}$ and $\widehat{\widetilde{\boldsymbol{\Sigma}}}=\widehat{\boldsymbol{\Delta}}_{\infty}$. In light of Corollary 1, we characterize the Monte Carlo error as $W_2(\widetilde{\pi},\widehat{\widetilde{\pi}})$.
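In practice, Eq. (21) reduces to recentering and rescaling each stored subset draw. The sketch below is our own illustration, assuming the diagonal subset covariances of Assumption 5, under which the barycenter covariance $\widehat{\boldsymbol{\Delta}}_{\infty}$ has the closed form "square of the averaged coordinatewise standard deviations".

```python
import numpy as np

def lswasp_combine(draws):
    """Combine subset posterior draws into WASP draws via Eq. (21).

    draws: (K, M, p) array of M post-burn-in draws of the p-dimensional
    parameter from each of the K subset posteriors. Diagonal subset
    covariances are assumed (Assumption 5).
    """
    K, M, p = draws.shape
    mu_k = draws.mean(axis=1)            # subset posterior means, (K, p)
    sd_k = draws.std(axis=1)             # subset posterior sds (1/M normalization)
    mu_bar = mu_k.mean(axis=0)           # barycenter mean, (p,)
    sd_bar = sd_k.mean(axis=0)           # barycenter sd under the diagonal case
    # Recenter each subset draw at the barycenter mean and rescale coordinatewise.
    wasp = mu_bar + sd_bar * (draws - mu_k[:, None, :]) / sd_k[:, None, :]
    return wasp.reshape(K * M, p)        # KM draws from the approximate WASP posterior
```

Consequently, the following theorem is presented.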

Theorem 3

Under Assumptions 1–5, as $n,M\rightarrow\infty$,

(22) $$W_2^2(\widetilde{\pi},\widehat{\widetilde{\pi}})=O_p(M^{-1})+o_p(n^{-1}),$$

where $n$ represents the sample size of the full data and $M$ denotes the number of iterations after burn-in.

The detailed proof of Theorem 3 is provided in online supplement S5.

4. Simulation Studies

Two simulation studies were conducted to assess the effectiveness and feasibility of the proposed LS-WASP algorithm. Simulation studies 1 and 2 applied the LS-WASP algorithm to the 2PL model and the M2PL model, respectively. The sample sizes, test lengths, and numbers of subsets under different partitions were varied to evaluate the robustness of the LS-WASP algorithm across diverse scenarios. The Markov chain length was set to 10,000 iterations, with the first 5,000 discarded as burn-in. The bias and RMSE of the item and ability parameters were computed to assess the accuracy of the parameter estimates:

(23) $$\text{Bias}(\boldsymbol{\eta})=\frac{1}{R}\sum_{r=1}^R(\hat{\boldsymbol{\eta}}_r-\boldsymbol{\eta}^*),\qquad \text{RMSE}(\boldsymbol{\eta})=\sqrt{\frac{1}{R}\sum_{r=1}^R(\hat{\boldsymbol{\eta}}_r-\boldsymbol{\eta}^*)^2},$$

where $R$ represents the number of replications, $\hat{\boldsymbol{\eta}}_r$ denotes the parameter estimate from the $r$th replication, and $\boldsymbol{\eta}^*$ indicates the true value of the parameter. Each simulation condition was replicated $R=25$ times. Additionally, we recorded the running time of each replication to assess the computational efficiency of the LS-WASP algorithm; the maximum running time across replications was reported as the final running time. The computations were executed on an AMD EPYC 7542 32-Core Processor (2.90 GHz) with 1.5 TB RAM running Windows Server 2019 Standard. Note that the parallel sampling of subsets is conducted on different cores of the computer: after partitioning the full data into $K$ subsets, $K$ cores are used for parallel sampling, with one core per subset.
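For concreteness, Eq. (23) amounts to the following computation over the $R$ replications (a minimal sketch with hypothetical array names):

```python
import numpy as np

def bias_rmse(estimates, truth):
    """estimates: (R, p) parameter estimates over R replications;
    truth: (p,) true values. Implements Eq. (23) coordinatewise."""
    err = estimates - truth
    return err.mean(axis=0), np.sqrt((err ** 2).mean(axis=0))
```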

4.1. Study 1: Performance of LS-WASP Algorithm for the 2PL Model

4.1.1. Design

The purpose of study 1 is to examine the performance of the LS-WASP algorithm for the 2PL model. Three factors were manipulated to vary the simulation conditions: (1) the number of examinees, $n$ = 10,000, 20,000, 50,000; (2) the number of items, $J$ = 20, 40; (3) the number of subsets, $K$ = 1, 2, 5, 10, 20, i.e., different numbers of partitions of the full data. Crossing the levels of these three factors produced 30 simulation conditions, each replicated 25 times.

Table 1 Bias and RMSE of parameter estimates and running time across different conditions in simulation study 1.

Bias and RMSE denote the average bias and RMSE of the parameter estimates; $\boldsymbol{a}$ represents all discrimination parameters, $\boldsymbol{b}$ represents all difficulty parameters, and $\boldsymbol{\theta}$ denotes all ability parameters.

Note that $K=1$ corresponds to the scenario in which no partitioning is performed and the full data set is analyzed directly. In this case, we employed the PG-Gibbs sampler on the full data set, rather than the LS-WASP algorithm, which requires data partitioning.

4.1.2. Data Generation

The response data for the 2PL model were generated from Eq. (1). The ability parameters were generated from a normal distribution $N(0,1)$. The discrimination and difficulty parameters were sampled from a lognormal distribution $\log N(0.3,0.2)$ and a normal distribution $N(0,1)$, respectively. This data generation scheme is consistent with that of Jiang and Templin (2019). To ensure model identification, the priors for $\theta_i$, $a_j$, and $b_j$ were set to a normal distribution $N(0,1)$, a truncated normal distribution $TN_{(0,+\infty)}(0,10)$, and a normal distribution $N(0,10)$, respectively. Our selection of these distributions and priors is based on their theoretical properties and widespread use in the existing literature; however, the LS-WASP algorithm can accommodate other distributions and priors as well.
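A minimal sketch of this generating scheme, assuming the logistic 2PL form $P(y_{ij}=1)=\text{logit}^{-1}(a_j\theta_i-b_j)$ consistent with the PG-Gibbs sampler, and interpreting the second lognormal argument as the standard deviation of $\log a_j$:

```python
import numpy as np

def generate_2pl(n, J, seed=0):
    """Simulate 2PL responses: theta ~ N(0,1), a ~ logN(0.3, 0.2), b ~ N(0,1)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 1.0, size=n)
    a = rng.lognormal(mean=0.3, sigma=0.2, size=J)
    b = rng.normal(0.0, 1.0, size=J)
    prob = 1.0 / (1.0 + np.exp(-(np.outer(theta, a) - b)))  # (n, J) probabilities
    return rng.binomial(1, prob), theta, a, b               # binary response matrix
```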

Figure 1 Running times under different subset conditions in simulation study 1. Note that ‘Group’ indicates the number of subsets.

4.1.3. Results

Table 1 shows the bias and RMSE of the parameter estimates and the running time across conditions. We observe a slight increase in the bias and RMSE of the ability parameters $\boldsymbol{\theta}$ and difficulty parameters $\boldsymbol{b}$ as the number of subsets increases. However, this increase is minimal, indicating that the number of subsets has a negligible effect on $\boldsymbol{\theta}$ and $\boldsymbol{b}$. Furthermore, Figures S-1 and S-2 in online supplement S2 display the average bias and RMSE of the discrimination parameters $\boldsymbol{a}$ across different numbers of subsets $K$. From Table 1 and Figures S-1 and S-2, both the average bias and RMSE of $\boldsymbol{a}$ increase with $K$, but this increasing trend slows as the sample size $n$ grows. The reason is that a larger $K$ yields a smaller subset sample size, producing larger bias and RMSE; as the full data sample size increases, the subset sample size for a given $K$ also increases, slowing the growth of the estimation error. Additionally, the overall bias and RMSE decrease as the sample size increases, indicating that larger samples yield more accurate parameter estimates.
It is noteworthy that when $K=20$, the bias of $\boldsymbol{a}$ is twice that at $K=10$. Nevertheless, even with the small subset sample sizes at $K=20$, the RMSE results remain satisfactory. Note that the optimal subset sample size depends on the complexity of the model: a more complex model generally requires a larger sample size per subset to ensure adequate accuracy of parameter estimation. We therefore suggest that the sample size within each subset at least meet the minimum requirement for accurate estimation of the model at hand. For instance, accurate estimation of a 2PL model requires at least 500 individuals (König et al., 2020), a 3PL model needs at least 1,000 (de la Torre and Hong, 2010; De Ayala, 2013), and a 4PL model necessitates at least 4,000 individuals (Cuhadar, 2022). This strategy maintains the accuracy of the estimates within each subset, thereby ensuring the overall accuracy of the parameter estimation.

Figure 2 Bias and RMSE of each item parameter estimate across various sample sizes with $J=20$ in simulation study 1. Note that 'Group' indicates the number of subsets.

In addition, the running times in Table 1 show that the running time on the full data is considerably larger than that of the LS-WASP algorithm on subsets. For example, the running time of the LS-WASP algorithm is roughly halved when the full data are partitioned into two subsets, reduced to about a tenth with ten subsets, and so forth. To analyze the running time in detail, Fig. 1 presents a graphical comparison across conditions. We observe that the running time increases with more examinees and items, while decreasing with a larger number of subsets. When the full data are partitioned into $K=10$ or $K=20$ subsets, the running time remains below 0.5 h for every condition. Thus, our algorithm holds a distinct advantage in computational speed.

To examine these differences more closely, we compared the bias and RMSE of each item. Figure 2 shows the results for $J=20$; see online supplement S2 for the results for $J=40$. Clearly, as the number of examinees increases, the bias and RMSE lines of the item parameters converge and overlap more. In essence, as the number of examinees grows, the LS-WASP algorithm's item parameter estimates align more closely with those from the full data, diminishing partitioning-related discrepancies. This is because a larger sample size in each subset, at a constant number of partitions, reduces the effect of partitioning. Moreover, overall estimation accuracy improves with more examinees, highlighting our algorithm's advantages as the number of examinees increases.

When small samples are split into numerous subsets, the LS-WASP algorithm may slightly compromise the accuracy of the discrimination parameters. However, it still produces accurate difficulty parameter estimates and offers a clear running time advantage. For larger samples, regardless of the number of subsets, our method ensures accurate parameter estimation and maintains fast computation.

4.2. Study 2: Performance of LS-WASP Algorithm for the M2PL Model

4.2.1. Design

The aim of study 2 is to investigate the performance of the LS-WASP algorithm for the M2PL model. A sample of 10,000 examinees was considered, consistent with the sample size in empirical example 2 for the multidimensional case. Three factors were manipulated to vary the simulation conditions: (1) the number of items, J = 20, 40; (2) the number of subsets, K = 1, 2, 5, 10 (we excluded K = 20 because the M2PL model requires a minimum of 1,000 examinees per subset to ensure precise parameter estimation when n = 10,000); (3) the number of dimensions of the latent traits, Q = 2, 3. In total, 16 simulation conditions were examined, and each condition was replicated 25 times.

4.2.2. Data Generation

We generated item response data for the M2PL model from Eq. (2). The ability parameters θ_i for each examinee i were sampled from a multivariate normal distribution N(0, I_Q), where 0 is a Q-dimensional vector of zeros and I_Q is the Q × Q identity matrix. The discrimination parameters a_jq (for q = 1, ..., Q) and the difficulty parameter b_j for each item j were generated from log N(0.3, 0.2) and N(0, 1), respectively. The model identification conditions for the M2PL model follow the constraints given in subsection 1.2 (Béguin & Glas, 2001). The prior settings for the M2PL model are the same as those for the 2PL model; that is, the priors for θ_i, a_jq, and b_j are N(0, I_Q), TN_(0,+∞)(0, 10), and N(0, 10), respectively.
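For concreteness, the following Python sketch mirrors this data-generating process. It assumes the common compensatory form of Eq. (2), logit P(y_ij = 1) = a_j'θ_i − b_j, and treats the 0.2 in log N(0.3, 0.2) as the variance of the underlying normal; both are assumptions made only for illustration.

```python
import numpy as np

def simulate_m2pl(n: int, J: int, Q: int, seed: int = 1):
    """Simulate M2PL responses; assumes P(y_ij = 1) = sigmoid(a_j' theta_i - b_j)."""
    rng = np.random.default_rng(seed)
    theta = rng.multivariate_normal(np.zeros(Q), np.eye(Q), size=n)  # N(0, I_Q)
    # log N(0.3, 0.2), treating 0.2 as the variance of the underlying normal;
    # use sigma=0.2 instead if it denotes the standard deviation.
    a = rng.lognormal(mean=0.3, sigma=np.sqrt(0.2), size=(J, Q))
    b = rng.normal(0.0, 1.0, size=J)                                 # N(0, 1)
    logits = theta @ a.T - b                                         # n x J
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))
    return y, theta, a, b

# One replication of the Q = 2, J = 20 condition in study 2.
y, theta, a, b = simulate_m2pl(n=10_000, J=20, Q=2)
```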

Table 2 Bias and RMSE of parameter estimates and running time across different conditions in simulation study 2.

Bias and RMSE denote the average bias and RMSE for the parameters; a represents all discrimination parameters across the Q dimensions, b represents all difficulty parameters, and θ denotes all ability parameters across the Q dimensions.

4.2.3. Results

Table 2 presents the bias and RMSE of the parameter estimates and the running time across different conditions for the M2PL model. The trend of the parameter estimates in the M2PL model mirrors that of the 2PL model; that is, as the number of subsets increases, the bias and RMSE of each parameter estimate grow slightly, but the increase remains minimal. Furthermore, the bias and RMSE of each parameter expand as the number of latent trait dimensions increases. However, the difference in results between the LS-WASP algorithm and the full data set is not significant, indicating its applicability to the M2PL model. Figure 3 depicts the running time under different subsets for the M2PL model. With more subsets, the LS-WASP algorithm speeds up significantly; computation time also increases slightly as the latent trait dimensions grow. Figure 4 shows the bias and RMSE of each item parameter estimate for J = 20; please see online supplement S2 for the results for J = 40. As the number of subsets and latent trait dimensions increase, there is a noticeable increase in bias and RMSE for each item. The bias and RMSE of the discrimination parameters a stay mostly within 0.1, given that the true values of a generally lie between 1 and 2.5. For the difficulty parameters b, whose true values are primarily around 0 with a few exceeding 1, the bias and RMSE are kept within 0.08. Therefore, the LS-WASP algorithm ensures accurate parameter estimation and notably improves computational efficiency for the M2PL model.

Figure 3 Running times under different subset conditions in simulation study 2. Note that ‘Group’ indicates the number of subsets.

Figure 4 Bias and RMSE of each item parameter estimate across various latent trait dimensions with J = 20 in simulation study 2. Note that 'Group' indicates the number of subsets.

5. Empirical Examples

Two examples from the PISA computer-based mathematics data were analyzed using our proposed LS-WASP algorithm. These examples showcase the application of the LS-WASP algorithm to the 2PL model and the M2PL model, respectively. Due to the page limit, please see online supplement S4 for details of the empirical example for the M2PL model.

5.1. Data Description

The first data set is from the PISA 2018 computer-based cognitive mathematics test. We selected 9 scored items: CM447Q01S, CM273Q01S, CM408Q01S, CM420Q01S, CM446Q01S, CM559Q01S, CM828Q03S, CM464Q01S, and CM800Q01S. After excluding students with missing responses, the remaining sample size was 80,352. A further 20,412 students with Not Reached (original code 6), Not Applicable (original code 7), Invalid (original code 8), or No Response (original code 9) were removed. Thus, the final sample consisted of 59,940 students.

5.2. Purposes and Designs

We calculated the standard deviations (SD) and 95% highest posterior density (HPD) intervals for each item parameter estimate. Our proposed LS-WASP algorithm primarily addresses the computational challenges in parameter estimation arising from the large sample sizes in large-scale testing. The algorithm proceeds in three stages. At stage 1, the full dataset is partitioned; that is, the "persons" are grouped into subsets. At stage 2, the same set of items is estimated in parallel across the data subsets. At stage 3, the item parameters obtained from stage 2 are combined to yield the final item parameter estimates. Since the LS-WASP algorithm is grounded in the full Bayesian framework, the ability parameters are sampled within the PG-Gibbs algorithm of stage 2, so we also present the results of the ability parameter estimation. Running times are reported to illustrate the time efficiency of the LS-WASP algorithm.
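A minimal sketch of this three-stage workflow is shown below, with the subset sampler and the combination step left as placeholders for the Pólya-Gamma Gibbs sampler and the Wasserstein-barycenter combination described in the paper; the function names are ours, not the paper's code.

```python
import numpy as np
from multiprocessing import Pool

def ls_wasp(y: np.ndarray, K: int, sampler, combiner, seed: int = 1):
    """Three-stage LS-WASP pipeline (sketch).

    sampler(subset)  -> array of posterior draws of the item parameters
    combiner(draws)  -> combined (approximate WASP) draws
    Both callables must be top-level functions so that Pool can pickle them.
    """
    rng = np.random.default_rng(seed)
    # Stage 1: randomly partition the persons (rows) into K subsets.
    perm = rng.permutation(y.shape[0])
    subsets = np.array_split(y[perm], K)
    # Stage 2: run one MCMC chain per subset, in parallel.
    with Pool(processes=K) as pool:
        subset_draws = pool.map(sampler, subsets)
    # Stage 3: combine the subset posteriors into the approximate WASP.
    return combiner(subset_draws)
```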

With the sample size exceeding 50,000, we divided the data into K = 2, 5, 10, 20 subsets to align with the simulation studies. We fit the 2PL model to the data, consistent with the PISA operational analysis plan. We set the number of MCMC iterations to 10,000, with the initial 5,000 iterations as burn-in. Unlike simulation studies, real data do not have predefined "true" parameter values. To verify the effectiveness of the LS-WASP algorithm, we computed parameter estimates from the full data using the Pólya-Gamma Gibbs sampler for comparison.
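For readers unfamiliar with the Pólya-Gamma scheme, a sketch of one item-parameter update under the 2PL model follows. It implements the generic logistic data augmentation of Polson et al. (2013) rather than the paper's exact implementation, relies on the third-party pypolyagamma package (whose pgdrawv interface is assumed here), and replaces the truncated-normal prior on a_j with an untruncated normal for simplicity.

```python
import numpy as np
from pypolyagamma import PyPolyaGamma  # assumed third-party PG(b, c) sampler

def update_item_params(theta, y_j, a_j, b_j, pg, rng, prior_var=10.0):
    """One Pólya-Gamma Gibbs update of (a_j, b_j) for item j under the 2PL,
    assuming logit P(y_ij = 1) = a_j * theta_i - b_j. For simplicity, this
    sketch uses an untruncated N(0, prior_var) prior on both parameters."""
    n = theta.size
    eta = a_j * theta - b_j
    omega = np.empty(n)
    pg.pgdrawv(np.ones(n), eta, omega)          # omega_ij ~ PG(1, eta_ij)
    X = np.column_stack([theta, -np.ones(n)])   # eta = X @ (a_j, b_j)
    kappa = y_j - 0.5
    # Conjugate Gaussian full conditional for (a_j, b_j) given omega, theta.
    prec = X.T @ (omega[:, None] * X) + np.eye(2) / prior_var
    cov = np.linalg.inv(prec)
    mean = cov @ (X.T @ kappa)
    return rng.multivariate_normal(mean, cov)   # new draw of (a_j, b_j)
```

In stage 2 of LS-WASP, an update of this form would be cycled over items (and an analogous update over persons) within each subset's chain.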

5.3. Results

Table 3 shows the expected a posteriori (EAP) estimates and SD values of the item parameters under different numbers of subsets. As the number of subsets increases, the sample size within each subset decreases, potentially reducing the accuracy of parameter inference; conversely, fewer subsets yield larger subset sample sizes and, theoretically, more accurate parameter estimates. From Table 3, with a maximum of 20 subsets, the EAP of a increases by about 0.01 compared with the full data. Similarly, for parameter b, the EAP differences primarily stay within 0.0007, with only a few exceeding 0.001. Therefore, for these two parameters, the estimates remain consistent between the full data and the condition with 20 subsets.

Table 3 EAPs and SDs of the item parameters for the PISA 2018 mathematics cognitive test.

PARM represents the parameter, EAP denotes the expected a posteriori estimate, and SD is the standard deviation. K = 1 indicates the full data set, i.e., no data partitioning.

Figure 5 shows the differences in the SDs of the item parameter estimates between the different subset conditions and the full data in empirical example 1. Clearly, as the number of subsets increases, the differences in the item parameter SDs grow. However, the largest difference in the SD of parameter b is around 0.025, and the SD differences for parameter a are even smaller, staying within 0.004. This indicates that the LS-WASP algorithm can accurately estimate the item parameters. In addition, the differences in the EAP estimates of the ability parameters between the different subset conditions and the full data are presented in online supplement S3. The results show that our algorithm's estimates of the ability parameters remain closely aligned with those from the full data, without any significant deviation.

Figure 5 Differences in SD values of item parameter estimates between different subsets and full data in empirical example 1.

Figure 6 shows the 95% HPD intervals (HPDIs) of the nine item parameters under different numbers of subsets. While the HPDI ranges of parameters a and b appear to widen with more subsets, the extent of this increase remains minimal. The running time for the full data is approximately 1.3 h. When the full data are partitioned into two subsets, the time required is reduced by approximately half; when partitioned into 20 subsets, the LS-WASP algorithm takes approximately three minutes. Therefore, the LS-WASP algorithm substantially enhances computational speed for big data while accurately estimating the parameters of the 2PL model.
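The EAP, SD, and 95% HPD summaries reported in this section can be computed directly from the post-burn-in draws; a small illustrative helper (the names are ours, not from the paper's code) is sketched below. The HPD interval is found as the narrowest window containing 95% of the sorted draws.

```python
import numpy as np

def eap_sd_hpd(draws: np.ndarray, prob: float = 0.95):
    """EAP, posterior SD, and the narrowest 95% HPD interval from MCMC draws."""
    draws = np.sort(np.asarray(draws))
    n = draws.size
    window = int(np.ceil(prob * n))
    # Width of every interval [draws[i], draws[i + window - 1]].
    widths = draws[window - 1:] - draws[: n - window + 1]
    lo = int(np.argmin(widths))
    return draws.mean(), draws.std(ddof=1), (draws[lo], draws[lo + window - 1])

# Example with synthetic draws standing in for one item's post-burn-in samples.
rng = np.random.default_rng(0)
eap, sd, hpd = eap_sd_hpd(rng.normal(1.2, 0.05, size=5000))
```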

6. Discussion

For large-scale educational assessment data, current MCMC methods are significantly time-consuming when estimating IRT models. To address this issue, we propose a divide-and-conquer algorithm named LS-WASP for distributed Bayesian inference in IRT models. This algorithm partitions the data into several subsets and then conducts parallel sampling of the parameters within these subsets. Under the assumption of a location-scatter family, we propose an approximate Wasserstein posterior as a substitute for sampling the parameters from the full-data posterior.

Figure 6 95% HPDIs of the nine item parameters in empirical example 1.

Simulation results and real data analyses validate the effectiveness of the LS-WASP algorithm in estimating IRT models. First, the LS-WASP algorithm exhibits a significant advantage in computational time; specifically, its computational time is roughly inversely proportional to the number of subsets K. Second, the LS-WASP algorithm accurately estimates IRT model parameters. Simulation studies and real data analyses show that when sample sizes are large, regardless of the number of subsets, the item parameter estimates from the LS-WASP algorithm closely align with those derived from the full data. However, when the sample size is small and the number of subsets is large, our method exhibits a slight difference in the estimated discrimination parameters relative to the full-data results, although the estimation of the difficulty parameters remains precise. The advantages of our proposed method become more prominent with larger sample sizes; for smaller sample sizes, we suggest selecting an appropriate number of subsets to obtain precise parameter estimates. The optimal subset sample size depends on the model's complexity: as complexity increases, so does the sample size needed per subset. Therefore, each subset's sample size should meet the minimum requirement for precise estimation, ensuring accuracy both within subsets and across the entire dataset. Furthermore, although our method partitions the data by persons and focuses primarily on the item parameters, the simulation results show that it can still estimate the ability parameters accurately.

The LS-WASP algorithm is not only conceptually simple and easy to implement but also facilitates fast and accurate parameter estimation of IRT models with theoretical guarantees. We combine this algorithm with subsampling of the parameter draws, and then, based on the assumption that the posterior distributions of the subsets belong to the same location-scatter family, an approximation of the true WASP can be derived. Together with other regularity assumptions, we derived asymptotic Monte Carlo and statistical guarantees as the theoretical foundation of this algorithm.
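As a concrete special case, when the subset posteriors are approximated by Gaussians, the barycenter location is the average of the subset means, and the scatter matrix solves the fixed-point equation of Álvarez-Esteban et al. (2016), S = S^{-1/2}((1/K) Σ_k (S^{1/2} Σ_k S^{1/2})^{1/2})^2 S^{-1/2}. The sketch below implements that iteration; it is an illustrative Gaussian instance of the location-scatter setting, not a transcription of the paper's Algorithm 1.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_wasserstein_barycenter(means, covs, iters: int = 50):
    """2-Wasserstein barycenter of K Gaussians via the fixed-point iteration
    of Álvarez-Esteban et al. (2016). means: list of d-vectors; covs: list
    of d x d covariance (scatter) matrices from the subset posteriors."""
    bar_mean = np.mean(means, axis=0)   # barycenter location: average of means
    S = np.mean(covs, axis=0)           # initial scatter guess
    for _ in range(iters):
        S_half = np.real(sqrtm(S))
        S_half_inv = np.linalg.inv(S_half)
        M = np.mean([np.real(sqrtm(S_half @ C @ S_half)) for C in covs], axis=0)
        S = S_half_inv @ M @ M @ S_half_inv   # S <- S^{-1/2} M^2 S^{-1/2}
    return bar_mean, S
```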

In addition, we would like to reiterate the purpose of this study. As is well known, the EM algorithm has been the de facto method for operationally processing large educational datasets. However, it is crucial to acknowledge the fundamental differences between the two approaches: EM, being an optimization algorithm, requires relatively few iterations for convergence, whereas Bayesian estimation, being a sampling-based approach, requires many more iterations for parameter convergence. This fundamental difference suggests that comparing them directly may not be entirely fair. Therefore, our aim is not to replace the EM algorithm but to offer an alternative that leverages the strengths of Bayesian inference in standard IRT settings and provides a strong theoretical and practical framework for managing complex datasets. Our research explores the potential and applicability of the Bayesian divide-and-conquer method in scenarios where the Bayesian paradigm offers distinct advantages, particularly in handling the complexities inherent in large-scale educational data.

Despite its advantages, the LS-WASP algorithm has several limitations. First, the algorithm requires a large sample size and relies on an appropriate choice of the number of subsets. The larger the sample size, the better the performance of the algorithm; when the sample size is small, choosing a large number of subsets may deteriorate the estimation performance, making the selection of an appropriate number of subsets crucial. It is recommended that each subset's sample size meet the minimum requirement for precise parameter estimation. Second, the sampler used in this paper is the Pólya-Gamma Gibbs algorithm, but the LS-WASP algorithm is applicable to any MCMC sampler, such as the M-H algorithm or the slice algorithm (Neal, 2003; Lu et al., 2018); additional sampling methods combined with our proposed LS-WASP algorithm can be explored in the future. Third, our results can be generalized to cases with unequal subset sample sizes, but a common subset sample size is assumed here to simplify the analysis. Fourth, the complexity of the LS-WASP algorithm may increase with more complex IRT models; further investigation of our proposed algorithm in such settings remains for future work. Finally, the accuracy of parameter estimation using the divide-and-conquer approach can also be influenced by external factors, such as unbalanced subset response data arising from the data partitioning or improper handling of missing data. Inappropriate partitioning can easily produce highly unbalanced response data within subsets, for instance, when the data allocated to a subset consist entirely of 0s or 1s. In such cases, some subsets may not contain enough responses from the minority groups, potentially leading to inaccurate parameter estimates. Typically, we need to be cautious with data partitioning to ensure that each subset contains enough minority-class samples, and after partitioning, we should check each subset as thoroughly as possible to avoid unbalanced splits (see the sketch after this paragraph). Additionally, handling missing data improperly can lead to biased estimates; imputing missing responses without proper consideration could deviate from our research goals and distort the results (e.g., Robitzsch & Rupp, 2009; Pohl et al., 2014; Sportisse et al., 2020; Du et al., 2022), especially when missing responses are incorrectly treated as complete data. Therefore, in this study, we strictly managed missing data and relied solely on complete cases, which we regard as pivotal for the reliability and validity of the study outcomes. While a "missing data" approach may be beneficial in certain contexts, given its potential problems and complexity, we recommend further exploration and evaluation of the feasibility and effectiveness of distributed Bayesian estimation under missingness in future research.
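As a simple safeguard against the unbalanced splits described above, a check along the following lines (a hypothetical helper, not part of LS-WASP itself) can be run before launching the subset samplers; if it fails, the data are repartitioned.

```python
import numpy as np

def check_partition_balance(subsets, min_count: int = 10) -> bool:
    """Return True if, in every subset, every item has at least `min_count`
    responses in each category (0 and 1); otherwise the partition should be
    redrawn before running the subset samplers."""
    for k, y in enumerate(subsets):
        ones = y.sum(axis=0)          # per-item count of 1s
        zeros = y.shape[0] - ones     # per-item count of 0s
        if (ones < min_count).any() or (zeros < min_count).any():
            print(f"Subset {k}: an item has < {min_count} responses in a category.")
            return False
    return True
```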

Acknowledgements

This research was supported by the general projects of National Social Science Fund of China on Statistics (Grant No. 23BTJ067).

Declarations

Conflict of interest

Each author signed a form regarding conflicts of interest. No author reported any financial or other conflict of interest in relation to the work described, and the authors declared no potential conflicts of interest with respect to the research, authorship, or publication of this article.

Ethical approval

The authors affirm having followed professional ethical guidelines in preparing this work. These guidelines include obtaining informed consent from human participants, maintaining ethical treatment and respect for the rights of human or animal participants, and ensuring the privacy of participants and their data, such as ensuring that individual participants cannot be identified in reported results or from publicly available original or archival data.

Data availability

The real data set was derived from the following resources available in the public domain: http://www.oecd.org/pisa.

Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11336-024-09978-1.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

References

Ackerman, T. (1996). Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement, 20(4), 311–329.
Agueh, M., & Carlier, G. (2011). Barycenters in the Wasserstein space. SIAM Journal on Mathematical Analysis, 43(2), 904–924.
Alquier, P., Friel, N., Everitt, R., & Boland, A. (2016). Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels. Statistics and Computing, 26(1–2), 29–47.
Álvarez-Esteban, P. C., Del Barrio, E., Cuesta-Albertos, J. A., & Matrán, C. (2016). A fixed-point approach to barycenters in Wasserstein space. Journal of Mathematical Analysis and Applications, 441(2), 744–762.
Baker, F. B., & Kim, S. H. (2004). Item response theory: Parameter estimation techniques. New York: Dekker.
Balamuta, J. J., & Culpepper, S. A. (2022). Exploratory restricted latent class models with monotonicity requirements under Pólya-Gamma data augmentation. Psychometrika, 87(3), 903–945.
Béguin, A. A., & Glas, C. A. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66, 541–561.
Birnbaum, A. (1957). Efficient design and use of tests of a mental ability for various decision-making problems. Series Report No. 58-16. Randolph Air Force Base: USAF School of Aviation Medicine.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.
Choi, H. M., & Hobert, J. P. (2013). The Pólya-Gamma Gibbs sampler for Bayesian logistic regression is uniformly ergodic. Electronic Journal of Statistics, 7, 2054–2064.
Cuhadar, I. (2022). Sample size requirements for parameter recovery in the 4-parameter logistic model. Measurement: Interdisciplinary Research and Perspectives, 20(2), 57–72.
Culpepper, S. A. (2016). Revisiting the 4-parameter item response model: Bayesian estimation and application. Psychometrika, 81(4), 1142–1163.
De Ayala, R. J. (2013). The theory and practice of item response theory. New York: Guilford Publications.
de la Torre, J., & Hong, Y. (2010). Parameter estimation with small sample size: A higher-order IRT model approach. Applied Psychological Measurement, 34, 267–285.
Du, H., Enders, C., Keller, B. T., Bradbury, T. N., & Karney, B. R. (2022). A Bayesian latent variable selection model for nonignorable missingness. Multivariate Behavioral Research, 57(2–3), 478–512.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
Fox, J. P. (2010). Bayesian item response modeling: Theory and applications. New York: Springer.
Giordano, R., Broderick, T., & Jordan, M. I. (2018). Covariances, robustness and variational Bayes. Journal of Machine Learning Research, 19(51), 1–49.
Hartshorne, J. K., Tenenbaum, J. B., & Pinker, S. (2018). A critical period for second language acquisition: Evidence from 2/3 million English speakers. Cognition, 177, 263–277.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97–109.
Hoffman, M. D., Blei, D. M., Wang, C., & Paisley, J. (2013). Stochastic variational inference. Journal of Machine Learning Research, 14(5), 1303–1347.
Jiang, Z., & Templin, J. (2019). Gibbs samplers for logistic item response models via the Pólya-Gamma distribution: A computationally efficient data-augmentation strategy. Psychometrika, 84(2), 358–374.
Jimenez, A., Balamuta, J. J., & Culpepper, S. A. (2023). A sequential exploratory diagnostic model using a Pólya-Gamma data augmentation strategy. British Journal of Mathematical and Statistical Psychology, 76(3), 513–538.
Kass, R. E., Tierney, L., & Kadane, J. B. (1990). The validity of posterior expansions based on Laplace's method. Bayesian and Likelihood Methods in Statistics and Econometrics, 7, 473–487.
König, C., Spoden, C., & Frey, A. (2020). An optimized Bayesian hierarchical two-parameter logistic model for small-sample item calibration. Applied Psychological Measurement, 44(4), 311–326.
Korattikara, A., Chen, Y., & Welling, M. (2014). Austerity in MCMC land: Cutting the Metropolis-Hastings budget. In International Conference on Machine Learning (pp. 181–189).
Lee, C. Y. Y., & Wand, M. P. (2016). Streamlined mean field variational Bayes for longitudinal and multilevel data analysis. Biometrical Journal, 58(4), 868–895.
Li, C., Srivastava, S., & Dunson, D. B. (2017). Simple, scalable and accurate posterior interval estimation. Biometrika, 104(3), 665–680.
Lu, J., Zhang, J. W., & Tao, J. (2018). Slice-Gibbs sampling algorithm for estimating the parameters of a multilevel item response model. Journal of Mathematical Psychology, 82, 12–25.
Martin, M. O., & Kelly, D. L. (1996). Third international mathematics and science study technical report, volume 1: Design and development. Chestnut Hill: Boston College.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6), 1087–1092.
Minsker, S., Srivastava, S., Lin, L., & Dunson, D. B. (2017). Robust and scalable Bayes via a median of subset posterior measures. The Journal of Machine Learning Research, 18(1), 4488–4527.
Minsker, S., Srivastava, S., Lin, L., & Dunson, D. (2014). Scalable and robust Bayesian inference via the median posterior. In International Conference on Machine Learning (pp. 1656–1664).
Neal, R. (2003). Slice sampling. The Annals of Statistics, 31(3), 705–767.
Neiswanger, W., Wang, C., & Xing, E. (2014). Asymptotically exact, embarrassingly parallel MCMC. In Proceedings of the 30th International Conference on Uncertainty in Artificial Intelligence (pp. 623–632).
OECD (2021). PISA 2018 technical report. Paris: OECD Publishing.
Pohl, S., Gräfe, L., & Rose, N. (2014). Dealing with omitted and not-reached items in competence tests: Evaluating approaches accounting for missing responses in item response theory models. Educational and Psychological Measurement, 74(3), 423–452.
Polson, N. G., Scott, J. G., & Windle, J. (2013). Bayesian inference for logistic models using Pólya-Gamma latent variables. Journal of the American Statistical Association, 108(504), 1339–1349.
Quiroz, M., Kohn, R., Villani, M., & Tran, M. N. (2019). Speeding up MCMC by efficient data subsampling. Journal of the American Statistical Association, 114(526), 831–843.
Reckase, M. D. (1972). Development and application of a multivariate logistic latent trait model. Doctoral dissertation, Syracuse University.
Reckase, M. D. (2009). Multidimensional item response theory. New York, NY: Springer.
Robitzsch, A., & Rupp, A. A. (2009). Impact of missing data on the detection of differential item functioning: The case of Mantel-Haenszel and logistic regression analysis. Educational and Psychological Measurement, 69(1), 18–34.
Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 71(2), 319–392.
San Martín, E. (2016). Identification of item response theory models. Handbook of Item Response Theory, 2, 127–150.
Schilling, S., & Bock, R. D. (2005). High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika, 70, 533–555.
Scott, S. L., Blocker, A. W., Bonassi, F. V., Chipman, H. A., George, E. I., & McCulloch, R. E. (2016). Bayes and big data: The consensus Monte Carlo algorithm. International Journal of Management Science and Engineering Management, 11(2), 78–88.
Shyamalkumar, N. D., & Srivastava, S. (2022). An algorithm for distributed Bayesian inference. Stat, 11(1).
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. Boca Raton: CRC Press.
Sportisse, A., Boyer, C., & Josse, J. (2020). Imputation and low-rank estimation with missing not at random data. Statistics and Computing, 30(6), 1629–1643.
Srivastava, S., Cevher, V., Dinh, Q., & Dunson, D. (2015). WASP: Scalable Bayes via barycenters of subset posteriors. In Artificial Intelligence and Statistics (pp. 912–920).
Srivastava, S., Li, C., & Dunson, D. B. (2018). Scalable Bayes via barycenter in Wasserstein space. The Journal of Machine Learning Research, 19(1), 312–346.
Srivastava, S., & Xu, Y. (2021). Distributed Bayesian inference in linear mixed-effects models. Journal of Computational and Graphical Statistics, 30(3), 594–611.
Tan, L. S., & Nott, D. J. (2014). A stochastic variational framework for fitting and diagnosing generalized linear mixed models. Bayesian Analysis, 9(4), 963–1004.
van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72(3), 287–308.
van der Linden, W. J., & Hambleton, R. K. (1997). Handbook of modern item response theory. New York: Springer-Verlag.
van Rijn, P. W., Sinharay, S., Haberman, S. J., & Johnson, M. S. (2016). Assessment of fit of item response theory models used in large-scale educational survey assessments. Large-Scale Assessments in Education, 4, 1–23.
Vehtari, A., Gelman, A., Sivula, T., Jylänki, P., Tran, D., Sahai, S., & Robert, C. P. (2020). Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data. The Journal of Machine Learning Research, 21(1), 577–629.
Wang, C., Fan, Z., Chang, H. H., & Douglas, J. (2013). A semiparametric model for jointly analyzing response times and accuracy in computerized testing. Journal of Educational and Behavioral Statistics, 38(4), 381–417.
Wang, C., & Srivastava, S. (2023). Divide-and-conquer Bayesian inference in hidden Markov models. Electronic Journal of Statistics, 17(1), 895–947.
Wang, C., Xu, G., & Shang, Z. (2018). A two-stage approach to differentiating normal and aberrant behavior in computer based testing. Psychometrika, 83, 223–254.
Wu, M., Davis, R. L., Domingue, B. W., Piech, C., & Goodman, N. (2020). Variational item response theory: Fast, accurate, and expressive. ArXiv:2002.00276.
Xue, J., & Liang, F. (2019). Double-parallel Monte Carlo for Bayesian analysis of big data. Statistics and Computing, 29(1), 23–32.