1 Introduction
Regression models with multiple predictors and outcome variables are pervasive in the social sciences, where research seeks to understand the relationships among variables (e.g., Aiken et al., 1991; Cohen et al., 2003). The inclusion of numerous variables, particularly as predictors, often introduces dependence among them, potentially resulting in the well-known collinearity problem. Several methodologies have been proposed to tackle collinearity, one of which combines a regression model with a data-reduction technique. Extended redundancy analysis (ERA) is one such technique (e.g., DeSarbo et al., 2015; Hwang et al., 2012; Lee et al., 2016, 2018; Lovaglio & Vacca, 2016; Lovaglio & Vittadini, 2014; Takane & Hwang, 2005; Tan et al., 2015).
In ERA, a component is derived from a set of predictors as a weighted composite, maximizing the explained variance in the outcome variables. From a technical standpoint, ERA can be seen as a special case of structural equation models (SEM). This classification arises from ERA’s specification of a formative relation between the predictor set and each component and its exploration of the relationship between the constructed components and outcome variables.
To date, most regression models employing data-reduction techniques, including ERA, have assumed that the outcome variables share the same data type, such as being either all continuous or all binary. In practice, however, multivariate outcome data of diverse types are common; examples include mixed measurements involving both continuous and categorical outcomes. In such instances, we can no longer assume a simplified correlation structure such as compound symmetry or independence, which have conventionally been used in most applications with multivariate data in the behavioral and social sciences (e.g., Henningsson et al., 2001; Lovaglio & Vittadini, 2014).
Moreover, the application of mediation analysis has grown across diverse fields, including psychology, sociology, economics, and other social sciences (e.g., Bullock et al., 2010; MacKinnon & Fairchild, 2009; Selig & Preacher, 2021; VanderWeele, 2016). According to Pieters (2017), mediation plays a crucial role in theory development, serving as an indispensable tool for assessing potential intermediate effects through intervening variables, referred to as mediators. It is conceptualized as a third-variable effect that illuminates the relationship between a predictor variable and an outcome variable (e.g., Baron & Kenny, 1986; Preacher & Hayes, 2008). Researchers frequently use latent variables within the SEM framework to investigate and validate theoretical models that encompass both direct and indirect pathways among variables, examining how the influence of a predictor on an outcome variable operates through intermediary mediators. Despite these advancements, extant ERA models have given limited attention to explicating and analyzing such mediation effects.
This paper presents a Bayesian extension of ERA that supports not only the estimation of indirect effects involving multiple mediators but also a diverse array of outcome variable types. Implementing ERA with ordinal variables is challenging in the Frequentist approach because of its reliance on the alternating least squares algorithm. In contrast, the Bayesian approach offers greater flexibility in modeling complex structures, such as ordinal variables, by incorporating prior information and probabilistic reasoning. By leveraging a Markov chain Monte Carlo (MCMC) algorithm, we derive the joint posterior distribution of key parameters, including the indirect effects of components on outcome variables through mediators. To estimate indirect effects as a distinct set of parameters, we build on the fundamental ERA model specification, constructing components as weighted composites of observed predictors, and formulate a unified objective function encompassing both direct and mediation effects.
Furthermore, to enhance the efficiency of the MCMC algorithm while avoiding constraints on the covariance structure of mixed-type outcome variables, we assume that the set of continuous latent variables underlying the ordinal outcome variables follows a multivariate t distribution (Park et al., 2021). This strategy enables the incorporation of correlations with the continuous outcome variables, which are likewise presumed to follow a multivariate t distribution, thereby facilitating a more flexible and realistic modeling approach for mixed-type outcome data.
The proposed method is designed to serve several key objectives. Firstly, it aims to preserve the conceptual associations between predictors and their components, thereby handling multicollinearity in cases where a set of predictors exhibits high levels of correlation. Secondly, it seeks to incorporate the advantageous features of the Bayesian framework into ERA, allowing for the joint modeling of mixed outcome data that might otherwise be overlooked in social science research. Lastly, it provides a viable alternative for exploring potential intermediate effects through mediators within the ERA framework.
The remainder of this paper is organized as follows. Section 2 first describes ERA with mixed types of outcome variables and then details the full proposed method, which also includes multiple mediators, along with parameter estimation. Section 3 provides a simulation study. Section 4 provides an application illustrating the empirical usefulness of the proposed method. The final section summarizes the implications and possible extensions of the proposed method.
2 Method
2.1 Extended redundancy analysis
Let ${y}_{iq}$ denote the $i$th value of the $q$th outcome variable $\left(i=1,\cdots, N; q=1,\cdots, Q\right)$ and ${x}_{ilk}$ the $i$th value of the $l$th predictor in the $k$th set ($l=1,\cdots, {p}_k$ and $k=1,\cdots, K$), where ${p}_k$ refers to the number of predictors in the $k$th set. Let $P={\sum}_{k=1}^K{p}_k$ be the total number of predictors in the $K$ sets. Let ${w}_{lk}$ denote the component weight assigned to ${x}_{ilk}$. Let ${f}_{ik}$ denote the $i$th component score for the $k$th component, defined as a linear combination or weighted composite of the predictors in the $k$th set, i.e., ${f}_{ik}=\sum_{l=1}^{p_k}{x}_{ilk}{w}_{lk}$. Let ${a}_{kq}$ denote the regression coefficient connecting the $k$th component to the outcome variable ${y}_{iq}$, and let ${e}_{iq}$ denote the $i$th residual value for ${y}_{iq}$. The ERA model can be written as follows:

$${y}_{iq}=\sum_{k=1}^{K}{f}_{ik}{a}_{kq}+{e}_{iq}=\sum_{k=1}^{K}\left[\sum_{l=1}^{p_k}{x}_{ilk}{w}_{lk}\right]{a}_{kq}+{e}_{iq}. \qquad (1)$$
In matrix notation, (1) is re-expressed as

$$\mathbf{Y}=\mathbf{XWA}+\mathbf{E}=\mathbf{FA}+\mathbf{E},$$

where $\mathbf{Y}$ is an N by Q matrix of outcome variables, $\mathbf{X}$ is an N by P matrix of predictors, $\mathbf{W}$ is a P by K matrix of weights, $\mathbf{F}=\mathbf{XW}$ is the N by K matrix of component scores, $\mathbf{A}$ is a K by Q matrix of regression coefficients, and $\mathbf{E}$ is an N by Q matrix of residuals. For identifiability of $\mathbf{F}$, a standardization constraint is imposed such that $\mathit{diag}\left({\mathbf{F}}^{\prime}\mathbf{F}\right)=N\mathbf{I}$. More details on the Frequentist ERA can be found in Takane and Hwang (2005) or Choi et al. (2020).
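As a concrete sketch of this construction (with made-up dimensions and weight values, not from any dataset in the paper), the components $\mathbf{F}=\mathbf{XW}$ and the standardization constraint can be computed as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: K = 2 predictor sets with p_1 = 3 and p_2 = 2
# predictors, so P = 5 columns in X (N = 100 observations).
N, p = 100, [3, 2]
X = rng.standard_normal((N, sum(p)))

# W is P x K and block-structured: each row has one non-zero weight,
# linking each predictor to exactly one component.
W = np.zeros((sum(p), len(p)))
W[:3, 0] = [0.7, 0.5, 0.3]   # weights for the first set (illustrative values)
W[3:, 1] = [0.6, 0.4]        # weights for the second set

# Components are weighted composites, F = XW, rescaled column-wise so that
# diag(F'F) = N * I (the ERA standardization constraint).
F = X @ W
F *= np.sqrt(N / np.sum(F**2, axis=0))

print(np.diag(F.T @ F))  # each diagonal element equals N
```

In an actual ERA fit the non-zero weights are estimated rather than fixed; the sketch only shows how the block structure of $\mathbf{W}$ and the constraint interact.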
Several methodologies have been described to handle the collinearity problem, among which combining a regression model with a data-reduction technique is one option. Principal component regression (PCR; Jolliffe, 1982), partial least squares (PLS; Wold, 1975; Wold et al., 1984), and extended redundancy analysis (ERA; Takane & Hwang, 2005) are similar in that they employ data-reduction techniques to transform the predictors into a new set of uncorrelated underlying or latent constructs, more specifically called components. However, PCR extracts components by maximizing the explained variance of the predictors alone, without considering their associations with an outcome variable. The components are subsequently used as predictors in a least-squares regression model to predict the outcome variable (e.g., Abdi, 2010; Wehrens & Mevik, 2007). Because the components in PCR are extracted independently of the regression model, they may not be optimal for explaining the outcome variable.
Different from PCR, PLS and ERA do take into account the association between the predictors and an outcome variable when extracting the components. A major feature that distinguishes ERA from PLS is whether a single, unified objective function is used for parameter estimation. While PLS sets up two separate objective functions, one for extracting components and one for the regression model, ERA estimates all unknown parameters using a single (global) objective function. ERA also handles multiple sets of predictors simultaneously, whereas PLS involves only one set of predictors. In this paper, we focus on the most inclusive form of regression with components, i.e., ERA, which employs a unified single objective function.
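As an illustration of the contrast (synthetic data, arbitrary dimensions), PCR's two-stage procedure, components from X alone followed by least squares on those components, can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical data: the outcome plays no role in component extraction.
N, P, K = 200, 6, 2
X = rng.standard_normal((N, P))
y = X @ rng.standard_normal(P) + rng.standard_normal(N)

# Stage 1: extract the first K principal components of X via the SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
F = Xc @ Vt[:K].T

# Stage 2: least-squares regression of y on the components.
coef, *_ = np.linalg.lstsq(F, y - y.mean(), rcond=None)
print(coef.shape)
```

ERA, by contrast, would estimate the weights producing F and the regression coefficients jointly from one objective function, so the components are extracted with the outcome in view.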
2.2 ERA with mixed types of outcome variables
We consider $Q$ outcome variables comprising a set of $T$ continuous responses and a set of $Q-T$ ordinal responses with $C$ categories. In many cases, outcome variables are correlated, so the interdependency among them must be modeled, regardless of the response structure, to avoid biased statistical inference.
2.2.1 Univariate model structure with original regression mean
For continuous outcome variables, we consider a regression model that is robust to outliers, in the sense that a single out-of-bounds data point can otherwise strongly affect the inference for all parameters in the model. The influence of such points can be reduced by adopting a longer-tailed family of distributions than the normal population model, which allows for the possibility of extreme observations. One frequently used longer-tailed family is the $t$ distributions; in our work, we assume $t$-distributed errors in place of normal ones. Thus, within the multiple regression context, we consider for the $q$th response:

$${\mathbf{Y}}_{\left[i,q\right]}\sim {t}_{\nu}\left({\mathbf{X}}_{\left[i,\right]}{\boldsymbol{\beta}}_{\left[,q\right]},{\sigma}_{iq}^2\right) \quad \text{if } {\mathbf{Y}}_{\left[i,q\right]} \text{ is a continuous response},$$
$${\mathbf{Y}}_{\left[i,q\right]}\sim \textit{multinomial}\left(1,\left({p}_1^q,{p}_2^q,\cdots, {p}_C^q\right)\right) \quad \text{if } {\mathbf{Y}}_{\left[i,q\right]} \text{ is an ordinal response}.$$

Here the $t$ distribution ${t}_{\nu}\left({\mathbf{X}}_{\left[i,\right]}{\boldsymbol{\beta}}_{\left[,q\right]},{\sigma}_{iq}^2\right)$ is characterized by three parameters: the center ${\mathbf{X}}_{\left[i,\right]}{\boldsymbol{\beta}}_{\left[,q\right]}$, the scale ${\sigma}_{iq}$, and a degrees-of-freedom parameter $\nu$ that determines the shape of the distribution. In addition, ${p}_j^q=\Pr \left({\mathbf{Y}}_{\left[i,q\right]}=j\right)$ is the probability that the $i$th response for the $q$th outcome variable falls in the $j$th category, with $j=1,2,\dots, C$ for $i=1,\dots, N$, $q=1,\dots, Q$, and $\sum_{j=1}^C{p}_j^q=1$.
For an ordinal response, we consider a latent variable with a continuous distributional assumption together with cutpoints on the continuous scale of the latent variable. As in Albert and Chib (1993), with the underlying latent observation ${\mathbf{Z}}_{\left[i,q\right]}$ we have

$${\mathbf{Y}}_{\left[i,q\right]}=j \quad \text{if } {\gamma}_{j-1}^q<{\mathbf{Z}}_{\left[i,q\right]}\le {\gamma}_j^q, \qquad {p}_j^q=\Psi\!\left(\frac{\gamma_j^q-{\mathbf{X}}_{\left[i,\right]}{\boldsymbol{\beta}}_{\left[,q\right]}}{\sigma_{iq}}\right)-\Psi\!\left(\frac{\gamma_{j-1}^q-{\mathbf{X}}_{\left[i,\right]}{\boldsymbol{\beta}}_{\left[,q\right]}}{\sigma_{iq}}\right),$$

where ${\mathbf{Z}}_{\left[i,q\right]}\sim L\left({\mathbf{X}}_{\left[i,\right]}{\boldsymbol{\beta}}_{\left[,q\right]},{\sigma}_{iq}\right)$ is logistic with center ${\mathbf{X}}_{\left[i,\right]}{\boldsymbol{\beta}}_{\left[,q\right]}$ and scale ${\sigma}_{iq}$, $\Psi \left(\cdot \right)$ is the cumulative distribution function of the standard logistic distribution, and the cutpoints satisfy $-\infty ={\gamma}_0^q<{\gamma}_1^q<\dots <{\gamma}_C^q=\infty$.
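Under hypothetical cutpoints and a made-up linear predictor, the category probabilities implied by this latent-variable formulation are differences of the logistic CDF at consecutive cutpoints:

```python
import numpy as np
from scipy.stats import logistic

# Illustrative cutpoints for C = 4 categories (gamma_0 = -inf, gamma_C = +inf).
cutpoints = np.array([-np.inf, -1.0, 0.5, 2.0, np.inf])
center, scale = 0.3, 1.0   # X beta and sigma for one observation (made up)

# Pr(Y = j) = Psi((gamma_j - center)/scale) - Psi((gamma_{j-1} - center)/scale)
cdf = logistic.cdf((cutpoints - center) / scale)
probs = np.diff(cdf)

print(probs, probs.sum())  # probabilities over the C categories; they sum to 1
```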
Gelman et al. (2014) argued that the $t$ distribution can be considered an alternative to logistic and probit regression. Logistic and probit regression can be nonrobust in the sense that, for large absolute values of the linear predictor ${\mathbf{X}}_{\left[i,\right]}{\boldsymbol{\beta}}_{\left[,q\right]}$, the inverse logit or probit transformation gives probabilities close to 0 or 1. Such models can be made more robust by allowing the occasional misprediction for large values of ${\mathbf{X}}_{\left[i,\right]}{\boldsymbol{\beta}}_{\left[,q\right]}$. This form of robustness is defined not in terms of the data ${\mathbf{Y}}_{\left[i,q\right]}$ but with respect to the predictors ${\mathbf{X}}_{\left[i,\right]}$. A robust model can be implemented using the latent-variable formulation of discrete-data regression described above, replacing the logistic or normal distribution of the latent continuous data ${\mathbf{Z}}_{\left[i,q\right]}$ with ${\mathbf{Z}}_{\left[i,q\right]}\sim {t}_{\nu}\left({\mathbf{X}}_{\left[i,\right]}{\boldsymbol{\beta}}_{\left[,q\right]},{\sigma}_{iq}\right)$. Gelman et al. (2014) further argued that in realistic settings it is impractical to estimate $\nu$ from the data: since the latent data are not directly observed, it is essentially impossible to form inferences about the shape of their continuous underlying distribution, so $\nu$ is set to a low value to ensure robustness. Setting $\nu =4$ yields a distribution close to the logistic, and as $\nu \to \infty$ the model approaches the probit.
In Bayesian inference, Gibbs sampler computations can often be simplified, or convergence accelerated, by adding auxiliary variables; this is called data augmentation. One simple but important example of auxiliary variables is the $t$ distribution, which can be expressed as a mixture of normal distributions. The ${t}_{\nu}\left(\mu, {\sigma}^2\right)$ likelihood is equivalent to the hierarchical model

$${Y}_i\mid {V}_i\sim N\left(\mu, {V}_i\right), \qquad {V}_i\sim \text{Inv-}\chi^2\left(\nu, {\sigma}^2\right),$$

where ${V}_i$ is an auxiliary variable that cannot be directly observed.
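This normal-mixture representation is easy to check by simulation. The sketch below (arbitrary settings) draws the auxiliary scale $V_i$ as a scaled inverse chi-square variable and compares the resulting sample with direct $t_{\nu}$ draws:

```python
import numpy as np

rng = np.random.default_rng(1)
nu, mu, sigma2 = 4.0, 0.0, 1.0
n = 200_000

# Scaled inverse chi-square draw: V = nu * sigma2 / chi2_nu, then y | V ~ N(mu, V).
V = nu * sigma2 / rng.chisquare(nu, size=n)
y_mix = rng.normal(mu, np.sqrt(V))

# Direct t_nu draws with the same location and scale for comparison.
y_t = mu + np.sqrt(sigma2) * rng.standard_t(nu, size=n)

# Both samples come from t_nu(mu, sigma2); their quartiles should agree closely.
print(np.percentile(y_mix, [25, 75]), np.percentile(y_t, [25, 75]))
```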
2.2.2 Multivariate model structure with ERA mean
For the mean structure, we consider ERA with predefined components but unknown weights on the predictors for each component. ERA considers multiple sets (or blocks) of predictors and reduces each set to a component. Such a formation is based on substantive theories or domain knowledge about how certain predictors can be grouped into the same block and aggregated into a component. Each observed variable is hypothesized to be linked to only one component. Accordingly, some elements of $\mathbf{W}$ are constrained to be zero: each row of $\mathbf{W}$ has only one non-zero element (whose component weight is freely estimated), and the remaining elements in each row are constrained to zero. Following the ERA model specification, the $i$th row of component scores is ${\mathbf{F}}_{\left[i,\right]}={\mathbf{X}}_{\left[i,\right]}\mathbf{W}$ for $i=1,\cdots, N$, and hence the regression coefficients ${\mathbf{A}}_{1\left[,q\right]}$ for the $q$th outcome variable ($q=1,\cdots, Q$) are unidentifiable. The standardization constraint $\mathit{diag}\left({\mathbf{F}}^{\prime}\mathbf{F}\right)=\mathit{diag}\left({\mathbf{W}}^{\prime }{\mathbf{X}}^{\prime}\mathbf{XW}\right)=N\mathbf{I}$, with the $N\times K$ component matrix ${\mathbf{F}}^{\prime}=\left({\mathbf{F}}_{\left[1,\right]}^{\prime },\cdots, {\mathbf{F}}_{\left[N,\right]}^{\prime}\right)$, is therefore applied for identifiability (Takane and Hwang, 2005).
For the multivariate logistic distribution, O’Brien and Dunson (2004) proposed a Bayesian model based on multivariate $t$-distributed latent variables. Their construction yields a close approximation between two densities: a multivariate logistic distribution with $\nu$ degrees of freedom, center $\boldsymbol{\mu}$, and scale matrix $\mathbf{R}$, and a multivariate $t$ distribution with $\nu$ degrees of freedom, center $\boldsymbol{\mu}$, and scale matrix ${\sigma}^2\mathbf{R}$, when $\nu$ and ${\sigma}^2$ are chosen appropriately. To make the approximation almost exact, O’Brien and Dunson (2004) set ${\sigma}^2={\pi}^2\left(\nu -2\right)/3\nu$ (a value chosen to make the variances of the univariate $t$ and logistic distributions equal) and $\nu =7.3$ (a value chosen to minimize the integrated squared distance between the univariate $t$ and univariate logistic densities).
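These two constants can be verified numerically. The sketch below compares the CDF of a $t_{7.3}$ variable scaled by $\sigma$ with the standard logistic CDF on a grid:

```python
import numpy as np
from scipy.stats import t, logistic

nu = 7.3
sigma2 = np.pi**2 * (nu - 2) / (3 * nu)   # matches the univariate variances

x = np.linspace(-6.0, 6.0, 241)
t_cdf = t.cdf(x / np.sqrt(sigma2), df=nu)  # t_nu scaled by sqrt(sigma2)
logit_cdf = logistic.cdf(x)

# The maximum discrepancy between the two CDFs is very small.
print(np.max(np.abs(t_cdf - logit_cdf)))
```

The variance match follows because a $t_{\nu}$ variable with scale $\sigma$ has variance ${\sigma}^2\nu/(\nu-2)$, which equals the logistic variance ${\pi}^2/3$ under the stated choice of ${\sigma}^2$.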
Park et al. (2021) introduced a Bayesian methodology for a component-based model that accounts for unstructured residual covariances while regressing multivariate ordinal outcomes on predefined sets of predictors. Their Bayesian multivariate ordinal logistic model re-expresses the ordinal outcomes of interest with a set of latent continuous variables that follow an approximate multivariate t distribution, using the method of O’Brien and Dunson (2004). This contributes not only to an efficient Gibbs sampler, a Markov chain Monte Carlo algorithm, but also to interpreting the regression coefficients as log-transformed odds ratios.
In this work, we consider a multivariate t distribution for the residual errors of the continuous outcomes and a multivariate t approximation for the latent continuous variables underlying the ordinal outcomes. We again model the interdependency among outcome variables, regardless of the response structure, to avoid biased statistical inference. Thus, with an ERA model of $K$ components, for $i=1,\dots, N$ and $j=1,\dots, C$,

$${\mathbf{Y}}_{\left[i,q\right]}={\mathbf{Z}}_{\left[i,q\right]} \quad \text{for a continuous response } \left(q=1,\cdots, T\right),$$
$${\mathbf{Y}}_{\left[i,q\right]}=j\quad \mathrm{if}\; {\gamma}_{j-1}^q\le {\mathbf{Z}}_{\left[i,q\right]}\le {\gamma}_j^q \quad \text{for an ordinal response } \left(q=T+1,\cdots, Q\right),$$

and we express our model as follows:

$${\mathbf{Z}}_{\left[i,\right]}={\mathbf{a}}_1^{\prime}+{\mathbf{H}}_{\left[i,\right]}{\mathbf{B}}_1+{\mathbf{X}}_{\left[i,\right]}\mathbf{W}{\mathbf{A}}_1+{\mathbf{E}}_{1\left[i,\right]},\qquad {\mathbf{E}}_{1\left[i,\right]}^{\prime}\sim {N}_Q\!\left(\mathbf{0},{\mathbf{D}}_{1i}^{1/2}{\boldsymbol{\Sigma}}_1{\mathbf{D}}_{1i}^{1/2}\right), \qquad (2)$$

where ${\mathbf{a}}_1$ is a length-$Q$ vector of intercepts, ${\mathbf{H}}_{\left[i,\right]}$ is a $1 \times D$ vector of regression predictors, ${\mathbf{B}}_1$ is a $D \times Q$ matrix of regression parameters, ${\mathbf{X}}_{[i,]}$ is a $1 \times P$ vector of ERA predictors, $\mathbf{W}$ is a P by K matrix of weights, ${\mathbf{A}}_1$ is a $K \times Q$ matrix of component coefficients, ${\boldsymbol{D}}_{1i}=diag\left({\xi}_{1i}^{-1}{\mathbf{1}}_T,{\overset{\sim}{\sigma}}^2{\phi}_{1i}^{-1}{\mathbf{1}}_{Q-T}\right)$ with ${\mathbf{1}}_T$ and ${\mathbf{1}}_{Q-T}$ denoting length-$T$ and length-$(Q-T)$ vectors of 1’s, and ${\boldsymbol{\Sigma}}_1$ is a $Q \times Q$ unstructured variance-covariance matrix. Here ${\xi}_{1i}^{-1}$ and ${\phi}_{1i}^{-1}$ are precision parameters that form the multivariate t distribution for the $T$ continuous responses and the $Q-T$ ordinal responses, respectively. As discussed in O’Brien and Dunson (2004), we set $\overset{\sim }{\nu }=7.3$ and ${\overset{\sim }{\sigma}}^2={\pi}^2\left(\overset{\sim }{\nu }-2\right)/3\overset{\sim }{\nu }$ so that the multivariate t distribution approximates the multivariate logistic distribution. Because the degrees-of-freedom parameter ${\nu}_1^{\ast }$ determines the shape of the distribution for the continuous outcomes, we treat ${\nu}_1^{\ast }$ as an unknown parameter that needs to be estimated.
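A minimal simulation sketch of this response structure (illustrative dimensions, coefficients, and cutpoints, not the paper's settings, and omitting the covariate term $\mathbf{H}\mathbf{B}_1$) shows how the precision mixing variables produce the multivariate t margins and how the ordinal outcome is read off the latent scale:

```python
import numpy as np

rng = np.random.default_rng(2)

# T = 1 continuous and Q - T = 1 ordinal outcome, K = 1 component.
N, T, Q, C = 500, 1, 2, 3
nu1, nu_tilde = 4.0, 7.3
sigma2_tilde = np.pi**2 * (nu_tilde - 2) / (3 * nu_tilde)

f = rng.standard_normal(N)                   # component scores F = XW
a1 = np.array([0.5, -0.5])                   # intercepts (made up)
A1 = np.array([[1.0, 1.5]])                  # component coefficients (K x Q)
Sigma1 = np.array([[1.0, 0.4], [0.4, 1.0]])  # unstructured covariance

# Gamma mixing variables give the t margins (shape nu/2, rate nu/2).
xi = rng.gamma(nu1 / 2, 2 / nu1, size=N)             # continuous block
phi = rng.gamma(nu_tilde / 2, 2 / nu_tilde, size=N)  # ordinal block

Z = np.empty((N, Q))
for i in range(N):
    D_sqrt = np.diag(np.sqrt([1 / xi[i], sigma2_tilde / phi[i]]))
    cov = D_sqrt @ Sigma1 @ D_sqrt
    Z[i] = a1 + f[i] * A1[0] + rng.multivariate_normal(np.zeros(Q), cov)

Y_cont = Z[:, 0]                                     # observed directly
Y_ord = np.digitize(Z[:, 1], bins=[-0.5, 1.0]) + 1   # cutpoints -> categories 1..C
print(np.bincount(Y_ord, minlength=C + 1)[1:])       # counts per category
```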
2.3 The proposed method with mediating effects
A conventional single-level mediation model is expressed as follows:

$$M={i}_1+ aX+{e}_1, \qquad Y={i}_2+{c}^{\prime }X+ bM+{e}_2,$$

where $M$, $Y$, and $X$ denote a mediator, an outcome variable, and a predictor, respectively, with $M$ being unobservable while $Y$ and $X$ are observable. Miočević et al. (2017) included an interaction term ($XM$) in their model predicting $Y$,

$$Y={i}_3+{c}^{\prime }X+ bM+ hXM+{e}_3,$$

to test whether the moderation effect is statistically significant based on Bayesian inference. They considered diffuse (non-informative) priors for the coefficients in their empirical example. In their Markov chain Monte Carlo estimation, they used trace plots to evaluate whether the chains were mixing well, which is indicative of convergence, and provided indexes for diagnosing convergence. Even with the interaction term, there was no issue with the identifiability of the model coefficient estimates.
For the indirect, or mediation, effect of intervening variables on outcomes, we consider $S$ contemporaneous mediators. For the $S$ mediators, we assume a multivariate t distribution with a correlation structure on the error term and a component-based ERA structure for the mean trend. As described above, we express the multivariate t distribution with degrees of freedom ${\nu}_2^{\ast }$ in the hierarchical form of a normal mixture:

$${\mathbf{M}}_{\left[i,\right]}={\mathbf{a}}_2^{\prime}+{\mathbf{H}}_{\left[i,\right]}{\mathbf{B}}_2+{\mathbf{X}}_{\left[i,\right]}\mathbf{W}{\mathbf{A}}_2+{\mathbf{E}}_{2\left[i,\right]},\qquad {\mathbf{E}}_{2\left[i,\right]}^{\prime}\sim {N}_S\!\left(\mathbf{0},{\xi}_{2i}^{-1}{\boldsymbol{\Sigma}}_2\right),\quad {\xi}_{2i}\sim Gamma\!\left(\frac{\nu_2^{\ast}}{2},\frac{\nu_2^{\ast}}{2}\right), \qquad (3)$$

where ${\mathbf{a}}_2$ is a length-$S$ vector of intercepts, ${\mathbf{B}}_2$ is a $D$ by $S$ matrix of regression parameters, ${\mathbf{A}}_2$ is a $K$ by $S$ matrix of component coefficients, and ${\boldsymbol{\Sigma}}_2$ is an $S$ by $S$ unstructured variance-covariance matrix. We treat the degrees of freedom ${\nu}_2^{\ast }$ as an unknown parameter governing the shape of the distribution.
Similar to the response model in (2), with the $T$ continuous responses and $Q-T$ ordinal responses, the multivariate mixed-outcomes model with mediators can be expressed as

$${\mathbf{Z}}_{\left[i,\right]}={\mathbf{a}}_3^{\prime}+{\mathbf{H}}_{\left[i,\right]}{\mathbf{B}}_3+{\mathbf{X}}_{\left[i,\right]}\mathbf{W}{\mathbf{A}}_3+{\mathbf{M}}_{\left[i,\right]}{\mathbf{A}}_4+{\mathbf{E}}_{3\left[i,\right]},\qquad {\mathbf{E}}_{3\left[i,\right]}^{\prime}\sim {N}_Q\!\left(\mathbf{0},{\mathbf{D}}_{3i}^{1/2}{\boldsymbol{\Sigma}}_3{\mathbf{D}}_{3i}^{1/2}\right), \qquad (4)$$

for $i=1,\dots, N$ and $j=1,\dots, C$. Here ${\mathbf{a}}_3$ is a length-$Q$ vector of intercepts, ${\mathbf{B}}_3$ is a $D$ by $Q$ matrix of regression parameters, ${\mathbf{A}}_3$ is a $K$ by $Q$ matrix of component coefficients, ${\mathbf{A}}_4$ is an $S$ by $Q$ matrix of coefficients relating the mediators to the dependent variables adjusted for the independent variables, ${\boldsymbol{D}}_{3i}=diag\left({\xi}_{3i}^{-1}{\mathbf{1}}_T,\kern1em {\overset{\sim }{\sigma}}^2{\phi}_{3i}^{-1}{\mathbf{1}}_{Q-T}\right)$ with ${\mathbf{1}}_T$ and ${\mathbf{1}}_{Q-T}$ denoting length-$T$ and length-$(Q-T)$ vectors of 1’s, ${\boldsymbol{\Sigma}}_3$ is a $Q$ by $Q$ unstructured variance-covariance matrix, and ${\xi}_{3i}^{-1}$ and ${\phi}_{3i}^{-1}$ are precision parameters that form the multivariate t distribution. As discussed above, $\overset{\sim }{\nu}=7.3$ and ${\overset{\sim }{\sigma}}^2={\pi}^2\left(\overset{\sim }{\nu }-2\right)/3\overset{\sim }{\nu }$ for the approximation to the multivariate logistic distribution, and we treat ${\nu}_3^{\ast }$ as an unknown parameter for the shape of the distribution of the continuous outcomes.
In our models (2)–(4), the predetermined components $\mathbf{F}=\mathbf{XW}$ are used to explain the relationship between the predictors and the multivariate outcomes (2), the extent to which the components change the mediators (3), and which components are related to the multivariate outcomes adjusted for the mediators (4). The mediated effects can thus be calculated in two ways, as either ${\mathbf{A}}_1-{\mathbf{A}}_3$ or ${\mathbf{A}}_2{\mathbf{A}}_4$. In the difference-in-coefficients method, ${\mathbf{A}}_1-{\mathbf{A}}_3$ from (2) and (4) corresponds to the reduction in the effect of the independent variables on the dependent variables when adjusted for the mediators. The product-of-coefficients method is based on the estimation of (3) and (4): mediation depends on the extent to which the components change the mediators (${\mathbf{A}}_2$) and the extent to which the mediators affect the outcome variables adjusted for the predictors (${\mathbf{A}}_4$). For Bayesian inference with a rich explanation, we use models (3) and (4) in our work.
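With toy coefficient matrices (purely illustrative values, not estimates from the paper), the two ways of computing the mediated effect look like:

```python
import numpy as np

# K = 2 components, S = 2 mediators, Q = 3 outcomes (all values made up).
A1 = np.array([[1.0, 2.0, 0.5],    # total effects, model (2)
               [0.5, 1.0, 1.0]])
A2 = np.array([[2.0, 0.0],         # components -> mediators, model (3)
               [0.0, 2.0]])
A4 = np.array([[0.3, 0.5, 0.2],    # mediators -> outcomes, model (4)
               [0.1, 0.4, 0.6]])

indirect = A2 @ A4                 # product-of-coefficients estimate (K x Q)
A3 = A1 - indirect                 # direct effects implied by the difference method

print(indirect)
print(A3)
```

When the models are consistent, the two computations agree; in the Bayesian setting, the posterior of the indirect effect is obtained by forming this product at each MCMC draw.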
2.4 Parameter estimation
2.4.1 Prior distributions
In the proposed method, with the likelihood given by the multivariate t formulation of the mediating effects in (3) and of the response with mediators in (4), we specify conjugate prior distributions for the parameters of interest. For the priors on the weights $\mathbf{W}$ and the component coefficients $\mathbf{A}$, we use large-valued hyper-parameters for the covariance. For a more flexible Bayesian model, one could place a prior distribution on the hyper-parameters, such as an inverse-gamma prior for conjugacy. In our proposed method, however, to mitigate the effect of the priors, we consider flat priors with large-valued hyper-parameters; because of the conjugacy, convergence of the chains can be expected. Note that specifying a prior distribution for the unconstrained covariance matrix $\boldsymbol{\Sigma}$ makes our approach different from O’Brien and Dunson (2004), in which the unique off-diagonal elements of a correlation matrix are assumed to follow a prior distribution of no specific form. This simple modification leads to easier and more efficient posterior computation, as detailed in Park et al. (2021) for multivariate ordinal responses. Details about the prior distributions can be found in Appendix A1.
For the priors on ${\nu}_2^{\ast }$ and ${\nu}_3^{\ast }$, the degrees-of-freedom parameters of the distributions for the mediators and the continuous outcomes, we use a uniform density on $1/{\nu}^{\ast }$ over the range $\left[0,1\right]$. Gelman et al. (2014) explained that the parameterization in terms of $1/{\nu}^{\ast }$ rather than ${\nu}^{\ast }$ has the advantage of including the normal distribution at $1/{\nu}^{\ast}=0$ and encompassing the entire range from normal to Cauchy in the finite interval $\left[0,1\right]$. This prior distribution favors longer-tailed models. We cannot parameterize the ${t}_{\nu }$ distributions in terms of their variance, because the variance is infinite for $\nu \le 2$; the interquartile range would be a more reasonable parameter than the curvature for setting up a prior distribution. Since the interquartile range varies only mildly as a function of $\nu$, we adopt the convenient parameterization in terms of mean and variance and set $g\left({\nu}^{\ast}\right)\propto 1$.
2.4.2 Markov chain Monte Carlo
When conjugate priors are employed for the model parameters, the majority of the full conditional posterior distributions can be derived in closed form. However, closed-form solutions are not available for the posterior distributions of the precision parameters and the degrees-of-freedom parameters; we therefore update these parameters using a Metropolis step within the MCMC iterations. For the other parameters, Gibbs sampling is straightforward and easily implementable. Code written in R (version 4.3.1, R Core Team, 2023) is available on GitHub.
To update the parameters from their posterior distributions, we choose initial values from the prior distributions and then update the parameters based on their full conditional distributions. First, we update the common weight parameter $\mathbf{W}$, which appears in both the mediator and response mean terms, and standardize it for identifiability. Next, we update the parameters of the mediator model $\left({\boldsymbol{\xi}}_2,{\nu}_2^{\ast },{\mathbf{A}}_2,{\mathbf{B}}_2,{\mathbf{a}}_2,{\boldsymbol{\Sigma}}_2^{-1}\right)$ and then update the latent variables of the ordinal outcomes from their truncated conditional normal distributions. Finally, we update the remaining parameters of the response model $\left(\boldsymbol{\gamma}, {\boldsymbol{\phi}}_3,{\boldsymbol{\xi}}_3,{\nu}_3^{\ast },{\mathbf{A}}_3,{\mathbf{A}}_4,{\mathbf{B}}_3,{\mathbf{a}}_3,{\boldsymbol{\Sigma}}_3^{-1}\right)$ from their respective conditional posterior distributions. We repeat these steps; the sampling scheme is summarized in Figure 1, and the full conditional posterior distributions and detailed sampling schemes are given in Appendix A2.

Figure 1 MCMC algorithm to update parameters from the posterior distribution.
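The sweep order can be sketched structurally as follows. This is only a skeleton of the update schedule: every `draw` call is a placeholder standing in for the corresponding full conditional (Gibbs update) or Metropolis step detailed in Appendix A2.

```python
# Structural sketch of one MCMC sweep; draw() is a placeholder sampler.
def gibbs_sweep(state, draw):
    state["W"] = draw("W")                 # common weights, then standardize
    for name in ["xi2", "nu2", "A2", "B2", "a2", "Sigma2_inv"]:
        state[name] = draw(name)           # mediator-model block (3)
    state["Z"] = draw("Z")                 # latent ordinal outcomes (truncated normal)
    for name in ["gamma", "phi3", "xi3", "nu3", "A3", "A4", "B3", "a3", "Sigma3_inv"]:
        state[name] = draw(name)           # response-model block (4)
    return state

# nu2, nu3, and the precision parameters lack closed-form conditionals, so
# their draw() would be a Metropolis step; the rest are Gibbs updates.
state = gibbs_sweep({}, lambda name: f"sample of {name}")
print(sorted(state))
```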
There are a few comments to make regarding the proposed algorithm. First, to ensure the identifiability of $\mathbf{F}=\mathbf{XW}$, a standardization constraint is applied to $\mathbf{F}$ such that $\mathit{diag}\left({\mathbf{F}}^{\prime}\mathbf{F}\right)=\mathit{diag}\left({\mathbf{W}}^{\prime }{\mathbf{X}}^{\prime}\mathbf{XW}\right)=N\mathbf{I}$, a procedure inherent in any ERA model. Second, sampling ${\boldsymbol{\Sigma}}^{-1}$ from a Wishart distribution can be done easily in most statistical software, for example with the function rWishart from the stats library in R. Furthermore, noting that the coefficients scaled by $diag\left(\boldsymbol{\Sigma} \right)$, the diagonal variance matrix corresponding to $\boldsymbol{\Sigma}$, are identifiable and that the correlations from $\boldsymbol{\Sigma}$ are constrained to $[-1, 1]$, our algorithm employs the parameter expansion of Gelman et al. (2004), which yields improved convergence of the Gibbs sampler in generalized linear models. In contrast, O’Brien and Dunson (2004) instead draw the unique correlations corresponding to $\boldsymbol{\Sigma}$ using a Metropolis algorithm (Metropolis & Ulam, 1949), for which it is difficult to find a good value of the tuning parameter controlling the acceptance probability, especially when sampling multiple variates, often resulting in slow convergence. Third, there is an additional identifiability problem involving the intercepts ${\mathbf{a}}_{3\left[\left(T+1\right):Q\right]}$, the cutpoints $\boldsymbol{\Gamma}$, and $diag\left({\overset{\sim }{\boldsymbol{\varSigma}}}_3\right)$. As Hirk et al. (2019) pointed out, only $\frac{\gamma_j^q-{a}_{3,q}}{\sigma_{qq}}$, with ${a}_{3,q}$ the $q$th element of ${\mathbf{a}}_3$ and ${\sigma}_{qq}$ the $q$th diagonal element of ${\overset{\sim }{\boldsymbol{\varSigma}}}_3$, $j=1,\dots, C-1$ and $q=T+1,\dots, Q$, is identifiable; we therefore set ${\gamma}_1^q=0$ for all $q=T+1,\dots, Q$ to secure the identifiability of the scaled intercepts $\frac{a_{3,q}}{\sigma_{qq}}$ and scaled cutpoints $\frac{\gamma_j^q}{\sigma_{qq}}$. Such a constraint of ${\gamma}_1^q=0$ is common practice in Bayesian logistic regression models for both univariate and multivariate binary outcomes, a special case of the proposed multivariate ordinal logistic regression model with $C=2$. Fixing this cutpoint at zero during the MCMC prevents the identifiability problem in the parameter estimation.
3 Simulation study
To validate the performance of the proposed method, we conducted simulation studies varying the sample size over N = 100, 300, and 500 for a model with two continuous ($T=2$) and two ordinal ($Q-T=2$) outcome variables, using (3) and (4). The hypothesized model had two mediators, and there were no covariates or explanatory variables affecting the mediators and outcome variables, as shown in Figure 2. For data generation, we used

$${\mathbf{M}}_{\left[i,\right]}={\mathbf{a}}_{02}^{\prime}+{\mathbf{X}}_{\left[i,\right]}\mathbf{W}{\mathbf{A}}_2+{\mathbf{E}}_{2\left[i,\right]},\qquad {\mathbf{Z}}_{\left[i,\right]}={\mathbf{a}}_{03}^{\prime}+{\mathbf{X}}_{\left[i,\right]}\mathbf{W}{\mathbf{A}}_3+{\mathbf{M}}_{\left[i,\right]}{\mathbf{A}}_4+{\mathbf{E}}_{3\left[i,\right]}, \qquad (5)$$
Figure 2 A hypothesized model for the first simulation study (with two continuous and two ordinal outcome variables).
where
${{\mathbf{E}}_{2\left[i,\right]}^{\prime}}\sim {N}_S\left(\mathbf{0},{\xi}_{2i}^{-1}{\boldsymbol{\Sigma}}_2\right)$
with
${\xi}_{2i}\sim Gamma\left(1.5,1.5\right)$
and
${{\mathbf{E}}_{3\left[i,\right]}^{\prime}}\sim {N}_Q\left(\mathbf{0},{\mathbf{D}}_{3i}^{1/2}{\boldsymbol{\Sigma}}_3{\mathbf{D}}_{3i}^{1/2}\right)$
with
${\mathbf{D}}_{3i}=diag\left({\xi}_{3i}^{-1}{\mathbf{1}}_T, {\overset{\sim }{\sigma}}^2{\phi}_{3i}^{-1}{\mathbf{1}}_{Q-T}\right)$
,
${\xi}_{3i}\sim Gamma\left(2.5,2.5\right)$
,
${\overset{\sim }{\sigma}}^2={\pi}^2\left(7.3-2\right)/\left(3\times 7.3\right)$
, and
${\phi}_{3i}\sim Gamma\left(7.3/2,7.3/2\right)$
. In (5), the true parameter values were set at
${\mathbf{a}}_{02}=\left[\begin{array}{c}1\\ {}1\end{array}\right]$
,
$\mathbf{W}=\left[\begin{array}{cc}{w}_{11}& {w}_{12}\\ {}{w}_{21}& {w}_{22}\\ {w}_{31}& {w}_{32}\\ {w}_{41} & {w}_{42}\\ {w}_{51}& {w}_{52}\\ {w}_{61} & {w}_{62}\\ {}{w}_{71}& {w}_{72} \end{array}\right]=\left[\begin{array}{cc}0.9& 0\\ 0.5 & 0\\ 0 & 0\\ 0 & 0\\ 0 & 0 \\ 0& 0.5\\ 0& 0.5\end{array}\right]$
,
${\mathbf{A}}_2=\left[\begin{array}{cc}{a}_{2\left[1,1\right]}& {a}_{2\left[1,2\right]}\\ {}{a}_{2\left[2,1\right]}& {a}_{2\left[2,2\right]}\end{array}\right]=\left[\begin{array}{cc}2& 0\\ {}0& 2\end{array}\right]$
,
${\mathbf{a}}_{03}=\left[\begin{array}{c}1\\ {}0\\ {}2\\ {}-1\end{array}\right]$
,
${\mathbf{A}}_3=\left[\begin{array}{cccc}{a}_{3\left[1,1\right]}& {a}_{3\left[1,2\right]}& {a}_{3\left[1,3\right]}& {a}_{3\left[1,4\right]}\\ {}{a}_{3\left[2,1\right]}& {a}_{3\left[2,2\right]}& {a}_{3\left[2,3\right]}& {a}_{3\left[2,4\right]}\end{array}\right]=\left[\begin{array}{cccc}1& 2& 1& 2\\ {}0& 1.5& 1.5& 0\end{array}\right]$
, and
${\mathbf{A}}_4=\left[\begin{array}{cccc}{a}_{4\left[1,1\right]}& {a}_{4\left[1,2\right]}& {a}_{4\left[1,3\right]}& {a}_{4\left[1,4\right]}\\ {}{a}_{4\left[2,1\right]}& {a}_{4\left[2,2\right]}& {a}_{4\left[2,3\right]}& {a}_{4\left[2,4\right]}\end{array}\right]=\left[\begin{array}{cccc}0& 1& 0.5& 0\\ {}2& 1& 1.5& 2\end{array}\right]$
with the covariance matrices
${\boldsymbol{\Sigma}}_2=\left[\begin{array}{cc}1& 0.3\\ {}0.3& 1\end{array}\right]$
and
${\boldsymbol{\Sigma}}_3=\left[\begin{array}{cccc}0.8& 0.3& 0.3& 0.3\\ {}0.3& 0.8& 0.3& 0.3\\ {}0.3& 0.3& 1& 0.3\\ {}0.3& 0.3& 0.3& 1\end{array}\right]$
. In addition,
${\mathbf{X}}_{\left[i,\right]}$
was sampled from a multivariate normal distribution
${N}_p\left(\mathbf{0},\boldsymbol{\Omega} \right)$
with
$\boldsymbol{\Omega} =\left[\begin{array}{cccc}1& 0.3& 0.1& 0.1\\ {}0.3& 1& 0.1& 0.1\\ {}0.1& 0.1& 1& 0.6\\ {}0.1& 0.1& 0.6& 1\end{array}\right]$
. Here
$P=7$
,
$K=2$
,
$S=2$
,
$T=2$
, and
$Q=4$
.
Note that, after integrating over ${\xi}_{3i}$, the first two elements of ${{\mathbf{E}}_{3\left[i,\right]}^{\prime }}$ were sampled from a multivariate $t$ distribution with mean $\mathbf{0}$ and scale matrix ${\boldsymbol{\Sigma}}_{3\left[1:2,1:2\right]}$, and, after integrating over ${\phi}_{3i}$, the next two elements were sampled from a multivariate $t$ distribution with mean $\mathbf{0}$ and scale matrix ${\overset{\sim }{\sigma}}^2{\boldsymbol{\Sigma}}_{3\left[3:4,3:4\right]}$, while all four elements remained correlated. Multivariate logistic random variables underlying the two ordinal outcomes were then generated by transforming each of the multivariate $t$-distributed random variables in ${\mathbf{E}}_{3\left[i,3:4\right]}$ using the following formula proposed by O’Brien and Dunson (Reference O’Brien and Dunson2004),

where
${\mu}_q={\mathbf{a}}_{03\left[q\right]}+{{\mathbf{A}}_{3\left[,q\right]}^{\prime }}{\mathbf{W}}^{\prime }{{\mathbf{X}}_{\left[i,\right]}^{\prime }}+{{\mathbf{A}}_{4\left[,q\right]}^{\prime }}{{\mathbf{M}}_{\left[i,\right]}^{\prime }}$
and
${F}_{\nu}\left(\cdot \right)$
is the cumulative distribution function of Student’s $t$ distribution with
$\nu$
degrees of freedom. The transformation from multivariate t variables to multivariate logistic variables is derived in detail in the supplementary information of Kyung et al. (Reference Kyung, Park and Choi2021). Once
${\mathbf{Z}}_{\left[i,q\right]}^{\ast }$
for
$q=3,4$
, was generated, two sets of cutoffs were used to obtain the ordinal variables:
${\boldsymbol{\Gamma}}_{\left[,3\right]}={\left(0,3,6\right)}^{\prime }$
and
${\boldsymbol{\Gamma}}_{\left[,4\right]}={\left(0,2,4\right)}^{\prime }$
.
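The generation of the latent logistic variables underlying the ordinal outcomes can be sketched as follows. This is a simplified, univariate illustration rather than the authors' code: a Student's $t$ draw with $\nu = 7.3$ degrees of freedom is mapped to an exact logistic draw through the probability integral transform $z = \mu + F_L^{-1}\!\left(F_\nu(t)\right)$, where $F_L^{-1}$ is the standard logistic quantile function, and the cutoffs then discretize $z$ into ordinal categories. The value of `mu` is illustrative; the cutoffs mirror those used for ${\boldsymbol{\Gamma}}_{\left[,3\right]}$.

```python
import numpy as np
from scipy.stats import t as student_t

rng = np.random.default_rng(1)
nu = 7.3                      # degrees of freedom, as in O'Brien & Dunson
n = 100_000
mu = 1.0                      # illustrative linear-predictor value

# Draw standard-t errors and map them to exact logistic errors via the
# probability integral transform: u = F_nu(t) is Uniform(0, 1), and the
# standard logistic quantile function is logit(u).
t_draws = student_t.rvs(nu, size=n, random_state=rng)
u = student_t.cdf(t_draws, nu)
z = mu + np.log(u / (1.0 - u))    # latent logistic variable

# Discretize with the cutoffs (0, 3, 6) to obtain ordinal categories 0..3.
cutoffs = np.array([0.0, 3.0, 6.0])
y = np.searchsorted(cutoffs, z)
```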
For the proposed method, the hyperparameters for diffuse prior distributions were set with all the prior mean parameters such as
${\mathbf{W}}_{\left[,k\right]}^0$
being zero,
${\tau}^2{\boldsymbol{\Sigma}}_{{\mathbf{W}}_k}^0=10{\mathbf{I}}_7$
,
${\eta}_2^2{\boldsymbol{\Sigma}}_{{\mathbf{A}}_{2k}}^0=100{\mathbf{I}}_2$
,
${\eta}_3^2{\boldsymbol{\Sigma}}_{{\mathbf{A}}_{3k}}^0={\eta}_4^2{\boldsymbol{\Sigma}}_{{\mathbf{A}}_{4s}}^0=100{\mathbf{I}}_4$
,
${\sigma}_2^2{\boldsymbol{\Sigma}}_{{\mathbf{a}}_{02}}^0=100{\mathbf{I}}_2$
,
${\sigma}_3^2{\boldsymbol{\Sigma}}_{{\mathbf{a}}_{03}}^0=100{\mathbf{I}}_4$
,
${\nu}_2={\nu}_3=0.01$
,
${\boldsymbol{\Sigma}}_{20}^{-1}=100{\mathbf{I}}_2$
,
${\boldsymbol{\Sigma}}_{30}^{-1}=100{\mathbf{I}}_4$
, and the total number of iterations was set at 30,000. Note that this hyperparameter setting and the total number of iterations remained the same for all analyses conducted in this manuscript (including the empirical dataset presented in the next section). The first 2,000 iterations were discarded as a burn-in period, and every fifth posterior sample was retained (thinning) to compute the posterior means and credible intervals (CIs) for the parameters of interest. The relevant simulation code, written in R (R Core Team, 2023), is available on GitHub. Fast convergence and good mixing of the MCMC chain were observed for all parameters of interest across all the simulation scenarios considered. As an example, trace plots are presented in Figure 3.
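The burn-in and thinning scheme described above reduces to simple index selection on the stored chain; a minimal sketch with a stand-in chain of draws (the normal draws are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
draws = rng.normal(size=30_000)      # stand-in for one parameter's MCMC chain

burn_in, thin = 2_000, 5
kept = draws[burn_in::thin]          # discard burn-in, keep every 5th draw

post_mean = kept.mean()
ci_low, ci_up = np.quantile(kept, [0.025, 0.975])   # 95% credible interval
```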

Figure 3 Trace plots of selected parameters for the simulation study with N = 100.
Table 1 presents the results of analyzing the simulated data with two continuous and two ordinal outcome variables across varying sample sizes. As the sample size increased, the posterior mean estimates obtained from the proposed method became closer on average to the true parameter values, which were well covered by correspondingly narrower CIs. The residual variances and covariances for the two mediators and four outcome variables (
${\boldsymbol{\Sigma}}_2$
and
${\boldsymbol{\Sigma}}_3$
) exhibited similar patterns: with increased sample sizes, their estimates became closer to the prescribed true values with more precision (resulting in narrower CIs). This indicated that the proposed method is capable of taking into account the dependency among the outcome variables, recovering the true parameter values well.
Table 1 Results of the simulation study varying sample sizes (N = 100, 300, 500)

Note: Post.mean = posterior mean; 95CI.low = lower bound of the 95% credible interval; and 95CI.up = upper bound of the 95% credible interval. These abbreviated terms remained the same hereinafter.
Table 2 displays the results for the indirect effects, whose true values were calculated as products of the prescribed true values for ${\mathbf{A}}_2$ and ${\mathbf{A}}_4$, i.e., ${\mathbf{A}}_2{\mathbf{A}}_4$. For instance, the true value of the indirect effect of F2 on Y1 via M1 was calculated by multiplying the true value of the effect of F2 on M1 by that of M1 on Y1 (i.e., $a_{2\left[2,1\right]}\times a_{4\left[1,1\right]}$). At each iteration of the MCMC, the posterior samples of ${\mathbf{A}}_2$ and ${\mathbf{A}}_4$ were used to obtain posterior samples of the indirect effects, from which we computed posterior means and CIs. As shown in Table 2, the posterior means of all indirect effect estimates were close to the prescribed true values, and this pattern became more pronounced, with narrower CIs, as the sample size increased.
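Forming indirect effects from posterior draws only requires the product ${\mathbf{A}}_2{\mathbf{A}}_4$ at each iteration, followed by summarizing the resulting draws. A sketch with hypothetical posterior samples (the array names and the values generating them are illustrative, not the paper's output):

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws, K, S, Q = 5600, 2, 2, 4

# Hypothetical posterior draws: A2 is K x S (components -> mediators),
# A4 is S x Q (mediators -> outcomes).
A2_draws = rng.normal(loc=2.0, scale=0.1, size=(n_draws, K, S))
A4_draws = rng.normal(loc=1.0, scale=0.1, size=(n_draws, S, Q))

# Indirect effects A2 @ A4 at each iteration (draw-wise matrix product).
indirect = A2_draws @ A4_draws            # shape (n_draws, K, Q)

post_mean = indirect.mean(axis=0)
ci_low, ci_up = np.quantile(indirect, [0.025, 0.975], axis=0)
```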
Table 2 Results of indirect effect estimates obtained from the simulation study varying across sample sizes

For comparison, using the same generated datasets from this simulation study, we fitted the same model but excluded the two mediators, corresponding to the model in Equation (2) without H. Table 3 displays the results without the mediators. Although no true values were assigned to the direct effects between predictors and outcome variables (because the data were generated with multiple mediators), each direct effect could be inferred indirectly from the total effect (i.e., the sum of the indirect and direct effects) obtained from the model with the mediators. These direct effects, however, were only rough estimates after controlling for the other predictor(s) and mediator(s); an exact calculation would be feasible only in a setting with a single predictor and a single mediator and no covariates. When the model was mis-specified by omitting the mediators, we found that the estimates of the intercepts a and the residual variance-covariance matrix
${\boldsymbol{\Sigma}}_3$
were more prone to bias than the weight estimates W. The weight estimates remained largely consistent with the earlier results shown in Table 1, whereas the elements of a and
${\boldsymbol{\Sigma}}_3$
exhibited greater bias and susceptibility to the absence of the mediators. This indicated that the direct effects alone could not account for all of the omitted information associated with the mediators, which instead remained as unexplained variation in the outcome variables, thereby biasing the estimates of the intercepts and the residual variance-covariance matrix.
Table 3 Results of total effect estimates without the mediators from the simulation study varying across sample sizes

* True values are equivalent to Table 1 as the simulation datasets were generated based on a model with the two mediators.
4 An illustrative example
We used a subset of the National Survey on Drug Use and Health (NSDUH) data (Substance Abuse and Mental Health Services Administration (SAMHSA), United States Department of Health and Human Services, 2015), in which 2,347 respondents aged 12 or older (N = 2,347) answered a number of questionnaire items concerning substance use and health in 2012. As shown in Figure 4, we constructed two components: socioeconomic status (SES) and drug history (DRH). The first component, SES, was defined as a linear combination of four observed variables: education, insurance, family income, and employment status. The second component, DRH, was constructed as a linear combination of cigarette and alcohol use, measured as the age at first use. The extent to which participants were dependent on nicotine, measured by the nicotine dependence syndrome scale (nicotine), served as one of the outcome variables. Another continuous outcome variable measured past-month psychological distress (Pdistress). The remaining two outcome variables were binary, indicating whether respondents had been dependent on alcohol (alcohol) or pain relievers (pain reliever) in the past year. In this example, we examined the effects of SES and drug history on alcohol dependence, psychological distress, and other drug addictions while exploring perceived mental and physical health conditions as potential mediators. Specifically, there were three mediators: distress (the level of psychological distress in the past year, excluding the latest month used for measuring Pdistress), health (the overall perceived health condition), and disturb (the level of psychological impairment and disturbance in social adjustment and behavior). It was hypothesized that perceived mental and physical health conditions would mediate the relationships of SES and drug history with alcohol and drug addictions.

Figure 4 A hypothesized model for the empirical data.
We fitted this dataset with the proposed method; fast convergence was observed across all parameters of interest, as shown in Figure 5. The results for the parameters are presented in Table 4. Turning to the direct effects of the components on the outcome variables given in $\mathbf{A}_3$, the two components SES and DRH showed significant and negative direct effects on nicotine dependence and pain reliever dependence but not on alcohol dependence. This suggested that those who had higher socioeconomic status and were exposed to cigarettes later were less likely to show nicotine dependence ($a_{3\left[3,1\right]}$ = −0.090, 95% CI = [−0.117, −0.063]; $a_{3\left[4,1\right]}$ = −0.120, 95% CI = [−0.143, −0.098]) and pain reliever dependence ($a_{3\left[3,4\right]}$ = 0.383, 95% CI = [0.209, 0.701]; $a_{3\left[4,4\right]}$ = 0.381, 95% CI = [0.223, 0.646]). The odds of pain reliever dependence were reduced by approximately 62% for a one-unit increase in each of SES and DRH. Similarly, the direct effects of the two components on past-month psychological distress (Pdistress) were both negative, although only the effect of SES was statistically significant. Among the covariates, age showed consistently significant effects on nicotine dependence, past-month distress, and pain reliever dependence ($a_{3\left[2,1\right]}$ = 0.137, 95% CI = [0.107, 0.167]; $a_{3\left[2,2\right]}$ = −0.041, 95% CI = [−0.067, −0.015]; $a_{3\left[2,4\right]}$ = 0.433, 95% CI = [0.205, 0.925]): older people tended to show more nicotine dependence but less past-month distress and pain reliever dependence. Each additional year of age was associated with a 56.7% decrease in the odds of pain reliever dependence. Females showed a much higher rate of alcohol dependence than males (1.44 times larger odds: $a_{3\left[1,2\right]}$ = 1.439, 95% CI = [1.171, 1.776]), but no significant direct effect on nicotine dependence, past-month distress, or pain reliever dependence.

Figure 5 Trace plots of selected parameters for the empirical data.
Table 4 Results of fitting the National Survey on Drug Use and Health (NSDUH) data

* The parameters loaded on binary outcome variables are reported by exponentiating the original estimates (i.e., as odds ratios). Estimates reported as odds ratios are displayed with grey shading behind the text. The same applies to the following table.
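The odds-ratio interpretations used in the text follow directly from exponentiating coefficients on the log-odds scale; for instance, the reported (exponentiated) estimate of 0.433 corresponds to a 56.7% reduction in the odds per one-unit increase. A small check:

```python
import math

# A reported (already exponentiated) odds ratio, as in Table 4.
odds_ratio = 0.433
pct_change = (odds_ratio - 1.0) * 100    # negative => decrease in the odds

# On the log-odds scale the same coefficient would be log(0.433).
log_odds_coef = math.log(odds_ratio)
```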
The estimates for the mediation effects of distress, health, and disturb are presented in Table 5. The indirect effects through distress were significant for the two components as well as the two covariates. SES and DRH were significant direct predictors of lower nicotine dependence (as shown in $\mathbf{A}_3$ in Table 4) and also had significant indirect effects through distress ($a_{2\left[3,1\right]}\times a_{4\left[1,1\right]}$ = −0.008, 95% CI = [−0.013, −0.004] for the indirect effect of SES on nicotine; $a_{2\left[4,1\right]}\times a_{4\left[1,1\right]}$ = −0.005, 95% CI = [−0.009, −0.002] for the indirect effect of DRH on nicotine). SES and DRH led to lower levels of distress, which in turn had a positive impact on nicotine dependence. That is, those with higher socioeconomic status and later exposure to cigarettes showed a lower level of distress ($a_{2\left[3,1\right]}$ = −0.100, 95% CI = [−0.133, −0.068]; $a_{2\left[4,1\right]}$ = −0.064, 95% CI = [−0.095, −0.034]), which in turn was associated with less nicotine addiction ($a_{4\left[1,1\right]}$ = 0.085, 95% CI = [0.049, 0.122]). The same relationship held for the mediation effect of distress on past-month distress ($a_{2\left[3,1\right]}\times a_{4\left[1,2\right]}$ = −0.047, 95% CI = [−0.063, −0.032] for the indirect effect of SES; $a_{2\left[4,1\right]}\times a_{4\left[1,2\right]}$ = −0.030, 95% CI = [−0.045, −0.016] for the indirect effect of DRH). For alcohol dependence, the odds were reduced by about 10% through the mediation effect via distress ($a_{2\left[3,1\right]}\times a_{4\left[1,3\right]}$ = 0.876, 95% CI = [0.817, 0.930] for the indirect effect of SES on alcohol; $a_{2\left[4,1\right]}\times a_{4\left[1,3\right]}$ = 0.919, 95% CI = [0.870, 0.962] for the indirect effect of DRH on alcohol). While notable and significant indirect effects were observed for nicotine, past-month distress, and alcohol dependence through a lower level of past-year distress, the indirect effects on pain reliever dependence were not found to be significant.
Table 5 Results of indirect effect estimates obtained from the empirical NSDUH data

In addition to the significant direct effects of SES and DRH on nicotine dependence ($a_{3\left[3,1\right]}$ = −0.090, 95% CI = [−0.117, −0.063] for SES; $a_{3\left[4,1\right]}$ = −0.120, 95% CI = [−0.143, −0.098] for DRH), SES and DRH had significant indirect effects on nicotine dependence ($a_{2\left[3,2\right]}\times a_{4\left[2,1\right]}$ = −0.018, 95% CI = [−0.026, −0.011] for SES via health; $a_{2\left[4,2\right]}\times a_{4\left[2,1\right]}$ = −0.005, 95% CI = [−0.008, −0.002] for DRH via health) and on past-month distress ($a_{2\left[3,2\right]}\times a_{4\left[2,2\right]}$ = −0.024, 95% CI = [−0.031, −0.017] for SES via health; $a_{2\left[4,2\right]}\times a_{4\left[2,2\right]}$ = −0.006, 95% CI = [−0.010, −0.003] for DRH via health) through a lower risk of health problems. The odds of pain reliever dependence increased by factors of 1.231 ($a_{2\left[3,2\right]}\times a_{4\left[2,4\right]}$ = 1.231, 95% CI = [1.038, 1.457]) and 1.056 ($a_{2\left[4,2\right]}\times a_{4\left[2,4\right]}$ = 1.056, 95% CI = [1.009, 1.120]), respectively, through the mediation effect via health. No such pattern was found for alcohol dependence. Lastly, one’s perceived level of disturbance in social adjustment and behavior significantly mediated the relationships between the two components and three outcome variables (past-month distress, alcohol dependence, and pain reliever dependence). The estimated indirect effects of SES and DRH were significant on past-month distress ($a_{2\left[3,3\right]}\times a_{4\left[3,2\right]}$ = −0.006, 95% CI = [−0.010, −0.002]; $a_{2\left[4,3\right]}\times a_{4\left[3,2\right]}$ = −0.002, 95% CI = [−0.005, −0.001]), alcohol dependence ($a_{2\left[3,3\right]}\times a_{4\left[3,3\right]}$ = 0.910, 95% CI = [0.855, 0.959]; $a_{2\left[4,3\right]}\times a_{4\left[3,3\right]}$ = 0.962, 95% CI = [0.926, 0.989]), and pain reliever dependence ($a_{2\left[3,3\right]}\times a_{4\left[3,4\right]}$ = 0.846, 95% CI = [0.753, 0.934]; $a_{2\left[4,3\right]}\times a_{4\left[3,4\right]}$ = 0.933, 95% CI = [0.867, 0.981]) via disturbance. SES and DRH showed negative and significant effects on disturbance, which in turn had positive and significant effects on all three outcome variables. Individuals with a lower perceived level of disturbance, associated with higher SES and later onset of cigarette use, exhibited reduced addictive consumption of alcohol and pain relievers (though not reduced nicotine dependence). Specifically, in terms of odds ratios obtained by exponentiating the corresponding posterior estimates, a one-unit increase in SES and DRH was associated with a 9.0% and 3.8% decrease, respectively, in the odds of alcohol dependence, mediated by the level of psychological impairment and disturbance. The same held for the mediated effects of SES and DRH on pain reliever dependence via disturbance, corresponding to a 15.4% and 6.7% decrease in the odds, respectively.
5 Concluding remarks
We introduced a multivariate component-based regression model designed to handle mixed types of outcomes and to estimate multiple pathways of indirect effects with multiple mediators within a Bayesian framework. The efficacy of the proposed approach was validated through simulated and real data examples. The simulation design was intentionally kept simple, with clearly distinguishable true values, and focused on parameter estimation. In the real-world application, we analyzed a subset of the NSDUH data to elucidate how the underlying mechanism of perceived mental and physical health conditions influences the relationship between the components (SES and DRH) and drug dependence (nicotine, alcohol, and pain reliever).
In addition to its technical and empirical implications, the proposed method holds the potential for further enhancement in flexibility. An intriguing extension could involve the incorporation of variable selection techniques. While the current conceptualization of the method relies on a predetermined model, there are scenarios where it may be preferable to select an optimal subset of mediators, especially when dealing with a large number of potential mediators. The inclusion of techniques such as lasso (Tibshirani, Reference Tibshirani1996) or elastic net (Zou & Hastie, Reference Zou and Hastie2005) for variable selection could help eliminate irrelevant mediators, thereby improving the interpretability of potential indirect effects within the model.
Moreover, the current proposed method assumes that subjects in the dataset are randomly sampled from the same population. Consequently, it is not possible with the current version to assess differences in effect estimates for different subject clusters using the mediation model with mixed types of outcome data. Considering a recent study proposing a Bayesian approach to ERA with mixture modeling (Kyung et al., Reference Kyung, Park and Choi2021), future investigations are warranted to explore the technical and empirical feasibility of incorporating such settings.
Acknowledgements
This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1F1A1049834; NRF-2021R1F1A1049834).
Software availability
To ensure applicability, we have shared an R function along with the simulation code specific to this paper on GitHub (https://github.com/jhppack/BERA_mediation).
1 Appendix
A1 Prior distribution
In the proposed method, we consider the following conjugate distributions for specifying priors on the parameters of interest:
-
•
${\mathbf{W}}_{\left[,k\right]}\mid {\tau}^2\sim {N}_{p_k}\left({\mathbf{W}}_{\left[,k\right]}^0,\ {\tau}^2{\boldsymbol{\Sigma}}_{{\mathbf{W}}_k}^0\right)$ for
$k=1,\dots, K$
-
•
${{\mathbf{A}}_{2\left[k,\right]}^{\prime}}\mid {\eta}_2^2\sim {N}_S\left({{\mathbf{A}}_{2\left[k,\right]}^{0\prime }},\ {\eta}_2^2{\boldsymbol{\Sigma}}_{{\mathbf{A}}_{2k}}^0\right)$ for
$k=1,\dots, K$
-
•
${{\mathbf{B}}_{2\left[d,\right]}^{\prime}}\mid {\delta}_2^2\sim {N}_S\left({{\mathbf{B}}_{2\left[d,\right]}^{0\prime }},\ {\delta}_2^2{\boldsymbol{\Sigma}}_{B_{2d}}^0\right)$ for
$d=1,\dots, D$
-
•
${\mathbf{a}}_2\mid {\sigma}_2^2\sim {N}_S\left({\boldsymbol{\alpha}}_2^0,\ {\sigma}_2^2{\boldsymbol{\Sigma}}_{{\mathbf{a}}_{20}}^0\right)$
-
•
${\boldsymbol{\Sigma}}_2^{-1}\mid {\nu}_2\sim {Wishart}_S\left({\nu}_2,\ {\boldsymbol{\Sigma}}_{20}^{-1}\right)$ where
${\nu}_2\ge S$
-
•
${{\mathbf{A}}_{3\left[k,\right]}^{\prime}}\mid {\eta}_3^2\sim {N}_Q\left({{\mathbf{A}}_{3\left[k,\right]}^{0\prime }},\ {\eta}_3^2{\boldsymbol{\Sigma}}_{{\mathbf{A}}_{3k}}^0\right)$ for
$k=1,\dots, K$
-
•
${{\mathbf{A}}_{4\left[s,\right]}^{\prime}}\mid {\eta}_4^2\sim {N}_Q\left({{\mathbf{A}}_{4\left[s,\right]}^{0\prime }},\ {\eta}_4^2{\boldsymbol{\Sigma}}_{{\mathbf{A}}_{4s}}^0\right)$ for
$s=1,\dots, S$
-
•
${{\mathbf{B}}_{3\left[d,\right]}^{\prime}}\mid {\delta}_3^2\sim {N}_Q\left({{\mathbf{B}}_{3\left[d,\right]}^{0\prime }},\ {\delta}_3^2{\boldsymbol{\Sigma}}_{B_{3d}}^0\right)$ for
$d=1,\dots, D$
-
•
${\mathbf{a}}_3\mid {\sigma}_3^2\sim {N}_Q\left({\boldsymbol{\alpha}}_3^0,\ {\sigma}_3^2{\boldsymbol{\Sigma}}_{{\mathbf{a}}_{30}}^0\right)$
-
•
${\boldsymbol{\Sigma}}_3^{-1}\mid {\nu}_3\sim {Wishart}_Q\left({\nu}_3,\ {\boldsymbol{\Sigma}}_{30}^{-1}\right)$ where
${\nu}_3\ge Q$
-
•
$1/{\nu}_2^{\ast}\sim U\left[0,1\right]$
-
•
$1/{\nu}_3^{\ast}\sim U\left[0,1\right]$
-
•
${\boldsymbol{\Gamma}}_{\left[j,q\right]}\equiv {\gamma}_j^q\sim \pi \left({\gamma}_j^q\right)\propto constant$ for
$j=1,\dots, C-1$ and
$q=T+1,\dots, Q$ .
Here,
${\mathbf{W}}_{\left[,k\right]}$
refers to a
${p}_k$
by 1 column vector containing the weight estimates for the kth component,
${{\mathbf{A}}_{2\left[k,\right]}^{\prime }}$
is a
$S$
by 1 column vector containing the component coefficients for the kth component affecting the mediators,
${\mathbf{A}}_{3\left[k,\right]}^{\prime }$
is a
$Q$
by 1 column vector containing the component coefficients for the kth component affecting outcome variables when adjusted for the mediator and
${\mathbf{A}}_{4\left[s,\right]}^{\prime }$
is a
$Q$
by 1 column component coefficients vector of the sth mediation variable adjusted for the predictors. Also,
${\mathbf{B}}_{2\left[d,\right]}^{\prime }$
is a
$S$
by 1 column vector containing the regression coefficients for the dth predictor affecting the mediators and
${\mathbf{B}}_{3\left[d,\right]}^{\prime }$
is a
$Q$
by 1 column vector containing the regression coefficients for the dth predictor affecting the outcomes.
$\boldsymbol{\Gamma}$
is a
$C-1$
by
$Q-T$
matrix of
$C-1$
cutpoints for
$Q-T$
ordinal responses.
A2 Sampling scheme
An overview of the general sampling scheme is as follows.
ERA structure parameter
-
•
${\mathbf{W}}_{\left[,k\right]}\mid {\mathbf{W}}_{\left[,-k\right]},{\mathbf{A}}_2,{\mathbf{A}}_3,{\mathbf{A}}_4,{\mathbf{B}}_2,{\mathbf{B}}_3,{\mathbf{a}}_2,{\mathbf{a}}_3,{\boldsymbol{\Sigma}}_2,{\boldsymbol{\Sigma}}_3,\boldsymbol{\Gamma}, {\phi}_3,{\boldsymbol{\xi}}_3,{\nu}_3^{\ast },{\boldsymbol{\xi}}_2,{\nu}_2^{\ast },\mathbf{Z},\mathbf{X},\mathbf{M},\mathbf{Y}$
$\sim {N}_{p_k}\left({\mathbf{W}}_{\left[,k\right]}^{\ast },{\boldsymbol{\Sigma}}_{{\mathbf{W}}_k}\right)$
for
$k=1,\cdots, K$ where
${\boldsymbol{\Sigma}}_{{\mathbf{W}}_k}={\left(\sum \limits_{i=1}^N{\xi}_{2i}{{\mathbf{X}}_{\left[i,\right]}^{\prime }}{\mathbf{A}}_{2\left[k,\right]}{\boldsymbol{\Sigma}}_2^{-1}{{\mathbf{A}}_{2\left[k,\right]}^{\prime }}{\mathbf{X}}_{\left[i,\right]}+\sum \limits_{i=1}^N{{\mathbf{X}}_{\left[i,\right]}^{\prime }}{\mathbf{A}}_{3\left[k,\right]}{\mathbf{D}}_{3i}^{-1/2}{\boldsymbol{\Sigma}}_3^{-1}{\mathbf{D}}_{3i}^{-1/2}{{\mathbf{A}}_{3\left[k,\right]}^{\prime }}{\mathbf{X}}_{\left[i,\right]}+\frac{1}{\tau^2}{{\boldsymbol{\Sigma}}_{{\mathbf{W}}_k}^{0-1}}\right)}^{-1}$
${\mathbf{W}}_{\left[,k\right]}^{\ast} = {\boldsymbol{\Sigma}}_{{\mathbf{W}}_k}\left(\sum \limits_{i=1}^N{\xi}_{2i}{{\mathbf{X}}_{\left[i,\right]}^{\prime }}{\mathbf{A}}_{2\left[k,\right]}{\boldsymbol{\Sigma}}_2^{-1}{{\mathbf{M}}_{\left[i,\right]}^{-k\prime }}+\sum \limits_{i=1}^N{{\mathbf{X}}_{\left[i,\right]}^{\prime }}{\mathbf{A}}_{3\left[k,\right]}{\mathbf{D}}_{3i}^{-1/2}{\boldsymbol{\Sigma}}_3^{-1}{\mathbf{D}}_{3i}^{-1/2}{{\mathbf{Z}}_{\left[i,\right]}^{-k\prime }}+\frac{1}{\tau^2}{{\boldsymbol{\Sigma}}_{{\mathbf{W}}_k}^{0-1}}{\mathbf{W}}_{\left[,k\right]}^0\right),$
${\mathbf{M}}_{\left[i,\right]}^{-k}={\mathbf{M}}_{\left[i,\right]}-{{\mathbf{a}}_2^{\prime }}-{\mathbf{H}}_{\left[i,\right]}{\mathbf{B}}_2-{\mathbf{X}}_{\left[i,\right]}{\mathbf{W}}_{\left[,-k\right]}{\mathbf{A}}_{2\left[-k,\right]}$ , and
${\mathbf{Z}}_{\left[i,\right]}^{-k}={\mathbf{Z}}_{\left[i,\right]}-{{\mathbf{a}}_3^{\prime }}-{\mathbf{H}}_{\left[i,\right]}{\mathbf{B}}_3-{\mathbf{X}}_{\left[i,\right]}{\mathbf{W}}_{\left[,-k\right]}{\mathbf{A}}_{3\left[-k,\right]}-{\mathbf{M}}_{\left[i,\right]}{\mathbf{A}}_4$ .
-
• To satisfy the standardization constraint
$\mathit{diag}\left({\mathbf{F}}^{\prime}\mathbf{F}\right)=\mathit{diag}\left({\mathbf{W}}^{\prime }{\mathbf{X}}^{\prime}\mathbf{XW}\right)=N\mathbf{I}$ , we standardize
$\mathbf{W}$ as
$\overline{\mathbf{W}}=\sqrt{N}\,\mathbf{W}{\left({\mathbf{W}}^{\prime }{\mathbf{X}}^{\prime}\mathbf{XW}\right)}^{-1/2}$ .
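The standardization step can be sketched numerically; this is an illustrative stand-alone computation (the dimensions and random inputs are hypothetical), applying the inverse symmetric square root of ${\mathbf{W}}^{\prime}{\mathbf{X}}^{\prime}\mathbf{X}\mathbf{W}$ so that $\mathbf{F}=\mathbf{X}\overline{\mathbf{W}}$ satisfies ${\mathbf{F}}^{\prime}\mathbf{F}=N\mathbf{I}$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, K = 100, 7, 2                  # illustrative dimensions
X = rng.normal(size=(N, P))
W = rng.normal(size=(P, K))

# Inverse symmetric square root of M = W'X'XW via eigendecomposition.
M = W.T @ X.T @ X @ W                # K x K, symmetric positive definite
vals, vecs = np.linalg.eigh(M)
M_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T

# Standardized weights and components: F'F = N * I.
W_bar = np.sqrt(N) * W @ M_inv_sqrt
F = X @ W_bar
```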
Mediator model parameters
-
•
${\xi}_{2i}\mid {\mathbf{Z}}_{\left[i,\right]},{\mathbf{X}}_{\left[i,\right]},\mathbf{W},{\mathbf{A}}_3,\mathbf{H},{\mathbf{B}}_3,\mathbf{M},{\mathbf{A}}_4,{\boldsymbol{\Sigma}}_3,{\mathbf{a}}_3,\boldsymbol{\Gamma}, {\phi}_{3i},{\nu}_3^{\ast },{\xi}_{3i},{\nu}_2^{\ast },\mathbf{Y}$
$\sim Gamma\left(\frac{S+{\nu}_2^{\ast }}{2}, \frac{\nu_2^{\ast }}{2}+\frac{1}{2}{\mathbf{E}}_{2\left[i,\right]}{{\boldsymbol{\Sigma}}_2^{-1}}{{\mathbf{E}}_{2\left[i,\right]}^{\prime}}\right)$
where
${\mathbf{E}}_{2\left[i,\right]}={\mathbf{M}}_{\left[i,\right]}-{{\mathbf{a}}_2^{\prime }}-{\mathbf{H}}_{\left[i,\right]}{\mathbf{B}}_2-{\mathbf{X}}_{\left[i,\right]}{\mathbf{WA}}_2$ for
$i=1,\cdots, N$ .
-
•
${\nu}_2^{\ast}\mid \mathbf{Z},\mathbf{X},\mathbf{W},{\mathbf{A}}_3,\mathbf{H},{\mathbf{B}}_3,\mathbf{M},{\mathbf{A}}_4,{\boldsymbol{\Sigma}}_3,{\mathbf{a}}_3,\boldsymbol{\Gamma}, {\phi}_3,{\boldsymbol{\xi}}_3,{\nu}_3^{\ast },{\boldsymbol{\xi}}_2,\mathbf{Y}$
$\sim \pi \left({\nu}_2^{\ast }|\mathbf{Z},\mathbf{X},\mathbf{W},{\mathbf{A}}_3,\mathbf{H},{\mathbf{B}}_3,\mathbf{M},{\mathbf{A}}_4,{\boldsymbol{\Sigma}}_3,{\mathbf{a}}_3,\boldsymbol{\Gamma}, {\phi}_3,{\boldsymbol{\xi}}_3,{\nu}_3^{\ast },{\boldsymbol{\xi}}_2,\mathbf{Y}\right)$
$$\begin{align*}\propto \kern1em \frac{{\left(\frac{\nu_2^{\ast }}{2}\right)}^{\frac{\nu_2^{\ast }}{2}N}}{\varGamma {\left(\frac{\nu_2^{\ast }}{2}\right)}^N}{\left(\prod\limits_{i=1}^N{\xi}_{2i}\right)}^{\frac{\nu_2^{\ast }}{2}-1}\mathit{\exp}\left(-\frac{\nu_2^{\ast }}{2}\sum\limits_{i=1}^N{\xi}_{2i}\right)\end{align*}$$
$\equiv p\left({\nu_2^{\ast}}^{(t)}\right)$ (target distribution at
$t$ th iteration)
Metropolis step
-
⧫ generate
${{\overset{\sim }{\nu}}_2^{\ast}}^{-1}\sim N\left(\frac{1}{{\nu_2^{\ast}}^{\left(t-1\right)}}, 1\right)$ where
$N\left(\left.{{\overset{\sim }{\nu}}_2^{\ast}}^{-1}\right|\frac{1}{{\nu_2^{\ast}}^{\left(t-1\right)}}, 1\right)\equiv q\left({{\overset{\sim }{\nu}}_2^{\ast}}^{-1}|{\nu_2^{\ast}}^{\left(t-1\right)}\right)$ is a candidate distribution function of
${\nu}_2^{\ast }$ .
-
⧫ compute acceptance probability
$A\left({{\overset{\sim }{\nu}}_2^{\ast}}^{-1},{\nu_2^{\ast}}^{\left(t-1\right)}\right)$ such that
$$\begin{align*}A\left({{\overset{\sim }{\nu}}_2^{\ast}}^{-1},{\nu_2^{\ast}}^{\left(t-1\right)}\right)=\min \left(1, \frac{p\left({\overset{\sim }{\nu}}_2^{\ast}\right)}{p\left({\nu_2^{\ast}}^{\left(t-1\right)}\right)}\right).\end{align*}$$
-
⧫ set
${\nu_2^{\ast}}^{(t)}={\overset{\sim }{\nu}}_2^{\ast }$ with probability
$A\left({{\overset{\sim }{\nu}}_2^{\ast}}^{-1},{\nu_2^{\ast}}^{\left(t-1\right)}\right)$
-
-
•
${{\mathbf{A}}_{2\left[k,\right]}^{\prime}}\mid {\mathbf{A}}_{2\left[-k,\right]},\overline{\mathbf{W}},{\mathbf{A}}_3,{\mathbf{A}}_4,{\mathbf{B}}_2,{\mathbf{B}}_3,{\mathbf{a}}_2,{\mathbf{a}}_3,{\boldsymbol{\Sigma}}_2,{\boldsymbol{\Sigma}}_3,\boldsymbol{\Gamma}, {\phi}_3,{\boldsymbol{\xi}}_3,{\nu}_3^{\ast },{\boldsymbol{\xi}}_2,{\nu}_2^{\ast },\mathbf{Z},\mathbf{X},\mathbf{M},\mathbf{Y}$
$$\begin{align*}\sim {N}_S\left({{\overset{\sim }{\mathbf{A}}}_{2\left[k,\right]}^{\prime }},{\boldsymbol{\Sigma}}_{{\boldsymbol{A}}_{2k}}\right)\end{align*}$$
$k=1,\dots, K$ where
${\boldsymbol{\Sigma}}_{{\mathbf{A}}_{2k}}={\left(\sum \limits_{i=1}^N{\xi}_{2i}{\mathbf{X}}_{\left[i,\right]}{\overline{\mathbf{W}}}_{\left[,k\right]}{\boldsymbol{\Sigma}}_2^{-1}{{\overline{\mathbf{W}}}_{\left[,k\right]}^{\prime }}{{\mathbf{X}}_{\left[i,\right]}^{\prime }}+\frac{1}{\eta_2^2}{{\boldsymbol{\Sigma}}_{{\mathbf{A}}_{2k}}^{0-1}}\right)}^{-1}$ and
$$\begin{align*}{{\overset{\sim }{\mathbf{A}}}_{2\left[k,\right]}^{\prime}}={\boldsymbol{\Sigma}}_{{\mathbf{A}}_{2k}}\left({\sum}_{i=1}^N{\xi}_{2i}{\mathbf{X}}_{\left[i,\right]}{\overline{\mathbf{W}}}_{\left[,k\right]}{\boldsymbol{\Sigma}}_2^{-1}{{\mathbf{M}}_{\left[i,\right]}^{-k\prime }}+\frac{1}{\eta_2^2}{{\boldsymbol{\Sigma}}_{{\mathbf{A}}_{2k}}^{0-1}}{{\mathbf{A}}_{2\left[k,\right]}^{0\prime}}\right).\end{align*}$$
-
•
${{\mathbf{B}}_{2\left[d,\right]}^{\prime}}\mid {\mathbf{B}}_{2\left[-d,\right]},\overline{\mathbf{W}},{\mathbf{A}}_2,{\mathbf{A}}_3,{\mathbf{A}}_4,{\mathbf{B}}_3,{\mathbf{a}}_2,{\mathbf{a}}_3,{\boldsymbol{\Sigma}}_2,{\boldsymbol{\Sigma}}_3,\boldsymbol{\Gamma}, {\phi}_3,{\boldsymbol{\xi}}_3,{\nu}_3^{\ast },{\boldsymbol{\xi}}_2,{\nu}_2^{\ast },\mathbf{Z},\mathbf{X},\mathbf{M},\mathbf{Y}$
$$\begin{align*}\sim {N}_S\left({{\overset{\sim }{\mathbf{B}}}_{2\left[d,\right]}^{\prime }},{\boldsymbol{\Sigma}}_{{\mathbf{B}}_{2d}}\right)\end{align*}$$
$d=1,\dots, D$ where
${\boldsymbol{\Sigma}}_{{\mathbf{B}}_{2d}}={\left(\sum \limits_{i=1}^N{\xi}_{2i}{\mathbf{H}}_{\left[i,d\right]}{\boldsymbol{\Sigma}}_2^{-1}{{\mathbf{H}}_{\left[i,d\right]}^{\prime }}+\frac{1}{\delta_2^2}{{\boldsymbol{\Sigma}}_{{\mathbf{B}}_{2d}}^{0-1}}\right)}^{-1}$ and
${{\overset{\sim }{\mathbf{B}}}_{2\left[d,\right]}^{\prime}}={\boldsymbol{\Sigma}}_{{\mathbf{B}}_{2d}}\left({\sum}_{i=1}^N{\xi}_{2i}{\mathbf{H}}_{\left[i,d\right]}{\boldsymbol{\Sigma}}_2^{-1}{{\mathbf{M}}_{\left[i,\right]}^{-d\prime }}+\frac{1}{\delta_2^2}{{\boldsymbol{\Sigma}}_{{\mathbf{B}}_{2d}}^{0-1}}{{\mathbf{B}}_{2\left[d,\right]}^{0\prime}}\right)$ with
${\mathbf{M}}_{\left[i,\right]}^{-d}={\mathbf{M}}_{\left[i,\right]}-{{\mathbf{a}}_2^{\prime }}-{\mathbf{H}}_{\left[i,-d\right]}{\mathbf{B}}_{2\left[-d,\right]}-{\mathbf{X}}_{\left[i,\right]}\overline{\mathbf{W}}{\mathbf{A}}_2$ .
-
•
${\mathbf{a}}_2\mid \overline{\mathbf{W}},{\mathbf{A}}_2,{\mathbf{A}}_3,{\mathbf{A}}_4,{\mathbf{B}}_2,{\mathbf{B}}_3,{\mathbf{a}}_3,{\boldsymbol{\Sigma}}_2,{\boldsymbol{\Sigma}}_3,\boldsymbol{\Gamma}, {\phi}_3,{\boldsymbol{\xi}}_3,{\nu}_3^{\ast },{\boldsymbol{\xi}}_2,{\nu}_2^{\ast },\mathbf{Z},\mathbf{X},\mathbf{M},\mathbf{Y}$
$$\begin{align*}\sim {N}_S\left({\overset{\sim }{\mathbf{a}}}_2,{\left({\sum}_{i=1}^N{\xi}_{2i}{\boldsymbol{\Sigma}}_2^{-1}+\frac{1}{\sigma_2^2}{{\boldsymbol{\Sigma}}_{{\mathbf{a}}_2}^{0-1}}\right)}^{-1}\right)\end{align*}$$
${\overset{\sim }{\mathbf{a}}}_2={\left({\sum}_{i=1}^N{\xi}_{2i}{\boldsymbol{\Sigma}}_2^{-1}+\frac{1}{\sigma_2^2}{{\boldsymbol{\Sigma}}_{{\mathbf{a}}_2}^{0-1}}\right)}^{-1}\left({\boldsymbol{\Sigma}}_2^{-1}{\sum}_{i=1}^N{\xi}_{2i}{\left({\mathbf{M}}_{\left[i,\right]}-{\mathbf{H}}_{\left[i,\right]}{\mathbf{B}}_2-{\mathbf{X}}_{\left[i,\right]}\overline{\mathbf{W}}{\mathbf{A}}_2\right)}^{\prime }+\frac{1}{\sigma_2^2}{{\boldsymbol{\Sigma}}_{{\mathbf{a}}_2}^{0-1}}{\boldsymbol{\alpha}}_2^0\right)$ .
-
•
${\boldsymbol{\Sigma}}_2^{-1}\mid \overline{\mathbf{W}},{\mathbf{A}}_2,{\mathbf{A}}_3,{\mathbf{A}}_4,{\mathbf{B}}_2,{\mathbf{B}}_3,{\mathbf{a}}_2,{\mathbf{a}}_3,{\boldsymbol{\Sigma}}_3,\boldsymbol{\Gamma}, {\phi}_3,{\boldsymbol{\xi}}_3,{\nu}_3^{\ast },{\boldsymbol{\xi}}_2,{\nu}_2^{\ast },\mathbf{Z},\mathbf{X},\mathbf{M},\mathbf{Y}$
$\sim {Wishart}_S\left({\nu}_2+N,{\left({\boldsymbol{\Sigma}}_{20}+\sum \limits_{i=1}^N{\xi}_{2i}{{\mathbf{E}}_{2\left[i,\right]}^{\prime }}{\mathbf{E}}_{2\left[i,\right]}\right)}^{-1}\right)$ where
${\mathbf{E}}_{2\left[i,\right]}={\mathbf{M}}_{\left[i,\right]}-{{\mathbf{a}}_2^{\prime }}-{\mathbf{H}}_{\left[i,\right]}{\mathbf{B}}_2-{\mathbf{X}}_{\left[i,\right]}\overline{\mathbf{W}}{\mathbf{A}}_2$ .
Outcome model parameters
-
•
${{\mathbf{Z}}_{\left[i,\right]}^{\prime}}\mid {\mathbf{X}}_{\left[i,\right]},\mathbf{W},{\mathbf{A}}_3,\mathbf{H},{\mathbf{B}}_3,\mathbf{M},{\mathbf{A}}_4,{\boldsymbol{\Sigma}}_3,{\mathbf{a}}_3,\boldsymbol{\Gamma}, {\phi}_{3i},{\xi}_{3i},{\nu}_3^{\ast },{\xi}_{2i},{\nu}_2^{\ast },\mathbf{Y}$
$\sim {N}_Q\left({\mathbf{a}}_3+{{\mathbf{B}}_3}^{\prime }{{\mathbf{H}}_{\left[i,\right]}^{\prime }}+{{\mathbf{A}}_3}^{\prime }{\mathbf{W}}^{\prime }{{\mathbf{X}}_{\left[i,\right]}^{\prime }}+{{\mathbf{A}}_4}^{\prime }{{\mathbf{M}}_{\left[i,\right]}^{\prime }}, {\mathbf{D}}_{3i}^{1/2}{\boldsymbol{\Sigma}}_3{\mathbf{D}}_{3i}^{1/2}\right)$
The variance matrix of the latent variables of the mixed outcomes has the form
${\mathbf{D}}_{3i}^{1/2}{\boldsymbol{\Sigma}}_3{\mathbf{D}}_{3i}^{1/2}\equiv \left(\begin{array}{cc}{\xi}_{3i}^{-1}{\boldsymbol{\Sigma}}_3^{11}& {\xi}_{3i}^{-1/2}\overset{\sim }{\sigma }{\phi}_{3i}^{-1/2}{\boldsymbol{\Sigma}}_3^{12}\\ {}{\xi}_{3i}^{-1/2}\overset{\sim }{\sigma }{\phi}_{3i}^{-1/2}{\boldsymbol{\Sigma}}_3^{21}& {\overset{\sim }{\sigma}}^2{\phi}_{3i}^{-1}{\boldsymbol{\Sigma}}_3^{22}\end{array}\right)$ where
${{\boldsymbol{\Sigma}}_3^{12}}^{\prime}={\boldsymbol{\Sigma}}_3^{21}$ . The continuous outcomes are observed, so we set
${\mathbf{Y}}_{\left[i,1:T\right]}={\mathbf{Z}}_{\left[i,1:T\right]}$ ; for the ordinal outcomes, however, we must update the latent variable while accounting for its correlation with the continuous outcomes. We therefore consider the conditional distribution of the ordinal latent variable given the continuous outcomes,
$\left.{\mathbf{Z}}_{\left[i,\left(T+1\right):Q\right]}\right|{\mathbf{Z}}_{\left[i,1:T\right]}$ , which is multivariate normal with mean vector
$$\begin{align*}{\overset{\sim }{\boldsymbol{\mu}}}_i={\mathbf{a}}_{3\left[\left(T+1\right):Q\right]}+{{\mathbf{B}}_{3\left[,\left(T+1\right):Q\right]}^{\prime }}{{\mathbf{H}}_{\left[i,\right]}^{\prime }}+{{\mathbf{A}}_{3\left[,\left(T+1\right):Q\right]}}^{\prime }{\mathbf{W}}^{\prime }{{\mathbf{X}}_{\left[i,\right]}^{\prime }}+{{\mathbf{A}}_{4\left[,\left(T+1\right):Q\right]}^{\prime }}{{\mathbf{M}}_{\left[i,\right]}^{\prime }}+{\xi}_{3i}^{1/2}\overset{\sim }{\sigma }{\phi}_{3i}^{-1/2}{\boldsymbol{\Sigma}}_3^{21}{{\boldsymbol{\Sigma}}_3^{11}}^{-1}{{\mathbf{E}}_{3\left[i,1:T\right]}^{\prime }}\end{align*}$$
${\mathbf{E}}_{3\left[i,1:T\right]}={\mathbf{Z}}_{\left[i,1:T\right]}-{{\mathbf{a}}_{3\left[1:T\right]}^{\prime }}-{\mathbf{H}}_{\left[i,\right]}{\mathbf{B}}_{3\left[,1:T\right]}-{\mathbf{X}}_{\left[i,\right]}{\mathbf{WA}}_{3\left[,1:T\right]}-{\mathbf{M}}_{\left[i,\right]}{\mathbf{A}}_{4\left[,1:T\right]}$ and covariance matrix
${\overset{\sim }{\boldsymbol{\Sigma}}}_i={\overset{\sim }{\sigma}}^2{\phi}_{3i}^{-1}\left({\boldsymbol{\Sigma}}_3^{22}-{\boldsymbol{\Sigma}}_3^{21}{{\boldsymbol{\Sigma}}_3^{11}}^{-1}{\boldsymbol{\Sigma}}_3^{12}\right)\equiv {\overset{\sim }{\sigma}}^2{\phi}_{3i}^{-1}{\overset{\sim }{\boldsymbol{\Sigma}}}_3$ . Thus we update the latent variable of the ordinal outcomes from the conditional distribution
$$\begin{align*}\left.{\mathbf{Z}}_{\left[i,\left(T+1\right):Q\right]}\right|{\mathbf{Z}}_{\left[i,1:T\right]}\sim {N}_{Q-T}\left({\overset{\sim }{\boldsymbol{\mu}}}_i,{\overset{\sim }{\boldsymbol{\Sigma}}}_i\right)\end{align*}$$
${\mathbf{Z}}_{\left[i,q\right]}$ is truncated at the left and right by
${\gamma}_{j-1}^q$ and
${\gamma}_j^q$ if
${\mathbf{Y}}_{\left[i,q\right]}=j$ for
$i=1,\dots, N$ ,
$j=1,\dots, C$ and
$q=T+1,\dots, Q$ .
-
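The truncated draw of the ordinal latent variable maps onto `scipy.stats.truncnorm`; a minimal sketch follows, where the cutpoints, conditional mean, and standard deviation are made-up values for a single ordinal outcome:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

# Made-up cutpoints for one ordinal outcome with C = 3 categories:
# (-inf, gamma_1, gamma_2, +inf); Y[i,q] = j confines Z[i,q] to (gamma_{j-1}, gamma_j)
gamma = np.array([-np.inf, -0.5, 0.5, np.inf])
mu_i, sd_i = 0.2, 1.0  # conditional mean and sd, playing the role of mu~_i, Sigma~_i
y = 2                  # observed category j (1-based)

# truncnorm takes its bounds on the standardized scale
lo = (gamma[y - 1] - mu_i) / sd_i
hi = (gamma[y] - mu_i) / sd_i
z = truncnorm.rvs(lo, hi, loc=mu_i, scale=sd_i, random_state=rng)
```

With several ordinal outcomes the same draw would be applied coordinate-wise or via a multivariate truncated sampler.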
•
${\gamma}_j^q\mid {\boldsymbol{\Gamma}}_{\left[-j,-q\right]},\mathbf{Z},\mathbf{X},\mathbf{W},{\mathbf{A}}_3,\mathbf{H},{\mathbf{B}}_3,\mathbf{M},{\mathbf{A}}_4,{\boldsymbol{\Sigma}}_3,{\mathbf{a}}_3,{\phi}_3,{\boldsymbol{\xi}}_3,{\nu}_3^{\ast },{\boldsymbol{\xi}}_2,{\nu}_2^{\ast },\mathbf{Y}$
$\sim U\left(\max \left\{\max \left({\mathbf{Z}}_{\left[i,q\right]}:{\mathbf{Y}}_{\left[i,q\right]}=j\right),{\gamma}_{j-1}^q\right\},\min \left\{\min \left({\mathbf{Z}}_{\left[i,q\right]}:{\mathbf{Y}}_{\left[i,q\right]}=j+1\right),{\gamma}_{j+1}^q\right\}\right)$
-
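This cutpoint draw is uniform over the interval pinned down by the latent values and the neighboring cutpoints. A small sketch, with fabricated latent values, true cutpoints, and category labels:

```python
import numpy as np

rng = np.random.default_rng(2)

# Fabricated latent values z and observed categories y (1-based) for one ordinal item
z = rng.normal(size=200)
y = np.digitize(z, [-0.4, 0.6]) + 1  # "true" cutpoints -0.4, 0.6 -> categories 1..3

j = 1
gamma_prev, gamma_next = -np.inf, np.inf  # neighboring cutpoints gamma_{j-1}, gamma_{j+1}

# gamma_j | ... ~ U( max{max(z: y=j), gamma_{j-1}}, min{min(z: y=j+1), gamma_{j+1}} )
lower = max(z[y == j].max(), gamma_prev)
upper = min(z[y == j + 1].min(), gamma_next)
gamma_j = rng.uniform(lower, upper)
```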
•
${\phi}_{3i}\mid {\mathbf{Z}}_{\left[i,\right]},{\mathbf{X}}_{\left[i,\right]},\mathbf{W},{\mathbf{A}}_3,\mathbf{H},{\mathbf{B}}_3,\mathbf{M},{\mathbf{A}}_4,{\boldsymbol{\Sigma}}_3,{\mathbf{a}}_3,\boldsymbol{\Gamma}, {\xi}_{3i},{\nu}_3^{\ast },{\xi}_{2i},{\nu}_2^{\ast },\mathbf{Y}$
$\sim \pi \left({\phi}_{3i}|{\mathbf{Z}}_{\left[i,\right]},{\mathbf{X}}_{\left[i,\right]},\mathbf{W},{\mathbf{A}}_3,\mathbf{H},{\mathbf{B}}_3,\mathbf{M},{\mathbf{A}}_4,{\boldsymbol{\Sigma}}_3,{\mathbf{a}}_3,\boldsymbol{\Gamma}, {\xi}_{3i},{\nu}_3^{\ast },{\xi}_{2i},{\nu}_2^{\ast },\mathbf{Y}\right)$
$$\begin{align*}\propto {\phi}_{3i}^{\frac{Q-T+\overset{\sim }{\nu }}{2}-1}\mathit{\exp}\left[-{\phi}_{3i}\left\{\frac{\overset{\sim }{\nu }}{2}+\frac{1}{2{\overset{\sim }{\sigma}}^2}{\left({{\mathbf{Z}}_{\left[i,\left(T+1\right):Q\right]}^{\prime }}-{\overset{\sim }{\boldsymbol{\mu}}}_i\right)}^{\prime }{{\overset{\sim }{\boldsymbol{\Sigma}}}_3^{-1}}\left({\mathbf{Z}}_{\left[i,\left(T+1\right):Q\right]}-{\overset{\sim }{\boldsymbol{\mu}}}_i\right)\right\}\right]\end{align*}$$
$\equiv p\left({\phi}_{3i}^{(t)}\right)$ (target distribution at
$t$ th iteration) for
$i=1,\cdots, N$
Metropolis–Hastings step
-
⧫ generate
${\phi}_{3i}^{\ast}\sim U\left({\phi}_{3i}^{\left(t-1\right)}-\delta, {\phi}_{3i}^{\left(t-1\right)}+\delta \right)$ , a uniform proposal centered at the previous sample
${\phi}_{3i}^{\left(t-1\right)}$ with a small half-width
$\delta$ ; thus
$U\left({\phi}_{3i}^{\ast }|{\phi}_{3i}^{\left(t-1\right)}-\delta, {\phi}_{3i}^{\left(t-1\right)}+\delta \right)\equiv q\left({\phi}_{3i}^{\ast }|{\phi}_{3i}^{\left(t-1\right)}\right)$ is the candidate (proposal) distribution for
${\phi}_{3i}$ .
-
⧫ compute acceptance probability
$\mathrm{A}\left({\phi}_{3i}^{\ast },{\phi}_{3i}^{\left(t-1\right)}\right)$ such that
$$\begin{align*}\mathrm{A}\left({\phi}_{3i}^{\ast },{\phi}_{3i}^{\left(t-1\right)}\right)=\min \left(1,\frac{p\left({\phi}_{3i}^{\ast}\right)q\left({\phi}_{3i}^{\ast }|{\phi}_{3i}^{\left(t-1\right)}\right)}{p\left({\phi}_{3i}^{\left(t-1\right)}\right)q\left({\phi}_{3i}^{\left(t-1\right)}|{\phi}_{3i}^{\ast}\right)}\right).\end{align*}$$
-
⧫ set
${\phi_{3i}}^{(t)}={\phi}_{3i}^{\ast }$ with probability
$\mathrm{A}\left({\phi}_{3i}^{\ast },{\phi}_{3i}^{\left(t-1\right)}\right)$
-
-
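This Metropolis–Hastings step can be sketched generically. The Gamma-type `log_target` below is only a stand-in for the full conditional kernel of ${\phi}_{3i}$ above, and because the uniform random walk is symmetric, the proposal ratio cancels from the acceptance probability:

```python
import numpy as np

rng = np.random.default_rng(3)

def log_target(phi, a=2.0, b=1.5):
    # Stand-in for log p(phi_3i): a Gamma-type kernel phi^(a-1) * exp(-b*phi), phi > 0
    if phi <= 0:
        return -np.inf  # outside the support: always rejected
    return (a - 1.0) * np.log(phi) - b * phi

def mh_uniform_step(phi_prev, delta, rng):
    """One random-walk MH update with a U(phi_prev - delta, phi_prev + delta) proposal."""
    phi_star = rng.uniform(phi_prev - delta, phi_prev + delta)
    # Symmetric proposal: q(phi*|phi) = q(phi|phi*), so A = min(1, p(phi*)/p(phi))
    log_A = min(0.0, log_target(phi_star) - log_target(phi_prev))
    return phi_star if np.log(rng.uniform()) < log_A else phi_prev

phi, draws = 1.0, []
for _ in range(2000):
    phi = mh_uniform_step(phi, delta=0.3, rng=rng)
    draws.append(phi)
```

The analogous update for ${\xi}_{3i}$ differs only in the target kernel and step size ${\delta}^{\ast }$.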
•
${\xi}_{3i}\mid {\mathbf{Z}}_{\left[i,\right]},{\mathbf{X}}_{\left[i,\right]},\mathbf{W},{\mathbf{A}}_3,\mathbf{H},{\mathbf{B}}_3,\mathbf{M},{\mathbf{A}}_4,{\boldsymbol{\Sigma}}_3,{\mathbf{a}}_3,\boldsymbol{\Gamma}, {\phi}_{3i},{\nu}_3^{\ast },{\xi}_{2i},{\nu}_2^{\ast },\mathbf{Y}$
$\sim \pi \left({\xi}_{3i}|{\mathbf{Z}}_{\left[i,\right]},{\mathbf{X}}_{\left[i,\right]},\mathbf{W},{\mathbf{A}}_3,\mathbf{H},{\mathbf{B}}_3,\mathbf{M},{\mathbf{A}}_4,{\boldsymbol{\Sigma}}_3,{\mathbf{a}}_3,\boldsymbol{\Gamma}, {\phi}_{3i},{\nu}_3^{\ast },{\xi}_{2i},{\nu}_2^{\ast },\mathbf{Y}\right)$
$\propto {\xi}_{3i}^{\frac{T+{\nu}_3^{\ast }}{2}-1}\exp \left[-{\xi}_{3i}\left\{\frac{\nu_3^{\ast }}{2}+\frac{1}{2}{\left({{\mathbf{Z}}_{\left[i,1:T\right]}^{\prime }}-{\boldsymbol{\mu}}_i^{\ast}\right)}^{\prime }{{\boldsymbol{\Sigma}}_3^{\ast}}^{-1}\left({{\mathbf{Z}}_{\left[i,1:T\right]}^{\prime }}-{\boldsymbol{\mu}}_i^{\ast}\right)\right\}\right]$
$\equiv p\left({\xi}_{3i}^{(t)}\right)$ (target distribution at
$t$ th iteration)
where $\left.{\mathbf{Z}}_{\left[i,1:T\right]}\right|{\mathbf{Z}}_{\left[i,\left(T+1\right):Q\right]}$ is multivariate normal with mean vector
$$\begin{align*}{\boldsymbol{\mu}}_i^{\ast}={\mathbf{a}}_{3\left[1:T\right]}+{{\mathbf{B}}_{3\left[,1:T\right]}^{\prime }}{{\mathbf{H}}_{\left[i,\right]}^{\prime }}+{{\mathbf{A}}_{3\left[,1:T\right]}^{\prime }}{\mathbf{W}}^{\prime }{{\mathbf{X}}_{\left[i,\right]}^{\prime }}+{{\mathbf{A}}_{4\left[,1:T\right]}^{\prime }}{{\mathbf{M}}_{\left[i,\right]}^{\prime }}+{\xi}_{3i}^{-1/2}{\overset{\sim }{\sigma}}^{-1}{\phi}_{3i}^{1/2}{\boldsymbol{\Sigma}}_3^{12}{{\boldsymbol{\Sigma}}_3^{22}}^{-1}{{\mathbf{E}}_{3\left[i,\left(T+1\right):Q\right]}^{\prime }}\end{align*}$$
${\mathbf{E}}_{3\left[i,\left(T+1\right):Q\right]}={\mathbf{Z}}_{\left[i,\left(T+1\right):Q\right]}-{{\mathbf{a}}_{3\left[\left(T+1\right):Q\right]}^{\prime }}-{\mathbf{H}}_{\left[i,\right]}{\mathbf{B}}_{3\left[,\left(T+1\right):Q\right]}-{\mathbf{X}}_{\left[i,\right]}{\mathbf{WA}}_{3\left[,\left(T+1\right):Q\right]}-{\mathbf{M}}_{\left[i,\right]}{\mathbf{A}}_{4\left[,\left(T+1\right):Q\right]}$ and covariance matrix
${\boldsymbol{\Sigma}}_3^{\ast \ast}={\xi}_{3i}^{-1}\left({\boldsymbol{\Sigma}}_3^{11}-{\boldsymbol{\Sigma}}_3^{12}{{\boldsymbol{\Sigma}}_3^{22}}^{-1}{\boldsymbol{\Sigma}}_3^{21}\right)\equiv {\xi}_{3i}^{-1}{\boldsymbol{\Sigma}}_3^{\ast }$ for
$i=1,\cdots, N$ .
Metropolis–Hastings step
-
⧫ generate
${\xi}_{3i}^{\ast}\sim U\left({\xi}_{3i}^{\left(t-1\right)}-{\delta}^{\ast },{\xi}_{3i}^{\left(t-1\right)}+{\delta}^{\ast}\right)$ , a uniform proposal centered at the previous sample
${\xi}_{3i}^{\left(t-1\right)}$ with a small half-width
${\delta}^{\ast }$ ; thus
$U\left({\xi}_{3i}^{\ast }|{\xi}_{3i}^{\left(t-1\right)}-{\delta}^{\ast },{\xi}_{3i}^{\left(t-1\right)}+{\delta}^{\ast}\right)\equiv q\left({\xi}_{3i}^{\ast }|{\xi}_{3i}^{\left(t-1\right)}\right)$ is the candidate (proposal) distribution for
${\xi}_{3i}$ .
-
⧫ compute acceptance probability
$\mathrm{A}\left({\xi}_{3i}^{\ast },{\xi}_{3i}^{\left(t-1\right)}\right)$ such that
$$\begin{align*}\mathrm{A}\left({\xi}_{3i}^{\ast },{\xi}_{3i}^{\left(t-1\right)}\right)=\min \left(1,\frac{p\left({\xi}_{3i}^{\ast}\right)q\left({\xi}_{3i}^{\ast }|{\xi}_{3i}^{\left(t-1\right)}\right)}{p\left({\xi}_{3i}^{\left(t-1\right)}\right)q\left({\xi}_{3i}^{\left(t-1\right)}|{\xi}_{3i}^{\ast}\right)}\right).\end{align*}$$
-
⧫ set
${\xi_{3i}}^{(t)}={\xi}_{3i}^{\ast }$ with probability
$\mathrm{A}\left({\xi}_{3i}^{\ast },{\xi}_{3i}^{\left(t-1\right)}\right)$
-
-
•
${\nu}_3^{\ast}\mid \mathbf{Z},\mathbf{X},\mathbf{W},{\mathbf{A}}_3,\mathbf{H},{\mathbf{B}}_3,\mathbf{M},{\mathbf{A}}_4,{\boldsymbol{\Sigma}}_3,{\mathbf{a}}_3,\boldsymbol{\Gamma}, {\phi}_3,{\boldsymbol{\xi}}_3,{\boldsymbol{\xi}}_2,{\nu}_2^{\ast },\mathbf{Y}$
$\sim \pi \left({\nu}_3^{\ast }|\mathbf{Z},\mathbf{X},\mathbf{W},{\mathbf{A}}_3,\mathbf{H},{\mathbf{B}}_3,\mathbf{M},{\mathbf{A}}_4,{\boldsymbol{\Sigma}}_3,{\mathbf{a}}_3,\boldsymbol{\Gamma}, {\phi}_3,{\boldsymbol{\xi}}_3,{\boldsymbol{\xi}}_2,{\nu}_2^{\ast },\mathbf{Y}\right)$
$$\begin{align*}\propto \kern1em \frac{{\left(\frac{\nu_3^{\ast }}{2}\right)}^{\frac{\nu_3^{\ast }}{2}N}}{\varGamma {\left(\frac{\nu_3^{\ast }}{2}\right)}^N}{\left(\prod\limits_{i=1}^N{\xi}_{3i}\right)}^{\frac{\nu_3^{\ast }}{2}-1}\mathit{\exp}\left(-\frac{\nu_3^{\ast }}{2}\sum\limits_{i=1}^N{\xi}_{3i}\right)\end{align*}$$
$\equiv p\left({\nu_3^{\ast}}^{(t)}\right)$ (target distribution at
$t$ th iteration)
Metropolis step
-
⧫ generate
${{\overset{\sim }{\nu}}_3^{\ast}}^{-1}\sim N\left(\frac{1}{{\nu_3^{\ast}}^{\left(t-1\right)}}, 1\right)$ where
$N\left(\left.{{\overset{\sim }{\nu}}_3^{\ast}}^{-1}\right|\frac{1}{{\nu_3^{\ast}}^{\left(t-1\right)}}, 1\right)\equiv q\left({{\overset{\sim }{\nu}}_3^{\ast}}^{-1}|{\nu_3^{\ast}}^{\left(t-1\right)}\right)$ is a candidate distribution function of
${\nu}_3^{\ast }$ .
-
⧫ compute acceptance probability
$A\left({\overset{\sim }{\nu}}_3^{\ast },{\nu_3^{\ast}}^{\left(t-1\right)}\right)$ such that
$A\left({\overset{\sim }{\nu}}_3^{\ast },{\nu_3^{\ast}}^{\left(t-1\right)}\right)=\min \left(1, \frac{p\left({\overset{\sim }{\nu}}_3^{\ast}\right)}{p\left({\nu_3^{\ast}}^{\left(t-1\right)}\right)}\right)$ .
-
⧫ set
${\nu_3^{\ast}}^{(t)}={\overset{\sim }{\nu}}_3^{\ast }$ with probability
$A\left({\overset{\sim }{\nu}}_3^{\ast },{\nu_3^{\ast}}^{\left(t-1\right)}\right)$ .
-
-
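A sketch of this Metropolis update for ${\nu}_3^{\ast }$, with the candidate drawn on the inverse scale as above; the weights ${\xi}_{3i}$ are simulated placeholders, and only the ratio of target kernels enters the acceptance probability:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(4)

xi = rng.gamma(2.0, 0.5, size=100)  # placeholder xi_{3i} from the previous Gibbs step
N = xi.size
sum_log_xi, sum_xi = np.log(xi).sum(), xi.sum()

def log_target(nu):
    # log of the full conditional kernel of nu_3^* given above
    if nu <= 0:
        return -np.inf
    return (N * nu / 2) * np.log(nu / 2) - N * gammaln(nu / 2) \
        + (nu / 2 - 1) * sum_log_xi - (nu / 2) * sum_xi

nu = 5.0
for _ in range(1000):
    inv_cand = rng.normal(1.0 / nu, 1.0)  # candidate drawn on the inverse scale
    if inv_cand <= 0:
        continue  # outside the support of 1/nu: reject
    cand = 1.0 / inv_cand
    if np.log(rng.uniform()) < min(0.0, log_target(cand) - log_target(nu)):
        nu = cand
```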
•
${{\mathbf{A}}_{3\left[k,\right]}}^{\prime}\mid {\mathbf{A}}_{3\left[-k,\right]},\overline{\mathbf{W}},{\mathbf{A}}_2,{\mathbf{A}}_4,{\mathbf{B}}_2,{\mathbf{B}}_3,{\mathbf{a}}_2,{\mathbf{a}}_3,{\boldsymbol{\Sigma}}_2,{\boldsymbol{\Sigma}}_3,\boldsymbol{\Gamma}, {\phi}_3,{\boldsymbol{\xi}}_3,{\nu}_3^{\ast },{\boldsymbol{\xi}}_2,{\nu}_2^{\ast },\mathbf{Z},\mathbf{X},\mathbf{M},\mathbf{Y}$
$$\begin{align*}\sim {N}_Q\left({{\overset{\sim }{\mathbf{A}}}_{3\left[k,\right]}^{\prime }},{\boldsymbol{\Sigma}}_{{\mathbf{A}}_{3k}}\right)\end{align*}$$
$k=1,\dots, K$ where
${\boldsymbol{\Sigma}}_{{\mathbf{A}}_{3k}}={\left(\sum \limits_{i=1}^N{\mathbf{X}}_{\left[i,\right]}{\overline{\mathbf{W}}}_{\left[,k\right]}{\mathbf{D}}_{3i}^{-1/2}{\boldsymbol{\Sigma}}_3^{-1}{\mathbf{D}}_{3i}^{-1/2}{{\overline{\mathbf{W}}}_{\left[,k\right]}^{\prime }}{{\mathbf{X}}_{\left[i,\right]}^{\prime }}+\frac{1}{\eta_3^2}{{\boldsymbol{\Sigma}}_{{\mathbf{A}}_{3k}}^{0-1}}\right)}^{-1}$ and
${{\overset{\sim }{\mathbf{A}}}_{3\left[k,\right]}^{\prime}}={\boldsymbol{\Sigma}}_{{\mathbf{A}}_{3k}}\left({\sum}_{i=1}^N\ {\mathbf{X}}_{\left[i,\right]}{\overline{\mathbf{W}}}_{\left[,k\right]}{\mathbf{D}}_{3i}^{-1/2}{\boldsymbol{\Sigma}}_3^{-1}{\mathbf{D}}_{3i}^{-1/2}{{\mathbf{Z}}_{\left[i,\right]}^{-k\prime }}+\frac{1}{\eta_3^2}{{\boldsymbol{\Sigma}}_{{\mathbf{A}}_{3k}}^{0-1}}{{\mathbf{A}}_{3\left[k,\right]}^{0\prime}}\right)$ with ${\mathbf{Z}}_{\left[i,\right]}^{-k}={\mathbf{Z}}_{\left[i,\right]}-{{\mathbf{a}}_3^{\prime }}-{\mathbf{H}}_{\left[i,\right]}{\mathbf{B}}_3-{\mathbf{X}}_{\left[i,\right]}{\overline{\mathbf{W}}}_{\left[,-k\right]}{\mathbf{A}}_{3\left[-k,\right]}-{\mathbf{M}}_{\left[i,\right]}{\mathbf{A}}_4$ .
-
•
${{\mathbf{A}}_{4\left[s,\right]}^{\prime}}\mid {\mathbf{A}}_{4\left[-s,\right]},\overline{\mathbf{W}},{\mathbf{A}}_2,{\mathbf{A}}_3,{\mathbf{B}}_2,{\mathbf{B}}_3,{\mathbf{a}}_2,{\mathbf{a}}_3,{\boldsymbol{\Sigma}}_2,{\boldsymbol{\Sigma}}_3,\boldsymbol{\Gamma}, {\phi}_3,{\boldsymbol{\xi}}_3,{\nu}_3^{\ast },{\boldsymbol{\xi}}_2,{\nu}_2^{\ast },\mathbf{Z},\mathbf{X},\mathbf{M},\mathbf{Y}$
$$\begin{align*}\sim {N}_Q\left({{\overset{\sim }{\mathbf{A}}}_{4\left[s,\right]}^{\prime }},{\boldsymbol{\Sigma}}_{{\mathbf{A}}_{4s}}\right)\end{align*}$$
$s=1,\dots, S$ where
${\boldsymbol{\Sigma}}_{{\mathbf{A}}_{4s}}={\left(\sum \limits_{i=1}^N{\mathbf{M}}_{\left[i,s\right]}{\mathbf{D}}_{3i}^{-1/2}{\boldsymbol{\Sigma}}_3^{-1}{\mathbf{D}}_{3i}^{-1/2}{{\mathbf{M}}_{\left[i,s\right]}^{\prime }}+\frac{1}{\eta_4^2}{{\boldsymbol{\Sigma}}_{{\mathbf{A}}_{4s}}^{0-1}}\right)}^{-1}$ and
${{\overset{\sim }{\mathbf{A}}}_{4\left[s,\right]}^{\prime}}={\boldsymbol{\Sigma}}_{{\mathbf{A}}_{4s}}\left({\sum}_{i=1}^N\ {\mathbf{M}}_{\left[i,s\right]}{\mathbf{D}}_{3i}^{-1/2}{\boldsymbol{\Sigma}}_3^{-1}{\mathbf{D}}_{3i}^{-1/2}{{\mathbf{Z}}_{\left[i,\right]}^{-s\prime }}+\frac{1}{\eta_4^2}{{\boldsymbol{\Sigma}}_{{\mathbf{A}}_{4s}}^{0-1}}{{\mathbf{A}}_{4\left[s,\right]}^{0\prime}}\right)$ with
${\mathbf{Z}}_{\left[i,\right]}^{-s}={\mathbf{Z}}_{\left[i,\right]}-{{\mathbf{a}}_3^{\prime }}-{\mathbf{H}}_{\left[i,\right]}{\mathbf{B}}_3-{\mathbf{X}}_{\left[i,\right]}{\mathbf{WA}}_3-{\mathbf{M}}_{\left[i,-s\right]}{\mathbf{A}}_{4\left[-s,\right]}$ .
-
•
${{\mathbf{B}}_{3\left[d,\right]}^{\prime}}\mid {\mathbf{B}}_{3\left[-d,\right]},\overline{\mathbf{W}},{\mathbf{A}}_2,{\mathbf{A}}_3,{\mathbf{A}}_4,{\mathbf{B}}_2,{\mathbf{a}}_2,{\mathbf{a}}_3,{\boldsymbol{\Sigma}}_2,{\boldsymbol{\Sigma}}_3,\boldsymbol{\Gamma}, {\phi}_3,{\boldsymbol{\xi}}_3,{\nu}_3^{\ast },{\boldsymbol{\xi}}_2,{\nu}_2^{\ast },\mathbf{Z},\mathbf{X},\mathbf{M},\mathbf{Y}$
$$\begin{align*}\sim {N}_Q\left({{\overset{\sim }{\mathbf{B}}}_{3\left[d,\right]}^{\prime }},{\boldsymbol{\Sigma}}_{{\mathbf{B}}_{3d}}\right)\end{align*}$$
$d=1,\dots, D$ where
${\boldsymbol{\Sigma}}_{{\mathbf{B}}_{3d}}={\left(\sum \limits_{i=1}^N{\mathbf{H}}_{\left[i,d\right]}{\mathbf{D}}_{3i}^{-1/2}{\boldsymbol{\Sigma}}_3^{-1}{\mathbf{D}}_{3i}^{-1/2}{{\mathbf{H}}_{\left[i,d\right]}^{\prime }}+\frac{1}{\delta_3^2}{{\boldsymbol{\Sigma}}_{{\mathbf{B}}_{3d}}^{0-1}}\right)}^{-1}$ and
${{\overset{\sim }{\mathbf{B}}}_{3\left[d,\right]}^{\prime}}={\boldsymbol{\Sigma}}_{{\mathbf{B}}_{3d}}\left({\sum}_{i=1}^N{\mathbf{H}}_{\left[i,d\right]}{\mathbf{D}}_{3i}^{-1/2}{\boldsymbol{\Sigma}}_3^{-1}{\mathbf{D}}_{3i}^{-1/2}{{\mathbf{Z}}_{\left[i,\right]}^{-d\prime }}+\frac{1}{\delta_3^2}{{\boldsymbol{\Sigma}}_{{\mathbf{B}}_{3d}}^{0-1}}{{\mathbf{B}}_{3\left[d,\right]}^{0\prime}}\right)$ with
${\mathbf{Z}}_{\left[i,\right]}^{-d}={\mathbf{Z}}_{\left[i,\right]}-{{\mathbf{a}}_3^{\prime }}-{\mathbf{H}}_{\left[i,-d\right]}{\mathbf{B}}_{3\left[-d,\right]}-{\mathbf{X}}_{\left[i,\right]}{\mathbf{WA}}_3-{\mathbf{M}}_{\left[i,\right]}{\mathbf{A}}_4$ .
-
•
${\mathbf{a}}_3\mid \overline{\mathbf{W}},{\mathbf{A}}_2,{\mathbf{A}}_3,{\mathbf{A}}_4,{\mathbf{B}}_2,{\mathbf{B}}_3,{\mathbf{a}}_2,{\boldsymbol{\Sigma}}_2,{\boldsymbol{\Sigma}}_3,\boldsymbol{\Gamma}, {\phi}_3,{\boldsymbol{\xi}}_3,{\nu}_3^{\ast },{\boldsymbol{\xi}}_2,{\nu}_2^{\ast },\mathbf{Z},\mathbf{X},\mathbf{M},\mathbf{Y}$
$$\begin{align*}\sim {N}_Q\left({\overset{\sim }{\mathbf{a}}}_3,{\left(\sum_{i=1}^N{\mathbf{D}}_{3i}^{-1/2}{\boldsymbol{\Sigma}}_3^{-1}{\mathbf{D}}_{3i}^{-1/2}+\frac{1}{\sigma_3^2}{{\boldsymbol{\Sigma}}_{{\mathbf{a}}_3}^{0-1}}\right)}^{-1}\right)\end{align*}$$
where
${\overset{\sim }{\mathbf{a}}}_3={\boldsymbol{\Sigma}}_{{\mathbf{a}}_3}\left({\sum}_{i=1}^N{\mathbf{D}}_{3i}^{-1/2}{\boldsymbol{\Sigma}}_3^{-1}{\mathbf{D}}_{3i}^{-1/2}{\left({\mathbf{Z}}_{\left[i,\right]}-{\mathbf{H}}_{\left[i,\right]}{\mathbf{B}}_3-{\mathbf{X}}_{\left[i,\right]}\overline{\mathbf{W}}{\mathbf{A}}_3-{\mathbf{M}}_{\left[i,\right]}{\mathbf{A}}_4\right)}^{\prime }+\frac{1}{\sigma_3^2}{{\boldsymbol{\Sigma}}_{{\mathbf{a}}_3}^{0-1}}{\boldsymbol{\alpha}}_3^0\right)$ and ${\boldsymbol{\Sigma}}_{{\mathbf{a}}_3}={\left({\sum}_{i=1}^N{\mathbf{D}}_{3i}^{-1/2}{\boldsymbol{\Sigma}}_3^{-1}{\mathbf{D}}_{3i}^{-1/2}+\frac{1}{\sigma_3^2}{{\boldsymbol{\Sigma}}_{{\mathbf{a}}_3}^{0-1}}\right)}^{-1}$ .
-
•
${\boldsymbol{\Sigma}}_3^{-1}\mid \overline{\mathbf{W}},{\mathbf{A}}_2,{\mathbf{A}}_3,{\mathbf{A}}_4,{\mathbf{B}}_2,{\mathbf{B}}_3,{\mathbf{a}}_2,{\mathbf{a}}_3,{\boldsymbol{\Sigma}}_2,\boldsymbol{\Gamma}, {\phi}_3,{\boldsymbol{\xi}}_3,{\nu}_3^{\ast },{\boldsymbol{\xi}}_2,{\nu}_2^{\ast },\mathbf{Z},\mathbf{X},\mathbf{M},\mathbf{Y}$
$\sim {Wishart}_Q\left({\nu}_3+N,{\left({\boldsymbol{\Sigma}}_{30}+\sum \limits_{i=1}^N{{\mathbf{E}}_{3\left[i,\right]}^{\ast\prime }}{\mathbf{E}}_{3\left[i,\right]}^{\ast}\right)}^{-1}\right)$ where
${\mathbf{E}}_{3\left[i,\right]}^{\ast}=\left({\mathbf{Z}}_{\left[i,\right]}-{{\mathbf{a}}_3^{\prime }}-{\mathbf{H}}_{\left[i,\right]}{\mathbf{B}}_3-{\mathbf{X}}_{\left[i,\right]}{\mathbf{WA}}_3-{\mathbf{M}}_{\left[i,\right]}{\mathbf{A}}_4\right){\mathbf{D}}_{3i}^{-1/2}.$
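Finally, this conjugate Wishart draw for ${\boldsymbol{\Sigma}}_3^{-1}$ maps directly onto `scipy.stats.wishart`. The residual matrix, prior degrees of freedom, and prior scale below are illustrative placeholders:

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(5)

# Placeholder scaled residuals E*_{3[i,]} for N = 40 subjects and Q = 3 outcomes
N, Q = 40, 3
E = rng.normal(size=(N, Q))
nu3 = Q + 2           # prior degrees of freedom nu_3
Sigma30 = np.eye(Q)   # prior scale matrix Sigma_30

# Sigma_3^{-1} | ... ~ Wishart_Q(nu_3 + N, (Sigma_30 + sum_i E_i' E_i)^{-1})
scale = np.linalg.inv(Sigma30 + E.T @ E)
Sigma3_inv = wishart.rvs(df=nu3 + N, scale=scale, random_state=rng)
Sigma3 = np.linalg.inv(Sigma3_inv)  # implied draw of Sigma_3
```

The analogous draw for ${\boldsymbol{\Sigma}}_2^{-1}$ earlier in the appendix additionally weights each residual cross-product by ${\xi}_{2i}$.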