Undirected graphical models, also known as Markov random fields (MRFs; Kindermann & Snell, 1980), have become an indispensable tool for describing the complex interplay of variables in many fields of science. The Ising model (Ising, 1925), or quadratic exponential model (Cox, 1972), is one MRF that has attracted the interest of psychologists. It is defined by the following probability distribution over the configurations of a p-dimensional binary vector $\mathbf{x} \in \{0, 1\}^p$:
$$p(\mathbf{x} \mid \boldsymbol{\mu}\text{, }\boldsymbol{\Sigma}) = \frac{\exp\left(\sum_{i=1}^p \mu_i x_i + \sum_{i=1}^{p-1} \sum_{j=i+1}^p \sigma_{ij} x_i x_j\right)}{Z(\boldsymbol{\mu}\text{, }\boldsymbol{\Sigma})}, \tag{1}$$
which covers all main effects $\mu_i$ and pairwise associations $\sigma_{ij}$ of the p binary variables. The pairwise associations encode the conditional dependence and independence relations between variables in the model: If an association is equal to zero, the two variables are independent given the rest of the variables, and there is no direct relation between them. Otherwise, the two variables are directly related. These relations can be visualized as edges in a network, where the model's variables populate the network's nodes. This view of the Ising model in psychological applications inspired the field of network psychometrics (Epskamp, Maris, Waldorp, & Borsboom, 2018; Marsman et al., 2015), which now spans research in, among others, personality (Constantini et al., 2019; Cramer et al., 2012), psychopathology (Borsboom & Cramer, 2013; Cramer et al., 2016), attitudes (Dalege, Borsboom, van Harreveld, & van der Maas, 2019; Dalege et al., 2016), educational measurement (Marsman, Maris, Bechger, & Glas, 2015; Marsman, Tanis, Bechger, & Waldorp, 2019), and intelligence (Savi, Marsman, van der Maas, & Maris, 2019; van der Maas, Kan, Marsman, & Stevenson, 2017).
The primary objective in empirical applications of the Ising model is determining the network's structure or topology. Three practical challenges complicate this objective. The first is the normalizing constant $Z(\boldsymbol{\mu}\text{, }\boldsymbol{\Sigma})$ in Eq. (1), which is a sum over all $2^p$ possible configurations of the binary vector $\mathbf{x}$. Even for small graphs, this normalizing constant can be expensive to compute: For a network of 20 variables, it already comprises more than one million terms. Given that the normalizing constant is evaluated repeatedly in numerical optimization or simulation approaches to estimating the model's parameters, direct computation of the likelihood is computationally intractable. The second challenge is the balance between model complexity and data. With p main effects and $\binom{p}{2}$ pairwise interactions, the number of free parameters can quickly overwhelm the limited information in available data. The third challenge is the efficient selection of a structure with desirable statistical properties from the vast space of possible structures. For a network of 20 variables, the structure space comprises $2^{190} \approx 1.57 \times 10^{57}$ potential structures, far too many to enumerate in practice.
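To make the computational burden concrete, the following R sketch computes $Z(\boldsymbol{\mu}\text{, }\boldsymbol{\Sigma})$ by brute-force enumeration; the parameter values are arbitrary illustrations, and the loop over all $2^p$ configurations is precisely what becomes infeasible as p grows.

```r
# Brute-force computation of the Ising normalizing constant Z(mu, Sigma).
# The 2^p terms make this infeasible beyond small p (2^20 > 1e6 terms).
ising_Z <- function(mu, Sigma) {
  p <- length(mu)
  configs <- as.matrix(expand.grid(rep(list(0:1), p)))  # all 2^p configurations
  potentials <- apply(configs, 1, function(x) {
    exp(sum(mu * x) + sum(Sigma[upper.tri(Sigma)] * outer(x, x)[upper.tri(Sigma)]))
  })
  sum(potentials)
}

set.seed(1)
p <- 10
Sigma <- matrix(0, p, p)
Sigma[upper.tri(Sigma)] <- rbinom(choose(p, 2), 1, 0.2) * 0.5
Sigma <- Sigma + t(Sigma)
mu <- rnorm(p)
ising_Z(mu, Sigma)  # already sums 1,024 terms; p = 20 would require 1,048,576
```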
In psychology, eLasso (van Borkulo et al., 2014) is the structure selection solution for the Ising model and overcomes all three challenges. First, it adopts a pseudolikelihood approach to circumvent the normalizing constant. The pseudolikelihood replaces the joint distribution of the vector variable $\mathbf{x}$—i.e., the full Ising model in Eq. (1)—with the product of its full-conditional distributions:
$$p^*(\mathbf{x} \mid \boldsymbol{\mu}\text{, }\boldsymbol{\Sigma}) = \prod_{i=1}^p p(x_i \mid \mathbf{x}^{(i)}\text{, }\mu_i\text{, }\boldsymbol{\sigma}_i^{(i)}) = \prod_{i=1}^p \frac{\exp\left(x_i \left(\mu_i + \sum_{j \ne i} \sigma_{ij} x_j\right)\right)}{1 + \exp\left(\mu_i + \sum_{j \ne i} \sigma_{ij} x_j\right)}, \tag{2}$$
where $\mathbf{x}^{(i)}$ denotes $\mathbf{x}$ without element $x_i$ and $\boldsymbol{\sigma}_i^{(i)} = (\sigma_{i1}, \dots, \sigma_{i(i-1)}, \sigma_{i(i+1)}, \dots, \sigma_{ip})^\mathsf{T}$. Observe that the pseudolikelihood is equivalent to Eq. (1) except that it replaces the intractable normalizing constant with a tractable one. Second, eLasso balances structure complexity against the information available in the data using the Lasso (Tibshirani, 1996): An $l_1$-penalty is imposed on the pseudolikelihood parameters (i.e., minimize $-\ln p^*(x_i \mid \mu_i\text{, }\boldsymbol{\sigma}_i^{(i)})$ subject to the constraint $\sum_{j \ne i} |\sigma_{ij}| \le \rho$) to shrink negligible effects to exactly zero. Ravikumar, Wainwright, and Lafferty (2010) showed that the pseudolikelihood in combination with the Lasso can consistently uncover the true topology (see also Meinshausen & Bühlmann, 2006). Third, eLasso selects the structure that optimizes the parameters subject to the $l_1$ constraint, which is specified up to its tuning parameter $\rho$. It performs the optimization for a range of values of the tuning parameter and then selects the value that minimizes an extended Bayesian information criterion (Barber & Drton, 2015; Chen & Chen, 2008). Structure selection with eLasso is thus analogous to selecting the tuning parameter. This combination of methods allows eLasso to perform structure selection for the Ising model efficiently, which is why it has become widely popular in psychometric practice.
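The nodewise regression idea at the heart of eLasso is easy to sketch. The R code below runs the $l_1$-penalized logistic regressions on simulated data with glmnet; it is a minimal illustration under arbitrary simulated data and a single fixed penalty, not a reimplementation of eLasso (which, among other things, selects the penalty by minimizing the extended BIC and is available in the R package IsingFit).

```r
library(glmnet)

# Simulate n observations of p binary variables (independent here, for brevity).
set.seed(123)
n <- 500; p <- 10
X <- matrix(rbinom(n * p, 1, 0.5), n, p)

# Nodewise l1-penalized logistic regressions: regress each variable on the rest.
# Row i of B holds the penalized estimates of sigma_i^(i) at one fixed lambda.
lambda <- 0.05  # eLasso instead chooses lambda by minimizing the extended BIC
B <- matrix(0, p, p)
for (i in 1:p) {
  fit <- glmnet(X[, -i], X[, i], family = "binomial", lambda = lambda)
  B[i, -i] <- as.numeric(coef(fit))[-1]  # drop the intercept (main effect mu_i)
}

# Each association sigma_ij is estimated twice (rows i and j); an edge can be
# included only when both estimates are nonzero (the conservative AND rule).
adjacency <- (B != 0) & (t(B) != 0)
```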
We, however, have two concerns with frequentist regularization methods for estimating the Ising model, such as those used by eLasso. Our first concern is that traditional, frequentist approaches cannot express the uncertainty associated with a selected structure, and thus do not inform us about other structures that might be plausible for the data at hand. A structure's plausibility is disclosed in its posterior probability. To compute posterior probabilities, we have to entertain multiple structures and take their prior plausibility into account. But eLasso searches for a single optimal structure instead. Our second concern is that eLasso does not articulate the precision of the parameters it estimates. Standard expressions for parameter uncertainty are unavailable for Lasso estimation (Tibshirani, 1996), since the limiting distribution of the Lasso estimator is non-Gaussian with a point mass at zero (e.g., Knight & Fu, 2000; Pötscher & Leeb, 2009). Basic solutions such as the bootstrap, although frequently used (see, for instance, Epskamp, Borsboom, & Fried, 2018; Tibshirani, 1996), can therefore not be used to obtain confidence intervals or standard errors (e.g., Bühlmann, Kalisch, & Meier, 2014, Section 3.1; Pötscher & Leeb, 2009; Williams, 2021). Bayesian formulations of the Lasso offer a more natural framework for uncertainty quantification (Kyung, Gill, Ghosh, & Casella, 2010; Park & Casella, 2008; van Erp, Oberski, & Mulder, 2019), although approximate confidence intervals and standard errors could also be obtained by desparsifying the Lasso (Bühlmann et al., 2014; van de Geer, Bühlmann, Ritov, & Dezeure, 2014).
In light of these concerns, our goals are threefold. Our primary goal is to introduce a new Bayesian approach for learning the topology of Ising models. Bayesian approaches to model selection often introduce binary indicators $\gamma$ for the selection of variables in the model (e.g., George & McCulloch, 1993; O'Hara & Sillanpää, 2009). We use these indicators here to model edge selection: If the indicator $\gamma_{ij}$ equals one, the edge between variables i and j is included; otherwise, the edge is excluded. A structure s is then a specific configuration $\boldsymbol{\gamma}_s$ of the vector of $\binom{p}{2}$ indicator variables, and the collection of network structures is equal to
$$\mathcal{S} = \left\{\boldsymbol{\gamma}_s\text{: } \boldsymbol{\gamma}_s \in \{0\text{, }1\}^{\binom{p}{2}}\right\},$$
which contains $2^{\binom{p}{2}}$ distinct structures. We wish to estimate the posterior structure probabilities $p(\boldsymbol{\gamma} \mid \mathbf{x})$, since they convey all the information that is available about the structures $\boldsymbol{\gamma} \in \mathcal{S}$ and can be used to express the plausibility of a particular structure, or of the inclusion of a specific edge, for the data at hand. To unlock these Bayesian benefits (see Marsman & Wagenmakers, 2017; Wagenmakers, Marsman, et al., 2018, for detailed examples), we have to connect the indicator variables to the selection problem at hand.
Our secondary goal is to formulate a continuous spike-and-slab approach, initially proposed by George and McCulloch (1993) for covariate selection in regression models, for edge selection in Ising networks. In this approach, the binary indicators are used to hierarchically model the prior distributions of the focal parameters by assigning zero-centered diffuse priors to effects that should be included and priors that are sharply peaked about zero to negligible effects. These continuous spike-and-slab components are usually Gaussian (e.g., George & McCulloch, 1993; Ročková & George, 2014) or Laplace distributions (e.g., Ročková, 2018; Ročková & George, 2018). Even though the Laplace distribution generates a Bayesian Lasso (Park & Casella, 2008), its drawback is that its posterior distribution is difficult to approximate with computational tools other than simulation. We therefore adopt Gaussian spike-and-slab components in our edge selection approach.
Our tertiary goal is to analyze the full or joint pseudolikelihood in Eq. (2) instead of analyzing the full-conditionals in isolation. Analyzing the full-conditionals in isolation is common practice because it is fast. However, it yields two potentially divergent estimates of each association (one from the regression of variable i on the rest, and one from the regression of variable j on the rest) and does not provide a coherent procedure for quantifying parameter uncertainty. By analyzing the joint pseudolikelihood, we can formulate a single prior distribution for the focal parameters and obtain a single posterior distribution that we can analyze in a meaningful way. The disadvantage of using the joint pseudolikelihood is its increased computational expense for some numerical procedures and the inability to analyze the full-conditionals in parallel. However, this increase in computational expense is negligible for the network sizes typically encountered in psychological applications.
The continuous spike-and-slab approach to selecting a network's topology poses three critical challenges that we address in this paper. The first is the consistency of the structure selection procedure. In a recent analysis of covariate selection in linear regression, Narisetty and He (2014) showed that the continuous spike-and-slab approach is inconsistent if the hyperparameters are not correctly scaled. We extend this observation to the current structure selection problem and prove that a correct scaling of the hyperparameters leads to a consistent structure selection approach in an embedding with p fixed and n increasing. The second challenge is the specification of tuning parameters. The effectiveness of the continuous spike-and-slab setup crucially depends on their specification. Unfortunately, objective methods to specify these parameters are currently unavailable, and tuning them is difficult and context dependent (e.g., George & McCulloch, 1997; O'Hara & Sillanpää, 2009). To overcome this issue, we develop a new procedure that automatically sets the tuning parameters so as to achieve a high specificity. The final challenge is the practical exploration of the structure space $\mathcal{S}$. Even for relatively small networks, the structure space $\mathcal{S}$ can be vast, and exploring it poses a significant challenge. Moreover, even the most plausible structures have relatively small posterior probabilities, and many similar structures exist (George, 1999). As a result, valuable computational effort is wasted on relatively uninteresting structures, and it is difficult to estimate their probabilities with reasonable precision. To overcome this issue, we propose a novel two-step approach. We first employ a deterministic estimation approach (Ročková & George, 2014), utilizing an expectation-maximization (EM; Dempster, Laird, & Rubin, 1977) variant of the continuous spike-and-slab approach to screen for a subset of promising edges. We then use a stochastic estimation approach (George & McCulloch, 1993), utilizing a Gibbs sampling (Geman & Geman, 1984) variant to explore the structure space instantiated by these promising edges. In sum, we propose a coherent Bayesian methodology for structure selection for the Ising model. The freely available R package rbinnet implements the proposed methods.
The remainder of this paper is organized as follows. We first specify our Bayesian model, i.e., we discuss the pseudolikelihood and prior setup. We then analyze the consistency of our spike-and-slab approach for structure selection and show that it is consistent if suitably scaled. We complete the blueprint of our Bayesian model with the objective specification of hyperparameters for the spike-and-slab setup. We then present an EM and a Gibbs implementation of our Bayesian structure selection setup, used for edge screening and structure selection, respectively. In our suite of Bayesian tools, edge screening most closely resembles eLasso, and we compare the performance of the two methods in a series of simulations. Finally, we present a full analysis of data on alcohol abuse and major depressive disorder from the National Survey on Drug Use and Health. As far as we know, these two disorders have not previously been analyzed together at the symptom level using a network approach.
1. Bayesian Model Specification
The setup of any Bayesian model comprises two parts: The likelihood of the model's parameters and their prior distributions. We start with the likelihood dictated by the Ising model and the pseudolikelihood approach that we adopt to circumvent the computational intractability of the full Ising model. We follow up with the specification of prior distributions for the Ising model's parameters, adapting George and McCulloch's (1993) continuous spike-and-slab prior setup to edge selection.
1.1. The Ising Model Pseudolikelihood
In this paper, we adopt the pseudolikelihood approach of Besag (1975), as presented in Eq. (2). We furthermore assume that the observations are independent and identically distributed, such that the full pseudolikelihood becomes
$$p^*(\mathbf{X} \mid \boldsymbol{\mu}\text{, }\boldsymbol{\Sigma}) = \prod_{v=1}^n \prod_{i=1}^p \frac{\exp\left(x_{vi} \left(\mu_i + \sum_{j \ne i} \sigma_{ij} x_{vj}\right)\right)}{1 + \exp\left(\mu_i + \sum_{j \ne i} \sigma_{ij} x_{vj}\right)},$$
where $\mathbf{X} = (\mathbf{x}_1^\mathsf{T}, \dots, \mathbf{x}_n^\mathsf{T})^\mathsf{T}$, and we have adopted v to index the n independent and identically distributed observations. Both maximum pseudolikelihood and Bayesian pseudoposterior estimates are consistent as n increases (e.g., Arnold & Strauss, 1991; Geys, Molenberghs, & Ryan, 2007; Miller, 2019) and can consistently uncover the unknown graph structure of the full Ising model (Barber & Drton, 2015; Csiszár & Talata, 2006; Meinshausen & Bühlmann, 2006; Ravikumar et al., 2010). As a result, the pseudolikelihood has become an indispensable tool in the structure selection of Ising models.
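For concreteness, the logarithm of this joint pseudolikelihood is cheap to evaluate, in contrast to the full Ising likelihood; a minimal R sketch:

```r
# Log of the joint Ising pseudolikelihood: a sum of logistic log-likelihoods,
# one per variable, evaluated on an n-by-p binary data matrix X.
log_pseudolikelihood <- function(X, mu, Sigma) {
  p <- ncol(X)
  logpl <- 0
  for (i in 1:p) {
    eta <- as.vector(mu[i] + X[, -i] %*% Sigma[-i, i])  # linear predictor
    logpl <- logpl + sum(dbinom(X[, i], 1, plogis(eta), log = TRUE))
  }
  logpl
}
```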
1.2. The Continuous Spike-and-Slab Prior Setup and Its Relation to Other Approaches
There are several ways to bring the indicator variables into our Bayesian model (e.g., Dellaportas, Forster, & Ntzoufras, 2002; George & McCulloch, 1993; Kuo & Mallick, 1998); O'Hara and Sillanpää (2009) and Consonni, Fouskakis, Liseo, and Ntzoufras (2018) provide two recent overviews. One interesting approach was recently proposed by Pensar, Nyman, Niiranen, and Corander (2017), who essentially use the indicator variables to draw a Markov blanket in the full-conditional distributions of the Ising model and then construct a marginal pseudolikelihood to select the network's structure. A key aspect of their approach is that they formulated a Bayesian model on the individual pseudolikelihoods rather than on the model's parameters, and, using a few simplifying assumptions, they were able to derive analytic expressions for the marginal pseudolikelihoods. Unfortunately, this also required treating the pairwise associations as nuisance parameters. As a result, inference on the model's parameters remains out of reach, and, in addition, it is unclear how the priors on the pseudolikelihoods translate to the model's parameters. We take a different route, but a numerical comparison between our approach and that of Pensar et al. (2017)—implemented in the R package BDgraph (R. Mohammadi & Wit, 2019)—can be found in the online appendix.
In this paper, we adopt the continuous spike-and-slab approach, which comprises two parts. First, a mixture of two zero-centered normal distributions is imposed on the focal parameters; here, the focal parameters are the pairwise associations $\sigma_{ij}$. The indicator variables are used to distinguish between the two mixture components, and thus the prior distribution on the focal parameters becomes
$$p(\sigma_{ij} \mid \gamma_{ij}) = \gamma_{ij}\, \mathcal{N}(0\text{, }\nu_1) + (1 - \gamma_{ij})\, \mathcal{N}(0\text{, }\nu_0),$$
where $\mathcal{N}(0\text{, }\nu)$ denotes the normal distribution with a mean equal to zero and a variance equal to $\nu$. A small but positive variance $\nu_0 > 0$ is assigned to the component associated with $\gamma_{ij} = 0$ to encourage the exclusion of negligible nonzero values, and a large variance $\nu_1 \gg \nu_0$ is assigned to the component associated with $\gamma_{ij} = 1$ to accommodate all plausible values of the interaction. The continuous spike-and-slab approach is a computationally convenient alternative to the discontinuous spike-and-slab approach that is common in model selection.
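As a small illustration (with arbitrary values for $\nu_0$, $\nu_1$, and the mixing weight), the prior on an association, marginalized over the indicator, is a two-component normal mixture:

```r
# Continuous spike-and-slab prior density for one association sigma_ij,
# marginalized over gamma_ij ~ Bernoulli(theta); all values are illustrative.
dspikeslab <- function(sigma, nu0 = 0.01, nu1 = 1, theta = 0.5) {
  (1 - theta) * dnorm(sigma, 0, sqrt(nu0)) +  # spike: concentrates near zero
    theta * dnorm(sigma, 0, sqrt(nu1))        # slab: spreads over plausible values
}
curve(dspikeslab(x), from = -3, to = 3, n = 1001,
      xlab = expression(sigma[ij]), ylab = "prior density")
```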
In the discontinuous spike-and-slab approach, the continuous spike distribution is replaced with a Dirac delta measure at zero. In other words, the association is set to zero for structures in which the relation is absent. The discontinuous spike-and-slab setup is popular in structure selection for Gaussian graphical models (GGMs; e.g., Carvalho & Scott, 2009; A. Mohammadi & Wit, 2015) and generalizations such as the copula GGM for binary and categorical variables (e.g., Dobra & Lenkoski, 2011) and the multivariate probit model for binary variables (e.g., Talhouk, Doucet, & Murphy, 2012)—variants of which are also implemented in the R packages BDgraph (R. Mohammadi & Wit, 2019) and BGGM (Williams & Mulder, 2020b). For the GGM and its generalizations, the slab priors are assigned to the inverse-covariance or precision matrix (i.e., the matrix of partial correlations) and thus often use Wishart-type priors rather than the normal distributions that we propose for the Ising model's associations.
The upside of using discontinuous over continuous spike-and-slab priors is that one only needs to consider the specification of the slab prior and that structure selection consistency is more easily attained. The downside, however, is that for models such as the Ising model, we run into severe computational challenges. The EM and Gibbs solutions that we advocate in this paper would not work for the Ising model if we used the discontinuous spike-and-slab setup. The primary reason is that one cannot analytically integrate out the focal parameters when updating the edge indicators. Pensar et al. (2017) were able to derive their analytic solutions by stipulating the Ising model's pseudolikelihood as the focal parameter and assuming orthogonality between the different full-conditionals. The continuous spike-and-slab approach proposed in this paper does not require an analytic integration of effects from the likelihood and is thus well suited for use with the Ising model. Wang (2015) also applied the continuous approach to edge selection for the GGM, which is implemented in the R package ssgraph (R. Mohammadi, 2020).
The second part of our spike-and-slab approach is the specification of a prior distribution on the selection variables. Here, the selection variables are a priori modeled as i.i.d. Bernoulli($\theta$) variables, which implies the following prior distribution on the structures $\boldsymbol{\gamma}_s$:
$$p(\boldsymbol{\gamma}_s \mid \theta) = \theta^{\gamma_{s++}} \left(1 - \theta\right)^{\binom{p}{2} - \gamma_{s++}},$$
where $\gamma_{s++} = \sum_{i=1}^{p-1} \sum_{j=i+1}^p \gamma_{sij}$, with $\gamma_{ij} = \gamma_{ji}$. Once the hyperparameters $\nu_0$, $\nu_1$, and $\theta$ are set, and the nuisance parameters are assigned a prior distribution, the posterior structure probabilities can be estimated using, for example, a Gibbs sampler (Geman & Geman, 1984; George & McCulloch, 1993). We stipulate independent standard-normal prior distributions on the nuisance parameters $\boldsymbol{\mu}$ and make the objective specification of hyperparameters the topic of the ensuing sections.
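This setup makes the Gibbs updates of the indicators particularly simple: Conditional on the current value of $\sigma_{ij}$, the indicator $\gamma_{ij}$ is Bernoulli, with a success probability equal to the slab's share of the mixture density. A minimal sketch of this standard spike-and-slab update, with illustrative values:

```r
# One Gibbs update of an edge indicator gamma_ij given the current sigma_ij:
# p(gamma_ij = 1 | sigma_ij) is the slab's share of the mixture density.
update_gamma <- function(sigma, nu0, nu1, theta) {
  slab  <- theta * dnorm(sigma, 0, sqrt(nu1))
  spike <- (1 - theta) * dnorm(sigma, 0, sqrt(nu0))
  rbinom(1, 1, slab / (slab + spike))
}
update_gamma(sigma = 0.3, nu0 = 0.01, nu1 = 1, theta = 0.5)
```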
2. Structure Selection Consistency
In this section, we analyze posterior selection consistency: the ability of our method to consistently determine the correct network structure. As alluded to in the introduction, selection consistency using George and McCulloch's spike-and-slab approach crucially depends on the hyperparameters $\nu_0$ and $\nu_1$. Unfortunately, fixing these parameters does not guarantee that our structure selection procedure is consistent. Narisetty and He (2014) showed that the use of fixed constants may lead to an inconsistent selection procedure in the context of linear regression. Below, we demonstrate that this is also the case in the context of structure selection for Ising models. However, we also show that our selection approach is consistent if the spike variance $\nu_0$ shrinks as a function of n. Narisetty and He presented a similar result for linear regression.
We first work out the concepts relevant to selection consistency, such as the posterior structure probability, and derive an approximate Bayes factor that is useful for the large-sample analysis. We then analyze the case with fixed hyperparameters and show that the selection procedure is inconsistent for fixed p and increasing n. Finally, we analyze the situation where the spike variance shrinks with n and show that this shrinking hyperparameter setup leads to a consistent selection procedure for fixed p and increasing n.
2.1. Selection Consistency
We assume that the true structure t is in the set $\mathcal{S}$. We quantify our uncertainty in selecting a structure s, $s \in \mathcal{S}$, using the posterior structure probability
$$p(\boldsymbol{\gamma}_s \mid \mathbf{X}) = \frac{p^*(\mathbf{X} \mid \boldsymbol{\gamma}_s)\, p(\boldsymbol{\gamma}_s)}{\sum_{r \in \mathcal{S}} p^*(\mathbf{X} \mid \boldsymbol{\gamma}_r)\, p(\boldsymbol{\gamma}_r)} = \frac{\text{BF}^*_{st}\, \text{o}_{st}}{\sum_{r \in \mathcal{S}} \text{BF}^*_{rt}\, \text{o}_{rt}},$$
where $p^*(\mathbf{X} \mid \boldsymbol{\gamma}_s)$ denotes the integrated pseudolikelihood for the structure s, $\text{BF}^*_{st}$ the Bayes factor pitting structure s against the correct structure t, and $\text{o}_{st}$ denotes the prior model odds of the two structures. Selection consistency requires us to show that the posterior structure probabilities $p(\boldsymbol{\gamma}_s \mid \mathbf{X})$ tend to zero for structures $s \ne t$, and that $p(\boldsymbol{\gamma}_t \mid \mathbf{X})$ tends to one as the sample size grows. This is equivalent to showing that the Bayes factors $\text{BF}_{st}$ tend to zero for structures $s \ne t$. Unfortunately, analytic expressions for the Bayes factors are currently unavailable. To arrive at a workable expression for the Bayes factor, we first redefine it in terms of the expected prior odds under the correct posterior distribution,
$$\text{BF}^*_{st} = \int \frac{p(\boldsymbol{\Sigma} \mid \boldsymbol{\gamma}_s)}{p(\boldsymbol{\Sigma} \mid \boldsymbol{\gamma}_t)}\, p^*(\boldsymbol{\Sigma}\text{, }\boldsymbol{\mu} \mid \mathbf{X}\text{, }\boldsymbol{\gamma}_t)\, \text{d}\boldsymbol{\Sigma}\, \text{d}\boldsymbol{\mu},$$
which is the posterior expectation of the ratio of the prior distributions of $\boldsymbol{\Sigma}$ for the two models, s and t, under the correct structure specification $\boldsymbol{\gamma}_t$. This is a convenient representation, as we only have to consider the pseudoposterior distribution under the correct network structure. Observe that this representation also holds when the full Ising likelihood is used, except that in the latter case the Bayes factor $\text{BF}_{st}$ is expressed as the expected prior odds w.r.t. the posterior distribution and not the pseudoposterior distribution. For a fixed network of p variables, the posterior distribution can be accurately approximated with a normal distribution as n becomes large (see, for instance, Miller, 2019, Theorem 6.2), and the same holds for the pseudoposterior distribution (see, for instance, Miller, 2019, Theorems 3.2 and 7.3). To arrive at a workable expression for the Bayes factor, we approximate the pseudoposterior with a normal distribution (i.e., a Laplace approximation), which leads to the following first-order approximation of the Bayes factor (Tierney, Kass, & Kadane, 1989, Eq. 2.6):
$$\text{BF}^*_{st} = \frac{p(\widehat{\boldsymbol{\Sigma}} \mid \boldsymbol{\gamma}_s)}{p(\widehat{\boldsymbol{\Sigma}} \mid \boldsymbol{\gamma}_t)} \left(1 + \mathcal{O}\left(n^{-1}\right)\right),$$
where $\widehat{\boldsymbol{\Sigma}} = [\hat{\sigma}_{ij}]$ is the mode of $p^*(\boldsymbol{\Sigma}\text{, }\boldsymbol{\mu} \mid \mathbf{X}\text{, }\boldsymbol{\gamma}_t)$, or of $p(\boldsymbol{\Sigma}\text{, }\boldsymbol{\mu} \mid \mathbf{X}\text{, }\boldsymbol{\gamma}_t)$ if the full Ising likelihood is used. Tierney et al. (1989) show that the error of the first-order approximation—the rest term $\mathcal{O}(n^{-1})$—is of order 1/n. Since the pseudoposterior is consistent (cf. Miller, 2019, Theorem 7.3), the Bayes factor based on the pseudolikelihood and the one based on the full likelihood converge to the same number.
We next show that the approximate Bayes factors $\text{BF}^*_{st}$, for $s \ne t$, do not shrink to zero when the three hyperparameters are fixed, but do shrink to zero if $\nu_0$ shrinks to zero at a rate $n^{-1}$. The approximate Bayes factor comprises a product of the edge-specific functions
$$f_{ij} = \left(\sqrt{\frac{\nu_0}{\nu_1}}\, \exp\left(\hat{\sigma}_{ij}^2\, \frac{\nu_1 - \nu_0}{2 \nu_1 \nu_0}\right)\right)^{\gamma_{s\text{, }ij} - \gamma_{t\text{, }ij}},$$
which consist of two parts: The selection variables $\gamma_{s\text{, }ij}$ and $\gamma_{t\text{, }ij}$, which inform about the differences in edge composition between structures s and t, and the function $\sqrt{\nu_0/\nu_1}\exp\left(\hat{\sigma}_{ij}^2(\nu_1-\nu_0)/2\nu_1\nu_0\right)$, which weighs in the contribution of the pseudoposterior. The edge-specific function $f_{ij}$ is equal to one if the edge is present in both structures, or is absent from both structures, since then $\gamma_{t\text{, }ij} - \gamma_{s\text{, }ij} = 0$. We therefore only have to consider what happens to the function $f_{ij}$ in cases where $\gamma_{s\text{, }ij} \ne \gamma_{t\text{, }ij}$.
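For later reference, the edge-specific function is a one-liner in R (a direct transcription of the display above, with all arguments supplied by the user):

```r
# Edge-specific factor f_ij of the approximate Bayes factor BF*_st,
# evaluated at the pseudoposterior mode sigma_hat of sigma_ij.
f_ij <- function(sigma_hat, nu0, nu1, gamma_s, gamma_t) {
  base <- sqrt(nu0 / nu1) * exp(sigma_hat^2 * (nu1 - nu0) / (2 * nu1 * nu0))
  base^(gamma_s - gamma_t)  # equals 1 when the two structures agree on the edge
}
```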
2.1.1. The Fixed Hyperparameter Case
If $\gamma_{t\text{, }ij}$ is equal to zero, and $\gamma_{s\text{, }ij}$ is equal to one, the correct value for the interaction parameter $\sigma_{ij}$ is zero, so that $\hat{\sigma}_{ij}$ tends to zero, and we observe that
$$f_{ij} = \sqrt{\frac{\nu_0}{\nu_1}}\, \exp\left(\hat{\sigma}_{ij}^2\, \frac{\nu_1 - \nu_0}{2 \nu_1 \nu_0}\right) \longrightarrow \sqrt{\frac{\nu_0}{\nu_1}},$$
which, even though it is smaller than one and signals a preference for structure t, does not converge to zero, as it should if the structure selection procedure were consistent. If $\gamma_{t\text{, }ij}$ is equal to one, and $\gamma_{s\text{, }ij}$ is equal to zero, on the other hand, such that $|\sigma_{ij}| > 0$, we observe that
$$f_{ij} = \sqrt{\frac{\nu_1}{\nu_0}}\, \exp\left(-\hat{\sigma}_{ij}^2\, \frac{\nu_1 - \nu_0}{2 \nu_1 \nu_0}\right) \longrightarrow \sqrt{\frac{\nu_1}{\nu_0}}\, \exp\left(-\sigma_{ij}^2\, \frac{\nu_1 - \nu_0}{2 \nu_1 \nu_0}\right),$$
which does not converge to zero either. In fact, it may even signal a preference for the absence of the edge in structure s. These two observations indicate that the Bayes factors $\text{BF}^*_{st}$ do not converge to zero, and thus, the posterior probability $p(\boldsymbol{\gamma}_t \mid \mathbf{X})$ does not converge to one. In sum, the proposed structure selection procedure is inconsistent when the three hyperparameters are fixed.
2.1.2. The Shrinking Hyperparameter Case
We next consider the case where $\nu_0$ shrinks at a rate $n^{-1}$ and define $\nu_0 = \tfrac{\nu_1 \xi}{n}$. Here, $\xi$ is a fixed positive penalty parameter that allows us some flexibility to emphasize the distinction between the spike and slab components. If, in this case, $\gamma_{t\text{, }ij}$ is equal to one and $\gamma_{s\text{, }ij}$ is equal to zero, the function $f_{ij}$ is equal to
$$f_{ij} = \sqrt{\frac{n}{\xi}}\, \exp\left(-\hat{\sigma}_{ij}^2\, \frac{n - \xi}{2 \nu_1 \xi}\right),$$
where the first factor tends to infinity, and the second factor tends to zero. Because the second factor tends to zero (exponentially in n, as $\hat{\sigma}_{ij}^2$ tends to $\sigma_{ij}^2 > 0$) faster than the first factor tends to infinity (at rate $\sqrt{n}$), their product tends to zero, as it should. On the other hand, if $\gamma_{t\text{, }ij}$ is equal to zero and $\gamma_{s\text{, }ij}$ is equal to one, the function $f_{ij}$ becomes
$$f_{ij} = \sqrt{\frac{\xi}{n}}\, \exp\left(\left(\sqrt{n}\, \hat{\sigma}_{ij}\right)^2 \frac{1 - \xi/n}{2 \nu_1 \xi}\right),$$
where the first factor tends to zero, and the second factor remains bounded in probability because $\sqrt{n}\,\hat{\sigma}_{ij}$ is bounded in probability ($\hat{\sigma}_{ij} = \mathcal{O}_p(1/\sqrt{n})$). Therefore, $f_{ij}$ tends to zero, as it should. In sum, the structure selection procedure is consistent if $\nu_0$ shrinks at a rate of $n^{-1}$.
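A quick numerical illustration of both limits (with illustrative values $\nu_1 = 1$ and $\xi = 1$, and with the pseudoposterior mode stylized as $\hat{\sigma}_{ij} = 0.5$ for a present edge and $\hat{\sigma}_{ij} = 1/\sqrt{n}$ for an absent one):

```r
# Numerical illustration: with nu0 = nu1 * xi / n, the edge-specific factor
# f_ij vanishes as n grows, whichever way structure s errs.
f_ij <- function(sigma_hat, nu0, nu1, gamma_s, gamma_t) {
  base <- sqrt(nu0 / nu1) * exp(sigma_hat^2 * (nu1 - nu0) / (2 * nu1 * nu0))
  base^(gamma_s - gamma_t)
}
nu1 <- 1; xi <- 1
for (n in c(100, 1000, 10000)) {
  nu0 <- nu1 * xi / n
  # structure s omits a true edge (stylized mode near sigma = 0.5):
  f_missing  <- f_ij(0.5,         nu0, nu1, gamma_s = 0, gamma_t = 1)
  # structure s adds a spurious edge (true sigma = 0, mode of order 1/sqrt(n)):
  f_spurious <- f_ij(1 / sqrt(n), nu0, nu1, gamma_s = 1, gamma_t = 0)
  cat(sprintf("n = %5g: f (missing edge) = %.2e, f (spurious edge) = %.2e\n",
              n, f_missing, f_spurious))
}
```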
3. Objective Prior Specification
Following the results of the previous section, we set the spike variance to $\nu_0 = \tfrac{\nu_1 \xi}{n}$, which leaves the specification of the slab variance $\nu_1$, the penalty parameter $\xi$, and the prior inclusion probability to complete our Bayesian model blueprint. We first discuss a default setting for the spike and slab variances, i.e., the specification of $\nu_1$ and $\xi$. We then discuss two options for the prior inclusion probabilities that we adopt in this paper.
3.1. Specification of the Spike and Slab Variances
One approach to finding default values for the slab variance is to set it equal to n times the inverse of the Fisher information matrix, $n\, \mathcal{I}_{\Sigma}(\hat{\boldsymbol{\Sigma}}\text{, }\hat{\boldsymbol{\mu}})^{-1}$, which approximately gives the information about $\sigma_{ij}$ in a single observation, hence the name unit information (Kass & Wasserman, 1995). Kass and Wasserman (1995) showed that the logarithm of the Bayes factor—pitting one network structure against another—is approximately equal to the difference in the Bayesian information criteria (BIC; Schwarz, 1978) of the two structures when unit information priors are used (see also Raftery, 1999; Wagenmakers, 2007, for details). This result, combined with the fact that unit information priors can be selected automatically, makes them a popular approach in Bayesian variable selection. We follow the approach of Ntzoufras (2009), who achieved good results by setting the off-diagonal elements of $\mathcal{I}^{-1}$ to zero in the prior specification. This renders the spike-and-slab prior densities independent and sets the slab variance to $\nu_{1\text{, }ij} = n \,\text{Var}(\hat{\sigma}_{ij})$. If we set the slab variance equal to the unit information, the spike variance is equal to $\nu_{0\text{, }ij} = \xi \,\text{Var}(\hat{\sigma}_{ij})$. Our structure selection procedure will still consistently select the correct structure, since $\nu_{0\text{, }ij}$ shrinks at rate $n^{-1}$ because $\text{Var}(\hat{\sigma}_{ij})$ does (e.g., Miller, 2019, Section 5.2).
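As a rough sketch of this specification, $\text{Var}(\hat{\sigma}_{ij})$ can be read off the estimated covariance matrix of an unpenalized pseudolikelihood fit; the code below approximates it from a single nodewise logistic regression on simulated data, a simplification relative to the joint pseudolikelihood information matrix that the procedure actually works with.

```r
# Rough unit-information sketch: approximate Var(sigma_hat_ij) from the i-th
# nodewise logistic regression, then set nu1 = n * Var and nu0 = xi * Var.
set.seed(1)
n <- 500; p <- 5
X <- matrix(rbinom(n * p, 1, 0.5), n, p)

i <- 1; j <- 2
fit <- glm(X[, i] ~ X[, -i], family = binomial())
v <- diag(vcov(fit))[-1]  # variances of the association estimates (no intercept)
var_ij <- v[j - 1]        # Var(sigma_hat_ij) from node i's fit (valid as j > i)
xi <- 1
nu1 <- n * var_ij         # slab: unit-information variance
nu0 <- xi * var_ij        # spike: shrinks at rate 1/n
c(nu0 = nu0, nu1 = nu1)
```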
The spike-and-slab parameters are specified up to the constant $\xi$, which acts as a penalty parameter on the inclusion and exclusion of effects in the spike-and-slab prior. Larger values of $\xi$ increase the overlap between the spike and slab components and, consequently, make it more likely that an effect is excluded, i.e., ends up in the spike component; smaller values have the opposite effect. Finding a good value for this penalty is therefore crucial. We wish to specify the tuning parameter $\xi$ such that the performance of our edge selection approach is similar to that of eLasso. To that aim, we introduce an automated procedure that gears the corresponding continuous spike-and-slab setup towards a high specificity, or a low type-1 error rate, similar to eLasso. The idea that we pursue here is to set the intersection of the spike and slab components equal to an approximate credible interval about zero. The left panel in Fig. 1 illustrates the idea.
George and McCulloch (1993) showed that the two densities intersect at
$$|\delta| = \sqrt{\frac{\nu_0\, \nu_1}{\nu_1 - \nu_0}\, \ln\left(\frac{\nu_1}{\nu_0}\right)}. \qquad (4)$$
If we fill in our definitions for the spike and slab variances, the expression for $|\delta|$ boils down to
$$|\delta| = \sqrt{\text{Var}(\hat{\sigma}_{ij})\, \frac{n\, \xi}{n - \xi}\, \ln\left(\frac{n}{\xi}\right)}. \qquad (5)$$
Where George and McCulloch (1993) discuss the subjective specification of $\delta$, we explore its automatic specification by matching it to an approximate credible interval. We first determine the range of parameter values $(-|\delta|, |\delta|)$ considered negligible, and then select the value of $\xi$ such that the spike and slab components intersect at $\pm|\delta|$. When $n$ is sufficiently large, the pseudoposterior distribution of an association parameter $\sigma_{ij}$ is approximately normal (Miller, 2019), and $\text{Var}(\hat{\sigma}_{ij})$ is its approximate variance. Thus, $\hat{\sigma}_{ij} \pm 3\sqrt{\text{Var}(\hat{\sigma}_{ij})}$ offers an approximate $99.7\%$ credible interval about the posterior mean $\hat{\sigma}_{ij}$.
To set the variance of the spike distribution for negligible effects, it is opportune to use the interval $\pm 3\sqrt{\text{Var}(\hat{\sigma}_{ij})}$, which offers an approximate credible interval about zero, i.e., the credible interval assuming that the edge $i$–$j$ should, in fact, be excluded from the model. Equating the expression for $|\delta|$ on the right side of Eq. (5) with $3\sqrt{\text{Var}(\hat{\sigma}_{ij})}$ gives:
$$9\,(n - \xi) = n\, \xi\, \ln\left(\frac{n}{\xi}\right), \qquad (6)$$
which we can solve numerically to obtain a value for $\xi$. Observe that, by specifying $\delta$ in this particular way, the penalty parameter $\xi$ depends on the sample size but not on the data or the network's size. We denote the value of the penalty parameter that matches the intersection $\delta$ to the credible interval with $\xi_\delta$. The relation between $\xi_\delta$ and sample size is illustrated in the right panel of Fig. 1.
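To make this concrete, the following R sketch solves Eq. (6) numerically with base R's uniroot; the helper name find_xi_delta is ours, not part of any package.

```r
# Solve Eq. (6), 9 (n - xi) = n xi log(n / xi), for the penalty xi_delta.
# The solution depends only on the sample size n, not on the data or on
# the size of the network.
find_xi_delta <- function(n, z = 3) {
  stopifnot(n > z^2)  # a sign change on (0, n) requires n > 9 when z = 3
  f <- function(xi) xi * n * log(n / xi) / (n - xi) - z^2
  uniroot(f, lower = .Machine$double.eps, upper = n - 1e-8)$root
}

find_xi_delta(1000)  # penalty matching the 99.7% (3 SD) interval
```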
3.2. Specification of the Prior Inclusion Probability
Assuming that the correct structure is in $\mathcal{S}$, i.e., the $\mathcal{S}$-closed view of structure selection, a default choice to express ignorance or indifference between the structures in $\mathcal{S}$ is to stipulate a uniform prior distribution over the topologies in $\mathcal{S}$:
$$p(\boldsymbol{\gamma}) = \frac{1}{|\mathcal{S}|},$$
where $|\mathcal{S}|$ denotes the cardinality of the structure space. Here, the uniform prior is equal to
$$p(\boldsymbol{\gamma}) = \left(\frac{1}{2}\right)^{\binom{p}{2}},$$
and we can impose this prior on the structure space by fixing the prior inclusion probability $\theta$ in Eq. (3) to $\tfrac{1}{2}$. However, the uniform prior on the structure space does not take into account structural features of the models under consideration, such as sparsity. Various priors have been proposed as alternatives that accommodate such features (see Consonni et al., 2018, Section 3.6, for a detailed discussion). One particular issue inherent in structure comparisons is multiplicity, and Scott and Berger (2010) argue that the prior distribution should account for this. Consonni et al. (2018) show that stipulating a hyperprior on the prior inclusion probability $\theta$ accounts for multiplicity. In particular, they showed that the uniform hyperprior $\theta \sim \text{Beta}(1, 1)$ leads to the following prior on the structure space
$$p(\boldsymbol{\gamma}) = \frac{1}{\binom{p}{2} + 1}\binom{\binom{p}{2}}{c}^{-1},$$
where $c \in \{0, 1, \dots, \binom{p}{2}\}$ denotes the complexity of a structure, i.e., its number of edges. Thus, instead of a uniform prior on the structure space, the hierarchical prior stipulates a uniform prior on the structure's complexity. As a result, it favors models at relatively extreme levels of complexity, e.g., models that are densely connected or sparsely connected.
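A small R sketch, assuming the two prior forms given above, contrasts the specifications; both function names are illustrative helpers of our own.

```r
# Log prior probability of a structure with n_edges edges among
# K = choose(p, 2) candidates: uniform over structures vs. uniform over
# complexity (Beta(1, 1) hyperprior on the inclusion probability theta).
log_prior_uniform      <- function(n_edges, K) rep(-K * log(2), length(n_edges))
log_prior_hierarchical <- function(n_edges, K) -log(K + 1) - lchoose(K, n_edges)

K <- choose(15, 2)
# The hierarchical prior puts far more mass on the empty (and the full)
# structure than the uniform prior does:
exp(log_prior_hierarchical(0, K) - log_prior_uniform(0, K))
```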
Figure 2 illustrates the different probabilities that the two distributions assign to structure complexity, a priori, and the probabilities they assign to structures that have the same complexity (shown on a log scale). The left panel of Fig. 2 shows that whereas the hierarchical prior is uniform on the complexity, the uniform prior is not and favors structures that have approximately half of the available edges. However, the right panel of Fig. 2 illustrates that the hierarchical prior emphasizes structures at the extremes of complexity. We will adopt both the uniform prior distribution on the structure space and the uniform prior distribution on the structure’s complexity, and analyze them further in the section on numerical illustrations. Based on Fig. 2, however, we expect that for small samples, the hierarchical prior will place much emphasis on extremely sparse structures, since our penalty selection approach already gears toward sparse solutions.
4. Bayesian Edge Screening and Structure Selection for the Ising Model
George and McCulloch (1993) proposed stochastic search variable selection (SSVS) as a principled approach to Bayesian variable selection. SSVS uses the spike-and-slab prior specification to emphasize the posterior probability of promising structures and Gibbs sampling to extract this information from the data at hand. The Gibbs sampler is a powerful tool for exploring the posterior distribution over potential network structures. However, since the structure space $\mathcal{S}$ can be quite large in practical settings, it might take a while to sufficiently explore the posterior distribution and produce reliable estimates of the posterior structure probabilities. We therefore wish to prune the structure space by selecting the promising edges before running the Gibbs sampler. We explore an EM variable selection approach for this initial edge screening and then follow up with an SSVS approach for structure selection on the set of promising edges.
4.1. Edge Screening with EM Variable Selection
Ročková and George (2014) were the first to propose the use of EM for Bayesian variable selection, in combination with the spike-and-slab prior specification of George and McCulloch (1993), for covariate selection in linear models. The EM algorithm aims to find the mode of the pseudoposterior distribution $p^*(\boldsymbol{\Sigma}, \boldsymbol{\mu}, \theta \mid \mathbf{X})$ and does this by iteratively maximizing the "complete data" pseudoposterior distribution $p^*(\boldsymbol{\Sigma}, \boldsymbol{\mu}, \theta, \boldsymbol{\gamma} \mid \mathbf{X})$, treating the selection variables $\boldsymbol{\gamma}$ as missing or latent variables. The algorithm alternates between two steps. In the expectation or E-step, we compute the expected log-pseudoposterior distribution, or Q-function,
$$Q\left(\boldsymbol{\Sigma}, \boldsymbol{\mu}, \theta \mid \boldsymbol{\Sigma}^k, \theta^k\right) = \mathbb{E}_{\boldsymbol{\gamma}}\left[\ln p^*(\boldsymbol{\Sigma}, \boldsymbol{\mu}, \theta, \boldsymbol{\gamma} \mid \mathbf{X})\right],$$
with respect to the posterior distribution of the latent variables $p(\boldsymbol{\gamma} \mid \boldsymbol{\Sigma}^k, \theta^k)$, where $\boldsymbol{\Sigma}^k$ and $\theta^k$ denote the estimates in iteration $k$. The E-step is followed by a maximization or M-step, in which we find the values $\boldsymbol{\Sigma}^{k+1}$, $\boldsymbol{\mu}^{k+1}$, and $\theta^{k+1}$ that maximize the Q-function. The two steps are repeated until convergence.
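As a concrete illustration, the conditional expectation of a selection variable in the E-step has a closed form under the spike-and-slab prior. The R sketch below is our reading of that update, not the package code; the derivation itself is in Appendix A.

```r
# E-step for a selection variable gamma_ij: its conditional expectation
# given the current estimates is the spike-and-slab inclusion weight.
e_step_gamma <- function(sigma_k, theta_k, nu1, nu0) {
  slab  <- theta_k * dnorm(sigma_k, mean = 0, sd = sqrt(nu1))
  spike <- (1 - theta_k) * dnorm(sigma_k, mean = 0, sd = sqrt(nu0))
  slab / (slab + spike)  # E[gamma_ij | sigma_ij^k, theta^k]
}
```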
The E-step of the EM algorithm involves expectations of the latent or missing variables, i.e., the vector of selection variables $\boldsymbol{\gamma}$. Since the latent selection variables only operate in the spike-and-slab prior distributions, the derivation of the E-step follows that of Ročková and George (2014). For a complete treatment of EMVS, however, we include an analysis of both the E-step and the M-step in Appendix A, which also includes details about estimating the (asymptotic) posterior standard deviations from the EM output.
4.1.1. Edge Screening
The EM algorithm that we outlined in the previous section identifies a posterior mode $(\widehat{\boldsymbol{\Sigma}}, \hat{\boldsymbol{\mu}}, \hat{\theta})$, and we threshold the modal estimates to obtain a closely matching network structure $\hat{\boldsymbol{\gamma}}$. The idea of Ročková and George (2014) that we pursue here is that "large" interaction effect estimates define a set of promising edges, so that we can prune the edges that correspond to "small" interaction effect estimates. We define the structure $\hat{\boldsymbol{\gamma}}$ that most closely matches the modal estimates to be the most probable structure $\boldsymbol{\gamma}$ given the parameter values $(\boldsymbol{\Sigma}, \boldsymbol{\mu}, \theta) = (\widehat{\boldsymbol{\Sigma}}, \hat{\boldsymbol{\mu}}, \hat{\theta})$, i.e.,
$$\hat{\boldsymbol{\gamma}} = \underset{\boldsymbol{\gamma}}{\arg\max}\; p(\boldsymbol{\gamma} \mid \widehat{\boldsymbol{\Sigma}}, \hat{\boldsymbol{\mu}}, \hat{\theta}).$$
For our Bayesian model, the posterior inclusion probabilities for the different edges are conditionally independent, and the posterior inclusion probability for an edge $i$–$j$ is given by
$$p(\gamma_{ij} = 1 \mid \hat{\sigma}_{ij}, \hat{\theta}) = \frac{\hat{\theta}\, \phi(\hat{\sigma}_{ij} \mid 0, \nu_{1,ij})}{\hat{\theta}\, \phi(\hat{\sigma}_{ij} \mid 0, \nu_{1,ij}) + (1 - \hat{\theta})\, \phi(\hat{\sigma}_{ij} \mid 0, \nu_{0,ij})}, \qquad (7)$$
where $\phi(\cdot \mid 0, \nu)$ denotes a zero-centered normal density with variance $\nu$.
Thus, we obtain $\hat{\boldsymbol{\gamma}}$ by maximizing the inclusion and exclusion probabilities in Eq. (7) for each of the $\binom{p}{2}$ edges, which means that
$$\hat{\gamma}_{ij} = \begin{cases} 1 & \text{if } p(\gamma_{ij} = 1 \mid \hat{\sigma}_{ij}, \hat{\theta}) > 0.5,\\ 0 & \text{otherwise,} \end{cases}$$
and we prune the edges for which $p(\gamma_{ij} = 0 \mid \hat{\sigma}_{ij}, \hat{\theta}) \ge 0.5$. This edge selection and pruning approach leads to a structure $\boldsymbol{\gamma}$ that is a median probability model, defined by Barbieri and Berger (2004) as the structure comprising the edges that have a posterior inclusion probability at or above one half (Footnote 6). Ročková and George (2014) show that instead of selecting the structure $\hat{\boldsymbol{\gamma}}$ based on the posterior inclusion probabilities, we may equivalently select it by thresholding the values of $\hat{\sigma}_{ij}$. Specifically,
$$\hat{\gamma}_{ij} = 1 \iff \hat{\sigma}_{ij}^2 \ge 2\left[\ln\left(\frac{1 - \hat{\theta}}{\hat{\theta}}\right) + \ln\sqrt{\frac{n}{\xi}}\,\right] \frac{\nu_{0,ij}\, \nu_{1,ij}}{\nu_{1,ij} - \nu_{0,ij}}. \qquad (8)$$
Such a connection between the magnitude of the modal estimates $\hat{\sigma}_{ij}$ and the promising edges $i$–$j$ is what we envisioned from the beginning. Observe that, since $n\,\text{Var}(\hat{\sigma}_{ij})$ is the unit information, i.e., a constant, the right-most factor shrinks with $n$. Moreover, it shrinks much faster than $\ln(\sqrt{n})$ tends to infinity, such that the threshold moves to smaller values as $n$ increases, as it should.
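In code, the thresholding rule of Eq. (8) can be sketched as follows. With $\hat{\theta} = \tfrac{1}{2}$ and $\xi = \xi_\delta$ (via find_xi_delta from Section 3.1), it reproduces the three-standard-deviation intersection point, a useful consistency check; the numeric values below are purely illustrative.

```r
# Threshold on |sigma_hat_ij| implied by the median probability rule of
# Eq. (8); modal estimates below the threshold are pruned.
edge_threshold <- function(theta_hat, nu1, nu0) {
  a <- log((1 - theta_hat) / theta_hat) + log(sqrt(nu1 / nu0))
  sqrt(pmax(2 * a, 0) * nu1 * nu0 / (nu1 - nu0))
}

n <- 1000; v <- 0.01  # v stands in for Var(sigma_hat_ij)
edge_threshold(0.5, nu1 = n * v, nu0 = find_xi_delta(n) * v) / sqrt(v)  # ~ 3
```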
4.2. Structure Selection with SSVS
The EMVS approach enables us to screen for a promising set of edges by locating a local posterior mode and pruning the edges associated with small modal parameters. The structure $\boldsymbol{\gamma}^\prime$ that comes out of this pruned edge set is a local median probability structure. We now wish to directly explore $p^*(\boldsymbol{\gamma} \mid \mathbf{X})$, the pseudoposterior distribution of network structures, to find out whether $\boldsymbol{\gamma}^\prime$ is also the global median probability model, and whether there are other promising structures for the data at hand. We do this using the stochastic search variable selection (SSVS) approach of George and McCulloch (1993), which essentially combines the spike-and-slab prior setup with Gibbs sampling to produce a sequence
$$\boldsymbol{\gamma}^{(1)} \rightarrow \boldsymbol{\gamma}^{(2)} \rightarrow \boldsymbol{\gamma}^{(3)} \rightarrow \cdots$$
which converges in distribution to samples from $\boldsymbol{\gamma} \sim p(\boldsymbol{\gamma} \mid \mathbf{X})$. We then shift our focus to structures $\boldsymbol{\gamma}_s$ that occur frequently in the generated sequence, which are the structures that have a high posterior probability. We cut down the potentially large number of network structures that the Gibbs sampler needs to explore by applying SSVS only to the edges screened by EMVS (Footnote 7).
The Gibbs sampler operates by iteratively simulating values from the conditional distributions of (a subset of) the model parameters given the (other parameters and the) observed data. Unfortunately, the full-conditional distributions of our Bayesian model are not available in closed form, as the normal prior distributions that we have specified are not conjugate to the pseudolikelihood. However, since the pseudolikelihood comprises a sequence of logistic regressions, we can use the data-augmentation strategy proposed by Polson, Scott, and Windle (2013a) to facilitate a simple Gibbs sampling approach with full-conditionals that are easy to sample from. A similar approach to the Ising model's pseudolikelihood was considered by Donner and Opper (2017). Here, we extend this idea to SSVS for the Ising model.
Polson et al. (2013a) proposed an ingenious data augmentation strategy based on the identity
$$\frac{\left(e^{\psi}\right)^a}{\left(1 + e^{\psi}\right)^b} = 2^{-b}\, e^{\kappa \psi} \int_0^\infty e^{-\omega \psi^2 / 2}\, p(\omega)\, \text{d}\omega, \qquad \kappa = a - \frac{b}{2},$$
where $p(\omega)$ is a Pólya–Gamma distribution. A key aspect of this augmentation strategy is that it relates the logistic function of a parameter $\psi$ on the left to something that is proportional to a normal distribution in $\psi$ on the right. Since our prior distributions are all (conditionally) normal, and the normal distribution is its own conjugate, the data-augmented full-conditionals will all be normal. Applied to the pseudolikelihood in Eq. (2), the identity yields a form that is quadratic in the parameters,
and with normal prior distributions for the pseudolikelihood's parameters, we readily obtain normal full-conditional distributions when we condition on the augmented variables $\boldsymbol{\omega}$. Another important aspect of the augmentation strategy is that the conditional distribution of the augmented variables $\boldsymbol{\omega}$, given the pseudolikelihood parameters and the observed data $\mathbf{X}$, is again a Pólya–Gamma distribution. Polson et al. (2013a) and Windle, Polson, and Scott (2014) provide efficient rejection algorithms to simulate from this distribution.
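The sketch below shows one such data-augmented update for a single logistic regression with a normal prior, using the rpg sampler from the BayesLogit package; it illustrates the mechanism rather than our five-step sampler for the Ising model.

```r
# One Polya-Gamma Gibbs update for logistic regression coefficients beta
# with a normal prior, following Polson, Scott, and Windle (2013a).
library(BayesLogit)  # provides rpg() for Polya-Gamma draws

pg_update <- function(y, X, beta, prior_prec) {
  psi   <- as.vector(X %*% beta)
  omega <- rpg(length(psi), h = 1, z = psi)                # omega_i ~ PG(1, psi_i)
  V     <- solve(crossprod(X * sqrt(omega)) + prior_prec)  # (X'OX + B^-1)^-1
  m     <- V %*% crossprod(X, y - 0.5)                     # kappa_i = y_i - 1/2
  as.vector(m + t(chol(V)) %*% rnorm(ncol(X)))             # beta | omega, y ~ N(m, V)
}
```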
With the Pólya–Gamma augmentation strategy in place, the Gibbs sampler iterates between five steps, which are detailed in Appendix C. The Gibbs output allows us to estimate a number of important quantities. For example, the posterior structure probabilities can be estimated as
$$\hat{p}(\boldsymbol{\gamma} \mid \mathbf{X}) = \frac{1}{R} \sum_{r=1}^R I\left(\boldsymbol{\gamma}^{(r)} = \boldsymbol{\gamma}\right),$$
where $I(\cdot)$ is an indicator function that is equal to one if its condition is satisfied and equal to zero otherwise, and the (global) posterior inclusion probabilities as
$$\hat{p}(\gamma_{ij} = 1 \mid \mathbf{X}) = \frac{1}{R} \sum_{r=1}^R \gamma_{ij}^{(r)},$$
where $\boldsymbol{\gamma}^{(r)}$, for $r = 1, \dots, R$, denotes the $R$ iterates of the Gibbs sampler. In a similar way, one can compute quantities related to the model-averaged posterior distribution of the model parameters, e.g.,
$$p^*(\boldsymbol{\Sigma} \mid \mathbf{X}) = \sum_{\boldsymbol{\gamma} \in \mathcal{S}} p^*(\boldsymbol{\Sigma} \mid \boldsymbol{\gamma}, \mathbf{X})\, p^*(\boldsymbol{\gamma} \mid \mathbf{X}),$$
or any of its marginals. In sum, the Gibbs sampler grants us the full Bayesian experience.
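For instance, given the $R$ sampled structures stored as rows of a binary matrix, these estimators are one-liners in R; Gamma is a hypothetical name for that output matrix.

```r
# Gamma: an R x K binary matrix of Gibbs iterates, one row per iteration
# and one column per candidate edge.
structure_probs <- function(Gamma) {
  keys <- apply(Gamma, 1, paste, collapse = "")
  sort(table(keys) / nrow(Gamma), decreasing = TRUE)  # Pr(structure | X)
}
inclusion_probs <- function(Gamma) colMeans(Gamma)    # Pr(gamma_ij = 1 | X)
```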
5. Numerical Illustrations
In this section, we focus on a comparison of our procedures with eLasso. A comparison between our edge screening and structure selection approaches and the approach of Pensar et al. (2017), as implemented in BDgraph (R. Mohammadi & Wit, 2019), can be found in the online appendix. We have also included model selection for the multivariate probit model (e.g., Talhouk et al., 2012), as implemented in BGGM (Williams & Mulder, 2020b), in that comparison. There were some small variations, but overall the three approaches performed very similarly in terms of edge detection.
5.1. Edge Screening on Simulated Data with Sparse Topologies
The eLasso approach of van Borkulo et al. (2014) is the most popular method for analyzing Ising network models in psychology. We wish to find out how our EMVS approach stacks up against eLasso, and we therefore use the simulation setup of van Borkulo et al. (2014) to compare both methods. Specifically, we focus on the simulations that lead to their Table 2, where an Erdős–Rényi (1960) model is used to generate underlying sparse topologies, and normal distributions are used to simulate the model parameters (Footnote 8). In these simulations, we vary $\pi$, the probability of generating an edge between two variables; $p$, the number of variables; and $n$, the number of observations, and generate 100 datasets for each combination of values for $\pi$, $p$, and $n$.
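A sketch of this data-generating setup in R follows; the exact distributions for the main effects and associations are those of van Borkulo et al. (2014), so the numeric choices below are illustrative stand-ins.

```r
# Draw a sparse Erdos-Renyi topology with edge probability pi_edge and
# attach Ising parameters; the distributional settings are placeholders.
simulate_ising_setup <- function(p, pi_edge) {
  K <- choose(p, 2)
  Sigma <- matrix(0, p, p)
  Sigma[upper.tri(Sigma)] <- rbinom(K, 1, pi_edge) * rnorm(K, 0.5, 0.1)
  Sigma <- Sigma + t(Sigma)  # symmetric association matrix, zero diagonal
  list(mu = rnorm(p, -1, 0.5), Sigma = Sigma)
}

# 100 replications for one (p, pi) design cell:
replicate(100, simulate_ising_setup(p = 20, pi_edge = 0.1), simplify = FALSE)
```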
We analyze the simulated datasets using eLasso with the default settings implemented in the IsingFit program (van Borkulo, Epskamp, & Robitzsch, 2016), i.e., the AND-rule and an EBIC penalty equal to 0.25. We also analyze the simulated datasets using EMVS in combination with the $\xi_\delta$ method and the uniform and hierarchical specifications of the prior structure probabilities. We follow van Borkulo et al. (2014) and express the quality of the estimated solution using its sensitivity and specificity. Sensitivity is the proportion of present edges that are recovered by the method,
$$\text{SEN} = \frac{\text{TP}}{\text{TP} + \text{FN}},$$
i.e., the true positive rate. Specificity is the proportion of absent edges that are correctly recovered,
$$\text{SPE} = \frac{\text{TN}}{\text{TN} + \text{FP}},$$
i.e., the true negative rate. For eLasso, edge inclusion refers to a nonzero association estimate under the AND-rule. For EMVS, it means that the posterior inclusion probability exceeds 0.5.
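In R, both rates reduce to simple counts over the candidate edges; gamma_true and gamma_est are binary vectors over the $\binom{p}{2}$ edges.

```r
# Sensitivity (true positive rate) and specificity (true negative rate)
# of an estimated edge set against the data-generating topology.
edge_recovery <- function(gamma_true, gamma_est) {
  c(sensitivity = mean(gamma_est[gamma_true == 1] == 1),
    specificity = mean(gamma_est[gamma_true == 0] == 0))
}
```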
Table 1 shows the results of these simulations for the eLasso method in the column labelled "eLasso"; they are similar to the results reported in Table 2 of van Borkulo et al. (2014). The first thing to note about these results is that eLasso has a high true negative rate across all simulations. This was to be expected, as $l_1$-regularization gears towards edge exclusion, which is why it performs best in the sparse network settings considered here. Indeed, its specificity goes down as the networks become more densely connected (i.e., larger values of $\pi$). The true positive rate of eLasso is considerably worse than its specificity, especially for the smaller sample sizes. However, the sensitivity increases with sample size, which underscores earlier results that a larger sample size helps overcome the prior shrinkage effect of the lasso (e.g., Epskamp, Kruis, & Marsman, 2017).
Next, we consider the performance of EMVS. The results for EMVS when the penalty $\xi$ is set to $\xi_\delta$, the penalty value for which the intersection of the spike and slab components aligns with the $99.7\%$ approximate credible interval, cf. Eq. (6), are shown in the columns labeled $\xi_\delta$ in Table 1. We analyzed the data using the uniform prior on the model space, $\xi_\delta$ (U), and with the hierarchical model, $\xi_\delta$ (H). The first striking result is that the $\xi_\delta$ approach combined with a uniform prior on the structure space performs almost identically to eLasso, making it a valuable Bayesian alternative to the classical eLasso approach. Observe that the specificity equals the coverage probability of the credible interval for all but the smallest sample size. Thus, as one might expect, the specified coverage probability dictates the method's specificity and, hence, its type-1 error rate. The hierarchical prior on the structure space further improves the already high specificity. For the smaller sample sizes, however, the method's sensitivity is very low, suggesting that it is, perhaps, too conservative for settings with small sample sizes.
5.2. Parameter Estimation on Simulated Data with Dense Topology
We continue with an illustration of the estimation of parameters and inclusion probabilities. For this analysis, we simulate data for $n = 20{,}000$ cases on a $p = 15$ variable network. The main effects were simulated from a Uniform$(-1, 1)$ distribution, and the matrix of associations $\boldsymbol{\Sigma}$ was set to $\mathbf{u}\mathbf{u}^\mathsf{T}$, where $\mathbf{u}$ is a $p$-dimensional vector of Uniform$(-\tfrac{1}{2}, \tfrac{1}{2})$ variables, such that the elements of $\boldsymbol{\Sigma}$ lie between $-\tfrac{1}{4}$ and $\tfrac{1}{4}$ and concentrate around zero. Observe that this is, in principle, a densely connected network, as all edges have a nonzero value, although most effects are very small and negligible. A second dataset of $n = 2{,}000$ cases was used to compare performance across sample sizes.
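In R, this dense rank-one setup amounts to a few lines; the diagonal of the association matrix plays no role in the Ising model and is zeroed here.

```r
# Dense topology: Sigma = u u^T with uniform u, so every association is
# nonzero, lies in (-1/4, 1/4), and concentrates around zero.
p <- 15
u <- runif(p, -0.5, 0.5)
Sigma <- tcrossprod(u)  # equals u %*% t(u)
diag(Sigma) <- 0        # self-interactions are not part of the model
mu <- runif(p, -1, 1)   # main effects
```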
Figure 3 shows the posterior mode estimates for the two sample sizes, using a standard normal prior distribution in Panels (a) and (b) and using our spike-and-slab setup, i.e., edge screening, in Panels (c) and (d). Observe that the effects are relatively small, and thus many observations are needed to retrieve reasonable estimates (Panels (a) and (b)). We therefore cull considerably more of the effects in the edge screening step for the smaller sample size than for the larger sample size (white dots indicate culled associations in Panels (c) and (d)). The horizontal gray lines in Panels (c) and (d) reveal the spike-and-slab intersections for the different associations (there are 210 different lines, which all lie very close to each other), i.e., the thresholds from Eq. (8). Effects that lie between the two intersection points end up in the spike (not selected; white dots); otherwise, they end up in the slab (selected; gray dots). Note that the intersection points lie closer to each other for the larger sample size, as expected. Panels (e) and (f) show the maximum pseudolikelihood estimates for eLasso, subject to the $l_1$ constraint, which selects considerably fewer effects for the larger sample size and exerts a substantial shrinkage effect on the associations (Footnote 9).
Figure 4 illustrates the various shrinkage effects in edge screening using EM and structure selection using the Gibbs sampler. Panels (a) and (b), for example, show that the two procedures produce point estimates that are close to each other. Still, there is also variation between the two methods, especially around the spike-and-slab intersection lines. Although we do not show it here, the posterior estimates from EM and the Gibbs sampler were identical when we used a standard normal prior distribution instead of our spike-and-slab setup. These observations suggest that the differences gleaned from Panels (a) and (b) stem from the fact that the edge screening procedure optimizes the vector of inclusion variables with EM, while the structure selection procedure averages over them in the Gibbs sampler. The differences become even more apparent when we compare the inclusion probabilities they estimate. Panels (c) and (d) show the inclusion probabilities against the posterior mode estimates for the edge screening approach, and Panels (e) and (f) show the inclusion probabilities against the posterior mean estimates for the structure selection procedure. Whereas the inclusion probabilities lie close to zero or one for the EM approach, they show a much smoother relation for the Gibbs sampling approach. The ability to estimate inclusion probabilities that are close to one or zero is called separation, and the EM approach clearly shows better separation than the Gibbs approach, although the spike-and-slab Gibbs sampling approach, i.e., SSVS, already shows excellent separation compared to other methods (e.g., O'Hara & Sillanpää, 2009). Even though the edge screening approach shows better separation, it is also more liberal, as it includes more effects in the model than the structure selection procedure does; these points are shown in gray in Panels (a) and (b).
6. Network Analysis of Alcohol Use Disorder and Depression Data
For an empirical illustration of our Bayesian methods, we assess the relationship between symptoms of alcohol use disorder (AUD) and major depressive disorder (MDD) using data from the National Survey on Drug Use and Health (NSDUH; United States Department of Health and Human Services, 2016). The NSDUH is an American population study on tobacco, alcohol, and drug use, and on mental health issues in the USA. The goal of the NSDUH is to provide accurate estimates of current patterns of substance abuse and their consequences for mental health. The survey is conducted in all 50 states, aiming at a sample of 70,000 individuals; participants have to be above the age of 12 and are randomly selected based on household addresses. We focus on the data on alcohol use and depression obtained in 2014.
The 2014 data comprise 55,271 participants. We exclude participants below the age of 18, participants who never drank alcohol, and participants who did not drink alcohol on more than six occasions in the past year. The final dataset analyzed here comprises 26,571 participants.
We included in our analysis the seven items related to the DSM-V (American Psychiatric Association, 2013) criteria for AUD and the nine symptoms in the NSDUH survey data comprising the DSM-V criteria for MDD. The NSDUH derives the MDD symptoms from survey items formulated in a skip-structure, in which participants are allowed to skip certain items based on the answers they provide. As a result, some specifics of symptoms are not assessed for all participants, which may lead to more symptoms or problems being scored as absent than is actually the case.
In our analyses below, we first screen the network for promising edges and then select plausible structures from the structure space instantiated by the set of promising edges. We also perform structure selection without this initial pruning, to illustrate the necessity of the edge screening step.
6.1. Edge Screening
In total, there were $p = 16$ variables, and thus $\binom{16}{2} = 120$ associations or possible edges to consider. We ran the edge screening procedure using EMVS on the selected NSDUH data. The EMVS setup with a uniform prior on the structure space selected the same edges as the EMVS setup with a uniform prior on structure complexity; we continue here with the results from the former. The edge screening procedure identified 62 promising edges, pruning almost half of the available connections. The eLasso method identified 61 edges, three of which were not identified by our edge screening procedure; conversely, four edges identified by our edge screening procedure were not identified by eLasso. Figure 5a shows the network generated by the screened edges, where blue edges constitute positive associations and red edges constitute negative associations.
We glean several important observations from Fig. 5a. First, with 33 out of $\binom{9}{2} = 36$ possible connections between its nine symptoms, MDD appears to be densely connected. This result may be due, in part, to the skip-structure that underlies the NSDUH assessment of MDD symptoms. However, it is in line with other results on MDD symptoms in the general population (e.g., Caspi et al., 2014). Second, with 20 out of $\binom{7}{2} = 21$ possible connections between its seven symptoms, AUD also appears to be densely connected, although its estimated associations are weaker than those of MDD, which may be due to the skip-structure that underlies the assessment of MDD symptoms. Third, there are relatively few estimated connections between the two disorders. Fourth, our edge screening procedure identified a negative association between depressed mood and withdrawal symptoms; negative associations are scarce in cross-sectional analyses such as the one reported here.
6.2. Structure Selection
We identified 62 promising edges with our screening procedure, which generates a local median probability structure (LMS; cf. Fig. 5a). We now wish to find out what the plausible structures are for the data at hand and how the LMS in Fig. 5a relates to the global median probability structure (GMS), i.e., the structure comprising the edges that have marginal posterior inclusion probabilities
$$p(\gamma_{ij} = 1 \mid \mathbf{X}) \ge 0.5.$$
Barbieri and Berger (2004) showed that this GMS has, in general, excellent predictive properties. We again use the uniform prior on the structure space, which is consistent with the edge screening results shown above.
We ran the Gibbs sampler for 100,000 iterations, which visited 62 out of $2^{66} \approx 7 \times 10^{19}$ possible structures. Pitting the visited structures against the most frequently visited structure using the Bayes factor (Footnote 10),
$$\text{BF}_{1s} = \frac{\hat{p}(\boldsymbol{\gamma}_1 \mid \mathbf{X})}{\hat{p}(\boldsymbol{\gamma}_s \mid \mathbf{X})},$$
where $\boldsymbol{\gamma}_1$ denotes the most frequently visited structure, we identified three structures for which the most visited structure was less than ten times as plausible. A Bayes factor $\text{BF}_{1s}$ of ten or greater is often interpreted as strong evidence in favor of $\boldsymbol{\gamma}_1$ (see, for instance, Jeffreys, 1961; Lee & Wagenmakers, 2013; Wagenmakers, Love, et al., 2018). The structures for which $\text{BF}_{1s}$ was less than ten, and their estimated posterior structure probabilities, are shown in Panels (b), (c), and (d) of Fig. 5. The three structures differ only in the relations between the two disorders.
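Given the structure_probs estimator sketched in Section 4.2, these Bayes factors are ratios of visit frequencies, since the prior over the pruned structure space is uniform; Gamma again stands in for the stored Gibbs iterates.

```r
# Estimated posterior structure probabilities from the Gibbs output;
# under a uniform structure prior, BF_1s equals the posterior odds.
probs <- structure_probs(Gamma)  # sorted, most visited structure first
bf_1s <- probs[1] / probs        # most visited structure vs. the rest
names(probs)[bf_1s < 10]         # structures that remain plausible
```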
In Fig. 6a, we plot the posterior inclusion probabilities obtained from the edge screening analysis against those obtained from the structure selection analysis on the pruned structure space. We glean two things from this figure. First, the local inclusion probabilities are at the extremes, i.e., the values zero and one, whereas the global inclusion probabilities show a broader range of values. This difference in separation was also observed in the analysis of simulated data in Fig. 4. The bottom left corner comprises culled edges, which have a zero probability of inclusion. Second, there is close agreement about which edges are or are not in the median probability structure. The LMS and GMS differed in only one edge (indicated in white; points of agreement are in gray). In Figs. 5e and 5f, we plot the GMS and a difference plot, which reveals the differences between the LMS and GMS (red edges indicating edges that are in the LMS but not the GMS). Figure 5e shows that the negative association between nodes four and eight is not in the GMS ($p(\gamma_{4,8} = 1 \mid \mathbf{X}) = .101$). Thus, the LMS produced by our edge screening approach (cf. Fig. 5a) is an excellent approximation to the GMS identified on the pruned space (cf. Figs. 5e and 5f). As in our simulated example, the edge screening procedure proved to be more liberal than the structure selection approach, i.e., more edges were included in the LMS than in the GMS.
6.2.1. Parameter Uncertainty
One of the main benefits of using a Bayesian approach to estimate the network is that it provides a natural framework for quantifying parameter uncertainty. We have two ways to express this uncertainty. The first is the asymptotic posterior distribution based on the EM output, which is the posterior distribution associated with the modal structure $\hat{\varvec{\gamma }} = \mathbb {E}(\varvec{\gamma } \mid \hat{\varvec{\Sigma }})$; it is thus a conditional posterior distribution. The second is the model-averaged posterior distribution
$$p(\varvec{\Sigma } \mid \mathbf {X}) = \sum _{\varvec{\gamma }} p(\varvec{\Sigma } \mid \varvec{\gamma }\text {, }\mathbf {X})\, p(\varvec{\gamma } \mid \mathbf {X})\text{,}$$
which can be estimated from the Gibbs sampler's output.
The model-averaged posterior distribution of the network parameters incorporates both the uncertainty associated with selecting a structure from the collection of possible structures and the uncertainty associated with the parameters of the individual structures. In this way, the model-averaged posterior distributions offer robust estimates of the network parameters and their uncertainty. Since the model-averaged posterior embraces both sources of uncertainty, the posterior variance of a model-averaged quantity tends, on average, to be larger than that of a conditional posterior, i.e., one that conditions on a specific selected structure (Footnote 11). For some parameters, this does not lead to striking differences, as Fig. 7a illustrates for one of the associations in the NSDUH data example. On some occasions, however, single-model inference leads us to put faith in a model that assumes parameter values that are not supported by other plausible models; Fig. 7b illustrates this.
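A minimal R sketch of the difference, reusing the hypothetical keys object from the earlier sketch and assuming sigma_draws holds the Gibbs draws of one association:

    mean(sigma_draws); sd(sigma_draws)       # model-averaged summary
    modal <- names(which.max(table(keys)))   # key of the modal structure
    cond  <- sigma_draws[keys == modal]      # draws conditional on that structure
    mean(cond); sd(cond)                     # conditional summary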
The illustrations above underscore that model-averaging leads to more robust inference on the model parameters than single-model inference (e.g., the output of rbinnet's edge screening procedure or the output from IsingFit). A benefit of using the Gibbs sampler to estimate the model-averaged posterior distributions is that we can use its output to construct model-averaged posterior distributions of other measures of interest. For example, Huth, Luigjes, Marsman, Goudriaan, and van Holst (in press) recently used the Gibbs output to estimate the model-averaged posterior distributions of node centrality measures.
6.3. Structure Selection Without Pruning
To analyze the benefit of our two-step procedure, in which edge screening precedes structure selection to prune the structure space, we performed a structure selection analysis without pruning the structure space. We ran the Gibbs sampler for 100,000 iterations, starting at the posterior mode, which visited 39,885 out of $2^{120} \approx 1.3 \times 10^{36}$ possible structures. This result immediately underscores the importance of pruning the structure space before structure selection: the posterior structure probabilities of such a large collection of models cannot be estimated with great precision in a reasonable amount of time. Pitting the visited structures against the most frequently visited structure using the Bayes factor identified 52 plausible models. Two questions arise. The first is how the GMS identified on the full space fares against the LMS identified with edge screening. The second is how the three previously identified structures stack up against the 39,885 structures visited in the structure selection on the full structure space.
In Fig. 6b, we plot the posterior inclusion probabilities obtained from the edge screening analysis against those obtained from the structure selection analysis on the full structure space. As before, the local inclusion probabilities are mostly located at the extremes of zero and one, whereas the global inclusion probabilities are more variable. This difference is emphasized in the bottom left corner of Fig. 6b, since the previously culled edges now receive nonzero probabilities. However, Fig. 6b also reveals great agreement about which edges are or are not in the median probability structure. The LMS and GMS on the full structure space differed in three edges.
In Fig. 8, we plot the GMS from the full space and a difference plot. Figure 8b shows that, as before with the pruned space, the negative association between nodes four and eight is not in the GMS ($p(\gamma _{4,8}=1\mid \mathbf {X}) = .487$). The edge between nodes four and twelve was also absent from the second plausible structure observed before (cf. Fig. 5c; $p(\gamma _{4,12}=1\mid \mathbf {X}) = .394$). The edge between nodes five and sixteen, however, had not survived edge screening ($p(\gamma _{5,16}=1\mid \mathbf {X}) = .491$). In sum, the LMS produced by our edge screening approach (cf. Fig. 5a) served as a good approximation to the GMS identified on the pruned space (cf. Figs. 5e and 5f) and on the full space (cf. Figs. 8a and 8b).
The Gibbs sampler on the full structure space visited 39,885 structures. Of these, 75 were visited between 100 and 1,450 times, indicating posterior probabilities between .0008 and .015. The remaining 39,785 structures were visited fewer than 100 times, indicating posterior probabilities of less than .0008. In total, however, the probabilities of these 39,785 structures added up to .762. Thus, structure selection on the full structure space wastes valuable computational effort on estimating insignificant structures. This is a prime example of dilution (George, 1999), and it once more underscores the importance of pruning the structure space before performing structure selection. The posterior probabilities of the three structures identified earlier were .008, .015, and .008, making them the 7th, 1st, and 6th most visited models, respectively. Nevertheless, given the vast number of visited structures and the tiny probabilities associated with them, these estimates are highly uncertain.
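This dilution diagnostic is easy to compute from the visit counts in the earlier sketch (the counts object remains hypothetical):

    rare <- counts < 100              # rarely visited structures
    sum(rare)                         # how many there are
    sum(counts[rare]) / sum(counts)   # the posterior mass they soak up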
7. Discussion
In this paper, we have introduced a novel objective spike-and-slab approach to structure selection for the Ising model, and we have illustrated the full suite of Bayesian tools using simulated and empirical data. The empirical analysis underscored the importance of trimming the structure space before exploring it and showed that edge screening is capable of identifying relevant edges. The default specification of the spike-and-slab variances resulted in a selection method with consistently high specificity in our simulations, i.e., a low Type I error rate in edge detection. Posterior estimates of the parameters are easy to obtain for both the edge screening and structure selection procedures. Our structure selection procedure opens up the full spectrum of Bayesian tools and, when paired with edge screening, quickly zooms in on plausible structures and promising effects. In sum, we have presented a complete Bayesian methodology for structure determination for the Ising model.
A caveat in our suite of Bayesian tools is the Bayes factor comparing two specific topologies. In principle, we can compute this Bayes factor from the posterior structure probabilities obtained from our structure selection procedure, but only if the Gibbs sampler visited the two structures under scrutiny. There is no guarantee that it does, and even if it visits both, their estimated posterior probabilities can be uncertain. We therefore need a more dedicated approach to estimate the Bayes factor if we wish to compare two particular structures of interest. Dedicated procedures have been developed for GGMs (e.g., Williams & Mulder, 2020a; Williams, Rast, Pericchi, & Mulder, 2020) and implemented in the R package BGGM (Williams & Mulder, 2020b), but they have not yet been developed for the Ising model. We believe that the Laplace approximation that we have used in this paper will be a good starting point for computing the marginal likelihoods. Another option would be the bridge sampler (Gronau et al., 2017; Meng & Wong, 1996), which fits seamlessly with our Gibbs sampling approach.
In practice, however, we often do not have specific structures that we wish to compare, while we do have hypotheses about entire collections of structures. As an example, consider the hypothesis $\mathcal {H}_1$ that a particular edge, say between variables $i$ and $j$, should be included in the network. This hypothesis spans the collection of all structures that include the edge. The posterior plausibility of $\mathcal {H}_1$ is therefore the collective plausibility of all structures in the hypothesized collection:
$$p(\mathcal {H}_1 \mid \mathbf {X}) = \sum _{\varvec{\gamma }:\, \gamma _{ij} = 1} p(\varvec{\gamma } \mid \mathbf {X}) = p(\gamma _{ij} = 1 \mid \mathbf {X})\text{,}$$
i.e., the edge-inclusion probability. The posterior plausibility of the complementary hypothesis $\mathcal {H}_0$ of edge exclusion is $p(\mathcal {H}_0 \mid \mathbf{X} ) = 1- p(\mathcal {H}_1 \mid \mathbf{X} )$. The ratio of the posterior and prior odds of these two competing hypotheses then determines the edge-inclusion Bayes factor, $\text{BF}_{10} = \frac{p(\mathcal {H}_1 \mid \mathbf {X}) / p(\mathcal {H}_0 \mid \mathbf {X})}{p(\mathcal {H}_1)/p(\mathcal {H}_0)}$ (Footnote 12). Crucially, the edge-inclusion Bayes factor can quantify the evidence for $\mathcal {H}_1$ (edge inclusion) and for $\mathcal {H}_0$ (edge exclusion) (Jeffreys, 1961; Wagenmakers, Marsman, et al., 2018). Moreover, the Bayes factor can tease apart the evidence of absence (i.e., edge exclusion) from the absence of evidence. We therefore believe that the edge-inclusion Bayes factor is a valuable tool for analyzing psychological networks. The methods advocated in this paper, implemented in the R package rbinnet, can be used to estimate edge-inclusion Bayes factors. Huth et al. (in press) recently used them to estimate the evidence for inclusion and exclusion of edges in networks of alcohol abuse disorder symptoms.
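A minimal R sketch of this computation, assuming post_incl is a vector of posterior edge-inclusion probabilities and that each edge has a marginal prior inclusion probability of one half (as under a uniform prior on the inclusion probability $\theta$):

    prior_incl <- 0.5
    post_odds  <- post_incl / (1 - post_incl)
    prior_odds <- prior_incl / (1 - prior_incl)
    BF_incl    <- post_odds / prior_odds   # > 10: strong evidence for inclusion;
                                           # < 1/10: strong evidence for exclusion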
In a recent preprint, Bhattacharyya and Atchade (henceforth BA; 2019) also proposed a continuous spike-and-slab edge selection approach for the Ising model using the pseudolikelihood. The two methods were designed with a different focus, however. Whereas BA focused on networks with many variables, we focused on psychological networks, which are comparatively small. As a result, the two approaches differ in several key aspects that make our approach more appealing for analyzing psychological networks. For example, BA did not trim the structure space before exploring it with a Gibbs sampler; our empirical example illustrates why we believe this is a bad idea. At the same time, we addressed some outstanding issues that BA left open. For example, BA analyzed the p full-conditionals in Eq. (2) in isolation, which gave them an opportunity for fast parallel processing. However, it also forced them to stipulate two independent prior distributions on each focal parameter, which means that they ended up with two posterior distributions for each association. Unfortunately, BA provided no principled solution for combining these estimates for either structure selection or parameter estimation. Another issue is that their spike-and-slab approach requires the specification of tuning parameters, but they offered no guidance or automated procedure for setting them. In sum, our method (i) offers an objective specification of the prior distributions that leads to sensible answers, (ii) trims the structure space to circumvent issues related to dilution, and (iii) allows for a meaningful interpretation of the estimated posteriors. Despite these crucial differences, the approach of BA is broader than ours in one respect: they also analyzed networks of polytomous variables, while we focus exclusively on the binary case in this paper.
Our specification of the hyperparameters stipulates a mixture of two unit information priors, one fixed and one shrinking, that a priori match an approximate credible interval. We chose this setup to mimic the eLasso approach of van Borkulo et al. (2014) and aimed for high specificity. However, researchers might have a different aim and wish to have methods available that have higher sensitivity (e.g., see the considerations in Epskamp et al., 2017) or that target a low false discovery rate instead (e.g., Storey, 2003). In principle, penalty tuning procedures and prior structure probabilities can be tailored to achieve different goals. For example, we could adopt the eLasso approach and select the penalty $\xi $ that minimizes the Bayesian information criterion (BIC; Schwarz, 1978) or the extended BIC (EBIC$_\lambda $, where $\lambda $ is a penalty on complexity; Chen & Chen, 2008) instead of matching the spike-and-slab intersections to credible intervals. These two criteria usually achieve higher sensitivity than the Lasso and tie in naturally with the two prior distributions on the structure space that we have used here: a uniform prior distribution on the structure space is consistent with the BIC, and a uniform prior distribution on structure complexity is compatible with EBIC$_1$. Furthermore, several alternative prior distributions that account for multiple testing have been discussed in the variable selection literature (e.g., Castillo, Schmidt-Hieber, & van der Vaart, 2015; Womack, Fuentes, & Taylor-Rodriguez, 2015). In sum, there are plenty of options to tailor the spike-and-slab approach to the specific needs of empirical researchers.
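To illustrate, a hypothetical R sketch of penalty selection by EBIC, using one common form of the EBIC penalty and assuming fit_at_penalty(xi) returns the maximized log-pseudolikelihood, the number of selected edges, the sample size, and the number of candidate effects at penalty xi (a placeholder, not an rbinnet function):

    ebic <- function(logPL, k, n, p, lambda = 1)
      -2 * logPL + k * log(n) + 2 * lambda * k * log(p)
    xis  <- 10^seq(-2, 2, length.out = 25)   # candidate penalties
    crit <- sapply(xis, function(xi) {
      fit <- fit_at_penalty(xi)
      ebic(fit$logPL, fit$k, fit$n, fit$p)
    })
    xis[which.min(crit)]                     # selected penalty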
The prior specification options discussed above are geared toward situations in which researchers have limited or only general ideas about the network they are analyzing. In principle, researchers could have substantive ideas or knowledge about the network under scrutiny, and it is opportune to use this information in its analysis. Prior information could be used to define $\delta $ (e.g., George & McCulloch, 1997), or it could guide the specification of the prior inclusion probabilities of the network's edges. A parameter's sign is another common source of information, since most relations in psychological applications are positive (see Williams & Mulder, 2020a, for an implementation of this idea for GGMs). Investigating how substantive knowledge can best be included in the Bayesian model, and what that implies for the spike-and-slab setup, is another fruitful area for future research.
Implementing our procedures in a compiled language is one of several improvements that we envision for the rbinnet package. At the moment, our methods are implemented entirely in R (R Core Team, 2019). Our current implementation of the edge screening procedure is somewhat slower than the eLasso implementation in IsingFit (the analysis of the NSDUH data took approximately 40 seconds for edge screening and 15 seconds for IsingFit); structure selection is considerably slower, since the Gibbs sampler needs more time to explore the structure space. The online appendix contains a simulation that illustrates the running time differences between the methods and their implementations. There are currently two computational bottlenecks: the specification of the Hessian matrix and running the Gibbs sampler. Both involve iterated loops that can be computed much faster in a compiled language. Another feature that we plan to implement shortly is the treatment of missing data. Two options present themselves: the first uses selection functions for pairwise removal of missing data points; the second is data augmentation or imputation. Both methods assume that the data are missing at random, or are at least ignorable. The analysis of structurally missing data, e.g., missingness introduced by a skip structure as in our example, requires a different model setup in principle and remains an open problem. As for different models, we are currently working on extending the method to Ising models for polytomous (cf. Bhattacharyya & Atchade, 2019) and ordinal data, and on different setups for the spike-and-slab priors. We also plan to implement our software in the open-source statistical program JASP (Love et al., 2019; Wagenmakers, Love, et al., 2018), which would provide a user-friendly interface for the functions in rbinnet.
Appendix A EM Variable Selection for the Pseudolikelihood Ising Model
The E-Step
The Q-function factors into three distinct terms (Footnote 13):
$$\text {Q}\left( \varvec{\Sigma }\text {, }\varvec{\mu }\text {, }\theta \mid \varvec{\Sigma }^{(k)}\text {, }\theta ^{(k)}\right) = \text {Q}_1\left( \varvec{\Sigma }\text {, }\varvec{\mu } \mid \varvec{\Sigma }^{(k)}\text {, }\theta ^{(k)}\right) + \text {Q}_2\left( \theta \mid \varvec{\Sigma }^{(k)}\text {, }\theta ^{(k)}\right) + \text {C}\text{,} \qquad \qquad (9)$$
where $\text {C} = -\ln (p(\mathbf {X}))$ is a constant term.
The first term in Eq. (9) concerns the pseudoposterior of the Ising model's parameters and involves the expectation of the log-transformed spike-and-slab prior, where $\text {C}_1$ is a constant term. The last term can be reformulated in terms of the expected precision $e_{ij} = \mathbb {E}(\gamma _{ij})/\nu _1 + \left( 1 - \mathbb {E}(\gamma _{ij})\right) /\nu _0$ (cf. Ročková & George, 2014, Eq. 3.6), where the posterior expectation of the selection variable is equal to
$$\mathbb {E}\left( \gamma _{ij} \mid \sigma _{ij}^{(k)}\text {, }\theta ^{(k)}\right) = \frac{\theta ^{(k)}\, f\!\left( \sigma _{ij}^{(k)} \mid 0\text {, }\nu _1\right) }{\theta ^{(k)}\, f\!\left( \sigma _{ij}^{(k)} \mid 0\text {, }\nu _1\right) + \left( 1 - \theta ^{(k)}\right) f\!\left( \sigma _{ij}^{(k)} \mid 0\text {, }\nu _0\right) }\text{,} \qquad \qquad (10)$$
where $f(\cdot \mid 0\text {, }\nu )$ denotes a zero-mean normal density with variance $\nu $.
The second term in Eq. (9) concerns the posterior distribution of the prior inclusion probability and involves the expectation of the log-transformed prior distribution on the selector variables, which is also readily computed using the expression in Eq. (10).
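A minimal R sketch of the E-step expectation in Eq. (10), with nu0 and nu1 the spike and slab variances (the function and argument names are ours, not rbinnet's):

    e_step <- function(sigma, theta, nu0, nu1) {
      slab  <- theta       * dnorm(sigma, 0, sqrt(nu1))
      spike <- (1 - theta) * dnorm(sigma, 0, sqrt(nu0))
      slab / (slab + spike)   # E(gamma | sigma, theta)
    }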
The M-Step
We separately optimize the two components of the Q-function in the M-step. Unfortunately, there is no closed-form solution for the maximization of $\text {Q}_1$, and we approximate its M-step using a single iteration of a Newton-Raphson algorithm (Lange, 1995; Tanner, 1996); the details are in "Appendix B". The maximization of $\text {Q}_2$ is available in closed form.
Posterior Standard Deviations
The EM algorithm provides us with an estimate of a local posterior mode, and we seek a way to quantify the uncertainty in this modal estimate. We express this uncertainty using the variance-covariance matrix of the normal approximation to the posterior (e.g., Tanner, 1996), i.e., the inverse of the Hessian matrix. The Hessian matrix is computed in the M-step of our EMVS approach (see "Appendix B") and serves as an estimate of the variance-covariance matrix of the complete posterior $p(\varvec{\Sigma }\text {, }\varvec{\mu }\text {, }\theta \text {, }\varvec{\gamma } \mid \mathbf {X})$. To estimate the variance-covariance matrix of the marginal posterior $p(\varvec{\Sigma }\text {, }\varvec{\mu }\text {, }\theta \mid \mathbf {X})$, we have to use the inverse of the Hessian subject to the marginal spike-and-slab prior distributions on the interaction effects. For prior specification, i.e., setting the value of $\nu _1$, we use the inverse Hessian excluding the prior distributions on the parameters.
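A minimal R sketch, assuming H holds the Hessian of the log-pseudoposterior at the EM solution (so that the negative Hessian is positive definite at the mode):

    post_cov <- solve(-H)             # normal-approximation covariance matrix
    post_sd  <- sqrt(diag(post_cov))  # posterior standard deviations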
Appendix B The M-Step Approximation for $\text {Q}_1$
We approximate the M-step for $\text {Q}_1$ using a single iteration of the Newton-Raphson algorithm. Let $\varvec{\eta } = \left( \mu _1\text {, }\dots \text {, }\mu _p\text {, }\sigma _{12}\text {, }\dots \text {, }\sigma _{(p-1)p}\right) $ denote the $\left( \binom{p}{2} + p\right) \times 1$ vector of pseudolikelihood parameters. Then, the Newton-Raphson iteration is equal to
$$\varvec{\eta }^{(k+1)} = \varvec{\eta }^{(k)} - \mathbf {H}^{-1}\mathbf {D}\text{,}$$
where $\mathbf {D}$ is the $\left( \binom{p}{2} + p\right) \times 1$ vector of first-order partial derivatives and $\mathbf {H}$ the $\left( \binom{p}{2} + p\right) \times \left( \binom{p}{2} + p\right) $ matrix of second-order partial derivatives, i.e., the Hessian matrix, both evaluated at $\varvec{\eta }^{(k)}$.
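In R, the single iteration amounts to one linear solve, where eta, D, and H are hypothetical objects holding the current parameter vector, gradient, and Hessian:

    eta_new <- eta - solve(H, D)   # one Newton-Raphson step; solve(H, D) is H^{-1} D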
The first-order partial derivatives are equal to
$$\frac{\partial }{\partial \mu _i}\, \text {Q}_1 = \sum _{v=1}^n \left( x_{vi} - p^*_{vi}\right) - \mu _i$$
for the main effects, and
$$\frac{\partial }{\partial \sigma _{ij}}\, \text {Q}_1 = \sum _{v=1}^n \left[ x_{vj}\left( x_{vi} - p^*_{vi}\right) + x_{vi}\left( x_{vj} - p^*_{vj}\right) \right] - \sigma _{ij}\, e_{ij}$$
for the interaction effects. Here, we have used $p^*_{vi}$ to denote the conditional probability $p(X_i = 1 \mid \mathbf {x}_{v}^{(i)})$ and $e_{ij}$ to denote the expected precision defined in "Appendix A".
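A vectorized R sketch of these derivatives under stated assumptions: X is the n x p binary data matrix, mu the vector of main effects, Sigma the symmetric association matrix with a zero diagonal, and E the matrix of expected precisions e_ij (all names are ours):

    p_star   <- plogis(sweep(X %*% Sigma, 2, mu, "+"))  # p(X_i = 1 | rest), n x p
    resid    <- X - p_star                              # residuals x_vi - p*_vi
    grad_mu  <- colSums(resid) - mu                     # main-effect derivatives
    grad_sig <- t(X) %*% resid + t(resid) %*% X - Sigma * E
    # entry (i, j) of grad_sig, i < j, is the derivative for sigma_ij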
The Hessian matrix is slightly more complicated, as it requires some tedious bookkeeping. To emphasize its structure and ease its derivation, we split the Hessian matrix into four components,
$$\mathbf {H} = \begin{pmatrix} \mathbf {H}_\mu &{} \mathbf {H}_{\mu \,\Sigma } \\ \mathbf {H}_{\mu \,\Sigma }^\mathsf{T} &{} \mathbf {H}_\Sigma \end{pmatrix}\text{,}$$
where $\mathbf {H}_\mu $ and $\mathbf {H}_\Sigma $ are the second-order partial derivatives with respect to the main effects $\varvec{\mu }$ and the associations $\varvec{\Sigma }$, respectively, and $\mathbf {H}_{\mu \,\Sigma }$ their cross-derivatives. The submatrix $\mathbf {H}_\mu $ is diagonal and has elements
$$\frac{\partial ^2}{\partial \mu _i^2}\, \text {Q}_1 = -\sum _{v=1}^n p^*_{vi}\, q^*_{vi} - 1\text{.}$$
The submatrix $\mathbf {H}_{\mu \, \Sigma }$ has elements
$$\frac{\partial ^2}{\partial \mu _i\, \partial \sigma _{ij}}\, \text {Q}_1 = -\sum _{v=1}^n x_{vj}\, p^*_{vi}\, q^*_{vi}\text{,}$$
and is zero for main effects whose index does not appear in the association's pair.
Finally, the submatrix $\mathbf {H}_{\Sigma }$ has diagonal elements
$$\frac{\partial ^2}{\partial \sigma _{ij}^2}\, \text {Q}_1 = -\sum _{v=1}^n \left( x_{vj}\, p^*_{vi}\, q^*_{vi} + x_{vi}\, p^*_{vj}\, q^*_{vj} \right) - e_{ij}\text{,}$$
and, for two associations $\sigma _{ij}$ and $\sigma _{ik}$ that share the index $i$, off-diagonal elements
$$\frac{\partial ^2}{\partial \sigma _{ij}\, \partial \sigma _{ik}}\, \text {Q}_1 = -\sum _{v=1}^n x_{vj}\, x_{vk}\, p^*_{vi}\, q^*_{vi}\text{,}$$
where $q_{vj}^*= 1 - p_{vj}^*$; the elements for associations that share no index are zero.
Appendix C A Gibbs Sampling Routine for Structure Selection
The Gibbs sampler cycles through the following five steps; if a uniform prior is stipulated on the structure space, step four is skipped.
Step 1. Sampling the main effects $\mu _i$. With the assumption of prior independence, the main effects are also independent a posteriori and do not depend on the selection variables $\varvec{\gamma }$. Given the standard normal prior distribution, the full-conditional posterior distribution $p(\mu _i \mid \varvec{\sigma }_i \text {, } \varvec{\omega }_i \text {, }\mathbf {X})$ of the main effect $\mu _i$ is a normal distribution with mean
$$\frac{x_{i+} - \tfrac{n}{2} - \sum _{v=1}^n \omega _{vi} \sum _{j \ne i} \sigma _{ij}\, x_{vj}}{1 + \omega _{i+}}$$
and variance $(1 + \omega _{i+})^{-1}$, where we have used $\varvec{\sigma }_i$ to denote the $(p - 1) \times 1$ vector of associations $\sigma _{ij}$, $j \ne i$, and $x_{i+}$ and $\omega _{i+}$ to denote the margins $\sum _{v=1}^nx_{vi}$ and $\sum _{v=1}^n\omega _{vi}$, respectively.
Step 2. Sampling the interaction effects $\sigma _{ij}$. Given $\gamma _{ij}$, the prior distribution for $\sigma _{ij}$ is normal with mean zero and variance $\phi = \gamma _{ij}\nu _{1} + (1-\gamma _{ij})\nu _0$. The full-conditional posterior distribution is then a normal distribution with mean
$$\left( \varvec{\omega }_i^\mathsf{T}\mathbf {x}_j + \varvec{\omega }_j^\mathsf{T}\mathbf {x}_i + \phi ^{-1}\right) ^{-1} \sum _{v=1}^n \left[ x_{vj}\left( x_{vi} - \tfrac{1}{2} - \omega _{vi}\left( \mu _i + \sum _{k \ne i,j} \sigma _{ik}\, x_{vk}\right) \right) + x_{vi}\left( x_{vj} - \tfrac{1}{2} - \omega _{vj}\left( \mu _j + \sum _{k \ne i,j} \sigma _{jk}\, x_{vk}\right) \right) \right] $$
and variance $\left( \varvec{\omega }_i^\mathsf{T}\mathbf {x}_j + \varvec{\omega }_j^\mathsf{T}\mathbf {x}_i + \phi ^{-1}\right) ^{-1}$, where we have used $\mathbf {x}_i$ to denote the $n \times 1$ vector with elements $x_{vi}$.
Step 3. Sampling the inclusion variables $\gamma _{ij}$. The full-conditional posterior distribution of $\gamma _{ij}$ is a Bernoulli distribution with probability of inclusion
$$p(\gamma _{ij} = 1 \mid \sigma _{ij}\text {, }\theta ) = \frac{\theta \, f\!\left( \sigma _{ij} \mid 0\text {, }\nu _1\right) }{\theta \, f\!\left( \sigma _{ij} \mid 0\text {, }\nu _1\right) + (1 - \theta )\, f\!\left( \sigma _{ij} \mid 0\text {, }\nu _0\right) }\text{,}$$
where $f(\cdot \mid 0\text {, }\nu )$ again denotes a zero-mean normal density with variance $\nu $.
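In R, with sigma the current vector of associations (hypothetical names, mirroring the E-step sketch above):

    slab  <- theta       * dnorm(sigma, 0, sqrt(nu1))
    spike <- (1 - theta) * dnorm(sigma, 0, sqrt(nu0))
    gamma <- rbinom(length(sigma), 1, slab / (slab + spike))  # one draw per edge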
Step 4. Sampling the prior inclusion probability $\theta $. The full-conditional posterior distribution of $\theta $ is a Beta distribution, with parameters that add the numbers of included and excluded edges to the prior counts, where $\gamma _{++} = \sum _{i=1}^p\sum _{j=1}^p\gamma _{ij}$.
Step 5. Sampling the augmented variables $\omega _{vi}$. The full-conditional posterior distribution of $\omega _{vi}$ is proportional to
$$p\left( \omega _{vi} \mid \varvec{\sigma }_i\text {, }\mathbf {x}_v^{(i)}\right) \propto \exp \left( -\frac{c^2}{2}\, \omega _{vi}\right) p(\omega _{vi})\text{,} \quad \text {with } c = \mu _i + \sum _{j \ne i}\sigma _{ij}\, x_{vj}\text{,}$$
where $p(\omega _{vi}) = p(\omega _{vi} \mid 1\text {, }0)$ is a Pólya-Gamma distribution with parameters $b = 1$ and $c = 0$. Polson et al. (2013a) show that the Pólya-Gamma distribution with parameters $b = 1$ and $c \ne 0$ is equal to an exponential tilting of the Pólya-Gamma distribution with parameters $b = 1$ and $c = 0$, which consequently shows that $p\left( \omega _{vi} \mid \varvec{\sigma }_i\text {, }\mathbf {x}_v^{(i)}\right) $ is a Pólya-Gamma distribution with parameters $b = 1$ and $c = \mu _i + \sum _{j \ne i}\sigma _{ij}x_{vj}$. These values can be simulated using the R (R Core Team, 2019) packages BayesLogit (Polson, Scott, & Windle, 2013b) and BayesReg (Makalic & Schmidt, 2016).
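For example, with BayesLogit all augmented variables can be drawn in a single vectorized call; X, mu, and Sigma are as in the earlier sketches, and the matrix layout is our assumption:

    library(BayesLogit)
    C     <- sweep(X %*% Sigma, 2, mu, "+")  # tilting parameters c_vi, n x p
    omega <- matrix(rpg(length(C), h = 1, z = as.vector(C)),
                    nrow = nrow(C), ncol = ncol(C))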