1 Introduction
Many of the datasets encountered by political scientists in applied research take the form of repeated observations on a fixed number of subjects, the best known of which is time series cross-sectional (TSCS) data. When examining these time series data, researchers frequently run into the change-point problem, either because an underlying theory predicts that the effect of some variables will change or because researchers are concerned that unknown breaks will result in model misspecification and omitted variable bias. In either case, correctly identifying change points in regression coefficients has a significant impact on substantive findings.
Detecting change points in regression parameters requires researchers to distinguish between major parametric shifts that can be interpreted as structural changes and minor parametric shifts that must be treated as noise. Two statistical challenges arise when attempting to identify major change points. First, the two “unknowns” in the change-point problem (change points and regime-dependent parameters) must be jointly estimated. Separate estimation, such as segmenting the data and fitting a model to each segment, risks overfitting or incorrect data splitting. Second, because subsample data may be rank deficient, parameter regularization must be considered jointly with the estimation of change points and regime-dependent parameters.
In this paper, we propose a new Bayesian method for the joint estimation of change points and regime-specific regression parameters in high-dimensional data. Our proposed method combines Bayesian methods for parameter regularization, change-point detection, and variable selection. We first introduce the hidden Markov Bayesian bridge model (HMBB), which combines a Bayesian bridge model for parameter regularization with a hidden Markov model for multiple change-point detection. We present HMBB in the context of TSCS data because TSCS data have been at the core of dynamic model development in the political science literature (Beck 2001; Beck et al. 1993; Beck and Katz 1995, 2011; Box-Steffensmeier et al. 2014; Brandt and Freeman 2006; Hazlett and Wainstein 2022; Imai and Kim 2021; Pang, Liu, and Xu 2022; Western and Kleykamp 2004; Wucherpfennig et al. 2021).
That said, our work closely follows the development of change-point models in political science, economics, and statistics. In political science, Beck (1983) introduced the idea of change points as a special case of time-varying parameter models with a sharp change. Afterward, Western and Kleykamp (2004) elaborated on the benefits of applying a Bayesian modeling approach to the change-point problem. According to Western and Kleykamp, the Bayesian approach “combines the advantages of diagnostic and parametric approaches but addresses their limitations. …Like diagnostic methods, the Bayesian analysis treats the timing of change as uncertain and the location of a change point as a parameter to be estimated. …Like parametric models, the Bayesian model yields statistical inferences about regression coefficients. However, these inferences reflect prior uncertainty about the location of the change point that is unaccounted for in conventional models” (355). After Western and Kleykamp (2004), Spirling (2007b) developed Bayesian change-point modeling in the setting of a limited dependent variable based on Carlin, Gelfand, and Smith (1992). Park (2010, 2011b, 2012) extended Chib’s (1998) multiple-change-point model to binary, ordinal, count, and panel data cases. Blackwell (2018) relaxed the restriction of a fixed number of change points in over-dispersed count data models by employing Fox et al.’s (2011) hierarchical Dirichlet process approach. Furthermore, Kent, Wilson, and Cranmer (2022) presented a change-point detection method using a permutation-based parameter distribution.
In the sections that follow, we first discuss the change-point problem in regression models with a large number of predictors. We then present a fixed-effects HMBB for TSCS data and describe our proposed estimation and model diagnostic procedures. We demonstrate the proposed method’s performance on simulated data. Then, we apply our method to Alvarez, Garrett, and Lange’s (1991) study of the relationship between government partisanship and economic growth, as well as Allee and Scalera’s (2012) study of membership effects in international organizations (IOs). Our proposed method is freely available as an open-source R package, BridgeChange (https://github.com/jongheepark/BridgeChange).
2 Problem
Figure 1 illustrates the change-point problem in regression models with a large number of predictors. Panel (a) illustrates the difficulty of distinguishing substantial changes in time-varying parameters from minor ones. The ground truth shows two major shifts (vertical gray bars) in two groups of parameters (A and C). All parameters, however, fluctuate slightly over time. The number and location of major changes, as well as regime-specific parameter values, are the two main quantities of interest in change-point analysis. We need a principled method to distinguish major changes (vertical gray bars) from minor changes (local fluctuations). It is also important to note that the regime shifts in panel (a) are abrupt, but not deterministic. Separate regressions after data splitting (or a period dummy regression model) do not adequately represent a data generating process with stochastic regime transitions.
Panel (b) in Figure 1 reveals a case of rank deficiency in change-point analysis even when the pooled data have full rank. Although the entire sample has $N > K$, where N denotes the number of observations and K denotes the number of predictors, one of the subsamples identified by a hidden regime may have $N_m \leq K$, where $N_m$ denotes the number of observations in regime m.
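The rank-deficiency problem is easy to verify directly. The following Python sketch (our own illustration, with arbitrary dimensions) shows that a design matrix with full column rank in the pooled sample becomes rank deficient once restricted to a short regime:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 120, 30                        # pooled sample: N > K
X = rng.standard_normal((N, K))
print(np.linalg.matrix_rank(X))       # full column rank: 30

X_m = X[:20]                          # regime m with N_m = 20 <= K observations
print(np.linalg.matrix_rank(X_m))     # rank 20 < K, so X_m'X_m is singular
```

In this situation the least-squares normal equations for the regime subsample have no unique solution, which is why regularization must enter the joint estimation.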
In order to address the problems in Figure 1, we need a statistical method that combines change-point models with high-dimensional regression models. Recently, there has been a surge of high-dimensional change-point detection methods in the frequentist literature (e.g., Chan, Yau, and Zhang 2014; Frick, Munk, and Sieling 2014; Lee et al. 2018; Lee, Seo, and Shin 2016). Most of these methods focus on simple cases of the high-dimensional change-point problem in which only a small subset of parameters is time-varying or only a single break is considered.
Our strategy is to take full advantage of recent developments in regularization methods in the statistics literature (Carvalho, Polson, and Scott 2010; Chernozhukov et al. 2017; Fan and Li 2001; Hoerl and Kennard 1970; Park and Casella 2008; Polson 2012; Tibshirani 1996; Tibshirani et al. 2004; Zou and Hastie 2005). In particular, we emphasize that a Bayesian shrinkage approach to high-dimensional regression provides an effective framework for regularizing high-dimensional model parameters while also providing a credible measure of estimation uncertainty (Kyung et al. 2010; Polson and Scott 2010).
3 Method
In this section, we introduce our proposed method for examining change-point effects in regression models with a large number of predictors. An example procedure for implementing the proposed method is as follows:
1. model specification based on a theory and available data,
2. model fitting using multiple HMBBs with a varying number of break points,
3. model diagnostics using the Watanabe–Akaike information criterion (WAIC), and
4. posterior summary of hidden state transitions and time-varying parameters.
3.1 Bridge Estimator
We begin with the Bridge estimator for parameter regularization (Frank and Friedman 1993; Fu 1998). The Bridge estimator is motivated by the penalized likelihood formulation shown below:

(1) $$ \begin{align} \hat{\boldsymbol{\beta}} = \operatorname*{arg\,min}_{\boldsymbol{\beta}} \sum_{i=1}^{n} (y_i - \mathbf{x}_i'\boldsymbol{\beta})^2 + \nu \sum_{j=1}^{K} |\beta_j|^{\alpha}, \end{align} $$
where $0 < \alpha \leq 2$ (Frank and Friedman 1993; Fu 1998). The above formula has an interesting feature in that the popular lasso estimator and ridge regression can be obtained as special cases of this estimator when $\alpha = 1$ and $\alpha = 2$, respectively. Bridge regression performs variable selection asymptotically when $0 < \alpha \leq 1$ (Frank and Friedman 1993; Huang, Horowitz, and Ma 2008). Because of this generality, Equation (1) has garnered increasing attention in the statistical literature (Armagan 2009; Fan and Li 2001; Huang et al. 2009; Huang et al. 2008; Liu et al. 2007; Polson, Scott, and Windle 2014).
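To make the penalty term concrete, the following sketch (our own illustration, not code from BridgeChange) evaluates the bridge penalty and its lasso and ridge special cases:

```python
import numpy as np

def bridge_penalty(beta, nu, alpha):
    """Bridge penalty: nu * sum_j |beta_j|^alpha, with 0 < alpha <= 2."""
    return nu * np.sum(np.abs(beta) ** alpha)

beta = np.array([0.5, -1.0, 2.0])
print(bridge_penalty(beta, nu=1.0, alpha=1.0))  # 3.5  (lasso penalty)
print(bridge_penalty(beta, nu=1.0, alpha=2.0))  # 5.25 (ridge penalty)
print(bridge_penalty(beta, nu=1.0, alpha=0.5))  # concave, sparser-than-lasso regime
```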
We employ Polson, Scott, and Windle’s (2014) Bayesian treatment of the bridge model in this research. A significant novelty of their Bayesian method is the use of Lévy processes to create joint priors for $\beta_j$ and the local shrinkage parameters ($\lambda_j$). A joint prior distribution of the regression parameters $\boldsymbol{\beta}$ and the local shrinkage parameters $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_K)$ is represented as follows using a scale mixture of normals representation:
where $p(\lambda _j)$ is the density of $2 S_{\alpha /2}$ and $S_\alpha $ is the Lévy alpha-stable distribution.
Equation (2) increases the efficiency of the Markov chain Monte Carlo (MCMC) algorithm by allowing posterior samples of the local shrinkage parameters ($\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_K)$) to be drawn independently of the global shrinkage parameter ($\tau$). The dependence of the global shrinkage parameter’s sampling on samples of the local shrinkage parameters has been cited as a significant limitation of Bayesian shrinkage models (Hans 2010; Polson et al. 2014).
In the case of a linear regression model with $\boldsymbol {\beta }$ as regression slope parameters and $\sigma ^2$ as a residual variance parameter, the posterior distribution of the Bayesian bridge model can be written as
3.2 HMBB
Now, we allow the Bayesian bridge model’s parameters to vary in response to a hidden state transition. To be more precise, let $\mathbf{S} = (s_1, \ldots, s_T)$ denote a vector of hidden state variables, where $s_t \in \{1, \ldots, M\}$ is an integer-valued hidden state variable at time t,
and let $\mathbf{P}$ be a forward-moving $M \times M$ transition matrix, where $\mathbf{p}_i$ is the ith row of $\mathbf{P}$ and M is the total number of hidden states. For example, if we assume a single break, $M = 2$ and $\mathbf{P}$ is a $2 \times 2$ transition matrix. For efficient sampling of hidden states, we adopt Chib’s (1998) non-ergodic transition of hidden states, in which the hidden state variable starts from state 1 and moves forward to the terminal state (M).
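The non-ergodic restriction means the transition matrix is upper bidiagonal with an absorbing terminal state. A minimal sketch (the staying probability `p_stay` is an illustrative placeholder; in the model, the diagonal elements are estimated from the data):

```python
import numpy as np

def forward_transition_matrix(M, p_stay=0.9):
    """Forward-moving (non-ergodic) transition matrix in the style of
    Chib (1998): from state i, the chain either stays in i or moves to
    i + 1; the terminal state M is absorbing."""
    P = np.zeros((M, M))
    for i in range(M - 1):
        P[i, i] = p_stay            # remain in the current regime
        P[i, i + 1] = 1.0 - p_stay  # advance to the next regime
    P[M - 1, M - 1] = 1.0           # terminal state is absorbing
    return P

print(forward_transition_matrix(3))
```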
From the above description, we can develop a fixed-effects HMBB for TSCS data using the group-demeaned data ($\bar{\mathbf{y}}, \bar{\mathbf{x}}$):

$$ \begin{align*} \bar{y}_{it} = \bar{\mathbf{x}}_{it}'\boldsymbol{\beta}_{m} + \varepsilon_{it}, \quad \varepsilon_{it} \sim \mathcal{N}(0, \sigma^2_{m}) \quad \text{for } \tau_{m-1} \leq t < \tau_{m}, \end{align*} $$
where $\tau _{m}$ is the break point between regime $m-1$ and regime m. To accommodate many predictors, we use the Bayesian bridge prior for $\boldsymbol {\beta }$, as shown in Equation (2). The posterior distribution of the resulting model is
where $\bar {\mathbf {y}}_{t} = (\bar {y}_{1t}, \ldots , \bar {y}_{nt})$ and $\bar {\mathbf {x}}_{t} = (\bar {\mathbf {x}}_{1t}, \ldots , \bar {\mathbf {x}}_{nt})$. $\bar {\mathbf {Y}}_{t-1}$ and $\bar {\mathbf {X}}_{t-1}$ indicate all the group-demeaned data up to $t-1$. The subscript m on the model parameters ($\boldsymbol {\beta }, \sigma ^2, \Lambda , \alpha , \nu $) indicates the hidden state to which they belong.
If we ignore the Markov property of hidden regimes, we can simplify Equation (6) as
by setting $\mathbf {D}_t = (\bar {\mathbf {y}}_t, \bar {\mathbf {x}}_t)$ and $\boldsymbol {\Theta } = (\boldsymbol {\beta }, \sigma ^2, \Lambda , \alpha , \nu )$. The posterior distribution takes the form of a mixture distribution. From this, it becomes clear that HMBB estimates can be considered weighted averages of Bayesian bridge regression estimates fitted to subsets of data partitioned by known change points. Because the location and number of change points are unknown in practice, the hidden state variable is added as a latent variable and sampled from the data in HMBB. The sampling algorithm is discussed in Appendix A.
3.3 Model Diagnostics using WAIC
After fitting multiple HMBBs with a varying number of breaks (or different model specifications), researchers must assess the model’s fit to observed data. Model checking is a critical step in Bayesian analysis in general. This is especially true in the case of change-point analysis, where the break points are unknown.
We recommend the WAIC for model diagnostics of HMBB because of its low computational cost. WAIC is a fully Bayesian estimate of a model’s out-of-sample predictive accuracy. WAIC approximates the expected log pointwise predictive density by subtracting a bias correction for the effective number of parameters from the sum of the log pointwise predictive density. Using Gelman, Hwang, and Vehtari’s (2014) formula, the WAIC of an HMBB with M latent states ($\mathcal {M}_M$) is

$$ \begin{align*} \text{WAIC}(\mathcal{M}_M) = -2\left(\sum_{t=1}^{T} \log\left(\frac{1}{G}\sum_{g=1}^{G} p(\mathbf{y}_{t} \mid \theta^{(g)})\right) - \sum_{t=1}^{T} V_{g=1}^{G}\left[\log p(\mathbf{y}_{t} \mid \theta^{(g)})\right]\right), \end{align*} $$
where G is the MCMC simulation size, $V[\cdot ]$ indicates the variance across simulation draws, and $\theta ^{(g)}$ is the gth simulated output for $\theta $.
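The computation can be sketched as follows. The function below (our own illustration, not the BridgeChange implementation) takes a G × T matrix of pointwise log-likelihood draws and returns the WAIC on the deviance scale, using the variance-based effective-parameter correction of Gelman, Hwang, and Vehtari (2014):

```python
import numpy as np

def waic(loglik):
    """WAIC from a (G x T) matrix of pointwise log-likelihoods:
    G MCMC draws, T observations (time points)."""
    # log pointwise predictive density: log of the MCMC-averaged likelihood
    lppd = np.sum(np.log(np.mean(np.exp(loglik), axis=0)))
    # effective number of parameters: summed posterior variance of log p
    p_waic = np.sum(np.var(loglik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)

# Degenerate check: with zero posterior variance, WAIC = -2 * sum(log p)
print(waic(np.full((4, 3), -1.0)))  # 6.0 up to floating-point rounding
```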
4 Simulation Study
4.1 Simulated Data
We construct 24 sets of TSCS data with varied group sizes ( $n = (10, 20)$ ), time lengths ( $t = (30, 60)$ ), predictor sizes ( $k = (20, 30)$ ), and break numbers ( $m = (0, 1, 2)$ ) to evaluate the validity of our proposed method. We set $\mathbf {x}_t \sim \mathcal {N}_{k}(\mathbf {0},\mathbf {I}_k)$ and
From this, we generate $\mathbf {y}_t$ as
where $\text {CHOL}$ indicates the Cholesky factorization. That is, the observed data are generated by the systematic component, time-invariant group-level factors, time-varying contemporaneous shocks, and time-varying observation error.
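For readers who want to experiment, a simplified version of such a data generating process can be simulated as follows (a Python sketch with arbitrary parameter values and a single break, not the exact design used in our simulations):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, k = 10, 30, 20      # groups, time points, predictors
tau = 15                  # single break: regime 2 begins at t = tau

beta1 = rng.standard_normal(k)           # regime 1 coefficients
beta2 = beta1 + rng.standard_normal(k)   # regime 2 coefficients
group_fe = rng.standard_normal(n)        # time-invariant group-level factors

Y = np.zeros((n, T))
X = np.zeros((n, T, k))
for t in range(T):
    X[:, t] = rng.standard_normal((n, k))     # predictors
    beta = beta1 if t < tau else beta2        # regime-dependent slopes
    Y[:, t] = X[:, t] @ beta + group_fe + rng.standard_normal(n)  # obs. error
```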
The fixed-effects HMBB of BridgeChange uses pre-transformed data as input. The pre-transformation is done with the plm package in R (Croissant and Millo 2008). For the one-way fixed-effects HMBB, the data are centered by group (either time or cross-sectional unit). For the two-way fixed-effects HMBB, the data are doubly group-centered.
4.2 Simulation Results
Figure 2 summarizes the results of the simulation. Panel (a) shows the root-mean-square errors (RMSEs) of the frequentist fixed-effects model estimated with the plm package (FE) and of three fixed-effects HMBBs with break numbers of 0, 1, and 2 (HMBB breaks 0, 1, and 2). The formula for RMSE is
The brown circles indicate the true break numbers in panel (a). When the true break number is 0 (left), the fixed-effects model and the fixed-effects HMBB with no break have the lowest RMSEs. When the true break number is 1 (center), the RMSEs of the no-break models (FE or HMBB break 0) are much greater than those of the HMBBs with breaks, indicating a poor model fit. Panel (b) in Figure 2 also shows that the WAIC scores successfully identify the true models (brown circles).
One interesting pattern in Figure 2 is the over-detection of hidden states by both RMSE and WAIC. When the ground truth is time series data with a single break (the middle column), RMSEs and WAIC scores sometimes favor two-break models over one-break models, which are closer to the ground truth. However, RMSEs and WAIC scores do not favor models with fewer breaks than the ground truth. That is, there is no sign of under-detection of hidden states by HMBB.
Generally speaking, under-detection is a significantly more problematic issue than over-detection. Figure 4 illustrates this point. We compare the simulation results with one break (a) and two breaks (b) for $n = 20$, $t = 60$, and $k = 30$. The ground truth of panel (a) is time series data with a single break, and the ground truth of panel (b) is time series data with two breaks. Thus, the right column of panel (a) shows a case of hidden state over-detection, and the left column of panel (b) shows a case of hidden state under-detection. The left column of panel (a) and the right column of panel (b) show cases where the number of hidden states is correctly assumed. Substantive results obtained using either of the two models in panel (a) are clearly similar, whereas substantive results obtained using the two models in panel (b) are vastly different. The under-detected model (the left column of panel (b)) fails to capture an upward parameter shift at $t > 40$.
Figure 3 shows additional simulation results. Panel (a) compares recovered hidden states (bright thin lines) with the ground truth (thick lines). Panel (a) clearly demonstrates that HMBB successfully uncovers hidden state structures under various setups. Panel (b) shows convergence diagnostics of the simulation results using stabilized Gelman–Rubin statistics (Vats and Knudson 2021). Values close to 1 indicate good convergence of Markov chains. All Gelman–Rubin statistics are close to 1.
To summarize, Bayesian model diagnostics with WAIC provide a solid framework for avoiding the problem of under-detection. However, WAIC frequently exposes researchers to the problem of over-detection. Comparing hidden state transitions across multiple models, as we did in Figure 4, is the best way to check for the over-detection problem. We provide more practical guidance when we examine the applications.
5 Applications
In this section, we apply our proposed method to two studies in political science. The first example is Alvarez, Garrett, and Lange’s (1991) study of partisan sources of economic growth, and the second is Allee and Scalera’s (2012) study of IO membership effects.
5.1 Alvarez et al.’s (1991) Study of Partisan Politics and Economic Growth
Alvarez et al. (1991) investigate how government partisanship and the level of labor union centralization affect economic growth in advanced countries using a longitudinal dataset of 16 OECD countries. They discovered that centralized labor organizations have a conditional effect on economic growth: Centralized labor organizations are “conducive to better economic performance when the Left was politically powerful.” In contrast, weaker union movements “had desirable consequences for growth and inflation when governments were dominated by rightist parties” (551). Since then, the growth-promoting effect of left-party government and inclusive labor has received ongoing attention in the comparative political economy literature (Beck et al. 1993; Boix 1997; Franzese 2002; Garrett 1998; Rueda 2008; Scruggs 2001; Soskice and Iversen 2000; Western 1998).
The dependent variable in Alvarez et al. (1991) is the annual growth rate observed in country i and year t. The independent variables are the lagged growth rate (lagg1), weighted OECD demand (opengdp), weighted OECD export (openex), weighted OECD import (openimp), the cabinet composition of left-leaning parties (leftc), and the degree of labor organization encompassment (central). We add a year fixed effect to Alvarez et al.’s (1991) partial interaction model,
Our main goal in the replication is to check whether the key explanatory variables (central, leftc, and leftc $\times $ central) have time-varying effects during the sample period. We consider three different model specifications (a no interaction model, a partial interaction model, and a full interaction model) using the original input variables of Alvarez et al. (1991). Then, given the short duration of the TSCS data ($T=15$), we set the upper limit of the number of breaks to two for each model specification, yielding nine models to compare. The panel method employed is one-way year fixed effects. Country fixed effects are not used because of the time-invariant predictor (central).
The WAIC scores of the tested models are listed in Table 1. The partial interaction model with two breaks has the lowest WAIC score (642), followed by the same model with one break (646). However, as illustrated in the center-left plot in panel (a) of Figure 5, the initial regime of the two-break partial interaction model lasts only one year and appears to be redundant given the absence of a comparable pattern in the other models. Panel (b) of Figure 5 compares posterior estimates of the time-varying parameters of the single-break partial interaction HMBB (left) with those of the two-break partial interaction HMBB (right). Except for the difference in the starting point, the two models produce almost identical posterior estimates of the time-varying parameters. As a result, we interpret the data using the single-break partial interaction model.
Panel (b) of Figure 5 shows the posterior estimates of time-varying parameters for the single-break HMBB (left) and the two-break HMBB (right). As expected, the left plot, generated using the single-break HMBB, exhibits a pattern identical to that of the right plot, generated using the two-break HMBB, except for the initial regime in the right plot.
Last, we compare HMBB estimates with conventional fixed-effects estimates in Figure 6. Several interesting patterns are worth noting. First, conventional fixed-effects estimates take the form of weighted averages of regime-specific HMBB estimates for certain covariates (e.g., central, inter, lagg1, and leftc), but not for others. Second, the interaction term (inter) and its two constituent terms (central and leftc) exhibit the most pronounced parametric change, which adds an interesting twist to Alvarez et al.’s (1991) original conclusion. The left-party government had a growth-promoting influence in the presence of centralized labor organizations only until 1978. After that, the effect waned significantly, signaling the start of a new era, the era of neo-liberal reform (Frieden 2020; Helleiner 1994).
5.2 Allee and Scalera’s (2012) Study of Membership Effects in International Organizations
In our second example, we revisit Allee and Scalera’s (2012) study of the divergent effects of membership in IOs from 1950 to 2006. Rose (2003) was the first to challenge the conventional wisdom about the trade-promoting effects of the GATT/WTO. Rose concluded from an analysis of bilateral trade data spanning 175 countries and 50 years that “the GATT/WTO seems to have a huge effect on trade if one does not hold other things constant; the multilateral trade regime matters, ceteris non paribus” (emphasis original, 111). This finding sparked a flood of subsequent studies amending or questioning the null effect (e.g., Goldstein, Rivers, and Tomz 2007; Gowa and Kim 2005; Park 2012; Subramanian and Wei 2007; Tomz, Goldstein, and Rivers 2007). Allee and Scalera (2012) is one of the recent amendments that seeks to settle the conflicting evidence regarding the effects of GATT/WTO membership on trade.
The key explanatory variable in Allee and Scalera (2012) is the type of accession, which is classified into three categories: early accession, automatic accession, and rigorous accession. According to Hypothesis 1 in Allee and Scalera (2012), rigorous accession must have a greater trade-promoting effect than the other types of accession because “the more rigorous a state’s accession to an IO, and thus the more policy changes required to join, the greater the benefits it will receive from membership” (243). Temporal heterogeneity is one of the main concerns in Allee and Scalera (2012). They argue, “although rigorous and early joiners should benefit from membership, we [Allee and Scalera] expect those benefits to be most pronounced in the years after accession and to fade over time” (260). They deal with this temporal effect heterogeneity by employing a “counter” variable that counts how many years have passed since accession.
Our goal in the replication is to check for temporal heterogeneity in the key explanatory variables of Hypothesis 1 in Allee and Scalera (2012) using HMBB.
In words, we examine whether the effect of accession type on a country’s total national trade varies over time as a result of economic shocks, decaying effects of accession types, or network effects of IO membership. Due to the significant fraction of missing observations in the original data, we use Ranjit’s (2016) imputed data. The panel specification is identical to that in the original publication (two-way fixed effects at the country and year levels).
The results of Bayesian model diagnostics using WAIC for the two model specifications in our replication are summarized in Table 2. The two-break Column (6) model has the lowest WAIC score (22,078) among the six models in comparison. Given the possibility of over-detection, we further examine hidden state transitions and time-varying movements of parameters in Figure 7.
Panel (a) of Figure 7 clearly shows that either 1964 or 2007 is estimated as a change point across different HMBB specifications. Panel (b) demonstrates that the single-break HMBB of Column (6) shows a starkly different picture from the two-break HMBB of Column (6). Compare the two covariates at the top of panel (b). While the effect of (log) population (lnpop1) is not affected by hidden regime transitions, the effect of the level of economic development (gled_gdppc) shows dramatic shifts over time.
Figure 8 compares parameter estimates of the conventional fixed-effects model with HMBB estimates. Several interesting patterns are worth noting. First, the two key explanatory variables (rigorous and rigorouscounter) show strong time-dependent effects. In Regime 1 (1946–1964), the marginal effect of rigorous accession on a country’s total national trade is statistically indistinguishable from 0. This accords well with the historical record. Between 1946 and 1954, the proportion of sample countries with the status of rigorous accession was zero. After 1955, the fraction gradually grew, and by 1964, only 12 countries (1%) had achieved rigorous accession. This historical pattern in the effect of rigorous accession is unobservable using traditional fixed-effects estimation methods.
Second, the substantial decline in rigorouscounter, the interaction of rigorous with its “counter,” following the second break (2007) can be attributed to the economic crisis of 2008–2009. The greater negative trend in rigorouscounter after 2007 implies that countries with a lengthy history of rigorous accession have borne the brunt of the economic crisis.
Last, as in Figure 6, there is no consistent pattern in the association between conventional fixed-effects and HMBB estimates. In some circumstances (e.g., gled_gdppc), the conventional fixed-effects estimates are close to the Regime 2 estimates, but not in others (e.g., lnpop1 and earlymemcounter).
6 Discussion
A typical way to use our method for TSCS data is as follows. First, researchers build a set of regression models for the change-point analysis. Next, researchers fit HMBBs with a number of breaks ranging from 0 to an upper limit they deem reasonable given the model, data, and theory. Researchers then make an informed decision regarding the best-fitting model by analyzing the WAIC scores, hidden state transitions, and parameter changes of the fitted HMBBs.
Once researchers have selected an appropriate HMBB using the above procedure, they will have $K \times M$ regime-specific regression coefficients to analyze. When either K or M is large, it might be difficult to interpret all the time-varying information. Although Bayesian shrinkage approaches outperform variable selection methods such as spike-and-slab prior models in terms of computational efficiency, they lack the sparsity property. That is, in Bayesian shrinkage approaches, parameter estimates are never exactly zero. Thus, it is beneficial for researchers to discriminate between “strong” (i.e., statistically distinct from 0) and “weak” (i.e., statistically indistinguishable from 0) signals across all $K \times M$ parameters.
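As a first pass before formal selection, one can flag coefficients whose posterior credible intervals exclude zero. A small sketch (a hypothetical helper of our own, not part of BridgeChange):

```python
import numpy as np

def classify_signals(draws, level=0.95):
    """Label each column of a (G x K) matrix of posterior draws
    'strong' if its equal-tailed credible interval excludes zero,
    and 'weak' otherwise."""
    lo = np.quantile(draws, (1 - level) / 2, axis=0)
    hi = np.quantile(draws, 1 - (1 - level) / 2, axis=0)
    return np.where((lo > 0) | (hi < 0), "strong", "weak")

draws = np.column_stack([np.linspace(4, 6, 100),    # clearly positive
                         np.linspace(-1, 1, 100)])  # straddles zero
print(classify_signals(draws))  # ['strong' 'weak']
```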
In this scenario, we can employ the decoupled shrinkage and selection (DSS) method, which minimizes an $\ell _0$-type loss function on pre-regularized posterior distributions (Hahn and Carvalho 2015). The DSS loss function of the HMBB for regime m can be constructed as follows, using HMBB estimates as shrinkage inputs:

(8) $$ \begin{align} \mathcal{L}(\boldsymbol{\gamma}_m) = \frac{\| \mathbf{X}_m\boldsymbol{\beta}^{*}_{m} - \mathbf{X}_m \boldsymbol{\gamma}_m \|^2_2}{n_m} + \lambda \|\boldsymbol{\gamma}_m\|_0, \end{align} $$
where $\mathbf {X}_m\boldsymbol {\beta }^{*}_{m}$ is the fitted value of the fixed-effects HMBB at regime m, $\lambda $ is a nonnegative regularization parameter, and $\boldsymbol {\gamma }_m$ denotes the new slope parameters that minimize the loss function.
To find the optimum of Equation (8), we take the popular approach of surrogating the $\ell _0$ problem with an $\ell _1$ penalty. Furthermore, to better target the $\ell _0$ solution, we use the weight vector ($\widehat {w}_{j, m} = \frac {1}{|\widehat {\gamma }_{j, m}|^{\delta }}$), where $\delta > 0$ and $\widehat {\gamma }_{j, m}$ is a root-$N$-consistent estimate of $\gamma _{j, m}$, as suggested by Zou (2006):

(9) $$ \begin{align} \widehat{\boldsymbol{\gamma}}_m = \operatorname*{arg\,min}_{\boldsymbol{\gamma}} \| \mathbf{X}_m\boldsymbol{\beta}^{*}_{m} - \mathbf{X}_m \boldsymbol{\gamma} \|^2_2 + \lambda \sum_{j=1}^{K} \widehat{w}_{j, m} |\gamma_{j}|. \end{align} $$
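The weighted $\ell_1$ problem can be solved with any adaptive lasso routine. Below is a minimal coordinate-descent sketch (our own illustration with an arbitrary regularization value; the pilot estimate and convergence settings are placeholders, not the implementation in BridgeChange):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def dss_adaptive_lasso(X, fitted, lam=1.0, delta=1.0, n_sweeps=200):
    """Sparsify regularized fitted values (X @ beta_star) by coordinate
    descent on the adaptive-lasso surrogate of the DSS loss."""
    n, k = X.shape
    # pilot (root-n-consistent) estimate via least squares on the fitted values
    gamma_hat, *_ = np.linalg.lstsq(X, fitted, rcond=None)
    w = 1.0 / (np.abs(gamma_hat) ** delta + 1e-8)   # adaptive weights
    gamma = np.zeros(k)
    a = np.sum(X * X, axis=0)                       # x_j' x_j
    r = fitted - X @ gamma                          # current residual
    for _ in range(n_sweeps):
        for j in range(k):
            r += X[:, j] * gamma[j]                 # remove j's contribution
            z = X[:, j] @ r
            gamma[j] = soft_threshold(z, lam * w[j] / 2.0) / a[j]
            r -= X[:, j] * gamma[j]
    return gamma

# Coefficients near zero are forced to exactly zero:
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
beta_star = np.array([2.0, 0.01, -1.5, 0.0, 0.005])  # shrinkage-style inputs
print(dss_adaptive_lasso(X, X @ beta_star))
```

Because the weights explode for near-zero pilot estimates, the small coefficients are thresholded to exactly zero while the large ones survive nearly unshrunk, mimicking the behavior reported in Table 3.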
Table 3 summarizes the DSS results using a single-break HMBB fitted to the full interaction model of Alvarez et al. (1991). Notably, regime-specific parameters that are close to zero are forced to zero, which helps researchers concentrate on the nonzero DSS estimates for a succinct interpretation of the HMBB results.
7 Conclusion
In this article, we presented a Bayesian strategy for detecting and estimating change points in regression models with a large number of predictors. The proposed model unifies a variety of statistical methods (a Bayesian shrinkage method, a change-point model, and sparse regression) to enable researchers to perform efficient and consistent inference on time-varying parameters in a variety of TSCS datasets. We concentrate on the fixed-effects method in this article because it is one of the most often used ways to analyze TSCS data in political science. However, our proposed strategy has a far broader scope than the fixed-effects model. Our software package includes tools for constructing change-point regression models for univariate time series data as well as multilevel models with varying intercepts or slopes. We intend to extend the presented strategy to models with discrete response data.
While we are confident in the proposed method’s performance in regular TSCS data settings, we note a few limitations. First, HMBB is best suited to abrupt, sizable shifts in parameter values; dynamic linear models are better equipped to handle slow, cyclical, or evolutionary changes (West and Harrison 1997). Second, HMBB assumes that all parameters share a common break. If only a small subset of parameters changes over time while the rest remain constant, or if parameters exhibit heterogeneous change points, HMBB cannot identify these parameter-specific change points precisely. We are currently developing change-point regression models that allow for parameter-specific break detection and regularization, building on recent advances in Bayesian statistics such as Hahn et al. (2018). Finally, HMBB uses the Bayesian bridge model as its baseline shrinkage model; combining other shrinkage methods, such as the horseshoe prior (Carvalho, Polson, and Scott 2010), with change-point models is a promising direction.
Appendix A. Sampling Algorithm
We first center the data (group-center, in the case of panel data) so that parameters share a common range.
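As a concrete illustration of this preprocessing step, group-centering subtracts each unit’s mean from its own observations (the helper name below is ours, not part of our package):

```python
def group_center(y, groups):
    """Subtract the group-specific mean from each observation."""
    sums = {}
    for g, v in zip(groups, y):
        sums.setdefault(g, []).append(v)
    means = {g: sum(v) / len(v) for g, v in sums.items()}
    return [v - means[g] for g, v in zip(groups, y)]

y = [1.0, 3.0, 10.0, 14.0]
groups = ["a", "a", "b", "b"]
print(group_center(y, groups))  # -> [-1.0, 1.0, -2.0, 2.0]
```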
1. Sampling $p(\boldsymbol {\beta } |\alpha , \boldsymbol {\Lambda }, \sigma ^2, \tau , \mathbf {P}, \mathbf {S}, \mathbf {y})$ : If $n_m > p$ , the posterior of $\boldsymbol {\beta }$ follows the multivariate normal distribution, which is given by
(10) $$ \begin{align} \boldsymbol{\beta}_m | \sigma^2, \lambda_m, \alpha_m, \tau, \mathbf{P}, \mathbf{S}, \mathbf{y}_m \sim \mathcal{N}_p \left(\frac{\mathbf{V}\mathbf{X}_m'\mathbf{y}_m}{\sigma^2_m}, \ \mathbf{V}=\left(\mathbf{X}_m'\mathbf{X}_m + \frac{\sigma_m^2}{\tau^2} \lambda_m \mathbf{I} \right)^{-1} \right). \end{align} $$
If $n_m \leq p$ , we use the algorithm of Bhattacharya, Chakraborty, and Mallick (2016):
(a) For each regime, sample $\mathbf {u}_m \sim \mathcal {N}(\mathbf {0},\mathbf {D}_m)$ and $\boldsymbol {\delta }_m \sim \mathcal {N}(\mathbf {0},\mathbf {I}_{n_m})$ , where $d_{j,j} = \sqrt {\frac {\lambda _{m,j} \sigma ^2_m}{\tau _m}}$ is the $j$ th diagonal entry of the matrix $\mathbf {D}_m$ and $n_m$ is the number of observations in regime m.
(b) Set $\boldsymbol {\nu }_m = \mathbf {X}_m \mathbf {u}_m + \boldsymbol {\delta }_m$ .
(c) Solve $(\mathbf {X}_m\mathbf {D}_m\mathbf {X}^{\prime }_m + \mathbf {I}_{n_m} )\boldsymbol {\omega }_m = \frac {\mathbf {y}_m}{\sigma _m^2} - \boldsymbol {\nu }_m$ for $\boldsymbol {\omega }_m$ .
(d) Set $\boldsymbol {\beta }_m = \mathbf {u}_m + \mathbf {D}_m\mathbf {X}^{\prime }_m \boldsymbol {\omega }_m$ .
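Steps (a)–(d) can be rendered schematically in pure Python as follows. The naive linear solver and all names below are illustrative only (a production implementation would use a Cholesky-based solver), and this sketch is not the code in our package.

```python
import random

def solve(A, b):
    """Naive Gaussian elimination with partial pivoting (illustration only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fast_gaussian_draw(X, y, d, sigma2):
    """One draw of beta_m when n_m <= p, following steps (a)-(d).
    d holds the diagonal of D_m; all arguments are plain Python lists."""
    n, p = len(X), len(X[0])
    u = [random.gauss(0.0, d[j] ** 0.5) for j in range(p)]            # (a) u ~ N(0, D)
    delta = [random.gauss(0.0, 1.0) for _ in range(n)]                # (a) delta ~ N(0, I_n)
    nu = [sum(X[i][j] * u[j] for j in range(p)) + delta[i]
          for i in range(n)]                                          # (b) nu = X u + delta
    A = [[sum(X[i][j] * d[j] * X[k][j] for j in range(p)) + (1.0 if i == k else 0.0)
          for k in range(n)] for i in range(n)]                       # X D X' + I_n
    omega = solve(A, [y[i] / sigma2 - nu[i] for i in range(n)])       # (c)
    return [u[j] + d[j] * sum(X[i][j] * omega[i] for i in range(n))
            for j in range(p)]                                        # (d) beta = u + D X' omega
```

The point of the algorithm is that the linear system is only $n_m \times n_m$ rather than $p \times p$, which is the expensive dimension when $n_m \leq p$.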
2. Sampling $p(\beta _{0}| \boldsymbol {\Lambda } , \boldsymbol {\beta }, \alpha , \tau , \sigma ^2, \mathbf {P}, \mathbf {S}, \mathbf {y})$ : We estimate the intercept for each regime separately in each simulation, given the sampled regression slopes:
$$ \begin{align*} \beta_{0m} \gets \overline{\boldsymbol{y}}_{m} - \overline{\mathbf{X}}^{\top}_{m} \boldsymbol{\beta}_{m}, \end{align*} $$
where
(11) $$ \begin{align} \overline{\boldsymbol{y}}_{m} = \frac{\sum^{n}_{t=1}\boldsymbol{1}\{s_{t} = m\} y_{t}}{\sum^{n}_{t=1}\boldsymbol{1}\{s_{t} = m\}},\quad \text{and}\quad \overline{\mathbf{X}}_{m,j} = \frac{\sum^{n}_{t=1}\boldsymbol{1}\{s_{t} = m\} \mathbf{X}_{m,tj} }{\sum^{n}_{t=1}\boldsymbol{1}\{s_{t} = m\}}. \end{align} $$
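This update amounts to computing regime-specific means; a minimal sketch (with hypothetical names, indexing regimes by the state vector):

```python
def regime_intercepts(y, X, states, betas):
    """beta_0m = ybar_m - xbar_m' beta_m for each regime m (cf. Equation (11))."""
    p = len(X[0])
    out = {}
    for m in set(states):
        idx = [t for t, s in enumerate(states) if s == m]
        ybar = sum(y[t] for t in idx) / len(idx)
        xbar = [sum(X[t][j] for t in idx) / len(idx) for j in range(p)]
        out[m] = ybar - sum(xbar[j] * betas[m][j] for j in range(p))
    return out
```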
3. Sampling $p(\alpha |\boldsymbol {\Lambda } , \boldsymbol {\beta }, \sigma ^2, \tau , \mathbf {P}, \mathbf {S}, \mathbf {y})$ : We use a Griddy Gibbs sampler (Tanner 1996) for the sampling of $\alpha $ because $\alpha $ is univariate and its support is bounded by $(0, 2]$ .
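A Griddy Gibbs step evaluates the unnormalized log conditional on a grid over $(0, 2]$, normalizes, and draws by inverting the discrete CDF. The sketch below uses illustrative names and is not our package’s code:

```python
import math, random

def griddy_gibbs_alpha(log_density, grid, u=None):
    """One Griddy Gibbs draw: grid-evaluate, normalize, invert the CDF."""
    logs = [log_density(a) for a in grid]
    mx = max(logs)                         # stabilize before exponentiating
    w = [math.exp(v - mx) for v in logs]
    total = sum(w)
    u = random.random() if u is None else u
    cum = 0.0
    for a, wi in zip(grid, w):
        cum += wi / total
        if u <= cum:
            return a
    return grid[-1]

# Example grid of candidate alpha values on (0, 2].
grid = [0.1 * k for k in range(1, 21)]
```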
4. Sampling $p(\tau |\boldsymbol {\Lambda } , \boldsymbol {\beta }, \alpha , \sigma ^2, \mathbf {P}, \mathbf {S}, \mathbf {y})$ : Sample $\nu $ first and then transform $\nu $ to $\tau $ .
$$ \begin{align*} \nu_m &\sim \text{Gamma}(c, d),\\[4pt] \tau_m &= \nu_m^{-\frac{1}{\alpha_m}}, \end{align*} $$
where $c = c_0 + p/\alpha _m$ and $d = d_0 + \sum _{j=1}^{p}|\beta _{j, m}|^{\alpha _m}$ .
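In code, this step is a single gamma draw followed by a deterministic transform. A minimal sketch with illustrative names and hyperparameters; note that Python’s `random.gammavariate` is parameterized by (shape, scale), so the rate $d$ enters as $1/d$:

```python
import random

def sample_tau(beta_m, alpha_m, c0=1.0, d0=1.0):
    """Draw nu_m ~ Gamma(c, d) and set tau_m = nu_m^(-1/alpha_m).
    c0 and d0 are illustrative prior values, not the paper's defaults."""
    p = len(beta_m)
    c = c0 + p / alpha_m
    d = d0 + sum(abs(b) ** alpha_m for b in beta_m)
    nu = random.gammavariate(c, 1.0 / d)   # (shape, scale) parameterization
    return nu ** (-1.0 / alpha_m)
```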
5. Sampling $p(\mathbf {S}|\boldsymbol {\Lambda } , \boldsymbol {\beta }, \alpha , \tau , \sigma ^2, \mathbf {P}, \mathbf {y})$ : Sample $\mathbf {S}$ recursively using Chib’s (1998) algorithm.
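The recursive draw of $\mathbf {S}$ is a forward-filter, backward-sample pass under a forward-only (change-point) transition matrix. The following is a schematic pure-Python rendering under that assumption, not Chib’s exact implementation; `loglike` and all names are illustrative:

```python
import math, random

def sample_states(loglike, P, n, M):
    """One forward-filter backward-sample draw of the hidden states.
    loglike(t, m): log density of observation t under regime m.
    P: M x M forward-only transition matrix (change-point structure)."""
    # Forward filter: F[t][m] is proportional to p(s_t = m | y_1, ..., y_t).
    F = [[0.0] * M for _ in range(n)]
    F[0][0] = 1.0                       # the first observation is in regime 1
    for t in range(1, n):
        pred = [sum(F[t - 1][k] * P[k][m] for k in range(M)) for m in range(M)]
        w = [pred[m] * math.exp(loglike(t, m)) for m in range(M)]
        z = sum(w)
        F[t] = [x / z for x in w]
    # Backward sampling: fix s_n at the last regime, then draw s_t | s_{t+1}.
    s = [0] * n
    s[n - 1] = M - 1
    for t in range(n - 2, 0, -1):
        w = [F[t][m] * P[m][s[t + 1]] for m in range(M)]
        r = random.random() * sum(w)
        cum = 0.0
        for m in range(M):
            cum += w[m]
            if r <= cum:
                s[t] = m
                break
    return s
```

Because $\mathbf {P}$ only allows staying or moving one regime forward, every sampled trajectory is nondecreasing, with the first and last observations pinned to the first and last regimes.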
6. Sampling $p(\mathbf {P}|\boldsymbol {\Lambda } , \boldsymbol {\beta }, \alpha , \tau , \sigma ^2, \mathbf {S}, \mathbf {y})$ :
$$ \begin{align*} p_{kk} \sim \mathcal{B}eta(a_0 + j_{k, k} - 1,\ b_{0} + j_{k, k+1}), \end{align*} $$
where $p_{kk}$ is the probability of staying in state k, $j_{k, k}$ is the number of one-step transitions from state k to state k, and $j_{k, k+1}$ is the number of transitions from state k to state $k+1$ .
Appendix B. How to Use BridgeChange
Acknowledgments
We appreciate valuable comments from Le Bao, David Carlson, Taeryon Choi, Max Goplerud, Kosuke Imai, Heeseok Oh, and three anonymous reviewers. We would like to thank seminar participants of the 2018 International Society for Bayesian Analysis meeting, the 2019 Asian Political Methodology meeting, the 2019 Midwest Political Science Association meeting, and a symposium on Bayesian political methodology at Washington University in St. Louis, 2019. Jong Hee Park was supported by the SNU 10-10 project of Seoul National University.
Data Availability Statement
Replication code for this article is available in Park and Yamauchi (2022) at https://doi.org/10.7910/DVN/MCQTYC.
Supplementary Material
For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2022.23.