1. Introduction
In a spatial econometric model, the behavior of one cross-sectional unit is co-determined by spatial lags on other cross-sectional units through their coefficients and the specification of spatial weight matrices. Overview papers, book chapters, and monographs in political science discussing spatial econometric models are of Cook, Hays, and Franzese (Reference Cook, Hays, Franzese, Curini and Franzese2020), Darmofal (Reference Darmofal2015), Di Salvatore and Ruggeri (Reference Di Salvatore and Ruggeri2021) and Harbers and Ingram (Reference Harbers, Ingram, Giraudy, Moncada and Snyder2019). One prominent model, advocated by LeSage and Pace (Reference LeSage and Pace2009) as it stands out as superior in a wide number of empirical applications, is the spatial Durbin (SD) model that contains a spatial lag in the dependent variable and each of the explanatory variables. This model has been used to study different topics in political science, economics, and beyond. Examples are Karahasan, Pinar, and Deniz (Reference Karahasan, Pinar and Deniz2023) on political climate and regional well-being, Pantera, Bömelt, and Bakaki (Reference Pantera, Bömelt and Bakaki2023, Table A.2) on the attitude to disasters, Lewbel, Qu, and Tang (Reference Lewbel, Qu and Tang2023) on peer and contextual effects on student’s performance in mathematics, De Siano and Chiariello (Reference De Siano and Chiariello2022) on women’s political empowerment, Juhl (Reference Juhl2021) on search strategies for spatial model specifications in general and voting in particular, Borsky and Kalkschmeid (Reference Borsky and Kalkschmeid2019) on corruption, and Ingram (Reference Ingram, Shirk, Wood and Olson2014) on homicide rates.
Although the literature formally allows for different spatial weight matrices in models with multiple types of spatial lags, empirical applications and Monte Carlo simulations tend to use one common specification for all spatial lags being considered. Both from a theoretical and practitioner viewpoint, this is rather restrictive as the distance decay of each spatial lag is likely to be different for different regressors. Furthermore, using wrong spatial weight matrices may lead to biased estimation results. Our study develops an improvement upon this practice and contributes to the existing literature in the following ways. We parameterize each spatial weight matrix in the SD model with a different rather than one common distance decay parameter for two frequently used functional forms of distance decay, the negative exponential and the inverse distance form. We provide mathematical expressions for the direct and indirect effects and their standard errors, which represent useful summary statistics of the parameter estimates in a spatial econometric model. Finally, we show that our approach makes the spatial range of the indirect effects more flexible than the traditional approach that uses the same pre-specified matrix, which is prone to misspecification.
A (quasi) maximum likelihood ((Q)ML) estimator is used to jointly estimate the decay and response parameters of the SD model. This estimator is applicable to the common large N and fixed T framework and extends previous work of Lee and Yu (Reference Lee and Yu2010). R and MATLAB functions of the proposed estimator and the determination of the direct and indirect effects will be made available to give other researchers the opportunity to apply it to their own empirical setting.Footnote 1
Monte Carlo experiments show that our approach works well in estimating not only the parameters but also direct and indirect effects. Importantly, especially the indirect effects appear sensitive to an incorrect choice of the spatial weight matrix. This finding throws another light on the work of LeSage and Pace (Reference LeSage and Pace2014), who consider the notion of the estimated direct and indirect effects being sensitive to the specification of a particular spatial weight matrix “as the biggest myth about spatial regression models” (p. 218). In contrast to the direct effects, this does not apply to the indirect effects.
An empirical application on military expenditures from Yesilyurt and Elhorst (Reference Yesilyurt and Elhorst2017) is utilized to show the usefulness of the proposed parameterization approach for applied researchers. While these authors fail to find significant empirical evidence in favor of the SD model and the existence of indirect effects, we succeed because we estimate and allow for different rather than one common pre-specified distance decay parameter on which the spatial weight matrix of each lag is based. This extension also reveals that the spatial range of each regressor is no longer the same, which we both demonstrate mathematically and illustrate graphically.
This paper is organized as follows. In Section 2, we review previous studies that parameterized the spatial weight matrix or suggested alternative estimation approaches. In Section 3, we set out the SD model with parameterized weight matrices and explain how the parameters and direct and indirect effects can be estimated. In Sections 4 and 5, we present the results of a Monte Carlo experiment and discuss the results of an empirical analysis. Finally, we draw conclusions.
2. Related Work
Specifying the spatial weight matrix has been recognized as difficult and controversial (Bavaud, Reference Bavaud1998; Corrado and Fingleton, Reference Corrado and Fingleton2012; Neymayer and Plümper, Reference Neymayer and Plümper2016). Researchers generally agree that spatial weights should decrease with some generalized distance, but a consensus specification is still missing. According to Franzese and Hays (Reference Franzese, Hays, Box-Steffensmeier, Brady and Collier2008), Corrado and Fingleton (Reference Corrado and Fingleton2012), and Neymayer and Plümper (Reference Neymayer and Plümper2016), the spatial econometric model and the specification of the spatial weight matrix, commonly symbolized by $\boldsymbol {W}$ , should be theory-driven. While several studies develop a theoretical model that leads to a certain type of spatial econometric model, research containing a theoretical derivation of the spatial weight matrix is scarce. Bavaud (Reference Bavaud1998) provides a systematic overview of the general theoretical properties of spatial weight models, but he does not explain how to apply the theory in practice. The author even apologizes for this omission (p. 154). Instead, practitioners tend to adopt one of the many popular specifications and use it for all spatial lags in the model: (i) binary contiguity matrices, (ii) negative exponential or inverse distance decay matrices (with or without a cutoff point), (iii) nearest neighbor matrices, and (iv) block diagonal or group interaction matrices.
To justify the chosen spatial weight matrix, some researchers estimate the same model with different spatial weight matrices and then select the one that yields the highest log-likelihood value or Bayesian posterior model probability (Juhl, Reference Juhl2020). However, specifying and checking all potential spatial weight matrices is time-consuming, while the researcher runs the risk to pick the wrong one if the choice set is limited. Recent research deviates from pre-specified spatial weight matrices and proposes approaches to estimate them.
Several research papers infer $\boldsymbol {W}$ from the data using various (geo)statistical modeling techniques, provided that the number of observations over time (T) exceeds the number of observations across space (N) (Ahrens and Bhattarchajee, Reference Ahrens and Bhattarchajee2015; Lam and Souza, Reference Lam and Souza2019; also see Chapter 4 of Beenstock and Felsenstein, Reference Beenstock and Felsenstein2019, for an overview). However, this approach does not address the majority of empirical studies based on $N>>T$ . Instead of assuming that T is sufficiently large, recent studies in the social interactions literature propose to estimate $\boldsymbol {W}$ assuming that the number of unknown elements is sufficiently small (Lewbel, Qu, and Tang, Reference Lewbel, Qu and Tang2023). However, whether this assumption makes sense can only be determined by first estimating the strength of the distance decay effect and then the most plausible cutoff point. This can be achieved by specifying a functional form for the elements of $\boldsymbol {W}$ that depends on a distance decay parameter. The two most popular forms are the inverse and negative exponential distance decay matrices. However, many studies utilizing parameterized spatial weight matrices eventually use pre-specified values for the distance decay parameters (Fingleton and LeGallo, Reference Fingleton and LeGallo2008; Murdoch, Rahmatian, and Thayer, Reference Murdoch, Rahmatian and Thayer1993). The reason is that parameterizing $\boldsymbol {W}$ leads to an econometric model that is nonlinear in its parameters and thus requires nonlinear estimation techniques enhancing the complexity of the analysis. There are a few studies that estimate the distance decay parameter, albeit with limitations.
Fischer, Scherngell, and Reismann (Reference Fischer, Scherngell and Reismann2009) study knowledge spillovers employing a spatial error model (SEM) that uses a parameterized exponential distance decay matrix based on geographic distance. It contains a detailed description of the estimation of the response parameters but not of the distance decay parameter. Dubin (Reference Dubin1988) provides the concentrated log-likelihood function of the distance decay parameter in a SEM and then explains that the optimization can be performed with a sophisticated though unspecified optimization routine, or with a simple grid search. Kakamu (Reference Kakamu2005) uses Bayesian techniques to estimate a spatial autoregressive (SAR) model in which the spatial weight matrix is specified as a parameterized inverse distance matrix. In a short numerical example in which the distance decay parameter is set to $-8$ , an estimate of $-26.322$ is found, pointing to a serious estimation bias. Halleck Vega and Elhorst (Reference Halleck Vega and Elhorst2015) estimate a model with spatially lagged regressors, all with the same parameterized inverse distance spatial weight matrix. Boehmke et al. (Reference Boehmke, Avery, Good, Dainty and Ko2023) adopt an exponential distance decay matrix to measure the impact of “black live matter” protests, a spatially lagged regressor, on the public opinion of individuals and vary this decay parameter in a pre-specified interval (0.5–0.9) by steps of one-tenth. In sum, the aforementioned studies on parameterized spatial weight matrices focus on at most one type of spatial lag, one form of parameterization of the spatial weight matrix, and only one distance decay parameter. However, in empirical research models containing multiple forms of spatial lags and different spatial weight matrices for dependent variable and regressors are plausible. Therefore, our fully parameterized SD model is more general than existing work.
Another approach is to adopt a multiple-spatial-weights specification for each spatial lag in the model (Hays, Kachi, and Franzese, Reference Hays, Kachi and Franzese2010; Juhl, Reference Juhl2020; Neymayer and Plümper, Reference Neymayer and Plümper2016). Debarsy and LeSage (Reference Debarsy and LeSage2021) examine this in detail for a model with a spatially lagged dependent variable, where the spatial weight matrix is specified as a convex combination of several underlying unscaled matrices. The contribution of each submatrix to the overall matrix is estimated together with the other model parameters using Bayesian estimation. Although promising since Neymayer and Plümper (Reference Neymayer and Plümper2016) argue that “geographical proximity is a poor proxy for connectivity” (p. 180), one limitation is that only one spatial lag with such an overall spatial weight matrix is considered. If the number of submatrices to construct the overall matrix is M and the number of spatial lags would be extended to $K+1$ , as in an SD model, then $M(K+1)$ additional parameters need to be estimated, which could potentially lead to over-parameterization problems. In contrast, only $K+1$ additional (distance decay) parameters need to be estimated when adopting a parameterized distance decay matrix for each spatial lag in the model. Furthermore, when these parameters are estimated, their levels of significance explicitly provide information on the amount of uncertainty of the spatial weight matrices (Vande Kamp, Reference Vande Kamp2020).
To the best of our knowledge, there is no paper that allows for different parameterized spatial weight matrices for all spatial lags of the model and that enables estimating the distance decay parameters jointly with the response parameters in a unified framework. We aim to fill this gap, noting that it shares resemblance with multiscale geographically weighted regressions (MGWRs). This approach recognizes that the relationships between the dependent variable and the regressors may operate at different spatial scales and that each relationship has a different spatial weight matrix (Fotheringham et al., Reference Fotheringham, Yang and Kang2017, Reference Fotheringham, Oshan and Li2024). The difference is that MGWRs address parameter heterogeneity regarding the impact of changes in the regressors on the dependent variable of the unit itself, while SD models are also and especially used to determine the impact of these changes on the dependent variable of other units in the sample.
3. The Methodology of Parameterization
3.1. The Parameterized Spatial Durbin Model
We consider the following parameterized SD model:
where $i = 1,\ldots ,N$ , $t = 1,\ldots ,T$ , N is the number of cross-sectional units, and T is the number of time periods. We assume a balanced panel of large N and finite T. $y_{it}$ represents the dependent variable of unit i at time t, and $x_{kit}$ the kth non-stochastic explanatory variable with coefficient $\beta _k$ . The term $\sum _{j=1}^{N} w_{ij}(\alpha _0)y_{jt}$ denotes the spatial lag of the dependent variable of units other than i, and the accompanying coefficient $\rho $ the impact of this spatial lag. Similarly, the regressors $\sum _{j=1}^N w_{kij}(\alpha _{k})x_{kjt}$ ( $k=1,\ldots ,K$ ) denote the spatial lags of the explanatory variables, whose impacts are measured by the coefficients $\gamma _{k}$ . The elements $w_{ij}(\alpha _0)$ and $w_{kij}(\alpha _k)$ measure the relationships between units i and j. These elements are heterogeneous for the different types of regressors and are parameterized by additional parameters $\alpha _0$ or $\alpha _k$ . The individual fixed effects $c_{i}$ ( $i=1,\ldots ,N$ ) control for unobserved individual-specific, time-invariant effects. Similarly, the time period fixed effects $\xi _t$ ( $ t=1,\ldots ,T-1)$ Footnote 2 control for time-specific, unit-invariant effects. We allow $c_i$ and $\xi _t$ to be correlated with $x_{it}$ and $\sum _{j=1}^N w_{kij}(\alpha _{k})x_{kjt}$ . Furthermore, if T is small, $\xi _t$ can simply be included in the set of explanatory variables. Finally, $\varepsilon _{it}$ is a unit-time varying error term, with $E(\varepsilon _{it})=0$ and $Var(\varepsilon _{it})=\sigma ^2$ ( $i=1,...,N; t=1,...,T$ ).
Just as previous studies based on the SD model cited in the introduction, we assume that the spatial weight matrices are constant over time and that the explanatory variables and their spatial lags are exogenous.
3.2. Parameterization and Estimation
Two commonly used parameterizations of the spatial weight matrix are the negative exponential and the inverse distance matrix. For $i \neq j$ , the $ij$ th element of the row normalized negative exponential matrix is given by
and the $ij$ th element of the inverse distance matrix by
where $d_{ij}$ denotes the generalized distance between units i and j and $\alpha _k$ the distance decay parameter of the kth spatial lag. Both geographic and non-geographic (economic, cultural, geodemographic, and institutional) distance or connectivity measures, or combinations thereof, are possible, as long as they are exogenous. The diagonal elements of the spatial weight matrix are assumed to be zero to prevent units from predicting themselves.
For positive values of $\alpha _k$ and $d_{ij}$ , both functional forms have the following properties conditional on each other: (i) $w_{ij}(\alpha _k) \stackrel {(\alpha _k|d_{ij}) \downarrow 0}{\longrightarrow } 1$ and $w_{ij}(\alpha _k)\stackrel {(\alpha _k|d_{ij}) \rightarrow \infty }{\longrightarrow }0$ and (ii) $w_{ij}(\alpha _k) \stackrel {(d_{ij}|\alpha _k) \rightarrow \infty }{\longrightarrow }0$ . These properties imply that for large values of $\alpha _k$ , the elements for which $d_{ij}$ is also large converge to zero, that is, there exists a cutoff point where their values become so small that they can be set to zero without essentially changing the spatial weight matrix. Such a matrix with many zero elements is called a sparse matrix. Conversely, for small(er) values of $\alpha _k$ , the matrix will become dense with hardly any zero elements. In Supplementary Appendix A.1, we also consider scalar normalization of the spatial weight matrices by the largest eigenvalue as an alternative to row normalization.
By making the spatial weight matrices dependent on distance decay parameters $\alpha _k\ (k=0,1,\ldots ,K)$ , they become stochastic in the sense that they are subject to uncertainty and a margin of error (Vande Kamp, Reference Vande Kamp2020). Gupta (Reference Gupta2019) shows that many established estimation methods also work with an exogenous stochastic spatial weight matrix, as long as the sample size N diverges to infinity faster than the row and column sums of the stochastic spatial weight matrices. This condition requires that $\alpha _k>0$ for a negative exponential distance decay matrix and $\alpha _k>1$ for an inverse distance matrix (see Supplementary Appendix A.6 for details).
Lee and Yu (Reference Lee and Yu2010) set out the assumptions under which the (Q)ML estimator of the model parameters in Equation (1) are identified, consistent, and asymptotically normal, provided that the $\boldsymbol {W}$ matrices are nonstochastic, that is, not parameterized. The regular rate of convergence is $\sqrt {N}$ , provided that T is small or fixed. In Supplementary Appendix A.2, we set out the estimation procedure, and in Supplementary Appendix A.6, we discuss all assumptions, as well as those that need to be adapted, such that the proof by Lee and Yu (Reference Lee and Yu2010) carries over to the model in this study. One important change is that the response parameters $\rho $ and $\gamma _k$ ( $k=1,\ldots ,K$ ) should be bounded away from zero; otherwise, the distance decay parameters $\alpha _k$ ( $k=0,\ldots ,K$ ) are not identified. Just as the response parameters in Lee and Yu (Reference Lee and Yu2010), the distance decay parameters can also be estimated by ML or quasi (Q)ML, depending on whether or not the error terms are assumed to be normally distributed. If their distribution is not specified, application of QML will have a downward effect on the significance levels of the parameter estimates.
We developed routines in both MATLAB and R enabling estimation of the proposed model for different normalizations and distance decay functions. A description can be found in Supplementary Appendix A.8.
The average computation time for an SD model with $K=2$ regressors and thus three ( $K+1$ ) different decay parameters next to five ( $2K+1)$ response parameters and a sample size of $N=200$ and $T=5$ is about 0.87 seconds. When $N=800$ , this increases to 1.64 seconds, and when $N=1,600$ to 77.50 seconds. Overall, these computation times do not constitute an obstacle to further empirical research.
3.3. Direct and Indirect Effects
A quantitative interpretation of the coefficient estimates in Equation (1) is hindered by the fact that they do not represent the marginal effects of the explanatory variables (LeSage and Pace, Reference LeSage and Pace2009). Marginal effects are obtained by taking the first-order derivatives of the reduced form of the SD model reformulated in vector form, which yields an $N \times N$ matrix $\boldsymbol {M}_k$ ( $k=1,\ldots ,K$ ) of the form
Every diagonal element of $\boldsymbol {M}_k$ reflects the impact of changing the kth explanatory variable of one observational unit i on the dependent variable of i, and every off-diagonal element the corresponding impact on the dependent variable of another unit j ( $j \neq i$ ). To suppress the amount of output, LeSage and Pace (Reference LeSage and Pace2009) suggest two summary indicators: the direct effect $DE_{k}$ measured as the average of all diagonal elements of $\boldsymbol {M}_k$ , and the indirect effect $IE_{k}$ measured as the average row or column sum of the off-diagonal elements excluding the respective diagonal element.Footnote 3 For a recent and detailed discussion of the interpretation of spatial models, see Whitten, Williams, and Wimpy (Reference Whitten, Williams and Wimpy2021). Importantly, both studies explain that the direct effects also encompass feedback effects that circulate across all units in the sample and eventually influence the originator of the change in the explanatory variable.
The marginal effects of the SD model when allowing for different distance decay parameters have three important properties. First, Halleck Vega and Elhorst (Reference Halleck Vega and Elhorst2015) and Wimpy, Whitten, and Williams (Reference Wimpy, Whitten and Williams2021) demonstrate that only models that at least include spatially lagged regressors ( $\gamma _k\neq 0$ ) are able to produce indirect effects that are flexible in the sense that they can take on any empirical value relative to their corresponding direct effects. In contrast, models that include a spatially lagged dependent variable and/or a spatially lagged error only, among which the SAR model and the SEM, are less flexible since they impose restrictions on their mutual relationships in advance. Second, parameterizing the spatial weight matrix of each explanatory variable by a different decay parameter enhances this flexibility by relaxing the implicit restriction that the spatial range of each variable is the same. If all spatial lags have the same spatial weight matrix and thus distance decay parameter, $W(\alpha _0)=W(\alpha _1),\ldots ,W(\alpha _K)=W$ and $\alpha _0=\alpha _1=\ldots =\alpha _K$ , the last term in Equation (4) simplifies to $\beta _{k}\boldsymbol {I}_{N}+(\rho \beta _{k}+\gamma _{k})\sum _{g=0}^{\infty }\rho ^g\boldsymbol {W}^{g+1}(\alpha _0)$ . However, if all regressors have the same sum $\sum _{g=0}^{\infty }\boldsymbol {W}^{g+1}(\alpha _0)$ in common due to this restriction, the maximum distance at which the indirect effect $IE_k$ falls to zero is also the same for all of them, independent of $\rho $ , $\beta _k$ , and $\delta _k$ . Third, when drawing statistical inferences on the direct and indirect effects, the uncertainty in the distance decay parameters will also be accounted for. For this purpose, we use the delta method since it saves on computation time. Mathematical expressions of this method are provided in Supplementary Appendix A.7. In Section 5, we explore the implications of the last two properties, which are new to the applied spatial econometrics literature, in the context of our empirical application.
From these three properties, it follows that common but simpler models nested within the proposed SD model have limitations from an empirical viewpoint. It concerns the SAR model, which according to Whitten et al. (Reference Whitten, Williams and Wimpy2021) is used in 60% of 94 papers that appeared in political science journals as of May 2015, the spatial lag of X (SLX) model, used in 33% of these studies, and the SEM, which according to them is hardly used. These three models can be obtained by, respectively, imposing the restriction, $\gamma _k=0$ for all k, $\rho =0$ , and $\rho \beta _k \boldsymbol {W}(\alpha _0)+\gamma _k \boldsymbol {W}(\alpha _k) = \boldsymbol {0}_N$ for all k.Footnote 4 Consequently, the ratio between the indirect and direct effects in the SAR model is the same for every explanatory variable, which is unlikely from an empirical viewpoint. Indirect effects in the SEM are zero by construction. From an empirical viewpoint, this is also not helpful, especially if the purpose of a study is to determine these effects, which might also explain why this model is hardly used anymore. The indirect effects in the SLX model are flexible in that can take any empirical value. However, the second property shows that this flexibility, just as for the SD model, can be increased further by parameterizing the spatial weight matrices by different decay parameters. The choice between the SLX and SD models depends on whether the extension $\rho \neq 0$ can be motivated, either statistically as in LeSage and Pace (Reference LeSage and Pace2009) or theoretically. In the empirical analysis of this paper, we explain military expenditures by an SD model based on data from Yesilyurt and Elhorst (Reference Yesilyurt and Elhorst2017), who also provide an economic-political theoretical model in favor of this empirical model. Another way to motivate the extension $\rho \neq 0$ is to form a reasonable argument to expect global indirect effects. These occur if a change in one of the explanatory variables impact the dependent variable observed in another unit, even if these two units are not connected to each other according to the spatial weight matrix $(w_{ij}=0)$ . Sometimes a shock has a global impact even though it initially seemed local. Examples are the banking crisis that started in the United States, the euro crisis that started in Greece, the COVID-19 crisis that started in China, the war that Russia started with Ukraine, and so on. In these cases, a spatial lag in the dependent variable is likely because many more recipients were affected by this crisis without being adjacent to the originator of the shock.
4. Monte Carlo Simulations
We conduct Monte Carlo experiments to evaluate the performance of our proposed estimator for both parameterized distance decay matrices. Our data generating process contains two explanatory variables: $x_{1it} \sim N(2,5)$ and $x_{2it} \sim N(-1.5,3.5)$ . The coefficients of the first variable and its spatial lag are $\beta _1=-1$ and $\gamma _1=1.5$ , and of the second variable and its spatial lag are $\beta _2=0.2$ and $\gamma _2=-0.3$ . These two $\gamma $ parameters are greater in absolute values than their corresponding $\beta $ parameters to gain more insight into the sensitivity of parameterizing spatial weight matrices. The unobserved individual fixed effects and the error terms are both generated from a standard normal distribution $N(0,1)$ . To address the majority of empirical studies based on $N>>T$ , we set $N= \{200, 800 \}$ and $T=5$ . The number of iterations is 1,000. To construct distance matrices between the cross-sectional units, we use the coordinates of N data points evenly set in a rectangle of $10 \times 20$ for $N=200$ and $20 \times 40$ for $N=800$ . The decay parameters are $\alpha _0=2$ , $\alpha _1=1.5$ , and $\alpha _2=3$ . Finally, the parameter of the spatial lag of the dependent variable is set to $\rho =0.5$ .
We investigate both the parameter estimates and the estimates of the direct and indirect effects, because LeSage and Pace (Reference LeSage and Pace2018) demonstrate that past studies’ focus exclusively on parameter estimates may not provide useful information regarding the statistical properties of the direct and indirect effects obtained from these parameter estimates. To judge the performance of the estimators, we consider the average bias (Bias), the root-mean-square error (RMSE), the median bias (Mbias), and the median absolute value of the bias (Mabias). The latter two statistics are used as they are more robust to outliers.
Table 1 reports the results when using R. To save space, we focus on the results of the row normalized negative exponential distance decay matrix. Results for the inverse distance matrix are quite similar and reported in Supplementary Appendix A.9. We performed similar experiments varying $\rho $ , $\alpha _1$ , or $\alpha _2$ and when using scalar instead of row normalized distance decay matrices. In addition, we report the mean and standard deviations of the p-values of the parameter estimates and direct and indirect effects. If the underlying asymptotic distribution is true, then under the null, the p-values should follow a $U(0,1)$ distribution, and thus should have a mean p-value of 0.5 and a standard deviation of approximately 0.29. An overview of all parameter configurations and simulation results can be found in Supplementary Appendix A.9.
We compare the performance of three estimators. First, the proposed approach in which all spatial weight matrices are parameterized, labeled PWFE, where P stands for parameterized, W the spatial weight matrix, and FE fixed effects. The biases in the parameter estimates of this estimator are small and acceptable.Footnote 5 Generally, they are smaller for the coefficients of the variables than for the decay parameters, and smaller for $N=800$ than for $N=200$ . Similarly, increasing the sample size leads to a decrease in the RMSE and the median absolute value of the bias. While the biases in the coefficients and decay parameters are already small, the biases in the direct and indirect effects appear to be even smaller. The explanation is that an upward bias in one parameter is offset by a downward bias in another parameter in such a way as to produce direct and indirect effects that remain relatively stable (LeSage and Pace, Reference LeSage and Pace2014, p.227, 2018).
Second, we investigate what happens if a practitioner adopts one common spatial weight matrix with a decay parameter of 1 for all spatial lags. This may throw more light on how harmful this can be and is referred to as WFE (one common W and fixed effects). The results show that the performance of this estimator leads to higher values of bias, RMSE, Mbias, and Mabias in Table 1. Especially, the estimates on $\gamma _1$ , $\gamma _2$ , and $\rho $ are severely biased. When investigating the effect estimates of the WFE estimator, we see that the bias in the direct effect estimates is nonetheless small, whereas the bias in the indirect effects is large. We find biases that exceed 0.5 (in absolute value). This is a cause for concern and supports the approach advocated in this paper whereby different decay parameters of the spatial lags are estimated rather than set at one particular value in advance.
Third, we evaluate the performance of an estimator assuming that the true spatial weight matrices are known, which we refer to as TWFE (true spatial weight matrices W and fixed effects). Although this reflects an ideal, but hypothetical situation, it may throw more light on the relative performance of our approach and the associated costs in terms of estimation errors compared to this ideal estimator. The results in Table 1 show that the model parameters in this hypothetical situation can be estimated with greater accuracy than in the situation where the decay parameters also need to be estimated. This is also evident when comparing the median absolute value of the bias (Mabias) of the indirect effects of TWFE and PWFE. In general, the differences in the results of TWFE and PWFE are rather small compared to the differences and negative consequences of using wrong pre-specified weight matrices as with WFE. When taking the difference between the biases of the coefficients obtained by WFE and TWFE, representing the misspecification error due to erroneously assuming that $\alpha _0=\alpha _1=\cdots =\alpha _K$ minus the unavoidable estimation error, and comparing these biases relative to those obtained by PWFE, we get an indication of the proportionate improvement of applying PWFE. This improvement appears to be roughly 62% for the two $\gamma $ parameters. If they would be smaller rather than greater than the $\beta $ parameters as in our Monte Carlo experiment, this percentage is likely to go down but nevertheless remains substantial.
A somewhat different pattern across all experiments reported in Supplementary Appendix A.9 occurs when increasing the decay parameter to an extremely high value, that is, $\alpha _1=10$ (Cases VII and VIII). These cases resemble the property set out in Section 3.2 that $w_{ij}(\alpha _k)\stackrel {(\alpha _k|d_{ij})\rightarrow \infty }{\rightarrow }0$ . In this scenario, those elements for which $d_{ij}$ is also large converge to zero beyond a cutoff point and could just as well be set to zero without changing the structure of the spatial weight matrix. Bias and RMSE of $\alpha _1$ increase substantially, while bias and RMSE of the corresponding spatial lag parameter $\gamma _1$ remain largely the same. However, the increased bias and RMSE in the decay parameter $\alpha _1$ appears to have no effect on the bias and RMSE of the direct and indirect effects when using PWFE. This finding suggests that a practitioner who finds a high value for one of the decay parameters, perhaps one equal to the upper bound, could still value the direct and indirect effects of the corresponding explanatory variable and actually might adopt a first-order binary contiguity matrix equally well.
5. Empirical Analysis
The empirical analysis is based on Yesilyurt and Elhorst (Reference Yesilyurt and Elhorst2017) (YE hereafter), who investigate military spending measured as a ratio of GDP, also known as the defense burden, in 144 countries over the period of 1993 to 2007. Explanatory variables are GDP, population, international war, civil war, and political regime. The dependent variable and the first two explanatory variables are measured in logs, while the latter three are measured as scores. The scores on the variables international war and civil war range from 0 (no war) to 10 (greatest). The variable political regime ranges from $-$ 10 to +10, where $-10$ indicates strongly autocratic and +10 strongly democratic countries.
YE compare several spatial econometric models and spatial weight matrices. However, negative exponential and inverse distance decay matrices have not been investigated. Using Bayesian comparison methods developed by LeSage (Reference LeSage2015) and Juhl (Reference Juhl2020), they find that the SAR model in combination with a first-order binary contiguity matrix based on maritime borders produces the highest Bayesian posterior model probability. YE find little evidence in favor of indirect effects, also not when estimating the SD instead of the SAR model. This finding is typical of many empirical studies applying the SD model: often none or only some of the spatial lags in the explanatory variables and/or indirect effects appear to be significant. One explanation could be that these studies adopt one common spatial weight matrix for all spatial lags in the model. Our Monte Carlo simulations have shown that especially the indirect effects, which tend to be the main focus of empirical studies, are sensitive to a wrong choice of the spatial weight matrix. Therefore, we investigate whether using parameterized spatial weight matrices with different distance decay parameters for each spatial lag in the model alter the results. Three comments are in order before discussing the results. First, YE consider both static and dynamic versions of the SD model. This empirical application focuses on their static model results. Second, it focuses on geographical distances only.Footnote 6 Third, following YE, we consider all explanatory variables and their spatial lags as exogenous.
The first column of Table 2 reports the results using the preferred binary contiguity matrix from YE and the second column when using the proposed PWFE estimator applied to the parameterized row normalized negative exponential distance decay matrices in the SD model.Footnote 7
Notes: YE = one common binary contiguity matrix based on Yesilyurt and Elhorst (Reference Yesilyurt and Elhorst2017); PWFE = Parameterized W matrices and fixed effects (FE), EWFE = Estimated W matrices and FE, EWFEBC = Estimated W or binary contiguity (BC) matrices and FE; *, **, and *** significant at, respectively, 10%, 5%, and 1%.
The estimate of the SAR parameter $\rho $ is 0.241 and 0.251, respectively, and in both cases significant at the 1% level. This indicates that the SD model cannot be simplified to an SLX model. Three of the five explanatory variables in column 1 also appear to have coefficients and direct effects that are significant at the 1% level and one at the 10% level. However, only one of their corresponding spatial lags and indirect effects, the political regime, appears to be significant at the 5% level. We also test whether there is empirical evidence in favor of the SD model by conducting Wald tests on the null hypotheses $\gamma =0$ and $\rho \beta _k\boldsymbol {W}+\gamma _k\boldsymbol {W} = \boldsymbol {0}_N$ , yielding the SAR and SE model, respectively. The test statistics are $7.85$ (p-value $0.16$ ) and $7.39$ (p-value $0.19$ ). Consequently, no empirical evidence in favor of the SD model is found.
The LogL of the PWFE estimator in column 2 is $-1311.02$ , which is slightly better than its counterpart of $-1311.39$ for the binary contiguity matrix. With $\alpha _0=2.003$ (t-value 3.56), the decay parameter of the spatial lag in the dependent variable dominates the other decay parameters. However, the corresponding 95% confidence interval of (0.88, 3.12) does not contain the decay parameters obtained for the spatial lags in GDP, international war and political regime. The latter two even reach the upper bound set at 10. This indicates that one distance decay parameter for all spatial lags is unnecessarily restrictive.
To investigate the impact of ignoring the uncertainty in the spatial weight matrices if they are pre-specified, we run another regression using spatial weight matrices based on the estimated distance decay parameters from column 2. For the two variables whose distance decay parameters reached the upper bound, we also applied a cutoff point; elements smaller than 0.005 are set to 0. We label this model as EWFE and report the results in Column 3. Finally, column 4 reports the estimation results when the spatial lags for which an upper bound of 10 was found are replaced by the original binary contiguity matrix used by YE, labeled EWFEBC.
Although the parameter estimates and the direct and indirect effects hardly change as we move from column 2 to columns 3 and 4, their t-values do. While their magnitudes decrease slightly for the direct effects (on average by 1.3% and 10.5%), they increase substantially for the parameter estimates (on average by 31% and 46%) and the indirect effects (on average by 62% and 63%). This change in the t-values of the indirect effects is because the decay parameters in the last two columns are no longer part of the variance–covariance matrix used to determine their significance levels. It reflects the common approach applied by practitioners who pre-specify the spatial weight matrices and it shows that in particular the significance levels of the indirect effects can change when uncertainty in the choice of the spatial weight matrix is accounted for.
When we compare the direct effects of YE in column 1 with those in columns 2–4 (PWFE, EWFE, and EWFEBC), we see, in line with the Monte Carlo simulation experiments, that they are not sensitive for the choice of weight matrix. For GDP, we find a significant direct effect of about $-0.5$ in all four columns, which indicates that if the level of GDP in a country increases, the defense burden in that country also increases though less than proportionally. On the other hand, if we compare the indirect effects, we see different results. While the indirect effect of GDP is $-0.048$ and insignificant in column 1, it is $-1.4$ in the last three columns, implying that an increase in the level of GDP also has a diminishing effect on the defense burden of its neighbors. Furthermore, while insignificant in column 2 due to uncertainty in its corresponding distance decay parameter, it is significant in the last two columns when this uncertainty is left aside. The indirect effect of civil war is also (weakly) significant in the last two columns of Table 2 and fluctuates around $-0.1$ . This means that a country sharing a border with a country involved in a civil war does consider this as a potential threat. Finally, the indirect effect of the political regime is significant and takes values that range from $-0.012$ to $-0.035$ . A similar and significant effect of $-0.029$ was found when adopting one common binary contiguity matrix. This outcome demonstrates that the defense burden does not only increase with a higher level of autocracy in the own country, but also with higher levels of autocracy in neighboring countries.
In sum, three indirect effects appear to be significant when allowing the distance decay parameters of the spatial weight matrices to be different, while only one was significant in the model based on one common binary contiguity matrix. The magnitudes of the indirect effects also differ, in line with the Monte Carlo simulation results in Section 4. The same applies to the coefficients obtained for the spatially lagged regressors, as a result of which the hypotheses $H_0: \gamma =0$ (SAR) and $\rho \beta _k\boldsymbol {W}(\alpha _0)+\gamma _k\boldsymbol {W}(\alpha _k) = \boldsymbol {0}_N$ (SEM) now also need to be rejected. The p-values of the Wald test statistics are smaller than 0.05 in the last two columns.
Figure 1 breaks down the indirect effects of GDP, civil war, and political regime, which are significant in columns 3 and 4, by distance relative to their total indirect effects. The upper panel shows the results when adopting just one spatial weight matrix, the one used in column 1 of Table 2, and the lower panel when allowing the decay parameters to be different as in the last three columns. When using one common spatial weight matrix, the distance decay and spatial range of the indirect effect of all explanatory variables are similar, while they diverge if the distance decay parameters are allowed to be different. Whereas the spatial ranges of GDP, civil war, and political regime are all about 2,000 km in the upper panel, they become greater than 6,000 km for GDP and smaller than 1,500 km for civil war and political regime in the lower panel of Figure 1. This finding is in line with the property of the spatial range discussed in Section 3.3.
6. Conclusion and Discussion
This paper breaks the practice of employing one common pre-specified spatial weight matrix for all spatial lags in the SD model by parameterizing each spatial weight matrix with a different distance decay parameter. This extension provides information on the amount of uncertainty of the spatial weight matrices, and it relaxes the implicit restriction that the spatial range of the indirect effects is the same for all regressors.
Both our Monte Carlo simulation experiment and empirical application show that adopting one distance decay parameter for all spatial lags in the SD model, either explicitly or implicitly by choosing one of the popular spatial weight matrices listed in Section 2, biases the indirect effects and their significance levels, which are often the main focus of applied spatial econometric research. This is a cause for concern since it reflects the most widely used approach applied by practitioners. By contrast, it supports the approach advocated in this paper whereby the decay parameters of the spatial lags are estimated rather than set at one particular value for all spatial lags in advance.
In follow-up research, we aim at extending the model by considering a spatial lag in the error term and temporally dynamic relationships. The recent study by Cook, Hays, and Franzese (Reference Cook, Hays and Franzese2023) offers numerous starting points for this. In addition, it is recommended to investigate the possibility of integrating distance- and non-distance-based measures in the spatial weight matrix and the link with MGWR, which focuses on determining direct rather than indirect effects but applies the same principle of working with different spatial weight matrices for each explanatory variable.
Acknowledgements
We thank the Editor-in-Chief, four anonymous reviewers, and the participants of seminars at the Free University of Amsterdam, the University of Groningen, the Ohio State University, and the Renmin University, as well as the participants of the XV and XVI Spatial Econometric World Conference for valuable comments and suggestions. We also thank the replication analyst of our data and programming code for the many useful tips.
Funding Statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Competing interest
The authors have no competing interest to declare.
Data Availability Statement
Replication code for this article has been published in Political Analysis Harvard Dataverse at https://doi.org/10.7910/DVN/V9L9XC.
Supplementary Material
For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2024.16.