1. Introduction
1.1. Background
Modern studies of the behavior of glaciers, ice sheets and ice streams rely heavily on both observations and physically based models. Data acquired via remote sensing provide critical information on geometry and movement of ice over large sections of the Antarctic and Greenland. Although these datasets represent significant advancements in terms of spatial coverage and the variety of processes we can observe, the physical systems to be modeled are nevertheless imperfectly observed. Uncertainties associated with measurement errors are present and physical models are also subject to uncertainties. There is therefore a need for combining observations and models in a fashion that incorporates uncertainty and quantifies its impact on conclusions.
The goal of combining models and observations is hardly new in glaciology (e.g. Reference MacAyealMacAyeal, 1993; Reference Arthern and HindmarshArthern and Hindmarsh, 2003; Reference Gudmundsson and KnightGudmundsson, 2006) nor in the broad areas of the geosciences (e.g. data assimilation as practiced in numerical weather forecasting). We address the goal by formally modeling the uncertainties present, then using Bayes’ theorem to deduce information about all unknowns in the data. We focus on the development of statistical models with strong reliance on physical modeling, a strategy Reference BerlinerBerliner (2003) called ‘physical–statistical modeling’. This is different from traditional physical modeling, with perhaps data-based parameter estimates, and traditional statistical modeling which possibly relies upon qualitative physical reasoning.
An in-depth illustration based on data from the ‘Northeast Greenland Ice Stream’ (NEGIS) is presented. The physical models used are simplified approximations, but are familiar and provide an accessible arena for exemplifying the physical–statistical approach. Specifically, we develop statistically enhanced versions of a simple model of basal shear stress as the sum of driving stress and a corrector process representing other effects, such as lateral resistive stress. Inference for these unobservable stresses is a primary goal here. Bayesian inference is carried out using recently acquired datasets of ice thickness, surface elevation and surface velocity.
The introduction and uncertainty analysis of the corrector process is a crucial feature of our approach. Rather than introducing the correctors, it is natural to suggest that the effects they represent should be formally modeled. This would require more assumptions and/or more data, thereby introducing additional sources of uncertainty. In any case, we do not expect any implementable physical model to remove concerns regarding its applicability nor its uncertainty. Hence, the ability not only to combine models and observations, as discussed by Reference MacAyealMacAyeal (1989), Reference BlatterBlatter (1995) and Reference Joughin, Fahnestock, MacAyeal, Bamber and GogineniJoughin and others (2001), but also to analyze indicators of model validity and uncertainty is paramount.
The primary output of a Bayesian analysis is a posterior distribution, namely the joint probability distribution for unknown quantities conditional on the observed data. Even in our simple illustration, explicit presentation of their joint distribution is not feasible. Hence, a key aspect of Bayesian analysis in such settings is the ability to generate realizations or ensembles from the posterior distribution; the posterior is then studied through statistical summaries of such ensembles. A key by-product of such techniques is the ability to provide ensembles of initial conditions for use in a dynamic forecasting model. That is, one application of the modeling here is as a method for combining models and observations in support of full dynamical prediction or forecasting.
1.2. Glaciological motivations
That glaciers flow under the force of gravity means that important factors in determining velocities include the constraints of the constitutive relationship (Reference Whillans, Van der Veen and OerlemansWhillans, 1987) and ice thickness, in combination with resistive forces acting along the sides and at the base of the glacier. The lithostatic stress at some depth in the ice equals the weight of the ice above that depth, and horizontal gradients in this stress drive ice-stream flow. The flow of the ice stream is impeded by resistive stresses given by the difference between the full stress and the lithostatic component. A separation is therefore made between action or gravitational forces and reaction or resistive forces. After partitioning the full stress into lithostatic and resistive components, substitution into the horizontal balance equation and integration over the full ice thickness gives the following equation for driving stress along the flow (e.g. Reference Van der VeenVan der Veen, 1999):
where τ bx is basal shear stress and τ Rx represents resistive stresses arising from gradients in longitudinal stress and lateral drag. Defining the driving stress as
where s is ice-surface elevation, H is the ice thickness, ρ is the density of ice and g is the gravity constant, we have
Noting that no direct observations of these stresses are available, the main problem in this paper is estimation of the stresses. A further limitation on such estimation involves the degree of lubrication at the base of the ice. This feature is crucial to understanding glacial dynamics and predicting the glacier’s response to climatic controls. However, this feature is not observable directly.
Treating the flow parameter A as a constant, the surface velocity u(x) is approximated by
where u bx is sliding velocity and n is a flow parameter (e.g. Reference PatersonPaterson, 1994, p.251, equation 21). We fix n = 3 in this paper. Motivated by Equation (3), we model τ bx as
where ηx is a corrector process needed to explain the velocity data. To clarify, we substitute Equation (5) into Equation (4) and estimate all quantities in light of the data. If ηx is found to be negative in some region, the velocities in that region are smaller than those expected based on a simple balance between driving and basal shear stresses. Hence, such ηx indicates additional resistive stresses acting on the flow. Alternatively, positive ηx suggests that the glacier is moving faster than expected, perhaps indicating an area of high lubrication at the base.
The simple interpretations above are of course flawed to some degree; we are attempting to use a single quantity ηx to explain the combined effect of at least two phenomena. For example, there may well be regions of both high lubrication and high resistive stress due to lateral effects. These are confounded in the simple approach here. This also suggests a confounding in modeling both sliding velocity u bx and ηx as relatively unconstrained functions of x. We can break this confounding either by introducing very strong priors to constrain these processes or by constraining one of the processes while leaving the other relatively free and responsive to the observations.
In this paper, we compromise between these two notions. First, no matter how the modeling is carried out, we found that the data mandated the use of at least two different models. The estimated value of this change point between models is indicated in Figure 1. This estimate is close to an observable lineament. As will be indicated, creep flow appears to be the dominant mechanism upstream of this change point. Near the change point, sliding gains critical importance. We note that Reference Joughin, Fahnestock, MacAyeal, Bamber and GogineniJoughin and others (2001) suggest that there is a till plain downstream of where our analysis ends. We hope our results cast light on the nature of the bed between our change point and the start of the till plain.
Based on the previous notions, we specified sliding velocity as a constant upstream of the change point, and used a relatively uninformative prior for ηx (see section 3.4). Downstream of the change point, we used approximations to Weertman-type models for sliding velocity (e.g. Reference PatersonPaterson, 1994, ch. 7) of the form
for various combinations of p and q with τ bx given in Equation (5), i.e. without a corrector process. We found that the data strongly favored the selection of p = q = 1; we assume these values throughout the rest of this paper. Note that with these specifications, Equation (6) reduces to
We also tried using Equation (4) with no corrector process, no change point and u bx as given in Equation (6) for all x and various combinations of p and q. None of these models was competitive with the model presented here.
Another interesting issue arises in the inference of driving stress based on observations of s and H. Reliance on the slope of the upper ice surface in Equation (2) implies that results are very sensitive to small-scale variations in surface topography and to small-scale, perhaps unimportant, variations in ice thickness. Hence, driving stress is usually estimated based on averaging over horizontal distances of a few ice thicknesses to eliminate small-scale flow features not important to the flow (Reference Kamb and EchelmeyerKamb and Echelmeyer, 1986; Reference PatersonPaterson, 1994). Indeed, if averaging is not done, driving-stress estimates exhibit unreasonably large variations. Hence, we incorporate a statistical smoothing step in our Bayesian model.
1.3. Data
We assembled surface topography and ice-thickness observations for a portion of the NEGIS (Figs 2 and 3). The data were gathered as part of the Program for Arctic Regional Climate Assessment (PARCA). Surface topography and ice thickness were sampled every few hundred meters using equipment mounted on the Wallops Flight Facility P-3 aircraft. Surface velocity data were calculated (personal communication from I. Joughin, 2002) dataset. The three derived datasets are S (surface topography), B (basal topography), and U (surface velocities). Datasets S and B are shown in Figure 2; U in Figure 1.
1.4. Hierarchical Bayesian analysis
One rationale for the Bayesian approach is that it is a mathematically rigorous method for combining information in the presence of uncertainty. Two primary sources of information are available for inferring the unknown quantities of interest: (1) observations or data that convey some information regarding those unknowns, and (2) prior information based on scientific reasoning regarding the unknowns, including physical models as well as past experience and data. Both sources of information are subject to uncertainties. In the Bayesian approach, all such uncertainties are modeled probabilistically. The primary computational tool used is Bayes’ theorem, which is simply a result in probability theory relating various conditional distributions. However, the Bayesian view of statistical modeling and analysis involves more than the simple application of probability theory. It is a paradigm that involves the modeling of unknowns as random variables and using observations to update that modeling effort.
Many procedures used in the analysis of geophysical data are Bayesian or approximately Bayesian. For example, the Kalman filter is a Bayesian procedure (e.g. Reference Meinhold and SingpurwallaMeinhold and Singpurwalla, 1983). Other examples associated with data assimilation are discussed by Reference LorencLorenc (1986) and Reference Evensen and van LeeuwenEvensen and Van Leeuwen (2000). A general review of Bayesian analysis intended for geophysical audiences is given by Reference Wikle and BerlinerWikle and Berliner (2006); see also Reference EpsteinEpstein (1985) and Reference TarantolaTarantola (1987).
Nevertheless, the full power of the methodology has only recently been making progress in geophysics (e.g. Reference Berliner, Wikle and CressieBerliner and others, 2000; Reference Wikle, Berliner and MilliffWikle and others, 2003; Reference Tebaldi, Smith, Nychka and MearnsTebaldi and others, 2005). Advances in computation, through variants of Markov chain Monte Carlo (MCMC) algorithms, now enable hierarchical Bayesian modeling that is capable of dealing with the complexities in models and data that arise in geophysics. A skeleton of hierarchical reasoning begins with consideration of three basic collections of variables to be modeled: data (our observations) labeled y; process (those physical, state variables of interest (e.g. velocities, stresses, etc.)) labeled x; and parameters labeled θ, including unknown physical constants and parameters introduced in the statistical components of the model. We use the following notation for probability distributions: (1) the distribution of a random variable, say X, is written as [x]; and (2) the conditional distribution of Y given X = x is written as [y|x]. A review of the basic strategy and associated computations is given by Reference Gelman, Carlin, Stern and RubinGelman and others (2004).
Hierarchical thinking suggests a model with three primary components (e.g. Reference Berliner, Hanson and SilverBerliner, 1996):
-
1. Data model: [y|x, θ];
-
2. Prior process model: [x|θ]; and
-
3. Prior on parameters: [θ].
Although the construction of these components is often not easy, once the modeling process is complete, Bayes’theorem yields the posterior distribution [x, θ|y] given by
The denominator [y] is the marginal distribution of the data. It can be viewed simply as that constant which normalizes the numerator ensuring that the posterior is indeed a probability distribution, depending upon the observed data but not the unknowns.
In the Bayesian treatment of parameters, even unknown constants are treated as if they are random. This is important since the definitions of some model parameters may be based on approximations. For example, flow parameters are not really unknown physical constants but rather functions of other unmodeled process variables (e.g. temperature or pressure) which introduce uncertainty. Hence, while the Bayesian approach may appear complicated compared to others, its advantage is that it quantifies and manages uncertainties through to final conclusions.
2. Physical–Statistical Modeling of the ‘Northeast Greenland ICE Stream’
Recall our three datasets: S (surface observations), B (basal observations), and U (surface velocity). The corresponding processes of interest are true surface topography s(x), true basal topography b(x) and true surface velocities u(x), where x indexes a transect down the middle of the ice stream. There are no observations on the stresses acting on the ice, although physical relations allow inference from modeled stresses.
2.1. Translating physics into statistical models
In response to the issues introduced in section 1.2 regarding averaging of surface and thickness data for the estimation of driving stress, our first step is to model the basal topography b and the surface s as smoothed versions plus local noise, denoted generically by N. We therefore assume
where ψb (x) and ψs (x) are smooth versions of the base and surface, respectively.
Applying these definitions in Equation (2), we model smoothed driving stress as
where the smoothed ice thickness is . In all computations, we set ρ = 911 kg m−3 and g = 9.81 m s − 2.
We next define the partially smoothed basal shear stress in the region upstream of the currently undetermined change point, by applying Equation (5) for velocities taken in the negative x direction as
We say is only partially smoothed because we do not smooth the corrector process. As clarified later, η(x) is modeled as an unknown stochastic process. This enables further interpretation of the posterior behavior of η(x) as an indication of regions where successful or unsuccessful smoothing has occurred.
We then incorporate the velocity model Equation (4) (recall that we set n = 3) and the approximate sliding velocity model Equation (7) into a statistical model:
where k, A 1, ub , A 2 and c are treated as unknown, random parameters and Nu (x) is an error process.
Before proceeding with the specifics, we emphasize that these formulae motivate probability models that are updated in light of the observations. More directly, we do not produce single estimates of smoothed processes ψs (x) and ψb (x) based on the data and substitute those estimates into the stress and velocity models. Rather, we produce an ensemble of these quantities simulated from their posterior distributions and propagate them forward into our models. This produces an ensemble of stresses and velocities, which in turn are updated based on the velocity observations.
2.2. Underlying probability theory
Our main task is the development of the following probability distributions:
where θ denotes the collection of all model parameters. Our goal is to obtain the posterior distribution [ψb , ψs , u, η, θ |B, S, U], which can then be used to obtain the posterior distribution of stresses.
Our main assumption regarding the data model is that the three primary datasets are conditionally independent given the true processes they represent and given the model parameters. Notationally, the data model takes the form
where notation such as θ B is used to indicate those parameters (subsets of θ ) explicitly appearing in the indicated models. A possible objection to Equation (14) arises since the basal data B are actually computed as the difference between surface and thickness observations. The assumed conditional independence therefore may not hold. We checked our posterior results for indications of degrees of departure from this assumption and found none that would affect our results.
Our modeling of the processes begins with a probabilistic equality (i.e. this is not an assumption, but a fact):
Similarly, we have
We make the following two assumptions.
-
1. In formulating the second term on the right side of Equation (15), we assume that the base b and the surface s are independent, conditional on their smoothed versions and the model parameters. We then obtain
(17)where again notation such as θ b is used to indicate appropriate subsets of θ . It is critical to note here that we are not assuming that the base and the surface are independent. Our modeling of both the base and surface is conditional upon smooth processes ψb and ψs . Our assumption is that the small-scale departures from those large-scale processes are independent.
-
2. Consider the conditional model for u, namely the first term on the right side of Equation (16). We assume that the velocity profile depends on the base and surface only through their respective smoothed versions ψb and ψs , i.e.
(18)The specifications of prior distributions for the parameters are given in Appendix A.
3. Components of the Hierarchical Model
3.1. Basal model
First, the process model is described. We chose to use wavelets to model the smoothed basal topography ψb (x) because of their flexibility in representing highly variable processes and because we can easily control their smoothness.
Wavelets work best for equally spaced data where the number of data points is an integer power of 2. Hence, we partition the domain of the data into 211 = 2048 bins of equal length (189.5 m). Let denote the 2048-dimensional vector constructed by averaging b within each bin. Note that is not observed.
We used multi-resolution Daubechies wavelets (e.g. Bruce and Gao, Reference Bruce and Gao1996; Vidakovic, Reference Vidakovic1999). Two collections of wavelets were used. The first set is thought of as the smooth signal, while the second represents the detail signal. We consider a smooth signal of four wavelets. To this component we added a detail signal containing 28 more wavelets. All 32 wavelets are mutually orthogonal.
After converting to a discrete wavelet form, we obtain a linear model for :
where W is the 2048 × 32 matrix of discretized wavelet basis functions, C is the 32-dimensional vector of wavelet coefficients and Σ(ϕ 1, ϕ 2) is the correlation matrix of an autoregressive process of order 2 (AR(2)) with variance . Notationally, ψb = WC and . The selection of an AR(2) process to account for spatial dependence among the model errors (i.e. local variations in basal topography) was based on preliminary data analysis and practicality; our Bayesian computations require repeated inversion of a 2048 × 2048 matrix involving the inverse of Σ(ϕ 1, ϕ 2), which is straightforward for an AR(2) process.
We performed analyses for four resolutions, corresponding to 8, 16, 32 and 64 coefficients in Equation (19). We found that the model with k = 32 coefficients provided the best results in terms of maintaining fidelity to the thickness data (i.e. not over-smoothing), yet providing good inferences for velocity and therefore stress.
Turning to the data model, we define the basal data vector of length 2048 with the ith element given by the simple arithmetic average of those basal observations lying in bin i. Let ni (i = 1, …, 2048) be the number of data points lying in bin i. In our case, none of the ni were equal to zero; most were either 1 or 2.
Our data model (i.e. [B |b, θ B ] in Equation (14)) is
where represents measurement-error variance and is a 2048 × 2048 matrix with diagonal elements equal to and off-diagonal elements equal to 0 (i.e. the elements of are assumed to be conditionally independent). This suggestion arises from a familiar result from elementary statistics. The sample mean of n independent observations with common mean and common variance σ 2 has variance σ 2 /n.
Specifications of priors on parameters are given in Appendix A.
3.2. Surface model
Our modeling strategy for [s|ψs , θ s ] in Equation (17) separates the large-scale and small-scale behaviors of the surface. Recalling Equation (9), we suppose
where the large-scale surface is given by a function ψs , assumed known up to a low-dimensional set of parameters, and Ns is a zero-mean spatial stochastic process described in Appendix A. To model ψs , we rely on an analysis given in Paterson (Reference Paterson1994, p.243, equation 8) and assume the basic model for the surface:
where θ s = (μ, K, L) are treated as unknown parameters. Although we set n = 3 here, we could also model n as an unknown.
We use only the large-scale surface Equation (22) to compute ice thickness, the surface derivative and hence the stress in Equation (10). Nevertheless, the presence of Ns is important in determining the data model in Equation (14). Under the strategy which uses Equation (22) to obtain the stress, we need a data model [S |ψs , θ s , θ S ], where θ S includes measurement-error variances and parameters of the distribution of Ns in Equation (21); our specification is given by Equation (A3) in Appendix A.
3.3. Velocity model
For the data model [U |u, θ U ], we assume that, conditional on the true velocities, the elements of U have independent Gaussian distributions with means equal to the true velocities at the corresponding locations and common variances . That is, at each observation location x, we assume that
where m(x) is a zero-mean, Gaussian measurement error with variance . Substituting for u(x) as prescribed in Equations (12) and (13), we have
We next combine the model errors Nu and traditional measurement errors m into a single error
assuming that eU (x) are independent, zero-mean, Gaussian random variables with common variance . (Note that these steps mean that there is no need for further use of the symbol u.) This simplification can be relaxed, but would require additional modeling of the process Nu .
To simplify the presentation of the statistical model, we write our models in vector notation. Recalling Equations (9–11) and (19), we define the vector of smoothed ice thicknesses
where ψ s is the 2048-dimensional vector of smoothed surface elevations. Similarly, we define the vector of smoothed values of driving stress τ d as
where * indicates the Hadamard product, i.e. we compute the vector of element-wise products of the coordinates of the smoothed thickness and the derivative of the smoothed surface.
Next, we let η denote the vector of stress corrections at the locations of velocity observation and set
From Equation (12) we model u, the vector of true velocities at the observation locations, as a linear function of the corresponding coordinates of smoothed thickness multiplied by the third power of coordinates of τ b.
In preliminary data analyses, we noted that at least two models (one for small x and another for large x) are needed. Let x = c be an unknown change point and consider different linear functions above and below the change point. Finally, the model for the velocity data vector U is
where subscripts 1 and 2 indicate the varying dimensions of the vectors (a vector with all elements equal to 1) depending on the value of the change point c and e U which are measurement and model errors, respectively (recall Equation (26)).
3.4. Stress corrector process
Since the corrector process arises from unmodeled and unobserved effects (e.g. longitudinal and lateral drag, basal lubrication, local under- or over-smoothing, etc.), we assume a relatively uninformed prior for η . Specifically, our prior [ η ] indicates that each element has a uniform distribution on the interval (−100 kPa, 100 kPa) and that all elements are mutually independent.
3.5. Bayesian calculations
Although we can write down Bayes’ theorem for the posterior distribution of all unknowns conditional on the observations, the result is typically not computable in closed form. We use a Monte Carlo approach that produces an ensemble of realizations from the target posterior distribution.
The method relies on the emerging technology of MCMC. The idea of MCMC is to simulate a Markov chain that has been carefully designed so that its stationary distribution coincides with the target posterior distribution. It follows that, after a burn-in or transience period, the generated realizations of the chain comprise a simulated sample from the posterior. Data analysis (often known as ‘output analysis’) is performed on this sample to produce the desired inferences.
In our case, direct use of MCMC is possible but challenging, primarily due to the non-linearities present in Equations (3) and (12). Hence, we combine MCMC with the technique of importance-sampling Monte Carlo (ISMC). Consider a setting in which direct simulation from a target distribution is difficult or inefficient. In ISMC, one generates an ensemble from another more manageable distribution. The theory of ISMC provides formulae for the calculation of weights that are used to re-weight the ensemble, permitting inferences relative to the original target. General introductions to both MCMC and ISMC can be found in Robert and Casella (Reference Robert and Casella1999). An illustration of these technologies in a geophysical problem is given in Berliner and others (Reference Berliner, Milliff and Wikle2003).
An outline of the calculations used here is as follows. We first run separate, independent MCMC algorithms (simple versions known as Gibbs samplers) for the basal model and the surface model. These runs produce ensembles from the posterior distributions and [ψs , θ s , θ S| S]. They are then used in conjunction with the velocity model [u|η, ψb , ψs , θ b , θ s , θ u ] (recall Equation (18)) to simulate velocities conditional on and S. (As described in Appendix A, we actually use a portion of S.)
To incorporate the velocity data U, we re-weight all of these samples using ISMC results, yielding the ensemble used to summarize the full posterior distribution. Technical details regarding the actual implementation are presented in Appendix B.
4. Posterior Results
Figure 3 presents 50 realizations of the smoothed base ψb = WC (recall Equation (19)) from the posterior distribution conditional on the basal elevation data. The corresponding posterior mean basal topography, estimated using an ensemble of size 2000 and the original data, is also depicted. We see that the posterior distribution of the smoothed base is relatively faithful to the basal data, i.e. the wavelets are able to capture much of the variation of the basal data with 32 coefficients.
Figure 4a presents 50 realizations and the posterior mean, conditional on the basal and surface-elevation datasets, of the smoothed driving stresses ψ d (recall Equation (28)). Figure 4b presents 50 realizations and the posterior mean of the correctors η, conditional on all three datasets (recall that we only modeled the corrector process to the right of the change point). In both cases, the posterior means were estimated using ensembles of size 2000. We note that except for the upper end of the data extent, the correctors are relatively small compared to the values of the driving stress. However, we also note that there is spatial structure indicated in the posterior for the correctors, although none was imposed by the prior.
For example, we see a run of positive correctors (perhaps due to lubrication) over the approximate range 100–120 km and a run of negative values (perhaps due to additional resistive stresses beyond those explained by the model) from ∼260 km to 330 km. The large negative correctors at the top end of the range are probably the combined result of a poor surface model in that region and a failure of the approximate physics so far upstream.
Figure 1 presents 50 realizations from the posterior distribution of velocity and the posterior mean velocity, estimated using ensembles of size 2000; the original velocity data are also shown. As explained in Appendix A, our estimate of the change point is x = 127.3 km; this point is also indicated on the plot. Interestingly, this location corresponds roughly to that of an apparent lineament.
As mentioned in section 3.5 and developed in Appendix B, ensemble members shown in Figures 1 and 4b are to be weighted according to importance-sampling theory. Although not indicated in these figures, the correct weights were used in producing posterior mean estimates.
The estimates (i.e. posterior means) for the other parameters in the model of the large-scale surface elevation (Equation (22)) are , and . Posterior results for other key model parameters are presented in Tables 1–3.
5. Conclusions
Tables 1 and 2 list various prior and posterior means and posterior standard deviations of selected model parameters. Table 3 provides 95% posterior credible intervals (Bayesian analogues of confidence intervals) for the approximate Weertman coefficient, sliding velocity and flow parameters of the velocity model.
We note that while our prior estimate of sliding velocity upstream of the change point was fairly small (35 m a−1), the posterior value is substantially larger. Next, we find that the posterior results for the flow parameters A on either side of the change point are plausible, although the results upstream from the change point may suggest warmer conditions than expected. Our treatment of A is of course linked to the specification of ρ = 911 kg m − 3. It is very easy to determine the results of other choices since it enters the model as a multiplicative constant.
We proposed smoothed surface and basal elevation models and showed that their use, in combination with simple physical models coupled with a corrector process, resulted in good predictions of velocity over much of the study region. There are regions to the right of the change point where the fitted corrector process indicates interesting interpretations regarding unmodeled dynamic controls, including lubrication at the base and additional resistive stresses. To the left of the change point, sliding modeled via a Weertman-like approximation with p = q = 1 dominates our model. Indeed, even if we removed the stress component of the model in the region, we would obtain very similar results.
Acknowledgements
This research was supported by the US National Science Foundation, Office of Polar Programs and Probability and Statistics Program, under grant No. 0229292. We are grateful to G.H. Gudmundsson and R. Greve for useful suggestions and to two other reviewers for their remarks.
Appendix A Details of the Hierarchical Model
Basal model: parameter priors
Our prior for the measurement-error variance is that it has an inverse gamma distribution with mean 50 and standard deviation 15. We also assumed an inverse gamma prior distribution for with mean 2000 and standard deviation 200.
While we could treat ϕ 1 and ϕ 2 as unknown parameters, they are only of minor interest here and small variations in their values do not affect our results. Hence, we considered ordinary least-squares fits to the data and used the resulting residuals to estimate ϕ 1 and ϕ 2, resulting in
Next, we describe our prior for the wavelet coefficients. Regarding the four coefficients C s for the smooth signal,
where μ is the vector of conventional, ordinary least-squares estimates of Haar-wavelet coefficients. Regarding the coefficients for the detail signal, we assume their prior means to be zero and that they are independent.
The model is completed by forming priors for the variances of the wavelet coefficients. We assumed these variances to be independent and to have inverse gamma distributions. For , we used prior mean 2002 and standard deviation 40. For the variance of the detail signals, we assumed prior mean 10002 and prior standard deviation 100.
Surface modeling
Recall that we assume s(x) = ψs (x) + Ns (x) (Equation (21)), where the large-scale parameterized surface is given by ψs as defined in Equation (22). Assume Ns is a zero-mean, Gaussian spatial stochastic process with constant variance and stationary correlation function γ(x, x′) = γ(|x − x′|).
To produce a data model readily usable to isolate the large-scale surface, we first assumed that the individual observations were conditionally independent with means equal to the true surface values. Next, we formally eliminated the small-scale process Ns from the model via a probabilistic method. The result then leads to observations whose means are equal to the large-scale model but whose values are correlated. Informally, they inherit the correlation structure of Ns .
Rather than dealing with the overhead of modeling that correlation structure, we make a simple assumption. We assume that for all locations separated by at least 150 m, the correlation is approximately zero. (We arrived at this value by fitting the surface by conventional least-squares analysis and then inspected the correlation function of the residuals of the fitted surface. Residuals at locations separated by more than 150 m were essentially uncorrelated.)
We then took a subsample of the surface data such that all observations were at least 150 m apart, which resulted in a subsample of 600 observations. We should have very little loss in efficiency in basing our analysis of the surface on such a subsample: 600 is a very large sample size for estimating three parameters (μ, K and L). Further, we tried a total of six such subsamples and obtained essentially the same results. Of course, the analysis hinges on the approximation that the decorrelation length scale of small-scale variations is 150 m.
Data model
We assume that, conditional on the true surface and the measurement-error variance ,
where s is the vector of true surface values at the observation locations. Standard results from probability theory imply that if we integrate out Ns in Equation (21), the resulting data model is
where ψ s is the vector of the large-scale parametric surface model evaluated at the observation locations and Γ is the matrix of correlations of Ns at pairs of those locations, as implied by the correlation function γ.
Process model
A Bayesian or a non-Bayesian analysis (e.g. generalized least squares) based on Equation (A3) requires modeling and estimation of Γ (in parallel with the estimation of Σ(ϕ 1, ϕ 2) in Equation (19)). Assuming a decorrelation length of 150 m, a subsample S l such that all observations are at least 150 m apart has distribution
where and ψ s ,l is the vector of large-scale surface values at the subsample locations.
Parameter model
Under the assumptions and simplifications outlined above, we need only consider a prior for the parameters μ, K, L and . Probability theory greatly simplifies our task. Having reduced the original problem to that of only 4 parameters and 600 conditionally independent observations, we can approximate the desired posterior. For all but the most highly concentrated priors, the posterior is approximately Gaussian, with means given by the conventional least-squares estimates and covariance matrix given by the estimated inverse information matrix for our setting (Reference BergerBerger, 1985, section 4.9).
Velocity modeling
Data model
We assume
where u is the vector of true velocities at the observation locations and is the measurement-error variance.
Process model
Based on the corresponding smoothed versions of thickness Equation (27) and stress Equation (28), we consider the change-point models
This leads to the data models
as given in Equation (30).
Parameter model
We next describe our prior distribution for the unknown parameters k, ub , A 1, A 2, and c.
Our prior for k, ub , A 1, A 2 and is the following so-called conjugate multivariate normal-inverse gamma distribution. The prior is described hierarchically as
Where
and
where all off-diagonal elements of W are zero and
Under these specifications, the prior means of the regression coefficients are given in Equation (A11), the prior variances of the regression coefficients are
and the prior mean and variance of are both equal to 9 (Reference Goldstein, Engelhardt, Kamb and FrolichGoldstein and others, 1983).
We remark that the prior above may seem cumbersome and strange in that we model the regression-model parameters conditioned on the error variance . However, this strategy leads to a computable posterior distribution for the regression parameters and . Specifically,
where the first distribution is the usual multivariate normal posterior holding fixed, and is an inverse gamma distribution with updated parameters. Hence, it was a popular prior model before the advent of MCMC. A complete discussion and explanation is given by DeGroot (Reference DeGroot1970, ch. 9); see also Berger (Reference Berger1985, p.288). We use the prior in this case because it readily adapts to our use of ISMC.
To closely mimic an uninformative analysis regarding the change point, we performed a least-squares piecewise linear regression with an unknown change point. We found very strong agreement on the value of c and, hence, treated c as known and equal to 127.3 km.
Appendix B MCMC–ISMC Approach
Our computational approach combines MCMC and ISMC. The following derivations are motivated by a need to make the ISMC steps efficient. To achieve efficiency, we seek to generate ensembles of unknown quantities that are reasonably similar to ensembles generated directly from the posterior distribution. To do so, we apply Bayes’ theorem to portions of the hierarchical model in an attempt to allow the observational data to strongly influence the simulations. The details of these derivations involve interplay of our modeling assumptions and repeated use of two basic facts from probability theory. First, for the random quantities A and B, we have [A|B] [B] = [B|A] [A] (Bayes’ theorem). Second, such expressions hold conditionally. That is, if C is also some random quantity, we have [A|B, C] [B|C] = [B|A, C] [A|C].
We may write the full posterior distribution as
Note that we have incorporated the basal and surface data-sets into a partial posterior distribution for the basal and surface processes and parameters. This model is amenable to simple MCMC. Further, as indicated in Figure 2, simulated smoothed basal topographies from this partial posterior are clearly responsive to the observations. Next, we rewrite the first two terms on the right side of Equation (B1) as
and then note that
In summary, we have
Our basic algorithm is as follows.
-
1. Via MCMC, produce an ensemble of size M from the partial posterior distribution (recall Equation (B1)). The result includes an ensemble
of smoothed basal and surface elevations.
-
2. For each , generate η m from .
-
3. Similarly, generate from
Finally, define ISMC weights:
(B5)
Coupling these weights with the ensemble members yields a weighted ensemble from the full posterior distribution.
We remark that the selection of the prior described in Appendix A for θ u and enables the exact calculation of the partial posterior distribution needed in step 3 (Equation (A14)). However, there are two difficulties with implementation in our example. First, [U| ψ b, ψ s] is not easily obtained, disabling our ability to compute the weights in Equation (B5). To deal with this problem, we apply the general reasoning suggested in Reference Berliner and WikleBerliner and Wikle (2006) to claim that the earlier steps in the analysis, particularly step 3, have sufficiently accounted for the information in the velocity data regarding the unknowns. Therefore, a good approximate analysis arises if we ignore these weights.
Similarly, step 2 is problematic in that the required distribution is not obtainable in closed form. Recall that our prior for the elements of η is that they are independent, uniform random variables on the interval (−100,100). Hence,
for vectors η lying in the 2880-dimensional cube with sides given by (−100,100). We can obtain the exact form of [U| η , ψ b , ψ s ]. Specifically, it is a multivariate t density (Reference BergerBerger, 1985, p. 561), i.e.
where μ , W and (a, b) are defined in Equations (A11–A13) and Z=Z( η , ψ b , ψ s ) is a 2880 × 4 matrix obtained by rewriting Equation (30) as
Although we have Equation (B7), it is intractable viewed as a probability density function for η . To deal with this problem, we appeal to importance sampling. Specifically, rather than sampling from [ η |U, ψ b , ψ s ], we sample from another distribution, [ η ]IS, chosen to mimic it. Specifically, we used a multivariate t distribution, with mean specified by inverting the regression model and degrees of freedom parameter equal to 10. By inverting the regression model, we mean the following. For each observation of velocity U(x) at a location x after the change point, and each ensemble member, our regression model (Equation (30)) is
where we have suppressed notation indicating dependence on the ensemble member. We then simply approximate eU by 0, yielding the approximation
The right side of this expression is then used to specify the mean of [ η ]IS. This may appear to be cheating, but it is perfectly legal. One may construct importance-sampling distributions in any required fashion, including using the data, as long as ISMC weights are used. These weights are readily approximated by