Chapter 8 presents random forests for regression, which – at least in some situations – may outperform the least-squares-based regression methods. The chapter discusses bagging in the context of regression applications of random forests, the algorithm for splitting nodes in regression trees, and the variable importance metrics applicable to regression.
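As a rough illustration of these ideas, and not the chapter's own code, the sketch below fits a random forest regressor to synthetic data with scikit-learn; the dataset and every parameter value are assumptions made for the example, and the impurity-based importances stand in for the variable importance metrics the chapter discusses.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression problem with 8 predictors.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is grown on a bootstrap sample of the training data (bagging).
forest = RandomForestRegressor(n_estimators=300, bootstrap=True, random_state=0)
forest.fit(X_train, y_train)

print("Test R^2:", forest.score(X_test, y_test))
# Impurity-based importance, one value per predictor.
print("Variable importances:", forest.feature_importances_)
```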
Researchers theorize that many real-world networks exhibit community structure where within-community edges are more likely than between-community edges. While numerous methods exist to cluster nodes into different communities, less work has addressed this question: given some network, does it exhibit statistically meaningful community structure? We answer this question in a principled manner by framing it as a statistical hypothesis test in terms of a general and model-agnostic community structure parameter. Leveraging this parameter, we propose a simple and interpretable test statistic used to formulate two separate hypothesis testing frameworks. The first is an asymptotic test against a baseline value of the parameter while the second tests against a baseline model using bootstrap-based thresholds. We prove theoretical properties of these tests and demonstrate how the proposed method yields rich insights into real-world datasets.
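The abstract does not state the test statistic or the baseline model, so the following sketch substitutes Newman–Girvan modularity and an Erdős–Rényi baseline of matched density purely to illustrate the bootstrap-threshold idea; it is not the authors' method, and the karate club graph is used only as a convenient example network.

```python
import numpy as np
import networkx as nx
from networkx.algorithms import community

def modularity_stat(G):
    # Modularity of a greedily detected partition (a stand-in test statistic).
    parts = community.greedy_modularity_communities(G)
    return community.modularity(G, parts)

rng = np.random.default_rng(0)
G_obs = nx.karate_club_graph()          # example network
t_obs = modularity_stat(G_obs)

# Baseline model: Erdos-Renyi graph with the same number of nodes and density.
n, p = G_obs.number_of_nodes(), nx.density(G_obs)
boot = [modularity_stat(nx.gnp_random_graph(n, p, seed=int(s)))
        for s in rng.integers(0, 2**31 - 1, size=200)]

p_value = np.mean([b >= t_obs for b in boot])
print(f"observed statistic {t_obs:.3f}, bootstrap p-value {p_value:.3f}")
```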
Chapter 11 covers inferences involving the mean when σ is not known, one- and two-sample designs, and includes the following specific topics, among others: t-distribution, degrees of freedom, t-test assumptions, one-sample t-test, two-sample t-test for independent groups, two-sample t-test for related groups, paired sample t-tests, effect size, the bootstrap, and power analysis.
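A minimal sketch of two of the listed topics, the one-sample t-test and a percentile bootstrap confidence interval, on synthetic data (numpy and scipy assumed; all values illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=102.0, scale=15.0, size=30)    # illustrative sample

# One-sample t-test of H0: population mean equals 100.
t_stat, p_value = stats.ttest_1samp(sample, popmean=100.0)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Percentile bootstrap confidence interval for the mean.
boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
              for _ in range(5_000)]
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for the mean: ({lo:.1f}, {hi:.1f})")
```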
Dense networks with weighted connections often exhibit a community-like structure, where although most nodes are connected to each other, different patterns of edge weights may emerge depending on each node’s community membership. We propose a new framework for generating and estimating dense weighted networks with potentially different connectivity patterns across different communities. The proposed model relies on a particular class of functions which map individual node characteristics to the edges connecting those nodes, allowing for flexibility while requiring a small number of parameters relative to the number of edges. By leveraging these estimation techniques, we also develop a bootstrap methodology for generating new networks on the same set of vertices, which may be useful when multiple data sets cannot be collected. Performance of these methods is analyzed in theory, in simulations, and on real data.
The bootComb R package allows researchers to derive confidence intervals with correct target coverage for arbitrary combinations of arbitrary numbers of independently estimated parameters. Previous versions (<1.1.0) of bootComb used independent bootstrap sampling and required that the parameters themselves be independent, an assumption that is unrealistic in some real-world applications.
Findings
Using Gaussian copulas to define the dependence between parameters, the bootComb package has been extended to allow for dependent parameters.
Implications
The updated bootComb package can now handle cases of dependent parameters, with users specifying a correlation matrix defining the dependence structure. While in practice it may be difficult to know the exact dependence structure between parameters, bootComb allows running sensitivity analyses to assess the impact of parameter dependence on the resulting confidence interval for the combined parameter.
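The following is a schematic Python analogue, not the bootComb R code itself, of the Gaussian-copula idea described above: correlated normal draws are turned into correlated uniforms and then pushed through each parameter's marginal sampling distribution before combining. The correlation value, the beta marginals, and the combination function are all hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])                    # assumed dependence structure

# Gaussian copula: correlated normals -> correlated uniforms.
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=corr, size=100_000)
u = stats.norm.cdf(z)

# Hypothetical marginal sampling distributions of the two estimated parameters.
prevalence = stats.beta(a=40, b=160).ppf(u[:, 0])
sensitivity = stats.beta(a=90, b=10).ppf(u[:, 1])

combined = prevalence / sensitivity              # an arbitrary combination of interest
lo, hi = np.percentile(combined, [2.5, 97.5])
print(f"95% interval for the combined parameter: ({lo:.3f}, {hi:.3f})")
```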
Predator-prey interactions are one of the central themes in ecology because of their importance as a key mechanism in structuring biotic communities. In predator-prey systems, the behaviours of pursuit and avoidance shape ecosystem dynamics as much as the trophic interactions themselves. We aimed to analyse the spatiotemporal co-occurrences between prey and predators in a community of medium- and large-sized mammals in southern Mexico. We predicted that prey would avoid sites where a predator had previously passed; in contrast, we expected search behaviour by predators and synchronization in activity patterns among them. We found that prey did not occur, in either time or space, where predators had passed, suggesting avoidance behaviour. Contrary to our expectations, we did not find significant search behaviour by predators towards prey. Synchronization in daily temporal overlap between predators was higher (Δ = 0.77–0.82) than between predators and their prey (Δ = 0.43–0.81). The results suggest that prey perceive the risk of predation and display avoidance behaviour both spatially and temporally, which is consistent with fear theory. This study provides a complementary approach to understanding the behavioural mechanisms between predators and prey through camera-trapping or similar spatiotemporal co-occurrence data.
A goal in statistics is to make inferences about a population. Typically, such inferences are in the form of estimates of population parameters; for instance, the mean and variance of a normal distribution. Estimates of population parameters are imperfect because they are based on a finite amount of data. The uncertainty in a parameter estimate may be quantified using a confidence interval. A confidence interval is a random interval that encloses the population value with a specified probability. Confidence intervals are related to hypothesis tests about population parameters. Specifically, for a given hypothesis about the value of a parameter, a test at the 5% significance level would reject that value if it fell outside the 95% confidence interval. This chapter explains how to construct a confidence interval for a difference in means, a ratio of variances, and a correlation coefficient. These confidence intervals assume the samples come from normal distributions. If the distribution is not Gaussian, or the quantity being inferred is complicated, then bootstrap methods offer an important alternative approach, as discussed at the end of this chapter.
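A minimal sketch of the bootstrap alternative mentioned at the end of the chapter, here a percentile bootstrap confidence interval for a difference in means on synthetic, deliberately non-Gaussian samples:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.0, scale=3.0, size=40)     # clearly non-Gaussian samples
y = rng.gamma(shape=2.0, scale=3.5, size=35)

# Resample each group with replacement and recompute the difference in means.
boot_diffs = [rng.choice(x, x.size, replace=True).mean()
              - rng.choice(y, y.size, replace=True).mean()
              for _ in range(10_000)]
lo, hi = np.percentile(boot_diffs, [2.5, 97.5])
print(f"95% bootstrap CI for the difference in means: ({lo:.2f}, {hi:.2f})")
```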
In the context of the Solvency II directive, operating an internal risk model is one possible way to assess risk and determine the solvency capital requirement of an insurance company in the European Union. A Monte Carlo procedure is customarily used to generate the model output. To be compliant with the directive, validation of the internal risk model is conducted on the basis of the model output. For this purpose, we suggest a new test for checking whether there is a significant change in the modeled solvency capital requirement. Asymptotic properties of the test statistic are investigated and a bootstrap approximation is justified. A simulation study investigates the performance of the test in the finite-sample case and confirms the theoretical results. The internal risk model and the application of the test are illustrated in a simplified example. The method applies more generally to inference for a broad class of law-invariant, coherent risk measures on the basis of a paired sample.
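An illustrative sketch only, not the paper's test statistic: a paired bootstrap for the change in a law-invariant, coherent risk measure, with expected shortfall standing in for the modeled solvency capital requirement and lognormal Monte Carlo output assumed.

```python
import numpy as np

def expected_shortfall(losses, alpha=0.99):
    # Average loss beyond the alpha-quantile (a law-invariant, coherent risk measure).
    q = np.quantile(losses, alpha)
    return losses[losses >= q].mean()

rng = np.random.default_rng(4)
size = 20_000
old_run = rng.lognormal(mean=0.0, sigma=1.0, size=size)        # paired Monte Carlo outputs
new_run = 1.05 * old_run + rng.normal(scale=0.05, size=size)   # of the internal model

d_obs = expected_shortfall(new_run) - expected_shortfall(old_run)
d_boot = []
for _ in range(500):
    idx = rng.integers(0, size, size)                          # resample pairs jointly
    d_boot.append(expected_shortfall(new_run[idx]) - expected_shortfall(old_run[idx]))

lo, hi = np.percentile(d_boot, [2.5, 97.5])
print(f"observed change {d_obs:.3f}, 95% bootstrap interval ({lo:.3f}, {hi:.3f})")
print("significant change" if not (lo <= 0.0 <= hi) else "no significant change")
```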
The convex hull of a sample is used to approximate the support of the underlying distribution. This approximation has many practical implications in real life. To approximate the distribution of the functionals of convex hulls, asymptotic theory plays a crucial role. Unfortunately, most of the asymptotic results are computationally intractable. To address this computational intractability, we consider consistent bootstrapping schemes for certain cases. Let $S_n=\{X_i\}_{i=1}^{n}$ be a sequence of independent and identically distributed random points uniformly distributed on an unknown convex set in $\mathbb{R}^{d}$ ($d\ge 2$). We suggest a bootstrapping scheme that relies on resampling uniformly from the convex hull of $S_n$. Moreover, the resampling asymptotic consistency of certain functionals of convex hulls is derived under this bootstrapping scheme. In particular, we apply our bootstrapping technique to the Hausdorff distance between the actual convex set and its estimator. For $d=2$, we investigate the asymptotic consistency of the suggested bootstrapping scheme for the area of the symmetric difference and the perimeter difference between the actual convex set and its estimate. In all cases the consistency allows us to rely on the suggested resampling scheme to study the actual distributions, which are not computationally tractable.
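A rough two-dimensional illustration of the resampling scheme, assuming a uniform sample on the unit disk: new points are drawn uniformly from the convex hull of $S_n$ by triangulating the hull and picking triangles with probability proportional to area. The Hausdorff distance between vertex sets is used below only as a simple proxy for the set Hausdorff distance studied in the paper.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay
from scipy.spatial.distance import directed_hausdorff

def uniform_from_hull(points, size, rng):
    # Sample uniformly from the convex hull via area-weighted triangle sampling.
    hull_pts = points[ConvexHull(points).vertices]
    corners = hull_pts[Delaunay(hull_pts).simplices]       # (m, 3, 2) triangle corners
    e1, e2 = corners[:, 1] - corners[:, 0], corners[:, 2] - corners[:, 0]
    areas = 0.5 * np.abs(e1[:, 0] * e2[:, 1] - e1[:, 1] * e2[:, 0])
    pick = rng.choice(len(areas), size=size, p=areas / areas.sum())
    r1 = np.sqrt(rng.random(size))[:, None]
    r2 = rng.random(size)[:, None]
    a, b, c = corners[pick, 0], corners[pick, 1], corners[pick, 2]
    return (1 - r1) * a + r1 * (1 - r2) * b + r1 * r2 * c   # uniform in each triangle

rng = np.random.default_rng(5)
theta, radius = rng.uniform(0, 2 * np.pi, 400), np.sqrt(rng.random(400))
sample = np.c_[radius * np.cos(theta), radius * np.sin(theta)]  # uniform on the unit disk

hull_vertices = sample[ConvexHull(sample).vertices]
boot = []
for _ in range(200):
    resample = uniform_from_hull(sample, len(sample), rng)
    res_vertices = resample[ConvexHull(resample).vertices]
    boot.append(max(directed_hausdorff(hull_vertices, res_vertices)[0],
                    directed_hausdorff(res_vertices, hull_vertices)[0]))
print("bootstrap mean of the (vertex-set) Hausdorff distance:", np.mean(boot))
```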
We consider compound Poisson claims reserving models applied to the paid claims and to the number of payments run-off triangles. We extend the standard Poisson-gamma assumption to account for over-dispersion in the payment counts and to account for various mean and variance structures in the individual payments. Two generalized linear models are applied consecutively to predict the unpaid claims. A bootstrap is used to estimate the mean squared error of prediction and to simulate the predictive distribution of the unpaid claims. We show that the extended compound Poisson models make reasonable predictions of the unpaid claims.
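A deliberately simplified sketch of the final simulation step, with all fitted values assumed rather than produced by the two GLMs: the predictive distribution of unpaid claims for one future cell is simulated from an over-dispersed count model and gamma individual payments. This is not the paper's bootstrap, only an illustration of compound Poisson-type predictive simulation.

```python
import numpy as np

rng = np.random.default_rng(6)
lam, phi = 120.0, 1.8            # assumed fitted mean payment count and over-dispersion
sev_mean, sev_cv = 2_500.0, 0.9  # assumed mean and coefficient of variation of payments

shape = 1.0 / sev_cv**2          # gamma shape implied by the coefficient of variation
totals = []
for _ in range(10_000):
    # Negative binomial as a simple over-dispersed count with mean lam, variance phi * lam.
    n = rng.negative_binomial(n=lam / (phi - 1.0), p=1.0 / phi)
    totals.append(rng.gamma(shape, sev_mean / shape, size=n).sum())

print("predicted unpaid claims:", np.mean(totals),
      "| 95th percentile:", np.percentile(totals, 95))
```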
Nonparametric techniques are frequently applied in recreation demand studies when researchers are concerned that parametric utility specifications impart bias upon welfare estimates. A goal of this paper is to extend previous work on nonparametric bounds for welfare measures to allow for measurement error in travel costs. Haab and McConnell (2002) state that issues in travel time valuation continue to be topical in the recreational demand literature. This paper introduces a bootstrap-augmented nonparametric procedure to precisely bound welfare when price data contain measurement error. The technique can be extended, and becomes more convenient relative to other approaches, when more than two site visits are made by a single recreationist. These techniques are demonstrated in a Monte Carlo experiment.
One of the most critical problems in property/casualty insurance is to determine an appropriate reserve for incurred but unpaid losses. These provisions generally comprise most of the liabilities of a non-life insurance company. The global provisions are often determined under an assumption of independence between the lines of business. Recently, Shi and Frees (2011) proposed modeling dependence between lines of business with a copula that captures the dependence between two cells of two different runoff triangles. In this paper, we generalize this model in two steps. First, using an idea proposed by Barnett and Zehnwirth (1998), we suppose a dependence between all the observations that belong to the same calendar year (CY) within each line of business. Second, we suppose another dependence structure that links the CYs of different lines of business. This is done using hierarchical Archimedean copulas. We show that the model provides more flexibility than existing models, and offers a better, more realistic and more intuitive interpretation of the dependence between the lines of business. For illustration, the model is applied to a dataset from a major US property-casualty insurer, where a bootstrap method is proposed to estimate the distribution of the reserve.
A logistic growth equation with time and location varying parameters was used to model corn response to applied nitrogen. A nonlinear dummy-variable regression model provided a parsimonious representation of site and time effects on parameter values. The model was used to test for the equality of the mean marginal product of nitrogen fertilizer between locations on the coastal plain of North Carolina. Monte Carlo simulation and bootstrap simulation were used to construct finite sample covariance estimates. Results support rejection of the hypothesis that mean marginal products are equal when nitrogen is applied at 168 kg/ac. A comparison of bootstrapped errors and asymptotic errors suggests that results based on asymptotic theory are fairly reliable in this case.
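A sketch of the two core computational steps on synthetic data, with all parameter values assumed: fitting a logistic growth response of yield to applied nitrogen with scipy, then bootstrapping the fit to obtain a finite-sample covariance estimate of the parameters.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(n, ymax, k, n0):
    # Logistic growth: yield approaches ymax as applied nitrogen n increases.
    return ymax / (1.0 + np.exp(-k * (n - n0)))

rng = np.random.default_rng(7)
nitrogen = np.linspace(0.0, 250.0, 40)
yield_obs = logistic(nitrogen, 9.5, 0.03, 90.0) + rng.normal(scale=0.4, size=40)

theta_hat, _ = curve_fit(logistic, nitrogen, yield_obs, p0=[9.0, 0.02, 100.0], maxfev=10_000)

boot = []
for _ in range(500):
    idx = rng.integers(0, nitrogen.size, nitrogen.size)     # resample (N, yield) pairs
    est, _ = curve_fit(logistic, nitrogen[idx], yield_obs[idx], p0=theta_hat, maxfev=10_000)
    boot.append(est)
print("bootstrap covariance of (ymax, k, n0):\n", np.cov(np.asarray(boot).T))
```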
Earth has been habitable through most of its history, but the anthropogenically mediated greenhouse effect, if sufficiently strong, can threaten Earth's long-standing equability. This paper's main aim is to determine the strength of the anthropogenic greenhouse effect (the climate sensitivity) from observational data and basic physics alone, without recourse to the parameterisations of earth-system models and their inevitable uncertainties. A key finding is that the sensitivity can be constrained by harmonising historical records of land and ocean temperatures with observations of potential climate-change drivers in a non-steady-state, energy-balance equation via a least-squares optimisation. The global temperature increase for a CO2 doubling is found to lie (95% confidence limits) between 3.0°C and 6.3°C, with a best estimate of +4°C. Under a business-as-usual scenario, which assumes that there will be no significant change in people's attitudes and priorities, Earth's surface temperature is forecast to rise by 7.9°C over the land, and by 3.6°C over the oceans, by the year 2100. Global temperature rise has slowed in the last decade, leading some to question climate predictions of substantial 21st-century warming. A formal runs test, however, shows that the recent slowdown is part of the normal behaviour of the climate system.
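A schematic version of a Wald–Wolfowitz runs test on the signs of residuals about a fitted trend, using synthetic temperatures and the usual normal approximation to the run-count distribution; it illustrates the kind of test referred to above, not the paper's exact calculation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
years = np.arange(1960, 2014)
temp = 0.015 * (years - 1960) + rng.normal(scale=0.1, size=years.size)  # synthetic record

# Signs of residuals about a linear trend, and the number of runs of equal signs.
residuals = temp - np.polyval(np.polyfit(years, temp, 1), years)
signs = residuals > 0
runs = 1 + np.count_nonzero(signs[1:] != signs[:-1])
n_pos, n_neg = signs.sum(), (~signs).sum()

# Normal approximation to the distribution of the run count under randomness.
mu = 2.0 * n_pos * n_neg / (n_pos + n_neg) + 1.0
var = (mu - 1.0) * (mu - 2.0) / (n_pos + n_neg - 1.0)
z = (runs - mu) / np.sqrt(var)
p = 2.0 * stats.norm.sf(abs(z))
print(f"runs = {runs}, z = {z:.2f}, two-sided p = {p:.3f}")
```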
U.S. ethanol production capacity increased more than threefold between 2002 and 2008. We study the effect of this growth on corn acreage. Connecting annual changes in county-level corn acreage to changes in ethanol plant capacities, we find a positive effect on planted corn. The building of a typical plant is estimated to increase corn in the county by over 500 acres and to increase acreage in surrounding counties up to almost 300 miles away. All ethanol plants are estimated to increase corn production by less than their annual requirements.
Subsidized crop insurance may encourage conversion of native grassland to cropland. The Sodsaver provision of the 2008 farm bill could deny crop insurance on converted land in the Prairie Pothole states for 5 years. Supplemental Revenue Assistance payments, which are linked to crop insurance purchases, could also be withheld. Using representative farms, we estimate that Sodsaver would reduce expected crop revenue by up to 8% and expected net return by up to 20%, while increasing the standard deviation of revenue by as much as 6% of market revenue. Analysis based on elasticities from the literature suggests that Sodsaver would reduce grassland conversion by 9% or less.
A logistic regression procedure was used to assess the impact of socioeconomic attributes on the best management practices (BMPs) adoption decision by Louisiana dairy farmers relative to cost-share and fixed incentive payments. Analysis of the steps in the BMP adoption decision process indicated that visits between producers and the U.S. Department of Agriculture–Natural Resources Conservation Service significantly increase the likelihood of BMP adoption. Producer willingness-to-pay results indicate that marginal increases in dairy BMP adoption, and the associated improvement in environmental quality, require increased technical and financial assistance.
The n-back task is a widely used neuroimaging paradigm for studying the neural basis of working memory (WM); however, its neuropsychometric properties have received little empirical investigation. The present study merged clinical neuropsychology and functional magnetic resonance imaging (fMRI) to explore the construct validity of the letter variant of the n-back task (LNB) and to further identify the task-evoked networks involved in WM. Construct validity of the LNB task was investigated using a bootstrapping approach to correlate LNB task performance across clinically validated neuropsychological measures of WM to establish convergent validity, as well as measures of related but distinct cognitive constructs (i.e., attention and short-term memory) to establish discriminant validity. Independent component analysis (ICA) identified brain networks active during the LNB task in 34 healthy control participants, and general linear modeling determined task-relatedness of these networks. Bootstrap correlation analyses revealed moderate to high correlations among measures expected to converge with LNB (|ρ|≥0.37) and weak correlations among measures expected to discriminate (|ρ|≤0.29), controlling for age and education. ICA identified 35 independent networks, 17 of which demonstrated engagement significantly related to task condition, controlling for reaction time variability. Of these, the bilateral frontoparietal networks, bilateral dorsolateral prefrontal cortices, bilateral superior parietal lobules including precuneus, and frontoinsular network were preferentially recruited by the 2-back condition compared to 0-back control condition, indicating WM involvement. These results support the use of the LNB as a measure of WM and confirm its use in probing the network-level neural correlates of WM processing. (JINS, 2014, 20, 1–15)
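A schematic version of the bootstrap correlation step on synthetic scores, with the adjustment for age and education omitted and all variable names hypothetical: Spearman's rho between an LNB score and a convergent working-memory measure is recomputed over resampled participants to give a percentile interval.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 34                                              # matches the sample size above
lnb_score = rng.normal(size=n)
wm_span = 0.6 * lnb_score + rng.normal(scale=0.8, size=n)   # hypothetical convergent measure

rho_obs, _ = stats.spearmanr(lnb_score, wm_span)
boot = []
for _ in range(5_000):
    idx = rng.integers(0, n, n)                     # resample participants with replacement
    rho, _ = stats.spearmanr(lnb_score[idx], wm_span[idx])
    boot.append(rho)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"Spearman rho = {rho_obs:.2f}, 95% bootstrap interval ({lo:.2f}, {hi:.2f})")
```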