Chapter 8 presents random forests for regression, which – at least in some situations – may outperform the least-squares-based regression methods. The chapter discusses bagging in the context of regression applications of random forests, the algorithm for splitting nodes in regression trees, and the variable importance metrics applicable to regression.
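As a rough illustration of these ideas, and not the chapter's own code, the sketch below fits a random forest regressor to synthetic data with scikit-learn; the dataset and every parameter value are assumptions made for the example, and the impurity-based importances stand in for the variable importance metrics the chapter discusses.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression problem with 8 predictors.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is grown on a bootstrap sample of the training data (bagging).
forest = RandomForestRegressor(n_estimators=300, bootstrap=True, random_state=0)
forest.fit(X_train, y_train)

print("Test R^2:", forest.score(X_test, y_test))
# Impurity-based importance, one value per predictor.
print("Variable importances:", forest.feature_importances_)
```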
Researchers theorize that many real-world networks exhibit community structure where within-community edges are more likely than between-community edges. While numerous methods exist to cluster nodes into different communities, less work has addressed this question: given some network, does it exhibit statistically meaningful community structure? We answer this question in a principled manner by framing it as a statistical hypothesis test in terms of a general and model-agnostic community structure parameter. Leveraging this parameter, we propose a simple and interpretable test statistic used to formulate two separate hypothesis testing frameworks. The first is an asymptotic test against a baseline value of the parameter while the second tests against a baseline model using bootstrap-based thresholds. We prove theoretical properties of these tests and demonstrate how the proposed method yields rich insights into real-world datasets.
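The abstract does not state the test statistic or the baseline model, so the following sketch substitutes Newman–Girvan modularity and an Erdős–Rényi baseline of matched density purely to illustrate the bootstrap-threshold idea; it is not the authors' method, and the karate club graph is used only as a convenient example network.

```python
import numpy as np
import networkx as nx
from networkx.algorithms import community

def modularity_stat(G):
    # Modularity of a greedily detected partition (a stand-in test statistic).
    parts = community.greedy_modularity_communities(G)
    return community.modularity(G, parts)

rng = np.random.default_rng(0)
G_obs = nx.karate_club_graph()          # example network
t_obs = modularity_stat(G_obs)

# Baseline model: Erdos-Renyi graph with the same number of nodes and density.
n, p = G_obs.number_of_nodes(), nx.density(G_obs)
boot = [modularity_stat(nx.gnp_random_graph(n, p, seed=int(s)))
        for s in rng.integers(0, 2**31 - 1, size=200)]

p_value = np.mean([b >= t_obs for b in boot])
print(f"observed statistic {t_obs:.3f}, bootstrap p-value {p_value:.3f}")
```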
Chapter 11 covers inferences involving the mean when σ is not known, one- and two-sample designs, and includes the following specific topics, among others: t-distribution, degrees of freedom, t-test assumptions, one-sample t-test, two-sample t-test for independent groups, two-sample t-test for related groups, paired sample t-tests, effect size, the bootstrap, and power analysis.
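A minimal sketch of two of the listed topics, the one-sample t-test and a percentile bootstrap confidence interval, on synthetic data (numpy and scipy assumed; all values illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=102.0, scale=15.0, size=30)    # illustrative sample

# One-sample t-test of H0: population mean equals 100.
t_stat, p_value = stats.ttest_1samp(sample, popmean=100.0)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Percentile bootstrap confidence interval for the mean.
boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
              for _ in range(5_000)]
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for the mean: ({lo:.1f}, {hi:.1f})")
```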
Dense networks with weighted connections often exhibit a community-like structure, where although most nodes are connected to each other, different patterns of edge weights may emerge depending on each node’s community membership. We propose a new framework for generating and estimating dense weighted networks with potentially different connectivity patterns across different communities. The proposed model relies on a particular class of functions which map individual node characteristics to the edges connecting those nodes, allowing for flexibility while requiring a small number of parameters relative to the number of edges. By leveraging these estimation techniques, we also develop a bootstrap methodology for generating new networks on the same set of vertices, which may be useful when multiple data sets cannot be collected. Performance of these methods is analyzed in theory, in simulations, and on real data.
The bootComb R package allows researchers to derive confidence intervals with correct target coverage for arbitrary combinations of arbitrary numbers of independently estimated parameters. Previous versions (<1.1.0) of bootComb used independent bootstrap sampling and required that the parameters themselves be independent, an assumption that is unrealistic in some real-world applications.
Findings
Using Gaussian copulas to define the dependence between parameters, the bootComb package has been extended to allow for dependent parameters.
Implications
The updated bootComb package can now handle cases of dependent parameters, with users specifying a correlation matrix defining the dependence structure. While in practice it may be difficult to know the exact dependence structure between parameters, bootComb allows running sensitivity analyses to assess the impact of parameter dependence on the resulting confidence interval for the combined parameter.
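The following is a schematic Python analogue, not the bootComb R code itself, of the Gaussian-copula idea described above: correlated normal draws are turned into correlated uniforms and then pushed through each parameter's marginal sampling distribution before combining. The correlation value, the beta marginals, and the combination function are all hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])                    # assumed dependence structure

# Gaussian copula: correlated normals -> correlated uniforms.
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=corr, size=100_000)
u = stats.norm.cdf(z)

# Hypothetical marginal sampling distributions of the two estimated parameters.
prevalence = stats.beta(a=40, b=160).ppf(u[:, 0])
sensitivity = stats.beta(a=90, b=10).ppf(u[:, 1])

combined = prevalence / sensitivity              # an arbitrary combination of interest
lo, hi = np.percentile(combined, [2.5, 97.5])
print(f"95% interval for the combined parameter: ({lo:.3f}, {hi:.3f})")
```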
Predator-prey interactions are one of the central themes in ecology because of their importance as a key mechanism in structuring biotic communities. In predator-prey systems, the behaviours of pursuit and avoidance shape ecosystem dynamics as much as the trophic interactions themselves. We aimed to analyse the spatiotemporal co-occurrences between prey and predators in a community of medium- and large-sized mammals in southern Mexico. We predicted that prey would avoid sites where a predator had previously passed; in contrast, we expected search behaviour by predators and synchronization in activity patterns among them. We found that prey did not occur, in either time or space, where predators had passed, suggesting avoidance behaviour. Contrary to our expectations, we did not find significant search behaviour by predators towards prey. Synchronization in daily temporal overlap between predators was higher (Δ = 0.77–0.82) than between predators and their prey (Δ = 0.43–0.81). The results suggest that prey perceive the risk of predation and display avoidance behaviour both spatially and temporally, which is consistent with fear theory. This study provides a complementary approach to understanding the behavioural mechanisms between predators and prey through camera-trapping or similar spatiotemporal co-occurrence data.
A goal in statistics is to make inferences about a population. Typically, such inferences are in the form of estimates of population parameters; for instance, the mean and variance of a normal distribution. Estimates of population parameters are imperfect because they are based on a finite amount of data. The uncertainty in a parameter estimate may be quantified using a confidence interval. A confidence interval is a random interval that encloses the population value with a specified probability. Confidence intervals are related to hypothesis tests about population parameters. Specifically, for a given hypothesis about the value of a parameter, a test at the 5% significance level would reject that value if it fell outside the 95% confidence interval. This chapter explains how to construct a confidence interval for a difference in means, a ratio of variances, and a correlation coefficient. These confidence intervals assume the samples come from normal distributions. If the distribution is not Gaussian, or the quantity being inferred is complicated, then bootstrap methods offer an important alternative approach, as discussed at the end of this chapter.
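A minimal sketch of the bootstrap alternative mentioned at the end of the chapter, here a percentile bootstrap confidence interval for a difference in means on synthetic, deliberately non-Gaussian samples:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.0, scale=3.0, size=40)     # clearly non-Gaussian samples
y = rng.gamma(shape=2.0, scale=3.5, size=35)

# Resample each group with replacement and recompute the difference in means.
boot_diffs = [rng.choice(x, x.size, replace=True).mean()
              - rng.choice(y, y.size, replace=True).mean()
              for _ in range(10_000)]
lo, hi = np.percentile(boot_diffs, [2.5, 97.5])
print(f"95% bootstrap CI for the difference in means: ({lo:.2f}, {hi:.2f})")
```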
In the context of the Solvency II directive, operating an internal risk model is one possible way to assess risk and determine the solvency capital requirement of an insurance company in the European Union. A Monte Carlo procedure is customarily used to generate the model output. To be compliant with the directive, validation of the internal risk model is conducted on the basis of the model output. For this purpose, we suggest a new test for checking whether there is a significant change in the modeled solvency capital requirement. Asymptotic properties of the test statistic are investigated and a bootstrap approximation is justified. A simulation study investigates the performance of the test in the finite-sample case and confirms the theoretical results. The internal risk model and the application of the test are illustrated in a simplified example. The method applies more generally to inference for a broad class of law-invariant, coherent risk measures on the basis of a paired sample.
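An illustrative sketch only, not the paper's test statistic: a paired bootstrap for the change in a law-invariant, coherent risk measure, with expected shortfall standing in for the modeled solvency capital requirement and lognormal Monte Carlo output assumed.

```python
import numpy as np

def expected_shortfall(losses, alpha=0.99):
    # Average loss beyond the alpha-quantile (a law-invariant, coherent risk measure).
    q = np.quantile(losses, alpha)
    return losses[losses >= q].mean()

rng = np.random.default_rng(4)
size = 20_000
old_run = rng.lognormal(mean=0.0, sigma=1.0, size=size)        # paired Monte Carlo outputs
new_run = 1.05 * old_run + rng.normal(scale=0.05, size=size)   # of the internal model

d_obs = expected_shortfall(new_run) - expected_shortfall(old_run)
d_boot = []
for _ in range(500):
    idx = rng.integers(0, size, size)                          # resample pairs jointly
    d_boot.append(expected_shortfall(new_run[idx]) - expected_shortfall(old_run[idx]))

lo, hi = np.percentile(d_boot, [2.5, 97.5])
print(f"observed change {d_obs:.3f}, 95% bootstrap interval ({lo:.3f}, {hi:.3f})")
print("significant change" if not (lo <= 0.0 <= hi) else "no significant change")
```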
The convex hull of a sample is used to approximate the support of the underlying distribution. This approximation has many practical implications in real life. To approximate the distribution of the functionals of convex hulls, asymptotic theory plays a crucial role. Unfortunately, most of the asymptotic results are computationally intractable. To address this computational intractability, we consider consistent bootstrapping schemes for certain cases. Let $S_n=\{X_i\}_{i=1}^{n}$ be a sequence of independent and identically distributed random points uniformly distributed on an unknown convex set in $\mathbb{R}^{d}$ ($d\ge 2$). We suggest a bootstrapping scheme that relies on resampling uniformly from the convex hull of $S_n$. Moreover, the resampling asymptotic consistency of certain functionals of convex hulls is derived under this bootstrapping scheme. In particular, we apply our bootstrapping technique to the Hausdorff distance between the actual convex set and its estimator. For $d=2$, we investigate the asymptotic consistency of the suggested bootstrapping scheme for the area of the symmetric difference and the perimeter difference between the actual convex set and its estimate. In all cases the consistency allows us to rely on the suggested resampling scheme to study the actual distributions, which are not computationally tractable.
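A rough two-dimensional illustration of the resampling scheme, assuming a uniform sample on the unit disk: new points are drawn uniformly from the convex hull of $S_n$ by triangulating the hull and picking triangles with probability proportional to area. The Hausdorff distance between vertex sets is used below only as a simple proxy for the set Hausdorff distance studied in the paper.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay
from scipy.spatial.distance import directed_hausdorff

def uniform_from_hull(points, size, rng):
    # Sample uniformly from the convex hull via area-weighted triangle sampling.
    hull_pts = points[ConvexHull(points).vertices]
    corners = hull_pts[Delaunay(hull_pts).simplices]       # (m, 3, 2) triangle corners
    e1, e2 = corners[:, 1] - corners[:, 0], corners[:, 2] - corners[:, 0]
    areas = 0.5 * np.abs(e1[:, 0] * e2[:, 1] - e1[:, 1] * e2[:, 0])
    pick = rng.choice(len(areas), size=size, p=areas / areas.sum())
    r1 = np.sqrt(rng.random(size))[:, None]
    r2 = rng.random(size)[:, None]
    a, b, c = corners[pick, 0], corners[pick, 1], corners[pick, 2]
    return (1 - r1) * a + r1 * (1 - r2) * b + r1 * r2 * c   # uniform in each triangle

rng = np.random.default_rng(5)
theta, radius = rng.uniform(0, 2 * np.pi, 400), np.sqrt(rng.random(400))
sample = np.c_[radius * np.cos(theta), radius * np.sin(theta)]  # uniform on the unit disk

hull_vertices = sample[ConvexHull(sample).vertices]
boot = []
for _ in range(200):
    resample = uniform_from_hull(sample, len(sample), rng)
    res_vertices = resample[ConvexHull(resample).vertices]
    boot.append(max(directed_hausdorff(hull_vertices, res_vertices)[0],
                    directed_hausdorff(res_vertices, hull_vertices)[0]))
print("bootstrap mean of the (vertex-set) Hausdorff distance:", np.mean(boot))
```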
We consider compound Poisson claims reserving models applied to the paid claims and to the number of payments run-off triangles. We extend the standard Poisson-gamma assumption to account for over-dispersion in the payment counts and to account for various mean and variance structures in the individual payments. Two generalized linear models are applied consecutively to predict the unpaid claims. A bootstrap is used to estimate the mean squared error of prediction and to simulate the predictive distribution of the unpaid claims. We show that the extended compound Poisson models make reasonable predictions of the unpaid claims.
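A deliberately simplified sketch of the final simulation step, with all fitted values assumed rather than produced by the two GLMs: the predictive distribution of unpaid claims for one future cell is simulated from an over-dispersed count model and gamma individual payments. This is not the paper's bootstrap, only an illustration of compound Poisson-type predictive simulation.

```python
import numpy as np

rng = np.random.default_rng(6)
lam, phi = 120.0, 1.8            # assumed fitted mean payment count and over-dispersion
sev_mean, sev_cv = 2_500.0, 0.9  # assumed mean and coefficient of variation of payments

shape = 1.0 / sev_cv**2          # gamma shape implied by the coefficient of variation
totals = []
for _ in range(10_000):
    # Negative binomial as a simple over-dispersed count with mean lam, variance phi * lam.
    n = rng.negative_binomial(n=lam / (phi - 1.0), p=1.0 / phi)
    totals.append(rng.gamma(shape, sev_mean / shape, size=n).sum())

print("predicted unpaid claims:", np.mean(totals),
      "| 95th percentile:", np.percentile(totals, 95))
```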
Nonparametric techniques are frequently applied in recreation demand studies when researchers are concerned that parametric utility specifications impart bias upon welfare estimates. A goal of this paper is to extend previous work on nonparametric bounds for welfare measures to allow for measurement error in travel costs. Haab and McConnell (2002) state that issues in travel time valuation continue to be topical in the recreational demand literature. This paper introduces a bootstrap-augmented nonparametric procedure to precisely bound welfare when price data contain measurement error. The technique can be extended, and becomes more convenient relative to other approaches, when more than two site visits are made by a single recreationist. These techniques are demonstrated in a Monte Carlo experiment.
One of the most critical problems in property/casualty insurance is to determine an appropriate reserve for incurred but unpaid losses. These provisions generally comprise most of the liabilities of a non-life insurance company. The global provisions are often determined under an assumption of independence between the lines of business. Recently, Shi and Frees (2011) proposed modeling dependence between lines of business with a copula that captures the dependence between two cells of two different runoff triangles. In this paper, we generalize this model in two steps. First, using an idea proposed by Barnett and Zehnwirth (1998), we suppose a dependence between all the observations that belong to the same calendar year (CY) within each line of business. Second, we suppose another dependence structure that links the CYs of different lines of business. This is done using hierarchical Archimedean copulas. We show that the model provides more flexibility than existing models, and offers a better, more realistic and more intuitive interpretation of the dependence between the lines of business. For illustration, the model is applied to a dataset from a major US property-casualty insurer, where a bootstrap method is proposed to estimate the distribution of the reserve.
A logistic growth equation with time and location varying parameters was used to model corn response to applied nitrogen. A nonlinear dummy-variable regression model provided a parsimonious representation of site and time effects on parameter values. The model was used to test for the equality of the mean marginal product of nitrogen fertilizer between locations on the coastal plain of North Carolina. Monte Carlo simulation and bootstrap simulation were used to construct finite sample covariance estimates. Results support rejection of the hypothesis that mean marginal products are equal when nitrogen is applied at 168 kg/ac. A comparison of bootstrapped errors and asymptotic errors suggests that results based on asymptotic theory are fairly reliable in this case.
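A sketch of the two core computational steps on synthetic data, with all parameter values assumed: fitting a logistic growth response of yield to applied nitrogen with scipy, then bootstrapping the fit to obtain a finite-sample covariance estimate of the parameters.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(n, ymax, k, n0):
    # Logistic growth: yield approaches ymax as applied nitrogen n increases.
    return ymax / (1.0 + np.exp(-k * (n - n0)))

rng = np.random.default_rng(7)
nitrogen = np.linspace(0.0, 250.0, 40)
yield_obs = logistic(nitrogen, 9.5, 0.03, 90.0) + rng.normal(scale=0.4, size=40)

theta_hat, _ = curve_fit(logistic, nitrogen, yield_obs, p0=[9.0, 0.02, 100.0], maxfev=10_000)

boot = []
for _ in range(500):
    idx = rng.integers(0, nitrogen.size, nitrogen.size)     # resample (N, yield) pairs
    est, _ = curve_fit(logistic, nitrogen[idx], yield_obs[idx], p0=theta_hat, maxfev=10_000)
    boot.append(est)
print("bootstrap covariance of (ymax, k, n0):\n", np.cov(np.asarray(boot).T))
```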
Earth has been habitable through most of its history, but the anthropogenically mediated greenhouse effect, if sufficiently strong, can threaten Earth's long-standing equability. This paper's main aim is to determine the strength of the anthropogenic greenhouse effect (the climate sensitivity) from observational data and basic physics alone, without recourse to the parameterisations of earth-system models and their inevitable uncertainties. A key finding is that the sensitivity can be constrained by harmonising historical records of land and ocean temperatures with observations of potential climate-change drivers in a non-steady-state, energy-balance equation via a least-squares optimisation. The global temperature increase for a CO2 doubling is found to lie (95% confidence limits) between 3.0°C and 6.3°C, with a best estimate of +4°C. Under a business-as-usual scenario, which assumes that there will be no significant change in people's attitudes and priorities, Earth's surface temperature is forecast to rise by 7.9°C over the land, and by 3.6°C over the oceans, by the year 2100. Global temperature rise has slowed in the last decade, leading some to question climate predictions of substantial 21st-century warming. A formal runs test, however, shows that the recent slowdown is part of the normal behaviour of the climate system.
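A schematic version of a Wald–Wolfowitz runs test on the signs of residuals about a fitted trend, using synthetic temperatures and the usual normal approximation to the run-count distribution; it illustrates the kind of test referred to above, not the paper's exact calculation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
years = np.arange(1960, 2014)
temp = 0.015 * (years - 1960) + rng.normal(scale=0.1, size=years.size)  # synthetic record

# Signs of residuals about a linear trend, and the number of runs of equal signs.
residuals = temp - np.polyval(np.polyfit(years, temp, 1), years)
signs = residuals > 0
runs = 1 + np.count_nonzero(signs[1:] != signs[:-1])
n_pos, n_neg = signs.sum(), (~signs).sum()

# Normal approximation to the distribution of the run count under randomness.
mu = 2.0 * n_pos * n_neg / (n_pos + n_neg) + 1.0
var = (mu - 1.0) * (mu - 2.0) / (n_pos + n_neg - 1.0)
z = (runs - mu) / np.sqrt(var)
p = 2.0 * stats.norm.sf(abs(z))
print(f"runs = {runs}, z = {z:.2f}, two-sided p = {p:.3f}")
```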
U.S. ethanol production capacity increased more than threefold between 2002 and 2008. We study the effect of this growth on corn acreage. Connecting annual changes in county-level corn acreage to changes in ethanol plant capacities, we find a positive effect on planted corn. The building of a typical plant is estimated to increase corn in the county by over 500 acres and to increase acreage in surrounding counties up to almost 300 miles away. All ethanol plants are estimated to increase corn production by less than their annual requirements.
Subsidized crop insurance may encourage conversion of native grassland to cropland. The Sodsaver provision of the 2008 farm bill could deny crop insurance on converted land in the Prairie Pothole states for 5 years. Supplemental Revenue Assistance payments, which are linked to crop insurance purchases, could also be withheld. Using representative farms, we estimate that Sodsaver would reduce expected crop revenue by up to 8% and expected net return by up to 20%, while increasing the standard deviation of revenue by as much as 6% of market revenue. Analysis based on elasticities from the literature suggests that Sodsaver would reduce grassland conversion by 9% or less.
A logistic regression procedure was used to assess the impact of socioeconomic attributes on the best management practices (BMPs) adoption decision by Louisiana dairy farmers relative to cost-share and fixed incentive payments. Analysis of the steps in the BMP adoption decision process indicated that visits between producers and the U.S. Department of Agriculture–Natural Resources Conservation Service significantly increase the likelihood of BMP adoption. Producer willingness-to-pay results indicate that marginal increases in dairy BMP adoption, and the associated improvement in environmental quality, require increased technical and financial assistance.
The n-back task is a widely used neuroimaging paradigm for studying the neural basis of working memory (WM); however, its neuropsychometric properties have received little empirical investigation. The present study merged clinical neuropsychology and functional magnetic resonance imaging (fMRI) to explore the construct validity of the letter variant of the n-back task (LNB) and to further identify the task-evoked networks involved in WM. Construct validity of the LNB task was investigated using a bootstrapping approach to correlate LNB task performance across clinically validated neuropsychological measures of WM to establish convergent validity, as well as measures of related but distinct cognitive constructs (i.e., attention and short-term memory) to establish discriminant validity. Independent component analysis (ICA) identified brain networks active during the LNB task in 34 healthy control participants, and general linear modeling determined task-relatedness of these networks. Bootstrap correlation analyses revealed moderate to high correlations among measures expected to converge with LNB (|ρ|≥0.37) and weak correlations among measures expected to discriminate (|ρ|≤0.29), controlling for age and education. ICA identified 35 independent networks, 17 of which demonstrated engagement significantly related to task condition, controlling for reaction time variability. Of these, the bilateral frontoparietal networks, bilateral dorsolateral prefrontal cortices, bilateral superior parietal lobules including precuneus, and frontoinsular network were preferentially recruited by the 2-back condition compared to 0-back control condition, indicating WM involvement. These results support the use of the LNB as a measure of WM and confirm its use in probing the network-level neural correlates of WM processing. (JINS, 2014, 20, 1–15)
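A schematic version of the bootstrap correlation step on synthetic scores, with the adjustment for age and education omitted and all variable names hypothetical: Spearman's rho between an LNB score and a convergent working-memory measure is recomputed over resampled participants to give a percentile interval.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 34                                              # matches the sample size above
lnb_score = rng.normal(size=n)
wm_span = 0.6 * lnb_score + rng.normal(scale=0.8, size=n)   # hypothetical convergent measure

rho_obs, _ = stats.spearmanr(lnb_score, wm_span)
boot = []
for _ in range(5_000):
    idx = rng.integers(0, n, n)                     # resample participants with replacement
    rho, _ = stats.spearmanr(lnb_score[idx], wm_span[idx])
    boot.append(rho)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"Spearman rho = {rho_obs:.2f}, 95% bootstrap interval ({lo:.2f}, {hi:.2f})")
```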