Data Science for the Geosciences provides students and instructors with the statistical and machine learning foundations to address Earth science questions using real-world case studies in natural hazards, climate change, environmental contamination and Earth resources. It focuses on techniques that address common characteristics of geoscientific data, including extremes, multivariate, compositional, geospatial and space-time methods. Step-by-step instructions are provided, enabling readers to easily follow the protocols for each method, solve their geoscientific problems and make interpretations. With an emphasis on intuitive reasoning throughout, students are encouraged to develop their understanding without the need for complex mathematics, making this the perfect text for those with limited mathematical or coding experience. Students can test their skills with homework exercises that focus on data scientific analysis, modeling, and prediction problems, and through the use of supplemental Python notebooks that can be applied to real datasets worldwide.
Theoretical units of interest often do not align with the spatial units at which data are available. This problem is pervasive in political science, particularly in subnational empirical research that requires integrating data across incompatible geographic units (e.g., administrative areas, electoral constituencies, and grid cells). Overcoming this challenge requires researchers not only to align the scale of empirical and theoretical units, but also to understand the consequences of this change of support for measurement error and statistical inference. We show how the accuracy of transformed values and the estimation of regression coefficients depend on the degree of nesting (i.e., whether units fall completely and neatly inside each other) and on the relative scale of source and destination units (i.e., aggregation, disaggregation, and hybrid). We introduce simple, nonparametric measures of relative nesting and scale, as ex ante indicators of spatial transformation complexity and error susceptibility. Using election data and Monte Carlo simulations, we show that these measures are strongly predictive of transformation quality across multiple change-of-support methods. We propose several validation procedures and provide open-source software to make transformation options more accessible, customizable, and intuitive.
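As a rough illustration of what a change of support involves, the sketch below applies simple areal weighting, one common transformation method and not necessarily the one implemented in the authors' software. The function name, the overlap matrix, and the example numbers are all hypothetical; it assumes an extensive variable (such as vote counts) that is spread evenly within each source unit, and destination units that jointly cover every source unit.

```python
import numpy as np

def areal_weighting(source_values, overlap_area):
    """Transfer an extensive variable from source units to destination units.

    overlap_area[i, j] is the area shared by source unit i and destination
    unit j (hypothetical input; in practice it comes from a polygon overlay).
    """
    source_values = np.asarray(source_values, dtype=float)
    overlap_area = np.asarray(overlap_area, dtype=float)
    # Share of each source unit's area that falls inside each destination unit.
    shares = overlap_area / overlap_area.sum(axis=1, keepdims=True)
    # Each destination unit receives the area-proportional part of every source.
    return source_values @ shares

# Hypothetical example: source 0 is split evenly, source 1 is fully nested.
votes = [100, 50]
overlap = [[2, 2],
           [0, 4]]
transformed = areal_weighting(votes, overlap)
```

In this toy case the second source unit is perfectly nested and transfers without error, while the first is split and its transferred values rest entirely on the uniform-density assumption, which is the kind of nesting effect the abstract describes.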
An important attribute for soil use is clay content, because it affects the water-holding capacity and hydraulic properties of a soil. Soil surveys are time-consuming, labour-intensive and costly, while geophysical methods, such as Electromagnetic Induction (EMI) and Ground Penetrating Radar (GPR), offer a non-invasive and non-destructive approach to mapping soil features. The objective of this paper is to assess the spatial relationship between clay content and geophysical data. The EMI and GPR data and soil cores were collected and analysed in a field of about 3 ha in Rutigliano, Bari, in SE Italy. The EMI data and clay contents were interpolated using ordinary cokriging. The GPR data were pre-processed and the envelope of the filtered GPR data was used to produce 3D maps of the kriged estimates. The correlation with clay content was large and positive for EMI, whereas it was negative for the GPR measurement. In this work, a combination of geostatistical and statistical analysis has shown a significant correlation between the EMI and GPR observations. Estimation of the mathematical function relating these two groups of variables requires a multivariate approach of non-stationary geostatistics.
Predictions of weed seedling populations from seedbank data should characterize the spatial distribution as well as the composition and abundance of weeds. The spatial distribution of seedbank and seedling populations of common lambsquarters and annual grasses (giant foxtail, large crabgrass, and fall panicum) were described in moldboard plow and no-tillage soybean fields from 1990 to 1993. Spearman rank correlations between seedbank and seedling densities were significant for common lambsquarters in both tillages and all years, but for annual grasses correlations were significant only in no-tillage. Semivariograms showed spatial autocorrelation in seedbank and seedling populations of common lambsquarters in all years in no-till, but less often in the moldboard plow field. Annual grass seed and seedling populations were autocorrelated in the no-till field every year except 1993, and in the moldboard plow field in 1992 and 1993 only. Cross-semivariograms showed spatial continuity between seedbank and seedling population densities in 3 of 4 yr in no-till for common lambsquarters, and in all years of no-till and 1 yr of moldboard plow for annual grasses. Grey-scale field maps of common lambsquarters seedbanks corresponded visually to maps of seedling populations and could have been used to target control efforts, but visual correspondence between annual grass seedbank and seedling maps was poor. Seedbank and seedling mapping may be useful for site-specific management, but additional information is needed to understand the variation in the relationships between these two populations over time and space.
Seed dispersal, interacting with environmental disturbance and management across heterogeneous landscapes, results in irregular weed spatial distributions. Describing, predicting, and managing weed populations requires an understanding of how weeds are distributed spatially and the consequences of this distribution for population processes. Semivariograms and kriged maps of weed populations in several fields have helped describe spatial structure, but few generalizations can be drawn except that populations are aggregated at one or more scales. Limited information is available on the effect of weed arrangement, pattern, or field location on weed population processes. Because weeds are neither regular nor uniform in distribution, mean density alone is of limited value in estimating yield loss or describing population dynamics over a whole field. Sampling strategies that account for spatial distribution can increase sampling efficiency. Further research should focus on understanding processes that cause changes in spatial distributions over time to help predict rates of invasion and potential extent of colonization.
The nonuniform spatial distribution of weeds complicates sampling, modeling, and management of weed populations. Principles of a rational approach to analysis of weed spatial distribution, combining classical and spatial statistics, are presented using data for cumulative emergence of common lambsquarters in no-tillage soybean fields in 1990 and 1993. Classical statistics, dispersion indices, mean/variance relationships, and frequency histograms confirmed that raw and log_e-transformed data were not normally distributed, that populations were aggregated, and that large-scale trends in population means violated assumptions of spatial statistics. Detrending was accomplished by median polishing log_e-transformed data and confirmed by evaluation of standardized residuals and frequency histograms. Detrended residuals were used to construct omni-directional and uni-directional semivariograms to describe the spatial structure of the populations. A spherical model fit to the data was verified by cross validation. Semivariograms showed that common lambsquarters density was spatially autocorrelated at distances to 16 m, with more than 30% of the variance in density due to distance between field locations. Comparisons of kriged estimates and their standard deviations with and without detrending indicated that estimates using detrended data were more appropriate and more precise. Kriged estimates of common lambsquarters density were used to draw contour maps of the populations.
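The detrending step can be sketched with a plain Tukey median polish on a rectangular grid of values. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name, iteration limit, and example table are hypothetical, and the input is assumed to be a complete grid (no missing cells) of already log-transformed densities.

```python
import numpy as np

def median_polish(table, max_iter=10, tol=1e-6):
    """Decompose a grid as overall + row effect + column effect + residual."""
    resid = np.asarray(table, dtype=float).copy()
    overall = 0.0
    row = np.zeros(resid.shape[0])
    col = np.zeros(resid.shape[1])
    for _ in range(max_iter):
        # Sweep the row medians out of the residuals into the row effects.
        r_med = np.median(resid, axis=1)
        resid -= r_med[:, None]
        row += r_med
        # Re-centre the column effects, moving their median to the overall term.
        delta = np.median(col)
        col -= delta
        overall += delta
        # Sweep the column medians out of the residuals into the column effects.
        c_med = np.median(resid, axis=0)
        resid -= c_med[None, :]
        col += c_med
        # Re-centre the row effects in the same way.
        delta = np.median(row)
        row -= delta
        overall += delta
        # Stop once a full sweep no longer changes the residuals.
        if np.abs(r_med).max() < tol and np.abs(c_med).max() < tol:
            break
    return overall, row, col, resid

# Hypothetical grid with a purely additive trend: residuals should vanish.
demo = np.array([[5.0, 15.0, 25.0],
                 [6.0, 16.0, 26.0],
                 [7.0, 17.0, 27.0]])
overall, row, col, resid = median_polish(demo)
```

The residuals returned here play the role of the detrended values from which semivariograms would then be computed.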
Aflatoxin is a fungal toxin that contaminates corn and causes liver cancer in humans and animals. Contamination is driven by high temperatures and drought. Aflatoxin assessment is expensive, so extension services need to identify high-risk areas so that irrigation, planting strategies and corn varieties can be adapted. This research presents a web-based decision support tool for aflatoxin risk, illustrated with a case study from southern Georgia. The tool employs the approach developed by Kerry et al. (2017b), in which exceedance of key thresholds in temperature, rainfall, soil type and corn production is used to determine risk. The tool also includes NDVI to indicate drought stress and could be further expanded to include new risk factors and adapted to other crops.
Aflatoxin contamination of food can cause liver cancer in humans and animals. Identification of aflatoxin risk areas allows farmers to adapt management strategies before planting, during growth and at harvest. Aflatoxin contamination is driven by high temperatures and drought conditions and crops grown on light textured soil in the south eastern USA are at particular risk. Aflatoxin assessment is expensive so a role of extension services in precision farming is to identify the areas most at risk of contamination so that farmers can adapt irrigation or planting strategies. This paper extends a county-level risk factors approach developed by Kerry et al. (2017) by investigating the use of NDVI and thermal IR data to indicate drought stress and thus aflatoxin contamination risk at the sub-county level.
Knowing the distribution of weed seedlings in farmer-managed fields could help researchers develop reliable distribution maps for site-specific weed management. With a knowledge of the spatial arrangement of a weed population, cost effective sampling programs and management strategies can be designed, so inputs can be selected and applied to specific field areas where management is warranted. In 1997 and 1998, weeds were sampled at 612 to 682 sites in two center pivot irrigated corn fields (71 and 53 ha) in eastern Colorado. Weeds were enumerated when corn reached the two-leaf, four-leaf, and physiological maturity stages in a 76.2- by 76.2-m grid, a random-directed grid where sites were established at intervals of 76.2 m, and a star configuration based on a 7.62- by 7.62-m grid within three 23,225 m² areas. Directional correlograms were calculated for 0, 30, 60, 90, 120, and 150° from the crop row. Fifteen weed species were observed across fields. Spatial dependence occurred in 7 of the 93 samples (a collection of sampling units for a particular weed species that was detected within a field at a particular sampling time and year) for populations of field sandbur, pigweed species, nightshade species, and common lambsquarters. Correlogram analysis indicated that 18 to 72% of the variation in sample density was a result of spatial dependence over a geographic distance not exceeding 5 to 363 m among the examined data. Because of the lack of spatial correlation for weed seedling distributions in these eastern Colorado corn fields, interpolated density maps should be based on grid sizes (separation distances) less than 7.62 m for weed seedling infestations.
Comparing distributions among fields, species, and management practices will help us understand the spatial dynamics of weed seed banks, but analyzing observational data requires nontraditional statistical methods. We used cluster analysis and classification and regression tree analysis (CART) to investigate factors that influence spatial distributions of seed banks. CART is a method for developing predictive models, but it is also used to explain variation in a response variable from a set of possible explanatory variables. With cluster analysis, we identified patterns of variation with direction of the distance over which seed bank density was correlated (range of spatial dependence) with single-species seed banks in corn. Then we predicted patterns of the seed banks with CART using field and species characteristics and seed bank density as explanatory variables. Patterns differed by magnitude of variation in the range of spatial dependence (strength of anisotropy) and direction of the maximum range. Density and type of irrigation explained the most variation in pattern. Long ranges were associated with large seed banks and stronger anisotropy with furrow than center pivot irrigation. Pattern was also explained by seed size and longevity, characteristics for natural dispersal, species, soil texture, and whether the weed was a grass or broadleaf. Significance of these factors depended on density or type of irrigation, and some patterns were predicted for more than one combination of factors. Dispersal was identified as a primary process of spatial dynamics and pattern varied for seed spread by tillage, wind, or natural dispersal. However, demographic characteristics and density were more important in this research than in previous research. Impact of these factors may have been clearer because interactions were modeled. Lack of data will be the greatest obstacle to using comparative studies and CART to understand the spatial dynamics of weed seed banks.
Geostatistical techniques were used to describe and map the spatial distribution of crenate broomrape populations parasitizing broad bean over 6 yr (from 1985 to 1990). In the first year, the spatial distribution was random, but from 1986 to 1989, crenate broomrape populations were clearly aggregated. The crenate broomrape infection severity (IS: number of emerged broomrape m⁻²) increased every year, from an average of 0.45 in 1985 to 29.4 in 1989, with a slight decrease the following year (IS = 27.4). Spherical functions provided the best fit because the cross-validation criteria were met in all study cases. Kriged estimates were used to draw contour maps of the populations. About 34.3, 43.3, and 74.3% of the field plot surface exhibited an IS ≥ 1 (economic threshold) in 1985, 1986, and 1987, respectively, and nearly 100% of the area exceeded the economic threshold from 1988 to 1990; 1985 and 1986 were key years for control of the parasitic weed population. The percentage of infested area at different IS intervals in each year's map obtained by kriging was used to estimate the percentage of yield losses in each infested area (YA) with the equation: YA = A ∗ Ymax ∗ (1 − IS ∗ 0.124), where A is the infested area at a given IS interval and Ymax is the expected broomrape-free broad bean yield. Yield losses under different IS intervals were compared with yield loss attributable to a uniform distribution of crenate broomrape. Results showed that yield loss assuming a uniform distribution of crenate broomrape was clearly overestimated, which is important to avoid overuse of herbicides.
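The expression for YA can be evaluated directly, as in the sketch below. Only the formula and the 0.124 coefficient come from the abstract; the function name, the area fractions, the IS values, and the yield figure are hypothetical placeholders. Note that the bracketed term turns negative once IS exceeds about 8.06, and the abstract does not say how such cases were handled, so the sketch leaves the value unclamped.

```python
def ya(area, y_max, infection_severity, coefficient=0.124):
    """Evaluate YA = A * Ymax * (1 - IS * 0.124) as written in the abstract."""
    return area * y_max * (1.0 - infection_severity * coefficient)

# Hypothetical map summary: fraction of the field in each IS interval,
# paired with a representative IS value for that interval.
intervals = [
    (0.60, 0.0),  # 60% of the field with no emerged broomrape
    (0.30, 1.0),  # 30% of the field at IS = 1
    (0.10, 4.0),  # 10% of the field at IS = 4
]
Y_MAX = 2000.0  # hypothetical broomrape-free yield, in arbitrary units

per_interval = [ya(area, Y_MAX, severity) for area, severity in intervals]
field_total = sum(per_interval)
```

Summing the per-interval values gives a field-level figure built from the kriged map rather than from a single field mean.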
Geostatistical techniques were used to describe and map weed spatial distribution in two sunflower fields in Cabello and Monclova, southern Spain. Data from the study were used to design intermittent spraying strategies. Weed species, overall infestation severity (IS) index, and spatial distribution varied considerably between the two sites. Weed species displayed differences in spatial dependence regardless of IS. The IS mapping of each single weed and of the overall infestation was achieved by kriging, and site-specific application maps were then drawn based on the multi-species weed map and the estimated economic threshold (ET). Herbicide treatment was assumed to be needed for an overall IS score of 2 or 3, and the infested “area exceeding the economic threshold” was determined. The overall weed-infested area varied considerably between locations. About 99 and 38% of the total area was moderately infested (IS ≥ 2) at Monclova and Cabello, respectively. Therefore, if a given herbicide were applied just to the areas exceeding the ET, a significant herbicide saving would be realized in Cabello but not in Monclova. A multi-species spatial analysis provides an opportunity to make site-specific management recommendations from a map of the distribution of IS of the total infestation. Furthermore, only in fields with hard-to-control weed species (e.g., nodding broomrape and corn caraway) would site-specific herbicide application maps developed from total weed infestations need to be complemented with targeted site-specific herbicide treatments to prevent further spread of these species, although their IS might be low.
Knowledge of weed distribution in a field is a key factor in managing weeds effectively. The feasibility of using weed distribution maps for site-specific weed control will largely depend on the stability of the spatial distribution of the populations. Seed banks are the most reliable indicator of an area's weediness, but the effect of regular herbicide applications on their stability is largely unknown. A field experiment was conducted over 3 yr in a winter wheat field under herbicide treatments with the aim of studying the spatial distribution of the seed banks of prostrate knotweed and corn poppy and the spatiotemporal stability of their populations. Soil samples were taken each year at the same locations, and seed abundance was measured by germination in a greenhouse. Each species accounted for more than 10% of the broad-leaved weed seed bank, and both were selected for further analysis. Prostrate knotweed seed-bank density decreased 76% and corn poppy 88% in 3 yr. Spatial distribution was described by spherical isotropic semivariograms. Distance of spatial dependence (range) of prostrate knotweed and corn poppy decreased 33 and 11% respectively, and the spatial variability (sill) decreased 96 and 99%. Yearly spatial seed distribution was compared for each species and no temporal stability was observed over a 3-yr period. The lack of stability was attributed to the large decrease of seed density over time and the increase in the short-range variability (nugget). However, for prostrate knotweed, the locations of minima and maxima were roughly the same between years, allowing farmers to extend the period of use of the weed distribution maps. Although spatial distribution of seed banks can be affected by processes that promote fast changes in the densities of weed populations, this fact does not mean that weed distribution maps could not be used in consecutive seasons.
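The three semivariogram parameters named above (range, sill, nugget) fit together as in the standard isotropic spherical model sketched below. The function name and the parameter values are hypothetical; the sketch parameterises the model with a partial sill, so the total sill is the nugget plus the partial sill, which is an assumption about how the terms are used in the abstract.

```python
import numpy as np

def spherical_semivariogram(h, nugget, partial_sill, range_):
    """Isotropic spherical model: rises from the nugget to the sill at the range."""
    h = np.asarray(h, dtype=float)
    # Beyond the range the model stays flat, so cap the scaled lag at 1.
    ratio = np.clip(h / range_, 0.0, 1.0)
    gamma = nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)
    # By definition the semivariance at zero lag is exactly zero.
    return np.where(h > 0, gamma, 0.0)

# Hypothetical parameters: nugget 1, partial sill 4, range 20 m.
lags = np.array([0.0, 10.0, 20.0, 40.0])
values = spherical_semivariogram(lags, nugget=1.0, partial_sill=4.0, range_=20.0)
```

A shrinking sill together with a growing nugget, as reported for these seed banks, means a larger share of the total variance is unstructured at the sampled scale.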
Weed maps are typically produced from data sampled at discrete intervals on a regular grid. Errors are expected to occur as data are sampled at increasingly coarse scales. To demonstrate the potential effect of sampling strategy on the quality of weed maps, we analyzed a data set comprising the counts of capeweed in 225,000 quadrats completely covering a 0.9-ha area. The data were subsampled at different grid spacings, quadrat sizes, and starting points and were then used to produce maps by kriging. Spacings of 10 m were found to overestimate the geostatistical range by 100% and missed details apparently resulting from the spraying equipment. Some evidence was found supporting the rule of thumb that surveys should be conducted at a spacing of about half the scale of interest. Quadrat size had less effect than spacing on the map quality. At wider spacings the starting position of the sample grid had a considerable effect on the qualities of the maps but not on the estimated geostatistical range. Continued use of arbitrary survey designs is likely to miss the information of interest to biologists and may possibly produce maps inappropriate to spray application technology.
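The subsampling experiment can be mimicked on any exhaustively counted grid, as in the sketch below. The function name and the small demonstration array are hypothetical; the step is expressed in quadrats, so converting a spacing in metres to a step requires the quadrat side length, which the abstract does not state directly.

```python
import numpy as np

def subsample_grid(counts, step, start=(0, 0)):
    """Keep every `step`-th quadrat in both directions, beginning at `start`."""
    counts = np.asarray(counts)
    row0, col0 = start
    return counts[row0::step, col0::step]

# Hypothetical exhaustive survey: a 6 x 6 block of quadrat counts.
full = np.arange(36).reshape(6, 6)

# Same spacing, two different starting points: the samples do not overlap.
sample_a = subsample_grid(full, step=3, start=(0, 0))
sample_b = subsample_grid(full, step=3, start=(1, 2))
```

Feeding each subsample to the same kriging routine and comparing the resulting maps with the exhaustive counts is one way to reproduce the spacing and starting-point comparisons described above.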
Growers need affordable methods to sample weed populations to reduce herbicide use with site-specific weed management. Sampling programs and methods of developing sampling programs for integrated pest management are not sufficient for site-specific weed management because more and different information is needed to make treatment maps than simply estimate average pest density. Sampling plans for site-specific weed management must provide information to map the weeds in the field but should be developed for the objective of prescribing spatially variable management. Weed scientists will be most successful at designing plans for site-specific weed management if they focus on this objective throughout the process of designing a sampling plan. They must also learn more about the spatial distribution and dynamics of weed populations and use that knowledge to identify cost-effective plans, recommend methods to make maps as well as collect data, and find ways to evaluate maps that reflect management to be prescribed from the map. Foremost, sampling must be thought of as an ongoing process over time that uses many types of information rather than a single event of collecting one type of information. Specifically, scientists will need to identify common characteristics rather than just differences of the spatial distribution of weeds among fields and species, recognize that map accuracy may be a poor indicator of the value of a sampling plan, and develop methods to use growers' knowledge of the distribution of weeds and past spatially variable management within a field for both making a map and recommending a sampling plan. The value of proposed methods for sampling and mapping must also be demonstrated or adoption of site-specific weed management might be limited to growers who enjoy using sophisticated technology.
The size, location, and variation in time of weed patches within an arable field were analyzed with the ultimate goal of simplifying weed mapping. Annual and perennial weeds were sampled yearly from 1993 to 1997 at 410 permanent grid points in a 1.3-ha no-till field sown to row crops each year. Geostatistical techniques were used to examine the data as follows: (1) spatial structure within years; (2) relationships of spatial structure to literature-derived population parameters, such as seed production and seed longevity; and (3) stability of weed patches across years. Within years, densities were more variable across crop rows and patches were elongated along rows. Aggregation of seedlings into patches was strongest for annuals and, more generally, for species whose seeds were dispersed by combine harvesting. Patches were most persistent for perennials and, more generally, for species whose seeds dispersed prior to expected dates of combine harvesting. For the most abundant weed in the field, the annual, Setaria viridis, locations of patches in the current year could be used to predict patch locations in the following year, but not thereafter.
Weed management could be more efficient and require less herbicide if growers could afford to estimate the composition, density, and distribution of weed seed banks. Spatial distribution of a weed seed bank will affect the accuracy of both mean estimates and interpolated maps of density. Consequently, information about the general characteristics of spatial distributions of seeds in a seed bank is needed to identify the most efficient strategies for sampling. Seed banks were sampled on 8.4-m square grids in eight irrigated corn fields to identify the common features of distributions of seed banks of annual weeds. Spatial dependence was described with correlograms for four to eight species in each field. Spatial dependence was detected for 36 of 45 distributions, and seed counts were correlated to an average distance of 25 to 150 m for a distribution. Seed banks of different species and fields had common features of spatial correlation: spatial pattern accounted for less than half of the total variability of seed counts, spatial correlation decreased rapidly over short distances, and ranges of spatial dependence varied with direction. For half of the distributions, the maximum range of spatial dependence was at least twice as long as the minimum range. Seed counts were correlated for the longest distances in the direction of the crop row for 16 distributions, and the distance was longer in the direction of the crop row than across rows for 26 of the 36 samples. Researchers should be able to design more efficient sampling plans for growers if the common features of spatial dependence are considered. For seed banks like these, the accuracy of maps and estimates of seed bank density may be improved by collecting multiple cores around each sampling location to mitigate the effect of short-scale spatial variability. 
In addition, sampling may be more efficient with grids and interpolation methods that account for ranges that are 1.5 to 2 times longer in the direction of the crop row than perpendicular to the row. With a 55- by 30-m sampling grid, adjacent observations would be correlated, and maps could be made for 80% of these seed banks. More closely spaced observations would be needed to describe the rapid decline in spatial correlation with distance for a more accurate or finer-scale map. Whether sampling seed banks for making management decisions will be cost-effective is not clear. However, potential methods to sample and map seed bank distributions more efficiently have not been exhausted.
Weeds generally occur in patches in production fields. Are these patches spatially and temporally stable? Do management recommendations change on the basis of these data? The population density and location of annual grass weeds and common ragweed were examined in a 65-ha corn/soybean production field from 1995 to 2004. Yearly treatment recommendations were developed from field means, medians, and kriging grid cell densities, using the hyperbolic yield loss (YL) equation and published incremental YL values (I), maximum YL values (A), and YL limits of 5, 10, or 15%. Mean plant densities ranged from 12 to 131 annual grasses m⁻² and < 1 to 37 common ragweed m⁻². Median weed densities ranged from 0 to 40 annual grasses m⁻² and were 0 for common ragweed. The grass I values used to estimate corn YL were 0.1 and 2% and treatment was recommended in only 1 yr when the high I value and either the mean or median density was used. The grass I values used for soybean were 0.7 and 10% and estimated YL was over 10% all years, regardless of I value. The common ragweed I values were 4.5 and 6% for corn and 5.1 and 15.6% for soybean. On the basis of mean densities, fieldwide treatment would have been recommended in 6 of 9 yr but in no years when the median density was used. Recommendations on the basis of grid cell weed density and kriging ranged from > 80% of the field treated for grass weeds in 3 of 4 yr in soybean to < 20% of the field treated for common ragweed in 2002 and 2004 (corn). Grass patches were more stable in time, space, and density than common ragweed patches. Population densities and spatial distribution generally were variable enough so that site-specific information within this field would improve weed management decisions.
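The abstract names the hyperbolic yield loss equation without writing it out. The sketch below uses the rectangular hyperbola in its usual form, YL = I·D / (1 + I·D / A), with YL, I, and A in percent and D the weed density; whether the study used exactly this parameterisation is an assumption. The function names and every numeric value in the example are hypothetical and are not taken from the study.

```python
def hyperbolic_yield_loss(density, i_pct, a_pct):
    """Percent yield loss: near-linear slope I at low density, asymptote A."""
    return (i_pct * density) / (1.0 + i_pct * density / a_pct)

def treat(density, i_pct, a_pct, limit_pct):
    """Recommend treatment when the estimated loss exceeds the chosen limit."""
    return hyperbolic_yield_loss(density, i_pct, a_pct) > limit_pct

# Hypothetical skewed field: a high mean density but a median of zero.
mean_density, median_density = 10.0, 0.0
I_PCT, A_PCT, LIMIT_PCT = 2.0, 40.0, 10.0  # placeholder parameter values

decision_from_mean = treat(mean_density, I_PCT, A_PCT, LIMIT_PCT)
decision_from_median = treat(median_density, I_PCT, A_PCT, LIMIT_PCT)
```

With a patchy population the mean and median can fall on opposite sides of the limit, which is the kind of disagreement between fieldwide recommendations that the abstract reports.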
Recognising the scarcity of glacier mass-balance data in the Southern Hemisphere, a mass-balance measurement programme was started at Brewster Glacier in the Southern Alps of New Zealand in 2004. Evolution of the measurement regime over the 11 years of data recorded means there are differences in the spatial density of data obtained. To ensure the temporal integrity of the dataset a new geostatistical approach is developed to calculate mass balance. Spatial co-variance between elevation and snow depth allows a digital elevation model to be used in a co-kriging approach to develop a snow depth index (SDI). By capturing the observed spatial variability in snow depth, the SDI is a more reliable predictor than elevation and is used to adjust each year of measurements consistently despite variability in sampling spatial density. The SDI also resolves the spatial structure of summer balance better than elevation. Co-kriging is used again to spatially interpolate a derived mean summer balance index using SDI as a co-variate, which yields a spatial predictor for summer balance. The average glacier-wide surface winter, summer and annual balances over the period 2005–15 are 2484, −2586 and −102 mm w.e., respectively, with changes in summer balance explaining most of the variability in annual balance.
A soil-quality map is at present an important tool to integrate laws on soil quality with regional infrastructural works. Basic data are commonly available, but soil quality is an indicator that has to be derived from these data, including site-specific environmental standards. We propose three geostatistics-based methods for the comparison of interpolated contaminant concentrations and standards.
The study is illustrated by data from a part of the Betuwe railroad transect, which extends over 12 km in the western Netherlands. As it turns out, a useful procedure is to combine interpolated contaminant concentrations with interpolated threshold values.