There are two main schools of thought about statistical inference: frequentist and Bayesian. The frequentist approach relies solely on available data for predictions, while the Bayesian approach incorporates both data and prior knowledge about the event of interest. Bayesian methods were developed hundreds of years ago; however, they were rarely used due to computational challenges and conflicts between the two schools of thought. Recent advances in computational capabilities and a shift toward leveraging prior knowledge for inferences have led to increased use of Bayesian methods.
Methods:
Many biostatisticians with expertise in frequentist approaches lack the skills to apply Bayesian techniques. To address this gap, four faculty experts in Bayesian modeling at the University of Michigan developed a practical, customized workshop series. The training, tailored to accommodate the schedules of full-time staff, focused on immersive, project-based learning rather than traditional lecture-based methods. Surveys were conducted to assess the impact of the program.
Results:
All 20 participants completed the program and, when surveyed, reported an increased understanding of Bayesian theory and greater confidence in using these techniques. Capstone projects demonstrated participants’ ability to apply Bayesian methodology. The workshop not only enhanced the participants’ skills but also positioned them to readily apply Bayesian techniques in their work.
Conclusions:
Accommodating the schedules of full-time biostatistical staff enabled full participation. The immersive project-based learning approach resulted in building skills and increasing confidence among staff statisticians who were unfamiliar with Bayesian methods and their practical applications.
Inferences are never assumption free. Data summaries that do not account for all relevant effects readily mislead. Distributions for the Pearson correlation and for counts are noted, along with extensions that handle extra-binomial and extra-Poisson variation. Notions of statistical power are introduced. Resampling methods, namely the bootstrap and permutation tests, extend the available inferential approaches. Regression with a single explanatory variable is used as a context in which to introduce residual plots, outliers, influence, robust regression, and standard errors of predicted values. There are two regression lines: that of y on x and that of x on y. Power transformations, with the logarithmic transformation as a special case, are often effective in giving a linear relationship. The training/test approach, and the closely allied cross-validation approach, can be important for avoiding over-fitting. Other topics include one- and two-way comparisons, adjustments when there are multiple comparisons, and the estimation of false discovery rates when there is severe multiplicity. A discussion of theories of inference, including likelihood, Bayes factors, and other Bayesian perspectives, ends the chapter.
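To illustrate the point that the regression of y on x and the regression of x on y are distinct lines, here is a minimal sketch (not taken from the chapter) using simulated data; the two fitted slopes agree only when the correlation is perfect.

```python
# Minimal sketch (not from the chapter): the regression line of y on x and the
# regression line of x on y are different lines, because each minimises errors
# in a different direction.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.6 * x + rng.normal(scale=0.8, size=200)

cov_xy = np.cov(x, y, ddof=1)[0, 1]
b_yx = cov_xy / np.var(x, ddof=1)   # slope of the regression of y on x
b_xy = cov_xy / np.var(y, ddof=1)   # slope of the regression of x on y (x = a + b*y)

print("slope of y on x:", round(b_yx, 3))
print("x-on-y line re-expressed as a slope in the (x, y) plane:", round(1 / b_xy, 3))
```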
Identifying racial disparities in policy and politics is a pressing area of research within the United States. Where early work relied on potentially noisy correlations between county or precinct demographics and election outcomes, the advent of Bayesian Improved Surname Geocoding (BISG) vastly improved the estimation of race by employing voter lists. Machine Learning (ML)-modified BISG in turn offers accuracy gains over the static, and potentially outdated, surname dictionaries present in traditional BISG. However, it is unclear whether ML’s gains in voter race estimation substantively alter the policy and political implications of redistricting. Therefore, we ascertain the potential gains of ML-modified BISG in improving the estimation of race for the purpose of redistricting majority-minority districts. We evaluate an ML-modified BISG program against traditional BISG estimates in correctly estimating the race of voters for creating majority-minority congressional districts within North Carolina and Georgia, and in state assembly districts in Wisconsin. Our results demonstrate that ML-modified BISG offers substantive gains over traditional BISG, especially in diverse political geographic units. Further, we find meaningful improvements in accuracy when estimating majority-minority district racial composition. We conclude with recommendations on when and how to use the two methods, in addition to how to ensure transparency and confidence in BISG-related research.
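At its core, BISG combines a surname-based prior over race with the racial composition of the voter’s geographic unit via Bayes’ rule. The following sketch uses hypothetical inputs and the standard BISG update; it is not the surname dictionary or the ML-modified variant evaluated in the article.

```python
# Sketch of the standard BISG update with hypothetical inputs; the specific
# surname dictionaries and geographies used in the article will differ.
import numpy as np

races = ["white", "black", "hispanic", "asian", "other"]

# P(race | surname): from a surname dictionary (hypothetical values).
p_race_given_surname = np.array([0.70, 0.10, 0.12, 0.05, 0.03])

# P(geography | race): share of each race's statewide population living in
# this voter's precinct (hypothetical values).
p_geo_given_race = np.array([0.002, 0.010, 0.004, 0.003, 0.005])

# Bayes' rule: P(race | surname, geography) is proportional to
# P(race | surname) * P(geography | race).
posterior = p_race_given_surname * p_geo_given_race
posterior /= posterior.sum()

for r, p in zip(races, posterior):
    print(f"{r}: {p:.3f}")
```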
Bayesian inference provides a probabilistic reasoning process for drawing conclusions based on imprecise and uncertain data that has been successful in many applications within robotics and information processing, but is most often considered in terms of data analysis rather than synthesis of behaviours. This paper presents the use of Bayesian inference as a means by which to perform Boolean operations in a logic programme while incorporating and propagating uncertainty information through logic operations by inference. Boolean logic operations are implemented in a Bayesian network of Bernoulli random variables with tensor-based discrete distributions to enable probabilistic hybrid logic programming of a robot. This enables Bayesian inference operations to coexist with Boolean logic in a unified system while retaining the ability to capture uncertainty by means of discrete probability distributions. Using a discrete Bayesian network with both Boolean and Bayesian elements, the proposed methodology is applied to navigate a mobile robot using hybrid Bayesian and Boolean operations to illustrate how this new approach improves robotic performance by inclusion of uncertainty without increasing the number of logic elements required. As any logical system could be programmed in this manner to integrate uncertainty into decision-making, this methodology can benefit a wide range of applications that use discrete or probabilistic logic.
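As one way to picture the tensor-based representation the paper describes, the sketch below encodes a Boolean AND gate as a 2 x 2 x 2 conditional-probability tensor over Bernoulli variables and propagates uncertain inputs through it by contraction. It is a hypothetical minimal example, not the robot controller from the paper.

```python
# Minimal sketch: a Boolean AND gate as a 2x2x2 conditional-probability tensor
# over Bernoulli variables, with input uncertainty propagated by contraction.
import numpy as np

# cpt[a, b, c] = P(C=c | A=a, B=b); deterministic AND.
cpt = np.zeros((2, 2, 2))
for a in (0, 1):
    for b in (0, 1):
        cpt[a, b, a & b] = 1.0

# Uncertain Bernoulli inputs, e.g. noisy sensor readings.
p_a = np.array([0.2, 0.8])   # P(A=0), P(A=1)
p_b = np.array([0.4, 0.6])   # P(B=0), P(B=1)

# Marginal over the output: P(C) = sum over a, b of P(a) P(b) P(c | a, b).
p_c = np.einsum("a,b,abc->c", p_a, p_b, cpt)
print("P(C=1) =", p_c[1])    # 0.8 * 0.6 = 0.48
```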
The dynamics and fusion of vesicles during the last steps of exocytosis are not yet well established in cell biology. An open issue is the characterization of the diffusion process at the plasma membrane. Total internal reflection fluorescence microscopy (TIRFM) has been successfully used to analyze the coordination of proteins involved in this mechanism. It enables the capture of protein dynamics at high frame rates with reasonable signal-to-noise values. Nevertheless, methodological approaches that can analyze and estimate diffusion in small local areas, at the scale of a single diffusing spot within cells, are still lacking. To address this issue, we propose a novel correlation-based method for local diffusion estimation. As a starting point, we consider Fick’s second law of diffusion, which relates the time evolution of the concentration to its spatial variation. Then, we derive an explicit parametric model which is further fitted to time-correlation signals computed from regions of interest (ROI) containing individual spots. Our modeling and Bayesian estimation framework, BayesTICS, is well suited to representing isolated diffusion events and is robust to noise, ROI sizes, and the localization of spots in ROIs. Its performance is shown on both synthetic and real TIRFM images depicting Transferrin Receptor proteins.
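As a rough illustration of fitting a parametric correlation model within a Bayesian framework, the sketch below computes a grid-based posterior over a diffusion time for a generic TICS-style autocorrelation curve. The functional form, noise model, and parameter values are stand-ins; the paper derives its own parametric model from Fick’s second law.

```python
# Illustrative sketch only: a grid-based posterior over a diffusion time tau_d,
# for a generic TICS-style autocorrelation model g(tau) = g0 / (1 + tau / tau_d).
# The parametric model the paper derives, and its noise model, may differ.
import numpy as np

def model(tau, g0, tau_d):
    return g0 / (1.0 + tau / tau_d)

rng = np.random.default_rng(0)
tau = np.arange(1, 30, dtype=float)                  # lag times (frames)
data = model(tau, g0=1.0, tau_d=8.0) + rng.normal(scale=0.03, size=tau.size)

tau_d_grid = np.linspace(1.0, 30.0, 500)             # flat prior over a grid
sigma = 0.03                                         # assumed noise level
log_post = np.array([
    -0.5 * np.sum((data - model(tau, 1.0, td)) ** 2) / sigma ** 2
    for td in tau_d_grid
])
post = np.exp(log_post - log_post.max())
post /= post.sum()

print("posterior mean of tau_d:", round(float(np.sum(tau_d_grid * post)), 2))
```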
Birnbaum and Quispe-Torreblanca (2018) evaluated a set of six models developed under true-and-error theory against data in which people made choices in repeated gambles. They concluded that the three models based on expected utility theory were inadequate accounts of the behavioral data, and argued in favor of the simplest of the remaining three more general models. To reach these conclusions, they used non-Bayesian statistical methods: frequentist point estimation of parameters, bootstrapped confidence intervals of parameters, and null hypothesis significance testing of models. We address the same research goals, based on the same models and the same data, using Bayesian methods. We implement the models as graphical models in JAGS to allow for computational Bayesian analysis. Our results are based on posterior distributions of parameters, posterior predictive checks of descriptive adequacy, and Bayes factors for model comparison. We compare the Bayesian results with those of Birnbaum and Quispe-Torreblanca (2018). We conclude that, while the very general conclusions of the two approaches agree, the Bayesian approach offers better detailed answers, especially for the key question of the evidence the data provide for and against the competing models. Finally, we discuss the conceptual and practical advantages of using Bayesian methods in judgment and decision making research highlighted by this case study.
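For readers unfamiliar with Bayes factors, the sketch below computes one from analytic marginal likelihoods for a simple binomial choice proportion. It is a generic illustration only; the paper’s true-and-error models are implemented as graphical models in JAGS and their Bayes factors are obtained differently.

```python
# Generic illustration of a Bayes factor from analytic marginal likelihoods,
# here for a single binomial choice proportion; the paper's true-and-error
# models are implemented as graphical models in JAGS instead.
from scipy.stats import binom, betabinom

k, n = 34, 50                      # hypothetical: 34 "risky" choices out of 50

m0 = binom.pmf(k, n, 0.5)          # M0: indifference, theta fixed at 0.5
m1 = betabinom.pmf(k, n, 1, 1)     # M1: theta unknown, uniform Beta(1, 1) prior

print("Bayes factor BF10 =", round(m1 / m0, 2))
```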
Scholars often use language to proxy ethnic identity in studies of conflict and separatism. This conflation of language and ethnicity is misleading: language can cut across ethnic divides and itself has a strong link to identity and social mobility. Language can therefore influence political preferences independently of ethnicity. Results from an original survey of two post-Soviet regions support these claims. Statistical analyses demonstrate that individuals fluent in a peripheral lingua franca are more likely to support separatism than those who are not, while individuals fluent in the language of the central state are less likely to support separatist outcomes. Moreover, linguistic fluency shows a stronger relationship with support for separatism than ethnic identification. These results provide strong evidence that scholars should disaggregate language and ethnic identity in their analyses: language can be more salient for political preferences than ethnicity, and the most salient languages may not even be ethnic.
Bayesian analysis has emerged as a rapidly expanding frontier in qualitative methods. Recent work in this journal has voiced various doubts regarding how to implement Bayesian process tracing and the costs versus benefits of this approach. In this response, we articulate a very different understanding of the state of the method and a much more positive view of what Bayesian reasoning can do to strengthen qualitative social science. Drawing on forthcoming research as well as our earlier work, we focus on clarifying issues involving mutual exclusivity of hypotheses, evidentiary import, adjudicating among more than two hypotheses, and the logic of iterative research, with the goal of elucidating how Bayesian analysis operates and pushing the field forward.
Models for converting expert-coded data to estimates of latent concepts assume different data-generating processes (DGPs). In this paper, we simulate ecologically valid data according to different assumptions, and examine the degree to which common methods for aggregating expert-coded data (1) recover true values and (2) construct appropriate coverage intervals. We find that the mean and both hierarchical Aldrich–McKelvey (A–M) scaling and hierarchical item-response theory (IRT) models perform similarly when expert error is low; the hierarchical latent variable models (A–M and IRT) outperform the mean when expert error is high. Hierarchical A–M and IRT models generally perform similarly, although IRT models are often more likely to include true values within their coverage intervals. The median and non-hierarchical latent variable models perform poorly under most assumed DGPs.
There is renewed interest in levelling up the regions of the UK. The combination of social and political discontent, and the sluggishness of key UK macroeconomic indicators like productivity growth, has led to increased interest in understanding the regional economies of the UK. In turn, this has led to more investment in economic statistics. Specifically, the Office for National Statistics (ONS) recently started to produce quarterly regional GDP data for the nine English regions and Wales that date back to 2012Q1. This complements existing real GVA data for the regions, available from the ONS on an annual basis back to 1998, with the devolved administrations of Scotland and Northern Ireland producing their own quarterly output measures. In this paper we reconcile these two data sources along with UK quarterly output data that date back to 1970. This enables us to produce both more timely real-terms estimates of quarterly economic growth in the regions of the UK and a new reconciled historical time series of quarterly regional real output data from 1970. We explore a number of features of interest in these new data, including a new quarterly regional productivity series and a commentary on the evolution of regional productivity growth in the UK.
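To give a flavour of the reconciliation problem, the sketch below applies simple pro-rata benchmarking, scaling a quarterly indicator so that each year’s quarters sum to the annual total. The figures are hypothetical, and the paper’s reconciliation of the ONS quarterly, annual, and UK-wide series is considerably more sophisticated than this.

```python
# Simple pro-rata benchmarking sketch (hypothetical figures): scale a quarterly
# indicator so that each year's quarters sum to the annual regional total.
import numpy as np

quarterly_indicator = np.array([100.0, 102.0, 101.0, 103.0,   # year 1
                                104.0, 105.0, 107.0, 106.0])  # year 2
annual_totals = np.array([420.0, 440.0])                      # annual GVA

benchmarked = quarterly_indicator.copy()
for y, total in enumerate(annual_totals):
    q = slice(4 * y, 4 * y + 4)
    benchmarked[q] *= total / quarterly_indicator[q].sum()

print(benchmarked.reshape(2, 4).sum(axis=1))   # matches the annual totals
```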
Given the increasing quantity and impressive placement of work on Bayesian process tracing, this approach has quickly become a frontier of qualitative research methods. Moreover, it has dominated the process-tracing modules at the Institute for Qualitative and Multi-Method Research (IQMR) and the American Political Science Association (APSA) meetings for over five years, rendering its impact even greater. Proponents of qualitative Bayesianism make a series of strong claims about its contributions and scope of inferential validity. Four claims stand out: (1) it enables causal inference from iterative research, (2) the sequence in which we evaluate evidence is irrelevant to inference, (3) it enables scholars to fully engage rival explanations, and (4) it prevents ad hoc hypothesizing and confirmation bias. Notwithstanding the stakes of these claims and breadth of traction this method has received, no one has systematically evaluated the promises, trade-offs, and limitations that accompany Bayesian process tracing. This article evaluates the extent to which the method lives up to the mission. Despite offering a useful framework for conducting iterative research, the current state of the method introduces more bias than it corrects for on numerous dimensions. The article concludes with an examination of the opportunity costs of learning Bayesian process tracing and a set of recommendations about how to push the field forward.
Declining telephone response rates have forced several transformations in survey methodology, including cell phone supplements, nonprobability sampling, and increased reliance on model-based inferences. At the same time, advances in statistical methods and vast amounts of new data sources suggest that new methods can combat some of these problems. We focus on one type of data source—voter registration databases—and show how they can improve inferences from political surveys. These databases allow survey methodologists to leverage political variables, such as party registration and past voting behavior, at a large scale and free of overreporting bias or endogeneity between survey responses. We develop a general process to take advantage of this data, which is illustrated through an example where we use multilevel regression and poststratification to produce vote choice estimates for the 2012 presidential election, projecting those estimates to 195 million registered voters in a postelection context. Our inferences are stable and reasonable down to demographic subgroups within small geographies and even down to the county or congressional district level. They can be used to supplement exit polls, which have become increasingly problematic and are not available in all geographies. We discuss problems, limitations, and open areas of research.
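The poststratification step of the approach can be sketched as a weighted average of model-based cell estimates, with weights given by cell counts from the voter registration database. The cells and numbers below are hypothetical, and the multilevel regression stage that would produce the cell estimates is omitted.

```python
# Sketch of the poststratification step with hypothetical cells: model-based
# cell estimates of vote choice are weighted by cell counts taken from the
# voter registration database.
import numpy as np

# Cells defined by, e.g., party registration x age group within one geography.
cell_support = np.array([0.85, 0.78, 0.20, 0.25, 0.52, 0.48])               # P(vote D) per cell
cell_counts = np.array([40_000, 55_000, 38_000, 60_000, 22_000, 30_000])    # voter file counts

estimate = np.average(cell_support, weights=cell_counts)
print("poststratified vote-choice estimate:", round(estimate, 3))
```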
This paper estimates an enriched version of the mainstream medium-scale dynamic stochastic general equilibrium model, which features nonseparability between consumption and real money balances in utility and a systematic response of the policy rate to money growth. Estimation results show that money is a significant factor in the monetary policy rule. As a consequence, econometric analysis that omits money from Taylor rules may lead to biased estimates of the model parameters. In contrast to earlier studies that rely on small-scale models, the paper stresses the merits of using a sufficiently rich model. First, it delivers different results, such as the role of nonseparability between consumption and money in utility. Second, the rich dynamics embedded in the model allow us to explore the responses of a larger set of macroeconomic variables, making the model more informative on the effects of shocks and more useful for understanding the sources of business cycles. Third and most importantly, it reveals the possible pitfalls of relying on small-scale models when studying money’s role in business cycles.
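One common way to write a policy rule with a systematic response to money growth, in illustrative notation rather than the paper’s exact specification, is

i_t = \rho i_{t-1} + (1 - \rho)(\phi_\pi \pi_t + \phi_y \tilde{y}_t + \phi_\mu \Delta m_t) + \varepsilon_t,

where \pi_t is inflation, \tilde{y}_t the output gap, and \Delta m_t money growth; setting \phi_\mu = 0 recovers a standard Taylor rule without money, which is the omission the paper argues can bias the estimated parameters.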
An en primeur agreement is an unconventional forward contract. In this article, we provide a new conceptual framework for analyzing the properties of en primeur prices based on the cost of carry approach. The results, based upon Bayesian modeling, indicate that the cost of carry increases up to 0.9598 when en primeur and bottled wines are traded in parallel. Moreover, our findings confirm that price dispersion around the mean value is greater for en primeur wines (22.42%) than for standard bottled wines (8.2%) traded after the sale of en primeur wines has ended. (JEL Classifications: G12, G15, L66, Q02)
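For reference, the textbook cost-of-carry relation links a forward price F_t to the spot price S_t as

F_t = S_t e^{c (T - t)},

where c is the net cost of carry over the remaining horizon T - t. The article adapts this logic to the en primeur setting within a Bayesian model, so the estimated cost of carry reported above should be read within that framework rather than as a direct application of this formula.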
We develop a new Bayesian split population survival model for the analysis of survival data with misclassified event failures. In political science survival data, right-censored cases are often misclassified as failure cases due to measurement error. Treating these cases as failure events within survival analyses will underestimate the duration of some events. This will bias coefficient estimates, especially in situations where such misclassification is associated with covariates of interest. Our split population survival estimator addresses this challenge by using a system of two equations to explicitly model the misclassification of failure events alongside a parametric survival process of interest. After deriving this model, we use Bayesian estimation via slice sampling to evaluate its performance with simulated data, and in several political science applications. We find that our proposed “misclassified failure” survival model allows researchers to accurately account for misclassified failure events within the contexts of civil war duration and democratic survival.
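One plausible reading of the two-equation structure, in illustrative notation rather than the paper’s own, is that a case recorded as a failure at time t_i contributes

L_i = (1 - q_i) f(t_i | \theta) + q_i S(t_i | \theta), with q_i = logit^{-1}(z_i' \gamma),

to the likelihood, while a recorded right-censored case contributes S(t_i | \theta) as usual. Here q_i is the probability that a recorded failure is actually a misclassified censored case, modelled with covariates z_i in the second equation, and f and S are the density and survivor function of the parametric duration process.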
Data sets quantifying phenomena of social-scientific interest often use multiple experts to code latent concepts. While it remains standard practice to report the average score across experts, experts likely vary in both their expertise and their interpretation of question scales. As a result, the mean may be an inaccurate statistic. Item-response theory (IRT) models provide an intuitive method for taking these forms of expert disagreement into account when aggregating ordinal ratings produced by experts, but they have rarely been applied to cross-national expert-coded panel data. We investigate the utility of IRT models for aggregating expert-coded data by comparing the performance of various IRT models to the standard practice of reporting average expert codes, using both data from the V-Dem data set and ecologically motivated simulated data. We find that IRT approaches outperform simple averages when experts vary in reliability and exhibit differential item functioning (DIF). IRT models are also generally robust even in the absence of simulated DIF or varying expert reliability. Our findings suggest that producers of cross-national data sets should adopt IRT techniques to aggregate expert-coded data measuring latent concepts.
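The sketch below simulates the kind of differential item functioning at issue: experts rating two cases with the same latent value, but applying different ordinal thresholds. The simple mean then differs across the cases even though the latent concept does not, which is the bias that IRT-style expert-specific thresholds are designed to absorb. The setup is hypothetical and is not the V-Dem measurement model itself.

```python
# Small simulation (not the V-Dem models): two countries with the same latent
# level are rated by experts who apply ordinal thresholds differently (DIF).
import numpy as np

rng = np.random.default_rng(2)
latent = 0.0                                   # same true value in both countries

def rate(latent, thresholds, n_experts):
    perceived = latent + rng.normal(scale=0.3, size=n_experts)
    return np.digitize(perceived, thresholds)  # ordinal codes 0..len(thresholds)

lenient = np.array([-1.0, -0.3, 0.4])          # thresholds of country A's experts
strict = np.array([-0.4, 0.3, 1.0])            # thresholds of country B's experts

codes_a = rate(latent, lenient, 20)
codes_b = rate(latent, strict, 20)
print("mean code, country A:", codes_a.mean())
print("mean code, country B:", codes_b.mean())
```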
Policy-critical, micro-level statistical data are often unavailable at the desired level of disaggregation. We present a Bayesian methodology for “downscaling” aggregated count data to the micro level, using an outside statistical sample. Our procedure combines numerical simulation with exact calculation of combinatorial probabilities. We motivate our approach with an application estimating the number of farms in a region, using count totals at higher levels of aggregation. In a simulation analysis over varying population sizes, we demonstrate both robustness to sampling variability and outperformance relative to maximum likelihood. Spatial considerations, implementation of “informative” priors, non-spatial classification problems, and best practices are discussed.
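As a simplified stand-in for the downscaling idea, not the paper’s exact combination of simulation and combinatorial probabilities, the sketch below allocates a known aggregate count across sub-regions using an outside sample via a Dirichlet-multinomial posterior.

```python
# Simplified stand-in for the downscaling idea: allocate a known aggregate
# count of farms across sub-regions using an outside sample, via a
# Dirichlet-multinomial posterior.
import numpy as np

rng = np.random.default_rng(3)
total_farms = 1_000                        # known aggregate count
sample_counts = np.array([12, 7, 3, 18])   # farms observed per sub-region in a survey
prior = np.ones_like(sample_counts)        # flat Dirichlet prior

draws = []
for _ in range(5_000):
    shares = rng.dirichlet(prior + sample_counts)       # posterior shares
    draws.append(rng.multinomial(total_farms, shares))  # allocate the total
draws = np.array(draws)

print("posterior mean farms per sub-region:", draws.mean(axis=0).round(1))
print("95% intervals:", np.percentile(draws, [2.5, 97.5], axis=0).round(1))
```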
A new Bayesian approach for the detection and exclusion of multiple satellite faults is proposed by introducing a classification variable for each satellite observation. If we treat this classification variable as random and assume a prior distribution for it, then a rule for satellite fault detection and exclusion based on the posterior probabilities of the classification variables can be constructed within the framework of Bayesian hypothesis testing. The Gibbs sampler is then used to compute the posterior probabilities of the classification variables, and a Bayesian Receiver Autonomous Integrity Monitoring (RAIM) algorithm is implemented around it. Finally, different schemes are designed to evaluate the performance of the new Bayesian RAIM algorithm in the case of multiple faults, and the method is compared with the Range Consensus (RANCO) method. Experiments show that the proposed algorithm can detect and exclude multiple satellite faults with a high probability of correct detection.
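The classification-variable idea can be pictured with the minimal sketch below: each satellite carries an indicator whose posterior fault probability follows from comparing its residual under nominal and faulty noise scales. In the full algorithm the residuals depend on which satellites are excluded, which is why Gibbs sampling over the indicators is needed; the numbers here are hypothetical and this is not the paper’s RAIM implementation.

```python
# Minimal sketch of the classification-variable idea (not the paper's full
# Gibbs-sampled RAIM algorithm): each satellite gets an indicator z_i, with
# residuals drawn from a wider distribution when z_i = 1 (faulty).
import numpy as np
from scipy.stats import norm

residuals = np.array([0.4, -0.2, 6.5, 0.1, -0.7, 5.8])  # hypothetical range residuals (m)
sigma_ok, sigma_fault = 1.0, 10.0                        # nominal vs. faulty noise scales
prior_fault = 0.05                                       # prior P(z_i = 1)

lik_ok = norm.pdf(residuals, scale=sigma_ok)
lik_fault = norm.pdf(residuals, scale=sigma_fault)
post_fault = prior_fault * lik_fault / (
    prior_fault * lik_fault + (1 - prior_fault) * lik_ok
)

for i, p in enumerate(post_fault):
    print(f"satellite {i}: P(fault | residual) = {p:.3f}")
```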
A key element of national control programmes (NCPs) for Salmonella in commercial laying flocks, introduced across the European Union, is the identification of infected flocks and holdings through statutory sampling. It is therefore important to know the sensitivity of the sampling methods in order to design effective and efficient surveillance for Salmonella. However, improved Salmonella control in response to the NCP may have influenced key factors that determine the sensitivity of the sampling methods used to detect Salmonella in NCPs. The aim of this study was therefore to use Bayesian methods to compare estimates of the sensitivity of the sampling methods from data collected before and after the introduction of the NCP. There was a large reduction in the sensitivity of dust in non-cage flocks between the pre-NCP studies (81% of samples positive in positive flocks) and post-NCP studies (10% of samples positive in positive flocks), leading to the conclusion that sampling dust is not recommended for detection of Salmonella in non-cage flocks. However, cage dust (43% of samples positive in positive flocks) was found to be more sensitive than cage faeces (29% of samples positive in positive flocks). To have a high probability of detection, several NCP-style samples need to be used. For confirmation of Salmonella, five NCP faecal samples for cage flocks, and three NCP faecal boot swab samples for non-cage flocks, would be required to have the equivalent sensitivity of the EU baseline survey method, which was estimated to have an 87% and 75% sensitivity to detect Salmonella at a 5% within-flock prevalence in cage and non-cage flocks, respectively.
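As a rough guide to why several samples are needed: if each sample independently detects Salmonella in a positive flock with probability s, then

P(detection with n samples) = 1 - (1 - s)^n,

so with s = 0.29 (cage faeces), five samples give roughly 1 - 0.71^5, or about 0.82. The study’s own equivalences come from a Bayesian model that accounts for within-flock prevalence and dependence between samples, so they need not match this simple independence approximation.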
This paper presents a new method of graduation which uses parametric formulae together with Bayesian reversible jump Markov chain Monte Carlo methods. The aim is to provide a method which can be applied to a wide range of data and which does not require much adjustment or modification. The method also does not require one particular parametric formula to be selected: instead, the graduated values are a weighted average of the values from a range of formulae. In this way, the new method can be seen as an automatic graduation method which we believe can be applied in many cases without any adjustments and provide satisfactory graduated values. An advantage of the Bayesian approach is that, unlike standard methods of graduation, it allows for model uncertainty.
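In illustrative notation rather than the paper’s own, the ‘weighted average of the values from a range of formulae’ is Bayesian model averaging: the graduated value at age x is

\hat{\mu}_x = \sum_k P(M_k | data) E[\mu_x | M_k, data],

where the M_k are the candidate parametric formulae visited by the reversible jump sampler and the weights are their posterior model probabilities.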