When surveys contain direct questions about sensitive topics, participants may not provide their true answers. Indirect question techniques incentivize truthful answers by concealing participants’ responses in various ways. The Crosswise Model aims to do this by pairing a sensitive target item with a non-sensitive baseline item, and only asking participants to indicate whether their responses to the two items are the same or different. Selection of the baseline item is crucial to guarantee participants’ perceived and actual privacy and to enable reliable estimates of the sensitive trait. This research makes the following contributions. First, it describes an integrated methodology to select the baseline item, based on conceptual and statistical considerations. The resulting methodology distinguishes four statistical models. Second, it proposes novel Bayesian estimation methods to implement these models. Third, it shows that the new models introduced here improve efficiency over common applications of the Crosswise Model and may relax the required statistical assumptions. These three contributions facilitate applying the methodology in a variety of settings. An empirical application on attitudes toward LGBT issues shows the potential of the Crosswise Model. An interactive app and Python and MATLAB code support broader adoption of the model.
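For intuition, the mechanism described above leads to a simple moment estimator (not the paper's Bayesian estimators); a minimal Python sketch, assuming a baseline item with known prevalence p, is:

```python
import numpy as np

def crosswise_estimate(same, p):
    """Moment estimator of the prevalence pi of the sensitive trait.

    Under the Crosswise Model, P(answer 'same') = pi * p + (1 - pi) * (1 - p),
    where p (!= 0.5) is the known prevalence of the non-sensitive baseline item
    and `same` is a 0/1 array of observed 'same'/'different' answers.
    """
    lam = np.mean(same)                      # observed proportion of 'same' answers
    pi_hat = (lam + p - 1.0) / (2.0 * p - 1.0)
    # approximate sampling variance of the estimator
    var = lam * (1.0 - lam) / (len(same) * (2.0 * p - 1.0) ** 2)
    return pi_hat, np.sqrt(var)

# illustrative example: 500 respondents, baseline prevalence p = 0.25,
# true sensitive-trait prevalence pi = 0.2
rng = np.random.default_rng(1)
answers = rng.binomial(1, 0.25 * 0.2 + 0.75 * 0.8, size=500)
print(crosswise_estimate(answers, p=0.25))
```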
The posterior distribution of the bivariate correlation is analytically derived given a data set where x is completely observed but y is missing at random for a portion of the sample. Interval estimates of the correlation are then constructed from the posterior distribution in terms of highest density regions (HDRs). Various choices for the form of the prior distribution are explored. For each of these priors, the resulting Bayesian HDRs are compared with each other and with intervals derived from maximum likelihood theory.
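For a unimodal posterior, the highest density region reduces to the shortest interval containing the desired probability mass. A minimal sketch of computing it from posterior draws (a Monte Carlo shortcut assumed here, not the paper's analytical derivation) is:

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior draws.

    For a unimodal posterior this coincides with the highest density region.
    """
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.ceil(mass * n))              # number of draws inside the interval
    widths = x[k - 1:] - x[: n - k + 1]     # widths of all candidate intervals
    i = np.argmin(widths)                   # index of the shortest one
    return x[i], x[i + k - 1]

# synthetic example: 95% HPD interval for draws of a correlation coefficient
rho_draws = np.random.default_rng(0).beta(8, 3, size=20_000) * 2 - 1
print(hpd_interval(rho_draws, 0.95))
```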
This paper assesses the psychometric value of allowing test-takers choice in standardized testing. New theoretical results examine the conditions under which allowing choice improves score precision. A hierarchical framework is presented for jointly modeling the accuracy of cognitive responses and item choices. The statistical methodology is disseminated in the ‘cIRT’ R package. An ‘answer two, choose one’ (A2C1) test administration design is introduced to avoid challenges associated with nonignorable missing data. Experimental results suggest that the A2C1 design and payout structure encouraged subjects to choose items consistent with their cognitive trait levels. Substantively, the experimental data suggest that item choices yielded information and discrimination ability comparable to those of cognitive items. Given there are no clear guidelines for writing more or less discriminating items, one practical implication is that choice can serve as a mechanism to improve score precision.
Recently, there has been a renewed interest in the four-parameter item response theory model as a way to capture guessing and slipping behaviors in responses. Research has shown, however, that the nested three-parameter model suffers from issues of unidentifiability (San Martín et al. in Psychometrika 80:450–467, 2015), which raises concerns about the identifiability of the four-parameter model. Borrowing from recent advances in the identification of cognitive diagnostic models, in particular, the DINA model (Gu and Xu in Stat Sin https://doi.org/10.5705/ss.202018.0420, 2019), a new model is proposed with restrictions inspired by this literature to address the identification issue. Specifically, we show conditions under which the four-parameter model is strictly and generically identified. These conditions inform the presentation of a new exploratory model, which we call the dyad four-parameter normal ogive (Dyad-4PNO) model. This model is developed by placing a hierarchical structure on the DINA model and imposing equality constraints on a priori unknown dyads of items. We present a Bayesian formulation of this model, and show that model parameters can be accurately recovered. Finally, we apply the model to a real dataset.
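Under one common parameterization (an assumption here, since conventions vary), the four-parameter normal ogive response function can be written down directly; a small Python sketch is:

```python
from scipy.stats import norm

def four_pno(theta, a, b, gamma, delta):
    """Four-parameter normal ogive response probability (one common parameterization).

    P(X = 1 | theta) = gamma + (1 - gamma - delta) * Phi(a * theta - b),
    where gamma is the guessing (lower asymptote) parameter and delta the
    slipping parameter, so 1 - delta is the upper asymptote.
    """
    return gamma + (1.0 - gamma - delta) * norm.cdf(a * theta - b)

# a high-ability examinee can still miss the item when delta > 0
print(four_pno(theta=2.5, a=1.2, b=0.0, gamma=0.15, delta=0.10))
```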
This paper proposes a general approach to accounting for individual differences in the extreme response style in statistical models for ordered response categories. This approach uses a hierarchical ordinal regression modeling framework with heterogeneous threshold structures to account for individual differences in the response style. Markov chain Monte Carlo algorithms for Bayesian inference for models with heterogeneous threshold structures are discussed in detail. A simulation and two examples based on ordinal probit models are given to illustrate the proposed methodology. The simulation and examples also demonstrate that failing to account for individual differences in the extreme response style can have adverse consequences for statistical inferences.
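The core idea can be sketched with hypothetical person-specific thresholds in an ordinal probit likelihood: an extreme responder has thresholds compressed toward the center, which inflates the outer categories. A minimal Python illustration (made-up numbers) is:

```python
import numpy as np
from scipy.stats import norm

def ordinal_probit_probs(mu, thresholds):
    """Category probabilities of an ordinal probit model.

    P(Y = k) = Phi(tau_k - mu) - Phi(tau_{k-1} - mu), with tau_0 = -inf and
    tau_K = +inf. Person-specific `thresholds` capture response style.
    """
    tau = np.concatenate(([-np.inf], thresholds, [np.inf]))
    return np.diff(norm.cdf(tau - mu))

mu = 0.3                                    # latent mean for one respondent
neutral = np.array([-1.5, -0.5, 0.5, 1.5])  # illustrative thresholds (5 categories)
extreme = 0.5 * neutral                     # compressed thresholds: extreme style
print(ordinal_probit_probs(mu, neutral))
print(ordinal_probit_probs(mu, extreme))    # more mass in the outer categories
```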
Cognitive diagnosis models are partially ordered latent class models and are used to classify students into skill mastery profiles. The deterministic inputs, noisy “and” gate model (DINA) is a popular psychometric model for cognitive diagnosis. Application of the DINA model requires content expert knowledge of a Q matrix, which maps the attributes or skills needed to master a collection of items. Misspecification of Q has been shown to yield biased diagnostic classifications. We propose a Bayesian framework for estimating the DINA Q matrix. The developed algorithm builds upon prior research (Chen, Liu, Xu, & Ying, in J Am Stat Assoc 110(510):850–866, 2015) and ensures the estimated Q matrix is identified. Monte Carlo evidence is presented to support the accuracy of parameter recovery. The developed methodology is applied to Tatsuoka’s fraction-subtraction dataset.
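For readers unfamiliar with the model, the DINA item response probability is compact; a minimal sketch (the Q-matrix row and parameters below are made up for illustration) is:

```python
import numpy as np

def dina_prob(alpha, q, slip, guess):
    """DINA probability of a correct response.

    alpha : 0/1 attribute-mastery profile of the examinee (length K)
    q     : 0/1 row of the Q matrix for the item (length K)
    eta = 1 iff the examinee masters every attribute the item requires;
    P(correct) = (1 - slip) ** eta * guess ** (1 - eta).
    """
    eta = int(np.all(alpha >= q))
    return (1.0 - slip) ** eta * guess ** (1 - eta)

# illustrative item requiring attributes 1 and 3; examinee masters all three
print(dina_prob(alpha=np.array([1, 1, 1]), q=np.array([1, 0, 1]),
                slip=0.1, guess=0.2))       # -> 0.9
```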
Owen (1975) proposed an approximate empirical Bayes procedure for item selection in computerized adaptive testing (CAT). The procedure replaces the true posterior by a normal approximation with closed-form expressions for its first two moments. This approximation was necessary to minimize the computational complexity involved in a fully Bayesian approach but is no longer necessary given the computational power currently available for adaptive testing. This paper suggests several item selection criteria for adaptive testing which are all based on the use of the true posterior. Some of the statistical properties of the ability estimator produced by these criteria are discussed and empirically characterized.
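One posterior-based criterion of this kind, minimum expected posterior variance, can be sketched with a grid approximation of the posterior; the 2PL item model, grid, and item pool below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def p_correct(theta, a, b):
    # 2PL response probability (an assumed item model for illustration)
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def expected_posterior_variance(theta_grid, posterior, a, b):
    """Expected posterior variance of ability after administering item (a, b)."""
    p = p_correct(theta_grid, a, b)
    pred_correct = np.sum(posterior * p)          # predictive P(correct)
    def post_var(weights):
        w = weights / weights.sum()
        m = np.sum(theta_grid * w)
        return np.sum(w * (theta_grid - m) ** 2)
    return (pred_correct * post_var(posterior * p)
            + (1 - pred_correct) * post_var(posterior * (1 - p)))

theta = np.linspace(-4, 4, 401)
posterior = np.exp(-0.5 * theta**2)
posterior /= posterior.sum()                      # current posterior on a grid
pool = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.2)]      # candidate items (a, b)
best = min(pool, key=lambda ab: expected_posterior_variance(theta, posterior, *ab))
print("next item:", best)
```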
Cultural Consensus Theory (CCT) models have been applied extensively across research domains in the social and behavioral sciences in order to explore shared knowledge and beliefs. CCT models operate on response data, in which the answer key is latent. The current paper develops methods to enhance the application of these models by developing the appropriate specifications for hierarchical Bayesian inference. A primary contribution is the methodology for integrating the use of covariates into CCT models. More specifically, both person- and item-related parameters are introduced as random effects that can respectively account for patterns of inter-individual and inter-item variability.
This Element highlights the employment within archaeology of classification methods developed in the field of chemometrics, artificial intelligence, and Bayesian statistics. These run in both high- and low-dimensional environments and often have better results than traditional methods. Instead of a theoretical approach, it provides examples of how to apply these methods to real data using lithic and ceramic archaeological materials as case studies. A detailed explanation of how to process data in R (The R Project for Statistical Computing), as well as the respective code, are also provided in this Element.
The design of gas turbine combustors for optimal operation at different power ratings is a multifaceted engineering task, as it requires the consideration of several objectives that must be evaluated under different test conditions. We address this challenge by presenting a data-driven approach that uses multiple probabilistic surrogate models derived from Gaussian process regression to automatically select optimal combustor designs from a large parameter space, requiring only a few experimental data points. We present two strategies for surrogate model training that differ in terms of required experimental and computational efforts. Depending on the measurement time and cost for a target, one of the strategies may be preferred. We apply the methodology to train three surrogate models under operating conditions where the corresponding design objectives are critical: reduction of NOx emissions, prevention of lean flame extinction, and mitigation of thermoacoustic oscillations. Once trained, the models can be flexibly used for different forms of a posteriori design optimization, as we demonstrate in this study.
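A stripped-down sketch of the surrogate idea (synthetic data and a single design variable assumed; not the authors' implementation): fit a Gaussian process to a few measured objective values, then use its probabilistic predictions to screen a large set of candidate designs.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
# a few measured designs (1-D design parameter) and their objective values
X_meas = rng.uniform(0, 1, size=(8, 1))
y_meas = np.sin(6 * X_meas[:, 0]) + 0.05 * rng.standard_normal(8)

gp = GaussianProcessRegressor(kernel=RBF(0.2) + WhiteKernel(1e-2), normalize_y=True)
gp.fit(X_meas, y_meas)

# screen a dense grid of candidate designs with the probabilistic surrogate
X_cand = np.linspace(0, 1, 500).reshape(-1, 1)
mean, std = gp.predict(X_cand, return_std=True)
best = X_cand[np.argmin(mean)]              # design with lowest predicted objective
print("selected design:", best, "predicted objective:", mean.min())
```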
In this Element, the authors introduce Bayesian probability and inference for social science students and practitioners starting from the absolute beginning and walk readers steadily through the Element. No previous knowledge is required other than that in a basic statistics course. At the end of the process, readers will understand the core tenets of Bayesian theory and practice in a way that enables them to specify, implement, and understand models using practical social science data. Chapters will cover theoretical principles and real-world applications that provide motivation and intuition. Because Bayesian methods are intricately tied to software, code in both R and Python is provided throughout.
This Element highlights the employment within archaeology of classification methods developed in the field of chemometrics, artificial intelligence, and Bayesian statistics. These operate in both high- and low-dimensional environments and often have better results than traditional methods. The basic principles and main methods are introduced with recommendations for when to use them.
The Bayesian statistical method is one of the best approaches for investigating and calculating a desired quantity from limited available data. It has recently entered the field of nuclear astrophysics, where it can be used to evaluate astrophysical S-factors, cross sections and, as a result, the nuclear reaction rates of Big Bang Nucleosynthesis. This study uses the method to calculate the astrophysical S-factor and the rate of the T(d,n)4He reaction, an important astrophysical reaction, at energies below the electron repulsive barrier. The analysis is carried out in the R software and yields improved results compared with non-Bayesian methods for this reaction rate.
Gaussian graphical models are useful tools for inferring the conditional independence structure of multivariate random variables. Unfortunately, Bayesian inference of latent graph structures is challenging due to the exponential growth of $\mathcal{G}_n$, the set of all graphs on n vertices. One approach that has been proposed to tackle this problem is to limit the search to subsets of $\mathcal{G}_n$. In this paper we study subsets that are vector subspaces, with the cycle space $\mathcal{C}_n$ as the main example. We propose a novel prior on $\mathcal{C}_n$ based on linear combinations of cycle basis elements and present its theoretical properties. Using this prior, we implement a Markov chain Monte Carlo algorithm, and show that (i) posterior edge inclusion estimates computed with our technique are comparable to estimates from the standard technique despite searching a smaller graph space, and (ii) the vector space perspective enables straightforward implementation of MCMC algorithms.
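To make the cycle-space idea concrete, a small sketch (using networkx, not the authors' code) builds a cycle basis and forms one element of $\mathcal{C}_n$ as a GF(2) linear combination, i.e. a symmetric difference of basis-cycle edge sets:

```python
import networkx as nx

G = nx.complete_graph(5)                 # small example graph
basis = nx.cycle_basis(G)                # cycle basis: list of node lists

def cycle_edges(cycle):
    """Edge set of a cycle given as an ordered list of nodes."""
    return {frozenset(e) for e in zip(cycle, cycle[1:] + cycle[:1])}

def combine(basis, coeffs):
    """GF(2) combination of basis cycles = symmetric difference of edge sets."""
    edges = set()
    for cyc, z in zip(basis, coeffs):
        if z:
            edges ^= cycle_edges(cyc)
    return edges

# every 0/1 coefficient vector indexes one graph in the cycle space
coeffs = [1, 0, 1] + [0] * (len(basis) - 3)
print(sorted(tuple(sorted(e)) for e in combine(basis, coeffs)))
```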
Under mild assumptions, we show that the exact convergence rate in total variation is also exact in weaker Wasserstein distances for the Metropolis–Hastings independence sampler. We develop new upper and lower bounds on the worst-case Wasserstein distance when the chain is initialized at a point. For an arbitrary point initialization, we show that the convergence rate is the same and matches the convergence rate in total variation. We derive exact convergence expressions for more general Wasserstein distances when initialization is at a specific point. Using optimization, we construct a novel centered independent proposal to develop exact convergence rates in Bayesian quantile regression and many generalized linear model settings. We show that the exact convergence rate can be upper bounded in Bayesian binary response regression (e.g. logistic and probit) when the sample size and dimension grow together.
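For reference, the independence sampler itself is only a few lines; a generic sketch with assumed log-density callables (a toy normal target and Student-t proposal, not the settings analyzed in the paper) is:

```python
import numpy as np
from scipy.stats import norm, t

def independence_mh(log_target, sample_proposal, log_proposal, x0, n_iter, seed=0):
    """Metropolis-Hastings with an independence proposal q (independent of the
    current state). Acceptance probability is min(1, w(y) / w(x)), where
    w(x) = pi(x) / q(x) is the importance weight of the state."""
    rng = np.random.default_rng(seed)
    x = x0
    log_w_x = log_target(x) - log_proposal(x)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        y = sample_proposal(rng)
        log_w_y = log_target(y) - log_proposal(y)
        if np.log(rng.uniform()) < log_w_y - log_w_x:
            x, log_w_x = y, log_w_y
        chain[i] = x
    return chain

# toy example: standard normal target, Student-t proposal with 3 d.o.f.
draws = independence_mh(norm.logpdf, lambda rng: rng.standard_t(3),
                        lambda x: t.logpdf(x, 3), x0=0.0, n_iter=5000)
print(draws.mean(), draws.std())
```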
The superposition of data sets with internal parametric self-similarity is a longstanding and widespread technique for the analysis of many types of experimental data across the physical sciences. Typically, this superposition is performed manually, or recently through the application of one of a few automated algorithms. However, these methods are often heuristic in nature, are prone to user bias via manual data shifting or parameterization, and lack a native framework for handling uncertainty in both the data and the resulting model of the superposed data. In this work, we develop a data-driven, nonparametric method for superposing experimental data with arbitrary coordinate transformations, which employs Gaussian process regression to learn statistical models that describe the data, and then uses maximum a posteriori estimation to optimally superpose the data sets. This statistical framework is robust to experimental noise and automatically produces uncertainty estimates for the learned coordinate transformations. Moreover, it is distinguished from black-box machine learning in its interpretability—specifically, it produces a model that may itself be interrogated to gain insight into the system under study. We demonstrate these salient features of our method through its application to four representative data sets characterizing the mechanics of soft materials. In every case, our method replicates results obtained using other approaches, but with reduced bias and the addition of uncertainty estimates. This method enables a standardized, statistical treatment of self-similar data across many fields, producing interpretable data-driven models that may inform applications such as materials classification, design, and discovery.
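A stripped-down sketch of the superposition idea (synthetic one-dimensional data and a single horizontal shift assumed; the published method handles general coordinate transformations and full uncertainty propagation): learn a Gaussian process model of a reference data set, then estimate the shift of a second data set by maximizing its likelihood under that model.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
# reference data set and a second set that is a horizontally shifted copy
x_ref = np.linspace(0, 8, 60)
y_ref = np.sin(x_ref) + 0.05 * rng.standard_normal(x_ref.size)
true_shift = 1.3
x_new = np.linspace(0, 5, 40)
y_new = np.sin(x_new + true_shift) + 0.05 * rng.standard_normal(x_new.size)

# learn a statistical model of the reference curve
gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-2), normalize_y=True)
gp.fit(x_ref.reshape(-1, 1), y_ref)

def neg_log_pred(shift):
    """Negative Gaussian log-likelihood of the shifted data under the GP model."""
    mu, sd = gp.predict((x_new + shift).reshape(-1, 1), return_std=True)
    return np.sum(0.5 * ((y_new - mu) / sd) ** 2 + np.log(sd))

res = minimize_scalar(neg_log_pred, bounds=(0.0, 3.0), method="bounded")
print("estimated shift:", res.x)    # should be close to true_shift
```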
This new graduate textbook adopts a pedagogical approach to contemporary cosmology that enables readers to build an intuitive understanding of theory and data, and of how they interact, which is where the greatest advances in the field are currently being made. Using analogies, intuitive explanations of complex topics, worked examples and computational problems, the book begins with the physics of the early universe, and goes on to cover key concepts such as inflation, dark matter and dark energy, large‑scale structure, and cosmic microwave background. Computational and data analysis techniques, and statistics, are integrated throughout the text, particularly in the chapters on late-universe cosmology, while another chapter is entirely devoted to the basics of statistical methods. A solutions manual for end-of-chapter problems is available to instructors, and suggested syllabi, based on different course lengths and emphasis, can be found in the Preface. Online computer code and datasets enhance the student learning experience.
This chapter reviews statistics and data-analysis tools. Starting from basic statistical concepts such as mean, variance, and the Gaussian distribution, we introduce the principal tools required for data analysis. We discuss both Bayesian and frequentist statistical approaches, with emphasis on the former. This leads us to describe how to calculate the goodness of fit of data to theory, and how to constrain the parameters of a model. Finally, we introduce and explain, both intuitively and mathematically, two important statistical tools: Markov chain Monte Carlo (MCMC) and the Fisher information matrix.
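As a small worked example of the second tool (a toy linear model, not taken from the chapter), the Fisher information matrix for a Gaussian likelihood with model y(x) = A + Bx and known errors σ is F_ij = Σ σ⁻² (∂μ/∂θ_i)(∂μ/∂θ_j), and its inverse forecasts the parameter covariance:

```python
import numpy as np

# Fisher forecast for a toy linear model y(x) = A + B * x with Gaussian errors
x = np.linspace(0.0, 1.0, 50)
sigma = 0.1                                   # assumed measurement error per point
dmu = np.stack([np.ones_like(x), x])          # model derivatives w.r.t. (A, B)
F = dmu @ dmu.T / sigma**2                    # Fisher information matrix
cov = np.linalg.inv(F)                        # forecast parameter covariance
print("1-sigma forecasts for (A, B):", np.sqrt(np.diag(cov)))
```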
The Bayesian approach is a way to interpret a study within the context of the entire literature. It is an important method because it keeps a single study from being given undue weight. It also allows clinical experience to be taken into account.