This cross-sectional ecological study described fruit and vegetable (F&V) intake variability across 144 cities in 8 Latin American countries and by city-level contextual variables. Data came from health surveys and censuses in Argentina, Brazil, Chile, Colombia, El Salvador, Guatemala, Mexico, and Peru. Self-reported frequency of F&V intake was harmonised across surveys, and daily F&V intake was defined as consumption on 7 days of the week. Using a mixed-effects model, we estimated age- and sex-standardised city prevalences of daily F&V intake. Through Kruskal–Wallis tests, we compared city F&V daily intake prevalence by tertiles of city variables related to women's empowerment, socio-economics, and climate zones. The median prevalence of daily F&V intake was 55.7% across all cities (range 22.1% to 85.4%). Compared with the least favourable tertile of city conditions, daily F&V intake prevalence was higher for cities within the most favourable tertile of per capita GDP (median = 65.7% vs. 53.0%), labour force participation (median = 68.7% vs. 49.4%), women's achievement-labour force score (median = 63.9% vs. 45.7%), and gender inequality index (median = 58.6% vs. 48.6%). Prevalences were also higher for temperate climate zones than for arid climate zones (median = 65.9% vs. 50.6%). No patterns were found by city level of educational attainment, city size, or population density. This study provides evidence that the prevalence of daily F&V intake varies across Latin American cities and may be favoured by higher socio-economic development, women's empowerment, and a temperate climate. Interventions to improve F&V intake in Latin America should consider the behavioural disparities related to underlying local social, economic, and climate zone characteristics.
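As a rough illustration of the tertile comparison described in this abstract, the sketch below runs a Kruskal–Wallis test of city-level prevalences across GDP tertiles. The data, variable names, and numbers are hypothetical stand-ins, not the study's.

```python
import numpy as np
import pandas as pd
from scipy.stats import kruskal

rng = np.random.default_rng(42)
cities = pd.DataFrame({
    "fv_prevalence": rng.uniform(22.1, 85.4, 144),  # % with daily F&V intake
    "gdp_per_capita": rng.lognormal(9.5, 0.5, 144),
})

# Split cities into tertiles of per capita GDP, then compare the
# distribution of daily F&V prevalence across the tertiles.
cities["gdp_tertile"] = pd.qcut(cities["gdp_per_capita"], 3,
                                labels=["low", "mid", "high"])
groups = [g["fv_prevalence"].values
          for _, g in cities.groupby("gdp_tertile", observed=True)]
stat, p = kruskal(*groups)
print(f"Kruskal-Wallis H = {stat:.2f}, p = {p:.3f}")
print(cities.groupby("gdp_tertile", observed=True)["fv_prevalence"].median())
```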
Differential item functioning (DIF) analysis is an important step in establishing the validity of measurements. Most traditional methods for DIF analysis use an item-by-item strategy via anchor items that are assumed DIF-free. If the anchor items are flawed, these methods yield misleading results due to biased scales. In this article, based on the fact that an item's relative change of difficulty difference (RCD) does not depend on the mean ability of the individual groups, a new DIF detection method (RCD-DIF) is proposed that compares the observed differences against those from simulated data known to be DIF-free. The RCD-DIF method consists of a D-QQ (quantile–quantile) plot that permits the identification of internal reference points (similar to anchor items), an RCD-QQ plot that facilitates visual examination of DIF, and an RCD graphical test that synchronizes DIF analysis at the test level with that at the item level via confidence intervals on individual items. The RCD procedure visually reveals the overall pattern of DIF in the test and the size of DIF for each item, and it is expected to work properly even when the majority of the items possess DIF and the DIF pattern is unbalanced. Results of two simulation studies indicate that the RCD graphical test has Type I error rates comparable to those of existing methods but greater power.
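A loose sketch of the QQ-plot idea behind the method: plot observed between-group differences in item difficulty against reference quantiles from DIF-free replications, so items off the agreement line stand out. This is an illustration of the general idea under made-up numbers, not the authors' implementation.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
J = 40
dif = np.zeros(J)
dif[:8] = 0.6                          # 8 items carry DIF of size 0.6
impact = 0.3                           # group difference in mean ability
# "estimated" focal-minus-reference difficulty differences, with noise
obs_diff = np.sort(impact + dif + rng.normal(0, 0.1, J))

# reference quantiles from DIF-free replications (impact only, no DIF)
sim = impact + rng.normal(0, 0.1, (500, J))
ref_q = np.sort(sim, axis=1).mean(axis=0)

plt.plot(ref_q, obs_diff, "o")
plt.plot(ref_q, ref_q, "--")           # agreement line: no DIF
plt.xlabel("DIF-free reference quantiles")
plt.ylabel("observed difficulty differences")
plt.show()
```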
Behavioral and psychological researchers have shown strong interest in investigating contextual effects (i.e., the influences of combinations of individual- and group-level predictors on individual-level outcomes). The present research provides generalized formulas for determining the sample size needed to investigate contextual effects at the desired level of statistical power as well as the desired width of the confidence interval. These formulas are derived within a three-level random intercept model that includes one predictor/contextual variable at each level so as to simultaneously cover the various kinds of contextual effects in which researchers may be interested. The relative influences of the indices included in the formulas on the standard errors of contextual effect estimates are investigated with the aim of further simplifying the sample size determination procedure. In addition, simulation studies are performed to investigate the finite-sample behavior of the calculated statistical power, showing that sample sizes estimated from the derived formulas can be both positively and negatively biased due to the complex effects of unreliability of contextual variables, multicollinearity, and violation of the assumption of known variances. Thus, it is advisable to compare estimated sample sizes under various specifications of the indices and to evaluate their potential bias, as illustrated in the example.
In practice, it is common that a best-fitting structural equation model (SEM) is selected from a set of candidate SEMs and inference is conducted conditional on the selected model. Such post-selection inference ignores model selection uncertainty and yields overly optimistic inference. Using the largest candidate model avoids model selection uncertainty but introduces a large variance. Jin and Ankargren (Psychometrika 84:84–104, 2019) proposed frequentist model averaging in SEM with continuous data as a compromise between model selection and the full model. They assumed that the true values of the parameters depend on $n^{-1/2}$, with n being the sample size, which is known as a local asymptotic framework. This paper shows that their results are not directly applicable to SEM with ordinal data. To address this issue, we prove consistency and asymptotic normality of the polychoric correlation estimators under the local asymptotic framework. We then propose a new frequentist model averaging estimator and a valid confidence interval that are suitable for ordinal data. Goodness-of-fit test statistics for the model averaging estimator are also derived.
Researchers have widely used exploratory factor analysis (EFA) to learn the latent structure underlying multivariate data. Rotation and regularised estimation are two classes of methods in EFA that are often used to find interpretable loading matrices. In this paper, we propose a new family of oblique rotations based on component-wise $L^p$ loss functions $(0 < p \le 1)$ that is closely related to an $L^p$ regularised estimator. We develop model selection and post-selection inference procedures based on the proposed rotation method. When the true loading matrix is sparse, the proposed method tends to outperform traditional rotation and regularised estimation methods in terms of statistical accuracy and computational cost. Since the proposed loss functions are nonsmooth, we develop an iteratively reweighted gradient projection algorithm for solving the optimisation problem. We also develop theoretical results that establish the statistical consistency of the estimation, model selection, and post-selection inference. We evaluate the proposed method and compare it with regularised estimation and traditional rotation methods via simulation studies. We further illustrate it with an application to the Big Five personality assessment.
In a recent paper, Bedrick derived the asymptotic distribution of Lord's modified sample biserial correlation estimator and studied its efficiency for bivariate normal populations. We present a more detailed examination of the properties of Lord's estimator and several competitors, including Brogden's estimator. We show that Lord's estimator is more efficient for three nonnormal distributions than a generalization of Pearson's sample biserial estimator. In addition, Lord's estimator is reasonably efficient relative to the maximum likelihood estimator for these distributions. These conclusions are consistent with Bedrick's results for the bivariate normal distribution. We also study the small sample bias and variance of Lord's estimator, and the coverage properties of several confidence interval estimates.
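For context, the sketch below computes the classical (Pearson-type) biserial estimator that the generalization above builds on: the point-biserial correlation rescaled by the normal ordinate at the dichotomization threshold. Lord's and Brogden's modifications differ in detail and are not reimplemented here.

```python
import numpy as np
from scipy.stats import norm, pearsonr

def biserial(x, d):
    """Biserial correlation between continuous x and dichotomized d (0/1)."""
    p = d.mean()
    r_pb = pearsonr(x, d)[0]                 # point-biserial correlation
    y = norm.pdf(norm.ppf(p))                # normal ordinate at the threshold
    return r_pb * np.sqrt(p * (1 - p)) / y

rng = np.random.default_rng(3)
z = rng.multivariate_normal([0, 0], [[1, .6], [.6, 1]], size=500)
x, d = z[:, 0], (z[:, 1] > 0).astype(float)  # dichotomize the second variable
print(f"biserial estimate: {biserial(x, d):.3f} (population value 0.6)")
```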
In this paper, we apply sequential one-sided confidence interval estimation procedures with β-protection to adaptive mastery testing. The procedures of fixed-width and fixed proportional accuracy confidence interval estimation can be viewed as extensions of one-sided confidence interval procedures. It can be shown that an adaptive mastery testing procedure based on a one-sided confidence interval with β-protection is more efficient in terms of test length than one based on a two-sided/fixed-width confidence interval. We conduct simulation studies applying the one-sided confidence interval procedure and its extensions to adaptive mastery testing. For comparison, we also conduct a numerical study of adaptive mastery testing based on Wald's sequential probability ratio test, as sketched below. Performance is compared in terms of the correct classification probability, the average test length, and the width of the "indifference regions." These empirical results suggest that applying the one-sided confidence interval procedure to adaptive mastery testing is very promising.
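A minimal sketch of the Wald SPRT benchmark mentioned above, for a mastery decision under a Rasch model: classify an examinee as a master (theta at or above theta1) or non-master (theta at or below theta0). The item difficulties, thresholds, and error rates are illustrative choices.

```python
import numpy as np

def rasch_p(theta, b):
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def sprt_mastery(responses, b, theta0=-0.2, theta1=0.2, alpha=0.05, beta=0.05):
    # Wald's boundaries: continue while log(B) < LLR < log(A)
    upper, lower = np.log((1 - beta) / alpha), np.log(beta / (1 - alpha))
    llr = 0.0
    for j, y in enumerate(responses):
        p1, p0 = rasch_p(theta1, b[j]), rasch_p(theta0, b[j])
        llr += y * np.log(p1 / p0) + (1 - y) * np.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "master", j + 1          # decision and test length
        if llr <= lower:
            return "non-master", j + 1
    return "undecided", len(responses)

rng = np.random.default_rng(7)
b = rng.normal(0, 1, 200)                    # item pool difficulties
y = rng.binomial(1, rasch_p(0.8, b))         # responses for true theta = 0.8
print(sprt_mastery(y, b))
```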
Social scientists are frequently interested in assessing the qualities of social settings such as classrooms, schools, neighborhoods, or day care centers. The most common procedure requires observers to rate social interactions within these settings on multiple items and then to combine the item responses to obtain a summary measure of setting quality. A key aspect of the quality of such a summary measure is its reliability. In this paper we derive a confidence interval for reliability, a test for the hypothesis that the reliability meets a minimum standard, and the power of this test against alternative hypotheses. Next, we consider the problem of using data from a preliminary field study of the measurement procedure to inform the design of a later study that will test substantive hypotheses about the correlates of setting quality. The preliminary study is typically called the “generalizability study” or “G study” while the later, substantive study is called the “decision study” or “D study.” We show how to use data from the G study to estimate reliability, a confidence interval for the reliability, and the power of tests for the reliability of measurement produced under alternative designs for the D study. We conclude with a discussion of sample size requirements for G studies.
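A hedged sketch of the estimation step for a one-way G-study design (n settings, k ratings each): reliability of the setting mean from the ANOVA mean squares, with a standard F-based confidence interval of the Shrout–Fleiss type. The paper's exact derivations may differ; the data here are simulated.

```python
import numpy as np
from scipy.stats import f as f_dist

def setting_reliability(ratings, alpha=0.05):
    """ratings: (n_settings, k) array of scores. Returns (rel, (lo, hi))."""
    n, k = ratings.shape
    grand = ratings.mean()
    msb = k * np.sum((ratings.mean(axis=1) - grand) ** 2) / (n - 1)
    msw = np.sum((ratings - ratings.mean(axis=1, keepdims=True)) ** 2) \
          / (n * (k - 1))
    F = msb / msw
    rel = (F - 1) / F                        # reliability of the setting mean
    fl = F / f_dist.ppf(1 - alpha / 2, n - 1, n * (k - 1))
    fu = F * f_dist.ppf(1 - alpha / 2, n * (k - 1), n - 1)
    return rel, ((fl - 1) / fl, (fu - 1) / fu)

rng = np.random.default_rng(11)
true_means = rng.normal(0, 1, 30)                        # 30 settings
data = true_means[:, None] + rng.normal(0, 1, (30, 8))   # 8 ratings each
print(setting_reliability(data))
```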
It has long been part of the item response theory (IRT) folklore that under the usual empirical Bayes unidimensional IRT modeling approach, the posterior distribution of examinee ability given test response is approximately normal for a long test. Under very general and nonrestrictive nonparametric assumptions, we make this claim rigorous for a broad class of latent models.
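The folklore result is easy to check empirically. The sketch below computes the posterior of ability on a grid for a simulated 200-item 2PL test with a standard normal prior and compares it with a normal density matched in mean and standard deviation; all generating values are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
J = 200                                  # a long test
a = rng.uniform(0.8, 2.0, J)             # discriminations
b = rng.normal(0, 1, J)                  # difficulties
theta_true = 0.5
y = rng.binomial(1, 1 / (1 + np.exp(-a * (theta_true - b))))

grid = np.linspace(-4, 4, 2001)
eta = np.outer(grid, a) - a * b          # a * (theta - b) for each grid point
loglik = (y * eta - np.log1p(np.exp(eta))).sum(axis=1)
logpost = loglik + norm.logpdf(grid)     # standard normal prior
post = np.exp(logpost - logpost.max())
post /= np.trapz(post, grid)             # normalize on the grid

mean = np.trapz(grid * post, grid)
sd = np.sqrt(np.trapz((grid - mean) ** 2 * post, grid))
approx = norm.pdf(grid, mean, sd)        # moment-matched normal density
print(f"max density gap vs normal: {np.max(np.abs(post - approx)):.4f}")
```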
In applications of item response theory (IRT), it is often of interest to compute confidence intervals (CIs) for person parameters with prescribed frequentist coverage. The ubiquitous use of short tests in social science research and practice calls for a refinement of standard interval estimation procedures based on asymptotic normality, such as the Wald and Bayesian CIs, which only maintain desirable coverage when the test is sufficiently long. In the current paper, we propose a simple construction of second-order probability matching priors for the person parameter in unidimensional IRT models, which in turn yields CIs with accurate coverage even when the test is composed of only a few items. The probability matching property is established based on an expansion of the posterior distribution function and a shrinkage argument. CIs based on the proposed prior can be efficiently computed for a variety of unidimensional IRT models. A real data example with a mixed-format test and a simulation study are presented to compare the proposed method against several existing asymptotic CIs.
Reporting effect size index estimates with their confidence intervals (CIs) can be an excellent way to simultaneously communicate the strength and precision of the observed evidence. We recently proposed a robust effect size index (RESI) that is advantageous over common indices because it is widely applicable to different types of data. Here, we use statistical theory and simulations to develop and evaluate RESI estimators and confidence/credible intervals that rely on different covariance estimators. Our results show that (1) counter to intuition, the randomness of covariates reduces coverage for Chi-squared and F CIs; and (2) when the variance of the estimators is estimated, the non-central Chi-squared and F CIs using the parametric and robust RESI estimators fail to cover the true effect size at the nominal level. Using the robust estimator along with the proposed nonparametric bootstrap or Bayesian (credible) intervals provides valid inference for the RESI, even when model assumptions may be violated. This work forms a unified effect size reporting procedure, such that effect sizes with confidence/credible intervals can be easily reported in an analysis of variance (ANOVA) table format.
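A generic sketch of the nonparametric bootstrap interval idea the abstract recommends, applied here to a standardized mean difference rather than the RESI itself (the RESI computation belongs to the paper and its accompanying software, and is not reproduced here):

```python
import numpy as np

def cohens_d(x, y):
    """Standardized mean difference with a pooled standard deviation."""
    nx, ny = len(x), len(y)
    sp = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1))
                 / (nx + ny - 2))
    return (x.mean() - y.mean()) / sp

rng = np.random.default_rng(5)
x, y = rng.normal(0.5, 1, 60), rng.normal(0.0, 1, 60)

# Resample each group with replacement and recompute the effect size.
boots = np.array([
    cohens_d(rng.choice(x, len(x)), rng.choice(y, len(y)))
    for _ in range(2000)
])
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"d = {cohens_d(x, y):.2f}, 95% bootstrap CI [{lo:.2f}, {hi:.2f}]")
```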
Establishing the invariance property of an instrument (e.g., a questionnaire or test) is a key step for establishing its measurement validity. Measurement invariance is typically assessed by differential item functioning (DIF) analysis, i.e., detecting DIF items whose response distribution depends not only on the latent trait measured by the instrument but also on the group membership. DIF analysis is confounded by the group difference in the latent trait distributions. Many DIF analyses require knowing several anchor items that are DIF-free in order to draw inferences on whether each of the rest is a DIF item, where the anchor items are used to identify the latent trait distributions. When no prior information on anchor items is available, or some anchor items are misspecified, item purification methods and regularized estimation methods can be used. The former iteratively purifies the anchor set by a stepwise model selection procedure, and the latter selects the DIF-free items by a LASSO-type regularization approach. Unfortunately, unlike the methods based on a correctly specified anchor set, these methods are not guaranteed to provide valid statistical inference (e.g., confidence intervals and p-values). In this paper, we propose a new method for DIF analysis under a multiple indicators and multiple causes (MIMIC) model for DIF. This method adopts a minimal $L_1$ norm condition for identifying the latent trait distributions. Without requiring prior knowledge about an anchor set, it can accurately estimate the DIF effects of individual items and further draw valid statistical inferences for quantifying the uncertainty. Specifically, the inference results allow us to control the type-I error for DIF detection, which may not be possible with item purification and regularized estimation methods. We conduct simulation studies to evaluate the performance of the proposed method and compare it with the anchor-set-based likelihood ratio test approach and the LASSO approach. The proposed method is applied to analysing the three personality scales of the Eysenck personality questionnaire-revised (EPQ-R).
In this paper, we consider the Rasch model and suggest novel point estimators and confidence intervals for the ability parameter. They are based on a proposed confidence distribution (CD) whose construction required overcoming some difficulties essentially due to the discrete nature of the model. When the number of items is large, the combinatorial computations involved become heavy, and thus we provide first- and second-order approximations of the CD. Simulation studies show the good behavior of our estimators and intervals when compared with those obtained through other standard frequentist and weakly informative Bayesian procedures. Finally, using the expansion of the expected length of the suggested interval, we are able to identify reasonable values of the sample size that lead to a desired interval length.
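For context, the sketch below implements the standard frequentist baseline against which CD-based intervals are typically compared: Newton–Raphson maximum likelihood estimation of the Rasch ability parameter with known item difficulties, and the corresponding Wald interval. All numbers are illustrative.

```python
import numpy as np
from scipy.stats import norm

def rasch_theta_ci(y, b, alpha=0.05, iters=50):
    """Wald CI for theta; assumes a non-perfect, non-zero raw score
    (otherwise the MLE is not finite)."""
    theta = 0.0
    for _ in range(iters):                   # Newton-Raphson for the MLE
        p = 1 / (1 + np.exp(-(theta - b)))
        info = np.sum(p * (1 - p))           # Fisher information
        theta += np.sum(y - p) / info
    se = 1 / np.sqrt(info)
    z = norm.ppf(1 - alpha / 2)
    return theta, (theta - z * se, theta + z * se)

rng = np.random.default_rng(13)
b = rng.normal(0, 1, 25)                     # known item difficulties
y = rng.binomial(1, 1 / (1 + np.exp(-(0.7 - b))))  # true theta = 0.7
print(rasch_theta_ci(y, b))
```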
Studies with sensitive questions should include enough respondents to adequately address the research question. While studies with too few respondents may not yield significant conclusions, studies with an excess of respondents waste the investigators' budget. Determining the required number of participants is therefore an important step in survey sampling. In this article, we derive sample size formulas based on confidence interval estimation of prevalence for four randomized response models: Warner's randomized response model, the unrelated question model, the item count technique model, and the cheater detection model. Specifically, our sample size formulas control, with a given assurance probability, the width of a confidence interval within the planned range. Simulation results demonstrate that all formulas are accurate in terms of empirical coverage probabilities and empirical assurance probabilities. All formulas are illustrated with a real-life application concerning the use of unethical tactics in negotiation.
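A hedged sketch of a width-based sample size calculation for Warner's model, using its standard estimator and variance; the paper's formulas additionally control an assurance probability, which is not reproduced here. Under Warner's design, a respondent answers the sensitive statement with probability p and its complement otherwise, so P(yes) = p·pi + (1 − p)(1 − pi) and Var(pi_hat) = lambda(1 − lambda)/(n(2p − 1)^2).

```python
import math
from scipy.stats import norm

def warner_n(pi_guess, p, width, conf=0.95):
    """n so that the Wald CI for prevalence pi has the planned full width.

    pi_guess : anticipated prevalence of the sensitive trait
    p        : probability of answering the sensitive statement (p != 0.5)
    width    : planned full width of the confidence interval
    """
    lam = p * pi_guess + (1 - p) * (1 - pi_guess)   # P(answering "yes")
    z = norm.ppf(1 - (1 - conf) / 2)
    var_unit = lam * (1 - lam) / (2 * p - 1) ** 2   # n * Var(pi_hat)
    return math.ceil(4 * z ** 2 * var_unit / width ** 2)

print(warner_n(pi_guess=0.30, p=0.70, width=0.10))
```

The large n this returns reflects the well-known efficiency cost of the privacy protection built into Warner's design.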
Samejima's graded response model (GRM) has gained popularity in the analysis of ordinal response data in psychological, educational, and health-related assessment. Obtaining high-quality point and interval estimates for GRM parameters has attracted a great deal of attention in the literature. In the current work, we derive generalized fiducial inference (GFI) for a family of multidimensional graded response models, implement a Gibbs sampler to perform fiducial estimation, and compare its finite-sample performance with several commonly used likelihood-based and Bayesian approaches via three simulation studies. The proposed method is found to yield reliable inference even in the presence of small sample sizes and extreme generating parameter values, outperforming the other candidate methods under investigation. The use of GFI as a convenient tool to quantify sampling variability in various inferential procedures is illustrated with an empirical analysis of patient-reported emotional distress data.
A Monte Carlo experiment is conducted to investigate the performance of bootstrap methods in normal theory maximum likelihood factor analysis, both when the distributional assumption is satisfied and when it is violated. The parameters and functions of interest include unrotated loadings, analytically rotated loadings, and unique variances. The results reveal that (a) bootstrap bias estimation sometimes performs poorly for factor loadings and nonstandardized unique variances; (b) bootstrap variance estimation performs well even when the distributional assumption is violated; (c) bootstrap confidence intervals based on the Studentized statistics are recommended; and (d) if the structural hypothesis about the population covariance matrix is taken into account, then the bootstrap distribution of the normal theory likelihood ratio test statistic is close to the corresponding sampling distribution, with a slightly heavier right tail.
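A compact sketch of the basic resampling loop: bootstrap percentile intervals for loadings, with scikit-learn's FactorAnalysis standing in for the normal theory maximum likelihood estimator studied above (rotation and the Studentized refinement are omitted).

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(17)
n, p, m = 300, 6, 2
L = rng.uniform(0.4, 0.9, (p, m))                        # true loadings
X = rng.normal(size=(n, m)) @ L.T + rng.normal(0, 0.6, (n, p))

def fit_loadings(data):
    return FactorAnalysis(n_components=m, random_state=0).fit(data).components_.T

# Resample rows with replacement and refit on each bootstrap sample.
boot = np.stack([fit_loadings(X[rng.integers(0, n, n)]) for _ in range(200)])

# Percentile interval for one loading; note that sign and rotational
# indeterminacy across replications is ignored here, which a careful
# analysis must handle before forming intervals.
print(np.percentile(boot, [2.5, 97.5], axis=0)[:, 0, 0])
```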
Intraclass correlation and Cronbach's alpha are widely used to describe reliability of tests and measurements. Even with Gaussian data, exact distributions are known only for compound symmetric covariance (equal variances and equal correlations). Recently, large sample Gaussian approximations were derived for the distribution functions.
New exact results allow calculating the exact distribution function and other properties of intraclass correlation and Cronbach's alpha, for Gaussian data with any covariance pattern, not just compound symmetry. Probabilities are computed in terms of the distribution function of a weighted sum of independent chi-square random variables.
New F approximations for the distribution functions of intraclass correlation and Cronbach's alpha are much simpler and faster to compute than the exact forms. Assuming the covariance matrix is known, the approximations typically provide sufficient accuracy, even with as few as ten observations.
Either the exact or approximate distributions may be used to create confidence intervals around an estimate of reliability. Monte Carlo simulations led to a number of conclusions. Correctly assuming that the covariance matrix is compound symmetric leads to accurate confidence intervals, as was expected from previously known results. However, assuming and estimating a general covariance matrix produces somewhat optimistically narrow confidence intervals with 10 observations. Increasing sample size to 100 gives essentially unbiased coverage. Incorrectly assuming compound symmetry leads to pessimistically large confidence intervals, with pessimism increasing with sample size. In contrast, incorrectly assuming general covariance introduces only a modest optimistic bias in small samples. Hence the new methods seem preferable for creating confidence intervals, except when compound symmetry definitely holds.
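For reference, the sketch below computes Cronbach's alpha and the classical Feldt-type F interval that holds under compound symmetry, the baseline that the exact and approximate methods above extend to general covariance patterns. The simulated parallel-items data are illustrative.

```python
import numpy as np
from scipy.stats import f as f_dist

def alpha_feldt_ci(X, gamma=0.05):
    """X: (n persons, k items). Returns (alpha_hat, (lower, upper))."""
    n, k = X.shape
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    a = k / (k - 1) * (1 - item_var / total_var)   # Cronbach's alpha
    df1, df2 = n - 1, (n - 1) * (k - 1)
    # Feldt (1965): (1 - alpha) / (1 - alpha_hat) ~ F(df1, df2)
    lower = 1 - (1 - a) * f_dist.ppf(1 - gamma / 2, df1, df2)
    upper = 1 - (1 - a) * f_dist.ppf(gamma / 2, df1, df2)
    return a, (lower, upper)

rng = np.random.default_rng(19)
truth = rng.normal(0, 1, (100, 1))
X = truth + rng.normal(0, 1, (100, 10))   # 100 persons, 10 parallel items
print(alpha_feldt_ci(X))
```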
Generalized fiducial inference (GFI) has been proposed as an alternative to likelihood-based and Bayesian inference in mainstream statistics. Confidence intervals (CIs) can be constructed from a fiducial distribution on the parameter space in a fashion similar to those used with a Bayesian posterior distribution. However, no prior distribution needs to be specified, which renders GFI more suitable when no a priori information about model parameters is available. In the current paper, we apply GFI to a family of binary logistic item response theory models, which includes the two-parameter logistic (2PL), bifactor and exploratory item factor models as special cases. Asymptotic properties of the resulting fiducial distribution are discussed. Random draws from the fiducial distribution can be obtained by the proposed Markov chain Monte Carlo sampling algorithm. We investigate the finite-sample performance of our fiducial percentile CI and two commonly used Wald-type CIs associated with maximum likelihood (ML) estimation via Monte Carlo simulation. The use of GFI in high-dimensional exploratory item factor analysis is illustrated with an analysis of a set of Eysenck Personality Questionnaire data.
In educational and psychological measurement, when short test forms are used, the asymptotic normality of the maximum likelihood estimator of the person parameter of item response models does not hold. As a result, hypothesis tests or confidence intervals for the person parameter based on the normal distribution are likely to be problematic. Inferences based on the exact distribution, in contrast, do not suffer from this limitation. However, the computation involved in the exact-distribution approach is often prohibitively expensive. In this paper, we propose a general framework for constructing hypothesis tests and confidence intervals for IRT models within the exponential family based on the exact distribution. In addition, an efficient branch-and-bound algorithm for calculating the exact p value is introduced. The Type I error rate and statistical power of the proposed exact test, as well as the coverage rate and length of the associated confidence intervals, are examined through a simulation. We also demonstrate its practical use by analyzing three real data sets.
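A small sketch of the exact-distribution idea in the special case of the Rasch model, where the total score is sufficient for theta: its null distribution under H0: theta = theta0 can be built exactly by convolving the item response probabilities, yielding an exact p value with no normal approximation. (The paper's branch-and-bound algorithm targets the more general exponential-family setting; this convolution is only the simplest instance.)

```python
import numpy as np

def exact_score_pvalue(y, b, theta0):
    p = 1 / (1 + np.exp(-(theta0 - b)))   # success probabilities under H0
    dist = np.array([1.0])                # P(score = s), built item by item
    for pj in p:
        dist = np.convolve(dist, [1 - pj, pj])
    s = int(y.sum())
    # two-sided p value: twice the smaller tail mass at the observed score
    return min(1.0, 2 * min(dist[:s + 1].sum(), dist[s:].sum()))

rng = np.random.default_rng(23)
b = rng.normal(0, 1, 15)                  # a short, 15-item test
y = rng.binomial(1, 1 / (1 + np.exp(-(1.0 - b))))  # true theta = 1.0
print(exact_score_pvalue(y, b, theta0=0.0))
```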
Although dietary factors have been examined as potential risk factors for liver cancer, the evidence remains inconclusive. Using a diet-wide association analysis, we evaluated the associations of 126 foods and nutrients with the risk of liver cancer in a Chinese population. We obtained dietary intake data for 72,680 women in the Shanghai Women's Health Study from baseline dietary questionnaires. The association between each food or nutrient and liver cancer risk was quantified with Cox regression models. A false discovery rate (FDR) of 0.05 was used to determine which foods and nutrients warranted further verification. In total, 256 incident liver cancer cases were identified over 1,267,391 person-years of follow-up. At the nominal significance level (P ≤ 0.05), higher intakes of cooked wheaten foods, pear, grape, and copper were inversely associated with liver cancer risk, while spinach, leafy vegetables, eggplant, and carrots showed positive associations. After accounting for multiple comparisons, no dietary variable remained associated with liver cancer risk. Similar findings were seen in the stratified, secondary, and sensitivity analyses. In summary, we observed no significant association between dietary factors and liver cancer risk in Chinese women after accounting for multiple comparisons. More evidence is needed to explore the associations between diet and liver cancer occurrence in women.
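A minimal sketch of the multiple-comparison step described above: Benjamini–Hochberg adjustment of 126 per-food p values at an FDR of 0.05. The p values below are made up to stand in for the Cox regression results.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(29)
pvals = rng.uniform(0, 1, 126)              # one p value per food/nutrient
pvals[:4] = [0.001, 0.003, 0.02, 0.04]      # a few nominally significant

reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"nominally significant (p <= 0.05): {(pvals <= 0.05).sum()}")
print(f"significant after FDR control:     {reject.sum()}")
```

This illustrates how associations that pass the nominal threshold can fail to survive FDR control, mirroring the study's finding that no dietary variable remained significant after adjustment.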