Published online by Cambridge University Press: 21 December 2023
Definitive diagnosis of Alzheimer’s disease (AD) is often unavailable, so clinical diagnoses with some degree of inaccuracy are typically used in research instead. When researchers test methods intended to improve on clinical accuracy, error in the reference diagnosis can penalize predictions that are closer to the true diagnosis but differ from the clinical one. To address this challenge, the current study investigated a simple bias adjustment for logistic regression that accounts for known inaccuracy in the reference diagnoses.
A Bayesian logistic regression model was developed to predict unobserved/true diagnostic status given the sensitivity and specificity of an imperfect reference standard. The model treats observed cases as a mixture of true positives (rate = sensitivity) and false positives (rate = 1 - specificity), and observed controls as a mixture of true negatives (rate = specificity) and false negatives (rate = 1 - sensitivity). This bias adjustment was tested using Monte Carlo simulations over four conditions that varied the accuracy of the clinical diagnoses. Each condition comprised 1000 iterations, each generating a random dataset of n = 1000 from a true logistic model with an intercept and three arbitrary predictors. Coefficients were randomly selected in each iteration and used to produce two sets of diagnoses: true diagnoses and observed diagnoses with imperfect accuracy. The sensitivity and specificity of the simulated clinical diagnosis varied across the four conditions (C): C1 = (0.77, 0.60), C2 = (0.87, 0.44), C3 = (0.71, 0.71), and C4 = (0.83, 0.55), values derived from published accuracy of clinical AD diagnoses against autopsy-confirmed pathology. Unadjusted and bias-adjusted logistic regressions were then fit to the simulated data to assess the models’ accuracy in estimating regression parameters and predicting true diagnoses.
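The mixture likelihood described above can be written down directly: the probability of an observed case is sensitivity times the probability of a true case plus (1 - specificity) times the probability of a true control. The following is a minimal maximum-likelihood sketch of that adjustment (the study itself fit a Bayesian model); the variable names, the arbitrary coefficient values, and the use of scipy.optimize are illustrative assumptions, with error rates taken from condition C1 (0.77, 0.60):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic sigmoid

def neg_loglik(beta, X, y, sens, spec):
    """Negative log-likelihood treating observed cases as a mixture of
    true positives (rate = sens) and false positives (rate = 1 - spec)."""
    p_true = expit(X @ beta)                            # P(true case | X)
    p_obs = sens * p_true + (1 - spec) * (1 - p_true)   # P(observed case | X)
    p_obs = np.clip(p_obs, 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p_obs) + (1 - y) * np.log(1 - p_obs))

# Simulate one iteration: true diagnoses from a logistic model, then
# imperfect observed diagnoses using the C1 error rates (0.77, 0.60).
rng = np.random.default_rng(0)
n, sens, spec = 20_000, 0.77, 0.60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta_true = np.array([-0.5, 1.0, -0.8, 0.5])   # arbitrary coefficients
d_true = rng.random(n) < expit(X @ beta_true)
d_obs = np.where(d_true,
                 rng.random(n) < sens,          # cases detected at rate sens
                 rng.random(n) < 1 - spec)      # controls mislabeled at 1 - spec

adj = minimize(neg_loglik, np.zeros(4), args=(X, d_obs, sens, spec)).x
unadj = minimize(neg_loglik, np.zeros(4), args=(X, d_obs, 1.0, 1.0)).x
# The bias-adjusted fit tracks beta_true; the unadjusted fit is attenuated
# toward zero because misclassification flattens the observed relationship.
```

Setting sens = spec = 1.0 collapses the mixture to an ordinary logistic regression, which is why the same likelihood function serves for both fits.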
Under all conditions, the bias-adjusted logistic regression model outperformed its unadjusted counterpart. Root mean square error (the variability of estimated coefficients around their true parameter values) ranged from 0.23 to 0.79 for the unadjusted model versus 0.24 to 0.29 for the bias-adjusted model. The empirical coverage rate (the proportion of 95% credible intervals that contain the true parameter) ranged from 0.00 to 0.47 for the unadjusted model versus 0.95 to 0.96 for the bias-adjusted model. Finally, the bias-adjusted model produced the best overall diagnostic accuracy, correctly classifying true diagnostic status about 78% of the time versus 62-72% without adjustment.
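The two evaluation metrics above are straightforward to compute across simulation iterations. This is a hedged sketch with hypothetical inputs, not code from the study; the function and variable names are illustrative:

```python
import numpy as np

def rmse(estimates, truth):
    # Variability of estimated coefficients around their true value,
    # pooled across simulation iterations.
    return float(np.sqrt(np.mean((np.asarray(estimates) - truth) ** 2)))

def coverage(lower, upper, truth):
    # Proportion of 95% credible intervals that contain the true parameter.
    lower, upper = np.asarray(lower), np.asarray(upper)
    return float(np.mean((lower <= truth) & (truth <= upper)))

# Hypothetical example: four iterations estimating a coefficient
# whose true value is 1.0.
est = [0.9, 1.1, 1.2, 0.8]
print(rmse(est, 1.0))                                        # ~0.158
print(coverage([0.5, 0.9, 1.1, 0.2], [1.3, 1.4, 1.5, 0.9], 1.0))  # 0.5
```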
Results of this simulation study, which used published AD sensitivity and specificity statistics, provide evidence that bias adjustments to logistic regression models are needed when research relies on diagnoses from an imperfect reference standard. Unadjusted models rarely identified true effects: their credible intervals contained the true coefficient value anywhere from never to less than half of the time. Additional simulations are needed to examine the bias-adjusted model’s performance under a broader range of conditions, and future research should extend the adjustment to multinomial logistic regression and to scenarios where the rate of misdiagnosis is unknown. Such methods may also prove valuable for improving detection of other neurological disorders with greater diagnostic error.