Introduction
Mounting evidence supports the efficacy of Cognitive Processing Therapy (CPT; Resick, Monson, & Chard, 2017a), which is considered a first-line intervention for treating posttraumatic stress disorder (PTSD; APA, 2017; ISTSS, 2017; VA/DoD, 2017). Support comes from randomized controlled trials (Monson et al., 2006; Resick, Nishith, Weaver, Astin, & Feuer, 2002; Resick et al., 2008, 2015, 2017b) as well as clinical research (Asmundson et al., 2019; Held, Smith, Pridgen, Coleman, & Klassen, 2022c; Lloyd et al., 2015). CPT has been successfully delivered in different formats, such as the traditional 12 sessions delivered weekly (Monson et al., 2006; Resick et al., 2002, 2008, 2015, 2017b) and massed/intensive treatments, which deliver a full course of treatment in as little as one to three weeks (Galovski et al., 2021; Held et al., 2022a, 2022c).
Effect sizes for PTSD severity reduction in CPT are generally large and meaningful whether treatment is delivered weekly or in massed format (e.g. d > 1.0; Asmundson et al., 2019; Held, Bagley, Klassen, & Pollack, 2019) and have been shown to persist for up to ten years following treatment completion (Held et al., 2020b; Resick, Williams, Suvak, Monson, & Gradus, 2012). However, not all participants benefit to the same extent (Dewar, Paradis, & Fortin, 2020). Recent research on massed CPT delivered as part of an intensive PTSD treatment program (ITP) identified four separate PTSD response trajectories (Held et al., 2021). In line with other research examining response trajectories in weekly CPT (Galovski et al., 2016; Schumm, Walter, & Chard, 2013), approximately 15% reached treatment goals within a small number of sessions and 14% did not respond to treatment in any meaningful way (Held et al., 2021). Given this variability in treatment response across treatment programs for psychiatric conditions, development of prediction models for determining who is, or is likely to be, benefitting from treatment is paramount.
The emerging emphasis on machine learning in developing prediction models in psychological medicine, as well as the increase in the types and amount of data collected in the field, has led to increased use of these methods for various applications, including tracking treatment response (Shatte et al., 2019). Such approaches often differ from traditional statistical approaches in their emphasis on prediction accuracy rather than probabilistic emphasis on specific predictors and aspects of their relationships with outcomes (e.g. slopes or odds ratios). Machine learning models are able to accommodate a larger number of variables as predictors than generally found in traditional statistical approaches. Although some baseline predictors, such as baseline PTSD severity or negative posttraumatic cognitions, have been shown to be useful in predicting such non-responders, the amount of variability in post-treatment PTSD and depression severity that can be accounted for solely via baseline assessment is usually limited (Held et al., 2021, 2022b; Hilbert et al., 2020; Nixon et al., 2021).
Primarily focusing on baseline predictors may be important for initial determination of the appropriateness of a treatment program for an individual (Held et al., 2021; Hilbert et al., 2020; Nixon et al., 2021); however, such models also involve considerable uncertainty given the dynamic nature of treatment response over time. The recent emphasis on implementation of precision medicine approaches (Aafjes-van Doorn, Kamsteeg, Bate, & Aafjes, 2021; Chekroud et al., 2021; Delgadillo, 2021; Hilbert et al., 2020) necessitates identification of participants who may or may not be responding to treatment as early as possible. Recently developed machine learning approaches that account for the longitudinal structure of repeated assessments hold promise for improved accuracy in predicting participants' treatment response by continuously updating models with newly acquired information about a patient's treatment response (e.g. repeatedly measured symptom severity scores). The ability to assess individual progress during treatment and update predictions of a patient's response is likely a necessary precursor to treatment adjustments in any precision medicine approach.
Although others have developed clinical prediction models of PTSD outcomes during the course of treatment (Held et al., 2022b; Nixon et al., 2021), these studies have not utilized approaches designed to accommodate the correlated structure inherent to longitudinal data, in which observations are nested within individuals, or have predicted variants of categorized non-response rather than overall PTSD severity. Given the lack of generally agreed-upon standards for what may constitute non-response to PTSD treatment (Varker et al., 2020), and in the interest of modeling the full spectrum of variability in treatment response, predicting continuous PTSD severity may be a preferred solution.
The current study aimed to examine the ability of machine learning and statistical prediction models to utilize both baseline data and updated PTSD symptom severity information throughout the program to generate increasingly accurate and informative predictions of post-treatment PTSD severity for participants in a 3-week CPT-based ITP. Three approaches were used to generate these updating predictions: Mixed Effects Random Forest (MERF; Hajjem, Bellavance, & Larocque, 2011, 2014) and Mixed Effects Bayesian Additive Regression Trees (MixedBART; Spanbauer & Sparapani, 2021), which both appropriately model random effects, and gold-standard statistical linear mixed-effects longitudinal models (LMMs). As shown previously (Held et al., 2022b), we expected that models would provide acceptable performance with baseline predictors, but that accuracy would improve throughout the program with the incorporation of updated PTSD severity information as treatment progressed and change trajectories became more apparent. Testing continuously improving models could provide foundational information for implementing a precision medicine-based approach in PTSD treatment. We were generally agnostic regarding the ability of machine learning to outperform mixed-effects regression predictions, given prior research demonstrating that machine learning approaches may not necessarily outperform standard statistical approaches in making clinical predictions (Cho et al., 2021; Christodoulou et al., 2019; Li et al., 2021).
Methods
Participants
Data utilized in this study were from 361 veterans with PTSD who completed a 3-week CPT-based ITP at Rush University Medical Center's Road Home Program: Center for Veterans and Their Families. Participants were included if they had complete data. On average, veterans in the sample were 41.46 years old (s.d. = 9.43). The majority identified as male (63.71%) and White (67.87%). Additional sample characteristics can be found in Table 1.
a χ2 or t test comparisons indicated that significant differences exist between the two programs in sex, race, service era, MST status, and PCL-5 at baseline (ps < 0.05).
b PCL-5 = PTSD Checklist for DSM-5.
Program description
During the 3-week ITP, veterans received 14 individual CPT sessions, 13 group CPT sessions, 13 group mindfulness sessions, and 12 group yoga sessions in addition to psychoeducation classes on various topics, such as sleep hygiene. A more detailed description of the ITP and its outcomes can be found elsewhere (Held et al., 2020a; Zalta et al., 2018). Veterans were eligible for the ITP if they met the diagnostic criteria for PTSD, which was verified using the Clinician-Administered PTSD Scale for DSM-5 (CAPS-5; Blevins, Weathers, Davis, Witte, & Domino, 2015; Bovin et al., 2016; Weathers et al., 2013). Exclusionary criteria were unstable housing, inability to independently complete activities of daily living, a suicide attempt in the previous 30 days, untreated psychosis or mania, or severe alcohol or drug use that would require continuous medical observation. The study procedures were approved by the Institutional Review Board at Rush University Medical Center with a waiver of consent, as all assessments were collected as a part of routine care.
Measures
Veterans were asked to provide demographic information and complete several self-report measures before and during the ITP. A complete list of all features that were used in the different analytic models as well as when they were assessed in ITP can be found in Table 2.
Clinician administered PTSD scale for DSM-5 (CAPS-5)
The CAPS-5 is a structured diagnostic PTSD assessment based on the DSM-5 criteria, administered at baseline (Weathers et al., 2018). It rates the severity of PTSD symptoms from 0 (absent) to 4 (extreme) across four clusters: intrusions, avoidance, alterations in cognition and mood, and hyperarousal. PTSD symptom severity was based on the past month. Cronbach's alpha within the current sample was 0.780.
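Internal consistency is reported for each measure as Cronbach's alpha. As a minimal sketch of how that coefficient is computed from item-level responses (the function name and the example responses are hypothetical, not taken from the study data), the standard formula can be written as:

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item scores.

    rows: one list of item scores per respondent, all the same length.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses on a 0-4 scale: 4 respondents x 3 items.
example = [[0, 1, 0], [2, 2, 1], [3, 3, 4], [4, 4, 3]]
alpha = cronbach_alpha(example)
```

Values closer to 1 indicate that the items vary together, which is the sense in which the alphas reported for each measure describe scale reliability in this sample.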
PTSD checklist for DSM-5 (PCL-5)
The PCL-5 is a self-report measure that assesses PTSD severity (Weathers et al., 2013). Individuals were asked to rate how much they were bothered by each of the 20 PTSD symptoms from 0 (not at all) to 4 (extremely). PTSD symptom severity was rated for the past month at intake and for the past week at every subsequent timepoint. In the 3-week program, the PCL-5 was administered at baseline, on days 2, 3, 5, 6, 8, 10, 11, and 13, and at post-treatment. A total score of 33 is considered the threshold for ‘probable PTSD.’ Cronbach's alphas ranged from 0.897 to 0.962 across study timepoints.
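The scoring described above can be illustrated with a short sketch. It assumes the total is the simple sum of the 20 item ratings and that a total at or above 33 counts as probable PTSD; the function names are hypothetical.

```python
PCL5_ITEMS = 20    # number of symptom items described in the text
PCL5_CUTOFF = 33   # threshold for 'probable PTSD' given in the text

def pcl5_total(item_ratings):
    """Sum the 20 item ratings (each 0-4) into a total score from 0 to 80."""
    if len(item_ratings) != PCL5_ITEMS:
        raise ValueError("the PCL-5 has 20 items")
    if any(r not in (0, 1, 2, 3, 4) for r in item_ratings):
        raise ValueError("each item is rated from 0 to 4")
    return sum(item_ratings)

def probable_ptsd(total):
    """Assumption: scores at or above the cutoff are flagged."""
    return total >= PCL5_CUTOFF

# Hypothetical respondent rating every item as 2 (moderately).
total = pcl5_total([2] * PCL5_ITEMS)
```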
Patient health questionnaire (PHQ-9)
The PHQ-9 is a 9-item self-report measure of depressive symptoms (Kroenke, Spitzer, & Williams, 2001). Individuals were asked to rate how much they were bothered by their depression symptoms from 0 (not at all) to 3 (nearly every day). For the present study, depression symptoms were assessed for the past two weeks at baseline. Cronbach's alpha within the current sample was 0.810.
Posttrauma cognition inventory (PTCI)
The PTCI is a 33-item self-report measure of negative posttrauma cognitions that was administered at baseline (Foa, Ehlers, Clark, Tolin, & Orsillo, 1999). Individuals were asked to rate how much they agreed or disagreed with a range of beliefs from 1 (totally disagree) to 7 (totally agree). Cronbach's alpha among study participants was 0.951.
Alcohol use disorder identification test – consumption (AUDIT-C)
The AUDIT-C is a 3-item self-report measure of alcohol consumption (Bush, Kivlahan, McDonell, Fihn, & Bradley, 1998). Individuals were asked to rate how often they drank, how many drinks they had when they were drinking, and how often they had six or more drinks on one occasion. The AUDIT-C assessed alcohol consumption over the past year and was administered at baseline. Cronbach's alpha in this study was 0.866.
Neurobehavioral symptom inventory – 10-item validity scale (VAL-10)
The VAL-10 is a 10-item self-report scale made up of items from the Neurobehavioral Symptom Inventory, assessed at baseline (Vanderploeg et al., 2014). The items were selected to identify individuals who may be over-reporting neurobehavioral symptoms. Cronbach's alpha among study participants was 0.907.
Analytic strategy
We employed three mixed-effects-based prediction models designed to accommodate the longitudinal structure inherent to assessment of symptom severity during and at the end of the treatment program. The first, Mixed Effects Bayesian Additive Regression Trees (MixedBART), is a recently developed non-parametric Bayesian approach that accommodates random effects within machine learning. This approach utilizes an ensemble of decision trees to predict response. Priors, which represent existing beliefs regarding quantities or distributions in Bayesian analysis, are placed on program parameters, including variable selection probabilities. MixedBART and BART default parameters regarding priors and number of trees, without extensive cross-validation, are generally adequate and outperform other machine learning and statistical methods under many conditions. Based on insight from previous work (Held et al., 2022b), we used Dirichlet, rather than uniform, priors for variable selection probabilities. This allows models to adapt to the existence of more useful predictors in the dataset, thus accommodating the expectation that clinical features and updated PTSD severity values are likely to be more useful in prediction than demographic features (Held et al., 2021, 2022b). As a Bayesian analytic method, MixedBART approaches inference by sampling from the posterior distribution generated computationally utilizing existing data and relevant priors. We used 10 000 posterior draws with 5000 burn-in draws, which was a conservative approach compared to other applications of MixedBART (Spanbauer & Sparapani, 2021), but aligns with common practices and recommendations in Bayesian analysis (e.g. Raftery & Lewis, 1991) and resulted in good overall model convergence. Based on prior recommendations using BART approaches, we employed 200 trees (Chipman, George, & McCulloch, 2010), though we explored reduced numbers of trees to assess the importance of individual features, because BART models tend to incorporate more irrelevant features when the number of trees is large. However, due to overall consistency across models with differing numbers of trees, we report results of the primary models utilizing 200 trees here.
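To make the role of posterior draws and burn-in concrete, the sketch below summarizes a chain of sampled predictions for one participant by discarding the burn-in draws and reporting the posterior mean and an equal-tailed credible interval. It is an illustration only: the function name and the toy chain are hypothetical, and BART software typically handles burn-in internally rather than returning those draws.

```python
from statistics import mean

def summarize_draws(draws, burn_in, level=0.95):
    """Posterior mean and equal-tailed credible interval after burn-in."""
    kept = sorted(draws[burn_in:])   # drop the burn-in portion of the chain
    if not kept:
        raise ValueError("no draws left after burn-in")

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = q * (len(kept) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(kept) - 1)
        return kept[lo] + (pos - lo) * (kept[hi] - kept[lo])

    tail = (1 - level) / 2
    return mean(kept), (quantile(tail), quantile(1 - tail))

# Toy chain: 5 poorly mixed burn-in draws followed by 101 retained draws.
chain = [999] * 5 + list(range(101))
post_mean, (ci_low, ci_high) = summarize_draws(chain, burn_in=5)
```

The posterior mean plays the role of the point prediction, while the interval expresses how uncertain that prediction is.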
The second approach utilized mixed-effects random forest (MERF; Hajjem et al., 2011, 2014). This tree-based random forest approach accommodates random effects for longitudinal or otherwise clustered data utilizing the expectation-maximization (EM) algorithm, a maximum likelihood estimation method that alternates between estimating latent variables and optimizing the model until convergence is reached. Five-fold cross-validation was applied on the training set. We also progressively increased the numbers of trees and iterations in training set model development, though the benefit of such increases appeared to plateau beyond approximately 150 iterations and 200–300 trees.
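For orientation, the general form of the MERF model and its EM-style updates can be sketched as follows. This follows the usual presentation of the method in Hajjem et al.; the specific random-effects design (the contents of Z_i) used in this study is not stated here, so the notation below is illustrative rather than a description of the fitted model.

```latex
% Model for participant i with n_i repeated measurements
y_i = f(X_i) + Z_i b_i + \varepsilon_i, \qquad
b_i \sim \mathcal{N}(0, D), \quad
\varepsilon_i \sim \mathcal{N}(0, \sigma^2 I_{n_i})

% Each iteration alternates two steps until convergence:
% (1) fit the random forest \hat{f} to the response with random effects removed
y_i^{*} = y_i - Z_i \hat{b}_i
% (2) update the random effects from the forest residuals
\hat{b}_i = \hat{D} Z_i^{\top} \hat{V}_i^{-1} \bigl( y_i - \hat{f}(X_i) \bigr), \qquad
\hat{V}_i = Z_i \hat{D} Z_i^{\top} + \hat{\sigma}^2 I_{n_i}
```

Here f is the unknown fixed-effects function estimated by the forest, b_i are participant-specific random effects, and the variance components are re-estimated at each iteration.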
Finally, linear mixed-effects regression models (LMMs) were explored to compare the accuracy of the machine learning models with that of a traditional statistical model using the same data. LMMs are an accepted approach to modeling longitudinal data due to their accommodation of random effects and missing data and their less restrictive assumptions (Hedeker & Gibbons, 2006). For this analysis, we examined both models with all predictors and models utilizing only the top five predictors as defined by the MixedBART and MERF machine learning programs, which both identified the same five predictors. Since the use of the top five predictors resulted in models that were as accurate as those including more, or all, covariates at every timepoint we examined, we present only these results of the LMM approach. The same strategy of creating and testing a model on the training and test sets, respectively, was utilized in order to remain comparable to the machine learning models. Cross-validation was not used in linear mixed model analyses to best approximate typical applied statistical use of this longitudinal approach.
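As a reminder of what such a model looks like, a generic LMM for repeated measurements j nested within participant i can be written as below. The random-intercept and random-slope structure shown is an assumption for illustration; the exact specification fitted in this study is not given in the text.

```latex
% Generic linear mixed-effects model
y_{ij} = x_{ij}^{\top} \beta + z_{ij}^{\top} b_i + \varepsilon_{ij}, \qquad
b_i \sim \mathcal{N}(0, D), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

% Illustrative special case: random intercept and random slope for time t_{ij}
y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij}
         + \sum_{k} \beta_k x_{kij} + \varepsilon_{ij}
```

The fixed effects \beta give the interpretable slope coefficients, while the participant-level terms b_i absorb the correlation among a participant's repeated scores.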
We randomly split the data from the 3-week ITP approximately 60:40 into training (n = 232) and test (n = 130) datasets. This random split was implemented at the participant-level due to the nesting of timepoint measurements within individuals. Training and test sets did not differ on any demographic or clinical variable (ps > 0.10). The training set was then used to train machine learning and LMM models with all baseline demographic and clinical data (see Table 2) as well as lagged PCL-5 scores predicting post-treatment PCL-5. Following training, we examined prediction accuracy on the test set at baseline as well as at each assessment timepoint (see online Supplementary Table S4 for accuracy using training data). Thus, when examining accuracy on the test set at baseline only baseline predictors were used to predict post-treatment PTSD severity. On program days 3, 5, 6, 8, 10, 11, and 13 all baseline features as well as PCL-5 scores for all days up to, and including, that day's PCL-5 measurement were used. Only PTSD severity score was continuously updated throughout the program. Accuracy of predictions was assessed via R2 and RMSE. Each analytic approach models change longitudinally, though our primary emphasis here is on prediction of post-treatment PTSD severity measurement. For MixedBART these values were obtained via the mean of each participant's predicted values against actual post-treatment PCL-5 scores.
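A minimal sketch of the participant-level split and the two accuracy metrics is given below. The function names, the seed, and the toy data are hypothetical, and R2 is computed here as the coefficient of determination (1 minus residual over total sum of squares), which is an assumption because the text does not say which definition was used.

```python
import math
import random

def split_by_participant(participant_ids, train_fraction=0.6, seed=0):
    """Assign whole participants (not single rows) to train or test."""
    unique_ids = sorted(set(participant_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_train = round(train_fraction * len(unique_ids))
    return set(unique_ids[:n_train]), set(unique_ids[n_train:])

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted scores."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Toy long-format rows: (participant_id, program_day, pcl5_score).
rows = [(pid, day, 50 - day) for pid in range(10) for day in (0, 3, 6)]
train_ids, test_ids = split_by_participant([r[0] for r in rows])
train_rows = [r for r in rows if r[0] in train_ids]
test_rows = [r for r in rows if r[0] in test_ids]
```

Splitting on participant IDs keeps all of a person's repeated measurements on the same side of the split, so test-set accuracy is not inflated by within-person correlation.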
Due to the importance of external validation of prediction models, we examined the predictive accuracy of these three models in a sample of 108 participants who had completed a separate, equally established, 2-week CPT-based ITP with similar programming combining individual CPT with adjunctive services, which has previously been demonstrated to be non-inferior to the 3-week program (Held et al., 2022c). Due to the differences in timeline between the two ITPs, assessment timepoints were mapped onto the existing timepoints based on the proportion of the program that had been completed at each measurement timepoint. The three longitudinal prediction models that were generated with 3-week training data were then used to predict post-treatment PTSD severity in the 2-week ITP using the same updating-prediction model approach. In the 2-week ITP, we focused on baseline and mid-program (beginning of week 2) predictions of post-treatment PTSD severity. MixedBART and LMM analyses were conducted using the MxBART and lme4 packages in R version 4.1.1, and MERF analyses were conducted using the MERF package in Python version 3.6. Figures were created using R.
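One way such a proportional mapping could work is sketched below. The program lengths (10 and 15 treatment days), the nearest-day rule, and the function name are all assumptions made for illustration; only the list of 3-week assessment days comes from the text, and the authors' actual mapping is not described in detail.

```python
# In-program PCL-5 assessment days of the 3-week ITP, as listed in the text.
THREE_WEEK_DAYS = [2, 3, 5, 6, 8, 10, 11, 13]

def map_to_reference_day(day, program_length, reference_length, reference_days):
    """Map a day in one program to the closest assessment day in another,
    matching on the proportion of the program completed."""
    proportion = day / program_length
    target = proportion * reference_length
    # Nearest reference assessment day; ties resolve to the earlier day.
    return min(reference_days, key=lambda d: abs(d - target))

# Hypothetical lengths: 10 treatment days (2-week) vs 15 (3-week).
mapped = map_to_reference_day(5, 10, 15, THREE_WEEK_DAYS)
```

With these assumed lengths, day 5 of the shorter program corresponds to half of the program completed, which lands closest to day 8 of the 3-week schedule.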
Results
Veterans in the 3-week ITP improved in PTSD severity by an average of 21.57 points (s.d. = 18.80). Approximately 70% (n = 263) improved by at least 10 points, with 51% (n = 185) finishing treatment below the PCL-5 cutoff of 33. As illustrated in Fig. 1, this constituted meaningful overall change across program timepoints, though considerable variability existed in the amount of change, particularly as treatment progressed. This increase in variability across time is generally expected and illustrates the effect of participants' differential improvement during treatment. The demographic and clinical variables in the models other than PCL-5 accounted for approximately 6% of the variability in treatment response throughout the program beyond what PTSD severity accounted for, indicating that both initial accuracy and improvements in predictions were largely driven by PTSD severity and updated PTSD severity measurements.
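The descriptive outcomes above (mean change, proportion improving by at least 10 points, and proportion finishing below the cutoff of 33) can be computed as in this sketch. The function name and the four example score pairs are hypothetical and are not study data.

```python
def summarize_response(baseline, post, min_change=10, cutoff=33):
    """Describe change from baseline to post-treatment PCL-5 totals."""
    if len(baseline) != len(post) or not baseline:
        raise ValueError("need matching, non-empty score lists")
    # Positive change means symptom improvement (lower score at post).
    changes = [b - p for b, p in zip(baseline, post)]
    n = len(changes)
    return {
        "mean_change": sum(changes) / n,
        "prop_improved": sum(c >= min_change for c in changes) / n,
        "prop_below_cutoff": sum(p < cutoff for p in post) / n,
    }

# Hypothetical scores for four participants.
summary = summarize_response([60, 50, 40, 55], [30, 45, 20, 54])
```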
Both machine learning approaches identified PCL-5, time, baseline PTCI, baseline PHQ-9, and CAPS-5 Intrusions as the most important or utilized features in predicting PTSD severity. Thus, these were used in subsequent LMMs for comparison (see online Supplementary Table S5 for comparison of LMM with all features and only these features). The three analytic approaches to predicting post-treatment PTSD severity closely aligned with regard to accuracy. Baseline predictions of final PCL-5 score on the test sample yielded an overall R2 of 0.18 across all three models (see Table 3). As expected, as updated PTSD severity scores became available during treatment, the accuracy of final timepoint predictions increased substantially (see Fig. 2). At the start of the second week of treatment (Day 6), all models were able to account for roughly half of the variability in post-treatment PTSD severity. This could potentially represent a milestone at which current treatment progress could be reliably determined in the 3-week ITP. By mid-program (Day 8), R2 exceeded 0.60 for all analytic methods.
a LMMs including more predictors were examined but did not outperform the five-predictor model.
b Baseline model contained all baseline data, including intake PCL-5 score.
Results of external validation with the 2-week ITP suggest that model predictions were about as accurate as in the 3-week ITP, despite the models not having been trained on these data (see Table 4). Baseline predictions generally accounted for about 20% of the variability in post-treatment PTSD severity. Including PCL-5 data up to mid-program made it possible to account for over half of the variability in final PTSD severity by that point. This supports the generalizability of model predictions to similar, but external, clinical data.
a Baseline model contained all baseline data, including intake PCL-5 score, mid-program predictions included baseline data plus PCL-5 scores to mid-program.
Discussion
Our results support the utility of updating prediction models of PTSD severity as a potential clinical tool for assessing PTSD treatment progress and for helping to identify timepoints for altering a participant's treatment approach. Before the 3-week ITP's midpoint, each model was able to account for a large proportion of the variability in post-treatment PTSD severity. This remained true even in an external 2-week ITP sample. These models can provide valuable clinical information that supports a precision-medicine approach to PTSD treatment, as the majority of those identified as likely non-responders with some certainty at mid-program were found to be non-responders at the end of treatment (see online Supplementary Fig. S1). Thus, by deploying such relatively low-cost models in clinical practice, a clinician would be able to obtain acceptable near real-time estimates of their patient's likely endpoint PTSD severity. As such, continuously updating prediction models may be helpful in PTSD treatment in general and may be particularly useful for intensive treatments given the rapid nature of this treatment approach and the limited time clinicians have to evaluate data before needing to make treatment decisions.
As illustrated, and commonly seen in treatment, improvement was far from uniform, with the amount of variability in reported PTSD severity increasing across time. Though generally expected in longitudinal studies, this highlights the need for increased attention to individual change, and the utility of assessing such change during treatment. Indeed, change in PTSD severity during the program was clearly the most effective predictor of PTSD severity at endpoint. Other clinical and demographic predictors accounted for approximately 6% of the variability in endpoint PTSD severity, with baseline PTSD severity accounting for both the remaining 14% at baseline and the improvements in these predictions as additional severity measurements became available. Thus, the best predictor of heterogeneity in total treatment response is clearly the amount of improvement that the individual is making during the program. This highlights the importance of models that can effectively accommodate this and the additional assumptions inherent to longitudinal modeling rather than basing treatment decisions entirely on baseline predictors or a pre-determined amount of change that needs to have been reached by mid-treatment without accounting for change trajectories.
Results obtained here do not support the superiority of any specific analytic method utilized, though all models performed at least as well as machine learning models that ignore the longitudinal structure of these data, without the potential bias that can arise when ignoring the lack of independence of observations over time (see online Supplementary Table S2). Linear mixed effects regression models were capable of predicting PTSD outcome severity with the same degree of accuracy as machine learning models. This result joins a wealth of evidence that traditional statistical approaches can perform similarly to machine learning models (Cho et al., 2021; Christodoulou et al., 2019; Li et al., 2021), though, to our knowledge, this study represents the first such application in a continuously updating prediction model for psychiatric treatment response.
Despite similarities in prediction accuracy, each longitudinal approach used offers unique benefits. LMMs provide easily interpretable slope coefficients and metrics regarding significance of individual predictors. Assumptions, as well as aspects of longitudinal structure such as covariance structure or autocorrelation, are easily assessed with this approach, and missing data are easily accommodated. Conversely, both machine learning approaches may more readily accommodate more predictors in applications involving high dimensional datasets or multiple correlated predictors. An additional well-known benefit of Bayesian approaches is the ability to quantify and visualize uncertainty in estimates. Although the mean predicted value for each participant from the posterior is reported in model output and was utilized above to obtain model accuracy metrics, the credible intervals can also be easily obtained to assess the degree of uncertainty in predictions. However, we found that MixedBART yielded overly optimistic estimates of variability around prediction means, so we would caution against blindly using program-generated credible intervals.
A number of limitations need to be acknowledged. The use of self-report assessments may have increased variability in reporting, and the fact that PTSD severity was the only continuously updated variable may be viewed as a limitation, though our prior work has suggested that updates in other variables did not improve predictions in any meaningful way when lagged PCL-5 scores were included. Additionally, the fact that PTSD severity measurements over time explained most of the variability in treatment response may have obscured potential roles of other contributing factors. However, this also highlights the importance of utilizing such updated severity information. Also, sample size considerations for some demographic variables, such as race, reduced power for intensive examination of demographic moderators of treatment response, though our prior work has indicated that such demographic variables generally did not impact treatment response for either the 3- or 2-week ITP (Held et al., 2022c). Only ITP completers were examined, although a completer bias is unlikely since completion rates were quite high (>90%), and completers and non-completers did not differ on any baseline demographic or clinical variables for either sample, except for a difference in race in the 3-week sample. Although use of a 2-week ITP validation sample is a strength of the current analysis, it may be similar to the original sample in many ways that a fully external sample would not be. For example, although the treatment schedule differed between the 3- and 2-week ITPs, with the latter drastically reducing group treatment and adjunctive service components, both centered around CPT (Held et al., 2022c). However, the demographic breakdown of the two programs indicated that significant differences existed in sex, MST status, race, service era, and baseline PTSD severity (see Table 1).
Finally, although many of the exclusion criteria resemble those used in other PTSD treatments, some were specific to ITPs (e.g. requirements for stable housing and the ability to travel) and may limit the generalizability of the findings presented here.
Conclusion
Considerable additional research is warranted to better understand the specific individual factors that could interact with the chosen treatment approach and thereby inform individualized treatment. However, our demonstration of the use of continuously updating machine learning or predictive modeling using standard longitudinal statistical approaches to assess progress and predict PTSD treatment outcomes shows promise for precision medicine in the field of PTSD. Such models can provide clinicians with information about which patients may progress through treatment as expected or benefit from treatment alterations based on their predicted response. Using the models presented here, such decisions can be made relatively reliably by mid-treatment in the two ITPs we examined. Future research should examine the feasibility of integrating these models into clinical care and systematically test whether treatment modifications for individuals predicted to have less favorable treatment responses can improve their outcomes, as well as whether findings generalize to more traditional weekly treatment and/or other evidence-based PTSD treatments.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291722002689.
Financial support
This work was supported by the Road Home Program at Rush's partnership with Wounded Warrior Project®.
Conflict of interest
Philip Held receives grant support from Wounded Warrior Project® and RTI International. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health, Wounded Warrior Project®, or any other funding agency.
All other authors declare that they have no competing interests.
Ethical standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.