Published online by Cambridge University Press: 02 November 2020
Background: The Bundled Payment Care Improvement Program is a CMS initiative designed to encourage greater collaboration across settings of care, especially as it relates to an initial set of targeted clinical episodes, which include sepsis and pneumonia. As with many CMS incentive programs, performance evaluation is retrospective, so changes to operational processes that improve efficiency and quality are made after the fact. Although retrospective performance evaluation is informative, care providers would ideally identify a patient’s potential clinical cohort during the index stay and implement care management procedures as necessary to prevent or reduce the severity of the condition. Real-time identification of a patient’s clinical cohort faces 2 primary challenges. First, CMS-targeted cohorts are based on either MS-DRG (a grouping of ICD-10 codes) or HCPCS coding, which clinical abstractors assign only after discharge. Second, many informative data elements in the EHR lack standardization, so no simple and reliable heuristic rules can meaningfully identify those cohorts without human review.
Objective: To share the results of an ensemble statistical model that predicts patients’ risk of sepsis and pneumonia during their hospital (ie, index) stay.
Methods: The predictive model combines Bernoulli Naïve Bayes natural language processing (NLP) classifiers, which reduce the dimensionality of text to a single probability value, with an eXtreme Gradient Boosting (XGBoost) algorithm that serves as a meta-model, evaluating standardized clinical elements together with the NLP-based text probabilities.
Results: Bernoulli Naïve Bayes classifiers have proven to perform well on short text strings, and they allow highly explanatory unstructured or semistructured text fields (eg, reason for visit, culture results) to be used in a comparable and generalizable way within the larger XGBoost model.
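As an illustration of the first stage described above, the following is a minimal sketch of how a Bernoulli Naïve Bayes classifier can reduce a short text field to a single probability. It is written in plain Python for transparency and is not the authors' implementation. The class name `TinyBernoulliNB`, the tokenizer, and the toy "reason for visit" strings and labels are all invented for illustration. In the design described here, a probability of this kind would become one input feature of the XGBoost meta-model.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and return the set of alphanumeric tokens present."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class TinyBernoulliNB:
    """Two-class Bernoulli Naive Bayes over word presence/absence (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, texts, labels):
        # Assumes labels are 0/1 and that both classes appear at least once.
        docs = [tokenize(t) for t in texts]
        self.vocab = sorted(set().union(*docs))
        n_docs = {0: 0, 1: 0}
        present = {c: {w: 0 for w in self.vocab} for c in (0, 1)}
        for doc, y in zip(docs, labels):
            n_docs[y] += 1
            for w in doc:
                present[y][w] += 1
        total = n_docs[0] + n_docs[1]
        self.log_prior = {c: math.log(n_docs[c] / total) for c in (0, 1)}
        # Smoothed estimate of P(word present | class).
        self.p_word = {
            c: {
                w: (present[c][w] + self.alpha) / (n_docs[c] + 2 * self.alpha)
                for w in self.vocab
            }
            for c in (0, 1)
        }
        return self

    def predict_proba(self, text):
        """Return P(class = 1 | text); words outside the vocabulary are ignored."""
        tokens = tokenize(text)
        log_joint = {}
        for c in (0, 1):
            score = self.log_prior[c]
            for w in self.vocab:
                p = self.p_word[c][w]
                # Bernoulli likelihood: absent words contribute (1 - p).
                score += math.log(p) if w in tokens else math.log(1.0 - p)
            log_joint[c] = score
        # Normalize in log space for numerical stability.
        m = max(log_joint.values())
        e0 = math.exp(log_joint[0] - m)
        e1 = math.exp(log_joint[1] - m)
        return e1 / (e0 + e1)


# Hypothetical toy data: 1 = respiratory-type complaint, 0 = other.
texts = [
    "fever cough shortness of breath",
    "productive cough and fever",
    "ankle sprain after fall",
    "routine medication refill",
]
labels = [1, 1, 0, 0]

nb = TinyBernoulliNB().fit(texts, labels)
p_resp = nb.predict_proba("cough with fever")
p_other = nb.predict_proba("fall with ankle pain")
print(round(p_resp, 3), round(p_other, 3))
```

Because the classifier models only the presence or absence of each word, it suits short fields where word counts carry little extra signal.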
Conclusions: Choosing XGBoost as the meta-model mitigates concerns about nonlinearity among clinical features, reduces the potential for overfitting, and allows missing values to exist within the data. Both the Bayesian classifier and the meta-model were trained using a patient-level integrated dataset extracted from both a patient-billing and an EHR data warehouse maintained by Premier. The dataset, joined on patient admission date, medical record number, date of birth, and hospital entity code, places both the coded clinical cohort (derived from the MS-DRG) and the explanatory features from the EHR in a single patient encounter record. The resulting model produced F1 performance scores of 0.65 for the sepsis population and 0.61 for the pneumonia population.
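For readers less familiar with the metric reported above, F1 is the harmonic mean of precision and recall. The short sketch below computes it from binary labels. The function name and the label vectors are invented for illustration and are unrelated to the study data.

```python
def f1_score_binary(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = patient in the coded cohort, 0 = not in the cohort.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

example_f1 = f1_score_binary(y_true, y_pred)
print(example_f1)
```

In this toy case there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and F1 is 0.75.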
Funding: None
Disclosures: None