Surgical site infections (SSIs) are among the most common healthcare-associated infections (HAIs) and result in increased costs, morbidity, postoperative length of stay, and mortality. Reference Magill, Edwards and Bamberg1–Reference Shaw, Gomila and Piriz4 Reported SSI rates after colorectal surgery range from 5% to 30%, making these among the procedures with the highest SSI incidence. Reference Limón, Shaw and Badia5–8 Colorectal surgeries are therefore incorporated in most SSI surveillance programs.
In most hospitals, surveillance is performed manually. However, manual surveillance is labor intensive, potentially inaccurate, and prone to subjectivity and low interrater agreement, thus limiting comparisons between hospitals. Reference Hedrick, Sawyer, Hennessy, Turrentine and Friel9–Reference Verberk, van Rooden and Hetem11 The increasing availability of data stored in the electronic health record (EHR) offers opportunities for (partially) automating SSI surveillance, thereby reducing the workload and supporting standardization of the surveillance process. To date, several studies have published (semi)automated methods for SSI surveillance after colorectal surgery. Unfortunately, most of these are not feasible for Dutch hospitals because (1) they include elements that are not representative of the Dutch clinical setting and practice, (2) their algorithm performance is insufficient, (3) their processing time is delayed, or (4) they are too complex for application in real life. Reference Cho, Chung and Choi12–Reference van Mourik, van Duijn, Moons, Bonten and Lee18
Two published semiautomated surveillance algorithms targeting deep SSI after colorectal surgery may be feasible for the Dutch setting: a classification algorithm Reference van Rooden, Tacconelli and Pujol19 and a multivariable regression model. Reference Mulder, Kluytmans-van den Bergh and van Mourik20 The classification algorithm was pre-emptively designed based on clinical and surveillance practices from a French, a Spanish, and a Dutch hospital. Its sensitivity was 93.3%–100% compared with manual surveillance, and it yielded a workload reduction of 73%–82%. The regression model was developed using data from a Dutch teaching hospital and predicts the probability of deep SSI for each individual patient. This 5-predictor model had a sensitivity of 98.5% and a workload reduction of 63.3%. Reference Mulder, Kluytmans-van den Bergh and van Mourik20
External validation and actual implementation studies of new methods for automated surveillance are scarce. Reference Toll, Janssen, Vergouwe and Moons21,Reference Bleeker, Moll and Steyerberg22 As reported by 2 systematic reviews, only 23% of the included studies used a separate validation cohort Reference Streefkerk, Verkooijen, Bramer and Verbrugh23 and only 25% of automated surveillance systems were used in clinical routine. Reference de Bruin, Seeling and Schuh24 Hence, knowledge about the generalizability of automated surveillance models is limited, and information about the path toward actual implementation is needed. Reference Bleeker, Moll and Steyerberg22,Reference Grota, Stone, Jordan, Pogorzelska and Larson25,Reference Verberk, Aghdassi and Abbas26
In this study, we present an independent and external validation of the previously developed classification and regression models in new cohorts of patients who underwent colorectal surgery in different types of Dutch hospitals. Reference Toll, Janssen, Vergouwe and Moons21 We also investigated the feasibility of the data requirements for both algorithms. If feasible and externally valid, these models can be implemented in SSI surveillance practices and workflow processes.
Methods
Study design
In this retrospective cohort study, 4 Dutch hospitals (1 academic, 2 teaching, 1 general), each with a different EHR system or a different version thereof, extracted the data needed for algorithm application. To obtain insight into hospitals’ clinical practice and patient care, a questionnaire adapted from a previous study Reference van Rooden, Tacconelli and Pujol19 was filled in by hospital staff at the start of the study (Appendix 1 online). Feasibility of the data collection (a precondition for implementation) was evaluated by assessing the completeness of the surveillance population (denominator) and the ability of the hospitals to automatically collect case-mix variables from their EHR. Thereafter, we applied the 2 surveillance algorithms to the extracted data. Model results were compared with conventional (ie, manually annotated) surveillance. Reference Verberk, van Rooden and Hetem11 Approval for this study was obtained from the Institutional Review Board of the University Medical Centre Utrecht (reference no. 20-503/C) and from the local boards of directors of each participating site. Informed consent was waived given the observational and retrospective nature of this study.
Surveillance population and data collection
The hospitals identified patients aged >1 year undergoing primary colorectal resections in 2018 and/or 2019 based on procedure codes in EHR data. Hospitals could use other data sources to establish inclusion rules to construct the surveillance population and to distinguish secondary procedures or resurgeries. For the patients included in the surveillance population, structured data were extracted from the EHR including demographics, microbiological culture results, admissions (ie, prolonged length of stay or readmission), resurgeries, radiology orders, antibiotic prescriptions, and variables for case-mix correction (see Supplementary Table S1 in Appendix 2 online).
Outcome
The outcome of interest was a deep SSI (deep incisional or organ-space) within 30 days after surgery according to the Dutch surveillance protocol. 27 In short, a deep SSI was recorded for patients with purulent drainage from the deep incision or from a drain placed through the wound; an abscess; a positive culture from the organ space; signs and symptoms of infection in combination with wound dehiscence and a positive culture of deep soft tissue; or other evidence of infection found on direct examination. The criterion of a positive culture is not applicable in case of anastomotic leakage or perforation following the surgery. In each hospital, infection control practitioners (ICPs) manually screened patients to identify deep SSIs. This manual surveillance was considered the reference standard. All ICPs performing manual chart review received training to ensure the quality of data collection and case ascertainment. Reference Verberk, van Rooden and Hetem11 Moreover, all hospitals participated in an on-site visit to validate the conventional surveillance. Details about this on-site validation visit are described below.
Feasibility of data collection
To evaluate the feasibility of the data collection, we assessed the completeness of the surveillance population (denominator data) by comparing the patients selected by procedure codes with the patients included in the reference standard. Additionally, we assessed agreement between the case-mix variables (ie, risk factors: age, sex, ASA classification, wound class, stoma creation, malignancy, and anastomotic leakage) extracted from the EHR and those collected during conventional surveillance.
Algorithm validation
Model validation of the classification model
The classification algorithm was based on the development study, using 5 elements: antibiotics, radiology orders, (re)admissions (ie, prolonged length of stay, readmissions or death), resurgeries, and microbiological cultures (Fig. 1a and Supplementary Table S2 in Appendix 2 online). All extracted data were limited to 45 days following the colorectal surgery to enable the algorithm to capture deep SSIs that developed at the end of the 30-day follow-up period. In accordance with the development study, Reference van Rooden, Tacconelli and Pujol19 patients were classified into low probability of having had a deep SSI (≤1 element excluding microbiology, or 2–3 elements and no microbiology) and high probability of having had a deep SSI (4 elements excluding microbiology, or 2–3 elements and microbiology). High-probability patients required manual SSI confirmation, and low-probability patients were assumed free of deep SSI. If discrepancies were found between the clinical practice reported in the questionnaire and the algorithm, we evaluated whether an adaptation of the classification algorithm could have improved performance. When an algorithm element could not be computed due to incomplete data (eg, discharge date is missing so length of stay cannot be computed), the patient scored positive on that element.
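The classification rule described above can be summarized as a small triage function. This is a minimal sketch: the argument names are illustrative, and only the counting logic is taken from the algorithm description (the full element definitions are in Supplementary Table S2).

```python
def classify_deep_ssi(antibiotics, radiology, readmission, resurgery, microbiology):
    """Triage one surgical record per the classification algorithm (sketch).

    Each argument is True when the corresponding surveillance element is
    present within 45 days after surgery. Returns 'high' for records that
    require manual SSI confirmation and 'low' for records assumed free of
    deep SSI.
    """
    # Count the 4 elements excluding microbiology.
    n = sum([antibiotics, radiology, readmission, resurgery])
    if n == 4:                        # 4 elements excluding microbiology
        return "high"
    if 2 <= n <= 3 and microbiology:  # 2-3 elements plus microbiology
        return "high"
    return "low"                      # <=1 element, or 2-3 without microbiology
```

Per the missing-data rule above, an element that cannot be computed (eg, a missing discharge date) would be passed as True.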
Model validation of the regression model
The regression model uses wound class, hospital readmission, resurgery, postoperative length of stay, and death to calculate the probability of deep SSI. Coefficients estimated in the development setting Reference Mulder, Kluytmans-van den Bergh and van Mourik20 were multiplied by the predictor values of this validation cohort to estimate the SSI probability (Fig. 2 and Supplementary Table S3 in Appendix 2 online). In accordance with the cutoff point in the development study, patients were classified into low probability of deep SSI (≤0.015) and high probability of deep SSI (>0.015). High-probability patients required manual SSI confirmation, whereas low-probability patients were assumed free of deep SSI. If a predictor could not be automatically extracted by the hospital or had missing values, the predictor collected by the manual surveillance was used to evaluate algorithm performance.
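The scoring step can be sketched as follows. The coefficient values below are hypothetical placeholders (the actual estimates appear in Supplementary Table S3 of the development study and are not reproduced here), and a standard logistic link is assumed; only the 5 predictors and the 0.015 cutoff are taken from the text.

```python
import math

# Hypothetical placeholder coefficients; NOT the published estimates.
INTERCEPT = -6.0
COEFS = {
    "wound_class": 0.4,   # ordinal wound contamination class
    "readmission": 1.5,   # 1 if readmitted during follow-up, else 0
    "resurgery":   2.0,   # 1 if reoperated, else 0
    "postop_los":  0.05,  # postoperative length of stay, in days
    "death":       1.0,   # 1 if the patient died, else 0
}
CUTOFF = 0.015  # cutoff point from the development study


def deep_ssi_probability(predictors):
    """Linear predictor mapped to a probability via the logistic link."""
    lp = INTERCEPT + sum(COEFS[k] * v for k, v in predictors.items())
    return 1.0 / (1.0 + math.exp(-lp))


def triage(predictors):
    """'high' records go to manual review; 'low' are assumed free of deep SSI."""
    return "high" if deep_ssi_probability(predictors) > CUTOFF else "low"
```

With these placeholder coefficients, an uncomplicated admission (no readmission, no resurgery, short stay) scores below the cutoff, while a record with readmission and resurgery scores above it.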
On-site visit
All hospitals participated in an on-site visit to validate the conventional surveillance. This process was executed by 2 experienced surveillance advisors of the Dutch national HAI surveillance network, who were blinded to the outcomes of both the reference standard and the algorithms. For each hospital, a sample of 20 patients was taken from the data according to hierarchical rules (Fig. 3). All false-negative results were included to confirm their deep SSI status. Additionally, records from every other group (false-positive, true-positive, and true-negative results) were included until 20 were gathered. The group size of 20 patients was based on the time capacity of the validation team.
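A sketch of this sampling scheme, under the simplifying assumption that the remaining slots are filled at random from the other groups (the exact hierarchical rules are in Fig. 3 and are not reproduced here):

```python
import random


def select_validation_sample(records, size=20, seed=0):
    """Draw one hospital's on-site validation sample (sketch).

    `records` is a list of (patient_id, group) tuples, where group is one
    of 'FN', 'FP', 'TP', 'TN'. Every false negative is always included to
    confirm its deep SSI status; the remaining slots are filled from the
    other groups until `size` records are gathered.
    """
    rng = random.Random(seed)
    sample = [r for r in records if r[1] == "FN"]       # all false negatives
    others = [r for r in records if r[1] != "FN"]
    rng.shuffle(others)                                  # placeholder for Fig. 3 rules
    sample.extend(others[: max(0, size - len(sample))])  # top up to `size`
    return sample
```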
Statistical analyses
After data linkage, descriptive statistics were generated. To evaluate data feasibility, missing-data patterns were described; no techniques such as multiple imputation were applied to complete the data. Both models were applied to the data extractions, and the results were compared with the reference standard. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and workload reduction were calculated overall and stratified per hospital. Workload reduction was defined as the proportion of colorectal surgeries no longer requiring manual review after algorithm application. A discrepancy analysis was performed for any false-negative results (ie, missed deep SSIs): the algorithm elements were checked against the original data. Data cleaning and statistical analyses for the classification model were carried out in SAS version 9.4 software (SAS Institute, Cary, NC). For the regression model, we used R version 3.6.1 software (R Foundation for Statistical Computing, Vienna, Austria).
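These performance measures follow directly from a 2×2 confusion matrix against the reference standard. A minimal sketch (workload reduction here is the fraction of records the algorithm classifies as low probability, ie, not sent for manual review):

```python
def surveillance_performance(tp, fp, fn, tn):
    """Compute algorithm performance from confusion-matrix counts.

    tp/fp/fn/tn: counts of true-positive, false-positive, false-negative,
    and true-negative records relative to the manual reference standard.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),            # detected deep SSIs
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),                    # yield of manual review
        "npv": tn / (tn + fn),
        "workload_reduction": (tn + fn) / total,  # records not reviewed
    }
```

As an illustration, counts derived from the overall results reported below for the original classification model (28 deep SSIs among 672 surgeries, of which 4 were missed, with a PPV of 32%) correspond to tp=24, fp=51, fn=4, tn=593 and reproduce the reported sensitivity, specificity, PPV, and NPV.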
Results
Feasibility of data collection
Completeness of the surveillance population
The exact surveillance population could not be reconstructed because there were no separate procedure codes or feasible inclusion rules to reliably distinguish secondary procedures or resurgeries from primary procedures (range, 8.7%–22.0%; Table 1). Conversely, 0%–25% of patients in the reference standard were not identified when using inclusion rules based on procedure codes (details in Table 1). In total, 672 colorectal surgery patients were included in this study, of whom 28 had deep SSIs (4.1%).
Note. SSI, surgical site infection.
a Until July 1, 2019.
b Explanation of mismatch: manual review of a random sample of these records showed that these were mainly revision/secondary procedures and, for hospital C, surgeries performed at another hospital location, which are excluded from manual surveillance.
c Explanation of mismatch:
Hospital B: incorrect inclusions in the reference standard that did not meet the inclusion criteria (no primary procedure).
Hospital C: these surgeries were registered as executed by the internal medicine department, whereas only resections performed by the surgery department were selected for the extractions.
Hospital D: according to the national surveillance protocol, when multiple resections are performed during the same surgery, the resection with the highest risk is to be registered. The hospital included the wrong procedure in these cases.
Completeness of data collection
Electronic collection of the minimum required data set from the EHR was feasible for all variables except wound class. Hospital A used text mining to establish the wound class. For hospitals B and C, wound class as collected during manual surveillance (reference standard) was used. For hospital D, wound class information was not available in the source data.
Figure 4 shows the percentage of agreement between the case-mix variables extracted from the EHR and those collected manually. Disagreement was mostly related to incomplete data: variables were either not registered in the original source or not available from the source data at all.
Algorithm validation
The original classification model had an overall sensitivity of 85.7% (95% CI, 67.3%–96.0%), ranging from 72.7% to 100% between hospitals; a specificity of 92.1% (95% CI, 89.7%–94.0%); a PPV of 32.0% (95% CI, 21.7%–43.8%); and an NPV of 99.3% (95% CI, 98.3%–99.8%). For the performance per hospital, see Table 2. Only 8%–13% of the records required manual review after algorithm application. In hospitals C and D, respectively, 1 and 3 deep SSIs were missed by the algorithm (Table 3). Unlike hospitals A and B, both of these hospitals had reported in the questionnaires that microbiological cultures were not consistently taken in cases of suspected infection, and this was reflected in the percentage of patients meeting the microbiology element. Therefore, we adapted the algorithm and classified only patients with at most 1 element (ie, radiology order, antibiotics, readmission, or resurgery) as low probability (Fig. 1b). This model resulted in higher sensitivity (overall sensitivity, 100%; 95% CI, 87.7%–100.0%) but at the cost of a lower PPV and less workload reduction (Table 2).
Note. CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value.
a Algorithm elements are radiology orders, antibiotics, (re)admissions, resurgeries, and microbiology. Patients needed 4 elements excluding microbiology, or 2-3 elements and microbiology to be classified as high probability by the algorithm. See also Fig. 1 and Appendix 2 (online).
b Both hospitals had reported in the questionnaires that cultures were not consistently taken in case of suspected infection.
The regression model could be validated only for hospitals A–C because wound class was not available for hospital D. As in the development study, patients with infected wounds (wound class 4) were excluded, leaving 187, 116, and 207 records from hospitals A–C, respectively, for analysis, including 4, 3, and 7 deep SSIs. For this model, the overall sensitivity was 100% (95% CI, 76.8%–100.0%); the specificity was 76.9% (95% CI, 73.0%–80.5%); the PPV was 11.9% (95% CI, 6.6%–19.1%); and the NPV was 100% (95% CI, 99.0%–100%). With this algorithm, only 22.7%–23.5% of records required manual review. The results per hospital are shown in Table 2. Due to the small sample size and low number of deep SSIs, discrimination and calibration were not evaluated.
No discrepancies were found during the on-site validation visit in hospital D. In the other 3 hospitals, on-site validation revealed 5 additional deep SSIs: 2 were overlooked in the conventional surveillance and 3 were initially classified as superficial SSIs. All additional deep SSIs were classified correctly as high probability by both the (modified) classification model and the regression model. Other findings of the on-site validation of the reference standard, though not essential for the assessment of the algorithms, were reclassifications of superficial SSIs to no SSI (n = 1), missed superficial SSIs (n = 2), and incorrect inclusions (n = 8).
Discussion
This study demonstrated the external validity, both temporal and geographical, of 2 surveillance algorithms that identify patients with a high probability of deep SSI after colorectal surgery. Both had a high detection rate for deep SSI and can be used for semiautomated surveillance, thereby further improving the efficiency and quality of SSI surveillance.
Both the classification model (especially when adapted to local practices) and the regression model performed very well. To select a model for use within an organization, we considered other aspects of implementation. First, in case of incomplete data, the original development study of the regression model used multiple imputation techniques. For the classification model, the patient scored positive on any algorithm element that could not be computed due to incomplete data; this was a more convenient approach that required no complex data-management techniques. Second, according to the original study, patients with a dirty-infected wound (ie, wound class 4) were excluded from the cohort of the regression model. However, according to the national surveillance protocol, these cases should have been included in the surveillance. In addition, in 2 hospitals, wound class was not available in a structured format for automated extraction, hindering algorithm application. Third, the classification model could easily be adapted to local practices. For the regression model, a sufficient sample size would be required for redevelopment or recalibration in case of low predictive accuracy. This aspect may be challenging for hospitals performing few colorectal resections. Therefore, the (modified) classification model is more feasible and sustainable for real-life implementation within hospitals, improving standardization and benchmarking. We know from previous studies that the classification model has also been successful in other European countries and in low-risk surgeries such as hip and knee arthroplasties. Reference van Rooden, Tacconelli and Pujol19,Reference Verberk, van Rooden and Koek28
For both algorithms, however, several hurdles remain for implementation. The exact surveillance population could not be automatically selected by procedure codes, but a change in the current inclusion criteria or target population could be considered. In this study, 10%–22% of surgeries detected by procedure codes did not concern a resection, were not the main indication for surgery (but were performed concomitant to other intra-abdominal surgeries), or were not the patient's first colon resection. Also, the variables necessary for case-mix adjustment are sometimes difficult to extract automatically. Although the search for a proper case-mix correction is ongoing, Reference Grant, Aupee and Buchs14,Reference Young, Knepper, Moore, Johnson, Mehler and Price29–Reference Watanabe, Suzuki and Nomura32 automated extraction of a minimal set of risk factors is necessary to interpret surveillance results and to maintain the workload reduction delivered by (semi)automated surveillance.
Two findings in this study emphasize that close monitoring, validation of algorithm components, and future maintenance are important to maintain alignment with clinical practice and to guarantee high-quality surveillance. First, as appeared from the questionnaire, 2 hospitals did not consistently obtain microbiological cultures in cases of suspected deep SSI. We advise researchers to first verify whether algorithms align with clinical practice and, subsequently, to consider adapting algorithms to any differences. Reference Streefkerk, Verkooijen, Bramer and Verbrugh23,Reference Leal and Laupland33–Reference Freeman, Moore, Garcia Alvarez, Charlett and Holmes35 Second, new treatment techniques should be evaluated regularly and algorithms adapted accordingly. Endosponge therapy is increasingly used after anastomotic leakage; however, this intervention is often not registered, or is registered not as a resurgery but as an outpatient treatment performed by a different specialty than the initial colorectal surgery. Each hospital should therefore periodically evaluate care practices and algorithm elements to select the appropriate resurgeries and to include recently introduced interventions, such as endosponge therapy, within the resurgery element of the surveillance algorithm.
This study had several strengths. We performed an independent external validation using patient data from different types of hospitals, as well as a temporal validation. Apart from algorithm performance, the automated selection of patients and case-mix variables was investigated as well; both are prerequisites for actual implementation.
This study also had several limitations. First, both algorithms targeted deep SSIs only, but in colorectal surgery 20%–50% of SSIs are superficial. 6,36 Debate continues regarding the inclusion of superficial SSIs in surveillance programs, given their subjective criteria and limited clinical implications. Reference Verberk, van Rooden and Koek28,Reference Skube, Hu and Arsoniadis37,Reference Kao, Ghaferi, Ko and Dimick38 Second, we aimed to validate all published automated surveillance systems that appeared applicable to Dutch practice; however, automated surveillance systems developed by commercial companies may not have been published in the scientific literature and were therefore not included. Third, the small sample size and low number of deep SSIs resulted in large confidence intervals for the individual hospitals and impeded the evaluation of discrimination and calibration. Reference Van Calster, McLernon, van Smeden, Wynants and Steyerberg39,Reference Riley, Debray and Collins40 Although a larger validation cohort would have been preferable, the numbers used in this study reflect the reality of surveillance practices. Although the study was underpowered, the overall sensitivity and the hospitals' individual point estimates were satisfactory, and this study provided valuable insights into implementation. Fourth, for both manual and semiautomated surveillance, postdischarge surveillance was limited to the initial hospital. In the Dutch setting, patients return to the operating hospital in case of complications, so this will likely not lead to underestimation of SSI rates. For countries without such follow-up, SSI benchmarking or widespread implementation of this semiautomated algorithm may be hampered.
Last, as actual widespread implementation of automated surveillance is still limited, Reference de Bruin, Seeling and Schuh24–Reference Verberk, Aghdassi and Abbas26 this study provides insights into validity and data requirements needed for implementation of semiautomated SSI surveillance after colorectal surgery. However, this study did not include a full feasibility study including economic, legal, and operational assessments. We emphasize that successful implementation also depends on organizational support, information technology knowledge, staff acceptance, change management, and possibilities for integration in workflows.
In this independent external validation, both approaches to semiautomated surveillance of deep SSI after colorectal surgery performed well. However, the classification model proved preferable to the regression model because of source data availability and less complex data-management requirements. Our results revealed several hurdles to automating surveillance: the targeted surveillance population could not be automatically selected by procedure codes, and not all risk factors were complete or available for case-mix correction. The next step is implementation in infection prevention practices and workflow processes to automatically identify patients at increased risk of deep SSI.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/ice.2022.147
Acknowledgments
We thank Titia Hopmans and Kati Halonen for performing on-site validation visits and Hetty Blok, Suzan Bongers, Désirée Oosterom, Wilma van Erdewijk, Fabio Bruna and Robin van der Vlies for the data collection and processing. We would like to thank Tessa Mulder and Stephanie van Rooden for providing details regarding the algorithms.
Financial support
This work was supported by the Regional Healthcare Network Antibiotic Resistance Utrecht with a subsidy of the Dutch Ministry of Health, Welfare and Sport (grant no. 331254).
Conflicts of interest
The authors declare that they have no competing interests.