The coronavirus disease 2019 (COVID-19) pandemic has had a significant impact on patient health and on logistics in healthcare systems globally [1,2]. The Veterans Health Administration (VHA) developed a surveillance system using electronic health record (EHR) data to monitor the impact of COVID-19 at the 170 VHA medical centers across the United States in real time [3]. The VHA leveraged this new system to report daily data for each VHA medical center to the Centers for Disease Control and Prevention (CDC) National Healthcare Safety Network (NHSN) COVID-19 Acute-Care Module [4].
Although EHR data have been used for other VHA surveillance reporting [5], applying centralized EHR data to emerging-disease reporting can be challenging. Here, we compared COVID-19 data extracted centrally with data collected by individual VHA facilities to validate the EHR extractions. Our findings provide insight into data collection for public health reporting of novel diseases.
Methods
Data were collected for 10 business days in June 2020. For each day, we compared the computer-extracted data from the COVID-19 surveillance system (ie, computer-extracted data) being sent to NHSN with data collected manually by volunteer VHA medical centers (ie, facility-reported data), using the same NHSN definition for the count of hospitalized patients with suspected or confirmed COVID-19 (see the Supplementary Information online for definitions and details on data collection and analysis). Total counts of hospitalized patients from computer extraction and facility reporting were compared by t test (SAS version 9.4 software; SAS Institute, Cary, NC).
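As a rough illustration of this comparison (the published analysis was performed in SAS, and the exact t-test variant is not stated), a minimal Python sketch with invented daily counts might look as follows:

```python
# Minimal sketch of the daily count comparison; the actual analysis used SAS
# version 9.4. The counts below are invented, and a paired t test is an
# assumption because the t-test variant is not specified in the text.
from scipy import stats

computer_extracted = [5, 6, 2, 3, 4]  # hypothetical counts per facility-day
facility_reported = [5, 7, 2, 2, 4]   # same facility-days, manual counts

t_stat, p_value = stats.ttest_rel(computer_extracted, facility_reported)
print(f"t = {t_stat:.2f}, P = {p_value:.2f}")
```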
For both data sources, personally identifiable information (ie, patient first name, last name, birth date, and last 4 digits of the Social Security number) was collected for each patient counted each day so that accuracy could be compared at the patient level (ie, even when counts match, the patients reported can differ). Additionally, for facilities with zero days or only 1 day of matching total counts between the 2 sources, the medical charts of patients identified by both sources were reviewed to determine whether those patients should have been included in the count based on the NHSN definition.
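A minimal sketch of such patient-level matching for a single facility-day appears below; the composite key, its normalization, and the sample identifiers are assumptions for illustration rather than the matching logic actually used.

```python
# Hypothetical patient-level match for one facility-day. The 4 identifier
# fields mirror those listed above; all values are invented.
def patient_key(first, last, birth_date, ssn_last4):
    """Build a case-insensitive composite key from the 4 identifiers."""
    return (first.strip().lower(), last.strip().lower(), birth_date, ssn_last4)

extracted = {patient_key("Ann", "Lee", "1950-01-31", "1234"),
             patient_key("Bob", "Roy", "1947-06-02", "9876")}
reported = {patient_key("ann", "lee", "1950-01-31", "1234"),
            patient_key("Cal", "Ito", "1939-11-15", "5555")}

# Even when the daily counts agree (2 vs 2), the sets can differ.
overlap = extracted & reported
print(f"{len(overlap)} of {len(reported)} reported patients matched")
```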
This work was approved by VHA Central Office as a validation effort for operational systems improvement.
Results
Of 170 VHA facilities, 36 (21%) volunteered to participate in the project, resulting in 356 facility-days of data (Table 1 and Fig. 1; see Supplementary Table S1 online for facility characteristics). Overall, the study included 1,472 patient-days for computer-extracted data and 1,353 patient-days for facility-reported data (P = .34) (Table 1). The count of hospitalized patients was the same for the 2 data sources on 151 (42%) of the 356 facility-days reported. For 139 (92.1%) of these 151 days, the personally identifiable information of all patients matched between the 2 sources (Table 1 and Fig. 1). When counts from the 2 sources were compared allowing for discordance of 1 or 2 patients, the percentage of facility-days counted as matching increased from 42% to 73% (261 of 356) and 85% (301 of 356), respectively (Table 1).
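The tolerance comparison reduces to a simple calculation over paired daily counts; the sketch below, with invented counts, shows how the matching percentage grows as the allowed discordance widens.

```python
# Tolerance-based count matching: a facility-day counts as a match when the
# 2 sources differ by no more than `tol` patients. The pairs are invented.
pairs = [(5, 5), (6, 7), (2, 2), (3, 5)]  # (computer_extracted, facility_reported)

def pct_matching(pairs, tol):
    """Percentage of facility-days with |count difference| <= tol."""
    hits = sum(abs(a - b) <= tol for a, b in pairs)
    return 100 * hits / len(pairs)

for tol in (0, 1, 2):  # exact match, then allowing discordance of 1 or 2
    print(f"tol={tol}: {pct_matching(pairs, tol):.0f}% of facility-days match")
```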
Note. VHA, Veterans Health Administration.
a Of the 36 facilities, 34 (94%) reported data for all 10 days of the project; 2 facilities reported data for only a portion of the 10 days (see Fig. 1 for more details).
b Data extracted by the VHA automated data-extraction system for daily reporting of VHA medical-facility counts to the National Healthcare Safety Network.
c Data reported by VHA medical facilities participating in this review, based on manual assessment of daily counts by facility staff.
d Patient matching at a facility was determined by comparison of patient identifiers (names, dates of birth, and last 4 digits of Social Security numbers) between the computer-extracted and facility-reported counts for each day.
Daily counts were also assessed for each facility, and review of personally identifiable information showed variability in the extent of patient matching (Supplementary Table S2 online). The difference in counts between the 2 sources on nonmatching days was low overall; the median difference was 1 patient for 23 (68%) of 34 facilities. On days when patient counts matched at the facility level, the personally identifiable information of patients also tended to match (mean across all facilities, 95% ± 16%). However, on days with nonmatching counts between data sources, the mean match of personally identifiable information across all facilities was only 48% ± 30% (Supplementary Table S2; see also Fig. 1 for variability in matching personally identifiable information).
Overall, 12 (33%) of 36 facilities met criteria for patient chart reviews. For 8 facilities, chart reviews showed that the data source with the higher daily counts had overcounted cases (Supplementary Table S3 online). Reasons for facility-reported overcounting included reporting patients who had a past positive test result but no symptoms on admission, local protocols designating certain patients (eg, transfers from nursing homes) as suspect cases in the absence of symptoms, universal laboratory screening of all patients for COVID-19 regardless of symptoms or reason for admission, and errors in following instructions. Computer-extracted overcounting occurred when the extraction reported patients who had a past positive result but no symptoms on admission, patients with no COVID-19 testing or a negative result, and patients without symptoms who were tested under a local universal screening policy.
Discussion
Electronic health record data are increasingly being used for public health surveillance [6,7]. Our findings offer insights for designing emerging-disease surveillance systems, and they highlight the importance of periodic validation by those collecting data for reporting.
The hospitalized-patient count matched between computer-extracted and facility-reported data on approximately 42% of facility-days. For the days on which counts did match, the individual patients from the 2 sources almost always matched, indicating a key level of accuracy. The most common reason for count mismatching in both sources, substantiated by comparing personally identifiable information and by selected chart reviews, was the determination of “suspect” cases for inclusion in the daily report, though the impact varied by facility. Symptoms of COVID-19 are broad and nonspecific [8] and are especially difficult to discern in older people, who may have atypical presentations and multiple comorbidities [9,10]. Therefore, identification of suspected cases by automated extraction was limited to patients who had a specific tag in their electronic record, such as being a “person under investigation.” From a facility-reporting perspective, local criteria for suspected cases sometimes differed from the NHSN definition. These findings lead to several observations for improving emerging-disease surveillance: (1) early standardization of healthcare-system laboratory and health-record terms in the EHR for an emerging disease can facilitate data extraction; (2) capturing suspect cases is often necessary for emerging-disease reporting systems, and clear definitions and instructions for suspect cases can prevent overreporting; and (3) keeping confirmed cases and suspect cases as separate data elements will lessen data-interpretation issues.
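To illustrate observations (2) and (3), a hypothetical classification routine is sketched below; the field names and the “person under investigation” tag value are invented stand-ins, not the VHA’s actual EHR schema.

```python
# Sketch of keeping confirmed and suspect cases as separate data elements.
# Field names ("sars_cov_2_result", "tags") are hypothetical.
def classify(record: dict) -> str | None:
    """Return 'confirmed', 'suspect', or None (excluded from the daily count)."""
    if record.get("sars_cov_2_result") == "positive":
        return "confirmed"
    if "person under investigation" in record.get("tags", ()):
        return "suspect"
    return None

print(classify({"sars_cov_2_result": "positive"}))              # confirmed
print(classify({"tags": ["person under investigation"]}))       # suspect
print(classify({"sars_cov_2_result": "negative", "tags": []}))  # None
```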
Although we identified areas for improvement, computer extraction was useful for assessing change in patient counts over time and was often more accurate than facility reporting. Allowing a limited discrepancy in count matching between sources increased the percentage of matching days from 42% to 85%. Moreover, when counts at facilities were discordant, the difference between sources was usually small. In the early months of an emerging disease, leveraging EHRs for automated extraction can relieve staff of much of the data-collection burden while providing sufficient accuracy for monitoring changes in case counts.
This study had several limitations. Facility participation was voluntary, and several participating facilities had low hospitalized-patient counts, so our results may not reflect validation in high-incidence areas. Nonetheless, in an emerging biological event, capturing even small numbers of cases is critical, and validation in low-incidence areas is valuable. Another limitation is the unknown generalizability of the VHA extraction system to public health reporting in other healthcare systems; however, the reported challenges related to the designation of suspect cases are informative for the development of any such system. Finally, we did not collect detailed information on how facilities determined their counts; interpretations had to be drawn from selected patient chart reviews.
Surveillance during an emerging infectious disease depends on data-collection definitions and reporting systems. Here, 2 data-collection mechanisms, computer extraction and manual facility reporting, demonstrated comparable results for surveillance of disease incidence, with similar pitfalls related to the sometimes ambiguous nature of case classifications. Validation studies are critical for identifying areas for improvement when developing data-collection platforms for emerging diseases.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/ice.2022.55
Acknowledgments
The authors are very grateful to the staff at the 36 participating facilities who collected data for this project. The views expressed are those of the authors and are not necessarily the official views of the Department of Veterans Affairs.
Financial support
No external financial support was received for this work.
Conflicts of interest
All authors report no conflicts of interest relevant to this article.