Introduction
There is an urgent need to address pervasive inequities in health and healthcare in the USA [1,2]. Many areas of health inequity, particularly those affecting Black, Indigenous, and people of color, are well known and well described [3–8]. However, important areas of health equity remain unexplored, and for many populations in the USA, accessing data to visualize and monitor health equity is difficult despite an exponential increase in data from electronic health records (EHRs). Furthermore, at institutions that offer tools for self-service data inquiry, the tools typically support only high-level queries on a single variable such as race, ethnicity, or sex. These systems also require training and expertise and are generally not accessible to community members due to privacy concerns.
Accurate assessment of health equity requires secure, accessible, and reliable data and analytic code. Translational informatics innovation has led to broad adoption of “common data models (CDMs)” such as the “Informatics for Integrating Biology with the Bedside (i2b2),” “Observational Medical Outcomes Partnership (OMOP),” and “National Patient-Centered Clinical Research Network (PCORNet)” CDMs [9,10], which has dramatically improved access to de-identified, standardized representations of EHR data for research and evaluation on a national scale. A growing number of institutions (including the one featured in this report) are also now routinely assessing social determinants (drivers) of health (SDoH) (e.g., food insecurity, housing insecurity, and economic instability) during clinical visits and recording responses in the EHR [11]. These sites are able to integrate these patient-reported SDoH data elements into their CDMs to make them available for research [11–18]. SDoH features related to where patients live are also increasingly available for use in systems that can link addresses reported at the time of visits to publicly available “place-based” (geospatial) social and environmental data at multiple levels (i.e., address, census block, block group, census tract, zip code, and state).
Complementing the development of CDMs and expanded SDoH data at the person and place level has been the development of well-specified and validated measures of health and healthcare processes and outcomes. Led by the US Centers for Medicare & Medicaid Services (CMS), these standard measures support the accurate and reproducible computation of health outcome status using electronic health data [19]. Electronic clinical quality measures (eCQMs) are “measures specified in a standard electronic format that use data electronically extracted from EHRs and/or health information technology (IT) systems to measure the quality of health care provided” [20]. eCQMs are routinely used within healthcare settings across the USA; however, their use in research settings has been more limited. Their use is expected to grow with increased use of Fast Healthcare Interoperability Resources (FHIR) and development of FHIR eCQM specifications.
In this report, we describe a project to build an open-source R-Shiny application, the “Health Equity Explorer (H2E),” designed to enable users to explore health equity data interactively by building graphs, tables, and maps and conducting statistical analyses in a way that can be easily shared within and across CDM-using communities. We prioritized approaches that could be implemented by a wide variety of potential collaborators to support exploration of virtually any computable health outcome from a wide variety of health domains. For this paper, we focus on patient-reported SDoH. However, H2E supports a broad and scalable array of social, environmental, and clinical attributes. In addition, although the project described in this report used the OMOP CDM, the underlying database for H2E (H2E DataMart) could be created from any CDM using analytic software code re-engineered to map to the coding systems for that CDM.
Materials and methods
This project is located at the largest safety-net hospital in New England, Boston Medical Center (BMC), and includes its affiliated federally qualified community health centers (FQHCs). The project is a collaboration between the BMC Health Equity Accelerator (HEA) [21], BMC Research Operations, and the Boston University Clinical and Translational Sciences Institute (BU-CTSI). Data for the project were obtained from an OMOP CDM repository, the “Boston Data for Equity (D4E) Platform.” D4E includes non-narrative data available from the BMC Epic EHR (diagnoses, medications, procedures, labs, vitals, clinical observations, and visits) and will also include data from our partner FQHCs.
BMC is a national leader in routine assessment of SDoH during clinical encounters. In 2017, BMC developed a one-page SDoH screening tool, THRIVE, that uses a subset of eight SDoH questions from national screening instruments [11]. Most THRIVE questions use the same question text and answer choices as an existing national screener (PRAPARE, AHC). THRIVE includes graphics designed to improve readability and is shorter (one page) than other screening instruments to optimize workflows. Although THRIVE use is currently limited to BMC and a small number of partner community health centers (CHCs) and health systems, its adoption at other sites is growing, and because it reuses questions from other surveys, it assesses SDoH in a way that is consistent with other instruments. THRIVE questions assess housing security, food security, financial stability (trouble paying for medications/utilities), transportation challenges, trouble caring for family members, employment/unemployment challenges, and desire for additional education. THRIVE data from the EHR are mapped to standard terminologies and stored within the D4E Datamart [18].
H2E also supports integrated use of place-based SDoH attributes. During preparation and updates of D4E data, patient addresses are geocoded to the census tract and zip code level. Data for census- and zip-code-level “place-based” social and environmental drivers of health (e.g., Child Opportunity Index (COI), Social Vulnerability Index (SVI), and American Community Survey (ACS)) are included for all census tracts in the USA [22–26]. Data in D4E are a limited dataset (protected health information (PHI) limited to dates, zip codes, and census tracts).
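To illustrate how geocoded census identifiers nest within one another, the following minimal sketch splits an 11-digit census tract identifier into state, county, and tract components. The function name and the example identifier are hypothetical and are not taken from the D4E code; only the 2 + 3 + 6 digit layout of tract identifiers is standard.

```python
from typing import NamedTuple


class TractFips(NamedTuple):
    state: str   # 2-digit state FIPS code
    county: str  # 3-digit county FIPS code (within the state)
    tract: str   # 6-digit census tract code (within the county)


def split_tract_geoid(geoid: str) -> TractFips:
    """Split an 11-digit census tract identifier into its parts."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected 11 digits, got {geoid!r}")
    return TractFips(geoid[:2], geoid[2:5], geoid[5:])


# Hypothetical example identifier (state code 25 is Massachusetts).
parts = split_tract_geoid("25025010100")
county_fips = parts.state + parts.county  # 5-digit county-level key
```

Keeping the codes as strings rather than integers preserves leading zeros, which matters when the same column is later used as a join key.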
Equity dimensions and equity attributes
In H2E, health equity outcomes are referred to as “Equity Dimensions (Dimensions)” since not all the observations are health outcomes. Dimensions can be demographic features (e.g., percent of population by race, ethnicity, sex, or membership in a special population), SDoH features (e.g., prevalence of food or housing insecurity), medical or behavioral health conditions (e.g., prevalence of autism or anxiety/depression), or clinical quality measures (e.g., percent of patients with diabetes with controlled hemoglobin A1C). Dimension data are precomputed and stored within the H2E Datamart in the “equity_dimension” table for each person and each year of eligibility. We selected an initial set of Dimensions that included children and adults and represented a diverse set of health domains, including Health Conditions, Prevention/Screening, Immunizations, Behavioral Health, SDoH, Demographics, and Disability (see Table 1).
AHRQ = Agency for Healthcare Research and Quality; ASD = autism spectrum disorder; CDM = common data model; COI = Child Opportunity Index; EHRs = electronic health records; H2E = Health Equity Explorer; PHQ = Patient Health Questionnaire; PSC = Pediatric Symptom Checklist; SVI = Social Vulnerability Index; SDoH = Social Determinants of Health.
Race and ethnicity are used in our analyses based on availability within our EHRs. We consider these features to be social constructs that reflect unmeasured factors related to individual and structural racism, racialization, and experiences of discrimination.
In H2E, features that may be drivers or determinants of health are referred to as “Equity Attributes (Attributes)” and include demographics (e.g., race and ethnicity), SDoH features, Dimensions, and place-based features (see Table 1). Attributes are precomputed and stored in the H2E Datamart in the “demo_attribute” and “dim_attribute” tables. The “dim_attribute” table allows all Dimensions to be available as Attributes and is created via a postprocessing table pivot. In H2E, a feature can function as both a Dimension and an Attribute. For example, “anxiety or depression diagnosis” can be a Dimension and “food insecurity” an Attribute in one analysis, and in a separate analysis, “anxiety or depression diagnosis” can be an Attribute and “control of hypertension” the Dimension.
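The pivot that makes every Dimension available as an Attribute can be pictured with a minimal sketch. The column names and sample rows below are hypothetical, and the actual H2E pivot is an SQL script rather than Python; the sketch only shows the reshaping from one row per person, year, and Dimension to one record per person and year.

```python
from collections import defaultdict

# Long format: one row per person, year, and Dimension (hypothetical columns).
equity_dimension = [
    {"person_id": 1, "year": 2022, "dimension": "food_security", "value": "insecure"},
    {"person_id": 1, "year": 2022, "dimension": "phq9", "value": "normal"},
    {"person_id": 2, "year": 2022, "dimension": "food_security", "value": "secure"},
]


def pivot_dimensions(rows):
    """Return one wide record per (person_id, year), one column per Dimension."""
    wide = defaultdict(dict)
    for row in rows:
        wide[(row["person_id"], row["year"])][row["dimension"]] = row["value"]
    return dict(wide)


dim_attribute = pivot_dimensions(equity_dimension)
```

In the wide form, a Dimension that was not assessed for a person in a given year is simply absent from that record, which corresponds to a NULL column in a pivoted SQL table.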
Our goal with the initial set of Dimensions and Attributes was to demonstrate the feasibility of our approach and to build a framework that can support many more of each in the future. Table 1 describes domains, dimensions, measure specification location, and CDM source.
H2E data
SQL code customized to the vocabularies of the target CDM is used to populate the H2E DataMart. Dimension and Attribute processing begins in a staging area. For each year and Dimension, all eligible patients are assigned a “status” (e.g., “controlled” or “uncontrolled” for diabetes) and a value (e.g., “secure,” “at risk,” or “insecure” for food insecurity). Dimensions are assessed one time per patient per year (with the most recent values typically used). The logic considers the timing of events for clinical and place-based variables to ensure that they are only included after a condition was diagnosed (e.g., a patient with a first diagnosis of diabetes in 2020 would only be considered to have the condition from 2020 onward). When available, the logic is based on validated code sets and logic from CMS-endorsed eCQMs.
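The once-per-year assessment and the diagnosis-timing rule described above can be sketched as follows. This is an illustration under stated assumptions, not the H2E SQL: the function name, the status labels for ineligible and unassessed patients, and the A1C threshold are all hypothetical, and the threshold in particular is not taken from any eCQM specification.

```python
from datetime import date

A1C_CONTROL_THRESHOLD = 8.0  # illustrative assumption, not taken from an eCQM


def yearly_a1c_status(first_dx_year, labs, year):
    """Return 'controlled', 'uncontrolled', 'not assessed', or None if ineligible.

    labs is a list of (date, a1c_value) tuples for one patient.
    """
    if first_dx_year is None or year < first_dx_year:
        return None  # condition not yet diagnosed, so the patient is not eligible
    in_year = [(d, v) for d, v in labs if d.year == year]
    if not in_year:
        return "not assessed"
    _, latest_value = max(in_year, key=lambda pair: pair[0])  # most recent result
    return "controlled" if latest_value < A1C_CONTROL_THRESHOLD else "uncontrolled"


labs = [(date(2020, 3, 1), 9.1), (date(2020, 11, 15), 7.2), (date(2021, 6, 2), 8.4)]
statuses = {y: yearly_a1c_status(2020, labs, y) for y in (2019, 2020, 2021, 2022)}
```

Here the patient is ineligible in 2019, is classified from the November result in 2020, and is counted as eligible but not assessed in 2022.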
After processing, data for each Dimension are consolidated in the H2E Datamart “equity_dimension” table along with supporting demographic tables. The H2E application only requires two tables: “person_data” and “fips_data.” The “person_data” table includes all patient-related data needed to generate Dimension and Attribute measures and links to the Federal Information Processing Standards (FIPS) codes of residence. The “fips_data” table includes FIPS code-level data related to the census tract of residence. The current “person_data” table design was developed to support an earlier Tableau-based H2E and will be optimized in the coming year to reduce duplication and improve efficiency. The two primary tables for H2E are database “views” (linked tables presented as a single table) (see Fig. 1). The “dim_attributes” table is generated from the “equity_dimension” table via an SQL pivot script, which allows the application to use any Dimension as an Attribute via a table linkage (SQL JOIN) (see Fig. 1). For our pilot version of H2E, we limited our place-based data to COI and SVI. The “fips_data” view was created via a join of SVI and COI data by FIPS code. Additional FIPS-level data can easily be added to the “fips_data” view (Fig. 2) as needed via a relational join on the FIPS column. Currently, a single race, ethnicity, and sex value is supported for each measurement year. In the coming year, we will add support for multiple races. Attribute data are also available as filters in H2E (Fig. 1).
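A minimal sketch of the view-and-join pattern described above, run against an in-memory SQLite database from Python. The source table layouts, column names, and sample values are hypothetical simplifications; only the idea of building “fips_data” by joining SVI and COI on the FIPS code, and then linking person-level rows through the same column, comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified source tables
    CREATE TABLE svi (fips TEXT PRIMARY KEY, svi_overall REAL);
    CREATE TABLE coi (fips TEXT PRIMARY KEY, coi_level TEXT);
    CREATE TABLE equity_dimension (person_id INTEGER, year INTEGER,
                                   fips TEXT, dimension TEXT, value TEXT);

    INSERT INTO svi VALUES ('25025010100', 0.82), ('25025010200', 0.35);
    INSERT INTO coi VALUES ('25025010100', 'low'), ('25025010200', 'high');
    INSERT INTO equity_dimension VALUES
        (1, 2022, '25025010100', 'food_security', 'insecure'),
        (2, 2022, '25025010200', 'food_security', 'secure');

    -- Place-based view: SVI and COI joined on the FIPS code
    CREATE VIEW fips_data AS
        SELECT svi.fips, svi.svi_overall, coi.coi_level
        FROM svi JOIN coi ON coi.fips = svi.fips;
""")

# Link person-level Dimension rows to place-based data through the FIPS column.
rows = conn.execute("""
    SELECT d.person_id, d.value, f.svi_overall, f.coi_level
    FROM equity_dimension AS d
    JOIN fips_data AS f ON f.fips = d.fips
    ORDER BY d.person_id
""").fetchall()
```

Adding another place-based source in this pattern would mean one more table keyed by FIPS code and one more JOIN clause in the view definition.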
H2E application development
We used an interactive design process with input from multiple stakeholders, including leaders of the BMC “Health Equity Accelerator,” community advisory boards (CABs), and expert users. We also used materials from the Observational Health Data Sciences and Informatics (OHDSI) community, open-source statistical and application development tools, and standard measure specifications and value sets for target outcome measures. The application was initially developed in Tableau during May 2022–March 2023 and was then transformed into an R-Shiny app between June and September 2023. We chose R-Shiny to support a much broader array of statistical functionality than is possible in Tableau and to enable open-source sharing of our application in the future. The R-Shiny development work was done in collaboration with Appsilon, LLC (www.appsilon.com) and the “Data for Good” Program.
Results
We have developed an open-source platform that integrates clinical and place-based SDoH data. As of December 15, 2023, H2E contains 8,478,301 rows of Dimension data for 705,686 people who attended BMC at least once since 2016 and met criteria for at least one Dimension. For this group, 996,382 unique observations for questions related to food and housing security were available for 324,630 patients with 65,152 (20.1% of those with at least one visit) of the patients reporting food or housing insecurity at least once.
Health outcomes
In the Health Outcomes section, users choose a Dimension and then select Attributes to stratify the outcome and visualize it as a graph or a data table. For this paper, we demonstrate this functionality using the example of control of hemoglobin A1C for patients with diabetes, stratified by race and sex (see Fig. 3). The population can be filtered by Attribute values, and when the “missing” checkbox is selected, the number and percent of patients for whom the Dimension was not assessed are displayed so users can assess differences and biases in assessment rates (Fig. 3). We also explored relationships between food security and the results of behavioral health screening for children and adults. The PSC-17 is a routine screening tool to assess internalizing, externalizing, and attentional issues in 6–12-year-old children [27–29]. At BMC, the PSC-17 screener is given with a THRIVE form, so results from both instruments are often available. A score of less than 15 is considered “normal.” The Patient Health Questionnaire (PHQ)-9 is a routine screening tool for depression [30]. At BMC, the PHQ-9 is also often given with THRIVE. A score of less than 10 is considered “normal.” As shown in Figure 4, food-insecure children and adults were substantially less likely to have a normal PSC-17 or PHQ-9 result. The results shown were generated in less than 5 minutes.
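The stratified view with the “missing” option can be sketched as a small aggregation. The records, group labels, and function name below are hypothetical and are not H2E data or code; the sketch only shows how the percent meeting a Dimension (among those assessed) and the percent not assessed can be reported side by side for each stratum.

```python
from collections import Counter

# Hypothetical records: (race, sex, status); None means not assessed.
records = [
    ("A", "F", "controlled"), ("A", "F", "uncontrolled"), ("A", "F", None),
    ("B", "M", "controlled"), ("B", "M", "controlled"), ("B", "M", None),
    ("B", "M", None),
]


def stratified_summary(rows, target="controlled"):
    """Per stratum: percent meeting target among assessed, and percent not assessed."""
    totals, assessed, met = Counter(), Counter(), Counter()
    for race, sex, status in rows:
        key = (race, sex)
        totals[key] += 1
        if status is not None:
            assessed[key] += 1
            met[key] += status == target
    return {
        key: {
            "n": totals[key],
            "pct_met": 100 * met[key] / assessed[key] if assessed[key] else None,
            "pct_missing": 100 * (totals[key] - assessed[key]) / totals[key],
        }
        for key in totals
    }


summary = stratified_summary(records)
```

Reporting the two percentages together makes it visible when an apparently favorable rate in one group rests on a smaller share of patients having been assessed.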
Advanced analytics
Dimensions and Attributes can also be carried into the Advanced Analytics section for univariable and multivariable analyses using R packages (Fig. 5). Users can descriptively model relationships between health outcomes and predictor variables. The exploratory data analysis tab helps assess collinearity, distribution of data, and the individual association of a variable with a health outcome. Data are fit to a logistic regression model to predict the likelihood or odds of a patient meeting the chosen Dimension criterion for a given set of Attributes. Estimated marginal means (EMMs) are used to calculate the average likelihood of a patient meeting the criteria for a given metric within different subgroups (by race, age, sex, etc.). EMMs are calculated by taking the average of each group’s predicted values after adjusting for the other variables in the model, providing a more interpretable summary of the results of a logistic regression analysis.
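As a rough illustration of the EMM idea, the sketch below starts from already-fitted logistic regression coefficients and averages the model’s predictions for each food security level over the levels of a second covariate. The coefficients are hypothetical and are not results from H2E. The sketch also assumes one common convention (equal weighting of covariate levels, averaging on the log-odds scale, then back-transforming to a probability); the H2E application relies on R packages and may be configured differently.

```python
import math

# Hypothetical fitted coefficients on the log-odds scale (not results from H2E).
coef = {"intercept": 1.2, "food_insecure": -0.8, "female": -0.3}


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear_predictor(food_insecure, female):
    return (coef["intercept"]
            + coef["food_insecure"] * food_insecure
            + coef["female"] * female)


def emm_by_food_security(food_insecure):
    """Average the log-odds over both sex levels equally, then back-transform."""
    logits = [linear_predictor(food_insecure, female) for female in (0, 1)]
    return inv_logit(sum(logits) / len(logits))


emm_secure = emm_by_food_security(0)
emm_insecure = emm_by_food_security(1)
```

The two resulting values are adjusted probabilities for the food-secure and food-insecure groups, which are usually easier to communicate than the raw coefficients.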
The modeling component of the application allows users to select which Attributes to include in the model and how to group the results. Running the model returns coefficients, confidence intervals, and model diagnostics such as the variance inflation factor. A simple example of advanced analytics evaluating the association of depression screening results (PHQ-9) with sex and food security is shown in Figure 5. In this example, a score below 10 is a “normal” or subthreshold score, so a higher proportion having a score below 10 is a positive outcome. In the example, women and food-insecure respondents were significantly less likely to have a score less than 10.
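For readers less familiar with logistic regression output, the following sketch shows how a coefficient and its standard error translate into an odds ratio with an approximate 95% Wald confidence interval. The numbers are hypothetical and are not the Figure 5 results, and the function name is illustrative.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and approximate 95% Wald interval from a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error for illustration only.
or_, lower, upper = odds_ratio_ci(beta=-0.8, se=0.1)
significant = not (lower <= 1.0 <= upper)  # interval excludes 1
```

An odds ratio below 1 with an interval that excludes 1 corresponds to a group being significantly less likely to meet the Dimension criterion.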
Geospatial visualization
In the “Neighborhood Maps” section, users can explore visualizations of a Dimension by census tract and simultaneously view place-based features from the list of SVI and COI reference data by census tract. Users can then select the “bivariate” checkbox to layer the two views to visualize the additive effect of the two features. The current H2E data model can support place-based data at the zip code, census tract, county, and state levels. In the future, the functionality in this section will be expanded to support visualizations at all of these levels and to allow users to see much more detailed information about each area of interest. A simple example of a place-based visualization of blood pressure control for patients with hypertension and SVI socioeconomic status is shown in Figure 6.
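One common way to build a bivariate map layer is to bin each of the two variables into thirds and color each tract by the resulting pair of classes (a 3x3 legend). The sketch below assumes that approach; the text does not specify how H2E combines the two layers, and the tract identifiers, values, and function names are hypothetical.

```python
def tertile_class(value, values):
    """Return 0, 1, or 2 for the lower, middle, or upper third by rank."""
    ranked = sorted(values)
    rank = ranked.index(value)  # rank of the first occurrence among ties
    return rank * 3 // len(ranked)


def bivariate_classes(tracts):
    """Map each tract to a (dimension_class, place_class) pair for a 3x3 legend.

    tracts maps a tract id to (dimension_rate, place_index).
    """
    rates = [r for r, _ in tracts.values()]
    indexes = [i for _, i in tracts.values()]
    return {
        tract: (tertile_class(rate, rates), tertile_class(index, indexes))
        for tract, (rate, index) in tracts.items()
    }


# Hypothetical tract-level values: (percent with controlled blood pressure, SVI score)
tracts = {"t1": (55.0, 0.90), "t2": (62.0, 0.60), "t3": (70.0, 0.40),
          "t4": (75.0, 0.20), "t5": (58.0, 0.75), "t6": (80.0, 0.10)}
classes = bivariate_classes(tracts)
```

Each class pair would then be assigned one of nine colors, so that tracts with both low control rates and high vulnerability stand out from tracts where only one of the two is unfavorable.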
Discussion
In this report, we describe a process and an application design and provide pilot data. Our primary goal was to demonstrate the functionality and potential uses of H2E, especially in the area of assessing patient-reported and place-based SDoH. Our experience to date shows that data for a diverse set of health equity dimensions and attributes can be generated for children and adults using a CDM and shareable code. Since December 2023, we have added over 400 additional place-based attributes and are able to develop and validate new Dimensions in several days. At our site, we are hosting H2E on a server located within the hospital intranet. Our plan over the coming year is to offer H2E to users who already have access to our other translational informatics resources (TriNetX, OMOP, i2b2, and PCORNet). Our experience shows that with a tool like H2E, the data are easily explored interactively as graphs, tables, statistical analyses, and maps in a way that allows dynamic exploration of the role of patient-reported as well as place-based social and environmental drivers of health.
Advancing health equity is a national priority, and the fundamental causes of health inequity, such as racism, are increasingly being recognized as public health crises [31]. H2E is a platform that most sites with a CDM could implement and use with existing staff and expertise. For OMOP-based settings, our SQL scripts could be used directly. For other CDM sites, re-engineering our OMOP scripts with mappings to concepts within i2b2 or PCORNet would be relatively easy, and the resulting scripts could be distributed through shared code libraries. In this way, the underlying data for H2E could be generated from virtually any clinical data source and could potentially serve as a standard way to share “health equity” insight between different CDM-using communities.
We acknowledge that data visualization and analytics in isolation will do little to advance health equity; however, we hope that tools like H2E can “shine a light” on inequity and identify “bright spots” that could be used to potentially identify solutions. We envision multiple potential use cases for H2E.
One potential use case would be for research users at an institutional level to perform self-service exploration onsite using H2E hosted on an internal “Proxy” server (as at the site of this project). In this way, a large number of research users could explore existing equity dimensions quickly to prepare for research proposal submission. New Dimensions could also be added quickly and then explored immediately by the full range of already computed Attributes. The benefits of this use case would be to generate new projects and proposals and to monitor improvement activities related to health equity moving forward.
A second potential use case would be collaboration with public health leaders at a city or state level. Sites with an existing CDM could share aggregate findings easily or use privacy-preserving record linkages to link records across the city/state to study health equity in locations with multiple care sites. Results could be used to inform health leaders and policymakers and to evaluate community-based interventions such as those targeting economic mobility and housing in neighborhoods.
A third potential use case could be as a patient engagement tool. Patients could work with community advisory boards (CABs) to identify priority conditions and then develop new Dimensions informed by the community to explore health equity at a neighborhood level. Such an approach could help engage patients in the research process and stimulate conversations leading to new promising research activities.
H2E could also be used within a National Health Equity Research Network between CTSAs and other research institutions with established CDM data. These centers are well prepared to add place-based data to their data models (if not already present), and with a library of shared analytic code tailored to each CDM, the effort required would be relatively small. Such a collaboration could start small but would be expected to grow quickly. Data sharing at this level would have great potential to support comparative health equity research on a national scale.
H2E is also well suited to education, training, and research applications. A wide range of potential users could be supported in a hands-on way that brings together geospatial and clinical data to evaluate equity in their community. We plan to extend the H2E Advanced Analytics module to include new modules related to machine learning this year. Developers and data scientists could also use the open-source H2E platform to build new applications that leverage existing R packages. Lastly, the underlying data for H2E can easily be linked to the source CDM to allow data scientists to use the precomputed H2E outcomes and the underlying CDM data to support advanced data science applications. Together, these uses could potentially lead to accelerated workforce development, multi-stakeholder engagement, and new opportunities in data science.
Limitations
H2E provides easy access to detailed descriptive analyses; however, there are well-known limitations of using EHR data, and in most cases, additional analyses will be required to validate findings observed in H2E. Users of H2E will continue to need to have training in health equity and health services research. Analyses using H2E should be considered exploratory and best used for signal detection and hypothesis generation, since prediction modeling generally requires a specific set of methods that go beyond what is included in the tools. In addition, users are urged to not draw overly strong conclusions from results and to not use these results in ways that generalize, essentialize, or stereotype certain groups. H2E should be used alongside other sources of evidence if guiding interventions.
An additional limitation is that the THRIVE instrument is currently only used at a small number of clinical sites and that, as a safety-net hospital system, our results may not be generalizable to other sites. Even in our health system, which has placed a very high priority on screening for SDoH, assessment is not universal. Unfortunately, most sites in the USA do not currently screen for SDoH. However, it should be noted that H2E is not limited to, nor does it require, patient-reported SDoH data to provide rich insights and analyses related to health equity. All systems with CDMs have access to rich clinical and demographic data, and the addition of place-based data is feasible for many sites. While we hope that more health systems will be routinely asking patients about their SDoH experience soon, for sites that do not, tools like H2E could still offer value and insights.
Conclusion
H2E can be used to support dynamic and interactive explorations of the diverse drivers of health and health inequity as graphs, tabular data, statistical analyses, and maps. The system has the potential to support multiple CDMs and many more health equity dimensions and attributes in the future. With expanded use and partnerships, these tools have the potential to support distributed health equity research and intervention on a national scale.
Acknowledgements
The authors would like to acknowledge and thank (1) Ravin Davidoff, MD, Elena Mendez-Escobar, PhD, MBA, and Megan Bair-Merritt, MD, MSCE, for their support and leadership through the BMCHS Health Equity Accelerator over the past 2 years, (2) our colleagues from the National Covid Cohort Collaboration (N3C) Social Determinants of Health Workgroup for their pilot evaluation and enhancement suggestions: Charisse Madlock-Brown, PhD, Anthony Solomonides, PhD, MSc, MSc, and Juan Espinoza, MD, and (3) the project team from Appsilon: Ian E. Moore, Andrzej Białaś, Jakub Stępniak, Michał Parkoła, Yury Pribysh, and Can Taşlıçukur.
Author contributions
All four authors participated in the conception and design of the work. Dr Adams and Ms Gasman participated in the collection and management of data. All four authors participated in the creation and evaluation of the software and tools, conduct and interpretation of analysis, and drafting of the manuscript. Dr Adams takes responsibility for the manuscript as a whole.
Funding statement
This project was supported by the Boston Medical Center Health Equity Accelerator and National Center for Advancing Translational Sciences (1UL1TR001430).
Competing interests
None.