Introduction
The global COVID-19 pandemic has spotlighted the relevance of health research and evidence-based public policy decision-making around the world. Technological advancements have made it possible to collect, share, and analyze large amounts of health data. However, collecting high-quality data requires appropriate data collection infrastructures and instruments, which were shown to be lacking in several countries during the COVID-19 pandemic (e.g., Klingwort & Schnell, 2020; Schaurer & Weiß, 2020). Moreover, the quality of empirical evidence relies heavily on people’s willingness to share their health data for research purposes (Aitken et al., 2016). Willingness to share data is closely connected to questions of data privacy and ethics that need to be asked anew with the rise of novel data sources, such as smartphone sensors that track bodily functions and mobility (Oberski & Kreuter, 2020; Struminskaya et al., 2021). In this context, data collectors need to take a fine-grained perspective on such sentiments, as acceptance of data use may strongly depend on the concrete scenario in which a person is asked to share personal information. This is because the legitimacy of a specific data collection may be questioned by individuals if strong and transparent privacy safeguards are not in place along each step of the data sharing process.
To comply with the public’s privacy expectations, policymakers and data collectors need to know the conditions under which the collection of specific kinds of data is considered acceptable by their citizens. Understanding privacy as “contextual integrity” (CI; see Nissenbaum, 2010, 2019) provides a context- and situation-sensitive perspective on data flows that allows us to investigate the circumstances under which people accept the collection and use of their health data. CI is upheld if no violation of context-specific privacy norms occurs. CI posits that the (novel) data flow needs to be specified and then evaluated to determine whether it conforms with established and context-specific privacy norms.
The novelty of data flows that aim to improve public health depends on which practices are already established in contexts within specific countries. For example, in Germany, the digitalization of the health system is less advanced than in many other EU countries (Bertelsmann Stiftung, 2019). Several technological and medical developments (e.g., electronic patient records) could be more integrated into the maintenance of individual and public health. Sensor data from smartphones promise greater digitalization of medical health research. However, in order to roll out new systems, such as applications that monitor COVID-19, in a manner that is ethical and acceptable to the public, it is crucial to construct data flows that align with contextual norms. Yet, most of these technologies require data flows that citizens are not familiar with, and social norms for these data have only been established to a limited extent. Still, these novel data flows may be embedded in established social contexts or resemble already existing data flows (see, e.g., Vitak & Zimmer, 2020, with respect to the acceptance of COVID-19 contact tracing apps depending on situational parameters). Therefore, to improve individual and public health, we need to learn which health data flows are considered appropriate in which contexts.
Against this backdrop, we investigated the conditions under which individuals deem the sharing of different types of health data to be more acceptable, particularly with respect to the sharing of health data for public or personal benefit. Our study drew on the framework of CI to define 18 unique data sharing scenarios, which were presented to respondents in an online vignette survey experiment (Auspurg & Hinz, 2015). These scenarios varied on three dimensions: data type, data recipient, and data use purpose. We presented randomly selected vignettes related to cancer research, which has the advantage that our results were not directly affected by current events or changes in the global health situation regarding the COVID-19 pandemic. At the same time, cancer receives a large amount of attention from the scientific community and the public and affects many people’s lives. Thus, combating this disease should be relevant to most citizens. Therefore, willingness to share data for cancer research may be higher than for comparatively less severe and/or less known diseases.
Studying willingness to share health data across different scenarios allows us to better understand which data flows are socially considered appropriate for sharing health data for private and public benefit. In particular, given the interplay of public and private entities in handling such new types of health data flows, the findings tell us whether private- and public-benefit uses of health data are accepted only when requested by private and public data recipients, respectively. This empirical investigation sheds light on the nature of social norms in health contexts, that is, which recipients and which data are appropriate to be used in the provision of personal and public health. For data sharing practice, the findings can inform the design of data collection activities of public and private organizations and help adjust practices to the expectations of individuals, thereby increasing the trust and willingness of citizens to participate.
Theoretical background
CI provides a framework to jointly investigate several relevant features of data flows, thereby allowing researchers to empirically ascertain which factor combinations are publicly accepted and align with social norms. From a CI perspective, the following situational parameters need to be specified to sufficiently describe data flows: the data type; the involved actors, such as the data sender and recipient; and the transmission principles, that is, the “rules” under which the data are transferred (Nissenbaum, 2010, 2019). For example, individuals (data senders) might find it acceptable to provide sensor data from their smartphones (Data Type A) to a company (Recipient A) or to give consent to transfer a copy of their medical records (Data Type B) to university researchers (Recipient B) but not to a public authority (Recipient C). The CI perspective, however, does not allow us to make predictions about whether specific parameters, such as specific data types or recipients, will be generally more accepted. Instead, it can be argued that the closer a specific data flow is to contextual privacy norms, the higher the likelihood that people will accept this data flow.
CI theory suggests a prescriptive understanding of social norms, that is, what is “right” to do in a certain situation (Nissenbaum, 2010). Yet, from a CI perspective, novel data flows may still be acceptable if they fulfill contextual purposes better than established practices, even if they do not conform to them (Nissenbaum, 2010). In such situations, individuals might still be willing to share data, for example, because the data flow serves a public purpose that is perceived as sufficiently important and appropriate. Similarly, individuals may think that a data flow conforms with established norms but may nonetheless be hesitant to provide their data, for example, because the purpose is not perceived as sufficiently important or the effort to share these data is considered too high.
From the perspective of individual decisions to share data when confronted with novel practices, we argue that individuals may consider potential benefits and risks, as suggested by the notion of the “privacy calculus” (Culnan & Armstrong, 1999). More specifically, the privacy calculus assumes that privacy is an economic good that can be traded for benefits, such as other goods or services (Kehr et al., 2015; Smith et al., 2011). For example, individuals may decide whether to use new technologies depending on their ease of use and their usefulness (Davis et al., 1989). Considering the privacy calculus, we suggest that the privacy-specific risks and benefits are related to the fulfillment of contextual norms and goals. This means that individuals evaluate a novel health data flow depending on its appropriateness to fulfill the contextual purpose of promoting health. In short, we argue that novel data flows that do not conform to established norms may still be acceptable to individuals and that their acceptability is linked to the perceived benefits and costs of the new data flow, which are context dependent.
With respect to the purpose of a data flow, we need to determine which purposes individuals consider to be relevant contextual purposes. According to CI, purposes are core constitutive elements of social contexts (Nissenbaum, 2019). Certain sub-contexts (see Nissenbaum, 2010) of the health context might be understood to serve one purpose more than another. For example, the doctor-patient relationship is likely to constitute a sub-context that has the purpose of improving personal health. In contrast, transferring information about COVID-19 symptoms to a local public health agency likely serves the purpose of safeguarding public health. Yet, in both cases, personal and public benefits may arise. With respect to the acceptability of data sharing, however, it is crucial to determine which uses are perceived to serve the desired improvement of public or personal health and which uses are perceived to violate central tenets of the health context.
In line with CI theory, our study has a strong situational and exploratory component, as we cannot stipulate that any data type, recipient, or purpose that is aimed at providing individual and collective benefits is, as such, more or less acceptable to individuals. Instead, we need to consider the situational parameters in interaction with one another. Given the theoretical considerations outlined earlier, our hypotheses are guided by three propositions: Health data flows that are closer to established privacy norms are more likely to be accepted by individuals (P1). Individuals are more likely to share their health data when the benefits (personal and collective) of doing so appear higher and the costs (e.g., required effort and consequences of out-of-context use) appear lower (P2). The potential benefits and costs of a novel data flow need to be interpreted with respect to the social context in which the data flow is embedded (P3). In the following, we specify the CI framework parameters to investigate the conditions under which individuals are willing to share their health data.
Previous research
Prior empirical research has investigated the willingness to share health data in several scenarios, showing, for example, that data sharing is viewed as most acceptable when the purpose is in the interest of the public, when the data are shared in a privacy- and security-preserving way (Footnote 1), and when the data recipient can be trusted (Waind, 2020). Previous work on the use of health administrative and clinical trial data also found that trust and public benefits are key to data sharing acceptability (Hutchings et al., 2020). In addition, control over the data that are shared was shown to be an important mediating factor that influenced willingness to share health data (e.g., Jones et al., 2019; Juga et al., 2021; Stockdale et al., 2018). It was also emphasized that citizens are concerned about the profit orientation of commercial data recipients and that they favored a public benefit for those data recipients (Aitken et al., 2016).
Earlier research also found that people are indeed willing to share (health) data, such as biobank data, for health research purposes (Husedzinovic et al., 2015). In contrast, more skepticism can be expected for health-related use of data collected in nonhealth contexts. For example, previous research showed that the use of data collected on Facebook for research purposes is often less accepted than uses that are merely aimed at improving user experience (Gilbert et al., 2021). Similarly, a survey showed that linking health data to personal nonhealth data was less acceptable than linking data from the same context (Aitken et al., 2018).
Previous survey experiments based on CI have shown that respondents’ privacy attitudes changed depending on who exactly received which kinds of data under which conditions. For example, Martin and Nissenbaum (2017) showed that commercial uses (e.g., health data sold to pharmaceutical companies for marketing) overall conform less with privacy expectations than uses that fulfill contextual purposes (e.g., health data used for research to improve health conditions). In another study, Martin and Shilton (2016) showed that privacy expectations with respect to data collection from mobile devices for targeted ads and tracking vary greatly depending on the situational parameters. In addition to such situational parameters and contextual norms, individual characteristics may impact citizens’ evaluations of data flows. For instance, individuals with high trust in government institutions may be less skeptical of data used by public authorities than individuals with lower institutional trust (Kehr et al., 2015). While individuals may, regardless of their level of trust in the government, support the use of health data for research that aims to improve public health generally (Waind, 2020), they would likely disagree on who should receive such data to achieve this purpose. Other individuals may reject the idea of sharing their personal health data with any recipient because they regard the requested data as too personal and the data sharing request as intrusive (Lacasse et al., 2021).
Gerdon et al. (2021) conducted a vignette experiment on the acceptability of data sharing in which they compared the acceptance of sharing health data with that of two other data types (energy consumption and location data). They also experimentally varied the organization that received the data (a public authority or a company). Surprisingly, sharing data with a public institution was overall less accepted than sharing data with a private organization. This finding has worrisome implications, especially considering the COVID-19 pandemic but also in general for other public health crises, as public institutions rely on data to monitor and prevent the spread of diseases, for example, through contact tracing apps or the targeted implementation of public health campaigns. However, the study only investigated one specific type of health data, while health research and public health policy rely on several sources of data to tackle issues of public health.
Willingness to share health data: Data type, recipient, and purpose
In this section, we discuss the effects of changes in CI-based data flow parameters on the willingness to share health data. In particular, we are interested in several recent technological and medical opportunities that have the potential to be used more frequently in Germany and in many other countries in the near future: electronic health records, biomarker data (Footnote 2), and health-related smartphone sensor data. These data types cover different types of health data collections that may happen in different social contexts with various data recipients. They especially may involve privacy considerations specific to the data type and/or private actors (Gerdon et al., 2021). On the one hand, medical records and biomarker data are usually collected in narrow and well-defined contexts that suggest high standards of data protection, that is, by physicians or other care providers, health insurers, and researchers. Sensors, on the other hand, can amass high volumes of data in infrastructures in which sharing is technically feasible among multitudes of actors, such as app developers, smartphone providers, and other third-party actors. Individuals may associate various contexts and potential uses when considering sharing their sensor data. Out-of-context use appears to be a more salient threat for sensor data than, for example, for biomarkers, as has been discussed with respect to COVID-19 tracing apps (Vitak & Zimmer, 2020). Therefore, we expect that people will be more likely to agree to share their biomarker data and medical records than their sensor data if the recipient has a public background (H1.1). For private recipients, we expect that individuals will be less likely to share data that are associated with specific health contexts (medical records and biomarkers) than sensor data (H1.2).
Overall, we argue that the high effort required to share biomarkers (e.g., blood) results in a particularly strong data sharing hesitancy for this data type. Therefore, the acceptance to share biomarker data should be, ceteris paribus, the lowest among the three data types studied (H1.3).
With respect to data recipients, a particular concern is the previously found lower acceptance of data sharing with public institutions compared with private entities in Germany (Gerdon et al., 2021). Such reluctance might result from concerns that government institutions could use the data for different purposes than initially intended without asking for permission (Turow & Hennessy, 2007; Weitzman et al., 2012). While such concerns can be present for private recipients (e.g., companies) as well, concerns about potential consequences might be more pronounced for public institutions, especially with respect to government surveillance. However, research shows that there are differences in trust levels across public institutions (Krause et al., 2019), and citizens may approve of public-benefit uses of data with respect to certain public institutions that explicitly follow research purposes, for example, dedicated university research centers (Karampela et al., 2019; Mello et al., 2018).
Given the different possible public and private recipients, we argue that out-of-context use is least likely to be expected from university research centers. At the same time, the recipients are unlikely to be associated with differences in perceived benefits or required data sharing efforts. Therefore, we expect that the willingness to share data will be higher for university research institutions than for public health authorities and private companies (H2.1). Moreover, trust is a central prerequisite for accepting the sharing of health data (Bauer et al., 2019). Individuals may vary in their trust toward different recipients, irrespective of the indicated purposes for which the data will be used. Therefore, higher trust in the respective organization should, ceteris paribus, lead to a higher willingness to share data (H2.2).
Taking the contextual perspective into account, a data recipient can never be fully separated from the purpose for which the recipient plans to use the data. While each of the data types can be analyzed to provide a benefit to the individual data subject (e.g., improvement of diagnoses, recommendations on health-related behavior) and/or recipient, the public also appears to be willing to accept the use of health data for the public interest (Bearth & Siegrist, 2020; Waind, 2020), that is, to improve public health. In both cases, individuals may perceive the data sharing to be useful. Yet, while we assume that individuals will be generally more likely to share their data if they anticipate a personal benefit (H3.1), whether these benefits are considered sufficient in a given situation, for example, because of a low risk of out-of-context use, may depend mainly on the data sharing context, especially on the data recipient.
Some sub-contexts of the health context might be more oriented toward promoting individual health (e.g., doctor-patient relationships), while others are more linked to the improvement of public health (e.g., health agency-individual relationships regarding notifiable infectious diseases). It is likely that public recipients are associated with public-specific contextual goals, while private recipients are associated with private-specific contextual goals. However, Gerdon et al. (2021) did not consistently find such a relationship. Yet, individuals are expected to have a higher likelihood of fearing out-of-context use if recipients use data for a purpose that is not in accordance with established norms. Therefore, we expect that a match between a private data recipient and a private purpose and a public data recipient and a public purpose will result in higher acceptance rates than a “mismatch” between data recipient and purpose (H3.2).
Beyond contextual characteristics, individuals may vary in how much they are willing to help others and contribute to the public welfare. That is, some individuals may be more inclined than others to perceive public health benefits as an appropriate purpose compared with individual health benefits. Thus, we hypothesize that individuals who display higher altruism (Kim & Stanton, 2016) will be more willing to share health data for public benefit than people with lower scores on altruism (H4.1). Similarly, we assume that the more individuals perceive public duties (Voigt et al., 2020), such as voting and paying taxes, as important obligations of good citizens, the more willing they will be to share health data for a public benefit (H4.2). In addition, given the general trend of increasing trust in scientists in recent years (Funk et al., 2019) (Footnote 3), we expect that higher levels of general trust in the scientific community will positively affect the likelihood to share data for a public benefit (H4.3). Sharing for a personal benefit should be less or not affected by trust in science.
Finally, without concrete hypotheses, we collected data about respondents’ cancer exposure, smartphone and smartwatch use, technical affinity, social trust, and political ideology. The corresponding supplementary analyses, which are exploratory in nature, are reported at the end of the Results section.
Given the importance of data sharing for health research and policymaking, the results of our study can help inform the scientific debate about data sharing hesitancy. The study can help develop best practice advice for three data types (sensor data, medical history, and biomarkers) but also identify privacy-related social norms. Since, in practice, there is rarely a previously tested scenario that exactly matches the needs of a data recipient, the study can contribute to a better general understanding of how situational parameters may work differently for different data types. Additionally, the breakdown of data types, recipients, and purposes allows us to estimate the relative importance of each component. This will help identify the main drivers of respondents’ willingness to share data. For example, for some groups of respondents, their level of trust in the data recipient might be especially important, whereas for other respondent groups, the purpose might be the most relevant variable. Getting a deeper understanding of the mechanisms behind nonacceptance can also help us develop successful and privacy-conforming data sharing practices that increase willingness to share data for research.
Preregistered research design
We conducted a preregistered survey experiment in which we randomly varied parameters of the data flow as defined by the CI framework to learn which kinds of health data German citizens were willing to share under which conditions (Footnote 4). The so-called vignette experiment or factorial survey experiment (Auspurg & Hinz, 2015) was implemented in a web survey in Germany with a minimum sample size of about 750 respondents. This sample size was based on an approximated power analysis using an ANOVA design with repeated measures and within-between interaction, using the software G*Power (Faul et al., 2007) (input parameters: effect size = 0.1 [Footnote 5], α error probability = 0.05, power = 0.95, number of groups = 18, number of measurements = 3, nonsphericity correction = 1). The suggested sample size was 648 respondents. To account for possible exclusion of cases because of insufficient data quality, we increased this number by 15 percent, which resulted in a minimum sample size of 746 respondents. The respondents were recruited from a German commercial online nonprobability access panel and received a small monetary incentive for their participation. To ensure a heterogeneous sample, we screened by gender, age, and educational attainment to represent noncrossed quotas of the German general population.
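The step from the suggested to the minimum sample size is simple arithmetic and can be sketched as follows. The variable names are illustrative, and rounding up to the next whole respondent is an assumption that is consistent with the reported figure of 746; the power analysis itself was run in G*Power and is not reproduced here.

```python
import math

# Values reported in the text.
suggested_n = 648   # sample size suggested by the power analysis
buffer = 0.15       # increase to allow for exclusions due to insufficient data quality

# Assumption: the inflated value is rounded up to the next whole respondent.
minimum_n = math.ceil(suggested_n * (1 + buffer))
print(minimum_n)  # 746
```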
As displayed in Table 1, the vignette experiment included three dimensions: data type (sensor data, medical records, biomarkers), data recipient (public health agency, university research center, private company), and purpose of the research (public policy, personal recommendation). This resulted in 18 unique vignettes (3 × 3 × 2). We presented each respondent with one vignette on each data type in random order. Thus, each respondent was randomly assigned to one of the six versions (three data recipients combined with two purposes) for each data type. Random assignment and order allowed us to control for potential context effects.
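The design described above can be illustrated with a short sketch that enumerates the vignette universe and draws one respondent's three vignettes. This is not the survey software's actual implementation: the function name and the seed are hypothetical, and drawing recipient and purpose independently for each data type is an assumption about how the random assignment was carried out.

```python
import itertools
import random

DATA_TYPES = ["sensor data", "medical records", "biomarkers"]
RECIPIENTS = ["public health agency", "university research center", "private company"]
PURPOSES = ["public policy", "personal recommendation"]

# Full vignette universe: 3 x 3 x 2 = 18 unique vignettes.
VIGNETTES = list(itertools.product(DATA_TYPES, RECIPIENTS, PURPOSES))

def assign_vignettes(rng):
    """Return three vignettes for one respondent: one per data type, in random order.

    Assumption: recipient and purpose are drawn independently for each data type,
    so each data type appears in one of its six versions.
    """
    order = DATA_TYPES[:]
    rng.shuffle(order)  # random presentation order of the data types
    return [(data_type, rng.choice(RECIPIENTS), rng.choice(PURPOSES)) for data_type in order]

rng = random.Random(2022)  # hypothetical seed, only for reproducibility of the sketch
deck = assign_vignettes(rng)
print(len(VIGNETTES))  # 18
for vignette in deck:
    print(vignette)
```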
Structure of vignettes: [DATA TYPE]. With the consent of a person, these data are transmitted to a German [RECIPIENT]. This [RECIPIENT] uses these data [PURPOSE]. The [RECIPIENT] guarantees that the data are safe, anonymous, and protected from misuse.
To specify all CI parameters, we needed to define the data subject, data sender, and transmission principle. We kept the transmission principle constant by defining a high level of individual control over the data use—that is, we measured individual willingness to share under conditions that enable individuals to make an active decision to agree to data use or not (i.e., opt in). The data subjects were always the respondents themselves. Finally, the data sender was always fixed within each data type and adjusted to produce a realistic scenario.
The following sections provide descriptions of the vignettes by data type.
Data Type 1: Sensors
Sensors installed on smartphones, smartwatches, and other wearable devices collect data that can be used to assess the health condition of people. With the consent of a person, these data are transmitted to a German public health agency [private company; university research center]. This public health agency [private company; university research center] uses these data for a research program to fight cancer. [This public health agency [private company; university research center] uses these data to provide the persons with personal recommendations on their health behavior with respect to protection against cancer.] (Footnote 6) The public health agency [private company; university research center] guarantees that the data are safe, anonymous, and protected from misuse.
Data Type 2: Medical records
Health records obtained from doctors’ offices can be used to assess the health conditions of people. With the consent of a person, these data are transmitted to a German public health agency [private company; university research center]. This public health agency [private company; university research center] uses these data for a research program to fight cancer. [This public health agency [private company; university research center] uses these data to provide the persons with personal recommendations on their health behavior with respect to protection against cancer.] The public health agency [private company; university research center] guarantees that the data are safe, anonymous, and protected from misuse.
Data Type 3: Biomarkers
Blood samples that are collected for biobanks can be used to assess the health conditions of people. With the consent of a person, these data are transferred to a German public health agency [private company; university research center]. This public health agency [private company; university research center] uses these data for a research program to fight cancer. [This public health agency [private company; university research center] uses these data to provide the persons with personal recommendations on their health behavior with respect to protection against cancer.] The public health agency [private company; university research center] guarantees that the data are safe, anonymous, and protected from misuse.
We then asked respondents, “How likely or unlikely would you agree to share your health data for this purpose?” The response categories were as follows: (1) very unlikely, (2), (3), (4) neither likely nor unlikely, (5), (6), (7) very likely.
Other measures
The study included several additional measures (Footnote 7), which were needed to test some of our hypotheses (trust in science in general; trust in public health agencies, private companies, and university research centers; altruism; attitudes toward public duties) and to conduct the additional exploratory analyses (cancer exposure, smartphone and smartwatch usage, technical affinity, social trust, political ideology, and sociodemographic characteristics). Specifically, respondents’ cancer exposure was measured by asking whether the respondent, a relative, or a close friend had ever been diagnosed with cancer. Device ownership was measured by a single multiple-choice question. Technical affinity was measured using five rating scale items about, for example, how good a respondent is at operating digital systems (Schauffel et al., 2021). Public duty was measured using three items featuring a rating scale that asked about what respondents think a good citizen should do (e.g., to obey laws; ESS Round 1: European Social Survey, 2018). A respondent’s level of institutional trust with respect to the three data recipients of our vignette design, and with respect to science in general, was assessed using individual items with a rating scale for each institution (based on ESS Round 9: European Social Survey, 2021). Similarly, social trust was measured using a single item with a rating scale asking whether most people can be trusted or not (ESS Round 9: European Social Survey, 2021). Respondents’ altruism was measured by asking about their willingness to do something good without expecting anything in return (SOEP-IS Group, 2021). Finally, political ideology was measured using respondents’ self-reported left-right orientation (ESS Round 9: European Social Survey, 2021).
The question wordings for all these measures are provided in the appendix. For measures that include multiple items, we conducted an exploratory factor analysis to verify that the items load on a single factor. Items with factor loadings below 0.5 were excluded (Footnote 8). Basic sum scores were used to combine the items into a single measure for the respective construct.
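The item screening and scoring rule described above can be sketched as follows. This is a minimal Python illustration and not the authors' R code; the item names, loadings, and responses are invented, and the loadings are assumed to come from a single-factor model that has already been fitted.

```python
# Minimal sketch: drop items with factor loadings below 0.5, then build a sum score.
# All item names and numbers are hypothetical and only illustrate the rule.
LOADING_THRESHOLD = 0.5

def retained_items(loadings, threshold=LOADING_THRESHOLD):
    """Return the items whose factor loading is at least the threshold."""
    return [item for item, loading in loadings.items() if loading >= threshold]

def sum_score(responses, items):
    """Basic sum score over the retained items for one respondent."""
    return sum(responses[item] for item in items)

# Hypothetical loadings from a single-factor solution.
loadings = {"item_1": 0.82, "item_2": 0.77, "item_3": 0.41}
# Hypothetical answers of one respondent.
respondent = {"item_1": 5, "item_2": 4, "item_3": 2}

kept = retained_items(loadings)
score = sum_score(respondent, kept)
print(kept, score)
```

In this sketch, item_3 is dropped because its loading is below 0.5, so the score is based on the two remaining items only.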
The placement of the additional measures within the questionnaire is not a trivial decision. If they are placed before the vignette experiment, they could affect the answers to the vignettes. If they are placed after the vignette experiment, the vignette questions could affect the answers to the additional measures that are intended to explain the answers to the vignettes. Since neither of these placements is optimal, a random half of the sample received the additional measures before the vignette experiment and the other half received them after the experiment. This randomization of the placement of the vignette experiment and the other measures allowed us to control for possible order effects within our analyses. Similarly, we randomized the order of the items within each multiple-item measure to avoid systematic question order effects.
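The two randomization steps can be illustrated with a short Python sketch. It rests on assumptions that the text does not confirm: the sample is split into two halves of equal size, item orders are shuffled independently for each respondent, and the function name, item labels, and seed are hypothetical.

```python
import random

def assign_questionnaire_order(respondent_ids, items, seed=None):
    """Randomize block placement (split into halves) and item order per respondent."""
    rng = random.Random(seed)
    ids = list(respondent_ids)
    shuffled = ids[:]
    rng.shuffle(shuffled)
    # The first half of the shuffled list answers the additional measures first.
    measures_first = set(shuffled[: len(shuffled) // 2])
    plan = {}
    for rid in ids:
        item_order = list(items)
        rng.shuffle(item_order)  # independent item order for each respondent
        plan[rid] = {
            "measures_first": rid in measures_first,
            "item_order": item_order,
        }
    return plan

# Hypothetical example with ten respondents and three items of one scale.
plan = assign_questionnaire_order(
    range(10), ["duty_help", "duty_vote", "duty_obey"], seed=1
)
n_first = sum(entry["measures_first"] for entry in plan.values())
print(n_first)  # 5 of the 10 respondents see the additional measures first
```

Storing the placement indicator with each respondent is what later allows order effects to be controlled for in the models.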
Data
The data were collected using a sample drawn from a German online access panel administered by Bilendi and respondi, which had been used for scientific research before (e.g., Beuthner, Keusch, et al., 2022; Daikeler et al., 2022; Gerdon et al., 2021; Silber et al., 2019). The field period ran from May 30 to June 2, 2022. The panel provider invited 14,000 panel members to our survey by email. In all, 2,423 individuals started the survey by clicking on the link in the invitation email. Of these, 34 panel members were screened out, and 1,088 could not participate because our quotas had been reached. Another 140 respondents did not complete the questionnaire. This resulted in 1,161 completed interviews before conducting quality checks (Footnote 9). The median response time was 5 minutes and 6 seconds, and the average enjoyment rating of the survey was 4.10 on a scale from 1 “not at all” to 5 “very good.”
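The participant flow reported above can be checked with simple arithmetic. The sketch below uses only the counts given in the text; the variable names are ours, and the share of invitees who started the survey is a derived figure that the article does not report.

```python
# Counts taken from the text; variable names are illustrative.
invited = 14_000
started = 2_423
screened_out = 34
quota_full = 1_088
breakoffs = 140

# Completed interviews before the data quality checks.
completed = started - screened_out - quota_full - breakoffs
print(completed)  # 1161

# Derived figure (not reported in the article): percent of invitees who started.
start_share = round(100 * started / invited, 1)
print(start_share)  # 17.3
```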
To recruit a diverse set of participants, we used quotas based on the German “Mikrozensus” 2019 regarding age, gender, and educational attainment. Descriptive results of the demographics and the other measures of the initial sample (before the data quality checks) can be found in the online supplement (see Table A1 in the Supplementary Materials). The study was approved by the ethical review board of the University of Mannheim (EK Mannheim 22/2022).
Data quality checks
We implemented three data quality checks (Footnote 10). First, we excluded respondents with item nonresponse on any of the vignettes or covariates. As a robustness check, we initially planned to impute missing values and report analyses of our hypotheses with imputation in the online supplement. Second, using paradata on response time, we excluded speeders, that is, respondents who answered the questions so fast that they could not possibly have read and processed the questions. For this, we used the method proposed by Roßmann (2010), which identifies all respondents who finish the survey in less than 60 percent of the median completion time as speeders. The analyses without speeders are included in the main text, whereas the analyses with speeders are provided in the Supplementary Materials (Footnote 11). Third, we tested whether the experimental assignment worked with respect to demographic characteristics (i.e., gender, age, and education). For this analysis, we used χ²-tests. Where the experimental assignment depended systematically on a demographic variable, we included that variable as a control variable throughout our analyses.
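The speeder rule stated above (completion in less than 60 percent of the median completion time) can be sketched in Python as follows. The durations are invented, the function name is hypothetical, and the sketch assumes that the median is computed over all completed interviews, which the text does not specify.

```python
from statistics import median

def flag_speeders(durations_sec, share=0.6):
    """Return the IDs of respondents faster than `share` times the median duration."""
    cutoff = share * median(durations_sec.values())
    return {rid for rid, seconds in durations_sec.items() if seconds < cutoff}

# Hypothetical completion times in seconds.
durations = {"r1": 306, "r2": 120, "r3": 410, "r4": 295, "r5": 170}
speeders = flag_speeders(durations)
print(sorted(speeders))  # ['r2', 'r5'] because the cutoff is 0.6 * 295 = 177 seconds
```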
Data analyses
The data analyses included multilevel models to account for the vignette experiment’s hierarchical data structure (vignettes nested in respondents). First, we analyzed our hypotheses regarding the data type (H1.1 to H1.3), data recipient (H2.1 and H2.2), and purpose of the research (H3.1 and H3.2). H1.3, H2.1, and H3.1 concern the main effects of the data type, data recipient, and research purpose on the willingness to share data, whereas H1.1, H1.2, and H3.2 were tested by specifying interaction effects between vignette dimensions (data type and data recipient for H1.1 and H1.2; data recipient and research purpose for H3.2). To test H2.2, H4.1, H4.2, and H4.3, interactions between vignette characteristics and respondent characteristics were specified, namely, between data recipient and trust in the respective institution (H2.2), research purpose and altruism (H4.1), research purpose and attitudes toward public duties (H4.2), and research purpose and trust in science (H4.3). While the main analyses focused on random-effects models in which the dependent variable was treated as continuous, we implemented two additional model sets as robustness checks. These included fixed-effects models with continuous outcomes and random-effects models in which the dependent variable was treated as ordinal.
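As a reading aid, a main-effects specification of the kind described above can be written as follows, assuming a random intercept for respondents. The notation is an illustrative assumption and is not taken from the article: $Y_{ij}$ is the willingness to share reported by respondent $j$ for vignette $i$, the vectors $\mathbf{t}_{ij}$ and $\mathbf{r}_{ij}$ hold dummy variables for data type and data recipient, and $p_{ij}$ is a dummy variable for the research purpose.

```latex
\begin{aligned}
Y_{ij} &= \beta_0
        + \boldsymbol{\beta}_{T}^{\top}\mathbf{t}_{ij}
        + \boldsymbol{\beta}_{R}^{\top}\mathbf{r}_{ij}
        + \beta_{P}\,p_{ij}
        + u_j + \varepsilon_{ij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2).
\end{aligned}
```

Under this notation, a hypothesis such as H4.1 would add a respondent-level covariate $a_j$ (altruism) and a cross-level product term $\gamma\, p_{ij}\, a_j$ to the right-hand side.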
Lastly, we conducted exploratory analyses regarding the additional measures. The analyses were conducted using the statistical software R (R Core Team, 2020) and the packages broom.mixed (Bolker & Robinson, 2021), flextable (Gohel, 2022a), GGally (Schloerke et al., 2021), Hmisc (Harrell, 2021), knitr (Xie, 2021), lme4 (Bates et al., 2015), lmerTest (Kuznetsova et al., 2017), missForest (Stekhoven, 2022), mitml (Grund et al., 2021), multilevelTools (Wiley, 2020), officer (Gohel, 2022b), ordinal (Christensen, 2019), plm (Croissant & Millo, 2018), psych (Revelle, 2021), stargazer (Hlavac, 2018), summarytools (Comtois, 2022), texreg (Leifeld, 2013), and tidyverse (Wickham et al., 2019) for the multilevel models. All statistical tests were two-sided. Anonymized data and statistical analysis code are available through a public repository (Footnote 12). Analysis code prepared in advance, which uses synthetic data to implement the modeling steps outlined earlier, was provided on the OSF page of this study as part of the preregistration process (Footnote 13).
Transparent changes
We deviated from the preregistration in three instances. First, we planned to test the experimental assignment regarding the region in which respondents live but failed to collect this variable in our study, so we had to deviate from the preregistered analyses in this respect. Second, the step of excluding respondents who did not complete the full questionnaire was moved from the data quality check section to the data section without changing the procedure of excluding incomplete interviews. Third, since only a small number of respondents contributed item nonresponse, we decided to deviate from the preregistration and did not replicate the analyses with imputed values for those respondents.
Results
Data quality and robustness checks
First, we excluded up to eight respondents who provided item nonresponses, depending on the variables included in the specific analysis (see Tables A1–A15 in the Supplementary Materials). Since such a small number of respondents provided item nonresponse, we decided against replicating the analyses with imputed values for those respondents. Second, we excluded 146 speeders, who were defined as respondents who finished the survey in less than 60 percent of the median completion time. Analyses with speeders can be found in the online supplement (see Table A15 in the Supplementary Materials). This robustness check showed that the decision to exclude speeders did not affect the substantive results reported here. Third, a series of χ²-tests confirmed that the experimental assignment of the vignettes worked except for education and data recipient. Thus, we included education as an additional covariate in all models (see Tables A2–A10 in the Supplementary Materials) (Footnote 14).
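A randomization check of this kind compares the observed cross-tabulation of a demographic variable and an experimental factor with the counts expected under independence. The following Python sketch computes Pearson's χ² statistic and its degrees of freedom by hand. The counts and the three education categories are hypothetical, and the authors' own R implementation may differ.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows are three education groups, columns are the
# three data recipients shown in the vignettes.
counts = [
    [120, 95, 110],
    [130, 140, 125],
    [90, 105, 100],
]
stat, df = chi_square_statistic(counts)
print(round(stat, 2), df)
```

The statistic is then compared with the χ² distribution with the returned degrees of freedom; a small p-value would indicate that the assignment depends on the demographic variable.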
In addition, we included a question asking respondents whether they had read the vignettes carefully, with seven response categories ranging from 1, “not at all carefully,” to 7, “very carefully,” which had a mean rating of 6.10. Since only eight respondents selected the values 1 or 2, we decided against a robustness check excluding those respondents.
As robustness checks, we replicated the multilevel models (1) as fixed-effects models with continuous outcomes and (2) as random-effects models in which the dependent variable is treated as ordinal (see Tables A16–A21 in the Supplementary Materials). Neither alternative approach changed the substantive findings compared with the random-effects models with continuous outcomes.
Preregistered hypotheses
Table 2 shows the descriptive results for each of the 18 vignettes. The level of willingness to provide data for health research ranged from 3.37 for sharing sensor data with a public health agency for a personal benefit to 4.84 for sharing biomarkers with a university research center for a public benefit. Given that the scale ranged from 1, “very unlikely,” to 7, “very likely,” the sharing levels are around the midpoint of the answer scale, with four vignettes showing values above 4.5 and two vignettes showing values below 3.5.
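The 18 vignettes correspond to the full crossing of three data types, three recipients, and two purposes. A short Python sketch makes this structure explicit; the level labels follow the wording used in this article, while the variable names are ours.

```python
from itertools import product

# Levels of the three vignette dimensions as named in the text.
DATA_TYPES = ["sensor data", "medical records", "biomarkers"]
RECIPIENTS = ["private company", "public health agency", "university research center"]
PURPOSES = ["personal benefit", "public benefit"]

# Full factorial design: every combination of the three dimensions.
vignettes = [
    {"data_type": d, "recipient": r, "purpose": p}
    for d, r, p in product(DATA_TYPES, RECIPIENTS, PURPOSES)
]
print(len(vignettes))  # 18 = 3 * 3 * 2
```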
Regarding our hypotheses about the main effect of the vignette experiment, Model 1a in Figure 1 shows that H1.3, which suggested that biomarkers would return the lowest willingness to share, was not supported. On the contrary, respondents reported that they would be significantly more likely to share biomarkers ( $ \hat{\beta} $ = .616, p < .001) and medical records ( $ \hat{\beta} $ = .435, p < .001) compared with sensor data. The main effect hypothesis regarding the recipient (H2.1) suggested that the willingness to share would be highest for university research centers. The data supported this hypothesis, with respondents showing significantly lower willingness to share health data with both other recipients: private companies ( $ \hat{\beta} $ = –.660, p < .001) and public health agencies ( $ \hat{\beta} $ = –.380, p < .001). With respect to the purpose, we expected that respondents would be more willing to share their health data if they anticipated a personal benefit (H3.1). However, the experimental results show that the willingness to share was significantly higher for the vignettes that featured a public benefit as compared to a personal benefit ( $ \hat{\beta} $ = –.256, p < .001). When considering interaction effects, none of our hypotheses about the interaction between data type and recipient (H1.1 and H1.2, Model 1b) and the interaction between recipient and purpose (H3.2, Model 1c) was supported (p ≥ .05).
Figure 2 shows the interaction effects with additional measures. H2.2 suggested that higher levels of trust in the respective recipient will result in a higher willingness to share health data. The experimental results support this hypothesis for the two recipients, private company ( $ \hat{\beta} $ = .117, p < .001, Model 2a) and university ( $ \hat{\beta} $ = .103, p < .001, Model 2c), but not for public agency ( $ \hat{\beta} $ = .025, p = .268, Model 2b). Hypotheses H4.1, H4.2, and H4.3 suggested interaction effects of public purpose with altruism, perceptions of the importance of public duties, and trust in science in general, respectively. The interaction effects for trust in science ( $ \hat{\beta} $ = .060, p = .013, Model 3a) and altruism ( $ \hat{\beta} $ = .054, p = .011, Model 3b) were in the expected direction and significant, showing a higher willingness to share among respondents with higher values on these covariates, while public duty showed an effect in the expected direction, which was, however, not statistically significant ( $ \hat{\beta} $ = .019, p = .121, Model 3c).
Exploratory analyses
We also included several variables for additional exploratory analyses shown in Figure 3 (see Table A1 in the Supplementary Materials for descriptive results of these additional variables). With respect to demographics, young respondents (18–28 years) reported a significantly higher willingness to share their health data than respondents aged 29 to 64 years (p < .05, Model 4a). The effects of educational attainment and gender were statistically nonsignificant (p > .05). Respondents who owned a smartwatch ( $ \hat{\beta} $ = .300, p = .024) and/or a smartphone ( $ \hat{\beta} $ = .505, p = .022) and respondents with higher levels of technical affinity ( $ \hat{\beta} $ = .038, p < .001) reported a significantly higher willingness to share their data than respondents who did not own either of these devices (Model 4b). Respondents with higher levels of trust in others (i.e., social trust, $ \hat{\beta} $ = .147, p < .001, Model 4c) and respondents who have been confronted with cancer personally or in their close social environment reported a significantly higher willingness to share their health data ( $ \hat{\beta} $ = .271, p = .019, Model 4d). In contrast, respondents with higher privacy concerns reported a significantly lower willingness to share their health data ( $ \hat{\beta} $ = –.267, p < .001). Self-reported political ideology did not affect respondents’ willingness to share their data ( $ \hat{\beta} $ = –.018, p = .509).
Discussion
Summary of results
The results of the vignette experiment confirmed that all three dimensions experimentally tested in our vignette study (data type, recipient, and purpose) significantly influenced individual data sharing decisions. However, two of the three main effects of the vignette dimensions were statistically significant in the direction opposite to the one hypothesized. Specifically, of our main effects hypotheses, only hypothesis H2.1 regarding the effect of the different recipients on respondents’ data sharing intentions was supported, as university research centers were the most accepted recipients. Moreover, the hypotheses about interaction effects between the vignette dimensions were not supported. From a CI perspective, this finding is somewhat striking, as we would have expected the effects of single parameters to depend on the specification of the other parameters. One explanation is that the specific data sharing scenarios that we investigated come with similar privacy expectations once they are placed within the respective health contexts. In contrast, most of our hypotheses about interactions with additional measures were supported (e.g., public purpose and altruism), and most of our exploratory analyses showed statistically significant effects (e.g., social trust and privacy concerns). The latter results indicate that general attitudes and characteristics of respondents indeed influenced their willingness to share across scenarios.
With respect to the different data types, our study found that respondents reported higher willingness to share biomarkers and medical records compared with sensor data for health research, which echoes the finding of Beuthner, Silber, and Stark (2022). A possible reason for this finding is that the threat of out-of-context use for sensor data appeared to be more salient than for the other two data types (Vitak & Zimmer, 2020). Another reason is the hypothetical nature of the outcome variable of our study: respondents may not have considered the higher data sharing effort for biomarkers compared with sensor data.
Our study did not reproduce the result of Gerdon et al. (2021) that respondents were more willing to share their data with a private than with a public recipient. Possible reasons are that we referred to more specific public institutions than Gerdon et al. (2021) and that trust in public authorities changed during the pandemic. While the willingness to share was the highest for university research centers, respondents were also more likely to be willing to share their data with a public health agency compared with a private company. This finding reinforces confidence in publicly funded health research. However, for data related to current crises or data directly linked to concerns of government surveillance, the findings might be different. Additional research is needed to explore this further.
With respect to the purpose of the data collection, the study showed that respondents were more likely to be willing to share their data in case of a public benefit compared with a personal health recommendation. This finding confirms previous research suggesting that sharing health data in the interest of improving public health aligns with societal norms and is, therefore, highly accepted (Bearth & Siegrist, 2020; Waind, 2020). However, our findings do not support the assumption drawn from the privacy calculus (Culnan & Armstrong, 1999), which would have suggested that individuals are more likely to share their data if they expect personal (health) benefits.
Practical implications
Our study illustrated that willingness to share health data is closely connected to individual variables such as institutional and social trust, privacy concerns, altruism, technical affinity, and age. Building on this information, invitation letters to potential study participants could illustrate the trustworthiness of the respective data recipient and the purpose of the data collection. More generally, and in line with previous research (e.g., Aitken et al., 2016; Rosman et al., 2022; Waind, 2020), the findings underline that health research needs to clearly show that it serves the public interest to achieve public acceptance. In the invitation letter, researchers should also make sure to address study-specific privacy concerns regarding data collection, storage, and processing. Beyond that, the study suggested that a private company or public health agency that plans to run a data sharing campaign may increase the trustworthiness of its project by involving independent university researchers. Finally, the more an institution knows about the data sharing norms, preferences, and privacy concerns of the target population, the more it can tailor the design of the health data collection.
Researchers who are interested in estimating how many participants they need for their study are advised to be mindful that a data sharing process has several steps. In this study, respondents first had to follow the invitation to take part in the survey. They then had to complete the entire survey and provide answers of sufficient quality (e.g., without speeding through the questionnaire). In actual health data collections, individuals would have to answer the request for sharing additional health data affirmatively and complete that data sharing procedure successfully. Yet, for the generalizability of a study, it is not merely important how many people are willing to share their data; it is as critical whether there are specific subgroups of invited persons who are not willing to share their health data (or take part in the survey). For example, if a study is focused on vaccinations against COVID-19 and the realized health data sample only includes people who had at least three vaccinations, important subgroups of the population would be missing, and the generalizability of the study would be limited in that respect. Thus, researchers should always consider both aspects simultaneously, optimizing participation and minimizing sample bias.
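The multi-step nature of data sharing can be turned into a rough planning calculation. The Python sketch below is illustrative only: all stage rates are hypothetical, the stages are assumed to be independent, and the function name is ours. It addresses only the expected number of participants, not the subgroup selectivity discussed above.

```python
import math

def invitations_needed(target_n, stage_rates):
    """Invitations required to reach target_n if each stage retains the given share."""
    overall_rate = math.prod(stage_rates)
    return math.ceil(target_n / overall_rate)

# Hypothetical retention rates for each step of the data sharing process.
rates = {
    "starts the survey": 0.20,
    "completes the survey": 0.80,
    "passes quality checks": 0.90,
    "agrees to share data": 0.50,
    "completes data sharing": 0.70,
}
needed = invitations_needed(500, rates.values())
print(needed)  # 9921 invitations for 500 usable cases under these assumed rates
```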
Limitations
This research has several limitations. First, we use cancer research as our study topic. While cancer research is less affected by current events than other health research topics, such as the COVID-19 pandemic, it remains an open question to what degree our findings will generalize to other health topics. Cancer research might be perceived as more important than research on less severe diseases, so we expect lower data sharing rates for those topics. Second, our study was carried out during the COVID-19 pandemic, when sharing health data might be generally viewed more positively than during times when personal and public health are less salient topics. Third, one might wonder whether our findings will generalize to other countries. While this is again a question for future investigations, research has shown that privacy concerns and related behavior may differ across countries (e.g., Li, 2022; Trepte et al., 2017). Moreover, the digitalization of the health system in Germany is not considered very advanced (Bertelsmann Stiftung, 2019). Thus, willingness to share health data may be higher in countries with fewer privacy concerns and/or a higher level of digitalization of the health system. Fourth, our vignette experiment only captures people’s intent to share health data. While this approach allows us to experimentally manipulate several factors at once, it limits the external validity of our study. However, previous research has shown that there is a strong association between intended and actual behavior (e.g., Hainmueller et al., 2015; Petzold & Wolbring, 2018; Sheeran, 2002), so we believe that most of our main findings will be directly transferable to “real-world” data sharing situations.
An advantage of our hypothetical study is that the results will not be influenced by the specific data sharing method, which can have a large impact on the results (Silber et al., 2021). Perhaps most importantly, researchers should expect substantially lower data sharing rates in studies in which actual data are requested, because the costs for respondents are higher since they have to share their data (Footnote 15). Another aspect that could reduce the data sharing rates in studies that measure actual sharing behavior is that following the request and providing data appears to be socially desirable. Given the lower costs of the hypothetical situation, more people might tend to answer the request affirmatively. Finally, our study uses a nonprobability sample. While prior research has shown that multivariate relationships obtained from such surveys often generalize to the general population, univariate distributions and bivariate associations should be treated with the appropriate caution (Cornesse et al., 2020). However, our study focuses on uncovering multivariate and causal relationships.
Conclusion
Our vignette study showed that the willingness to share health data is highly dependent on the specific data sharing situation. All three vignette dimensions (data type, recipient, and research purpose) significantly affected respondents’ willingness to share their data. Similarly, the additional variables measuring trust, privacy, age, and device ownership affected the reported willingness to share health data. However, we found no meaningful interaction effects between the vignette dimensions. From a CI perspective, this raises questions on the similarity of social norms of data sharing scenarios within specific health contexts. The results suggest that individual data sharing decisions are affected by a multitude of factors, which include the idiosyncrasies of a data sharing situation as well as individual variables. Thus, since data sharing decisions are embedded in complex social contexts, we need to ensure that study design, research infrastructure, and public communication of science, as well as invitations to participate in studies, create a trustworthy environment and aim to foster public benefits.
Supplementary Materials
To view supplementary material for this article, please visit http://doi.org/10.1017/pls.2022.15.
Data availability statement
This article earned Open Materials, Open Data, and Preregistration badges for open scientific practices. The materials, data, and preregistration that support the findings of this study and the award of these badges are openly available at https://doi.org/10.23668/psycharchives.7058 (data and codebook), https://osf.io/p6h7j/ (analyses code in R), and https://osf.io/kgwe7 (preregistration report).
Appendix: Overview of additional measures
The questionnaire was administered in German.
Cancer exposure
Source: own
Have you, a relative, or a close friend ever been diagnosed with cancer?
• Yes
• No
• I prefer not to say
Device ownership
Source: own
Do you own one or more of the following devices? Please tick all that apply.
▪ A desktop computer / PC
▪ A laptop / notebook
▪ A smartphone
▪ A tablet
▪ A smartwatch
▪ No, none of these devices
Technical affinity
Source: Subscale “General” of the ICT Self-Concept Scale (Schauffel et al., 2021). Licensed under a CC BY 4.0 International License.
In the following, you will be asked questions about the handling of digital systems. Digital systems are all digital applications (e.g., software or apps) and all digital devices (e.g., computers or smartphones).
I can operate digital systems.
• Strongly disagree
• Disagree
• Slightly disagree
• Slightly agree
• Agree
• Strongly agree
I am good at using digital systems.
• Strongly disagree
• Disagree
• Slightly disagree
• Slightly agree
• Agree
• Strongly agree
I quickly learn when it comes to using digital systems.
• Strongly disagree
• Disagree
• Slightly disagree
• Slightly agree
• Agree
• Strongly agree
It is easy for me to get familiar with new digital systems.
• Strongly disagree
• Disagree
• Slightly disagree
• Slightly agree
• Agree
• Strongly agree
I have always been good at using digital systems.
• Strongly disagree
• Disagree
• Slightly disagree
• Slightly agree
• Agree
• Strongly agree
Political ideology
Source: ESS Round 9: European Social Survey (2021). Licensed under a CC BY-SA 4.0 International License.
In politics people sometimes talk of “left” and “right”. Where would you place yourself on this scale, where 0 means the left and 10 means the right?
• 0 – Left
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Right
Public duties
Source: ESS Round 1: European Social Survey (2018). Licensed under a CC BY-SA 4.0 International License.
To be a good citizen, how important would you say it is for a person to…
…support people who are worse off than themselves?
• 0 – Extremely unimportant
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Extremely important
…vote in elections?
• 0 – Extremely unimportant
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Extremely important
…always obey laws and regulations?
• 0 – Extremely unimportant
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Extremely important
Social trust
Source: ESS Round 9: European Social Survey (2021). Licensed under a CC BY-SA 4.0 International License.
In general, do you think that most people can be trusted, or that you can’t be careful enough when dealing with other people?
• 0 – You can never be too careful
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Most people can be trusted
Institutional trust
Source (based on): ESS Round 9: European Social Survey (2021). Licensed under a CC BY-SA 4.0 International License.
To what extent do you trust public health agencies in general?
• 0 – No trust at all
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Complete trust
To what extent do you trust private companies in general?
• 0 – No trust at all
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Complete trust
To what extent do you trust university researchers in general?
• 0 – No trust at all
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Complete trust
To what extent do you trust the scientific community in general?
• 0 – No trust at all
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Complete trust
Altruism
Source: SOEP-IS Group (2021). Licensed under a CC BY-SA 4.0 International License.
Now we would like to know how well the following statement describes you as a person.
I am willing to do something for a good purpose without expecting anything in return.
• 0 – Does not describe me at all
• 1
• 2
• 3
• 4
• 5
• 6
• 7
• 8
• 9
• 10 – Describes me perfectly