1 Introduction
A wealth of evidence suggests that, over the course of decision making, the evaluation of new information can become biased (Russo, Medvec & Meloy, 1996; Russo, Meloy & Medvec, 1998; Holyoak & Simon, 1999; Russo, Meloy & Wilks, 2000; Simon, Pham, Le & Holyoak, 2001; Brownstein, 2003; Simon, Snow & Read, 2004; Carlson, Meloy & Russo, 2006; Russo, Carlson & Meloy, 2006; DeKay, Patiño-Echeverri & Fischbeck, 2009; DeKay, Stone & Miller, 2011; Kostopoulou, Russo, Keenan, Delaney & Douiri, 2012; Miller, DeKay, Stone & Sorenson, 2013; Blanchard et al., 2014; DeKay et al., 2014). This “predecisional information distortion” (referred to simply as “distortion” hereafter) occurs when the value ascribed to new information is altered to support an emerging preference or belief (Russo et al., 1998).
Distortion is thought to enable and maintain cognitive coherence, the consistency between new and processed information (Holyoak & Simon, 1999; Simon et al., 2001; Simon et al., 2004; Russo, Carlson, Meloy & Yong, 2008; Svenson & Jakobsson, 2010; Glöckner, Betsch & Schindler, 2010). In the context of choice, coherent representations could maximize confidence (Simon et al., 2004), justifiability (Montgomery & Svenson, 1983; Tyszka, 1998; Svenson & Jakobsson, 2010), cognitive efficiency (Russo et al., 1996; Russo et al., 1998; Simon et al., 2004; Russo et al., 2008) and positive affect (Meloy, 2000; Svenson & Jakobsson, 2010). More broadly, coherent representations allow for identification, prediction and perhaps exploitation of consistent relationships in the environment (Simon et al., 2004). Coherence has thus been thought to shape processes as diverse as visual perception (Maloney, Martello, Sahm & Spillmann, 2005) and attitude formation (Read & Miller, 1993).
Distortion is widespread and robust. It manifests across domains (Brownstein, 2003; Miller et al., 2013), in the decisions of lay people (Simon et al., 2004; Levy & Hershey, 2008; DeKay et al., 2014) and professionals alike (Tyszka & Wielochowski, 1991; Russo et al., 2000; Kostopoulou et al., 2012). It has been linked to suboptimal decisions (Russo et al., 2006; Kostopoulou et al., 2012) and appears to withstand incentives for accuracy (Russo et al., 2000; Meloy, Russo & Miller, 2006).
Distortion could thus pose a threat to decisions of consequence. A case in point is medical diagnosis, where information arrives sequentially during the doctor-patient encounter and must be evaluated in light of competing diagnostic hypotheses, one of which may be leading at any given time. If the leading diagnostic hypothesis is incorrect, distortion could foster overconfidence and undue commitment to it (Kostopoulou et al., 2012), paving the way for diagnostic error and/or inadequate treatment (Kostopoulou, Mousoulis & Delaney, 2009).
While distortion has been established in medical diagnosis (Kostopoulou et al., 2012), little is known about the specific processes underlying it. Physicians may overestimate the extent to which the evidence supports a leading diagnostic hypothesis and/or underestimate the extent to which the evidence supports a competing, trailing hypothesis. Much of the distortion literature does not differentiate between these two processes. Distortion is typically conceptualized and measured as the overall advantage afforded a leading alternative (e.g., Russo et al., 1998; Kostopoulou et al., 2012); thus, the relative contribution of these two modes of distortion cannot be determined.
Researchers have only recently begun to explore this distinction. While we were engaged in the analysis and interpretation of the data reported here, two related publications appeared in the literature. Blanchard and colleagues (2014) and DeKay and colleagues (2014) adapted the traditional “stepwise evolution of preference” (SEP) paradigm (Russo et al., 1998) to measure distortion separately for a leading and a trailing alternative. Both identified simultaneous “proleader” and “antitrailer” distortion, with no reliable difference in their magnitudes.
Both of these studies involved lay participants evaluating consumer goods (e.g., backpacks, apartments). Although distortion has been found across populations and tasks, its properties may vary. For example, Russo and colleagues found distortion to be higher among salespersons than auditors (Russo et al., 2000). Kostopoulou and colleagues found distortion in family physicians to be among the lowest reported in the literature (Kostopoulou et al., 2012).
We conducted two studies to investigate distortion in the diagnostic judgments of practicing physicians, measuring distortion in relation to a leading diagnosis separately from distortion in relation to a competing, trailing diagnosis. We used the same materials and SEP methodology as Kostopoulou et al. (2012), who also studied distortion in physicians’ diagnostic judgments. In that study, family physicians read three medical cases, rating the extent to which every new item of clinical information supported one diagnostic hypothesis over its competitor; distortion was measured as the overall advantage afforded the leading diagnosis. In our studies, we adapted SEP to our purposes: family physicians rated the extent to which each new item supported each of the two competing diagnoses separately. Study 1 measured distortion relative to the responses of a separate control group (“mean-based” distortion, DeKay et al., 2011), while Study 2 measured distortion relative to the participants’ own responses obtained under control conditions on a separate occasion (“personalized” distortion, DeKay et al., 2011).
2 Study 1
2.1 Aim
To investigate the processes underlying distortion in the diagnostic judgments of family physicians: distortion in relation to a leading hypothesis and distortion in relation to a trailing hypothesis.
2.2 Methods
2.2.1 Participant recruitment
UK family physicians and residents were recruited in person or via email. Participants recruited in person were identified at medical conferences. Those recruited via email were identified from a database of family physicians who had participated in previous studies by the second author. Each participant received a £10 Amazon voucher as a token of appreciation.
2.2.2 Materials
We employed three patient scenarios, constructed by Kostopoulou et al. (2012). They described a patient with fatigue (which could be due to either diabetes or depression), a patient with dyspnea (which could be caused by either chronic lung disease or heart failure) and a patient with chest pain (which could be either musculoskeletal or cardiac in origin). Each scenario comprised a short introduction (patient name, age and health complaint) followed by a sequence of cues. Each cue contained information that could have been obtained through questions to the patient, physical examination or laboratory tests. Some cues were neutral, providing equal support for the two competing diagnoses; other cues were diagnostic, providing support for one of the two competing diagnoses (Kostopoulou et al., 2008, 2012). All study materials were presented as online questionnaires, using Qualtrics.
2.2.3 Procedure
Physicians were randomly assigned to either an experimental or a control condition. Physicians in the experimental group read the three scenarios in a random order. Each scenario began with a “steer”, i.e., three diagnostic cues providing strong support for one of the two diagnoses. For each scenario, half of the physicians were steered towards diagnosis A while half were steered towards diagnosis B (random assignment). Four neutral cues were then presented sequentially; each of these provided equal support for the two competing hypotheses. In the third scenario only, the neutral cues were followed by three diagnostic cues, intended to conflict with the initial steer (“conflicting” cues). These were the same three diagnostic cues used to form the “steer” for the opposing diagnosis; that is, the three cues that were presented so as to steer one physician towards diagnosis A were the same three cues that were presented as conflicting information to a physician steered towards diagnosis B. Therefore, in the third scenario only, all physicians saw exactly the same items of information, albeit in different orders.
Participants were required to respond to each item in turn. After they read the steer, they provided an estimate of diagnostic likelihood on a 21-point Visual Analogue Scale (VAS), anchored at “diagnosis A more likely” and “diagnosis B more likely”. Participants then assessed the diagnostic value of each neutral cue (and each conflicting cue, in scenario 3) for each of the two competing diagnoses. The manner in which they did so was the sole difference between the study by Kostopoulou et al. (2012) and the present one. In the 2012 study, participants rated the diagnostic value of each cue using a single 21-point VAS anchored at “favors diagnosis A” and “favors diagnosis B”. Hence, distortion was measured as the overall advantage afforded a leading diagnostic hypothesis. In the present study, participants rated the diagnostic value of each cue in relation to each diagnosis, using two 11-point VASs anchored at “no support” and “strong support” (Figure 1). Hence, distortion was measured separately for the leading and the trailing diagnostic hypotheses. Following each cue evaluation, participants updated their estimate of diagnostic likelihood, based on all information seen up to that point (Figure 2). They therefore provided three estimates in response to each cue (Figures 1 & 2).
Physicians in the control group evaluated the same cues as the experimental group. They used the same 11-point VASs (Figure 3) and therefore provided two ratings per cue. However, the control group did not have the opportunity to develop a leading diagnosis that could bias their cue evaluations (Russo et al., 1998; Kostopoulou et al., 2012). This was achieved in a number of ways:
1) All cues from the three scenarios were collected and scrambled, i.e., presented in a random order, different for each physician.
2) Each cue pertained to a new patient, introduced by a unique letter rather than a name (e.g., Patient A, Patient G), a health complaint (fatigue, dyspnea or chest pain) and minimal demographic information (sex and age).
3) Patient age was varied by a maximum of 4 years above or below the age specified in the corresponding scenario, to prevent participants from linking patients with the same health complaint and building a coherent representation. An experienced family physician and study co-author deemed that “…such small variations in age were not clinically significant” (Kostopoulou et al., 2012, p. 834).
4) Three decoy cues pertaining to entirely different pairs of diagnoses were included.
5) Participants were never asked to provide estimates of diagnostic likelihood.
2.2.4 Measurement of distortion
We measured distortion in two ways: the traditional way that averages cue ratings given by the control group to produce a point estimate of the “unbiased” rating per cue (“mean-based” method, DeKay et al., 2011), and a new way that takes into account the variation in control cue ratings.
2.2.5 The traditional way of measuring distortion
In most studies of distortion where participants use a single scale to rate the cues, distortion of a cue is calculated as the difference between an experimental participant’s cue rating and the mean cue rating by the control group. This difference is then signed as positive or negative depending on which option was leading just before the experimental participant rated the cue (“leader-signed” distortion, Russo et al., 1998).
In our study, physicians in both the experimental and control groups gave two ratings per cue, one in relation to diagnosis A and another in relation to diagnosis B. For each cue, we averaged the ratings of the control group in relation to each competing diagnosis, producing two mean control ratings per cue.
Each physician in the experimental group received two distortion scores per cue: one score in relation to the diagnosis that was leading just before the cue was evaluated and another score in relation to the diagnosis that was trailing just before the cue was evaluated. The diagnoses that were leading and trailing at any given time were identified from the physician’s most recent estimate of diagnostic likelihood (Figure 2).
To calculate distortion in relation to the leading diagnosis, we computed the difference between 1) a physician’s rating of a cue in relation to the leading diagnosis and 2) the mean control rating of the same cue in relation to the same diagnosis. A positive score indicated that the diagnostic value of information was overestimated to strengthen the leading diagnosis (“proleader distortion”, Blanchard et al., 2014; DeKay et al., 2014).
To calculate distortion in relation to the trailing diagnosis, we computed the difference between 1) a participant’s rating of a cue in relation to the trailing diagnosis and 2) the mean control rating of the same cue in relation to the same diagnosis. We then reversed the sign of the resulting difference, so that a positive score indicated that the diagnostic value of information was distorted to weaken the trailing diagnosis (“antitrailer distortion”, Blanchard et al., 2014; DeKay et al., 2014). Thus, positive distortion scores always indicated distortion in the predicted direction: strengthening the leading diagnosis and weakening the trailing diagnosis.
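As an illustration, the two mean-based scores for a single cue evaluation can be sketched as follows (a minimal sketch; the function and variable names are ours, not taken from the paper):

```python
def distortion_scores(rating_lead, rating_trail,
                      control_mean_lead, control_mean_trail):
    """Mean-based distortion scores for one cue evaluation.

    rating_lead / rating_trail: an experimental physician's 0-10 support
    ratings of the cue for the currently leading and trailing diagnoses.
    control_mean_*: the control group's mean ratings of the same cue for
    the same diagnoses.
    """
    proleader = rating_lead - control_mean_lead        # positive = leader strengthened
    # Sign reversed so that a positive score means the trailer was weakened
    antitrailer = -(rating_trail - control_mean_trail)
    return proleader, antitrailer

# Hypothetical example: leader rated 8 vs. a control mean of 6.5;
# trailer rated 3 vs. a control mean of 4.2
pro, anti = distortion_scores(8, 3, 6.5, 4.2)   # pro = 1.5, anti ≈ 1.2
```

Both scores come out positive here, i.e., distortion in the predicted direction for both diagnoses.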
A new way of measuring distortion.
Mean-based distortion does not take into account the error of estimating the mean of the control group. If measured against an inflated mean control rating, proleader distortion would be underestimated and antitrailer distortion overestimated. Similarly, if measured against a diminished mean control rating, proleader distortion would be overestimated and antitrailer distortion underestimated.
To measure distortion in a way that accounts for the variance in the control cue ratings, we ran two 2-level mixed effects models: one to measure distortion in relation to the leading diagnosis and another to measure distortion in relation to the trailing diagnosis. We regressed the raw cue ratings on the study group (experimental vs. control), so that ratings cast under experimental conditions would be compared with ratings cast under control conditions, separately when a diagnosis was leading and when a diagnosis was trailing.
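This group comparison can be sketched with a standard mixed-effects routine. The sketch below uses simulated stand-in data (the actual ratings are not reproduced here), with a random intercept per physician, so the fixed effect of group is estimated against the full control distribution rather than a point estimate of its mean:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data: 30 experimental and 30 control physicians,
# 8 cue ratings each, a true +1.0 shift for the experimental group,
# and a physician-specific baseline (random intercept).
rng = np.random.default_rng(0)
n_per_group, n_cues = 30, 8
rows = []
for p in range(2 * n_per_group):
    group = "experimental" if p < n_per_group else "control"
    baseline = rng.normal(6.0, 1.0)
    shift = 1.0 if group == "experimental" else 0.0
    for _ in range(n_cues):
        rows.append({"physician": p, "group": group,
                     "rating": baseline + shift + rng.normal(0.0, 1.0)})
df = pd.DataFrame(rows)

# Two-level model: cue ratings nested within physicians; the coefficient
# on group is the distortion estimate ("control" is the reference level).
model = smf.mixedlm("rating ~ group", df, groups=df["physician"])
fit = model.fit()
print(fit.params["group[T.experimental]"])   # recovers a shift near +1
```

In the study itself, separate models of this form were fitted for ratings of leading and of trailing diagnoses.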
2.2.6 Sample size
In a linear regression of distortion on the estimated likelihood of a leading diagnosis, Kostopoulou et al. (2012) found that a 1-unit increase in diagnostic likelihood was associated with a 0.3-unit increase in physicians’ distortion on the next cue (slope = 0.3, p < 0.01). We estimated that to detect a similar association between diagnostic likelihood and distortion (the sum of proleader and antitrailer distortion), with power of 0.8 and α = 0.05, we would need at least 71 participants in the experimental group. Likewise, the size of our control group was based on that of Kostopoulou et al. (2012) (n = 36).
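The paper does not report the details of this power computation. As a rough illustration, a standard Fisher-z approximation for detecting a correlation (i.e., a standardized slope) of about 0.33 gives a comparable sample size:

```python
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.8):
    """Approximate n needed to detect correlation r (two-sided test),
    via Fisher's z transformation. Illustrative only; the exact power
    calculation used in the study is not reported."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_b = NormalDist().inv_cdf(power)           # quantile for desired power
    return ceil(((z_a + z_b) / atanh(r)) ** 2 + 3)

print(n_for_correlation(0.33))   # → 70
```

A standardized slope of roughly this size thus requires about 70 participants, close to the reported minimum of 71 under presumably slightly different assumptions.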
2.3 Results
Of the 197 physicians e-mailed, 95 participated (48%). We recruited 44 additional participants at conferences, resulting in a final sample of 139 physicians: 50% female, 9% residents in family medicine, 28 to 64 years of age (M = 39.3, SD = 8.9, median = 36.0), with 0 to 36 years in family medicine (M = 10.1, SD = 9.5, median = 6.0). Demographics were comparable across the experimental (n = 96) and control (n = 43) groups.
2.3.1 Distortion in relation to the leading and trailing diagnoses
The traditional way of measuring distortion.
We averaged distortion in relation to the leading diagnoses across cues, per physician. We did the same for distortion in relation to the trailing diagnoses. One-sample t tests revealed that mean distortion in relation to the leading diagnoses was not significantly different from 0, while mean distortion in relation to the trailing diagnoses occupied almost one unit of the cue evaluation scale (Table 1). Paired-samples t tests revealed no reliable differences in the distortion of neutral vs. diagnostic cues (mean difference for distortion in relation to the leading diagnoses = 0.32 [−0.04, 0.68], t(95) = 1.76, p = 0.08, d = 0.18; mean difference for distortion in relation to the trailing diagnoses = 0.07 [−0.27, 0.41], t(95) = 0.38, p = 0.70, d = 0.04).
A new way of measuring distortion.
Variation in control cue ratings was substantial (mean SD = 2.07). Distortion in relation to leading diagnoses was not significant in the regression model: slope = 0.22 [−0.23, 0.66], p = 0.33. In contrast, the model that measured distortion in relation to trailing diagnoses found substantial antitrailer distortion: slope = −1.11 [−1.52, −0.70], p < 0.01. Thus, the new method of measuring distortion confirmed the findings of the traditional method.
Individual differences.
The two modes of distortion, each averaged per physician, were reliably different from one another (mean difference = 0.69 [0.23, 1.15], t(95) = 3.01, p < 0.01, d = 0.31). We explored this further using paired-samples t tests per physician. We identified 17 physicians who displayed reliably more proleader than antitrailer distortion, and 31 physicians who displayed the opposite tendency. The remaining 48 physicians (50% of the sample) did not exhibit significant differences between the two modes of distortion.
To explore whether the tendency toward proleader vs. antitrailer distortion was consistent across scenarios, we calculated each physician’s mean proleader distortion and mean antitrailer distortion per scenario (excluding conflicting cues in the third scenario), and subtracted antitrailer from proleader distortion. A positive score for a scenario would indicate more proleader than antitrailer distortion, while a negative score would indicate the opposite. Cronbach’s α for the three scores was 0.79, suggesting that the tendency for proleader vs. antitrailer distortion was consistent for a given physician across scenarios.
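Cronbach’s α for the three scenario-level difference scores follows the standard formula; this sketch (our own, with made-up scores) shows the computation:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_participants x n_items) score matrix.
    Here the 'items' would be the three scenario-level
    (proleader - antitrailer) difference scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)     # variance of row totals
    return k / (k - 1) * (1.0 - item_vars / total_var)

# Made-up difference scores for three physicians across three scenarios;
# perfectly consistent rows give alpha = 1.0
alpha = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
```

An α of 0.79, as reported above, indicates that rank order across physicians was largely preserved from scenario to scenario.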
2.3.2 Distortion and diagnostic likelihood
We used a 2-level linear regression model with random intercept to investigate whether the estimated likelihood of the leading diagnosis accounted for the distortion on the next cue (DeKay et al., 2009; Kostopoulou et al., 2012). Separate models were created for each mode of distortion. The models used the distortion scores per cue, pairing each with the immediately preceding estimate of diagnostic likelihood. In both models, estimated diagnostic likelihood accounted for distortion on the next cue: slope = 0.14 [0.09, 0.19], p < 0.01 for distortion in relation to the leading diagnosis; slope = 0.09 [0.04, 0.13], p < 0.01 for distortion in relation to the trailing diagnosis.
We investigated whether each mode of distortion influenced the final diagnostic estimates in each scenario, after all the neutral cues had been rated and before any conflicting cues were seen in the third scenario (for comparability across scenarios). Table 2 shows the proportion of physicians who started and finished on the same side of the diagnostic likelihood VAS. It also shows the proportion of physicians who remained on the same side of the diagnostic likelihood VAS throughout a scenario. We excluded participants whose rating on the diagnostic likelihood scale was 0 (i.e., equal likelihood), either at the start or after the neutral cues were seen (n = 11 for chest pain, n = 19 for dyspnea, n = 15 for fatigue).
Note to Table 2: 1 Fisher’s Exact p < 0.01.
To explore the influence of each mode of distortion on final diagnostic estimates (i.e., estimates after all the neutral cues were seen and evaluated), we conducted a four-step analysis (DeKay et al., 2014).
1) We averaged per physician:
   a. distortion in relation to Diagnosis 1 (D1) when it was leading,
   b. distortion in relation to D1 when it was trailing,
   c. distortion in relation to Diagnosis 2 (D2) when it was leading, and
   d. distortion in relation to D2 when it was trailing.
2) We reversed the signs for b and c, so that higher scores always favored D1.
3) We took the average of a and c (distortion in relation to leading diagnoses) and the average of b and d (distortion in relation to trailing diagnoses).
4) We used hierarchical linear regression to assess the relationship of these two averages with the final diagnostic likelihood (−10 = “D2 more likely”, 0 = “equally likely”, 10 = “D1 more likely”). We controlled for the diagnostic steer (counterbalanced across participants) as follows: initial diagnostic likelihood was the sole predictor in the first run of the model (block 1) and the two modes of distortion were added together subsequently (block 2).
Finally, we compared the coefficient for proleader with that for antitrailer distortion in each scenario, to determine any differences in their magnitude.
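The R²-change logic of the two regression blocks can be sketched as follows (illustrative simulated data and our own variable names; ordinary least squares via NumPy):

```python
import numpy as np

def r2(y, X):
    """R-squared of an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def hierarchical_r2_change(final_lik, initial_lik, pro, anti):
    """Block 1: initial likelihood only; block 2: add the two modes of
    distortion. Returns (R2 for block 1, R2 change for block 2)."""
    r2_block1 = r2(final_lik, np.column_stack([initial_lik]))
    r2_block2 = r2(final_lik, np.column_stack([initial_lik, pro, anti]))
    return r2_block1, r2_block2 - r2_block1

# Simulated stand-in data: final likelihood driven mostly by the initial
# estimate, with smaller contributions from both modes of distortion
rng = np.random.default_rng(1)
n = 90
initial = rng.normal(0, 3, n)
pro = rng.normal(0, 1, n)
anti = rng.normal(0, 1, n)
final = 0.8 * initial + 0.5 * pro + 0.5 * anti + rng.normal(0, 1, n)

r2_1, dr2 = hierarchical_r2_change(final, initial, pro, anti)
```

The R² change of block 2 is the "Variance explained – Distortion" quantity reported in Table 3.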
Our findings, presented in Table 3, were consistent across scenarios. The initial estimate of D1 likelihood was the strongest determinant of the final estimate of D1 likelihood. However, distortion to favor D1 made a small, independent contribution, with significant input from both proleader and antitrailer distortion. In two scenarios, dyspnea and fatigue, the two modes of distortion had roughly equal influence upon final judgments: F(1, 90) = 0.02, p = 0.88 for dyspnea; F(1, 91) = 2.44, p = 0.12 for fatigue. The influence of proleader distortion was significantly weaker than that of antitrailer distortion in the chest pain scenario: F(1, 91) = 4.74, p = 0.03.
Note: Participants who did not develop a leading diagnosis in a given scenario were excluded from the analysis (n = 1 for chest pain, n = 2 for dyspnea, n = 1 for fatigue). Conflicting cues in the third scenario were excluded from the calculations.
“Variance explained (Total)” expresses, as a percentage, the adjusted R² statistic for the full model.
“Variance explained (Distortion)” expresses, as a percentage, the R² change statistic for the distortion component of the model.
2.3.3 Final diagnosis in the third scenario
At the start of the third scenario, the steer was successful in installing the intended leading diagnosis in 81 of the 96 physicians (84%); eight physicians considered the competing diagnosis more likely (8%), while the remaining seven physicians (7%) considered the two competing diagnoses equally likely (the 0 midpoint of the scale). At the end of the third scenario, after physicians had evaluated the three cues that opposed the initial steer, 32 considered the steered diagnosis more likely (33%), 52 considered the competing diagnosis more likely (54%), and the remaining 12 considered the two diagnoses equally likely (13%).
3 Discussion
We measured physicians’ distortion of information in relation to a leading and a trailing diagnosis against the mean ratings of a separate control group. We also measured distortion using multilevel linear regression that bypassed the need to use mean control ratings as the baseline. The two ways of measuring distortion produced consistent findings. On average, we found minimal distortion to strengthen a leading diagnosis (proleader) but considerable distortion to weaken a competing, trailing diagnosis (antitrailer). However, analysis of proleader and antitrailer distortion per physician suggested individual differences, with a minority of physicians displaying predominantly proleader distortion. The higher the estimated likelihood of a leading diagnosis, the larger was each mode of distortion on the next cue. Increases in both modes of distortion were associated with increased final estimates of diagnostic likelihood. At the end of the third scenario, after physicians evaluated cues that conflicted with the initial steer, only about a third of the sample ended up considering the steered diagnosis more likely.
4 Study 2
4.1 Introduction
We expect that physicians’ cue ratings are informed by their unique constellation of medical knowledge and experiences. Therefore, variance resulting from individual differences in prior knowledge could be wrongfully attributed to distortion. The most valid estimate of distortion in medical diagnosis might thus be a “personalized” one (DeKay et al., 2011), where each participant’s distortion is measured relative to his/her own baseline ratings of cues.
DeKay and colleagues (2011, 2014) compared the personalized and mean-based measures of distortion directly. The personalized method did not provide a superior estimate: in fact, it was less precise than the mean-based one (DeKay et al., 2011). However, prior knowledge (or preference) was unlikely to influence cue evaluation in the study of DeKay et al. (2011), where undergraduates evaluated hypothetical gambles with which they had little or no experience. In DeKay et al.’s (2014) study, individual differences in preference for apartment features were clearly present, but the personalized and mean-based distortion measures still performed very similarly. Study 2 of the current article compared the personalized and mean-based estimates of distortion among physicians, whose prior knowledge and experience were relevant to the task at hand.
Study 1 revealed individual differences in the mode of distortion displayed. On average, physicians displayed predominantly antitrailer distortion, but a minority displayed predominantly proleader distortion. The dominant mode of distortion was consistent across scenarios. Study 2 tested two potential correlates of proleader and antitrailer distortion: the Personal Need for Structure (PNS) and the Personal Fear of Invalidity (PFI) (Thompson, Naccarato, Parker & Moskowitz, 2001). The PNS captures “…the need to create and maintain simple structures” (Neuberg, Judice & West, 1997, p. 1396; Neuberg & Newsom, 1993). Individuals with high PNS tend to assimilate incoming information to preexisting or emerging judgments. Therefore, we expected these persons to display greater distortion. The PFI measures the “…fear of making judgmental errors” (Neuberg et al., 1997, p. 1404). Individuals with high PFI tend to display ambivalent attitudes and indecisiveness (Neuberg et al., 1997; Thompson et al., 2001). Therefore, we expected them to provide lower estimates of diagnostic likelihood, which would in turn reduce the magnitude of distortion. Both scales have been shown to be valid and reasonably reliable (Thompson et al., 2001).
Previous attempts to identify personality variables that moderate distortion have generally proven fruitless. Russo et al. (1998) found no relationship between distortion and the Preference for Consistency (Cialdini, Trost & Newsom, 1995) or the Myers-Briggs dimension of judgment, while Meloy (2000) found distortion to be unrelated to the Need to Evaluate (Jarvis & Petty, 1996), the Need for Cognitive Closure (Kruglanski, Webster & Klem, 1993; Webster & Kruglanski, 1994; Kruglanski & Webster, 1996) and the PNS. However, these studies measured distortion as the total advantage afforded a leading alternative. Study 2 measured distortion in relation to competing diagnoses separately. It was therefore the first to explore whether personality variables might moderate the mode of distortion (proleader vs. antitrailer) rather than total distortion.
4.2 Aims
1) To compare the mean-based and personalized estimates of distortion, in participants with task-related prior knowledge.
2) To explore potential sources of individual differences in the mode and magnitude of distortion, specifically, the Personal Need for Structure and Personal Fear of Invalidity.
Methods
4.2.1 Participant recruitment
We invited by e-mail UK family physicians and residents who had taken part in previous studies by the second author. Each participant received a £35 Amazon voucher as a token of appreciation. The voucher was of greater value than in Study 1, as participants were required to participate on two separate occasions and complete two additional questionnaires (PNS and PFI). We did not recruit physicians who had taken part in Study 1.
4.2.2 Materials and procedure
Only two changes were made to the materials and procedure of Study 1. Firstly, the study followed a within-participant design. Each participant completed both study conditions (experimental and control), in a counterbalanced order and with an interval of one month between conditions. The one-month interval was intended to remove potential carry-over effects (DeKay et al., 2011). Secondly, after a participant had completed both conditions, s/he was asked to complete two measures of individual differences: Neuberg and Newsom’s (1993) abbreviated version of the Personal Need for Structure scale (PNS, Thompson et al., 2001) and the Personal Fear of Invalidity scale (PFI, Thompson et al., 2001). Participants indicated their agreement with each of the 11 items of the PNS scale (e.g., I don’t like situations that are uncertain) and each of the 14 items of the PFI scale (e.g., I tend to struggle with most decisions). Agreement was rated on a six-point scale (1 = “strongly disagree” to 6 = “strongly agree”). The order in which the PFI and PNS scales were completed was counterbalanced across participants.
4.2.3 Calculation of distortion
The two modes of distortion (in relation to the leading and the trailing diagnoses) were calculated relative to the participant’s own ratings provided under control conditions on a separate occasion (“personalized distortion”). In an attempt to replicate the results of Study 1 with a new sample of physicians, we also measured distortion relative to the mean ratings that the whole Study 2 sample provided under control conditions (“mean-based distortion”).
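In code, the two estimates differ only in their baseline. The sketch below (Python; function names, array layout, and sign conventions are ours for illustration, not the study's) shows the distinction, assuming cue evaluations are stored as numeric arrays:

```python
import numpy as np

def personalized_distortion(exp_ratings, own_control_ratings):
    """Distortion relative to the same physician's own control ratings.

    Both arguments hold one cue evaluation per cue, signed so that
    higher values favor the leading diagnosis (an illustrative convention).
    """
    return np.asarray(exp_ratings, dtype=float) - np.asarray(own_control_ratings, dtype=float)

def mean_based_distortion(exp_ratings, control_group_ratings):
    """Distortion relative to the mean control rating for each cue.

    control_group_ratings: 2-D array (participants x cues) of evaluations
    cast under control conditions by the whole sample.
    """
    baseline = np.asarray(control_group_ratings, dtype=float).mean(axis=0)
    return np.asarray(exp_ratings, dtype=float) - baseline
```

Both functions return one distortion value per cue, which would then be averaged within each mode (proleader vs. antitrailer) per physician.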
4.3 Results
Of the 187 UK family physicians e-mailed, 91 participated (49%), a response rate almost identical to that in Study 1 (48%). Two were excluded from the analyses because they did not complete the second questionnaire. A further two were excluded because we subsequently discovered that they had participated in Study 1. Finally, two more were excluded because they contacted us to report that they had misunderstood the response scales. Our final sample consisted of 85 family physicians: 46% female, 1% residents, 25 to 63 years of age (M = 40.5, SD = 8.7, median = 37.0), with 0 to 35 years in family medicine (M = 11.3, SD = 9.1, median = 8.0). The sample was therefore comparable to that of Study 1, except that it contained a lower proportion of trainees (1% vs. 9%).
4.3.1 Distortion in relation to the leading and trailing diagnoses
The two modes of personalized distortion (i.e., calculated relative to each physician’s own control ratings) were each averaged across cues per physician. One-sample t tests revealed that personalized distortion in relation to the leading diagnoses was close to zero, while personalized distortion in relation to the trailing diagnoses was nearly one scale unit (Table 4). Personalized distortion did not differ between neutral and diagnostic cues (mean difference for proleader distortion = 0.31 [−0.03, 0.66], t (84) = 1.80, p = 0.08, d = 0.19; mean difference for antitrailer distortion = 0.10 [−0.29, 0.48], t (84) = 0.50, p = 0.62, d = 0.06).
The two modes of mean-based distortion (i.e., calculated relative to the mean ratings that the Study 2 sample cast under control conditions) were each averaged across cues per physician. As in Study 1, one-sample t tests revealed that mean-based distortion in relation to the leading diagnoses approached zero, while mean-based distortion in relation to the trailing diagnoses approached one unit on the cue evaluation scale (Table 4). As in Study 1, mean-based distortion did not differ between neutral and diagnostic cues (mean difference for proleader distortion = 0.29 [−0.08, 0.65], t (84) = 1.56, p = 0.12, d = 0.17; mean difference for antitrailer distortion = 0.11 [−0.26, 0.47], t (84) = 0.58, p = 0.57, d = 0.06). Thus, Study 2 replicated the findings of Study 1 in terms of mean-based distortion in relation to leading and trailing alternatives. Furthermore, the two methods of measuring distortion (mean-based and personalized) produced very similar estimates of distortion magnitude and variance (Table 4).Footnote 3
We compared the mean-based and personalized distortion estimates formally, using two paired-samples t tests, one per mode of distortion. Each t test compared the mean-based and personalized estimates of distortion for each physician. We found no statistical differences between the mean-based and personalized estimates for either mode of distortion (mean difference for proleader distortion = 0.10 [−0.17, 0.37], t (84) = 0.73, p = 0.47, d = 0.08; mean difference for antitrailer distortion = 0.12 [−0.12, 0.37], t (84) = 0.99, p = 0.33, d = 0.10).
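This comparison is a standard paired-samples t test on per-physician estimates. A minimal sketch with scipy, using simulated distortion values rather than the study's data (the two methods are generated to agree up to noise, as found here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 85  # Study 2 sample size

# Simulated per-physician antitrailer distortion estimates.
personalized = rng.normal(loc=0.9, scale=1.5, size=n)
mean_based = personalized + rng.normal(loc=0.1, scale=0.8, size=n)

diff = mean_based - personalized
t, p = stats.ttest_rel(mean_based, personalized)
d = diff.mean() / diff.std(ddof=1)  # Cohen's d for paired data
print(f"t({n - 1}) = {t:.2f}, p = {p:.2f}, d = {d:.2f}")
```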
As in Study 1, the two distortion modes were reliably different from each other, whether measured using the mean-based method (mean difference = 0.76 [0.26, 1.25], paired-samples t (84) = 3.03, p < 0.01, d = 0.33) or the personalized method (mean difference = 0.73 [0.31, 1.16], paired-samples t (84) = 3.42, p < 0.01, d = 0.37). With the mean-based method, we identified 18 physicians (21%) who exhibited predominantly proleader distortion (18% in Study 1) and 30 physicians (35%) who exhibited predominantly antitrailer distortion (32% in Study 1). With the personalized method, we identified fewer physicians in each group: 9 physicians (11%) exhibited predominantly proleader distortion and 23 (27%) predominantly antitrailer distortion. As in Study 1, the dominant mode of distortion was consistent across scenarios: Cronbach’s α = 0.81 for mean-based estimates (0.79 in Study 1), and Cronbach’s α = 0.67 for personalized estimates.
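Cronbach's α here treats the clinical scenarios as "items" and asks how consistently each physician's dominant mode recurs across them. A self-contained sketch of the computation (the matrix layout is illustrative):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a cases x items matrix.

    Here each row would be a physician and each column a clinical
    scenario, with cells holding per-scenario distortion estimates.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items (scenarios)
    item_var = scores.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of row sums
    return k / (k - 1) * (1 - item_var / total_var)
```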
4.3.2 Distortion and diagnostic likelihood
As in Study 1, the estimated likelihood of the diagnosis that was leading at any one time was significantly and positively associated with both modes of personalized distortion on the next cue: slope = 0.14 [0.08, 0.20], p < 0.01 for proleader, and slope = 0.10 [0.05, 0.16], p < 0.01 for antitrailer distortion. Almost identical slopes were obtained for mean-based distortion: slope = 0.14 [0.08, 0.20], p < 0.01 for proleader, and slope = 0.10 [0.04, 0.15], p < 0.01 for antitrailer distortion.
Table 5 shows the proportion of physicians who started and finished on the same side of the diagnostic likelihood VAS. It also shows the proportion of physicians who remained on the same side of the diagnostic likelihood VAS throughout a scenario.
* Fisher’s Exact p < 0.01.
We excluded participants whose initial and/or final rating on the diagnostic likelihood scale was 0 (i.e., equal likelihood) (n = 11 for chest pain, n = 18 for dyspnea, n = 19 for fatigue).
As in Study 1, we used hierarchical linear regression to assess the influence of the two modes of distortion on final estimates of diagnostic likelihood (after all neutral cues but before the conflicting cues in the third scenario), controlling for initial estimates. Separate models were created for personalized and mean-based distortion. Conflicting cues in the third scenario were excluded from the calculations. The results are reported in Table 6.
Note: Participants who did not develop a leading diagnosis in a given scenario were excluded from the analysis (n = 1 for chest pain, n = 1 for dyspnea, n = 1 for fatigue).
“Variance explained: Total” expresses, as a percentage, the Adjusted R Square statistic for the full model.
“Variance explained: Distortion” expresses, as a percentage, the R Square Change statistic for the distortion component of the model.
Shaded cells denote departures from the findings of Study 1.
By and large, the personalized and mean-based models resemble those of Study 1: initial diagnostic likelihood had the strongest influence on final diagnostic likelihood, with distortion making a smaller, independent contribution. In two scenarios, dyspnea and fatigue, both modes of distortion were associated with final likelihood estimates and contributed roughly equally: personalized F (1, 80) = 2.20, p = 0.14, and mean-based F (1, 80) = 0.00, p = 0.94 for dyspnea; personalized F (1, 80) = 1.78, p = 0.19, and mean-based F (1, 80) = 0.18, p = 0.67 for fatigue. No association was observed for proleader distortion in the chest pain scenario, where the contribution of antitrailer distortion was significantly greater: personalized F (1, 80) = 7.64, p < 0.01, and mean-based F (1, 80) = 11.45, p < 0.01.
4.3.3 Final diagnosis in the third scenario
At the start of the third scenario, the steer succeeded in installing the intended leading diagnosis in 74 of the 85 physicians (87%); five physicians considered the competing diagnosis more likely (6%), while six considered the two competing diagnoses equally likely (7%). At the end of the third scenario, after physicians had evaluated the three cues that opposed the initial steer, 30 considered the steered diagnosis more likely (35%), 46 considered the competing diagnosis more likely (54%), and the remaining nine considered the two diagnoses equally likely (11%).
4.3.4 Individual differences measures
Responses to the items of the Personal Need for Structure scale (PNS) were summed per participant. The mean PNS score was 41.4 (SD = 8.4, range 21 to 61). PNS did not correlate with either mode of distortion, whether calculated using the personalized method (proleader distortion r = 0.04, p = 0.74; antitrailer distortion r = −0.06, p = 0.59) or the mean-based method (proleader distortion r = 0.08, p = 0.48; antitrailer distortion r = −0.11, p = 0.30).
Responses to the items of the Personal Fear of Invalidity (PFI) measure were also summed per participant. The mean PFI score was 45.4 (SD = 9.0, range 24 to 69). We found no significant relationship between PFI scores and initial estimates of diagnostic likelihood (Pearson r = −0.06, p = 0.57). Accordingly, we no longer expected PFI scores to correlate with either mode of distortion, and indeed none of the correlations approached significance: personalized proleader distortion r = −0.06 (p = 0.58), mean-based r = −0.02 (p = 0.88); personalized antitrailer distortion r = 0.06 (p = 0.62), mean-based r = −0.09 (p = 0.40).
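The scale scoring and null correlations amount to summing item ratings per participant and computing Pearson's r against the distortion estimates. A sketch with simulated responses (the item count and rating range match the PNS scale; all values are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 85

# Simulated responses to the 11 PNS items, each rated 1-6.
pns_items = rng.integers(1, 7, size=(n, 11))
pns_score = pns_items.sum(axis=1)  # scale score = sum of item ratings

# Simulated per-physician antitrailer distortion estimates.
antitrailer = rng.normal(loc=0.9, scale=1.5, size=n)

r, p = stats.pearsonr(pns_score, antitrailer)
print(f"Pearson r = {r:.2f}, p = {p:.2f}")
```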
4.4 Discussion
Study 2 replicated the findings of Study 1, using both a personalized and a mean-based method for calculating distortion. On average, physicians displayed minimal distortion to strengthen a leading diagnosis and substantial distortion to weaken a trailing diagnosis. Again, we found individual differences in the mode of distortion, with a minority of physicians displaying predominantly proleader distortion. As in Study 1, the higher the estimated likelihood of a leading diagnosis, the larger was each mode of distortion on the next cue. An increase in either mode of distortion to favor one diagnosis tended to increase final estimates of its likelihood. This association was consistent across all three scenarios for antitrailer but not for proleader distortion. As in Study 1, at the end of the third scenario, after physicians evaluated cues that conflicted with the initial steer, only about a third of the sample ended up considering the steered diagnosis more likely.
Despite the expected relevance of prior knowledge to the task at hand, the personalized method for measuring distortion was statistically equivalent to the mean-based method. Neither mode of distortion correlated with Personal Need for Structure or Personal Fear of Invalidity.
5 General Discussion
In two studies, using two different samples of family physicians and two different methods of measuring predecisional information distortion in medical diagnosis, we divided distortion into its potential constituent modes: strengthening a leading diagnostic hypothesis or weakening a competing, trailing hypothesis. On average, we found consistent evidence for distortion to weaken a trailing hypothesis but not to strengthen a leading hypothesis. Only a minority of physicians engaged predominantly in proleader distortion. Physicians’ tendency to engage in one or the other mode of distortion was consistent across clinical scenarios. We explored two potential sources of individual differences, namely, Personal Need for Structure and Personal Fear of Invalidity. Consistent with previous research (Russo et al., 1998; Meloy, 2000; Russo et al., 2000), neither personality measure was related to distortion.
In both studies, proleader and antitrailer distortion had similar effects upon final diagnostic judgments: an increase in either mode of distortion to favor a diagnosis was associated with increased final estimates of its likelihood. The influence of proleader distortion seemed weaker and less consistent across scenarios than that of antitrailer distortion, though the difference was significant for only one scenario in one study. Nonetheless, to the extent that proleader distortion occurred, it displayed the expected relations with emerging and final estimates of diagnostic likelihood (DeKay et al., 2014).
Two other research groups, entirely independently and almost simultaneously with our studies, used similar methods to investigate the different modes of distortion among lay people choosing consumer goods. They found evidence for both proleader and antitrailer distortion, which were of similar magnitude (Blanchard et al., 2014; DeKay et al., 2014).
The inconsistency with our findings could suggest that the processes underlying information distortion are specific to the study population and task (DeKay et al., 2014). There are plausible reasons why physicians might distort information to weaken a trailing diagnostic alternative rather than strengthen a leading one. Firstly, physicians are trained to generate multiple diagnostic hypotheses (a set of “differentials”) for the presenting problem. Subsequent information search aims to narrow down the set by excluding hypotheses rather than simply confirming a leading hypothesis (Elstein, Shulman & Sprafka, 1978), though the extent to which physicians do this in practice may vary. If their approach is indeed to exclude rather than confirm, then distorting information to weaken a trailing hypothesis may be more beneficial than distorting information to strengthen a leading one. If antitrailer distortion suffices to help physicians exclude the competing diagnosis, then they may not need to engage in proleader distortion as well.
Meloy and Russo (2004) found that distortion (conceptualized and measured as a single process) increased when there was a match between decision strategy (select vs. reject alternatives) and the valence of alternatives (positive vs. negative): it was greatest when participants were required to select one of two positive alternatives or reject one of two negative alternatives. Their findings demonstrate that decision strategy can affect the magnitude of distortion; therefore, it may also affect the mode of distortion. Further research could explore the possibility that physicians’ predominant diagnostic strategy is responsible for the predominant mode of distortion found in our studies.
Secondly, the consequences of a misdiagnosis can be severe, arguably more severe than the consequences of selecting an inferior consumer item. Physicians may thus be prudent in evaluating diagnostic hypotheses. In our studies, this may influence their cue ratings within the diagnostic task (experimental condition) and not their ratings of random and seemingly unrelated cues (control condition). A conservative approach to the diagnostic task would curtail proleader but not antitrailer distortion. Future work could explore whether physicians’ perceived risk, inherent in the diagnostic task, is responsible for the predominant mode of distortion found in our studies.
Our findings have implications for theories of cognitive consistency, which suggest that information is distorted to maximize consistency between previously observed evidence, hypotheses and newly arriving evidence. As both modes of distortion can work to increase consistency, these accounts would predict the occurrence of both. The present findings pose a challenge to these accounts, calling for more research into the factors that might encourage one mode of distortion over another.
We note a difference between our findings on the final diagnosis in the third scenario and those of Kostopoulou et al. (2012). Across both studies reported here, 34% of physicians considered the steer the most likely diagnosis after they evaluated the conflicting cues, in contrast to 49% reported by Kostopoulou and colleagues. Furthermore, 12% of physicians across both our studies considered the two competing diagnoses equally likely at the end of the third scenario, in contrast to 6% of physicians in the 2012 study. In summary, more physicians changed their diagnosis, and more gave the “normative” response of equal likelihood (normative because the net information in the third scenario, comprising steer cues, neutral cues, and conflicting cues, favored neither diagnosis). As the only methodological difference between the 2012 study and our two studies was the cue evaluation scales (comparative vs. separate), this seems the most likely source of the different findings: the separate scales may have forced physicians to recognize evidence-based support for the opposing diagnosis at the end of the scenario, making it hard to dismiss. Although it is tempting to suggest that the separate scales operated akin to a “consider-the-opposite” debiasing strategy (Larrick, 2004), most physicians did not become more accurate but simply demonstrated a recency effect: switching to the opposite side of the diagnostic scale was more common than judgments of equal likelihood, suggesting that physicians placed more weight on the final cues. Nevertheless, judgments of equal diagnostic likelihood were somewhat more common than in the 2012 study, providing some hope that forced consideration of two possible outcomes could improve physicians’ diagnostic judgments.
We employed and compared two different methods for measuring distortion in physicians’ diagnostic judgments: a traditional “mean-based” method (Study 1) and a rarely used “personalized” method (Study 2). In agreement with previous research (DeKay et al., 2011, 2014), they returned comparable results. This lends no support to the hypothesis that a personalized approach would outperform a mean-based one when prior knowledge and experience are relevant to the task at hand. The logistics of following up participants, in our case physicians, to obtain their responses on a second occasion, while ensuring a time interval that reduces the likelihood of carry-over effects, are demanding. The mean-based method, which relies on a separate control group, therefore offers an easier and equally valid alternative. However, because the mean-based method averages the cue ratings of the control group, it ignores the error around these ratings, which could result in erroneous estimates of distortion. To address this, we developed a new strategy for analyzing SEP data: we used multilevel regression to compare cue ratings cast under experimental vs. control conditions, thus taking into account the variation in control-group ratings. This analysis returned results consistent with the mean-based approach and could be used to supplement and validate mean-based findings in future studies.
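The multilevel strategy can be sketched as a mixed-effects model with a random intercept per participant and a fixed effect of condition; the fixed effect estimates distortion while preserving the variability of control-group ratings. A statsmodels illustration on simulated long-format data (the variable names, sample sizes, and the 0.8-unit shift are all invented):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_subj, n_cues = 40, 6  # illustrative sizes, not the study's

# Long format: one row per cue rating; cond codes the group
# (0 = control, 1 = experimental).
rows = []
for s in range(n_subj):
    cond = s % 2
    subj_effect = rng.normal(scale=0.5)  # participant-level variation
    for c in range(n_cues):
        rating = 5 + 0.8 * cond + subj_effect + rng.normal(scale=1.0)
        rows.append({"subj": s, "cue": c, "cond": cond, "rating": rating})
df = pd.DataFrame(rows)

# Random intercept per participant; the cond coefficient is the
# estimated experimental-minus-control shift in cue ratings.
fit = smf.mixedlm("rating ~ cond", df, groups=df["subj"]).fit()
print(fit.params["cond"])
```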