1 Introduction
The study of misinformation has garnered significant attention within the social and behavioural sciences (Van Bavel et al., 2020; van der Linden et al., 2021). A large variety of assessment tools have been developed to measure misinformation susceptibility (Loomba et al., 2021; Maertens, Götz, et al., 2022), to identify predictors of why people fall for misinformation (Pennycook & Rand, 2019, 2020; Van Bavel et al., 2021; Roozenbeek et al., 2020), and to test the efficacy of interventions (Guess et al., 2020; Pennycook et al., 2021; Roozenbeek & van der Linden, 2019). In doing so, researchers have used a variety of question framings (e.g., eliciting the perceived reliability, manipulativeness, trustworthiness, or accuracy of a set of items, usually news headlines or social media posts) and response modes (i.e., the number of response options, such as binary classification or 6-point or 7-point rating scales). For instance, Pennycook, Rand, and colleagues typically use a set of real and false news headlines presented in a Facebook format and ask participants to rate the accuracy of each headline on a binary (e.g., “To the best of your knowledge, is this headline accurate? Yes/No”), 4-point, or 6-point scale (Pennycook et al., 2020, 2021; Pennycook & Rand, 2019, 2020).
Similar framings and scales have been used by Fazio (2020) and Guess et al. (2020). Van der Linden, Roozenbeek, and colleagues, on the other hand, tend to use social media posts (from Twitter or Facebook) or WhatsApp conversations as stimuli, with or without source information, and ask participants to rate the reliability (Basol et al., 2020; Maertens et al., 2021; Roozenbeek, Maertens, et al., 2021) or manipulativeness (Basol et al., 2021; Saleh et al., 2021) of these posts on a 7-point scale. Other question framings commonly used in misinformation research ask participants to rate items’ trustworthiness (McGrew, 2020; Roozenbeek, van der Linden, et al., 2022) or credibility (Pehlivanoglu et al., 2021), or to judge whether an item is real/true or fake/false (Maertens, Götz, et al., 2022; Swire et al., 2017).
In general, previous research has found that varying question framings or response modes can significantly affect participants’ responses across a wide array of domains. Bradburn (1982) and Schwarz (1999), for example, found that question wording matters a great deal when designing surveys (for a review, see Bruine de Bruin, 2011). Andrews (1984) showed that the number of answer categories had a substantial impact on data quality, indicating that the number of response options used in a survey can affect how findings are interpreted. Similarly, Preston and Colman (2000) and Revilla et al. (2014) found significant differences in responses when varying response modes (see DeCastellarnau, 2018, for an overview). Within the context of misinformation research, this variability can have important consequences. For example, Smith (1995) discusses how self-reported levels of Holocaust denial can vary depending on how the survey question used to elicit people’s knowledge about the Holocaust is phrased.
No research to date has directly compared the response patterns produced by different question framings and response modes when assessing misinformation susceptibility. Hence, it remains unknown whether studies that ostensibly seek to assess the same construct indeed do so. This matters because, if the assessment of misinformation susceptibility is robust to different question framings and response modes, the results of such diverse studies are directly comparable, for example in meta-analyses. If it is not, then not only are the outcomes of studies using different question framings and response modes not directly comparable, but a careful rethinking of which question framings and response modes tap into which construct will also be required. We address these key open questions in this study.
This study has two additional goals. First, we investigate the role of confidence judgments in the assessment of misinformation susceptibility across question framings and response modes. The accuracy of people’s confidence in detecting misinformation is crucial for three reasons. Firstly, confidence influences whether people act on their initial (truth) judgment or seek additional information (Berner & Graber, 2008; Meyer et al., 2013), which makes accurate confidence judgments a prerequisite for recognising the need to verify information (Salovich & Rapp, 2020). Secondly, the level of confidence a person has in their beliefs affects their willingness and ability to defend these beliefs (Tormala & Petty, 2004). Individuals who are justifiably confident in their ability to assess the veracity of news content are thus less likely to fall for misinformation (Basol et al., 2020; Compton & Pfau, 2005). Thirdly, people give more weight to more confident voices, especially in the absence of cues indicating competence (Price & Stone, 2003; Tenney et al., 2008), making it crucial to understand how well confidence signals competence. Little is currently known about the extent to which confidence assessments are influenced by the use of different question framings and response modes, leaving a knowledge gap in the cross-study comparability of confidence measures (Basol et al., 2020).
Additionally, the relationship between item ratings (e.g., of reliability) and the accompanying confidence judgments is unclear: Are primary ratings of news headlines an indication of the extent to which participants think the headlines possess a particular continuous property (e.g., more or less reliability), or are they simply an indication of the confidence with which participants classified the headline into one of two categories (e.g., reliable vs. unreliable)? If the latter is the case, then the rating responses collected in previous research may need to be reinterpreted and could possibly be treated as a proxy for confidence (e.g., in re-analyses).
Second, there is an ongoing discussion about the predictors of misinformation susceptibility (Pennycook & Rand, 2021; Van Bavel et al., 2021). Such studies, too, are often conducted using measures that have not been psychometrically validated. To test various accounts of misinformation belief against each other, their predictions should be compared within a common measurement framework. We therefore compare two overlapping accounts of misinformation susceptibility. The “classical reasoning” account of misinformation belief (Pennycook & Rand, 2019, 2020) argues that a lack of “reflexive open-mindedness” underlies belief in false news (Pennycook & Rand, 2020, p. 187), and that motivated or identity-protective thinking plays a relatively minor role (Pennycook & Rand, 2019, p. 48). This account emphasises the role of analytical thinking in susceptibility to misinformation (Pennycook & Rand, 2020). Conversely, an “integrative account”, called for by Van Bavel et al. (2021) and van der Linden (2022), proposes that, in addition to purely cognitive factors such as analytical skills, identity-protective thinking, “myside bias”, and political ideology are central factors in predicting misinformation susceptibility. Comparing these two accounts across different assessment methods could yield new insights not only into the nature of misinformation belief, but also into whether different assessment methods produce a comparable nomological net (i.e., a similar profile of predictor coefficients across different assessment methods of misinformation susceptibility).
We therefore explore how well four key psychological factors predict misinformation susceptibility across question framings and response modes: endorsement of actively open-minded thinking (AOT; Baron, 2019; e.g., Maertens, Götz, et al., 2022; Erlich et al., 2022); political ideology (Van Bavel & Pereira, 2018; Van Bavel et al., 2021); analytical thinking (as assessed by the cognitive reflection test or CRT; Pennycook & Rand, 2019, 2020); and numeracy skills (Roozenbeek et al., 2020). AOT is highly sensitive to acceptance of “myside bias” (Baron, 2019, p. 10; Svedholm-Häkkinen & Lindeman, 2018, p. 22), which "occurs when people evaluate evidence, generate evidence, and test hypotheses in a manner biased toward their own prior opinions and attitudes" (Stanovich et al., 2013), and is claimed to be one of the strongest psychological predictors of misinformation belief (Van Bavel et al., 2021, p. 96). Political ideology is a measure of partisanship, which is argued to predict false news detection ability (see Van Bavel et al., 2021; Gawronski, 2021). The CRT is commonly used as a proxy for analytical thinking and is also argued to be a strong contributing factor to false news belief (Pennycook & Rand, 2019, 2020, 2021).
Numeracy skills are also an indicator of analytical thinking ability and were found to be the strongest predictor of lower belief in COVID-19 misinformation across five countries (Roozenbeek et al., 2020). Although the “integrative account” acknowledges that analytical thinking plays a role in predicting misinformation susceptibility, it also emphasises the role of myside bias and partisanship. Based on this account, we would thus expect AOT and political ideology to be more consistent predictors of misinformation susceptibility than CRT and numeracy. Conversely, an extreme form of the “classical reasoning” account implies that CRT and numeracy skills are more robust predictors than AOT and political ideology.Footnote 1
1.1 The Misinformation Susceptibility Test (MIST)
To address the above questions, we use the Misinformation Susceptibility Test (MIST), a psychometrically validated scale that assesses misinformation susceptibility (Maertens, Götz, et al., 2022). The full version of the MIST (the MIST-20) consists of 10 real and 10 made-up (i.e., false) headlines, presented without source information or images (both of which can affect people’s news evaluation; see Pehlivanoglu et al., 2021; Zillmann et al., 1999). The items were selected using a combination of factor analysis, classical test theory, and item-response theory models. They were also tested using Differential Item Functioning (DIF) analysis based on ideology (liberal–conservative), and all items that led to measurement inaccuracies due to their ideological slant were removed. The 10 false headlines were generated by a version of GPT-2 (a text-generating model developed by OpenAI) that had been trained on a large sample of false headlines. The 10 real headlines were taken from legitimate news sources. Because the psychometric properties of the test are known, the MIST is well suited to evaluating misinformation susceptibility and examining variations across question framings and response modes.
The MIST-8 is a shortened version of the MIST-20, consisting of the eight best-performing headlines from the MIST-20 (four false and four true; see Appendix). Although not preregistered, we report the results for both the MIST-20 and the MIST-8 throughout this paper, to illustrate minor variations that may arise when using subscales of larger item sets.
Performance on the MIST is scored according to three separate metrics: veracity discernment ability (VDA, the ability to discern true from false news), in addition to a real news score (RNS, accuracy in identifying real headlines) and fake news score (FNS, accuracy in detecting false headlines). These scores all correlate strongly with other item sets commonly used in misinformation research, such as the true and false COVID-19 headlines used by Pennycook et al. (2020). For a detailed discussion about the MIST’s design, usage, and psychometric properties, see Maertens, Götz, et al. (2022). The Appendix lists the MIST-20 and MIST-8 headlines.
2 The present study
In this study, we use the MIST-20 and MIST-8 to compare five question framings (eliciting the accuracy, manipulativeness, reliability, and trustworthiness of each MIST headline, and whether the headline is real or fake) and three response modes (6-point, 7-point, and binary scales) commonly used in misinformation research, as well as the level of confidence that people report having in their judgment of each MIST headline. As preregisteredFootnote 2, we expect the MIST to yield similar responses for primary headline ratings as well as confidence judgments across different question framings and response modesFootnote 3.
In order to compare the “integrative” and “classical reasoning” accounts of misinformation belief and compare the nomological net of different assessment methods, we conduct a series of analyses (Pearson’s and disattenuated correlations, as well as linear regressions) to examine how MIST-20 and MIST-8 veracity discernment ability (VDA) correlates with actively open-minded thinking (AOT), political ideology, analytical thinking (CRT), and numeracy skills, across response modes and question framings.
Our Open Science Framework (OSF) page contains all the information required to replicate our methods and results, including the raw and cleaned datasets, Qualtrics survey, preregistration, supplementary tables and figures, and our analysis and visualisation scripts: https://osf.io/b9m3k/. Our preregistration can be accessed here: https://aspredicted.org/7ht5z.pdf.
3 Method
3.1 Sample and procedure
We conducted our study on Prolific Academic (Peer et al., 2017) using the survey software Qualtrics. We followed previously established guidelines on initial scale development, which recommend recruiting at least 300 respondents per condition (Maertens, Götz, et al., 2022; Clark & Watson, 2019; Boateng et al., 2018). As preregistered, we sought to recruit a United-States-based sample of 2,674 participants, with 334 participants per condition. After excluding participants who failed attention checks, our final sample was slightly smaller (N = 2,622) and consisted of 50.6% women (46.9% men, 1.7% non-binary, 0.6% other, 0.3% prefer not to say), with a mean age of 37.1 years (SD = 13.7). Participants were, on average, left-leaning (political ideology: M = 3.07, SD = 1.74, on a 7-point scale), and 66.3% reported having obtained at least a bachelor’s degree. Most participants lived in the South (34.4%) or West (25.0%) of the United States (22.2% reported living in the North-East and 18.4% in the Mid-West). Participants were paid GBP 0.55 for their participation. See Table S1 in the Supplementary Materials for the full sample composition.Footnote 4
The procedure of this study was as follows: after providing informed consent, participants were randomly assigned to one of eight conditions, which differed in their combination of question framings and/or response modes (see below); these combinations were selected based on their use in previous research. In each condition, participants rated the MIST-20 headline set (headline presentation order was random for each participant). Each condition used a different question framing and/or response mode for the primary judgment:
1. Accuracy (6 pt.) (n = 326): “How accurate do you find this headline?”, 1 being “not at all” and 6 being “very” (Guess et al., 2020; Pennycook et al., 2020).
2. Accuracy (7 pt.) (n = 336): “How accurate do you find this headline?”, 1 being “not at all” and 7 being “very”.
3. Manipulativeness (7 pt.) (n = 330): “How manipulative do you find this headline?”, 1 being “not at all” and 7 being “very” (Basol, Roozenbeek, et al., 2021; Saleh, Roozenbeek, et al., 2021).
4. Reliability (7 pt.) (n = 331): “How reliable do you find this headline?”, 1 being “not at all” and 7 being “very” (Basol et al., 2020; Roozenbeek & van der Linden, 2019).
5. Trustworthiness (7 pt.) (n = 330): “How trustworthy do you find this headline?”, 1 being “not at all” and 7 being “very” (McGrew, 2020; Roozenbeek, van der Linden, et al., 2022).
6. Real – Fake (6 pt.) (n = 315): “This headline is...”, 1 being “real” and 6 being “fake”.
7. Real – Fake (7 pt.) (n = 316): “This headline is...”, 1 being “real” and 7 being “fake”.
8. Real – Fake (Binary) (n = 338): “This headline is...”, real or fake as a binary judgment (Maertens, Götz, et al., 2022).
Conditions 2, 3, 4, 5, and 7 differ in their question framing but all use a 7-point scale. Conditions 1 and 6 use different question framings but both use a 6-point scale. Conditions 1 and 2 and conditions 6, 7, and 8 use the same question framings (accuracy and real-vs-fake, respectively), but different response modes (6- and 7-point scales in conditions 1 and 2, and 6-point, 7-point, and binary scales in conditions 6, 7, and 8). After indicating their judgment of a headline, participants were also asked to indicate their confidence in their judgment (“How confident are you in your judgment?”, 1 being “not at all” and 7 being “very”; e.g., Saleh et al., 2021).
Participants then completed a series of demographic and other questions in the following order: age, gender, education level, political ideology (from 1 “very liberal” to 7 “very conservative”), political party identification (Democrat/Republican/Independent/Other), US geographic region (West/Mid-West/South/North-East), news consumption (how often people check the news, from 1 “never” to 5 “all the time”), social media use (from 1 “never” to 5 “all the time”), the 10-item actively open-minded thinking scale (AOT; for the specific scale used, see Baron et al., 2022), the 3-item cognitive reflection test (CRT-2, hereafter referred to as CRT; Thomson & Oppenheimer, 2016), the 3-item Schwartz numeracy test (Schwartz et al., 1997), and a single-item risk literacy test (“which represents the highest risk of something happening: 1 in 10 / 1 in 100 / 1 in 1000”; see Dryhurst et al., 2020; Roozenbeek et al., 2020). The Schwartz test and risk literacy test were combined into a single numeracy score, following Roozenbeek et al. (2020). We also recorded participants’ reaction times for both the primary and confidence ratings, as well as whether they had seen or heard about each MIST headline before.Footnote 5 Finally, participants were debriefed about the nature and purpose of the study. Figure 1 shows the study design (with the headline recognition task excluded).
3.2 Analyses
We preregistered and followed this analysis plan: To determine whether different question framings and response modes measure the same latent construct, we used structural equation modelling (SEM). Specifically, we conducted measurement invariance tests in lavaan, testing three sequential levels of invariance (configural, metric, and scalar), on the five question framings (only for conditions with 7-point scales) and on three response modes (the 6-point, 7-point, and binary scales for the real-vs-fake question framing, as well as the 6- and 7-point scales for the accuracy question framing), for both the MIST-20 and the MIST-8. We also report the model fit values for each group. Achieving configural invariance, the lowest level of invariance, means that the overall factor structure of the SEM exhibits a similar fit in each group, or that there is a “qualitatively invariant measurement pattern of latent constructs across groups” (Xu & Tracey, 2017, p. 75). A configural invariance test fits the model to each group while leaving the factor loadings (the strength of each item’s relation to the latent factor, where each item is a MIST headline) and the item intercepts (each item’s expected value when the latent factor is zero) free to vary across groups. Metric invariance (the second level of invariance) is achieved when factor loadings (but not item intercepts) are equivalent across groups, indicating that each scale item (MIST headline) loads onto the model’s latent factor in a similar manner. Finally, scalar invariance means that both factor loadings and item intercepts are equivalent across groups, indicating that there is very little difference in scale properties between groups (Lee, 2018).
Because we expected the structure of misinformation susceptibility to be the same across question framings and response modes, higher levels of invariance constitute stronger evidence for this hypothesis. However, we recognise that changes in response modes and question framings could lead to small changes in the interpretation of individual items, reflected in the factor loadings and intercepts, while the general factor structure of the MIST-20 and MIST-8 is maintained. Scalar invariance would provide excellent evidence and metric invariance very good evidence; we treat configural invariance across groups as the minimum level of valid support for the hypotheses (note that this criterion was not preregistered). To test for configural invariance, we fit a multiple-group SEM and inspected the model fit indices. We considered CFI/TLI > .90 and SRMR/RMSEA < .10 to indicate good fit, and CFI/TLI > .95 and SRMR/RMSEA < .06 to indicate excellent fit (Clark & Watson, 2019; Finch & West, 1997; Hu & Bentler, 1999; Pituch & Stevens, 2015; Schumacker et al., 2015). To test for metric and scalar invariance, we used a standard chi-square difference test.
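A minimal sketch of a chi-square difference test of the kind used for the metric and scalar comparisons, written in standard-library Python. The test compares a constrained model (for example, loadings fixed equal across groups) with the less constrained model it is nested in; under the usual maximum-likelihood assumptions, the difference in χ² is itself χ²-distributed with degrees of freedom equal to the difference in model degrees of freedom, and a non-significant result is read as support for the more invariant model. The function names and example fit statistics are hypothetical. In practice SEM software performs this test (in lavaan, for instance, via `lavTestLRT()`), and robust estimators require a scaled difference test that this sketch does not implement.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the series expansion of the regularized lower incomplete gamma
    function P(a, z), with a = df / 2 and z = x / 2, so that sf = 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the series
    total = term
    n = 0
    while term > total * 1e-15 and n < 100_000:
        n += 1
        term *= z / (a + n)  # next term: z^n / (a (a+1) ... (a+n))
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_difference_test(chisq_constrained, df_constrained, chisq_free, df_free):
    """Return (delta_chisq, delta_df, p) for two nested models fitted by ML."""
    delta_chisq = chisq_constrained - chisq_free
    delta_df = df_constrained - df_free
    return delta_chisq, delta_df, chi2_sf(delta_chisq, delta_df)


# Hypothetical fit statistics for a configural (free) and a metric (constrained) model.
d_chisq, d_df, p = chi2_difference_test(chisq_constrained=112.4, df_constrained=46,
                                        chisq_free=100.0, df_free=40)
print(f"delta chi2 = {d_chisq:.2f}, delta df = {d_df}, p = {p:.3f}")
```

A p-value above the chosen alpha means the added equality constraints do not significantly worsen fit, so the more invariant model is retained.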
Additionally (not preregistered), we compared the eight conditions (using ANOVAs) on three metrics introduced in Maertens, Götz, et al. (2022), each assessing a different aspect of misinformation susceptibility: veracity discernment ability (i.e., accuracy in discerning real news from false news; VDA), real news score (i.e., accuracy in identifying real headlines; RNS), and fake news score (i.e., accuracy in identifying false headlines; FNS). VDA is calculated by rescaling each response to a scale from 0 (most incorrect) to 1 (most correct) and taking the mean of the item scores. For more information about how the RNS and FNS scores are calculated, see Supplementary Analysis S1.Footnote 6
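The 0-to-1 rescaling behind VDA can be illustrated with a short Python sketch. It assumes, hypothetically, that a rating on a 1-to-k scale is mapped linearly to 0 to 1, that the direction is flipped for framings where a high rating means "fake" (manipulativeness and the 1 = real to k = fake scales), that a binary response is treated as a 2-point scale (so it scores 1 if correct and 0 otherwise), and that RNS and FNS are the same item scores averaged over real and false items only. The exact RNS and FNS definitions are in Supplementary Analysis S1 and may differ; the function names are illustrative, not the authors' code.

```python
from statistics import mean


def item_score(rating: int, is_real: bool, k: int, high_means_real: bool) -> float:
    """Rescale one rating on a 1..k scale to 0 (most incorrect) .. 1 (most correct).

    high_means_real=True for framings where a high rating signals 'real'
    (e.g., accuracy, reliability, trustworthiness); False where a high rating
    signals 'fake' (e.g., manipulativeness, the 1 = real .. k = fake scale).
    A binary real/fake response can be coded as k = 2 (1 = real, 2 = fake).
    """
    if not 1 <= rating <= k:
        raise ValueError("rating outside the 1..k scale")
    realness = (rating - 1) / (k - 1)  # 0 = clearly fake, 1 = clearly real
    if not high_means_real:
        realness = 1 - realness
    return realness if is_real else 1 - realness


def mist_scores(ratings, truth, k, high_means_real):
    """Return (VDA, RNS, FNS) as 0..1 means (hypothetical RNS/FNS definition)."""
    scores = [item_score(r, t, k, high_means_real) for r, t in zip(ratings, truth)]
    real = [s for s, t in zip(scores, truth) if t]
    fake = [s for s, t in zip(scores, truth) if not t]
    return mean(scores), mean(real), mean(fake)


# Hypothetical participant: four items on the 7-point real-vs-fake scale (1 = real, 7 = fake).
vda, rns, fns = mist_scores([1, 2, 7, 4], [True, True, False, False], k=7, high_means_real=False)
print(f"VDA = {vda:.3f}, RNS = {rns:.3f}, FNS = {fns:.3f}")
```

Putting every framing on the same 0-to-1 "correctness" scale is what makes VDA comparable across the 6-point, 7-point, and binary conditions.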
With respect to the confidence ratings, we take an exploratory approach rather than conducting formal statistical tests. We descriptively compare participants’ mean confidence ratings across conditions. In addition, we investigate the association between the primary headline judgments and the confidence judgments by constructing an implied full-range confidence measure that ranges from very confident that an item is inaccurate (unreliable/manipulative, etc.) to very confident that the item is accurate (reliable/non-manipulative, etc.).Footnote 7 We then descriptively compare, across conditions, the within-participant Spearman correlations between each participant’s primary and full-range confidence judgments. These analyses exclude the binary real–fake condition, because only two distinct responses (“real” or “fake”) are possible in that condition, so no continuous associations can be investigated.
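The exact construction of the full-range confidence measure is given in Footnote 7 and is not reproduced here. The Python sketch below assumes one plausible version: the side of the scale is taken from the primary rating relative to the scale midpoint, and the 1-to-7 confidence rating is signed accordingly (a rating exactly at the midpoint maps to 0). It then computes a within-participant Spearman correlation as the Pearson correlation of average ranks. All function names, and the signing rule itself, are hypothetical.

```python
import math


def full_range_confidence(rating, confidence, k, high_means_accurate=True):
    """Hypothetical signed confidence running from -confidence to +confidence.

    The side of the scale comes from the primary rating relative to the
    midpoint (k + 1) / 2; a rating exactly at the midpoint maps to 0.
    """
    side = rating - (k + 1) / 2
    if not high_means_accurate:  # e.g., manipulativeness: high = inaccurate
        side = -side
    sign = (side > 0) - (side < 0)
    return sign * confidence


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (nan if a variable is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical participant rating five headlines on the 7-point accuracy scale.
ratings = [7, 6, 2, 1, 4]
confidence = [7, 5, 4, 6, 3]
signed = [full_range_confidence(r, c, k=7) for r, c in zip(ratings, confidence)]
print(signed, spearman(ratings, signed))
```

In the study, such a correlation would be computed per participant across all rated headlines and then summarised (e.g., by its median) within each condition.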
To test whether actively open-minded thinking (AOT), political ideology, analytical thinking (CRT), or numeracy skills are most consistently associated with misinformation susceptibility, we compute both standard and disattenuated Pearson’s correlations between MIST-20 and MIST-8 veracity discernment ability (VDA) and AOT, political ideology, CRT, and numeracy. In addition, we report the correlations of each of these variables with news consumption and participants’ reaction time for the MIST headline ratings (log-transformed); we include these variables to check whether reaction time and news consumption may serve as confounds for the four variables mentioned above. We also estimate a series of linear regressions with AOT, CRT, numeracy and political ideology simultaneously predicting veracity discernment ability as well as participants’ real and fake news scores (RNS and FNS).
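The disattenuated correlations correct each observed correlation for the unreliability of the two measures involved. A minimal Python sketch of the classical correction (Spearman's correction for attenuation), r_xy / sqrt(r_xx · r_yy), is shown below. Which reliability estimates were plugged in is not stated in this section, so the inputs in the example are hypothetical.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation: r_xy / sqrt(rel_x * rel_y).

    rel_x and rel_y are reliability estimates (e.g., McDonald's omega or
    Cronbach's alpha) for the two measures. Results can exceed |1| when
    reliabilities are underestimated, so values are returned unclipped.
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values for illustration only (not the study's estimates).
print(round(disattenuate(0.30, 0.75, 0.60), 3))  # → 0.447
```

Because reliability sits in the denominator, a measure with low or poorly estimated reliability (such as a numeracy score with many participants at ceiling) receives a large correction, so the corrected value should be read with caution.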
4 Results
4.1 Question framings and response modes
We performed a measurement invariance test on the five question framings using a 7-point scale (accuracy, manipulativeness, reliability, trustworthiness, and the 7-point real-vs-fake scale) and the two sets of response modes (the two accuracy conditions and the three real-vs-fake conditions). Table S3 shows the fit values for the configural invariance models across all comparisons, for both the MIST-20 and MIST-8.
With respect to question framings, we found no configural invariance for the MIST-20, indicating that question framings substantially change the psychometric properties of the MIST-20. For the MIST-8, we found configural invariance but not metric invariance (Δχ² = 73.83, p < .001). These results provide partial support for the hypothesis that different question framings measure the same latent construct.
With respect to response modes, we likewise found no configural invariance for the MIST-20. For the MIST-8, we found metric invariance across the three real-vs-fake conditions (Δχ² = 18.51, p = .101) as well as across the two accuracy conditions (Δχ² = 3.42, p = .755). We also found scalar invariance for the 6- and 7-point real-vs-fake scales (Δχ² = 7.11, p = .210). These results indicate that using different response modes does not alter the psychometric properties of the MIST-8 but does alter those of the MIST-20, providing partial support for the hypothesis that different response modes measure the same latent construct.
Furthermore, the fit values for the eight conditions (see Table S3) show that the MIST-8 SEMs had good fit in six out of eight conditions, further demonstrating consistency across models. Only one of the MIST-20 models (the binary real-vs-fake condition) showed good fit. However, the MIST-20 has good reliability (McDonald’s ω > .70; McDonald, 1999) in all eight conditions, indicating that it still provides a reliable measure of misinformation susceptibility across all response modes and question framings. Overall, these results show that, although varying question framings and response modes does result in variations in response patterns (particularly for the MIST-20), these variations are relatively minor.
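For a single-factor model with standardized loadings λ and uncorrelated residuals, McDonald's omega (total) is (Σλ)² / [(Σλ)² + Σθ], where θ are the residual variances. The Python sketch below implements this textbook formula under those assumptions; the study's ω values may come from a different specification or software defaults, and the loadings in the example are invented.

```python
def mcdonald_omega(loadings, residual_variances=None):
    """Omega total for a single-factor model.

    loadings: standardized factor loadings. If residual_variances is None,
    each is taken as 1 - loading**2 (standardized items, uncorrelated errors).
    """
    if residual_variances is None:
        residual_variances = [1 - lam ** 2 for lam in loadings]
    common = sum(loadings) ** 2  # variance attributable to the common factor
    return common / (common + sum(residual_variances))


# Hypothetical standardized loadings for eight items (illustration only).
omega = mcdonald_omega([0.6, 0.55, 0.5, 0.5, 0.45, 0.4, 0.4, 0.35])
print(f"omega = {omega:.2f}")
```

Omega rises with both the number of items and the size of their loadings, which is one reason a longer item set can remain reliable even when its factor model fits less well.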
We also compared (not preregistered) the eight conditions to gain more insight into between-condition variability in MIST veracity discernment ability (VDA), real news score (RNS), and fake news score (FNS). Figure 2 shows that all three scores are comparable across conditions, except for the binary real-vs-fake scale, which differs significantly from all other conditions: participants in this condition had higher VDA and RNS (but not FNS). Overall, these results are in line with the results from the SEM analysis and further support the notion that, minor variations notwithstanding, participants’ MIST headline ratings are similar across question framings and response modes. See Table S4 for the descriptive statistics and Tables S5-S10 for the Games-Howell post-hoc tests for each of the three measures.
4.2 Confidence judgments
Figure 3 shows the distribution of confidence ratings per condition. We find that confidence distributions follow a comparable pattern across all conditions, except for the manipulativeness condition, in which participants gave higher confidence scores (see the non-overlapping confidence intervals in Figure 3). Furthermore, invariance tests using SEMs suggest that configural invariance (but not metric or scalar invariance) was achieved across all conditions (see Table S11). Thus, with the exception of the “manipulativeness” question framing condition, the results support the notion that confidence judgments of real news and false news are not meaningfully affected by the use of different question framings or response modes.
Figures S2 and S3 further show the relationship between participants’ primary judgments and their confidence judgments across conditions, again confirming that both measures are very similar. Figure S4 shows the within-participant Spearman correlations between the primary (headline) and confidence ratings. This correlation is substantial in all conditions (all group medians > .9). These results largely support the notion that MIST headline ratings and confidence judgments measure the same latent construct.
4.3 Comparing two accounts of misinformation susceptibility
To test the “classical reasoning” account against the “integrative” account of misinformation belief, we preregistered exploratory analyses for the actively open-minded thinking (AOT), cognitive reflection test (CRT), and numeracy scales, as well as a single-item measure of political ideology. Table 1 shows the standard and disattenuated Pearson’s correlations between veracity discernment ability (VDA), AOT, CRT performance, numeracy skills, political ideology, news consumption, and reaction time for MIST headline ratings (log-transformed), separately for the MIST-20 and the MIST-8. The table displays the results for all eight conditions pooled together; see Table S12 for the results per condition. Figure 4 shows the correlations between VDA and AOT, CRT, numeracy, and political ideology (respectively), separated by condition, in a series of scatterplots with LOESS curves.
AOT was most strongly correlated with VDA (i.e., with lower misinformation susceptibility), followed by political ideology (with participants identifying as left-wing generally showing higher veracity discernment), then numeracy skills, and finally CRT performance. Neither news consumption nor reaction time is strongly correlated with VDA. A series of z-tests comparing correlation coefficients (see Table S14) further shows that the raw correlation between VDA and AOT is significantly stronger than those for all other variables (all p-values < 0.001). In addition, political ideology is more strongly correlated with VDA than both numeracy and CRT are (all p-values < 0.001). Note, however, that the disattenuated correlation for numeracy is closer to that for AOT, and that many participants were at ceiling on the numeracy measure itself, which casts doubt on whether the correction adequately accounts for its unreliability.
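This section does not specify which z-test underlies Table S14. The correlations being compared share a variable (VDA) and come from the same participants. They are therefore dependent, and a test for correlated correlations is needed rather than a comparison of independent samples. One standard option is the Meng, Rosenthal and Rubin (1992) test, shown here as an assumption rather than as the study's actual procedure. It Fisher-transforms both coefficients and adjusts the standard error for the correlation between the two predictors. The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def meng_z_test(r1: float, r2: float, r12: float, n: int) -> tuple[float, float]:
    """Compare two correlated correlations that share one variable y.

    r1  = corr(y, x1), e.g. VDA with AOT (hypothetical)
    r2  = corr(y, x2), e.g. VDA with CRT (hypothetical)
    r12 = corr(x1, x2), the correlation between the two predictors
    n   = number of participants
    Returns (z, two-sided p) under H0: rho(y, x1) == rho(y, x2).
    Follows Meng, Rosenthal & Rubin (1992); assumes |r12| < 1.
    """
    rbar_sq = (r1 ** 2 + r2 ** 2) / 2
    f = min((1 - r12) / (2 * (1 - rbar_sq)), 1.0)  # f is capped at 1
    h = (1 - f * rbar_sq) / (1 - rbar_sq)
    z = (math.atanh(r1) - math.atanh(r2)) * math.sqrt((n - 3) / (2 * (1 - r12) * h))
    p = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value from the standard normal
    return z, p

# Hypothetical inputs, not the study's estimates.
z, p = meng_z_test(r1=0.50, r2=0.20, r12=0.10, n=1000)
print(f"z = {z:.2f}, p = {p:.2g}")
```

A positive z indicates that the first correlation is the stronger one. Swapping r1 and r2 flips the sign of z but leaves p unchanged.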
Separating the data by condition shows a similar pattern: AOT and political ideology are strongly and consistently correlated with VDA, whereas CRT and numeracy show weak or no correlations. The only exceptions to this pattern are the binary real-vs-fake condition, where none of the correlations between VDA and the four other variables differ significantly from one another, and the 7-point real-vs-fake condition, where the correlations between VDA and political ideology, numeracy and CRT are not significantly different; see Tables S12 and S13.
To assess the unique predictive contribution of each covariate, we estimated a series of linear regressions with veracity discernment ability (VDA) as the dependent variable and AOT, CRT, numeracy, and political ideology as the independent variables, both for the MIST-20 and the MIST-8.Footnote 8 These regression models corroborate the above findings based on zero-order correlations: AOT is the strongest predictor of higher veracity discernment ability in all conditions, followed by political ideology, numeracy skills, and lastly CRT performance (which does not significantly predict veracity discernment in any condition except the binary real-vs-fake condition; see Tables S17 and S18). Parallel regression models for the real and fake news scores (RNS and FNS) further corroborate these results (see Tables S19 and S20). In addition, to examine whether the partisan slant of the MIST headlines might influence these findings (particularly because political ideology is such a strong predictor of veracity discernment), we ran the same regression models as above with the three most partisan (i.e., right-leaning) false MIST headlines excluded; doing so does not alter our conclusions (see Table S22).Footnote 9 Finally, the nomological nets for the MIST-20 and MIST-8 are highly similar across all conditions (see Tables S15-S20), buttressing the earlier finding that different question framings and response modes are broadly comparable when measuring misinformation susceptibility.
Overall, we thus found more support for the “integrative” than for the “classical reasoning” account of misinformation belief: actively open-minded thinking (a measure sensitive to “myside bias”) and political ideology are both robust predictors of misinformation susceptibility, whereas classical analytic thinking (as measured by both CRT performance and numeracy skills) is at best a weak and inconsistent one.
5 General Discussion
Misinformation susceptibility has become a popular topic of academic research in recent years. To assess how susceptible individuals are to misinformation, researchers have used a variety of (often ad-hoc) measures, scales, and test item sets, as well as different question framings. While this research has yielded impressive insights, it has suffered from a lack of standardisation and thus from unclear cross-study comparability. To address this, we set out to examine whether measuring misinformation susceptibility is robust across different question framings and response modes. Moreover, we tested whether confidence judgments are affected by the use of different question framings and response modes, and whether confidence judgments measure the same construct as the primary misinformation ratings. Finally, using a psychometrically validated scale, we tested two well-known accounts of misinformation susceptibility against each other across different assessment methods: the “integrative” account (Van Bavel et al., 2021) and the “classical reasoning” account (Pennycook & Rand, 2019; 2020).
5.1 Question framings and response modes
While there are differences across response modes and question framings when measuring misinformation susceptibility, these differences appear to be minor, particularly for the MIST-8. A confirmatory factor analysis of different question framings (all using the same 7-point scale) showed that at least configural invariance was achieved across conditions for the MIST-8, indicating a (qualitatively) invariant pattern of measurement of latent constructs across conditions (Lee, 2018). Thus, while different question framings for the MIST-8 do not produce exactly the same response patterns, the patterns are similar enough to be broadly comparable. The results are even more robust for the different response modes, showing metric and, in some cases, even scalar invariance across conditions. This indicates that binary, 6-point, and 7-point scales can be expected to yield highly similar results, at times even down to the level of item intercepts.
For the MIST-20, the results are less clear: although the fit measures of the SEMs come close to the thresholds for configural invariance, they do not quite meet them (see Table S3). However, a supplementary similarity test (looking separately at participants’ veracity discernment ability, real news score, and fake news score; Maertens, Götz, et al., 2022) showed that the between-condition variability for both question framings and response modes, although present, is small (see Figure 2). These results offer external support for the idea that studies within misinformation research that are conceptually about the same thing (e.g., testing the efficacy of an anti-misinformation intervention using a set of test items) can be meaningfully compared to one another, for example in a meta-analysis.Footnote 10
The observed differences between the MIST-20 and the MIST-8 might be a function of the quality of the scale used to measure misinformation susceptibility. For example, the MIST-20 uses a wider range of test items than the MIST-8, and therefore potentially measures misinformation susceptibility with higher precision and reliability (see Table S3). However, the MIST-8 uses only the best and most predictive items of the MIST-20, usually resulting in a much better model fit (again see Table S3). Our findings therefore suggest that better items result in greater stability across different response modes and question framings. As most studies that measure misinformation susceptibility use ad-hoc scales and/or tests of limited quality, their measurements may differ across response modes and question framings, highlighting the importance of using psychometrically validated tests.
5.2 Confidence judgments
With respect to confidence, we find that primary headline ratings (e.g., judging the accuracy, reliability, or trustworthiness of an item) and confidence judgments (i.e., the confidence in one’s primary judgment of the item) largely measure the same latent construct. Irrespective of question framing, we find very strong and similar associations between confidence judgments and primary (headline) judgments. Furthermore, we find support for the notion that confidence judgments are largely unaffected by the use of different response modes and question framings. That is, the average level of confidence is comparable across conditions.
The “manipulativeness” question framing (i.e., “how manipulative do you find this headline?”) behaved somewhat differently from the other (7-point scale) question framings, despite its acceptable fit values in the SEM (see Table S3): mean confidence judgments were higher in this condition than in all other conditions (see Figure 3). These differences may arise because assessing a headline’s degree of manipulativeness differs from assessing its truth value (e.g., by rating its accuracy or whether it is true or false): even true information can be presented in a manipulative way (e.g., by using emotionally manipulative language; Brady et al., 2017). However, the same can be argued for the reliability and trustworthiness question framings, both of which behave very similarly to the accuracy and real-vs-fake framings. We encourage further research to gain more insight into this phenomenon, for example by eliciting “top of mind” associations with words such as “manipulativeness”, “reliability” and “accuracy” (see van der Linden, Panagopoulos, & Roozenbeek, 2020).
5.3 Comparing the “integrative” and “classical reasoning” accounts
Finally, we compared two well-known (and somewhat overlapping) accounts of misinformation belief: the “integrative” account (Van Bavel et al., 2021) and the “classical reasoning” account (Pennycook & Rand, 2019; 2020). The former predicts that in addition to analytical skills, actively open-minded thinking (AOT; Baron, 2019) and political ideology (Van Bavel & Pereira, 2018) are consistent predictors of misinformation susceptibility, whereas the latter, in its most extreme form, predicts that analytical thinking (as measured by CRT performance; Pennycook & Rand, 2019) and/or numeracy skills (a second, somewhat different indicator of analytical reasoning ability; Roozenbeek et al., 2020) are more strongly associated with belief in misinformation than political ideology and “myside bias”.
Our results show robust support for the “integrative” account over the “classical reasoning” account. Overall, although analytical thinking plays a role, a propensity towards “myside bias” and political conservatism are more strongly correlated with misinformation susceptibility than purely cognitive factors such as numeracy skills and especially CRT performance. Specifically, actively open-minded thinking is strongly and consistently (negatively) correlated with misinformation susceptibility. Susceptibility was also consistently higher among those identifying as more politically conservative, indicating that political partisanship plays an important role in misinformation belief (at least in the United States, where our study was conducted). Conversely, performance on a numeracy task and especially analytical thinking ability (as measured by the CRT) were comparatively weakly associated with misinformation susceptibility, although the result for numeracy is less clear-cut because many participants scored at ceiling on this task. This suggests that the correlation between belief in misinformation and analytical thinking may not be robust across different methods of measurement when using a psychometrically validated instrument. These findings are somewhat inconsistent with prior research, which has identified analytical thinking (as measured by CRT performance) as an important predictor of misinformation susceptibility (Pennycook & Rand, 2019; 2020).
Previous research has proposed that both AOT and analytical thinking as measured by the CRT predict reasoning ability, and that both should therefore predict a person’s propensity to fall for misinformation, in accordance with the “classical reasoning” account (Erlich et al., 2022). As predictors of misinformation susceptibility, however, AOT and CRT appear to be distinct: we find no collinearity between AOT and CRT (see Figure S5 and Table S23), and even when AOT is removed from the analysis, CRT remains the weakest predictor of veracity discernment ability in all conditions, after political ideology and numeracy skills (see Table S21). Thus, although both the CRT (Frederick, 2005) and AOT (Svedholm-Häkkinen & Lindeman, 2018) scales are measures of reflective thinking ability, they measure distinct constructs within the context of misinformation susceptibility. In fact, these results are consistent with Pennycook, Cheyne, et al. (2020), who find that AOT is negatively correlated with belief in conspiracy theories and specifically that “CRT was a weaker (and often non-significant) predictor for every item relative to either [version of the] AOT-E scale” (p. 487). In line with our findings, the authors therefore correctly conclude that AOT is not merely a proxy for analytical thinking (see also Baron et al., 2015, 2022). One possible explanation is that the CRT assesses a participant’s ability to correctly solve a set of analytical problems (with correct and incorrect answers), whereas the AOT consists of self-reported agreement with a series of statements about standards for good thinking (such as “people should search actively for reasons why they might be wrong”). In other words, the CRT measures cognitive reflection ability (Frederick, 2005; Thomson & Oppenheimer, 2016), whereas the AOT is sensitive to “myside bias” (Baron, 2019, p. 10; Svedholm-Häkkinen & Lindeman, 2018, p. 22).
Interestingly, Pennycook, Cheyne, et al. (2020) also note that the role of AOT was more pronounced for Democrats than for Republicans. For example, they found that higher AOT scores were associated with lower belief in conspiracy theories among Democrats, but that this relationship was not significant among Republicans. In contrast, we find that for both Democrats and Republicans, AOT is strongly correlated with veracity discernment ability.Footnote 11
We note several limitations of our study. First, while we made efforts to recruit a large and diverse sample, it was not quota-matched, and we only recruited participants who were United States residents. Importantly, while our sample is well-balanced in terms of gender (with approximately 50% of the sample identifying as female) and US region (Table S1), it is not representative of the US population in terms of age or ethnic/racial background. We do note, however, that the fit values of the binary real-vs-fake condition’s SEM were highly similar to those reported by Maertens, Götz, et al. (2022), who made use of several representative samples to assess the validity of the MIST. In addition, Maertens, Götz, et al. (2022) ran studies on different recruitment platforms (Respondi, Prolific, and CloudResearch), as well as in different countries (the United States and the United Kingdom), reporting a high degree of robustness and consistency of the MIST. We thus have good reason to expect that our findings would be highly similar had we obtained a representative sample or run our study in the UK. Finally, it could be argued that the MIST is not an ecologically valid way of assessing misinformation susceptibility, as the test consists of (partially computer-generated) headlines without the source information, formatting, or other information that would ordinarily accompany a news headline in a real-world environment. We note, however, that the MIST was tested against more ecologically valid item sets, such as those used by Pennycook and Rand (2019) and Maertens et al. (2021), showing very strong correlations (Maertens, Götz, et al., 2022). In addition, because the MIST has the advantage over the currently available, more ecologically valid tests of being psychometrically validated, we argue that it was the most reliable instrument for the present study design.
6 Conclusion
This study is a first attempt at bringing together the large variety of assessment methods used to measure misinformation susceptibility. First of all, we conclude that the use of different question framings and especially response modes should not be expected to yield meaningfully different responses (at least when using the same item set). This finding is of key importance for researchers seeking to compare different studies (e.g., when comparing the efficacy of different anti-misinformation interventions, or for meta-analyses and systematic reviews). We conclude that such comparisons can be safely conducted without a significant risk of similar studies inadvertently assessing fundamentally different constructs. This is good news for the misinformation research community, as there is an urgent need to bring structure to the wide variety of approaches, methodologies, and frameworks that have been employed so far. We therefore encourage future work to use our findings as a starting point for further systematising misinformation research.
Second, people’s confidence in their primary judgments of true and false news headlines is not meaningfully affected by the question framing or response mode used to elicit this judgment. In addition, primary news headline ratings and confidence ratings measure a similar construct. This opens up the possibility of treating headline ratings as a proxy for confidence: for example, rating a false headline as highly manipulative is a strong indicator of high confidence that it is manipulative. Our findings may therefore act as a bridge connecting research on confidence (and metacognition) to research on misinformation susceptibility.
Finally, we tested two general approaches to predicting misinformation susceptibility against each other: the “integrative” account (Van Bavel et al., 2021; van der Linden, 2022), which emphasises the role of myside bias and political partisanship, and the “classical reasoning” account, which argues that a lack of analytical thinking (Pennycook & Rand, 2020, p. 186) is most useful in predicting susceptibility. Our study supports the former over the latter: cognitive factors and analytical thinking (i.e., CRT performance and numeracy skills) were consistently weaker predictors of belief in misinformation than actively open-minded thinking (which is sensitive to “myside bias”) and political ideology. Thus, although cognition and analytical thinking ability can play a role, the ability to consider the viewpoints of those one disagrees with, as well as partisanship and identity-related motivations, appears to be more predictive of misinformation susceptibility. As actively open-minded thinking was the strongest and most consistent predictor across conditions, we highlight the need to further explore the role of thinking standards as part of an integrated account of misinformation belief.